A cornucopia of vector tools helps with some marking

While both students and academics are celebrating the end of another semester, it may well be a bit short-lived. Poor students have exams to cope with and poor academics have scads of marking to endure. One of the GIS courses has recently done a project on sea level rise impacts, and as I set out to get the marking done, it was clear that I was going to have to do some analysis to help make that as painless as possible.

Let me explain: for this project, we were looking at a rise in sea level of 1.0 m (as projected by the CCC by 2051) plus an additional 1.08 m to account for high tide effects (as outlined by NIWA), for a total of 2.08 m above current sea level. As a first part of this, I used a LiDAR DEM to identify all those areas up to that height, ran some filters (Focal Statistics, for those playing at home) to weed out some of the very small (but numerous) unconnected areas and then converted that to a polygon layer (called SLRZone), as shown below:
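I won’t swear this is exactly what I ran, but as a minimal Spatial Analyst sketch of that raster side, it would look something like this in arcpy – the DEM name, workspace path and the 5×5 majority filter are all illustrative, not gospel:

```python
import arcpy
from arcpy.sa import Con, FocalStatistics, NbrRectangle, Raster

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace
arcpy.CheckOutExtension("Spatial")

# Flag every cell at or below 2.08 m (1.0 m SLR + 1.08 m high tide)
below = Raster("lidar_dem") <= 2.08  # 1 where flooded, 0 elsewhere

# Majority filter to weed out the very small unconnected areas
smoothed = FocalStatistics(below, NbrRectangle(5, 5, "CELL"), "MAJORITY")

# Keep the 1s, drop the 0s, then convert to the SLRZone polygon layer
flooded = Con(smoothed, 1)
arcpy.conversion.RasterToPolygon(flooded, "SLRZone", "SIMPLIFY", "Value")
```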

Our study area extends from the Waimakariri River, all the way around Banks Peninsula and out to Akaroa. Rather than look at all these areas, I chose to break things down so students could look at the impact for a selected suburb, chosen from a layer called AffectedAreas, here shown with purpley boundaries (these are the SA2 areas from the 2018 census via StatsNZ):

With these two layers, students can pick one of the suburbs and then create a bit of an inventory of the roads, buildings, landcovers and people affected – classic GIS stuff, yes?

For most, that meant selecting a suburb, exporting it to a new layer, then using that to Clip the SLRZone layer, thus giving them the flooded extent within each suburb. That flooded extent can then be used to clip out the important features from the right data layers and total up the numbers, as sketched below.
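In arcpy terms that student workflow is only a couple of geoprocessing calls – a rough sketch, using Avondale as a stand-in suburb and a hypothetical geodatabase path (SA22018_1 is the suburb name field you’ll meet again below):

```python
import arcpy

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace

# Export one suburb to its own layer (Avondale as an example)
arcpy.analysis.Select("AffectedAreas", "Avondale", "SA22018_1 = 'Avondale'")

# Clip the flooded extent down to that suburb
arcpy.analysis.Clip("SLRZone", "Avondale", "Avondale_SLRZone")
```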

When it came to marking, things were a bit different, as I needed to know the “correct” values for each suburb so I could check their results. Rather than do each one individually, I wanted to optimise my time by doing this analysis for all suburbs at once. Along the way, I ended up using a range of the more useful vector tools, so I thought it might be helpful to cover them here. These are the key steps I’ll cover:

  • Knowing which flood area goes with which suburb
  • Preparing layers for each of the four factors
  • Making it easy to check results

Knowing which flood area goes with which suburb

On the map, I can see each suburb and the extent of flooded area within. But these are still two separate layers – one has the suburb details (name being the most important) and the other just has the extent of the flooded area. I’d like to have both in one layer, so the Intersect tool sounds like a good way to go. I’ve called this layer AffectedAreasSLRZone (in blue outline below) and deleted any unneeded attributes:

This doesn’t look terribly different from my SLRZone layer, but the key is in the table – the flooded extent is now broken down by suburb (SA2 area), so if I select one record in the table, I can see that its features are contained totally within that suburb – a simple but critical step for everything that follows:
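As a sketch, that Intersect step looks something like this in arcpy – the DeleteField call is just indicative, since which attributes count as “unneeded” will depend on your data:

```python
import arcpy

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace

# Overlay the flooded extent with the suburbs so each flood polygon
# picks up its suburb's attributes (the name in SA22018_1 especially)
arcpy.analysis.Intersect(
    ["SLRZone", "AffectedAreas"], "AffectedAreasSLRZone", "ALL")

# Tidy up by dropping attributes I don't need (field names illustrative)
arcpy.management.DeleteField("AffectedAreasSLRZone", ["LANDWATER", "AREA_SQ_KM"])
```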

Preparing layers for each of the four factors

Using the layer from above, I can extract what I need for roads (total length), buildings (number), landcovers (type and area in m²) and population counts (number within SLRZone).

*Roads

Using a road centrelines layer from the LINZ Data Service, I first Intersected this with AffectedAreasSLRZone (Ed. couldn’t you have come up with a shorter name?) but I’m not done yet – in the result, I’ve got multiple road features within each suburb.

See how there are multiple entries for Akaroa? And a total of 976 features (rows)? To make things a bit easier to manage, I ran the Dissolve tool on this using the suburb name:

In the output from this tool, anything with an SA22018_1 value of Akaroa will be compressed into one feature made up of multiple parts. Since this is going to be in a geodatabase, I automatically get a Shape_Length attribute. Here’s the output, with the unnecessary attributes removed:

With this layer I’ve got the total length of roads in each suburb just by looking at Shape_Length. And now there are only 40 records, one for each suburb.
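In arcpy terms, the two road steps boil down to the following – the centrelines layer and output names are my inventions, not what the tools insist on:

```python
import arcpy

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace

# Cut the LINZ road centrelines down to the flooded extent per suburb
arcpy.analysis.Intersect(
    ["road_centrelines", "AffectedAreasSLRZone"], "Roads_Intersect")

# Compress to one multipart feature per suburb; the geodatabase then
# maintains Shape_Length as the total road length in each
arcpy.management.Dissolve(
    "Roads_Intersect", "Roads_Dissolve", "SA22018_1",
    multi_part="MULTI_PART")
```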

*Buildings

I probably could have done something similar to the above for buildings, but there’s a much better tool to use for this: Spatial Join. This tool is a bit of an unsung hero in my book. Amongst other things, it does a nice job of counting features within polygons. So with my LINZ building outlines layer (downloaded from the LINZ Data Service):

Notice here that I’m using AffectedAreasSLRZone as the target and nz-building-outlines for the Join Features, so for each AffectedAreasSLRZone polygon, it’s going to count the number of buildings within its boundary and put that in a new attribute called Join_Count. The Join Operation requires a bit of thought but one-to-one is definitely the way to go here (output again simplified in the table):

Again, only 40 records so I know how many buildings (Join_Count) each suburb has. That’s two done.
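Here’s roughly what that Spatial Join looks like in arcpy – nz_building_outlines stands in for the downloaded nz-building-outlines layer, and the output name is mine:

```python
import arcpy

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace

# Count building outlines against each flooded-suburb polygon;
# JOIN_ONE_TO_ONE writes the count into a new Join_Count attribute
arcpy.analysis.SpatialJoin(
    target_features="AffectedAreasSLRZone",
    join_features="nz_building_outlines",
    out_feature_class="Buildings_SJ",
    join_operation="JOIN_ONE_TO_ONE",
    match_option="INTERSECT")
```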

*Landcover

Our tried and true friend, the Landcover Database, serves as our base data here. I could have used either Intersect or Identity and used the latter for some reason, unknown to me now. Either way, the result should be the same here – Identity would also keep any input areas with no landcover underneath, but the LCDB covers the whole study area:

In my result I’ve got a bit of a double problem – not only do I have multiple entries for suburbs, there are also multiple entries for the different landcovers:

For example, in Aranui, Urban Parklands/Open Space occurs at least five times. No worries – Dissolve can work at multiple levels, so for each suburb, I want it to compress all the Name_2018 entries that have the same value:

I’ve added in a request for a new attribute that Sums up the Shape_Area values. The output now breaks down each suburb with the unique landcover types and I get the Shape_Area for free:
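The multi-level Dissolve is worth seeing in code – the dissolve argument takes a list of fields, and the statistics list is where the Shape_Area sum gets requested. A sketch, with “lcdb” standing in for the real Landcover Database layer name:

```python
import arcpy

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace

# Overlay flooded suburbs with the Landcover Database (Intersect would
# do the same job here); "lcdb" is a stand-in for the real layer name
arcpy.analysis.Identity("AffectedAreasSLRZone", "lcdb", "Landcover_Identity")

# Dissolve on suburb AND landcover type, summing the areas as we go
arcpy.management.Dissolve(
    "Landcover_Identity", "Landcover_Dissolve",
    ["SA22018_1", "Name_2018"], [["Shape_Area", "SUM"]])
```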

*Populations

Possibly the trickiest one, only because we’re working with spreadsheet data from the census, and hence, some table joins.

So far we’ve been working with the SA2 data to give us our suburb boundaries. We could grab the population counts for each SA2 area from the right census spreadsheet, but that will only give us the entire population of each suburb with no ability to look at a finer scale. For that we need the SA1 data (in green below).

With a bit of behind-the-scenes table joining, the population values were added. Next up, an Identity between AffectedAreasSLRZone and my SA1 population layer (SA1AffectedAreas_Pop), giving me Pop_Identity:

(Intersect would also have worked here, by the way.) Now I’ve got my populations (C18_CURPop) within each suburb’s flood area, but again, multiple records because of the finer scale SA1 polygons. I could use Dissolve here so long as I ask for it to Sum up the populations.

If not using Dissolve, there are two other options here: Summarize Within or Summary Statistics. The first is going to be a bit fiddly so I went with the second option. This will give me an output table, rather than a layer, so I’m already thinking about what to join this to.

Setting the Case Field forces it to summarize by the suburb name:

Note: 40 records and I’ve got the name in SA22018_1. What next? I’ve got my choice of layers to join this to and this time I’m going to join it to my AffectedAreas layer (you’ll see why in a sec).
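Scripted, the population chain (Identity, then Summary Statistics with a Case Field, then the join back to AffectedAreas) might look like this – layer and field names as per my layers, the output names assumed:

```python
import arcpy

arcpy.env.workspace = "C:/data/slr.gdb"  # hypothetical workspace

# Break the flooded areas down by the finer SA1 population polygons
arcpy.analysis.Identity(
    "AffectedAreasSLRZone", "SA1AffectedAreas_Pop", "Pop_Identity")

# Summary Statistics: sum the population, with suburb as the Case Field
arcpy.analysis.Statistics(
    "Pop_Identity", "Pop_Stats",
    [["C18_CURPop", "SUM"]], case_field="SA22018_1")

# Join the summed counts back onto AffectedAreas by suburb name
arcpy.management.JoinField(
    "AffectedAreas", "SA22018_1", "Pop_Stats", "SA22018_1",
    ["SUM_C18_CURPop"])
```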

Almost done – by this stage I have all that I need to start marking (apart from courage and coffee, oh, and some chocolate and Spotify). The last step is to set this up in a way that streamlines it all. What I’ve managed to do here is relate everything to a suburb name, and therein lies my advantage at the last stage.

Making it easy to check results

If you’re still with me (and you can be forgiven for toddling off to bed by now), I’d like to be able to quickly check the key values for each of my four factors for a selected suburb. As I read through a report, the first thing I’m looking for is the chosen suburb. Once I know that, I can go into the AffectedAreas table and select that record manually and click on Zoom To. Let’s stick with Avondale:

Notice that I can see the affected population straightaway. I’ll then use this selected record to select the matching records in the other three layers using Select by Location:

When Apply is clicked, the right features in the other layers get selected – by showing just the Selected records, I get to see all the values I need to check (I used Within as the Relationship to ensure neighbouring polygons weren’t also selected):
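And if you wanted to script even this last checking step, it’s one SelectLayerByAttribute plus a loop of SelectLayerByLocation calls – run in the ArcGIS Pro Python window so the layer names resolve (the three loop names are the output layers I invented in the sketches above):

```python
import arcpy

# Pick the suburb being checked (Avondale here)
arcpy.management.SelectLayerByAttribute(
    "AffectedAreas", "NEW_SELECTION", "SA22018_1 = 'Avondale'")

# Select the matching records in the other three layers; WITHIN keeps
# neighbouring polygons from sneaking in
for lyr in ["Roads_Dissolve", "Buildings_SJ", "Landcover_Dissolve"]:
    arcpy.management.SelectLayerByLocation(
        lyr, "WITHIN", "AffectedAreas", selection_type="NEW_SELECTION")
```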

This did actually help a lot by minimising the time I had to spend looking at the data to check results. I’d say it took a good hour and a bit to set things up, but once done, two clicks got me what I needed.

While this post has been about trying to do things smarter (Ed. You?), it’s really been about using some helpful vector tools, including Intersect, Identity, Spatial Join, Clip, Summary Statistics, Dissolve and table joins, all in one analysis package. And while I certainly don’t enjoy marking (who does?), this did make things easier. So take away what may be useful for making your own life easier.

Maybe giving up GIS all together would be a good start. But where’s the fun in that?

C