Sometimes Simple Is Best
When I was faced with a potentially challenging task, a simple tool made it all possible.
Sometimes using a simple tool can make all the difference. While working on a recent project, one of my colleagues needed some data from the River Environment Classification (REC) dataset, developed by our good friends at NIWA. This is a really useful surface water layer with a number of different uses. Its original aim was to map all of NZ’s river reaches while also classifying them in a hierarchical schema which took into account six factors: climate, source of flow, geology, land cover, network position and valley landform.
By being able to group river reaches at a level of your choice, the results can allow you to identify reaches that have common characteristics, regardless of where they are. In other words, rivers may be geographically separated but still share common environmental factors. This may help with, say, finding the reaches that have characteristics beneficial for Canterbury mudfish populations or for the design of monitoring programmes. For many, just having a national scale layer of rivers is useful in and of itself, but there are times when the extra depth those attributes provide comes in very handy.
Case in point: in this project we’ve been looking at some river characteristics to support the design of water quality monitoring programmes at specific sites (and oh, the (GIS) stories I could tell you…another time). Of particular interest for my colleague were the Climate and Source of Flow attributes in the REC data. And here’s where the problem arose.
There are currently two versions of REC, oddly enough known as version 1 and version 2. Even odder, there are some subtle differences between the two, not least that REC1 has 576,688 reaches while REC2 has 593,517. My colleague has been using REC2 reaches for his modelling and we quickly saw that the Climate and Source of Flow attributes weren’t included in REC2 – data he needed for some subsequent analysis. At first I thought there would be no big problems here – I knew that each reach had a unique identifier and surely it’s the same in both versions, right? Alas, no. The IDs are close but are nonetheless different, so my initial thought of a simple table join wasn’t going to work. To make matters worse, the river lines aren’t in the same place between the two versions – in the image below, REC1 is in green and REC2 is in blue:
(Ed. Hmmmm…what’s with the jagged nature of those lines? And why don’t they match the basemap very well? C. I’ll get to that.)
So, how am I going to transfer the attributes from V1 to V2 without there being a unique ID between the two? This was starting to feel like it was going to be a big job (and of course they needed these data today!) After I gave it some more thought, though, a very simple solution presented itself; the image above helps to illustrate – I’ll zoom in a bit:
Each reach has been labelled with its unique ID – you can see they’re not the same and that they are in different places, even though they are representing the same reach of the Avon. What I’ve got going for me is that they are close to each other, so some proximity tools might help my cause. Prime candidate: Near. This handy tool does a very simple job – it calculates the straight-line distance from each feature in one layer to the nearest feature in another.
While the distance is often very useful, what I’m really counting on is something that comes along for the ride. When the Near tool runs, it adds two new fields to the input layer, NEAR_FID and NEAR_DIST. (NB: this is one of the few tools that makes changes to the input layer rather than creating a new output layer.) As you might imagine, the calculated distance is in NEAR_DIST but to solve this problem, the more valuable thing is the NEAR_FID – the ID of that nearest feature. If I know which feature from REC1 is nearest to its REC2 counterpart, I should be able to join the two together with a table join, et voila!
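For the curious, here is a minimal Python sketch of the idea behind a nearest-feature lookup: for each input feature, record the ID of the closest feature in the other layer and the distance to it. It is only an illustration on points with a brute-force search. The function name, the IDs and the coordinates are all made up, and it says nothing about how the real tool is implemented or how it treats line geometry.

```python
import math


def near(input_points, near_points):
    """For each input point, find the closest point in near_points.

    Both arguments map a feature ID to an (x, y) tuple in a projected
    coordinate system (so plain Euclidean distance makes sense).
    Returns {input_id: (near_fid, near_dist)}.
    """
    result = {}
    for in_id, (x, y) in input_points.items():
        best_fid, best_dist = None, math.inf
        for fid, (nx, ny) in near_points.items():
            dist = math.hypot(nx - x, ny - y)
            if dist < best_dist:
                best_fid, best_dist = fid, dist
        result[in_id] = (best_fid, best_dist)
    return result


# Hypothetical IDs and coordinates, invented for illustration only
rec2 = {201: (0.0, 0.0), 202: (100.0, 0.0)}
rec1 = {11: (3.0, 4.0), 12: (100.0, 10.0), 13: (500.0, 500.0)}

nearest = near(rec2, rec1)
for rec2_id, (near_fid, near_dist) in nearest.items():
    print(rec2_id, near_fid, near_dist)
```

Checking every pair like this would be painfully slow on roughly 600,000 reaches, which is one more reason to let the tool do the work.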
The theory sounds good – let’s try it. The next few images will need some explanation. You’ll see both versions – REC1 is in green and REC2 is in blue, as above. I’ve already run the Near tool on the REC2 layer so you’ll see the NEAR_FID value which has the FID (Feature ID) of the nearest REC1 reach. For this first example, it seems to have worked well:
No problem here. I’m doing a bit of spot checking to give me confidence that the output is doing what I need it to do. On checking one of the other ones, it didn’t look so hot:
This one hasn’t worked so well – Near has identified a feature that’s nearby, but not the right one. I’ve got almost 600,000 reaches here and this makes me think I can’t rely on this working for every single reach. I was assuming (hoping) that the tool works off the centre of each feature but based on this I think it’s working off the first point that got created on these features. Quite likely this won’t work extensively, so back to the drawing board.
That idea of the feature centre got me thinking. If I convert my reach lines to points, maybe I could use those features with the Near tool. It’s an easy thing to do with the Feature to Point tool – and the point is at the geometric centre of the feature (which sometimes means it’s not on the line). For the map extent above, my points look like this:
It’s not perfect, but I think I’ll get some more robust results out of this. So, rerunning the Near tool gives me the linkage between REC2 and REC1 on the point layers:
Next to last thing to do is a table join to bring the REC1 Climate and Src_of_Flow attributes together with the REC2 reaches (right-click on the REC2_points layer > Joins and Relates > Add Join):
(Don’t worry about the warning triangles – everything’s under control. Ed. Yeah, right)
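In script form, the join amounts to a lookup keyed on NEAR_FID. The sketch below is a rough stand-in for the Add Join step, not what ArcGIS does internally: the rows, IDs and attribute values are placeholders I've invented, and join_on_near_fid is a hypothetical helper. It keeps every REC2 row and leaves the attributes empty when no REC1 match exists.

```python
def join_on_near_fid(rec2_rows, rec1_rows, fields=("Climate", "Src_of_Flow")):
    """Copy the chosen REC1 fields onto each REC2 row via NEAR_FID.

    Every REC2 row is kept; rows without a REC1 match get None.
    """
    rec1_by_fid = {row["FID"]: row for row in rec1_rows}
    joined = []
    for row in rec2_rows:
        match = rec1_by_fid.get(row["NEAR_FID"], {})
        out = dict(row)  # copy so the input rows are left untouched
        for field in fields:
            out[field] = match.get(field)
        joined.append(out)
    return joined


# Placeholder rows: the IDs and attribute values are made up
rec2_rows = [
    {"rec2_id": 201, "NEAR_FID": 11},
    {"rec2_id": 202, "NEAR_FID": 12},
    {"rec2_id": 203, "NEAR_FID": 99},  # no REC1 feature with this FID
]
rec1_rows = [
    {"FID": 11, "Climate": "climate_a", "Src_of_Flow": "flow_a"},
    {"FID": 12, "Climate": "climate_b", "Src_of_Flow": "flow_b"},
]

joined = join_on_near_fid(rec2_rows, rec1_rows)
for row in joined:
    print(row)
```

Counting the rows that come back with None is a cheap way to complement the visual spot checks.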
It looks like we’ve now brought the REC1 attributes into the REC2 table – which should make my colleague very happy:
Last step. My colleague is not a GIS user. He’s much more of an R guy, but we still get along well. I know that he’d like to have a CSV (comma separated values) file to work with – easy peasy.
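If the joined table were already sitting in Python as a list of rows, writing the CSV could look something like the sketch below, using the standard csv module. The row contents and the rows_to_csv helper are invented for illustration; writing to an in-memory buffer just keeps the example self-contained, and an open file handle would work the same way.

```python
import csv
import io


def rows_to_csv(rows, fieldnames, handle):
    """Write only the chosen columns of each row to an open text handle."""
    # extrasaction="ignore" silently drops columns not listed in fieldnames
    writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder rows: the IDs and attribute values are made up
rows = [
    {"rec2_id": 201, "NEAR_FID": 11, "Climate": "climate_a", "Src_of_Flow": "flow_a"},
    {"rec2_id": 202, "NEAR_FID": 12, "Climate": "climate_b", "Src_of_Flow": "flow_b"},
]

buffer = io.StringIO()
rows_to_csv(rows, ["rec2_id", "Climate", "Src_of_Flow"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Listing only the columns he needs takes care of the simplifying in the same step.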
With the table open, click on the three horizontal bars at upper right and choose Export:
Then save the output to the project folder (not the geodatabase) with a “.csv” on the end:
Which gives me this to email off to him (with a bit of simplifying) and then sit back and bask in the glory:
Job done. I have to say that when I first got wind of this job, I thought I was in for a whole lot of pain, perhaps even needing a Python script. Happily, a simple tool ended up doing the job quite nicely, or at the very least, making it much easier to do. Sometimes, simple is best. I’d venture to go a step further and say that almost all of the time, simple is best. Or at least as simple as possible.