Using ModelBuilder to automate a repetitive task

No question – GIS can certainly be frustrating. And it doesn’t get any less frustrating as you get better; the problems just morph into more complex things. Lately, I’ve been wrestling with a particularly vexing problem that on the surface should have been easy but in practice made my hair even more grey. Let me see if I can set it up for you.

As I’ve been compiling some global datasets for a project, it became clear that we needed some global data on river catchments. In the world of hydrology, rivers (and river catchments) can be broken down into a hierarchy of scales, starting out at the extent of the whole river and getting down to the scale of individual river reaches, all nested within each other (i.e. they share boundaries). We’ve talked about similar hierarchies with rivers earlier. HydroBASINS is a dataset that provides catchment data as part of HydroSHEDS, which includes useful data for rivers, lakes, wetlands, water treatment plants as well as waterfalls – great stuff. With HydroBASINS there are 12 levels, or scales, that provide different levels of detail, all freely available for download:

As you can see, rather than downloading the whole dataset, they’ve broken them down into continental scale blocks – this makes some sense as fewer people are probably needing the whole global dataset, but it does make my life difficult, as we’ll see below.

After downloading all 12 Standard sets and unzipping them, I end up with nine folders, each with 12 shapefiles, one for each level – here’s what the Africa folder looks like:

You can see that each folder has a similar name, with things like “af” for Africa and “eu” for Europe, followed by the 12 levels for that region. Above, we’re looking at level 1 for Africa – basically the whole continent. As we go down through the scales, we see more and more detail – here’s level 4:

Each layer has a name like “hydbas_af_lev01_v1c.shp” where “lev01” denotes the level.
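Since everything later hinges on that naming pattern, here is a minimal Python sketch of how the region and level could be pulled out of a file name. It assumes every file follows the “hydbas_af_lev01_v1c.shp” pattern with a two-letter region code; the function name parse_name is just something I made up for illustration.

```python
import re

# Assumed pattern, based on the example name "hydbas_af_lev01_v1c.shp":
# a two-letter region code, then "lev" followed by a two-digit level.
NAME_PATTERN = re.compile(
    r"^hydbas_(?P<region>[a-z]{2})_lev(?P<level>\d{2})_v1c\.shp$"
)


def parse_name(filename):
    """Return (region, level) for a matching file name, or None otherwise."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("region"), int(match.group("level"))


print(parse_name("hydbas_af_lev01_v1c.shp"))
```

Anything that doesn’t fit the pattern comes back as None, which is a handy way to spot stray files before merging.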

All up, there are 9 regions at 12 levels, making 108 individual shapefiles. For these data to be more useful for my analysis, I really need one layer for each scale, so the task involves Merging all the data for each level together. I could do this manually but it cries out for some more fool-proof automation. After all, I’m doing the same thing over and over, nine times. This will involve a lot of pointing and clicking and I’m pretty likely to make some frustrating mistakes, so I set out to do this with ModelBuilder. It’s this sort of task that the thing was designed for, so it’s either that or reverting to Python. I’ve often run up against the limitations of ModelBuilder so I sort of wanted to see how far I could go with this before needing to use some custom code.

My life would be a lot easier if the data were organised differently. For instance, if I had all the regional files for each level together in one folder, it would be a snap to just add all the files into Merge and run it. But for better or worse, the data are grouped by region, meaning I’ll need to use some loops – I’ve got to set up a model that iterates through each folder, groups the level shapefiles together and then Merges them. Some pseudo-code might look like:

  • Look in a folder
  • Find the file that has “lev01” in its name
  • Add that to a list
  • Go into the next folder and do the same thing
  • When all the folders have been looked at, merge all the layers on the list

Sounds reasonably straight-forward…right?
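For comparison, the steps above can be sketched in plain Python. This is not what I built in ModelBuilder, just a rough illustration of the same loop, assuming each regional folder sits directly under one root folder. The function name collect_level_files is made up, and the final merge is left to whatever GIS tool you prefer.

```python
from pathlib import Path


def collect_level_files(root, tag="lev01"):
    """Collect every shapefile under root's subfolders whose name contains tag."""
    collected = []
    # Look in each regional folder in turn
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Find the shapefile(s) with the level tag in the name
        for shp in sorted(folder.glob("*.shp")):
            if tag in shp.name:
                # Add it to the list
                collected.append(shp)
    # Once all folders have been looked at, this list is what gets merged
    return collected
```

Calling collect_level_files on the folder that holds the regional folders, with the tag “lev01”, would return one path per region, ready to hand to a merge step.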

Within ModelBuilder there is a range of iterators (things that help you do loops) each of which does a repetitive task differently:

The key (as I found) is picking the right one. With ModelBuilder you can drag and drop tools and layers onto the canvas, link them up and create an executable model. My first attempt to gather the files together was this:

All of my regional folders are in a folder called RawFolkders. Iterate Workspaces can look at a folder and use a wildcard to narrow the search for a layer by looking at the text name – I used “lev01” as all the level 1 shapefiles have this as part of their name:

Collect Values does just that – collects the names of the selected layers and then adds them all to a table called lev01. This worked okay and I got tables that listed all the right shapefiles:

So partial success here in building up the model.

For this model, I’ve got to “hard code” a few things in, specifically the “lev01” bit – and I’ve got to do that in both Iterate Workspaces and Collect Values. Right now, this model will only work on the level 1 shapefiles, but thinking ahead, I’ve got 11 more levels to do. Can I make it easier for me to run this on the other levels? One way to do that is to add a variable that holds the key wildcard value – doing it this way, I only have to set it once and can then refer to it in the tools using inline substitution. Here’s what I mean – I’m adding a variable to the model and giving it the value of “lev01”:

Now, I can refer to that value in the tools using %String% – like this:
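If inline substitution is new to you, the idea is simply “swap the %Name% token for the variable’s value before the tool runs”. Here is a toy Python imitation of that behaviour. It is only an illustration and certainly not how ModelBuilder does it internally; the substitute function and the “*%String%*” wildcard are my own assumptions for the example.

```python
import re


def substitute(text, variables):
    """Replace each %Name% token in text with the matching value from variables."""
    def replace(match):
        name = match.group(1)
        # Leave unknown tokens untouched rather than guessing a value
        return str(variables.get(name, match.group(0)))
    return re.sub(r"%([^%]+)%", replace, text)


model_variables = {"String": "lev01"}
print(substitute("%String%", model_variables))    # e.g. an output table name
print(substitute("*%String%*", model_variables))  # a hypothetical wildcard
```

Change the one value in model_variables and every place that refers to %String% picks up the new level.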

Looking good – this works!

Next, I want to add in another step – Merge the layers that have been collected, so:

This works BUT, the only output I get is a table – no spatial layer. So, lots of thinking, lots of gummie bears later, I finally figured out that what Iterate Workspaces works with is layer names, not actual data layers. So, once again, GIS did exactly what I told it to do, and it was me that had things wrong.

Not to worry – replacing Iterate Workspaces with Iterate Feature Classes works!

With this, I get what I need:

Hip hoorah! I’ve only been working on this for two weeks now… With the model as is, all I have to do is change the value of %String% and run it again for each of the other levels with confidence, so only 11 more runs to do.

If I had the time, I would think about maybe nesting this model inside another one that can do the lev01, lev02, etc iteration for me (a model can only have one iterator in it), but that’s a job for another day.
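For what it’s worth, if I did revert to Python, that outer loop over the levels might look something like the sketch below. Treat it as a rough outline under a few assumptions: the regional folders sit directly under one root folder, the output names (“hydbas_lev01.shp” and so on) are made up, and the merge itself uses arcpy.management.Merge, so run_merges only works on a machine with ArcGIS installed. I haven’t run this against the real data.

```python
from pathlib import Path


def level_tags(n_levels=12):
    """Build the tags 'lev01' ... 'lev12' used in the shapefile names."""
    return [f"lev{n:02d}" for n in range(1, n_levels + 1)]


def build_merge_plan(root, out_dir, tags):
    """Map each (hypothetical) output shapefile to the regional inputs for that level."""
    plan = {}
    for tag in tags:
        # One level down from root: every regional folder's shapefile for this level
        inputs = sorted(str(p) for p in Path(root).glob(f"*/*{tag}*.shp"))
        if inputs:
            plan[str(Path(out_dir) / f"hydbas_{tag}.shp")] = inputs
    return plan


def run_merges(plan):
    """Run one Merge per level; requires ArcGIS (arcpy) to be installed."""
    import arcpy  # imported here so the planning step works without ArcGIS
    for output, inputs in plan.items():
        arcpy.management.Merge(inputs, output)
```

Splitting the planning from the merging means the plan can be printed and checked by eye before anything is written, which would have saved me a few gummie bears.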

I wasn’t sure if I could get ModelBuilder to do this and I’m pleased I finally did. These data are now available in J:\Data\HydroBASINS for your dining and dancing pleasure. Now onto other fun stuff!

C