In this post we look at different strategies to approach an analysis plan.  (Updated 26 May with additional comments from a colleague.)

We talk a lot about analysis on the GIS Blog and there have been a lot of examples of specific tools and how particular analyses were carried out.  This is all (hopefully) good after-the-fact stuff, bearing in mind that a lot often goes into a workflow before we get to those final layers.  I like to think about GIS as a problem-solving tool, but that works on two levels – first, how we approach the spatial aspects of a problem and how GIS can help, and second, the GIS workflows themselves.  How does one know what to use when?

Along those lines, I had a student ask a very good question the other day in class.  We were talking through various analysis options for a class project when, with a bit of noticeable frustration, he said something along the lines of (and I’m paraphrasing), “I know how to use these tools, but how do I figure out how to put it all together?”

Fair question.  It’s kind of like learning a language, in many ways.  Vocabulary (tools) is one thing, but grammar (workflows) can be quite another – knowing all the words in the world makes no difference unless you can put them together in a meaningful way.  On top of that, there’s what just sounds right, for lack of a better word, and that’s the sort of thing that only comes with experience.

So in this post I’ll do my best to posit some strategies.  Here are the main headings:

  • Begin with the end in mind and work backwards
  • Raster or vector?
  • Use flow diagrams
  • Analyst, know thy data
  • …And thy tools
  • Check, check and double-check
  • Know when to walk away

So, on with it then.  I’ll apologise at the outset that this is a long post with no pretty pictures.

Begin with the end in mind and work backwards

One really effective strategy I find myself using is starting at the end and working backwards.  This means knowing as clearly as you can what the end result should look like. Be as clear as you can – if it’s not clear, ask questions of your client/colleague/supervisor/lecturer (sometimes, you’re the client!)  For instance, we’ve recently started some work looking at identifying the best areas to plant hops in, so one end result would be a map showing the best areas.  Working backwards, I can surmise that I’m going to need to know what conditions hops do best in, which is likely to involve things like soil properties, rainfall, maybe slopes, probably temperatures and levels of solar radiation.  I’m no expert on hops, so I’m just making some (hopefully) educated guesses here in lieu of anything more specific.  I’d like to have some spatial criteria that allow me to narrow things down more effectively and I’ll need to rely on someone with some expert knowledge for that.  But assuming I can get that, going further backwards allows me to think about data.  Soils – we’ve got that in a variety of forms: S-Map and the Fundamental Soil Layer being the prime suspects, both vector polygon layers.  Slope?  Our 25 m DEM gives us coverage across the whole country, so tick.  There are several climate raster layers from NIWA that should give us most of what we need in that realm.

In between input layers and output layers I can see that I’ll need to somehow manipulate the inputs to match whatever criteria I’m given – so tool-wise, I’m anticipating I’ll be reclassifying my raster layers and perhaps adding new attributes to the vector layers and then at some point I’ll need to combine all my layers into one output layer, which could be either some map algebra or spatial joins.  Depending on my criteria, I’ll need to think carefully about how I combine the layers (and how they are coded) to get a meaningful result, but this gives me a very rough picture of my workflow, though there’s still a lot of detail to be filled in.
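
To make that a little more concrete, here’s a minimal sketch of the reclassify-and-combine step I have in mind, written in open-source Python (numpy and rasterio) rather than any particular toolbox.  The file names, class breaks and weights are all invented for illustration, and it assumes the input rasters have already been resampled onto the same grid.

    # A rough reclassify-and-combine suitability sketch.
    # File names, class breaks and weights are purely illustrative.
    import numpy as np
    import rasterio

    def reclassify(arr, breaks, scores):
        """Map continuous values into suitability scores using class breaks."""
        out = np.zeros_like(arr, dtype=np.float32)
        for (low, high), score in zip(breaks, scores):
            out[(arr >= low) & (arr < high)] = score
        return out

    with rasterio.open("rainfall.tif") as src:
        rain = src.read(1)
        profile = src.profile

    with rasterio.open("slope.tif") as src:
        slope = src.read(1)

    # Hypothetical criteria: moderate rainfall and gentle slopes score highest.
    rain_score = reclassify(rain, [(0, 600), (600, 1200), (1200, 1e9)], [1, 3, 2])
    slope_score = reclassify(slope, [(0, 7), (7, 15), (15, 90)], [3, 2, 1])

    # Simple map algebra: a weighted sum of the reclassified layers.
    suitability = 0.6 * rain_score + 0.4 * slope_score

    profile.update(dtype="float32", count=1)
    with rasterio.open("suitability.tif", "w", **profile) as dst:
        dst.write(suitability.astype("float32"), 1)

A weighted sum is only one way of combining the scores – a strict Boolean overlay (multiplying 0/1 layers) is the other common choice – and which is appropriate depends entirely on those expert criteria.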

So what I’m trying to illustrate here is that by starting with my endpoint, I think I’ve given myself a way to think through how I’ll get to that endpoint.  And already I’ve had to start thinking through my next strategy.

Raster or Vector?

As we saw above, I’m likely to have a mixture of vector and raster inputs.  This is common to most GIS projects.  At some point, though, we need to join layers together to get that final output, at which point all my layers need to be either raster or vector, but not both.  This is a massive generalisation, as some projects don’t fit this mould, but I’d venture to say that a majority do.  It’s not like they’re matter and anti-matter that annihilate upon contact, but for most cases, vector and raster don’t mix.  So a decision has to get made at some point: am I doing a raster analysis or a vector analysis?

This is a big question, worthy of its own post, but I’ll hit the highlights here.  Sometimes the answer is very clear:

  • If measuring distance or area is critical, then vector is usually the way to go.  Even with high-resolution (e.g. 1 m) raster data, we still can’t measure distances and areas as precisely as we can with vector features;
  • If modelling things like best routes through a linear network (e.g. roads, rivers, fibre optic cables), then hands down it’s vector and network analysis;
  • Working with terrain and topography (slope, aspect, hillshades, viewsheds) cries out for raster.  It’s vastly superior for handling these, as well as continuous data like rainfall, temperatures and air pollution levels, i.e. things we can measure everywhere (there’s a quick slope sketch just after this list).
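
Following on from the terrain point, here’s a minimal slope-from-DEM sketch using simple array maths (numpy’s gradient) alongside rasterio; the file name is a stand-in, and a dedicated Slope tool does rather more than this behind the scenes.

    # Rough slope calculation straight from a DEM; "dem.tif" is a stand-in name.
    import numpy as np
    import rasterio

    with rasterio.open("dem.tif") as src:
        dem = src.read(1).astype(float)
        xres, yres = src.res          # cell sizes in map units (e.g. 25 m)

    # Rate of change of elevation along the rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(dem, yres, xres)

    # Combine the two gradients and convert to slope in degrees.
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))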

This also ties in a bit with knowing what your endpoints are.  There are a few other things to consider on this decision point.  In many cases, a given analysis can be done in either vector or raster; the two approaches tend to differ in their level of complexity (i.e. number of steps) and processing time, so, often, it depends (something my students tire of hearing me say – it’s my stock response).

In the hops example, I’m anticipating that most of my input data are raster in their raw form.  It’s feasible to convert data between the two models – noting that we’re almost always sacrificing something, be it resolution or continuity.  There are suites of tools for converting vector to raster and vice versa.  One direction is fairly straightforward; the other has potential fish hooks, so approach it carefully.  With the hops, most of my data at this early stage are raster, so I’m already leaning towards that as my medium, knowing that it will mean converting my polygons to rasters.
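
To illustrate the vector-to-raster direction, here’s a rough sketch using geopandas and rasterio; the file names and the suit_code attribute are placeholders, and in practice you’d think carefully about which attribute gets burned into the cells and what the cell size does to your polygon boundaries.

    # Rough sketch of burning a soil polygon layer onto an existing raster grid.
    # "soils.shp", "dem.tif" and "suit_code" are illustrative names only.
    import geopandas as gpd
    import rasterio
    from rasterio.features import rasterize

    soils = gpd.read_file("soils.shp")

    with rasterio.open("dem.tif") as src:      # use the DEM as the template grid
        out_shape = (src.height, src.width)
        transform = src.transform
        crs_wkt = src.crs.to_wkt()

    # Make sure the projections match before rasterising.
    soils = soils.to_crs(crs_wkt)

    # Pair each polygon with the attribute value to burn into its cells.
    shapes = zip(soils.geometry, soils["suit_code"])
    soil_raster = rasterize(shapes, out_shape=out_shape, transform=transform,
                            fill=-9999, dtype="int32")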

Another related consideration is analysis extent and its effect on processing time.  Comparing the two data models, rasters are very simple compared to vectors (polygons especially).  In another post we’ll see how rasters are basically matrices and are easily represented as simple text files.  If we look at, say, a vector polygon layer, the data have to keep track of all the points that make up the corners of the polygons and all the lines that connect them, as well as all the attributes.  When we do spatial joins, we usually end up creating a lot of new polygons.  That takes processing time and more computational grunt than a raster calculation.  As you might imagine, as the extent of the analysis increases, the respective processing times also increase, but they increase much more for vector than for raster.  As a very general statement, raster analysis is faster than vector (but it depends…), hence the old, hackneyed statement: raster is faster but vector is correcter (sic).  (Here too.)  As the extent of the analysis gets larger (and assuming I could use either model), I lean more towards raster for this reason.  Doing analysis at a national scale is often a good reason to go with raster (but not always).  As processing power becomes faster and faster, this becomes less of an issue, but it’s still relevant for most of us.
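
To give a taste of the ‘rasters are basically matrices’ point ahead of that other post, here’s a tiny, made-up 3 × 4 raster written out in the plain-text Esri ASCII grid format – a short header describing the grid, then the cell values row by row:

    ncols        4
    nrows        3
    xllcorner    1560000.0
    yllcorner    5170000.0
    cellsize     25.0
    NODATA_value -9999
    12 14 15 -9999
    11 13 14 16
    10 12 13 15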

So with the hops, I’m thinking raster. (Ed. couldn’t you have just said that earlier?  Geez…)

Use Flow Diagrams

Now that I’ve got a slightly better picture of what I’m doing, I might use flow diagrams to help me think through the overall workflow.  I might use ModelBuilder in Pro, or could just as easily use the back of a napkin.  The important thing is trying to organise everything into a coherent workflow.  Using something graphical helps me see potential pitfalls or may suggest ways of doing things more efficiently.  This works for me personally, but may not be effective for everyone.  Flow diagrams work well on at least two levels – one, they can be a nice planning tool at the start of a project, and two, at the end of the analysis they can serve as a nice graphical summary of what you actually did.

Which brings up an important point about flow diagrams – the first one seldom matches the final one.  And that’s okay.  More often than not, my original plan changes as I get to know the data and the possible tools better.  Here’s a good example of typing out loud to figure out the best tool to use in one case.  Flow diagrams are like essay outlines; they tend to change as you get deeper into the task, which is usually a good thing.  At the early stages it’s helpful to give yourself a place to start, knowing that it will probably change.  How many well-planned trips have you taken that ended up being completely different from what you started with?  (And were they better for it?)

Analyst, know thy data

Have you ever gone to put petrol in the car (assuming you’ve got one of those carbon-spewing internal combustion engines – heathen) and realised you were holding the diesel hose?  Happens to the best of us.  Data are the fuel that makes the GIS engine go, so it pays to spend time making sure you’ve got the right data.  And that means being conscious of the spatial features (Are they in the right place?  Are they at the right scale for your analysis?) and the attributes (Do the attributes give you the information you need?  If not, can you somehow add it in?).  Without good-quality data going into your analysis, you’ll only get poor-quality outputs.  The Red Cross screens all its blood donations – you should screen your data.  Be aware of data on the J: drive as well as the online data portals, or data you may have collected yourself.
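
In that screening spirit, here’s a minimal sketch of the sort of quick checks I mean, using geopandas; the file name and the landuse attribute are placeholders for whatever your analysis actually needs.

    # Quick data-screening sketch; "parcels.shp" and "landuse" are placeholder names.
    import geopandas as gpd

    parcels = gpd.read_file("parcels.shp")

    print(parcels.crs)                      # is it in the coordinate system I expect?
    print(parcels.total_bounds)             # do the extents fall where they should?
    print(parcels.geometry.is_valid.all())  # any broken geometries lurking?
    print(parcels["landuse"].isna().sum())  # how many features are missing the attribute I need?
    print(parcels["landuse"].unique())      # do the attribute values actually make sense?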

…And thy tools

You could easily think of GIS as a craftsperson’s toolkit.  It’s just jam-packed full of things, 90% of which you might never need.  But quite often, there’s a tool for the job.  Finding it can sometimes be a challenge.  Keyword searches in both the Geoprocessing pane in Pro and every analyst’s best friend, Google, can often put you on track to potential tools to use.  But it’s got to be the right tool for the job.  Trawl through the tools and toolsets built into the Geoprocessing Toolboxes and see what you find.  Chance favours the prepared mind.

I find myself often referring to GIS as a craft.  Effective craftspeople know their tools and they know the material they work with intimately, along with the limitations and the opportunities they open up.  Of course, time and experience all contribute to crafting, so give yourself a break now and then.  It just takes time to get familiar and comfortable with all the options.

Check, check and double-check

So we’ve talked a lot about planning, filling in all the gaps between input data and outputs.  The doing comes next.  One of the most effective strategies I’ve developed over the years is to constantly be checking and double-checking my outputs at all stages.

Sometimes this is anticipating the results before I run a tool and then checking if the output’s what I was expecting.  If not, well, either I was wrong, or something wasn’t set properly (usually it’s the former…).  Either way, I may pick up an error before it propagates through the rest of my analysis.  There’s nothing worse than standing in front of the PowerPoint slide with your output map at the conference and someone saying, “ummm…that’s wrong….”.  Give yourself opportunities to check the outputs and ensure that 1) they make sense, and 2) they’re correct.  We don’t always know what the correct answer is, but the more you can check things as you’re going, the more likely it is that they are correct.
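
As a rough example of what that checking can look like in practice, here’s a small sketch that pulls some summary numbers off a finished raster; the file name, the 2.5 threshold and the ‘highly suitable’ label are all illustrative.

    # Sanity-check sketch for a raster output; file name and threshold are made up.
    import numpy as np
    import rasterio

    with rasterio.open("suitability.tif") as src:
        data = src.read(1, masked=True)   # mask out NoData cells
        xres, yres = src.res

    # Do the values fall in the range my reclassification scheme should produce?
    print("min:", data.min(), "max:", data.max())

    # Roughly how much area came out as 'highly suitable'?  Does that pass the sniff test?
    cell_area_ha = (xres * yres) / 10_000
    high_cells = np.count_nonzero((data >= 2.5).filled(False))
    print("highly suitable area (ha):", high_cells * cell_area_ha)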

My last strategy:

Know When to Walk Away

Anyone who’s worked with GIS knows how frustrating it can be.  Sometimes, it’s best to log off the computer and just walk away, go outside, get some air, think about something else.  I have long maintained a dysfunctional love/hate relationship with GIS.  And it’s often quite frustrating to realise that, more often than not, it’s done exactly what I told it to do and the mistake was mine, or in my expectations.  (That’s especially true of coding.)  Sometimes banging your head against that digital wall does more harm than good.  GIS is frustrating at the best of times, but OneDrive has certainly complicated things recently.  I am not a big fan.

So, that’s a bit of a brain dump of some of the strategies for doing GIS that seem to work for me.  But I’m still learning, so I wouldn’t call it the definitive list.  Have you got something that works particularly well?  Drop in a comment and share it.

I’m presuming I’ve lost all my readers by this point, happily sleeping away, so if you’ve persevered to the very last full stop, let me know and I’ll shout you a Moro bar (Terms and conditions apply.  On road charges not included.  Only open to New Zealand citizens whose surnames begin with Q.  And if you order by midnight tonite…!)

C

Postscript: I ran this post by my colleague from Manaaki Whenua, James Barringer, from whom I’ve learned a lot about GIS.  He had some good comments that I thought would be worthwhile, so below are his lightly edited thoughts, used with his permission.

You missed one of my favourites for problem solving – “Divide and Conquer” – sort of comes under flow diagrams – but break complicated analyses down into more manageable sub-tasks.

Don’t entirely agree about raster and vector – I think you can work with both very easily now along the way – but your final product needs to follow your advice about picking one or the other depending on needs.

The end in mind bit – absolutely – call it what you like but you need to know where you’re trying to get to, in order to get there – look up what the Cheshire Cat said to Alice in Wonderland when she asked him the way.

For the record:

Alice asked the Cheshire Cat, who was sitting in a tree, “What road do I take?”
The cat asked, “Where do you want to go?”
“I don’t know,” Alice answered.
“Then,” said the cat, “it really doesn’t matter, does it?”

I usually paraphrase the last line to something like – “if you don’t know where you’re going you’re hardly likely to get there”.

In that section you also say you’re “no expert on hops” but to me the real skill required here is knowing/learning enough domain knowledge to have a useful conversation with your client and come away knowing/understanding what they want.  If you can’t do that you are a GIS tech/programmer who needs detailed technical instructions about every step.  Basically a whizz at taking a workflow and implementing it, but unable to know if the desired outcome was achieved.  Very hard for you to check results because you won’t have a gut feeling for what the outcome should look like.  If you can do this you can think of yourself as a GIS Analyst who can solve real problems.

Same thinking as you say applies to data and tools – critical that data and tools are fit-for-purpose, which means understanding both the task and the data properly.  Garbage in – garbage out applies.  But with the added dimension that you can have good data put through an inappropriate analysis, or inappropriate data put through an appropriate analysis.  An understanding of scale, resolution, accuracy, precision and levels of measurement (nominal, ordinal, interval, ratio), and what these all mean in the context of your analysis, is critical.  A lifetime of learning to be had in there!

Cheers,  JB

Thanks James!