Quality assurance is crucial when creating new data, especially with geocoding.  We cover some of the frustrations of this process and some useful tips (definition queries and label scale ranges) to make it a bit easier.

I can safely say that over the past few days I have been almost everywhere in New Zealand: from Invercargill to Cable Bay, from Te Araroa to Westport, and boy is my mousing finger tired.  I’ve visited no fewer than 948 locales and, most sadly of all, never even left my office.  Lest you think that GIS is all glitz and glory (I sadly had to turn down the invitation to Meghan and Harry’s wedding to get this work done), this post will cover some of the drudgery of bearing the mantle of GIS analyst.

In previous posts we’ve covered some analysis around the proximity of tobacco retailers to secondary schools.  One of the key layers needed for that analysis was a point layer of tobacco retailers, and late last year we got an updated spreadsheet of addresses that needed to be incorporated into the existing layer.  Geocoding is the process of taking street addresses and converting them to points on a map.  It’s not difficult to do, but the accuracy of those points on the map shouldn’t be taken for granted – not by a long shot.  Getting the points on the map is the easy part; the much harder part is assuring yourself that they’re correct.  Here’s a for instance.  Amongst the many points in these new data is one listed for 34 London Street, Lyttelton.  Now that’s right in my proverbial backyard (almost)!  Imagine my surprise when the geocoded point on the map ended up here:

Correct, it is London Street, but it’s the Richmond one rather than the Lyttelton one.  Points like this rang enough alarm bells that it was time for a serious quality assurance effort, which, sadly, meant going through each point: one by one…

I did quite a good job of putting this off for as long as possible, but eventually I knew I just had to knuckle down and get it done.  So I thought it through and tried to come up with a good systematic plan to approach this.  Let’s have a quick look at the geocoded layer’s table:

When geocoding through ArcMap, you get a “Score” attribute at the far left which gives you a sense of how good a match it found.  Some of the above scores are at 100, which is about as good as it gets, while the lowest value visible above is 71.2.  Overall, the lowest score in this layer was 61.16.  Of the 948 points, 541 have a score of 100.  My hypothesis was that points with a score of 100 would probably be pretty close to their true location and decreasing scores would mean more work in getting them properly located (which turned out to be correct).  I tend to approach things by getting the more difficult things out of the way first (while I’ve still got some enthusiasm) and leave the easier tasks for later, so I sorted the points by ascending score and started working my way through.  This meant having the table open and selecting each point individually, then clicking on the Zoom to Selected button:
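For anyone who would rather see the worst-first approach written down, here is a minimal Python sketch of the same idea.  It is not what ArcMap does internally, just the sorting logic: the records and addresses are made up, and only the “Score” idea comes from the table above.

```python
# Hypothetical rows standing in for the geocoded layer's table.
# The addresses are invented; only the idea of a match score is from the post.
records = [
    {"address": "1 Example Street", "score": 100.0},
    {"address": "22 Sample Road", "score": 71.2},
    {"address": "5 Placeholder Lane", "score": 61.16},
    {"address": "9 Demo Avenue", "score": 100.0},
]


def triage_by_score(rows):
    """Return the rows sorted so the lowest (least trustworthy) scores come first."""
    return sorted(rows, key=lambda row: row["score"])


def count_perfect(rows):
    """Count the rows whose match score is exactly 100."""
    return sum(1 for row in rows if row["score"] == 100)


worklist = triage_by_score(records)
for row in worklist:
    print(f'{row["score"]:6.2f}  {row["address"]}')
print(count_perfect(records), "of", len(records), "records scored 100")
```

Sorting a copy rather than the original list means the table order itself is left alone, much like sorting a column in the attribute table doesn’t change the underlying data.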

(I later discovered that I could do the same thing by double-clicking on the wee grey box at the left end of the record).  With some judicious zooming in, I could then see if the point was in the right place:

If it wasn’t, I could just shift it to the right place with editing turned on.  Points could often be challenging to place accurately, as the picture above illustrates.  I can clearly see address points for numbers 215 and 227, but where’s the best place for 217?

Sometimes it wasn’t obvious where the point should be, so I had Google Maps open at the same time (having a second screen was a godsend here):

More often than I care to admit, I also needed to switch to satellite view to confirm or even go so far as to look at Street View:

(Sidenote – this particular example is interesting because in Street View this Caltex station is clearly visible and we saw that it’s on the above Google Maps map view as well.  But here’s that same location in the Satellite view:)

With this process, I worked my way through the points until I got to the 100 scores.  Some of the sub-100 points were 10s or 100s of kilometres away from their true location and took some detective work to place correctly.  I then got a little sidetracked (pesky classes…) and had to leave it for a bit.  When I came back, the records were unsorted and I couldn’t quite remember how I had been working through them, so I re-sorted the points by ascending address and started working through from the 1s upwards (note to self: take better project notes next time).  I soon realised that I was starting to repeat myself and stopped to regroup.  With the points listed in ascending order of address, the scores were a mix of values.  I needed a way to just show the points with a score of 100 but keep the ascending address order.  And here’s where a definition query came in very handy.

I don’t think these have come up before but definition queries are useful ways of limiting which features are visible (on the map and in the table) in a layer.  It’s sort of like a query (well, it is, in fact, a query) but instead of highlighting records in blue, ArcMap only displays the records that match the query.  Let’s start with a quick view of the table with addresses sorted in ascending order – note the mix of scores at left:

To set up a definition query, open the layer Properties and go to (surprise, surprise) the Definition Query tab:

Click Query Builder… to do just that:

OK > OK then filters the records and I’m left with only those with a score of 100.  The other records are still there, they’re just not being displayed:

(And note that there are 541 of the 948 displayed.)  Now I can work my way through these points without having to revisit the ones I’ve already done.
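To make the “still there, just not displayed” idea concrete, here is a small Python sketch that mimics what a definition query does to a table.  It is an illustration rather than ArcMap’s own machinery: the rows and the function name apply_definition_query are hypothetical, and the condition corresponds roughly to a Query Builder expression along the lines of Score = 100.

```python
# Hypothetical rows; addresses are invented for illustration only.
table = [
    {"address": "27 Placeholder Lane", "score": 100.0},
    {"address": "14 Sample Road", "score": 71.2},
    {"address": "1 Demo Avenue", "score": 100.0},
    {"address": "2 Example Street", "score": 61.16},
]


def apply_definition_query(rows, predicate):
    """Mimic a definition query: return only the rows matching the predicate.

    The input list is not modified, just as the hidden records
    remain in the layer when a definition query is set.
    """
    return [row for row in rows if predicate(row)]


# Keep only the perfect matches, then sort by address as text,
# which is how a text field sorts in an attribute table.
visible = apply_definition_query(table, lambda row: row["score"] == 100)
visible_sorted = sorted(visible, key=lambda row: row["address"])

for row in visible_sorted:
    print(row["address"], row["score"])
print(len(visible_sorted), "of", len(table), "records displayed")
```

The filter and the sort are independent steps, which is the whole point: the address order is kept while the already-checked records drop out of view.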

Another useful thing was to label the points so I didn’t have to keep going back to the table to see where they should be.  Labelling is also done from the layer Properties:

You can set the labels and font sizes and colour and they’ll be displayed on screen (don’t forget to tick the “Label features in this layer” box):

After a while of doing this, I found the map getting a bit too crowded with all those labels and realised I only really needed them when I was zoomed in beyond a certain scale.  After a bit of trial and error I decided that I only needed to see the labels at scales larger than 1:6,000 (that is, zoomed in closer than that).  But rather than turning the labels on and off manually for each point, I changed a setting in the label properties so that they only display when zoomed in beyond that scale.  Back in the label properties, one can do this by clicking on the Scale Range… button:

Now, the labels only get turned on when I zoom in far enough:
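The “larger scale means smaller number” business trips a lot of people up, so here is the rule as a tiny Python sketch.  The function name labels_visible is hypothetical, and whether ArcMap includes the 1:6,000 boundary itself is an assumption on my part; the 6,000 threshold is the one chosen above.

```python
def labels_visible(map_scale_denominator, out_beyond=6000):
    """Return True when labels should draw at the current map scale.

    A map scale of 1:6,000 has a denominator of 6000.  Zooming in makes
    the denominator smaller, so labels are shown when the denominator is
    at or below the threshold and hidden once zoomed out beyond it.
    Treating the boundary as visible is an assumption.
    """
    return map_scale_denominator <= out_beyond


for denominator in (50000, 6000, 2500):
    state = "on" if labels_visible(denominator) else "off"
    print(f"1:{denominator:,} -> labels {state}")
```

The same comparison applies to a layer’s own scale range, just with the layer being drawn or not instead of its labels.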


So while all those things helped, there’s no getting around the fact that this is drudgery.  I will admit to having Another Brick in the Wall Part 3 going through my head from about point 800 on (hence the title of this post).  That said, it was important to get this right as so much depends on getting the retailer location correct.  The next stages of this analysis hinge upon accurately determining if a retailer is within 500 m of a school or not.  So if its position isn’t as accurate as we can make it, what’s the point?  (Sorry, didn’t mean to do that.)
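To show how little it takes for a misplaced point to flip the answer, here is a rough Python sketch of a 500 m check.  It is only an illustration under stated assumptions: the coordinates are hypothetical, the function names are mine, and it uses a haversine (great-circle) distance on a spherical Earth rather than the projected-coordinate tools one would normally use in a GIS.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_buffer(retailer, school, buffer_m=500):
    """Return True if the retailer lies within buffer_m metres of the school."""
    return haversine_m(*retailer, *school) <= buffer_m


# Hypothetical (latitude, longitude) pairs, not real schools or retailers.
school = (-43.6000, 172.7000)
retailer_as_geocoded = (-43.5950, 172.7000)  # a little over 500 m from the school
retailer_corrected = (-43.5960, 172.7000)    # a little under 500 m from the school

print("As geocoded:", within_buffer(retailer_as_geocoded, school))
print("Corrected:  ", within_buffer(retailer_corrected, school))
```

In this made-up case the two positions differ by only about a hundred metres, yet one falls outside the 500 m threshold and the other inside it.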

Three main things covered in this post: first, GIS analysis is not always glitz and glory [Is it ever? – Ed.], second, definition queries can be useful ways of limiting what gets displayed in a layer, and third, setting the scale range for labels can be handy.

While we’re talking about scale ranges, the same thing can be done for the display of layers.  For instance, if you look at the Canterbury Maps site, different layers get displayed depending on the scale.  As you zoom in, more and more layers get turned on and vice-versa.  These layers have had their scale ranges set to only display above a certain scale.  This can be set in the Properties > General tab:

This is a good way of keeping your maps uncluttered at smaller scales.  If you’d like to have a look at a map set up with scale ranges, open up EnvironmentalDatabases.mxd in J:\Courses\ERST203.  As you zoom in, more layers will get turned on.  Now if you don’t mind, I’ll get back to some Pink Floyd – I’m feeling a bit Comfortably Numb after all those points.