The Spatial Smoking Gun (part 1)
This is the first of two posts about a spatial analysis project related to adolescent smoking rates. In this one, I’ll talk about the problem and the analysis; in the second, I’ll cover the various ways that the results were shared.
Teen smoking…there can’t be anything good about it, can there? I’ve got a friend, Louise Marsh, whom I’ve known for years and who works at the Cancer Society Social and Behavioural Research Unit at Otago (she used to work at Lincoln, in fact). She was telling me one day about her research into ways of reducing adolescent smoking rates. Their unit was particularly interested in the effect of tobacco outlets close to schools and whether there was a relationship to smoking rates amongst teens. Being a spatial analysis person, I suggested that GIS might have something to offer. She said, “GIS? What’s that?” (I get that a lot…) And so began a three-year collaboration that’s delivered some useful information for efforts to reduce smoking. Most of the District Health Boards (DHBs) contributed data and they, as well as smokefree advocates, are our end-users.
So the gist of it is a question: is there a relationship between smoking rates in adolescents and the availability of tobacco near secondary schools? The spatial components revolve around proximity – how close are tobacco retailers to secondary schools across New Zealand? And how close is “close”? Assuming that secondary school students will be walking to get their fix, how many are within a reasonable walking distance from their school? To get started on addressing this question we first had to know where the retailers were. At the beginning of the research it became clear that there was no national database on tobacco retailers, so my colleagues at Otago first went out and collected those data. So here’s what they sent me:
An Excel spreadsheet of 5705 tobacco retailers, with name, address, suburb/city, type of outlet, etc. While this is a great start, it needed a bit of work before we could actually do any spatial analysis with this information. Primarily, we needed to convert the addresses to points on a map.
Geocoding
The technical term for this is geocoding – translating a street address to a set of mappable geographic coordinates. It’s an interesting problem because addresses aren’t always easy to map. If someone tells you their address, you could use your mental map, or an actual map, or, increasingly, the map on your phone or in-car GPS to get you there. So addresses are inherently spatial, but not necessarily mappable. For that, we need an x- and a y-coordinate that unambiguously places the address on a map. Sounds easy, but it actually requires quite a bit of grunt behind the scenes. When you type an address into the search bar on Google Maps, the servers are matching it against a pretty massive line layer of streets and addresses and then determining what latitude and longitude it’s at. Sounds simple, but there are a lot of devils in the detail (think multiple High Streets around the country, for example).
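To make the “multiple High Streets” problem concrete, here is a toy sketch in Python. It is not how Google Maps or any real geocoder works; the lookup table, its entries and the rough coordinates are all made up for illustration, and the function name geocode is hypothetical.

```python
# Toy gazetteer: (street, town) -> (latitude, longitude).
# Entries and coordinates are rough, illustrative values only.
GAZETTEER = {
    ("high street", "rangiora"): (-43.30, 172.59),
    ("high street", "dunedin"): (-45.88, 170.50),
    ("queen street", "auckland"): (-36.85, 174.76),
}


def geocode(street, town=None):
    """Return every candidate (town, lat, lon) that matches the address."""
    street = street.strip().lower()
    town = town.strip().lower() if town else None
    return [
        (t, lat, lon)
        for (s, t), (lat, lon) in GAZETTEER.items()
        if s == street and (town is None or t == town)
    ]


# Without a town the street name alone is ambiguous.
ambiguous = geocode("High Street")
# Adding the town narrows it down to a single point.
resolved = geocode("High Street", "Dunedin")

print(len(ambiguous), "candidates without a town")
print(resolved)
```

The point is simply that a street name on its own can match several places, so the suburb/city column in the spreadsheet matters as much as the street address.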
For our purposes we needed to translate the spreadsheet of addresses to points on the map, ending up with a layer that shows where each retailer is as well as having all the important attributes attached. We could have painstakingly found all the addresses on Google Maps and then digitised them as unique points one by one, but not only is geocoding easier, doing these by hand is against the Geneva Convention. ArcGIS has some built-in geocoding tools but they tend to not work so well in New Zealand. For reasons of expediency more than anything else, I chose to use batchgeo.com:
With this webpage you can paste in addresses from a spreadsheet, set up some parameters, click OK and let it do the geocoding. It does have its limitations, such as only being able to do 2500 addresses at one time, which you can work around with multiple runs, but its biggest issue is the output. Here’s an image of 63 geocoded addresses from Northland:
Great! Points on a map! And it’s done some pretty nifty things like colour them based on outlet type. But we’re not quite there yet as we need these points on our ArcGIS map for them to be useful. Batchgeo allows you to download the points but only as a KML file (Keyhole Markup Language) which is used in Google Earth. You can add a KML file to a map but to be useful for some analysis we need to convert it to a GIS layer using the KML to Layer tool (ArcToolbox > Conversion Tools > From KML):
So after converting it to a feature layer we’re almost there – a quick look at the new layer shows that it does give us points on the map but not much by way of useful attributes, and certainly not the ones we need.
After a lot of careful observing, I was able to find a common link between the original spreadsheet table and the KML file and could do a simple table join to bring in all the attributes from the original spreadsheet.
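For anyone curious what that join amounts to outside ArcGIS, here is a minimal Python sketch. I haven’t said what the common link actually was, so this assumes, purely for illustration, that the placemark name in the KML matches a name column in the spreadsheet. The sample KML, the retailer names, the coordinates and the function names are all invented.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Tiny stand-in for the downloaded KML file (invented names and coordinates).
KML_TEXT = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark><name>Corner Dairy</name>
      <Point><coordinates>174.32,-35.72,0</coordinates></Point></Placemark>
    <Placemark><name>Main Road Service Station</name>
      <Point><coordinates>173.95,-35.40,0</coordinates></Point></Placemark>
  </Document>
</kml>"""

# Stand-in for rows of the original spreadsheet.
SPREADSHEET = [
    {"name": "Corner Dairy", "outlet_type": "Dairy"},
    {"name": "Main Road Service Station", "outlet_type": "Service station"},
    {"name": "Missing Superette", "outlet_type": "Dairy"},
]


def read_placemarks(kml_text):
    """Return {placemark name: (lon, lat)}; KML stores lon,lat[,alt]."""
    root = ET.fromstring(kml_text)
    points = {}
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS).strip()
        coords = pm.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        points[name] = (lon, lat)
    return points


def join_attributes(points, rows, key="name"):
    """Attach coordinates to spreadsheet rows; also report rows with no match."""
    joined, unmatched = [], []
    for row in rows:
        if row[key] in points:
            lon, lat = points[row[key]]
            joined.append({**row, "lon": lon, "lat": lat})
        else:
            unmatched.append(row)
    return joined, unmatched


joined, unmatched = join_attributes(read_placemarks(KML_TEXT), SPREADSHEET)
print(len(joined), "joined;", len(unmatched), "unmatched")
```

Keeping a list of unmatched rows is worth the extra couple of lines, since any record that fails to join is one that silently drops off the map.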
All this might not sound like that much work but, trust me, it was. And we’re not quite finished, because you can’t assume that Batchgeo does a perfect job of placing the addresses – it’s about 85% correct at a very rough estimate. I next spent a LOT of time reviewing the points and making sure they were where they should be – many had to be shifted before we could confidently carry out some analysis.
Next up, schools. I was able to find a layer of schools from the Ministry of Education on Koordinates.com. (You can also find a spreadsheet of schools on the Ministry website, but it would have to be geocoded. It had some useful attributes, so I just joined it to my point layer based on the school name.) Almost there.
Walking Distances
Our next step was to try and quantify how many tobacco retailers were “close” to secondary schools. But how close is close? My colleague did a bit of research and settled on two distances that secondary school students could reasonably walk from school given time etc. (and yes, based on this assumption, we’re not including those who can hop in their car in search of smokes). These were 500 m and 1000 m, i.e. we wanted to count the number of retailers within 500 m and the number within 500 – 1000 m. To do so we needed to create a zone (i.e., a polygon) around the school and use some straightforward overlay analysis to see how many were inside those polygons. A common approach would be to draw a buffer around each school point (or school grounds) and then do some counting, but our approach was an attempt to be more realistic. A buffer is an as-the-crow-flies distance that doesn’t take into account the road network, to which most people walking would be constrained. Rather than use simple, circular buffers, we used network analysis to create walking zones based on distance, i.e. everything within a 500 m walking zone is within 500 m of the point for each school along the road network (in ArcGIS lingo this is referred to as a “service area”). We added a layer of the road network and used the Network Analysis tool to create two walking zones around each school. Here’s an example of what one looks like (Linwood College chosen at random – red dots are tobacco outlets, grey area is within 500 m of the school, light brown is 500 – 1000 m away):
By contrast, here’s how simple buffers compare to the walking zones (I haven’t seen too many smoking crows, have you?)
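The difference between the two approaches can be sketched in a few lines of Python. This is not the ArcGIS service area tool, which builds polygons; it is a toy version that just measures the shortest walking distance along a made-up road network (using Dijkstra’s algorithm) and compares it with the straight-line distance. The node names, coordinates and roads are all invented.

```python
import heapq
import math

# Made-up network: node -> (x, y) in metres, with the school at the origin.
NODES = {
    "school": (0, 0),
    "shop_a": (300, 0),
    "corner": (300, 200),
    "shop_b": (0, 200),
}
# Roads connect pairs of nodes; there is no direct road from school to shop_b.
ROADS = [("school", "shop_a"), ("shop_a", "corner"), ("corner", "shop_b")]


def straight_line(a, b):
    """As-the-crow-flies distance between two nodes (what a buffer measures)."""
    (x1, y1), (x2, y2) = NODES[a], NODES[b]
    return math.hypot(x2 - x1, y2 - y1)


def network_distances(start):
    """Shortest distance along the roads from start to every reachable node."""
    graph = {n: [] for n in NODES}
    for a, b in ROADS:
        length = straight_line(a, b)
        graph[a].append((b, length))
        graph[b].append((a, length))
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nxt, length in graph[node]:
            nd = d + length
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist


def zone(distance_m):
    """Classify a distance into the two zones used in this project."""
    if distance_m <= 500:
        return "0-500 m"
    if distance_m <= 1000:
        return "500-1000 m"
    return "beyond 1000 m"


walk = network_distances("school")
for shop in ("shop_a", "shop_b"):
    crow = straight_line("school", shop)
    print(shop, "buffer:", zone(crow), "| walking:", zone(walk[shop]))
```

In this toy layout shop_b is only 200 m from the school in a straight line, so a 500 m buffer would count it, but the walk along the roads is 800 m, which puts it in the outer zone.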
When creating these zones, I ensured that the resulting polygons had some way of identifying which school they were associated with – this meant making sure that the name of the school was an attribute of each zone created. To make things simpler later, I created two polygons around each school – one for 0 – 500 m from the point and the other for 500 – 1000 m.
Now for some relatively simple overlay analysis to finish things off. We can easily identify the number of retailers within the zones visually but to do any sort of statistical analysis we need to quantify those numbers. We’ve got several overlay tools at our disposal in ArcToolbox, the most useful of which are arguably Intersect, Union, Identity and Spatial Join.
Before we launch into this, a quick thought about what we’re wanting here. We’re looking at the spatial co-location of points and polygons; specifically, we’re focusing on which points are inside a given polygon and how many there are for each school. What I’m wanting as an output is a polygon layer that has the number of tobacco retailers within each zone for each school as an attribute. If you look through all the characteristics of the overlay tools, the Spatial Join tool stands out for one simple reason – it always generates a “Join Count” attribute which does just what I want. Here’s what the tool window looks like:
Notice that I can set the target and the join layers – these help specify which feature type you want as an output. Here I’ve set the target to be the walking zones (polygon) layer and I’ll join characteristics of the retailer points to it. Note also the option of one-to-one or one-to-many. This is a bit confusing, so an example might help. Let’s say we have a polygon with three retailers within its borders. If we did a one-to-many join, there would be three records in the output table (one for each retailer) and each would have a join count value of 1. If we did a one-to-one, we would have one record in the output and the join count value would be 3. The one-to-one is exactly what I want. Soooooo…two one-to-one spatial joins give me a summary of the number of retailers within each walking zone which I can tie to each individual school.
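Here is the three-retailer example as a small Python sketch. It only imitates the behaviour described above, not the ArcGIS tool itself: the point-in-polygon test is a plain ray-casting routine, the zones are simple squares, and every name and coordinate is invented. For brevity the one-to-many version drops zones with no retailers.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Invented walking zones (target layer) and retailers (join layer).
ZONES = {
    "School A 0-500 m": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "School B 0-500 m": [(10, 0), (14, 0), (14, 4), (10, 4)],
}
RETAILERS = {
    "dairy": (1, 1),
    "supermarket": (2, 3),
    "service station": (3, 2),
    "liquor store": (20, 20),
}


def join_one_to_many(zones, points):
    """One record per matching (zone, retailer) pair, each with a count of 1."""
    return [
        {"zone": z, "retailer": r, "Join_Count": 1}
        for z, poly in zones.items()
        for r, pt in points.items()
        if point_in_polygon(pt, poly)
    ]


def join_one_to_one(zones, points):
    """One record per zone; the count is the number of retailers inside it."""
    return [
        {
            "zone": z,
            "Join_Count": sum(point_in_polygon(pt, poly) for pt in points.values()),
        }
        for z, poly in zones.items()
    ]


many = join_one_to_many(ZONES, RETAILERS)
one = join_one_to_one(ZONES, RETAILERS)
print(len(many), "records from one-to-many")
print(one)
```

The one-to-one output is the shape we want: one row per walking zone, with the count ready to use as an attribute.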
I next ran the Merge tool to combine my two distance polygon layers, which gave me one layer with results for each school. Using some field calculations, I compressed the table so there was only one record for each secondary school. Given the counts, I can set up symbols that help differentiate schools based on how many retailers are within the walking zones. Pshew – here’s a summary image.
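The merge-and-compress step boils down to something like the following Python sketch. The actual field calculations aren’t spelled out here, so this is just one plausible way to end up with a single record per school; the school names, counts and field names are all hypothetical.

```python
from collections import defaultdict

# Invented results of the two one-to-one spatial joins.
inner_zone = [
    {"school": "School A", "zone": "0-500 m", "Join_Count": 3},
    {"school": "School B", "zone": "0-500 m", "Join_Count": 0},
]
outer_zone = [
    {"school": "School A", "zone": "500-1000 m", "Join_Count": 2},
    {"school": "School B", "zone": "500-1000 m", "Join_Count": 4},
]

# Rough equivalent of Merge: stack the two tables into one.
merged = inner_zone + outer_zone


def one_record_per_school(records):
    """Collapse two rows per school into one row with a field per zone."""
    table = defaultdict(lambda: {"retailers_0_500": 0, "retailers_500_1000": 0})
    for rec in records:
        if rec["zone"] == "0-500 m":
            field = "retailers_0_500"
        else:
            field = "retailers_500_1000"
        table[rec["school"]][field] += rec["Join_Count"]
    for counts in table.values():
        # Total within the full 1000 m walking distance.
        counts["retailers_0_1000"] = (
            counts["retailers_0_500"] + counts["retailers_500_1000"]
        )
    return dict(table)


summary = one_record_per_school(merged)
for school, counts in summary.items():
    print(school, counts)
```

With one row per school, the counts can drive the map symbols directly and can later be matched against smoking rates school by school.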
While this is useful, it becomes even more useful if we try and relate these data to the smoking rates at each school (data that have been collected by ASH) and see if there are any policy implications. We’ve only just got permission to use those data so stay tuned for an update once we’ve done the analysis. In the meantime, what we’ve been able to do is, for the first time, create a spatial database of tobacco retailers for the country and summarise their characteristics. Great! As a next step, how do we get this information out to our end users, the DHBs? This poses some interesting challenges which I’ll cover in part two.
If I were a smoker this is probably where I’d step outside and light one up. Thankfully, I’m not.
C