Deleting Smokers – Part 1
This is the first post in a series covering the development of a Python script to carry out some analysis. We cover the idea behind the script and develop some pseudo-code used to flesh it out.
For those of you having trouble sleeping, this post may be the one for you. In several previous posts, we’ve covered some of the analysis done in looking at the link between tobacco outlets and secondary schools. Some of those analyses required the writing of a Python script to automate some complex analysis. There’s been a move afoot internationally to look at setting minimum distances between tobacco retailers, and my colleagues at Otago have been keen to follow this up. We’ve been looking at a paper that employed a Python script to systematically look over a dataset of retailers, determine their distances apart and then randomly remove retailers until there are none within a given distance of each other. Once all the retailers within a given distance are removed, we can run some statistics to evaluate the effects of such a policy. Then we do that many more times to get a representative sample of outcomes. Different countries have looked at different distances – the paper we’ve been looking at uses 500 ft (heathens).
We contacted the paper’s authors, hoping that they might share their script with us, but alas, we’ve had no response. So if we’re going to carry out a similar analysis in NZ, we’ll have to write our own script, which will probably be an unpleasant experience, but it may work out that I can reuse some of the code another time. (I do slightly exaggerate things…I actually do like coding, but it can often be a frustrating experience.)
In this series of posts I’ll step through the whole process, from development, to testing, to debugging and, hopefully, to a fully fledged, error-free, working script. This will either inspire you to start (or continue) your own coding, or more likely, convince you that coding is for someone else to do (I hope it’s the former). To my way of thinking, there are four stages to scripting: conceptualising the script, writing the script, debugging the script and finally, running the script error free. I won’t say this is universal, but just my approach to it.
Disclaimer: (ed. you seem to be doing a lot of that lately…) I wouldn’t call myself a Python expert. I know enough to make me dangerous. I’m sure some readers out there will be kilometres ahead of me and will easily pick out my errors.
Stage 1 – conceptualising the script
A first step in writing a script is to be sure you’re clear about what you want to achieve. And it doubly helps if you can be as crystal clear as possible. Any ambiguity will rear its ugly head and force you to rethink things – that’s not a bad thing but if you can be as clear as possible going into it, you can often speed up the process. So a good first cut is to list the important tasks that need to be done as well as the order in which they have to happen if order is important.
Here’s what they say in the paper about their script: “ArcMap was used to identify all tobacco retailers within 500 ft from another tobacco retailer. A custom script was written in Python to randomly select one tobacco retailer to be deleted from the list. The process continued iteratively until the list contained zero tobacco retail outlets from within 500 ft of another retailer. This random -choice analysis (sic) yields different results each the process is run (see Figs. 1 and 2). Thus the process was run 1000 times and the mean number of retailers was removed from each list.” (Myers et al., 2015). I include figure 1 to help explain this:
It’s interesting to note the figure above focuses on picking “proximity relationships” rather than points. After a bit of thought this makes sense when you consider two points close to each other. Each has a point within a given distance and so would be of interest, but there’s only one relationship between them – this makes the whole process a bit easier, I think. That’s already got me thinking of a potential tool I could use – an OD Cost Matrix. Not sure yet that that’s the right tool.
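To make the “one relationship per pair” idea a bit more concrete, here’s a minimal sketch in plain Python. It is not the paper’s method and not the OD Cost Matrix approach, just a brute-force illustration. The function name find_close_pairs, the retailer names and the coordinates are all made up, and I’m assuming the points are in a projected coordinate system measured in metres (500 ft is 152.4 m).

```python
import math
from itertools import combinations

def find_close_pairs(points, max_dist):
    """Return one (id_a, id_b) tuple per pair of points no more than max_dist apart."""
    pairs = []
    # combinations() visits each pair exactly once, so A-B is never repeated as B-A
    for (id_a, xy_a), (id_b, xy_b) in combinations(points.items(), 2):
        if math.dist(xy_a, xy_b) <= max_dist:
            pairs.append((id_a, id_b))
    return pairs

# Hypothetical retailers: A and B are 100 m apart, C is far from both
retailers = {"A": (0, 0), "B": (100, 0), "C": (1000, 1000)}
close = find_close_pairs(retailers, 152.4)  # 152.4 m = 500 ft
```

Two nearby points produce a single entry in the list, which is the “relationship” rather than two separate points of interest. Comparing every pair gets slow for a big dataset, so a real run would probably lean on a GIS tool or a spatial index instead.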
That’s an overview in sentence form – next I’ll see if I can list those steps in order with more of a GIS slant:
- Take a set of points and find those that are within a given distance of each other and add them to a list
- Pick a point from the list at random, then randomly select one of its proximity relationships and delete one of the two points in that relationship at random
- Review the list. If there’s still a proximity relationship, delete one of its points at random.
- Continue until there are no more proximity relationships
- Calculate the statistics
- Do this 999 more times, working with a different set of randomly selected proximity relationships and points.
Technically, I would refer to the list above as pseudocode. This is nothing more than trying to write the script (we could also call this the algorithm) in plain English that “anyone” could understand. Later, this pseudocode will be translated into a specific language, Python in this case. It’s a good place to start.
What I’ve tried to do here is set out as clearly as I can the tasks that need to be done – and it’s a work in progress. Looking over the list, there are some tasks that need to be done repeatedly at different points in the script. These are good places for a loop – a set of instructions that takes different inputs but does the same thing over and over again. There’s one internal loop that keeps deleting points until there are no points within the set distance of each other. And of course a big loop that makes this happen 1000 times in total.
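As a rough preview of how those two loops might nest, here’s a small sketch. It’s only my reading of the steps, not the paper’s script: the names thin_once and repeat_runs are hypothetical, the input is assumed to be a list of (point, point) proximity relationships, and the only “statistic” calculated is the mean number of points removed.

```python
import random
from statistics import mean

def thin_once(pairs, rng):
    """Inner loop: delete random points until no proximity relationship is left."""
    remaining = list(pairs)
    removed = set()
    while remaining:
        pair = rng.choice(remaining)   # pick a relationship at random
        victim = rng.choice(pair)      # delete one of its two points at random
        removed.add(victim)
        # any relationship involving the deleted point no longer exists
        remaining = [p for p in remaining if victim not in p]
    return removed

def repeat_runs(pairs, n_runs=1000, seed=None):
    """Outer loop: repeat the thinning n_runs times, return the mean number removed."""
    rng = random.Random(seed)
    counts = [len(thin_once(pairs, rng)) for _ in range(n_runs)]
    return mean(counts)

# Hypothetical chain of three retailers: A is close to B, and B is close to C
example_pairs = [("A", "B"), ("B", "C")]
mean_removed = repeat_runs(example_pairs, n_runs=1000, seed=42)
```

With this little chain, a run removes one point if B happens to go first and two points otherwise, which is exactly why the whole thing needs repeating many times before the average means anything.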
At this point, my brain hurts, so in a follow-up post we’ll look at this pseudo-code with some loops thrown in. Bet you can’t wait…
One thing you may have noticed in this post is that there are no pretty pictures. It’s hard to find good images to go along with coding, but let’s try one in honour of Abel Mamboleo, who has just successfully defended his PhD thesis on geosimulation of human-elephant conflicts – he definitely had to learn some coding for this work. And he was happy to finish: