Being Fair to Covid-19
We take a look at mapping Covid-19 cases in New Zealand and how we can ensure that our interpretations are more realistic. MOH figures in this post are current as at 10 May 2020 at 9.00
As we shift a bit closer to Level 2 and perhaps a bit more freedom, it may be an appropriate time to have a look at the spatial aspects of Covid-19. There are lots, but in this post we’ll look at mapping the cases and also trying to ensure that our interpretations are accurate. So, where to start?
Let’s start with the Ministry of Health, who maintain a webpage of current case numbers with some useful information:
There’s even a map! (Thanks MOH):
This is, of course, a great way to present these data (I’m a bit biased, I know). But I fear these numbers don’t quite give us the full picture. Let me illustrate by recreating this map so we can play around with it a bit.
First off, on that same webpage, the MOH lists the number of current cases by district health board (DHB) – here’s a screenshot from that page :
As we’ve talked about in a previous post, while this is essentially a table of data, it’s spatial data by virtue of having the DHB names, which have locations (areas) implicit in them. Let’s start with those areas. I’ve got a copy of this on J: somewhere but in the short term, I went to Koordinates.com and searched for “health boards” – there were several versions listed there and I chose the latest one, from 2015:
StatsNZ also has a copy of the same layer. Unfortunately, both extend beyond the coastline to take into account the 12 nautical mile territorial sea baseline – I can deal to that later. So I downloaded and unzipped a copy and here it is on the map below (note: I selected the 12 nm sea baseline polygon and just deleted it – that worked for most of the country except for the Nelson Marlborough DHB – I could remove that if I had a coastline polygon layer but I’m not going to worry about it for this. Also note that the Chathams are included with Canterbury):
With the table open you can see the DHB names there – that will be critical for our next step – adding the cases data. So back to the MOH webpage where I copied and pasted the values on the cases into an Excel spreadsheet:
Luck is on my side: a quick comparison shows that the names of the DHBs are mostly the same in the table and in the layer, so my table join will be pretty straightforward. I’ve got three issues though: macrons in Waitemata and Tairawhiti plus a space in “Mid Central” – my spatial layer has MidCentral. Before saving my spreadsheet, I’ll make those changes and ensure that the first row attribute names have no spaces, no crazy characters and none starting with a number. Then I’ll save it and add it to my map:
Note the <Null> values – these are due to spreadsheet cells that were blank. I’m just going to map the Total cases so I’m not worried about it but if I were I could do a Select by Attribute where “Deceased” IS NULL to select those records and change them to 0 with a field calculation. I would have to do that again for the Change attribute as I can only do field calculations one attribute at a time (or, I could have dealt to this in the spreadsheet before joining).
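If you'd rather tidy the table in a script than in the spreadsheet, here's a minimal sketch of the two clean-up steps described above: stripping the macrons and the "Mid Central" space from the names, and turning blank cells into 0. The function names, the field names other than Deceased and Change, and the example row are my own illustrative assumptions, not anything from the MOH table.

```python
import unicodedata


def normalise_dhb_name(name):
    """Strip macrons and use the layer's 'MidCentral' spelling."""
    # NFD splits a letter with a macron into the base letter plus a combining mark
    decomposed = unicodedata.normalize("NFD", name)
    no_macrons = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = " ".join(no_macrons.split())  # collapse stray whitespace
    return "MidCentral" if cleaned == "Mid Central" else cleaned


def fill_blanks(row, fields):
    """Return a copy of row with blank values in the given fields set to 0."""
    return {
        key: 0 if key in fields and value in (None, "") else value
        for key, value in row.items()
    }


# Made-up example row, only to show the behaviour
example_row = {"DHB": "Example DHB", "Total": 10, "Deceased": None, "Change": ""}
cleaned_row = fill_blanks(example_row, ["Deceased", "Change"])

print(normalise_dhb_name("Waitematā"), normalise_dhb_name("Mid Central"))
print(cleaned_row)
```

Doing it this way handles both attributes in one pass, which gets around the one-attribute-at-a-time limit of the field calculation.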
Next, right-click on the layer name and go to Joins and Relates > Add Join. The tool recognises the layer and the table and I just have to link up the attributes in each that holds the name. When I click Run those numbers get added to the layer’s attribute table:
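For anyone curious about what the join is doing under the hood, here's a rough sketch of the same idea in plain Python. It is not how the Add Join tool is implemented, just the logic of matching on a shared name. The field names DHB_name and DHB are hypothetical, and the two case totals are the ones discussed in this post.

```python
def join_cases_to_layer(layer_rows, case_rows, layer_key="DHB_name", table_key="DHB"):
    """Attach table fields to each layer row where the names match."""
    lookup = {row[table_key]: row for row in case_rows}
    # Every field in the table except the name we join on
    table_fields = {key for row in case_rows for key in row if key != table_key}

    joined = []
    for feature in layer_rows:
        match = lookup.get(feature[layer_key], {})
        # Unmatched features get None, much like <Null> in the attribute table
        extra = {field: match.get(field) for field in table_fields}
        joined.append({**feature, **extra})
    return joined


layer = [{"DHB_name": "Southern"}, {"DHB_name": "Waitemata"}, {"DHB_name": "MidCentral"}]
cases = [{"DHB": "Southern", "Total": 216}, {"DHB": "Waitemata", "Total": 233}]

joined = join_cases_to_layer(layer, cases)
for row in joined:
    print(row)
```

MidCentral is deliberately left out of the cases list here to show what happens when a name doesn't match, which is exactly why the spelling fixes matter before joining.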
Now I can use the case values in my map and can basically recreate the MOH’s map:
Not an exact replica – the more I look at the MOH colour scheme the more I think it’s not linked to the values – more like a world map where colours are chosen to show different countries but no countries with a shared border have the same colour. On this map, the colours have more meaning: the darker the blue, the higher the case numbers. Anyway, we’re at roughly the same place as the MOH map now.
But herein lies my issue with this map (and not just this one – many instances like this). On the face of it, we’re comparing the number of cases by region – sweet. But is it a fair representation? Are the regions equal? Equal enough that the comparison is valid? To put it another way, Southern and Waitemata have comparable values (216 vs 233). But there are a few differences between them – two obvious ones are area and population. The area one is pretty obvious just by looking at the map, but the population one is less so. I’m going to focus on the population – mainly because the number of cases per km² doesn’t seem to have much real-world meaning in this context.
So, population – my first thought was StatsNZ as they are the ones doing the counting. My search of their data was fruitless, but I did find this from the Ministry of Health:
Clicking through to each DHB shows a population estimate (from 2018) as text. Try as I might, I can’t find an easily downloadable table and couldn’t copy and paste (grrrrrrr….), so I had to carefully enter the values by hand, region by region, and double-check to make sure I didn’t make a mistake. Here’s the outcome, grouped by defined interval with an interval size of 200,000 people:
The minimum population was the West Coast DHB at 32,410 while the maximum was 628,970 for Waitemata. Keeping in mind our earlier comparison, the population of Southern DHB is just over half of Waitemata at 329,890. The map also shows you the differences in their relative areas.
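As an aside, a defined interval classification just drops each value into a fixed-width bin. Here's a small sketch using the three populations mentioned above and the 200,000 interval size. I'm assuming the class breaks start at zero and that a value sitting exactly on a break belongs to the lower class. The GIS may handle those details differently, so treat this as an illustration only.

```python
import math


def defined_interval_class(value, interval=200_000):
    """Return the (lower, upper) bounds of the fixed-width class holding value."""
    # Assumption: breaks at 0, 200,000, 400,000, ... with the upper bound inclusive
    index = max(0, math.ceil(value / interval) - 1)
    lower = index * interval
    return lower, lower + interval


populations = {"West Coast": 32_410, "Southern": 329_890, "Waitemata": 628_970}

for name, population in populations.items():
    lower, upper = defined_interval_class(population)
    print(f"{name}: {population:,} falls in the {lower:,} to {upper:,} class")
```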
Now I’ve got one layer with case numbers and populations in the table.
What I want to do next is take the different DHB populations into account by dividing the case numbers by the population – standardising the values on a per capita basis. To do this I need to add a new field to the table and do a field calculation.
- In the DHB table, add a new floating point attribute – I’ve called it CaseDense but its alias is “Case Density”
- Right-click on the attribute name and choose Field Calculator
- Divide the number of cases (Total) by the DHB region population
- Review the numbers – here’s a histogram and some summary statistics:
These numbers are quite small so I’ll redo the calculation and multiply by 100,000 so that our numbers are now cases/100,000 people – these numbers are a little easier to process:
Same distribution as above but now the figures are a little easier to grasp. Okay – so let’s see that on the map, shall we?
Does this tell a different story from the original cases map? Arguably, yes. Southern DHB now has 65 cases/100,000 people (rounded) while Waitemata, with nearly twice the population, has a value of 37. I would argue that these make the differences between the DHBs more comparable because they allow us to take into account the differences in populations between the regions. Does it mean I should stay away from Southland? Maybe not – more room to spread out than Waitemata so then it becomes about the risk of exposure. Given the higher population in Waitemata one’s exposure risk may be higher – but in either case, Social Distancing is the key!
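The arithmetic behind those two figures is easy to check. Here's a minimal sketch of the same calculation the Field Calculator is doing, using the case totals and 2018 population estimates quoted earlier in this post. The function name is just one I've made up for the example.

```python
def cases_per_100k(total_cases, population):
    """Standardise a case count to cases per 100,000 people."""
    return total_cases / population * 100_000


southern = cases_per_100k(216, 329_890)
waitemata = cases_per_100k(233, 628_970)

print(round(southern), round(waitemata))  # prints: 65 37
```

Swapping the 100,000 for 1,000,000 gives cases per million, which is the unit often used for country-level comparisons.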
(By the by, there’s already a post bubbling away in my subconscious about some of the choices made on the map above…stay tuned.)
In this same way, we could now more easily compare New Zealand’s case load to that of other countries and the comparison is fairer. Here, for example, are a few population-weighted comparisons to finish things off (as of 5 May 2020; by comparison, we are at about 305.8 cases per 1 million people as of today [11 May]):
Luckily, someone’s done the hard work already of mapping the deaths per capita (note, deaths per million people – different from what we’ve been looking at):
This sort of data comparison is fraught with problems – mainly due to how data are collected (e.g. is a death counted only if it can be directly tied to Covid-19, or also if it was from another condition arising from a Covid infection? The answer can differ from country to country), but these are the data that are available and we have to do the best we can with what we’ve got.
No one likes Covid-19 though I suspect there are some virologists that probably have some grudging admiration of it. And I guess this post isn’t so much about being fair to Covid-19 as it is being fair to our interpretation of its effects. Covid-19’s certainly not very concerned about being fair to us. Quite the opposite. But that’s what viruses do.
In these early (?) days of the pandemic, we’ve been seeing lots of maps of cases and deaths and the story they tell is not always as fair as it might seem. With GIS, a picture (map) always tells 1,000 words – as analysts and map makers, we always need to make sure that we’re telling a fair story. This post has been about data mainly, and how best to present it. Along the way, we saw (once again) the value of table joins and generally scrabbling around for useful data. Mainly it’s been about being a responsible analyst and being fair to the data.
Keep washing those hands! And maintain your buffer zones!