Three different ways of mapping some dissolved reactive phosphorus data are covered.

Some of our recent work in pulling together a set of global datasets is beginning to bear fruit. Over the next few posts, I’ll go into excruciating detail on how the datasets were pulled together and the analysis results mapped.

Along with nitrogen, phosphorus is a key growth nutrient for plants. But, as with too much of anything, excessive levels of phosphorus in water can promote the growth of algae and plants, leading to less oxygen available for fish and a decrease in water quality. Dissolved Reactive Phosphorus (DRP), the form of phosphate most available for plant growth, is monitored as a water quality indicator and can be found in both surface and groundwater. With some of the good folk at AgResearch, we’ve been pulling together these global scale datasets to model its spatial distribution in groundwater.

We’re working with different sets of modelled data that I’ll have to go into another time, but in this post we’ll look at some different strategies for mapping them. To get things started, I’m working with a polygon layer that has two values of DRP from two different modelling efforts.

  • DRP_pred is from a global dataset modelled using around 19 different factors
  • mean DRP_curr is also modelled, but using a different set of input data

Here’s the attribute table:

You can see that there are 253,293 records here. These are tied to a set of global cells, each 50 km2 in size; all are within 2 km of an existing river and also in areas of shallow groundwater. In the image below, I’ll show you the NZ cells – at the global scale these would be too small to see.

The immediate challenge here is to display the data in a meaningful way for my colleagues. Of interest to them is knowing in which cells the value of DRP_pred is greater than mean DRP_curr. There are two approaches I can use – either create a new attribute to symbolise from, or just use straight symbology to show them. The difference between the two is that in the first, the attribute becomes a permanent part of the data, while in the second it’s a more temporary thing, part of the map but not the data.

Using Attributes

So this sounds like it could be a fairly simple thing, right? There are (at least) 2 ways I could do this:

  • Subtract the two values and look at values above and below 0, or
  • Use values such as 1 for DRP_pred greater than mean DRP_curr or 0 for not

In both cases I’ll need to add a new attribute. For the first option, I’ll add a new field called Difference and set it as a float type. (Both DRP values are floating point, so my attribute needs to be floating point as well.) Going to the Fields view, I can set this up as:

When saved, the Difference attribute gets added to the table with <Null> values:

Right-clicking on the Difference name and opting for Calculate Field allows me to set up the subtraction:

Clicking OK gets me the output:
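Outside of Pro, the same field calculation is easy to sketch in plain Python. Here are a couple of made-up records (the field names match the layer; the values are invented purely for illustration):

```python
# A few hypothetical records, mimicking rows of the attribute table
records = [
    {"DRP_pred": 0.012, "mean_DRP_curr": 0.008},
    {"DRP_pred": 0.005, "mean_DRP_curr": 0.009},
]

# Equivalent of Calculate Field with the expression:
#   !DRP_pred! - !mean_DRP_curr!
for rec in records:
    rec["Difference"] = rec["DRP_pred"] - rec["mean_DRP_curr"]
```

The first record ends up with a positive Difference, the second a negative one – exactly the split the two-class symbology will pick up on.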

Now I can use the symbology to map this difference. Given that the values can range from some maximum through to some possibly negative minimum, I’ll use Graduated Colours in Symbology with only two classes:

For NZ it turns out that DRP_pred is greater than mean DRP_curr for all cells, but that’s not true globally. For some areas in central Africa, it’s different:

My other option was to use a sort of “dummy” variable that’s really just an indicator: a 1 if DRP_pred is greater than mean DRP_curr and a 0 if it’s not. To do this I’ll first create a new attribute called DRP_dummy (not being judgemental here…) as a Short type:

And here’s where the challenge came in. My first plan was to do a Select by Attribute to find the records that I wanted and then Calculate Field to add the right value into DRP_dummy. That actually won’t work. Without dropping into SQL (which is what Select by Attribute uses under the bonnet), the query builder only lets you compare an attribute against fixed values rather than comparing values between attributes:

So, gotta think of another approach. I knew what I needed was some sort of if…then approach, which led me to think about using the Calculate Field tool. With that, we can write mini-functions in a code block, using Python or VBScript, to do something special.

With a bit of trial and error, I settled on this:

(NB: in the Code Block I could have used different variable names for DRP_pred and mean_DRP_curr.)

Decoding this, the DRP_dummy = Calc(!DRP_pred!, !mean_DRP_curr!) line means that the Calc function will need to use the values in the DRP_pred and mean_DRP_curr attributes to do its thing.

Its thing is to use an if statement to test each record. If DRP_pred is greater than mean_DRP_curr, then assign a 1 to DRP_dummy. If it’s not (else), give it a zero.
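Sketching what’s in the screenshot (with my variable names, per the note above), the code block boils down to a small function like this:

```python
def Calc(pred, curr):
    # Return 1 when the predicted DRP exceeds the mean current DRP, else 0
    if pred > curr:
        return 1
    else:
        return 0

# In the Calculate Field dialog, the expression line
#   DRP_dummy = Calc(!DRP_pred!, !mean_DRP_curr!)
# feeds each record's attribute values into this function.
print(Calc(0.012, 0.008))  # → 1
print(Calc(0.005, 0.009))  # → 0
```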

Not a whole lot to it, but it does the trick. When we run this, here’s the output:

The vast majority of cells get a 1 but I cheated a little to show you that not all do.

Now I can use this to symbolise my data with Unique Values:

Using Symbology

Both of the above require a bit of upfront work but the symbology is now hard-wired into the data itself. I could send this layer to someone else and they could use these attributes to symbolise in a similar way. (We could also use some layer files…)

The third option was to use straight symbology without making any changes to the data. I’ve got a lot of options for symbology – for this I used Unique Values:

When that window opens, notice the wee function button at the right-hand side of Field:

Clicking that allows me to write a custom function to use:

Then, setting the number of classes to 2 lets me symbolise things in a non-permanent way:
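For the record, the custom expression performs the same comparison as the code-block approach, just evaluated on the fly. In Pro the expression is written in the symbology dialog rather than run as Python, but per feature the logic amounts to something like this sketch:

```python
def symbology_class(pred, curr):
    # Same test as DRP_dummy, but computed for display only —
    # nothing is written back to the attribute table.
    return 1 if pred > curr else 0

# With two classes, cells evaluating to 1 get one colour and 0 the other.
```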

So, three ways of doing the same thing. You may well ask yourself, well, how did I get here? And which one should I use?

My response would be two-fold: it depends on what you’re comfortable with and whether you want to make the symbology a permanent part of the data. With GIS it’s often the case that you can do the same thing in multiple ways, and a practical consideration is which way is the most elegant or efficient. Personally, I’d rather make it permanent, and I quite liked using code here to do it, but that’s just me. It seemed like a good use of my time.

So what next for these data? These layers ended up on a webmap which my colleagues could then review and formulate some next steps, which you might hear about in another post.