GIS Blog

2

Those Unformatted Dragonflies Again…

Analysis and mapping is almost always preceded by time spent getting data ready to be mapped.  In this post we cover an instance of reformatting some raw spatial data so that they can be used for mapping.

Over the years, there have been a few posts about dragonflies of the Pacific, from using a smartphone app to collect data in the field, to acquiring elevation data, and effective Pacific coordinate systems.  Those dragonflies aren’t going anywhere (we hope!) and still need to be catalogued, so my colleague, Milen, is still travelling the Pacific (though not in the past year), most recently in the island group of Wallis and Futuna.  Where are these islands?

By TUBS – Own work This W3C-unspecified vector image was created with Adobe Illustrator. This file was uploaded with Commonist. This vector image includes elements that have been taken or adapted from this file: Polynesian triangle.svg (by Gringer)., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15127571

Note how Wallis and Futuna are south of the equator and just on the other side of the international dateline from us – that will become more important later.

Now Milen is well versed with using the app to collect his observations, but we needed some locations collected by someone else to be mapped.  He was sent a Word document with the points that looked something like this (around 45 points in all):

Here was my reaction:

Why?  Clearly, there is spatial information in each entry given by the latitude and longitude BUT it is in a form that is not easily imported into ArcGIS.  This is a very frequent occurrence, I’m afraid, and one must be prepared to massage the data into a more useful form.

A few things to note before we dive into this.  The coordinates are essential for getting the points on the map, but they aren’t quite in the right form.  For instance,  give me latitude and longitude, but the “S” (for South of the equator) and the “W” (for West of the Prime Meridian) won’t make sense to Pro.  What is preferred is either a positive or a negative value to indicate N/S of the equator or E/W of the prime meridian, as shown below:

http://www.fekete.com/san/webhelp/feketeharmony/harmony_webhelp/content/html_files/reference_material/General_Concepts/GIS_Theory.htm

For mapping in our part of the world, latitudes need to be negative, so I’ll have to take this into account at some point.  And being just the other side of the dateline, the longitudes will be negative as well.

But the biggest problem by far is getting these data organised into something more closely resembling something mappable – GIS does not like Word docs.  There are a number of different ways to do this, and more often than not I fall back on Excel (How would you do it?)  I can’t open a Word file in Excel so as a first step, I saved the Word file as a straight text file and then opened it in Excel.

This dialogue will allow me start parsing the text into more useful columns.  The file is not well aligned with columns so I chose to use Delimited > Next:

With some judicious use of delimeters (semicolons, commas and open parentheses) I’ve got a good first cut.  I accepted the defaults on the next screen, hit Finish and here’s what I got:

Not a bad start but I’ve still got some mucking around to do.  I’m aiming to get all the latitudes and longitudes into their own column – there are 45 rows in this file so I’ll bite the bullet and do some manual rearranging, being careful not to lose any data:

Notice that I’ve added column names – we’ll need those as field names when we add the file to Pro.  I know I’m going to be creating some new lat/long columns along the way so I add a “1” to these column names.  Next I’ll remove the “S” from the latitudes and make the number negative using a formula with the LEFT function.  This function lets me extract a set number of characters starting from the left of the first one:

 C2 is the cell that has my South enabled latitude.  Most of my coordinates have seven characters (including the “.”) multiplying this by -1 gives me the right result:

That’s half the battle – now to my longitudes.  The LEFT function works well here too:

I’m mostly done though there was a bit of jiggery pokery to be done with a few coordinates that marked the beginnings and ends of transects and tidying up the comments, so I copied the relevant columns to a new sheet and saved the Excel file:

(The ones highlighted in green are the transect points.)  In Pro, I can now add that sheet to a map and use Display X Y Data to get the points plotted:

Job done!  To wrap this up, here’s the figure that’s most likely to go in the paper for this study, coming to a reputable journal near you:

As many of you GIS users out there will know, much of “doing GIS” means prepping your data before the glory work of analysis and mapping – and this was a good example of the work that often needs to go into the raw data that someone sends you and getting them on the map.  I don’t feel as downtrodden as Van Gogh portrayed above after this…but I’ll certainly encourage the next expedition to use the bloomin’ app!

Oh wait, what’s this I see in my inbox?  It’s a note from Milen with a Word doc of additional dragonfly observations…

C

• 24/02/2021


Previous Post

Next Post

Comments

  1. James Brasington 24/02/2021 - 4:55 pm Reply

    Nice work Crile. If I had a dollar for everything I’ve been asked solve issues like this, I’d be a rich man!

  2. Jon Sullivan 25/02/2021 - 12:38 pm Reply

    I’d write a quick regular expressions script to convert the text of that Word file to a CSV spreadsheet. It’s worth looking into. It’s saved me a massive amount of time wrangling these kinds of data sets, and once you’ve written it, you can rerun it on new data as they come in.,

    Here’s a quick example.

    Source:
    1. Lake Kikila Hahake, (13.2947S, 176.1889W; 14 a.s.l.): 29 February, 01, 06 March, Wallis 3

    Regular expression search:
    ^[0-9]+\. (.+)\(([0-9\.]+)S, ([0-9\.]+)W; ([0-9]+) a\.s\.l\.\): (.+)$

    Regular expression replace:
    “\1″,”-\2″,”-\3″,”\4″,”\5″

    Output CSV:
    “Lake Kikila Hahake, “,”-13.2947″,”-176.1889″,”14″,”29 February, 01, 06 March, Wallis 3″

    That would work on as many rows as you gave it. I’d add a second step to extract out any comments from the end of the place names in the first column, and separate off the dates in the second column and parcel them out as separate rows.

    Text editors do regular expressions (eg Microsoft’s Visual Studio) and R does it with its commands sub, gsub, grep, and grepl.

Leave a Reply

Your email address will not be published / Required fields are marked *