{"id":2686,"date":"2020-09-24T20:19:42","date_gmt":"2020-09-24T08:19:42","guid":{"rendered":"http:\/\/blogs.lincoln.ac.nz\/gis\/?p=2686"},"modified":"2023-05-07T03:13:29","modified_gmt":"2023-05-07T03:13:29","slug":"deleting-smokers-part-2","status":"publish","type":"post","link":"https:\/\/blogs.lincoln.ac.nz\/gis\/deleting-smokers-part-2\/","title":{"rendered":"Deleting Smokers &#8211; Part 2"},"content":{"rendered":"<p><em>Part 2 of this series looks at refining our script pseudo-code to get input data prepped for further analysis<\/em><\/p>\n<p>Way back when the dinosaurs roamed the earth,\u00a0we saw the <a href=\"http:\/\/blogs.lincoln.ac.nz\/gis\/deleting-smokers-part-1\/\">first post<\/a> in this series on developing a Python script for some analysis.\u00a0 With Python being the topic of the moment in the current GIS course, I&#8217;ve been encouraged to pick up where I left off and continue developing this script.\u00a0 You might recall that the aim was to take a collection of points (here being tobacco retailers) and test out the effect of reducing the number of tobacco retailers, that is, of limiting availability.\u00a0 This would mean systematically removing retailer points and then evaluating average additional costs to consumers.\u00a0 This is not a straightforward thing to do at all, not least because we needed to have some objective way of removing retailers.\u00a0 We decided we might look at imposing a minimum distance between retailers, such as 500 m, and then using that criterion as a way to remove them.\u00a0 That&#8217;s the theory behind this and what I&#8217;ve been trying to figure out is how I can set the data up to do this.\u00a0 I&#8217;ll reinsert a figure used in the previous post to illustrate:<\/p>\n<figure id=\"attachment_2392\" aria-describedby=\"caption-attachment-2392\" style=\"width: 809px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0091743515000274\"><img loading=\"lazy\" 
decoding=\"async\" class=\"wp-image-2392 size-full\" src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2019\/05\/Fig1Myers.png\" alt=\"\" width=\"809\" height=\"815\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2019\/05\/Fig1Myers.png 809w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2019\/05\/Fig1Myers-298x300.png 298w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2019\/05\/Fig1Myers-150x150.png 150w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2019\/05\/Fig1Myers-768x774.png 768w\" sizes=\"auto, (max-width: 809px) 100vw, 809px\" \/><\/a><figcaption id=\"caption-attachment-2392\" class=\"wp-caption-text\"><em>https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0091743515000274<\/em><\/figcaption><\/figure>\n<p>The key here is being able to set up what the authors refer to as a &#8220;proximity relationship&#8221;, meaning one relationship between two points and how far apart they are.\u00a0 A script would randomly select a proximity relationship and delete one of the two points (again, at random) and then continue iterating through all the points again until there were no points closer than a certain distance.\u00a0 Unfortunately, there&#8217;s no off-the-shelf tool that I&#8217;m aware of that will do this, so I&#8217;ve got to come up with some systematic way of doing this.<\/p>\n<p>Initially, there were two main tools that came to mind.<\/p>\n<ul>\n<li><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/tool-reference\/analysis\/near.htm\" target=\"_blank\" rel=\"noopener noreferrer\">Near<\/a> (and its tabular cousin <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/tool-reference\/analysis\/generate-near-table.htm\" target=\"_blank\" rel=\"noopener noreferrer\">Generate Near Table<\/a>)<\/li>\n<li><a href=\"https:\/\/desktop.arcgis.com\/en\/arcmap\/latest\/extensions\/network-analyst\/od-cost-matrix.htm\" target=\"_blank\" 
rel=\"noopener noreferrer\">OD Cost Matrix<\/a><\/li>\n<\/ul>\n<p>Near does a fairly straightforward thing &#8211; take a vector layer and find the nearest feature in another layer.\u00a0 Usually, there are two separate layers, such as water wells and roads.\u00a0 In my case, I&#8217;m wanting to find the distances between features in the same layer.\u00a0 When this tool is run, new attributes get added to the input layer that add the ID of the nearest feature and its straight line distance away (there are other options as well, like bearing and angle).\u00a0 Generate Near Table does the same thing but puts the results into a table rather than a layer.<\/p>\n<p>To give myself something to work with, I clipped out a series of points from the national layer of tobacco retailers, keeping it close to home so that I could more easily pick up any obvious locational errors &#8211; here&#8217;s my sandbox &#8211; 329 points:<\/p>\n<p><a href=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sandbox.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2687\" src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sandbox.jpg\" alt=\"\" width=\"1412\" height=\"854\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sandbox.jpg 1412w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sandbox-300x181.jpg 300w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sandbox-1024x619.jpg 1024w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sandbox-768x464.jpg 768w\" sizes=\"auto, (max-width: 1412px) 100vw, 1412px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/Near.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-2689 alignright\" 
src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/Near.jpg\" alt=\"\" width=\"319\" height=\"552\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/Near.jpg 428w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/Near-174x300.jpg 174w\" sizes=\"auto, (max-width: 319px) 100vw, 319px\" \/><\/a>The setup for the Near tool is at right &#8211; notice that the Input Features and the Near Features are the same layer.\u00a0 This allows me to find the distance to the nearest retailer for each input point.<\/p>\n<p>In the output, the useful attributes are NEAR_FID (the object ID of the closest retailer) and NEAR_DIST (the distance to that retailer).\u00a0 When running this tool I set a search radius of 500 m, meaning that the searching stopped at that distance.\u00a0 That&#8217;s why there&#8217;s a -1 in record 5 &#8211; there were no retailers within 500 m of that point.<\/p>\n<p>So this <em>sort of<\/em> gives me the proximity relationship between any two points &#8211; <em>but<\/em>, I&#8217;ve got two instances of each relationship.\u00a0 For instance, I&#8217;ve got an entry (record 1) that shows the closest retailer to OBJECTID 1 is FID 274.\u00a0 Further down the list, there will be its reciprocal entry: OBJECTID 274 and <em>its<\/em> nearest retailer, FID 1.\u00a0 This gets me part of the way there but will still require an extra bit of work, as I only want one proximity relationship per pair.<\/p>\n<p>The same holds for the table that comes out of Generate Near Table.<\/p>\n<p>I next looked at OD Cost Matrices.\u00a0 The &#8220;OD&#8221; stands for origin-destination.\u00a0 These layers come to us as part of Network Analysis, where we&#8217;re using linear, connected networks (such as roads, or fiber optic cables, or sewer pipes) and doing analysis based on how close features are along the network (think Google Maps and finding directions from Point A to Point 
B).\u00a0 Instead of using as-the-crow-flies distances, we&#8217;re using distances along, in this case, the road network.\u00a0 The nice thing about OD matrices is they explicitly calculate distances between points that are set up as origins and destinations.<\/p>\n<figure id=\"attachment_2690\" aria-describedby=\"caption-attachment-2690\" style=\"width: 837px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODFigure.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2690 size-full\" src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODFigure.jpg\" alt=\"\" width=\"837\" height=\"534\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODFigure.jpg 837w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODFigure-300x191.jpg 300w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODFigure-768x490.jpg 768w\" sizes=\"auto, (max-width: 837px) 100vw, 837px\" \/><\/a><figcaption id=\"caption-attachment-2690\" class=\"wp-caption-text\"><a href=\"https:\/\/desktop.arcgis.com\/en\/arcmap\/latest\/extensions\/network-analyst\/od-cost-matrix.htm\" target=\"_blank\" rel=\"noopener noreferrer\"><em>https:\/\/desktop.arcgis.com\/en\/arcmap\/latest\/extensions\/network-analyst\/od-cost-matrix.htm<\/em><\/a><\/figcaption><\/figure>\n<p>For quick display the connecting lines are shown as straight but the calculated distances in the output are along the network.<\/p>\n<p>For this tool to work, I need to have a road network that has been set up to do network analysis &#8211; luckily, there&#8217;s one in J:\\Data\\NetworkAnalysisData called RoadsNA.\u00a0 Once added to the map, I can set up this tool (Analysis tab &gt; Network Analysis) by importing\u00a0Origins and Destinations (the same layer in this case) and then setting a few parameters:<\/p>\n<p><a 
href=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODSetup.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2691\" src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODSetup.jpg\" alt=\"\" width=\"1380\" height=\"235\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODSetup.jpg 1380w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODSetup-300x51.jpg 300w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODSetup-1024x174.jpg 1024w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODSetup-768x131.jpg 768w\" sizes=\"auto, (max-width: 1380px) 100vw, 1380px\" \/><\/a><\/p>\n<p>I&#8217;ve set the mode to &#8220;Driving Distance&#8221; so it&#8217;s distance based rather than time based and have limited the number of destinations to 4 (just to keep things manageable).\u00a0 Here&#8217;s my result:<\/p>\n<p><a href=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODResults.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2692\" src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODResults.jpg\" alt=\"\" width=\"1160\" height=\"919\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODResults.jpg 1160w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODResults-300x238.jpg 300w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODResults-1024x811.jpg 1024w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/ODResults-768x608.jpg 768w\" sizes=\"auto, (max-width: 1160px) 100vw, 1160px\" \/><\/a><\/p>\n<p>Each of the purple lines joins two of the retailer points.\u00a0 In the table you can also see that each input point has four records 
&#8211; one for each of the four destinations set in the tool, ranked from 1 (closest) to 4 (furthest away).\u00a0 In the first record, see how it&#8217;s found the distance to itself (scrolling to the right you would see the distance to be 0)?\u00a0 FID 274 is the closest, followed by 56 and then 277.\u00a0 This is arguably a more realistic output as it&#8217;s a real-world network distance.<\/p>\n<p>But this layer has proven problematic to work with, especially when trying to deal with removing each pair&#8217;s reciprocal.\u00a0 If I sort these by distance, it becomes easy to see that the reciprocals line up one after the other:<\/p>\n<p><a href=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sorted.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2693\" src=\"https:\/\/d-blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sorted.jpg\" alt=\"\" width=\"1152\" height=\"587\" srcset=\"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sorted.jpg 1152w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sorted-300x153.jpg 300w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sorted-1024x522.jpg 1024w, https:\/\/blogs.lincoln.ac.nz\/gis\/wp-content\/uploads\/sites\/3\/2020\/09\/sorted-768x391.jpg 768w\" sizes=\"auto, (max-width: 1152px) 100vw, 1152px\" \/><\/a><\/p>\n<p>Ah, if life were simple enough that all I need to do is delete every other record, I&#8217;d be a happy chappy.\u00a0 Nor is it quite so simple to just delete everything other than rank 2.\u00a0 On closer inspection, there are some inherent problems, mainly caused by this layer ranking the points.\u00a0 It&#8217;s a difficult one to fully explain; suffice it to say that while retailer B may be the closest one to retailer A, B may have another retailer, C, that is closer to it than A is, so the rankings don&#8217;t mirror each other.\u00a0 (<em>Ed., clear as mud, thank 
you.<\/em>)\u00a0 This is important, because with this dataset I&#8217;ve only got 329 points &#8211; worst case is I eyeball things and manually delete the right ones.\u00a0 But this is my sandbox dataset &#8211; the full dataset has close to 6,000 retailers and I DO NOT WANT TO HAVE TO GO THROUGH ALL THOSE ONE BY ONE.\u00a0 I&#8217;ll go insane (well, <em>more<\/em> insane).\u00a0 There must be some way to automate this.<\/p>\n<p>So, where does this leave us?\u00a0 I&#8217;ve got three outputs that get me part of the way to where I need to be.\u00a0 But a bit more work is required.\u00a0 To be thorough, I decided that I would be best off writing a script that will do this next bit for me.\u00a0 The biggest issue appears to be how to remove a retailer pair&#8217;s reciprocal.\u00a0 I have found that one way to sort through stuff like this is LSD.\u00a0 Ha &#8211; not the LSD you <em>might<\/em> be thinking of, no, Long Slow Distance runs.\u00a0 I can&#8217;t tell you how many difficult problems I have found solutions for with LSD, often within the first few minutes of the trip, er run.\u00a0 Anyway, on today&#8217;s run I worked out in my head the rough pseudo-code I would need to do this.<\/p>\n<p>Thinking about the OD Matrix results, with OriginID being the point in question and DestinationID\u00a0being the ID of the nearest retailer, the proximity relationship is between those two values.\u00a0 If I swap them, those are the values for the reciprocal pair.\u00a0 So pseudo-code would look like this:<\/p>\n<p>For each record in the OD Matrix results table,<\/p>\n<ul>\n<li>Get the attribute values for OriginID\u00a0and\u00a0DestinationID<\/li>\n<li>Use a Select by Attribute to find the record whose OriginID equals this record&#8217;s DestinationID AND whose DestinationID equals this record&#8217;s OriginID.<\/li>\n<li>Delete that record.<\/li>\n<\/ul>\n<p>That&#8217;s confusing, I know.\u00a0 The Select by Attribute finds the record where the OriginID and the DestinationID\u00a0have been swapped &#8211; the 
reciprocal of the first proximity relationship.\u00a0 By deleting it I am left with only one of those to work with.<\/p>\n<p><span style=\"font-size: 14pt\"><strong>Pshew!<\/strong><\/span><\/p>\n<p>Still awake?<\/p>\n<p>So far, this has all been about prepping my data for the real show &#8211; the bigger picture of deleting retailers so that there are none within 500 m of each other.\u00a0 That will be a bigger, more involved script which I still need to think more about.\u00a0 But this is a good start.<\/p>\n<p>Many people use Python to automate tasks, things that you do over and over again, as a way of saving time.\u00a0 In this context, we&#8217;re using Python as the analysis tool, to make GIS do the things it wasn&#8217;t really designed to do, make it jump through some spatial hoops.\u00a0 It can be fun; it can also be frustrating.\u00a0 But it does help to keep the Alzheimer&#8217;s at bay.<\/p>\n<p>In subsequent posts, we&#8217;ll delve deeper into this task.\u00a0 Hope you&#8217;ll come along for the journey.<\/p>\n<p>C<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 2 of this series looks at refining our script pseudo-code to get input data prepped for further analysis Way back when the dinosaurs roamed the earth,\u00a0we saw the first post in this series on developing a Python script for some analysis.\u00a0 With Python being the topic of the moment in the current GIS course, 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2686","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/posts\/2686","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/comments?post=2686"}],"version-history":[{"count":1,"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/posts\/2686\/revisions"}],"predecessor-version":[{"id":4094,"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/posts\/2686\/revisions\/4094"}],"wp:attachment":[{"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/media?parent=2686"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/categories?post=2686"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.lincoln.ac.nz\/gis\/wp-json\/wp\/v2\/tags?post=2686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}