<!DOCTYPE html>
<html lang="en"><head><title>Dr. StrangeData; or, How I Learned to Stop Worrying and Love the_geom</title><meta charset="utf-8"><meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"><meta content="width=device-width, initial-scale=1.0, user-scalable=no" name="viewport"><link href="https://fonts.googleapis.com/css?family=Lato" rel="stylesheet" type="text/css"><link href="foss4g_oceania_2018/trailmarker-strangelove/trailmarker-strangelove.css" rel="stylesheet" type="text/css"><link href="foss4g_oceania_2018/css/reveal.css" rel="stylesheet" type="text/css"><link href="foss4g_oceania_2018/css/night.min.css" rel="stylesheet" type="text/css"><link href="foss4g_oceania_2018/css/leaflet.css" rel="stylesheet" type="text/css"><link href="foss4g_oceania_2018/stylesheet.css" rel="stylesheet" type="text/css"><script src="foss4g_oceania_2018/js/jquery-3.1.1.slim.min.js" type="text/javascript"></script></head><body><div class="reveal"><div class="slides"><section class="centred-slide"><img class="slideshow-background" src="foss4g_oceania_2018/img/columbia_intro.png"><h2 class="width-warning hidden">Enter at your own risk: this slide deck needs roughly 1000px of screen width to be fully navigable</h2></section><section class="centred-slide" data-state="title-slide"><h1 class="strangelove-title">Dr. 
StrangeData, Or:</h1><br><h1 class="strangelove-title">How I Learned To Stop</h1><br><h1 class="strangelove-title">Worrying And Love</h1><br><h1 class="strangelove-title huge">the_geom</h1></section><section class="centred-slide light-scheme"><img class="slideshow-background" src="foss4g_oceania_2018/img/spatial_galaxy_meme.png"></section><section class="centred-slide" data-state="title-slide"><h1 class="strangelove-title">Designing</h1><br><h1 class="strangelove-title">Spatial</h1><br><h1 class="strangelove-title">Applications</h1><br><h1 class="strangelove-title huge">is (fairly) hard</h1></section><section class="light-scheme"><h1>Performance in spatial systems</h1><div><p>A non-exhaustive list of historical big bangs in GIS that were invented to alleviate performance issues</p><ul><li>Spatial indexes</li><li>Image overviews or "raster pyramids"</li><li>Pre-rendered cartographic raster tile caches</li><li>Vector tiles</li></ul><p>Fixing our problems has always been about Big Ideas, but are our designs too reliant on them?</p></div></section><section class="light-scheme"><h1>Guarantees and upper bounds</h1><img src="foss4g_oceania_2018/img/reactive_systems.png" style="width: 80%;"><div><p>(<em>excerpted from The Reactive Manifesto</em>)</p><p>Almost all sufficiently complex spatial applications remain vulnerable to poor, unresponsive performance.</p></div></section><section class="light-scheme"><h1>What's good about guarantees</h1><div><p></p><table> <thead> <tr> <th></th> <th>Guarantees</th> <th>Big Ideas</th> </tr> </thead> <tbody> <tr> <td>User experience</td> <td>The <em>worst</em> experience is defined</td> <td>The <em>average</em> experience is improved</td> </tr> <tr> <td>What do we know?</td> <td>The worst case for a system is bounded by the worst cases of its components</td> <td>We don't readily know how components' good and bad behaviour will combine</td> </tr> <tr> <td>Design decisions</td> <td>Can be made with confidence for the worst 
case</td> <td>Can only be made with some risk</td> </tr> </tbody> </table><p></p></div></section><section class="light-scheme"><h1>Looking at a basic spatial operation</h1><div><ul><li>A commonly implemented function: the user defines an Area of Interest in relation to a set of configured layers</li><li>The geographical intersection of each layer with the Area of Interest is computed and extracted or visualised</li></ul></div><img src="foss4g_oceania_2018/img/martinez_example.png"></section><section class="light-scheme"><h1>How can we reason about this feature?</h1><div><ul><li>What does openness buy us in our software foundations?</li><li>eg in <a href="https://github.com/Turfjs/">Turfjs</a> the algorithm used is the <a href="https://github.com/w8r/martinez">Martinez-Rueda polygon clipping algorithm</a></li><li>The complexity of the Martinez algorithm is <span style="font-style: italic; font-size: 120%">O((n+k)*log(n))</span>, where:<ul><li><em>n</em> is the number of <strong>all edges</strong> in the interacting polygons</li><li><em>k</em> is the number of <strong>all intersections</strong> between polygon edges<br></li></ul></li><li>Even with no edge intersections, O(n*log(n)) already grows faster than the input itself</li><li>The cost depends on both the size and the <em>spatial relations</em> of the input data!<br></li><li>It turns out that useful guarantees are hard to obtain here</li></ul></div></section><section class="light-scheme"><h1>Measuring input data size</h1><p style="display: flex;"><b>PostGIS</b></p><br><p style="margin-left: 3em;font-family: monospace">SELECT ST_NPoints(geom) FROM my_layer;</p><br><p style="display: flex;"><b>Turfjs</b></p><br><p style="margin-left: 3em;font-family: monospace">const line = turf.lineString([[-83, 30], [-84, 36], [-78, 41]])</p><p style="margin-left: 3em;font-family: monospace">console.log(line.geometry.coordinates.length)</p></section><section class="light-scheme"><h1>Measuring spatial relations</h1><div><ul><li>Patterns of relation between geographical <em>record types</em> are often un-recorded 
relations of the <em>real things</em> they are recording<br><br></li><li>We don't have a straightforward technical practice that captures "in the suburbs, roads and houses go together"<ul><li>ideas like spatial correlation, proximity analysis, and similarity measures</li><li>ad hoc measures, eg "mean envelope per feature in a layer"<br><br></li></ul></li><li><strong>However</strong>: these patterns make a very big difference to the risk and variability of a spatial operation's cost</li></ul></div></section><section class="light-scheme"><h1>Ways to estimate job size?</h1><div><ul><li>Create a <strong>heuristic function</strong> that combines known metrics on the inputs to the job<ul><li>eg sum the vertex counts of two inputs</li></ul></li><li>Use a <strong>simplified representation</strong> of application data to make a "dumbed down" pre-estimate of the cost of a full process<ul><li>eg run an intersection against a <em>square covering</em> of the application's reference layers, instead of the layers themselves</li></ul></li><li>Estimates can then trigger application logic</li></ul></div></section><section class="light-scheme"><h1>Mitigation - constraints</h1><div><ul><li>Simplify input data through application constraints<ul><li>eg the Area of Interest can only be a bounding box</li><li>eg the Area of Interest has a maximum total area<br><br></li></ul></li><li>Reduce the size of input data<ul><li>Curate simplified variants of reference layers</li></ul></li></ul></div></section><section class="light-scheme"><h1>Mitigation - regulating incoming data</h1><div><ul><li>Prevent complex data (especially low-value data) from entering the system</li><li>Very important for systems that offer to <em>ingest</em> user data<ul><li><span style="color: red;">We're sorry, the maximum number of drill points you can upload is 1000</span></li></ul></li><li>Proactively report or alert when time and size limits are exceeded<ul><li>Record any related interactive user input 
and use it in testing</li></ul></li><li><strong>Question</strong>: how do we reason about global limits, given the metrics of a new record?</li></ul></div></section><section class="light-scheme"><h1>Mitigation - concealment</h1><div><ul><li>Run processes asynchronously, or in the background<ul><li><strong>But only when</strong> their estimated complexity exceeds a configured threshold</li><li>Many applications have specific functions (printing, reporting) that run through a tasking environment regardless of performance</li></ul></li></ul></div></section><section class="light-scheme"><h1>Exploration - synthesise random data</h1><div><ul><li>An area poorly served by current libraries and frameworks</li><li>Turfjs provides <code>@turf/random</code>, which can generate random polygons, linestrings and points within a bbox<ul><li>but it has limits, eg it only generates "simple" polygons using a radial method</li></ul></li><li>Generic PostgreSQL data-generation tools do not provide useful support for geometry</li><li>There is no mainstream tool to generate data that's "similar" to a provided dataset, or data that expresses eg the spatial relatedness of two or more layers</li></ul></div></section><section data-state="map-slide"><div class="slideshow-map"></div><p class="raise-z-index">Current output from @turf/random</p><p class="raise-z-index">Zero resemblance to real-world data</p></section><section class="light-scheme"><h1>Where to from here?</h1><div><ul><li>We still don't build many solid, consistent spatial apps</li><li>Measurement, defence and common sense need to be applied:<ul><li>to reference data</li><li>to user-supplied data</li></ul></li><li>An emphasis on Big Ideas hides some of the underlying patterns of the things being represented</li><li>FOSS4G would benefit from flexible geometry synthesisers<ul><li>More realistic test cases</li><li>Property-based testing frameworks</li></ul></li></ul></div></section><section class="light-scheme"><h1>Thanks and questions</h1><div><ul><li>Tom Lynch <a 
href="mailto:tom@trailmarker.io">tom@trailmarker.io</a></li><li>Twitter <a href="https://twitter.com/trmarker">@trmarker</a></li><li>Slides online <a href="https://trailmarker.io/foss4g_oceania_2018.html">here</a>: https://trailmarker.io/foss4g_oceania_2018.html</li></ul></div></section></div></div><script src="foss4g_oceania_2018/js/reveal.min.js" type="text/javascript"></script><script src="foss4g_oceania_2018/js/leaflet.js" type="text/javascript"></script><script src="foss4g_oceania_2018/script.js"></script><script src="https://www.googletagmanager.com/gtag/js?id=UA-129674450-1" type="text/javascript"></script><script>window.dataLayer=window.dataLayer||[];function gtag(){dataLayer.push(arguments);}gtag('js',new Date());gtag('config','UA-129674450-1');</script></body></html>