HBaseCon 2012 | Scaling GIS In Three Acts
 


Scaling geospatial data is hard. State-of-the-art GIS technologies available to the general public are locked in the realm of relational databases, with PostGIS as the prominent leader. Though a number of location-based startups have walked this path before, few have marked their trail along the way. Act one provides a survey of the landscape, defining terms and highlighting pitfalls. Act two explores the world of open source, horizontally scalable GIS and outlines the problems these systems solve. Act three explores implementations backed by HBase. No previous GIS knowledge is required.

  • Introductions
    Who am I: Nick Dimiduk, Data Platform team.
    What I do: help growers manage risk. Sell insurance. 3 bad years = lose your farm.
    “What will the weather be like this spring in Jasper County, IL?”
    “How many consecutive days above 74 degrees?”
    “How similar is the weather in Sioux Falls, SD vs. Dayton, OH?”
    The data: *get data stats from zimmer*
  • On the side
    Coauthor: Amandeep Khurana, Cloudera.
    In print this fall.
    “Generous” discount code.
  • USGS (US Geological Survey) has a boring definition: “Software system capable of capturing, storing, analyzing, displaying geographically referenced information”
  • TN River Gorge, Chattanooga
    Maps + data: actionable insight from data.
    That happens to make really pretty pictures.
    Different views for different people.
    Really not all that different from this whole “Big Data” thing.
    Map = data. Base layers:
      Terrain information: data
      River and lake boundaries: data
      Cities, roads: data
    Really just data + data in a picture (interactive?)
    Quick poll: “How many of you make pretty pictures in your day-to-day data activities?”
  • Not all fun and games
    Most GIS built by geographers… for bureaucrats.
    No software engineering experience or motivation.
    “State of the art” is ArcGIS; this is not a modern technical landscape.
    The world isn’t flat:
      Mostly 3D information,
      which, btw, often changes over time (4D),
      reduced to 2D for everyone’s convenience,
      stored in a 1D world (disk platters read linearly).
    “Here are Dragons”: Hunt-Lenox Globe, 1503-07. Unknown areas, stories of lions and dragons.
  • What does Climate do with geo data?
    Historically: storing, analyzing.
    Now and future: capturing, displaying.
  • Queries against spatial data
    Geometry/Geography as first-class citizens.
    Intersperse spatial queries with other attributes; it’s all just data, remember?
    Geometric queries:
      Example of an “intersection” query. Also: containment, overlap.
      Describe intersection between two geometries: “Dimensionally Extended 9-intersection model (DE-9IM)”.
      Nearest neighbor.
    No linear measurements (miles, kilometers) involved!
    GIS visualizing query results from early prototype.
  • Measurement units become tricky
    Angular distance is not linear distance.
    Know that old joke about physicists? “Assume a spherical horse.”
    Earth is an irregular sphere, approximated into an ideal, using planar (2D) coordinates.
    Coordinate Reference Systems.
    180 deg does not a triangle make. Validate your assumptions.
  • Or “Horizontal scaling of geospatial systems.” In the cloud.
    What’s the data?
      Vector data: 1.5B features (geometry + metadata)
      Raster data: whole US at 10m resolution
      Reference into 30-100+ years’ worth of historical time-series
    All on AWS.
  • Performant access to data requires indexing
    Linearization via space-filling curve:
      Index 2D data in a single dimension.
      Preserve locality as much as possible.
      Z curve, Hilbert curve, etc.
    Geohashing.
    Far from perfect; edge cases still hurt.
    Works okay for points but not for arbitrary geometries.
  • Horizontal scaling requires partitioning
    Many (2) ways to slice it (boo):
      Domain-partitioning: cut the world into chunks. Lon, lat are a fixed domain. Simple to split evenly (hemispheres). Poor distribution of work.
      Range-partitioning: split according to the values you have. Scale according to data density.
    Effective partitioning requires knowledge of your data, or a specialized data structure (foreshadowing).
  • Performant access to data requires indexing
    Dimensionally aware indices:
      KD-Trees work great for point data (nearest neighbors).
      R-Tree variants for arbitrary geometries, but costly to construct.
      Uniform partitions => uniform trees => uniform access performance.
    Two approaches to scaling:
      2-layer indices. 1st layer: coarse-grained partition. 2nd layer: specialized index. This is MD-HBase. Easier to implement. Potentially miss geometries => incomplete results!!!
      Persisted spatial indices. Implement persisted R-Tree. Custom regions via RegionSplitPolicy (0.94+). Should be more correct… “there are dragons”.
  • Questions?
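
The note that angular distance is not linear distance can be made concrete with a small sketch. It assumes a perfectly spherical Earth with a mean radius of 6371.0088 km (exactly the "spherical horse" simplification the notes warn about) and uses the haversine formula. The function name and the constant are illustrative and do not come from the talk.

```python
import math

# Assumption: a spherical Earth with this mean radius. Real coordinate
# reference systems use an ellipsoid, so treat results as approximate.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same angular step (one degree of longitude) at two latitudes.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)
print(round(at_equator, 1), round(at_60_north, 1))
```

Under these assumptions, one degree of longitude covers roughly 111 km at the equator and about half that at 60° N, which is why a query expressed in degrees cannot be read as a query in miles or kilometers.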

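As a rough illustration of the linearization idea in the indexing notes, the following sketch maps a longitude/latitude pair to a Z-order (Morton) key by interleaving bits. The grid resolution of 16 bits per axis, the function names, and the 4-byte big-endian key encoding are assumptions made for this example; the talk does not specify a key layout.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y; x takes the even positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order_key(lon, lat, bits=16):
    """Quantize a lon/lat pair onto a 2^bits grid and return its Z-order value."""
    cells = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * cells)
    y = int((lat + 90.0) / 180.0 * cells)
    return interleave_bits(x, y, bits)


def as_sortable_bytes(key, bits=16):
    """Fixed-width big-endian bytes, so byte order matches numeric order."""
    return key.to_bytes((2 * bits + 7) // 8, "big")


# Edge case: grid cells x=3 and x=4 are neighbours on the same row,
# yet their keys are far apart on the curve.
left, right = interleave_bits(3, 0), interleave_bits(4, 0)
print(left, right)
```

Cells 3 and 4 sit side by side, but their keys (5 and 16) do not, which is one instance of the edge cases the notes say still hurt. A range scan over keys around one cell can miss a physical neighbor unless several key ranges are scanned.
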
HBaseCon 2012 | Scaling GIS In Three Acts Presentation Transcript

  • 1. SCALING GIS IN 3 ACTS presented by: Nick Dimiduk May 22, 2012
  • 2. SCALING GIS IN 3 ACTS - Lightning Edition! - presented by: Nick Dimiduk May 22, 2012
  • 3. HBase in Action Manning Press, Fall 2012 hbaseinaction.com Discount code: 12hb10 © 2012 The Climate Corporation. All Rights Reserved. Policies are underwritten by State National Insurance Company, Inc. and administered by The Climate Insurance Agency LLC.
  • 4. Act I: What is GIS?
  • 5. GIS: Data (on maps!)
  • 6. HC SVNT DRAGONES “Here are Dragons” DRAGONES!!! Image: Psalter World Map, 1265 http://en.wikipedia.org/wiki/Here_be_dragons
  • 7. Act II: What to do with GIS?
  • 8. Geospatial Queries
  • 9. Non-Euclidean Geometry “Know thy surface” Image: Trigonometry on a Sphere http://en.wikipedia.org/wiki/Non-Euclidean_geometry
  • 10. Act III: GIS on HBase
  • 11. The devil is in the Indices Image: Six iterations of the Hilbert curve http://en.wikipedia.org/wiki/Space-filling_curve
  • 12. Spatial Partitioning Image: USA night lights http://www.noaanews.noaa.gov/stories/s2015.htm
  • 13. The devil is in the (Spatial) Indices Image: German zipcodes, R*Tree http://en.wikipedia.org/wiki/R*_tree
  • 14. Thank you!
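
The contrast between domain-partitioning and range-partitioning on the "Spatial Partitioning" slide can be sketched with a toy dataset. The longitudes below are hypothetical (a dense cluster plus a few outliers, loosely mimicking the night-lights image), and the helper names are made up for this example; nothing here is an HBase API.

```python
from bisect import bisect_right


def domain_splits(lo, hi, parts):
    """Split the fixed domain [lo, hi] into equal-width chunks."""
    step = (hi - lo) / parts
    return [lo + step * i for i in range(1, parts)]


def range_splits(values, parts):
    """Choose split points from the data so each chunk holds a similar count."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // parts] for i in range(1, parts)]


def partition_counts(values, splits):
    """Count how many values fall into each partition defined by the splits."""
    counts = [0] * (len(splits) + 1)
    for v in values:
        counts[bisect_right(splits, v)] += 1
    return counts


# Hypothetical data: 96 points packed near longitude -90, 4 scattered outliers.
lons = [-90.0 + i * 0.01 for i in range(96)] + [-170.0, -10.0, 60.0, 150.0]

by_domain = partition_counts(lons, domain_splits(-180.0, 180.0, 4))
by_range = partition_counts(lons, range_splits(lons, 4))
print(by_domain, by_range)
```

With this data, equal-width chunks leave one partition holding almost everything, while split points taken from the data give each partition the same number of points. This is the sense in which effective partitioning requires knowledge of your data.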