HBaseCon 2012 | Scaling GIS In Three Acts

  1. SCALING GIS IN 3 ACTS, presented by: Nick Dimiduk, May 22, 2012
  2. SCALING GIS IN 3 ACTS - Lightning Edition! - presented by: Nick Dimiduk, May 22, 2012
  3. HBase in Action. Manning Press, Fall 2012. hbaseinaction.com. Discount code: 12hb10. © 2012 The Climate Corporation. All Rights Reserved. Policies are underwritten by State National Insurance Company, Inc. and administered by The Climate Insurance Agency LLC.
  4. Act I: What is GIS?
  5. GIS: Data (on maps!)
  6. HC SVNT DRAGONES. “Here are Dragons”. DRAGONES!!! Image: Psalter World Map, 1265 http://en.wikipedia.org/wiki/Here_be_dragons
  7. Act II: What to do with GIS?
  8. Geospatial Queries
  9. Non-Euclidean Geometry. “Know thy surface”. Image: Trigonometry on a Sphere http://en.wikipedia.org/wiki/Non-Euclidean_geometry
  10. Act III: GIS on HBase
  11. The devil is in the Indices. Image: Six iterations of the Hilbert curve http://en.wikipedia.org/wiki/Space-filling_curve
  12. Spatial Partitioning. Image: USA night lights http://www.noaanews.noaa.gov/stories/s2015.htm
  13. The devil is in the (Spatial) Indices. Image: German zipcodes, R*Tree http://en.wikipedia.org/wiki/R*_tree
  14. Thank you!
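
Slide 9's warning (“Know thy surface”) can be illustrated with a small sketch. It assumes a perfectly spherical Earth with a mean radius of 6371 km, which is an idealization and not something the slides specify; the name `haversine_km` is illustrative only. The point is that one degree of longitude spans a different ground distance at different latitudes, so angular distance cannot be treated as linear distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the real Earth is not a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude, measured at the equator and at 60 degrees north.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)
print(f"1 degree of longitude at the equator: {at_equator:.1f} km")
print(f"1 degree of longitude at 60N:         {at_60_north:.1f} km")
```

Under the spherical assumption, the same one-degree step covers roughly half the distance at 60 degrees north that it covers at the equator.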

Editor's Notes

  1. Introductions. Who am I: Nick Dimiduk, Data Platform team. What I do: help growers manage risk. Sell insurance. 3 bad years = lose your farm. “What will the weather be like this spring in Jasper County, IL?” “How many consecutive days above 74 degs?” “How similar is the weather in Sioux Falls, SD vs. Dayton, OH?” The data: *get data stats from zimmer*
  2. (Same notes as slide 1.)
  3. On the side. Coauthor: Amandeep Khurana, Cloudera. In print this fall. “Generous” discount code.
  4. USGS (US Geological Survey) has a boring definition: “Software system capable of capturing, storing, analyzing, displaying geographically referenced information”.
  5. TN River Gorge, Chattanooga. Maps + Data => actionable insight from data. That happens to make really pretty pictures. Different views for different people. Really not all that different from this whole “Big Data” thing. Map = data. Base layers: terrain information: data; river and lake boundaries: data; cities, roads: data. Really just data + data in a picture (interactive?). Quick poll: “How many of you make pretty pictures in your day-to-day data activities?”
  6. Not all fun and games. Most GIS built by geographers… for bureaucrats. No software engineering experience or motivation. “State of the art” is ArcGIS. This is not a modern technical landscape. The world isn’t flat. Mostly 3D information. Which, btw, often changes over time (4D). Reduced to 2D for everyone’s convenience. Stored in a 1D world (disk platters read linearly). “Here are Dragons”: Hunt-Lenox Globe, 1503-07. Unknown areas, stories of lions and dragons.
  7. What does Climate do with geo data? Historically: storing, analyzing. Now and future: capturing, displaying.
  8. Queries against spatial data. Geometry/Geography as first-class citizens. Intersperse spatial queries with other attributes; it’s all just data, remember? Geometric queries: example of an “intersection” query. Also: containment, overlap. Describe intersection between two geometries: “Dimensionally Extended 9-intersection model (DE-9IM)”. Nearest neighbor. No linear measurements (miles, kilometers) involved! GIS visualizing query results from early prototype.
  9. Measuring units becomes tricky. Angular distance is not linear distance. Know that old joke about physicists? “Assume a spherical horse”. Earth is an irregular sphere. Approximated into ideal. Using planar (2D) coordinates. Coordinate Reference Systems. 180 deg does not a triangle make. Validate your assumptions.
  10. Or “Horizontal scaling of geospatial systems.” In the cloud. What’s the data? Vector data: 1.5B features (geometry + metadata). Raster data: whole US at 10m resolution. Reference into 30-100+ years’ worth of historical time-series. All on AWS.
  11. Performant access to data requires indexing. Linearization via space-filling curve: index 2D data in a single dimension; preserve locality as much as possible. Z curve, Hilbert curve, etc. Geohashing. Far from perfect; edge cases still hurt. Works okay for points but not for arbitrary geometries.
  12. Horizontal scaling requires partitioning. Many (2) ways to slice it (boo). Domain-partitioning: cut the world into chunks. Lon, lat are a fixed domain; simple to split evenly (hemispheres). Poor distribution of work. Range-partitioning: split according to the values you have. Scale according to data density. Effective partitioning requires knowledge of your data. Or a specialized data structure (foreshadowing).
  13. Many (2) ways to slice it (boo). Domain-partitioning: cut the world into chunks. Lon, lat are a fixed domain; simple to split evenly (hemispheres). Poor distribution of work. Range-partitioning: split according to the values you have. Scale according to data density.
  14. Performant access to data requires indexing. Dimensionally aware indices: KD-Trees work great for point data (nearest neighbors); R-Tree variants for arbitrary geometries, but costly to construct. Uniform partitions => uniform trees => uniform access performance. Two approaches to scaling: (1) 2-layer indices. 1st layer: coarse-grained partition. 2nd layer: specialized index. This is MD-HBase. Easier to implement. Potentially miss geometries => incomplete results!!! (2) Persisted spatial indices. Implement persisted R-Tree. Custom regions via RegionSplitPolicy (0.94+). Should be more correct… “there are dragons”.
  15. Questions?
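
Notes 11 and 12 above can be sketched together: linearize (lon, lat) points with a Z-order curve, then compare domain partitioning with range partitioning. This is a minimal illustration and not the speaker's implementation. The 16-bit quantization, the function names (`quantize`, `z_order_key`, `domain_splits`, `range_splits`, `partition_sizes`) and the synthetic clustered points are all assumptions made for the example.

```python
import bisect
import random

BITS = 16  # assumed bits per dimension; real geohash-style schemes choose their own precision


def quantize(value, lo, hi, bits=BITS):
    """Map a value in [lo, hi] onto an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return min(cells, max(0, int((value - lo) / (hi - lo) * cells)))


def z_order_key(lon, lat, bits=BITS):
    """Interleave the bits of quantized lon and lat into one integer (Z-order / Morton code)."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i + 1)  # lon bits land on odd positions
        key |= ((y >> i) & 1) << (2 * i)      # lat bits land on even positions
    return key


def domain_splits(n, bits=BITS):
    """Domain partitioning: cut the whole key space into n equal-width ranges."""
    key_space = 1 << (2 * bits)
    return [key_space * i // n for i in range(1, n)]


def range_splits(sorted_keys, n):
    """Range partitioning: pick split points from the data so each range holds ~equal counts."""
    return [sorted_keys[len(sorted_keys) * i // n] for i in range(1, n)]


def partition_sizes(sorted_keys, splits):
    """Count how many keys fall into each partition defined by the split points."""
    bounds = [0] + [bisect.bisect_left(sorted_keys, s) for s in splits] + [len(sorted_keys)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Synthetic, clustered points: everything sits in one small lon/lat box (hypothetical data).
rng = random.Random(42)
points = [(rng.uniform(-100.0, -80.0), rng.uniform(30.0, 45.0)) for _ in range(1000)]
keys = sorted(z_order_key(lon, lat) for lon, lat in points)

domain_sizes = partition_sizes(keys, domain_splits(4))
range_sizes = partition_sizes(keys, range_splits(keys, 4))
print("domain partitioning:", domain_sizes)
print("range partitioning: ", range_sizes)
```

With this clustered data, every point lands in a single one of the four domain partitions, while range partitioning yields four roughly equal groups. That is the “poor distribution of work” versus “scale according to data density” trade-off from note 12, at the cost of needing to know the data before choosing split points.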