-
1.
SCALING GIS IN 3 ACTS
presented by:
Nick Dimiduk
May 22, 2012
-
2.
SCALING GIS IN 3 ACTS
- Lightning Edition! -
presented by:
Nick Dimiduk
May 22, 2012
-
3.
HBase in Action
Manning Press, Fall 2012
hbaseinaction.com
Discount code: 12hb10
© 2012 The Climate Corporation. All Rights Reserved. Policies are underwritten by State National Insurance Company, Inc. and administered by The Climate Insurance Agency LLC.
-
4.
Act I: What is GIS?
-
5.
GIS: Data (on maps!)
-
6.
HC SVNT DRAGONES
“Here are Dragons”
DRAGONES!!!
Image: Psalter World Map, 1265
http://en.wikipedia.org/wiki/Here_be_dragons
-
7.
Act II: What to do with GIS?
-
8.
Geospatial Queries
-
9.
Non-Euclidean Geometry
“Know thy surface”
Image: Trigonometry on a Sphere
http://en.wikipedia.org/wiki/Non-Euclidean_geometry
-
10.
Act III: GIS on HBase
-
11.
The devil is in the Indices
Image: Six iterations of the Hilbert curve
http://en.wikipedia.org/wiki/Space-filling_curve
-
12.
Spatial Partitioning
Image: USA night lights
http://www.noaanews.noaa.gov/stories/s2015.htm
-
13.
The devil is in the (Spatial) Indices
Image: German zipcodes, R*Tree
http://en.wikipedia.org/wiki/R*_tree
-
14.
Thank you!
Introductions
Who am I: Nick Dimiduk, Data Platform team
What I do:
  Help growers manage risk. Sell insurance.
  3 bad years = lose your farm
  “What will the weather be like this spring in Jasper County, IL?”
  “How many consecutive days above 74 degs?”
  “How similar is the weather Sioux Falls, SD vs. Dayton OH?”
The data:
  *get data stats from zimmer*
On the side
Coauthor: Amandeep Khurana, Cloudera
In print this fall
“Generous” discount code
USGS (US Geological Survey) has a boring definition:
“Software system capable of capturing, storing, analyzing, displaying geographically referenced information”
TN River Gorge, Chattanooga
Maps + Data: actionable insight from data
  That happens to make really pretty pictures
  Different views for different people
Really not all that different from this whole “Big Data” thing
Map = data
Base layers:
  Terrain information: data
  River and lake boundaries: data
  Cities, roads: data
Really just data + data in a picture (interactive?)
Quick Poll: “how many of you make pretty pictures in your day-to-day data activities?”
Not all fun and games
Most GIS built by geographers… for bureaucrats
  No software engineering experience or motivation
  “state of the art” is ArcGIS
  this is not a modern technical landscape
The World isn’t flat
  Mostly 3D information
  Which, btw, often changes over time (4D)
  Reduced to 2D for everyone’s convenience
  Stored in a 1D world (disk platters read linearly)
“Here are Dragons”
  Hunt-Lenox Globe, 1503-07
  Unknown areas, stories of lions and dragons
What does Climate do with Geo data?
Historically:
  Storing
  Analyzing
Now and future:
  Capturing
  Displaying
Queries against spatial data
  Geometry/Geography as first-class citizens
  Intersperse spatial queries with other attributes
  it’s all just data, remember?
Geometric queries:
  Example of an “intersection” query
  Also: containment, overlap.
  Describe intersection between two geometries: “Dimensionally Extended 9-intersection model (DE-9IM)”
  Nearest neighbor
No linear measurements (miles, kilometers) involved!
GIS visualizing query results from early prototype.
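Mixing a spatial predicate with ordinary attribute filters can be sketched roughly as follows. This is a minimal illustration using axis-aligned bounding boxes in degrees only. The `BBox` and `Feature` types, the `crop` attribute, and the sample data are all hypothetical, and a real system would test exact geometries (for example via DE-9IM relations) rather than boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees (hypothetical helper type)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes intersect unless one lies entirely to one side of the other.
        return not (
            self.max_lon < other.min_lon or other.max_lon < self.min_lon
            or self.max_lat < other.min_lat or other.max_lat < self.min_lat
        )

    def contains(self, other: "BBox") -> bool:
        # True when `other` lies entirely inside this box.
        return (
            self.min_lon <= other.min_lon and other.max_lon <= self.max_lon
            and self.min_lat <= other.min_lat and other.max_lat <= self.max_lat
        )


@dataclass(frozen=True)
class Feature:
    """A geometry (reduced to its bounding box) plus one ordinary attribute."""
    name: str
    crop: str
    bbox: BBox


def query(features, window, crop):
    """Names of features with the given crop whose box intersects the window."""
    return [f.name for f in features
            if f.crop == crop and f.bbox.intersects(window)]


# Made-up sample data for illustration only.
FIELDS = [
    Feature("field-a", "corn", BBox(-88.2, 38.9, -88.1, 39.0)),
    Feature("field-b", "soy", BBox(-88.15, 38.95, -88.05, 39.05)),
    Feature("field-c", "corn", BBox(-96.8, 43.5, -96.7, 43.6)),
]
WINDOW = BBox(-88.3, 38.8, -88.0, 39.1)

corn_in_window = query(FIELDS, WINDOW, "corn")
```

Note that the predicates compare coordinates directly, so no miles or kilometers are involved.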
Measuring units becomes tricky
  Angular distance is not linear distance.
Know that old joke about physicists? “Assume a spherical horse”
  Earth is an irregular sphere
  Approximated into ideal
  Using planar (2D) coordinates
  Coordinate Reference Systems
180 deg does not a triangle make. Validate your assumptions.
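To make "angular distance is not linear distance" concrete, here is a small sketch of the haversine great-circle formula. It assumes an idealized spherical Earth with a mean radius of 6371 km, which is itself one of the approximations these notes warn about. The function name and constant are illustrative.

```python
import math

# Assumption: idealized sphere with the commonly quoted mean Earth radius.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same angular step (one degree of longitude) at two latitudes.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)
```

Under this spherical model, one degree of longitude covers about 111 km at the equator and about half that at 60 degrees north, so a query radius expressed in degrees does not map to a single linear distance.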
Or “Horizontal scaling of geospatial systems.” In the cloud.
What’s the data?
  Vector data: 1.5B features (geometry + metadata)
  Raster data: whole US at 10m resolution
  Reference into 30-100+ years’ worth of historical time-series
All on AWS.
Performant access to data requires indexing
Linearization via space-filling curve
  Index 2D data in a single dimension
  Preserve locality as much as possible
  Z curve, Hilbert curve, etc.
Geohashing
  Far from perfect, edge-cases still hurt
  Works okay for points but not for arbitrary geometries
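As an illustration of linearization, the sketch below encodes a point as a geohash: longitude and latitude bits are interleaved (a Z-order curve) and written out in base 32. It is a minimal encoder for points only, written for this example rather than taken from the talk. The sample coordinates are arbitrary.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=8):
    """Encode a (lat, lon) point in degrees as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # interleaving starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


# Nearby points usually share a prefix, so a prefix scan finds neighbors.
near_a = geohash(57.64911, 10.40744, 6)
near_b = geohash(57.64920, 10.40750, 6)

# Edge case: two points a few meters apart on either side of the equator
# differ in the very first character.
north = geohash(0.0001, 10.0, 6)
south = geohash(-0.0001, 10.0, 6)
```

The last pair shows why edge cases still hurt: points that are close on the ground can sit far apart on the curve, so a single prefix scan can miss true neighbors.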
Horizontal scaling requires partitioning
Many (2) ways to slice it (boo)
Domain-partitioning: cut the world into chunks
  Lon, lat are fixed domain. Simple to split evenly (hemispheres)
  Poor distribution of work.
Range-partitioning: split according to the values you have
  Scale according to data density
Effective partitioning requires knowledge of your data
  Or a specialized data-structure (foreshadowing)
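The difference between the two partitioning strategies can be sketched in one dimension. The example below splits a set of longitudes four ways, once by cutting the fixed domain evenly and once at the quantiles of the data. The synthetic data (90% clustered around -90 degrees, 10% spread worldwide) and all function names are assumptions made for illustration.

```python
import bisect
import random


def domain_splits(lo, hi, n):
    """Evenly spaced split points over a fixed domain (domain-partitioning)."""
    step = (hi - lo) / n
    return [lo + step * i for i in range(1, n)]


def range_splits(values, n):
    """Split points taken at data quantiles (range-partitioning)."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n] for i in range(1, n)]


def partition_counts(values, splits):
    """Count how many values land in each partition defined by the splits."""
    counts = [0] * (len(splits) + 1)
    for v in values:
        counts[bisect.bisect_right(splits, v)] += 1
    return counts


# Synthetic, skewed longitudes: a dense cluster plus sparse worldwide noise.
rng = random.Random(42)
lons = [rng.gauss(-90.0, 5.0) for _ in range(9000)]
lons += [rng.uniform(-180.0, 180.0) for _ in range(1000)]

by_domain = partition_counts(lons, domain_splits(-180.0, 180.0, 4))
by_range = partition_counts(lons, range_splits(lons, 4))
```

With data this skewed, the even domain cut leaves two partitions holding most of the points and two nearly empty, while the quantile-based splits give each partition a similar share. The quantile approach needs the data (or a sample of it) up front, which is the "knowledge of your data" requirement.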
Performant access to data requires indexing
Dimensionally aware indices
  KD-Trees work great for point data (nearest neighbors)
  R-Tree variants for arbitrary geometries, but costly to construct
  Uniform partitions => uniform trees => uniform access performance
Two approaches to scaling:
  2-layer indices
    1st layer: coarse-grained partition
    2nd layer: specialized index
    This is MD-HBase
    Easier to implement. Potentially miss geometries => incomplete results!!!
  Persisted spatial indices
    Implement persisted R-Tree
    Custom regions via RegionSplitPolicy (0.94+)
    Should be more correct… “there are dragons”
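The sketch below shows a toy 2-layer index and one plausible way such a design can return incomplete results. It does not describe MD-HBase's implementation. The 10-degree grid, the bounding-box geometries, and the linear scan standing in for the specialized second-layer index are all assumptions chosen to keep the example short.

```python
from collections import defaultdict

CELL_DEG = 10.0  # hypothetical coarse cell size, in degrees


def cell_of(lon, lat):
    """First layer: map a point to its coarse grid cell."""
    return (int(lon // CELL_DEG), int(lat // CELL_DEG))


def cells_covering(box):
    """All coarse cells touched by a (min_lon, min_lat, max_lon, max_lat) box."""
    cx0, cy0 = cell_of(box[0], box[1])
    cx1, cy1 = cell_of(box[2], box[3])
    return [(cx, cy)
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)]


def boxes_intersect(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def build_index(features, by_centroid):
    """Index each feature under its centroid cell only, or under every cell it touches."""
    index = defaultdict(list)
    for name, box in features.items():
        if by_centroid:
            cells = [cell_of((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)]
        else:
            cells = cells_covering(box)
        for cell in cells:
            index[cell].append((name, box))
    return index


def query(index, window):
    """Find features intersecting the window by visiting only the window's cells."""
    hits = set()
    for cell in cells_covering(window):           # first layer: coarse cells
        for name, box in index.get(cell, []):     # second layer: plain scan here
            if boxes_intersect(box, window):
                hits.add(name)
    return sorted(hits)


# Made-up features; "wide" spans two coarse cells, with its centroid in the western one.
FEATURES = {
    "small": (-88.2, 38.9, -88.1, 39.0),
    "wide": (-95.0, 38.0, -89.0, 39.5),
}
WINDOW = (-89.5, 38.5, -88.0, 39.2)

incomplete = query(build_index(FEATURES, by_centroid=True), WINDOW)
complete = query(build_index(FEATURES, by_centroid=False), WINDOW)
```

When a geometry is filed under a single coarse cell, a query window that overlaps it from a neighboring cell never sees it. Filing it under every cell it touches fixes the miss at the cost of duplicate entries.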
Questions?