Scalable Geospatial Queries with Presto
Maria Basmanova
July 2018
Geospatial Data
• Values of type Geometry
• Points – location information (latitude and longitude)
• Lines – roads, cables
• Polygons – countries, regions, provinces, cities, cell tower coverage areas
• Stored as strings in Well-Known-Text (WKT) format
CC BY-SA 3.0 https://en.wikipedia.org/wiki/Well-known_text
Multi-Geometry Types
• Multi-* - a collection of geometries of the same type
CC BY-SA 3.0 https://en.wikipedia.org/wiki/Well-known_text
GeometryCollection
• A collection of geometries of different types
• Used to capture the result of an operation, e.g. intersection, difference, etc.
• Example: the intersection of LINESTRING (…) and POLYGON(…) can be GEOMETRYCOLLECTION (LINESTRING(…), POINT(…))
Geospatial Functions
• ISO Standard - SQL/MM Part 3
• MM – multimedia
• Part 3 Spatial
• ST_ prefix (S – spatial, T – temporal)
• https://prestodb.io/docs/current/functions/geospatial.html
WKT-to-Geometry
• To Geometry
• ST_GeometryFromText(wkt)
• ST_Point(x, y)
• ST_Point(longitude, latitude)
• To WKT
• ST_AsText
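To illustrate the coordinate order, here is a minimal Python sketch that formats and parses a WKT point. The helper names `point_wkt` and `parse_point_wkt` are hypothetical and are not Presto functions; the only point being made is that x carries the longitude and y carries the latitude.

```python
import re

# Matches strings such as "POINT (-122.4 37.8)" or "point(1 2)".
_POINT = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)

def point_wkt(longitude, latitude):
    """Format a WKT point: x is the longitude, y is the latitude."""
    return f"POINT ({longitude!r} {latitude!r})"

def parse_point_wkt(wkt):
    """Return (longitude, latitude) from a WKT point string."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))
```

Only the POINT form is handled here; lines, polygons and collections need a real WKT parser.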
Operations
• Inputs (and outputs) are geometry objects, not WKT strings
ST_Contains(g1, g2)        ST_Intersection(g1, g2)
ST_Intersects(g1, g2)      ST_ConvexHull(g)
ST_Distance(g1, g2) (*)    ST_Union(g1, g2)
ST_Area(g) (*)             ST_Centroid(g)
ST_Length(g) (*)           ST_Envelope(g)
(*) Computation is done on the Euclidean plane, in the units of the input geometries
Spatial Join
• ST_Contains, ST_Intersects and ST_Distance
• R-Tree index for the build side
CC BY-SA 3.0 https://en.wikipedia.org/wiki/R-tree
SELECT *
FROM points, polygons
WHERE ST_Contains(ST_GeometryFromText(wkt), ST_Point(lng, lat))
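One way to picture this join is as a two-stage filter: a cheap bounding-box check on the build side, then an exact containment test. The Python sketch below is an illustration only. It uses a linear scan over bounding boxes in place of the R-tree and a ray-casting test in place of ST_Contains; all function names are hypothetical and it does not describe Presto's implementation.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def contains(polygon, point):
    """Ray casting: count edges crossed by a ray going right from the point."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def spatial_join(points, polygons):
    """Yield (point, polygon_name) pairs; polygons is a list of (name, vertices)."""
    # Build side: precompute bounding boxes (an R-tree would index these).
    index = [(bounding_box(poly), name, poly) for name, poly in polygons]
    for point in points:
        px, py = point
        for (min_x, min_y, max_x, max_y), name, poly in index:
            # Cheap bounding-box filter first, exact test second.
            if min_x <= px <= max_x and min_y <= py <= max_y and contains(poly, point):
                yield point, name
```

A point that falls inside a polygon's bounding box but outside the polygon itself passes the first check and is rejected by the second, which is why the exact test is still needed after the index lookup.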
Spatial Join Types
• Inner join
• Left join enables scalar correlated subqueries
SELECT (SELECT arbitrary(name) FROM polygons WHERE ST_Contains(polygon, ST_Point(lng, lat)))
FROM points
Distance Query
• Logically equivalent to ST_Contains(circle(b.point, radius), a.point)
• Radius can be a constant value or an expression using symbols from b
• A lot more efficient than ST_Contains(ST_Buffer(b.point, radius), a.point)
• What about the units?
SELECT * FROM a, b
WHERE ST_Distance(a.point, b.point) <= radius
Angular units
• 1 degree of latitude =~ 111.321 km and stays constant
• 1 degree of longitude =~ 111.321 km * cos(latitude)
• ST_Distance, ST_Area, ST_Length return results in angular units
• Within small areas, multiply by 111.321 km * cos(radians(ST_Y(ST_Centroid(ST_Envelope(g1)))))
• ST_Y(ST_Centroid(ST_Envelope(g1))) is the latitude at the center of the bounding box of g1
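The longitude scaling can be checked with a few lines of Python. This is a sketch that only restates the formula on the slide; the function name is hypothetical and 111.321 km is the constant used in this deck.

```python
import math

KM_PER_DEGREE = 111.321  # kilometers per degree, the value used on the slides

def km_per_degree_longitude(latitude_deg):
    """Approximate length in km of one degree of longitude at a given latitude."""
    return KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
```

At the equator this gives 111.321 km, at 60 degrees latitude about 55.66 km, and at 70 degrees about 38.07 km.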
Distance Query in km: Step 1
• ST_Distance(center, p) <= r / 111.321
• For r = 1
• Circle of 1 km near equator
• Ellipse with minor axis along the longitude
• and smaller diameter of 0.34 km at 70th latitude
Distance Query in km: Step 2
• ST_Distance(center, p) <= r / (111.321 * cos(radians(center.latitude)))
• Ellipse with minor axis fixed at r km
• major axis starting at r km near equator
• and growing to 3r at 70th latitude
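The 0.34 km and 3r figures in Steps 1 and 2 follow from cos(70°) ≈ 0.342. The Python sketch below reproduces the arithmetic under the flat-plane approximation used on the slides; the function names are hypothetical.

```python
import math

def east_west_reach_km(r_km, latitude_deg):
    """Step 1: a threshold of r / 111.321 degrees spans r km north-south,
    but only r * cos(latitude) km east-west."""
    return r_km * math.cos(math.radians(latitude_deg))

def north_south_reach_km(r_km, latitude_deg):
    """Step 2: a threshold of r / (111.321 * cos(latitude)) degrees spans
    r km east-west, but r / cos(latitude) km north-south."""
    return r_km / math.cos(math.radians(latitude_deg))
```

For r = 1 km at 70 degrees latitude, the first function returns about 0.34 km and the second about 2.92 km, which the slide rounds to 3r. Step 2 therefore over-selects rather than under-selects, which is why a refinement pass is still required.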
Distance Query in km: Step 3
SELECT *
FROM a, b
WHERE ST_Distance(ST_Point(a.lng, a.lat), ST_Point(b.lng, b.lat)) <=
radius_km / (111.321 * cos(radians(b.lat)))
AND great_circle_distance(a.lat, a.lng, b.lat, b.lng) <= radius_km
• Divide the radius by 111.321 * cos(latitude)
• Refine spatial join results using great_circle_distance
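The structure of this query, a coarse planar filter followed by an exact great-circle check, can be sketched in Python as follows. The haversine formula and the mean Earth radius of 6371.01 km are assumptions of this sketch; the slides do not say how Presto's great_circle_distance is computed, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.01  # assumed mean Earth radius
KM_PER_DEGREE = 111.321    # constant used on the slides

def great_circle_distance_km(lat1, lng1, lat2, lng2):
    """Haversine distance in kilometers between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def within_radius(a_lat, a_lng, b_lat, b_lng, radius_km):
    """Coarse planar filter in degrees, then exact refinement in km."""
    # Coarse filter: Euclidean distance in degrees against the widened threshold.
    planar_deg = math.hypot(a_lng - b_lng, a_lat - b_lat)
    threshold_deg = radius_km / (KM_PER_DEGREE * math.cos(math.radians(b_lat)))
    if planar_deg > threshold_deg:
        return False
    # Refinement: drop the false positives that the widened threshold lets through.
    return great_circle_distance_km(a_lat, a_lng, b_lat, b_lng) <= radius_km
```

At 70 degrees latitude with radius_km = 1, a point 0.02 degrees to the east passes both checks (about 0.76 km away), while a point 0.02 degrees to the north passes the coarse filter but is rejected by the refinement (about 2.2 km away).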
Bing Tiles
© 2018 Microsoft https://msdn.microsoft.com/en-us/library/bb259689.aspx.
Bing Tile Functions
• bing_tile_at(latitude, longitude, zoom_level)
• bing_tiles_around(latitude, longitude, zoom_level)
• geometry_to_bing_tiles(geometry, zoom_level)
• Choose zoom level based on radius
• tile width >= radius
• Refine join results using great_circle_distance
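For readers unfamiliar with the tile system, the Python sketch below maps a latitude and longitude to tile coordinates and a quadkey, following the formulas published in the Microsoft Bing Maps Tile System article. Whether Presto's bing_tile_at matches this step for step is an assumption; the function names here are hypothetical.

```python
import math

MIN_LATITUDE, MAX_LATITUDE = -85.05112878, 85.05112878

def _clip(value, low, high):
    return min(max(value, low), high)

def tile_xy(latitude, longitude, zoom_level):
    """Return (tile_x, tile_y) of the tile containing the point."""
    latitude = _clip(latitude, MIN_LATITUDE, MAX_LATITUDE)
    longitude = _clip(longitude, -180.0, 180.0)
    x = (longitude + 180.0) / 360.0
    sin_lat = math.sin(math.radians(latitude))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    map_size = 256 << zoom_level  # map width in pixels at this zoom level
    pixel_x = int(_clip(x * map_size + 0.5, 0, map_size - 1))
    pixel_y = int(_clip(y * map_size + 0.5, 0, map_size - 1))
    return pixel_x // 256, pixel_y // 256

def quadkey(tile_x, tile_y, zoom_level):
    """Interleave the bits of tile_x and tile_y into a base-4 string."""
    digits = []
    for i in range(zoom_level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

Two points in the same tile share a quadkey, so comparing tiles turns a distance predicate into an equality predicate that an ordinary hash join can evaluate.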
Distance Query using Bing Tiles
SELECT *
FROM a, (
SELECT * FROM b
CROSS JOIN UNNEST (bing_tiles_around(lat, lng, 14)) as t(tile)
) b
WHERE bing_tile_at(a.lat, a.lng, 14) = b.tile
AND great_circle_distance(a.lat, a.lng, b.lat, b.lng) <= radius_km
• Tile size depends on zoom level and latitude
• Smaller tiles at larger zoom levels and near the poles
How Large are Bing Tiles?
Tile width in kilometers
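Tile width can be estimated from the Web Mercator projection: at zoom level z the world is 2^z tiles wide, and the length of a parallel shrinks with cos(latitude). The Python sketch below assumes the WGS 84 equatorial radius of 6378.137 km and a maximum zoom level of 23; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6378.137  # WGS 84 equatorial radius (assumed)

def tile_width_km(latitude_deg, zoom_level):
    """Approximate ground width of one tile at the given latitude."""
    parallel_km = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))
    return parallel_km / (2 ** zoom_level)

def max_zoom_for_radius(radius_km, latitude_deg, max_zoom=23):
    """Largest zoom level whose tile width is still >= radius_km."""
    for zoom in range(max_zoom, 0, -1):
        if tile_width_km(latitude_deg, zoom) >= radius_km:
            return zoom
    return 1  # radius exceeds even the coarsest tiles
```

At zoom level 14 a tile is about 2.45 km wide at the equator and about 1.22 km wide at 60 degrees latitude. For a 1 km radius, this rule picks zoom level 15 at the equator and 14 at 60 degrees.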
Questions?
Spatial Join
• Spatial joins are similar to Hash joins
• Hash-based partitioning -> Spatial partitioning
• Hash table -> Spatial Index (R-Tree)
• Broadcast spatial join requires only spatial index
SELECT *
FROM polygons, points
WHERE ST_Contains(ST_GeometryFromText(wkt), ST_Point(lng, lat))
CC BY-SA 3.0 https://en.wikipedia.org/wiki/R-tree
Spatial Partitioning
• Overall extent is split into non-overlapping rectangles
• KDB-Tree (K = 2)
• Total number of records, overall extent and a sample of the data are needed to compute the partitioning scheme
• Some records may go into multiple partitions
• Polygons may intersect multiple rectangles
• Efficient inline de-dup technique is necessary
• Reference point of the intersection of bounding boxes
Inline Deduplication
• Some shapes intersect multiple partitions
• Only one partition contains a reference point
• Lower left corner of the intersection of bounding boxes
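The reference-point rule can be sketched in a few lines of Python. Boxes are (min_x, min_y, max_x, max_y) tuples and all names are hypothetical. The half-open containment test is an assumption of this sketch, chosen so that a point on a shared partition edge belongs to exactly one partition; points on the outer top or right edge of the overall extent would need extra handling.

```python
def intersect(box_a, box_b):
    """Intersection of two bounding boxes, or None if they are disjoint."""
    min_x, min_y = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    max_x, max_y = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if min_x > max_x or min_y > max_y:
        return None
    return min_x, min_y, max_x, max_y

def owns(partition, x, y):
    """Half-open test: left and bottom edges included, right and top excluded."""
    min_x, min_y, max_x, max_y = partition
    return min_x <= x < max_x and min_y <= y < max_y

def should_emit(partition, left_box, right_box):
    """Emit a matching pair only in the partition that holds the reference point."""
    overlap = intersect(left_box, right_box)
    if overlap is None:
        return False
    # Reference point: lower left corner of the intersection of bounding boxes.
    return owns(partition, overlap[0], overlap[1])
```

With partitions (0, 0, 10, 10) and (10, 0, 20, 10), two shapes with bounding boxes (8, 2, 12, 6) and (9, 1, 13, 5) are sent to both partitions, but the reference point (9, 2) lies only in the first, so the pair is emitted once without a separate de-duplication stage.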

Presto Summit 2018 - 06 - Facebook Geospatial

  • 1. Scalable Geospatial Queries with Presto Maria Basmanova July 2018
  • 2. Geospatial Data • Values of type Geometry • Points – location information (latitude and longitude) • Lines – roads, cables • Polygons – countries, regions, provinces, cities, cell tower coverage areas • Stored as strings in Well-Known-Text (WKT) format CC BY-SA 3.0 https://en.wikipedia.org/wiki/Well-known_text
  • 3. • Multi-* - a collection of geometries of the same type Multi-Geometry Types CC BY-SA 3.0 https://en.wikipedia.org/wiki/Well-known_text
  • 4. • A collection of geometries of different types • Used to capture the result of an operation, • e.g. intersection, difference, etc. GeometryCollection intersection LINESTRING (…) POLYGON(…) GEOMETRYCOLLECTION (LINESTRING(…), POINT(…))
  • 5. Geospatial Functions • ISO Standard - SQL/MM Part 3 • MM – multimedia • Part 3 Spatial • ST_ prefix (S – spatial, T – temporal) • https://prestodb.io/docs/current/functions/geospatial.html
  • 6. WKT-to-Geometry • To Geometry • ST_GeometryFromText(wkt) • ST_Point(x, y) • ST_Point(longitude, latitude) • To WKT • ST_AsText
  • 7. Operations • Inputs (and outputs) are geometry objects, not WKT strings • ST_Contains(g1, g2), ST_Intersects(g1, g2), ST_Distance(g1, g2) (*), ST_Area(g) (*), ST_Length(g) (*), ST_Intersection(g1, g2), ST_ConvexHull(g), ST_Union(g1, g2), ST_Centroid(g), ST_Envelope(g) • (*) Computation is done on the Euclidean plane in the units of the input geometries
  • 8. Spatial Join • ST_Contains, ST_Intersects and ST_Distance • R-Tree index for the build side CC BY-SA 3.0 https://en.wikipedia.org/wiki/R-tree SELECT * FROM points, polygons WHERE ST_Contains(ST_GeometryFromText(wkt), ST_Point(lng, lat))
  • 9. Spatial Join Types • Inner join • Left join enables scalar correlated subqueries SELECT (SELECT arbitrary(name) FROM polygons WHERE ST_Contains(polygon, ST_Point(lng, lat))) FROM points
  • 10. Distance Query • Logically equivalent to ST_Contains(circle(b.point, radius), a.point) • Radius can be a constant value or an expression using symbols from b • A lot more efficient than ST_Contains(ST_Buffer(b.point, radius), a.point) • What about the units? SELECT * FROM a, b WHERE ST_Distance(a.point, b.point) <= radius
  • 11. Angular units • 1 degree of latitude =~ 111.321 km and stays constant • 1 degree of longitude =~ 111.321 km * cos(latitude) • ST_Distance, ST_Area, ST_Length return results in angular units • Within small areas, multiply by • 111.321 km * cos(radians(ST_Y(ST_Centroid(ST_Envelope(g1))))) Latitude at the center of the bounding box of g1
  • 12. Distance Query in km: Step 1 • ST_Distance(center, p) <= r / 111.321 • For r = 1 • Circle of 1 km near equator • Ellipse with minor axis along the longitude • and smaller diameter of 0.34 km at 70th latitude
  • 13. Distance Query in km: Step 2 • ST_Distance(center, p) <= r / (111.321 * cos(radians(center.latitude))) • Ellipse with minor axis fixed at r km • major axis starting at r km near equator • and growing to 3r at 70th latitude
  • 14. Distance Query in km: Step 3 SELECT * FROM a, b WHERE ST_Distance(ST_Point(a.lng, a.lat), ST_Point(b.lng, b.lat)) <= radius_km / (111.321 * cos(radians(b.lat))) AND great_circle_distance(a.lat, a.lng, b.lat, b.lng) <= radius_km • Divide the radius by 111.321 * cos(latitude) • Refine spatial join results using great_circle_distance
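The two-stage filter from Steps 1 to 3 can be sketched outside of Presto to check the arithmetic. This is a minimal Python illustration, not Presto's implementation: `planar_distance_deg` stands in for `ST_Distance` on longitude/latitude points, `great_circle_km` is a plain haversine formula standing in for `great_circle_distance`, and the Earth radius constant and all function names are assumptions made for this example.

```python
import math

KM_PER_DEGREE = 111.321    # from the slides: approximate km per degree of latitude
EARTH_RADIUS_KM = 6371.01  # assumed mean Earth radius for the haversine formula


def planar_distance_deg(lng1, lat1, lng2, lat2):
    """Euclidean distance in degrees, i.e. what a planar distance on lng/lat returns."""
    return math.hypot(lng1 - lng2, lat1 - lat2)


def great_circle_km(lat1, lng1, lat2, lng2):
    """Haversine distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def coarse_threshold_deg(center_lat, radius_km):
    """Radius in degrees, widened by 1 / cos(latitude) so no true match is lost."""
    return radius_km / (KM_PER_DEGREE * math.cos(math.radians(center_lat)))


def within_radius(a, b, radius_km):
    """a and b are (lat, lng) tuples; b plays the role of the circle centre."""
    a_lat, a_lng = a
    b_lat, b_lng = b
    # Stage 1: cheap planar test in degrees (the part a spatial join can index).
    if planar_distance_deg(a_lng, a_lat, b_lng, b_lat) > coarse_threshold_deg(b_lat, radius_km):
        return False
    # Stage 2: exact refinement in kilometres.
    return great_circle_km(a_lat, a_lng, b_lat, b_lng) <= radius_km
```

At latitude 70 a point about 2 km due north of the centre passes the coarse test for a 1 km radius, because the threshold is stretched by roughly 3x, and is only rejected by the great-circle refinement. That is why the query keeps both predicates.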
  • 15. Bing Tiles © 2018 Microsoft https://msdn.microsoft.com/en-us/library/bb259689.aspx.
  • 16. Bing Tile Functions • bing_tile_at(latitude, longitude, zoom_level) • bing_tiles_around(latitude, longitude, zoom_level) • geometry_to_bing_tiles(geometry, zoom_level)
  • 17. • Choose zoom level based on radius • tile width >= radius • Refine join results using great_circle_distance Distance Query using Bing Tiles SELECT * FROM a, ( SELECT * FROM b CROSS JOIN UNNEST (bing_tiles_around(lat, lng, 14)) as t(tile) ) b WHERE bing_tile_at(a.lat, a.lng, 14) = b.tile AND great_circle_distance(a.lat, a.lng, b.lat, b.lng) <= radius_km
  • 18. • Tile size depends on zoom level and latitude • Smaller tiles at larger zoom levels and near the poles How Large are Bing Tiles? Tile width in kilometers
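The dependence of tile width on zoom level and latitude can be estimated from the ground-resolution formula in the Bing Maps tile system documentation: at zoom level z the map is 2^z tiles wide, and the ground length it covers shrinks by cos(latitude). The Python sketch below applies that formula and then picks the largest zoom level whose tile width is still at least the search radius, as slide 17 recommends. The function names and the zoom range 1 to 23 are assumptions for this example.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 equatorial radius used by the Bing Maps tile system
MAX_ZOOM = 23               # assumed upper bound on zoom levels


def tile_width_km(latitude, zoom_level):
    """Approximate east-west ground width of one tile, in kilometres."""
    circumference_km = 2 * math.pi * EARTH_RADIUS_M / 1000.0
    return math.cos(math.radians(latitude)) * circumference_km / (2 ** zoom_level)


def max_zoom_for_radius(latitude, radius_km):
    """Largest zoom level whose tile width is >= radius_km, or None if there is none."""
    for zoom in range(MAX_ZOOM, 0, -1):
        if tile_width_km(latitude, zoom) >= radius_km:
            return zoom
    return None
```

With these assumptions a zoom-14 tile is about 2.45 km wide at the equator and about 0.84 km wide at latitude 70, so a 1 km radius query would need zoom level 13 or lower at that latitude.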
  • 20. Spatial Join • Spatial joins are similar to Hash joins • Hash-based partitioning -> Spatial partitioning • Hash table -> Spatial Index (R-Tree) • Broadcast spatial join requires only spatial index SELECT * FROM polygons, points WHERE ST_Contains(ST_GeometryFromText(wkt), ST_Point(lng, lat)) CC BY-SA 3.0 https://en.wikipedia.org/wiki/R-tree
  • 21. Spatial Partitioning • Overall extent is split into non-overlapping rectangles • KDB-Tree (K = 2) • Total number of records, overall extent and a sample of the data are needed to compute the partitioning scheme • Some records may go into multiple partitions • Polygons may intersect multiple rectangles • Efficient inline de-dup technique is necessary • Reference point of the intersection of bounding boxes
  • 22. Inline Deduplication • Some shapes intersect multiple partitions • Only one partition contains a reference point • Lower left corner of the intersection of bounding boxes
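The reference-point rule can be illustrated with a small Python sketch. It assumes axis-aligned partitions and bounding boxes, and it treats each partition as half-open (lower and left edges included, upper and right edges excluded) so that a point on a shared edge belongs to exactly one partition. The `Rect` type and function names are hypothetical, and Presto's actual implementation may differ in these details.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two bounding boxes, or None if they are disjoint."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin > xmax or ymin > ymax:
        return None
    return Rect(xmin, ymin, xmax, ymax)


def owns_pair(partition: Rect, box_a: Rect, box_b: Rect) -> bool:
    """True if this partition should emit the (a, b) candidate pair.

    The reference point is the lower left corner of the intersection of the
    two bounding boxes; only the partition containing it reports the pair.
    """
    overlap = intersection(box_a, box_b)
    if overlap is None:
        return False
    x, y = overlap.xmin, overlap.ymin
    return partition.xmin <= x < partition.xmax and partition.ymin <= y < partition.ymax


# Two partitions split at x = 5; both shapes straddle the split.
left, right = Rect(0, 0, 5, 10), Rect(5, 0, 10, 10)
polygon_box, other_box = Rect(3, 2, 7, 6), Rect(4, 1, 8, 5)
owners = [owns_pair(p, polygon_box, other_box) for p in (left, right)]
```

Both shapes are sent to both partitions, but the reference point (4, 2) lies only in the left one, so the pair is emitted once and no separate de-duplication pass is needed. With the half-open convention, a reference point on the upper or right edge of the overall extent would need special handling.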