Geospatial Big Data

Fabio Petroni
Big Data London Meetup @ Big Data LND
3 November 2016

Abstract: "Spatio-temporal data is one of the largest types of data being collected today. In this talk I will present the experience we had in KPMG with a completely open-source architecture for geospatial big data analytics - based on GeoMesa, Apache Accumulo, Apache Spark, GeoTools and GeoServer."

Published in: Engineering

  1. Geospatial Big Data. Dr. Fabio Petroni
  2. Motivation: exponential growth in the volume of spatio-temporal data; e.g., total number of Foursquare check-ins: ∼8 billion. Document Classification: KPMG Public. © 2016 KPMG LLP, a UK limited liability partnership and a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved.
  3. Examples of Geospatial Big Data Analysis
  4. Case Study: monitor evolving sentiment trends about Brexit over time and across geographies.
  5. Challenges: scalability (storing, processing and visualizing large-scale spatio-temporal data). Three dimensions (latitude, longitude, time) → one dimension (lexicographical ordering of keys in a table; B+ tree).
  6. Solution: Geohashes. A geohash is a binary string in which each character indicates alternating divisions of the global longitude-latitude rectangle. [Figure: first division into cells 0 and 1.]
  7. Solution: Geohashes. [Figure: two-bit cells 00, 10, 01, 11.]
  8. Solution: Geohashes. [Figure: four-bit cells 0000, 0010, 0001, 0011, 0100, 0110, 0101, 0111, 1000, 1010, 1001, 1011, 1100, 1110, 1101, 1111.]
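The alternating divisions described on the geohash slides can be sketched in a few lines of Python. This is a minimal sketch rather than the speaker's code: the function name `geohash_bits` is hypothetical, and it assumes the usual geohash convention that the first bit splits longitude and the second splits latitude. Under that assumption it reproduces the `01111010` prefix shown for London on the Brexit dataset slide.

```python
def geohash_bits(lat, lon, n_bits):
    """Return a binary geohash of n_bits for a (lat, lon) point.

    Assumption: even bit positions halve the longitude interval and
    odd bit positions halve the latitude interval.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    for i in range(n_bits):
        if i % 2 == 0:
            # Split the current longitude interval in half.
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:
            # Split the current latitude interval in half.
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


# London, using the coordinates from the Brexit dataset slide.
london = geohash_bits(51.509865, -0.118092, 8)
```

Each extra bit halves the cell along one axis, so longer strings describe smaller cells, and nearby points tend to share a long common prefix.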
  9. Solution: Z-order Traversal. Z-order traversal of the globe via 4-bit geohashes. [Figure: cells 0000, 0010, 0001, 0011, 0100, 0110, 0101, 0111, 1000, 1010, 1001, 1011, 1100, 1110, 1101, 1111.]
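One way to see why sorted geohashes trace a Z-order curve is to de-interleave each code into a column and a row index and list the cells in key order. The sketch below is illustrative only: `cell_of` and `z_order` are hypothetical names, and it assumes that even bit positions encode longitude and odd positions encode latitude.

```python
def cell_of(bits):
    """De-interleave a binary geohash into (column, row) grid indices."""
    lon_bits = bits[0::2]  # even positions: longitude (assumed)
    lat_bits = bits[1::2]  # odd positions: latitude (assumed)
    return int(lon_bits, 2), int(lat_bits, 2)


def z_order(n_bits):
    """List all n_bits geohashes in lexicographic order with their cells."""
    codes = [format(i, f"0{n_bits}b") for i in range(2 ** n_bits)]
    # Fixed-width binary strings in numeric order are already in
    # lexicographic order, which is the order a sorted key-value
    # table would store them in.
    return [(code, cell_of(code)) for code in codes]


traversal = z_order(4)
```

With 4-bit codes, the first four keys cover the four cells of one quadrant before the traversal moves on to the next quadrant, which is what keeps neighboring cells close together in key order.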
  10. Locality-Aware Index: a cluster node holds neighboring data points. [Figure labels: 1, 2, 3, 4.]
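A sorted key space can be divided among cluster nodes by contiguous key ranges, so that points with similar geohash prefixes land on the same node. The following is a rough sketch of that idea, not the configuration used in the talk: the split points, the four-node layout and the name `node_for` are all assumptions made for illustration.

```python
import bisect

# Hypothetical split points dividing the sorted key space into four
# contiguous ranges, one per node (not taken from the talk).
SPLIT_POINTS = ["01", "10", "11"]


def node_for(key):
    """Return the 1-based node number whose key range contains `key`."""
    # bisect_right counts how many split points are <= key, which is
    # the index of the range the key falls into.
    return bisect.bisect_right(SPLIT_POINTS, key) + 1


# Two nearby geohashes share a prefix and therefore share a node.
same_node = node_for("01111010") == node_for("01111011")
```

Because the ranges follow the lexicographic order of the keys, a query over a small geographic area usually touches few ranges, though cells that are adjacent on the map can still fall on either side of a split point.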
  11. KPMG open source pipeline/stack: storage, processing, visualization. Components named on the slide: HDFS, Accumulo. Captions: large-scale data analysis; query and share data.
  12. Experiments - GDELT Data. The GDELT Project monitors broadcast, print and web news from the entire world. GDELT Global Knowledge Graph (GKG): hyper-edges represent news stories; vertices represent persons, organizations, locations, etc. [Figure: example hypergraph with vertices Hillary Clinton, Donald Trump, Washington, D.C., New York City and London, and hyper-edges e1 and e2 labeled with the URLs http://www.bbc.co.uk/… and http://www.nytimes.com/… and the tones -3.7 and +5.1.]
  13. Experiments - Brexit Dataset: ∼200,000 data points (news stories). Fields per data point: first location, date and time, URL, average tone. Example data point: 2 October 2016; London (51.509865, -0.118092); geohash #01111010…; average tone -2.76.
  14. GeoServer/OpenLayers Data Visualization
  15. GeoServer/OpenLayers Heatmap - GeoMesa Plugin
  16. Shiny/Leaflet - Interactive Data Visualization
  17. Aggregating Data With Apache Spark: (1) project data points on a covering set of polygons; (2) calculate aggregate statistics. Example: given points with geohashes 1010…, 0010…, 0000…, 1000…, 1011…, 1001…, 1100…, …, are these points in Australia?
  18. Aggregating Data With Apache Spark. Same points and question as the previous slide. [Figure: cells 1000, 1010, 1001, 11, 0, 1011.]
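The two aggregation steps (project points onto a covering set, then compute statistics) can be sketched as a prefix lookup followed by a per-region average. This is a plain-Python stand-in rather than Spark code, and everything specific in it is an assumption: the `COVERING` table, the region labels and the function names are hypothetical, and the tone values are made up for the example. In Spark, the same logic would typically be expressed as a map from point to (region, tone) followed by a per-key reduction.

```python
from collections import defaultdict

# Hypothetical covering set: geohash cell prefix -> region label.
# Cells may have different precisions, as in the slide's figure.
COVERING = {"0111": "region_A", "1101": "region_B", "10": "region_C"}


def region_of(geohash, covering=COVERING):
    """Return the region whose cell prefix matches the geohash, else None."""
    # Try the longest prefix first so finer cells win over coarser ones.
    for n in range(len(geohash), 0, -1):
        region = covering.get(geohash[:n])
        if region is not None:
            return region
    return None


def average_tone(points, covering=COVERING):
    """Average the tone of (geohash, tone) points per covering region."""
    sums = defaultdict(lambda: [0.0, 0])  # region -> [tone sum, count]
    for geohash, tone in points:
        region = region_of(geohash, covering)
        if region is None:
            continue  # point lies outside every covering cell
        acc = sums[region]
        acc[0] += tone
        acc[1] += 1
    return {region: s / n for region, (s, n) in sums.items()}


# Made-up sample points for illustration only.
sample = [("01111010", -2.0), ("01110000", -4.0),
          ("10110000", 1.0), ("00000000", 9.0)]
result = average_tone(sample)
```

A prefix test only says that a point lies in a covering cell; cells that straddle a polygon boundary would still need an exact point-in-polygon check.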
  19. Average Tone Per Country. [Figure: bar charts of news stories per country, with panels PRE-BREXIT, POST-BREXIT and OVERALL; y-axis: number of news stories (0 to 60,000); x-axis: UK, US, CH, GM, AS, JA, EI, BE, NO, PO, FR, CA, IN, MX, RS, NL, LG, IT, SZ, SP.]
  20. Conclusions: We have presented an architecture for geospatial big data storage, processing and visualization. Completely open-source! Fast and efficient: a few minutes to perform the aggregation on Apache Spark with an AWS cluster of a few machines.
  21. Thank you! Dr. Fabio Petroni
  22. Document Classification: KPMG Public. The KPMG name, logo and “cutting through complexity” are registered trademarks or trademarks of KPMG International. Designed by CREATE | CRT057939. The information contained herein is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavour to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act on such information without appropriate professional advice after a thorough examination of the particular situation. kpmg.com/uk
