Big Data and geopositioning
or
it’s bigger on the inside
Jorge López-Malla Matute
Senior Data Engineer
1. Presentation
2. What does Key Value mean and why does it matter so much?
3. Why do we need Geopositioning analytics?
4. How can we merge these two worlds?
5. Q&A
Index
Presentation
SKILLS
JORGE LÓPEZ-MALLA
@jorgelopezmalla
Big Data architect, Spark certification
number 13, from La Rioja, and short-sighted.
After years of trying to solve
modern problems with traditional
technologies, I tried Big Data
and saw that it solved them!
What we do
Geoblink is the ultimate location
Intelligence solution that helps companies
of any size make strategic, location-related
decisions on an easy-to-use platform
COLLECTING
DATA
We combine our
clients’ internal data
with external data and
Geoblink’s proprietary
location data
TRANSFORMING
DATA
We process and analyze
data using advanced
analytics (big data) and
artificial intelligence
techniques
PROVIDING
INSIGHTS
We present insights on a
user-friendly platform to
help companies make
powerful, data-driven
decisions
How we do it
What does “Key Value” mean
and why does it matter so
much?
● Big Data was born in the early 2000s
● Data is no longer small enough to fit in a single commodity
machine
● Data grows exponentially
● Vertical scaling is both dangerous and expensive
A little bit of history
● Solutions?
[Diagram: data blocks labelled 3G, 1G, 15G, 15G, 6G, 12G, 12G, 12G]
Processing & Storing
● Choosing a proper key is critical not only in a storage system but
also in distributed processing frameworks
● Spark, probably the most important distributed processing
framework right now, is no exception
● This matters in both streaming and batch processing
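To make the role of the key concrete, here is a minimal sketch of hash partitioning in plain Scala. It is not taken from the talk: the names `KeyPartitioning`, `partitionFor` and `distribute` are hypothetical, and real frameworks add many details on top of this idea. It only illustrates that the key alone decides which partition (and so which machine) a record lands on.

```scala
object KeyPartitioning {
  // Non-negative modulo of the key's hash code.
  // Assumption: a simple hash scheme, used here only for illustration.
  def partitionFor(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Groups (key, value) records by the partition their key maps to.
  // Records sharing a key always end up in the same partition.
  def distribute[V](records: Seq[(String, V)],
                    numPartitions: Int): Map[Int, Seq[(String, V)]] =
    records.groupBy { case (key, _) => partitionFor(key, numPartitions) }
}
```

With a key such as a city name, every record for that city lands in the same partition. A poorly chosen key (for example, one with very few distinct values) would pile most of the data onto a few partitions.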
Why do we need Geopositioning
analytics?
The Five Ws are questions whose answers are considered basic in
information gathering or problem solving
● Who was involved?
● What happened?
● Why did that happen?
● When did it take place?
● Where did it take place?
The Five Ws
● Digital society needs immediate reactions
● “Slow” responses are no longer useful
● Big Data allows us to answer 4 of the 5 W questions
● The geospatial problem is not just an enterprise problem
The where matters
Real world
Business world
How can we merge these two worlds?
● Knowing both the problem to solve and the technology should be
enough
● Obtaining the proper key is the “key” in every Big Data project
● In geospatial projects it is fundamental to obtain the results
exactly where we want them
● With this in mind, we should find the key for each record of our
dataset. Easy… or not?
Merging worlds
It’s bigger on the inside
The real problem
● Remember: we should assign a key to a value using as little
logic as possible
● All geospatial logic must be understandable by humans
● The intuitive approach is to assign each point to a known
geospatial area
The real problem
Intersection
● Each coordinate is not relevant by itself
● To assign each coordinate to a recognizable area we need both
geometries
● So we need to intersect the coordinates with the areas
Intersection
Intersection
● The intersect operation has a high computational cost
● We should run it only when an intersection is likely
● We need to find a key that reduces the cost of the operation
● First of all, there is no silver bullet
● The “key” problem is worse in the Geospatial world
● Both storing and processing technologies have similar problems
● Geospatial indexes help a lot
Finding a proper Key
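As an illustration of how a spatial index can act as a key, the sketch below maps each coordinate to the cell of a fixed-size grid. This is a deliberately simple stand-in for real spatial indexes, not something shown in the talk, and `GridKey`, `cellKey` and `bucket` are hypothetical names.

```scala
object GridKey {
  // Maps a longitude/latitude pair to the id of the grid cell containing it.
  // Assumption: a uniform grid whose cells are cellSizeDegrees wide and tall.
  def cellKey(lon: Double, lat: Double, cellSizeDegrees: Double): (Long, Long) =
    (math.floor(lon / cellSizeDegrees).toLong,
     math.floor(lat / cellSizeDegrees).toLong)

  // Buckets points by cell, so that only points and areas sharing a cell
  // need the expensive intersect operation.
  def bucket(points: Seq[(Double, Double)],
             cellSizeDegrees: Double): Map[(Long, Long), Seq[(Double, Double)]] =
    points.groupBy { case (lon, lat) => cellKey(lon, lat, cellSizeDegrees) }
}
```

The key is cheap to compute and easy to reason about, but it is not perfect: an area that spans several cells has to be registered under each of them, and dense cells can still end up much larger than sparse ones.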
Spatial partitioner
● Some geospatial technologies have been grouped by the Eclipse
Foundation under LocationTech
● GeoSpark and Magellan are spatial modules for Spark
● Although we only talk about Spark here, other processing engines
offer this functionality
● We have only tested processing engines, but we have also
researched storage technologies
Big Data initiatives
Processing engines
● Both Magellan and GeoSpark offer geospatial functionality
powered by Apache Spark
● Both allow us to use Spark SQL for geospatial queries
● Both optimize the queries in Spark
● GeoSpark’s documentation is better than Magellan’s
Processing engines
GeoSpark optimization
● Spatial joins allow us to assign several geometries to one
geometry
● Remember: intersect operations come at a high cost
● In most use cases you only want a 1:1 mapping
● You can use Broadcast variables!
Do you really need a join?
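The broadcast idea can be sketched without a cluster. In the plain Scala below, the small `areas` collection plays the role of the broadcast value: in Spark it would be wrapped with `SparkContext.broadcast` and read through `.value` inside a `map`, which avoids shuffling the large dataset. The names are hypothetical, and areas are simplified to rectangles to keep the example short.

```scala
object BroadcastStyleLookup {
  // Assumption: areas are axis-aligned rectangles; real areas would be polygons.
  final case class Area(name: String,
                        minX: Double, minY: Double,
                        maxX: Double, maxY: Double) {
    def contains(x: Double, y: Double): Boolean =
      x >= minX && x <= maxX && y >= minY && y <= maxY
  }

  // Returns at most one area per point (a 1:1 mapping): the first that contains it.
  // Every worker would hold the full `areas` collection in memory.
  def assign(points: Seq[(Double, Double)],
             areas: Seq[Area]): Seq[((Double, Double), Option[String])] =
    points.map { case (x, y) =>
      ((x, y), areas.find(_.contains(x, y)).map(_.name))
    }
}
```

This only works while the set of areas is small enough to fit in memory on every worker. Once it is not, a spatial join (or a key such as a spatial index) is needed again.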
GeoMesa: Big Data storage
● GeoMesa is an open-source project for running geospatial
operations against several data sources and
processing engines
● It has connectors for visualization tools (like GeoServer)
● So far we have only tested GeoMesa with HBase, and only as a POC
[Diagram: Spark Executor-1 and Spark Executor-2, GeoMesa, and HBase (Master, Region Server-1 to Region Server-3)]
Input records (user, city, point):
(U1, Madrid, Point(x1, y1))
(U2, Logroño, Point(x2, y2))
(U1, Cadiz, Point(x3, y3))
(U3, Logroño, Point(x4, y4))
(U1, Ávila, Point(x5, y5))
(U2, Huelva, Point(x6, y6))
(U3, Huelva, Point(x7, y7))
(U2, Logroño, Point(x8, y8))
Records in HBase, keyed by point:
Point(x1, y1), [U1, Madrid]
Point(x5, y5), [U1, Ávila]
Point(x2, y2), [U21, Logroño]
Point(x4, y4), [U31, Logroño]
Point(x6, y6), [U1, Huelva]
Point(x7, y7), [U3, Huelva]
Point(x3, y3), [U1, Cadiz]
Point(x8, y8), [U21, Logroño]
ECLQuery.toCQL(“people between 1000, 200”)
[Diagram: Client.java, GeoMesa, and HBase (Master, Region Server-1 to Region Server-3) holding the same point-keyed records as in the previous diagram]
val dataFrame = sparkSession.read
  .format("geomesa")
  .options(dsParams)
  .option("geomesa.feature", "spain")
  .load()
[Diagram: Spark Driver, GeoMesa, and HBase (Master, Region Server-1 to Region Server-3) holding the same point-keyed records as in the first GeoMesa diagram]
Takeaways
● We really need to deliver insights at the right location
● Big Data requires finding a suitable key for our problem
● When dealing with large amounts of data we have to aggregate it
● Spatial indexes are adequate keys but they are not perfect
● If you only need to assign one geometry to another, a spatial
join is not a good idea
Q&A
★ Job offers:
○ https://www.geoblink.com/work-with-us/
★ Contact:
○ jobs@geoblink.com
○ jlmalla@geoblink.com
Jorge Lopez-Malla Matute | Geoposicionamiento Big Data o It's bigger on the inside | Codemotion Madrid 2018
