Building an Open Source, Real-Time,
Billion Object Spatio-Temporal Search
Platform
2016 International Workshop on Cloud Computing and Big Data
Benjamin Lewis, David Strohschein, Paolo Corti, David Smiley
Center for Geographic Analysis, Harvard University
Background
● Big data is everywhere: sensors (weather, pollution…), mobile devices,
social platform activities, software logs, etc.
● Data are generally streaming, so they are temporal
● Most of those data are spatial as well
● Traditional RDBMS, desktop statistics and visualization packages have
difficulty handling big data
● Current solutions involve “massive parallel software running on a large
number of servers”
Use case
● We work in a research university so we need to provide big data to students and
researchers
● Our goal is to lower barriers to interactive data exploration
● Some systems support visualization of large spatio-temporal datasets but don’t handle
search well
● Many search applications (most search engines) handle text but do not support the
geographic dimension.
● There is a great need for a tool that lets users interactively search large collections and visualize
them geographically. To support such increasingly common datasets, a new kind of
map server and client is needed.
● Project funded by the Sloan Foundation in partnership with the Dataverse team at
Harvard IQSS
Solution
● A general solution. Prototype
with geotagged tweets (tweets
containing GPS coordinates
from originating device)
● Platform adaptable to other
big data spatial time streams
(weather and pollution
sensors, GeoRSS feeds, etc.)
● Integrate the new platform
within Harvard WorldMap
and Dataverse systems
Objective
● Create a missing piece of geo-infrastructure and make it
available
● Demonstrate possibility of addressing scalability limitations
with non-exotic software and hardware
● Make setting up platforms for big spatio-temporal
visualization as easy as setting up a standard GIS stack
Streaming big data
Geotagged tweets
● Geotagged tweets: tweets containing GPS coordinates from originating
device
● Currently about 2% of tweets are geotagged, about 8 million per day
● The CGA has been harvesting geo-tweets since October 2012 using the
Twitter API
● The Billion Object Platform (BOP) will provide a client and API to browse and
search the latest 1 billion geotagged tweets (about 3 months range)
● Command line tools to extract older geotagged tweets from archives
The BOP (Billion Object Platform)
● General purpose, open source platform to support exploration of large collections
of spatio-temporal entities
● Built on top of a search engine
● Supports exploration, visualization, extraction via a RESTful API
● Queryable by time, space, text
● Responsive
● Spatial heatmap to represent the distribution of results (spatial faceting: results
per cell in a grid)
● Supports temporal histograms (temporal faceting: results per date-time range)
● Supports word clouds as a mechanism to enhance results browsing by topic
● Supports downloads of subsets for registered users (up to 10,000 features)
● Sentiment stamping
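The two faceting ideas in the list above (results per grid cell, results per time range) can be illustrated with a small sketch. This is not the BOP implementation, which delegates faceting to the search engine; the function names and sample records are hypothetical and only show what the counts mean.

```python
from collections import Counter
from datetime import datetime, timezone


def heatmap_counts(points, west, south, east, north, cols, rows):
    """Count points per grid cell inside a bounding box (spatial faceting)."""
    cell_w = (east - west) / cols
    cell_h = (north - south) / rows
    counts = Counter()
    for lon, lat in points:
        if not (west <= lon <= east and south <= lat <= north):
            continue  # outside the requested map extent
        # Clamp so points on the east/north edge fall in the last cell.
        col = min(int((lon - west) / cell_w), cols - 1)
        row = min(int((lat - south) / cell_h), rows - 1)
        counts[(row, col)] += 1
    return counts


def time_histogram(timestamps, bucket_seconds):
    """Count events per fixed-width time bucket (temporal faceting)."""
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        bucket_start = epoch - epoch % bucket_seconds
        counts[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] += 1
    return counts


# Hypothetical sample records: (longitude, latitude, UTC timestamp).
tweets = [
    (-71.11, 42.37, datetime(2016, 7, 1, 10, 15, tzinfo=timezone.utc)),
    (-71.10, 42.38, datetime(2016, 7, 1, 10, 45, tzinfo=timezone.utc)),
    (-0.12, 51.50, datetime(2016, 7, 1, 11, 5, tzinfo=timezone.utc)),
    (139.69, 35.68, datetime(2016, 7, 2, 9, 0, tzinfo=timezone.utc)),
]

grid = heatmap_counts(
    [(lon, lat) for lon, lat, _ in tweets],
    west=-180, south=-90, east=180, north=90, cols=4, rows=2,
)
hourly = time_histogram([ts for _, _, ts in tweets], bucket_seconds=3600)
```

A client only needs these per-cell and per-bucket counts to draw the heatmap and the histogram, so the response size depends on the grid and bucket resolution rather than on the number of matching records.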
Solution Stack
● Apache Lucene: an indexing and search library
● Apache Solr: a search web server platform built on top of
Lucene
● Apache Kafka: a message broker written in Scala to provide
a platform for handling real-time data streams
● Apache ZooKeeper: enables highly reliable distributed
coordination
● Swagger: a framework for building APIs
● scikit-learn library: Machine Learning in Python
● OpenLayers: a JavaScript mapping client
● AngularJS: a JavaScript framework
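Because Solr sits at the center of this stack, a combined text, space, and time request might be assembled roughly as follows. This is a sketch under assumptions: the field names (`coord_rpt`, `created_at`, `text`) are hypothetical and the slides do not show the actual BOP schema or request format. The parameter names themselves (`facet.heatmap`, `facet.heatmap.geom`, `facet.range` and its `start`, `end`, `gap`) are standard Solr faceting parameters.

```python
from urllib.parse import urlencode

# Hypothetical field names; a real schema may name them differently.
GEO_FIELD = "coord_rpt"
TIME_FIELD = "created_at"
TEXT_FIELD = "text"


def build_solr_params(keyword, west, south, east, north, start, end, gap):
    """Assemble Solr request parameters for a text + space + time query.

    The keyword is inserted as-is; a real client should escape query syntax.
    """
    return {
        "q": f"{TEXT_FIELD}:{keyword}",
        "fq": [
            # Rectangle filter in "lat,lon TO lat,lon" range syntax.
            f"{GEO_FIELD}:[{south},{west} TO {north},{east}]",
            f"{TIME_FIELD}:[{start} TO {end}]",
        ],
        "rows": 0,  # only facet counts are needed, not documents
        "facet": "true",
        # Spatial faceting: counts per grid cell over the map extent.
        "facet.heatmap": GEO_FIELD,
        "facet.heatmap.geom": f'["{west} {south}" TO "{east} {north}"]',
        # Temporal faceting: counts per time bucket.
        "facet.range": TIME_FIELD,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


params = build_solr_params(
    "flood", -74.0, 40.0, -70.0, 43.0,
    "2016-07-01T00:00:00Z", "2016-08-01T00:00:00Z", "+1DAY",
)
# doseq=True repeats the "fq" key once per filter.
query_string = urlencode(params, doseq=True)
```

The resulting query string would be appended to a Solr select endpoint; the host and collection name depend on the deployment and are left out here.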
Search engine features
● Faceted searches (category, space and time)
● Stemming: ability to detect words derived from a common root
● Synonym detection and controlled vocabularies such as thesauri and taxonomies
● Weighted results
● Wildcard and fuzzy search: provide results for a given term and its common
variations
● Boolean queries: search results using terms and boolean operators such as AND,
OR, NOT…
● Hit highlighting: marks the matched query terms within each result
● Autocomplete: provides immediate suggestions to the user typing the text to
search
● Stop words: words filtered out during the processing of text
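To make a few of these features concrete, here is a toy text-analysis chain covering stop words, stemming, and a Boolean AND match. It is deliberately naive: the stop-word list is tiny, the "stemmer" only strips one suffix, and none of this reflects how Lucene's analyzers are actually implemented. All names are hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "s")  # naive suffix stripping, not a real stemmer


def stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def matches_all(query, document):
    """Boolean AND: every analyzed query term must occur in the document."""
    return set(analyze(query)) <= set(analyze(document))


doc = "Flooding in the streets"
terms = analyze(doc)
hit = matches_all("flooded street", doc)
miss = matches_all("snow", doc)
```

Because the query and the document pass through the same analysis chain, "flooded street" matches "Flooding in the streets" even though no word is shared verbatim.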
Client to enable data exploration and extraction
API for streaming geotagged tweets
Sentiment Analysis
● Sentiment analysis is a field of study that uses natural language processing tools
to identify the opinions people express in text
● Social media such as Twitter provide a constant source of textual data, much of it
expressing an opinion, which can be analyzed using sentiment analysis tools.
● Using the scikit-learn library (Machine Learning in Python) we stamp each tweet
as positive or negative
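A minimal sketch of positive/negative stamping with scikit-learn is shown below. The slides do not say which features or classifier the BOP uses, so the bag-of-words plus naive Bayes pipeline here is an assumption chosen for brevity, and the training sentences are hypothetical hand-labeled examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled examples; a real model needs far more data.
train_texts = [
    "I love this city",
    "what a great sunny day",
    "happy and excited about the game",
    "I hate this traffic",
    "what a terrible rainy day",
    "sad and angry about the delay",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

# Bag-of-words counts feeding a multinomial naive Bayes classifier
# (an assumed model choice, not the one documented for the BOP).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)


def stamp(tweets):
    """Attach a predicted sentiment label to each tweet text."""
    labels = model.predict(tweets)
    return [{"text": t, "sentiment": str(s)} for t, s in zip(tweets, labels)]


stamped = stamp(["love the great game", "hate the terrible delay"])
```

Stamping at indexing time means the label is stored with each record, so it can later be used as just another filter or facet in a search.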
HHypermap
Similar approach to BOP
(Solr/Lucene): provides a
searchable registry of map
service layers from OGC
and Esri public endpoints

Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Platform

  • 1. Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Platform 2016 International Workshop on Cloud Computing and Big Data Benjamin Lewis, David Strohschein, Paolo Corti, David Smiley Center for Geographic Analysis, Harvard University
  • 2. Background ● Big data is everywhere: sensors (weather, pollution…), mobile devices, social platform activities, software logs, etc. ● Data are generally streaming, so they are temporal ● Most of those data are spatial as well ● Traditional RDBMS, desktop statistics and visualization packages have difficulty handling big data ● Current solutions involve “massive parallel software running on a large number of servers”
  • 3. Use case ● We work in a research university so we need to provide big data to students and researchers ● Our goal is to lower barriers to interactive data exploration ● Some systems support visualization of large spatio-temporal datasets but don’t handle search well ● Many search applications (most search engines) handle text but do not support the geographic dimension. ● Great need for tool to allow user to interactively search large collections and visualize them geographically. To support such increasingly common datasets, a new kind of map server and client is needed. ● Project funded by the Sloan Foundation in partnership with Dataverse team at Harvard IQSS
  • 4. Solution ● A general solution. Prototype with geotagged tweets (tweets containing GPS coordinates from originating device) ● Platform adaptable to other big data spatial time streams (weather and pollution sensors, geoRSS feeds etc...) ● Integrate the new platform within Harvard WorldMap and Dataverse systems
  • 5. Objective ● Create a missing piece of geo-infrastructure and make it available ● Demonstrate possibility of addressing scalability limitations with non-exotic software and hardware ● Make setting up platforms for big spatio-temporal visualization as easy as setting up a standard GIS stack
  • 7. Geotagged tweets ● Geotagged tweets: tweets containing GPS coordinates from originating device ● Currently about 2% of tweets are geotagged, about 8 million per day ● The CGA has been harvesting geo-tweets since October 2012 using the Twitter API ● The Billion Object Platform (BOP) will provide a client and API to browse and search the latest 1 billion geotagged tweets (about 3 months range) ● Command line tools to extract older geotagged tweets from archives
  • 8. The BOP (Billion Object Platform) ● General purpose, open source platform to support exploration of large collections of spatio-temporal entities ● Built on top of a search engine ● Supports exploration, visualization, extraction via a RESTful API ● Queryable by time, space, text ● Responsive ● Spatial heatmap to represent the distribution of results (spatial faceting: results per cell in a grid) ● Support temporal histograms (temporal faceting: results per date time range) ● Support word clouds as a mechanism to enhance results browsing by topic ● Support downloads of subsets for registered users (up to 10,000 features) ● Sentiment stamping
  • 9. Solution Stack ● Apache Lucene: an indexing and search library ● Apache Solr: a search web server platform built on top of Lucene ● Apache Kafka: a message broker written in Scala to provide a platform for handling real-time data streams ● Apache ZooKeeper: enables highly reliable distributed coordination ● Swagger: a framework for building APIs ● scikit-learn library: Machine Learning in Python ● OpenLayers: a JavaScript mapping client ● AngularJS: a JavaScript framework
  • 10. Search engine features ● Faceted searches (category, space and time) ● Stemming: ability to detect words derived from a common root ● Synonym detection and controlled vocabularies such as thesauri and taxonomies ● Weighted results ● Wildcard and fuzzy search: provide results for a given term and its common variations ● Boolean queries: search results using terms and boolean operators such as AND, OR, NOT… ● Hit highlighting: marks the matched terms in snippets of each result ● Stop words: words filtered out during the processing of text
  • 11. Client to enable data exploration and extraction
  • 12. API to streaming geotagged tweets
  • 13. Sentiment Analysis ● Sentiment analysis is a field of study that uses natural language processing tools to identify the opinions people express in a text ● Social media such as Twitter provide a constant source of textual data, much of it carrying an opinion, which can be analyzed using sentiment analysis tools ● Using the scikit-learn library (Machine Learning in Python) we stamp each tweet with a positive or negative sentiment
  • 14. HHypermap ● Similar approach to BOP (Solr/Lucene): provides a searchable registry of map service layers from OGC and Esri public endpoints
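
The spatial and temporal faceting described on slide 8 (results per cell in a grid, results per date-time range) can be illustrated with a minimal Python sketch. This is not the BOP implementation, which the slides say is built on a search engine; it only shows the counting idea in plain Python. The function names, the one-degree cell size, and the sample coordinates and timestamps are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone
import math


def heatmap_counts(points, cell_deg=1.0):
    """Spatial faceting: count (lon, lat) points per grid cell.

    Cells are keyed as (row, col), counted from the south-west corner
    of the world (-180, -90). cell_deg is a hypothetical cell size.
    """
    n_cols = math.ceil(360.0 / cell_deg)
    n_rows = math.ceil(180.0 / cell_deg)
    counts = Counter()
    for lon, lat in points:
        # Clamp so that lon == 180 or lat == 90 falls in the last cell.
        col = min(math.floor((lon + 180.0) / cell_deg), n_cols - 1)
        row = min(math.floor((lat + 90.0) / cell_deg), n_rows - 1)
        counts[(row, col)] += 1
    return counts


def histogram_counts(timestamps):
    """Temporal faceting: count timezone-aware datetimes per UTC day."""
    return Counter(
        ts.astimezone(timezone.utc).date().isoformat() for ts in timestamps
    )


# Made-up sample records: (lon, lat) pairs and their timestamps.
sample_points = [(-71.10, 42.37), (-71.06, 42.36), (2.35, 48.85)]
sample_times = [
    datetime(2016, 5, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2016, 5, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2016, 5, 2, 0, 15, tzinfo=timezone.utc),
]

cells = heatmap_counts(sample_points, cell_deg=1.0)
days = histogram_counts(sample_times)
```

A client would render `cells` as a heatmap layer and `days` as a histogram. A search engine computes such counts over only the records matching the current time, space and text query, which is what keeps the map responsive at large scale.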