Building a real-time Tweet map with Flink in six weeks
OSTMap: fast PoC development with Flink
Proof of concept: an important tool in industry
• A PoC is often necessary to show feasibility to customers
• It touches several topics:
  • Scalability
  • Stream processing
  • Batch processing
  • Storage and querying of data
• OSTMap serves as the example PoC
Goals for OSTMap
• Increase trust in big data technologies on the customer side
  • It is easy to build an application with current technologies
  • Even with almost no experience
• Teach students big data technologies
  • Recruiting
  • Bring big data to the university
• Build a real-time application to view recent geotagged tweets on a map
  • Search for terms and users, show the matching tweets on a map
  • Analytics:
    • First data science jobs
    • …
Industry in practice: IT-Ringvorlesung 2016
• A course at the University of Leipzig
• Students work on projects of local companies
• Six students
• Over a period of 6 weeks, not a full-time commitment
• Weekly meetings
• GitHub project: github.com/IIDP/OSTMap
OSTMap team: Nico Graebling, Vincent Märkl, Hans Dieter Pogrzeba, Christopher Schott, Christopher Rost, Kevin Shrestha, Michael Schmeißer, Martin Grimmer, Matthias Kricke
mgm technology partners
We bring applications into production!
• Innovative software solution provider with application responsibility
• Specialist for highly scalable, transactional online applications
• Central lines of business: Insurance, E-Commerce, E-Government
• Founded in 1994
• 347 employees, 9 offices (2014)
• Revenue: €43.7 million (2014)
• Part of Allgeier SE
ScaDS
Competence center for scalable data services and solutions Dresden/Leipzig
• Bundles the Big Data research expertise of TU Dresden and Leipzig University
• Drive Big Data innovations
• Bring industry and science together
• Knowledge exchange and transfer
Walking skeleton
“A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.”
- Alistair Cockburn
GIF from http://blog.codeclimate.com/blog/2014/03/20/kickstart-your-next-project-with-a-walking-skeleton
Milestone 1
Read the tweet stream, store the data as JSON files, read the data back from the JSON files, show tweets
Milestone 2
Write to and read from Accumulo, show tweets on a map, full table scans, slow visualization
Milestone 3
Term index, geotemporal index, UI improvements, clustering, …
OSTMap – stream, batch, storage and querying
Architecture diagram labels: geotagged tweets, webservice, a) stream processing, b) batch processing, c) querying data
Stream processing of incoming data – first version
• GeoTweetSource → DateExtraction → KeyGeneration → RawTweetSink (data for the time index)
This enabled us to build a slow term search and a slow map search via full table scans.
Stream processing of incoming data – final version
• GeoTweetSource → DateExtraction → KeyGeneration → RawTweetSink (data for the time index)
• TermExtraction (tokenizing) and UserExtraction → TermIndexSink (data for the term index)
• LanguageExtraction → 1-minute window → sum by language → LanguageFrequencySink (data for the language statistics)
• GeoTemporalIndexCreation → GeoTemporalIndexSink (data for the geotemporal index)
This enabled us to build a faster term search, a faster map search, and a language frequency visualization.
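The 1-minute window with a sum by language can be illustrated outside of Flink. The following is a minimal plain-Python sketch, not the OSTMap operator code: the input tuples, the function names, and the use of UTC for the bucket are assumptions made for illustration. Only the YYYYMMDDhhmm bucket format is taken from the LanguageFrequency table.

```python
from collections import Counter
from datetime import datetime, timezone


def time_bucket(epoch_millis: int) -> str:
    """Format a timestamp as a one-minute bucket, YYYYMMDDhhmm (UTC assumed)."""
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y%m%d%H%M")


def language_frequency(tweets):
    """Count tweets per (minute bucket, language tag) pair.

    `tweets` is an iterable of (epoch_millis, language_tag) tuples, a
    hypothetical stand-in for the output of a language extraction step.
    """
    counts = Counter()
    for epoch_millis, language_tag in tweets:
        counts[(time_bucket(epoch_millis), language_tag)] += 1
    return dict(counts)
```

Each resulting pair corresponds to one row (time bucket) and one column family (language tag) with the tweet count as value.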
Batch processing
• Initial creation of the term index and the geotemporal index for already processed tweets
• Data export
• Other statistics, such as:
  • Area or distance a user covers with their tweets
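The initial creation of the term index for already stored tweets could look roughly like the sketch below. It is plain Python rather than a Flink batch job; the tokenizer, the dictionary fields `text` and `user`, and all function names are hypothetical. The entry layout (row = term, column family = field, column qualifier = RawTweetData key) follows the TermIndex table design.

```python
import re


def tokenize(text: str):
    """Split text into lowercase terms (a simple stand-in for term extraction)."""
    return re.findall(r"\w+", text.lower())


def term_index_entries(raw_key: bytes, tweet: dict):
    """Yield (row, column family, column qualifier) entries for one tweet.

    Row = term, column family = field ("text" or "user"), column qualifier =
    key of the tweet in RawTweetData. The dict fields are hypothetical.
    """
    for term in set(tokenize(tweet["text"])):
        yield (term, "text", raw_key)
    yield (tweet["user"].lower(), "user", raw_key)


def build_term_index(raw_tweets):
    """Batch sketch: create index entries for every stored (key, tweet) pair."""
    entries = []
    for raw_key, tweet in raw_tweets:
        entries.extend(term_index_entries(raw_key, tweet))
    return sorted(entries)
```

A term search then becomes a lookup of one row instead of a full table scan.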
Storage
Accumulo table design:
Table | Row | Column Family | Column Qualifier | Value
RawTweetData (TimeIndex) | timestamp, hash (8 bytes + 4 bytes) | - | - | raw tweet JSON
TermIndex | term | field (user, text) | RawTweetData key (12 bytes) | -
LanguageFrequency | time bucket (YYYYMMDDhhmm) | language tag | - | tweet count (4 bytes)
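A 12-byte RawTweetData key made of an 8-byte timestamp and a 4-byte hash can be sketched as follows. The slides do not name the hash function or the byte order, so CRC32 and big-endian encoding are assumptions here, as are the function names. Big-endian is chosen because it makes the byte order of the keys match chronological order for non-negative timestamps.

```python
import struct
import zlib


def raw_tweet_key(epoch_millis: int, tweet_json: str) -> bytes:
    """Build a 12-byte row key: 8-byte timestamp followed by a 4-byte hash."""
    # CRC32 is an assumed hash; any stable 32-bit hash would fit the layout.
    tweet_hash = zlib.crc32(tweet_json.encode("utf-8"))
    # ">qI" = big-endian, 8-byte signed integer, 4-byte unsigned integer.
    return struct.pack(">qI", epoch_millis, tweet_hash)


def split_raw_tweet_key(key: bytes):
    """Return (epoch_millis, tweet_hash) from a 12-byte row key."""
    return struct.unpack(">qI", key)
```

The hash part keeps keys distinct when several tweets share the same timestamp.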
Geotemporal Index for OSTMap
Geo index
• Geo data is partitioned by geohash (Z-curve), a function from the 2D coordinate space to a 1D key space
• Geohashes are used as row keys in Accumulo
Geohash grid:
9z db dc df dg
9x d8 d9 dd de
9r d2 d3 d6 d7
9p d0 d1 d4 d5
3z 6b 6c 6f 6g
Sorted row keys: … 3z 6b 6c 6f 6g 9p 9r 9x 9z d0 d1 d2 d3 d4 d5 d6 … dg
Row | CF | CQ
geohash | RawTweetKey | -
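The mapping from a 2D coordinate to a 1D key can be made concrete with the standard geohash algorithm: longitude and latitude intervals are halved alternately, and every five resulting bits become one base-32 character. This is a minimal sketch of that public algorithm; whether OSTMap uses exactly this encoding and which precision it uses are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash(lat: float, lon: float, precision: int = 2) -> str:
    """Encode a coordinate as a geohash with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    use_lon = True  # bits alternate between longitude and latitude
    while len(bits) < precision * 5:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for bit in bits[i:i + 5]:
            value = (value << 1) | bit
        chars.append(BASE32[value])
    return "".join(chars)
```

Points that are close on the map usually share a key prefix, which is what makes range scans over row keys useful for map queries.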
Geotemporal Index for OSTMap
Geo index – querying?
• Partitioned by geohash
• Calculate the coverage of the bounding box
• Calculate scan ranges from the coverage: range [9p], range [9r], range [d0, d1, d2, d3]
• Accumulo iterators process each scan range and return the result
Row | CF | CQ
geohash | RawTweetKey | lat/lon
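The step from a coverage to scan ranges amounts to merging cells that are consecutive in key order. The sketch below takes the coverage as given (computing it from a bounding box is left out) and assumes all cells have the same length; the function names are hypothetical. With the cells from the example it yields the three ranges [9p], [9r] and [d0 to d3]. The lat/lon stored in the column qualifier could then be used to filter exact matches inside each range, which is an assumption about the iterators' role.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def cell_to_int(cell: str) -> int:
    """Interpret a geohash cell as a base-32 number (its Z-curve position)."""
    value = 0
    for char in cell:
        value = value * 32 + BASE32.index(char)
    return value


def int_to_cell(value: int, length: int) -> str:
    """Inverse of cell_to_int for cells of a fixed length."""
    chars = []
    for _ in range(length):
        value, digit = divmod(value, 32)
        chars.append(BASE32[digit])
    return "".join(reversed(chars))


def scan_ranges(coverage):
    """Merge consecutive cells into inclusive (start, end) ranges."""
    if not coverage:
        return []
    length = len(coverage[0])  # assumes all cells have the same length
    ordered = sorted({cell_to_int(cell) for cell in coverage})
    ranges = []
    start = prev = ordered[0]
    for value in ordered[1:]:
        if value != prev + 1:  # gap on the curve: close the current range
            ranges.append((start, prev))
            start = value
        prev = value
    ranges.append((start, prev))
    return [(int_to_cell(s, length), int_to_cell(e, length)) for s, e in ranges]
```

Fewer, longer ranges mean fewer seeks in the table, at the cost of reading some cells only partly inside the bounding box.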
Geotemporal Index for OSTMap
Add some time!
• Partitioned by geohash, with time buckets
• Key space dimensions: day, lon, lat
Sorted row keys:
day 1: … 13z 16b 16c 16f 16g 19p 19r 19x 19z 1d0 1d1 1d2 1d3 1d4 1d5 1d6 … 1dg
day 2: … 23z 26b 26c 26f 26g 29p 29r 29x 29z 2d0 2d1 2d2 2d3 2d4 2d5 2d6 … 2dg
day i: …
Row | CF | CQ
day, geohash | RawTweetKey | lat/lon
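With a row key of day followed by geohash, a query over a time span repeats the geohash scan ranges once per day bucket. The sketch below illustrates this under assumptions: the day is encoded as a YYYYMMDD string (the slides only show a single digit per day), and the function names are hypothetical.

```python
from datetime import date, timedelta


def day_bucket(day: date) -> str:
    """Encode a day as a fixed-width string (assumed encoding: YYYYMMDD)."""
    return day.strftime("%Y%m%d")


def geotemporal_row(day: date, geohash: str) -> str:
    """Row key = day bucket followed by geohash."""
    return day_bucket(day) + geohash


def geotemporal_scan_ranges(first_day: date, last_day: date, geohash_ranges):
    """Repeat each (start, end) geohash range for every day in the time span."""
    ranges = []
    day = first_day
    while day <= last_day:
        for start, end in geohash_ranges:
            ranges.append((geotemporal_row(day, start), geotemporal_row(day, end)))
        day += timedelta(days=1)
    return ranges
```

A fixed-width day prefix keeps all keys of one day contiguous, so recent tweets for a map section can be read with a few short scans.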
Geotemporal Index for OSTMap
What about Hotspots?
• Partitioned by geohash, with time buckets and a spreading byte (sb) as key prefix
• spreading byte = hash(tweet) % 255
• Reproducible: the same tweet always maps to the same spreading byte
• Pre-split tables in Accumulo
Sorted row keys (spreading byte, day, geohash), distributed over node 0, node 1, node 2, … node n:
… 01d2 01d3 01d4 … 02d2 02d3 02d4 …
… 11d2 11d3 11d4 … 12d2 12d3 12d4 …
… 21d2 21d3 21d4 … 22d2 22d3 22d4 …
Row | CF | CQ
sb, day, geohash | RawTweetKey | lat/lon
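The spreading byte can be sketched as follows. The modulus 255 is taken from the slide; the hash function (CRC32), the string encoding of day and geohash, one split point per spreading byte value, and all function names are assumptions. Python's built-in `hash()` is randomized per process for strings, so a stable hash is used to keep the byte reproducible.

```python
import zlib

SPREAD = 255  # modulus from the slide: hash(tweet) % 255


def spreading_byte(tweet_json: str) -> int:
    """Reproducible spreading byte derived from the tweet content."""
    return zlib.crc32(tweet_json.encode("utf-8")) % SPREAD


def spread_row(tweet_json: str, day: str, geohash: str) -> bytes:
    """Row key = spreading byte, then day, then geohash."""
    return bytes([spreading_byte(tweet_json)]) + (day + geohash).encode("ascii")


def split_points():
    """Candidate pre-split points: one boundary per spreading byte value."""
    return [bytes([sb]) for sb in range(1, SPREAD)]


def fan_out(day_geohash_ranges):
    """Repeat each (start, end) range for every spreading byte value.

    A query does not know which spreading byte a matching tweet received,
    so it has to scan the same day/geohash range under every prefix.
    """
    return [
        (bytes([sb]) + start.encode("ascii"), bytes([sb]) + end.encode("ascii"))
        for sb in range(SPREAD)
        for start, end in day_geohash_ranges
    ]
```

Writes for a popular region and day are thereby spread over many tablets, while each query pays with more, but parallelizable, scans.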
Demo
Thank you
Martin Grimmer, grimmer[at]informatik.uni-leipzig.de
Matthias Kricke, kricke[at]informatik.uni-leipzig.de
Michael Schmeißer, michael.schmeisser[at]mgm-tp.com
www.mgm-tp.com
www.scads.de

Building a real time Tweet map with Flink in six weeks

  • 1. Building a real time Tweet map with Flink in six weeks OSTMap Fast poc development with flink
  • 2. Proof of concept - an important tool in the industry • PoC often necessary to show feasibility to customers • touch several topics: • Scalability • Stream processing • Batch processing • Storage and querying of data • OSTMap as example PoC
  • 3. Goals for OSTMap • Increase trust into big data technologies on customer side • It is easy to build an application with current technologies • With almost no experience • Teach students big data technologies • Recruiting • Bring big data to the university • Build a real time application to view recent geotagged tweets on a map • Search for terms and users, show these tweets on a map • Analytics: • First data science jobs • …
  • 4. Industry in practice: IT-Ringvorlesung 2016 • A course at the University of Leipzig. • work on projects of local companies • six students • over a period of 6 weeks - no full time invest • Weekly meetings • Github project: github.com/IIDP/OSTMap Nico Graebling Vincent Märkl Hans Dieter Pogrzeba Christopher SchottChristopher Rost Kevin Shrestha Michael Schmeißer Martin Grimmer Matthias Kricke OSTMap
  • 5. mgm technology partners We bring applications into production! • Innovative software solution provider with application responsibility • Specialist for highly scalable, transactional online applications • Central lines of business: Insurance, E-Commerce, E-Government • Founded in 1994 • 347 employees, 9 offices (2014) • Revenue: 43,7 Mio € (2014) • Part of Allgeier SE
  • 6. ScaDS Competence center for scalable data services and solutions Dresden/Leipzig • bundled Big Data research expertise of the TU Dresden and Leipzig University • Drive Big Data innovations • Bring industry and science together • Knowledge exchange and transfer
  • 7. Walking skeleton “A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.” - Alistair Cockburn • gif from http://blog.codeclimate.com/blog/2014/03/20/kickstart-your-next-project-with-a-walking-skeleton
  • 8. Milestone 1: read stream, store data as JSON file, show tweets, read data from JSON files
  • 9. Milestone 2: write to and read from Accumulo, show tweets on map, full table scans, slow visualization
  • 10. Milestone 3: term index, geotemporal index, UI improvements, clustering, …
  • 11. OSTMap – stream, batch, storage and querying geotagged tweets webservice a) stream processing b) batch processing c) querying data
  • 12. Stream processing of incoming data – first version • Operators in the diagram: GeoTweetSource, DateExtraction, KeyGeneration, RawTweetSink • Output: data for the time index • This enabled us to build a slow term search and a slow map search via full table scans.
  • 13. Stream processing of incoming data – final version • Operators in the diagram: GeoTweetSource, DateExtraction, KeyGeneration, RawTweetSink, TermExtraction (tokenizing), UserExtraction, TermIndexSink, LanguageExtraction, LanguageFrequencySink, GeoTemporalIndexCreation, GeoTemporalIndexSink • Outputs: data for the time index, term index, language statistics (1 minute window, sum by language) and geotemporal index • Now we were able to build a faster term and map search and a language frequency visualization.
  • 14. Batch processing • Initial creation of the term index and geotemporal index for already processed tweets • Data export • Other statistics like: • Area/tweet distance a user covers with their tweets
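  • 15. Storage • Accumulo table design, one entry per table:
      • RawTweetData (TimeIndex): Row = timestamp, hash (8b + 4b); Column Family = -; Column Qualifier = -; Value = raw tweet json
      • TermIndex: Row = term; Column Family = field (user, text); Column Qualifier = RawTweetData key (12b); Value = -
      • LanguageFrequency: Row = time bucket YYYYMMDDhhmm; Column Family = language-tag; Column Qualifier = -; Value = tweet count (4b)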
  • 16. Geotemporal Index for OSTMap • Geo index: geohashes of the geo data are used as row keys in Accumulo • Partitioned by geohash (Z curve), a function from the 2D coordinate space to the 1D key space • Grid of geohash cells (rows top to bottom): 9z db dc df dg / 9x d8 d9 dd de / 9r d2 d3 d6 d7 / 9p d0 d1 d4 d5 / 3z 6b 6c 6f 6g • Row keys in sorted order: … 3z 6b 6c 6f 6q 9p 9r 9x 9z d0 d1 d2 d3 d4 d5 d6 … dg • Key layout: Row = geohash, CF = RawTweetKey, CQ = -
  • 17. Geotemporal Index for OSTMap • Geo index – querying? • Partitioned by geohash (same grid as slide 16) • Query steps: take the bounding box, calculate the coverage of the bounding box, calculate scan ranges from the coverage (range: [9p], range: [9r], range: [d0,d1,d2,d3]), run Accumulo iterators on the ranges, return the result • Row keys in sorted order: … 3z 6b 6c 6f 6q 9p 9r 9x 9z d0 d1 d2 d3 d4 d5 d6 … dg • Key layout: Row = geohash, CF = RawTweetKey, CQ = lat/lon
  • 18. Geotemporal Index for OSTMap • Add some time! • Partitioned by geohash, with time buckets (axes: day, lon, lat; the grid of slide 16 repeated for day 1, day 2, …, day i) • Row keys for day 1: … 13z 16b 16c 16f 16q 19p 19r 19x 19z 1d0 1d1 1d2 1d3 1d4 1d5 1d6 … 1dg • Row keys for day 2: … 23z 26b 26c 26f 26q 29p 29r 29x 29z 2d0 2d1 2d2 2d3 2d4 2d5 2d6 … 2dg … • Key layout: Row = day, geohash; CF = RawTweetKey; CQ = lat/lon
  • 19. Geotemporal Index for OSTMap • What about Hotspots? • Partitioned by geohash, with time buckets (axes: day, lon, lat; the grid of slide 16 repeated per day) • Row keys: … 13z 16b 16c 16f 16q 19p 19r 19x 19z 1d0 1d1 1d2 1d3 1d4 1d5 1d6 … 1dg, then … 23z 26b 26c 26f 26q 29p 29r 29x 29z 2d0 2d1 2d2 2d3 2d4 2d5 2d6 … 2dg … • Key layout: Row = day, geohash; CF = RawTweetKey; CQ = lat/lon
  • 20. Geotemporal Index for OSTMap • What about Hotspots? • Partitioned by geohash, with time buckets (axes: day, lon, lat) • Key layout: Row = sb, day, geohash; CF = RawTweetKey; CQ = lat/lon, where sb is the spreading byte • Example row keys, spread over node 0, node 1, node 2, …, node n: … 01d2 01d3 01d4 …, … 02d2 02d3 02d4 …, … 11d2 11d3 11d4 …, … 12d2 12d3 12d4 …, … 21d2 21d3 21d4 …, … 22d2 22d3 22d4 … • spreading byte = hash(tweet) % 255 • Reproducible • Pre-split tables in Accumulo
  • 21. demo
  • 22. Thank you • Martin Grimmer: grimmer[at]informatik.uni-leipzig.de • Matthias Kricke: kricke[at]informatik.uni-leipzig.de • Michael Schmeißer: michael.schmeisser[at]mgm-tp.com • www.mgm-tp.com • www.scads.de
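
Slide 13 mentions a "1 minute window, sum by language" feeding the LanguageFrequency table, whose row is a time bucket in YYYYMMDDhhmm form (slide 15). The sketch below shows that aggregation in plain Python. It is not Flink code and not the OSTMap implementation: the tweet fields `timestamp` (epoch seconds) and `lang`, and the use of UTC for the bucket, are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def time_bucket(epoch_seconds):
    """Format a timestamp as a one-minute bucket, YYYYMMDDhhmm (UTC assumed)."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M")


def language_frequency(tweets):
    """Count tweets per (time bucket, language tag).

    Each key corresponds to one LanguageFrequency cell:
    row = time bucket, column family = language tag, value = tweet count.
    """
    counts = Counter()
    for tweet in tweets:
        # "timestamp" and "lang" are hypothetical field names.
        counts[(time_bucket(tweet["timestamp"]), tweet["lang"])] += 1
    return counts


sample = [
    {"timestamp": 0, "lang": "de"},
    {"timestamp": 30, "lang": "de"},
    {"timestamp": 30, "lang": "en"},
    {"timestamp": 60, "lang": "de"},
]
frequencies = language_frequency(sample)
```

Because the bucket is part of the row key, all counts for one minute sort next to each other, so a time range of the language statistics can be read with a single range scan.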
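
The table design on slide 15 can be made concrete with a small sketch of how a RawTweetData key and the matching TermIndex entries might be built. Several details are assumptions, not taken from the slides: "8b + 4b" is read as 8 bytes of timestamp plus 4 bytes of hash, CRC32 stands in for the unspecified hash function, and tokenizing is reduced to lowercasing and splitting on whitespace.

```python
import struct
import zlib


def raw_tweet_key(timestamp_ms, raw_json):
    """Build a 12-byte key: 8-byte timestamp followed by a 4-byte hash.

    CRC32 is an assumed stand-in for whatever hash the project uses.
    """
    tweet_hash = zlib.crc32(raw_json.encode("utf-8"))
    return struct.pack(">qI", timestamp_ms, tweet_hash)


def term_index_entries(user, text, key):
    """Return (row, column family, column qualifier) triples for the TermIndex.

    Row = term, column family = field ("user" or "text"),
    column qualifier = RawTweetData key. The value stays empty.
    """
    entries = [(user.lower(), "user", key)]
    # Hypothetical tokenizer: lowercase and split on whitespace.
    for term in sorted(set(text.lower().split())):
        entries.append((term, "text", key))
    return entries


key = raw_tweet_key(1462060800000, '{"text": "Hello Flink hello"}')
entries = term_index_entries("Alice", "Hello Flink hello", key)
```

A term search then becomes a lookup of one TermIndex row, followed by fetching the referenced RawTweetData rows, instead of a full table scan.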
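
The query steps on slide 17 (bounding box, coverage, scan ranges) can be sketched with the standard geohash encoding. This is an illustration under assumptions, not the OSTMap code: the function names are hypothetical, the precision of two characters is chosen to match the slide's labels, and the cell names a real geohash yields need not match the schematic grid drawn on the slides.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Standard geohash: interleave longitude and latitude bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, value, use_lon = [], 0, True
    for bit in range(precision * 5):
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            upper = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if upper else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            upper = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if upper else (lat_lo, mid)
        value = value * 2 + (1 if upper else 0)
        use_lon = not use_lon
        if bit % 5 == 4:
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def cover(min_lat, min_lon, max_lat, max_lon, precision):
    """Sorted geohash cells of the given precision that intersect the box."""
    bits = precision * 5
    lon_cells, lat_cells = 2 ** ((bits + 1) // 2), 2 ** (bits // 2)
    width, height = 360.0 / lon_cells, 180.0 / lat_cells

    def index(value, origin, size, count):
        return min(count - 1, max(0, int((value - origin) // size)))

    cells = set()
    for i in range(index(min_lon, -180.0, width, lon_cells),
                   index(max_lon, -180.0, width, lon_cells) + 1):
        for j in range(index(min_lat, -90.0, height, lat_cells),
                       index(max_lat, -90.0, height, lat_cells) + 1):
            # Encode the cell centre to get the cell's geohash.
            cells.add(encode(-90.0 + (j + 0.5) * height,
                             -180.0 + (i + 0.5) * width, precision))
    return sorted(cells)


def to_int(geohash):
    """Position of a geohash in key order."""
    number = 0
    for char in geohash:
        number = number * 32 + BASE32.index(char)
    return number


def scan_ranges(cells):
    """Merge cells that are neighbours in key order into (start, end) ranges."""
    ranges = []
    for cell in cells:
        if ranges and to_int(cell) == to_int(ranges[-1][1]) + 1:
            ranges[-1] = (ranges[-1][0], cell)
        else:
            ranges.append((cell, cell))
    return ranges


cells = cover(1.0, -100.0, 10.0, -70.0, 2)
ranges = scan_ranges(cells)
```

Merging neighbouring keys keeps the number of scans small: here six covered cells collapse into two ranges, in the same way the slide collapses d0, d1, d2 and d3 into one range.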
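
Slide 20 prefixes the geotemporal row key with a spreading byte, `hash(tweet) % 255`, so that tweets from one hot geohash cell land on different nodes. A minimal sketch of that key layout follows. The byte widths (1 byte spreading byte, 2 bytes day number), the use of CRC32 as a reproducible hash, and the function names are assumptions made for illustration; the slides only give the order sb, day, geohash.

```python
import struct
import zlib


def spreading_byte(raw_tweet):
    """Reproducible spreading byte in the range 0..254.

    CRC32 is an assumed hash; Python's built-in hash() is salted per
    process for strings and would not be reproducible.
    """
    return zlib.crc32(raw_tweet.encode("utf-8")) % 255


def row_key(raw_tweet, day, geohash):
    """Row = spreading byte, day, geohash (widths are assumptions)."""
    prefix = struct.pack(">BH", spreading_byte(raw_tweet), day)
    return prefix + geohash.encode("ascii")


def scan_prefixes(day, geohash):
    """All row prefixes a reader has to scan for one day and geohash.

    The spreading byte depends on the tweet content, so a query cannot
    know it in advance and has to fan out over every possible value.
    """
    return [struct.pack(">BH", sb, day) + geohash.encode("ascii")
            for sb in range(255)]


def split_points():
    """One candidate table split per spreading byte boundary."""
    return [bytes([sb]) for sb in range(1, 255)]


key = row_key('{"id": 1, "text": "hello"}', 16950, "d2")
prefixes = scan_prefixes(16950, "d2")
```

The trade-off is visible in `scan_prefixes`: writes are spread evenly, but every geotemporal query turns into up to 255 scans per range, which is why the scans are worth running in parallel.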
