A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam, Hadoop Summit Brussels, 15 April 2015 © Copyright - Port of Rotterdam
Frank Cremer (Geomatik) Mansour Raad (ESRI)
A Hadoop-enabled Ship Tracking
Application for the Port of Rotterdam
Hadoop Summit, Brussels, 15 April 2015
Rue des Bouchers
Where are the ships?
AIS = Automatic Identification System
Radar and control station
VTS = Vessel Traffic Service
Port Challenges
The Port should become smarter, faster and more sustainable
Allard Castelein
CEO Port of Rotterdam
Access information in three clicks
Presentation Highlights
• Geospatial data set
• The only geospatial presentation at this summit
• Sensor data
• Structured, but not flawless
• Easy access to Hadoop functionality
• Retrieving information in three clicks
• Users can be agnostic about Hadoop
About the speakers
• Mansour Raad
• Big Data advocate
• Senior Software Architect
• ESRI – World’s largest GIS company
• GIS = Geographical Information System
• Frank Cremer
• Independent geospatial and big data consultant
• Consulting for the Port since 2008
• Geomatik
Port of Rotterdam: the facts
• 8th largest port in the world
• Largest port in Europe
• Total area: 12,600 ha
• Depth: 24 metres
• Quay length: 70.5 km
Maasvlakte 2
Port of Rotterdam in figures (1 year)
• 35,000 ship visits with 400 million tonnes of cargo
• 80,000 barge visits
• 7,500,000 trucks (25,000 per day)
[Chart: modal split over Road, Barge, Railway and Pipelines; values shown: 28%, 48%, 7%, 17%]
Over 40 kilometers
Usage of ship position data
• Harbour master
• Incident analysis
• Safety checks
• Capacity management
• Identifying bottlenecks
• Planning decision support
• Environmental management
• Pollution (NOx) calculations
• Speed measures to reduce pollution
PortMaps project
• New geographical information system
• Deployed in partnership with ESRI
• Key characteristics:
• One uniform source of data
• Easy access
• Ship position data
Is ship position data Big Data?
• Volume: 5 Terabytes (since 2009), 1 Terabyte per year
• Variety: single data format (csv)
• Velocity: >1,000 records every 10 sec
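As a rough cross-check of the velocity figure, the arithmetic below scales "more than 1,000 records every 10 seconds" to a full day. The constant and function names are illustrative only, and 1,000 is treated as a lower bound.

```python
# Back-of-the-envelope check of the ingest rate quoted on the slide.
RECORDS_PER_BATCH = 1_000        # lower bound: ">1,000 records"
BATCH_INTERVAL_S = 10            # one batch every 10 seconds
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(records_per_batch: int = RECORDS_PER_BATCH,
                    interval_s: int = BATCH_INTERVAL_S) -> int:
    """Scale a per-batch record count to a full day."""
    batches_per_day = SECONDS_PER_DAY // interval_s
    return records_per_batch * batches_per_day


print(records_per_day())  # 8640000 at exactly 1,000 records per batch
```

At the lower bound this gives about 8.6 million records per day, the same order of magnitude as the 10 million records per day quoted for PortMaps.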
PortMaps: Ship position data
• Challenges:
• Receiving data every 10 s
• 10 million records per day
• Considered options:
• Geospatial database: possible, but
• Expensive
• Custom partitioning required
• Analyses could be a challenge
• Hadoop
• Commodity hardware
• Built for huge data sets (Petabytes)
• Framework for analyses
Simplified architecture
How it all got started…
ESRI.NL ad hoc cluster
• ESRI.NL hardware
• 4 x Dell R200
• 2 x 1 TB hard disks
• 1 x 4 cores
• 16 GB RAM
• CentOS
• Installed CDH4
• Handed two 1 TB USB drives
• 2 days to bulk load 2.5 TB of data
Dataset
N|3270|N|550|N|-14927|441077|1|N|1|N|0|N|194|N|N||||||||||||||2231|01-03-2015 01:00:04|
J|N|N|N|N|||||5155.929,N|00254.968,E|||01-03-2015 00:00:04|A|R|N|ORE SALVADOR|
N|D5DO9|N|179|N|5|N|0|N|-5|N|188|N|-3|N|209|N|NL RTM|N|23-02-2015 17:45:00|N|
Voor anker|N|9607045|N|636015935|N|Klasse A|N|Vracht|N|5155.995,N|N|00254.979,E|
N|-14911|N|441197|N|3270|N|550|N||N||
• Track number
• MMSI
• X
• Y
• Navigational status: anchored
• Length (x 0.1 m)
• Width (x 0.1 m)
• Time (UTC)
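The sample record appears to carry positions in a degrees-and-decimal-minutes form (for example `5155.929,N`) and sizes in tenths of a metre. The sketch below converts both, assuming that NMEA-style `(d)ddmm.mmm` layout, which the slides do not confirm. The function names are hypothetical, and the values 3270 and 550 are assumed to be the length and width fields.

```python
def ddm_to_decimal(value: str, hemisphere: str) -> float:
    """Convert a (d)ddmm.mmm string plus hemisphere letter to decimal degrees.

    Assumes the last two digits before the decimal point are whole minutes
    and everything before them is whole degrees. This layout is an
    assumption about the data set, not something the slides state.
    """
    whole, _, frac = value.partition(".")
    degrees = int(whole[:-2])
    minutes = float(f"{whole[-2:]}.{frac or '0'}")
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere.upper() in ("S", "W") else decimal


def tenths_to_metres(raw: str) -> float:
    """Scale a 'x 0.1 m' field such as length or width to metres."""
    return int(raw) / 10.0


lat = ddm_to_decimal("5155.929", "N")    # roughly 51.93 degrees north
lon = ddm_to_decimal("00254.968", "E")   # roughly 2.92 degrees east
length_m = tenths_to_metres("3270")      # 327.0
width_m = tenths_to_metres("550")        # 55.0
```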
Dataset storage
• Considerations:
• Hadoop prefers large files (64 MB or more)
• User selection by date/time
• Implementation:
• Partitioned by year, month, day and hour
• Separate directory for each partition, e.g.
/…/year=2015/month=4/day=15/hour=14
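The partition layout can be sketched as a small helper that maps a record timestamp to its hour-level directory. The base path `/data/ais` is a placeholder (the real prefix is elided on the slide), the function names are hypothetical, and the unpadded numbers follow the slide's own example.

```python
from datetime import datetime


def parse_record_time(text: str) -> datetime:
    """Parse the dd-MM-yyyy HH:mm:ss timestamps seen in the sample record."""
    return datetime.strptime(text, "%d-%m-%Y %H:%M:%S")


def partition_dir(base: str, ts: datetime) -> str:
    """Return the year/month/day/hour partition directory for a timestamp."""
    return (f"{base.rstrip('/')}/year={ts.year}/month={ts.month}"
            f"/day={ts.day}/hour={ts.hour}")


ts = parse_record_time("15-04-2015 14:05:00")
print(partition_dir("/data/ais", ts))
# /data/ais/year=2015/month=4/day=15/hour=14
```

With this layout, a user selection by date and time only has to read the directories whose partition values fall inside the requested window.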
Production Hadoop cluster
• Using Hadoop as a service
• Based on Hortonworks Data Platform 2.1
• Provided by KPN
• 4 data nodes
• 12 CPUs
• 96 GB memory
• 3 x 4 TB disk
• Running as virtual machines
• No shared disks!
Flume configuration
• Extract time stamp
• Select the 45th field:
• Input: ^(?:[^|]*\|){44}([^|]*)
• Output: dd-MM-yyyy HH:mm:ss
• Custom serializer:
• Implemented in Java
• Outputs only selected fields
• Configuration: sink1.serializer = com.esri.serializer.AramisSerializer$Builder
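To illustrate the time-stamp extraction, the sketch below pulls the 45th pipe-delimited field out of a record and parses it. It uses Python's `re` module rather than Flume's Java regex engine, writes the pattern with the delimiter escaped and the whole field captured, and runs on a synthetic record; none of this is the production interceptor.

```python
import re
from datetime import datetime
from typing import Optional

# Skip 44 pipe-terminated fields, then capture the 45th field in full.
FIELD_45 = re.compile(r"^(?:[^|]*\|){44}([^|]*)")


def extract_timestamp(record: str) -> Optional[datetime]:
    """Return the 45th field parsed as dd-MM-yyyy HH:mm:ss, or None."""
    match = FIELD_45.match(record)
    if match is None:
        return None  # fewer than 45 fields
    try:
        return datetime.strptime(match.group(1), "%d-%m-%Y %H:%M:%S")
    except ValueError:
        return None  # field present but not a valid timestamp


# Synthetic record: 44 placeholder fields followed by a timestamp.
record = "|".join(["x"] * 44 + ["01-03-2015 00:00:04", "A", "R"])
print(extract_timestamp(record))  # 2015-03-01 00:00:04
```

Returning `None` for short or malformed records keeps one flawed sensor line from stopping the whole ingest.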
Developed tools
PolyTrackTool
LineTool & LineStatTool
DensityTool
SpeedTool
Custom ArcGIS Java Toolbox
• ArcGIS is Java-based
• Limited ArcObjects Java API examples
• ArcGIS server toolbox
• Almost same as client toolbox
• Certain functionality not available (getmap)
• FeatureSet
• Allows users to draw geographical inputs
• Unit tests
• Test functionality before deploying
Access from browser (WebMap)
The challenge of counting
Results: Passages (LineTool)
• Large job (LineTool)
• Passages of 55 lines
• Full year of data
• Results
• Takes 1 hour on the cluster
• versus 1 week on a PC
• with 6 times more data!
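One plausible way to count passages is to test each pair of consecutive track positions against the counting line with a segment-intersection check. The sketch below does that on plain coordinate tuples. It is an illustration under that assumption, not the LineTool implementation, and it ignores tracks that merely touch the line or jitter back and forth across it.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive if c is left of a->b)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 properly crosses segment q1-q2."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_passages(track: Sequence[Point], line: Tuple[Point, Point]) -> int:
    """Count how often consecutive track positions cross the counting line."""
    return sum(
        segments_cross(a, b, line[0], line[1])
        for a, b in zip(track, track[1:])
    )


# A track that crosses the line x = 0 eastbound, then westbound.
track = [(-2, 0), (-1, 0), (1, 0), (2, 0.5), (-1, 0.5)]
print(count_passages(track, ((0, -1), (0, 1))))  # 2
```

The track has to be ordered by time per ship before counting, so in a MapReduce setting the positions would typically be grouped by track number or MMSI first.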
Density Tool
• Number of observations per grid cell
• Output: centre point & population
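The density calculation can be sketched as binning each observation into a square grid cell and counting per cell. The cell size, the function name and the use of projected coordinates in metres are assumptions for illustration; on the cluster the same idea would map each position to a cell key and sum the counts per key.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def density(points: Iterable[Tuple[float, float]],
            cell_size: float) -> List[Tuple[float, float, int]]:
    """Count observations per square grid cell.

    Returns one (centre_x, centre_y, population) tuple per non-empty cell,
    sorted by cell index.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return [
        ((col + 0.5) * cell_size, (row + 0.5) * cell_size, n)
        for (col, row), n in sorted(counts.items())
    ]


observations = [(-14927, 441077), (-14911, 441197), (-14990, 441010)]
for centre_x, centre_y, population in density(observations, cell_size=100):
    print(centre_x, centre_y, population)
# -14950.0 441050.0 2
# -14950.0 441150.0 1
```

Floor division keeps negative coordinates in the correct cell, which matters here because the sample X values are negative.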
Implementation challenges
Challenge
• Performance
• Performance of YARN versus MR v1
• Connectivity
• No access through the firewall to the application master
• Flume
• Too slow when there are too many files in the CIFS spool directory
Solution
• Performance
• # containers per node = # cores
• More reducers for bigger jobs
• Connectivity
• Submit the job and poll the resource manager
• Flume
• Spoon-feed Flume by limiting the maximum number of files
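The "submit and poll" workaround could look roughly like the sketch below, which polls the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps/{appid}` until the application reaches a terminal state. The host name, polling interval and function names are assumptions; the fetch step is passed in as a callable so the loop can be exercised without a cluster.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def fetch_app(rm_url: str, app_id: str) -> Dict:
    """GET the application report from the ResourceManager REST API."""
    url = f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["app"]


def wait_for_app(app_id: str,
                 fetch: Callable[[str], Dict],
                 poll_s: float = 10.0,
                 max_polls: int = 360) -> Dict:
    """Poll until the application reaches a terminal state or polls run out."""
    for _ in range(max_polls):
        app = fetch(app_id)
        if app["state"] in TERMINAL_STATES:
            return app
        time.sleep(poll_s)
    raise TimeoutError(f"{app_id} still running after {max_polls} polls")
```

Against a real cluster the call might be `wait_for_app(app_id, lambda a: fetch_app("http://rm.example:8088", a))`, where `rm.example` is a placeholder host. Only the ResourceManager needs to be reachable through the firewall, not the per-job application master.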
Future work
• Using Spark instead of MapReduce jobs
• Faster and potentially real time
• Easier to develop
• Using Python for interfacing with ArcMap
• Easier development
• Better supported / documented
• Having a web service on the Hadoop cluster
• Easier connectivity
• Spring framework for easy development
Technical conclusions
• Geospatial application for Hadoop in production
• Integrated within the GIS system
• Easy to use for end users
Benefits
• User testimonials
• René Kronieger: “I can now obtain faster access to the information I need”
• Eric van Andel: “I can provide results more often”
• Bob van Hell: “I get my result with existing GIS tools”
• Our Hadoop solution:
• provides better insight into Port usage – smarter
• provides results more often – faster
• enables pollution calculation – more sustainable
Allard Castelein
CEO Port of Rotterdam
Questions?
• How geospatial is your data set?
• Can you use our approach?
• For further information and questions please contact:
• Frank Cremer f.cremer@portofrotterdam.com or frank@geomatik.nl
• Mansour Raad mraad@esri.com

 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 

Recently uploaded (20)

Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
QMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdfQMMS Lesson 2 - Using MS Excel Formula.pdf
QMMS Lesson 2 - Using MS Excel Formula.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam

  • 1. A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam. Frank Cremer (Geomatik), Mansour Raad (ESRI). Hadoop Summit, Brussels, 15 April 2015. © Copyright - Port of Rotterdam
  • 2. Rue des Bouchers
  • 3. Where are the ships?
      • AIS = Automatic Identification System
  • 4. Radar and control station
      • VTS = Vessel Traffic Service
  • 5. Port Challenges
      • The Port should become smarter, faster and more sustainable (Allard Castelein, CEO Port of Rotterdam)
  • 6. Access information in three clicks
  • 7. Presentation Highlights
      • Geospatial data set
      • Only geospatial presentation this summit
      • Sensor data
      • Structured, but not flawless
      • Easy access to Hadoop functionality
      • Retrieving information in three clicks
      • Users can be agnostic about Hadoop
  • 8. About the speakers
      • Mansour Raad
          • BigData advocate
          • Senior Software Architect
          • ESRI – World’s largest GIS company
          • GIS = Geographical Information System
      • Frank Cremer
          • Independent Geospatial and big data Consultant
          • Consulting for the Port since 2008
          • Geomatik
  • 9. Port of Rotterdam: the facts
      • 8th largest port in the world
      • Largest port of Europe
      • Total area: 12,600 ha
      • Depth: 24 m
      • 70.5 km quay length
      • Maasvlakte 2
  • 10. Port of Rotterdam in figures (1 year)
      • 35,000 ship visits with 400 million tonnes of cargo
      • 80,000 barge visits
      • 7,500,000 trucks (25,000 per day)
      • Chart values: 28%, 48%, 7%, 17%; legend: Road, Barge, Railway, Pipe lines
      • Over 40 kilometers
  • 11. Usage of ship position data
      • Harbour master
          • Incident analysis
          • Safety checks
      • Capacity management
          • Identifying bottlenecks
          • Planning decision support
      • Environmental management
          • Pollution (NOx) calculations
          • Speed measures to reduce pollution
  • 12. PortMaps project
      • New geographical information system
      • Deployed in partnership with ESRI
      • Key characteristics:
          • One uniform source of data
          • Easy access
          • Ship position data
  • 13. Is ship position data Big Data?
      • Volume: 5 Terabytes (since 2009); 1 Terabyte per year
      • Velocity: >1,000 records every 10 sec
      • Variety: single data format (csv)
  • 14. Portmaps: Ship position data
      • Challenges:
          • Receiving data every 10 s
          • 10 million records per day
      • Considered options:
          • Geospatial database; possible but
              • Expensive
              • Custom partitioning required
              • Analyses could be a challenge
          • Hadoop
              • Commodity hardware
              • Built for huge data sets (Petabytes)
              • Framework for analyses
  • 15. Simplified architecture
  • 16. How it all got started… ESRI.NL ad hoc cluster
      • ESRI.NL hardware
      • 4 x R200 Dell
      • 2 x 1 TB hard disks
      • 1 x 4 cores
      • 16 GB RAM
      • CentOS
      • Installed CDH4
      • Handed 2 x 1 TB USB drives
      • 2 days to bulk load 2.5 TB of data
  • 17. Dataset
      • Example record: N|3270|N|550|N|-14927|441077|1|N|1|N|0|N|194|N|N||||||||||||||2231|01-03-2015 01:00:04| J|N|N|N|N|||||5155.929,N|00254.968,E|||01-03-2015 00:00:04|A|R|N|ORE SALVADOR| N|D5DO9|N|179|N|5|N|0|N|-5|N|188|N|-3|N|209|N|NL RTM|N|23-02-2015 17:45:00|N| Voor anker|N|9607045|N|636015935|N|Klasse A|N|Vracht|N|5155.995,N|N|00254.979,E| N|-14911|N|441197|N|3270|N|550|N||N||
      • Track number
      • MMSI
      • X
      • Y
      • Navigational status: anchored
      • Length (x 0.1 m)
      • Width (x 0.1 m)
      • Time (UTC)
  • 18. Dataset storage
      • Considerations:
          • Hadoop prefers large files (64 MB or more)
          • User selection by date/time
      • Implementation:
          • Partitioned by year, month, day and hour
          • Separate directory for each partition, e.g. /…/year=2015/month=4/day=15/hour=14
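The hourly directory layout on slide 18 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the root `/data/ais` is a hypothetical stand-in for the elided `/…/` prefix, and the function names are invented for this example.

```python
from datetime import datetime, timedelta
from typing import List


def partition_path(root: str, ts: datetime) -> str:
    """Return the hourly partition directory for a timestamp (no zero padding)."""
    return f"{root}/year={ts.year}/month={ts.month}/day={ts.day}/hour={ts.hour}"


def partitions_between(root: str, start: datetime, end: datetime) -> List[str]:
    """List every hourly partition touched by the half-open range [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)
    paths = []
    while current < end:
        paths.append(partition_path(root, current))
        current += timedelta(hours=1)
    return paths


# Hypothetical root directory; the slide elides the real prefix.
ROOT = "/data/ais"
example = partition_path(ROOT, datetime(2015, 4, 15, 14, 30))
full_day = partitions_between(ROOT, datetime(2015, 4, 15), datetime(2015, 4, 16))
```

A user selection by date and time then maps directly onto a list of directories; a full day resolves to 24 hourly partitions.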
  • 19. Production Hadoop cluster
      • Using Hadoop as a service
      • Based on Hortonworks Data Platform 2.1
      • Provided by KPN
      • 4 data nodes
      • 12 CPUs
      • 96 GB memory
      • 3 x 4 TB disk
      • Running as virtual machines
      • No shared disks!
  • 20. Flume configuration
      • Extract time stamp
          • Select the 45th field:
          • Input: ^(?:[^|]*\|){44}([^|]*)
          • Output: dd-MM-yyyy HH:mm:ss
      • Custom serializer:
          • Implemented in Java
          • Outputs only selected fields
          • Configuration: sink1.serializer = com.esri.serializer.AramisSerializer$Builder
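The time stamp extraction on slide 20 can be tried outside Flume with a short Python sketch. It assumes the pattern is meant to skip 44 pipe-terminated fields (with the pipe escaped) and capture the whole 45th field, and that the date layout `dd-MM-yyyy HH:mm:ss` corresponds to `%d-%m-%Y %H:%M:%S` in Python. The record below is synthetic, and the function name is invented for this example.

```python
import re
from datetime import datetime

# Skip 44 pipe-terminated fields, then capture the 45th field.
FIELD_45 = re.compile(r"^(?:[^|]*\|){44}([^|]*)")
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M:%S"  # Python equivalent of dd-MM-yyyy HH:mm:ss


def extract_timestamp(record: str) -> datetime:
    """Parse the 45th pipe-separated field of a record as a time stamp."""
    match = FIELD_45.match(record)
    if match is None:
        raise ValueError("record has fewer than 45 fields")
    return datetime.strptime(match.group(1).strip(), TIMESTAMP_FORMAT)


# Synthetic record: 44 placeholder fields, then the time stamp, then two more fields.
sample = "|".join(["N"] * 44 + ["01-03-2015 00:00:04", "A", "R"])
parsed = extract_timestamp(sample)
```

Anchoring the pattern at the start of the line and counting separators avoids splitting the whole record when only one field is needed for partitioning.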
  • 21. Developed tools
      • PolyTrackTool
      • LineTool & LineStatTool
      • DensityTool
      • SpeedTool
  • 22. Custom ArcGIS Java Toolbox
      • ArcGIS is Java-based
          • Limited ArcObjects Java API examples
      • ArcGIS server toolbox
          • Almost same as client toolbox
          • Certain functionality not available (getmap)
      • FeatureSet
          • Allowing to draw geographical inputs
      • Unit test
          • Test functionality before deploying
  • 23. Access from browser (WebMap)
  • 24. The challenge of counting
  • 25. Results: Passages (LineTool)
      • Large job (LineTool)
          • Passages of 55 lines
          • Full year of data
      • Results
          • Takes 1 hour on the cluster
          • versus 1 week on a PC
          • with 6 times more data!
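The slides do not say how the LineTool decides that a ship has passed a line. One common approach, shown here purely as an assumption, is to test each pair of consecutive positions of a ship against the counting line with a segment intersection test. All names below are hypothetical, and the track is assumed to be one ship's positions sorted by time.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orientation(a: Point, b: Point, c: Point) -> float:
    """Cross product of (b - a) and (c - a); the sign tells on which side c lies."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True when segment p1-p2 properly crosses segment q1-q2.

    Touching at an end point or overlapping collinear segments is not counted.
    """
    d1 = _orientation(q1, q2, p1)
    d2 = _orientation(q1, q2, p2)
    d3 = _orientation(p1, p2, q1)
    d4 = _orientation(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_passages(track: List[Point], line: Tuple[Point, Point]) -> int:
    """Count how often a time-ordered track crosses the counting line."""
    return sum(
        segments_cross(a, b, line[0], line[1]) for a, b in zip(track, track[1:])
    )


counting_line = ((-1.0, 0.0), (1.0, 0.0))
there_and_back = [(0.0, -1.0), (0.0, 1.0), (0.0, -1.0)]
passes_beside = [(5.0, -1.0), (5.0, 1.0)]
```

Because each ship's track can be processed on its own, a job of this shape is easy to spread over many lines and many ships in parallel.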
  • 26. Density Tool
      • Number of observations per grid cell
      • Output: centre point & population
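The density calculation described on slide 26 (observations per grid cell, reported as centre point and population) can be sketched as follows. This is a single-machine illustration, not the DensityTool itself; the function name and the square-cell assumption are mine.

```python
from collections import Counter
from math import floor
from typing import Dict, Iterable, Tuple


def density(points: Iterable[Tuple[float, float]], cell_size: float) -> Dict[Tuple[float, float], int]:
    """Count observations per square grid cell, keyed by the cell's centre point."""
    # floor() keeps negative coordinates in the correct cell, unlike int() truncation.
    counts = Counter(
        (floor(x / cell_size), floor(y / cell_size)) for x, y in points
    )
    return {
        ((col + 0.5) * cell_size, (row + 0.5) * cell_size): population
        for (col, row), population in counts.items()
    }


observations = [(10, 10), (20, 30), (150, 10), (-5, 5)]
cells = density(observations, cell_size=100)
```

In a MapReduce setting the cell index would be a natural map output key, with the reducer summing the counts per cell.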
  • 27. Implementation challenges
      • Performance
          • Challenge: Performance YARN – MR v1
          • Solution: # containers per node = # cores; more reducers for bigger jobs
      • Connectivity
          • Challenge: No access through firewall to application master
          • Solution: Submit job and poll resource manager
      • Flume
          • Challenge: Too slow when too many files in CIFS spool directory
          • Solution: Spoon-feed Flume by limiting max number of files
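The connectivity workaround on slide 27 (submit the job, then poll the resource manager instead of talking to the application master) might look like the sketch below. It relies on the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps/{appid}`; the host name, port and application id are placeholders, and the function names are invented for this example.

```python
import json
import time
from urllib.request import urlopen

TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def fetch_app(rm_url: str, app_id: str) -> dict:
    """Read one application report from the ResourceManager REST API."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}") as response:
        return json.load(response)["app"]


def wait_for_job(rm_url, app_id, fetch=fetch_app, interval=10.0, sleep=time.sleep):
    """Poll until the application reaches a terminal state; return its final status."""
    while True:
        app = fetch(rm_url, app_id)
        if app["state"] in TERMINAL_STATES:
            return app["finalStatus"]
        sleep(interval)


# Hypothetical usage against a live cluster:
# status = wait_for_job("http://rm.example:8088", "application_1428000000000_0001")
```

Passing `fetch` and `sleep` as parameters keeps the polling loop testable without a cluster, and only the resource manager port has to be reachable through the firewall.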
  • 28. Future work
      • Using Spark instead of MapReduce jobs
          • Faster and potentially real time
          • Easier in development
      • Using Python for interfacing with ArcMap
          • Easier development
          • Better supported / documented
      • Having a web service at the Hadoop cluster
          • Easier for connectivity
          • Spring framework for easy development
  • 29. Technical conclusions
      • Geospatial application for Hadoop in production
      • Integrated within the GIS system
      • Easy to use for end users
  • 30. Benefits
      • User testimonials
          • René Kronieger: “I can now obtain faster access to the information I need”
          • Eric van Andel: “I can provide results more often”
          • Bob van Hell: “I get my result with existing GIS tools”
      • Our Hadoop solution:
          • provides better insight in Port usage – smarter
          • provides results more often – faster
          • enables pollution calculation – more sustainable
      • Allard Castelein, CEO Port of Rotterdam
  • 31. Questions?
      • How geospatial is your data set?
      • Can you use our approach?
      • For further information and questions please contact:
          • Frank Cremer: f.cremer@portofrotterdam.com or frank@geomatik.nl
          • Mansour Raad: mraad@esri.com

Editor's Notes

  1. For the Port of Rotterdam it is of course important to know where all the ships are! One of the primary roles of the Port of Rotterdam Authorities is to guide ships safely to their destinations. Each ship has an AIS transponder that continuously transmits the current location of the ship. AIS stands for Automatic Identification System, so not only the location is broadcast but also the ID of the ship (amongst other parameters). That is one way in which the Port of Rotterdam “knows” where the ships are.
  2. Throughout the Port, radar stations continuously scan for ships. This information, together with the AIS information, is passed to the Vessel Traffic Service (VTS) operators. The VTS operators are located in several control stations throughout the port and are responsible for managing the shipping in real time.
  3. As ships become bigger and bigger, transporting the goods onward poses a challenge. This container ship can carry up to 18,000 containers, the equivalent of 125 million pairs of shoes. Therefore the CEO of the Port of Rotterdam has said that the Port should become smarter, faster and more sustainable. The way to do that is to innovate, and this project contributes to that.
  4. This animated slide gives an offline demonstration.
  5. Where does our presentation stand out? First, our presentation deals with a geospatial data set, which means data where location is important. Despite location being omnipresent, very few Hadoop applications deal with this kind of data; as far as I can tell this might be the only such presentation in the summit. Second, we deal with sensor data, meaning observations and measurements. We are obviously not the only ones doing that, although most Hadoop applications deal with social and/or transaction data. As with any measurement, errors can and will occur. Special care is needed to take that into account in the analysis. Third, we’ve created an interface that allows end users easy access to the information obtained from big data. I’ll demonstrate that in the presentation.
  6. Here are some facts about the Port of Rotterdam. Its area is about three times that of the city of Brussels, or 80% of the Brussels region.
  7. You may have heard about the Port of Rotterdam. It’s one of the biggest in the world. In terms of size, it’s big: it stretches for more than 40 km. It takes up to 4 hours to sail from one end to the other.
  8. We store all this data for three main customers. The Harbour Master’s main interest is in safety. They use the tool for incidents. For example, when there is a collision, they would like to know what happened. They of course would like to prevent this from happening, so they want to see how the harbour is used and identify possible safety concerns. The second group, capacity management, wants to ensure quick and easy passage of goods through the harbour. They are interested in identifying bottlenecks by looking at traffic patterns. Furthermore, they are interested in how current traffic patterns may alter if certain changes are made, such as widening of channels. This enables better decision making. The third group, environmental management, is interested in the pollution effect of the shipping. They are also evaluating speed measures that are put in place to reduce the pollution.
  9. The big data work here is part of the Portmaps project. In this project the Port of Rotterdam has implemented a new geographical information system that provides one uniform source of data.
  10. All this data about ships and their location: is it big data? And does it make sense to use Hadoop for it? Let us look at the big data scorecard. For big data three key characteristics are important: volume, velocity and variety. The data has a reasonable volume. It comes in at quite a high velocity, at over 1,000 records every 10 seconds. It has only a single data format, so it doesn’t meet the variety characteristic. However, it meets the other two characteristics, so it is big data and it does make sense to use Hadoop for it. Volume = 18 billion records since 2009; that is roughly three times the number of people in the world. Velocity = during this presentation 250,000 records have been added.
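The velocity figures quoted in the deck can be cross-checked with a little arithmetic. The sketch below starts from the "10 million records per day" on slide 14 and derives the per-second and per-10-second rates, plus the time needed to accumulate the 250,000 records mentioned in this note. The variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 10_000_000  # slide 14: "10 Million records per day"
records_per_second = records_per_day / SECONDS_PER_DAY
records_per_10s = records_per_second * 10

# Time needed to collect 250,000 records at that rate, in minutes.
minutes_for_250k = 250_000 / records_per_second / 60

print(f"{records_per_second:.1f} records/s, {records_per_10s:.0f} per 10 s")
print(f"250,000 records take about {minutes_for_250k:.0f} minutes")
```

The result, a little under 1,200 records per 10 seconds, agrees with the ">1,000 records every 10 sec" on slide 13, and 250,000 records correspond to roughly the length of a conference talk.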
  11. One part of the Portmaps data set is ship position data. As we’ve seen, this data is received every 10 seconds. Several options were considered. The most promising one was storing it in a geospatial database. However, that is expensive and may require custom partitioning. It also requires custom queries and code to perform analyses. And then there is of course Hadoop.
  12. The external radar/AIS system places a file in the spool directory every 10 s. Flume picks up this file, serialises it and sinks it into Hadoop. To access the data, a custom toolbox has been created that accesses the Hadoop cluster. It can read and write data from HDFS and can submit jobs. The clients, ArcMap and WebMap, make use of the geoprocessing services that are provided by the custom Java toolbox.
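Slide 27 mentions that Flume became too slow with many files in the spool directory and that the fix was to "spoon feed" it by limiting the number of files. The deck gives no details, so the following is only one way this could be done: incoming files land in a separate staging directory (an assumption) and a small script moves them into the spool directory while keeping the number of pending files below a limit. Files carrying Flume's default `.COMPLETED` suffix are treated as already processed.

```python
import shutil
from pathlib import Path


def spoon_feed(staging: Path, spool: Path, max_files: int) -> int:
    """Move staged files into the spool directory until it holds max_files pending files.

    Returns the number of files moved. File names are assumed to sort chronologically.
    """
    pending = sum(
        1 for p in spool.iterdir()
        if p.is_file() and not p.name.endswith(".COMPLETED")
    )
    free_slots = max_files - pending
    moved = 0
    for source in sorted(p for p in staging.iterdir() if p.is_file()):
        if moved >= free_slots:
            break
        shutil.move(str(source), str(spool / source.name))
        moved += 1
    return moved
```

Run periodically (for example every few seconds), this keeps the directory that Flume has to scan small, at the cost of a short extra delay when a backlog has built up.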
  13. The data set is just a CSV line for each observed ship every 10 seconds. Here is one example line. Each field is separated by the bar character. The following information is extracted from this line: Track number – a number assigned by the radar/AIS system. MMSI – a unique identification number of the ship; based on that we know which ship it is. X – the X-coordinate of the ship. Y – the Y-coordinate of the ship. Navigational status – whether it is moored, anchored or moving; in this case it is anchored. Length – the length of the ship. This property could be looked up from the MMSI, but it may vary: for barges, such as push boats with car floats, the length is variable. Breadth – or width; same as for the length. Time – th
  14. The ship position data set is stored in Hadoop. Two considerations are important. One is that Hadoop prefers big files; in fact it can split big files and send the splits to different mappers if needed. Second, users often want ranges of data to be considered. We have chosen to partition the data at the hourly level. For each hour we store about 80 MB. Each file can therefore be processed by one mapper. If we consider a day, 24 mappers can work in parallel.
  15. To facilitate easy deployment, the Port of Rotterdam has chosen the Hadoop-as-a-Service solution provided by KPN. KPN is one of the main IT service providers for the Port. The cluster is configured as stated. Although the cluster is virtual, each node has exclusive access to its three disks.
  16. This animated slide gives an offline demonstration. To make it even easier for end users to obtain information, we’ve also created a WebMap application. The end user just needs to go to the right website and gets a map of the area.