SlideShare a Scribd company logo
1 of 63
Download to read offline
© 2017 MapR TechnologiesMapR Confidential 1
Fast Cars, Big Data
How Streaming Can Help Formula1 ?
Tugdual Grall
@tgrall
© 2017 MapR TechnologiesMapR Confidential 2
Agenda
•  What’s the point of data in motorsports?
•  Demo
•  Architecture
•  What’s next?
•  Q&A
© 2017 MapR TechnologiesMapR Confidential 3
What’s the point of data in motorsports?
© 2017 MapR TechnologiesMapR Confidential
Some of the most heavily instrumented objects in the world
https://www.metasphere.co.uk/telemetry-data-journey-f1/
© 2017 MapR TechnologiesMapR Confidential 5
Got examples?
•  Up to 300 sensors per car
•  Up to 2000 channels
•  Sensor data are sent to the paddock in 2ms
•  1.5 billions of data points for a race
•  5 billions for a full race weekend
•  5/6Gb of compressed data per car for 90mn
US Grand Prix 2014 : 243 Tb (race teams combined)!
© 2017 MapR TechnologiesMapR Confidential
Data Communication
© 2017 MapR TechnologiesMapR Confidential 7
© 2017 MapR TechnologiesMapR Confidential 8
Data in Motorsports
F1 Framework - http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html
RPM
speed
Lateral
acceleration
gear
throttle
Brake
pressure
© 2017 MapR TechnologiesMapR Confidential 9
Race Strategy
F1 Framework - http://f1framework.blogspot.be/2013/08/race-strategy-explained.html
© 2017 MapR TechnologiesMapR Confidential 10
How does that work?
© 2017 MapR TechnologiesMapR Confidential
Replicating
Stream
Topic
FIA
Stream
Topic
Stream
Topic
Team Engineers
Trackside
Production System Outline
Ad-hoc
analysis
Other Data
Sources
Real-time
analysis Reporting
Streaming
Factory Operations Room
MAPR EDGE
MAPR EDGE
Kafka APIKafka API
Kafka API
Kafka API
Kafka API
© 2017 MapR TechnologiesMapR Confidential
Simplified Demo System Outline
Ad-hoc
analysis
MapR Sandbox
Torcs race simulator
D3
Real-time
analysis
Kafka API
Kafka API
Kafka API
© 2017 MapR TechnologiesMapR Confidential 13
TORCS for Cars, Physics & Drivers
•  The Open Racing Car Simulator
–  http://torcs.sourceforge.net/
•  a pseudo-physics based racing
simulator
•  AI Game & Research Platform
© 2017 MapR TechnologiesMapR Confidential 14
Architecture
Event Producers
Event Consumers
sensor
data
Real Time
Analytics with SQL
https://github.com/mapr-demos/racing-time-series
Stream
Topic
Kafka API Kafka API
© 2017 MapR TechnologiesMapR Confidential 15
Demo
© 2017 MapR TechnologiesMapR Confidential 16
How to Capture IOT
events at scale?
© 2017 MapR TechnologiesMapR Confidential
Traditional Message queue
© 2017 MapR TechnologiesMapR Confidential 18
Data Streaming

Moving millions of events per h|mn|s
© 2017 MapR TechnologiesMapR Confidential 19
What is Kafka?
•  http://kafka.apache.org/
•  Created at LinkedIn, open sourced in 2011
•  Distributed messaging system built to scale
•  Implemented in Scala / Java
© 2017 MapR TechnologiesMapR Confidential
Topics Organize Events Into Categories
And Decouple Producers from Consumers
© 2017 MapR TechnologiesMapR Confidential 21
Topics are Partitioned for Concurrency and Scalability
© 2017 MapR TechnologiesMapR Confidential
Partition message order is like a Queue
Partitions:
•  Messages are
appended to the
end
•  delivered in order
received
Offset:
•  Sequential id of a
message
© 2017 MapR TechnologiesMapR Confidential
Partition message order is like a Queue
•  Producers Append New messages to end
•  Consumers Read from front
•  Read cursor: offset ID of most recent read
message
© 2017 MapR TechnologiesMapR Confidential
Unlike a Queue, events are still persisted after delivered
• Messages remain on the partition, available to other
consumers
• Minimizes Non-Sequential disk read-writes
© 2017 MapR TechnologiesMapR Confidential 25
Processing of the same message for different views
© 2017 MapR TechnologiesMapR Confidential 26
Processing of the same message for different views
© 2017 MapR TechnologiesMapR Confidential 27
MapR Streams
•  Distributed messaging system built to scale
•  Use Apache Kafka API 0.9.0
•  No code change
•  Does not use the same “broker” architecture
–  Log stored in MapR Storage (Scalable, Secured, Fast, Multi DC)
–  No Zookeeper
© 2017 MapR TechnologiesMapR Confidential 28
Produce Messages
!
ProducerRecord<String, byte[]> rec = new ProducerRecord<>(!
“/apps/racing/stream:sensors_data“,!
eventName,!
value.toString().getBytes());!
!
producer.send(rec, (recordMetadata, e) -> {!
if (e != null) { … });!
!
producer.flush();!
© 2017 MapR TechnologiesMapR Confidential 29
Consume Messages
long pollTimeOut = 800;!
while(true) {!
!
ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);!
if (!records.isEmpty()) {!
Iterable<ConsumerRecord<String, String>> iterable = records.iterator();!
!
while (iterable.hasNext()) {!
ConsumerRecord<String, String> record = iterable.next();!
System.out.println(record.value());!
!
}!
!
!
} !
}!
© 2017 MapR TechnologiesMapR Confidential 30
Consume Messages
long pollTimeOut = 800;!
while(true) {!
!
ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);!
if (!records.isEmpty()) {!
Iterable<ConsumerRecord<String, String>> iterable = records::iterator;!
!
StreamSupport!
.stream(iterable.spliterator(), false)!
.forEach((record) -> {!
// work with record object!
record.value();!
…!
!
!
});!
consumer.commitAsync();!
}!
}!
© 2017 MapR TechnologiesMapR Confidential 31
Where/How to store data ?
© 2017 MapR TechnologiesMapR Confidential 32
Big Datastore
Distributed File System
HDFS/MapR-FS
NoSQL Database
HBase/MapR-DB
….
© 2017 MapR TechnologiesMapR Confidential
Relational Database vs. MapR-DB/HBase
© 2017 MapR TechnologiesMapR Confidential
Designed for Partitioning and Scaling
© 2017 MapR TechnologiesMapR Confidential 35
What’s next?
© 2017 MapR TechnologiesMapR Confidential
Tables as Objects, Objects as Tables
c1 c2 c3
Row-wise form
c1 c2 c3
Column-wise form
[ { c1:v1, c2:v2, c3:v3 },
{ c1:v1, c2:v2, c3:v3 },
{ c1:v1, c2:v2, c3:v3 } ]
List of objects
{ c1:[v1, v2, v3],
c2:[v1, v2, v3],
c3:[v1, v2, v3] }
Object containing lists
Data collected is in JSON
© 2017 MapR TechnologiesMapR Confidential 37
Sensor Data V1
•  For speed group sensor data
•  Each message contains
array of Sensor data:
–  Speed (m/s)
–  RPM
–  Distance (m)
{ "_id":"1.458141858E9/0.324",
"car" = "car1",
"timestamp":1458141858,
"racetime”:0.324,
"records":
[
{
"sensors":{
"Speed":3.588583,
"Distance":2003.023071,
"RPM":1896.575806
},
"racetime":0.324,
"timestamp":1458141858
},
{
"sensors":{
"Speed":6.755624,
"Distance":2004.084717,
"RPM":1673.264526
},
"racetime":0.556,
"timestamp":1458141858
},
© 2017 MapR TechnologiesMapR Confidential 38
Capture more data
© 2017 MapR TechnologiesMapR Confidential 39
Sensor Data V2
•  Add 2 more data points:
–  Speed (m/s)
–  RPM
–  Distance (m)
–  Throttle
–  Gears
–  …
{ "_id":"1.458141858E9/0.324",
"car" = "car1",
"timestamp":1458141858,
"racetime”:0.324,
"records":
[
{
"sensors":{
"Speed":3.588583,
"Distance":2003.023071,
"RPM":1896.575806,
"gear" : 2
},
"racetime":0.324,
"timestamp":1458141858
},
{
"sensors":{
"Speed":6.755624,
"Distance":2004.084717,
“RPM":1673.264526,
"gear" : 2
},
"racetime":0.556,
"timestamp":1458141858
},
© 2017 MapR TechnologiesMapR Confidential 40
Add new services
© 2017 MapR TechnologiesMapR Confidential 41
New Data Service
Event Producers Event Consumers
sensor
data
Real Time
Analytics with SQL
https://github.com/mapr-demos/racing-time-series
Stream
Topic
Alerts
© 2017 MapR TechnologiesMapR Confidential 42
•  Cluster Computing Platform
•  Extends “MapReduce” with extensions
–  Streaming
–  Interactive Analytics
•  Run in Memory
•  http://spark.apache.org/
© 2017 MapR TechnologiesMapR Confidential 43
•  Streaming Dataflow Engine
–  Datastream/Dataset APIs
–  CEP, Graph, ML
•  Run in Memory
•  https://flink.apache.org/
© 2017 MapR TechnologiesMapR Confidential
Apache Flink Stack
44
DataStream	API	
Stream	Processing	
DataSet	API	
Batch	Processing	
Run/me	
Distributed	Streaming	Data	Flow	
Libraries	
Streaming and batch as first class citizens.!
© 2017 MapR TechnologiesMapR Confidential
A Stream Processing Pipeline
45
collect capture analyze Store and serve
Topic
© 2017 MapR TechnologiesMapR Confidential 46
Flink Consumer processing
DataStream stream = env.addSource(!
new FlinkKafkaConsumer09<>("/apps/racing/stream:sensors_data",!
new JSONDeserializationSchema(),!
properties)!
);!
!
!
stream.flatMap(new CarEventParser())!
.keyBy(0)!
.timeWindow(Time.seconds(10))!
.reduce(new AvgReducer())!
.flatMap(new AvgMapper())!
.map(new AvgPrinter())!
.print();!
© 2017 MapR TechnologiesMapR Confidential 47
Flink Consumer processing
!
!
class AvgReducer implements ReduceFunction<Tuple3<String, Float, Integer>> {!
@Override!
public Tuple3<String, Float, Integer> reduce(Tuple3<String, Float,Integer> value1,!
Tuple3<String, Float, Integer> value2) {!
return new Tuple3<>(value1.f0, value1.f1 + value2.f1, value1.f2+1);!
}!
}!
!
!
class AvgMapper implements FlatMapFunction<Tuple3<String, Float, Integer>, !
Tuple2<String, Float>> {!
@Override!
public void flatMap(Tuple3<String, Float, Integer> carInfo, !
Collector<Tuple2<String, Float>> out) throws Exception {!
out.collect( new Tuple2<>( carInfo.f0 , carInfo.f1/carInfo.f2 ) );!
}!
}!
© 2017 MapR TechnologiesMapR Confidential
What is Drill?
•  SQL	query engine for	data/apps	on	Hadoop	
	
•  Schema	op4onal	-	query	schema-less	data	in	situ	
	
•  Provides low latency interactive response times
•  SQL engine on “everything”
•  Semi structured formats – Ex: JSON
•  Structured formats – Ex: parquet
•  Ecosystem components – Hbase, MapRDB, Hive
© 2017 MapR TechnologiesMapR Confidential
Apache Drill: Drillbit components
	
• RPC	Endpoint	
• SQL	Parser	
• Op4mizer	
• Storage	plug	ins	
• Built in plugins for Hive,
Hbase, MapRDB, DFS,
traditional Linux
filesystems
© 2017 MapR TechnologiesMapR Confidential
Apache Drill Architecture
© 2017 MapR TechnologiesMapR Confidential
Drill Query Example
SELECT * FROM dfs.logs`/telemetry/all_cars`
LIMIT 10
© 2017 MapR TechnologiesMapR Confidential
Drill Query Example: average speed by car and race
SELECT race_id, car, ROUND(AVG( `t`.`records`.`sensors`.`Speed` ),2) as `mps`,
ROUND(AVG( `t`.`records`.`sensors`.`Speed` * 18 / 5 ), 2) as `kph`
FROM (
SELECT race_id, car,
flatten(records) as `records`
FROM dfs.`/telemetry/all_cars`
) AS `t`
GROUP BY race_id, car
© 2017 MapR TechnologiesMapR Confidential 53
Streaming Architecture & Formula 1
•  Stream & Process data in real time
–  Use distributed & scalable processing and storage
–  NoSQL Database
•  Decouple the source from the consumer(s)
–  Dashboard, Analytics, Machine Learning
–  Add new use case….
© 2017 MapR TechnologiesMapR Confidential 54
Streaming Architecture & Formula 1
• Stream & Process data in real
time
–  Use distributed & scalable
processing and storage
–  NoSQL Database, Distributed
File System
• Decouple the source from the
consumer(s)
–  Dashboard, Analytics, Machine
Learning
–  Add new use case….
This is not only about Formula 1!
(Telco, Finance, Retail, Content, IT)
© 2017 MapR TechnologiesMapR Confidential
To Learn More:
•  Download example code
–  https://github.com/mapr-demos/racing-time-series.
–  Read explanation of example code
–  https://mapr.com/blog/fast-cars-fast-data-formula1
–  https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and-
vertx/
© 2017 MapR TechnologiesMapR Confidential 56
MONITORING UBER DATA USING SPARK MACHINE
LEARNING, STREAMING, THE KAFKA API, AND VERT.X
© 2017 MapR TechnologiesMapR Confidential 57
MONITORING UBER DATA USING SPARK MACHINE
LEARNING, STREAMING, THE KAFKA API, AND VERT.X
https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and-vertx/
© 2017 MapR TechnologiesMapR Confidential 58
MONITORING UBER DATA USING SPARK MACHINE
LEARNING, STREAMING, THE KAFKA API, AND VERT.X
https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and-vertx/
© 2017 MapR TechnologiesMapR Confidential 59
One Last Thing…
Events Producers Events Consumers
Web Socket
https://github.com/mapr-demos/racing-time-series
BLE
MapR-Streams
Apache Kafka
© 2017 MapR TechnologiesMapR Confidential
MapR Blog
•  https://www.mapr.com/blog/
© 2017 MapR TechnologiesMapR Confidential
…helping you put data technology to work
●  Find answers
●  Ask technical questions
●  Join on-demand training course
discussions
●  Follow release announcements
●  Share and vote on product ideas
●  Find Meetup and event listings
Connect with fellow Apache
Hadoop and Spark professionals
community.mapr.com
© 2017 MapR TechnologiesMapR Confidential 62
Open Source Engines & Tools Commercial Engines & Applications
Utility-Grade Platform Services
DataProcessing
Web-Scale Storage
MapR-FS MapR-DB
Search and
Others
Real Time Unified Security Multi-tenancy Disaster Recovery Global NamespaceHigh Availability
MapR Streams
Cloud and
Managed
Services
Search and
Others
UnifiedManagementandMonitoring
Search and
Others
Event StreamingDatabase
Custom
Apps
MapR Converged Data Platform
© 2017 MapR TechnologiesMapR Confidential
Stream Processing
Building a Complete Data Architecture
MapR File System
(MapR-FS)
MapR Converged Data Platform
MapR Database
(MapR-DB)
MapR Streams
Sources/Apps Bulk Processing

More Related Content

What's hot

NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
Carol McDonald
 

What's hot (20)

Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DBAnalyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
Analyzing Flight Delays with Apache Spark, DataFrames, GraphFrames, and MapR-DB
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...Analysis of Popular Uber Locations using Apache APIs:  Spark Machine Learning...
Analysis of Popular Uber Locations using Apache APIs: Spark Machine Learning...
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 

Similar to Fast Cars, Big Data How Streaming can help Formula 1

Similar to Fast Cars, Big Data How Streaming can help Formula 1 (20)

Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
 
Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1Fast Cars, Big Data - How Streaming Can Help Formula 1
Fast Cars, Big Data - How Streaming Can Help Formula 1
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall - Code...
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub...
 
Map r seattle streams meetup oct 2016
Map r seattle streams meetup   oct 2016Map r seattle streams meetup   oct 2016
Map r seattle streams meetup oct 2016
 
Container and Kubernetes without limits
Container and Kubernetes without limitsContainer and Kubernetes without limits
Container and Kubernetes without limits
 
How Spark is Enabling the New Wave of Converged Applications
How Spark is Enabling  the New Wave of Converged ApplicationsHow Spark is Enabling  the New Wave of Converged Applications
How Spark is Enabling the New Wave of Converged Applications
 
Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
Introduction to Streaming with Apache Flink
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
MapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn GloballyMapR Edge : Act Locally Learn Globally
MapR Edge : Act Locally Learn Globally
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 
Heterogeneous Data Mining with Spark
Heterogeneous Data Mining with SparkHeterogeneous Data Mining with Spark
Heterogeneous Data Mining with Spark
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 

More from Carol McDonald (8)

Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
Streaming healthcare Data pipeline using Apache APIs: Kafka and Spark with Ma...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
 
Spark machine learning predicting customer churn
Spark machine learning predicting customer churnSpark machine learning predicting customer churn
Spark machine learning predicting customer churn
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine Learning
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
Machine Learning Recommendations with Spark
Machine Learning Recommendations with SparkMachine Learning Recommendations with Spark
Machine Learning Recommendations with Spark
 
CU9411MW.DOC
CU9411MW.DOCCU9411MW.DOC
CU9411MW.DOC
 
Getting started with HBase
Getting started with HBaseGetting started with HBase
Getting started with HBase
 

Recently uploaded

JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
Max Lee
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 

Recently uploaded (20)

AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
What need to be mastered as AI-Powered Java Developers
What need to be mastered as AI-Powered Java DevelopersWhat need to be mastered as AI-Powered Java Developers
What need to be mastered as AI-Powered Java Developers
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purityAPVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
 
Microsoft 365 Copilot; An AI tool changing the world of work _PDF.pdf
Microsoft 365 Copilot; An AI tool changing the world of work _PDF.pdfMicrosoft 365 Copilot; An AI tool changing the world of work _PDF.pdf
Microsoft 365 Copilot; An AI tool changing the world of work _PDF.pdf
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
SQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionSQL Injection Introduction and Prevention
SQL Injection Introduction and Prevention
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
How to pick right visual testing tool.pdf
How to pick right visual testing tool.pdfHow to pick right visual testing tool.pdf
How to pick right visual testing tool.pdf
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024
 

Fast Cars, Big Data How Streaming can help Formula 1

  • 1. © 2017 MapR TechnologiesMapR Confidential 1 Fast Cars, Big Data How Streaming Can Help Formula1 ? Tugdual Grall @tgrall
  • 2. © 2017 MapR TechnologiesMapR Confidential 2 Agenda •  What’s the point of data in motorsports? •  Demo •  Architecture •  What’s next? •  Q&A
  • 3. © 2017 MapR TechnologiesMapR Confidential 3 What’s the point of data in motorsports?
  • 4. © 2017 MapR TechnologiesMapR Confidential Some of the most heavily instrumented objects in the world https://www.metasphere.co.uk/telemetry-data-journey-f1/
  • 5. © 2017 MapR TechnologiesMapR Confidential 5 Got examples? •  Up to 300 sensors per car •  Up to 2000 channels •  Sensor data are sent to the paddock in 2ms •  1.5 billions of data points for a race •  5 billions for a full race weekend •  5/6Gb of compressed data per car for 90mn US Grand Prix 2014 : 243 Tb (race teams combined)!
  • 6. © 2017 MapR TechnologiesMapR Confidential Data Communication
  • 7. © 2017 MapR TechnologiesMapR Confidential 7
  • 8. © 2017 MapR TechnologiesMapR Confidential 8 Data in Motorsports F1 Framework - http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html RPM speed Lateral acceleration gear throttle Brake pressure
  • 9. © 2017 MapR TechnologiesMapR Confidential 9 Race Strategy F1 Framework - http://f1framework.blogspot.be/2013/08/race-strategy-explained.html
  • 10. © 2017 MapR TechnologiesMapR Confidential 10 How does that work?
  • 11. © 2017 MapR TechnologiesMapR Confidential Replicating Stream Topic FIA Stream Topic Stream Topic Team Engineers Trackside Production System Outline Ad-hoc analysis Other Data Sources Real-time analysis Reporting Streaming Factory Operations Room MAPR EDGE MAPR EDGE Kafka APIKafka API Kafka API Kafka API Kafka API
  • 12. © 2017 MapR TechnologiesMapR Confidential Simplified Demo System Outline Ad-hoc analysis MapR Sandbox Torcs race simulator D3 Real-time analysis Kafka API Kafka API Kafka API
  • 13. © 2017 MapR TechnologiesMapR Confidential 13 TORCS for Cars, Physics & Drivers •  The Open Racing Car Simulator –  http://torcs.sourceforge.net/ •  a pseudo-physics based racing simulator •  AI Game & Research Platform
  • 14. © 2017 MapR TechnologiesMapR Confidential 14 Architecture Event Producers Event Consumers sensor data Real Time Analytics with SQL https://github.com/mapr-demos/racing-time-series Stream Topic Kafka API Kafka API
  • 15. © 2017 MapR TechnologiesMapR Confidential 15 Demo
  • 16. © 2017 MapR TechnologiesMapR Confidential 16 How to Capture IOT events at scale?
  • 17. © 2017 MapR TechnologiesMapR Confidential Traditional Message queue
  • 18. © 2017 MapR TechnologiesMapR Confidential 18 Data Streaming
 Moving millions of events per h|mn|s
  • 19. © 2017 MapR TechnologiesMapR Confidential 19 What is Kafka? •  http://kafka.apache.org/ •  Created at LinkedIn, open sourced in 2011 •  Distributed messaging system built to scale •  Implemented in Scala / Java
  • 20. © 2017 MapR TechnologiesMapR Confidential Topics Organize Events Into Categories And Decouple Producers from Consumers
  • 21. © 2017 MapR TechnologiesMapR Confidential 21 Topics are Partitioned for Concurrency and Scalability
  • 22. © 2017 MapR TechnologiesMapR Confidential Partition message order is like a Queue Partitions: •  Messages are appended to the end •  delivered in order received Offset: •  Sequential id of a message
  • 23. © 2017 MapR TechnologiesMapR Confidential Partition message order is like a Queue •  Producers Append New messages to end •  Consumers Read from front •  Read cursor: offset ID of most recent read message
  • 24. © 2017 MapR TechnologiesMapR Confidential Unlike a Queue, events are still persisted after delivered • Messages remain on the partition, available to other consumers • Minimizes Non-Sequential disk read-writes
  • 25. © 2017 MapR TechnologiesMapR Confidential 25 Processing of the same message for different views
  • 26. © 2017 MapR TechnologiesMapR Confidential 26 Processing of the same message for different views
  • 27. © 2017 MapR TechnologiesMapR Confidential 27 MapR Streams •  Distributed messaging system built to scale •  Use Apache Kafka API 0.9.0 •  No code change •  Does not use the same “broker” architecture –  Log stored in MapR Storage (Scalable, Secured, Fast, Multi DC) –  No Zookeeper
  • 28. © 2017 MapR TechnologiesMapR Confidential 28 Produce Messages ! ProducerRecord<String, byte[]> rec = new ProducerRecord<>(! “/apps/racing/stream:sensors_data“,! eventName,! value.toString().getBytes());! ! producer.send(rec, (recordMetadata, e) -> {! if (e != null) { … });! ! producer.flush();!
  • 29. © 2017 MapR TechnologiesMapR Confidential 29 Consume Messages long pollTimeOut = 800;! while(true) {! ! ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);! if (!records.isEmpty()) {! Iterable<ConsumerRecord<String, String>> iterable = records.iterator();! ! while (iterable.hasNext()) {! ConsumerRecord<String, String> record = iterable.next();! System.out.println(record.value());! ! }! ! ! } ! }!
  • 30. © 2017 MapR TechnologiesMapR Confidential 30 Consume Messages long pollTimeOut = 800;! while(true) {! ! ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);! if (!records.isEmpty()) {! Iterable<ConsumerRecord<String, String>> iterable = records::iterator;! ! StreamSupport! .stream(iterable.spliterator(), false)! .forEach((record) -> {! // work with record object! record.value();! …! ! ! });! consumer.commitAsync();! }! }!
  • 31. © 2017 MapR TechnologiesMapR Confidential 31 Where/How to store data ?
  • 32. © 2017 MapR TechnologiesMapR Confidential 32 Big Datastore Distributed File System HDFS/MapR-FS NoSQL Database HBase/MapR-DB ….
  • 33. © 2017 MapR TechnologiesMapR Confidential Relational Database vs. MapR-DB/HBase
  • 34. © 2017 MapR TechnologiesMapR Confidential Designed for Partitioning and Scaling
  • 35. © 2017 MapR TechnologiesMapR Confidential 35 What’s next?
  • 36. © 2017 MapR TechnologiesMapR Confidential Tables as Objects, Objects as Tables c1 c2 c3 Row-wise form c1 c2 c3 Column-wise form [ { c1:v1, c2:v2, c3:v3 }, { c1:v1, c2:v2, c3:v3 }, { c1:v1, c2:v2, c3:v3 } ] List of objects { c1:[v1, v2, v3], c2:[v1, v2, v3], c3:[v1, v2, v3] } Object containing lists Data collected is in JSON
  • 37. © 2017 MapR TechnologiesMapR Confidential 37 Sensor Data V1 •  For speed group sensor data •  Each message contains array of Sensor data: –  Speed (m/s) –  RPM –  Distance (m) { "_id":"1.458141858E9/0.324", "car" = "car1", "timestamp":1458141858, "racetime”:0.324, "records": [ { "sensors":{ "Speed":3.588583, "Distance":2003.023071, "RPM":1896.575806 }, "racetime":0.324, "timestamp":1458141858 }, { "sensors":{ "Speed":6.755624, "Distance":2004.084717, "RPM":1673.264526 }, "racetime":0.556, "timestamp":1458141858 },
  • 38. © 2017 MapR TechnologiesMapR Confidential 38 Capture more data
  • 39. © 2017 MapR TechnologiesMapR Confidential 39 Sensor Data V2 •  Add 2 more data points: –  Speed (m/s) –  RPM –  Distance (m) –  Throttle –  Gears –  … { "_id":"1.458141858E9/0.324", "car" = "car1", "timestamp":1458141858, "racetime”:0.324, "records": [ { "sensors":{ "Speed":3.588583, "Distance":2003.023071, "RPM":1896.575806, "gear" : 2 }, "racetime":0.324, "timestamp":1458141858 }, { "sensors":{ "Speed":6.755624, "Distance":2004.084717, “RPM":1673.264526, "gear" : 2 }, "racetime":0.556, "timestamp":1458141858 },
  • 40. © 2017 MapR TechnologiesMapR Confidential 40 Add new services
  • 41. © 2017 MapR TechnologiesMapR Confidential 41 New Data Service Event Producers Event Consumers sensor data Real Time Analytics with SQL https://github.com/mapr-demos/racing-time-series Stream Topic Alerts
  • 42. © 2017 MapR TechnologiesMapR Confidential 42 •  Cluster Computing Platform •  Extends “MapReduce” with extensions –  Streaming –  Interactive Analytics •  Run in Memory •  http://spark.apache.org/
  • 43. © 2017 MapR TechnologiesMapR Confidential 43 •  Streaming Dataflow Engine –  Datastream/Dataset APIs –  CEP, Graph, ML •  Run in Memory •  https://flink.apache.org/
  • 44. © 2017 MapR TechnologiesMapR Confidential Apache Flink Stack 44 DataStream API Stream Processing DataSet API Batch Processing Run/me Distributed Streaming Data Flow Libraries Streaming and batch as first class citizens.!
  • 45. © 2017 MapR TechnologiesMapR Confidential A Stream Processing Pipeline 45 collect capture analyze Store and serve Topic
  • 46. © 2017 MapR TechnologiesMapR Confidential 46 Flink Consumer processing DataStream stream = env.addSource(! new FlinkKafkaConsumer09<>("/apps/racing/stream:sensors_data",! new JSONDeserializationSchema(),! properties)! );! ! ! stream.flatMap(new CarEventParser())! .keyBy(0)! .timeWindow(Time.seconds(10))! .reduce(new AvgReducer())! .flatMap(new AvgMapper())! .map(new AvgPrinter())! .print();!
  • 47. © 2017 MapR TechnologiesMapR Confidential 47 Flink Consumer processing ! ! class AvgReducer implements ReduceFunction<Tuple3<String, Float, Integer>> {! @Override! public Tuple3<String, Float, Integer> reduce(Tuple3<String, Float,Integer> value1,! Tuple3<String, Float, Integer> value2) {! return new Tuple3<>(value1.f0, value1.f1 + value2.f1, value1.f2+1);! }! }! ! ! class AvgMapper implements FlatMapFunction<Tuple3<String, Float, Integer>, ! Tuple2<String, Float>> {! @Override! public void flatMap(Tuple3<String, Float, Integer> carInfo, ! Collector<Tuple2<String, Float>> out) throws Exception {! out.collect( new Tuple2<>( carInfo.f0 , carInfo.f1/carInfo.f2 ) );! }! }!
  • 48. © 2017 MapR TechnologiesMapR Confidential What is Drill? •  SQL query engine for data/apps on Hadoop •  Schema op4onal - query schema-less data in situ •  Provides low latency interactive response times •  SQL engine on “everything” •  Semi structured formats – Ex: JSON •  Structured formats – Ex: parquet •  Ecosystem components – Hbase, MapRDB, Hive
  • 49. © 2017 MapR TechnologiesMapR Confidential Apache Drill: Drillbit components • RPC Endpoint • SQL Parser • Op4mizer • Storage plug ins • Built in plugins for Hive, Hbase, MapRDB, DFS, traditional Linux filesystems
  • 50. © 2017 MapR TechnologiesMapR Confidential Apache Drill Architecture
  • 51. © 2017 MapR TechnologiesMapR Confidential Drill Query Example SELECT * FROM dfs.logs`/telemetry/all_cars` LIMIT 10
  • 52. © 2017 MapR TechnologiesMapR Confidential Drill Query Example: average speed by car and race SELECT race_id, car, ROUND(AVG( `t`.`records`.`sensors`.`Speed` ),2) as `mps`, ROUND(AVG( `t`.`records`.`sensors`.`Speed` * 18 / 5 ), 2) as `kph` FROM ( SELECT race_id, car, flatten(records) as `records` FROM dfs.`/telemetry/all_cars` ) AS `t` GROUP BY race_id, car
  • 53. © 2017 MapR TechnologiesMapR Confidential 53 Streaming Architecture & Formula 1 •  Stream & Process data in real time –  Use distributed & scalable processing and storage –  NoSQL Database •  Decouple the source from the consumer(s) –  Dashboard, Analytics, Machine Learning –  Add new use case….
  • 54. © 2017 MapR TechnologiesMapR Confidential 54 Streaming Architecture & Formula 1 • Stream & Process data in real time –  Use distributed & scalable processing and storage –  NoSQL Database, Distributed File System • Decouple the source from the consumer(s) –  Dashboard, Analytics, Machine Learning –  Add new use case…. This is not only about Formula 1! (Telco, Finance, Retail, Content, IT)
  • 55. © 2017 MapR TechnologiesMapR Confidential To Learn More: •  Download example code –  https://github.com/mapr-demos/racing-time-series. –  Read explanation of example code –  https://mapr.com/blog/fast-cars-fast-data-formula1 –  https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and- vertx/
  • 56. © 2017 MapR TechnologiesMapR Confidential 56 MONITORING UBER DATA USING SPARK MACHINE LEARNING, STREAMING, THE KAFKA API, AND VERT.X
  • 57. © 2017 MapR TechnologiesMapR Confidential 57 MONITORING UBER DATA USING SPARK MACHINE LEARNING, STREAMING, THE KAFKA API, AND VERT.X https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and-vertx/
  • 58. © 2017 MapR TechnologiesMapR Confidential 58 MONITORING UBER DATA USING SPARK MACHINE LEARNING, STREAMING, THE KAFKA API, AND VERT.X https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and-vertx/
  • 59. © 2017 MapR TechnologiesMapR Confidential 59 One Last Thing… Events Producers Events Consumers Web Socket https://github.com/mapr-demos/racing-time-series BLE MapR-Streams Apache Kafka
  • 60. © 2017 MapR TechnologiesMapR Confidential MapR Blog •  https://www.mapr.com/blog/
  • 61. © 2017 MapR TechnologiesMapR Confidential …helping you put data technology to work ●  Find answers ●  Ask technical questions ●  Join on-demand training course discussions ●  Follow release announcements ●  Share and vote on product ideas ●  Find Meetup and event listings Connect with fellow Apache Hadoop and Spark professionals community.mapr.com
  • 62. © 2017 MapR TechnologiesMapR Confidential 62 Open Source Engines & Tools Commercial Engines & Applications Utility-Grade Platform Services DataProcessing Web-Scale Storage MapR-FS MapR-DB Search and Others Real Time Unified Security Multi-tenancy Disaster Recovery Global NamespaceHigh Availability MapR Streams Cloud and Managed Services Search and Others UnifiedManagementandMonitoring Search and Others Event StreamingDatabase Custom Apps MapR Converged Data Platform
  • 63. © 2017 MapR TechnologiesMapR Confidential Stream Processing Building a Complete Data Architecture MapR File System (MapR-FS) MapR Converged Data Platform MapR Database (MapR-DB) MapR Streams Sources/Apps Bulk Processing