This document discusses two real-time geospatial analytics demos using MemSQL - PowerStream and Supercar. PowerStream predicts the health of 197,000 wind turbines globally using 1 million data points per second from sensors. Supercar tracks NYC taxi and limo data in real-time to analyze the on-demand economy. Both demos extract, transform and load streaming data into MemSQL for real-time querying and visualization.
5. Germany Just Got Almost All of Its Power From Renewable Energy
May 15, 2016
Bloomberg: http://www.bloomberg.com/news/articles/2016-05-16/germany-just-got-almost-all-of-its-power-from-renewable-energy
6. Denmark is aiming for 50% renewable energy sources within the next five years
Independent: http://www.independent.co.uk/environment/germany-just-got-almost-all-of-its-power-from-renewable-energy-a7037851.html
42% of electricity produced from wind turbines in 2015
The Guardian: http://www.theguardian.com/environment/2016/jan/18/denmark-broke-world-record-for-wind-power-in-2015
7. Portugal Runs for Four Days Straight on Renewable Energy Alone
http://www.theguardian.com/environment/2016/may/18/portugal-runs-for-four-days-straight-on-renewable-energy-alone
22% of electricity provided by wind in 2015
10. 1 to 2 million data points per second with MemSQL Streamliner
11. Simulation Details
Data producers (Python programs) push to Kafka
▪ 1M data points per second from 200k turbines
▪ Generated sensor data is based on a predetermined turbine failure model
▪ The transform models individual turbine failures (2 components per turbine) with machine learning, determining:
• How fast is the turbine deteriorating?
• How bad does the turbine get before being repaired?
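As a rough illustration of what a data producer driven by a failure model might look like, the sketch below simulates one turbine with two components. Everything in it is an assumption made for illustration: the wear rates, the repair threshold, the `vibration` reading, and the function names are hypothetical and are not taken from the demo. Pushing the readings to Kafka is left out.

```python
import random


def make_component(wear_rate, repair_at=0.8):
    """Hypothetical component state: wear of 0.0 is healthy, 1.0 is failed."""
    return {"wear": 0.0, "wear_rate": wear_rate, "repair_at": repair_at}


def step(component, rng):
    """Advance one tick: wear grows by a noisy amount, and the component is
    repaired (wear reset to 0.0) once it reaches the repair threshold."""
    component["wear"] += component["wear_rate"] * rng.uniform(0.5, 1.5)
    if component["wear"] >= component["repair_at"]:
        component["wear"] = 0.0
    return component["wear"]


def simulate_turbine(turbine_id, ticks, seed=0):
    """Generate sensor readings for one turbine with two components."""
    rng = random.Random(seed)
    # Assumed wear rates; the second component deteriorates faster.
    components = [make_component(0.01), make_component(0.02)]
    readings = []
    for tick in range(ticks):
        for index, component in enumerate(components):
            wear = step(component, rng)
            readings.append({
                "turbine_id": turbine_id,
                "component": index,
                "tick": tick,
                "wear": wear,
                # Assumed sensor: vibration rises with wear, plus noise.
                "vibration": 1.0 + 4.0 * wear + rng.gauss(0.0, 0.05),
            })
    return readings
```

Here `wear_rate` stands in for "how fast is the turbine deteriorating" and `repair_at` for "how bad does the turbine get before being repaired".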
16. Extract, Transform
Demo Architecture and Data Flow
[Diagram: real-time inputs, Streamliner, real-time application]
Simulated sensor data is written to Kafka
Streamliner Extractor pulls data from Kafka into Spark
Streamliner Transformer then “scores” the failure model (ML algorithm)
• The failure model is scored by performing a regression on incoming sensor data values
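The slides do not say which regression the Transformer uses. As a minimal sketch, assuming an ordinary least-squares line fitted to recent readings from one component, the slope estimates how fast the component is deteriorating and the fitted line can be extrapolated to a threshold. The function names and the threshold are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ticks_until_threshold(xs, ys, threshold):
    """Extrapolate the fitted line to estimate how many ticks remain before
    the sensor value reaches `threshold`. Returns None if the readings are
    not trending upward."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - xs[-1])
```

For example, readings of 1, 2, 3, 4 at ticks 0 to 3 give a slope of 1 per tick, so a threshold of 10 would be reached about 6 ticks after the last reading.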
17. Extract, Transform, Load
Streamliner Loader inserts the data into MemSQL
19. Cluster Architecture
[Diagram: 8 nodes, each running Data Producer, Kafka, Spark, MemSQL Agg, MemSQL Leaf; plus ZooKeeper and Spark Master]
20. Cluster Architecture
Internet-of-Things simulation depicting the health of wind turbines globally.
8 machines: AWS c4.2xlarge instances at $0.311 per hour per machine, for an annual cost of about $22,000.
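The annual cost follows directly from the figures on the slide. The short sketch below reproduces the arithmetic, assuming all 8 machines run around the clock for a 365-day year; the function name is illustrative.

```python
MACHINES = 8
PRICE_PER_HOUR = 0.311      # USD per machine, from the slide
HOURS_PER_YEAR = 24 * 365   # assumes continuous operation


def annual_cost(machines=MACHINES, price_per_hour=PRICE_PER_HOUR):
    """Total yearly cost in USD for the whole cluster."""
    return machines * price_per_hour * HOURS_PER_YEAR
```

This gives 8 × 0.311 × 8,760 = 21,794.88, which rounds to the roughly $22,000 quoted.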
21. Visual Layer
▪ MemSQL data is rendered in a web UI
• Turbine Health (green, yellow, red)
▪ Draw positions of turbines on a MapBox map
• A geospatial query is sent to MemSQL each time the map view is moved
▪ Alerts based on predicted turbine health
▪ Data points shown on the UI map are all from real-time queries
• Real-time in this case = 1 second interval
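The slides do not show the query itself. The sketch below is one way the map-move query could be built: it turns the visible map bounds into a WKT polygon and filters on it with MemSQL's `GEOGRAPHY_INTERSECTS` function. The table name `turbines`, its columns, the choice of that function, and the health thresholds are all assumptions made for illustration.

```python
def viewport_polygon_wkt(west, south, east, north):
    """Build a closed WKT polygon ("longitude latitude" order) for the view."""
    corners = [(west, south), (east, south), (east, north),
               (west, north), (west, south)]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def turbines_in_view_query(west, south, east, north):
    """Return (sql, params) for a parameterized query over the viewport.
    Table and column names are hypothetical."""
    sql = ("SELECT turbine_id, health, location FROM turbines "
           "WHERE GEOGRAPHY_INTERSECTS(location, %s)")
    return sql, (viewport_polygon_wkt(west, south, east, north),)


def health_color(health):
    """Map a health score in [0, 1] to a marker color (assumed thresholds)."""
    if health >= 0.7:
        return "green"
    if health >= 0.4:
        return "yellow"
    return "red"
```

With a 1 second refresh interval, the UI would re-issue this query with the current bounds once per second as well as whenever the view moves.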
31. Cluster Architecture
▪ A single 16-core machine with 64 GB RAM is enough to handle all of the data in real time.
▪ That’s really it
[Diagram: one node running Data Producer, Kafka, Spark, MemSQL Agg, MemSQL Leaf, ZooKeeper, Spark Master]
32. Simulation Details
▪ NYC Taxi and Limo Commission Trip Record Data
• Downloads are available each year for free
▪ Simulation uses the dataset from New Year’s Eve (one of the busiest days for cabs in NYC)
▪ Drivers are assigned pickups and dropoffs from the real data set
▪ Routes are replayed over time
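How the routes are replayed is not described in the slides. One simple possibility, shown purely as an assumption, is to move each driver along a straight line between the recorded pickup and dropoff points in proportion to elapsed trip time. The function name and the (longitude, latitude) tuple layout are hypothetical.

```python
def position_at(pickup, dropoff, start_time, end_time, now):
    """Linearly interpolate a driver's (longitude, latitude) at time `now`.
    Times are plain numbers (for example, seconds since the replay began)."""
    if end_time <= start_time:
        raise ValueError("end_time must be after start_time")
    fraction = (now - start_time) / (end_time - start_time)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the trip duration
    lon = pickup[0] + fraction * (dropoff[0] - pickup[0])
    lat = pickup[1] + fraction * (dropoff[1] - pickup[1])
    return (lon, lat)
```

Calling this for every active trip on each tick would yield the stream of simulated driver positions.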
33. Extract, Transform, Load
Demo Architecture and Data Flow
[Diagram: real-time inputs, Streamliner, real-time application]
Simulated driver data is written to Kafka
Streamliner Extractor pulls data from Kafka into Spark
Streamliner Transformer parses the CSV and transforms it to a Spark DataFrame
Streamliner Loader inserts the data into MemSQL
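To make the Transformer step concrete, the sketch below parses CSV driver records into typed rows. It uses the standard-library `csv` module as a stand-in for the Spark DataFrame conversion the demo performs, and the column layout (`driver_id`, `timestamp`, `longitude`, `latitude`) is assumed, since the slides do not give the record format.

```python
import csv
from io import StringIO

# Hypothetical column layout for a simulated driver record.
COLUMNS = ("driver_id", "timestamp", "longitude", "latitude")


def parse_driver_records(csv_text):
    """Parse CSV text into a list of typed row dictionaries.
    Rows with the wrong number of fields are skipped."""
    rows = []
    for fields in csv.reader(StringIO(csv_text)):
        if len(fields) != len(COLUMNS):
            continue
        driver_id, timestamp, longitude, latitude = fields
        rows.append({
            "driver_id": int(driver_id),
            "timestamp": timestamp,
            "longitude": float(longitude),
            "latitude": float(latitude),
        })
    return rows
```

Each resulting row corresponds to one record the Loader would then insert into MemSQL.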
36. Resources
▪ PowerStream blog post
http://blog.memsql.com/powerstream-demo/
▪ PowerStream recording
https://youtu.be/DhP324uNZMI?t=589
▪ Supercar blog post
http://blog.memsql.com/real-time-geospatial-intelligence-with-supercar/
▪ Supercar recording
https://www.youtube.com/watch?v=2txICCLUV-Y
▪ Today’s talks will be published soon.