Real-Time, Geospatial, Maps
Neil Dahlke
29 June 2016
Agenda
▪ PowerStream
▪ Supercar
▪ Q&A
▪ Drinks
Renewable Energy
in the News
BBC: http://www.bbc.com/news/science-environment-36420750
Investment in renewables
reached $286 billion worldwide
in 2015
Germany Just Got Almost All of Its
Power From Renewable Energy
May 15, 2016
Bloomberg: http://www.bloomberg.com/news/articles/2016-05-16/germany-just-got-almost-all-of-its-power-from-renewable-energy
Denmark is aiming for
50% renewable
energy sources within
the next five years
Independent: http://www.independent.co.uk/environment/germany-just-got-almost-all-of-its-power-from-renewable-energy-a7037851.html
42% of electricity
produced from wind
turbines in 2015
The Guardian: http://www.theguardian.com/environment/2016/jan/18/denmark-broke-world-record-for-wind-power-in-2015
Portugal Runs for Four Days
Straight on Renewable Energy Alone
The Guardian: http://www.theguardian.com/environment/2016/may/18/portugal-runs-for-four-days-straight-on-renewable-energy-alone
22% of electricity
provided by wind in 2015
MemSQL PowerStream
Predicting the global health of wind turbines
Sensors
Wind Turbine Wind Farm
197,000 wind turbines around the world
1 to 2 million data points per second
with MemSQL Streamliner
Simulation Details
Data producers (Python programs) push to Kafka
▪ 1M data points per second from 200k turbines
▪ Generated sensor data is based on a predetermined turbine failure model
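A minimal sketch of such a data producer, under stated assumptions: the failure model (linear decay with a periodic repair), the vibration sensor, the message fields, and the function names are all hypothetical, and `send` is a plain callable standing in for a Kafka producer so that the sketch runs without a broker.

```python
import json
import random


def health_at(step, decay_per_step=0.1, cycle_steps=6):
    """Hypothetical failure model: health starts at 1.0, falls linearly,
    and resets to 1.0 (a repair) every cycle_steps steps."""
    return 1.0 - decay_per_step * (step % cycle_steps)


def make_readings(turbine_id, steps, seed=0):
    """Yield JSON-encoded sensor messages for one turbine."""
    rng = random.Random(seed)
    for step in range(steps):
        health = health_at(step)
        # Assumed sensor: vibration grows as health falls, plus small noise.
        vibration = (1.0 - health) * 10.0 + rng.uniform(-0.1, 0.1)
        yield json.dumps({
            "turbine_id": turbine_id,
            "step": step,
            "vibration": round(vibration, 3),
        })


def produce(send, turbine_ids, steps):
    """Push every reading through `send`, the stand-in for a Kafka producer."""
    count = 0
    for turbine_id in turbine_ids:
        for message in make_readings(turbine_id, steps):
            send(message)
            count += 1
    return count


sent = []
produce(sent.append, turbine_ids=[1, 2], steps=3)
```

In a real deployment the `send` argument would wrap a Kafka client's publish call instead of `list.append`.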
The Transform step models failures of individual turbines
(2 components per turbine) with machine learning, determining:
▪ How fast is the turbine deteriorating?
▪ How bad does the turbine get before being repaired?
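The two questions above can be illustrated with a small regression sketch. This is not the demo's model: it assumes an ordinary least-squares line fitted to a component's health readings over time, and the names `fit_line` and `summarize_component` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def summarize_component(times, healths):
    """Answer both questions for one component between two repairs."""
    slope, _ = fit_line(times, healths)
    return {
        # How fast is it deteriorating? Health lost per unit of time.
        "deterioration_rate": -slope,
        # How bad does it get before repair? Lowest health observed.
        "worst_health": min(healths),
    }


summary = summarize_component([0, 1, 2, 3], [1.0, 0.9, 0.8, 0.7])
```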
How does it work?
REAL-TIME
INPUTS
REAL-TIME
APPLICATION
Demo Architecture and Data Flow
Extract, Transform, Load
Simulated sensor data is written to Kafka
Streamliner Extractor pulls data from Kafka into Spark
Streamliner Transformer then “scores” the failure model (ML algorithm)
• The failure model is scored by performing a regression on incoming sensor data values
Streamliner Loader inserts the data into MemSQL
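The extract, transform, load flow can be sketched in plain Python. This is only an illustration of the three stages, not Streamliner's API: a list stands in for the Kafka topic, another list stands in for the MemSQL table, and the linear scoring coefficients are made up.

```python
import json


def extract(topic):
    """Stand-in for the Extractor: decode each raw message from the topic."""
    for raw in topic:
        yield json.loads(raw)


def transform(records, weight=-0.1, bias=1.0):
    """Stand-in for the Transformer: score each reading with a linear model.
    weight and bias are illustrative values, not the demo's fitted model."""
    for record in records:
        score = bias + weight * record["vibration"]
        # Clamp the score into the range [0, 1].
        yield {**record, "predicted_health": max(0.0, min(1.0, score))}


def load(rows, table):
    """Stand-in for the Loader: append rows to an in-memory table."""
    for row in rows:
        table.append(row)
    return len(table)


topic = [json.dumps({"turbine_id": 7, "vibration": v}) for v in (0.0, 5.0, 20.0)]
table = []
load(transform(extract(topic)), table)
```

Because each stage is a generator, records flow through one at a time, which loosely mirrors how a streaming pipeline handles data as it arrives.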
Cluster Architecture
Aggregator
Nodes
Leaf
Nodes
Data Producer, Kafka, Spark, MemSQL Agg, MemSQL Leaf (diagram: this stack appears 8 times, once per machine)
ZooKeeper
Spark Master
Internet-of-Things simulation depicting
health of wind turbines globally.
8 machines: AWS c4.2xlarge
instances, at
$0.311 per hour per machine,
annual cost ~$22,000.
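The annual figure follows from the hourly rate, assuming all 8 machines run continuously for a 365-day year at the quoted price:

```python
MACHINES = 8
HOURLY_RATE_USD = 0.311      # per machine, as quoted on the slide
HOURS_PER_YEAR = 24 * 365    # assumes the cluster runs around the clock

annual_cost = MACHINES * HOURLY_RATE_USD * HOURS_PER_YEAR
print(f"${annual_cost:,.2f}")  # about $21,795, which rounds to ~$22,000
```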
Visual Layer
▪ MemSQL data is rendered in a web UI
• Turbine Health (green, yellow, red)
▪ Draw positions of turbines on a MapBox map
• A geospatial query is sent to MemSQL each time the map view is moved
▪ Alerts based on predicted turbine health
▪ Data points shown on the UI map all come from real-time queries
• Real-time in this case means a 1-second interval
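A sketch of what the map-move query and the health colouring might look like. The deck does not show the actual query, so the table name, column names, colour thresholds, and the choice of `GEOGRAPHY_INTERSECTS` are all assumptions. The polygon is written longitude first.

```python
def bounds_to_wkt(west, south, east, north):
    """Closed WKT polygon for the visible map rectangle (longitude first)."""
    corners = [
        (west, south), (east, south), (east, north), (west, north),
        (west, south),  # repeat the first corner to close the ring
    ]
    return "POLYGON((" + ", ".join(f"{lng} {lat}" for lng, lat in corners) + "))"


# Hypothetical table and column names; %s is a driver-style parameter
# placeholder that would receive the WKT polygon for the current view.
VIEW_QUERY = (
    "SELECT turbine_id, location, predicted_health "
    "FROM turbines "
    "WHERE GEOGRAPHY_INTERSECTS(location, %s)"
)


def health_color(predicted_health, yellow_below=0.7, red_below=0.4):
    """Map a health score to a marker colour; thresholds are illustrative."""
    if predicted_health < red_below:
        return "red"
    if predicted_health < yellow_below:
        return "yellow"
    return "green"


view = bounds_to_wkt(-74.1, 40.6, -73.8, 40.9)
```

The UI would rebuild `view` whenever the map moves and rerun the query on the 1-second interval described above.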
Demo
The On-Demand
Economy
MemSQL Supercar
Real-time asset tracking and analysis
We live in an
on-demand
economy
Consumers are conditioned to instant
services, like Uber, Stripe, and Airbnb
Where does
that leave
enterprises?
Racing to meet
internal and
external
expectations for
speed and
personalization
Batch
processing is
the enterprise
enemy
Enterprises
must move from
overnight to
real-time, intra-day operations
Cluster Architecture
▪ A single 16-core machine with 64 GB RAM is enough to
handle all of the data in real time.
▪ That’s really it
Data Producer
Kafka
Spark
MemSQL Agg
MemSQL Leaf
ZooKeeper
Spark Master
Simulation Details
▪ NYC Taxi and Limo Commission Trip Record Data
• Downloads available each year for free
▪ The simulation uses the dataset from New Year's Eve
(one of the busiest days for cabs in NYC)
▪ Drivers are assigned pickups and dropoffs from the real
data set
▪ Routes are replayed over time
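Replaying a route over time can be sketched as interpolation between a trip's pickup and dropoff. This assumes straight-line movement at constant speed, which is a simplification; the demo may follow real street routes. The trip fields and the function name are hypothetical.

```python
def position_at(trip, t):
    """Estimate a driver's (longitude, latitude) at time t, in seconds."""
    start, end = trip["pickup_time"], trip["dropoff_time"]
    if t <= start:
        return trip["pickup"]
    if t >= end:
        return trip["dropoff"]
    # Fraction of the trip completed at time t.
    fraction = (t - start) / (end - start)
    (lng0, lat0), (lng1, lat1) = trip["pickup"], trip["dropoff"]
    return (lng0 + fraction * (lng1 - lng0), lat0 + fraction * (lat1 - lat0))


trip = {
    "pickup": (0.0, 0.0),
    "dropoff": (2.0, 4.0),
    "pickup_time": 0,
    "dropoff_time": 100,
}
midpoint = position_at(trip, 50)
```

Calling `position_at` once per tick for every active trip yields the stream of driver positions that a producer would publish.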
Extract, Transform, Load
Simulated driver data is written to Kafka
Streamliner Extractor pulls data from Kafka into Spark
Streamliner Transformer parses the CSV and transforms it to a Spark DataFrame
Streamliner Loader inserts the data into MemSQL
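The CSV parsing step might look roughly like the following. The column layout is an assumption, since the deck does not show the message format, and the standard-library `csv` module stands in for Spark here. The result is a list of typed rows, the shape that would precede building a DataFrame.

```python
import csv
import io

# Hypothetical column layout for one driver position message.
FIELDS = ["driver_id", "timestamp", "longitude", "latitude"]


def parse_lines(lines):
    """Parse raw CSV lines into typed rows."""
    reader = csv.DictReader(io.StringIO("\n".join(lines)), fieldnames=FIELDS)
    rows = []
    for record in reader:
        rows.append({
            "driver_id": record["driver_id"],
            "timestamp": int(record["timestamp"]),
            "longitude": float(record["longitude"]),
            "latitude": float(record["latitude"]),
        })
    return rows


rows = parse_lines(["d1,1451606400,-73.98,40.75", "d2,1451606401,-73.99,40.76"])
```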
Demo
Q&A
Resources
▪ PowerStream blog post
http://blog.memsql.com/powerstream-demo/
▪ PowerStream recording
https://youtu.be/DhP324uNZMI?t=589
▪ Supercar blog post
http://blog.memsql.com/real-time-geospatial-intelligence-with-supercar/
▪ Supercar recording
https://www.youtube.com/watch?v=2txICCLUV-Y
▪ Today’s talks will be published soon.
Thank You
