Powering a Virtual Power Station with Big Data
Michael Bironneau
April 2016
[Chart: installed capacity (GW) vs generation (GW)]
[Chart: total power (MW) over one day, 0:00-22:30; average upwards flex 120%, average downwards flex 35%]
Open Energi in the coming year:
• 25-40k messages processed per second
• Total size of data 500TB-800TB
Perspective: here’s what “big data” means to Boeing [1]:
• ~64k messages per second from each aircraft
• Total size of data over 100 petabytes
[1]: http://bit.ly/18kQlMn
Our data is not huge at the moment…
[Chart: size of data (PB), Open Energi vs Boeing]
…but after domestic demand-side response (or something else on that scale)
[Chart: projected size of data (PB), Open Energi vs Boeing]
Why Hortonworks Data Platform
• Can scale quickly to respond to market demands
• Interoperability with existing code
• Fantastic data integration
• Knowledgeable technical support
• Security and data governance
Batch | Our HDP setup
[Diagram: asset data, national electricity data, market data, and other “live” timeseries data are ingested via Flume, then flow through batch and streaming paths into Hive and other applications]
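The Flume ingestion leg above could be wired with a minimal agent definition along these lines; the agent name, port, and HDFS path are illustrative assumptions, not Open Energi’s actual configuration:

```properties
# Hypothetical Flume 1.x agent: Avro source -> memory channel -> HDFS sink.
agent1.sources = assetData
agent1.channels = mem1
agent1.sinks = hdfsSink

# "Live" timeseries events arrive as Avro over the network.
agent1.sources.assetData.type = avro
agent1.sources.assetData.bind = 0.0.0.0
agent1.sources.assetData.port = 4141
agent1.sources.assetData.channels = mem1

agent1.channels.mem1.type = memory
agent1.channels.mem1.capacity = 100000

# Land raw events in date-partitioned HDFS directories for Hive to read.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = mem1
agent1.sinks.hdfsSink.hdfs.path = /data/raw/asset/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
```

Several such agents, one per source system, can fan into the same HDFS landing zone, which Hive then reads via external tables.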
Real-time | (Work ongoing)
[Diagram: asset data feeds ML models backed by HDFS, a cache, and Elasticsearch; processing steps include updating ML models, correlating events, and enriching data]
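The “Correlate Events” step above can be sketched as a windowed join between two time-ordered event streams; the event shapes and the 60-second window here are assumptions for illustration, not the production schema:

```python
from datetime import datetime, timedelta

# Assumed event shape: (timestamp, payload), both streams sorted by time.
# The 60-second tolerance is an illustrative choice, not a production value.
WINDOW = timedelta(seconds=60)


def correlate(asset_events, market_events, window=WINDOW):
    """Pair each asset event with market events within +/- window.

    Both inputs must be sorted by timestamp; runs in O(n + m) by
    sliding a lower bound through the market stream.
    """
    paired = []
    lo = 0
    for ts, payload in asset_events:
        # Advance past market events that fell before the window opens.
        while lo < len(market_events) and market_events[lo][0] < ts - window:
            lo += 1
        matches = []
        i = lo
        while i < len(market_events) and market_events[i][0] <= ts + window:
            matches.append(market_events[i][1])
            i += 1
        paired.append((ts, payload, matches))
    return paired
```

For example, an asset switching event at 12:00:00 would pick up a grid-frequency event from 11:59:30 but ignore one five minutes later.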
Apache Hive | Example

Index semi-structured data (Elasticsearch), use Hive to integrate it with timeseries data and other metadata, and farm out complex analytics to Python:

CREATE EXTERNAL TABLE semi_structured_stuff (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'semi/structured',
              'es.index.auto.create' = 'false');

SELECT something FROM semi_structured_stuff
JOIN metadata m ON …
LEFT JOIN timeseries t ON …

SELECT TRANSFORM(something)
USING 'insane_maths.py'
AS (result)
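A Hive TRANSFORM script such as the hypothetical insane_maths.py above is just a process that reads tab-separated rows on stdin and writes result rows to stdout. A minimal sketch follows; the maths applied is a stand-in, not the deck’s actual analytics:

```python
"""Sketch of a Hive TRANSFORM streaming script.

Hive serialises each selected row as a tab-separated line on stdin and
reads tab-separated result rows back from stdout. The transform below
(a signed log) is illustrative only.
"""
import math
import sys


def transform_value(raw):
    """Apply the 'complex analytics' step to one input field."""
    x = float(raw)
    # Signed log transform: easy to verify, symmetric about zero.
    return math.copysign(math.log1p(abs(x)), x)


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        stdout.write("%.6f\n" % transform_value(fields[0]))


if __name__ == "__main__":
    main()
```

This pattern is what makes “re-use existing Python code” cheap: any script honouring the stdin/stdout contract can be dropped into a HiveQL query.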
Benefits
• Reduced storage cost compared to SAN + SQL Server
• Better utilisation of infrastructure thanks to YARN
• Pain-free integration of multiple data sources with external tables
in Hive
• Scale up/down on demand
• Re-use existing Python code = low development overhead
Dynamic Demand
[Diagram: the Dynamic Demand “wheel of data”: predict & forecast, optimise & explore, verify; surrounded by alerts, simulations, insights via web, machine learning, statistical analysis, event correlation, an expert system, real-time aggregation, and a real-time web feed]
Thanks for listening. Any questions?


Editor's Notes

  • #5 There is a powerful economic case for distributing demand more efficiently using DSR technology, regardless of the future generation mix. The capital cost of building a new peaking power station can be up to £5 million per megawatt of power, while the current cost to aggregate a megawatt via Dynamic Demand sits at around £200,000. DSR provides a no-build approach to capacity challenges which is cleaner, cheaper, more secure and faster than the alternatives.
  • #6 Open Energi is turning the energy system on its head, so that instead of supply adjusting to meet demand, demand adjusts to meet supply. By harnessing small amounts of flexible energy demand from energy-intensive equipment we can create a virtual power station and displace fossil-fuelled peaking power stations. This is enabling a user-led transformation in how our energy system works, so that businesses and consumers are not only making it happen but also seeing the benefits. It is a vital part of our transition to a zero-carbon economy, because we cannot maximise our use of renewables unless our demand for energy becomes more responsive.
  • #7 Dynamic Demand can deliver approx. £85,000 per MW/yr; FCDM / static FFR £22,000-£26,000 per MW/yr; STOR £10,000-£15,000 per MW/yr.
  • #8 We capture data at the finest-grain level, stored as change-of-value (COV) records. The challenge is then aggregating multiple timeseries without downsampling; we also need to downsample all these series to multiple resolutions. They are all irregularly sampled, which is what prevents us from using off-the-shelf timeseries databases.
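One way to aggregate irregular COV series exactly, as the note describes, is to treat each series as a step function and re-evaluate the sum at the union of all change points. A minimal stdlib sketch, with assumed (timestamp, value) data shapes:

```python
import bisect

# Each COV series is a time-sorted list of (timestamp, value) change
# points; between change points the value holds (a step function).
# Summing at the union of change points is exact: no downsampling.


def value_at(series, ts):
    """Value of a COV step series at time ts (None before first point)."""
    i = bisect.bisect_right([t for t, _ in series], ts) - 1
    return series[i][1] if i >= 0 else None


def aggregate_cov(series_list):
    """Exact sum of several COV series, itself a COV series."""
    change_points = sorted({t for s in series_list for t, _ in s})
    out = []
    for ts in change_points:
        vals = (value_at(s, ts) for s in series_list)
        out.append((ts, sum(v for v in vals if v is not None)))
    return out
```

For instance, an asset drawing 10 kW from t=0 joined by one drawing 3 kW from t=2 yields the change points (0, 10), (2, 13), and so on, with no resampling error. A production version would index the change points once rather than rebuilding the key list per lookup.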
  • #13 Confidence that our data platform can scale quickly if needed: the markets we operate in are unpredictable, and when the domestic market takes off our data could increase by two orders of magnitude. Fantastic data integration support: we can easily wrap our existing codebase, and reduce our £/GB by 80% for archival data while retaining the ability to query it. Extensibility: new tools are being added to the ecosystem on a regular basis, and more developers trained in the Hadoop ecosystem means easier on-boarding. Knowledgeable support from Hortonworks. Security and governance built into the platform.
  • #15 This is ongoing work; in particular we haven’t quite figured out the “asset data” → Storm part of the pipeline.
  • #17 Not limited by storage cost: we are able to enrich data to reduce the cost of processing. Better utilisation of infrastructure compared to VMs dedicated to a single service: YARN means we can really get the most out of everything. The ability to mix Python with SQL makes aggregation and downsampling easier and more maintainable. Interactive querying of multiple data sources with Spark in Jupyter. Easy ingestion process using multiple Flume agents. Can still use Elasticsearch for small timeseries.
  • #18 Now let’s have a look at where HDP fits into our big “wheel of data”.
  • #20 Not limited by storage cost: we are able to enrich data to reduce the cost of processing. The ability to mix Python with SQL makes aggregation and downsampling easier and more maintainable. Interactive querying of multiple data sources with Spark in Jupyter. Easy ingestion process using multiple Flume agents. Can still use Elasticsearch for small timeseries.