Machine Learning in the IoT with Apache NiFi

Across the globe, energy systems are changing, creating unprecedented challenges for the organisations tasked with ensuring the lights stay on. In the UK, National Grid is facing shrinking margins, looming capacity shortages and unpredictable peaks and troughs in energy supply caused by increasing levels of renewable penetration. Open Energi uses its IoT technology to unlock demand-side capacity - from industrial equipment, co-generation and battery storage systems - creating a smarter grid; one that is cleaner, cheaper, more secure and more efficient. I'll talk about how we use Apache NiFi to orchestrate and coordinate Machine Learning microservices that operate on streams of data coming from IoT devices, providing a layer of fault-tolerance and traceability. With built-in retry logic, backpressure and clustering, NiFi helps us keep hard problems away from our code. It comes with processors that integrate with our cloud provider of choice (Microsoft Azure), fitting seamlessly into our processing pipeline. Finally, its straightforward graphical interface makes it easy enough to use that any team member can step in and troubleshoot a flow with little training.

Machine Learning in the IoT with Apache NiFi
Michael Bironneau
April 2017
@OpenEnergi

Image from Wiki Commons https://en.wikipedia.org/wiki/Pearl_Street_Station

[Figure: Installed Capacity (GW) and Generation (GW), axis from 0 to 35]

[Figure: Total Power (MW, axis from 0 to 20) by time of day, 0:00 to 22:30. Average upwards flex – 120%. Average downwards flex – 35%.]

Our Data
• ~20k telemetry messages/second
• ~5k messages/second report a change of state that requires secondary processing (e.g. validating a forecast)
• Most messages require aggregation for reporting purposes

Why Apache NiFi?
• Data provenance
• Built-in mechanism for backpressure and fault handling
• Easy to use
• Built-in processors for Azure services
• Easy to extend
• Performance not our main concern, but nice to know that it scales
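
To illustrate the backpressure bullet above: in NiFi, a connection between two processors can be given a threshold, and once the queue reaches it the upstream processor stops being scheduled until the queue drains. The sketch below is a conceptual illustration in plain Python, not NiFi's API; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class BoundedConnection:
    """A queue between two processing steps with an object-count threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._items: Deque[str] = deque()

    def is_full(self) -> bool:
        return len(self._items) >= self.threshold

    def offer(self, item: str) -> bool:
        """Accept the item unless the threshold is reached (backpressure)."""
        if self.is_full():
            return False  # upstream should stop producing for now
        self._items.append(item)
        return True

    def poll(self) -> Optional[str]:
        """Hand the oldest item to the downstream step, if there is one."""
        return self._items.popleft() if self._items else None
```

The point of the design is that a slow downstream step slows the producer down instead of letting the queue grow without bound, and the application code does not have to implement this itself.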

Downsides
• Source control of flows – possible but diffs not very readable
• Automated flow testing and CI still remain difficult
• Script components not easy to maintain
• Not all processors work in clustered mode

Computing Response After Dispatch
[Figure, two panels. "Dynamic Demand Response": Response (kW) against Time Elapsed (s), 0 to 30 s. "Connected Power Consumption": Active Power (kW) against Time Elapsed (s), 0 to 30 s, showing the response, the baseline and the duration of the request.]

• Extract JSON properties
• Lookup previous state and cache current state
• Compute and publish state change metrics
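
The three steps of this flow can be sketched in plain Python. This is a minimal illustration, not the flow from the talk: the JSON field names (`device_id`, `active_power_kw`), the `StateChange` type and the in-memory dictionary standing in for the cache are all assumptions.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StateChange:
    device_id: str
    previous_kw: Optional[float]  # None when the device has not been seen before
    current_kw: float

    @property
    def delta_kw(self) -> Optional[float]:
        """Change in active power since the previous message, if known."""
        if self.previous_kw is None:
            return None
        return self.current_kw - self.previous_kw


def process_message(raw: str, cache: Dict[str, float]) -> StateChange:
    """Extract properties, look up the previous state, cache the current one."""
    msg = json.loads(raw)
    device_id = msg["device_id"]                # hypothetical field name
    current_kw = float(msg["active_power_kw"])  # hypothetical field name
    previous_kw = cache.get(device_id)          # lookup previous state
    cache[device_id] = current_kw               # cache current state
    return StateChange(device_id, previous_kw, current_kw)
```

A caller would publish `delta_kw` (or any other metric derived from the two states) downstream; in a clustered setup the dictionary would be replaced by a shared cache.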

Dynamic Demand Response Forecasting
[Figure: Active Power (kW) against Time (s), 0 to 30 s, showing the power before the dispatch request, the forecasted response after it ("After?") and the forecast.]

First approach - pure NiFi solution
• Extract properties from JSON
• Metadata/state lookups and caching
• Score model (Python Script)

Observations
• Fun example, but not practical
• NiFi scripting is not easy to test or maintain
• Long, messy flows are not easy to troubleshoot

2nd approach – Use NiFi as Orchestrator
• Extract JSON properties
• Filter
• Get Forecast
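
In this approach the model lives in a separate HTTP service and the flow only calls it (the speaker notes mention the PostHTTP processor). A minimal sketch of such a service using only the Python standard library might look like the following. The `forecast` function, its field names, the fixed 0.5 share used as a stand-in model and port 8080 are all illustrative assumptions, not details from the talk.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def forecast(payload: dict) -> dict:
    """Placeholder model: forecast the response as a fixed share of current power.

    The field names and the 0.5 share are illustrative assumptions only.
    """
    active_power_kw = float(payload["active_power_kw"])
    return {
        "device_id": payload["device_id"],
        "forecast_response_kw": 0.5 * active_power_kw,
    }


class ForecastHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            result, status = forecast(json.loads(self.rfile.read(length))), 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or missing fields: report a client error
            result, status = {"error": str(exc)}, 400
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the service until interrupted (not called on import)."""
    HTTPServer(("127.0.0.1", port), ForecastHandler).serve_forever()
```

Because the model logic sits in an ordinary function, it can be unit-tested and versioned like any other code, which is the maintainability gain this approach is after.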

Observations
• As practical/maintainable as the HTTP service
• Where did all the logic go? This is boring!
• Why use NiFi at all?
– Traditional stream processing (e.g. Storm)
– Serverless (e.g. Azure Functions)

Real-time Model Validation
[Figure: Forecasting Error. Mean squared error over the day (0% to 40%) and bitumen tank setpoint (135 to 165 °C) plotted against date, with annotations marking a setpoint change and the period in which the model is invalid.]

• Forecast
• Receive data
• Observe Difference
• Increment Accumulated Square Error
• Acceptable Error? (Y / N)
• Fit model parameters
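
The validation loop above can be sketched as a small accumulator. This is an illustration under stated assumptions: the `ModelValidator` name, the use of the mean of the accumulated squared error as the acceptance test and the threshold value are hypothetical, and the slide does not say how "acceptable" is defined.

```python
from dataclasses import dataclass


@dataclass
class ModelValidator:
    max_mean_sq_error: float          # hypothetical acceptance threshold
    accumulated_sq_error: float = 0.0
    observations: int = 0

    @property
    def mean_sq_error(self) -> float:
        if self.observations == 0:
            return 0.0
        return self.accumulated_sq_error / self.observations

    def observe(self, forecast: float, actual: float) -> bool:
        """Add one squared difference; return True if a re-fit is needed."""
        diff = actual - forecast
        self.accumulated_sq_error += diff * diff
        self.observations += 1
        return self.mean_sq_error > self.max_mean_sq_error

    def reset(self) -> None:
        """Clear the accumulator, e.g. after the model has been re-fitted."""
        self.accumulated_sq_error = 0.0
        self.observations = 0
```

A caller would invoke `observe` for each incoming measurement and, when it returns `True`, trigger the parameter fit and then `reset` the accumulator.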

Real-time Validation with NiFi
• Extract JSON properties
• Filter
• Get and cache forecast
• Validate and re-fit if required

Next step
• Store the errors in a max-heap and use these to retrain in priority order
• Better reporting
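
A max-heap of errors can be sketched with Python's `heapq`, which is a min-heap, by storing negated errors. The `RetrainQueue` name and the model identifiers are hypothetical; this only illustrates the idea of retraining the worst model first.

```python
import heapq
from typing import List, Tuple


class RetrainQueue:
    """Models ordered by forecasting error, largest error first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def push(self, model_id: str, error: float) -> None:
        # heapq pops the smallest entry, so store the negated error
        heapq.heappush(self._heap, (-error, model_id))

    def pop_worst(self) -> Tuple[str, float]:
        """Remove and return the model with the largest error."""
        neg_error, model_id = heapq.heappop(self._heap)
        return model_id, -neg_error

    def __len__(self) -> int:
        return len(self._heap)
```

With this structure both inserting an error and taking the next model to retrain cost O(log n), so the queue stays cheap even with many models.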

[Architecture diagram: Model Registry/Proxy, Model 1, Model 1, Persistence]

[Architecture diagram, extended: Model Registry/Proxy, Model 1, Model 1, Persistence, Enrichment/Aggregation, Forecasting/Optimisation]

Machine Learning in the IoT with Apache NiFi

2. What problem are we solving?

6. Our Solution

15. Examples

27. Architecture

32. Thank you for listening

Editor's Notes

We are trying to improve the efficiency of power networks. Utilisation was about 20% at the time of the first power station and is still only about 25% now.
The under-utilisation is even worse for renewables! Consumers end up footing the bill for this chronic inefficiency.
Why? Because we think we can’t control demand, so we have to over-supply in case of spikes…
Let’s control demand!
The gateway contains hard-coded information on the assets it controls, sensors that help it tell when the asset has stored energy and constraints (such as peak tariff avoidance), enabling it to dispatch them when grid frequency is too low or too high.
The aggregation means that each asset need not respond proportionally to grid frequency, but remains free to perform operational duties 94% of the time – our service is invisible to the end customer (except for the monthly checks).
Approximate value per MW per year:
• Dynamic Demand: approx. £85,000
• FCDM / Static FFR: £22,000 - £26,000
• STOR: £10,000 - £15,000
Open Energi is turning the energy system on its head, so that instead of supply adjusting to meet demand, demand adjusts to meet supply. By harnessing small amounts of flexible energy demand from energy-intensive equipment, we can create a virtual power station and displace fossil-fuelled peaking power stations. This is enabling a user-led transformation in how our energy system works, so that businesses and consumers are not only making it happen, but also seeing the benefits. It’s a vital part of our transition to a zero carbon economy, because we cannot maximise our use of renewables unless our demand for energy becomes more responsive.
Basically, we’re 20x cheaper than building a new power station because we just tap into existing infrastructure.
This is not huge data on its own, but:
• Low latency requirement for aggregations
• One message can feed into multiple streams
There are ongoing discussions to improve flow testing, CI and source control.
This is only one half of the flow!
To the third point, using NiFi gives better traceability, instantaneous feedback on pipeline health (i.e. metrics) and a simple UI.
Timeframe for all this – minutes to hours.
As an output of the PostHTTP processor we get not only the forecast but also expectation of error. We keep track of the accumulated square error, so that we can have a single “reduceable” map key in the distributed cache.
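One reading of "reduceable" here is that an accumulated sum, unlike a running mean, can be combined from partial results by simple addition. The sketch below illustrates that property only; keeping an observation count next to the sum is an assumption that the notes do not state, and the function names are hypothetical.

```python
from functools import reduce
from typing import Iterable, Tuple

# (accumulated squared error, number of observations)
Partial = Tuple[float, int]


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial accumulators; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def mean_sq_error(partials: Iterable[Partial]) -> float:
    """Reduce any number of partial accumulators to one mean squared error."""
    total, count = reduce(merge, partials, (0.0, 0))
    return total / count if count else 0.0
```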
• NiFi cluster – 5 nodes, 28 flows
• Flink cluster – 4 nodes, 16 jobs
• Persistence – Azure
In blue – tools used primarily by data science team. In grey – tools used primarily by software team. Others – shared infrastructure.
• NiFi: auditability, shallow learning curve (easy to use), nice UI
• Flink: ultimate control, windowing, steeper learning curve
