Distributed Near Real-Time Processing
of Sensor Network Data Flows
for Smart Grids
Advisor: Prof. Dr. Philippe O. A. Navaux
Co-advisor: Prof. M.Sc. Eduardo Roloff
Otávio Moraes de Carvalho
January 16, 2016
Institute of Informatics | Federal University of Rio Grande do Sul
Table of contents
1. Introduction
2. Background
3. Design
4. Implementation
5. Evaluation
6. Conclusion and Future work
Introduction
• Motivation
  • Internet Ubiquity
  • Ubiquity of Sensors
  • Data velocity
  • Smart Grids
• Objective
  • Provide a scalable platform for distributed near real-time processing
    of sensor networks data flows, focused on data profiles of Smart
    Grids
1. How to scale a distributed platform for IoT?
2. How to provide insights in near real-time?
3. How to test a platform like this?
Internet of Things
• Pervasive presence of sensors that are able to interact with each
other through unique addressing schemes and to cooperate with their
neighbours to reach common goals. [?]
Figure 1: Total units of connected devices - Gartner Inc. 2013 Forecast [?]
Internet of Things
Figure 2: IoT paradigm as the convergence of different visions [?]
Distributed Stream Processing Systems
• Online applications that require real-time or near-real-time
processing functionalities are the main motivation.
• Low-latency alternatives to the Hadoop (MapReduce) processing
approach are needed [?].
• Common requirements:
1. Input streams with high to very high data rates (> 10000
events/s).
2. Relaxed latency constraints (up to a few seconds).
3. Use cases that require correlating historical and live data.
4. Systems that scale elastically and support diverse workloads.
5. Low-overhead fault tolerance that supports out-of-order events and
exactly-once semantics.
Distributed Stream Processing Systems
• The most prominent frameworks in the state of the art:
1. Apache Storm
2. Apache Spark Streaming
3. Apache Flink
Cloud Computing
• According to the NIST definition [?], Cloud Computing is a model that
conveniently provides on-demand network access to a shared pool
of configurable computing resources that can be provisioned and
released quickly, with minimal management effort or
interaction with the service provider.
Figure 3: Cloud Computing service models stack and their relationships
Big Data
• NIST defines big data as “Big data shall mean the data of which
the data volume, acquisition speed, or data representation limits
the capacity of using traditional relational methods to conduct
effective analysis or the data which may be effectively processed with
important horizontal zoom technologies”. [?]
• “3Vs” model: [?]
1. Volume: with the increasing generation and collection of masses
of data, data scale becomes increasingly large.
2. Variety: the various types of data, which include
semi-structured and unstructured data such as audio, video,
web pages, and text, as well as traditional structured data.
3. Velocity: the timeliness of big data. Data collection and
analysis must be conducted rapidly and in a timely manner, so as to
make the most of the commercial value of big data.
Smart Grids
• For 100 years, there has been no change in the basic structure of
the electrical power grid. Experience has shown that the
hierarchical, centrally controlled grid of the 20th century is ill-suited
to the needs of the 21st century.
• Advanced Metering Infrastructure (AMI): Infrastructure for
gathering information through smart meters. Drives the need for
high throughput when a large number of IoT meters is deployed.
• Demand Side Management (DSM): Management of energy generation
peaks, which reduces the need for investments in power
generation sources.
• Energy Consumption Forecasts: Predict the amount of
electricity consumed at a given point in time. The purpose of
electricity load forecasting is efficient economic and quality
planning of energy generation. Drives the need for low
processing latency.
Architecture
• We found a few architectural patterns in the state of the art:
1. Lambda Architecture
2. Kappa Architecture
3. Liquid Architecture
Cyclic Architecture
• We propose the Cyclic Architecture, a hybrid solution that combines
architectural elements of the Kappa Architecture and the Liquid
Architecture.
Figure 4: An overview of the proposed Cyclic Architecture
Dataset
1. The dataset used to evaluate the platform originates from the 8th
ACM International Conference on Distributed Event-Based Systems
(DEBS 2014).
2. The synthesized data file contains over 4055 million
measurements for 2125 plugs distributed across 40 houses, for a
total of 136 GB.
3. The generated measurements cover a period of one month, from Sept.
1st, 2013, 00:00:00, to Sept. 30th, 2013, 23:59:59. For our tests, we
used a subset of this file with 100 million measurements,
covering the same number of plugs and houses, for a total of
3.6 GB.
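As a rough illustration of how such measurements might be read, the sketch below parses comma-separated rows into typed records. The field layout (id, timestamp, value, property, plug_id, household_id, house_id) is an assumption taken from the public description of the DEBS 2014 challenge data, not something stated in these slides, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Measurement(NamedTuple):
    # Assumed field layout; not confirmed by the slides.
    id: int
    timestamp: int       # seconds since the Unix epoch
    value: float         # measured value
    is_load: bool        # True for a load reading, False for a work reading
    plug_id: int
    household_id: int
    house_id: int


def parse_measurements(lines: Iterable[str]) -> Iterator[Measurement]:
    """Yield one Measurement per non-empty CSV row."""
    for row in csv.reader(lines):
        if not row:
            continue
        yield Measurement(
            id=int(row[0]),
            timestamp=int(row[1]),
            value=float(row[2]),
            is_load=(row[3] == "1"),
            plug_id=int(row[4]),
            household_id=int(row[5]),
            house_id=int(row[6]),
        )


# Made-up sample rows, for illustration only.
SAMPLE = io.StringIO(
    "1,1377986401,68.451,0,11,0,0\n"
    "2,1377986401,19.927,1,11,0,0\n"
    "3,1377986402,0.0,1,3,1,0\n"
)

# Keep only the load readings, which are what the forecast operates on.
loads = [m for m in parse_measurements(SAMPLE) if m.is_load]
print(len(loads), loads[0].value)
```

Parsing lazily with a generator keeps memory use flat, which matters for a file of this size.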
Forecasting Method
• The forecasting method was selected because the algorithm had to
fit the processing capabilities of a
distributed stream processing framework. It represents a mixed
approach between MLP (Multilayer Perceptron) and
Autoregressive Integrated Moving Average (ARIMA) [?].
• More specifically, the set of queries provides a forecast of the load for
(1) each house, i.e., house-based, and (2) each individual plug,
i.e., plug-based. The forecast for each house and plug is made
based on the current load of the connected plugs and a plug-specific
prediction model.
• The aim of these queries is not to provide the best prediction model,
but to stress the interplay between modules for model learning
that operate on long-term (historical) data and components that
apply the model on top of live, high-velocity data.
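The difference between the plug-based and house-based variants can be sketched as follows, assuming (as the formula slide states) that a plug's load for a slice is the average of its readings in that slice and a house's load is the sum of its plugs' averages. The function names, the tuple layout of a reading, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def plug_averages(readings, slice_start, slice_len):
    """Average load per (house, plug) for readings inside one time slice.

    readings: iterable of (timestamp, house_id, plug_id, load) tuples.
    """
    per_plug = defaultdict(list)
    for ts, house, plug, load in readings:
        if slice_start <= ts < slice_start + slice_len:
            per_plug[(house, plug)].append(load)
    return {key: mean(values) for key, values in per_plug.items()}


def house_averages(plug_avg):
    """House-level load: the sum of the per-plug averages in that house."""
    per_house = defaultdict(float)
    for (house, _plug), avg in plug_avg.items():
        per_house[house] += avg
    return dict(per_house)


# Made-up readings: (timestamp, house_id, plug_id, load)
readings = [
    (0, 0, "a", 10.0),
    (30, 0, "a", 20.0),
    (10, 0, "b", 4.0),
    (70, 0, "a", 99.0),   # outside the 60-second slice starting at 0
    (5, 1, "a", 7.0),
]

plugs = plug_averages(readings, slice_start=0, slice_len=60)
houses = house_averages(plugs)
print(plugs)
print(houses)
```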
Forecasting Method
\[
L(s_{i+2}) = \frac{\mathrm{avgL}(s_i) + \mathrm{median}(\mathrm{avgL}(s_j))}{2} \tag{1}
\]
In formula (1), avgL(s_i) represents the current average load for the
slice s_i. The value of avgL(s_i), in the case of plug-based prediction, is
calculated as the average of all load values reported by the given plug
with timestamps ∈ s_i. In the case of house-based prediction, avgL(s_i) is
calculated as the sum of the average values for each plug within the house.
avgL(s_j) is the set of average load values for all slices s_j such that:
\[
s_j = s_{i+2-n \cdot k} \tag{2}
\]
where k is the number of slices in a 24-hour period and n is a natural number
with values between 1 and \lfloor (i+2)/k \rfloor. The value of avgL(s_j) is calculated
analogously to avgL(s_i) for the plug-based and house-based (sum of
averages) variants.
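A minimal sketch of formulas (1) and (2), assuming slices are indexed from 0 and that the per-slice averages are already available in a list. The function name `forecast` and the fallback for the case with no historical slice are assumptions made for illustration, not part of the original method description.

```python
from statistics import median


def forecast(avg_loads, i, k):
    """Predict the load of slice i+2 following formulas (1) and (2).

    avg_loads: list where avg_loads[j] is avgL(s_j) for j = 0..i
    i:         index of the current slice (0-based, an assumption)
    k:         number of slices in a 24-hour period (must be >= 2 here)
    """
    if k < 2:
        raise ValueError("k must be at least 2 so that history lies in the past")
    current = avg_loads[i]
    target = i + 2
    # Slices at the same time of day on previous days: s_{i+2-n*k}
    history = [avg_loads[target - n * k] for n in range(1, target // k + 1)]
    if not history:
        # Assumption: with no earlier day to look at, use the current average.
        return current
    return (current + median(history)) / 2


# Made-up averages for 10 slices with k = 4 slices per day.
loads = [10, 12, 14, 16, 20, 22, 24, 26, 30, 32]
print(forecast(loads, i=9, k=4))
```

In this example the target slice is 11, the historical slices are 7 and 3 (averages 26 and 16, median 21), and the current average is 32, so the forecast is (32 + 21) / 2 = 26.5.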
Implementation
Figure 5: An overview of the stack used to implement the Cyclic Architecture
Processing flow
Figure 6: An overview of the data processing flow
Platform
• To evaluate the system, we needed a platform on which to
run our tests. The platform was built on Microsoft
Azure, which hosted our application, and was configured using the
following settings:
Latency
Figure 7: Best case scenario - Large batches with 8 processing nodes
Latency
Figure 8: Worst case scenario - Small batches with 1 processing node
Throughput
Figure 9: Average message throughput by number of nodes, with 30-second
batches
Throughput
Figure 10: Average message throughput by batch size, with 8 processing
nodes
Conclusion
• A system for processing distributed near real-time data flows, with a
focus on Smart Grid data profiles, was successfully designed and
implemented.
• The system scales linearly up to 8 processing
nodes, which is important for processing data from large numbers of smart
meters.
• The system provides the desired latencies, which is
important for delivering load forecasts in time to be used. However,
very small batch sizes were found to make processing unstable.
• Larger batch sizes improve throughput at the
expense of latency, which increases proportionally.
Future work
• Improving throughput by increasing the number of parallel
data input feeds into Apache Kafka.
• Deeper research on load forecasting and evaluation of forecast
accuracy.
• Studies on fault tolerance and system availability.
• An abstraction layer for machine deployment and management,
using Apache YARN or Apache Mesos with Docker containers.
Questions?
References
L. Atzori et al.
The Internet of Things: A Survey.
Computer networks, 54(15):2787–2805, 2010.
T. Bylander and B. Rosen.
A Perceptron-like Online Algorithm for Tracking the Median.
In Neural Networks, 1997., International Conference on, volume 4,
pages 2219–2224. IEEE, 1997.
D. Laney.
3-D Data Management: Controlling Data Volume, Velocity and Variety.
META Group Original Research Note, 2001.
I. Lee et al.
The Internet of Things (IoT): Applications, Investments and
Challenges for Enterprises.
Business Horizons, 2015.
P. Mell and T. Grance.
The NIST definition of Cloud Computing.
2011.
A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt,
S. Madden, and M. Stonebraker.
A Comparison of Approaches to Large-Scale Data Analysis.
In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages
165–178. ACM, 2009.
NIST Big Data Public Working Group (NBD-PWG).
NIST Big Data Interoperability Framework.
Reference Architecture, 2014.
D-Streams
• Treat streaming computation as a series of deterministic batch
computations on small time intervals.
• D-Streams bring traditional functional transformation operators and
introduce new stateful operators that work over multiple intervals.
These include:
  • Windowing
  • Incremental aggregation over sliding windows
  • Time-skewed joins
Figure 11: Comparison between a simple and a windowed DStream
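The idea of treating a stream as a series of small batches, with a sliding-window aggregate maintained incrementally across them, can be sketched in plain Python. This is a library-free illustration rather than Spark Streaming's API; the function names and sample events are hypothetical, and a sum is used as the aggregate because it can be updated by adding the newest batch and subtracting the one that leaves the window.

```python
from collections import deque


def batch_by_interval(events, interval):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Intervals with no events yield an empty batch so that windows stay aligned.
    """
    batches = {}
    for ts, value in events:
        batches.setdefault(ts // interval, []).append(value)
    if not batches:
        return []
    first, last = min(batches), max(batches)
    return [batches.get(b, []) for b in range(first, last + 1)]


def windowed_sums(batch_sums, window):
    """Yield the sum over the last `window` batches after each new batch."""
    recent = deque()
    total = 0
    for s in batch_sums:
        recent.append(s)
        total += s                      # add the newest batch
        if len(recent) > window:
            total -= recent.popleft()   # subtract the batch leaving the window
        yield total


# Made-up events: (timestamp in seconds, value)
events = [(0, 1), (1, 2), (5, 3), (11, 4), (12, 5), (21, 6)]

batches = batch_by_interval(events, interval=5)
sums = [sum(b) for b in batches]               # one deterministic result per batch
windows = list(windowed_sums(sums, window=3))  # sliding window over 3 batches
print(batches)
print(sums)
print(windows)
```

Each batch is processed independently and deterministically, while the window carries state across intervals, which is the distinction between the simple and the windowed stream.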
30

More Related Content

What's hot

High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
Saliya Ekanayake
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
KamleshKumar394
 
An optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computingAn optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computing
DIGVIJAY SHINDE
 

What's hot (20)

Big Data Visualization Problem in IT Management
Big Data Visualization Problem in IT ManagementBig Data Visualization Problem in IT Management
Big Data Visualization Problem in IT Management
 
Big data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at KitwareBig data visualization frameworks and applications at Kitware
Big data visualization frameworks and applications at Kitware
 
STDCS
STDCSSTDCS
STDCS
 
A TIME EFFICIENT APPROACH FOR DETECTING ERRORS IN BIG SENSOR DATA ON CLOUD
A TIME EFFICIENT APPROACH FOR DETECTING ERRORS IN BIG SENSOR DATA ON CLOUDA TIME EFFICIENT APPROACH FOR DETECTING ERRORS IN BIG SENSOR DATA ON CLOUD
A TIME EFFICIENT APPROACH FOR DETECTING ERRORS IN BIG SENSOR DATA ON CLOUD
 
Fast raq a fast approach to range aggregate queries in big data environments
Fast raq a fast approach to range aggregate queries in big data environmentsFast raq a fast approach to range aggregate queries in big data environments
Fast raq a fast approach to range aggregate queries in big data environments
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
 
A time efficient approach for detecting errors in big sensor data on cloud
A time efficient approach for detecting errors in big sensor data on cloudA time efficient approach for detecting errors in big sensor data on cloud
A time efficient approach for detecting errors in big sensor data on cloud
 
Fast raq a fast approach to range aggregate queries in big data environments
Fast raq a fast approach to range aggregate queries in big data environmentsFast raq a fast approach to range aggregate queries in big data environments
Fast raq a fast approach to range aggregate queries in big data environments
 
Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiStanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
Task Scheduling methodology in cloud computing
Task Scheduling methodology in cloud computing Task Scheduling methodology in cloud computing
Task Scheduling methodology in cloud computing
 
An optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computingAn optimized scientific workflow scheduling in cloud computing
An optimized scientific workflow scheduling in cloud computing
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
task scheduling in cloud datacentre using genetic algorithm
task scheduling in cloud datacentre using genetic algorithmtask scheduling in cloud datacentre using genetic algorithm
task scheduling in cloud datacentre using genetic algorithm
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
 

Similar to Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart Grids

Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd Iaetsd
 

Similar to Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart Grids (20)

Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
F233842
F233842F233842
F233842
 
Grid computing & its applications
Grid computing & its applicationsGrid computing & its applications
Grid computing & its applications
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
Grid computing
Grid computingGrid computing
Grid computing
 
GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing En...
GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing En...GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing En...
GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing En...
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
A Survey on Neural Network Based Minimization of Data Center in Power Consump...
A Survey on Neural Network Based Minimization of Data Center in Power Consump...A Survey on Neural Network Based Minimization of Data Center in Power Consump...
A Survey on Neural Network Based Minimization of Data Center in Power Consump...
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
GRID COMPUTING
GRID COMPUTINGGRID COMPUTING
GRID COMPUTING
 
Computation grid as a connected world
Computation grid as a connected worldComputation grid as a connected world
Computation grid as a connected world
 
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
 
A cloud service architecture for analyzing big monitoring data
A cloud service architecture for analyzing big monitoring dataA cloud service architecture for analyzing big monitoring data
A cloud service architecture for analyzing big monitoring data
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
 
Service oriented cloud architecture for improved performance of smart grid ap...
Service oriented cloud architecture for improved performance of smart grid ap...Service oriented cloud architecture for improved performance of smart grid ap...
Service oriented cloud architecture for improved performance of smart grid ap...
 
Service oriented cloud architecture for improved
Service oriented cloud architecture for improvedService oriented cloud architecture for improved
Service oriented cloud architecture for improved
 
Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)
 
migrate-case-study
migrate-case-studymigrate-case-study
migrate-case-study
 
Grid computing
Grid computingGrid computing
Grid computing
 

More from Otávio Carvalho

More from Otávio Carvalho (8)

Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
 
Stream Processing - ThoughtWorks Architecture Group - 2017
Stream Processing - ThoughtWorks Architecture Group - 2017Stream Processing - ThoughtWorks Architecture Group - 2017
Stream Processing - ThoughtWorks Architecture Group - 2017
 
Stream Processing: Uma visão geral - TDC Porto Alegre / FISL 17
Stream Processing: Uma visão geral - TDC Porto Alegre / FISL 17Stream Processing: Uma visão geral - TDC Porto Alegre / FISL 17
Stream Processing: Uma visão geral - TDC Porto Alegre / FISL 17
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
A Survey of the State-of-the-art in Event Processing
A Survey of the State-of-the-art in Event ProcessingA Survey of the State-of-the-art in Event Processing
A Survey of the State-of-the-art in Event Processing
 
Análise e Caracterização das Novas Ferramentas para Computação em Nuvem
Análise e Caracterização das Novas Ferramentas para Computação em NuvemAnálise e Caracterização das Novas Ferramentas para Computação em Nuvem
Análise e Caracterização das Novas Ferramentas para Computação em Nuvem
 
Utilização de traços de execução para migração de aplicações para a nuvem
Utilização de traços de execução para migração de aplicações para a nuvemUtilização de traços de execução para migração de aplicações para a nuvem
Utilização de traços de execução para migração de aplicações para a nuvem
 

Recently uploaded

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 

Recently uploaded (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 

Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart Grids

  • 1. Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart Grids Advisor: Prof. Dr. Philippe O. A. Navaux Co-advisor: Prof. M.Sc. Eduardo Roloff . Otávio Moraes de Carvalho January 16, 2016 Institute of Informatics | Federal University of Rio Grande do Sul
  • 2. Table of contents . 1. Introduction 2. Background 3. Design 4. Implementation 5. Evaluation 6. Conclusion and Future work 2
  • 3. Introduction . • Motivation • Internet Ubiquity • Ubiquity of Sensors • Data velocity • Smart Grids • Objective • Provide a scalable platform for distributed near real-time processing of sensor networks data flows, focused on data profiles of Smart Grids 1. How to scale a distributed platform for IoT? 2. How to provide insights in near real-time? 3. How to test a platform like this? 3
  • 4. Internet of Things . • Pervasivity of sensors, that have ability to interact with each other through unique addressing schemes, and cooperate with their neighbours to reach common goals. [?] Figure 1: Total units of connected devices - Gartner Inc. 2013 Forecast [?] 4
  • 5. Internet of Things . Figure 2: IoT paradigm as the convergence of different visions [?] 5
  • 6. Distributed Stream Processing Systems . • Online applications that require real-time or near-real-time processing functionalities are the main motivation. • Low latency alternatives to Hadoop processing approach (MapReduce) are needed [?]. • Common requirements: 1. Input streams with high up to very high data rates (> 10000 events/s). 2. Relaxed latency constraints (up to a few seconds). 3. Use cases require the correlation among historical and live data. 4. Systems that elastically scale and to support diverse workloads. 5. Low overhead fault tolerance supporting out of order events and exactly once semantic. 6
  • 7. Distributed Stream Processing Systems . • The most prominent frameworks found on the state-of-the-art: 1. Apache Storm 2. Apache Spark Streaming 3. Apache Flink 7
  • 8. Cloud Computing . • According to NIST definition [?], Cloud Computing is a model that conveniently provides on-demand network access to a shared pool of configurable computing resources that can be provisioned and released quickly without large management efforts and interaction with the service provider. Figure 3: Cloud Computing service models stack and their relationships 8
  • 9. Big Data . • NIST defines big data as ”Big data shall mean the data of which the data volume, acquisition speed, or data representation limits the capacity of using traditional relational methods to conduct effective analysis or the data which may be effectively processed with important horizontal zoom technologies”. [?] • ”3Vs” model: [?] 1. Volume, following the increasing generation and collection of masses of data, data scale becomes increasingly big. 2. Variety, indicates the various types of data, which include semi-structured and unstructured data such as audio, video, webpage, and text, as well as traditional structured data. 3. Velocity, meaning the timeliness of big data, specifically, data collection and analysis, etc. that must be rapidly and timely conducted, so as to maximumly utilize the commercial value of big data. 9
  • 10. Smart Grids . • For 100 years, there has been no change in the basic structure of the electrical power grid. Experiences have shown that the hierarchical, centrally controlled grid of the 20th Century is ill-suited to the needs of the 21st Century. • Advanced Metering Infrastructure (AMI): Infrastructure for information gathering through smart meters. Drives the need for high throughput when using large number of IoT meters. • Demand Side Management (DSM): Energy generation peak management and reductions of the need for investments in power generation sources. • Energy Consumption Forecasts: Provide a prediction of an amount of electricity consumed at a certain point of time. The purpose of electricity load forecasting is an efficient economic and quality planning of energy generation. Drives the need for low processing latency. 10
  • 11. Architecture . • A found a few architectural patterns on the state-of-the-art: 1. Lambda Architecture 2. Kappa Architecture 3. Liquid Architecture 11
  • 12. Cyclic Architecture . • We propose the Cyclic architecture, a hybrid solution that combines architectural elements of the Kappa and Liquid architectures. Figure 4: An overview of the proposed Cyclic Architecture
  • 13. Dataset . 1. The dataset used to evaluate the platform originates from the 8th ACM International Conference on Distributed Event-Based Systems (DEBS 2014). 2. The synthesized data file contains over 4,055 million measurements for 2,125 plugs distributed across 40 houses, for a total of 136 GB. 3. The generated measurements cover a period of one month, from Sept. 1st, 2013, 00:00:00, to Sept. 30th, 2013, 23:59:59. For our tests, we used a subset of this file containing 100 million measurements, with the same number of plugs and houses, for a total of 3.6 GB.
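The dataset figures imply an average input rate and record size, which can be estimated with a little arithmetic. The sketch below is only a rough estimate under stated assumptions: "over 4055 million" is treated as exactly 4,055 million, GB is taken to mean 10**9 bytes, and the period is taken as 30 full days. The constant and function names are illustrative and not part of the original work.

```python
# Figures quoted in the dataset description (assumptions noted above).
FULL_MEASUREMENTS = 4_055_000_000      # "over 4055 million", taken as exact
FULL_BYTES = 136 * 10**9               # 136 GB, assuming 1 GB = 10**9 bytes
SUBSET_MEASUREMENTS = 100_000_000      # subset used for the tests
SUBSET_BYTES = 3_600_000_000           # 3.6 GB, same GB assumption
SECONDS_IN_PERIOD = 30 * 24 * 60 * 60  # Sept. 1st to Sept. 30th, 2013


def average_rate(measurements: int, seconds: int) -> float:
    """Average number of measurements per second over the period."""
    return measurements / seconds


def bytes_per_measurement(total_bytes: int, measurements: int) -> float:
    """Average size of one measurement record in bytes."""
    return total_bytes / measurements


full_rate = average_rate(FULL_MEASUREMENTS, SECONDS_IN_PERIOD)
full_record_size = bytes_per_measurement(FULL_BYTES, FULL_MEASUREMENTS)
subset_record_size = bytes_per_measurement(SUBSET_BYTES, SUBSET_MEASUREMENTS)

print(f"average rate: {full_rate:.0f} measurements/s")
print(f"record size (full file): {full_record_size:.1f} bytes")
print(f"record size (subset): {subset_record_size:.1f} bytes")
```

Under these assumptions the full file averages roughly 1,564 measurements per second across the month, at about 34 to 36 bytes per measurement.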
  • 15. Forecasting Method . • The forecasting method was selected because the algorithm fits the processing capabilities of a distributed stream processing framework. It represents a mixed approach between Multilayer Perceptron (MLP) and Autoregressive Integrated Moving Average (ARIMA) [?]. • More specifically, the set of queries provides a load forecast for (1) each house, i.e., house-based, and (2) each individual plug, i.e., plug-based. The forecast for each house and plug is based on the current load of the connected plugs and a plug-specific prediction model. • The aim of these queries is not to provide the best prediction model, but to stress the interplay between modules for model learning, which operate on long-term (historical) data, and components that apply the model on top of live, high-velocity data.
  • 16. Forecasting Method . $$L(s_{i+2}) = \frac{\mathrm{avgL}(s_i) + \mathrm{median}(\{\mathrm{avgL}(s_j)\})}{2} \quad (1)$$ In formula (1), $\mathrm{avgL}(s_i)$ represents the current average load for the slice $s_i$. In the case of plug-based prediction, the value of $\mathrm{avgL}(s_i)$ is calculated as the average of all load values reported by the given plug with timestamps $\in s_i$. In the case of house-based prediction, $\mathrm{avgL}(s_i)$ is calculated as the sum of the average values for each plug within the house. $\{\mathrm{avgL}(s_j)\}$ is the set of average load values for all slices $s_j$ such that: $$s_j = s_{i+2-n \cdot k} \quad (2)$$ where $k$ is the number of slices in a 24-hour period and $n$ is a natural number with values between 1 and $\lfloor (i+2)/k \rfloor$. The value of $\mathrm{avgL}(s_j)$ is calculated analogously to $\mathrm{avgL}(s_i)$ in the plug-based and house-based (sum of averages) variants.
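Formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the platform. It assumes slices are indexed from 0, that `avg_loads[j]` already holds avgL(s_j) for every slice up to the current one, and that the forecast falls back to the current average when no historical slice exists yet (the slides do not specify this case). All function names are hypothetical.

```python
from statistics import mean, median


def plug_average(loads):
    """avgL for one plug: mean of the load values reported within a slice."""
    return mean(loads)


def house_average(loads_by_plug):
    """avgL for a house: sum of the per-plug averages within a slice."""
    return sum(mean(loads) for loads in loads_by_plug.values())


def forecast_load(avg_loads, i, k):
    """Forecast the load for slice i+2 following formulas (1) and (2).

    avg_loads: list where avg_loads[j] is avgL(s_j) for j = 0..i
    i:         index of the current slice
    k:         number of slices in a 24-hour period (assumed >= 2 so that
               every historical slice index is at most i)
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    target = i + 2
    # Formula (2): same time of day on previous days, n = 1..floor((i+2)/k).
    history = [avg_loads[target - n * k] for n in range(1, target // k + 1)]
    current = avg_loads[i]
    if not history:
        # Assumption: with no history yet, reuse the current average.
        return current
    # Formula (1): mean of the current average and the historical median.
    return (current + median(history)) / 2
```

For example, with `k = 4` slices per day and `avg_loads = [10, 20, 30, 40, 50, 60, 70]`, calling `forecast_load(avg_loads, 6, 4)` looks back at slices 4 and 0 and averages their median with the current value of 70.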
  • 17. Implementation . Figure 5: An overview of the stack used to implement the Cyclic Architecture
  • 18. Processing flow . Figure 6: An overview of the data processing flow
  • 19. Platform . • To evaluate the system, we needed a platform on which to run our tests. The platform was built on Microsoft Azure, which hosted our application, and was configured with the following settings:
  • 20. Latency . Figure 7: Best case scenario - Large batches with 8 processing nodes
  • 21. Latency . Figure 8: Worst case scenario - Small batches with 1 processing node
  • 22. Throughput . Figure 9: Average message throughput, by number of nodes, with 30 seconds batch
  • 23. Throughput . Figure 10: Average message throughput, by batch sizes, with 8 processing nodes
  • 24. Conclusion . • A system for distributed near real-time processing of data flows, focused on Smart Grid data profiles, was successfully designed and implemented. • The resulting system scales linearly up to 8 processing nodes, which is important for processing data from large numbers of smart meters. • The system provides latencies low enough for load forecasts to be delivered in time to be used. However, very small batch sizes were found to make processing unstable. • Larger batch sizes were found to improve throughput at the expense of latency, which increases proportionally.
  • 25. Future work . • Improve throughput by increasing the number of parallel data input feeds into Apache Kafka. • Deeper research on load forecasting, including results on forecast accuracy. • Studies on fault tolerance and system availability. • An abstraction layer for machine deployment and management, using Apache YARN or Apache Mesos with Docker containers.
  • 27. References I . L. Atzori et al. The Internet of Things: A Survey. Computer Networks, 54(15):2787–2805, 2010. T. Bylander and B. Rosen. A Perceptron-like Online Algorithm for Tracking the Median. In International Conference on Neural Networks, volume 4, pages 2219–2224. IEEE, 1997. D. Laney. 3-D Data Management: Controlling Data Volume, Velocity and Variety. META Group Original Research Note, 2001.
  • 28. References II . I. Lee et al. The Internet of Things (IoT): Applications, Investments and Challenges for Enterprises. Business Horizons, 2015. P. Mell and T. Grance. The NIST Definition of Cloud Computing. 2011. A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A Comparison of Approaches to Large-Scale Data Analysis. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 165–178. ACM, 2009.
  • 29. References III . NIST Big Data Public Working Group. NIST Big Data Interoperability Framework: Reference Architecture. 2014.
  • 30. D-Streams . • D-Streams treat a streaming computation as a series of deterministic batch computations on small time intervals. • D-Streams provide traditional functional transformation operators and introduce new stateful operators that work over multiple intervals. These include: • Windowing • Incremental aggregation over sliding windows • Time-skewed joins Figure 11: Comparison between a simple and a windowed DStream
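The idea of discretizing a stream into small batches and then aggregating incrementally over a sliding window can be illustrated without any streaming framework. The sketch below is plain Python, not the Spark Streaming API. The function names, the `(timestamp, value)` event shape, and the choice of a sum as the aggregate are all assumptions made for illustration.

```python
from collections import deque


def batch_sums(events, interval):
    """Discretize (timestamp, value) events into fixed intervals.

    Returns one sum per interval, in time order; intervals with no
    events contribute 0.
    """
    sums = {}
    for timestamp, value in events:
        batch = int(timestamp // interval)
        sums[batch] = sums.get(batch, 0) + value
    if not sums:
        return []
    return [sums.get(b, 0) for b in range(min(sums), max(sums) + 1)]


def sliding_window_sums(per_batch, window_len):
    """Incrementally aggregate per-batch sums over a sliding window.

    Each step adds the newest batch and subtracts the batch that left
    the window, instead of re-summing the whole window.
    """
    window = deque()
    total = 0
    results = []
    for batch_sum in per_batch:
        window.append(batch_sum)
        total += batch_sum
        if len(window) > window_len:
            total -= window.popleft()
        results.append(total)
    return results


events = [(0.5, 1), (1.2, 2), (1.9, 3), (3.1, 4)]
per_batch = batch_sums(events, interval=1)
windowed = sliding_window_sums(per_batch, window_len=2)
print(per_batch, windowed)
```

Because each window result is derived from the previous one by adding one batch and removing one, the cost per step does not grow with the window length, which is the motivation for incremental aggregation.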