MACHINE LEARNING ON THE
RUN: OPTIMIZED FEATURE
ENGINEERING FOR STREAMING
TIMESERIES DATA IN
INDUSTRIAL AUTOMATION
- Mayank Prasoon
AGENDA
• Challenges with processing high volume streaming data for industrial
automation
• Using semantic graphs in feature engineering of streaming data
• Using graphs for model creation and processing
• Parallelization in computing semantic graphs
• Conclusions
Challenges in Industry 4.0
About Quartic.ai and Myself
• Quartic.ai
• Industrial IoT Platform for Manufacturing Industries
• Used by Fortune 100 companies in Pharma, Mining, Power and F&B
• Data Processing:
• 10B + daily data points in the cloud
• 100B + daily data points on the edge
• Mayank Prasoon
• Software Developer and product enthusiast
• Quartic.ai
• https://www.linkedin.com/in/prasoonmayank/
• Ukulele enthusiast
LATENCY
• A typical manufacturing pipeline has tens to hundreds of machines,
each with hundreds of sensors measuring factors such as pressure,
temperature and pH level, and each sensor reports data at intervals
ranging from a few milliseconds to seconds.
• The raw data also has to be processed in near real time, so that the
time lag between raw data intake and data processing is minimal.
• The data processing involves multiple combinations of logical,
arithmetic and binary operations on the incoming data stream.
INTERDEPENDENCE
• The processed data often has fewer dimensions than the raw data,
because dimensionality reduction combines, removes or minimizes the
raw inputs. This makes the feature engineering process dependent on
the input tags.
• The sensors don’t report data at the same sampling rate. The
sampling rate of a sensor may also vary over time due to drift and noise.
• All these factors have to be taken into consideration when designing
the feature engineering process.
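One common way to cope with sensors that report at different rates is to align every tag to a shared clock and carry the last known value forward. The sketch below illustrates that idea only; it is not taken from the talk, and the function name `align_last_value`, the tag names and the sample readings are all hypothetical.

```python
from bisect import bisect_right

def align_last_value(streams, ticks):
    """Align irregularly sampled streams to common ticks (last value carried forward).

    streams: {tag: [(timestamp, value), ...]}, each list sorted by timestamp.
    ticks: timestamps at which a feature row is wanted.
    Returns one dict per tick; a tag is None until its first sample arrives.
    """
    rows = []
    for t in ticks:
        row = {}
        for tag, samples in streams.items():
            times = [ts for ts, _ in samples]
            i = bisect_right(times, t)  # number of samples with timestamp <= t
            row[tag] = samples[i - 1][1] if i else None
        rows.append(row)
    return rows

# Hypothetical readings: pressure every second, temperature at irregular times.
streams = {
    "pressure": [(0.0, 1.0), (1.0, 1.1), (2.0, 1.2), (3.0, 1.3)],
    "temperature": [(0.5, 70.0), (3.0, 71.5)],
}
rows = align_last_value(streams, ticks=[0.0, 1.0, 2.0, 3.0])
```

Carrying the last value forward is only one possible policy; interpolation or dropping incomplete rows may suit some tags better.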
GRANULARITY
• The incoming data can be very granular, which means the data
processing has to happen at similarly short time intervals.
DATA VOLUME
• The volume of data can easily reach billions of data points each day.
• Quartic’s ingestion and processing handles ~10B data points per day
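To put the daily figure in perspective, it can be converted into an average per-second rate. The short calculation below assumes the load is spread evenly over the day, which a real plant will not guarantee, so it is a rough average rather than a peak estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
points_per_day = 10_000_000_000       # ~10B, the figure quoted above

points_per_second = points_per_day / SECONDS_PER_DAY
# Average time available per point if points were handled one at a time.
budget_us = 1_000_000 / points_per_second

print(f"{points_per_second:,.0f} points/s, about {budget_us:.2f} microseconds per point")
```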
Using semantic graphs in feature engineering
of streaming data
What are semantic graphs?
• According to Wikipedia, a semantic graph is "a network that
represents semantic relationships between concepts...It is a directed
or undirected graph consisting of vertices, which represent concepts,
and edges, which represent semantic relations between concepts."
SEMANTIC GRAPHS IN THE CASE OF
INDUSTRIAL AUTOMATION
Scaled down problem statement:
• We get input from 5 different sensors (the features here), which have
to be reduced to a single variable (Data here)
• Let's denote the feature inputs as F1, F2, F3, F4, F5
• These are used to compute CF1, CF2, CF3 and CF4, which are in turn
used to compute CF`1, CF`2 and CF`3.
• We eventually calculate Data(T)
• The equations used to calculate Data are shown on the
next slide
EQUATIONS CONSIDERED:
• CF1 = (F1 + F2 + F3) * Data(T-1)
• CF2 = F2 - F3
• CF3 = F3 + CF2 + CF4
• CF4 = F4 - F5
• CF`1 = F2 + Data(T-1)
• CF`2 = if(CF`3) CF`2 else 0
• CF`3 = F4 + CF3
• Data(T) = f(CF`1, CF`2, CF`3, Data(T-1))
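The equations above can be evaluated in dependency order. The following is a minimal sketch, not the speaker's implementation: it uses Python's standard `graphlib` module, encodes only the fully specified equations (CF`2 and the function f are left out because the slide does not define them completely), writes `Data_prev` for Data(T-1), and uses made-up input values.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each computed feature maps to (inputs, function of those inputs).
# CF`2 and Data(T) are intentionally omitted (not fully specified).
RULES = {
    "CF1": (("F1", "F2", "F3", "Data_prev"), lambda f1, f2, f3, d: (f1 + f2 + f3) * d),
    "CF2": (("F2", "F3"), lambda f2, f3: f2 - f3),
    "CF3": (("F3", "CF2", "CF4"), lambda f3, cf2, cf4: f3 + cf2 + cf4),
    "CF4": (("F4", "F5"), lambda f4, f5: f4 - f5),
    "CF`1": (("F2", "Data_prev"), lambda f2, d: f2 + d),
    "CF`3": (("F4", "CF3"), lambda f4, cf3: f4 + cf3),
}

def evaluate(raw):
    """Compute every rule exactly once, in an order that respects dependencies."""
    values = dict(raw)
    graph = {name: set(inputs) for name, (inputs, _) in RULES.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in RULES:  # raw inputs are already present in `values`
            inputs, fn = RULES[name]
            values[name] = fn(*(values[i] for i in inputs))
    return values

# Hypothetical sensor readings and previous result.
values = evaluate({"F1": 1, "F2": 2, "F3": 3, "F4": 4, "F5": 5, "Data_prev": 2})
```

Note that CF3 is listed before CF4 in the equations but depends on it; ordering by dependencies rather than by listing order handles this without manual bookkeeping.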
The above equations shown as a part of
semantic graph
SCALED DOWN VERSION FOR
APPLICATION IN INDUSTRIAL
AUTOMATION
[Figure: semantic graph with nodes F1, F2, F3, F4, F5, Data(T-1), CF1, CF2, CF3, CF4, CF`1, CF`2, CF`3 and Data(T)]
F – Feature
CF – Computed Feature
CF` – Computed Feature 2
Data – Result calculated at t
What did we notice above?
• Each input variable is stored in a node of the graph, while the
computations are stored as edges of the graph.
• It is relatively simple to trace a particular computed variable back to
its origins.
• Code refactoring becomes easier because the feature engineering is
expressed as a semantic graph.
• No variable needs to be recalculated.
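Tracing a computed variable back to its origins amounts to walking the graph backwards. The sketch below is an illustration under assumptions: the dictionary `DEPENDS_ON` and the function `origins` are hypothetical names, the dependencies are read off the equations in this deck, and CF`2 is listed with CF`3 as its only input because its own equation refers to itself.

```python
# Direct inputs of each computed feature, read off the equations.
DEPENDS_ON = {
    "CF1": ["F1", "F2", "F3", "Data(T-1)"],
    "CF2": ["F2", "F3"],
    "CF3": ["F3", "CF2", "CF4"],
    "CF4": ["F4", "F5"],
    "CF`1": ["F2", "Data(T-1)"],
    "CF`2": ["CF`3"],  # assumption: self-reference in the slide is dropped
    "CF`3": ["F4", "CF3"],
    "Data(T)": ["CF`1", "CF`2", "CF`3", "Data(T-1)"],
}

def origins(node):
    """Return the raw inputs (nodes without dependencies) that feed `node`."""
    found, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        parents = DEPENDS_ON.get(current, [])
        if not parents:
            found.add(current)  # no inputs: this is a raw feature
        stack.extend(parents)
    return found

print(sorted(origins("CF`3")))
```

The same walk shows which computed features are affected when a single sensor tag is delayed or renamed.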
INTRODUCING PARALLELIZATION
WHY?
• As the graph above shows, some features don’t depend on the
calculation of other features. Also, in practice, a delay in the
streaming data for one feature should not hold up the calculation of
features that do not depend on it.
[Figure: the same semantic graph, with two parts of it labeled P1 and P2]
• In the above case, the calculation of CF4 is completely independent of
the calculation of CF1, CF2 and CF3. Thus, once the calculation of CF1
starts, a new thread is spawned that checks for unrelated
computations, such as CF4, and calculates them.
• Thus, as one moves further along the graph, one can always calculate
the unrelated variables by keeping a dictionary of all the calculated
and raw features, which is what we do in our application.
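The idea of computing unrelated nodes concurrently while keeping a dictionary of raw and calculated features can be sketched as follows. This is not the application's code: it swaps the per-node thread described above for a standard-library thread pool, uses `graphlib.TopologicalSorter` to find nodes whose inputs are ready, restricts itself to CF1 to CF4, and writes `Data_prev` for Data(T-1). All names and input values are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+

RULES = {
    "CF1": (("F1", "F2", "F3", "Data_prev"), lambda f1, f2, f3, d: (f1 + f2 + f3) * d),
    "CF2": (("F2", "F3"), lambda f2, f3: f2 - f3),
    "CF3": (("F3", "CF2", "CF4"), lambda f3, cf2, cf4: f3 + cf2 + cf4),
    "CF4": (("F4", "F5"), lambda f4, f5: f4 - f5),
}

def evaluate_parallel(raw, max_workers=2):
    values = dict(raw)  # dictionary of raw and calculated features
    sorter = TopologicalSorter({n: set(i) for n, (i, _) in RULES.items()})
    sorter.prepare()
    pending = {}  # future -> name of the node being computed
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():  # nodes whose inputs are all done
                if name in RULES:
                    inputs, fn = RULES[name]
                    args = [values[i] for i in inputs]
                    pending[pool.submit(fn, *args)] = name
                else:
                    sorter.done(name)  # raw input, nothing to compute
            if pending:
                finished, _ = wait(list(pending), return_when=FIRST_COMPLETED)
                for future in finished:
                    name = pending.pop(future)
                    values[name] = future.result()
                    sorter.done(name)  # may unlock dependent nodes
    return values

values = evaluate_parallel(
    {"F1": 1, "F2": 2, "F3": 3, "F4": 4, "F5": 5, "Data_prev": 2}
)
```

Here CF1, CF2 and CF4 become ready together and are submitted at once, while CF3 waits for CF2 and CF4. In CPython, threads mainly help when a node waits on I/O such as a delayed sensor; for heavy pure-Python arithmetic a process pool may be a better fit.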
KEY TAKEAWAYS
• Difficulties in processing streaming data due to high granularity,
latency, interdependence and data volume
• Semantic graphs for feature engineering of streaming datasets
• Introducing parallelization in semantic graph
REFERENCES
• https://www.semanticscholar.org/paper/Mixed-Initiative-Feature-Engineering-Using-Graphs-Atzm%C3%BCller-Sternberg/31d92d052cedcef7ad6f3a04860c662a264c7e66
• https://www.sciencedirect.com/science/article/pii/0166361589901115
• https://tdwi.org/articles/2018/11/09/adv-all-4-reasons-use-graphs-optimize-machine-learning-data-engineering.aspx
• https://github.com/yahoo/graphkit
THANK YOU

Abstract presentation on feature engineering on streaming data for pycon
