Clustering
•   Sensor signal is not labeled. For classification, we need to
    label first, e.g. by clustering events
             Truck?                        Car? Noise?
Clustering
•   Start with random clusters
•   For each part of the signal (five identical pipelines shown side by side):
    Windowing → Convolute → Calculate distance → Average per cluster
•   Update clusters
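The steps on this slide resemble one pass of k-means over windows of the signal. The following is a minimal single-machine sketch under that assumption. The function names, the box smoothing kernel, the window width, and the empty-cluster handling are all illustrative choices, not taken from the deck.

```python
import numpy as np

def sliding_windows(signal, width, step):
    """Cut a 1-D signal into fixed-width windows (one row per window)."""
    starts = range(0, len(signal) - width + 1, step)
    return np.array([signal[s:s + width] for s in starts])

def smooth(windows, kernel_width=5):
    """Convolve each window with a box kernel (an assumed kernel choice)."""
    kernel = np.ones(kernel_width) / kernel_width
    return np.array([np.convolve(w, kernel, mode="same") for w in windows])

def kmeans(points, k, iterations=20, seed=0):
    """Random initial centroids, then repeat: assign, average per cluster."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Distance of every window to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Average per cluster; an empty cluster keeps its old centroid.
        centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
    # Labels come from the last assignment step.
    return labels, centroids
```

A call such as `kmeans(smooth(sliding_windows(signal, 20, 20)), k=2)` returns one label per window, which can then be inspected and named by hand.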
Clustering in Hadoop
•   Raw data: <ts, {s1, s2, ...}>, divided into split 1, split 2, ..., split n
•   Data massage, Map (per split): for each sample s1 at ts_i, emit
        ...
        <ts_{i-1}, {1, s1}>
        <ts_i, {0, s1}>
        <ts_{i+1}, {-1, s1}>
        ...
•   Some clever partitioning of keys
•   Reduce (per ts_i): output subsequences (with lead-in/out)
        <ts1, subsequence>, <ts2, subsequence>, ..., <ts6, subsequence>
        where each subsequence is: lead-in | ts | lead-out
•   Lead-in/out needed to center bumps (snapping)
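The map and reduce steps on this slide can be imitated in memory. The sketch below is only an illustration: it uses integer sample positions instead of timestamps, a `lead` parameter for the lead-in/out length, and a dictionary in place of Hadoop's shuffle. All of these names are assumptions. Key partitioning and snapping are not covered.

```python
from collections import defaultdict

def map_sample(index, value, lead=1):
    """Emit the sample to its own key and to the neighbouring keys.

    A sample at position i is sent to key i - d with relative offset d,
    so the reducer for key j receives the samples at j - lead .. j + lead.
    """
    for d in range(-lead, lead + 1):
        yield index - d, (d, value)

def reduce_window(key, tagged_values, lead=1):
    """Order the received samples by offset; drop incomplete edge windows."""
    ordered = sorted(tagged_values)
    if len(ordered) != 2 * lead + 1:
        return None
    return key, [value for _, value in ordered]

def run(samples, lead=1):
    """Tiny in-memory stand-in for the shuffle between map and reduce."""
    groups = defaultdict(list)
    for i, value in enumerate(samples):
        for key, tagged in map_sample(i, value, lead):
            groups[key].append(tagged)
    results = (reduce_window(k, vals, lead) for k, vals in sorted(groups.items()))
    return [r for r in results if r is not None]
```

For the samples `[10, 20, 30, 40]` with `lead=1`, `run` yields the window `[10, 20, 30]` for key 1 and `[20, 30, 40]` for key 2. Keys at the edges receive too few samples and are dropped.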
Clustering in Hadoop
•   Distance calculation in parallel
•   Input: subsequences (with lead-in/out) <ts1, subsequence>, ..., <ts6, subsequence>,
    divided into split 1, split 2, ..., split n
•   Clustering, Map (per split): reads the current cluster centroids, emits <clus_i, partialsums>
•   Reduce (per clus_i): outputs the cluster centroids <clus1, centroid>, <clus2, centroid>
•   Update current cluster centroids, iterate
•   Initial centroids: k (random)
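One iteration of this scheme can be sketched without Hadoop. Each mapper assigns the subsequences in its split to the nearest current centroid and emits per-cluster partial sums, and each reducer turns the partial sums of one cluster into a new centroid. The function names and the in-memory grouping below are assumptions made for illustration. The deck does not show its actual job code.

```python
import numpy as np

def map_split(split, centroids):
    """For one split, build {cluster: (sum of members, member count)}."""
    partial = {}
    for x in split:
        c = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        total, count = partial.get(c, (np.zeros_like(x), 0))
        partial[c] = (total + x, count + 1)
    return partial

def reduce_cluster(partials):
    """Combine the partial sums of one cluster into its new centroid."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count

def kmeans_iteration(splits, centroids):
    """One map/reduce pass; clusters without members keep their centroid."""
    grouped = {}
    for split in splits:
        for c, partial in map_split(split, centroids).items():
            grouped.setdefault(c, []).append(partial)
    new_centroids = centroids.copy()
    for c, partials in grouped.items():
        new_centroids[c] = reduce_cluster(partials)
    return new_centroids
```

Sending partial sums rather than raw subsequences keeps the data passed to the reducers small: one sum and one count per cluster per split. Calling `kmeans_iteration` repeatedly, starting from k random centroids held as a float array, corresponds to the "iterate" arrow on the slide.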
Clustering
•   Clusters found:
    •   Truck (traffic jam)
    •   Small truck (traffic jam)
    •   Cars (traffic jam)
    •   Heavy truck
    •   Medium truck
    •   Small truck
    •   Car
    •   Idle (noise)
•   Events labeled: Truck! Car!
Performance
•   MapReduce: techniques scale linearly (6 node cluster)
•   Noticeable overhead on small amounts of data
[Figure: runtime in hours (axis 0 to 40) of convolution and clustering against amount of sensor data (3 days, 10 days, 1 month, 3 months); annotation: "66 node cluster"]
Multi-scale analysis
• Sensor signal is a composite of events that happen at
  different time scales
  • Passing truck (short scale), traffic jam (medium scale), seasonal
    change (long scale)
• Try to decompose signals into ‘natural’ time scales
• Basic idea:
  • Convolute data at different scales (scale space)
  • Subtract key convolutions (band-pass filters)
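The basic idea can be sketched in a few lines. The example below assumes a Gaussian smoothing kernel, which is the usual choice for scale space. The slides do not say which kernel was used, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised Gaussian kernel truncated at about 4 sigma."""
    radius = int(4 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def scale_space(signal, sigmas):
    """Smooth the signal at each scale; returns {sigma: smoothed signal}.

    The signal is assumed to be longer than the widest kernel.
    """
    return {s: np.convolve(signal, gaussian_kernel(s), mode="same") for s in sigmas}

def band_pass(space, a, b):
    """Difference of two smoothings: keeps structure between scales a and b."""
    return space[a] - space[b]
```

With `space = scale_space(signal, [2, 4])`, the call `band_pass(space, 4, 2)` gives the kind of difference that the later decomposition slide labels S4-S2, assuming S2 and S4 there denote smoothings at scales 2 and 4. Near the ends of the signal the result is distorted, because `mode="same"` pads with zeros.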
Scale space
[Figure: scale space; axis label: amount of sensor data]
Multi-scale analysis
• Subtraction of two such convolutions (band-pass filter)
[Figure: band-pass result; axis label: amount of sensor data]
Scale space
Decomposition
[Figure: three panels labeled S2, S4-S2, S4]
Up to speed...
• Large-scale preprocessing allows advanced analysis
  • Equation discovery
        s100(t) = 1.196 s101(t) - 0.272 s102(t) + 0.156 s106(t)
  • Long-term trends (regression)
    • E.g. change in response, eigenfrequency, ...
  • Correlations
        [Figure: strain and temperature]
  • ...
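A linear relation of the kind shown under "Equation discovery" could, for example, be fitted by ordinary least squares. The deck does not say which method produced its equation, so the sketch below is an assumption. It also runs on synthetic data, with the slide's coefficients reused only as made-up ground truth.

```python
import numpy as np

def fit_linear_relation(target, predictors):
    """Least-squares coefficients c such that target is close to predictors @ c."""
    coeffs, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    return coeffs

# Synthetic stand-ins for three predictor sensors and one target sensor.
rng = np.random.default_rng(0)
predictors = rng.normal(size=(500, 3))
true_coeffs = np.array([1.196, -0.272, 0.156])
target = predictors @ true_coeffs + rng.normal(scale=0.01, size=500)

coeffs = fit_linear_relation(target, predictors)
```

On real sensor data the columns of `predictors` would be the preprocessed signals of the candidate sensors, aligned in time with the target signal.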
Thanks
Dank U, Hvala, Xie Xie, Diolch, Toda, Merci, Grazie, Spasiba, Efharisto, Obrigado, Arigato, Köszönöm, Tesekkurler, Danke, Dhanyavaad, Gracias

Hadoop sensordata part3
