Utilizing Data warehousing and Data Mining Algorithms on
information gathered with IoT Sensors
Eric Matthews – Mohsen Tavakoli Fall 2016
Emerging Non-Traditional Database Systems:
Data Warehousing and Mining
(03-60-539) Dr. Ezeife
Contents
● Software and Hardware
● Data Warehouse
● Roll-Up Function
● WEKA Clustering
○ 3 Clusters
○ 6 Clusters
● Conclusion
Software and Hardware
● Arduino / Arduino IDE 1.6.13 (1)
● Ubuntu Linux Server
● Python 2.7 (Server & Client)
● MySQL Server v5.5.46
● WEKA 3.8
Sensors:
● Sound (RB-Wav-26) (2)
● Ultrasonic Distance (SR04) (3)
● Temperature (DHT11) (4)
● Light (Photoresistor) (5)
● Motion Sensor (HC-SR501) (6)
Data Warehouse
Our data warehouse consists of the following fields:
● location_id (unique ID of the location where the device is placed)
● average, maximum, and minimum over 10 readings of:
○ Distance
○ Light
○ Sound
○ Temperature
○ Humidity
○ Motion count
● time_collected (time that client collected data)
● srv_time_collected (time that server collected data)
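As a rough illustration of how a batch of 10 raw readings could be collapsed into the average, maximum, and minimum fields listed above, here is a minimal Python 3 sketch. The function name summarize_readings and the sample values are hypothetical and are not taken from the project code.

```python
from statistics import mean


def summarize_readings(readings):
    """Collapse a batch of raw sensor readings into avg / max / min."""
    if not readings:
        raise ValueError("need at least one reading")
    return {
        "avg": mean(readings),
        "max": max(readings),
        "min": min(readings),
    }


# Ten made-up light readings (illustrative values, not project data).
light_batch = [512, 520, 498, 505, 530, 515, 500, 510, 525, 495]
light_summary = summarize_readings(light_batch)
print(light_summary)
```

One summary like this per sensor, together with location_id and the two timestamps, would make up a single warehouse row.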
Data warehouse - Location 5 data
Location Table - Data warehouse
Roll-Up
We have created a stored procedure in MySQL that lets us roll up our data
by any time interval and location:
CALL database_project.rollup_time(time_interval_seconds, location_id)
This call aggregates our data into fact tables by any time interval
(minute, hour, day, year, or any number of seconds) and location.
We do this using GROUP BY on our time_collected field in MySQL.
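The body of the stored procedure is not shown in these slides, so the following is only a sketch of the general idea in Python 3: floor each timestamp to the start of its interval, group by that value, and aggregate. It assumes time_collected is available as integer epoch seconds; the column names avg_light and max_motion and the sample rows are hypothetical.

```python
from collections import defaultdict


def rollup_time(rows, interval_seconds, location_id):
    """Group one location's rows into fixed-width time buckets and aggregate."""
    buckets = defaultdict(list)
    for row in rows:
        if row["location_id"] != location_id:
            continue
        # Floor the timestamp to the start of its bucket (the GROUP BY key).
        start = row["time_collected"] // interval_seconds * interval_seconds
        buckets[start].append(row)

    result = []
    for start in sorted(buckets):
        group = buckets[start]
        result.append({
            "bucket_start": start,
            "avg_light": sum(r["avg_light"] for r in group) / len(group),
            "max_motion": max(r["max_motion"] for r in group),
            "rows": len(group),
        })
    return result


# Made-up rows; time_collected is in seconds since an arbitrary start.
rows = [
    {"location_id": 5, "time_collected": 0, "avg_light": 10.0, "max_motion": 0},
    {"location_id": 5, "time_collected": 7, "avg_light": 20.0, "max_motion": 2},
    {"location_id": 5, "time_collected": 16, "avg_light": 30.0, "max_motion": 1},
    {"location_id": 3, "time_collected": 5, "avg_light": 99.0, "max_motion": 9},
]
rolled = rollup_time(rows, 15, 5)
for bucket in rolled:
    print(bucket)
```

With a 15-second interval, the first two rows for location 5 fall into the bucket starting at 0 and the third into the bucket starting at 15; the row for location 3 is ignored.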
Roll-up (15s) - Location 5 - Example
WEKA Clustering - Location 5 - 3 Clusters
Using EM clustering with a maximum of 3 clusters, we obtained three clusters for
location 5, per minute, which we call Not Home, Passively Home, and Actively
Home.
● 47% Being used
● 53% Not being used
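To give a feel for what EM clustering does, here is a toy one-dimensional Gaussian-mixture EM in Python 3. It is not WEKA's implementation: WEKA works on several attributes at once and handles initialization and the number of clusters itself. The data values, starting means, and starting variance below are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(data, initial_means, initial_variance=0.01, iterations=20):
    """Fit a 1-D Gaussian mixture by alternating E and M steps."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [initial_variance] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            scores = [w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a zero variance
    return weights, means, variances


# Made-up normalized light levels: a dark group and a bright group.
light = [0.02, 0.05, 0.03, 0.04, 0.70, 0.72, 0.68, 0.71]
weights, means, variances = em_1d(light, initial_means=(0.0, 1.0))
print(weights, means)
```

On this well-separated sample the two components settle on the dark and bright groups, each taking about half of the points.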
Location 5 - Cluster Centroids
Using 3 of our attributes (Light, Motion, and Sound) we have calculated these
centroids for our clusters in location 5. Data has been normalized.

                    Passively Home   Not Home     Actively Home
Avg Light           0.2533           0.7012       0.6758
Max Motion Count    0.1172           0            0.2433
Avg Sound           0.0431           0.0306       0.0819
# of Data Points    176 (9%)         899 (47%)    851 (44%)
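The slides do not say which normalization was applied. One common choice that produces values between 0 and 1, like the centroids above, is min-max scaling, sketched here in Python 3 as an assumption rather than a description of the project's pipeline. The raw values are made up.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up raw light readings.
raw_light = [200, 400, 600, 1000]
scaled = min_max_normalize(raw_light)
print(scaled)
```

Scaling each attribute this way keeps one attribute with a large raw range (such as light) from dominating the others (such as motion count) during clustering.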
WEKA Clustering - Location 5 - 6 Clusters
Using EM on location 5 with no maximum cluster parameter resulted in 6 clusters:
● Highly Active
● 2 Lights No Activity
● Quietly Active
● No Light No Activity
● Main Light No Activity
● 1 Light Quietly Active
Based on the clusters we came to the conclusion that:
● 51% location being used
● 49% location not being used
Location 5 - Cluster Centroids
                    Highly      2 Lights      Quietly     No Light      Main Light    1 Light
                    Active      No Activity   Active      No Activity   No Activity   Quietly Active
Avg Light           0.6901      0.8603        0.6777      0             0.6485        0.3654
Max Motion Count    0.2052      0             0.2438      0             0             0.1826
Avg Sound           0.3034      0.0194        0.0717      0.0317        0.0338        0.049
# of Data Points    896 (52%)   519 (30%)     316 (18%)   73 (4%)       166 (9%)      94 (5%)
Conclusion
- We can conclude that it is possible to define three different states of home
presence: Not Home, Passively Home, and Actively Home
- Any new reading can be assigned to one of these clusters to determine
whether the subject is home or not
- We can also gain finer detail about the state of a location by using more
clusters:
● Determine when lights or heating/cooling are turned on but nobody is
using the location
● Monitor sources of ambient or constant noise
● Detect presence during usual periods of no activity (locked building
or house)
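One simple way to categorize a new reading, sketched here in Python 3 (3.8 or later for math.dist), is to pick the cluster whose centroid is nearest in Euclidean distance. This is a simplification and not what EM itself does: EM assigns points by probability under each cluster's fitted distribution, which needs the cluster spreads and weights that are not given in these slides. The centroid values are the 3-cluster centroids reported for location 5; the function name classify and the sample readings are hypothetical.

```python
import math

# (Avg Light, Max Motion Count, Avg Sound), normalized, from the 3-cluster run.
CENTROIDS = {
    "Passively Home": (0.2533, 0.1172, 0.0431),
    "Not Home": (0.7012, 0.0, 0.0306),
    "Actively Home": (0.6758, 0.2433, 0.0819),
}


def classify(reading):
    """Return the name of the cluster whose centroid is closest to the reading."""
    return min(CENTROIDS, key=lambda name: math.dist(reading, CENTROIDS[name]))


# Made-up normalized readings: (light, motion, sound).
print(classify((0.70, 0.00, 0.03)))
print(classify((0.68, 0.25, 0.08)))
print(classify((0.25, 0.10, 0.04)))
```

New readings would need the same normalization as the training data before being compared with these centroids.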
Future Work
We hope to find out more information from our data by:
● Collecting more data
● Rolling up over larger time intervals
● Using different subsets of our data for different hypotheses
● Using different algorithms for clustering
Thank You
Any Questions?