An Autonomic Approach to Real-Time Predictive Analytics using Open Data and Internet of Things

Wassim Derguech, Eanna Burke, Edward Curry
Insight Centre For Data Analytics
National University of Ireland, Galway
UIC 2014 - The 11th IEEE International Conference on Ubiquitous Intelligence and Computing
December 9-12, 2014
Ayodya Resort, Bali, Indonesia

Motivation: Internet of Things (IoT)
Smart Homes, Grids, Cities…
By 2020: 50 billion devices connected to mobile networks (OECD, 2012)
Today’s Internet of Things “behaviour” (Abbas M., keynote presentation at UiTM WSN Seminar 2012):
• 74%: real-time location-based information
• 71%: payment apps
• 60%: weather apps
• 60%: want a connected system in the car
• 29%: health apps
• 51%: maps/navigation/search

Motivation: Open Data
• Open Data, not simply big data, will be the driver for growth, ingenuity, and innovation in the UK economy (Deloitte Analytics, 2012)
• $1.5 billion per year: private weather industry supported by the US National Weather Service (CapGemini 2014)
• $32 billion: estimated direct impact of Open Data on the EU27 in 2010, with annual growth of 7% (Vickery 2011)
• $140 billion: estimated aggregate direct and indirect impact across the EU27 (Vickery 2011)
• $3 trillion: estimated annual economic potential across seven domains (McKinsey 2013)
[Deirdre Lee, Presentation from The Open Group Conference in London, 22 October 2014]

Problem statement and contribution
• Open Data is becoming increasingly available and valuable.
• Both public and private data can be used to drive decision making.
→ Challenge: selecting the best Open Data and IoT sources to support predictive analytics.
• Our contribution:
(1) data management: collection, filtering, and warehousing
(2) data analytics: source selection and predictive analytics
• Our promise: an autonomic system that is self-configuring, self-optimizing, and self-healing

Use Case: predicting energy usage using sensor and weather data
• Inputs: Open Data (weather forecast web services) and Internet of Things (sensor data on building power consumption)
• Building Power Prediction System: learns the relationships between variables, then uses the learned relationships for prediction

Open Data Management
• Data sources: any data source, including sensors, web services, and IoT devices.
• Collectors: each data source requires a collector suited to its communication protocol.
• Transformation: data received from the collectors is transformed into a predefined format (RDF schema/ontology).
• Storage: the transformed data is persisted in a local RDF store.
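
The collect, transform, and persist flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the CSV line format, the names `collect_from_csv_line`, `to_ntriples` and `LocalTripleStore`, the example.org namespace, and the property choices are all hypothetical, and an in-memory list of N-Triples lines stands in for the local RDF store.

```python
from dataclasses import dataclass
from datetime import datetime

# SSN namespace (W3C SSN incubator ontology). The EX namespace and the
# property choices below are hypothetical and only illustrate the idea.
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
EX = "http://example.org/building#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass
class Reading:
    source: str         # hypothetical source id, e.g. "local-weather-station"
    prop: str           # observed property, e.g. "temperature"
    value: float        # numeric only, in line with the "Numerical" criterion
    timestamp: datetime


def collect_from_csv_line(source: str, line: str) -> Reading:
    """Hypothetical collector for a source emitting 'iso_time,property,value' lines."""
    ts, prop, value = line.strip().split(",")
    # float() raises ValueError on vague values such as "hot", so they never reach the store.
    return Reading(source, prop, float(value), datetime.fromisoformat(ts))


def to_ntriples(r: Reading, obs_id: int) -> list[str]:
    """Transform one reading into N-Triples lines following a fixed schema."""
    obs = f"<{EX}obs{obs_id}>"
    return [
        f"{obs} <{RDF_TYPE}> <{SSN}Observation> .",
        f"{obs} <{SSN}observedBy> <{EX}{r.source}> .",
        f'{obs} <{EX}{r.prop}> "{r.value}"^^<{XSD}double> .',
        f'{obs} <{EX}time> "{r.timestamp.isoformat()}"^^<{XSD}dateTime> .',
    ]


class LocalTripleStore:
    """In-memory stand-in for the local RDF store."""

    def __init__(self) -> None:
        self.triples: list[str] = []

    def persist(self, lines: list[str]) -> None:
        self.triples.extend(lines)


if __name__ == "__main__":
    store = LocalTripleStore()
    # Made-up input line for illustration only.
    reading = collect_from_csv_line("local-weather-station",
                                    "2014-08-23T16:00:00,temperature,14.2")
    store.persist(to_ntriples(reading, obs_id=1))
    print("\n".join(store.triples))
```

Keeping one collector per protocol and a single shared transformation step means a new source only needs a new collector; everything downstream sees the same schema.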

Ontologies in use for our use case
• Ontologies are formal specifications of shared conceptualisations; they foster reuse of existing assets.
• Criteria for choosing the right ontology for our use case:
  • In use: proven in practice and well documented, so it is easy to learn
  • Relevant: describes surface readings rather than space or marine conditions
  • Numerical: suitable for machine learning; avoids vague terms such as “hot” or “humid”
• Selected ontologies:
  • Sensor data: Semantic Sensor Network (SSN)
  • Weather observation: AEMET Weather Observation Ontology
  • Weather prediction: Meteo Ontology

Source Selector
• Evaluates the sources and builds a prediction model from the best Open Data and IoT source
• Requirements for machine learning algorithm:
• Reasonable accuracy
• Quick prediction model generation
• Work well with little data
• Work well with nominal and numerical inputs
• Low configuration
• Give insights into the factors influencing predictions
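
The slides list requirements for the learning algorithm but do not say how the Source Selector scores candidate sources. A minimal sketch, assuming each candidate source supplies one numeric predictor (for example temperature) aligned in time with the power readings, fits a single-variable ordinary least squares model per source and keeps the source with the lowest mean absolute error on a held-out tail of the data. OLS, the 66% split, the data layout, and all function names are illustrative assumptions.

```python
from statistics import fmean


def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to predicting the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error between two aligned series."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def select_source(candidates: dict[str, list[float]], power: list[float],
                  train_fraction: float = 0.66):
    """Return (best_source, (a, b), holdout_mae) over the hypothetical candidates."""
    n_train = int(len(power) * train_fraction)
    best = None
    for name, xs in candidates.items():
        a, b = fit_ols(xs[:n_train], power[:n_train])
        preds = [a + b * x for x in xs[n_train:]]
        err = mae(power[n_train:], preds)
        if best is None or err < best[2]:
            best = (name, (a, b), err)
    return best
```

Linear regression fits several of the listed requirements (fast to build, little configuration, coefficients that expose which factors drive the prediction); swapping in another learner would only change `fit_ols` and the prediction line.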

Reselection Controller
• Triggers the reselection of data sources for building a new prediction model
• Criteria for a reselection
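
The slides name the reselection criteria without stating them. Purely as an illustration, one plausible trigger is a rolling mean absolute error over the most recent predictions that exceeds a threshold. The rule, the `window` and `threshold` parameters, and the `ReselectionController` class below are hypothetical.

```python
from collections import deque
from statistics import fmean


class ReselectionController:
    """Hypothetical reselection trigger based on a rolling mean absolute error.

    The rule (rolling MAE over `window` predictions above `threshold`) and both
    parameters are assumptions; the slides do not state the actual criteria.
    """

    def __init__(self, window: int, threshold: float) -> None:
        self.errors = deque(maxlen=window)  # keeps only the last `window` errors
        self.threshold = threshold

    def observe(self, predicted: float, actual: float) -> bool:
        """Record one absolute error; return True when a reselection should be triggered."""
        self.errors.append(abs(predicted - actual))
        window_full = len(self.errors) == self.errors.maxlen
        if window_full and fmean(self.errors) > self.threshold:
            self.errors.clear()  # start a fresh window for the newly selected source
            return True
        return False
```

The window length trades reaction time against robustness to isolated outliers: a long window ignores single bad readings but reacts slowly to a failed source, which is the kind of parameter that updating the reselection criteria might tune.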

Error Comparison Module
• Uses the prediction model generated by the Source Selector (6) to generate energy usage predictions
• Sends the predicted data to the UI (7) for display to the user
• Compares the predicted and actual energy usage values and computes the resulting error rates (1)
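
The slides do not name the error measures this module reports. Assuming the two used in the evaluation tables (mean absolute error and root mean squared error), the comparison step reduces to the sketch below; the function name and signature are hypothetical.

```python
import math


def error_rates(predicted: list[float], actual: list[float]) -> dict[str, float]:
    """Compare predicted with actual energy usage and return MAE and RMSE."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("expected two non-empty series of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mae": mae, "rmse": rmse}


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(error_rates([10.0, 12.0, 9.0], [11.0, 12.0, 12.0]))
```

RMSE is never smaller than MAE; a large gap between the two indicates occasional large errors rather than uniformly small ones.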

Implementation: Data Sources
• Weather observations:
  • Local weather station within the University
  • Open Weather Map web service
• Weather forecasts:
  • Ham Weather forecast web service
  • YR.no weather forecast web service
• Sensor data:
  • Building power consumption

Implementation: User Interface
• The User Interface is the consumer of the produced predictions.
• It is used only for displaying results.
• It is not used for providing any inputs or configurations.

Evaluation: Initiation of the System
Objective: evaluate how quickly the system starts to provide reasonable results
[Figure: error rate after system initiation, annotated with: very high error rate at the initiation of the system; better results due to 1 day of historical data; first non-working day (Saturday)]
Conclusion:
• Initially very high error rates, which decrease over time depending on the length of the historical data
• The system needs to run for 12 days to reach a steady error rate

Evaluation: Partial Failure of a Weather Station
Objective: evaluate the autonomous aspect of the system (self-healing and self-configuration)
Sources: local weather station within the University; Open Weather Map web service
Without introducing faults:
• Usage ratio between August 23 at 16:23 and August 26 at 18:11: 64% / 36%
• Close proximity of the weather station provides better results
Introducing faults (temperature = 0.0):
→ 4 hours to select an alternative station
→ Possible improvement: update the reselection criteria

Evaluation: Machine Learning Algorithms Testing
Objective: determine the most suitable machine learning algorithm for effective predictive analytics
Experimental Set-Up
• 1 week of data as the short-term data set
• 5 weeks of data as the long-term data set
• Observation: building main incoming power
• Weather observation: local NUIG station
• Training set = 66% vs. test set = 34%
Four machine learning algorithms in WEKA:
1. SMOReg: Support Vector Machine for Regression
2. Back-propagation ANN with 1 hidden layer
3. Back-propagation ANN with 2 hidden layers
4. Linear Regression
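
The slides do not say whether the WEKA percentage split shuffled the instances. The sketch below assumes an order-preserving split, which for time series keeps training data strictly before test data; the function name is hypothetical. Flooring the training size with integer arithmetic reproduces the sizes reported in the results (672 → 443/229, 3298 → 2176/1122).

```python
def percentage_split(rows: list, train_percent: int = 66) -> tuple[list, list]:
    """Split rows into (train, test), keeping the first train_percent% for training.

    Rows stay in their original (time) order; the training size is floored
    via integer arithmetic.
    """
    n_train = len(rows) * train_percent // 100
    return rows[:n_train], rows[n_train:]


if __name__ == "__main__":
    for size in (672, 3298):
        train, test = percentage_split(list(range(size)))
        print(size, len(train), len(test))
```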

Evaluation: Machine Learning Algorithms Testing
Data set size = 672; training data size = 443; test data set size = 229
Algorithm          MAE      RMSE     Time     Corr. Coeff
SMOReg             5.3638   6.9759   0.851    0.6233
1 layer ANN        2.7606   3.6242   47.004   0.9158
2 layer ANN        3.0473   4.1506   50.842   0.8961
Linear Regression  4.8586   5.9283   0.759    0.7396

Data set size = 3298; training data size = 2176; test data set size = 1122
Algorithm          MAE      RMSE     Time     Corr. Coeff
SMOReg             4.6755   6.5965   32.4     0.8054
1 layer ANN        3.3332   4.5841   229.7    0.9162
2 layer ANN        3.7566   4.7279   247.0    0.9221
Linear Regression  4.7579   6.0173   2.4      0.8396

(MAE = mean absolute error; RMSE = root mean squared error; Corr. Coeff = correlation coefficient)
• ANN: more accurate but very slow
• SMOReg and LR: on the 672-instance data set, they took roughly the same time, but LR was more accurate
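
The “Corr. Coeff” column is presumably the Pearson correlation between predicted and measured power, which is what WEKA reports as “Correlation coefficient” for numeric targets. A minimal standard-library sketch follows; the function name is hypothetical.

```python
import math


def correlation_coefficient(predicted: list[float], actual: list[float]) -> float:
    """Pearson correlation coefficient between predicted and actual values."""
    n = len(actual)
    if n < 2 or len(predicted) != n:
        raise ValueError("expected two series of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    # Square roots of the summed squared deviations (the 1/n factors cancel out).
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    norm_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    if norm_p == 0 or norm_a == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_p * norm_a)
```

Unlike MAE and RMSE, the correlation is scale-free and rewards predictions that track the shape of the load curve. That is how the 2 layer ANN can reach the highest correlation on the 3298-instance set (0.9221) while the 1 layer ANN still has the lower MAE and RMSE.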

Conclusion
Our contribution: an architecture for evaluating open data sources for real-time predictive analytics, comprising:
(1) data management: collection, filtering, and warehousing
  → Multiple data sources
  → Reused existing vocabularies
(2) data analytics: source selection and predictive analytics
  → Tested with short-term and long-term data sets
  → No configuration input required
  → Results shown on a simple UI
• Our promise: an autonomic system:
  → Experiment 1 = self-configuration and self-optimization
  → Experiment 2 = self-healing and self-optimization

Future Work
• Fully autonomous system:
  → discover data sources autonomously
• ANNs are highly accurate but very slow:
  → investigate solutions for reducing the time needed to build ANNs
• Using Open Data for better analytics:
  → working days vs. non-working days
Contact details
Wassim Derguech, Eanna Burke, Edward Curry
Insight Centre For Data Analytics, National University of Ireland, Galway
wassim.derguech@insight-centre.org
eannaburke1@gmail.com
edward.curry@insight-centre.org
