Sensing the world with
Data of Things
By: Sriskandarajah Suhothayan (Suho)
Technical Lead at WSO2
@suhothayan
suho@wso2.com
STRUCTURE DATA 2016
MARCH 9 - 10 • SAN FRANCISCO
Any customer can have a car
painted any colour that he wants
so long as it is black
~ Henry Ford ~
Me Me Me !!!
Your customers want to have a
personalized experience.
We are in the time of ME!
What to do?
You need to know the customer profile, e.g. historical data, to make a decision
You need to understand the context in which the customer evolves
You need to be able to react in real time to certain conditions or patterns
Is IoT New?
• source: http://community.arm.com/groups/internet-of-things/blog/2014/06
Internet of Things
source : http://na1.www.gartner.com/imagesrv/newsroom/images/HC_ET_2014.jpg;wadf79d1c8397a49a2
IoT Ecosystem
WSO2 IoT Server M3 : https://goo.gl/nhbxnG
http://wso2.com/iot
Concepts of IoT Analytics
● Type of Data
● Distributed Nature
● Event-Drivenness
● Possible Type of Analytics
● Scalability
● Edge Analytics
● Uncertainty
Data Types of Things
● Time based data
○ Continuous monitoring & reporting
○ Time series processing (e.g. energy consumption over time)
○ Specialised DBs - OpenTSDB
● Location based data
○ Things are all over the place & they move
○ Tracked via GPS / iBeacons
○ Geospatial processing (e.g. traffic planning, better route suggestions for vehicles)
○ Geospatially optimised processing engines - GeoTrellis
IoT is Distributed
● Constant changes
○ Components are added and removed
○ Data flows are modified or repurposed
● Data collection needs to support
○ Anything from weak 3G networks to ad-hoc peer-to-peer networks
○ Message Queuing Telemetry Transport (MQTT)
○ Constrained Application Protocol (CoAP)
○ ZigBee or Bluetooth low energy (BLE)
● Dynamic scaling
○ Hybrid cloud
IoT Analytics are Event-Driven
● Sensors report data as Event Streams
● Analysis on flowing (or perishable) data
● Realtime Analytics
○ Detect temporal and logical patterns
○ Identify KPIs and Thresholds
○ Send out alerts immediately
○ E.g. Alert when a temperature sensor hits a limit; notify on the car dashboard of low tire pressure
○ Systems: Apache Storm, Google Cloud Dataflow & WSO2 CEP
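The temperature-limit alert mentioned above can be sketched in a few lines of Python. This is only an illustration of threshold detection on an event stream, not how WSO2 CEP or Storm implement it; the `Reading` fields and the `alerts` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Reading:
    sensor_id: str      # hypothetical field name
    temperature: float  # assumed to be degrees Celsius


def alerts(stream: Iterable[Reading], limit: float) -> Iterator[str]:
    """Yield one alert each time a sensor crosses above the limit."""
    above = set()  # sensors that are currently over the limit
    for r in stream:
        if r.temperature > limit and r.sensor_id not in above:
            above.add(r.sensor_id)
            yield f"ALERT {r.sensor_id}: {r.temperature} > {limit}"
        elif r.temperature <= limit:
            # Back to normal: allow a new alert on the next crossing.
            above.discard(r.sensor_id)
```

Tracking which sensors are already over the limit avoids repeating the same alert for every event while the condition persists.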
History Repeats
● Present vs usual behavior
● Understand the history
● Batch Analytics
○ Perform periodic summarisation/analytics
○ E.g. Average temperature in a room last month, total
power usage of the factory last year
○ Systems : Apache Hadoop, Apache Spark + Storage
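As a rough illustration of periodic summarisation, the sketch below computes the average temperature per room and month from stored readings. The tuple layout and the `monthly_average` name are assumptions made for this example; a real deployment would run the equivalent job on Hadoop or Spark.

```python
from collections import defaultdict
from datetime import datetime


def monthly_average(readings):
    """readings: iterable of (timestamp: datetime, room: str, temperature: float).

    Returns {(room, year, month): average temperature}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, room, temp in readings:
        key = (room, ts.year, ts.month)
        sums[key][0] += temp
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```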
Deep Investigations
● Ad-Hoc Queries
● Interactive Analytics
○ Provides searchability
○ E.g. Identify fraud rings from simple fraud alerts
○ Systems: Apache Drill, indexed storage systems such as Couchbase, Apache Lucene
Thinking Ahead
● When you don’t know the equations
● Focusing conditions & preventing issues
● Predictive Analytics
○ Incremental Learning
○ E.g. Proactive maintenance, fraud detection and health
warnings
○ Systems : Apache Mahout, Apache Spark MLlib,
Microsoft Azure Machine Learning, WSO2 ML, Skytree
Technology we’ve chosen
Realtime Batch
Interactive Predictive
WSO2 Data Analytics Server
Plenty of Data
Scalable Data Processing
source : http://www.websitemagazine.com/content/blogs/posts/archive/2014/09/25/customer-service-in-2039.aspx
Scalable Realtime Deployment
More info : https://docs.wso2.com/display/CEP410/Creating+a+Storm+Based+Distributed+Execution+Plan
Scalable Deployment
Realtime, Batch, Interactive & Predictive
Is Every Event Significant?
● Publishing all events is not good!
○ Hardware may not be scalable
○ Network gets flooded
● What we usually need
○ Aggregation over time
○ Trends that exceed thresholds
○ Events matching a rare condition
● Results in
○ Local optimisation
○ Quick detection of issues
○ Instant notification
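A minimal sketch of the idea: the device buffers raw readings and publishes only a periodic summary plus any reading that exceeds a threshold. The `EdgeAggregator` class is hypothetical, and a count-based window stands in for a time window to keep the example short.

```python
from statistics import mean


class EdgeAggregator:
    """Forwards summaries and threshold breaches instead of every raw reading."""

    def __init__(self, window_size, threshold):
        self.window_size = window_size  # readings per summary (assumed count-based window)
        self.threshold = threshold      # values above this are reported immediately
        self.buffer = []

    def on_reading(self, value):
        """Return the messages to publish upstream for this reading (usually none)."""
        out = []
        if value > self.threshold:
            out.append(("alert", value))
        self.buffer.append(value)
        if len(self.buffer) == self.window_size:
            out.append(("summary", mean(self.buffer)))
            self.buffer.clear()
        return out
```

With a window of three readings, six raw events that include one breach produce three upstream messages: two summaries and one alert.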
Edge Analytics
Analytics on the Edge
with WSO2 Siddhi
Push
Outliers ...
● E.g. Anomaly detection, Fraud Analytics
● Alerts for known and unknown frauds, and Deep Search Analytics
https://goo.gl/TWV5C1
Outliers
● We used: Linear Regression, Markov Models & Credit Scoring
Uncertainty in Data of Things
Data can
● Be duplicated
● Arrive out of order
● Not arrive at all
● Carry wrong readings
Event Duplicates & Out of Order …
● Due to redundant sensors & network latency
● Makes temporal data processing difficult
○ Time windows
○ Temporal ordering
● E.g. fraud detection
-- Purchase events: amount, card number and location
define stream Purchase (price double, cardNo long, place string);

-- A small purchase followed within a day by a large one on the same card
from every (a1 = Purchase[price < 10]) ->
     a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
     within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;
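To make the pattern's intent concrete, here is a rough Python approximation of the same rule: a purchase under 10 followed within one day by a purchase over 10000 on the same card. It is a sketch, not Siddhi's actual matching engine, and the tuple layout, millisecond timestamps and `potential_fraud` name are assumptions. It relies on events arriving in timestamp order, which is exactly what duplicated or reordered events break.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def potential_fraud(purchases):
    """purchases: iterable of (timestamp_ms, price, card_no, place) in arrival order.

    Yields (card_no, price, place) for each small purchase that is followed
    within one day by a large purchase on the same card.
    """
    pending = defaultdict(list)  # card_no -> timestamps of unmatched small purchases
    for ts, price, card_no, place in purchases:
        if price > 10000:
            # Each pending small purchase on this card is consumed by this event.
            for small_ts in pending.pop(card_no, []):
                if ts - small_ts <= DAY_MS:
                    yield (card_no, price, place)
        elif price < 10:
            pending[card_no].append(ts)
```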
Events Arriving Out of Order
E.g. Realtime Soccer Analytics (DEBS 2013) https://goo.gl/c2gPrQ
● Identify ball kicks, ball possession, shot on goal & offside
● Solutions : K-Slack Based Algorithms
https://www2.informatik.uni-erlangen.de/publication/download/IPDPS2013.pdf
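The basic K-slack idea can be sketched as a reorder buffer: hold each event until the largest timestamp seen so far is at least K ahead of it, then release events in timestamp order. The version below uses a fixed K for simplicity and is not taken from the linked paper; `k_slack` and the `(timestamp, payload)` layout are assumptions for illustration.

```python
import heapq


def k_slack(events, k):
    """events: iterable of (timestamp, payload) in arrival order.

    Yields events sorted by timestamp, as long as no event is delayed by
    more than k time units relative to the newest timestamp seen.
    """
    heap = []
    max_ts = float("-inf")
    seq = 0  # tie-breaker so payloads are never compared
    for ts, payload in events:
        max_ts = max(max_ts, ts)
        heapq.heappush(heap, (ts, seq, payload))
        seq += 1
        # Release everything that is at least k older than the newest event.
        while heap and heap[0][0] <= max_ts - k:
            t, _, p = heapq.heappop(heap)
            yield (t, p)
    # End of stream: flush whatever is still buffered.
    while heap:
        t, _, p = heapq.heappop(heap)
        yield (t, p)
```

A larger K tolerates more disorder at the cost of higher latency; an event delayed by more than K is still emitted, but after later events have already been released.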
Missing Data
● Due to network outages
● E.g. Smart Meters (DEBS 2014)
○ Smart home electricity data: 2000 sensors,
40 houses, 4 Billion events in four months
○ Processed 400K events/sec
● Solutions:
○ Approximate using complementary sensor readings
■ Electricity Monitoring
● Frequent Load readings
● Occasional Work readings
○ Fault-tolerant data streams: Google MillWheel
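One way to approximate missing load readings from the occasional work readings is sketched below. It assumes that load is instantaneous power in watts and that work is cumulative energy in kWh; these units are an assumption of this example rather than something stated on the slide, and both function names are hypothetical.

```python
def average_load_watts(work_start_kwh, work_end_kwh, start_s, end_s):
    """Average power (W) implied by two cumulative energy readings (kWh)."""
    if end_s <= start_s:
        raise ValueError("end time must be after start time")
    joules = (work_end_kwh - work_start_kwh) * 3_600_000  # 1 kWh = 3.6e6 J
    return joules / (end_s - start_s)


def fill_missing_loads(loads, fallback_watts):
    """Replace missing load readings (None) with the work-derived average."""
    return [fallback_watts if value is None else value for value in loads]
```

For instance, 1 kWh consumed over one hour implies an average load of 1000 W, which can stand in for load readings lost during a network outage in that hour.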
Wrong Sensor Readings
● From GPS
● E.g. TfL Traffic Analysis
○ Using Transport for London open data feeds
○ http://goo.gl/04tX6k, http://goo.gl/9xNiCm
○ Scales to 500,000 events/sec and more
● From iBeacons at shops, ships and airports
● Solution: Kalman Filter
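A scalar Kalman filter shows how noisy or occasionally wrong readings get smoothed. This is a minimal sketch for a single, nearly constant value; real GPS tracking would filter a multi-dimensional state (position and velocity). The `kalman_1d` function and its parameter names are assumptions for illustration.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a slowly changing value observed with noise.

    Returns the filtered estimate after each measurement.
    """
    x, p = initial_estimate, initial_var  # state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var               # predict: uncertainty grows over time
        k = p / (p + measurement_var)  # Kalman gain: how much to trust z
        x += k * (z - x)               # update: move the estimate toward z
        p *= (1 - k)                   # update: uncertainty shrinks
        estimates.append(x)
    return estimates
```

With a low process variance the filter trusts its running estimate, so a single wild reading moves the estimate only part of the way toward the outlier.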
Visualisation
● Per-device & Summarization View
● Ability to group by categories
● Solutions: Composable Dashboard with sampling &
indexing
Communicate to Mobile & 3rd Party Apps
● Expose analytics results as APIs
○ Mobile apps, third parties
● Provides
○ Security, billing
○ Throttling, quotas & SLAs
● Solution
○ Write the data to a database
○ Expose it via secured APIs (e.g. WSO2 API Manager)
Reference Architecture for IoT Analytics
IoT Analytics
● WSO2 Data Analytics Server (WSO2 DAS) 3.0.1
○ Combines all types of analytics
● WSO2 Complex Event Processor (WSO2 CEP) 4.1
○ For those who need to analyze event streams in real time
● WSO2 Machine Learner (WSO2 ML) 1.1
○ For building predictive models
http://wso2.com/analytics
http://wso2.com/iot
Thank You
Any Questions ?
Contact us !
