IoT Analytics


My WSO2Con Asia 2016 Talk

Published in: Technology

  1. Source: Rolls-Royce Trent 1000
     Analytics data is collected in:
     ● Design
     ● Manufacture
     ● After-sales
     Manufacturing one fan blade -> 0.5 TB of data.
     Real-time data is transmitted back to Rolls-Royce while planes are in flight.
  2. Source: Caterpillar
     From autonomous mining trucks to locomotives, Caterpillar machines carry sensors that monitor fuel, idle time, and location for maximum operating efficiency. Predictive maintenance has saved millions, from timely fuel pump replacement to adjusted ship hull cleaning intervals in its marine services.
  3. ● What type of data?
     ● How fast do you need results?
     ● How much data to keep?
     ● Historical, real-time, or predictive?
     ● Cloud or fog / edge analytics?
  4. ● Time-related data
       ○ Time series processing
         ■ Energy consumption over time
         ■ Failure prediction
         ■ Specialized DBs - OpenTSDB
     ● Location data
       ○ GPS / iBeacons
       ○ Used in agriculture
         ■ Detect soil moisture, crop growth
         ■ Manage irrigation equipment
       ○ Traffic planning
         ■ Monitor vehicle speeds and locations for better route suggestions
       ○ Geospatially optimized processing engines - GeoTrellis
  5. Do we need the results instantaneously, is a delay of a few seconds acceptable, or are results after several minutes or more fine?
  6. ● The most often used processing mode in IoT
       ○ Immediately take action on an event occurring at the source devices
         ■ Send out alerts when a temperature sensor hits a limit
         ■ Show a low tire pressure notification on a car dashboard
     ● Generating instant alerts and information from sensor data requires stream processing: events are processed one by one in real time and matched against a predefined set of rules.
       ○ Apache Storm as a stream processing engine
         ■ Scalable and fault tolerant
       ○ For advanced pattern matching, a full-fledged CEP engine can be used, e.g. WSO2 CEP, Esper, etc.
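The rule matching described on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Apache Storm, WSO2 CEP, or Esper are programmed; the `Event` and `Rule` types, the metric names, and the threshold values are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """A single sensor reading (hypothetical shape)."""
    device_id: str
    metric: str
    value: float


@dataclass(frozen=True)
class Rule:
    """A predefined rule: fires when the predicate holds for a metric."""
    name: str
    metric: str
    predicate: Callable[[float], bool]


def match_events(events: Iterable[Event],
                 rules: Sequence[Rule]) -> Iterator[Tuple[str, Event]]:
    """Process events one by one and yield (rule name, event) for each match."""
    for event in events:
        for rule in rules:
            if rule.metric == event.metric and rule.predicate(event.value):
                yield rule.name, event


# Illustrative thresholds only; real limits depend on the device.
RULES = [
    Rule("high_temperature", "temperature_c", lambda v: v > 80.0),
    Rule("low_tire_pressure", "tire_pressure_kpa", lambda v: v < 180.0),
]
```

Because `match_events` is a generator, it can consume an unbounded stream and emit alerts as soon as each event arrives, which is the property that separates this mode from batch processing.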
  7. ● For long-term statistics generation, a batch processing system can be used: Apache Hadoop, Apache Spark
       ○ Average temperature in a room over the last month
       ○ Total power usage of the house over the last year
     ● Interactive analytics is possible with technologies such as Apache Drill and indexed storage systems such as Couchbase.
     ● Most often, we need to mash up batch analytics results with real-time processing
       ○ Comparing a long-term statistics result with incoming real-time events for alerts, etc.
     ● Batch operations can be combined with an indexing system for real-time analytics, to look up data instantly when required
       ○ Apache Lucene, WSO2 DAS Analytics / Event Tables
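The mash-up of batch results with real-time events can be illustrated with a small sketch: a batch step computes a per-device baseline from historical readings, and a real-time step compares each incoming value against it. The function names, the data layout, and the "more than k standard deviations" alert criterion are assumptions chosen for the example, not something the talk prescribes.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Baseline = Dict[str, Tuple[float, float]]


def batch_baseline(history: Dict[str, List[float]]) -> Baseline:
    """Batch step: long-term mean and standard deviation per device."""
    return {device: (mean(values), pstdev(values))
            for device, values in history.items()}


def deviates(baseline: Baseline, device: str, value: float,
             k: float = 3.0) -> bool:
    """Real-time step: is this reading more than k std devs from the mean?"""
    if device not in baseline:
        return False  # no long-term statistics yet, so nothing to compare
    mu, sigma = baseline[device]
    return abs(value - mu) > k * sigma
```

In a real deployment the baseline would come from a batch job and be served from an indexed store, so that the lookup inside the real-time path stays fast.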
  8. ● IoT devices generate high volumes of data, of different types
     ● We can process the data as soon as we receive it and then discard it, or keep it for more detailed processing
     ● Big Data stores give us the option to store such huge amounts of data
     ● Purge the raw data once it is no longer required
  9. ● Hindsight is achieved by processing historical data and understanding what has happened
       ○ Batch processing systems such as Apache Hadoop and Apache Spark are used in this area
       ○ Data visualization with dashboards, showing related data together
     ● Insight is understanding what is happening now
       ○ Achieved with real-time processing systems
       ○ Scenario: how are my jet engines performing right now?
     ● Foresight is predicting what is going to happen
       ○ Achieved with machine learning systems such as Apache Mahout, Apache Spark MLlib, Microsoft Azure Machine Learning, WSO2 ML
       ○ Scenario: predictive maintenance -> when to change specific parts in my car, service scheduling for an aeroplane
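As a toy illustration of the foresight scenario, the sketch below fits a straight line to past wear measurements and extrapolates when a service limit will be reached. A least-squares trend line is a deliberately simple stand-in for the machine learning systems named on the slide; the function names, the wear metric, and the limit are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours: Sequence[float], wear: Sequence[float],
                      limit: float) -> Optional[float]:
    """Estimate operating hours left before wear reaches the limit.

    Returns None when wear is not increasing, since no crossing is predicted.
    """
    slope, intercept = fit_line(hours, wear)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - hours[-1])
```

A production system would use richer features and a trained model, but the shape of the question is the same: given history, estimate when a part will need replacing.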
  10. ● IoT naturally means large amounts of data being created, so large amounts of computation resources are required
      ● The typical scenario of a centralized analytics server for all devices may not always be feasible
        ○ Centralized analytics hardware may not scale to the thousands of devices being added frequently
        ○ The network gets flooded with analytics chatter as the device count increases
      ● Solution: edge analytics, a.k.a. fog analytics
        ○ Some or most of the required analytics operations are offloaded to the end device itself or to an immediate gateway. This creates a scalable infrastructure for device management in the IoT ecosystem.
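One way to picture edge analytics is a gateway that condenses raw readings into per-window summaries and forwards only those summaries upstream. The sketch below assumes fixed-size windows and a simple count/min/max/mean summary; both choices, and the function names, are illustrative rather than taken from the talk.

```python
from typing import Dict, Iterator, Sequence


def summarize_window(readings: Sequence[float]) -> Dict[str, float]:
    """Reduce one window of raw readings to a compact summary."""
    if not readings:
        raise ValueError("window must contain at least one reading")
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }


def edge_summaries(readings: Sequence[float],
                   window_size: int) -> Iterator[Dict[str, float]]:
    """Yield one summary per fixed-size window; the last window may be shorter."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, len(readings), window_size):
        yield summarize_window(readings[start:start + window_size])
```

With a window of 60 readings, the gateway sends one message upstream in place of 60, which is how offloading reduces the analytics chatter on the network.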