Data Science and Smart Systems: Creating the Digital Brain

Big Data technologies enable us to build the digital brain of smart systems. I will illustrate with examples how we build a digital brain by collecting data from a large number of sensors and using the brain to find value in that data. We build a Data Lake using cutting-edge technology from Pivotal and use it to store large amounts of sensor and other data. Then we can find patterns in that data by applying the Data Science methodology, using sophisticated machine learning and statistical algorithms customized to run on big data within the Data Lake. Armed with these patterns, the system can detect anomalies and respond in an appropriate manner. Data Science combined with sensors and actuators can make a system smart!

Published in: Technology

  • We have the Internet of Things – it is a dumb collection of very sophisticated machines with thousands of sensors and actuators per machine
  • 2010 accident on BP offshore platform in the Gulf of Mexico
    Drilling rigs cost between $350,000 and $1,000,000 per day
    Non-productive Time (NPT) is the measurement most watched
    Worst case scenario is a Macondo-type blowout = $40B liability
    We need better safety protocols, better regulation, and a smart system!
  • Sensors:
    GE blowout preventers (BOP) collect information like ram position, system health and maintenance, etc.
    Jet engine
    Fitbit
    We are all carrying a large number of sensors on us right now! There’s a lot of intelligence (analytics) built into these sensors/machines
    SENSORS: We now collect huge amounts of data about activities of humans and machines
    We apply predictive analytics to that data to build decision support tools – like using regression models to facilitate scenario analysis
    … and action is still taken based on human orders
    Create a list of customers likely to churn and reach out to them
    Optimize the pricing and discounting plan for the next year and modify it on the fly
    NUDGE: Figure out the drivers that affect human behavior like saving and encourage that
    Understand what is causing shutdowns and investigate
    Exception: the internet has automated systems that react to events of interest
  • They are not connected!
    We have humans look at the data from sensors and then make decisions – full of delays and room for error
    WE CAN DO MUCH BETTER!
  • Once connected, a smart system takes action in response to an event of interest
    Brain:
    Input – signal from eyes
    Analysis – compute trajectory of ball
    Action – swing bat to connect with ball
  • Shown – a gamma ray detector in a drill stem – part of MWD – Measurement While Drilling
  • Smart System = Data lake for storing sensor data + data science for building and operationalizing models + actuators for taking action
    Sensors collect data and send it to a Data Lake
    The Digital Brain –
    We create the brain by building models that extract patterns
    The brain is then activated and can detect deviations from these patterns
    The system can initiate action on its own as well as provide alerts and predictive intelligence to humans
    Actuators:
    -Connect to the control system and send action messages
    -Shut down the system if a blowout is predicted
    -Send an alert to humans in charge if something is anomalous but dangerous
    -Predictive maintenance: Flag component for maintenance when required
    THE DIGITAL BRAIN CANNOT BE IN ONE MACHINE – IT NEEDS INFORMATION FROM THE NETWORK OF MACHINES
    A Parallel Storage system (or Data Lake) where all the sensor information is collected (at Pivotal we have developed Pivotal HD + HAWQ based on Hadoop)
    Capability to build models while keeping the data in parallel and in place
    EXTRACTING PATTERNS: Clustering algorithms – we have used the k-means clustering function available in MADlib, graph-based clustering, clustering of time-series data in frequency space etc.
    FINDING ANOMALIES: distance from centroid, change in cluster assignment etc.
    LIVING MODELS: Models have to learn and update continuously
    The ability to send appropriate signals to the actuator control systems
    Low latency scoring
    API that can connect to Business Intelligence tools and Apps
    Refer to the debate in the AI community between Douglas Hofstadter (UMich, Indiana University) on one hand and Peter Norvig (Google) and Stuart Russell (Berkeley) on the other
    We want to step out of that debate and combine humans and machines into a smart system (a la Arnab Gupta of Opera @ Strata 2011 -> man + machine)
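The pattern-extraction and anomaly-finding steps in the notes above (k-means clustering, then distance from centroid) can be sketched in a few lines. This is a minimal NumPy sketch on synthetic data, not the MADlib function the notes mention and not an in-database implementation. The helper names `kmeans` and `nearest_centroid`, the two-feature sensor readings, and the "mean plus four standard deviations" threshold are all assumptions made for illustration.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Return (labels, distances) of the closest centroid for each point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(points)), labels]

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative stand-in); returns k centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points (fancy indexing returns a copy).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)[0]
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

# Synthetic "normal" history: two operating regimes (hypothetical features).
rng = np.random.default_rng(42)
history = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.5, size=(200, 2)),
])

# Extract patterns: learn the centroids from historical data.
centroids = kmeans(history, k=2)

# Assumed rule: anything much farther from its centroid than the
# historical readings were is treated as an anomaly.
_, train_dist = nearest_centroid(history, centroids)
threshold = train_dist.mean() + 4.0 * train_dist.std()

# Find anomalies: score two new readings, one ordinary and one far out.
new_readings = np.array([[0.1, -0.2], [5.0, 25.0]])
_, new_dist = nearest_centroid(new_readings, centroids)
is_anomaly = new_dist > threshold
print(is_anomaly)
```

Learning the centroids from history and scoring new readings separately mirrors the notes' split between building the brain and activating it. A production system would keep both steps inside the Data Lake rather than pulling data into one process.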
  • We are not taking the humans out of the loop but empowering them
    Tiers:
    Ingestion: Ability to bring data from multiple data sources across all timelines with varying QoS
    Distillation: Ability to take the data stored in the storage tier and convert it to structured data for easier analysis by downstream applications
    Processing: Ability to run analytical algorithms and user queries with varying QoS (real-time, interactive, batch) to generate structured data for easier analysis by downstream applications
    Insights: Ability to analyze all the data with varying QoS (real-time, interactive, batch) to generate insights for business decision making
    Action: Ability to integrate the insights with the business decision making systems
    Unified Data Management: Ability to manage the data lifecycle, access policy definition, and master data management and reference data management services
    Unified Operations: Ability to monitor, configure and manage the whole Data Lake from a single operations environment
    Processing Tier – PHD (Hive, HBase, Pig and MapReduce)
    Distillation Tier – Pivotal Data Dispatch, Pivotal Analytics, ETL Partner Products
    Informational
    Ability to get information in a dashboard
    Integration with business intelligence tools: Tableau, MicroStrategy, BusinessObjects, Pentaho.
    Alerting
    Ability to alert the decision maker
    -Integration with the alert systems
    -Dashboard, alarms, emails, pagers, phones etc.
    Automation
    - Ability to integrate with business decision making systems
    - Integration with the applications to take automated actions
    - MessageMQ, Rabbit, Spring, & other technologies.
    Store Everything
    Analyze Anything
    Build Next Generation
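The three kinds of action listed in the notes above (Informational, Alerting, Automation) can be pictured as a routing step on each insight. The sketch below is purely illustrative: the `Insight` type, the `route` function, the 0-to-1 anomaly score, and both thresholds are hypothetical and do not come from any Pivotal product.

```python
from dataclasses import dataclass
from enum import Enum

class Channel(Enum):
    DASHBOARD = "dashboard"    # Informational: show it to a human
    ALERT = "alert"            # Alerting: email, pager, phone, alarm
    AUTOMATION = "automation"  # Automation: message an application or actuator

@dataclass
class Insight:
    source: str           # hypothetical identifier of the machine or sensor
    anomaly_score: float  # assumed scale: 0.0 (normal) to 1.0 (certain anomaly)

def route(insight, alert_at=0.5, automate_at=0.9):
    """Pick the action channels for one insight (thresholds are assumptions)."""
    channels = [Channel.DASHBOARD]  # everything is visible on the dashboard
    if insight.anomaly_score >= alert_at:
        channels.append(Channel.ALERT)
    if insight.anomaly_score >= automate_at:
        channels.append(Channel.AUTOMATION)
    return channels

routine = route(Insight("rig-7", 0.2))
suspicious = route(Insight("rig-7", 0.6))
critical = route(Insight("rig-7", 0.95))
print(routine, suspicious, critical)
```

Escalating through the channels instead of replacing one with another keeps humans in the loop even when an automated action fires.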
  • Smart meters measure power twice an hour – it’s a measure of all activity!
    We can Fourier transform 10 weeks of data from 100,000 meters in 5 seconds flat
    … and take action: once an anomaly is detected we can detect theft, prevent blackouts, and much, much more
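To make the numbers in the note above concrete: two readings an hour is 48 per day, so 10 weeks (70 days) gives 48 × 70 = 3,360 readings per meter, or 336 million readings across 100,000 meters. The sketch below Fourier transforms a handful of synthetic meters with NumPy. The load shape (a daily and a weekly cycle plus noise) and the number of meters are assumptions, and it makes no attempt to reproduce the timing quoted in the note.

```python
import numpy as np

SAMPLES_PER_DAY = 48          # one reading every 30 minutes
N_DAYS = 70                   # 10 weeks
n = SAMPLES_PER_DAY * N_DAYS  # readings per meter

# Synthetic load curves for a few meters (assumed shape, not real data).
rng = np.random.default_rng(0)
n_meters = 5
t_days = np.arange(n) / SAMPLES_PER_DAY
daily_amp = rng.uniform(0.4, 0.6, size=(n_meters, 1))
loads = (
    1.0
    + daily_amp * np.sin(2 * np.pi * t_days)    # daily cycle
    + 0.2 * np.sin(2 * np.pi * t_days / 7)      # weekly cycle
    + rng.normal(scale=0.05, size=(n_meters, n))
)

# Each row is transformed independently, which is what makes the
# step data parallel: meters can be split across machines freely.
centered = loads - loads.mean(axis=1, keepdims=True)
spectrum = np.abs(np.fft.rfft(centered, axis=1))
freqs = np.fft.rfftfreq(n, d=1.0 / SAMPLES_PER_DAY)  # cycles per day

# Strongest frequency per meter; the daily cycle should dominate here.
dominant = freqs[spectrum.argmax(axis=1)]
print(n, dominant.round(3))
```

The magnitude spectrum of each meter is the kind of frequency-space feature vector that a clustering step can then group.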
  • Batch process – Fourier transform (FT) the time-series data (data parallel algorithm) and use k-means clustering (not explicitly parallel – use MADlib); identify and label outliers
    Real-time process – detect changes and outliers and set off the suitable alarm
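The real-time half of the process above can be sketched as a scorer that holds the centroids produced by the batch step and checks each new reading as it arrives. Everything here is a hypothetical illustration: the `StreamingScorer` class, the device ids, the distance limit, and the learning rate are assumptions, and the small centroid update is just one simple way to let a model keep learning.

```python
import numpy as np

class StreamingScorer:
    """Scores incoming feature vectors against batch-learned centroids."""

    def __init__(self, centroids, max_distance, learning_rate=0.05):
        self.centroids = np.asarray(centroids, dtype=float).copy()
        self.max_distance = max_distance    # assumed outlier limit
        self.learning_rate = learning_rate  # assumed update step
        self.last_label = {}                # device id -> last cluster seen

    def score(self, device_id, features):
        x = np.asarray(features, dtype=float)
        dists = np.linalg.norm(self.centroids - x, axis=1)
        label = int(dists.argmin())
        if dists[label] > self.max_distance:
            return "outlier"                # too far from every known pattern
        previous = self.last_label.get(device_id)
        self.last_label[device_id] = label
        if previous is not None and previous != label:
            return "cluster_change"         # device switched behavior pattern
        # Normal reading: nudge the centroid toward it so the model stays current.
        self.centroids[label] += self.learning_rate * (x - self.centroids[label])
        return "normal"

# Centroids would come from the batch step; these values are made up.
scorer = StreamingScorer(centroids=[[0.0, 0.0], [10.0, 10.0]], max_distance=2.0)
stream = [[0.2, 0.0], [0.0, 0.4], [10.1, 9.8], [5.0, 25.0]]
events = [scorer.score("meter-1", reading) for reading in stream]
print(events)
```

Both non-normal results would be handed to whatever alarm or actuator interface the system uses. Outliers are deliberately excluded from the centroid update so that a fault does not drag the model toward itself.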
  • 2010 accident on BP offshore platform in the Gulf of Mexico
    Drilling rigs cost between $350,000 and $1,000,000 per day
    Non-productive Time (NPT) is the measurement most watched
    Worst case scenario is a Macondo-type blowout = $20B liability
    We need better safety protocols, better regulation, and a smart system!
  • An ecosystem of smart machines, much like a natural ecosystem, will be self-healing and self-sustaining
    That’s the true realization of the potential of the Internet Of Things
    This is a movement we can all make happen
  • Data Science and Smart Systems: Creating the Digital Brain

    1. A NEW PLATFORM FOR A NEW ERA
    2. Creating the Digital Brain. Kaushik Das, Senior Principal Data Scientist. © Copyright 2014 Pivotal. All rights reserved.
    3. TODAY: We have analytics on Big Data …and we have smart machines.
    4. But That Does not Help Us Prevent Accidents Like the Macondo Disaster
    5. WHAT IF We Could Prevent Disasters Like This?
    6. We Have All the Ingredients of a Smart System: Sensors, Actuators, Decision Support Tools
    7. But Where is the Brain? Sensors, ?, Actuators
    8. The Brain Brings it All Together. The Brain: 1. takes in the input from the eyes 2. analyzes it to compute the trajectory of the ball 3. tells the body what action to take to hit the ball
    9. Let’s Put in a Digital Brain and Make Systems Smart
    10. Let’s Build a Digital Brain. The brain brings it all together: 1. takes in the input from a large number of sensors 2. builds a model and uses it to analyze incoming data 3. tells the actuators what action to take. Over a network of machines.
    11. And Now Let’s Prevent Disasters. Input: Sensors in drill. The Digital Brain: Extracts patterns and flags outliers/anomalies. Action: Shut down to prevent blowout.
    12. Smart System = Sensors + Digital Brain + Actuators. Data Lake; Sensors & actuators; Data Science for Building Models
    13. Data Lake Architecture (diagram labels): Unified Sources; Centralized Management; System monitoring; Real-time ingestion; System management; Unified Data Management Tier; Data mgmt. services; Flexible Actions; MDM; RDM; Audit and policy mgmt.; Real-time insights; Workflow management; Micro batch ingestion; Processing Tier; In-memory; Interactive insights; MPP database; Batch ingestion; Distillation Tier; Batch insights; HDFS storage; Unstructured and structured data
    14. The Digital Brain: Making a Smart Grid Smarter! Input: Data from smart meters. The Digital Brain: Uses Fourier transform, extracts patterns and flags outliers/anomalies. Action: Where to send trucks and when, preventive maintenance.
    15. The Data Science that Goes into the Digital Brain: Identify patterns based on frequency; Detect anomalies
    16. Now the Power Company Knows a Tree has Fallen Even Before the Residents Complain
    17. “Zero unplanned downtime is a key goal for GE’s use of the Industrial Internet” Jeff Immelt, GE
    18. WHAT IF We can take this even further – what about zero unplanned outages, zero industrial accidents and zero environmental disasters?
    19. FOR MORE DETAILS http://blog.gopivotal.com/features/creating-the-digital-brain
    20. A NEW PLATFORM FOR A NEW ERA
