Real-Time Analytics at Scale
Smart Data Pipes for the Internet of Things
Assaf Araki, Big Data Analytics Architect
Big Data Analytics, Intel
Intro to Big Data Analytics @ Intel
People (+100)
Data Scientists
Management
Big Data Developers
Analytics PMs
13%
41%
9%
37%
CONTRIBUTION TO DATA CENTER GROUP
CONTRIBUTION TO INTEL OPERATIONS
MISSION
#1 Operational excellence
#2 Help Intel win in the area of intelligent machines
VISION
Analytics is a competitive advantage for Intel
Industry / Academy
• Technical due-diligence assessment for Intel Capital
• Benchmark with startups
• Academy collaborations
• Assist Intel Sales & Marketing
DESIGN: Cut validation time-to-market
MANUFACTURING: Reduce test cost
SALES & MARKETING: Increase sales through analytics
Stream Analytics
Cloud
Parkinson Research
Machine Learning
Strategy
The IoT challenge
Cloud
Ingestion
Things
Cloud Infrastructure
Data Platform
Analytics Platform
UI Services
Use case: Parkinson's Disease research
CLINICAL TRIALS: Create and validate algorithms & measures
POPULATION STUDY: Generate insights using big data analytics
PATIENT REPORTING
• Medication reporting
• Medication reminder
• Report
OTHER
• Configurable data collections
• Contribution score
• Integrated login and registration
• Pebble notifications
OBJECTIVE MEASURES
• Gait
• Sleep
• Tremor
• Activity level
• Controlled tests
So, why is it a Big Data problem?

Subjects   Days per subject   Per subject per day   Total
30         5                  1 GB                  0.15 TB (weekly)
500        30                 1 GB                  15 TB (monthly)
1000       365                1 GB                  365 TB (yearly)
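The totals on this slide follow from multiplying subjects by days per subject by the daily volume. A small Python sketch of that arithmetic; the decimal conversion of 1000 GB per TB and the names used here are assumptions for illustration, not part of the deck:

```python
GB_PER_SUBJECT_PER_DAY = 1   # from the slide: 1 GB per subject per day
GB_PER_TB = 1000             # assumption: decimal terabytes


def study_volume_tb(subjects: int, days_per_subject: int) -> float:
    """Raw data volume in TB for one study phase."""
    total_gb = subjects * days_per_subject * GB_PER_SUBJECT_PER_DAY
    return total_gb / GB_PER_TB


# (subjects, days per subject) for each phase shown on the slide
phases = {
    "weekly": (30, 5),
    "monthly": (500, 30),
    "yearly": (1000, 365),
}

volumes = {name: study_volume_tb(s, d) for name, (s, d) in phases.items()}

for name, tb in volumes.items():
    print(f"{name}: {tb} TB")
```

The jump from 0.15 TB to 365 TB comes from scaling both the cohort and the observation period while the per-subject daily rate stays fixed.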
SERVICE
BATCH ANALYTICS
STREAM ANALYTICS
INGESTION
STORAGE
USER INTERFACE
Mosquitto
CLOUD COMPUTING SERVICES
Smart Ingestion characteristics
Personalized
• Per single device or user
• Maintain state and required data for ML
Easy to use
• Easily subscribe to any stream
• Use familiar development languages (Java, Scala)
• Developers focus on logic development
Smart Data Pipe
• Apply analytics on the stream
• Trigger actions (close the feedback loop) in a timely manner
Scalability
• Linear scalability (scale out)
• Extremely high concurrency
• High throughput
Fault Tolerance
• No single point of failure
• Seamless recovery
• Persistent
Smart Data Ingestion – High level overview
Device
Device
Device
Device
Scalable, Persistent Broker
Processing, Stream Analytics
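The speaker notes for this slide describe the layer as pulling messages, parsing and processing them, and performing controlled writes. A minimal Python sketch of that flow, not the Akka & Kafka implementation: the JSON payload format, the field names, and the batch size are assumptions made for illustration, and concurrency is left out.

```python
import json
from typing import Iterable, Iterator, List


def parse(raw_messages: Iterable[str]) -> Iterator[dict]:
    """Parse pulled messages, skipping malformed payloads (assumed JSON)."""
    for raw in raw_messages:
        try:
            yield json.loads(raw)
        except json.JSONDecodeError:
            continue  # a real pipeline would likely log or dead-letter this


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group records into fixed-size batches so writes stay controlled."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


# Hypothetical messages as they might be pulled from the broker.
pulled = [
    '{"device": "watch-1", "x": 1}',
    "not json",
    '{"device": "watch-2", "x": 2}',
    '{"device": "watch-1", "x": 3}',
]

written = list(batches(parse(pulled), size=2))
```

Batching is one simple way to keep write pressure on the storage layer bounded; the deck does not say which mechanism the production pipeline uses.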
What is Akka?
• Micro-service (actor) oriented
• Message Driven
• Lock-free
• Location-transparent
• High performance
• Fault Tolerant
• Scales linearly
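To make the actor idea concrete, here is a toy Python sketch of a message-driven actor with a mailbox and private per-device state. It is not Akka's API and does not reproduce Akka's scheduling or lock-free internals; the class and method names are hypothetical, and one thread per actor is used only to keep the example short.

```python
import queue
import threading
from typing import Tuple


class DeviceActor:
    """Toy actor: one mailbox, one worker thread, state private to the actor."""

    _STOP = object()  # sentinel message that ends the worker loop

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self._mailbox: queue.Queue = queue.Queue()
        self._count = 0      # only the actor's own thread mutates these
        self._total = 0.0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, value: float) -> None:
        """Fire-and-forget send; callers never touch the actor's state."""
        self._mailbox.put(value)

    def stop(self) -> Tuple[int, float]:
        """Process everything already queued, then return (count, mean)."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        mean = self._total / self._count if self._count else 0.0
        return self._count, mean

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self._count += 1
            self._total += message


# One actor per device, each keeping its own running state.
actors = {dev: DeviceActor(dev) for dev in ("watch-1", "watch-2")}
for value in (1.0, 2.0, 3.0):
    actors["watch-1"].tell(value)
actors["watch-2"].tell(10.0)
results = {dev: actor.stop() for dev, actor in actors.items()}
```

Because each actor owns its state and only reacts to messages, user code needs no explicit locking, which is the property the slide highlights for per-device processing.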
Stream Processing - the Akka way…
Each actor is a small piece of Java or Scala code performing its role.
A set of actors creates a topology which is responsible for processing a device's data stream.
A single Akka node may have millions of concurrent actors handling different streams and operations.
• Change detection: automatic change detection
• Real time rules matcher: detect & raise alert for matched rules
• Sleep quality: calculating users' sleep quality
• Tremor detection: tremor detection based on devices' …
• Aggregator: aggregation (50 Hz to minutes / hours)
Sample Parkinson Disease re
Subscriber, Parser, Aggregator, HBase Writer, Analytics Manager, Change Detection, UnZip, Real Time Rules, Sleep Quality
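The Aggregator role on this slide reduces 50 Hz sensor samples to minute or hour resolution. A minimal Python sketch of per-minute aggregation, assuming samples arrive as (timestamp in seconds, value) pairs and that the mean is the aggregate of interest; neither detail is stated in the deck:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

SAMPLE_RATE_HZ = 50  # from the slide


def aggregate_per_minute(samples: Iterable[Tuple[float, float]]) -> Dict[int, float]:
    """Map each minute index to the mean of the samples that fall inside it."""
    buckets = defaultdict(list)
    for timestamp_s, value in samples:
        buckets[int(timestamp_s // 60)].append(value)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


# Two minutes of synthetic 50 Hz data: 1.0 in the first minute, 3.0 in the second.
samples = [
    (i / SAMPLE_RATE_HZ, 1.0 if i < 60 * SAMPLE_RATE_HZ else 3.0)
    for i in range(2 * 60 * SAMPLE_RATE_HZ)
]

per_minute = aggregate_per_minute(samples)
```

Here 6000 raw samples collapse to two values, which is the kind of reduction that makes downstream storage and analytics cheaper.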
Stream Processing Management Layer (“Pigeon”)
Easy Portability With Docker & CoreOS
• CoreOS & Docker containers enable portability and ease of deployment anywhere
• Enables the flexibility of choosing a set of desired containers based on a given use case's requirements
Preconfigured containers ready to be loaded
Summary
• IoT data ingestion goes beyond moving the data into the cloud
• We have deployed a scalable, fault-tolerant, multi-protocol pipeline that enables stream analytics
• The stream analytics platform is leveraged for other IoT projects
Thank You!

Assaf Araki – Real Time Analytics at Scale


Editor's Notes

  • #4 The Internet of Things (IoT) is creating unprecedented business opportunities for both individuals and organizations.
  • #5 The story: The man in the picture on the left is Andy Grove, one of Intel's founders, who has Parkinson's disease (PD). The story begins when he reads an article in the NY Times about Big Data and decides to start a project within Intel related to PD and Big Data. He contacts the Michael J. Fox Foundation and then decides to start a joint effort with it. The idea is to use the Internet of Things, wearable technology and big data platforms to assist PD research. About PD: a neurodegenerative disease with movement disorder symptoms. Existing treatments mainly improve quality of life and do not cure the disease. ~6M patients: ~1M in the US and ~5M in the rest of the globe. Life expectancy: ~10-15 years. 1 out of 100 people over the age of 60 is a PD patient. No test and no progression markers.
  • #6 On this slide the focus should be on the patient-reported capabilities and the configurable data collection strategies. For the patient-reported part, explain the medication reminder and reporting capabilities, which help us track patients' compliance and learn about the medication's effect on the motor symptoms, all while providing value to the patients. The Objective Measures part is covered later on in the PPT. In the Other section, talk about the ability to configure which sensor data to use for each cohort of users.
  • #8 Quick review of the PD solution layers as a use case of the IoT platform: a batch layer based on Spark; a storage layer using Hadoop, HBase, and MySQL for metadata; a powerful, scalable ingestion layer based on Akka & Kafka; a dynamic stream analytics layer based on the Akka actor system framework; a scalable service layer providing a set of APIs for registration and data extraction out of the platform; and a UI layer, the only layer in this diagram that is unique to the PD solution, using a Pebble watch and an Android application to collect data and interact with patients. Note that 5 of the 6 presented layers (all except the UI layer) are part of the IoT platform and can be used for similar products / verticals.
  • #10 Multi-protocol pipeline built over Akka & Kafka. Kafka is a fast, scalable, durable and distributed messaging system: a high-throughput, low-latency platform for handling real-time data feeds. Akka is an actor-based framework for highly concurrent, distributed and resilient applications built on events / messaging. This layer is responsible for: pulling messages; parsing & processing; concurrent & controlled writes.
  • #11 Writing correct concurrent, fault-tolerant and scalable applications is hard. Akka uses the Actor Model to raise the abstraction level and provide a better platform to build correct concurrent and scalable applications. It can support millions of concurrent actors handling different streams, which is a good fit for IoT characteristics. We use Akka for: processing messages; near-real-time rules; change detection at the device level.
  • #14 Docker is an open-source project that automates the deployment of applications inside software containers. CoreOS is an open-source lightweight operating system based on the Linux kernel and designed for providing infrastructure to clustered deployments.
  • #15 Change detection: single sensor (Kolmogorov-Smirnov) and multi-sensor (under patent). Anomaly detection. Periodicity. Stream classification.
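The note above names Kolmogorov-Smirnov for single-sensor change detection without giving details. As a hedged illustration only, the two-sample KS statistic can be computed as the largest gap between the empirical CDFs of a reference window and a recent window; the windowing scheme and the 0.5 threshold below are assumptions, and the multi-sensor method is not covered.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], recent: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    ref = sorted(reference)
    rec = sorted(recent)
    largest_gap = 0.0
    # The maximum gap is always attained at one of the observed sample values.
    for x in ref + rec:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_rec = bisect_right(rec, x) / len(rec)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_rec))
    return largest_gap


def change_detected(reference: Sequence[float], recent: Sequence[float],
                    threshold: float = 0.5) -> bool:
    """Flag a change when the KS statistic exceeds a (hypothetical) threshold."""
    return ks_statistic(reference, recent) > threshold


baseline = [1.0, 2.0, 3.0, 4.0]
same = ks_statistic(baseline, [1.0, 2.0, 3.0, 4.0])
shifted = ks_statistic(baseline, [10.0, 11.0, 12.0, 13.0])
```

A statistic near 0 means the two windows look alike, while a value near 1 means their distributions barely overlap. In practice the threshold would be derived from the window sizes and a chosen significance level rather than fixed by hand.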