HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Disease Research

In this session, you will learn about a solution developed in partnership between Intel and the Michael J. Fox Foundation to enable breakthroughs in Parkinson's disease (PD) research by leveraging wearable sensors and smartphones to monitor PD patients' motor movements 24/7. We'll elaborate on how we're using HBase for time-series data storage and integrating it with various stream, batch, and interactive technologies. We'll also review our efforts to create an interactive querying solution over HBase.


  1. Enable breakthroughs in Parkinson's disease research through wearables and Big Data analytics technologies
  2. About us • Part of the Big Data Analytics Solutions group at Intel, reporting to the Data Center Group • Developing products and solutions leveraging: • Big Data edge technologies • Self-developed machine learning and stream analytics algorithms • Our team includes developers, data scientists and system analysts • I am a Big Data Analytics Architect and Development Manager responsible for leading-edge technology projects within Intel involving Big Data and stream analytics solutions in the Internet of Things and Parkinson's disease research
  3. How It All Started • Big Data analytics • IoT
  4. Parkinson's Disease • 1 in 100 people over the age of 60 • 60,000 new diagnoses per year (US) • 1M patients in the US, 5M worldwide • No cure; medication only helps with symptoms • There is no test and no progression marker • Parkinson's disease is caused by the death of dopamine cells; 60 to 80% of these cells are already lost by the time motor symptoms appear
  5. Challenges to Address • No objective measure • 3 to 6 months between physician visits • Changes are slow and hard to detect • Average trial size < 100 patients • A very small number of patients contribute to research • Trial costs are on the order of millions of dollars
  6. How?
  7. The Solution: (1) Wear a watch (2) Start an application
  8. Use Cases • Manage the disease using data • Researcher: free data for 1000's of patients • Clinician: accurate report since the last visit • Pharmaceutical: measure medication effect • (Diagram: Intel Big Data Cloud, Analytics, Insight / Value)
  9. Demo
  10. The Application
  11. The Application Features • Patient reported: medication reporting, medication reminder, report something • Other: configurable data collections, contribution score, integrated login and registration, Pebble notifications • Objective measures: gait, sleep, tremor, activity level, controlled tests
  12. Big Data and IoT Technologies
  13. IoT Cloud Simplified Framework • (Diagram: Thing, Gateway, Network, Datacenter, Cloud Infrastructure, Data Platform, Analytics Platform, Services, UI)
  14. Cloud Computing Services • User interface layer • Service layer • Batch layer • Stream analytics layer • Ingestion layer (Mosquitto) • Storage layer
  15. Storage Layer • Cloudera Enterprise Data Hub • HBase as the main scalable time-series data storage layer • Allows high write throughput • Random, real-time access to stored data • Highly available MySQL as metadata storage
  16. Data Ingestion Layer • Multi-protocol pipeline built over Akka and Kafka • Kafka is a fast, scalable, durable and distributed messaging system • Akka is an actor-based framework for highly concurrent, distributed and resilient applications driven by events and messages • This layer is responsible for: • Pulling messages • Parsing and processing • Concurrent, controlled writes to HBase • (Diagram: devices, Mosquitto brokers, a load balancer and HBase)
  17. Stream Analytics Layer • Based on the Akka actor framework • Contains millions of concurrent actors handling different streams and operations • Each actor is a small piece of code performing its role • A set of actors forms a topology that is responsible for processing a device's data stream • (Diagram actors: Subscriber, UnZip, Parser, Aggregator, HBase Writer, Analytics Manager, Change Detection, Real Time Rules, Sleep Quality)
  18. Batch Analytics Layer • Based on Apache Spark over HBase • Spark is a fast, general engine for large-scale data processing • Algorithms and calculations are executed on large data sets on a daily basis • The layer includes: • A set of complex machine learning algorithms • Baseline calculations for the rule engine's rules
  19. Service Layer • Interactive and scalable web services layer • A set of RESTful APIs allowing: • Registration to the platform • Raw and calculated data retrieval from HBase • Built on top of the Play Framework, providing a secured entry point • Uses Apache Phoenix and the native HBase client • (Diagram: load balancer, HBase)
  20. HBase Challenges
  21. Data Ingestion to HBase • Challenge: concurrently ingesting millions of messages into HBase creates a massive load on the HBase region servers and causes disconnections • Development evolution: (1) an HBase client per topology (millions of writers); (2) a pool of HBase clients, each using a separate HTable; (3) a pool of HBase clients, all using the same HBase connection pool (HConnectionManager) • Solution: create a "fixed" number of connections to HBase, allowing batch writes and load balancing • (Diagram: a Pool Router in front of several HBase Writers)
  22. Table Indicators over Large Tables • Challenge: gathering indicators (e.g. counts) on large HBase tables requires long table scans and degrades performance • Solution: • Update new indicator columns in real time using incrementColumnValue • Allows atomic increment of a specific column • Large-table counts successfully implemented • Enabled implementation of the required indicators: • Real-time hourly counts • Real-time max values (e.g. the last time a user transmitted data)
  23. Batch Processing Input Format • Challenge: • Batch processing is done using Spark, which requires an InputFormat for scanning • TableInputFormat was used, which is equivalent to a single scan • Performance is poor when data from "remote" parts of a table is required • Solution: • Use MultiTableInputFormat • Allows multiple scans • Successfully used with more than 100 scans per MultiTableInputFormat
  24. Analytics
  25. Activity Level • A measure that continuously describes the intensity of the patient's activity throughout the day • Motivates patients to be more active (known to be important for PD patients) • Personalized per patient, based on their average activity during walking periods (to avoid frustration) • Based on intensity measurements from the accelerometer • Filters out tremor as…
  26. Activity Level: An Example • (Charts: activity level in a controlled session, ON state; activity level in a controlled session, OFF state)
  27. Tremor • Tremor is one of the most obvious symptoms of PD • Most PD patients experience tremor • Tremor is detectable using signal processing techniques
  28. Trials and Partners
  29. Trials and Partners • REAL PD • L-DOPA Response Trial • Data Gathering Trial • Fox Intel Application Trial • Fox Insight Wear • Scripps Tremor Trial
  30. What's Next?
  31. Scale Platform • Scale to 1000's of patients in the US • Scale to 1000's of patients in the Netherlands • iOS support • Support additional wearables • Build more value-generating capabilities • Upgrade to HBase 1.0 • Upgrade Spark to 1.3 • Enrich the platform (e.g. advanced export, reporting) • Enrich the Parkinson's disease solution: • Analytics • Value to patients
  32. Q&A
  33. Thank you!
  34. Strategic direction
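Slide 17 describes a per-device topology built from small, single-purpose actors (UnZip, Parser, Aggregator, HBase Writer). The sketch below shows only that dataflow, as a sequential chain of stages. It does not use Akka and does not model actor concurrency or mailboxes.

Assumptions, all hypothetical and not taken from the talk:
- The payload is gzip-compressed text with one `ts,x,y,z` reading per line.
- The aggregation counts readings per one-second window.
- The names `DeviceTopology` and `Sample` are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical per-device topology: UnZip -> Parser -> Aggregator, each a small single-purpose stage. */
class DeviceTopology {
    /** One accelerometer reading; the "ts,x,y,z" payload format is an assumption. */
    static final class Sample {
        final long ts;
        final double x, y, z;
        Sample(long ts, double x, double y, double z) { this.ts = ts; this.x = x; this.y = y; this.z = z; }
    }

    /** Device side (used for testing): gzip a text payload. */
    static byte[] zip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read after close, so the gzip trailer is included
    }

    /** UnZip stage. */
    static String unzip(byte[] payload) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parser stage: one "ts,x,y,z" reading per line; blank lines are skipped. */
    static List<Sample> parse(String text) {
        List<Sample> out = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.trim().split(",");
            out.add(new Sample(Long.parseLong(f[0]), Double.parseDouble(f[1]),
                    Double.parseDouble(f[2]), Double.parseDouble(f[3])));
        }
        return out;
    }

    /** Aggregator stage: number of readings per one-second window, keyed by window start (ms). */
    static Map<Long, Integer> aggregate(List<Sample> samples) {
        Map<Long, Integer> perSecond = new TreeMap<>();
        for (Sample s : samples) perSecond.merge(Math.floorDiv(s.ts, 1000L) * 1000L, 1, Integer::sum);
        return perSecond;
    }

    /** The topology: stages wired in order; a writer stage would consume the result. */
    static Function<byte[], Map<Long, Integer>> topology() {
        Function<byte[], String> unzipStage = DeviceTopology::unzip;
        return unzipStage.andThen(DeviceTopology::parse).andThen(DeviceTopology::aggregate);
    }
}
```

Keeping each stage tiny is what lets a framework like Akka run many such topologies side by side. It also lets stages such as Change Detection or Real Time Rules subscribe to the same parsed stream.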
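The fix on slide 21 is a fixed number of writers behind a router, with batched writes. The minimal sketch below illustrates it with no HBase dependency.

Assumptions, all hypothetical:
- `BatchSink` stands in for a batched HBase write. In the HBase 1.x client that could be `Table.put(List<Put>)` or a `BufferedMutator` on one shared `Connection`.
- Routing by device-ID hash is one possible choice. The slides do not say how their Pool Router assigns work.
- The names `WriterPool` and `BatchSink` are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one batched round trip to HBase. */
interface BatchSink {
    void writeBatch(int writerId, List<String> rows);
}

/** A fixed pool of writers behind a router, instead of one HBase client per topology. */
class WriterPool {
    private final List<List<String>> buffers = new ArrayList<>();
    private final int batchSize;
    private final BatchSink sink;

    WriterPool(int writers, int batchSize, BatchSink sink) {
        if (writers <= 0 || batchSize <= 0) throw new IllegalArgumentException("sizes must be positive");
        for (int i = 0; i < writers; i++) buffers.add(new ArrayList<>());
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Router: the same device always maps to the same writer, spreading devices across the pool. */
    int route(String deviceId) {
        return Math.floorMod(deviceId.hashCode(), buffers.size());
    }

    /** Buffer a row on its writer; flush when that writer's batch is full. */
    void write(String deviceId, String row) {
        int id = route(deviceId);
        List<String> buf = buffers.get(id);
        synchronized (buf) {
            buf.add(row);
            if (buf.size() >= batchSize) flush(id, buf);
        }
    }

    /** Flush partial batches, e.g. on a timer or at shutdown. */
    void flushAll() {
        for (int i = 0; i < buffers.size(); i++) {
            List<String> buf = buffers.get(i);
            synchronized (buf) {
                if (!buf.isEmpty()) flush(i, buf);
            }
        }
    }

    private void flush(int id, List<String> buf) {
        sink.writeBatch(id, new ArrayList<>(buf)); // one round trip per batch, not per message
        buf.clear();
    }
}
```

The number of concurrent writers, and therefore connections to the region servers, is bounded by the pool size no matter how many device topologies are running. That bound is the property the slide's "fixed" number of connections is after.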
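Slide 22 replaces table scans with indicator columns updated on every write. The sketch below models counter cells in memory, addressed as row plus column, so the idea runs without a cluster.

In HBase itself, the atomic add corresponds to `incrementColumnValue(row, family, qualifier, amount)`. HBase has no built-in atomic max, and the slides do not say how the max indicator was kept. A `checkAndPut` retry loop is one option.

Assumptions, all hypothetical:
- The column naming (`msgs:h<hour>`, `lastSeen`) and the `ALL` row.
- The class name `IndicatorStore`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for HBase counter cells, addressed by row and column. */
class IndicatorStore {
    private final ConcurrentHashMap<String, AtomicLong> cells = new ConcurrentHashMap<>();

    private AtomicLong cell(String row, String column) {
        return cells.computeIfAbsent(row + "/" + column, k -> new AtomicLong());
    }

    /** Same contract as an atomic column increment: add and return the new value. */
    long increment(String row, String column, long amount) {
        return cell(row, column).addAndGet(amount);
    }

    /** Atomic "keep the maximum" update, e.g. the last time a user transmitted data. */
    long max(String row, String column, long value) {
        return cell(row, column).accumulateAndGet(value, Math::max);
    }

    long get(String row, String column) {
        AtomicLong v = cells.get(row + "/" + column);
        return v == null ? 0L : v.get();
    }

    /** Called once per ingested message, so reading an indicator never requires a table scan. */
    void onMessage(String userId, long epochMillis) {
        long hour = Math.floorDiv(epochMillis, 3_600_000L);
        increment(userId, "msgs:h" + hour, 1);  // real-time hourly count
        increment("ALL", "msgs:total", 1);      // table-wide count
        max(userId, "lastSeen", epochMillis);   // real-time max value
    }
}
```

Reading a count then becomes a single-cell get. The cost moves to one extra atomic update per message at ingestion time.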
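Slide 23 explains why a single scan performs poorly when the needed rows sit in "remote" parts of a table. Because HBase stores rows sorted by key, one scan covering two distant ranges must read everything between them.

The sketch below uses a sorted `TreeMap` as the table and counts the rows each approach reads. It models the row-key ranges a list of `Scan` objects would cover, not the `MultiTableInputFormat` API itself. The names `ScanPlanner` and `KeyRange` are hypothetical, and ranges are assumed not to overlap.

```java
import java.util.List;
import java.util.NavigableMap;

/** Compares rows read by one covering scan vs. several targeted scans over a sorted table. */
class ScanPlanner {
    /** A [start, stop) row-key range, analogous to one HBase Scan. */
    static final class KeyRange {
        final String start, stop;
        KeyRange(String start, String stop) { this.start = start; this.stop = stop; }
    }

    /** Single scan: reads everything between the smallest start key and the largest stop key. */
    static int rowsReadBySingleScan(NavigableMap<String, String> table, List<KeyRange> ranges) {
        String lo = ranges.get(0).start, hi = ranges.get(0).stop;
        for (KeyRange r : ranges) {
            if (r.start.compareTo(lo) < 0) lo = r.start;
            if (r.stop.compareTo(hi) > 0) hi = r.stop;
        }
        return table.subMap(lo, true, hi, false).size();
    }

    /** Multiple scans: each (non-overlapping) range is read on its own, skipping rows in between. */
    static int rowsReadByMultipleScans(NavigableMap<String, String> table, List<KeyRange> ranges) {
        int rows = 0;
        for (KeyRange r : ranges) rows += table.subMap(r.start, true, r.stop, false).size();
        return rows;
    }
}
```

Take a 1,000-row table with keys `d000` to `d999` and the ranges `[d010, d020)` and `[d980, d990)`. The single scan reads 980 rows to return 20, while the two targeted scans read exactly 20.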
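Slide 27 says only that tremor is "detectable using signal processing techniques". One common approach is spectral band power, since Parkinsonian rest tremor typically lies around 4 to 6 Hz. The sketch below computes the fraction of a window's non-DC power that falls in a tremor band, using a naive DFT.

This is not the team's algorithm. The following are hypothetical:
- The 3.5 to 7.5 Hz band.
- The 0.6 threshold.
- The 50 Hz sample rate used in the example.
- The class name `TremorDetector`.

```java
/** Hypothetical band-power tremor check over one window of accelerometer samples. */
class TremorDetector {
    /** Power |X[k]|^2 for DFT bins k = 0..n/2 (naive O(n^2) DFT, fine for short windows). */
    static double[] powerSpectrum(double[] x) {
        int n = x.length;
        double[] p = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double a = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(a);
                im -= x[t] * Math.sin(a);
            }
            p[k] = re * re + im * im;
        }
        return p;
    }

    /** Fraction of non-DC power whose bin frequency lies within [loHz, hiHz]. */
    static double bandFraction(double[] x, double sampleRateHz, double loHz, double hiHz) {
        double[] p = powerSpectrum(x);
        double band = 0, total = 0;
        for (int k = 1; k < p.length; k++) {          // k = 0 is the DC / gravity offset
            double f = k * sampleRateHz / x.length;   // frequency of bin k
            total += p[k];
            if (f >= loHz && f <= hiHz) band += p[k];
        }
        return total == 0 ? 0 : band / total;
    }

    /** Hypothetical rule: flag tremor when most of the energy sits in the 3.5 to 7.5 Hz band. */
    static boolean looksLikeTremor(double[] x, double sampleRateHz) {
        return bandFraction(x, sampleRateHz, 3.5, 7.5) > 0.6;
    }
}
```

For example, take a 2-second window sampled at 50 Hz. A 5 Hz oscillation is flagged, while a 1.5 Hz, walking-like oscillation is not. A production version would use an FFT and windowing, and would tune the band and threshold on labeled data.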

Editor's Notes


  • On Wednesday, August 13, Intel and the Michael J. Fox Foundation (MJFF) announced a collaboration to improve research and treatment of Parkinson’s disease using wearable computing and big data analytics. The collaboration includes a multi-stage study using wearable devices to collect patient data and an Intel-built big data analytics platform to analyze the vast pools of data with the goal of developing objective measures for Parkinson’s disease progression, treatment response and drug development.
    Key Messages:
    Intel and the Michael J. Fox Foundation are joining forces to use wearable computing and big data analytics to help improve Parkinson’s disease research and treatments.
    The collaboration includes a multi-stage study to validate the use of wearable devices and big data analytics to track disease symptoms and develop objective measures for disease progression.
    The big data analytics platform combines Intel’s hardware and software technologies to provide a powerful cloud-based platform to collect, transform, store, and visualize data from sensors.



  • The story
    The man in the picture on the left is Andy Grove, one of Intel's founders, who has Parkinson's disease (PD)
    The story begins when he reads an article in the NY Times about Big Data and decides to start a project within Intel related to PD and Big Data
    He contacts the Michael J. Fox Foundation, and together they decide to start a joint effort
    The idea is to leverage the Internet of Things, wearable technology and big data platforms to assist PD research
  • Disclaimer: I'm not a neurologist, and do not intend to provide an extensive introduction to the disease

    Parkinson's Disease (PD)
    Parkinson's disease is a degenerative disorder of the central nervous system that is characterized by serious motor disabilities, such as shaking, rigidity and slowness of movement
    It is also characterized by complicated non-motor symptoms, such as poor sleep quality, depression and a tendency toward compulsive behavior
    There are ~6M Parkinson's patients: about 1M in the US and about 5M in the rest of the world
    1 out of 100 people over the age of 60 in the US has Parkinson's, and in the US alone ~60 thousand new patients are diagnosed every year
    The life expectancy of Parkinson's patients is usually between 10 and 15 years
    There is no cure for the disease, and existing medications mainly improve patients' quality of life by helping with symptoms

    The disease progresses slowly, meaning that changes in a patient's condition and disease progression can be observed only over the course of months or years, making management and research of the disease difficult
    Parkinson's diagnosis and progression are assessed subjectively by physicians, and there is no standard test or progression marker




    Parkinson’s is a complex disease with symptoms and treatment responses that vary widely. The disease progresses slowly, meaning that changes in clinical and molecular features can be observed only over the course of months or years, making management of and research into Parkinson’s disease difficult. Today, the diagnosis of Parkinson’s, assessment of disease progression, and clinical trials for treatments and medications have largely relied on periodic clinical assessments by a physician and on patient reports. The advent of wearable computing and big data analytics could dramatically enhance our understanding of Parkinson’s disease by enabling scientists and physicians to gather vital data continuously and unobtrusively, without putting a burden on patients, and in significantly larger populations than in traditional clinical trials.


  • So… why do we really need this solution? I'll try to describe some of the challenges we're addressing
    One of the main challenges is the lack of objective measures, both for patients and physicians
    Today, patients are monitored only during occasional clinical visits, usually every 3 to 6 months
    In those visits, mainly due to stress, patients behave differently, and their self-reported daily data is subjective
    In addition, as I mentioned on the previous slide, assessment of disease progression is also subjective and highly dependent on the physician's observations during patient visits

    An additional challenge is related to clinical trials
    Today, only a small amount of data is available to the research community
    Collecting meaningful amounts of good and reliable data is not trivial

    Only small amounts of data are available to the research community:
    One of the main reasons is that clinical trials cost millions of dollars and take extensive time and effort to arrange and complete
    Today, a very small number of patients contribute to research, resulting in small trials: the average trial size is less than 100 patients
    In addition, collecting fine-grained, high-quality data cannot scale due to the technology limitations of trials






    Need to handcraft the medication regime
    Levodopa's positive effect progressively declines, and some patients suffer from dyskinesia
    No biomarkers (diagnosis is hard: PD ≠ Parkinsonism)
  • Actually, the theoretical solution is very simple: we just ask patients to follow 2 very simple steps…
    Wear a watch and start a cell phone application

    If we want patients to follow those two simple steps, we must make sure that the value our solution provides is greater than the burden on patients
    That is our mission definition
    In our solution we'll address the main challenges I reviewed on the previous slides by:
    Continuously collecting movement data and objective measurements 24 hours a day, 7 days a week, 365 days a year
    Providing value to patients by giving them real-time insights into their disease and condition (in the form of activity level, tremor detection and sleep quality indicators)
  • So far, so good… we have provided value to patients and collected objective measurements, but what's next? Who will use this? What are the main use cases?
    After the data is collected, advanced analytics algorithms are applied to it (I'll elaborate on those later on) and it is saved into the Intel Big Data Cloud platform
    Using this data, we're providing:
    Researchers with access to free, reliable data on thousands of patients (LDopa)
    Clinicians with an accurate report on their patients' condition since their last visit (RealPD)
    Pharmaceutical companies with the capability to measure their medication's effectiveness during its test phase ()
  • Either use real demo or show the next 2 slides
  • As mentioned before, the application is based on sensor data
  • The solution is based on a self-developed, generic Internet of Things platform
    The platform allows "Things", which can be practically any type of device with some kind of internet connectivity (direct or through gateways), to send data to the platform
    Data can be transmitted using different protocols and can be transformed in transit or after landing in the cloud
    The entire code stack for this platform is based on open source, with the Hadoop ecosystem at its core
    The platform is cloud based and offers application developers tools to develop their own applications on top of it. The key tools are:
    Data storage based on Hadoop and HBase
    Analytics platform, which allows both batch and stream analytics development
    Built in analytics features such as near real time rule engine and change detection engines
    And data extraction tools such as an export service

  • The Parkinson's disease solution was developed on top of the generic IoT platform I described a moment ago
    I'll quickly review the different layers and will dive into a few of them later on
    The computing services are:
    Batch Layer based on Spark
    Storage layer using Hadoop, HBase & MySQL for Metadata
    Powerful, scalable ingestion layer based on Akka & Kafka
    A dynamic stream analytics layer based on Akka actor system framework
    Scalable Service layer providing set of APIs for registration & data extraction out of the platform
    UI layer: the only layer in this diagram that is unique to the PD solution, using a Pebble watch and an Android application to collect data and interact with patients
    Note that 5 of the 6 layers presented (all except the UI layer) are part of the IoT platform and can be reused for similar products and verticals
  • Need to redo the whole slide
  • Need to redo the whole slide
  • Need to redo the whole slide
  • This is the activity level of the same subject in two successive clinic visits, in the ON state and in the OFF state (the two sessions were recorded on different days). Although the patient repeated the same (or at least a highly similar) protocol in both visits, we can see that his activity level while OFF is around half of his activity level while ON. We checked, and this result is also seen when comparing particular activities.
  • Need to redo the whole slide
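Slide 25 describes the activity-level measure: accelerometer intensity that filters out tremor and is personalized against the patient's own average during walking periods. The ON/OFF comparison in the notes is expressed in that relative unit. The sketch below is one way to realize those three ingredients, not the team's implementation.

Assumptions, all hypothetical and not given in the source:
- Readings are in m/s², and intensity is measured as deviation of the acceleration magnitude from gravity.
- Tremor is suppressed with a moving average, whose response is zero at multiples of `fs / w`. At 50 Hz with `w = 10`, that nulls a 5 Hz oscillation.
- The class name `ActivityLevel`.

```java
/** Hypothetical activity-level sketch; filter, window width and units are assumptions. */
class ActivityLevel {
    static final double GRAVITY = 9.81; // m/s^2, assuming readings are in m/s^2

    /** Acceleration magnitude of one 3-axis reading. */
    static double magnitude(double x, double y, double z) {
        return Math.sqrt(x * x + y * y + z * z);
    }

    /** Moving average of width w; its frequency response is zero at multiples of fs / w. */
    static double[] movingAverage(double[] s, int w) {
        if (w <= 0 || s.length < w) throw new IllegalArgumentException("need at least w samples");
        double[] out = new double[s.length - w + 1];
        double sum = 0;
        for (int i = 0; i < s.length; i++) {
            sum += s[i];
            if (i >= w) sum -= s[i - w];                  // drop the sample leaving the window
            if (i >= w - 1) out[i - w + 1] = sum / w;
        }
        return out;
    }

    /** Intensity: mean absolute deviation of the smoothed magnitude from gravity. */
    static double intensity(double[] magnitudes, int w) {
        double[] smooth = movingAverage(magnitudes, w);
        double total = 0;
        for (double v : smooth) total += Math.abs(v - GRAVITY);
        return total / smooth.length;
    }

    /** Personalized level: intensity relative to the patient's own average walking intensity. */
    static double level(double intensity, double walkingBaseline) {
        return walkingBaseline <= 0 ? 0 : intensity / walkingBaseline;
    }
}
```

Under these assumptions, a pure 5 Hz tremor oscillation contributes essentially zero intensity, while a 1 Hz walking-like movement passes through the filter. The level is then read as a fraction of the patient's own walking baseline, the relative unit in which the OFF session above comes out at around half of the ON session.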