
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Disease Research

In this session, you will learn about a solution developed in partnership between Intel and the Michael J. Fox Foundation to enable breakthroughs in Parkinson's disease (PD) research by using wearable sensors and smartphones to monitor PD patients' motor movements 24/7. We'll elaborate on how we're using HBase for time-series data storage and integrating it with various stream, batch, and interactive technologies. We'll also review our efforts to create an interactive querying solution over HBase.

Published in: Software

  1. 1. Enable breakthroughs in Parkinson's disease research through wearables and Big Data analytics technologies
  2. 2. About us… • Part of the Big Data Analytics Solutions group @Intel, reporting to the Data Center Group • Developing products & solutions leveraging: • Big Data edge technologies • Self-developed machine learning & stream analytics algorithms • Our team includes developers, data scientists and system analysts • I am a Big Data Analytics Architect and Development Manager responsible for leading-edge technology projects within Intel involving Big Data and stream analytics solutions in the Internet of Things and Parkinson's disease research
  3. 3. How It All Started? • Big data analytics • IoT
  4. 4. Parkinson's Disease • 1/100 over the age of 60 • 60,000 new • 1M/US • 5M/world • No cure, medication only helps with symptoms • There is no test and no progression marker • Parkinson's disease is caused by the death of dopamine cells. 60 to 80% of these cells are already lost by the time motor symptoms appear.
  5. 5. Challenges To Address • No objective measure • 3-6 months between physician visits • Changes are slow and hard to detect • Average trial size < 100 patients • Very small number of patients contribute to research • Cost of trials is on the scale of $M
  6. 6. HOW?
  7. 7. The Solution • 1. Wear a watch • 2. Start an application
  8. 8. Use Cases • MANAGE THE DISEASE USING DATA • FREE DATA FOR 1000’S OF PATIENTS • ACCURATE REPORT SINCE LAST VISIT • MEASURE MEDICATION EFFECT • RESEARCHER • PHARMACEUTICAL • CLINICIAN • INTEL BIG DATA CLOUD ANALYTICS • INSIGHT / VALUE
  9. 9. DEMO
  10. 10. THE APPLICATION
  11. 11. PATIENT REPORTED: Medication reporting, Medication reminder, Report something • OTHER: Configurable data collections, Contribution score, Integrated login and registration, Pebble notifications • OBJECTIVE MEASURES: Gait, Sleep, Tremor, Activity Level, Controlled Tests
  12. 12. BIG-DATA and IOT TECHNOLOGIES
  13. 13. IoT Cloud Simplified Framework • Datacenter • Network • Thing • Cloud Infrastructure • Data Platform • Analytics Platform • UI Services • Gateway
  14. 14. CLOUD COMPUTING SERVICES • SERVICE LAYER • BATCH LAYER • STREAM ANALYTICS LAYER • INGESTION LAYER • STORAGE LAYER • USER INTERFACE LAYER • Mosquitto
  15. 15. Storage Layer • Cloudera Enterprise Data Hub • HBase as the main scalable time-series data storage layer • Allows high write throughput • Random real-time access to stored data • Highly available MySQL as metadata storage
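The slides do not show the table schema. As a rough illustration of why a sorted key-value store suits time-series data, the sketch below builds a composite row key from a device id and a timestamp so that one device's readings sit next to each other in time order. The key layout and the names `make_row_key` and `scan_range` are assumptions made for this example, not the team's actual design.

```python
import struct


def make_row_key(device_id: str, ts_millis: int) -> bytes:
    """Hypothetical key layout: device id, a zero-byte separator, then the
    timestamp as an 8-byte big-endian unsigned integer.

    Big-endian packing makes byte-wise ordering match numeric ordering, so
    rows for one device are stored in chronological order.
    """
    return device_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_millis)


def scan_range(device_id: str, start_ms: int, stop_ms: int):
    """Return (start_key, stop_key) covering [start_ms, stop_ms) for one device."""
    return make_row_key(device_id, start_ms), make_row_key(device_id, stop_ms)
```

With a layout like this, reading one device's data for a time window becomes a single contiguous range scan between the two keys returned by `scan_range`.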
  16. 16. Data Ingestion Layer • Multi-protocol pipeline built over Akka & Kafka • Kafka is a fast, scalable, durable & distributed messaging system • Akka is an actor-based framework for highly concurrent, distributed and resilient applications based on events / messaging • This layer is responsible for: • Pulling messages • Parsing & processing • Concurrent & controlled writes • Diagram labels: Device, Load Balancer, Mosquitto, HBase
  17. 17. Stream Analytics Layer • Based on the Akka actor framework • Contains millions of concurrent actors handling different streams and operations • Each actor is a small piece of code performing its role • A set of actors creates a topology which is responsible for a device’s data stream processing • Diagram labels: Subscriber, Parser, Aggregator, HBase Writer, Analytics Manager, Change Detection, UnZip, Real Time Rules, Sleep Quality
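The idea of a per-device topology of small actors can be sketched in a few lines. The example below is a single-threaded stand-in written in plain Python, not Akka: each `Actor` owns a mailbox and a handler, and forwards results downstream. The three-stage chain (parser, aggregator, writer) borrows names from the slide's diagram, but the behavior of each stage here is an assumption chosen only to keep the example small.

```python
from collections import deque


class Actor:
    """Minimal actor: a mailbox plus a handler that may emit one result."""

    def __init__(self, handler, downstream=None):
        self.mailbox = deque()
        self.handler = handler
        self.downstream = downstream

    def tell(self, message):
        self.mailbox.append(message)

    def step(self):
        # Drain the mailbox, forwarding every non-None result downstream.
        while self.mailbox:
            result = self.handler(self.mailbox.popleft())
            if result is not None and self.downstream is not None:
                self.downstream.tell(result)


def make_aggregator(window):
    """Hypothetical aggregator: emits the mean of every `window` values."""
    buffer = []

    def handle(value):
        buffer.append(value)
        if len(buffer) == window:
            mean = sum(buffer) / window
            buffer.clear()
            return mean
        return None

    return handle


def build_topology(sink, window=2):
    """Wire parser -> aggregator -> writer for one device stream."""
    writer = Actor(sink)
    aggregator = Actor(make_aggregator(window), writer)
    parser = Actor(float, aggregator)
    return [parser, aggregator, writer]


def run(topology):
    # Process actors in pipeline order so messages flow end to end.
    for actor in topology:
        actor.step()
```

A short usage example: `stored = []`, then `topology = build_topology(stored.append)`, send raw strings with `topology[0].tell("1.0")`, and call `run(topology)`. A real actor system would schedule actors concurrently and supervise failures, which this sketch leaves out.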
  18. 18. Batch Analytics Layer • Based on Apache Spark over HBase • Spark is a fast and general engine for large-scale data processing • Algorithms & calculations are executed on large data sets on a daily basis • Layer includes: • Set of complex machine learning algorithms • Rule engine rules baseline calculations
  19. 19. Service Layer • Interactive and scalable web services layer • A set of RESTful APIs allowing: • Registration to platform • Raw & calculated data retrieval from HBase • Built on top of Play framework and providing a secured entry point • Uses Apache Phoenix & native HBase client • Diagram labels: HBase, Load Balancer
  20. 20. HBase Challenges
  21. 21. Data Ingestion to HBase • Challenge: Concurrently ingesting millions of messages into HBase creates a massive load on HBase region servers and causes disconnections • Development Evolution: 1. HBase client per topology (millions of writers) 2. Pool of HBase clients, each using a separate HTable 3. Pool of HBase clients, all using the same HBase connection pool (HConnectionManager) • Solution: Creating a “fixed” number of connections to HBase, allowing batch writes and load balancing • Diagram labels: Router, Pool, HBase Writer
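The "fixed pool behind a router" idea can be illustrated without any HBase dependency. In the sketch below, a `PoolRouter` spreads records round-robin over a fixed number of `BatchWriter` threads, and each writer flushes in batches to a `sink` callable that stands in for a batched HBase write. The class names, the round-robin policy, and the batch size are assumptions for illustration; the slides only state that a fixed number of connections, batch writes, and load balancing were used.

```python
import itertools
import queue
import threading


class BatchWriter(threading.Thread):
    """One writer with its own inbox; flushes records to `sink` in batches."""

    def __init__(self, sink, batch_size):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.sink = sink
        self.batch_size = batch_size

    def run(self):
        batch = []
        while True:
            item = self.inbox.get()
            if item is None:  # shutdown signal
                break
            batch.append(item)
            if len(batch) >= self.batch_size:
                self.sink(batch)
                batch = []
        if batch:  # flush whatever is left on shutdown
            self.sink(batch)


class PoolRouter:
    """Routes records round-robin to a fixed number of writers."""

    def __init__(self, sink, writers=4, batch_size=100):
        self.writers = [BatchWriter(sink, batch_size) for _ in range(writers)]
        self._cycle = itertools.cycle(self.writers)
        self._lock = threading.Lock()
        for writer in self.writers:
            writer.start()

    def write(self, record):
        with self._lock:
            writer = next(self._cycle)
        writer.inbox.put(record)

    def close(self):
        for writer in self.writers:
            writer.inbox.put(None)
        for writer in self.writers:
            writer.join()
```

However many producers call `write`, the number of downstream connections stays equal to the pool size, which is the property the slide's solution relies on.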
  22. 22. Table Indicators over Large Tables • Challenge: Gathering indicators (e.g. counts) on large HBase tables results in long table scans and reduced performance • Solution • Update new indicator columns in real time using incrementColumnValue • Allows atomic increment of a specific column • Large table counts successfully implemented • Allowed implementation of required indicators • Real-time hourly counts • Real-time max values (e.g. last time a user transmitted data)
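To make the counter approach concrete, here is a small in-memory model of maintaining indicators at write time instead of scanning the table later. `IndicatorTable` is a stand-in, not the HBase client: its `increment_column_value` method only mimics the atomic-increment semantics the slide refers to. The column naming (`count:<hour>`) and the way the "last seen" maximum is kept are assumptions for this example.

```python
import threading
from collections import defaultdict
from datetime import datetime, timezone


class IndicatorTable:
    """In-memory stand-in for a table of per-user indicator columns."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)
        self._last_seen = {}

    def increment_column_value(self, row, column, amount=1):
        # Atomic read-modify-write on one cell; returns the new value.
        with self._lock:
            self._counters[(row, column)] += amount
            return self._counters[(row, column)]

    def record_message(self, user_id, ts_millis):
        # Bucket the message into a UTC hour, e.g. "1970010100".
        hour = datetime.fromtimestamp(
            ts_millis / 1000, tz=timezone.utc
        ).strftime("%Y%m%d%H")
        self.increment_column_value(user_id, "count:" + hour)
        with self._lock:
            previous = self._last_seen.get(user_id, ts_millis)
            self._last_seen[user_id] = max(previous, ts_millis)

    def hourly_count(self, user_id, hour):
        with self._lock:
            return self._counters.get((user_id, "count:" + hour), 0)

    def last_seen(self, user_id):
        with self._lock:
            return self._last_seen.get(user_id)
```

Reading an hourly count is then a single cell lookup, regardless of how many raw rows were written during that hour.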
  23. 23. Batch Processing Input Format • Challenge • Batch processing is done using Spark; an InputFormat is required for scans • TableInputFormat was used and is equivalent to a single scan • Poor performance when data from “remote” parts of a table is required • Solution • Using MultiTableInputFormat • Allows usage of multiple scans • Successfully used with more than 100 scans per MultiTableInputFormat
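The cost difference between one wide scan and several narrow ones is easy to see on a toy sorted key list. The sketch below models a table as a sorted list of row keys and compares a single scan spanning two distant regions with two targeted scans. It does not use Spark or HBase; the key format `pNN#DDD` (patient, day) and the helper names are invented for the example.

```python
from bisect import bisect_left


def scan(sorted_keys, start, stop):
    """Return all keys in [start, stop), like one range scan over a sorted table."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def multi_scan(sorted_keys, ranges):
    """Run several narrow scans and concatenate their results."""
    rows = []
    for start, stop in ranges:
        rows.extend(scan(sorted_keys, start, stop))
    return rows


# Toy table: 5 patients x 100 days, keys already in sorted order.
KEYS = [f"p{p:02d}#{d:03d}" for p in range(5) for d in range(100)]

# Days 10-19 for the first and last patient only.
WANTED = [("p00#010", "p00#020"), ("p04#010", "p04#020")]

single = scan(KEYS, WANTED[0][0], WANTED[-1][1])  # reads everything in between
targeted = multi_scan(KEYS, WANTED)               # reads only the wanted rows
```

Here the single scan touches 410 rows to return the 20 that are needed, while the two targeted scans read exactly those 20.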
  24. 24. ANALYTICS
  25. 25. Activity Level • Measure that continuously describes the intensity of the patient’s activity throughout the day • Motivates the patients to be more active (known to be important for PD patients) • Personalized measure per patient based on their average activity during walking periods (avoids frustration) • Based on intensity measurement from the accelerometer • Filters out tremor as
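The slides do not give the formula behind the activity level. As one possible reading of "intensity from the accelerometer, personalized by the patient's walking average", the sketch below computes the mean absolute deviation of acceleration magnitude over a window and divides it by a per-patient walking baseline. Both the intensity definition and the normalization are assumptions for illustration, and tremor filtering is not modeled.

```python
import math


def intensity(samples):
    """Mean absolute deviation of acceleration magnitude over one window.

    `samples` is a non-empty list of (x, y, z) accelerometer readings.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    mean = sum(magnitudes) / len(magnitudes)
    return sum(abs(m - mean) for m in magnitudes) / len(magnitudes)


def activity_level(window, walking_baseline):
    """Window intensity relative to the patient's own walking intensity.

    `walking_baseline` is a hypothetical per-patient value: the average
    intensity measured during that patient's walking periods.
    """
    if walking_baseline <= 0:
        raise ValueError("walking_baseline must be positive")
    return intensity(window) / walking_baseline
```

Under this definition a value near 1.0 means the patient is about as active as during their own typical walking, so two patients with different mobility are each scored against their own baseline.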
  26. 26. Activity Level – An Example • Activity Level in Controlled Session (ON State) • Activity Level in Controlled Session (OFF State)
  27. 27. Tremor • Tremor is one of the most obvious symptoms of PD • Most PD patients experience tremor • Tremor is detectable using signal processing techniques
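The slides do not say which signal processing technique is used. One simple possibility is to measure how much of a signal's power falls in a tremor frequency band. The sketch below does this with a naive discrete Fourier transform from the standard library. The 4 to 6 Hz band is a commonly cited range for parkinsonian rest tremor and is used here as an assumption, as are the function names and the idea of thresholding the ratio.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Naive DFT power spectrum as a list of (frequency_hz, power) pairs.

    The mean is removed first so a constant offset (e.g. gravity) does not
    dominate. Runs in O(n^2), which is fine for short windows.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(centered)
        )
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    return spectrum


def tremor_ratio(signal, fs, low_hz=4.0, high_hz=6.0):
    """Fraction of signal power inside [low_hz, high_hz]."""
    spectrum = power_spectrum(signal, fs)
    total = sum(power for _, power in spectrum)
    if total == 0:
        return 0.0
    in_band = sum(power for freq, power in spectrum if low_hz <= freq <= high_hz)
    return in_band / total
```

For example, a 4-second window sampled at 50 Hz that contains a pure 5 Hz oscillation yields a ratio close to 1, while a 1 Hz arm swing yields a ratio close to 0. A production detector would likely use an FFT, windowing, and validation against clinical labels.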
  28. 28. TRIALS AND PARTNERS
  29. 29. Trial And Partners • REAL PD L-DOPA RESPONSE TRIAL DATA GATHERING TRIAL FOX INTEL APPLICATION TRIAL 1000 50 30 20 FOX INSIGHT WEAR 1000 20 20 30 20 10 SCRIPS TREMOR TRIAL 1000
  30. 30. WHAT’S NEXT?
  31. 31. SCALE PLATFORM • Scale to 1000’s of patients in the US • Scale to 1000’s of patients in the Netherlands • iOS support • Support additional wearables • Build more value-generating capabilities • Upgrade to HBase 1.0 • Upgrade Spark to 1.3 • Enrich Platform (e.g. Advanced Export, Reporting) • Enrich Parkinson's disease solution • Analytics • Value to patients
  32. 32. Q&A
  33. 33. Thank you!