Applying Machine Learning to Live Patient Data

Applying Machine Learning to Live Patient Data using Apache Spark

Published in: Software

Applying Machine Learning to Live Patient Data

  1. Applying Machine Learning to Live Patient Data. Carol McDonald (@caroljmcdonald) & Joseph Blue (@joebluems), March 15, 2017. © 2017 MapR Technologies, MapR Confidential
  2. Data-Driven Experience
  3. The Promise of Big Data in Healthcare: SMARTER, BIGGER, FASTER
  4. "Life moves pretty fast. If you don't stop and look around once in a while, you could miss it." Ferris Bueller, Fictional High School Student
  5. Reading an EKG. P wave: atrial depolarization; QRS complex: ventricular depolarization; T wave: ventricular repolarization
  6. Windowing the EKG for Clustering (window length = 32, step size = 2)
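The windowing on slide 6 can be sketched in plain Scala, without Spark. This is a minimal illustration, not the presenters' code: the object name `EkgWindows`, the function `slidingWindows`, and the choice to drop a trailing partial window are all assumptions.

```scala
object EkgWindows {
  /** Cut a signal into fixed-length windows, advancing `step` samples each time.
    * A trailing partial window is dropped (an assumption; the slide does not say). */
  def slidingWindows(signal: Vector[Double], length: Int, step: Int): Vector[Vector[Double]] = {
    require(length > 0 && step > 0, "length and step must be positive")
    if (signal.length < length) Vector.empty
    else (0 to signal.length - length by step).map(i => signal.slice(i, i + length)).toVector
  }
}
```

With the slide's parameters, `EkgWindows.slidingWindows(signal, 32, 2)` starts a new 32-sample window every 2 samples, so neighbouring windows overlap heavily and the clustering step sees each heartbeat shape at many offsets.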
  7. Displaying Centroids (showing 25 of K=400 centroids). Begin reconstruction
  8. Reconstructing the Signal (window length = 32, step size = 16). [Figure: overlapping windows 1 and 2 are added together]
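The overlap-and-add step on slide 8 can be sketched as follows. The slide does not show how overlapping windows are weighted; this sketch assumes a sin² taper, chosen because with step = length / 2 the two overlapping weights sum to 1. The names `OverlapAdd`, `taper`, `encode`, and `reconstruct` are hypothetical. In the pipeline the talk describes, each tapered window would presumably be replaced by its nearest centroid before being added back; here the windows are added back unchanged, to show only the overlap-add arithmetic.

```scala
object OverlapAdd {
  /** sin^2 taper (assumed); with step = length / 2 the overlapping weights sum to 1. */
  def taper(length: Int): Vector[Double] =
    Vector.tabulate(length) { i =>
      val s = math.sin(math.Pi * (i + 0.5) / length)
      s * s
    }

  /** Tapered windows taken every `step` samples (a trailing partial window is dropped). */
  def encode(signal: Vector[Double], length: Int, step: Int): Vector[Vector[Double]] = {
    val w = taper(length)
    (0 to signal.length - length by step).toVector.map { start =>
      Vector.tabulate(length)(i => signal(start + i) * w(i))
    }
  }

  /** Add each window back at its offset; samples covered by two windows are summed. */
  def reconstruct(windows: Vector[Vector[Double]], step: Int, totalLength: Int): Vector[Double] = {
    val out = Array.fill(totalLength)(0.0)
    for ((win, k) <- windows.zipWithIndex; (v, i) <- win.zipWithIndex) {
      val pos = k * step + i
      if (pos < totalLength) out(pos) += v
    }
    out.toVector
  }
}
```

Samples covered by two windows are recovered exactly; the first and last half-window are covered only once and stay attenuated by the taper.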
  9. Diagnosing the Anomalies: residuals
  10. Putting it all together… [Diagram labels: input, encoder, shape catalog, reconstruct, error, t-digest quantile estimator]
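Slides 9 and 10 describe scoring anomalies from the reconstruction error and its quantiles. A minimal sketch of that idea follows, with one deliberate substitution: the talk uses t-digest as a streaming quantile estimator, while this sketch sorts the residuals and takes an exact nearest-rank quantile, which only suits small in-memory samples. All names (`ResidualAnomalies`, `residuals`, `quantile`, `anomalies`) and the use of absolute error are assumptions.

```scala
object ResidualAnomalies {
  /** Per-sample absolute reconstruction error (assumed error measure). */
  def residuals(signal: Vector[Double], reconstructed: Vector[Double]): Vector[Double] = {
    require(signal.length == reconstructed.length, "signals must have equal length")
    signal.zip(reconstructed).map { case (s, r) => math.abs(s - r) }
  }

  /** Exact nearest-rank quantile of a non-empty sample, q in [0, 1].
    * Stand-in for the t-digest estimator named on the slide. */
  def quantile(values: Vector[Double], q: Double): Double = {
    require(values.nonEmpty && q >= 0.0 && q <= 1.0, "need a non-empty sample and q in [0, 1]")
    val sorted = values.sorted
    val rank = math.ceil(q * sorted.length).toInt
    sorted(math.max(rank, 1) - 1)
  }

  /** Indices whose residual exceeds the chosen quantile of all residuals. */
  def anomalies(res: Vector[Double], q: Double): Vector[Int] = {
    val threshold = quantile(res, q)
    res.zipWithIndex.collect { case (r, i) if r > threshold => i }
  }
}
```

A streaming job would keep a running quantile sketch instead of re-sorting; the thresholding logic stays the same.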
  11. Use Case Architecture
  12. Lots of things are producing Streaming Data: Data Collection Devices, Smart Machinery, Phones and Tablets, Home Automation, RFID Systems, Digital Signage, Security Systems, Medical Devices
  13. Streams capture unbounded sequences of events. Events are delivered in the order they are received, like a queue. [Diagram: producers write through the Kafka API to the topic Admission, split into Partitions 1 to 3 on Servers 1 to 3 of the MapR cluster; each partition holds numbered messages from old to new; consumers read through the Kafka API]
  14. Stream Topics Organize Events into Categories. Unlike a queue, messages are not deleted when read, which allows the same event to be processed for different views. [Diagram labels: Producers, Kafka API, MapR-FS, Consumers]
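The difference from a queue on slides 13 and 14 (ordered delivery, but no deletion on read) can be illustrated with a toy in-memory log. This is not the MapR Streams or Kafka API; `ToyTopic`, `publish`, and `poll` are hypothetical names, and the sketch models a single partition only.

```scala
import scala.collection.mutable

/** Toy single-partition topic: an append-only log; reading never deletes events. */
final class ToyTopic[A] {
  private val log = mutable.ArrayBuffer.empty[A]
  // Each consumer tracks its own read position, starting at offset 0.
  private val offsets = mutable.Map.empty[String, Int].withDefaultValue(0)

  /** Append an event and return its offset. */
  def publish(event: A): Int = { log += event; log.length - 1 }

  /** Events this consumer has not yet seen, in publish order; advances only its own offset. */
  def poll(consumer: String): Vector[A] = {
    val from = offsets(consumer)
    offsets(consumer) = log.length
    log.slice(from, log.length).toVector
  }
}
```

Because each consumer only moves its own offset, a dashboard and an archiver can both read every event, which is the "different views" point made on slide 14.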
  15. Predictive Analytics. [Diagram labels: Historical Data, Featurization, Model Building, Machine Learning Algorithms, Test Model Predictions, Model Evaluation, Predictive Model, New Data, Stream Topic, Model scoring, Predictions]
  16. Stream Processing Architecture. [Diagram labels: Data Sources (Devices, Images, HL7, Social Media, lab), Collect Data, Stream Topic, Stream Processing (Feature extraction, Derive features, process), Batch Processing (build model, update model), Machine-learning Models, Serve Data, Stream Topic]
  17. Build Model
      import org.apache.spark.mllib.clustering.KMeans
      import org.apache.spark.mllib.linalg.Vectors

      // put data in a vector (fields are tab-separated)
      val vrdd = rdd.map(line => Vectors.dense(line.split('\t').map(_.toDouble)))
      // window and normalize each record ....
      // call KMeans (k = 300, maxIterations = 10), which returns the model
      val model = KMeans.train(processed, 300, 10)
      model.save(sc, "/user/user01/data/anomaly-detection-master")
  18. Use the Model with Streaming Data
  19. Use Case: Real Time Anomaly Detection. Spark Streaming reads EKG data (e.g. 17.9200 12.8000 38.4000) from a stream topic, enriches it with the cluster and normalized data, and writes it to a second stream topic for real-time monitoring. Example enriched message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
  20. Create a DStream. DStream: a sequence of RDDs representing a stream of data. Each batch of the stream topic (time 0 to 1, 1 to 2, 2 to 3) is stored in memory as an RDD.
      val model = KMeansModel.load(ssc.sparkContext, modelpath)
      val messagesDStream = KafkaUtils.createDirectStream[String, String](
        ssc, LocationStrategies.PreferConsistent, consumerStrategy
      )
  21. Process DStream
      // get message values from key,value
      val valuesDStream: DStream[String] = messagesDStream.map(_.value())
      valuesDStream.foreachRDD { rdd =>
        val producer = KafkaProducerFactory.getOrCreateProducer(conf)
        ....
          // enrich message with model
          val cluster = model.predict(processed)
          ....
          val record = new ProducerRecord(topicp, "key", message)
          // send enriched message
          producer.send(record)
        }
      }
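The `model.predict(processed)` call on slide 21 returns the index of the cluster whose centre is closest to the input vector. A minimal plain-Scala sketch of that lookup, assuming squared Euclidean distance and using hypothetical names (`NearestCentroid`, `predict`), might look like this; it does not reproduce MLlib's implementation.

```scala
object NearestCentroid {
  /** Squared Euclidean distance between two equal-length vectors. */
  private def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Index of the centroid closest to `point`. */
  def predict(centroids: Vector[Vector[Double]], point: Vector[Double]): Int = {
    require(centroids.nonEmpty, "need at least one centroid")
    centroids.indices.minBy(i => squaredDistance(centroids(i), point))
  }
}
```

The returned index plays the role of the cluster id that the streaming job attaches to each message, such as the "c" field in the example JSON.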
  22. Process DStream. [Diagram: a map is applied to each RDD of the dStream (batch times 0 to 1, 1 to 2, 2 to 3) read from the stream topic, producing the transformed RDDs of the ValueDStream]
  23. Use Case: Real Time Anomaly Detection. Spark Streaming enriches the data with the cluster and normalized data and writes it to a stream topic; the Vert.x event bus framework reads the topic and forwards messages over its HTTP / WebSocket event bus for real-time monitoring. Example message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
  25. Resources
      • EKG basics - http://en.wikipedia.org/wiki/Electrocardiography
      • Source data - http://physionet.org/physiobank/database/apnea-ecg/
      • K-Means basics - http://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
      • Code repositories
        – Streaming: http://github.com/caroljmcdonald/sparkml-streaming-ekg
        – UI: http://github.com/caroljmcdonald/mapr-streams-vertx-dashboard
      • t-digest for anomalies - http://github.com/tdunning/t-digest
  26. E-book available courtesy of MapR: A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman (published by O'Reilly), https://www.mapr.com/practical-machine-learning-new-look-anomaly-detection
  27. MapR Blog: mapr.com/blog
  28. Q&A. Engage with us! @mapr, mapr-technologies. Carol McDonald (@caroljmcdonald), Joseph Blue (@joebluems)
