© 2017 MapR Technologies (MapR Confidential)
Applying Machine Learning to
Live Patient Data
Carol McDonald (@caroljmcdonald) & Joseph Blue (@joebluems)
March 15, 2017
Data-Driven Experience
The Promise of Big Data in Healthcare
SMARTER, BIGGER, FASTER
"Life moves pretty fast. If you don't stop and look around once in a while, you could miss it."
Ferris Bueller, Fictional High School Student
Reading an EKG
P wave: atrial depolarization
QRS complex: ventricular depolarization
T wave: ventricular repolarization
Windowing the EKG for Clustering
window length = 32, step size = 2
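A minimal sketch of the windowing step, assuming the EKG signal is already loaded as an array of samples. The names `EkgWindows` and `slidingWindows` are illustrative and do not come from the talk's code.

```scala
object EkgWindows {
  // Cut the signal into fixed-length windows, advancing `step` samples each time.
  // Trailing windows shorter than `length` are dropped.
  def slidingWindows(signal: Array[Double], length: Int, step: Int): Vector[Array[Double]] =
    signal.sliding(length, step).filter(_.length == length).toVector
}

object EkgWindowsDemo extends App {
  // Hypothetical 40-sample signal: 0.0, 1.0, ..., 39.0
  val signal = Array.tabulate(40)(_.toDouble)
  val windows = EkgWindows.slidingWindows(signal, length = 32, step = 2)
  println(s"${windows.length} windows of ${windows.head.length} samples")
}
```

With window length = 32 and step size = 2, consecutive windows overlap by 30 samples, so each heartbeat shape is seen at many offsets before clustering.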
Displaying Centroids
Showing 25 of K=400 centroids
Begin reconstruction
Reconstructing the Signal
[Figure: overlapping windows 1 and 2 are added (+) to rebuild the signal]
window length = 32, step size = 16
Diagnosing the Anomalies
residuals
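One way the reconstruction and residual steps could be sketched. This is not the talk's implementation: it assumes a sine-squared taper on each window (so that windows stepped by half their length add back up to the original signal), and a plain in-memory list of centroids stands in for the K-means model. All names are hypothetical.

```scala
object EkgReconstruct {
  // Assumed taper: sin^2 weights. With step = length / 2 the weights of two
  // overlapping windows sum to 1 at every interior sample.
  def taper(n: Int): Array[Double] =
    Array.tabulate(n)(i => math.pow(math.sin(math.Pi * i / n), 2))

  // Pick the centroid with the smallest squared Euclidean distance to the window.
  def nearest(centroids: Seq[Array[Double]], w: Array[Double]): Array[Double] =
    centroids.minBy(c => c.zip(w).map { case (a, b) => (a - b) * (a - b) }.sum)

  // Replace each tapered window by its nearest centroid and add the overlaps.
  def reconstruct(signal: Array[Double], centroids: Seq[Array[Double]], length: Int): Array[Double] = {
    val step = length / 2
    val t = taper(length)
    val out = Array.fill(signal.length)(0.0)
    for (start <- 0 to signal.length - length by step) {
      val windowed = Array.tabulate(length)(i => signal(start + i) * t(i))
      val best = nearest(centroids, windowed)
      for (i <- 0 until length) out(start + i) += best(i)
    }
    out
  }

  // Residual = observed sample minus reconstructed sample.
  def residuals(signal: Array[Double], recon: Array[Double]): Array[Double] =
    signal.zip(recon).map { case (s, r) => s - r }
}
```

Where the signal looks like shapes already in the catalog, the residuals stay small; a large residual marks a stretch that the centroids could not reproduce.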
Putting it all together…
[Diagram: input, encoder, shape catalog, reconstruct, error, t-digest quantile estimator]
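A rough sketch of the final step, flagging errors above a high quantile. It swaps in an exact, sort-based nearest-rank quantile for the t-digest, so it only illustrates the thresholding idea and is not the t-digest API. `ErrorThreshold` and its methods are hypothetical names.

```scala
object ErrorThreshold {
  // Exact nearest-rank quantile; a stand-in for a streaming estimator such as t-digest.
  def quantile(values: Seq[Double], q: Double): Double = {
    require(values.nonEmpty && q >= 0.0 && q <= 1.0)
    val sorted = values.sorted
    val idx = math.max(0, math.ceil(q * sorted.length).toInt - 1)
    sorted(idx)
  }

  // Indices of errors strictly above the q-quantile of all errors.
  def anomalies(errors: Seq[Double], q: Double): Seq[Int] = {
    val threshold = quantile(errors, q)
    errors.zipWithIndex.collect { case (e, i) if e > threshold => i }
  }
}

object ErrorThresholdDemo extends App {
  // Hypothetical reconstruction errors; the last one stands out.
  val errors = Seq(0.1, 0.2, 0.1, 5.0)
  println(ErrorThreshold.anomalies(errors, 0.75))
}
```

A streaming estimator matters in practice because the error distribution has to be tracked over unbounded data without keeping and sorting every value.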
Use Case Architecture
Lots of things are producing Streaming Data
Data Collection Devices: Smart Machinery, Phones and Tablets, Home Automation, RFID Systems, Digital Signage, Security Systems, Medical Devices
Streams capture unbounded sequences of events
Events are delivered in the order they are received, like a queue.
[Diagram: producers write through the Kafka API to a MapR cluster that holds the Admission topic as three partitions on Servers 1 to 3; each partition keeps its messages in order from old to new; consumers read through the Kafka API]
Stream Topics Organize Events into Categories
Unlike a queue, messages are not deleted when they are read, which allows the same event to be processed by different consumers for different views.
[Diagram: producers and consumers connect through the Kafka API; topics are stored in MapR-FS]
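A toy in-memory model of the idea above, not the Kafka or MapR Streams API: events are appended to a log and never removed, and each consumer keeps its own read offset. The `Topic` class and the consumer names are made up for illustration.

```scala
import scala.collection.mutable

object TopicLog {
  final class Topic[A] {
    private val log = mutable.ArrayBuffer.empty[A]
    // One read position per consumer; unknown consumers start at offset 0.
    private val offsets = mutable.Map.empty[String, Int].withDefaultValue(0)

    // Append only: reading never removes an event.
    def publish(event: A): Unit = log += event

    // Return up to `max` unread events for this consumer, in arrival order.
    def poll(consumer: String, max: Int): List[A] = {
      val from = offsets(consumer)
      val batch = log.slice(from, from + max).toList
      offsets(consumer) = from + batch.length
      batch
    }
  }
}

object TopicLogDemo extends App {
  val admissions = new TopicLog.Topic[String]
  Seq("a", "b", "c").foreach(admissions.publish)
  println(admissions.poll("dashboard", 2)) // first two events
  println(admissions.poll("archiver", 10)) // the same events again, plus the third
}
```

Because the log is shared and only the offsets differ, a dashboard and an archiver can each see every event at their own pace.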
Predictive Analytics
Model building: historical data, featurization, machine learning algorithms, test model predictions, model evaluation, predictive model
Model scoring: new data from a stream topic, featurization, predictive model, predictions
Stream Processing Architecture
Data Sources: devices, images, HL7, social media, lab
Collect Data: stream topics
Stream Processing: feature extraction, derive features, process
Batch Processing: build model, update model
Serve Data: machine-learning models
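Build Model
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// put data in a vector (one tab-separated record per line)
val vrdd = rdd.map(line =>
  Vectors.dense(line.split('\t').map(_.toDouble)))
//window and normalize each record....
// call KMeans.train(data, k = 300, maxIterations = 10), which returns the model
val model = KMeans.train(processed, 300, 10)
model.save(sc, "/user/user01/data/anomaly-detection-master")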
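To show what the `k` and `maxIterations` arguments control, here is a small plain-Scala sketch of Lloyd's K-means iteration. It is not how Spark MLlib implements `KMeans.train` (MLlib chooses its own initial centroids and runs distributed); here the initial centroids are passed in, and every name is illustrative.

```scala
object KMeansSketch {
  type Point = Array[Double]

  def dist2(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to p.
  def closest(centroids: Seq[Point], p: Point): Int =
    centroids.indices.minBy(i => dist2(centroids(i), p))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point =
    Array.tabulate(ps.head.length)(d => ps.map(_(d)).sum / ps.length)

  // k is the number of initial centroids; each iteration assigns points to
  // their closest centroid and then moves every centroid to its group mean.
  def train(points: Seq[Point], initial: Seq[Point], maxIterations: Int): Seq[Point] =
    (1 to maxIterations).foldLeft(initial) { (centroids, _) =>
      val groups = points.groupBy(p => closest(centroids, p))
      // A centroid with no assigned points stays where it is.
      centroids.indices.map(i => groups.get(i).map(mean).getOrElse(centroids(i)))
    }
}

object KMeansSketchDemo extends App {
  val points = Seq(Array(0.0), Array(1.0), Array(10.0), Array(11.0))
  val model = KMeansSketch.train(points, Seq(Array(0.0), Array(10.0)), maxIterations = 10)
  println(model.map(_.mkString(",")))
}
```

In the EKG setting each point would be one normalized window, so the resulting centroids form the catalog of typical signal shapes.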
Use the Model with Streaming Data
Use Case: Real Time Anomaly Detection
[Diagram: Spark Streaming reads EKG data from a stream topic, enriches it with cluster normalized data, and writes it to a second stream topic for real-time monitoring]
Input record: 17.9200 12.8000 38.4000
Enriched message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
Create a DStream
DStream: a sequence of RDDs representing a stream of data
val model = KMeansModel.load(ssc.sparkContext, modelpath)
val messagesDStream = KafkaUtils.createDirectStream[String, String](
  ssc, LocationStrategies.PreferConsistent, consumerStrategy
)
[Diagram: the dStream reads from a stream topic; each batch interval (time 0 to 1, 1 to 2, 2 to 3) is stored in memory as an RDD]
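The batching idea can be sketched without Spark: group timestamped events by the interval they fall in, so each interval becomes one small collection (the role an RDD plays in a DStream). This is only a conceptual model, and `Event` and `toBatches` are hypothetical names.

```scala
object MicroBatches {
  final case class Event(timeMs: Long, value: String)

  // Group events into consecutive intervals of `intervalMs` milliseconds.
  // Returns (batch start time, values in that batch), ordered by time.
  def toBatches(events: Seq[Event], intervalMs: Long): Seq[(Long, Seq[String])] =
    events
      .groupBy(e => e.timeMs / intervalMs)
      .toSeq
      .sortBy(_._1)
      .map { case (n, es) => (n * intervalMs, es.sortBy(_.timeMs).map(_.value)) }
}

object MicroBatchesDemo extends App {
  val events = Seq(
    MicroBatches.Event(100, "a"), MicroBatches.Event(900, "b"),
    MicroBatches.Event(1500, "c"), MicroBatches.Event(2100, "d"))
  // One-second batches, mirroring "time 0 to 1", "time 1 to 2", "time 2 to 3".
  MicroBatches.toBatches(events, 1000).foreach(println)
}
```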
Process DStream
// get message values from key,value
val valuesDStream: DStream[String] = messagesDStream.map(_.value())
valuesDStream.foreachRDD { rdd =>
    val producer = KafkaProducerFactory.getOrCreateProducer(conf)
    ....
    // enrich message with model
    val cluster = model.predict(processed)
    ....
    val record = new ProducerRecord(topicp, "key", message)
    // send enriched message
    producer.send(record)
  }
}
Process DStream
[Diagram: map transforms each batch RDD of the dStream (time 0 to 1, 1 to 2, 2 to 3), read from the stream topic, into the corresponding transformed RDD of the ValueDStream]
Use Case: Real Time Anomaly Detection
[Diagram: Spark Streaming enriches records with cluster normalized data and publishes them to a stream topic; the Vert.x framework (HTTP, event bus, WebSocket event bus) reads the topic and feeds real-time monitoring]
Enriched message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
Resources
•  EKG basics - http://en.wikipedia.org/wiki/Electrocardiography
•  Source data - http://physionet.org/physiobank/database/apnea-ecg/
•  K-Means basics - http://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
•  Code repositories
–  Streaming: http://github.com/caroljmcdonald/sparkml-streaming-ekg
–  UI: http://github.com/caroljmcdonald/mapr-streams-vertx-dashboard
•  t-digest for anomalies - http://github.com/tdunning/t-digest
e-book available courtesy of MapR
https://www.mapr.com/practical-machine-learning-new-look-anomaly-detection
A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman (published by O’Reilly)
MapR Blog mapr.com/blog
Q&A
@mapr
Engage with us!
mapr-technologies
Carol McDonald (@caroljmcdonald)
Joseph Blue (@joebluems)