Real-time personal trainer on the SMACK stack

Muvr is a real-time personal trainer system. It must be highly available, resilient and responsive, and so it relies heavily on Spark, Mesos, Akka, Cassandra, and Kafka, the quintuple also known as the SMACK stack. In this talk, we explore the architecture of the entire muvr system, in particular the challenges of ingesting very large volumes of data, applying trained models to the data to provide real-time advice to our users, and training and evaluating new models using the collected data. We specifically emphasize how we have used Cassandra to consume large amounts of fast incoming biometric data from devices and sensors, and how to securely access the big data sets in Cassandra from Spark to compute the models.

We will finish by showing the mechanics of deploying such a distributed application. You will get a clear understanding of how Mesos and Marathon, in conjunction with Docker, are used to build an immutable infrastructure that allows us to provide a reliable service to our users and a great environment for our engineers.

Published in: Data & Analytics

  1. Real-time personal trainer on the SMACK stack
 
 @honzam399 Jan Machacek 
 @anirvan_c Anirvan Chakraborty 

  2. Automated personal trainer - muvr
     © 2016 Cake Solutions Limited CC BY-NC-SA 4.0
     • Suggests the sequence of exercise sessions
     • Suggests exercises in a session, including exercise parameters (e.g. weight, repetitions, …)
     • Provides tips on proper exercise form
     • With additional hardware (smartwatch, smart clothes), muvr provides
       • Completely unobtrusive exercise experience
       • More accurate tips on proper exercise form
     • With over-fitting, it is usable for physiotherapy
  3. Architecture
  4. Privacy
  5. The technologies: iOS
     • Learns the users’ behaviour
       • Exercise sessions
       • Exercises within exercise session
       • Short-term prediction of [scalar] labels for the exercises
     • Performs the real-time analysis of the incoming sensor data
       • Advised by the expected behaviour
       • Signal processing to compute repetitions / strokes
       • Forward-propagation to label the exercise
     • Submits all recorded sensor data and confirmed (!) labels per session
     • Handles offline / travel modes
     • Synchronises the data across the user’s devices using iCloud
  6. The technologies: Akka
     • Reactive services for user profiles, model parameters, and sensor data
     • CQRS/ES implementation, which helps to
       • Handle peaks in load
       • Handle failures of individual nodes
       • Reason about the scope of the mutable state we keep
     • Uses Cassandra for its journal and snapshot stores
       • The written values are binary “blobs”
     • Writes the sensor data to Cassandra
       • Writes the sensor data in “readable” form; it can be read outside the Akka / Scala world
     • Reads the model and exercise parameters from Cassandra
       • It selects the best / newest model parameters to serve to the mobile app
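The journal-and-snapshot idea behind event sourcing can be sketched in a few lines. This is a minimal illustration in Python, not the talk's Akka Persistence code: `SessionEvent`, `SessionState` and `recover` are hypothetical names, and the "repetitions per exercise" state is an assumed example of the kind of mutable state a service might keep.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SessionEvent:
    """Hypothetical journal entry: one labelled exercise set was recorded."""
    sequence_nr: int
    exercise: str
    repetitions: int


@dataclass
class SessionState:
    """Hypothetical in-memory state: total repetitions per exercise."""
    last_sequence_nr: int = 0
    totals: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: SessionEvent) -> None:
        # State only ever changes by applying an event from the journal.
        self.totals[event.exercise] = self.totals.get(event.exercise, 0) + event.repetitions
        self.last_sequence_nr = event.sequence_nr


def recover(snapshot: Optional[SessionState], journal: List[SessionEvent]) -> SessionState:
    """Rebuild state from the latest snapshot plus the events written after it."""
    if snapshot is None:
        state = SessionState()
    else:
        # Copy the snapshot so that recovery never mutates the stored value.
        state = SessionState(snapshot.last_sequence_nr, dict(snapshot.totals))
    for event in journal:
        if event.sequence_nr > state.last_sequence_nr:
            state.apply(event)
    return state
```

Recovering from a snapshot taken at sequence number 2 replays only the later events, yet yields the same state as replaying the whole journal; this is what lets a failed node's state be rebuilt on another node.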
  7. The technologies: Spark
     • Distributed computation framework
       • “Big data” tasks
       • Integrates extremely well with Cassandra
     • Reads and processes the profiles and sensor data
       • Identifies clusters of users on their profile information
       • Slices the sensor inputs by sensor types
       • Writes the results to another store
     • Runs in batches
       • Executes by schedule (typically once a day)
  8. The technologies: neon
     • A machine learning framework, including
       • “The usual” suspects in tensor algebra
       • Signal processing
       • Different ML approaches
     • Training and evaluation programs
       • Both programs terminate either upon discovering the perfect model or when their budget is up
       • Reads clustered training and testing data from the Spark job
       • Writes the model parameters and evaluation result to Cassandra
  9. The technologies: Cassandra
     • Underpins the entire platform
       • Journal and snapshot store for Akka
       • Sensor data store
       • Model parameter store
       • “Summary” store
     • High availability
     • No single point of failure
     • High read and write throughput
     • Replication factor
     • Tuneable consistency level
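How replication factor and tuneable consistency interact can be shown with a small worked example. The slides do not say which replication factor or consistency levels muvr uses, so the numbers in the note below are illustrative only; `quorum` and `reads_see_latest_write` are hypothetical helper names.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set overlaps every write set (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor
```

With a replication factor of 3, `quorum(3)` is 2. Writing and reading at QUORUM gives 2 + 2 > 3, so a read always reaches at least one replica holding the latest write, whereas writing and reading at ONE (1 + 1) gives no such guarantee but tolerates more unavailable replicas.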
  11. Spark & Cassandra
      • Group the sensor data into n clusters by user profile with biometric ID
      • Expand the sensor data
        • Slices of the sensor data by combinations of accelerometer, gyroscope, heart rate, targeted muscle group strain gauges, …
        • 1 user = 1 MiB from one sensor per hour; but 4 sensors expand into 4! MiB
      • Trivial tasks
        • The most popular user-contributed exercises
        • The most popular exercise sessions and exercises within the sessions
        • The most effective (by overall fitness improvement, weight loss, muscle mass gain, …) exercise sessions
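Slicing by combinations of sensors can be sketched with the standard library. This is plain Python rather than the Spark job itself; the sensor names, `sensor_slices` and `slice_record` are assumptions made for illustration, and the slide does not spell out exactly how a slice is defined or stored.

```python
from itertools import combinations

# Assumed sensor names, taken loosely from the list on the slide.
SENSORS = ("accelerometer", "gyroscope", "heart_rate", "strain_gauge")


def sensor_slices(sensors):
    """Every non-empty combination of the given sensors, smallest first."""
    return [
        combo
        for size in range(1, len(sensors) + 1)
        for combo in combinations(sensors, size)
    ]


def slice_record(record, combo):
    """Project one recording (sensor name -> samples) onto one combination."""
    return {sensor: record[sensor] for sensor in combo}
```

Four sensors already produce 15 non-empty combinations, and each slice repeats the data of the sensors it contains, which is why the stored volume grows much faster than the raw recordings.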
  12. Production ML
      Take the data from Cassandra (written there by the Spark jobs) and:
      • Split into training and test datasets
      • Fit models for various sensor types
      • Save model parameters
      • Evaluate the newly fitted models, and re-evaluate old data
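One way to split rows into training and test sets is to hash a stable key, so that a given session always lands in the same set across daily runs. The slides do not describe how muvr performs the split; this is a sketch under that assumption, and `session_id`, `is_test_example` and `split` are hypothetical names.

```python
import hashlib


def is_test_example(session_id: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign a session to the test set by hashing its ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    return bucket < test_fraction


def split(rows, test_fraction: float = 0.2):
    """Split rows (dicts with a 'session_id' key) into (train, test) lists."""
    train, test = [], []
    for row in rows:
        target = test if is_test_example(row["session_id"], test_fraction) else train
        target.append(row)
    return train, test
```

Splitting by session rather than by individual sample keeps all windows from one session on the same side, which avoids evaluating a model on data that closely resembles what it was trained on.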
  13. Production ML
      • We are using a convolutional network
        • 2 seconds of sensor data input (e.g. a @ 50 Hz for accelerometer; a, g @ 50 Hz for accelerometer + gyroscope; u, l @ 10 Hz for smart clothes)
        • The exercise classes as the outputs
      • The training program
        • CNN in neon
        • Loads the mini-batches from Cassandra
        • Fits the model; evaluates the fitted model
        • Saves the model parameters into Cassandra
      • The re-evaluation program
        • Re-evaluates past n models against the latest training dataset, computing accuracy, precision, recall, F1
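The window size and the re-evaluation metrics can be made concrete with a short sketch. This is plain Python, not the neon programs from the talk; `window_length` and `evaluate` are hypothetical names, and macro-averaging over the classes is an assumption, since the slide does not say how the per-class scores are combined.

```python
from collections import Counter


def window_length(seconds: float, sample_rate_hz: int) -> int:
    """Number of samples in one input window, e.g. 2 s at 50 Hz."""
    return int(round(seconds * sample_rate_hz))


def evaluate(expected, predicted):
    """Accuracy plus macro-averaged precision, recall and F1 over the classes seen."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and equally long")
    labels = sorted(set(expected) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for want, got in zip(expected, predicted):
        if want == got:
            tp[want] += 1
        else:
            fp[got] += 1   # predicted class gains a false positive
            fn[want] += 1  # expected class gains a false negative

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    recall = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    f1 = [ratio(2 * p * r, p + r) for p, r in zip(precision, recall)]
    return {
        "accuracy": sum(tp.values()) / len(expected),
        "precision": sum(precision) / len(labels),
        "recall": sum(recall) / len(labels),
        "f1": sum(f1) / len(labels),
    }
```

At 50 Hz a 2-second window holds 100 samples per channel, and at 10 Hz it holds 20. Running `evaluate` for each of the past n models against the same labelled dataset gives directly comparable scores.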
  14. Having code is jolly good
  15. Running it
      • Simplicity
      • Ease of orchestration
      • Ease of development
      • Support for polyglot frameworks and components
      • Cost-effective resource utilisation
  16. Docker
      • Deploy reliably & consistently
      • Execution is fast and lightweight
      • Simplicity
      • Developer-friendly workflow
      • Fantastic community
  17. Dockerize Cassandra Dev Environment
      • Super low memory settings in cassandra-env.sh
        • MAX_HEAP_SIZE="128M"
        • HEAP_NEWSIZE="24M"
      • Remove caches in dev mode in cassandra.yaml
        • key_cache_size_in_mb: 0
        • reduce_cache_sizes_at: 0
        • reduce_cache_capacity_to: 0
  18. Dockerize Cassandra Production
      • Use host networking (--net=host) for better network performance
      • Put data, commitlog and saved_caches on volumes mounted from the underlying host
      • Run Cassandra in the foreground (-f)
      • Tune the JVM heap for optimal size
      • Tune the JVM garbage collector for your workload
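The first three points translate into a `docker run` invocation. The sketch below only assembles the argument list; the image name `cassandra`, the host directory passed in, and the helper names are assumptions, while `/var/lib/cassandra` is Cassandra's default location for the three directories.

```python
import shlex

# The three directories the slide says should live on the host.
CASSANDRA_DIRS = ("data", "commitlog", "saved_caches")


def cassandra_docker_command(host_root, image="cassandra", container_root="/var/lib/cassandra"):
    """Build the argv for running Cassandra with host networking and host volumes."""
    args = ["docker", "run", "-d", "--net=host"]
    for name in CASSANDRA_DIRS:
        # -v HOST_PATH:CONTAINER_PATH bind-mounts a host directory into the container.
        args += ["-v", "%s/%s:%s/%s" % (host_root, name, container_root, name)]
    # -f keeps Cassandra in the foreground so it stays the container's main process.
    args += [image, "cassandra", "-f"]
    return args


def as_shell(args):
    """Render the argv as a single, safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Keeping the data directories on the host means a container can be replaced without losing the node's data, which is what makes the container itself disposable.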
  19. Mesos
      • Distributed systems kernel
      • Scales to 10,000s of nodes
      • Depends on ZooKeeper for fault tolerance and high availability
      • Creates a highly available, scalable single resource pool
      • Automatic failover
      • Ease of management
      • Simple to operate
      • Support for Docker containers
  20. Mesos architecture
      Image source: https://assets.digitalocean.com/articles/mesosphere/mesos_architecture.png
  21. Cassandra on Mesos
      • Running Cassandra as Docker containers
      • Custom Dockerfile and entry-point script to control Cassandra configuration
      • Marathon to initialize and control
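A Marathon application definition for such a container might look roughly like the dictionary below. The field names follow the Marathon app definition format of that period as far as I know it and should be checked against the Marathon version in use; the image name, resource sizes, host path and the `cassandra_marathon_app` helper are all assumptions, not values from the talk.

```python
import json


def cassandra_marathon_app(image, instances=3, cpus=2.0, mem_mib=4096,
                           host_data_dir="/data/cassandra"):
    """Build a Marathon app definition that runs Cassandra as a Docker container."""
    return {
        "id": "/cassandra",
        "instances": instances,
        "cpus": cpus,
        "mem": mem_mib,
        "container": {
            "type": "DOCKER",
            # Host networking, matching the production Docker advice.
            "docker": {"image": image, "network": "HOST"},
            "volumes": [
                {
                    "containerPath": "/var/lib/cassandra",
                    "hostPath": host_data_dir,
                    "mode": "RW",
                }
            ],
        },
        # Place at most one Cassandra instance on each agent host.
        "constraints": [["hostname", "UNIQUE"]],
    }


def to_json(app):
    """Serialise the definition for submission to Marathon's REST API."""
    return json.dumps(app, indent=2, sort_keys=True)
```

The resulting JSON would be submitted to Marathon, which then starts the requested number of instances and restarts them if they fail.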
  22. Cost-effective resources in AWS
      • Embrace AWS spot instances
        • About 50-60% cheaper than on-demand instances
        • Can be reclaimed without notice if outbid
      • Run dev and staging on spot instances
      • Run Spark jobs on spot instances
  24. Thanks!
      Twitter: @cakesolutions
      Tel: 0845 617 1200
      Email: enquiries@cakesolutions.net
      Jobs: http://www.cakesolutions.net/careers
