Real-time Machine Learning

  1. Real-time Machine Learning: Intelligent software architecture using a Modified Lambda Architecture & Apache Mahout. Vinoth Kannan, SkillFactory 71, Vinoth.kannan@widas.de
  2. Agenda: What is machine learning? The need for real-time machine learning. What is the Lambda architecture? What is Mahout? How does a basic recommender engine work? Some use cases.
  3. What is machine learning?
  4. Introduction: Machine Learning from Streaming Data. We need a model that considers recent history (it has been sunny and 30 degrees for the last two days, so it is unlikely to be -10 degrees and snowing the next day) and a model that is updatable (a retail sales model that remains accurate as the business grows). Don't they both mean the same?
  5. Introduction: Machine Learning from Streaming Data. Both are time-series prediction problems on non-stationary data distributions: weather calls for a model that considers recent history, and retail sales calls for a model that is updatable.
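Slides 4 and 5 ask for a model that considers recent history and can be updated as new data stream in. A minimal sketch, assuming an exponentially weighted moving average (EWMA) as the forecaster (a technique chosen here for illustration, not named in the talk): each observation updates the state in constant time, and older observations decay geometrically, so the forecast tracks recent history. The class name `EwmaForecaster` and the smoothing factor `alpha` are hypothetical.

```python
class EwmaForecaster:
    """One-step-ahead forecaster based on an exponentially weighted moving average."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # weight given to the newest observation (hypothetical value)
        self.level = None    # current smoothed estimate

    def update(self, value: float) -> None:
        # The first observation initialises the level; later ones are blended in.
        if self.level is None:
            self.level = value
        else:
            self.level = self.alpha * value + (1.0 - self.alpha) * self.level

    def forecast(self) -> float:
        if self.level is None:
            raise RuntimeError("no observations yet")
        return self.level


# Made-up daily temperatures: a cold spell followed by two warm days.
model = EwmaForecaster(alpha=0.5)
for temperature in [-10.0, -8.0, 30.0, 30.0]:
    model.update(temperature)
print(model.forecast())  # prints 20.25
```

With `alpha = 0.5` the two warm days pull the forecast from -9 up to 20.25; a smaller `alpha` would give the older cold readings more weight, which is the trade-off between "recent history" and stability.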
  6. Introduction: Machine Learning from Non-stationary Data Distributions. Incremental algorithms are machine learning algorithms that learn incrementally over the data. Batch algorithms are machine learning algorithms that are retrained periodically on a batch of data.
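Slide 6 contrasts incremental algorithms with periodically retrained batch algorithms. The sketch below is a minimal illustration under stated assumptions: a one-feature linear model y = w*x + b, synthetic noiseless data whose slope drifts from 2 to 3, plain stochastic gradient descent as the incremental learner, and ordinary least squares refit on the latest window as the batch learner. The model, learning rate, window, and data are all hypothetical choices.

```python
def batch_fit(xs, ys):
    """Batch: ordinary least squares for y = w*x + b, recomputed from all given data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


class IncrementalRegressor:
    """Incremental: y = w*x + b learned one example at a time with SGD."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def learn_one(self, x, y):
        error = (self.w * x + self.b) - y   # prediction error on this example
        # Gradient step on 0.5 * error**2 with respect to w and b.
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


xs = [i / 10 for i in range(11)]            # inputs cycle through 0.0, 0.1, ..., 1.0
online = IncrementalRegressor(learning_rate=0.1)

# Phase 1 follows y = 2x + 1; in phase 2 the relationship drifts to y = 3x + 1.
for slope in (2.0, 3.0):
    window_x, window_y = [], []             # batch approach: refit on this window only
    for _ in range(300):                    # 300 passes over the inputs per phase
        for x in xs:
            y = slope * x + 1.0
            online.learn_one(x, y)          # incremental: update after every example
            window_x.append(x)
            window_y.append(y)
    batch_w, batch_b = batch_fit(window_x, window_y)   # periodic batch retrain
    print(f"slope={slope}: online w={online.w:.3f} b={online.b:.3f}, "
          f"batch w={batch_w:.3f} b={batch_b:.3f}")
```

Both approaches end up near the new relationship; the difference is when the work happens. The incremental model pays a small cost on every example and is never stale, while the batch model is exact for its window but only as fresh as its last retraining run.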
  7. Introduction: The Challenge for the Best Big Data Technology. Hadoop: a batch processing system that can churn through huge volumes of data. Storm: a real-time complex event processing system that can process data streams.
  8. Wrong Fight!!!
  9. Hadoop + Storm = Real-time Big Data. It's a chance, not a challenge: Lambda Architecture!!!
  10. Lambda Architecture Overview: Speed Layer, Serving Layer, Batch Layer
  11. Lambda Architecture Overview with Description. Speed layer: • Only new data • Compensates for the high latency of serving-layer updates • The batch layer overrides the speed layer. Serving layer: • Loads and exposes the batch views for querying • Random access to batch views. Batch layer: • Immutable, constantly growing dataset • Batch views are computed from this raw dataset.
  12. Basic Idea behind the Lambda Architecture: query = function(all data) (Nathan Marz, Big Data: Principles and best practices of scalable realtime data systems)
  13. Basic Idea behind Lambda: perform some function over the data, from the real-time data "0" back to the history data "n". Real-time Big Data with the Lambda Architecture: a Hadoop process handles the history data and a Storm process handles the real-time data. Letting Hadoop process the history data makes the process faster.
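Slides 11 to 13 describe the three layers and the idea query = function(all data). A minimal sketch of how the pieces fit together for a simple count query, assuming page-view events as the data: the batch layer recomputes a view from the immutable history, the speed layer incrementally counts only events that arrived after the last batch run, and the query merges the two views. All names and data are hypothetical; in the architecture described, the batch layer would run on Hadoop and the speed layer on Storm.

```python
from collections import Counter

# Hypothetical page-view events: (timestamp, page).
history = [(1, "home"), (2, "cart"), (3, "home")]   # already absorbed by the batch layer
recent = [(4, "home"), (5, "checkout")]             # arrived after the last batch run


def compute_batch_view(master_dataset):
    """Batch layer: recompute the view from the full, immutable master dataset."""
    return Counter(page for _, page in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count only the events the batch view does not cover."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        _, page = event
        self.realtime_view[page] += 1


def query(page, batch_view, realtime_view):
    """query = function(all data), answered by merging the two views."""
    return batch_view[page] + realtime_view[page]


batch_view = compute_batch_view(history)
speed = SpeedLayer()
for event in recent:
    speed.on_event(event)

print(query("home", batch_view, speed.realtime_view))      # prints 3: 2 historical + 1 recent
print(query("checkout", batch_view, speed.realtime_view))  # prints 1: seen only by the speed layer
```

Simple addition works as the merge here because counts are additive; for other query functions the merge step has to be designed to match.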
  14. The Problem: unanswered questions of the Lambda architecture (batch process vs. real-time process). • How to define the boundary between the real-time and batch processes? • How to synchronize the computation between the two systems? • How to avoid gaps and overlaps? • What algorithm to use? • How to avoid failure and have a fault-tolerance mechanism?
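Slide 14 leaves open how to draw the boundary between the two processes and how to avoid gaps and overlaps. One common approach, offered here as an assumption rather than the talk's answer, is a timestamp cutoff: the batch view covers exactly the events with timestamp below the cutoff, the speed layer answers only for events at or after it, and after each batch run the speed layer drops the state the new batch view now covers. The sketch keeps per-timestamp buckets in memory; all names are hypothetical.

```python
from collections import Counter, defaultdict


class SpeedLayer:
    """Keeps per-timestamp counts for events not yet covered by the batch view."""

    def __init__(self):
        self.buckets = defaultdict(Counter)   # timestamp -> per-key counts

    def on_event(self, timestamp, key):
        self.buckets[timestamp][key] += 1

    def expire_before(self, cutoff):
        # The batch view now covers these timestamps, so free that state.
        stale = [ts for ts in self.buckets if ts < cutoff]
        for ts in stale:
            del self.buckets[ts]

    def count(self, key, since):
        # Only buckets at or after the cutoff, so nothing overlaps the batch view.
        return sum(c[key] for ts, c in self.buckets.items() if ts >= since)


def batch_run(master_dataset, cutoff):
    """Batch view over exactly the events with timestamp < cutoff."""
    return Counter(key for ts, key in master_dataset if ts < cutoff)


def query(key, batch_view, cutoff, speed):
    # Batch covers [start, cutoff) and speed covers [cutoff, now): no gap, no overlap.
    return batch_view[key] + speed.count(key, since=cutoff)


master = []            # immutable, append-only master dataset
speed = SpeedLayer()
for ts, key in [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "a")]:
    master.append((ts, key))
    speed.on_event(ts, key)

cutoff = 4             # this batch run covers events up to timestamp 3
batch_view = batch_run(master, cutoff)
speed.expire_before(cutoff)
print(query("a", batch_view, cutoff, speed))   # prints 4: 2 from batch + 2 from speed
```

In a distributed deployment the new batch view and its cutoff would need to be published together, so that a query never pairs a fresh cutoff with a stale view; that is the synchronization question the slide raises.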
  15. Modified Lambda Architecture: Presentation Layer • The presentation layer must aggregate the outputs of Storm and Hadoop • Users see the results of their events in less than 2 seconds • Seamless merge between short-term and long-term data
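Slide 15 requires the presentation layer to merge the Storm (short-term) and Hadoop (long-term) outputs seamlessly, but does not say how. A minimal sketch, assuming both layers emit per-item scores on a comparable 0 to 1 scale, is a weighted blend followed by a top-k ranking. The item names, scores, and the `short_term_weight` parameter are hypothetical.

```python
import heapq


def merge_scores(long_term, short_term, short_term_weight=0.7, top_k=3):
    """Blend long-term (batch) and short-term (stream) item scores into one ranking.

    short_term_weight is a hypothetical tuning knob: higher values favour the
    user's most recent events over their long-term history.
    """
    items = set(long_term) | set(short_term)
    blended = {
        item: short_term_weight * short_term.get(item, 0.0)
        + (1.0 - short_term_weight) * long_term.get(item, 0.0)
        for item in items
    }
    # Highest blended score first; ties broken by item name for a stable order.
    return heapq.nsmallest(top_k, blended, key=lambda item: (-blended[item], item))


long_term = {"tent": 0.9, "boots": 0.6, "stove": 0.4}      # e.g. computed in the batch layer
short_term = {"kayak": 0.8, "paddle": 0.5, "boots": 0.3}   # e.g. updated from the stream
ranking = merge_scores(long_term, short_term)
print(ranking)  # prints ['kayak', 'boots', 'paddle']
```

Because items missing from one source score 0 there, an item seen only in the current session (kayak) can still rank first, while an item strong in both sources (boots) benefits from both.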
  16. Machine Learning with Mahout
  17. What is Mahout? Introduction • An Apache Software Foundation Java library • A scalable machine learning library that mostly runs on Hadoop • Mahout currently supports four main use cases: recommendation, clustering, classification, and frequent itemset mining • The core algorithms for clustering, classification, and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm
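Slide 17 notes that Mahout's core algorithms run on Hadoop using map/reduce. As an illustration of the paradigm only (plain single-process Python, not Mahout or Hadoop code), the sketch below counts how often each pair of items is preferred by the same user, a typical building block for item-based collaborative filtering. The data are made up.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user, items):
    """Mapper: input record (user, items); emit ((item_a, item_b), 1) for every pair."""
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1


def shuffle(mapped_pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one item pair."""
    return key, sum(values)


user_items = {
    "u1": {"laptop", "mouse", "bag"},
    "u2": {"laptop", "mouse"},
    "u3": {"mouse", "pad"},
}
mapped = (pair for user, items in user_items.items() for pair in map_phase(user, items))
cooccurrence = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(cooccurrence[("laptop", "mouse")])  # prints 2: users u1 and u2
```

Sorting each user's items before pairing means (a, b) and (b, a) land on the same key, so each unordered pair is counted in one place.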
  18. Basic Recommender Algorithm: How It Works. Today's focus: suggesting items to a user based on the current search.
  19. Basic Recommender Algorithm: Defining Recommendation. There are two broad categories of recommender engine algorithms, and Mahout implements a collaborative filtering framework. • User-based: recommends items by finding similar users. Harder to scale because of the dynamic nature of users. • Item-based: calculates the similarity between items and makes recommendations. Items usually don't change much, so similarities can be calculated offline.
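Slide 19 explains why item-based recommenders scale well: item-item similarities change slowly and can be precomputed offline, leaving only a cheap lookup at request time. A minimal sketch of that split, assuming boolean click data and the Tanimoto (Jaccard) coefficient as the similarity measure; Mahout provides several similarity measures, and this plain-Python version is an illustration, not its API. Item names and users are made up.

```python
from itertools import combinations

# Boolean preferences: item -> set of users who clicked it (made-up data).
item_users = {
    "laptop": {"u1", "u2", "u3"},
    "mouse": {"u1", "u2", "u4"},
    "bag": {"u1", "u3"},
    "pad": {"u4"},
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of users."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precompute_similarities(item_users):
    """Offline step: item-item similarities change slowly, so compute them in batch."""
    sims = {item: {} for item in item_users}
    for x, y in combinations(item_users, 2):
        s = tanimoto(item_users[x], item_users[y])
        sims[x][y] = s
        sims[y][x] = s
    return sims


def recommend(current_item, sims, how_many=2):
    """Online step: rank the other items by similarity to the one being viewed."""
    neighbours = sims[current_item]
    return sorted(neighbours, key=lambda item: (-neighbours[item], item))[:how_many]


similarities = precompute_similarities(item_users)
print(recommend("laptop", similarities))  # prints ['bag', 'mouse']
```

For a user looking at "laptop", "bag" wins (2 shared users out of 3) over "mouse" (2 shared out of 4); a user-based recommender would instead have to recompute neighbourhoods as users' histories change.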
  20. Basic Recommender Algorithm: Defining Recommendation. A user's preference for an item: • Likes something • Doesn't like something • Doesn't care. It is safe to assume that 1 click = 1 like, i.e. a uniform preference.
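Slide 20 treats every click as a uniform "like", so the data reduce to which users touched which items. With boolean data, raw co-occurrence counts tend to favour popular items; the log-likelihood ratio (Dunning's G² statistic) on a 2×2 contingency table of users is one common alternative, and Mahout includes a `LogLikelihoodSimilarity` for this case. The sketch below is a plain-Python illustration of the statistic, not Mahout's code; the users and items are made up.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalised Shannon entropy of a list of counts (natural log)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of user counts.

    k11: users who clicked both items   k12: clicked item A but not B
    k21: clicked item B but not A       k22: clicked neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


def table_from_clicks(users_a, users_b, all_users):
    """Build the 2x2 table from the sets of users who clicked each item."""
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = len(all_users - users_a - users_b)
    return k11, k12, k21, k22


all_users = {f"u{i}" for i in range(20)}
clicked_tent = {f"u{i}" for i in range(10)}
clicked_stove = {f"u{i}" for i in range(10)}       # exactly the same users as tent
clicked_map = {f"u{i}" for i in range(0, 20, 2)}   # every other user
# Large: the two items are always clicked together.
print(log_likelihood_ratio(*table_from_clicks(clicked_tent, clicked_stove, all_users)))
# About 0: clicks on tent and map look statistically independent.
print(log_likelihood_ratio(*table_from_clicks(clicked_tent, clicked_map, all_users)))
```

A high score means two items co-occur more often than chance would predict given how popular each one is, which is what a recommender wants from "1 click = 1 like" data.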
  21. Mahout Library of Algorithms: lots of algorithms to choose from
  22. Use Cases: Real-time Machine Learning in eCommerce. Objective: increase sales revenue • Match potential customers to the right product • Personalise the user experience on web and email • Customer lifecycle management
  23. Use Cases: Real-time Machine Learning in Financial Services. Objective: real-time fraud detection • Compute patterns/predictors for individual customers • Classify and cluster customers and recalculate patterns and predictors • Set thresholds across all data
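Slide 23 describes computing patterns for individual customers and setting a threshold across all data. A minimal sketch of one such scheme, assuming a single feature (transaction amount), Welford's online algorithm for each customer's running mean and variance, and one z-score threshold shared by all customers. The threshold, the minimum history, the choice to keep flagged amounts out of the profile, and all names are hypothetical; a real system would use richer features and the classification and clustering the slide mentions.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm for the mean and sample variance of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class FraudScorer:
    """Flags a transaction whose amount is far from that customer's own history."""

    def __init__(self, z_threshold=3.0, min_history=5):
        self.z_threshold = z_threshold   # hypothetical: one threshold for all customers
        self.min_history = min_history   # hypothetical: no verdict until enough data
        self.stats = defaultdict(RunningStats)   # per-customer profile

    def score(self, customer, amount):
        s = self.stats[customer]
        suspicious = False
        if s.n >= self.min_history and s.stddev() > 0:
            z = abs(amount - s.mean) / s.stddev()
            suspicious = z > self.z_threshold
        if not suspicious:
            s.update(amount)   # design choice: keep flagged amounts out of the profile
        return suspicious


scorer = FraudScorer()
for amount in [20.0, 25.0, 22.0, 18.0, 24.0, 21.0]:
    scorer.score("alice", amount)
flagged_big = scorer.score("alice", 500.0)
flagged_normal = scorer.score("alice", 23.0)
print(flagged_big, flagged_normal)  # prints True False
```

Each profile updates in constant time per transaction, so the per-customer predictors stay current without a batch recomputation; periodic batch jobs could still recalibrate the shared threshold.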
  24. Use Cases: Real-time Machine Learning in Media. Objective: generating metadata • Video/audio/text analysis • Find patterns/clusters for people, places, products, things
  25. Use Cases: Real-time Machine Learning at Carbookplus. Objective: generating metadata • Match potential trips to the right destination • Recommend the best gas station • Recommend contacts whom the user might know • Match the right advertisers to customers based on vehicle needs
  26. Summary • Ability to create real-time systems based on the Lambda architecture • Usefulness of predictive algorithms • Reasons to concentrate on real-time predictions. Further reading: http://storm-project.net/ http://mahout.apache.org/ http://hadoop.apache.org/
  27. Thank You
