SlideShare a Scribd company logo
1 of 27
1
Real-time Machine Learning
Vinoth Kannan
Intelligent software architecture using Modified Lambda architecture
& Apache Mahout
SkillFactory 71
Vinoth.kannan@widas.de
2
Agenda
What is Machine Learning ?
Need for Real Time Machine Learning
What is Lambda architecture ?
What is Mahout ?
How does a basic recommendor engine works ?
Some Use Cases
3
What is machine learning?
4
Introduction
Machine Learning from Streaming Data
Model that
considers recent
history
Model that is updatable
Machine Learning
It has been sunny and 30
degrees in the last two days, it is
unlikely that it will be -10 degrees
and snowing the next day
A retail sales model that
remains accurate as the
business gets larger
Dont they both mean the same ??
5
Introduction
Machine Learning from Streaming Data
Time-series prediction non-stationary data
distributions
weather Retail sales
Model that
considers recent
history
Model that is updatable
6
Introduction
Machine Learning from non-stationary data distributions
Incremental Algorithms
non-stationary data
distributions
Batch algorithm
These are machine
learning algorithms that
learn incrementally over
the data.
These are machine
learning algorithms that
re-trains periodically
with a batch algorithm.
7
Introduction
The Challenge for the Best Big Data Technology
Hadoop
Batch processing
System that can churn
huge volume of data
Storm
Real time complex event
processing System that
can process data stream
Wrong Fight !!!
9
+ =
Real-time
Big Data
Its a Chance not a Challenge
Lambda Architecture!!!
10
Lambda Architecture
Overview
Speed Layer
Serving layer
Batch layer
Speed Layer
• Only new data
• Compensates for high latency
Serving layer updates
• Batch layer overrides speed
layer
Serving layer
• Loads and expose the batch
views for querying
• Random access to batch views
Batch layer
• Immutable, constantly growing
datasets
• Batch views are computed from
this raw dataset
Lambda Architecture
Overview with description
Basic Idea behind Lambda architecture
12
query = function(all data)
- Nathan Marz
Big Data - Principles and best practices of scalable realtime data systems
Basic Idea behind Lambda
13
Perform some function from real-time data “0“ to the history data “n“
Real Time Big Data
Lambda Architecture
Hadoop ProcessStorm ProcessReal Time Big Data
}
}
}Letting the History data processed by Hadoop makes process faster
The Problem
14
Batch ProcessReal-timeReal Time Big Data
}
}
}
• How to define the boundery between Real-time and Batch
Process ?
• How to synchronize the computation between the two
system ?
• How to avoid gaps and overlaps ?
• What algorithm to use?
• How to avoid failure and have fault tolerance mechanism ?
Questions to be answered
Unanswered questions of Lambda architecture
Modified Lamda Architecture
Presentation Layer
• Presentation layer must aggregate the output of
Storm and Hadoop outputs
• User will see the result of his events in less than 2
seconds
• Seamless merge between short and long term data
Machine Learning with Mahout
16
17
What is Mahout ?
Introduction
• Apache Software Foundation Java library
• Scalable “machine learning“ library that runs on Hadoop mostly
• Currently Mahout supports mainly four use cases
Recommendation Clustering
Classification
Frequent Itemset
mining
• Core algorithms for clustering, classfication and batch based
collaborative filtering are implemented on top of Apache Hadoop using
the map/reduce paradigm
18
Basic Recommendor algorithm
How it works
Today‘s FOCUS : Suggesting item to user based on current search
19
Basic Recommendor algorithm
Defining recommendation
Two broad categories of recommender engine algorithms
Mahout implements a collabrative filtering framework
User-based
Recommends items by
finding similar users.
Harder to scale because of
dynamic nature of users
Item-based
Calculate similiarty between
items and make
recommendations.
Items usually dont change
much and hence could be
calculated offline
20
Basic Recommendor algorithm
Defining recommendation
User Preference to an Item
• Like Something
• Dont Like something
• Dont Care
1 Click = 1 Like = Uniform Preference
Safe to assume
Mahout Library of Algorithms
Lots of algorithms to Choose From
Use Cases
Real Time Machine Learning
eCommerce
Objective : Increase sales revenue
Match potential customer to the right product
Personalise user experience on web and email
Customer lifecycle management
Use Cases
Real Time Machine Learning
Financial Services
Objective : Real Time Fraud Detection
Compute patterns/ predictors for individual
customers
Classify and Cluster custumers and recalculate
patterns and predictors
Set threshold across all data
Use Cases
Real Time Machine Learning
Media
Objective : Generating Meta Data
Video/ Audio/Text analysis
Find patterns/cluster for
people, places, products, things
Use Cases
Real Time Machine Learning
Carbookplus
Objective : Generating Meta Data
Match potential trips to right destination
Recommend best gas station
Recommend contacts whom user might know
Match right advertisers to customer based on
vehcile needs
Summary
Ability to create real time systems based on lambda
architecture
Usefulness of predictive algorithms
Reason to concentrate on real time predicitions
More Read
http://storm-project.net/
http://mahout.apache.org/
http://hadoop.apache.org/
26
27
Thank You

More Related Content

What's hot

Phased life cycle model
Phased life cycle modelPhased life cycle model
Phased life cycle modelStephennancy
 
Agile Process models
Agile Process modelsAgile Process models
Agile Process modelsStudent
 
Network penetration testing
Network penetration testingNetwork penetration testing
Network penetration testingImaginea
 
Lecture 19 design concepts
Lecture 19   design conceptsLecture 19   design concepts
Lecture 19 design conceptsIIUI
 
Introduction to Requirement engineering
Introduction to Requirement engineeringIntroduction to Requirement engineering
Introduction to Requirement engineeringNameirakpam Sundari
 
Secure Software Development Lifecycle
Secure Software Development LifecycleSecure Software Development Lifecycle
Secure Software Development Lifecycle1&1
 
Análise, projeto e implementação de sistemas
Análise, projeto e implementação de sistemasAnálise, projeto e implementação de sistemas
Análise, projeto e implementação de sistemasDiego Marek
 
software project management Waterfall model
software project management Waterfall modelsoftware project management Waterfall model
software project management Waterfall modelREHMAT ULLAH
 
Lecture 6 agile software development
Lecture 6   agile software developmentLecture 6   agile software development
Lecture 6 agile software developmentIIUI
 
Spiral Model - Software Development Life Cycle (SDLC)
Spiral Model - Software Development Life Cycle (SDLC)Spiral Model - Software Development Life Cycle (SDLC)
Spiral Model - Software Development Life Cycle (SDLC)ACM-KU
 
TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY
 TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY
TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEYijujournal
 
10 software maintenance
10 software maintenance10 software maintenance
10 software maintenanceakiara
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
 
Gerência de Configuração
Gerência de ConfiguraçãoGerência de Configuração
Gerência de ConfiguraçãoWagner Zaparoli
 
Component Based Development
Component Based DevelopmentComponent Based Development
Component Based DevelopmentBen McCormick
 

What's hot (20)

Phased life cycle model
Phased life cycle modelPhased life cycle model
Phased life cycle model
 
Agile Process models
Agile Process modelsAgile Process models
Agile Process models
 
Network penetration testing
Network penetration testingNetwork penetration testing
Network penetration testing
 
Lecture 19 design concepts
Lecture 19   design conceptsLecture 19   design concepts
Lecture 19 design concepts
 
Data fusion
Data fusionData fusion
Data fusion
 
Introduction to Requirement engineering
Introduction to Requirement engineeringIntroduction to Requirement engineering
Introduction to Requirement engineering
 
Wot
WotWot
Wot
 
Secure Software Development Lifecycle
Secure Software Development LifecycleSecure Software Development Lifecycle
Secure Software Development Lifecycle
 
Análise, projeto e implementação de sistemas
Análise, projeto e implementação de sistemasAnálise, projeto e implementação de sistemas
Análise, projeto e implementação de sistemas
 
Modelos de processos de software
Modelos de processos de softwareModelos de processos de software
Modelos de processos de software
 
software project management Waterfall model
software project management Waterfall modelsoftware project management Waterfall model
software project management Waterfall model
 
Lecture 6 agile software development
Lecture 6   agile software developmentLecture 6   agile software development
Lecture 6 agile software development
 
Spiral Model - Software Development Life Cycle (SDLC)
Spiral Model - Software Development Life Cycle (SDLC)Spiral Model - Software Development Life Cycle (SDLC)
Spiral Model - Software Development Life Cycle (SDLC)
 
TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY
 TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY
TIME SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS: A SURVEY
 
10 software maintenance
10 software maintenance10 software maintenance
10 software maintenance
 
One m2m
One m2mOne m2m
One m2m
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
 
Rad model
Rad modelRad model
Rad model
 
Gerência de Configuração
Gerência de ConfiguraçãoGerência de Configuração
Gerência de Configuração
 
Component Based Development
Component Based DevelopmentComponent Based Development
Component Based Development
 

Viewers also liked

How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...Kai Wähner
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedGuido Schmutz
 
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...Sabri Skhiri
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Brian O'Neill
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedGuido Schmutz
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Tin Ho
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsJohann Schleier-Smith
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...Vishal Chowdhary
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
Docker - An Introduction
Docker - An IntroductionDocker - An Introduction
Docker - An IntroductionKnoldus Inc.
 
Big Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseBig Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseFujio Turner
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMateusz Dymczyk
 

Viewers also liked (20)

How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
How to Apply Machine Learning with R, H20, Apache Spark MLlib or PMML to Real...
 
Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct...
Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct...Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct...
Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct...
 
Apache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms comparedApache Storm vs. Spark Streaming - two stream processing platforms compared
Apache Storm vs. Spark Streaming - two stream processing platforms compared
 
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
Lambda Architecture 2.0 Convergence between Real-Time Analytics, Context-awar...
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Arquitectura Lambda
Arquitectura LambdaArquitectura Lambda
Arquitectura Lambda
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Docker - An Introduction
Docker - An IntroductionDocker - An Introduction
Docker - An Introduction
 
Big Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + CouchbaseBig Data - Fast Machine Learning at Scale + Couchbase
Big Data - Fast Machine Learning at Scale + Couchbase
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 

Similar to Real time machine learning

How Startups can leverage big data?
How Startups can leverage big data?How Startups can leverage big data?
How Startups can leverage big data?Rackspace
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...Grid Dynamics
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Big Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real TimeBig Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real TimeBigDataExpo
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big DataInfochimps, a CSC Big Data Business
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureMongoDB
 
Data Science, Machine Learning, and H2O
Data Science, Machine Learning, and H2OData Science, Machine Learning, and H2O
Data Science, Machine Learning, and H2OSri Ambati
 
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineTrieu Nguyen
 
Big data solutions explained for marketeers & business executives
Big data solutions explained for marketeers & business executivesBig data solutions explained for marketeers & business executives
Big data solutions explained for marketeers & business executivesAgile Delivery
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and AnalyticsVMware Tanzu
 
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward
 
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsMongoDB
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 
Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...
Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...
Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...Amazon Web Services
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataStylight
 

Similar to Real time machine learning (20)

How Startups can leverage big data?
How Startups can leverage big data?How Startups can leverage big data?
How Startups can leverage big data?
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Big Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real TimeBig Data Expo 2015 - Talend Delivering Real Time
Big Data Expo 2015 - Talend Delivering Real Time
 
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
[Webinar] Getting to Insights Faster: A Framework for Agile Big Data
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Data Science, Machine Learning, and H2O
Data Science, Machine Learning, and H2OData Science, Machine Learning, and H2O
Data Science, Machine Learning, and H2O
 
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
Recommendation engine
Recommendation engineRecommendation engine
Recommendation engine
 
Big data solutions explained for marketeers & business executives
Big data solutions explained for marketeers & business executivesBig data solutions explained for marketeers & business executives
Big data solutions explained for marketeers & business executives
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
 
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business InsightsWebinar: MongoDB and Hadoop - Working Together to provide Business Insights
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...
Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...
Build Next Generation Real-time Applications with SAP HANA on AWS (BDT211) | ...
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Real time machine learning

  • 1. 1 Real-time Machine Learning Vinoth Kannan Intelligent software architecture using Modified Lambda architecture & Apache Mahout SkillFactory 71 Vinoth.kannan@widas.de
  • 2. 2 Agenda What is Machine Learning ? Need for Real Time Machine Learning What is Lambda architecture ? What is Mahout ? How does a basic recommendor engine works ? Some Use Cases
  • 3. 3 What is machine learning?
  • 4. 4 Introduction Machine Learning from Streaming Data Model that considers recent history Model that is updatable Machine Learning It has been sunny and 30 degrees in the last two days, it is unlikely that it will be -10 degrees and snowing the next day A retail sales model that remains accurate as the business gets larger Dont they both mean the same ??
  • 5. 5 Introduction Machine Learning from Streaming Data Time-series prediction non-stationary data distributions weather Retail sales Model that considers recent history Model that is updatable
  • 6. 6 Introduction Machine Learning from non-stationary data distributions Incremental Algorithms non-stationary data distributions Batch algorithm These are machine learning algorithms that learn incrementally over the data. These are machine learning algorithms that re-trains periodically with a batch algorithm.
  • 7. 7 Introduction The Challenge for the Best Big Data Technology Hadoop Batch processing System that can churn huge volume of data Storm Real time complex event processing System that can process data stream
  • 9. 9 + = Real-time Big Data Its a Chance not a Challenge Lambda Architecture!!!
  • 11. Speed Layer • Only new data • Compensates for high latency Serving layer updates • Batch layer overrides speed layer Serving layer • Loads and expose the batch views for querying • Random access to batch views Batch layer • Immutable, constantly growing datasets • Batch views are computed from this raw dataset Lambda Architecture Overview with description
  • 12. Basic Idea behind Lambda architecture 12 query = function(all data) - Nathan Marz Big Data - Principles and best practices of scalable realtime data systems
  • 13. Basic Idea behind Lambda 13 Perform some function from real-time data “0“ to the history data “n“ Real Time Big Data Lambda Architecture Hadoop ProcessStorm ProcessReal Time Big Data } } }Letting the History data processed by Hadoop makes process faster
  • 14. The Problem 14 Batch ProcessReal-timeReal Time Big Data } } } • How to define the boundery between Real-time and Batch Process ? • How to synchronize the computation between the two system ? • How to avoid gaps and overlaps ? • What algorithm to use? • How to avoid failure and have fault tolerance mechanism ? Questions to be answered Unanswered questions of Lambda architecture
  • 15. Modified Lamda Architecture Presentation Layer • Presentation layer must aggregate the output of Storm and Hadoop outputs • User will see the result of his events in less than 2 seconds • Seamless merge between short and long term data
  • 17. 17 What is Mahout ? Introduction • Apache Software Foundation Java library • Scalable “machine learning“ library that runs on Hadoop mostly • Currently Mahout supports mainly four use cases Recommendation Clustering Classification Frequent Itemset mining • Core algorithms for clustering, classfication and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm
  • 18. 18 Basic Recommendor algorithm How it works Today‘s FOCUS : Suggesting item to user based on current search
  • 19. 19 Basic Recommendor algorithm Defining recommendation Two broad categories of recommender engine algorithms Mahout implements a collabrative filtering framework User-based Recommends items by finding similar users. Harder to scale because of dynamic nature of users Item-based Calculate similiarty between items and make recommendations. Items usually dont change much and hence could be calculated offline
  • 20. 20 Basic Recommendor algorithm Defining recommendation User Preference to an Item • Like Something • Dont Like something • Dont Care 1 Click = 1 Like = Uniform Preference Safe to assume
  • 21. Mahout Library of Algorithms Lots of algorithms to Choose From
  • 22. Use Cases Real Time Machine Learning eCommerce Objective : Increase sales revenue Match potential customer to the right product Personalise user experience on web and email Customer lifecycle management
  • 23. Use Cases Real Time Machine Learning Financial Services Objective : Real Time Fraud Detection Compute patterns/ predictors for individual customers Classify and Cluster custumers and recalculate patterns and predictors Set threshold across all data
  • 24. Use Cases Real Time Machine Learning Media Objective : Generating Meta Data Video/ Audio/Text analysis Find patterns/cluster for people, places, products, things
  • 25. Use Cases Real Time Machine Learning Carbookplus Objective : Generating Meta Data Match potential trips to right destination Recommend best gas station Recommend contacts whom user might know Match right advertisers to customer based on vehcile needs
  • 26. Summary Ability to create real time systems based on lambda architecture Usefulness of predictive algorithms Reason to concentrate on real time predicitions More Read http://storm-project.net/ http://mahout.apache.org/ http://hadoop.apache.org/ 26