From square to round wheels...
       ...moving from batch to real-time machine learning


                                        tumra.com
                                         @tumra
TUMRA LTD, Building 3, Chiswick Park,
566 Chiswick High Road, W4 5YA                      Michael Cutler - 6th Sept 2012
Batch
Processing
Credit: http://bit.ly/Q71u4W
In Manufacturing...
Batch processing brought advantages :-
 ● Increased scale of production

 ● Reduced manufacturing cost

 ● Economies of scale (reusable parts)




However :-
● Machinery is complex & expensive

● Each product requires some bespoke parts
In Technology...
Batch processing has been around since the 1950s on mainframes

Hadoop (Map/Reduce) advantages :-
● Increased scale of processing

● Reduced processing cost **

● Economies of scale (reusable code)




However :-
● Complex & expensive **

● Most jobs require some bespoke code
Map/Reduce != FUN
Sure, it's "just Java" but...
 ● Requires a certain mindset

 ● Multi-stage algorithm complexity

 ● If you get stuck, R.T.F.S.




Alleviated to an extent by tools like :-
 ● Pig, Hive, Cascading, Crunch




Typically requires bespoke code / algorithms
Continuous
Processing
Credit: http://bit.ly/NOslqf
In manufacturing...
Described as:
  "a method used to manufacture, produce, or
  process materials without interruption"

Key features :-
 ● Materials are processed in flows & streams

 ● Can run continuously (exc. maintenance)

 ● End-to-end latency can range from seconds to hours




                                            Credit: Wikipedia
In Technology...
We have a problem... most Hadoop-related
technologies are inherently batch!!

The trend towards real-time continuous
computation requires :-
 ● New tools (Storm?)

 ● Better algorithms




So what's the solution?
Credit: Scott Simmerman
     http://bit.ly/9cxaHt
It's a hybrid of both!
Batch does have its place...
Map/Reduce is great for 'boil the ocean' jobs:
● tasks that take hours or days

● typically non-interactive with users

● works well for pattern mining, clustering etc.




However, the 'perfect' answer is useless if it
arrives so late it's irrelevant...
Real-time machine learning
Quite simply "data is never at rest"...
● processed in streams not batches

● best for 'supervised learning' models

● end-to-end latency can be in seconds




Key criteria :-
 ● model always has a 'best answer' available

 ● feedback used to train the model
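
The two key criteria above can be sketched with an online classifier: it can answer at any moment, and each piece of feedback nudges the weights. This is a minimal sketch assuming logistic regression trained by stochastic gradient descent; the class name `OnlineLogisticRegression`, the learning rate and the toy stream are illustrative assumptions, not the model described in the talk.

```python
import math


class OnlineLogisticRegression:
    """Binary classifier that learns one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Current 'best answer': probability that x belongs to class 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        """Feedback step: one gradient update from a single labelled example."""
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=2)
initial = model.predict_proba([1.0, 0.0])  # untrained model still answers

# Toy stream (made up for illustration): predict first, then learn.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for features, label in stream:
    guess = model.predict_proba(features)
    model.update(features, label)
```

There is no separate training job here: prediction and learning interleave on the same stream, so latency is bounded by one update rather than one batch run.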
So what works well in real-time?
Classification :-
 ● Easiest to implement




Clustering :-
 ● Periodically batch recompute clusters

 ● Add new data points to the nearest centroid

 ● Rinse, repeat
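
The clustering loop above might look like the following sketch: centroids come from the most recent batch run, each new point is assigned to its nearest centroid, and that centroid is moved using a running mean. The class `StreamingCentroids`, its methods and the running-mean update are assumptions made for illustration; the slide does not say how centroids are updated between batch runs.

```python
class StreamingCentroids:
    """Assigns streaming points to centroids computed by a batch job."""

    def __init__(self, centroids, counts=None):
        self.replace(centroids, counts)

    def replace(self, centroids, counts=None):
        """Swap in fresh centroids after a periodic batch recompute."""
        self.centroids = [list(c) for c in centroids]
        # counts = how many points each centroid summarises (assumed 1 if unknown)
        self.counts = list(counts) if counts else [1] * len(self.centroids)

    def nearest(self, point):
        """Index of the centroid with the smallest squared Euclidean distance."""
        distances = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in self.centroids
        ]
        return distances.index(min(distances))

    def add(self, point):
        """Assign a new point and shift its centroid towards it."""
        idx = self.nearest(point)
        self.counts[idx] += 1
        n = self.counts[idx]
        self.centroids[idx] = [
            c + (p - c) / n for c, p in zip(self.centroids[idx], point)
        ]
        return idx


clusters = StreamingCentroids([[0.0, 0.0], [10.0, 10.0]])
first = clusters.add([2.0, 0.0])
second = clusters.add([9.0, 9.0])
```

Calling `replace` with the output of the next batch job is the "rinse, repeat" step: it corrects any drift accumulated by the incremental updates.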




Collaborative filtering :-
The machine learning gap...
Academic                      Practical
Machine learning gap...
Academia are 'way out there' with new
approaches and algorithms almost every day :-
 ● Many hard to implement in a parallel way




We need more focus on :-
● Inherently distributed algorithms

● Practical implementations

● Speed over marginal accuracy improvements
Mathematical navel gazing
We need practical solutions to real-world
problems...



Recommendations Rant!?!?!?!?!
 ● Most recommenders are 2D matrices

 ● Humans are not very 2D

 ● Is there an N-dimensional solution?
Hybrid approach
Hybrid approach
Example Use-cases
Examples:
 ● eCommerce optimisation

 ● Targeted advertising

 ● Financial services (risk modelling)

 ● Detecting anomalies in M2M data

 ● Automated metadata generation




... many more!
Almost finished!
Introducing TUMRA Labs
API access to some of our real-time models :-
 ● Probabilistic Demographics

 ● Language detection **

 ● Sentiment analysis **

 ● Metadata Generation (entity extraction and

   disambiguation) **

    Free to sign up and easy to get started!

            http://labs.tumra.com/
Questions?
  tumra.com
   @tumra



