...Moving from batch to real-time machine learning
Michael Cutler CTO @Tumra talk at Data Science London 6/09/12

Published in: Technology, Education
Transcript of "...Moving from batch to real-time machine learning"

  1. From square to round wheels... ...moving from batch to real-time machine learning
     tumra.com @tumra
     TUMRA LTD, Building 3, Chiswick Park, 566 Chiswick High Road, W4 5YA
     Michael Cutler - 6th Sept 2012
  2. Batch Processing
  3. Credit: http://bit.ly/Q71u4W
  4. In Manufacturing...
     Batch processing brought advantages :-
       ● Increased scale of production
       ● Reduced manufacturing cost
       ● Economies of scale (reusable parts)
     However :-
       ● Machinery is complex & expensive
       ● Each product requires some bespoke parts
  5. In Technology...
     Been around since the 50s in mainframes
     Hadoop (Map/Reduce) advantages :-
       ● Increased scale of processing
       ● Reduced processing cost **
       ● Economies of scale (reusable code)
     However :-
       ● Complex & expensive **
       ● Most jobs require some bespoke code
  6. Map/Reduce != FUN
     Sure, it's "just Java" but...
       ● Requires a certain mindset
       ● Multi-stage algorithm complexity
       ● If you get stuck, R.T.F.S.
     Alleviated to an extent by tools like :-
       ● Pig, Hive, Cascading, Crunch
     Typically requires bespoke code / algorithms
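To illustrate the mindset slide 6 refers to, here is a minimal single-process sketch of the map, shuffle, and reduce stages. It is written in plain Python rather than against Hadoop's Java API, and the word-count task and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["batch is batch", "stream is stream"])
```

Even this toy job needs three cooperating functions, which hints at why multi-stage algorithms grow complex quickly.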
  7. Continuous Processing
  8. Credit: http://bit.ly/NOslqf
  9. In manufacturing...
     Described as: "a method used to manufacture, produce, or process materials without interruption"
     Key features :-
       ● Materials are processed in flows & streams
       ● Can run continuously (exc. maintenance)
       ● Latency e2e can be from seconds to hours
     Credit: Wikipedia
  10. In Technology...
      We have a problem... most Hadoop-related technologies are inherently batch!!
      The trend towards real-time continuous computation requires :-
        ● New tools (Storm?)
        ● Better algorithms
      So what's the solution?
  11. Credit: Scott Simmerman http://bit.ly/9cxaHt
  12. It's a hybrid of both!
  13. Batch does have its place...
      Map/Reduce is great for "boil the ocean" jobs:
        ● tasks that take hours or days
        ● typically non-interactive with users
        ● works well for pattern mining, clustering etc.
      However, the perfect answer is useless if it arrives so late it's irrelevant...
  14. Real-time machine learning
      Quite simply "data is never at rest"...
        ● processed in streams not batches
        ● best for supervised learning models
        ● end-to-end latency can be in seconds
      Key criteria :-
        ● model always has a best answer available
        ● feedback used to train the model
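The two key criteria on slide 14 can be sketched with an online learner that answers first and learns from feedback afterwards. The example below assumes logistic regression trained by stochastic gradient descent, which is one possible choice and not necessarily what the speaker used. The class name, learning rate, and toy stream are all hypothetical.

```python
import math


class OnlineLogisticRegression:
    """Binary classifier updated one example at a time (assumed technique)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Best answer currently available: P(label == 1 | x)."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-35.0, min(35.0, z))  # clamp to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, x, label):
        """Feedback step: one gradient update on the log-loss for (x, label)."""
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=2)
# Toy stream: feature 0 signals label 1, feature 1 signals label 0.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200

for features, label in stream:
    guess = model.predict_proba(features)  # answer is available immediately
    model.learn(features, label)           # feedback arrives, model adapts
```

The model never waits for a batch job to finish: it starts out answering 0.5 and sharpens with every piece of feedback.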
  15. So what works well in real-time?
      Classification :-
        ● Easiest to implement
      Clustering :-
        ● Periodically batch recompute clusters
        ● Add new data points to the nearest centroid
        ● Rinse, repeat
      Collaborative filtering :-
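The clustering loop on slide 15 might look like the sketch below. It assumes plain Lloyd's k-means for the periodic batch step and reads "add to the nearest centroid" as assigning each arriving point to its closest centroid. The function names, the `recompute_every` parameter, and the sample data are hypothetical.

```python
import random


def nearest(centroids, point):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
    )


def batch_kmeans(points, k, iterations=10, seed=0):
    """Batch step: Lloyd's k-means over every point seen so far."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def run_stream(stream, k, recompute_every=4):
    """Assign points as they arrive; recompute clusters every few points."""
    history, centroids, assignments = [], None, []
    for point in stream:
        history.append(point)
        due = centroids is None or len(history) % recompute_every == 0
        if len(history) >= k and due:
            centroids = batch_kmeans(history, k)  # periodic batch recompute
        if centroids is not None:
            assignments.append(nearest(centroids, point))  # real-time step
    return centroids, assignments


points = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [10.0, 10.1],
          [0.0, 0.2], [9.9, 10.0], [0.1, 0.1], [10.1, 10.0]]
centroids, assignments = run_stream(points, k=2)
```

Cluster indices can swap between recomputes, so a real system would need a stable way to label clusters across batches.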
  16. The machine learning gap...
      Academic | Practical
  17. Machine learning gap...
      Academia are way out there with new approaches and algorithms almost every day :-
        ● Many hard to implement in a parallel way
      We need more focus on :-
        ● Inherently distributed algorithms
        ● Practical implementations
        ● Speed over marginal accuracy improvements
  18. Mathematical navel gazing
      We need practical solutions to real-world problems...
      Recommendations Rant!?!?!?!?!
        ● Most recommenders are 2D matrices
        ● Humans are not very 2D
        ● Is there an N-dimensional solution?
  19. Hybrid approach
  20. Hybrid approach
  21. Example Use-cases
      Examples:
        ● eCommerce optimisation
        ● Targeted advertising
        ● Financial services (risk modeling)
        ● Detecting anomalies in M2M data
        ● Automated metadata generation
      ... many more!
  22. Almost finished!
  23. Introducing TUMRA Labs
      API access to some of our real-time models :-
        ● Probabilistic Demographics
        ● Language detection **
        ● Sentiment analysis **
        ● Metadata Generation (entity extraction and disambiguation) **
      Free to sign up and easy to get started! http://labs.tumra.com/
  24. Questions? tumra.com @tumra
