...Moving from batch to real-time machine learning
Michael Cutler CTO @Tumra talk at Data Science London 6/09/12


Published in: Technology, Education

Transcript

  • 1. From square to round wheels...
    ...moving from batch to real-time machine learning
    tumra.com @tumra
    TUMRA LTD, Building 3, Chiswick Park, 566 Chiswick High Road, W4 5YA
    Michael Cutler - 6th Sept 2012
  • 2. Batch Processing
  • 3. Credit: http://bit.ly/Q71u4W
  • 4. In Manufacturing...
    Batch processing brought advantages :-
      ● Increased scale of production
      ● Reduced manufacturing cost
      ● Economies of scale (reusable parts)
    However :-
      ● Machinery is complex & expensive
      ● Each product requires some bespoke parts
  • 5. In Technology...
    Been around since the 50s in Mainframes
    Hadoop (Map/Reduce) advantages :-
      ● Increased scale of processing
      ● Reduced processing cost **
      ● Economies of scale (reusable code)
    However :-
      ● Complex & expensive **
      ● Most jobs require some bespoke code
  • 6. Map/Reduce != FUN
    Sure, it's "just Java" but...
      ● Requires a certain mindset
      ● Multi-stage algorithm complexity
      ● If you get stuck, R.T.F.S.
    Alleviated to an extent by tools like :-
      ● Pig, Hive, Cascading, Crunch
    Typically requires bespoke code / algorithms
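To illustrate the "certain mindset" the slide refers to, here is a minimal single-process sketch of the map, shuffle and reduce stages for a word count. It is written in plain Python rather than Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this illustration, not part of any framework.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Even this trivial job has to be expressed as key/value pairs flowing through separate stages; algorithms that need several such passes are where the multi-stage complexity mentioned on the slide comes from.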
  • 7. Continuous Processing
  • 8. Credit: http://bit.ly/NOslqf
  • 9. In manufacturing...
    Described as: "a method used to manufacture, produce, or process materials without interruption"
    Key features :-
      ● Materials are processed in flows & streams
      ● Can run continuously (exc. maintenance)
      ● Latency e2e can be from seconds to hours
    Credit: Wikipedia
  • 10. In Technology...
    We have a problem... most Hadoop-related technologies are inherently batch!!
    The trend towards real-time continuous computation requires :-
      ● New tools (Storm?)
      ● Better algorithms
    So what's the solution?
  • 11. Credit: Scott Simmerman http://bit.ly/9cxaHt
  • 12. It's a hybrid of both!
  • 13. Batch does have its place...
    Map/Reduce is great for "boil the ocean" jobs;
      ● tasks that take hours or days
      ● typically non-interactive with users
      ● works well for pattern mining, clustering etc.
    However, the perfect answer is useless if it arrives so late it's irrelevant...
  • 14. Real-time machine learning
    Quite simply "data is never at rest"...
      ● processed in streams not batches
      ● best for supervised learning models
      ● end-to-end latency can be in seconds
    Key criteria :-
      ● model always has a best answer available
      ● feedback used to train the model
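The two key criteria on this slide (a best answer is always available, and feedback trains the model) can be sketched as an online learner. The following is a minimal illustration, assuming a logistic regression updated one example at a time by stochastic gradient descent. The class name `OnlineLogisticRegression` and its methods are hypothetical and are not taken from the talk or from TUMRA's models.

```python
import math


class OnlineLogisticRegression:
    """Binary classifier that is updated one labelled example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the current best estimate of P(label = 1 | x)."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-35.0, min(35.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, x, label):
        """Take one gradient step on the log loss using feedback (label is 0 or 1)."""
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
```

In a stream, `predict_proba` can be called at any moment, even before any feedback has arrived, and `learn` is called whenever the true outcome becomes known. There is no separate training phase to wait for.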
  • 15. So what works well in real-time?
    Classification :-
      ● Easiest to implement
    Clustering :-
      ● Periodically batch recompute clusters
      ● Add new data points to the nearest centroid
      ● Rinse, repeat
    Collaborative filtering :-
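The clustering steps on this slide can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a running-mean centroid update. The class `StreamingCentroids` and its methods are hypothetical names, and the periodic batch recompute is assumed to happen elsewhere (for example in a Map/Reduce job) and to hand its result to `reset`.

```python
import math


class StreamingCentroids:
    """Assign streamed points to the nearest centroid between batch recomputes."""

    def __init__(self, centroids):
        self.reset(centroids)

    def reset(self, centroids):
        """Swap in centroids produced by the periodic batch recompute."""
        self.centroids = [list(c) for c in centroids]
        # Assumption: each batch centroid counts as a single prior point.
        self.counts = [1] * len(self.centroids)

    def nearest(self, point):
        """Return the index of the centroid closest to the point."""
        return min(
            range(len(self.centroids)),
            key=lambda i: math.dist(point, self.centroids[i]),
        )

    def add(self, point):
        """Assign a new point and nudge its centroid towards it (running mean)."""
        i = self.nearest(point)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]
        self.centroids[i] = [
            c + step * (p - c) for c, p in zip(self.centroids[i], point)
        ]
        return i
```

Each call to `add` is cheap enough to run per event. Calling `reset` with freshly recomputed centroids is the "rinse, repeat" step.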
  • 16. The machine learning gap...
    Academic | Practical
  • 17. Machine learning gap...
    Academia are way out there with new approaches and algorithms almost every day :-
      ● Many hard to implement in a parallel way
    We need more focus on :-
      ● Inherently distributed algorithms
      ● Practical implementations
      ● Speed over marginal accuracy improvements
  • 18. Mathematical navel gazing
    We need practical solutions to real-world problems...
    Recommendations Rant!?!?!?!?!
      ● Most recommenders are 2D matrices
      ● Humans are not very 2D
      ● Is there an N-dimensional solution?
  • 19. Hybrid approach
  • 20. Hybrid approach
  • 21. Example Use-cases
    Examples;
      ● eCommerce optimisation
      ● Targeted advertising
      ● Financial services (risk modeling)
      ● Detecting anomalies in M2M data
      ● Automated metadata generation
    ... many more!
  • 22. Almost finished!
  • 23. Introducing TUMRA Labs
    API access to some of our real-time models :-
      ● Probabilistic Demographics
      ● Language detection **
      ● Sentiment analysis **
      ● Metadata Generation (entity extraction and disambiguation) **
    Free to sign up and easy to get started! http://labs.tumra.com/
  • 24. Questions? tumra.com @tumra