Moving from batch to real-time machine learning
1. From square to round wheels...
...moving from batch to real-time machine learning
tumra.com
@tumra
TUMRA LTD, Building 3, Chiswick Park,
566 Chiswick High Road, W4 5YA
Michael Cutler - 6th Sept 2012
4. In Manufacturing...
Batch processing brought advantages :-
● Increased scale of production
● Reduced manufacturing cost
● Economies of scale (reusable parts)
However :-
● Machinery is complex & expensive
● Each product requires some bespoke parts
5. In Technology...
Batch processing has been around since the 1950s on mainframes
Hadoop (Map/Reduce) advantages :-
● Increased scale of processing
● Reduced processing cost **
● Economies of scale (reusable code)
However :-
● Complex & expensive **
● Most jobs require some bespoke code
6. Map/Reduce != FUN
Sure, it's "just Java", but...
● Requires a certain mindset
● Multi-stage algorithm complexity
● If you get stuck, R.T.F.S.
Alleviated to an extent by tools like :-
● Pig, Hive, Cascading, Crunch
Typically requires bespoke code / algorithms
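To illustrate the mindset Map/Reduce asks for, here is a minimal single-process sketch of a word count split into map, shuffle and reduce stages. It is plain Python rather than Hadoop code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["batch is slow", "batch is cheap"]))
```

Even this toy job has to be recast as key/value pairs. Anything that needs more than one pass becomes a chain of such jobs, which is where the multi-stage complexity comes from.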
9. In manufacturing...
Described as:
"a method used to manufacture, produce, or
process materials without interruption"
Key features :-
● Materials are processed in flows & streams
● Can run continuously (exc. maintenance)
● Latency e2e can be from seconds to hours
Credit: Wikipedia
10. In Technology...
We have a problem... most Hadoop related
technologies are inherently batch!!
The trend towards real-time continuous
computation requires :-
● New tools (Storm?)
● Better algorithms
So what's the solution?
13. Batch does have its place...
Map/Reduce is great for 'boil the ocean' jobs;
● tasks that take hours or days
● typically non-interactive with users
● works well for pattern mining, clustering etc.
However, the 'perfect' answer is useless if it
arrives so late it's irrelevant...
14. Real-time machine learning
Quite simply "data is never at rest"...
● processed in streams not batches
● best for 'supervised learning' models
● end-to-end latency can be in seconds
Key criteria :-
● model always has a 'best answer' available
● feedback used to train the model
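The two criteria above can be sketched with an online classifier: it can be asked for a prediction at any time, and each piece of feedback updates it immediately. This is a minimal sketch assuming logistic regression trained by stochastic gradient descent; the class name, method names and learning rate are illustrative and not taken from any TUMRA model.

```python
import math


class OnlineLogisticRegression:
    """Binary classifier that learns from one labelled example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Current 'best answer': probability that x belongs to class 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, x, label):
        """Feedback step: one gradient update from a single (x, label) pair."""
        error = label - self.predict_proba(x)
        self.weights = [
            w + self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias += self.learning_rate * error


if __name__ == "__main__":
    model = OnlineLogisticRegression(n_features=2)
    stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200  # stand-in for a live feed
    for features, label in stream:
        model.predict_proba(features)  # an answer is available before feedback arrives
        model.learn(features, label)   # feedback trains the model straight away
    print(model.predict_proba([1.0, 0.0]), model.predict_proba([0.0, 1.0]))
```

Each update touches only the current example, so the work per event stays constant and no batch job is needed to keep the model current.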
16. So what works well in real-time?
Classification :-
● Easiest to implement
Clustering :-
● Periodically batch recompute clusters
● Add new data points to the nearest centroid
● Rinse, repeat
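The clustering loop above might look roughly like this. It is a sketch under assumptions: the batch step is plain k-means (Lloyd's algorithm), and the streaming step nudges the nearest centroid towards each new point using a running mean, which goes slightly beyond simply assigning the point. All function names are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def add_point(point, centroids, counts):
    """Streaming step: assign point to its nearest centroid and update the running mean."""
    i = nearest_centroid(point, centroids)
    counts[i] += 1
    step = 1.0 / counts[i]
    centroids[i] = [c + step * (p - c) for c, p in zip(centroids[i], point)]
    return i


def batch_recompute(points, centroids, iterations=10):
    """Periodic batch step: re-run k-means over the stored points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else old
            for members, old in zip(clusters, centroids)
        ]
    counts = [len(members) for members in clusters]
    return centroids, counts


if __name__ == "__main__":
    history = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    centroids, counts = batch_recompute(history, [[0.0, 0.0], [10.0, 10.0]])
    print(add_point([1.0, 1.0], centroids, counts), centroids)
```

The streaming step costs one pass over the centroids per point, so it can run inline with the data feed. The batch step can run on whatever schedule the data drift calls for.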
Collaborative filtering :-
18. Machine learning gap...
Academia are 'way out there' with new
approaches and algorithms almost every day :-
● Many hard to implement in a parallel way
We need more focus on :-
● Inherently distributed algorithms
● Practical implementations
● Speed over marginal accuracy improvements
19. Mathematical navel gazing
We need practical solutions to real-world
problems...
Recommendations Rant!?!?!?!?!
● Most recommenders are 2D matrices
● Humans are not very 2D
● Is there an N-dimensional solution?
24. Introducing TUMRA Labs
API access to some of our real-time models :-
● Probabilistic Demographics
● Language detection **
● Sentiment analysis **
● Metadata Generation (entity extraction and
disambiguation) **
Free to sign up and easy to get started!
http://labs.tumra.com/