23. The future of analytics is distributed.
• Your data sources and targets are distributed.
– You may only need a snippet of data
– You still have to retrieve that snippet
• Data movement is expensive.
• Data requirements are expanding.
• Machine learning algorithms can use more data.
• When you need capability, you’d better have it.
41. Summary
• Yes, distributed machine learning is necessary.
• We need a generalized distribution framework.
• Today, Spark is the only game in town.
• The race is on to deliver push-down Spark integration.
42. THANK YOU.
Thomas W. Dinsmore
@thomaswdinsmore
The Big Analytics Blog
www.thomaswdinsmore.com