After attending Spark Summit in San Francisco in early June, I wanted to share the key takeaways from the conference with my local friends at the Triangle Apache Spark Meetup.
Jean-Georges Perrin, Senior Enterprise Architect | Lifetime IBM Champion at The NPD Group
1. Zaloni Confidential and Proprietary - Provided under NDA
Spark Summit 2017
TASM Feedback
Jean Georges Perrin / jgperrin@zaloni.com
2017-06-22
2.
Zaloni is hiring!
Check out https://www.zaloni.com/about/careers/
Forbes: Best Big Data Companies And CEOs To Work For In 2017
3.
Logistics
• June 5-7, 2017
• San Francisco's Moscone Center
• Just under 3,000 attendees
• 11 tracks: Data Science, Data Science 2, Developer, Enterprise, Machine Learning, Research, Spark Ecosystem, Use Cases, Sponsored Sessions, Streaming, Technical Deep Dives
• About 30 exhibitors
• About 50 sponsors
• At least four French speakers
• One Zaloni speaker
8.
Deep Learning - Making it Easier
• Initiative from Databricks
▪ https://databricks.com/blog/2017/06/06/databricks-vision-simplify-large-scale-deep-learning.html
▪ https://github.com/databricks/spark-deep-learning
• Easier integration of TensorFlow and other frameworks
• Partnership with Stanford U
11.
Yes, it is!
Christopher Ré, Stanford U
20.
Comcast
• Building a Mica-like tool internally.
• Looking at open-sourcing it.
• Video: https://www.youtube.com/watch?v=-hDIkTUPhZY&feature=youtu.be
• Slides: https://www.slideshare.net/databricks/using-sparkml-to-power-a-dsaas-data-science-as-a-service-with-kiran-muglurmath-and-sridhar-alla
21.
Suning Too...
24.
Suning's Extensions to Spark ML
25.
Suning - ML as a Service
• Giving ML capabilities to business users, mainly in fraud detection.
• Slides: https://www.slideshare.net/databricks/machine-learning-as-a-service-apache-spark-mllib-enrichment-and-webbased-codeless-modeling-with-zhengyi-le
• Video: https://www.youtube.com/watch?v=R4VEHoCvHy4&feature=youtu.be
26.
A Religious War About to Start?
27.
The Cloud is Too (Damn) Hard!
28.
More and More NLP and Spark
30.
Serverless is the Future of Cloud
33.
Serverless
• Dynamic allocation of resources.
• More flexibility for the customers.
• Lower TCO.
• Non-blocking jobs.
• Faster.
• Matching Amazon's offerings?
34.
Up to 12x Faster
35.
Intermission with Ben, Ion, and Matei
Ben Lorica (O’Reilly Media)
Ion Stoica (UC Berkeley AMP/RISELab & Databricks)
Matei Zaharia (Databricks)
37.
DRY & DRO
38.
Smarter Notebooks
39.
Microsoft Fully Embracing the Apache Stack
43.
Finally Some Common Sense!
44.
Spark 2.2 rocks!
• Simply Faster.
▪ Autoboxing kills performance!
▪ Scala sucks (yeah!)
▪ Better Catalyst, including cost-based optimizer (donated by IBM).
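The autoboxing point can be illustrated with a small, generic JVM sketch. This is not Spark code and not taken from the session. The class and method names (`AutoboxingDemo`, `sumBoxed`, `sumPrimitive`) are hypothetical, and the example only assumes that the remark refers to the usual cost of wrapping primitives in objects inside hot loops.

```java
// Hypothetical illustration of autoboxing overhead; not Spark internals.
public class AutoboxingDemo {

    // Boxed accumulator: each += unboxes the Long, adds, then boxes the
    // result again, which may allocate a new Long object per iteration.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: same arithmetic, no wrapper objects.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 50_000_000;

        long t0 = System.nanoTime();
        long boxed = sumBoxed(n);
        long t1 = System.nanoTime();
        long primitive = sumPrimitive(n);
        long t2 = System.nanoTime();

        // Timings vary by JVM and hardware; this is a rough comparison,
        // not a rigorous benchmark (no warm-up, no JMH).
        System.out.println("boxed     sum=" + boxed + " ms=" + (t1 - t0) / 1_000_000);
        System.out.println("primitive sum=" + primitive + " ms=" + (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same sum, and only the accumulator type differs. On a typical JVM the boxed version tends to run noticeably slower, though the exact ratio depends on the runtime and its optimizations.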
45.
GPU Analytics is a Trend
46.
Analytics on GPU? FPGA?
• IBM mentioned it.
• 4 sessions on the subject.
▪ 3 sessions on GPU
▪ 2 sessions on FPGA
• Vendors: MapD, Intel, Nvidia.
48.
Classics
• Databricks
• Intel
• IBM
• Cloudera
• Pepperdata
• Cask
• Mesosphere
• Google Cloud
• Amazon
• MapR
• NetApp
• BlueTalon
• Dataiku
• Talend
• MemSQL
• Redis
• Microsoft
• Confluent
• VMware
• ...
• Not Hortonworks
49.
Others
• GridGain - in-memory DB
• SnappyData - in-memory DB
• Target - looking to hire people
• Yelp - looking to hire people
52.
And the best session of all time...
54.
The Key to ML is Prepping the Right Data
• Video: https://www.youtube.com/watch?v=ka8xhQAoj-E&feature=youtu.be (go like it!)
• Slides:
▪ On Databricks' channel: https://www.slideshare.net/databricks/the-key-to-machine-learning-is-prepping-the-right-data-with-jean-georges-perrin (go like it!)
▪ On my channel: https://www.slideshare.net/jgperrin/the-key-to-machine-learning-is-prepping-the-right-data (go like it!)