The Power of Unified Analytics with Ali Ghodsi

Keynote from Ali Ghodsi at Spark + AI Summit Europe

Published in: Data & Analytics

  1. 1. Databricks: Unifying Data Science and Engineering
  2. 2. The beginnings of Apache Spark at UC Berkeley. AMPLab funded by tech companies: ● Got a glimpse at their most impactful internal projects ● They were leveraging massive amounts of data ● Doing high impact machine learning/AI. We wanted to democratize Data + AI
  3. 3. Apache Spark: Streaming; Spark SQL + DataFrames; MLlib (Machine Learning); GraphX (Graph Computation); Spark Core API; R, SQL, Python, Scala, Java
  4. 4. Databricks started in 2013: Bring Apache Spark to the Enterprise
  5. 5. Only 1% of enterprises successful with AI
  6. 6. Other 99% struggle due to organizational silos: Data Engineers, Data Scientists, IT, Line of Business
  7. 7. Databricks' goal is to unify data science & engineering
  8. 8. 3 challenges created by data & AI divide: 1 Data is not ready for AI; 2 Data and AI technology silos; 3 Data scientists and engineers are in silos
  9. 9. 1 Data is not ready for AI
  10. 10. Massive data in data lakes Data Lake
  11. 11. Vision to do AI on that data Data Lake AI
  12. 12. Data is not ready for AI (Data Lake, AI): Inconsistent Data; Lack of Schema; Slow Performance and Costly
  13. 13. Databricks Delta: brings data reliability and performance to data lakes. Fast Analytics + Data Reliability; Blob Storage
  14. 14. Databricks Delta: makes data ready for AI (Data Lake, Delta). Data Reliability: Schema Enforcement, ACID Transactions. Query Performance: Very Fast at Scale, Indexing (10-100x Faster). Reporting, Machine Learning, Alerting, Dashboards
  15. 15. 2 Data and AI technology silos
  16. 16. Data & AI technology silos Great for Data, but not AI Great for AI, but not for data TFServing TensorBoard Supporting and Deployment Libraries
  17. 17. Data & AI technology silos Great for Data, but not AI Great for AI, but not for data TFServing TensorBoard Supporting and Deployment Libraries
  18. 18. 3 Data scientists and engineers are in silos
  19. 19. Data scientists & engineers are in silos (Data Engineers, Data Scientists): 1 Data Prep: hard to make pipelines reliable; 2 Build Model: challenging to track and reproduce experiments; 3 Deploy Model: have to ensure reliability, SLAs, and quality
  20. 20. Databricks MLflow: unifies data scientists & engineers (Data Engineers, Data Scientists)
  21. 21. Databricks MLflow: unifies data scientists & engineers (Data Engineers, Data Scientists): 1 Data Prep: build reliable data pipelines, track the datasets (Databricks Delta); 2 Build Model: track experiments, reproduce experiments (MLflow Project & Tracker, Databricks Runtime for ML); 3 Deploy Model: deploy models in production, track their quality (MLflow Serving)
  22. 22. Databricks MLflow: unifies data scientists & engineers (Data Engineers, Data Scientists): 1 Data Prep: build reliable data pipelines, track the datasets (Databricks Delta); 2 Build Model: track experiments, reproduce experiments (Databricks Runtime for ML, MLflow Project & Tracker); 3 Deploy Model: deploy models in production, track their quality (MLflow Serving). Announcing: time travel + Delta
  23. 23. Confidential – for Gartner briefing only
  24. 24. Databricks makes AI possible for the other 99% by unifying data science and engineering
  25. 25. Databricks Unified Analytics demo by Michael Armbrust