Spark Summit Europe 2016 Keynote - Databricks CEO


The machine learning algorithm itself is rarely the main barrier to building AI applications. Instead, the real culprit is the set of complex systems that prepare large-scale training and test data for the ML algorithms.

Apache Spark is a huge leap forward in democratizing AI. However, it does not solve all the problems. Databricks CEO Ali Ghodsi explains how Databricks democratizes AI by making it easier to build end-to-end machine learning pipelines with Apache Spark.

Published in: Software

  1. Democratizing AI with Apache Spark. Ali Ghodsi, Co-Founder and CEO.
  2. AI is changing the world. Why now? AlphaGo, SIRI/assistants, self-driving cars.
  3. Data is the catalyst. AI hasn’t been democratized. Better training, tuning, validation. More data: clickstreams, sensor data (IoT), video, speech, handwriting, …
  4. The hardest part of AI isn’t AI. “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015. How do we democratize AI?
  5. “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015. + AI: FLEXIBLE, FAST, BIG DATA.
  6. Some gaps remain. (1) Manage data infrastructure: create, configure, monitor resilient big data clusters; securely access silos of disparate data sources; enforce proper data governance. (2) Empower teams to be productive: interactively explore data and prototype ideas; securely share big data clusters among analysts; debug, troubleshoot, version-control big data applications. (3) Establish production-ready applications: set up robust ML data pipelines for ETL/ELT; productionize real-time applications with HA, FT; build, serve, maintain advanced machine learning models.
  7. Databricks: Closing the gap. (1) Just-in-Time Data Platform (Agile + Low TCO): separate compute & storage; integrate existing data stores; efficient cache on first access. (2) Integrated Workspace (Accelerate Time to Value): interactive notebooks, dashboards, reports; real-time exploration, machine learning, graph use cases. (3) Automated Spark Management (Performance): workflow scheduler for ML, streaming, SQL, ETL; performance-optimized, high availability, fault-tolerant.
  8. Enterprise AI use-cases. Predict credit score, credit limit, anomalies. Predict energy demand based on massive weather data. Natural language processing to extract author graph. Predict player churn, predict network outages. Predict machine equipment failure.
  9. New Frontier of AI: Deep Learning. Detect cancer. Understand speech. Infer location. Identify landmarks in photos. Recognize Mandarin and English. Improve cancer detection.
  10. Faster and easier deep learning with Databricks. GPUs: massive parallelism. TensorFlow: the most popular deep learning framework. TensorFrames: makes TensorFlow computations faster and easier to program on Spark. TensorFlow on TensorFrames and GPUs support out-of-the-box.
  11. Deep Learning on Databricks. Data ingest, feature extraction, model training, productionize. Clusters, Jobs & Workflows, TensorFrames + GPUs. Interactive exploration, just-in-time data platform, automated management.
  12. Thank you.