Data Analytics Meetup: Azure Databricks Deep Learning Pipelines

Microsoft Azure Databricks is an Apache Spark-based tool that allows data scientists and analysts to collaborate in a comprehensive workspace. Journey through the fundamentals of Azure Databricks with Data Scientist Ahmed Sherif as he explores the tool's collaborative environment and demos its machine learning capabilities.

See Ahmed's video presentation here: http://ccganalytics.com/resources/videos/building-with-azure-databricks

Published in: Technology

  1. 1. Ahmed Sherif, Technology Solutions Professional – Microsoft Data & AI • Thursday, August 30, 2018
  2. 2. Agenda • Spark: A Brief History • Azure Databricks: A Quick Overview • Azure Databricks Notebooks • Azure Databricks & AI • Demo
  3. 3. Allow Myself to Introduce... Myself
  4. 4. Spark: A Brief History
  5. 5. Spark in One Sentence
  6. 6. Spark: A Brief History
  7. 7. Apache Spark • A unified, open source, parallel data processing framework for big data analytics • Spark Core Engine • Spark SQL: interactive queries • Spark Structured Streaming: stream processing • Spark MLlib: machine learning • Spark Streaming: stream processing • GraphX: graph computation
  8. 8. Azure Databricks A Quick Overview
  9. 9. Databricks - Company Overview
  10. 10. Azure Databricks • Microsoft Azure
  11. 11. Azure Databricks key audiences & benefits • Audiences: Data scientist • Data engineer • CDO, VP of analytics • Benefits: Unified analytics platform • Integrated workspace • Easy data exploration • Collaborative experience • Interactive dashboards • Faster insights • Best Spark & serverless • Databricks managed Spark • Improved ETL performance • Zero management clusters, serverless • Easy to schedule jobs • Automated workflows • Enhanced monitoring & troubleshooting • Automated alerts & easy access to logs • Zero Management Spark Cluster democratization (serverless) • Fast, collaborative analytics platform accelerating time to market • No dev-ops required • Enterprise grade security: encryption, end-to-end auditing, role-based control, compliance • Provided by Microsoft and Databricks under NDA
  12. 12. Azure Databricks Notebooks Not just your slightly older college roommate’s Jupyter Notebook
  13. 13. Azure Databricks Notebooks Overview • Notebooks are a popular way to develop and run Spark applications
  14. 14. Visualization • Azure Databricks supports a number of visualization plots out of the box. • All notebooks, regardless of their language, support Databricks visualizations. • The visualizations are written in HTML.
  15. 15. Mixing Languages in Notebooks • Normally a notebook is associated with a specific language. • However, with Azure Databricks notebooks, you can mix multiple languages in the same notebook by using the language magic commands: • %python executes Python code in a notebook (even if that notebook is not Python). • %sql executes SQL code in a notebook (even if that notebook is not SQL). • %r executes R code in a notebook (even if that notebook is not R). • %scala executes Scala code in a notebook (even if that notebook is not Scala). • %sh executes shell code in your notebook. • %fs runs Databricks Utilities (dbutils) filesystem commands. • %md includes rendered Markdown.
  16. 16. Azure Databricks & AI Machine Learning, Deep Learning, and Transfer Learning
  17. 17. Spark Machine Learning (ML) Overview • Spark MLlib comes pre-installed on Azure Databricks • Third-party libraries supported include H2O Sparkling Water, scikit-learn, and XGBoost • Enables parallel, distributed ML for large datasets on Spark clusters
  18. 18. Deep Learning • Applying Pre-trained Models for Scalable Prediction
  19. 19. Deep Learning Pipelines • Transfer Learning
  20. 20. Transfer Learning • Pre-Trained Libraries: ImageNet • InceptionV3 • Xception • ResNet50 • VGG16/VGG19
  21. 21. In Summary • What did we learn? • In 2010, Spark was the Napoleon Dynamite before he met his brother’s girlfriend. • In 2013, Databricks was created, and Spark turned into the Napoleon Dynamite after he got the tape cassette from his brother’s girlfriend, and he never looked back.
  22. 22. © 2017 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
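
Slide 15 describes how a magic command on a cell overrides the notebook's default language. The sketch below illustrates that dispatch rule only; it is not how Databricks implements it. The helper `resolve_cell` and the labels in `MAGICS` are hypothetical names made up for this example, and the magic commands themselves are taken from the slide.

```python
# Magic commands from slide 15, mapped to a label for what the cell runs as.
# The labels on the right are illustrative, not official Databricks names.
MAGICS = {
    "%python": "python",
    "%sql": "sql",
    "%r": "r",
    "%scala": "scala",
    "%sh": "shell",
    "%fs": "dbutils filesystem",
    "%md": "markdown",
}


def resolve_cell(cell_text, notebook_language):
    """Return (language, body) for one notebook cell (hypothetical helper).

    If the first token of the cell is a known magic command, that command
    decides the language; otherwise the notebook's default language applies.
    """
    first_line, _, rest = cell_text.partition("\n")
    tokens = first_line.split(None, 1)
    if tokens and tokens[0] in MAGICS:
        # Anything after the magic on the same line is part of the body.
        inline = tokens[1] if len(tokens) > 1 else ""
        body = "\n".join(part for part in (inline, rest) if part)
        return MAGICS[tokens[0]], body
    return notebook_language, cell_text


# A Python notebook containing one plain cell and two magic cells.
cells = [
    "print('hello from the default language')",
    "%sql\nSELECT COUNT(*) FROM events",
    "%fs ls /mnt",
]
for cell in cells:
    print(resolve_cell(cell, "python"))
```

In a real notebook the magic command goes on the first line of the cell, exactly as in the `%sql` and `%fs` strings above.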

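Slides 18 to 20 name transfer learning with pre-trained models but do not show the mechanics. The toy sketch below illustrates one common form of it, assuming a feature-extraction setup: a featurizer stays frozen and only a small classifier head is trained on its outputs. Everything here is a stand-in. `frozen_featurizer` is a hypothetical hand-written function playing the role that a pre-trained network such as InceptionV3 would play, and the "images" are four-pixel lists invented for the example.

```python
import math


def frozen_featurizer(pixels):
    """Stand-in for a pre-trained network's feature layer (hypothetical).

    It has no trainable parameters, so nothing here changes during training.
    """
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression head on the frozen features with plain SGD."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the head scores the feature vector at 0.5 or above."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# Tiny invented dataset: label 0 = dark image, label 1 = bright image.
dark = [[0.0, 0.1, 0.2, 0.1], [0.2, 0.1, 0.1, 0.2]]
bright = [[0.9, 1.0, 0.8, 0.9], [0.8, 0.9, 0.9, 1.0]]
images = dark + bright
labels = [0, 0, 1, 1]

# Featurize once with the frozen function, then train only the head.
features = [frozen_featurizer(img) for img in images]
weights, bias = train_head(features, labels)

new_image = [0.8, 1.0, 0.95, 0.85]
print(predict(weights, bias, frozen_featurizer(new_image)))
```

The point of the design is that only `weights` and `bias` are learned, which is why this approach can work with far less labeled data than training a full network.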