Usama Wahab Khan
MVP, MCT, CTO @Evolution Technologies
Usama Wahab Khan
Father, Data Scientist, Developer/Nerd, Traveler
Twitter: @usamawahabkhan
LinkedIn: Usamawahabkhan
Data abundance
Processes: Businesses are tasked to store, interpret, manage, transform, process, aggregate and report on data.
Consumers: There is a wider range of consumers using different types of devices to consume or generate data.
Variety: There is a wider variety of data types that need to be processed and stored.
Responsibilities: A data engineer's role is responsible for more data types and technologies.
Technologies: Microsoft Azure provides a wide set of tools and technologies.
Data engineering job responsibilities
New skills for new platforms
Changing loading approaches
From implementing to provisioning
Big Data in Microsoft Azure
[Diagram: big data analytics options arranged from control to ease of use, with reduced administration toward the ease-of-use end]
Big Data Analytics
Virtual Machines: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology, any distribution.
Managed Clusters: Azure HDInsight. Workload optimized, managed clusters.
Azure Databricks: Frictionless & optimized Spark clusters.
Big Data as-a-service: Azure Data Lake Analytics. Data engineering in a job-as-a-service model.
Big Data Storage
Azure Data Lake Store
Azure Storage
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Best of Microsoft
Best of Databricks
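As an illustration of the Blob Storage integration mentioned on this slide, here is a minimal sketch of how a notebook might address a blob through the `wasbs://` scheme. The helper names (`blob_uri`, `account_key_conf`) and the account, container and path values are hypothetical and are not taken from the deck; the Spark calls are shown only as comments because they need a running cluster.

```python
def blob_uri(account: str, container: str, path: str = "") -> str:
    """Build a wasbs:// URI for a blob (hypothetical helper, not from the deck)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Name of the Spark setting that holds the storage account access key."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


# Placeholder names, assumed for illustration only.
uri = blob_uri("mystorageacct", "raw", "/events/2019.csv")
conf_key = account_key_conf("mystorageacct")
print(uri)
print(conf_key)

# Inside a Databricks notebook, where `spark` is already defined, these values
# would typically be used along these lines:
#   spark.conf.set(conf_key, "<storage account access key>")
#   df = spark.read.csv(uri, header=True)
```

In practice the access key would usually come from a secret store rather than being typed into the notebook.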
Azure Databricks
Enhance productivity
Build on secure & trusted cloud
Scale without limits
Reference architecture
Reference architecture Business intelligence
Reference architecture Real-time analytics Big data
Demo
Q & A
Usama Wahab Khan
Twitter: @usamawahabkhan
LinkedIn: Usamawahabkhan
Thank you

Azure Databricks by Usama Wahab Khan


Editor's Notes

  • #2 Introduce the team (self-introductions). Mention the LearnAI team. This is a 3-day airlift covering the transition from pure Databricks to AML. We will use notebooks to introduce tools and techniques, and then return to one use case. We have three kinds of session: (1) presentation style, (2) demos (with small exercises), (3) hands-on labs. The last day is a hackathon (with two use cases). Check people's skills: experience with Databricks, Jupyter notebooks, VS Code, deep learning. Who has heard of AMLCompute? Who has used it? Who has used CI/CD and git version control?
  • #4 Instructor notes: It is important to stress that at this stage we are just going through a high-level overview of how the growth of data has affected a wide range of people, processes and technologies. Is anyone in the classroom having to deal with new data types, processes, or technologies? Has their role evolved? Use the answers to drill down into a short discussion if necessary. The end goal is to ensure that the students see the relevance of the material you are about to present.
  • #5 Instructor notes: This slide outlines the three core areas that data engineers will be responsible for. The first is a responsibility to learn new skills for the new platforms. This may mean understanding new data storage paradigms such as NoSQL solutions or streaming data solutions. It will also likely involve learning new languages, depending on the technologies that are provisioned. Data engineers will also have to be open to the idea that data loading techniques that were appropriate for on-premises scenarios may not necessarily work for processing data in the cloud. More details will follow as they go through the course. Students should also understand that with the move from on-premises to the cloud, they will move from physically implementing machines and services to provisioning them, either through the Azure portal or, more likely, by writing code that can be used to deploy services quickly with minimal errors. Azure administrators refer to this as infrastructure as code.
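
To make the infrastructure-as-code idea in note #5 concrete, the following is a minimal sketch that generates an ARM-template-shaped description of an Azure Databricks workspace from Python. The function name, workspace name, region and managed resource group naming are assumptions made for illustration, and the API version shown is one that has been published for this resource type but should be checked against current documentation before use.

```python
import json


def databricks_workspace_template(workspace_name: str, location: str, sku: str = "standard") -> dict:
    """Return an ARM-template-shaped dict for one Databricks workspace (illustrative sketch)."""
    # ARM template expression, evaluated by Azure at deployment time.
    # The 'databricks-rg-' prefix is an assumed naming choice.
    managed_rg = (
        "[concat(subscription().id, '/resourceGroups/', "
        f"'databricks-rg-{workspace_name}')]"
    )
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Databricks/workspaces",
                "apiVersion": "2018-04-01",  # assumed; verify before deploying
                "name": workspace_name,
                "location": location,
                "sku": {"name": sku},
                "properties": {"managedResourceGroupId": managed_rg},
            }
        ],
    }


# Placeholder values, assumed for illustration only.
template = databricks_workspace_template("demo-workspace", "westeurope")
print(json.dumps(template, indent=2))
```

Keeping the definition in code means the same workspace can be reviewed, versioned and redeployed without clicking through the portal, which is the shift from implementing to provisioning that the note describes.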