What is Big Data?
Big Data = All Data!
• Unstructured: Audio, video, images. Meaningless without adding some structure
• Semi-Structured: JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure
• Structured: CSV, Columnar Storage (Parquet, ORC). Strict data model structure
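To make the difference between a strict and a flexible data model concrete, here is a minimal sketch using only the Python standard library. The device readings, the column names and the optional `tags` field are hypothetical and were chosen for illustration; they do not come from the slides.

```python
import csv
import io
import json

# Structured: every row must match the same fixed set of columns.
CSV_TEXT = "device_id,temperature\nd1,21.5\nd2,19.0\n"

# Semi-structured: records share a general shape but fields may vary.
JSON_LINES = [
    '{"device_id": "d1", "temperature": 21.5}',
    '{"device_id": "d2", "temperature": 19.0, "tags": ["outdoor"]}',
]


def read_structured(text):
    """Parse CSV rows, rejecting any row that breaks the strict schema."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # DictReader stores surplus cells under the key None and fills
        # missing cells with None, so either case signals a schema violation.
        if None in row or None in row.values():
            raise ValueError(f"row does not match schema: {row}")
        rows.append({"device_id": row["device_id"],
                     "temperature": float(row["temperature"])})
    return rows


def read_semi_structured(lines):
    """Parse JSON records, tolerating optional fields such as 'tags'."""
    records = []
    for line in lines:
        record = json.loads(line)
        record.setdefault("tags", [])  # optional field gets a default
        records.append(record)
    return records


structured_rows = read_structured(CSV_TEXT)
flexible_records = read_semi_structured(JSON_LINES)
```

The CSV reader fails fast on a malformed row, while the JSON reader accepts records with extra or missing optional fields; that is the practical meaning of "strict" versus "flexible" here.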
Why is Processing Big Data Challenging?
• Variety: It can be structured, semi-structured, or unstructured
• Velocity: It can be streaming, near real-time or batch
• Volume: It can be 1GB or 1PB
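The velocity point can be illustrated with a small sketch that computes the same statistic in batch and in streaming style. This is plain Python under simple assumptions (the temperature readings are made up), not an Azure API.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch: the full dataset is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated mean after each arriving value."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


readings = [20.0, 22.0, 24.0]  # hypothetical sensor readings
batch_result = batch_mean(readings)
running_results = list(streaming_mean(iter(readings)))
```

The batch function needs all values up front, while the streaming version keeps only a count and a running total, so it can produce an answer after every event without storing the history.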
Trusted, Productive, Intelligent, Hybrid
Azure. Cloud for all.
>80% of Fortune 500 use the Microsoft Cloud
Azure Big Data Processing Pipeline: Ingest
Azure Event Hubs
Azure Data Factory
Compose, orchestrate & monitor data services at scale
• Fully managed service
• Any data on-premises or in the cloud
• Single pane of glass management
• Global service infrastructure
• Cost effective
[Diagram labels: BI & analytics, Stored Procedures, Hadoop on Azure, Data Lake Analytics, Custom Code, Machine Learning, Trusted data]
Azure Big Data Processing Pipeline: Store
AZURE BLOB STORAGE
• A highly scalable object storage for unstructured data
  - Serverless Azure service.
  - Can store billions of images, videos, audio files, documents etc.
  - Automatically scales as more data is uploaded.
  - Four replication options: LRS, GRS, ZRS and RA-GRS
AZURE DATA LAKE STORE
• A highly scalable, parallel file system in the cloud, specifically optimized for big data analytics
  - No limits on: data types, number of files, size of individual files, total amount of data stored, how long data can be stored, or ingestion throughput
  - Supports low-latency, high-throughput workloads, so it can be used for ingesting streaming data.
  - Is Hadoop-compatible (via WebHDFS REST API). Supported by leading Hadoop distros and HDInsight.
Backend Storage in Azure
[Diagram: an Azure Data Lake Store file is split into blocks (Block 1, Block 2, … Block n) that are spread across shards on multiple data nodes]
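The idea in the diagram, a file split into blocks that are spread across data nodes, can be sketched in a few lines of Python. The block size, the node names and the round-robin placement are assumptions made for illustration; the slides do not describe the service's actual placement policy.

```python
BLOCK_SIZE = 4  # bytes; hypothetical and tiny so the example is easy to follow
DATA_NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def split_into_blocks(data, block_size):
    """Cut the file contents into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to nodes round-robin (an assumed placement policy)."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        placement[nodes[index % len(nodes)]].append((index, block))
    return placement


def reassemble(placement):
    """Collect blocks from all nodes and restore the original order."""
    indexed = [item for held in placement.values() for item in held]
    return b"".join(block for _, block in sorted(indexed))


file_bytes = b"big data on azure"
blocks = split_into_blocks(file_bytes, BLOCK_SIZE)
placement = place_blocks(blocks, DATA_NODES)
```

Because each block carries its index, the blocks can be read from different nodes in parallel and still be put back together in order.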
Azure Big Data Processing Pipeline: Process
Azure Databricks
• Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, Serverless
• Collaborative Workspace: Data Engineer, Data Scientist, Business Analyst
• Deploy Production Jobs & Workflows: Multi-stage pipelines, Job scheduler, Notification & logs
• Data sources: Cloud storage, Data warehouses, Hadoop storage, IoT / streaming data
• Outputs: REST APIs, Machine learning models, BI tools, Data exports, Data warehouses
• Enhance Productivity
• Build on secure & trusted cloud
• Scale without limits
AZURE DATABRICKS NOTEBOOKS OVERVIEW
• Notebooks are a popular way to develop and run Spark applications
  - Notebooks are not only for authoring Spark applications; they can also be executed directly on clusters
    - Shift+Enter
  - Notebooks support fine-grained permissions, so they can be securely shared with colleagues for collaboration (see following slide for details on permissions and abilities)
  - Notebooks are well suited for prototyping, rapid development, exploration, discovery and iterative development
Notebooks typically consist of code, data, visualization, comments and notes
Big Data Processing Pipeline
Azure Machine Learning
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
• APIs: SQL, MongoDB, Table API
• Data models: Document, Column-family, Key-value, Graph
• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Comprehensive SLAs
• Five well-defined consistency models
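To show what a latency figure "at the 99th percentile" means, here is a minimal sketch that computes percentiles with the nearest-rank method. The method and the latency samples are assumptions chosen for illustration; the slides do not say how the service measures its latency guarantee.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [5] * 98 + [40, 250]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
worst = percentile(latencies_ms, 100)
```

With these samples the 99th percentile is far below the single worst request, which is why a percentile target says more about typical tail behaviour than a maximum or an average does.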
NoSQL Decision Tree
Azure Data Explorer Kusto
(Developed in Israel)
Azure Data Explorer
• Perform near real-time queries on terabytes of data
• A lightning-fast indexing and querying service for complex analytics.
• Allows you to quickly identify trends, patterns, or anomalies in all data types, including structured, semi-structured and unstructured data.
Big Data Processing Pipeline: Visualize
Azure Machine Learning
DEMO
Big Data with Azure

Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

Big Data with Azure

  • 9-16. What is Big Data? Big Data = All Data!
      • Unstructured: audio, video, images. Meaningless without adding some structure.
      • Semi-structured: JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure.
      • Structured: CSV, columnar storage (Parquet, ORC). Strict data model structure.
  • 17-21. Why is Processing Big Data Challenging?
      • Variety: It can be structured, semi-structured, or unstructured
      • Velocity: It can be streaming, near real-time, or batch
      • Volume: It can be 1 GB or 1 PB
  • 25. >80% of Fortune 500 use the Microsoft Cloud
  • 27. Azure Big Data Processing Pipeline: Ingest
  • 29. Azure Data Factory: compose, orchestrate & monitor data services at scale
      • Fully managed service
      • Any data, on-premises or in the cloud
      • Single pane of glass management
      • Global service infrastructure
      • Cost-effective
      [Diagram labels: BI & analytics, Stored Procedures, Hadoop on Azure, Data Lake Analytics, Custom Code, Machine Learning, Trusted data]
  • 30. Azure Big Data Processing Pipeline: Store
  • 31. Azure Blob Storage
      • A highly scalable object storage for unstructured data
          • Serverless Azure service.
          • Can store billions of images, videos, audio files, documents, etc.
          • Automatically scales as more data is uploaded.
          • Four replication options: LRS, GRS, ZRS, and RA-GRS
  • 32. Azure Data Lake Store
      • A highly scalable, parallel file system in the cloud, specifically optimized for big data analytics
          • No limits on data types, number of files, size of individual files, total amount of data stored, how long data can be stored, or ingestion throughput
          • Supports low-latency, high-throughput workloads, so it can be used for ingesting streaming data.
          • Is Hadoop-compatible (via the WebHDFS REST API). Supported by leading Hadoop distros and HDInsight.
      [Diagram: an Azure Data Lake Store file divided into Block 1, Block 2, … Block n; backend storage in Azure shown as data nodes holding blocks and shards]
  • 33. Azure Big Data Processing Pipeline: Process
  • 34. Azure Databricks
      [Diagram labels: Optimized Databricks Runtime Engine; Databricks I/O; Serverless; Collaborative Workspace; Cloud storage; Data warehouses; Hadoop storage; IoT / streaming data; REST APIs; Machine learning models; BI tools; Data exports; Data warehouses; Enhance Productivity; Deploy Production Jobs & Workflows; Apache Spark; Multi-stage pipelines; Data Engineer; Job scheduler; Notification & logs; Data Scientist; Business Analyst; Build on secure & trusted cloud; Scale without limits]
  • 35. Azure Databricks Notebooks Overview
      • Notebooks are a popular way to develop and run Spark applications
          • Notebooks are not only for authoring Spark applications; they can also be run directly on clusters
              • Shift+Enter
          • Notebooks support fine-grained permissions, so they can be securely shared with colleagues for collaboration (see following slide for details on permissions and abilities)
          • Notebooks are well suited for prototyping, rapid development, exploration, discovery, and iterative development
      Notebooks typically consist of code, data, visualization, comments, and notes
  • 36. Big Data Processing Pipeline Azure Machine Learning
  • 37. Azure Cosmos DB: a globally distributed, massively scalable, multi-model database service
      • Turnkey global distribution
      • Elastic scale-out of storage & throughput
      • Guaranteed low latency at the 99th percentile
      • Comprehensive SLAs
      • Five well-defined consistency models
      [Diagram labels: SQL, MongoDB, Table API; Document, Column-family, Key-value, Graph]
  • 39-41. Azure Data Explorer: Kusto (developed in Israel)
  • 42. Azure Data Explorer
      • Perform near real-time queries on terabytes of data
      • A lightning-fast indexing and querying service for complex analytics
      • Allows you to quickly identify trends, patterns, or anomalies in all data types, including structured, semi-structured, and unstructured data
  • 43. Big Data Processing Pipeline Visualize Azure Machine Learning
  • 48. DEMO
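
As a small companion to the data categories on slides 9-16, the sketch below shows one way semi-structured JSON records might be given a strict, structured shape. It is only an illustration, not part of the original deck: the sample events, the `SCHEMA` column list, and the helper names `to_structured_rows` and `rows_to_csv` are all assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical semi-structured input: JSON lines whose fields vary per record.
RAW_EVENTS = """\
{"device": "sensor-1", "temp_c": 21.5, "tags": ["lab", "floor2"]}
{"device": "sensor-2", "humidity": 40}
{"device": "sensor-3", "temp_c": 19.0, "humidity": 55}
"""

# The strict schema we choose to impose (an assumption for this sketch).
SCHEMA = ["device", "temp_c", "humidity"]


def to_structured_rows(raw_lines, schema):
    """Project each JSON record onto a fixed column list.

    Fields missing from a record become None; fields outside the
    schema (such as "tags") are dropped.
    """
    rows = []
    for line in raw_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rows.append({column: record.get(column) for column in schema})
    return rows


def rows_to_csv(rows, schema):
    """Serialize the structured rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=schema, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


structured_rows = to_structured_rows(RAW_EVENTS, SCHEMA)
csv_text = rows_to_csv(structured_rows, SCHEMA)
print(csv_text)
```

The flexible records tolerate missing or extra fields, while the CSV output has exactly the same columns in every row. A columnar format such as Parquet would enforce a similar fixed schema, though writing it requires a third-party library and is not shown here.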