Big Data with Azure

What is Big Data?
Big Data = All Data!

• Unstructured: Audio, video, images. Meaningless without adding some structure.
• Semi-Structured: JSON, XML, sensor data, social media, device data, web logs. Flexible data model structure.
• Structured: CSV, Columnar Storage (Parquet, ORC). Strict data model structure.
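
As a rough illustration of the difference between a flexible and a strict data model, the sketch below projects semi-structured JSON records onto a fixed schema and writes them out as CSV. The field names (`device`, `temp_c`, `tags`) and the two-column schema are hypothetical, chosen only for this example.

```python
import csv
import io
import json

# Hypothetical fixed schema for the structured output.
SCHEMA = ["device", "temp_c"]


def to_structured_row(raw_json: str) -> list:
    """Project a flexible JSON record onto the fixed schema.

    Missing fields become empty strings; extra fields are dropped.
    """
    record = json.loads(raw_json)
    return [record.get(column, "") for column in SCHEMA]


def rows_to_csv(rows: list) -> str:
    """Serialize structured rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(SCHEMA)
    writer.writerows(rows)
    return buffer.getvalue()


# Semi-structured input: records do not all carry the same fields.
semi_structured = [
    '{"device": "sensor-1", "temp_c": 21.5, "tags": ["lab"]}',
    '{"device": "sensor-2"}',
]
csv_text = rows_to_csv([to_structured_row(r) for r in semi_structured])
```

The second record has no `temp_c` and the first has an extra `tags` field; the strict schema forces a decision about both, which is the structure the semi-structured form leaves open.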

Why is Processing Big Data Challenging?

• Variety: It can be structured, semi-structured, or unstructured
• Velocity: It can be streaming, near real-time, or batch
• Volume: It can be 1GB or 1PB
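
To make the velocity distinction concrete, here is a minimal sketch in plain Python (no Azure service involved) that computes the same total in batch style and in streaming style. The function names and the sample readings are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_total(readings: List[float]) -> float:
    """Batch: all data is available up front and is processed in one pass."""
    return sum(readings)


def streaming_totals(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated result as each reading arrives."""
    total = 0.0
    for value in readings:
        total += value
        yield total


readings = [1.0, 2.0, 3.5]
final = batch_total(readings)                        # one answer at the end
running = list(streaming_totals(iter(readings)))     # one answer per reading
```

The batch version needs the whole data set before it can answer; the streaming version always has a current answer, at the cost of keeping state between records.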

Trusted. Productive. Intelligent. Hybrid.
Azure. Cloud for all.

>80% of Fortune 500 use the Microsoft Cloud

Azure Big Data Processing Pipeline: Ingest

Azure Data Factory
Compose, orchestrate & monitor data services at scale
• Fully managed service
• Any data on-premises or in the cloud
• Single pane of glass management
• Global service infrastructure
• Cost effective
[Diagram labels: BI & analytics, Stored Procedures, Hadoop on Azure, Data Lake Analytics, Custom Code, Machine Learning, Trusted data]

Azure Big Data Processing Pipeline: Store

AZURE BLOB STORAGE
• A highly scalable object store for unstructured data
  ▪ Serverless Azure service.
  ▪ Can store billions of images, videos, audio files, documents, etc.
  ▪ Automatically scales as more data is uploaded.
  ▪ Four replication options: LRS, GRS, ZRS and RA-GRS
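
Objects in Blob Storage are addressed by storage account, container, and blob name. The sketch below builds such an address for the Azure public cloud using only the standard library; the account, container, and blob names are hypothetical, and a real request would also need authorization (for example a SAS token or an OAuth token), which is not shown.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS URL of a blob in the Azure public cloud.

    Slashes in the blob name are kept so that virtual folders survive;
    other reserved characters are percent-encoded.
    """
    return (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob_name, safe='/')}"
    )


# Hypothetical account, container, and blob names.
url = blob_url("mystorageacct", "images", "2019/cat photo.jpg")
```

The flat account/container/blob addressing is what makes this an object store: the "folder" in the blob name is only a naming convention.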

AZURE DATA LAKE STORE
• A highly scalable, parallel file system in the cloud, specifically optimized for big data analytics
  ▪ No limits on: data types, number of files, size of individual files, total amount of data stored, how long data can be stored, or ingestion throughput
  ▪ Supports low-latency, high-throughput workloads, so it can be used for ingesting streaming data.
  ▪ Is Hadoop-compatible (via WebHDFS REST API). Supported by leading Hadoop distros and HDInsight.
[Diagram labels: Azure Data Lake Store File; Block 1, Block 2, … Block n; Backend Storage in Azure; Data Node; Block]
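
Because the store is exposed through a WebHDFS-compatible REST API, a file or folder can be addressed with an ordinary URL. The following is a minimal sketch, assuming the `azuredatalakestore.net` endpoint form of Azure Data Lake Store and standard WebHDFS operations such as `LISTSTATUS` and `OPEN`. The account name and paths are hypothetical, the helper `webhdfs_url` is not part of any SDK, and a real call would also need an OAuth bearer token in the `Authorization` header.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a path in a Data Lake Store account.

    `op` is the WebHDFS operation name; extra keyword arguments are
    appended as query parameters in the order given.
    """
    base = f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
    query = urlencode({"op": op, **params})
    return f"{base}{quote(path.lstrip('/'), safe='/')}?{query}"


# Hypothetical account and paths.
list_url = webhdfs_url("mydatalake", "/logs/2019", "LISTSTATUS")
read_url = webhdfs_url("mydatalake", "logs/app.log", "OPEN", offset="0")
```

The same URL shape is what Hadoop-side tooling relies on when it talks to the store over WebHDFS.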

Azure Big Data Processing Pipeline: Process

Azure Databricks
• Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, Serverless
• Collaborative Workspace: Data Engineer, Data Scientist, Business Analyst
• Deploy Production Jobs & Workflows: Multi-stage pipelines, Job scheduler, Notification & logs
• Inputs: Cloud storage, Data warehouses, Hadoop storage, IoT / streaming data
• Outputs: REST APIs, Machine learning models, BI tools, Data exports, Data warehouses
• Enhance productivity. Build on a secure & trusted cloud. Scale without limits.

AZURE DATABRICKS NOTEBOOKS OVERVIEW
• Notebooks are a popular way to develop and run Spark applications
  ▪ Notebooks are not only for authoring Spark applications; they can also be run directly on clusters
    • Shift+Enter
  ▪ Notebooks support fine-grained permissions, so they can be securely shared with colleagues for collaboration (see following slide for details on permissions and abilities)
  ▪ Notebooks are well suited for prototyping, rapid development, exploration, discovery and iterative development
Notebooks typically consist of code, data, visualization, comments and notes

Big Data Processing Pipeline
Azure Machine Learning

Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
• Data models: Document, Column-family, Key-value, Graph
• APIs: SQL, MongoDB, Table API
• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Five well-defined consistency models
• Comprehensive SLAs
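
The five consistency models are, from weakest to strongest guarantee: eventual, consistent prefix, session, bounded staleness, and strong. The sketch below only models that ordering; the `ConsistencyLevel` enum and the `satisfies` helper are hypothetical illustrations, not the Cosmos DB SDK types.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """The five Cosmos DB consistency levels, weakest (1) to strongest (5).

    Illustrative only; this is not the SDK's own type.
    """
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def satisfies(offered: ConsistencyLevel, required: ConsistencyLevel) -> bool:
    """Return True if the offered level is at least as strong as required."""
    return offered >= required


ordered_names = [level.name for level in sorted(ConsistencyLevel)]
```

Stronger levels generally trade latency and availability for tighter read guarantees, which is why the choice is exposed as a setting.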

Azure Data Explorer (Kusto)
(Developed in Israel)

• Perform near real-time queries on terabytes of data
• A lightning-fast indexing and querying service for complex analytics.
• Allows you to quickly identify trends, patterns, or anomalies in all data types, including structured, semi-structured and unstructured data.

Big Data Processing Pipeline: Visualize
Azure Machine Learning
