Azure Lowlands: An intro to Azure Data Lake

Rick van den Bosch
Rick van den BoschCloud Solution Architect at Betabit
Thank you to our sponsors!
Gold Sponsors
Silver Sponsors
Community Sponsors
An intro to
Azure Data Lake
Rick van den Bosch
M +31 (0)6 52 34 89 30
r.van.den.bosch@betabit.nl
Calendar
Data Lakes
About Azure Data Lake
Azure Data Lake Store
- DEMO
Azure Data Lake HDInsights
- DEMO
Azure Data Lake Analytics
- DEMO
Power BI
- DEMO
Resources
Rick van den Bosch
Cloud Solutions Architect
@rickvdbosch
rickvandenbosch.net
r.van.den.bosch@betabit.nl
Data Lakes
The Traditional Data Warehouse
6
Data sourcesNon-relational data
Ingest all data
regardless of
requirements
Store all data
in native format
without schema
definition
Do analysis
Hadoop, Spark, R,
Azure Data Lake
Analytics (ADLA)
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices
Designed for the questions you don’t yet know!
The Data Lake Approach
About Azure Data Lake
8
Azure Data Lake
• Store and analyze petabyte-size files and trillions of
objects
• Develop massively parallel programs with simplicity
• Debug and optimize your big data programs with ease
• Enterprise-grade security, auditing, and support
• Start in seconds, scale instantly, pay per job
• Built on YARN, designed for the cloud
9
Azure Lowlands: An intro to Azure Data Lake
HDFS Compatible REST API
ADL Store
.NET, SQL, Python, R
scaled out by U-SQL
ADL Analytics
Open Source Apache
Hadoop ADL Client
Azure DataBricks
HDInsight
Hive
• Performance at scale
• Optimized for analytics
• Multiple analytics engines
• Single repository sharing
Why Azure Data Lake?
an on-demand, real-time stream processing service with no-limits data lake built to support
massively parallel analytics
Azure Data Lake Store
Store
• Enterprise-wide hyper-scale repository
• Data of any size, type and ingestion speed
• Operational and exploratory analytics
• WebHDFS-compatible API
• Specifically designed to enable analytics
• Tuned for (data analytics scenario) performance
• Out of the box:
security, manageability, scalability, reliability, and
availability
15
Store
Architected and built for very high throughput at scale for
Big Data workloads
- No limits to file size, account size or number of files
Single-repository for sharing
- Cloud-scale distributed filesystem with file/folder
ACLS and RBAC
- Encryption-at-rest by default with Azure Key Vault
- Authenticated access with Azure Active Directory
integration
The Big Data platform for Microsoft
16
Key capabilities
Built for Hadoop
Unlimited storage, petabyte files
Performance-tuned for big data analytics
Enterprise-ready: Highly-available and secure
All data
Security
Authentication
• Azure Active Directory integration
• Oauth 2.0 support for REST interface
Access control
• Supports POSIX-style permissions (exposed by
WebHDFS)
• ACLs on root, subfolders and individual files
Encryption
18
Compatibility
19
Store
20
HDFS Compatible REST API
ADL Store
DEMO - Store
21
Ingest data – Ad hoc
Local computer
• Azure Portal
• Azure PowerShell
• Azure CLI
• Using Data Lake Tools for Visual Studio
Azure Storage Blob
• Azure Data Factory
• AdlCopy tool
• DistCp running on HDInsight cluster
22
Ingest data
Streamed
• Azure Stream Analytics
• Azure HDInsight Storm
• EventProcessorHost
Relational
• Apache Sqoop
• Azure Data Factory
23
Web server
Upload using custom applications
• Azure CLI
• Azure PowerShell
• Azure Data Lake Storage Gen1 .NET SDK
• Azure Data Factory
Ingest data
24
Process data
25
Download data
26
Visualize data
27
ADLS Gen 2
Takes core capabilities from Azure Data Lake Storage Gen1 such as
- a Hadoop compatible file system
- Azure Active Directory
- POSIX based ACLs
and integrates them into Azure Blob Storage
28
Additional benefits
Unlimited scale and performance
Performance improvements reading/writing individual objects (> throughput & concurrency)
Removes need to decide a priority: run analytics or not at data ingestion time
Data protection capabilities: encryption at rest
Integrated network Firewall capabilities
Durability options (Zone and Geo-Redundant Storage: high-availability and disaster recovery)
Linux integration – BlobFUSE
- mount Blob Storage from Linux VMs
- interact using standard Linux shell commands.
29
Data Lake Storage Gen2
“In Data Lake Storage Gen2, all
the qualities of object storage
remain while adding the
advantages of a file system
interface optimized for analytics
workloads.”
30
Known issues
Blob Storage APIs and Azure Data Lake Gen2 APIs aren't interoperable
Blob storage APIs not available
Azure Storage Explorer >= 1.6.0
AZCopy >= v10
Event Grid doesn't receive events
Soft Delete and Snapshots not available
Object level storage tiers not available
Diagnostic logs not available
31
Azure Data Lake HDInsight
HDInsight
Cloud distribution of the (Hortonworks) Hadoop
components
Supports multiple Hadoop cluster versions (can be
deployed any time)
Hadoop
• YARN for job scheduling & resource management
• MapReduce for parallel processing
• HDFS
33
Azure Lowlands: An intro to Azure Data Lake
HDInsight
35
Open Source Apache Hadoop ADL
Client
Azure DataBricks
HDInsight
Hive
DEMO - HDInsight
36
Azure Data Lake Analytics
Analytics
Dynamic scaling
Develop faster, debug and optimize smarter using familiar
tools
U-SQL: simple and familiar, powerful, and extensible
Integrates seamlessly with your IT investments
Affordable and cost effective
Works with all your Azure data
38
Analytics
On-demand analytics job service to simplify big data
analytics
Can handle jobs of any scale instantly
Azure Active Directory integration
U-SQL
39
Azure Data Lake Analytics
40
Analytics
Storage
HDFS Compatible REST API
ADL Store
.NET, SQL, Python, R
scaled out by U-SQL
ADL Analytics• Serverless. Pay per job. Starts in
seconds. Scales instantly.
• Develop massively parallel programs
with simplicity
• Federated query from multiple data
sources
U-SQL
Language that combines declarative SQL with imperative C#
41
U-SQL – Key concepts
Rowset variables
• Each query expression that produces a rowset can be
assigned to a variable.
EXTRACT
• Reads data from a file & defines the schema on read *
OUTPUT
• Writes data from a rowset to a file *
42
U-SQL – Scalar variables
DECLARE @in string = "/Samples/Data/SearchLog.tsv";
DECLARE @out string = "/output/SearchLog-scalar-variables.csv";
@searchlog =
EXTRACT UserId int,
ClickedUrls string
FROM @in
USING Extractors.Tsv();
OUTPUT @searchlog
TO @out
USING Outputters.Csv();
43
U-SQL – Transform rowsets
@searchlog =
EXTRACT UserId int,
Region string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@rs1 =
SELECT UserId, Region
FROM @searchlog
WHERE Region == "en-gb";
OUTPUT @rs1
TO "/output/SearchLog-transform-rowsets.csv"
USING Outputters.Csv();
44
U-SQL – Extractor parameters
delimiter
encoding
escapeCharacter
nullEscape
quoting
rowDelimiter
silent
skipFirstNRows
charFormat
U-SQL – Outputter parameters
delimiter
dateTimeFormat
encoding
escapeCharacter
nullEscape
quoting
rowDelimeter
charFormat
outputHeader
U-SQL
Built-in extractors and outputters:
Text
Csv
Tsv
A (for instance) CSV Extractor or Outputter is
EXACTLY THAT
Data sources
Options in the Azure Portal:
• Data Lake Storage Gen1
• Azure Storage
DEMO - Analytics
49
DEMO - Power BI
Resources
Resources
Basic example
Advanced example
Create Database (U-SQL) & Create Data Source (U-SQL)
This example
HDInsight quickstart
Azure blog
Azure roadmap
Bedankt voor je aandacht
Track 1
15:35 – 16:20
Skynet Is Talking - Microsoft Bot Framework
Kris van der Mast
Track 2
15:35 – 16:20
Enter The Matrix: Securing Azure's Assets
Mike Martin
1 of 52

Recommended

Dipping Your Toes: Azure Data Lake for DBAs by
Dipping Your Toes: Azure Data Lake for DBAsDipping Your Toes: Azure Data Lake for DBAs
Dipping Your Toes: Azure Data Lake for DBAsBob Pusateri
240 views74 slides
Microsoft cloud big data strategy by
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
8.7K views58 slides
Introduction to Azure Data Lake by
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data LakeAntonios Chatzipavlis
3.8K views32 slides
An intro to Azure Data Lake by
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data LakeRick van den Bosch
354 views58 slides
Data Analytics Meetup: Introduction to Azure Data Lake Storage by
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
1.2K views28 slides
Azure Synapse Analytics Overview (r1) by
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
24.8K views229 slides

More Related Content

What's hot

Azure data platform overview by
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
19.4K views57 slides
Azure data bricks by Eugene Polonichko by
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
435 views22 slides
RDX Insights Presentation - Microsoft Business Intelligence by
RDX Insights Presentation - Microsoft Business IntelligenceRDX Insights Presentation - Microsoft Business Intelligence
RDX Insights Presentation - Microsoft Business IntelligenceChristopher Foot
764 views30 slides
Running cost effective big data workloads with Azure Synapse and Azure Data L... by
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
735 views12 slides
What's new in SQL Server 2016 by
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
4.7K views49 slides
Azure Data Lake Intro (SQLBits 2016) by
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
2.8K views32 slides

What's hot(20)

Azure data platform overview by James Serra
Azure data platform overviewAzure data platform overview
Azure data platform overview
James Serra19.4K views
Azure data bricks by Eugene Polonichko by Alex Tumanoff
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
Alex Tumanoff435 views
RDX Insights Presentation - Microsoft Business Intelligence by Christopher Foot
RDX Insights Presentation - Microsoft Business IntelligenceRDX Insights Presentation - Microsoft Business Intelligence
RDX Insights Presentation - Microsoft Business Intelligence
Christopher Foot764 views
Running cost effective big data workloads with Azure Synapse and Azure Data L... by Michael Rys
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys735 views
What's new in SQL Server 2016 by James Serra
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
James Serra4.7K views
Azure Data Lake Intro (SQLBits 2016) by Michael Rys
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys2.8K views
Data warehouse con azure synapse analytics by Eduardo Castro
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
Eduardo Castro447 views
Building Modern Data Platform with Microsoft Azure by Dmitry Anoshin
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin3.2K views
Introduction to Azure Databricks by James Serra
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra27.3K views
Introducing Azure SQL Data Warehouse by James Serra
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
James Serra7.8K views
Cortana Analytics Workshop: Azure Data Lake by MSAdvAnalytics
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
MSAdvAnalytics3.2K views
Azure Cosmos DB + Gremlin API in Action by Denys Chamberland
Azure Cosmos DB + Gremlin API in ActionAzure Cosmos DB + Gremlin API in Action
Azure Cosmos DB + Gremlin API in Action
Denys Chamberland2.8K views
What’s new in SQL Server 2017 by James Serra
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
James Serra10.4K views
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat... by Microsoft Tech Community
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Develop scalable analytical solutions with Azure Data Factory & Azure SQL Dat...
Cortana Analytics Suite by James Serra
Cortana Analytics SuiteCortana Analytics Suite
Cortana Analytics Suite
James Serra9.4K views
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS by Amazon Web Services
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
Amazon Web Services1.3K views
Introduction to Microsoft’s Hadoop solution (HDInsight) by James Serra
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
James Serra8.1K views
Power BI for Big Data and the New Look of Big Data Solutions by James Serra
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
James Serra7.1K views
Delta Lake with Azure Databricks by Dustin Vannoy
Delta Lake with Azure DatabricksDelta Lake with Azure Databricks
Delta Lake with Azure Databricks
Dustin Vannoy419 views
Azure Data Factory V2; The Data Flows by Thomas Sykes
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
Thomas Sykes662 views

Similar to Azure Lowlands: An intro to Azure Data Lake

Azure Synapse Analytics Overview (r2) by
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
23.2K views251 slides
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen by
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenMS Cloud Summit
504 views60 slides
Accelerating Business Intelligence Solutions with Microsoft Azure pass by
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure passJason Strate
1.9K views75 slides
Modern Analytics Academy - Data Modeling (1).pptx by
Modern Analytics Academy - Data Modeling (1).pptxModern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptxssuser290967
11 views19 slides
Azure data lake sql konf 2016 by
Azure data lake   sql konf 2016Azure data lake   sql konf 2016
Azure data lake sql konf 2016Kenneth Michael Nielsen
1.2K views41 slides
CC -Unit4.pptx by
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptxRevathiparamanathan
7 views37 slides

Similar to Azure Lowlands: An intro to Azure Data Lake(20)

Azure Synapse Analytics Overview (r2) by James Serra
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra23.2K views
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen by MS Cloud Summit
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. NielsenJ1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
J1 T1 3 - Azure Data Lake store & analytics 101 - Kenneth M. Nielsen
MS Cloud Summit504 views
Accelerating Business Intelligence Solutions with Microsoft Azure pass by Jason Strate
Accelerating Business Intelligence Solutions with Microsoft Azure   passAccelerating Business Intelligence Solutions with Microsoft Azure   pass
Accelerating Business Intelligence Solutions with Microsoft Azure pass
Jason Strate1.9K views
Modern Analytics Academy - Data Modeling (1).pptx by ssuser290967
Modern Analytics Academy - Data Modeling (1).pptxModern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptx
ssuser29096711 views
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini... by Michael Rys
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys1.8K views
Azure Data Lake and U-SQL by Michael Rys
Azure Data Lake and U-SQLAzure Data Lake and U-SQL
Azure Data Lake and U-SQL
Michael Rys5.1K views
Prague data management meetup 2018-03-27 by Martin Bém
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém212 views
Azure - Data Platform by giventocode
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
giventocode2.4K views
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron) by Trivadis
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Azure Days 2019: Business Intelligence auf Azure (Marco Amhof & Yves Mauron)
Trivadis284 views
USQL Trivadis Azure Data Lake Event by Trivadis
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
Trivadis465 views
Analytics in the Cloud by Ross McNeely
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
Ross McNeely421 views
QuerySurge Slide Deck for Big Data Testing Webinar by RTTS
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
RTTS29.7K views
Scalable relational database with SQL Azure by Shy Engelberg
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
Shy Engelberg587 views
Demystifying Data Warehouse as a Service (DWaaS) by Kent Graziano
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
Kent Graziano2.9K views
1 Introduction to Microsoft data platform analytics for release by Jen Stirrup
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
Jen Stirrup567 views
5 Comparing Microsoft Big Data Technologies for Analytics by Jen Stirrup
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
Jen Stirrup374 views

More from Rick van den Bosch

Configuration in azure done right by
Configuration in azure done rightConfiguration in azure done right
Configuration in azure done rightRick van den Bosch
353 views28 slides
Getting started with Azure Cognitive services by
Getting started with Azure Cognitive servicesGetting started with Azure Cognitive services
Getting started with Azure Cognitive servicesRick van den Bosch
99 views29 slides
From .NET Core 3, all the rest will be legacy by
From .NET Core 3, all the rest will be legacyFrom .NET Core 3, all the rest will be legacy
From .NET Core 3, all the rest will be legacyRick van den Bosch
291 views54 slides
Getting sh*t done with Azure Functions (on AKS!) by
Getting sh*t done with Azure Functions (on AKS!)Getting sh*t done with Azure Functions (on AKS!)
Getting sh*t done with Azure Functions (on AKS!)Rick van den Bosch
776 views43 slides
SAFwAD @ Intelligent Cloud Conference by
SAFwAD @ Intelligent Cloud ConferenceSAFwAD @ Intelligent Cloud Conference
SAFwAD @ Intelligent Cloud ConferenceRick van den Bosch
298 views45 slides
Securing an Azure Function REST API with Azure Active Directory by
Securing an Azure Function REST API with Azure Active DirectorySecuring an Azure Function REST API with Azure Active Directory
Securing an Azure Function REST API with Azure Active DirectoryRick van den Bosch
259 views29 slides

More from Rick van den Bosch(11)

From .NET Core 3, all the rest will be legacy by Rick van den Bosch
From .NET Core 3, all the rest will be legacyFrom .NET Core 3, all the rest will be legacy
From .NET Core 3, all the rest will be legacy
Rick van den Bosch291 views
Getting sh*t done with Azure Functions (on AKS!) by Rick van den Bosch
Getting sh*t done with Azure Functions (on AKS!)Getting sh*t done with Azure Functions (on AKS!)
Getting sh*t done with Azure Functions (on AKS!)
Rick van den Bosch776 views
Securing an Azure Function REST API with Azure Active Directory by Rick van den Bosch
Securing an Azure Function REST API with Azure Active DirectorySecuring an Azure Function REST API with Azure Active Directory
Securing an Azure Function REST API with Azure Active Directory
Rick van den Bosch259 views
TechDays 2017 - Going Serverless (2/2): Hands-on with Azure Event Grid by Rick van den Bosch
TechDays 2017 - Going Serverless (2/2): Hands-on with Azure Event GridTechDays 2017 - Going Serverless (2/2): Hands-on with Azure Event Grid
TechDays 2017 - Going Serverless (2/2): Hands-on with Azure Event Grid
Rick van den Bosch379 views
TechDays 2016 - Case Study: Azure + IOT + LoRa = ”Leven is Water” by Rick van den Bosch
TechDays 2016 - Case Study: Azure + IOT + LoRa = ”Leven is Water”TechDays 2016 - Case Study: Azure + IOT + LoRa = ”Leven is Water”
TechDays 2016 - Case Study: Azure + IOT + LoRa = ”Leven is Water”
Rick van den Bosch996 views
Take control of your deployments with Release Management by Rick van den Bosch
Take control of your deployments with Release ManagementTake control of your deployments with Release Management
Take control of your deployments with Release Management
Rick van den Bosch967 views

Recently uploaded

Business Analyst Series 2023 - Week 4 Session 8 by
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8DianaGray10
145 views13 slides
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...ShapeBlue
199 views20 slides
"Node.js Development in 2024: trends and tools", Nikita Galkin by
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin Fwdays
33 views38 slides
Transcript: Redefining the book supply chain: A glimpse into the future - Tec... by
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...BookNet Canada
41 views16 slides
Future of AR - Facebook Presentation by
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook PresentationRob McCarty
65 views27 slides
Cencora Executive Symposium by
Cencora Executive SymposiumCencora Executive Symposium
Cencora Executive Symposiummarketingcommunicati21
160 views14 slides

Recently uploaded(20)

Business Analyst Series 2023 - Week 4 Session 8 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8
DianaGray10145 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue199 views
"Node.js Development in 2024: trends and tools", Nikita Galkin by Fwdays
"Node.js Development in 2024: trends and tools", Nikita Galkin "Node.js Development in 2024: trends and tools", Nikita Galkin
"Node.js Development in 2024: trends and tools", Nikita Galkin
Fwdays33 views
Transcript: Redefining the book supply chain: A glimpse into the future - Tec... by BookNet Canada
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
BookNet Canada41 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty65 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue265 views
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... by ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue129 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue224 views
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online by ShapeBlue
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
ShapeBlue225 views
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... by The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
LLMs in Production: Tooling, Process, and Team Structure by Aggregage
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
Aggregage57 views
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays58 views
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by ShapeBlue
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue162 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE84 views
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by Moses Kemibaro
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Moses Kemibaro35 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash162 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu437 views
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha... by ShapeBlue
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
Mitigating Common CloudStack Instance Deployment Failures - Jithin Raju - Sha...
ShapeBlue183 views

Azure Lowlands: An intro to Azure Data Lake

  • 1. Thank you to our sponsors! Gold Sponsors Silver Sponsors Community Sponsors
  • 2. An intro to Azure Data Lake Rick van den Bosch M +31 (0)6 52 34 89 30 r.van.den.bosch@betabit.nl
  • 3. Calendar Data Lakes About Azure Data Lake Azure Data Lake Store - DEMO Azure Data Lake HDInsights - DEMO Azure Data Lake Analytics - DEMO Power BI - DEMO Resources
  • 4. Rick van den Bosch Cloud Solutions Architect @rickvdbosch rickvandenbosch.net r.van.den.bosch@betabit.nl
  • 6. The Traditional Data Warehouse 6 Data sourcesNon-relational data
  • 7. Ingest all data regardless of requirements Store all data in native format without schema definition Do analysis Hadoop, Spark, R, Azure Data Lake Analytics (ADLA) Interactive queries Batch queries Machine Learning Data warehouse Real-time analytics Devices Designed for the questions you don’t yet know! The Data Lake Approach
  • 9. Azure Data Lake • Store and analyze petabyte-size files and trillions of objects • Develop massively parallel programs with simplicity • Debug and optimize your big data programs with ease • Enterprise-grade security, auditing, and support • Start in seconds, scale instantly, pay per job • Built on YARN, designed for the cloud 9
  • 11. HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics Open Source Apache Hadoop ADL Client Azure DataBricks HDInsight Hive • Performance at scale • Optimized for analytics • Multiple analytics engines • Single repository sharing Why Azure Data Lake? an on-demand, real-time stream processing service with no-limits data lake built to support massively parallel analytics
  • 13. Store • Enterprise-wide hyper-scale repository • Data of any size, type and ingestion speed • Operational and exploratory analytics • WebHDFS-compatible API • Specifically designed to enable analytics • Tuned for (data analytics scenario) performance • Out of the box: security, manageability, scalability, reliability, and availability 15
  • 14. Store Architected and built for very high throughput at scale for Big Data workloads - No limits to file size, account size or number of files Single-repository for sharing - Cloud-scale distributed filesystem with file/folder ACLS and RBAC - Encryption-at-rest by default with Azure Key Vault - Authenticated access with Azure Active Directory integration The Big Data platform for Microsoft 16
  • 15. Key capabilities Built for Hadoop Unlimited storage, petabyte files Performance-tuned for big data analytics Enterprise-ready: Highly-available and secure All data
  • 16. Security Authentication • Azure Active Directory integration • Oauth 2.0 support for REST interface Access control • Supports POSIX-style permissions (exposed by WebHDFS) • ACLs on root, subfolders and individual files Encryption 18
  • 20. Ingest data – Ad hoc Local computer • Azure Portal • Azure PowerShell • Azure CLI • Using Data Lake Tools for Visual Studio Azure Storage Blob • Azure Data Factory • AdlCopy tool • DistCp running on HDInsight cluster 22
  • 21. Ingest data Streamed • Azure Stream Analytics • Azure HDInsight Storm • EventProcessorHost Relational • Apache Sqoop • Azure Data Factory 23 Web server Upload using custom applications • Azure CLI • Azure PowerShell • Azure Data Lake Storage Gen1 .NET SDK • Azure Data Factory
  • 26. ADLS Gen 2 Takes core capabilities from Azure Data Lake Storage Gen1 such as - a Hadoop compatible file system - Azure Active Directory - POSIX based ACLs and integrates them into Azure Blob Storage 28
  • 27. Additional benefits Unlimited scale and performance Performance improvements reading/writing individual objects (> throughput & concurrency) Removes need to decide a priority: run analytics or not at data ingestion time Data protection capabilities: encryption at rest Integrated network Firewall capabilities Durability options (Zone and Geo-Redundant Storage: high-availability and disaster recovery) Linux integration – BlobFUSE - mount Blob Storage from Linux VMs - interact using standard Linux shell commands. 29
  • 28. Data Lake Storage Gen2 “In Data Lake Storage Gen2, all the qualities of object storage remain while adding the advantages of a file system interface optimized for analytics workloads.” 30
  • 29. Known issues Blob Storage APIs and Azure Data Lake Gen2 APIs aren't interoperable Blob storage APIs not available Azure Storage Explorer >= 1.6.0 AZCopy >= v10 Event Grid doesn't receive events Soft Delete and Snapshots not available Object level storage tiers not available Diagnostic logs not available 31
  • 30. Azure Data Lake HDInsight
  • 31. HDInsight Cloud distribution of the (Hortonworks) Hadoop components Supports multiple Hadoop cluster versions (can be deployed any time) Hadoop • YARN for job scheduling & resource management • MapReduce for parallel processing • HDFS 33
  • 33. HDInsight 35 Open Source Apache Hadoop ADL Client Azure DataBricks HDInsight Hive
  • 35. Azure Data Lake Analytics
  • 36. Analytics Dynamic scaling Develop faster, debug and optimize smarter using familiar tools U-SQL: simple and familiar, powerful, and extensible Integrates seamlessly with your IT investments Affordable and cost effective Works with all your Azure data 38
  • 37. Analytics On-demand analytics job service to simplify big data analytics Can handle jobs of any scale instantly Azure Active Directory integration U-SQL 39
  • 38. Azure Data Lake Analytics 40 Analytics Storage HDFS Compatible REST API ADL Store .NET, SQL, Python, R scaled out by U-SQL ADL Analytics• Serverless. Pay per job. Starts in seconds. Scales instantly. • Develop massively parallel programs with simplicity • Federated query from multiple data sources
  • 39. U-SQL Language that combines declarative SQL with imperative C# 41
  • 40. U-SQL – Key concepts Rowset variables • Each query expression that produces a rowset can be assigned to a variable. EXTRACT • Reads data from a file & defines the schema on read * OUTPUT • Writes data from a rowset to a file * 42
  • 41. U-SQL – Scalar variables DECLARE @in string = "/Samples/Data/SearchLog.tsv"; DECLARE @out string = "/output/SearchLog-scalar-variables.csv"; @searchlog = EXTRACT UserId int, ClickedUrls string FROM @in USING Extractors.Tsv(); OUTPUT @searchlog TO @out USING Outputters.Csv(); 43
  • 42. U-SQL – Transform rowsets @searchlog = EXTRACT UserId int, Region string FROM "/Samples/Data/SearchLog.tsv" USING Extractors.Tsv(); @rs1 = SELECT UserId, Region FROM @searchlog WHERE Region == "en-gb"; OUTPUT @rs1 TO "/output/SearchLog-transform-rowsets.csv" USING Outputters.Csv(); 44
  • 43. U-SQL – Extractor parameters delimiter encoding escapeCharacter nullEscape quoting rowDelimiter silent skipFirstNRows charFormat
  • 44. U-SQL – Outputter parameters delimiter dateTimeFormat encoding escapeCharacter nullEscape quoting rowDelimeter charFormat outputHeader
  • 45. U-SQL Built-in extractors and outputters: Text Csv Tsv A (for instance) CSV Extractor or Outputter is EXACTLY THAT
  • 46. Data sources Options in the Azure Portal: • Data Lake Storage Gen1 • Azure Storage
  • 50. Resources Basic example Advanced example Create Database (U-SQL) & Create Data Source (U-SQL) This example HDInsight quickstart Azure blog Azure roadmap
  • 51. Bedankt voor je aandacht
  • 52. Track 1 15:35 – 16:20 Skynet Is Talking - Microsoft Bot Framework Kris van der Mast Track 2 15:35 – 16:20 Enter The Matrix: Securing Azure's Assets Mike Martin

Editor's Notes

  1. WebHDFS means Hadoop & HDInsight compatibility
  2. Apache Hadoop file system compatible with Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem It does not impose any limits on account sizes, file sizes, or the amount of data that can be stored in a data lake. Data is stored durably by making multiple copies and there is no limit on the duration of time The data lake spreads parts of a file over a number of individual storage servers. This improves the read throughput when reading the file in parallel for performing data analytics. Redundant copies, enterprise-grade security for the stored data Data in native format, loading data doesn’t require a schema, structured, semi-structured, and unstructured data
  3. including multi-factor authentication, conditional access, role-based access control, application usage monitoring, security monitoring and alerting, etc. Do not enable encryption. Use keys managed by Data Lake Storage Gen1 Use keys from your own Key Vault.
  4. You can use Azure HDInsight and Azure Data Lake Analytics to run data analysis jobs on the data stored in Data Lake Storage Gen1.
  5. Apache Sqoop, Azure Data Factory, Apache DistCp Custom script / application Azure CLI, Azure PowerShell, Azure Data Lake Storage Gen1 .NET SDK
  6. Because Azure Data Lake Storage Gen2 is integrated into the Azure Storage platform, applications can use either the BLOB APIs or Azure Data Lake Storage Gen2 file system APIs for accessing data. BLOB APIs allow you to leverage your existing investments in BLOB Storage and continue to take advantage of the large ecosystem of first and third party applications already available while the Azure Data Lake Storage Gen2 file system APIs are optimized for analytics engines like Hadoop and Spark.
  7. Storage Explorer in portal WILL NOT WORK, Blob Viewing Tool partial support