SlideShare a Scribd company logo
1 of 44
Download to read offline
DP-203 Microsoft Azure Data
Engineer
Day 1 – Data Lake Gen2
25th July 2021
Vinodkumar Bhovi
© 2021 databag.ai – Proprietary and Confidential
Introduction – what to expect from us
• Structured course designed to pass DP 203
• We deliver cheat sheets at the end of the program
• Slack channel access, PPT’s
• We (databag) have put lot of effort in designing the course
• No donations/fees will ever be asked from databag.ai
© 2021 databag.ai – Proprietary and Confidential
Introduction – what we expect from you
• Free means you’ve got nothing to lose, keeps you in comfort zone
• We take these trainings serious and expect same from you
• Setup Free Azure Account
• Schedule exam in 21 days – on 14-08-2021
• Consistency
• If you ever used Google drive or facebook or Amazon that is more
than enough to learn Azure
© 2021 databag.ai – Proprietary and Confidential
Data is new oil
We need to find it, extract it, refine it, distribute it and monetize it
– David Buckinham, Big data expert
© 2021 databag.ai – Proprietary and Confidential
Azure SQL
Azure Cosmos DB
Azure Storage Accounts
Azure Synapse Analytics
Azure Stream Analytics
Azure Data Factory
Azure Databricks Azure HDInsight
Data
Data Storage Data Transformation
Azure Data Lake
© 2021 databag.ai – Proprietary and Confidential
14 days schedule
• Day1: Azure Data Lake - Vinod
• Day2: Azure SQL Databases - Lakshay
• Day3: Azure Cosmos DB - Vinod
• Day4: Azure Cosmos DB - Vinod
• Day5: Azure Data Factory - Vishwamitra
• Day6: Azure Data Factory - Vishwamitra
• Day7: Azure Databricks - Vinod
© 2021 databag.ai – Proprietary and Confidential
14 days schedule (continued #)
• Day8: Azure Databricks - Vinod
• Day11: Azure Synapse Analytics - Vinod
• Day9: Azure HDInsight - Vinod
• Day12: Azure Synapse Analytics - Vinod
• Day13: Azure Synapse Analytics - Vinod
• Day14: Practice Exam(61) & Q/A – databag team
• Day10: Azure Stream Analytics - Vinod
© 2021 databag.ai – Proprietary and Confidential
Cloud 101
The practice of using a network of remote servers hosted on the internet to store, manage, and
process data, rather than a local server or a personal computer.
• Infrastructure as a service (IaaS)
• You rent a virtual server
• Amazon, Azure, GCP, etc.
• Platform as a service (PaaS)
• You rent an abstract machine
• Google app engine, Salesforce, etc.
• Software as a service (SaaS)
• You rent a capability
• Azure SQL, ADF, etc.
© 2021 databag.ai – Proprietary and Confidential
Azure SQL
Azure Cosmos DB
Azure Storage Accounts
Azure Synapse Analytics
Azure Stream Analytics
Azure Data Factory
Azure Databricks Azure HDInsight
Data
Data Storage Data Transformation
Azure Data Lake
© 2021 databag.ai – Proprietary and Confidential
Problem statement
➢ Need a solution which can handle below 4 V’s of data.
© 2021 databag.ai – Proprietary and Confidential
Data Classification
• Structured data
Examples: SQL data, Tabular data, csv, spreadsheets
• Semi - structured data
Examples: NoSQL, key/value pairs, JSON, XML, YAML
• Unstructured data
Examples: Media files, Office files, Text files, Log files
© 2021 databag.ai – Proprietary and Confidential
Azure Storage
Azure Storage is a Microsoft-managed cloud service that provides storage that is highly
available, secure, durable, scalable and redundant. Within Azure there are two types of
storage accounts, four types of storage, four levels of data redundancy and three tiers for
storing files
© 2021 databag.ai – Proprietary and Confidential
Azure File Storage
Azure Files is a shared network file storage service that provides administrators a way to
access native SMB file shares in the cloud
© 2021 databag.ai – Proprietary and Confidential
Azure Queue Storage
Azure Queue Storage is a service that allows users to store high volumes of messages,
process them asynchronously and consume them when needed
© 2021 databag.ai – Proprietary and Confidential
Azure Table Storage
Azure Table Storage is a scalable, NoSQL, key-value data storage system that can be used
to store large amounts of data in the cloud. This storage offering has a schema less design,
and each table has rows that are composed of key-value pairs
© 2021 databag.ai – Proprietary and Confidential
Azure Blob Storage
• Large object storage in cloud
• Optimized for storing massive amounts of
unstructured data
• Text or Binary Data
• General purpose object storage
• Cost efficient
• Provide multiple Tiers
Azure Blob Storage is Microsoft Azure’s service for storing binary large objects or blobs which
are typically composed of unstructured data such as text, images, and videos, along with their
metadata. Blobs are stored in directory-like structures called “containers.”
© 2021 databag.ai – Proprietary and Confidential
Azure Service Hierarchy
© 2021 databag.ai – Proprietary and Confidential
Azure Data lake Gen 2
Data Lake Gen 1
© 2021 databag.ai – Proprietary and Confidential
Hierarchical namespace (demo)
• Hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access.
• Blob storage is not hierarchical namespace
• Use slashes in Blob storage file names to stimulate a tree like directory structure
• Blob can’t integrate with Hadoop
• Because Blob doesn’t have hierarchical namespace
NAME
LEVE
L 1
LEAF
1
CHILD CHILD
LEAF
2
CHILD
LEVE
L 2
LEVE
L 3
LEAF
5
LEVE
L 4
LEAF
3
CHILD CHILD
© 2021 databag.ai – Proprietary and Confidential
Data Ingestion
Azure Data Lake
© 2021 databag.ai – Proprietary and Confidential
Data Lake
Azure Data Lake
© 2021 databag.ai – Proprietary and Confidential
Azure Data Lake
Data Lake
Security
Redundancy
Monitoring services
Access tiers
Life Cycle policy
© 2021 databag.ai – Proprietary and Confidential
Access Tiers
• Hot - Optimized for storing data that is accessed frequently.
• Cool - Optimized for storing data that is infrequently accessed and stored for at least 30 days (early
deletion fee).
• Archive - Optimized for storing data that is rarely accessed and stored for at least 180 days with
flexible latency requirements, on the order of hours (early deletion fee).
© 2021 databag.ai – Proprietary and Confidential
Access Tiers
• Archive - Optimized for storing data that is rarely accessed and stored for at least 180 days with
flexible latency requirements, on the order of hours.
To read data in archive storage, you must first change the tier of the blob to hot or cool. This
process is known as rehydration and can take hours to complete
• Standard priority: The rehydration request will be processed in the order it was received
and may take up to 15 hours.
• High priority: The rehydration request will be prioritized over Standard requests and may
finish in under 1 hour for objects under ten GB in size.
© 2021 databag.ai – Proprietary and Confidential
Access Tiers
© 2021 databag.ai – Proprietary and Confidential
Access Tiers
© 2021 databag.ai – Proprietary and Confidential
Security
• Authentication
• Storage Account keys
• Shared access signature (SAS)
• Azure Active Directory (Azure AD)
• Access Control
• Role based access control (RBAC)
• Access control list (ACL)
• Network access
• Firewall and virtual network
• Data Encryption
© 2021 databag.ai – Proprietary and Confidential
Security
Authentication
© 2021 databag.ai – Proprietary and Confidential
Security
Authentication
© 2021 databag.ai – Proprietary and Confidential
Shared Access Signature (SAS)
• Security token string
• “SAS Token”
• Contains permission like start and end time
• Azure doesn’t track SAS after creation
To invalidate, regenerate storage account
key used to sign SAS
Shared Access Signature
© 2021 databag.ai – Proprietary and Confidential
Azure Active Directory (AD)
• Grand access to Azure Active directory (AD) Identities
• AD is an enterprise identity provider, Identity as a Service (IDaaS)
• Globally available from any device
• Identities – user, group or application principle
• Assign role at Subscription, RG, Storage account, container level.
• No longer need to store credentials with application config files
• Like IIS Application pool identity approach
• Role based Access control (RBAC)
© 2021 databag.ai – Proprietary and Confidential
Firewalls and Virtual Networks
My VN Internet
IP Address
Azure Storage
© 2021 databag.ai – Proprietary and Confidential
Security
• Authentication
• Storage Account keys
• Shared access signature (SAS)
• Azure Active Directory (Azure AD)
• Access Control
• Role based access control (RBAC)
• Access control list (ACL)
• Network access
• Firewall and virtual network
• Data Encryption
© 2021 databag.ai – Proprietary and Confidential
Lifecycle Management
Azure Blob Storage lifecycle management offers a rich, rule-based policy which you can use to transition your
data to the best access tier and to expire data at the end of its lifecycle.
Lifecycle management policy helps you:
• Transition blobs to a cooler storage tier such as hot to cool, hot to archive, or cool to archive in order to
optimize for performance and cost
• Delete blobs at the end of their lifecycles
• Define up to 100 rules
• Run rules automatically once a day
• Apply rules to containers or specific subset of blobs, up to 10 prefixes per rule
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
LRS – Locally-redundant storage: Three copies of your data which is maintained within the same primary data center.
ZRS- Zone-redundant storage: Three copies of your data replicated synchronously to 3 Azure availability zones in a primary region. Zones
are in different physical locations or different data centers.
GRS- Geo-redundant storage: This allows your data to be stored in different geographic areas of the country or world. Again, you get three
copies of the data within a primary region, but it goes one step further and places three additional asynchronous copies in another region. For
example, you can now have a copy in Virginia and in California to protect your data from fires or hurricanes depending on the coast.
RA-GRS- Read Access Geo-redundant storage: This is GRS but adds a read-only element that allows you to have read access for things like
reporting.
Geo-zone-redundant storage (GZRS): Copy your data synchronously over three primary region Azure availability zones using ZRS. It then
asynchronously copies your data to a single physical location within the secondary region.
Read-access geo-zone-redundant storage (RA-GZRS): it adds a layer of readability to your secondaries.
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
Locally Redundant Storage (LRS)
Region 1
Zone 1 Zone 2 Zone 3
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
Zone Redundant Storage (ZRS) --- HA
Region 1
Zone 1 Zone 2 Zone 3
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
Geo Redundant Storage (GRS) --- DR
Region 1
Zone 1 Zone 2 Zone 3
Region 2
Zone 1 Zone 2 Zone 3
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
Geo Zone Redundant Storage (GZRS) – HA/DR
Region 1
Zone 1 Zone 2 Zone 3
Region 2
Zone 1 Zone 2 Zone 3
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
Read Access Geo Redundant Storage (RA-GRS)
Region 1
Zone 1 Zone 2 Zone 3
Region 2 (Read)
Zone 1 Zone 2 Zone 3
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
Read Access Geo Zone Redundant Storage (RA-GZRS) – HA/DR
Region 1
Zone 1 Zone 2 Zone 3
Region 2(Read)
Zone 1 Zone 2 Zone 3
© 2021 databag.ai – Proprietary and Confidential
Data redundancy for storage
© 2021 databag.ai – Proprietary and Confidential
Monitoring services
• Alerts
• Metrics
• Diagnostics
• Logs Analytics
© 2021 databag.ai – Proprietary and Confidential

More Related Content

Similar to Day1_Data Lake_v2.pdf

0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloudera, Inc.
 
AZ-204: Monitor, Troubleshoot & Optimize Azure Solutions
AZ-204: Monitor, Troubleshoot & Optimize Azure SolutionsAZ-204: Monitor, Troubleshoot & Optimize Azure Solutions
AZ-204: Monitor, Troubleshoot & Optimize Azure SolutionsAzureEzy1
 
Az 104 session 4: azure storage
Az 104 session 4: azure storageAz 104 session 4: azure storage
Az 104 session 4: azure storageAzureEzy1
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for AnalyticsSenturus
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsDustin Vannoy
 
Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...
Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...
Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...Veritas Technologies LLC
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldCloudera, Inc.
 
Move your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudMove your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudCAMMS
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Aws Solution Architecture Associate - summary
Aws Solution Architecture Associate - summaryAws Solution Architecture Associate - summary
Aws Solution Architecture Associate - summaryonoffshake
 
Slide Storage.pptx
Slide Storage.pptxSlide Storage.pptx
Slide Storage.pptxAseem Goyal
 
iFCloud Secure File Sharing
iFCloud Secure File SharingiFCloud Secure File Sharing
iFCloud Secure File Sharingifcloudus
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Azure Synapse vs. Snowflake: The Data Warehouse Dating Game
Azure Synapse vs. Snowflake: The Data Warehouse Dating GameAzure Synapse vs. Snowflake: The Data Warehouse Dating Game
Azure Synapse vs. Snowflake: The Data Warehouse Dating GameSenturus
 
Unlocking the Power of the Data Lake
Unlocking the Power of the Data LakeUnlocking the Power of the Data Lake
Unlocking the Power of the Data LakeArcadia Data
 
AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...
AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...
AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...Amazon Web Services
 
Three Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active ArchiveThree Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active ArchiveAvere Systems
 

Similar to Day1_Data Lake_v2.pdf (20)

0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
AZ-204: Monitor, Troubleshoot & Optimize Azure Solutions
AZ-204: Monitor, Troubleshoot & Optimize Azure SolutionsAZ-204: Monitor, Troubleshoot & Optimize Azure Solutions
AZ-204: Monitor, Troubleshoot & Optimize Azure Solutions
 
Az 104 session 4: azure storage
Az 104 session 4: azure storageAz 104 session 4: azure storage
Az 104 session 4: azure storage
 
10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics10 Reasons Snowflake Is Great for Analytics
10 Reasons Snowflake Is Great for Analytics
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
 
Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...
Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...
Examining Technical Best Practices for Veritas and AWS Using a Detailed Refer...
 
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud WorldPart 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
Part 1: Cloudera’s Analytic Database: BI & SQL Analytics in a Hybrid Cloud World
 
Move your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in CloudMove your on prem data to a lake in a Lake in Cloud
Move your on prem data to a lake in a Lake in Cloud
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Aws Solution Architecture Associate - summary
Aws Solution Architecture Associate - summaryAws Solution Architecture Associate - summary
Aws Solution Architecture Associate - summary
 
Slide Storage.pptx
Slide Storage.pptxSlide Storage.pptx
Slide Storage.pptx
 
iFCloud Secure File Sharing
iFCloud Secure File SharingiFCloud Secure File Sharing
iFCloud Secure File Sharing
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Azure Synapse vs. Snowflake: The Data Warehouse Dating Game
Azure Synapse vs. Snowflake: The Data Warehouse Dating GameAzure Synapse vs. Snowflake: The Data Warehouse Dating Game
Azure Synapse vs. Snowflake: The Data Warehouse Dating Game
 
Unlocking the Power of the Data Lake
Unlocking the Power of the Data LakeUnlocking the Power of the Data Lake
Unlocking the Power of the Data Lake
 
AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...
AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...
AWS re:Invent 2016: Hybrid Architectures: Bridging the Gap to the Cloud( ARC2...
 
Three Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active ArchiveThree Steps to Modern Media Asset Management with Active Archive
Three Steps to Modern Media Asset Management with Active Archive
 

Recently uploaded

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Recently uploaded (20)

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 

Day1_Data Lake_v2.pdf

  • 1. DP-203 Microsoft Azure Data Engineer Day 1 – Data Lake Gen2 25th July 2021 Vinodkumar Bhovi
  • 2. © 2021 databag.ai – Proprietary and Confidential Introduction – what to expect from us • Structured course designed to pass DP 203 • We deliver cheat sheets at the end of the program • Slack channel access, PPT’s • We (databag) have put lot of effort in designing the course • No donations/fees will ever be asked from databag.ai
  • 3. © 2021 databag.ai – Proprietary and Confidential Introduction – what we expect from you • Free means you’ve got nothing to lose, keeps you in comfort zone • We take these trainings serious and expect same from you • Setup Free Azure Account • Schedule exam in 21 days – on 14-08-2021 • Consistency • If you ever used Google drive or facebook or Amazon that is more than enough to learn Azure
  • 4. © 2021 databag.ai – Proprietary and Confidential Data is new oil We need to find it, extract it, refine it, distribute it and monetize it – David Buckinham, Big data expert
  • 5. © 2021 databag.ai – Proprietary and Confidential Azure SQL Azure Cosmos DB Azure Storage Accounts Azure Synapse Analytics Azure Stream Analytics Azure Data Factory Azure Databricks Azure HDInsight Data Data Storage Data Transformation Azure Data Lake
  • 6. © 2021 databag.ai – Proprietary and Confidential 14 days schedule • Day1: Azure Data Lake - Vinod • Day2: Azure SQL Databases - Lakshay • Day3: Azure Cosmos DB - Vinod • Day4: Azure Cosmos DB - Vinod • Day5: Azure Data Factory - Vishwamitra • Day6: Azure Data Factory - Vishwamitra • Day7: Azure Databricks - Vinod
  • 7. © 2021 databag.ai – Proprietary and Confidential 14 days schedule (continued #) • Day8: Azure Databricks - Vinod • Day11: Azure Synapse Analytics - Vinod • Day9: Azure HDInsight - Vinod • Day12: Azure Synapse Analytics - Vinod • Day13: Azure Synapse Analytics - Vinod • Day14: Practice Exam(61) & Q/A – databag team • Day10: Azure Stream Analytics - Vinod
  • 8. © 2021 databag.ai – Proprietary and Confidential Cloud 101 The practice of using a network of remote servers hosted on the internet to store, manage, and process data, rather than a local server or a personal computer. • Infrastructure as a service (IaaS) • You rent a virtual server • Amazon, Azure, GCP, etc. • Platform as a service (PaaS) • You rent an abstract machine • Google app engine, Salesforce, etc. • Software as a service (SaaS) • You rent a capability • Azure SQL, ADF, etc.
  • 9. © 2021 databag.ai – Proprietary and Confidential Azure SQL Azure Cosmos DB Azure Storage Accounts Azure Synapse Analytics Azure Stream Analytics Azure Data Factory Azure Databricks Azure HDInsight Data Data Storage Data Transformation Azure Data Lake
  • 10. © 2021 databag.ai – Proprietary and Confidential Problem statement ➢ Need a solution which can handle below 4 V’s of data.
  • 11. © 2021 databag.ai – Proprietary and Confidential Data Classification • Structured data Examples: SQL data, Tabular data, csv, spreadsheets • Semi - structured data Examples: NoSQL, key/value pairs, JSON, XML, YAML • Unstructured data Examples: Media files, Office files, Text files, Log files
  • 12. © 2021 databag.ai – Proprietary and Confidential Azure Storage Azure Storage is a Microsoft-managed cloud service that provides storage that is highly available, secure, durable, scalable and redundant. Within Azure there are two types of storage accounts, four types of storage, four levels of data redundancy and three tiers for storing files
  • 13. © 2021 databag.ai – Proprietary and Confidential Azure File Storage Azure Files is a shared network file storage service that provides administrators a way to access native SMB file shares in the cloud
  • 14. © 2021 databag.ai – Proprietary and Confidential Azure Queue Storage Azure Queue Storage is a service that allows users to store high volumes of messages, process them asynchronously and consume them when needed
  • 15. © 2021 databag.ai – Proprietary and Confidential Azure Table Storage Azure Table Storage is a scalable, NoSQL, key-value data storage system that can be used to store large amounts of data in the cloud. This storage offering has a schema less design, and each table has rows that are composed of key-value pairs
  • 16. © 2021 databag.ai – Proprietary and Confidential Azure Blob Storage • Large object storage in cloud • Optimized for storing massive amounts of unstructured data • Text or Binary Data • General purpose object storage • Cost efficient • Provide multiple Tiers Azure Blob Storage is Microsoft Azure’s service for storing binary large objects or blobs which are typically composed of unstructured data such as text, images, and videos, along with their metadata. Blobs are stored in directory-like structures called “containers.”
  • 17. © 2021 databag.ai – Proprietary and Confidential Azure Service Hierarchy
  • 18. © 2021 databag.ai – Proprietary and Confidential Azure Data lake Gen 2 Data Lake Gen 1
  • 19. © 2021 databag.ai – Proprietary and Confidential Hierarchical namespace (demo) • Hierarchical namespace organizes objects/files into a hierarchy of directories for efficient data access. • Blob storage is not hierarchical namespace • Use slashes in Blob storage file names to stimulate a tree like directory structure • Blob can’t integrate with Hadoop • Because Blob doesn’t have hierarchical namespace NAME LEVE L 1 LEAF 1 CHILD CHILD LEAF 2 CHILD LEVE L 2 LEVE L 3 LEAF 5 LEVE L 4 LEAF 3 CHILD CHILD
  • 20. © 2021 databag.ai – Proprietary and Confidential Data Ingestion Azure Data Lake
  • 21. © 2021 databag.ai – Proprietary and Confidential Data Lake Azure Data Lake
  • 22. © 2021 databag.ai – Proprietary and Confidential Azure Data Lake Data Lake Security Redundancy Monitoring services Access tiers Life Cycle policy
  • 23. © 2021 databag.ai – Proprietary and Confidential Access Tiers • Hot - Optimized for storing data that is accessed frequently. • Cool - Optimized for storing data that is infrequently accessed and stored for at least 30 days (early deletion fee). • Archive - Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours (early deletion fee).
  • 24. © 2021 databag.ai – Proprietary and Confidential Access Tiers • Archive - Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours. To read data in archive storage, you must first change the tier of the blob to hot or cool. This process is known as rehydration and can take hours to complete • Standard priority: The rehydration request will be processed in the order it was received and may take up to 15 hours. • High priority: The rehydration request will be prioritized over Standard requests and may finish in under 1 hour for objects under ten GB in size.
  • 25. © 2021 databag.ai – Proprietary and Confidential Access Tiers
  • 26. © 2021 databag.ai – Proprietary and Confidential Access Tiers
  • 27. © 2021 databag.ai – Proprietary and Confidential Security • Authentication • Storage Account keys • Shared access signature (SAS) • Azure Active Directory (Azure AD) • Access Control • Role based access control (RBAC) • Access control list (ACL) • Network access • Firewall and virtual network • Data Encryption
  • 28. © 2021 databag.ai – Proprietary and Confidential Security Authentication
  • 29. © 2021 databag.ai – Proprietary and Confidential Security Authentication
  • 30. © 2021 databag.ai – Proprietary and Confidential Shared Access Signature (SAS) • Security token string • “SAS Token” • Contains permission like start and end time • Azure doesn’t track SAS after creation To invalidate, regenerate storage account key used to sign SAS Shared Access Signature
  • 31. © 2021 databag.ai – Proprietary and Confidential Azure Active Directory (AD) • Grand access to Azure Active directory (AD) Identities • AD is an enterprise identity provider, Identity as a Service (IDaaS) • Globally available from any device • Identities – user, group or application principle • Assign role at Subscription, RG, Storage account, container level. • No longer need to store credentials with application config files • Like IIS Application pool identity approach • Role based Access control (RBAC)
  • 32. © 2021 databag.ai – Proprietary and Confidential Firewalls and Virtual Networks My VN Internet IP Address Azure Storage
  • 33. © 2021 databag.ai – Proprietary and Confidential Security • Authentication • Storage Account keys • Shared access signature (SAS) • Azure Active Directory (Azure AD) • Access Control • Role based access control (RBAC) • Access control list (ACL) • Network access • Firewall and virtual network • Data Encryption
  • 34. © 2021 databag.ai – Proprietary and Confidential Lifecycle Management Azure Blob Storage lifecycle management offers a rich, rule-based policy which you can use to transition your data to the best access tier and to expire data at the end of its lifecycle. Lifecycle management policy helps you: • Transition blobs to a cooler storage tier such as hot to cool, hot to archive, or cool to archive in order to optimize for performance and cost • Delete blobs at the end of their lifecycles • Define up to 100 rules • Run rules automatically once a day • Apply rules to containers or specific subset of blobs, up to 10 prefixes per rule
  • 35. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage LRS – Locally-redundant storage: Three copies of your data which is maintained within the same primary data center. ZRS- Zone-redundant storage: Three copies of your data replicated synchronously to 3 Azure availability zones in a primary region. Zones are in different physical locations or different data centers. GRS- Geo-redundant storage: This allows your data to be stored in different geographic areas of the country or world. Again, you get three copies of the data within a primary region, but it goes one step further and places three additional asynchronous copies in another region. For example, you can now have a copy in Virginia and in California to protect your data from fires or hurricanes depending on the coast. RA-GRS- Read Access Geo-redundant storage: This is GRS but adds a read-only element that allows you to have read access for things like reporting. Geo-zone-redundant storage (GZRS): Copy your data synchronously over three primary region Azure availability zones using ZRS. It then asynchronously copies your data to a single physical location within the secondary region. Read-access geo-zone-redundant storage (RA-GZRS): it adds a layer of readability to your secondaries.
  • 36. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage Locally Redundant Storage (LRS) Region 1 Zone 1 Zone 2 Zone 3
  • 37. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage Zone Redundant Storage (ZRS) --- HA Region 1 Zone 1 Zone 2 Zone 3
  • 38. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage Geo Redundant Storage (GRS) --- DR Region 1 Zone 1 Zone 2 Zone 3 Region 2 Zone 1 Zone 2 Zone 3
  • 39. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage Geo Zone Redundant Storage (GZRS) – HA/DR Region 1 Zone 1 Zone 2 Zone 3 Region 2 Zone 1 Zone 2 Zone 3
  • 40. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage Read Access Geo Redundant Storage (RA-GRS) Region 1 Zone 1 Zone 2 Zone 3 Region 2 (Read) Zone 1 Zone 2 Zone 3
  • 41. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage Read Access Geo Zone Redundant Storage (RA-GZRS) – HA/DR Region 1 Zone 1 Zone 2 Zone 3 Region 2(Read) Zone 1 Zone 2 Zone 3
  • 42. © 2021 databag.ai – Proprietary and Confidential Data redundancy for storage
  • 43. © 2021 databag.ai – Proprietary and Confidential Monitoring services • Alerts • Metrics • Diagnostics • Logs Analytics
  • 44. © 2021 databag.ai – Proprietary and Confidential