Managed and External Spark Tables in Fabric Lakehouse
Puneet Vijwani (@Puneetvijwani)
Data Toboggan, 03.02.2024
(Meetup): Fabric’s & Synapse Explorers User Group Norway
Agenda
Overview
Inside Fabric Lakehouse
Delta Lake Tables
• Delta Lake is the default table format in Fabric Lakehouse
• Brings reliability, performance and simplicity to data lakes
• Supports ACID transactions, schema enforcement and time travel
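Time travel is exposed in Spark SQL through the VERSION AS OF and TIMESTAMP AS OF clauses (supported in open-source Delta Lake 2.1 and later; we assume the Fabric Spark runtime in use ships such a version). The sketch below only builds the query string, which you would then pass to spark.sql(...); the helper name time_travel_query is hypothetical. The versions available for a table can be listed with DESCRIBE HISTORY <table>.

```python
import re
from typing import Optional

# Conservative pattern: a table name, optionally qualified by one schema/lakehouse name.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def time_travel_query(table: str, version: Optional[int] = None,
                      timestamp: Optional[str] = None) -> str:
    """Build a Delta Lake time-travel SELECT from exactly one of version/timestamp."""
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        if version < 0:
            raise ValueError("version must be non-negative")
        return f"SELECT * FROM {table} VERSION AS OF {version}"
    # The timestamp is embedded as a SQL string literal, so reject quotes outright.
    if "'" in timestamp:
        raise ValueError("timestamp must not contain quotes")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


print(time_travel_query("salesorders", version=0))
# SELECT * FROM salesorders VERSION AS OF 0
print(time_travel_query("salesorders", timestamp="2024-02-02 10:00:00"))
# SELECT * FROM salesorders TIMESTAMP AS OF '2024-02-02 10:00:00'
```

Keeping the builder separate from spark.sql makes the syntax easy to check without a Spark session.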
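[Diagram: a Fabric workspace whose OneLake storage has Tables and Files sections, with table metadata kept in a Hive metastore (HMS) run as a Fabric-managed service. For managed tables stored internally in OneLake, HMS manages both data and metadata; for data in external storage (ADLS Gen2, AWS, etc.), HMS manages only the metadata. Analogy: the metastore plays the role of SQL Server's INFORMATION_SCHEMA tables (table schemas/metadata), while the data lives in the database files. Power BI also appears in the diagram. Managed table data location: abfss://<>@onelake.dfs.fabric.microsoft.com/<>/Tables/products]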
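The example path for the managed table elides the workspace and lakehouse parts. As an illustration only, the helper below assembles the name-based OneLake form abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/<table>; the function and the sample names SalesWS and SalesLakehouse are hypothetical. OneLake also accepts GUID-based paths (workspace and item IDs), which this sketch does not cover, and it rejects names that would need escaping instead of guessing an encoding.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the name-based ABFSS URI of a managed table folder in OneLake."""
    for label, value in (("workspace", workspace), ("lakehouse", lakehouse), ("table", table)):
        # Keep the sketch simple: refuse empty names and anything that would need escaping.
        if not value or "/" in value or "@" in value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


print(onelake_table_path("SalesWS", "SalesLakehouse", "products"))
# abfss://SalesWS@onelake.dfs.fabric.microsoft.com/SalesLakehouse.Lakehouse/Tables/products
```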
Delta Lake Tables in Fabric

Managed table:
%%sql
CREATE TABLE salesorders
(
  Orderid INT NOT NULL,
  OrderDate TIMESTAMP NOT NULL,
  CustomerName STRING,
  SalesTotal FLOAT NOT NULL
)
USING DELTA

External table:
%%sql
-- No column list: the schema is read from the Delta table already stored at this location
CREATE TABLE MyExternalTable
USING DELTA
LOCATION 'Files/mydata'

DeltaTableBuilder API:
from delta.tables import DeltaTable

# Parentheses let the builder chain span several lines without backslashes
(DeltaTable.create(spark)
    .tableName("products")
    .addColumn("Productid", "INT")
    .addColumn("ProductName", "STRING")
    .addColumn("Category", "STRING")
    .addColumn("Price", "FLOAT")
    .execute())
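The managed and external SQL examples differ only in the LOCATION clause. The hypothetical helper below makes that explicit by generating Spark SQL DDL strings: without location it emits a managed-table statement, with location an external one. Names and types are passed through unchecked, so this is an illustration, not a safe SQL generator; in a notebook the result would be passed to spark.sql(...).

```python
from typing import Optional, Sequence, Tuple

# (column name, Spark SQL type, nullable)
Column = Tuple[str, str, bool]


def create_table_ddl(name: str, columns: Sequence[Column] = (),
                     location: Optional[str] = None, fmt: str = "DELTA") -> str:
    """Generate CREATE TABLE DDL; adding a LOCATION clause makes the table external."""
    parts = [f"CREATE TABLE {name}"]
    if columns:
        col_defs = [f"  {col} {typ}" + ("" if nullable else " NOT NULL")
                    for col, typ, nullable in columns]
        parts.append("(\n" + ",\n".join(col_defs) + "\n)")
    elif location is None:
        # This sketch has no CTAS support, so a managed table needs explicit columns.
        raise ValueError("a managed table needs at least one column here")
    parts.append(f"USING {fmt}")
    if location is not None:
        # For Delta with no column list, the schema comes from the table already stored there.
        parts.append(f"LOCATION '{location}'")
    return "\n".join(parts)


managed = create_table_ddl("salesorders", [
    ("Orderid", "INT", False),
    ("OrderDate", "TIMESTAMP", False),
    ("CustomerName", "STRING", True),
    ("SalesTotal", "FLOAT", False),
])
external = create_table_ddl("MyExternalTable", location="Files/mydata")
print(managed)
print(external)
```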
Managed Tables
• Spark manages both the data and the metadata
• Data is stored in the lakehouse's Tables directory
• Metadata (lakehouse, tables, schema, etc.) is stored in the metastore
• Dropping the table removes ALL data and metadata
Creating a Managed Table

1. Save a DataFrame as a Delta table:
df = spark.read.load('Files/train_schedule.csv', format='csv', header=True)
# Save the DataFrame as a managed Delta table (no path given, so Spark manages the data)
df.write.format("delta").saveAsTable("train_schedule")

2. Spark SQL:
%%sql
CREATE TABLE salesorders (
  Orderid INT NOT NULL,
  OrderDate TIMESTAMP NOT NULL,
  CustomerName STRING,
  SalesTotal FLOAT NOT NULL
) USING DELTA

3. DeltaTableBuilder API:
from delta.tables import DeltaTable

(DeltaTable.create(spark)
    .tableName("products")
    .addColumn("Productid", "INT")
    .addColumn("ProductName", "STRING")
    .addColumn("Category", "STRING")
    .addColumn("Price", "FLOAT")
    .execute())

4. Other formats (CSV, JSON, Parquet):
df.write.format("csv").saveAsTable("mytable_csv")
df.write.format("json").saveAsTable("mytable_json")
df.write.format("parquet").saveAsTable("mytable_parquet")
Creating a Managed Table: List Tables
Creating a Managed Table: 4. Load to tables
External Tables
• Spark manages the metadata only
• You specify the external location where the table data is stored
• Dropping the table removes the metadata, BUT the data persists at the external location
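Creating an External Table

1. Save a DataFrame with an explicit path:
# The path option makes saveAsTable register an external table over Files/myexternaltable
df.write.format("delta").saveAsTable("myexternaltable", path="Files/myexternaltable")

2. Spark SQL:
%%sql
CREATE TABLE MyExternalTable2
USING DELTA
LOCATION 'Files/mydata'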
Creating an External Table: List Tables
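Before dropping a table it helps to confirm which kind it is. In Spark SQL, DESCRIBE TABLE EXTENDED <table> returns the column rows followed by a "# Detailed Table Information" section that includes a Type row (MANAGED or EXTERNAL) and a Location row. The sketch assumes those rows were collected as (col_name, data_type, comment) tuples, for example via spark.sql("DESCRIBE TABLE EXTENDED myexternaltable").collect(); the function name and the sample rows are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Sequence

DETAIL_MARKER = "# Detailed Table Information"


def table_kind(rows: Iterable[Sequence[Optional[str]]]) -> Dict[str, str]:
    """Return the Type and Location entries from DESCRIBE TABLE EXTENDED rows."""
    info: Dict[str, str] = {}
    in_details = False
    for row in rows:
        col_name = (row[0] or "").strip()
        if col_name == DETAIL_MARKER:
            # Rows before this marker describe columns, which may be named "Type" or "Location".
            in_details = True
            continue
        if in_details and col_name in ("Type", "Location"):
            info[col_name.lower()] = (row[1] or "").strip()
    if "type" not in info:
        raise ValueError("no Type row found; was EXTENDED specified?")
    return info


# Illustrative rows shaped like DESCRIBE TABLE EXTENDED output (values are invented).
sample_rows = [
    ("Location", "string", None),  # a data column that happens to be called Location
    ("", "", ""),
    (DETAIL_MARKER, "", ""),
    ("Type", "EXTERNAL", ""),
    ("Provider", "delta", ""),
    ("Location", "abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Files/myexternaltable", ""),
]
print(table_kind(sample_rows))
```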
DROP EXTERNAL TABLE
Using Shortcuts
• Shortcut in the Tables section (managed)
• Shortcut in the Files section (unmanaged)
List Tables
External Table from Shortcut
Key Differences
• Metadata handling
• Data persistence when the table is dropped
• Flexibility over data location
• Power BI and SQL endpoint operability
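To make the first three differences concrete (metadata handling, persistence on drop, data location), here is a deliberately tiny in-memory model in plain Python, not Spark and not the Fabric metastore: one dictionary plays the catalog, another plays storage. Dropping a managed table deletes both the catalog entry and its folder; dropping an external table deletes only the entry. ToyLakehouse and its methods are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyLakehouse:
    """Toy stand-in: `catalog` plays the metastore, `storage` plays OneLake folders."""
    catalog: Dict[str, dict] = field(default_factory=dict)
    storage: Dict[str, List[dict]] = field(default_factory=dict)

    def create_table(self, name: str, rows: List[dict],
                     location: Optional[str] = None) -> None:
        managed = location is None
        # Managed tables always land under Tables/; external ones go where the caller says.
        path = location if location is not None else f"Tables/{name}"
        self.catalog[name] = {"managed": managed, "path": path}
        self.storage.setdefault(path, []).extend(rows)

    def drop_table(self, name: str) -> None:
        meta = self.catalog.pop(name)  # the metadata entry is always removed
        if meta["managed"]:
            self.storage.pop(meta["path"], None)  # files are removed only for managed tables


lh = ToyLakehouse()
lh.create_table("train_schedule", [{"id": 1}])
lh.create_table("myexternaltable", [{"id": 2}], location="Files/myexternaltable")
lh.drop_table("train_schedule")
lh.drop_table("myexternaltable")
print(lh.catalog)          # {}
print(sorted(lh.storage))  # ['Files/myexternaltable']
```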
One Use Case for Managed Tables
• Scenario: Ephemeral data processing
• Description: A data engineering pipeline processes temporary data for analytical or intermediate computations.
• Rationale: Managed tables are ideal here because they make cleanup easy. When the table is dropped, both metadata and data are deleted, which suits temporary or transient data that does not need to persist beyond the life of the processing job.
• Example:
CREATE TABLE temp_user_sessions
USING DELTA
AS SELECT * FROM raw_user_sessions WHERE session_date = '2024-02-02';
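Because dropping a managed table also removes its files, cleanup in the ephemeral scenario comes down to making sure the DROP always runs, even if the job fails. A minimal sketch, assuming you pass in a callable that executes Spark SQL (for example spark.sql); the helper name temporary_managed_table is hypothetical, and the recording list in the example only stands in for Spark so the snippet runs anywhere.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def temporary_managed_table(execute: Callable[[str], object], name: str,
                            select_sql: str) -> Iterator[str]:
    """Create a managed Delta table from a query and always drop it afterwards."""
    execute(f"CREATE TABLE {name} USING DELTA AS {select_sql}")
    try:
        yield name
    finally:
        # No LOCATION clause was used, so the table is managed and DROP removes its data too.
        execute(f"DROP TABLE IF EXISTS {name}")


# Stand-in for spark.sql that just records the statements it receives.
issued: List[str] = []
with temporary_managed_table(
        issued.append, "temp_user_sessions",
        "SELECT * FROM raw_user_sessions WHERE session_date = '2024-02-02'") as tbl:
    issued.append(f"SELECT COUNT(*) FROM {tbl}")

for stmt in issued:
    print(stmt)
```

With a real session the call would be temporary_managed_table(spark.sql, ...), and the body of the with block would run the intermediate computations.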
Use Case for External Tables
• Scenario: Long-term integration with external data storage
• Description: A company stores its data in a data lake such as ADLS or S3 and wants to make it queryable via Spark, but also plans to access the same data with other tools or services outside the existing environment, such as Microsoft Fabric, for governance purposes.
• Rationale: External tables make sense because the data stays in place even if the table definitions in Spark are removed. This flexibility is crucial when the underlying data must be durable and outlive the metadata definitions within the queryable ecosystem.
• Example:
CREATE EXTERNAL TABLE user_profiles
USING PARQUET
LOCATION 'Files/external/user_profiles/';
Migrate HMS Metadata from Synapse
01 Export metadata from the source HMS
02 Import metadata into the Fabric lakehouse
03 Verify that metadata and data are available
https://learn.microsoft.com/en-us/fabric/data-engineering/migrate-synapse-hms-metadata
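For step 03, one lightweight check is to compare the table inventory of the source metastore with what the Fabric lakehouse reports after import. The sketch assumes both sides have already been reduced to {table_name: table_type} dictionaries (for example from SHOW TABLES plus DESCRIBE TABLE EXTENDED); compare_inventories and the sample data are hypothetical, and it checks metadata only, not row counts or file contents. Type differences are reported for review rather than treated as errors, since whether MANAGED/EXTERNAL is preserved depends on how the import recreates the tables.

```python
from typing import Dict, List, Tuple


def compare_inventories(source: Dict[str, str],
                        target: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (tables missing in target, tables whose MANAGED/EXTERNAL type differs)."""
    # Hive-style catalogs treat table names case-insensitively, so normalise them.
    src = {name.lower(): kind.upper() for name, kind in source.items()}
    tgt = {name.lower(): kind.upper() for name, kind in target.items()}
    missing = sorted(set(src) - set(tgt))
    type_changed = sorted(t for t in set(src) & set(tgt) if src[t] != tgt[t])
    return missing, type_changed


source = {"salesorders": "MANAGED", "products": "MANAGED", "user_profiles": "EXTERNAL"}
target = {"SalesOrders": "MANAGED", "user_profiles": "MANAGED"}
print(compare_inventories(source, target))  # (['products'], ['user_profiles'])
```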
Q&A & References
• From Aitor Murguzur's blogs:
https://murggu.medium.com/migrating-spark-catalog-to-fabric-lakehouse-cc8c14f0f0e1
https://murggu.medium.com/creating-managed-and-external-spark-tables-in-fabric-lakehouse-ef6212e75e81
• Spark Data Engineering Patterns – Shortcuts and External tables (Azure Synapse Analytics YouTube channel):
https://www.youtube.com/watch?v=AObKOOVHRv4&t=300s
