4. Inside Fabric Lakehouse
Delta Lake Tables
• Delta Lake is the default table format in Fabric Lakehouses
• Brings reliability, performance, and simplicity to data lakes
• Supports ACID transactions, schema enforcement, and time travel
5. Fabric Lakehouse Storage and Metadata
[Diagram: a Fabric workspace containing a Lakehouse that exposes Tables and Files in OneLake]
• OneLake storage is backed by ADLS Gen2; shortcuts can also reference external stores (ADLS Gen2, AWS S3, etc.)
• An internal Hive Metastore (HMS), a Fabric-managed service, holds table schemas and metadata, comparable to SQL Server's INFORMATION_SCHEMA tables and a database's file metadata
• Managed tables: both data and metadata are handled by the HMS
• External tables: only metadata is handled by the HMS
• Metadata is surfaced to Power BI and the Fabric workspace
• Managed table data path: abfss://<>@onelake.dfs.fabric.microsoft.com/<>/Tables/products
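The managed-table path above follows a fixed pattern. As a small sketch (the helper name and the sample workspace/lakehouse values are hypothetical placeholders, not real IDs), the ABFSS URI can be composed from the workspace, lakehouse, and table names:

```python
def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the OneLake ABFSS URI under which Fabric stores a managed table's data."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}/Tables/{table}"
    )

# Hypothetical workspace and lakehouse names, for illustration only
print(onelake_table_path("my_workspace", "my_lakehouse.Lakehouse", "products"))
```

Any Spark reader that understands ABFSS can load the Delta files directly from such a path.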
6. Delta Lake Tables in Fabric
Managed table:
%%sql
CREATE TABLE salesorders
(
    Orderid INT NOT NULL,
    OrderDate TIMESTAMP NOT NULL,
    CustomerName STRING,
    SalesTotal FLOAT NOT NULL
)
USING DELTA

External table:
%%sql
CREATE TABLE MyExternalTable
USING DELTA
LOCATION 'Files/mydata'

DeltaTableBuilder API:
from delta.tables import *

DeltaTable.create(spark) \
    .tableName("products") \
    .addColumn("Productid", "INT") \
    .addColumn("ProductName", "STRING") \
    .addColumn("Category", "STRING") \
    .addColumn("Price", "FLOAT") \
    .execute()
7. Managed Tables
• Fabric handles both data and metadata
• Data is stored in the Lakehouse's Tables directory
• Metadata is kept in the metastore, including information about the Lakehouse, tables, schemas, etc.
• Dropping a table removes ALL data and metadata
8. Creating Managed Tables
1. Save a DataFrame as a Delta table:
df = spark.read.load('Files/train_schedule.csv',
                     format='csv', header=True)
# Save the dataframe as a delta table
df.write.format("delta").saveAsTable("train_schedule")

2. Spark SQL:
%%sql
CREATE TABLE salesorders (
    Orderid INT NOT NULL,
    OrderDate TIMESTAMP NOT NULL,
    CustomerName STRING,
    SalesTotal FLOAT NOT NULL
) USING DELTA

3. DeltaTableBuilder API:
from delta.tables import *

DeltaTable.create(spark) \
    .tableName("products") \
    .addColumn("Productid", "INT") \
    .addColumn("ProductName", "STRING") \
    .addColumn("Category", "STRING") \
    .addColumn("Price", "FLOAT") \
    .execute()

4. Other formats (non-Delta managed tables):
df.write.format("csv").saveAsTable("mytable_csv")
df.write.format("json").saveAsTable("mytable_json")
df.write.format("parquet").saveAsTable("mytable_parquet")
12. External Tables
• The metastore handles metadata only
• You specify the external location where the table data is stored
• Dropping the table removes the metadata, BUT the data persists externally
20. Key Differences
• Metadata handling: managed tables have both data and metadata handled by the metastore; external tables have metadata only
• Data persistence when table dropped: managed tables lose both data and metadata; external tables keep their data
• Flexibility over data location: managed table data always lives in the Lakehouse Tables directory; external table data lives at a location you specify
• Power BI & SQL endpoint operability: managed Delta tables are automatically surfaced through the SQL analytics endpoint and Power BI; external tables are not automatically exposed there
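The drop semantics compared above can be illustrated with a toy model (this is not Fabric code: a dict stands in for the metastore and another dict stands in for storage, purely to show which pieces survive a DROP TABLE):

```python
storage = {}    # path -> data files (stands in for OneLake / external storage)
metastore = {}  # table name -> {"location": ..., "managed": ...}

def create_table(name, managed, location=None):
    # Managed tables get a path under Tables/; external tables use the given location
    path = location or f"Tables/{name}"
    storage[path] = ["part-0001.parquet"]
    metastore[name] = {"location": path, "managed": managed}

def drop_table(name):
    entry = metastore.pop(name)     # metadata is always removed
    if entry["managed"]:
        storage.pop(entry["location"])  # a managed drop also deletes the data

create_table("temp_sessions", managed=True)
create_table("user_profiles", managed=False,
             location="Files/external/user_profiles")
drop_table("temp_sessions")
drop_table("user_profiles")
print(sorted(storage))  # only the external table's data remains
```

After both drops, the metastore is empty but the external files are still in place, which is exactly the persistence difference the comparison describes.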
21. One Use Case for Managed Tables
• Scenario: Ephemeral Data Processing
• Description: A data engineering pipeline processes temporary data for analytical or
intermediate computations.
• Rationale: Managed tables are ideal here because they provide ease of cleanup. When
the table is dropped, both metadata and data are deleted, which is perfect for temporary
or transient data that does not need to persist beyond the life of the processing job.
CREATE TABLE temp_user_sessions
USING DELTA
AS SELECT * FROM raw_user_sessions WHERE session_date = '2024-02-02';
22. Use Case for External Tables
• Scenario: Long-term External Data Storage Integration
• Description: A company stores its data in a data lake such as ADLS or S3 and wants to make it queryable via Spark, but also plans to access this data using other tools or services outside of the existing environment, such as MS Fabric, for governance purposes.
• Rationale: External tables make sense as they allow the data to remain in place even if the table definitions in Spark are removed. This flexibility is crucial for scenarios where the underlying data must be durable and outlive the metadata definitions within the queryable ecosystem.

CREATE EXTERNAL TABLE user_profiles
USING PARQUET
LOCATION 'Files/external/user_profiles/';
23. Migrate HMS Metadata from Synapse
01. Export metadata from the source HMS
02. Import metadata into the Fabric lakehouse
03. Verify that metadata and data are available
https://learn.microsoft.com/en-us/fabric/data-engineering/migrate-synapse-hms-metadata
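Step 03 above amounts to checking that everything exported from the source HMS actually shows up in the Fabric lakehouse. A minimal sketch of such a check (the function name and the sample table lists are hypothetical; in practice the two lists would come from the source metastore and from the lakehouse, e.g. via `SHOW TABLES`):

```python
def verify_migration(source_tables, fabric_tables):
    """Return the source tables that did not arrive in the Fabric lakehouse."""
    return sorted(set(source_tables) - set(fabric_tables))

# Hypothetical table lists for illustration
missing = verify_migration(
    ["salesorders", "products", "train_schedule"],
    ["salesorders", "products"],
)
print(missing)  # any tables still to migrate
```

An empty result means the metadata migration is complete; data availability would still need a row-count or sample-read check per table.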