Incremental Iceberg Table Replication At Scale.pptx

Incremental Iceberg Table Replication At Scale
Open Lakehouse Meetup | Mountain View | March 11, 2025
Szehon Ho
Software Engineer at Databricks
Apache Iceberg PMC Member

Did You Know It’s Hard to Copy an Iceberg Table?
● Iceberg Table Spec chose Absolute Paths
○ That’s great, as there’s no ambiguity about which files the table refers to
● But, some things become hard
○ Migrating a table
○ Copying a table
○ Backup and Disaster Recovery

Absolute Paths
All references in Iceberg are absolute paths:
● Catalogs
● Metadata.json (table version file)
● Manifest-List (snapshot file)
● Manifest File
● Position Delete Files (V2)
● Puffin Files (V3)
Copying an Iceberg table to another location will break all these references.

Current Options
Option 1: Rewrite all rows with SQL
CREATE TABLE target_table AS SELECT * FROM source_table;
INSERT INTO target_table SELECT * FROM source_table;
● Expensive (processes all rows)
● Loses snapshot and version history
Option 2: Copy the files and add them to a new table
1. Find all of source_table’s current files
2. DistCP the source files
3. CREATE TABLE target_table
4. For each copied file, ADD FILE to target_table
● Loses snapshot and version history
● Delete files are broken (their file references are invalid)
● Files added with different schemas or partition specs will be wrong

Current Copy Table Workflow
Table in Source Location => Table in Target Location
1. Rewrite all metadata files, replacing prefixes in absolute paths
2. Save these to staging location
3. Find a list of files to copy (staging metadata files + original data files)
4. External Copy Tool (e.g., DistCP) from Source to Target
5. Check Integrity (check all references are valid)
6. Register latest copied metadata file in target catalog as new table
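
Step 1 of the workflow above can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical dictionary in place of a real metadata.json file (the field names are illustrative only) and replaces the source prefix on every path it finds. It is not the implementation used by Iceberg.

```python
def rewrite_prefix(node, source, target):
    """Return a copy of node with every path under source re-rooted at target."""
    source = source.rstrip("/")
    target = target.rstrip("/")
    if isinstance(node, str):
        # Match on a path boundary so "/source2/..." is left alone.
        if node == source or node.startswith(source + "/"):
            return target + node[len(source):]
        return node
    if isinstance(node, list):
        return [rewrite_prefix(item, source, target) for item in node]
    if isinstance(node, dict):
        return {key: rewrite_prefix(value, source, target) for key, value in node.items()}
    return node  # numbers, booleans, None are kept as-is


# Hypothetical, simplified stand-in for a metadata.json file.
metadata = {
    "location": "/source",
    "metadata-log": ["/source/metadata-1.json"],
    "snapshots": [{"snapshot-id": 1, "manifest-list": "/source/snap-1.avro"}],
}

rewritten = rewrite_prefix(metadata, "/source", "/target")
print(rewritten["metadata-log"])  # ['/target/metadata-1.json']
```

The original dictionary is left untouched, which mirrors the workflow: the rewritten copies go to a staging location while the source table stays valid.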

RewriteTablePaths SparkAction
Args = (Source Location, Target Location, Staging Location)
1. Rewrite all metadata files with absolute paths, replacing source => target
2. Save these to staging location
Return Values
3. List of files to copy (staging metadata files + original data files)
4. Name of latest metadata.json rewritten (will be target table)
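
As a rough illustration of how the returned file list might feed an external copy tool, the sketch below pairs each file with its destination under the target prefix. The function name `build_copy_plan` and the plain list-of-pairs format are assumptions made for this example, not the action's actual output format.

```python
import posixpath


def build_copy_plan(staged_metadata, data_files, source, target, staging):
    """Pair every file to copy with its destination under the target prefix."""

    def rebase(path, old_root, new_root):
        # Keep the path relative to old_root and re-root it under new_root.
        return posixpath.join(new_root, posixpath.relpath(path, old_root))

    # Rewritten metadata files are copied from staging to target.
    plan = [(path, rebase(path, staging, target)) for path in staged_metadata]
    # Data files are not rewritten, so they are copied from source to target.
    plan += [(path, rebase(path, source, target)) for path in data_files]
    return plan


plan = build_copy_plan(
    staged_metadata=["/staging/metadata-2.json", "/staging/snap-1.avro"],
    data_files=["/source/data-1.parquet", "/source/data-2.parquet"],
    source="/source",
    target="/target",
    staging="/staging",
)
for src, dst in plan:
    print(src, "=>", dst)
```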

Typical Table
Commit Adds:
● New Version
● New Snapshot
● New Manifest
● New Data File
References
● Down
● Back
/source/metadata-2.json
MetadataLog
● /source/metadata-1.json
Snapshots
● /source/snap-1.avro
Snapshots
/source/snap-1.avro
Manifests
● /source/m1.avro
/source/snap-2.avro
Manifests
● /source/m1.avro
● /source/m2.avro
/source/manifest-1.avro
DataFile
● /source/data-1.parquet
DataFiles
/source/data-2.parquet
Source_Table

MetadataLog
Snapshots
Snapshots
/source/snap-1.avro
Manifests
● /source/m1.avro
/source/snap-2.avro
Manifests
● /source/m1.avro
● /source/m2.avro
DataFile
DataFiles
RewriteTablePaths (source_table)
Replace /source/… => /target/…
Source_Table

Rewrite metadata files to /staging
● /source => /target
Data Files are not rewritten
Return
● Staged Metadata File Paths
● Data File Paths
/staging/metadata-2.json
MetadataLog
● /target/metadata-1.json
Snapshots
● /target/snap-1.avro
Snapshots
/staging/snap-1.avro
Manifests
● /target/m1.avro
Manifests
● /target/m1.avro
● /target/m2.avro
/staging/manifest-1.avro
DataFile
● /target/data-1.parquet
DataFiles

DistCP
● Metadata Files: /staging => /target
● Data Files: /source => /target
RegisterTable
● => target_table
/target/metadata-2.json
MetadataLog
Snapshots
Snapshots
/target/snap-1.avro
Manifests
● /target/m1.avro
/target/snap-2.avro
Manifests
● /target/m1.avro
● /target/m2.avro
/target/manifest-1.avro
DataFile
DataFiles
/target/data-2.parquet
Target_Table

Problem: Incremental Copy
How to keep a table in sync, if the source changes?
RewriteTablePaths Incremental Mode (vs Full Mode)
1. Optional ‘start_version’ and ‘end_version’ arguments
2. Copy only files that are added in this range (between these version files)
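
One way to picture incremental mode is as a set difference: everything reachable from the end version minus everything reachable from the start version. The sketch below uses a hypothetical in-memory model of the two-commit table from the earlier slides and assumes the start version has already been copied. The real action may compute the delta differently.

```python
# Hypothetical, simplified model: version file -> snapshots -> manifests -> data files.
VERSIONS = {
    "metadata-1.json": ["snap-1.avro"],
    "metadata-2.json": ["snap-1.avro", "snap-2.avro"],
}
SNAPSHOTS = {
    "snap-1.avro": ["m1.avro"],
    "snap-2.avro": ["m1.avro", "m2.avro"],
}
MANIFESTS = {
    "m1.avro": ["data-1.parquet"],
    "m2.avro": ["data-2.parquet"],
}


def reachable_files(version):
    """All files reachable from a version file, including the version file itself."""
    files = {version}
    for snapshot in VERSIONS[version]:
        files.add(snapshot)
        for manifest in SNAPSHOTS[snapshot]:
            files.add(manifest)
            files.update(MANIFESTS[manifest])
    return files


def files_added_between(start_version, end_version):
    """Files reachable from end_version that were not reachable from start_version."""
    return reachable_files(end_version) - reachable_files(start_version)


added = sorted(files_added_between("metadata-1.json", "metadata-2.json"))
print(added)  # ['data-2.parquet', 'm2.avro', 'metadata-2.json', 'snap-2.avro']
```

Only the files from the second commit appear in the result, so the first commit's files are not copied again.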

RewriteTablePaths (source_table)
Start_version = metadata2.json
End_version = metadata2.json
Identify files added between two versions
MetadataLog
Snapshots
Snapshots
/source/snap-1.avro
Manifests
● /source/m1.avro
/source/snap-2.avro
Manifests
● /source/m1.avro
● /source/m2.avro
DataFile
DataFiles
Source_Table

Rewrite Metadata Files in range
● /source => /target
Data Files NOT rewritten
Returns
● Staged Metadata File Paths
● Data File Paths in Range
MetadataLog
Snapshots
Manifests
● /target/m1.avro
● /target/m2.avro
DataFiles

Target Side (Existing Table)
Snapshots
/target/snap-1.avro
Manifests
● /target/m1.avro
DataFile
Target_table

DistCP
● Metadata Files: /staging => /target
● Data Files: /source => /target
Register (target_table)
Historic references in target automatically re-established
MetadataLog
Snapshots
Snapshots
/target/snap-1.avro
Manifests
● /target/m1.avro
/target/snap-2.avro
Manifests
● /target/m1.avro
● /target/m2.avro
DataFile
DataFiles
Target_table

Problem: More Scenarios
1. Snapshot Expiration
2. WAP (Write Audit Publish)
3. Branching
4. Rollback

Expire Snapshots
Common Cleanup Operation
Deletes files that are no longer referenced by the latest snapshots
Older version files are left referencing invalid snapshots
V0 () => [Append] => V1 (S1) => [Append] => V2 (S1, S2) => [Expire] => V3 (S2)
S1 Files S2 Files
S2 Files
S2 Files

Error Check
RewriteTablePaths disallows any invalid version between the start and end versions
All references must be valid
V1, V2 disallowed
V0 () => [Append] => V1 (S1) => [Append] => V2 (S1, S2) => [Expire] => V3 (S2)
S1 Files S1 Files
S2 Files
S2 Files
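
The error check can be sketched as follows, using the V0 to V3 example from the slide. This is a simplified illustration that assumes an inclusive start/end range and models each version only by the snapshots it references. The function names are hypothetical.

```python
# Snapshots referenced by each version file, as on the slide (V0 is empty).
VERSION_SNAPSHOTS = {"V0": [], "V1": ["S1"], "V2": ["S1", "S2"], "V3": ["S2"]}


def invalid_versions(version_snapshots, live_snapshots):
    """Versions referencing at least one snapshot that is no longer live."""
    return [
        version
        for version, snapshots in version_snapshots.items()
        if any(snapshot not in live_snapshots for snapshot in snapshots)
    ]


def check_range(version_snapshots, start, end, live_snapshots):
    """Return the versions from start to end (inclusive), or raise if any is invalid."""
    names = list(version_snapshots)  # dicts keep insertion order
    window = names[names.index(start): names.index(end) + 1]
    invalid = invalid_versions(version_snapshots, live_snapshots)
    bad = [version for version in window if version in invalid]
    if bad:
        raise ValueError(f"invalid versions in range: {bad}")
    return window


live = {"S2"}  # S1 was expired, so its files are gone
print(invalid_versions(VERSION_SNAPSHOTS, live))         # ['V1', 'V2']
print(check_range(VERSION_SNAPSHOTS, "V3", "V3", live))  # ['V3']
```

A range such as V0 to V3 raises an error in this sketch because it passes through V1 and V2, which still reference the expired snapshot S1.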

Branching, Rollback
These do not delete files, so no snapshots are invalidated
(All versions may be copied here)

Problem: Concurrency
1. Expire Snapshots runs concurrently with RewriteTablePaths or DistCP
2. Files are deleted, causing failures
3. Solution: Schedule these jobs non-concurrently

Problem: Initial Copy
Too much data in initial copy (DistCP)
Solution
1. Semantically equivalent to copy + expire
2. Possible to filter source metadata.json during rewrite, for all snapshots after X
3. Limit further rewrite to references from snapshots after X
4. But all future incremental copies must also apply the same filter (otherwise we bring back snapshots unknown to the target table)
/source/metadata.json
Snapshots: [S1, S2, S3]
/target/metadata.json
Snapshots: [S3]
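
The need to reapply the same filter can be shown with a small sketch. It assumes snapshots are listed in commit order, uses S2 as the cutoff X so that only S3 is kept (matching the example above), and introduces a hypothetical later snapshot S4 to stand for a subsequent incremental copy.

```python
def snapshots_after(snapshots, cutoff):
    """Snapshots committed after cutoff, assuming the list is in commit order."""
    return snapshots[snapshots.index(cutoff) + 1:]


CUTOFF = "S2"  # plays the role of X in the list above

# Initial copy: the target only learns about snapshots after the cutoff.
initial_target = snapshots_after(["S1", "S2", "S3"], CUTOFF)
print(initial_target)  # ['S3']

# Later incremental copy (S4 is hypothetical): the same filter must be applied.
later_source = ["S1", "S2", "S3", "S4"]
incremental_target = snapshots_after(later_source, CUTOFF)
print(incremental_target)  # ['S3', 'S4']

# Snapshots an unfiltered incremental copy would bring back into the target.
resurrected = sorted(set(later_source) - set(incremental_target))
print(resurrected)  # ['S1', 'S2']
```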

Current State
1. RewriteTablePaths Interface was added in Iceberg 1.7.0
2. RewriteTablePaths Spark Action and rewrite_table_paths Procedure were released in Iceberg 1.8.0
3. Relative Path discussions for Iceberg V4
