SlideShare a Scribd company logo
Accelerating Data
Ingestion with
Databricks Autoloader
Simon Whiteley
Director of Engineering, Advancing Analytics
Agenda
▪ Why Incremental is Hard
▪ Autoloader Components
▪ Implementation
▪ Evolution
▪ Lessons
Why Incremental is Hard
Incremental Ingestion
BRONZE SILVER
LANDING
?
Incremental Ingestion
▪ Only Read New Files
▪ Don’t Miss Files
▪ Trigger Immediately
▪ Repeatable Pattern
▪ Fast over large directories
?
Existing Patterns – 1) ETL Metadata
etl batch read
{“lastRead”:”2021/05/26”}
Contents:
• /2021/05/24/file 1
• /2021/05/25/file 2
• /2021/05/26/file 3
• /2021/05/27/file 4
.load(f“/{loadDate}/”
Existing Patterns – 2) Spark File Streaming
file stream read
Contents:
• File 1
• File 2
• File 3
• File 4
Checkpoint:
• File 1
• File 2
• File 3
Existing Patterns – 3) DIY
triggered batch read
Blob File
Trigger
Logic
App
Azure
Function
Databricks
Job API
Incremental Ingestion Approaches
Approach Good At Bad At
Metadata ETL Repeatable
Not immediate,
requires polling
File Streaming
Repeatable
Immediate
Slows down over
large directories
DIY Architecture
Immediate
Triggering
Not Repeatable
Databricks Autoloader
Prakash Chockalingam
Databricks Engineering Blog
Auto Loader is an optimized cloud file
source for Apache Spark that loads
data continuously and efficiently from
cloud storage as new data arrives.
What is Autoloader?
Essentially, Autoloader combines our three approaches of:
• Storing Metadata about what has been read
• Using Structured Streaming for immediate processing
• Utilising Cloud-Native Components to optimise identifying
arriving files
There are two parts to the Autoloader job:
• CloudFiles DataReader
• CloudNotification Services (optional)
Cloudfiles Reader
Blob Storage
Blob Storage Queue
{“fileAdded”:”/landing/file 4.json”
• File 1.json
• File 2.json
• File 3.json
• File 4.json
Dataframe
Check Files in
Queue
Read specific files
from source
CloudFiles DataReader
df = ( spark
.readStream
.format(“cloudfiles”)
.option(“cloudfiles.format”,”json”)
.option(“cloudfiles.useNotifications”,”true”)
.schema(mySchema)
.load(“/mnt/landing/”)
)
Tells Spark to use
Autoloader
Tells Autoloader to
expect JSON files
Should Autoloader use
the Notification Queue
Cloud Notification Services - Azure
Blob Storage
Event Grid Topic
Event Grid Subscription Blob Storage Queue
Event Grid Subscription Blob Storage Queue
Event Grid Subscription Blob Storage Queue
Cloud Notification Services - Azure
Blob Storage
New File Arrives,
Triggers Event Topic
Subscription checks
message filters,
inserts into queue
{fileAdded:“/file 4/”}
NotificationServices Config
cloudFiles
.useNotifications – Directory Listing VS Notification Queue
.queueName – Use an Existing Queue
.connectionString – Queue Storage Connection
.subscriptionId
.resourceGroup
.tenantId
.clientId
.clientSecret
Service Principal for Queue Creation
Implementing Autoloader
▪ Setup Steps
▪ Reading New Files
▪ A Basic ETL Setup
Delta Implementation
Practical Implementations
BRONZE SILVER
LANDING
Autoloader
Low Frequency Streams
Autoloader
One
File Per
Day
24/7
Cluster
Low Frequency Streams
Autoloader
One
File Per
Day
1/7
Cluster df
.writeStream
.trigger(once=True)
.save(path)
Autoloader can be combined with trigger.Once
– each run finds only files not processed since
last run
Delta Merge
Autoloader
Merge?
Delta Merge
Autoloader
df
.writeStream
.foreachBatch(runThis)
.save(path)
def runThis(df, batchId):
(df
.write
.save(path)
)
Delta Implementation
▪ Batch ETL Pattern
▪ Merge Statements
▪ Logging State
Evolving Schemas
New Features since Databricks Runtime 8.2
What is Schema Evolution?
{“ID”:1,“ProductName”:“Belt”}
{“ID”:2,“ProductName”:“T-Shirt”,”Size”:”XL”}
{“ID”:3,“ProductName”:“Shirt”,“Size”:“14”,
“Care”:{ “DryClean”: “Yes”,
“MachineWash”:“Don’t you dare”
}
}
How do we handle Evolution?
1. Fail the Stream
2. Manually Intervene
3. Automatically Evolve
In order to manage schema evolution, we need to know:
• What the schema is expected to be
• What the schema is now
• How we want to handle any changes in schema
Schema Inference
In Databricks 8.2 Onwards – simply don’t provide a
Schema to enable Schema Inference. This infers the
schema once when the stream is started and stores it as
metadata.
cloudfiles
.schemaLocation – where to store the schema
.inferColumnTypes – sample data to infer types
.schemaHints – manually specify data types for certain columns
Schema Metastore
_schemas
{“ID”:1,
“ProductName”:“Belt”}
{
"type": "struct",
"fields": [
{
"name": "ID",
"type": “string",
"nullable": true,
"metadata": {}
},
{
"name": "ProductName",
"type": “string",
"nullable": true,
"metadata": {}
}
]
}
0
On First Read
Schema Metastore – DataType Inference
_schemas
{“ID”:1,
“ProductName”:“Belt”}
{
"type": "struct",
"fields": [
{
"name": "ID",
"type": “int",
"nullable": true,
"metadata": {}
},
{
"name": "ProductName",
"type": “string",
"nullable": true,
"metadata": {}
}
]
}
0
On First Read
.option(“cloudFiles.inferColumnTypes”,”True”)
Schema Metastore – Schema Hints
_schemas
{“ID”:1,
“ProductName”:“Belt”}
{
"type": "struct",
"fields": [
{
"name": "ID",
"type": “long",
"nullable": true,
"metadata": {}
},
{
"name": "ProductName",
"type": “string",
"nullable": true,
"metadata": {}
}
]
}
0
On First Read
.option(“cloudFiles.schemaHints”,”ID long”)
Schema Evolution
cloudFiles.schemaEvolutionMode
• addNewColumns – Fail the job, update the schema
metastore
• failOnNewColumns – Fail the job, no updates made
• rescue – Do not fail, pull all unexpected data into
_rescued_data
To allow for schema evolution, we can include a
schema evolution mode option:
Evolution Reminder
{“ID”:1,“ProductName”:“Belt”}
{“ID”:2,“ProductName”:“T-Shirt”,”Size”:”XL”}
{“ID”:3,“ProductName”:“Shirt”,“Size”:“14”,
“Care”:{ “DryClean”: “Yes”,
“MachineWash”:“Don’t you dare”
}
}
1
2
3
Schema Evolution - Rescue
1
2
3
ID Product Name _rescued_data
1 Belt
ID Product Name _rescued_data
2 T-Shirt {“Size”:”XL”}
ID Product Name _rescued_data
3 Shirt {“Size”:”14”,”Care”:{“DryC…
Schema Evolution – Add New Columns
_schemas
{“ID”:2,
“ProductName”:“T-Shirt”,
“Size”:”XL”}
0
On Arrival
2
1
{
"type": "struct",
"fields": [
{
"name": "ID",
"type": “string",
},
{
"name": "ProductName",
"type": “string",
} ,
{
"name": “Size",
"type": “string",
}…
Schema Evolution
▪ Inference & The Schema
Metastore
▪ Schema Hints
▪ Schema Evolution
Lessons from an Autoloader Life
Autoloader Lessons
▪ EventGrid Quotas &
Settings
▪ Streaming Best
Practices
▪ Batching Best Practices
EventGrid Quota Lessons
• You can have 500 files from a single storage account
using the system topic
• Deleting checkpoint will reset the stream ID and create
a new Subscription/Queue, leaving an orphan set
• Use the CloudNotification Libraries to manage this
more closely with custom topics
Streaming Optimisation
• MaxBytesPerTrigger / MaxFilesPerTrigger
Manages the size of the streaming microbatch
• FetchParallelism
Manages the workload on your queue
Batch Lessons – Look for Lost Messages
Default 7 days!
Databricks Autoloader
▪ Reduces complexity of ingesting files
▪ Has some quirks in implementing ETL processes
▪ Growing number of schema evolution features
Simon Whiteley
Director of
Engineering
hello@advancinganalytics.co.uk
@MrSiWhiteley
www.youtube.com/c/AdvancingAnalytics
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
Brett VanderPlaats
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Amazon Web Services
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
Prakash Chockalingam
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
Snowflake Computing
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
Databricks
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
Databricks
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
Matei Zaharia
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
Eric Xiao
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
Jeffrey T. Pollock
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Paris Data Engineers !
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
Databricks
 
Data Quality With or Without Apache Spark and Its Ecosystem
Data Quality With or Without Apache Spark and Its EcosystemData Quality With or Without Apache Spark and Its Ecosystem
Data Quality With or Without Apache Spark and Its Ecosystem
Databricks
 

What's hot (20)

Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
Snowflake: Your Data. No Limits (Session sponsored by Snowflake) - AWS Summit...
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Data Quality With or Without Apache Spark and Its Ecosystem
Data Quality With or Without Apache Spark and Its EcosystemData Quality With or Without Apache Spark and Its Ecosystem
Data Quality With or Without Apache Spark and Its Ecosystem
 

Similar to Accelerating Data Ingestion with Databricks Autoloader

Essential Camel Components
Essential Camel ComponentsEssential Camel Components
Essential Camel Components
Christian Posta
 
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...Peter Keane
 
The Ball Launch on 2013 Microsoft TechDays Part 2/2
The Ball Launch on 2013 Microsoft TechDays Part 2/2The Ball Launch on 2013 Microsoft TechDays Part 2/2
The Ball Launch on 2013 Microsoft TechDays Part 2/2
Kallex
 
[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics
Ulrich Krause
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
Rich Lee
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
Fabio Fumarola
 
Using Document Databases with TYPO3 Flow
Using Document Databases with TYPO3 FlowUsing Document Databases with TYPO3 Flow
Using Document Databases with TYPO3 Flow
Karsten Dambekalns
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
Sergey Karayev
 
DSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & ConfigurationDSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & ConfigurationDuraSpace
 
Dspace4 150227090306-conversion-gate01
Dspace4 150227090306-conversion-gate01Dspace4 150227090306-conversion-gate01
Dspace4 150227090306-conversion-gate01
walaba06
 
Salesforce winter 16 release
Salesforce winter 16 releaseSalesforce winter 16 release
Salesforce winter 16 release
Jitendra Zaa
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the Basics
Ulrich Krause
 
The Ring programming language version 1.5.4 book - Part 7 of 185
The Ring programming language version 1.5.4 book - Part 7 of 185The Ring programming language version 1.5.4 book - Part 7 of 185
The Ring programming language version 1.5.4 book - Part 7 of 185
Mahmoud Samir Fayed
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
Unmesh Baile
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
Unmesh Baile
 
AWS glue technical enablement training
AWS glue technical enablement trainingAWS glue technical enablement training
AWS glue technical enablement training
Info Alchemy Corporation
 
The Ring programming language version 1.5.3 book - Part 7 of 184
The Ring programming language version 1.5.3 book - Part 7 of 184The Ring programming language version 1.5.3 book - Part 7 of 184
The Ring programming language version 1.5.3 book - Part 7 of 184
Mahmoud Samir Fayed
 
OroCRM Partner Technical Training: September 2015
OroCRM Partner Technical Training: September 2015OroCRM Partner Technical Training: September 2015
OroCRM Partner Technical Training: September 2015
Oro Inc.
 
BGOUG "Agile Data: revolutionizing database cloning'
BGOUG  "Agile Data: revolutionizing database cloning'BGOUG  "Agile Data: revolutionizing database cloning'
BGOUG "Agile Data: revolutionizing database cloning'
Kyle Hailey
 
Gradle
GradleGradle

Similar to Accelerating Data Ingestion with Databricks Autoloader (20)

Essential Camel Components
Essential Camel ComponentsEssential Camel Components
Essential Camel Components
 
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
The Role of Atom/AtomPub in Digital Archive Services at The University of Tex...
 
The Ball Launch on 2013 Microsoft TechDays Part 2/2
The Ball Launch on 2013 Microsoft TechDays Part 2/2The Ball Launch on 2013 Microsoft TechDays Part 2/2
The Ball Launch on 2013 Microsoft TechDays Part 2/2
 
[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
An introduction to maven gradle and sbt
An introduction to maven gradle and sbtAn introduction to maven gradle and sbt
An introduction to maven gradle and sbt
 
Using Document Databases with TYPO3 Flow
Using Document Databases with TYPO3 FlowUsing Document Databases with TYPO3 Flow
Using Document Databases with TYPO3 Flow
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
 
DSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & ConfigurationDSpace 4.2 Basics & Configuration
DSpace 4.2 Basics & Configuration
 
Dspace4 150227090306-conversion-gate01
Dspace4 150227090306-conversion-gate01Dspace4 150227090306-conversion-gate01
Dspace4 150227090306-conversion-gate01
 
Salesforce winter 16 release
Salesforce winter 16 releaseSalesforce winter 16 release
Salesforce winter 16 release
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the Basics
 
The Ring programming language version 1.5.4 book - Part 7 of 185
The Ring programming language version 1.5.4 book - Part 7 of 185The Ring programming language version 1.5.4 book - Part 7 of 185
The Ring programming language version 1.5.4 book - Part 7 of 185
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
 
Corporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbaiCorporate-informatica-training-in-mumbai
Corporate-informatica-training-in-mumbai
 
AWS glue technical enablement training
AWS glue technical enablement trainingAWS glue technical enablement training
AWS glue technical enablement training
 
The Ring programming language version 1.5.3 book - Part 7 of 184
The Ring programming language version 1.5.3 book - Part 7 of 184The Ring programming language version 1.5.3 book - Part 7 of 184
The Ring programming language version 1.5.3 book - Part 7 of 184
 
OroCRM Partner Technical Training: September 2015
OroCRM Partner Technical Training: September 2015OroCRM Partner Technical Training: September 2015
OroCRM Partner Technical Training: September 2015
 
BGOUG "Agile Data: revolutionizing database cloning'
BGOUG  "Agile Data: revolutionizing database cloning'BGOUG  "Agile Data: revolutionizing database cloning'
BGOUG "Agile Data: revolutionizing database cloning'
 
Gradle
GradleGradle
Gradle
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 

Recently uploaded (20)

【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 

Accelerating Data Ingestion with Databricks Autoloader