Deploying ETL to Cloud

What it takes to set up a production data pipeline starting from zero

Motivation
Our data is moving to the cloud, so it's natural that our data integration processes follow.
Cloud platforms are inherently better at infrastructure:
o Security
o Availability
o Trustworthiness

Which ETL workloads are candidates for the cloud?
Some ETL belongs on-prem, some belongs in the cloud.
Sometimes ETL location is not such an obvious decision.

Primary Sources   Primary Targets   Example Use Case          ETL Location
On-prem           On-prem           Reporting, Migration      On-prem
On-prem           Cloud             Big Data Analytics        ?
Cloud             On-prem           Enrichment                ?
Cloud             Cloud             Application integration   Cloud
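The table above can be read as a simple lookup. A minimal sketch, assuming the two "?" rows are left as open judgment calls; the function name and the use of None for undecided rows are illustrative, not part of the original slides.

```python
from typing import Optional

# Mapping copied from the table; None marks the rows the slide leaves open ("?").
ETL_LOCATION = {
    ("on-prem", "on-prem"): "on-prem",
    ("on-prem", "cloud"): None,
    ("cloud", "on-prem"): None,
    ("cloud", "cloud"): "cloud",
}


def suggested_etl_location(source: str, target: str) -> Optional[str]:
    """Return the ETL location from the table, or None when it is a judgment call."""
    return ETL_LOCATION[(source.lower(), target.lower())]
```

A None result means the decision depends on factors the table does not capture, such as the push and pull options shown next.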

Deploy ETL on-premise and push to the cloud
[Diagram: ETL runs on-premise and pushes data to the cloud]

Deploy ETL in the cloud and pull from on-premise
[Diagram: ETL runs in the cloud and pulls data from on-premise, which requires open ports]

Since we're here to talk about deploying ETL to the cloud, we'll assume that choice is made…

There is a range of cloud ETL deployment models
Fully managed ETL-as-a-service
o Quick to set up and operate
o Limited options if you find a capability is missing
Self-managed ETL
o Wide range of architecture options
o More control over ETL behavior
o More flexible licensing (perpetual, subscription)
o Costs are less predictable (infrastructure costs, labor costs)
The deployment model is tightly coupled to ETL vendor selection

We'll focus today on self-managed deployments.

Self-managed cloud ETL
Let's explore issues around:
o Architecture
o Security
o Costs
o Operations
Three real-world use cases

Case #1
Operating an Analytics and Reporting Warehouse

Insurance Company tracking applications for new policies
Field Agents submit application packages via SFTP
Multistage process to ingest, assess and load to warehouse
Nightly batches must be completed within SLA
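The multistage nightly batch with an SLA can be sketched roughly as below. This is an illustration only: the stage names follow the slide (ingest, assess, load), while `run_batch`, `SlaExceeded` and the injectable `clock` parameter are hypothetical names, not CloverDX or Azure APIs.

```python
import time
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], None]]


class SlaExceeded(RuntimeError):
    """Raised when the batch runs past its time budget."""


def run_batch(
    stages: List[Stage],
    sla_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float]]:
    """Run stages in order and return per-stage durations.

    Raises SlaExceeded as soon as the cumulative run time passes the budget,
    so an operator is alerted before the remaining stages start.
    """
    start = clock()
    timings = []
    for name, stage in stages:
        stage_start = clock()
        stage()
        now = clock()
        timings.append((name, now - stage_start))
        if now - start > sla_seconds:
            raise SlaExceeded(f"SLA of {sla_seconds}s exceeded after stage '{name}'")
    return timings
```

Recording per-stage timings makes it easier to see which stage is eating into the SLA budget as volumes grow.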

Case #1
Deployment Features
Azure Cloud
Hybrid ETL
o Fully managed via Azure Data Factory
o Self-managed CloverDX
Varied storage technology
Security services

Case #1
Architecture
[Architecture diagram. Components shown: SFTP, Firewall, Azure Data Factory (fully managed), CloverDX (self-managed), Azure Data Lake Storage, Azure Blob Storage, Azure SQL Database (Staging), Azure Database (Production), Azure Database (CloverDX), Azure Key Vault]

Case #2
High volume message processing

Ingest large volume of small data files
Incoming data transformed to canonical JSON, dispatched to downstream API
10,000 messages per minute
Guarantee each message delivered exactly once

Case #2
Deployment Features
AWS Cloud
Containerized ETL
o Scalability
Message queues
o Guaranteed message delivery
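For scale, 10,000 messages per minute is roughly 167 messages per second. One common way to get an "exactly once" effect on top of a queue that may redeliver is to make the consumer idempotent. The sketch below illustrates that idea with an in-memory queue; it is an assumption about how such a guarantee could be built, not the implementation used in this case. The names `drain`, `seen_ids` and `dispatch` are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Set, Tuple

Message = Tuple[str, Dict[str, Any]]  # (message_id, payload)


def drain(
    queue: Deque[Message],
    seen_ids: Set[str],
    dispatch: Callable[[Dict[str, Any]], None],
) -> int:
    """Process every queued message, dispatching each message_id at most once.

    Returns the number of messages actually dispatched downstream.
    """
    dispatched = 0
    while queue:
        message_id, payload = queue[0]  # peek, do not remove yet
        if message_id not in seen_ids:
            dispatch(payload)  # if this raises, the message stays queued for retry
            seen_ids.add(message_id)
            dispatched += 1
        queue.popleft()  # acknowledge only after successful handling
    return dispatched
```

In a real deployment `seen_ids` would need to live in durable shared storage, and the window between dispatching and recording the ID means the downstream API should ideally tolerate a rare duplicate as well.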

Case #2
Architecture
[Architecture diagram. Components shown: Inbound Message, Firewall, AWS API Gateway, AWS SQS Message Queue, Container Manager running multiple CloverDX instances, AWS S3 Storage, AWS RDS Database (CloverDX)]

Case #3
Integrating cloud CRM with back-office operations

Expedite response to CRM activity
Sales Quote in CRM triggers immediate action in back office
Relatively low volume

Case #3
Deployment Features
AWS Cloud
Serverless deployment (for convenience, not scale)
o Webhook handling
o ETL processor
o ETL database
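Webhook handling in a serverless function might look roughly like the sketch below. It assumes an AWS Lambda-style handler behind an HTTP proxy integration, where the request body arrives as a JSON string under `"body"`. The `event_type` field, the `"quote_created"` value and the `start_job` callback are hypothetical; a real CRM defines its own payload.

```python
import json
from typing import Any, Callable, Dict


def make_handler(start_job: Callable[[Dict[str, Any]], None]):
    """Build a webhook handler that triggers a back-office job for new quotes."""

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        try:
            body = json.loads(event.get("body") or "")
        except json.JSONDecodeError:
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        # Ignore CRM events that are not sales quotes (hypothetical field name).
        if not isinstance(body, dict) or body.get("event_type") != "quote_created":
            return {"statusCode": 204, "body": ""}
        start_job(body)  # hand off to the ETL processor
        return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}

    return handler
```

Returning 202 quickly and letting the ETL processor do the work keeps the webhook response fast, which fits the goal of triggering immediate action at low volume.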

Common patterns in Cloud ETL
o Componentization
o Landing zones
o Elasticity
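As one illustration of the landing zone pattern, the sketch below assumes a landing zone is a location where incoming files are dropped, then moved to an archive or a rejected area once processed. The directory layout and the function name `process_landing_zone` are hypothetical, and local folders stand in for cloud storage.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def process_landing_zone(
    landing: Path,
    archive: Path,
    rejected: Path,
    process: Callable[[Path], None],
) -> List[str]:
    """Process each file in the landing zone, then move it out of the way.

    Files that process cleanly go to the archive; files that raise go to rejected.
    Returns the names of the files handled, in processing order.
    """
    archive.mkdir(parents=True, exist_ok=True)
    rejected.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(p for p in landing.iterdir() if p.is_file()):
        try:
            process(path)
            destination = archive
        except Exception:
            destination = rejected
        shutil.move(str(path), str(destination / path.name))
        handled.append(path.name)
    return handled
```

Because every file leaves the landing zone after one pass, a rerun only sees new arrivals, and rejected files are kept for inspection instead of being lost.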

It can be difficult to estimate and control the costs of a cloud deployment.

Deploying ETL to the cloud requires different skills and expertise.

You need to develop the skills to choose and configure services
[Diagram: the Case #1 Azure architecture again, with CloverDX on a VM and Azure Data Factory as a service, alongside SFTP, Firewall, Azure Data Lake Storage, Azure Blob Storage, Azure Database (Staging), Azure Database (Production), Azure Database (CloverDX) and Azure Key Vault]

Have realistic expectations about the support provided by cloud platforms.

Not all services are guaranteed to be available.

Final Thought – Cloud Selection

Azure or AWS?
Most of our clients use one of these two providers (or both)
The decision has likely already been made by the business, independently of ETL needs
Our completely subjective view
o Azure has a better console user interface
o The Azure sales experience is friendlier to SMEs
o AWS has a larger number of services, and they are generally more feature-rich
o AWS is more googleable

www.cloverdx.com
About CloverDX Enterprise Data Management Platform
CloverDX is a data management platform for designing, automating and operating data jobs at scale. We've engineered CloverDX to solve complex data movement and transformation scenarios with a combination of a visual IDE for data jobs, the flexibility of coding, and extensible automation and orchestration features.
hello@cloverdx.com
