Deploying ETL to the cloud
What it takes to set up a production
data pipeline starting from zero
Our data is moving to the cloud; it's natural that our data integration processes follow.
Cloud platforms are inherently better at infrastructure:
o Security
o Availability
o Trustworthiness
Motivation
Some ETL belongs on-prem, some belongs in the cloud.
Sometimes ETL location is not such an obvious decision.
Which ETL workloads are candidates for the cloud?
Primary Sources | Primary Targets | Example Use Case        | ETL Location
On-prem         | On-prem         | Reporting, Migration    | On-prem
On-prem         | Cloud           | Big Data Analytics      | ?
Cloud           | On-prem         | Enrichment              | ?
Cloud           | Cloud           | Application integration | Cloud
Option 1: Deploy ETL on-premises and push data to the cloud
ON-PREMISES ETL → data push → CLOUD
Option 2: Deploy ETL in the cloud and pull data from on-premises
CLOUD ETL ← data pull ← ON-PREMISES (requires open ports on the on-premises side)
Since we're here to talk about deploying ETL to the cloud, we'll assume that choice has been made…
Fully-managed ETL-as-a-service
o Quick to set up and operate
o Limited options if you find a missing capability
Self-managed ETL
o Wide range of architecture options
o More control over ETL behavior
o More flexible licensing (perpetual, subscription)
o Less predictable costs (infrastructure costs, labor costs)
The deployment model is tightly coupled to ETL vendor selection.
There is a range of cloud ETL deployment models
We'll focus today on self-managed deployments.
Let's explore issues around:
o Architecture
o Security
o Costs
o Operations
Three real-world use cases
Self-managed cloud ETL
Case #1
Operating an Analytics and Reporting Warehouse
Insurance company tracking applications for new policies
Field agents submit application packages via SFTP
Multistage process to ingest, assess, and load into the warehouse
Nightly batches must be completed within the SLA
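The multistage nightly flow above can be sketched as a sequence of stage functions whose run time is measured against the batch SLA. This is a minimal illustration, not the CloverDX implementation used in the case study: `ingest`, `assess`, `load`, `run_nightly` and the `policy_id` field are hypothetical stand-ins for the real jobs.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical stand-ins for the real ingest / assess / load jobs.
def ingest(batch: List[Record]) -> List[Record]:
    # e.g. read application packages that field agents dropped on the SFTP landing area
    return [dict(rec, ingested=True) for rec in batch]

def assess(batch: List[Record]) -> List[Record]:
    # e.g. keep only applications that reference a policy id
    return [rec for rec in batch if rec.get("policy_id")]

def load(batch: List[Record]) -> List[Record]:
    # e.g. write to the production warehouse; here the records are only tagged
    return [dict(rec, loaded=True) for rec in batch]

@dataclass
class RunReport:
    records_out: int
    elapsed_s: float
    sla_s: float
    stage_times: Dict[str, float] = field(default_factory=dict)

    @property
    def within_sla(self) -> bool:
        return self.elapsed_s <= self.sla_s

def run_nightly(batch: List[Record],
                stages: List[Callable[[List[Record]], List[Record]]],
                sla_s: float,
                clock: Callable[[], float] = time.monotonic) -> RunReport:
    """Run the stages in order, timing each one and the whole batch against the SLA."""
    start = clock()
    stage_times: Dict[str, float] = {}
    data = batch
    for stage in stages:
        t0 = clock()
        data = stage(data)
        stage_times[stage.__name__] = clock() - t0
    return RunReport(len(data), clock() - start, sla_s, stage_times)
```

A scheduler would typically alert when `within_sla` is false; timing each stage separately shows which one is eating into the nightly window. The injectable `clock` makes the SLA logic testable without waiting.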
Case #1
Deployment Features
Azure Cloud
Hybrid ETL
o Fully-managed via Azure Data Factory
o Self-managed CloverDX
Varied storage technology
Security services
Case #1
Architecture
[Diagram: Azure cloud deployment behind a firewall, receiving files via SFTP. Components: CloverDX (self-managed), Azure Data Factory (fully managed), Azure Data Lake Storage, Azure Blob Storage, Azure SQL Database (Staging), Azure Database (Production), Azure Database (CloverDX), Azure Key Vault.]
Case #2
High volume message processing
Ingest a large volume of small data files
Incoming data is transformed to canonical JSON and dispatched to a downstream API
10,000 messages per minute
Guarantee that each message is delivered exactly once
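One way to approach the canonical JSON requirement is deterministic serialization, so the same input always produces byte-identical output. The sketch below assumes each small incoming file is CSV; `to_canonical` and the envelope fields (`message_id`, `source`, `event_type`, `payload`) are hypothetical, not a schema taken from the case study.

```python
import csv
import io
import json
from typing import Dict, List

def to_canonical(message_id: str, source: str, raw_csv: str) -> str:
    """Convert one small CSV file into a single canonical JSON document."""
    # Each CSV row becomes a dict keyed by the header line.
    rows: List[Dict[str, str]] = list(csv.DictReader(io.StringIO(raw_csv)))
    envelope = {
        "message_id": message_id,  # hypothetical: id assigned when the file was received
        "source": source,
        "event_type": "records",
        "payload": rows,
    }
    # sort_keys plus compact separators make the output deterministic:
    # identical input always yields byte-identical JSON.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":"))
```

Deterministic output also makes it straightforward to hash or compare messages downstream, which helps with de-duplication.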
Case #2
Deployment Features
AWS Cloud
Containerized ETL
o Scalability
Message queues
o Guaranteed message delivery
Case #2
Architecture
[Diagram: AWS cloud deployment. Components: firewall, AWS API Gateway, inbound messages, AWS SQS message queue, AWS S3 storage, AWS RDS database (CloverDX), and multiple CloverDX containers under a container manager.]
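Message queues typically guarantee at-least-once delivery, so the same message can arrive more than once (SQS standard queues, for example, can occasionally deliver duplicates). A common way to meet an exactly-once requirement is an idempotent consumer that remembers which message ids it has already dispatched. This is an illustrative sketch under that assumption; `IdempotentConsumer`, `drain` and the in-memory deque standing in for the queue are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Set, Tuple

Message = Tuple[str, str]  # (message_id, body)

class IdempotentConsumer:
    """Turns at-least-once delivery into an exactly-once *effect* by
    remembering which message ids have already been dispatched.

    In production the `seen` set would live in durable storage (for example a
    table in the ETL database); an in-memory set keeps the sketch self-contained.
    """

    def __init__(self, dispatch: Callable[[str, str], None]):
        self.dispatch = dispatch
        self.seen: Set[str] = set()

    def handle(self, msg: Message) -> bool:
        message_id, body = msg
        if message_id in self.seen:
            return False  # duplicate delivery: acknowledge it but do not re-send
        # Assumption: the downstream API accepts message_id as an idempotency key.
        self.dispatch(message_id, body)
        self.seen.add(message_id)
        return True

def drain(queue: Deque[Message], consumer: IdempotentConsumer) -> int:
    """Process everything currently in the queue; return how many were dispatched."""
    sent = 0
    while queue:
        if consumer.handle(queue.popleft()):
            sent += 1
    return sent
```

If the process crashes after `dispatch` but before the id is recorded, the message is redelivered and sent again; that gap is why the sketch relies on the downstream API treating the message id as an idempotency key.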
Case #3
Integrating cloud CRM with back-office operations
Expedite response to CRM activity
A sales quote in the CRM triggers immediate action in the back office
Relatively low volume
Case #3
Deployment Features
AWS Cloud
Serverless deployment (for convenience, not scale)
o Webhook handling
o ETL processor
o ETL database
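Webhook handling for the CRM trigger usually starts by verifying that the call really came from the CRM before acting on it. A minimal sketch, assuming the CRM signs the raw request body with HMAC-SHA256 using a shared secret; the secret, the `quote.created` event type and the field names are hypothetical, since each CRM defines its own signature scheme and payload.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret configured in the CRM's webhook settings.
# In a real deployment it would come from a secrets store, not source code.
WEBHOOK_SECRET = b"replace-me"

def sign(body: bytes, secret: bytes = WEBHOOK_SECRET) -> str:
    """HMAC-SHA256 hex digest of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def handle_webhook(body: bytes, signature: str,
                   secret: bytes = WEBHOOK_SECRET) -> Optional[dict]:
    """Verify and route one CRM webhook call.

    Returns a back-office action for sales-quote events, None for events
    that are ignored, and raises PermissionError on a bad signature.
    """
    # compare_digest avoids leaking timing information during the comparison.
    if not hmac.compare_digest(sign(body, secret), signature):
        raise PermissionError("invalid webhook signature")
    event = json.loads(body)
    # Hypothetical event shape: {"type": "quote.created", "quote_id": ..., "amount": ...}
    if event.get("type") != "quote.created":
        return None
    return {"action": "create_back_office_order",
            "quote_id": event["quote_id"],
            "amount": event["amount"]}
```

In a serverless deployment this function would sit behind the HTTP trigger, and the returned action would be handed to the ETL processor.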
Case #3
Architecture
[Diagram: cloud CRM integrated with a serverless ETL deployment on AWS.]
Common patterns in Cloud ETL
o Componentization
o Landing zones
o Elasticity
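Elasticity often comes down to a sizing rule: run enough workers to keep up with arrivals and clear any backlog within a target window, that is, workers = ceil((arrival rate + backlog / drain window) / per-worker rate), clamped to a minimum and maximum. The function below is a hypothetical sizing sketch, not a feature of any particular container manager; the per-worker throughput is an assumption you would measure yourself.

```python
import math

def desired_workers(arrival_rate_per_min: float, backlog: int,
                    per_worker_rate_per_min: float, drain_minutes: float,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Workers needed to keep up with arrivals and clear the backlog in time."""
    # Required throughput = steady arrivals + backlog spread over the drain window.
    needed_rate = arrival_rate_per_min + backlog / drain_minutes
    workers = math.ceil(needed_rate / per_worker_rate_per_min)
    # Clamp so the cluster never scales to zero or beyond a cost ceiling.
    return max(min_workers, min(max_workers, workers))
```

With Case #2's 10,000 messages per minute and an assumed 3,000 messages per minute per container, `desired_workers(10_000, 0, 3_000, 10)` returns 4; a backlog of 30,000 messages to clear within 10 minutes raises that to 5.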
Caveats
Managing solution costs
It can be difficult to estimate and control the costs of a cloud deployment.
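A simple way to make cloud costs less opaque is to itemize every metered resource and add a contingency buffer for usage you did not foresee. The sketch below only shows the arithmetic; `LineItem`, `monthly_estimate` and every quantity and unit price in the usage note are hypothetical, and real figures come from your provider's price sheets and usage data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LineItem:
    name: str
    quantity: float    # units consumed per month (hours, GB, requests, ...)
    unit_price: float  # price per unit, in the currency of the price sheet

def monthly_estimate(items: List[LineItem], contingency: float = 0.2) -> float:
    """Sum the line items and add a contingency buffer (0.2 means +20%)."""
    base = sum(item.quantity * item.unit_price for item in items)
    return round(base * (1 + contingency), 2)
```

With hypothetical items of 730 VM hours at 0.10, 500 GB of storage at 0.02 and 100 GB of egress at 0.09, the base is 92.00 and a 20% contingency brings the estimate to 110.40.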
Deploying ETL to the cloud requires different skills and expertise.
You need to develop the skills to choose and configure services.
[Diagram: the Case #1 Azure architecture, showing CloverDX on a VM alongside Azure Data Factory as a service, Azure Data Lake Storage, Azure Blob Storage, Azure Database (Staging), Azure Database (Production), Azure Database (CloverDX), Azure Key Vault, a firewall, and SFTP.]
Have realistic expectations about the support provided by cloud platforms.
Not all services are guaranteed to be available.
Final Thought – Cloud Selection
Azure or AWS?
Most of our clients use one of these two providers (or both).
The decision has likely already been made by the business, independently of ETL needs.
Our completely subjective view:
o Azure has a better console user interface
o The Azure sales experience is friendlier to SMEs
o AWS has a larger number of services, which are generally more feature-rich
o AWS is more Google-able
www.cloverdx.com
About CloverDX Enterprise Data Management Platform
CloverDX is a data management platform for designing, automating and operating data jobs at scale. We've engineered CloverDX to solve complex data movement and transformation scenarios with a combination of a visual IDE for data jobs, the flexibility of coding, and extensible automation and orchestration features.
hello@cloverdx.com
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Deploying ETL to Cloud

  • 1. Deploying ETL to the cloud What it takes to set up a production data pipeline starting from zero
  • 2. Motivation: Our data is moving to the cloud, so it's natural that our data integration processes follow. Cloud platforms are inherently better at infrastructure: security, availability and trustworthiness.
  • 3. Some ETL belongs on-prem, some belongs in the cloud. Sometimes ETL location is not such an obvious decision. Which ETL workloads are candidates for the cloud?
      Primary Sources | Primary Targets | Example Use Case | ETL Location
      On-prem | On-prem | Reporting, Migration | On-prem
      On-prem | Cloud | Big Data Analytics | ?
      Cloud | On-prem | Enrichment | ?
      Cloud | Cloud | Application integration | Cloud
  • 4. Deploy ETL on-premise and push data to the cloud. (Diagram: ETL runs ON-PREMISE and pushes data to the CLOUD.)
  • 5. Deploy ETL in the cloud and pull data from on-premise systems. (Diagram: ETL runs in the CLOUD and pulls data from ON-PREMISE systems, which requires open ports.)
  • 6. Since we’re here to talk about Deploying ETL to Cloud, we’ll assume that choice is made…
  • 7. There is a range of Cloud ETL deployment models. Fully-managed ETL-as-a-service is quick to set up and operate, but offers limited options if you find a capability missing. Self-managed ETL offers a wide range of architecture options, more control over ETL behavior and more flexible licensing (perpetual or subscription), but its costs (infrastructure and labor) are less predictable. The deployment model is tightly coupled to ETL vendor selection.
  • 8. We’ll focus today on self-managed deployments.
  • 9. Self-managed cloud ETL: let's explore issues around architecture, security, costs and operations through three real-world use cases.
  • 10. Case #1 Operating an Analytics and Reporting Warehouse
  • 11. Case #1: Operating an Analytics and Reporting Warehouse. An insurance company tracks applications for new policies. Field agents submit application packages via SFTP. A multistage process ingests and assesses the packages, then loads them into the warehouse. Nightly batches must complete within the SLA.
  • 12. Case #1 Deployment Features: Azure Cloud; hybrid ETL (fully managed via Azure Data Factory, self-managed CloverDX); varied storage technology; security services.
  • 13. Case #1 Architecture (Azure diagram): Firewall; SFTP; Azure Data Factory [FULLY MANAGED]; CloverDX [SELF MANAGED]; Azure Data Lake Storage; Azure Blob Storage; Azure SQL Database (Staging); Azure Database (Production); Azure Database (CloverDX); Azure Key Vault.
  • 14. Case #2 High volume message processing
  • 15. Case #2: High Volume Message Processing. Ingest a large volume of small data files. Incoming data is transformed to canonical JSON and dispatched to a downstream API. Volume: 10,000 messages per minute. Each message must be delivered exactly once.
  • 16. Case #2 Deployment Features: AWS Cloud; containerized ETL (for scalability); message queues (for guaranteed message delivery).
  • 17. Case #2 Architecture (AWS diagram): Firewall; AWS API Gateway; inbound message; AWS SQS message queue; AWS S3 storage; AWS RDS database (CloverDX); three CloverDX containers run by a container manager.
  • 18. Case #3 Integrating cloud CRM with back-office operations
  • 19. Case #3: Integrating cloud CRM with back-office operations. Expedite response to CRM activity: a sales quote in the CRM triggers immediate action in the back office. Relatively low volume.
  • 20. Case #3 Deployment Features: AWS Cloud; serverless deployment (for convenience, not scale) of the webhook handling, the ETL processor and the ETL database.
  • 22. Common patterns in Cloud ETL: componentization, landing zones, elasticity.
  • 24. It can be difficult to estimate and control the costs of a cloud deployment.
  • 26. Deploying ETL to cloud requires different skills and expertise.
  • 27. You need to develop the skills to choose and configure services. (Diagram: the Case #1 Azure architecture, with CloverDX on a VM and Azure Data Factory as a SERVICE, alongside Azure Data Lake Storage, Azure Blob Storage, Azure Database (Staging), Azure Database (Production), Azure Database (CloverDX), Azure Key Vault, Firewall and SFTP.)
  • 28. Have realistic expectations about the support provided by Cloud platforms.
  • 29. Not all services are guaranteed to be available
  • 30. Final Thought – Cloud Selection
  • 31. Azure or AWS? Most of our clients use one of these two providers (or both). The decision is likely already made by the business, independently of ETL needs. Our completely subjective view: Azure has a better console user interface; Azure's sales experience is friendlier to SMEs; AWS has a larger number of services and is generally more full-featured; AWS is more google-able.
  • 32. About CloverDX: Enterprise Data Management Platform. CloverDX is a data management platform for designing, automating and operating data jobs at scale. We've engineered CloverDX to solve complex data movement and transformation scenarios by combining a visual IDE for data jobs, the flexibility of coding, and extensible automation and orchestration features. www.cloverdx.com, hello@cloverdx.com
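Slide 15 requires each message to be delivered exactly once. Amazon SQS standard queues guarantee at-least-once delivery, so a message can occasionally arrive twice; a common way to get exactly-once effects is an idempotent consumer that remembers which message IDs it has already dispatched. The sketch below is a minimal illustration under stated assumptions, not the presenters' implementation: `IdempotentDispatcher` and `send` are hypothetical names, "canonical JSON" is assumed to mean sorted keys with compact separators, and the in-memory set stands in for the durable store (e.g. a database table keyed by message ID) a real deployment would need.

```python
import json
from typing import Callable


def to_canonical_json(record: dict) -> str:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


class IdempotentDispatcher:
    """Dispatches each message ID downstream at most once (hypothetical sketch)."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send            # stand-in for the downstream API call
        self.seen: set[str] = set() # stand-in for a durable dedup store

    def handle(self, message_id: str, record: dict) -> bool:
        if message_id in self.seen:
            return False  # redelivered duplicate: acknowledge but skip
        self.send(to_canonical_json(record))
        # Record the ID only after a successful send, so a failed send is retried.
        self.seen.add(message_id)
        return True
```

Recording the ID after the send means a crash between the two steps can still produce one duplicate, so end-to-end exactly-once also relies on the downstream API tolerating a resend (for example, by keying on the message ID).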
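The 10,000 messages per minute on slide 15 is roughly 167 messages per second, which is what the containerized CloverDX workers on slides 16 and 17 must sustain. A back-of-envelope sizing, assuming a per-container throughput you would have to measure for your own jobs (the 60 messages per second below is a hypothetical placeholder) and a headroom factor for bursts:

```python
import math


def containers_needed(messages_per_minute: float,
                      per_container_msgs_per_sec: float,
                      headroom: float = 0.2) -> int:
    """Containers required to sustain a message rate with spare capacity."""
    if messages_per_minute < 0 or per_container_msgs_per_sec <= 0 or headroom < 0:
        raise ValueError("rates must be positive and headroom non-negative")
    required_per_sec = messages_per_minute / 60 * (1 + headroom)
    # Always keep at least one container running.
    return max(1, math.ceil(required_per_sec / per_container_msgs_per_sec))
```

With the placeholder 60 messages per second per container, `containers_needed(10_000, 60, headroom=0)` gives 3 and the default 20% headroom gives 4. A container manager can then scale between such bounds as load changes, which is the elasticity pattern from slide 22.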
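Case #3 (slides 19 and 20) handles CRM webhooks serverlessly. A minimal sketch of such an entry point, assuming API Gateway proxy integration in front of an AWS Lambda Python handler and a CRM that signs the raw request body with HMAC-SHA256. The header name `x-signature` and the `WEBHOOK_SECRET` environment variable are hypothetical; check your CRM's webhook documentation for its actual signing scheme. The sketch also assumes the body arrives as plain (not base64-encoded) text.

```python
import hashlib
import hmac
import json
import os

# Hypothetical names: placeholders, not any particular CRM's convention.
SIGNATURE_HEADER = "x-signature"
SECRET_ENV_VAR = "WEBHOOK_SECRET"


def is_valid_signature(body: bytes, signature_hex: str, secret: bytes) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the expected value.
    return hmac.compare_digest(expected, signature_hex)


def lambda_handler(event, context):
    """Lambda entry point for a webhook delivered via API Gateway proxy integration."""
    body = (event.get("body") or "").encode("utf-8")
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    secret = os.environ[SECRET_ENV_VAR].encode("utf-8")
    if not is_valid_signature(body, headers.get(SIGNATURE_HEADER, ""), secret):
        return {"statusCode": 401, "body": "invalid signature"}
    quote = json.loads(body)
    # Hand the quote to the ETL processor here (e.g. publish to a queue);
    # this sketch only acknowledges receipt so the CRM does not retry.
    return {"statusCode": 202, "body": json.dumps({"accepted": quote.get("id")})}
```

Acknowledging quickly and deferring the back-office work keeps the webhook responsive, which suits the low-volume, fast-response profile on slide 19.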
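Slide 22 lists landing zones as a common pattern. Assuming a landing zone means an area where inbound data (such as the SFTP packages in Case #1) is stored unchanged before any transformation, a minimal local-filesystem sketch looks like this; in the cloud the same layout would typically live in object storage such as Azure Blob Storage or S3. The function names and the source/date partitioning scheme are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def land_file(src: Path, landing_root: Path, source_name: str) -> Path:
    """Copy an inbound file, unchanged, into a source- and date-partitioned landing zone."""
    now = datetime.now(timezone.utc)
    target_dir = landing_root / source_name / now.strftime("%Y/%m/%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    # Prefix with a timestamp so repeated uploads of the same name never collide.
    target = target_dir / f"{now.strftime('%H%M%S%f')}_{src.name}"
    shutil.copy2(src, target)
    return target


def promote(landed: Path, landing_root: Path, processed_root: Path) -> Path:
    """Move a landed file to the processed area once the ETL job succeeds."""
    dest = processed_root / landed.relative_to(landing_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(landed), str(dest)))
```

Keeping the raw file untouched until the job succeeds means a failed nightly run can be replayed from the landing zone instead of asking field agents to resubmit.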
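Slide 24 notes that cloud costs are hard to estimate and control. A simple way to start is a per-component monthly model. The sketch below assumes three cost drivers (compute hours, stored data and egress); the unit rates used with it are placeholders, not real provider prices, and should be replaced with figures from your provider's pricing pages.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    vm_hours: float    # hours the ETL VMs or containers run
    storage_gb: float  # average GB stored over the month
    egress_gb: float   # GB transferred out of the cloud


@dataclass
class Rates:
    # Placeholder unit prices: substitute your provider's actual rates.
    per_vm_hour: float
    per_gb_month: float
    per_gb_egress: float


def monthly_cost(usage: MonthlyUsage, rates: Rates) -> dict[str, float]:
    """Return a per-component cost breakdown plus the total."""
    breakdown = {
        "compute": usage.vm_hours * rates.per_vm_hour,
        "storage": usage.storage_gb * rates.per_gb_month,
        "egress": usage.egress_gb * rates.per_gb_egress,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For example, with placeholder rates `Rates(0.5, 0.25, 0.125)`, an always-on VM (730 hours) costs 365.0 in compute, while one that runs only a 4-hour nightly batch window for 30 days (120 hours) costs 60.0; the comparison, not the absolute numbers, is the point.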
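Slide 29 warns that not all services are guaranteed to be available. A common defensive technique (not one the deck prescribes) is to retry transient failures with capped exponential backoff and random "full jitter". The function name, defaults and retryable exception types below are assumptions for illustration; `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T],
                    attempts: int = 5,
                    base_delay: float = 0.5,
                    max_delay: float = 30.0,
                    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except retry_on:
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return fn()  # final attempt: let any exception propagate to the caller
```

Jitter spreads retries from many ETL workers over time so that they do not all hit a recovering service at the same instant.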