Automating Data Pipelines: Moving away from Scripts and Excel
Kevin Scott
Director of Sales Engineering
Homegrown ETL solutions are common
Manual Process
o Excel
Scripts
o Excel, Python, SQL
Custom Applications
o *-SQL, Java, C#
Motivation for choosing homegrown solutions
Naive assessment of the task
o “This is simple, we just need to…”
Urgency
o tight project deadline, no time for research/selection of third-party tools
Exceptional Requirements
o too challenging for a commercial off-the-shelf solution
Exceptional Team
o you have a highly skilled and available dev team eager to DIY
Historical Precedent
o you’ve always done it this way
Risks of choosing homegrown solutions
Feature Gaps
o new end points, new data quality (DQ) issues
Lack of transparency
o Logging, alerting, auditing, error reporting
Age
o Needs age-related overhaul, or has accumulated cruft
Maintenance Costs
o dev team has moved on (or you need the dev to move on…)
o maintenance costs ripple beyond the actual maintenance task: what else could the team be working on?
Scaling Issues
o can’t keep up with increased demand
Homegrown ETL Solutions
Designed in-house to solve specific in-house data problems
Use some combination of
o Manual processes
o Desktop tools
o Scripts
o Libraries
o Programs
o Data storage
o Operating System Services
Using a Modern Data Integration Platform to properly automate your data pipelines, in a robust, scalable way, can eliminate these risks and save a significant amount of time.
CloverDX Data Integration Platform
In cloud, on premise, or hybrid
Automation of data workloads from A to Z
One place for solving the mundane and the complex
Productivity and trust for the enterprise
Data self-service for everyone
CloverDX Data Integration Platform helps with:
Replacing legacy/home-grown tooling
Data ingestion/onboarding
Operational data and application integration
Data migration
Data quality
Data for BI and reporting
CloverDX High-level Architecture
Case Study
Ingesting data from many sources for analysis
Case Study Scenario
Fintech Vertical
Business provides analysis services to credit unions
Accept input files from many client institutions
o Variable format
o Variable quality
Transform into standard format
Assess quality
Load into a warehouse for subsequent analysis
As a manual process?
As a scripted process?
Using the CloverDX Data Integration Platform…
End-to-end oversight of the ingest process
Steps include:
o Detecting arrival of client files to be ingested
o Detecting format and layout of client files
o Reading client files
o Transforming/Mapping
o Assessing quality
o Loading to target
o Detecting/Logging at every step
Detect data available for ingest
Match with client-specific processing rules
Read, transform, map, validate
Load to warehouse
Update ingestion log
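As a rough illustration of the flow above (match a file with client-specific rules, read, map, validate, load, update the log), here is a minimal Python sketch. It is not CloverDX code and does not reflect how the platform implements these steps. The client IDs, column names, validation rules, and the in-memory "warehouse" list are all hypothetical stand-ins chosen for the example.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical client-specific processing rules: file delimiter plus a
# mapping from each client's column names to the standard layout.
CLIENT_RULES = {
    "cu_alpha": {"delimiter": ",", "mapping": {"acct": "account_id", "bal": "balance"}},
    "cu_beta": {"delimiter": ";", "mapping": {"AccountNo": "account_id", "Amount": "balance"}},
}


@dataclass
class IngestResult:
    loaded: list = field(default_factory=list)    # rows accepted for the warehouse
    rejected: list = field(default_factory=list)  # (row, reason) pairs
    log: list = field(default_factory=list)       # ingestion log entries


def ingest(client_id: str, raw_text: str) -> IngestResult:
    """Run one client file through match -> read -> map -> validate -> load -> log."""
    result = IngestResult()

    # Match with client-specific processing rules.
    rules = CLIENT_RULES.get(client_id)
    if rules is None:
        result.log.append(f"{client_id}: no processing rules, file skipped")
        return result
    result.log.append(f"{client_id}: matched processing rules")

    # Read the file using the client's delimiter.
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=rules["delimiter"])
    for row in reader:
        # Map client column names onto the standard layout.
        mapped = {
            std: (row.get(src) or "").strip()
            for src, std in rules["mapping"].items()
        }
        # Validate: account_id must be present, balance must be numeric.
        if not mapped["account_id"]:
            result.rejected.append((mapped, "missing account_id"))
            continue
        try:
            mapped["balance"] = float(mapped["balance"])
        except ValueError:
            result.rejected.append((mapped, "non-numeric balance"))
            continue
        # "Load": a real job would write to the warehouse here.
        result.loaded.append(mapped)

    # Update the ingestion log with the outcome.
    result.log.append(
        f"{client_id}: loaded {len(result.loaded)}, rejected {len(result.rejected)}"
    )
    return result
```

Calling `ingest("cu_beta", text)` on a semicolon-delimited file returns the accepted rows, the rejected rows with a reason for each, and the log entries. Even this toy version has to carry its own rule lookup, rejection handling, and logging, which is the kind of plumbing that homegrown scripts tend to accumulate.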
Orchestrating the ingest process
Ingest process details
Read, validate, transform, write, log error
CloverDX automates the ingest process
Run ingest jobs automatically, unattended
o Schedule jobs that look for files to onboard
o Listen for arrival of files to onboard
o Launch the onboarding process on-demand
Record all ingest activity
o Alerts when jobs fail
o Logs of every execution
o Graphical inspection of any run
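For comparison, this is roughly what "schedule a job that looks for files to onboard" looks like when scripted by hand. It is a minimal Python sketch, not CloverDX functionality. The function names, the `*.csv` pattern, the polling interval, and the printed alert are assumptions made for the example.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def find_new_files(inbox: Path, seen: set, pattern: str = "*.csv") -> list:
    """Return files in `inbox` not yet recorded in `seen`, sorted by name."""
    new_files = [p for p in sorted(inbox.glob(pattern)) if p.name not in seen]
    seen.update(p.name for p in new_files)
    return new_files


def poll_inbox(
    inbox: Path,
    handler: Callable[[Path], None],
    interval_s: float = 60.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Check `inbox` every `interval_s` seconds and hand new files to `handler`.

    `max_cycles=None` polls forever; a number stops after that many checks.
    """
    seen: set = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for path in find_new_files(inbox, seen):
            try:
                handler(path)
            except Exception as exc:
                # Stand-in for a real alert (email, chat message, ticket).
                print(f"ALERT: ingest failed for {path.name}: {exc}")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
```

A call such as `poll_inbox(Path("/data/inbox"), process_file)` (both the path and `process_file` are hypothetical) would run until interrupted. The sketch keeps its record of seen files only in memory, so a restart would reprocess everything. Persisting that state, retrying failures, and keeping an execution history are the parts a script like this usually leaves out.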
Run ingest jobs automatically and unattended
(Re)run ingest jobs on demand
Continually monitor ingest jobs
Visually inspect ingest job failures
Use a Modern Data Integration Platform
Eliminate risks of using homegrown Scripts and Excel
Visually design your data jobs
Automate Execution
Instill confidence in operations
Save a significant amount of time
More on automated data ingestion with CloverDX:
www.cloverdx.com/solutions/data-ingest
Request a CloverDX demo:
www.cloverdx.com/demo
Q&A
www.cloverdx.com/webinars





Editor's Notes

  1. You can certainly envision how to do this manually. Open your favorite FTP program to grab the files, copy them to your local workspace, open them, visually inspect them. Run the data import wizard in your SQLWorkbench. You can also envision all the reasons this is impractical. Huge data files. Too many files. How often the process needs to run.
  2. You can probably also think about how to simplify the process and begin to automate it. A shell script to pull the files from the FTP site. Your favorite scripting language from the O'Reilly animal menagerie for validation. SQL scripts to load data to the repository. Maybe gain further efficiencies with more shell scripts that hook these steps together. Less time-consuming, but still rather ad hoc, still error-prone, and still taking staff resources away from more valuable work.
     CloverETL will allow you to automate this data management process: to orchestrate, monitor, and alert on the entire workflow. It takes people completely out of the loop, de-risking the process, removing sources of error, keeping logs of all activity, and alerting the right people when errors occur and intervention is needed.