Properly automating your data pipelines, in a robust, scalable way, can eliminate these risks and save a significant amount of time.
See how data integration tools like CloverDX can help you:
Save the time spent writing data manipulation scripts by switching to a visual representation of data flows
Handle the growing complexity of data transformation and movement scenarios with integrated jobflow management and business process monitoring
Handle potentially hundreds of data feeds in a manageable manner by easily adopting templates and pre-made components
2. Homegrown ETL solutions are common
Manual Process: Excel
Scripts: Excel, Python, SQL
Custom Applications: *-SQL, Java, C#
3. Naive assessment of the task
o “This is simple, we just need to…”
Urgency
o tight project deadline, no time for research/selection of third-party tools
Exceptional Requirements
o too challenging for a commercial off-the-shelf solution
Exceptional Team
o you have a highly skilled and available dev team eager to DIY
Historical Precedent
o you’ve always done it this way
Motivation for choosing homegrown solutions
4. Feature Gaps
o new endpoints, new data quality (DQ) issues
Lack of transparency
o Logging, alerting, auditing, error reporting
Age
o Needs age-related overhaul, or has accumulated cruft
Maintenance Costs
o dev team has moved on (or you need the dev to move on…)
o maintenance costs ripple beyond the actual maintenance task: what else could the team be working on?
Scaling Issues
o can’t keep up with increased demand
Risks of choosing homegrown solutions
5. Designed in-house to solve specific in-house data problems
Use some combination of
o Manual processes
o Desktop tools
o Scripts
o Libraries
o Programs
o Data storage
o Operating System Services
Homegrown ETL Solutions
6. Using a Modern Data Integration Platform to properly automate your data pipelines, in a robust, scalable way, can eliminate these risks and save a significant amount of time.
7. In cloud — On premise — Hybrid
CloverDX Data Integration Platform
Automation of data workloads from A to Z
One place for solving the mundane and the complex
Productivity and trust for the enterprise
Data self-service for everyone
8. CloverDX Data Integration Platform helps with:
Replacing legacy/home-grown tooling
Data ingestion/onboarding
Operational data and application integration
Data migration
Data quality
Data for BI and reporting
11. Fintech Vertical
Business provides analysis services to credit unions
Accept input files from many client institutions
o Variable format
o Variable quality
Transform into standard format
Assess quality
Load into a warehouse for subsequent analysis
Case Study Scenario
17. Steps include:
o Detecting arrival of client files to be ingested
o Detecting format and layout of client files
o Reading client files
o Transforming/Mapping
o Assessing quality
o Loading to target
o Detecting/Logging at every step
End-to-end oversight of the ingest process
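The steps above can be sketched in a few lines of plain Python. This is only an illustration of the sequence (detect format, read, map, assess quality, load, log), not how CloverDX implements it. The column names in COLUMN_MAP, the balances table, and the in-memory SQLite "warehouse" are all assumptions made up for the example, and detecting file arrival is left out: the function simply receives a file name and its text.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

# Hypothetical mapping from client column names to the standard layout.
COLUMN_MAP = {"acct": "account_id", "account": "account_id",
              "bal": "balance", "balance": "balance"}


def detect_delimiter(header_line, candidates=",;|\t"):
    """Pick the candidate delimiter that occurs most often in the header."""
    return max(candidates, key=header_line.count)


def read_rows(text):
    """Detect the layout, then read the file into dictionaries."""
    delimiter = detect_delimiter(text.splitlines()[0])
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_standard(row):
    """Rename known client columns to the standard names; drop the rest."""
    record = {}
    for key, value in row.items():
        name = COLUMN_MAP.get((key or "").strip().lower())
        if name and isinstance(value, str):
            record[name] = value.strip()
    return record


def is_valid(record):
    """Quality check: account id present and balance numeric."""
    if not record.get("account_id"):
        return False
    try:
        float(record.get("balance", ""))
    except ValueError:
        return False
    return True


def make_warehouse():
    """Stand-in target: an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE balances (account_id TEXT, balance REAL)")
    return conn


def ingest(name, text, conn):
    """Run one file through read, map, validate and load; log each step."""
    log.info("%s: starting ingest", name)
    records = [to_standard(row) for row in read_rows(text)]
    good = [r for r in records if is_valid(r)]
    rejected = len(records) - len(good)
    conn.executemany(
        "INSERT INTO balances (account_id, balance) VALUES (?, ?)",
        [(r["account_id"], float(r["balance"])) for r in good],
    )
    conn.commit()
    log.info("%s: loaded=%d rejected=%d", name, len(good), rejected)
    return len(good), rejected
```

Calling ingest("a.csv", text, make_warehouse()) returns the loaded and rejected counts for that file. Even at this size the script already needs its own format detection, validation rules and logging, which is where homegrown solutions start to accumulate maintenance cost.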
18. Ingest process flow:
o Detect data available for ingest
o Match with client-specific processing rules
o Read
o Transform
o Map
o Validate
o Load to warehouse
o Update ingestion log
End-to-end oversight of the ingest process
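One way to picture the "match with client-specific processing rules" step is a small rule table keyed by file name pattern. The sketch below is an assumption for illustration only: the ClientRule fields, the client names and the file name patterns are hypothetical and do not come from the case study.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass(frozen=True)
class ClientRule:
    client: str
    pattern: str      # glob matched against the incoming file name
    delimiter: str    # how this client's files are delimited
    date_format: str  # how this client writes dates


# Hypothetical rules; a real deployment would load these from configuration.
RULES = [
    ClientRule("client_a", "client_a_*.csv", ",", "%Y-%m-%d"),
    ClientRule("client_b", "clientb-*.txt", "|", "%m/%d/%Y"),
]


def match_rule(file_name, rules=RULES) -> Optional[ClientRule]:
    """Return the first rule whose pattern matches, ignoring case."""
    for rule in rules:
        if fnmatch(file_name.lower(), rule.pattern.lower()):
            return rule
    return None
```

Here match_rule("client_a_2020.csv") returns the client_a rule, and a file that matches nothing returns None, so it can be logged and set aside instead of being loaded with the wrong layout.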
23. Run ingest jobs automatically, unattended
o Schedule jobs that look for files to onboard
o Listen for arrival of files to onboard
o Launch the onboarding process on-demand
Record all ingest activity
o Alerts when jobs fail
o Logs of every execution
o Graphical inspection of any run
CloverDX automates the ingest process
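For comparison, a hand-rolled version of unattended ingest with activity recording might look like the sketch below. It is a minimal illustration in plain Python, not CloverDX's mechanism: run_once, poll, and the process and alert callbacks are hypothetical names, the ingest log is just a list of dictionaries, and "listening" is approximated by polling a folder on a fixed interval.

```python
import datetime
import time
from pathlib import Path


def run_once(inbox, seen, process, alert, ingest_log):
    """Process files in `inbox` not handled before; record every attempt."""
    for path in sorted(Path(inbox).glob("*")):
        if not path.is_file() or path.name in seen:
            continue
        seen.add(path.name)
        entry = {
            "file": path.name,
            "started": datetime.datetime.now().isoformat(timespec="seconds"),
        }
        try:
            entry["result"] = process(path)
            entry["status"] = "ok"
        except Exception as exc:  # one bad file must not stop the others
            entry["status"] = "failed"
            entry["error"] = str(exc)
            alert(f"Ingest of {path.name} failed: {exc}")
        ingest_log.append(entry)


def poll(inbox, process, alert, ingest_log, interval_seconds=60, cycles=None):
    """Check the inbox repeatedly; `cycles=None` means run until stopped."""
    seen = set()
    done = 0
    while cycles is None or done < cycles:
        run_once(inbox, seen, process, alert, ingest_log)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_seconds)
```

Passing a function that sends an email or a chat message as alert gives a basic "alert when jobs fail" behavior, and ingest_log holds one entry per execution. What this sketch does not provide is persistence across restarts, retries, or any graphical inspection of a run.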
28. Eliminate risks of using homegrown Scripts and Excel
Visually design your data jobs
Automate Execution
Instill confidence in operations
Save a significant amount of time
Use a Modern Data Integration Platform
29. More on automated data ingestion with CloverDX:
www.cloverdx.com/solutions/data-ingest
Request a CloverDX demo:
www.cloverdx.com/demo
Q&A
www.cloverdx.com/webinars
Editor's Notes
You can certainly envision how to do this manually. Open your favorite FTP program to grab the files, copy them to your local workspace, open them, visually inspect them. Run the data import wizard in your SQLWorkbench. You can also envision all the reasons this is impractical. Huge data files. Too many files. How often the process needs to run.
You can probably also think about how to simplify the process and begin to automate. A shell script to pull the files from the FTP site. A validation script in your favorite scripting language; pick your animal from the O'Reilly menagerie. SQL scripts to load data to the repository. Maybe add further efficiencies with more shell scripts that start hooking these steps together. Less time-consuming, but still rather ad hoc, still error-prone, and still taking staff resources away from more valuable work.
CloverDX allows you to automate this data management process: to orchestrate and monitor the entire workflow and alert on it. It takes people out of the loop, which reduces risk and removes sources of error, while keeping logs of all activity and alerting the right people when errors occur and intervention is needed.