Characteristics of modern data architecture that drive innovation
Cullen Patel
Solutions Engineer @ CloverDX
Automation, Simplicity, Data Quality, Scalability, and Cost Savings
These matter because
o Keep up with changes
o Keep a competitive edge
Real-world solution – CloverDX project
5 Key characteristics
Automation
Speed up data processing
Reduce errors
Reduce time and effort required
Ensure data is accurate and consistent
Why is automation important?
Think about your automation requirements
o Should it go beyond time-based automation
o Should have required level of granularity
o Should be able to orchestrate multiple data jobs intelligently
Automate handling of bad data, not just the good
o Should notify or even fix errors/re-try automatically
o Provide it on time to the right person
o Should provide the details to address the error
Ensure tools have intelligent automation
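The deck stops at the requirements; as a rough illustration (not from the presentation), here is a minimal Python sketch of automation that goes beyond a time-based trigger: retries, escalation to the right person, and enough detail in the notification to address the error. The job name and the notify helper are hypothetical.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator")

def notify(recipient, message):
    # Placeholder: in a real setup this could be email, chat, or a ticketing system.
    log.error("NOTIFY %s: %s", recipient, message)

def run_with_retry(job, retries=3, delay_seconds=60, owner="data-team@example.com"):
    """Run a data job, retry transient failures, and escalate with enough detail to act on."""
    for attempt in range(1, retries + 1):
        try:
            return job()
        except Exception as exc:  # broad on purpose: this is only an illustration
            log.warning("%s failed on attempt %d/%d: %s", job.__name__, attempt, retries, exc)
            if attempt == retries:
                notify(owner, f"{job.__name__} failed after {retries} attempts: {exc}")
                raise
            time.sleep(delay_seconds)

def ingest_orders():
    ...  # hypothetical data job

# run_with_retry(ingest_orders)
```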
Simplicity
Data pipelines maintainable in the long term
Development team productivity
Build the process in pieces
Trust in process
Why is simplicity important?
How to break the job into smaller pieces?
Transfer files to cloud → Load into Snowflake → Build Models
Identify individual components of data pipelines
Each job should deal with a single task
How to break the job into smaller pieces?
Ingest → Validate → Transform → Deliver, with a Log step after each
Transfer files to cloud → Load into Snowflake → Build Models
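To make the "one job, one task" idea concrete, here is a hedged sketch (not from the deck) of a pipeline split into single-purpose steps that an orchestrator simply chains; the step names mirror the Ingest → Validate → Transform → Deliver flow above.

```python
# Illustrative only: each function does exactly one thing, and the
# orchestration is just a chain of those single-task steps.
def ingest(path):
    """Read raw files, e.g. from an SFTP drop or a cloud bucket."""
    ...

def validate(records):
    """Flag or reject records that break the rules (more in the data quality section)."""
    ...

def transform(records):
    """Apply the business logic / mapping."""
    ...

def deliver(records):
    """Load into the target, e.g. Snowflake."""
    ...

def pipeline(path):
    records = ingest(path)
    records = validate(records)
    records = transform(records)
    deliver(records)
```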
Ask questions
o What is the purpose of the process, and what is its business impact?
o What interfaces are you going to use?
o How would you like to automate the process?
o What are the weak points?
o How to handle errors?
How to break the job into smaller pieces?
Ask questions
o What is the purpose of the process, and what is its business impact?
o What interfaces are you going to use?
o How would you like to automate the process?
o What are the weak points?
o How to handle errors?
Identify patterns
o Repeatable and configurable code sections
o Logging, monitoring, automation, …
How to break the job into smaller pieces?
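One way to picture the "repeatable and configurable code sections" pattern is a shared wrapper that gives every step the same logging and monitoring; a minimal Python sketch, purely illustrative:

```python
import functools
import logging
import time

log = logging.getLogger("pipeline")

def logged(step):
    """Reusable pattern: the same logging/monitoring wrapper around every step."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        start = time.time()
        log.info("START %s", step.__name__)
        try:
            result = step(*args, **kwargs)
            log.info("OK %s (%.1fs)", step.__name__, time.time() - start)
            return result
        except Exception:
            log.exception("FAILED %s", step.__name__)
            raise
    return wrapper

@logged
def validate(records):
    ...  # hypothetical step; ingest, transform, deliver would be wrapped the same way
```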
Data Quality
Good data is great for business but bad data can be ruinous
Ensure data is accurate and consistent
Design for bad data
Why is data quality important?
Data profiling – Analyzing the data and looking at its statistics
Data validation – This process involves verifying that data is accurate and consistent, and it can even involve business rules
Data cleansing – This process involves removing or correcting any errors or inconsistencies in the data
Do it as soon as you can!
Have it always on!
How to ensure high quality data?
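A hedged sketch of the "design for bad data" idea (not CloverDX functionality, just plain Python): validate each record, cleanse what can be fixed automatically, and route the rest to an error path instead of silently dropping it. The validation rules and field names are hypothetical.

```python
def cleanse(row):
    # Correct what can be corrected automatically, e.g. stray whitespace.
    return {key: value.strip() for key, value in row.items()}

def is_valid(row):
    # Hypothetical rules: amount must be numeric, customer_id must be present.
    try:
        float(row["amount"])
    except (KeyError, ValueError):
        return False
    return bool(row.get("customer_id"))

def process(rows):
    good, bad = [], []
    for row in rows:
        row = cleanse(row)
        (good if is_valid(row) else bad).append(row)
    # Bad rows go to an error log or manual review instead of being silently dropped.
    return good, bad
```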
Scalability and extensibility
Data is essential for innovation
Data volume is increasing
Businesses need to scale without sacrificing performance/experience
Why is scalability and extensibility important?
A lot of factors to consider
We will focus on:
o How and when to scale hardware
o How and when to scale your data pipelines/jobs
How to scale
Vertical
o increasing the RAM, CPU, or storage capacity of a single server
o often used in traditional on-premise data centers
Horizontal
o adding more nodes to a system and distributing the load
o this approach is often used in distributed systems – in the cloud
Both
o often the approach in the real world
o can be deployed in the cloud or on prem
How to scale hardware – vertical vs horizontal
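As a rough, hedged illustration of the horizontal idea, the sketch below spreads jobs across more workers instead of making one worker bigger; in practice the workers would be cluster nodes or cloud instances rather than local processes.

```python
from concurrent.futures import ProcessPoolExecutor

def run_job(job_id):
    ...  # hypothetical data job for one partition, client, or file

def run_all(job_ids, workers=4):
    # Horizontal scaling in miniature: spread the load across more workers
    # instead of making a single worker bigger (vertical scaling).
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, job_ids))
```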
Many/large data jobs
o Often as business grows, volume of data/jobs grows
o Environment and tools need to be stable
o Smart automation helps
Many unique data jobs
o Much more challenging to solve
o Proper tooling and development methodology is key
o Your software should allow you to use dev time efficiently
o Your software should leverage templates/reusable parts
How to handle process scaling
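One common way to keep many near-identical jobs manageable is a single templated job driven by per-client configuration; a minimal sketch under that assumption (the config fields are hypothetical, not a CloverDX API):

```python
from dataclasses import dataclass

@dataclass
class JobConfig:
    client: str
    source_path: str
    delimiter: str = ","
    target_table: str = "STAGING"

def ingest(config: JobConfig):
    # One reusable template; per-client differences live in the config,
    # not in a copy-pasted variant of the job.
    ...

# ingest(JobConfig(client="acme", source_path="sftp://acme/in", delimiter=";"))
```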
Cost Savings
Two types of costs: variable and fixed
Cost in the cloud can be unpredictable –
o Providers offer tools to track and even estimate costs
o Estimates can be hard
o You need good discipline and monitoring of costs -> FinOps
Costs can vary – even with good FinOps
o Seasonal changes
o Changes to workflows
Cost
Capacity vs consumption
Consumption is the most common, especially in the cloud
Consumption can be hard to estimate
Capacity makes it easier to plan and budget
Pricing models
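A toy comparison, with entirely made-up numbers, of why capacity pricing is easier to budget than consumption pricing:

```python
# Toy numbers only; both the prices and the volumes are made up.
capacity_cost_per_year = 60_000                # flat subscription: known up front

price_per_tb_processed = 40.0                  # consumption pricing
monthly_tb = [90, 95, 100, 105, 100, 110,
              115, 120, 130, 180, 220, 200]    # note the seasonal jump late in the year
consumption_cost_per_year = price_per_tb_processed * sum(monthly_tb)

print(capacity_cost_per_year, round(consumption_cost_per_year))
# Consumption tracks usage month to month, so a busy season moves the bill;
# capacity stays flat, which makes planning and budgeting simpler.
```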
Real-world example
Scalability
Automation
Data Quality
Simplicity
Cost Savings
5 Key characteristics
The company is having trouble scaling properly
Each new client they onboard takes too much time
Each has its own variations and unique requirements
Onboarding costs make the time to ROI much longer
Real project: Highly Scalable Ingestion Framework
In cloud — On premise — Hybrid
CloverDX Data Integration Platform
Automation of data workloads from A to Z
One place for solving the mundane and the complex
Productivity and trust for the enterprise
Architecture example
Source SFTP Servers, Amazon S3 Bucket, and Target SFTP Servers exchange data with a central CloverDX Server; GitHub holds the codebase, configurations feed the server, and an SMTP Server sends the email report
Architecture example
Data staging → Preprocessing → Validations → Postprocessing → Data delivery, with Logging at every step and a Configuration input feeding the pipeline
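The deck doesn't show the implementation, but the configuration-driven idea can be sketched as follows: onboarding a new client means adding a configuration entry, not writing a new pipeline. The clients, fields, and step functions below are hypothetical stand-ins for the framework's stages.

```python
import json

# Hypothetical per-client configuration: a new client is a new entry, not new code.
CONFIG = json.loads("""
{
  "acme":   {"source": "sftp://acme/in", "date_format": "%Y-%m-%d", "validations": ["schema", "dedupe"]},
  "globex": {"source": "s3://globex/in", "date_format": "%d/%m/%Y", "validations": ["schema"]}
}
""")

def stage(source):      print("staging data from", source)
def preprocess(fmt):    print("preprocessing with date format", fmt)
def validate(rule):     print("running validation:", rule)
def deliver(client):    print("delivering output for", client)

def run_ingestion(client):
    cfg = CONFIG[client]
    stage(cfg["source"])                 # data staging
    preprocess(cfg["date_format"])       # preprocessing
    for rule in cfg["validations"]:      # validations
        validate(rule)
    deliver(client)                      # data delivery (postprocessing and logging omitted)

run_ingestion("acme")
```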
Upcoming webinar
What’s new in CloverDX 6 – April 6
Want to know more?
Request a demo: cloverdx.com/demo
Get a trial: cloverdx.com/trial
Watch past webinars: cloverdx.com/webinars
Q&A


Editor's Notes

  • #3 The five key principles we will be talking about are: Automation, Simplicity, Data Quality, Scalability, and Cost Savings. After that we can talk about what this all looks like in the real world, and we will show how you can take advantage of these characteristics in CloverDX. Competitive edge: gives your company the ability to stay ahead and helps drive more success at a lower cost. Change rapidly: allows you to see changes/trends so you can have a well-informed response. These are the 5 most important characteristics for your data architecture.
  • #4 First up, let's talk about automation. It is one of the most important characteristics of modern data architecture that drive innovation. Automation refers to the use of software tools and technologies to streamline data processing and management tasks, reduce manual labor, and improve efficiency and accuracy.
  • #5 One of the main benefits of automation in data architecture is that it can help organizations to speed up data processing tasks and reduce errors. For example, instead of manually preparing and cleaning data for analysis, organizations can use tools to automate these tasks. This can greatly reduce the time and effort required to prepare data for analysis, and ensure that the data is accurate and consistent and delivered to the targets in a timely manner. Otherwise the data can back up and lead to delays or incomplete or incorrectly processed data. Talk a little about automating the error handling, but we will talk about that more in the data quality section.
  • #6 Automating data pipelines – define what automation is. For us it is automating the run of a job, but each job is an orchestration so that steps follow the right order. If something fails, it doesn't just stop, but takes steps to start correcting it. Think about what it is you need. Should you have multiple methods of automation: time-based, monitoring the environment, etc.? Should have granularity in options so you can run things based on the conditions that make sense for you, even if it is only time-based and not … Should work within your environment, from the oldest tool/server like a mainframe or flat file to the newest one like distributed or non-relational databases and APIs. Whatever is actively being used in your data environment should be automated. Your automation should also focus on bad data. We live in the real world where bad data is unfortunately common. Automation can cause anxiety if you aren't sure errors/issues are handled. Should notify or even fix errors/re-try automatically. Provide it on time to the right person. Should provide the details to address the error. Regardless of whether your environment is in the cloud or on-prem, you need an automated environment to really benefit from modern data architecture.
  • #7 Next let's talk about simplicity. So what do we mean by simplicity? Well, we want to keep our data environment as simple as possible while still getting accurate, cleansed, and transformed data. This involves things like breaking logic into pieces so the flow of data is easy to follow, and reusing those blocks of logic for repetitive tasks.
  • #8 Data pipelines maintainable in the long term: no more siloed knowledge or points of failure where only one member of the team knows how to fix an issue. Development team productivity: doing something simple is much easier and faster than something complex. You can also build the processes in pieces and add additional workflows and changes gradually. Trust in the process, because it is always easier to trust things you understand.
  • #10 Split responsibilities between components
  • #11 Ideal pipeline has up to 15 components. One job should not do multiple things.
  • #12 Ideal pipeline has up to 15 components. One job should not do multiple things.
  • #13 The next characteristic of your modern data architecture is data quality
  • #14 Your data architecture should be designed for bad data! Your solution should handle common errors and at least handle/notify you of uncommon/unexpected errors
  • #15 Data profiling: This involves analyzing data to identify any anomalies or inconsistencies. Data profiling helps to identify errors, gaps, and inconsistencies in data, which can then be corrected before the data is used for analysis. Data validation: This process involves verifying that data is accurate, complete, and consistent. Data validation checks can be automated, using rules or algorithms, or can be carried out manually. Data cleansing: This process involves removing or correcting any errors or inconsistencies in the data. Data cleansing can involve a range of techniques, such as using logic to correct errors, looking up missing values, or even manually dumping them to an error log to be corrected.
  • #17 Scalability is one of the key characteristics of modern data architecture that is essential for driving innovation. In today's data-driven world, organizations are collecting, storing, and processing vast amounts of data every day. As the volume of data continues to grow, it becomes increasingly important for data architectures to be scalable, i.e., capable of handling the increasing load without sacrificing performance. You want to avoid unexpected costs or unexpected downtime, which requires lots of manual intervention.
  • #18 Scalability can be achieved through various means, such as horizontal and vertical scaling. Vertical scaling involves adding more resources to a single node, such as increasing the RAM, CPU, or storage capacity of a server. This approach is typically used in traditional on-premise data centers. Horizontal scaling refers to adding more nodes to a system, which allows it to handle more load by distributing the load across multiple nodes. This approach is often used in distributed systems, such as Hadoop clusters, Apache Spark, or cloud-based data warehouses. Of course both of these approaches are available in the cloud, but they are also both available in an on-premise environment as well. Vertical is easy to do in an on-premise environment, but with the proper tool, such as a virtual machine tool, you can also do horizontal scaling.
  • #19 Scalability can be achieved through various means, such as horizontal and vertical scaling. Vertical scaling involves adding more resources to a single node, such as increasing the RAM, CPU, or storage capacity of a server. This approach is typically used in traditional on-premise data centers.
  • #20 Many/large jobs: often as the business grows, the volume of data/jobs grows – you query a report more often, insert more transactions, etc. If the jobs are very large, you need stability; otherwise a job could fail partway through, and if you can't restart from where you left off, you might never complete it. If you have many jobs running, then parallelization is key, and your environment and tooling are going to be important in this scenario. This problem is relatively simple to fix, and the easiest way to do so is simply throwing more hardware at the problem or re-writing jobs to be more performant. Many unique data jobs: much more challenging to solve. If you design with simplicity, it can help you deal with the challenge. Proper tooling and development methodology is key; otherwise the problem can quickly cost you a lot. Buying more hardware is one thing, but hiring or getting more dev time is much harder and more costly, so your software should allow you to use dev time efficiently – make it easy to keep things simple where you can, but be extensible and customizable where needed. Not an easy challenge to solve, but as we go through the other characteristics we will talk a little more on this; first let's talk about something I already mentioned, which is automation. This is a little hard to conceptualize, so we will show you an example of what this looks like in the real world.
  • #21 All companies want to save money, and if you aren't taking steps to reduce overhead, others will have a competitive advantage. Cost predictability – normally more data means more cost. Clover – flexible deployment in cloud, on prem, or both. Clover offers "flat rate"/predictable pricing based on core count and number of developers.
  • #22 Cost in the cloud can be unpredictable – providers offer tools to track and even estimate in many cases, but estimates can be hard. How do you estimate GB/records/connections/etc. per month for a new process that is not yet implemented? Need to have good discipline and monitoring for the cost -> FinOps. Even with good practices it can happen that the cost jumps up in a certain month – e.g. before the high season (like Xmas for retailers). Need to track costs over multiple years and expect that kind of change.
  • #23 There are many pricing categories, each with its own nuances, but the main two we will talk about are capacity vs consumption. We do not price based on consumption, but rather capacity; a yearly subscription makes cost estimates trivial. Costs only go up on request – the way we… Still need to track costs for "hardware" – the compute, DB, etc. if in the cloud. However, these stay static for very long periods of time and only change when the deployment is made bigger (new server etc.). We at CloverDX are capacity based, so we charge based on the number of developers and the number of cores needed on your server.
  • #24 So we talked about 5 key characteristics of Modern Data Architecture, but what does this look like in the real world?
  • #25 Just as a reminder, here are the 5 characteristics we are discussing. We already talked about the cost savings aspect
  • #26 So in the example
  • #28 As you can see, even a simple environment can have lots of dependencies. You have some target and source systems, but you also have email endpoints for notifications and a repository to store your code. In the middle of it all we have CloverDX managing each of these within our data environment. Automate it.
  • #29 In this real world example, we helped a client create a data ingestion framework. What they wanted was to add new clients very quickly and efficiently. Even looking at this high level view, we can see some of the characteristics we talked about. 1. The job is split up so it is easier to follow along, test, and implement, both now and for any changes in the future. 2. We also have a configuration step that feeds information into the first five steps. This allows you to modify the logic in each step for the individual client's needs. This allows you to scale, because you don't have to modify the pipelines every time a new client gets onboarded; instead you simply provide a new configuration file for the new client. 3. You can see that each of the steps also involves logging. This allows us to ensure high data quality and ensures that if anything does go wrong, someone will know about it. 4. Automation is built in with how the project is designed. The pipeline itself is kicked off at regular intervals, but because of how it is designed, it orchestrates many processes together each time it runs, so you not only process the data, you also pre-process the data, validate it, complete additional post-processing steps if required, and finally deliver the data, all while logging and notifying the appropriate people if there is an issue. 5. Cost savings is somewhat rolled up into all of them. Keeping it simple allows us to build on it to improve the process without re-engineering everything, which would cost more. We are also alerted to issues and can resolve them quickly, which can save customers and ensure you meet your SLAs. The scalability allows you to onboard new clients quickly and simply. What is missing? Well, it