Five Keys to Building a Killer Data Lake

Five Keys to a Killer Data Lake
Chuck Yarbrough
VP, Solutions Marketing and Management

Agenda
Why a Data Lake?
Five Keys to a Killer Data Lake
What You Should Do Now

What Do You Get from a Data Lake?
• More accurate intelligence
• Transform your business
• Ability to increase revenue
• Create new products
• Better understand your customers
• Streamline operations and improve efficiencies

Five Keys to a Killer Data Lake
1. Align to corporate strategy
2. Solid data integration strategy
3. Big Data on-boarding process
4. Embrace new data management practices
5. Operationalize machine learning models

KEY #1
Align to Strategic Organizational Goals

Align Goals and Executive Buy-In
• Understand corporate goals
• Identify executive leadership and sponsorship
• Recognize lack of alignment
• Ensure efforts are aligned with strategic goals

Align to Strategic Organizational Goals
Business Acceleration
• Know your customer: Customer 360, Churn, Recommendation engine
• Maximize Profit: Pricing analytics, Targeted promotions, Market basket analytics
• New Product Development: Customization of product, Next product to build
Operational Efficiency
• Modernizing Data Architecture: EDWO, Storage data optimization
• Industrial IoT: Sensor Analytics, Predictive Maintenance, Telematics
• Infrastructure Analytics
Security and Risk
• Risk: Credit scoring, Fraud detection
• Security: Cyber security
• Compliance: Trade compliance, Health care compliance, Anti Money Laundering

KEY #2
Have a Solid Data Integration Strategy

Data Integration Strategy
• Ensure organizational agreement on strategy
• Manage and automate the Data Pipeline
• Modernize your architecture
• Adopt an adaptive execution strategy
• Secure your data
• Accept that Data Governance is separate from Data Management
• Rethink Metadata Management

Managing and Automating the Pipeline
• Pipeline management: Administration, Security, Lifecycle Management, Data Provenance, Dynamic Data Pipeline, Monitoring, Automation
• Analytic Data Pipeline stages: Data Engineering, Data Preparation, Analytics
• Pipeline activities: Ingest, Transform, Cleanse, Conform, Shape, Refine, Virtualize, Blend, Orchestrate, Prepare, Enrich, Visualize, Build, Score, Analyze, Model
• Target: Data Lake
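As a rough illustration of managing and automating a pipeline, the Python sketch below chains stage functions and logs the row count after each stage as a very basic form of monitoring. The stage names (`ingest`, `cleanse`, `enrich`), the sample records, and `run_pipeline` are hypothetical and are not part of any Pentaho API.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]


def ingest(_: List[Record]) -> List[Record]:
    # Hypothetical source; a real pipeline would read files or databases here.
    return [
        {"customer": " Alice ", "country": "uk"},
        {"customer": "", "country": "us"},
        {"customer": "Bob", "country": "US"},
    ]


def cleanse(rows: List[Record]) -> List[Record]:
    # Trim whitespace, then drop rows that have no customer name.
    trimmed = [{k: v.strip() for k, v in row.items()} for row in rows]
    return [row for row in trimmed if row["customer"]]


def enrich(rows: List[Record]) -> List[Record]:
    # Conform country codes to upper case.
    return [{**row, "country": row["country"].upper()} for row in rows]


def run_pipeline(stages: List[Stage]) -> Tuple[List[Record], List[Tuple[str, int]]]:
    """Run stages in order; return the final rows and a per-stage row-count log."""
    rows: List[Record] = []
    log: List[Tuple[str, int]] = []
    for stage in stages:
        rows = stage(rows)
        log.append((stage.__name__, len(rows)))
    return rows, log


rows, log = run_pipeline([ingest, cleanse, enrich])
print(log)
```

The per-stage log is the point of the sketch: a sudden drop in row count between stages is the kind of signal pipeline monitoring is meant to surface.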

KEY #3
Establish a Big Data Onboarding Strategy

More Data, More Problems
Modern data onboarding is more than just “connecting” or “loading” – it includes:
• Managing a changing array of data sources
• Establishing repeatable processes at scale
• Maintaining control and governance

Big Data On Boarding
• Disparate Data Sources: RDBMS, CSV
• Target: Hadoop
• Diagram labels: Integration Processes, Transformations, Ingest Procedures (shown many times), Metadata Injection, Dynamic Integration Processes, Dynamic Transformations
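One way to read the Metadata Injection label on this slide is that a single generic ingest procedure, driven by per-source metadata, replaces many hand-built ones. The Python sketch below illustrates that idea only. `SOURCE_METADATA`, its fields, and the `ingest` function are hypothetical and do not reproduce the Pentaho Data Integration step of the same name.

```python
import csv
import io
from typing import Dict, List

# Hypothetical per-source metadata: delimiter, source-to-target column names,
# and target types. In practice this would live in a catalog or config store.
SOURCE_METADATA = {
    "billing": {
        "delimiter": ",",
        "columns": {"cust_id": "customer_id", "amt": "amount"},
        "types": {"customer_id": int, "amount": float},
    },
    "web": {
        "delimiter": "|",
        "columns": {"uid": "customer_id", "page": "url"},
        "types": {"customer_id": int, "url": str},
    },
}


def ingest(source: str, raw_text: str) -> List[Dict[str, object]]:
    """One generic ingest procedure; the metadata supplies per-source details."""
    meta = SOURCE_METADATA[source]
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    rows = []
    for raw in reader:
        row = {}
        for src_name, target_name in meta["columns"].items():
            cast = meta["types"][target_name]
            row[target_name] = cast(raw[src_name])
        rows.append(row)
    return rows


billing_rows = ingest("billing", "cust_id,amt\n1,9.50\n2,12.00\n")
web_rows = ingest("web", "uid|page\n1|/home\n")
```

Under this design, onboarding a new source means adding a metadata entry, not writing another ingest procedure.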

KEY #4
Embrace New Data Management Strategies

Modern Data Management Strategies
• Adopt early ingest and adaptive processing
• Enable the capture of metadata on ingest
• Adopt streaming data processing where appropriate
• Model on the fly
• Modernize data integration infrastructure
• Extend data management to all data
• Apply analytics to all data
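Capturing metadata on ingest can be as simple as recording what arrived, when, and in what shape, before any modeling is done. The sketch below is one possible form of that, assuming records arrive as Python dictionaries. The function name `describe_batch` and the metadata fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List


def describe_batch(source_name: str, rows: List[Dict[str, object]]) -> Dict[str, object]:
    """Return a metadata record describing a batch at ingest time."""
    columns = sorted({key for row in rows for key in row})
    # Infer the set of value types seen per column (a crude schema on read).
    inferred = {
        col: sorted({type(row[col]).__name__ for row in rows if col in row})
        for col in columns
    }
    fingerprint = hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
    return {
        "source": source_name,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(rows),
        "columns": columns,
        "inferred_types": inferred,
        "content_sha256": fingerprint,
    }


batch = [{"sensor": "a1", "temp": 21.5}, {"sensor": "a2", "temp": None}]
meta = describe_batch("sensors", batch)
print(meta["columns"], meta["inferred_types"])
```

Because the schema is inferred from the data as it lands, the same record also supports modeling on the fly: a column that mixes `float` and `NoneType` is visible immediately.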

KEY #5
Apply Machine Learning Algorithms

Machine Learning Workflow
• Prepare Data
• Engineer Features
• Train, Tune and Test Models
• Deploy and Operationalize Models
• Update Models
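The workflow steps can be traced end to end in a few lines of plain Python. The sketch below uses a deliberately trivial threshold model on made-up vibration readings. The data, the function names, and the choice of model are all assumptions for illustration and are not taken from the presentation.

```python
import json
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical sensor records: vibration readings and whether the machine failed.
RAW = [
    {"readings": [0.2, 0.3, 0.1], "failed": False},
    {"readings": [0.9, 1.1, 1.0], "failed": True},
    {"readings": None, "failed": False},  # unusable record
    {"readings": [0.4, 0.5, 0.3], "failed": False},
    {"readings": [0.8, 0.7, 0.9], "failed": True},
    {"readings": [0.1, 0.2, 0.2], "failed": False},
    {"readings": [1.2, 1.0, 1.1], "failed": True},
]


def prepare(records: List[dict]) -> List[dict]:
    # Prepare data: drop records with missing readings.
    return [r for r in records if r["readings"]]


def engineer(records: List[dict]) -> List[Tuple[float, bool]]:
    # Engineer features: reduce each reading series to its mean.
    return [(mean(r["readings"]), r["failed"]) for r in records]


def accuracy(threshold: float, data: List[Tuple[float, bool]]) -> float:
    # Fraction of examples where "mean above threshold" matches the label.
    hits = sum((x > threshold) == y for x, y in data)
    return hits / len(data)


def train(data: List[Tuple[float, bool]]) -> Dict[str, float]:
    # Train and tune: pick the candidate threshold with the best accuracy.
    candidates = sorted(x for x, _ in data)
    best = max(candidates, key=lambda t: accuracy(t, data))
    return {"threshold": best}


def deploy(model: Dict[str, float]) -> str:
    # Deploy: serialize the model so a scoring service could load it.
    return json.dumps(model)


def score(deployed: str, readings: List[float]) -> bool:
    # Operationalize: score new readings against the deployed model.
    model = json.loads(deployed)
    return mean(readings) > model["threshold"]


features = engineer(prepare(RAW))
train_set, test_set = features[:4], features[4:]
model = train(train_set)
test_accuracy = accuracy(model["threshold"], test_set)  # test on held-out data
deployed = deploy(model)

# Update: retrain on all available data when new records arrive.
updated = train(features)
```

Each function corresponds to one step of the workflow, which keeps every step individually replaceable, for example swapping the threshold model for a real library without touching data preparation or deployment.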

Data Lake Blueprint
Global Data Integration: Ingest, Blend and Refine, Deliver
• Data sources: Network, Location, Web, EDW (x12), Billing, Provisioning, Customer, Social Media
• Ingest: Pentaho Data Integration
• Data Lake: Hadoop Cluster
• Blend and Refine: Pentaho Data Integration (Visual MapReduce and some native PDI Transformations), Data Publisher
• On-demand Data Marts: Analytical Database
• Deliver: Pentaho Analytics Server, Existing BI and Data Mining Tools
• Diagram annotation: To be decommissioned

Uncover Billions of Tax Revenue
Challenge
• £34B missed tax revenue
• Managing 40 TB of data held across 11 separate legacy data warehouses
• Relied on consultants for reports that required customization and long lead time
Benefits
• 360 degree view of the tax citizen
• Created a single Big Data platform and ability to consolidate 40 reporting streams with self-service reporting
• New reports save an estimated 900 man hours per day (based on a user-base of 1,200) by streamlining the reporting process
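To put the last benefit in per-user terms, the quoted figures can be divided out. The short calculation below uses only the two numbers stated above and assumes the saving is spread evenly across users, which the slide does not say.

```python
# Figures quoted in the case study.
hours_saved_per_day = 900
users = 1_200

# Assumption: the saving is spread evenly across the user base.
hours_per_user = hours_saved_per_day / users
minutes_per_user = hours_per_user * 60
print(f"{minutes_per_user:.0f} minutes saved per user per day")
```

On that assumption, the estimate works out to 45 minutes per user per day.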

Takeaways
• Align with clear corporate/strategic initiatives
• Embrace data management practices
• Enable adaptive data execution for data processing and integration
• Drive adoption of Machine Learning and Automation

Thank You
Chuck Yarbrough
VP, Solutions Marketing and Management
@cyarbrough
