8. Use Case: Architecting for Analytics
Enabling Analytics through Data Lakes
This... ... Not this
9. What not to do
Data Lake
Operational Analytics System
10. The Keys to Data Lake Success
Data Lake
• Architecture
• Access Model
• Asset Inventory
• Security
• Governance
Operational Analytics System
12. Architecture Keys
• Linear data ingestion time
• Horizontal expansion
Scale
• Data source patterns
• New data sources
Flexible
• No SPOF
• Vendor support
• Disaster recovery plan
Reliable
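One way to check the "linear data ingestion time" key is to compare seconds per row across runs of increasing size. This is a minimal sketch: the function name, the 25% tolerance, and the measurements are all made up for illustration.

```python
def is_roughly_linear(samples: list[tuple[int, float]], tolerance: float = 0.25) -> bool:
    """True if seconds-per-row stays within `tolerance` of the smallest run's rate."""
    ordered = sorted(samples)
    base_rows, base_seconds = ordered[0]
    base_rate = base_seconds / base_rows
    return all(
        abs(seconds / rows - base_rate) <= tolerance * base_rate
        for rows, seconds in ordered
    )


# Made-up measurements: (rows ingested, wall-clock seconds).
linear_runs = [(1_000_000, 60.0), (2_000_000, 118.0), (4_000_000, 251.0)]
degrading_runs = [(1_000_000, 60.0), (2_000_000, 170.0), (4_000_000, 610.0)]

print(is_roughly_linear(linear_runs))     # True
print(is_roughly_linear(degrading_runs))  # False
```

A failing check suggests ingestion cost per row grows with volume, which is the signal to revisit horizontal expansion.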
13. Access Model – different strokes
Can we predict key KPIs?
Can I explore that data?
How’s the business doing?
We need a new data set
14. Access Model Keys
• Operational report writers
• Data Scientists
• Business leaders
• Predictive Modelers
Identify Stakeholders
• Reporting
• Aggregation
• Modeling
• Visualization
Tools
• Internal Knowledge Bases
• Local experts
• Vendor training
Training
15. Asset Inventory – What’s in your lake?
ETL and model code
Aggregate data sets
Data Domains
Reports
16. Asset Inventory Keys
• Data sources
• Key terms
• Key transformations
Identify
• Data Lineage
• Change procedures
• Usage models
Document
• Organizational shift
• Data stewards lead the charge
• Strict ‘no one-off’ policy
Share
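The Identify and Document keys can be sketched as a small in-memory inventory that records each asset's steward and direct upstream assets, then answers lineage questions. The `Asset` and `Inventory` classes, their fields, and the sample entries are hypothetical. A real lake would more likely rely on a catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One inventory entry; all field names here are illustrative."""
    name: str
    kind: str      # e.g. "source", "aggregate", "report"
    steward: str   # the data steward responsible for the entry
    derived_from: list[str] = field(default_factory=list)  # direct upstream assets


class Inventory:
    def __init__(self) -> None:
        self._assets: dict[str, Asset] = {}

    def register(self, asset: Asset) -> None:
        # Refuse undocumented parents so lineage never points at unknown data.
        missing = [p for p in asset.derived_from if p not in self._assets]
        if missing:
            raise ValueError(f"unregistered upstream assets: {missing}")
        self._assets[asset.name] = asset

    def lineage(self, name: str) -> list[str]:
        """Return every upstream asset of `name`, nearest first, without duplicates."""
        seen: list[str] = []
        queue = list(self._assets[name].derived_from)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._assets[current].derived_from)
        return seen


inventory = Inventory()
inventory.register(Asset("orders_raw", "source", "alice"))
inventory.register(Asset("customers_raw", "source", "bob"))
inventory.register(Asset("daily_sales", "aggregate", "alice", ["orders_raw", "customers_raw"]))
inventory.register(Asset("sales_report", "report", "carol", ["daily_sales"]))

print(inventory.lineage("sales_report"))
# ['daily_sales', 'orders_raw', 'customers_raw']
```

Rejecting assets with unregistered parents is one way to enforce a "no one-off" policy at registration time.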
17. Security – Who’s in your lake?
Access rights
PCI and PII
Hackers!
Auditing
18. Security Keys
• Identify PII, PCI Data
• Encrypt in the lake
Data
• Access logging and auditing
• Data set and field level security
• Download monitoring
System
• Trust but verify
• Transparent policies
• Training
People
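Field-level security and access logging can be illustrated together: mask any PII or PCI field the caller's role is not granted, and log every read. The field classification, role grants, and logger name are assumptions for this sketch. Encryption at rest is out of scope here.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("lake.audit")

# Hypothetical classification: which fields hold PII or PCI data.
SENSITIVE_FIELDS = {"email": "PII", "card_number": "PCI"}

# Hypothetical role grants: which sensitive classes each role may see in the clear.
ROLE_GRANTS = {"analyst": set(), "fraud_team": {"PII", "PCI"}}


def read_record(record: dict, user: str, role: str) -> dict:
    """Return a copy of `record` with fields the role may not see masked."""
    allowed = ROLE_GRANTS.get(role, set())  # unknown roles get no grants
    visible = {}
    masked = []
    for name, value in record.items():
        label = SENSITIVE_FIELDS.get(name)
        if label is not None and label not in allowed:
            visible[name] = "***"
            masked.append(name)
        else:
            visible[name] = value
    # Every read is logged, including which fields were withheld.
    audit_log.info("user=%s role=%s masked=%s", user, role, masked)
    return visible


row = {"order_id": 17, "email": "pat@example.com", "card_number": "4111111111111111"}
print(read_record(row, "dana", "analyst"))
# {'order_id': 17, 'email': '***', 'card_number': '***'}
print(read_record(row, "lee", "fraud_team") == row)
# True
```

Defaulting unknown roles to no grants keeps the sketch on the "trust but verify" side: access must be granted explicitly.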
19. Data Governance in the Lake
Who owns this data?
What is data truth?
These numbers don’t agree!
It’s a swamp, not a lake!
20. Governance Keys
• Stress the value
• Adopt the behaviors
Mindshare
• Minimal governance in lake
• Increased governance in warehouse
• Data domains
Lightweight
• Per-domain process
• Data Glossary
• Data stewards
Repeatable
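One possible reading of "minimal governance in lake, increased governance in warehouse" is a two-tier check: a lake data set only needs a domain and a steward, while promotion to the warehouse also requires every column to be defined in the data glossary. The glossary entries, data set layout, and function names are hypothetical.

```python
# Hypothetical data glossary: agreed term -> definition, maintained by data stewards.
GLOSSARY = {
    "customer_id": "Unique identifier assigned at account creation",
    "net_revenue": "Gross revenue minus refunds and discounts",
}


def lake_issues(dataset: dict) -> list[str]:
    """Minimal lake check: the data set only needs a domain and a steward."""
    return [f"missing {key}" for key in ("domain", "steward") if not dataset.get(key)]


def warehouse_issues(dataset: dict) -> list[str]:
    """Stricter warehouse check: lake rules plus every column defined in the glossary."""
    issues = lake_issues(dataset)
    issues += [
        f"undefined term: {col}"
        for col in dataset.get("columns", [])
        if col not in GLOSSARY
    ]
    return issues


sales = {
    "domain": "sales",
    "steward": "alice",
    "columns": ["customer_id", "net_revenue", "margin_pct"],
}
print(lake_issues(sales))       # []
print(warehouse_issues(sales))  # ['undefined term: margin_pct']
```

Because the same two functions apply to every domain, the process stays repeatable as new domains are added.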
21. Summary
• It’s a lake, not a swamp – inventory and manage
• Lightweight governance is critical
• Architecture needs to be flexible
• Ignore security at your own risk
• Involve access stakeholders early and often