Fireside Chat with Tony Baer, Ovum Research
Developing a Strategy for Data Lake Governance
Wednesday, May 18, 2016
1:00 pm EST
Meet today’s speakers
Tony Baer, Principal Analyst, Information Management, Ovum
Tony Baer leads Ovum’s Big Data research area. His coverage focuses on how
Big Data must become a first-class citizen in the data center, IT organization, and
the business. He has a multi-disciplinary background touching the different tiers
of enterprise software. He is an author and sought-after speaker.
Scott Gidley, Vice President of Product, Zaloni
Scott is a nearly 20-year veteran of the data management software and services
market. Prior to joining Zaloni, Scott served as senior director of product
management at SAS and was previously CTO and cofounder of DataFlux
Corporation. Scott received his BS in Computer Science from the University of
Pittsburgh.
•  Award-winning provider of enterprise data lake
management solutions:
Integrated data lake management platform
Self-service data preparation
•  Data Lake Design and Implementation Services:
POC, Pilot, Production, Operations, Training
•  Data Science Professional Services
Delivering on the business of big data
Funded by top-tier technology
investors:
Key Findings
•  Data lakes must be managed
•  Data lakes must have the capability to ingest all data &
related metadata
•  Data lakes will only succeed if they become shared
resources
•  Business users must be prepared to take responsibility
for curating data
•  Maturity & readiness of tools, technologies & best
practices are works in progress
•  Mgmt. & governance of data lakes should be a phased
process
Ovum Big Data Report:
Developing a Strategy for Data Lake Governance
Group Multi-department Enterprise
Log analytics
Sentiment Analysis
DW offload
Data Lake
Exploratory Analytics
Line of business analytic applications
Operational analytics
Data lake is a later stage of Hadoop adoption
IT Data Scientists Business
Bulk storage of raw data
Exploratory Analytics
Line of business
analytic applications
Operational analytics
Migrate I/O-intensive operations (e.g., ELT)
“Deep” analytics
(e.g., segmentation, predictive, prescriptive modeling)
Data lake use case maturity model
Availability/Reliability
(FT, HA, Backup/DR)
Monitoring & troubleshooting
Perimeter
Security
Data platform (Hadoop)
Query/Analytics tools, programs
Cost Optimization & Integration
Data Inventory
Data Curation
Data-level security
Self-service tier
Data Lake building block
Hadoop platform management
End user tool
Ovum’s data lake reference architecture
Data lake challenges and complications
•  Ingestion
•  Lack of Visibility
•  Privacy and Compliance
•  Quality Issues
•  Reliance on IT
•  Reusability
•  Rate of Change
•  Skills Gap
•  Complexity
Building: Managing: Delivering:
Zaloni Confidential and Proprietary
Engage the business
• Discover
• Enrich
• Provision
Govern the data in the lake
• Cleanse
• Secure
• Operationalize
Enable the data lake
• Ingest
• Organize
• Catalog
Data Curation
Build your library of
information
Physical Inventory
Know/manage what data is in
the data lake
Data profiling, data preparation,
collaborative data enrichment,
catalog, match data, derive master
data, record data lineage
Business & Analytics teams Technology team
Manage data access, track
data lineage, tag for security,
data retention
Manage data access, tag for
security, data retention, lifecycle &
workflow, track data lineage
Collaboration key to modern data management
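The inventory and curation responsibilities on this slide (data access, security tags, retention, lineage) can be illustrated with a small catalog record. This is a minimal sketch only: the `CatalogEntry` class, its field names, and the sample data set names are hypothetical and are not taken from the Ovum report or from any Zaloni product.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    """Hypothetical inventory record for one data set in the lake."""
    name: str
    source: str
    ingested_on: date
    retention_days: int
    security_tags: set = field(default_factory=set)
    lineage: list = field(default_factory=list)

    def derive(self, name, step):
        """Register a derived data set, carrying tags and lineage forward."""
        return CatalogEntry(
            name=name,
            source=self.source,
            ingested_on=self.ingested_on,
            retention_days=self.retention_days,
            security_tags=set(self.security_tags),
            lineage=self.lineage + [f"{self.name} -> {name} ({step})"],
        )

    def expires_on(self):
        """Date after which the retention policy no longer covers the data."""
        return self.ingested_on + timedelta(days=self.retention_days)


# Illustrative data: a raw extract tagged as containing personal data.
raw = CatalogEntry("crm_raw", "CRM export", date(2016, 5, 18), 365, {"pii"})
clean = raw.derive("crm_clean", "cleanse")
```

In this sketch the technology team would own the base entry (physical inventory), while business and analytics teams add derived entries as they curate, so security tags and lineage follow the data instead of being re-entered by hand.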
Data lake reference architecture
Consumption Zone
Source System
File Data
DB Data
ETL Extracts
Streaming
Transient Loading Zone
Raw Data
Refined Data
Trusted Data
Discovery Sandbox
Original unaltered data attributes
Tokenized Data
APIs
Reference Data
Master Data
Data Wrangling
Data Discovery
Exploratory Analytics
Metadata, Data Quality, Data Catalog, Security
Data Lake
Integrate to common format
Data Validation
Data Cleansing
Aggregations
OLTP or ODS
Enterprise Data Warehouse
Logs (or other unstructured data)
Cloud Services
Cloud Services
Business Analysts
Researchers
Data Scientists
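The zone names on this slide suggest a staged flow from landing to trusted data. The following is a rough sketch of that idea, not the architecture's actual implementation: the assignment of cleansing to the refined zone and validation to the trusted zone is an assumption, and `Zone`, `cleanse`, `validate`, and `promote` are hypothetical names.

```python
from enum import Enum


class Zone(Enum):
    TRANSIENT = "transient loading zone"
    RAW = "raw data"
    REFINED = "refined data"
    TRUSTED = "trusted data"


def cleanse(record):
    """Integrate to a common format: trimmed, lower-case keys and trimmed strings."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def validate(record):
    """Assumed rule for illustration: a record needs a non-empty 'id'."""
    return bool(record.get("id"))


def promote(records):
    """Move records through the zones, keeping each stage's copy."""
    lake = {zone: [] for zone in Zone}
    lake[Zone.TRANSIENT] = list(records)
    # Raw zone keeps the original, unaltered attributes.
    lake[Zone.RAW] = [dict(r) for r in lake[Zone.TRANSIENT]]
    lake[Zone.REFINED] = [cleanse(r) for r in lake[Zone.RAW]]
    lake[Zone.TRUSTED] = [r for r in lake[Zone.REFINED] if validate(r)]
    return lake


lake = promote([{" ID ": " 42 ", "Name": "Ada "}, {"ID": "", "Name": "x"}])
```

Keeping a copy at every stage mirrors the slide's "original unaltered data attributes" label: refined and trusted data can always be rebuilt from the raw zone if a cleansing rule changes.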
DON’T GO IN THE DATA
LAKE WITHOUT US

Ovum Fireside Chat: Governing the data lake - Understanding what's in there
