Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World San Jose 2016

Building a modern data architecture
March 31, 2016
Ben Sharma | CEO and Founder
ben@zaloni.com

Delivering on the business of big data
•  Award-winning provider of enterprise data lake management solutions:
§  Integrated data lake management platform
§  Self-service data preparation
•  Data Lake Design and Implementation Services
•  Data Science Professional Services
•  Funded by top-tier technology investors
Zaloni Proprietary

Data lakes will be central to the modern data architecture
Agility Insight Scalability

The promise of a data lake: All data is welcome…
•  Store all types of data: structured and unstructured
•  Store raw data in its original form for an extended period of time
•  Use various tools to correlate, enrich and query the data for insights
•  Provide democratized access via a single unified view across the enterprise

Data architecture modernization
[Diagram: traditional vs. new data architecture]
Traditional: Sources, ETL, EDW; consumers: Data Discovery, Analytics, BI
New: Various Sources, Streaming, Unstructured Data; Data Lake with Derived (Transformed) data and Discovery Sandbox; EDW; consumers: Data Discovery, Analytics, BI, Data Science

Data lake challenges and complications
•  Ingestion
•  Lack of Visibility
•  Privacy and Compliance
•  Quality Issues
•  Reliance on IT
•  Reusability
•  Rate of Change
•  Skills Gap
•  Complexity
Building: Managing: Delivering:
Engage the business
• Discover
• Enrich
• Provision
Govern the data in the lake
• Cleanse
• Secure
• Operationalize
Enable the data lake
• Ingest
• Organize
• Catalog

Data lake reference architecture
[Diagram: data flows from source systems through the data lake zones to the consumption zone]
Source System: File Data, DB Data, ETL Extracts, Streaming
Data Lake zones: Transient Loading Zone, Raw Data, Refined Data, Trusted Data, Discovery Sandbox
Zone annotations: Original unaltered data attributes; Tokenized Data; Integrate to common format; Data Validation; Data Cleansing; Aggregations; Reference Data; Master Data; Data Wrangling; Data Discovery; Exploratory Analytics
Across the lake: Metadata, Data Quality, Data Catalog, Security
Consumption Zone: APIs
Systems shown: OLTP or ODS, Enterprise Data Warehouse, Logs (or other unstructured data), Cloud Services
Consumers: Business Analysts, Researchers, Data Scientists
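The zone flow in the reference architecture can be pictured as a small state machine. The sketch below is illustrative only: `Zone`, `Dataset`, `PROMOTION_ORDER` and `promote` are hypothetical names, and the linear order of zones is an assumption taken from the order in which the slide lists them, not something the deck specifies.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    TRANSIENT = "Transient Loading Zone"
    RAW = "Raw Data"
    REFINED = "Refined Data"
    TRUSTED = "Trusted Data"


# Assumed promotion order (hypothetical; adjust to your own lake design).
PROMOTION_ORDER = [Zone.TRANSIENT, Zone.RAW, Zone.REFINED, Zone.TRUSTED]


@dataclass
class Dataset:
    name: str
    zone: Zone = Zone.TRANSIENT
    history: list = field(default_factory=list)  # zones already passed through


def promote(dataset: Dataset) -> Dataset:
    """Move a dataset to the next zone and record the zone it left."""
    index = PROMOTION_ORDER.index(dataset.zone)
    if index == len(PROMOTION_ORDER) - 1:
        raise ValueError(f"{dataset.name} is already in the final zone")
    dataset.history.append(dataset.zone)
    dataset.zone = PROMOTION_ORDER[index + 1]
    return dataset
```

Keeping the history on the dataset is one simple way to retain a trace of where data has been; a real platform would more likely store this in its metadata layer.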

Data lake management platform
Integrated Data Lake Management:
•  Unified Data Management
•  Managed Ingestion
•  Data Reliability
•  Data Visibility
•  Data Security and Privacy

First things first: managed ingestion
•  Ability to ingest vast amounts of data
•  Ability to handle a wide variety of formats (streaming, files, custom)
•  Ability to handle a wide variety of sources
•  Capture operational metadata implicitly as new data arrives
•  Build in repeatability through automation to pick up incoming data and apply pre-defined processing
[Diagram: Various Sources, Streaming, Unstructured Data]
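The idea of capturing operational metadata implicitly as data arrives can be sketched as follows. This is a minimal illustration, not the platform's implementation: `ingest_file`, `ingest_directory`, the metadata keys, and the one-record-per-line assumption are all hypothetical choices made for the example.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_file(source: Path, raw_zone: Path) -> dict:
    """Copy a file into the raw zone unchanged and return operational metadata."""
    raw_zone.mkdir(parents=True, exist_ok=True)
    target = raw_zone / source.name
    shutil.copy2(source, target)  # raw data keeps its original form

    data = target.read_bytes()
    return {
        "source": str(source),
        "target": str(target),
        "size_bytes": len(data),
        "record_count": len(data.splitlines()),  # assumes one record per line
        "sha256": hashlib.sha256(data).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def ingest_directory(landing: Path, raw_zone: Path) -> list:
    """Repeatable pickup: ingest every file currently in the landing directory."""
    return [
        ingest_file(path, raw_zone)
        for path in sorted(landing.iterdir())
        if path.is_file()
    ]
```

Running `ingest_directory` on a schedule is one way to get the repeatability the slide asks for; the returned dictionaries are what a metadata store would persist.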

Capture metadata to improve data visibility and reliability
•  Reduced time to insight for analytics
•  File and record level watermarking provides data lineage
Type of Metadata | Description | Example
Technical | Captures the form and structure of each data set | Type of data (text, JSON, Avro), structure of the data (fields and their types)
Operational | Captures lineage, quality, profile and provenance of the data | Source and target locations of data, size, number of records, lineage
Business | Captures what it all means to the user | Business names, descriptions, tags, quality and masking rules
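The three metadata types, together with file- and record-level watermarking, might be modeled along these lines. The deck does not describe how watermarking is implemented, so this is a sketch under assumptions: `DatasetMetadata`, `watermark_records`, and the `_file_id` / `_record_id` field names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetMetadata:
    # Technical: form and structure of the data set
    data_format: str
    fields: dict
    # Operational: lineage and provenance
    source: str
    target: str
    record_count: int
    # Business: what the data means to the user
    business_name: str
    tags: list = field(default_factory=list)


def watermark_records(records: list, file_id: Optional[str] = None) -> list:
    """Tag each record with a file-level and a record-level watermark."""
    file_id = file_id or uuid.uuid4().hex
    return [
        {**record, "_file_id": file_id, "_record_id": f"{file_id}-{position}"}
        for position, record in enumerate(records)
    ]
```

Because every record carries the identifier of the file it arrived in, a downstream result can be traced back to its source file, which is the lineage benefit the slide points to.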

Data ready: Data preparation required for actionable data
•  Interactive data preparation to address errors, corrupted formats, duplicates
•  Data enrichment to go from raw to refined
•  Self service to prepare data without IT request/SQL knowledge
[Diagram: Raw Data → Data Preparation (Explore, Transform; Automation; Reusable Transformations; Orchestrate and automate workflows) → Refined Data → BI Reports, Enterprise Data Integrations, Data Science, Data Discovery, Analytics]
Diagram derived from Gartner report on Self Service Data Preparation
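Reusable transformations that take data from raw to refined can be sketched as small composable steps. The function names (`drop_duplicates`, `drop_invalid`, `run_pipeline`) and the sample records are hypothetical, chosen only to illustrate handling duplicates and corrupted values.

```python
from typing import Callable

Transformation = Callable[[list], list]


def drop_duplicates(key: str) -> Transformation:
    """Keep the first record seen for each value of `key`."""
    def apply(records: list) -> list:
        seen, result = set(), []
        for record in records:
            value = record.get(key)
            if value not in seen:
                seen.add(value)
                result.append(record)
        return result
    return apply


def drop_invalid(field_name: str, parse: Callable) -> Transformation:
    """Parse `field_name`; drop records where the value is missing or corrupted."""
    def apply(records: list) -> list:
        result = []
        for record in records:
            try:
                result.append({**record, field_name: parse(record[field_name])})
            except (KeyError, TypeError, ValueError):
                continue  # skip records that cannot be repaired here
        return result
    return apply


def run_pipeline(records: list, steps: list) -> list:
    """Apply each transformation in order; the same steps can be reused on new data."""
    for step in steps:
        records = step(records)
    return records


raw = [
    {"id": "1", "amount": "10.5"},
    {"id": "1", "amount": "10.5"},  # duplicate
    {"id": "2", "amount": "n/a"},   # corrupted value
    {"id": "3", "amount": "7"},
]
refined = run_pipeline(raw, [drop_duplicates("id"), drop_invalid("amount", float)])
```

Defining each step once and listing the steps in a pipeline is what makes them reusable and easy to automate in a workflow.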

Protect sensitive data
•  Data lakes enable multiple groups to share access to centrally stored data
•  Differing permissions require enhanced data security
§  Mask or tokenize data before it is published in the lake for consumption
§  Policy-based security
•  Metadata management enables audit and traceability
•  End result: more open and democratized access to data in the lake for those with permission
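Masking or tokenizing fields under a policy, before data is published for consumption, could look like the sketch below. The `POLICY` mapping, the function names, and the use of a keyed HMAC as the token are assumptions made for illustration; production tokenization often relies on a token vault so that authorized users can recover the original value.

```python
import hashlib
import hmac

# Hypothetical policy: the protection applied to each sensitive field.
POLICY = {"ssn": "tokenize", "email": "mask"}


def tokenize(value: str, secret: bytes) -> str:
    """Deterministic keyed token: equal inputs give equal tokens, so joins still work."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]


def protect(record: dict, policy: dict, secret: bytes) -> dict:
    """Apply the policy to one record; fields without a rule pass through unchanged."""
    protected = {}
    for name, value in record.items():
        action = policy.get(name)
        if action == "tokenize":
            protected[name] = tokenize(str(value), secret)
        elif action == "mask":
            protected[name] = mask(str(value))
        else:
            protected[name] = value
    return protected
```

Keeping the rules in a policy table rather than in code means a change in permissions only requires a change to the policy, which fits the policy-based approach the slide describes.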

Discover, Enrich, Provision
Self Service Data Preparation for Analytics: Catalog, Wrangling, Collaboration
•  See what data is available across your enterprise
•  Blend data in the lake without a costly IT project
•  Perform interactive data-driven transformations
•  Collaborate and share data assets and transformations with peers
EXPLORE PREPARE OPERATIONALIZE

Catalog with KPIs

“Ground to Cloud” hybrid architectures
•  Seeing rapid increase of big data in the cloud
•  Leverage cloud platforms as complementary to on-premises
•  Support sensitive data on premises and external data in the cloud (e.g. client data, machine-generated)
Key data challenges for hybrid environments:
•  VISIBILITY: Need enterprise-wide data catalog (logical data lake)
•  GOVERNANCE: Need consistent data governance requirements for hybrid platforms

Manage the complete data pipeline
INGEST: Manage data ingestion so you know what is in your Hadoop Data Lake
ORGANIZE: Define and capture metadata for ease of searching and browsing
ENRICH: Orchestrate and manage the data preparation process
ENGAGE: Data visibility and self-service data preparation

Network Data Lake architecture
[Diagram: network and enterprise data flow into the Network Data Lake]
Network Data: CDR, DPI, IPFIX, SNMP, RADIUS
Enterprise Data: CRM, Billing, Inventory
Data Lake management: Manage Ingestion; Manage Metadata; Manage, Monitor, Schedule
Data Lake components: Operations and Metadata Store, Data Quality & Rules Engine, Transformation Engine, Workflow Executor
Connected systems and uses: BI Tools, Custom Apps, Data Warehouse, Enterprise Data Warehouse, Exploration & Ad-hoc Analytics
Custom Applications: Subscriber Usage, Network Usage

Managed data lake for healthcare payers
[Diagram: Data Sources, Edge Node, Data Lake, Consumers]
Data Sources: Relational, Streaming, Files
Ingestion: Batch Ingestion, Streaming Ingestion, Change Data Capture
Data Lake Management: Configure Ingestion; Administer Metadata; Manage, Monitor, Schedule
Data Lake components: Operations and Metadata Store, Data Quality & Rules Engine, Transformation Engine, Workflow Executor
Data Sets: Claims, EMR, Lab/Pathology, Pharmacy, Member, Social, Enterprise Data
Consumers: Analytical Applications, Enterprise Data Warehouse
Applications: HEDIS Reporting, Bundle Payments, Medical Benefits Management, Scorecards, Enterprise Reports

Data Lake for BCBS239 Compliance (RDARR)
[Diagram: Data Acquisition Automation, from Source Systems to Data at Rest]
Source Systems: RDBMS, Mainframes, Flat files, Binary files
Pipeline steps: Data Ingestion, Data Quality and Validation, Layout Standardization, Operational Metadata Generation
Metadata labels: Register/update metadata; Extract/Read metadata; Metadata repositories; Metadata Management solution
•  Automated Data Acquisition Framework providing timeliness of data
•  Capture Metadata in all phases: Ingestion, Transformation
•  Integration with Enterprise Metadata Management
•  Integrated Data Quality Analysis
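The data quality and validation step in an acquisition pipeline like this one can be illustrated with a small rule-based check. The rule names, the `account_id` and `exposure` fields, and the `quality_report` function are hypothetical examples; they are not taken from the deck or from the BCBS 239 text.

```python
# Hypothetical rules: rule name -> predicate that a valid record must satisfy.
RULES = {
    "account_id present": lambda r: bool(r.get("account_id")),
    "exposure is non-negative": lambda r: (
        isinstance(r.get("exposure"), (int, float)) and r["exposure"] >= 0
    ),
}


def quality_report(records: list, rules: dict) -> dict:
    """Return, per rule, the number of failing records and the pass rate."""
    total = len(records)
    report = {}
    for name, rule in rules.items():
        failed = sum(1 for record in records if not rule(record))
        report[name] = {
            "failed": failed,
            "pass_rate": (total - failed) / total if total else 1.0,
        }
    return report
```

A report of this shape can be stored as operational metadata alongside each ingested data set, so that quality is measured at acquisition time rather than discovered later by consumers.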

Getting Started
[Diagram: 1 Questions, 2 Inputs, 3 Outcomes]
1 Questions: Business drivers AND business questions (Where is fraud occurring? How to optimize inventory?)
2 Inputs: Data + Use Cases + Platform (labels: Subject areas; Source system; Capabilities, Process; Ingest, Organize, Enrich, Explore)
3 Outcomes: Roadmap, Prototype, Analytics Strategy

DON’T GO IN THE DATA LAKE WITHOUT US
Stop by booth #1335 and ask for a copy of our new book and a free t-shirt!
