Pradeep Varadan, Verizon's Wireline OSS Data Science Lead and Scott Gidley, Zaloni's VP, Product Management discuss the benefits of augmenting your DW with a data lake in this webinar presentation.
Ovum Fireside Chat: Governing the data lake - Understanding what's in thereZaloni
In Ovum’s upcoming Big Data Trends to Watch 2016 report, Tony Baer forecasts that data lake management will become a front-burner issue as early Hadoop adopters get to the point of production implementation.
During this fireside chat, Tony Baer and Scott Gidley, VP of Product Management at Zaloni will assess the state of the industry regarding governance and data management tools, technologies, and practices that should fall into place as part of a data lake strategy.
Watch the webinar here: http://hubs.ly/H03374z0
Webinar - Risky Business: How to Balance Innovation & Risk in Big DataZaloni
Big data is a game-changer for organizations that use it right. However, a dynamic tension always exists between rapid innovation using big data and the high level of production maturity required for an enterprise implementation. Is it possible to find the right mix? Our webinar answers this question.
Webinar - Data Lake Management: Extending Storage and Lifecycle of DataZaloni
Join Gus Horn of NetApp and Scott Gidley of Zaloni as they discuss effective data lake lifecycle management and data architecture modernization. This webinar will address the best ways to achieve new levels of data insight and how to get superior value from your data.
Understanding Metadata: Why it's essential to your big data solution and how ...Zaloni
In this O'Reilly webcast, Ben Sharma (cofounder and CEO of Zaloni) and Vikram Sreekanti (software engineer in the AMPLab at UC Berkeley) discuss the value of collecting and analyzing metadata, and its potential to impact your big data solution and your business.
Watch the replay here: http://oreil.ly/28LO7IW
Recently, there's been discussion, even some confusion, around the relationship between Hadoop and Spark. Although they're both big data frameworks with many similarities, they are not one and the same; in fact, they are complementary in an enterprise environment.
View the webinar replay here: http://info.zaloni.com/spark-hadoops-friend-or-foe
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
When building your data stack, the architecture could be your biggest challenge. Yet it could also be the best predictor for success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma, thought leader and coauthor of Architecting Data Lakes, offers lessons learned from the field to get you started.
Strata San Jose 2017 - Ben Sharma PresentationZaloni
Learn about the promise of data lakes:
- Store all types of data in its raw format
- Create refined, standardized, trusted datasets for various use cases
- Store data for longer periods of time to enable historical analysis
- Query and access the data using a variety of methods
- Manage streaming and batch data in a converged platform
- Provide shorter time-to-insight with proper data management and governance
Verizon: Finance Data Lake implementation as a Self Service Discovery Big Dat...DataWorks Summit
The Finance Data Lake's objective is to create a centralized enterprise data repository for all Finance and Supply Chain data, serving as the single source of truth. It enables a self-service discovery analytics platform for business users to answer ad hoc business questions and derive critical insights. The data lake is based on the open-source Hadoop big data platform and is a very cost-effective solution for breaking down ERP data silos and simplifying the enterprise data architecture.
POCs were conducted on an in-house Hortonworks Hadoop data platform to validate cluster performance at production volumes. Based on business priorities, an initial roadmap was defined using three data sources: two SAP ERPs and PeopleSoft (OLTP systems). A development environment was established in the AWS cloud for agile delivery. The near-real-time data ingestion architecture for the data lake was defined using replication tools and a custom Sqoop-based micro-batching framework, with data persisted in Apache Hive in ORC format. Data and user security is implemented using Apache Ranger, and sensitive data at rest is stored in encryption zones. Business datasets were developed as Hive scripts and scheduled using Oozie. Connectivity for multiple reporting tools, including SQL tools, Excel, and Tableau, was enabled for self-service analytics. Upon successful implementation of the initial phase, a full roadmap was established to extend the Finance data lake to over 25 data sources, scale data ingestion, and enable OLAP tools on Hadoop.
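The micro-batching idea behind the custom Sqoop-based framework can be sketched in outline. The snippet below is an illustrative stand-in only (plain Python instead of Sqoop; the source rows and `updated_at` watermark column are hypothetical): each run pulls only rows changed since the last successful batch and then advances the watermark.

```python
# Illustrative sketch of watermark-based micro-batch ingestion,
# standing in for the Sqoop-based framework described above.
# The source "table" and 'updated_at' column are hypothetical.

def extract_batch(source_rows, last_watermark):
    """Pull only rows changed since the last successful batch."""
    return [r for r in source_rows if r["updated_at"] > last_watermark]

def run_micro_batch(source_rows, lake, last_watermark):
    """Append the incremental slice to the lake, then advance the watermark."""
    batch = extract_batch(source_rows, last_watermark)
    lake.extend(batch)
    if batch:
        last_watermark = max(r["updated_at"] for r in batch)
    return last_watermark

source = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 105},
    {"id": 3, "updated_at": 110},
]
lake = []
wm = run_micro_batch(source, lake, last_watermark=0)   # initial load: all 3 rows
source.append({"id": 4, "updated_at": 120})            # new change arrives at source
wm = run_micro_batch(source, lake, last_watermark=wm)  # incremental load: only row 4
```

In the real framework the extraction step would be a Sqoop incremental import against the ERP database rather than a list comprehension, but the watermark bookkeeping is the same idea.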
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
In our recent Big Data Warehousing Meetup, we discussed Data Governance, Compliance and Security in Hadoop.
As the Big Data paradigm becomes more commonplace, we must apply enterprise-grade governance capabilities to critical data that is highly regulated and must adhere to stringent compliance requirements. Caserta and Cloudera shared techniques and tools that enable data governance, compliance and security on Big Data.
For more information, visit www.casertaconcepts.com
Data Lakes are meant to support many of the same analytics capabilities of Data Warehouses while overcoming some of the core problems. Yet Data Lakes have a distinctly different technology base. This webinar will provide an overview of the standard architecture components of Data Lakes.
This will include:
The Lab and the factory
The base environment for batch analytics
Critical governance components
Additional components necessary for real-time analytics and ingesting streaming data
Hadoop-based data lakes have become increasingly popular within today's modern data architectures for their scalability, ability to handle data variety, and low cost. Many organizations start slow with their data lake initiatives, but as these grow, they run into challenges with data consistency, quality and security, and lose confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes, their relationship to productivity, and how they help organizations meet regulatory and compliance requirements. The talk advocates adopting a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
Are You Killing the Benefits of Your Data Lake?Denodo
Watch the full webinar on-demand here: https://goo.gl/RL1ZSa
Data lakes are centralized data repositories. Data needed by data scientists is physically copied to a data lake, which serves as a single storage environment. This way, data scientists can access all the data from one entry point – a one-stop shop to get the right data. However, such an approach is not always feasible for all the data, and it limits the lake's use to data scientists alone, making it a single-purpose system.
So, what’s the solution?
A multi-purpose data lake allows a broader and deeper use of the data lake without minimizing the potential value for data science and without making it an inflexible environment.
Attend this session to learn:
• Disadvantages and limitations that are weakening or even killing the potential benefits of a data lake.
• Why a multi-purpose data lake is essential in building a universal data delivery system.
• How to build a logical multi-purpose data lake using data virtualization.
Do not miss this opportunity to make your data lake project successful and beneficial.
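The core idea of a logical, virtualized data lake is that data stays in its source systems and is combined on demand rather than copied. A minimal sketch of that pattern, with hypothetical in-memory dictionaries standing in for real source systems (this is an illustration of the concept, not Denodo's implementation):

```python
# Minimal sketch of a virtual (logical) layer: data stays in its
# sources; the layer assembles a combined view at query time
# instead of replicating everything into one physical store.
# The two "sources" below are hypothetical stand-ins.
warehouse = {"cust-1": {"name": "Acme", "segment": "enterprise"}}
lake = {"cust-1": {"clicks": 42}}

class VirtualCustomerView:
    """Federates several sources on demand; nothing is copied upfront."""
    def __init__(self, *sources):
        self.sources = sources

    def get(self, key):
        merged = {}
        for src in self.sources:
            merged.update(src.get(key, {}))
        return merged

view = VirtualCustomerView(warehouse, lake)
row = view.get("cust-1")  # combined record, assembled on demand
```

A real data virtualization platform adds query pushdown, caching, and security on top of this federation idea, but the contract is the same: one logical entry point, many physical sources.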
The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.
Traditional data warehouse ETL has become too slow, too complicated, and too expensive to address the torrent of new data sources and new analytic approaches needed for decision making. The new ETL environment is already looking drastically different.
In this webinar, Ralph Kimball, founder of the Kimball Group, and Manish Vipani, Vice President and Chief Architect of Enterprise Architecture at Kaiser Permanente will describe how this new ETL environment is actually implemented at Kaiser Permanente. They will describe the successes, the unsolved challenges, and their visions of the future for data warehouse ETL.
Performance Acceleration: Summaries, Recommendation, MPP and moreDenodo
Watch full webinar here: https://bit.ly/3nLHayP
Performance is critical for an organization across the board. Developers can optimize execution with Summaries, MPP, Data Movement, and more. Business users rely on the Recommendation engine to guide them to the right data. Let’s discover and learn about various performance acceleration techniques in this session.
Big Data International Keynote Speaker Mark van Rijmenam shared his vision on Hadoop data lakes during a Zaloni webinar: What are the Hadoop data lake trends for 2016, what are the data lake challenges, and how can organizations benefit from data lakes?
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
This presentation explains in detail what a Data Lake Architecture looks like, how data virtualization fits into the Logical Data Lake, and goes over some performance tips. Also it includes an example demonstrating this model's performance.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/9Jwfu6.
10 Amazing Things To Do With a Hadoop-Based Data LakeVMware Tanzu
Greg Chase, Director, Product Marketing presents Big Data: 10 Amazing Things to do With A Hadoop-based Data Lake at the Strata Conference + Hadoop World 2014 in NYC.
Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
How to Become a Thought Leader in Your NicheLeslie Samuel
Are bloggers thought leaders? Here are some tips on how you can become one. Provide great value, put awesome content out there on a regular basis, and help others.
Transforming Insurance Operations through Data and AnalyticsDatalytyx
Analytics and big data are established in insurance, but could be better: they're not joined up. Big data is an accelerator of what is possible.
Roger Oldham of Amethyst Business Consultancy explains the impact of big data and analytic technology in the Corporate / Wholesale Insurance market.
Course Notes for the design of spatial applications course. The course presents an overview of the technologies, tradition, psychology and methodology for the design of maps and other spatial applications
Data-Driven Government: Explore the Four Pillars of ValueThomas Robbins
McKinsey Global Institute estimates that government organizations together can generate $3 trillion in value for themselves and their taxpayers through data and information transparency initiatives, with some of those dollars being generated at the local level.
Yes, that's a staggering number, but governments like yours are realizing pieces of it already. Are you taking advantage of the enormous economic and social impacts of information transparency?
Join this vital webinar to learn more about the four pillars of value that are reshaping how government thinks not only about open data, but how it's applied and leveraged to cut costs and significantly increase government efficiency.
India, Internet of things and the role of governmentSyam Madanapalli
IoT provides an opportunity for India to modernize citizen services and improve living standards. Here are my thoughts on the role of the Indian government towards the Internet of Things.
Apache Hadoop started as batch: simple, powerful, efficient, scalable, and a shared platform. However, Hadoop is more than that. Its true strengths are:
Scalability – it's affordable because it is open source and uses commodity hardware for reliable distribution.
Schema on read – you can afford to save everything in raw form.
Data is better than algorithms – more data with a simple algorithm can be much more meaningful than less data with a complex algorithm.
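The "schema on read" strength above can be made concrete with a small sketch (plain Python, illustrative only): raw records are landed exactly as received, and each consumer applies its own schema when it reads, rather than forcing one schema at write time.

```python
import json

# Raw zone: events are landed exactly as received, with no upfront schema.
# The field names below are hypothetical example data.
raw_zone = [
    '{"user": "a", "amount": "12.50", "country": "US"}',
    '{"user": "b", "amount": "7.00"}',
]

def read_with_schema(raw_records, fields):
    """Apply a consumer-specific schema at read time, not at write time."""
    out = []
    for line in raw_records:
        rec = json.loads(line)
        # Missing fields surface as None instead of failing ingestion.
        out.append({f: rec.get(f) for f in fields})
    return out

# Two consumers, two schemas, one copy of the raw data.
finance_view = read_with_schema(raw_zone, ["user", "amount"])
geo_view = read_with_schema(raw_zone, ["user", "country"])
```

This is why "save everything in raw form" is affordable: a new use case only needs a new read-time schema, not a re-ingestion of the source.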
What is data-driven government for public safety?IBM Analytics
How can governments become data-driven and capitalize on the ton of valuable insight hidden in the flood of data we generate every day? Where has this already been implemented, and what are the effects? Get the big picture on public safety and incident and emergency management at http://ibm.co/saferplanet
Architecting next generation big data platformhadooparchbook
A tutorial on architecting next generation big data platform by the authors of O'Reilly's Hadoop Application Architectures book. This tutorial discusses how to build a customer 360 (or entity 360) big data application.
Audience: Technical.
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI, and advanced analytics, it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization – a paradigm shift in how organizations access, integrate, and provision the data required to meet business goals.
As data analytics and data-driven intelligence take centre stage in today's digital economy, logical data integration across the widest variety of data sources, with proper security and governance structures in place, has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Myth Busters 9: Data virtualization doesn’t help me with data governanceDenodo
Watch full webinar here: https://buff.ly/3OGJYur
In the past, data governance was looked upon as a thankless task, more of a hindrance to business than a facilitator. However, because of today's data privacy regulations, data governance is increasingly seen as a critical function across most organizations. Companies are investing in governance tools, data quality tools, data catalogs, and so on in an effort to improve their data governance function.
Many people think that data virtualization has no role to play in data governance, that it's just a data access layer. And this is why we're back with another episode of Myth Busters!
In this webinar, we'll look at the need for data governance within organizations and whether modern data management platforms, powered by data virtualization, can play an important role within this function. We'll not just look at the data management aspect of data governance, but also at how good data governance can enable self-service analytics while still protecting data from unauthorized access.
Join us for this Myth Busters webinar as we explore the challenges of data governance and whether data virtualization is needed to deliver the value that your company expects from these initiatives.
Denodo: Enabling a Data Mesh Architecture and Data Sharing Culture at Landsba...Denodo
Sylvain Dutilh, INFORMATION INTELLIGENCE SPECIALIST, Landsbankinn
Traditional data processing leaves large pools of replicated and unsynchronized data sets behind. In an era when data grows exponentially and is disconnected and spread across silos, replicating data has never been less necessary. In this session, Sylvain from Landsbankinn will walk us through his organization's journey of implementing a Logical Data Warehouse and a data-sharing program by leveraging data virtualization, which allowed it to build a central, secure business rules repository and an agile, modern data mesh architecture.
The Great Lakes: How to Approach a Big Data ImplementationInside Analysis
The Briefing Room with Dr. Robin Bloor and Think Big, a Teradata Company
Live Webcast April 7, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=4114b87441ab7b2b4c52f6b24776e5a1
The more things change in Big Data, the more they stay the same. Indeed, there are many similarities between a Hadoop-based Data Lake and today’s modern Data Warehouse. Regardless of platform, information workers must still be able to turn their assets into action quickly, without taking a hit on governance or downstream performance.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains the challenges facing organizations that embark on Big Data projects. He'll be briefed by Rick Stellwagen of Think Big, a Teradata Company, who will outline his company's approach to handling Big Data implementations. Rick will discuss the role of the data lake, and how timely response of queries is critical for reporting and analysis.
Visit InsideAnalysis.com for more information.
Databases are fundamentally changing due to new technologies and new requirements. This has never been more evident than with Oracle Database 12c, which has been the most rapidly adopted release in over a decade. This session provides a technical introduction to what's new in Oracle Database 12c and Oracle’s Engineered systems. We will describe which industry transformation inspired each enhancement and explain when and how you can embrace each enhancement while preserving your existing performance.
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
Data mesh was among the most discussed and controversial enterprise data management topics of 2021. One of the reasons people struggle with data mesh concepts is that we still have many open questions we are not thinking about:
Are you thinking beyond analytics? Are you thinking about all possible stakeholders? Are you thinking about how to be agile? Are you thinking about standardization and policies? Are you thinking about organizational structures and roles?
Join data.world VP of Product Tim Gasper and Principal Scientist Juan Sequeda for an honest, no-bs discussion about data mesh and its role in data governance.
Data lakes are central repositories that store large volumes of structured, unstructured, and semi-structured data. They are ideal for machine learning use cases and support SQL-based access and programmatic distributed data processing frameworks. Data lakes can store data in the same format as its source systems or transform it before storing it. They support native streaming and are best suited for storing raw data without an intended use case. Data quality and governance practices are crucial to avoid a data swamp. Data lakes enable end-users to leverage insights for improved business performance and enable advanced analytics.
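The paragraph above notes that data quality and governance practices are crucial to avoid a data swamp. A minimal sketch of what that means in practice (plain Python; the validation rules and field names are hypothetical, not a real framework): records failing basic checks are quarantined at the edge of the lake rather than silently landed.

```python
# Illustrative quality gate at the edge of a data lake: records that
# fail basic checks are quarantined for review instead of being
# silently landed, which is one way lakes degrade into swamps.
# The required fields below are hypothetical.
REQUIRED = {"id", "event_time"}

def quality_gate(records):
    """Split a batch into clean records and quarantined records."""
    clean, quarantine = [], []
    for rec in records:
        if REQUIRED.issubset(rec) and rec["id"] is not None:
            clean.append(rec)
        else:
            quarantine.append(rec)
    return clean, quarantine

batch = [
    {"id": 1, "event_time": "2024-01-01T00:00:00Z"},
    {"id": None, "event_time": "2024-01-01T00:01:00Z"},  # bad: null key
    {"event_time": "2024-01-01T00:02:00Z"},              # bad: missing id
]
clean, quarantine = quality_gate(batch)
```

Production lakes express the same idea through schema validation, profiling, and lineage tooling, but the principle is identical: enforce a minimum bar at ingest so downstream consumers can trust what they find.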
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
Big Data is moving from hype to reality for many organisations. The value proposition is clear and sponsorship is high, but how do organisations execute?
Join Oracle and Contexti to discuss the typical journey of a big data project from concept to pilot to production.
• Discuss our experience with a regional Telco
• Common Use Cases across key verticals
• Defining and prioritising use cases
• The challenge of moving from Pilot to Production
• Common Operating Models for Big Data
• Funding a Big Data Capability going forward
• Pilots - common mistakes; challenges; success criteria
Got data?… now what? An introduction to modern data platformsJamesAnderson599331
What are Data Analytics Platforms? What decision points are necessary in creating a modern, unified analytics data platform? What benefits are there to building your analytics data platform on Google Cloud Platform? Susan Pierce walks us through it all.
In the past few years, the term "data lake" has leaked into our lexicon. But what exactly IS a data lake? Some IT managers confuse data lakes with data warehouses. Some people think data lakes replace data warehouses. Both of these conclusions are false. There is room in your data architecture for both data lakes and data warehouses. They have different use cases, and those use cases can be complementary.
Todd Reichmuth, Solutions Engineer with Snowflake Computing, has spent the past 18 years in the world of Data Warehousing and Big Data, first at Netezza and later at IBM, before making the jump to the cloud at Snowflake Computing earlier in 2018.
Mike Myer, Sales Director with Snowflake Computing, has spent the past 6 years in the world of security and is looking to drive awareness of the better Data Warehousing and Big Data solutions available. He was previously at local tech companies FireMon and Lockpath, and joined Snowflake because of its disruptive technology that's truly helping folks in the Big Data world on a day-to-day basis.
Modern Data Management for Federal ModernizationDenodo
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems fall short of realizing a modernized, future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
Thirty years is a long time for a technology foundation like relational databases to remain this active. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
2. • Award-winning provider of enterprise data lake management solutions:
Integrated data lake management platform
Self-service catalog and data preparation
• Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training
• Data Science Professional Services
3. Zaloni Proprietary
About our speakers
Pradeep Varadan, Verizon Wireline, OSS Data Science Leader
Varadan is a data scientist and enterprise architect who specializes in data challenges within telecommunications. He is tasked with providing a competitive edge by using data analytics to drive effective decision-making. He is skilled in creating systems for understanding and making better decisions about rapid technology shifts, customer lifestyle and behavior trends, and related changes that impact the Verizon network.
Scott Gidley, Zaloni, VP Product Management
Gidley is responsible for the strategy and roadmap of existing and future products within the Zaloni portfolio. He is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, he served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation.
4. Zaloni Confidential and Proprietary - Provided under NDA
Current state of a corporate data flow architecture
[Diagram labels: Data Generators, Machines, Data Channels, Warehouses, Marts, Repositories, Data stores, BI/Reporting]
5. Key Challenges
Business Challenges:
• Increased processing time / reduced response
• Lack of data lineage / lack of visibility
• Constant CapEx for hardware upgrades
• Lack of access to history
IT Challenges:
• Multiple data transfers
• Multiple technology platforms with data copies
• Constant performance tuning for CPU
• Manual data offload for space management
6. Resource consumption
[Diagram labels: Sources, ETL, Staging, Warehouse, Report Mart, ELT/Reporting/Mining, Data Discovery, Analytics, BI]
7. Typical utilization of RDBMS resources
We expend almost all CPU on low-business-value ETL.
[Chart: workloads plotted by CPU vs. business value; bubble size indicates frequency of use. Workloads: ETL to Stage, ETL to Warehouse, ETL to Reporting, Auditing (landing-table queries), Data Mining (staging queries), Ad-hoc Analysis (warehouse queries), Reporting (presentation-table queries)]
8. ~80% of system capacity is used for batch processing (ELT)
9. Reduce the cost of ELT/ETL by offloading to Hadoop
10. The future of enterprise data flow
[Diagram, Legacy: Transactional Systems and Machine logs/IoT -> Structured Data -> ETL -> EDW + Sandbox -> Data Marts -> BI/Reporting]
[Diagram, Modern: Transactional Systems and Machines -> structured/unstructured data -> Data Lake -> ETL / Sandbox / EDW / Data Marts -> BI/Reporting/Analytics and Operational Dashboards/EDA/Mining]
12. Data lake challenges
Building, managing, and delivering the lake each bring challenges:
• Ingestion
• Visibility and Quality
• Privacy and Compliance
• Timeliness
• Reliance on IT
• Reusability
• Rate of Change
• Skills Gap
• Complexity
13. Data Lake 360°: A holistic approach to actionable big data
1. Enable the lake
2. Govern the data
3. Engage the business
• Foster a data-driven business through self-service data discovery and preparation
• Safeguard sensitive data and enable regulatory compliance
• Improve data visibility, reliability and quality to reduce time-to-insight
• Leverage the full power of a scale-out architecture with an actionable, scalable data lake
14. Enable the lake
• Managed Ingestion
Ability to ingest vast amounts of data
Ability to handle a wide variety of formats (streaming, files, custom) and sources
Builds in repeatability through automation to pick up incoming data and apply pre-defined processing
• Metadata Management
Capture and manage operational, technical and business metadata
Provides visibility and reliability, which are key to finding data in the lake
Reduces time to insight for analytics
File- and record-level watermarking provides data lineage and enables audit and traceability
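To make the managed-ingestion idea concrete, here is a minimal sketch of automated pickup with pre-defined per-source processing and metadata capture at ingest time. Everything here is hypothetical: the source name, the processing steps, and the `ingest`/`catalog` structure are illustrative assumptions, not Zaloni product APIs.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical registry of pre-defined processing steps per source.
PIPELINES = {
    "pos_sales": [str.strip, str.lower],  # illustrative cleanup steps
}

catalog = []  # operational metadata captured automatically at ingest time


def ingest(source, record):
    """Apply the source's pre-defined steps, then register metadata."""
    for step in PIPELINES[source]:
        record = step(record)
    catalog.append({
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        # Record-level checksum acts as a watermark for audit/traceability.
        "checksum": hashlib.sha256(record.encode()).hexdigest(),
    })
    return record
```

Because the pipeline is looked up by source name, any new file from a known source gets the same repeatable processing with no manual step, and the catalog entry is what later makes the data findable in the lake.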
15. Govern the data
• Data Lineage
See how data moves and how it is consumed in the data lake.
Safeguard data and reduce risk by always knowing where data has come from, where it is, and how it is being used.
• Data Quality
Rules-based data validation
Integration with the managed data pipeline
Stats and metrics for reporting and actions
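The rules-based validation with stats for reporting could be sketched as follows. The rule names and row fields are invented for illustration; this is a sketch of the pattern, not the product's rule engine.

```python
# Hypothetical validation rules keyed by name; each returns True on pass.
RULES = {
    "amount_non_negative": lambda row: row["amount"] >= 0,
    "customer_id_present": lambda row: bool(row.get("customer_id")),
}


def validate(rows):
    """Run every rule over every row; return pass/fail counts for reporting."""
    stats = {name: {"passed": 0, "failed": 0} for name in RULES}
    for row in rows:
        for name, rule in RULES.items():
            key = "passed" if rule(row) else "failed"
            stats[name][key] += 1
    return stats
```

The returned counts are exactly the "stats and metrics" a pipeline can report on, or act on (for example, quarantining a batch whose failure rate exceeds a threshold).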
16. Govern the data
• Data Security and Privacy
Differing permissions require enhanced data security
Mask or tokenize data before it is published in the lake for consumption
Policy-based security
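A minimal sketch of tokenizing sensitive fields before publication, assuming a keyed hash so the same value always maps to the same token (joins still work) without being reversible. The field names and the key are illustrative, not the product's masking mechanism.

```python
import hashlib
import hmac

# Illustrative key; a real deployment would use a managed, rotated secret.
SECRET = b"rotate-me"


def tokenize(value):
    """Replace a sensitive value with a stable, non-reversible token."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]


def publish(record, sensitive_fields):
    """Mask sensitive fields before the record lands in a consumable zone."""
    return {k: tokenize(v) if k in sensitive_fields else v
            for k, v in record.items()}
```

Because tokens are deterministic per key, analysts can still group and join on the tokenized column while never seeing the raw value.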
• Data lifecycle management across tiered storage environments
Hot -> Warm -> Cold at the entity level, based on policies/SLAs
Across on-premises and cloud environments
Provides data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
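The hot/warm/cold policy above can be sketched as a simple age-based tiering decision. The thresholds are assumptions for illustration, not product defaults; a real policy would be driven by SLAs per entity.

```python
from datetime import timedelta

# Illustrative policy: (maximum age, tier) pairs checked in order.
POLICY = [
    (timedelta(days=30), "hot"),    # frequently queried, premium storage
    (timedelta(days=365), "warm"),  # cheaper on-premises or cloud tier
]


def target_tier(age):
    """Pick the storage tier an entity should live in, given its age."""
    for limit, tier in POLICY:
        if age <= limit:
            return tier
    return "cold"  # archive tier, e.g. low-cost object storage
```

A scheduler would periodically compare each entity's current tier to `target_tier` and orchestrate the move when they differ.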
17. Engage the business
• Data Catalog
See what data is available across your enterprise
Contribute valuable business information to improve search and usage
Use a shopping-cart experience to create sandboxes for ad-hoc and exploratory analytics
• Self-service Data Preparation
Blend data in the lake without a costly IT project
Perform interactive data-driven transformations
Collaborate and share data assets and transformations with peers
18. Data lake reference architecture
Sources: Sensors (or other time-series data), Relational data stores (OLTP/ODS/DW), Logs (or other unstructured data), Social and shared data
Data lake zones:
• Transient Landing Zone: temporary store of source data; consumers are IT and data stewards; implemented in highly regulated industries
• Raw Zone: original source data ready for consumption; consumers are ETL developers, data stewards, and some data scientists; single source of truth with history
• Trusted Zone: standardized on corporate governance/quality policies; consumers are anyone with appropriate role-based access; single version of truth
• Refined Zone: data required for LOB-specific views, transformed from existing certified data; consumers are anyone with appropriate role-based access
• Sandbox
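The zone progression can be sketched as a path convention plus a promotion rule. The directory layout and function names here are hypothetical illustrations of the zone model, not a standard or a Zaloni API.

```python
# Illustrative ordering of lake zones; data is promoted left to right
# as it is cleaned, governed, and certified.
ZONES = ["transient", "raw", "refined", "trusted"]


def zone_path(zone, source, dataset):
    """Build an assumed lake path for a dataset in a given zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"/datalake/{zone}/{source}/{dataset}"


def promote(zone):
    """Return the next zone a dataset moves to after passing its checks."""
    i = ZONES.index(zone)
    if i == len(ZONES) - 1:
        raise ValueError("already in the trusted zone")
    return ZONES[i + 1]
```

Encoding the zone in the path keeps role-based access simple: permissions can be granted per zone prefix rather than per dataset.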
19. Data lake reference architecture with Zaloni
[Diagram: source systems (file data, DB data, ETL extracts, streaming; from sensors or other time-series data, relational data stores (OLTP/ODS/DW), logs or other unstructured data, and social/shared data) land in the data lake's Transient Landing, Raw, Refined, Trusted, and Sandbox zones; a consumption zone (EDW, data marts, APIs) serves business analysts, researchers, and data scientists; the data lake management & governance platform provides metadata management, data quality, data catalog, and security]
20. Four great reasons to augment with a data lake
• Save millions in storage costs
• Significantly speed up processing
• Maximize the data warehouse for BI
• Extract more value from all of your data
21. Centralized data, decentralized access
Business users (Business Analyst, Business Manager, Operations Manager, Data Scientist, Business SME) ask: What happened? What is happening? What will happen? What can we control? Can I see the data?
IT team (IT Analyst, Programmer, DBA/Modeler, Data Scientist, Data Engineer) delivers: code analysis, code development, data model, app prototype, app implementation
Both work against the centralized Data Lake.