Webinar - Data Lake Management: Extending Storage and Lifecycle of Data

Join Gus Horn of NetApp and Scott Gidley of Zaloni as they discuss effective data lake lifecycle management and data architecture modernization. This webinar will address the best ways to achieve new levels of data insight and how to get superior value from your data.

  1. Future-proofing your Data Lake: Extending Storage and Lifecycle of Data. Scott Gidley, Zaloni, and Gus Horn, NetApp. Webinar: October 5, 2016
  2. • Award-winning provider of enterprise data lake management solutions: integrated data lake management platform; self-service data preparation • Data lake design and implementation services: POC, pilot, production, operations, training • Data science professional services
  3. Data lakes are central to the modern data architecture: increased agility, new insights, improved scalability. Zaloni Proprietary
  4. Data architecture modernization. Traditional: sources → ETL → EDW → data discovery, analytics, BI. Modern: various sources, including streaming and unstructured data → data lake (derived/transformed data, discovery sandbox) and EDW → data discovery, analytics, BI, data science.
  5. Big data opportunities come with challenges. Data Lake Promise • Stores all types of data (structured and unstructured) in its raw format • Stores data for longer periods of time to enable historical analysis • Manages real-time, streaming, and reference data all in the same environment • Integrates storage and compute environments. Data Lake Reality • Homogeneous data storage degrades performance and efficiency • Aged or non-relevant data pollutes the data lake • Lack of business-driven SLAs for data archival impacts compliance and automation initiatives. Zaloni Confidential and Proprietary - Provided under NDA
  6. Data Lake 360°: Zaloni's holistic approach to actionable big data. 1. Enable the lake • Leverage the full power of a scale-out architecture with an actionable, scalable data lake 2. Govern the data • Improve data visibility, reliability and quality to reduce time-to-insight • Safeguard sensitive data and enable regulatory compliance 3. Engage the business • Foster a data-driven business through self-service data discovery and preparation
  7. Data lakes show promise, but success can be short-lived! ▪ An internet retailer relies on its data lake to enable: ▪ Real-time inventory analytics ▪ Customer next-best-offer programs ▪ The initial implementation shows promise and delivers measurable business value ▪ Increasing costs and decreasing performance due to unmanaged data growth limit long-term ROI. Use cases: Real-Time Inventory Management; Customer 360: Next Best Offer
  8. Data Lake Reference Architecture. Sources: sensors (or other time series data); relational data stores (OLTP/ODS/DW); logs (or other unstructured data); social and shared data. Data lake zones: Transient Landing Zone, Raw Zone, Analytic Zone, Refined Zone, Sandbox. Transient Landing Zone • Temporary store of source data • Consumers are IT and data stewards • Implemented in highly regulated industries. Raw Zone • Original source data ready for consumption • Consumers are ETL developers, data stewards and some data scientists • Single source of truth with history. Further zone descriptions • Standardized on corporate governance/quality policies; consumers are anyone with appropriate role-based access; single version of truth • Data required for LOB-specific views, transformed from existing certified data; consumers are anyone with appropriate role-based access
  9. Data Lake Reference Architecture with Zaloni. Source systems: sensors (or other time series data); relational data stores (OLTP/ODS/DW); logs (or other unstructured data); social and shared data; delivered as file data, DB data, ETL extracts and streaming. Data lake zones: Transient Landing Zone, Raw Zone, Analytic Zone, Refined Zone, Sandbox. Consumption zone: APIs serving business analysts, researchers and data scientists. Data Lake Management & Governance Platform: metadata management, data quality, data catalog, security.
  10. Bedrock Data Lifecycle Management: Policy Definition and Policy Execution. Zaloni DLM: future-proof your data lake. Flow: file data, DB data, ETL extracts and streams → INGEST → ORGANIZE → ENRICH → ENGAGE → consumption zone (APIs for business analysts, researchers and data scientists). DLM policies: Raw Data Zone: < 360 days = Warm; > 360 days = S3 Vault. Refined Data Zone: < 30 days = Hot; > 30 and < 120 days = Warm; > 120 days = S3 Vault. Analytic Data Zone: < 30 days = Hot; > 30 days = S3 Vault. Storage tiers: Hot = E-Series Flash; Warm = E-Series Disk; S3 Vault = StorageGRID Webscale.
  11. The complexities of the connected vehicle: the classic problems associated with big data, namely volume, velocity, variability and privacy! © 2016 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---
  12. The promise of a connected car's data lake: how to manage billions of unstructured records. INGEST: manage data ingestion so you know what is in your Hadoop data lake. ORGANIZE: define and capture metadata for ease of searching and browsing. ENRICH: orchestrate and manage the data preparation process. ENGAGE: self-service data preparation.
  13. The NetApp Solution for Hadoop: enterprise-grade Hadoop (consistent performance during all modes of operation). Validated, certified designs with all distributions of Hadoop • MapR • Cloudera • Hortonworks. Uses high-performance storage • Resilient, compact footprint • Data protection: DDP, RAID 5/6/10 • Less network congestion. Higher capacity and density • 480 TB in 4U • Expandable to 3.1 PB per controller • Fully serviceable storage system • No architectural limit. Reliability • 99.9999% reliability (< 35 seconds of downtime per year). Architecture: data nodes attached over 12 Gb/s SAS (4:1 ratio); 10GbE network on the data-intensive side; 1 or 10GbE management network; Resource Manager and NameNode(s); NFS Connector for Hadoop. Hadoop Analytic Platform: high-performance HDFS; heterogeneous file system; tiered HOT/WARM/COLD storage; tested, validated architecture. High-Performance Building Block: high-performance HDFS; scales to thousands of nodes; exabytes of capacity. Fully Connected Building Block: NFS-optimized for high performance; augments an existing Hadoop cluster; exabytes of capacity. © 2015 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only
  14. Fleet maintenance ▪ A large commercial hauling company in the US has over 400,000 leased vehicles ▪ The trucks are under warranty ▪ The fleet must operate at maximum efficiency to maintain profits ▪ Truck drivers behave predictably ▪ They continue to drive even when warning lights indicate problems with the vehicle: they keep on trucking ▪ Minor problems often escalate into major ones if not addressed early in the failure process ▪ Because drivers perceive the vehicle as under warranty, they keep driving as long as it runs, completing the delivery before addressing any issue. Proactive maintenance is much more cost-effective than reactive maintenance.
  15. Solution for fleet maintenance ▪ Placed cellular data telemetry devices in all leased vehicles ▪ Collected all telemetry: ▪ Vehicle speed and GPS coordinates ▪ All mechanical sensor data ▪ Driver identification ▪ Alerts the driver to a mechanical issue immediately and schedules proactive maintenance, with an appointment at the next rest stop and a predicted time out of service ▪ Minor problems do not escalate into major failures ▪ Immediate improvement in fleet uptime, with reduced warranty expense and fewer out-of-service situations ▪ Saved over $5M in the first year of operation. Maintenance and vehicle readiness are correlated.
  16. Large strip-mining operation in the Midwest ▪ The vehicles were large Caterpillar earth movers ▪ Maintenance costs ran into the millions (oils, hydraulics, engines, transmissions, etc.) ▪ Vehicles only make money when moving product ▪ Maintenance based on the Hobbs meter (hours of operation) was replaced with telemetry-based maintenance ▪ Minor issues never progressed to major downtime issues ▪ Driver behavior correlated directly with vehicle damage and wear (brakes and suspension) ▪ The maintenance cost reduction paid for the Hadoop cluster and related software within the first quarter of operation. Telemetry proved benefits beyond the vehicle.
  17. Savings extended beyond pure maintenance ▪ Vehicle load sensors transmitted load in real time to the production plant ▪ Suspension load sensors transmitted road conditions ▪ Abnormal angles were detected in real time ▪ Potholes and terrain requiring re-grading were detected before they caused excessive strain on the earth movers' suspension ▪ Before telemetry, the mine guessed where to maintain the road and often missed major issues, causing excessive suspension strain and out-of-limit failures that cost millions of dollars in downtime and repairs ▪ Driver behavior correlated directly with vehicle damage and wear (brakes and suspension) ▪ Drivers were trained to brake and accelerate more smoothly, saving millions in unneeded repairs ▪ As a side effect, telemetry produced more than $10M in cost reduction in vehicle and road maintenance, with greater fleet uptime. Route maintenance, driver behaviors and real-time product tracking.
  18. DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM; SELF-SERVICE DATA PREPARATION
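The tiering rules on slide 10 amount to an age-based lookup: each zone's DLM policy maps a data set's age in days to a storage tier. The following is a minimal Python sketch, not Zaloni Bedrock's implementation. It assumes that the policies pair with the Raw, Refined and Analytic zones in the order the slide lists them, that age is counted from the data set's creation date, and that boundary ages the slide leaves unassigned (exactly 30, 120 or 360 days) fall into the colder tier. The names `POLICIES` and `tier_for` are hypothetical.

```python
from datetime import date

# Storage tiers from slide 10: Hot = E-Series Flash, Warm = E-Series Disk,
# S3 Vault = StorageGRID Webscale.
HOT, WARM, S3_VAULT = "Hot", "Warm", "S3 Vault"

# Each policy is an ordered list of (exclusive max age in days, tier) rules,
# youngest first; a max age of None is the catch-all for older data.
# The zone-to-policy pairing is an assumption based on slide order.
POLICIES = {
    "raw": [(360, WARM), (None, S3_VAULT)],
    "refined": [(30, HOT), (120, WARM), (None, S3_VAULT)],
    "analytic": [(30, HOT), (None, S3_VAULT)],
}


def tier_for(zone: str, created: date, today: date) -> str:
    """Return the tier a data set should occupy under its zone's DLM policy."""
    age_days = (today - created).days
    for max_age, tier in POLICIES[zone]:
        # Boundary ages (exactly 30, 120 or 360 days) go to the colder tier.
        if max_age is None or age_days < max_age:
            return tier
    raise ValueError(f"policy for zone {zone!r} has no catch-all rule")
```

For example, on the webinar date (5 October 2016) a data set created on 1 July 2016 is 96 days old. That places it in Warm under the refined-zone policy, in S3 Vault under the analytic-zone policy, and in Warm under the raw-zone policy. Keeping the rules as data rather than code lets a new zone or threshold be added without touching `tier_for`.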
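The "< 35 seconds per year" figure on slide 13 follows from the six-nines reliability number: the unavailable fraction is 10^-6, and 10^-6 of a year is roughly 31.6 seconds. Here is a small Python check. It assumes a 365.25-day year, since the slide does not say which year length it uses.

```python
# Year length is an assumption; the slide does not state one.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 s


def max_downtime_seconds(availability_percent: float) -> float:
    """Maximum downtime per year implied by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for nines in (99.99, 99.999, 99.9999):
        print(f"{nines}% -> {max_downtime_seconds(nines):.1f} s/year")
```

Six nines works out to about 31.6 seconds of downtime per year, which is consistent with the slide's bound. A 365-day year gives about 31.5 seconds.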
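Slide 16 contrasts two maintenance triggers: a fixed interval on the Hobbs meter (hours of operation) and telemetry-based maintenance driven by sensor readings. The Python sketch below sets the two rules side by side. The 500-hour interval, the sensor names and their operating limits are all hypothetical placeholders, not Caterpillar specifications or values from the webinar.

```python
from dataclasses import dataclass, field

# Hypothetical service interval for hour-meter (Hobbs-style) maintenance.
SERVICE_INTERVAL_HOURS = 500.0

# Hypothetical (min, max) operating limits per sensor.
SENSOR_LIMITS = {
    "oil_temp_c": (60.0, 110.0),
    "hydraulic_pressure_bar": (150.0, 250.0),
}


@dataclass
class Reading:
    engine_hours: float
    sensors: dict = field(default_factory=dict)  # sensor name -> latest value


def due_by_hours(reading: Reading, last_service_hours: float) -> bool:
    """Hour-meter rule: service once the interval has elapsed, whatever the condition."""
    return reading.engine_hours - last_service_hours >= SERVICE_INTERVAL_HOURS


def out_of_limit_sensors(reading: Reading) -> list:
    """Telemetry rule: list every sensor whose value lies outside its limits.

    Sensors without a configured limit are never flagged.
    """
    flagged = []
    for name, value in reading.sensors.items():
        low, high = SENSOR_LIMITS.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            flagged.append(name)
    return flagged


def due_by_condition(reading: Reading) -> bool:
    return bool(out_of_limit_sensors(reading))
```

A machine 200 hours past its last service with an oil temperature of 118 °C is not yet due under the hour-meter rule but is flagged by the telemetry rule. A machine with healthy readings 500 hours after service is due under the hour-meter rule but not under the telemetry rule. That gap illustrates the change the slide describes, where catching minor issues early keeps them from becoming major downtime.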
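Slide 17 says suspension sensors reported road conditions and that abnormal angles were detected in real time, so that potholes and terrain needing re-grading could be found before they damaged the earth movers. A minimal streaming sketch in Python follows. Each reading is assumed to be a (location, angle in degrees) pair, and the 8° threshold and the three-consecutive-readings rule are hypothetical choices. Requiring several readings in a row is one way to ignore a single spike, such as a loose rock, while still flagging a sustained defect.

```python
from collections import deque

# Hypothetical defaults: the slide gives no threshold or window size.
MAX_ANGLE_DEG = 8.0
CONSECUTIVE_READINGS = 3


def flag_road_segments(readings, max_angle_deg=MAX_ANGLE_DEG, window=CONSECUTIVE_READINGS):
    """Yield the location where each run of `window` consecutive out-of-limit
    suspension angles starts.

    `readings` is an iterable of (location, angle_deg) pairs in arrival order,
    so a live stream can be consumed one reading at a time.
    """
    recent = deque(maxlen=window)
    for location, angle_deg in readings:
        recent.append((location, abs(angle_deg) > max_angle_deg))
        if len(recent) == window and all(flagged for _, flagged in recent):
            yield recent[0][0]  # start of the sustained excursion
            recent.clear()  # avoid reporting overlapping windows of the same run
```

Because the function is a generator with a bounded buffer, memory use stays constant however long the stream runs. Flagged locations could then be passed to road crews as re-grading candidates.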
