Creating a Modern Data Architecture

Ben Sharma's presentation at Strata + Hadoop New York 2016.

Published in: Technology

  1. Creating a modern data architecture. September 28, 2016. Ben Sharma | CEO | ben@zaloni.com
  2. •  Award-winning provider of enterprise data lake management solutions: integrated data lake management platform; self-service data preparation
     •  Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training
     •  Data Science Professional Services
  3. Data lakes are central to the modern data architecture: Increased Agility, New Insights, Improved Scalability. (Zaloni Proprietary)
  4. The data lake promise
     •  Store all types of data in its raw format
     •  Create Refined, Standardized, Trusted datasets for various use cases
     •  Store data for longer periods of time to enable historical analysis
     •  Query and access the data using a variety of methods
     •  Manage streaming and batch data in a converged platform
     •  Provide shorter time-to-insight with proper data management and governance
  5. Data architecture modernization: Traditional vs. Modern (diagram). Labels: Data Lake, Sources, ETL, EDW, Derived (Transformed), Discovery Sandbox, EDW, Streaming, Unstructured Data, Various Sources, Data Discovery, Analytics, BI, Data Science, Data Discovery, Analytics, BI
  6. Data lake – the challenges and the solution
     •  Managing: Ingestion, Lack of Visibility, Privacy and Compliance
     •  Delivering: Quality Issues, Reliance on IT, Reusability
     •  Building: Rate of Change, Skills Gap, Complexity
  7. Data Lake Reference Architecture
     Sources: Sensors (or other time series data); Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Social and shared data
     Data Lake zones:
     •  Transient Landing Zone: Temporary store of source data. Consumers are IT, Data Stewards. Implemented in highly regulated industries.
     •  Raw Zone: Original source data ready for consumption. Consumers are ETL developers, data stewards, some data scientists. Single source of truth with history.
     •  Trusted Zone: Standardized on corporate governance/quality policies. Consumers are anyone with appropriate role-based access. Single version of truth.
     •  Refined Zone: Data required for LOB specific views, transformed from existing certified data. Consumers are anyone with appropriate role-based access.
     •  Sandbox: Data required for LOB specific views, transformed from existing certified data. Consumers are anyone with appropriate role-based access.
  8. Transient Landing Zone
     •  Temporarily stores source data
     •  Limited access
     •  Consumers are IT, Data Stewards
     •  Implemented in highly regulated industries
     •  This zone collapses with Raw Zone if security is not needed
     Inputs:
     •  Sources: RDBMS, File, Streaming, Structured/Unstructured, External Data
     Processes:
     •  Data transfer and intake: managed and scheduled
     •  Discover metadata
     •  Register in the catalog
     •  Apply zone specific policies
     •  Capture operational metrics and monitoring
     •  Post-ingestion validations and clean up
     Outputs:
     •  Data transfer to Raw Zone
     Policies:
     •  Data privacy – tokenization, masking
     •  Data security – user access
     •  Data quality – profiling, entity level checks
     •  Data lifecycle management – short lived, temporary
  9. Raw Zone
     •  Original source data
     •  Ready for consumption
     •  Treated for basic validation and privacy
     •  Metadata available to everyone but data access limited based on role
     •  Consumers are ETL developers, data stewards, some data scientists
     •  Single source of truth with history
     Inputs:
     •  Output from TLZ (HCatalog entries)
     •  Source inputs if TLZ is skipped
     Policies:
     •  Data privacy – tokenization, masking
     •  Data security – user access
     •  Data quality – profiling, entity level, field level
     •  Data transformations required for Single View of Truth
        - Combine attributes to one entity
        - Change data formats (e.g. VSAM binary of JSON)
        - Derived columns
        - Field mappings
        - Drop columns
     •  Data lifecycle management
        - Could be a candidate for S3 or Object store
     Outputs:
     •  Data transfer to Trusted Zone
     •  Data transfer to Sandbox
     Processes:
     •  Register and update catalog
     •  Apply zone specific policies
     •  Operational metrics and monitoring
  10. Trusted Zone
      •  Standardized on corporate governance/quality policies
      •  Consumers are anyone with appropriate role-based access
      •  Metadata catalog available to all
      •  Single version of truth
      Inputs:
      •  Output from Raw Zone
      Processes:
      •  Register and update catalog
      •  Apply zone specific policies
      •  Data transformation required for refined use cases for LOB such as
         - Customer360 view
         - Periodic snapshots of revenue
      Outputs:
      •  Data transfer to Refined
      Policies:
      •  Data security – user access
      •  Data lifecycle management
         - Lifetime of use case
         - Use case specific
  11. Refined Zone
      •  Data required for LOB specific views, transformed from existing certified data
      •  Consumers are anyone with appropriate role-based access
      •  Metadata catalog available to all
      Inputs:
      •  Output from Trusted Zone
      •  Output from Raw Zone for LOB-specific use cases
      Processes:
      •  LOB specific transformations
         - Aggregates
         - De-normalized
      •  Apply zone specific policies
      •  Model building for reports
      •  Optionally a cube generation
      Outputs:
      •  Transformed data can be saved back to Refined Zone
      •  Applications such as BI tools
      •  Transferred to sandbox if required
      Policies:
      •  Data security – user access
      •  Data lifecycle management
         - Lifetime of use case
         - Use case specific
      •  De-tokenization if required (based on access)
  12. Sandbox
      •  Data required for LOB specific views, transformed from existing certified data
      •  Consumers are anyone with appropriate role-based access
      •  Metadata catalog available to all
      Inputs:
      •  Output from Raw, Trusted and Refined Zones
      •  Self-service ingestion
      Processes:
      •  Data scientists drive analysis
      •  Self-service for ad-hoc
      Outputs:
      •  Models that can later be operationalized
      •  Optionally, results/data can be sent back to the Raw Zone
      Policies:
      •  Data security – user access
      •  Data lifecycle management
         - Lifetime of use case
         - Use case specific
  13. Data lake Reference Architecture with Zaloni (diagram)
      •  Source System: File Data, DB Data, ETL Extracts, Streaming; Sensors (or other time series data); Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Social and shared data
      •  Data Lake: Transient Landing Zone, Raw Zone, Refined Zone, Trusted Zone, Sandbox
      •  Data Lake Management & Governance Platform: APIs, Metadata Management, Data Quality, Data Catalog, Security
      •  Consumption Zone: Business Analysts, Researchers, Data Scientists
  14. Data Lake 360°: A holistic approach to actionable big data
      Steps: 1. Enable the lake; 2. Govern the data; 3. Engage the business
      •  Foster a data-driven business through self-service data discovery and preparation
      •  Safeguard sensitive data and enable regulatory compliance
      •  Improve data visibility, reliability and quality to reduce time-to-insight
      •  Leverage the full power of a scale-out architecture with an actionable, scalable data lake
  15. Enable the lake
      •  Managed Ingestion
         - Ability to ingest vast amounts of data
         - Ability to handle a wide variety of formats (streaming, files, custom) and sources
         - Build in repeatability through automation to pick up incoming data and apply pre-defined processing
      •  Metadata Management
         - Capture and manage operational, technical and business metadata
         - Provides visibility and reliability, key to finding data in the lake
         - Reduced time to insight for analytics
         - File and record level watermarking provides data lineage, enables audit and traceability
  16. Govern the data
      •  Data Lineage
         - See how data moves and how it is consumed in the data lake.
         - Safeguard data and reduce risk, always knowing where data has come from, where it is, and how it is being used.
      •  Data Quality
         - Rules-based data validation
         - Integration with the Managed Data Pipeline
         - Stats and metrics for reporting and actions
  17. Govern the data
      •  Data Security and Privacy
         - Differing permissions require enhanced data security
         - Mask or tokenize data before it is published in the lake for consumption
         - Policy-based security
      •  Data lifecycle management across tiered storage environments
         - Hot -> Warm -> Cold on an entity level based on policies/SLAs
         - Across on-premise and cloud environments
         - Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
  18. Engage the business
      •  Data Catalog
         - See what data is available across your enterprise
         - Contribute valuable business information to improve search and usage
         - Use a shopping cart experience to create a sandbox for ad-hoc and exploratory analytics
      •  Self-service Data Preparation
         - Blend data in the lake without a costly IT project
         - Perform interactive data-driven transformations
         - Collaborate and share data assets and transformations with peers
  19. Emergence of cloud-based and hybrid data lakes
      •  Rapid increase of Data Lake platforms in the Cloud
      •  Hybrid cloud and multi-cloud considerations
      •  Support sensitive data on premise and external data in the cloud (e.g. client data, machine-generated)
      •  Key challenges:
         - Leverage Cloud native features
         - Consistent Data Management and Governance
      Graphic labels: Governance, Visibility
  20. Considerations for data lake in the cloud (cloud and hybrid environments)
      •  How do you create a cloud agnostic data lake platform?
      •  How do you deploy a cost-effective compute layer?
         - Elastic compute layer
         - Batch and near real-time
      •  How do you optimize storage?
         - Support polyglot persistence
         - DLM
      •  How do you optimize network connectivity from ground to cloud?
      •  How do you meet enterprise security requirements?
  21. Cloud Data Lake Maturity model
      •  Lift and Shift: Replicate on-premise Data Lake in the cloud
      •  Cloud Native features: Leverage Object stores, Transient compute platforms, Messaging systems
      •  Multi and Hybrid Cloud: Abstraction over multiple clouds, consistent Data Management and Governance
  22. Building your blueprint
      1. Questions: Business Drivers and Business Questions, e.g. Where is fraud occurring? How do I optimize inventory?
      2. Inputs: Data, Use Cases, Platform, Subject Areas, Source System, Capabilities, Process (Ingest, Organize, Enrich, Explore)
      3. Outcomes: Roadmap, Managed Data Lake, Analytics Strategy
  23. Typical data lake implementation timeline
      Timeline labels: POC (weeks); Data Lake Platform (weeks); Production: Implement Business Use Case (varies by use case)
      •  Proof of Concept:
         - Demonstrate technical capabilities of the platform in the context of selected use cases
      •  Data Lake Implementation:
         - Planning, Installation, Training
         - Sample data sets ingested
         - Pilot use cases created
      •  Business Use Case Delivered:
         - Engage business stakeholders to identify production use cases at scale
         - Review learnings and optimize the data lake
  24. DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM | SELF-SERVICE DATA PREPARATION
  25. Visit Booth #644 for these giveaways!
      •  FREE T-SHIRT!
      •  Demo and FREE copy of book “Architecting Data Lakes”
      Speaking Sessions:
      •  Building a Modern Data Architecture. Ben Sharma, CEO and Founder, Zaloni. Wednesday, 2:05 p.m. – 1 E 09
      •  Cloud Computing and Big Data. Ben Sharma, CEO and Founder, Zaloni. Tuesday, 9:30 a.m. – 1B 01/02
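
The zone-by-zone inputs and outputs on slides 8 to 12, together with the "tokenization, masking" privacy policy, can be sketched in a few lines of Python. This is only an illustrative reading of the slides, not Zaloni's implementation: the names `Zone`, `ALLOWED_TRANSFERS`, `can_transfer`, `mask`, and `tokenize` are hypothetical, the transfer table is inferred from the "Inputs" and "Outputs" bullets, and the salted SHA-256 token is an assumed stand-in for whatever tokenization scheme a real platform would use.

```python
import hashlib
from enum import Enum


class Zone(Enum):
    """The five zones named in the reference architecture (names are illustrative)."""
    TRANSIENT_LANDING = "transient_landing"
    RAW = "raw"
    TRUSTED = "trusted"
    REFINED = "refined"
    SANDBOX = "sandbox"


# Assumed reading of the Inputs/Outputs bullets on slides 8-12:
# which zones a given zone may hand data to.
ALLOWED_TRANSFERS = {
    Zone.TRANSIENT_LANDING: {Zone.RAW},
    Zone.RAW: {Zone.TRUSTED, Zone.REFINED, Zone.SANDBOX},
    Zone.TRUSTED: {Zone.REFINED, Zone.SANDBOX},
    Zone.REFINED: {Zone.REFINED, Zone.SANDBOX},  # "saved back to Refined Zone"
    Zone.SANDBOX: {Zone.RAW},                    # "optionally ... sent back to the Raw Zone"
}


def can_transfer(src: Zone, dst: Zone) -> bool:
    """Return True if the sketch's transfer table allows moving data from src to dst."""
    return dst in ALLOWED_TRANSFERS[src]


def mask(value: str, visible: int = 4) -> str:
    """Masking: hide all but the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def tokenize(value: str, salt: str) -> str:
    """Tokenization stand-in (assumption): a deterministic salted SHA-256 digest,
    truncated to 16 hex characters. Unlike a token vault, it is not reversible."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


if __name__ == "__main__":
    print(can_transfer(Zone.TRANSIENT_LANDING, Zone.RAW))
    print(can_transfer(Zone.TRANSIENT_LANDING, Zone.TRUSTED))
    print(mask("4111111111111111"))
    print(tokenize("4111111111111111", salt="demo-salt"))
```

Keeping the allowed transfers in a single table makes the zone policy easy to audit and change. The Refined Zone slide mentions de-tokenization based on access, which a one-way hash cannot support, so a real deployment would need a reversible token store instead.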