Swimming Across the Data Lake, Lessons learned and keys to success


Published in: Technology

  1. Swimming Across the Data Lake: Lessons Learned and Keys to Success. Impetus Technologies Inc. (© 2016 Impetus Technologies - Confidential)
  2. Our 40 Minutes Today
     • Critical Trends
       – Lessons Learned
     • Solving the Big Data “DILEMMA”
       – Data Democratization
       – Enterprise Metadata Management
       – Data Access
       – Self Service BI
     • Migrating Workloads
       – Lift and Shift Automation
  3. State of Play – Big Data: Hadoop in Gartner’s Hype Cycle (Year 2013, Year 2014). Source credit: http://www.cioinsight.com/it-strategy/big-data/slideshows/big-problems-for-big-data
  4. State of Play – Big Data – 2015. 2015: Gartner moves Big Data out of the Hype Cycle; it is REAL. (Chart labels: Year 2013, Hadoop.)
  5. “Through 2018, 90% of Hadoop installations will be useless as they are overwhelmed with information assets captured for uncertain use cases.” “Enterprises today are realizing about 15% of potential ROI on BI investments.”
  6. Blueprint for a Modern Data Architecture
     • Landing and ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
     • Provisioning, Workflow, Monitoring and Security
     • Enterprise Data Lake
     • Applications: Predictive applications, Exploration & discovery, Enterprise applications, Real-time applications
     • Traditional data repositories: RDBMS, MPP
     • Governance, Information Lifecycle, Enterprise Meta Data Management
  7. Don’t throw your users into the Data Lake! Creating a data lake is only the beginning…
  8. The Data Lake “DILEMMA”
     • D – Data: Ingestion and Storage; Governance; Security & Compliance
     • IL – Information Lifecycle Management: Lineage
     • EMM – Enterprise Metadata Management: Meta data discovery; Ontology
     • A – Access: Query Performance; Search data
     Effective use of the Data Lake as a true enterprise data reservoir introduces new challenges. We call these the Data Lake “DILEMMA”. Addressing them helps avoid turning the lake into a “data swamp”, which would inhibit or slow enterprise adoption.
  9. Critical Trends – Planning for Success. “Through 2018, 90% of deployed data lakes will be useless as they are overwhelmed with information assets captured for uncertain use cases.”
  10. Critical Trends – Planning for Success: making insights and data in the lake readily discoverable, accessible and usable. “Visual data-discovery, an important enabler of end user self-service, will grow 2.5x faster than the rest of the market, becoming by 2018 a requirement for all enterprises.”
  11. The Path to Democracy: Data Discovery. Don’t tell them what they need. Help your stakeholders find what works best for them.
  12. The Path to Democracy: Data Discovery
     • Identification of unknown data
     • Consolidation of enterprise data dictionary
     • Metadata capture throughout the data lifecycle
     • Search tools to help users find what they need
     • Tools to browse/sample available data sets
     • Collaboration tools for users to share their data insights
  13. The Path to Democracy: Data Accessibility. Lower adoption barriers for your stakeholders. Getting the data they want should be fast and easy.
  14. The Path to Democracy: Data Accessibility
     • Easy-to-use access request mechanisms
     • Monitored access approval workflows
     • Business rules for decisioning automation
     • Fast data provisioning, integrated with approval workflow
     • Automated data provisioning mechanisms
  15. The Path to Democracy: Data Usability. Get the most out of your data lake. Make it simple to use.
  16. The Path to Democracy: Data Usability
     • Virtualize data views to hide storage platform complexity
     • Provide business-user-friendly façades to technical tools
     • Help users visualize data
     • Link data views to business entities
     • Help users access the data lake through familiar paradigms
     • Wrap analytics algorithms into easy-to-use tools
  17. Metadata Management – Unified Single View (Metadata Repository)
     • Automatic schema ingestion from heterogeneous data sources
     • Data dictionary and catalog entity tagging
     • Search for anything in the data catalog (e.g., free text search)
     • Incremental update and automatic metadata synchronization
     • Define and manage each data catalog entity’s lifecycle
     • Apply machine learning algorithms for attribute identification, mappings, and entity consolidation
     • Domain experts analyse, approve and customize the results, and provide suggestions to the model
     • Domain-based dictionary for attribute matching, classification and building composite
  18. Access
     • SQL: Big Data specific query & reporting
     • OLAP: cross-dimensional fast slice, dice and drill down
     • Data Virtualization: data from MPP, relational and Hadoop
     • Search: finding the “needle in a haystack”
     • Self Service Data Discovery: “Don’t know what you don’t know”
  19. Critical Trends – Planning for Success: Self Service BI over Data Lake. “By 2017, most business users and analysts in organizations will have access to self-service tools to prepare data for analysis.” “Managed BI Self-Service Will Continue to Close the Business and Technology Gap.”
  20. Steps to Effective Self Service BI
     • Provision Cluster
     • Discover and Blend New Sources
     • Data Access and Exploration
     • Ingest and Transform Data
     • Security and Governance
     • BI, Analytics and Models
  21. Blueprint for a Modern Data Architecture
     • Landing and Ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
     • Provisioning, Workflow, Monitoring and Governance
     • Enterprise Data Lake
     • Data Federation/Virtualization; Exploration & Discovery; Data Wrangling; Real-time Applications
     • Traditional Data Repositories: RDBMS, MPP
     • Enterprise Meta Data Management
     • Accelerators
  22. Why are customers Migrating Workloads to Hadoop?
     • Free up capacity to contain costs
       – Immediate ROI on Hadoop
       – Contain expenditure on relational warehouse
     • Create a multi-platform data warehouse environment
       – One of the strongest tactics in data architecture today
       – Create an “Adjunct” to the relational warehouse platform
     • Get a platform better suited to advanced analytics
     • Set up for success
  23. Manual migrations often come with million-dollar price tags, and years of business logic must be recoded, debugged, and vetted in Hadoop (RISK).
  24. Challenges to Migrate Workloads to Hadoop
     • Extremely Complicated Process
       – Manual identification of workloads to migrate
         • What tables, data, queries, dependent queries
       – Figuring out the data model on Hadoop
         • Where to store on Hadoop?
         • Hadoop best practices are not known/missing
       – Manual migration
         • Time-consuming and risky
         • Offload/migration validation/QA problem
     • Technology Readiness
       – ANSI standard SQL and other complex relational technologies are not fully supported on Hadoop
       – Support for DW-specific keywords and data types
  25. “4 click” Paradigm
     • Connect to supported Data Warehouses: Teradata, SQL Server, SAP Hana, Oracle, Netezza, DB2, and more on roadmap
     • Full Intelligent Assessment & Identification of “Offload-able” Entities
       – Analyze and recommend offloadable queries and tables
       – Recommend query engine to meet SLAs
       – Recommend data store
     • Tool Sets for Automated Migration
       – Data Migration
         • Recommendation for data partitioning, clustering and buckets
         • Migrate role-based security
         • Data validation
       – Workload Transformation
         • Impetus UDF Library to support source-specific keywords
         • Automatic conversion of SQL and PL/SQL scripts
       – Workload Execution
         • Support for multiple query engines – HiveQL and SparkSQL
         • Schedule execution for migrated code
     • Validate Migrated Workloads
       – Establish functional equivalence
       – Meet or exceed SLAs
         • Support for HiveQL and SparkSQL
         • Support Hive on Tez and Hive on Spark engines
         • Built-in recommendation for partitioning, clustering and number of buckets based on dataset
         • Optimized parallelism (number of mappers) based on data source size
         • Scale out on commodity hardware
  26. Impetus Enabling the Modern Analytical Platform
     • Landing and Ingestion: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
     • Provisioning, Workflow, Monitoring and Governance
     • Enterprise Data Lake
     • Data Federation/Virtualization; Exploration & Discovery; Data Wrangling; Real-time applications
     • Traditional Data Repositories: RDBMS, MPP
     • Enterprise Meta Data Management
     • Accelerators
     • Offerings shown: KYVOS INSIGHTS, DATA BLENDING, WORKLOAD MIGRATION, METADATA & DISCOVERY, DATA GOVERNANCE (for Hadoop), DATA ACCESS, STREAM ANALYTIX
  27. Thank you. Questions?
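
Slide 25 lists “Data Validation” and “Establish functional equivalence” as steps in validating a migrated workload. The deck does not say how the Impetus tooling performs this, so what follows is only a minimal sketch of one common approach: compare the row count and an order-independent fingerprint of the result sets returned by the source warehouse and by Hadoop. The function names and sample rows are hypothetical and not part of any Impetus product.

```python
import hashlib
from collections import Counter
from typing import Iterable, Sequence, Tuple


def row_fingerprint(row: Sequence[object]) -> str:
    """Hash one row; values are rendered with repr() so 1 and '1' differ."""
    payload = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def result_signature(rows: Iterable[Sequence[object]]) -> Tuple[int, Counter]:
    """Return (row count, multiset of row fingerprints), ignoring row order."""
    fingerprints = Counter(row_fingerprint(row) for row in rows)
    return sum(fingerprints.values()), fingerprints


def functionally_equivalent(source_rows, target_rows) -> bool:
    """True when both result sets hold the same rows with the same multiplicity."""
    return result_signature(source_rows) == result_signature(target_rows)


# Hypothetical results of the same query on both platforms, in different order.
warehouse_rows = [(1, "EMEA", 120.5), (2, "APAC", 88.0), (2, "APAC", 88.0)]
hadoop_rows = [(2, "APAC", 88.0), (1, "EMEA", 120.5), (2, "APAC", 88.0)]

print(functionally_equivalent(warehouse_rows, hadoop_rows))
```

A real comparison would also need to normalize engine-specific differences (floating-point rounding, NULL handling, timestamp formats) before fingerprinting; this sketch assumes the values already match exactly.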

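Slide 17 describes a metadata repository with catalog entity tagging and free text search. As a rough illustration of that idea only, the sketch below keeps catalog entries in memory, lets users tag them, and returns entries whose name, source, columns, or tags contain every search term. The class names, fields, and sample entries are assumptions made for this example; the deck does not describe the actual repository design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    name: str                 # data set name, e.g. a table
    source: str               # originating system
    columns: List[str]
    tags: Set[str] = field(default_factory=set)


class MetadataCatalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Add or overwrite an entry, keyed by its name."""
        self._entries[entry.name] = entry

    def tag(self, name: str, *tags: str) -> None:
        """Attach lower-cased tags to an existing entry."""
        self._entries[name].tags.update(t.lower() for t in tags)

    def search(self, text: str) -> List[str]:
        """Sorted names of entries whose metadata contains every search term."""
        terms = text.lower().split()
        hits = []
        for entry in self._entries.values():
            haystack = " ".join(
                [entry.name, entry.source, *entry.columns, *sorted(entry.tags)]
            ).lower()
            if all(term in haystack for term in terms):
                hits.append(entry.name)
        return sorted(hits)


# Hypothetical catalog content.
catalog = MetadataCatalog()
catalog.register(CatalogEntry("orders", "teradata", ["order_id", "customer_id", "amount"]))
catalog.register(CatalogEntry("clickstream", "hdfs", ["session_id", "customer_id", "url"]))
catalog.tag("orders", "Finance", "PII")

print(catalog.search("customer_id"))
```

Substring matching keeps the sketch short; a production catalog would more likely use a search index and would persist entries and their lifecycle state.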