Enterprise Big Data Lake: Challenges, Strategies, Maximizing Benefits - Impetus Webinar

Impetus on-demand webcast ‘Implementing the Enterprise Big Data Lake: Challenges, Strategies, Maximizing Benefits’, available at http://lf1.me/wNb/

Published in: Technology

  1. We Implement Big Data. Webinar: Implementing the Enterprise Big Data Lake: Challenges, Strategies, Maximizing Benefits. Vineet Tyagi, CTO and Head of Labs, Impetus Technologies; Larry Pearson, Vice President, Marketing, Impetus Technologies. © 2015 Impetus Technologies. Recorded version available at http://lf1.me/wNb/
  2. Agenda: • Overview • What is a Data Lake? • Drivers for Data Lake implementation • Building a Data Lake • Challenges of Data Lake implementation • Strategies for building a Data Lake • Q&A
  3. Enterprise Data Warehouse – Current Environment: Optimize Existing DW/BI Infrastructure or Create New Capabilities • Handle Big Data and the 3 V’s: Volume, Variety, Velocity • Integrate Multiple Data Silos: ERP, CRM, HRM and others • Reduce Cost: ETL process; analytical process; mainframe process; cloud feasibility for data analytics • Applying Science: unstructured data for enhancing analytics; Data Science for advanced analytics • Reduce Time to Market by Faster Processing/Analytics
  4. What is a Data Lake? A massive, easily accessible, flexible and scalable data repository • Built on inexpensive computer hardware • Designed for storing uncategorized pools of data “as is”, including: data immediately of interest; data potentially of interest; data for which the intended usage is not yet known
  5. What Capabilities Does the Data Lake Bring? A Data Lake brings newer capabilities and insights to business users, which include but are not limited to the following: • Active Archive • Self Service Exploratory BI • Advanced Analytics at Scale (Moving from Analyst Intuition to Empirical Insights) • Lower cost of transformation
  6. Data Lake Architecture Benefits • Allows the organization to create the "Adjunct" to the EDW: offload relatively colder data and workloads; support unstructured data; create a cultural shift towards "democratizing data access" • Acquire the capability of running workloads based on cost/performance
  7. Drivers for Data Lake Architecture • Nature of the Data: how much unstructured vs. structured data do you use for insights? • Level of unification of Data: time analysis of information on source data, delta data detection • Encourage experimentation: creation of Point Solutions by Line of Business • Moving from “analyst intuition” & statistics to empirical, data-science-driven insights • Scale is driven by demand
  8. Building a Data Lake: Design Principles • Discovery without limitations • Low latency at any scale • Reactive to predictive • Affordable unlimited scale • Elasticity in infrastructure
  9. Reference Big Data Architecture: Data Lake (Big Data PaaS). Diagram labels: Data Sources (Relational Data: PostgreSQL, Oracle, DB2, SQL Server…; Flat Files/XML/JSON/CSV; Existing Systems: ARS, PLC, Cimplicity, Active Plant; Machine Data); Data Ingestion (Streaming: Kafka/Flume; Sqoop/Connectors; Existing DI Tools; REST; JDBC; SOAP; Custom); Data Processing; Data Curation; Indexing; Data Governance; Data Quality; Data Classification; Information Policy; Lifecycle Management; Query engines (Hive/Pig/Impala/Drill/Spark SQL); Data Store; Virtualization; Search; Federation; Access; Delivery; HA; Provision; Security; Monitoring; Data Servicing; Business Intelligence; Machine Data Analysis; Predictive & Statistical Analytics; Data Discovery; Visualization & Reporting
  10. Patterns for Implementing a Data Lake: A comprehensive Data Lake strategy requires effective implementation patterns to be in place for systems and processes. • Data Reservoir • Support Iterative Investigation • Drive Analytical Applications
  11. Building a Data Lake – Stages of Evolution: Staged Approach to Data Lake Roll-out • Stage 1: Handle and ingest data at scale • Stage 2: Building the analytical muscle, laying data pipelines, monitoring, supporting use cases • Stage 3: Operational impact; have EDW and Data Lake work in unison • Stage 4: Enterprise capability in the lake
  12. Stage 1: Handle and Ingest Data at Scale. The organization needs to determine the existing and new data sources that it can leverage. The data sources are integrated, and the variety of voluminous data is ingested at high velocity into Hadoop storage. Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming); Enterprise Data Lake
  13. Stage 2: Building the Analytical Muscle. Leveraging the enterprise Data Lake in Hadoop, the organization builds batch, mini-batch and real-time applications for enterprise usage, exploratory analytics and predictive use cases. Various tools and frameworks are utilized in this stage. Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming); Provisioning, Workflow, Monitoring and Security; Enterprise Data Lake; Predictive applications; Exploration & discovery; Enterprise applications; Real-Time applications
  14. Stage 3: EDW and Data Lake Work in Unison. The enterprise data warehouse (EDW) and the Hadoop-based Big Data Lake would co-exist to allow the enterprise to leverage the strengths of each architecture. Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming); Provisioning, Workflow, Monitoring and Security; Enterprise Data Lake; Predictive applications; Exploration & discovery; Enterprise applications; Real-Time applications; Traditional data repositories (RDBMS, MPP)
  15. The Data Lake DILEMMA. Effective use of the Data Lake as a true enterprise data reservoir introduces new challenges. We call these the Data Lake “DILEMMA”. Addressing these will help avoid turning the lake into a “data swamp”, which would inhibit or slow enterprise adoption. • D: Data (Ingestion and Storage; Governance; Security & Compliance) • IL: Information Lifecycle Management (Lineage) • EMM: Enterprise Metadata Management (Metadata discovery; Ontology) • A: Access (Query Performance; Search data)
  16. Stage 4: Enterprise Capability in the Lake. Broad adoption of unified Data Lake architectures will require information governance, metadata management and information lifecycle management capabilities. Diagram labels: Landing and ingestion (Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming); Provisioning, Workflow, Monitoring and Security; Enterprise Data Lake; Predictive applications; Exploration & discovery; Enterprise applications; Real-Time applications; Traditional data repositories (RDBMS, MPP); Governance, Information Lifecycle, Enterprise Metadata Management
  17. Summary • Hadoop-based Big Data architectures have changed the face of the Data Warehouse/BI/Analytics world forever. • Enterprise adoption of Big Data architectures is accelerating as a way to enable broad new opportunities across all industries. • There is a growing acceptance of the concept of a “Data Lake” as a cornerstone component of an enterprise Big Data strategy. • “Big Data Warehouse” architectures will complement rather than replace the enterprise data warehouses of today. • Models and approaches are emerging to address the enterprise-class DILEMMA related to security, governance and operations. • There are defined best practices and roadmaps for implementing an enterprise Data Lake architecture.
  18. Q&A (Use the chat/Q&A panel). For general inquiries about our services and solutions, reach us at bigdata@impetus.com. Follow us on Twitter: @impetustech
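
The deck describes a lake that stores data “as is” (slide 4) and warns that missing metadata and lineage can turn it into a “data swamp” (slide 15). The following is a minimal sketch of that idea, not anything shown in the webinar: a toy in-memory lake that lands raw bytes unchanged and registers a catalog entry with lineage at the same time. The names `DataLake`, `CatalogEntry`, `land` and `lineage`, as well as the paths and payloads, are hypothetical and chosen only for illustration; a real deployment would sit on Hadoop storage and a proper metadata service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass
class CatalogEntry:
    """Hypothetical metadata record kept alongside each landed object."""
    path: str
    source: str
    checksum: str
    landed_at: str
    derived_from: list = field(default_factory=list)  # lineage: parent paths


class DataLake:
    """Toy in-memory stand-in for a lake: raw bytes plus a metadata catalog."""

    def __init__(self):
        self.store = {}    # path -> raw bytes, stored "as is"
        self.catalog = {}  # path -> CatalogEntry

    def land(self, path, payload, source, derived_from=()):
        """Store the payload unchanged and register its metadata in one step."""
        self.store[path] = payload
        self.catalog[path] = CatalogEntry(
            path=path,
            source=source,
            checksum=hashlib.sha256(payload).hexdigest(),
            landed_at=datetime.now(timezone.utc).isoformat(),
            derived_from=list(derived_from),
        )
        return self.catalog[path]

    def lineage(self, path):
        """Return every ancestor of path by walking derived_from links."""
        seen, stack = [], list(self.catalog[path].derived_from)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.catalog[parent].derived_from)
        return seen


# Illustrative data only: two raw sources and one curated dataset built from them.
lake = DataLake()
lake.land("raw/crm/accounts.csv", b"id,name\n1,Acme\n", source="CRM")
lake.land("raw/web/clicks.json", b'{"id": 1, "page": "/home"}', source="web logs")
lake.land(
    "curated/accounts_with_clicks.csv",
    b"id,name,page\n1,Acme,/home\n",
    source="batch job",
    derived_from=["raw/crm/accounts.csv", "raw/web/clicks.json"],
)
print(lake.lineage("curated/accounts_with_clicks.csv"))
```

Registering metadata in the same call that lands the data is one way to keep every object discoverable and traceable; whether to enforce this at ingestion time or through a separate cataloging job is a design choice the slides leave open.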
