Accelerating Hadoop Solution Lifecycle and Improving ROI: Impetus On-demand Webcast

Impetus on-demand webcast ‘Accelerating Hadoop Solution Lifecycle and Improving ROI’, available at http://bit.ly/1nMw8nQ

Presentation Transcript

  • Slide 1 (© 2014 Impetus Technologies, July 25, 2014): Accelerating the Big Data Solution Lifecycle and Improving ROI
  • Slide 2: Agenda
      • Big Data Analytics: implementation patterns
      • Challenges faced
      • Jumbune, an open source lifecycle accelerator
      • Enterprise solution lifecycle
      • Ways to address the challenges
    Recorded version available at http://bit.ly/1nMw8nQ
  • Slide 3: Big Data Analytics
      • Primary drive for performing analytics
      • Rise of the enterprise data lake
      • Utilization of analytical resources
  • Slide 4: Primary Purposes of an Analytical Solution
      • Optimize the business
      • Reduce time taken by analytics
      • Result in effective analytics
      • Compete and win
  • Slide 5: Rise of the Enterprise Data Lake (Big Data)
      • Sources of data: ETL from every source (RDBMS, flat files, queues, legacy offloading, logs)
      • Arrival of data: intermittent, bulk, incremental
      • Theme: “Leave no data unused”
  • Slide 6: Utilization of Analytics Resources
      • Capitalize on all analytics resources (engines) available
      • Access data with a variety of processing engines: Storm, Spark, YARN, etc.
      • Model in data science analytical systems: R, Octave, SAS, etc.
      • Write complex logic in custom MapReduce
      • Reuse code as User Defined Functions (UDFs)
      • Create ad hoc queries using Hive and Pig
      • Customize Mahout algorithms and machine learning libraries
  • Slide 7: Enterprise Big Data Solution Trends
      • No more single-purpose Hadoop clusters
      • Enterprise Data Lake: data flowing from many sources
      • Integrated platforms using a variety of analytical engines
      • Serving multiple business applications
      • Resource sharing is a must across applications and engines
  • Slide 8: Enterprise Solution Lifecycle (high-level view)
      • Business Requirement
      • Designing / Modelling
      • Development and Testing
      • Production and Monitoring
  • Slide 9: Enterprise Solution Lifecycle (ground-level view). Diagram labels: Business User, Data Analyst, Development, Quality Test, DevOps, Data Lake, Production and Monitoring
  • Slide 10: Challenges in Enterprise Analytical Solutions
      • No common platform to detect root causes
      • Incremental imports may ingest bad data
      • Cluster resources are shared, and optimal utilization is the key
      • Implementing models in custom MR without errors is like hitting the bull’s eye
      • Bad logic or bad data
  • Slide 11: Scenario: Digitization of Newspapers for Analyzing News
      • Team: 5 Dev, 3 QA, 2 DevOps
      • Simple problem: ‘q’ was misread by OCR as 9
      • Time: a single code fault on a TB of data can consume 24 work hours in total for 2 developers + 1 QA
      • Cost: additional engineering hours + the cost of unproductive cloud instances, storage and resources
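
The OCR fault in this scenario is the kind of problem a cheap pre-flight check on a data sample can surface before a full cluster run. What follows is a minimal sketch, not Jumbune's implementation: the rule (a token that mixes letters with the digit 9 is suspect), the `SUSPECT` pattern, and the `find_suspect_tokens` helper are all assumptions made for illustration.

```python
import re

# Hypothetical rule: a token containing at least one letter AND the digit 9
# is a suspected OCR misread of the letter 'q'. Pure numbers are left alone.
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])\w*9\w*\b")

def find_suspect_tokens(lines):
    """Yield (line_number, token) for every suspicious token in the input."""
    for line_no, line in enumerate(lines, start=1):
        for match in SUSPECT.finditer(line):
            yield line_no, match.group()

# Made-up sample text, not taken from the webcast.
sample = [
    "The 9uick brown fox",
    "Published in 1999 by the e9uity desk",
    "No faults on this line",
]
violations = list(find_suspect_tokens(sample))
for line_no, token in violations:
    print(f"line {line_no}: suspect token {token!r}")
```

Running a check like this on a small extract takes seconds, whereas the slide puts the cost of discovering the same fault after a terabyte-scale job at 24 work hours.
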
  • Slide 12: Scenario: Hive Queries Interpreted as MapReduce Executions on a Hadoop Cluster
      • Team: 2 Dev, 1 QA, 1 DevOps
      • Simple problem: data imbalance across the cluster, causing low Hive query performance
      • Time: the development team was refactoring Hive queries to improve performance
      • Cost: additional engineering hours + the cost of unproductive cloud instances, storage and resources
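
Data imbalance of the kind described in this scenario can be estimated from a sample of keys before anyone starts refactoring queries. The sketch below is an illustration under stated assumptions: `partition_loads` and `skew_ratio` are hypothetical helpers, and CRC32 stands in for whatever partitioning function the real job uses.

```python
import zlib
from collections import Counter

def partition_loads(keys, num_partitions):
    """Count records per partition using a stable hash of each key.

    CRC32 is an assumption for illustration; a real job would use
    its own partitioner.
    """
    loads = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        loads[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return loads

def skew_ratio(loads):
    """Largest partition size divided by the mean size (1.0 means balanced)."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean if mean else 0.0

# Made-up sample: one hot key dominates the data set.
keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
loads = partition_loads(keys, num_partitions=4)
print(dict(loads), round(skew_ratio(loads), 2))
```

A ratio well above 1.0 suggests that one task will run far longer than the rest, which points at the data distribution rather than the query text as the cause of the slowdown.
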
  • Slide 13: Impact on ROI
      • Delayed analytics: defeats one of the prime purposes of analytics
      • Increase in costs: defeats the purpose of business cost optimization
      • Productivity loss: iterations reduce the productivity of dependent teams in the cycle
  • Slide 14: Current Iterative Development Approach (Local Debug / Unit Tests, HDFS Data Check, Performance)
      • Localized subset of data
      • Non-parallel execution
      • Practically unfeasible
      • Error prone
      • Difficult to find bad code
      • Difficult to collaborate across environments
  • Slide 15: A Complete Enterprise Platform. Diagram labels: Data Lake, Enterprise Engines, Solutions, Governance, Security; Validate, Profile, Debug and Monitor
  • Slide 16: Introducing Jumbune: An Open Source Solution
      • “A catalyst to accelerate realization of Big Data Analytics solutions”
      • Components: Flow Analyzer, Data Validation, Cluster Monitor, Job Profiler
  • Slide 21: Full Lifecycle Support - Jumbune. Diagram labels: Development, Quality, DevOps, Data Ingestion
  • Slide 22: Jumbune - Key Features
      • In-depth, code-level analysis of cluster-wide flow
      • Record and field level data violation reports
      • No deployment on worker nodes: ultra-light agent installation on the gateway node
      • Ability to turn cluster monitoring on/off at will, which reduces resource load
      • Customizable rack-aware monitoring
      • Correlated profiling analysis of phases, throughput and resource consumption
      • Ability to work with all Hadoop distributions
      • Upcoming support for YARN, Spark, Mesos
      • Available as open source
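
To make "record and field level data violation reports" concrete, here is a minimal sketch of what such a report could compute. It does not show Jumbune's actual interface or output format: the `RULES` table, the three-field comma-delimited layout, and the `validate` function are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical per-field rules: field index -> (field name, pattern the
# whole value must match). An empty or missing value fails every rule.
RULES = {
    0: ("id", re.compile(r"\d+")),
    1: ("email", re.compile(r"[^@\s]+@[^@\s]+")),
    2: ("amount", re.compile(r"-?\d+(\.\d+)?")),
}

def validate(records, delimiter=","):
    """Return ({field name: violation count}, [numbers of bad records])."""
    field_counts = defaultdict(int)
    bad_records = []
    for rec_no, line in enumerate(records, start=1):
        fields = line.split(delimiter)
        failed = False
        for idx, (name, pattern) in RULES.items():
            value = fields[idx].strip() if idx < len(fields) else ""
            if not pattern.fullmatch(value):
                field_counts[name] += 1
                failed = True
        if failed:
            bad_records.append(rec_no)
    return dict(field_counts), bad_records

# Made-up records: the second has a bad id, the third lacks an email and has
# a non-numeric amount, the fourth is missing the amount field entirely.
records = ["1,a@example.com,10.5", "x,b@example.com,7", "3,,abc", "4,c@example.com"]
field_counts, bad_records = validate(records)
print(field_counts, bad_records)
```

The field-level counts show which column is systematically broken, while the record numbers show where to look; both views are useful when an incremental import may have ingested bad data.
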
  • Slide 23: For general inquiries about other Impetus solutions and services, reach us at bigdata@impetus.com
  • Slide 24: Thank You!
      • Website: http://jumbune.org
      • Contribute: http://github.com/impetus-opensource/jumbune, http://jumbune.org/jira/JUM
      • Social: follow @jumbune, use #jumbune; Jumbune Group: http://linkd.in/1mUmcYm
      • Forums: Users: users-subscribe@collaborate.jumbune.org; Dev: dev-subscribe@collaborate.jumbune.org; Issues: issues-subscribe@collaborate.jumbune.org
      • Downloads: http://jumbune.org, https://bintray.com/jumbune/downloads/jumbune