Accelerating Hadoop Solution Lifecycle and Improving ROI - Impetus On-demand Webcast


Impetus on-demand webcast ‘Accelerating Hadoop Solution Lifecycle and Improving ROI’ available at

Published in: Technology

  1. © 2014 Impetus Technologies. July 25, 2014. Accelerating the Big Data Solution Lifecycle and Improving ROI
  2. Agenda
      • Big Data Analytics: Implementation patterns
      • Challenges faced
      • Jumbune, an open source lifecycle accelerator
      • Enterprise solution lifecycle
      • Ways to address the challenges
      Recorded version available at
  3. Big Data Analytics
      • Primary drive for performing analytics
      • Rise of the enterprise data lake
      • Utilization of analytical resources
  4. Primary Purposes of an Analytical Solution
      • Optimize the business
      • Reduce time taken by analytics
      • Result in effective analytics
      • Compete and win
  5. Rise of the Enterprise Data Lake
      • Sources of Data: ETL from every source - RDBMS, flat files, queues, legacy offloading, logs
      • Arrival of Data: Intermittent, bulk, incremental
      • Theme: “Leave no Data unused”
  6. Utilization of Analytics Resources
      • Capitalize on all analytics resources (engines) available
      • Access data with a variety of processing engines: Storm, Spark, YARN, etc.
      • Model in data science analytical systems: R, Octave, SAS, etc.
      • Write complex logic in custom MapReduce
      • Reuse code as User Defined Functions (UDFs)
      • Create ad hoc queries using Hive and Pig
      • Customize Mahout algorithms and machine learning libraries
  7. Enterprise Big Data Solution Trends
      • No more single-purpose Hadoop clusters
      • Enterprise Data Lake: Data flowing from many sources
      • Integrated platforms using a variety of analytical engines
      • Serving multiple business applications
      • Resource sharing is a must across applications and engines
  8. Enterprise Solution Lifecycle (High level view)
      • Business Requirement
      • Designing / Modelling
      • Development and Testing
      • Production and Monitoring
  9. Enterprise Solution Lifecycle (Ground level view)
      Diagram labels: Business User, Data Analyst, Development, Quality Test, DevOps, Data Lake, Production and Monitoring
  10. Challenges in Enterprise Analytical Solutions
      • No common platform to detect root causes
      • Incremental imports may ingest bad data
      • Cluster resources are shared and optimal utilization is the key
      • Implementing models in custom MapReduce without errors is like hitting the bull’s eye
      • Bad logic or bad data
  11. Scenario: Digitization of Newspaper for Analyzing News
      • Team: 5 Dev, 3 QA, 2 DevOps
      • Simple Problem: ‘q’ was misread by OCR as 9
      • TIME: A single code fault on TB-scale data can consume 24 work hours in total for 2 Developers + 1 QA
      • COST: Additional hours by engineers + the cost of unproductive cloud instances, storage and resources
  12. Scenario: Hive Queries Interpreted as MapReduce Executions on a Hadoop Cluster
      • Team: 2 Dev, 1 QA, 1 DevOps
      • Simple Problem: Data imbalance across the cluster, low performance of Hive queries
      • TIME: The development team was refactoring Hive queries to improve performance
      • COST: Additional hours by engineers + the cost of unproductive cloud instances, storage and resources
  13. Impact on ROI
      • Delayed Analytics: Defeats one of the prime purposes of analytics
      • Increase in Costs: Defeats the purpose of business cost optimization
      • Productivity Loss: Iterations reduce the productivity of dependent teams in the cycle
  14. Current Iterative Development Approach
      Stages: Local Debug / Unit Tests, HDFS Data Check, Performance
      • Localized subset of data
      • Non-parallel execution
      • Practically unfeasible
      • Error prone
      • Difficult to find bad code
      • Difficult to collaborate across environments
  15. A Complete Enterprise Platform
      Diagram labels: Data Lake, Enterprise Engines, Solutions, Governance, Security; Validate, Profile, Debug and Monitor
  16. Introducing Jumbune: An Open Source Solution
      “A catalyst to accelerate realization of Big Data Analytics solutions”
      Modules shown: Flow Analyzer, Data Validation, Cluster Monitor, Job Profiler
  21. Full Lifecycle Support - Jumbune
      Diagram labels: Development, Quality, DevOps, Data Ingestion
  22. Jumbune - Key Features
      • In-depth code level analysis of cluster wide flow
      • Record and field level data violation reports
      • No deployment on worker nodes: ultra-light agent installation on the gateway node
      • Ability to turn cluster monitoring on or off at will, which reduces resource load
      • Customizable rack aware monitoring
      • Correlated profiling analysis of phases, throughput and resource consumption
      • Ability to work with all Hadoop distributions
      • Upcoming support for YARN, Spark, Mesos
      • Available as Open Source
  23. For general inquiries about other Impetus solutions and services, reach us at
  24. Thank You!
      • Website
      • Contribute
      • Social: Follow @jumbune, use #jumbune, Jumbune Group
      • Forums: Users, Dev, Issues
      • Downloads
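
The newspaper scenario on slide 11 (an OCR engine misreading ‘q’ as 9) is the kind of fault that a record and field level data violation report, as listed on slide 22, is meant to surface before a long job runs. The following is a minimal Python sketch of that idea only. It is not Jumbune's API: the names `Violation`, `looks_misread` and `find_violations`, the tab-separated record layout and the sample data are all assumptions made for illustration, and the heuristic (a token that becomes purely alphabetic once every 9 is read as q) will produce false positives on real text.

```python
from typing import Iterable, List, NamedTuple


class Violation(NamedTuple):
    record: int  # 1-based number of the offending record (line)
    field: int   # 0-based index of the offending field within the record
    token: str   # the suspicious token as it appears in the data


def looks_misread(token: str) -> bool:
    """Heuristic: the token mixes letters with a 9 and would be a plain
    alphabetic word if every 9 were read as q (hypothetical rule)."""
    word = token.strip(".,;:!?\"'()")
    has_letter = any(ch.isalpha() for ch in word)
    return "9" in word and has_letter and word.replace("9", "q").isalpha()


def find_violations(lines: Iterable[str], sep: str = "\t") -> List[Violation]:
    """Scan delimited records and report each suspicious token together
    with its record number and field index."""
    violations = []
    for record_no, line in enumerate(lines, start=1):
        for field_no, field in enumerate(line.rstrip("\n").split(sep)):
            for token in field.split():
                if looks_misread(token):
                    violations.append(Violation(record_no, field_no, token))
    return violations


# Hypothetical sample: a date field followed by an OCR text field.
sample = [
    "2014-07-25\tThe 9uick fox was e9ual to the task",
    "2014-07-26\tRevenue rose 9 percent in Q2",
]
report = find_violations(sample)
for v in report:
    print(f"record {v.record}, field {v.field}: {v.token}")
```

Running a check like this on a small sample of each incremental import is cheap compared with the 24 work hours the slide attributes to a single fault on TB-scale data; on a cluster the same per-record rule would run inside a distributed job rather than a local loop.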