
Accelerating Hadoop Solution Lifecycle and Improving ROI - Impetus On-demand Webcast

Impetus on-demand webcast ‘Accelerating Hadoop Solution Lifecycle and Improving ROI’, available at http://bit.ly/1nMw8nQ


  1. Accelerating the Big Data Solution Lifecycle and Improving ROI
     July 25, 2014
     © 2014 Impetus Technologies
  2. Agenda
     • Big Data Analytics: implementation patterns
     • Challenges faced
     • Jumbune – an open source lifecycle accelerator
     • Enterprise solution lifecycle
     • Ways to address the challenges
     Recorded version available at http://bit.ly/1nMw8nQ
  3. Big Data Analytics
     • Primary drive for performing analytics
     • Rise of the enterprise data lake
     • Utilization of analytical resources
  4. Primary Purposes of an Analytical Solution
     • Optimize the business
     • Reduce time taken by analytics
     • Result in effective analytics
     • Compete and win
  5. Rise of the Enterprise Data Lake
     • Sources of data: ETL from every source (RDBMS, flat files, queues, legacy offloading, logs)
     • Arrival of data: intermittent, bulk, incremental
     • Theme: “Leave no data unused”
  6. Utilization of Analytics Resources
     • Capitalize on all analytics resources (engines) available
     • Access data with a variety of processing engines: Storm, Spark, YARN, etc.
     • Model in data science analytical systems: R, Octave, SAS, etc.
     • Write complex logic in custom MapReduce
     • Reuse code as User Defined Functions (UDFs)
     • Create ad hoc queries using Hive and Pig
     • Customize Mahout algorithms and machine learning libraries
  7. Enterprise Big Data Solution Trends
     • No more single-purpose Hadoop clusters
     • Enterprise Data Lake: data flowing from many sources
     • Integrated platforms using a variety of analytical engines
     • Serving multiple business applications
     • Resource sharing is a must across applications and engines
  8. Enterprise Solution Lifecycle (High level view)
     • Business Requirement
     • Designing / Modelling
     • Development and Testing
     • Production and Monitoring
  9. Enterprise Solution Lifecycle (Ground level view)
     • Business User
     • Data Analyst
     • Development
     • Quality Test
     • DevOps
     • Data Lake
     • Production and Monitoring
  10. Challenges in Enterprise Analytical Solutions
      • No common platform to detect root causes
      • Incremental imports may ingest bad data
      • Cluster resources are shared and optimal utilization is the key
      • Implementing models in custom MR without errors is like hitting the bull’s eye
      • Bad logic or bad data
  11. Scenario: Digitization of Newspaper for Analyzing News
      Team: 5 Dev, 3 QA, 2 DevOps
      Simple problem: ‘q’ was misread by OCR as ‘9’
      • TIME: A single code fault on TB of data can consume 24 work hours in total for 2 developers + 1 QA
      • COST: Additional engineer hours + the cost of unproductive cloud instances, storage and resources
  12. Scenario: Hive Queries Interpreted as MapReduce Executions on a Hadoop Cluster
      Team: 2 Dev, 1 QA, 1 DevOps
      Simple problem: data imbalance across the cluster, low performance of Hive queries
      • TIME: The development team was refactoring Hive queries to improve performance
      • COST: Additional engineer hours + the cost of unproductive cloud instances, storage and resources
  13. Impact on ROI
      • Delayed analytics: defeats one of the prime purposes of analytics
      • Increase in costs: defeats the purpose of business cost optimization
      • Productivity loss: iterations reduce the productivity of dependent teams in the cycle
  14. Current Iterative Development Approach
      Local Debug / Unit Tests, HDFS Data Check, Performance
      • Localized subset of data
      • Non-parallel execution
      • Practically unfeasible
      • Error prone
      • Difficult to find bad code
      • Difficult to collaborate across environments
  15. A Complete Enterprise Platform
      • Data Lake
      • Enterprise Engines
      • Solutions
      • Governance
      • Security
      • Validate, Profile, Debug and Monitor
  16. Introducing Jumbune: An Open Source Solution
      “A catalyst to accelerate realization of Big Data Analytics solutions”
      • Flow Analyzer
      • Data Validation
      • Cluster Monitor
      • Job Profiler
  21. Full Lifecycle Support - Jumbune
      • Development
      • Quality
      • DevOps
      • Data Ingestion
  22. Jumbune - Key Features
      • In-depth code-level analysis of cluster-wide flow
      • Record- and field-level data violation reports
      • No deployment on worker nodes: ultra-light agent installation on the gateway node
      • Ability to turn cluster monitoring on or off at will, which reduces resource load
      • Customizable rack-aware monitoring
      • Correlated profiling analysis of phases, throughput and resource consumption
      • Ability to work with all Hadoop distributions
      • Upcoming support for YARN, Spark and Mesos
      • Available as open source
  23. For general inquiries about other Impetus solutions and services, reach us at bigdata@impetus.com
  24. Thank You!
      Website
      • http://jumbune.org
      Contribute
      • http://github.com/impetus-opensource/jumbune
      • http://jumbune.org/jira/JUM
      Social
      • Follow @jumbune
      • Use #jumbune
      • Jumbune Group: http://linkd.in/1mUmcYm
      Forums
      • Users: users-subscribe@collaborate.jumbune.org
      • Dev: dev-subscribe@collaborate.jumbune.org
      • Issues: issues-subscribe@collaborate.jumbune.org
      Downloads
      • http://jumbune.org
      • https://bintray.com/jumbune/downloads/jumbune
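
The newspaper scenario (an OCR step misreading ‘q’ as ‘9’) is the kind of fault a field-level data check can surface before a long job runs. The following is a minimal Python sketch of such a check, not Jumbune's actual API: the rule (a token mixing letters and digits is suspicious), the name `find_violations` and the sample records are all assumptions made for illustration.

```python
import re

# Hypothetical rule: a single token should not mix letters and digits,
# e.g. "e9uity" where OCR misread "q" as "9". Pure numbers such as
# "2014" and pure words are accepted.
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def find_violations(records):
    """Return (record_index, token) pairs for every suspicious token."""
    violations = []
    for index, line in enumerate(records):
        for match in MIXED_TOKEN.finditer(line):
            violations.append((index, match.group()))
    return violations


# Made-up sample records standing in for digitized newspaper text.
sample = [
    "The equity market rallied on Friday",
    "Analysts 9uestion the e9uity forecast",
    "Shares rose 12 percent in 2014",
]
violations = find_violations(sample)
for record_index, token in violations:
    print(f"record {record_index}: suspicious token {token!r}")
```

Running a check like this on a sample of the ingested data is far cheaper than discovering the fault after a job has processed the full data set.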

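
The Hive scenario attributes slow queries to data imbalance across the cluster. One way to spot such imbalance before refactoring queries is to count records per key and compare each count with the mean. The sketch below is illustrative only: `skew_report`, the threshold of 2.0 and the sample keys are assumptions, and it is not taken from Jumbune or Hive.

```python
from collections import Counter


def skew_report(keys, threshold=2.0):
    """Map each over-represented key to its count divided by the mean count.

    A key is reported when it holds more than `threshold` times the
    mean number of records per key.
    """
    counts = Counter(keys)
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {
        key: count / mean
        for key, count in counts.items()
        if count / mean > threshold
    }


# Made-up join or partition keys: "us" dominates the data set.
keys = ["us"] * 8 + ["uk", "in", "de", "fr"]
hot_keys = skew_report(keys)
for key, ratio in hot_keys.items():
    print(f"key {key!r} holds {ratio:.1f}x the mean record count")
```

A heavily skewed key means one reducer receives most of the records, so the job is only as fast as that reducer, however the query is rewritten.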