Jumbune optimize hadoop-solutions

Jumbune helps developers analyze the data flow within their code, perform distributed debugging of MapReduce applications, detect data anomalies, monitor the cluster, and profile their code.

1
Community 1.3.0
(Optimizes both YARN and non-YARN Hadoop clusters)

2
Agenda
• Big Data Trends
• What is Jumbune?
• Description of Components

3
Big Data Trends
• Resource sharing/isolation frameworks: YARN, Mesos, etc.
• Shared cluster workers (resources)
• Multiple execution engines: MapReduce, Spark, Hama, Storm, Giraph, etc.
• ETL of data from all possible sources into the data lake

4
Hadoop-based solution life stages (as on the ground): cyclic execution
[Diagram labels: Business User, Data Analyst, MapReduce Dev, Logic & Data Test, Staging, Data Devops, Production. Callouts: Bad logic? Resource utilization? Bad data? Monitoring needs.]

5
Challenges in Analytical Solutions
1. No common platform across actors to detect root causes
2. Incremental imports may ingest bad data
3. Cluster resources are shared, so optimal utilization is key
4. Getting a model right in custom MapReduce code on the first attempts is like hitting the bull's eye
5. Bad logic or bad data

6
Intersecting Solution Lifecycle Stages
[Diagram labels: Solution Development, Quality Test, Bulk & Incremental Data, Devops]

7
Jumbune
“A catalyst to accelerate realization of analytical solutions”
• Data Validation
• Flow Analyzer
• Cluster Monitor
• Job Profiler

8
Niche offerings
• In-depth, code-level analysis of cluster-wide flow
• Record-level data violation reports
• No deployment on workers: an ultra-light agent is installed on the Hadoop master only
• Cluster monitoring can be turned on or off at will, which lessens resource load
• Customizable rack-aware monitoring
• Correlated profiling analysis of phases, throughput, and resource consumption
• Works across all Hadoop distributions

9
Components - Recommended Environments
Dev
• Flow Debugger
• Data Validation
• MR Job Profiler
QA
• Data Validation
Stage + Perf
• MR Job Profiler
Prod
• Cluster Monitoring
• Data Validation

10
Supported Deployments
• Azure, EC2
• On premise
• All major distributions

11
MapReduce Flow Debugger
• Verifies the flow of input records through the user's MapReduce implementation
• Drill-down visualization helps developers quickly identify the problem
• Only tool that helps developers find MapReduce implementation faults without any extra coding
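
The idea of verifying record flow can be illustrated with a minimal sketch. It is not Jumbune's instrumentation, which the slides do not describe; it is a toy in-memory map/reduce in Python, and the names `run_with_flow_counters`, `word_mapper`, and `sum_reducer` are hypothetical. It only shows how per-stage record counts can point at the stage where records go missing.

```python
from collections import Counter, defaultdict


def run_with_flow_counters(records, mapper, reducer):
    """Run a toy in-memory map/reduce and count records at each stage."""
    counters = Counter()
    grouped = defaultdict(list)
    for record in records:
        counters["map_input"] += 1
        for key, value in mapper(record):
            counters["map_output"] += 1
            grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        counters["reduce_input_groups"] += 1
        results[key] = reducer(key, values)
    return results, dict(counters)


def word_mapper(line):
    # Deliberate fault for illustration: words shorter than 3 characters are dropped.
    return [(word, 1) for word in line.split() if len(word) >= 3]


def sum_reducer(key, values):
    return sum(values)


lines = ["to be or not to be", "be quick"]
results, counters = run_with_flow_counters(lines, word_mapper, sum_reducer)

# Compare the words present in the input with the pairs the mapper emitted.
total_words = sum(len(line.split()) for line in lines)
dropped = total_words - counters["map_output"]
print(counters)
print("words dropped in mapper:", dropped)
```

Comparing the count of input words with `map_output` narrows the fault to the mapper's length filter, which is the kind of question a drill-down view is meant to answer.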

12
Data Validator
• Detects inconsistencies in data through:
– Null checks
– Data type checks
– Regular expression checks
• Generic way of specifying validation rules
• Provides a record-level report of the anomalies found
• Currently supports HDFS as the data lake file system
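
The three kinds of checks listed above can be sketched in a few lines of Python. This is an illustration under assumptions, not Jumbune's rule format: the `RULES` layout, the `validate_record` function, and the comma-delimited sample records are all hypothetical.

```python
import re

# Hypothetical rule set: field index -> checks to apply to that field.
RULES = {
    0: {"not_null": True, "type": int},
    1: {"not_null": True, "regex": r"^[A-Z]{2}$"},
    2: {"type": float},
}


def validate_record(line_no, line, rules, delimiter=","):
    """Return (line_no, field_index, check, value) for every violation in one record."""
    violations = []
    fields = line.split(delimiter)
    for index, rule in rules.items():
        value = fields[index].strip() if index < len(fields) else ""
        if value == "":
            # Null check: an empty or missing field violates a not_null rule.
            if rule.get("not_null"):
                violations.append((line_no, index, "null", value))
            continue
        expected = rule.get("type")
        if expected is not None:
            # Data type check: the value must parse as the expected type.
            try:
                expected(value)
            except ValueError:
                violations.append((line_no, index, "type", value))
        pattern = rule.get("regex")
        if pattern is not None and re.fullmatch(pattern, value) is None:
            # Regular expression check: the whole value must match the pattern.
            violations.append((line_no, index, "regex", value))
    return violations


sample = ["1,US,9.5", ",US,3.0", "x,usa,abc"]
report = []
for number, record in enumerate(sample, start=1):
    report.extend(validate_record(number, record, RULES))

for violation in report:
    print(violation)
```

Each tuple in `report` identifies the record, the field, and the failed check, which is one simple shape a record-level report could take.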

13
MR Job Profiling
• Per job, phase-wise:
– Performance of each JVM
– Data flow rate
– Resource usage
• Per job, heap sites for Mapper and Reducer
• Per job, CPU cycles for Mapper and Reducer
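
As a rough illustration of a phase-wise data flow rate, the sketch below computes throughput per phase from start time, end time, and bytes processed. The phase names, the timing numbers, and the `phase_throughput` function are assumptions made for the example; the slides do not say how Jumbune derives these figures.

```python
MIB = 1024 * 1024

# Hypothetical per-phase measurements for one job.
phases = {
    "map": {"start_s": 0, "end_s": 40, "bytes": 400 * MIB},
    "shuffle": {"start_s": 40, "end_s": 60, "bytes": 100 * MIB},
    "reduce": {"start_s": 60, "end_s": 80, "bytes": 50 * MIB},
}


def phase_throughput(phase_stats):
    """Return the data flow rate of each phase in MiB per second."""
    report = {}
    for name, stats in phase_stats.items():
        seconds = stats["end_s"] - stats["start_s"]
        # Guard against zero-length phases to avoid dividing by zero.
        report[name] = stats["bytes"] / seconds / MIB if seconds > 0 else 0.0
    return report


rates = phase_throughput(phases)
slowest = min(rates, key=rates.get)
print(rates)
print("slowest phase:", slowest)
```

Placing the rates side by side shows which phase moves data most slowly and is therefore a candidate for tuning.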

14
Hadoop Cluster Monitoring
• Data centre and rack-aware node view of YARN and non-YARN daemons
• Dynamic, interval-based monitoring
• Hadoop JMX and node resource statistics
• Per-file, node-wise replica placement (which nodes hold replicas of a given file?)
• HDFS data placement view (is HDFS balanced?)
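
The last two questions, replica placement and balance, can be sketched as follows. The block-to-node map and the per-node usage figures are made-up inputs, and the function names are hypothetical; in practice such data would have to come from the cluster itself. The 10 percent threshold is an assumed value, loosely modeled on the utilization threshold idea used by the HDFS balancer.

```python
from collections import Counter


def replicas_per_node(block_locations):
    """Count how many block replicas of a file each node holds."""
    counts = Counter()
    for nodes in block_locations.values():
        counts.update(nodes)
    return dict(counts)


def balance_report(node_usage, threshold_pct=10.0):
    """Compare each node's utilization with the cluster-wide utilization."""
    total_used = sum(used for used, _ in node_usage.values())
    total_capacity = sum(capacity for _, capacity in node_usage.values())
    cluster_pct = 100.0 * total_used / total_capacity
    deviations = {
        node: 100.0 * used / capacity - cluster_pct
        for node, (used, capacity) in node_usage.items()
    }
    balanced = all(abs(d) <= threshold_pct for d in deviations.values())
    return balanced, deviations


# Hypothetical inputs: block id -> nodes holding a replica, node -> (used GB, capacity GB).
block_locations = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node1", "node3", "node4"],
}
node_usage = {
    "node1": (80, 100),
    "node2": (50, 100),
    "node3": (50, 100),
    "node4": (20, 100),
}

placement = replicas_per_node(block_locations)
balanced, deviations = balance_report(node_usage)
print(placement)
print(balanced, deviations)
```

Here `node1` sits 30 points above the cluster average and `node4` sits 30 points below it, so the sketch reports the cluster as unbalanced.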

16
Let’s Collaborate
Website
• http://jumbune.org
Contribute
• http://github.com/impetus-opensource/jumbune
• http://jumbune.org/jira/JUM
Social
• Follow @jumbune, use #jumbune
• Jumbune Group: http://linkd.in/1mUmcYm
Forums
• Users: users-subscribe@collaborate.jumbune.org
• Dev: dev-subscribe@collaborate.jumbune.org
• Issues: issues-subscribe@collaborate.jumbune.org
Downloads
• http://jumbune.org
• https://bintray.com/jumbune/downloads/jumbune

15
How we are building Jumbune?

17
Thanks