CASE STUDY




Enhancing, Monitoring and
Managing a Hadoop Based
Analytics Solution
In this engagement, the Imaginea team contributed over 14 patches to the Hadoop community,
all of which were verified and accepted.



COPYRIGHT © 2012, IMAGINEA TECHNOLOGIES, INC.




COPYRIGHT © 2012, IMAGINEA TECHNOLOGIES, INC. THIS DOCUMENT IS CONFIDENTIAL AND NOT FOR DISTRIBUTION WITHOUT
WRITTEN PERMISSION FROM IMAGINEA TECHNOLOGIES, INC.




1. Executive Summary
One of Imaginea’s clients is a video marketing company that deals with branding,
real-time media buying, ad serving, targeting, optimization and brand
measurement.

Imaginea enhanced and managed a platform for statistical analysis of video
playtime for this client. The solution used Hadoop (Cloudera distribution) and
Hive. The cluster had 500 nodes holding 300 TB of existing data, with over
200 GB of new data streamed in and processed every day.



2. Hadoop Migration and New Features
We helped migrate the entire platform from Hadoop 0.19 to 0.20.2, porting all the
MR jobs. The migration also included back-porting some features from 0.21 to 0.20.
Features that were back-ported included:

• Map-side join
• CompositeInputFormat
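
A map-side join avoids the shuffle phase by letting each mapper read matching partitions of inputs that are already sorted and partitioned the same way, which is the situation CompositeInputFormat is designed for. The following is a minimal Python sketch of that merge step only, not Hadoop code; the function name, the sample records, and the assumption that keys are unique within each input are all illustrative.

```python
from typing import Iterable, Iterator, Tuple


def merge_join(
    left: Iterable[Tuple[str, object]],
    right: Iterable[Tuple[str, object]],
) -> Iterator[Tuple[str, object, object]]:
    """Inner-join two inputs that are already sorted by key, in one pass.

    Assumption (illustrative): keys are unique within each input.
    """
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left_it, None)   # left key has no match yet, advance left
        elif l[0] > r[0]:
            r = next(right_it, None)  # right key has no match yet, advance right
        else:
            yield l[0], l[1], r[1]    # keys match, emit the joined record
            l = next(left_it, None)
            r = next(right_it, None)


# Hypothetical sample data: playtime in seconds and a country code per user id.
plays = [("u1", 120), ("u2", 45), ("u4", 300)]
users = [("u1", "IN"), ("u3", "US"), ("u4", "UK")]
joined = list(merge_join(plays, users))
```

Because both inputs are consumed in key order, no record has to be buffered or sent across the network, which is why the join can run entirely inside the map task.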



3. Cluster Monitoring, Management & Resolution
We helped monitor and manage the cluster during IST business hours. During
this phase we uncovered workflow instability issues and the lack of a resume
feature, both of which we resolved later.

The solution used a custom workflow manager, which had stability issues,
especially as the load increased by orders of magnitude.

ZooKeeper was introduced as the central workflow status manager, and the
workflow manager was changed to use it. This improved system stability by
about 90%.
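
The link between a central status store and a resume feature can be sketched as follows. This is an assumption-laden illustration, not the client's workflow manager: `StatusStore` is a hypothetical in-memory stand-in for the role ZooKeeper played, and the step names and status strings are invented for the example.

```python
from typing import Callable, Dict, List


class StatusStore:
    """Hypothetical in-memory stand-in for a central status store."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def get(self, step: str) -> str:
        return self._status.get(step, "PENDING")

    def set(self, step: str, status: str) -> None:
        self._status[step] = status


def run_workflow(
    steps: List[str],
    actions: Dict[str, Callable[[], None]],
    store: StatusStore,
) -> List[str]:
    """Run steps in order, skipping those already recorded as DONE."""
    executed = []
    for step in steps:
        if store.get(step) == "DONE":
            continue  # resume: finished work is not repeated
        store.set(step, "RUNNING")
        try:
            actions[step]()
        except Exception:
            store.set(step, "FAILED")
            raise
        store.set(step, "DONE")
        executed.append(step)
    return executed


# Simulate a step that fails on its first attempt only.
attempts = {"aggregate": 0}


def aggregate() -> None:
    attempts["aggregate"] += 1
    if attempts["aggregate"] == 1:
        raise RuntimeError("simulated failure")


steps = ["ingest", "aggregate", "publish"]
actions = {"ingest": lambda: None, "aggregate": aggregate, "publish": lambda: None}
store = StatusStore()

try:
    run_workflow(steps, actions, store)
except RuntimeError:
    pass  # first run stops at the failed step

resumed = run_workflow(steps, actions, store)  # second run picks up from there
```

Keeping the status outside the workflow process is what makes the restart safe: a new process can read the store and continue from the first step that is not marked as done.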

We also discovered problems in publishing configuration and code changes to all
the nodes in the cluster during this phase. We used Ganglia and Nagios for
monitoring. We also solved some of the memory overflow issues in the Hadoop
nodes.



4. Configuration Management using Puppet
As part of the engagement, Imaginea introduced Puppet into the system,
replacing a custom configuration management tool. We developed some
recipes and resolved many of the issues that had been raised about replicating
configuration changes and deploying new code.



5. Performance Improvements
Imaginea improved performance in a variety of ways. Two highlights are
described below.

Job Starvation

Problem: Many cases of data overflow at the collector level

The solution ran business analytics Hive queries, which used to starve the
normal MR jobs. Imaginea helped develop a fair scheduling algorithm
that balances the production tasks and the Hive query jobs. Before this change,
there were many cases of data overflow at the collector level.
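
The case study does not describe the scheduling algorithm itself. As a generic illustration of the idea, the sketch below hands each free slot to the pool with the lowest ratio of running tasks to weight, so ad hoc queries cannot crowd out production work. The pool names, weights, and task labels are hypothetical, and slots are never released in this simplified version.

```python
from collections import deque
from typing import Dict, List, Tuple


def fair_schedule(
    pools: Dict[str, List[str]],
    weights: Dict[str, int],
    slots: int,
) -> List[Tuple[str, str]]:
    """Assign up to `slots` task slots using weighted fair sharing.

    Each slot goes to the pool with pending tasks whose running/weight
    ratio is lowest; ties are broken by pool name.
    """
    queues = {name: deque(tasks) for name, tasks in pools.items()}
    running = {name: 0 for name in pools}
    order = []
    for _ in range(slots):
        candidates = [name for name in queues if queues[name]]
        if not candidates:
            break  # nothing left to schedule
        chosen = min(candidates, key=lambda n: (running[n] / weights[n], n))
        order.append((chosen, queues[chosen].popleft()))
        running[chosen] += 1
    return order


# Hypothetical pools: production MR jobs get twice the weight of Hive queries.
pools = {
    "hive": ["q1", "q2", "q3", "q4", "q5", "q6"],
    "production": ["p1", "p2", "p3"],
}
weights = {"hive": 1, "production": 2}
order = fair_schedule(pools, weights, slots=6)
```

Even though the Hive pool has twice as many pending tasks, all three production tasks are placed within the first five slots, and the Hive pool still makes progress instead of being blocked.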

Job Optimization

Problem: The job that identifies whether a user is unique took over 8 hours

Imaginea helped cut the job's run time from 8-10 hours to 4 hours by
distributing keys more evenly and using a better hashing algorithm.
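
Key distribution matters because a job finishes only when its most heavily loaded partition finishes. The sketch below compares a deliberately poor, prefix-based partitioner with a hash-based one. The key format, the partition count, and the choice of MD5 are assumptions for illustration; the case study does not say which hash function was used.

```python
import hashlib
from collections import Counter
from typing import Callable, List


def prefix_partition(key: str, n: int) -> int:
    """Poor choice: keys that share a first character share a partition."""
    return ord(key[0]) % n


def hashed_partition(key: str, n: int) -> int:
    """Spread keys using a stable hash of the whole key (MD5 is illustrative)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def max_load(keys: List[str], partitioner: Callable[[str, int], int], n: int) -> int:
    """Size of the largest partition, which bounds the job's run time."""
    counts = Counter(partitioner(key, n) for key in keys)
    return max(counts.values())


# Hypothetical user ids that all share the same prefix.
keys = [f"user-{i:06d}" for i in range(10000)]
partitions = 8

skewed = max_load(keys, prefix_partition, partitions)
balanced = max_load(keys, hashed_partition, partitions)
```

With the prefix-based partitioner every key lands in the same partition, so one reducer does all the work; hashing the whole key spreads the same keys across all eight partitions.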








6. Apache Hadoop involvement and other contributions
We have worked on Apache Hadoop and other components. The following patches
were contributed to the community by Imaginea.

Jira Id           Severity/Priority      Component                Brief Description

MAPREDUCE-3360    Critical-Improvement   mrv2                     Provide information about lost nodes in the UI
MAPREDUCE-3686    Critical-Bug           mrv2                     History server web UI: job counter values for map/reduce not shown properly
MAPREDUCE-3532    Critical-Bug           mrv2, nodemanager        When 0 (zero) is provided as port number in yarn.nodemanager.webapp.address, NM's web server component picks up a random port, but NM keeps on reporting port 0 to RM
MAPREDUCE-3952    Major-Bug              mrv2                     In MR2, when total input paths to process == 1, CombineFileInputFormat.getSplits() returns 0 splits
MAPREDUCE-3316    Major-Bug              Resource Manager         Rebooted link not working
MAPREDUCE-3708    Major-Bug              mrv2                     Metrics: incorrect apps submitted count
MAPREDUCE-3723    Major-Bug              mrv2, test, webapp       TestAMWebServicesJobs & TestHSWebServicesJobs incorrectly asserting tests
MAPREDUCE-4050    Major-Bug              mrv2                     Invalid node link
MAPREDUCE-3870    Major-Bug              mrv2                     Invalid app metrics
MAPREDUCE-4102    Major-Bug              Webapps                  Job counter not available in Job History web UI for killed jobs
MAPREDUCE-4002    Major-Bug              Examples                 MultiFileWordCount job fails if the input path is not from the default file system
MAPREDUCE-4040    Minor-Bug              mrv2, jobhistoryserver   History links should use hostname rather than IP address
MAPREDUCE-3212    Minor-Bug              mrv2                     Message displayed while executing yarn command should be proper



