Hadoop Testing Workshop
Ophir Cohen
Data Platform Leader
ophirc@liveperson.com
July 2013
Agenda
1. Connection Before Content
2. Testing Fundamentals
3. Unit Tests
4. Integration Tests
5. Try it out
6. Performance
7. Diagnostics
Why Testing?
1. Catch bugs early in the development cycle
2. Transparency into the current project status
3. Easier development and refactoring through immediate feedback
4. Push developers to write better, more stable code
5. Shorten development cycles
Why Automated Testing?
It isn't a real question, right?
Testing Fundamentals
1. Unit testing - functional verification of each 'unit' (a method or class in Java)
2. Integration testing - verifies that the system works as a whole
3. Performance testing - measures the efficiency of the program, which depends on both the code AND the cluster architecture
4. Diagnostics - finding problems in production
--> 1 + 2 should be done BEFORE production
Unit Tests
Key Features
1. Simple (up to 10 lines)
2. Isolation (no DB connection, no cluster dependency, etc.)
3. Deterministic - PASS or FAIL, with the same result on every run
4. Automated (of course)
Why Unit Tests
1. Prevent regression
2. Fast - no full MapReduce environment needed
3. Help in refactoring and updates
Unit Tests - MR jobs
Best Practices
1. Extract the code under test into an isolated method/class
2. Test your own pure Java logic, not the MR framework
3. Put tests in the same package as the code under test (this gives them access to package-private members)
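To make best practices 1 and 2 concrete, here is a minimal sketch. The class `VisitKeyExtractor`, its method `extractKey`, and the tab-separated "timestamp, userId, url" log format are all hypothetical. The idea is that the Mapper's `map()` only converts its `Text` value to a `String`, calls `extractKey`, and writes the result, so all the logic worth testing lives in plain Java with no Hadoop types. The checks use plain assertions to stay self-contained. In a real project they would be JUnit test methods in the same package.

```java
import java.util.Optional;

// Hypothetical example: the parsing logic a Mapper needs, pulled out into a
// plain Java class with no Hadoop types, so it can be tested directly.
final class VisitKeyExtractor {
    private VisitKeyExtractor() {}

    // Assumed input format: "timestamp<TAB>userId<TAB>url".
    // Returns the trimmed userId, or Optional.empty() for malformed lines.
    static Optional<String> extractKey(String line) {
        if (line == null) {
            return Optional.empty();
        }
        // limit -1 keeps trailing empty fields, so "a\tb\t" still has 3 fields
        String[] fields = line.split("\t", -1);
        if (fields.length != 3 || fields[1].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(fields[1].trim());
    }
}

// Plain-Java checks standing in for JUnit test methods: each check is one
// short, isolated, deterministic assertion.
final class VisitKeyExtractorChecks {
    private VisitKeyExtractorChecks() {}

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void runAll() {
        check(VisitKeyExtractor.extractKey("1372636800\tu42\t/home").equals(Optional.of("u42")),
                "well-formed line yields the user id");
        check(!VisitKeyExtractor.extractKey("garbage").isPresent(), "too few fields");
        check(!VisitKeyExtractor.extractKey("1372636800\t \t/home").isPresent(), "blank user id");
        check(!VisitKeyExtractor.extractKey(null).isPresent(), "null input");
    }
}
```

None of these checks needs a cluster, a `Context`, or even the Hadoop jars, which is what keeps them fast.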
MRUnit
1. A library for MapReduce unit tests
2. An Apache project
3. Supports testing mappers, reducers, and the full map-reduce flow (without a cluster)
4. Supports testing counters (nice!)
Unit Tests - Examples
Unit Tests Code Example
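The original code example from this slide is not available. As a stand-in, the sketch below shows what a "full job without a cluster" test exercises: map, then shuffle/sort, then reduce, plus a counter assertion. It is a hypothetical in-memory word count written in plain Java. It is NOT the MRUnit API. In MRUnit the same test would wire your real Mapper and Reducer into a `MapReduceDriver`. The class name `InMemoryWordCount` and the counter name `EMPTY_LINES` are assumptions for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory word count: map -> shuffle/sort -> reduce in one JVM.
// This is NOT the MRUnit API; it only mimics what a "full job without a
// cluster" test exercises, including a counter check.
final class InMemoryWordCount {
    // Counter values keyed by name, like Hadoop's custom job counters.
    final Map<String, Long> counters = new HashMap<>();

    private void increment(String counterName) {
        counters.merge(counterName, 1L, Long::sum);
    }

    // Map phase: emit (word, 1) for each whitespace-separated token;
    // count blank lines instead of emitting anything for them.
    private void map(String line, List<Map.Entry<String, Integer>> output) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            increment("EMPTY_LINES");
            return;
        }
        for (String word : trimmed.split("\\s+")) {
            output.add(new AbstractMap.SimpleEntry<>(word.toLowerCase(), 1));
        }
    }

    // Reduce phase: sum all counts for one word.
    private int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs all three phases and returns (word -> count), sorted by word.
    Map<String, Integer> run(List<String> inputLines) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String line : inputLines) {
            map(line, mapOutput);
        }
        // Shuffle/sort: group values by key; TreeMap keeps keys sorted,
        // mirroring the sorted key order a Hadoop reducer sees.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }
}
```

A test then feeds a few input lines to `run` and asserts on both the output and `counters`. For example, `Arrays.asList("Hadoop MRUnit", "", "hadoop test")` should give `hadoop=2, mrunit=1, test=1` and one `EMPTY_LINES`.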
Integration Tests - background
1. Unit tests check each unit (Mapper/Reducer); integration tests check how the units work together
2. They test the integration with the Hadoop framework itself
3. They are not limited by data volume
Integration Tests - tips and tricks
1. Use MiniMRCluster / MiniDFSCluster (in-process test clusters) for tests
2. Use Linux
3. Make dev == production
4. Use data sampling:
a. Random sampling
b. Biased sampling
5. Apache Bigtop (never tried that)
6. Use Cloudera CDH
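The two sampling strategies can be sketched in plain Java as follows. Both helpers are hypothetical. For random sampling, the sketch uses reservoir sampling (Algorithm R), a standard one-pass technique swapped in here for illustration. For biased sampling, it assumes a simple policy: keep every record matching a predicate (for example, known edge cases) plus a random fraction of the rest. Fixed seeds keep the generated test data reproducible between runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helpers for cutting small integration-test datasets out of
// production-like data.
final class TestDataSampler {
    private TestDataSampler() {}

    // Random sampling via reservoir sampling (Algorithm R): a uniform random
    // sample of k records from a stream of unknown length, in one pass.
    // The int counter limits this sketch to streams of up to ~2^31 records.
    static <T> List<T> reservoirSample(Iterable<T> records, int k, long seed) {
        Random random = new Random(seed);
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T record : records) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(record);
            } else {
                // Keep the new record with probability k / seen
                int j = random.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, record);
                }
            }
        }
        return reservoir;
    }

    // Biased sampling: always keep records matching mustKeep (e.g. rare edge
    // cases) and keep each other record with probability otherFraction.
    static <T> List<T> biasedSample(Iterable<T> records, Predicate<? super T> mustKeep,
                                    double otherFraction, long seed) {
        Random random = new Random(seed);
        List<T> sample = new ArrayList<>();
        for (T record : records) {
            if (mustKeep.test(record) || random.nextDouble() < otherFraction) {
                sample.add(record);
            }
        }
        return sample;
    }
}
```

Random sampling keeps the test data statistically representative. Biased sampling guarantees that rare but important cases show up even in a tiny dataset.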
Let's play a bit
1. Check out the code:
git clone https://github.com/ophchu/mapreduce-tutorials.git
2. Make sure you can run the mapper test
3. Complete the MRUnit tests for the reducer and the full job
4. Play with the MiniMRCluster/MiniDFSCluster test
Performance
Profiling (at a glance...)
1. Profile your code
2. Measure and tune what matters to you
3. Benchmarking: micro and macro
4. Hadoop has built-in task profiling (hprof-based by default)
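For the micro side of benchmarking, the sketch below is a hypothetical hand-rolled harness (`MicroBenchmark.averageNanos` is an illustrative name). It shows the two essentials: unmeasured warm-up iterations, so the JIT compiles the hot path before timing starts, and averaging over many measured calls. It also writes each result to a volatile field so the JIT cannot drop the work as dead code. It is only a rough tool. For numbers you intend to act on, a dedicated harness such as JMH is the safer choice.

```java
import java.util.function.Supplier;

// Hypothetical minimal micro-benchmark harness (not JMH).
final class MicroBenchmark {
    private MicroBenchmark() {}

    // Volatile sink: storing each result here stops the JIT from
    // eliminating the benchmarked call as unused.
    static volatile Object sink;

    // Average nanoseconds per call, measured after warmupIterations
    // unmeasured calls.
    static double averageNanos(Supplier<?> task, int warmupIterations, int measuredIterations) {
        if (measuredIterations <= 0) {
            throw new IllegalArgumentException("measuredIterations must be > 0");
        }
        for (int i = 0; i < warmupIterations; i++) {
            sink = task.get(); // warm-up: give the JIT a chance to compile
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            sink = task.get();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measuredIterations;
    }
}
```

For example, `averageNanos(() -> line.split("\t"), 10_000, 100_000)` gives a rough per-call cost for a Mapper's parsing step. Macro benchmarks, such as the cluster tests on the next slide, then show whether that cost matters end to end.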
Cluster Performance
1. TeraSort test
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar teragen 1000 /user/dataint/terasort/input
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar terasort /user/dataint/terasort/input /user/dataint/terasort/output
2. MRBench - MapReduce benchmarking (runs a small job repeatedly)
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench -numRuns 2 -maps 10 -reduces 10 -inputLines 100 -inputType random
3. NNBench - NameNode benchmarking
4. TestDFSIO - HDFS write and read performance
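TeraGen writes fixed 100-byte rows, so its row-count argument alone determines how much data TeraSort will sort. The `teragen 1000` call in the TeraSort example therefore produces only about 100 KB. That is enough for a smoke test, but far too little to stress a cluster. The hypothetical helper below makes the arithmetic explicit.

```java
// Hypothetical sizing helper: TeraGen rows are a fixed 100 bytes each.
final class TeraGenSizing {
    static final long BYTES_PER_ROW = 100L;

    private TeraGenSizing() {}

    // Rows needed to generate at least targetBytes of data (rounded up).
    static long rowsFor(long targetBytes) {
        if (targetBytes < 0) {
            throw new IllegalArgumentException("targetBytes must be >= 0");
        }
        return (targetBytes + BYTES_PER_ROW - 1) / BYTES_PER_ROW;
    }

    // Bytes generated by a given row count.
    static long bytesFor(long rows) {
        return rows * BYTES_PER_ROW;
    }
}
```

So `teragen 1000` yields 100,000 bytes. A full 1 TB run (10^12 bytes) needs `teragen 10000000000`.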
Diagnostics
1. Check the JobTracker web UI (http://your_server:50030/jobtracker.jsp):
a. Nodes: how many are up, how many are down; check slots
b. Jobs: logs, failures, exceptions
c. Counters: check they have the expected values
2. Configuration:
a. Check the job configuration (job.xml)
b. Check the environment configuration (http://your_server:50030/conf)
3. Job history (http://your_server:50030/jobhistory.jsp)
4. Log directories:
a. JobTracker (http://your_server:50030/logs/)
b. TaskTrackers
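"Counters: expected" can be turned into a repeatable check. The sketch below is hypothetical (`CounterSanityCheck` is an illustrative name). It assumes you have copied counter values from the web UI or job client into a map keyed by the names of Hadoop's built-in task counters (`MAP_INPUT_RECORDS`, `MAP_OUTPUT_RECORDS`, `REDUCE_INPUT_RECORDS`). It checks two invariants: a job should read some input, and, for a job that has reducers but no combiner, every record the mappers emit should reach the reducers. A combiner changes the reducer-side count, so the second check is skipped in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical post-run sanity checks on job counter values.
final class CounterSanityCheck {
    private CounterSanityCheck() {}

    // Returns human-readable problems; an empty list means the counters look as expected.
    // reducersWithoutCombiner: true only if the job has >= 1 reducer and no combiner.
    static List<String> check(Map<String, Long> counters, boolean reducersWithoutCombiner) {
        List<String> problems = new ArrayList<>();
        long mapIn = counters.getOrDefault("MAP_INPUT_RECORDS", 0L);
        long mapOut = counters.getOrDefault("MAP_OUTPUT_RECORDS", 0L);
        long reduceIn = counters.getOrDefault("REDUCE_INPUT_RECORDS", 0L);
        if (mapIn == 0) {
            problems.add("no map input records: wrong input path or empty input?");
        }
        // Without a combiner, the shuffle neither adds nor drops records.
        if (reducersWithoutCombiner && reduceIn != mapOut) {
            problems.add("reduce input records (" + reduceIn
                    + ") != map output records (" + mapOut + ")");
        }
        return problems;
    }
}
```

Running such checks after every production job catches silent data loss that a green job status would otherwise hide.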
Thanks
● ophchu@gmail.com
● @ophchu