SlideShare a Scribd company logo
1 of 5
Distcp Benchmarking on Taito
Experiments
• 2000 files, each of 1 GB size.
• The number of nodes were increased in steps
of 2.
• Initially, the number of maps were not
specified. Number of map slots set to (number
of nodes * 16 * 10).
• Next, the number of maps were set by the –m
parameter.
Observations 1
• When number of maps was not specified, the
number of map tasks observed were 20.
• Some readings –
2 nodes – 89 m 15.540s
4 nodes – 23 m 15.156s
6 nodes – 22 m 54.151s
8 nodes – 22 m 2.209s
10 nodes – 21 m 42.285s
Observations 2
• Setting –m to 2000 in the distcp command
increased the number of maps to 2000.
• Some readings –
6 nodes – 17 m 9.829s
8 nodes – 16 m 40.988s
10 nodes – 10 m 19.093s
Observations 3
• Increasing the map slots does not have
significant effect on distcp performance when
the number of concurrent maps is greater.
• The decrease in time with increase in
parallelism was not always proportional, but
was evident till 10 nodes.

More Related Content

Viewers also liked (7)

Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215
 
R
RR
R
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
Statistics
StatisticsStatistics
Statistics
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)
 
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
Which Hadoop Distribution to use: Apache, Cloudera, MapR or HortonWorks?
 
Ios
IosIos
Ios
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Taito

  • 2. Experiments • 2000 files, each of 1 GB size. • The number of nodes were increased in steps of 2. • Initially, the number of maps were not specified. Number of map slots set to (number of nodes * 16 * 10). • Next, the number of maps were set by the –m parameter.
  • 3. Observations 1 • When number of maps was not specified, the number of map tasks observed were 20. • Some readings – 2 nodes – 89 m 15.540s 4 nodes – 23 m 15.156s 6 nodes – 22 m 54.151s 8 nodes – 22 m 2.209s 10 nodes – 21 m 42.285s
  • 4. Observations 2 • Setting –m to 2000 in the distcp command increased the number of maps to 2000. • Some readings – 6 nodes – 17 m 9.829s 8 nodes – 16 m 40.988s 10 nodes – 10 m 19.093s
  • 5. Observations 3 • Increasing the map slots does not have significant effect on distcp performance when the number of concurrent maps is greater. • The decrease in time with increase in parallelism was not always proportional, but was evident till 10 nodes.