2. Experiments
• 2000 files of 1 GB each.
• The number of nodes was increased in steps
of 2.
• Initially, the number of maps was not
specified. The number of map slots was set to
(number of nodes * 16 * 10).
• Next, the number of maps was set with the -m
parameter.
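
The setup above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `map_slots` and `distcp_command` and the source and destination paths are hypothetical, and the factors 16 and 10 are taken directly from the formula in the bullet. The `-m` option is distcp's flag for the maximum number of maps.

```python
def map_slots(nodes, per_node=16, factor=10):
    """Map slots as configured in the experiment: nodes * 16 * 10."""
    return nodes * per_node * factor


def distcp_command(src, dst, num_maps=None):
    """Build a distcp argument list; -m is added only when num_maps is given."""
    cmd = ["hadoop", "distcp"]
    if num_maps is not None:
        cmd += ["-m", str(num_maps)]
    return cmd + [src, dst]


if __name__ == "__main__":
    for nodes in range(2, 11, 2):  # nodes were increased in steps of 2
        print(nodes, "nodes ->", map_slots(nodes), "map slots")
    # Hypothetical paths, for illustration only.
    print(" ".join(distcp_command("hdfs://src/data", "hdfs://dst/data", 2000)))
```

Leaving `num_maps` unset corresponds to the first round of runs, where the number of maps was not specified.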
3. Observations 1
• When the number of maps was not specified,
the number of map tasks observed was 20.
• Some readings –
2 nodes – 89 m 15.540s
4 nodes – 23 m 15.156s
6 nodes – 22 m 54.151s
8 nodes – 22 m 2.209s
10 nodes – 21 m 42.285s
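
To make the readings above easier to compare, the following sketch converts them to seconds and computes the speedup relative to the 2-node run. The choice of 2 nodes as the baseline and the names `to_seconds` and `speedups` are assumptions made for illustration; the timings themselves are copied from the list.

```python
import re

# Readings copied from the list above (default run, 20 map tasks observed).
READINGS = {
    2: "89 m 15.540s",
    4: "23 m 15.156s",
    6: "22 m 54.151s",
    8: "22 m 2.209s",
    10: "21 m 42.285s",
}


def to_seconds(reading):
    """Convert a reading such as '89 m 15.540s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*m\s*([\d.]+)\s*s\s*", reading)
    if match is None:
        raise ValueError(f"unrecognised reading: {reading!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedups(readings, baseline_nodes=2):
    """Speedup of each run relative to the baseline node count (assumed: 2)."""
    base = to_seconds(readings[baseline_nodes])
    return {nodes: base / to_seconds(r) for nodes, r in readings.items()}


if __name__ == "__main__":
    for nodes, factor in speedups(READINGS).items():
        secs = to_seconds(READINGS[nodes])
        print(f"{nodes:>2} nodes: {secs:8.1f} s, {factor:.2f}x vs 2 nodes")
```

On these numbers, most of the gain appears between 2 and 4 nodes (roughly 3.8x); going from 4 to 10 nodes adds comparatively little.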
4. Observations 2
• Setting -m to 2000 in the distcp command
increased the number of maps to 2000.
• Some readings –
6 nodes – 17 m 9.829s
8 nodes – 16 m 40.988s
10 nodes – 10 m 19.093s
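
As a rough comparison, the sketch below puts these readings next to the default runs from Observations 1 for the same node counts and reports the percentage reduction in elapsed time. The function name `reduction_percent` is hypothetical, and the durations are the slide readings converted to seconds by hand.

```python
# Elapsed times in seconds, converted from the readings on the slides.
DEFAULT_RUN = {  # number of maps not specified (20 map tasks observed)
    6: 22 * 60 + 54.151,
    8: 22 * 60 + 2.209,
    10: 21 * 60 + 42.285,
}
M2000_RUN = {  # -m 2000
    6: 17 * 60 + 9.829,
    8: 16 * 60 + 40.988,
    10: 10 * 60 + 19.093,
}


def reduction_percent(before, after):
    """Percentage reduction in elapsed time for node counts present in both runs."""
    return {
        nodes: 100.0 * (before[nodes] - after[nodes]) / before[nodes]
        for nodes in sorted(before.keys() & after.keys())
    }


if __name__ == "__main__":
    for nodes, pct in reduction_percent(DEFAULT_RUN, M2000_RUN).items():
        print(f"{nodes:>2} nodes: {pct:.1f}% less time with -m 2000")
```

With these figures the reduction is about a quarter at 6 and 8 nodes and about half at 10 nodes.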
5. Observations 3
• Increasing the map slots does not have a
significant effect on distcp performance when
the slots already outnumber the concurrent
maps: with 20 maps, going from 4 to 10 nodes
saved only about 1.5 minutes.
• The decrease in time with increasing
parallelism was not always proportional, but
was evident up to 10 nodes.
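
One way to check the "not always proportional" remark is to compare each step's observed speedup with the speedup expected if time were inversely proportional to the node count. That proportionality model and the name `step_scaling` are assumptions for illustration; the timings are the -m 2000 readings from Observations 2 in seconds.

```python
# Elapsed times in seconds for the -m 2000 runs (from Observations 2).
M2000_RUN = {
    6: 17 * 60 + 9.829,
    8: 16 * 60 + 40.988,
    10: 10 * 60 + 19.093,
}


def step_scaling(times):
    """For each consecutive pair of node counts, return
    (from_nodes, to_nodes, ideal_speedup, observed_speedup).

    Ideal speedup assumes time is inversely proportional to node count.
    """
    nodes = sorted(times)
    steps = []
    for a, b in zip(nodes, nodes[1:]):
        ideal = b / a
        observed = times[a] / times[b]
        steps.append((a, b, ideal, observed))
    return steps


if __name__ == "__main__":
    for a, b, ideal, observed in step_scaling(M2000_RUN):
        print(f"{a} -> {b} nodes: ideal {ideal:.2f}x, observed {observed:.2f}x")
```

Under this model the 6 to 8 node step falls well short of the ideal ratio, while the 8 to 10 node step exceeds it.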