Performance analysis in a multitenant cloud environment Using Hadoop Cluster and Oracle Solaris 11
 


Analyzing the performance of a virtualized multitenant cloud environment can be challenging because of the layers of abstraction. This article shows how to use Oracle Solaris 11 to overcome those limitations.
For more information see:
http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-analysis-multitenant-cloud-2082193.html

    Presentation Transcript

    • Seminar: Performance Analysis in a Multitenant Cloud Environment Using Hadoop Cluster and Oracle Solaris 11. Presenter: Orgad Kimchi, Principal Software Engineer, Oracle
    • The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
    • Overview Analyzing the performance of a virtualized multi-tenant cloud environment can be challenging because of the layers of abstraction. • Each type of virtualization software adds an abstraction layer to enable better manageability. • Each Oracle Solaris Zone can have different workload; it can be disk I/O, network I/O, CPU, memory, or combination of these. In addition, a single Oracle Solaris Zone can overload the entire system resources. • It is very difficult to observe the environment; you need to be able to monitor the environment from the top level to see all the virtual instances (non-global zones) in real time with the ability to drill down to specific resources.
    • Introduction to Hadoop
      The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data and is suitable for applications that have large data sets.
      The Hadoop cluster building blocks are as follows:
      • NameNode: The centerpiece of HDFS, which stores file system metadata, directs the slave DataNode daemons to perform the low-level I/O tasks, and also runs the JobTracker process.
      • Secondary NameNode: Performs internal checks of the NameNode transaction log.
      • DataNodes: Nodes that store the data in HDFS, which are also known as slaves and run the TaskTracker process.
    • Hadoop Architecture Overview
    • Solaris Architecture Overview
    • Overview Best practice for any performance analysis is to get a bird's eye view of the running environment in order to see which resource is the busiest, and then drill down to each resource. We will the Solaris 11 zonestat and the fsstat commands in order to answer this question.
    • Benchmark Description
      The first Hadoop benchmark that we are going to run to load our environment is Pi Estimator. Pi Estimator is a MapReduce program that employs a Monte Carlo method to estimate the value of pi. In this example, we use 128 maps, and each map computes one billion samples (for a total of 128 billion samples).
      root@global_zone:~# zlogin -l hadoop name-node hadoop jar /usr/local/hadoop/hadoop-examples-1.2.0.jar pi 128 1000000000
      Where:
      "zlogin -l hadoop name-node" runs the command as user hadoop in the name-node zone
      "hadoop jar /usr/local/hadoop/hadoop-examples-1.2.0.jar pi" runs the pi example from the Hadoop examples JAR file
      "128" is the number of maps
      "1000000000" is the number of samples per map
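To make the Monte Carlo idea concrete, here is a minimal Python sketch of the same kind of estimate. It is not Hadoop's implementation: it uses plain pseudo-random sampling, the function names `count_hits` and `estimate_pi` are invented for illustration, and the sample counts are tiny compared with the 128 billion samples used in the benchmark.

```python
import math
import random


def count_hits(samples: int, seed: int) -> int:
    """One 'map' task: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(maps: int, samples_per_map: int) -> float:
    """The 'reduce' step: sum the per-map hit counts and scale the ratio by 4."""
    total_hits = sum(count_hits(samples_per_map, seed) for seed in range(maps))
    return 4.0 * total_hits / (maps * samples_per_map)


# Hypothetical, much smaller run than the benchmark (8 maps x 20,000 samples).
estimate = estimate_pi(maps=8, samples_per_map=20_000)
print(f"estimate={estimate:.4f} error={abs(estimate - math.pi):.4f}")
```

Because each map only draws random numbers and counts, the work involves almost no I/O, which is consistent with the benchmark loading the CPUs rather than the disks or the network.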
    • zonestat
      The zonestat command allows us to monitor all the Oracle Solaris Zones running in our environment and provides real-time statistics for CPU utilization, memory utilization, and network utilization.
      Run the zonestat command at 10-second intervals:
      root@global_zone:~# zonestat 10 10
      Interval: 1, Duration: 0:00:10
      SUMMARY          Cpus/Online: 128/12   PhysMem: 256G  VirtMem: 259G
                       ---CPU----  --PhysMem-- --VirtMem-- --PhysNet--
      ZONE              USED %PART  USED %USED  USED %USED PBYTE %PUSE
      [total]         118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E  100%
      [system]          0.00 0.00% 9684M 3.69% 40.5G 15.5%
      data-node3       42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E  100%
      data-node1       41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E  100%
      data-node2       33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E  100%
      global            0.34 0.27%  283M 0.10%  420M 0.15%  2192 0.00%
      name-node         0.15 0.11%  419M 0.15%  718M 0.26%   126 0.00%
      sec-name-node     0.00 0.00%  205M 0.07%  363M 0.13%     0 0.00%
      As we can see from the zonestat output, the Pi program is a CPU-bound application (%PART of 92.2%).
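As a rough illustration of how the per-zone CPU figures could be ranked automatically, the following Python sketch parses zone rows copied from the zonestat slide. The helper name `cpu_by_zone` and the whitespace-splitting approach are assumptions made for this example; zonestat also has its own options for machine-readable output, which are not used here.

```python
# Rows copied from the zonestat output on the slide (zone, CPU USED, %PART, ...).
ZONESTAT_ROWS = """\
[total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E 100%
[system] 0.00 0.00% 9684M 3.69% 40.5G 15.5%
data-node3 42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E 100%
data-node1 41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E 100%
data-node2 33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E 100%
global 0.34 0.27% 283M 0.10% 420M 0.15% 2192 0.00%
name-node 0.15 0.11% 419M 0.15% 718M 0.26% 126 0.00%
sec-name-node 0.00 0.00% 205M 0.07% 363M 0.13% 0 0.00%
"""


def cpu_by_zone(text: str) -> dict:
    """Map each zone name to its CPU %PART, skipping the [total] and [system] rows."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("["):
            continue
        zones[fields[0]] = float(fields[2].rstrip("%"))
    return zones


# Sort zones from busiest to least busy by CPU share.
ranked = sorted(cpu_by_zone(ZONESTAT_ROWS).items(), key=lambda kv: kv[1], reverse=True)
for zone, part in ranked:
    print(f"{zone:15s} {part:5.1f}%")
```

With the sample rows, the three DataNode zones come out on top, which matches the reading of the slide: the map tasks running in those zones account for nearly all of the CPU consumption.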
    • mpstat
      Another useful command for checking whether CPU utilization is balanced evenly across the available CPUs is mpstat. The following is the output of the Oracle Solaris mpstat(1M) command. Each line represents one virtual CPU.
      root@global_zone:~# mpstat 1
      CPU minf mjf  xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
        0   85   0 10183  683   59  931   40  269  464   2  1315  30  14  0  56
        1   80   0 34872  484    9 1096   39  317  498   2  1437  34  14  0  51
        2   72   0 15632  325    4  669   30  166  334   1  1321  37   9  0  54
        3   42   0 13422  253    3  553   32  144  277   2   818  31   7  0  62
      On a system with many CPUs, the mpstat output can be very long; instead, we can monitor CPU utilization per core.
      root@global_zone:~# mpstat -A core 10
       COR minf mjf  xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl sze
      3074  103   0 23654 1680  697 1264  644  277  502  10 11268 748  52  0   0   8
      3078   95   0 32090  893  137 1228  635  281  439   8 10929 759  41  0   0   8
      3082   94   0 31574  889  129 1245  629  308  560   9 12792 753  47  0   0   8
      3086  111   0 20262  829  121 1200  615  277  512   7 12657 753  47  0   0   8
      In addition, mpstat can print performance statistics per socket or per processor set; for more examples, see mpstat(1M).
    • mpstat Visualization We can visualize the output of the mpstat command using the dim_stat tool
    • fsstat
      The fsstat command allows us to monitor file system activity per file system or per Oracle Solaris Zone. For example, we can monitor the writes to all ZFS file systems at 10-second intervals:
      root@global_zone:~# fsstat -Z zfs 10 10
      new name name attr attr lookup rddir read read write write file remov chng get set ops ops ops bytes ops bytes 0 0 0 0 0 0 0 0 0 0 0 0 0 744 0 0 151 0 359 0 413 0 14 0 14 11.4K 0 6.01K 5.87M 0 0 3.27K 0 1.41K 1.94M 0 8.72K 0 2.75K 3.95M 0 9.03K 0 2.98K 4.22M 0 51 0 0 0 0 51 0 0 0 0 zfs:global 7 1.42K zfs:data-node1 22 4.06K zfs:data-node2 21 4.34K zfs:data-node3 0 0 zfs:name-node 0 0 zfs:sec-name-node
      The default report shows general file system activity. This display combines similar operations into general categories as follows:
    • vmstat
      Based on the zonestat, mpstat, and fsstat output, the conclusion is that the Pi Estimator program is a CPU-bound application, so let's continue our CPU performance analysis. The next question that we are going to ask is whether there is any idle CPU time.
      root@global_zone:~# vmstat 1
       kthr      memory              page             disk           faults        cpu
       r b w      swap      free  re   mf pi po fr de sr s3 s4 s5 s6    in     sy    cs us sy id
       8 0 0 213772168 245340872 770 5954  0  0  0  0  0  0  0  0  0 17732 161637 39181 93  7  0
      12 0 0 213346168 244887200 134 2237  0  0  0  0  0  0  0  0  0 13689 140604 19640 96  4  0
      17 0 0 212974464 244353760 124 1939  0  0  0  0  0  0  0  0  0 12079 130895 17225 96  4  0
      A value of 0 in the id column means that the system's CPUs are 100 percent busy.
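The idle-time check can be expressed as a small Python sketch over the three vmstat rows from the slide. The helper names `cpu_split` and `is_cpu_saturated`, and the 5 percent idle threshold, are assumptions chosen for illustration and are not part of vmstat.

```python
# Data rows copied from the vmstat output on the slide; the last three
# columns are us (user), sy (system), and id (idle) CPU percentages.
VMSTAT_ROWS = [
    "8 0 0 213772168 245340872 770 5954 0 0 0 0 0 0 0 0 0 17732 161637 39181 93 7 0",
    "12 0 0 213346168 244887200 134 2237 0 0 0 0 0 0 0 0 0 13689 140604 19640 96 4 0",
    "17 0 0 212974464 244353760 124 1939 0 0 0 0 0 0 0 0 0 12079 130895 17225 96 4 0",
]


def cpu_split(row: str) -> tuple:
    """Return (us, sy, id) taken from the last three fields of a vmstat data row."""
    us, sy, idle = (int(value) for value in row.split()[-3:])
    return us, sy, idle


def is_cpu_saturated(rows, idle_threshold: int = 5) -> bool:
    """True when every sampled interval has idle time at or below the threshold."""
    return all(cpu_split(row)[2] <= idle_threshold for row in rows)


for row in VMSTAT_ROWS:
    print(cpu_split(row))
print("saturated:", is_cpu_saturated(VMSTAT_ROWS))
```

In all three sampled rows the idle value is 0 and user time dominates, so the sketch reports the CPUs as saturated.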
    • prstat
      You can also track run queue latency by using the prstat -Lm command and noting the value in the LAT column. We can use the prstat command to see whether the CPU cycles are being consumed in user mode or in system (kernel) mode:
      root@global_zone:~# prstat -ZmL
         PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
       19338 hadoop   100 0.0 0.0 0.0 0.0 0.0 0.0 0.3   0  73   0   0 java/2
       19329 hadoop   100 0.0 0.0 0.0 0.0 0.0 0.0 0.4   0  86   0   0 java/2
       19519 hadoop    84  15 0.1 0.0 0.2 0.0 0.0 0.8  56 153 29K   0 java/2
       19503 hadoop    88  11 0.1 0.0 0.3 0.1 0.0 1.0  52 168 23K   3 java/2
      Total: 310 processes, 8269 lwps, load averages: 47.63, 48.79, 36.98
      The prstat output shows that the CPU cycles are being consumed mostly in user mode (USR).
    • Virtualization-Aware
      When you use the ps command with the -Z option, it prints the name of the zone with which each process is associated under an additional ZONE column header.
      Note: The command is aware that it is running within a non-global zone; when run from a non-global zone, it cannot see the processes of other zones.
      For example, to print all the Hadoop processes that are running now:
      root@global_zone:~# ps -efZ | grep hadoop
      ZONE     UID    PID   PPID  C STIME    TTY TIME CMD
      data-nod hadoop 14024 11795 0 07:38:19 ?   0:20 /usr/jdk/instances/jdk1.6.0/jre/bin/java -Djava.library.path=/usr/local/hadoop
      data-nod hadoop 14026 11798 0 07:38:19 ?   0:19 /usr/jdk/instances/jdk1.6.0/jre/bin/java -Djava.library.path=/usr/local/hadoop
      name-nod hadoop 11621     1 0 07:20:12 ?   0:59 /usr/java/bin/java -Dproc_jobtracker -Xmx1000m -Dcom.sun.management.jmxremote -
    • Virtualization-Aware
      We want to observe the application that is responsible for the load. For example, what code paths are making the CPUs busy? And which process in each zone is responsible for the system load? In the next example, we are going to drill down into one of the Oracle Solaris Zones to understand which application or process is responsible for the load.
      Let's log in to the data-node1 zone.
      root@global_zone:~# zlogin data-node1
      We can use the prstat command inside the zone to see which process is responsible for the system load.
      root@data-node1:~# prstat -mLc
         PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
       22866 root      24  74 1.6 0.0 0.0 0.0 0.0 0.0 122 122 85K   0 prstat/1
       22715 hadoop    80 3.3 0.1 0.0 0.0 4.0 0.1  12  45 201  4K   4 java/2
       22704 hadoop    80 3.3 0.2 0.0 0.0 6.2 0.4  10  61 277  4K  10 java/2
    • DISK I/O Performance Monitoring
    • DISK I/O -Cont'd
      The first command that we are going to use for disk I/O performance observation is the fsstat command, which allows us to analyze the disk I/O workload per Oracle Solaris Zone and see file system statistics for each file system. The following example shows per-zone statistics for all zones, as well as a system-wide aggregate, for the tmpfs and zfs file systems.
      root@global_zone:~# fsstat -A -Z tmpfs zfs 10 10
          new   name   name   attr   attr lookup  rddir   read   read  write  write
         file  remov   chng    get    set    ops    ops    ops  bytes    ops  bytes
          126      0    128  1.57K    512  15.9K      0      0      0    127  15.9K tmpfs
            0      0      0      0      0      0      0      0      0      0      0 tmpfs:global
           20      0     20    260     80  2.55K      0      0      0     20  2.50K tmpfs:data-node2
           52      0     52    612    208  6.36K      0      0      0     52  6.50K tmpfs:data-node3
            0      0      0     40      0     70      0      0      0      0      0 tmpfs:name-node
            0      0      0     40      0     70      0      0      0      0      0 tmpfs:sec-name-node
           54      0     56    656    224  6.83K      0      0      0     55  6.88K tmpfs:data-node1
          156      0    162  1.78K      0  22.9K      0     28  3.16K   175K  5.45G zfs
            0      0      0      0      0      3      0      2    599      0      0 zfs:global
           52      0     54    511      0  4.52K      0      0      0  58.3K  1.82G zfs:data-node2
           52      0     54    512      0  8.46K      0     12  1.28K  58.3K  1.82G zfs:data-node3
            0      0      0    140      0    514      0      1      4    106  19.2K zfs:name-node
            0      0      0    140      0    510      0      0      0      0      0 zfs:sec-name-node
           52      0     54    518      0  8.95K      0     13  1.29K  58.3K  1.81G zfs:data-node1
    • DISK I/O -Cont'd
      Next, we want to pinpoint our measurements to specific Oracle Solaris Zones. The following example shows per-zone statistics for zones data-node1, data-node2, and data-node3, as well as a system-wide aggregate, for the tmpfs and zfs file systems.
      root@global_zone:~# fsstat -A -Z -z data-node1 -z data-node2 -z data-node3 tmpfs zfs 10 10
          new   name   name   attr   attr lookup  rddir   read   read  write  write
         file  remov   chng    get    set    ops    ops    ops  bytes    ops  bytes
          140     13    116  3.16K    512  42.7K     16    242   926K    250   342K tmpfs
           20      0     20    266     80  2.56K      0      0      0     20  2.50K tmpfs:data-node2
           57      5     46  1.35K    204  19.2K      8    115   436K    113   170K tmpfs:data-node3
           63      8     50  1.47K    228  20.8K      8    127   491K    117   170K tmpfs:data-node1
          154      0     94  7.74K      0  85.6K     40  20.9K  29.8M   127K  3.96G zfs
           52      0     32    445      0  4.25K      0      0      0  43.0K  1.34G zfs:data-node2
           52      0     32  2.98K      0  31.0K     20  6.63K  10.9M  43.1K  1.34G zfs:data-node3
           50      0     30  3.04K      0  32.9K     20  7.21K  11.9M  41.0K  1.28G zfs:data-node1
    • DISK I/O -Cont'd
      Next, we are going to drill down to watch individual disk read and write operations. First, let's get the ZFS pool names.
      root@global_zone:~# zpool list
      NAME              SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
      data-node1-pool   556G  56.7G   499G  10%  1.00x  ONLINE  -
      data-node2-pool   556G  56.3G   500G  10%  1.00x  ONLINE  -
      data-node3-pool   556G  56.4G   500G  10%  1.00x  ONLINE  -
      rpool             278G  21.7G   256G   7%  1.00x  ONLINE  -
      We can see that we have four ZFS zpools.
    • DISK I/O -Cont'd
      We can monitor all the ZFS zpools at the same time using the following command:
      root@global_zone:~# zpool iostat -v 10
                                   capacity     operations    bandwidth
      pool                       alloc   free   read  write   read  write
      -------------------------  -----  -----  -----  -----  -----  -----
      data-node1-pool            31.1G   525G      2      9   124K  6.49M
        c0t5000CCA0160D3264d0    31.1G   525G      2      9   124K  6.49M
      -------------------------  -----  -----  -----  -----  -----  -----
      data-node2-pool            31.0G   525G      2     10  91.0K  6.50M
        c0t5000CCA01612A4F0d0    31.0G   525G      2     10  91.0K  6.50M
      -------------------------  -----  -----  -----  -----  -----  -----
      data-node3-pool            31.0G   525G      1      9   103K  6.49M
        c0t5000CCA016295ABCd0    31.0G   525G      1      9   103K  6.49M
      -------------------------  -----  -----  -----  -----  -----  -----
      rpool                      22.0G   256G     10      7  95.0K  64.1K
        c0t5001517803D013B3d0s0  22.0G   256G     10      7  95.0K  64.1K
      -------------------------  -----  -----  -----  -----  -----  -----
    • DISK I/O -Cont'd
      We can also use the iostat command to see how fast the disk I/O operations are being processed on a per-device basis.
      root@global_zone:~# iostat -xnz 5 10
                          extended device statistics
          r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
          1.6   10.8    47.9  3765.1  0.0  0.2    0.1   16.4   0   2 c0t5001517803D013B3d0
          1.2    7.1   365.9  2238.4  0.0  0.2    0.1   19.6   0   2 c0t5000CCA0160D3264d0
          0.9    8.5   279.4  2237.7  0.0  0.2    0.1   16.7   0   2 c0t5000CCA01612A4F0d0
          1.1    8.8   335.9  2237.2  0.0  0.2    0.1   16.3   0   2 c0t5000CCA016295ABCd0
                          extended device statistics
          r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
          0.0   16.6     0.0    50.1  0.0  0.0    0.0    0.3   0   0 c0t5001517803D013B3d0
         31.0   15.6 13346.7    44.4  0.0  0.8    0.0   17.1   0  12 c0t5000CCA0160D3264d0
          0.0   15.0     0.0    47.0  0.0  0.0    0.0    1.8   0   1 c0t5000CCA016295ABCd0
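To show how the per-device rows could be compared programmatically, here is a short Python sketch that parses the second iostat interval from the slide and picks the device with the highest %b (percent busy) value. The names `COLUMNS`, `parse_devices`, and `busiest` are made up for this example, and the parsing assumes the plain whitespace-separated layout shown in the slide.

```python
# Second interval of the iostat -xnz output copied from the slide.
IOSTAT_INTERVAL = """\
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 16.6 0.0 50.1 0.0 0.0 0.0 0.3 0 0 c0t5001517803D013B3d0
31.0 15.6 13346.7 44.4 0.0 0.8 0.0 17.1 0 12 c0t5000CCA0160D3264d0
0.0 15.0 0.0 47.0 0.0 0.0 0.0 1.8 0 1 c0t5000CCA016295ABCd0
"""

COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv", "wsvc_t", "asvc_t", "%w", "%b"]


def parse_devices(text: str) -> dict:
    """Return {device: {column: value}} for every data row; header rows are skipped."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue
        try:
            values = [float(value) for value in fields[:-1]]
        except ValueError:
            continue  # the header row contains column names, not numbers
        devices[fields[-1]] = dict(zip(COLUMNS, values))
    return devices


devices = parse_devices(IOSTAT_INTERVAL)
busiest = max(devices, key=lambda name: devices[name]["%b"])
print(busiest, devices[busiest]["%b"], devices[busiest]["asvc_t"])
```

In this interval the busiest disk is the one backing data-node1-pool (per the zpool iostat listing), at 12 percent busy with an average active service time of 17.1 ms, so the disks are far from saturated.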
    • DISK I/O -Cont'd
      Another useful tool is the iotop DTrace script, which displays top disk I/O events by process per Oracle Solaris Zone.
      root@global_zone:~# /usr/dtrace/DTT/iotop -Z 10 10
      Tracing... Please wait.
      2013 Oct  7 08:40:19,  load: 24.38,  disk_r:      0 KB,  disk_w:   1886 KB
       ZONE    PID   PPID CMD              DEVICE  MAJ MIN D      BYTES
          0    717      0 zpool-data-node3 sd6      73  48 W     347648
          0      5      0 zpool-rpool      sd3      73  24 W     417280
          0    896      0 zpool-data-node1 sd4      73  32 W    1195520
      You can see the zone ID (ZONE), process ID (PID), type of operation (read or write, D), and total size of the operation (BYTES).
    • DISK I/O -Cont'd
      Next, we can analyze the disk I/O pattern to determine whether it is random or sequential by using the DTrace iopattern script.
      root@global_zone:~# /usr/dtrace/DTT/iopattern
      %RAN %SEQ  COUNT    MIN     MAX     AVG      KR     KW
        69   31    236   1024 1048576  448830  103441      0
        75   25    577    512 1048576  327938  184306    479
        92    8    598    512 1048576  198293  114275   1525
        74   26    379    512 1048576  330296  121954    294
        66   34    281   1024 1048576  500550  137358      0
        80   20    346   1024 1048576  332114  112218      0
        81   19    444    512 1048576  290734  124694   1366
        65   35    337    512 1048576  490375  161139    244
        75   25    704    512 1048576  353086  241105   1642
        75   25    444   1024 1048576  386634  167642      0
        77   23    666   1024 1048576  397105  258274      0
        77   23    853    512 1048576  385908  320740    725
        77   23    525    512 1048576  345048  175352   1553
        68   32    253    512 1048576  508290  125355    228
        64   36    237   1024 1048576  501317  116027      0
    • Monitoring Memory Utilization
      First, let's print how much physical memory the system has.
      root@global_zone:~# prtconf -v | grep Mem
      Memory size: 262144 Megabytes
      We can see that we have 256 GB of memory in the system. Second, let's get more information about how the system memory is being allocated.
      root@global_zone:~# echo ::memstat | mdb -k
      Page Summary                Pages                MB  %Tot
      ------------     ----------------  ----------------  ----
      Kernel                    1473974             11515    4%
      ZFS File Data             4990336             38987   15%
      Anon                      2223697             17372    7%
      Exec and libs                3342                26    0%
      Page cache                5244141             40969   16%
      Free (cachelist)            27122               211    0%
      Free (freelist)          19591820            153061   58%
      Total                    33554432            262144
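The relationship between the Pages and MB columns in the ::memstat output can be checked with a few lines of Python. The page size is not stated on the slide; it is derived here from the Total row (262144 MB across 33554432 pages), and the function name `pages_to_mb` is an assumption for this sketch.

```python
# Totals copied from the ::memstat output on the slide.
TOTAL_PAGES = 33554432
TOTAL_MB = 262144

# Page size implied by the Total row, in KB (assumption: all pages are the same size).
PAGE_SIZE_KB = TOTAL_MB * 1024 // TOTAL_PAGES


def pages_to_mb(pages: int) -> int:
    """Convert a page count to whole megabytes, rounding down."""
    return pages * PAGE_SIZE_KB // 1024


kernel_mb = pages_to_mb(1473974)
freelist_mb = pages_to_mb(19591820)
freelist_pct = round(100 * 19591820 / TOTAL_PAGES)
print(PAGE_SIZE_KB, kernel_mb, freelist_mb, freelist_pct)
```

Run against the slide's values, this reproduces the MB figures for the Kernel and Free (freelist) rows and the 58 percent share of free pages.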
    • Monitoring Memory -Cont'd
      To find how much free memory is currently available in the system, we can use the vmstat command and look at the value in the free column (the unit is KB) for any line other than the first line.
      root@global_zone:~# vmstat 10
       kthr      memory              page             disk           faults        cpu
       r b w      swap      free  re   mf pi po fr de sr s3 s4 s5 s6    in    sy     cs us sy id
       1 0 0 202844144 233325872 315 1311  0  0  0  0  1 15 19 19 18 23352 32919  46222  3  4 93
       4 0 0 110774160 142093304 347 3681  0  0  0  0  0  0 27 15 18 72275 48754 148884  1 11 88
       5 0 0 110862440 142055728 347 3671  0  0  0  0  0 19 15 22 16 72286 48292 148838  1 11 88
       3 0 0 111113056 142043608 331 3525  0  0  0  0  0  0 20 29 20 70099 49362 143970  1 11 88
      The command output shows that the system has about 135 GB of free memory (roughly 142 million KB). This is the memory that has no association with any file or process.
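As a quick arithmetic check of the free-memory figure, the following Python sketch converts the free column from KB to GB using binary units (1 GB = 1024 x 1024 KB). The helper name `kb_to_gib` is invented for this example; with decimal units the result would be somewhat larger.

```python
KB_PER_GIB = 1024 * 1024


def kb_to_gib(kilobytes: int) -> float:
    """Convert a KB value (as reported in the vmstat free column) to binary gigabytes."""
    return kilobytes / KB_PER_GIB


# Free-column values copied from the vmstat rows on the slide (first row excluded,
# because it is a summary since boot rather than a current sample).
free_kb_samples = [142093304, 142055728, 142043608]
free_gib = [round(kb_to_gib(kb), 1) for kb in free_kb_samples]
print(free_gib)
```

All three current samples work out to roughly 135.5 GB of free memory in binary units.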
    • Monitoring Memory -Cont'd
      We can use the prstat command to see process statistics for the system and for the virtual machines (non-global zones).
      root@global_zone:~# prstat -ZmLc
         PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
       20025 hadoop    293M  253M cpu60   59    0   0:00:49  12% java/68
       20739 hadoop    285M  241M sleep   59    0   0:00:49  10% java/68
       17206 hadoop    285M  237M sleep   59    0   0:01:07  10% java/68
       17782 hadoop    281M  229M sleep   59    0   0:00:57 7.4% java/67
       17356 hadoop    289M  241M sleep   59    0   0:01:04 7.0% java/68
       11621 hadoop    166M  126M sleep   59    0   0:02:32 5.9% java/90
      ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
           4       74 7246M 6133M   2.3%   2:31:34  43% data-node2
           3       53 7442M 6248M   2.4%   2:23:01  30% data-node1
           5       52 7108M 6001M   2.3%   2:27:40  22% data-node3
           2       32  675M  468M   0.1%   0:04:36 4.0% name-node
           0       82  870M  414M   0.1%   1:19:20 1.0% global
      Total: 322 processes, 8024 lwps, load averages: 15.54, 18.25, 20.09
      From the prstat output, we can see the following information for each Oracle Solaris Zone:
      The SWAP column shows the total virtual memory size for each zone.
      The RSS column shows the total zone-resident set size (main memory usage).
      The MEMORY column shows the main memory consumed, as a percentage of system-wide resources.
      The CPU column shows the CPU consumed, as a percentage of system-wide resources.
      The ZONE column shows each zone's name.
    • Monitoring Memory -Cont'd
      Another tool that we can use to monitor memory utilization in a virtualized environment is the zvmstat command, which prints vmstat output for each zone.
      root@global_zone:~# /usr/dtrace/DTT/Bin/zvmstat 10
      ZONE           re  mf  fr  sr epi epo epf api apo apf fpi fpo fpf
      global        273 218   0   0   0   0   0   0   0   0   0   0   0
      sec-name-node   0   0   0   0   0   0   0   0   0   0   0   0   0
      name-node       0   0   0   0   0   0   0   0   0   0   0   0   0
      data-node1      0   0   0   0   0   0   0   0   0   0   0   0   0
      data-node2      0   0   0   0   0   0   0   0   0   0   0   0   0
      data-node3      0   0   0   0   0   0   0   0   0   0   0   0   0
    • Monitoring Network
      • In the Hadoop cluster, most of the network traffic is for HDFS data replication between the DataNodes.
      • The questions that we will be answering are as follows:
        • Which zones are seeing the highest and lowest network traffic?
        • Which is the busiest zone in terms of the number of network connections that it currently handles?
        • How can we monitor specific network resources, for example, Oracle Solaris Zones, physical network cards, or virtual network interface cards (VNICs)?
    • Network Architecture Layout
    • Monitoring Network -Cont'd
      First, let's view our network setup by using the dladm command to show how many physical network cards we have:
      root@global_zone:~# dladm show-phys
      LINK  MEDIA     STATE    SPEED  DUPLEX   DEVICE
      net0  Ethernet  up       1000   full     ixgbe0
      net2  Ethernet  unknown  0      unknown  ixgbe2
      net1  Ethernet  unknown  0      unknown  ixgbe1
      net3  Ethernet  unknown  0      unknown  ixgbe3
      net4  Ethernet  up       10     full     usbecm0
      Let's print the VNIC information:
      root@global_zone:~# dladm show-vnic
    • Monitoring Network -Cont'd
      We can use the zonestat command with the -r and -x options for extended networking information to pinpoint our measurements to specific Oracle Solaris Zones, for example, to monitor the network traffic on the three DataNode zones (data-node1, data-node2, and data-node3):
      root@global_zone:~# zonestat -z data-node1 -z data-node2 -z data-node3 -r network -x 10
      Collecting data for first interval...
      Interval: 1, Duration: 0:00:10
      NETWORK-DEVICE SPEED STATE TYPE
      net0 1000mbps up phys
      ZONE LINK TOBYTE MAXBW %MAXBW PRBYTE %PRBYTE POBYTE %POBYTE
      [total] net0 269M 198 0.00% 18.4E 100%
      global net0 2642 13474770085G 198 0.00% 284 0.00%
      data-node1 data-node1/net0 93.6M 0 0.00% 18.4E 100%
      data-node3 data-node3/net0 91.3M 0 0.00% 18.4E 100%
      data-node2 data-node2/net0 84.4M 0 0.00% 18.4E 100%
      name-node name-node/net0 304K 0 0.00% 18.4E 100%
      data-node3 data_node3 2340 0 0.00% 0 0.00%
      sec-name-node sec-name-node/net0 2340 0 0.00% 0 0.00%
      data-node2 data_node2 2280 0 0.00% 0 0.00%
      name-node name_node1 2280 0 0.00% 0 0.00%
      data-node1 data_node1 2220 0 0.00% 0 0.00%
      sec-name-node secondary_name1 2220 0 0.00% 0 0.00%
    • Monitoring Network -Cont'd We can drill down to a specific network resource for example: We can monitor physical network interface (net0). root@global_zone:~# dlstat net0 -i 10 LINK ^C IPKTS RBYTES net0 39.41K net0 45 net0 43 net0 41 OPKTS OBYTES 2.63M 8.16K 2.74K 1 2.61K 1 2.47K 1 1.44M 198 150 150 Note: To stop the dlstat command, press Ctrl-c.  We can monitor only the VNIC which is associated with the data-node1 zone. root@global_zone:~# dlstat name_node1 -i 10 LINK data_node1 data_node1 data_node1 data_node1 IPKTS 26.30K 42 43 31 RBYTES 1.59M 2.70K 2.58K 1.86K OPKTS 0 0 0 0 OBYTES 0 0 0 0
    • Monitoring Network -Cont'd We can also monitor our network traffic on a specific TCP or UDP port. This is useful if we want to monitor how the data replication between two Hadoop clusters is progressing, for example, data being replicated from Hadoop cluster A to Hadoop cluster B, which is located in a different data center
    • Monitoring Network -Cont'd Flow is a sophisticated quality of service (QoS) mechanism built into the new Oracle Solaris 11 network virtualization architecture, and it allows us to measure or limit the network bandwidth for a specific network port on a specific network interface In the following example, we will set up a flow that is associated with the TCP 8020 network port on the name_node1 network interface. Create the flow root@name-node:~# flowadm add-flow -l name_node1 -a transport=TCP,local_port=8020 distcp-flow Note: You don't need to reboot the zone in order to enable or disable the flow. This is very useful when you need to debug network performance issues on a production system! Verify the flow creation: root@name_node:~# flowadm show-flow FLOW LINK distcp-flow name_node1 IPADDR -- PROTO LPORT RPORT DSFLD tcp 8020 --
    • Monitoring Network -Cont'd
      To report the bandwidth on the distcp-flow flow, which monitors TCP port 8020, use the following command:
      root@name-node:~# flowstat -i 1
      FLOW            IPKTS   RBYTES  IDROPS    OPKTS   OBYTES  ODROPS
      distcp-flow    24.72M   37.17G       0    3.09M  204.08M       0
      distcp-flow   749.28K    1.13G       0   93.73K    6.19M       0
      distcp-flow   783.68K    1.18G       0   98.03K    6.47M       0
      distcp-flow   668.83K    1.01G       0   83.66K    5.52M       0
      distcp-flow   783.87K    1.18G       0   98.07K    6.47M       0
      distcp-flow   775.34K    1.17G       0   96.98K    6.40M       0
      distcp-flow   777.15K    1.17G       0   97.21K    6.42M       0
      ^C
      Note: To stop the flowstat command, press Ctrl-c.
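To turn the per-second RBYTES samples into an average receive rate, a small Python sketch like the following could be used. Two assumptions are made and should be verified against the flowstat documentation: that the K/M/G suffixes are powers of 1024, and that the first output row is a cumulative total rather than a one-second sample (it is roughly thirty times larger than the rows that follow). The names `UNITS` and `parse_size` are hypothetical.

```python
# Assumption: flowstat size suffixes are binary multiples (powers of 1024).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str, units: dict = UNITS) -> float:
    """Convert a value such as '1.13G' or '198' to a number of bytes (or packets)."""
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


# RBYTES values from the one-second samples on the slide; the first row
# (37.17G) is left out because it appears to be a cumulative total.
RBYTES_SAMPLES = ["1.13G", "1.18G", "1.01G", "1.18G", "1.17G", "1.17G"]

mean_bytes_per_s = sum(parse_size(value) for value in RBYTES_SAMPLES) / len(RBYTES_SAMPLES)
mean_gbit_per_s = mean_bytes_per_s * 8 / 1e9
print(f"{mean_bytes_per_s / 1024 ** 3:.2f} GB/s received, {mean_gbit_per_s:.2f} Gbit/s")
```

Averaging several intervals rather than reading a single row smooths out dips such as the 1.01G sample and gives a steadier figure for tracking how a replication job is progressing.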
    • Conclusion
      In this slide deck, we saw how we can leverage the new Oracle Solaris 11 performance analysis tools to observe and monitor a virtualized environment that hosts a Hadoop cluster.
      For more information:
      My blog: https://blogs.oracle.com/vreality
      Hands-on lab: http://www.oracle.com/technetwork/systems/hands-on-labs/hol-setuphadoop-solaris-2041770.html
      How to Set Up a Hadoop Cluster Using Oracle Solaris Zones: http://www.oracle.com/technetwork/articles/servers-storage-admin/howto-setuphadoop-zones-1899993.html
      Performance Analysis in a Multitenant Cloud Environment: http://www.oracle.com/technetwork/articles/servers-storage-admin/perf-analysis-multitenant-cloud-2082193.html
    • Questions