Application/OS performance: What does it depend on? Hands-on Lab
 


HP Technology Services Master Technologists Chris and Greg Tinker will demonstrate the advanced debugging and technical tactics HP Enterprise Technical Services engineers use to triage back-office IT events that could critically impact the business. This is a deep technical session with engineers demonstrating the methodologies they employ to address enterprise application and operating system issues.


    Application/OS performance: What does it depend on? Hands-on Lab: Presentation Transcript

    • © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
      Application/OS performance: What does it depend on?
      Greg Tinker, HP Master Technologist
      Chris Tinker, HP Master Technologist
      Month day, 2013
    • My background: Chris Tinker (chris.tinker@hp.com)
      Title: HP Master Technologist
      Years at HP: 14
      IT industry experience: published author; patents pending; social media/white papers
      Professional information: HP MVP; social media ambassador
      Current responsibilities: lead technologist for HP's Global Solution Support Engineering (GSSE) team
    • My background: Greg Tinker (greg.tinker@hp.com)
      Title: HP Master Technologist
      Years at HP: 14
      IT industry experience: published author; patents pending; social media/white papers
      Professional information: HP MVP; social media ambassador
      Current responsibilities: lead technologist for HP's Global Solution Support Engineering (GSSE) team
    • Application performance
    • The stack: layer overview
      User space: applications, user code, GNU C library
      Kernel space: system call interface; VFS (ext3, NTFS, VxFS, etc.); page allocation; MPIO (device mapper); char devices; LVM, VxVM, sd<alpha>; block device drivers (SCSI, IDE, etc.); sockets; memory; process; tasks; scheduler; interrupts; CPU; VM; logical protocols; network drivers; bus drivers
    • Overview
      Application performance: application execution; data access; managing resources; platform architecture
    • Architecture: CPU
      • Can an IA32 program run on an x86_64 machine? Can it run on PA-RISC? More generally, can an executable run on a machine for which it was not compiled?
      • Performance trade-offs
      • Magic: originally used to determine the binary object type (exec_magic, demand_magic, shared_magic, shmem_magic). Around 1999/2000 ELF was adopted as the new file format, replacing those magic values. An ELF file still begins with its own magic bytes, 7f 45 4c 46.
    • Architecture: CPU
      • Instruction set: leverage branch prediction
      • Frequency
      • Bus
      • Cache: L3, L2, and L1 (distance from the cores: registers, arithmetic logic units, branch units, load/store units, floating-point units, etc.)
      • CPU bus:
        - QPI: Intel QuickPath Interconnect
        - HTB: AMD HyperTransport bus
        - Front-side bus: older Intel/AMD
        - Runway bus: HP PA-RISC
      • NUMA
    • Architecture: execution, access to address space
      • Locality domains
      • Memory interleaving: node, channel, bank, cell (depends on hardware)
      • The OS's ability to determine locality domains and to differentiate the access cost from each domain to each other domain
      • SLIT: advanced performance tuning option in the BIOS of HP ProLiant systems
      • Integrity supports LDOMs (locality domains)
    • Architecture: execution, access to address space (interleaving)
      Memory bank interleaving
      When you use memory bank interleaving, data goes alternately to memory banks through the common memory channel connecting the DIMM banks and the integrated memory controller. Memory bank interleaving increases the probability that more DIMMs will remain in an active state (requiring more power) because the memory controller alternates between memory banks and between DIMMs.
      Memory bank interleaving is automatically enabled on a processor node under the following conditions:
      • Two single-rank DIMMs per channel result in two-way bank interleaving.
      • Two dual-rank DIMMs per channel result in four-way bank interleaving.
      • Two quad-rank DIMMs per channel result in eight-way bank interleaving.
      • Two dual-rank DIMMs and one quad-rank DIMM result in eight-way bank interleaving, in servers using three DIMMs per channel.
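The four conditions above can be written down as a small lookup. This is only a sketch: the function and table names are hypothetical, and the table covers just the configurations the slide lists rather than a general rule for other DIMM mixes.

```python
# Bank-interleaving outcomes exactly as listed on the slide.
# Key: sorted tuple of ranks per DIMM on one channel; value: interleave ways.
BANK_INTERLEAVE_WAYS = {
    (1, 1): 2,      # two single-rank DIMMs
    (2, 2): 4,      # two dual-rank DIMMs
    (4, 4): 8,      # two quad-rank DIMMs
    (2, 2, 4): 8,   # two dual-rank + one quad-rank (three DIMMs per channel)
}


def bank_interleave_ways(dimm_ranks):
    """Return the interleave ways for a listed configuration, else None."""
    return BANK_INTERLEAVE_WAYS.get(tuple(sorted(dimm_ranks)))


if __name__ == "__main__":
    for config in [(1, 1), (2, 2), (4, 4), (4, 2, 2), (1, 2)]:
        print(config, "->", bank_interleave_ways(config))
```

A configuration that the slide does not mention, such as one single-rank plus one dual-rank DIMM, returns None instead of a guess.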
    • Architecture: execution, access to address space (interleaving)
      Memory channel interleaving
      Memory channel interleaving transfers data by alternate routing through the two available memory channels. As a result, when the memory controller must access a block of logically contiguous memory, the requests don't stack up in the queue of a single channel. Alternate routing decreases memory access latency and increases performance. However, memory channel interleaving increases the probability that more DIMMs must remain in an active state.
      Memory channel interleaving is always active on AMD Opteron 6200 Series processors.
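A toy model can show why a contiguous block stops queuing on a single channel once channel interleaving is on. The 64-byte granularity, the 1 GiB channel size, and all names here are assumptions for illustration; real memory controllers choose their own address-to-channel mapping.

```python
from collections import Counter

LINE_BYTES = 64   # assumed interleave granularity (hypothetical)
CHANNELS = 2      # "the two available memory channels" from the slide


def channel_for(address, interleaved=True, channel_size=1 << 30):
    """Pick the channel that serves a physical address in this toy model."""
    if interleaved:
        # Consecutive lines alternate between the channels.
        return (address // LINE_BYTES) % CHANNELS
    # Without interleaving: fill channel 0 first, then channel 1.
    return min(address // channel_size, CHANNELS - 1)


def requests_per_channel(start, length, interleaved=True):
    """Count line-sized requests per channel for one contiguous block."""
    counts = Counter()
    for addr in range(start, start + length, LINE_BYTES):
        counts[channel_for(addr, interleaved)] += 1
    return dict(counts)


if __name__ == "__main__":
    print("interleaved:    ", requests_per_channel(0, 4096, True))
    print("not interleaved:", requests_per_channel(0, 4096, False))
```

With interleaving, a 4 KiB block splits evenly across the two channels; without it, every request lands on the same channel.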
    • Architecture: execution, access to address space (interleaving)
      Memory node interleaving: node interleaving can interleave memory across any subset of nodes in the multi-processor system.
      Memory cell interleaving: the way a multi-cell machine would interleave memory (cell-local vs. global; see Superdome partitioning).
    • Architecture: PA, Runway
      [Diagram: legacy Superdome cell. Labels: CC; CPUs P0 to P3, each on a Runway bus; memory quads 0 to 3; MID0 and MID1 data and address + control paths; M2 blocks.]
    • Architecture: Intel FSB
      [Diagram: legacy FSB]
    • Architecture: AMD HTB
      [Diagram: DL685 HyperTransport bus]
    • Architecture: Intel QPI
      [Image: http://www.intel.com/content/dam/staging/image/Kim/quickpath-technology.png]
    • Architecture: bus limits
      Bandwidth is limited by the lanes and the protocols. Manufacturers standardize on a PCI bus for the cards and slots:
      • 2X 32-bit PCI @ 33 MHz, ~125 MB/s
      • 4X 64-bit PCI @ 33/66 MHz
      • 4X 64-bit PCI-X @ 66 MHz
      • 4X 32-bit PCI-X @ 133 MHz
      • 8X 64-bit PCI-X @ 133 MHz, ~1024 MB/s
      PCI-e replaces the older PCI architecture above and is capable of significantly higher signaling rates: 8 Gbit/s per lane. Expect this to increase as protocols become more efficient.
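The bus figures above follow from width times clock. The sketch below uses nominal clocks, so it lands near, not exactly on, the approximate numbers quoted on the slide. The 128b/130b line encoding used for the PCI-e lane is an assumption taken from the PCIe 3.0 generation, not from the slides, and the function names are hypothetical.

```python
def parallel_bus_mb_s(width_bits, clock_mhz):
    """Peak MB/s of a parallel bus that moves one word per clock."""
    return width_bits / 8 * clock_mhz


def pcie_lane_mb_s(gigatransfers_per_s=8.0, payload_bits=128, line_bits=130):
    """Peak MB/s of one PCI-e lane after line encoding (defaults: Gen3)."""
    return gigatransfers_per_s * 1000 * payload_bits / line_bits / 8


if __name__ == "__main__":
    print("32-bit PCI   @ 33 MHz :", parallel_bus_mb_s(32, 33), "MB/s")
    print("64-bit PCI-X @ 133 MHz:", parallel_bus_mb_s(64, 133), "MB/s")
    print("PCI-e lane   @ 8 GT/s :", round(pcie_lane_mb_s(), 1), "MB/s")
```

Multiplying the per-lane figure by the lane count (x4, x8, x16) gives the peak for a slot; protocol overhead lowers what a device actually achieves.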
    • Architecture: bus limits
      Different types of memory have very different performance profiles:
      • Anywhere from 800 MHz to 1333 MHz
      • http://h18004.www1.hp.com/products/servers/options/tool/hp_memtool.html
    • Architecture: bus limits
      SLIT (System Locality Information Table)
      • Allows the BIOS to send the hardware layout to the OS
      • The OS must support SLIT in order to leverage these latency factors
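On Linux, the relative node distances the kernel is using can be read back from sysfs, which is one way to check whether locality information reached the OS. This is a sketch assuming a Linux host that exposes `/sys/devices/system/node/node*/distance`; the function names are hypothetical, and on other systems the reader simply returns an empty table.

```python
import glob
import os


def parse_distance_row(text):
    """Turn one sysfs distance line, e.g. '10 21', into a list of ints."""
    return [int(x) for x in text.split()]


def read_numa_distances(base="/sys/devices/system/node"):
    """Return {node_id: [distance to node 0, 1, ...]} or {} if unavailable."""
    table = {}
    for path in glob.glob(os.path.join(base, "node[0-9]*", "distance")):
        node_dir = os.path.basename(os.path.dirname(path))
        node_id = int(node_dir[len("node"):])
        try:
            with open(path) as f:
                table[node_id] = parse_distance_row(f.read())
        except OSError:
            continue  # unreadable entry: skip it rather than fail
    return table


if __name__ == "__main__":
    for node, row in sorted(read_numa_distances().items()):
        print("node", node, "->", row)
```

A row where every remote entry equals the local one suggests the OS sees the machine as flat, so locality-aware placement has nothing to work with.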
    • Execution
    • Execution: objects
      Compiled or interpreted: speed vs. agility. Interpreted code can change at runtime.
      • Interpreted is indirectly executed
      • Compiled is directly executed
      Many languages today implement just-in-time compilers.
      • Perl is compiled by the Perl engine before it is executed (so it is first interpreted, then compiled, then executed). Of course, you can also compile Perl to produce an executable object.
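The compile-before-execute step described for Perl can be observed in Python as well, used here only as a stand-in because its standard library exposes the two phases separately. The helper name and the sample source line are hypothetical.

```python
import dis

SOURCE = "result = 6 * 7"


def compile_then_run(source):
    """Compile source text to a code object, then execute that object."""
    code = compile(source, "<slide-demo>", "exec")  # source -> bytecode
    namespace = {}
    exec(code, namespace)                           # bytecode -> execution
    return code, namespace


if __name__ == "__main__":
    code, ns = compile_then_run(SOURCE)
    print("result =", ns["result"])
    dis.dis(code)  # show the bytecode the interpreter actually runs
```

The source text is only parsed once; what runs afterwards is the compiled form, which is why the compiled/interpreted split is less sharp than it first appears.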
    • Execution: objects, CPU executable types
      Cross-platform
      [Diagram: IA-64 (~ RX8600) and x86_64 (~ DL980) hosts with the executable types 32-bit ELF, PA-RISC, MIPS, IA64 ELF, IA32, and ELF-64 / x86_64]
      Use of emulation engines:
      • ARIES: HP-UX platform engine that allows PA-RISC executables to run on the IA64 OS kernel and platform
      • binfmt: Linux driver module that allows for emulation of many architecture types
    • Execution: language examples
      Compiled: C, C++, C#; Visual Basic .NET; Python; Lisp; Java; Perl*
      Interpreted: BASIC; PostScript; Python; scripting languages; Perl
    • Execution: determine object type
      # file <string>
      • Uses the magic to determine the file type
      # file /boot/vmlinuz-3.0.0-26-generic-pae
      /boot/vmlinuz-3.0.0-26-generic-pae: Linux kernel x86 boot executable bzImage, version 3.0.0-26-generic-pae (buildd@roseapple) #42-Ubuntu SMP Wed Sep , RO-rootFS, root_dev 0x801, swap_dev 0x4, Normal VGA
      # file /bin/ls
      /bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
      # readelf -a /bin/ls | head -50
      ELF Header:
        Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
        Class:                             ELF32
        Data:                              2's complement, little endian
        Version:                           1 (current)
        OS/ABI:                            UNIX - System V
        ABI Version:                       0
        Type:                              EXEC (Executable file)
        Machine:                           Intel 80386
        Version:                           0x1
        Entry point address:               0x804be34
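The first 16 bytes in the readelf output above are enough to recover the class and byte order by hand. The sketch below decodes only those identification bytes; the function and constant names are hypothetical, and a full parser would go on to read the type, machine, and entry point fields.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "ELF32", 2: "ELF64"}
EI_DATA = {1: "little endian", 2: "big endian"}

# The 16 "Magic" bytes printed by readelf on the slide.
SLIDE_MAGIC = bytes.fromhex("7f454c46010101000000000000000000")


def parse_elf_ident(ident):
    """Decode the ELF identification bytes; return None if not ELF."""
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        return None
    return {
        "class": EI_CLASS.get(ident[4], "unknown"),
        "data": EI_DATA.get(ident[5], "unknown"),
        "version": ident[6],
        "osabi": ident[7],   # 0 is UNIX System V
    }


if __name__ == "__main__":
    print(parse_elf_ident(SLIDE_MAGIC))
```

To inspect a real file, read its first 16 bytes in binary mode and pass them to `parse_elf_ident`.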
    • Execution: sharing resources
      • System V message queues
      • Mutex locks
      • Data sharing
      • Context switching
      • Data access
      The never-forgiving sleep(): an interrupt is a better way to go.
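The cost of a sleep()-based wait can be shown with two waiters watching the same flag. This is an analogy using Python threads, not a hardware interrupt: the blocking `Event.wait()` stands in for "being woken when something happens". The helper names, the delay, and the intervals are assumptions for illustration.

```python
import threading
import time


def wait_by_polling(flag, interval):
    """Sleep-based wait: notices the flag only when a sleep expires."""
    start = time.perf_counter()
    while not flag.is_set():
        time.sleep(interval)
    return time.perf_counter() - start


def wait_by_event(flag, timeout=5.0):
    """Blocking wait: the waiter is woken as soon as the flag is set."""
    start = time.perf_counter()
    flag.wait(timeout)
    return time.perf_counter() - start


def measure(waiter, delay=0.05, **kwargs):
    """Set the flag after `delay` seconds and time how long the waiter took."""
    flag = threading.Event()
    threading.Timer(delay, flag.set).start()
    return waiter(flag, **kwargs)


if __name__ == "__main__":
    print("polling every 0.5 s: %.3f s" % measure(wait_by_polling, interval=0.5))
    print("blocking wait:       %.3f s" % measure(wait_by_event))
```

The polling waiter always pays up to one full interval of extra latency, and shrinking the interval trades that latency for wasted wakeups.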
    • Execution: processes and threads
      execve()
      #include <unistd.h>
      int execve(const char *filename, char *const argv[], char *const envp[]);
      filename must be a binary executable or a script whose first line names its interpreter with "#!".
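Whether a file takes the "#!" route can be checked by reading its first line. A minimal sketch with a hypothetical helper name; it reports what the script asks for and does not verify that the interpreter exists.

```python
def interpreter_for(path):
    """Return the '#!' interpreter line of a script, or None if absent."""
    with open(path, "rb") as f:
        first = f.readline(256)
    if not first.startswith(b"#!"):
        return None
    return first[2:].strip().decode("utf-8", "replace")


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "->", interpreter_for(name))
```

A return value of None means the file has to be a native binary format the kernel understands if execve() is to succeed.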
    • Execution: processes and threads
      exec(), fork(), clone(), vfork(), clone2(), etc.
      Examples:
      16935 fork() = 17424    <-- new task (HWP)
      17424 execve("/bin/ls", ["ls", "-F", "--color=auto", "-l", "test"], [/* 56 vars */]) = 0
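The fork-then-execve pair in the trace above can be reproduced from Python, whose `os` module wraps the same system calls. This sketch assumes a Unix-like host; the environment variable name `DEMO_EXIT_CODE` and the function name are hypothetical.

```python
import os
import sys


def fork_and_exec(exit_code):
    """fork() a child, execve() a new program in it, and reap the child."""
    env = dict(os.environ, DEMO_EXIT_CODE=str(exit_code))
    argv = [sys.executable, "-c",
            "import os, sys; sys.exit(int(os.environ['DEMO_EXIT_CODE']))"]
    pid = os.fork()                 # parent gets the child's PID, child gets 0
    if pid == 0:
        try:
            os.execve(sys.executable, argv, env)  # replaces the child image
        finally:
            os._exit(127)           # reached only if execve() failed
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, code = fork_and_exec(7)
    print("child", child_pid, "exited with", code)
```

Running the script under strace with `-f` should show the same two-line pattern as the slide: the parent's fork (or clone) returning the child PID, then the child's execve.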
    • Execution: processes and threads
      • HWP (heavyweight process): fork() creates a new process
      • LWP (lightweight process): a thread, created with clone()
      The major difference is in the sharing of resources. An HWP shares only the parent's text, whereas an LWP can share everything but its private stack. HWPs use pipes, PF_UNIX (Unix sockets), signals, or the inter-process communication (IPC) facilities (shared memory, message queues, and semaphores) to share data.
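The difference in sharing can be seen by letting a thread and a forked child each modify the same variable. A minimal sketch assuming a Unix-like host; the function name is hypothetical, and a pipe is used to carry the child's value back because its memory is no longer shared with the parent.

```python
import os
import threading


def sharing_demo():
    """Return (value after thread, parent value after fork, child's value)."""
    counter = [0]

    def bump(amount):
        counter[0] += amount

    # LWP: the thread works on the very same object as its creator.
    t = threading.Thread(target=bump, args=(1,))
    t.start()
    t.join()
    after_thread = counter[0]

    # HWP: the child works on its own copy and must report back explicitly.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        bump(100)
        os.write(write_fd, str(counter[0]).encode())
        os._exit(0)
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return after_thread, counter[0], child_value


if __name__ == "__main__":
    print(sharing_demo())
```

The thread's change is visible to the parent immediately, while the child's change exists only in the child until it is sent over the pipe.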
    • Execution: processes and threads
      [Diagram: UNIX processes (single-threaded process, multithreaded process) next to Linux processes (single-threaded process, multithreaded process), where a task group contains the process/task and its thread(s)]
    • Execution: basic portions of the address space
      • Text: machine code instructions. The OS usually sets this to read-only, which allows many instances of the same executable to reference a single copy; the application code normally does not change.
      • Data:
        - Initialized read-only
        - Initialized read/write
        - Uninitialized data
        - Heap: dynamically allocated memory
      • Stack: local variables, stack frames
      • Shared memory
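On Linux these portions can be seen per process in `/proc/<pid>/maps`. The classifier below is a rough sketch: it assumes the usual six-column layout of that file, the rules are simplified (file-backed executable mappings count as text, for instance), and the sample lines are made up for illustration.

```python
def classify_mapping(line):
    """Roughly label one /proc/<pid>/maps line by address-space portion."""
    fields = line.split(None, 5)
    perms = fields[1]
    name = fields[5].strip() if len(fields) > 5 else ""
    if name == "[heap]":
        return "heap"
    if name.startswith("[stack"):
        return "stack"
    if perms.endswith("s"):
        return "shared memory"
    if name.startswith("/"):
        if "x" in perms:
            return "text"
        if "w" in perms:
            return "data (read/write)"
        return "data (read-only)"
    return "other"


# Hypothetical sample lines in /proc/<pid>/maps format.
SAMPLE = [
    "00400000-0040c000 r-xp 00000000 08:01 131 /bin/cat",
    "0060b000-0060c000 r--p 0000b000 08:01 131 /bin/cat",
    "0060c000-0060d000 rw-p 0000c000 08:01 131 /bin/cat",
    "01b3f000-01b60000 rw-p 00000000 00:00 0 [heap]",
    "7ffd1c0e5000-7ffd1c106000 rw-p 00000000 00:00 0 [stack]",
]

if __name__ == "__main__":
    for entry in SAMPLE:
        print(classify_mapping(entry).ljust(18), entry)
```

On a Linux host, feeding the lines of `/proc/self/maps` through the same function gives a quick picture of a live process.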
    • Execution: memory, user address space
      [Diagram: a single-threaded process with text (Main(), routine1(), routine2(), …), data (Array1, Array2, …), heap, and one stack (routine, var1(), var2(), …), next to a multithreaded process with the same text, data, and heap plus one thread stack per thread]
    • Execution: tempered by
      • Logic: compiler optimization; execution flow
      • CPU: hardware; scheduler (task switching)
      • Data fetch: memory; IO
      • Locks and/or IPC
    • Profiling
    • Profiling: toolbag
      • Application instrumentation: gprof, Valgrind, Visual Studio, Komodo, Xcode, and many others
      • Compiler instrumentation: at compile time, use flags to leverage trace pointers
      • Kernel tracing: great for understanding what the application is doing when it enters kernel space
      • System profiling
      • Environment profiling
    • Profiling
      The layer involved and the precision required determine the toolbox.
      What is the application waiting on?
      • CPU
      • Networking
      • Disk
      • Filesystem
      • Locks?
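A first cut at "what is the application waiting on?" is to compare elapsed time with CPU time: a large gap means the work was mostly waiting rather than computing. A minimal sketch with hypothetical function names; it says that the code waited, not what it waited on, which is where the tracing tools come in.

```python
import time


def wait_fraction(func, *args):
    """Fraction of wall-clock time during func that was not CPU time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    if wall <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - cpu / wall))


def spin(seconds):
    """Burn CPU for roughly `seconds` of process CPU time."""
    end = time.process_time() + seconds
    while time.process_time() < end:
        pass


if __name__ == "__main__":
    print("sleeping: %.2f" % wait_fraction(time.sleep, 0.2))
    print("spinning: %.2f" % wait_fraction(spin, 0.2))
```

A sleeping call scores close to 1 and a busy loop close to 0 on an idle machine; real workloads fall in between.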
    • Profiling
      IPC:
      • Semaphores: semop(), semctl(); locking of resources
      • Message queues: msgsnd() / msgrcv()
      • Shared memory: shmget(), shmat()
      • RPC (request/response framework): normally leverages sockets but can leverage pipes (no network)
      Network access:
      • Socket (layer 5)
      • TCP/IP (transport)
      • Segments, frames
      • RTT
      • Sliding windows
      • BDP (bandwidth delay product)
      • Latency
      • Throughput/bandwidth
      • Serialization/parallelization
      • Flow control
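The RTT, sliding window, and BDP items are tied together by simple arithmetic: the BDP is the amount of data that must be in flight to fill the link, and a window smaller than the BDP caps throughput at window divided by RTT. The link speed, RTT, and window size below are made-up example values, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bandwidth delay product: bytes in flight needed to fill the link."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def max_throughput_bits_per_s(window_bytes, rtt_seconds):
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    link = 1e9          # 1 Gbit/s link (example value)
    rtt = 0.010         # 10 ms round-trip time (example value)
    window = 65535      # 64 KiB window without scaling (example value)
    print("BDP: %.0f bytes" % bdp_bytes(link, rtt))
    print("ceiling: %.1f Mbit/s" % (max_throughput_bits_per_s(window, rtt) / 1e6))
```

In this example the window is far smaller than the BDP, so the connection tops out at a small fraction of the link speed no matter how fast the endpoints are.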
    • Profiling: the toolbox (example)
      Linux: collectl / Glance; strace; Kitrace / OProfile
      Windows: Perfmon / Sysinternals; Sysinternals, Xperf; Logman / Perfmon / PAL
      HP-UX: Glance; tusc; Kitrace, Caliper
      Solaris: Glance; truss / strace
      AIX: topas; truss; trace
      ESX: esxtop
    • Profiling: Glance
      [Diagram: application, object execution, profiling, labs, platform architecture]
    • Labs
    • Labs: Scenario 1
      1. Where do you start?
      2. What data would you collect?
      3. How would you analyze it?
    • Thank you