Eserver pSeries 
"Any sufficiently advanced technology will 
have the appearance of magic." 
…Arthur C. Clarke 
© 2003 IBM Corporation 
Section 2: The Technology
^Eserver pSeries 
Section Objectives 
 On completion of this unit you should be able to: 
– Describe the relationship between technology and 
solutions. 
– List key IBM technologies that are part of the POWER5 
products. 
– Be able to describe the functional benefits that these 
technologies provide. 
– Be able to discuss the appropriate use of these 
technologies. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
IBM and Technology 
Solutions 
Products 
Technology 
Science 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Technology and innovation 
 Having technology available is a necessary first 
step. 
 Finding creative new ways to use the technology 
for the benefit of our clients is what innovation is 
about. 
 Solution design is an opportunity for innovative 
application of technology. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
When technology won’t ‘fix’ the problem 
 When the technology is not related to the problem. 
 When the client has unreasonable expectations. 
© 2003 Concepts of Solution Design IBM Corporation
Eserver pSeries 
© 2003 IBM Corporation 
POWER5 Technology
^Eserver pSeries 
POWER4 and POWER5 Cores 
POWER4 Core POWER5 Core 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER5 
 Designed for entry and high-end 
servers 
 Enhanced memory subsystem 
 Improved performance 
 Simultaneous Multi-Threading 
 Hardware support for Shared 
Processor Partitions (Micro- 
Partitioning) 
 Dynamic power management 
 Compatibility with existing 
POWER4 systems 
 Enhanced reliability, 
availability, serviceability 
[Diagram: POWER5 chip with two SMT cores, 1.9 MB L2 cache, L3 directory, memory controller, enhanced distributed switch, chip-chip / MCM-MCM / SMP link, and GX+ bus]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Enhanced memory subsystem 
 Improved L1 cache design 
– 2-way set associative i-cache 
– 4-way set associative d-cache 
– New replacement algorithm (LRU vs. FIFO) 
 Larger L2 cache 
– 1.9 MB, 10-way set associative 
 Improved L3 cache design 
– 36 MB, 12-way set associative 
– L3 on the processor side of the fabric 
– Satisfies L2 cache misses more frequently 
– Avoids traffic on the interchip fabric 
 On-chip L3 directory and memory controller 
– L3 directory on the chip reduces off-chip delays 
after an L2 miss 
– Reduced memory latencies 
 Improved pre-fetch algorithms 
[Diagram: POWER5 chip with two SMT cores, 1.9 MB L2 cache, L3 directory, memory controller, enhanced distributed switch, and chip-chip / MCM-MCM / SMP link]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Enhanced memory subsystem 
POWER4 system structure vs. POWER5 system structure
[Diagram: in the POWER4 structure, the L3 cache sits between each chip's fabric controller and the memory controller; in the POWER5 structure, the L3 cache and memory controller attach directly to the processor chip]
Benefits shown: reduced L3 latency, faster access to memory, larger SMPs (64-way), number of chips cut in half
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Simultaneous Multi-Threading (SMT) 
 What is it? 
 Why would I want it? 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER4 pipeline
[Diagram: POWER4 instruction pipeline (with the POWER5 pipeline shown for comparison): instruction fetch, instruction crack and group formation, branch redirects, interrupts and flushes, and out-of-order processing through the branch, load/store, fixed-point, and floating-point pipelines]
POWER4 instruction pipeline (IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, and CP = group commit)
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Multi-threading evolution
 Execution unit utilization is low in today’s microprocessors
 Average execution unit utilization is about 25% across a broad spectrum of environments
[Diagram: execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) across processor cycles for a single instruction stream fed from the i-cache and memory; next evolution step]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Coarse-grained multi-threading
 Two instruction streams, one thread executing at any instant
 Hardware swaps in the second thread when a long-latency event occurs
 Swap requires several cycles
[Diagram: execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) across processor cycles, with swaps between the two instruction streams fed from the i-cache and memory; next evolution step]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Coarse-grained multi-threading (Cont.) 
 Processor (for example, RS64-IV) is able to store context for 
two threads 
– Rapid switching between threads minimizes lost cycles due 
to I/O waits and cache misses. 
– Can yield ~20% improvement for OLTP workloads. 
 Coarse-grained multi-threading is only beneficial where the number of active threads exceeds twice the number of CPUs
– AIX must create a “dummy” thread if there are insufficient 
numbers of real threads. 
• Unnecessary switches to “dummy” threads can degrade 
performance ~20% 
• Does not work with dynamic CPU deallocation 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Fine-grained multi-threading
 Variant of coarse-grained multi-threading
 Threads execute in round-robin fashion
 A cycle remains unused when a thread encounters a long-latency event
[Diagram: execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) across processor cycles, with instruction streams alternating each cycle, fed from the i-cache and memory; next evolution step]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER5 pipeline
[Diagram: POWER5 instruction pipeline (with the POWER4 pipeline shown for comparison): instruction fetch, instruction crack and group formation, branch redirects, interrupts and flushes, and out-of-order processing through the branch, load/store, fixed-point, and floating-point pipelines]
POWER5 instruction pipeline (IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, and CP = group commit)
© 2003 Concepts of Solution Design IBM Corporation 
^Eserver pSeries 
Simultaneous multi-threading (SMT)
 Reduction in unused execution units results in a 25-40% performance boost, and sometimes more
[Diagram: execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) across processor cycles, with two instruction streams issuing in the same cycle, fed from the i-cache and memory; first evolution step]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Simultaneous multi-threading (SMT) (Cont.) 
 Each chip appears as a 4-way SMP to software 
– Allows instructions from two threads to execute 
simultaneously 
 Processor resources optimized for enhanced SMT 
performance 
– No context switching, no dummy threads 
 Hardware, POWER Hypervisor, or OS controlled thread 
priority 
– Dynamic feedback of shared resources allows for balanced 
thread execution 
 Dynamic switching between single and multithreaded mode 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Dynamic resource balancing 
 Threads share many 
resources 
– Global Completion Table, 
Branch History Table, 
Translation Lookaside Buffer, 
and so on 
 Higher performance realized 
when resources balanced 
across threads 
– Tendency to drift toward 
extremes accompanied by 
reduced performance 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Adjustable thread priority
 Instances when unbalanced execution is desirable
– No work for opposite thread
– Thread waiting on lock
– Software-determined non-uniform balance
– Power management
 Control instruction decode rate
– Software/hardware controls eight priority levels for each thread
[Chart: instructions per cycle for thread 0 and thread 1 across hardware thread priority pairs (0,7 2,7 4,7 6,7 7,7 7,6 7,4 7,2 7,0 1,1), including single-threaded operation and power save mode]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Single-threaded operation 
 Advantageous for execution unit 
limited applications 
– Floating or fixed point intensive 
workloads 
 Execution unit limited applications 
provide minimal performance 
leverage for SMT 
– Extra resources necessary for SMT 
provide higher performance benefit 
when dedicated to single thread 
 Determined dynamically on a per 
processor basis 
[Diagram: thread states (active, dormant, null) with hardware- or software-initiated transitions between them]
© 2003 Concepts of Solution Design IBM Corporation
Eserver pSeries 
© 2003 IBM Corporation 
Micro-Partitioning
^Eserver pSeries 
Micro-Partitioning overview 
 Mainframe-inspired technology
 Virtualized resources shared by multiple partitions
 Benefits
– Finer-grained resource allocation
– More partitions (up to 254)
– Higher resource utilization 
 New partitioning model 
– POWER Hypervisor 
– Virtual processors 
– Fractional processor capacity partitions 
– Operating system optimized for Micro-Partitioning exploitation 
– Virtual I/O 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Processor terminology 
[Diagram: processor terminology layers, from installed physical processors (deconfigured, inactive CUoD, dedicated, shared) through the shared processor pool and entitled capacity to virtual and logical (SMT) processors, for a dedicated processor partition with SMT off and shared processor partitions with SMT on or off]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Shared processor partitions 
 Micro-Partitioning allows for multiple partitions to 
share one physical processor 
 Up to 10 partitions per physical processor 
 Up to 254 partitions active at the same time 
 Partition’s resource definition 
– Minimum, desired, and maximum values for each 
resource 
– Processor capacity 
– Virtual processors 
– Capped or uncapped 
• Capacity weight 
– Dedicated memory 
• Minimum of 128 MB and 16 MB increments 
– Physical or virtual I/O resources 
[Diagram: several LPARs (LPAR 1 through LPAR 6) sharing a pool of physical CPUs]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Understanding min/max/desired resource values 
 The desired value for a resource is given to a 
partition if enough resource is available. 
 If there is not enough resource to meet the desired 
value, then a lower amount is allocated. 
 If there is not enough resource to meet the min 
value, the partition will not start. 
 The maximum value is only used as an upper limit 
for dynamic partitioning operations. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Partition capacity entitlement 
 Processing units 
– 1.0 processing unit represents one 
physical processor 
 Entitled processor capacity 
– Commitment of capacity that is 
reserved for the partition 
– Set upper limit of processor 
utilization for capped partitions 
– Each virtual processor must be 
granted at least 1/10 of a 
processing unit of entitlement 
 Shared processor capacity is 
always delivered in terms of whole 
physical processors 
[Diagram: processing capacity examples; 1 physical processor = 1.0 processing units, sample allocations of 0.5 and 0.4 processing units, minimum requirement 0.1 processing units]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Capped and uncapped partitions 
 Capped partition 
– Not allowed to exceed its entitlement 
 Uncapped partition 
– Is allowed to exceed its entitlement 
 Capacity weight 
– Used for prioritizing uncapped partitions 
– Value 0-255 
– Value of 0 referred to as a “soft cap” 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Partition capacity entitlement example 
 Shared pool has 2.0 processing units 
available 
 LPARs activated in sequence 
 Partition 1 activated 
– Min = 1.0, max = 2.0, desired = 1.5 
– Starts with 1.5 allocated processing units 
 Partition 2 activated 
– Min = 1.0, max = 2.0, desired = 1.0 
– Does not start 
 Partition 3 activated 
– Min = 0.1, max = 1.0, desired = 0.8 
– Starts with 0.5 allocated processing units 
© 2003 Concepts of Solution Design IBM Corporation
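The activation rule in this example can be sketched in a few lines of shell. This is purely illustrative (not HMC or Hypervisor code); it works in tenths of a processing unit to avoid floating-point arithmetic, and the partition names and values simply mirror the example above.

#!/bin/ksh
# Illustrative sketch: a partition gets its desired capacity if it fits,
# otherwise whatever is left (if that still meets its minimum),
# otherwise it does not start. Values are tenths of a processing unit.
pool=20                               # 2.0 processing units in the shared pool

activate() {                          # activate <name> <min> <desired>
    name=$1; min=$2; desired=$3
    if [ "$pool" -ge "$desired" ]; then
        alloc=$desired
    elif [ "$pool" -ge "$min" ]; then
        alloc=$pool
    else
        echo "$name: does not start (only $pool tenths left, min is $min)"
        return
    fi
    pool=$((pool - alloc))
    echo "$name: starts with $alloc tenths of a processing unit"
}

activate "Partition 1" 10 15    # min 1.0, desired 1.5 -> starts with 1.5
activate "Partition 2" 10 10    # min 1.0, desired 1.0 -> does not start
activate "Partition 3"  1  8    # min 0.1, desired 0.8 -> starts with 0.5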
^Eserver pSeries 
Understanding capacity allocation – An example 
 A workload is run under different configurations. 
 The size of the shared pool (number of physical 
processors) is fixed at 16. 
 The capacity entitlement for the partition is fixed 
at 9.5. 
 No other partitions are active. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Uncapped – 16 virtual processors 
Uncapped (16PPs/16VPs/9.5CE) 
15 
10 
5 
 16 virtual processors. 
 Uncapped. 
 Can use all available resource. 
 The workload requires 26 minutes to complete. 
© 2003 Concepts of Solution Design IBM Corporation 
0 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
Elapsed time
^Eserver pSeries 
Uncapped – 12 virtual processors 
Uncapped (16PPs/12VPs/9.5CE) 
15 
10 
5 
 12 virtual processors. 
 Even though the partition is uncapped, it can only use 12 
processing units. 
 The workload now requires 27 minutes to complete. 
© 2003 Concepts of Solution Design IBM Corporation 
0 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
Elapsed time
^Eserver pSeries 
Capped (16PPs/12VPs/9.5E) 
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
Elapses time 
© 2003 Concepts of Solution Design IBM Corporation 
Capped 
15 
10 
5 
0 
 The partition is now capped and resource utilization is 
limited to the capacity entitlement of 9.5. 
– Capping limits the amount of time each virtual processor is 
scheduled. 
– The workload now requires 28 minutes to complete.
^Eserver pSeries 
Dynamic partitioning operations 
 Add, move, or remove processor capacity 
– Remove, move, or add entitled shared processor capacity 
– Change between capped and uncapped processing 
– Change the weight of an uncapped partition 
– Add and remove virtual processors 
• Provided CE / VP > 0.1 
 Add, move, or remove memory 
– 16 MB logical memory block 
 Add, move, or remove physical I/O adapter slots 
 Add or remove virtual I/O adapter slots 
 Min/max values defined for LPARs set the bounds within 
which DLPAR can work 
© 2003 Concepts of Solution Design IBM Corporation
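For reference, dynamic LPAR operations like those above are typically driven from the HMC command line. The following is a hedged sketch only: the managed system name and partition name are placeholders, and the exact chhwres options are assumptions that can vary by HMC release, so verify them against your HMC documentation.

# Add 0.5 processing units and one virtual processor to partition lpar1
# (managed system "p5-system" and partition "lpar1" are placeholders).
chhwres -m p5-system -r proc -o a -p lpar1 --procunits 0.5 --procs 1

# Add 256 MB of memory (16 MB logical memory blocks) to the same partition.
chhwres -m p5-system -r mem -o a -p lpar1 -q 256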
^Eserver pSeries 
Dynamic LPAR 
 Standard on all new systems
[Diagram: four live partitions (Production on AIX 5L, Test/Dev on AIX 5L, File/Print on Linux, Legacy Apps on AIX 5L) running above the Hypervisor and managed from an HMC, with resources moving between the live partitions]
© 2003 Concepts of Solution Design IBM Corporation
Eserver pSeries 
© 2003 IBM Corporation 
Firmware 
POWER Hypervisor
^Eserver pSeries 
POWER Hypervisor strategy 
 New Hypervisor for POWER5 systems 
– Further convergence with iSeries 
– But brands will retain unique value propositions 
– Reduced development effort 
– Faster time to market 
 New capabilities on pSeries servers 
– Shared processor partitions 
– Virtual I/O 
 New capability on iSeries servers 
– Can run AIX 5L 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER Hypervisor component sourcing 
[Diagram: POWER Hypervisor components and their pSeries or iSeries heritage: H-call interface, nucleus (SLIC), virtual I/O, virtual Ethernet/VLAN, shared processor LPAR, Capacity on Demand, partition on demand (255 partitions), message passing, load from flash, location codes, I/O configuration, bus recovery, dump, drawer and slot/tower concurrent maintenance, FSP, NVRAM, HMC, and LAN/VLAN/SCSI IOAs]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER Hypervisor functions 
 Same functions as POWER4 Hypervisor.
– Dynamic LPAR
– Capacity Upgrade on Demand
 New, active functions.
– Dynamic Micro-Partitioning
– Shared processor pool
– Virtual I/O
– Virtual LAN
 Machine is always in LPAR mode.
– Even with all resources dedicated to one OS
[Diagram: dynamic LPAR, dynamic Micro-Partitioning, shared processor pools (CPU 0-3 built from POWER5 chips), virtual I/O (disk and LAN), and Capacity Upgrade on Demand (planned versus actual client capacity growth)]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER Hypervisor implementation 
 Design enhancements to previous POWER4 
implementation enable the sharing of processors 
by multiple partitions 
– Hypervisor decrementer (HDECR) 
– New Processor Utilization Resource Register (PURR) 
– Refine virtual processor objects 
• Does not include physical characteristics of the processor 
– New Hypervisor calls 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER Hypervisor processor dispatch 
 Manages a set of processors on the machine (the shared processor pool).
 POWER5 generates a 10 ms dispatch window.
– Minimum allocation is 1 ms per physical processor.
 Each virtual processor is guaranteed to get its entitled share of processor cycles during each 10 ms dispatch window.
– ms/VP = CE * 10 / VPs
 The partition entitlement is evenly distributed among the online virtual processors.
 Once a capped partition has received its CE within a dispatch interval, it becomes not-runnable.
 A VP dispatched within 1 ms of the end of the dispatch interval will receive half its CE at the start of the next dispatch interval.
[Diagram: the POWER Hypervisor's processor dispatch mapping virtual processor capacity entitlement for six shared processor partitions onto the shared processor pool (CPU 0-3)]
© 2003 Concepts of Solution Design IBM Corporation
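As a quick worked check of the ms/VP formula above, using the capacity entitlements from the dispatch example later in this section, bc on AIX can do the arithmetic:

# ms per virtual processor in each 10 ms dispatch window: ms/VP = CE * 10 / VPs
echo "scale=1; 0.8 * 10 / 2" | bc    # LPAR1: CE 0.8, 2 VPs -> 4.0 ms per VP
echo "scale=1; 0.6 * 10 / 3" | bc    # LPAR3: CE 0.6, 3 VPs -> 2.0 ms per VP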
^Eserver pSeries 
Dispatching and interrupt latencies 
 Virtual processors have dispatch latency. 
 Dispatch latency is the time between a virtual 
processor becoming runnable and being actually 
dispatched. 
 Timers have latency issues also. 
 External interrupts have latency issues also. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Shared processor pool 
 Processors not associated with 
dedicated processor partitions. 
 No fixed relationship between virtual 
processors and physical processors. 
 The POWER Hypervisor attempts to 
use the same physical processor. 
– Affinity scheduling 
– Home node 
[Diagram: the POWER Hypervisor's processor dispatch mapping virtual processor capacity entitlement for six shared processor partitions onto the shared processor pool (CPU 0-3)]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Affinity scheduling 
 When dispatching a VP, the POWER Hypervisor attempts to 
preserve affinity by using: 
– Same physical processor as before, or 
– Same chip, or 
– Same MCM 
 When a physical processor becomes idle, the POWER 
Hypervisor looks for a runnable VP that: 
– Has affinity for it, or 
– Has affinity to no-one, or 
– Is uncapped 
 Similar to AIX affinity scheduling 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Operating system support 
 Micro-Partitioning capable operating systems need to be modified 
to cede a virtual processor when they have no runnable work 
– Failure to do this results in wasted CPU resources 
• For example, a partition spends its CE waiting for I/O
– Results in better utilization of the pool 
 May confer the remainder of their timeslice to another VP 
– For example, a VP holding a lock 
 Can be redispatched if they become runnable again during the 
same dispatch interval 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Example
LPAR1: capacity entitlement = 0.8 processing units; virtual processors = 2 (capped)
LPAR2: capacity entitlement = 0.2 processing units; virtual processors = 1 (capped)
LPAR3: capacity entitlement = 0.6 processing units; virtual processors = 3 (capped)
[Diagram: dispatch of the LPAR virtual processors on physical processors 0 and 1 across two consecutive 10 ms POWER Hypervisor dispatch intervals (0-10 ms and 10-20 ms), with idle gaps where no virtual processor is runnable]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
POWER Hypervisor and virtual I/O 
 I/O operations without dedicating resources to an individual 
partition 
 POWER Hypervisor’s virtual I/O related operations 
– Provide control and configuration structures for virtual 
adapter images required by the logical partitions 
– Operations that allow partitions controlled and secure access 
to physical I/O adapters in a different partition 
– The POWER Hypervisor does not own any physical I/O 
devices; they are owned by an I/O hosting partition 
 I/O types supported 
– SCSI 
– Ethernet 
– Serial console 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Performance monitoring and accounting 
 CPU utilization is measured against CE.
– An uncapped partition receiving more than its CE will record 100% but will be using more.
 SMT
– Thread priorities compound the variable speed rate.
– Twice as many logical CPUs.
 For accounting, intervals may be incorrectly allocated.
– New hardware support is required.
 The Processor Utilization Resource Register (PURR) records actual clock ticks spent executing a partition.
– Used by performance commands (for example, new flags) and accounting modules.
– Third-party tools will need to be modified.
© 2003 Concepts of Solution Design IBM Corporation
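On AIX 5L V5.3, PURR-based utilization shows up in the updated performance commands. A minimal sketch using lparstat follows; the interpretation of the output columns is my reading of the AIX 5.3 tools and should be checked on the target system.

# Partition utilization every 2 seconds, 5 samples; %entc reports the share of
# entitled capacity consumed (an uncapped partition can exceed 100%).
lparstat 2 5

# Static partition configuration: capped/uncapped mode, entitlement,
# online virtual processors, SMT state.
lparstat -i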
Eserver pSeries 
© 2003 IBM Corporation 
Virtual I/O Server
^Eserver pSeries 
Virtual I/O Server 
 Provides an operating environment for virtual I/O administration 
– Virtual I/O server administration 
– Restricted scriptable command line user interface (CLI) 
 Minimum hardware requirements 
– POWER5 VIO capable machine 
– Hardware management console 
– Storage adapter 
– Physical disk 
– Ethernet adapter 
– At least 128 MB of memory 
 Capabilities of the Virtual I/O Server 
– Ethernet Adapter Sharing 
– Virtual SCSI disk 
• Virtual I/O Server Version 1.1 supports selected configurations, which include specific
models of EMC, HDS, and STK disk subsystems attached using Fibre Channel
– Interacts with AIX and Linux partitions 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual I/O Server (Cont.) 
 Installation CD when Advanced POWER 
Virtualization feature is ordered 
 Configuration approaches for high availability 
– Virtual I/O Server 
• LVM mirroring 
• Multipath I/O 
• EtherChannel 
– Second virtual I/O server instance in another partition 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual SCSI 
 Allows sharing of storage devices 
 Vital for shared processor partitions 
– Overcomes potential limit of adapter slots due to Micro- 
Partitioning 
– Allows the creation of logical partitions without the need for 
additional physical resources 
 Allows attachment of previously unsupported storage 
solutions 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
VSCSI server and client architecture overview 
 Virtual SCSI is based on a 
client/server relationship. 
 The virtual I/O resources are assigned 
using an HMC. 
 Virtual SCSI enables sharing of 
adapters as well as disk devices. 
 Dynamic LPAR operations allowed. 
 Dynamic mapping between physical and virtual resources on the Virtual I/O Server.
[Diagram: AIX and Linux client partitions with VSCSI client adapters connected through the POWER Hypervisor to VSCSI server adapters in the Virtual I/O Server partition, which maps logical volumes (through the LVM) and physical disks (SCSI, FC) behind a physical adapter to the clients as hdisks]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual devices
 Are defined as LVs in the I/O server partition
– Normal LV rules apply
 Appear as real devices (hdisks) in the hosted partition
 Can be manipulated using the Logical Volume Manager just like an ordinary physical disk
 Can be used as a boot device and as a NIM target
 Can be shared by multiple clients
[Diagram: a logical volume in the Virtual I/O Server partition, exported through a VSCSI server adapter and the POWER Hypervisor, appears as a virtual disk (hdisk) in the client partition's LVM]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
SCSI RDMA and Logical Remote Direct Memory Access 
 SCSI transport protocols define the 
rules for exchanging information 
between SCSI initiators and targets. 
 Virtual SCSI uses the SCSI RDMA 
Protocol (SRP). 
– SCSI initiators and targets have the 
ability to directly transfer information 
between their respective address 
spaces. 
 SCSI requests and responses are 
sent using the Virtual SCSI adapters. 
 The actual data transfer, however, is 
done using the Logical Redirected 
DMA protocol. 
[Diagram: the VSCSI device driver (initiator) in the AIX client partition and the VSCSI device driver (target) in the Virtual I/O Server partition exchange SCSI requests over the reliable command/response transport, while data buffers move by Logical Remote Direct Memory Access through the POWER Hypervisor to the physical adapter device driver and physical adapter]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual SCSI security 
 Only the owning partition has access to its data. 
 Data is copied directly from the PCI adapter to the client's memory.
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Performance considerations 
 Twice as many processor cycles to do VSCSI as a locally attached 
disk I/O (evenly distributed on the client partition and virtual I/O 
server) 
– The path of each virtual I/O request involves several sources of 
overhead that are not present in a non-virtual I/O request. 
– For a virtual disk backed by the LVM, there is also the performance 
impact of going through the LVM and disk device drivers twice. 
 If multiple partitions are competing for resources from a VSCSI 
server, care must be taken to ensure enough server resources 
(CPU, memory, and disk) are allocated to do the job. 
 If not constrained by CPU performance, dedicated partition 
throughput is comparable to doing local I/O. 
 Because there is no caching in memory on the server I/O partition,
its memory requirements should be modest.
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Limitations 
 Hosting partition must be available before hosted 
partition boot. 
 Virtual SCSI supports FC, parallel SCSI, and SCSI 
RAID. 
 Maximum of 65535 virtual slots in the I/O server 
partition. 
 Maximum of 256 virtual slots on a single partition. 
 Support for all mandatory SCSI commands. 
 Not all optional SCSI commands are supported. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Implementation guideline 
 Partitions with high performance and disk I/O 
requirements are not recommended for 
implementing VSCSI. 
 Partitions with very low performance and disk I/O 
requirements can be configured at minimum 
expense to use only a portion of a logical volume. 
 Boot disks for the operating system. 
 Web servers that will typically cache a lot of data. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
LVM mirroring
 This configuration protects virtual disks in a client partition against failure of:
– One physical disk
– One physical adapter
– One virtual I/O server
 Many possibilities exist to exploit this great function!
[Diagram: a client partition with two VSCSI client adapters, each connected through the POWER Hypervisor to a VSCSI server adapter in a different Virtual I/O Server partition, each server backed by its own physical SCSI adapter and physical disk; the client mirrors the two virtual disks with the LVM]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Multipath I/O
 This configuration protects virtual disks in a client partition against:
– Failure of one physical FC adapter in one I/O server
– Failure of one Virtual I/O server
 Physical disk is assigned as a whole to the client partition
 Many possibilities exist to exploit this great function!
[Diagram: a client partition with two VSCSI client adapters, each connected through the POWER Hypervisor to a VSCSI server adapter in a different Virtual I/O Server partition; each server reaches the same ESS disk through its own physical FC adapter and a SAN switch]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual LAN overview 
 Virtual network segments on top of 
physical switch devices. 
 All nodes in the VLAN can 
communicate without any L3 
routing or inter-VLAN bridging. 
 VLANs provide: 
– Increased LAN security 
– Flexible network deployment over 
traditional network devices 
 VLAN support in AIX is based on 
the IEEE 802.1Q VLAN 
implementation. 
– VLAN ID tagging to Ethernet 
frames 
– VLAN ID restricted switch ports 
[Diagram: nodes on VLAN 1 and VLAN 2 spread across switches A, B, and C; nodes in one VLAN cannot reach nodes in the other without L3 routing]
© 2003 Concepts of Solution Design IBM Corporation
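On AIX, an 802.1Q VLAN is configured as a pseudo-adapter layered over a physical Ethernet adapter. A hedged sketch follows; the adapter name and VLAN ID are illustrative, and the attribute names are my reading of the AIX 5L VLAN support described above (the same dialog is reachable through smitty vlan), so verify them on the target system.

# Create a VLAN pseudo-device with VLAN tag 2 on physical adapter ent0.
# AIX returns a new adapter (for example ent1) that is then configured
# like any other Ethernet interface.
mkdev -c adapter -s vlan -t eth -a base_adapter=ent0 -a vlan_tag_id=2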
^Eserver pSeries 
Virtual Ethernet 
 Enables inter-partition communication. 
– In-memory point to point connections 
 Physical network adapters are not needed. 
 Similar to high-bandwidth Ethernet connections. 
 Supports multiple protocols (IPv4, IPv6, and ICMP). 
 No Advanced POWER Virtualization feature required. 
– POWER5 Systems 
– AIX 5L V5.3 or appropriate Linux level 
– Hardware management console (HMC) 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual Ethernet connections 
 VLAN technology implementation 
– Partitions can only access data directed to 
them. 
 Virtual Ethernet switch provided by the 
POWER Hypervisor 
 Virtual LAN adapters appear to the OS as physical adapters 
– MAC-Address is generated by the HMC. 
 1-3 Gb/s transmission speed 
– Support for large MTUs (~64K) on AIX. 
 Up to 256 virtual Ethernet adapters 
– Up to 18 VLANs. 
 Bootable device support for NIM OS 
installations 
[Diagram: AIX and Linux partitions, each with a virtual Ethernet adapter, connected to the virtual Ethernet switch provided by the POWER Hypervisor]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual Ethernet switch 
 Based on IEEE 802.1Q VLAN standard 
– OSI-Layer 2 
– Optional Virtual LAN ID (VID) 
– 4094 virtual LANs supported 
– Up to 18 VIDs per virtual LAN port 
 Switch configuration through HMC 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
How it works 
[Flowchart: a frame sent through a virtual Ethernet adapter arrives at its virtual VLAN switch port; the POWER Hypervisor caches the source MAC, checks for an IEEE VLAN header and inserts one with the configured port VLAN ID if absent, verifies the port is allowed for that VLAN, looks up the destination MAC in its table, and then either delivers the frame, passes it to a defined trunk adapter, or drops the packet]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Performance considerations 
 Virtual Ethernet performance 
– Throughput scales nearly linear with the 
allocated capacity entitlement 
 Virtual LAN vs. Gigabit Ethernet 
throughput 
– Virtual Ethernet adapter has higher raw 
throughput at all MTU sizes 
– In-memory copy is more efficient at larger 
MTU 
[Charts: virtual Ethernet throughput per 0.1 CPU entitlement (Mb/s) for MTU sizes 1500, 9000, and 65394 at entitlements from 0.1 to 1.0; and TCP_STREAM throughput (Mb/s) for virtual LAN versus Gigabit Ethernet at MTU 1500, 9000, and 65394, simplex and duplex]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Limitations 
 Virtual Ethernet can be used in both shared and 
dedicated processor partitions provided with the 
appropriate OS levels. 
 A mixture of Virtual Ethernet connections, real network
adapters, or both is permitted within a partition. 
 Virtual Ethernet can only connect partitions within a 
single system. 
 A system’s processor load is increased when using 
virtual Ethernet. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Implementation guideline 
 Know your environment and the network traffic. 
 Choose a high MTU size if it makes sense for the network traffic in the Virtual LAN.
 Use MTU size 65394 if you expect a large amount of data to be copied inside your Virtual LAN.
 Enable tcp_pmtu_discover and udp_pmtu_discover in conjunction with MTU size 65394 (see the sketch after this list).
 Do not turn off SMT. 
 No dedicated CPUs are required for virtual Ethernet 
performance. 
© 2003 Concepts of Solution Design IBM Corporation
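A short sketch of those last few guidelines on an AIX client partition; the interface name en1 is an assumption for the virtual Ethernet interface on that partition.

# Use the large virtual Ethernet MTU for partition-to-partition traffic
# (en1 assumed to be the virtual Ethernet interface).
chdev -l en1 -a mtu=65394

# Enable path MTU discovery so traffic leaving the virtual LAN is sized correctly.
no -o tcp_pmtu_discover=1
no -o udp_pmtu_discover=1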
^Eserver pSeries 
Connecting Virtual Ethernet to external networks 
 Routing 
– The partition that routes the traffic to the external network does not necessarily have to be the Virtual I/O Server.
[Diagram: two systems, each with AIX and Linux partitions on a virtual Ethernet switch in the POWER Hypervisor; in each system an AIX partition with a virtual adapter (3.1.1.1 or 4.1.1.1) and a physical adapter (1.1.1.100 or 2.1.1.100) routes between the internal subnet and the external IP subnets 1.1.1.x and 2.1.1.x, which are joined by an IP router]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Shared Ethernet Adapter 
 Connects internal and external VLANs using one physical 
adapter. 
 SEA is a new service that acts as a layer 2 network switch. 
– Securely bridges network traffic from a virtual Ethernet 
adapter to a real network adapter 
 SEA service runs in the Virtual I/O Server partition. 
– Advanced POWER Virtualization feature required 
– At least one physical Ethernet adapter required 
 No physical I/O slot and network adapter required in the 
client partition. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Shared Ethernet Adapter (Cont.) 
 Virtual Ethernet MAC are visible to outside systems. 
 Broadcast/multicast is supported. 
 ARP (Address Resolution Protocol) and NDP (Neighbor Discovery 
Protocol) can work across a shared Ethernet. 
 One SEA can be shared by multiple VLANs and multiple subnets 
can connect using a single adapter on the Virtual I/O Server. 
 Virtual Ethernet adapter configured in the Shared Ethernet Adapter 
must have the trunk flag set. 
– The trunk Virtual Ethernet adapter enables a layer-2 bridge to a 
physical adapter 
 IP fragmentation is performed or an ICMP packet too big message 
is sent when the shared Ethernet adapter receives IP (or IPv6) 
packets that are larger than the MTU of the adapter that the packet 
is forwarded through. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Virtual Ethernet and Shared Ethernet Adapter security 
 VLAN (virtual local area network) tagging description taken 
from the IEEE 802.1Q standard. 
 The implementation of this VLAN standard ensures that the 
partitions have no access to foreign data. 
 Only the network adapters (virtual or physical) that are 
connected to a port (virtual or physical) that belongs to the 
same VLAN can receive frames with that specific VLAN ID. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Performance considerations 
 Virtual I/O-Server 
performance 
– Adapters stream data at 
media speed if the Virtual 
I/O server has enough 
capacity entitlement. 
– CPU utilization per Gigabit 
of throughput is higher with 
a Shared Ethernet adapter. 
[Charts: Virtual I/O Server TCP_STREAM throughput (Mb/s) and normalized CPU utilization (% CPU per Gb) at MTU 1500 and 9000, simplex and duplex]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Limitations 
 System processors are used for all communication 
functions, leading to a significant amount of system 
processor load. 
 One of the virtual adapters in the SEA on the Virtual I/O 
server must be defined as a default adapter with a default 
PVID. 
 Up to 16 Virtual Ethernet adapters with 18 VLANs on each 
can be shared on a single physical network adapter. 
 Shared Ethernet Adapter requires: 
– POWER Hypervisor component of POWER5 
systems 
– AIX 5L Version 5.3 or appropriate Linux level 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Implementation guideline 
 Know your environment and the network traffic. 
 Use a dedicated network adapter if you expect heavy 
network traffic between Virtual Ethernet and local 
networks. 
 If possible, use dedicated CPUs for the Virtual I/O 
Server. 
 Choose 9000 for MTU size, if this makes sense for 
your network traffic. 
 Don’t use Shared Ethernet Adapter functionality for 
latency critical applications. 
 With MTU size 1500, you need about 1 CPU per 
gigabit Ethernet adapter streaming at media speed. 
 With MTU size 9000, 2 Gigabit Ethernet adapters can 
stream at media speed per CPU. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Shared Ethernet Adapter configuration 
 The Virtual I/O Server is 
configured with at least one 
physical Ethernet adapter. 
 One Shared Ethernet Adapter 
can be shared by multiple 
VLANs. 
 Multiple subnets can connect 
using a single adapter on the 
Virtual I/O Server. 
[Diagram: an AIX partition on VLAN 1 (10.1.1.11) and a Linux partition on VLAN 2 (10.1.2.11) reach external AIX (10.1.1.14) and Linux (10.1.2.15) servers through a Shared Ethernet Adapter in the Virtual I/O Server that bridges both VLANs over one physical adapter (ent0)]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Multiple Shared Ethernet Adapter configuration 
 Maximizing throughput 
– Using several Shared Ethernet 
Adapters 
– More queues 
– More performance 
[Diagram: the same AIX and Linux partitions on VLAN 1 and VLAN 2, but the Virtual I/O Server bridges each VLAN through its own Shared Ethernet Adapter (ent0 and ent1), each with its own physical adapter]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Multipath routing with dead gateway detection 
 This configuration protects 
your access to the external 
network against: 
– Failure of one physical 
network adapter in one I/O 
server 
– Failure of one Virtual I/O 
server 
– Failure of one gateway 
[Diagram: an AIX partition with interfaces on VLAN 1 (9.3.5.11) and VLAN 2 (9.3.5.21) uses multipath routing with dead gateway detection (default routes to gateway 9.3.5.10 via 9.3.5.12 and to gateway 9.3.5.20 via 9.3.5.22) through two Virtual I/O Servers, each bridging one VLAN to the external network with its own Shared Ethernet Adapter and physical adapter]
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Shared Ethernet Adapter commands 
 Virtual I/O Server commands 
– lsdev -type adapter: Lists all the virtual and physical adapters. 
– Choose the virtual Ethernet adapter you want to map to the physical
Ethernet adapter. 
– Make sure the physical and virtual interfaces are unconfigured 
(down or detached). 
– mkvdev: Maps the physical adapter to the virtual adapter, creates a 
layer 2 bridge, and defines the default virtual adapter with its default 
VLAN ID. It creates a new Ethernet interface (for example, ent5). 
– The mktcpip command is used for TCP/IP configuration on the new 
Ethernet interface (for example, ent5). 
 Client partition commands 
– No new commands are needed; the typical TCP/IP configuration is 
done on the virtual Ethernet interface that it is defined in the client 
partition profile on the HMC. 
© 2003 Concepts of Solution Design IBM Corporation
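Putting the Virtual I/O Server steps above together, a hedged example follows; the device names (ent0 for the physical adapter, ent2 for the trunk virtual adapter, ent5/en5 for the resulting interface) and the addresses are illustrative only.

lsdev -type adapter
# choose the physical adapter (here ent0) and the trunk virtual adapter (here ent2),
# make sure both interfaces are unconfigured, then bridge them;
# mkvdev creates a new Ethernet interface, for example ent5
mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
# configure TCP/IP on the new interface
mktcpip -hostname vios1 -inetaddr 10.1.1.2 -interface en5 -netmask 255.255.255.0 -gateway 10.1.1.1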
^Eserver pSeries 
Virtual SCSI commands 
 Virtual I/O Server commands 
– To map a LV: 
• mkvg: Creates the volume group, where a new LV will be created using 
the mklv command. 
• lsdev: Shows the virtual SCSI server adapters that could be used for 
mapping with the LV. 
• mkvdev: Maps the virtual SCSI server adapter to the LV. 
• lsmap -all: Shows the mapping information. 
– To map a physical disk: 
• lsdev: Shows the virtual SCSI server adapters that could be used for 
mapping with a physical disk. 
• mkvdev: Maps the virtual SCSI server adapter to a physical disk. 
• lsmap -all: Shows the mapping information. 
 Client partition commands 
– No new commands needed; the typical device configuration uses 
the cfgmgr command. 
© 2003 Concepts of Solution Design IBM Corporation
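And a hedged end-to-end example of the virtual SCSI commands above on the Virtual I/O Server; the volume group, logical volume, vhost adapter, and disk names are illustrative.

# Map a logical volume to a virtual SCSI server adapter.
mkvg -vg rootvg_clients hdisk2
mklv -lv lv_client1 rootvg_clients 10G
lsdev -virtual                          # find the vhost (VSCSI server) adapters
mkvdev -vdev lv_client1 -vadapter vhost0
lsmap -all                              # verify the mapping

# Or map a whole physical disk instead.
mkvdev -vdev hdisk3 -vadapter vhost0

# On the client partition, the new hdisk appears after running:
cfgmgr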
^Eserver pSeries 
Section Review Questions 
1. Any technology improvement will boost 
performance of any client solution. 
a. True 
b. False 
2. The application of technology in a creative way 
to solve client’s business problems is one 
definition of innovation. 
a. True 
b. False 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Section Review Questions 
3. Client’s satisfaction with your solution can be 
enhanced by which of the following? 
a. Setting expectations appropriately. 
b. Applying technology appropriately. 
c. Communicating the benefits of the technology to the 
client. 
d. All of the above. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Section Review Questions 
4. Which of the following are available with 
POWER5 architecture? 
a. Simultaneous Multi-Threading. 
b. Micro-Partitioning. 
c. Dynamic power management. 
d. All of the above. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Section Review Questions 
5. Simultaneous Multi-Threading is the same as 
hyperthreading, IBM just gave it a different 
name. 
a. True. 
b. False. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Section Review Questions 
6. In order to bridge network traffic between the 
Virtual Ethernet and external networks, the 
Virtual I/O Server has to be configured with at 
least one physical Ethernet adapter. 
a. True. 
b. False. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Review Question Answers 
© 2003 Concepts of Solution Design IBM Corporation 
1. b 
2. a 
3. d 
4. d 
5. b 
6. a
^Eserver pSeries 
Unit Summary 
 You should now be able to: 
– Describe the relationship between technology and 
solutions. 
– List key IBM technologies that are part of the POWER5 
products. 
– Be able to describe the functional benefits that these 
technologies provide. 
– Be able to discuss the appropriate use of these 
technologies. 
© 2003 Concepts of Solution Design IBM Corporation
^Eserver pSeries 
Reference 
 You may find more information here: 
IBM eServer pSeries AIX 5L Support for Micro-Partitioning 
and Simultaneous Multi-threading White Paper 
Introduction to Advanced POWER Virtualization on IBM 
eServer p5 Servers SG24-7940 
IBM eServer p5 Virtualization – Performance 
Considerations SG24-5768 
© 2003 Concepts of Solution Design IBM Corporation
More Related Content

What's hot

Parallel Sysplex Implement2
Parallel Sysplex Implement2Parallel Sysplex Implement2
Parallel Sysplex Implement2ggddggddggdd
 
Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Eric Van Hensbergen
 
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldOMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldAllan Cantle
 
Presentation oracle on power power advantages and license optimization
Presentation   oracle on power power advantages and license optimizationPresentation   oracle on power power advantages and license optimization
Presentation oracle on power power advantages and license optimizationsolarisyougood
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsShinya Takamaeda-Y
 

What's hot (13)

Sakar jain
Sakar jainSakar jain
Sakar jain
 
Parallel Sysplex Implement2
Parallel Sysplex Implement2Parallel Sysplex Implement2
Parallel Sysplex Implement2
 
Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)Scalable Elastic Systems Architecture (SESA)
Scalable Elastic Systems Architecture (SESA)
 
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing WorldOMI - The Missing Piece of a Modular, Flexible and Composable Computing World
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World
 
Ludden q3 2008_boston
Ludden q3 2008_bostonLudden q3 2008_boston
Ludden q3 2008_boston
 
Ludden power7 verification
Ludden power7 verificationLudden power7 verification
Ludden power7 verification
 
101 cd 1415-1445
101 cd 1415-1445101 cd 1415-1445
101 cd 1415-1445
 
Eldo_Premier_2015
Eldo_Premier_2015Eldo_Premier_2015
Eldo_Premier_2015
 
Presentation oracle on power power advantages and license optimization
Presentation   oracle on power power advantages and license optimizationPresentation   oracle on power power advantages and license optimization
Presentation oracle on power power advantages and license optimization
 
Larrabee
LarrabeeLarrabee
Larrabee
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
 
Ph.D. Thesis presentation
Ph.D. Thesis presentationPh.D. Thesis presentation
Ph.D. Thesis presentation
 

Similar to Technology (1)

Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overviewlambertt
 
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...Michael Gschwind
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computingrinnocente
 
Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Sal Marcus
 
Decoupling Compute from Memory, Storage and IO with OMI
Decoupling Compute from Memory, Storage and IO with OMIDecoupling Compute from Memory, Storage and IO with OMI
Decoupling Compute from Memory, Storage and IO with OMIAllan Cantle
 
Ics21 workshop decoupling compute from memory, storage & io with omi - ...
Ics21 workshop   decoupling compute from memory, storage & io with omi - ...Ics21 workshop   decoupling compute from memory, storage & io with omi - ...
Ics21 workshop decoupling compute from memory, storage & io with omi - ...Vaibhav R
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsAnand Haridass
 
Collaborate07kmohiuddin
Collaborate07kmohiuddinCollaborate07kmohiuddin
Collaborate07kmohiuddinSal Marcus
 
Presentation best practices for optimal configuration of oracle databases o...
Presentation   best practices for optimal configuration of oracle databases o...Presentation   best practices for optimal configuration of oracle databases o...
Presentation best practices for optimal configuration of oracle databases o...xKinAnx
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
 
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex systemIbm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex systemIBM Switzerland
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 

Similar to Technology (1) (20)

Technology
TechnologyTechnology
Technology
 
Technology
TechnologyTechnology
Technology
 
Power 7 Overview
Power 7 OverviewPower 7 Overview
Power 7 Overview
 
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
M. Gschwind, A novel SIMD architecture for the Cell heterogeneous chip multip...
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computing
 
CISC & RISC Architecture
CISC & RISC Architecture CISC & RISC Architecture
CISC & RISC Architecture
 
11136442.ppt
11136442.ppt11136442.ppt
11136442.ppt
 
Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006Orcl siebel-sun-s282213-oow2006
Orcl siebel-sun-s282213-oow2006
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Decoupling Compute from Memory, Storage and IO with OMI
Decoupling Compute from Memory, Storage and IO with OMIDecoupling Compute from Memory, Storage and IO with OMI
Decoupling Compute from Memory, Storage and IO with OMI
 
Ics21 workshop decoupling compute from memory, storage & io with omi - ...
Ics21 workshop   decoupling compute from memory, storage & io with omi - ...Ics21 workshop   decoupling compute from memory, storage & io with omi - ...
Ics21 workshop decoupling compute from memory, storage & io with omi - ...
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
Collaborate07kmohiuddin
Collaborate07kmohiuddinCollaborate07kmohiuddin
Collaborate07kmohiuddin
 
Presentation best practices for optimal configuration of oracle databases o...
Presentation   best practices for optimal configuration of oracle databases o...Presentation   best practices for optimal configuration of oracle databases o...
Presentation best practices for optimal configuration of oracle databases o...
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
@IBM Power roadmap 8
@IBM Power roadmap 8 @IBM Power roadmap 8
@IBM Power roadmap 8
 
Palestra IBM-Mack Zvm linux
Palestra  IBM-Mack Zvm linux  Palestra  IBM-Mack Zvm linux
Palestra IBM-Mack Zvm linux
 
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex systemIbm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
Ibm symp14 referent_marcus alexander mac dougall_ibm x6 und flex system
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 

Technology (1)

  • 11. ^Eserver pSeries Simultaneous Multi-Threading (SMT)  What is it?  Why would I want it? © 2003 Concepts of Solution Design IBM Corporation
  • 12. ^Eserver pSeries Out-of-order processing. Figure: the POWER4 instruction pipeline, showing instruction fetch, instruction crack and group formation, branch redirects, interrupts and flushes, and the branch, load/store, fixed-point, and floating-point pipelines (IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, and CP = group commit). © 2003 Concepts of Solution Design IBM Corporation
  • 13. ^Eserver pSeries Multi-threading evolution  Execution unit utilization is low in today’s microprocessors  Average execution unit utilization is roughly 25% across a broad spectrum of environments. Figure: occupancy of the execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) over successive processor cycles for a single instruction stream fed from the i-cache and memory. © 2003 Concepts of Solution Design IBM Corporation
  • 14. ^Eserver pSeries Coarse-grained multi-threading  Two instruction streams, but only one thread executes at any instant  Hardware swaps in the second thread when a long-latency event occurs  A swap requires several cycles. Figure: occupancy of the execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) over processor cycles, with swaps between the two instruction streams at long-latency events. © 2003 Concepts of Solution Design IBM Corporation
  • 15. ^Eserver pSeries Coarse-grained multi-threading (Cont.)  Processor (for example, RS64-IV) is able to store context for two threads – Rapid switching between threads minimizes lost cycles due to I/O waits and cache misses. – Can yield ~20% improvement for OLTP workloads.  Coarse-grained multi-threading is only beneficial where the number of active threads exceeds twice the number of CPUs – AIX must create a “dummy” thread if there is an insufficient number of real threads. • Unnecessary switches to “dummy” threads can degrade performance ~20% • Does not work with dynamic CPU deallocation © 2003 Concepts of Solution Design IBM Corporation
  • 16. ^Eserver pSeries Fine-grained multi-threading  Variant of coarse-grained multi-threading  Threads execute in round-robin fashion, one per cycle  A cycle remains unused when a thread encounters a long-latency event. Figure: occupancy of the execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) over processor cycles, alternating between the two instruction streams. © 2003 Concepts of Solution Design IBM Corporation
  • 17. ^Eserver pSeries POWER5 pipeline. Figure: the POWER5 instruction pipeline, identical in structure to the POWER4 pipeline, showing instruction fetch, instruction crack and group formation, branch redirects, interrupts and flushes, and the branch, load/store, fixed-point, and floating-point pipelines (IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, and CP = group commit). © 2003 Concepts of Solution Design IBM Corporation
  • 18. ^Eserver pSeries Simultaneous multi-threading (SMT)  Reducing the number of unused execution unit slots yields a 25-40% performance boost, and sometimes more. Figure: occupancy of the execution units (FX0, FX1, LS0, LS1, FP0, FP1, BFX, CRL) over processor cycles, with instructions from both instruction streams issued in the same cycle. © 2003 Concepts of Solution Design IBM Corporation
  • 19. ^Eserver pSeries Simultaneous multi-threading (SMT) (Cont.)  Each chip appears as a 4-way SMP to software – Allows instructions from two threads to execute simultaneously  Processor resources optimized for enhanced SMT performance – No context switching, no dummy threads  Hardware, POWER Hypervisor, or OS controlled thread priority – Dynamic feedback of shared resources allows for balanced thread execution  Dynamic switching between single and multithreaded mode © 2003 Concepts of Solution Design IBM Corporation
  • 20. ^Eserver pSeries Dynamic resource balancing  Threads share many resources – Global Completion Table, Branch History Table, Translation Lookaside Buffer, and so on  Higher performance realized when resources balanced across threads – Tendency to drift toward extremes accompanied by reduced performance © 2003 Concepts of Solution Design IBM Corporation
  • 21. ^Eserver pSeries Adjustable thread priority  Instances when unbalanced execution is desirable – No work for opposite thread – Thread waiting on lock – Software determined non-uniform balance – Power management  Control instruction decode rate – Software/hardware controls eight priority levels for each thread. Figure: instructions per cycle for thread 0 and thread 1 at hardware thread priority pairs ranging from 0,7 through 7,7 to 7,0 and at 1,1, with single-threaded operation and power save mode indicated. © 2003 Concepts of Solution Design IBM Corporation
  • 22. ^Eserver pSeries Single-threaded operation  Advantageous for execution-unit-limited applications – Floating-point or fixed-point intensive workloads  Execution-unit-limited applications provide minimal performance leverage for SMT – Extra resources necessary for SMT provide a higher performance benefit when dedicated to a single thread  Determined dynamically on a per-processor basis. Figure: thread states (Dormant, Null, Active) with hardware- or software-initiated transitions between them. © 2003 Concepts of Solution Design IBM Corporation
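On AIX 5L V5.3, the SMT mode of a partition can be switched dynamically with the smtctl command. A minimal sketch, assuming a POWER5 partition running AIX 5L V5.3 (check the smtctl documentation on your level for the exact options):

    smtctl                   # show the current SMT status of the partition
    smtctl -m off -w now     # switch the partition to single-threaded mode immediately
    smtctl -m on -w boot     # re-enable SMT at the next operating system reboot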
  • 23. Eserver pSeries © 2003 IBM Corporation Micro-Partitioning
  • 24. ^Eserver pSeries Micro-Partitioning overview  Mainframe inspired technology  Virtualized resources shared by multiple partitions  Benefits – Finer grained resource allocation – More partitions (Up to 254) – Higher resource utilization  New partitioning model – POWER Hypervisor – Virtual processors – Fractional processor capacity partitions – Operating system optimized for Micro-Partitioning exploitation – Virtual I/O © 2003 Concepts of Solution Design IBM Corporation
  • 25. ^Eserver pSeries Processor terminology. Figure: processor terminology, relating the installed physical processors (including deconfigured and inactive CUoD processors) to the dedicated processors used by dedicated processor partitions and to the shared processor pool, from which shared processor partitions draw their entitled capacity through virtual processors; with SMT on, each virtual processor appears as two logical processors. © 2003 Concepts of Solution Design IBM Corporation
  • 26. ^Eserver pSeries Shared processor partitions  Micro-Partitioning allows multiple partitions to share one physical processor  Up to 10 partitions per physical processor  Up to 254 partitions active at the same time  Partition’s resource definition – Minimum, desired, and maximum values for each resource – Processor capacity – Virtual processors – Capped or uncapped • Capacity weight – Dedicated memory • Minimum of 128 MB, in 16 MB increments – Physical or virtual I/O resources. Figure: six LPARs spread across a pool of four physical CPUs. © 2003 Concepts of Solution Design IBM Corporation
  • 27. ^Eserver pSeries Understanding min/max/desired resource values  The desired value for a resource is given to a partition if enough resource is available.  If there is not enough resource to meet the desired value, then a lower amount is allocated.  If there is not enough resource to meet the min value, the partition will not start.  The maximum value is only used as an upper limit for dynamic partitioning operations. © 2003 Concepts of Solution Design IBM Corporation
  • 28. ^Eserver pSeries Partition capacity entitlement  Processing units – 1.0 processing unit represents one physical processor  Entitled processor capacity – Commitment of capacity that is reserved for the partition – Sets the upper limit of processor utilization for capped partitions – Each virtual processor must be granted at least 1/10 of a processing unit of entitlement  Shared processor capacity is always delivered in terms of whole physical processors. Figure: the processing capacity of one physical processor (1.0 processing units) divided into entitlements of 0.5 and 0.4 processing units plus the 0.1 minimum requirement. © 2003 Concepts of Solution Design IBM Corporation
  • 29. ^Eserver pSeries Capped and uncapped partitions  Capped partition – Not allowed to exceed its entitlement  Uncapped partition – Is allowed to exceed its entitlement  Capacity weight – Used for prioritizing uncapped partitions – Value 0-255 – Value of 0 referred to as a “soft cap” © 2003 Concepts of Solution Design IBM Corporation
  • 30. ^Eserver pSeries Partition capacity entitlement example  Shared pool has 2.0 processing units available  LPARs activated in sequence  Partition 1 activated – Min = 1.0, max = 2.0, desired = 1.5 – Starts with 1.5 allocated processing units  Partition 2 activated – Min = 1.0, max = 2.0, desired = 1.0 – Does not start  Partition 3 activated – Min = 0.1, max = 1.0, desired = 0.8 – Starts with 0.5 allocated processing units © 2003 Concepts of Solution Design IBM Corporation
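The activation rule behind this example (give the desired value if it fits, otherwise whatever is left as long as it meets the minimum, otherwise do not start) can be sketched in a few lines. This is an illustration only, not HMC or POWER Hypervisor code; the partition profiles are the ones listed on the slide above:

# Illustrative sketch of the min/desired activation rule for shared
# processor partitions (not actual POWER Hypervisor or HMC logic).

def activate(pool_free, minimum, desired):
    """Return the processing units allocated to a partition, or None if it cannot start."""
    if pool_free >= desired:
        return desired          # enough resource: the desired value is given
    if pool_free >= minimum:
        return pool_free        # less than desired but at least min: allocate what is left
    return None                 # below min: the partition does not start

pool = 2.0
profiles = [("Partition 1", 1.0, 1.5),   # (name, min, desired)
            ("Partition 2", 1.0, 1.0),
            ("Partition 3", 0.1, 0.8)]

for name, mn, want in profiles:
    got = activate(pool, mn, want)
    if got is None:
        print(f"{name}: does not start (only {pool:.1f} units free, min is {mn:.1f})")
    else:
        pool -= got
        print(f"{name}: starts with {got:.1f} allocated processing units")

Run in sequence, this reproduces the slide: Partition 1 starts with 1.5 units, Partition 2 does not start, and Partition 3 starts with the remaining 0.5 units.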
  • 31. ^Eserver pSeries Understanding capacity allocation – An example  A workload is run under different configurations.  The size of the shared pool (number of physical processors) is fixed at 16.  The capacity entitlement for the partition is fixed at 9.5.  No other partitions are active. © 2003 Concepts of Solution Design IBM Corporation
  • 32. ^Eserver pSeries Uncapped – 16 virtual processors  16 virtual processors.  Uncapped.  Can use all available resource.  The workload requires 26 minutes to complete. Figure: Uncapped (16 PPs / 16 VPs / 9.5 CE), processing units consumed (0-15) plotted against elapsed time (0-30 minutes). © 2003 Concepts of Solution Design IBM Corporation
  • 33. ^Eserver pSeries Uncapped – 12 virtual processors  12 virtual processors.  Even though the partition is uncapped, it can use at most 12 processing units.  The workload now requires 27 minutes to complete. Figure: Uncapped (16 PPs / 12 VPs / 9.5 CE), processing units consumed (0-15) plotted against elapsed time (0-30 minutes). © 2003 Concepts of Solution Design IBM Corporation
  • 34. ^Eserver pSeries Capped  The partition is now capped and resource utilization is limited to the capacity entitlement of 9.5. – Capping limits the amount of time each virtual processor is scheduled. – The workload now requires 28 minutes to complete. Figure: Capped (16 PPs / 12 VPs / 9.5 CE), processing units consumed (0-15) plotted against elapsed time (0-30 minutes). © 2003 Concepts of Solution Design IBM Corporation
  • 35. ^Eserver pSeries Dynamic partitioning operations  Add, move, or remove processor capacity – Remove, move, or add entitled shared processor capacity – Change between capped and uncapped processing – Change the weight of an uncapped partition – Add and remove virtual processors • Provided CE / VP remains at least 0.1  Add, move, or remove memory – 16 MB logical memory block  Add, move, or remove physical I/O adapter slots  Add or remove virtual I/O adapter slots  Min/max values defined for LPARs set the bounds within which DLPAR can work © 2003 Concepts of Solution Design IBM Corporation
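These operations are normally driven from the HMC. The commands below are a sketch of the POWER5-era HMC command line, with a hypothetical managed system p570 and partition lpar1; confirm the options against your HMC release before relying on them:

    chhwres -m p570 -r proc -o a -p lpar1 --procunits 0.5    # add 0.5 processing units to the partition
    chhwres -m p570 -r mem -o r -p lpar1 -q 256              # remove 256 MB of memory (a multiple of the 16 MB logical memory block)
    lshwres -m p570 -r proc --level lpar --filter "lpar_names=lpar1"   # verify the new processor allocation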
  • 36. ^Eserver pSeries Dynamic LPAR – Standard on all new systems. Move resources between live partitions. Figure: an HMC managing four partitions (Part#1 Production, Part#2 Test/Dev, Part#3 File/Print, Part#4 Legacy Apps) running AIX 5L and Linux on top of the Hypervisor. © 2003 Concepts of Solution Design IBM Corporation
  • 37. Eserver pSeries © 2003 IBM Corporation Firmware POWER Hypervisor
  • 38. ^Eserver pSeries POWER Hypervisor strategy  New Hypervisor for POWER5 systems – Further convergence with iSeries – But brands will retain unique value propositions – Reduced development effort – Faster time to market  New capabilities on pSeries servers – Shared processor partitions – Virtual I/O  New capability on iSeries servers – Can run AIX 5L © 2003 Concepts of Solution Design IBM Corporation
  • 39. ^Eserver pSeries POWER Hypervisor component sourcing. Figure: POWER Hypervisor components and whether they come from the pSeries or iSeries heritage – H-Call Interface, Nucleus (SLIC), Virtual I/O, Virtual Ethernet, VLAN, bus recovery, dump, drawer concurrent maintenance, slot/tower concurrent maintenance, shared processor LPAR, Capacity on Demand, partition on demand, 255 partitions, location codes, load from flash, message passing, I/O configuration, LAN/VLAN/SCSI IOAs, FSP, NVRAM, HMC, and HSC. © 2003 Concepts of Solution Design IBM Corporation
  • 40. ^Eserver pSeries POWER Hypervisor functions  Same functions as the POWER4 Hypervisor – Dynamic LPAR – Capacity Upgrade on Demand  New, active functions – Dynamic Micro-Partitioning – Shared processor pool – Virtual I/O – Virtual LAN  Machine is always in LPAR mode – Even with all resources dedicated to one OS. Figure: the functions illustrated around the POWER5 chips – dynamic LPAR, dynamic Micro-Partitioning, shared processor pools (CPU 0-3), virtual I/O (disk and LAN), and Capacity Upgrade on Demand (planned versus actual client capacity growth). © 2003 Concepts of Solution Design IBM Corporation
  • 41. ^Eserver pSeries POWER Hypervisor implementation  Design enhancements to previous POWER4 implementation enable the sharing of processors by multiple partitions – Hypervisor decrementer (HDECR) – New Processor Utilization Resource Register (PURR) – Refine virtual processor objects • Does not include physical characteristics of the processor – New Hypervisor calls © 2003 Concepts of Solution Design IBM Corporation
  • 42. ^Eserver pSeries POWER Hypervisor processor dispatch  Manages a set of processors on the machine (the shared processor pool).  POWER5 generates a 10 ms dispatch window. – Minimum allocation is 1 ms per physical processor.  Each virtual processor is guaranteed to get its entitled share of processor cycles during each 10 ms dispatch window. – ms/VP = CE * 10 / VPs  The partition entitlement is evenly distributed among the online virtual processors.  Once a capped partition has received its CE within a dispatch interval, it becomes not-runnable.  A VP dispatched within 1 ms of the end of the dispatch interval will receive half its CE at the start of the next dispatch interval. Figure: virtual processor capacity entitlement for six shared processor partitions dispatched by the POWER Hypervisor onto the shared processor pool (CPU 0-3). © 2003 Concepts of Solution Design IBM Corporation
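A worked illustration of the ms/VP = CE * 10 / VPs rule, using the partition profiles from the dispatch example a few slides further on (a sketch only, not Hypervisor code):

# Per-virtual-processor dispatch time in a 10 ms POWER Hypervisor window,
# using ms/VP = CE * 10 / VPs. Partition profiles are the ones from the
# dispatch example slide; any other values are equally valid inputs.

DISPATCH_WINDOW_MS = 10

partitions = {          # name: (capacity entitlement, virtual processors)
    "LPAR1": (0.8, 2),
    "LPAR2": (0.2, 1),
    "LPAR3": (0.6, 3),
}

for name, (ce, vps) in partitions.items():
    ms_per_vp = ce * DISPATCH_WINDOW_MS / vps
    print(f"{name}: {vps} VPs, CE {ce:.1f} -> {ms_per_vp:.1f} ms per VP per {DISPATCH_WINDOW_MS} ms window")

So LPAR1 gets 4 ms per virtual processor per window, while LPAR2 and LPAR3 each get 2 ms per virtual processor.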
  • 43. ^Eserver pSeries Dispatching and interrupt latencies  Virtual processors have dispatch latency.  Dispatch latency is the time between a virtual processor becoming runnable and actually being dispatched.  Timer interrupts are also affected by this latency.  External interrupts are also affected by this latency. © 2003 Concepts of Solution Design IBM Corporation
  • 44. ^Eserver pSeries Shared processor pool  Processors not associated with dedicated processor partitions.  No fixed relationship between virtual processors and physical processors.  The POWER Hypervisor attempts to use the same physical processor. – Affinity scheduling – Home node. Figure: virtual processor capacity entitlement for six shared processor partitions dispatched by the POWER Hypervisor onto the shared processor pool (CPU 0-3). © 2003 Concepts of Solution Design IBM Corporation
  • 45. ^Eserver pSeries Affinity scheduling  When dispatching a VP, the POWER Hypervisor attempts to preserve affinity by using: – Same physical processor as before, or – Same chip, or – Same MCM  When a physical processor becomes idle, the POWER Hypervisor looks for a runnable VP that: – Has affinity for it, or – Has affinity to no-one, or – Is uncapped  Similar to AIX affinity scheduling © 2003 Concepts of Solution Design IBM Corporation
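The idle-processor search described above can be sketched as a simple preference list. This is illustrative pseudocode only, not the actual POWER Hypervisor dispatcher, and the data structures are assumptions:

# Illustrative sketch of the affinity-scheduling preferences described on
# the slide above; not the actual POWER Hypervisor dispatcher.

def pick_vp_for_idle_processor(processor, runnable_vps):
    """Choose a runnable virtual processor for an idle physical processor."""
    # 1. Prefer a VP that last ran on this physical processor (has affinity for it).
    for vp in runnable_vps:
        if vp.get("home") == processor:
            return vp
    # 2. Otherwise take a VP with no affinity to any processor yet.
    for vp in runnable_vps:
        if vp.get("home") is None:
            return vp
    # 3. Otherwise take an uncapped VP, which may run beyond its entitlement.
    for vp in runnable_vps:
        if vp.get("uncapped"):
            return vp
    return None

# Example: CPU 2 becomes idle with three runnable virtual processors queued.
runnable = [{"name": "lpar3-vp0", "home": 1, "uncapped": False},
            {"name": "lpar1-vp1", "home": 2, "uncapped": False},
            {"name": "lpar2-vp0", "home": None, "uncapped": True}]
print(pick_vp_for_idle_processor(2, runnable)["name"])   # -> lpar1-vp1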
  • 46. ^Eserver pSeries Operating system support  Micro-Partitioning capable operating systems need to be modified to cede a virtual processor when they have no runnable work – Failure to do this results in wasted CPU resources • For example, a partition spends its CE waiting for I/O – Ceding results in better utilization of the pool  May confer the remainder of their timeslice to another VP – For example, to a VP holding a lock  Can be redispatched if they become runnable again during the same dispatch interval © 2003 Concepts of Solution Design IBM Corporation
  • 47. ^Eserver pSeries Example. Figure: two POWER Hypervisor dispatch interval passes (0-10 msec and 10-20 msec) on physical processors 0 and 1, showing when each LPAR’s virtual processors are dispatched and when the processors are idle. LPAR1: capacity entitlement = 0.8 processing units; virtual processors = 2 (capped). LPAR2: capacity entitlement = 0.2 processing units; virtual processors = 1 (capped). LPAR3: capacity entitlement = 0.6 processing units; virtual processors = 3 (capped). © 2003 Concepts of Solution Design IBM Corporation
  • 48. ^Eserver pSeries POWER Hypervisor and virtual I/O  I/O operations without dedicating resources to an individual partition  POWER Hypervisor’s virtual I/O related operations – Provide control and configuration structures for virtual adapter images required by the logical partitions – Operations that allow partitions controlled and secure access to physical I/O adapters in a different partition – The POWER Hypervisor does not own any physical I/O devices; they are owned by an I/O hosting partition  I/O types supported – SCSI – Ethernet – Serial console © 2003 Concepts of Solution Design IBM Corporation
  • 49. ^Eserver pSeries Performance monitoring and accounting  CPU utilization is measured against CE. – An uncapped partition receiving more than its CE will record 100% but will actually be using more.  SMT – Thread priorities compound the variable execution rate. – There are twice as many logical CPUs.  For accounting, processor time within an interval may be incorrectly allocated. – New hardware support is required.  The Processor Utilization Resource Register (PURR) records the actual clock ticks spent executing a partition. – Used by performance commands (for example, new flags) and accounting modules. – Third-party tools will need to be modified. © 2003 Concepts of Solution Design IBM Corporation
  • 50. Eserver pSeries © 2003 IBM Corporation Virtual I/O Server
  • 51. ^Eserver pSeries Virtual I/O Server  Provides an operating environment for virtual I/O administration – Virtual I/O server administration – Restricted scriptable command line user interface (CLI)  Minimum hardware requirements – POWER5 VIO capable machine – Hardware management console – Storage adapter – Physical disk – Ethernet adapter – At least 128 MB of memory  Capabilities of the Virtual I/O Server – Ethernet Adapter Sharing – Virtual SCSI disk • Virtual I/O Server Version 1.1 is supported for selected configurations, which include specific models of EMC, HDS, and STK disk subsystems attached using Fibre Channel – Interacts with AIX and Linux partitions © 2003 Concepts of Solution Design IBM Corporation
  • 52. ^Eserver pSeries Virtual I/O Server (Cont.)  Installation CD when Advanced POWER Virtualization feature is ordered  Configuration approaches for high availability – Virtual I/O Server • LVM mirroring • Multipath I/O • EtherChannel – Second virtual I/O server instance in another partition © 2003 Concepts of Solution Design IBM Corporation
  • 53. ^Eserver pSeries Virtual SCSI  Allows sharing of storage devices  Vital for shared processor partitions – Overcomes potential limit of adapter slots due to Micro-Partitioning – Allows the creation of logical partitions without the need for additional physical resources  Allows attachment of previously unsupported storage solutions © 2003 Concepts of Solution Design IBM Corporation
  • 54. ^Eserver pSeries VSCSI server and client architecture overview  Virtual SCSI is based on a client/server relationship.  The virtual I/O resources are assigned using an HMC.  Virtual SCSI enables sharing of adapters as well as disk devices.  Dynamic LPAR operations are allowed.  Dynamic mapping between physical and virtual resources on the Virtual I/O Server. Figure: a Virtual I/O Server partition exporting logical volumes and a physical disk (SCSI, FC) through VSCSI server adapters to VSCSI client adapters in AIX and Linux client partitions, with the POWER Hypervisor providing the transport. © 2003 Concepts of Solution Design IBM Corporation
  • 55. ^Eserver pSeries Virtual devices  Are defined as LVs in the I/O Server partition – Normal LV rules apply  Appear as real devices (hdisks) in the hosted partition  Can be manipulated using the Logical Volume Manager just like an ordinary physical disk  Can be used as a boot device and as a NIM target  Can be shared by multiple clients. Figure: a logical volume in the Virtual I/O Server partition surfaced through a VSCSI server adapter, the POWER Hypervisor, and a VSCSI client adapter as a virtual disk (hdisk) in the client partition. © 2003 Concepts of Solution Design IBM Corporation
  • 56. ^Eserver pSeries SCSI RDMA and Logical Remote Direct Memory Access  SCSI transport protocols define the rules for exchanging information between SCSI initiators and targets.  Virtual SCSI uses the SCSI RDMA Protocol (SRP). – SCSI initiators and targets have the ability to directly transfer information between their respective address spaces.  SCSI requests and responses are sent using the virtual SCSI adapters.  The actual data transfer, however, is done using the Logical Remote Direct Memory Access (LRDMA) protocol. Figure: the VSCSI initiator device driver in the client partition and the VSCSI target device driver in the Virtual I/O Server partition exchange commands and responses over the POWER Hypervisor’s reliable command/response transport, while data buffers are moved by LRDMA to the physical adapter and its device driver. © 2003 Concepts of Solution Design IBM Corporation
  • 57. ^Eserver pSeries Virtual SCSI security  Only the owning partition has access to its data.  Data is copied directly from the PCI adapter to the client’s memory. © 2003 Concepts of Solution Design IBM Corporation
  • 58. ^Eserver pSeries Performance considerations  Virtual SCSI requires roughly twice as many processor cycles as a locally attached disk I/O (distributed evenly between the client partition and the Virtual I/O Server) – The path of each virtual I/O request involves several sources of overhead that are not present in a non-virtual I/O request. – For a virtual disk backed by the LVM, there is also the performance impact of going through the LVM and disk device drivers twice.  If multiple partitions are competing for resources from a VSCSI server, care must be taken to ensure enough server resources (CPU, memory, and disk) are allocated to do the job.  If not constrained by CPU performance, dedicated partition throughput is comparable to doing local I/O.  Because there is no caching in memory on the server I/O partition, its memory requirements should be modest. © 2003 Concepts of Solution Design IBM Corporation
  • 59. ^Eserver pSeries Limitations  The hosting partition must be available before the hosted partition boots.  Virtual SCSI supports FC, parallel SCSI, and SCSI RAID.  Maximum of 65535 virtual slots in the I/O server partition.  Maximum of 256 virtual slots on a single partition.  Support for all mandatory SCSI commands.  Not all optional SCSI commands are supported. © 2003 Concepts of Solution Design IBM Corporation
  • 60. ^Eserver pSeries Implementation guideline  Partitions with high performance and disk I/O requirements are not recommended for implementing VSCSI.  Partitions with very low performance and disk I/O requirements can be configured at minimum expense to use only a portion of a logical volume.  Boot disks for the operating system.  Web servers that will typically cache a lot of data. © 2003 Concepts of Solution Design IBM Corporation
  • 61. ^Eserver pSeries LVM mirroring  This configuration protects virtual disks in a client partition against failure of: – One physical disk – One physical adapter – One Virtual I/O Server  Many possibilities exist to exploit this great function! Figure: a client partition mirroring its virtual disks across two Virtual I/O Server partitions, each with its own physical SCSI adapter and physical disk, connected through VSCSI server and client adapters over the POWER Hypervisor. © 2003 Concepts of Solution Design IBM Corporation
  • 62. ^Eserver pSeries Multipath I/O  This configuration protects virtual disks in a client partition against: – Failure of one physical FC adapter in one I/O server – Failure of one Virtual I/O Server  The physical disk is assigned as a whole to the client partition  Many possibilities exist to exploit this great function! Figure: a client partition using multipath I/O across two VSCSI paths, each through a separate Virtual I/O Server partition with its own physical FC adapter, connected through SAN switches to an ESS physical disk. © 2003 Concepts of Solution Design IBM Corporation
  • 63. ^Eserver pSeries Virtual LAN overview  Virtual network segments on top of physical switch devices.  All nodes in a VLAN can communicate without any L3 routing or inter-VLAN bridging.  VLANs provide: – Increased LAN security – Flexible network deployment over traditional network devices  VLAN support in AIX is based on the IEEE 802.1Q VLAN implementation. – VLAN ID tagging of Ethernet frames – VLAN ID restricted switch ports. Figure: nodes A-1 and A-2 on Switch A, B-1 to B-3 on Switch B, and C-1 and C-2 on Switch C, grouped into VLAN 1 and VLAN 2, with traffic between the two VLANs blocked. © 2003 Concepts of Solution Design IBM Corporation
  • 64. ^Eserver pSeries Virtual Ethernet  Enables inter-partition communication. – In-memory point to point connections  Physical network adapters are not needed.  Similar to high-bandwidth Ethernet connections.  Supports multiple protocols (IPv4, IPv6, and ICMP).  No Advanced POWER Virtualization feature required. – POWER5 Systems – AIX 5L V5.3 or appropriate Linux level – Hardware management console (HMC) © 2003 Concepts of Solution Design IBM Corporation
  • 65. ^Eserver pSeries Virtual Ethernet connections  VLAN technology implementation – Partitions can only access data directed to them.  Virtual Ethernet switch provided by the POWER Hypervisor.  Virtual LAN adapters appear to the OS as physical adapters. – The MAC address is generated by the HMC.  1-3 Gb/s transmission speed. – Support for large MTUs (~64K) on AIX.  Up to 256 virtual Ethernet adapters. – Up to 18 VLANs on each.  Bootable device support for NIM OS installations. Figure: AIX and Linux partitions, each with a virtual Ethernet adapter, connected through the virtual Ethernet switch in the POWER Hypervisor. © 2003 Concepts of Solution Design IBM Corporation
  • 66. ^Eserver pSeries Virtual Ethernet switch  Based on IEEE 802.1Q VLAN standard – OSI-Layer 2 – Optional Virtual LAN ID (VID) – 4094 virtual LANs supported – Up to 18 VIDs per virtual LAN port  Switch configuration through HMC © 2003 Concepts of Solution Design IBM Corporation
  • 67. ^Eserver pSeries How it works. Figure: frame handling at the virtual VLAN switch port – the POWER Hypervisor caches the source MAC; if the frame has no IEEE VLAN header, one is inserted with the VLAN ID configured for the associated switch port, otherwise the VLAN header is checked and the frame is dropped if the port does not allow that VLAN; the destination MAC is then looked up in the table and, if a match for the VLAN number is found, the frame is delivered; otherwise it is passed to the trunk adapter if one is defined, or the packet is dropped. © 2003 Concepts of Solution Design IBM Corporation
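A compact sketch of that forwarding decision (illustrative only; the real logic is implemented inside the POWER Hypervisor, and the data structures here are assumptions):

# Illustrative sketch of the virtual Ethernet switch forwarding decision
# described on the "How it works" slide; not actual POWER Hypervisor code.

def forward(frame, port, mac_table, trunk_defined):
    """Return the action taken for a frame arriving on a virtual switch port."""
    # Untagged frames get the VLAN ID configured for the ingress port.
    vlan = frame.get("vlan")
    if vlan is None:
        vlan = port["pvid"]
    # Tagged frames must carry a VLAN ID that the port is allowed to use.
    elif vlan not in port["allowed_vlans"]:
        return "drop"
    # Known destination MAC on the same VLAN: deliver to that adapter.
    if mac_table.get((frame["dest_mac"], vlan)):
        return "deliver"
    # Unknown destination: hand to the trunk adapter (SEA) if one exists.
    return "pass to trunk adapter" if trunk_defined else "drop"

port = {"pvid": 1, "allowed_vlans": {1, 2}}
table = {("0a:0b:0c:0d:0e:0f", 1): "lpar2-ent0"}
print(forward({"dest_mac": "0a:0b:0c:0d:0e:0f", "vlan": None}, port, table, trunk_defined=True))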
  • 68. ^Eserver pSeries Performance considerations  Virtual Ethernet performance – Throughput scales nearly linearly with the allocated capacity entitlement  Virtual LAN vs. Gigabit Ethernet throughput – The virtual Ethernet adapter has higher raw throughput at all MTU sizes – The in-memory copy is more efficient at larger MTU sizes. Figures: TCP_STREAM throughput per 0.1 entitlement (Mb/s) for CPU entitlements of 0.1 to 1.0 at MTU sizes 1500, 9000, and 65394; and virtual LAN versus Gigabit Ethernet throughput (Mb/s), simplex and duplex, at MTU sizes 1500, 9000, and 65394. © 2003 Concepts of Solution Design IBM Corporation
  • 69. ^Eserver pSeries Limitations  Virtual Ethernet can be used in both shared and dedicated processor partitions provided with the appropriate OS levels.  A mixture of Virtual Ethernet connections, real network adapters, or both are permitted within a partition.  Virtual Ethernet can only connect partitions within a single system.  A system’s processor load is increased when using virtual Ethernet. © 2003 Concepts of Solution Design IBM Corporation
  • 70. ^Eserver pSeries Implementation guideline  Know your environment and the network traffic.  Choose a high MTU size if it makes sense for the network traffic in the Virtual LAN.  Use the MTU size 65394 if you expect a large amount of data to be copied inside your Virtual LAN.  Enable tcp_pmtu_discover and udp_pmtu_discover in conjunction with MTU size 65394.  Do not turn off SMT.  No dedicated CPUs are required for virtual Ethernet performance. © 2003 Concepts of Solution Design IBM Corporation
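On an AIX 5L V5.3 partition, those guidelines might translate into commands like the following. This is a sketch only; en0 is a hypothetical virtual Ethernet interface, and any tunable change should be validated for your workload:

    chdev -l en0 -a mtu=65394     # use the large MTU for traffic that stays inside the Virtual LAN
    no -o tcp_pmtu_discover=1     # enable TCP path MTU discovery
    no -o udp_pmtu_discover=1     # enable UDP path MTU discovery
    smtctl                        # confirm that SMT is still enabled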
  • 71. ^Eserver pSeries Connecting Virtual Ethernet to external networks  Routing – The partition that routes the traffic to the external network does not necessarily have to be the Virtual I/O Server. Figure: two systems whose virtual Ethernet segments (3.1.1.x and 4.1.1.x) reach the external IP subnets 1.1.1.x and 2.1.1.x through an AIX or Linux partition that owns a physical adapter and acts as an IP router. © 2003 Concepts of Solution Design IBM Corporation
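With the routing approach, the partition that owns the physical adapter only needs IP forwarding enabled, and the other partitions point a route at it. A sketch using the addresses from the figure above (adjust to your own subnets):

    no -o ipforwarding=1                                     # on the routing partition: forward packets between its interfaces
    route add -net 1.1.1.0 -netmask 255.255.255.0 3.1.1.1    # on a client partition: reach the external subnet via the router's virtual address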
  • 72. ^Eserver pSeries Shared Ethernet Adapter  Connects internal and external VLANs using one physical adapter.  SEA is a new service that acts as a layer 2 network switch. – Securely bridges network traffic from a virtual Ethernet adapter to a real network adapter  SEA service runs in the Virtual I/O Server partition. – Advanced POWER Virtualization feature required – At least one physical Ethernet adapter required  No physical I/O slot and network adapter required in the client partition. © 2003 Concepts of Solution Design IBM Corporation
  • 73. ^Eserver pSeries Shared Ethernet Adapter (Cont.)  Virtual Ethernet MAC addresses are visible to outside systems.  Broadcast/multicast is supported.  ARP (Address Resolution Protocol) and NDP (Neighbor Discovery Protocol) can work across a shared Ethernet.  One SEA can be shared by multiple VLANs, and multiple subnets can connect using a single adapter on the Virtual I/O Server.  The virtual Ethernet adapter configured in the Shared Ethernet Adapter must have the trunk flag set. – The trunk virtual Ethernet adapter enables a layer-2 bridge to a physical adapter  IP fragmentation is performed, or an ICMP packet too big message is sent, when the Shared Ethernet Adapter receives IP (or IPv6) packets that are larger than the MTU of the adapter that the packet is forwarded through. © 2003 Concepts of Solution Design IBM Corporation
  • 74. ^Eserver pSeries Virtual Ethernet and Shared Ethernet Adapter security  VLAN (virtual local area network) tagging is used as described in the IEEE 802.1Q standard.  The implementation of this VLAN standard ensures that the partitions have no access to foreign data.  Only the network adapters (virtual or physical) that are connected to a port (virtual or physical) that belongs to the same VLAN can receive frames with that specific VLAN ID. © 2003 Concepts of Solution Design IBM Corporation
  • 75. ^Eserver pSeries Performance considerations  Virtual I/O Server performance – Adapters stream data at media speed if the Virtual I/O Server has enough capacity entitlement. – CPU utilization per gigabit of throughput is higher with a Shared Ethernet Adapter. Figures: Virtual I/O Server TCP_STREAM throughput (Mb/s) and normalized CPU utilisation (%cpu/Gb) at MTU sizes 1500 and 9000, simplex and duplex. © 2003 Concepts of Solution Design IBM Corporation
  • 76. ^Eserver pSeries Limitations  System processors are used for all communication functions, leading to a significant amount of system processor load.  One of the virtual adapters in the SEA on the Virtual I/O server must be defined as a default adapter with a default PVID.  Up to 16 Virtual Ethernet adapters with 18 VLANs on each can be shared on a single physical network adapter.  Shared Ethernet Adapter requires: – POWER Hypervisor component of POWER5 systems – AIX 5L Version 5.3 or appropriate Linux level © 2003 Concepts of Solution Design IBM Corporation
  • 77. ^Eserver pSeries Implementation guideline  Know your environment and the network traffic.  Use a dedicated network adapter if you expect heavy network traffic between Virtual Ethernet and local networks.  If possible, use dedicated CPUs for the Virtual I/O Server.  Choose 9000 for MTU size, if this makes sense for your network traffic.  Don’t use Shared Ethernet Adapter functionality for latency critical applications.  With MTU size 1500, you need about 1 CPU per gigabit Ethernet adapter streaming at media speed.  With MTU size 9000, 2 Gigabit Ethernet adapters can stream at media speed per CPU. © 2003 Concepts of Solution Design IBM Corporation
  • 78. ^Eserver pSeries Shared Ethernet Adapter configuration  The Virtual I/O Server is configured with at least one physical Ethernet adapter.  One Shared Ethernet Adapter can be shared by multiple VLANs.  Multiple subnets can connect using a single adapter on the Virtual I/O Server. Figure: an AIX partition on VLAN 1 (10.1.1.11) and a Linux partition on VLAN 2 (10.1.2.11) bridged by one Shared Ethernet Adapter over the physical adapter ent0 in the Virtual I/O Server to the external AIX server 10.1.1.14 (VLAN 1) and Linux server 10.1.2.15 (VLAN 2). © 2003 Concepts of Solution Design IBM Corporation
  • 79. ^Eserver pSeries Multiple Shared Ethernet Adapter configuration  Maximizing throughput – Using several Shared Ethernet Adapters – More queues – More performance. Figure: the same AIX (VLAN 1, 10.1.1.11) and Linux (VLAN 2, 10.1.2.11) partitions, with each VLAN bridged by its own Shared Ethernet Adapter and physical adapter (ent0 and ent1) in the Virtual I/O Server to the external AIX server 10.1.1.14 and Linux server 10.1.2.15. © 2003 Concepts of Solution Design IBM Corporation
  • 80. ^Eserver pSeries Multipath routing with dead gateway detection  This configuration protects your access to the external network against: – Failure of one physical network adapter in one I/O server – Failure of one Virtual I/O Server – Failure of one gateway. Figure: an AIX partition with two default routes (to gateway 9.3.5.10 via its 9.3.5.12 interface on VLAN 1, and to gateway 9.3.5.20 via its 9.3.5.22 interface on VLAN 2), each path passing through a different Virtual I/O Server’s Shared Ethernet Adapter (9.3.5.11 and 9.3.5.21) and physical adapter to the external network. © 2003 Concepts of Solution Design IBM Corporation
  • 81. ^Eserver pSeries Shared Ethernet Adapter commands  Virtual I/O Server commands – lsdev -type adapter: Lists all the virtual and physical adapters. – Choose the virtual Ethernet adapter we want to map to the physical Ethernet adapter. – Make sure the physical and virtual interfaces are unconfigured (down or detached). – mkvdev: Maps the physical adapter to the virtual adapter, creates a layer 2 bridge, and defines the default virtual adapter with its default VLAN ID. It creates a new Ethernet interface (for example, ent5). – The mktcpip command is used for TCP/IP configuration on the new Ethernet interface (for example, ent5).  Client partition commands – No new commands are needed; the typical TCP/IP configuration is done on the virtual Ethernet interface that it is defined in the client partition profile on the HMC. © 2003 Concepts of Solution Design IBM Corporation
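Put together, the Shared Ethernet Adapter setup on the Virtual I/O Server CLI looks roughly like the following sketch. The adapter names ent0 and ent2, the resulting SEA interface en5, and the TCP/IP values are hypothetical examples:

    lsdev -type adapter                                          # identify the physical (ent0) and virtual (ent2) Ethernet adapters
    mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1   # bridge them; a new SEA device such as ent5 is created
    mktcpip -hostname vios1 -inetaddr 9.3.5.150 -interface en5 -netmask 255.255.255.0 -gateway 9.3.5.1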
  • 82. ^Eserver pSeries Virtual SCSI commands  Virtual I/O Server commands – To map a LV: • mkvg: Creates the volume group, where a new LV will be created using the mklv command. • lsdev: Shows the virtual SCSI server adapters that could be used for mapping with the LV. • mkvdev: Maps the virtual SCSI server adapter to the LV. • lsmap -all: Shows the mapping information. – To map a physical disk: • lsdev: Shows the virtual SCSI server adapters that could be used for mapping with a physical disk. • mkvdev: Maps the virtual SCSI server adapter to a physical disk. • lsmap -all: Shows the mapping information.  Client partition commands – No new commands needed; the typical device configuration uses the cfgmgr command. © 2003 Concepts of Solution Design IBM Corporation
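For the logical-volume case, the Virtual I/O Server side of that sequence might look like this sketch (the names rootvg_clients, rootvg_lv1, hdisk2, and vhost0 are hypothetical):

    mkvg -f -vg rootvg_clients hdisk2         # create a volume group on a free physical disk
    mklv -lv rootvg_lv1 rootvg_clients 10G    # create the logical volume that will back the client disk
    lsdev -virtual                            # list virtual devices to find the virtual SCSI server adapter, for example vhost0
    mkvdev -vdev rootvg_lv1 -vadapter vhost0  # map the logical volume to the virtual SCSI server adapter
    lsmap -all                                # verify the mapping

On the AIX client partition, running cfgmgr then discovers the new hdisk as usual.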
  • 83. ^Eserver pSeries Section Review Questions 1. Any technology improvement will boost performance of any client solution. a. True b. False 2. The application of technology in a creative way to solve client’s business problems is one definition of innovation. a. True b. False © 2003 Concepts of Solution Design IBM Corporation
  • 84. ^Eserver pSeries Section Review Questions 3. Client’s satisfaction with your solution can be enhanced by which of the following? a. Setting expectations appropriately. b. Applying technology appropriately. c. Communicating the benefits of the technology to the client. d. All of the above. © 2003 Concepts of Solution Design IBM Corporation
  • 85. ^Eserver pSeries Section Review Questions 4. Which of the following are available with POWER5 architecture? a. Simultaneous Multi-Threading. b. Micro-Partitioning. c. Dynamic power management. d. All of the above. © 2003 Concepts of Solution Design IBM Corporation
  • 86. ^Eserver pSeries Section Review Questions 5. Simultaneous Multi-Threading is the same as hyperthreading, IBM just gave it a different name. a. True. b. False. © 2003 Concepts of Solution Design IBM Corporation
  • 87. ^Eserver pSeries Section Review Questions 6. In order to bridge network traffic between the Virtual Ethernet and external networks, the Virtual I/O Server has to be configured with at least one physical Ethernet adapter. a. True. b. False. © 2003 Concepts of Solution Design IBM Corporation
  • 88. ^Eserver pSeries Review Question Answers © 2003 Concepts of Solution Design IBM Corporation 1. b 2. a 3. d 4. d 5. b 6. a
  • 89. ^Eserver pSeries Unit Summary  You should now be able to: – Describe the relationship between technology and solutions. – List key IBM technologies that are part of the POWER5 products. – Be able to describe the functional benefits that these technologies provide. – Be able to discuss the appropriate use of these technologies. © 2003 Concepts of Solution Design IBM Corporation
  • 90. ^Eserver pSeries Reference  You may find more information here: – IBM eServer pSeries AIX 5L Support for Micro-Partitioning and Simultaneous Multi-threading White Paper – Introduction to Advanced POWER Virtualization on IBM eServer p5 Servers, SG24-7940 – IBM eServer p5 Virtualization – Performance Considerations, SG24-5768 © 2003 Concepts of Solution Design IBM Corporation

Editor's Notes

  1. The pursuit of scientific discovery provides the basis for new technologies which can be incorporated into new and better products which can then enable clients to solve business problems. In this section we will look at some of the technologies that have been introduced recently with the intention of considering how these technologies may be taken into account in our solution design process. This section is divided into two parts and will take approximately two hours to complete. IBM's rich history of discovery and innovation has brought international recognition. In addition to five Nobel prizes, IBM researchers have been recognized with five U.S. National Medals of Technology, five National Medals of Science and 19 memberships in the National Academy of Sciences. IBM Research has more than 46 members of the National Academy of Engineering and well over 300 industry organization fellows. Over the years, we have received international recognition for our discoveries and produced 22,357 patents - nearly 7,000 more than the nearest competitor. But what's more important than the statistics is the effect these discoveries and patents are having in the marketplace -- and that's what really makes something innovative. Our ability to apply advanced technologies rapidly for our clients distinguishes IBM from all other companies. During the past ten years, our notable breakthroughs in technologies such as copper chips, Web caching, data mining and silicon germanium have helped our clients gain competitive advantage. Our continued innovation springs from our creative, dedicated people whose work continues to shape the future for our customers, the I/T industry and the world.
  2. What do you think of when you think of innovation? Can you provide examples from your own experience? How would you define innovation?
  3. That advances in technology can be applied to problems confronting our clients is not an issue; this is what we do! But consider the case where the technology that we provide fails to solve the problem to the client’s satisfaction. When this happens, what might be one of the possible causes of dissatisfaction? It has been demonstrated that the degree of benefit you get from applying a particular technology is directly related to its appropriateness in the situation. Amdahl’s Law shows this relationship. Secondly, the client may have unreasonable expectations. Setting expectations is certainly part of a successful solution design process, but for the purpose of this section we will focus more on the technologies that are available and consider what problems might be solved by them. We will also look at some of the possible misapplications of the technology and their consequences. Can you think of examples in your own experience where expectations were not met by the technology that was provided? Was the reason a failure of the technology, the expectations of the client or both?
  4. This chart shows the POWER4 and POWER5 chips. POWER4: 415 mm2, 115 W @ 1.1 GHz, 156 W @ 1.3 GHz, 174M transistors. POWER4+: 267 mm2, 75 W @ 1.2 GHz, 95 W @ 1.45 GHz, 125 W @ 1.7 GHz, 184M transistors. POWER5: 389 mm2, 167 W @ 1.65 GHz, 276M transistors.
  5. Featuring single- and multithreaded execution, the POWER5 provides higher performance in the single-threaded mode than its POWER4 predecessor at equivalent frequencies. Enhancements include dynamic resource balancing to efficiently allocate system resources to each thread, software-controlled thread prioritization, and dynamic power management to reduce power consumption without affecting performance. The POWER5 processor supports the 64-bit PowerPC architecture. A single die contains two identical processor cores, each supporting two logical threads. This architecture makes the chip appear as a four-way symmetric multiprocessor to the operating system. Each processor core has a separate 64 KB L1 instruction cache and a 32 KB L1 data cache. The L1 cache is shared by the two hardware threads of the processor core. Both the processor cores in a chip share a 1.88 MB unified L2. The processor chip houses an L3 cache controller, which provides for an L3 cache directory on the chip. However, the L3 cache itself is on a separate Merged Logic DRAM (MLD) cache chip. The L3 is a 36 MB victim cache of the L2 cache. The L3 cache is shared by both the processor cores of the POWER5 chip. Needless to say, the L2 and L3 caches are shared by all the hardware threads of both processor cores on the chip. Unlike POWER4, which was specifically aimed at high-end server applications, design features of POWER5 are targeted at a broad range of applications from low-end 1-2-way servers to high-end 64-way super-servers. SMPLink is a very low latency switchless interconnect technology that allows nodes to be interconnected as flat SMPs. The actual SMPLink ports come directly off of the POWER5 chip. When connected, the SMPLinks provide a direct path between each POWER5 chip. With the introduction of SMT, more instructions execute per cycle per processor core, thus increasing the core’s and the chip’s total switching power. POWER5 was designed to maintain both binary and structural compatibility with existing POWER4 systems to ensure that binaries continue executing properly and all application optimizations carry forward to newer systems. The rest of the improvements and new features, such as enhancements to the memory subsystem and SMT, are discussed on later charts.
  6. The L1 instruction cache is 2-way set associative with LRU (Least Recently Used) replacement policy. The L1 Instruction cache is also kept coherent with the L2 cache. The L1 data cache is 4-way set associative with LRU replacement policy. The L1 data cache is a store-through design. It never holds modified data. The POWER5 L2 cache is accessed by both cores of the chip. It maintains full hardware coherence within the system and can supply intervention data to cores on other POWER5 chips. L2 is an in-line cache, unlike L1s, which are store-through. It is fully inclusive of the two L1 data caches and L1 instruction caches (one L1 data and instruction cache per core). The 1.88 MB (1,920 KB) L2 is physically implemented in three slices, each 640 KB in size. Each of these three slices have separate L2 cache controllers. Either processor core of the chip can independently access each L2 controller. The L2 slices are 10-way set-associative. 10-way set associativity (vs. 8-way on POWER4) helps to reduce cache contention by allowing more potential storage locations for a given cache line. L3 is a unified 36 MB cache accessed by both cores on the POWER5 processor chip. It maintains full hardware coherence with the system and can supply intervention data to cores on other POWER5 processor chips. Logically, L3 is an inline cache. Actually, L3 is a victim cache of the L2 - that is, all valid cache lines evicted out of the L2 due to associativity (victimized) will be cast out to L3. The L3 is not inclusive of L2; the same line will never reside in both L2 and L3 at the same time. The L3 cache is implemented off-chip as a separate MLD cache chip, but its directory is on the processor chip itself. This helps the processor check the directory after an L2 miss without experiencing off-chip delays. The L3 cache in POWER5 is on the processor side and not on the memory side of the fabric as in POWER4. This is well depicted in the previous chart. This design lets the POWER5 satisfy L2 cache misses more frequently, with hits on the off chip 36 MB MLD L3, thus avoiding traffic on the interchip fabric. References to data not on the on chip L2 cause the system to check the L3 cache before sending requests onto the interchip fabric. The memory controller is also on the POWER5 chip and helps to reduce memory latencies by eliminating driver and receiver delays to an external controller.
  7. The figure shows the high-level structures of POWER4- and POWER5-based systems. The POWER4 handles up to a 32-way symmetric multiprocessor. Going beyond 32 processors increases interprocessor communication, resulting in high traffic on the interconnection fabric. This can cause greater contention and negatively affect system scalability. Moving the level-three (L3) cache from the memory side to the processor side of the fabric allows POWER5 to satisfy level-two (L2) cache misses more frequently, with hits in the 36 MB off-chip L3 cache, and avoiding traffic on the interchip fabric. References to data not resident in the on-chip L2 cache cause the system to check the L3 cache before sending requests on to the interconnection fabric. Moving the L3 cache provides significantly more cache on the processor side than previously available, thus reducing traffic on the fabric and allowing POWER5-based systems to scale to higher levels of symmetric multiprocessing. Initial POWER5 systems support 64 physical processors. The POWER4 includes a 1.41 MB on-chip L2 cache. POWER4+ chips are similar in design to the POWER4, but are fabricated in 130 nm technology rather than the POWER4’s 180 nm technology. The POWER4+ includes a 1.5 MB on-chip L2 cache, whereas the POWER5 supports a 1.875 MB on-chip L2 cache. POWER4 and POWER4+ systems both have 32 MB L3 caches, whereas POWER5 systems have a 36 MB L3 cache. The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half processor speed. In POWER4 and POWER4+ systems, the L3 was an inline cache for data retrieved from memory. Because of the higher transistor density of the POWER5’s 130 nm technology, the memory controller was moved on chip, eliminating a chip previously needed for the memory controller function. These two changes in the POWER5 also have the significant side benefits of reducing latency to the L3 cache and main memory, as well as reducing the number of chips necessary to build a system.
  8. Simultaneous Multi-Threading is a new technology which is part of the POWER5 architecture. You need to know how it works and what benefits it can provide to your clients. It is not a cure-all! Being able to articulate the advantages clearly is one part of understanding it, being able to set client’s expectations appropriately is another. In this topic we will discuss the evolution of SMT, its function and some guidelines for appropriate use in solution design.
  9. The POWER4 microprocessor is a high-frequency, speculative superscalar machine with out-of-order instruction execution capabilities. Eight independent execution units are capable of executing instructions in parallel, providing a significant performance attribute known as superscalar execution. These include two identical floating-point execution units, each capable of completing a multiply/add instruction each cycle (for a total of four floating-point operations per cycle), two load-store execution units, two fixed-point execution units, a branch execution unit, and a conditional register unit used to perform logical operations on the condition register. To keep these execution units supplied with work, each processor can fetch up to eight instructions per cycle and can dispatch and complete instructions at a rate of up to five per cycle. A processor is capable of tracking over 200 instructions in-flight at any point in time. Instructions may issue and execute out-of-order with respect to the initial instruction stream, but are carefully tracked so as to complete in program order. In addition, instructions may execute speculatively to improve performance when accurate predictions can be made about conditional scenarios. The figure in this chart depicts the POWER4 processor execution pipeline. The deeply pipelined structure of the machine’s design is shown. Each small box represents a stage of the pipeline (a stage is the logic that is performed in a single processor cycle). Note that there is a common pipeline which first handles instruction fetching and group formation, and this then divides into four different pipelines corresponding to four of the five types of execution units in the machine (the CR execution unit is not shown, which is similar to the fixed-point execution unit). All pipelines have a common termination stage, which is the group completion (CP) stage. Instruction fetch, group formation, and dispatch: The instructions that make up a program are read in from storage and are executed by the processor. During each cycle, up to eight instructions may be fetched from cache according to the address in the instruction fetch address register (IFAR) and the fetched instructions are scanned for branches (corresponding to the IF, IC, and BP stages in the figure). Since instructions may be executed out of order, it is necessary to keep track of the program order of all instructions in-flight. In the POWER4 microprocessor, instructions are tracked in groups of one to five instructions rather than as individual instructions. Groups are formed in the pipeline stages D0, D1, D2, and D3. This requires breaking some of the more complex PowerPC instructions down into two or more simpler instructions.
  10. Modern processors have multiple specialized execution units, each of which is capable of handling a small subset of the instruction set architecture – some will handle integer operations, some floating point, and so on. These execution units are capable of operating in parallel and so several instructions of a program may be executing simultaneously. However, conventional processors execute instructions from a single instruction stream. Despite microarchitectural advances, execution unit utilization remains low in today’s microprocessors. It is not unusual to see average execution unit utilization rates of approximately 25% across a broad spectrum of environments. To increase execution unit utilization, designers use thread-level parallelism, in which the physical processor core executes instructions from more than one instruction stream. To the operating system, the physical processor core appears as if it is a symmetric multiprocessor containing two logical processors. There are at least three different methods for handling multiple threads: coarse-grained multi-threading, fine-grained multi-threading, and simultaneous multi-threading (SMT). Let’s take a look at these methods.
  11. In coarse-grained multi-threading, only one thread executes at any instant. When a thread encounters a long-latency event, such as a cache miss, the hardware swaps in a second thread to use the machine’s resources, rather than letting the machine remain idle. By allowing other work to use what otherwise would be idle cycles, this scheme increases overall system throughput. To conserve resources, both threads share many system resources, such as architectural registers. Hence, swapping program control from one thread to another requires several cycles. IBM implemented coarse-grained multi-threading in the IBM pSeries Model 680.
  12. Coarse-grained multi-threading was introduced in IBM’s Star series of processors (for example, the RS64-IV, available in the S85) to improve system performance for many workloads. A multi-threaded processor improves the resource utilization of a processor core by running several hardware threads in parallel. For the Star series, the number of concurrent threads was two. The basic idea is that when one or more threads of a processor are stalled on a long latency event (for example, waiting on a cache miss), other threads try to keep the core busy. However, AIX needed to be aware of the difference between logical and physical processors and had the responsibility for making sure that each logical processor had a dispatchable thread - even to the point of creating idle threads. Note that coarse-grained multi-threading was never widely used by customers. This was partly because it was not enabled by default and required a reboot to activate it, and partly because performance was variable and could, in fact, be negatively affected. For workloads with high thread:processor ratios (for example, TPC-C), HMT can deliver roughly 20% increased performance. In other workloads, for example, Business Intelligence, where the thread:processor ratio is less than 2:1, AIX must create dummy threads for the processor context switch to take place. Switching to and from these dummy threads costs about six machine cycles, whereas without coarse-grained multi-threading active, AIX would not have performed a context switch at all. The other disadvantage of coarse-grained multi-threading was that it disabled Dynamic CPU Deallocation.
  13. A variant of coarse-grained multi-threading is fine-grained multi-threading. Machines of this class execute threads in successive cycles, in round-robin fashion. Accommodating this design requires duplicate hardware facilities. When a thread encounters a long-latency event, its cycles remain unused. POWER4 processors implemented an SMP on a chip, but this is not considered fine-grained multi-threading.
  14. The POWER5 processor core supports both enhanced SMT and single-threaded (ST) operation modes. This chart shows the POWER5’s instruction pipeline, which is identical to the POWER4’s. All pipeline latencies in the POWER5, including the branch misprediction penalty and load-to-use latency with an L1 data cache hit, are the same as in the POWER4. The identical pipeline structure lets optimizations designed for POWER4-based systems perform equally well on POWER5-based systems. In SMT mode, the POWER5 uses two separate instruction fetch address registers to store the program counters for the two threads. Instruction fetches (IF stage) alternate between the two threads. In ST mode, the POWER5 uses only one program counter and can fetch instructions for that thread every cycle. It can fetch up to eight instructions from the instruction cache (IC stage) every cycle. The two threads share the instruction cache and the instruction translation facility. In a given cycle, all fetched instructions come from the same thread. Some differences from the POWER4 are: There are 120 physical general purpose registers (GPRs) and 120 physical floating-point registers (FPRs). In single-threaded operation, the POWER5 makes all physical registers available to the single thread, allowing higher instruction-level parallelism. Two groups can commit per cycle, one from each thread. The L1 instruction and data caches are the same size as in the POWER4 (64 KB and 32 KB), but their associativity has doubled to two-way and four-way. The first-level data translation table is now fully associative, but the size remains at 128 entries.
  15. In simultaneous multi-threading (SMT), as in other multithreaded implementations, the processor fetches instructions from more than one thread. What differentiates this implementation is its ability to schedule instructions for execution from all threads concurrently. With SMT, the system dynamically adjusts to the environment, allowing instructions to execute from each thread if possible, and allowing instructions from one thread to utilize all the execution units if the other thread encounters a long latency event. The POWER5 design implements two-way SMT on each of the chip’s two processor cores. Although a higher level of multi-threading is possible, our simulations showed that the added complexity was unjustified. As designers add simultaneous threads to a single physical processor, the marginal performance benefit decreases. In fact, additional multi-threading might decrease performance because of cache thrashing, as data from one thread displaces data needed by another thread.
  16. Which Workloads are Likely to Benefit From Simultaneous Multi-threading? This is a very difficult question to answer, because the performance benefit of simultaneous multi-threading is workload dependent. Most measurements of commercial workloads have shown a 25-40% boost, and a few have been even greater. These measurements were taken in a dedicated partition. Simultaneous multi-threading is also expected to help shared processor partitions. The extra threads give the partition a boost after it is dispatched, because they enable the partition to recover its working set more quickly. Subsequently, they perform as they would in a dedicated partition. It may be somewhat non-intuitive, but simultaneous multi-threading is at its best when the performance of the cache is at its worst. The question may also be answered with the following generalities. Any workload where the majority of individual software threads highly utilize any resource in the processor or memory will benefit little from simultaneous multi-threading. For example, workloads that are heavily floating-point intensive are likely to gain little from simultaneous multi-threading and are the ones most likely to lose performance, because they tend to heavily utilize either the floating-point units or the memory bandwidth. In contrast, workloads that have a very high Cycles Per Instruction (CPI) count tend to utilize processor and memory resources poorly and usually see the greatest simultaneous multi-threading benefit. These large CPIs are usually caused by high cache miss rates from a very large working set. Large commercial workloads typically have this characteristic, although it is somewhat dependent upon whether the two hardware threads share instructions or data or are completely distinct. Workloads that share instructions or data, which would include those that run a lot in the operating system or within a single application, tend to see better SMT benefits. Workloads with low CPI and low cache miss rates tend to see a benefit, but a smaller one.
  17. The objective of dynamic resource balancing is to ensure that the two threads executing on the same processor flow smoothly through the system. Dynamic resource-balancing logic monitors resources such as the GCT and the load miss queue to determine if one thread is hogging resources. For example, if one thread encounters multiple L2 cache load misses, dependent instructions can back up in the issue queues, preventing additional groups from dispatching and slowing down the other thread. To prevent this, the resource-balancing logic detects that a thread has reached a threshold of L2 cache misses and throttles that thread. The other thread can then flow through the machine without encountering congestion from the stalled thread. The POWER5 resource-balancing logic also monitors how many GCT entries each thread is using. If one thread starts to use too many GCT entries, the resource-balancing logic throttles it back to prevent it from blocking the other thread. Depending on the situation, the POWER5 resource-balancing logic has three thread-throttling mechanisms: reducing the thread’s priority; inhibiting the thread’s instruction decoding until the congestion clears; or flushing all the thread’s instructions that are waiting for dispatch and holding the thread’s decoding until the congestion clears.
  18. Adjustable thread priority lets software determine when one thread should have a greater (or lesser) share of execution resources. (All software layers — operating systems, middleware, and applications — can set the thread priority. Some priority levels are reserved for setting by a privileged instruction only.) Reasons for choosing an imbalanced thread priority include the following: a thread is in a spin loop waiting for a lock; a thread has no immediate work to do and is waiting in an idle loop; or one application must run faster than another. The POWER5 microprocessor supports eight software-controlled priority levels for each thread. Level 0 is in effect when a thread is not running. Levels 1 (the lowest) through 7 apply to running threads. The POWER5 chip observes the difference in priority levels between the two threads and gives the one with higher priority additional decode cycles. The figure shows how the difference in thread priority affects the relative performance of each thread. If both threads are at the lowest running priority (level 1), the microprocessor assumes that neither thread is doing meaningful work and throttles the decode rate to conserve power.
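To make the priority mechanism concrete, the small C sketch below models how a priority gap might skew decode bandwidth between the two threads. The 2^(priority difference) weighting is an assumption chosen purely for illustration; the actual POWER5 decode allocation is hardware logic that is not specified here.

```c
/* Toy model of POWER5 adjustable thread priority (illustration only).
 * Assumption: the higher-priority thread receives decode cycles in
 * proportion to 2^(priority difference). This weighting is NOT taken
 * from the source material; it only shows that a larger priority gap
 * skews decode bandwidth further toward the favored thread. */
#include <stdio.h>

static double decode_share(int prio_a, int prio_b)   /* running priorities 1..7 */
{
    int delta = prio_a - prio_b;
    double wa = (delta >= 0) ? (double)(1u << delta) : 1.0;
    double wb = (delta <= 0) ? (double)(1u << -delta) : 1.0;
    return wa / (wa + wb);          /* fraction of decode cycles for thread A */
}

int main(void)
{
    /* Equal priorities: both threads decode equally. */
    printf("prio 4 vs 4 -> thread A gets %.0f%% of decode cycles\n",
           100.0 * decode_share(4, 4));
    /* A thread spinning on a lock lowers its priority and yields decode slots. */
    printf("prio 2 vs 4 -> thread A gets %.0f%% of decode cycles\n",
           100.0 * decode_share(2, 4));
    return 0;
}
```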
  19. Not all applications benefit from SMT. Having two threads executing on the same processor will not increase the performance of applications with execution-unit-limited performance or applications that consume all the chip’s memory bandwidth. For this reason, the POWER5 supports the ST execution mode. In this mode, the POWER5 gives all the physical resources, including the GPR and FPR rename pools, to the active thread, allowing it to achieve higher performance than a POWER4 system at equivalent frequencies. The POWER5 supports two types of single-threaded operation: the inactive thread can be in either a dormant or a null state. From a hardware perspective, the only difference between these states is whether or not the thread awakens on an external or decrementer interrupt. In the dormant state, the operating system boots up in SMT mode, but instructs the hardware to put the thread into the dormant state when there is no work for that thread. To make a dormant thread active, either the active thread executes a special instruction or an external or decrementer interrupt targets the dormant thread. The hardware detects these scenarios and changes the dormant thread to the active state. It is software’s responsibility to restore the architected state of a thread transitioning from the dormant to the active state. When a thread is in the null state, the operating system is unaware of the thread’s existence. As in the dormant state, the operating system does not allocate resources to a null thread. This mode is advantageous if all the system’s executing tasks perform better in ST mode.
  20. Micro-partitioning is a mainframe-inspired technology that is based on two major advances in the area of server virtualization: physical processors and I/O devices have been virtualized, enabling these resources to be shared by multiple partitions. There are several advantages associated with this technology, including finer-grained resource allocations, more partitions, and higher resource utilization. The virtualization of processors requires a new partitioning model, since it is fundamentally different from the partitioning model used on POWER4 processor-based servers, where whole processors are assigned to partitions. These processors are owned by the partition and are not easily shared with other partitions; they may be assigned through manual dynamic logical partitioning (LPAR) procedures. In the new micro-partitioning model, physical processors are abstracted into virtual processors, which are assigned to partitions. These virtual processor objects cannot be shared, but the underlying physical processors are shared, since they are used to actualize virtual processors at the platform level. This sharing is the primary feature of this new partitioning model, and it happens automatically. Note that the virtual processor abstraction is implemented in the hardware and the POWER Hypervisor, a component of firmware. From an operating system perspective, a virtual processor is indistinguishable from a physical processor, unless the operating system has been enhanced to be aware of the difference. The key benefit of implementing partitioning in the hardware and firmware is to allow any operating system to run on POWER5 technology with little or no change. Optionally, for optimal performance, the operating system can be enhanced to exploit micro-partitioning more fully, for example, by voluntarily relinquishing CPU cycles to the POWER Hypervisor when they are not needed. AIX 5L V5.3 is the first version of AIX 5L that includes such enhancements. The system administrator defines the number of virtual processors that may be utilized by a partition as well as the actual physical processor capacity that should be applied to actualize those virtual processors. The system administrator may specify that a fraction of a physical processor be applied to a partition, enabling fractional processor capacity partitions to be created.
  21. The diagram in this chart shows the relationships and new concepts regarding the Micro-Partitioning processor terminology used in this presentation.
Virtual processors: These are the whole number of concurrent operations that the operating system can use on a partition. The processing power can be conceptualized as being spread equally across these virtual processors. Selecting the optimal number of virtual processors depends on the workload in the partition. Some partitions benefit from greater concurrency, while other partitions require greater power. The maximum number of virtual processors per partition is 64.
Dedicated processors: Dedicated processors are whole processors that are assigned to a single partition. If you choose to assign dedicated processors to a logical partition, you must assign at least one processor to that partition. By default, a powered-off logical partition using dedicated processors will have its processors available to the shared processing pool. When the processors are in the shared processing pool, an uncapped partition that needs more processing power can use the idle processing resources. However, when you power on the dedicated partition while the uncapped partition is using the processors, the activated partition will regain all of its processing resources. If you want to prevent dedicated processors from being used in the shared processing pool, you can disable this function using the logical partition profile properties panels on the Hardware Management Console.
Shared processor pool: The POWER Hypervisor schedules shared processor partitions from a set of physical processors that is called the shared processor pool. By definition, these processors are not associated with dedicated partitions.
Deconfigured processor: This is a failing processor left outside the system’s configuration after a dynamic processor deallocation has occurred.
  22. Micro-partitioning allows multiple partitions to share one physical processor. A partition may be defined with a processor capacity as small as 10 processor units, which represents 1/10 of a physical processor. Each processor can be shared by up to 10 shared processor partitions. The shared processor partitions are dispatched and time-sliced on the physical processors under control of the POWER Hypervisor. Micro-partitioning is supported across the entire POWER5 product line, from the entry to the high-end systems. Shared processor partitions still need dedicated memory, but the partition's I/O requirements can be supported through Virtual Ethernet and the Virtual SCSI Server. Utilizing all virtualization features, support for up to 254 shared processor partitions is possible. The shared processor partitions are created and managed by the HMC. When you start creating a partition, you have to choose between a shared processor partition and a dedicated processor partition. When setting up a partition, you have to define the resources that belong to the partition, such as memory and I/O resources. For shared processor partitions, you have to specify the following partition attributes, which are used to define the dimensions and performance characteristics of shared partitions: minimum, desired, and maximum processor capacity; minimum, desired, and maximum number of virtual processors; capped or uncapped; and variable capacity weight.
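The attributes just listed can be pictured as a simple record attached to each shared processor partition. The C structure below is a hypothetical illustration of that grouping; the field names are invented for this sketch and do not correspond to any HMC or Hypervisor interface.

```c
/* Hypothetical grouping of the shared-partition attributes listed above.
 * Field names are invented for this sketch only. */
#include <stdbool.h>
#include <stdio.h>

struct shared_partition_profile {
    double min_capacity, desired_capacity, max_capacity;  /* processing units            */
    int    min_vps, desired_vps, max_vps;                 /* virtual processors          */
    bool   capped;          /* capped partitions cannot exceed their entitlement         */
    int    variable_weight; /* 0-255, meaningful for uncapped partitions only            */
};

int main(void)
{
    /* Example: an uncapped partition entitled to between 0.5 and 2.0 processors. */
    struct shared_partition_profile p = {
        .min_capacity = 0.5, .desired_capacity = 1.5, .max_capacity = 2.0,
        .min_vps = 1, .desired_vps = 3, .max_vps = 4,
        .capped = false, .variable_weight = 128
    };
    printf("desired capacity %.1f processing units on %d virtual processors\n",
           p.desired_capacity, p.desired_vps);
    return 0;
}
```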
  23. Processor capacity attributes are specified in terms of processing units. One processing unit represents one physical processor, and 1.5 processing units are equivalent to one and a half physical processors. For example, a shared processor partition with 2.2 processing units has the equivalent power of 2.2 physical processors. Processor units are also used; they represent the processor percentage allocated to a partition. One processor unit represents one percent of one physical processor, so one hundred processor units are equivalent to one physical processor. Shared processor partitions may be defined with a processor capacity as small as 1/10 of a physical processor. A maximum of 10 partitions may be started for each physical processor in the platform, and a maximum of 254 partitions may be active at the same time. When a partition is started, the system chooses the partition’s entitled processor capacity from the specified capacity range. The value that is chosen represents a commitment of capacity that is reserved for the partition. This capacity cannot be used to start another shared partition; otherwise, capacity could be overcommitted. Preference is given to the desired value, but this value cannot always be used, because there may not be enough unassigned capacity in the system. In that event, a different value is chosen, which must be greater than or equal to the minimum capacity attribute; otherwise, the partition cannot be started. The same basic process applies for selecting the number of online virtual processors, with the extra restriction that each virtual processor must be granted at least 1/10 of a processing unit of entitlement. In this way, the entitled processor capacity may affect the number of virtual processors that are automatically brought online by the system during boot. The maximum number of virtual processors per partition is 64. The POWER Hypervisor saves and restores all necessary processor state when preempting or dispatching virtual processors, which for simultaneous multi-threading-enabled processors means two active thread contexts. The result for shared processors is that two of the logical CPUs are always scheduled together on the same physical processor; these sibling threads are always scheduled in the same partition.
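A short worked example, assuming only the rules stated above (at least 0.1 processing units per online virtual processor and a ceiling of 64 virtual processors per partition), shows how entitled capacity bounds the number of virtual processors that can usefully be brought online. The helper name is illustrative.

```c
/* Sketch of the capacity rules described above: each online virtual
 * processor must be backed by at least 0.1 processing units, and a
 * partition may have at most 64 virtual processors. */
#include <stdio.h>

/* Largest number of virtual processors the entitlement can support. */
static int max_online_vps(double entitled_capacity)
{
    /* Work in whole processor units (1% of a CPU) to avoid floating-point
     * truncation surprises; 10 processor units back one virtual processor. */
    int processor_units = (int)(entitled_capacity * 100.0 + 0.5);
    return processor_units / 10;
}

int main(void)
{
    double entitled = 2.2;        /* the equivalent of 2.2 physical processors */
    int vps = max_online_vps(entitled);
    if (vps > 64)
        vps = 64;                 /* architectural limit per partition         */
    printf("%.1f processing units can back at most %d virtual processors\n",
           entitled, vps);
    return 0;
}
```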
  24. A capped partition is not allowed to exceed its capacity entitlement, while an uncapped partition is; in fact, it may even exceed its maximum processor capacity. An uncapped partition is only limited in its ability to consume cycles by the number of online virtual processors and its variable capacity weight attribute. The variable capacity weight attribute is a number between 0 and 255, which represents the relative share of extra capacity that the partition is eligible to receive. This parameter applies only to uncapped partitions. A partition’s share is computed by dividing its variable capacity weight by the sum of the variable capacity weights of all uncapped partitions. Therefore, a value of 0 may be used to prevent a partition from receiving extra capacity; this is sometimes referred to as a “soft cap”. There is overhead associated with the maintenance of online virtual processors, so clients should carefully consider their capacity requirements before choosing values for these attributes. In general, the values of the minimum, desired, and maximum virtual processor attributes should parallel those of the minimum, desired, and maximum capacity attributes in some fashion. A special allowance should be made for uncapped partitions, since they are allowed to consume more than their entitlement. If the partition is uncapped, the administrator may want to define the desired and maximum virtual processor attributes some percentage above the corresponding entitlement attributes. The exact percentage is installation specific, but 25-50% is a reasonable starting point.
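The weight arithmetic can be shown with a small sketch. The weights and spare capacity below are made-up sample values; the point is only that each uncapped partition's share of unused cycles is its weight divided by the sum of all uncapped weights, and that a weight of 0 yields no extra capacity.

```c
/* Sketch of the variable capacity weight arithmetic described above. */
#include <stdio.h>

int main(void)
{
    int weights[] = { 128, 64, 0 };          /* three uncapped partitions (sample values) */
    int n = sizeof(weights) / sizeof(weights[0]);
    double spare_capacity = 1.5;             /* unused processing units in the pool       */

    int total = 0;
    for (int i = 0; i < n; i++)
        total += weights[i];

    for (int i = 0; i < n; i++) {
        double share = (total > 0) ? (double)weights[i] / total : 0.0;
        /* A weight of 0 acts as a "soft cap": the partition gets no extra capacity. */
        printf("partition %d receives %.2f extra processing units\n",
               i + 1, share * spare_capacity);
    }
    return 0;
}
```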
  25. The following sequence of charts shows the relationship between the different parameters used for controlling processor capacity attributes for a partition. In the example, the size of the shared pool is fixed – as is the capacity entitlement for the partition in which the workload is running. No other partitions are active – this allows the example workload to use all available resources and means that we are ignoring the effects of capacity weights.
  26. This is the baseline for our example. The partition is configured to have 16 virtual processors and is uncapped. Assuming, as we are, that there are no other partitions active, then this workload can use all 16 real processors in the pool. Note that the partition could have more than 16 virtual processors allocated. If that were the case, then all virtual processors would be scheduled and would be time-sliced across the available real processors. We’ll discuss scheduling in detail later. The dark area shows the number of available virtual processors. The lighter area shows the total amount of CPU resource being consumed. The workload completes in 26 minutes.
  27. This is exactly the same workload as before and uses exactly the same total amount of CPU resource. However, the number of virtual processors has been reduced to 12. Consequently, the workload is limited to using the equivalent of 12 real processors' worth of power; that is, a virtual processor cannot use more than one real processor's worth of power. Because of the reduced amount of CPU power available within any given time interval, the workload now requires 27 minutes to complete.
  28. Exactly the same workload as before, but now the partition is capped. For the first time, the capacity entitlement becomes effective, and the total amount of resource available within any given time interval (actually, every 10 ms) is limited to 9.5 processing units, that is, the equivalent of 9.5 real processors' worth of power. Note that all 12 of the virtual processors are being dispatched, but the scheduling algorithm in the POWER Hypervisor limits the amount of time each can be executing. The workload now requires 28 minutes to complete.
  29. One of the advantages of the shared processor architecture is that processor capacity can be changed without impacting applications or middleware. This is accomplished by modifying the entitled capacity or the variable capacity weight of the partition; however, the ability of the partition to utilize this extra capacity is restricted by the number of online virtual processors, so the user may have to increase this number in some cases to take advantage of the extra capacity. The main restriction here is that the CE per VP must remain at least 0.1. The variable capacity weight parameter applies to uncapped partitions. It controls the ability of the partition to receive cycles beyond its entitlement, which is dependent on there being unutilized capacity at the platform level. The client may want to modify this parameter if a partition is getting too much processing capacity or not enough. Real processors can, of course, only be added to or removed from the shared pool itself. If you recall the discussion on defining a partition, you will realize that removal of a processor from the shared pool may mean that the POWER Hypervisor can no longer guarantee the CE for all active partitions. Before the DLPAR operation can be honored, it may therefore be necessary to reduce the CE for some, or all, of the active partitions. Dynamic memory addition and removal is also supported. The only change in this area is that the size of the logical memory block (LMB) has been reduced from 256 MB to 16 MB to allow for thinner partitions. There is no impact associated with these changes, and the new LMB size applies to dedicated partitions also. The size of the LMB can be set at the service console. Notification of changes to these parameters will be provided so that applications, such as license managers, performance analysis tools, and high-level schedulers, can monitor and control the allocation and use of system resources in shared processor partitions. This may be accomplished through scripts, APIs, or kernel services. Other DLPAR operations perform as expected.
  30. Allocate processors, memory, and I/O to create virtual servers:
- Minimum 128 MB memory, one CPU, one PCI-X adapter slot
- All resources can be allocated independently
- Resources can be moved between live partitions
- Applications notified of configuration changes
- Movement can be automated using Partition Load Manager
- Works with AIX 5.2+ or Linux 2.4+
  31. This section provides a description of the new POWER Hypervisor.
  32. A major feature of the new POWER5 machines is a new, active Hypervisor that represents a convergence with iSeries systems. iSeries and pSeries machines will now have a common Hypervisor and common functionality, which will mean reduced development effort and faster time to market for new functions. However, each brand will retain a unique value proposition. New functions provided for pSeries are Shared Processor Partitions and Virtual I/O. Both of these have been available for iSeries on POWER4 systems, and pSeries gets the benefit of using tried and tested microcode to implement these functions on POWER5. iSeries benefits from the POWER Hypervisor convergence as well and gains the ability to run AIX in an LPAR (rather than the more limited PASE environment available today). There are some restrictions for the AIX environment on iSeries (for example, device support), and the primary reason for offering this function is to broaden the range of software applications available to iSeries customers.
  33. This is a simplified diagram showing the sourcing of different elements in the converged POWER Hypervisor. The blue boxes show functions that have been sourced either directly from the existing pSeries POWER4 Hypervisor or from the pSeries architecture. Purple boxes (lighter shading) show those sourced directly from the iSeries SLIC (System Licensed Internal Code) – which is part of OS/400. Some boxes are gradated, and these represent functions that combine elements of the pSeries and iSeries implementation models.
  34. The POWER Hypervisor provides the same basic functions as the POWER4 Hypervisor, plus some new functions designed for shared processor LPARs and virtual I/O. Combined with features designed into the POWER5 processor, the POWER Hypervisor delivers functions that enable other system technologies, including micro-partitioning, virtualized processors, an IEEE VLAN compatible virtual switch, virtual SCSI adapters, and virtual consoles. The POWER Hypervisor is a component of the system’s firmware that is always installed and activated, regardless of system configuration. It operates as a hidden partition, with no entitled capacity assigned to it. Newly architected Hypervisor calls (hcalls) provide a means for the operating system to communicate with the POWER Hypervisor, allowing more efficient usage of physical processor capacity by supporting the scheduling heuristic of minimizing idle time. The POWER Hypervisor is a key component of the functions shown in the chart. It performs the following tasks:
- Provides an abstraction layer between the physical hardware resources and the logical partitions using them
- Enforces partition integrity by providing a security layer between logical partitions
- Controls the dispatch of virtual processors to physical processors
- Saves and restores all processor state information during a logical processor context switch
- Controls hardware I/O interrupt management facilities for logical partitions
  35. The POWER4 processor introduced support for logical partitioning with a new privileged processor state called Hypervisor mode. It is accessed via a Hypervisor call function, which is generated by the operating system kernel running in a partition. Hypervisor mode allows for a secure mode of operation that is required for various system functions where logical partition integrity and security are required. The Hypervisor validates that the partition has ownership of the resources it is attempting to access, such as processor, memory, and I/O, then completes the function. This mechanism allows for complete isolation of partition resources. In the POWER5 processor, further design enhancements are introduced that enable the sharing of processors by multiple partitions. The Hypervisor decrementer (HDECR) is a new hardware facility in the POWER5 design that provides the POWER Hypervisor with a timed interrupt independent of partition activity. HDECR interrupts are routed directly to the POWER Hypervisor, and use only POWER Hypervisor resources to capture state information from the partition. The HDECR is used for fine grained dispatching of multiple partitions on shared processors. It also provides a means for the POWER Hypervisor to dispatch physical processor resources for its own execution. With the addition of shared partitions and SMT, a mechanism was required to track physical processor resource utilization at a processor thread level. System architecture for POWER5 introduces a new register called the processor utilization resource register (PURR) to accomplish this. It provides the partition with an accurate cycle count to measure activity during timeslices dispatched on a physical processor. The PURR is a POWER Hypervisor resource, assigned one per processor thread, that is incremented at a fixed rate whenever the thread running on a virtual processor is dispatched on a physical processor.
  36. Multiple logical partitions configured to run with a pool of shared physical processors require a robust mechanism to guarantee the distribution of available processing cycles. The POWER Hypervisor manages this task in the POWER5 processor-based servers. Each Micro-partition is configured with a specific processor entitlement, based on a quantity of processing units, which is referred to as the partition’s entitled capacity or capacity entitlement (CE). The entitled capacity, along with a defined number of virtual processors, defines the physical processor resource that will be allotted to the partition. The POWER Hypervisor uses the POWER5 HDECR, which is programmed to generate an interrupt every 10 ms, as a timing mechanism for controlling the dispatch of physical processors to system partitions. Each virtual processor is guaranteed to get its entitled share of processor cycles during each 10 ms dispatch window. The minimum amount of resource that the POWER Hypervisor will allocate to a virtual processor within a dispatch cycle is 1 ms of execution time per VP. This gives rise to the current restriction of 10 Micro-Partitions per physical processor. The POWER Hypervisor calculates the amount of time each VP will execute by reference to the CE (as shown on the slide). Note that the calculation for uncapped partitions is more complicated: it involves their capacity weight and depends on there being unused capacity available. The amount of time that a virtual processor runs before it is timesliced is based on the partition entitlement, which is specified indirectly by the system administrator. The partition entitlement is evenly distributed amongst the online virtual processors, so the number of online virtual processors impacts the length of each virtual processor’s dispatch cycle. The POWER Hypervisor uses the architectural metaphor of a “dispatch wheel” with a fixed rotation period of 10 milliseconds to guarantee that each virtual processor receives its share of the entitlement in a timely fashion. Virtual processors are time-sliced through the use of the hardware decrementer, much like the operating system time-slices threads. In general, the POWER Hypervisor uses a very simple scheduling model: processor entitlement is distributed with each turn of the POWER Hypervisor’s dispatch wheel, so each partition is guaranteed a relatively constant stream of service.
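Assuming only the figures given above (a 10 ms dispatch window and a 1 ms minimum slice), the sketch below computes the slice of each window that a virtual processor receives. The real Hypervisor scheduler also factors in uncapped weights and affinity, so this is illustrative arithmetic, not the dispatch algorithm itself.

```c
/* Sketch of the dispatch-wheel arithmetic described above: a virtual
 * processor's slice of a 10 ms window is its share of the partition's
 * entitled capacity, never less than the 1 ms minimum. */
#include <stdio.h>

#define DISPATCH_WINDOW_MS 10.0
#define MIN_SLICE_MS        1.0

static double vp_slice_ms(double entitled_capacity, int online_vps)
{
    double slice = (entitled_capacity / online_vps) * DISPATCH_WINDOW_MS;
    return (slice < MIN_SLICE_MS) ? MIN_SLICE_MS : slice;
}

int main(void)
{
    /* A partition entitled to 0.8 processing units spread over 2 VPs gets
     * 4 ms of physical processor time per VP in every 10 ms window. */
    printf("0.8 CE, 2 VPs -> %.1f ms per VP per window\n", vp_slice_ms(0.8, 2));
    printf("0.1 CE, 1 VP  -> %.1f ms per VP per window\n", vp_slice_ms(0.1, 1));
    return 0;
}
```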
  37. Virtual processors have dispatch latency, since they are scheduled. When a virtual processor is made runnable, it is placed on a run queue by the POWER Hypervisor, where it sits until it is dispatched. The time between these two events is referred to as dispatch latency. The dispatch latency of a virtual processor is a function of the partition entitlement and the number of virtual processors that are online in the partition. Entitlement is equally divided among these online virtual processors, so the number of online virtual processors impacts the length of each virtual processor’s dispatch. The smaller the dispatch cycle, the greater the dispatch latency. Timers have latency issues also. The hardware decrementer is virtualized by the POWER Hypervisor at the virtual processor level, so that timers will interrupt the initiating virtual processor at the designated time. If a virtual processor is not running, then the timer interrupt has to be queued with the virtual processor, since it is delivered in the context of the running virtual processor. External interrupts have latency issues also. External interrupts are routed directly to a partition. When the operating system makes the accept-pending-interrupt Hypervisor call, the POWER Hypervisor, if necessary, dispatches a virtual processor of the target partition to process the interrupt. The POWER Hypervisor provides a mechanism for queuing up external interrupts that is also associated with virtual processors. Whenever this queuing mechanism is used, latencies are introduced. These latency issues are not expected to cause functional problems, but they may present performance problems for real-time applications. To quantify matters, the worst case virtual processor dispatch latency is 18 milliseconds, since the minimum dispatch cycle that is supported at the virtual processor level is one millisecond. This figure is based on the minimum partition entitlement of 1/10 of a physical processor and the 10 millisecond rotation period of the Hypervisor's dispatch wheel. It can be easily visualized by imagining that a virtual processor is scheduled in the first and last portions of two 10 millisecond intervals. In general, if these latencies are too great, then clients may increase entitlement, minimize the number of online virtual processors without reducing entitlement, or use dedicated processor partitions.
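The 18 millisecond figure can be reproduced from the numbers already given: a virtual processor entitled to the minimum 1 ms slice might run at the very start of one 10 ms rotation and at the very end of the next.

```c
/* Worked check of the 18 ms worst-case dispatch latency quoted above. */
#include <stdio.h>

int main(void)
{
    double window_ms = 10.0;   /* dispatch wheel rotation period            */
    double slice_ms  = 1.0;    /* minimum per-VP dispatch cycle             */

    /* Gap between the end of the early slice and the start of the late one. */
    double worst_case_latency = 2.0 * window_ms - 2.0 * slice_ms;
    printf("worst-case dispatch latency: %.0f ms\n", worst_case_latency);
    return 0;
}
```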
  38. The POWER Hypervisor schedules shared processor partitions from a set of physical processors that is called the shared processor pool. By definition, these processors are not associated with dedicated partitions. In shared partitions, there is not a fixed relationship between virtual processors and the physical processors that actualize them. The POWER Hypervisor may use any physical processor in the shared processor pool when it schedules the virtual processor. By default, it attempts to use the same physical processor, but this cannot always be guaranteed. The POWER Hypervisor employs the notion of a home node for virtual processors, enabling it to select the best available physical processor from a memory affinity perspective for the virtual processor that is to be scheduled.
  39. Affinity scheduling is designed to preserve the content of memory caches, so that the working data set of a job can be read or written in the shortest time period possible. Affinity is actively managed by the POWER Hypervisor, since each partition has a completely different context. Currently, there is one shared processor pool, so all virtual processors are implicitly associated with the same pool. The POWER Hypervisor attempts to dispatch work in a way that maximizes processor, cache, and memory affinity. When the POWER Hypervisor is dispatching a VP (for example, at the start of a dispatch interval) it will attempt to use the same physical CPU as this VP was previously dispatched on, or a processor on the same chip, or on the same MCM (or in the same node). If a CPU becomes idle, the POWER Hypervisor will look for work for that processor. Priority will be given to runnable VPs that have an affinity for that processor. If none can be found, then the POWER Hypervisor will select a VP that has affinity to no real processor (for example, because previous affinity has expired) and, finally, will select a VP that is uncapped. The objective of this strategy is to try to improve system scalability by minimizing inter-cache communication.
  40. In general, operating systems and applications running in shared partitions need not be aware that they are sharing processors. However, overall system performance can be significantly improved by minor operating system changes. The main problem is that the POWER Hypervisor cannot distinguish between the OS doing useful work and, for example, spinning on a lock. The result is that the OS may waste much of its CE doing nothing of value. AIX 5L provides support for optimizing overall system performance of shared processor partitions. An OS therefore needs to be modified so that it can signal to the POWER Hypervisor when it is no longer able to schedule work and can give up the remainder of its time slice. This results in better utilization of the real processors in the shared processor pool. The dispatch mechanism utilizes hcalls to communicate between the operating system and the POWER Hypervisor. When a virtual processor is active on a physical processor and the operating system detects an inability to utilize processor cycles, it may cede or confer its cycles back to the POWER Hypervisor, enabling it to schedule another virtual processor on the physical processor for the remainder of the dispatch cycle. Reasons for a cede or confer may include the virtual processor running out of work and becoming idle, entering a spin loop to wait for a resource to free, or waiting for a long-latency access to complete. There is no concept of credit for cycles that are ceded or conferred; entitled cycles not used during a dispatch interval are lost. A virtual processor that has ceded cycles back to the POWER Hypervisor can be reactivated using a prod Hypervisor call. If the operating system running on another virtual processor within the logical partition detects that work is available for one of its idle processors, it can use the prod Hypervisor call to signal the POWER Hypervisor to make the prodded virtual processor runnable again. Once dispatched, this virtual processor resumes execution at the return from the cede Hypervisor call. The “payback” for the OS is that the POWER Hypervisor will redispatch it if it becomes runnable again during the same dispatch interval, allocating it the remainder of its CE if possible. While not required, the use of these primitives is highly desirable for performance reasons, because they improve locking and minimize idle time. Response time and throughput should be improved if these primitives are used. Their use is not required, because the POWER Hypervisor time-slices virtual processors, which enables it to sequence through each virtual processor in a continuous fashion. Forward progress is thus assured without the use of the primitives.
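The sketch below illustrates the shape of such a cooperative idle path. The h_cede() and h_prod() functions are hypothetical stand-ins for the Hypervisor calls an enlightened kernel would issue; the stubs only print what would happen, so this is an illustration of the idea rather than AIX or Linux source.

```c
/* Illustrative shape of an idle loop that cooperates with the Hypervisor. */
#include <stdio.h>

static int pending_work = 0;

static void h_cede(void)             /* give the rest of the dispatch cycle back */
{
    printf("h_cede: remaining entitlement returned to the shared pool\n");
}

static void h_prod(void)             /* another VP signals that work has arrived */
{
    printf("h_prod: ceded virtual processor made runnable again\n");
    pending_work = 1;
}

static void vp_idle_loop(void)
{
    while (!pending_work) {
        /* Nothing to run: ceding avoids burning entitled cycles in an idle
         * spin that the Hypervisor cannot tell apart from useful work. */
        h_cede();
        h_prod();                    /* simulate work arriving on another VP */
    }
    printf("idle loop exits: dispatch the newly runnable thread\n");
}

int main(void)
{
    vp_idle_loop();
    return 0;
}
```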
  41. In this example, there are three logical partitions defined, sharing the processor cycles from two physical processors, spanning two 10 ms Hypervisor dispatch intervals. Logical partition 1 is defined with an entitled capacity of 0.8 processing units and two virtual processors. This allows the partition 80% of one physical processor for each 10 ms dispatch window of the shared processor pool. For each dispatch window, the workload is shown to use 40% of each physical processor during each dispatch interval. It is possible for a virtual processor to be dispatched more than one time during a dispatch interval. Note that in the first dispatch interval, the workload executing on virtual processor 1 is not a continuous utilization of physical processor resource. This can happen if the operating system confers cycles and is then reactivated by a prod Hypervisor call. Logical partition 2 is configured with one virtual processor and a capacity of 0.2 processing units, entitling it to 20% usage of a physical processor during each dispatch interval. In this example, a worst-case dispatch latency is shown for this virtual processor, where the 2 ms are used at the beginning of dispatch interval 1 and at the end of dispatch interval 2, leaving 16 ms between processor allocations. Logical partition 3 contains three virtual processors, with an entitled capacity of 0.6 processing units. Each of the partition’s three virtual processors consumes 20% of a physical processor in each dispatch interval, but in the case of virtual processors 0 and 2, the physical processor they run on changes between dispatch intervals. The POWER Hypervisor does attempt to maintain physical processor affinity when dispatching virtual processors. It will always first try to dispatch the virtual processor on the same physical processor it last ran on and, depending on resource utilization, will broaden its search out to the other processor on the POWER5 chip, then to another chip on the same MCM, then to a chip on another MCM.
  42. This chart introduces POWER Hypervisor involvement in the virtual I/O functions described later. With the introduction of micro-partitioning, the ability to dedicate physical hardware adapter slots to each partition becomes impractical. Virtualization of I/O devices allows many partitions to communicate with each other, and to access networks and storage devices external to the server, without dedicating I/O to an individual partition. Many of the I/O virtualization capabilities introduced with the POWER5 processor-based IBM eServer products are accomplished by functions designed into the POWER Hypervisor. The POWER Hypervisor does not own any physical I/O devices, and it does not provide virtual interfaces to them. All physical I/O devices in the system are owned by logical partitions. Virtual I/O devices are owned by an I/O hosting partition, which provides access to the real hardware that the virtual device is based on. The POWER Hypervisor implements the following operations required by system partitions to support virtual I/O: providing control and configuration structures for the virtual adapter images required by the logical partitions, and operations that allow partitions controlled and secure access to physical I/O adapters in a different partition. Along with the operations listed above, the POWER Hypervisor allows for the virtualization of I/O interrupts. To maintain partition isolation, the POWER Hypervisor controls the hardware interrupt management facilities. Each logical partition is provided controlled access to the interrupt management facilities using hcalls. Virtual I/O adapters and real I/O adapters use the same set of Hypervisor call interfaces. Virtual I/O adapters are defined by system administrators during logical partition definition. Configuration information for the virtual adapters is presented to the partition operating system by the system firmware. Virtual TTY console support: each partition needs to have access to a system console. Tasks such as operating system installation, network setup, and some problem analysis activities require a dedicated system console. The POWER Hypervisor provides a virtual console using a virtual TTY or serial adapter and a set of Hypervisor calls to operate on them. Depending on the system configuration, the operating system console can be provided by the Hardware Management Console (HMC) virtual TTY or by a terminal emulator connected to physical serial ports on the system’s service processor.
  43. Processor utilization is a critical component of metering, performance monitoring, and capacity planning. With respect to POWER5 technologies, two new advances that will be commonly used combine to make the concept of utilization much more complex: partitioning (specifically, shared processor partitioning) and simultaneous multi-threading. Individually, they add complexity to this concept, but together they multiply the complexity. Some changes will be required to performance monitoring and accounting tools to support Micro-Partitioning. One issue that will need to be addressed is that CPU utilization (using traditional monitoring methods) will be recorded against CE. Clearly, an uncapped partition may exceed its CE and may therefore use more than 100% of its entitlement. Similarly, accounting tools (which rely on the 10 ms timer interrupt) may incorrectly record resource utilization for partitions that cede part of their dispatch interval (or which have picked up part of another via a confer Hypervisor call). The POWER5 processor architecture attempts to deal with these complex issues by introducing a new processor register that is intended for measuring utilization. This new register, the Processor Utilization Resource Register (PURR), is used to approximate the time that a virtual processor is actually running on a physical processor. The register advances automatically so that the operating system can always get the current, up-to-date value. The Hypervisor saves and restores the register across virtual processor context switches to simulate a monotonically increasing atomic clock at the virtual processor level.
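A minimal sketch of the accounting idea follows, using made-up register samples: the fraction of an interval that a virtual processor actually spent on a physical processor is the PURR delta divided by the elapsed-time delta over the same interval. Using the processor timebase as the wall-clock reference is an assumption of this sketch, not something stated above.

```c
/* Sketch of PURR-based utilization accounting. Register values are
 * made-up sample numbers, not real measurements. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical samples taken at the start and end of an interval. */
    unsigned long long purr_start = 1000000ULL, purr_end = 1450000ULL;
    unsigned long long tb_start   = 2000000ULL, tb_end   = 3000000ULL;

    double busy_fraction = (double)(purr_end - purr_start) /
                           (double)(tb_end - tb_start);
    /* 45% of the interval was spent dispatched on a physical processor.
     * Compared against entitled capacity instead of elapsed time, an
     * uncapped partition can legitimately exceed 100%. */
    printf("physical processor utilization: %.0f%%\n", busy_fraction * 100.0);
    return 0;
}
```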
  44. The Virtual I/O server is an appliance that provides virtual storage and shared Ethernet capability to client logical partitions on a POWER5 system. It allows a physical adapter on one partition to be shared by one or more partitions, enabling clients to consolidate and potentially minimize the number of physical adapters.
  45. The Virtual I/O Server provides a restricted, scriptable command line user interface (CLI). All aspects of Virtual I/O Server administration are accomplished through the CLI, including device management (physical, virtual, LVM), network configuration, software installation and update, security, user management, installation of OEM software, and maintenance tasks. The creation and deletion of the virtual client and server adapters is managed by the HMC GUI and POWER5 server firmware. The association between the client and server adapters is defined when the virtual adapters are created. The optional Advanced POWER Virtualization hardware feature, which enables micro-partitioning on POWER5 servers, is required to activate the Virtual I/O Server. A small logical partition with enough resources to share with other partitions is required. The following is a list of minimum hardware requirements to create the Virtual I/O Server partition:
- POWER5 server: a Virtual I/O capable machine
- Hardware Management Console: used to create the partition and assign resources
- Storage adapter: the server partition needs at least one storage adapter
- Physical disk: a disk large enough to make sufficiently sized logical volumes on it
- Ethernet adapter: used to securely route network traffic from a virtual Ethernet to a real network adapter
- Memory: at least 128 MB of memory
The Virtual I/O Server provides the Virtual SCSI (VSCSI) target and Shared Ethernet adapter virtual I/O functions to client partitions. This is accomplished by assigning physical devices to the Virtual I/O Server partition, then configuring virtual adapters on the clients to allow communication between the client and the Virtual I/O Server.
  46. Installation of the Virtual I/O Server partition is performed from a special mksysb CD that is provided to customers who order the Advanced POWER Virtualization feature. This is dedicated software for virtual I/O server operations only, so the virtual I/O server software is supported only in virtual I/O server partitions. The Virtual I/O Server partition itself is configured using a command line interface. Defining partition resources, such as virtual Ethernet or virtual disk connections to client systems, requires use of the HMC. The Virtual I/O Server supports the following operating systems as virtual I/O clients: AIX 5L Version 5.3, SUSE LINUX Enterprise Server 9 for POWER, and Red Hat Enterprise Linux AS for POWER Version 3. When we talk about providing high availability for the virtual I/O server, we are talking about incorporating the I/O resources (physical and virtual) on the virtual I/O server, as well as the client partitions, into a configuration that is designed to eliminate single points of failure. The virtual I/O server per se is not highly available. If there is a problem in the virtual I/O server or if it should crash, the client partitions will see I/O errors and will not be able to access the adapters and devices that are backed by the virtual I/O server. However, redundancy can be built into the configuration of the physical and virtual I/O resources at several stages. Since the virtual I/O server is an AIX-based appliance, redundancy for physical devices attached to the virtual I/O server can be provided by using capabilities like LVM mirroring, Multipath I/O, and EtherChannel. When running two instances of the virtual I/O server, you can use LVM mirroring, Multipath I/O, EtherChannel, or multipath routing with dead gateway detection in the client partition to provide highly available access to virtual resources hosted in the separate virtual I/O server partitions.
  47. The virtualization features of the POWER5 platform support up to 254 partitions, while the biggest planned server only provides up to 160 I/O slots. With each partition requiring at least one I/O slot for disk attachment and another one for network attachment, this puts a constraint on the number of partitions. To overcome these physical limitations, I/O resources have to be virtualized. Virtual SCSI provides the means to do this for storage devices. Beyond that, virtual I/O has a value proposition of its own. It allows the creation of logical partitions without the need for additional physical resources. This facilitates on demand computing and server consolidation. Virtual I/O also provides a more economic I/O model by using physical resources more efficiently through sharing. Furthermore, virtual I/O allows attachment of previously unsupported storage solutions. As long as the virtual I/O server supports the attachment of a storage resource, any client partition can access this storage by using virtual SCSI adapters. For example, at the time of writing, there is no native support for EMC storage devices on Linux. By running Linux in a logical partition of a POWER5 server, this becomes possible: a Linux client partition can access the EMC storage through a virtual SCSI adapter. Requests from the virtual adapters are mapped to the physical resources in the virtual I/O server partition. Driver support for the physical resources is therefore only needed in the virtual I/O server partition.
  48. Virtual SCSI is based on a client/server relationship. The virtual I/O server owns the physical resources and acts as the server. The logical partitions that access the virtual I/O resources provided by the virtual I/O server are the clients. The virtual I/O resources are assigned using an HMC. The virtual I/O server partition is often also referred to as the hosting partition, and the client partitions as hosted partitions. Virtual SCSI enables sharing of adapters as well as disk devices. To make a physical or a logical volume available to a client partition, it is assigned to a virtual SCSI server adapter in the virtual I/O server partition. The client partition accesses its assigned disks through a virtual SCSI client adapter; it sees standard SCSI devices and LUNs through this virtual adapter. Virtual SCSI resources can be assigned and removed dynamically. On the HMC, virtual SCSI client and server adapters can be assigned to and removed from a partition using dynamic logical partitioning. The mapping between physical and virtual resources on the virtual I/O server can also be done dynamically. This chart shows an example where one physical disk is split up into two logical volumes inside the virtual I/O server. Each of the two client partitions is assigned one logical volume, which it accesses through a virtual I/O adapter (vSCSI Client Adapter). Inside the partition, the disk is seen as a normal hdisk.
  49. A disk owned by the virtual I/O server can either be exported and assigned to a client partition as a whole or it can be split into several logical volumes. Each of these logical volumes can then be assigned to a different partition. A virtual disk device is mapped by the server VSCSI adapter to a logical volume and presented to the hosted partition as a physical direct access device. There can be many virtual disk devices mapped onto a single physical disk. The system administrator will create a virtual disk device by choosing a logical volume and binding it to a VSCSI server adapter. The virtual I/O adapters are connected to a virtual host bridge, which AIX treats much like a PCI host bridge. It is represented in the ODM as a bus device whose parent is sysplanar0. The virtual I/O adapters are represented as adapter devices with the virtual host bridge as their parent. On the virtual I/O server, each logical volume or physical volume that is exported to a client partition is represented by a virtual target device, which is a child of a virtual SCSI server adapter. On the client partition, the exported disks are visible as normal hdisks; however, they are defined in subclass vscsi. They have a virtual SCSI client adapter as parent. Note that virtual disks can be used as boot devices and as NIM targets. Virtual disks can be shared by multiple clients, allowing for configurations using concurrent LVM, for example.
  50. The SCSI family of standards provides many different transport protocols that define the rules for exchanging information between SCSI initiators and targets. Virtual SCSI uses the SCSI RDMA Protocol (SRP), which defines the rules for exchanging SCSI information in an environment where the SCSI initiators and targets have the ability to directly transfer information between their respective address spaces. SCSI requests and responses are sent using the Virtual SCSI adapters that communicate through the POWER Hypervisor. The actual data transfer however is done directly between a data buffer in the client partition and the physical adapter in the Virtual I/O Server by using the Logical Remote Direct Memory Access (LRDMA) protocol. This chart shows how the data transfer using LRDMA works.
  51. Using Virtual SCSI means the Virtual I/O Server acts like a storage box to provide the data. Instead of a SCSI or Fibre Channel cable, the connection is made by the POWER Hypervisor. The Virtual SCSI device drivers of the I/O Server and the POWER Hypervisor ensure that only the owning partition has access to its data; neither other partitions nor the I/O Server itself can make the client's data visible. Only the control information goes through the I/O Server; the data, however, is copied directly from the PCI adapter to the client's memory.
  52. Enabling VSCSI may not result in a performance benefit. This is because there is an overhead associated with Hypervisor calls, and because of the several steps involved in moving an I/O request from the initiator to the target partition, VSCSI will use additional CPU cycles when processing I/O requests. VSCSI devices will therefore not give the same performance as dedicated devices. The use of Virtual SCSI will roughly double the amount of CPU time needed to perform I/O compared to using directly attached storage; this CPU load is split between the Virtual I/O Server and the Virtual SCSI client. Performance is expected to degrade when multiple partitions are sharing a physical disk, and the actual impact on overall system performance will vary by environment. The base-case configuration is one physical disk dedicated to a partition. The following are general performance considerations when using Virtual SCSI:
- Since VSCSI is a client/server model, CPU utilization will always be higher than doing local I/O. A reasonable expectation is a total of twice as many cycles to do VSCSI as a locally attached disk I/O, more or less evenly distributed between the client and server.
- If multiple partitions are competing for resources from a VSCSI server, care must be taken to ensure enough server resources (CPU, memory, and disk) are allocated to do the job.
- If not constrained by CPU performance, dedicated partition throughput is comparable to doing local I/O.
- There is no data caching in memory on the server partition. Thus, all I/Os that it services are essentially synchronous disk I/Os. Because there is no caching in memory on the server partition, its memory requirements should be modest.
- The path of each virtual I/O request involves several sources of overhead that are not present in a non-virtual I/O request. For a virtual disk backed by the LVM, there is also the performance impact of going through the LVM and disk device drivers twice. (IBM eServer p5 Virtualization - Performance Considerations, SG24-5768)
  53. Supported devices: At the time of writing, virtual SCSI supports Fibre Channel, parallel SCSI, and SCSI RAID devices. Other devices, such as SSA, tape, or CD-ROM, are not supported. Number of adapters: Virtual SCSI itself has no limitation on the number of supported devices or adapters. However, the virtual I/O server partition supports a maximum of 65535 virtual I/O slots, and a maximum of 256 virtual I/O slots can be assigned to a single partition. Every I/O slot needs some resources to be instantiated, so the size of the virtual I/O server limits the number of virtual adapters that can be configured. SCSI commands: The SCSI protocol defines mandatory and optional commands. Virtual SCSI supports all the mandatory commands, but not all optional commands.
  54. VSCSI is not recommended for partitions with high performance and disk I/O requirements. Partitions with very low performance and disk I/O requirements can be configured at minimum expense to use only a logical volume. Using a logical volume for virtual storage means that the number of partitions is no longer limited by the hardware, but the trade-off is that some of the partitions will have less than optimal storage performance. Suitable uses for VSCSI include operating system boot disks and Web servers, which typically cache a lot of data.
  55. This chart shows a virtual I/O server configuration using LVM mirroring on the client partition. The client partition mirrors its logical volumes with LVM across two virtual SCSI client adapters, each assigned to a separate virtual I/O server partition. The two physical disks are each attached to a separate virtual I/O server partition and made available to the client partition through a virtual SCSI server adapter. A sketch of the client-side setup follows.
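A minimal sketch of the corresponding client-side commands, assuming the two virtual disks are seen as hdisk0 and hdisk1 and that rootvg is the volume group being mirrored (names are illustrative):

   # On the client partition: add the second virtual disk to rootvg and
   # mirror the logical volumes across both Virtual I/O Servers.
   extendvg rootvg hdisk1
   mirrorvg rootvg hdisk1
   bosboot -ad /dev/hdisk1            # make the second disk bootable
   bootlist -m normal hdisk0 hdisk1   # allow booting from either disk

With this setup, the client keeps running on the surviving copy if one virtual I/O server or its disk becomes unavailable.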
  56. This chart shows a configuration using Multipath I/O (MPIO) to access an ESS disk. The client partition sees two paths to the physical disk through MPIO. Each path uses a different virtual SCSI client adapter, and each of these adapters is backed by a separate virtual I/O server. This type of configuration only works when the physical disk is assigned as a whole to the client partition; you cannot split the physical disk into logical volumes at the virtual I/O server level. Depending on your SAN topology, each physical adapter could be connected to a separate SAN switch to provide redundancy, and at the physical disk level the ESS provides redundancy because it RAIDs the disks internally.
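On the client, the two paths can be checked and tuned roughly as follows (the disk name and attribute values are examples; this is a sketch, not a tuning recommendation):

   # On the client partition: show both paths to the MPIO-managed disk.
   lspath -l hdisk0

   # Enable periodic path health checking so a failed path is detected
   # and recovered automatically; -P applies the change at the next reboot.
   chdev -l hdisk0 -a hcheck_interval=60 -a hcheck_mode=nonactive -P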
  57. Virtual LAN (VLAN) is a technology for establishing virtual network segments on top of physical switch devices. If configured appropriately, a VLAN definition can straddle multiple switches. Typically, a VLAN is a broadcast domain that lets all nodes in the VLAN communicate with each other without any L3 routing or inter-VLAN bridging. In the diagram shown in this chart, two VLANs (VLAN 1 and 2) are defined on three switches (Switch A, B, and C). Although nodes C-1 and C-2 are physically connected to the same switch C, traffic between the two nodes can be blocked if they belong to different VLANs. To enable communication between VLAN 1 and 2, L3 routing or inter-VLAN bridging must be established between them; this is typically provided by an L3 device. The use of VLANs provides increased LAN security and more flexible network deployment than traditional network devices. VLAN support in AIX is based on the IEEE 802.1Q VLAN implementation, in which a VLAN ID tag is added to each Ethernet frame and the Ethernet switches restrict the frames to ports that are authorized to receive frames with that VLAN ID. Switches also restrict broadcasts to the logical network by ensuring that a broadcast packet is delivered only to ports configured to receive frames with the VLAN ID that the broadcast frame was tagged with. A port on a VLAN-capable switch has a default port VLAN ID (PVID) that indicates the default VLAN the port belongs to; the switch adds the PVID tag to untagged frames received on that port. In addition to the PVID, a port may belong to additional VLANs and have those VLAN IDs assigned to it. A port only accepts untagged packets or packets tagged with a VLAN ID (the PVID or an additional VID) of a VLAN the port belongs to. A port configured in untagged mode is only allowed to have a PVID and receives untagged packets or packets tagged with the PVID; this untagged-port feature helps systems that do not understand VLAN tagging communicate with other systems using standard Ethernet. Each VLAN ID is associated with a separate Ethernet interface to the upper layers (IP and so on) and creates a unique logical Ethernet adapter instance per VLAN (for example, ent1, ent2, and so on). You can configure multiple VLAN logical devices on a single system; each VLAN logical device constitutes an additional Ethernet adapter instance. These logical devices can be used to configure the same Ethernet IP interfaces as are used with physical Ethernet adapters. An illustrative example of creating such a device follows.
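As an illustrative sketch only (the base adapter ent0 and VLAN ID 100 are assumptions), a VLAN logical device can be created on AIX roughly as follows; the smitty vlan fast path offers the same function through SMIT:

   # Create a VLAN logical Ethernet device on top of physical adapter ent0
   # for VLAN ID 100; AIX creates a new adapter instance (for example, ent1).
   mkdev -c adapter -s vlan -t eth -a base_adapter=ent0 -a vlan_tag_id=100
   lsdev -Cc adapter             # the new VLAN adapter instance is listed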
  58. Virtual Ethernet enables inter-partition communication without the need for physical network adapters in each partition. It allows the administrator to define in-memory point-to-point connections between partitions. These connections exhibit characteristics similar to high-bandwidth Ethernet connections and support multiple protocols (IPv4, IPv6, and ICMP). Virtual Ethernet requires a POWER5 system with either AIX 5L V5.3 or the appropriate level of Linux, and a Hardware Management Console (HMC) to define the Virtual Ethernet devices. Virtual Ethernet does not require the purchase of any additional features or software, such as the Advanced POWER Virtualization feature. Virtual Ethernet is also called Virtual LAN or even VLAN, which can be confusing because these terms are also used in network topology; the Virtual Ethernet described here, which uses virtual devices, is not the same as the VLAN concept from network topology, which divides a LAN into sub-LANs.
  59. The Virtual Ethernet connections supported in POWER5 systems use VLAN technology to ensure that partitions can only access data directed to them. The POWER Hypervisor provides a Virtual Ethernet switch function based on the IEEE 802.1Q VLAN standard, which allows partitions to communicate within the same server. Partitions that want to communicate through a Virtual Ethernet channel need an in-memory channel, which the user requests through the HMC. The kernel creates a virtual adapter for each memory channel indicated by the firmware, and the normal AIX configuration routines create the device special files. A virtual LAN adapter appears to the operating system in the same way as a physical adapter. A unique Media Access Control (MAC) address is generated when the user creates a Virtual Ethernet adapter; a prefix value can be assigned for the system so that the generated MAC addresses consist of a common system prefix plus an algorithmically generated part that is unique per adapter. The MAC address of the virtual adapter is generated by the HMC. The transmission speed of Virtual Ethernet adapters is in the range of 1-3 gigabits per second, depending on the maximum transmission unit (MTU) size. Like Gigabit (Gb) Ethernet, the Virtual Ethernet adapter supports the standard MTU size of 1500 bytes and jumbo frames of 9000 bytes. In addition, Virtual Ethernet supports an MTU size of 65280 bytes, which is not available on physical Gb Ethernet and can therefore only be used inside a Virtual Ethernet. A partition can support up to 256 Virtual Ethernet adapters, with each Virtual Ethernet adapter capable of being associated with up to 18 VLANs. The Virtual Ethernet can also be used as a bootable device, which allows tasks such as operating system installation to be performed using NIM.
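Once the virtual adapter defined on the HMC is discovered by AIX (as, say, ent1 with interface en1), configuring it is no different from configuring a physical adapter; the host name and addresses below are placeholders:

   # On the client partition: configure TCP/IP on the virtual Ethernet interface.
   mktcpip -h lpar1 -a 9.3.5.12 -m 255.255.255.0 -i en1
   entstat -d ent1               # confirms the adapter type and its VLAN settings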
  60. The POWER Hypervisor switch is consistent with IEEE 802.1Q. This standard defines the operation of virtual LAN (VLAN) bridges that permit the definition, operation, and administration of VLAN topologies within a bridged LAN infrastructure. The switch works at OSI Layer 2 and supports up to 4096 networks (4096 VLAN IDs). The Hypervisor acts as a virtual Ethernet switch and maintains queues for each VLAN in its own memory. IEEE 802.1Q requires a VLAN ID (VID); in this implementation, specifying a VID is optional. When this option is selected while adding a new Virtual LAN interface at the HMC, a VID can be chosen. Up to 4094 Virtual LANs are supported, and up to 18 VIDs can be configured per Virtual LAN port. The authority to communicate between LPARs is granted by configuring ports on the virtual Ethernet switch maintained by the Hypervisor; the switch configuration is defined using the HMC. When frames are sent across the network, a tag header indicates to which VLAN a frame belongs, which ensures that the switch forwards the frame only to those ports that belong to that VLAN. Untagged packets are handled by adding the port VLAN identifier (PVID) to each frame.
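The VLAN configuration assigned to the virtual switch port on the HMC can be verified from inside a partition; for a virtual Ethernet adapter, the detailed entstat output includes the port VLAN ID and any additional VLAN tag IDs (the adapter name is an example):

   # Display detailed adapter statistics, including VLAN information.
   entstat -d ent1 | grep -i vlan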
  61. When a message arrives at a Logical LAN switch port from a Logical LAN adapter, the POWER Hypervisor caches the message's source MAC address to use as a filter for future messages to the adapter. If the port is configured for VLAN headers, the VLAN header is checked against the port's allowable VLAN list; if the VLAN specified in the message is not in the port's configuration, the message is dropped. Once the message passes the VLAN header check, it proceeds to destination MAC address processing. If the port is not configured for VLAN headers, the Hypervisor (conceptually) inserts a two-byte VLAN header based on the port's configured VLAN number. Next, the destination MAC address is processed by searching the table of cached MAC addresses built from messages received at Logical LAN switch ports (see above). If no match for the MAC address is found and there is no Trunk Adapter defined for the specified VLAN number, the message is dropped; if no match is found but a Trunk Adapter is defined for the specified VLAN number, the message is passed on to the Trunk Adapter. If a MAC address match is found, the associated switch port's configured allowable VLAN number table is scanned for a match to the VLAN number contained in the message's VLAN header; if no match is found, the message is dropped. Finally, the VLAN header configuration of the destination switch port is checked: if the port is configured for VLAN headers, the message is delivered to the destination Logical LAN adapter with any inserted VLAN header included; if the port is configured for no VLAN headers, the VLAN header is removed before the message is delivered to the destination Logical LAN adapter.
  62. The measurements shown were taken on a 4-way POWER5 system running AIX 5L V5.3 with several partitioning configurations. SMT (Simultaneous Multi-Threading) was turned on, and default settings were used for the Virtual LAN adapters and the Gigabit Ethernet adapter. Virtual Ethernet connections generally take more processor time than a local adapter to move a packet (a memory copy instead of DMA). For shared processor partitions, performance will be gated by the partition definitions (for example, entitled capacity and number of processors), and small partitions communicating with each other will experience more packet latency due to partition context switching. In general, high-bandwidth applications should not be deployed in small shared processor partitions. For dedicated partitions, throughput should be comparable to 1 Gigabit Ethernet for small packets and much better than 1 Gigabit Ethernet for large packets; for large packets, Virtual Ethernet communication is limited by copy bandwidth. The throughput of the Virtual Ethernet scales nearly linearly with the allocated capacity entitlement, which shows that there is no measurable overhead when using shared processors instead of dedicated processors for traffic between Virtual LANs. Throughput increases, as expected, with growing MTU sizes (by a factor of roughly 3 from MTU 1500 to 9000, and by a factor of more than 7 from 1500 to 65394). The Virtual Ethernet adapter has higher raw throughput at all MTU sizes; at MTU 9000 the difference in throughput is very large because the in-memory copy that Virtual Ethernet uses to transfer data is more efficient at larger MTU sizes.
  63. The following limitation must be considered when implementing Virtual Ethernet: Virtual Ethernet uses the system processors for all communication functions instead of offloading that load to processors on network adapter cards. As a result, the use of Virtual Ethernet increases the load on the system processors. (Introduction to Advanced POWER Virtualization on IBM eServer p5 Servers, SG24-7940)
  64. Because there is still only limited experience with Virtual LANs, these guidelines should not be taken as a performance guarantee; they are intended for orientation only. Know your environment and the network traffic. Choose as high an MTU size as makes sense for the traffic in the Virtual LAN. Use an MTU size of 65394 if you expect a large amount of data to be copied inside your Virtual LAN. Enable tcp_pmtu_discover and udp_pmtu_discover in conjunction with MTU size 65394 if there is communication with physical adapters. Do not turn off SMT (Simultaneous Multi-Threading) unless your applications demand it. Throughput in Virtual LANs scales linearly with CPU entitlement, so there is no need to give partitions dedicated CPUs solely for Virtual LAN performance. A sketch of these settings follows.
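A sketch of the settings above, assuming the virtual Ethernet interface is en1 (the changes shown take effect immediately but are not made persistent here):

   # Use the large Virtual Ethernet MTU on the interface.
   chdev -l en1 -a mtu=65394

   # Enable path MTU discovery so connections that leave the system through
   # a physical adapter do not try to use the oversized MTU end to end.
   no -o tcp_pmtu_discover=1
   no -o udp_pmtu_discover=1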
  65. There are two ways to connect the Virtual Ethernet, which enables communication between logical partitions on the same server, to an external network: routing and the Shared Ethernet Adapter. By enabling the AIX routing capabilities (the ipforwarding network option), one partition with a physical Ethernet adapter connected to an external network can act as a router. In this type of configuration, the partition that routes the traffic to the external network does not have to be the virtual I/O server; it can be any partition with a connection to the outside world. The client partitions have their default route set to the partition that routes traffic to the external network. This example shows two systems with VLANs. The first one has an internal VLAN with subnet 3.1.1.x, and the other has subnet 4.1.1.x. The first system has a partition that routes the internal VLAN to an external LAN on subnet 1.1.1.x; another server is connected to this subnet as well (1.1.1.10). Similarly, the other system has a partition that routes that system's internal VLAN to the external 2.1.1.x subnet. An external IP router connects the two external subnets together. The sketch below illustrates the configuration on the routing partition and its clients.
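A minimal sketch of this routing approach, assuming the routing partition's internal VLAN address is 3.1.1.1 (a hypothetical value; only the subnets are given in the example):

   # On the routing partition (the one with the physical adapter):
   no -o ipforwarding=1          # enable IP forwarding between interfaces

   # On each client partition on the internal VLAN: send outbound traffic
   # through the routing partition.
   route add default 3.1.1.1

Note that settings made with no and route in this way do not survive a reboot unless they are also made persistent through the usual AIX mechanisms.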
  66. Using a Shared Ethernet Adapter (SEA), you can connect internal and external VLANs using one physical adapter. Shared Ethernet Adapter is a new service that acts as a layer 2 network switch to securely bridge network traffic from a Virtual Ethernet to a real network adapter. The Shared Ethernet Adapter service runs in the Virtual I/O server partition.
  67. The Shared Ethernet Adapter allows partitions to communicate outside the system without having to dedicate a physical I/O slot and a physical network adapter to a client partition. The Shared Ethernet Adapter has the following characteristics: Virtual Ethernet MAC addresses are visible to outside systems; broadcast and multicast are supported; ARP and NDP work across a shared Ethernet. In order to bridge network traffic between the Virtual Ethernet and external networks, the Virtual I/O Server partition has to be configured with at least one physical Ethernet adapter. One Shared Ethernet Adapter can be shared by multiple VLANs, and multiple subnets can connect using a single adapter on the Virtual I/O Server. A Virtual Ethernet adapter configured into a Shared Ethernet Adapter must have the trunk flag set. Once an Ethernet frame is sent from the Virtual Ethernet adapter on a client partition to the POWER Hypervisor, the POWER Hypervisor searches for the destination MAC address within the VLAN; if no such MAC address exists within the VLAN, it forwards the frame to the trunk Virtual Ethernet adapter that is defined on the same VLAN. The trunk Virtual Ethernet adapter enables a layer-2 bridge to a physical adapter. The Shared Ethernet Adapter directs packets based on the VLAN ID tags, which it learns by observing the packets originating from the virtual adapters. One of the virtual adapters in the Shared Ethernet Adapter is designated as the default PVID adapter; Ethernet frames without any VLAN ID tag are directed to this adapter and assigned the default PVID. When the Shared Ethernet Adapter receives IP (or IPv6) packets that are larger than the MTU of the adapter that the packet is forwarded through, either IP fragmentation is performed and the fragments are forwarded, or an ICMP "packet too big" message is returned to the source when the packet cannot be fragmented. A sketch of the Shared Ethernet Adapter setup follows.
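A sketch of creating the Shared Ethernet Adapter on the Virtual I/O Server, assuming ent0 is the physical adapter and ent2 is the trunk virtual Ethernet adapter defined on the HMC with PVID 1 (adapter names are illustrative):

   # On the Virtual I/O Server (padmin): bridge the trunk virtual adapter
   # to the physical adapter; a new SEA device (for example, ent3) is created.
   mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
   lsmap -all -net               # verify the Shared Ethernet Adapter mapping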
  68. As with Virtual SCSI, the POWER Hypervisor also provides the connection between partitions when using Virtual Ethernet. Inside the server, the POWER Hypervisor acts like an Ethernet switch. The connection to the external network is made by the Virtual I/O Server's Shared Ethernet function, which acts as a Layer 2 bridge to the physical adapters. The Virtual Ethernet implementation follows the IEEE 802.1Q standard, which describes VLAN (virtual local area network) tagging: a VLAN ID tag is inserted into every Ethernet frame, and the Ethernet switch restricts the frames to the ports that are authorized to receive frames with that VLAN ID. Every port of an Ethernet switch can be configured to be a member of several VLANs. Only the network adapters, virtual or physical, that are connected to a port (virtual or physical) belonging to the same VLAN can receive these frames. The implementation of this VLAN standard ensures that partitions have no access to other partitions' data.
  69. The measurements shown were taken on a 4-way POWER5 system running AIX 5L V5.3 with several partitioning configurations. SMT (Simultaneous Multi-Threading) was turned on, and default settings were used for the Virtual LAN adapters and the Gigabit Ethernet adapter. The Shared Ethernet Adapter allows the physical adapters to stream data at media speed as long as the Virtual I/O Server has enough CPU entitlement. This chart shows the throughput of the Virtual I/O Server at MTU sizes of 1500 and 9000, in both simplex and duplex modes. CPU utilization per gigabit of throughput is higher with the Shared Ethernet Adapter because it has to receive on one side and send out the other, and because of the bridging functionality in the Virtual I/O Server.
  70. You must consider the following limitations when implementing Shared Ethernet Adapters in the Virtual I/O Server: Because the Shared Ethernet Adapter depends on Virtual Ethernet, which uses the system processors for all communication functions, a significant amount of system processor load can be generated by the use of Virtual Ethernet and the Shared Ethernet Adapter. One of the virtual adapters in the Shared Ethernet Adapter on the Virtual I/O Server must be defined as the default adapter with a default PVID; this virtual adapter is designated as the PVID adapter, and Ethernet frames without any VLAN ID tag are assigned the default PVID and directed to it. Up to 16 Virtual Ethernet adapters, each with up to 18 VLANs, can be shared on a single physical network adapter. There is no limit on the number of partitions that can attach to a VLAN, so the theoretical limit is very high; in practice, the amount of network traffic limits the number of clients that can be served through a single adapter. The Shared Ethernet Adapter requires the POWER Hypervisor component of POWER5 systems and therefore cannot be used on POWER4 systems. It also cannot be used with AIX 5L Version 5.2, because the device drivers for Virtual Ethernet are only available for AIX 5L Version 5.3 and Linux; thus, there is no way to connect an AIX 5L Version 5.2 system to a Shared Ethernet Adapter.
  71. Because there is still only limited experience with the Virtual I/O Server and the Shared Ethernet Adapter, these guidelines should not be taken as a performance guarantee; they are intended for orientation only. Know your environment and the network traffic. Do not use the Shared Ethernet Adapter functionality of the Virtual I/O Server if you expect heavy network traffic between Virtual LANs and local networks; use a dedicated network adapter instead. If possible, use dedicated CPUs for the Virtual I/O Server (no shared processors). Choose an MTU size of 9000 if this makes sense for your network traffic. Do not use the Shared Ethernet Adapter functionality of the Virtual I/O Server for latency-critical applications. With an MTU size of 1500, you need about one CPU per Gigabit Ethernet adapter streaming at media speed; with an MTU size of 9000, two Gigabit Ethernet adapters can stream at media speed per CPU.
  72. In order to bridge network traffic between the Virtual Ethernet and external networks, the Virtual I/O Server has to be configured with at least one physical Ethernet adapter. One Shared Ethernet Adapter can be shared by multiple VLANs and multiple subnets can connect using a single adapter on the Virtual I/O Server. The chart shows a configuration example. A Shared Ethernet Adapter can include up to 16 Virtual Ethernet adapters that share the physical access.
  73. There are several different ways to configure physical and Virtual Ethernet adapters into Shared Ethernet Adapters to maximize throughput. Using several Shared Ethernet Adapters provides more queues and therefore more performance. An example of this configuration is shown in this chart.
  74. This chart shows a configuration using multipath routing and dead gateway detection. The client partition has two virtual Ethernet adapters, each assigned to a different VLAN (using the PVID). Each virtual I/O server is configured with a Shared Ethernet Adapter that bridges traffic between the virtual Ethernet and the external network, and each of these Shared Ethernet Adapters is assigned to a different VLAN (using the PVID). By using two VLANs, network traffic is separated so that each virtual Ethernet adapter in the client partition appears to be connected to a different virtual I/O server. In the client partition, two default routes with dead gateway detection are defined: one route goes to gateway 9.3.5.10 through the virtual Ethernet adapter with address 9.3.5.12, and the second default route goes to gateway 9.3.5.20 through the virtual Ethernet adapter with address 9.3.5.22. In case of a failure of the primary route, access to the external network is provided through the second route; AIX detects the route failure and adjusts the cost of the route accordingly. Restriction: It is important to note that multipath routing and dead gateway detection do not make an IP address highly available. In case of the failure of one path, dead gateway detection routes traffic through an alternate path; the network adapters and their IP addresses remain unchanged. Therefore, with multipath routing and dead gateway detection, only your access to the network becomes redundant, not the IP addresses themselves. A sketch of the route setup follows.
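A sketch of the client-side route definitions described above, using the AIX active dead gateway detection route flag and the addresses from the chart (treat the exact flag usage as an assumption to verify against your AIX level):

   # On the client partition: define two default routes, one per virtual
   # Ethernet adapter, with active dead gateway detection enabled.
   route add -active_dgd default 9.3.5.10
   route add -active_dgd default 9.3.5.20
   netstat -rn                   # both default routes should be listed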
  75. For more details, refer to the Introduction to Advanced POWER Virtualization on IBM eServer p5 Servers, SG24-7940 redbook.
  76. For more details, refer to the Introduction to Advanced POWER Virtualization on IBM eServer p5 Servers, SG24-7940 redbook.