Tinker Twins – TB2095 - Technical tactics for enterprise storage
HP Master Technologists: Greg & Chris Tinker, presentation deck from HP Discover 2012 Las Vegas “Technical tactics for enterprise storage”

Tinker Twins – TB2095 - Technical tactics for enterprise storage: Presentation Transcript

  • Tinker, Greg & Chris – HP Master Technologists
  • Technical tactics for enterprise storage: HP P9000 (XP), P10000 (3PAR), P6000 (EVA), P4000 (LeftHand), and X9000 (Ibrix). Tinker, Greg & Chris – HP Master Technologists, June 2012.
  • Forward-looking statements: This session is high level and is subject to change without notice. This document contains forward-looking statements regarding future operations, product development, and product capabilities. This information is subject to substantial uncertainties and is subject to change at any time without prior notification. Statements contained in this document concerning these matters only reflect Hewlett-Packard's predictions and/or expectations as of the date of this document, and actual results and future plans of Hewlett-Packard may differ significantly as a result of, among other things, changes in product strategy resulting from technological, internal corporate, market, and other changes. This is not a commitment to deliver any material, code, or functionality and should not be relied upon in making purchasing decisions.
  • The HP storage portfolio:
    - Solutions: E5000 for Exchange, HP VirtualSystem, HP CloudSystem, HP AppSystem
    - Online storage: X1000/X3000, X5000, X9000, P2000, P4000, P6000 EVA, P10K/3PAR, P9500 XP
    - Infrastructure: HP Networking (wired, wireless, data center, security & management portfolio), SAN connection (B, C & H series FC switches & directors, enterprise switches)
    - Nearline: ESL, EML, and MSL tape libraries, virtual library systems, D2D backup systems, RDX, tape drives & tape autoloaders
    - Software: Business Copy, Continuous Access, Data Protector, Storage Array Software, Storage Mirroring, Cluster Extension, Data Protector Express, Storage Essentials
    - HP Services
  • Where to begin
  • Technical tactics for enterprise storage – agenda:
    - Device server (the array): front end (CHA/FA, cache, MP/ASIC, bus, CMD IOCTL, etc.) – overhead and/or saturation; back end – hot disks, slow disks, array groups, storage tiers, external storage
    - Host / application: I/O profile (stride, reverse, random, sequential, buffered/non-buffered, sync/async...); CPU (interrupts, switches, CPU frequency...); parallelism – keeping the pipe full
    - Storage connectivity: flow control; latency
    - Debugging: know the layers
  • Device server
  • Device server – storage application positioning:
    - P2000 MSA (consolidation): dual controller; SAS, iSCSI, FC; ~30K random read IOPS, 1.5 GB/s sequential reads; sweet spot: SMB, enterprise ROBO, virtualization, server attach, video surveillance; capacity 600 GB – 192 TB (6 TB average); key features: price/performance, controller choice; OS support: Windows, vSphere, HP-UX, Linux, OVMS, Mac OS X, Solaris, Hyper-V
    - P4000 LeftHand (IT consolidation): scale-out cluster; iSCSI; ~35K random read IOPS, 2.6 GB/s sequential reads; sweet spot: consolidation/virtualized incl. VDI, Microsoft apps, BladeSystem SAN (P4800); capacity 7 TB – 768 TB (72 TB average); key features: all-inclusive SW, multi-site DR included, multi-site failover, VM integration, Virtual SAN Appliance; OS support: Windows, Linux, HP-UX, vSphere, MacOS X, AIX, Solaris, XenServer
    - P6000 EVA (virtual storage): dual controller; FC, iSCSI, FCoE; ~55K random read IOPS, 1.7 GB/s sequential reads; sweet spot: SMB, ROBO and enterprise – Microsoft, virtualized, OLTP; capacity 2 TB – 480 TB (36 TB average); key features: ease of use and simplicity, integration/compatibility, replication; OS support: Windows, VMware, HP-UX, Linux, OVMS, Mac OS X, Solaris, AIX
    - P10000 3PAR (utility storage): mesh-active cluster; iSCSI, FC, (FCoE); >400K random IOPS, >10 GB/s sequential reads; sweet spot: enterprise and service provider, utilities, cloud, virtualized environments, OLTP, mixed workloads; capacity 5 TB – 1600 TB (120 TB average); key features: multi-tenancy, efficiency (thin provisioning), multi-site disaster recovery, autonomic tiering and management; OS support: Windows, Linux, HP-UX, AIX, Solaris and NonStop
    - P9500 XP (mission critical): fully redundant; FC, FCoE; >300K random IOPS (ThP), >10 GB/s sequential reads; sweet spot: large enterprise mission critical, virtualized with extreme availability, multi-site DR; capacity 10 TB – 2000 TB (150 TB average); key features: constant data availability, heterogeneous virtualization, Smart Tiers, application QoS (APEX); OS support: all major OSs including mainframe
  • Device server – front end: midrange controller architecture (ALUA). Typical midrange controller design: clustered dual controllers, host ports, mirrored cache, and a set of software features. [Diagram: two controllers, each with CPU and cache, Fibre Channel or SAS front-end ports, mirrored cache, and a SAS back end.]
  • Device server – front end: integrated processors vs. SMP distributed processors.
    - With the XP24000, up to 128 MPs were located directly on the CHAs and DKAs. Each MP has specific task sets and limited performance, and all MPs competed for Shared Memory (SM) access and locks.
    - With the P9500, the much faster multi-core MPs reside on Microprocessor Blades (MPB) and are independent of specific responsibilities; all MPs share responsibility for the operation of the whole array. The MP Blades have Local Memory (LM) to store Shared Memory content, reducing SM traffic.
    [Diagram: XP24000 CHA/DKA boards with on-board MPs and shared memory vs. P9500 MPBs with local memory, cache, CSW/ESW.]
  • Device server – front end: integrated processors vs. distributed processors (SMP).
    - XP24000: LDEV ownership and SW features are shared between all MPs; all MPs compete for Shared Memory access and locks, so Shared Memory may become a bottleneck.
    - P9500: LDEV ownership and SW features are dedicated to one multi-core Microprocessor Blade (MPB), where all cores share responsibility and load; most Shared Memory traffic occurs locally against the Local Memory (LM), eliminating Shared Memory as the bottleneck.
  • Device server – back end: physical to logical layout. [Diagram: parity group PG 1-1 maps to LDEV 0:00, presented on ports CL1A and CL2A; shared memory (plus LM on the P9500), cache, and the MP board on the SMP architecture sit between the CHIP ports and the disks.]
  • Device server – back end: HDD understanding.
    - The number and type of disk drives installed in an array is flexible, and actual counts vary by model.
    - Averaged IOPS/drive varies across array models due to factors other than the physical drive, such as ASIC, MP SMP roles, and cache slot boundaries, to name a few.
    - Average IOPS per drive or per array group gives only a small glimpse into what an array can do. One must consider the cache slot size (256 KB for P9000, 16 KB for P10000) and the average expected latency per drive (designs usually target around 8 ms).
    - Max # of SSD drives: 128 with one DKC; 256 with 2 DKCs. Each disk logs in to the SAS switch at max SAS speed.
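    The per-drive caveat above is often made concrete with a textbook back-end sizing rule (a hedged sketch, not from this deck; the drive IOPS, read/write mix, and RAID write penalty below are illustrative standard values):

        % Effective host IOPS from an array group of N drives. p is the RAID
        % write penalty (RAID-1: 2, RAID-5: 4, RAID-6: 6); r and w are the
        % read and write fractions of the workload.
        \[ \mathrm{IOPS}_{host} \approx \frac{N \cdot \mathrm{IOPS}_{drive}}{r + w \cdot p} \]
        % Example: 8 drives at 180 IOPS each, 70/30 read/write, RAID-5 (p = 4):
        \[ \frac{8 \cdot 180}{0.7 + 0.3 \cdot 4} = \frac{1440}{1.9} \approx 758 \ \text{host IOPS} \]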
  • Device server – back end: high-level maximums.
    - P10000: 1920 internal disks; 1600 TB internal capacity; 192 FC host ports; 768 GB cache; >360,000 disk IOPS (SPC-1: 450,213)
    - XP24K: 1152 internal disks; 688/2260 TB internal capacity (600 GB FC / 2 TB SATA 3.5" disks); 247 PB subsystem capacity (internal + external); 224 FC host ports; 65,280 LDEVs; 512 GB cache; 11 GB/s disk bandwidth; SPC-1 >160k disk IOPS
    - P9500: 2048 internal disks; 2240 TB internal capacity; 247 PB subsystem capacity (internal + external); 192 FC host ports; 65,280 LDEVs; 1024 GB cache; >15 GB/s disk bandwidth; >350k disk IOPS
    - P4000 (LeftHand): >1120 internal disks; ~64 1GbE host ports
    - P6000 (EVA): 450 (SFF) / 240 (LFF) internal disks; 480 TB internal capacity; 8 host ports; 2047 LUNs; 22 GB cache per controller; ~1.6 GB/s disk bandwidth; ~55k disk IOPS
    - P2000 MSA: 96 (LFF) / 149 (SFF) internal disks; ~130 TB internal capacity; ~4 host ports (4 Gb/controller); 512 LUNs; 2 GB cache per controller; ~1.6 GB/s disk bandwidth; ~16k disk IOPS
    - X9000 (IBRIX): 2048 internal disks; 1840/2000 TB internal capacity (900 GB SAS / 1 TB NL SAS 2.5" disks); 16 PB subsystem capacity (internal + external); 512 LUNs; 32 GB cache / 768 GB memory; 80 GB/s disk bandwidth (32 nodes)
  • Device server – back end: avoid disk access on the critical application path. [Chart: time breakdown (0–10 ms) of an 8 KB transfer through a SCSI controller (same for FC drives) – seek plus rotate for 5400, 7200, 10K, and 15K RPM drives, plus Ultra-SCSI bus time (half for an FC bus).]
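    The seek-plus-rotate bands in that chart follow from drive mechanics; as a worked example (standard formula, numbers not from the deck):

        \[ t_{rot,avg} = \frac{1}{2} \cdot \frac{60{,}000\ \mathrm{ms}}{\mathrm{RPM}} \]
        % 15K RPM -> 2.0 ms, 10K -> 3.0 ms, 7200 -> 4.17 ms, 5400 -> 5.56 ms

    Adding a typical 3-5 ms average seek explains why a single random 8 KB access lands in the 5-10 ms band shown.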
  • Device server – back end: HDD, LDEV, ThP, V-Vol, and pool-vol. [Diagram: one PDEV (1 of 4 in a 4-disk RAID set) is carved into LDEV #1, LDEV #2, ... LDEV #n; the space between them represents a clear divide between the LDEVs on a single physical device (PDEV). LDEVs are grouped into an array group, RSS, etc.]
  • Device server – back end: queue depth overview. Read-ahead for sequential contiguous blocks is done before HDD access; past that point, each block requires a physical disk access and a seek to the physical location. Best designs try to hold each drive at an average queue depth between 2 and 4, depending on load. Queue depth at the HDD level is not captured by all array performance tools, and varies by model. Note: it is very important to understand that each I/O block may or may not be equal to a cache slot depending on array model – this plays a large role in performance characteristics.
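    On a Linux host, the analogous per-LUN queue can be inspected and tuned through sysfs; a minimal sketch (sdb and the value 16 are placeholders, and the 2-4 figure above applies per physical drive inside the array, not per host LUN):

        #!/bin/sh
        # Current queue depth of a SCSI block device (standard sysfs attribute).
        cat /sys/block/sdb/device/queue_depth
        # Lower it while chasing a queuing bottleneck (illustrative value).
        echo 16 > /sys/block/sdb/device/queue_depth
        # I/Os actually in flight right now (reads writes), on newer kernels.
        cat /sys/block/sdb/inflight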
  • Device server – back end: latency with respect to utilization. Max IOPS comes with the highest response time (queuing); design for roughly <10 ms/IO. Designing for and maintaining an average disk utilization of 60% provides the best overall performance while still leaving room for spike loads. [Chart: response time rises slowly up to ~60% resource utilization, then climbs steeply toward 100%.]
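    The knee in that curve is the classic queuing-theory result; as a rough approximation (M/M/1, a deliberate simplification of real array behavior), response time R for service time S at utilization rho is:

        \[ R = \frac{S}{1 - \rho} \]
        % A 5 ms service time gives R = 12.5 ms at 60% utilization,
        % but 50 ms at 90% -- hence the 60% design target above.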
  • Device server – back end: traditional (thick) vs. thin provisioning. In both cases the OS sees 14.3 TB (projected requirements). Traditional provisioning physically allocates all 14.3 TB of array groups/disk drives up front; with only 3.1 TB of data actually written, 11.2 TB is stranded. HP ThP logically provisions the same 14.3 TB as V-Volumes backed by a 5 TB ThP pool: 3.1 TB used/written and 1.9 TB free/unused, so only 5 TB of net physical capacity is required.
  • Device server – back end: cache partitioning (depends on model; P9500 below).
    - You can divide your P9500 into up to 32 independent "sub arrays"; partitions carve up array resources (CHAs, host groups, cache, and disk groups).
    - Allows array resources to be focused to meet critical host requirements, and allows for service-provider-oriented deployments.
    - Array partitions can be independently managed, and can be used in a mixed deployment (cache, array, traditional, ThP, Smart).
    - You can set a maximum of 4 independent "sub arrays" with full hardware isolation down to the MP board.
    [Diagram: partitions 1–n, each with CHAs, host groups, LDEVs/V-Vols, and cache, under a super admin; external storage attaches below.]
  • Host / application – overview: I/O
  • Host / application – I/O profile.
    - The I/O subsystem is the slowest component of the system and is often the cause of performance problems.
    - Distinguishing between the many different layers of an I/O request is key: delays often occur before and after the physical I/O request is dispatched to the disk device(s), and throughput (MB/sec) and responsiveness (msec/IO) are directly related.
  • Host / application – I/O profile: fundamental causes of I/O performance problems.
    - Slow response in communicating with the device server
    - Bottleneck / saturation / queuing in the I/O stack
    - Contention for locks at all levels of the I/O stack
    - I/O access patterns
    - Inefficient I/O – logical vs. physical I/O (scatter/gather, read-ahead)
  • Host / application – I/O profile: elements of an I/O request. [Diagram: a logical I/O enters the file system; file-system I/O passes through the buffer cache and volume management to the device driver, which issues the physical I/O down the I/O channel to the disk device; raw I/O bypasses the file system and buffer cache.]
  • Host / application – I/O profile layers. [Diagram: a 192 KB logical I/O request spans 4 FS extents (64 KB, 32 KB, 16 KB, 8 KB, excluding the read-ahead amount) and is split into physical I/O requests (16 KB, 16 KB, 16 KB, 8 KB, plus 64 KB read-ahead pieces) bounded by LVM physical extent boundaries and a max buffered I/O of 64 KB.]
  • Host / application – I/O profile layers: IBRIX (US Patent #6,782,389). [Diagram: a directory tree of files F1...Fn and subdirectories S1...Sn is spread across up to 100 segments and segment servers, giving extremely high aggregate performance from a single directory (and even a single file).]
  • Host / application – I/O profile: stride, reverse, random, sequential, buffered/non-buffered, sync/async, ... Regardless of the SCSI operation, CPU cycles will be required, time will be required, and depending on queue depth, latency will be felt. Example: the P6000 (EVA) port queue depth is 2048. If the running queue depth is held at 825, the average latency is 23 ms/IO, all the I/O behind the port is scheduled for a single LUN, and a SCSI-2 Reserve is issued (ESX), the latency could theoretically go as high as 18,975 ms, during which the SCSI conflict-retry counter would count down from its default of 80 to 0 and fail any outstanding I/O. This is not a fault of the array, but a misunderstanding of the architecture and placing too many apples in one basket.
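    The worst case quoted above is simply the queue drained serially ahead of the new request:

        \[ 825\ \text{queued I/Os} \times 23\ \mathrm{ms/IO} = 18{,}975\ \mathrm{ms} \]

    Nearly 19 seconds is ample time for the conflict-retry counter to run its 80 retries down to 0 and fail the I/O.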
  • Host / application – CPU: interrupts, switches, and queue-depth behavior. An IOCTL raises an interrupt on the CPU to which the HBA is assigned. Interrupt coalescing is an HBA driver vendor's ability to interrupt a CPU once to handle multiple IOCTLs in a period of time, thus reducing CPU context switches. [Diagram: queued I/Os on the host feed active I/Os across HBA 1 and HBA 2.]
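    On Linux, the interrupt placement described above can be observed directly; a minimal sketch (the qla pattern assumes a QLogic HBA, and the IRQ number 24 is a placeholder):

        #!/bin/sh
        # Which CPUs are taking the HBA's I/O-completion interrupts?
        grep -i qla /proc/interrupts
        # CPU affinity mask of a given HBA IRQ.
        cat /proc/irq/24/smp_affinity
        # Interrupt ("in") and context-switch ("cs") rates, 1 s samples.
        vmstat 1 5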
  • Host / application – OS CPU: interrupt processing, switches, CPU frequency.
    - The HBA is assigned a CPU for I/O-completion interrupts.
    - That CPU may be uninterruptible while busy.
    [Diagram: four I/O completions from the HBA all land on busy cpu 0 while cpu 1 sits idle.]
  • Host / application – array CPU: busy due to IOCTL(). A CPU (MP) on the array can also become "busy" or over-utilized by processing requests from 3rd-party APIs that manipulate the array through IOCTL() – not just the normal SCSI reads, writes, reserves, TUR, etc. [Diagram: cpu 0 is busy with IOCTL() requests from disk-array performance measurement tools, which trigger internal processing to dump performance registers, while cpu 1 is idle.]
  • Host / application – parallelism: keeping the pipe full through threading. [Chart: a single 8 KB read stream; as latency grows from 0.1 to 7 ms/IO, single-stream throughput (MB/sec, in theory) falls away even though the per-I/O size (KB/IO) stays constant.]
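    The single-stream ceiling in that chart falls out of Little's Law (standard result; the numbers are illustrative):

        \[ \mathrm{throughput} = \frac{\text{outstanding I/Os} \times \text{I/O size}}{\text{latency}} \]
        % One 8 KB stream at 0.5 ms/IO tops out near 8 KB / 0.5 ms = 16 MB/s,
        % no matter how fast the array is; eight threads at the same latency
        % reach ~128 MB/s -- the whole argument for threading.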
  • Storage connectivity
  • Storage connectivity – FCIP, iFCP, or iSCSI.
    - iFCP and FCIP are intended for interconnecting Storage Area Network (SAN) devices. iFCP's claim to fame is providing SAN fabric segmentation (a form of routing), and it can also be used for storage and servers, but it does not have much vendor hardware backing.
    - FCIP tunnels FC through IP and allows for fabric merges.
    - iSCSI can almost always reside on the same router as FCIP, but not on the same GbE port.
  • Storage connectivity – flow control: FC.
    - The primary mechanism is buffer credits. Each transmitting port has a given credit, bb_credit, which represents the maximum number of receive buffers (outstanding frames) it can use.
    - Slow drain: a slow component in the SAN can exhaust the buffer credits of an initiator or target, resulting in severely inflated service times.
    - Example: transmitting 8 x 2 KB frames across 32 km takes 160 us.
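    The 32 km figure works out from fiber propagation delay; a hedged calculation (the ~5 us/km rule of thumb and the 8 Gb/s line rate are assumptions, not from the slide):

        \[ 32\ \mathrm{km} \times 5\ \mu\mathrm{s/km} = 160\ \mu\mathrm{s}\ \text{one way} \]
        \[ \text{credits needed} \ge \frac{\mathrm{RTT} \times \text{line rate}}{\text{frame size}} = \frac{320\ \mu\mathrm{s} \times 8\ \mathrm{Gb/s}}{2\ \mathrm{KB} \times 8\ \mathrm{bits/B}} \approx 156 \]

    With only 8 credits outstanding, a long span spends most of its time waiting for R_RDY primitives rather than moving frames.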
  • Storage connectivity – flow control: FCIP, iFCP, and iSCSI. All three have one thing in common: they all utilize TCP/IP protocols for moving block data. Flow control therefore happens as buffer credits at the FC layer, sliding windows at the TCP layer, and QoS at both layers.
  • Storage connectivity – connectivity issues:
    - TCP/IP network RTT
    - TCP congestion control (to avoid overrunning the receiver)
    - TCP/IP retransmissions severely impact the SCSI exchange
    Example: oversubscription of an FCIP tunnel will result in TCP retransmissions. Anything greater than 1% TCP retransmissions will greatly inflate the SCSI exchange RTT, depleting FC buffer-to-buffer credits and causing FC communication to stall – the FC "slow drain" effect. On 10 Gbit, a 0.1% retransmission rate means never achieving more than 80% throughput.
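    A quick way to check the retransmission ratio discussed above on a Linux FCIP/iSCSI endpoint (a hedged sketch; exact counter wording varies by kernel version):

        #!/bin/sh
        # Segments sent vs. retransmitted since boot; by the slide's rule of
        # thumb, retransmitted/sent above ~1% will stall SCSI-over-TCP.
        netstat -s | grep -i -E 'segments sent|retrans'
        # The same counters, raw, from the kernel.
        grep '^Tcp:' /proc/net/snmp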
  • Debugging
  • Debugging – layer overview.
    - User space: applications, GNU C lib
    - System call interface
    - Kernel space: VFS (ext3, NTFS, VxFS, etc.), buffer cache (raw I/O bypasses it), LVM/VxVM/sd<alpha>, MPIO (device mapper), blkdev, SCSI, IDE, etc.
  • Debugging – layer overview: SCSI. Within the stack above, the SCSI subsystem itself splits into three layers: the upper layer (sd, st, sr, sg), the SCSI mid layer ("glue"), and the lower layer (FC, iSCSI, SAS, etc.).
  • Debugging – layer overview: SCSI upper layer (sd, st, sr, sg). Converts the requests from the upper layers into SCSI commands; this is where the SCSI encapsulation begins. Files: ./linux/drivers/scsi/sd.c ...

        /**
         * init_sd - entry point for this driver (both when built in or when
         * a module).
         *
         * Note: this function registers this driver with the scsi mid-level.
         **/
        static int __init init_sd(void)
        {
                int majors = 0, i, err;

                SCSI_LOG_HLQUEUE(3, printk("init_sd: sd driver entry point\n"));

                for (i = 0; i < SD_MAJORS; i++)
                        if (register_blkdev(sd_major(i), "sd") == 0)
                                majors++;
        ...
  • Debugging – layer overview: SCSI mid layer. Error handlers, queuing, timeouts, etc. are implemented in this layer. Because this layer holds the LLD layer together with the SCSI protocol, it is referred to as the glue layer. SCSI logging ("/sys/module/scsi_mod/parameters/scsi_logging_level") is enabled at this layer. Files: ~/scsi/scsi.c, ~/scsi/scsi_error.c ...

        static int __init init_scsi(void)
        {
                int error;

                error = scsi_init_queue();
                if (error)
                        return error;
                error = scsi_init_procfs();
                if (error)
                        goto cleanup_queue;
                error = scsi_init_devinfo();
                if (error)
                        goto cleanup_procfs;
                error = scsi_init_hosts();
                if (error)
                        goto cleanup_devlist;
                error = scsi_init_sysctl();
                if (error)
                        goto cleanup_hosts;
                error = scsi_sysfs_register();
                if (error)
                        goto cleanup_sysctl;
                scsi_netlink_init();
                printk(KERN_NOTICE "SCSI subsystem initialized\n");
                return 0;
        ...

        int scsi_error_handler(void *data)
        {
                struct Scsi_Host *shost = data;
        ...
  • Debugging – layer overview: SCSI lower layer. The low-level device drivers (LLDD), such as QLogic's, reside at this layer (FC, iSCSI, SAS, etc.); debugging can also be enabled here. Files: qla_dbg.h, iscsi_tcp.c (software), be_iscsi.h, scsi_host.h ...
  • SCSI – LLDD: lower layer device drivers – connectivity – FC
  • Lower layer device drivers – connectivity. Low-level device drivers at the SCSI lower layer include:
    - QLogic
    - Emulex
    - Brocade
    - ...
  • Lower layer device drivers – SanSurfer ~ QLogic. [Screenshot: the SanSurfer management GUI, shown against the full stack from user-space applications down through VFS, buffer cache, LVM/MPIO, the SCSI upper/mid layers, and the FC/iSCSI/SAS lower layer.]
  • Lower layer device drivers – scsi_host via systool (see ./include/scsi/scsi_host.h). Output for this system:

        # systool -c scsi_host -v
        Class = "scsi_host"

          Class Device = "host0"
          Class Device path = "/sys/class/scsi_host/host0"
            cmd_per_lun        = "1"
            host_busy          = "0"
            proc_name          = "ata_piix"
            scan               = <store method only>
            sg_tablesize       = "128"
            state              = "running"
            uevent             = <store method only>
            unchecked_isa_dma  = "0"
            unique_id          = "1"
        ...
  • SCSI – LLDD: lower layer device drivers – FC debug
  • Lower layer device drivers – debug. Enabling debug on the lower layer devices depends on the driver options (QLogic, Emulex, ...):

        # modinfo <driver>
  • Lower layer device drivers – debug: QLogic. Dynamic; the actual parameter varies depending on kernel version. In short, to enable:

        $ echo 1 > /sys/module/qla2xxx/ql2xextended_error_logging
    or
        $ echo 1 > /sys/module/qla2xxx/parameters/ql2xextended_error_logging

    Full details at: http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/Enable-verbose-debugging-with-Emulex-and-Qlogic-on-Linux/ba-p/89957
  • Lower layer device drivers – debug: lpfc ~ Emulex. We (Greg and Chris) find that logging level 0xdb works GREAT in situations where you need to see what is going on. # modinfo lpfc shows the parameter (depends: scsi_mod, scsi_transport_fc; vermagic: 2.6.18-194.el5 SMP mod_unload gcc-4.1):

        parm: lpfc_log_verbose:Verbose logging bit-mask (int)

    Log mask definitions (verbose message range, bit, description):
    - LOG_ELS         100-199    0x1      ELS events
    - LOG_DISCOVERY   200-299    0x2      Link discovery events
    - LOG_INIT        400-499    0x8      Initialization events
    - LOG_LINK_EVENT  1300-1399  0x10     Link events
    - LOG_FCP         700-799    0x40     FCP traffic history
    - LOG_NODE        800-899    0x80     Node table events
    - LOG_MISC        1200-1299  0x400    Miscellaneous events
    - LOG_SECURITY    1000-1099  0x8000   FC security
    - (ranges 900-999 and 1100-1199 reserved; further masks depend on the driver – check the source)

    Here is the breakdown of how to calculate the logging level in case you wish to log something else: lpfc_log_verbose=0xdb is hex D B = decimal 13 11 = binary 1101 1011.
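    The 0xdb value is just the OR of the six masks called out above; a sketch of computing and applying it (the per-host sysfs attribute exists on many lpfc versions, but verify against your driver):

        #!/bin/sh
        # 0x1 ELS | 0x2 discovery | 0x8 init | 0x10 link | 0x40 FCP | 0x80 node
        printf '0x%x\n' $(( 0x1 | 0x2 | 0x8 | 0x10 | 0x40 | 0x80 ))   # -> 0xdb
        # Apply at module load time...
        modprobe lpfc lpfc_log_verbose=0xdb
        # ...or per host at runtime (path may vary by driver version).
        echo 0xdb > /sys/class/scsi_host/host0/lpfc_log_verbose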
  • SCSI – LLDD: lower layer device drivers – iSCSI
  • Lower layer device drivers – iSCSI. Low-level device drivers at this layer: the iSCSI transport (software or hardware), e.g. iscsi_tcp.c.
  • Lower layer device drivers – iSCSI. Below is the default print of the kernel ring buffer on a RH system where the iSCSI drivers are installed. It should be noted that not all of these drivers are needed:
    - iscsi_tcp – software (needed if no hardware offload adapters are installed)
    - cxgb3i – hardware, Chelsio adapters
    - bnx2i – Broadcom, with cnic used for offload
    - iser (ib_iser) – InfiniBand
    - be2iscsi – ServerEngines (acquired by Emulex)

        Loading iSCSI transport class v2.0-871.
        cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
        iscsi: registered transport (cxgb3i)
        Broadcom NetXtreme II CNIC Driver cnic v2.1.0 (Oct 10, 2009)
        cnic: Added CNIC device: eth0
        cnic: Added CNIC device: eth1
        Broadcom NetXtreme II iSCSI Driver bnx2i v2.1.0 (Dec 06, 2009)
        iscsi: registered transport (bnx2i)
        scsi4 : Broadcom Offload iSCSI Initiator
        scsi5 : Broadcom Offload iSCSI Initiator
        iscsi: registered transport (tcp)
        iscsi: registered transport (iser)
        iscsi: registered transport (be2iscsi)
        scsi6 : iSCSI Initiator over TCP/IP
          Vendor: LEFTHAND  Model: iSCSIDisk  Rev: 9500
          Type:   Direct-Access             ANSI SCSI revision: 05
        SCSI device sda: 2147483648 512-byte hdwr sectors (1099512 MB)
        sda: Write Protect is off
        sda: Mode Sense: 77 00 00 08
        SCSI device sda: drive cache: none
        sda: unknown partition table
        sd 6:0:0:0: Attached scsi disk sda
        sd 6:0:0:0: Attached scsi generic sg0 type 0
  • Lower layer device drivers – iSCSI. Enable debug at the iSCSI connection layer (not the TCP interconnect level); note this depends on the driver used. In modprobe.conf add:

        options iscsi_tcp debug_iscsi_tcp=1

        # /sbin/iscsid -c /etc/iscsi/iscsid.conf -i /etc/iscsi/initiatorname.iscsi -d 1
        # iscsiadm -m node --loginall=automatic
        Logging in to [iface: default, target: iqn.2003-10.com.lefthandnetworks:labtest:39:test, portal: 10.1.0.42,3260]
        Login to [iface: default, target: iqn.2003-10.com.lefthandnetworks:labtest:39:test, portal: 10.1.0.42,3260] successful

    Though the CLI shows the same information on STDOUT, the log file will have far more.
  • SCSI – mid and upper layers
  • SCSI – mid and upper layers: SCSI architecture – t10.org
  • SCSI – mid and upper layers: review of LLDD inquiry. The LLDD layer has discovered targets and initialized its structures, and has issued the SCSI_SCAN() or REPORT_LUN(). Example: an RSCN comes in over the wire, triggering error handling at the HBA, and the HBA calls a LIP to rescan. In the QLogic LLDD:

        qla2x00_do_dpc(void *data) --> qla2x00_rescan_fcports --> qla2x00_update_fcport --> qla2x00_lun_discovery() --> qla2x00_rpt_lun_discovery() --> qla2x00_report_lun()

    The same applies to iSCSI: once communication is established for the session, a REPORT_LUN() is issued.
    - Devices are returned to the SCSI mid and upper layers for driver registration and context building.
    - udev wakes up and builds dynamic user-level device files from the uevent.
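    The RSCN-triggered rescan above can be reproduced by hand on Linux; a minimal sketch (host0 is a placeholder for the HBA's SCSI host number):

        #!/bin/sh
        # Force a loop initialization (LIP) on an FC host, as the RSCN path does.
        echo 1 > /sys/class/fc_host/host0/issue_lip
        # Ask the mid layer to rescan every channel/target/LUN on that host.
        echo '- - -' > /sys/class/scsi_host/host0/scan
        # Watch the resulting inquiry/REPORT LUNS activity in the ring buffer.
        dmesg | tail -20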
  • SCSI – mid and upper layers: DEBUG
  • SCSI mid and upper: DEBUG – SCSI logging. Many levels exist; a good starting point:

    To enable:
        echo 0x9411 > /proc/sys/dev/scsi/logging_level
    To disable:
        echo 0 > /proc/sys/dev/scsi/logging_level

    See more details at: http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/Enable-verbose-debugging-with-native-SCSI-drivers-on-Linux/ba-p/89955
  • SCSI mid and upper: DEBUG – SCSI debug runneth over. WARNING: the above logging level will fill up /var/log/messages at a rapid pace. Make sure you know what you are looking for and try to keep the debug window down to 4-6 hours. The levels are described in scsi_logging.h, found in any distribution's kernel source and on http://linuxdb.corp.hp.com. Example: http://brunel.gbr.hp.com/suse/lxr/http-SLES10-x86_64/source/drivers/scsi/scsi_logging.h
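    Given that warning, one way to bound the window instead of leaving logging on (paths as on the previous slide; the 4-hour sleep is illustrative):

        #!/bin/sh
        # Enable verbose SCSI logging for a bounded window, then turn it off
        # so /var/log/messages survives the exercise.
        echo 0x9411 > /proc/sys/dev/scsi/logging_level
        sleep $((4 * 3600))    # 4 hours, per the 4-6 hour guidance above
        echo 0 > /proc/sys/dev/scsi/logging_level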
  • Thank you