1. Copyright © 2013 EMC Corporation. All Rights Reserved.
SCSI-3 PGR Support on Symm
Kevin Wang
Nov 2014
SCSI-3 Persistent Group Reservation
SCSI-3 Persistent Group Reservation (PGR)
• This is a reservation-key-based device locking method: each initiator’s path (registrant) registers itself with a device using a set reservation key, and any one of the registrants can hold the Active lock at a given point in time.
• The reservation is “persistent” because the lock information is held on the SFS within the Symmetrix and is not affected by a SCSI bus reset.
– This means that the host can be shut down but the lock will stay “Active” in the Symm until one of the registrants with the same reservation key issues a PREEMPT or RELEASE against the Active reservation.
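The locking behavior described above can be sketched as a toy model. This is a minimal illustration in Python, not Enginuity code: the PgrDevice class, its method names, and the WWN and FA port labels are all hypothetical, and the release rule follows the slide's wording (a registrant sharing the holder's key may release).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical toy model for illustration only; not Enginuity code.
Path = Tuple[str, str]  # (initiator WWN, FA port)


@dataclass
class PgrDevice:
    # Maps each registered path to its reservation key.
    registrants: Dict[Path, int] = field(default_factory=dict)
    # The path currently holding the Active reservation, if any.
    holder: Optional[Path] = None

    def register(self, path: Path, key: int) -> None:
        """Each path registers itself with a reservation key."""
        self.registrants[path] = key

    def reserve(self, path: Path) -> bool:
        """Only a registered path may reserve, and only one holder exists at a time."""
        if path not in self.registrants:
            return False
        if self.holder is not None and self.holder != path:
            return False
        self.holder = path
        return True

    def release(self, path: Path) -> bool:
        """Per the slide: a registrant with the same key as the holder may release."""
        if self.holder is None or path not in self.registrants:
            return False
        if self.registrants[path] != self.registrants[self.holder]:
            return False
        self.holder = None
        return True

    def bus_reset(self) -> None:
        """Persistent: a SCSI bus reset leaves registrations and the lock untouched."""
        return None
```

In this sketch, two paths from the same host that share a key can release each other's reservation, while a path registered with a different key cannot.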
• Usually, all initiators (HBA ports) from a single host register with the same reservation key (much like EMC Grouped Reservation).
– The same initiator can be registered multiple times if it is presented to multiple FA ports. Each “path” per initiator needs to register.
• The Symmetrix has a limit of 340 (decimal) registrations per device.
– This limit caused issues with Microsoft Cluster Shared Volumes, where all the hosts in the cluster registered each of their paths to the device simultaneously and the total exceeded 340.
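To see how a cluster can reach the limit, the arithmetic can be sketched as below. The function names and cluster sizes are hypothetical; only the limit of 340 comes from the slide, and the count assumes every initiator is presented to the same number of FA ports.

```python
MAX_REGISTRATIONS_PER_DEVICE = 340  # limit quoted on the slide


def registrations_needed(hosts: int, initiators_per_host: int,
                         fa_ports_per_initiator: int) -> int:
    """One registration per path: host x initiator x FA port (uniform layout assumed)."""
    return hosts * initiators_per_host * fa_ports_per_initiator


def exceeds_limit(hosts: int, initiators_per_host: int,
                  fa_ports_per_initiator: int) -> bool:
    """True when the cluster would need more registrations than the device allows."""
    needed = registrations_needed(hosts, initiators_per_host, fa_ports_per_initiator)
    return needed > MAX_REGISTRATIONS_PER_DEVICE
```

For example, a hypothetical 16-node cluster with 4 initiators per host, each presented to 4 FA ports, needs 256 registrations and fits; the same layout with 32 nodes needs 512 and does not.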
• The SC3 port flag (seen with 8F,,,<port>) needs to be enabled so that the Inquiry data returned to the host by any device on the port reports that the Symmetrix supports SCSI-3.
• From Enginuity 5875, the SCSI-3 Pers Resv (PER) bit on Symm devices is enabled by default.
– To check: D1,<dv>,C from the FA
– Having it enabled should not have any negative effect on SCSI-2-only hosts
D1,<dv>,C
• This shows the device flags. SCSI-3 Pers Rsv flag is now enabled by
default on all devices at Enginuity 5875 or above.
D1,<dv>,A
• First use A1,D,<dv>,<cnt> to determine which director holds the device lock.
8F,'PGR',VIEW,<dv>
• This command shows the last update time of the lock as well as the registrants and the reservation key.
• The initiator shown in CYAN is the current active lock holder; this can change often when there is IO to the device.
SCSI-3 PGR
[Animation: registrants shown with KEY = 1234123412341234 (twice) and KEY = 0000000000000000 (twice)]
How to release SCSI Reservation
Solutions Enabler to clear SCSI Reservation
• Exclusive Reservation (SCSI-2) and Group Reservation (SCSI-2) can be cleared with Solutions Enabler SYMCLI.
– symld -g dg_name break LdevName
• Persistent Group Reservation (SCSI-3) cannot be displayed or released using SYMCLI commands.
– It can only be cleared by an Inlines command or by a dedicated host application that can clear SCSI-3 PGR.
How to clear a lock for an open system device on a Symm4, Symm5, Symm6, Symm7 or Symm8 with inline
Releasing SCSI-3 PGR Reservation
1. Run A1,D to find which FA the lock is active on
2. Go to the FA holding the active reservation
3. Run the following Inlines command
– BC,BF,F0,RCVR,ALLI,ALLP,<dv>,PRSV
Note:
The BC,BF, prefix broadcasts the command to all FAs.
This is done because the lock holder may change over time.
Lab example – PGR Reserve
After host has performed reserve:
D1,20,A
8F,'PGR',VIEW,20
Lab example – PGR Release
After the host releases the reservation:
8F,'PGR',VIEW,20
– Notice that the CYAN-colored text is gone.
– The initiator is still registered. (It may be cleared depending on host action.)
D1,20,A
– No longer shows any active lock on the device.
Lab example – PGR Clear
• After the host performs a SCSI-3 PGR Clear (0x03)
– All registrations and the reservation are cleared
– Meaning the device is cleared of SCSI-3 reservation.
8F,'PGR',VIEW,20
D1,20,A
Case Study – Ranbaxy Laboratories Limited, SR 64251214
• Symmetrix DMX-3
• The customer's Windows cluster failed the validation test before the cluster was created
• After investigation, the SCSI-3 PGR reservation appeared to be the root cause
• Manual intervention was required to fix this issue
• At the customer’s request, we checked the affected devices 191 and 1EA and did not find any SCSI reservation on them.
• After further investigation, I found that SCSI-3 PGR had not been enabled at either the FA port level or the device level. The customer needs to take the following actions to fix this issue.
• Enable the SCSI-3 PGR bit at the FA level with a command like this:
symconfigure -sid xxx -cmd "set port xxx:x SCSI_3=enable;" commit -v -noprompt
• Enable SCSI-3 PGR support on each device that needs this feature with a command like this:
symconfigure -sid xxx -cmd "set device <dev_num> attribute=SCSI3_persist_reserv;" commit -v -noprompt
Case Study – TI Telecom Italia, SR 53441840
• Symmetrix VMAX 20K
• Device 1263 was removed from an SG (or its masking view was deleted) without first being removed from the cluster configuration, which left PGR behind on the device.
• PSE was engaged to fix this issue
• DB0D.22 is being logged on FA port 16f against device 1263, which is also locked by director 16f
• Device 1263 is locked by WWN 2312 from director 16f
• No initiator at FA:16f is registered with device 1263
• Clearing the reservation lock stopped the error
Case Study – TIM RIO NORTE S/A, SR 66293766
• 70.1222.D0 - Persistent group reserve found on non-pgr device
• 75.DB0D.43 - Persistent group reserve error
• 1222.D0 and DB0D.43 have been logged since April 2014 against devices 1050, 1178, 12A7 and 12A9, streaming on FA 6G:0
• Devices are mapped on FA 6G:0, 7G:0, 10G:0 and 11G:0
• Checked the director flag setting and found that the SCSI-3 PGR bit had not been enabled on FA 6G:0
• SCSI-3 PGR support had not been enabled on those four devices, so we could not get correct PGR registration information
• Checked the lock status on one device and found the following problems (the lock status was changing all the time); all four devices in fact had the same issue. The reservation type code is WRITE EXCLUSIVE REGISTRANTS ONLY, but no initiators on this director are registered, which means this function did not work as designed.
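For reference, the access rule implied by a WRITE EXCLUSIVE REGISTRANTS ONLY reservation can be sketched as follows. This is a simplified illustration of the SCSI-3 semantics (reads are allowed from any initiator, writes only from registered initiators); the function name and WWN strings are hypothetical and nothing here reflects how Enginuity implements the check.

```python
from typing import Set


def wero_allows(operation: str, initiator: str, registrants: Set[str]) -> bool:
    """Simplified access check for Write Exclusive, Registrants Only."""
    if operation == "read":
        # Reads are shared: any initiator may read.
        return True
    if operation == "write":
        # Writes are limited to initiators that are currently registered.
        return initiator in registrants
    raise ValueError(f"unsupported operation: {operation!r}")
```

In this simplified model, an empty registrant set means every write is rejected, whichever initiator sends it.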
• Later we issued the command A1,D,<device>,<count> on those four affected devices and the system did not report any live SCSI locks, which was unexpected.
• At this point, can we find any clues to the cause of this issue in the previous inline output?
• Even though the 8F,PGR,VIEW,<device> command did not work in this case, we can still find useful information in the following inline output
• All four affected devices were reported as locked by lp_id 0203. The lp_id can be converted to the index in the first column of the 8F,DVIN,VIEW output: index 0 represents lp_id 1, index 1 represents lp_id 2, and so on. Here index 202 represents lp_id 203.
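The lp_id to index conversion above is a subtraction by one. A minimal sketch, assuming the lp_id and the index are both displayed in hexadecimal (the slide does not say which base is used; the function name is hypothetical):

```python
def lp_id_to_dvin_index(lp_id: str) -> str:
    """Convert an lp_id string to the matching 8F,DVIN,VIEW index (index = lp_id - 1)."""
    value = int(lp_id, 16)  # assumption: values are shown in hex
    if value < 1:
        raise ValueError("lp_id starts at 1")
    return format(value - 1, "X")


print(lp_id_to_dvin_index("0203"))  # prints 202
```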
• The first problem we observe is that index 202 is a PWWN that logged in on 6G:1, but all four affected devices are mapped to 6G:0.
• The second problem is that index 202 is not logged in to the box correctly. The evidence is below.
• For a normal HBA login, the flags should show as C3 04.
• If the left flag is C2, the following bits are set:
– DefineBit(IR_DVIN_EXISTS , 1), /* Initiator is a known initiator. (Known initiators are those which might have logged in earlier and logged out) */
– DefineBit(IR_ALIVE_INITIATOR , 6), /* Initiator is logged in and executed an IO. */
– DefineBit(IR_ACTIVE_INITIATOR , 7) /* Flag updated in emul_new_ulp_cmd() whenever we receive a cdb from CDI. */
• In that case the HBA is probably logged in to the FA even though the host does not see the devices and the "symmask -sid <SID> list login" command does not report the WWN as logged in. The host needs to be forced to re-initialize its login.
• With the left flag as 06, we do not find the related record in FC,NAME, which means this PWWN is not logged in correctly.
• We cannot check its status with the command FC,NTBL,<D_ID>,1, and we cannot find it in 8F,,,1 either.
• (First Attempt) To fix the issue, ENG tried to release the lock by using the SCSI-3 release inline command on those four devices, but it did not work. The lock status kept changing as before.
• (Second Attempt) Because the issue was so unusual and the PRSV command could not fix it, ENG issued the regular release inline command on those four devices, but it still did not work. The lock status kept changing.
• (Third Attempt) ENG set the trace and 8C'd the device record, but it did not work, so the change was reverted.
• 8F,PGR,DBCK,<device>,1,RCLR,FORC
• (Final Attempt) ENG removed the reservations from the devices by removing the PGR database records for each registered initiator from SFS. This time it worked.
• The 1222.D0 and DB0D.43 errors finally stopped
• We found several entries on April 8th for mapping changes that moved these devices to their apparent current mapping locations, which might be the root cause.
Editor's Notes
Notice how only 1 registrant shows as ACTIVE, Write Exclusive and Registrant only. This means that at that point in time, the WWN at Index 1 is the active holder of the reservation and is the only one allowed to perform IO to the device.
You will also see that a GID is mentioned here. This means that the host is using EMC PowerPath. There are 4 registrants but 2 different GIDs. This means there are 2 hosts, and each host most likely has 2 HBAs or 1 HBA with 2 ports.
Also notice how the 2 WWNs with GID 8444b000 have a non-zero reservation key while the 2 WWNs from the other host with GID 6e59b300 have an all-zeros reservation key. From this, you can determine that the 2nd host’s initiators (index 2 & 3) have released their reservation but remain registered.
Update Time = when the PGR was last updated/queried for this device. This time can change very frequently if a host is performing multi-path active/active load balancing of IO (such as Round Robin), which changes the ACTIVE RESV between registrants.
Animation Explanation
HBA1 from Node A registers itself to device 0x195C
HBA2 from Node A registers itself to device 0x195C
Can Solutions Enabler (SYMCLI) be used to clear SCSI reservations? KB - 28651
See solution 6131 for an explanation of group and exclusive reservations.
For a Mainframe environment, have the Customer check the allocation to the device that is locked or reserved with 'd u,,alloc,xxxx,1'. If a Customer job is allocated, let them know that this job is reserving that device. If no Customer jobs are allocated, continue with the Customer's approval.
Notice the WWN.
Notice how it doesn’t show any GID/GIDN. This means the host doesn’t have PowerPath installed or PowerPath isn’t working properly.
There is only 1 registered initiator in the 8F,PGR,VIEW output. This tells us that:
It may be a single standalone host or a 1-node cluster, or the device has not yet been reserved by the other hosts since the cluster was set up or the PGR DB was cleared. (CLEAR = 03h)
There is only 1 path from the host to this device (not necessarily from the host to the Symm!), meaning no path redundancy to the device.
The FLAGS field breaks down into a left value and a right value.
Left value of C3:
DefineBit(IR_DVIN_VALID , 0), /* Initiator is logged in and host config flags have been applied */
DefineBit(IR_DVIN_EXISTS , 1), /* Initiator is a known initiator. ( Known initiators are those which might have logged in earlier and logged out) */
DefineBit(IR_DVIN_DISCONNECTED , 2), /* Monitor_connection notification pending */
DefineBit(IR_STUN_ENABLE , 3), /* STUN enable bit */
DefineBit(IR_INIT_GIDN_PENDING , 4), /* Indicator that Background task is yet to set it */
DefineBit(IR_CLEAR_GIDN_PENDING , 5), /* Indicator that Background task is yet to clear it */
DefineBit(IR_ALIVE_INITIATOR , 6), /* Initiator is logged in and executed an IO. */
DefineBit(IR_ACTIVE_INITIATOR , 7) /* Flag updated in emul_new_ulp_cmd() whenever we receive a cdb from CDI. */
Right value of 04:
DefineBit(ORS_MAX_CNXT , 4), /* This initiator has the max# of ORS sessions */
DefineBit(UA_REPORT_LUNS , 3), /* Host is owed an 06/3F/0E Unit Attn */
DefineBit(IR_DVIN_ACTIVE_RW , 2), /* Host is sending read/write activity */
DefineBit(ENV_ERR_PENDING , 1), /* Environmental error report pending */
DefineBit(ISCSI_BRIDGE , 0) /* Host connected via iSCSI bridge */
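The bit definitions above can be applied to the flag values seen in the case study. A minimal sketch, assuming bit 0 is the least significant bit of each byte (an assumption, though it matches the three bits listed for C2 on the slide); the table and function names are hypothetical:

```python
from typing import Dict, List

# Bit positions copied from the DefineBit lists in the notes.
LEFT_BITS: Dict[int, str] = {
    0: "IR_DVIN_VALID",
    1: "IR_DVIN_EXISTS",
    2: "IR_DVIN_DISCONNECTED",
    3: "IR_STUN_ENABLE",
    4: "IR_INIT_GIDN_PENDING",
    5: "IR_CLEAR_GIDN_PENDING",
    6: "IR_ALIVE_INITIATOR",
    7: "IR_ACTIVE_INITIATOR",
}
RIGHT_BITS: Dict[int, str] = {
    0: "ISCSI_BRIDGE",
    1: "ENV_ERR_PENDING",
    2: "IR_DVIN_ACTIVE_RW",
    3: "UA_REPORT_LUNS",
    4: "ORS_MAX_CNXT",
}


def decode_flags(value: int, names: Dict[int, str]) -> List[str]:
    """Return the names of the bits set in value, lowest bit first (bit 0 = LSB assumed)."""
    return [names[bit] for bit in sorted(names) if value & (1 << bit)]


for left in (0xC3, 0xC2, 0x06):
    print(f"{left:02X}", decode_flags(left, LEFT_BITS))
print("04", decode_flags(0x04, RIGHT_BITS))
```

Under this assumption, C3 decodes to VALID, EXISTS, ALIVE and ACTIVE; C2 is the same without IR_DVIN_VALID; and 06 has only IR_DVIN_EXISTS and IR_DVIN_DISCONNECTED set. The right value 04 decodes to IR_DVIN_ACTIVE_RW.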