MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian

Slides from MOW2010 presentation.
The presentation provides a practical understanding of Oracle Clusterware/CRS and the knowledge required to troubleshoot Clusterware issues independently: why nodes are evicted, and why resources fail to start or fail for no apparent reason. After the presentation, a DBA will know where to look for answers instead of blindly running the cluvfy.sh utility. The session includes demos of how to troubleshoot Clusterware issues such as evictions. The presentation does go into Oracle Clusterware internals, but it is appropriate for all DBAs, from beginners to experienced.

  • - Successful, growing business for more than 10 years
    - Served many customers with complex requirements/infrastructure just like yours.
    - Operate globally for 24 x 7 “always awake” services
  • Clusterware is generic, with customizations for Oracle resources.
    Only Clusterware accesses the OCR and voting disk (VD).
    Only DB instances access shared database files.
    The OCR is accessed by almost every Clusterware component - configuration is read from the OCR.
    VIP is part of Oracle Clusterware.
    Emphasize shared access to data!
  • OPROCD - pre 10.2.0.4 - hangcheck-timer
  • Node membership and group membership for instances, ASM disk groups
  • CSSD daemons cannot talk to each other -> operations are not synchronized -> shared data access -> corruption
  • In addition to the network heartbeat (NHB), Oracle introduced the disk heartbeat (DHB).
    IO fencing is needed on split brain to stop the evicted node from doing any further IO.
    Oracle doesn’t rely on any hardware - it needs compatibility with all platforms/hardware.
  • Oracle can’t shoot another node without remote control and can’t rely on one type of IO fencing (HBA/SCSI reservations).
    What’s left - beg the other node - please shoot yourself!
  • What if CSSD is not healthy? It’s very possible that it’s not a network problem but that CSSD just doesn’t reply for some reason. OCLSOMON comes onto the scene.
  • Worse yet, the whole node is sick and even OCLSOMON can’t function properly, for example when CPU execution is stalled.
  • Losing access to voting disks - CSSD commits suicide.
    Why? The cluster must have two communication paths + VD is the medium for IO fencing.
  • All nodes can reboot if the voting disk is lost.
    Good time to discuss voting disk redundancy? 1 vs 2 vs 3
  • diagwait -> not set by default (assumed 0)
    reboottime -> 3 seconds
    margin = reboottime - diagwait
    See init.cssd for more details
  • When Clusterware autostart is disabled (crsstart -> disable), “init.cssd autostart” doesn’t do anything. In this case a DBA can initiate the start later using “init.crs start” (10.1+) or “crsctl start crs” (10.2+).
  • Configuration data - voting disks, ports, resource profiles (ASM, instances, listeners, VIPs, etc.).
  • DEMO - existing dependencies
  • DB is in CRS Home
    Log files would be in the appropriate Oracle home:
    {home}/log/{host}/racg/{resource_name}.log
    DEMO - log files and action script home match!
    DEMO - IMON logs
  • DEMO - stop DB + rename spfile + start DB
    old way, if there is time, with .cap file
  • DEMO - lsmodules
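
The log location pattern in the notes above, {home}/log/{host}/racg/{resource_name}.log, can be expanded mechanically. This is a minimal Python sketch, assuming a POSIX host. The function name `racg_log_path` and the sample home, host and resource values are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from pathlib import PurePosixPath


def racg_log_path(home: str, host: str, resource_name: str) -> PurePosixPath:
    """Expand {home}/log/{host}/racg/{resource_name}.log for one resource.

    `home` is the Oracle home that owns the resource, which per the notes
    should match the home of the resource's action script.
    """
    return PurePosixPath(home) / "log" / host / "racg" / f"{resource_name}.log"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(racg_log_path("/u01/app/oracle/product/10.2.0/db_1",
                        "node1",
                        "ora.orcl.orcl1.inst"))
```

When a resource fails to start, building the path from the home that owns the action script is a quick way to confirm that you are reading the right log.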
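
On the "1 vs 2 vs 3" voting disk question raised in the notes, the arithmetic can be sketched in a few lines of Python. The sketch assumes the rule that a node must be able to access a strict majority (more than half) of the configured voting disks to stay in the cluster. The function names are hypothetical, and the code models only the counting, not CSSD itself.

```python
def has_voting_disk_majority(accessible: int, configured: int) -> bool:
    """Return True if `accessible` is a strict majority of `configured` disks."""
    if configured <= 0 or not 0 <= accessible <= configured:
        raise ValueError("need configured >= 1 and 0 <= accessible <= configured")
    return accessible > configured // 2


def tolerated_failures(configured: int) -> int:
    """Largest number of voting disks that can be lost while a majority remains."""
    if configured <= 0:
        raise ValueError("need configured >= 1")
    return (configured - 1) // 2


if __name__ == "__main__":
    for disks in (1, 2, 3):
        print(disks, "voting disk(s): survives loss of",
              tolerated_failures(disks))
```

Under that assumption, two voting disks tolerate no more failures than one, while three tolerate the loss of one.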

Presentation Transcript

  • Under the Hood of Oracle Clusterware Miracle OpenWorld 2010 15-Apr-2010 Alex Gorbachev, The Pythian Group
  • Alex Gorbachev • CTO, The Pythian Group • Blogger • OakTable Network member • Oracle ACE Director • BattleAgainstAnyGuess.com • Vice-president, Oracle RAC SIG 2 © 2009/2010 Pythian
  • Why Companies Trust Pythian • Recognized Leader: • Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Work with over 150 multinational companies such as Forbes.com, Fox Interactive media, and MDS Inc. to help manage their complex IT deployments • Expertise: • One of the world’s largest concentrations of dedicated, full-time DBA expertise. • Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
  • Agenda • Place of Clusterware in Oracle RAC • Node membership and evictions • Clusterware startup sequence • Oracle Cluster Registry • Resources Management and troubleshooting • 11gR2 Grid Infrastructure
  • Agenda [chart: vertical axis "Need to memorize" (Low to High), horizontal axis "Understanding" (Shallow to In-depth); label: "The more you understand, the less you need to memorize"]
  • Architecture [diagram: three nodes, each with OS, VIP, Listener, Service, Instance, ASM and Clusterware; interconnect between the nodes; storage access to shared storage holding the OCR and voting disk]
  • [diagram: Clusterware on top of the OS: Cluster Synchronization Services (CSSD), Cluster Ready Services (CRSD), Event Manager (EVMD), Oracle Process Monitor (OPROCD), HA Framework scripts (VIP, RACG)]
  • [diagram: one node: OS, Clusterware: VIP, RACG, EVMD, CRSD, CSSD, OPROCD]
  • [diagram: two nodes, each with OS and Clusterware (VIP, RACG, EVMD, CRSD, CSSD, OPROCD), connected by the interconnect]
  • OS OS Clusterware Clusterware VIP VIP RACG RACG EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 9 © 2009/2010 Pythian
  • [Diagram: two nodes, each running OS and Clusterware (VIP, RACG, EVMD, CRSD, CSSD, OPROCD), linked by the interconnect and sharing the voting disk] • Shoot The Other Node In The Head 9 © 2009/2010 Pythian
  • OS OS Clusterware Clusterware VIP VIP RACG RACG EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 10 © 2009/2010 Pythian
  • OS OS Clusterware Clusterware VIP VIP RACG RACG EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 11 © 2009/2010 Pythian
  • OS Clusterware VIP RACG EVMD CRSD CSSD CSSD interconnect OPROCD Voting disk 11 © 2009/2010 Pythian
  • [Diagram: one node running OS and Clusterware (VIP, RACG, EVMD, CRSD, CSSD, OPROCD), the other node's CSSD across the interconnect, and the shared voting disk] • "Ask The Other Node To Reboot Itself" (c) known quote 11 © 2009/2010 Pythian
  • OS OS Clusterware Clusterware VIP VIP RACG RACG OCLSOMON EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 12 © 2009/2010 Pythian
  • OS Clusterware VIP RACG OCLSOMON EVMD CRSD CSSD interconnect OPROCD Voting disk 12 © 2009/2010 Pythian
  • OS Clusterware VIP RACG EVMD CRSD CSSD CSSD interconnect OPROCD Voting disk 13 © 2009/2010 Pythian
  • OS Clusterware VIP RACG EVMD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 13 © 2009/2010 Pythian
  • OS Clusterware VIP RACG EVMD CRSD CSSD interconnect OPROCD OPROCD Voting disk 13 © 2009/2010 Pythian
  • OS OS Clusterware Clusterware VIP VIP RACG RACG EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 14 © 2009/2010 Pythian
  • OS Clusterware VIP RACG EVMD CRSD CSSD CSSD interconnect OPROCD Voting disk 14 © 2009/2010 Pythian
  • OS OS Clusterware Clusterware VIP VIP RACG RACG EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD Voting disk 15 © 2009/2010 Pythian
  • OS OS Clusterware Clusterware VIP VIP RACG RACG EVMD EVMD CRSD CRSD CSSD CSSD interconnect OPROCD OPROCD 15 © 2009/2010 Pythian
  • CSSD CSSD interconnect 15 © 2009/2010 Pythian
  • Evictions • Network heartbeat lost • Voting disk access lost • CSSD is not healthy • OS is not healthy • OPROCD - Unix, Windows, 11g Linux • hangcheck-timer - 10g Linux 16 © 2009/2010 Pythian
  • DEMO NHB (network heartbeat) failure • Simulate with “ifconfig eth1 down” • Both nodes notice the loss • Both race to evict each other • from voting disk => 2 equal sub-clusters • the sub-cluster with the lowest leader # survives • the leader is the node with the lowest # in its sub-cluster • The winner evicts the other node • by setting a kill-block in the voting disk • CSSD and OCLSOMON race to suicide 17 © 2009/2010 Pythian
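The tie-break from the demo can be sketched as a small shell function. This only illustrates the rule stated on the slide (two equal sub-clusters, the one with the lowest leader number survives). The function names and node numbers are hypothetical, and this is not how CSSD itself is implemented.

```sh
#!/bin/sh
# Sketch only: function names and node numbers are hypothetical.

# Leader of a sub-cluster = its lowest node number.
leader_of() {
    min=$1
    for n in "$@"; do
        if [ "$n" -lt "$min" ]; then
            min=$n
        fi
    done
    echo "$min"
}

# Given two equal-sized sub-clusters (each a space-separated list of
# node numbers), print the one that survives the race.
pick_survivor() {
    # $1 and $2 are left unquoted on purpose so they split into node numbers
    leader_a=$(leader_of $1)
    leader_b=$(leader_of $2)
    if [ "$leader_a" -lt "$leader_b" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Two-node cluster after the interconnect is cut: sub-clusters {1} and {2}
pick_survivor "1" "2"
```

With sub-clusters "2 4" and "3 1" the second one would win, because its leader is node 1.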
  • NHB failure symptoms • NHB failure on several nodes • ocssd.log • Evicted node can contain other traces • maybe - syslog (Linux - /var/log/messages) • maybe - oclsomon.log • almost always - console • Network is only a *possible* root cause • check syslog, ifconfig, netstat • Network engineering - switch logs 18 © 2009/2010 Pythian
  • DEMO CSSD is not healthy • Simulate using kill -STOP <cssd.bin pid> • Another node observes NHB loss • After misscount seconds => attempt eviction • but CSSD is frozen and can’t commit suicide • OCLSOMON detects CSSD timeout • Commit suicide 19 © 2009/2010 Pythian
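The "after misscount seconds" condition is a plain timeout check on the last heartbeat seen from the peer. A minimal sketch, with a hypothetical function name and illustrative numbers (the real value is whatever is configured for the cluster):

```sh
#!/bin/sh
# Sketch only: names and numbers are illustrative, not CSSD internals.

# Succeeds (exit 0) once the last network heartbeat from a peer is at
# least `misscount` seconds old.
nhb_expired() {
    last_hb=$1      # epoch seconds of the last heartbeat seen
    now=$2          # current epoch seconds
    misscount=$3    # timeout in seconds
    [ $((now - last_hb)) -ge "$misscount" ]
}

if nhb_expired 1000 1030 60; then
    echo "attempt eviction"
else
    echo "keep waiting"
fi
```

On a real cluster the configured timeout can be read with `crsctl get css misscount`.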
  • OCSSD sick - symptoms • Error in OCLSOMON.log • OCSSD log might be clean on evicted node • syslog might contain OCLSOMON diag. err. • Console often contains diag. err. • Depending on syslogd settings • Set diagwait to more than 3 for better diagnosability • 3 seconds is reboottime • Increases risk of corruption 20 © 2009/2010 Pythian
  • DEMO host sick - CPU stalled • Simulate by pausing OPROCD • kill -STOP <oprocd pid> • sleep 1 or 2 • kill -CONT <oprocd pid> • oprocd.log • Usually nothing if node is reset • Immediate reboot • Console might contain diag msg 21 © 2009/2010 Pythian
  • Killed by OPROCD - symptoms • Hard to confirm (nothing in oprocd.log) • Console output often helps • “SysRq: resetting” could be in syslog as well • Root cause • Faulty hardware, drivers, caused by IO/network • Kernel bugs, NTP bugs • Investigate syslog messages • Margin can be tuned • diagwait and reboottime CSSD parameters 22 © 2009/2010 Pythian
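Why pausing OPROCD for a second or two is enough to reset the node: as I understand it, OPROCD sleeps for a fixed interval and resets the node if it wakes up later than the allowed margin. A rough sketch of that check follows. The function name is hypothetical and the interval and margin values are only illustrative, not taken from the slides.

```sh
#!/bin/sh
# Sketch only: the interval and margin values below are illustrative.

# Decide what a watchdog should do after waking from its sleep.
# All arguments are in milliseconds.
watchdog_verdict() {
    interval=$1   # how long the watchdog meant to sleep
    margin=$2     # how much oversleep is tolerated
    actual=$3     # how long it actually slept
    if [ $((actual - interval)) -gt "$margin" ]; then
        echo "reset"
    else
        echo "ok"
    fi
}

watchdog_verdict 1000 500 1200    # woke 200 ms late
watchdog_verdict 1000 500 2500    # woke 1500 ms late
```

A larger margin tolerates longer stalls before the reset fires, which is the trade-off behind tuning diagwait and reboottime.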
  • 10g on Linux - hangcheck-timer • Replaced by OPROCD in 11g and 10.2.0.4+ • Most of the time useless and inactive! • Metalink Note 726833.1 • Updated 21-JUL-08! • Oracle suggests keeping both • I would only leave OPROCD • Metalink Note 567730.1 • OPROCD in 10.2.0.4 23 © 2009/2010 Pythian
  • Killed by hangcheck-timer • Rarely can be confirmed • “Hangcheck: hangcheck is restarting the machine” • Can set hangcheck_dump_tasks to dump state • See source code... 24 © 2009/2010 Pythian
  • Clusterware startup • Linux & UNIX inittab • init.cssd • init.evmd • init.crsd • Linux & UNIX init.d • init.crs • Windows Services 25 © 2009/2010 Pythian
  • Daemons startup sequence • Order: third-party clusterware (if any), CSSD, EVMD, CRSD • Triggered • by init.crs from init.d sequence • manually 26 © 2009/2010 Pythian
  • Startup in Linux & Unix
    [gorby@dime ~]$ ps -fe | grep 'init.' | grep -v grep
    root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run
    root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
    root 6354 1 0 10:24 ... /bin/sh /etc/init.d/init.crsd run
    root 7356 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oprocd
    root 7364 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oclsomon
    root 7383 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon
    [gorby@dime ~]$ tail -3 /etc/inittab
    h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
    h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
    h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
    [gorby@dime ~]$ ls -l /etc/rc3.d/S96init.crs
    lrwxrwxrwx 1 root root 20 Aug 1 23:51 /etc/rc3.d/S96init.crs -> /etc/init.d/init.crs
    27 © 2009/2010 Pythian
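The inittab entries above can be pulled apart with a few lines of awk when checking a node quickly. This is a minimal sketch: the function name is hypothetical, and the sample input is the three inittab lines from the slide.

```sh
#!/bin/sh
# Sketch only: the function name is hypothetical; the sample lines are
# the inittab entries shown on the slide.

# Read inittab content on stdin and print, for each Clusterware entry:
# id, runlevels, action, script, argument.
list_clusterware_inittab() {
    awk -F: '$4 ~ /init\.(cssd|crsd|evmd)/ {
        split($4, cmd, " ")
        print $1, $2, $3, cmd[1], cmd[2]
    }'
}

list_clusterware_inittab <<'EOF'
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
EOF
```

On a live node the input would be the file itself: `list_clusterware_inittab < /etc/inittab`.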
  • Startup flow (over time t) • /etc/oracle/scls_scr/{host}/root/cssrun • /etc/oracle/scls_scr/{host}/root/crsstart (enable / disable) • init.crs start => init.cssd autostart • init.cssd fatal • init.evmd run • init.crsd run 28 © 2009/2010 Pythian
  • Startup flow (over time t) • /etc/oracle/scls_scr/{host}/root/cssrun • /etc/oracle/scls_scr/{host}/root/crsstart (enable / disable) • init.cssd fatal • init.cssd oprocd => oprocd • init.cssd oclsomon => oclsomon.bin • init.cssd oclsvmon => oclsvmon.bin • init.cssd daemon => ocssd.bin • init.evmd run => evmd.bin • init.crsd run => crsd.bin 28 © 2009/2010 Pythian
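When the stack does not come up after a reboot, the crsstart flag is one of the first things worth checking. A small sketch, assuming the file holds the single word "enable" or "disable" as the slide suggests. The function name is hypothetical, and the demo runs against a scratch file rather than the real one.

```sh
#!/bin/sh
# Sketch only: the function name is hypothetical. It assumes the crsstart
# file holds the single word "enable" or "disable", as the slide suggests.

# Succeeds (exit 0) if the given crsstart file says "enable".
crs_autostart_enabled() {
    flag_file=$1
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "enable" ]
}

# Demonstrate on a scratch copy rather than the real file under
# /etc/oracle/scls_scr/{host}/root/.
demo_file=$(mktemp)
echo "enable" > "$demo_file"
if crs_autostart_enabled "$demo_file"; then
    echo "Clusterware autostart: enabled"
else
    echo "Clusterware autostart: disabled"
fi
rm -f "$demo_file"
```

The flag is normally changed with `crsctl enable crs` and `crsctl disable crs` rather than by editing the file.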
  • DEMO Startup troubleshooting • Check processes using “ps -fe | grep init” • Check syslog (/var/log/messages) • Can point to /tmp/crsctl.##### • Remember boot sequence • Clusterware log files • if *.bin processes are running already • crsctl • crsctl check crs/cssd/crsd/evmd 29 © 2009/2010 Pythian
  • Log files • log/{host}/cssd/ocssd.log • log/{host}/cssd/oclsomon/oclsomon.log • oclsomon.ba1, oclsomon.ba2,... • /etc/oracle/oprocd/{host}.oprocd.log • {host}.oprocd.log.{timestamp} • syslog • Linux /var/log/messages • Solaris /var/adm/log • Console logs 30 © 2009/2010 Pythian
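To avoid retyping these paths during an incident, they can be generated from the Clusterware home and the host name. A minimal sketch: the function name and the /u01/crs home are hypothetical, the log/{host}/... paths are taken to be relative to the Clusterware home, and the OCLSOMON log name is assumed to be oclsomon.log.

```sh
#!/bin/sh
# Sketch only: the function name is hypothetical; the paths follow the
# list on the slide, relative to the Clusterware home.

# Print the eviction-related log files for a given home and host.
eviction_logs() {
    crs_home=$1
    host=$2
    echo "$crs_home/log/$host/cssd/ocssd.log"
    echo "$crs_home/log/$host/cssd/oclsomon/oclsomon.log"
    echo "/etc/oracle/oprocd/$host.oprocd.log"
}

eviction_logs /u01/crs dime
```

The output can be fed straight to other tools, for example `ls -l $(eviction_logs "$ORA_CRS_HOME" "$(hostname)")`.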
  • Windows world • OPROCD = OraFenceService • EVMD = OracleEVMService • CRSD = OracleCRService • CSSD = OracleCSService • OPMD • Oracle Process Manager Daemon • Start trigger like init.crs in *nix • registered with Windows Service Control Manager (WSCM); delays start by 60 seconds 31 © 2009/2010 Pythian
  • EVMD • Passing clusterware events • Usually not a problem • Verify • evmwatch -A • evmpost -u "my message" 32 © 2009/2010 Pythian
  • CRSD • CRSD manages cluster resources • Stop / Start • Failover • VIP management • New resources, etc. • RACG helper scripts 33 © 2009/2010 Pythian
  • CRSD startup • After CSSD and EVMD • Re-spawned on failure • No eviction • Runs as root • VIP control • OCR management • root ulimits are in place! • Can run resources owned by any user • owner is the property of a resource 34 © 2009/2010 Pythian
  • Oracle Cluster Registry • Repository for all configuration data • Except OCR location itself • OCR is accessed mostly read-only • Every component reads OCR • OCR is written only by CRS • only from a single OCR master node
    ### crsd.log ###
    2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13: I AM THE NEW OCR MASTER at incar 12. Node Number 1
    35 © 2009/2010 Pythian
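To see when the OCR master moved between nodes, the message above can be extracted from crsd.log. This is a minimal sketch: the function name is hypothetical, and it assumes each message sits on a single line in the log, in the form shown in the excerpt.

```sh
#!/bin/sh
# Sketch only: the function name is hypothetical, and it assumes each
# message sits on one line of crsd.log, as in the excerpt on the slide.

# Read crsd.log on stdin; for every OCR master change print the
# timestamp, the "incar" value and the node number.
ocr_master_changes() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*I AM THE NEW OCR MASTER at incar \([0-9]*\)\. Node Number \([0-9]*\).*/\1 incar=\2 node=\3/p'
}

ocr_master_changes <<'EOF'
2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13: I AM THE NEW OCR MASTER at incar 12. Node Number 1
EOF
```

Against a real log the call would be `ocr_master_changes < $ORA_CRS_HOME/log/{host}/crs/crsd.log`.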
  • CRS resources • Standard Oracle resources • ASM • Listener • VIP • Database and Instance • etc. • srvctl => manages Oracle resources • Custom user resources • crs_* => manages any resources 36 © 2009/2010 Pythian
  • CRS resource internals • Unique name • Associated action script • stop / start / check functions • Other attributes • check frequency • pre-requisites • restart retries • etc... • All info stored in OCR 37 © 2009/2010 Pythian
  • DEMO Resource profiles • Use crs_stat [-t] to check status • Use crs_stat -p to check attributes • crs_* vs srvctl (like srvctl config ... -a) • Standard action scripts • racgimon • racgwrap / racgmain • racgvip • racgons • usrvip 38 © 2009/2010 Pythian
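A quick filter over crs_stat output helps spot resources that should be up but are not. This sketch assumes the NAME= / TYPE= / TARGET= / STATE= layout that crs_stat prints when called without -t. The function name and the sample resources are hypothetical.

```sh
#!/bin/sh
# Sketch only: the function name and the sample resources are hypothetical.
# It assumes the NAME= / TYPE= / TARGET= / STATE= layout that crs_stat
# prints when called without -t.

# Read crs_stat output on stdin and print every resource whose target is
# ONLINE but whose state is not.
resources_not_online() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  { if (target == "ONLINE" && $2 !~ /^ONLINE/) print name, $2 }
    '
}

resources_not_online <<'EOF'
NAME=ora.dime.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on dime

NAME=ora.dime.LISTENER_DIME.lsnr
TYPE=application
TARGET=ONLINE
STATE=OFFLINE
EOF
```

On a live cluster the input comes from the tool itself: `crs_stat | resources_not_online`.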
  • DEMO OCR internals • ocrcheck • ocrconfig • used during install/upgrade • backup OCR • recover OCR • ocrdump • txt or xml 39 © 2009/2010 Pythian
  • DEMO racgvip case study • Check the script • Set env. vars and simulate the call • Use _USR_ORA_DEBUG=1 in the script 40 © 2009/2010 Pythian
  • Resources hierarchy • [Diagram: resource stack of CS (Collective Service), Service, DB, Instance, ASM, Listener and Nodeapps (GSD, ONS, VIP); some dependencies are marked "Only 10.1 and 10.2.0.1"] • 10.2.0.2 (?) • released dependency of ASM and Instance on VIP • If DB registered manually with srvctl • ASM dependency missing 41 © 2009/2010 Pythian
  • Resources and Oracle homes • [Diagram: the same resource stack mapped to homes: DB Home (CS (Collective Service), DB, Service, Instance), ASM Home (ASM), CRS Home (Nodeapps: GSD, ONS, VIP), plus Listener; some dependencies are marked "Only 10.1 and 10.2.0.1"] • Listener can be in ASM home • ASM home can be Oracle home • Logs are in appropriate home 42 © 2009/2010 Pythian
  • DEMO troubleshooting resources • {home}/log/{host}/racg/{resource_name}.log • Old way - edit racgwrap • Uncomment _USR_ORA_DEBUG=1 • crsctl debug log res '{res_name}:{0|1}' • crs_stat -p | grep DEBUG • Run “srvctl start ...” manually • SRVM_TRACE=TRUE 43 © 2009/2010 Pythian
  • Troubleshooting summary • crsctl check crs | crsd | cssd | evmd • crs_stat [-t] • crs_stat -p [{res_name}] • crsctl debug log css | crs | evm | res • crsctl lsmodules css | crs | evm • crs_stop {res_name} [-f] (-f force-stops the resource) • ocrdump • See scripts 44 © 2009/2010 Pythian
  • Troubleshooting flow • Is Clusterware up? • Are Oracle resources up? • Listener & VIP • Database & ASM instance • Services • Did any nodes get rebooted? • Did any resources restart? • $ORA_CRS_HOME/log/{host}/crs/crsd.log • $ORA_CRS_HOME/log/{host}/alert{host}.log • MOS Note 265769.1 “Troubleshooting 10g and 11.1 Clusterware Reboots” 45 © 2009/2010 Pythian
  • Enter the 11gR2 World - Grid Infrastructure Oracle Clusterware Administration and Deployment Guide 46 © 2009/2010 Pythian
  • Enter the 11gR2 World - Grid Infrastructure My Oracle Support Note 1053147.1 47 © 2009/2010 Pythian
  • 11g Grid Infrastructure Documentation • Oracle Clusterware Administration and Deployment Guide • MOS Note 1053147.1 • 11gR2 Clusterware and Grid Home - What You Need to Know • MOS Note 1050908.1 • How to Troubleshoot Grid Infrastructure Startup Issues • MOS Note 1053970.1 • Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues • MOS Note 1050693.1 • Troubleshooting 11.2 Clusterware Node Evictions (Reboots) 48 © 2009/2010 Pythian
  • 11gR2 Node Evictions • Same as in 10g + member kill escalation • LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism.  If this times out it could escalate to a node kill. • Processes evicting • CSSD • CSSDAGENT • CSSDMONITOR 49 © 2009/2010 Pythian
  • Questions? Thank you! http://www.pythian.com/ gorbachev@pythian.com © 2009/2010 Pythian