Understanding Oracle RAC Brain Split Resolution


      by Maclean.liu
            liu.maclean@gmail.com
        www.oracledatabase12g.com
About Me

• Email: liu.maclean@gmail.com
• Blog: www.oracledatabase12g.com
• Oracle Certified Database Administrator Master 10g and 11g
• Over 6 years of experience with Oracle DBA technology
• Over 7 years of experience with Linux technology
• Member, Independent Oracle Users Group
• Member, All China Users Group
• Presents on advanced Oracle topics: RAC, Data Guard, Performance Tuning, and Oracle Internals
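About a week ago, a senior Oracle engineer explained to a customer and me how RAC handles split-brain. According to him, when split-brain occurs the nodes race to grab the voting disks: nodes that seize (n/2+1) of the voting disks survive, and nodes that fail to seize them are evicted.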


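I have to say this view is too casual. That a veteran engineer who has maintained Oracle since version 6 can make such a conceptual mistake only shows how quickly Oracle technology changes.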


Before looking at how split-brain (Brain Split) is handled, it is worth introducing the working framework of Oracle RAC CSS (Cluster Synchronization Services):




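Oracle RAC CSS provides two background services: Group Management (GM) and Node Monitor (NM). GM manages the group and lock services. At any moment one node in the cluster acts as the GM master node. The other nodes send their GM requests serially to the master node, and the master node broadcasts cluster membership changes to the other nodes. Group membership is synchronized at every cluster reconfiguration. Each node interprets the membership change information independently.

The Node Monitor (NM) service keeps node information consistent with third-party vendor clusterware through skgxn (skgxn-libskgxn.a, the library that provides node monitoring). NM also maintains the familiar network heartbeat and disk heartbeat to make sure that nodes are still alive. When a cluster member stops producing a normal network heartbeat or disk heartbeat, NM evicts that member from the cluster, and the evicted node reboots.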


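NM learns which endpoints it must listen on and talk to from the records in the OCR (which stores the interconnect information), and sends heartbeat messages over the network to the other cluster members. It also monitors the network heartbeats from all other members. A network heartbeat occurs every second. If a node's network heartbeat is not received within the number of seconds specified by misscount, that node is considered "dead".

A note on the defaults: in 10.2.0.1 the default misscount is 60s on Linux, 30s on other platforms, and 600s when third-party vendor clusterware is used, and 10.2.0.1 has not yet introduced disktimeout. From 10.2.0.4 on, misscount is 60s and disktimeout is 200s. From 11.2 on, misscount is 30s:

CRS-4678: Successful get misscount 30 for Cluster Synchronization Services
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services

NM is also responsible for initiating a cluster reconfiguration when other nodes join or leave the cluster.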
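
The misscount rule can be illustrated with a small sketch. It is not Oracle's implementation: the function name `heartbeat_status` is hypothetical, the 60-second default is simply the 10.2.0.4 value quoted above, and the 50/75/90 warning levels are taken from the ocssd.log excerpts later in this article.

```python
# Hypothetical sketch: classify a peer by how long its network heartbeat
# has been missing, relative to misscount.
WARN_LEVELS = (50, 75, 90)  # percentages seen in the ocssd.log warnings


def heartbeat_status(missed_s: float, misscount_s: float = 60.0) -> str:
    """Return 'ok', 'warn N%' or 'evict' for a peer silent for missed_s seconds."""
    if missed_s >= misscount_s:
        # No heartbeat for the whole misscount window: the peer counts as dead.
        return "evict"
    pct = 100.0 * missed_s / misscount_s
    reached = [level for level in WARN_LEVELS if pct >= level]
    if reached:
        return f"warn {reached[-1]}%"
    return "ok"


if __name__ == "__main__":
    for missed in (5.0, 30.39, 45.5, 60.0):
        print(missed, heartbeat_status(missed))
```

With a misscount of 60 seconds, a peer silent for 30.39 seconds is at the 50% warning level, which matches the first warning in the logs below.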


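When resolving split-brain, NM also monitors the voting disks to learn about competing subclusters. Subclusters deserve a short explanation. Imagine an environment with a large number of nodes, say the 128-node environment that Oracle has built. When the network fails there are several possibilities. One is a global network failure, in which none of the 128 nodes can exchange network heartbeats with any other, producing as many as 128 isolated single-node "islands". Another is a partial network failure, in which the 128 nodes are split into several parts, each containing more than one node; these parts are called subclusters. During a network failure the nodes inside a subcluster can still communicate with each other and exchange vote messages (vote mesg), but subclusters and isolated nodes can no longer talk to one another over the regular interconnect. This is when NM reconfiguration needs the voting disks.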
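
One way to picture subclusters is as connected components of the interconnect graph. The sketch below assumes that connectivity is symmetric and transitive, which a real network failure does not guarantee; the function name `subclusters` and the link list are illustrative only.

```python
# Hypothetical sketch: group nodes into subclusters from the set of
# interconnect links that still work (connected components).
def subclusters(nodes, links):
    """Return a list of sorted node lists, one per subcluster."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over reachable nodes
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(neighbours[n] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    # Node 1 lost its network; nodes 2 and 3 can still reach each other.
    print(subclusters([1, 2, 3], [(2, 3)]))
    # Global failure: every node becomes an island.
    print(subclusters([1, 2, 3, 4], []))
```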




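Because NM uses the voting disks to work around communication failures caused by network faults, the voting disks must be accessible at all times. In normal operation every node performs disk heartbeats, that is, it writes disk heartbeat information to a block on the voting disk every second. CSS also reads a "kill block" every second; when the contents of the kill block indicate that the local node has been evicted from the cluster, CSS reboots the node on its own initiative.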
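
The per-second cycle described here can be sketched with an in-memory stand-in for a voting disk. The class `VotingDisk`, its fields, and the function `css_tick` are hypothetical and say nothing about the real on-disk layout; they only show the order of operations (write heartbeat, then poll the kill block).

```python
# Hypothetical sketch of the per-second disk heartbeat and kill-block poll.
class VotingDisk:
    """In-memory stand-in for one voting disk (not the real block layout)."""

    def __init__(self):
        self.heartbeats = {}     # node number -> tick of last disk heartbeat
        self.kill_block = set()  # node numbers that have been told to leave


def css_tick(node: int, disk: VotingDisk, tick: int) -> str:
    """Run one cycle for `node`: write the heartbeat, then read the kill block."""
    disk.heartbeats[node] = tick
    if node in disk.kill_block:
        # The kill block says this node was evicted, so it restarts itself.
        return "reboot"
    return "alive"


if __name__ == "__main__":
    disk = VotingDisk()
    print(css_tick(1, disk, tick=0))  # node 1 is a normal member
    disk.kill_block.add(1)            # a surviving node writes the eviction notice
    print(css_tick(1, disk, tick=1))  # node 1 sees it on its next poll
```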
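To keep the disk heartbeat and kill-block reads working, CSS requires that each node can access at least (N/2+1) voting disks, that is, a strict majority (N/2 rounded down, plus one). This guarantees that any two nodes always have at least one voting disk that both can access. Under normal conditions (note: calm, normal conditions) a node lives on happily as long as the online voting disks it can access outnumber those it cannot. Once a node no longer has that majority, the Cluster Synchronization Services (CSS) daemon fails and the node reboots. So the claim that two voting disks are enough for redundancy, and that three or more are unnecessary, is wrong: with two disks, losing either one already leaves a node without a majority. Oracle recommends at least three voting disks per cluster.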
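
The majority arithmetic is easy to check. The helper names below are made up for illustration; the only rule used is the (N/2+1) requirement stated above, with N/2 rounded down.

```python
# Sketch of the voting-disk majority rule: a node needs floor(N/2) + 1 disks.
def required_disks(total: int) -> int:
    """Minimum number of voting disks a node must be able to access."""
    return total // 2 + 1


def node_survives(accessible: int, total: int) -> bool:
    """True if the node still sees a strict majority of the voting disks."""
    return accessible >= required_disks(total)


def tolerated_failures(total: int) -> int:
    """How many voting disks can be lost before nodes start rebooting."""
    return total - required_disks(total)


if __name__ == "__main__":
    for total in (1, 2, 3, 5):
        print(total, "disks tolerate", tolerated_failures(total), "failure(s)")
```

Two disks tolerate zero failures, the same as one disk, while three disks tolerate one failure. That is why a second voting disk adds no redundancy on its own.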




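When an actual NM reconfiguration takes place, all active nodes and all nodes that are joining the cluster take part in the reconfig. Nodes that do not acknowledge (ack) are left out of the new cluster membership. The reconfig consists of several phases:

1. Initialization phase: the reconfig manager (the node with the lowest cluster member number) signals the other nodes to start the reconfig

2. Voting phase: each node sends the reconfig manager the membership as it understands it

3. Split-brain check phase: the reconfig manager checks for split-brain

4. Eviction phase: the reconfig manager evicts non-member nodes

5. Update phase: the reconfig manager sends the authoritative membership information to the member nodes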


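In the split-brain check phase, the reconfig manager finds the nodes that have a disk heartbeat but no network heartbeat, and uses the network heartbeat (where possible) and disk heartbeat information to count the nodes in every competing subcluster. It then decides which subcluster should survive based on two factors:

   1. The subcluster with the largest number of nodes
   2. If the subclusters have equal numbers of nodes, the subcluster with the lowest node number. For example, in a 2-node RAC environment node 1 always wins.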
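
The two rules above amount to a single ordering: largest subcluster first, lowest node number as the tie-breaker. A minimal sketch, with hypothetical function names and each subcluster represented as a list of node numbers:

```python
# Sketch of the survivor rule: largest subcluster wins; on a tie, the
# subcluster containing the lowest node number wins.
def surviving_subcluster(subclusters):
    """Pick the subcluster that should survive the split-brain check."""
    return min(subclusters, key=lambda members: (-len(members), min(members)))


def evicted_nodes(subclusters):
    """Return the node numbers of all subclusters that lose."""
    winner = surviving_subcluster(subclusters)
    return sorted(n for group in subclusters if group is not winner for n in group)


if __name__ == "__main__":
    # Three-node cluster, node 1 isolated: nodes 2 and 3 survive.
    print(surviving_subcluster([[1], [2, 3]]), evicted_nodes([[1], [2, 3]]))
    # Two single-node subclusters of equal size: the lower node number wins.
    print(surviving_subcluster([[3], [2]]), evicted_nodes([[3], [2]]))
```

Nothing in this decision depends on which node reaches the voting disks first.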
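After the split-brain check comes the eviction phase. Evicted nodes are sent an eviction message (if the network is available); if the message cannot be delivered, the eviction notice is written to the "kill block" on the voting disk instead. The reconfig manager then waits for the evicted nodes to indicate that they have received the eviction notice, either over the network or through status information on the voting disk.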


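As we can see, the Brain Split Check in Oracle CSS does its best to keep the largest subcluster alive so that the RAC system retains the highest availability. It does not, as that senior engineer claimed, decide which nodes survive during cluster reconfiguration by having nodes race to seize the voting disks.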


Below are two example scenarios in a three-node RAC environment:


1. The network of node 1 fails and nodes 2 and 3 form a subcluster; nodes 2 and 3 evict node 1 through the voting disk:
ocssd.log of node 1:
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING:   clssnmPollingThread:
node vrh2 (2) at 50% heartbeat fatal, eviction in 29.610 seconds
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:     clssnmPollingThread:
node vrh2 (2) is impending reconfig, flag 1037, misstime 30390
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING:   clssnmPollingThread:
node vrh3 (3) at 50% heartbeat fatal, eviction in 29.150 seconds

The misscount countdown starts for nodes 2 and 3

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:    clssnmPollingThread:
node vrh3 (3) is impending reconfig, flag 1037, misstime 30850
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:    clssnmPollingThread:
diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 50% heartbeat fatal, eviction in 28.610 seconds
[    CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 75% heartbeat fatal, eviction in 14.580 seconds
[    CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread:
node vrh3 (3) at 75% heartbeat fatal, eviction in 14.120 seconds
[    CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 75% heartbeat fatal, eviction in 13.580 seconds
[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 5.560 seconds
[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 5.100 seconds
[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 4.560 seconds
[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 4.100 seconds
[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 3.560 seconds
[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 3.100 seconds
[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 2.560 seconds
[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 2.100 seconds
[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 1.550 seconds
[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 1.090 seconds
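
The countdown in these warnings can be cross-checked against misscount: the misstime reported in the TRACE line (in milliseconds) plus the remaining "eviction in" time should add up to the configured misscount. The sketch below is a hypothetical helper, not an Oracle tool. It assumes each log entry has been joined onto a single line, and the patterns are written only against the two message shapes shown above.

```python
import re

# Patterns for the two clssnmPollingThread messages shown in this article.
WARNING_RE = re.compile(
    r"node\s+(?P<name>\S+)\s+\((?P<num>\d+)\)\s+at\s+(?P<pct>\d+)%?\s.*?fatal,\s+"
    r"eviction\s+in\s+(?P<left>[\d.]+)\s+seconds"
)
MISSTIME_RE = re.compile(
    r"node\s+(?P<name>\S+)\s+\((?P<num>\d+)\)\s+is impending reconfig, "
    r"flag \d+, misstime (?P<ms>\d+)"
)


def parse_warning(line: str):
    """Return (node name, node number, percent, seconds left) or None."""
    m = WARNING_RE.search(line)
    if not m:
        return None
    return m["name"], int(m["num"]), int(m["pct"]), float(m["left"])


def parse_misstime(line: str):
    """Return (node name, node number, misstime in ms) or None."""
    m = MISSTIME_RE.search(line)
    if not m:
        return None
    return m["name"], int(m["num"]), int(m["ms"])


if __name__ == "__main__":
    warning = ("[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: "
               "clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, "
               "eviction in 29.610 seconds")
    trace = ("[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: "
             "clssnmPollingThread: node vrh2 (2) is impending reconfig, "
             "flag 1037, misstime 30390")
    _, _, pct, left = parse_warning(warning)
    _, _, ms = parse_misstime(trace)
    # misstime + time left should be close to misscount (60s in this cluster).
    print(pct, round(ms / 1000 + left, 3))
```

For node vrh2 the two values are 30.390 s and 29.610 s, which is consistent with a misscount of 60 seconds.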


ocssd.log of node 2:
[    CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 50% heartbeat fatal, eviction in 29.800 seconds
[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:    clssnmPollingThread:
node vrh1 (1) is impending reconfig, flag 1037, misstime 30200
[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:    clssnmPollingThread:
diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 50% heartbeat fatal, eviction in 28.790 seconds
[    CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 75% heartbeat fatal, eviction in 14.800 seconds
[    CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 75% heartbeat fatal, eviction in 13.800 seconds
[    CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 5.800 seconds
[    CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 4.800 seconds
[    CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 3.800 seconds
[    CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 2.800 seconds
[    CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 1.790 seconds
[    CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 0.800 seconds

[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync:
Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]

clssnmHandleSync acknowledges the sync message sent by node 3

[    CSSD]2011-04-23 17:12:35.399 [5073104] >USER:      NMEVENT_SUSPEND [00][00]
[00][0e]

A Node Monitoring SUSPEND event occurs

[    CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE:     clssnmSendVoteInfo:
node(3) syncSeqNo(10)

clssnmSendVoteInfo sends the vote message (vote mesg) to node 3

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState:
node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState:
node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9)
(old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDiscHelper: vrh1,
node(1) connection failed, con (0xb7e80ae8), probe((nil))
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDeactivateNode:
node 1 (vrh1) left cluster

Node 1 is confirmed to have left the cluster
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:    clssnmUpdateNodeState:
node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2)
(old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:    clssnmUpdateNodeState:
node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3)
(old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >USER:     clssnmHandleUpdate:
SYNC(10) from node(3) completed
[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:     clssnmHandleUpdate: NODE
2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:     clssnmHandleUpdate: NODE
3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE:    clssnmHandleUpdate:
diskTimeout set to (200000)ms
[    CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE:    clssgmReconfigThread:
started for reconfig (10)
[    CSSD]2011-04-23 17:12:35.416 [3021970320] >USER:     NMEVENT_RECONFIG [00]
[00][00][0c]
[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock crs_version type 2
[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version)
birth(9/9)
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_FAILOVER type 3
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock EVMDMAIN type 2
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock CRSDMAIN type 2
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh2 type 3
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock ocr_crs type 2
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(9/9)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON)
birth(9/9)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:
clssgmEstablishConnections: 2 nodes in cluster incarn 10
[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:    clssgmPeerDeactivate:
node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:    clssgmPeerListener:
connects done (2/2)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:
clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:    clssgmMasterCMSync:
Synchronizing group/lock status
[    CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE:    clssgmMasterSendDBDone:
group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2
Reconfiguration complete

[    CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE:   clssgmReconfigThread:
completed for reconfig(10), with status(1)


ocssd.log of node 3:
[    CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 50% heartbeat fatal, eviction in 29.220 seconds
[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:    clssnmPollingThread:
node vrh1 (1) is impending reconfig, flag 1037, misstime 30780
[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:    clssnmPollingThread:
diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 75% heartbeat fatal, eviction in 14.220 seconds
[    CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 5.220 seconds
[    CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 4.220 seconds
[    CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 3.210 seconds
[    CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 2.220 seconds
[    CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 1.220 seconds
[    CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread:
node vrh1 (1) at 90% heartbeat fatal, eviction in 0.220 seconds
[    CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE:    clssnmPollingThread:
Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Initiating sync 10
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:    clssnmDoSyncUpdate:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:    clssnmSetupAckWait: Ack
message type (11)
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:    clssnmSetupAckWait:
node(2) is ALIVE
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:    clssnmSetupAckWait:
node(3) is ALIVE
[    CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE:    clssnmSendSync:
syncSeqNo(10)
[    CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE:    clssnmWaitForAcks: Ack
message type(11), ackCount(2)
[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:    clssnmHandleSync:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:    clssnmHandleSync:
Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
[    CSSD]2011-04-23 17:13:18.330 [8136912] >USER:     NMEVENT_SUSPEND [00][00]
[00][0e]
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:    clssnmWaitForAcks:
done, msg type(11)
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Terminating node 1, vrh1, misstime(60010) state(5)
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:    clssnmSetupAckWait: Ack
message type (13)
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:    clssnmSetupAckWait:
node(2) is ACTIVE
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:    clssnmSetupAckWait:
node(3) is ACTIVE
[    CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE:    clssnmWaitForAcks: Ack
message type(13), ackCount(2)
[    CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE:    clssnmSendVoteInfo:
node(3) syncSeqNo(10)
[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmWaitForAcks:
done, msg type(13)

This completes the vote mesg exchange between nodes 2 and 3. These messages contain the node identifier, the GM peer-to-peer listening endpoint, and the view of cluster membership.

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo:
Checking disk info...

Checking of the information on the voting disk begins

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo:
node 1, vrh1, state 5
with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2

Another subcluster is found. It contains node 1, with node 1 as its leader, and is the smallest subcluster. Nodes 3 and 2 form the largest subcluster, with node 2 as the leader.

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Start
[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Evicting
node 1, vrh1, birth 9, death 10,
impendingrcfg 1, stateflags 0x40d

Eviction of node 1 is initiated

[     CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmSendShutdown: req
to node 1, kill time 443294
[     CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE:   clssnmDiscHelper: vrh1,
node(1) connection failed, con (0xb7eaf220), probe((nil))
[     CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions:
Start
[     CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions:
node 1, vrh1, undead 1
[     CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmCheckKillStatus:
Node 1, vrh1, down, LATS(443144),timeout(150)

clssnmCheckKillStatus checks whether node 1 is down

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: Ack
message type (15)
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait:
node(2) is ACTIVE
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait:
node(3) is ACTIVE
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSendUpdate:
syncSeqNo(10)
[    CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE:   clssnmWaitForAcks: Ack
message type(15), ackCount(2)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState:
node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState:
node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth
(9/9) (old/new)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmDeactivateNode:
node 1 (vrh1) left cluster

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState:
node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2)
(old/new)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:    clssnmUpdateNodeState:
node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3)
(old/new)
[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:     clssnmHandleUpdate:
SYNC(10) from node(3) completed
[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:     clssnmHandleUpdate: NODE
2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:     clssnmHandleUpdate: NODE
3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE:    clssnmHandleUpdate:
diskTimeout set to (200000)ms
[    CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE:    clssnmWaitForAcks:
done, msg type(15)
[    CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Sync 10 complete!
[    CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE:    clssgmReconfigThread:
started for reconfig (10)
[    CSSD]2011-04-23 17:13:18.350 [3021970320] >USER:     NMEVENT_RECONFIG [00]
[00][00][0c]
[    CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock crs_version type 2
[    CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version)
birth(9/9)
[    CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE:    clssgmDispatchCMXMSG():
got message type(7) src(2) incarn(10) during incarn(9/9)
[    CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_FAILOVER type 3
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock EVMDMAIN type 2
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock CRSDMAIN type 2
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh2 type 3
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock ocr_crs type 2
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(9/9)
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:    clssgmCleanupGrocks:
cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON)
birth(9/9)
[    CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE:
clssgmEstablishConnections: 2 nodes in cluster incarn 10
[    CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE:    clssgmPeerDeactivate:
node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:    clssgmHandleDBDone():
src/dest (2/65535) size(68) incarn 10
[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:    clssgmPeerListener:
connects done (2/2)
[    CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE:
clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

Update phase
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 3, master node number 2

[    CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE:   clssgmReconfigThread:
completed for reconfig(10), with status(1)


2. In the other scenario node 1 has not joined the cluster and the network of node 2 fails. Because node 2 has the lower member number, it evicts node 3 through the voting disk

ocssd.log of node 2:
[    CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 50% heartbeat fatal, eviction in 29.890 seconds
[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:    clssnmPollingThread:
node vrh3 (3) is impending reconfig, flag 1037, misstime 30110
[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:    clssnmPollingThread:
diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 50% heartbeat fatal, eviction in 28.890 seconds
[    CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 75% heartbeat fatal, eviction in 14.860 seconds
[    CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 75% heartbeat fatal, eviction in 13.860 seconds
[    CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 5.840 seconds
[    CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 4.840 seconds
[    CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 3.840 seconds
[    CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 2.840 seconds
[    CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 1.840 seconds
[    CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread:
node vrh3 (3) at 90% heartbeat fatal, eviction in 0.830 seconds
[    CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE:    clssnmPollingThread:
Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Initiating sync 13
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:    clssnmDoSyncUpdate:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:    clssnmSetupAckWait: Ack
message type (11)
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:    clssnmSetupAckWait:
node(2) is ALIVE
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:    clssnmSendSync:
syncSeqNo(13)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:    clssnmWaitForAcks: Ack
message type(11), ackCount(1)
[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync:
Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:    clssnmWaitForAcks:
done, msg type(11)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Terminating node 3, vrh3, misstime(60000) state(5)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:    clssnmSetupAckWait: Ack
message type (13)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:    clssnmSetupAckWait:
node(2) is ACTIVE
[    CSSD]2011-04-23 17:42:32.021 [5073104] >USER:     NMEVENT_SUSPEND [00][00]
[00][0c]
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:    clssnmWaitForAcks: Ack
message type(13), ackCount(1)
[    CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE:    clssnmSendVoteInfo:
node(2) syncSeqNo(13)
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:    clssnmWaitForAcks:
done, msg type(13)
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:    clssnmCheckDskInfo:
Checking disk info...
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:    clssnmCheckDskInfo:
node 3, vrh3, state 5 with leader 3
has smaller cluster size 1; my cluster size 1 with leader 2

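After checking the voting disk, node 2 finds that both subclusters have size 1. The tie goes to the subcluster with the lowest node number, so the subcluster of node 3 (whose node number is higher than 2) is treated as the smaller one and node 2 as the largest subcluster.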

[    CSSD]2011-04-23 17:42:32.022 [3032460176]   >TRACE:   clssnmEvict: Start
[    CSSD]2011-04-23 17:42:32.022 [3032460176]   >TRACE:   clssnmEvict: Evicting
node 3, vrh3, birth 3, death 13, impendingrcfg   1, stateflags 0x40d
[    CSSD]2011-04-23 17:42:32.022 [3032460176]   >TRACE:   clssnmSendShutdown: req
to node 3, kill time 1643084

Eviction of node 3 and a shutdown request are initiated

[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmDiscHelper: vrh3,
node(3) connection failed, con (0xb7e79bb0), probe((nil))
[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions:
Start
[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions:
node 3, vrh3, undead 1
[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmCheckKillStatus:
Node 3, vrh3, down, LATS(1642874),timeout(210)
[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: Ack
message type (15)
[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait:
node(2) is ACTIVE
[     CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSendUpdate:
syncSeqNo(13)
[     CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: Ack
message type(15), ackCount(1)
[     CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:  clssnmUpdateNodeState:
node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[     CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:  clssnmUpdateNodeState:
node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[     CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:  clssnmUpdateNodeState:
node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2)
(old/new)
[     CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:  clssnmUpdateNodeState:
node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth
(3/3) (old/new)
[     CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:  clssnmDeactivateNode:
node 3 (vrh3) left cluster

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:      clssnmHandleUpdate:
SYNC(13) from node(2) completed
[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:      clssnmHandleUpdate: NODE
2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:     clssnmHandleUpdate:
diskTimeout set to (200000)ms
[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:     clssnmWaitForAcks:
done, msg type(15)
[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:     clssnmDoSyncUpdate:
Sync 13 complete!
[    CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE:   clssgmReconfigThread:
started for reconfig (13)
[    CSSD]2011-04-23 17:42:32.024 [3021970320] >USER:    NMEVENT_RECONFIG [00]
[00][00][04]
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock crs_version type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version)
birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_FAILOVER type 3
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_FAILOVER)
birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock EVMDMAIN type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(EVMDMAIN) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock CRSDMAIN type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(CRSDMAIN) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(0)
grock(_ORA_CRS_MEMBER_vrh1) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(0)
grock(_ORA_CRS_MEMBER_vrh3) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock ocr_crs type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(ocr_crs) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks:
cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON)
birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:
clssgmEstablishConnections: 1 nodes in cluster incarn 13
[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerDeactivate:
node 3 (vrh3), death 13, state 0x0 connstate 0xf
[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerListener:
connects done (1/1)
[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:
clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)
[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterCMSync:
Synchronizing group/lock status
[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterSendDBDone:
group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

Reconfiguration complete.

[    CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE:   clssgmReconfigThread:
completed for reconfig(13), with status(1)
The following is the ocssd.log from node 3:
[    CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 50% heartbeat fatal, eviction in 29.360 seconds
[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:    clssnmPollingThread:
node vrh2 (2) is impending reconfig, flag 1039, misstime 30640
[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:    clssnmPollingThread:
diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 75% heartbeat fatal, eviction in 14.330 seconds
[    CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 5.310 seconds
[    CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 4.300 seconds
[    CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 3.300 seconds
[    CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 2.300 seconds
[    CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 1.300 seconds
[    CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread:
node vrh2 (2) at 90% heartbeat fatal, eviction in 0.300 seconds
[    CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE:    clssnmPollingThread:
Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
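
The countdown in the warnings above follows from the misstime values in the same log. At the first warning (the 50 percent mark) misstime is 30640 ms and the remaining time is 29.360 seconds, which together give 60000 ms, consistent with a misscount of 60 seconds. The following is a minimal Python sketch of that arithmetic only. The function names and the threshold table are illustrative assumptions, not CSSD internals, and the 60-second misscount is inferred from this log.

```python
# Illustrative sketch: reproduces the arithmetic visible in the log above.
# Assumption: misscount = 60 s, inferred from 30640 ms + 29.360 s = 60 s.
MISSCOUNT_MS = 60_000
# Warning percentages that appear in this particular log.
WARN_LEVELS = (50, 75, 90)


def eviction_in_ms(misstime_ms, misscount_ms=MISSCOUNT_MS):
    """Milliseconds left before the silent node is evicted (never negative)."""
    return max(misscount_ms - misstime_ms, 0)


def warn_level(misstime_ms, misscount_ms=MISSCOUNT_MS):
    """Highest warning percentage already crossed, or None below 50 percent."""
    pct = misstime_ms * 100 // misscount_ms
    crossed = [level for level in WARN_LEVELS if pct >= level]
    return crossed[-1] if crossed else None


if __name__ == "__main__":
    # 30640 and 60010 are printed in the log; 45670 and 54690 are derived
    # here from the "eviction in 14.330" and "eviction in 5.310" warnings.
    for misstime in (30_640, 45_670, 54_690, 60_010):
        print(misstime, warn_level(misstime), eviction_in_ms(misstime) / 1000)
```

Working in integer milliseconds keeps the comparison exact; the division by 1000 is only for display.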
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Initiating sync 13
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:    clssnmDoSyncUpdate:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:    clssnmSetupAckWait: Ack
message type (11)
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:    clssnmSetupAckWait:
node(3) is ALIVE
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:    clssnmSendSync:
syncSeqNo(13)
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:    clssnmWaitForAcks: Ack
message type(11), ackCount(1)
[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:    clssnmHandleSync:
diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:    clssnmHandleSync:
Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]
[    CSSD]2011-04-23 17:43:15.912 [8136912] >USER:     NMEVENT_SUSPEND [00][00]
[00][0c]
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:    clssnmWaitForAcks:
done, msg type(11)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:    clssnmDoSyncUpdate:
Terminating node 2, vrh2, misstime(60010) state(5)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:    clssnmSetupAckWait: Ack
message type (13)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:    clssnmSetupAckWait:
node(3) is ACTIVE
[    CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE:    clssnmSendVoteInfo:
node(3) syncSeqNo(13)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:    clssnmWaitForAcks: Ack
message type(13), ackCount(1)
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE:    clssnmCheckDskInfo:
Checking disk info...
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:    clssnmCheckDskInfo:
Aborting local node to avoid splitbrain.
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:                       : my
node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)

After reading the voting disk, node 3 finds a kill block and aborts itself to avoid split brain!
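
The clssnmCheckDskInfo line above compares two sub-clusters of equal size, (Leader 3, Size 1) against (Leader 2, Size 1), and node 3 is the one that aborts. That outcome is consistent with the rule that the sub-cluster with the most nodes survives and, when sizes are equal, the sub-cluster containing the lowest node number survives. The following is a minimal Python sketch assuming that rule; `SubCluster`, `survivor` and `must_abort` are hypothetical names, and treating the leader as the lowest node number in its sub-cluster is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCluster:
    leader: int  # assumed to be the lowest node number in the sub-cluster
    size: int    # number of nodes in the sub-cluster


def survivor(a, b):
    """Pick the sub-cluster that survives under the assumed rule.

    The larger sub-cluster wins; on a tie in size, the one with the
    lower leader node number wins.
    """
    return min(a, b, key=lambda sc: (-sc.size, sc.leader))


def must_abort(mine, other):
    """True when the local sub-cluster loses and should abort itself."""
    return survivor(mine, other) != mine


if __name__ == "__main__":
    # Values taken from the log: my node(3), Leader(3), Size(1)
    # VS Node(2), Leader(2), Size(1).
    node3_view = SubCluster(leader=3, size=1)
    node2_view = SubCluster(leader=2, size=1)
    print(must_abort(node3_view, node2_view))
    print(must_abort(node2_view, node3_view))
```

Note that no count of voting disks appears in this comparison; the decision uses only sub-cluster size and node number.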
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:
###################################
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssscExit: CSSD
aborting
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:
###################################
[    CSSD]--- DUMP GROCK STATE DB ---
[    CSSD]----------
[    CSSD] type 2, Id 4, Name = (crs_version)
[    CSSD] flags: 0x1000
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 2
[    CSSD]   . . . . .
[    CSSD]     memberNo =2, seq 2
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]   . . . . .
[    CSSD]     memberNo =1, seq 12
[    CSSD]     flags = 0x1000, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 3, Id 11, Name = (_ORA_CRS_FAILOVER)
[    CSSD] flags: 0x0
[    CSSD] grant: count=1, type 3, wait 1
[    CSSD] Member Count =1, master -3
[    CSSD]   . . . . .
[    CSSD]     memberNo =0, seq 0
[    CSSD]     flags = 0x12, granted 1
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 2, Id 2, Name = (EVMDMAIN)
[    CSSD] flags: 0x1000
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 2
[    CSSD]   . . . . .
[    CSSD]     memberNo =2, seq 1
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]     privateDataSize = 508
[    CSSD]     publicDataSize = 504
[    CSSD]   . . . . .
[    CSSD]     memberNo =3, seq 2
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 508
[    CSSD]     publicDataSize = 504
[    CSSD]----------
[    CSSD]----------
[    CSSD] type 2, Id 5, Name = (CRSDMAIN)
[    CSSD] flags: 0x1000
[    CSSD] grant: count=0, type 0, wait 0
[   CSSD] Member Count =1, master 3
[   CSSD]   . . . . .
[   CSSD]     memberNo =3, seq 2
[   CSSD]     flags = 0x0, granted 0
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 3, nodeBirth 3
[   CSSD]     privateDataSize = 128
[   CSSD]     publicDataSize = 128
[   CSSD]----------
[   CSSD]----------
[   CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)
[   CSSD] flags: 0x0
[   CSSD] grant: count=1, type 3, wait 1
[   CSSD] Member Count =1, master -3
[   CSSD]   . . . . .
[   CSSD]     memberNo =0, seq 0
[   CSSD]     flags = 0x12, granted 1
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 3, nodeBirth 3
[   CSSD]     privateDataSize = 0
[   CSSD]     publicDataSize = 0
[   CSSD]----------
[   CSSD]----------
[   CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)
[   CSSD] flags: 0x0
[   CSSD] grant: count=1, type 3, wait 1
[   CSSD] Member Count =1, master -3
[   CSSD]   . . . . .
[   CSSD]     memberNo =0, seq 0
[   CSSD]     flags = 0x12, granted 1
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 3, nodeBirth 3
[   CSSD]     privateDataSize = 0
[   CSSD]     publicDataSize = 0
[   CSSD]----------
[   CSSD]----------
[   CSSD] type 2, Id 3, Name = (ocr_crs)
[   CSSD] flags: 0x1000
[   CSSD] grant: count=0, type 0, wait 0
[   CSSD] Member Count =2, master 3
[   CSSD]   . . . . .
[   CSSD]     memberNo =3, seq 2
[   CSSD]     flags = 0x0, granted 0
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 3, nodeBirth 3
[   CSSD]     privateDataSize = 0
[   CSSD]     publicDataSize = 32
[   CSSD]   . . . . .
[   CSSD]     memberNo =2, seq 12
[   CSSD]     flags = 0x1000, granted 0
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 2, nodeBirth 2
[   CSSD]     privateDataSize = 0
[   CSSD]     publicDataSize = 32
[   CSSD]----------
[   CSSD]----------
[   CSSD] type 2, Id 1, Name = (#CSS_CLSSOMON)
[   CSSD] flags: 0x1000
[   CSSD] grant: count=0, type 0, wait 0
[   CSSD] Member Count =2, master 2
[   CSSD]   . . . . .
[   CSSD]     memberNo =2, seq 1
[   CSSD]     flags = 0x1000, granted 0
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 2, nodeBirth 2
[   CSSD]     privateDataSize = 0
[   CSSD]     publicDataSize = 0
[   CSSD]   . . . . .
[   CSSD]     memberNo =3, seq 2
[   CSSD]     flags = 0x1000, granted 0
[   CSSD]     refCnt = 1
[   CSSD]     nodeNum = 3, nodeBirth 3
[   CSSD]     privateDataSize = 0
[   CSSD]     publicDataSize = 0
[   CSSD]----------
[   CSSD]--- END OF GROCK STATE DUMP ---
[   CSSD]------- Begin Dump -------
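
The GROCK state dump above is regular enough to summarize mechanically. The following is a minimal Python sketch, assuming the line layout seen in this particular log (other versions may format the dump differently). `parse_grock_dump` is an illustrative name, and `SAMPLE` is an abridged excerpt copied from the dump above.

```python
import re

# Patterns for the three kinds of dump lines used in this sketch.
GROUP_RE = re.compile(r"type (\d+), Id (\d+), Name = \((.+?)\)")
COUNT_RE = re.compile(r"Member Count =\s*(\d+), master (-?\d+)")
NODE_RE = re.compile(r"nodeNum = (\d+), nodeBirth (\d+)")

# Abridged excerpt of the dump above (two groups only).
SAMPLE = """\
[    CSSD] type 2, Id 4, Name = (crs_version)
[    CSSD] flags: 0x1000
[    CSSD] grant: count=0, type 0, wait 0
[    CSSD] Member Count =2, master 2
[    CSSD]     memberNo =2, seq 2
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     memberNo =1, seq 12
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]----------
[    CSSD] type 3, Id 11, Name = (_ORA_CRS_FAILOVER)
[    CSSD] Member Count =1, master -3
[    CSSD]     memberNo =0, seq 0
[    CSSD]     nodeNum = 3, nodeBirth 3
"""


def parse_grock_dump(text):
    """Return one dict per group: name, type, id, member_count, master, nodes."""
    groups = []
    current = None
    for line in text.splitlines():
        header = GROUP_RE.search(line)
        if header:
            current = {
                "type": int(header.group(1)),
                "id": int(header.group(2)),
                "name": header.group(3),
                "member_count": None,
                "master": None,
                "nodes": [],
            }
            groups.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first group header
        count = COUNT_RE.search(line)
        if count:
            current["member_count"] = int(count.group(1))
            current["master"] = int(count.group(2))
            continue
        node = NODE_RE.search(line)
        if node:
            current["nodes"].append(int(node.group(1)))
    return groups


if __name__ == "__main__":
    for group in parse_grock_dump(SAMPLE):
        print(group["name"], group["member_count"], group["master"], group["nodes"])
```

A summary like this makes it easier to see which groups still list node 2 as a member at the moment node 3 aborts.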




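© 2011, www.oracledatabase12g.com. All rights reserved. This article may be reproduced, provided the source address is credited with a link;
otherwise legal action will be pursued.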

 
Parnassus data recovery manager for oracle database user guide v0.3
Parnassus data recovery manager for oracle database user guide v0.3Parnassus data recovery manager for oracle database user guide v0.3
Parnassus data recovery manager for oracle database user guide v0.3
 

  • 5. 为了保证以上的磁盘心跳和读取”kill block”的活动始终正常运作 CSS 要求保证至少(N/2+1)个 投票磁盘要被节点正常访问,这样就保证了每 2 个节点间总是至少有一个投票磁盘是它们都 可以正常访问的,在正常情况下(注意是风平浪静的正常情况)只要节点所能访问的在线 voting disk 多于无法访问的 voting disk,该节点都能幸福地活下去,当无法访问的 voting disk 多于正常的 voting disk 时,Cluster Communication Service 进程将失败并引起节点重启。所以 有一种说法认为 voting disk 只要有 2 个足以保证冗余度就可以了,没有必要有 3 个或以上 voting disk,这种说法是错误的。Oracle 推荐集群中至少要有 3 个 voting disks。 当实际的 NM Reconfiguration 集群重置情况发生时所有的 active 节点和正在加入集群的节点 都会参与到 reconfig 中,那些没有应答(ack)的节点都将不再被归入新的集群关系中。实际上 reconfig 重置包括多个阶段: 1.初始化阶段 — reconfig manager(由集群成员号最低的节点担任)向其他节点发送启动 reconfig 的信号 2.投票阶段 — 节点向 reconfig manager 发送该节点所了解的成员关系 3.脑裂检查阶段 — reconfig manager 检查是否脑裂 4.驱逐阶段 — reconfig manager 驱逐非成员节点 5.更新阶段 — reconfig manager 向成员节点发送权威成员关系信息 在脑裂检查阶段 Reconfig Manager 会找出那些没有 Network Heartbeat 而有 Disk Heartbeat 的 节点,并通过 Network Heartbeat(如果可能的话)和 Disk Heartbeat 的信息来计算所有竞争子集 群(subcluster)内的节点数目,并依据以下 2 种因素决定哪个子集群应当存活下去: 1. 拥有最多节点数目的子集群(Sub-cluster with largest number of Nodes) 2. 若子集群内数目相等则为拥有最低节点号的子集群(Sub-cluster with lowest node number),举例来说在一个 2 节点的 RAC 环境中总是 1 号节点会获胜。
  • 6. 在完成脑裂检查后进入驱逐阶段,被驱逐节点会收到发送给它们的驱逐信息(如果网络可用 的话),若无法发送信息则会通过写出驱逐通知到 voting disk 上的”kill block”来达到驱逐通知 的目的。同时还会等待被驱逐节点表示其已收到驱逐通知,这种表示可能是通过网络通信的 方式也可能是投票磁盘上的状态信息。 可以看到 Oracle CSS 中 Brain Split Check 时会尽可能地保证最大的一个子集群存活下来以保 证 RAC 系统具有最高的可用性,而并不如那位资深工程师所说的在 Cluster Reconfiguration 阶段会通过节点对投票磁盘的抢占来保证哪个节点存活下来。 以下为一个三节点 RAC 环境中的 2 个示例场景: 1.1 号节点网络失败,2,3 号节点形成子集群;2,3 节点通过 voting disk 向 1 号节点发起驱 逐: 以下为 1 号节点的 ocssd.log 日志: [ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 29.610 seconds [ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390 [ CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 3.280308e-268artbeat fatal, eviction in 29.150 seconds 对 2,3 号节点发起 misscount 计时 [ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850 [ CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1) [ CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 3.280308e-268artbeat fatal, eviction in 28.610 seconds [ CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 14.580 seconds [ CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 3.280308e-268artbeat fatal, eviction in 14.120 seconds [ CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 3.280308e-268artbeat fatal, eviction in 13.580 seconds [ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 5.560 seconds [ CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 
3.280308e-268artbeat fatal, eviction in 5.100 seconds [ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 4.560 seconds [ CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 4.100 seconds [ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 3.560 seconds [ CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 3.100 seconds [ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 2.560 seconds
  • 7. [ CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 2.100 seconds [ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 3.280308e-268artbeat fatal, eviction in 1.550 seconds [ CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 3.280308e-268artbeat fatal, eviction in 1.090 seconds 2 号节点的 ocssd.log 日志: [ CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 29.800 seconds [ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200 [ CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1) [ CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 8.910601e-269artbeat fatal, eviction in 28.790 seconds [ CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 14.800 seconds [ CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 8.910601e-269artbeat fatal, eviction in 13.800 seconds [ CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 5.800 seconds [ CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 4.800 seconds [ CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 3.800 seconds [ CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 2.800 seconds [ CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: 
clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 1.790 seconds [ CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 8.910601e-269artbeat fatal, eviction in 0.800 seconds [ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms [ CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10] clssnmHandleSyn 应答 3 号节点发送的同步信息 [ CSSD]2011-04-23 17:12:35.399 [5073104] >USER: NMEVENT_SUSPEND [00][00] [00][0e] 发生 Node Monitoring SUSPEND 事件 [ CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(10) 通过 clssnmSendVoteInfo 向 3 号节点发送投票信息 Vote mesg [ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new) [ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new) [ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil)) [ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster 确认 1 号节点离开了集群 cluster
  • 8. [ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new) [ CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE: clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new) [ CSSD]2011-04-23 17:12:35.415 [117574544] >USER: clssnmHandleUpdate: SYNC(10) from node(3) completed [ CSSD]2011-04-23 17:12:35.416 [117574544] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER [ CSSD]2011-04-23 17:12:35.416 [117574544] >USER: clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER [ CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms [ CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (10) [ CSSD]2011-04-23 17:12:35.416 [3021970320] >USER: NMEVENT_RECONFIG [00] [00][00][0c] [ CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2 [ CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9) [ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3 [ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2 [ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9) [ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2 [ CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9) [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3 [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock 
_ORA_CRS_MEMBER_vrh2 type 3 [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3 [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2 [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(9/9) [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2 [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9) [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 10 [ CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE: clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa [ CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE: clssgmPeerListener: connects done (2/2) [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2) [ CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status [ CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete [ CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes [ CSSD]CLSS-3001: local node number 2, master node number 2
  • 9. 完成 reconfiguration [ CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(10), with status(1) 以下为 3 号节点的 ocssd.log: [ CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50 1.867300e-268artbeat fatal, eviction in 29.220 seconds [ CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE: clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30780 [ CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1) [ CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75 1.867300e-268artbeat fatal, eviction in 14.220 seconds [ CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 5.220 seconds [ CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 4.220 seconds [ CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 3.210 seconds [ CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 2.220 seconds [ CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 1.220 seconds [ CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90 1.867300e-268artbeat fatal, eviction in 0.220 seconds [ CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0 [ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 10 [ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms [ CSSD]2011-04-23 17:13:18.326 
[3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11) [ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ALIVE [ CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ALIVE [ CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(10) [ CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2) [ CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms [ CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10] [ CSSD]2011-04-23 17:13:18.330 [8136912] >USER: NMEVENT_SUSPEND [00][00] [00][0e] [ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11) [ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(60010) state(5) [ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13) [ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE [ CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE [ CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(2) [ CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE: clssnmSendVoteInfo:
  • 10. node(3) syncSeqNo(10) [ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(13) 以上完成了 2-3 节点间的 Vote mesg 通信,这些信息包含 Node identifier,GM peer to peer listening endpoint 以及 View of cluster membership。 [ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info... 开始检测 voting disk 上的信息 [ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2 发现其他子集群,包含 1 号节点且 1 号节点为该子集群的 leader,为最小子集群;3 号与 2 号节点组成最大子 集群,2 号节点为 leader 节点 [ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Start [ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmEvict: Evicting node 1, vrh1, birth 9, death 10, impendingrcfg 1, stateflags 0x40d 发起对 1 号节点的驱逐 [ CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE: clssnmSendShutdown: req to node 1, kill time 443294 [ CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE: clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil)) [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmWaitOnEvictions: Start [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmWaitOnEvictions: node 1, vrh1, undead 1 [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150) clssnmCheckKillStatus 检查 1 号节点是否 down 了 [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (15) [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE [ CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE: clssnmSendUpdate: syncSeqNo(10) [ CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(2) [ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) 
(old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmDeactivateNode: node 1 (vrh1) left cluster
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE: clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: SYNC(10) from node(3) completed
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:13:18.342 [89033616] >USER: clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE: clssnmDoSyncUpdate: Sync 10 complete!
[ CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (10)
[ CSSD]2011-04-23 17:13:18.350 [3021970320] >USER: NMEVENT_RECONFIG [00] [00][00][0c]
[ CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[ CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE: clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)
[ CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[ CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[ CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[ CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[ CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[ CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[ CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh2 type 3
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(9/9)
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[ CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[ CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 10
[ CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE: clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[ CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10
[ CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE: clssgmPeerListener: connects done (2/2)
[ CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
Update phase.
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes
[ CSSD]CLSS-3001: local node number 3, master node number 2
[ CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(10), with status(1)

2. In the second scenario, node 1 has not joined the cluster and node 2's network fails. Because node 2 has the lower member number, it initiates the eviction of node 3 through the voting disk.

The following is the ocssd.log from node 2:

[ CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 29.890 seconds
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110
[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 28.890 seconds
[ CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 14.860 seconds
[ CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75 8.910601e-269artbeat fatal, eviction in 13.860 seconds
[ CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 5.840 seconds
[ CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 4.840 seconds
[ CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 3.840 seconds
[ CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 2.840 seconds
[ CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 1.840 seconds
[ CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90 8.910601e-269artbeat fatal, eviction in 0.830 seconds
[ CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 13
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:42:32.021 [5073104] >USER: NMEVENT_SUSPEND [00][00] [00][0c]
[ CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2

After checking the voting disk, node 2 finds that the sub-cluster led by node 3 is the "smaller" sub-cluster (both have size 1, but node 3's node number is higher than node 2's), so node 2's sub-cluster counts as the larger one.

[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Start
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d
[ CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE: clssnmSendShutdown: req to node 3, kill time 1643084

Node 2 starts the eviction of node 3 and sends it a shutdown request.

[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmWaitOnEvictions: node 3, vrh3, undead 1
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE: clssnmSendUpdate: syncSeqNo(13)
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmDeactivateNode: node 3 (vrh3) left cluster
[ CSSD]2011-04-23 17:42:32.024 [117574544] >USER: clssnmHandleUpdate: SYNC(13) from node(2) completed
[ CSSD]2011-04-23 17:42:32.024 [117574544] >USER: clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE: clssnmDoSyncUpdate: Sync 13 complete!
[ CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE: clssgmReconfigThread: started for reconfig (13)
[ CSSD]2011-04-23 17:42:32.024 [3021970320] >USER: NMEVENT_RECONFIG [00] [00][00][04]
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock crs_version type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_FAILOVER) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(EVMDMAIN) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(CRSDMAIN) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_vrh1) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_vrh3) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(ocr_crs) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)
[ CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 13
[ CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE: clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf
[ CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes
[ CSSD]CLSS-3001: local node number 2, master node number 2

The reconfiguration is complete.

[ CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE: clssgmReconfigThread: completed for reconfig(13), with status(1)
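
The countdown in node 2's warnings can be checked against the misscount value. The following is only a rough sketch: it assumes that `misstime` is reported in milliseconds (the later `misstime(60000)` line suggests this), and `implied_misscount_ms` is a hypothetical helper name. Adding the elapsed miss time to the remaining time in the first warning should give the 60-second misscount:

```python
import re

# Two lines copied from the node 2 ocssd.log above. The odd
# "8.910601e-269artbeat" text is reproduced exactly as logged.
LOG = (
    "[ CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: "
    "node vrh3 (3) at 50 8.910601e-269artbeat fatal, eviction in 29.890 seconds\n"
    "[ CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE: clssnmPollingThread: "
    "node vrh3 (3) is impending reconfig, flag 1037, misstime 30110\n"
)

EVICTION_RE = re.compile(r"eviction in (\d+\.\d+) seconds")
MISSTIME_RE = re.compile(r"misstime (\d+)")


def implied_misscount_ms(log_text: str) -> int:
    """Elapsed miss time plus remaining time before eviction, in ms.

    Assumption: misstime is in milliseconds and both values belong to
    the same polling pass (they share a timestamp in the log).
    """
    remaining_s = float(EVICTION_RE.search(log_text).group(1))
    misstime_ms = int(MISSTIME_RE.search(log_text).group(1))
    return misstime_ms + round(remaining_s * 1000)


print(implied_misscount_ms(LOG))  # 60000
```

The result matches the `misstime(60000)` value logged when node 2 terminates node 3, which is consistent with a misscount of 60 seconds on this cluster.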
The following is the ocssd.log from node 3:

[ CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50 1.867300e-268artbeat fatal, eviction in 29.360 seconds
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE: clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640
[ CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75 1.867300e-268artbeat fatal, eviction in 14.330 seconds
[ CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 5.310 seconds
[ CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 4.300 seconds
[ CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 3.300 seconds
[ CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 2.300 seconds
[ CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 1.300 seconds
[ CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90 1.867300e-268artbeat fatal, eviction in 0.300 seconds
[ CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE: clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmDoSyncUpdate: Initiating sync 13
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ALIVE
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmSendSync: syncSeqNo(13)
[ CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE: clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]
[ CSSD]2011-04-23 17:43:15.912 [8136912] >USER: NMEVENT_SUSPEND [00][00] [00][0c]
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmSetupAckWait: node(3) is ACTIVE
[ CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE: clssnmSendVoteInfo: node(3) syncSeqNo(13)
[ CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssnmCheckDskInfo: Aborting local node to avoid splitbrain.
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)

After reading the voting disk, node 3 finds the kill block and aborts itself to avoid split brain.
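
The two clssnmCheckDskInfo outputs show both nodes reaching the same verdict from opposite sides. Below is a minimal sketch of that decision rule as the logs and annotations describe it, assuming the larger sub-cluster survives and a tie goes to the sub-cluster whose leader has the lower node number. `SubCluster` and `survives` are hypothetical names used for illustration, not Oracle internals:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCluster:
    leader: int  # node number of the sub-cluster's leader
    size: int    # number of nodes in the sub-cluster


def survives(mine: SubCluster, other: SubCluster) -> bool:
    """Return True if `mine` should survive a split against `other`.

    Assumed rule: the larger sub-cluster wins; on equal size, the
    sub-cluster with the lower leader node number wins.
    """
    if mine.size != other.size:
        return mine.size > other.size
    return mine.leader < other.leader


# Values taken from the log line on node 3:
# "my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)"
node2 = SubCluster(leader=2, size=1)
node3 = SubCluster(leader=3, size=1)

print(survives(node2, node3))  # True: node 2 evicts node 3
print(survives(node3, node2))  # False: node 3 aborts itself
```

Because each side evaluates the same comparison on the same data read from the voting disk, exactly one sub-cluster concludes that it should survive.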
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: ###################################
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: clssscExit: CSSD aborting
[ CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR: ###################################
[ CSSD]--- DUMP GROCK STATE DB ---
[ CSSD]----------
[ CSSD] type 2, Id 4, Name = (crs_version)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 12
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 11, Name = (_ORA_CRS_FAILOVER)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 2, Name = (EVMDMAIN)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 1
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 5, Name = (CRSDMAIN)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =1, master 3
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 128
[ CSSD] publicDataSize = 128
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 3, Name = (ocr_crs)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 3
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 12
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 1, Name = (#CSS_CLSSOMON)
[ CSSD] flags: 0x1000
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 1
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 2
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =3, seq 2
[ CSSD] flags = 0x1000, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 3, nodeBirth 3
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]--- END OF GROCK STATE DUMP ---
[ CSSD]------- Begin Dump -------

© 2011, www.oracledatabase12g.com. All rights reserved. This article may be reproduced, provided that the source is credited with a link; otherwise legal action will be pursued.