
Two-node RHCS cluster on RHEL 5.4 reboots itself at the same time every 6 days


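I recently set up a new two-node RHEL 5.4 cluster. It passed testing after configuration and ran normally, but I later found that it reboots once every 6 days. According to the logs, the standby node first fences (reboots) the primary; once the primary is back up, it fences the standby; the cluster then returns to normal until the cycle repeats 6 days later.

In the most recent experiment I stopped the cman service on the standby. The primary still rebooted, and after it came back up the standby was fenced and rebooted, after which the machines returned to normal. Because cman was not running on the standby this time, the standby logged nothing before the primary rebooted.

I have already ruled out scheduled jobs on the hosts. Could anyone with experience deploying RHCS on RHEL 5.4 say whether they have run into this problem?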


Hosts: HP DL585 G7
Fence device agent: fence_ipmilan
《Solution》

Reply to #1 afareg


    HP machines have a dedicated iLO port; use that as the fence device.
《Solution》

The fence device does use the HP iLO port. The DL585 has iLO 3, so the agent type is fence_ipmilan. The configuration is as follows:

<?xml version="1.0"?>
<cluster alias="cccrcls" config_version="22" name="cccrcls">
        <quorumd device="/dev/mapper/cccrqvd" interval="1" label="qdsk" min_score="2" tko="10" votes="1">
                <heuristic interval="2" program="ping 192.144.133.254 -c1 -t1" score="2"/>
        </quorumd>
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="cccra" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="" name="cccra-ilo"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="cccrb" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="" name="cccrb-ilo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.144.133.101" login="postpas" name="cccra-ilo" passwd="paswww"/>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.144.133.102" login="postpas" name="cccrb-ilo" passwd="paswww"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="cccr" ordered="0" restricted="0">
                                <failoverdomainnode name="cccra" priority="1"/>
                                <failoverdomainnode name="cccrb" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="192.144.144.103" monitor_link="1"/>
                        <lvm lv_name="cccr_vg1_lv1" name="cccr_vg1_lv1" vg_name="cccr_vg1"/>
                        <fs device="/dev/cccr_vg1/cccr_vg1_lv1" force_fsck="0" force_unmount="1" fsid="4615" fstype="ext3" mountpoint="/data01" name="data01" options="rw" self_fence="0"/>
                        <script file="/etc/init.d/oracleha.sh" name="oraha"/>
                        <lvm lv_name="cccr_vg2_lv1" name="cccr_vg2_lv1" vg_name="cccr_vg2"/>
                        <fs device="/dev/cccr_vg2/cccr_vg2_lv1" force_fsck="0" force_unmount="1" fsid="17341" fstype="ext3" mountpoint="/data02" name="data02" options="rw" self_fence="0"/>
                </resources>
                <service autostart="1" domain="cccr" exclusive="1" name="cccrha" recovery="relocate">
                        <ip ref="192.144.144.103">
                                <lvm ref="cccr_vg1_lv1">
                                        <lvm ref="cccr_vg2_lv1">
                                                <fs ref="data01">
                                                        <fs ref="data02">
                                                                <script ref="oraha"/>
                                                        </fs>
                                                </fs>
                                        </lvm>
                                </lvm>
                        </ip>
                </service>
        </rm>
</cluster>
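
To double-check which iLO address each node is fenced through, and how long qdiskd waits before evicting a node, the posted configuration can be read with a short script. This is only a minimal sketch using Python's standard library: `CONF` is a trimmed copy of the config above (login and password attributes left out), and the function name `summarize` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf posted above; only the elements read below are kept.
CONF = """<?xml version="1.0"?>
<cluster alias="cccrcls" config_version="22" name="cccrcls">
  <quorumd device="/dev/mapper/cccrqvd" interval="1" label="qdsk" min_score="2" tko="10" votes="1"/>
  <clusternodes>
    <clusternode name="cccra" nodeid="1" votes="1">
      <fence><method name="1"><device lanplus="" name="cccra-ilo"/></method></fence>
    </clusternode>
    <clusternode name="cccrb" nodeid="2" votes="1">
      <fence><method name="1"><device lanplus="" name="cccrb-ilo"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.144.133.101" name="cccra-ilo"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.144.133.102" name="cccrb-ilo"/>
  </fencedevices>
</cluster>"""


def summarize(conf_xml):
    """Return ({node: [(device, agent, ipaddr), ...]}, qdisk eviction window in seconds)."""
    root = ET.fromstring(conf_xml)
    # Index the <fencedevice> definitions by name so node references can be resolved.
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    fencing = {}
    for node in root.iter("clusternode"):
        fencing[node.get("name")] = [
            (ref.get("name"),
             devices[ref.get("name")].get("agent"),
             devices[ref.get("name")].get("ipaddr"))
            for ref in node.iter("device")
        ]
    quorumd = root.find("quorumd")
    # qdiskd declares a node dead after it misses `tko` cycles of `interval` seconds each.
    window = int(quorumd.get("interval")) * int(quorumd.get("tko"))
    return fencing, window


fencing, window = summarize(CONF)
for node, refs in fencing.items():
    print(node, refs)
print("qdisk eviction window (s):", window)
```

The eviction window it reports can then be compared with the "Token Timeout (10000 ms)" value that openais prints at startup in the logs.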
《Solution》

Times of the most recent automatic reboots. Each interval is roughly 6 days minus 2 minutes.

messages.2:Sep  6  19:23:43
messages.3:Aug 31 19:25:32
messages.4:Aug 25 19:27:03
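
The interval can be checked directly from the three timestamps above. A small sketch, assuming the year is 2011 (syslog does not record the year; the guess comes from the package build dates in the logs, and it does not affect the result here):

```python
from datetime import datetime, timedelta

# syslog omits the year, so this value is an assumption.
ASSUMED_YEAR = 2011

# Boot timestamps copied from the rotated messages files, oldest first.
stamps = ["Aug 25 19:27:03", "Aug 31 19:25:32", "Sep  6 19:23:43"]


def parse(stamp):
    # split() collapses the double space syslog uses for single-digit days.
    month, day, clock = stamp.split()
    return datetime.strptime(f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def intervals(stamps):
    times = [parse(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


for gap in intervals(stamps):
    print(f"interval = {gap}, short of 6 days by {timedelta(days=6) - gap}")
```

Both gaps come out a little under 6 days (short by 1 min 31 s and 1 min 49 s), so the period is close to, but not exactly, a whole number of days.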
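《Solution》

Reply to #4 afareg


    Check whether the entries recorded in the logs match the conditions for a reboot.
《Solution》

The reboot order is the same every time: node 2 (cccrb) reboots first. Once node 2 has finished rebooting, node 1 (cccra) starts to reboot. Once node 1 has finished rebooting, the cluster returns to normal until the process repeats 6 days later.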

Node 1 log:
Aug 21 04:03:02 cccra syslogd 1.4.1: restart.
Aug 25 19:23:44 cccra openais: The token was lost in the OPERATIONAL state.
Aug 25 19:23:44 cccra openais: Receive multicast socket recv buffer size (288000 bytes).
Aug 25 19:23:44 cccra openais: Transmit multicast socket send buffer size (288000 bytes).
Aug 25 19:23:44 cccra openais: entering GATHER state from 2.
Aug 25 19:23:46 cccra openais: entering GATHER state from 0.
Aug 25 19:23:46 cccra openais: Creating commit token because I am the rep.
Aug 25 19:23:46 cccra openais: Saving state aru 2c high seq received 2c
Aug 25 19:23:46 cccra openais: Storing new sequence id for ring 11c
Aug 25 19:23:46 cccra openais: entering COMMIT state.
Aug 25 19:23:46 cccra openais: entering RECOVERY state.
Aug 25 19:23:46 cccra openais: position member 192.144.144.103:
Aug 25 19:23:46 cccra openais: previous ring seq 280 rep 192.144.144.103
Aug 25 19:23:46 cccra openais: aru 2c high delivered 2c received flag 1
Aug 25 19:23:46 cccra openais: Did not need to originate any messages in recovery.
Aug 25 19:23:46 cccra openais: Sending initial ORF token
Aug 25 19:23:46 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:23:46 cccra openais: New Configuration:
Aug 25 19:23:46 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:23:46 cccra openais: Members Left:
Aug 25 19:23:46 cccra kernel: dlm: closing connection to node 2
Aug 25 19:23:46 cccra openais:        r(0) ip(192.144.144.104)  
Aug 25 19:23:46 cccra openais: Members Joined:
Aug 25 19:23:46 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:23:46 cccra openais: New Configuration:
Aug 25 19:23:46 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:23:46 cccra openais: Members Left:
Aug 25 19:23:46 cccra openais: Members Joined:
Aug 25 19:23:46 cccra fenced: cccrb not a cluster member after 0 sec post_fail_delay
Aug 25 19:23:46 cccra openais: This node is within the primary component and will provide service.
Aug 25 19:23:46 cccra openais: entering OPERATIONAL state.
Aug 25 19:23:46 cccra fenced: fencing node "cccrb"
Aug 25 19:23:46 cccra openais: got nodejoin message 192.144.144.103
Aug 25 19:23:46 cccra openais: got joinlist message from node 1
Aug 25 19:23:49 cccra qdiskd: <info> Assuming master role
Aug 25 19:23:50 cccra qdiskd: <notice> Writing eviction notice for node 2
Aug 25 19:23:51 cccra qdiskd: <notice> Node 2 evicted
Aug 25 19:23:54 cccra fenced: fence "cccrb" success
Aug 25 19:23:55 cccra clurgmgrd: <notice> Taking over service service:cccrha from down member cccrb
Aug 25 19:23:57 cccra clurgmgrd: : <notice> Owner of cccr_vg1/cccr_vg1_lv1 is not in the cluster
Aug 25 19:23:57 cccra clurgmgrd: : <notice> Stealing cccr_vg1/cccr_vg1_lv1
Aug 25 19:23:57 cccra clurgmgrd: : <notice> Activating cccr_vg1/cccr_vg1_lv1
Aug 25 19:23:57 cccra clurgmgrd: : <notice> Making resilient : lvchange -ay cccr_vg1/cccr_vg1_lv1
Aug 25 19:23:57 cccra clurgmgrd: : <notice> Resilient command: lvchange -ay cccr_vg1/cccr_vg1_lv1 --config devices{filter=["a|/dev/dm-0|","a|/dev/dm-1|","r|.*|"]}
Aug 25 19:23:57 cccra multipathd: dm-3: add map (uevent)
Aug 25 19:23:57 cccra kernel: kjournald starting.  Commit interval 5 seconds
Aug 25 19:23:57 cccra kernel: EXT3 FS on dm-3, internal journal
Aug 25 19:23:57 cccra kernel: EXT3-fs: recovery complete.
Aug 25 19:23:57 cccra kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 25 19:24:18 cccra clurgmgrd: <notice> Service service:cccrha started
Aug 25 19:24:19 cccra clurgmgrd: : <notice> Getting status
Aug 25 19:27:15 cccra openais: entering GATHER state from 11.
Aug 25 19:27:15 cccra openais: Creating commit token because I am the rep.
Aug 25 19:27:15 cccra openais: Saving state aru 19 high seq received 19
Aug 25 19:27:15 cccra openais: Storing new sequence id for ring 120
Aug 25 19:27:15 cccra openais: entering COMMIT state.
Aug 25 19:27:15 cccra openais: entering RECOVERY state.
Aug 25 19:27:15 cccra openais: position member 192.144.144.103:
Aug 25 19:27:15 cccra openais: previous ring seq 284 rep 192.144.144.103
Aug 25 19:27:15 cccra openais: aru 19 high delivered 19 received flag 1
Aug 25 19:27:15 cccra openais: position member 192.144.144.104:
Aug 25 19:27:15 cccra openais: previous ring seq 284 rep 192.144.144.104
Aug 25 19:27:15 cccra openais: aru a high delivered a received flag 1
Aug 25 19:27:15 cccra openais: Did not need to originate any messages in recovery.
Aug 25 19:27:15 cccra openais: Sending initial ORF token
Aug 25 19:27:15 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:27:15 cccra openais: New Configuration:
Aug 25 19:27:15 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:27:15 cccra openais: Members Left:
Aug 25 19:27:15 cccra openais: Members Joined:
Aug 25 19:27:15 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:27:15 cccra openais: New Configuration:
Aug 25 19:27:15 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:27:15 cccra openais:        r(0) ip(192.144.144.104)  
Aug 25 19:27:15 cccra openais: Members Left:
Aug 25 19:27:15 cccra openais: Members Joined:
Aug 25 19:27:15 cccra openais:        r(0) ip(192.144.144.104)  
Aug 25 19:27:15 cccra openais: This node is within the primary component and will provide service.
Aug 25 19:27:15 cccra openais: entering OPERATIONAL state.
Aug 25 19:27:15 cccra openais: got nodejoin message 192.144.144.103
Aug 25 19:27:15 cccra openais: got nodejoin message 192.144.144.104
Aug 25 19:27:15 cccra openais: got joinlist message from node 1
Aug 25 19:27:21 cccra kernel: dlm: connecting to 2
Aug 25 19:27:21 cccra kernel: dlm: got connection from 2
Aug 25 19:32:18 cccra syslogd 1.4.1: restart.
Aug 25 19:32:18 cccra kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 25 19:32:18 cccra kernel: Linux version 2.6.18-164.el5 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 18 15:51:48 EDT 2009
Aug 25 19:32:18 cccra kernel: Command line: ro root=LABEL=/ rhgb quiet acpi=off
................. boot log omitted .............................

Aug 25 19:32:19 cccra ccsd: Starting ccsd 2.0.115:
Aug 25 19:32:19 cccra ccsd:  Built: Mar  6 2011 00:47:03
Aug 25 19:32:19 cccra ccsd:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Aug 25 19:32:19 cccra ccsd: cluster.conf (cluster name = cccrcls, version = 20) found.
Aug 25 19:32:19 cccra ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:32:19 cccra ccsd:  Local version # : 20
Aug 25 19:32:19 cccra ccsd:  Remote version #: 20
Aug 25 19:32:19 cccra ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:32:19 cccra ccsd:  Local version # : 20
Aug 25 19:32:19 cccra ccsd:  Remote version #: 20
Aug 25 19:32:19 cccra ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:32:19 cccra ccsd:  Local version # : 20
Aug 25 19:32:19 cccra ccsd:  Remote version #: 20
Aug 25 19:32:19 cccra ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:32:19 cccra ccsd:  Local version # : 20
Aug 25 19:32:19 cccra ccsd:  Remote version #: 20
Aug 25 19:32:19 cccra openais: AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Aug 25 19:32:19 cccra openais: Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Aug 25 19:32:19 cccra openais: Copyright (C) 2006 Red Hat, Inc.
Aug 25 19:32:19 cccra openais: AIS Executive Service: started and ready to provide service.
Aug 25 19:32:19 cccra openais: Using default multicast address of 239.192.50.201
Aug 25 19:32:19 cccra openais: Token Timeout (10000 ms) retransmit timeout (495 ms)
Aug 25 19:32:19 cccra openais: token hold (386 ms) retransmits before loss (20 retrans)
Aug 25 19:32:19 cccra openais: join (60 ms) send_join (0 ms) consensus (2000 ms) merge (200 ms)
Aug 25 19:32:19 cccra openais: downcheck (1000 ms) fail to recv const (2500 msgs)
Aug 25 19:32:19 cccra openais: seqno unchanged const (30 rotations) Maximum network MTU 1500
Aug 25 19:32:19 cccra openais: window size per rotation (50 messages) maximum messages per rotation (17 messages)
Aug 25 19:32:19 cccra openais: send threads (0 threads)
Aug 25 19:32:19 cccra openais: RRP token expired timeout (495 ms)
Aug 25 19:32:19 cccra openais: RRP token problem counter (2000 ms)
Aug 25 19:32:19 cccra openais: RRP threshold (10 problem count)
Aug 25 19:32:19 cccra openais: RRP mode set to none.
Aug 25 19:32:19 cccra openais: heartbeat_failures_allowed (0)
Aug 25 19:32:19 cccra openais: max_network_delay (50 ms)
Aug 25 19:32:19 cccra openais: HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 25 19:32:19 cccra openais: Receive multicast socket recv buffer size (288000 bytes).
Aug 25 19:32:19 cccra openais: Transmit multicast socket send buffer size (288000 bytes).
Aug 25 19:32:19 cccra openais: The network interface is now up.
Aug 25 19:32:19 cccra openais: Created or loaded sequence id 288.192.144.144.103 for this ring.
Aug 25 19:32:19 cccra openais: entering GATHER state from 15.
Aug 25 19:32:19 cccra openais: CMAN 2.0.115 (built Mar  6 2011 00:47:08) started
Aug 25 19:32:19 cccra openais: Service initialized 'openais CMAN membership service 2.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais extended virtual synchrony service'
Aug 25 19:32:19 cccra openais: Service initialized 'openais cluster membership service B.01.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais availability management framework B.01.01'

Aug 25 19:32:19 cccra openais: Service initialized 'openais checkpoint service B.01.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais event service B.01.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais distributed locking service B.01.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais message service B.01.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais configuration service'
Aug 25 19:32:19 cccra openais: Service initialized 'openais cluster closed process group service v1.01'
Aug 25 19:32:19 cccra openais: Service initialized 'openais cluster config database access v1.01'
Aug 25 19:32:19 cccra openais: Not using a virtual synchrony filter.
Aug 25 19:32:19 cccra openais: Creating commit token because I am the rep.
Aug 25 19:32:19 cccra openais: Saving state aru 0 high seq received 0
Aug 25 19:32:19 cccra openais: Storing new sequence id for ring 124
Aug 25 19:32:19 cccra openais: entering COMMIT state.
Aug 25 19:32:19 cccra openais: entering RECOVERY state.
Aug 25 19:32:19 cccra openais: position member 192.144.144.103:
Aug 25 19:32:19 cccra openais: previous ring seq 288 rep 192.144.144.103
Aug 25 19:32:19 cccra openais: aru 0 high delivered 0 received flag 1
Aug 25 19:32:19 cccra openais: Did not need to originate any messages in recovery.
Aug 25 19:32:19 cccra openais: Sending initial ORF token
Aug 25 19:32:19 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:32:19 cccra openais: New Configuration:
Aug 25 19:32:19 cccra openais: Members Left:
Aug 25 19:32:19 cccra openais: Members Joined:
Aug 25 19:32:19 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:32:19 cccra openais: New Configuration:
Aug 25 19:32:19 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:32:19 cccra openais: Members Left:
Aug 25 19:32:19 cccra openais: Members Joined:
Aug 25 19:32:19 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:32:19 cccra openais: This node is within the primary component and will provide service.
Aug 25 19:32:19 cccra openais: entering OPERATIONAL state.
Aug 25 19:32:19 cccra openais: got nodejoin message 192.144.144.103
Aug 25 19:32:19 cccra openais: entering GATHER state from 11.
Aug 25 19:32:19 cccra openais: Creating commit token because I am the rep.
Aug 25 19:32:19 cccra openais: Saving state aru a high seq received a
Aug 25 19:32:19 cccra openais: Storing new sequence id for ring 128
Aug 25 19:32:19 cccra openais: entering COMMIT state.
Aug 25 19:32:19 cccra openais: entering RECOVERY state.
Aug 25 19:32:19 cccra openais: position member 192.144.144.103:
Aug 25 19:32:19 cccra openais: previous ring seq 292 rep 192.144.144.103
Aug 25 19:32:19 cccra openais: aru a high delivered a received flag 1
Aug 25 19:32:19 cccra openais: position member 192.144.144.104:
Aug 25 19:32:19 cccra openais: previous ring seq 292 rep 192.144.144.104
Aug 25 19:32:19 cccra openais: aru 19 high delivered 19 received flag 1
Aug 25 19:32:19 cccra openais: Did not need to originate any messages in recovery.
Aug 25 19:32:19 cccra openais: Sending initial ORF token
Aug 25 19:32:19 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:32:19 cccra openais: New Configuration:
Aug 25 19:32:19 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:32:19 cccra openais: Members Left:
Aug 25 19:32:19 cccra openais: Members Joined:
Aug 25 19:32:19 cccra openais: CLM CONFIGURATION CHANGE
Aug 25 19:32:19 cccra openais: New Configuration:
Aug 25 19:32:19 cccra openais:        r(0) ip(192.144.144.103)  
Aug 25 19:32:19 cccra openais:        r(0) ip(192.144.144.104)  
Aug 25 19:32:19 cccra openais: Members Left:
Aug 25 19:32:19 cccra openais: Members Joined:
Aug 25 19:32:19 cccra openais:        r(0) ip(192.144.144.104)  
Aug 25 19:32:19 cccra openais: This node is within the primary component and will provide service.
Aug 25 19:32:19 cccra openais: entering OPERATIONAL state.
Aug 25 19:32:19 cccra openais: quorum regained, resuming activity
Aug 25 19:32:19 cccra openais: got nodejoin message 192.144.144.103
Aug 25 19:32:19 cccra openais: got nodejoin message 192.144.144.104
Aug 25 19:32:19 cccra openais: got joinlist message from node 2
Aug 25 19:32:20 cccra ccsd: Initial status:: Quorate
Aug 25 19:32:20 cccra qdiskd: <info> Quorum Partition: /dev/dm-2 Label: qdsk
Aug 25 19:32:20 cccra qdiskd: <info> Quorum Daemon Initializing
Aug 25 19:32:21 cccra qdiskd: <info> Heuristic: 'ping 192.144.144.254 -c1 -t1' UP
Aug 25 19:32:23 cccra snmpd: mibII/mta_sendmail.c:open_sendmailst: could not guess version of statistics file "/var/log/mail/statistics"
Aug 25 19:32:23 cccra gpm: *** info :
Aug 25 19:32:23 cccra gpm: Started gpm successfully. Entered daemon mode.
Aug 25 19:32:23 cccra xinetd: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Aug 25 19:32:23 cccra xinetd: Started working: 1 available service
Aug 25 19:32:23 cccra snmpd: NET-SNMP version 5.3.2.2
Aug 25 19:32:24 cccra qdiskd: <info> Node 2 is the master
Aug 25 19:32:24 cccra modclusterd: startup succeeded
Aug 25 19:32:24 cccra kernel: dlm: Using TCP for communications
Aug 25 19:32:24 cccra kernel: dlm: got connection from 2
Aug 25 19:32:24 cccra kernel: dlm: connecting to 2
Aug 25 19:32:07 cccra oddjobd: oddjobd startup succeeded
Aug 25 19:32:07 cccra saslauthd: detach_tty      : master pid is: 13411
Aug 25 19:32:07 cccra saslauthd: ipc_init        : listening on socket: /var/run/saslauthd/mux
Aug 25 19:32:07 cccra clurgmgrd: <notice> Resource Group Manager Starting
Aug 25 19:32:07 cccra ricci: startup succeeded
Aug 25 19:32:12 cccra qdiskd: <info> Initial score 2/2
Aug 25 19:32:12 cccra qdiskd: <info> Initialization complete
Aug 25 19:32:12 cccra openais: quorum device registered
Aug 25 19:32:12 cccra qdiskd: <notice> Score sufficient for master operation (2/2; required=2); upgrading
Aug 25 19:32:12 cccra clurgmgrd: : <err>   cccrb   owns cccr_vg1/cccr_vg1_lv1 unable to stop
Aug 25 19:32:12 cccra clurgmgrd: <notice> stop on lvm "cccr_vg1_lv1" returned 1 (generic error)
《Solution》

Node 2 log:
Aug 25 17:38:12 cccrb clurgmgrd: : <notice> Getting status
Aug 25 18:38:12 cccrb clurgmgrd: : <notice> Getting status
Aug 25 19:27:03 cccrb syslogd 1.4.1: restart.
Aug 25 19:27:03 cccrb kernel: klogd 1.4.1, log source = /proc/kmsg started.
Aug 25 19:27:03 cccrb kernel: Linux version 2.6.18-164.el5 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Tue Aug 18 15:51:48 EDT 2009
Aug 25 19:27:03 cccrb kernel: Command line: ro root=LABEL=/ rhgb quiet acpi=off
Aug 25 19:27:03 cccrb kernel: BIOS-provided physical RAM map:
Aug 25 19:27:03 cccrb kernel:  BIOS-e820: 0000000000010000 - 000000000009f400 (usable)
.............. boot log omitted ...........................

Aug 25 19:27:03 cccrb ccsd: Starting ccsd 2.0.115:
Aug 25 19:27:03 cccrb ccsd:  Built: Mar  6 2011 00:47:03
Aug 25 19:27:03 cccrb ccsd:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Aug 25 19:27:03 cccrb ccsd: cluster.conf (cluster name = cccrcls, version = 20) found.
Aug 25 19:27:03 cccrb ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:27:03 cccrb ccsd:  Local version # : 20
Aug 25 19:27:03 cccrb ccsd:  Remote version #: 20
Aug 25 19:27:03 cccrb ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:27:03 cccrb ccsd:  Local version # : 20
Aug 25 19:27:03 cccrb ccsd:  Remote version #: 20
Aug 25 19:27:03 cccrb ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:27:03 cccrb ccsd:  Local version # : 20
Aug 25 19:27:03 cccrb ccsd:  Remote version #: 20
Aug 25 19:27:03 cccrb ccsd: Remote copy of cluster.conf is from quorate node.
Aug 25 19:27:03 cccrb ccsd:  Local version # : 20
Aug 25 19:27:03 cccrb ccsd:  Remote version #: 20
Aug 25 19:27:03 cccrb openais: AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Aug 25 19:27:03 cccrb openais: Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Aug 25 19:27:03 cccrb openais: Copyright (C) 2006 Red Hat, Inc.
Aug 25 19:27:03 cccrb openais: AIS Executive Service: started and ready to provide service.
Aug 25 19:27:03 cccrb openais: Using default multicast address of 239.192.50.201
Aug 25 19:27:03 cccrb openais: Token Timeout (10000 ms) retransmit timeout (495 ms)
Aug 25 19:27:03 cccrb openais: token hold (386 ms) retransmits before loss (20 retrans)
Aug 25 19:27:03 cccrb openais: join (60 ms) send_join (0 ms) consensus (2000 ms) merge (200 ms)
Aug 25 19:27:03 cccrb openais: downcheck (1000 ms) fail to recv const (2500 msgs)
Aug 25 19:27:03 cccrb openais: seqno unchanged const (30 rotations) Maximum network MTU 1500
Aug 25 19:27:03 cccrb openais: window size per rotation (50 messages) maximum messages per rotation (17 messages)
Aug 25 19:27:03 cccrb openais: send threads (0 threads)
Aug 25 19:27:03 cccrb openais: RRP token expired timeout (495 ms)
Aug 25 19:27:03 cccrb openais: RRP token problem counter (2000 ms)
Aug 25 19:27:03 cccrb openais: RRP threshold (10 problem count)
Aug 25 19:27:03 cccrb openais: RRP mode set to none.
Aug 25 19:27:03 cccrb openais: heartbeat_failures_allowed (0)
Aug 25 19:27:03 cccrb openais: max_network_delay (50 ms)
Aug 25 19:27:03 cccrb openais: HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Aug 25 19:27:03 cccrb openais: Receive multicast socket recv buffer size (288000 bytes).
Aug 25 19:27:03 cccrb openais: Transmit multicast socket send buffer size (288000 bytes).
Aug 25 19:27:03 cccrb openais: The network interface is now up.
Aug 25 19:27:03 cccrb openais: Created or loaded sequence id 280.192.144.144.104 for this ring.
Aug 25 19:27:03 cccrb openais: entering GATHER state from 15.
Aug 25 19:27:03 cccrb openais: CMAN 2.0.115 (built Mar  6 2011 00:47:08) started
Aug 25 19:27:03 cccrb openais: Service initialized 'openais CMAN membership service 2.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais extended virtual synchrony service'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais cluster membership service B.01.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais availability management framework B.01.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais checkpoint service B.01.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais event service B.01.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais distributed locking service B.01.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais message service B.01.01'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais configuration service'
Aug 25 19:27:03 cccrb openais: Service initialized 'openais cluster closed process group service v1.01'

Aug 25 19:27:03 cccrb openais: Service initialized 'openais cluster config database access v1.01'
Aug 25 19:27:03 cccrb openais: Not using a virtual synchrony filter.
Aug 25 19:27:03 cccrb openais: Creating commit token because I am the rep.
Aug 25 19:27:03 cccrb openais: Saving state aru 0 high seq received 0
Aug 25 19:27:03 cccrb openais: Storing new sequence id for ring 11c
Aug 25 19:27:03 cccrb openais: entering COMMIT state.
Aug 25 19:27:03 cccrb openais: entering RECOVERY state.
Aug 25 19:27:03 cccrb openais: position member 192.144.144.104:
Aug 25 19:27:03 cccrb openais: previous ring seq 280 rep 192.144.144.104
Aug 25 19:27:03 cccrb openais: aru 0 high delivered 0 received flag 1
Aug 25 19:27:03 cccrb openais: Did not need to originate any messages in recovery.
Aug 25 19:27:03 cccrb openais: Sending initial ORF token
Aug 25 19:27:03 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:27:03 cccrb openais: New Configuration:
Aug 25 19:27:03 cccrb openais: Members Left:
Aug 25 19:27:03 cccrb openais: Members Joined:
Aug 25 19:27:03 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:27:03 cccrb openais: New Configuration:
Aug 25 19:27:03 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:27:03 cccrb openais: Members Left:
Aug 25 19:27:03 cccrb openais: Members Joined:
Aug 25 19:27:03 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:27:03 cccrb openais: This node is within the primary component and will provide service.
Aug 25 19:27:03 cccrb openais: entering OPERATIONAL state.
Aug 25 19:27:03 cccrb openais: got nodejoin message 192.144.144.104
Aug 25 19:27:04 cccrb openais: entering GATHER state from 11.
Aug 25 19:27:04 cccrb openais: Saving state aru a high seq received a
Aug 25 19:27:04 cccrb openais: Storing new sequence id for ring 120
Aug 25 19:27:04 cccrb openais: entering COMMIT state.
Aug 25 19:27:04 cccrb openais: entering RECOVERY state.
Aug 25 19:27:04 cccrb openais: position member 192.144.144.103:
Aug 25 19:27:04 cccrb openais: previous ring seq 284 rep 192.144.144.103
Aug 25 19:27:04 cccrb openais: aru 19 high delivered 19 received flag 1
Aug 25 19:27:04 cccrb openais: position member 192.144.144.104:
Aug 25 19:27:04 cccrb openais: previous ring seq 284 rep 192.144.144.104
Aug 25 19:27:04 cccrb openais: aru a high delivered a received flag 1
Aug 25 19:27:04 cccrb openais: Did not need to originate any messages in recovery.
Aug 25 19:27:04 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:27:04 cccrb openais: New Configuration:
Aug 25 19:27:04 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:27:04 cccrb openais: Members Left:
Aug 25 19:27:04 cccrb openais: Members Joined:
Aug 25 19:27:04 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:27:04 cccrb openais: New Configuration:
Aug 25 19:27:04 cccrb openais:         r(0) ip(192.144.144.103)  
Aug 25 19:27:04 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:27:04 cccrb openais: Members Left:
Aug 25 19:27:04 cccrb openais: Members Joined:
Aug 25 19:27:04 cccrb openais:         r(0) ip(192.144.144.103)  
Aug 25 19:27:04 cccrb openais: This node is within the primary component and will provide service.
Aug 25 19:27:04 cccrb openais: entering OPERATIONAL state.
Aug 25 19:27:04 cccrb openais: quorum regained, resuming activity
Aug 25 19:27:04 cccrb openais: got nodejoin message 192.144.144.103
Aug 25 19:27:04 cccrb openais: got nodejoin message 192.144.144.104
Aug 25 19:27:04 cccrb openais: got joinlist message from node 1
Aug 25 19:27:04 cccrb ccsd: Initial status:: Quorate
Aug 25 19:27:05 cccrb qdiskd: <info> Quorum Partition: /dev/dm-2 Label: qdsk
Aug 25 19:27:05 cccrb qdiskd: <info> Quorum Daemon Initializing
Aug 25 19:27:06 cccrb qdiskd: <info> Heuristic: 'ping 192.144.144.254 -c1 -t1' UP
Aug 25 19:27:08 cccrb snmpd: mibII/mta_sendmail.c:open_sendmailst: could not guess version of statistics file "/var/log/mail/statistics"
Aug 25 19:27:08 cccrb gpm: *** info :
Aug 25 19:27:08 cccrb gpm: Started gpm successfully. Entered daemon mode.
Aug 25 19:27:08 cccrb xinetd: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Aug 25 19:27:08 cccrb xinetd: Started working: 1 available service
Aug 25 19:27:08 cccrb snmpd: NET-SNMP version 5.3.2.2
Aug 25 19:27:09 cccrb qdiskd: <info> Node 1 is the master
Aug 25 19:27:09 cccrb modclusterd: startup succeeded
Aug 25 19:27:09 cccrb kernel: dlm: Using TCP for communications
Aug 25 19:27:09 cccrb kernel: dlm: connecting to 1
Aug 25 19:27:09 cccrb kernel: dlm: got connection from 1
Aug 25 19:27:10 cccrb oddjobd: oddjobd startup succeeded
Aug 25 19:27:10 cccrb saslauthd: detach_tty      : master pid is: 7213
Aug 25 19:27:10 cccrb saslauthd: ipc_init        : listening on socket: /var/run/saslauthd/mux
Aug 25 19:27:10 cccrb clurgmgrd: <notice> Resource Group Manager Starting
Aug 25 19:27:27 cccrb ricci: startup succeeded
Aug 25 19:27:32 cccrb qdiskd: <info> Initial score 2/2
Aug 25 19:27:32 cccrb qdiskd: <info> Initialization complete
Aug 25 19:27:32 cccrb openais: quorum device registered
Aug 25 19:27:32 cccrb qdiskd: <notice> Score sufficient for master operation (2/2; required=2); upgrading
Aug 25 19:27:32 cccrb clurgmgrd: : <err>   cccra   owns cccr_vg1/cccr_vg1_lv1 unable to stop
Aug 25 19:27:32 cccrb clurgmgrd: <notice> stop on lvm "cccr_vg1_lv1" returned 1 (generic error)
Aug 25 19:28:22 cccrb openais: The token was lost in the OPERATIONAL state.
Aug 25 19:28:22 cccrb openais: Receive multicast socket recv buffer size (288000 bytes).
Aug 25 19:28:22 cccrb openais: Transmit multicast socket send buffer size (288000 bytes).
Aug 25 19:28:22 cccrb openais: entering GATHER state from 2.
Aug 25 19:28:24 cccrb openais: entering GATHER state from 0.
Aug 25 19:28:24 cccrb openais: Creating commit token because I am the rep.
Aug 25 19:28:24 cccrb openais: Saving state aru 2c high seq received 2c
Aug 25 19:28:24 cccrb openais: Storing new sequence id for ring 124
Aug 25 19:28:24 cccrb openais: entering COMMIT state.
Aug 25 19:28:24 cccrb openais: entering RECOVERY state.
Aug 25 19:28:24 cccrb openais: position member 192.144.144.104:
Aug 25 19:28:24 cccrb openais: previous ring seq 288 rep 192.144.144.103
Aug 25 19:28:24 cccrb openais: aru 2c high delivered 2c received flag 1
Aug 25 19:28:24 cccrb openais: Did not need to originate any messages in recovery.
Aug 25 19:28:24 cccrb openais: Sending initial ORF token
Aug 25 19:28:24 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:28:24 cccrb openais: New Configuration:
Aug 25 19:28:24 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:28:24 cccrb openais: Members Left:
Aug 25 19:28:24 cccrb openais:         r(0) ip(192.144.144.103)  
Aug 25 19:28:24 cccrb kernel: dlm: closing connection to node 1
Aug 25 19:28:24 cccrb openais: Members Joined:
Aug 25 19:28:24 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:28:24 cccrb openais: New Configuration:
Aug 25 19:28:24 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:28:24 cccrb openais: Members Left:
Aug 25 19:28:24 cccrb fenced: cccra not a cluster member after 0 sec post_fail_delay
Aug 25 19:28:24 cccrb openais: Members Joined:
Aug 25 19:28:24 cccrb fenced: fencing node "cccra"
Aug 25 19:28:24 cccrb openais: This node is within the primary component and will provide service.
Aug 25 19:28:24 cccrb openais: entering OPERATIONAL state.
Aug 25 19:28:24 cccrb openais: got nodejoin message 192.144.144.104
Aug 25 19:28:24 cccrb openais: got joinlist message from node 2
Aug 25 19:28:27 cccrb qdiskd: <info> Assuming master role
Aug 25 19:28:28 cccrb qdiskd: <notice> Writing eviction notice for node 1
Aug 25 19:28:29 cccrb qdiskd: <notice> Node 1 evicted
Aug 25 19:28:32 cccrb fenced: fence "cccra" success
Aug 25 19:28:33 cccrb clurgmgrd: <notice> Taking over service service:cccrha from down member cccra
Aug 25 19:28:34 cccrb clurgmgrd: : <notice> Owner of cccr_vg1/cccr_vg1_lv1 is not in the cluster
Aug 25 19:28:34 cccrb clurgmgrd: : <notice> Stealing cccr_vg1/cccr_vg1_lv1
Aug 25 19:28:34 cccrb clurgmgrd: : <notice> Activating cccr_vg1/cccr_vg1_lv1
Aug 25 19:28:34 cccrb clurgmgrd: : <notice> Making resilient : lvchange -ay cccr_vg1/cccr_vg1_lv1
Aug 25 19:28:34 cccrb clurgmgrd: : <notice> Resilient command: lvchange -ay cccr_vg1/cccr_vg1_lv1 --config devices{filter=["a|/dev/dm-0|","a|/dev/dm-1|","r|.*|"]}
Aug 25 19:28:35 cccrb multipathd: dm-4: add map (uevent)
Aug 25 19:28:35 cccrb kernel: kjournald starting.  Commit interval 5 seconds
Aug 25 19:28:35 cccrb kernel: EXT3 FS on dm-4, internal journal
Aug 25 19:28:35 cccrb kernel: EXT3-fs: recovery complete.
Aug 25 19:28:35 cccrb kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 25 19:28:55 cccrb clurgmgrd: <notice> Service service:cccrha started
Aug 25 19:29:03 cccrb clurgmgrd: : <notice> Getting status
Aug 25 19:32:00 cccrb openais: entering GATHER state from 11.
Aug 25 19:32:00 cccrb openais: Saving state aru 19 high seq received 19
Aug 25 19:32:00 cccrb openais: Storing new sequence id for ring 128
Aug 25 19:32:00 cccrb openais: entering COMMIT state.
Aug 25 19:32:00 cccrb openais: entering RECOVERY state.
Aug 25 19:32:00 cccrb openais: position member 192.144.144.103:
Aug 25 19:32:00 cccrb openais: previous ring seq 292 rep 192.144.144.103
Aug 25 19:32:00 cccrb openais: aru a high delivered a received flag 1
Aug 25 19:32:00 cccrb openais: position member 192.144.144.104:
Aug 25 19:32:00 cccrb openais: previous ring seq 292 rep 192.144.144.104
Aug 25 19:32:00 cccrb openais: aru 19 high delivered 19 received flag 1
Aug 25 19:32:00 cccrb openais: Did not need to originate any messages in recovery.
Aug 25 19:32:00 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:32:00 cccrb openais: New Configuration:
Aug 25 19:32:00 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:32:00 cccrb openais: Members Left:
Aug 25 19:32:00 cccrb openais: Members Joined:
Aug 25 19:32:00 cccrb openais: CLM CONFIGURATION CHANGE
Aug 25 19:32:00 cccrb openais: New Configuration:
Aug 25 19:32:00 cccrb openais:         r(0) ip(192.144.144.103)  
Aug 25 19:32:00 cccrb openais:         r(0) ip(192.144.144.104)  
Aug 25 19:32:00 cccrb openais: Members Left:
Aug 25 19:32:00 cccrb openais: Members Joined:
Aug 25 19:32:00 cccrb openais:         r(0) ip(192.144.144.103)  
Aug 25 19:32:00 cccrb openais: This node is within the primary component and will provide service.
Aug 25 19:32:00 cccrb openais: entering OPERATIONAL state.
Aug 25 19:32:00 cccrb openais: got nodejoin message 192.144.144.103
Aug 25 19:32:00 cccrb openais: got nodejoin message 192.144.144.104
Aug 25 19:32:00 cccrb openais: got joinlist message from node 2
Aug 25 19:32:06 cccrb kernel: dlm: connecting to 1
Aug 25 19:32:06 cccrb kernel: dlm: got connection from 1
Aug 25 20:29:03 cccrb clurgmgrd: : <notice> Getting status
Aug 25 21:29:03 cccrb clurgmgrd: : <notice> Getting status
Aug 25 22:29:03 cccrb clurgmgrd: : <notice> Getting status
Aug 25 23:29:03 cccrb clurgmgrd: : <notice> Getting status
Aug 26 00:29:13 cccrb clurgmgrd: : <notice> Getting status  
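
With logs this long it helps to pull out only the lines that mark one fence cycle (token loss, fencing, fence success, service takeover, syslogd restart) and merge both nodes into a single timeline. A rough sketch: the marker strings are copied from the excerpts above, the year is assumed (syslog omits it), and the two nodes' clocks are assumed to be close enough to sort by, which the logs do not guarantee.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2011  # syslog lines carry no year; this value is an assumption

# Substrings from the log excerpts above that mark the steps of one fence cycle.
KEY_EVENTS = (
    "The token was lost",
    "fencing node",
    'fenced: fence "',
    "Taking over service",
    "syslogd 1.4.1: restart",
)

# "<Mon> <day> <HH:MM:SS> <host> <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")


def key_events(lines):
    """Return (time, host, message) for the marker lines, sorted by time."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, host, message = m.groups()
        if any(marker in message for marker in KEY_EVENTS):
            month, day, clock = stamp.split()
            when = datetime.strptime(
                f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
            events.append((when, host, message))
    return sorted(events, key=lambda e: e[0])


# A few lines copied from both node logs, plus one unrelated line as noise.
SAMPLE = """\
Aug 25 19:23:44 cccra openais: The token was lost in the OPERATIONAL state.
Aug 25 19:23:46 cccra fenced: fencing node "cccrb"
Aug 25 19:23:54 cccra fenced: fence "cccrb" success
Aug 25 19:32:18 cccra syslogd 1.4.1: restart.
Aug 25 19:27:03 cccrb syslogd 1.4.1: restart.
Aug 25 19:27:08 cccrb gpm: Started gpm successfully. Entered daemon mode.
Aug 25 19:28:22 cccrb openais: The token was lost in the OPERATIONAL state.
Aug 25 19:28:24 cccrb fenced: fencing node "cccra"
Aug 25 19:28:32 cccrb fenced: fence "cccra" success
""".splitlines()

for when, host, message in key_events(SAMPLE):
    print(when.strftime("%b %d %H:%M:%S"), host, message)
```

On a real system the same function could be fed the contents of /var/log/messages from each node.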


http://coctec.com/docs/service/show-post-4883.html