Question: cman fails to start when setting up a cluster

火星人 @ 2014-03-04 , reply:0

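Problem: running the fence_ilo command by hand reports success,
but the service will not start when I use cman.
Thanks!
Environment: Linux AS 5
Hardware connections:
Both servers connect to the switch through eth0; the iLO ports connect directly to the switch.
The disk array is connected directly to the servers.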

/etc/hosts is as follows:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain        localhost
10.229.208.134  test-1
10.229.208.136  test-2
10.229.208.138  test-cluter
10.229.208.135  ilo1
10.229.208.137  ilo2

/etc/cluster/cluster.conf is as follows:
<?xml version="1.0" ?>
<cluster config_version="3" name="new_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="test-1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="Fence-1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="test-2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="Fence-2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="10.229.208.135" login="Administrator" name="Fence-1" passwd="C7N7NS4B"/>
                <fencedevice agent="fence_ilo" hostname="10.229.208.137" login="Administrator" name="Fence-2" passwd="CFEFHGP3"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="Failover_sybase" ordered="1" restricted="1">
                                <failoverdomainnode name="test-1" priority="1"/>
                                <failoverdomainnode name="test-2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="10.229.208.138" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="Failover_sybase" name="Services_sybase">
                        <ip ref="10.229.208.138"/>
                </service>
        </rm>
</cluster>
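
As a quick sanity check on the two files above, here is a minimal Python sketch (not part of the original post). It confirms that every clusternode name appears in the hosts file and that the two_node settings are consistent with each other. The embedded samples are abridged copies of the files above, and the function names parse_hosts and check_nodes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copies of the hosts file and cluster.conf shown above.
HOSTS_TEXT = """\
127.0.0.1        localhost.localdomain        localhost
10.229.208.134  test-1
10.229.208.136  test-2
"""

CLUSTER_XML = """\
<cluster config_version="3" name="new_cluster">
  <clusternodes>
    <clusternode name="test-1" nodeid="1" votes="1"/>
    <clusternode name="test-2" nodeid="2" votes="1"/>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
</cluster>
"""


def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)
    return table


def check_nodes(cluster_xml, hosts_text):
    """Return a list of human-readable problems (empty if none were found)."""
    root = ET.fromstring(cluster_xml)
    hosts = parse_hosts(hosts_text)
    nodes = [n.get("name") for n in root.iter("clusternode")]
    problems = [f"node {n!r} not in hosts" for n in nodes if n not in hosts]
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append("two_node=1 but the cluster does not have exactly 2 nodes")
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 requires expected_votes=1")
    return problems


print(check_nodes(CLUSTER_XML, HOSTS_TEXT))
```

For the files in this post the list comes back empty, so name resolution and the two_node settings are probably not the cause of the failure.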

Log from /var/log/messages when running service cman start:
Jan 12 03:35:59 test-2 openais: Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jan 12 03:35:59 test-2 openais: Copyright (C) 2006 Red Hat, Inc.
Jan 12 03:35:59 test-2 openais: AIS Executive Service: started and ready to provide service.
Jan 12 03:35:59 test-2 openais: Using default multicast address of 239.192.187.112
Jan 12 03:35:59 test-2 openais: openais component openais_cpg loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais cluster closed process group service v1.01'
Jan 12 03:35:59 test-2 openais: openais component openais_cfg loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais configuration service'
Jan 12 03:35:59 test-2 openais: openais component openais_msg loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais message service B.01.01'
Jan 12 03:35:59 test-2 openais: openais component openais_lck loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais distributed locking service B.01.01'
Jan 12 03:35:59 test-2 openais: openais component openais_evt loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais event service B.01.01'
Jan 12 03:35:59 test-2 openais: openais component openais_ckpt loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais checkpoint service B.01.01'
Jan 12 03:35:59 test-2 openais: openais component openais_amf loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais availability management framework B.01.01'
Jan 12 03:35:59 test-2 openais: openais component openais_clm loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais cluster membership service B.01.01'
Jan 12 03:35:59 test-2 openais: openais component openais_evs loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais extended virtual synchrony service'
Jan 12 03:35:59 test-2 openais: openais component openais_cman loaded.
Jan 12 03:35:59 test-2 openais: Registering service handler 'openais CMAN membership service 2.01'
Jan 12 03:35:59 test-2 openais: Token Timeout (10000 ms) retransmit timeout (495 ms)
Jan 12 03:35:59 test-2 openais: token hold (386 ms) retransmits before loss (20 retrans)
Jan 12 03:35:59 test-2 openais: join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jan 12 03:35:59 test-2 openais: downcheck (1000 ms) fail to recv const (50 msgs)
Jan 12 03:35:59 test-2 openais: seqno unchanged const (30 rotations) Maximum network MTU 1500
Jan 12 03:35:59 test-2 openais: window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jan 12 03:35:59 test-2 openais: send threads (0 threads)
Jan 12 03:35:59 test-2 openais: RRP token expired timeout (495 ms)
Jan 12 03:35:59 test-2 openais: RRP token problem counter (2000 ms)
Jan 12 03:35:59 test-2 openais: RRP threshold (10 problem count)
Jan 12 03:35:59 test-2 openais: RRP mode set to none.
Jan 12 03:35:59 test-2 openais: heartbeat_failures_allowed (0)
Jan 12 03:35:59 test-2 openais: max_network_delay (50 ms)
Jan 12 03:35:59 test-2 openais: HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jan 12 03:35:59 test-2 openais: Receive multicast socket recv buffer size (262142 bytes).
Jan 12 03:35:59 test-2 openais: Transmit multicast socket send buffer size (262142 bytes).
Jan 12 03:35:59 test-2 openais: The network interface is now up.
Jan 12 03:35:59 test-2 openais: Created or loaded sequence id 0.10.229.208.136 for this ring.
Jan 12 03:35:59 test-2 openais: entering GATHER state from 15.
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais extended virtual synchrony service'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais cluster membership service B.01.01'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais availability management framework B.01.01'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais checkpoint service B.01.01'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais event service B.01.01'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais distributed locking service B.01.01'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais message service B.01.01'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais configuration service'
Jan 12 03:35:59 test-2 openais: Initialising service handler 'openais cluster closed process group service v1.01'
Jan 12 03:36:00 test-2 openais: Initialising service handler 'openais CMAN membership service 2.01'
Jan 12 03:36:00 test-2 openais: CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jan 12 03:36:00 test-2 openais: Not using a virtual synchrony filter.
Jan 12 03:36:00 test-2 openais: Creating commit token because I am the rep.
Jan 12 03:36:00 test-2 openais: Saving state aru 0 high seq received 0
Jan 12 03:36:00 test-2 openais: entering COMMIT state.
Jan 12 03:36:00 test-2 openais: entering RECOVERY state.
Jan 12 03:36:00 test-2 openais: position member 10.229.208.136:
Jan 12 03:36:00 test-2 openais: previous ring seq 0 rep 10.229.208.136
Jan 12 03:36:00 test-2 openais: aru 0 high delivered 0 received flag 0
Jan 12 03:36:00 test-2 openais: Did not need to originate any messages in recovery.
Jan 12 03:36:00 test-2 openais: Couldn't store new ring id 4 to stable storage (Permission denied)
Jan 12 03:36:01 test-2 setroubleshoot:      SELinux is preventing the /usr/sbin/aisexec from using potentially mislabeled files (tmp).      For complete SELinux messages. run sealert -l 3ee1d4bd-50a6-4093-ab44-c17869fa8d36
Jan 12 03:36:07 test-2 ccsd: Unable to connect to cluster infrastructure after 30 seconds.
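
Most of the log above is routine startup output. The last three lines appear to be where things go wrong. A minimal Python sketch for pulling such lines out of a long log could look like this; the pattern list is an assumption based only on the messages seen in this post, not a complete catalogue of cman or openais errors.

```python
import re

# Patterns taken from the log above; the labels are made up for illustration.
PATTERNS = {
    "ring id not stored": re.compile(r"Couldn't store new ring id .*\(Permission denied\)"),
    "selinux denial": re.compile(r"SELinux is preventing"),
    "ccsd timeout": re.compile(r"Unable to connect to cluster infrastructure"),
}


def scan(lines):
    """Yield (label, line) for each log line that matches a known pattern."""
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                yield label, line.rstrip("\n")


# A few lines copied from the log above (the SELinux line is shortened).
SAMPLE = [
    "Jan 12 03:36:00 test-2 openais: Did not need to originate any messages in recovery.",
    "Jan 12 03:36:00 test-2 openais: Couldn't store new ring id 4 to stable storage (Permission denied)",
    "Jan 12 03:36:01 test-2 setroubleshoot:      SELinux is preventing the /usr/sbin/aisexec from using potentially mislabeled files (tmp).",
    "Jan 12 03:36:07 test-2 ccsd: Unable to connect to cluster infrastructure after 30 seconds.",
]

for label, line in scan(SAMPLE):
    print(f"{label}: {line}")
```

To run it against the real file, pass an open file object, for example scan(open("/var/log/messages")).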
《Solution》

<fence_daemon post_fail_delay="0" post_join_delay="3"/>
to
<fence_daemon post_fail_delay="0" post_join_delay="30"/>
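
If you would rather script this change than edit the file by hand, a minimal Python sketch is shown below. Note that it also increments config_version. The reply does not mention that step; it is included here on the assumption that the edited file will later be propagated to the other node, which normally requires a higher version number. The function name set_post_join_delay is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_post_join_delay(cluster_xml, seconds):
    """Return cluster.conf text with post_join_delay changed and config_version incremented."""
    root = ET.fromstring(cluster_xml)
    daemon = root.find("fence_daemon")
    if daemon is None:
        raise ValueError("no <fence_daemon> element found")
    daemon.set("post_join_delay", str(seconds))
    # Assumption: bump the version so the cluster tools treat the file as newer.
    root.set("config_version", str(int(root.get("config_version")) + 1))
    return ET.tostring(root, encoding="unicode")


# Abridged copy of the configuration from the question.
BEFORE = """\
<cluster config_version="3" name="new_cluster">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
</cluster>
"""

print(set_post_join_delay(BEFORE, 30))
```

The sketch works on a string, so nothing under /etc/cluster is touched until you write the result back yourself.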

Make sure SELinux is disabled and that you are running a non-Xen kernel.
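
Both conditions can be checked from a script. The following is a rough Python sketch under two assumptions: that the SELinux enforce flag is exposed at /selinux/enforce (older systems) or /sys/fs/selinux/enforce (newer ones), and that a Xen kernel can be recognized by "xen" in its release string, for example 2.6.18-8.el5xen. Both function names are made up for illustration.

```python
import platform
from pathlib import Path

# Assumed locations of the SELinux enforce flag; the file holds "1" when enforcing
# and "0" when permissive.
ENFORCE_PATHS = (Path("/selinux/enforce"), Path("/sys/fs/selinux/enforce"))


def selinux_enforcing(paths=ENFORCE_PATHS):
    """Return True or False if an enforce flag is readable, or None if none is found."""
    for path in paths:
        try:
            return path.read_text().strip() == "1"
        except OSError:
            continue  # flag not present at this location, try the next one
    return None


def is_xen_kernel(release):
    """Heuristic: Xen kernel release strings contain 'xen'."""
    return "xen" in release.lower()


print("SELinux enforcing:", selinux_enforcing())
print("Xen kernel:", is_xen_kernel(platform.release()))
```

A result of None from selinux_enforcing means no flag file was found, which is what you would expect when SELinux is disabled entirely.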
Also, there is a problem with this part of the configuration:
                        <fence>
                                <method name="1">
                                        <device name="Fence-1"/>
                                </method>
                        </fence>
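
The reply does not say what exactly is wrong with this block. One thing that can be verified mechanically is whether each device name used under a clusternode is actually defined under fencedevices. The Python sketch below does only that and is not a full validation of cluster.conf. The sample is an abridged copy of the configuration from the question (login and passwd omitted), and the function name undefined_fence_devices is hypothetical.

```python
import xml.etree.ElementTree as ET


def undefined_fence_devices(cluster_xml):
    """Return {node name: [device names not defined under <fencedevices>]}."""
    root = ET.fromstring(cluster_xml)
    defined = {d.get("name") for d in root.iter("fencedevice")}
    missing = {}
    for node in root.iter("clusternode"):
        used = [d.get("name") for d in node.iter("device")]
        bad = [name for name in used if name not in defined]
        if bad:
            missing[node.get("name")] = bad
    return missing


SAMPLE = """\
<cluster config_version="3" name="new_cluster">
  <clusternodes>
    <clusternode name="test-1" nodeid="1" votes="1">
      <fence><method name="1"><device name="Fence-1"/></method></fence>
    </clusternode>
    <clusternode name="test-2" nodeid="2" votes="1">
      <fence><method name="1"><device name="Fence-2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="10.229.208.135" name="Fence-1"/>
    <fencedevice agent="fence_ilo" hostname="10.229.208.137" name="Fence-2"/>
  </fencedevices>
</cluster>
"""

print(undefined_fence_devices(SAMPLE))
```

For the configuration in the question this prints an empty dictionary, meaning the device references themselves resolve correctly.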
《Solution》

Reply to post #2 by jerrywjl

Thank you very much!
I just got home; I will go in and try it first thing tomorrow morning.
《Solution》

I carefully compared my file against several other cluster.conf files that work, and I cannot see any real difference from mine...




http://www.coctec.com/docs/service/show-post-6735.html