Project stuck on RHCS... asking jerrywjl for some guidance
The cluster nodes come online, but clustat shows no resources, and the resources cannot be started.
Details are as follows:
# uname -a
Linux udbapp1 2.6.18-53.el5xen #1 SMP Wed Oct 10 16:48:44 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
119.87.244.70 udbapp1.local udbapp1
119.87.244.69 udbapp2.local udbapp2
#119.87.244.70 udbapp1
#119.87.244.69 udbapp2
#
-------------------------------------------
# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
119.87.244.70 udbapp1.local udbapp1
119.87.244.69 udbapp2.local udbapp2
#119.87.244.70 udbapp1
#119.87.244.69 udbapp2
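The above is the /etc/hosts file. Fencing uses HP iLO. The hardware is connected as follows:
On each HP host the two network ports are bonded into bond0, with IPs 119.87.244.70 and 119.87.244.69; the iLO addresses are 119.87.244.71 and 119.87.244.72. The network ports and the iLO ports are all connected to the switch and can all ping each other.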
# fence_ilo -a 119.87.244.71 -l redhat -p redhat123456 -o status
power is ON
success
-------------------
# fence_ilo -a 119.87.244.72 -l redhat -p redhat123456 -o status
power is ON
success
The /etc/cluster/cluster.conf file is as follows:
<?xml version="1.0" ?>
<cluster config_version="1" name="cluster_2">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="udbapp1" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="fence_1"/>
</method>
</fence>
</clusternode>
<clusternode name="udbapp2" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="fence_2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1">
<multicast addr="224.0.0.1"/>
</cman>
<fencedevices>
<fencedevice agent="fence_ilo" hostname="119.87.244.71" login="redhat" name="fence_1" passwd="redhat123456"/>
<fencedevice agent="fence_ilo" hostname="119.87.244.72" login="redhat" name="fence_2" passwd="redhat123456"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="udbapp" ordered="1" restricted="0"/>
<failoverdomainnode name="udbapp1" priority="1"/>
</failoverdomains>
<resources>
<ip address="119.87.244.73" monitor_link="1"/>
</resources>
<service autostart="1" domain="udbapp" name="apache" recovery="relocate">
<ip ref="119.87.244.73"/>
<script file="/etc/init.d/httpd" name="httpd"/>
</service>
</rm>
</cluster>
Running service cman start on both nodes at the same time:
# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... done
[  OK  ]
Output of clustat:
# clustat
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
udbapp1 1 Online, Local
udbapp2 2 Online
It shows that no resources have started.
# clusvcadm -e apache -m udbapp1
Member udbapp1 trying to enable service:apache...Success
service:apache is now running on udbapp1
# ps -ef|grep httpd
root 32690 30413 0 09:08 pts/1 00:00:00 grep httpd
The resource still has not started...
# service rgmanager start
Starting Cluster Service Manager: [  OK  ]
# service rgmanager status
clurgmgrd dead but pid file exists
I started rgmanager, only to find that the process had died...
Below are the messages from the startup process:
# tail -f /var/log/messages
Nov 30 09:11:36 udbapp2 ccsd: Starting ccsd 2.0.60:
Nov 30 09:11:36 udbapp2 ccsd: Built: Jan 23 2007 12:42:13
Nov 30 09:11:36 udbapp2 ccsd: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Nov 30 09:11:36 udbapp2 ccsd: cluster.conf (cluster name = cluster_2, version = 1) found.
Nov 30 09:11:39 udbapp2 openais: AIS Executive Service RELEASE 'subrev 1324 version 0.80.2'
Nov 30 09:11:39 udbapp2 openais: Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Nov 30 09:11:39 udbapp2 openais: Copyright (C) 2006 Red Hat, Inc.
Nov 30 09:11:39 udbapp2 openais: AIS Executive Service: started and ready to provide service.
Nov 30 09:11:39 udbapp2 openais: openais component openais_cpg loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais cluster closed process group service v1.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_cfg loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais configuration service'
Nov 30 09:11:39 udbapp2 openais: openais component openais_msg loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais message service B.01.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_lck loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais distributed locking service B.01.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_evt loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais event service B.01.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_ckpt loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais checkpoint service B.01.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_amf loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais availability management framework B.01.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_clm loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais cluster membership service B.01.01'
Nov 30 09:11:39 udbapp2 openais: openais component openais_evs loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais extended virtual synchrony service'
Nov 30 09:11:39 udbapp2 openais: openais component openais_cman loaded.
Nov 30 09:11:39 udbapp2 openais: Registering service handler 'openais CMAN membership service 2.01'
Nov 30 09:11:39 udbapp2 openais: Token Timeout (10000 ms) retransmit timeout (495 ms)
Nov 30 09:11:40 udbapp2 openais: token hold (386 ms) retransmits before loss (20 retrans)
Nov 30 09:11:40 udbapp2 openais: join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Nov 30 09:11:40 udbapp2 openais: downcheck (1000 ms) fail to recv const (50 msgs)
Nov 30 09:11:40 udbapp2 openais: seqno unchanged const (30 rotations) Maximum network MTU 1500
Nov 30 09:11:40 udbapp2 openais: window size per rotation (50 messages) maximum messages per rotation (17 messages)
Nov 30 09:11:40 udbapp2 openais: send threads (0 threads)
Nov 30 09:11:40 udbapp2 openais: RRP token expired timeout (495 ms)
Nov 30 09:11:40 udbapp2 openais: RRP token problem counter (2000 ms)
Nov 30 09:11:40 udbapp2 openais: RRP threshold (10 problem count)
Nov 30 09:11:40 udbapp2 openais: RRP mode set to none.
Nov 30 09:11:40 udbapp2 openais: heartbeat_failures_allowed (0)
Nov 30 09:11:40 udbapp2 openais: max_network_delay (50 ms)
Nov 30 09:11:40 udbapp2 openais: HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Nov 30 09:11:40 udbapp2 openais: Receive multicast socket recv buffer size (262142 bytes).
Nov 30 09:11:40 udbapp2 openais: Transmit multicast socket send buffer size (262142 bytes).
Nov 30 09:11:40 udbapp2 openais: The network interface is now up.
Nov 30 09:11:40 udbapp2 openais: Created or loaded sequence id 0.119.87.244.69 for this ring.
Nov 30 09:11:40 udbapp2 openais: entering GATHER state from 15.
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais extended virtual synchrony service'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais cluster membership service B.01.01'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais availability management framework B.01.01'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais checkpoint service B.01.01'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais event service B.01.01'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais distributed locking service B.01.01'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais message service B.01.01'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais configuration service'
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais cluster closed process group service v1.01'
Nov 30 09:11:40 udbapp2 ccsd: Initial status:: Quorate
Nov 30 09:11:40 udbapp2 openais: Initialising service handler 'openais CMAN membership service 2.01'
Nov 30 09:11:40 udbapp2 openais: CMAN 2.0.60 (built Jan 23 2007 12:42:16) started
Nov 30 09:11:40 udbapp2 openais: Not using a virtual synchrony filter.
Nov 30 09:11:40 udbapp2 openais: Creating commit token because I am the rep.
Nov 30 09:11:40 udbapp2 openais: Saving state aru 0 high seq received 0
Nov 30 09:11:40 udbapp2 openais: entering COMMIT state.
Nov 30 09:11:41 udbapp2 openais: entering RECOVERY state.
Nov 30 09:11:41 udbapp2 openais: position member 119.87.244.69:
Nov 30 09:11:41 udbapp2 openais: previous ring seq 0 rep 119.87.244.69
Nov 30 09:11:41 udbapp2 openais: aru 0 high delivered 0 received flag 0
Nov 30 09:11:41 udbapp2 openais: Did not need to originate any messages in recovery.
Nov 30 09:11:41 udbapp2 openais: Storing new sequence id for ring 4
Nov 30 09:11:41 udbapp2 openais: Sending initial ORF token
Nov 30 09:11:41 udbapp2 openais: CLM CONFIGURATION CHANGE
Nov 30 09:11:41 udbapp2 openais: New Configuration:
Nov 30 09:11:41 udbapp2 openais: Members Left:
Nov 30 09:11:41 udbapp2 openais: Members Joined:
Nov 30 09:11:41 udbapp2 openais: This node is within the primary component and will provide service.
Nov 30 09:11:41 udbapp2 openais: CLM CONFIGURATION CHANGE
Nov 30 09:11:41 udbapp2 openais: New Configuration:
Nov 30 09:11:41 udbapp2 openais: r(0) ip(119.87.244.69)
Nov 30 09:11:41 udbapp2 openais: Members Left:
Nov 30 09:11:41 udbapp2 openais: Members Joined:
Nov 30 09:11:41 udbapp2 openais: r(0) ip(119.87.244.69)
Nov 30 09:11:41 udbapp2 openais: This node is within the primary component and will provide service.
Nov 30 09:11:41 udbapp2 openais: entering OPERATIONAL state.
Nov 30 09:11:41 udbapp2 openais: quorum regained, resuming activity
Nov 30 09:11:41 udbapp2 openais: got nodejoin message 119.87.244.69
Nov 30 09:11:41 udbapp2 openais: entering GATHER state from 11.
Nov 30 09:11:41 udbapp2 openais: Creating commit token because I am the rep.
Nov 30 09:11:41 udbapp2 openais: Saving state aru 9 high seq received 9
Nov 30 09:11:41 udbapp2 openais: entering COMMIT state.
Nov 30 09:11:41 udbapp2 openais: entering RECOVERY state.
Nov 30 09:11:41 udbapp2 openais: position member 119.87.244.69:
Nov 30 09:11:41 udbapp2 openais: previous ring seq 4 rep 119.87.244.69
Nov 30 09:11:41 udbapp2 openais: aru 9 high delivered 9 received flag 0
Nov 30 09:11:41 udbapp2 openais: position member 119.87.244.70:
Nov 30 09:11:42 udbapp2 openais: previous ring seq 4 rep 119.87.244.70
Nov 30 09:11:42 udbapp2 openais: aru 9 high delivered 9 received flag 0
Nov 30 09:11:42 udbapp2 openais: Did not need to originate any messages in recovery.
Nov 30 09:11:42 udbapp2 openais: Storing new sequence id for ring 8
Nov 30 09:11:42 udbapp2 openais: Sending initial ORF token
Nov 30 09:11:42 udbapp2 openais: CLM CONFIGURATION CHANGE
Nov 30 09:11:42 udbapp2 openais: New Configuration:
Nov 30 09:11:42 udbapp2 openais: r(0) ip(119.87.244.69)
Nov 30 09:11:42 udbapp2 openais: Members Left:
Nov 30 09:11:42 udbapp2 openais: Members Joined:
Nov 30 09:11:42 udbapp2 openais: This node is within the primary component and will provide service.
Nov 30 09:11:42 udbapp2 openais: CLM CONFIGURATION CHANGE
Nov 30 09:11:42 udbapp2 openais: New Configuration:
Nov 30 09:11:42 udbapp2 openais: r(0) ip(119.87.244.69)
Nov 30 09:11:42 udbapp2 openais: r(0) ip(119.87.244.70)
Nov 30 09:11:42 udbapp2 openais: Members Left:
Nov 30 09:11:42 udbapp2 openais: Members Joined:
Nov 30 09:11:42 udbapp2 openais: r(0) ip(119.87.244.70)
Nov 30 09:11:42 udbapp2 openais: This node is within the primary component and will provide service.
Nov 30 09:11:42 udbapp2 openais: entering OPERATIONAL state.
Nov 30 09:11:42 udbapp2 openais: got nodejoin message 119.87.244.69
Nov 30 09:11:42 udbapp2 openais: got nodejoin message 119.87.244.70
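Could everyone please check whether these messages contain any errors? Other people get this set up in no time, but mine just won't work.
1. The documentation I read does not explicitly require a dedicated heartbeat network; the heartbeat can go directly over the data interface. But which configuration file sets which network the heartbeat uses?
《Solution》
First, the configuration file is basically fine, but why is there only one node in the failoverdomain?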
<failoverdomains>
<failoverdomain name="udbapp" ordered="1" restricted="0"/>
<failoverdomainnode name="udbapp1" priority="1"/>
</failoverdomains>
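Second, rgmanager must be started before you run clusvcadm, so you should set rgmanager to start at boot and reboot both machines to see what happens. Alternatively, start rgmanager manually and let the cluster decide which machine to start apache on.
Third, avoid the Xen kernel if you can.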
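On the first point, one more thing may be worth checking: in the posted cluster.conf the failoverdomain element is self-closed, so failoverdomainnode sits next to it as a sibling instead of inside it as a child. The sketch below is only a rough check with Python's standard XML parser, not an official validation tool. The NESTED_RM fragment, including the udbapp2 entry and its priority, is a hypothetical layout for illustration. Whether this nesting is what makes clurgmgrd exit is not confirmed here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <rm> failover section from the cluster.conf posted above.
POSTED_RM = """
<rm>
  <failoverdomains>
    <failoverdomain name="udbapp" ordered="1" restricted="0"/>
    <failoverdomainnode name="udbapp1" priority="1"/>
  </failoverdomains>
</rm>
"""

# Hypothetical nested layout with both nodes inside the domain.
# The udbapp2 entry and its priority are assumptions, not from the post.
NESTED_RM = """
<rm>
  <failoverdomains>
    <failoverdomain name="udbapp" ordered="1" restricted="0">
      <failoverdomainnode name="udbapp1" priority="1"/>
      <failoverdomainnode name="udbapp2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
</rm>
"""


def check_failover_domains(rm_xml):
    """Return (members, stray).

    members maps each failoverdomain name to the node names nested in it;
    stray lists node names found directly under <failoverdomains>,
    that is, outside any domain.
    """
    root = ET.fromstring(rm_xml)
    members = {}
    stray = []
    for domains in root.iter("failoverdomains"):
        for child in domains:
            if child.tag == "failoverdomain":
                members[child.get("name")] = [
                    node.get("name")
                    for node in child.findall("failoverdomainnode")
                ]
            elif child.tag == "failoverdomainnode":
                stray.append(child.get("name"))
    return members, stray


if __name__ == "__main__":
    for label, text in (("posted", POSTED_RM), ("nested", NESTED_RM)):
        members, stray = check_failover_domains(text)
        print(label, "members:", members, "stray:", stray)
```

For the posted fragment the domain udbapp comes back with no members and udbapp1 is reported as stray; for the nested fragment both nodes are listed under the domain.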
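《Solution》
OP, if you are in Shanghai you can contact me; I can introduce you to an RHCA who can help solve the problem.
《Solution》
Configure the resources!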