11.2.0.4 on AIX: Handling a root.sh Failure on the Second Node

Contributed by a site user, 2022-05-30


Running root.sh on the second node fails with the following output:

    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root script.
    Now product-specific root actions will be performed.
    Using configuration parameter file: /U01/app/crs/crs/install/crsconfig_params
    User ignored Prerequisites during installation
    Installing Trace File Analyzer
    Start of resource "ora.asm" failed
    CRS-2672: Attempting to start 'ora.drivers.acfs' on 'sbfhxxj-db2'
    CRS-2676: Start of 'ora.drivers.acfs' on 'sbfhxxj-db2' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'sbfhxxj-db2'
    CRS-5017: The resource action "ora.asm start" encountered the following error:
    ORA-03113: end-of-file on communication channel
    Process ID: 0
    Session ID: 0 Serial number: 0
    . For details refer to "(:CLSN00107:)" in "/U01/app/crs/log/sbfhxxj-db2/agent/ohasd/oraagent_grid/oraagent_grid.log".
    CRS-2674: Start of 'ora.asm' on 'sbfhxxj-db2' failed
    CRS-2679: Attempting to clean 'ora.asm' on 'sbfhxxj-db2'
    CRS-2681: Clean of 'ora.asm' on 'sbfhxxj-db2' succeeded
    CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'sbfhxxj-db2'
    CRS-2677: Stop of 'ora.drivers.acfs' on 'sbfhxxj-db2' succeeded
    CRS-4000: Command Start failed, or completed with errors.
    Failed to start Oracle Grid Infrastructure stack
    Failed to start ASM at /U01/app/crs/crs/install/crsconfig_lib.pm line 1339.
    /U01/app/crs/perl/bin/perl -I/U01/app/crs/perl/lib -I/U01/app/crs/crs/install /U01/app/crs/crs/install/rootcrs.pl execution failed

The output shows that ora.asm failed to start, which is what made the script fail. Check the ASM alert log on node 2:

    PMON (ospid: 4063948): terminating the instance due to error 481
    kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).

Following ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1), check the HAIP resource on both nodes:

    [+ASM1]@sbfhxxj-db1[/home/grid]$crsctl stat res ora.cluster_interconnect.haip -init
    NAME=ora.cluster_interconnect.haip
    TYPE=ora.haip.type
    TARGET=ONLINE
    STATE=OFFLINE

    [+ASM2]@sbfhxxj-db2[/home/grid]$crsctl stat res ora.cluster_interconnect.haip -init
    NAME=ora.cluster_interconnect.haip
    TYPE=ora.haip.type
    TARGET=ONLINE
    STATE=ONLINE on sbfhxxj-db2

    [+ASM1]@sbfhxxj-db1[/home/grid]$crsctl start res ora.cluster_interconnect.haip -init
    CRS-2501: Resource 'ora.cluster_interconnect.haip' is disabled
    CRS-4000: Command Start failed, or completed with errors.
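The HAIP state comparison above can also be scripted. The following is a minimal sketch, assuming the output of crsctl stat res ... -init has already been captured as text; the function names parse_crsctl_stat and haip_online are illustrative helpers, not part of any Oracle tool.

```python
def parse_crsctl_stat(text):
    """Parse the KEY=VALUE lines printed by `crsctl stat res <name> -init`."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip prompts, blank lines and anything without '='
            attrs[key] = value
    return attrs


def haip_online(attrs):
    """STATE reads 'ONLINE on <node>' when the resource runs, else 'OFFLINE'."""
    return attrs.get("STATE", "").startswith("ONLINE")


# Sample text copied from the two nodes in this article.
NODE1_OUTPUT = """NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=OFFLINE"""

NODE2_OUTPUT = """NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=ONLINE on sbfhxxj-db2"""

if __name__ == "__main__":
    for node, out in (("sbfhxxj-db1", NODE1_OUTPUT), ("sbfhxxj-db2", NODE2_OUTPUT)):
        print(node, "HAIP online:", haip_online(parse_crsctl_stat(out)))
```

In this case the check flags sbfhxxj-db1, where TARGET is ONLINE but STATE is OFFLINE.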

The log reports the following errors:

    2018-10-17 14:47:44.551: [ AGFW][2057]{0:0:70} Agent received the message: AGENT_HB[Engine] ID 12293:749
    2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} failed to create arp
    2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} (null) category: -2, operation: ioctl, loc: bpfopen:22,o, OS error: 22, other: ARP device /dev/bpf0, interface en4, BIOCSBLEN request with size 4096
    2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} (:CLSN00130:) category: -2, operation: ioct , loc: bpfopen:22,o, OS error: 28288, other: ARP device /de
    2018-10-17 14:47:44.739: [ USRTHRD][5674]{0:0:169} [NetHAWork] thread hit exception Agent failed to initialize which is required for HAIP processing
    2018-10-17 14:47:44.739: [ USRTHRD][5674]{0:0:169} [NetHAWork] thread stopping
    2018-10-17 14:47:44.739: [ USRTHRD][5674]{0:0:169} Thread:[NetHAWork]isRunning is reset to false here
    2018-10-17 14:47:46.742: [ USRTHRD][5931]{0:0:169} failed to create arp
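To pick the relevant entries out of a long agent log, a small filter can help. This is a rough sketch, assuming the log lines follow the wording shown above; the regular expression and the helper name find_arp_failures are illustrative and may need adjusting for other versions.

```python
import re

# Matches entries such as:
#   OS error: 22, other: ARP device /dev/bpf0, interface en4, ...
ARP_FAILURE = re.compile(
    r"OS error: (\d+), other: ARP device (/dev/\w+), interface (\w+)"
)


def find_arp_failures(log_text):
    """Return (os_error, device, interface) for each complete ARP failure entry."""
    return [
        (int(m.group(1)), m.group(2), m.group(3))
        for m in ARP_FAILURE.finditer(log_text)
    ]


# Two entries copied from the log in this article; the second one is
# truncated in the source, so it is not expected to match.
SAMPLE_LOG = (
    "2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} (null) category: -2, "
    "operation: ioctl, loc: bpfopen:22,o, OS error: 22, other: ARP device "
    "/dev/bpf0, interface en4, BIOCSBLEN request with size 4096\n"
    "2018-10-17 14:47:44.738: [ USRTHRD][5674]{0:0:169} (:CLSN00130:) category: -2, "
    "operation: ioct , loc: bpfopen:22,o, OS error: 28288, other: ARP device /de\n"
)

if __name__ == "__main__":
    for os_error, device, interface in find_arp_failures(SAMPLE_LOG):
        print(f"OS error {os_error} opening {device} for interface {interface}")
```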

Based on OS error: 22, other: ARP device /dev/bpf0, and following AIX: HAIP fails to start with "OS error: 22" due to non-related devices using same major device number (Doc ID 1447517.1), check the major device numbers under /dev:

    [root@sbfhxxj-db1 /dev]# ls -lrt|grep bpf
    cr-------- 1 root system 43, 9 Jan 18 2016 bpf9
    cr-------- 1 root system 43, 8 Jan 18 2016 bpf8
    cr-------- 1 root system 43, 7 Jan 18 2016 bpf7
    cr-------- 1 root system 43, 6 Jan 18 2016 bpf6
    cr-------- 1 root system 43, 5 Jan 18 2016 bpf5
    cr-------- 1 root system 43, 4 Jan 18 2016 bpf4
    cr-------- 1 root system 43, 3 Jan 18 2016 bpf3
    cr-------- 1 root system 43, 2 Jan 18 2016 bpf2
    cr-------- 1 root system 43, 19 Jan 18 2016 bpf19
    cr-------- 1 root system 43, 18 Jan 18 2016 bpf18
    cr-------- 1 root system 43, 17 Jan 18 2016 bpf17
    cr-------- 1 root system 43, 16 Jan 18 2016 bpf16
    cr-------- 1 root system 43, 15 Jan 18 2016 bpf15
    cr-------- 1 root system 43, 14 Jan 18 2016 bpf14
    cr-------- 1 root system 43, 13 Jan 18 2016 bpf13
    cr-------- 1 root system 43, 12 Jan 18 2016 bpf12
    cr-------- 1 root system 43, 11 Jan 18 2016 bpf11
    cr-------- 1 root system 43, 10 Jan 18 2016 bpf10
    cr-------- 1 root system 43, 1 Jan 18 2016 bpf1
    cr-------- 1 root system 43, 0 Jan 18 2016 bpf0

    [root@sbfhxxj-db1 /dev]# ls -lrt|grep dlm
    crw------- 1 root system 42, 0 Dec 01 2015 rdlmcldrv
    crw------- 1 root system 43, 0 Dec 01 2015 dlmadrv
    crw------- 1 root system 44, 0 Sep 13 10:52 rdlmfdrvio
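The MOS note cited above attributes the failure to unrelated devices sharing a major device number, and the listing can be checked for that mechanically. The sketch below assumes ls -l output in the layout shown above (major number followed by a comma, then a space and the minor number); char_devices and bpf_major_conflicts are hypothetical helper names.

```python
def char_devices(ls_output):
    """Extract (name, major, minor) for character devices from `ls -l` text."""
    devices = []
    for line in ls_output.splitlines():
        parts = line.split()
        # Expected fields: perms links owner group major, minor month day time name
        if len(parts) < 10 or not parts[0].startswith("c"):
            continue
        devices.append((parts[-1], int(parts[4].rstrip(",")), int(parts[5])))
    return devices


def bpf_major_conflicts(devices):
    """Return non-bpf devices whose major number is also used by a bpf device."""
    bpf_majors = {major for name, major, _ in devices if name.startswith("bpf")}
    return sorted(
        (name, major)
        for name, major, _ in devices
        if major in bpf_majors and not name.startswith("bpf")
    )


# A shortened copy of the listing from this article.
SAMPLE_LS = """cr-------- 1 root system 43, 1 Jan 18 2016 bpf1
cr-------- 1 root system 43, 0 Jan 18 2016 bpf0
crw------- 1 root system 42, 0 Dec 01 2015 rdlmcldrv
crw------- 1 root system 43, 0 Dec 01 2015 dlmadrv
crw------- 1 root system 44, 0 Sep 13 10:52 rdlmfdrvio"""

if __name__ == "__main__":
    for name, major in bpf_major_conflicts(char_devices(SAMPLE_LS)):
        print(f"{name} shares major number {major} with the bpf devices")
```

On the sample listing this reports dlmadrv, which uses major number 43 just like the bpf devices.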

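In principle, unrelated drivers under /dev should not share a major device number, but here the bpf devices and dlmadrv both use major number 43. The dlm devices belong to the multipath software. Removing all bpf devices with rm bpf* and regenerating them with tcpdump -D left the major number unchanged, so the only remaining option was to uninstall and reinstall the multipath software. After that the major device numbers no longer conflicted, root.sh ran cleanly on the second node, and the node joined the cluster normally.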

