proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
Today I restarted both nodes of a RAC cluster after a patch was applied at the OS level. The first node started successfully, but when we tried to start the second node we got the errors below.
Alert log:
2014-07-21 10:30:47.412
[ohasd(4587656)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 1 were announced and 0 errors occurred
2014-07-21 10:30:57.098
[/oracle/11.2.0/grid/bin/orarootagent.bin(7864336)]CRS-5016:Process “/oracle/11.2.0/grid/bin/acfsload” spawned by agent “/oracle/11.2.0/grid/bin/orarootagent.bin” for action “check” failed: details at “(:CLSN00010:)” in “/oracle/11.2.0/grid/log/dwtest2/agent/ohasd/orarootagent_root/orarootagent_root.log”
2014-07-21 10:30:57.291
[/oracle/11.2.0/grid/bin/oraagent.bin(7536894)]CRS-5011:Check of resource “+ASM” failed: details at “(:CLSN00006:)” in “/oracle/11.2.0/grid/log/dwtest2/agent/ohasd/oraagent_oracle/oraagent_oracle.log”
2014-07-21 10:31:10.735
[gpnpd(10354892)]CRS-2328:GPNPD started on node dwtest2.
2014-07-21 10:31:14.503
[cssd(5111998)]CRS-1713:CSSD daemon is started in clustered mode
2014-07-21 10:31:15.689
[ohasd(4587656)]CRS-2767:Resource state recovery not attempted for ‘ora.diskmon’ as its target state is OFFLINE
2014-07-21 10:31:26.659
[cssd(5111998)]CRS-1707:Lease acquisition for node dwtest2 number 2 completed
2014-07-21 10:31:28.063
[cssd(5111998)]CRS-1605:CSSD voting file is online: /dev/rhdisk201; details in /oracle/11.2.0/grid/log/dwtest2/cssd/ocssd.log.
2014-07-21 10:31:33.164
[cssd(5111998)]CRS-1601:CSSD Reconfiguration complete. Active nodes are dwtest1 dwtest2 .
2014-07-21 10:31:35.504
[ctssd(7602224)]CRS-2403:The Cluster Time Synchronization Service on host dwtest2 is in observer mode.
2014-07-21 10:31:35.878
[ctssd(7602224)]CRS-2407:The new Cluster Time Synchronization Service reference node is host dwtest1.
2014-07-21 10:31:35.879
[ctssd(7602224)]CRS-2401:The Cluster Time Synchronization Service started on host dwtest2.
[client(9830460)]CRS-10001:21-Jul-14 10:31 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(9830462)]CRS-10001:21-Jul-14 10:31 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(9830464)]CRS-10001:21-Jul-14 10:31 ACFS-9393: Verifying ASM Administrator setup.
[client(9830466)]CRS-10001:21-Jul-14 10:31 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(9830472)]CRS-10001:21-Jul-14 10:31 ACFS-9154: Loading ‘oracleadvm.ext’ driver.
[client(9830480)]CRS-10001:21-Jul-14 10:31 ACFS-9154: Loading ‘oracleacfs.ext’ driver.
[client(9306270)]CRS-10001:21-Jul-14 10:31 ACFS-9327: Verifying ADVM/ACFS devices.
[client(9306274)]CRS-10001:21-Jul-14 10:31 ACFS-9156: Detecting control device ‘/dev/asm/.asm_ctl_spec’.
[client(9306280)]CRS-10001:21-Jul-14 10:31 ACFS-9156: Detecting control device ‘/dev/ofsctl’.
[client(9306284)]CRS-10001:21-Jul-14 10:31 ACFS-9322: completed
2014-07-21 10:31:42.833
[/oracle/11.2.0/grid/bin/oraagent.bin(6095012)]CRS-5011:Check of resource “+ASM” failed: details at “(:CLSN00006:)” in “/oracle/11.2.0/grid/log/dwtest2/agent/ohasd/oraagent_oracle/oraagent_oracle.log”
2014-07-21 10:31:43.566
[/oracle/11.2.0/grid/bin/oraagent.bin(6095012)]CRS-5011:Check of resource “+ASM” failed: details at “(:CLSN00006:)” in “/oracle/11.2.0/grid/log/dwtest2/agent/ohasd/oraagent_oracle/oraagent_oracle.log”
2014-07-21 11:11:57.781
At this point ASM had started and the disk groups were mounted, but crsd and the remaining processes were not starting.
[oracle@DWTEST2:/oracle/11.2.0/grid/log/dwtest2]$ ps -ef | grep pmon
oracle 12845226 1 0 13:16:39 - 0:00 asm_pmon_+ASM2
The CSS daemon had also started successfully.
[root@DWTEST2:/oracle/11.2.0/grid/bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
Cluster Synchronization Services started successfully, but we were not able to start CRS. I checked the following:
1) Permissions on all the disks used by the OCR.
2) Whether udp_sendspace is greater than 10240 bytes.
3) Whether the netmask matches between the nodes. The private interface must have the same netmask on all nodes.
I then checked the crsd log:
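The udp_sendspace check in item 2 can be sketched as a small shell helper. This is only a sketch: it assumes the tunable is printed in the form "udp_sendspace = 65536" (as the AIX `no -o udp_sendspace` command does), and the function names `tunable_value` and `above_minimum` are made up for this example.

```shell
#!/bin/sh
# Read a line such as "udp_sendspace = 65536" on stdin and print only
# the numeric value. The "name = value" layout is an assumption.
tunable_value() {
    awk -F'=' 'NF >= 2 { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Succeed when value $1 is strictly greater than minimum $2.
above_minimum() {
    [ "$1" -gt "$2" ]
}

# Hypothetical use on an AIX node (not executed here):
#   val=$(no -o udp_sendspace | tunable_value)
#   above_minimum "$val" 10240 && echo "udp_sendspace OK: $val"
```

Running the same check on every node makes it easy to see whether one node has drifted below the threshold after OS patching.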
[root@DWTEST2:/oracle/11.2.0/grid/log/dwtest2/crsd]# tail -f crsd.log
2014-07-21 12:09:15.599: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:15.799: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:16.000: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:16.200: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:16.401: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:16.601: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:16.801: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:17.002: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:17.202: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:17.402: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:17.603: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:17.804: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2014-07-21 12:09:18.004: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
These messages were repeating continuously in crsd.log.
The ohasd log showed the entries below:
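To confirm that crsd is stuck in this polling loop rather than logging anything else, the repeated lines can be counted and the return values summarised. The helpers below are a sketch that reads log text on stdin; the names `count_master_polls` and `distinct_retvals` are invented for illustration, and the patterns assume the message format shown above.

```shell
#!/bin/sh
# Count the OCRMAS "connect to master not complete" lines on stdin.
# "|| true" keeps the exit status at 0 when the count is zero.
count_master_polls() {
    grep -c 'proath_master.*connect to master not complete' || true
}

# Print each distinct "retval1 retval2" pair seen on stdin, one per line.
distinct_retvals() {
    sed -n 's/.*retval1 = \([0-9]*\), retval2 = \([0-9]*\).*/\1 \2/p' | sort -u
}

# Hypothetical use against the log from this post (not executed here):
#   tail -500 crsd.log | count_master_polls
#   tail -500 crsd.log | distinct_retvals
```

A count that keeps growing with a single unchanging retval pair suggests crsd on this node is still waiting to reach the OCR master on the other node.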
2014-07-21 13:23:00.921: [UiServer][7711] CS(1128e0db0)set Properties ( oracle,113886c90)
2014-07-21 13:23:00.921: [UiServer][7711] SS(1141c7070)Accepted client connection: saddr =(ADDRESS=(PROTOCOL=ipc)(DEV=716)(KEY=OHASD_UI_SOCKET))daddr = (ADDRESS=(PROTOCOL=ipc)(KEY=OHASD_UI_SOCKET))
2014-07-21 13:23:00.931: [UiServer][7454] {0:0:117} processMessage called
2014-07-21 13:23:00.932: [UiServer][7454] {0:0:117} Sending message to PE. ctx= 1119f1f90, Client PID: 7536812
2014-07-21 13:23:00.932: [UiServer][7454] {0:0:117} Sending command to PE: 24
2014-07-21 13:23:00.933: [ CRSPE][7197] {0:0:117} Processing PE command id=128. Description: [Stat Resource : 11387faf0]
2014-07-21 13:23:00.946: [UiServer][7454] {0:0:117} Done for ctx=1119f1f90
2014-07-21 13:23:00.948: [UiServer][7711] Closed: remote end failed/disc.
Finally I checked whether the two nodes could ping each other over the private interconnect.
That is where I found the problem:
[root@DWTEST2:/oracle/11.2.0/grid/log/dwtest2/ohasd]# ping DWTEST1-priv
PING DWTEST1-priv: (10.6.9.133): 56 data bytes
64 bytes from 10.6.9.133: icmp_seq=0 ttl=255 time=0 ms
64 bytes from 10.6.9.133: icmp_seq=2 ttl=255 time=0 ms
64 bytes from 10.6.9.133: icmp_seq=4 ttl=255 time=0 ms
— DWTEST1-priv ping statistics —
5 packets transmitted, 3 packets received, 40% packet loss
round-trip min/avg/max = 0/0/0 ms
[oracle@DWTEST1:/oracle/11.2.0/grid/log/dwtest1/crsd]$ ping DWTEST2-priv
PING DWTEST2-priv: (10.6.9.134): 56 data bytes
— DWTEST2-priv ping statistics —
7 packets transmitted, 0 packets received, 100% packet loss
Pinging by IP address succeeded, but pinging by host name failed. I reported this to the network team and learned that a duplicate IP address was causing the problem. They removed the duplicate IP.
I then restarted the cluster:
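The interconnect test above comes down to reading the packet-loss figure in each direction. A minimal sketch of that step follows; it assumes the summary line has the form "N% packet loss" with an integer percentage, as in the output shown, and the function name `ping_loss_percent` is hypothetical.

```shell
#!/bin/sh
# Read ping output on stdin and print the packet-loss percentage from
# the first summary line, e.g. "... 40% packet loss" prints 40.
ping_loss_percent() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p' | head -n 1
}

# Hypothetical use from each node toward the other's private name
# (not executed here):
#   loss=$(ping -c 5 DWTEST1-priv | ping_loss_percent)
#   [ "$loss" -eq 0 ] || echo "private interconnect is dropping packets: ${loss}%"
```

Any non-zero loss on the private network is worth chasing before looking further into the Clusterware logs.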
[root@DWTEST2:/oracle/11.2.0/grid/bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘dwtest2’
CRS-2673: Attempting to stop ‘ora.crsd’ on ‘dwtest2’
CRS-2677: Stop of ‘ora.crsd’ on ‘dwtest2’ succeeded
CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘dwtest2’
CRS-2673: Attempting to stop ‘ora.crf’ on ‘dwtest2’
CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘dwtest2’
CRS-2673: Attempting to stop ‘ora.evmd’ on ‘dwtest2’
CRS-2673: Attempting to stop ‘ora.asm’ on ‘dwtest2’
CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘dwtest2’
CRS-2677: Stop of ‘ora.mdnsd’ on ‘dwtest2’ succeeded
CRS-2677: Stop of ‘ora.crf’ on ‘dwtest2’ succeeded
CRS-2677: Stop of ‘ora.evmd’ on ‘dwtest2’ succeeded
CRS-2677: Stop of ‘ora.asm’ on ‘dwtest2’ succeeded
CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘dwtest2’
CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘dwtest2’ succeeded
CRS-2677: Stop of ‘ora.ctssd’ on ‘dwtest2’ succeeded
CRS-2673: Attempting to stop ‘ora.cssd’ on ‘dwtest2’
CRS-2677: Stop of ‘ora.cssd’ on ‘dwtest2’ succeeded
CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘dwtest2’
CRS-2677: Stop of ‘ora.gipcd’ on ‘dwtest2’ succeeded
CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘dwtest2’
CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘dwtest2’ succeeded
CRS-2677: Stop of ‘ora.gpnpd’ on ‘dwtest2’ succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘dwtest2’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@DWTEST2:/oracle/11.2.0/grid/bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@DWTEST2:/oracle/11.2.0/grid/bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@DWTEST2:/oracle/11.2.0/grid/bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
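As the two checks above show, CRS takes a little while to come online after `crsctl start crs`. Instead of re-running the check by hand, the output can be tested in a loop. This is a sketch only: the helper name `crs_stack_online` is invented, and it relies on the "Cannot communicate" and "is online" wording seen in the output above.

```shell
#!/bin/sh
# Read `crsctl check crs` output on stdin. Succeed only when no line
# says "Cannot communicate" and at least one line says "is online".
crs_stack_online() {
    out=$(cat)
    case "$out" in
        *"Cannot communicate"*) return 1 ;;
    esac
    case "$out" in
        *"is online"*) return 0 ;;
    esac
    return 1
}

# Hypothetical wait loop, run as root from the Grid bin directory
# (not executed here):
#   until ./crsctl check crs | crs_stack_online; do sleep 10; done
#   echo "all Clusterware services are online"
```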