OpenSolaris

Bug ID 6502215
Synopsis sap scalable app server remains in Degraded state after public network failure
State 10-Fix Delivered (Fix available in build)
Category:Subcategory suncluster:ha-sapwebas
Keywords
Responsible Engineer Hemachandran Namachivayam
Reported Against
Duplicate Of
Introduced In
Commit to Fix 3.2_patch_02
Fixed In 3.2_patch_02
Release Fixed 3.2_patch(3.2_patch_02)
Related Bugs 6550989
Submit Date 8-December-2006
Last Update Date 27-April-2007
Description
HA SAP 7.0 application server (scalable) is installed and configured on S10u2b08 SC3.2b71 on a 2-node x64 cluster.  While the application server was running on both cluster nodes, I disconnected the public network on node 1.  About 30 minutes later, I reconnected the public network on node 1.  After that, I found the app server resource in the Degraded state, with a status message saying that the database might be down.  I manually connected to the SAP database and it succeeded without any problem.


# clrs status assc-rs

=== Cluster Resources ===

Resource Name       Node Name       State       Status Message
-------------       ---------       -----       --------------
assc-rs             pbulge1         Online      Degraded - Database might be down.
                    pbulge2         Online      Degraded - Database might be down.

Then, looking in the /var/adm/messages file, there are many messages about the monitor probe timing out, and the resource is not being restarted as it should be.  I also noticed that the app server processes are hung.  I tried killing the pid, but the process refused to go away.

Dec  7 08:21:50 pbulge1 SC[SUNW.sap_as_v2,assc-rg,assc-rs,sap_as_probe]: [ID 646865 daemon.notice] Monitor probe time of 60.00 seconds is 100.00 percent of Probe timeout.
Dec  7 08:21:50 pbulge1 SC[SUNW.sap_as_v2,assc-rg,assc-rs,sap_as_probe]: [ID 646865 daemon.notice] Monitor probe time of 60.00 seconds is 100.00 percent of Probe timeout.
Dec  7 08:22:50 pbulge1 su: [ID 366847 auth.info] 'su qt1adm' succeeded for root on /dev/???
Dec  7 08:23:50 pbulge1 SC[SUNW.sap_as_v2,assc-rg,assc-rs,sap_as_probe]: [ID 646865 daemon.notice] Monitor probe time of 60.00 seconds is 100.00 percent of Probe timeout.

The client connection is also not working.  After I log in, it complains "Runtime Error - Description of Exception".
Work Around
The user has to increase retry_interval to >= 4320 so that it satisfies the formula  { retry-interval >= partial failure value x threshold x (thorough-probe-interval + probe-timeout) }; otherwise a restart or failover does not happen.
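
The arithmetic behind the formula can be checked with a short sketch. This is only an illustration: the function names are hypothetical, and of the input values only the probe timeout of 60 seconds appears in this report (in the log messages). The other three values are assumptions chosen so that the product equals the 4320 quoted above; the actual settings on the affected cluster are not stated here.

```python
def min_retry_interval(partial_failure_value, threshold,
                       thorough_probe_interval, probe_timeout):
    """Smallest retry interval (seconds) that satisfies the work-around formula."""
    return partial_failure_value * threshold * (
        thorough_probe_interval + probe_timeout
    )


def satisfies_formula(retry_interval, **settings):
    """True if retry_interval meets the formula for the given settings."""
    return retry_interval >= min_retry_interval(**settings)


# Hypothetical settings: only probe_timeout=60 is taken from the log messages.
# The remaining values are illustrative assumptions, not data from the report.
settings = dict(
    partial_failure_value=18,    # assumption
    threshold=2,                 # assumption
    thorough_probe_interval=60,  # assumption
    probe_timeout=60,            # from the "Probe timeout" log lines
)

print(min_retry_interval(**settings))       # 18 * 2 * (60 + 60) = 4320
print(satisfies_formula(4320, **settings))  # True
print(satisfies_formula(4319, **settings))  # False
```

With the real values of the resource substituted in, the first result gives the lower bound that retry_interval has to meet.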
Comments
N/A