OpenSolaris

Bug ID 6659883
Synopsis HA-SAMBA for SC 3.2 : incorrect grep token in probe can trigger false probe failure
State 10-Fix Delivered (Fix available in build)
Category:Subcategory suncluster:ha-samba
Keywords samba | usr/xpg4/bin/grep
Responsible Engineer Neil Garthwaite
Reported Against 3.2_fcs
Duplicate Of
Introduced In
Commit to Fix 3.2u2_12
Fixed In 3.2u2_12
Release Fixed 3.2u2_fcs(3.2u2_12) , osc(osc_29) (Bug ID:2166055)
Related Bugs
Submit Date 6-February-2008
Last Update Date 18-August-2008
Description
We built a cluster using 
     Solaris 10
     Sun Cluster 3.2
     VxVM /VxFS 4.1
     True Copy
     Samba 3.0.24
     HA-Samba for Sun Cluster 3.2 agent

The logical hostname is "mscanberra" (at this site, all logical hostnames start with "ms").
The cluster nodes are mssydney and msmelbourne (the naming theme for the cluster is Australia).

When I start the Samba resource, I receive the following error messages:
Feb  6 15:03:23 msmelbourne SC[SUNWscsmb.samba.probe]:mscanberra-rg:mscanberra-smb-server: [ID 182069 daemon.error] check_smbd - Samba server <mscanberra> not working, failed to connect to samba-resource <scmondir>
Feb  6 15:03:53 msmelbourne last message repeated 12 times
Feb  6 15:03:55 msmelbourne SC[SUNWscsmb.samba.probe]:mscanberra-rg:mscanberra-smb-server: [ID 182069 daemon.error] check_smbd - Samba server <mscanberra> not working, failed to connect to samba-resource <scmondir>
Feb  6 15:07:15 msmelbourne last message repeated 77 times

However, the customer can access the NetBIOS share from a PC running Windows XP, using Windows Explorer.

Explanation:
The fault monitoring probe runs check_samba (see the /opt/SUNWscsmb/bin/functions file).
Line 859 is:
               ${SMBCLIENT} '\\'${NETBIOSNAME}'\'${SAMBA_FMRESOURCE} -s ${SMBCONF} -U `/usr/bin/echo ${SAMBA_FMUSER}` \
                  -c 'pwd;exit' 2>/dev/null | /usr/xpg4/bin/grep -i -e ERR -e FAIL > ${TMPF}

In our case, the smbclient command run is
/mscanberra/product/samba/bin/smbclient '\\mscanberra\scmondir' -s /mscanberra/product/samba/mscanberra/lib/smb.conf -U 'CM\sambamon%sambapass' -c 'pwd;exit' 2>/dev/null | /usr/xpg4/bin/grep -i -e ERR -e FAIL

In our case, the output of the smbclient is:
   Current directory is \\mscanberra\scmondir\

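Unfortunately, "mscanberra" contains the string "ERR" when case is ignored (the -i option of grep): mscanbERRa.
The successful output therefore matches the grep pattern, ${TMPF} is not empty, and the probe reports a failure.

A second case: setting "preferred master" to "no" produces the following message:

doing parameter preferred master = No

which is in turn matched by "/usr/xpg4/bin/grep -i -e ERR -e FAIL" as an error, because "preferred" also contains "ERR" (prefERRed).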
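
The two false matches described above can be reproduced outside the cluster with a short script. This is only a sketch: it uses whatever grep is first in PATH instead of /usr/xpg4/bin/grep, the function name probe_filter is made up for the illustration, and the third sample line (NetBIOS name "samba") is an assumed example of output that contains neither token.

```shell
#!/bin/sh
# probe_filter is a hypothetical helper that applies the same pattern as
# the probe. It returns 0 when the line would be treated as an error.
probe_filter() {
    printf '%s\n' "$1" | grep -i -e ERR -e FAIL >/dev/null
}

# First two lines are taken from the report; the third is an assumed
# example whose text contains neither "err" nor "fail".
for line in \
    'Current directory is \\mscanberra\scmondir\' \
    'doing parameter preferred master = No' \
    'Current directory is \\samba\scmondir\'
do
    if probe_filter "$line"; then
        printf 'FLAGGED: %s\n' "$line"
    else
        printf 'ok:      %s\n' "$line"
    fi
done
```

The first two lines are flagged although neither reports an error; only the third passes.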


The same grep pattern is also used in check_nmbd (in /opt/SUNWscsmb/bin/functions):

check_nmbd()
{
        debug_message "Function: check_nmbd - Begin"
        ${SET_DEBUG}

        rc=0

        if [ "${RUN_NMBD}" = "YES" ]
        then
           for lh in ${LHOST}
           do
                ${NMBLOOKUP} -s ${SMBCONF} -U ${lh} ${NETBIOSNAME} | \
                  /usr/xpg4/bin/grep -i -e ERR -e FAIL > ${TMPF}    # <<<<< problematic pattern

                if [ -s "${TMPF}" ]
                then
                   # SCMSGS
                   # @explanation
                   # nmblookup could not be performed.
                   # @user_action
                   # No user action is needed. The Samba server will be
                   # restarted.
                   scds_syslog -p daemon.error -t $(syslog_tag) -m \
                        "check_nmbd - Nmbd for <%s> is not working, failed to retrieve ipnumber for %s" \
                        "${NETBIOSNAME}" "${NETBIOSNAME}"
                   rc=1
                else
                   debug_message "check_nmbd - nmblookup for ${lh} is working"
                fi
           done
        fi

        debug_message "Function: check_nmbd - End"
        return ${rc}
}


This results in:
---
"Jul 17 11:01:03 node1 SC[SUNWscsmb.samba.probe]:tranrg:tran-smb-res: 
[ID 182069 daemon.error] check_smbd - Samba server <samba> not working, 
failed to connect to samba-resource <scmondir>"

This in turn leads to exit code 100 from the probe script.
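
For illustration only, one way to avoid this class of false match is to test for the expected success line instead of searching for error tokens. The sketch below is an assumption, not the delivered fix: the function name output_shows_success is hypothetical, and it relies on smbclient printing "Current directory is" for a successful 'pwd', as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reads captured smbclient output on stdin and
# returns 0 only if the success line for 'pwd' is present.
output_shows_success() {
    grep -e 'Current directory is' >/dev/null
}

# Sample input built from the two lines quoted in this report.
if printf '%s\n' \
    'doing parameter preferred master = No' \
    'Current directory is \\mscanberra\scmondir\' | output_shows_success
then
    echo "probe would pass"
else
    echo "probe would fail"
fi
```

With this kind of check, neither the NetBIOS name nor the smb.conf parameter messages influence the result.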
Work Around
N/A
Comments
N/A