OpenSolaris

Bug ID 6745856
Synopsis statd has a fixed limit of max 256 file descriptors
State 10-Fix Delivered (Fix available in build)
Category:Subcategory network:locking
Keywords oss-bite-size | rtiq_reviewed
Sponsor
Submitter
Responsible Engineer Pavel Filipensky
Reported Against 5.10
Duplicate Of
Introduced In solaris_7
Commit to Fix snv_102
Fixed In snv_102
Release Fixed solaris_nevada(snv_102) , solaris_10u7(s10u7_02) (Bug ID:2168099)
Related Bugs 1156444 , 1218695
Submit Date 8-September-2008
Last Update Date 6-November-2008
Description
statd has a fixed limit of 256 file descriptors. This causes problems for setups with a lot of IP addresses (in this particular case a non-global zone with 2015 virtual IP addresses). During start of svc:/network/nfs/status:default, statd exits after reaching this limit while opening the sockets.

From the truss output of statd we can see:

[...]
16816/1:         1.0369 so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, 0x00000000, SOV_DEFAULT) = 254
16816/1:             0x00000000: ""
16816/1:         1.0370 ioctl(254, 0xC0786975, 0xFFBFF750)              = 0
                write/read (struct lifreq)
16816/1:         1.0370 ioctl(254, 0xC0786971, 0xFFBFF750)              = 0
                write/read (struct lifreq)
16816/1:         1.0371 getuid()                                        = 0 [0]
16816/1:         1.0371 getuid()                                        = 0 [0]
16816/1:         1.0372 door(6, 0xFFBFF258)                             = 0
16816/1:                target=645 proc=0x2D61C data=0xDEADBEED
16816/1:                attributes=DOOR_UNREF
16816/1:                uniquifier=691
16816/1:         1.0403 door(6, 0xFFBFF300)                             = 0
16816/1:                data_ptr=FF340000 data_size=255
16816/1:                desc_ptr=0x0 desc_num=0
16816/1:                rbuf=0xFF340000 rsize=25600
16816/1:         1.0403 brk(0x0029C738)                                 = 0
16816/1:         1.0404 brk(0x0029E738)                                 = 0
16816/1:         1.0405 so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, 0x00000000, SOV_DEFAULT) = 255
16816/1:             0x00000000: ""
16816/1:         1.0406 ioctl(255, 0xC0786975, 0xFFBFF750)              = 0
                write/read (struct lifreq)
16816/1:         1.0406 ioctl(255, 0xC0786971, 0xFFBFF750)              = 0
                write/read (struct lifreq)
16816/1:         1.0407 getuid()                                        = 0 [0]
16816/1:         1.0407 getuid()                                        = 0 [0]
16816/1:         1.0408 door(6, 0xFFBFF258)                             = 0
16816/1:                target=645 proc=0x2D61C data=0xDEADBEED
16816/1:                attributes=DOOR_UNREF
16816/1:                uniquifier=691
16816/1:         1.0438 door(6, 0xFFBFF300)                             = 0
16816/1:                data_ptr=FF340000 data_size=254
16816/1:                desc_ptr=0x0 desc_num=0
16816/1:                rbuf=0xFF340000 rsize=25600
16816/1:         1.0439 brk(0x0029E738)                                 = 0
16816/1:         1.0439 brk(0x002A0738)                                 = 0
16816/1:         1.0440 so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, 0x00000000, SOV_DEFAULT) Err#24 EMFILE
16816/1:             0x00000000: ""
16816/1:         1.0441 fstat(-1, 0xFFBFEA30)                           Err#9 EBADF
16816/1:         1.0441 open(0xFF25AEF4, 01)                            Err#24 EMFILE
16816/1:             0xFF25AEF4: "/dev/conslog"
16816/1:         1.0442 fcntl(-1, 2, 0x00000001)                        Err#9 EBADF
16816/1:         1.0442 fstat(-1, 0xFFBFEA30)                           Err#9 EBADF
16816/1:         1.0443 fstat(-1, 0xFFBFF490)                           Err#9 EBADF
16816/1:         1.0444 open(0xFF30BC0C, 0)                             Err#24 EMFILE
16816/1:             0xFF30BC0C: "/dev/udp"
16816/1:         1.0445 schedctl()                                      = 0xFF36E000
16816/1:         1.0446 sigaction(0x0000000C, 0xFFBFF6C0, 0xFFBFF760)   = 0
16816/1:            new: hand = 0x00000001 mask = 0 0 0 0 flags = 0x0012
16816/1:            old: hand = 0x00000000 mask = 0 0 0 0 flags = 0x0000
16816/1:         1.0447 labelsys(1)                                     = 0
16816/1:         1.0447 sigaction(0x0000000C, 0xFFBFF6C0, 0xFFBFF760)   = 0
16816/1:            new: hand = 0x00000000 mask = 0 0 0 0 flags = 0x0012
16816/1:            old: hand = 0x00000001 mask = 0 0 0 0 flags = 0x0000
16816/1:         1.0448 open(0x0002E748, 02)                            Err#24 EMFILE
16816/1:             0x0002E748: "/dev/udp"
16816/1:         1.0450 open(0x0002E760, 02)                            Err#24 EMFILE
16816/1:             0x0002E760: "/dev/tcp"
16816/1:         1.0450 open(0x0002E7A8, 02)                            Err#24 EMFILE
16816/1:             0x0002E7A8: "/dev/ticlts"
16816/1:         1.0451 open(0x0002E808, 02)                            Err#24 EMFILE
16816/1:             0x0002E808: "/dev/ticotsord"
16816/1:         1.0451 open(0x0002E850, 02)                            Err#24 EMFILE
16816/1:             0x0002E850: "/dev/ticots"
16816/1:         1.0453 _exit(1)

This fixed limit is not tunable via the file descriptor resource limit. Even after setting a higher resource limit, statd falls back to using just 256 descriptors. From the pfiles output we can see:
364:    /usr/lib/nfs/statd
  Current rlimit: 256 file descriptors
[...]

statd needs a tunable file descriptor resource limit.
From Wolfgang's comments:

suggested fix:

*** /tmp/geta115	Wed Oct 25 10:15:59 1995
--- sm_svc.c	Tue Oct 24 16:28:19 1995
***************
*** 190,196 ****
--- 190,204 ----
  		if (ppid != 0) {
  			exit(0);
  		}
+ 
+ 		/*
+ 		 * Set the limit on open files to a very high number, so
+ 		 * that servers with lots of clients don't run out of file
+ 		 * descriptors.  Then close all currently open files.
+ 		 */
  		getrlimit(RLIMIT_NOFILE, &rl);
+ 		rl.rlim_cur = rl.rlim_max;
+ 		setrlimit(RLIMIT_NOFILE, &rl);
  		for (t = 0; t < rl.rlim_max; t++)
  			(void) close(t);

If we look at the code of usr/src/cmd/fs.d/nfs/statd/sm_svc.c in Solaris 10, we don't see the suggested fix but rather this code:

[...]
	/* Set maxfdlimit current soft limit */
	rl.rlim_cur = MAX_FDS;
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
		syslog(LOG_ERR, "statd: unable to set RLIMIT_NOFILE to %d\n",
			MAX_FDS);
[...]

And MAX_FDS is hardcoded to 256 in usr/src/cmd/fs.d/nfs/statd/sm_statd.h
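
To illustrate why raising the resource limit from outside has no effect, here is a minimal standalone sketch of the same pattern. It is not the statd source: the function name clamp_nofile_like_statd() is hypothetical, and MAX_FDS is redefined locally with the value quoted above. Whatever soft limit the process inherits, it is overwritten with the constant.

```c
#include <sys/resource.h>

/* Local stand-in for the constant in sm_statd.h (value taken from this report). */
#define MAX_FDS 256

/*
 * Hypothetical helper mimicking the quoted statd logic: the soft limit is
 * forced to MAX_FDS, so any higher inherited soft limit is discarded.
 * Returns 0 on success, -1 on failure (for example if the hard limit is
 * lower than MAX_FDS).
 */
int
clamp_nofile_like_statd(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	rl.rlim_cur = MAX_FDS;	/* inherited soft limit is ignored here */
	return (setrlimit(RLIMIT_NOFILE, &rl));
}
```

After this call, getrlimit() reports a soft limit of 256 even if the process was started with a larger one, which matches the pfiles output shown earlier.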

CR# 1218695 is in state "fix delivered" but it seems that this fix never made it into the ON tree (only in cte_patch). The setrlimit() with the fixed limit was introduced in SCCS rev 1.21 by the bugfix for CR# 1156444 (which, however, doesn't mention 1218695 or anything regarding the file descriptor limit).

A fix would be to use rl.rlim_max as the upper limit (instead of MAX_FDS) in the setrlimit call (as suggested by 1218695).
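
A minimal sketch of that approach, assuming the only change needed is to take the upper bound from the hard limit instead of MAX_FDS. The function name raise_nofile_to_hard_limit() is hypothetical, and this is not necessarily the change that was delivered in snv_102.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_NOFILE to the current hard
 * limit, so the administrator can tune the descriptor count through the
 * hard limit. Returns 0 on success, -1 on failure.
 */
int
raise_nofile_to_hard_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	rl.rlim_cur = rl.rlim_max;	/* use the hard limit, not a constant */
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	return (0);
}
```

With this pattern the effective limit follows whatever hard limit the service is started with. One caveat: any loop that closes all descriptors up to rl.rlim_max, as in the suggested diff, gets slower as the hard limit grows.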
[...]

Entry 1 wolfgang.ley [2008-09-08 16:44]

oss-bite-size
Work Around
use NFSv4
Comments
N/A