Description
statd has a fixed limit of 256 file descriptors. This causes problems for setups with many IP addresses (in this particular case, a non-global zone with 2015 virtual IP addresses). During startup of svc:/network/nfs/status:default, statd exits after reaching this limit while opening its sockets.
From the truss output of statd we can see:
[...]
16816/1: 1.0369 so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, 0x00000000, SOV_DEFAULT) = 254
16816/1: 0x00000000: ""
16816/1: 1.0370 ioctl(254, 0xC0786975, 0xFFBFF750) = 0
write/read (struct lifreq)
16816/1: 1.0370 ioctl(254, 0xC0786971, 0xFFBFF750) = 0
write/read (struct lifreq)
16816/1: 1.0371 getuid() = 0 [0]
16816/1: 1.0371 getuid() = 0 [0]
16816/1: 1.0372 door(6, 0xFFBFF258) = 0
16816/1: target=645 proc=0x2D61C data=0xDEADBEED
16816/1: attributes=DOOR_UNREF
16816/1: uniquifier=691
16816/1: 1.0403 door(6, 0xFFBFF300) = 0
16816/1: data_ptr=FF340000 data_size=255
16816/1: desc_ptr=0x0 desc_num=0
16816/1: rbuf=0xFF340000 rsize=25600
16816/1: 1.0403 brk(0x0029C738) = 0
16816/1: 1.0404 brk(0x0029E738) = 0
16816/1: 1.0405 so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, 0x00000000, SOV_DEFAULT) = 255
16816/1: 0x00000000: ""
16816/1: 1.0406 ioctl(255, 0xC0786975, 0xFFBFF750) = 0
write/read (struct lifreq)
16816/1: 1.0406 ioctl(255, 0xC0786971, 0xFFBFF750) = 0
write/read (struct lifreq)
16816/1: 1.0407 getuid() = 0 [0]
16816/1: 1.0407 getuid() = 0 [0]
16816/1: 1.0408 door(6, 0xFFBFF258) = 0
16816/1: target=645 proc=0x2D61C data=0xDEADBEED
16816/1: attributes=DOOR_UNREF
16816/1: uniquifier=691
16816/1: 1.0438 door(6, 0xFFBFF300) = 0
16816/1: data_ptr=FF340000 data_size=254
16816/1: desc_ptr=0x0 desc_num=0
16816/1: rbuf=0xFF340000 rsize=25600
16816/1: 1.0439 brk(0x0029E738) = 0
16816/1: 1.0439 brk(0x002A0738) = 0
16816/1: 1.0440 so_socket(PF_INET, SOCK_DGRAM, IPPROTO_IP, 0x00000000, SOV_DEFAULT) Err#24 EMFILE
16816/1: 0x00000000: ""
16816/1: 1.0441 fstat(-1, 0xFFBFEA30) Err#9 EBADF
16816/1: 1.0441 open(0xFF25AEF4, 01) Err#24 EMFILE
16816/1: 0xFF25AEF4: "/dev/conslog"
16816/1: 1.0442 fcntl(-1, 2, 0x00000001) Err#9 EBADF
16816/1: 1.0442 fstat(-1, 0xFFBFEA30) Err#9 EBADF
16816/1: 1.0443 fstat(-1, 0xFFBFF490) Err#9 EBADF
16816/1: 1.0444 open(0xFF30BC0C, 0) Err#24 EMFILE
16816/1: 0xFF30BC0C: "/dev/udp"
16816/1: 1.0445 schedctl() = 0xFF36E000
16816/1: 1.0446 sigaction(0x0000000C, 0xFFBFF6C0, 0xFFBFF760) = 0
16816/1: new: hand = 0x00000001 mask = 0 0 0 0 flags = 0x0012
16816/1: old: hand = 0x00000000 mask = 0 0 0 0 flags = 0x0000
16816/1: 1.0447 labelsys(1) = 0
16816/1: 1.0447 sigaction(0x0000000C, 0xFFBFF6C0, 0xFFBFF760) = 0
16816/1: new: hand = 0x00000000 mask = 0 0 0 0 flags = 0x0012
16816/1: old: hand = 0x00000001 mask = 0 0 0 0 flags = 0x0000
16816/1: 1.0448 open(0x0002E748, 02) Err#24 EMFILE
16816/1: 0x0002E748: "/dev/udp"
16816/1: 1.0450 open(0x0002E760, 02) Err#24 EMFILE
16816/1: 0x0002E760: "/dev/tcp"
16816/1: 1.0450 open(0x0002E7A8, 02) Err#24 EMFILE
16816/1: 0x0002E7A8: "/dev/ticlts"
16816/1: 1.0451 open(0x0002E808, 02) Err#24 EMFILE
16816/1: 0x0002E808: "/dev/ticotsord"
16816/1: 1.0451 open(0x0002E850, 02) Err#24 EMFILE
16816/1: 0x0002E850: "/dev/ticots"
16816/1: 1.0453 _exit(1)
This fixed limit is not tunable via the file descriptor resource limit. Even after a higher resource limit is set, statd falls back to using just 256 descriptors. From the pfiles output we can see:
364: /usr/lib/nfs/statd
Current rlimit: 256 file descriptors
[...]
statd needs a tunable file descriptor resource limit.
From Wolfgang's comments:
suggested fix:
*** /tmp/geta115 Wed Oct 25 10:15:59 1995
--- sm_svc.c Tue Oct 24 16:28:19 1995
***************
*** 190,196 ****
--- 190,204 ----
if (ppid != 0) {
exit(0);
}
+
+ /*
+ * Set the limit on open files to a very high number, so
+ * that servers with lots of clients don't run out of file
+ * descriptors. Then close all currently open files.
+ */
getrlimit(RLIMIT_NOFILE, &rl);
+ rl.rlim_cur = rl.rlim_max;
+ setrlimit(RLIMIT_NOFILE, &rl);
for (t = 0; t < rl.rlim_max; t++)
(void) close(t);
If we look at the code of usr/src/cmd/fs.d/nfs/statd/sm_svc.c in Solaris 10, we don't see the suggested fix but rather this code:
[...]
	/* Set maxfdlimit current soft limit */
	rl.rlim_cur = MAX_FDS;
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
		syslog(LOG_ERR, "statd: unable to set RLIMIT_NOFILE to %d\n",
		    MAX_FDS);
[...]
And MAX_FDS is hardcoded to 256 in usr/src/cmd/fs.d/nfs/statd/sm_statd.h
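The effect of that assignment can be reproduced outside statd. The following is only a minimal sketch, not the statd source: clamp_fd_limit() is a hypothetical helper, and MAX_FDS is redefined locally with the value quoted above. It assumes the hard limit of the calling process is at least 256; otherwise setrlimit() fails.

```c
#include <sys/resource.h>

#define	MAX_FDS	256	/* assumption: same value as in sm_statd.h */

/*
 * Hypothetical helper: force the soft RLIMIT_NOFILE limit to MAX_FDS,
 * ignoring whatever soft limit the process inherited.
 * Returns the resulting soft limit, or -1 on error.
 */
long
clamp_fd_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	/* The inherited soft limit is overwritten unconditionally. */
	rl.rlim_cur = MAX_FDS;
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	/* Read the limit back, as pfiles would report it. */
	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);
	return ((long)rl.rlim_cur);
}
```

Whatever soft limit the parent configured, the process ends up with 256, which matches the "Current rlimit: 256 file descriptors" line in the pfiles output.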
CR# 1218695 is in state "fix delivered", but it seems that this fix never made it into the ON tree (only into cte_patch). The setrlimit() call with the fixed limit was introduced in SCCS rev 1.21 by the bugfix for CR# 1156444 (which, however, doesn't mention 1218695 or anything regarding the file descriptor limit).
A fix would be to use rl.rlim_max as the upper limit (instead of MAX_FDS) in the setrlimit call (as suggested by 1218695).
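As a rough illustration of that approach (a sketch under assumptions, not a tested patch against sm_svc.c), the soft limit can be raised to whatever hard limit the process inherited. raise_fd_limit() is a hypothetical name; a real change would presumably keep the existing syslog() error reporting.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_NOFILE limit to the
 * hard limit, so the administrator's resource limit is honoured.
 * Returns 0 on success, -1 on failure (errno is set by
 * getrlimit() or setrlimit()).
 */
int
raise_fd_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	/* Use the inherited hard limit instead of a fixed constant. */
	rl.rlim_cur = rl.rlim_max;
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
		return (-1);

	return (0);
}
```

With this, the number of descriptors available to the daemon follows the hard limit configured for the service or zone rather than a compile-time value.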
[...]
Entry 1 wolfgang.ley [2008-09-08 16:44]
oss-bite-size