OpenSolaris

Printable Version Enter a New Search
Bug ID 6513858
Synopsis Deleting large file while creating another on full UFS, spending lots of time in ufs_log_amt() loop.
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:ufs
Keywords ENOSPC | rtiq_reviewed | ufs-cleanup | ufs_delete | ufs_log_amt
Responsible Engineer Vidya Sakar
Reported Against snv_49 , solaris_9
Duplicate Of
Introduced In solaris_7
Commit to Fix snv_82
Fixed In snv_82
Release Fixed solaris_nevada(snv_82) , solaris_10u6(s10u6_01) (Bug ID:2161046) solaris_9(Bug ID:2161047,)
Related Bugs 6483587 , 6687760 , 6723423 , 6790471 , 5012326
Submit Date 17-January-2007
Last Update Date 31-January-2008
Description
When data is written to file and filling the file system (ufs), CU starts to delete older files which are fairly large (> 150GB). %sys time started to shot up, 'df -k' shown 100% full for hours. 

ufs_delete() thread appeared to be busy reserving log space, repeatedly spin in ufs_trans_trunc_resv() in a loop calling ufs_log_amt():

ufs:ufs_log_amt+0x134()
ufs:ufs_trans_trunc_resv+0x80()
ufs:ufs_trans_itrunc+0xb4()
ufs:ufs_delete+0x39c()
ufs:ufs_thread_delete+0xc4()
unix:thread_start+0x4()

### Background:
### ----------
### This issue came about when CU was using NBU to backup to
### a UFS file system. When the file system was full, NBU will
### start deleting old files (These are large, a few hundred
### GBs). CU started to observe high %sys time on one CPU and
### 'df -k' showing disk space remaining 100% full for a long
### time, though the directory entry was no longer showing the
### old files.

### CU has offline all CPUs and left with one running. Under the
### same above scenerio, disk space got freed up very fast and no
### high %sys time was observed.


### Details
### -------
### This can be consistently reproduce by filling up the FS 
### with a large file to, let's say 95% full. Then start
### writing a new file to it, when we hit ENOSPC, sleep for
### a short while, then retry again. When ENOSPC is reached,
### start delete the old large file.

### Immediately we will see the %sys time for the CPU where 
### ufs_delete() thread for the FS is running shooting up to
### 100%. lockstat profiling will show that the system is 
### calling ufs_log_amt() from the for() loop in ufs_trans_trunc_serv().

uts/common/fs/ufs/ufs_trans.c:

 914         nchunks = 1; 
 915         for (; (resv = ufs_log_amt(ip, offset, resid, 1)) > ufs_trans_max_resv; 
 916                 offset = length + (nchunks - 1) * resid) { 
 917                 nchunks++; 
 918                 resid = size / nchunks; 
 919         } 
### Test program based on CU source, cu_cp.c:

/* cu_cp.c */
#include <stdlib.h> 
#include <stdio.h> 
#include <signal.h> 
#include <sys/types.h> 
#include <sys/stat.h> 
#include <sys/lock.h> 
#include <unistd.h> 
#include <fcntl.h> 
#include <errno.h> 
 
 
int main(int argc, char *argv[]) 
{ 
        int f1,f2; 
        int  i,n; 
        /*char buf[8192];*/ 
        char buf[262144]; 
 
 
       if (plock(PROCLOCK) < 0) 
                perror("plock error"); 

 
        if (argc != 3) 
                printf("arg is not 2\n"); 
 
        f1 = open(argv[1], O_RDONLY|O_LARGEFILE); 
        if (f1 < 0){ 
                fprintf(stderr, "%s: open failed.\n", argv[1]); 
                exit(1); 
        } 

        f2 = open(argv[2], O_RDWR|O_CREAT|O_LARGEFILE); 
        if (f2 < 0){ 
                fprintf(stderr, "%s: open failed.\n", argv[2]); 
                exit(1); 
        } 
 
 
        for (;;) { 
                n = read(f1, buf, sizeof (buf)); 

                if (n == 0) 
                        return(0);
                if (n < 0) {
                        perror("Read error.");
                        (void) close(f1); 
                        (void) close(f2); 
                        return (1); 
                }


                while (write(f2, buf, n) == -1) { 
                        if (errno == ENOSPC) { 
                                fprintf(stderr, "write() ENOSPC. sleep/retry.\n"
);
                                sleep (1); 
                                continue; 
                        }  
                        perror("Write error."); 
                        (void) close(f1); 
                        (void) close(f2); 
                        return (1); 
                } /* while() */
                fprintf(stderr, "write() ok.\n");

        } /* for() */
 
} 
/* end of cu_cp.c */

### In my test scenario in the lab, I was deleting an old 130GB 
### test_b.out file, while cu_cp was creating a test_a.out file.

### stack dump on ufs_delete() thread for the FS is showing
### in ufs_trans_trunc_resv() all the time.

-- ktl0 regs data  rp: 0x2a1003856e0
pc:  0x11b061c ufs:ufs_log_amt+0x100:   mulx    %g3, %l3, %g3
npc: 0x11b0620 ufs:ufs_log_amt+0x104:   divx      %g3, %l3, %g3
  global:                       %g1          0x11c0a4c
        %g2             0x1fff  %g3                0x1
        %g4              0x7ff  %g5               0x3d
        %g6               0x10  %g7      0x2a100385d40
  out:  %o0          0x102bb0f  %o1              0x800
        %o2                  0  %o3             0x2000
        %o4      0x30002c64000  %o5          0x102bb0f
        %sp      0x2a100384f81  %o7               0x3c
  loc:  %l0                0x1  %l1             0x2080  
        %l2             0x4600  %l3             0x2000  
        %l4                0x2  %l5 0xfffffffffffffca0  
        %l6                  0  %l7      0x2a100385790  
  in:   %i0      0x30004e72e80  %i1       0x2057605feb  
        %i2            0x76e71  %i3                0x1  
        %i4                  0  %i5                  0  
        %fp      0x2a100385031  %i7          0x11b0788  
<trap>ufs:ufs_log_amt+0x100(0x30004e72e80, 0x2057605feb, 0x76e71, 0x1, 0x0, 0x0)
ufs:ufs_trans_trunc_resv+0x80(0x30004e72e80, 0x0, 0x2a100385998, 0x2a100385990, 0x8, 0x8)
ufs:ufs_trans_itrunc+0xb4(0x30004e72e80, 0x0, 0x1)
ufs:ufs_delete+0x39c(0x3000004f888, 0x30004e72e80, 0x1, 0x14a8308, 0x2a100385d40, 0x0)
ufs:ufs_thread_delete+0xc4(0x3000027d2c8, 0x0, 0x300010faa28, 0x1438788, 0x16, 0x0)
unix:thread_start+0x4()
-- end of kernel thread's stack --

### At the same time, df -k is showing that the FS is at 100% capacity,
### and %sys going 100%.

bash-2.05# df -k /b
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c1t13d0s2   141185027 141184553       0   100%    /b

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0 2630    60   56  124    3    0    0    0    62    0   4   0  96
  1    0   0    3   215  112    0    0    0    0    0     0    0 100   0   0

bash-2.05# lockstat -D 20 -kgIW sleep 10

Profiling interrupt: 1939 events in 9.998 seconds (194 events/sec)

Count genr cuml rcnt     nsec Hottest CPU+PIL        Caller                  
-------------------------------------------------------------------------------
 1939 100% ---- 1.00      496 cpu0                   lockstat_intr           
 1939 100% ---- 1.00      496 cpu0                   cyclic_fire             
 1939 100% ---- 1.00      496 cpu0                   cbe_level14             
 1939 100% ---- 1.00      496 cpu0                   current_thread          
 1206  62% ---- 1.00      547 cpu[0]                 idle                    
 1059  55% ---- 1.00      442 cpu0                   ufs_trans_trunc_resv    
  980  51% ---- 1.00      553 cpu[0]                 disp_getwork            
  968  50% ---- 1.00      444 cpu0                   ufs_trans_itrunc        
  968  50% ---- 1.00      444 cpu0                   ufs_thread_delete       
  968  50% ---- 1.00      444 cpu0                   ufs_delete              
  875  45% ---- 1.00      445 cpu0                   ufs_log_amt             
  217  11% ---- 1.00      548 cpu[0]                 i_ddi_splx              
  173   9% ---- 1.00      566 cpu[0]                 i_ddi_splhigh           

### It was observed we were not entirely just spinning in the for() loop
### as from the test cu_cp.c program, we can see that disk space did
### get released slowly.
...
write() ok.
write() ok.
write() ok.
write() ok.
write() ENOSPC. sleep/retry.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ok.
write() ENOSPC. sleep/retry.
write() ok.
...

### I was not able to reproduce this on S10. And this issue
### can only be observed on a multi CPU machine. If the test
### is run on a single CPU machine, disk space got released
### very fast.

### A similar CR# to this CR#6483587. However, in our case,
### if waited long enough (a few hours for a 130GB file deletion),
### ufs_trans_itrunc() will eventually gone into completion.
Work Around
- Offline all CPUs except one. Not really a practical suggestion for the CU.
- Disable ufs logging.

*** (#1 of 1): [ UNSAVED ] sang- xxxxx@xxxxx.com
do not run a UFS file system constantly nearly full in production.
Comments
N/A