OpenSolaris

Bug ID 1259818
Synopsis kernel stack overflows during vmstress core dump
State 10-Fix Delivered (Fix available in build)
Category:Subcategory kernel:other
Keywords core | dump | kernel | nfs | overflow | overflows | rpc | stack | vmstress
Responsible Engineer Bryan Cantrill
Reported Against 5.6 , 5.5.1 , s297_12
Duplicate Of
Introduced In
Commit to Fix s297_36
Fixed In s297_36
Release Fixed solaris_2.6(s297_36)
Related Bugs 4044663 , 4172513 , 4256960 , 4261240 , 4300946 , 4409087 , 4705803 , 4143872
Submit Date 18-July-1996
Last Update Date 25-January-2006
Description
[07/18/96]

I was running vmstress. On some of its tests it dumps core. Dumping core
is more or less OK for vmstress according to its README file. But the core
dump causes a kernel stack overflow and panic. Here's the stack
trace:

{0} ok ctrace
PC: f005ac6c 
Last leaf: jmpl  f005b18c    from 100079a8  client_handler+38  
     0 w  %o0-%o5: (10000000 16 f0000000 1 3 1 )
 
call 10007970  client_handler        from 10041b00  p1275_sparc_cif_handler+20  
     1 w  %o0-%o5: (f005b18c 104077b8 0 0 3 10407704 )
 
call 10041ae0  p1275_sparc_cif_handler        from 1003e914  prom_enter_mon+34  
     2 w  %o0-%o5: (104077b8 1e e 10420248 0 10421a98 )
 
call 1003e8e0  prom_enter_mon        from 10025330  debug_enter+a0  
     3 w  %o0-%o5: (1042b400 0 0 100074bc 513c0580 3037c860 )
 
call 10025290  debug_enter        from 10024114  do_panic+180 
     4 w  %o0-%o5: (0 0 10006ed0 0 10006ed4 44 )
 
call 10023f94  do_panic        from 10023f84  panic+1c  
     5 w  %o0-%o5: (10406a18 ffffffc0 10413668 0 ffffffff 10408c00 )
 
call 10023f68  panic        from 100070ac  sys_tl1_panic+8   
     6 w  %o0-%o5: (104069d4 2 0 0 0 50ca0d90 )
 
call 10036610  splx        from 10046328  disp_getwork+c8  
     7 w  %o0-%o5: (a db 5 0 0 0 )
 
call 10046260  disp_getwork        from 10044434  disp+b0  
     8 w  %o0-%o5: (0 10423004 0 0 ffffffff 10421590 )
 
call 10044384  disp        from 10044770  swtch+124 
     9 w  %o0-%o5: (0 1041b290 1045dd68 0 ffffffff ffffffff )
 
call 1004464c  swtch        from 1006f54c  genunix:cv_timedwait_sig+28c 
     a w  %o0-%o5: (fcda4 513c0580 10421590 1045dd68 513c0580 535584e0 )
 
call 1006f2c0  genunix:cv_timedwait_sig        from 507536b0  rpcmod:clnt_cots_kcallit+5f4 
     b w  %o0-%o5: (5063067c 507b5b54 fd18c 10463c00 2001788c 0 )
 
jmpl  507530bc  rpcmod:clnt_cots_kcallit        from 51025db0  nfs:rfscall+384 
     c w  %o0-%o5: (50630670 5075cc18 5075b100 fcda4 50630660 50630660 )
 
{0} ok 3037c288 d8 + .stacktrace
call 510258bc  nfs:rfs3call     from 51036f2c  nfs:nfs3write_rpccall+a0  
 ( 506a99a8 7 5103f4a8 3037c568 5103f6e0 3037c4d0 )
call 51036e8c  nfs:nfs3write_rpccall     from 51037028  nfs:nfs3write+90  
 ( 506a99a8 3037c568 3037c4d0 507a37b0 52973028 506a99a8 )
call 51036f98  nfs:nfs3write     from 5103b348  nfs:nfs3_bio+3f8 
 ( 512d4af4 524fc000 0 8000 2000 507a37b0 )
call 5103af50  nfs:nfs3_bio     from 51036e30  nfs:nfs3_rdwrlbn+d0  
 ( 2000 3037c6b4 532dd3b4 510460c4 10464304 51048000 )
call 51036d60  nfs:nfs3_rdwrlbn     from 5103c594  nfs:nfs3_sync_putapage+24  
 ( 512d4af4 106fb800 0 8000 532dd398 10100 )
call 5103c570  nfs:nfs3_sync_putapage     from 5103c530  nfs:nfs3_putapage+378 
 ( 512d4af4 106fb800 0 8000 2000 10000 )
jmpl  0 from 100f3980  genunix:pvn_vplist_dirty+3ac 
 ( 512d4af4 106fb800 0 0 10000 507a37b0 )
call 100f35d4  genunix:pvn_vplist_dirty     from 51022c90  nfs:nfs_putpages+12c 
 ( 10463e0c 10456434 0 5103c1b8 10000 507a37b0 )
call 51022b64  nfs:nfs_putpages     from 5103c180  nfs:nfs3_putpage+a8  
 ( 512d4af4 0 0 2 512d4b3c 507a37b0 )
jmpl  10015ed4  gen_clk_int     from 51020288  nfs:nfs_purge_caches+98  
 ( 512d4af4 0 0 0 10000 507a37b0 )
call 510201f0  nfs:nfs_purge_caches     from 51020510  nfs:nfs_cache_check+f0  
 ( 512d4af4 507a37b0 0 0 512d4b3c 512d4ae8 )
call 51020420  nfs:nfs_cache_check     from 51021094  nfs:nfs3_getattr_otw+1a0 
 ( 512d4af4 1 1 0 24ab720 3037cb70 )
call 51020ef4  nfs:nfs3_getattr_otw     from 510201b4  nfs:nfs3_validate_caches+bc  
 ( 0 3037cc28 507a37b0 5103ec70 5103ec18 0 )
call 510200f8  nfs:nfs3_validate_caches     from 5103b608  nfs:nfs3_getpage+48  
 ( 512d4af4 507a37b0 512d4c28 512d4ae8 512d4b3c 512d4af4 )
jmpl  10015ed4  gen_clk_int     from 100a64e8  genunix:segmap_fault+160 
 ( 512d4af4 0 0 2000 3037cddc 3037cdd0 )
jmpl  10015ed4  gen_clk_int     from 100eed44  genunix:as_fault+400 
 ( 50733fc8 50975f98 40ffc000 2000 0 2 )
call 100ee944  genunix:as_fault     from 100304f8  pagefault+34  
 ( 50975f98 50975f98 2000 0 40ffc000 40ffc000 )
call 100304c4  pagefault     from 1002e328  trap+700 
 ( 40ffc000 0 2 1 1042a7a4 51611680 )
call 1002dc28  trap     from 10020690  sfmmu_tsb_miss+624 
 ( 3037d0c8 10000 4 1000bdcc 0 0 )
????  from 100074bc  prom_rtt+118 
 ( 1042c000 0 0 50733fc8 0 10574848 )
call 1000d3ac  bcopy+1528 from 1000d318  bcopy+1494 
 ( 7 3c 7 5613 2080 0 )
call 1000bd68  kcopy     from 10092984  genunix:uiomove+f0  
 ( 1241d4b8 40ffc638 4 1f8 1 10466548 )
call 10092894  genunix:uiomove     from 51022a54  nfs:writerp+1f8 
 ( 40ffc638 3037d5a0 1 1fc 3037d598 3037d5c0 )
call 5102285c  nfs:writerp     from 51036b54  nfs:nfs3_write+25c 
 ( 1fc 3037d5a0 0 638 1fc 40ffc638 )
jmpl  10015ed4  gen_clk_int     from 100f7cb4  genunix:vn_rdwr+c8  
 ( 1fc 0 40ffc000 fffffffd 506a99a8 512d4ae8 )
call 100f7bec  genunix:vn_rdwr     from 507ffe20  elfexec:elfnote+cc  
 ( 1 512d4af4 0 507a37b0 0 638 )
call 507ffd54  elfexec:elfnote     from 50800ccc  elfexec:write_old_elfnotes+1f0 
 ( 512d4af4 3037d814 1 507a37b0 53419af0 0 )
call 50800adc  elfexec:write_old_elfnotes     from 508001ec  elfexec:elfcore+39c 
 ( 0 51611568 513c0580 3037d814 0 7fffffff )
jmpl  10015ed4  gen_clk_int     from 10070068  genunix:core+258 
 ( 0 1 52af3040 0 20 52af3000 )
call 1006fe10  genunix:core     from 100b3060  genunix:psig+358 
 ( 1040b518 524e3ce8 507a37b0 0 7fffffff b )
call 100b2d08  genunix:psig     from 1002f4f4  trap_cleanup+1ec 
 ( 2 b 10460000 b 0 400 )
call 1002f308  trap_cleanup     from 1002f290  trap+1668 
 ( 3037dae8 6 3037da60 1 3 51611568 )
call 10023f68  panic     from 100070ac  sys_tl1_panic+8   
 ( 3037dae8 10000 5 148f0 524e3ce8 10000 )
XXXXXXX from 148bc 
 ( 8 0 76db46a8 2f788 12d008 14770 )

The Solaris per-thread kernel stack is 8K (actually less than 7K). The actual overflow happened as follows. We were almost out of stack while doing a context switch. The top of the stack was 3037dfff and we were already at 3037c0d8. We got a level 5 interrupt and, inside _sys_trap, bought a new window at TL=1. When it executed

have_win+30      stx         %l2, [%l7]

to store %tstate, we landed in the stack's red zone and then ended up in the tl1 panic code.

Dump of the window where have_win above was running.

{0} ok 104079f8 40 ldump
104079f8: 10044440  10044444  80001e04  10009ed4  ..D@..DD........
10407a08: 506a99a8  512d4c4c         0  3037bfd0  Pj..Q-LL....07..
10407a18:        0  10423004         0         0  .....B0.........
10407a28: ffffffff  10421590  10407a58  10044434  .....B...@zX..D4

As you can see, %l7 is 0x3037bfd0, i.e. in the stack's red zone.
Work Around
[ali 4/22/97]:

The immediate problem can be worked around by increasing lwp_default_stksize.
This limits the number of kernel threads the system can support, though.
Comments
N/A