OpenSolaris

Bug ID 6736845
Synopsis ISCSI Core file Generated along with I/O failure when ISCSI I/Os were in progress to multiple LUNS
State 10-Fix Delivered (Fix available in build)
Category:Subcategory storage_target:iscsi
Keywords
Responsible Engineer Tim Szeto
Reported Against fw_34
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_98
Fixed In snv_98
Release Fixed solaris_nevada(snv_98) , solaris_10u7(s10u7_04) (Bug ID:2168014)
Related Bugs
Submit Date 13-August-2008
Last Update Date 27-November-2008
Description
An iscsitgtd core file was generated, along with I/O failures, while iSCSI I/Os were in progress to multiple LUNs from a SUSE Linux client to an Iwashi NAS appliance:-

akash-ar# mdb core.iscsitgtd.102600
Loading modules: [ libumem.so.1 libc.so.1 libuutil.so.1 libavl.so.1 libtopo.so.1 libnvpair.so.1 ld.so.1 ]
> ::stack
libc.so.1`mutex_lock_impl+0x20()
libc.so.1`mutex_lock+0x3d()
queue_message_set+0x60()
sess_process+0x2b4()
libc.so.1`_thr_setup+0x89()
libc.so.1`_lwp_start()
> ::status
debugging core file of iscsitgtd (64-bit) from akash-ar
file: /usr/sbin/amd64/iscsitgtd
initial argv: /usr/sbin/iscsitgtd
threading model: native threads
status: process terminated by SIGSEGV (Segmentation Fault)
> ::quit


When the core file was generated, GRITS I/Os were in progress from the SUSE Linux client to the Iwashi NAS appliance:-

GRITS config file:-

/tmp/test1 write/verify     128k    900M        AA
/tmp/test2 write/verify     128k    900M        BB
/tmp/test3 write/verify     128k    900M        CC
/tmp/test4 write/verify     128k    900M        DD
/tmp/test5 write/verify     128k    900M        EE


NAS Appliance used for testing:-
akash-ar.central.sun.com
IP - 10.9.161.25
Iwashi
fw_34


Linux Client used for testing:-
frederic-ar.central.sun.com
IP - 10.9.160.75
Linux frederic-ar 2.6.16.46-0.12-smp #1 SMP Thu May 17 14:00:09 UTC 2007 x86_64 x86_64 x86_64 GNU/Linux


iSCSI initiator used for testing:-
Open-iSCSI
Static LUN discovery was used for testing.
CHAP and RADIUS were not used.
Only one initiator, i.e. SUSE Linux, was used for testing.


umem debugging was already enabled on the NAS appliance using the following procedure:-

MEM Testing & SMF

To enable default umem_debug testing, invoke the following:

    # svccfg -s iscsitgt setenv LD_PRELOAD libumem.so
    # svccfg -s iscsitgt setenv UMEM_LOGGING transaction,contents
    # svccfg -s iscsitgt setenv UMEM_DEBUG default
    # svcadm refresh iscsitgt
    # svcadm restart iscsitgt

To check whether the environment variables are set:

    # pargs -e `pgrep -f iscsitgt` | grep -i umem
    envp[0]: LD_PRELOAD=libumem.so
    envp[7]: UMEM_DEBUG=default
[wdp, 8/13/08]

Build fw_34 contains all of the iSCSI fixes up to snv_95 plus 6729590.
Disassembly at sess_process+0x2b4 shows the queue_message_set is for
msg_pthread_join (x18) when processing the msg_shutdown after the call
to t10_handle_disable.  From the core file, the value of s_mgmtq passed
to queue_message_set is 0xdeadbeefdeadbeef.

Reproduced the failure locally. Using DTrace, the captured code flow
was:
                                                                  
  queue_message_set entry with q = 0x4f2f50, called from sess_from_t10
  sess_from_t10 return
    [ ... ]
  queue_message_set entry with q = deadbeefdeadbeef, called from sess_process

When sess_from_t10 exits, it frees iscsi_sess_t *s. This is the same
pointer that was passed into sess_process as its iscsi_sess_t *s, and
it is the source of s_mgmtq.
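
The deadbeef value is consistent with a read from freed memory: with
UMEM_DEBUG enabled, libumem fills freed buffers with the 0xdeadbeef
pattern, so a 64-bit pointer loaded from a freed iscsi_sess_t reads as
deadbeefdeadbeef. A minimal sketch of that effect follows. The helper
names poison_fill and read_u64 are hypothetical, used only for
illustration, and are not part of libumem or iscsitgtd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit pattern that libumem writes into freed buffers in debug mode. */
#define POISON32 0xdeadbeefU

/* Hypothetical helper: fill a buffer with the repeating poison pattern. */
static void
poison_fill(void *buf, size_t len)
{
	uint32_t pat = POISON32;
	unsigned char *p = buf;
	size_t i;

	for (i = 0; i + sizeof (pat) <= len; i += sizeof (pat))
		memcpy(p + i, &pat, sizeof (pat));
}

/* Hypothetical helper: load a pointer-sized value from a poisoned buffer. */
static uint64_t
read_u64(const void *buf)
{
	uint64_t v;

	memcpy(&v, buf, sizeof (v));
	return (v);
}
```

Reading any 8-byte-aligned field of a buffer filled this way gives
0xdeadbeefdeadbeef, which matches the q value DTrace captured on entry
to queue_message_set from sess_process.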

The code in sess_process:

queue_message_set(s->s_t10q, 0,
    msg_shutdown_rsp, 0);
process = False;
queue_message_set(s->s_mgmtq, 0,
    msg_pthread_join,
    (void *)(uintptr_t)pthread_self());

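Posting the msg_shutdown_rsp message to s->s_t10q causes the
sess_from_t10 thread to run, do its cleanup (including freeing s), and
exit. The second queue_message_set call then reads s->s_mgmtq from
memory that may already have been freed.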

Re-order the code in sess_process so that all session processing and 
iscsi_sess_t struct accesses are complete before posting the 
msg_shutdown_rsp.
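
One possible shape of that re-ordering is sketched below. This is not
the delivered fix. The types sess_t and queue_t, the function
queue_post, and the function sess_shutdown_reordered are hypothetical
stand-ins for iscsi_sess_t, the target's queue type, queue_message_set,
and the shutdown branch of sess_process. The peer thread is simulated
in-line: posting the shutdown response immediately poisons the session,
as a free under umem debugging would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the iscsitgtd types; not the real definitions. */
typedef enum { MSG_NONE, MSG_SHUTDOWN_RSP, MSG_PTHREAD_JOIN } msg_type_t;

typedef struct sess sess_t;

typedef struct queue {
	msg_type_t last_msg;	/* most recent message posted to this queue */
	sess_t *owner;		/* session torn down on a shutdown response */
} queue_t;

struct sess {
	queue_t *s_t10q;
	queue_t *s_mgmtq;
};

/* Simulated post: a shutdown response makes the "peer" free the session. */
static void
queue_post(queue_t *q, msg_type_t msg)
{
	q->last_msg = msg;
	if (msg == MSG_SHUTDOWN_RSP && q->owner != NULL) {
		/* Stand-in for free(): overwrite the session with poison. */
		memset(q->owner, 0xde, sizeof (*q->owner));
	}
}

/* Finish every access to *s first, then post the shutdown response last. */
static void
sess_shutdown_reordered(sess_t *s)
{
	queue_t *t10q = s->s_t10q;	/* copy while *s is still valid */
	queue_t *mgmtq = s->s_mgmtq;

	queue_post(mgmtq, MSG_PTHREAD_JOIN);
	queue_post(t10q, MSG_SHUTDOWN_RSP);	/* do not touch *s after this */
}
```

In this sketch the management queue still receives its join message
even though the session is overwritten as soon as the shutdown
response is posted, because nothing dereferences s after that point.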
Work Around
N/A
Comments
N/A