OpenSolaris

Bug ID 6803137
Synopsis initiator panic during iSER test run, at mpxio enable reboot to enable multi-pathing
State 10-Fix Delivered (Fix available in build)
Category:Subcategory driver:iscsi
Keywords iser_depends | rtiq-regression | rtiq-reviewed
Responsible Engineer Andrew Rutz
Reported Against b_008
Duplicate Of
Introduced In solaris_nevada
Commit to Fix snv_110
Fixed In snv_110
Release Fixed solaris_nevada(snv_110) , solaris_10u8(s10u8_01) (Bug ID:2173124)
Related Bugs 6617068 , 6814476
Submit Date 9-February-2009
Last Update Date 14-September-2009
Description
initiator panic during iSER test run, at mpxio enable reboot to enable multi-pathing

panic on cpu 0
panic string:   BAD TRAP: type=d (#gp General protection) rp=ffffff00189e2910 addr=0
Work Around
N/A
Comments
The underlying problem is a bug in the logic of the outer loop of iscsi_sess_reportluns (see the evaluation). The REPORT LUNS data contains zero LUNs, which may have contributed to the problem.

> 0xffffff00189e2ad0-0x70 ::print struct uscsi_cmd
{
    uscsi_flags = 0x8
    uscsi_status = 0
    uscsi_timeout = 0x3c
    uscsi_cdb = 0xffffff00189e2a50
    uscsi_bufaddr = 0xffffff0468e488b0
    uscsi_buflen = 0x84
    uscsi_resid = 0x3c
    uscsi_cdblen = 0xc
    uscsi_rqlen = 0
    uscsi_rqstatus = 0
    uscsi_rqresid = 0
    uscsi_rqbuf = 0
    uscsi_path_instance = 0
}
> 0xffffff00189e2a50,12 ::dump -g 1
                   \/  1  2  3  4  5  6  7   8  9  a  b  c  d  e  f  v123456789abcdef
ffffff00189e2a50:  a0 00 00 00 00 00 00 00  00 84 00 00 00 00 ff ff  ................
ffffff00189e2a60:  08 00 00 00 00 00 3c 00  50 2a 9e 18 00 ff ff ff  ......<.P*......

*** Note that the initiator sets the SELECT REPORT field (CDB byte 2) to 0, which ANSI SPC defines as follows: "The list shall contain the logical units accessible to the I_T nexus .... If there are no logical units, the LUN LIST LENGTH field shall be zero".
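To make the CDB layout concrete, here is a minimal C sketch that decodes the fields of interest from the 12 CDB bytes in the dump: byte 0 is the operation code (0xA0 for REPORT LUNS), byte 2 is SELECT REPORT, and bytes 6-9 hold the big-endian ALLOCATION LENGTH. The macro, struct, and function names are hypothetical and are not taken from the iSCSI driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the value is the SPC REPORT LUNS operation code. */
#define RL_OPCODE_REPORT_LUNS	0xa0	/* byte 0 of the CDB */

/* Fields of interest in a 12-byte REPORT LUNS CDB. */
struct rl_cdb_fields {
	uint8_t		opcode;		/* byte 0 */
	uint8_t		select_report;	/* byte 2 */
	uint32_t	alloc_len;	/* bytes 6-9, big-endian */
};

/* Read a big-endian 32-bit value from p[0..3]. */
static uint32_t
rl_be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Decode the fields above; cdb must point to at least 12 bytes. */
static struct rl_cdb_fields
rl_decode_cdb(const uint8_t *cdb)
{
	struct rl_cdb_fields f;

	assert(cdb != NULL);
	f.opcode = cdb[0];
	f.select_report = cdb[2];
	f.alloc_len = rl_be32(&cdb[6]);
	return (f);
}
```

Fed the first 12 bytes of the dump at 0xffffff00189e2a50, rl_decode_cdb() yields opcode 0xa0, select_report 0, and alloc_len 0x84, which matches uscsi_buflen (0x84) in the uscsi_cmd above; uscsi_cdblen (0xc) confirms the 12-byte CDB.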

The data buffer shows zero LUNs (the LUN LIST LENGTH in the first four bytes is 0):

> 0xffffff0468e488b0,0x84 ::dump -g 1
                   \/  1  2  3  4  5  6  7   8  9  a  b  c  d  e  f  v123456789abcdef
ffffff0468e488b0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e488c0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e488d0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e488e0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e488f0:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e48900:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e48910:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e48920:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
ffffff0468e48930:  00 00 00 00 bb ca dd ba  fe ca dd ba fe ca dd ba  ................

which, according to the spec, is valid and simply means that no LUNs are mapped. Given the test case and the fact that the iscsi_sess_t shows several LUNs on its LUN list, I suspect that COMSTAR is not returning exactly the right data here, but in any case it should not cause us to panic. Local variables that I pulled from the stack appear consistent with the data above:

		for (lun_count = lun_start; lun_count < lun_total;
		    lun_count++) {

> 0xffffff00189e2ad0-0xb0/X  (lun_total)
0xffffff00189e2a20:             0
> 0xffffff00189e2ad0-0xa0/X  (lun_count)
0xffffff00189e2a30:             0           
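The response can be checked the same way. In the uscsi_cmd above, uscsi_buflen is 0x84 and uscsi_resid is 0x3c, so 0x84 - 0x3c = 0x48 (72) bytes were actually transferred, and the LUN LIST LENGTH in the first four bytes is 0. The minimal sketch below (hypothetical names, not driver code) computes how many 8-byte LUN entries can safely be read, bounding the count the target claims by the bytes it actually transferred. For the buffer above it returns 0, so a loop bounded by that value never touches an entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RL_HDR_LEN	8	/* LUN LIST LENGTH (4 bytes) + 4 reserved */
#define RL_ENTRY_LEN	8	/* each LUN entry is 8 bytes */

/* Read a big-endian 32-bit value from p[0..3]. */
static uint32_t
rl_be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Number of complete LUN entries that can safely be read from a
 * REPORT LUNS response.  buflen is the allocation length and resid
 * the residual count (bytes not transferred), as in struct uscsi_cmd.
 * The result is the smaller of the count the target claims
 * (LUN LIST LENGTH / 8) and the count that fits in the bytes actually
 * transferred; 0 is a valid answer.
 */
static uint32_t
rl_usable_lun_count(const uint8_t *buf, size_t buflen, size_t resid)
{
	size_t xfer;
	uint32_t claimed, fits;

	assert(buf != NULL);
	if (resid > buflen)
		return (0);		/* inconsistent residual */
	xfer = buflen - resid;
	if (xfer < RL_HDR_LEN)
		return (0);		/* not even a complete header */

	claimed = rl_be32(buf) / RL_ENTRY_LEN;
	fits = (uint32_t)((xfer - RL_HDR_LEN) / RL_ENTRY_LEN);
	return (claimed < fits ? claimed : fits);
}
```

Taking the minimum guards against both a target that claims more entries than it sent and one that sends trailing bytes beyond the list it reports.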

Given that the logic error in the code is clear from visual inspection, and that it has been present in the ON gate since build 107, I'm redispatching this to the iSCSI category.
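The evaluation describing the actual defect and fix is not part of this section, so the following is only a hypothetical sketch of how one iteration of a REPORT LUNS outer loop might be checked: if the reported list does not fit in the allocation length, retry with a larger buffer; if it fits, finish, treating a LUN LIST LENGTH of 0 as a valid, empty result. The enum, function names, and retry policy are assumptions, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RL_HDR_LEN	8	/* LUN LIST LENGTH (4 bytes) + 4 reserved */
#define RL_ENTRY_LEN	8	/* each LUN entry is 8 bytes */

/* Hypothetical outcome of one REPORT LUNS attempt. */
enum rl_action {
	RL_DONE,	/* complete list received; *nluns is valid (may be 0) */
	RL_RETRY,	/* list did not fit; reissue with *next_len bytes */
	RL_FAIL		/* short or inconsistent response */
};

/* Read a big-endian 32-bit value from p[0..3]. */
static uint32_t
rl_be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Check one attempt: buf/buflen is the data buffer and allocation
 * length, xfer the number of bytes the target actually returned.
 */
static enum rl_action
rl_check_attempt(const uint8_t *buf, size_t buflen, size_t xfer,
    uint32_t *nluns, size_t *next_len)
{
	uint32_t list_len;
	uint64_t need;

	assert(buf != NULL && nluns != NULL && next_len != NULL);
	if (xfer > buflen || xfer < RL_HDR_LEN)
		return (RL_FAIL);

	list_len = rl_be32(buf);
	need = (uint64_t)RL_HDR_LEN + list_len;
	if (need > buflen) {
		*next_len = (size_t)need;	/* grow and try again */
		return (RL_RETRY);
	}
	if (need > xfer)
		return (RL_FAIL);	/* target returned less than it claimed */

	/* An empty list (list_len == 0) is a normal result, not an error. */
	*nluns = list_len / RL_ENTRY_LEN;
	return (RL_DONE);
}
```

A caller would loop a bounded number of times, reissuing REPORT LUNS with next_len on RL_RETRY and stopping on RL_DONE or RL_FAIL. With the data captured here (72 bytes transferred, LUN LIST LENGTH 0), the first attempt returns RL_DONE with zero LUNs.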