Initiator panic during iSER test run, on the reboot after enabling MPxIO multipathing
panic on cpu 0
panic string: BAD TRAP: type=d (#gp General protection) rp=ffffff00189e2910 addr=0
Work Around
N/A
Comments
The underlying problem is a bug in the logic of the outer loop in iscsi_sess_reportluns (see Evaluation). The REPORT LUNS data reports 0 LUNs, which may have contributed to the problem.
> 0xffffff00189e2ad0-0x70 ::print struct uscsi_cmd
{
uscsi_flags = 0x8
uscsi_status = 0
uscsi_timeout = 0x3c
uscsi_cdb = 0xffffff00189e2a50
uscsi_bufaddr = 0xffffff0468e488b0
uscsi_buflen = 0x84
uscsi_resid = 0x3c
uscsi_cdblen = 0xc
uscsi_rqlen = 0
uscsi_rqstatus = 0
uscsi_rqresid = 0
uscsi_rqbuf = 0
uscsi_path_instance = 0
}
> 0xffffff00189e2a50,12 ::dump -g 1
\/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
ffffff00189e2a50: a0 00 00 00 00 00 00 00 00 84 00 00 00 00 ff ff ................
ffffff00189e2a60: 08 00 00 00 00 00 3c 00 50 2a 9e 18 00 ff ff ff ......<.P*......
*** Note that the initiator sets SELECT REPORT (CDB byte 2) to 0, which ANSI SPC defines as follows: "The list shall contain the logical units accessible to the I_T nexus .... If there are no logical units, the LUN LIST LENGTH field shall be zero".
The data buffer shows 0 luns:
> 0xffffff0468e488b0,0x84 ::dump -g 1
\/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
ffffff0468e488b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e488c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e488d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e488e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e488f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e48900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e48910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e48920: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffffff0468e48930: 00 00 00 00 bb ca dd ba fe ca dd ba fe ca dd ba ................
which, according to the spec, is valid and simply means that no LUNs are mapped. Given the test case, and the fact that the iscsi_sess_t shows several LUNs on its LUN list, I suspect COMSTAR is not returning exactly the right data here; in any case, this should not cause us to panic. Local variables that I pulled off the stack appear consistent with the data above:
for (lun_count = lun_start; lun_count < lun_total;
lun_count++) {
> 0xffffff00189e2ad0-0xb0/X (lun total)
0xffffff00189e2a20: 0
> 0xffffff00189e2ad0-0xa0/X (lun count)
0xffffff00189e2a30: 0
Given that the logic error is clear from visual inspection of the code, and that it has been present in the ON gate since build 107, I'm redispatching this to the iSCSI category.