Hello, I'm interested to see if anybody has a working configuration for the open-iscsi initiator with a Wasabi Storage Builder target.
We have a few Wasabi Storage Builder 1.6 targets, which we've used for years with linux-iscsi and core-iscsi initiators. Recently we decided to repurpose them, and use a newer OS for the initiator role. Unfortunately, we can't seem to get it going properly.
The targets are detected correctly, we're able to see the target LUNs, but any attempt to write more than a few megabytes of data fails. Initially the target would show us:
--- thread main:target.c:kq_conn:1610: ***ERROR*** Detected packet with illegal packet size 16777215 (max is 8192) from iqn.2008-06.draugluin,i,00023d060000 to iqn.2000-05.com.wasabisystems.storagebuilder:iscsi-6-0,t,0032 cid 0 PDU was number 77 on the connection, and started at byte 12352 ---
Setting the following option helped a little bit, but now we get kernel BUG errors:
node.conn[0].iscsi.MaxRecvDataSegmentLength = 8192
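For context on the target's error above: 16777215 is 0xFFFFFF, the largest value that fits in the 24-bit DataSegmentLength field of an iSCSI Basic Header Segment (RFC 3720). The sketch below only illustrates how that field is decoded and compared against a limit such as 8192. The function names and the sample header are hypothetical, and nothing here is taken from the Wasabi or open-iscsi code.

```python
BHS_LEN = 48  # an iSCSI Basic Header Segment is 48 bytes (RFC 3720)


def data_segment_length(bhs: bytes) -> int:
    """Decode the 24-bit big-endian DataSegmentLength from bytes 5..7 of a BHS."""
    if len(bhs) < BHS_LEN:
        raise ValueError("expected a full 48-byte Basic Header Segment")
    return int.from_bytes(bhs[5:8], "big")


def exceeds_limit(bhs: bytes, max_recv: int = 8192) -> bool:
    """True if the PDU announces more data than the receiver's limit."""
    return data_segment_length(bhs) > max_recv


# Hypothetical header for illustration: all three length bytes set to 0xFF.
header = bytearray(BHS_LEN)
header[5:8] = b"\xff\xff\xff"

print(data_segment_length(bytes(header)))  # 16777215
print(exceeds_limit(bytes(header)))        # True
```

A length of exactly 0xFFFFFF may mean the target read bytes that were not a real header (for example after losing track of PDU boundaries) rather than an actual 16 MB request, but that is a guess a packet trace would have to confirm.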
Logs on the initiator side initially showed:
---
Jun 26 11:06:09 draugluin iscsid: Nop-out timedout after 15 seconds on connection 2:0 state (3). Dropping session.
Jun 26 11:06:09 draugluin iscsid: Nop-out timedout after 15 seconds on connection 3:0 state (3). Dropping session.
Jun 26 11:06:09 draugluin iscsid: Nop-out timedout after 15 seconds on connection 1:0 state (3). Dropping session.
Jun 26 11:06:10 draugluin iscsid: connection3:0 is operational after recovery (1 attempts)
Jun 26 11:06:10 draugluin iscsid: connection2:0 is operational after recovery (1 attempts)
Jun 26 11:06:10 draugluin iscsid: connection1:0 is operational after recovery (1 attempts)
Jun 26 11:06:18 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.
Jun 26 11:06:19 draugluin iscsid: connection6:0 is operational after recovery (1 attempts)
Jun 26 11:06:25 draugluin iscsid: Nop-out timedout after 15 seconds on connection 5:0 state (3). Dropping session.
Jun 26 11:06:25 draugluin iscsid: connection5:0 is operational after recovery (1 attempts)
Jun 26 11:06:45 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.
Jun 26 11:06:45 draugluin iscsid: connection6:0 is operational after recovery (1 attempts)
---
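To see which connections are affected most, the Nop-out timeouts in a log like the one above can be tallied per connection. This is a minimal sketch that assumes the message format shown in these log lines. The helper name is hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the iscsid message format seen in the log excerpt above.
NOP_TIMEOUT = re.compile(
    r"Nop-out timedout after (\d+) seconds on connection (\d+:\d+)"
)


def count_nop_timeouts(lines):
    """Count Nop-out timeout messages per connection id (e.g. '6:0')."""
    counts = Counter()
    for line in lines:
        match = NOP_TIMEOUT.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = [
    "Jun 26 11:06:18 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.",
    "Jun 26 11:06:19 draugluin iscsid: connection6:0 is operational after recovery (1 attempts)",
    "Jun 26 11:06:25 draugluin iscsid: Nop-out timedout after 15 seconds on connection 5:0 state (3). Dropping session.",
    "Jun 26 11:06:45 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.",
]

print(count_nop_timeouts(sample).most_common())  # [('6:0', 2), ('5:0', 1)]
```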
Right now, writing any larger amount of data [more than a few kilobytes] results in:
---
Jun 26 14:55:33 draugluin kernel: =======================
Jun 26 14:55:33 draugluin kernel: BUG: workqueue leaked lock or atomic: scsi_wq_5/0x00000001/5316
Jun 26 14:55:33 draugluin kernel: last function: iscsi_xmitworker+0x0/0x584 [libiscsi]
Jun 26 14:55:33 draugluin kernel: [<c013229c>] run_workqueue+0xcf/0x114
Jun 26 14:55:33 draugluin kernel: [<c0132a5e>] worker_thread+0x0/0xca
Jun 26 14:55:33 draugluin kernel: [<c0132b1a>] worker_thread+0xbc/0xca
Jun 26 14:55:33 draugluin kernel: [<c01351b9>] autoremove_wake_function+0x0/0x33
Jun 26 14:55:33 draugluin kernel: [<c0132a5e>] worker_thread+0x0/0xca
Jun 26 14:55:33 draugluin kernel: [<c01350f2>] kthread+0x38/0x5e
Jun 26 14:55:33 draugluin kernel: [<c01350ba>] kthread+0x0/0x5e
Jun 26 14:55:33 draugluin kernel: [<c0106117>] kernel_thread_helper+0x7/0x10
Jun 26 14:55:33 draugluin kernel: =======================
---
We're using OpenSUSE 10.3 with open-iscsi-2.0.866-15.2 and 2.6.22.18-0.2-default SMP i686 kernel. [OpenSUSE 11 with open-iscsi-2.0.869-8.1 had another set of issues, we couldn't even login properly].
I would appreciate any suggestions as to how to get the initiator working with these targets.
Sincerely,
--
Dominik L. Borkowski - Senior Systems Administrator
Virginia Bioinformatics Institute - www.vbi.vt.edu
> We're using OpenSUSE 10.3 with open-iscsi-2.0.866-15.2 and 2.6.22.18-0.2-default SMP i686 kernel. [OpenSUSE 11 with open-iscsi-2.0.869-8.1 had another set of issues, we couldn't even login properly].
Weird. We had this guy that tested the heck out of wasabi, and we had not seen any bug reports about it for a long time. Now in this last week we have two or three bug reports on it.
Has wasabi done a firmware update or are you using an older firmware?
I am ccing another user who reported issues with wasabi the other day. Ken, what firmware are you using?
There definitely seems to be an issue with the initiator sending nops. For Ken's problem we turned them off and that fixed the flood of these:
Jun 26 11:06:18 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.
Jun 26 11:06:19 draugluin iscsid: connection6:0 is operational after recovery
and the slowdown that results.
But Ken still has a slowdown in IO.
Dominik, to turn off nops set
node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0
You can set that in iscsid.conf, then redo discovery to pick them up.
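As a rough illustration of the edit described above, the sketch below rewrites the two noop settings in the text of a config file. It assumes the plain `key = value` format used by the settings quoted in this thread. The helper name `set_option` is hypothetical, and the usual file location (commonly /etc/iscsi/iscsid.conf) is an assumption that may differ per distribution.

```python
import re


def set_option(conf_text: str, key: str, value: str) -> str:
    """Set `key = value` in config text, replacing active lines or appending."""
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    # Replace every uncommented occurrence; commented lines are left alone.
    new_text, replaced = pattern.subn(lambda _m: new_line, conf_text)
    if replaced == 0:
        # Key not present: append it, keeping the file newline-terminated.
        sep = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
        new_text = conf_text + sep + new_line + "\n"
    return new_text


# Hypothetical config excerpt, not a real default file.
conf = "node.conn[0].timeo.noop_out_interval = 10\n# other settings\n"
conf = set_option(conf, "node.conn[0].timeo.noop_out_interval", "0")
conf = set_option(conf, "node.conn[0].timeo.noop_out_timeout", "0")
print(conf)
```

Writing the result back to the real file and rerunning discovery, as described above, is left out so the sketch has no side effects.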
But I would like to get to the bottom of the problem.
Could you guys try
http://www.open-iscsi.org/bits/open-iscsi-2.0-869.2.tar.gz
with nops back on (set those noop values to 10 for testing)? Then could you send me a wireshark/ethereal trace? I do not need a lot of data, just from when you see the first
Jun 26 11:06:18 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.
Jun 26 11:06:19 draugluin iscsid: connection6:0 is operational after recovery
On Friday 27 June 2008 11:44:08 Mike Christie wrote:
> Dominik L. Borkowski wrote:
> > We're using OpenSUSE 10.3 with open-iscsi-2.0.866-15.2 and 2.6.22.18-0.2-default SMP i686 kernel. [OpenSUSE 11 with open-iscsi-2.0.869-8.1 had another set of issues, we couldn't even login properly].
> Weird. We had this guy that tested the heck out of wasabi, and we had not seen any bug reports about it for a long time. Now in this last week we have two or three bug reports on it.
> Has wasabi done a firmware update or are you using an older firmware?
We use the following firmware:
Current WSB iSCSI SAN v1.6.2.1 5565245 Wed Apr 5 01:04:03 UTC 2006
It hasn't been updated in a while, and I'm not sure what the latest firmware is. I'm contacting the vendor to see whether there is anything newer available.
Mike Christie wrote:
> Dominik L. Borkowski wrote:
>> We're using OpenSUSE 10.3 with open-iscsi-2.0.866-15.2 and 2.6.22.18-0.2-default SMP i686 kernel. [OpenSUSE 11 with open-iscsi-2.0.869-8.1 had another set of issues, we couldn't even login properly].
> Weird. We had this guy that tested the heck out of wasabi, and we had not seen any bug reports about it for a long time. Now in this last week we have two or three bug reports on it.
> Has wasabi done a firmware update or are you using an older firmware?
> I am ccing another user who reported issues with wasabi the other day. Ken, what firmware are you using?
WSB iSCSI SAN v2.3.0.1 7272641 Wed Apr 25 21:58:22 UTC 2007
I'll send a debug log or trace or both shortly.
The nops do slow things down, but the big issue seems to be the initiator system lockup when connecting to the Wasabi.
fwiw, I tried connecting to an IET target and I don't experience the system freeze/lockup problem.
> There definitely seems to be an issue with the initiator sending nops. For Ken's problem we turned them off and that fixed the flood of these:
> Jun 26 11:06:18 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.
> Jun 26 11:06:19 draugluin iscsid: connection6:0 is operational after recovery
> and the slowdown that results.
> But Ken still has a slowdown in IO.
> Dominik, to turn off nops set
> node.conn[0].timeo.noop_out_interval = 0
> node.conn[0].timeo.noop_out_timeout = 0
> You can set that in iscsid.conf, then redo discovery to pick them up.
> But I would like to get to the bottom of the problem.
> Could you guys try
> http://www.open-iscsi.org/bits/open-iscsi-2.0-869.2.tar.gz
> with nops back on (set those noop values to 10 for testing)? Then could you send me a wireshark/ethereal trace? I do not need a lot of data, just from when you see the first
> Jun 26 11:06:18 draugluin iscsid: Nop-out timedout after 15 seconds on connection 6:0 state (3). Dropping session.
> Jun 26 11:06:19 draugluin iscsid: connection6:0 is operational after recovery
Attached are the last 3438 lines before the initiator machine locked up and had to be hard rebooted. The attached log ends on the last line. The Wasabi didn't log anything other than a Login.
Using open-iscsi-2.0-869.2.tar.gz, built and installed with:
make DEBUG_SCSI=1 DEBUG_TCP=1
make DEBUG_SCSI=1 DEBUG_TCP=1 install
The iscsid.conf is the default with the noop options set to 10 sec.
Kernel 2.6.25.6-55.fc9.i686
Connecting to Wasabi Storage Builder WSB iSCSI SAN v2.3.0.1 7272641 Wed Apr 25 21:58:22 UTC 2007
In that archive I included tcpdump, sample script session of what commands were issued, initiator configs and the kernel logs.
Not sure if joining this thread with Ken's would be a good idea. Somehow I didn't notice his when I was searching the archive. However, a lot of his symptoms fit my problem, including the OS completely freezing.
Dominik L. Borkowski wrote:
> On Friday 27 June 2008 15:48:13 Mike Christie wrote:
>>> Would it be worth getting sniffer dump from the existing 2.0-866 initiator?
> In that archive I included tcpdump, sample script session of what commands were issued, initiator configs and the kernel logs.
> Not sure if joining this thread with Ken's would be a good idea. Somehow I didn't notice his, when I was searching the archive. However, a lot of his symptoms fit my problem, including the os completely freezing.
Jon France at Wasabi looked into the issue, and he thinks this is fixed in newer firmware. He asked you guys to contact Wasabi to get a firmware update.
On Tuesday 01 July 2008 12:20:32 Mike Christie wrote:
> Jon France at Wasabi looked into the issue, and he thinks this is fixed in newer firmware. He asked you guys to contact Wasabi to get a firmware update.
Yep, we've been in touch, just waiting to work out some basic details. We'll see how the latest firmware, 4.0.2, behaves with open-iscsi.