Hello everyone. I'm not very experienced with iSCSI, so please excuse
me if I ask stupid questions.
I'm evaluating openSUSE 11 (Beta1) for my servers and have found some
worrying errors in the log. The following error shows up quite
frequently:
May 5 21:15:05 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4626267449, last ping 4626264949, now 4626269949
May 5 21:15:05 manjula klogd: connection1:0: detected conn error (1011)
May 5 21:15:05 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May 5 21:15:09 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
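The counters in the ping-timeout message are kernel tick (jiffies) values, so subtracting them shows how long the connection had been quiet. The Python sketch below is only an illustration: it reads "last rx" as the time data was last received and "last ping" as the time the last ping (NOP-Out) was sent. Any conversion to seconds relies on an assumed HZ value, which would have to match CONFIG_HZ in the running kernel's config.

```python
import re
from typing import Optional

# Pattern for the ping-timeout message printed by the kernel (as quoted above).
PING_RE = re.compile(
    r"ping timeout of (\d+) secs expired, "
    r"last rx (\d+), last ping (\d+), now (\d+)"
)


def ping_deltas(line: str, hz: Optional[int] = None) -> dict:
    """Return tick differences for one ping-timeout log line.

    If hz is given, the differences are also converted to seconds.
    Assumption: hz must equal the running kernel's CONFIG_HZ.
    """
    m = PING_RE.search(line)
    if m is None:
        raise ValueError("not a ping-timeout line")
    timeout, last_rx, last_ping, now = map(int, m.groups())
    result = {
        "timeout_secs": timeout,
        "since_rx": now - last_rx,      # ticks since data was last received
        "since_ping": now - last_ping,  # ticks since the last ping was sent
    }
    if hz:
        result["since_rx_secs"] = result["since_rx"] / hz
        result["since_ping_secs"] = result["since_ping"] / hz
    return result


line = ("May 5 21:15:05 manjula klogd: connection1:0: ping timeout of 5 secs "
        "expired, last rx 4626267449, last ping 4626264949, now 4626269949")
print(ping_deltas(line))
# {'timeout_secs': 5, 'since_rx': 2500, 'since_ping': 5000}
```

For this line the sketch reports 2500 ticks since the last receive and 5000 ticks since the last ping. What that means in seconds depends on the kernel's HZ setting.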
The target is the infamous Promise m500i, and the initiator reports "transport
class version 2.0-869. iscsid version 2.0-868".
NIC: Broadcom NetXtreme BCM5721 Gigabit Ethernet PCI Express (tg3)
Kernel: 2.6.25-rc9-17-default x86_64
There is really no load on this system yet. I used a sniffer to record
the traffic, and it seems the target all of a sudden sends a reset
(frame 65). I'm not sure whether this happens at the same time as the
error is printed to the log. I have uploaded the capture file here:
http://www.wehay.com/iscsi.cap.gz
Now to my first stupid question. The target is equipped with two 1 Gb NICs
with addresses 192.168.43.5 and .6. When I ran discovery I pointed it at only
one of these addresses, yet both were found and used (magic). I also noticed
that every LUN is registered twice (sdc and sdd for LUN 1, etc.). Is the
reason for this performance, failover, or perhaps both? I spotted the
following setting on the target: "Max Connections 1" (non-configurable), and
thought this reset might be related to it. I have left all configuration at
its defaults, changing only the login information.
Mike Christie wrote:
> swejis wrote:
>> May 5 21:15:05 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4626267449, last ping 4626264949, now 4626269949
>> May 5 21:15:05 manjula klogd: connection1:0: detected conn error (1011)
>> May 5 21:15:05 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
>> May 5 21:15:09 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
> Could you send me the /boot/config-2.6.whatever-kernel-version-you-are-using
Could you also apply the attached patch and send the output? It should look similar to the above, but I added one more value to the printout. Thanks.
Mike Christie wrote:
> Now to my first stupid question. The target is equipped with two 1 Gb NICs
> with addresses 192.168.43.5 and .6. When I ran discovery I pointed it at only
> one of these addresses, yet both were found and used (magic). I also noticed
> that every LUN is registered twice (sdc and sdd
This is expected. Normally, if you do discovery to one port, the target will tell you about all of its ports.
The multiple sd entries are also expected. The SCSI layer does not know or care whether it can see the same LU through different paths, so it creates an sd device every time it sees it. You can then use multipath to assemble all the sds for a LU into one multipath device.
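Conceptually, multipath puts every sd device that reports the same LU identifier into one group. The toy Python sketch below shows only that grouping step. The identifiers are hypothetical placeholders (real multipath reads a WWID from each device, which this sketch does not do), and sde/sdf are invented for illustration.

```python
from collections import defaultdict


def group_paths(path_ids: dict[str, str]) -> dict[str, list[str]]:
    """Group sd device names that report the same LU identifier."""
    groups = defaultdict(list)
    for dev, lu_id in sorted(path_ids.items()):
        groups[lu_id].append(dev)
    return dict(groups)


# Hypothetical identifiers: sdc/sdd are the two paths to LUN 1 from the post;
# sde/sdf and the "LU-n" strings are made-up placeholders.
paths = {"sdc": "LU-1", "sdd": "LU-1", "sde": "LU-2", "sdf": "LU-2"}
print(group_paths(paths))  # {'LU-1': ['sdc', 'sdd'], 'LU-2': ['sde', 'sdf']}
```

Each group would then correspond to one multipath device, with the individual sds as its paths.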
> for LUN 1, etc.). Is the reason for this performance, failover, or
> perhaps both? I spotted the following setting on the target: "Max
> Connections 1" (non-configurable), and thought this reset might be
> related to it. I have left all configuration at its defaults, changing
> only the login information.
Max Connections is not used, so do not worry about it.
Mike Christie wrote:
> swejis wrote:
>> May 5 21:15:05 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4626267449, last ping 4626264949, now 4626269949
> It looks like the nop code is broken. If you use
May 6 08:52:22 manjula iscsid: connection2:0 is operational now
May 6 08:52:22 manjula iscsid: connection1:0 is operational now
Should the above changes instead be:
node.conn[1].xxx
node.conn[2].xxx
I still see the same errors though:
May 6 09:07:06 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4636947795, last ping 4636942795, now 4636950295
May 6 09:07:06 manjula klogd: connection1:0: detected conn error (1011)
May 6 09:07:07 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May 6 09:07:10 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May 6 09:08:11 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4636964045, last ping 4636960295, now 4636966545
May 6 09:08:11 manjula klogd: connection1:0: detected conn error (1011)
May 6 09:08:12 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May 6 09:08:15 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May 6 09:08:41 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4636971545, last ping 4636970295, now 4636974045
May 6 09:08:41 manjula klogd: connection1:0: detected conn error (1011)
May 6 09:08:42 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May 6 09:08:45 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May 6 09:09:49 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4636988378, last ping 4636977795, now 4636990878
May 6 09:09:49 manjula klogd: connection1:0: detected conn error (1011)
May 6 09:09:49 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May 6 09:09:52 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May 6 09:10:36 manjula klogd: connection1:0: ping timeout of 5 secs expired, last rx 4637000296, last ping 4637000296, now 4637002796
May 6 09:10:36 manjula klogd: connection1:0: detected conn error (1011)
May 6 09:10:37 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May 6 09:10:40 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
There seems to be about one minute between failures. One
observation: it's always connection1:0 that fails.
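To put a number on "about one minute", here is a small Python sketch that computes the spacing between the five ping-timeout messages in the May 6 log. It is illustrative only and assumes all timestamps fall on the same day, which holds for this excerpt.

```python
from datetime import datetime

# Timestamps of the five "ping timeout" messages in the May 6 log.
STAMPS = ["09:07:06", "09:08:11", "09:08:41", "09:09:49", "09:10:36"]


def gaps_in_seconds(stamps: list[str]) -> list[int]:
    """Seconds between consecutive HH:MM:SS timestamps from the same day."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_in_seconds(STAMPS))  # [65, 30, 68, 47]
```

The gaps range from 30 to 68 seconds, so the failures recur roughly, but not strictly, once a minute.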
Both target ports are connected to the same switch.
Furthermore, are there supposed to be as many processes as this?
Build it with:
make DEBUG_SCSI=1
make DEBUG_SCSI=1 install
> Furthermore, are there supposed to be as many processes as this?
You are going to get a scsi_eh, a scsi_wq, and an iscsi_scan thread per session/target. Some targets create a separate target per device/LU/LUN, and in that case you would see a lot of them. If you run iscsiadm -m session and send the output, we can see what is up.
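The arithmetic behind "a lot" can be sketched roughly in Python. This is an illustration only. It assumes the initiator logs in once through each target port for each target, and that there are the three per-session threads named above. The four-LU case is hypothetical.

```python
def expected_threads(portals: int, targets: int, per_session: int = 3) -> int:
    """Rough thread estimate: scsi_eh, scsi_wq and iscsi_scan per session.

    Assumption: the initiator logs in once through each portal for each target.
    """
    sessions = portals * targets
    return sessions * per_session


# Two target ports, one target (connection1:0 and connection2:0 in the logs).
print(expected_threads(portals=2, targets=1))  # 6
# Hypothetical target-per-LU setup with four LUs behind the same two ports.
print(expected_threads(portals=2, targets=4))  # 24
```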