Connection Errors
From: swejis <jo...@wehay.com>
Date: Mon, 5 May 2008 13:00:55 -0700 (PDT)
Local: Mon, May 5 2008 4:00 pm
Subject: Connection Errors
Hello everyone. I'm not very experienced with iscsi, so please excuse
me if I ask stupid questions.

I'm evaluating openSUSE 11 (Beta1) for my servers and have found some
worrying errors in the log. The following error shows up quite
frequently.

May  5 21:15:05 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4626267449, last ping 4626264949, now 4626269949
May  5 21:15:05 manjula klogd:  connection1:0: detected conn error (1011)
May  5 21:15:05 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  5 21:15:09 manjula iscsid: connection1:0 is operational after recovery (1 attempts)

The target is the infamous Promise m500i and the initiator reports
"transport class version 2.0-869. iscsid version 2.0-868".
NIC: Broadcom NetXtreme BCM5721 Gigabit Ethernet PCI Express (tg3)
Kernel: 2.6.25-rc9-17-default x86_64

There is really no load on this system yet. I used a sniffer to record
the traffic, and it seems the target "all of a sudden" sends a reset
(frame 65). I'm not sure whether this happens at the same time as the
error printed to the log. I have uploaded the capture file here:
http://www.wehay.com/iscsi.cap.gz

Now to my first stupid question. The target is equipped with 2 x 1 Gb
NICs with addresses 192.168.43.5 and 192.168.43.6. When I did the
discovery I only pointed to one of these addresses, yet both were found
and used (magic). I also noticed that every LUN is registered twice
(sdc and sdd for lun1, etc.). Is the reason for this performance,
failover, or perhaps both? I've spotted the following setting in the
target: "Max Connections 1" (non-configurable), and was thinking this
reset might be related to it. I have left all configuration at its
defaults, only changing the login information.

Config file: http://www.wehay.com/iscsid.conf

I have several older machines accessing the same target without any
problems.

Any ideas on what to do next would be most appreciated.
Brgds Jonas Israelsson


From: Mike Christie <micha...@cs.wisc.edu>
Date: Mon, 05 May 2008 15:46:12 -0500
Local: Mon, May 5 2008 4:46 pm
Subject: Re: Connection Errors

Could you send me the /boot/config-2.6.whatever-kernel-version-you-are-using file?

From: Mike Christie <micha...@cs.wisc.edu>
Date: Mon, 05 May 2008 15:57:59 -0500
Local: Mon, May 5 2008 4:57 pm
Subject: Re: Connection Errors

Could you also do

cat /sys/class/iscsi_connection/connection1:0/recv_tmo
cat /sys/class/iscsi_connection/connection1:0/ping_tmo

and send those values.


From: Mike Christie <micha...@cs.wisc.edu>
Date: Mon, 05 May 2008 16:11:35 -0500
Local: Mon, May 5 2008 5:11 pm
Subject: Re: Connection Errors

Could you also use the attached patch, and send the output? It should
look similar to the above, but I added one more value to be printed out.
Thanks.

[attachment: print-total-recv-tmo.patch]

From: swejis <jo...@wehay.com>
Date: Mon, 5 May 2008 15:08:28 -0700 (PDT)
Local: Mon, May 5 2008 6:08 pm
Subject: Re: Connection Errors
From: swejis <jo...@wehay.com>
Date: Mon, 5 May 2008 15:09:56 -0700 (PDT)
Local: Mon, May 5 2008 6:09 pm
Subject: Re: Connection Errors
# cat /sys/class/iscsi_connection/connection1:0/recv_tmo
5
# cat /sys/class/iscsi_connection/connection1:0/ping_tmo
5

From: Mike Christie <micha...@cs.wisc.edu>
Date: Mon, 05 May 2008 17:47:18 -0500
Local: Mon, May 5 2008 6:47 pm
Subject: Re: Connection Errors

Ignore that. I forgot suse uses HZ=250.
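
For readers following the HZ remark: the "last rx", "last ping" and "now" fields in the klogd line look like raw jiffies counters, so turning them into seconds depends on the kernel's HZ value. Below is a minimal sketch of that arithmetic, assuming the fields really are jiffies and that HZ=250 as stated above. The helper name jiffies_to_secs is made up for illustration.

```shell
#!/bin/sh
# Convert a jiffies delta to whole seconds for a given HZ.
# Assumption: the log fields are jiffies; HZ=250 is taken from the thread.
jiffies_to_secs() {
    # $1 = jiffies delta, $2 = HZ
    echo $(( $1 / $2 ))
}

# Values copied from the first log line in this thread.
last_rx=4626267449
last_ping=4626264949
now=4626269949

echo "since last rx:   $(( now - last_rx )) jiffies = $(jiffies_to_secs $(( now - last_rx )) 250) s at HZ=250"
echo "since last ping: $(( now - last_ping )) jiffies = $(jiffies_to_secs $(( now - last_ping )) 250) s at HZ=250"
```

Under those assumptions the deltas are 2500 and 5000 jiffies, which is 10 and 20 seconds at HZ=250. The same numbers would read as 2.5 and 5 seconds on an HZ=1000 kernel, so the HZ value matters when interpreting the log.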

From: Mike Christie <micha...@cs.wisc.edu>
Date: Mon, 05 May 2008 17:52:15 -0500
Local: Mon, May 5 2008 6:52 pm
Subject: Re: Connection Errors

swejis wrote:
> Hello everyone. I'm not very experienced with iscsi, so please excuse
> me if I ask stupid questions.

> I'm evaluating openSUSE 11 (Beta1) for my servers and have found some
> worrying errors in the log. The following error shows up quite
> frequently.

> May  5 21:15:05 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4626267449, last ping 4626264949, now 4626269949

It looks like the nop code is broken. Try turning nops off with:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

Set that for each portal, or set it in your /etc/iscsi/iscsid.conf file
and then redo discovery.
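
One way to "set that for each portal" is iscsiadm's node-mode update operation. The sketch below only prints the commands instead of running them. The function name nop_update_cmds and the target IQN are hypothetical, and port 3260 is assumed because it is the default iSCSI port. Only the two portal addresses come from the original post.

```shell
#!/bin/sh
# Print (not run) the iscsiadm commands that would set both nop-out
# timers on one node record. Target name and portal are placeholders.
nop_update_cmds() {
    # $1 = target IQN, $2 = portal (ip:port), $3 = value in seconds (0 = nops off)
    for key in noop_out_interval noop_out_timeout; do
        echo "iscsiadm -m node -T $1 -p $2 -o update -n node.conn[0].timeo.$key -v $3"
    done
}

# Portal addresses are from the original post; the IQN is hypothetical
# and port 3260 is an assumption.
for portal in 192.168.43.5:3260 192.168.43.6:3260; do
    nop_update_cmds iqn.2008-05.com.example:m500i "$portal" 0
done
```

Review the printed lines and run them as root once the real target name has been substituted. An already logged-in session would presumably need a logout and login before the new values take effect.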

Finding both portals is expected. Normally, if you do discovery against
one port, the target will tell you about all of its ports.

The multiple sd entries are also expected. The scsi layer does not know
or care whether it can see the same LU through different paths, so it
creates an sd device every time it sees it. You could then use multipath
to assemble all the sds into one multipath device.

> for lun1 etc), is the reason for this performance, fail over or
> perhaps both ? I've spotted in the target the following setting "Max
> Connections 1" (non-configurable) and was thinking perhaps this reset
> was related to this. I have left all configuration to its defaults
> only changing the login information.

max connections is not used so do not worry about it.

From: Mike Christie <micha...@cs.wisc.edu>
Date: Mon, 05 May 2008 21:36:31 -0500
Local: Mon, May 5 2008 10:36 pm
Subject: Re: Connection Errors

If that works could you try the kernel modules and tools in this tarball?
http://open-iscsi.org/bits/open-iscsi-2.0-869.1.test1.tar.gz

Re-enable nops by setting:

node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5

back to five, then retry the test. This should fix the root problem
where we drop the session when we wanted to test it.


From: swejis <jo...@wehay.com>
Date: Tue, 6 May 2008 00:17:32 -0700 (PDT)
Local: Tues, May 6 2008 3:17 am
Subject: Re: Connection Errors
I have now changed the following. Is this correct?

## node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_interval = 0
node.conn[1].timeo.noop_out_interval = 0

# node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].timeo.noop_out_timeout = 0
node.conn[1].timeo.noop_out_timeout = 0

I noticed this:

May  6 08:52:22 manjula iscsid: connection2:0 is operational now
May  6 08:52:22 manjula iscsid: connection1:0 is operational now

Should the above changes instead be:

node.conn[1].xxx
node.conn[2].xxx

I still see the same errors though:

May  6 09:07:06 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4636947795, last ping 4636942795, now 4636950295
May  6 09:07:06 manjula klogd:  connection1:0: detected conn error (1011)
May  6 09:07:07 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  6 09:07:10 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May  6 09:08:11 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4636964045, last ping 4636960295, now 4636966545
May  6 09:08:11 manjula klogd:  connection1:0: detected conn error (1011)
May  6 09:08:12 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  6 09:08:15 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May  6 09:08:41 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4636971545, last ping 4636970295, now 4636974045
May  6 09:08:41 manjula klogd:  connection1:0: detected conn error (1011)
May  6 09:08:42 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  6 09:08:45 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May  6 09:09:49 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4636988378, last ping 4636977795, now 4636990878
May  6 09:09:49 manjula klogd:  connection1:0: detected conn error (1011)
May  6 09:09:49 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  6 09:09:52 manjula iscsid: connection1:0 is operational after recovery (1 attempts)
May  6 09:10:36 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4637000296, last ping 4637000296, now 4637002796
May  6 09:10:36 manjula klogd:  connection1:0: detected conn error (1011)
May  6 09:10:37 manjula iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  6 09:10:40 manjula iscsid: connection1:0 is operational after recovery (1 attempts)

There seems to be about one minute between the failures. One
observation: it's always connection1:0 that fails.
Both target ports are connected to the same switch.

Furthermore, are there supposed to be as many processes as this?

root       104     2  0 Apr20 ?        00:00:00 [scsi_eh_0]
root       124     2  0 Apr20 ?        00:00:00 [scsi_eh_1]
root       125     2  0 Apr20 ?        00:00:00 [scsi_eh_2]
root       126     2  0 Apr20 ?        00:00:00 [scsi_eh_3]
root       127     2  0 Apr20 ?        00:00:00 [scsi_eh_4]
root       134     2  0 Apr20 ?        00:00:00 [scsi_eh_5]
root       135     2  0 Apr20 ?        00:00:00 [scsi_eh_6]
root      9580     2  0 May05 ?        00:00:00 [scsi_eh_24]
root      9581     2  0 May05 ?        00:00:00 [scsi_wq_24]
root      9583     2  0 May05 ?        00:00:00 [scsi_eh_25]
root      9584     2  0 May05 ?        00:00:00 [scsi_wq_25]
root     16981     2  0 08:35 ?        00:00:00 [scsi_eh_26]
root     16982     2  0 08:35 ?        00:00:00 [scsi_wq_26]
root     16984     2  0 08:35 ?        00:00:00 [scsi_eh_27]
root     16985     2  0 08:35 ?        00:00:00 [scsi_wq_27]
root     18289     2  0 Apr21 ?        00:00:00 [scsi_eh_10]
root     18290     2  0 Apr21 ?        00:00:00 [scsi_wq_10]
root     18369     2  0 Apr21 ?        00:00:00 [scsi_eh_11]
root     18370     2  0 Apr21 ?        00:00:00 [scsi_wq_11]
root     18448     2  0 08:52 ?        00:00:00 [iscsi_eh]
root     18517     1  0 08:52 ?        00:00:00 /sbin/iscsid -c /etc/iscsi/iscsid.conf -p /var/run/iscsi.pid
root     18518     1  0 08:52 ?        00:00:00 /sbin/iscsid -c /etc/iscsi/iscsid.conf -p /var/run/iscsi.pid
root     18547     2  0 08:52 ?        00:00:00 [scsi_eh_28]
root     18548     2  0 08:52 ?        00:00:00 [scsi_wq_28]
root     18549     2  0 08:52 ?        00:00:00 [iscsi_scan_28]
root     18550     2  0 08:52 ?        00:00:00 [scsi_eh_29]
root     18551     2  0 08:52 ?        00:00:00 [scsi_wq_29]
root     18552     2  0 08:52 ?        00:00:00 [iscsi_scan_29]
root     18719     2  0 Apr21 ?        00:00:00 [scsi_eh_12]
root     18720     2  0 Apr21 ?        00:00:02 [scsi_wq_12]
root     18722     2  0 Apr21 ?        00:00:00 [scsi_eh_13]
root     18723     2  0 Apr21 ?        00:00:00 [scsi_wq_13]
root     19272 15942  0 09:15 pts/0    00:00:00 grep scsi
root     25657     2  0 May01 ?        00:00:00 [scsi_eh_14]
root     25659     2  0 May01 ?        00:00:00 [scsi_wq_14]
root     25663     2  0 May01 ?        00:00:00 [scsi_eh_15]
root     25664     2  0 May01 ?        00:00:00 [scsi_wq_15]
root     26038     2  0 May01 ?        00:00:00 [scsi_eh_16]
root     26039     2  0 May01 ?        00:00:00 [scsi_wq_16]
root     26041     2  0 May01 ?        00:00:00 [scsi_eh_17]
root     26042     2  0 May01 ?        00:00:00 [scsi_wq_17]
root     26390     2  0 May01 ?        00:00:00 [scsi_eh_18]
root     26392     2  0 May01 ?        00:00:00 [scsi_wq_18]
root     26427     2  0 May01 ?        00:00:00 [scsi_eh_19]
root     26428     2  0 May01 ?        00:00:00 [scsi_wq_19]
root     26763     2  0 May01 ?        00:00:00 [scsi_eh_20]
root     26764     2  0 May01 ?        00:00:00 [scsi_wq_20]
root     26766     2  0 May01 ?        00:00:00 [scsi_eh_21]
root     26767     2  0 May01 ?        00:00:00 [scsi_wq_21]
root     27122     2  0 May01 ?        00:00:00 [scsi_eh_22]
root     27123     2  0 May01 ?        00:00:00 [scsi_wq_22]
root     27125     2  0 May01 ?        00:00:00 [scsi_eh_23]
root     27126     2  0 May01 ?        00:00:00 [scsi_wq_23]

TIA
// Jonas


From: Mike Christie <micha...@cs.wisc.edu>
Date: Tue, 06 May 2008 10:43:29 -0500
Local: Tues, May 6 2008 11:43 am
Subject: Re: Connection Errors

No. The first number in connectionX:Y is the session number.
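
As a small illustration of that naming, the sketch below splits a connectionX:Y name into its two parts. The function name parse_conn_name is hypothetical. Reading Y as the connection index within the session is an interpretation of the reply above, not something confirmed elsewhere in this thread.

```shell
#!/bin/sh
# Split a name like "connection1:0" into its session and connection numbers.
parse_conn_name() {
    name=${1#connection}      # strip the literal prefix, leaving e.g. "1:0"
    echo "session=${name%%:*} conn=${name#*:}"
}

# The two names seen in the iscsid log lines quoted in this thread.
parse_conn_name connection1:0
parse_conn_name connection2:0
```

Both names from the log end in :0, which is consistent with node.conn[0] being the right index for each session rather than node.conn[1] or node.conn[2].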

> I still see the same errors though:

> May  6 09:07:06 manjula klogd:  connection1:0: ping timeout of 5 secs expired, last rx 4636947795, last ping 4636942795, now 4636950295

It looks like the value did not get picked up. Forget I asked you to do
this, OK? We do not need it.

Could you just try
http://open-iscsi.org/bits/open-iscsi-2.0-869.1.test1.tar.gz

First remove the old iscsi tools (I think in SUSE the package is named
open-iscsi):

rpm -e open-iscsi

Now build the test package from the tarball above with extra debugging:
make DEBUG_SCSI=1
make DEBUG_SCSI=1 install

> Furthermore, are there supposed to be as many processes as this?

You are going to get a scsi_eh, a scsi_wq and an iscsi_scan thread per
session/target. Some targets do a target per device/LU/LUN, and in that
case you would see a lot. If you run iscsiadm -m session and send the
output, we can see what is up.
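
To check the per-session thread count against a process listing, the numbered kernel threads can be tallied by their trailing host number. This is a rough sketch, assuming `ps -ef` style input like the listing posted earlier. The function name threads_per_host is made up for illustration.

```shell
#!/bin/sh
# Count kernel threads per SCSI host number from `ps -ef`-style input.
# Matches names like [scsi_eh_28], [scsi_wq_28], [iscsi_scan_28].
threads_per_host() {
    awk 'match($0, /\[(scsi_eh|scsi_wq|iscsi_scan)_[0-9]+\]/) {
             t = substr($0, RSTART + 1, RLENGTH - 2)   # e.g. scsi_eh_28
             sub(/.*_/, "", t)                         # keep the host number
             n[t]++
         }
         END { for (h in n) print h, n[h] }' | sort -n
}

# Sample lines copied from the process listing earlier in the thread.
threads_per_host <<'EOF'
root     18547     2  0 08:52 ?        00:00:00 [scsi_eh_28]
root     18548     2  0 08:52 ?        00:00:00 [scsi_wq_28]
root     18549     2  0 08:52 ?        00:00:00 [iscsi_scan_28]
root     18550     2  0 08:52 ?        00:00:00 [scsi_eh_29]
root     18551     2  0 08:52 ?        00:00:00 [scsi_wq_29]
root     18552     2  0 08:52 ?        00:00:00 [iscsi_scan_29]
EOF
```

On a live system the input would come from `ps -ef | threads_per_host`. Each output line is a host number followed by how many of the three thread types were found for it.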
