iscsid: stop connection for recovery if error is not timeout in iscsi_login_eh

While iscsid is reopening a connection, the reopen may get as far as a
successful bind_conn and then call iscsi_session_set_params() to set
parameters. If the iSCSI target triggers another error event at this
point (for example, by closing the socket connection between initiator
and target), the kernel runs its error handler, sets the connection
state to ISCSI_CONN_FAILED, and sets the ISCSI_CLS_CONN_BIT_CLEANUP bit
in the kernel's iscsi_cls_conn->flags. iscsid's
iscsi_session_set_params() then fails with ENOTCONN, so iscsid calls
iscsi_login_eh() to handle the error.

iscsid now sees that conn->state is ISCSI_CONN_STATE_XPT_WAIT and
session->r_stage is R_STAGE_SESSION_REOPEN, so it calls
session_conn_reopen() with do_stop set to 0. With do_stop set to 0 the
kernel never calls iscsi_if_stop_conn(), so the
ISCSI_CLS_CONN_BIT_CLEANUP bit in iscsi_cls_conn->flags is never
cleared.

The reopen then falls into an infinite cycle that looks like the
following:

iscsi_conn_connect -> bind_conn (fails with ENOTCONN)
         ^                     |
         |                     |
         |                     v
    session_conn_reopen (with do_stop set to 0)

The symptom is that iscsid keeps logging "can't bind conn x:0
to session x, retcode -107 (115)" and the session never recovers.
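
The cycle described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions: every toy_* name is
hypothetical, the single boolean stands in for the kernel's
ISCSI_CLS_CONN_BIT_CLEANUP bit, and none of this is the real iscsid or
kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for do_stop values; not the real constants. */
enum toy_stop_flag { TOY_STOP_NONE = 0, TOY_STOP_CONN_RECOVER = 1 };

/* Models the kernel-side connection state relevant to this bug. */
struct toy_kernel_conn {
	bool cleanup_pending; /* models ISCSI_CLS_CONN_BIT_CLEANUP */
};

/* Models bind_conn: fails with -ENOTCONN while cleanup is pending. */
static int toy_bind_conn(const struct toy_kernel_conn *kc)
{
	return kc->cleanup_pending ? -ENOTCONN : 0;
}

/*
 * Models session_conn_reopen(): only a non-zero do_stop makes the
 * (modelled) kernel stop the connection and clear the cleanup bit.
 */
static void toy_reopen(struct toy_kernel_conn *kc, enum toy_stop_flag do_stop)
{
	if (do_stop != TOY_STOP_NONE)
		kc->cleanup_pending = false;
}

/*
 * Returns the attempt number on which bind succeeds, or -1 if it never
 * succeeds within max_attempts.
 */
static int toy_attempts_until_bound(enum toy_stop_flag do_stop,
				    int max_attempts)
{
	/* Start in the state left behind by the kernel error handler. */
	struct toy_kernel_conn kc = { .cleanup_pending = true };

	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (toy_bind_conn(&kc) == 0)
			return attempt;
		toy_reopen(&kc, do_stop);
	}
	return -1;
}
```

In this model, toy_attempts_until_bound(TOY_STOP_NONE, 1000) returns -1
because the bit is never cleared, while
toy_attempts_until_bound(TOY_STOP_CONN_RECOVER, 1000) returns 2: the first
bind fails, the reopen clears the bit, and the second bind succeeds.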

Fix this by checking the error type in iscsi_login_eh(): if the error
is not a timeout, call session_conn_reopen() with do_stop set to
STOP_CONN_RECOVER.
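
The decision made by the fix can be isolated as a small standalone
function. This is a minimal sketch, not the patched iscsi_login_eh(): the
sketch_* names and their numeric values are placeholders chosen for
illustration, standing in for ISCSI_ERR_TRANS_TIMEOUT, STOP_CONN_RECOVER,
and the do_stop value 0.

```c
#include <assert.h>

/* Placeholder error codes; the real ones come from the iscsid headers. */
enum sketch_err {
	SKETCH_ERR_TRANS_TIMEOUT, /* stands in for ISCSI_ERR_TRANS_TIMEOUT */
	SKETCH_ERR_OTHER          /* any non-timeout error */
};

/* Placeholder do_stop values; only "zero vs. non-zero" matters here. */
enum sketch_stop {
	SKETCH_STOP_NONE = 0,        /* do_stop == 0: plain retry */
	SKETCH_STOP_CONN_RECOVER = 1 /* stands in for STOP_CONN_RECOVER */
};

/*
 * Chooses the do_stop argument for the reopen: a timeout keeps the old
 * behaviour (retry without stopping), any other error asks for the
 * connection to be stopped for recovery first.
 */
static int sketch_reopen_stop_flag(enum sketch_err err)
{
	int stop_flag = SKETCH_STOP_NONE;

	assert(err == SKETCH_ERR_TRANS_TIMEOUT || err == SKETCH_ERR_OTHER);
	if (err != SKETCH_ERR_TRANS_TIMEOUT)
		stop_flag = SKETCH_STOP_CONN_RECOVER;
	return stop_flag;
}
```

Keeping 0 as the default means the timeout path behaves exactly as it did
before the change; only non-timeout errors take the new path.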

Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Wenchao Hao 2023-01-17 17:27:06 +08:00
parent 3d3a92976e
commit 021ba3614f
2 changed files with 73 additions and 2 deletions

@@ -0,0 +1,68 @@
From 9f2074568e6c39f85c9d948cb3b869f4fc774695 Mon Sep 17 00:00:00 2001
From: Wenchao Hao <73930449+wenchao-hao@users.noreply.github.com>
Date: Thu, 12 Jan 2023 11:10:05 +0800
Subject: iscsid: stop connection for recovery if error is not
timeout in iscsi_login_eh (#388)
When iscsid is reopening a connection, and the reopen process has succeed
to call bind_conn and comes to iscsi_session_set_params() to set
parameters. If the iscsi target trigger another error event(such as
close the socket connection between initiator and target) at this time,
kernel would perform the error handler and set connection's state to
ISCSI_CONN_FAILED, and set kernel iscsi_cls_conn->flags'
ISCSI_CLS_CONN_BIT_CLEANUP bit. Which would make iscsid's
iscsi_session_set_params() failed with ENOTCONN, so iscsi_login_eh()
would be called by iscsid to handle this error.
Now iscsid see conn->state is ISCSI_CONN_STATE_XPT_WAIT and
session->r_stage is R_STAGE_SESSION_REOPEN, so it would call
session_conn_reopen() with do_stop set to 0, which would not trigger
kernel to call iscsi_if_stop_conn() to clear kernel data struct
iscsi_cls_conn->flags' ISCSI_CLS_CONN_BIT_CLEANUP bit.
The reopen would fall into an infinite cycle which looks like
following:
iscsi_conn_connect -> bind_conn(failed with ENOTCONN)
^ |
| |
| v
session_conn_reopwn(with do_stop set to 0)
The phenomenon is iscsid would always report log "can't bind conn x:0
to session x, retcode -107 (115)" and the session would not recovery.
Fix this issue by checking error type in iscsi_login_eh(), if the error
type is not timeout, make sure we would call session_conn_reopen() with
do_stop set to STOP_CONN_RECOVER.
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
---
usr/initiator.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/usr/initiator.c b/usr/initiator.c
index 56bf38b..9c48dd5 100644
--- a/usr/initiator.c
+++ b/usr/initiator.c
@@ -735,8 +735,13 @@ static void iscsi_login_eh(struct iscsi_conn *conn, struct queue_task *qtask,
session_conn_shutdown(conn, qtask, err);
break;
}
- /* timeout during reopen connect. try again */
- session_conn_reopen(conn, qtask, 0);
+ /*
+ * stop connection for recovery if error is not
+ * timeout
+ */
+ if (err != ISCSI_ERR_TRANS_TIMEOUT)
+ stop_flag = STOP_CONN_RECOVER;
+ session_conn_reopen(conn, qtask, stop_flag);
break;
case R_STAGE_SESSION_CLEANUP:
session_conn_shutdown(conn, qtask, err);
--
2.35.3

@@ -4,7 +4,7 @@
 Name: open-iscsi
 Version: 2.1.5
-Release: 11
+Release: 12
 Summary: ISCSI software initiator daemon and utility programs
 License: GPLv2+ and BSD
 URL: http://www.open-iscsi.com
@@ -35,7 +35,7 @@ patch23: 0023-Remove-unused-fwparam_ibft.-ch-files-in-fwparam_ibft.patch
 patch24: 0024-Fix-a-possible-passing-null-pointer-in-usr-iface.c-3.patch
 patch25: 0025-iscsid-iscsiuio-fix-OOM-adjustment-377.patch
 patch26: 0026-iscsid-clear-scanning-thread-s-PR_SET_IO_FLUSHER-fla.patch
+patch27: 0027-iscsid-stop-connection-for-recovery-if-error-is-not-.patch
 BuildRequires: flex bison doxygen kmod-devel systemd-units gcc git isns-utils-devel systemd-devel
 BuildRequires: autoconf automake libtool libmount-devel openssl-devel pkg-config
@@ -162,6 +162,9 @@ fi
 %{_mandir}/man8/*
 %changelog
+* Tue Jan 17 2023 haowenchao <haowenchao@huawei.com> - 2.1.5-12
+- iscsid: stop connection for recovery if error is not timeout in iscsi_login_eh
 * Tue Jan 17 2023 haowenchao <haowenchao@huawei.com> - 2.1.5-11
 - iscsid: clear scanning thread's PR_SET_IO_FLUSHER flag