From dc7560d404857c0540caed2f71f8e7c2e7307ab3 Mon Sep 17 00:00:00 2001
From: gulams <64251312+gulams@users.noreply.github.com>
Date: Tue, 28 Apr 2020 13:03:50 +0530
Subject: [PATCH 162/170] Proper disconnect of TCP connection

1. Due to configuration issues, logins from the iSCSI initiator were being rejected by the target.
2. The initiator kept retrying the login.
3. Each time the initiator tries to log in, the host number is incremented by 1.
4. Eventually the host number reached 65535.
5. During login, once the TCP connection is established, the initiator tries to set the host parameters for the network interface if it is not the default interface.
6. While setting these host parameters, it looks up the host by host number.
7. The host number in the "iscsi_uevent" structure is a uint32_t. It is passed as the argument to scsi_host_lookup().
8. scsi_host_lookup() takes the host number as an unsigned short, so once the host number goes above 65535 the value wraps around and starts from 0 again.
9. scsi_host_lookup() therefore receives the wrong host number and returns an error because that host does not exist in the list.
10. Because of this "host not found" error, open-iscsi retries this connection again and again.
11. On each retry it disconnects and then connects again with the same connection pointer, i.e. it re-opens the connection repeatedly until the 120-second timeout expires.
12. During these 120 seconds, it was observed re-opening the connection 400+ times, with a disconnect and a connect each time.
13. After 120 seconds, the connection and session are destroyed.
14. During these retries, whenever the connect succeeds, it tries to bind the connection to the session.
15. Binding the connection to the session increments the reference count of the socket.
16. On disconnect, it tries to close the socket with the close(sockfd) system call.
17. This close() call enters the kernel but does NOT reach the networking layer, so tcp_close() is never called to send the FIN packet to the target.
18. It does not reach tcp_close() because the reference count of the socket is still 1.
19. So the initiator never sends a FIN to the target; the target times out and sends its own FIN. This happens for all of the 400+ retries.
20. At some point, when one of these FIN packets is received by the initiator, the connection has already been destroyed and its memory re-used for something else, hence the panic.
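
The wrap-around in steps 7 to 9 can be reproduced in isolation. The sketch below is only an illustration: lookup_arg_seen() is a hypothetical stand-in that returns the value it was handed, not the real scsi_host_lookup(), and uint16_t is used on the assumption that unsigned short is 16 bits wide on the platform in question.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a lookup routine whose parameter is 16 bits
 * wide. It does no lookup; it only reports the host number it received.
 */
static uint16_t lookup_arg_seen(uint16_t hostnum)
{
	return hostnum;
}

/*
 * Pass a 32-bit host number through the 16-bit parameter. The conversion
 * keeps only the low 16 bits, i.e. the value modulo 65536.
 */
static uint32_t host_no_after_narrowing(uint32_t host_no)
{
	return lookup_arg_seen((uint16_t)host_no);
}
```

With this sketch, host number 65535 survives unchanged, while 65536 arrives as 0 and 65537 as 1, which is why the lookup starts missing the host once the counter passes 65535.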
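
Fix:
==
The fix is to drop the reference on the socket fd after disconnect by stopping the connection: the open-coded disconnect/connect retry in iscsi_login_eh() is replaced with a call to session_conn_reopen(conn, qtask, STOP_CONN_TERM).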
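
Why close() alone is not enough while another reference exists can be shown with a userspace analogy. This is a rough sketch under stated assumptions, not the open-iscsi or kernel code path: it uses an AF_UNIX socketpair() instead of TCP, and dup() stands in for the extra socket reference taken when the connection is bound to the session. The helper names peer_saw_eof() and close_with_extra_ref() are made up for this illustration.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the peer sees end-of-stream (recv() == 0), 0 if the socket
 * is still open with no data pending, -1 on any other error.
 */
static int peer_saw_eof(int peer_fd)
{
	char byte;
	ssize_t n = recv(peer_fd, &byte, 1, MSG_DONTWAIT);

	if (n == 0)
		return 1;
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return -1;
}

/*
 * seen[0]: peer state after close() of the original fd only.
 * seen[1]: peer state after the extra reference is dropped as well.
 * Returns 0 on success, -1 if the sockets could not be set up.
 */
static int close_with_extra_ref(int seen[2])
{
	int sv[2];
	int extra_ref;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Stand-in for the reference held by the bound session. */
	extra_ref = dup(sv[0]);
	if (extra_ref < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	close(sv[0]);			/* one reference is still held */
	seen[0] = peer_saw_eof(sv[1]);

	close(extra_ref);		/* last reference dropped */
	seen[1] = peer_saw_eof(sv[1]);

	close(sv[1]);
	return 0;
}
```

In this sketch the peer sees no end-of-stream after the first close() and sees it only once the last reference is gone, which mirrors the missing FIN described in steps 16 to 19.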

Corrected the indentation for the change in the function iscsi_login_eh()
---
 usr/initiator.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/usr/initiator.c b/usr/initiator.c
index a07f9aa..5f4bdca 100644
--- a/usr/initiator.c
+++ b/usr/initiator.c
@@ -711,11 +711,7 @@ static void iscsi_login_eh(struct iscsi_conn *conn, struct queue_task *qtask,
 		    !iscsi_retry_initial_login(conn))
 			session_conn_shutdown(conn, qtask, err);
 		else {
-			session->reopen_cnt++;
-			session->t->template->ep_disconnect(conn);
-			if (iscsi_conn_connect(conn, qtask))
-				queue_delayed_reopen(qtask,
-						ISCSI_CONN_ERR_REOPEN_DELAY);
+			session_conn_reopen(conn, qtask, STOP_CONN_TERM);
 		}
 		break;
 	case R_STAGE_SESSION_REDIRECT:
--
2.21.1 (Apple Git-122.3)