libhns: Bugfixes and one debug improvement

The last commit fixes a problem found when I created an
XRC SRQ in lock-free mode but failed to destroy it because
of the refcnt check added in the previous commit.

The failure occurred because the PAD was acquired through
ibv_srq->pd in destroy_srq(), while ibv_srq->pd wasn't
assigned when the SRQ was created by ibv_create_srq_ex().
So let's assign ibv_srq->pd in the common ibv_icmd_create_srq(),
so that drivers can get the correct pd no matter
which API the SRQ is created by.

Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
(cherry picked from commit 3ac30fc125c7cff122f21ff8593294060c92429f)
Xinghai Cen 2025-04-28 11:32:16 +08:00 committed by openeuler-sync-bot
parent b1efe2238a
commit c35fab9925
6 changed files with 340 additions and 1 deletions


@@ -0,0 +1,59 @@
From 40c7b406829bc1250d93af527d70836e02c1fbac Mon Sep 17 00:00:00 2001
From: Junxian Huang <huangjunxian6@hisilicon.com>
Date: Thu, 24 Apr 2025 20:32:12 +0800
Subject: [PATCH 58/62] libhns: Add debug log for lock-free mode
mainline inclusion
from mainline-v56.0-65
commit fb96940fcf6f96185d407d57bcaf775ccf8f1762
category: cleanup
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
CVE: NA
Reference:
https://github.com/linux-rdma/rdma-core/pull/1599/commits/fb96940fcf6f96185d...
---------------------------------------------------------------------
Currently there is no way to observe whether the lock-free mode is
configured from the driver's perspective. Add debug log for this.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
providers/hns/hns_roce_u_verbs.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 5fe169e..3efc2f4 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -182,6 +182,7 @@ err:
 struct ibv_pd *hns_roce_u_alloc_pad(struct ibv_context *context,
 				    struct ibv_parent_domain_init_attr *attr)
 {
+	struct hns_roce_pd *protection_domain;
 	struct hns_roce_pad *pad;
 	if (ibv_check_alloc_parent_domain(attr))
@@ -198,12 +199,16 @@ struct ibv_pd *hns_roce_u_alloc_pad(struct ibv_context *context,
 		return NULL;
 	}
+	protection_domain = to_hr_pd(attr->pd);
 	if (attr->td) {
 		pad->td = to_hr_td(attr->td);
 		atomic_fetch_add(&pad->td->refcount, 1);
+		verbs_debug(verbs_get_ctx(context),
+			    "set PAD(0x%x) to lock-free mode.\n",
+			    protection_domain->pdn);
 	}
-	pad->pd.protection_domain = to_hr_pd(attr->pd);
+	pad->pd.protection_domain = protection_domain;
 	atomic_fetch_add(&pad->pd.protection_domain->refcount, 1);
 	atomic_init(&pad->pd.refcount, 1);
--
2.25.1


@@ -0,0 +1,58 @@
From 478e5fd1d8e1a0b04fe6638c163951a0892eab44 Mon Sep 17 00:00:00 2001
From: Junxian Huang <huangjunxian6@hisilicon.com>
Date: Wed, 23 Apr 2025 16:55:14 +0800
Subject: [PATCH 59/62] libhns: Fix ret not assigned in create srq()
mainline inclusion
from mainline-v56.0-65
commit 2034b1860c5a8b0cc3879315259462c04e53a98d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
CVE: NA
Reference:
https://github.com/linux-rdma/rdma-core/pull/1599/commits/2034b1860c5a8b0cc3...
---------------------------------------------------------------------
Fix the problem that ret may not be assigned in the error flow
of create_srq().
Fixes: aa7bcf7f7e44 ("libhns: Add support for lock-free SRQ")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
providers/hns/hns_roce_u_verbs.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 3efc2f4..e0bafe3 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -933,16 +933,20 @@ static struct ibv_srq *create_srq(struct ibv_context *context,
 	if (pad)
 		atomic_fetch_add(&pad->pd.refcount, 1);
-	if (hns_roce_srq_spinlock_init(context, srq, init_attr))
+	ret = hns_roce_srq_spinlock_init(context, srq, init_attr);
+	if (ret)
 		goto err_free_srq;
 	set_srq_param(context, srq, init_attr);
-	if (alloc_srq_buf(srq))
+	ret = alloc_srq_buf(srq);
+	if (ret)
 		goto err_destroy_lock;
 	srq->rdb = hns_roce_alloc_db(hr_ctx, HNS_ROCE_SRQ_TYPE_DB);
-	if (!srq->rdb)
+	if (!srq->rdb) {
+		ret = ENOMEM;
 		goto err_srq_buf;
+	}
 	ret = exec_srq_create_cmd(context, srq, init_attr);
 	if (ret)
--
2.25.1
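
The pattern applied by the patch above can be illustrated with a minimal sketch. The `demo_*` names are hypothetical and not part of libhns; the sketch only assumes an error flow that reports `ret` at a shared label, as `create_srq()` does:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical setup steps; each can be forced to fail for demonstration. */
static int demo_init_lock(int fail)
{
	return fail ? EINVAL : 0;
}

static void *demo_alloc_db(int fail)
{
	return fail ? NULL : malloc(16);
}

/*
 * Returns 0 on success or a positive errno value on failure.
 * Every "goto err" is preceded by an assignment to ret, so the
 * error label never reads an uninitialized value.
 */
static int demo_create(int fail_lock, int fail_db)
{
	void *db;
	int ret;

	ret = demo_init_lock(fail_lock);
	if (ret)
		goto err;

	db = demo_alloc_db(fail_db);
	if (!db) {
		/* A NULL return carries no error code, so pick one explicitly. */
		ret = ENOMEM;
		goto err;
	}

	free(db);
	return 0;

err:
	errno = ret;
	return ret;
}
```

Testing a call directly inside `if (...)` discards its return value, which is why the sketch stores the result in `ret` before branching.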


@@ -0,0 +1,99 @@
From e45b9c648476b1b56592a873fa71699cb7f32ffd Mon Sep 17 00:00:00 2001
From: Junxian Huang <huangjunxian6@hisilicon.com>
Date: Wed, 23 Apr 2025 16:55:15 +0800
Subject: [PATCH 60/62] libhns: Fix pad refcnt leaking in error flow of create
qp/cq/srq
mainline inclusion
from mainline-v56.0-65
commit f877d6e610e438515e1535c9ec7a3a3ef37c58e0
category: bugfix
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
CVE: NA
Reference:
https://github.com/linux-rdma/rdma-core/pull/1599/commits/f877d6e610e438515e...
---------------------------------------------------------------------
Decrease pad refcnt by 1 in error flow of create qp/cq/srq.
Fixes: f8b4f622b1c5 ("libhns: Add support for lock-free QP")
Fixes: 95225025e24c ("libhns: Add support for lock-free CQ")
Fixes: aa7bcf7f7e44 ("libhns: Add support for lock-free SRQ")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
providers/hns/hns_roce_u_verbs.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index e0bafe3..70f516a 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -445,12 +445,9 @@ static int verify_cq_create_attr(struct ibv_cq_init_attr_ex *attr,
 		return EOPNOTSUPP;
 	}
-	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD) {
-		if (!pad) {
-			verbs_err(&context->ibv_ctx, "failed to check the pad of cq.\n");
-			return EINVAL;
-		}
-		atomic_fetch_add(&pad->pd.refcount, 1);
+	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD && !pad) {
+		verbs_err(&context->ibv_ctx, "failed to check the pad of cq.\n");
+		return EINVAL;
 	}
 	attr->cqe = max_t(uint32_t, HNS_ROCE_MIN_CQE_NUM,
@@ -556,6 +553,7 @@ static void hns_roce_uninit_cq_swc(struct hns_roce_cq *cq)
 static struct ibv_cq_ex *create_cq(struct ibv_context *context,
 				   struct ibv_cq_init_attr_ex *attr)
 {
+	struct hns_roce_pad *pad = to_hr_pad(attr->parent_domain);
 	struct hns_roce_context *hr_ctx = to_hr_ctx(context);
 	struct hns_roce_cq *cq;
 	int ret;
@@ -570,8 +568,10 @@ static struct ibv_cq_ex *create_cq(struct ibv_context *context,
 		goto err;
 	}
-	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD)
+	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD) {
 		cq->parent_domain = attr->parent_domain;
+		atomic_fetch_add(&pad->pd.refcount, 1);
+	}
 	ret = hns_roce_cq_spinlock_init(context, cq, attr);
 	if (ret)
@@ -611,6 +611,8 @@ err_db:
 err_buf:
 	hns_roce_spinlock_destroy(&cq->hr_lock);
 err_lock:
+	if (attr->comp_mask & IBV_CQ_INIT_ATTR_MASK_PD)
+		atomic_fetch_sub(&pad->pd.refcount, 1);
 	free(cq);
 err:
 	if (ret < 0)
@@ -977,6 +979,8 @@ err_destroy_lock:
 	hns_roce_spinlock_destroy(&srq->hr_lock);
 err_free_srq:
+	if (pad)
+		atomic_fetch_sub(&pad->pd.refcount, 1);
 	free(srq);
 err:
@@ -1872,6 +1876,8 @@ err_cmd:
 err_buf:
 	hns_roce_qp_spinlock_destroy(qp);
 err_spinlock:
+	if (pad)
+		atomic_fetch_sub(&pad->pd.refcount, 1);
 	free(qp);
 err:
 	if (ret < 0)
--
2.25.1
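
The unwinding rule behind the patch above (a failed create must leave the PAD reference count unchanged) can be sketched as follows. The `demo_*` types and the `fail_late` switch are hypothetical and exist only for illustration; they are not the libhns structures:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real hns_roce_pad or queue structs. */
struct demo_pad {
	atomic_int refcount;
};

struct demo_queue {
	struct demo_pad *pad;
};

/* fail_late simulates a failure after the reference has been taken. */
static struct demo_queue *demo_queue_create(struct demo_pad *pad, int fail_late)
{
	struct demo_queue *q = calloc(1, sizeof(*q));
	int ret;

	if (!q) {
		ret = ENOMEM;
		goto err;
	}

	if (pad)
		atomic_fetch_add(&pad->refcount, 1);

	if (fail_late) {
		ret = EINVAL;
		goto err_put;
	}

	q->pad = pad;
	return q;

err_put:
	/* Undo the increment so a failed create leaves the count unchanged. */
	if (pad)
		atomic_fetch_sub(&pad->refcount, 1);
	free(q);
err:
	errno = ret;
	return NULL;
}
```

Placing the decrement at the label reached by every failure after the increment keeps the two operations paired, which mirrors where the patch adds `atomic_fetch_sub()`.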


@@ -0,0 +1,69 @@
From 59108bf3e452fa7701a3972c78d22352598891be Mon Sep 17 00:00:00 2001
From: Junxian Huang <huangjunxian6@hisilicon.com>
Date: Wed, 23 Apr 2025 16:55:16 +0800
Subject: [PATCH 61/62] libhns: Fix freeing pad without checking refcnt
mainline inclusion
from mainline-v56.0-65
commit 234d135276ea8ef83633113e224e0cd735ebeca8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
CVE: NA
Reference:
https://github.com/linux-rdma/rdma-core/pull/1599/commits/234d135276ea8ef836...
---------------------------------------------------------------------
Currently pad refcnt will be added when creating qp/cq/srq, but it is
not checked when freeing pad. Add a check to prevent freeing pad when
it is still used by any qp/cq/srq.
Fixes: 7b6b3dae328f ("libhns: Add support for thread domain and parent
domain")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
providers/hns/hns_roce_u_verbs.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 70f516a..edd8e3d 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -218,14 +218,18 @@ struct ibv_pd *hns_roce_u_alloc_pad(struct ibv_context *context,
 	return &pad->pd.ibv_pd;
 }
-static void hns_roce_free_pad(struct hns_roce_pad *pad)
+static int hns_roce_free_pad(struct hns_roce_pad *pad)
 {
+	if (atomic_load(&pad->pd.refcount) > 1)
+		return EBUSY;
+
 	atomic_fetch_sub(&pad->pd.protection_domain->refcount, 1);
 	if (pad->td)
 		atomic_fetch_sub(&pad->td->refcount, 1);
 	free(pad);
+	return 0;
 }
 static int hns_roce_free_pd(struct hns_roce_pd *pd)
@@ -248,10 +252,8 @@ int hns_roce_u_dealloc_pd(struct ibv_pd *ibv_pd)
 	struct hns_roce_pad *pad = to_hr_pad(ibv_pd);
 	struct hns_roce_pd *pd = to_hr_pd(ibv_pd);
-	if (pad) {
-		hns_roce_free_pad(pad);
-		return 0;
-	}
+	if (pad)
+		return hns_roce_free_pad(pad);
 	return hns_roce_free_pd(pd);
 }
--
2.25.1
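
A minimal sketch of the check added by the patch above, assuming a count that starts at 1 for the PAD itself and grows by 1 per user. The `demo_*` names are hypothetical and stand in for the libhns structures:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parent domain; not the real hns_roce_pad. */
struct demo_pad {
	atomic_int refcount; /* 1 for the PAD itself, +1 per QP/CQ/SRQ using it */
};

static struct demo_pad *demo_pad_alloc(void)
{
	struct demo_pad *pad = calloc(1, sizeof(*pad));

	if (!pad)
		return NULL;
	atomic_init(&pad->refcount, 1);
	return pad;
}

/* Called when a queue starts or stops using the PAD. */
static void demo_pad_get(struct demo_pad *pad)
{
	atomic_fetch_add(&pad->refcount, 1);
}

static void demo_pad_put(struct demo_pad *pad)
{
	atomic_fetch_sub(&pad->refcount, 1);
}

/* Refuses to free while any user still holds a reference. */
static int demo_pad_free(struct demo_pad *pad)
{
	if (atomic_load(&pad->refcount) > 1)
		return EBUSY;

	free(pad);
	return 0;
}
```

Returning `EBUSY` instead of freeing leaves the object valid, so the caller can destroy the remaining users and retry.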


@@ -0,0 +1,43 @@
From 387d76c0046b4fb6fbd8d70389b335661d099683 Mon Sep 17 00:00:00 2001
From: Junxian Huang <huangjunxian6@hisilicon.com>
Date: Wed, 23 Apr 2025 16:55:17 +0800
Subject: [PATCH 62/62] verbs: Assign ibv srq->pd when creating SRQ
mainline inclusion
from mainline-v56.0-65
commit bf1e427141fde2651bab4860e77a432bb7e26094
category: bugfix
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/IC3X57
CVE: NA
Reference:
https://github.com/linux-rdma/rdma-core/pull/1599/commits/bf1e427141fde2651b...
---------------------------------------------------------------------
Some providers need to access ibv_srq->pd during SRQ destruction, but
it may not be assigned currently when using ibv_create_srq_ex(). This
may lead to some SRQ-related resource leaks. Assign ibv_srq->pd when
creating SRQ to ensure pd can be obtained correctly.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Xinghai Cen <cenxinghai@h-partners.com>
---
libibverbs/cmd_srq.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/libibverbs/cmd_srq.c b/libibverbs/cmd_srq.c
index dfaaa6a..259ea0d 100644
--- a/libibverbs/cmd_srq.c
+++ b/libibverbs/cmd_srq.c
@@ -63,6 +63,7 @@ static int ibv_icmd_create_srq(struct ibv_pd *pd, struct verbs_srq *vsrq,
 	struct verbs_xrcd *vxrcd = NULL;
 	enum ibv_srq_type srq_type;
+	srq->pd = pd;
 	srq->context = pd->context;
 	pthread_mutex_init(&srq->mutex, NULL);
 	pthread_cond_init(&srq->cond, NULL);
--
2.25.1
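
The idea of assigning the pd in a shared creation step can be sketched with mock types. Everything named `demo_*` below is hypothetical and does not use the libibverbs API; the sketch only shows why a field set in the common path is available at destroy time regardless of the entry point:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for ibv_pd and ibv_srq; not the libibverbs types. */
struct demo_pd {
	int users;
};

struct demo_srq {
	struct demo_pd *pd;
};

/* Shared creation step: both entry points below go through it. */
static void demo_common_create(struct demo_pd *pd, struct demo_srq *srq)
{
	srq->pd = pd;
	pd->users++;
}

static void demo_create_srq(struct demo_pd *pd, struct demo_srq *srq)
{
	demo_common_create(pd, srq);
}

static void demo_create_srq_ex(struct demo_pd *pd, struct demo_srq *srq)
{
	/* An extended path may handle extra attributes, then shares the same step. */
	demo_common_create(pd, srq);
}

/* Destruction relies on srq->pd, whichever entry point created the SRQ. */
static int demo_destroy_srq(struct demo_srq *srq)
{
	if (!srq->pd)
		return EINVAL;

	srq->pd->users--;
	srq->pd = NULL;
	return 0;
}
```

If only one entry point set `srq->pd`, the destroy path would fail or leak the count for SRQs created through the other one, which is the situation the commit message describes.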


@@ -1,6 +1,6 @@
Name: rdma-core
Version: 50.0
-Release: 27
+Release: 28
Summary: RDMA core userspace libraries and daemons
License: GPL-2.0-only OR BSD-2-Clause AND BSD-3-Clause
Url: https://github.com/linux-rdma/rdma-core
@@ -57,6 +57,11 @@ patch54: 0054-libhns-Fix-wrong-max-inline-data-value.patch
patch55: 0055-libhns-Fix-wrong-order-of-spin-unlock-in-modify-qp.patch
patch56: 0056-libhns-Add-initial-support-for-HNS-LTTng-tracing.patch
patch57: 0057-libhns-Add-tracepoint-for-HNS-RoCE-I-O.patch
+patch58: 0058-libhns-Add-debug-log-for-lock-free-mode.patch
+patch59: 0059-libhns-Fix-ret-not-assigned-in-create-srq.patch
+patch60: 0060-libhns-Fix-pad-refcnt-leaking-in-error-flow-of-creat.patch
+patch61: 0061-libhns-Fix-freeing-pad-without-checking-refcnt.patch
+patch62: 0062-verbs-Assign-ibv-srq-pd-when-creating-SRQ.patch
BuildRequires: binutils cmake >= 2.8.11 gcc libudev-devel pkgconfig pkgconfig(libnl-3.0)
BuildRequires: pkgconfig(libnl-route-3.0) systemd systemd-devel
@@ -634,6 +639,12 @@ fi
%doc %{_docdir}/%{name}-%{version}/70-persistent-ipoib.rules
%changelog
+* Fri Apr 25 2025 Xinghai Cen <cenxinghai@h-partners.com> - 50.0-28
+- Type: bugfix
+- ID: NA
+- SUG: NA
+- DESC: Bugfixes and one debug improvement
* Wed Apr 23 2025 Xinghai Cen <cenxinghai@h-partners.com> - 50.0-27
- Type: feature
- ID: NA