!40 galera: allow joiner to report non-Primary during initial IST

From: @bixiaoyan1 
Reviewed-by: @xiangbudaomz 
Signed-off-by: @xiangbudaomz
openeuler-ci-bot 2024-04-24 01:04:02 +00:00 committed by Gitee
commit d824e75531
No known key found for this signature in database
GPG Key ID: 173E9B9CA92EEF8F
2 changed files with 84 additions and 1 deletion


@@ -0,0 +1,79 @@
From 4357f0dbb8668ac4090cd7070c2ea195e5683326 Mon Sep 17 00:00:00 2001
From: Damien Ciabrini <dciabrin@redhat.com>
Date: Wed, 24 Jan 2024 13:27:26 +0100
Subject: [PATCH 05/20] galera: allow joiner to report non-Primary during
 initial IST

It seems that with recent galera versions, when a galera node
joins a cluster, there is a small time window where the node is
connected to the primary component of the galera cluster, but it
might still be preparing its IST. During this time, it can report
itself as being 'not ready' and in 'non-primary' state.

Update the galera resource agent to allow the node to be in
non-primary state, but only if running a "promote" operation. Any
network partition during the promotion will be caught by the
promote timeout.

In reworking the promotion code, we move the check for primary
partition into the "galera_monitor" function. The check works
as before for regular "monitor" or "probe" operations.

Related-Bug: rhbz#2255414
---
 heartbeat/galera.in | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/heartbeat/galera.in b/heartbeat/galera.in
index 6aed3e4b..b518595c 100755
--- a/heartbeat/galera.in
+++ b/heartbeat/galera.in
@@ -822,6 +822,11 @@ galera_promote()
         return $rc
     fi
 
+    # At this point, the mysql pidfile is created on disk and the
+    # mysql server is reachable via its UNIX socket. If we are a
+    # joiner, SST transfers (rsync) have finished, but an IST may
+    # still be requested or ongoing
+
     galera_monitor
     rc=$?
     if [ $rc != $OCF_SUCCESS -a $rc != $OCF_RUNNING_MASTER ]; then
@@ -835,12 +840,6 @@ galera_promote()
         return $OCF_ERR_GENERIC
     fi
 
-    is_primary
-    if [ $? -ne 0 ]; then
-        ocf_exit_reason "Failure. Master instance started, but is not in Primary mode."
-        return $OCF_ERR_GENERIC
-    fi
-
     if ocf_is_true $bootstrap; then
         promote_everyone
         clear_bootstrap_node
@@ -991,8 +990,18 @@ galera_monitor()
         fi
         rc=$OCF_RUNNING_MASTER
     else
-        ocf_exit_reason "local node <${NODENAME}> is started, but not in primary mode. Unknown state."
-        rc=$OCF_ERR_GENERIC
+        # It seems that with recent galera (26.4+), a joiner that is
+        # connected to a Primary component and is preparing its IST
+        # request might still temporarily report its state as
+        # Non-Primary. Do not fail in this case as the promote
+        # operation will loop until the IST finishes or the promote
+        # times out.
+        if [ "$__OCF_ACTION" = "promote" ] && ! ocf_is_true $(is_bootstrap); then
+            ocf_log info "local node <${NODENAME}> is receiving a State Transfer."
+        else
+            ocf_exit_reason "local node <${NODENAME}> is started, but not in primary mode. Unknown state."
+            rc=$OCF_ERR_GENERIC
+        fi
     fi
 
     return $rc
--
2.25.1
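The decision that the patch adds to the `else` branch of `galera_monitor` can be isolated as a small standalone shell function. This is a minimal sketch, not the agent's code, under these assumptions: the action and the bootstrap flag are passed as arguments instead of being read from `$__OCF_ACTION` and `is_bootstrap`; `rc` is assumed to enter the branch as `OCF_SUCCESS` from the earlier status check; and `is_true` and `non_primary_rc` are hypothetical helpers (`is_true` is a simplified stand-in for `ocf_is_true`).

```shell
#!/bin/sh
# Sketch of the Non-Primary handling added to galera_monitor().
# Assumption: action and bootstrap flag arrive as arguments rather than
# via $__OCF_ACTION and is_bootstrap as in the real agent.

OCF_SUCCESS=0       # standard OCF return codes
OCF_ERR_GENERIC=1

# Hypothetical, simplified stand-in for ocf_is_true.
is_true() {
    case "$1" in
        yes|YES|true|TRUE|1|on|ON) return 0 ;;
        *) return 1 ;;
    esac
}

# non_primary_rc ACTION BOOTSTRAP_FLAG  (hypothetical helper)
# Prints the rc returned for a running node whose wsrep state is
# Non-Primary. Assumes rc is OCF_SUCCESS on entry to the branch.
non_primary_rc() {
    rc=$OCF_SUCCESS
    if [ "$1" = "promote" ] && ! is_true "$2"; then
        # Joiner still preparing its IST: tolerate Non-Primary and
        # leave rc untouched; the promote timeout bounds the wait.
        echo "receiving a State Transfer" >&2
    else
        # monitor/probe, or the bootstrap node: unknown state.
        rc=$OCF_ERR_GENERIC
    fi
    echo "$rc"
}
```

With these definitions, `non_primary_rc promote false` prints 0, so the `galera_promote` check against `OCF_SUCCESS`/`OCF_RUNNING_MASTER` passes, while `non_primary_rc monitor false` and `non_primary_rc promote true` print 1, matching the pre-patch behaviour for monitors and for the bootstrap node.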


@@ -1,7 +1,7 @@
 Name: resource-agents
 Summary: Open Source HA Reusable Cluster Resource Scripts
 Version: 4.13.0
-Release: 18
+Release: 19
 License: GPLv2+ and LGPLv2+
 URL: https://github.com/ClusterLabs/resource-agents
 Source0: https://github.com/ClusterLabs/resource-agents/archive/v%{version}.tar.gz
@@ -23,6 +23,7 @@ Patch0014: portblock-remove-write-to-tcp_tw_recycle.patch
 Patch0015: findifsh-fix-corner-cases.patch
 Patch0016: fix-OCF_SUCESS-name-in-db2_notify.patch
 Patch0017: docs-writing-python-agents-update-required-Python-ve.patch
+Patch0018: galera-allow-joiner-to-report-non-Primary-during-ini.patch
 Obsoletes: heartbeat-resources <= %{version}
 Provides: heartbeat-resources = %{version}
 BuildRequires: automake autoconf pkgconfig gcc perl-interpreter perl-generators python3-devel
@@ -120,6 +121,9 @@ export CFLAGS="$(echo '%{optflags}')"
 %{_mandir}/man8/{ocf-tester.8*,ldirectord.8*}
 
 %changelog
+* Tue Apr 22 2024 bixiaoyan <bixiaoyan@kylinos.cn> - 4.13.0-19
+- galera: allow joiner to report non-Primary during initial IST
+
 * Mon Apr 22 2024 zouzhimin <zouzhimin@kylinos.cn> - 4.13.0-18
 - docs: writing-python-agents: update required Python version to 3.6+
 