Compare commits


10 Commits

Author SHA1 Message Date
openeuler-ci-bot
c7ac29215a
!190 Add some improvements and add new modules to HiSilicon common section
From: @fwo 
Reviewed-by: @hubin95 
Signed-off-by: @hubin95
2025-05-09 02:43:30 +00:00
wjiang
fae7c0263f
Add some improvements and add new modules to HiSilicon common section 2025-04-30 10:56:55 +08:00
openeuler-ci-bot
720f6f4b2f
!182 fix ras-mc-ctl --summary failed
From: @znzjugod 
Reviewed-by: @hubin95 
Signed-off-by: @hubin95
2025-04-19 07:05:26 +00:00
znzjugod
f0b4938899 fix ras-mc-ctl --summary failed 2025-04-19 11:15:50 +08:00
openeuler-ci-bot
6b21185d42
!172 set to default when param is overflow
From: @luckky7 
Reviewed-by: @lvying6 
Signed-off-by: @lvying6
2024-12-16 11:57:26 +00:00
luckky
ab151e3a74 set to default when param is overflow 2024-12-16 17:31:13 +08:00
openeuler-ci-bot
1f384c5cc4
!147 rasdaemon: ras-mc-ctl: Modify check for HiSilicon KunPeng9xx error fields
From: @xia-bing1 
Reviewed-by: @hunan4222, @znzjugod 
Signed-off-by: @znzjugod
2024-04-25 11:58:41 +00:00
Bing Xia
c6a2b9b56f rasdaemon: ras-mc-ctl: Modify check for HiSilicon KunPeng9xx error fields
Modify the check for valid HiSilicon KunPeng9xx error fields.
Fixes an issue where error data is not printed when its value is 0.
2024-04-25 14:52:52 +08:00
openeuler-ci-bot
dedbf6837a
!149 Fix cpu isolate errors when some cpus are offline before the service started
From: @Lostwayzxc 
Reviewed-by: @znzjugod 
Signed-off-by: @znzjugod
2024-04-24 07:08:25 +00:00
Shengwei Luo
be4fee4058 Fix cpu isolate errors when some cpus are offline before the service started 2024-04-23 17:53:17 +08:00
9 changed files with 677 additions and 1 deletions


@ -0,0 +1,38 @@
From a0cf58e6c96bb5e2646da9fd43e1ddd285a6e8da Mon Sep 17 00:00:00 2001
From: Bing Xia <xiabing14@h-partners.com>
Date: Sun, 19 Jan 2025 11:08:26 +0000
Subject: [PATCH 1/4] rasdaemon: Fix some compilation alarms in ras-record.h.
Fix the problem that the type of a constant string does not match
when it is assigned to a character pointer.
Signed-off-by: Bing Xia <xiabing14@h-partners.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
ras-record.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/ras-record.h b/ras-record.h
index 5eab62c..eec0702 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -318,12 +318,12 @@ struct sqlite3_priv {
};
struct db_fields {
- char *name;
- char *type;
+ const char *name;
+ const char *type;
};
struct db_table_descriptor {
- char *name;
+ const char *name;
const struct db_fields *fields;
size_t num_fields;
};
--
2.25.1
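
Review note (illustrative, not part of the patch): string literals should only be reached through `const char *`, and compilers can warn when a plain `char *` is initialized from one (GCC's `-Wwrite-strings` is one such option; which flag the authors actually hit is an assumption). The sketch below shows the patched pattern in isolation. The names `demo_field`, `demo_fields` and `demo_field_type` are hypothetical and do not exist in rasdaemon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the patched layout; not copied from ras-record.h. */
struct demo_field {
	const char *name;	/* points at a string literal, so it is const */
	const char *type;
};

static const struct demo_field demo_fields[] = {
	{ .name = "id",        .type = "INTEGER PRIMARY KEY" },
	{ .name = "timestamp", .type = "TEXT" },
};

/* Return the SQL type for a column name, or NULL if the name is unknown. */
static const char *demo_field_type(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(demo_fields) / sizeof(demo_fields[0]); i++) {
		if (strcmp(demo_fields[i].name, name) == 0)
			return demo_fields[i].type;
	}
	return NULL;
}
```

Because the members are `const char *`, the table can be initialized from literals without a cast, and callers cannot write through the returned pointer by accident.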


@ -0,0 +1,46 @@
From 1d83921b5ad4dc03a411abf759c1025e548ce3e0 Mon Sep 17 00:00:00 2001
From: Bing Xia <xiabing14@h-partners.com>
Date: Sun, 19 Jan 2025 11:26:43 +0000
Subject: [PATCH 2/4] rasdaemon: Fix few compilation warnings in non standard
hisilicon code
Fix the problem that the type of a constant string does not match
when it is assigned to a character pointer.
Signed-off-by: Bing Xia <xiabing14@h-partners.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
non-standard-hisi_hip08.c | 2 +-
non-standard-hisilicon.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/non-standard-hisi_hip08.c b/non-standard-hisi_hip08.c
index de5b5e9..3b8d93c 100644
--- a/non-standard-hisi_hip08.c
+++ b/non-standard-hisi_hip08.c
@@ -462,7 +462,7 @@ static const char * const oem_submodule_name(const struct hisi_module_info *info
return "unknown";
}
-static char *pcie_local_sub_module_name(uint8_t id)
+static const char *pcie_local_sub_module_name(uint8_t id)
{
switch (id) {
case HISI_PCIE_SUB_MODULE_ID_AP: return "AP_Layer";
diff --git a/non-standard-hisilicon.h b/non-standard-hisilicon.h
index 44da9e7..afd5e83 100644
--- a/non-standard-hisilicon.h
+++ b/non-standard-hisilicon.h
@@ -24,7 +24,7 @@ enum hisi_oem_data_type {
};
/* helper functions */
-static inline char *err_severity(uint8_t err_sev)
+static inline const char *err_severity(uint8_t err_sev)
{
switch (err_sev) {
case HISI_ERR_SEVERITY_NFE: return "recoverable";
--
2.25.1


@ -0,0 +1,198 @@
From f717326ae7ffb30d6da09840c7e4613793d0113e Mon Sep 17 00:00:00 2001
From: Qizhi Zhang <zhangqizhi3@h-partners.com>
Date: Tue, 8 Apr 2025 10:55:06 +0800
Subject: [PATCH] rasdaemon: Fix some static check warning
Replace the original chain of if branches with the decode_int_fields()
and decode_text_fields() helper functions, reducing the cyclomatic
complexity.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
non-standard-hisilicon.c | 135 ++++++++++++++++++---------------------
1 file changed, 61 insertions(+), 74 deletions(-)
diff --git a/non-standard-hisilicon.c b/non-standard-hisilicon.c
index 2b00ed6..53b36ed 100644
--- a/non-standard-hisilicon.c
+++ b/non-standard-hisilicon.c
@@ -144,7 +144,6 @@ int step_vendor_data_tab(struct ras_ns_ev_decoder *ev_decoder, const char *name)
}
#endif
-#ifdef HAVE_SQLITE3
static const struct db_fields hisi_common_section_fields[] = {
{ .name = "id", .type = "INTEGER PRIMARY KEY" },
{ .name = "timestamp", .type = "TEXT" },
@@ -164,6 +163,7 @@ static const struct db_fields hisi_common_section_fields[] = {
{ .name = "regs_dump", .type = "TEXT" },
};
+#ifdef HAVE_SQLITE3
static const struct db_table_descriptor hisi_common_section_tab = {
.name = "hisi_common_section_v2",
.fields = hisi_common_section_fields,
@@ -245,81 +245,33 @@ static void decode_module(struct ras_ns_ev_decoder *ev_decoder,
}
}
-static void decode_hisi_common_section_hdr(struct ras_ns_ev_decoder *ev_decoder,
- const struct hisi_common_error_section *err,
- struct hisi_event *event)
+static void decode_int_fields(struct ras_ns_ev_decoder *ev_decoder, int id,
+ uint16_t data, struct hisi_event *event, bool valid)
{
- HISI_SNPRINTF(event->error_msg, "[ table_version=%hhu", err->version);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_VERSION,
- err->version, NULL);
- if (err->val_bits & BIT(HISI_COMMON_VALID_SOC_ID)) {
- HISI_SNPRINTF(event->error_msg, "soc=%s", get_soc_desc(err->soc_id));
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_SOC_ID,
- err->soc_id, NULL);
- }
-
- if (err->val_bits & BIT(HISI_COMMON_VALID_SOCKET_ID)) {
- HISI_SNPRINTF(event->error_msg, "socket_id=%hhu", err->socket_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_SOCKET_ID,
- err->socket_id, NULL);
- }
+ if (!valid)
+ return;
- if (err->val_bits & BIT(HISI_COMMON_VALID_TOTEM_ID)) {
- HISI_SNPRINTF(event->error_msg, "totem_id=%hhu", err->totem_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_TOTEM_ID,
- err->totem_id, NULL);
+ if (id == HISI_COMMON_FIELD_SOC_ID) {
+ HISI_SNPRINTF(event->error_msg, "soc=%s", get_soc_desc(data));
+ } else {
+ HISI_SNPRINTF(event->error_msg, "%s=%hu",
+ hisi_common_section_fields[id].name, data);
}
- if (err->val_bits & BIT(HISI_COMMON_VALID_NIMBUS_ID)) {
- HISI_SNPRINTF(event->error_msg, "nimbus_id=%hhu", err->nimbus_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_NIMBUS_ID,
- err->nimbus_id, NULL);
- }
+ record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT, id, data, NULL);
+}
- if (err->val_bits & BIT(HISI_COMMON_VALID_SUBSYSTEM_ID)) {
- HISI_SNPRINTF(event->error_msg, "subsystem_id=%hhu", err->subsystem_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_SUB_SYSTEM_ID,
- err->subsystem_id, NULL);
- }
+static void decode_text_fields(struct ras_ns_ev_decoder *ev_decoder, int id,
+ const struct hisi_common_error_section *err,
+ struct hisi_event *event, bool valid)
+{
+ if (!valid)
+ return;
- if (err->val_bits & BIT(HISI_COMMON_VALID_MODULE_ID))
+ if (id == HISI_COMMON_FIELD_MODULE_ID)
decode_module(ev_decoder, event, err->module_id);
- if (err->val_bits & BIT(HISI_COMMON_VALID_SUBMODULE_ID)) {
- HISI_SNPRINTF(event->error_msg, "submodule_id=%hhu", err->submodule_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_SUB_MODULE_ID,
- err->submodule_id, NULL);
- }
-
- if (err->val_bits & BIT(HISI_COMMON_VALID_CORE_ID)) {
- HISI_SNPRINTF(event->error_msg, "core_id=%hhu", err->core_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_CORE_ID,
- err->core_id, NULL);
- }
-
- if (err->val_bits & BIT(HISI_COMMON_VALID_PORT_ID)) {
- HISI_SNPRINTF(event->error_msg, "port_id=%hhu", err->port_id);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_PORT_ID,
- err->port_id, NULL);
- }
-
- if (err->val_bits & BIT(HISI_COMMON_VALID_ERR_TYPE)) {
- HISI_SNPRINTF(event->error_msg, "err_type=%hu", err->err_type);
- record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_INT,
- HISI_COMMON_FIELD_ERR_TYPE,
- err->err_type, NULL);
- }
-
- if (err->val_bits & BIT(HISI_COMMON_VALID_PCIE_INFO)) {
+ if (id == HISI_COMMON_FIELD_PCIE_INFO) {
HISI_SNPRINTF(event->error_msg, "pcie_device_id=%04x:%02x:%02x.%x",
err->pcie_info.segment, err->pcie_info.bus,
err->pcie_info.device, err->pcie_info.function);
@@ -327,16 +279,51 @@ static void decode_hisi_common_section_hdr(struct ras_ns_ev_decoder *ev_decoder,
err->pcie_info.segment, err->pcie_info.bus,
err->pcie_info.device, err->pcie_info.function);
record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_TEXT,
- HISI_COMMON_FIELD_PCIE_INFO,
- 0, event->pcie_info);
+ id, 0, event->pcie_info);
}
- if (err->val_bits & BIT(HISI_COMMON_VALID_ERR_SEVERITY)) {
- HISI_SNPRINTF(event->error_msg, "err_severity=%s", err_severity(err->err_severity));
+ if (id == HISI_COMMON_FIELD_ERR_SEVERITY) {
+ HISI_SNPRINTF(event->error_msg, "err_severity=%s",
+ err_severity(err->err_severity));
record_vendor_data(ev_decoder, HISI_OEM_DATA_TYPE_TEXT,
- HISI_COMMON_FIELD_ERR_SEVERITY,
- 0, err_severity(err->err_severity));
+ id, 0, err_severity(err->err_severity));
}
+}
+
+static void decode_hisi_common_section_hdr(struct ras_ns_ev_decoder *ev_decoder,
+ const struct hisi_common_error_section *err,
+ struct hisi_event *event)
+{
+ HISI_SNPRINTF(event->error_msg, "[");
+
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_VERSION, err->version, event, 1);
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_SOC_ID, err->soc_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_SOC_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_SOCKET_ID, err->socket_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_SOCKET_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_TOTEM_ID, err->totem_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_TOTEM_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_NIMBUS_ID, err->nimbus_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_NIMBUS_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_SUB_SYSTEM_ID, err->subsystem_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_SUBSYSTEM_ID));
+
+ decode_text_fields(ev_decoder, HISI_COMMON_FIELD_MODULE_ID, err, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_MODULE_ID));
+
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_SUB_MODULE_ID, err->submodule_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_SUBMODULE_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_CORE_ID, err->core_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_CORE_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_PORT_ID, err->port_id, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_PORT_ID));
+ decode_int_fields(ev_decoder, HISI_COMMON_FIELD_ERR_TYPE, err->err_type, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_ERR_TYPE));
+
+ decode_text_fields(ev_decoder, HISI_COMMON_FIELD_PCIE_INFO, err, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_PCIE_INFO));
+ decode_text_fields(ev_decoder, HISI_COMMON_FIELD_ERR_SEVERITY, err, event,
+ err->val_bits & BIT(HISI_COMMON_VALID_ERR_SEVERITY));
HISI_SNPRINTF(event->error_msg, "]");
}
--
2.25.1
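
Review note (illustrative, not part of the patch): the refactoring above moves the "is this field valid?" test into a helper so the caller becomes a flat list of calls. A minimal sketch of the same idea follows. All `demo_*` and `DEMO_*` names are hypothetical, and the sketch assumes that the valid-bit position equals the field index, which the real code does not assume (it keeps separate `HISI_COMMON_VALID_*` and `HISI_COMMON_FIELD_*` constants).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_BIT(n) (1U << (n))

enum demo_field_id { DEMO_SOCKET_ID, DEMO_CORE_ID, DEMO_PORT_ID, DEMO_NUM_FIELDS };

static const char * const demo_names[DEMO_NUM_FIELDS] = {
	[DEMO_SOCKET_ID] = "socket_id",
	[DEMO_CORE_ID]   = "core_id",
	[DEMO_PORT_ID]   = "port_id",
};

/* Append "name=value " to buf, but only when the field is marked valid. */
static void demo_decode_int(char *buf, size_t size, int id,
			    uint16_t data, bool valid)
{
	size_t len;

	assert(id >= 0 && id < DEMO_NUM_FIELDS);
	if (!valid)
		return;

	len = strlen(buf);
	if (len < size)
		snprintf(buf + len, size - len, "%s=%hu ", demo_names[id], data);
}

/* Decode every field; a nonzero masked bit converts to true for 'valid'. */
static void demo_decode(char *buf, size_t size, uint32_t val_bits,
			const uint16_t *values)
{
	int id;

	buf[0] = '\0';
	for (id = 0; id < DEMO_NUM_FIELDS; id++)
		demo_decode_int(buf, size, id, values[id],
				val_bits & DEMO_BIT(id));
}
```

Note that validity comes from the bit mask, not from the value: a field whose value is 0 is still printed when its valid bit is set.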


@ -0,0 +1,40 @@
From 34e16f737fdcc90ed396a4b6074c6f2e3573e1d1 Mon Sep 17 00:00:00 2001
From: Qizhi Zhang <zhangqizhi3@h-partners.com>
Date: Tue, 8 Apr 2025 11:15:55 +0800
Subject: [PATCH] rasdaemon: Add new modules supported by HiSilicon common
section
Add new modules supported by HiSilicon common error section.
Signed-off-by: Bing Xia <xiabing14@h-partners.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
non-standard-hisilicon.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/non-standard-hisilicon.c b/non-standard-hisilicon.c
index 2b00ed6..d294442 100644
--- a/non-standard-hisilicon.c
+++ b/non-standard-hisilicon.c
@@ -219,6 +219,17 @@ static const char* module_name[] = {
"SDMA",
"UC",
"HBMC",
+ "PMC",
+ "SCHE",
+ "ASMB_DFS",
+ "ASMB_NTU",
+ "UB",
+ "UMMU",
+ "PCU",
+ "UCMI",
+ "DJTAGM",
+ "CFGBUS",
+ "MPU",
};
static const char* get_soc_desc(uint8_t soc_id)
--
2.25.1
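
Review note (illustrative, not part of the patch): a name table like `module_name[]` is typically indexed by a module ID taken from the error record, so the lookup needs a bounds check for IDs newer than the table. The code that consumes `module_name[]` is not shown in this hunk, so the sketch below is an assumption about the general pattern only. `demo_module_name` and `demo_get_module_name` are hypothetical, and the indices are not the real HiSilicon module IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical excerpt; positions do not reflect real module IDs. */
static const char * const demo_module_name[] = {
	"SDMA",
	"UC",
	"HBMC",
	"PMC",
};

/* Map an ID to a name, falling back to "unknown" for out-of-range IDs. */
static const char *demo_get_module_name(uint8_t id)
{
	size_t count = sizeof(demo_module_name) / sizeof(demo_module_name[0]);

	if (id >= count)
		return "unknown";
	return demo_module_name[id];
}
```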


@ -0,0 +1,32 @@
From 77600e0cd71cd5c34126635b199e7b66f4d74874 Mon Sep 17 00:00:00 2001
From: Shengwei Luo <luoshengwei@huawei.com>
Date: Tue, 23 Apr 2024 17:09:10 +0800
Subject: [PATCH] rasdaemon: Fix cpu isolate errors when some cpus are offline
before the service started.
The upstream patch uses sysconf(_SC_NPROCESSORS_ONLN) instead of
sysconf(_SC_NPROCESSORS_CONF). However, ras_cpu_isolation_init()
needs information about all configured CPUs, so pass it that count.
Fixes: f1ea76375281 ("rasdaemon: Check CPUs online, not configured")
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
---
ras-events.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ras-events.c b/ras-events.c
index ffac02b..1aa6db6 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -950,7 +950,7 @@ int handle_ras_events(int record_events)
cpus = get_num_cpus(ras);
#ifdef HAVE_CPU_FAULT_ISOLATION
- ras_cpu_isolation_init(cpus);
+ ras_cpu_isolation_init(sysconf(_SC_NPROCESSORS_CONF));
#endif
#ifdef HAVE_MCE
--
2.33.0
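
Review note (illustrative, not part of the patch): the two `sysconf()` queries can differ when CPUs are offline, which is why a per-CPU table sized from the online count may be too small. The sketch below, assuming Linux with glibc, reads both values. `demo_cpu_counts`, `demo_get_cpu_counts` and `demo_percpu_table_size` are hypothetical helpers, not rasdaemon functions.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper type; rasdaemon does not define it. */
struct demo_cpu_counts {
	long configured;	/* CPUs the system is configured with */
	long online;		/* CPUs that are online right now */
};

static struct demo_cpu_counts demo_get_cpu_counts(void)
{
	struct demo_cpu_counts c;

	c.configured = sysconf(_SC_NPROCESSORS_CONF);
	c.online = sysconf(_SC_NPROCESSORS_ONLN);
	return c;
}

/*
 * Size for a per-CPU table that must also cover CPUs that are offline
 * at startup: take the larger of the two counts, and never less than 1.
 */
static long demo_percpu_table_size(void)
{
	struct demo_cpu_counts c = demo_get_cpu_counts();
	long n = c.configured > c.online ? c.configured : c.online;

	return n > 0 ? n : 1;
}
```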


@ -0,0 +1,136 @@
From c306d693f86cd9e128a103b1670b653613eb78d2 Mon Sep 17 00:00:00 2001
From: luckky <guodashun1@huawei.com>
Date: Mon, 16 Dec 2024 17:20:04 +0800
Subject: [PATCH] bugfix set to default when param is overflow
1. In this patch, we check whether the value overflows before parsing the
value with its unit. We replaced sscanf with strtoul because strtoul
reports overflow clearly by setting errno to ERANGE.
2. When the value overflows, sscanf produces undefined behavior
(https://man7.org/linux/man-pages/man3/sscanf.3.html#BUGS).
The truncated final value is confusing, so this patch falls back
to the default value when the value overflows.
---
ras-page-isolation.c | 56 +++++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 21 deletions(-)
diff --git a/ras-page-isolation.c b/ras-page-isolation.c
index caa8c31..ed07b70 100644
--- a/ras-page-isolation.c
+++ b/ras-page-isolation.c
@@ -44,6 +44,7 @@ static struct isolation threshold = {
.units = threshold_units,
.env = "50",
.unit = "",
+ .val = 50,
};
static struct isolation cycle = {
@@ -51,6 +52,7 @@ static struct isolation cycle = {
.units = cycle_units,
.env = "24h",
.unit = "h",
+ .val = 86400,
};
static const char *kernel_offline[] = {
@@ -106,10 +108,24 @@ static void page_offline_init(void)
offline_choice[offline].name);
}
+/*
+ * The 'parse_isolation_env' will parse the real value from the env settings
+ * in config file. The valid format of the env is pure positive number
+ * (like '12345') or a positive number with specific units (like '24h').
+ * When the unit is not set, we use the default unit (threshold for '' and
+ * cycle for 'h').
+ * The number is only supported in decimal, while others will produce errors.
+ * This function will parse the high level units to base units (like 'h' is
+ * a high level unit and 's' is a base unit).
+ * The valid value range is [1, ULONG_MAX], and when the value is out of
+ * range (whether the origin pure number without units or the parsed number
+ * with the base units), the value will be set to the default value.
+ */
static void parse_isolation_env(struct isolation *config)
{
char *env = getenv(config->name);
char *unit = NULL;
+ char *endptr = NULL;
const struct config *units = NULL;
int i, no_unit;
int valid = 0;
@@ -146,43 +162,41 @@ static void parse_isolation_env(struct isolation *config)
parse:
/* if invalid, use default env */
if (valid) {
- config->env = env;
if (!no_unit)
config->unit = unit;
} else {
+ env = config->env;
log(TERM, LOG_INFO, "Improper %s, set to default %s.\n",
config->name, config->env);
}
/* if env value string is greater than ulong_max, truncate the last digit */
- sscanf(config->env, "%lu", &value);
+ errno = 0;
+ value = strtoul(env, &endptr, 10);
+ if (errno == ERANGE)
+ config->overflow = true;
for (units = config->units; units->name; units++) {
if (!strcasecmp(config->unit, units->name))
unit_matched = 1;
if (unit_matched) {
tmp = value;
value *= units->val;
- if (tmp != 0 && value / tmp != units->val)
+ if (tmp != 0 && value / tmp != units->val) {
config->overflow = true;
+ break;
+ }
}
}
- config->val = value;
- /* In order to output value and unit perfectly */
- config->unit = no_unit ? config->unit : "";
-}
-
-static void parse_env_string(struct isolation *config, char *str, unsigned int size)
-{
- int i;
-
- if (config->overflow) {
- /* when overflow, use basic unit */
- for (i = 0; config->units[i].name; i++) ;
- snprintf(str, size, "%lu%s", config->val, config->units[i-1].name);
- log(TERM, LOG_INFO, "%s is set overflow(%s), truncate it\n",
- config->name, config->env);
+ if (!config->overflow) {
+ config->val = value;
+ config->env = env;
+ /* In order to output value and unit perfectly */
+ config->unit = no_unit ? config->unit : "";
} else {
- snprintf(str, size, "%s%s", config->env, config->unit);
+ log(TERM, LOG_INFO, "%s is set overflow(%s), set to default %s\n",
+ config->name, env, config->env);
+ /* In order to output value and unit perfectly */
+ config->unit = "";
}
}
@@ -199,8 +213,8 @@ static void page_isolation_init(void)
parse_isolation_env(&threshold);
parse_isolation_env(&cycle);
- parse_env_string(&threshold, threshold_string, sizeof(threshold_string));
- parse_env_string(&cycle, cycle_string, sizeof(cycle_string));
+ snprintf(threshold_string, sizeof(threshold_string), "%s%s", threshold.env, threshold.unit);
+ snprintf(cycle_string, sizeof(cycle_string), "%s%s", cycle.env, cycle.unit);
log(TERM, LOG_INFO, "Threshold of memory Corrected Errors is %s / %s\n",
threshold_string, cycle_string);
}
--
2.43.0
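
Review note (illustrative, not part of the patch): the overflow handling combines two checks, `errno == ERANGE` after `strtoul()` for the raw number and a separate check for the unit multiplication. The sketch below shows both in one hypothetical function, `demo_parse_scaled`. It takes a single unit factor instead of walking the unit table as the patch does, and it checks before multiplying rather than dividing afterwards, so it is a simplification and not the patched code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal number at the start of s and multiply it by factor.
 * Return def when s does not start with a digit, the number is 0,
 * the number overflows unsigned long, or the multiplication would.
 * Trailing characters (for example a unit suffix) are ignored here.
 */
static unsigned long demo_parse_scaled(const char *s, unsigned long factor,
				       unsigned long def)
{
	char *end = NULL;
	unsigned long value;

	assert(s != NULL && factor != 0);

	/* strtoul() would accept a sign or leading spaces; reject them. */
	if (*s < '0' || *s > '9')
		return def;

	errno = 0;
	value = strtoul(s, &end, 10);
	if (end == s || errno == ERANGE || value == 0)
		return def;

	/* Check before multiplying so the product cannot wrap around. */
	if (value > ULONG_MAX / factor)
		return def;

	return value * factor;
}
```

Clearing `errno` before the call matters because `strtoul()` only sets it on failure and never resets it.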


@ -0,0 +1,25 @@
From 8f9b6aeb13884696fbf17da7ac28111a672c1301 Mon Sep 17 00:00:00 2001
From: root <zhangnan134@huawei.com>
Date: Sat, 19 Apr 2025 10:04:20 +0800
Subject: [PATCH] fix ras-mc-ctl --summary failed
---
misc/rasdaemon.service.in | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/misc/rasdaemon.service.in b/misc/rasdaemon.service.in
index 508aa3c..4ef3d2c 100644
--- a/misc/rasdaemon.service.in
+++ b/misc/rasdaemon.service.in
@@ -4,7 +4,7 @@ After=syslog.target
[Service]
EnvironmentFile=@SYSCONFDEFDIR@/rasdaemon
-ExecStart=@sbindir@/rasdaemon -f
+ExecStart=@sbindir@/rasdaemon -f -r
ExecStartPost=@sbindir@/rasdaemon --enable
ExecStop=@sbindir@/rasdaemon --disable
Restart=on-abort
--
2.27.0


@ -0,0 +1,122 @@
From bcc5779d52269b5a0b7bae42aaf2a3e650587bdb Mon Sep 17 00:00:00 2001
From: Shiju Jose <shiju.jose@huawei.com>
Date: Thu, 24 Aug 2023 13:07:17 +0100
Subject: [PATCH 12/12] rasdaemon: ras-mc-ctl: Modify check for HiSilicon
KunPeng9xx error fields
Modify the check for valid HiSilicon KunPeng9xx error fields.
Fixes an issue where error data is not printed when its value is 0.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---
util/ras-mc-ctl.in | 72 +++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 36 deletions(-)
diff --git a/util/ras-mc-ctl.in b/util/ras-mc-ctl.in
index 4178dcf..07e6fca 100755
--- a/util/ras-mc-ctl.in
+++ b/util/ras-mc-ctl.in
@@ -1672,13 +1672,13 @@ sub vendor_errors
if ($module eq 0 || ($module_id && uc($module) eq uc($module_id))) {
$out .= "$id. $timestamp Error Info: ";
$out .= "version=$version, ";
- $out .= "soc_id=$soc_id, " if ($soc_id);
- $out .= "socket_id=$socket_id, " if ($socket_id);
- $out .= "nimbus_id=$nimbus_id, " if ($nimbus_id);
- $out .= "module_id=$module_id, " if ($module_id);
- $out .= "sub_module_id=$sub_module_id, " if ($sub_module_id);
- $out .= "err_severity=$err_severity, " if ($err_severity);
- $out .= "Error Registers: $regs " if ($regs);
+ $out .= "soc_id=$soc_id, " if (defined $soc_id && length $soc_id);
+ $out .= "socket_id=$socket_id, " if (defined $socket_id && length $socket_id);
+ $out .= "nimbus_id=$nimbus_id, " if (defined $nimbus_id && length $nimbus_id);
+ $out .= "module_id=$module_id, " if (defined $module_id && length $module_id);
+ $out .= "sub_module_id=$sub_module_id, " if (defined $sub_module_id && length $sub_module_id);
+ $out .= "err_severity=$err_severity, " if (defined $err_severity && length $err_severity);
+ $out .= "Error Registers: $regs " if (defined $regs && length $regs);
$out .= "\n\n";
$found_module = 1;
}
@@ -1697,13 +1697,13 @@ sub vendor_errors
if ($module eq 0 || ($module_id && uc($module) eq uc($module_id))) {
$out .= "$id. $timestamp Error Info: ";
$out .= "version=$version, ";
- $out .= "soc_id=$soc_id, " if ($soc_id);
- $out .= "socket_id=$socket_id, " if ($socket_id);
- $out .= "nimbus_id=$nimbus_id, " if ($nimbus_id);
- $out .= "module_id=$module_id, " if ($module_id);
- $out .= "sub_module_id=$sub_module_id, " if ($sub_module_id);
- $out .= "err_severity=$err_severity, " if ($err_severity);
- $out .= "Error Registers: $regs " if ($regs);
+ $out .= "soc_id=$soc_id, " if (defined $soc_id && length $soc_id);
+ $out .= "socket_id=$socket_id, " if (defined $socket_id && length $socket_id);
+ $out .= "nimbus_id=$nimbus_id, " if (defined $nimbus_id && length $nimbus_id);
+ $out .= "module_id=$module_id, " if (defined $module_id && length $module_id);
+ $out .= "sub_module_id=$sub_module_id, " if (defined $sub_module_id && length $sub_module_id);
+ $out .= "err_severity=$err_severity, " if (defined $err_severity && length $err_severity);
+ $out .= "Error Registers: $regs " if (defined $regs && length $regs);
$out .= "\n\n";
$found_module = 1;
}
@@ -1722,15 +1722,15 @@ sub vendor_errors
if ($module eq 0 || ($sub_module_id && uc($module) eq uc($sub_module_id))) {
$out .= "$id. $timestamp Error Info: ";
$out .= "version=$version, ";
- $out .= "soc_id=$soc_id, " if ($soc_id);
- $out .= "socket_id=$socket_id, " if ($socket_id);
- $out .= "nimbus_id=$nimbus_id, " if ($nimbus_id);
- $out .= "sub_module_id=$sub_module_id, " if ($sub_module_id);
- $out .= "core_id=$core_id, " if ($core_id);
- $out .= "port_id=$port_id, " if ($port_id);
- $out .= "err_severity=$err_severity, " if ($err_severity);
- $out .= "err_type=$err_type, " if ($err_type);
- $out .= "Error Registers: $regs " if ($regs);
+ $out .= "soc_id=$soc_id, " if (defined $soc_id && length $soc_id);
+ $out .= "socket_id=$socket_id, " if (defined $socket_id && length $socket_id);
+ $out .= "nimbus_id=$nimbus_id, " if (defined $nimbus_id && length $nimbus_id);
+ $out .= "sub_module_id=$sub_module_id, " if (defined $sub_module_id && length $sub_module_id);
+ $out .= "core_id=$core_id, " if (defined $core_id && length $core_id);
+ $out .= "port_id=$port_id, " if (defined $port_id && length $port_id);
+ $out .= "err_severity=$err_severity, " if (defined $err_severity && length $err_severity);
+ $out .= "err_type=$err_type, " if (defined $err_type && length $err_type);
+ $out .= "Error Registers: $regs " if (defined $regs && length $regs);
$out .= "\n\n";
$found_module = 1;
}
@@ -1749,19 +1749,19 @@ sub vendor_errors
if ($module eq 0 || ($module_id && uc($module) eq uc($module_id))) {
$out .= "$id. $timestamp Error Info: ";
$out .= "version=$version, ";
- $out .= "soc_id=$soc_id, " if ($soc_id);
- $out .= "socket_id=$socket_id, " if ($socket_id);
- $out .= "totem_id=$totem_id, " if ($totem_id);
- $out .= "nimbus_id=$nimbus_id, " if ($nimbus_id);
- $out .= "sub_system_id=$sub_system_id, " if ($sub_system_id);
- $out .= "module_id=$module_id, " if ($module_id);
- $out .= "sub_module_id=$sub_module_id, " if ($sub_module_id);
- $out .= "core_id=$core_id, " if ($core_id);
- $out .= "port_id=$port_id, " if ($port_id);
- $out .= "err_type=$err_type, " if ($err_type);
- $out .= "pcie_info=$pcie_info, " if ($pcie_info);
- $out .= "err_severity=$err_severity, " if ($err_severity);
- $out .= "Error Registers: $regs" if ($regs);
+ $out .= "soc_id=$soc_id, " if (defined $soc_id && length $soc_id);
+ $out .= "socket_id=$socket_id, " if (defined $socket_id && length $socket_id);
+ $out .= "totem_id=$totem_id, " if (defined $totem_id && length $totem_id);
+ $out .= "nimbus_id=$nimbus_id, " if (defined $nimbus_id && length $nimbus_id);
+ $out .= "sub_system_id=$sub_system_id, " if (defined $sub_system_id && length $sub_system_id);
+ $out .= "module_id=$module_id, " if (defined $module_id && length $module_id);
+ $out .= "sub_module_id=$sub_module_id, " if (defined $sub_module_id && length $sub_module_id);
+ $out .= "core_id=$core_id, " if (defined $core_id && length $core_id );
+ $out .= "port_id=$port_id, " if (defined $port_id && length $port_id);
+ $out .= "err_type=$err_type, " if (defined $err_type && length $err_type);
+ $out .= "pcie_info=$pcie_info, " if (defined $pcie_info && length $pcie_info);
+ $out .= "err_severity=$err_severity, " if (defined $err_severity && length $err_severity);
+ $out .= "Error Registers: $regs" if (defined $regs && length $regs);
$out .= "\n\n";
$found_module = 1;
}
--
2.25.1


@ -1,6 +1,6 @@
Name: rasdaemon
Version: 0.8.0
-Release: 3
+Release: 8
License: GPLv2
Summary: Utility to get Platform Reliability, Availability and Serviceability (RAS) reports via the Kernel tracing events
URL: https://github.com/mchehab/rasdaemon.git
@ -32,6 +32,14 @@ Patch9005: 0002-rasdaemon-fix-issue-of-signed-and-unsigned-integer-c.patch
Patch9006: 0003-rasdaemon-Add-support-for-creating-the-vendor-error-.patch
Patch9007: backport-Check-CPUs-online-not-configured.patch
Patch9008: backport-rasdaemon-diskerror-fix-incomplete-diskerror-log.patch
+Patch9009: bugfix-fix-cpu-isolate-errors-when-some-cpus-are-.patch
+Patch9010: rasdaemon-ras-mc-ctl-Modify-check-for-HiSilicon-KunP.patch
+Patch9011: bugfix-set-to-default-when-param-is-overflow.patch
+Patch9012: fix-ras-mc-ctl-summary-failed.patch
+Patch9013: 0001-rasdaemon-Fix-some-compilation-alarms-in-ras-record..patch
+Patch9014: 0002-rasdaemon-Fix-few-compilation-warnings-in-non-standa.patch
+Patch9015: 0003-rasdaemon-Fix-some-static-check-warning.patch
+Patch9016: 0004-rasdaemon-Add-new-modules-supported-by-HiSilicon-com.patch
%description
The rasdaemon program is a daemon which monitors the platform
@ -83,6 +91,37 @@ fi
/usr/bin/systemctl disable rasdaemon.service >/dev/null 2>&1 || :
%changelog
+* Wed Apr 30 2025 wangjiang <app@cameyan.com> - 0.8.0-8
+- Type:bugfix
+- ID:NA
+- SUG:NA
+- DESC:Add some improvements and add new modules to HiSilicon common section
+* Sat Apr 19 2025 zhangnan <zhangnan134@huawei.com> - 0.8.0-7
+- Type:bugfix
+- ID:NA
+- SUG:NA
+- DESC:fix ras-mc-ctl --summary failed
+* Mon Dec 16 2024 guodashun <guodashun1@huawei.com> - 0.8.0-6
+- Type:bugfix
+- ID:NA
+- SUG:NA
+- DESC:set to default when param is overflow
+* Tue Apr 23 2024 Bing Xia <xiabing12@h-partners.com> - 0.8.0-5
+- Type:bugfix
+- ID:NA
+- SUG:NA
+- DESC:Modify check for HiSilicon KunPeng9xx error fields.
+* Tue Apr 23 2024 luoshengwei <luoshengwei@huawei.com> - 0.8.0-4
+- Type:bugfix
+- ID:NA
+- SUG:NA
+- DESC:fix cpu isolate errors when some cpus are offline
+before the service started
* Wed Mar 27 2024 zhuofeng <zhuofeng2@huawei.com> - 0.8.0-3
- Type:bugfix
- ID:NA