rubik/patch/0005-rubik-move-fssr-design-document-to-design-dir.patch
vegbir a1d5d5f63d rubik: support nri & optimize
Signed-off-by: vegbir <yangjiaqi16@huawei.com>
2024-09-23 03:35:19 +00:00


From 780a5e80311f5f3d188666733dcf276abc8e7e81 Mon Sep 17 00:00:00 2001
From: vegbir <yangjiaqi16@huawei.com>
Date: Wed, 14 Jun 2023 17:14:16 +0800
Subject: [PATCH 05/13] rubik: move fssr design document to design dir
Signed-off-by: vegbir <yangjiaqi16@huawei.com>
---
docs/{ => design}/fssr.md | 18 ++++++++++++++----
.../fssr/flowchart.png} | Bin
.../fssr/sequence_diagram.png} | Bin
3 files changed, 14 insertions(+), 4 deletions(-)
rename docs/{ => design}/fssr.md (90%)
rename docs/{png/rubik_fssr_2.png => images/fssr/flowchart.png} (100%)
rename docs/{png/rubik_fssr_1.png => images/fssr/sequence_diagram.png} (100%)
diff --git a/docs/fssr.md b/docs/design/fssr.md
similarity index 90%
rename from docs/fssr.md
rename to docs/design/fssr.md
index 3fb36bd..184b364 100644
--- a/docs/fssr.md
+++ b/docs/design/fssr.md
@@ -1,20 +1,27 @@
+# [Requirements Design] Asynchronous Tiered Memory Reclaim: the fssr Policy
+
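## Goals
+
In a co-located cluster, online and offline workloads are deployed on the same physical node. Offline workloads are memory-intensive, while online workloads have peaks and troughs, so memory contention between the two degrades the online workloads. This design aims to make full use of memory while guaranteeing online QoS.
## Overall Design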
+
The modules interact as follows:
-![](png/rubik_fssr_1.png)
+![sequence_diagram](../images/fssr/sequence_diagram.png)
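- The user deploys rubik, and rubik registers with k8s to watch pod events.
- When an offline workload is deployed, k8s notifies rubik, and rubik sets memory.high for that offline pod.
- Meanwhile, rubik monitors the node's memory usage in real time and uses the fssr policy to set memory.high for the pod.
### Dependencies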
+
The kernel must support memcg-level memory watermarks, that is, it must provide `memory.high` and `memory.high_async_ratio`.
### Detailed Design
+
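For tiered memory, rubik adds an FSSR memory-handling module. The module obtains the host node's total memory, reserved memory, and free memory, and sets memory.high for offline workloads according to the FSSR algorithm. The policy is as follows: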
-![](png/rubik_fssr_2.png)
+![flowchart](../images/fssr/flowchart.png)
+
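- At startup, rubik computes the reserved memory: by default 10% of total memory, capped at 10G if 10% of total memory exceeds 10G.
- Configure the cgroup-level watermarks of offline containers. The kernel provides two interfaces, `memory.high` and `memory.high_async_ratio`, which set the cgroup's soft limit and warning watermark respectively. When rubik starts, `memory.high` defaults to `total_memory` (total memory) `*` 80%.
- Obtain the free memory, free_memory.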
@@ -22,13 +29,16 @@
- When free_memory > 2`*`reserved_memory holds for one minute, raise the offline memory.high, each time by 1% of total memory (total_memory`*`1%).
Notes:
+
1. memory.high for offline applications stays within `[total_memory*30%, total_memory*80%]`.
### Configuration
-```
+
+```json
"dynMemory": {
"policy": "fssr"
}
```
+
- dynMemory denotes dynamic memory.
-- policy currently supports only fssr
\ No newline at end of file
+- policy currently supports only fssr
diff --git a/docs/png/rubik_fssr_2.png b/docs/images/fssr/flowchart.png
similarity index 100%
rename from docs/png/rubik_fssr_2.png
rename to docs/images/fssr/flowchart.png
diff --git a/docs/png/rubik_fssr_1.png b/docs/images/fssr/sequence_diagram.png
similarity index 100%
rename from docs/png/rubik_fssr_1.png
rename to docs/images/fssr/sequence_diagram.png
--
2.41.0
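
The arithmetic described in docs/design/fssr.md can be sketched as below. This is an illustrative sketch only, not rubik's implementation: all function names are hypothetical, "10G" is assumed to mean 10 GiB, and "holds for one minute" is modeled as every sample in a caller-supplied window satisfying the condition. The rule that sits between the two hunks is not shown in this patch, so it is left out rather than guessed.

```python
GIB = 1024 ** 3  # assumption: the document's "10G" is read as 10 GiB


def reserved_memory(total):
    """Reserved memory: 10% of total memory, capped at 10 GiB."""
    return min(total // 10, 10 * GIB)


def high_bounds(total):
    """Allowed range for the offline memory.high: [30%, 80%] of total memory."""
    return total * 30 // 100, total * 80 // 100


def initial_high(total):
    """Default memory.high at startup: 80% of total memory."""
    return total * 80 // 100


def should_raise(free_samples, reserved):
    """True if every free-memory sample in the window exceeds 2 * reserved.

    The window is assumed to cover one minute; how it is sampled is up to
    the caller and is not specified by the document.
    """
    return bool(free_samples) and all(f > 2 * reserved for f in free_samples)


def raise_high(current_high, total):
    """Raise memory.high by 1% of total memory, clamped to the allowed range."""
    low, high = high_bounds(total)
    return max(low, min(current_high + total // 100, high))
```

For example, on a 100 GiB node `raise_high(50 * GIB, 100 * GIB)` yields 51 GiB, and a value already at the 80% ceiling is left unchanged.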