From 4c64eac61570b4cfd4e77766639f144a8a93f713 Mon Sep 17 00:00:00 2001
From: vegbir <yangjiaqi16@huawei.com>
Date: Sat, 10 Jun 2023 11:41:04 +0800
Subject: [PATCH 04/13] rubik: add psi design documentation
Signed-off-by: vegbir <yangjiaqi16@huawei.com>
---
CHANGELOG/CHANGELOG-2.0.0.md | 29 +++++++--
docs/design/psi.md | 94 +++++++++++++++++++++++++++++
docs/images/psi/PSI_designation.svg | 16 +++++
docs/images/psi/PSI_implement.svg | 4 ++
4 files changed, 139 insertions(+), 4 deletions(-)
create mode 100644 docs/design/psi.md
create mode 100644 docs/images/psi/PSI_designation.svg
create mode 100644 docs/images/psi/PSI_implement.svg
diff --git a/CHANGELOG/CHANGELOG-2.0.0.md b/CHANGELOG/CHANGELOG-2.0.0.md
index 5cc2cb8..b46fa3d 100644
--- a/CHANGELOG/CHANGELOG-2.0.0.md
+++ b/CHANGELOG/CHANGELOG-2.0.0.md
@@ -1,16 +1,37 @@
-1. Architecture optimization:
+# CHANGELOG
+
+## v2.0.1
+
+### New Feature
+
+Before June 30, 2023
+
+1. **dynMemory** (asynchronous tiered memory reclamation): implement the fssr strategy
+2. **psi**: interference detection based on PSI metrics
+3. **quotaTurbo**: user-mode solution for elastic CPU limiting
+
+## v2.0.0
+
+### Architecture optimization
+
refactor rubik through `informer-podmanager-services` mechanism, decoupling modules and improving performance
-2. Interface change:
+
+### Interface change
+
- configuration file changes
- use the list-watch mechanism to get the pod instead of the http interface
-3. Feature enhancements:
+
+### Feature enhancements
+
- support elastic cpu limit user mode scheme-quotaturbo
- support psi index observation
- support memory asynchronous recovery feature (fssr optimization)
- support memory access bandwidth and LLC limit
- optimize the absolute preemption
- optimize the elastic cpu limiting kernel mode scheme-quotaburst
-4. Other optimizations:
+
+### Other optimizations
+
- document optimization
- typo fix
- compile option optimization
diff --git a/docs/design/psi.md b/docs/design/psi.md
new file mode 100644
index 0000000..674a8e0
--- /dev/null
+++ b/docs/design/psi.md
@@ -0,0 +1,94 @@
+# [Requirement Design] Interference Detection Based on PSI Metrics
+
+## Requirement Design Diagram
+
+![PSI_designation](../images/psi/PSI_designation.svg)
+
+## Implementation Approach
+
+### Introduction to PSI
+
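+PSI, short for Pressure Stall Information, measures the pressure on the system's three basic hardware resources: CPU, memory, and I/O. As the name suggests, a process stalls when it cannot obtain the resources it needs to run, and PSI measures how long such stalls last.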
+
+### Enabling PSI for cgroup v1
+
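+First, check whether PSI is enabled for cgroup v1. There are two ways: check whether the pressure files exist, or check whether the kernel boot command line contains the PSI options.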
+
+```bash
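+ls /sys/fs/cgroup/cpuacct/cpu.pressure  # method 1: the pressure file exists
+grep -w 'psi=1' /proc/cmdline && grep -w 'psi_v1=1' /proc/cmdline  # method 2: both options present, in any order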
+```
+
+If the options are missing, add them to the kernel boot command line:
+
+```bash
+# Check the kernel version
+uname -a
+# Linux openEuler 5.10.0-136.12.0.86.oe2203sp1.x86_64 #1
+# Locate the kernel image in /boot
+ls /boot/vmlinuz-5.10.0-136.12.0.86.oe2203sp1.x86_64
+# Append the boot parameters
+grubby --update-kernel="/boot/vmlinuz-5.10.0-136.12.0.86.oe2203sp1.x86_64" --args="psi=1 psi_v1=1"
+# Reboot so that the new parameters take effect
+reboot
+```
+
+The three PSI files can then be used to observe pressure data in cgroup v1.
+For example, the directory `/sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/<PodUID>/<container-longid>` contains the following files:
+
+- cpu.pressure
+- memory.pressure
+- io.pressure
+
+### Solution Workflow
+
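+From the PSI data, the `some avg10` value is used as the observed metric. It is the percentage of time, averaged over the last 10 seconds, during which at least one task was stalled on the resource.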
+
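+Users configure a threshold to guarantee resource availability and performance for online Pods. Specifically, when the stall percentage exceeds the threshold (5% by default), rubik evicts offline Pods according to a policy to release the corresponding resources.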
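The `some avg10` check described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not rubik's implementation: `parseSomeAvg10` and `readSomeAvg10` are hypothetical names, the sample values are made up, and "exceeds" is read as strictly greater than the threshold. A PSI file holds a `some` line (usually followed by a `full` line) of the form `some avg10=<pct> avg60=<pct> avg300=<pct> total=<n>`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSomeAvg10 returns the avg10 percentage from the "some" line of a PSI
// file such as cpu.pressure, memory.pressure or io.pressure.
func parseSomeAvg10(content string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "some" {
			continue // skip the "full" line and blank lines
		}
		for _, f := range fields[1:] {
			if strings.HasPrefix(f, "avg10=") {
				return strconv.ParseFloat(strings.TrimPrefix(f, "avg10="), 64)
			}
		}
		return 0, fmt.Errorf("no avg10 field in %q", sc.Text())
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New(`no "some" line found`)
}

// readSomeAvg10 reads one PSI file, e.g.
// /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/<PodUID>/<container-longid>/cpu.pressure
func readSomeAvg10(path string) (float64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseSomeAvg10(string(data))
}

func main() {
	// Made-up sample in the PSI text format; real data would come from readSomeAvg10.
	sample := "some avg10=6.25 avg60=3.10 avg300=1.02 total=123456\n" +
		"full avg10=2.00 avg60=1.00 avg300=0.50 total=65432\n"
	avg10, err := parseSomeAvg10(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	const threshold = 5.0 // default avg10Threshold, in percent
	fmt.Printf("some avg10=%.2f%%, exceeds threshold: %v\n", avg10, avg10 > threshold)
}
```

Only the `some` line is consulted because the design observes `some avg10`; the `full` line is skipped.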
+
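+Online and offline workloads are distinguished by the annotation `volcano.sh/preemptable`: `"true"` marks an offline (preemptable) workload and `"false"` an online one.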
+
+```yaml
+annotations:
+  volcano.sh/preemptable: "true"  # annotation values are strings, so the value is quoted
+```
+
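+When the CPU or memory pressure of online Pods is high, rubik evicts the offline workload that currently uses the most CPU or memory, respectively. When I/O pressure is high, rubik evicts the offline workload that uses the most CPU.
+> Note 1: cgroup currently offers only limited means of controlling I/O bandwidth, so it is hard to determine precisely which eviction would reduce I/O; CPU utilization is therefore used as the criterion for now.
+>
+> Note 2: The cadvisor library is used to obtain the offline workloads' CPU utilization, memory usage, I/O bandwidth, and other metrics in real time, and the workloads are sorted by the relevant metric in descending order.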
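The selection rules above (keep only offline Pods, then pick the largest consumer of the relevant resource, with I/O falling back to CPU) can be sketched as below. The `offlineCandidate` type, its fields, and `pickVictim` are hypothetical. The usage figures are assumed to have been collected already (the design uses cadvisor for this; the sketch does not call it), and treating Pods without the annotation as online is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// offlineCandidate is a hypothetical view of a Pod plus usage metrics that,
// in the design, would be gathered through cadvisor.
type offlineCandidate struct {
	Name        string
	Annotations map[string]string
	CPUUsage    float64 // CPU utilization; the unit is an assumption
	MemoryUsage uint64  // memory usage in bytes
}

// isOffline treats a Pod as offline only when volcano.sh/preemptable is "true".
func isOffline(annotations map[string]string) bool {
	return annotations["volcano.sh/preemptable"] == "true"
}

// pickVictim chooses the offline Pod to evict when the PSI of resource
// ("cpu", "memory" or "io") exceeds the threshold: the largest memory user
// for memory pressure, otherwise the largest CPU user (io uses CPU, see Note 1).
func pickVictim(pods []offlineCandidate, resource string) (offlineCandidate, bool) {
	var offline []offlineCandidate
	for _, p := range pods {
		if isOffline(p.Annotations) {
			offline = append(offline, p)
		}
	}
	if len(offline) == 0 {
		return offlineCandidate{}, false
	}
	// Sort a copy in descending order of the relevant metric.
	sort.Slice(offline, func(i, j int) bool {
		if resource == "memory" {
			return offline[i].MemoryUsage > offline[j].MemoryUsage
		}
		return offline[i].CPUUsage > offline[j].CPUUsage
	})
	return offline[0], true
}

func main() {
	offlineTag := map[string]string{"volcano.sh/preemptable": "true"}
	pods := []offlineCandidate{
		{Name: "online-web", Annotations: map[string]string{"volcano.sh/preemptable": "false"}, CPUUsage: 3.0, MemoryUsage: 8 << 30},
		{Name: "batch-a", Annotations: offlineTag, CPUUsage: 1.5, MemoryUsage: 4 << 30},
		{Name: "batch-b", Annotations: offlineTag, CPUUsage: 0.5, MemoryUsage: 6 << 30},
	}
	for _, res := range []string{"cpu", "memory", "io"} {
		if victim, ok := pickVictim(pods, res); ok {
			fmt.Printf("%s pressure -> evict %s\n", res, victim.Name)
		}
	}
}
```

The online Pod is never a candidate, even though it uses the most CPU and memory; sorting a filtered copy also leaves the caller's slice untouched.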
+
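+When suspicious objects need to be handled, the event-handling request is passed along a chain of responsibility, and the corresponding action is executed.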
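A chain of responsibility keeps each check small and independent: every handler either stops the event or forwards it to the next link. The sketch below is a hypothetical chain (threshold check, resource filter, eviction) chosen for illustration. The handler names, their order, and the string results are assumptions, and the last handler only reports the action instead of evicting anything.

```go
package main

import "fmt"

// event is a hypothetical PSI reading that may need handling.
type event struct {
	Resource string  // "cpu", "memory" or "io"
	Avg10    float64 // some avg10 of an online Pod, in percent
}

// handler is one link in the chain: it handles the event or passes it on.
type handler interface {
	handle(e event) string
}

// link stores the next handler and forwards events to it.
type link struct{ next handler }

func (l link) pass(e event) string {
	if l.next == nil {
		return "unhandled"
	}
	return l.next.handle(e)
}

// thresholdFilter stops events whose avg10 does not exceed the threshold.
type thresholdFilter struct {
	link
	threshold float64
}

func (t thresholdFilter) handle(e event) string {
	if e.Avg10 <= t.threshold {
		return "ignored: below threshold"
	}
	return t.pass(e)
}

// resourceFilter stops events for resources that are not configured.
type resourceFilter struct {
	link
	enabled map[string]bool
}

func (r resourceFilter) handle(e event) string {
	if !r.enabled[e.Resource] {
		return "ignored: resource not monitored"
	}
	return r.pass(e)
}

// evictor is the last link; a real one would evict the chosen offline Pod.
type evictor struct{}

func (evictor) handle(e event) string {
	return "evict offline Pod (" + e.Resource + " pressure)"
}

// buildChain wires thresholdFilter -> resourceFilter -> evictor.
func buildChain(threshold float64, resources []string) handler {
	enabled := make(map[string]bool, len(resources))
	for _, r := range resources {
		enabled[r] = true
	}
	rf := resourceFilter{link: link{next: evictor{}}, enabled: enabled}
	return thresholdFilter{link: link{next: rf}, threshold: threshold}
}

func main() {
	chain := buildChain(5.0, []string{"cpu", "memory"})
	for _, e := range []event{
		{Resource: "cpu", Avg10: 7.5},
		{Resource: "memory", Avg10: 2.0},
		{Resource: "io", Avg10: 12.0},
	} {
		fmt.Printf("%s avg10=%.1f: %s\n", e.Resource, e.Avg10, chain.handle(e))
	}
}
```

New checks can be added by inserting another link in `buildChain` without changing the existing handlers.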
+
+## Implementation Design
+
+![PSI_implement](../images/psi/PSI_implement.svg)
+
+## Interface Design
+
+```yaml
+data:
+  config.json: |
+    {
+      "agent": {
+        "enabledFeatures": [
+          "psi"
+        ]
+      },
+      "psi": {
+        "resource": [
+          "cpu",
+          "memory",
+          "io"
+        ],
+        "interval": 10,
+        "avg10Threshold": 5.0
+      }
+    }
+```
+
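+The `psi` field configures the PSI-based interference detection feature. The feature currently supports monitoring CPU, memory, and I/O; users can configure this field as needed to monitor the PSI of a single resource or any combination of them.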
+
+| Key[=default] | Type | Description | Valid values |
+| --- | --- | --- | --- |
+| interval=10 | int | Interval between PSI metric checks, in seconds | [10, 30] |
+| resource=[] | string array | Resource types whose PSI is monitored | cpu, memory, io |
+| avg10Threshold=5.0 | float | Threshold (in %) for a resource's `some` stall percentage averaged over 10 s; offline workloads are evicted when it is exceeded | [5.0, 100] |
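The limits in the table can be enforced when the configuration is loaded. The following minimal sketch decodes the `psi` object with Go's `encoding/json`, applies the documented defaults, and rejects out-of-range values. `psiConfig` and `parsePSIConfig` are hypothetical names, not rubik's configuration API. It also shows that strict JSON rejects a trailing comma in the `resource` array.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// psiConfig mirrors the "psi" object in config.json (hypothetical Go type).
type psiConfig struct {
	Resource       []string `json:"resource"`
	Interval       int      `json:"interval"`
	Avg10Threshold float64  `json:"avg10Threshold"`
}

// parsePSIConfig decodes the "psi" object, keeps the documented defaults for
// absent keys and checks the documented value ranges.
func parsePSIConfig(data []byte) (psiConfig, error) {
	cfg := psiConfig{Resource: []string{}, Interval: 10, Avg10Threshold: 5.0}
	// json.Unmarshal only overwrites the keys present in data.
	if err := json.Unmarshal(data, &cfg); err != nil {
		return psiConfig{}, err
	}
	if cfg.Interval < 10 || cfg.Interval > 30 {
		return psiConfig{}, fmt.Errorf("interval %d out of range [10,30]", cfg.Interval)
	}
	if cfg.Avg10Threshold < 5.0 || cfg.Avg10Threshold > 100 {
		return psiConfig{}, fmt.Errorf("avg10Threshold %v out of range [5.0,100]", cfg.Avg10Threshold)
	}
	for _, r := range cfg.Resource {
		switch r {
		case "cpu", "memory", "io":
		default:
			return psiConfig{}, fmt.Errorf("unsupported resource %q", r)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := parsePSIConfig([]byte(`{"resource": ["cpu", "memory", "io"], "interval": 10}`))
	fmt.Printf("%+v %v\n", cfg, err)

	// A trailing comma makes the JSON invalid, so decoding fails.
	_, err = parsePSIConfig([]byte(`{"resource": ["cpu", "io",]}`))
	fmt.Println("trailing comma rejected:", err != nil)
}
```

Pre-filling the struct before `json.Unmarshal` is a simple way to express defaults, because absent keys leave the pre-set values in place.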
diff --git a/docs/images/psi/PSI_designation.svg b/docs/images/psi/PSI_designation.svg
new file mode 100644
index 0000000..8b829e8
--- /dev/null
+++ b/docs/images/psi/PSI_designation.svg
@@ -0,0 +1,16 @@
+<svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1348.1945190429688 1007.3749847412109" width="1348.1945190429688" height="1007.3749847412109" filter="invert(93%) hue-rotate(180deg)">
+ <!-- svg-source:excalidraw -->
+
+ <defs>
+ <style class="style-fonts">
+ @font-face {
+ font-family: "Virgil";
+ src: url("https://excalidraw.com/Virgil.woff2");
+ }
+ @font-face {
+ font-family: "Cascadia";
+ src: url("https://excalidraw.com/Cascadia.woff2");
+ }
+ </style>
+ </defs>
+ <rect x="0" y="0" width="1348.1945190429688" height="1007.3749847412109" fill="#ffffff"/><g stroke-linecap="round" transform="translate(130.611083984375 10) rotate(0 49 20.5)"><path d="M10.25 0 M10.25 0 C28.3 -2.22, 41.02 -1.56, 87.75 0 M10.25 0 C41.81 0.02, 72.45 -1.44, 87.75 0 M87.75 0 C93.66 -1.68, 97.97 4.19, 98 10.25 M87.75 0 C93.97 -1.69, 96.77 2.23, 98 10.25 M98 10.25 C98.22 18.12, 98.06 26.67, 98 30.75 M98 10.25 C97.81 17.25, 97.4 25.59, 98 30.75 M98 30.75 C98.76 36.53, 96.26 41.35, 87.75 41 M98 30.75 C95.89 39.86, 93.88 38.92, 87.75 41 M87.75 41 C72.63 39.3, 55.58 41.37, 10.25 41 M87.75 41 C69.77 41.35, 53.94 41.69, 10.25 41 M10.25 41 C3.16 42.81, 0.73 35.99, 0 30.75 M10.25 41 C5.18 40.02, -1.6 39.7, 0 30.75 M0 30.75 C-1.8 26.73, -1.46 20.84, 0 10.25 M0 30.75 C-0.64 25.32, 0.15 20.57, 0 10.25 M0 10.25 C1.75 3.72, 2.7 0.66, 10.25 0 M0 10.25 C-1.57 5.24, 2.35 0.94, 10.25 0" stroke="#000000" stroke-width="1" fill="none"/></g><g transform="translate(158.611083984375 19) rotate(0 21 11.5)"><text x="21" y="18" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#000000" text-anchor="middle" style="white-space: pre;" direction="ltr">开始</text></g><g stroke-linecap="round" transform="translate(10.52783203125 272.54173278808594) rotate(0 167.5 51)"><path d="M25.5 0 M25.5 0 C102.23 1.4, 175.02 2.64, 309.5 0 M25.5 0 C101.89 1.59, 179.27 1.13, 309.5 0 M309.5 0 C328.43 0.51, 333.43 9.99, 335 25.5 M309.5 0 C327.13 -1.89, 335.31 9.67, 335 25.5 M335 25.5 C335.67 41.65, 336.47 60.15, 335 76.5 M335 25.5 C334.53 36.69, 334.47 49.47, 335 76.5 M335 76.5 C333.35 93.45, 325.87 101.41, 309.5 102 M335 76.5 C336.32 91.88, 326.66 102.2, 309.5 102 M309.5 102 C200.46 102.84, 91.48 103.54, 25.5 102 M309.5 102 C237.27 102.83, 164.87 103.44, 25.5 102 M25.5 102 C7.62 103.31, -0.35 95.1, 0 76.5 M25.5 102 C8.01 101.64, 0.32 91.85, 0 76.5 M0 76.5 C1.44 56.76, 0.02 38.05, 0 25.5 M0 76.5 C-0.69 61.46, 0.54 47.4, 0 25.5 M0 25.5 C-0.76 8.89, 8.59 1.26, 25.5 0 M0 25.5 C-0.06 7.1, 10.4 
-0.3, 25.5 0" stroke="#000000" stroke-width="1" fill="none"/></g><g transform="translate(70.02783203125 300.54173278808594) rotate(0 108 23)"><text x="108" y="18" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#000000" text-anchor="middle" style="white-space: pre;" direction="ltr">遍历在线Pod列表</text><text x="108" y="41" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#000000" text-anchor="middle" style="white-space: pre;" direction="ltr">读取并解析Pod PSI指标</text></g><g stroke-linecap="round" transform="translate(10.5833740234375 112.263916015625) rotate(0 167 51)"><path d="M25.5 0 M25.5 0 C96.35 0.78, 167.13 1, 308.5 0 M25.5 0 C104.04 2.22, 181.65 0.88, 308.5 0 M308.5 0 C324.28 -1.89, 335.64 8.69, 334 25.5 M308.5 0 C327.29 -1.41, 336.11 8.11, 334 25.5 M334 25.5 C335.47 39.31, 332.59 50.78, 334 76.5 M334 25.5 C334.37 44.3, 333.82 62.76, 334 76.5 M334 76.5 C335.88 92.21, 327.02 100.05, 308.5 102 M334 76.5 C334.52 91.59, 324.31 103.95, 308.5 102 M308.5 102 C228.63 102.84, 148.86 101.45, 25.5 102 M308.5 102 C249.17 102.01, 189.79 101.17, 25.5 102 M25.5 102 C10.08 102.34, -1.32 92.32, 0 76.5 M25.5 102 C6.56 101.39, -2.07 93.74, 0 76.5 M0 76.5 C0.83 62.92, 1.7 46.63, 0 25.5 M0 76.5 C0.39 61.78, 0.97 46.06, 0 25.5 M0 25.5 C-1.92 7.82, 8.15 -0.27, 25.5 0 M0 25.5 C1.7 7.13, 7.63 2.03, 25.5 0" stroke="#000000" stroke-width="1" fill="none"/></g><g transform="translate(17.5833740234375 117.263916015625) rotate(0 160 46)"><text x="160" y="18" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#000000" text-anchor="middle" style="white-space: pre;" direction="ltr">是否支持cgroupV1 PSI接口</text><text x="160" y="41" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#000000" text-anchor="middle" style="white-space: pre;" direction="ltr">/sys/fs/cgroup/cpuacct/cpu.pressure</text><text x="160" y="64" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#000000" text-anchor="middle" style="white-space: pre;" 
direction="ltr">...io.pressure </text><text x="160" y="87" font-family="
\ No newline at end of file
diff --git a/docs/images/psi/PSI_implement.svg b/docs/images/psi/PSI_implement.svg
new file mode 100644
index 0000000..9704504
--- /dev/null
+++ b/docs/images/psi/PSI_implement.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than diagrams.net -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1386px" height="771px" viewBox="-0.5 -0.5 1386 771" content="&lt;mxfile host=&quot;Electron&quot; modified=&quot;2023-06-09T03:31:36.369Z&quot; agent=&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/21.2.8 Chrome/112.0.5615.165 Electron/24.2.0 Safari/537.36&quot; etag=&quot;hMdrX38smwDcLm7Ib4KD&quot; version=&quot;21.2.8&quot; type=&quot;device&quot; pages=&quot;5&quot;&gt;&lt;diagram id=&quot;07kHJZS3hwJW5_4EwsxT&quot; name=&quot;rubik重构&quot;&gt;7V3rc9q4Fv9rmEl3Bsbvx8dAmiZ7m95Mk/am95vAArw1ltc2Celfv5It+SEJMAQD3bjNTCxZluVzfjo6Dx2lp48Wq08xiOZ3yINBT1O8VU+/6mmabbsO/kVqXvOavuVaec0s9r28Ti0rHvxfkFYqtHbpezCpNUwRClI/qldOUBjCSVqrA3GMXurNpiiovzUCMyhUPExAINb+z/fSOa21TKO8cQP92Zy+WtN1+oELwFrTT0nmwEMvlSr9Y08fxQil+dViNYIBIR8jzPhHoHxPxuPr8V+/0uHPT87frzf9vLPrXR4pviGGYbp31zFYPM7M2VNw++fV8+V//prdzK76qkE7fwbBkpJsArxnP0Ex/er0ldESEyAilwvkLUndEAT+LMQVAZzicQ2TCEz8cPY5K11hGpUtJnjkMMYVzzBOfcyeS3ojRRGubfiBbKy4D7iq8Jd+8CeIFjCNX3ETerev26oyMPPHKIQ1x6EMfSkBobm00byKBcZ5QEE4K15Q0hlfUFLvQnaB6D3NCggNoxrJrb+XBF3DBYhnPibXJb6rRKteRtsMo6S+nxGR3DMq9zCJ0j6lP7lHWVD0ia9m9Hf2Zr9SARaEK4FYuiV9TMEEFtX1R2o9YtL4/FvGsVDDKu4waf1J5eEx3xTXRXzdPCYUY3KHfZy6/jv3onCO8ILEYt+k7bfIAym8+IAvIRZd8aZhtzeKS897jP3ZDMYXg8GAXpIxCfQ90qDWMb4+Ak7WyOUEJ3AQbjUNMpk89ckMHE5RmNJFSNVo+Ros/IBM/hsYPEPSK4FNughIozaFj2FYuiB8VInwsQ1R+KhKa8LHXSt9ihk3BAncNB/5leHFXwQghIwD9M4OK8Bk7gfeZ/CKluT7kxRMfrLScI5i/xfuFjCO4dsxY7Np1lo8kCfpq2NI5MI9Y6zKVd2BVa3hZ5CktGKCggBEiT8uPiMH+xClKVrQqqNAyLaUOoBs2xQApLIVrQYguy0AsWWxAiCQYqyMlylW9XhskHWoTq0kjdFPOEIBFpH6VYhy3OAJzFVthc7C97xMEalrHkZZ85XSw6hLizl+EIaE8ygFKSjZHCE/TDN6mUP8gz90RPQHE3/ICJfVsox/SPM4HaEQfxDwM85DjKEXmKRSTGyejdtBQSFgNgRAa9qLJqovab7MkLWBEO7qkZUPDgYq+n8rKBCZyD6np+mOQv63CZGmMqI9iOiSNYbDQuBnjM0BwIwz9SDcfyRrClGzeTzoIh50Ce8DMIbBPUr81Eek/zhvy2HiZBLAbchepy3uGgJ3UQRjQKjVLQCts19V9FNPb1MAQEOzozQDRGupWytaRY3WEDWG3RJqbG294bHBLQBCkKkSnR1yQjvEUHhTVlcsiSlrycSQaraFKFHN6CyRfUVKM
T3fYonIAdDWOmSLisj9wy2mw9SfiaLluw9fOpvkMDbJ7mCR2CRSsOitCQurs0naY68qc2zK+NuWUWLbnVFyUv47p14MHAEAt8nXZRhik4RYGWOEAnFRwA0uJlgWYjwMRvnvD2KrB5jmq8rFHAYYVYMRKdyAEHM6FgI+5XP3MXwgSuIFiPxBvv5saP0I44UfEqtI2rxbt1oFsN4QwHZb9pGz3j46aVhY8uYxij0Y9yc5ri6zQccX/X61/kPei9jv2ujyA4ZWAFO8vO0bXW4UdWwUVz4+xeEqWgYJXq0+ruDGMO2ZhLzfb1xXVd2BxgV2LUsTxIeuFUZzzcHS1groyLzunQB5LwIkhglaxhN4GYLgNfGTTo78hnLEVk4vR1zZ/jSOATD0Lsn+TFwaB4j4OnN3KKukqiZudp2RPtMDsxaVMr5LueHUac6zd0wdnkMPJHPo0VaZSTwEk5+zGC1Dj9Nys7qsbf5qwaQvseBkY8Gke6L3ssIP8ujA0AxWcbUqxk1Kr6y08tOnyvUP1gm+Lh8hhdfq8/cw9jG/iA15Jfe8bMJZPtUbaJOY5DO4SVVWmcMferW9tCJwq5EDphnHMMAm9nP1QTkiaXf3xBSoTgLFMvlNUq7KITv/WPpoCW5Jb6or7PcUesspIvSWTZXia9+wv0oX98f8C6cPQ71aQT2h/VbcP1ULlac2zLADTw0GUw7wTebKESeGY1oDtQ5l09YGjm247N++s8QyhTlnOkeeJYZo7VZ899x0WRPYI9jbO2KnWT2pd6TdIJ7akwbxqopJu0qHwvYFsjieqku2kxgyx0drO1INmeFCdMhsv/kzYR9RIDMoWj085FwokBZfIZvupEXuA8fCChsCAoiow+xAXrJc4zxHFxmv5IpY2jIjG/vMNKsZcgyjNeTIgjpnYPKeNpUiswB/rds0cfYWIWmLBero/tu31A8u8kn9xz3ybsMpIg5xdn1w+7vp0G7/OwShl6lD5zi8O7iQUe5043q3Nr5uK/XlVrNYaOKEGSCGGCvtpOaw2LT+2wpN4vFbpvCCvQnrIqH0Zc3dr+vYKHPNUgJeg0lay37Lh5G3OXF23ANMv+AP4vaoipw/4pC+gEW+DZapre9ENMrS4yzZnsJjC0dxH0EnHNfO7d9NRN7BJAGz7ROu7XE8UveKl2RDWYAIK2t0QFhlS18j6MHpoNPYzkEsmWeQtWuIabvfEiiwoOJYRlHmaSi9xBlxay7gCkF3CJeoRYgk8+WW7tsi8tEkWlL4in9U7sj9xrsxe6v/t7G397SREd3gY4Pu/nERZ1tfbft7TTEqclr87oHDfTF/aPwyXDZAsHNKBKs2jzp9bwQr6ra+WkewmG1QJCc1i1ccIxFJV3r/4kQk3XI4f4orW5uZsDnOgQiqKeZDdnlIe6c2FvOscRxCbwiB9g50MsU4xCKXDYN1MuJ97t9+61kIu4PDbAgOfjk5IDhk7lYODV3m0d4Mlp3nJmNwW5lHqil6jLrUo2MCQJK5cWTxv/5Mraju9/kCXy4kCUbUO53lCm3IEMqbkTwlTF8YE9fQd63sb72/v9jlVfX5+yGe0kSJlPjyC6dhOE6iuv+n6s5v5BZ6nytde3hXlYaAZzkJbwH8ZHr5f/hi/YnMEUR9/ylxo1VftsNRcoTc/cNtd4rcjkaTAIOmdtQGl6bDBaHlZzdIReYhZKYUQuIe885kWiNDtkBi/Qzd6eSGtlZM6fA0gft/LDJJMRCPniyXP/A8U5XHOZ5pc3IQtKZgVoDUMsSWLBcm6fGbAvmWE0x6sscwU24bB2LK55NlEvmThg93C+NWE/DwYJedPCG1Dw5w8oR0eN1heO0xV226kh3A+pMOrzsL74TrmGod0fSTjk/0/HYH2x0XAk2tf70tY0j07p1FYLOvDJSiIs+z0nWntzG+iQt8WmIl5um6Wq+Wt1XkTa6L9a/PcxRyrw4WHWXnwFSDo+uX5WPF8lWrZoFpDi+RmkdCDYe35tqL5UsJJ
7o7zwLwh8hE1JqlIraIXkNE76Z0nWPh163jV9d0Y1/8mizRqOzLOip+VdF9dRYA3nMrimHXMey69onl8iaf4VZg5yci
\ No newline at end of file
--
2.41.0