iSulad/0037-add-cgroup-v2-doc.patch


From 2cbce684b973bf4e250f41d750253b5b8abde32d Mon Sep 17 00:00:00 2001
From: zhongtao <zhongtao17@huawei.com>
Date: Tue, 20 Feb 2024 19:22:11 +0800
Subject: [PATCH 37/43] add cgroup v2 doc
Signed-off-by: zhongtao <zhongtao17@huawei.com>
---
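The design document in this patch states that iSulad reads the sandbox cgroup files directly when it collects sandbox resource usage. A minimal sketch of such a read is given here, assuming cgroup v2 single-value files such as `memory.current` or `pids.current`. The helper name `read_cgroup_u64` and its return convention are hypothetical and are not taken from iSulad.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not an iSulad function): read one unsigned value from
 * <cgroup_dir>/<file>, e.g. "memory.current". The literal "max", which
 * cgroup v2 uses for "no limit", is reported as UINT64_MAX.
 * Returns 0 on success and -1 on any error.
 */
static int read_cgroup_u64(const char *cgroup_dir, const char *file, uint64_t *out)
{
    char path[4096];
    char buf[64];
    char *end = NULL;
    unsigned long long value;
    FILE *fp;
    int len;

    if (cgroup_dir == NULL || file == NULL || out == NULL) {
        return -1;
    }
    len = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (len < 0 || (size_t)len >= sizeof(path)) {
        return -1; /* path was truncated */
    }
    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */

    if (strcmp(buf, "max") == 0) {
        *out = UINT64_MAX;
        return 0;
    }
    if (!isdigit((unsigned char)buf[0])) {
        return -1; /* reject empty input, signs, and leading whitespace */
    }
    errno = 0;
    value = strtoull(buf, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;
    }
    *out = (uint64_t)value;
    return 0;
}
```

A caller would pass the sandbox's cgroup directory under /sys/fs/cgroup as `cgroup_dir`. Multi-field files such as `memory.stat` need a key/value parser instead and are outside this sketch.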
docs/design/README_zh.md | 2 +
.../detailed/Container/cgroup_v2_design_zh.md | 193 ++++++++++++++++++
docs/images/cgroup_v2_module.svg | 16 ++
3 files changed, 211 insertions(+)
create mode 100644 docs/design/detailed/Container/cgroup_v2_design_zh.md
create mode 100644 docs/images/cgroup_v2_module.svg
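The limitations listed in the design document (mount point fixed at /sys/fs/cgroup, no mixed v1/v2 setups) imply that the cgroup version can be decided from a single mount point. One common way to do that, sketched here as an assumption rather than as iSulad's actual logic, is to compare the filesystem type reported by `statfs(2)` with `CGROUP2_SUPER_MAGIC`. The enum and function names are hypothetical, and treating every other filesystem type as v1 is a simplification.

```c
#include <stddef.h>
#include <sys/vfs.h>

/* Magic number of the cgroup2 filesystem, as defined in <linux/magic.h>. */
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif

/* Hypothetical names, used only for this sketch. */
enum cgroup_version_sketch {
    CGROUP_SKETCH_UNKNOWN = 0,
    CGROUP_SKETCH_V1 = 1,
    CGROUP_SKETCH_V2 = 2
};

/* Pure mapping from a filesystem type to a version, kept separate so it can
 * be checked without touching the real mount point. */
static enum cgroup_version_sketch version_from_fs_type(long fs_type)
{
    return fs_type == CGROUP2_SUPER_MAGIC ? CGROUP_SKETCH_V2 : CGROUP_SKETCH_V1;
}

/* Query the filesystem type of the mount point, e.g. "/sys/fs/cgroup". */
static enum cgroup_version_sketch detect_cgroup_version(const char *mountpoint)
{
    struct statfs fs;

    if (mountpoint == NULL || statfs(mountpoint, &fs) != 0) {
        return CGROUP_SKETCH_UNKNOWN;
    }
    return version_from_fs_type((long)fs.f_type);
}
```

A caller could run `detect_cgroup_version("/sys/fs/cgroup")` once at startup and call `cgroup_v2_ops_init` only when the result is `CGROUP_SKETCH_V2`.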
diff --git a/docs/design/README_zh.md b/docs/design/README_zh.md
index f2c187a1..b7ec3ddb 100644
--- a/docs/design/README_zh.md
+++ b/docs/design/README_zh.md
@@ -18,6 +18,8 @@
- See the design document for the restart module: [restart_manager_design](./detailed/Container/restart_manager_design.md).
+- See the design document for cgroup v2: [cgroup_v2_design](./detailed/Container/cgroup_v2_design_zh.md).
+
## CRI
- See the refactoring document for the CRI startup program: [cri_cni_refactor](./detailed/CRI/cri_cni_refactor_zh.md).
diff --git a/docs/design/detailed/Container/cgroup_v2_design_zh.md b/docs/design/detailed/Container/cgroup_v2_design_zh.md
new file mode 100644
index 00000000..e1ce81d0
--- /dev/null
+++ b/docs/design/detailed/Container/cgroup_v2_design_zh.md
@@ -0,0 +1,193 @@
+| Author | zhongtao |
+| ------ | --------------------- |
+| Date | 2024-02-19 |
+| Email | zhongtao17@huawei.com |
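+# Goals
+
+cgroup is a Linux feature for limiting the resources of groups of processes. It currently exists in two versions, cgroup v1 and cgroup v2. cgroup v2 is intended to replace cgroup v1, but for compatibility reasons cgroup v1 has not been removed from the kernel and will most likely remain for a long time. The goal of this requirement is to make iSulad support cgroup v2.
+
+## Differences from cgroup v1
+iSulad exposes the same user-facing interfaces for cgroup v1 and cgroup v2. However, some features supported by cgroup v1 were removed or are implemented differently in cgroup v2, so some interfaces are unavailable or have a different meaning under cgroup v2. iSulad supports limiting the following resources: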
+
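+|Resource|Function|Difference from cgroup v1|
+|---|---|---|
+|devices|Controls whether a device can be accessed inside the container, and with which permissions|The devices controller no longer enforces limits by writing values to cgroup files; it uses eBPF instead|
+|memory|Limits the container's memory|swappiness, kmem-related parameters, and oom_control are not supported|
+|cpu/cpuset|Limits the container's CPU resources|rt_* (real-time thread) limits are not supported|
+|blkio/io|Limits the container's block-device I/O|Limits buffered I/O as well as block-device I/O|
+|hugetlb|Limits huge-page memory usage|No difference|
+|pids|Limits the PIDs used by the container|No difference|
+|files|Limits the file descriptors used by the container|No difference|
+|freeze|Pauses the container|No difference|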
+
+## Usage
+
+Examples:
+
+1. Take memory limiting as an example. To cap a single container at 10 MiB of memory, pass `-m 10m` when running the container:
+
+ ```sh
+ [root@openEuler iSulad]# isula run -tid -m 10m busybox sh
+ 000c0c6eb609179062b19a3d2de4d7c38a42c887f55e2a7759ed9df851277163
+ ```
+
+ `-m 10m` limits the container to at most 10 MiB of memory. Use the `isula stats` command to check the limit:
+
+ ```sh
+ [root@openEuler iSulad]# isula stats --no-stream 000c0c6eb6
+ CONTAINER CPU % MEM USAGE / LIMIT MEM % BLOCK I / O PIDS
+ 000c0c6eb609 0.00 104.00 KiB / 10.00 MiB 1.02 0.00 B / 0.00 B 1
+ ```
+
+ Resource limits can be updated dynamically:
+ ```sh
+ [root@openEuler iSulad]# isula update -m 20m 000c0c6eb6
+ 000c0c6eb6
+ [root@openEuler iSulad]# isula stats --no-stream 000c0c6eb6
+ CONTAINER CPU % MEM USAGE / LIMIT MEM % BLOCK I / O PIDS
+ 000c0c6eb609 0.00 104.00 KiB / 20.00 MiB 0.51 0.00 B / 0.00 B 1
+ ```
+
+2. To map the host device /dev/sda into the container as /dev/sdx with read access denied, configure it as follows:
+
+```sh
+ [root@openEuler iSulad]# isula run -ti --rm --device=/dev/sda:/dev/sdx:wm busybox fdisk /dev/sdx
+ fdisk: can't open '/dev/sdx'
+ [root@openEuler iSulad]#
+```
+
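+The syntax for mapping a device into a container is `--device=$host:$container:rwm`, where `$host` is the absolute path of the device on the host, `$container` is the absolute path of the device inside the container, and `r`, `w`, and `m` grant read, write, and node-creation permission respectively. The command above passes `wm` and omits `r`, so writing and creating nodes are allowed but reading is not, which is consistent with `fdisk` failing to open the device.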
+
+3. Use the CRI PodSandboxStats and ContainerStats interfaces to get the resource usage of sandboxes and containers:
+
+```sh
+[root@openEuler ~]# crictl statsp c3
+ POD POD ID CPU % MEM
+ test-sandbox c32556d3bb139 0.00 196.6kB
+[root@openEuler ~]# crictl statsp --output json c3
+......
+ "linux": {
+ "cpu": {
+ "timestamp": "1708499622485777700",
+ "usageCoreNanoSeconds": {
+ "value": "180973"
+ },
+ "usageNanoCores": null
+ },
+ "memory": {
+ "timestamp": "1708499622485777700",
+ "workingSetBytes": {
+ "value": "196608"
+ },
+ "availableBytes": {
+ "value": "0"
+ },
+ "usageBytes": {
+ "value": "4386816"
+ },
+ "rssBytes": {
+ "value": "176128"
+ },
+ "pageFaults": {
+ "value": "1193"
+ },
+ "majorPageFaults": {
+ "value": "6"
+ }
+ },
+ "network": null,
+ "process": {
+ "timestamp": "1708499622485777700",
+ "processCount": {
+ "value": "2"
+ }
+ },
+.....
+
+[root@openEuler ~]# crictl stats 01
+CONTAINER CPU % MEM DISK INODES
+01a726f61c5c3 0.01 3.801MB 16.4kB 8
+[root@openEuler ~]#
+```
+
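+# Overall Design
+
+The cgroup module, which previously supported only cgroup v1, was refactored. The architecture after refactoring is shown below:
+
+![cgroup_v2_module](../../../images/cgroup_v2_module.svg)
+
+The module has two main functions:
+
+1. When resource-control parameters are set or updated, iSulad validates them and rejects invalid requests; the actual cgroup operations are performed by the container runtime.
+
+2. When collecting sandbox resource usage, iSulad reads the sandbox's cgroup files directly.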
+
+# Interface Description
+iSulad exposes the same user-facing interfaces for both cgroup v1 and cgroup v2,
+so no new interfaces are added.
+
+```c
+int verify_container_settings(const oci_runtime_spec *container, const sysinfo_t *sysinfo);
+
+int verify_host_config_settings(host_config *hostconfig, const sysinfo_t *sysinfo, bool update);
+
+
+typedef struct {
+ int (*get_cgroup_info)(cgroup_mem_info_t *meminfo, cgroup_cpu_info_t *cpuinfo,
+ cgroup_hugetlb_info_t *hugetlbinfo, cgroup_blkio_info_t *blkioinfo,
+ cgroup_cpuset_info_t *cpusetinfo, cgroup_pids_info_t *pidsinfo,
+ cgroup_files_info_t *filesinfo, bool quiet);
+ int (*get_cgroup_metrics)(const char *cgroup_path, cgroup_metrics_t *cgroup_metrics);
+
+ int (*common_find_cgroup_mnt_and_root)(const char *subsystem, char **mountpoint, char **root);
+
+ char *(*sysinfo_cgroup_controller_cpurt_mnt_path)(void);
+} cgroup_ops;
+
+int cgroup_v2_ops_init(cgroup_ops *ops)
+{
+ if (ops == NULL) {
+ return -1;
+ }
+ ops->get_cgroup_info = common_get_cgroup_info_v2;
+ ops->get_cgroup_metrics = common_get_cgroup_v2_metrics;
+ ops->common_find_cgroup_mnt_and_root = common_find_cgroup_v2_mnt_and_root;
+ return 0;
+}
+```
+
+# Detailed Design
+
+```mermaid
+sequenceDiagram
+ participant isula
+ participant kubelet
+ participant isulad
+ participant runc
+ participant cgroup
+
+ isula->>isulad: request
+ kubelet->>isulad:request
+ alt run/create/update
+ isulad->>isulad:verify request option
+ isulad->>runc:run/create/update request
+ runc ->> cgroup:write cgroup file
+ else stats
+ par container stats
+ isulad->>runc:container stats request
+ runc ->> cgroup:read cgroup file
+ runc ->> isulad:container stats info
+ and sandbox stats
+ isulad ->> cgroup: read cgroup file
+ end
+ end
+ isulad ->> isula:response
+ isulad ->> kubelet:response
+```
+
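+# Limitations
+
+1. The cgroup mount point must be /sys/fs/cgroup; other mount points are not supported.
+2. Mixing cgroup v1 and cgroup v2 is not supported.
+3. This requirement covers cgroup v2 support for the runc container runtime only.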
+
diff --git a/docs/images/cgroup_v2_module.svg b/docs/images/cgroup_v2_module.svg
new file mode 100644
index 00000000..59e7939b
--- /dev/null
+++ b/docs/images/cgroup_v2_module.svg
@@ -0,0 +1,16 @@
+<svg version="1.1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1060.0000000000007 754.5203469732791" width="1060.0000000000007" height="754.5203469732791">
+ <!-- svg-source:excalidraw -->
+
+ <defs>
+ <style class="style-fonts">
+ @font-face {
+ font-family: "Virgil";
+ src: url("https://excalidraw.com/Virgil.woff2");
+ }
+ @font-face {
+ font-family: "Cascadia";
+ src: url("https://excalidraw.com/Cascadia.woff2");
+ }
+ </style>
+ </defs>
+ <rect x="0" y="0" width="1060.0000000000007" height="754.5203469732791" fill="#ffffff"></rect><g stroke-linecap="round" transform="translate(374.51277982271654 10) rotate(0 98 27)"><path d="M0 0 C76.18 0, 152.36 0, 196 0 M0 0 C53.42 0, 106.84 0, 196 0 M196 0 C196 10.9, 196 21.81, 196 54 M196 0 C196 13.47, 196 26.95, 196 54 M196 54 C120.92 54, 45.83 54, 0 54 M196 54 C128.59 54, 61.18 54, 0 54 M0 54 C0 35.38, 0 16.75, 0 0 M0 54 C0 42.69, 0 31.39, 0 0" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(456.95125638521654 25.5) rotate(0 15.5615234375 11.5)"><text x="15.5615234375" y="0" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#1e1e1e" text-anchor="middle" style="white-space: pre;" direction="ltr" dominant-baseline="text-before-edge">CLI</text></g><g stroke-linecap="round" transform="translate(644.5127798227165 11) rotate(0 98 27)"><path d="M0 0 C41.14 0, 82.28 0, 196 0 M0 0 C77.31 0, 154.62 0, 196 0 M196 0 C196 11.1, 196 22.21, 196 54 M196 0 C196 10.93, 196 21.86, 196 54 M196 54 C135.62 54, 75.24 54, 0 54 M196 54 C135.82 54, 75.63 54, 0 54 M0 54 C0 39.1, 0 24.21, 0 0 M0 54 C0 32.96, 0 11.92, 0 0" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(725.2911001352165 26.5) rotate(0 17.2216796875 11.5)"><text x="17.2216796875" y="0" font-family="Helvetica, Segoe UI Emoji" font-size="20px" fill="#1e1e1e" text-anchor="middle" style="white-space: pre;" direction="ltr" dominant-baseline="text-before-edge">CRI</text></g><g stroke-linecap="round"><g transform="translate(481.84615384615404 66.66668701171875) rotate(0 0 0)"><path d="M0 0 C0 0, 0 0, 0 0 M0 0 C0 0, 0 0, 0 0" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(481.84615384615404 66.66668701171875) rotate(0 0 0)"><path d="MNaN NaN CNaN NaN, NaN NaN, NaN NaN MNaN NaN CNaN NaN, NaN NaN, NaN NaN" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(481.84615384615404 
66.66668701171875) rotate(0 0 0)"><path d="MNaN NaN CNaN NaN, NaN NaN, NaN NaN MNaN NaN CNaN NaN, NaN NaN, NaN NaN" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g></g><mask></mask><g stroke-linecap="round"><g transform="translate(474.4873658176107 65) rotate(0 -0.1998750712791164 41.22263106766279)"><path d="M0 0 C-0.15 30.41, -0.29 60.82, -0.4 82.45 M0 0 C-0.09 19.52, -0.19 39.04, -0.4 82.45" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(474.4873658176107 65) rotate(0 -0.1998750712791164 41.22263106766279)"><path d="M-10.52 54.21 C-6.79 64.62, -3.06 75.04, -0.4 82.45 M-10.52 54.21 C-8.13 60.89, -5.73 67.58, -0.4 82.45" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(474.4873658176107 65) rotate(0 -0.1998750712791164 41.22263106766279)"><path d="M10 54.3 C6.16 64.68, 2.33 75.06, -0.4 82.45 M10 54.3 C7.54 60.97, 5.07 67.63, -0.4 82.45" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g></g><mask></mask><g stroke-linecap="round"><g transform="translate(731.846153846154 66.66668701171875) rotate(0 0.33331298828125 39.666656494140625)"><path d="M0 0 C0.21 25.34, 0.43 50.69, 0.67 79.33 M0 0 C0.15 18.11, 0.3 36.21, 0.67 79.33" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(731.846153846154 66.66668701171875) rotate(0 0.33331298828125 39.666656494140625)"><path d="M-9.83 51.23 C-6.48 60.21, -3.12 69.18, 0.67 79.33 M-9.83 51.23 C-7.43 57.64, -5.04 64.06, 0.67 79.33" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g><g transform="translate(731.846153846154 66.66668701171875) rotate(0 0.33331298828125 39.666656494140625)"><path d="M10.69 51.06 C7.49 60.09, 4.29 69.12, 0.67 79.33 M10.69 51.06 C8.4 57.51, 6.11 63.96, 0.67 79.33" stroke="#1e1e1e" stroke-width="1" fill="none"></path></g></g><mask></mask><g stroke-linecap="round" transform="translate(623.846153846154 147.83334350585938) rotate(0 114 26.5)"><path d="M0 0 C65.21 0, 130.41 0, 228 0 M0 0 C55.16 
0, 110.32 0, 228 0 M228 0 C228 11.04, 228 22.08, 228 53 M228 0 C228 18.15, 228 36.3, 228 53 M228 53 C142.95 5
\ No newline at end of file
--
2.34.1