From e8650fd6c225ac95a463a3bb738f93822ed1bbf5 Mon Sep 17 00:00:00 2001 From: zhongtao Date: Wed, 24 May 2023 16:52:28 +0800 Subject: [PATCH] upgrade from upstream Signed-off-by: zhongtao --- 0001-convert-files-from-CRLF-to-LF.patch | 1942 +++++++++++++++++ 0002-restore-ping-head.patch | 26 + 0003-fix-health_check.sh.patch | 42 + ...e-isulad_io-not-NULL-before-close-fd.patch | 62 + 0005-recheck-delete-command-exit-status.patch | 75 + 0006-restore-execSync-return-value.patch | 31 + ...ce-cri_stream.sh-and-health_check.sh.patch | 131 ++ 0008-reinforce-omit-health_check.sh.patch | 166 ++ ...y-leak-and-array-access-out-of-range.patch | 75 + iSulad.spec | 18 +- 10 files changed, 2567 insertions(+), 1 deletion(-) create mode 100644 0001-convert-files-from-CRLF-to-LF.patch create mode 100644 0002-restore-ping-head.patch create mode 100644 0003-fix-health_check.sh.patch create mode 100644 0004-ensure-isulad_io-not-NULL-before-close-fd.patch create mode 100644 0005-recheck-delete-command-exit-status.patch create mode 100644 0006-restore-execSync-return-value.patch create mode 100644 0007-reinforce-cri_stream.sh-and-health_check.sh.patch create mode 100644 0008-reinforce-omit-health_check.sh.patch create mode 100644 0009-fix-memory-leak-and-array-access-out-of-range.patch diff --git a/0001-convert-files-from-CRLF-to-LF.patch b/0001-convert-files-from-CRLF-to-LF.patch new file mode 100644 index 0000000..1f496e4 --- /dev/null +++ b/0001-convert-files-from-CRLF-to-LF.patch @@ -0,0 +1,1942 @@ +From 634671cf7ac001bc64853375c5d77016966b6c09 Mon Sep 17 00:00:00 2001 +From: zhangxiaoyu +Date: Fri, 12 May 2023 09:58:09 +0800 +Subject: [PATCH 1/9] convert files from CRLF to LF + +Signed-off-by: zhangxiaoyu +--- + README_zh.md | 462 ++++++++-------- + docs/design/architecture.md | 86 +-- + docs/design/architecture_zh.md | 82 +-- + .../design/detailed/CRI/k8s125_New_Add_CRI.md | 324 ++++++------ + docs/design/detailed/Container/state_check.md | 494 +++++++++--------- + docs/manual/k8s_integration_zh.md | 428 +++++++-------- + test/image/oci/registry/data/v2/ping_head | 4 +- + 7 files changed, 940 insertions(+), 940 deletions(-) + +diff --git a/README_zh.md b/README_zh.md +index 4d4e1401..72942765 100755 +--- a/README_zh.md ++++ b/README_zh.md +@@ -1,232 +1,232 @@ +-- [English version](README.md) +- +-iSulad +- +- ![license](https://img.shields.io/badge/license-Mulan%20PSL%20v2-blue) ![language](https://img.shields.io/badge/language-C%2FC%2B%2B-blue) +- +-## Introduction +- +-`iSulad`是一个由C/C++编写实现的轻量级容器引擎,具有轻、灵、巧、快的特点,不受硬件规格和架构限制,底噪开销更小,可应用的领域更为广泛。 +- +-## Architecture +- +-`iSulad`架构的相关介绍请查看:[architecture](./docs/design/architecture_zh.md)。 +- +-## Function +- +-### Runtime +- +-`iSulad`支持多种容器runtime,包括lxc、runc和kata。 +- +-#### lxc +- +-lxc是用C语言编写的开源容器操作runtime,资源占用少,适用于对底噪资源限制高的场景,为iSulad默认的runtime。 +- +-#### runc +- +-runc是用GO语言编写的符合OCI标准的runtime,使用runc时要求其使用的OCI runtime-spec version不低于iSulad支持的oci spec version 1.0.0。 +- +-#### kata-runtime +- +-kata-runtime是一个安全容器runtime,用于启动安全容器时使用。 +- +-### Image +- +-`iSulad`支持多种镜像格式,包括OCI标准镜像格式、external rootfs镜像格式和embedded image镜像格式。 +- +-#### OCI +- +-OCI标准镜像格式是与docker兼容的镜像格式,支持从远程镜像仓库拉取镜像、运行容器。 +- +-#### external rootfs +- +-external rootfs镜像格式允许用户自行准备可启动的`root fs`目录,主要用于系统容器场景。 +- +-#### embedded image +- +-embedded image镜像格式是`iSulad`特有的嵌入式镜像格式,占用资源低,主要用于嵌入式应用场景。 +- +-### Operation Interface +- +-`iSulad`提供两种不同的镜像和容器管理操作接口,分别为CLI和CRI。 +- +-#### CLI +- +-CLI采用命令行的形式进行镜像和容器管理,是标准的C/S架构模式,将iSulad作为daemon服务端,iSula作为独立的客户端命令,供用户使用。 +- 
+-iSula提供的命令参数覆盖了常用的大部分应用场景,包括容器的操作接口,如运行、停止、删除、pause等操作,也包括镜像的相关操作,如下载、导入、删除等。 +- +-#### CRI +- +-CRI(Container Runtime Interface)是由K8S对外提供的容器和镜像的服务接口,供容器引擎接入K8s。 +- +-CRI接口基于gRPC实现。iSulad遵循CRI接口规范,实现 CRI gRPC Server,CRI gRPC Server 中包括 Runtime Service 和 image Service,分别用来提供容器运行时接口和镜像操作接口。iSulad的 gRPC Server 需要监听本地的Unix socket,而K8s的组件 kubelet 则作为 gRPC Client 运行。 +- +-## Getting Started +- +-- [用法指南:openeuler官方手册](https://docs.openeuler.org/zh/docs/22.03_LTS/docs/Container/container.html) +- +-- [开发指南](./docs/build_docs/README_zh.md) +- +-- [用户手册](./docs/manual/README_zh.md) +- +-- [设计文档](./docs/design/README_zh.md) +- +-### Installing +- +-`iSulad`可以使用`yum`命令进行安装,安装之前需要查看确保配置了openEuler仓库: +- +-```shell +-$ cat << EOF > /etc/yum.repos.d/openEuler.repo +-[openEuler] +-baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/OS/\$basearch +-enabled=1 +-EOF +-``` +- +-用`yum`安装`iSulad`的命令如下: +- +-```shell +-$ yum install -y iSulad +-``` +- +-若运行安装命令后报如下错误: +- +-```txt +-Repository 'openEuler' is missing name in configuration, using id. +- +-You have enabled checking of packages via GPG keys. This is a good thing. +-However, you do not have any GPG public keys installed. You need to download +-the keys for packages you wish to install and install them. +-You can do that by running the command: +- rpm --import public.gpg.key +- +- +-Alternatively you can specify the url to the key you would like to use +-for a repository in the 'gpgkey' option in a repository section and YUM +-will install it for you. +- +-For more information contact your distribution or package provider. +-``` +- +-则需要先运行`rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-openEuler`。 +- +-### Configure +- +-成功安装`iSulad`之后,需要先配置好容器镜像的注册地址,以"docker.io"为例: +- +-```shell +-# cat /etc/isulad/daemon.json +-..... +- "registry-mirrors": [ +- "docker.io" +- ], +-..... +-``` +- +-### Run +- +-`iSulad`提供了两种服务的启动方式: +- +-1. 使用`systemd`服务来启动`iSulad` +- +-```shell +-# 通过systemd命令来重启isulad服务 +-$ systemctl restart isulad +-``` +- +-2. 
直接使用命令启动`iSulad` +- +-```shell +-# 使用默认套接字名称、默认日志级别和镜像管理功能启动isulad +-$ sudo isulad +-``` +- +-### Operations on containers +- +-`iSulad` 提供了两个管理镜像和容器的操作接口:CLI和CRI。 +- +-#### **CLI** +- +-`iSulad`使用 `iSula` 作为客户端命令,以下是利用CLI接口管理容器的一些基本命令: +- +-- 列出当前环境下的所有容器: +- +-```shell +-$ sudo isula ps -a +-``` +- +-- 通过`busybox`镜像创建容器: +- +- - 采用默认的runtime创建容器`test` +- +- ```sh +- $ sudo isula create -t -n test busybox +- ``` +- +- - 创建**runtime为runc**的容器`testrunc` +- +- ```sh +- $ sudo isula create -t --runtime runc -n testrunc busybox +- ``` +- +- +-- 启动容器`test`: +- +-```shell +-$ sudo isula start test +-``` +- +-- 停止容器`test`: +- +-```shell +-$ sudo isula kill test +-``` +- +-- 移除容器`test` +- +-```shell +-$ sudo isula rm test +-``` +- +-#### CRI +- +-`iSulad`可以通过CRI接口与kubernetes集成,如何与kubernetes集成请参考[k8s_integration](./docs/manual/k8s_integration_zh.md)。 +- +-## Performance +- +-采用[ptcr](https://gitee.com/openeuler/ptcr)作为容器引擎的性能测试工具,展示在不同架构的计算机中`iSulad`的性能效果。 +- +-### ARM +- +-- 10个容器串行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: +- +-ARM searially +- +-- 100个容器并行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: +- +-ARM parallerlly +- +-### X86 +- +-- 10个容器串行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: +- +-X86 searially +- +-- 100个容器并行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: +- +-X86 parallerlly +- +-**关于性能测试的更多信息请查看** [Performance test](https://gitee.com/openeuler/iSulad/wikis/Performance?sort_id=5449355)。 +- +-## Kernel Requirements +- +-`iSulad`支持在3.0.x之后的Kernel上运行。 +- +-## Compatibility +- +-`iSulad` 能够兼容的标准规范版本如下: +- +-- 兼容 1.0.0 版本的OCI +-- 兼容 0.3.0 版本以上的CNI ++- [English version](README.md) ++ ++iSulad ++ ++ ![license](https://img.shields.io/badge/license-Mulan%20PSL%20v2-blue) ![language](https://img.shields.io/badge/language-C%2FC%2B%2B-blue) ++ ++## Introduction ++ ++`iSulad`是一个由C/C++编写实现的轻量级容器引擎,具有轻、灵、巧、快的特点,不受硬件规格和架构限制,底噪开销更小,可应用的领域更为广泛。 ++ ++## Architecture ++ ++`iSulad`架构的相关介绍请查看:[architecture](./docs/design/architecture_zh.md)。 ++ ++## Function ++ ++### Runtime ++ ++`iSulad`支持多种容器runtime,包括lxc、runc和kata。 ++ ++#### lxc ++ ++lxc是用C语言编写的开源容器操作runtime,资源占用少,适用于对底噪资源限制高的场景,为iSulad默认的runtime。 ++ ++#### runc ++ ++runc是用GO语言编写的符合OCI标准的runtime,使用runc时要求其使用的OCI runtime-spec version不低于iSulad支持的oci spec version 1.0.0。 ++ ++#### kata-runtime ++ ++kata-runtime是一个安全容器runtime,用于启动安全容器时使用。 ++ ++### Image ++ ++`iSulad`支持多种镜像格式,包括OCI标准镜像格式、external rootfs镜像格式和embedded image镜像格式。 ++ ++#### OCI ++ ++OCI标准镜像格式是与docker兼容的镜像格式,支持从远程镜像仓库拉取镜像、运行容器。 ++ ++#### external rootfs ++ ++external rootfs镜像格式允许用户自行准备可启动的`root fs`目录,主要用于系统容器场景。 ++ ++#### embedded image ++ ++embedded image镜像格式是`iSulad`特有的嵌入式镜像格式,占用资源低,主要用于嵌入式应用场景。 ++ ++### Operation Interface ++ ++`iSulad`提供两种不同的镜像和容器管理操作接口,分别为CLI和CRI。 ++ ++#### CLI ++ ++CLI采用命令行的形式进行镜像和容器管理,是标准的C/S架构模式,将iSulad作为daemon服务端,iSula作为独立的客户端命令,供用户使用。 ++ ++iSula提供的命令参数覆盖了常用的大部分应用场景,包括容器的操作接口,如运行、停止、删除、pause等操作,也包括镜像的相关操作,如下载、导入、删除等。 ++ ++#### CRI ++ ++CRI(Container Runtime Interface)是由K8S对外提供的容器和镜像的服务接口,供容器引擎接入K8s。 ++ ++CRI接口基于gRPC实现。iSulad遵循CRI接口规范,实现 CRI gRPC Server,CRI gRPC Server 中包括 Runtime Service 和 image Service,分别用来提供容器运行时接口和镜像操作接口。iSulad的 gRPC Server 需要监听本地的Unix socket,而K8s的组件 kubelet 则作为 gRPC Client 运行。 ++ ++## Getting Started ++ ++- [用法指南:openeuler官方手册](https://docs.openeuler.org/zh/docs/22.03_LTS/docs/Container/container.html) ++ ++- [开发指南](./docs/build_docs/README_zh.md) ++ ++- [用户手册](./docs/manual/README_zh.md) ++ ++- [设计文档](./docs/design/README_zh.md) ++ ++### Installing ++ ++`iSulad`可以使用`yum`命令进行安装,安装之前需要查看确保配置了openEuler仓库: ++ ++```shell ++$ cat << EOF > 
/etc/yum.repos.d/openEuler.repo ++[openEuler] ++baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/OS/\$basearch ++enabled=1 ++EOF ++``` ++ ++用`yum`安装`iSulad`的命令如下: ++ ++```shell ++$ yum install -y iSulad ++``` ++ ++若运行安装命令后报如下错误: ++ ++```txt ++Repository 'openEuler' is missing name in configuration, using id. ++ ++You have enabled checking of packages via GPG keys. This is a good thing. ++However, you do not have any GPG public keys installed. You need to download ++the keys for packages you wish to install and install them. ++You can do that by running the command: ++ rpm --import public.gpg.key ++ ++ ++Alternatively you can specify the url to the key you would like to use ++for a repository in the 'gpgkey' option in a repository section and YUM ++will install it for you. ++ ++For more information contact your distribution or package provider. ++``` ++ ++则需要先运行`rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-openEuler`。 ++ ++### Configure ++ ++成功安装`iSulad`之后,需要先配置好容器镜像的注册地址,以"docker.io"为例: ++ ++```shell ++# cat /etc/isulad/daemon.json ++..... ++ "registry-mirrors": [ ++ "docker.io" ++ ], ++..... ++``` ++ ++### Run ++ ++`iSulad`提供了两种服务的启动方式: ++ ++1. 使用`systemd`服务来启动`iSulad` ++ ++```shell ++# 通过systemd命令来重启isulad服务 ++$ systemctl restart isulad ++``` ++ ++2. 直接使用命令启动`iSulad` ++ ++```shell ++# 使用默认套接字名称、默认日志级别和镜像管理功能启动isulad ++$ sudo isulad ++``` ++ ++### Operations on containers ++ ++`iSulad` 提供了两个管理镜像和容器的操作接口:CLI和CRI。 ++ ++#### **CLI** ++ ++`iSulad`使用 `iSula` 作为客户端命令,以下是利用CLI接口管理容器的一些基本命令: ++ ++- 列出当前环境下的所有容器: ++ ++```shell ++$ sudo isula ps -a ++``` ++ ++- 通过`busybox`镜像创建容器: ++ ++ - 采用默认的runtime创建容器`test` ++ ++ ```sh ++ $ sudo isula create -t -n test busybox ++ ``` ++ ++ - 创建**runtime为runc**的容器`testrunc` ++ ++ ```sh ++ $ sudo isula create -t --runtime runc -n testrunc busybox ++ ``` ++ ++ ++- 启动容器`test`: ++ ++```shell ++$ sudo isula start test ++``` ++ ++- 停止容器`test`: ++ ++```shell ++$ sudo isula kill test ++``` ++ ++- 移除容器`test` ++ ++```shell ++$ sudo isula rm test ++``` ++ ++#### CRI ++ ++`iSulad`可以通过CRI接口与kubernetes集成,如何与kubernetes集成请参考[k8s_integration](./docs/manual/k8s_integration_zh.md)。 ++ ++## Performance ++ ++采用[ptcr](https://gitee.com/openeuler/ptcr)作为容器引擎的性能测试工具,展示在不同架构的计算机中`iSulad`的性能效果。 ++ ++### ARM ++ ++- 10个容器串行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: ++ ++ARM searially ++ ++- 100个容器并行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: ++ ++ARM parallerlly ++ ++### X86 ++ ++- 10个容器串行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: ++ ++X86 searially ++ ++- 100个容器并行操作的情况下,`iSula`与`docker`、`podman`的性能对比雷达图如下: ++ ++X86 parallerlly ++ ++**关于性能测试的更多信息请查看** [Performance test](https://gitee.com/openeuler/iSulad/wikis/Performance?sort_id=5449355)。 ++ ++## Kernel Requirements ++ ++`iSulad`支持在3.0.x之后的Kernel上运行。 ++ ++## Compatibility ++ ++`iSulad` 能够兼容的标准规范版本如下: ++ ++- 兼容 1.0.0 版本的OCI ++- 兼容 0.3.0 版本以上的CNI + - 兼容 2.1.x 版本以上的lcr +\ No newline at end of file +diff --git a/docs/design/architecture.md b/docs/design/architecture.md +index 7487616d..0c4d1dea 100644 +--- a/docs/design/architecture.md ++++ b/docs/design/architecture.md +@@ -1,43 +1,43 @@ +-# iSulad Architecture +- +-## Overview +- +-![architecture](../images/arch.jpg) +- +-iSulad is an OCI-compliant container runtime engine that emphasizes simplicity, robustness, performance and lightweight. +- +-As a daemon process, it manages the entire container life cycle of the host system, including image transmission and storage, container execution and monitoring management, container resource management, and network management. 
iSulad provides Docker-like CLI for users. +- +-You can use Docker-like commands to manage container images and iSulad provides gRPC APIs which comply with the CRI standard for Kubernetes. +- +-iSulad is divided into different modules, and the modules are organized into subsystems. Understanding these modules, subsystems, and their relationships is important to modify and extend iSulad. +- +-This document describes the high-level system architecture design. For more information about each module, please refer to the relevant design documents. +- +-## Subsystem +- +-You can interact with the iSulad by invoking gRPC APIs exported by the subsystem. +- +-- **image service** : Image management service, provides image-related operations, such as image download, query, and deletion. +-- **execution service**: Container life cycle management service, provides container-related operations, such as container creation, startup, and deletion. +-- **network**:The network subsystem is responsible for network management capabilities of the pod of the CRI. When a pod is started, the pod is added to the network plane specified in the configuration file through the CNI interface. When a pod is stopped, the CNI API is used to remove the pod from the network plane where the pod is located and clear related network resources. +- +-## Module +- +-- **image content** : Managing Image Metadata and Container File Systems +- +-- **resource manage**: Container resource management module, for example, setting available CPU and memory resource limits +- +-- **Executor**:Runtime for executing actual container operations. The LCR acts as the default runtime and can be extended through the plug-in mechanism. +- +-- **Events**:Container event collection module +- +-- **Plugins**:Provides the plugin mechanism to extend container capabilities through different plugins. +- +-- **HA**:This module provides fault locating and garbage collection service. +- +-### Network architecture design +- +-The figure shows the architecture: +- +-![CNI_architecture](../images/CNI_architecture.png) ++# iSulad Architecture ++ ++## Overview ++ ++![architecture](../images/arch.jpg) ++ ++iSulad is an OCI-compliant container runtime engine that emphasizes simplicity, robustness, performance and lightweight. ++ ++As a daemon process, it manages the entire container life cycle of the host system, including image transmission and storage, container execution and monitoring management, container resource management, and network management. iSulad provides Docker-like CLI for users. ++ ++You can use Docker-like commands to manage container images and iSulad provides gRPC APIs which comply with the CRI standard for Kubernetes. ++ ++iSulad is divided into different modules, and the modules are organized into subsystems. Understanding these modules, subsystems, and their relationships is important to modify and extend iSulad. ++ ++This document describes the high-level system architecture design. For more information about each module, please refer to the relevant design documents. ++ ++## Subsystem ++ ++You can interact with the iSulad by invoking gRPC APIs exported by the subsystem. ++ ++- **image service** : Image management service, provides image-related operations, such as image download, query, and deletion. ++- **execution service**: Container life cycle management service, provides container-related operations, such as container creation, startup, and deletion. 
++- **network**:The network subsystem is responsible for network management capabilities of the pod of the CRI. When a pod is started, the pod is added to the network plane specified in the configuration file through the CNI interface. When a pod is stopped, the CNI API is used to remove the pod from the network plane where the pod is located and clear related network resources. ++ ++## Module ++ ++- **image content** : Managing Image Metadata and Container File Systems ++ ++- **resource manage**: Container resource management module, for example, setting available CPU and memory resource limits ++ ++- **Executor**:Runtime for executing actual container operations. The LCR acts as the default runtime and can be extended through the plug-in mechanism. ++ ++- **Events**:Container event collection module ++ ++- **Plugins**:Provides the plugin mechanism to extend container capabilities through different plugins. ++ ++- **HA**:This module provides fault locating and garbage collection service. ++ ++### Network architecture design ++ ++The figure shows the architecture: ++ ++![CNI_architecture](../images/CNI_architecture.png) +diff --git a/docs/design/architecture_zh.md b/docs/design/architecture_zh.md +index 288bbfe4..d036bb56 100644 +--- a/docs/design/architecture_zh.md ++++ b/docs/design/architecture_zh.md +@@ -1,41 +1,41 @@ +-# iSulad Architecture +- +-## Overview +- +-![architecture](../images/arch.jpg) +- +-iSulad是一个基于OCI标准的容器运行引擎,强调简单性、健壮性和轻量化。 +- +-作为守护进程,iSulad提供容器生命周期管理相关服务:包括镜像的传输和存储、容器执行和监控管理、容器资源管理以及网络等。iSulad对外提供与docker类似的CLI命令行接口,可使用该命令行进行容器管理;并且提供符合CRI接口标准的gRPC API,可供kubernetes 按照CRI接口协议调用。 +- +-为了方便理解,我们将iSulad分成不同的模块,并根据模块的类别组织成子系统。了解这些模块、子系统及其关系是修改和扩展iSulad的关键 +- +-本文档将仅描述各个模块的high-level功能设计。有关每个模块的详细信息,请参阅相关设计文档。 +- +-## 子系统 +- +-用户可通过调用子系统提供的GRPC API与iSulad进行交互。 +- +-- **image service** : 镜像管理服务,提供镜像相关操作,如镜像下载、查询、删除等 +-- **execution service**: 容器生命周期管理服务,提供容器的相关操作,如容器创建、启动、删除等 +-- **network**:网络子模块负责CRI的Pod的网络管理能力。当Pod启动时,通过CNI的接口把该Pod加入到配置文件制定的网络平面中;当Pod停止时,通过CNI的接口把该Pod从所在的网络平面中退出,并且清理相关的网络资源。 +- +-## 模块 +- +-- **image content** : 管理镜像元数据以及容器文件系统。 +- +-- **resource manage**: 容器资源管理,如设置可用cpu、memory等资源限制 +- +-- **Executor**:执行实际容器操作的runtime,提供lcr作为默认runtime,可通过plugin机制扩展 +- +-- **Events**:容器事件收集 +- +-- **Plugins**:提供插件机制,通过不同插件,实现扩展容器功能。 +- +-- **HA**:提供日志机制用于定位问题,提供garbage collect 机制回收容器D/Z 等异常容器资源。 +- +-### 网络架构设计 +- +-架构图,如下: +- +-![CNI_architecture](../images/CNI_architecture.png) ++# iSulad Architecture ++ ++## Overview ++ ++![architecture](../images/arch.jpg) ++ ++iSulad是一个基于OCI标准的容器运行引擎,强调简单性、健壮性和轻量化。 ++ ++作为守护进程,iSulad提供容器生命周期管理相关服务:包括镜像的传输和存储、容器执行和监控管理、容器资源管理以及网络等。iSulad对外提供与docker类似的CLI命令行接口,可使用该命令行进行容器管理;并且提供符合CRI接口标准的gRPC API,可供kubernetes 按照CRI接口协议调用。 ++ ++为了方便理解,我们将iSulad分成不同的模块,并根据模块的类别组织成子系统。了解这些模块、子系统及其关系是修改和扩展iSulad的关键 ++ ++本文档将仅描述各个模块的high-level功能设计。有关每个模块的详细信息,请参阅相关设计文档。 ++ ++## 子系统 ++ ++用户可通过调用子系统提供的GRPC API与iSulad进行交互。 ++ ++- **image service** : 镜像管理服务,提供镜像相关操作,如镜像下载、查询、删除等 ++- **execution service**: 容器生命周期管理服务,提供容器的相关操作,如容器创建、启动、删除等 ++- **network**:网络子模块负责CRI的Pod的网络管理能力。当Pod启动时,通过CNI的接口把该Pod加入到配置文件制定的网络平面中;当Pod停止时,通过CNI的接口把该Pod从所在的网络平面中退出,并且清理相关的网络资源。 ++ ++## 模块 ++ ++- **image content** : 管理镜像元数据以及容器文件系统。 ++ ++- **resource manage**: 容器资源管理,如设置可用cpu、memory等资源限制 ++ ++- **Executor**:执行实际容器操作的runtime,提供lcr作为默认runtime,可通过plugin机制扩展 ++ ++- **Events**:容器事件收集 ++ ++- **Plugins**:提供插件机制,通过不同插件,实现扩展容器功能。 ++ ++- **HA**:提供日志机制用于定位问题,提供garbage collect 机制回收容器D/Z 等异常容器资源。 ++ ++### 网络架构设计 ++ ++架构图,如下: ++ ++![CNI_architecture](../images/CNI_architecture.png) +diff --git 
a/docs/design/detailed/CRI/k8s125_New_Add_CRI.md b/docs/design/detailed/CRI/k8s125_New_Add_CRI.md +index aa8a9e17..1adf3e73 100644 +--- a/docs/design/detailed/CRI/k8s125_New_Add_CRI.md ++++ b/docs/design/detailed/CRI/k8s125_New_Add_CRI.md +@@ -1,162 +1,162 @@ +-# 章节一:CRI接口升级背景及版本 +-背景:当前iSulad CRI接口版本采用的K8s 1.15版本,升级至k8s1.25,CRI接口需要对升级后的新增CRI字段进行补充。 +- +-版本:升级至k8s1.25。 +-# 章节二:新增功能 +- +-## 1、Image +- +-### 1.1、ListImages +- void ListImages(const runtime::v1alpha2::ImageFilter &filter, +- std::vector> *images, Errors &error) override;" +- +-**新增CRI字段** +- +-ImageFilter里面的ImageSpec新增 map annotations = 2。 +-Image中新增ImageSpec spec = 7;bool pinned = 8;ImageSpec里面新增 map annotations = 2。 +-### 1.2、ImageStatus +- std::unique_ptr ImageStatus(construntime::v1alpha2::ImageSpec &image,Errors &error) override; +- +-**新增CRI字段** +- +-Image中新增ImageSpec spec = 7;bool pinned = 8;ImageSpec里面新增 map annotations = 2; +-### 1.3、PullImage +- std::string PullImage(const runtime::v1alpha2::ImageSpec &image, const runtime::v1alpha2::AuthConfig &auth,Errors &error) override; +- +-**新增CRI字段** +- +-ImageSpec里面新增 map annotations = 2;AuthConfig 无新增 +-### 1.4、RemoveImage +- void RemoveImage(const runtime::v1alpha2::ImageSpec &image, Errors &error) override; +- +-**新增CRI字段** +- +-ImageSpec里面新增 map annotations = 2; +-### 1.5、ImageFsInfo +-无新增CRI字段 +-## 2、POD +-### 2.1、RunPodSandbox +- auto RunPodSandbox(const runtime::v1alpha2::PodSandboxConfig &config, const std::string &runtimeHandler,Errors &error) -> std::string; +- +-**新增CRI字段** +- +-1、新增WindowsPodSandboxConfig windows = 9; +- +-2、原有LinuxPodSandboxConfig中新增LinuxContainerResources overhead = 4;LinuxContainerResources resources = 5; +- +-3、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中新增新增SecurityProfile seccomp = 9;SecurityProfile apparmor = 10; +- +-4、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; +- +-5、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption原有中NamespaceMode新增TARGET +-### 2.2、StopPodSandbox +- void StopPodSandbox(const std::string &podSandboxID, Errors &error); +- +-**新增CRI字段** +- +-1、原有PodSandboxNetworkStatus中新增repeated PodIP additional_ips = 2; +- +-2、原有LinuxPodSandboxStatus中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; +- +-3、原有LinuxPodSandboxStatus中原有NamespaceOption中原有中NamespaceMode新增TARGET +-### 2.3、RemovePodSandbox +-无新增CRI字段 +-### 2.4、PodSandboxStatus +- auto PodSandboxStatus(const std::string &podSandboxID, Errors &error) +- -> std::unique_ptr; +- +-**新增CRI字段** +- +-1、原有PodSandboxNetworkStatus中新增repeated PodIP additional_ips = 2; +- +-2、原有LinuxPodSandboxStatus中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; +- +-3、原有LinuxPodSandboxStatus中原有NamespaceOption中原有中NamespaceMode新增TARGET +-### 2.5、ListPodSandbox +-无新增CRI字段 +-### 2.6、PortForward +-该函数未实现 +-## 3、Container +-### 3.1、CreateContainer +- auto CreateContainer(const std::string &podSandboxID, const runtime::v1alpha2::ContainerConfig &containerConfig, +- const runtime::v1alpha2::PodSandboxConfig &podSandboxConfig, Errors &error) -> std::string; +-**新增CRI字段1** +- +-1、原有ImageSpec中新增map annotations = 2; +- +-2、原有LinuxContainerConfig中原有LinuxContainerResources新增repeated HugepageLimit hugepage_limits = 8;map unified = 9;int64 memory_swap_limit_in_bytes = 10; +- +-3、原有LinuxContainerConfig中原有LinuxContainerSecurityContext新增SecurityProfile seccomp = 15;SecurityProfile apparmor = 16; +- 
+-4、原有LinuxContainerConfig中原有LinuxContainerSecurityContext中原有Capability新增repeated string add_ambient_capabilities = 3; +- +-5、原有LinuxContainerConfig中原有LinuxContainerSecurityContext中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; +- +-6、原有LinuxContainerConfig中原有LinuxContainerSecurityContext中原有NamespaceOption中原有NamespaceMode新增TARGET +- +-7、原有WindowsContainerConfig中原有WindowsContainerResources新增int64 rootfs_size_in_bytes = 5; +- +-8、原有WindowsContainerConfig中原有WindowsContainerSecurityContext新增bool host_process = 3; +- +-**新增CRI字段2** +- +-1、新增WindowsPodSandboxConfig windows = 9; +- +-2、原有LinuxPodSandboxConfig中新增LinuxContainerResources overhead = 4;LinuxContainerResources resources = 5 +- +-3、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中新增新增SecurityProfile seccomp = 9;SecurityProfile apparmor = 10; +- +-4、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; +- +-5、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption原有中NamespaceMode新增TARGET +-### 3.2、StartContainer +-无新增CRI字段 +-### 3.3、StopContainer +-无新增CRI字段 +-### 3.4、RemoveContainer +-无新增CRI字段 +-### 3.5、ListContainers +- void ListContainers(const runtime::v1alpha2::ContainerFilter *filter, +- std::vector> *containers, Errors &error); +- +-**新增CRI字段** +- +-Container中原有ImageSpec新增 map annotations = 2; +-### 3.6、ListContainerStats +- void ListContainerStats(const runtime::v1alpha2::ContainerStatsFilter *filter, +- std::vector> *containerstats, +- Errors &error); +- +-**新增CRI字段** +- +-ContainerStatsFilter无新增CRI字段,ContainerStats中新增字段如下: +- +-1、原有MemoryUsage新增UInt64Value available_bytes = 3;UInt64Value usage_bytes = 4; UInt64Value rss_bytes = 5;UInt64Value page_faults = 6;UInt64Value major_page_faults = 7; +- +-2、原有CpuUsage新增UInt64Value usage_nano_cores = 3; +-## 3.7、ContainerStatus +- auto ContainerStatus(const std::string &containerID, Errors &error) +- -> std::unique_ptr; +- +-**新增CRI字段** +- +-ContainerStatus中新增字段ContainerResources resources = 16; +-### 3.8、UpdateContainerResources +- void UpdateContainerResources(const std::string &containerID,const runtime::v1alpha2::LinuxContainerResources &resources, Errors &error) +- +-**新增CRI字段** +- +-LinuxContainerResources中新增字段repeated HugepageLimit hugepage_limits = 8;map unified = 9;int64 memory_swap_limit_in_bytes = 10; +- +-### 3.9、UpdateRuntimeConfig +-无新增CRI字段 +-### 3.10、Status +-无新增CRI字段 +-### 3.11、Version +-无新增CRI字段 +-### 3.12、ExecSync +-无新增CRI字段 +-### 3.13、Exec +-无新增CRI字段 +-### 3.14、Attach +-无新增CRI字段 ++# 章节一:CRI接口升级背景及版本 ++背景:当前iSulad CRI接口版本采用的K8s 1.15版本,升级至k8s1.25,CRI接口需要对升级后的新增CRI字段进行补充。 ++ ++版本:升级至k8s1.25。 ++# 章节二:新增功能 ++ ++## 1、Image ++ ++### 1.1、ListImages ++ void ListImages(const runtime::v1alpha2::ImageFilter &filter, ++ std::vector> *images, Errors &error) override;" ++ ++**新增CRI字段** ++ ++ImageFilter里面的ImageSpec新增 map annotations = 2。 ++Image中新增ImageSpec spec = 7;bool pinned = 8;ImageSpec里面新增 map annotations = 2。 ++### 1.2、ImageStatus ++ std::unique_ptr ImageStatus(construntime::v1alpha2::ImageSpec &image,Errors &error) override; ++ ++**新增CRI字段** ++ ++Image中新增ImageSpec spec = 7;bool pinned = 8;ImageSpec里面新增 map annotations = 2; ++### 1.3、PullImage ++ std::string PullImage(const runtime::v1alpha2::ImageSpec &image, const runtime::v1alpha2::AuthConfig &auth,Errors &error) override; ++ ++**新增CRI字段** ++ ++ImageSpec里面新增 map annotations = 2;AuthConfig 无新增 ++### 1.4、RemoveImage ++ void RemoveImage(const runtime::v1alpha2::ImageSpec &image, Errors &error) 
override; ++ ++**新增CRI字段** ++ ++ImageSpec里面新增 map annotations = 2; ++### 1.5、ImageFsInfo ++无新增CRI字段 ++## 2、POD ++### 2.1、RunPodSandbox ++ auto RunPodSandbox(const runtime::v1alpha2::PodSandboxConfig &config, const std::string &runtimeHandler,Errors &error) -> std::string; ++ ++**新增CRI字段** ++ ++1、新增WindowsPodSandboxConfig windows = 9; ++ ++2、原有LinuxPodSandboxConfig中新增LinuxContainerResources overhead = 4;LinuxContainerResources resources = 5; ++ ++3、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中新增新增SecurityProfile seccomp = 9;SecurityProfile apparmor = 10; ++ ++4、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; ++ ++5、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption原有中NamespaceMode新增TARGET ++### 2.2、StopPodSandbox ++ void StopPodSandbox(const std::string &podSandboxID, Errors &error); ++ ++**新增CRI字段** ++ ++1、原有PodSandboxNetworkStatus中新增repeated PodIP additional_ips = 2; ++ ++2、原有LinuxPodSandboxStatus中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; ++ ++3、原有LinuxPodSandboxStatus中原有NamespaceOption中原有中NamespaceMode新增TARGET ++### 2.3、RemovePodSandbox ++无新增CRI字段 ++### 2.4、PodSandboxStatus ++ auto PodSandboxStatus(const std::string &podSandboxID, Errors &error) ++ -> std::unique_ptr; ++ ++**新增CRI字段** ++ ++1、原有PodSandboxNetworkStatus中新增repeated PodIP additional_ips = 2; ++ ++2、原有LinuxPodSandboxStatus中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; ++ ++3、原有LinuxPodSandboxStatus中原有NamespaceOption中原有中NamespaceMode新增TARGET ++### 2.5、ListPodSandbox ++无新增CRI字段 ++### 2.6、PortForward ++该函数未实现 ++## 3、Container ++### 3.1、CreateContainer ++ auto CreateContainer(const std::string &podSandboxID, const runtime::v1alpha2::ContainerConfig &containerConfig, ++ const runtime::v1alpha2::PodSandboxConfig &podSandboxConfig, Errors &error) -> std::string; ++**新增CRI字段1** ++ ++1、原有ImageSpec中新增map annotations = 2; ++ ++2、原有LinuxContainerConfig中原有LinuxContainerResources新增repeated HugepageLimit hugepage_limits = 8;map unified = 9;int64 memory_swap_limit_in_bytes = 10; ++ ++3、原有LinuxContainerConfig中原有LinuxContainerSecurityContext新增SecurityProfile seccomp = 15;SecurityProfile apparmor = 16; ++ ++4、原有LinuxContainerConfig中原有LinuxContainerSecurityContext中原有Capability新增repeated string add_ambient_capabilities = 3; ++ ++5、原有LinuxContainerConfig中原有LinuxContainerSecurityContext中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; ++ ++6、原有LinuxContainerConfig中原有LinuxContainerSecurityContext中原有NamespaceOption中原有NamespaceMode新增TARGET ++ ++7、原有WindowsContainerConfig中原有WindowsContainerResources新增int64 rootfs_size_in_bytes = 5; ++ ++8、原有WindowsContainerConfig中原有WindowsContainerSecurityContext新增bool host_process = 3; ++ ++**新增CRI字段2** ++ ++1、新增WindowsPodSandboxConfig windows = 9; ++ ++2、原有LinuxPodSandboxConfig中新增LinuxContainerResources overhead = 4;LinuxContainerResources resources = 5 ++ ++3、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中新增新增SecurityProfile seccomp = 9;SecurityProfile apparmor = 10; ++ ++4、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption中新增string target_id = 4;新增UserNamespace userns_options = 5; ++ ++5、原有LinuxPodSandboxConfig中原有LinuxSandboxSecurityContext中原有NamespaceOption原有中NamespaceMode新增TARGET ++### 3.2、StartContainer ++无新增CRI字段 ++### 3.3、StopContainer ++无新增CRI字段 ++### 3.4、RemoveContainer ++无新增CRI字段 ++### 3.5、ListContainers ++ void ListContainers(const runtime::v1alpha2::ContainerFilter *filter, 
++ std::vector> *containers, Errors &error); ++ ++**新增CRI字段** ++ ++Container中原有ImageSpec新增 map annotations = 2; ++### 3.6、ListContainerStats ++ void ListContainerStats(const runtime::v1alpha2::ContainerStatsFilter *filter, ++ std::vector> *containerstats, ++ Errors &error); ++ ++**新增CRI字段** ++ ++ContainerStatsFilter无新增CRI字段,ContainerStats中新增字段如下: ++ ++1、原有MemoryUsage新增UInt64Value available_bytes = 3;UInt64Value usage_bytes = 4; UInt64Value rss_bytes = 5;UInt64Value page_faults = 6;UInt64Value major_page_faults = 7; ++ ++2、原有CpuUsage新增UInt64Value usage_nano_cores = 3; ++## 3.7、ContainerStatus ++ auto ContainerStatus(const std::string &containerID, Errors &error) ++ -> std::unique_ptr; ++ ++**新增CRI字段** ++ ++ContainerStatus中新增字段ContainerResources resources = 16; ++### 3.8、UpdateContainerResources ++ void UpdateContainerResources(const std::string &containerID,const runtime::v1alpha2::LinuxContainerResources &resources, Errors &error) ++ ++**新增CRI字段** ++ ++LinuxContainerResources中新增字段repeated HugepageLimit hugepage_limits = 8;map unified = 9;int64 memory_swap_limit_in_bytes = 10; ++ ++### 3.9、UpdateRuntimeConfig ++无新增CRI字段 ++### 3.10、Status ++无新增CRI字段 ++### 3.11、Version ++无新增CRI字段 ++### 3.12、ExecSync ++无新增CRI字段 ++### 3.13、Exec ++无新增CRI字段 ++### 3.14、Attach ++无新增CRI字段 +diff --git a/docs/design/detailed/Container/state_check.md b/docs/design/detailed/Container/state_check.md +index 69df60b0..a40f3201 100644 +--- a/docs/design/detailed/Container/state_check.md ++++ b/docs/design/detailed/Container/state_check.md +@@ -1,247 +1,247 @@ +-# 状态概况 +- +-## 容器的状态以及转换关系 +- +-```mermaid +-stateDiagram +- direction LR +- created +- running +- paused +- gced +- stoped +- deleted +- state origin <> +- +- origin --> created : isula create +- origin --> running : isula run +- created --> running : isula start +- created --> running : isula restart +- running --> paused : isula pause +- running --> running : isula restart +- paused --> running : isula unpause +- paused --> running : isula restart +- paused --> gced : isula stop +- paused --> gced : isula kill +- running --> gced : isula kill +- running --> gced : isula stop +- stoped --> running : isula restart +- gced --> stoped +- stoped --> running : isula start +- stoped --> deleted: isula rm +-``` +- +-1. created: 不创建名称相同的容器 +-2. running:若已经running状态则不需要进行start操作 +-3. paused:若已经位于gc状态则无需pause操作,该状态下无法执行start、exec、attach和rm等,但是应该是可以支持rm -f的 +-4. gced:若位于gc,则容器都不做其他操作。 +-5. stoped +-6. 
deleted:若位于removal状态,则无需再rm,若为running状态,则无法rm,若位于paused状态,则使用-f时,应该可以删除。 +- +- +- +-## isulad代码中设置的状态 +- +-| 状态 | true | false | 判定方式 | 拿容器锁前判定 | 拿容器锁后判断 | 其他使用 | +-| ------------------------- | ------------------------------------------------------------ | :----------------------------------------------------------: | ------------------------------------------- | ---------------------- | --------------------------------- | ------------------------------------------- | +-| running | container_state_set_running、container_state_set_restarting、container_restart_update_start_and_finish_time:在start操作完全完成之后设置 | container_state_set_stopped | container_is_running | rm、start、用户restart | start、kill、restart、pause、stop | | +-| paused | container_state_set_paused:在pause操作完成之后设置 | container_state_set_stopped | container_is_paused | rm | start、stop | | +-| restartring | container_state_set_restarting:系统自动的restart一开始设置 | container_state_set_running、container_state_set_stopped、container_restart_update_start_and_finish_time | container_is_restarting | | pause、stop | | +-| has_been_manually_stopped | stop和kill操作时直接赋值 | container_state_reset_has_been_manual_stopped start和restore paused以及running容器时 | container_state_get_has_been_manual_stopped | | | 用于区分容器异常停止退出和用户主动退出 | +-| starting | container_state_set_starting:在start的callback函数一开始设置 | container_state_reset_starting、container_state_set_running、container_state_set_stopped、container_restart_update_start_and_finish_time | if判断 | | | 用于restore时容器状态的判断和容器状态的显示 | +-| removal_inprogress | set_container_to_removal封装的container_state_set_removal_in_progress:在delete的callback函数一开始设置 | container_state_reset_removal_in_progress | container_is_removal_in_progress | | start、用户restart | | +-| dead | container_state_set_dead:未被设置 | | container_is_dead | | | 未被使用 | +- +- +- +-其他有关的变量: +- +-1. restart_manager_cancel:通知restart_manager取消restart动作,会对rm->canceled = true; +- +-2. cont->hostconfig->auto_remove_bak和 cont->hostconfig->auto_remove:create的时候赋值,在option中有与之对应的配置,默认为false。 +- +-为什么设置两个: +- +-因为在restart的stop过程中,需要将容器 cont->hostconfig->auto_remove置为false,从而防止容器在restart过程中被删除。还原时需要用bak备份的,gc和restore时也用bak为了保证不使用改变后的值。 +- +-auto_remove作用: +- +-(1)当位于异常stop状态,且重启策略不重启、auto_remove_bak为true时,就会设置状态为removal状态,并且delete_container。 +- +-(2)当start异常时,若设置了auto_remove,则需要先将容器的状态设置为removal之后删除容器,然后将容器状态存入disk。 +- +-(3)当容器的auto_remove设置为true时,容器的重启策略必须为no,不重启 +- +-auto_remove_bak作用: +- +-(1)当gc之后,且重启策略没将容器running且auto_remove_bak为true时,就会设置状态为removal状态,并且delete_container。 +- +-(2)当restore时,当容器不处于running状态且auto_remove_bak为true时,就会设置状态为removal状态,并且delete_container。 +- +-## gc的状态检查 +- +-| 判定方式 | 拿容器锁前判定 | 拿容器锁后判定 | +-| ------------------------------------------------- | ------------------------------------------- | -------------- | +-| 判断容器id是否位于g_gc_containers.containers_list | kill、pause、stop、系统restart、用户restart | rm、start | +- +- +- +-# 现有状态转换 +- +-## create过程 +- +-- create结束的标志是:将容器id和结构体conf作为key和value存入g_containers_store->map中即create成功,delete在最后的过程中才将容器id的map项删除 +- +-![create](../../../images/Container_state/create.svg) +- +- +- +-## pause过程 +- +-![paused](../../../images/Container_state/paused.svg) +- +-1. 
是不是在pause在获得容器锁之前,需要补充: +- +-```c +- if (container_is_removal_in_progress(cont->state) || container_is_dead(cont->state)) { +- ERROR("Container is marked for removal and cannot be started."); +- isulad_set_error_message("Container is marked for removal and cannot be started."); +- ret = -1; +- goto out; +- } +-``` +- +-## running状态 +- +-这里在start_container之前不需要检查是否位于gc状态是因为,gc结束之后可能是位于stop状态,stop之后是允许start的,而在状态检查和start_container中间,准备工作还需要一段时间。 +- +-![running](../../../images/Container_state/running.svg) +- +-​ Q: +- +-1. docker中start一个paused的容器会报错容器位于paused状态,需要先unpased,而在我们的逻辑中,由于running状态和paused状态时一起为true的,现有状态检查中,只是检查出他位于running状态,无需进行start操作,不会返回用户信息。 +- +-## stop过程 +- +-Q:在发送命令创建之后检查容器容器是不是运行状态?发送命令之后通过running的变化得知是否完成了stop?runtime中是会对容器状态改变吗? +- +-![stop](../../../images/Container_state/stop.svg) +- +- +- +- +- +-1. 是不是在gc状态会包含了容器在删除过程中吗?或者说是不是在删除时,第一过程就是要将全局的id与conf删除,则没获取到就是校验了,如果不包含,则stop在获得容器锁之前,需要补充: +- +-```c +- if (container_is_removal_in_progress(cont->state) || container_is_dead(cont->state)) { +- ERROR("Container is marked for removal and cannot be started."); +- isulad_set_error_message("Container is marked for removal and cannot be started."); +- ret = -1; +- goto out; +- } +-``` +- +-但是由于,如果需要强行删除容器,因为在删除的最初已经将删除状态设置了,若在stop中加入了检查是否在删除状态,就会直接退出,不会再进行之后的操作。但是可以在force情况下也对stop放弃此检查?? +- +-2.与docker中行为的不同: +- +-在docker中容器位于paused状态也可以进行stop和kill以及restart,但是我们代码里面 +- +-```c +-if (container_is_paused(cont->state)) { +- ERROR("Container %s is paused. Unpause the container before stopping or killing", id); +- isulad_set_error_message("Container %s is paused. Unpause the container before stopping or killing", id); +- ret = -1; +- goto out; +- } +-``` +- +-在stop时检查了是否处于paused,若处于,则直接错误返回,而在kill时是没有进行检查的,而在用户手册中展示的为: +- +-![openeuler](../../../images/Container_state/openeuler.png) +- +-而stop的这个行为导致强行 rm 和restart时都对于处于pause状态的容器无法操作。 +- +-是需要和docker一致?还是保持原样,保持原样的话是不是需要将备注错误信息改一下呢? 
+- +-## gc和supervisor过程 +- +-gc过程结束之后才会将容器从g_gc_containers.containers_list中清除。 +- +-全局的g_gc_containers.containers_list中,而判断容器是否在gc状态的标准就是是不是在全局的这个list中 +- +-```c +-/* gc is gc progress */ +-bool gc_is_gc_progress(const char *id) +-{ +- bool ret = false; +- struct linked_list *it = NULL; +- struct linked_list *next = NULL; +- container_garbage_config_gc_containers_element *cont = NULL; +- +- gc_containers_lock(); +- +- linked_list_for_each_safe(it, &g_gc_containers.containers_list, next) { +- cont = (container_garbage_config_gc_containers_element *)it->elem; +- if (strcmp(id, cont->id) == 0) { +- ret = true; +- break; +- } +- } +- +- gc_containers_unlock(); +- +- return ret; +-} +-``` +- +- +- +-![gc_supervisor](../../../images/Container_state/gc_supervisor.svg) +- +-​ Q:resume container时啥作用,是start容器吗?这里为什么要做,且不需要容器状态检查 +- +-## delete过程 +- +-在容器状态异常发生stop event且不restart的状态下,也需要对容器delete_container,但是感觉疑问还是在,因为可以在delete_container中对状态进行判断之后在do_delete_container才进行remove状态的设置:可能是为了增加删除的概率,若在竞争状态下,先设置为removal能阻止其他的容器操作进程运行,但是其他的对于这个状态的判断很少 +- +-![delete](../../../images/Container_state/delete.svg) +- +- +- +-## kill过程 +- +-![kill](../../../images/Container_state/kill.svg) +- +-## restart过程 +- +-restart有两种,一种是重启策略的重启,默认的重启策略为no,若设置为always或者设置为unless-stoped且has_been_manually_stopped为fasle时,会重启;另一种是在用户使用restart命令时,也会对容器进行restart,因此在执行stop_and_start里面时,才需要对running状态进行检查。 +- +-### 用户restart +- +-![user_restart](../../../images/Container_state/user_restart.svg) +- +-### 容器event变为 stopped之后引发的容器状态改变 +- +-![sys_restart](../../../images/Container_state/sys_restart.svg) +- +- +- +-# Question +- +-## 在获得容器锁之前的状态检查和在获得容器锁之后的状态检查有什么区别? +- +-在之前检查的是必然没必要往下走的状态,而在获得容器锁之后检查的是必须保证满足开始进行操作的条件的状态,更严格。在第一次检查和获得容器锁之间有一段时间,需要允许其他操作并发操作。 +- +-## gc持有锁的情况 +- +-1. 在清理容器资源时,需要获得容器的锁:clean_container_resource +- +-## 状态改变但没有落盘 +- +-src/daemon/executor/container_cb/execution.c #391、421、502 +- +-src/daemon/modules/container/restart_manager/restartmanager.c #104 +- +-src/daemon/executor/container_cb/execution_extend.c #534 这里为什么要将这个放在container_update_health_monitor后面,里面好像没有涉及到状态的改变。 ++# 状态概况 ++ ++## 容器的状态以及转换关系 ++ ++```mermaid ++stateDiagram ++ direction LR ++ created ++ running ++ paused ++ gced ++ stoped ++ deleted ++ state origin <> ++ ++ origin --> created : isula create ++ origin --> running : isula run ++ created --> running : isula start ++ created --> running : isula restart ++ running --> paused : isula pause ++ running --> running : isula restart ++ paused --> running : isula unpause ++ paused --> running : isula restart ++ paused --> gced : isula stop ++ paused --> gced : isula kill ++ running --> gced : isula kill ++ running --> gced : isula stop ++ stoped --> running : isula restart ++ gced --> stoped ++ stoped --> running : isula start ++ stoped --> deleted: isula rm ++``` ++ ++1. created: 不创建名称相同的容器 ++2. running:若已经running状态则不需要进行start操作 ++3. paused:若已经位于gc状态则无需pause操作,该状态下无法执行start、exec、attach和rm等,但是应该是可以支持rm -f的 ++4. gced:若位于gc,则容器都不做其他操作。 ++5. stoped ++6. 
deleted:若位于removal状态,则无需再rm,若为running状态,则无法rm,若位于paused状态,则使用-f时,应该可以删除。 ++ ++ ++ ++## isulad代码中设置的状态 ++ ++| 状态 | true | false | 判定方式 | 拿容器锁前判定 | 拿容器锁后判断 | 其他使用 | ++| ------------------------- | ------------------------------------------------------------ | :----------------------------------------------------------: | ------------------------------------------- | ---------------------- | --------------------------------- | ------------------------------------------- | ++| running | container_state_set_running、container_state_set_restarting、container_restart_update_start_and_finish_time:在start操作完全完成之后设置 | container_state_set_stopped | container_is_running | rm、start、用户restart | start、kill、restart、pause、stop | | ++| paused | container_state_set_paused:在pause操作完成之后设置 | container_state_set_stopped | container_is_paused | rm | start、stop | | ++| restartring | container_state_set_restarting:系统自动的restart一开始设置 | container_state_set_running、container_state_set_stopped、container_restart_update_start_and_finish_time | container_is_restarting | | pause、stop | | ++| has_been_manually_stopped | stop和kill操作时直接赋值 | container_state_reset_has_been_manual_stopped start和restore paused以及running容器时 | container_state_get_has_been_manual_stopped | | | 用于区分容器异常停止退出和用户主动退出 | ++| starting | container_state_set_starting:在start的callback函数一开始设置 | container_state_reset_starting、container_state_set_running、container_state_set_stopped、container_restart_update_start_and_finish_time | if判断 | | | 用于restore时容器状态的判断和容器状态的显示 | ++| removal_inprogress | set_container_to_removal封装的container_state_set_removal_in_progress:在delete的callback函数一开始设置 | container_state_reset_removal_in_progress | container_is_removal_in_progress | | start、用户restart | | ++| dead | container_state_set_dead:未被设置 | | container_is_dead | | | 未被使用 | ++ ++ ++ ++其他有关的变量: ++ ++1. restart_manager_cancel:通知restart_manager取消restart动作,会对rm->canceled = true; ++ ++2. cont->hostconfig->auto_remove_bak和 cont->hostconfig->auto_remove:create的时候赋值,在option中有与之对应的配置,默认为false。 ++ ++为什么设置两个: ++ ++因为在restart的stop过程中,需要将容器 cont->hostconfig->auto_remove置为false,从而防止容器在restart过程中被删除。还原时需要用bak备份的,gc和restore时也用bak为了保证不使用改变后的值。 ++ ++auto_remove作用: ++ ++(1)当位于异常stop状态,且重启策略不重启、auto_remove_bak为true时,就会设置状态为removal状态,并且delete_container。 ++ ++(2)当start异常时,若设置了auto_remove,则需要先将容器的状态设置为removal之后删除容器,然后将容器状态存入disk。 ++ ++(3)当容器的auto_remove设置为true时,容器的重启策略必须为no,不重启 ++ ++auto_remove_bak作用: ++ ++(1)当gc之后,且重启策略没将容器running且auto_remove_bak为true时,就会设置状态为removal状态,并且delete_container。 ++ ++(2)当restore时,当容器不处于running状态且auto_remove_bak为true时,就会设置状态为removal状态,并且delete_container。 ++ ++## gc的状态检查 ++ ++| 判定方式 | 拿容器锁前判定 | 拿容器锁后判定 | ++| ------------------------------------------------- | ------------------------------------------- | -------------- | ++| 判断容器id是否位于g_gc_containers.containers_list | kill、pause、stop、系统restart、用户restart | rm、start | ++ ++ ++ ++# 现有状态转换 ++ ++## create过程 ++ ++- create结束的标志是:将容器id和结构体conf作为key和value存入g_containers_store->map中即create成功,delete在最后的过程中才将容器id的map项删除 ++ ++![create](../../../images/Container_state/create.svg) ++ ++ ++ ++## pause过程 ++ ++![paused](../../../images/Container_state/paused.svg) ++ ++1. 
是不是在pause在获得容器锁之前,需要补充: ++ ++```c ++ if (container_is_removal_in_progress(cont->state) || container_is_dead(cont->state)) { ++ ERROR("Container is marked for removal and cannot be started."); ++ isulad_set_error_message("Container is marked for removal and cannot be started."); ++ ret = -1; ++ goto out; ++ } ++``` ++ ++## running状态 ++ ++这里在start_container之前不需要检查是否位于gc状态是因为,gc结束之后可能是位于stop状态,stop之后是允许start的,而在状态检查和start_container中间,准备工作还需要一段时间。 ++ ++![running](../../../images/Container_state/running.svg) ++ ++​ Q: ++ ++1. docker中start一个paused的容器会报错容器位于paused状态,需要先unpased,而在我们的逻辑中,由于running状态和paused状态时一起为true的,现有状态检查中,只是检查出他位于running状态,无需进行start操作,不会返回用户信息。 ++ ++## stop过程 ++ ++Q:在发送命令创建之后检查容器容器是不是运行状态?发送命令之后通过running的变化得知是否完成了stop?runtime中是会对容器状态改变吗? ++ ++![stop](../../../images/Container_state/stop.svg) ++ ++ ++ ++ ++ ++1. 是不是在gc状态会包含了容器在删除过程中吗?或者说是不是在删除时,第一过程就是要将全局的id与conf删除,则没获取到就是校验了,如果不包含,则stop在获得容器锁之前,需要补充: ++ ++```c ++ if (container_is_removal_in_progress(cont->state) || container_is_dead(cont->state)) { ++ ERROR("Container is marked for removal and cannot be started."); ++ isulad_set_error_message("Container is marked for removal and cannot be started."); ++ ret = -1; ++ goto out; ++ } ++``` ++ ++但是由于,如果需要强行删除容器,因为在删除的最初已经将删除状态设置了,若在stop中加入了检查是否在删除状态,就会直接退出,不会再进行之后的操作。但是可以在force情况下也对stop放弃此检查?? ++ ++2.与docker中行为的不同: ++ ++在docker中容器位于paused状态也可以进行stop和kill以及restart,但是我们代码里面 ++ ++```c ++if (container_is_paused(cont->state)) { ++ ERROR("Container %s is paused. Unpause the container before stopping or killing", id); ++ isulad_set_error_message("Container %s is paused. Unpause the container before stopping or killing", id); ++ ret = -1; ++ goto out; ++ } ++``` ++ ++在stop时检查了是否处于paused,若处于,则直接错误返回,而在kill时是没有进行检查的,而在用户手册中展示的为: ++ ++![openeuler](../../../images/Container_state/openeuler.png) ++ ++而stop的这个行为导致强行 rm 和restart时都对于处于pause状态的容器无法操作。 ++ ++是需要和docker一致?还是保持原样,保持原样的话是不是需要将备注错误信息改一下呢? 
++ ++## gc和supervisor过程 ++ ++gc过程结束之后才会将容器从g_gc_containers.containers_list中清除。 ++ ++全局的g_gc_containers.containers_list中,而判断容器是否在gc状态的标准就是是不是在全局的这个list中 ++ ++```c ++/* gc is gc progress */ ++bool gc_is_gc_progress(const char *id) ++{ ++ bool ret = false; ++ struct linked_list *it = NULL; ++ struct linked_list *next = NULL; ++ container_garbage_config_gc_containers_element *cont = NULL; ++ ++ gc_containers_lock(); ++ ++ linked_list_for_each_safe(it, &g_gc_containers.containers_list, next) { ++ cont = (container_garbage_config_gc_containers_element *)it->elem; ++ if (strcmp(id, cont->id) == 0) { ++ ret = true; ++ break; ++ } ++ } ++ ++ gc_containers_unlock(); ++ ++ return ret; ++} ++``` ++ ++ ++ ++![gc_supervisor](../../../images/Container_state/gc_supervisor.svg) ++ ++​ Q:resume container时啥作用,是start容器吗?这里为什么要做,且不需要容器状态检查 ++ ++## delete过程 ++ ++在容器状态异常发生stop event且不restart的状态下,也需要对容器delete_container,但是感觉疑问还是在,因为可以在delete_container中对状态进行判断之后在do_delete_container才进行remove状态的设置:可能是为了增加删除的概率,若在竞争状态下,先设置为removal能阻止其他的容器操作进程运行,但是其他的对于这个状态的判断很少 ++ ++![delete](../../../images/Container_state/delete.svg) ++ ++ ++ ++## kill过程 ++ ++![kill](../../../images/Container_state/kill.svg) ++ ++## restart过程 ++ ++restart有两种,一种是重启策略的重启,默认的重启策略为no,若设置为always或者设置为unless-stoped且has_been_manually_stopped为fasle时,会重启;另一种是在用户使用restart命令时,也会对容器进行restart,因此在执行stop_and_start里面时,才需要对running状态进行检查。 ++ ++### 用户restart ++ ++![user_restart](../../../images/Container_state/user_restart.svg) ++ ++### 容器event变为 stopped之后引发的容器状态改变 ++ ++![sys_restart](../../../images/Container_state/sys_restart.svg) ++ ++ ++ ++# Question ++ ++## 在获得容器锁之前的状态检查和在获得容器锁之后的状态检查有什么区别? ++ ++在之前检查的是必然没必要往下走的状态,而在获得容器锁之后检查的是必须保证满足开始进行操作的条件的状态,更严格。在第一次检查和获得容器锁之间有一段时间,需要允许其他操作并发操作。 ++ ++## gc持有锁的情况 ++ ++1. 在清理容器资源时,需要获得容器的锁:clean_container_resource ++ ++## 状态改变但没有落盘 ++ ++src/daemon/executor/container_cb/execution.c #391、421、502 ++ ++src/daemon/modules/container/restart_manager/restartmanager.c #104 ++ ++src/daemon/executor/container_cb/execution_extend.c #534 这里为什么要将这个放在container_update_health_monitor后面,里面好像没有涉及到状态的改变。 +diff --git a/docs/manual/k8s_integration_zh.md b/docs/manual/k8s_integration_zh.md +index 82673bcc..6dda1e4d 100644 +--- a/docs/manual/k8s_integration_zh.md ++++ b/docs/manual/k8s_integration_zh.md +@@ -1,215 +1,215 @@ +-# 整合kubernetes +- +-## 配置 +- +-1. 配置`isulad` +- +- 在`/etc/isulad/daemon.json`中先配置`pod-sandbox-image` : +- +- ```json +- "pod-sandbox-image": "my-pause:1.0.0" +- ``` +- +- 之后配置`isulad`的 `endpoint`: +- +- ```json +- "hosts": [ +- "unix:///var/run/isulad.sock" +- ] +- ``` +- +- 如果`hosts`没有配置,默认的`endpoint`为``unix:///var/run/isulad.sock`` +- +-2. 重启`isulad` +- +- ```bash +- $ sudo systemctl restart isulad +- ``` +- +-3. 基于配置或者默认值启动`kubelet` +- +- ```bash +- $ /usr/bin/kubelet +- --container-runtime-endpoint=unix:///var/run/isulad.sock +- --image-service-endpoint=unix:///var/run/isulad.sock +- --pod-infra-container-image=my-pause:1.0.0 +- --container-runtime=remote +- ... +- ``` +- +-## 使用 RuntimeClass +- +-RuntimeClass 用于选择容器运行时配置从而运行 pod 的容器,RuntimeClass 的具体信息请查看 [runtime-class](https://kubernetes.io/docs/concepts/containers/runtime-class/)。目前,只支持`kata-containers` 和 `runc`这两种`oci runtime`。 +- +-1. 在`/etc/isulad/daemon.json`中配置`isulad` +- +- ```json +- "runtimes": { +- "kata-runtime": { +- "path": "/usr/bin/kata-runtime", +- "runtime-args": [ +- "--kata-config", +- "/usr/share/defaults/kata-containers/configuration.toml" +- ] +- } +- } +- ``` +- +-2. 
其他配置 +- +- `isulad`支持`overlay2` 和 `devicemapper`作为存储驱动程序,默认的为`overlay2` 。 +- +- 在某些情况下,更适合使用块设备类型作为存储驱动程序,例如运行 `kata-containers`。配置`devicemapper`的过程如下: +- +- 首先创建ThinPool: +- +- ```bash +- $ sudo pvcreate /dev/sdb1 # /dev/sdb1 for example +- $ sudo vgcreate isulad /dev/sdb +- $ sudo echo y | lvcreate --wipesignatures y -n thinpool isulad -L 200G +- $ sudo echo y | lvcreate --wipesignatures y -n thinpoolmeta isulad -L 20G +- $ sudo lvconvert -y --zero n -c 512K --thinpool isulad/thinpool --poolmetadata isulad/thinpoolmeta +- $ sudo lvchange --metadataprofile isulad-thinpool isulad/thinpool +- ``` +- +- 之后在`/etc/isulad/daemon.json`中增加 `devicemapper` 的配置 : +- +- ```json +- "storage-driver": "devicemapper" +- "storage-opts": [ +- "dm.thinpooldev=/dev/mapper/isulad-thinpool", +- "dm.fs=ext4", +- "dm.min_free_space=10%" +- ] +- ``` +- +-3. 重启`isulad` +- +- ```bash +- $ sudo systemctl restart isulad +- ``` +- +-4. 定义 `kata-runtime.yaml`,例如创建一个`kata-runtime.yaml`内容如下: +- +- ```yaml +- apiVersion: node.k8s.io/v1beta1 +- kind: RuntimeClass +- metadata: +- name: kata-runtime +- handler: kata-runtime +- ``` +- +- 之后运行`kubectl apply -f kata-runtime.yaml`命令在kubectl中让这个配置生效。 +- +-5. 定义 pod spec `kata-pod.yaml` ,例如创建一个`kata-pod.yaml`,内容如下: +- +- ```yaml +- apiVersion: v1 +- kind: Pod +- metadata: +- name: kata-pod-example +- spec: +- runtimeClassName: kata-runtime +- containers: +- - name: kata-pod +- image: busybox:latest +- command: ["/bin/sh"] +- args: ["-c", "sleep 1000"] +- ``` +- +-6. 运行 pod +- +- ```bash +- $ kubectl create -f kata-pod.yaml +- $ kubectl get pod +- NAME READY STATUS RESTARTS AGE +- kata-pod-example 1/1 Running 4 2s +- ``` +- +-## CNI 网络配置 +- +-`isulad`实现了CRI接口从而可以连接CNI网络、解析CNI的网络配置文件、加入或者退出CNI网络。在本节中,我们调用 CRI 接口启动 pod 来验证 CNI 网络配置。 +- +-1. 在`/etc/isulad/daemon.json`中配置`isulad`: +- +- ```json +- "network-plugin": "cni", +- "cni-bin-dir": "/opt/cni/bin", +- "cni-conf-dir": "/etc/cni/net.d", +- ``` +- +-2. 准备CNI网络的插件: +- +- 编译生成 CNI 插件的二进制文件,并将该二进制文件复制到 `/opt/cni/bin`。 +- +- ```bash +- $ git clone https://github.com/containernetworking/plugins.git +- $ cd plugins && ./build_linux.sh +- $ cd ./bin && ls +- bandwidth bridge dhcp firewall flannel ... +- ``` +- +-3. 准备CNI网络的配置: +- +- 配置文件的后缀可以是`.conflist`或者`.conf`,区别在于是否包含多个插件。例如,我们在目录`/etc/cni/net.d/`下创建`10-mynet.conflist`文件,内容如下: +- +- ```json +- { +- "cniVersion": "0.3.1", +- "name": "default", +- "plugins": [ +- { +- "name": "default", +- "type": "ptp", +- "ipMasq": true, +- "ipam": { +- "type": "host-local", +- "subnet": "10.1.0.0/16", +- "routes": [ +- { +- "dst": "0.0.0.0/0" +- } +- ] +- } +- }, +- { +- "type": "portmap", +- "capabilities": { +- "portMappings": true +- } +- } +- ] +- } +- ``` +- +-4. 配置`sandbox-config.json`: +- +- ```json +- { +- "port_mappings":[{"protocol": 1, "container_port": 80, "host_port": 8080}], +- "metadata": { +- "name": "test", +- "namespace": "default", +- "attempt": 1, +- "uid": "hdishd83djaidwnduwk28bcsb" +- }, +- "labels": { +- "filter_label_key": "filter_label_val" +- }, +- "linux": { +- } +- } +- ``` +- +-5. 重启`isulad`并且启动pod: +- +- ```sh +- $ sudo systemctl restart isulad +- $ sudo crictl -i unix:///var/run/isulad.sock -r unix:///var/run/isulad.sock runp sandbox-config.json +- ``` +- +-6. 查看pod网络信息: +- +- ```sh +- $ sudo crictl -i unix:///var/run/isulad.sock -r unix:///var/run/isulad.sock inspectp ++# 整合kubernetes ++ ++## 配置 ++ ++1. 
配置`isulad` ++ ++ 在`/etc/isulad/daemon.json`中先配置`pod-sandbox-image` : ++ ++ ```json ++ "pod-sandbox-image": "my-pause:1.0.0" ++ ``` ++ ++ 之后配置`isulad`的 `endpoint`: ++ ++ ```json ++ "hosts": [ ++ "unix:///var/run/isulad.sock" ++ ] ++ ``` ++ ++ 如果`hosts`没有配置,默认的`endpoint`为``unix:///var/run/isulad.sock`` ++ ++2. 重启`isulad` ++ ++ ```bash ++ $ sudo systemctl restart isulad ++ ``` ++ ++3. 基于配置或者默认值启动`kubelet` ++ ++ ```bash ++ $ /usr/bin/kubelet ++ --container-runtime-endpoint=unix:///var/run/isulad.sock ++ --image-service-endpoint=unix:///var/run/isulad.sock ++ --pod-infra-container-image=my-pause:1.0.0 ++ --container-runtime=remote ++ ... ++ ``` ++ ++## 使用 RuntimeClass ++ ++RuntimeClass 用于选择容器运行时配置从而运行 pod 的容器,RuntimeClass 的具体信息请查看 [runtime-class](https://kubernetes.io/docs/concepts/containers/runtime-class/)。目前,只支持`kata-containers` 和 `runc`这两种`oci runtime`。 ++ ++1. 在`/etc/isulad/daemon.json`中配置`isulad` ++ ++ ```json ++ "runtimes": { ++ "kata-runtime": { ++ "path": "/usr/bin/kata-runtime", ++ "runtime-args": [ ++ "--kata-config", ++ "/usr/share/defaults/kata-containers/configuration.toml" ++ ] ++ } ++ } ++ ``` ++ ++2. 其他配置 ++ ++ `isulad`支持`overlay2` 和 `devicemapper`作为存储驱动程序,默认的为`overlay2` 。 ++ ++ 在某些情况下,更适合使用块设备类型作为存储驱动程序,例如运行 `kata-containers`。配置`devicemapper`的过程如下: ++ ++ 首先创建ThinPool: ++ ++ ```bash ++ $ sudo pvcreate /dev/sdb1 # /dev/sdb1 for example ++ $ sudo vgcreate isulad /dev/sdb ++ $ sudo echo y | lvcreate --wipesignatures y -n thinpool isulad -L 200G ++ $ sudo echo y | lvcreate --wipesignatures y -n thinpoolmeta isulad -L 20G ++ $ sudo lvconvert -y --zero n -c 512K --thinpool isulad/thinpool --poolmetadata isulad/thinpoolmeta ++ $ sudo lvchange --metadataprofile isulad-thinpool isulad/thinpool ++ ``` ++ ++ 之后在`/etc/isulad/daemon.json`中增加 `devicemapper` 的配置 : ++ ++ ```json ++ "storage-driver": "devicemapper" ++ "storage-opts": [ ++ "dm.thinpooldev=/dev/mapper/isulad-thinpool", ++ "dm.fs=ext4", ++ "dm.min_free_space=10%" ++ ] ++ ``` ++ ++3. 重启`isulad` ++ ++ ```bash ++ $ sudo systemctl restart isulad ++ ``` ++ ++4. 定义 `kata-runtime.yaml`,例如创建一个`kata-runtime.yaml`内容如下: ++ ++ ```yaml ++ apiVersion: node.k8s.io/v1beta1 ++ kind: RuntimeClass ++ metadata: ++ name: kata-runtime ++ handler: kata-runtime ++ ``` ++ ++ 之后运行`kubectl apply -f kata-runtime.yaml`命令在kubectl中让这个配置生效。 ++ ++5. 定义 pod spec `kata-pod.yaml` ,例如创建一个`kata-pod.yaml`,内容如下: ++ ++ ```yaml ++ apiVersion: v1 ++ kind: Pod ++ metadata: ++ name: kata-pod-example ++ spec: ++ runtimeClassName: kata-runtime ++ containers: ++ - name: kata-pod ++ image: busybox:latest ++ command: ["/bin/sh"] ++ args: ["-c", "sleep 1000"] ++ ``` ++ ++6. 运行 pod ++ ++ ```bash ++ $ kubectl create -f kata-pod.yaml ++ $ kubectl get pod ++ NAME READY STATUS RESTARTS AGE ++ kata-pod-example 1/1 Running 4 2s ++ ``` ++ ++## CNI 网络配置 ++ ++`isulad`实现了CRI接口从而可以连接CNI网络、解析CNI的网络配置文件、加入或者退出CNI网络。在本节中,我们调用 CRI 接口启动 pod 来验证 CNI 网络配置。 ++ ++1. 在`/etc/isulad/daemon.json`中配置`isulad`: ++ ++ ```json ++ "network-plugin": "cni", ++ "cni-bin-dir": "/opt/cni/bin", ++ "cni-conf-dir": "/etc/cni/net.d", ++ ``` ++ ++2. 准备CNI网络的插件: ++ ++ 编译生成 CNI 插件的二进制文件,并将该二进制文件复制到 `/opt/cni/bin`。 ++ ++ ```bash ++ $ git clone https://github.com/containernetworking/plugins.git ++ $ cd plugins && ./build_linux.sh ++ $ cd ./bin && ls ++ bandwidth bridge dhcp firewall flannel ... ++ ``` ++ ++3. 
准备CNI网络的配置: ++ ++ 配置文件的后缀可以是`.conflist`或者`.conf`,区别在于是否包含多个插件。例如,我们在目录`/etc/cni/net.d/`下创建`10-mynet.conflist`文件,内容如下: ++ ++ ```json ++ { ++ "cniVersion": "0.3.1", ++ "name": "default", ++ "plugins": [ ++ { ++ "name": "default", ++ "type": "ptp", ++ "ipMasq": true, ++ "ipam": { ++ "type": "host-local", ++ "subnet": "10.1.0.0/16", ++ "routes": [ ++ { ++ "dst": "0.0.0.0/0" ++ } ++ ] ++ } ++ }, ++ { ++ "type": "portmap", ++ "capabilities": { ++ "portMappings": true ++ } ++ } ++ ] ++ } ++ ``` ++ ++4. 配置`sandbox-config.json`: ++ ++ ```json ++ { ++ "port_mappings":[{"protocol": 1, "container_port": 80, "host_port": 8080}], ++ "metadata": { ++ "name": "test", ++ "namespace": "default", ++ "attempt": 1, ++ "uid": "hdishd83djaidwnduwk28bcsb" ++ }, ++ "labels": { ++ "filter_label_key": "filter_label_val" ++ }, ++ "linux": { ++ } ++ } ++ ``` ++ ++5. 重启`isulad`并且启动pod: ++ ++ ```sh ++ $ sudo systemctl restart isulad ++ $ sudo crictl -i unix:///var/run/isulad.sock -r unix:///var/run/isulad.sock runp sandbox-config.json ++ ``` ++ ++6. 查看pod网络信息: ++ ++ ```sh ++ $ sudo crictl -i unix:///var/run/isulad.sock -r unix:///var/run/isulad.sock inspectp + ``` +\ No newline at end of file +diff --git a/test/image/oci/registry/data/v2/ping_head b/test/image/oci/registry/data/v2/ping_head +index 93742901..d9456e50 100644 +--- a/test/image/oci/registry/data/v2/ping_head ++++ b/test/image/oci/registry/data/v2/ping_head +@@ -4,6 +4,6 @@ Date: Thu, 02 Jul 2020 09:14:14 GMT + Content-Type: application/json; charset=utf-8 + Content-Length: 2 + Connection: keep-alive +-Docker-Distribution-Api-Version: registry/2.0 +- ++Docker-Distribution-Api-Version: registry/2.0 ++ + {"errors":[{"code":"UNAUTHORIZED","message":"Unauthorized access."}]} +-- +2.40.1 + diff --git a/0002-restore-ping-head.patch b/0002-restore-ping-head.patch new file mode 100644 index 0000000..47b856a --- /dev/null +++ b/0002-restore-ping-head.patch @@ -0,0 +1,26 @@ +From b20af14432e3befcae1c40de91a4dfb579ccd03b Mon Sep 17 00:00:00 2001 +From: zhongtao +Date: Sat, 13 May 2023 11:05:35 +0800 +Subject: [PATCH 2/9] restore ping head + +Signed-off-by: zhongtao +--- + test/image/oci/registry/data/v2/ping_head | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/test/image/oci/registry/data/v2/ping_head b/test/image/oci/registry/data/v2/ping_head +index d9456e50..93742901 100644 +--- a/test/image/oci/registry/data/v2/ping_head ++++ b/test/image/oci/registry/data/v2/ping_head +@@ -4,6 +4,6 @@ Date: Thu, 02 Jul 2020 09:14:14 GMT + Content-Type: application/json; charset=utf-8 + Content-Length: 2 + Connection: keep-alive +-Docker-Distribution-Api-Version: registry/2.0 +- ++Docker-Distribution-Api-Version: registry/2.0 ++ + {"errors":[{"code":"UNAUTHORIZED","message":"Unauthorized access."}]} +-- +2.40.1 + diff --git a/0003-fix-health_check.sh.patch b/0003-fix-health_check.sh.patch new file mode 100644 index 0000000..34eba29 --- /dev/null +++ b/0003-fix-health_check.sh.patch @@ -0,0 +1,42 @@ +From 5bfde56d1130572e9bf76dd0fc40a5f0a34923d2 Mon Sep 17 00:00:00 2001 +From: zhongtao +Date: Sun, 14 May 2023 14:02:48 +0800 +Subject: [PATCH 3/9] fix health_check.sh + +Signed-off-by: zhongtao +--- + CI/test_cases/container_cases/health_check.sh | 6 +++--- + 1 file changed, 3 insertions(+), 3 deletions(-) + +diff --git a/CI/test_cases/container_cases/health_check.sh b/CI/test_cases/container_cases/health_check.sh +index 621574cd..1542bd09 100755 +--- a/CI/test_cases/container_cases/health_check.sh ++++ b/CI/test_cases/container_cases/health_check.sh +@@ 
+Signed-off-by: zhongtao
+---
+ CI/test_cases/container_cases/health_check.sh | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/CI/test_cases/container_cases/health_check.sh b/CI/test_cases/container_cases/health_check.sh
+index 621574cd..1542bd09 100755
+--- a/CI/test_cases/container_cases/health_check.sh
++++ b/CI/test_cases/container_cases/health_check.sh
+@@ -103,7 +103,7 @@ function test_health_check_normally()
+     [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "healthy" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not healthy" && ((ret++))
+
+-    kill -9 $(isula inspect -f '{{.State.Pid}}' ${container_name}) && sleep 1 # Wait for the container to be killed
++    kill -9 $(isula inspect -f '{{.State.Pid}}' ${container_name}) && sleep 2 # Wait for the container to be killed
+
+     # The container process exits abnormally and the health check status becomes unhealthy
+     [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
+@@ -139,13 +139,13 @@ function test_health_check_timeout()
+     [[ $(isula inspect -f '{{.State.Status}}' ${container_name}) == "running" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container status: not running" && ((ret++))
+
+-    sleep 1 # Health check has been performed yet
++    sleep 2 # Health check has been performed yet
+
+     # Initial status when the container is still starting
+     [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
+
+-    sleep 7 # finish first health check
++    sleep 10 # finish first health check
+     # The container process exits and the health check status becomes unhealthy
+     [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
+--
+2.40.1
+
diff --git a/0004-ensure-isulad_io-not-NULL-before-close-fd.patch b/0004-ensure-isulad_io-not-NULL-before-close-fd.patch
new file mode 100644
index 0000000..767a89e
--- /dev/null
+++ b/0004-ensure-isulad_io-not-NULL-before-close-fd.patch
@@ -0,0 +1,62 @@
+From 2c651d3ed5e7d7d78338ce542e66ee9fb36a9275 Mon Sep 17 00:00:00 2001
+From: zhongtao
+Date: Sun, 14 May 2023 14:25:22 +0800
+Subject: [PATCH 4/9] ensure isulad_io not NULL before close fd
+
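+In the failure path the fds were closed after free(p->isulad_io), and
+through a pointer that may be NULL: a use-after-free plus a potential
+NULL dereference. Move the close() calls inside the NULL check, before
+the free(). Also skip the stdin copy when the destination fd is not
+open, instead of writing through it.
+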
+Signed-off-by: zhongtao
+---
+ src/cmd/isulad-shim/process.c | 28 ++++++++++++++++------------
+ 1 file changed, 16 insertions(+), 12 deletions(-)
+
+diff --git a/src/cmd/isulad-shim/process.c b/src/cmd/isulad-shim/process.c
+index 7716c288..6ad50c53 100644
+--- a/src/cmd/isulad-shim/process.c
++++ b/src/cmd/isulad-shim/process.c
+@@ -194,6 +194,10 @@ static int stdin_cb(int fd, uint32_t events, void *cbdata, struct epoll_descr *d
+     } else {
+         fd_to = &(p->shim_io->in);
+     }
++
++    if (fd_to == NULL || *fd_to == -1) {
++        return EPOLL_LOOP_HANDLE_CONTINUE;
++    }
+     w_count = write_nointr_in_total(*fd_to, p->buf, r_count);
+     if (w_count < 0) {
+         /* When any error occurs, set the write fd -1 */
+@@ -797,21 +801,21 @@ static int init_isulad_stdio(process_t *p)
+     return SHIM_OK;
+ failure:
+     if (p->isulad_io != NULL) {
++        if (p->isulad_io->in > 0) {
++            close(p->isulad_io->in);
++        }
++        if (p->isulad_io->out > 0) {
++            close(p->isulad_io->out);
++        }
++        if (p->isulad_io->err > 0) {
++            close(p->isulad_io->err);
++        }
++        if (p->isulad_io->resize > 0) {
++            close(p->isulad_io->resize);
++        }
+         free(p->isulad_io);
+         p->isulad_io = NULL;
+     }
+-    if (p->isulad_io->in > 0) {
+-        close(p->isulad_io->in);
+-    }
+-    if (p->isulad_io->out > 0) {
+-        close(p->isulad_io->out);
+-    }
+-    if (p->isulad_io->err > 0) {
+-        close(p->isulad_io->err);
+-    }
+-    if (p->isulad_io->resize > 0) {
+-        close(p->isulad_io->resize);
+-    }
+     return SHIM_ERR;
+ }
+
+--
+2.40.1
+
diff --git a/0005-recheck-delete-command-exit-status.patch b/0005-recheck-delete-command-exit-status.patch
new file mode 100644
index 0000000..864a7bd
--- /dev/null
+++ b/0005-recheck-delete-command-exit-status.patch
@@ -0,0 +1,75 @@
+From 57921deef3849f519b3fffdcf76184144ba54fb3 Mon Sep 17 00:00:00 2001
+From: zhongtao
+Date: Tue, 16 May 2023 15:16:13 +0800
+Subject: [PATCH 5/9] recheck delete command exit status
+
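+Older runtimes (runc <= v1.0.0-rc3) exit non-zero when asked to force
+delete a container that does not exist. Reuse the "does not exist"
+output check that "kill" already uses, so deleting an already-gone
+container is treated as success instead of being retried. Roughly the
+behaviour being tolerated (illustrative transcript, not real output):
+
+    $ runc delete --force no-such-container
+    container "no-such-container" does not exist
+    $ echo $?
+    1
+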
+Signed-off-by: zhongtao
+---
+ .../modules/runtime/isula/isula_rt_ops.c | 24 ++++++++++++-------
+ 1 file changed, 15 insertions(+), 9 deletions(-)
+
+diff --git a/src/daemon/modules/runtime/isula/isula_rt_ops.c b/src/daemon/modules/runtime/isula/isula_rt_ops.c
+index 9008c5c7..07f714f0 100644
+--- a/src/daemon/modules/runtime/isula/isula_rt_ops.c
++++ b/src/daemon/modules/runtime/isula/isula_rt_ops.c
+@@ -635,9 +635,9 @@ static int runtime_call_simple(const char *workdir, const char *runtime, const c
+ }
+
+ // oci runtime return -1 if the container 'does not exist'
+-// if output contains 'does not exist', means nothing to kill, return 0
+-// this will change the exit status of kill command
+-static int kill_output_check(const char *output)
++// if output contains 'does not exist', means nothing to kill or delete, return 0
++// this will change the exit status of kill or delete command
++static int non_existent_output_check(const char *output)
+ {
+     char *pattern = "does not exist";
+
+@@ -645,24 +645,24 @@ static int kill_output_check(const char *output)
+         return -1;
+     }
+
+-    // container not exist, kill success, return 0
++    // container not exist, kill or delete success, return 0
+     if (util_strings_contains_word(output, pattern)) {
+         return 0;
+     }
+
+-    // kill failed, return -1
++    // kill or delete failed, return -1
+     return -1;
+ }
+
+-// kill success or kill_output_check succeed return 0, DO_RETRY_CALL will break;
++// kill success or non_existent_output_check succeed return 0, DO_RETRY_CALL will break;
+ // if kill failed, recheck on shim alive, if not alive, kill succeed, still return 0;
+ // else, return -1, DO_RETRY_CALL will call this again;
+ static int runtime_call_kill_and_check(const char *workdir, const char *runtime, const char *id)
+ {
+     int ret = -1;
+
+-    // kill succeed, return 0; kill_output_check succeed, return 0;
+-    ret = runtime_call_simple(workdir, runtime, "kill", NULL, 0, id, kill_output_check);
++    // kill succeed, return 0; non_existent_output_check succeed, return 0;
++    ret = runtime_call_simple(workdir, runtime, "kill", NULL, 0, id, non_existent_output_check);
+     if (ret == 0) {
+         return 0;
+     }
+
+@@ -677,7 +677,13 @@ static int runtime_call_kill_and_check(const char *workdir, const char *runtime,
+ static int runtime_call_delete_force(const char *workdir, const char *runtime, const char *id)
+ {
+     const char *opts[1] = { "--force" };
+-    return runtime_call_simple(workdir, runtime, "delete", opts, 1, id, NULL);
++    // delete succeed, return 0;
++    // When the runc version is less than or equal to v1.0.0-rc3,
++    // if the container does not exist when force deleting it,
++    // runc will report an error and isulad does not need to retry the deletion again.
++    // related PR ID:d1a743674a98e23d348b29f52c43436356f56b79
++    // non_existent_output_check succeed, return 0;
++    return runtime_call_simple(workdir, runtime, "delete", opts, 1, id, non_existent_output_check);
+ }
+
+ #define ExitSignalOffset 128
+--
+2.40.1
+
diff --git a/0006-restore-execSync-return-value.patch b/0006-restore-execSync-return-value.patch
new file mode 100644
index 0000000..1518586
--- /dev/null
+++ b/0006-restore-execSync-return-value.patch
@@ -0,0 +1,31 @@
+From f591197fc150e9a137d869b518cda4cecbf70363 Mon Sep 17 00:00:00 2001
+From: zhongtao
+Date: Thu, 18 May 2023 19:07:13 +0800
+Subject: [PATCH 6/9] restore execSync return value
+
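+A non-zero exit code from the exec'ed command is not a transport error:
+CRI clients expect ExecSync to return OK at the gRPC level and read the
+command status from exit_code in ExecSyncResponse. Returning UNKNOWN
+here broke callers (for example exec-based probes) that rely on the
+exit code.
+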
+Signed-off-by: zhongtao
+---
+ src/daemon/entry/connect/grpc/runtime_runtime_service.cc | 7 -------
+ 1 file changed, 7 deletions(-)
+
+diff --git a/src/daemon/entry/connect/grpc/runtime_runtime_service.cc b/src/daemon/entry/connect/grpc/runtime_runtime_service.cc
+index 63a780cb..354b220e 100644
+--- a/src/daemon/entry/connect/grpc/runtime_runtime_service.cc
++++ b/src/daemon/entry/connect/grpc/runtime_runtime_service.cc
+@@ -279,13 +279,6 @@ grpc::Status RuntimeRuntimeServiceImpl::ExecSync(grpc::ServerContext *context,
+         return grpc::Status(grpc::StatusCode::UNKNOWN, error.GetMessage());
+     }
+
+-    if (reply->exit_code() != 0) {
+-        ERROR("Object: CRI, Type: Sync exec in container: %s with exit code: %d", request->container_id().c_str(),
+-              reply->exit_code());
+-        error.SetError(reply->stderr());
+-        return grpc::Status(grpc::StatusCode::UNKNOWN, error.GetMessage());
+-    }
+-
+     WARN("Event: {Object: CRI, Type: sync execed Container: %s}", request->container_id().c_str());
+
+     return grpc::Status::OK;
+--
+2.40.1
+
diff --git a/0007-reinforce-cri_stream.sh-and-health_check.sh.patch b/0007-reinforce-cri_stream.sh-and-health_check.sh.patch
new file mode 100644
index 0000000..44e432d
--- /dev/null
+++ b/0007-reinforce-cri_stream.sh-and-health_check.sh.patch
@@ -0,0 +1,131 @@
+From e04d2f1d8382e7b3e81ab4725a21562147ad1727 Mon Sep 17 00:00:00 2001
+From: zhongtao
+Date: Mon, 22 May 2023 12:50:32 +0800
+Subject: [PATCH 7/9] reinforce cri_stream.sh and health_check.sh
+
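+Fixed sleeps still raced against the daemon on slow machines, so poll
+for the expected state instead of sleeping a fixed time. The loops
+below all follow the same shape; a minimal sketch of the pattern (the
+helper name is illustrative, not part of the tests):
+
+    # retry a check until it succeeds or the retry budget runs out
+    wait_for() {
+        local limit=$1 interval=$2; shift 2
+        for i in $(seq 1 "$limit"); do
+            "$@" && return 0
+            sleep "$interval"
+        done
+        return 1
+    }
+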
+Signed-off-by: zhongtao
+---
+ CI/test_cases/container_cases/cri_stream.sh   | 30 ++++++++++++----
+ CI/test_cases/container_cases/health_check.sh | 34 ++++++++++++++-----
+ 2 files changed, 50 insertions(+), 14 deletions(-)
+
+diff --git a/CI/test_cases/container_cases/cri_stream.sh b/CI/test_cases/container_cases/cri_stream.sh
+index 8b5440d3..bfe90208 100755
+--- a/CI/test_cases/container_cases/cri_stream.sh
++++ b/CI/test_cases/container_cases/cri_stream.sh
+@@ -58,6 +58,9 @@ function set_up()
+ function test_cri_exec_fun()
+ {
+     local ret=0
++    local retry_limit=20
++    local retry_interval=1
++    local success=1
+     local test="test_cri_exec_fun => (${FUNCNAME[@]})"
+     msg_info "${test} starting..."
+     declare -a fun_pids
+@@ -74,9 +77,15 @@ function test_cri_exec_fun()
+     done
+     wait ${abn_pids[*]// /|}
+
+-    sleep 2
+-    ps -T -p $(cat /var/run/isulad.pid) | grep IoCopy
+-    [[ $? -eq 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - residual IO copy thread in CRI exec operation" && ((ret++))
++    for i in $(seq 1 "$retry_limit"); do
++        ps -T -p $(cat /var/run/isulad.pid) | grep IoCopy
++        if [ $? -ne 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - residual IO copy thread in CRI exec operation" && ((ret++))
+
+     msg_info "${test} finished with return ${ret}..."
+     return ${ret}
+@@ -85,6 +94,9 @@ function test_cri_exec_fun()
+ function test_cri_exec_abn
+ {
+     local ret=0
++    local retry_limit=20
++    local retry_interval=1
++    local success=1
+     local test="test_cri_exec_abn => (${FUNCNAME[@]})"
+     msg_info "${test} starting..."
+
+@@ -92,10 +104,16 @@ function test_cri_exec_abn
+     isula exec -ti ${container_id} sh &
+     pid=$!
+     sleep 3
+     kill -9 $pid
+-    sleep 2
+
+-    ps -T -p $(cat /var/run/isulad.pid) | grep IoCopy
+-    [[ $? -eq 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - residual IO copy thread in CRI exec operation" && ((ret++))
++    for i in $(seq 1 "$retry_limit"); do
++        ps -T -p $(cat /var/run/isulad.pid) | grep IoCopy
++        if [ $? -ne 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - residual IO copy thread in CRI exec operation" && ((ret++))
+
+     msg_info "${test} finished with return ${ret}..."
+     return ${ret}
+diff --git a/CI/test_cases/container_cases/health_check.sh b/CI/test_cases/container_cases/health_check.sh
+index 1542bd09..28af6149 100755
+--- a/CI/test_cases/container_cases/health_check.sh
++++ b/CI/test_cases/container_cases/health_check.sh
+@@ -123,6 +123,9 @@ function test_health_check_timeout()
+ {
+     local ret=0
+     local image="busybox"
++    local retry_limit=10
++    local retry_interval=1
++    local success=1
+     local test="list && inspect image info test => (${FUNCNAME[@]})"
+
+     msg_info "${test} starting..."
+@@ -139,16 +142,31 @@ function test_health_check_timeout()
+     [[ $(isula inspect -f '{{.State.Status}}' ${container_name}) == "running" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container status: not running" && ((ret++))
+
+-    sleep 2 # Health check has been performed yet
+-
++    # Health check has been performed yet
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
+     # Initial status when the container is still starting
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
+-
+-    sleep 10 # finish first health check
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
++
++    sleep 7 # finish first health check
++
++    success=1
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
+     # The container process exits and the health check status becomes unhealthy
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
+
+     [[ $(isula inspect -f '{{.State.ExitCode}}' ${container_name}) == "137" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container exit code: not 137" && ((ret++))
+--
+2.40.1
+
diff --git a/0008-reinforce-omit-health_check.sh.patch b/0008-reinforce-omit-health_check.sh.patch
new file mode 100644
index 0000000..9d81e61
--- /dev/null
+++ b/0008-reinforce-omit-health_check.sh.patch
@@ -0,0 +1,166 @@
+From ab3d902b09ace8d69172a4ea6cf9771a21540ffb Mon Sep 17 00:00:00 2001
+From: zhongtao
+Date: Mon, 22 May 2023 19:37:23 +0800
+Subject: [PATCH 8/9] reinforce omit health_check.sh
+
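+Follow-up to the previous patch: convert the remaining fixed sleeps in
+health_check.sh (the starting, healthy, unhealthy and exit-code checks)
+to the same retry loop, so each assertion waits for the state to appear
+instead of assuming how long the daemon will take.
+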
+Signed-off-by: zhongtao
+---
+ CI/test_cases/container_cases/health_check.sh | 100 +++++++++++++++---
+ 1 file changed, 83 insertions(+), 17 deletions(-)
+
+diff --git a/CI/test_cases/container_cases/health_check.sh b/CI/test_cases/container_cases/health_check.sh
+index 28af6149..0bbad16e 100755
+--- a/CI/test_cases/container_cases/health_check.sh
++++ b/CI/test_cases/container_cases/health_check.sh
+@@ -29,6 +29,9 @@ isula pull ${image}
+ function test_health_check_paraments()
+ {
+     local ret=0
++    local retry_limit=10
++    local retry_interval=1
++    local success=1
+     local test="list && inspect image info test => (${FUNCNAME[@]})"
+
+     msg_info "${test} starting..."
+@@ -45,16 +48,33 @@ function test_health_check_paraments()
+     [[ $(isula inspect -f '{{.State.Status}}' ${container_name}) == "running" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container status: not running" && ((ret++))
+
+-    sleep 13 # finish first health check
++    # finish first health check
++    sleep 10
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
+
+     # keep starting status with health check return non-zero at always until status change to unhealthy
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
+
+     sleep 6 # finish second health check
+
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
++    success=1
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
++
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
+
+     # validate --health-retries option
+     [[ $(isula inspect -f '{{.State.Health.FailingStreak}}' ${container_name}) == "2" ]]
+@@ -77,6 +97,9 @@ function test_health_check_normally()
+ {
+     local ret=0
+     local image="busybox"
++    local retry_limit=10
++    local retry_interval=1
++    local success=1
+     local test="list && inspect image info test => (${FUNCNAME[@]})"
+
+     msg_info "${test} starting..."
+@@ -92,25 +115,60 @@ function test_health_check_normally()
+     [[ $(isula inspect -f '{{.State.Status}}' ${container_name}) == "running" ]]
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container status: not running" && ((ret++))
+
+-    sleep 2 # Health check has been performed yet
++    # Health check has been performed yet
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
+
+     # Initial status when the container is still starting
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "starting" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not starting" && ((ret++))
+
+     sleep 8 # finish first health check
++
++    success=1
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "healthy" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
+     # When the health check returns successfully, status immediately becomes healthy
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "healthy" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not healthy" && ((ret++))
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not healthy" && ((ret++))
+
+-    kill -9 $(isula inspect -f '{{.State.Pid}}' ${container_name}) && sleep 2 # Wait for the container to be killed
++    kill -9 $(isula inspect -f '{{.State.Pid}}' ${container_name})
++
++    # Wait for the container to be killed
++    success=1
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
+
+     # The container process exits abnormally and the health check status becomes unhealthy
+-    [[ $(isula inspect -f '{{.State.Health.Status}}' ${container_name}) == "unhealthy" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
+
+-    [[ $(isula inspect -f '{{.State.ExitCode}}' ${container_name}) == "137" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container exit code: not 137" && ((ret++))
++    success=1
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.ExitCode}}' ${container_name}) == "137" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
++
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container exit code: not 137" && ((ret++))
+
+     isula rm -f ${container_name}
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - failed to remove container: ${container_name}" && ((ret++))
+@@ -168,8 +226,16 @@ function test_health_check_timeout()
+     # The container process exits and the health check status becomes unhealthy
+     [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container health check status: not unhealthy" && ((ret++))
+
+-    [[ $(isula inspect -f '{{.State.ExitCode}}' ${container_name}) == "137" ]]
+-    [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container exit code: not 137" && ((ret++))
++    success=1
++    for i in $(seq 1 "$retry_limit"); do
++        [[ $(isula inspect -f '{{.State.ExitCode}}' ${container_name}) == "137" ]]
++        if [ $? -eq 0 ]; then
++            success=0
++            break;
++        fi
++        sleep $retry_interval
++    done
++    [[ $success -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - incorrent container exit code: not 137" && ((ret++))
+
+     isula rm -f ${container_name}
+     [[ $? -ne 0 ]] && msg_err "${FUNCNAME[0]}:${LINENO} - failed to remove container: ${container_name}" && ((ret++))
+--
+2.40.1
+
diff --git a/0009-fix-memory-leak-and-array-access-out-of-range.patch b/0009-fix-memory-leak-and-array-access-out-of-range.patch
new file mode 100644
index 0000000..f0cee17
--- /dev/null
+++ b/0009-fix-memory-leak-and-array-access-out-of-range.patch
@@ -0,0 +1,75 @@
+From ab1f394910103615d015077d538cb71c363397fc Mon Sep 17 00:00:00 2001
+From: "Neil.wrz"
+Date: Tue, 23 May 2023 19:01:40 -0700
+Subject: [PATCH 9/9] fix memory leak and array access out of range
+
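+parser_cb_header_value() can be invoked before any header field has
+been parsed (for example on a malformed response whose first header
+line starts with whitespace); num_headers is then still 0 and
+m->headers[m->num_headers - 1] indexes in front of the array. Guard
+both header callbacks. The goto changes additionally release resources
+on the error paths of remote_start_refresh_thread() and
+do_build_ro_dir() instead of leaking them on early return.
+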
+Signed-off-by: Neil.wrz
+---
+ .../oci/storage/remote_layer_support/remote_support.c | 4 ++--
+ .../storage/remote_layer_support/ro_symlink_maintain.c | 2 +-
+ src/utils/http/parser.c | 10 ++++++++++
+ 3 files changed, 13 insertions(+), 3 deletions(-)
+
+diff --git a/src/daemon/modules/image/oci/storage/remote_layer_support/remote_support.c b/src/daemon/modules/image/oci/storage/remote_layer_support/remote_support.c
+index 748298cb..400678c4 100644
+--- a/src/daemon/modules/image/oci/storage/remote_layer_support/remote_support.c
++++ b/src/daemon/modules/image/oci/storage/remote_layer_support/remote_support.c
+@@ -105,12 +105,12 @@ int remote_start_refresh_thread(pthread_rwlock_t *remote_lock)
+     res = pthread_create(&a_thread, NULL, remote_refresh_ro_symbol_link, (void *)&supporters);
+     if (res != 0) {
+         CRIT("Thread creation failed");
+-        return -1;
++        goto free_out;
+     }
+
+     if (pthread_detach(a_thread) != 0) {
+         SYSERROR("Failed to detach 0x%lx", a_thread);
+-        return -1;
++        goto free_out;
+     }
+
+     return 0;
+diff --git a/src/daemon/modules/image/oci/storage/remote_layer_support/ro_symlink_maintain.c b/src/daemon/modules/image/oci/storage/remote_layer_support/ro_symlink_maintain.c
+index 0e2b671b..2bcc43e6 100644
+--- a/src/daemon/modules/image/oci/storage/remote_layer_support/ro_symlink_maintain.c
++++ b/src/daemon/modules/image/oci/storage/remote_layer_support/ro_symlink_maintain.c
+@@ -136,7 +136,7 @@ static int do_build_ro_dir(const char *home, const char *id)
+     nret = asprintf(&ro_layer_dir, "%s/%s/%s", home, REMOTE_RO_LAYER_DIR, id);
+     if (nret < 0 || nret > PATH_MAX) {
+         SYSERROR("Failed to create ro layer dir path");
+-        return -1;
++        goto out;
+     }
+
+     if (util_mkdir_p(ro_layer_dir, IMAGE_STORE_PATH_MODE) != 0) {
+diff --git a/src/utils/http/parser.c b/src/utils/http/parser.c
+index 12df2435..a79893ba 100644
+--- a/src/utils/http/parser.c
++++ b/src/utils/http/parser.c
+@@ -88,6 +88,11 @@ static int parser_cb_header_field(http_parser *parser, const char *buf,
+         m->num_headers++;
+     }
+
++    if (m->num_headers == 0) {
++        ERROR("Failed to parse header field because headers num is 0");
++        return -1;
++    }
++
+     strlncat(m->headers[m->num_headers - 1][0], sizeof(m->headers[m->num_headers - 1][0]), buf, len);
+
+     m->last_header_element = FIELD;
+@@ -100,6 +105,11 @@ static int parser_cb_header_value(http_parser *parser, const char *buf,
+                                   size_t len)
+ {
+     struct parsed_http_message *m = parser->data;
++
++    if (m->num_headers == 0) {
++        ERROR("Failed to parse header value because headers num is 0");
++        return -1;
++    }
+
+     strlncat(m->headers[m->num_headers - 1][1], sizeof(m->headers[m->num_headers - 1][1]), buf, len);
+     m->last_header_element = VALUE;
+--
+2.40.1
+
diff --git a/iSulad.spec b/iSulad.spec
index fb70af6..3868b11 100644
--- a/iSulad.spec
+++ b/iSulad.spec
@@ -1,5 +1,5 @@
 %global _version 2.1.2
-%global _release 2
+%global _release 3
 %global is_systemd 1
 %global enable_shimv2 1
 %global is_embedded 1
@@ -13,6 +13,16 @@
 URL: https://gitee.com/openeuler/iSulad
 Source: https://gitee.com/openeuler/iSulad/repository/archive/v%{version}.tar.gz
 BuildRoot: {_tmppath}/iSulad-%{version}
+Patch0001: 0001-convert-files-from-CRLF-to-LF.patch
+Patch0002: 0002-restore-ping-head.patch
+Patch0003: 0003-fix-health_check.sh.patch
+Patch0004: 0004-ensure-isulad_io-not-NULL-before-close-fd.patch
+Patch0005: 0005-recheck-delete-command-exit-status.patch
+Patch0006: 0006-restore-execSync-return-value.patch
+Patch0007: 0007-reinforce-cri_stream.sh-and-health_check.sh.patch
+Patch0008: 0008-reinforce-omit-health_check.sh.patch
+Patch0009: 0009-fix-memory-leak-and-array-access-out-of-range.patch
+
 %ifarch x86_64 aarch64
 Provides: libhttpclient.so()(64bit)
 Provides: libisula_client.so()(64bit)
@@ -254,6 +264,12 @@ fi
 %endif
 
 %changelog
+* Thu May 25 2023 zhongtao - 2.1.2-3
+- Type: bugfix
+- ID: NA
+- SUG: NA
+- DESC: upgrade from upstream
+
 * Fri May 12 2023 zhangxiaoyu - 2.1.2-2
 - Type: bugfix
 - ID: NA