5 changed files with 26 additions and 141 deletions
--- a/24
+++ b/24
@@ -1,24 +0,0 @@
-#Usage:
-#1.build image:
-#   docker build -f Dockerfile-llama -t llama_image .
-#2.run  image:   
-#   docker run -it --security-opt seccomp=unconfined llama_image:latest
-
-#base image
-FROM openeuler/openeuler:22.03
-
-#update openEuler2309 source  and install chatglm
-RUN echo '[everything]' > /etc/yum.repos.d/openEuler.repo  && \
-echo 'name=everything' >> /etc/yum.repos.d/openEuler.repo && \
-echo 'baseurl=http://121.36.84.172/dailybuild/EBS-openEuler-23.09/rc4_openeuler-2023-09-13-21-46-47/everything/$basearch/' >> /etc/yum.repos.d/openEuler.repo && \
-echo 'enabled=1' >> /etc/yum.repos.d/openEuler.repo && \
-echo 'gpgcheck=0' >> /etc/yum.repos.d/openEuler.repo && \  
-yum install -y llama.cpp wget
-
-#download ggml model 
-WORKDIR /model_path
-RUN  wget -P /model_path https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin
-
-# run ggml model
-CMD /usr/bin/llama_cpp_main -m /model_path/llama-2-13b-chat.ggmlv3.q4_0.bin --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1.1 -t 8
-
--- a/README.md
+++ b/README.md
@@ -1,90 +1,37 @@
-# llama.cpp User Guide
+# llama.cpp

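-## Introduction
-llama.cpp is a C/C++ port of the English large language model LLaMa that lets users deploy and run open-source large models on CPU-only machines.
+#### Introduction
+Port of English large model LLaMA implemented based on C/C++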

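-llama.cpp supports deploying several open-source English large models, such as LLaMa, LLaMa2, and Vicuna.
-
-## Software Architecture
-The core architecture of llama.cpp has two layers:
-- Model quantization layer: quantizes open-source models to reduce model size;
-- Model launch layer: launches the quantized model.
-
-Features:
-- C/C++ implementation based on ggml;
-- Accelerates CPU inference through int4/int8 quantization, an optimized KV cache, parallel computing, and other techniques;
-- The interactive interface streams generated output, producing a typewriter effect;
-- No GPU required; it can run on CPU alone.
-
-## Installation
-### Software and Hardware Requirements
-Processor architecture: AArch64 and X86_64 are supported;
-
-Operating system: openEuler 23.09;
-
-Memory: at least 4G, depending on the size of the open-source model.
-
-### Installing Components
-To deploy a large model with llama.cpp, install the llama.cpp package. Before installing, make sure the openEuler yum repository is configured.
-1.  Install: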
-```
-yum install llama.cpp
-```
-2.  Check that the installation succeeded:
-```
-llama_cpp_main -h
-```
-If the help information is displayed, the installation succeeded.
+#### Software Architecture
+Software architecture description


-## Usage
-### Without a Container
-1.  Install the llama.cpp package:
-```
-yum install llama.cpp
-```
-2.  Download an open-source large model, such as LLaMa or LLaMa2, and quantize the downloaded model with llama_convert.py:
-```
-python3 /usr/bin/llama_convert.py  model_path/
-```
-Here model_path is the path where the open-source large model is stored.
+#### Installation

-3.  Launch the model and start a conversation:
-```
-llama_cpp_main -m model_path --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1.1 -t 8
-```
-Here model_path is the path where the quantized model is stored.
+1.  xxxx
+2.  xxxx
+3.  xxxx

-Use the following command to view the command-line options:
-```
-llama_cpp_main -h
-```
-### Using a Container
-1.  Pull the container image:
-```
-docker pull hub.oepkgs.net/openeuler/llama_image
-```
-2. Run the container image and start a conversation:
-```
-docker run -it --security-opt seccomp=unconfined hub.oepkgs.net/openeuler/llama_image
-```
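-### Normal Startup Interface
-Figure 1 shows the interface after the model starts:
+#### Usage

-**Figure 1** Model startup interface
-![Image description](llama.png)
-## Specifications
-This project supports deploying large models and running inference on CPU-class machines, but inference speed still depends on the hardware; an underpowered configuration may make inference too slow to be practical.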
+1.  xxxx
+2.  xxxx
+3.  xxxx

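-Table 1 can serve as a reference for inference speed under different machine configurations:
+#### Contributing

-In the table, Q4_0, Q4_1, Q5_0, and Q5_1 denote the quantization precision of the model; ms/token denotes inference speed, that is, the milliseconds spent per token, and a smaller value means faster inference;
+1.  Fork this repository
+2.  Create a Feat_xxx branch
+3.  Commit your code
+4.  Create a Pull Request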

-**Table 1** Test data for model inference speed

-| LLaMa-7B            | Q4_0 | Q4_1 | Q5_0 | Q5_1 |
-|--------------------------------|------|------|------|------|
-| ms/token (CPU @ Platinum 8260) | 55   | 54   | 76   | 83   | 
-| Model size                     | 3.5G | 3.9G | 4.3G | 6.7G | 
-| Memory usage                   | 3.9G | 4.2G | 4.5G | 5.0G |
+#### Tips

+1.  Use Readme\_XXX.md to support different languages, for example Readme\_en.md, Readme\_zh.md
+2.  Gitee official blog [blog.gitee.com](https://blog.gitee.com)
+3.  Visit [https://gitee.com/explore](https://gitee.com/explore) to discover outstanding open-source projects on Gitee
+4.  [GVP](https://gitee.com/gvp) stands for Gitee Most Valuable Project, a comprehensive selection of outstanding open-source projects
+5.  The official Gitee user manual [https://gitee.com/help](https://gitee.com/help)
+6.  Gitee Cover People is a column showcasing Gitee members [https://gitee.com/gitee-stars/](https://gitee.com/gitee-stars/)
--- a/add-loongarch64-support.patch
+++ b/add-loongarch64-support.patch
@@ -1,26 +0,0 @@
-diff --git a/ggml.c b/ggml.c
-index beb7f46..2374287 100644
---- a/ggml.c
-+++ b/ggml.c
-@@ -299,7 +299,7 @@ typedef double ggml_float;
- #if defined(_MSC_VER) || defined(__MINGW32__)
- #include <intrin.h>
- #else
--#if !defined(__riscv)
-+#if !defined(__riscv) && !defined(__loongarch64)
- #include <immintrin.h>
- #endif
- #endif
-diff --git a/k_quants.c b/k_quants.c
-index 6348fce..6816121 100644
---- a/k_quants.c
-+++ b/k_quants.c
-@@ -26,7 +26,7 @@
- #if defined(_MSC_VER) || defined(__MINGW32__)
- #include <intrin.h>
- #else
--#if !defined(__riscv)
-+#if !defined(__riscv) && !defined(__loongarch64)
- #include <immintrin.h>
- #endif
- #endif
--- a/llama.cpp.spec
+++ b/llama.cpp.spec
@@ -3,13 +3,12 @@

 Name:       llama.cpp
 Version:    20230815
-Release:    4
+Release:    1
 License:    MIT
 Summary:    Port of English lagre model LLaMA implemented based on C/C++

 URL:            https://github.com/ggerganov/llama.cpp
 Source0:        https://github.com/ggerganov/llama.cpp/archive/refs/tags/%{llama_commitid}.tar.gz
-Patch0:     add-loongarch64-support.patch

 BuildRequires:  gcc,gcc-c++,cmake

@@ -31,7 +30,6 @@ popd
 pushd llama_builddir
 %make_install
 mv %{buildroot}%{_prefix}/local/bin/main  %{buildroot}%{_prefix}/local/bin/llama_cpp_main
-mv %{buildroot}%{_prefix}/local/bin/convert.py %{buildroot}%{_prefix}/local/bin/llama_convert.py
 mv %{buildroot}%{_prefix}/local/*  %{buildroot}%{_prefix}
 popd

@@ -40,16 +38,6 @@ popd
 %{_libdir}/libembdinput.a

 %changelog
-* Tue May 14 2024 wangshuo <wangshuo@kylinos.cn> - 20230815-4
-- add loongarch64 support
-
-* Wed Sep 20 2023 zhoupengcheng <zhoupengcheng11@huawei.com> - 20230815-3
-- rename /usr/bin/convert.py
-- update long-term yum.repo in dockerfile
-
-* Tue Sep 19 2023 zhoupengcheng <zhoupengcheng11@huawei.com> - 20230815-2
-- add dockerfile
-
 * Wed Aug 16 2023 zhoupengcheng <zhoupengcheng11@huawei.com> - 20230815-1
 - Init package

--- a/llama.png
+++ b/llama.png