ymcui

Open Source

Chinese-LLaMA-Alpaca

# The [Chinese-LLaMA-Alpaca-3](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3) project has launched!

[**🇨🇳中文**](./README.md) | [**🌐English**](./README_EN.md) | [**📖Docs**](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki) | [**❓Issues**](https://github.com/ymcui/Chinese-LLaMA-Alpaca/issues) | [**💬Discussions**](https://github.com/ymcui/Chinese-LLaMA-Alpaca/discussions) | [**⚔️Arena**](http://llm-arena.ymcui.com/)

<img src="./pics/banner.png" width="700"/>

<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca.svg?color=blue&style=flat-square"> <img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca"> <img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca"> <img alt="GitHub last commit" src="https://img.shields.io/github/last-commit/ymcui/Chinese-LLaMA-Alpaca"> <a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/1710faac5e634acaabfc26b0a778cdde"/></a>

This project open-sources **Chinese LLaMA models and instruction-tuned Chinese Alpaca models** to further promote open research on large models in the Chinese NLP community. These models **extend the original LLaMA vocabulary with Chinese tokens** and undergo secondary pre-training on Chinese data, which further improves their basic Chinese semantic understanding. The Chinese Alpaca models are additionally fine-tuned on Chinese instruction data, which significantly improves their ability to understand and follow instructions.

**Technical report (V2)**: [[Cui, Yang, and Yao] Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca](https://arxiv.org/abs/2304.08177)

**Main contents of this project:**

- 🚀 Extends the original LLaMA vocabulary with Chinese tokens, improving Chinese encoding and decoding efficiency
- 🚀 Open-sources Chinese LLaMA pre-trained on Chinese text and instruction-tuned Chinese Alpaca
- 🚀 Open-sources the pre-training and instruction fine-tuning scripts, so users can train the models further as needed
- 🚀 Lets you quickly quantize and deploy the models locally on the CPU/GPU of a laptop (personal PC)
- 🚀 Supports ecosystems such as [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LlamaChat](https://github.com/alexrozanski/LlamaChat), [LangChain](https://github.com/hwchase17/langchain), and [privateGPT](https://github.com/imartinez/privateGPT)
- Currently released model versions: 7B (basic, **Plus**, **Pro**), 13B (basic, **Plus**, **Pro**), 33B (basic, **Plus**, **Pro**)

💡 The animation below shows the actual speed and output quality of Chinese Alpaca-Plus-7B after quantization and local deployment on a CPU.

![](./pics/screencast.gif)

----

[**Chinese LLaMA-2 & Alpaca-2 LLMs**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | [Multimodal Chinese LLaMA & Alpaca LLMs](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [Multimodal VLE](https://github.com/iflytek/VLE) | [Chinese MiniRBT](https://github.com/iflytek/MiniRBT) | [Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner)

## News

**[2024/04/30] Chinese-LLaMA-Alpaca-3 has been officially released, open-sourcing Llama-3-Chinese-8B and Llama-3-Chinese-8B-Instruct, both based on Llama-3. All users of the first- and second-generation projects are encouraged to upgrade to the third-generation models. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca-3**

[2024/03/27] This project is now listed on the Synced (Jiqizhixin) SOTA! model platform: https://sota.jiqizhixin.com/project/chinese-llama-alpaca

[2023/08/14] Chinese-LLaMA-Alpaca-2 v2.0 has been officially released, open-sourcing Chinese-LLaMA-2-13B and Chinese-Alpaca-2-13B. All first-generation users are encouraged to upgrade to the second-generation models. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2

[2023/07/31] Chinese-LLaMA-Alpaca-2 v1.0 has been officially released. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2

[2023/07/19] [v5.0](https://github.com/ymcui/Chinese-LLaMA-Alpaca/releases/tag/v5.0): Released the Alpaca-Pro series, which significantly improves response length and quality; also released the Plus-33B series.

[2023/07/19] 🚀 Launched the [Chinese LLaMA-2 & Alpaca-2 open-source LLM project](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2). Follow it for the latest updates.

[2023/07/10] Beta preview of upcoming updates: see the [discussion](https://github.com/ymcui/Chinese-LLaMA-Alpaca/discussions/732).

[2023/07/07] A new member joins the Chinese-LLaMA-Alpaca family: the [multimodal Chinese LLaMA & Alpaca LLMs](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) for visual question answering and dialogue, released as a 7B test version.

[2023/06/30] Added 8K context support under llama.cpp (no model modification required); see the [discussion](https://github.com/ymcui/Chinese-LLaMA-Alpaca/discussions/696) for the method and related discussion. For code supporting 4K+ context under transformers, see [PR#705](https://github.com/ymcui/Chinese-LLaMA-Alpaca/pull/705).

[2023/06/16] [v4.1](https://github.com/ymcui/Chinese-LLaMA-Alpaca/releases/tag/v4.1): Released the new technical report, added a C-Eval decoding script, added a low-resource model merging script, and more.

[2023/06/08] [v4.0](https://github.com/ymcui/Chinese-LLaMA-Alpaca/releases/tag/v4.0): Released Chinese LLaMA/Alpaca-33B, added a privateGPT usage example, added C-Eval results, and more.

## Contents

| Section | Description |
| ------------------------------------- | ------------------------------------------------------------ |
| [⏬Model Download](#model-download) | Download links for the Chinese LLaMA and Alpaca models |
| [🈴Merging Models](#merging-models) | (Important) How to merge the downloaded LoRA models with the original LLaMA |
| [💻Local Inference and Quick Deployment](#local-inference-and-quick-deployment) | How to quantize the models and deploy them on a personal computer |
| [💯System Performance](#system-performance) | Results in selected scenarios and tasks |
| [📝Training Details](#training-details) | Training details of the Chinese LLaMA and Alpaca models |
| [❓FAQ](#faq) | Answers to common questions |
| [⚠️Limitations](#limitations) | Limitations of the models in this project |

## Model Download

### Notice to Users (Must Read)

The [LLaMA models officially released by Facebook prohibit commercial use](https://github.com/facebookresearch/llama), and the official weights have not been formally open-sourced (although many third-party download links exist online). To comply with the license, **this project releases LoRA weights**, which can be thought of as a "patch" on top of the original LLaMA model; merging the two yields the full weights. The Chinese LLaMA/Alpaca LoRA models below cannot be used on their own and must be combined with the [original LLaMA model](https://github.com/facebookresearch/llama). Follow this project's [model merging](#merging-models) steps to reconstruct the full model.

### Model List

The figure below shows the relationships among all models released by this project and the [second-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2).

![](./pics/models.png)

### Model Selection Guide

Below is a basic comparison of the Chinese LLaMA and Alpaca models with suggested use cases (not exhaustive). See [Training Details](#training-details) for more.

| Comparison | Chinese LLaMA | Chinese Alpaca |
| :-------------------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| Training method | Traditional CLM | Instruction fine-tuning |
| Model type | Base model | Instruction-following model (ChatGPT-like) |
| Training corpus | Unlabeled general corpus | Labeled instruction data |
| Vocabulary size[3] | 4995**3** | 4995**4** = 49953 + 1 (pad token) |
| Input template | Not required | Must follow the template requirements[1] |
| Suitable for ✔️ | Text continuation: given preceding text, the model generates what follows | Instruction understanding (Q&A, writing, advice, etc.); multi-turn context understanding (chat, etc.) |
| Not suitable for ❌ | Instruction understanding, multi-turn chat, etc. | Unconstrained free-form text generation |
| llama.cpp | Use the `-p` argument to specify the preceding text | Use the `-ins` argument to start instruction + chat mode |
| text-generation-webui | Not suitable for chat mode | Use `--cpu` to run without a GPU |
| LlamaChat | Select "LLaMA" when loading the model | Select "Alpaca" when loading the model |
| [HF inference code](./scripts/inference/inference_hf.py) | No extra launch arguments needed | Add `--with_prompt` at launch |
| [web-demo code](./scripts/inference/gradio_demo.py) | Not applicable | Just provide the Alpaca model path; supports multi-turn dialogue |
| [LangChain example](./scripts/langchain) / privateGPT | Not applicable | Just provide the Alpaca model path |
| Known issues | Without a stopping condition, it keeps writing until it reaches the output length limit.[2] | Use the Pro version to avoid the overly short responses of the Plus version. |

*[1] llama.cpp, LlamaChat, the [HF inference code](./scripts/inference/inference_hf.py), the [web-demo code](./scripts/inference/gradio_demo.py), the [LangChain example](./scripts/langchain), and others already embed the template, so there is no need to add it manually.*

*[2] If the model's answers are especially poor, nonsensical, or miss the question, check that you are using the correct model and launch arguments.*

*[3] The instruction-tuned Alpaca has one more token (a pad token) than LLaMA, **so do not mix the LLaMA and Alpaca vocabularies**.*

### Recommended Models

The models recommended by this project are listed below. They generally use more training data and improved training methods and hyperparameters, so please use them first (for the remaining models, see [Other Models](#other-models)). **For ChatGPT-like conversational interaction, use an Alpaca model rather than a LLaMA model.** Among the Alpaca models, the Pro version addresses overly short responses and produces noticeably better replies; if you prefer short responses, choose the Plus series.

| Model | Type | Training data | Base for reconstruction[1] | Size[2] | LoRA download[3] |
| :------------------------ | :------: | :------: | :--------------------------------------------------------: | :----------------: | :----------------------------------------------------------: |
| Chinese-LLaMA-Plus-7B | Base model | 120G general text | Original LLaMA-7B | 790M | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-plus-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-plus-lora-7b) [[Baidu]](https://pan.baidu.com/s/1zvyX9FN-WSRDdrtMARxxfw?pwd=2gtr) |
| Chinese-LLaMA-Plus-13B | Base model | 120G general text | Original LLaMA-13B | 1.0G | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-plus-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-plus-lora-13b) [[Baidu]](https://pan.baidu.com/s/1VGpNlrLx5zHuNzLOcTG-xw?pwd=8cvd) |
| Chinese-LLaMA-Plus-33B 🆕 | Base model | 120G general text | Original LLaMA-33B | 1.3G[6] | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-plus-lora-33b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-plus-lora-33b) [[Baidu]](https://pan.baidu.com/s/1v2WsSA0RFyVfy7FXY9A2NA?pwd=n8ws) |
| Chinese-Alpaca-Pro-7B 🆕 | Instruction model | 4.3M instructions | *Original LLaMA-7B & LLaMA-Plus-7B*[4] | 1.1G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-pro-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-pro-lora-7b) [[Baidu]](https://pan.baidu.com/s/1M7whRwG5DRRkzRXCH4aF3g?pwd=fqpd) |
| Chinese-Alpaca-Pro-13B 🆕 | Instruction model | 4.3M instructions | *Original LLaMA-13B & LLaMA-Plus-13B*[4] | 1.3G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-pro-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-pro-lora-13b) [[Baidu]](https://pan.baidu.com/s/1ok5Iiou-MovZa7bFLvt4uA?pwd=m79g) |
| Chinese-Alpaca-Pro-33B 🆕 | Instruction model | 4.3M instructions | *Original LLaMA-33B & LLaMA-Plus-33B*[4] | 2.1G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-pro-lora-33b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-pro-lora-33b) [[Baidu]](https://pan.baidu.com/s/1u2TWZcsG_PZSTnmuu7vwww?pwd=8zj8) |

*[1] Reconstruction requires the original LLaMA model. [Apply for access through the LLaMA project](https://github.com/facebookresearch/llama) or refer to this [PR](https://github.com/facebookresearch/llama/pull/73/files). Due to copyright, this project cannot provide download links.*

*[2] The reconstructed model is somewhat larger than the original LLaMA of the same scale, mainly because of the expanded vocabulary.*

*[3] After downloading, be sure to check that the SHA256 of each model file in the archive matches; see [SHA256.md](./SHA256.md).*

*[4] The Alpaca-Pro and Alpaca-Plus models also require the corresponding LLaMA-Plus model; see the [merging tutorial](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换#多lora权重合并适用于chinese-alpaca-plus).*

*[5] Some sources call it 30B; this was actually a mistake Facebook made when releasing the model, and the paper still says 33B.*

*[6] Stored in FP16, so the file is relatively small.*

The file layout inside the archive is as follows (using Chinese-LLaMA-7B as an example):

```
chinese_llama_lora_7b/
  - adapter_config.json      # LoRA weight configuration file
  - adapter_model.bin        # LoRA weight file
  - special_tokens_map.json  # special tokens map file
  - tokenizer_config.json    # tokenizer configuration file
  - tokenizer.model          # tokenizer file
```

### Other Models

Due to factors such as training method and training data, **the following models are no longer recommended (they may still be useful in specific scenarios)**. Please prefer the [recommended models](#recommended-models) in the previous section.

| Model | Type | Training data | Base for reconstruction | Size | LoRA download |
| :---------------- | :------: | :------: | :--------------------: | :----------------: | :----------------------------------------------------------: |
| Chinese-LLaMA-7B | Base model | 20G general text | Original LLaMA-7B | 770M | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-lora-7b) [[Baidu]](https://pan.baidu.com/s/1oORTdpr2TvlkxjpyWtb5Sw?pwd=33hb) |
| Chinese-LLaMA-13B | Base model | 20G general text | Original LLaMA-13B | 1.0G | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-lora-13b) [[Baidu]](https://pan.baidu.com/s/1BxFhYhDMipW7LwI58cGmQQ?pwd=ef3t) |
| Chinese-LLaMA-33B | Base model | 20G general text | Original LLaMA-33B | 2.7G | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-lora-33b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-lora-33b) [[Baidu]](https://pan.baidu.com/s/1-ylGyeM70QZ5vbEug5RD-A?pwd=hp6f) |
| Chinese-Alpaca-7B | Instruction model | 2M instructions | Original LLaMA-7B | 790M | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-lora-7b) [[Baidu]](https://pan.baidu.com/s/1xV1UXjh1EPrPtXg6WyG7XQ?pwd=923e) |
| Chinese-Alpaca-13B | Instruction model | 3M instructions | Original LLaMA-13B | 1.1G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-lora-13b) [[Baidu]](https://pan.baidu.com/s/1wYoSF58SnU9k0Lndd5VEYg?pwd=mm8i) |
| Chinese-Alpaca-33B | Instruction model | 4.3M instructions | Original LLaMA-33B | 2.8G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-lora-33b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-lora-33b) [[Baidu]](https://pan.baidu.com/s/1fey7lGMMw3GT982l8uJYMg?pwd=2f2s) |
| Chinese-Alpaca-Plus-7B | Instruction model | 4M instructions | *Original LLaMA-7B & LLaMA-Plus-7B* | 1.1G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-plus-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-plus-lora-7b) [[Baidu]](https://pan.baidu.com/s/12tjjxmDWwLBM8Tj_7FAjHg?pwd=32hc) |
| Chinese-Alpaca-Plus-13B | Instruction model | 4.3M instructions | *Original LLaMA-13B & LLaMA-Plus-13B* | 1.3G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-plus-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-plus-lora-13b) [[Baidu]](https://pan.baidu.com/s/1Mew4EjBlejWBBB6_WW6vig?pwd=mf5w) |
| Chinese-Alpaca-Plus-33B | Instruction model | 4.3M instructions | *Original LLaMA-33B & LLaMA-Plus-33B* | 2.1G | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-plus-lora-33b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-plus-lora-33b) [[Baidu]](https://pan.baidu.com/s/1j2prOjiQGB8S5x67Uj8XZw?pwd=3pac) |

### Using 🤗transformers

All of the above models can be downloaded from the 🤗Model Hub, and the Chinese LLaMA or Alpaca LoRA models can be loaded with [transformers](https://github.com/huggingface/transformers) and [PEFT](https://github.com/huggingface/peft). The model names referred to here are the names passed to `.from_pretrained()`. Full list and download links: https://huggingface.co/hfl

## Merging Models

As mentioned above, the LoRA models cannot be used on their own; they must be merged with the original LLaMA to obtain a full model for inference, quantization, or further training. Choose one of the following methods to convert and merge the models.

| Method | Use case | Tutorial |
| :----------- | :--------------------------------------------------------- | :----------------------------------------------------------: |
| **Online conversion** | Colab users can use the notebook provided by this project to convert and quantize models online | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/在线模型合并与转换) |
| **Manual conversion** | Offline conversion that produces models in different formats for quantization or further fine-tuning | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换) |

Below are the model sizes after merging, in FP16 and after 8-bit and 4-bit quantization. Before converting, make sure your machine has enough memory and disk space (minimum requirements):

| Model version | 7B | 13B | 33B | 65B |
| :------------------ | :----: | :-----: | :-----: | :-----: |
| Original size (FP16) | 13 GB | 24 GB | 60 GB | 120 GB |
| Quantized size (8-bit) | 7.8 GB | 14.9 GB | 32.4 GB | ~60 GB |
| Quantized size (4-bit) | 3.9 GB | 7.8 GB | 17.2 GB | 38.5 GB |

For details, see this project's >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/模型合并与转换)

## Local Inference and Quick Deployment

The models in this project mainly support the following quantization, inference, and deployment methods.

| Inference/deployment method | Features | Platform | CPU | GPU | Quantized loading | GUI | Tutorial |
| :----------------------------------------------------------- | -------------------------------------------- | :---: | :--: | :--: | :------: | :------: | :----------------------------------------------------------: |
| [**llama.cpp**](https://github.com/ggerganov/llama.cpp) | Rich quantization options and efficient local inference | General | ✅ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/llama.cpp量化部署) |
| [**🤗Transformers**](https://github.com/huggingface/transformers) | Native transformers inference interface | General | ✅ | ✅ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用Transformers推理) |
| [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | Deployment through a front-end web UI | General | ✅ | ✅ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用text-generation-webui搭建界面) |
| [**LlamaChat**](https://github.com/alexrozanski/LlamaChat) | Graphical chat interface on macOS | macOS | ✅ | ❌ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用LlamaChat图形界面（macOS）) |
| [**LangChain**](https://github.com/hwchase17/langchain) | LLM application framework, suited for building your own applications | General | ✅† | ✅ | ✅† | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/与LangChain进行集成) |
| [**privateGPT**](https://github.com/imartinez/privateGPT) | LangChain-based local multi-document Q&A framework | General | ✅ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用privateGPT进行多文档问答) |
| [**Colab Gradio Demo**](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/gradio_web_demo.ipynb) | Launches a Gradio-based interactive web service in Colab | General | ✅ | ✅ | ✅ | ❌ | [link](https://colab.research.google.com/github/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/gradio_web_demo.ipynb) |
| [**API**](https://platform.openai.com/docs/api-reference) | Server demo that mimics the OpenAI API | General | ✅ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API调用) |

†: Supported by the LangChain framework but not implemented in the tutorial; see the official LangChain documentation for details.

For details, see this project's >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/模型推理与部署)

## System Performance

### Generation Quality Evaluation

To quickly assess the actual text generation quality of these models, this project compared Chinese Alpaca-7B, Chinese Alpaca-13B, Chinese Alpaca-33B, Chinese Alpaca-Plus-7B, and Chinese Alpaca-Plus-13B on some common tasks, giving each model the same prompts. Generated responses are random and depend on decoding hyperparameters, random seeds, and other factors. The evaluation below is not fully rigorous and its results are for reference only; you are welcome to try the models yourself.

- For detailed evaluation results and generation samples, see the [examples directory](./examples)
- 📊 Online Alpaca model arena: [http://llm-arena.ymcui.com](http://llm-arena.ymcui.com/)

### Objective Evaluation

This project also tested the models on "NLU"-style objective benchmarks. Results on such benchmarks are not subjective, since the model only has to output one of the given labels (which requires designing a label-mapping strategy), so they offer another view of an LLM's capabilities. This project evaluated the models on the recently released [C-Eval benchmark](https://cevalbenchmark.com), whose test set contains 12.3K multiple-choice questions covering 52 subjects. Below are the valid and test set results (Average) for some of the models; for full results, see the [technical report](https://arxiv.org/abs/2304.08177).

| Model | Valid (zero-shot) | Valid (5-shot) | Test (zero-shot) | Test (5-shot) |
| ----------------------- | :---------------: | :------------: | :--------------: | :-----------: |
| Chinese-Alpaca-Plus-33B | 46.5 | 46.3 | 44.9 | 43.5 |
| Chinese-Alpaca-33B | 43.3 | 42.6 | 41.6 | 40.4 |
| Chinese-Alpaca-Plus-13B | 43.3 | 42.4 | 41.5 | 39.9 |
| Chinese-Alpaca-Plus-7B | 36.7 | 32.9 | 36.4 | 32.3 |
| Chinese-LLaMA-Plus-33B | 37.4 | 40.0 | 35.7 | 38.3 |
| Chinese-LLaMA-33B | 34.9 | 38.4 | 34.6 | 39.5 |
| Chinese-LLaMA-Plus-13B | 27.3 | 34.0 | 27.8 | 33.3 |
| Chinese-LLaMA-Plus-7B | 27.3 | 28.3 | 26.9 | 28.4 |

Note that comprehensively evaluating LLM capabilities remains an important open problem. Interpreting the various benchmark results critically and in context helps the field develop in a healthy direction. We recommend testing the models on the tasks you care about and choosing the one that best fits them.

For the C-Eval inference code, see this project's >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/C-Eval评测结果与脚本)

## Training Details

The training pipeline has three parts: vocabulary expansion, pre-training, and instruction fine-tuning.

- All models in this project extend the original LLaMA vocabulary with Chinese words; for the code, see [merge_tokenizers.py](./scripts/merge_tokenizer/merge_tokenizers.py)
- The pre-training and instruction fine-tuning code is based on [run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) from 🤗transformers and on the dataset-processing parts of the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) project
- The training scripts for pre-training and instruction fine-tuning are open-sourced: [Pre-training script Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/预训练脚本), [Instruction fine-tuning script Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/指令精调脚本)

For details, see this project's >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节)

## FAQ

The FAQ answers common questions. Please be sure to check it before opening an issue.

```
Question 1: Why can't you release the full weights?
Question 2: Will there be 33B and 65B versions?
Question 3: The results are poor on some tasks!
Question 4: Why expand the vocabulary? Can't you just pre-train the original LLaMA on Chinese?
Question 5: The responses are very short
Question 6: On Windows, the model doesn't understand Chinese, generation is very slow, etc.
Question 7: The Chinese-LLaMA 13B model fails to start with llama.cpp and reports a dimension mismatch
Question 8: Chinese-Alpaca-Plus performs poorly
Question 9: The model performs poorly on NLU tasks (text classification, etc.)
Question 10: Why is it called 33B? Shouldn't it be 30B?
Question 11: The SHA256 does not match after merging the model
```
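Footnote [3] under the recommended-model table and FAQ question 11 both come down to comparing a file's SHA256 digest with the value published in SHA256.md. The snippet below is a minimal sketch of that check using only Python's standard `hashlib`; the helper names `sha256sum` and `verify` are hypothetical and not part of this project's scripts, and the expected hash still has to be copied by hand from SHA256.md.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file.

    The file is read in chunks (1 MiB by default) so that multi-gigabyte
    model files never have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Hypothetical helper: True if the file's digest equals `expected`.

    `expected` is a hex string copied by hand (e.g. from SHA256.md);
    surrounding whitespace and letter case are ignored.
    """
    return sha256sum(path) == expected.strip().lower()
```

For example, `verify("chinese_llama_lora_7b/adapter_model.bin", expected)`, with `expected` pasted from SHA256.md, returns `True` only if the local file matches. Reading in fixed-size chunks keeps memory use flat regardless of file size.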
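For the specific questions and answers, see this project's >>> [📚 GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/常见问题)

## Limitations

Although the models in this project have some ability to understand and generate Chinese, they also have limitations, including but not limited to:

- They may produce unpredictable harmful content, as well as content that does not match human preferences and values
- Because of limited compute and data, the models are not sufficiently trained, and their Chinese understanding needs further improvement
- There is currently no interactive online demo (note: users can still deploy the models locally)

## Citation

If you find this project helpful for your research, or if you use its code or data, please cite the technical report: https://arxiv.org/abs/2304.08177

```
@article{chinese-llama-alpaca,
      title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca},
      author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
      journal={arXiv preprint arXiv:2304.08177},
      url={https://arxiv.org/abs/2304.08177},
      year={2023}
}
```

## Related Projects

| Project | Description | Type |
| :----------------------------------------------------------- | :----------------------------- | :----: |
| [**Chinese-LLaMA-Alpaca-2**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) (official) | Chinese LLaMA-2 and Alpaca-2 LLMs | Text |
| [**Visual-Chinese-LLaMA-Alpaca**](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) (official) | Multimodal Chinese LLaMA & Alpaca LLMs | Multimodal |

Want to join the list? >>> [Submit a request](https://github.com/ymcui/Chinese-LLaMA-Alpaca/discussions/740)

## Acknowledgements

This project builds on the following open-source projects; we thank their authors and developers.

| Base models and code | Quantization, inference, and deployment | Data |
| :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| [LLaMA by Facebook](https://github.com/facebookresearch/llama)<br/>[Alpaca by Stanford](https://github.com/tatsu-lab/stanford_alpaca)<br/>[alpaca-lora by @tloen](https://github.com/tloen/alpaca-lora) | [llama.cpp by @ggerganov](https://github.com/ggerganov/llama.cpp)<br/>[LlamaChat by @alexrozanski](https://github.com/alexrozanski/LlamaChat)<br/>[text-generation-webui by @oobabooga](https://github.com/oobabooga/text-generation-webui) | [pCLUE and MT data by @brightmart](https://github.com/brightmart/nlp_chinese_corpus)<br/>[oasst1 by OpenAssistant](https://huggingface.co/datasets/OpenAssistant/oasst1) |

## Disclaimer

**The resources of this project are for academic research only; commercial use is strictly prohibited.** When using parts that involve third-party code, strictly follow the corresponding open-source licenses. Content generated by the models is affected by model computation, randomness, quantization precision loss, and other factors, and this project does not guarantee its accuracy. This project assumes no legal liability for any model output and is not responsible for any losses that may result from using the related resources and their outputs. This project was started and is maintained by individuals and collaborators in their spare time, so timely responses to issues cannot be guaranteed.

## Feedback

If you have questions, please submit them as GitHub Issues. Please ask politely and help keep the discussion community friendly.

- Before submitting, check whether the FAQ solves your problem; we also recommend searching past issues.
- Please use the issue template provided by this project, which helps pinpoint the problem quickly.
- Duplicate issues and issues unrelated to this project will be handled by [stale-bot](https://github.com/marketplace/stale); thank you for your understanding.

## Follow Us

Follow our WeChat official account "**涌现志**" for the latest technical updates.

![qrcode.png](https://ymcui.com/images/qrcode.jpg)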

LLM Tools & Chat UIs

19K Github Stars


Chinese-LLaMA-Alpaca-2

# The [Chinese-LLaMA-Alpaca-3](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3) project has launched!

[**🇨🇳中文**](./README.md) | [**🌐English**](./README_EN.md) | [**📖Docs**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki) | [**❓Issues**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/issues) | [**💬Discussions**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/discussions) | [**⚔️Arena**](http://llm-arena.ymcui.com/)

<img src="./pics/banner.png" width="800"/>

<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-2.svg?color=blue&style=flat-square"> <img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-2"> <img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-2"> <a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-2/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/1710faac5e634acaabfc26b0a778cdde"/></a>

This project is built on [Llama-2](https://github.com/facebookresearch/llama), the commercially usable LLM released by Meta, and is the second phase of the [Chinese LLaMA & Alpaca LLM project](https://github.com/ymcui/Chinese-LLaMA-Alpaca). It open-sources the **Chinese LLaMA-2 base models and the instruction-tuned Alpaca-2 models**. These models **expand and optimize the Chinese vocabulary of the original Llama-2** and undergo incremental pre-training on large-scale Chinese data, further improving basic Chinese semantic understanding and instruction following, with significant gains over the first-generation models. The models **support FlashAttention-2 training**. The standard models support a 4K context length, and the **long-context models support 16K and 64K context lengths**. The **RLHF models** are fine-tuned from the standard models for human preference alignment and show significant improvement over the standard models in **reflecting correct values**.

#### Main contents of this project

- 🚀 Extends Llama-2 with a **new Chinese vocabulary** and open-sources the Chinese LLaMA-2 and Alpaca-2 LLMs
- 🚀 Open-sources the pre-training and instruction fine-tuning scripts, so users can train the models further as needed
- 🚀 Lets you quickly quantize and deploy the models locally on the CPU/GPU of a personal computer
- 🚀 Supports LLaMA ecosystems such as [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LangChain](https://github.com/hwchase17/langchain), [privateGPT](https://github.com/imartinez/privateGPT), and [vLLM](https://github.com/vllm-project/vllm)

#### Released models

- Base models (4K context): Chinese-LLaMA-2 (1.3B, 7B, 13B)
- Chat models (4K context): Chinese-Alpaca-2 (1.3B, 7B, 13B)
- Long-context models (16K/64K):
  - Chinese-LLaMA-2-16K (7B, 13B), Chinese-Alpaca-2-16K (7B, 13B)
  - Chinese-LLaMA-2-64K (7B), Chinese-Alpaca-2-64K (7B)
- Preference-aligned models: Chinese-Alpaca-2-RLHF (1.3B, 7B)

![](./pics/screencast.gif)

----

[Chinese LLaMA & Alpaca LLMs](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | [Multimodal Chinese LLaMA & Alpaca LLMs](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [Multimodal VLE](https://github.com/iflytek/VLE) | [Chinese MiniRBT](https://github.com/iflytek/MiniRBT) | [Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner) | [Integrated distillation and pruning toolkit GRAIN](https://github.com/airaria/GRAIN)

## News

**[2024/04/30] Chinese-LLaMA-Alpaca-3 has been officially released, open-sourcing Llama-3-Chinese-8B and Llama-3-Chinese-8B-Instruct, both based on Llama-3. All users of the first- and second-generation projects are encouraged to upgrade to the third-generation models. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca-3**

[2024/03/27] This project is now listed on the Synced (Jiqizhixin) SOTA! model platform: https://sota.jiqizhixin.com/project/chinese-llama-alpaca-2

[2024/01/23] Added new GGUF models (imatrix quantization) and AWQ quantized models, and added support for loading the YaRN long-context models in vLLM. See the [📚 v4.1 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v4.1)

[2023/12/29] Released the long-context models Chinese-LLaMA-2-7B-64K and Chinese-Alpaca-2-7B-64K, together with the human-preference-aligned (RLHF) Chinese-Alpaca-2-RLHF (1.3B/7B). See the [📚 v4.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v4.0)

[2023/09/01] Released the long-context models Chinese-Alpaca-2-7B-16K and Chinese-Alpaca-2-13B-16K, which can be applied directly to downstream tasks such as privateGPT. See the [📚 v3.1 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v3.1)

[2023/08/25] Released the long-context models Chinese-LLaMA-2-7B-16K and Chinese-LLaMA-2-13B-16K, which support 16K context and can be extended further to 24K+ with the NTK method. See the [📚 v3.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v3.0)

[2023/08/14] Released Chinese-LLaMA-2-13B and Chinese-Alpaca-2-13B; added text-generation-webui/LangChain/privateGPT support, the CFG Sampling decoding method, and more. See the [📚 v2.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v2.0)

[2023/08/02] Added FlashAttention-2 training support, vLLM-based inference acceleration, a system prompt template for longer responses, and more. See the [📚 v1.1 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.1)

[2023/07/31] Officially released Chinese-LLaMA-2-7B (base model), incrementally trained on 120G of Chinese text (the same corpus as the first-generation Plus series), and then fine-tuned it on 5M instructions (slightly more than the first generation) to obtain Chinese-Alpaca-2-7B (instruction/chat model). See the [📚 v1.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/releases/tag/v1.0)

[2023/07/19] 🚀 Launched the [Chinese LLaMA-2 & Alpaca-2 open-source LLM project](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)

## Contents

| Section | Description |
| ------------------------------------- | ------------------------------------------------------------ |
| [💁🏻‍♂️Model Introduction](#model-introduction) | A brief introduction to the technical features of the models |
| [⏬Model Download](#model-download) | Download links for the Chinese LLaMA-2 and Alpaca-2 models |
| [💻Inference and Deployment](#推理与部署) | How to quantize the models and deploy them on a personal computer |
| [💯System Performance](#系统效果) | Model performance on selected tasks |
| [📝Training and Fine-tuning](#训练与精调) | How to train and fine-tune the Chinese LLaMA-2 and Alpaca-2 models |
| [❓FAQ](#常见问题) | Answers to common questions |

## Model Introduction

This project releases the Chinese LLaMA-2 and Alpaca-2 series based on Llama-2. Compared with the [first-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca), its main features are:

#### 📖 Optimized Chinese vocabulary

- In the [first-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca), we extended the 32K vocabulary of the first-generation LLaMA with Chinese characters and words (LLaMA: 49953, Alpaca: 49954)
- In this project, we **redesigned the vocabulary** (size: 55296), which further improves coverage of Chinese characters and words and unifies the LLaMA/Alpaca vocabularies to avoid problems caused by mixing them, with the aim of further improving the models' encoding and decoding efficiency on Chinese text

#### ⚡ Efficient attention based on FlashAttention-2

- [FlashAttention-2](https://github.com/Dao-AILab/flash-attention) is an implementation of efficient attention that is **faster and more memory-efficient** than its first generation
- With longer contexts, such efficient attention is especially important to avoid explosive growth in GPU memory use
- All models in this project were trained with FlashAttention-2

#### 🚄 Long-context extension based on PI and YaRN

- In the [first-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca), we implemented [NTK-based context extension](https://github.com/ymcui/Chinese-LLaMA-Alpaca/pull/743), which supports longer contexts without further training
- Based on [position interpolation (PI)](https://arxiv.org/abs/2306.15595) and NTK, we released 16K long-context models that support 16K context and can be extended up to 24K-32K with NTK
- Based on [YaRN](https://arxiv.org/abs/2309.00071), we further released 64K long-context models that support 64K context
- We also designed a **convenient adaptive empirical formula**, so NTK hyperparameters need not be set for each context length, which makes the models easier to use

#### 🤖 Simplified Chinese-English bilingual system prompt

- In the [first-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca), the Chinese Alpaca models used the instruction template and system prompt of [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- Preliminary experiments showed that the default system prompt of the Llama-2-Chat models did not bring a statistically significant improvement, and its content was overly long
- The Alpaca-2 models in this project simplify the system prompt while following the Llama-2-Chat instruction template, to better fit the related ecosystem

#### 👮 Human preference alignment

- In the [first-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca), the Chinese Alpaca models went through only pre-training and instruction fine-tuning, which gave them basic conversational ability
- Experiments with reinforcement learning from human feedback (RLHF) showed that it can significantly improve the models' ability to convey correct values
- This project releases the Alpaca-2-RLHF series, which is used in the same way as the SFT models

The figure below shows the relationships among all models released by this project and the [first-phase project](https://github.com/ymcui/Chinese-LLaMA-Alpaca).

![](./pics/models.png)

## Model Download

### Model Selection Guide

Below is a comparison of the Chinese LLaMA-2 and Alpaca-2 models with suggested use cases. **For chat interaction, choose Alpaca rather than LLaMA.**

| Comparison | Chinese LLaMA-2 | Chinese Alpaca-2 |
| :-------------------- | :----------------------------------------------------: | :----------------------------------------------------------: |
| Model type | **Base model** | **Instruction/chat model (ChatGPT-like)** |
| Released sizes | 1.3B, 7B, 13B | 1.3B, 7B, 13B |
| Training type | Causal-LM (CLM) | Instruction fine-tuning |
| Training method | 7B, 13B: LoRA + full emb/lm-head<br/>1.3B: full-parameter | 7B, 13B: LoRA + full emb/lm-head<br/>1.3B: full-parameter |
| Trained from | [Original Llama-2](https://github.com/facebookresearch/llama) (non-chat version) | Chinese LLaMA-2 |
| Training corpus | Unlabeled general corpus (120G plain text) | Labeled instruction data (5 million examples) |
| Vocabulary size[1] | 55,296 | 55,296 |
| Context length[2] | Standard: 4K (12K-18K)<br/>Long-context (PI): 16K (24K-32K)<br/>Long-context (YaRN): 64K | Standard: 4K (12K-18K)<br/>Long-context (PI): 16K (24K-32K)<br/>Long-context (YaRN): 64K |
| Input template | Not required | Requires a specific template[3], similar to Llama-2-Chat |
| Suitable for | Text continuation: given preceding text, the model generates what follows | Instruction understanding: Q&A, writing, chat, interaction, etc. |
| Not suitable for | Instruction understanding, multi-turn chat, etc. | Unconstrained free-form text generation |
| Preference alignment | None | RLHF versions (1.3B, 7B) |

> [!NOTE]
> [1] *The first- and second-generation models in this project use different vocabularies; do not mix them. The second-generation LLaMA and Alpaca share the same vocabulary.*
>
> [2] *Values in parentheses are the maximum lengths supported with NTK-based context extension.*
>
> [3] *Alpaca-2 uses the Llama-2-chat template (same format, different system prompt), not the first-generation Alpaca template; do not mix them.*
>
> [4] *The 1.3B models are not recommended for standalone use; instead, pair them with a larger model (7B, 13B) through speculative sampling.*

### Full Model Download

The following are the full models, which can be used directly after download without any merging steps. Recommended for users with sufficient network bandwidth.

| Model | Type | Size | Download | GGUF |
| :------------------------ | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| Chinese-LLaMA-2-13B | Base model | 24.7 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-13b) [[Baidu]](https://pan.baidu.com/s/1T3RqEUSmyg6ZuBwMhwSmoQ?pwd=e9qy) | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-13b-gguf) |
| Chinese-LLaMA-2-7B | Base model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-7b) [[Baidu]](https://pan.baidu.com/s/1E5NI3nlQpx1j8z3eIzbIlg?pwd=n8k3) | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-7b-gguf) |
| Chinese-LLaMA-2-1.3B | Base model | 2.4 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-1.3b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-1.3b) [[Baidu]](https://pan.baidu.com/s/1hEuOCllnJJ5NMEZJf8OkRw?pwd=nwjg) | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-1.3b-gguf) |
| Chinese-Alpaca-2-13B | Instruction model | 24.7 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-13b) [[Baidu]](https://pan.baidu.com/s/1MT_Zlap1OtdYMgoBNTS3dg?pwd=9xja) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-13b-gguf) |
| Chinese-Alpaca-2-7B | Instruction model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-7b) [[Baidu]](https://pan.baidu.com/s/1wxx-CdgbMupXVRBcaN4Slw?pwd=kpn9) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-gguf) |
| Chinese-Alpaca-2-1.3B | Instruction model | 2.4 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-1.3b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-1.3b) [[Baidu]](https://pan.baidu.com/s/1PD7Ng-ltOIdUGHNorveptA?pwd=ar1p) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-1.3b-gguf) |

#### Long-context models

The following are the long-context models, **recommended for downstream tasks dominated by long texts**; otherwise, use the standard models above.

| Model | Type | Size | Download | GGUF |
| :------------------------ | :------: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| Chinese-LLaMA-2-7B-64K 🆕 | Base model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-7b-64k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-7b-64k) [[Baidu]](https://pan.baidu.com/s/1ShDQ2FG2QUJrvfnxCn4hwQ?pwd=xe5k) | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-7b-64k-gguf) |
| Chinese-Alpaca-2-7B-64K 🆕 | Instruction model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-7b-64k) [[Baidu]](https://pan.baidu.com/s/1KBAr9PCGvX2oQkYfCuLEjw?pwd=sgp6) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k-gguf) |
| Chinese-LLaMA-2-13B-16K | Base model | 24.7 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-13b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-13b-16k) [[Baidu]](https://pan.baidu.com/s/1XWrh3Ru9x4UI4-XmocVT2w?pwd=f7ik) | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-13b-16k-gguf) |
| Chinese-LLaMA-2-7B-16K | Base model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-7b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-7b-16k) [[Baidu]](https://pan.baidu.com/s/1ZH7T7KU_up61ugarSIXw2g?pwd=pquq) | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-7b-16k-gguf) |
| Chinese-Alpaca-2-13B-16K | Instruction model | 24.7 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-13b-16k) [[Baidu]](https://pan.baidu.com/s/1gIzRM1eg-Xx1xV-3nXW27A?pwd=qi7c) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k-gguf) |
| Chinese-Alpaca-2-7B-16K | Instruction model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-7b-16k) [[Baidu]](https://pan.baidu.com/s/1Qk3U1LyvMb1RSr5AbiatPw?pwd=bfis) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-16k-gguf) |

#### RLHF models

The following are the human-preference-aligned models, which show a better value orientation than the standard models on questions involving law and ethics.

| Model | Type | Size | Download | GGUF |
| :------------------------ | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| Chinese-Alpaca-2-7B-RLHF 🆕 | Instruction model | 12.9 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-rlhf) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-7b-rlhf) [[Baidu]](https://pan.baidu.com/s/17GJ1y4rpPDuvWlvPaWgnqw?pwd=4feb) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-7b-rlhf-gguf) |
| Chinese-Alpaca-2-1.3B-RLHF 🆕 | Instruction model | 2.4 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-1.3b-rlhf) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-1.3b-rlhf) [[Baidu]](https://pan.baidu.com/s/1cLKJKieNitWbOggUXXaamw?pwd=cprp) | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-1.3b-rlhf-gguf) |

#### AWQ models

AWQ (Activation-aware Weight Quantization) is an efficient model quantization scheme that is currently compatible with mainstream frameworks such as 🤗transformers and llama.cpp. The pre-computed AWQ search results for this project's models are available at: https://huggingface.co/hfl/chinese-llama-alpaca-2-awq

- Generate AWQ quantized models (official AWQ repository): https://github.com/mit-han-lab/llm-awq
- Use AWQ in llama.cpp: https://github.com/ggerganov/llama.cpp/tree/master/awq-py

### LoRA Model Download

The following are the LoRA models (including emb/lm-head), which correspond one-to-one with the full models above. Note that **the LoRA models cannot be used directly**; they must be merged with the base model following the tutorial to reconstruct the full model. Recommended for users with limited bandwidth who already have the original Llama-2 and want a lightweight download.

| Model | Type | Base model required for merging | Size | LoRA download |
| :------------------------ | :------: | :--------------------------------------------------------: | :----------------: | :----------------------------------------------------------: |
| Chinese-LLaMA-2-LoRA-13B | Base model | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-lora-13b) [[Baidu]](https://pan.baidu.com/s/1PFKTBn54GjAjzWeQISKruw?pwd=we6s) |
| Chinese-LLaMA-2-LoRA-7B | Base model | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-lora-7b) [[Baidu]](https://pan.baidu.com/s/1bmgqdyRh9E3a2uqOGyNqiQ?pwd=7kvq) |
| Chinese-Alpaca-2-LoRA-13B | Instruction model | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-lora-13b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-lora-13b) [[Baidu]](https://pan.baidu.com/s/1Y5giIXOUUzI4Na6JOcviVA?pwd=tc2j) |
| Chinese-Alpaca-2-LoRA-7B | Instruction model | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-lora-7b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-lora-7b) [[Baidu]](https://pan.baidu.com/s/1g0olPxkB_rlZ9UUVfOnbcw?pwd=5e7w) |

The following are the long-context models, **recommended for downstream tasks dominated by long texts**; otherwise, use the standard models above.

| Model | Type | Base model required for merging | Size | LoRA download |
| :------------------------ | :------: | :--------------------------------------------------------: | :----------------: | :----------------------------------------------------------: |
| Chinese-LLaMA-2-LoRA-7B-64K 🆕 | Base model | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-lora-7b-64k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-lora-7b-64k) 
[[Baidu]](https://pan.baidu.com/s/1QjqKNM9Xez5g6koUrbII_w?pwd=94pk) | | Chinese-Alpaca-2-LoRA-7B-64K 🆕 | 指令模型 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-lora-7b-64k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-lora-7b-64k) [[Baidu]](https://pan.baidu.com/s/1t6bPpMlJCrs9Ce7LXs09-w?pwd=37it) | | Chinese-LLaMA-2-LoRA-13B-16K | 基座模型 | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-lora-13b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-lora-13b-16k) [[Baidu]](https://pan.baidu.com/s/1VrfOJmhDnXxrXcdnfX00fA?pwd=4t2j) | | Chinese-LLaMA-2-LoRA-7B-16K | 基座模型 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-llama-2-lora-7b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-llama-2-lora-7b-16k) [[Baidu]](https://pan.baidu.com/s/14Jnm7QmcDx3XsK_NHZz6Uw?pwd=5b7i) | | Chinese-Alpaca-2-LoRA-13B-16K | 指令模型 | [Llama-2-13B-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1.5 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-lora-13b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-lora-13b-16k) [[Baidu]](https://pan.baidu.com/s/1g42_X7Z0QWDyrrDqv2jifQ?pwd=bq7n) | | Chinese-Alpaca-2-LoRA-7B-16K | 指令模型 | [Llama-2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1.1 GB | [[🤗HF]](https://huggingface.co/hfl/chinese-alpaca-2-lora-7b-16k) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/chinese-alpaca-2-lora-7b-16k) [[Baidu]](https://pan.baidu.com/s/1E7GEZ6stp8EavhkhR06FwA?pwd=ewwy) | > [!IMPORTANT] > LoRA模型无法单独使用，必须与原版Llama-2进行合并才能转为完整模型。请通过以下方法对模型进行合并。 > > - [**在线转换**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/online_conversion_zh)：Colab用户可利用本项目提供的notebook进行在线转换并量化模型 
> - [**手动转换**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/manual_conversion_zh)：离线方式转换，生成不同格式的模型，以便进行量化或进一步精调 ## 推理与部署本项目中的相关模型主要支持以下量化、推理和部署方式，具体内容请参考对应教程。 | 工具 | 特点 | CPU | GPU | 量化 | GUI | API | vLLM§ | 16K‡ | 64K‡ |投机采样 | 教程 | | :----------------------------------------------------------- | ---------------------------- | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |:--: | | [**llama.cpp**](https://github.com/ggerganov/llama.cpp) | 丰富的量化选项和高效本地推理 | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |✅ |✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/llamacpp_zh) | | [**🤗Transformers**](https://github.com/huggingface/transformers) | 原生transformers推理接口 | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/inference_with_transformers_zh) | | [**Colab Demo**](https://colab.research.google.com/drive/1yu0eZ3a66by8Zqm883LLtRQrguBAb9MR?usp=sharing) | 在Colab中启动交互界面 | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | [link](https://colab.research.google.com/drive/1yu0eZ3a66by8Zqm883LLtRQrguBAb9MR?usp=sharing) | | [**仿OpenAI API调用**](https://platform.openai.com/docs/api-reference) | 仿OpenAI API接口的服务器Demo | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/api_calls_zh) | | [**text-generation-webui**](https://github.com/oobabooga/text-generation-webui) | 前端Web UI界面的部署方式 | ✅ | ✅ | ✅ | ✅ | ✅† | ❌ | ✅ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/text-generation-webui_zh) | | [**LangChain**](https://github.com/hwchase17/langchain) | 适合二次开发的大模型应用开源框架 | ✅† | ✅ | ✅† | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/langchain_zh) | | [**privateGPT**](https://github.com/imartinez/privateGPT) | 基于LangChain的多文档本地问答框架 | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | [link](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/privategpt_zh) | > [!NOTE] > † 工具支持该特性，但教程中未实现，详细说明请参考对应官方文档 > ‡ 指是否支持长上下文版本模型（需要第三方库支持自定义RoPE） > § vLLM后端不支持长上下文版本模型 ## 
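Before any of the tools above can load a LoRA release, the adapter has to be merged into the original Llama-2 weights, as the LoRA section explains. For the low-rank part, merging folds the update into each adapted weight matrix, W' = W + (alpha / r) · B · A. The NumPy sketch below only illustrates that arithmetic on small random matrices (the shapes, `alpha`, `r`, and the helper name `merge_lora` are made up for the example); it is not the project's conversion script, which additionally has to handle the enlarged Chinese vocabulary (the emb/lm-head weights that ship with the LoRA files).

```python
import numpy as np


def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               alpha: float, r: int) -> np.ndarray:
    """Fold a LoRA update into a frozen weight: W + (alpha / r) * B @ A.

    W: (d_out, d_in) base weight, A: (r, d_in) down-projection,
    B: (d_out, r) up-projection.
    """
    assert A.shape == (r, W.shape[1]) and B.shape == (W.shape[0], r)
    return W + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 16.0  # toy sizes, not the real model's
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
x = rng.normal(size=(d_in,))

W_merged = merge_lora(W, A, B, alpha, r)
# The merged layer gives the same output as "base layer + LoRA branch".
unmerged = W @ x + (alpha / r) * (B @ (A @ x))
print(np.allclose(W_merged @ x, unmerged))  # True
```

Because the update is folded into W, a merged checkpoint no longer needs any adapter logic at inference time and can be loaded like an ordinary model.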
## Evaluation

To evaluate the models, this project carried out both a generation-quality evaluation and an objective (NLU-style) evaluation, assessing the models from different angles. Note that comprehensively evaluating LLM capabilities is still an important open problem, and results on any single dataset do not fully reflect a model's performance. We recommend testing on the tasks you care about and choosing the model that best fits them.

### Generation Evaluation

To give a more intuitive view of generation quality, this project launched an online model battle platform modeled on [Fastchat Chatbot Arena](https://chat.lmsys.org/?arena), where you can browse and rate model responses. The platform reports metrics such as win rate and Elo rating, and also shows pairwise win rates between models. The question set comes from [the 200 questions written by hand for the first-generation project](https://github.com/ymcui/Chinese-LLaMA-Alpaca/tree/main/examples/f16-p7b-p13b-33b), plus additional questions added on top of them. Generated responses are random and depend on decoding hyperparameters, random seeds, and other factors, so this evaluation is not fully rigorous and the results are for reference only; you are welcome to try it yourself. See the [examples directory](./examples) for some generated samples.

**⚔️ Model Arena: [http://llm-arena.ymcui.com](http://llm-arena.ymcui.com/)**

| System | Win rate (excl. ties) ↓ | Elo rating |
| ------------------------------------------------------------ | :------------------: | :-----: |
| **Chinese-Alpaca-2-13B-16K** | 86.84% | 1580 |
| **Chinese-Alpaca-2-13B** | 72.01% | 1579 |
| [Chinese-Alpaca-Pro-33B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 64.87% | 1548 |
| **Chinese-Alpaca-2-7B** | 64.11% | 1572 |
| [Chinese-Alpaca-Pro-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 62.05% | 1500 |
| **Chinese-Alpaca-2-7B-16K** | 61.67% | 1540 |
| [Chinese-Alpaca-Pro-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 61.26% | 1567 |
| [Chinese-Alpaca-Plus-33B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 31.29% | 1401 |
| [Chinese-Alpaca-Plus-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 23.43% | 1329 |
| [Chinese-Alpaca-Plus-7B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 20.92% | 1379 |

> [!NOTE]
> The results above are as of September 1, 2023. For the latest results, visit the [**⚔️Arena**](http://llm-arena.ymcui.com/).

### Objective Evaluation: C-Eval

[C-Eval](https://cevalbenchmark.com) is a comprehensive evaluation suite for Chinese foundation models. Its validation and test sets contain 1.3K and 12.3K multiple-choice questions respectively, covering 52 subjects. Results are reported as "zero-shot / 5-shot". For the C-Eval inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/ceval_zh).

| LLaMA Models | Valid | Test | Alpaca Models | Valid | Test |
| ----------------------- | :---------: | :---------: | ------------------------ | :---------: | :---------: |
| **Chinese-LLaMA-2-13B** | 40.6 / 42.7 | 38.0 / 41.6 | **Chinese-Alpaca-2-13B** | 44.3 / 45.9 | 42.6 / 44.0 |
| **Chinese-LLaMA-2-7B** | 28.2 / 36.0 | 30.3 / 34.2 | **Chinese-Alpaca-2-7B** | 41.3 / 42.9 | 40.3 / 39.5 |
| Chinese-LLaMA-Plus-33B | 37.4 / 40.0 | 35.7 / 38.3 | Chinese-Alpaca-Plus-33B | 46.5 / 46.3 | 44.9 / 43.5 |
| Chinese-LLaMA-Plus-13B | 27.3 / 34.0 | 27.8 / 33.3 | Chinese-Alpaca-Plus-13B | 43.3 / 42.4 | 41.5 / 39.9 |
| Chinese-LLaMA-Plus-7B | 27.3 / 28.3 | 26.9 / 28.4 | Chinese-Alpaca-Plus-7B | 36.7 / 32.9 | 36.4 / 32.3 |

### Objective Evaluation: CMMLU

[CMMLU](https://github.com/haonan-li/CMMLU) is another comprehensive Chinese evaluation dataset, designed to assess language models' knowledge and reasoning in a Chinese context. It covers 67 topics ranging from basic subjects to advanced professional levels, with 11.5K multiple-choice questions in total. For the CMMLU inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/cmmlu_zh).

| LLaMA Models | Test (0/few-shot) | Alpaca Models | Test (0/few-shot) |
| ----------------------- | :---------------: | ------------------------ | :---------------: |
| **Chinese-LLaMA-2-13B** | 38.9 / 42.5 | **Chinese-Alpaca-2-13B** | 43.2 / 45.5 |
| **Chinese-LLaMA-2-7B** | 27.9 / 34.1 | **Chinese-Alpaca-2-7B** | 40.0 / 41.8 |
| Chinese-LLaMA-Plus-33B | 35.2 / 38.8 | Chinese-Alpaca-Plus-33B | 46.6 / 45.3 |
| Chinese-LLaMA-Plus-13B | 29.6 / 34.0 | Chinese-Alpaca-Plus-13B | 40.6 / 39.9 |
| Chinese-LLaMA-Plus-7B | 25.4 / 26.3 | Chinese-Alpaca-Plus-7B | 36.8 / 32.6 |

### Long-context Model Evaluation

[LongBench](https://github.com/THUDM/LongBench) is a benchmark for long-text understanding in LLMs. It consists of 20 tasks in 6 major categories; most tasks have an average length of 5K to 15K, and there are about 4.75K test samples in total. Below are the results of this project's long-context models on the Chinese tasks (including code tasks). For the LongBench inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/longbench_zh).

| Models | Single-doc QA | Multi-doc QA | Summarization | Few-shot learning | Code completion | Synthetic tasks | Avg |
| ---------------------------- | :------: | :------: | :--: | :----------: | :------: | :------: | :--: |
| **Chinese-Alpaca-2-7B-64K** | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| **Chinese-LLaMA-2-7B-64K** | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |
| **Chinese-Alpaca-2-13B-16K** | 47.9 | 26.7 | 13.0 | 22.3 | 46.6 | 21.5 | 29.7 |
| Chinese-Alpaca-2-13B | 38.4 | 20.0 | 11.9 | 17.3 | 46.5 | 8.0 | 23.7 |
| **Chinese-Alpaca-2-7B-16K** | 46.4 | 23.3 | 14.3 | 29.0 | 49.6 | 9.0 | 28.6 |
| Chinese-Alpaca-2-7B | 34.0 | 17.4 | 11.8 | 21.3 | 50.3 | 4.5 | 23.2 |
| **Chinese-LLaMA-2-13B-16K** | 36.7 | 17.7 | 3.1 | 29.8 | 13.8 | 3.0 | 17.3 |
| Chinese-LLaMA-2-13B | 28.3 | 14.4 | 4.6 | 16.3 | 10.4 | 5.4 | 13.2 |
| **Chinese-LLaMA-2-7B-16K** | 33.2 | 15.9 | 6.5 | 23.5 | 10.3 | 5.3 | 15.8 |
| Chinese-LLaMA-2-7B | 19.0 | 13.9 | 6.4 | 11.0 | 11.0 | 4.7 | 11.0 |

### Quantization Evaluation

Taking Chinese-LLaMA-2-7B as an example, the table below compares model size, PPL (perplexity), and C-Eval scores at different precisions, so that you can gauge the accuracy lost to quantization. PPL is computed with a 4K context size; C-Eval reports zero-shot and 5-shot results on the valid set.

| Precision | Model size | PPL | C-Eval |
| :-------- | :------: | :----: | :---------: |
| FP16 | 12.9 GB | 9.373 | 28.2 / 36.0 |
| 8-bit quantization | 6.8 GB | 9.476 | 26.8 / 35.4 |
| 4-bit quantization | 3.7 GB | 10.132 | 25.5 / 32.8 |

In particular, the table below gives results for the different quantization methods in llama.cpp, for reference. Speed is in ms/tok, measured on an M1 Max. For details, see the [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/llamacpp_zh#关于量化方法选择及推理速度).

| llama.cpp | F16 | Q2_K | Q3_K | Q4_0 | Q4_1 | Q4_K | Q5_0 | Q5_1 | Q5_K | Q6_K | Q8_0 |
| --------- | -----: | -----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: |
| PPL | 9.128 | 11.107 | 9.576 | 9.476 | 9.576 | 9.240 | 9.156 | 9.213 | 9.168 | 9.133 | 9.129 |
| Size | 12.91G | 2.41G | 3.18G | 3.69G | 4.08G | 3.92G | 4.47G | 4.86G | 4.59G | 5.30G | 6.81G |
| CPU Speed | 117 | 42 | 51 | 39 | 44 | 43 | 48 | 51 | 50 | 54 | 65 |
| GPU Speed | 53 | 19 | 21 | 17 | 18 | 20 | x | x | 25 | 26 | x |

### Speculative Sampling Speed-up

With speculative sampling, Chinese-LLaMA-2-1.3B can serve as a draft model that speeds up inference of the 7B and 13B Chinese-LLaMA-2 models, and Chinese-Alpaca-2-1.3B does the same for the 7B and 13B Chinese-Alpaca-2 models. The table below shows the average speed measured with the [speculative sampling script](scripts/inference/speculative_sample.py) on a single A40 (48 GB) GPU when decoding the questions from the [generation evaluation](#generation-evaluation) (speed in ms/token; all models in fp16), for reference. For details, see the [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/inference_with_transformers_zh#投机采样解码).
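In the speed-up column, the ratio is target-model speed divided by speculative-sampling speed (for example, 49.3 / 36.0 ≈ 1.37). The idea behind it can be sketched with a greedy variant of speculative decoding: a small draft model proposes a few tokens, the large target model checks them and keeps the longest prefix it agrees with, and then adds one token of its own, so the output is exactly what greedy decoding with the target alone would produce. This is a simplified illustration with toy "models" (plain Python functions; `target`, `draft`, and the decode helpers are hypothetical names). `scripts/inference/speculative_sample.py` may well use the sampling-based acceptance rule instead, and a real implementation verifies all draft tokens in a single forward pass of the target model, which is where the time is actually saved.

```python
from typing import Callable, List

# A toy "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prefix: List[int], n_new: int) -> List[int]:
    out = list(prefix)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_greedy_decode(target: Model, draft: Model, prefix: List[int],
                              n_new: int, k: int = 4) -> List[int]:
    out = list(prefix)
    goal = len(prefix) + n_new
    while len(out) < goal:
        # 1) The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The target checks each proposed token and keeps the agreeing
        #    prefix (a real system scores all k positions in one pass).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3) The target adds one token: a correction, or a bonus token.
        if len(out) < goal:
            out.append(target(out))
    return out


def target(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 7


def draft(seq: List[int]) -> int:
    # Agrees with the target except at every third position.
    return target(seq) if len(seq) % 3 else 0


expected = greedy_decode(target, [1], 20)
print(speculative_greedy_decode(target, draft, [1], 20) == expected)  # True
```

The draft's quality only affects speed (how many tokens get accepted per round), never the final output, which is why a weak 1.3B draft is still safe to use with this greedy rule.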
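| Draft model | Draft model speed | Target model | Target model speed | Speculative sampling speed (speed-up) |
| :---------- | :-----------------: | :----------- | :-----------------: | :--------: |
| Chinese-LLaMA-2-1.3B | 7.6 | Chinese-LLaMA-2-7B | 49.3 | 36.0 (1.37x) |
| Chinese-LLaMA-2-1.3B | 7.6 | Chinese-LLaMA-2-13B | 66.0 | 47.1 (1.40x) |
| Chinese-Alpaca-2-1.3B | 8.1 | Chinese-Alpaca-2-7B | 50.2 | 34.9 (1.44x) |
| Chinese-Alpaca-2-1.3B | 8.2 | Chinese-Alpaca-2-13B | 67.0 | 41.6 (1.61x) |

### Human Preference Alignment (RLHF) Evaluation

#### Alignment Level

To evaluate how well the Chinese models align with human values and preferences, we built our own evaluation dataset covering key areas of human value preferences such as morality, pornography, drugs, and violence. Results are reported as value-alignment accuracy (number of questions answered in line with correct values / total number of questions).

| Alpaca Models | Accuracy | Alpaca Models | Accuracy |
| ------------------------ | :---------------: |------------------------ | :---------------: |
| Chinese-Alpaca-2-1.3B | 79.3% | Chinese-Alpaca-2-7B | 88.3% |
| **Chinese-Alpaca-2-1.3B-RLHF** | 95.8% | **Chinese-Alpaca-2-7B-RLHF** | 97.5% |

#### Objective Evaluation: C-Eval & CMMLU

| Alpaca Models | C-Eval (0/few-shot) | CMMLU (0/few-shot) |
| ------------------------ | :---------------: | :---------------: |
| Chinese-Alpaca-2-1.3B | 23.8 / 26.8 | 24.8 / 25.1 |
| Chinese-Alpaca-2-7B | 42.1 / 41.0 | 40.0 / 41.8 |
| **Chinese-Alpaca-2-1.3B-RLHF** | 23.6 / 27.1 | 24.9 / 25.0 |
| **Chinese-Alpaca-2-7B-RLHF** | 40.6 / 41.2 | 39.5 / 41.0 |

## Training and Fine-tuning

### Pre-training

- Starting from the original Llama-2, we performed incremental training on large-scale unlabeled data to obtain the Chinese-LLaMA-2 base models.
- The training data is the same as that used for the Plus models of the first-generation project: about 120 GB of plain text in total.
- The training code is based on [run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) from 🤗transformers; for usage, see the [📖Pre-training Script Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/pt_scripts_zh).

### Instruction Fine-tuning

- Starting from Chinese-LLaMA-2, we further fine-tuned on labeled instruction data to obtain the Chinese-Alpaca-2 models.
- The training data is the instruction data used for the Pro models of the first-generation project: about 5 million instructions in total (slightly more than in the first generation).
- The training code draws on the dataset-processing parts of the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) project; for usage, see the [📖Instruction Fine-tuning Script Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh).

### RLHF Fine-tuning

- Starting from the Chinese-Alpaca-2 models, we performed human preference alignment with preference data and the PPO algorithm to obtain the Chinese-Alpaca-2-RLHF models.
- The training data was sampled from human preference data in several open-source projects and from this project's instruction fine-tuning data, with about 69.5K samples for the reward-model stage and 25.6K for the reinforcement-learning stage.
- The training code is built on [DeepSpeed-Chat](https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat); for the workflow, see the [📖Reward Model Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/rm_zh) and the [📖Reinforcement Learning Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/rl_zh).

## FAQ

Before opening an issue, please check whether the FAQ already has a solution. For the full questions and answers, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/faq_zh).

```
Q1: How does this project differ from the first-generation project?
Q2: Can the models be used commercially?
Q3: Do you accept third-party pull requests?
Q4: Why use LoRA instead of full-parameter pre-training?
Q5: Do tools that support the first-generation LLaMA also support the second-generation models?
Q6: Is Chinese-Alpaca-2 trained from Llama-2-Chat?
Q7: Why does fine-tuning Chinese-Alpaca-2-7B with 24 GB of GPU memory run out of memory (OOM)?
Q8: Can the 16K long-context models replace the standard models?
Q9: How should results on third-party public leaderboards be interpreted?
Q10: Will there be 34B or 70B models?
Q11: Why are the long-context models 16K rather than 32K or 100K?
Q12: Why does the Alpaca model say it is ChatGPT?
Q13: Why is adapter_model.bin under pt_lora_model or sft_lora_model only a few hundred KB?
```

## Citation

If you use resources from this project, please cite our technical report: https://arxiv.org/abs/2304.08177

```
@article{Chinese-LLaMA-Alpaca,
    title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca},
    author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
    journal={arXiv preprint arXiv:2304.08177},
    url={https://arxiv.org/abs/2304.08177},
    year={2023}
}
```

## Acknowledgments

This project is mainly built on the following open-source projects; we thank them and their developers.

- [Llama-2 *by Meta*](https://github.com/facebookresearch/llama)
- [llama.cpp *by @ggerganov*](https://github.com/ggerganov/llama.cpp)
- [FlashAttention-2 by *Dao-AILab*](https://github.com/Dao-AILab/flash-attention)

We also thank the contributors of Chinese-LLaMA-Alpaca (the first-generation project) and its [related projects and people](https://github.com/ymcui/Chinese-LLaMA-Alpaca#致谢).

## Disclaimer

This project is built on the Llama-2 models released by Meta; please strictly follow the Llama-2 open-source license when using it. If third-party code is involved, please comply with the corresponding open-source licenses. The accuracy of model-generated content may be affected by the computation method, random factors, and quantization loss, so this project makes no guarantee about the accuracy of model outputs and accepts no liability for any loss arising from the use of its resources or outputs. If the models are used commercially, developers must comply with local laws and regulations and ensure that model outputs are compliant; this project accepts no responsibility for any products or services derived from it.

<details>
<summary>Limitations</summary>

Although the models in this project have some ability to understand and generate Chinese, they also have limitations, including but not limited to:

- They may produce unpredictable harmful content, or content that does not align with human preferences and values.
- Due to limited compute and data, the models are not fully trained, and their Chinese understanding still needs improvement.
- There is currently no interactive online demo (note: users can still deploy and try the models locally).

</details>

## Feedback

If you have questions, please submit them as a GitHub issue. Please ask politely and help build a harmonious discussion community.

- Before submitting, check whether the FAQ solves your problem, and consider searching past issues as well.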
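- Please use the issue template provided by this project, which helps us locate the problem quickly.
- Duplicate issues and issues unrelated to this project will be handled by the [stale bot](https://github.com/marketplace/stale); thank you for your understanding.

## Follow Us

Follow our WeChat official account "**涌现志**" for the latest technical news.

![qrcode.png](https://ymcui.com/images/qrcode.jpg)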

Chinese-BERT-wwm

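# [Chinese-LLaMA-Alpaca-2 v1.0](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) has been officially released!

[**中文说明**](https://github.com/ymcui/Chinese-BERT-wwm/) | [**English**](https://github.com/ymcui/Chinese-BERT-wwm/blob/master/README_EN.md)

<img src="./pics/banner.png" width="500"/>

<a href="https://github.com/ymcui/Chinese-BERT-wwm/blob/master/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-BERT-wwm.svg?color=blue&style=flat-square">
</a>

In natural language processing, pre-trained language models have become a fundamental technology. To further advance research on Chinese information processing, we released BERT-wwm, a Chinese pre-trained model based on Whole Word Masking, together with closely related models: BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large, RBT3, RBTL3, and others.

- **[Pre-Training with Whole Word Masking for Chinese BERT](https://ieeexplore.ieee.org/document/9599397)**
- *Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang*
- Published in *IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)*

This project is based on Google's official BERT: https://github.com/google-research/bert

----

[Chinese LERT](https://github.com/ymcui/LERT) | [Chinese & English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner)

See more resources released by the Joint Laboratory of HIT and iFLYTEK Research (HFL): https://github.com/ymcui/HFL-Anthology

## News

**2023/3/28 We open-sourced the Chinese LLaMA & Alpaca LLMs, which can be quickly deployed on a PC. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca**

2023/3/9 We proposed VLE, a vision-language multimodal pre-trained model. See: https://github.com/iflytek/VLE

2022/11/15 We proposed MiniRBT, a small Chinese pre-trained model. See: https://github.com/iflytek/MiniRBT

2022/10/29 We proposed LERT, a pre-trained model that incorporates linguistic information. See: https://github.com/ymcui/LERT

2022/3/30 We open-sourced PERT, a new pre-trained model. See: https://github.com/ymcui/PERT

<details>
<summary>Past news</summary>

2021/12/17 HFL released TextPruner, a model pruning toolkit. See: https://github.com/airaria/TextPruner

2021/10/24 HFL released CINO, a pre-trained model for Chinese minority languages. See: https://github.com/ymcui/Chinese-Minority-PLM

2021/7/21 [Natural Language Processing: A Pre-trained Model Approach](https://item.jd.com/13344628.html) (《自然语言处理：基于预训练模型的方法》), written by several scholars from HIT-SCIR, has been published and is available for purchase.

2021/1/27 All models now support TensorFlow 2; please load or download them via the transformers library: https://huggingface.co/hfl

2020/9/15 Our paper ["Revisiting Pre-Trained Models for Chinese Natural Language Processing"](https://arxiv.org/abs/2004.13922) was accepted as a long paper by [Findings of EMNLP](https://2020.emnlp.org).

2020/8/27 HFL topped the GLUE general natural language understanding benchmark; see the [GLUE leaderboard](https://gluebenchmark.com/leaderboard) and the [news](http://dwz.date/ckrD).

2020/3/23 The models in this repository are now available in [PaddlePaddle PaddleHub](https://github.com/PaddlePaddle/PaddleHub); see [Quick Load](#quick-load).

2020/3/11 To better understand your needs, we invite you to fill in a [questionnaire](https://wj.qq.com/s2/5637766/6281) so that we can provide better resources.

2020/2/26 HFL released the [knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer).

2020/1/20 Happy Year of the Rat! This release adds RBT3 and RBTL3 (3-layer RoBERTa-wwm-ext-base/large); see [Small Models](#small-models).

2019/12/19 The models in this repository are now available in [Huggingface-Transformers](https://github.com/huggingface/transformers); see [Quick Load](#quick-load).

2019/10/14 Released the RoBERTa-wwm-ext-large model; see [Chinese Model Downloads](#chinese-model-downloads).

2019/9/10 Released the RoBERTa-wwm-ext model; see [Chinese Model Downloads](#chinese-model-downloads).

2019/7/30 Released the Chinese `BERT-wwm-ext` model, trained on a larger general-domain corpus (5.4B words); see [Chinese Model Downloads](#chinese-model-downloads).

2019/6/20 Initial release: the models can be downloaded from Google, and uploads to Chinese cloud storage are complete; see [Chinese Model Downloads](#chinese-model-downloads).

</details>

## Contents

| Section | Description |
|-|-|
| [Introduction](#introduction) | Basic principles of BERT-wwm |
| [Chinese Model Downloads](#chinese-model-downloads) | Download links for BERT-wwm |
| [Quick Load](#quick-load) | How to load the models quickly with [🤗Transformers](https://github.com/huggingface/transformers) and [PaddleHub](https://github.com/PaddlePaddle/PaddleHub) |
| [Model Comparison](#model-comparison) | Parameter comparison of the models in this repository |
| [Chinese Baseline Results](#chinese-baseline-results) | Results on selected Chinese baseline tasks |
| [Small Models](#small-models) | Results of the small models (3-layer Transformer) |
| [Usage Tips](#usage-tips) | Tips for using Chinese pre-trained models |
| [English Model Downloads](#english-model-downloads) | Google's official English BERT-wwm download links |
| [FAQ](#FAQ) | Frequently asked questions |
| [Citation](#引用) | The technical report for this repository |

## Introduction

**Whole Word Masking (wwm)**, referred to in Chinese as `全词Mask` or `整词Mask`, is an upgrade to BERT released by Google on May 31, 2019 that mainly changes how training samples are generated during pre-training. In short, the original WordPiece tokenization splits a complete word into several subwords, and when training samples are generated these subwords are masked independently at random. With whole word masking, if some WordPiece subwords of a word are masked, all the other subwords belonging to the same word are masked as well.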
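The masking rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's pre-training code (which is not released): the function name `whole_word_mask`, the masking ratio, and the toy tokenizer `toy_tokenize` (including its made-up WordPiece split of "probability") are assumptions for the example, the word list stands in for the output of a word segmenter, and only the `[MASK]`-replacement case is shown. The selection loop is similar in spirit to Google's BERT data-generation script: visit whole words in random order and mask them until the masking budget is used up, skipping any word that would overshoot it.

```python
import random
from typing import Callable, List, Tuple


def whole_word_mask(
    words: List[str],
    tokenize: Callable[[str], List[str]],
    mask_ratio: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (tokens, masked_tokens) with whole word masking applied.

    words:    segmented words (e.g. the output of a Chinese word segmenter)
    tokenize: maps one word to its sub-tokens (characters / WordPiece pieces)
    """
    rng = random.Random(seed)
    # Keep the sub-tokens of each word together as one group.
    groups = [tokenize(w) for w in words]
    total = sum(len(g) for g in groups)
    budget = max(1, round(total * mask_ratio))  # number of sub-tokens to mask

    order = list(range(len(groups)))
    rng.shuffle(order)  # visit words in random order
    chosen, used = set(), 0
    for i in order:
        if used >= budget:
            break
        if used + len(groups[i]) > budget:  # whole word would overshoot: skip
            continue
        chosen.add(i)
        used += len(groups[i])

    tokens, masked = [], []
    for i, g in enumerate(groups):
        tokens.extend(g)
        # Either every sub-token of a word is masked, or none of them is.
        masked.extend(["[MASK]"] * len(g) if i in chosen else g)
    return tokens, masked


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical tokenizer: Chinese words are split into characters,
    plus a made-up WordPiece split for one English word."""
    if word == "probability":
        return ["pro", "##bab", "##ility"]
    return list(word)


words = ["使用", "语言", "模型", "来", "预测", "下", "一个", "词", "的", "probability", "。"]
tokens, masked = whole_word_mask(words, toy_tokenize, mask_ratio=0.3, seed=1)
print(" ".join(masked))
```

Which words end up masked depends on the seed, but a word is never partially masked; contrast this with original masking, where `pro [MASK] ##lity` can occur.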
**Note that "mask" here means masking in the broad sense (replacing with [MASK], keeping the original token, or replacing with a random token), not only replacing a word with the `[MASK]` token. For more details and examples, see [#4](https://github.com/ymcui/Chinese-BERT-wwm/issues/4).**

Similarly, Google's official `BERT-base, Chinese` splits Chinese at the **character** level and ignores Chinese word segmentation (CWS) from traditional NLP. We applied whole word masking to Chinese: we trained on Chinese Wikipedia (both Simplified and Traditional) and used [HIT LTP](http://ltp.ai) as the word segmenter, so that all characters forming the same **word** are masked together. The table below shows how `whole word masking` inputs are generated (the sample sentence means "Use a language model to predict the probability of the next word.").

**Note: for ease of understanding, the examples below only consider replacement with the [MASK] token.**

| Description | Example |
| :------- | :--------- |
| Original text | 使用语言模型来预测下一个词的probability。 |
| Segmented text | 使用 语言 模型 来 预测 下 一个 词 的 probability 。 |
| Original masking input | 使用语言 [MASK] 型来 [MASK] 测下一个词的 pro [MASK] ##lity 。 |
| Whole word masking input | 使用语言 [MASK] [MASK] 来 [MASK] [MASK] 下一个词的 [MASK] [MASK] [MASK] 。 |

## Chinese Model Downloads

This repository mainly contains base models, so we do not add `base` to the model abbreviations. Models of other sizes are marked accordingly (e.g., large).

* **`BERT-large model`**: 24-layer, 1024-hidden, 16-heads, 330M parameters
* **`BERT-base model`**: 12-layer, 768-hidden, 12-heads, 110M parameters

**Note: the open-source versions do not include the weights for the MLM task; if you need to do MLM, please further pre-train on additional data (as with any other downstream task).**

| Model | Corpus | 🤗HF download | Baidu Netdisk download |
| :------- | :--------: | :---------: | :---------: |
| **`BERT-wwm, Chinese`** | Chinese Wikipedia | [HF Link](https://huggingface.co/hfl/chinese-bert-wwm) | [TensorFlow (password qfh8)](https://pan.baidu.com/s/1HDdDXiYxGT5ub5OeO7qdWw?pwd=qfh8) |
| **`BERT-wwm-ext, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/chinese-bert-wwm-ext) | [TensorFlow (password wgnt)](https://pan.baidu.com/s/1x-jIw1X2yNYHGak2yiq4RQ?pwd=wgnt) |
| **`RoBERTa-wwm-ext, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/chinese-roberta-wwm-ext) | [TensorFlow (password vybq)](https://pan.baidu.com/s/1oR0cgSXE3Nz6dESxr98qVA?pwd=vybq) |
| **`RoBERTa-wwm-ext-large, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) | [TensorFlow (password dqqe)](https://pan.baidu.com/s/1F68xzCLWEonTEVP7HQ0Ciw?pwd=dqqe) |
| **`RBT3, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/rbt3) | [TensorFlow (password 5a57)](https://pan.baidu.com/s/1AnapwWj1YBZ_4E6AAtj2lg?pwd=5a57) |
| **`RBT4, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/rbt4) | [TensorFlow (password sjpt)](https://pan.baidu.com/s/1MUrmuTULnMn3L1aw_dXxSA?pwd=sjpt) |
| **`RBT6, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/rbt6) | [TensorFlow (password hniy)](https://pan.baidu.com/s/1_MDAIYIGVgDovWkSs51NDA?pwd=hniy) |
| **`RBTL3, Chinese`** | EXT data[1] | [HF Link](https://huggingface.co/hfl/rbtl3) | [TensorFlow (password s6cu)](https://pan.baidu.com/s/1vV9ClBMbsSpt8wUpfQz62Q?pwd=s6cu) |

> [1] EXT data includes Chinese Wikipedia plus other encyclopedia, news, and question-answering data, totaling 5.4B words.

### PyTorch Version

If you need the PyTorch version, you can either:

1) convert it yourself with the conversion scripts provided by [🤗Transformers](https://github.com/huggingface/transformers), or
2) download the PyTorch weights directly from the Hugging Face website: https://huggingface.co/hfl

Download steps: click the model you need → select the "Files and versions" tab → download the corresponding model files.

### Instructions

Users in mainland China are advised to use the Baidu Netdisk links, and users elsewhere the Google links. Each base model is about **400M**. Taking the TensorFlow `BERT-wwm, Chinese` as an example, unzipping the downloaded file gives:

```
chinese_wwm_L-12_H-768_A-12.zip
    |- bert_model.ckpt      # model weights
    |- bert_model.meta      # model meta information
    |- bert_model.index     # model index information
    |- bert_config.json     # model configuration
    |- vocab.txt            # vocabulary
```

Here `bert_config.json` and `vocab.txt` are identical to those of Google's original `BERT-base, Chinese`. The PyTorch version contains `pytorch_model.bin`, `bert_config.json`, and `vocab.txt`.

## Quick Load

### Using Huggingface-Transformers

With the [🤗transformers library](https://github.com/huggingface/transformers), the models above can be loaded easily.

```
from transformers import BertModel, BertTokenizer

MODEL_NAME = "hfl/chinese-roberta-wwm-ext"  # any MODEL_NAME from the table below

# Use the BERT classes for every model here, including the RoBERTa-wwm ones
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)
```

**Note: all models in this repository must be loaded with BertTokenizer and BertModel; do not use RobertaTokenizer/RobertaModel!**

The corresponding `MODEL_NAME` values are:

| Model | MODEL_NAME |
| - | - |
| RoBERTa-wwm-ext-large | hfl/chinese-roberta-wwm-ext-large |
| RoBERTa-wwm-ext | hfl/chinese-roberta-wwm-ext |
| BERT-wwm-ext | hfl/chinese-bert-wwm-ext |
| BERT-wwm | hfl/chinese-bert-wwm |
| RBT3 | hfl/rbt3 |
| RBTL3 | hfl/rbtl3 |

### Using PaddleHub

With [PaddleHub](https://github.com/PaddlePaddle/PaddleHub), a single line of code downloads and installs a model, and about a dozen lines complete tasks such as text classification, sequence labeling, and reading comprehension.

```
import paddlehub as hub

MODULE_NAME = "chinese-roberta-wwm-ext"  # any MODULE_NAME from the table below
module = hub.Module(name=MODULE_NAME)
```

The corresponding `MODULE_NAME` values are:

| Model | MODULE_NAME |
| - | - |
| RoBERTa-wwm-ext-large | [chinese-roberta-wwm-ext-large](https://www.paddlepaddle.org.cn/hubdetail?name=chinese-roberta-wwm-ext-large&en_category=SemanticModel) |
| RoBERTa-wwm-ext | [chinese-roberta-wwm-ext](https://www.paddlepaddle.org.cn/hubdetail?name=chinese-roberta-wwm-ext&en_category=SemanticModel) |
| BERT-wwm-ext | [chinese-bert-wwm-ext](https://www.paddlepaddle.org.cn/hubdetail?name=chinese-bert-wwm-ext&en_category=SemanticModel) |
| BERT-wwm | [chinese-bert-wwm](https://www.paddlepaddle.org.cn/hubdetail?name=chinese-bert-wwm&en_category=SemanticModel) |
| RBT3 | [rbt3](https://www.paddlepaddle.org.cn/hubdetail?name=rbt3&en_category=SemanticModel) |
| RBTL3 | [rbtl3](https://www.paddlepaddle.org.cn/hubdetail?name=rbtl3&en_category=SemanticModel) |

## Model Comparison

A summary of the model details users most often ask about:

| - | BERT (Google) | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext | RoBERTa-wwm-ext-large |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: |
| Masking | WordPiece | WWM[1] | WWM | WWM | WWM |
| Type | base | base | base | base | **large** |
| Data Source | wiki | wiki | wiki+ext[2] | wiki+ext | wiki+ext |
| Training Tokens # | 0.4B | 0.4B | 5.4B | 5.4B | 5.4B |
| Device | TPU Pod v2 | TPU v3 | TPU v3 | TPU v3 | **TPU Pod v3-32[3]** |
| Training Steps | ? | 100K (max len 128) + 100K (max len 512) | 1M (max len 128) + 400K (max len 512) | 1M (max len 512) | 2M (max len 512) |
| Batch Size | ? | 2,560 / 384 | 2,560 / 384 | 384 | 512 |
| Optimizer | AdamW | LAMB | LAMB | AdamW | AdamW |
| Vocabulary | 21,128 | ~BERT[4] | ~BERT | ~BERT | ~BERT |
| Init Checkpoint | Random Init | ~BERT | ~BERT | ~BERT | Random Init |

> [1] WWM = Whole Word Masking
> [2] ext = extended data
> [3] TPU Pod v3-32 (512G HBM) is equivalent to 4 TPU v3 (128G HBM)
> [4] `~BERT` means the attribute is **inherited** from Google's original Chinese BERT

## Chinese Baseline Results

To compare baseline performance, we tested on the following Chinese datasets, covering both `sentence-level` and `document-level` tasks. For `BERT-wwm-ext`, `RoBERTa-wwm-ext`, and `RoBERTa-wwm-ext-large`, we **did not further tune the learning rate** and directly used the best learning rate of `BERT-wwm`.

Best learning rates:

| Dataset | BERT | ERNIE | BERT-wwm* |
| :------- | :---------: | :---------: | :---------: |
| CMRC 2018 | 3e-5 | 8e-5 | 3e-5 |
| DRCD | 3e-5 | 8e-5 | 3e-5 |
| CJRC | 4e-5 | 8e-5 | 4e-5 |
| XNLI | 3e-5 | 5e-5 | 3e-5 |
| ChnSentiCorp | 2e-5 | 5e-5 | 2e-5 |
| LCQMC | 2e-5 | 3e-5 | 2e-5 |
| BQ Corpus | 3e-5 | 5e-5 | 3e-5 |
| THUCNews | 2e-5 | 5e-5 | 2e-5 |

\* denotes all wwm-series models (BERT-wwm, BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large)

**Only some of the results are listed below; see our [technical report](https://arxiv.org/abs/1906.08101) for the full results.**

- [**CMRC 2018**: span-extraction reading comprehension (Simplified Chinese)](https://github.com/ymcui/cmrc2018)
- [**DRCD**: span-extraction reading comprehension (Traditional Chinese)](https://github.com/DRCSolutionService/DRCD)
- [**CJRC**: legal reading comprehension (Simplified Chinese)](http://cail.cipsc.org.cn)
- [**XNLI**: natural language inference](https://github.com/google-research/bert/blob/master/multilingual.md)
- [**ChnSentiCorp**: sentiment analysis](https://github.com/pengming617/bert_classification)
- [**LCQMC**: sentence pair matching](http://icrc.hitsz.edu.cn/info/1037/1146.htm)
- [**BQ Corpus**: sentence pair matching](http://icrc.hitsz.edu.cn/Article/show/175.html)
- [**THUCNews**: document-level text classification](http://thuctc.thunlp.org)

**Note: to ensure reliable results, we ran each model 10 times (with different random seeds) and report both the maximum and the average performance. Barring surprises, your results should very likely fall within this range.**

**In the metrics, the number in parentheses is the average and the number outside is the maximum.**

### Simplified Chinese Reading Comprehension: CMRC 2018

The [**CMRC 2018 dataset**](https://github.com/ymcui/cmrc2018) is a Chinese machine reading comprehension dataset released by HFL. Given a question, the system must extract a span from the passage as the answer, in the same format as SQuAD. Metrics: EM / F1

| Model | Dev | Test | Challenge |
| :------- | :---------: | :---------: | :---------: |
| BERT | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) |
| ERNIE | 65.4 (64.3) / 84.7 (84.2) | 69.4 (68.2) / 86.6 (86.1) | 19.6 (17.0) / 44.3 (42.8) |
| **BERT-wwm** | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) |
| **BERT-wwm-ext** | 67.1 (65.6) / 85.7 (85.0) | 71.4 (70.0) / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) |
| **RoBERTa-wwm-ext** | 67.4 (66.5) / 87.2 (86.5) | 72.6 (71.4) / 89.4 (88.8) | 26.2 (24.6) / 51.0 (49.1) |
| **RoBERTa-wwm-ext-large** | **68.5 (67.6) / 88.4 (87.9)** | **74.2 (72.4) / 90.6 (90.0)** | **31.5 (30.1) / 60.1 (57.5)** |

### Traditional Chinese Reading Comprehension: DRCD

The [**DRCD dataset**](https://github.com/DRCKnowledgeTeam/DRCD) was released by Delta Research Center in Taiwan, China. It has the same format as SQuAD and is an extractive reading comprehension dataset in Traditional Chinese. **Because ERNIE removes Traditional Chinese characters, we do not recommend using ERNIE on Traditional Chinese data (or convert the data to Simplified Chinese first).** Metrics: EM / F1

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) |
| ERNIE | 73.2 (73.0) / 83.9 (83.8) | 71.9 (71.4) / 82.5 (82.3) |
| **BERT-wwm** | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) |
| **BERT-wwm-ext** | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) |
| **RoBERTa-wwm-ext** | 86.6 (85.9) / 92.5 (92.2) | 85.6 (85.2) / 92.0 (91.7) |
| **RoBERTa-wwm-ext-large** | **89.6 (89.1) / 94.8 (94.4)** | **89.6 (88.9) / 94.5 (94.1)** |

### Legal Reading Comprehension: CJRC

The [**CJRC dataset**](http://cail.cipsc.org.cn) is a Chinese machine reading comprehension dataset for the **legal domain** released by HFL. Note that the data used in these experiments is not the final official release, so the results are for reference only. Metrics: EM / F1

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 54.6 (54.0) / 75.4 (74.5) | 55.1 (54.1) / 75.2 (74.3) |
| ERNIE | 54.3 (53.9) / 75.3 (74.6) | 55.0 (53.9) / 75.0 (73.9) |
| **BERT-wwm** | 54.7 (54.0) / 75.2 (74.8) | 55.1 (54.1) / 75.4 (74.4) |
| **BERT-wwm-ext** | 55.6 (54.8) / 76.0 (75.3) | 55.6 (54.9) / 75.8 (75.0) |
| **RoBERTa-wwm-ext** | 58.7 (57.6) / 79.1 (78.3) | 59.0 (57.8) / 79.0 (78.0) |
| **RoBERTa-wwm-ext-large** | **62.1 (61.1) / 82.4 (81.6)** | **62.4 (61.4) / 82.2 (81.0)** |
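To make the EM / F1 columns above more concrete, here is a minimal Python sketch modelled on the general approach of the CMRC 2018 evaluation: EM checks for an exact match after removing punctuation, and F1 is computed from the longest common substring of characters between the prediction and a reference answer, keeping the best score over all references. It is a simplification rather than the official script: the punctuation set `PUNCT` is an assumption, the helper names are hypothetical, and every character is treated as one unit (as we understand it, the official script keeps English words whole).

```python
import string
from typing import List

# Simplified, hypothetical punctuation set; the official script has its own list.
PUNCT = set(string.punctuation) | set("，。、；：？！“”‘’（）《》【】…·")


def segments(text: str) -> List[str]:
    """Split text into characters, dropping whitespace and punctuation."""
    return [ch for ch in text if not ch.isspace() and ch not in PUNCT]


def lcs_len(a: List[str], b: List[str]) -> int:
    """Length of the longest common *substring* of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def exact_match(answers: List[str], prediction: str) -> float:
    pred = "".join(segments(prediction))
    return float(any("".join(segments(a)) == pred for a in answers))


def f1(answers: List[str], prediction: str) -> float:
    pred = segments(prediction)
    scores = []
    for ans in answers:
        gold = segments(ans)
        common = lcs_len(gold, pred)
        if common == 0:
            scores.append(0.0)
            continue
        precision, recall = common / len(pred), common / len(gold)
        scores.append(2 * precision * recall / (precision + recall))
    return max(scores)  # best match over all reference answers


print(exact_match(["哈尔滨工业大学"], "哈尔滨工业大学。"), f1(["哈尔滨工业大学"], "工业大学"))
```

Here the partial prediction "工业大学" overlaps the 7-character reference in 4 characters (precision 1, recall 4/7), giving F1 = 8/11 ≈ 0.73 while EM is 0. Corpus-level scores are then averages over all questions.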
### Natural Language Inference: XNLI

For natural language inference we use the [**XNLI** data](https://github.com/google-research/bert/blob/master/multilingual.md), which requires classifying each text pair into one of three categories: `entailment`, `neutral`, and `contradictory`. Metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 77.8 (77.4) | 77.8 (77.5) |
| ERNIE | 79.7 (79.4) | 78.6 (78.2) |
| **BERT-wwm** | 79.0 (78.4) | 78.2 (78.0) |
| **BERT-wwm-ext** | 79.4 (78.6) | 78.7 (78.3) |
| **RoBERTa-wwm-ext** | 80.0 (79.2) | 78.8 (78.3) |
| **RoBERTa-wwm-ext-large** | **82.1 (81.3)** | **81.2 (80.6)** |

### Sentiment Analysis: ChnSentiCorp

For sentiment analysis we use ChnSentiCorp, a binary sentiment classification dataset. Metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 94.7 (94.3) | 95.0 (94.7) |
| ERNIE | 95.4 (94.8) | 95.4 **(95.3)** |
| **BERT-wwm** | 95.1 (94.5) | 95.4 (95.0) |
| **BERT-wwm-ext** | 95.4 (94.6) | 95.3 (94.7) |
| **RoBERTa-wwm-ext** | 95.0 (94.6) | 95.6 (94.8) |
| **RoBERTa-wwm-ext-large** | **95.8 (94.9)** | **95.8** (94.9) |

### Sentence Pair Classification: LCQMC, BQ Corpus

Both datasets require classifying a sentence pair to decide whether the two sentences have the same meaning (binary classification).

#### LCQMC

[LCQMC](http://icrc.hitsz.edu.cn/info/1037/1146.htm) was released by the Intelligent Computing Research Center of Harbin Institute of Technology Shenzhen Graduate School. Metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 89.4 (88.4) | 86.9 (86.4) |
| ERNIE | 89.8 (89.6) | **87.2 (87.0)** |
| **BERT-wwm** | 89.4 (89.2) | 87.0 (86.8) |
| **BERT-wwm-ext** | 89.6 (89.2) | 87.1 (86.6) |
| **RoBERTa-wwm-ext** | 89.0 (88.7) | 86.4 (86.1) |
| **RoBERTa-wwm-ext-large** | **90.4 (90.0)** | 87.0 (86.8) |

#### BQ Corpus

[BQ Corpus](http://icrc.hitsz.edu.cn/Article/show/175.html) was released by the Intelligent Computing Research Center of Harbin Institute of Technology Shenzhen Graduate School and is a dataset for the banking domain. Metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 86.0 (85.5) | 84.8 (84.6) |
| ERNIE | 86.3 (85.5) | 85.0 (84.6) |
| **BERT-wwm** | 86.1 (85.6) | 85.2 **(84.9)** |
| **BERT-wwm-ext** | **86.4** (85.5) | 85.3 (84.8) |
| **RoBERTa-wwm-ext** | 86.0 (85.4) | 85.0 (84.6) |
| **RoBERTa-wwm-ext-large** | 86.3 **(85.7)** | **85.8 (84.9)** |
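Every score in these tables is written as "max (mean)" over 10 fine-tuning runs with different random seeds. A minimal sketch of that bookkeeping follows; it assumes a user-supplied `train_and_eval(seed)` function returning a single dev or test score, which is hypothetical (the repository does not provide one), and the lambda below is only a stand-in for a real run.

```python
from statistics import mean
from typing import Callable, Iterable, List


def run_with_seeds(train_and_eval: Callable[[int], float],
                   seeds: Iterable[int] = range(10)) -> List[float]:
    """Run the (hypothetical) train_and_eval once per random seed."""
    return [train_and_eval(seed) for seed in seeds]


def max_mean(scores: List[float], ndigits: int = 1) -> str:
    """Format scores as 'max (mean)', the convention used in the tables."""
    if not scores:
        raise ValueError("need at least one run")
    return f"{max(scores):.{ndigits}f} ({mean(scores):.{ndigits}f})"


# Toy stand-in for real fine-tuning runs, for illustration only.
scores = run_with_seeds(lambda seed: 70.0 + seed, seeds=range(3))
print(max_mean(scores))  # 72.0 (71.0)
```

Reporting the mean alongside the maximum matters because, as the note above says, a single lucky seed can inflate the maximum; the mean is the number you should expect to reproduce.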
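For document-level text classification, we use **THUCNews**, a news dataset released by the Natural Language Processing Lab at Tsinghua University. We use a subset in which each news article must be classified into one of 10 categories. Evaluation metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 97.7 (97.4) | 97.8 (97.6) |
| ERNIE | 97.6 (97.3) | 97.5 (97.3) |
| **BERT-wwm** | 98.0 (97.6) | 97.8 (97.6) |
| **BERT-wwm-ext** | 97.7 (97.5) | 97.7 (97.5) |
| **RoBERTa-wwm-ext** | 98.3 (97.9) | 97.7 (97.5) |
| **RoBERTa-wwm-ext-large** | 98.3 (97.7) | 97.8 (97.6) |

### Small Models

Below are results on several NLP tasks; only test-set results are compared.

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | Params |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| RoBERTa-wwm-ext-large | 74.2 / 90.6 | 89.6 / 94.5 | 81.2 | 95.8 | 87.0 | 85.8 | 87.335 | 325M |
| RoBERTa-wwm-ext | 72.6 / 89.4 | 85.6 / 92.0 | 78.8 | 95.6 | 86.4 | 85.0 | 85.675 | 102M |
| RBTL3 | 63.3 / 83.4 | 77.2 / 85.6 | 74.0 | 94.2 | 85.1 | 83.6 | 80.800 | 61M (59.8%) |
| RBT3 | 62.2 / 81.8 | 75.0 / 83.9 | 72.3 | 92.8 | 85.1 | 83.3 | 79.550 | 38M (37.3%) |

Relative performance (RoBERTa-wwm-ext = 100%):

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | Classification Avg. |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| RoBERTa-wwm-ext-large | 102.2% / 101.3% | 104.7% / 102.7% | 103.0% | 100.2% | 100.7% | 100.9% | 101.9% | 101.2% |
| RoBERTa-wwm-ext | 100% / 100% | 100% / 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| RBTL3 | 87.2% / 93.3% | 90.2% / 93.0% | 93.9% | 98.5% | 98.5% | 98.4% | 94.3% | 97.35% |
| RBT3 | 85.7% / 91.5% | 87.6% / 91.2% | 91.8% | 97.1% | 98.5% | 98.0% | 92.9% | 96.35% |

- CSC refers to ChnSentiCorp
- Parameter counts are computed for the XNLI classification task
- Percentages in parentheses are relative to the original base model (i.e., RoBERTa-wwm-ext)
- RBT3: initialized from 3 layers of RoBERTa-wwm-ext and further trained for 1M steps
- RBTL3: initialized from 3 layers of RoBERTa-wwm-ext-large and further trained for 1M steps
- The name RBT is formed from the initials of the three syllables of RoBERTa; L stands for the large model
- Directly initializing from the first three layers of RoBERTa-wwm-ext-large and training on downstream tasks significantly hurts performance: on CMRC 2018, for example, the test set reaches only 42.9/65.3, whereas RBTL3 reaches 63.3/83.4

You are also welcome to try MiniRBT, a small Chinese pre-trained model with better performance: https://github.com/iflytek/MiniRBT

## Usage Tips

* The initial learning rate is a very important hyperparameter (for `BERT` and other models alike) and must be tuned for the target task.
* The optimal learning rate of `ERNIE` differs considerably from that of `BERT`/`BERT-wwm`, so be sure to adjust it when using `ERNIE` (based on the results above, `ERNIE` needs a higher initial learning rate).
* Since `BERT`/`BERT-wwm` were trained on Wikipedia data, they model formal text well; `ERNIE` additionally used web data such as Baidu Tieba and Baidu Zhidao, which gives it an advantage on informal text (e.g., Weibo posts).
* On long-text tasks such as reading comprehension and document classification, `BERT` and `BERT-wwm` perform better.
* If the data of your target task differs substantially from the domain of the pre-trained model, further pre-train on your own dataset.
* To process Traditional Chinese data, use `BERT` or `BERT-wwm`, because we found that `ERNIE`'s vocabulary contains almost no Traditional Chinese characters.

## English Model Downloads

For convenience, we also list the English `BERT-large (wwm)` models **officially released by Google**:

* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip)**: 24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/bert_models/2019_05_30/wwm_cased_L-24_H-1024_A-16.zip)**: 24-layer, 1024-hidden, 16-heads, 340M parameters

## FAQ

**Q: How do I use this model?**
A: Exactly the way you would use the Chinese BERT released by Google. **Text does not need to be word-segmented; wwm only affects pre-training, not the input of downstream tasks.**

**Q: Is the pre-training code available?**
A: Unfortunately, I cannot provide it. For implementation references, see [#10](https://github.com/ymcui/Chinese-BERT-wwm/issues/10) and [#13](https://github.com/ymcui/Chinese-BERT-wwm/issues/13).

**Q: Where can I download dataset X?**
A: See the `data` directory; the `README.md` in each task directory states the data source. For copyrighted content, please search for it yourself or contact the original authors.

**Q: Do you plan to release larger models, e.g., a BERT-large-wwm version?**
A: If our experiments yield better results, we will consider releasing a larger version.

**Q: You're lying! I can't reproduce the results 😂**
A: For downstream tasks we used the simplest models; for classification, for example, we directly used `run_classifier.py` (provided by Google). If you cannot reach the average score, there is a bug in your experiment; please check carefully. The maximum score depends on many random factors, so we cannot guarantee that you will reach it. Another widely recognized factor: reducing the batch size significantly lowers performance; see the related issues in the BERT and XLNet repositories.

**Q: I got better results than you!**
A: Congratulations.

**Q: How long did training take, and on what hardware?**
A: Training was done on Google TPU v3 (128G HBM). BERT-wwm took about 1.5 days, while BERT-wwm-ext took several weeks (more data required more iterations). Note that pre-training used the `LAMB Optimizer` ([TensorFlow implementation](https://github.com/ymcui/LAMB_Optimizer_TF)), which handles large batches well. For downstream fine-tuning we used BERT's default `AdamWeightDecayOptimizer`.

**Q: Who is ERNIE?**
A: In this project, ERNIE refers specifically to the [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) proposed by Baidu, not the [ERNIE](https://github.com/thunlp/ERNIE) published by Tsinghua University at ACL 2019.

**Q: BERT-wwm does not perform well on every task**
A: The goal of this project is to give researchers a diverse set of pre-trained models to choose from freely: BERT, ERNIE, or BERT-wwm. We only provide experimental data; how a model actually performs can only be determined by experimenting on your own task. One more model means one more option.

**Q: Why weren't some datasets tested?**
A: Frankly: 1) no energy to find more data; 2) no need; 3) no money.

**Q: Briefly evaluate these models**
A: Each has its own focus and strengths. Progress in Chinese NLP research requires joint effort from many parties.

**Q: What do you predict the next pre-trained model will be called?**
A: Maybe ZOE: Zero-shOt Embeddings from language model

**Q: More details about the `RoBERTa-wwm-ext` model?**
A: We combined the strengths of RoBERTa and BERT-wwm in a natural way. Compared with the previous models in this directory:

1) wwm masking is used during pre-training (but without dynamic masking)
2) the Next Sentence Prediction (NSP) loss is simply removed
3) instead of training with max_len=128 first and then max_len=512, we train directly with max_len=512
4) the number of training steps is extended appropriately

Note that this model is not the original RoBERTa model but a BERT model trained in a RoBERTa-like way, i.e., a RoBERTa-like BERT. Therefore, treat it as BERT, not RoBERTa, when using it for downstream tasks or converting it.

## Citation

If the resources or techniques in this project help your research, please cite the following papers.

- Preferred (extended journal version): https://ieeexplore.ieee.org/document/9599397

```
@journal{cui-etal-2021-pretrain,
  title={Pre-Training with Whole Word Masking for Chinese BERT},
  author={Cui, Yiming and Che, Wanxiang and Liu, Ting and Qin, Bing and Yang, Ziqing},
  journal={IEEE Transactions on Audio, Speech and Language Processing},
  year={2021},
  url={https://ieeexplore.ieee.org/document/9599397},
  doi={10.1109/TASLP.2021.3124365},
}
```

- Or (conference version): https://www.aclweb.org/anthology/2020.findings-emnlp.58

```
@inproceedings{cui-etal-2020-revisiting,
  title = "Revisiting Pre-Trained Models for {C}hinese Natural Language Processing",
  author = "Cui, Yiming and Che, Wanxiang and Liu, Ting and Qin, Bing and Wang, Shijin and Hu, Guoping",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.findings-emnlp.58",
  pages = "657--668",
}
```

## Acknowledgements

The first author was partially supported by the [**Google TPU Research Cloud**](https://www.tensorflow.org/tfrc) program.

## Disclaimer

**This project is not the Chinese BERT-wwm model officially released by Google, nor is it an official product of HIT or iFLYTEK.**

The experimental results in the technical report only reflect performance on specific datasets with specific hyperparameter combinations and do not represent the intrinsic nature of each model. Results may vary with random seeds and computing devices.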
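**The content of this project is for technical research reference only and must not be used as the basis for any conclusion. Users may use the model freely within the scope of the license, but we are not responsible for any direct or indirect loss caused by using the content of this project.**

## Follow Us

Follow our WeChat official account "**涌现志**" for the latest technical updates.

![qrcode.png](https://ymcui.com/images/qrcode.jpg)

## Feedback

If you have any questions, please submit them via GitHub Issues.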
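The averages and relative values in the small-model tables of this README can be reproduced with simple arithmetic. The sketch below rests on our reading of the tables, not on a statement by the authors: it assumes the "Average" column is the plain mean of the eight test-set numbers (EM and F1 counted separately for CMRC 2018 and DRCD) and that each relative value is the model's score divided by the RoBERTa-wwm-ext score. Under these assumptions it reproduces the RoBERTa-wwm-ext, RBTL3, and RBT3 averages and the RBTL3/RBT3 relative figures. `SCORES`, `average`, and `relative` are illustrative names.

```python
# Test-set scores from the small-model table, in column order:
# CMRC 2018 (EM, F1), DRCD (EM, F1), XNLI, CSC (ChnSentiCorp), LCQMC, BQ.
SCORES = {
    "RoBERTa-wwm-ext-large": [74.2, 90.6, 89.6, 94.5, 81.2, 95.8, 87.0, 85.8],
    "RoBERTa-wwm-ext": [72.6, 89.4, 85.6, 92.0, 78.8, 95.6, 86.4, 85.0],
    "RBTL3": [63.3, 83.4, 77.2, 85.6, 74.0, 94.2, 85.1, 83.6],
    "RBT3": [62.2, 81.8, 75.0, 83.9, 72.3, 92.8, 85.1, 83.3],
}
BASELINE = "RoBERTa-wwm-ext"  # the row shown as 100% in the relative table


def average(scores):
    """Plain mean over all eight numbers (assumed definition of "Average")."""
    return sum(scores) / len(scores)


def relative(model, baseline=BASELINE):
    """Return per-metric ratios and the ratio of averages, both in percent."""
    per_metric = [100 * m / b for m, b in zip(SCORES[model], SCORES[baseline])]
    overall = 100 * average(SCORES[model]) / average(SCORES[baseline])
    return per_metric, overall


if __name__ == "__main__":
    for name in SCORES:
        per_metric, overall = relative(name)
        cells = " / ".join(f"{x:.1f}%" for x in per_metric)
        print(f"{name}: average={average(SCORES[name]):.3f}, "
              f"relative={overall:.1f}% [{cells}]")
```

For RBTL3 this gives an average of 80.800 and 94.3% of the baseline, which matches the tables; the "Classification Avg." column is not modeled here because its exact definition is not stated.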


Chinese-LLaMA-Alpaca-3

[**🇨🇳中文**](./README.md) | [**🌐English**](./README_EN.md) | [**📖Docs**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki) | [**❓Issues**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/issues) | [**💬Discussions**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/discussions) | [**⚔️Arena**](http://llm-arena.ymcui.com/)

<img src="./pics/banner.png" width="800"/>

<img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-LLaMA-Alpaca-3.svg?color=blue&style=flat-square"> <img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-LLaMA-Alpaca-3"> <img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-LLaMA-Alpaca-3"> <a href="https://app.codacy.com/gh/ymcui/Chinese-LLaMA-Alpaca-3/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>

<a href="https://huggingface.co/hfl">🤗 Hugging Face</a> • <a href="https://modelscope.cn/profile/ChineseAlpacaGroup">🤖 ModelScope</a> • <a href="https://sota.jiqizhixin.com/project/chinese-llama-alpaca-3">🐿️ 机器之心SOTA!模型</a> • <a href="https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3">🟣 wisemodel</a> • <a href="https://huggingface.co/spaces/hfl-rc/llama-3-chinese-8b-instruct-demo">🤗 Online Demo</a>

This project is built on [Llama-3](https://github.com/facebookresearch/llama3), the new generation of open-source large models recently released by Meta, and is the third phase of the Chinese-LLaMA-Alpaca series of open-source LLM projects ([Phase 1](https://github.com/ymcui/Chinese-LLaMA-Alpaca), [Phase 2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)). It open-sources the **Chinese Llama-3 base model and the Chinese Llama-3-Instruct instruction-tuned model**. Starting from the original Llama-3, these models were further pre-trained on large-scale Chinese data and then fine-tuned on curated instruction data, which improves basic Chinese semantic understanding and instruction following and yields significant gains over the second-generation models.

#### Main Contents

- 🚀 Open-sourced the Llama-3-Chinese base model and the Llama-3-Chinese-Instruct instruction models (v1, v2, v3)
- 🚀 Open-sourced pre-training and instruction fine-tuning scripts, so users can further train or fine-tune the models as needed
- 🚀 Open-sourced the alpaca_zh_51k, stem_zh_instruction, and ruozhiba_gpt4 (4o/4T) instruction fine-tuning datasets
- 🚀 Provided tutorials for quickly quantizing and deploying the models locally on a personal computer's CPU/GPU
- 🚀 Supports the Llama-3 ecosystem, including [🤗transformers](https://github.com/huggingface/transformers), [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [vLLM](https://github.com/vllm-project/vllm), and [Ollama](https://ollama.com)

----

[Chinese Mixtral](https://github.com/ymcui/Chinese-Mixtral) | [Chinese LLaMA-2 & Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | [Chinese LLaMA & Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | [Multimodal Chinese LLaMA & Alpaca](https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca) | [Multimodal VLE](https://github.com/iflytek/VLE) | [Chinese MiniRBT](https://github.com/iflytek/MiniRBT) | [Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner) | [Integrated distillation and pruning GRAIN](https://github.com/airaria/GRAIN)

## News

**[2024/05/30] Released the Llama-3-Chinese-8B-Instruct-v3 instruction model, which improves significantly over v1/v2 on downstream tasks. Details: [📚v3.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/releases/tag/v3.0)**

[2024/05/08] Released the Llama-3-Chinese-8B-Instruct-v2 instruction model, fine-tuned directly on [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) with 5 million instruction samples. Details: [📚v2.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/releases/tag/v2.0)

[2024/05/07] Added pre-training and instruction fine-tuning scripts. Details: [📚v1.1 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/releases/tag/v1.1)

[2024/04/30] Released the Llama-3-Chinese-8B base model and the Llama-3-Chinese-8B-Instruct instruction model. Details: [📚v1.0 release notes](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/releases/tag/v1.0)

[2024/04/19] 🚀 Officially launched the Chinese-LLaMA-Alpaca-3 project

## Contents

| Section | Description |
| ------------------------------------- | ------------------------------------------------------------ |
| [💁🏻‍♂️Model Introduction](#model-introduction) | Brief introduction to the technical features of the models |
| [⏬Model Download](#model-download) | Download links for the Chinese Llama-3 models |
| [💻Inference and Deployment](#inference-and-deployment) | How to quantize the models and deploy them on a personal computer |
| [💯Model Performance](#model-performance) | Model performance on selected tasks |
| [📝Training and Fine-tuning](#training-and-fine-tuning) | How to train and fine-tune the Chinese Llama-3 models |
| [❓FAQ](#faq) | Answers to frequently asked questions |

## Model Introduction

This project introduces Llama-3-Chinese and Llama-3-Chinese-Instruct, open-source Chinese LLMs based on Meta Llama-3. Key features:

#### 📖 Uses the original Llama-3 vocabulary

- Compared with its two predecessors, Llama-3 enlarges the vocabulary substantially, from 32K to 128K, and switches from a SentencePiece tokenizer to a tiktoken-based BPE vocabulary
- Preliminary experiments show that the encoding efficiency of the Llama-3 vocabulary is comparable to that of our vocabulary-extended [Chinese LLaMA-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2), at about 95% of the Chinese LLaMA-2 vocabulary's efficiency (measured on Wikipedia data)
- Based on our experience and findings with [Chinese Mixtral](https://github.com/ymcui/Chinese-Mixtral)[^1], we **did not extend the vocabulary further**

[^1]: [Cui and Yao, 2024. Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral](https://arxiv.org/abs/2403.01851)

#### 🚄 Context length extended from 4K (second generation) to 8K

- Llama-3 raises the native context window from 4K to 8K, so it can handle longer contexts
- Users can extend the context further with methods such as PI, NTK, and YaRN to process even longer texts

#### ⚡ Uses grouped-query attention

- Llama-3 adopts grouped-query attention (GQA), which Llama-2 used only in its larger versions, further improving model efficiency

#### 🗒 New instruction template

- Llama-3-Instruct uses a new instruction template that is incompatible with Llama-2-chat; follow the official template when using it (see [Instruction Template](#instruction-template))

## Model Download

### Model Selection Guide

Below is a comparison of the models in this project and their recommended use cases. **For chat interaction, choose the Instruct version.**

| Item | Llama-3-Chinese-8B | Llama-3-Chinese-8B-Instruct |
| :-------------------- | :----------------------------------------------------: | :----------------------------------------------------------: |
| Model type | Base model | Instruction/Chat model (ChatGPT-like) |
| Model size | 8B | 8B |
| Training type | Causal-LM (CLM) | Instruction fine-tuning |
| Training method | LoRA + full-parameter emb/lm-head | LoRA + full-parameter emb/lm-head |
| Initialization | [Original Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | v1: Llama-3-Chinese-8B<br/>v2: [Original Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)<br/>v3: mix of inst/inst-v2/inst-meta |
| Training data | Unlabeled general corpus (~120GB) | Labeled instruction data (~5M samples) |
| Vocabulary size | Original vocabulary (128,256) | Original vocabulary (128,256) |
| Context length | 8K | 8K |
| Input template | Not required | Llama-3-Instruct template required |
| Use cases | Text continuation: given a context, the model generates what follows | Instruction following: QA, writing, chat, interaction, etc. |

Below is a comparison of the Instruct versions. **If you have no particular preference, use Instruct-v3.**

| Item | Instruct-v1 | Instruct-v2 | Instruct-v3 |
| :-------------------- | :----------------------------------------------------: | :----------------------------------------------------------: | :-------------------: |
| Release date | 2024/4/30 | 2024/5/8 | 2024/5/30 |
| Base model | [Original Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | [Original Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | (see Training method) |
| Training method | Stage 1: pre-training on 120G of Chinese text<br/>Stage 2: fine-tuning on 5M instruction samples | Fine-tuned directly on 5M instruction samples | Merge of inst-v1, inst-v2, and inst-meta, followed by fine-tuning on a small amount of instruction data (~5K samples) |
| Chinese ability[1] | 49.3 / 51.5 | 51.6 / 51.6 | **55.2 / 54.8** 👍🏻 |
| English ability[1] | 63.21 | 66.68 | **66.81** 👍🏻 |
| Long-context ability[1] | 29.6 | **46.4** 👍🏻 | 40.5 |
| LLM Arena win rate / Elo rating[2] | 49.4% / 1430 | 66.1% / 1559 | **83.6% / 1627** 👍🏻 |

> [!NOTE]
> [1] Chinese ability is from C-Eval (valid); English ability is from the Open LLM Leaderboard (avg); long-context ability is from LongBench (avg). For detailed results, see the [💯Model Performance](#model-performance) section.
>
> [2] LLM Arena results were retrieved on 2024/5/30 and are for reference only.

### Download Links

| Model | Full | LoRA | GGUF |
| :------------------------ | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| **Llama-3-Chinese-8B-Instruct-v3** (instruction model) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-v3) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3) | N/A | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-v3-gguf) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v3-gguf) |
| **Llama-3-Chinese-8B-Instruct-v2** (instruction model) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-v2) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v2) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v2) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-v2-lora) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v2-lora) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v2-lora) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-v2-gguf) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-v2-gguf) |
| **Llama-3-Chinese-8B-Instruct** (instruction model) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-lora) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-lora) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-lora) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-instruct-gguf) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct-gguf) |
| **Llama-3-Chinese-8B** (base model) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-lora) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-lora) [[🟣wisemodel]](https://wisemodel.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-lora) | [[🤗Hugging Face]](https://huggingface.co/hfl/llama-3-chinese-8b-gguf) [[🤖ModelScope]](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-gguf) |

Model types:

- **Full model**: can be used directly for training and inference; no merging step is needed
- **LoRA model**: must be merged with its base model to obtain a full model; see [**💻 Model merging steps**](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/model_conversion_zh)
  - v1 base model: original [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
  - v2 base model: original [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- **GGUF model**: the quantization format introduced by [llama.cpp](https://github.com/ggerganov/llama.cpp), compatible with common inference tools such as ollama; recommended for users who only need inference and deployment. Models with the `-im` suffix were quantized with an importance matrix and usually have lower PPL; they are recommended (usage is the same as the regular versions)

> [!NOTE]
> If you cannot access HF, consider a mirror site (e.g., hf-mirror.com); please look up the details yourself.

## Inference and Deployment

The models in this project mainly support the following quantization, inference, and deployment methods; see the corresponding tutorials for details.

| Tool | Features | CPU | GPU | Quantization | GUI | API | vLLM | Tutorial |
| :----------------------------------------------------------- | ---------------------------- | :--: | :--: | :--: | :--: | :--: | :--: |:--: |
| [llama.cpp](https://github.com/ggerganov/llama.cpp) | Rich GGUF quantization options and efficient local inference | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | [[link]](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/llamacpp_zh) |
| [🤗transformers](https://github.com/huggingface/transformers) | Native transformers inference interface | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | [[link]](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/inference_with_transformers_zh) |
| [OpenAI-style API](https://platform.openai.com/docs/api-reference) | Server demo mimicking the OpenAI API | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | [[link]](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/openai_api_zh) |
| [text-generation-webui](https://github.com/oobabooga/text-generation-webui) | Deployment with a front-end web UI | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | [[link]](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/text-generation-webui_zh) |
| [LM Studio](https://lmstudio.ai) | Cross-platform chat app (with GUI) | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | [[link]](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/lmstudio_zh) |
| [Ollama](https://github.com/ollama/ollama) | Local LLM inference | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | [[link]](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/ollama_zh) |

## Model Performance

To evaluate the models, we ran both a generation-quality evaluation and an objective (NLU-style) evaluation, assessing the LLMs from different angles. We recommend testing on the tasks you care about and choosing the model that fits them best.

### Generation Evaluation

- Following [Fastchat Chatbot Arena](https://chat.lmsys.org/?arena), this project hosts an online model battle platform for browsing and rating model responses. The platform reports win rates, Elo ratings, and other metrics, and shows pairwise win rates between models. **⚔️ Model Arena: [http://llm-arena.ymcui.com](http://llm-arena.ymcui.com/)**
- The examples directory contains outputs of Llama-3-Chinese-8B-Instruct and Chinese-Mixtral-Instruct scored by GPT-4-turbo: **Llama-3-Chinese-8B-Instruct averages 8.1 and Chinese-Mixtral-Instruct averages 7.8**. **📄 Output comparison: [examples](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/blob/main/examples)**
- This project is listed on the 机器之心 SOTA! model platform, where an online demo will be added later: https://sota.jiqizhixin.com/project/chinese-llama-alpaca-3

### Objective Evaluation

#### C-Eval

[C-Eval](https://cevalbenchmark.com) is a comprehensive evaluation suite for Chinese foundation models. Its validation and test sets contain 1.3K and 12.3K multiple-choice questions, respectively, covering 52 subjects. For the C-Eval inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/ceval_zh)

| Models | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
| ------------------------ | :-----------: | :-----------: | :-----------: | :-----------: |
| **Llama-3-Chinese-8B-Instruct-v3** | 55.2 | 54.8 | 52.1 | 52.4 |
| **Llama-3-Chinese-8B-Instruct-v2** | 51.6 | 51.6 | 49.7 | 49.8 |
| **Llama-3-Chinese-8B-Instruct** | 49.3 | 51.5 | 48.3 | 49.4 |
| **Llama-3-Chinese-8B** | 47.0 | 50.5 | 46.1 | 49.0 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 51.3 | 51.3 | 49.5 | 51.0 |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 49.3 | 51.2 | 46.1 | 49.4 |
| [Chinese-Mixtral-Instruct](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 51.7 | 55.0 | 50.0 | 51.5 |
| [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 45.8 | 54.2 | 43.1 | 49.1 |
| [Chinese-Alpaca-2-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 44.3 | 45.9 | 42.6 | 44.0 |
| [Chinese-LLaMA-2-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 40.6 | 42.7 | 38.0 | 41.6 |

#### CMMLU

[CMMLU](https://github.com/haonan-li/CMMLU) is another comprehensive Chinese benchmark for assessing the knowledge and reasoning abilities of language models in a Chinese context. It covers 67 topics ranging from basic subjects to advanced professional levels, with 11.5K multiple-choice questions in total. For the CMMLU inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/cmmlu_zh)

| Models | Test (0-shot) | Test (5-shot) |
| ------------------------ | :-----------: | :-----------: |
| **Llama-3-Chinese-8B-Instruct-v3** | 54.4 | 54.8 |
| **Llama-3-Chinese-8B-Instruct-v2** | 51.8 | 52.4 |
| **Llama-3-Chinese-8B-Instruct** | 49.7 | 51.5 |
| **Llama-3-Chinese-8B** | 48.0 | 50.9 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 53.0 | 53.5 |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 47.8 | 50.8 |
| [Chinese-Mixtral-Instruct](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 50.0 | 53.0 |
| [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 42.5 | 51.0 |
| [Chinese-Alpaca-2-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 43.2 | 45.5 |
| [Chinese-LLaMA-2-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 38.9 | 42.5 |

#### MMLU

[MMLU](https://github.com/hendrycks/test) is an English benchmark for natural language understanding and one of the main datasets used to evaluate LLMs today. Its validation and test sets contain 1.5K and 14.1K multiple-choice questions, respectively, covering 57 subjects. For the MMLU inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/mmlu_zh)

| Models | Valid (0-shot) | Valid (5-shot) | Test (0-shot) | Test (5-shot) |
| ------------------------ | :-----------: | :-----------: | :-----------: | :-----------: |
| **Llama-3-Chinese-8B-Instruct-v3** | 64.7 | 65.0 | 64.8 | 65.9 |
| **Llama-3-Chinese-8B-Instruct-v2** | 62.1 | 63.9 | 62.6 | 63.7 |
| **Llama-3-Chinese-8B-Instruct** | 60.1 | 61.3 | 59.8 | 61.8 |
| **Llama-3-Chinese-8B** | 55.5 | 58.5 | 57.3 | 61.1 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 63.4 | 64.8 | 65.1 | 66.4 |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 58.6 | 62.5 | 60.5 | 65.0 |
| [Chinese-Mixtral-Instruct](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 65.1 | 69.6 | 67.5 | 69.8 |
| [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 63.2 | 67.1 | 65.5 | 68.3 |
| [Chinese-Alpaca-2-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 49.6 | 53.2 | 50.9 | 53.5 |
| [Chinese-LLaMA-2-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 46.8 | 50.0 | 46.6 | 51.8 |

#### LongBench

[LongBench](https://github.com/THUDM/LongBench) is a benchmark for the long-text understanding of LLMs. It consists of 20 tasks in 6 categories; most tasks have an average length of 5K-15K, and there are about 4.75K test samples in total. Below are the results of this project's models on its Chinese tasks (including code tasks). For the LongBench inference code, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/longbench_zh)

| Models | Single-doc QA | Multi-doc QA | Summarization | Few-shot learning | Code | Synthetic | Average |
| ------------------------------------------------------------ | :------: | :------: | :--: | :----: | :--: | :--: | :--: |
| **Llama-3-Chinese-8B-Instruct-v3** | 20.3 | 28.8 | 24.5 | 28.1 | 59.4 | 91.9 | 40.5 |
| **Llama-3-Chinese-8B-Instruct-v2** | 57.3 | 27.1 | 13.9 | 30.3 | 60.6 | 89.5 | 46.4 |
| **Llama-3-Chinese-8B-Instruct** | 44.1 | 24.0 | 12.4 | 33.5 | 51.8 | 11.5 | 29.6 |
| **Llama-3-Chinese-8B** | 16.4 | 19.3 | 4.3 | 28.7 | 14.3 | 4.6 | 14.6 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 55.1 | 15.1 | 0.1 | 24.0 | 51.3 | 94.5 | 40.0 |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 21.2 | 22.9 | 2.7 | 35.8 | 65.9 | 40.8 | 31.6 |
| [Chinese-Mixtral-Instruct](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 50.3 | 34.2 | 16.4 | 42.0 | 56.1 | 89.5 | 48.1 |
| [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 32.0 | 23.7 | 0.4 | 42.5 | 27.4 | 14.0 | 23.3 |
| [Chinese-Alpaca-2-13B-16K](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 47.9 | 26.7 | 13.0 | 22.3 | 46.6 | 21.5 | 29.7 |
| [Chinese-LLaMA-2-13B-16K](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 36.7 | 17.7 | 3.1 | 29.8 | 13.8 | 3.0 | 17.3 |
| [Chinese-Alpaca-2-7B-64K](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 44.7 | 28.1 | 14.4 | 39.0 | 44.6 | 5.0 | 29.3 |
| [Chinese-LLaMA-2-7B-64K](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | 27.2 | 16.4 | 6.5 | 33.0 | 7.8 | 5.0 | 16.0 |

### Open LLM Leaderboard

The [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is a comprehensive (English) LLM benchmark launched by the HuggingFaceH4 team. It comprises six tests: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K. Below are the results of this project's models on the leaderboard.

| Models | ARC | HellaS | MMLU | TQA | WinoG | GSM8K | Average |
| ------------------------------------------------------------ | :---: | :----: | :---: | :---: | :---: | :---: | :---: |
| **Llama-3-Chinese-8B-Instruct-v3** | 63.40 | 80.51 | 67.90 | 53.57 | 76.24 | 59.21 | 66.81 |
| **Llama-3-Chinese-8B-Instruct-v2** | 62.63 | 79.72 | 66.48 | 53.93 | 76.72 | 60.58 | 66.68 |
| **Llama-3-Chinese-8B-Instruct** | 61.26 | 80.24 | 63.10 | 55.15 | 75.06 | 44.43 | 63.21 |
| **Llama-3-Chinese-8B** | 55.88 | 79.53 | 63.70 | 41.14 | 77.03 | 37.98 | 59.21 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 60.75 | 78.55 | 67.07 | 51.65 | 74.51 | 68.69 | 66.87 |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 59.47 | 82.09 | 66.69 | 43.90 | 77.35 | 45.79 | 62.55 |
| [Chinese-Mixtral-Instruct](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 67.75 | 85.67 | 71.53 | 57.46 | 83.11 | 55.65 | 70.19 |
| [Chinese-Mixtral](https://github.com/ymcui/Chinese-Mixtral) (8x7B) | 67.58 | 85.34 | 70.38 | 46.86 | 82.00 | 0.00 | 58.69 |

*Note: The MMLU results here differ from those in the MMLU table above mainly because different evaluation scripts were used.*

### Quantization Evaluation

We tested the quantization performance of Llama-3-Chinese-8B (base model) under llama.cpp, as shown below. The measured speed is slightly slower than that of the second-generation Llama-2-7B.

| | F16 | Q8_0 | Q6_K | Q5_K | Q5_0 | Q4_K | Q4_0 | Q3_K | Q2_K |
| ------------- | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | -----: |
| **Size (GB)** | 14.97 | 7.95 | 6.14 | 5.34 | 5.21 | 4.58 | 4.34 | 3.74 | 2.96 |
| **BPW** | 16.00 | 8.50 | 6.56 | 5.70 | 5.57 | 4.89 | 4.64 | 4.00 | 3.16 |
| **PPL** | 5.130 | 5.135 | 5.148 | 5.181 | 5.222 | 5.312 | 5.549 | 5.755 | 11.859 |
| **PP Speed** | 5.99 | 6.10 | 7.17 | 7.34 | 6.65 | 6.38 | 6.00 | 6.85 | 6.43 |
| **TG Speed** | 44.03 | 26.08 | 21.61 | 22.33 | 20.93 | 18.93 | 17.09 | 22.50 | 19.21 |

> [!NOTE]
>
> - Model size: in GB
> - BPW (Bits-Per-Weight): bits per parameter; e.g., Q8_0 has an actual average precision of 8.50
> - PPL (perplexity): measured with an 8K context (the native length); lower is better
> - PP/TG speed: prompt processing (PP) and text generation (TG) speed on Apple M3 Max (Metal), in ms/token; lower is faster

## Training and Fine-tuning

### Manual Training and Fine-tuning

- Pre-training with unlabeled data: [📖Pre-training script Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/pt_scripts_zh)
- Instruction fine-tuning with labeled data: [📖Instruction fine-tuning script Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/sft_scripts_zh)

### Instruction Template

Llama-3-Chinese-Instruct uses the same instruction template as the original Llama-3-Instruct. Below is an example conversation:

> <|begin_of_text|><|start_header_id|>system<|end_header_id|>
>
> You are a helpful assistant. 你是一个乐于助人的助手。<|eot_id|><|start_header_id|>user<|end_header_id|>
>
> 你好<|eot_id|><|start_header_id|>assistant<|end_header_id|>
>
> 你好！有什么可以帮助你的吗？<|eot_id|>

### Instruction Data

Below are some of the instruction datasets open-sourced by this project. Details: [📚 Instruction data](./data)

| Dataset | Description | Size |
| ------------------------------------------------------------ | :----------------------------------------------------------- | :--: |
| [alpaca_zh_51k](https://huggingface.co/datasets/hfl/alpaca_zh_51k) | Alpaca data translated with gpt-3.5 | 51K |
| [stem_zh_instruction](https://huggingface.co/datasets/hfl/stem_zh_instruction) | STEM data collected with gpt-3.5, covering physics, chemistry, medicine, biology, and earth science | 256K |
| [ruozhiba_gpt4](https://huggingface.co/datasets/hfl/ruozhiba_gpt4) | ruozhiba QA data obtained with GPT-4o and GPT-4T | 2449 |

## FAQ

Before submitting an issue, please check whether the FAQ already contains a solution. For the full questions and answers, see this project's [📖GitHub Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/faq_zh)

```
Question 1: Why wasn't the vocabulary extended as in Phase 1 and Phase 2?
Question 2: Will a 70B version be released?
Question 3: Why are the instruction models no longer called Alpaca?
Question 4: Can the models in this repository be used commercially?
Question 5: Why use LoRA instead of full-parameter pre-training?
Question 6: Why does Llama-3-Chinese perform poorly in conversation?
Question 7: Why does the instruction model reply that it is ChatGPT?
Question 8: What is the difference between v1 (original) and v2 of the Instruct model?
```

## Citation

If you use resources from this project, please cite its technical report: https://arxiv.org/abs/2304.08177

```
@article{chinese-llama-alpaca,
  title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca},
  author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
  journal={arXiv preprint arXiv:2304.08177},
  url={https://arxiv.org/abs/2304.08177},
  year={2023}
}
```

For the analysis of whether to extend the vocabulary, you may cite: https://arxiv.org/abs/2403.01851

```
@article{chinese-mixtral,
  title={Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral},
  author={Cui, Yiming and Yao, Xin},
  journal={arXiv preprint arXiv:2403.01851},
  url={https://arxiv.org/abs/2403.01851},
  year={2024}
}
```

## Disclaimer

This project is developed on the Llama-3 model released by Meta; please strictly comply with the Llama-3 [open-source license](https://github.com/meta-llama/llama3/blob/main/LICENSE) when using it. If third-party code is involved, be sure to comply with the relevant open-source licenses. The accuracy of model-generated content may be affected by the computation method, random factors, and quantization precision loss; therefore, this project gives no guarantee of the accuracy of model outputs and accepts no liability for any loss arising from the use of the related resources and outputs. If the models are used for commercial purposes, developers must comply with local laws and regulations and ensure that the model outputs are compliant; this project accepts no responsibility for any derived products or services.

## Feedback

If you have questions, please submit them via GitHub Issues. Please ask politely and help build a harmonious discussion community.

- Before submitting, check whether the FAQ solves your problem; we also recommend searching previous issues.
- Please use the issue templates provided by this project to help pinpoint the problem quickly.
- Duplicate issues and issues unrelated to this project will be handled by [stale-bot](https://github.com/marketplace/stale); thank you for your understanding.

## Follow Us

Follow our WeChat official account "**涌现志**" for the latest technical updates.

![qrcode.png](https://ymcui.com/images/qrcode.jpg)
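The Llama-3-Instruct template shown in the instruction template section of this README can also be assembled in code. The following is a minimal sketch, assuming each message is a dict with `role` and `content` keys; `build_llama3_prompt` is a hypothetical helper, not part of this project or of 🤗transformers. It emits `<|begin_of_text|>` once, wraps every turn as `<|start_header_id|>ROLE<|end_header_id|>`, a blank line, the content and `<|eot_id|>`, and optionally appends an open assistant header so the model continues as the assistant.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama-3-Instruct prompt format.

    messages: list of {"role": "system" | "user" | "assistant", "content": str}
    add_generation_prompt: append an empty assistant header for the model to fill.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so generation starts here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_llama3_prompt([
        {"role": "system", "content": "You are a helpful assistant. 你是一个乐于助人的助手。"},
        {"role": "user", "content": "你好"},
    ])
    print(prompt)
```

If the tokenizer you load ships a chat template, `tokenizer.apply_chat_template` in 🤗transformers is usually the more robust choice; a hand-written version like this is mainly useful for inspecting exactly what the model receives.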


Chinese-XLNet

[**Chinese**](./README.md) | [**English**](./README_EN.md)

<img src="./pics/banner.png" width="500"/>

<a href="https://github.com/ymcui/Chinese-PreTrained-XLNet/blob/master/LICENSE"> <img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-PreTrained-XLNet.svg?color=blue&style=flat-square"> </a>

This project provides XLNet pre-trained models for Chinese, aiming to enrich Chinese NLP resources and offer a diverse choice of Chinese pre-trained models. Experts and scholars are welcome to download and use them and to help advance the development of Chinese language resources. This project is based on the official CMU/Google XLNet: https://github.com/zihangdai/xlnet

----

[Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner)

More resources released by the Joint Laboratory of HIT and iFLYTEK Research (HFL): https://github.com/ymcui/HFL-Anthology

## News

**2023/3/28 Open-sourced the Chinese LLaMA & Alpaca LLMs, which can be deployed and tried quickly on a PC. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca**

2022/10/29 We proposed LERT, a pre-trained model that incorporates linguistic information. See: https://github.com/ymcui/LERT

2022/3/30 We open-sourced PERT, a new pre-trained model. See: https://github.com/ymcui/PERT

2021/12/17 HFL released TextPruner, a model pruning toolkit. See: https://github.com/airaria/TextPruner

2021/10/24 HFL released CINO, a pre-trained model for minority languages. See: https://github.com/ymcui/Chinese-Minority-PLM

2021/7/21 The book [《自然语言处理：基于预训练模型的方法》 (Natural Language Processing: A Pre-trained Model Approach)](https://item.jd.com/13344628.html), written by several scholars from HIT SCIR, has been published.

2021/1/27 All models now support TensorFlow 2; please load or download them via the transformers library: https://huggingface.co/hfl

<details>
<summary>Past news</summary>

2020/9/15 Our paper ["Revisiting Pre-Trained Models for Chinese Natural Language Processing"](https://arxiv.org/abs/2004.13922) was accepted as a long paper by [Findings of EMNLP](https://2020.emnlp.org).

2020/8/27 HFL topped the GLUE general natural language understanding benchmark; see the [GLUE leaderboard](https://gluebenchmark.com/leaderboard) and the [news](http://dwz.date/ckrD).

2020/3/11 To better understand your needs, please fill out our [questionnaire](https://wj.qq.com/s2/5637766/6281) so we can provide better resources.

2020/2/26 HFL released the [knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer)
2019/12/19 本目录发布的模型已接入[Huggingface-Transformers](https://github.com/huggingface/transformers)，查看[快速加载](#快速加载) 2019/9/5 `XLNet-base`已可下载，查看[模型下载](#模型下载) 2019/8/19 提供了在大规模通用语料（5.4B词数）上训练的中文`XLNet-mid`模型，查看[模型下载](#模型下载) </details> ## 内容导引 | 章节 | 描述 | |-|-| | [模型下载](#模型下载) | 提供了中文预训练XLNet下载地址 | | [基线系统效果](#基线系统效果) | 列举了部分基线系统效果 | | [预训练细节](#预训练细节) | 预训练细节的相关描述 | | [下游任务微调细节](#下游任务微调细节) | 下游任务微调细节的相关描述 | | [FAQ](#faq) | 常见问题答疑 | | [引用](#引用) | 本目录的技术报告 | ## 模型下载 * **`XLNet-mid`**：24-layer, 768-hidden, 12-heads, 209M parameters * **`XLNet-base`**：12-layer, 768-hidden, 12-heads, 117M parameters | 模型简称 | 语料 | 🤗HF | 百度网盘下载 | | :------- | :--------- | :---------: | :---------: | | **`XLNet-mid, Chinese`** | **中文维基+ 通用数据[1]** | **[PyTorch](https://huggingface.co/hfl/chinese-xlnet-mid)** | **[TensorFlow（密码2jv2）](https://pan.baidu.com/s/1bWEhc5gJ-ZMH6SO4m4GVyw?pwd=2jv2)** | | **`XLNet-base, Chinese`** | **中文维基+ 通用数据[1]** | **[PyTorch](https://huggingface.co/hfl/chinese-xlnet-base)** | **[TensorFlow（密码ge7w）](https://pan.baidu.com/s/14KNb5KMvixKACEzgdd4Ntg?pwd=ge7w)** | > [1] 通用数据包括：百科、新闻、问答等数据，总词数达5.4B，与我们发布的[BERT-wwm-ext](https://github.com/ymcui/Chinese-BERT-wwm)训练语料相同。 ### PyTorch版本如需PyTorch版本， 1）请自行通过[🤗Transformers](https://github.com/huggingface/transformers)提供的转换脚本进行转换。 2）或者通过huggingface官网直接下载PyTorch版权重：https://huggingface.co/hfl 方法：点击任意需要下载的model → 拉到最下方点击"List all files in model" → 在弹出的小框中下载bin和json文件。 ### 使用说明中国大陆境内建议使用百度网盘下载点，境外用户建议使用谷歌下载点，`XLNet-mid`模型文件大小约**800M**。以TensorFlow版`XLNet-mid, Chinese`为例，下载完毕后对zip文件进行解压得到： ``` chinese_xlnet_mid_L-24_H-768_A-12.zip |- xlnet_model.ckpt # 模型权重 |- xlnet_model.meta # 模型meta信息 |- xlnet_model.index # 模型index信息 |- xlnet_config.json # 模型参数 |- spiece.model # 词表 ``` ### 快速加载依托于[Huggingface-Transformers 2.2.2](https://github.com/huggingface/transformers)，可轻松调用以上模型。 ``` tokenizer = AutoTokenizer.from_pretrained("MODEL_NAME") model = AutoModel.from_pretrained("MODEL_NAME") ``` 其中`MODEL_NAME`对应列表如下： | 模型名 | MODEL_NAME | | - | - | | 
XLNet-mid | hfl/chinese-xlnet-mid | | XLNet-base | hfl/chinese-xlnet-base | ## 基线系统效果为了对比基线效果，我们在以下几个中文数据集上进行了测试。对比了中文BERT、BERT-wwm、BERT-wwm-ext以及XLNet-base、XLNet-mid。其中中文BERT、BERT-wwm、BERT-wwm-ext结果取自[中文BERT-wwm项目](https://github.com/ymcui/Chinese-BERT-wwm)。时间及精力有限，并未能覆盖更多类别的任务，请大家自行尝试。 **注意：为了保证结果的可靠性，对于同一模型，我们运行10遍（不同随机种子），汇报模型性能的最大值和平均值。不出意外，你运行的结果应该很大概率落在这个区间内。** **评测指标中，括号内表示平均值，括号外表示最大值。** ### 简体中文阅读理解：CMRC 2018 **[CMRC 2018数据集](https://github.com/ymcui/cmrc2018)**是哈工大讯飞联合实验室发布的中文机器阅读理解数据。根据给定问题，系统需要从篇章中抽取出片段作为答案，形式与SQuAD相同。评测指标为：EM / F1 | 模型 | 开发集 | 测试集 | 挑战集 | | :------- | :---------: | :---------: | :---------: | | BERT | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) | | BERT-wwm | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) | | BERT-wwm-ext | **67.1** (65.6) / 85.7 (85.0) | **71.4 (70.0)** / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) | | **XLNet-base** | 65.2 (63.0) / 86.9 (85.9) | 67.0 (65.8) / 87.2 (86.8) | 25.0 (22.7) / 51.3 (49.5) | | **XLNet-mid** | 66.8 **(66.3) / 88.4 (88.1)** | 69.3 (68.5) / **89.2 (88.8)** | **29.1 (27.1) / 55.8 (54.9)** | ### 繁体中文阅读理解：DRCD **[DRCD数据集](https://github.com/DRCKnowledgeTeam/DRCD)**由中国台湾台达研究院发布，其形式与SQuAD相同，是基于繁体中文的抽取式阅读理解数据集。评测指标为：EM / F1 | 模型 | 开发集 | 测试集 | | :------- | :---------: | :---------: | | BERT | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) | | BERT-wwm | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) | | BERT-wwm-ext | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) | | **XLNet-base** | 83.8 (83.2) / 92.3 (92.0) | 83.5 (82.8) / 92.2 (91.8) | | **XLNet-mid** | **85.3 (84.9) / 93.5 (93.3)** | **85.5 (84.8) / 93.6 (93.2)** | ### 情感分类：ChnSentiCorp 在情感分类任务中，我们使用的是ChnSentiCorp数据集。模型需要将文本分成`积极`, `消极`两个类别。评测指标为：Accuracy | 模型 | 开发集 | 测试集 | | :------- | :---------: | :---------: | | BERT | 94.7 (94.3) | 95.0 (94.7) | | BERT-wwm | 95.1 (94.5) | **95.4 (95.0)** | | **XLNet-base** | | | | **XLNet-mid** | **95.8 (95.2)** | 
**95.4** (94.9) | ## 预训练细节以下以`XLNet-mid`模型为例，对预训练细节进行说明。 ### 生成词表按照XLNet官方教程步骤，首先需要使用[Sentence Piece](https://github.com/google/sentencepiece)生成词表。在本项目中，我们使用的词表大小为32000，其余参数采用官方示例中的默认配置。 ``` spm_train \ --input=wiki.zh.txt \ --model_prefix=sp10m.cased.v3 \ --vocab_size=32000 \ --character_coverage=0.99995 \ --model_type=unigram \ --control_symbols=\<cls\>,\<sep\>,\<pad\>,\<mask\>,\<eod\> \ --user_defined_symbols=\<eop\>,.,$,$,\",-,–,£,€ \ --shuffle_input_sentence \ --input_sentence_size=10000000 ``` ### 生成tf_records 生成词表后，开始利用原始文本语料生成训练用的tf_records文件。原始文本的构造方式与原教程相同： - 每行都是一个句子 - 空行代表文档末尾以下是生成数据时的命令（`num_task`与`task`请根据实际切片数量进行设置）： ``` SAVE_DIR=./output_b32 INPUT=./data/*.proc.txt python data_utils.py \ --bsz_per_host=32 \ --num_core_per_host=8 \ --seq_len=512 \ --reuse_len=256 \ --input_glob=${INPUT} \ --save_dir=${SAVE_DIR} \ --num_passes=20 \ --bi_data=True \ --sp_path=spiece.model \ --mask_alpha=6 \ --mask_beta=1 \ --num_predict=85 \ --uncased=False \ --num_task=10 \ --task=1 ``` ### 预训练获得以上数据后，正式开始预训练XLNet。之所以叫`XLNet-mid`是因为仅相比`XLNet-base`增加了层数（12层增加到24层），其余参数没有变动，主要因为计算设备受限。使用的命令如下： ``` DATA=YOUR_GS_BUCKET_PATH_TO_TFRECORDS MODEL_DIR=YOUR_OUTPUT_MODEL_PATH TPU_NAME=v3-xlnet TPU_ZONE=us-central1-b python train.py \ --record_info_dir=$DATA \ --model_dir=$MODEL_DIR \ --train_batch_size=32 \ --seq_len=512 \ --reuse_len=256 \ --mem_len=384 \ --perm_size=256 \ --n_layer=24 \ --d_model=768 \ --d_embed=768 \ --n_head=12 \ --d_head=64 \ --d_inner=3072 \ --untie_r=True \ --mask_alpha=6 \ --mask_beta=1 \ --num_predict=85 \ --uncased=False \ --train_steps=2000000 \ --save_steps=20000 \ --warmup_steps=20000 \ --max_save=20 \ --weight_decay=0.01 \ --adam_epsilon=1e-6 \ --learning_rate=1e-4 \ --dropout=0.1 \ --dropatt=0.1 \ --tpu=$TPU_NAME \ --tpu_zone=$TPU_ZONE \ --use_tpu=True ``` ## 下游任务微调细节下游任务微调使用的设备是谷歌Cloud TPU v2（64G HBM），以下简要说明各任务精调时的配置。如果你使用GPU进行精调，请更改相应参数以适配，尤其是`batch_size`, `learning_rate`等参数。 **相关代码请查看`src`目录。** ### CMRC 2018 
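CMRC 2018 uses the same JSON layout as SQuAD. SQuAD-style preprocessing usually locates each answer by its `answer_start` character offset, so a wrong offset can silently drop or corrupt an example. The sketch below checks those offsets before you generate tf_records. It assumes the standard SQuAD v1.1 field names (`data` → `paragraphs` → `context` / `qas` → `answers`). The helpers `iter_answer_spans` and `count_bad_offsets` are hypothetical and are not part of this repository's `src` code.

```python
import json
from typing import Iterator, Tuple


def iter_answer_spans(squad: dict) -> Iterator[Tuple[str, str, int, bool]]:
    """Yield (question_id, answer_text, answer_start, offset_ok) for every answer.

    Assumes the SQuAD v1.1 layout: data -> paragraphs -> {context, qas} -> answers.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    text = answer["text"]
                    start = answer["answer_start"]
                    # answer_start is a character offset. Python str indexing counts
                    # code points, so each (BMP) Chinese character counts as one.
                    offset_ok = context[start:start + len(text)] == text
                    yield qa["id"], text, start, offset_ok


def count_bad_offsets(path: str) -> int:
    """Count answers in a SQuAD-style JSON file whose answer_start does not match."""
    with open(path, encoding="utf-8") as f:
        squad = json.load(f)
    return sum(1 for _, _, _, ok in iter_answer_spans(squad) if not ok)


if __name__ == "__main__":
    # Tiny in-memory example in the SQuAD layout (not real CMRC 2018 data).
    sample = {
        "data": [{
            "paragraphs": [{
                "context": "哈尔滨工业大学位于哈尔滨。",
                "qas": [{
                    "id": "demo-1",
                    "question": "哈尔滨工业大学位于哪里？",
                    "answers": [{"text": "哈尔滨", "answer_start": 9}],
                }],
            }],
        }],
    }
    for qid, text, start, ok in iter_answer_spans(sample):
        print(qid, text, start, ok)  # demo-1 哈尔滨 9 True
```

Calling `count_bad_offsets` on a downloaded training file such as `cmrc2018_train.json` is meant as a quick consistency check. A non-zero count points to offsets that need fixing before preprocessing.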
For reading comprehension tasks, you first need to generate the tf_records data. Please follow the [SQuAD 2.0 processing method](https://github.com/zihangdai/xlnet#squad20) in the official XLNet tutorial; it is not repeated here. The script parameters used for the CMRC 2018 Chinese machine reading comprehension task are:

```shell
XLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET
MODEL_DIR=YOUR_OUTPUT_MODEL_PATH
DATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS
RAW_DIR=YOUR_RAW_DATA_DIR
TPU_NAME=v2-xlnet
TPU_ZONE=us-central1-b

python -u run_cmrc_drcd.py \
    --spiece_model_file=./spiece.model \
    --model_config_path=${XLNET_DIR}/xlnet_config.json \
    --init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \
    --tpu_zone=${TPU_ZONE} \
    --use_tpu=True \
    --tpu=${TPU_NAME} \
    --num_hosts=1 \
    --num_core_per_host=8 \
    --output_dir=${DATA_DIR} \
    --model_dir=${MODEL_DIR} \
    --predict_dir=${MODEL_DIR}/eval \
    --train_file=${DATA_DIR}/cmrc2018_train.json \
    --predict_file=${DATA_DIR}/cmrc2018_dev.json \
    --uncased=False \
    --max_answer_length=40 \
    --max_seq_length=512 \
    --do_train=True \
    --train_batch_size=16 \
    --do_predict=True \
    --predict_batch_size=16 \
    --learning_rate=3e-5 \
    --adam_epsilon=1e-6 \
    --iterations=1000 \
    --save_steps=2000 \
    --train_steps=2400 \
    --warmup_steps=240
```

### DRCD

The script parameters used for the DRCD Traditional Chinese machine reading comprehension task are:

```shell
XLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET
MODEL_DIR=YOUR_OUTPUT_MODEL_PATH
DATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS
RAW_DIR=YOUR_RAW_DATA_DIR
TPU_NAME=v2-xlnet
TPU_ZONE=us-central1-b

python -u run_cmrc_drcd.py \
    --spiece_model_file=./spiece.model \
    --model_config_path=${XLNET_DIR}/xlnet_config.json \
    --init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \
    --tpu_zone=${TPU_ZONE} \
    --use_tpu=True \
    --tpu=${TPU_NAME} \
    --num_hosts=1 \
    --num_core_per_host=8 \
    --output_dir=${DATA_DIR} \
    --model_dir=${MODEL_DIR} \
    --predict_dir=${MODEL_DIR}/eval \
    --train_file=${DATA_DIR}/DRCD_training.json \
    --predict_file=${DATA_DIR}/DRCD_dev.json \
    --uncased=False \
    --max_answer_length=30 \
    --max_seq_length=512 \
    --do_train=True \
    --train_batch_size=16 \
    --do_predict=True \
    --predict_batch_size=16 \
    --learning_rate=3e-5 \
    --adam_epsilon=1e-6 \
    --iterations=1000 \
    --save_steps=2000 \
    --train_steps=3600 \
    --warmup_steps=360
```

### ChnSentiCorp

Unlike reading comprehension, classification tasks do not require generating tf_records in advance. The script parameters used for the ChnSentiCorp sentiment classification task are:

```shell
XLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET
MODEL_DIR=YOUR_OUTPUT_MODEL_PATH
DATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS
RAW_DIR=YOUR_RAW_DATA_DIR
TPU_NAME=v2-xlnet
TPU_ZONE=us-central1-b

python -u run_classifier.py \
    --spiece_model_file=./spiece.model \
    --model_config_path=${XLNET_DIR}/xlnet_config.json \
    --init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \
    --task_name=csc \
    --do_train=True \
    --do_eval=True \
    --eval_all_ckpt=False \
    --uncased=False \
    --data_dir=${RAW_DIR} \
    --output_dir=${DATA_DIR} \
    --model_dir=${MODEL_DIR} \
    --train_batch_size=48 \
    --eval_batch_size=48 \
    --num_hosts=1 \
    --num_core_per_host=8 \
    --num_train_epochs=3 \
    --max_seq_length=256 \
    --learning_rate=2e-5 \
    --save_steps=5000 \
    --use_tpu=True \
    --tpu=${TPU_NAME} \
    --tpu_zone=${TPU_ZONE}
```

## FAQ

**Q: Will you release larger models?**
A: Not necessarily, and there is no guarantee. If we obtain a significant performance improvement, we will consider releasing it.

**Q: The results are poor on some datasets. What can I do?**
A: Try another model, or continue pre-training from this checkpoint on your own data.

**Q: Will the pre-training data be released?**
A: Sorry, it cannot be released because of copyright issues.

**Q: How long did it take to train XLNet?**
A: `XLNet-mid` was trained for 2M steps (batch=32) on a Cloud TPU v3 (128G HBM), which took about 3 weeks. `XLNet-base` was trained for 4M steps.

**Q: Why hasn't the official XLNet team released a Multilingual or Chinese XLNet?**
A: (The following is a personal view.) We don't know. Many people have asked for one; see [XLNet-issue-#3](https://github.com/zihangdai/xlnet/issues/3). Given the official XLNet team's expertise and computing power, training such a model would not be difficult. A multilingual version may be more complex because it needs to balance the different languages; see also the description in [multilingual-bert](https://github.com/google-research/bert/blob/master/multilingual.md). **Looking at it the other way, though, the authors are under no obligation to do so.** As scholars, they have already made a sufficient technical contribution and should not be blamed for not releasing one. We encourage everyone to be reasonable about other people's work.

**Q: Is XLNet better than BERT in most cases?**
A: So far, XLNet performs well on at least the tasks above. Its training data is the same as that of our released [BERT-wwm-ext](https://github.com/ymcui/Chinese-BERT-wwm).

## Citation

If the content of this repository helps your research, you are welcome to cite the following technical report in your paper: https://arxiv.org/abs/2004.13922

```
@inproceedings{cui-etal-2020-revisiting,
    title = "Revisiting Pre-Trained Models for {C}hinese Natural Language Processing",
    author = "Cui, Yiming  and
      Che, Wanxiang  and
      Liu, Ting  and
      Qin, Bing  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.58",
    pages = "657--668",
}
```

## Acknowledgements

Project authors: Yiming Cui (Joint Laboratory of HIT and iFLYTEK Research), Wanxiang Che (HIT), Ting Liu (HIT), Shijin Wang (iFLYTEK), Guoping Hu (iFLYTEK)

This project is supported by Google's [TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program.

We referred to the following repositories while building this project and thank their authors:

- XLNet: https://github.com/zihangdai/xlnet
- Malaya: https://github.com/huseinzol05/Malaya/tree/master/xlnet
- Korean XLNet (described in Korean, no translation): https://github.com/yeontaek/XLNET-Korean-Model

## Disclaimer

This project is not the Chinese XLNet model released by the [official XLNet team](https://github.com/zihangdai/xlnet). It is also not an official product of HIT or iFLYTEK. The content of this project is for technical research reference only and should not be taken as the basis for any conclusion. Users may use the model freely within the scope of the license. However, we are not responsible for any direct or indirect losses caused by using the content of this project.

## Follow Us

Follow our WeChat official account "**涌现志**" for the latest technical updates.

![qrcode.png](https://ymcui.com/images/qrcode.jpg)

## Issues & Contributions

If you have questions, please submit them through GitHub Issues. We have no dedicated support staff, so we encourage users to help each other. If you find a problem in the implementation or would like to help build this project, please submit a Pull Request.

Chinese-ELECTRA

[**Chinese**](./README.md) | [**English**](./README_EN.md)

<img src="./pics/banner.png" width="500"/>

<a href="https://github.com/ymcui/Chinese-ELECTRA/blob/master/LICENSE">
    <img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-ELECTRA.svg?color=blue&style=flat-square">
</a>

ELECTRA is a recent pre-trained model developed jointly by Google and Stanford University. It has attracted wide attention for its small model size and strong performance. To further promote research on and development of Chinese pre-trained models, HFL trained Chinese ELECTRA models on large-scale Chinese data with the official ELECTRA training code, and has released them for download. ELECTRA-small is comparable to BERT-base and other models of similar scale, yet has only 1/10 of BERT-base's parameters. This project is based on the official ELECTRA from Google & Stanford University: [https://github.com/google-research/electra](https://github.com/google-research/electra)

----

[Chinese LERT](https://github.com/ymcui/LERT) | [Chinese/English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner)

More resources released by the Joint Laboratory of HIT and iFLYTEK Research (HFL): https://github.com/ymcui/HFL-Anthology

## News

**2023/3/28 We open-sourced the Chinese LLaMA & Alpaca large models, which can be quickly deployed and tried on a PC. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca**

2022/10/29 We proposed LERT, a pre-trained model that incorporates linguistic information. See: https://github.com/ymcui/LERT

2022/3/30 We open-sourced PERT, a new pre-trained model. See: https://github.com/ymcui/PERT

2021/12/17 HFL released TextPruner, a model pruning toolkit. See: https://github.com/airaria/TextPruner

2021/10/24 HFL released CINO, a pre-trained model for Chinese minority languages. See: https://github.com/ymcui/Chinese-Minority-PLM

2021/7/21 [Natural Language Processing: A Pre-trained Model Approach](https://item.jd.com/13344628.html), written by several scholars from HIT SCIR, has been published and is available for purchase.

2020/12/13 We trained a series of Chinese ELECTRA models for the judicial domain on large-scale legal documents. See [Model Download](#model-download) and [Judicial Task Results](#judicial-task-results).

<details>
<summary>Click here to view past news</summary>

2020/10/22 ELECTRA-180g has been released, with additional high-quality data from CommonCrawl. See [Model Download](#model-download).

2020/9/15 Our paper ["Revisiting Pre-Trained Models for Chinese Natural Language Processing"](https://arxiv.org/abs/2004.13922) was accepted as a long paper to [Findings of EMNLP](https://2020.emnlp.org).

2020/8/27 HFL ranked first on the GLUE general natural language understanding benchmark. See the [GLUE leaderboard](https://gluebenchmark.com/leaderboard) and the [news](http://dwz.date/ckrD).

2020/5/29 Chinese ELECTRA-large/small-ex has been released. See [Model Download](#model-download). Only Google Drive download links are available for now; we apologize for the inconvenience.

2020/4/7 PyTorch users can load the models with [🤗Transformers](https://github.com/huggingface/transformers). See [Quick Loading](#quick-loading).

2020/3/31 The models in this repository are now available in [PaddlePaddle PaddleHub](https://github.com/PaddlePaddle/PaddleHub). See [Quick Loading](#quick-loading).

2020/3/25 Chinese ELECTRA-small/base has been released. See [Model Download](#model-download).

</details>

## Guide

| Section | Description |
|-|-|
| [Introduction](#introduction) | Basic principles of ELECTRA |
| [Model Download](#model-download) | Download the Chinese ELECTRA pre-trained models |
| [Quick Loading](#quick-loading) | How to load the models quickly with [🤗Transformers](https://github.com/huggingface/transformers) and [PaddleHub](https://github.com/PaddlePaddle/PaddleHub) |
| [Baseline Results](#baseline-results) | Chinese baseline results: reading comprehension, text classification, etc. |
| [Usage](#usage) | Detailed usage of the models |
| [FAQ](#faq) | Frequently asked questions |
| [Citation](#citation) | Technical report for this repository |

## Introduction

**ELECTRA** proposes a new pre-training framework with two parts: a **Generator** and a **Discriminator**.

- **Generator**: a small MLM that predicts the original token at each `[MASK]` position. The Generator is used to replace some of the tokens in the input text.
- **Discriminator**: determines whether each token in the input sentence has been replaced. It uses the Replaced Token Detection (RTD) pre-training task in place of BERT's original Masked Language Model (MLM). Note that the Next Sentence Prediction (NSP) task is not used.

After pre-training, only the Discriminator is used as the base model for downstream fine-tuning.

For more details, see the ELECTRA paper: [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB)

![](./pics/model.png)

## Model Download

* **`ELECTRA-large, Chinese`**: 24-layer, 1024-hidden, 16-heads, 324M parameters
* **`ELECTRA-base, Chinese`**: 12-layer, 768-hidden, 12-heads, 102M parameters
* **`ELECTRA-small-ex, Chinese`**: 24-layer, 256-hidden, 4-heads, 25M parameters
* **`ELECTRA-small, Chinese`**: 12-layer, 256-hidden, 4-heads, 12M parameters

#### Large-corpus version (new, 180G data)

| Model | 🤗HF download | Baidu Netdisk download | Archive size |
| :------- | :---------: | :---------: | :---------: |
| **`ELECTRA-180g-large, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-180g-large-discriminator) | [TensorFlow (password 2v5r)](https://pan.baidu.com/s/13UJIG2G0lASjjCvPmh13RQ?pwd=2v5r) | 1G |
| **`ELECTRA-180g-base, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-180g-base-discriminator) | [TensorFlow (password 3vg1)](https://pan.baidu.com/s/15PQdeh7nRxCgXp9YmjqgsQ?pwd=3vg1) | 383M |
| **`ELECTRA-180g-small-ex, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-180g-small-ex-discriminator) | [TensorFlow (password 93n8)](https://pan.baidu.com/s/1UV83d2LNp5HHwK7X14HjPQ?pwd=93n8) | 92M |
| **`ELECTRA-180g-small, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-180g-small-discriminator) | [TensorFlow (password k9iu)](https://pan.baidu.com/s/1J5DXcehcNtX0iBXNRKLWBw?pwd=k9iu) | 46M |

#### Base version (original, 20G data)

| Model | 🤗HF download | Baidu Netdisk download | Archive size |
| :------- | :---------: | :---------: | :---------: |
| **`ELECTRA-large, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-large-discriminator) | [TensorFlow (password 1e14)](https://pan.baidu.com/s/1M5pSqDRbb3Vsv5r3TfviBQ?pwd=1e14) | 1G |
| **`ELECTRA-base, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-base-discriminator) | [TensorFlow (password f32j)](https://pan.baidu.com/s/1HOzCBNaoIEULj_s-q3dDzA?pwd=f32j) | 383M |
| **`ELECTRA-small-ex, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-small-ex-discriminator) | [TensorFlow (password gfb1)](https://pan.baidu.com/s/1dOLw4feMJcsgZL07V-koWA?pwd=gfb1) | 92M |
| **`ELECTRA-small, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-electra-small-discriminator) | [TensorFlow (password 1r4r)](https://pan.baidu.com/s/1UIosBYOHVA3bDuJrFqU0NQ?pwd=1r4r) | 46M |

#### Judicial-domain version

| Model | 🤗HF download | Baidu Netdisk download | Archive size |
| :------- | :---------: | :---------: | :---------: |
| **`legal-ELECTRA-large, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-legal-electra-large-discriminator) | [TensorFlow (password q4gv)](https://pan.baidu.com/s/180cloQ0A3m3VqpLPeKpPYg?pwd=q4gv) | 1G |
| **`legal-ELECTRA-base, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-legal-electra-base-discriminator) | [TensorFlow (password 8gcv)](https://pan.baidu.com/s/1OWwSsr-jCWq3vb7Js4B2vg?pwd=8gcv) | 383M |
| **`legal-ELECTRA-small, Chinese`** | [HF link](https://huggingface.co/hfl/chinese-legal-electra-small-discriminator) | [TensorFlow (password kmrj)](https://pan.baidu.com/s/1FIblX4EU23KSQWft3DWL0g?pwd=kmrj) | 46M |

### Notes

Users in mainland China are advised to use the Baidu Netdisk links. Users outside mainland China are advised to use the Google links. For example, unzipping the downloaded TensorFlow `ELECTRA-small, Chinese` archive gives the following files:

```
chinese_electra_small_L-12_H-256_A-4.zip
    |- electra_small.data-00000-of-00001    # model weights
    |- electra_small.meta                   # model meta information
    |- electra_small.index                  # model index information
    |- vocab.txt                            # vocabulary
    |- discriminator.json                   # config file: discriminator (if missing, get it from the config directory of this repo)
    |- generator.json                       # config file: generator (if missing, get it from the config directory of this repo)
```

### Training Details

We trained the ELECTRA models on large-scale Chinese Wikipedia and general text, totaling 5.4B tokens, the same data as the [RoBERTa-wwm-ext series](https://github.com/ymcui/Chinese-BERT-wwm). For the vocabulary, we reuse Google's original BERT WordPiece vocabulary with 21,128 tokens. Other details and hyperparameters are listed below; parameters not mentioned keep their default values.

- `ELECTRA-large`: 24 layers, hidden size 1024, 16 attention heads, learning rate 1e-4, batch size 96, max length 512, 2M training steps
- `ELECTRA-base`: 12 layers, hidden size 768, 12 attention heads, learning rate 2e-4, batch size 256, max length 512, 1M training steps
- `ELECTRA-small-ex`: 24 layers, hidden size 256, 4 attention heads, learning rate 5e-4, batch size 384, max length 512, 2M training steps
- `ELECTRA-small`: 12 layers, hidden size 256, 4 attention heads, learning rate 5e-4, batch size 1024, max length 512, 1M training steps

## Quick Loading

### Using Huggingface-Transformers

[Huggingface-Transformers 2.8.0](https://github.com/huggingface/transformers/releases/tag/v2.8.0) officially supports ELECTRA models. Load them as follows:

```python
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "hfl/chinese-electra-180g-small-discriminator"  # any MODEL_NAME from the tables below
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
```

The available values of `MODEL_NAME` are:

| Model | Component | MODEL_NAME |
| - | - | - |
| ELECTRA-180g-large, Chinese | discriminator | hfl/chinese-electra-180g-large-discriminator |
| ELECTRA-180g-large, Chinese | generator | hfl/chinese-electra-180g-large-generator |
| ELECTRA-180g-base, Chinese | discriminator | hfl/chinese-electra-180g-base-discriminator |
| ELECTRA-180g-base, Chinese | generator | hfl/chinese-electra-180g-base-generator |
| ELECTRA-180g-small-ex, Chinese | discriminator | hfl/chinese-electra-180g-small-ex-discriminator |
| ELECTRA-180g-small-ex, Chinese | generator | hfl/chinese-electra-180g-small-ex-generator |
| ELECTRA-180g-small, Chinese | discriminator | hfl/chinese-electra-180g-small-discriminator |
| ELECTRA-180g-small, Chinese | generator | hfl/chinese-electra-180g-small-generator |
| ELECTRA-large, Chinese | discriminator | hfl/chinese-electra-large-discriminator |
| ELECTRA-large, Chinese | generator | hfl/chinese-electra-large-generator |
| ELECTRA-base, Chinese | discriminator | hfl/chinese-electra-base-discriminator |
| ELECTRA-base, Chinese | generator | hfl/chinese-electra-base-generator |
| ELECTRA-small-ex, Chinese | discriminator | hfl/chinese-electra-small-ex-discriminator |
| ELECTRA-small-ex, Chinese | generator | hfl/chinese-electra-small-ex-generator |
| ELECTRA-small, Chinese | discriminator | hfl/chinese-electra-small-discriminator |
| ELECTRA-small, Chinese | generator | hfl/chinese-electra-small-generator |

Judicial-domain versions:

| Model | Component | MODEL_NAME |
| - | - | - |
| legal-ELECTRA-large, Chinese | discriminator | hfl/chinese-legal-electra-large-discriminator |
| legal-ELECTRA-large, Chinese | generator | hfl/chinese-legal-electra-large-generator |
| legal-ELECTRA-base, Chinese | discriminator | hfl/chinese-legal-electra-base-discriminator |
| legal-ELECTRA-base, Chinese | generator | hfl/chinese-legal-electra-base-generator |
| legal-ELECTRA-small, Chinese | discriminator | hfl/chinese-legal-electra-small-discriminator |
| legal-ELECTRA-small, Chinese | generator | hfl/chinese-legal-electra-small-generator |

### Using PaddleHub

With [PaddleHub](https://github.com/PaddlePaddle/PaddleHub), one line of code downloads and installs a model. About a dozen lines are enough for tasks such as text classification, sequence labeling, and reading comprehension.

```python
import paddlehub as hub

MODULE_NAME = "chinese-electra-small"  # any MODULE_NAME from the table below
module = hub.Module(name=MODULE_NAME)
```

The available values of `MODULE_NAME` are:

| Model | MODULE_NAME |
| - | - |
| ELECTRA-base, Chinese | [chinese-electra-base](https://paddlepaddle.org.cn/hubdetail?name=chinese-electra-base&en_category=SemanticModel) |
| ELECTRA-small, Chinese | [chinese-electra-small](https://paddlepaddle.org.cn/hubdetail?name=chinese-electra-small&en_category=SemanticModel) |

## Baseline Results

We compared `ELECTRA-small/base` with [`BERT-base`](https://github.com/google-research/bert) and with [`BERT-wwm`, `BERT-wwm-ext`, `RoBERTa-wwm-ext`, `RBT3`](https://github.com/ymcui/Chinese-BERT-wwm) on the following six tasks:

- [**CMRC 2018 (Cui et al., 2019)**: span-extraction reading comprehension (Simplified Chinese)](https://github.com/ymcui/cmrc2018)
- [**DRCD (Shao et al., 2018)**: span-extraction reading comprehension (Traditional Chinese)](https://github.com/DRCSolutionService/DRCD)
- [**XNLI (Conneau et al., 2018)**: natural language inference](https://github.com/google-research/bert/blob/master/multilingual.md)
- [**ChnSentiCorp**: sentiment analysis](https://github.com/pengming617/bert_classification)
- [**LCQMC (Liu et al., 2018)**: sentence pair matching](http://icrc.hitsz.edu.cn/info/1037/1146.htm)
- [**BQ Corpus (Chen et al., 2018)**: sentence pair matching](http://icrc.hitsz.edu.cn/Article/show/175.html)

For the ELECTRA-small/base models, we use the default learning rates from the original paper: `3e-4` and `1e-4`, respectively.
**Note that we did not tune hyperparameters for any task. Adjusting the learning rate and other hyperparameters may therefore give further gains.**
To make the results reliable, we trained each model 10 times with different random seeds. We report the maximum and the average performance, with the average in parentheses.

### Simplified Chinese Reading Comprehension: CMRC 2018

The [**CMRC 2018 dataset**](https://github.com/ymcui/cmrc2018) is a Chinese machine reading comprehension dataset released by HFL. Given a question, the system must extract a span from the passage as the answer. The format is the same as [SQuAD](http://arxiv.org/abs/1606.05250). Metrics: EM / F1

| Model | Dev | Test | Challenge | Params |
| :------- | :---------: | :---------: | :---------: | :---------: |
| BERT-base | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) | 102M |
| BERT-wwm | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) | 102M |
| BERT-wwm-ext | 67.1 (65.6) / 85.7 (85.0) | 71.4 (70.0) / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) | 102M |
| RoBERTa-wwm-ext | 67.4 (66.5) / 87.2 (86.5) | 72.6 (71.4) / 89.4 (88.8) | 26.2 (24.6) / 51.0 (49.1) | 102M |
| RBT3 | 57.0 / 79.0 | 62.2 / 81.8 | 14.7 / 36.2 | 38M |
| **ELECTRA-small** | 63.4 (62.9) / 80.8 (80.2) | 67.8 (67.4) / 83.4 (83.0) | 16.3 (15.4) / 37.2 (35.8) | 12M |
| **ELECTRA-180g-small** | 63.8 / 82.7 | 68.5 / 85.2 | 15.1 / 35.8 | 12M |
| **ELECTRA-small-ex** | 66.4 / 82.2 | 71.3 / 85.3 | 18.1 / 38.3 | 25M |
| **ELECTRA-180g-small-ex** | 68.1 / 85.1 | 71.8 / 87.2 | 20.6 / 41.7 | 25M |
| **ELECTRA-base** | 68.4 (68.0) / 84.8 (84.6) | 73.1 (72.7) / 87.1 (86.9) | 22.6 (21.7) / 45.0 (43.8) | 102M |
| **ELECTRA-180g-base** | 69.3 / 87.0 | 73.1 / 88.6 | 24.0 / 48.6 | 102M |
| **ELECTRA-large** | 69.1 / 85.2 | 73.9 / 87.1 | 23.0 / 44.2 | 324M |
| **ELECTRA-180g-large** | 68.5 / 86.2 | 73.5 / 88.5 | 21.8 / 42.9 | 324M |

### Traditional Chinese Reading Comprehension: DRCD

The [**DRCD dataset**](https://github.com/DRCKnowledgeTeam/DRCD) was released by Delta Research Center in Taiwan, China. It is an extractive reading comprehension dataset in Traditional Chinese, in the same format as SQuAD. Metrics: EM / F1

| Model | Dev | Test | Params |
| :------- | :---------: | :---------: | :---------: |
| BERT-base | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) | 102M |
| BERT-wwm | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) | 102M |
| BERT-wwm-ext | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) | 102M |
| RoBERTa-wwm-ext | 86.6 (85.9) / 92.5 (92.2) | 85.6 (85.2) / 92.0 (91.7) | 102M |
| RBT3 | 76.3 / 84.9 | 75.0 / 83.9 | 38M |
| **ELECTRA-small** | 79.8 (79.4) / 86.7 (86.4) | 79.0 (78.5) / 85.8 (85.6) | 12M |
| **ELECTRA-180g-small** | 83.5 / 89.2 | 82.9 / 88.7 | 12M |
| **ELECTRA-small-ex** | 84.0 / 89.5 | 83.3 / 89.1 | 25M |
| **ELECTRA-180g-small-ex** | 87.3 / 92.3 | 86.5 / 91.3 | 25M |
| **ELECTRA-base** | 87.5 (87.0) / 92.5 (92.3) | 86.9 (86.6) / 91.8 (91.7) | 102M |
| **ELECTRA-180g-base** | 89.6 / 94.2 | 88.9 / 93.7 | 102M |
| **ELECTRA-large** | 88.8 / 93.3 | 88.8 / 93.6 | 324M |
| **ELECTRA-180g-large** | 90.1 / 94.8 | 90.5 / 94.7 | 324M |

### Natural Language Inference: XNLI

For natural language inference, we use the [**XNLI** data](https://github.com/google-research/bert/blob/master/multilingual.md), which requires classifying text into three classes: `entailment`, `neutral`, and `contradictory`. Metric: Accuracy

| Model | Dev | Test | Params |
| :------- | :---------: | :---------: | :---------: |
| BERT-base | 77.8 (77.4) | 77.8 (77.5) | 102M |
| BERT-wwm | 79.0 (78.4) | 78.2 (78.0) | 102M |
| BERT-wwm-ext | 79.4 (78.6) | 78.7 (78.3) | 102M |
| RoBERTa-wwm-ext | 80.0 (79.2) | 78.8 (78.3) | 102M |
| RBT3 | 72.2 | 72.3 | 38M |
| **ELECTRA-small** | 73.3 (72.5) | 73.1 (72.6) | 12M |
| **ELECTRA-180g-small** | 74.6 | 74.6 | 12M |
| **ELECTRA-small-ex** | 75.4 | 75.8 | 25M |
| **ELECTRA-180g-small-ex** | 76.5 | 76.6 | 25M |
| **ELECTRA-base** | 77.9 (77.0) | 78.4 (77.8) | 102M |
| **ELECTRA-180g-base** | 79.6 | 79.5 | 102M |
| **ELECTRA-large** | 81.5 | 81.0 | 324M |
| **ELECTRA-180g-large** | 81.2 | 80.4 | 324M |

### Sentiment Analysis: ChnSentiCorp

For sentiment analysis, we use [**ChnSentiCorp**](https://github.com/pengming617/bert_classification), a binary sentiment classification dataset. Metric: Accuracy

| Model | Dev | Test | Params |
| :------- | :---------: | :---------: | :---------: |
| BERT-base | 94.7 (94.3) | 95.0 (94.7) | 102M |
| BERT-wwm | 95.1 (94.5) | 95.4 (95.0) | 102M |
| BERT-wwm-ext | 95.4 (94.6) | 95.3 (94.7) | 102M |
| RoBERTa-wwm-ext | 95.0 (94.6) | 95.6 (94.8) | 102M |
| RBT3 | 92.8 | 92.8 | 38M |
| **ELECTRA-small** | 92.8 (92.5) | 94.3 (93.5) | 12M |
| **ELECTRA-180g-small** | 94.1 | 93.6 | 12M |
| **ELECTRA-small-ex** | 92.6 | 93.6 | 25M |
| **ELECTRA-180g-small-ex** | 92.8 | 93.4 | 25M |
| **ELECTRA-base** | 93.8 (93.0) | 94.5 (93.5) | 102M |
| **ELECTRA-180g-base** | 94.3 | 94.8 | 102M |
| **ELECTRA-large** | 95.2 | 95.3 | 324M |
| **ELECTRA-180g-large** | 94.8 | 95.2 | 324M |

### Sentence Pair Classification: LCQMC

Both of the following datasets are binary sentence-pair classification tasks: decide whether the two sentences have the same meaning.

[**LCQMC**](http://icrc.hitsz.edu.cn/info/1037/1146.htm) was released by the Intelligent Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School. Metric: Accuracy

| Model | Dev | Test | Params |
| :------- | :---------: | :---------: | :---------: |
| BERT | 89.4 (88.4) | 86.9 (86.4) | 102M |
| BERT-wwm | 89.4 (89.2) | 87.0 (86.8) | 102M |
| BERT-wwm-ext | 89.6 (89.2) | 87.1 (86.6) | 102M |
| RoBERTa-wwm-ext | 89.0 (88.7) | 86.4 (86.1) | 102M |
| RBT3 | 85.3 | 85.1 | 38M |
| **ELECTRA-small** | 86.7 (86.3) | 85.9 (85.6) | 12M |
| **ELECTRA-180g-small** | 86.6 | 85.8 | 12M |
| **ELECTRA-small-ex** | 87.5 | 86.0 | 25M |
| **ELECTRA-180g-small-ex** | 87.6 | 86.3 | 25M |
| **ELECTRA-base** | 90.2 (89.8) | 87.6 (87.3) | 102M |
| **ELECTRA-180g-base** | 90.2 | 87.1 | 102M |
| **ELECTRA-large** | 90.7 | 87.3 | 324M |
| **ELECTRA-180g-large** | 90.3 | 87.3 | 324M |

### Sentence Pair Classification: BQ Corpus

[**BQ Corpus**](http://icrc.hitsz.edu.cn/Article/show/175.html) was released by the Intelligent Computing Research Center, Harbin Institute of Technology Shenzhen Graduate School. It is a dataset for the banking domain. Metric: Accuracy

| Model | Dev | Test | Params |
| :------- | :---------: | :---------: | :---------: |
| BERT | 86.0 (85.5) | 84.8 (84.6) | 102M |
| BERT-wwm | 86.1 (85.6) | 85.2 (84.9) | 102M |
| BERT-wwm-ext | 86.4 (85.5) | 85.3 (84.8) | 102M |
| RoBERTa-wwm-ext | 86.0 (85.4) | 85.0 (84.6) | 102M |
| RBT3 | 84.1 | 83.3 | 38M |
| **ELECTRA-small** | 83.5 (83.0) | 82.0 (81.7) | 12M |
| **ELECTRA-180g-small** | 83.3 | 82.1 | 12M |
| **ELECTRA-small-ex** | 84.0 | 82.6 | 25M |
| **ELECTRA-180g-small-ex** | 84.6 | 83.4 | 25M |
| **ELECTRA-base** | 84.8 (84.7) | 84.5 (84.0) | 102M |
| **ELECTRA-180g-base** | 85.8 | 84.5 | 102M |
| **ELECTRA-large** | 86.7 | 85.1 | 324M |
| **ELECTRA-180g-large** | 86.4 | 85.4 | 324M |

### Judicial Task Results

We evaluated the judicial-domain ELECTRA models on the [charge prediction data](https://github.com/liuhuanyong/CrimeKgAssitant) from the CAIL 2018 judicial evaluation. The learning rates for small/base/large are 5e-4/3e-4/1e-4, respectively. Metric: Accuracy

| Model | Dev | Test | Params |
| :------- | :---------: | :---------: | :---------: |
| ELECTRA-small | 78.84 | 76.35 | 12M |
| **legal-ELECTRA-small** | **79.60** | **77.03** | 12M |
| ELECTRA-base | 80.94 | 78.41 | 102M |
| **legal-ELECTRA-base** | **81.71** | **79.17** | 102M |
| ELECTRA-large | 81.53 | 78.97 | 324M |
| **legal-ELECTRA-large** | **82.60** | **79.89** | 324M |

## Usage

You can fine-tune downstream tasks on top of the Chinese ELECTRA pre-trained models released above. This section covers only the most basic usage. For more detailed usage, see the [official ELECTRA introduction](https://github.com/google-research/electra).

In this example, we fine-tune the `ELECTRA-small` model on the CMRC 2018 task. The steps are as follows. Assume:

- `data-dir`: the working root directory; set it as appropriate.
- `model-name`: the model name, `electra-small` in this example.
- `task-name`: the task name, `cmrc2018` in this example. The code in this repository has been adapted for all six Chinese tasks above; their `task-name` values are `cmrc2018`, `drcd`, `xnli`, `chnsenticorp`, `lcqmc`, and `bqcorpus`.

### Step 1: Download and unzip the pre-trained model

Download the ELECTRA-small model from the [Model Download](#model-download) section and unzip it to `${data-dir}/models/${model-name}`. The directory should contain `electra_model.*`, `vocab.txt`, and `checkpoint`, 5 files in total.

### Step 2: Prepare the task data

Download the [CMRC 2018 training and development sets](https://github.com/ymcui/cmrc2018/tree/master/squad-style-data) and rename them to `train.json` and `dev.json`. Put both files in `${data-dir}/finetuning_data/${task-name}`.

### Step 3: Run the training command

```shell
DATA_DIR=/path/to/data-dir      # the data-dir introduced above
MODEL_NAME=electra-small        # the model-name introduced above
python run_finetuning.py \
    --data-dir ${DATA_DIR} \
    --model-name ${MODEL_NAME} \
    --hparams params_cmrc2018.json
```

Here `DATA_DIR` and `MODEL_NAME` hold the `data-dir` and `model-name` values introduced above. `hparams` is a JSON dictionary. In this example, `params_cmrc2018.json` holds the fine-tuning hyperparameters, for example:

```json
{
    "task_names": ["cmrc2018"],
    "max_seq_length": 512,
    "vocab_size": 21128,
    "model_size": "small",
    "do_train": true,
    "do_eval": true,
    "write_test_outputs": true,
    "num_train_epochs": 2,
    "learning_rate": 3e-4,
    "train_batch_size": 32,
    "eval_batch_size": 32
}
```

The JSON file above lists only the most important parameters. See [configure_finetuning.py](./configure_finetuning.py) for the full list.

After the run finishes:

1. For reading comprehension tasks, the prediction file `cmrc2018_dev_preds.json` is saved in `${data-dir}/results/${task-name}_qa/`. Run an external evaluation script on it to get the final scores, e.g. `python cmrc2018_drcd_evaluate.py dev.json cmrc2018_dev_preds.json`
2. For classification tasks, the accuracy is printed directly to the screen, e.g. `xnli: accuracy: 72.5 - loss: 0.67`

## FAQ

**Q: How should I set the learning rate for ELECTRA models when fine-tuning on downstream tasks?**
A: We recommend starting from the learning rates used in the original paper (3e-4 for small, 1e-4 for base) and then tuning them up or down. Note that ELECTRA needs a relatively larger learning rate than models such as BERT and RoBERTa.

**Q: Are there PyTorch weights?**
A: Yes. See [Model Download](#model-download).

**Q: Can you share the pre-training data?**
A: Unfortunately, no.

**Q: What are your future plans?**
A: Stay tuned.

## Citation

If the content of this repository helps your research, you are welcome to cite the following papers.

- Preferred: https://ieeexplore.ieee.org/document/9599397

```
@article{cui-etal-2021-pretrain,
  title={Pre-Training with Whole Word Masking for Chinese BERT},
  author={Cui, Yiming and Che, Wanxiang and Liu, Ting and Qin, Bing and Yang, Ziqing},
  journal={IEEE Transactions on Audio, Speech and Language Processing},
  year={2021},
  url={https://ieeexplore.ieee.org/document/9599397},
  doi={10.1109/TASLP.2021.3124365},
}
```

- Or: https://www.aclweb.org/anthology/2020.findings-emnlp.58

```
@inproceedings{cui-etal-2020-revisiting,
    title = "Revisiting Pre-Trained Models for {C}hinese Natural Language Processing",
    author = "Cui, Yiming  and
      Che, Wanxiang  and
      Liu, Ting  and
      Qin, Bing  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.58",
    pages = "657--668",
}
```

## Follow Us

Follow our WeChat official account "**涌现志**" for the latest technical updates.

![qrcode.png](https://ymcui.com/images/qrcode.jpg)

## Issue Feedback

Before you submit an issue:

- **You are advised to read the [FAQ](https://github.com/ymcui/MacBERT#FAQ) first before you submit an issue.**
- Repetitive and irrelevant issues will be ignored and closed by the [stale bot](https://github.com/marketplace/stale). Thank you for your understanding and support.
- We cannot accommodate EVERY request, so please bear in mind that there is no guarantee that your request will be met.
- Always be polite when you submit an issue.
