# PromptPapers

![](https://img.shields.io/github/last-commit/thunlp/PromptPapers?color=green) ![](https://img.shields.io/badge/PaperNumber-65-brightgreen) ![](https://img.shields.io/badge/PRs-Welcome-red)

We have released an open-source prompt-learning toolkit, check out **[OpenPrompt](https://github.com/thunlp/OpenPrompt)!**

We strongly encourage researchers who want to promote their fantastic work to the community to make a **pull request** to update their paper's information! (See [contributing details](#contribution))

Effective adaptation of pre-trained models can be probed from different perspectives. Prompt-learning focuses more on the organization of the training procedure and the unification of different tasks, while delta tuning (parameter-efficient methods) provides another direction from the specific optimization of pre-trained models. Check [DeltaPapers](https://github.com/thunlp/DeltaPapers)!

## Contents

Must-read papers on prompt-based tuning for pre-trained language models. The paper list is mainly maintained by [Ning Ding](https://github.com/ningding97) and [Shengding Hu](https://github.com/shengdinghu). Watch this repository for the latest updates!

- [PromptPapers](#promptpapers)
  - [Introduction](#introduction)
    - [Keywords Convention](#keywords-convention)
  - [Papers](#papers)
    - [Overview](#overview)
    - [Pilot Work](#pilot-work)
    - [Basics](#basics)
    - [Analysis](#analysis)
    - [Improvements](#improvements)
    - [Specializations](#specializations)
  - [Contribution](#contribution)
    - [Other contributors](#other-contributors)
    - [Contributing to this paper list](#contributing-to-this-paper-list)

## Introduction

This is a paper list about **prompt-based tuning** for large-scale pre-trained language models. Unlike traditional fine-tuning, which uses an explicit classifier, prompt-based tuning directly uses the pre-trained models to conduct the pre-training tasks for classification or regression.
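To make the contrast with classifier-head fine-tuning concrete, here is a minimal sketch of cloze-style prompting: the input is wrapped in a template containing a mask slot, and a verbalizer maps label words back to classes. The `toy_mask_scores` function is a hypothetical stand-in for a pre-trained masked language model and is not part of any library; the template, label words, and cue words are illustrative assumptions as well.

```python
from typing import Dict, List

# Hypothetical template: the pre-trained model is asked to fill in [MASK].
TEMPLATE = "{text} It was [MASK]."

# Hypothetical verbalizer: label words that stand for each class.
VERBALIZER: Dict[str, List[str]] = {
    "positive": ["great", "good"],
    "negative": ["terrible", "bad"],
}

# Toy cue words used only by the stand-in scorer below.
_CUES: Dict[str, List[str]] = {
    "great": ["love", "wonderful"],
    "good": ["nice", "enjoyed"],
    "terrible": ["hate", "awful"],
    "bad": ["boring", "poor"],
}


def toy_mask_scores(prompt: str) -> Dict[str, float]:
    """Stand-in for a masked LM: score each candidate word for the [MASK] slot.

    A real system would read these scores from the model's output
    distribution at the mask position; here we only count cue words.
    """
    lowered = prompt.lower()
    return {
        word: float(sum(lowered.count(cue) for cue in cues))
        for word, cues in _CUES.items()
    }


def classify(text: str) -> str:
    """Wrap the input in the template and pick the class whose label words score highest."""
    prompt = TEMPLATE.format(text=text)
    scores = toy_mask_scores(prompt)
    class_scores = {
        label: sum(scores[word] for word in words)
        for label, words in VERBALIZER.items()
    }
    return max(class_scores, key=class_scores.get)


print(classify("I love this wonderful movie."))
print(classify("An awful, boring film."))
```

No classification head is trained here: the decision comes entirely from how well each label word fits the mask slot, which is the idea the papers below build on.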
### Keywords Convention

![](https://img.shields.io/badge/T5-blue) The abbreviation of the work.

![](https://img.shields.io/badge/Continuous_Template-red) The key features in terms of prompt learning used in the work.

![](https://img.shields.io/badge/Generation-brown) The main task explored in the work.

![](https://img.shields.io/badge/Analysis-green) The main property of prompt learning methods explored in the work.

## Papers

### Overview

This section contains the papers that overview the general trends in recent natural language processing with big (pretrained) models.

1. **OpenPrompt: An Open-source Framework for Prompt-learning.** Preprint.

    *Ning Ding, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun* [[pdf](https://arxiv.org/pdf/2111.01998.pdf)] [[project](https://github.com/thunlp/OpenPrompt)], 2021.11

2. **Pre-Trained Models: Past, Present and Future.** Preprint.

    *Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Yuan Yao, Ao Zhang, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu.* [[pdf](https://arxiv.org/abs/2106.07139)], 2021.6

3. **Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.** Preprint.

    *Liu, Pengfei, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig.* [[pdf](https://arxiv.org/abs/2107.13586)] [[project](http://pretrain.nlpedia.ai)], 2021.7

4. **Paradigm Shift in Natural Language Processing.** Machine Intelligence Research.

    *Tianxiang Sun, Xiangyang Liu, Xipeng Qiu, Xuanjing Huang* [[pdf](https://arxiv.org/abs/2109.12575)] [[project](https://txsun1997.github.io/nlp-paradigm-shift/)], 2021.9

### Pilot Work

This section contains the pilot works that may have contributed to the prevalence of the prompt learning paradigm.

1. **Parameter-Efficient Transfer Learning for NLP.** ICML 2019.
![](https://img.shields.io/badge/Adapter-blue)

    *Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly*. [[pdf](http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf)], [[project](https://github.com/google-research/adapter-bert)], 2019.6

2. **Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.** JMLR. ![](https://img.shields.io/badge/T5-blue) ![](https://img.shields.io/badge/Seq2Seq-red)

    *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.* [[pdf](https://arxiv.org/abs/1910.10683)], [[project](https://github.com/google-research/text-to-text-transfer-transformer)]. 2019.10.

3. **Language Models as Knowledge Bases?** EMNLP 2019. ![](https://img.shields.io/badge/LAMA-blue) ![](https://img.shields.io/badge/Probing-brown) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Fabio Petroni, Tim Rocktaschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel.* [[pdf](https://arxiv.org/abs/1909.01066)], [[project](https://github.com/facebookresearch/LAMA)], 2019.9

4. **How Can We Know What Language Models Know?** TACL 2020. ![](https://img.shields.io/badge/Probing-brown) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Zhengbao Jiang, Frank F. Xu, Jun Araki, Graham Neubig*. [[pdf](https://arxiv.org/abs/1911.12543)], [[project](https://github.com/jzbjyb/LPAQA)], 2019.11

5. **Language Models are Few-shot Learners.** NeurIPS 2020. ![](https://img.shields.io/badge/GPT3-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M.
Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei.* [[pdf](https://arxiv.org/abs/2005.14165)], [[website](https://openai.com/blog/gpt-3-apps/)], 2020.5

6. **AdaPrompt: Adaptive Model Training for Prompt-based NLP** ![](https://img.shields.io/badge/PET_Extension-blue)

    *Yulong Chen, Yang Liu, Li Dong, Shuohang Wang, Chenguang Zhu, Michael Zeng, Yue Zhang* [[pdf](https://arxiv.org/abs/2202.04824)], 2022.02

### Basics

This section contains explorations of the basic aspects of prompt tuning, such as templates, verbalizers, and training paradigms.

1. **Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference.** EACL 2021. ![](https://img.shields.io/badge/PET-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Timo Schick, Hinrich Schütze.* [[pdf](https://arxiv.org/abs/2001.07676)], [[project](https://github.com/timoschick/pet)], 2020.1

2. **It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners.** NAACL 2021. ![](https://img.shields.io/badge/PET_Extension-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Timo Schick, Hinrich Schütze.* [[pdf](https://arxiv.org/abs/2009.07118)], [[project](https://github.com/timoschick/pet)], 2020.9

3. **Autoprompt: Eliciting knowledge from language models with automatically generated prompts.** Preprint. ![](https://img.shields.io/badge/AutoPrompt-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh.* [[pdf](https://arxiv.org/abs/2010.15980)], [[website](https://ucinlp.github.io/autoprompt/)], 2020.10

4. **Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification.** COLING 2020.
![](https://img.shields.io/badge/PETAL-blue) ![](https://img.shields.io/badge/Verbalizer-red)

    *Timo Schick, Helmut Schmid, Hinrich Schütze.* [[pdf](https://arxiv.org/abs/2010.13641)], [[project](https://github.com/timoschick/pet)], 2020.12

5. **Making Pre-trained Language Models Better Few-shot Learners.** ACL 2021. ![](https://img.shields.io/badge/LMBFF-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Tianyu Gao, Adam Fisch, Danqi Chen.* [[pdf](https://arxiv.org/abs/2012.15723)], [[project](https://github.com/princeton-nlp/LM-BFF)], 2020.12

6. **Prefix-tuning: Optimizing continuous prompts for generation**. ACL 2021. ![](https://img.shields.io/badge/Prefix_Tuning-blue) ![](https://img.shields.io/badge/Continuous_Template-red) ![](https://img.shields.io/badge/Generation-brown)

    *Xiang Lisa Li, Percy Liang.* [[pdf](https://arxiv.org/abs/2101.00190)], [[project](https://github.com/XiangLi1999/PrefixTuning)], 2021.1

7. **Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm.** Preprint. ![](https://img.shields.io/badge/Discrete_Template-red)

    *Laria Reynolds, Kyle McDonell.* [[pdf](https://arxiv.org/abs/2102.07350)], 2021.2

8. **Improving and Simplifying Pattern Exploiting Training.** Preprint. ![](https://img.shields.io/badge/ADAPET-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Derek Tam, Rakesh R Menon, Mohit Bansal, Shashank Srivastava, Colin Raffel.* [[pdf](https://arxiv.org/abs/2103.11955)], 2021.3

9. **GPT understands, too.** Preprint. ![](https://img.shields.io/badge/P_tuning-blue) ![](https://img.shields.io/badge/Mix_Template-red) ![](https://img.shields.io/badge/Probing-brown)

    *Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang*. [[pdf](https://arxiv.org/abs/2103.10385)], [[project](https://github.com/THUDM/P-tuning)], 2021.3

10. **The Power of Scale for Parameter-Efficient Prompt Tuning.** Preprint.
![](https://img.shields.io/badge/Soft_Prompt-blue) ![](https://img.shields.io/badge/Continuous_Template-red)

    *Brian Lester, Rami Al-Rfou, Noah Constant*. [[pdf](https://arxiv.org/abs/2104.08691)], [[project](https://github.com/kipgparker/soft-prompt-tuning)], 2021.4

11. **Learning How to Ask: Querying LMs with Mixtures of Soft Prompts.** NAACL 2021. ![](https://img.shields.io/badge/Ensemble-blue) ![](https://img.shields.io/badge/Continuous_Template-red)

    *Guanghui Qin, Jason Eisner.* [[pdf](https://arxiv.org/abs/2104.06599)][[project](https://github.com/hiaoxui/soft-prompts)], 2021.4

12. **Factual Probing Is [MASK]: Learning vs. Learning to Recall.** NAACL 2021. ![](https://img.shields.io/badge/OptiPrompt-blue) ![](https://img.shields.io/badge/Continuous_Template-red) ![](https://img.shields.io/badge/Probing-brown)

    *Zexuan Zhong, Dan Friedman, Danqi Chen.* [[pdf](https://arxiv.org/abs/2104.05240)], [[project](https://github.com/princeton-nlp/OptiPrompt)], 2021.4

13. **Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models.** Preprint. ![](https://img.shields.io/badge/null_prompts-blue)

    *Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel*. [[pdf](https://arxiv.org/abs/2106.13353)], 2021.6

14. **WARP: Word-level Adversarial ReProgramming.** ACL 2021. ![](https://img.shields.io/badge/WARP-blue) ![](https://img.shields.io/badge/Continuous_Template-red)

    *Karen Hambardzumyan, Hrant Khachatrian, Jonathan May.* [[pdf](https://arxiv.org/abs/2101.00121)], [[project](https://github.com/YerevaNN/WARP)], 2021.6

15. **PTR: Prompt Tuning with Rules for Text Classification.** Preprint. ![](https://img.shields.io/badge/PTR-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, Maosong Sun.* [[pdf](https://arxiv.org/abs/2105.11259)], 2021.5

16.
**NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction** ![](https://img.shields.io/badge/NSPBERT-blue) ![](https://img.shields.io/badge/Discrete_Template-red)

    *Yi Sun\*, Yu Zheng\*, Chao Hao, Hangping Qiu*, [[pdf](https://arxiv.org/abs/2109.03564)], [[project](https://github.com/sunyilgdx/NSP-BERT)], 2021.9

17. **Finetuned language models are zero-shot learners.** ![](https://img.shields.io/badge/Instruction_Tuning-blue) ![](https://img.shields.io/badge/FLAN-blue) ![](https://img.shields.io/badge/Pretrain_Prompt-red)

    *Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le.* [[pdf](https://arxiv.org/abs/2109.01652)], 2021.9

18. **PPT: Pre-trained Prompt Tuning for Few-shot Learning** ![](https://img.shields.io/badge/PPT-blue) ![](https://img.shields.io/badge/Pretrain_Prompt-red)

    *Yuxian Gu\*, Xu Han\*, Zhiyuan Liu, Minlie Huang.* [[pdf](https://arxiv.org/abs/2109.04332)], 2021.9

19. **Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners.** ICLR 2022. ![](https://img.shields.io/badge/DART-blue) ![](https://img.shields.io/badge/Continuous_Template-red)

    *Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, Huajun Chen.* [[pdf](https://arxiv.org/abs/2108.13161)], [[project](https://github.com/zjunlp/DART)], 2021.10

20. **Multitask Prompted Training Enables Zero-Shot Task Generalization.** ![](https://img.shields.io/badge/T0-blue) ![](https://img.shields.io/badge/multitask-green)

    *Victor Sanh, Albert Webson, Colin Raffel, Stephen H.
Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush.* [[pdf](https://arxiv.org/abs/2110.08207)], 2021.10

21. **P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks.** ACL 2022. ![](https://img.shields.io/badge/P_Tuning_v2-blue) ![](https://img.shields.io/badge/Continuous_Template-red)

    *Xiao Liu, Kaixuan Ji, Yicheng Fu, Zhengxiao Du, Zhilin Yang, Jie Tang* [[pdf](https://arxiv.org/abs/2110.07602)], [[project](https://github.com/THUDM/P-tuning-v2)], 2021.10

22. **Black-Box Tuning for Language-Model-as-a-Service.** ICML 2022. ![](https://img.shields.io/badge/Black--Box_Tuning-blue) ![](https://img.shields.io/badge/Derivative--free_optimization-red)

    *Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu* [[pdf](https://arxiv.org/abs/2201.03514)], [[project](https://github.com/txsun1997/Black-Box-Tuning)], 2022.1

23. **Black-box Prompt Learning for Pre-trained Language Models.** Preprint. ![](https://img.shields.io/badge/-Black--Box%20Prompt%20Learning%20-blue) ![](https://img.shields.io/badge/Derivative--free_optimization-red)

    *Shizhe Diao, Xuechun Li, Yong Lin, Zhichao Huang, Tong Zhang* [[pdf](https://arxiv.org/abs/2201.08531)], 2022.1

24. **Binding Language Models in Symbolic Languages.** Preprint.
![](https://img.shields.io/badge/Binder-blue) ![](https://img.shields.io/badge/Discrete_Template-red) ![](https://img.shields.io/badge/Neural_Symbolic-green)

    *Zhoujun Cheng\*, Tianbao Xie\*, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu* [[pdf](https://arxiv.org/abs/2210.02875)], [[project](https://github.com/HKUNLP/Binder)], [[website](https://lm-code-binder.github.io/)], 2022.10

25. **A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT.** ![](https://img.shields.io/badge/chatGPT-blue) ![](https://img.shields.io/badge/Prompt_Pattern-red)

    *Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, Douglas C. Schmidt* [[pdf](https://arxiv.org/abs/2302.11382)], 2023.2

### Analysis

This section contains analyses of prompt learning methods, including but not limited to why prompt learning works, various properties of prompt learning methods, and the limitations of prompt learning methods.

1. **What Makes Good In-Context Examples for GPT-3?** Preprint. ![](https://img.shields.io/badge/Analysis-green)

    *Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen*. [[pdf](https://arxiv.org/abs/2101.06804)] 2021.1

2. **How Many Data Points is a Prompt Worth?** NAACL 2021. ![](https://img.shields.io/badge/Analysis-green)

    *Teven Le Scao, Alexander M. Rush.* [[pdf](https://arxiv.org/abs/2103.08493)], [[project](https://github.com/TevenLeScao/pet)], 2021.3

3. **Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right.** Preprint. ![](https://img.shields.io/badge/calibration-green)

    *Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, Luke Zettlemoyer.* [[pdf](https://arxiv.org/abs/2104.08315)][[project](https://github.com/peterwestuw/surface-form-competition)], 2021.4

4.
**Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions.** Preprint.

    *Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi.* [[pdf](https://arxiv.org/abs/2104.08773)], [[project](https://arxiv.org/abs/2104.08773)], 2021.4

5. **Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity**. Preprint. ![](https://img.shields.io/badge/Analysis-green)

    *Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp*. [[pdf](https://arxiv.org/abs/2104.08786)] 2021.4

6. **Meta-tuning Language Models to Answer Prompts Better.** Preprint. ![](https://img.shields.io/badge/meta_tuning-green)

    *Ruiqi Zhong, Kristy Lee\*, Zheng Zhang\*, Dan Klein*. [[pdf](https://arxiv.org/abs/2104.04670)] 2021.4

7. **True Few-Shot Learning with Language Models**. Preprint. ![](https://img.shields.io/badge/few_shot-green)

    *Ethan Perez, Douwe Kiela, Kyunghyun Cho*. [[pdf](https://arxiv.org/abs/2105.11447)], [[project](https://github.com/ethanjperez/true_few_shot)] 2021.5

8. **Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning**. Preprint. ![](https://img.shields.io/badge/theoretical-green)

    *Colin Wei, Sang Michael Xie, Tengyu Ma* [[pdf](https://arxiv.org/abs/2106.09226)], 2021.6

9. **Do Prompt-Based Models Really Understand the Meaning of their Prompts?** Preprint. ![](https://img.shields.io/badge/rethinking-green)

    *Albert Webson, Ellie Pavlick.* [[pdf](https://arxiv.org/abs/2109.01247)], [[project](https://github.com/awebson/prompt_semantics)] 2021.9

10. **Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning.** Preprint. ![](https://img.shields.io/badge/Sentence_Pair-brown) ![](https://img.shields.io/badge/shortcut-green)

    *Prasetya Ajie Utama, Nafise Sadat Moosavi, Victor Sanh, Iryna Gurevych.* [[pdf](https://arxiv.org/abs/2109.04144)], 2021.9

11. **Towards a Unified View of Parameter-Efficient Transfer Learning.** Preprint.
![](https://img.shields.io/badge/transfer-green) ![](https://img.shields.io/badge/unify-green)

    *Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig.* [[pdf](https://arxiv.org/abs/2110.04366)], 2021.10

12. **Exploring Low-dimensional Intrinsic Task Subspace via Prompt Tuning.** Preprint. ![](https://img.shields.io/badge/multitask-green)

    *Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, Jie Zhou* [[pdf](https://arxiv.org/abs/2110.07867)]

13. **Exploring the Universal Vulnerability of Prompt-based Learning Paradigm.** Findings of NAACL 2022. ![](https://img.shields.io/badge/vulnerability-green)

    *Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu* [[pdf](https://arxiv.org/abs/2204.05239)], [[project](https://github.com/leix28/prompt-universal-vulnerability)]

14. **Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?** arXiv 2022. ![](https://img.shields.io/badge/vulnerability-green)

    *Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer* [[pdf](https://arxiv.org/abs/2202.12837)], [[project](https://github.com/Alrope123/rethinking-demonstrations)]

15. **Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers.** Preprint. ![](https://img.shields.io/badge/generalization-green) ![](https://img.shields.io/badge/calibration-green)

    *Weng Lam Tam, Xiao Liu, Kaixuan Ji, Lilong Xue, Yuxiao Dong, Jiahua Liu, Maodi Hu, Jie Tang* [[pdf](https://arxiv.org/abs/2207.07087)] [[project](https://github.com/THUDM/P-tuning-v2/tree/main/PT-Retrieval)]

16. **Ignore Previous Prompt: Attack Techniques For Language Models.** Best Paper Award @ NeurIPS ML Safety Workshop 2022.
![](https://img.shields.io/badge/PromptInject-blue)

    *Fábio Perez, Ian Ribeiro* [[pdf](https://arxiv.org/abs/2211.09527)] [[project](https://github.com/agencyenterprise/PromptInject)], 2022.11

### Improvements

This section contains improvements to the basic prompt tuning methods, including but not limited to using additional resources to improve performance, addressing the shortcomings of previous work, or conducting prompt tuning in unusual ways.

1. **Calibrate Before Use: Improving Few-Shot Performance of Language Models.** Preprint. ![](https://img.shields.io/badge/Calibration-green)

    *Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh.* [[pdf](https://arxiv.org/abs/2102.09690)], [[project](https://github.com/tonyzhaozh/few-shot-learning)], 2021.2

2. **Text Generation with Efficient (Soft) Q-Learning.** Preprint. ![](https://img.shields.io/badge/Template_Generation-red) ![](https://img.shields.io/badge/RL-green)

    *Han Guo, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu.* [[pdf](https://arxiv.org/abs/2106.07704)], 2021.6

3. **Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification.** Preprint. ![](https://img.shields.io/badge/KPT-blue) ![](https://img.shields.io/badge/Discrete_Template-red) ![](https://img.shields.io/badge/knowledge-green)

    *Shengding Hu, Ning Ding, Huadong Wang, Zhiyuan Liu, Juanzi Li, Maosong Sun*. [[pdf](https://arxiv.org/abs/2108.02035)], [[project](https://github.com/ShengdingHu/KnowledgeablePromptTuning)], 2021.8

4. **Noisy Channel Language Model Prompting for Few-Shot Text Classification.** Preprint. ![](https://img.shields.io/badge/noisy_channel-green)

    *Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer.* [[pdf](https://arxiv.org/abs/2108.04106)], 2021.8

5.
**Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collection.** ![](https://img.shields.io/badge/meta_learning-green)

    *Ruiqi Zhong, Kristy Lee\*, Zheng Zhang\*, Dan Klein.* [[pdf](http://nlp.cs.berkeley.edu/pubs/Zhong-Lee-Zhang-Klein_2021_MetaTuning_paper.pdf)], 2021.9

6. **Revisiting Self-Training for Few-Shot Learning of Language Model.** Preprint. ![](https://img.shields.io/badge/SFLM-blue) ![](https://img.shields.io/badge/Self_Training-green)

    *Yiming Chen, Yan Zhang, Chen Zhang, Grandee Lee, Ran Cheng, Haizhou Li.* [[pdf](https://arxiv.org/abs/2110.01256)], 2021.10

7. **LiST: Lite Self-training Makes Efficient Few-shot Learners.** Preprint. ![](https://img.shields.io/badge/LiST-blue) ![](https://img.shields.io/badge/Lightweight_Self_Training-green)

    *Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao.* [[pdf](https://arxiv.org/abs/2110.06274)], 2021.10

8. **Prototypical Verbalizer for Prompt-based Few-shot Tuning.** ACL 2022. ![](https://img.shields.io/badge/ProtoVerb-blue) ![](https://img.shields.io/badge/Discrete_Template-red) ![](https://img.shields.io/badge/Verbalizer-red)

    *Ganqu Cui, Shengding Hu, Ning Ding, Longtao Huang, Zhiyuan Liu*. [[pdf](https://arxiv.org/abs/2203.09770)], [[project](https://github.com/thunlp/OpenPrompt)], 2022.3

9. **BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning.** Preprint. ![](https://img.shields.io/badge/BBTv2-blue) ![](https://img.shields.io/badge/Derivative--free_optimization-red)

    *Tianxiang Sun, Zhengfu He, Hong Qian, Xuanjing Huang, Xipeng Qiu* [[pdf](https://arxiv.org/abs/2205.11200)] [[project](https://github.com/txsun1997/Black-Box-Tuning)], 2022.5

### Specializations

This section contains the prompt learning methods designed for various NLP tasks.

1. **Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2.** Preprint.
![](https://img.shields.io/badge/Discrete_Template-red) ![](https://img.shields.io/badge/Reasoning-brown)

    *Gregor Betz, Kyle Richardson, Christian Voigt.* [[pdf](https://arxiv.org/abs/2103.13033)] 2021.3

2. **GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation.** Preprint. ![](https://img.shields.io/badge/GPT3Mix-blue) ![](https://img.shields.io/badge/Discrete-red) ![](https://img.shields.io/badge/DataAugmentation-brown)

    *Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park.* [[pdf](https://arxiv.org/abs/2104.08826)] 2021.4

3. **Constrained Language Models Yield Few-Shot Semantic Parsers.** Preprint. ![](https://img.shields.io/badge/Dynamic_Template-red) ![](https://img.shields.io/badge/Semantic_Parsing-brown)

    *Richard Shin, Christopher H. Lin, Sam Thomson, Charles Chen, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, Benjamin Van Durme.* [[pdf](https://arxiv.org/abs/2104.08768)] 2021.4

4. **Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction.** EMNLP 2021. ![](https://img.shields.io/badge/Relation_Extraction-brown)

    *Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre.* [[pdf](https://arxiv.org/abs/2109.03659)], 2021.4

5. **PADA: A Prompt-based Autoregressive Approach for Adaptation to Unseen Domains** Preprint. ![](https://img.shields.io/badge/PADA-blue) ![](https://img.shields.io/badge/Discrete-red) ![](https://img.shields.io/badge/DomainAdaptation-brown)

    *Eyal Ben-David, Nadav Oved, Roi Reichart.* [[pdf](https://arxiv.org/abs/2102.12206)][[project](https://github.com/eyalbd2/PADA)] 2021.5

6. **Prompt-Learning for Fine-grained Entity Typing.** Preprint. ![](https://img.shields.io/badge/PLET-blue) ![](https://img.shields.io/badge/EntityTyping-brown)

    *Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li, Hong-Gee Kim* [[pdf](https://arxiv.org/abs/2108.10604)], 2021.8

7.
**KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction.** WWW 2022. ![](https://img.shields.io/badge/KnowPrompt-blue) ![](https://img.shields.io/badge/Relation_Extraction-brown)

    *Xiang Chen, Xin Xie, Ningyu Zhang, Jiahuan Yan, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen.* [[pdf](https://arxiv.org/abs/2104.07650)], [[project](https://github.com/zjunlp/KnowPrompt)], 2021.9

8. **Exploring Prompt-based Few-shot Learning for Grounded Dialog Generation.** Preprint. ![](https://img.shields.io/badge/dialogue-brown)

    *Chujie Zheng, Minlie Huang.* [[pdf](https://arxiv.org/abs/2109.06513)], 2021.9

9. **SentiPrompt: Sentiment Knowledge Enhanced Prompt-Tuning for Aspect-Based Sentiment Analysis.** Preprint. ![](https://img.shields.io/badge/SentiPrompt-blue) ![](https://img.shields.io/badge/Sentiment_Analysis-brown)

    *Chengxi Li, Feiyu Gao, Jiajun Bu, Lu Xu, Xiang Chen, Yu Gu, Zirui Shao, Qi Zheng, Ningyu Zhang, Yongpan Wang, Zhi Yu.* [[pdf](https://arxiv.org/abs/2109.08306)] 2021.9

10. **Template-free Prompt Tuning for Few-shot NER.** Preprint. ![](https://img.shields.io/badge/EntLM-blue) ![](https://img.shields.io/badge/NER-brown)

    *Ruotian Ma\*, Xin Zhou\*, Tao Gui, Yiding Tan, Qi Zhang, Xuanjing Huang.* [[pdf](https://arxiv.org/abs/2109.13532)], 2021.9

11. **Learning to Prompt for Vision-Language Models.** Preprint. ![](https://img.shields.io/badge/CoOp-blue) ![](https://img.shields.io/badge/Visual_LM-brown)

    *Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu.* [[pdf](https://arxiv.org/abs/2109.01134)], 2021.9

12. **CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models.** Preprint. ![](https://img.shields.io/badge/CPT-blue) ![](https://img.shields.io/badge/Visual_Grounding-brown)

    *Yuan Yao\*, Ao Zhang\*, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun.* [[pdf](https://arxiv.org/abs/2109.11797)], 2021.10

13.
**MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators.** Preprint. ![](https://img.shields.io/badge/MSP-blue) ![](https://img.shields.io/badge/Machine_Translation-brown)

    *Zhixing Tan, Xiangwen Zhang, Shuo Wang, Yang Liu.* [[pdf](https://arxiv.org/abs/2110.06609)], 2021.10

15. **Few-Shot Bot: Prompt-Based Learning for Dialogue Systems.** Preprint. ![](https://img.shields.io/badge/dialogue-brown)

    *Andrea Madotto, Zhaojiang Lin, Genta Indra Winata, Pascale Fung* [[pdf](https://arxiv.org/abs/2110.08118)], 2021.10

16. **Control Prefixes for Text Generation.** Preprint. ![](https://img.shields.io/badge/Control_Prefixes-blue) ![](https://img.shields.io/badge/Generation-brown) ![](https://img.shields.io/badge/Dynamic_template-red)

    *Jordan Clive, Kris Cao, Marek Rei.* [[pdf](https://arxiv.org/abs/2110.08329)], 2021.10

17. **The Power of Prompt Tuning for Low-Resource Semantic Parsing.** Preprint. ![](https://img.shields.io/badge/Semantic_Parsing-brown)

    *Nathan Schucher, Siva Reddy, Harm de Vries.* [[pdf](https://arxiv.org/abs/2110.08525)], 2021.10

18. **A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models.** ![](https://img.shields.io/badge/FewVLM-blue) ![](https://img.shields.io/badge/Visual_LM-brown)

    *Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren.* [[pdf](https://arxiv.org/abs/2110.08484)]

19. **LightNER: A Lightweight Generative Framework with Prompt-guided Attention for Low-resource NER.** COLING 2022. ![](https://img.shields.io/badge/LightNER-blue) ![](https://img.shields.io/badge/NER-brown)

    *Xiang Chen, Lei Li, Shumin Deng, Chuanqi Tan, Changliang Xu, Fei Huang, Luo Si, Huajun Chen, Ningyu Zhang.* [[pdf](https://arxiv.org/abs/2109.00720)], [[project](https://github.com/zjunlp/DeepKE/tree/main/example/ner/few-shot)], 2021.8

20.
**UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models.** ![](https://img.shields.io/badge/UnifiedSKG-blue) ![](https://img.shields.io/badge/Structured_Knowledge_Grounding-brown) ![](https://img.shields.io/badge/Continuous_Template-red)

    *Tianbao Xie\*, Chen Henry Wu\*, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu.* [[pdf](https://arxiv.org/abs/2201.05966)], [[project](https://github.com/hkunlp/unifiedskg)], [[website](https://unifiedskg.com/)], 2022.1

21. **Ontology-enhanced Prompt-tuning for Few-shot Learning.** WWW 2022. ![](https://img.shields.io/badge/OntoPrompt-blue) ![](https://img.shields.io/badge/Information_Extraction-brown) ![](https://img.shields.io/badge/Knowledge_Graph_Completion-red)

    *Hongbin Ye, Ningyu Zhang, Shumin Deng, Xiang Chen, Hui Chen, Feiyu Xiong, Xi Chen, Huajun Chen.* [[pdf](https://arxiv.org/abs/2201.11332)], 2022.1

22. **Learning to Prompt for Continual Learning.** CVPR 2022. ![](https://img.shields.io/badge/L2P-blue) ![](https://img.shields.io/badge/Continual_Learning-brown) ![](https://img.shields.io/badge/Prompt_Pool-red)

    *Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister.* [[pdf](https://arxiv.org/abs/2112.08654)], [[project](https://github.com/google-research/l2p)], 2021.12

23. **Relation Extraction as Open-book Examination: Retrieval-enhanced Prompt Tuning.** SIGIR 2022.
![](https://img.shields.io/badge/RetrievalRE-blue) ![](https://img.shields.io/badge/Relation_Extraction-brown)

    *Xiang Chen, Lei Li, Ningyu Zhang, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen.* [[pdf](https://arxiv.org/abs/2205.02355)], [[project](https://github.com/zjunlp/PromptKG/tree/main/research/RetrievalRE)], 2022.5

24. **Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction.** Findings of NAACL 2022. ![](https://img.shields.io/badge/HVPNeT-blue) ![](https://img.shields.io/badge/Information_Extraction-brown) ![](https://img.shields.io/badge/Multimodal-red)

    *Xiang Chen, Ningyu Zhang, Lei Li, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen.* [[pdf](https://arxiv.org/abs/2205.03521)], [[project](https://github.com/zjunlp/HVPNeT)], 2022.5

25. **Chain of Thought Prompting Elicits Reasoning in Large Language Models.** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red)

    *Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, Denny Zhou.* [[pdf](https://arxiv.org/abs/2201.11903)]

26. **Self-Consistency Improves Chain of Thought Reasoning in Language Models.** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red)

    *Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou.* [[pdf](https://arxiv.org/abs/2203.11171)]

27. **Large Language Models are Zero-Shot Reasoners.** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red)

    *Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa.* [[pdf](https://arxiv.org/abs/2205.11916)]

28. **Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red)

    *Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc Le, Ed Chi.* [[pdf](https://arxiv.org/abs/2205.10625)]

29.
**Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations.** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red) *Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi* [[pdf](https://arxiv.org/abs/2205.11822)]
30. **On the Advance of Making Language Models Better Reasoners.** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red) *Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen* [[pdf](https://arxiv.org/abs/2206.02336)]
31. **Learning to Compose Soft Prompts for Compositional Zero-Shot Learning.** Preprint 2022. ![](https://img.shields.io/badge/CSP-blue) ![](https://img.shields.io/badge/Continuous_Template-red) *Nihal V. Nayak\*, Peilin Yu\*, Stephen H. Bach* [[pdf](https://arxiv.org/abs/2204.03574)], [[project](https://github.com/BatsResearch/csp)]
32. **Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning.** NeurIPS 2022. ![](https://img.shields.io/badge/RetroPrompt-blue) ![](https://img.shields.io/badge/Continuous_Template-red) *Xiang Chen, Lei Li, Ningyu Zhang, Xiaozhuan Liang, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, Huajun Chen.* [[pdf](https://arxiv.org/abs/2205.14704)], [[project](https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt)], 2022.5
33. **Exploring Length Generalization in Large Language Models.** Preprint 2022. ![](https://img.shields.io/badge/Continuous_Template-red) *Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur* [[pdf](https://arxiv.org/abs/2207.04901)]
34. **Ask Me Anything: A simple strategy for prompting language models.** Preprint 2022. ![](https://img.shields.io/badge/Continuous_Template-red) *Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré* [[pdf](https://arxiv.org/abs/2210.02441)]
35.
**Measuring And Narrowing The Compositionality Gap In Language Models** Preprint 2022. ![](https://img.shields.io/badge/Continuous_Template-red) *Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis* [[pdf](https://ofir.io/self-ask.pdf)]
36. **RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning** Preprint 2022. ![](https://img.shields.io/badge/Continuous_Template-red) *Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu* [[pdf](https://arxiv.org/abs/2205.12548)]
37. **Reasoning with Language Model Prompting: A Survey** Preprint 2022. ![](https://img.shields.io/badge/Reasoning-red) *Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen* [[pdf](https://arxiv.org/abs/2212.09597)]

## Contribution

### Other contributors

We thank [Yujia Qin](https://github.com/thuqinyj16), [Xiachong Feng](https://github.com/xcfcode), [Chenglei Si](https://github.com/NoviScl), [Tianbao Xie](https://github.com/Timothyxxx), and [Muhtasham Oblokulov](https://github.com/Muhtasham) for their paper recommendations.

### Contributing to this paper list

- First, think about which category the work should belong to.
- Second, use the same format as the other entries to describe the work. Note that there should be an empty line between the title and the authors list, and take care of the indentation.
- Then, add [keywords tags](#keywords-convention). Add the pdf link of the paper. If it is an arXiv publication, we prefer the /abs/ format to the /pdf/ format.

**Don't worry if you get any of this wrong; we will fix it for you.** Just contribute and promote your awesome work here!

If you recommend a work that isn't yours, you will be added to the contributor list (be sure to provide your information in [other contributors](#other-contributors)).

Education & Learning ML Frameworks

4.3K Github Stars

Open Source

WantWords

An open-source online reverse dictionary.

ML Frameworks CRM

7.1K Github Stars

Open Source

THUOCL

THUOCL（THU Open Chinese Lexicon）中文词库

Data Labeling

1.1K Github Stars

Open Source

OpenDelta

<div align="center">

<img src="https://s4.ax1x.com/2022/02/14/Hy7lAf.png" width="350px">

**An Open-Source Framework for Parameter-Efficient Tuning (Delta Tuning).**

------

<p align="center">
  <a href="#Overview">Overview</a> •
  <a href="#installation">Installation</a> •
  <a href="https://opendelta.readthedocs.io/en/latest/notes/usage.html">Basic Usage</a> •
  <a href="https://opendelta.readthedocs.io/">Docs</a> •
  <a href="https://docs.google.com/spreadsheets/d/1BIVa8ocAPga-u7rBOXLYaTfaJSjI1dWfwohmLjmFDrY/edit?usp=sharing">Performance</a> •
</p>

</div>

![version](https://img.shields.io/badge/version-0.3.2-blue)

## Overview

OpenDelta is a toolkit for parameter-efficient tuning methods (we dub it *delta tuning*), with which users can flexibly assign (or add) a small number of parameters to update while keeping most parameters frozen. With OpenDelta, users can easily implement prefix-tuning, adapters, LoRA, or any other type of delta tuning with their preferred PTMs.

- The latest version of OpenDelta is tested on Python==3.8.13, PyTorch==1.12.1, transformers==4.22.2. Other versions are likely to be supported as well. If you encounter bugs when using your own package versions, please raise an issue and we will look into it as soon as possible.

- **A demo of using OpenDelta to modify the PLM (e.g., BART).**

![How PLM changes using Delta-tuning](docs/source/imgs/demo.gif)

## News

- **2022.10.25** Release v0.3.2. Support [BMTrain](https://github.com/OpenBMB/BMTrain)! Improve docs. Add inspect utilities.
- **2022.10.14** Release v0.3.0. We make the default configurations of each delta tuning method (i.e., the positions they are attached to) friendlier to use! If a custom model has one of our supported models as a submodule, the default configuration is also available. Other key changes can be seen in the [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-3-0).
- **2022.10.10** Merge a long-developed branch v0.2.4 into the master branch.
Key updates are (1) an example unifying the delta tuning paradigm and the prompt-tuning paradigm; and (2) support for [Delta Center](https://www.openbmb.org/toolKits/deltacenter), whose webpage is still under construction. Details can be seen in the [Update Log](https://opendelta.readthedocs.io/en/latest/notes/update.html#version-0-2-4).
- **2022.03.24** We noticed several bugs in Soft Prompt Tuning and Prefix Tuning, mainly due to their need to customize attention ids and token_type_ids; we are fixing them! For now, please use the other methods, since they are more stable and perform better.
- **2022.03.20** Add a [Colab example](https://colab.research.google.com/drive/1uAhgAdc8Qr42UKYDlgUv0f7W1-gAFwGo?usp=sharing) to illustrate efficient training and space-saving multitask-serving.
- **2022.03.20** A new pip version released.
- **2022.02.16** Support [regular expression](https://opendelta.readthedocs.io/en/latest/notes/namebasedaddr.html#regexexpr) in name-based addressing.

## Installation

1. Create a virtualenv (optional)

```shell
conda create -n opendelta_env python=3.8
conda activate opendelta_env
```

2. Install the latest version

```bash
pip install git+https://github.com/thunlp/OpenDelta.git
```

**or** install the latest pip version (more stable)

```bash
pip install opendelta
```

**or** build from source

```bash
git clone git@github.com:thunlp/OpenDelta.git
cd OpenDelta
python setup.py install
# python setup.py develop # if you want to do some modifications on the code for your research:
```

## Must Try

The following code and comments walk you through the key functionality of OpenDelta. It is also in [must_try.py](https://github.com/thunlp/OpenDelta/tree/main/examples/unittest/must_try.py) and [must_try.ipynb in Colab](https://colab.research.google.com/drive/1Nbe9zxt8LGQnKmtvEs07IN_PznjNCyk4?usp=sharing).

```python
# use transformers as usual.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
t5_tokenizer = AutoTokenizer.from_pretrained("t5-large")
# A running example
inputs_ids = t5_tokenizer.encode("Is Harry Potter written by J.K. Rowling", return_tensors="pt")
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'


# use existing delta models
from opendelta import AutoDeltaModel, AutoDeltaConfig
# use existing delta models from DeltaCenter
delta = AutoDeltaModel.from_finetuned("thunlp/Spelling_Correction_T5_LRAdapter_demo", backbone_model=t5)
# freeze the whole backbone model except the delta models.
delta.freeze_module()
# visualize the change
delta.log()

t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by J.K. Rowling?</s>


# Now save merely the delta models, not the whole backbone model, to .tmp/
delta.save_finetuned(".tmp")
import os; os.listdir(".tmp")
# >>> The state dict size is 1.443 MB
# >>> We encourage users to push their final and public models to delta center to share them with the community!


# reload the model from local url and add it to pre-trained T5.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-large")
delta1 = AutoDeltaModel.from_finetuned(".tmp", backbone_model=t5)
import shutil; shutil.rmtree(".tmp") # don't forget to remove the tmp files.
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> <pad> Is Harry Potter written by J.K. Rowling?</s>

# detach the delta models, the model returns to the unmodified status.
delta1.detach()
t5_tokenizer.decode(t5.generate(inputs_ids)[0])
# >>> '<pad><extra_id_0>? Is it Harry Potter?</s>'

# use default configuration for customized wrapped models which have PLMs inside. This is a common need for users.
import torch.nn as nn
class WrappedModel(nn.Module):
    def __init__(self, inner_model):
        super().__init__()
        self.inner = inner_model
    def forward(self, *args, **kwargs):
        return self.inner(*args, **kwargs)

wrapped_model = WrappedModel(WrappedModel(t5))

# say we use LoRA
delta_config = AutoDeltaConfig.from_dict({"delta_type":"lora"})
delta2 = AutoDeltaModel.from_config(delta_config, backbone_model=wrapped_model)
delta2.log()
# >>> root
#       -- inner
#         -- inner
#           ...
#           ... lora_A:[8,1024], lora_B:[1024,8]
delta2.detach()

# use a non-default configuration
# say we add lora to the last four layers of the decoder of t5, with lora rank=5
# (raw string, so that the "\." escape in the regular expression is passed through unchanged)
delta_config3 = AutoDeltaConfig.from_dict({"delta_type":"lora", "modified_modules":[r"[r]decoder.*((20)|(21)|(22)|(23)).*DenseReluDense\.wi"], "lora_r":5})
delta3 = AutoDeltaModel.from_config(delta_config3, backbone_model=wrapped_model)
delta3.log()
```

## Verified Default Configurations

- **You can try to use OpenDelta on *any* backbone model based on PyTorch.**
- However, there is a small chance that the interface of a backbone model's submodules is not supported. We have therefore verified some commonly used models that OpenDelta is sure to support.
- We will keep testing more and more emerging models.
- Pull requests are welcome when you successfully apply OpenDelta to your own backbone model.
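
The non-default configuration in the Must Try example selects modules with a regular expression (the `[r]` prefix in `modified_modules`). The sketch below only illustrates which names such a pattern would pick out, using Python's standard `re` module. The module names are hypothetical, written in the style of a wrapped Hugging Face T5, and the use of `re.search` is an assumption about the matching semantics, not OpenDelta's actual implementation.

```python
import re

# Pattern from the Must Try configuration, without the "[r]" prefix
# that marks it as a regular expression.
PATTERN = r"decoder.*((20)|(21)|(22)|(23)).*DenseReluDense\.wi"

# Hypothetical module names in the style of a wrapped T5 model.
module_names = [
    "inner.inner.encoder.block.23.layer.1.DenseReluDense.wi",
    "inner.inner.decoder.block.3.layer.2.DenseReluDense.wi",
    "inner.inner.decoder.block.20.layer.2.DenseReluDense.wi",
    "inner.inner.decoder.block.23.layer.2.DenseReluDense.wi",
    "inner.inner.decoder.block.23.layer.2.DenseReluDense.wo",
]


def select_modules(names, pattern):
    """Return the names in which the pattern matches anywhere (assumed semantics)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


selected = select_modules(module_names, PATTERN)
for name in selected:
    print(name)
```

Under these assumptions only the `wi` projections of decoder blocks 20 and 23 are selected; the encoder module, the early decoder block and the `wo` projection are left untouched. Checking a pattern against `named_modules()` output in this way before building a delta model is a cheap way to confirm that it addresses the layers you intend.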

## Citation

```bibtex
@article{hu2023opendelta,
  title={OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models},
  author={Hu, Shengding and Ding, Ning and Zhao, Weilin and Lv, Xingtai and Zhang, Zhen and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2307.03084},
  year={2023}
}
```

```bibtex
@article{ding2022delta,
  title={Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models},
  author={Ding, Ning and Qin, Yujia and Yang, Guang and Wei, Fuchao and Yang, Zonghan and Su, Yusheng and Hu, Shengding and Chen, Yulin and Chan, Chi-Min and Chen, Weize and others},
  journal={arXiv preprint arXiv:2203.06904},
  year={2022}
}
```

ML Frameworks

1K Github Stars

Open Source

OpenAttack

<p align="center"> <img src="docs/source/images/logo.svg" width = "400" alt="OpenAttack Logo" align=center /> </p> <p align="center"> <a target="_blank"> <img src="https://github.com/thunlp/OpenAttack/workflows/Test/badge.svg?branch=master" alt="Github Runner Covergae Status"> </a> <a href="https://openattack.readthedocs.io/" target="_blank"> <img src="https://readthedocs.org/projects/openattack/badge/?version=latest" alt="ReadTheDoc Status"> </a> <a href="https://pypi.org/project/OpenAttack/" target="_blank"> <img src="https://img.shields.io/pypi/v/OpenAttack?label=pypi" alt="PyPI version"> </a> <a href="https://github.com/thunlp/OpenAttack/releases" target="_blank"> <img src="https://img.shields.io/github/v/release/thunlp/OpenAttack" alt="GitHub release (latest by date)"> </a> <a target="_blank"> <img alt="GitHub" src="https://img.shields.io/github/license/thunlp/OpenAttack"> </a> <a target="_blank"> <img src="https://img.shields.io/badge/PRs-Welcome-red" alt="PRs are Welcome"> </a> <br><br> <a href="https://openattack.readthedocs.io/" target="_blank">Documentation</a> • <a href="#features--uses">Features & Uses</a> • <a href="#usage-examples">Usage Examples</a> • <a href="#attack-models">Attack Models</a> • <a href="#toolkit-design">Toolkit Design</a> <br> </p> OpenAttack is an open-source Python-based textual adversarial attack toolkit, which handles the whole process of textual adversarial attacking, including preprocessing text, accessing the victim model, generating adversarial examples and evaluation. ## Features & Uses #### OpenAttack has the following features: ⭐️ **Support for all attack types**. OpenAttack supports all types of attacks including sentence-/word-/character-level perturbations and gradient-/score-/decision-based/blind attack models; ⭐️ **Multilinguality**. OpenAttack supports English and Chinese now. Its extensible design enables quick support for more languages; ⭐️ **Parallel processing**. 
OpenAttack provides support for multi-process running of attack models to improve attack efficiency; ⭐️ **Compatibility with 🤗 Hugging Face**. OpenAttack is fully integrated with 🤗 [Transformers](https://github.com/huggingface/transformers) and [Datasets](https://github.com/huggingface/datasets) libraries; ⭐️ **Great extensibility**. You can easily attack a customized <u>victim model</u> on any customized <u>dataset</u> or develop and evaluate a customized <u>attack model</u>. #### OpenAttack has a wide range of uses, including: ✅ Providing various handy **baselines** for attack models; ✅ Comprehensively **evaluating** attack models using its thorough evaluation metrics; ✅ Assisting in quick development of **new attack models** with the help of its common attack components; ✅ Evaluating the **robustness** of a machine learning model against various adversarial attacks; ✅ Conducting **adversarial training** to improve robustness of a machine learning model by enriching the training data with generated adversarial examples. ## Installation #### 1. Using `pip` (recommended) ```bash pip install OpenAttack ``` #### 2. Cloning this repo ```bash git clone https://github.com/thunlp/OpenAttack.git cd OpenAttack python setup.py install ``` After installation, you can try running `demo.py` to check if OpenAttack works well: ``` python demo.py ``` ![demo](./docs/source/images/demo.gif) ## Usage Examples #### Attack Built-in Victim Models OpenAttack builds in some commonly used NLP models like BERT ([Devlin et al. 2018](https://arxiv.org/abs/1810.04805)) and RoBERTa ([Liu et al. 2019](https://arxiv.org/abs/1907.11692)) that have been fine-tuned on some commonly used datasets (such as [SST-2](https://nlp.stanford.edu/sentiment/treebank.html)). You can effortlessly conduct adversarial attacks against these built-in victim models. 
The following code snippet shows how to use PWWS, a greedy algorithm-based attack model ([Ren et al., 2019](https://www.aclweb.org/anthology/P19-1103.pdf)), to attack BERT on the SST-2 dataset (the complete executable code is [here](./examples/workflow.py)).

```python
import OpenAttack as oa
import datasets # use the Hugging Face's datasets library
# change the SST dataset into 2-class
def dataset_mapping(x):
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }
# choose a trained victim classification model
victim = oa.DataManager.loadVictim("BERT.SST")
# choose 20 examples from SST-2 as the evaluation data
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)
# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()
# prepare for attacking (the package is imported as `oa`, so use that alias)
attack_eval = oa.AttackEval(attacker, victim)
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)
```

<details>
<summary><strong>Customized Victim Model</strong></summary>

The following code snippet shows how to use PWWS to attack a **customized sentiment analysis model** (a statistical model built in NLTK) on SST-2 (the complete executable code is [here](./examples/custom_victim.py)).

```python
import OpenAttack as oa
import numpy as np
import datasets
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


# configure access interface of the customized victim model by extending OpenAttack.Classifier.
class MyClassifier(oa.Classifier):
    def __init__(self):
        # nltk.sentiment.vader.SentimentIntensityAnalyzer is a traditional sentiment classification model.
        nltk.download('vader_lexicon')
        self.model = SentimentIntensityAnalyzer()

    def get_pred(self, input_):
        return self.get_prob(input_).argmax(axis=1)

    # access the classification probability scores with respect to the input sentences
    def get_prob(self, input_):
        ret = []
        for sent in input_:
            # SentimentIntensityAnalyzer calculates scores of "neg" and "pos" for each instance
            res = self.model.polarity_scores(sent)

            # we use score_pos / (score_neg + score_pos) to represent the probability of positive sentiment
            # Adding 10^-6 is a trick to avoid dividing by zero.
            prob = (res["pos"] + 1e-6) / (res["neg"] + res["pos"] + 2e-6)

            ret.append(np.array([1 - prob, prob]))

        # The get_prob method finally returns a np.ndarray of shape (len(input_), 2). See Classifier for detail.
        return np.array(ret)

def dataset_mapping(x):
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }

# load some examples of SST-2 for evaluation
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)
# choose the customized classifier as the victim model
victim = MyClassifier()
# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()
# prepare for attacking
attack_eval = oa.AttackEval(attacker, victim)
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)
```

</details>

<details>
<summary><strong>Customized Dataset</strong></summary>

The following code snippet shows how to use PWWS to attack an existing fine-tuned sentiment analysis model on a **customized** dataset (the complete executable code is [here](./examples/custom_dataset.py)).
```python
import OpenAttack as oa
import transformers
import datasets

# load a fine-tuned sentiment analysis model from Transformers (you can also use our fine-tuned Victim.BERT.SST)
tokenizer = transformers.AutoTokenizer.from_pretrained("echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid")
model = transformers.AutoModelForSequenceClassification.from_pretrained("echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid", num_labels=2, output_hidden_states=False)
victim = oa.classifiers.TransformersClassifier(model, tokenizer, model.bert.embeddings.word_embeddings)

# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()

# create your customized dataset
dataset = datasets.Dataset.from_dict({
    "x": [
        "I hate this movie.",
        "I like this apple."
    ],
    "y": [
        0, # 0 for negative
        1, # 1 for positive
    ]
})

# prepare for attacking
attack_eval = oa.AttackEval(attacker, victim, metrics = [oa.metric.EditDistance(), oa.metric.ModificationRate()])
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)
```

</details>

<details>
<summary><strong>Multiprocessing</strong></summary>

OpenAttack supports convenient multiprocessing to accelerate the process of adversarial attacks. The following code snippet shows how to use multiprocessing in adversarial attacks with Genetic ([Alzantot et al. 2018](https://www.aclweb.org/anthology/D18-1316)), a genetic algorithm-based attack model (the complete executable code is [here](./examples/multiprocess_eval.py)).
```python
import OpenAttack as oa
import datasets

def dataset_mapping(x):
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }

victim = oa.loadVictim("BERT.SST")
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)

attacker = oa.attackers.GeneticAttacker()
attack_eval = oa.AttackEval(attacker, victim)
# Use multiprocessing simply by specifying num_workers
attack_eval.eval(dataset, visualize=True, num_workers=4)
```

</details>

<details>
<summary><strong>Chinese Attack</strong></summary>

OpenAttack now supports adversarial attacks against English and Chinese victim models. [Here](./examples/chinese.py) is example code for conducting adversarial attacks against a Chinese review classification model using PWWS.

</details>

<details>
<summary><strong>Customized Attack Model</strong></summary>

OpenAttack incorporates many handy components that can be easily assembled into new attack models. [Here](./examples/custom_attacker.py) is an example of how to design a simple attack model that shuffles the tokens in the original sentence.

</details>

<details>
<summary><strong>Adversarial Training</strong></summary>

OpenAttack can easily generate adversarial examples by attacking instances in the training set. These examples can be added to the original training data to retrain a more robust victim model, i.e., adversarial training. [Here](./examples/adversarial_training.py) is an example of how to conduct adversarial training with OpenAttack.

</details>

<details>
<summary><strong>More Examples</strong></summary>

- Attack Sentence Pair Classification Models. In addition to single sentence classification models, OpenAttack supports attacks against sentence pair classification models. [Here](./examples/nli_attack.py) is example code for conducting adversarial attacks against an NLI model with OpenAttack.
- Customized Evaluation Metric. OpenAttack supports designing a customized adversarial attack evaluation metric.
[Here](./examples/custom_eval.py) gives an example of how to add a customized evaluation metric and use it to evaluate adversarial attacks. </details> ## Attack Models According to the level of perturbations imposed on original input, textual adversarial attack models can be categorized into sentence-level, word-level, character-level attack models. According to the accessibility to the victim model, textual adversarial attack models can be categorized into `gradient`-based, `score`-based, `decision`-based and `blind` attack models. > [TAADPapers](https://github.com/thunlp/TAADpapers) is a paper list which summarizes almost all the papers concerning textual adversarial attack and defense. You can have a look at this list to find more attack models. Currently OpenAttack includes 15 typical attack models against text classification models that cover **all** attack types. Here is the list of currently involved attack models. - Sentence-level - (SEA) **Semantically Equivalent Adversarial Rules for Debugging NLP Models**. *Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin*. ACL 2018. `decision` [[pdf](https://aclweb.org/anthology/P18-1079)] [[code](https://github.com/marcotcr/sears)] - (SCPN) **Adversarial Example Generation with Syntactically Controlled Paraphrase Networks**. *Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer*. NAACL-HLT 2018. `blind` [[pdf](https://www.aclweb.org/anthology/N18-1170)] [[code&data](https://github.com/miyyer/scpn)] - (GAN) **Generating Natural Adversarial Examples**. *Zhengli Zhao, Dheeru Dua, Sameer Singh*. ICLR 2018. `decision` [[pdf](https://arxiv.org/pdf/1710.11342.pdf)] [[code](https://github.com/zhengliz/natural-adversary)] - Word-level - (TextFooler) **Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment**. *Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits*. AAAI-20. 
`score` [[pdf](https://arxiv.org/pdf/1907.11932v4)] [[code](https://github.com/wqj111186/TextFooler)] - (PWWS) **Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency**. *Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che*. ACL 2019. `score` [[pdf](https://www.aclweb.org/anthology/P19-1103.pdf)] [[code](https://github.com/JHL-HUST/PWWS/)] - (Genetic) **Generating Natural Language Adversarial Examples**. *Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang*. EMNLP 2018. `score` [[pdf](https://www.aclweb.org/anthology/D18-1316)] [[code](https://github.com/nesl/nlp_adversarial_examples)] - (SememePSO) **Word-level Textual Adversarial Attacking as Combinatorial Optimization**. *Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu and Maosong Sun*. ACL 2020. `score` [[pdf](https://www.aclweb.org/anthology/2020.acl-main.540.pdf)] [[code](https://github.com/thunlp/SememePSO-Attack)] - (BERT-ATTACK) **BERT-ATTACK: Adversarial Attack Against BERT Using BERT**. *Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu*. EMNLP 2020. `score` [[pdf](https://www.aclweb.org/anthology/2020.emnlp-main.500.pdf)] [[code](https://github.com/LinyangLee/BERT-Attack)] - (BAE) **BAE: BERT-based Adversarial Examples for Text Classification**. *Siddhant Garg, Goutham Ramakrishnan. EMNLP 2020*. `score` [[pdf](https://www.aclweb.org/anthology/2020.emnlp-main.498.pdf)] [[code](https://github.com/QData/TextAttack/blob/master/textattack/attack_recipes/bae_garg_2019.py)] - (FD) **Crafting Adversarial Input Sequences For Recurrent Neural Networks**. *Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang*. MILCOM 2016. `gradient` [[pdf](https://arxiv.org/pdf/1604.08275.pdf)] - Word/Char-level - (TextBugger) **TEXTBUGGER: Generating Adversarial Text Against Real-world Applications**. *Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang*. NDSS 2019. 
`gradient` `score` [[pdf](https://arxiv.org/pdf/1812.05271.pdf)] - (UAT) **Universal Adversarial Triggers for Attacking and Analyzing NLP.** *Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh*. EMNLP-IJCNLP 2019. `gradient` [[pdf](https://arxiv.org/pdf/1908.07125.pdf)] [[code](https://github.com/Eric-Wallace/universal-triggers)] [[website](http://www.ericswallace.com/triggers)] - (HotFlip) **HotFlip: White-Box Adversarial Examples for Text Classification**. *Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou*. ACL 2018. `gradient` [[pdf](https://www.aclweb.org/anthology/P18-2006)] [[code](https://github.com/AnyiRao/WordAdver)] - Char-level - (VIPER) **Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems**. *Steffen Eger, Gözde Gül ¸Sahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych*. NAACL-HLT 2019. `score` [[pdf](https://www.aclweb.org/anthology/N19-1165)] [[code&data](https://github.com/UKPLab/naacl2019-like-humans-visual-attacks)] - (DeepWordBug) **Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers**. *Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi*. IEEE SPW 2018. `score` [[pdf](https://ieeexplore.ieee.org/document/8424632)] [[code](https://github.com/QData/deepWordBug)] The following table illustrates the comparison of the attack models. 

| Model       | Accessibility   | Perturbation | Main Idea                                             |
| :---------: | :-------------: | :----------: | :---------------------------------------------------- |
| SEA         | Decision        | Sentence     | Rule-based paraphrasing                               |
| SCPN        | Blind           | Sentence     | Paraphrasing                                          |
| GAN         | Decision        | Sentence     | Text generation by encoder-decoder                    |
| TextFooler  | Score           | Word         | Greedy word substitution                              |
| PWWS        | Score           | Word         | Greedy word substitution                              |
| Genetic     | Score           | Word         | Genetic algorithm-based word substitution             |
| SememePSO   | Score           | Word         | Particle Swarm Optimization-based word substitution   |
| BERT-ATTACK | Score           | Word         | Greedy contextualized word substitution               |
| BAE         | Score           | Word         | Greedy contextualized word substitution and insertion |
| FD          | Gradient        | Word         | Gradient-based word substitution                      |
| TextBugger  | Gradient, Score | Word+Char    | Greedy word substitution and character manipulation   |
| UAT         | Gradient        | Word, Char   | Gradient-based word or character manipulation         |
| HotFlip     | Gradient        | Word, Char   | Gradient-based word or character substitution         |
| VIPER       | Blind           | Char         | Visually similar character substitution               |
| DeepWordBug | Score           | Char         | Greedy character manipulation                         |

## Toolkit Design

Considering the significant distinctions among different attack models, we leave considerable freedom for the skeleton design of attack models, and focus more on streamlining the general processing of adversarial attacking and the common components used in attack models.
OpenAttack has 7 main modules:

<img src="./docs/source/images/toolkit_framework.png" alt="toolkit_framework" style="zoom:40%;" />

* **TextProcessor**: processing the original text sequence to assist attack models in generating adversarial examples;
* **Victim**: wrapping victim models;
* **Attacker**: comprising various attack models;
* **AttackAssist**: packing different word/character substitution methods that are used in word-/character-level attack models and some other components used in sentence-level attack models like the paraphrasing model;
* **Metric**: providing several adversarial example quality metrics that can serve as either the constraints on the adversarial examples during attacking or evaluation metrics for evaluating adversarial attacks;
* **AttackEval**: evaluating textual adversarial attacks from attack effectiveness, adversarial example quality and attack efficiency;
* **DataManager**: managing all data and saved models that are used in other modules.

## Citation

Please cite our [paper](https://aclanthology.org/2021.acl-demo.43.pdf) if you use this toolkit:

```
@inproceedings{zeng2020openattack,
  title={{Openattack: An open-source textual adversarial attack toolkit}},
  author={Zeng, Guoyang and Qi, Fanchao and Zhou, Qianrui and Zhang, Tingji and Hou, Bairu and Zang, Yuan and Liu, Zhiyuan and Sun, Maosong},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations},
  pages={363--371},
  year={2021},
  url={https://aclanthology.org/2021.acl-demo.43},
  doi={10.18653/v1/2021.acl-demo.43}
}
```

## Contributors

We thank all the contributors to this project. More contributions are very welcome.

<a href="https://github.com/thunlp/OpenAttack/graphs/contributors"> <img src="https://contrib.rocks/image?repo=thunlp/OpenAttack"/></a>
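
As a footnote to the customized victim model example in Usage Examples: its `get_prob` method turns the `pos` and `neg` scores into a probability with a small smoothing constant. The sketch below isolates that one formula in plain Python so its edge cases are easy to see. The function name `positive_probability` and the sample scores are illustrative assumptions, not part of OpenAttack or NLTK.

```python
def positive_probability(pos: float, neg: float, eps: float = 1e-6) -> float:
    """Smoothed share of the positive score; eps keeps the denominator non-zero."""
    return (pos + eps) / (neg + pos + 2 * eps)


# Both scores zero: the unsmoothed ratio would be 0/0; with smoothing it is 0.5.
neutral = positive_probability(0.0, 0.0)

# A clearly positive sentence stays close to, but strictly below, 1.
positive = positive_probability(0.5, 0.0)

# Swapping the two scores gives the complementary probability.
complement = positive_probability(0.0, 0.5)

print(neutral, positive, complement)
```

Because the same constant is added to both scores, the positive and negative probabilities still sum to one, which is what the `[1 - prob, prob]` rows returned by `get_prob` rely on.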

ML Frameworks Testing & QA

777 Github Stars

Open Source

UltraChat

<div align="center">

<img src="figures/ultra_logo.png" width="250px">

**Large-scale, Informative, and Diverse Multi-round Dialogue Data, and Models**

<p align="center">
  <a href="#UltraLM"> UltraLM</a> •
  <a href="http://39.101.77.220/">Data Explorer</a> •
  <a href="https://atlas.nomic.ai/map/0ce65783-c3a9-40b5-895d-384933f50081/a7b46301-022f-45d8-bbf4-98107eabdbac">Nomic AI Atlas Explorer</a> •
  <a href="#data">Data Release</a> •
  <a href="#construction-of-ultrachat">Construction Process</a> •
  <a href="https://arxiv.org/abs/2305.14233">Paper</a>
</p>

</div>

<div align="center">

![Dialogues](https://img.shields.io/badge/Current\_Dialogues-1.57M-red?style=flat-square)
![Dialogues](https://img.shields.io/badge/UltraLM\_13B-Released-blue?style=flat-square)
![Dialogues](https://img.shields.io/badge/UltraChat-Released-green?style=flat-square)

</div>

## News

- **❗️❗️ September 26, 2023: Releasing a large-scale, fine-grained, diverse preference dataset [UltraFeedback](https://github.com/thunlp/UltraFeedback), a new language model [UltraLM-13B-v2.0](https://huggingface.co/openbmb/UltraLM-13b-v2.0), a reward model [UltraRM](https://huggingface.co/openbmb/UltraRM-13b) and a critic model [UltraCM](https://huggingface.co/openbmb/UltraCM-13b).**
- **❗️❗️ With UltraRM, a simple best-of-16 sampling achieves 92.30% (UltraLM2, 🥇 in 13B results) and 91.54% (UltraLM, 🥇 in LLaMA-1 results) win rates against text-davinci-003 on [AlpacaEval benchmark](https://tatsu-lab.github.io/alpaca_eval/).**
- August 7, 2023: Releasing the first version of UltraLM-65B.
- June 28, 2023: UltraLM-13B ranks #1 among open-source models on [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/)!
- June 27, 2023: We release the first version of UltraLM-13B, a powerful chat language model trained on UltraChat!
- April 20, 2023: Released all data, more processing and additional data are expected.
- April 17, 2023: The rest of the Writing and Creation sector is released (457k).
- April 12, 2023: The first part of the Writing and Creation sector is released.
- April 9, 2023: Supported by [gpt4all](https://github.com/nomic-ai/gpt4all), we now also have an [Atlas explorer](https://atlas.nomic.ai/map/0ce65783-c3a9-40b5-895d-384933f50081/a7b46301-022f-45d8-bbf4-98107eabdbac).
- April 8, 2023: We release a training script by taking GPT-J as an example.
- April 7, 2023: The second part of Questions about the World is released. It contains 290k generated multi-round dialogues.
- March 31, 2023: The first part of Questions about the World is released. It contains 280k generated multi-round dialogues.

## UltraLM

UltraLM is a series of chat language models trained on UltraChat. Currently, we have released the 13B version, which **ranks #1** among open-source models and **ranks #4** among all models on the [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/) (June 28, 2023). UltraLM-13B is based upon LLaMA-13B and supported by [BMTrain](https://github.com/OpenBMB/BMTrain) in the training process.

#### Download

| Model | Link | Version |
| ------------- | ------------- | ------------- |
| UltraLM-13B | [Huggingface Repo](https://huggingface.co/openbmb/UltraLM-13b) | v1.0 |
| UltraLM-65B | [Huggingface Repo](https://huggingface.co/openbmb/UltraLM-65b) | v1.0 |
| UltraLM-13B | [Huggingface Repo](https://huggingface.co/openbmb/UltraLM-13b-v2.0) | v2.0 |
| UltraRM-13B | [Huggingface Repo](https://huggingface.co/openbmb/UltraRM-13b) | v1.0 |
| UltraCM-13B | [Huggingface Repo](https://huggingface.co/openbmb/UltraCM-13b) | v1.0 |

#### Use UltraLM

- To reconstruct UltraLM, please download LLaMA-13B and our Delta weights. Please use this [link](https://huggingface.co/openbmb/UltraLM-13b) if you would like to test UltraLM; we haven't evaluated other weights provided by third parties.
- Run the script in `/UltraLM/recover.sh` to obtain the final weights of the recovered model.
- After obtaining the recovered model, replace the model path in `/UltraLM/chat_cli.sh` with your path and run it to start chatting!

*Note: Different hyper-parameters or system prompts will affect the outputs. You can refer to details in `/UltraLM/inference_cli.py` for our default setting.*

<details><summary> <b> Performance </b> </summary>
<p>

We report three evaluations in this section: Alpaca-Eval from Stanford, Evol-instruct from Microsoft's WizardLM, and our curated evaluation set. Evaluations of modern LLMs may be biased and affected by many factors; we are also actively working on more comprehensive evaluation methods.

#### Alpaca-Eval

[AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) is a leaderboard specifically designed for evaluating LLMs. The leaderboard is based on the win rate against text-davinci-003, automatically evaluated by GPT-4.

<img src="figures/alpaca.png" width="550px">

#### Evol-instruct

This dataset is constructed with an evolutionary strategy by rewriting the instructions through multiple rounds to obtain instructions at different complexity levels. The benchmark is developed by the [WizardLM project](https://github.com/nlpxucan/WizardLM), another excellent chat language model!

![Results](figures/wizard_test.jpg)

#### Our Evaluation Set

We curate an evaluation set encompassing the [Vicuna Benchmark](https://lmsys.org/blog/2023-03-30-vicuna/) and an additional 300 questions and instructions generated by GPT-4. The questions/instructions cover a wide range of topics, including commonsense, world knowledge, professional knowledge (specifically physics and biology), mathematics, and writing tasks at different levels of difficulty. We use GPT-4 for evaluation. [Here](data/ultra_eval.json) is the dataset.

![Results](figures/compare_ultra.jpg)

</p>
</details>

<details><summary> <b> Examples of UltraLM </b> </summary>
<p>

- Example A: Question of Common Knowledge

> **User:** Which animal is known as man's best friend?

> **UltraLM:** There are many animals that humans enjoy as companion animals, but one of the absolute standout answers to the question "which animal is known as a man's best friend?" is the domestically-bred dog. Domesticated dogs have been by humans' sides for tens of thousands of years, providing faithful companionship and help; guarding people and property; and lending emotional support in everything from long, lonely days to military deployments. Dogs come in an almost mind-boggling variety of breeds of various sizes, with different energy levels, personalities, and temperaments, meaning there's almost certainly a pup out there who will be a perfect fit for whoever needs it most.

- Example B: Commonsense Reasoning

> **User:** Which weighs more: a pound of feathers or a pound of bricks?

> **UltraLM:** A pound of feathers and a pound of bricks both weigh exactly one pound. This is because one pound is a unit of weight, just as a foot is a unit of length or a second is a unit of time. We use the pound (or other unit of weight) to describe the weight of different objects, regardless of what that object is made of. So the answer to your question is that a pound of feathers and a pound of bricks both weigh the same amount.

- Example C: Reasoning

> **User:** If a farmer has 17 sheep and all but 9 die, how many are left?

> **UltraLM:** The total number of sheep that are left is 9.

- Example D: Reasoning

> **User:** If a red house is made from red bricks, and a blue house is made from blue bricks, what is a greenhouse made from?

> **UltraLM:** A greenhouse is not made from green bricks, but rather from materials such as glass or plastic that allow light to pass through and create a controlled environment for plants to grow. Greenhouses can come in a variety of colors, including green, but the color of the building itself does not affect the plants growing inside.

- Example E: Creation

> **User:** Write a poem about electromagnetism.

> **UltraLM:** Electromagnetism is a force we can't ignore,
> It's the reason for many things we adore.
> In a magnetic field, a wire will glow,
> And the electrons will align, both old and new.
> Electricity and magnetism are hand in hand,
> One creates a field, the other has a brand.
> The magnetic field can push and pull,
> On a wire with current, it will react in full.
> The attraction and repulsion it brings,
> Are the reason for many things.
> From motors to generators, and much more in between,
> Electromagnetism is at the root of what it means.
> It's a force that's all around us,
> From the compass to the light switch.
> With its power, it can do much,
> From a simple needle to a rocket's thrust.
> It's a force we can't deny,
> For it's a part of our lives.
> Electromagnetism is something to admire,
> For its strength and its might, it has no equal, it's sure.

- Example F: Creation

> **User:** Write a short story about the first second of the universe.

> **UltraLM:** In the first second of the universe, darkness reigned supreme. There was no light, no sound, no shape, no substance. Nothing yet was, but there was an intense power, a raw energy waiting to be unleashed. Within infinity, the first second held the potential for creation in its grasp. And then, with a mighty bang, a bright point of light emerged into existence. In it, the most fundamental particles were born, their forms yet unknown as they frenetically collided in a chaotic dance of creation. Within this primordial light, every force of nature and every component of life was stirred from pure energy to physical reality. The first atoms quickly assembled, and from these, stars and galaxies began to form. Their fusion creating a hot environment that pushed back the very void, shaping the cosmos through expanding and cooling waves. In this cosmic first, no structure of experience, not even time or space, could establish its rule.
For an instant, the entirety of the universe existed in an embryonic nexus, a cosmic singularity from which all future existence originated. It was a microcosm forging the macrocosm of which it ultimately becomes a part, its moment of birth a metaphor for the cycles of creation that permeate the multiverse(s). From that single point of light, an infinite universe unfolded. For centuries, eons ahead, and forevermore, it remained the most foundational blink of its unceasing journey into the depths of the Great Beyond. The first second is the beginning of everything.

</p>
</details>

## Overview of UltraChat

This project aims to construct *open-source, large-scale, and multi-round* dialogue data powered by Turbo APIs to facilitate the construction of powerful language models with general conversational capability. In consideration of factors such as safeguarding privacy, **we do not directly use any data available on the Internet as prompts**.

<details><summary> <b> UltraChat is composed of three sectors </b> </summary>
<p>

- 🌏 **Questions about the World**: The dialogue data in this sector is derived from a wide range of inquiries related to concepts, entities, and objects from the real world. The topics covered are extensive, spanning areas such as technology, art, and entrepreneurship.
- ✍🏻 **Writing and Creation**: The dialogue data in this sector is driven by the demands for writing/creation from scratch, and encompasses any task an AI assistant may help with in the creative process, from email composition to crafting narratives and plays, and beyond.
- 📋 **Assistance on Existent Materials**: The dialogue data in this sector is generated based on existing materials, including but not limited to rewriting, continuation, summarization, and inference, covering a diverse range of topics.

</p>
</details>

*Disclaimer: Although the process of building UltraChat does NOT involve any publicly available benchmark data, scaling to a certain extent may still result in some overlap with some evaluation benchmarks. We would like to emphasize again that **all the data is automatically generated (including the instructions and responses)**, and we do not insert any open benchmark data. For example, UltraChat was released (April, 2023) **earlier** than Alpaca Eval (May, 2023). We encourage users to closely monitor such phenomena, while we are also actively considering how to evaluate LLMs more properly.*

<details><summary> <b>An Example of UltraChat </b> </summary>
<p>

<div align="center">
<img src="https://i.328888.xyz/2023/04/02/iHh8DC.png" width="900px">
</div>

</p>
</details>

## Data

The dataset is intended solely for research and educational purposes and should not be construed as reflecting the opinions or views of the creators, owners, or contributors of this dataset. It is distributed under the MIT license.

### Data Release

[Explore](http://39.101.77.220/) the data before downloading, or use the [Atlas explorer](https://atlas.nomic.ai/map/0ce65783-c3a9-40b5-895d-384933f50081/a7b46301-022f-45d8-bbf4-98107eabdbac).

- 🤗 [Huggingface Datasets Host](https://huggingface.co/datasets/stingning/ultrachat)

Direct Download links:

- [Questions about the World [Part I + Part II]](https://cloud.tsinghua.edu.cn/f/0a27393192ad46a5a081/?dl=1)
- [Writing and Creation [Part I]](https://cloud.tsinghua.edu.cn/f/57258a87846243218a9b/?dl=1)
- [Writing and Creation [Part II]](https://cloud.tsinghua.edu.cn/f/099b4dd71b82448fb7fb/?dl=1)
- [Assistance on Existent Materials [Part I]](https://cloud.tsinghua.edu.cn/f/1f7abdf2d2564cb4b338/?dl=1)

### Data Format

Each line in the downloaded data file is a JSON dict containing the data id and the dialogue data in list format. Below is an example line.

```JSON
{
  "id": "0",
  "data": [
    "How can cross training benefit groups like runners, swimmers, or weightlifters?",
    "Cross training can benefit groups like runners, swimmers, or weightlifters in the following ways: ...",
    "That makes sense. I've been wanting to improve my running time, but I never thought about incorporating strength training. Do you have any recommendations for specific exercises?",
    "Sure, here are some strength training exercises that can benefit runners: ...",
    "Hmm, I'm not really a fan of weightlifting though. Can I incorporate other forms of exercise into my routine to improve my running time?",
    "Yes, absolutely! ...",
    "..."
  ]
}
```

## Training

We provide training code in [`./src/`](src) to fine-tune [LLaMA](https://github.com/facebookresearch/llama) on UltraChat (we do not distribute the LLaMA weights). Training is accelerated by [BMTrain](https://github.com/OpenBMB/BMTrain).

- Download the released data and put it under `./data`
- Run `train_bm.py`, for example:

```bash
WANDB_MODE="offline" torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:50003 train_bm.py --tensorboard ./ultrachat_llama_tb_2 --save_step 5000 --logging_step 100
```

We also provide a training script in [`./src/train_legacy/`](src) to fine-tune GPT-J on UltraChat. It is implemented with [OpenPrompt](https://github.com/thunlp/OpenPrompt).

- Download the released data and put it under `./data`
- Run `accelerate launch train.py` to start training

## Construction of UltraChat

The general idea of UltraChat is to use separate LLMs to generate opening lines, simulate users, and respond to queries. Each sector of UltraChat has its own challenges and requires particular strategy designs. We will specify the construction process once a sector of UltraChat is released.

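The general idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used to build UltraChat: the three "models" are stub functions standing in for LLM API calls, every name here (`build_dialogue`, `to_json_line`, the `stub_*` functions) is hypothetical, and the assumption that one round means one user turn plus one assistant reply is ours. The output is serialized in the line format described under Data Format.

```python
import json
from typing import Callable, List

# Hypothetical interfaces: each "model" is just a function.
OpeningModel = Callable[[str], str]        # topic -> opening line
TurnModel = Callable[[List[str]], str]     # dialogue so far -> next utterance


def build_dialogue(topic: str, opening_model: OpeningModel, user_model: TurnModel,
                   assistant_model: TurnModel, num_rounds: int) -> List[str]:
    """Alternate user and assistant turns, starting from a generated opening line."""
    dialogue = [opening_model(topic)]               # first user turn
    for round_idx in range(num_rounds):
        dialogue.append(assistant_model(dialogue))  # assistant responds
        if round_idx < num_rounds - 1:
            dialogue.append(user_model(dialogue))   # simulated user follows up
    return dialogue


def to_json_line(dialogue_id: str, dialogue: List[str]) -> str:
    """Serialize one dialogue as a single JSON line with "id" and "data" keys."""
    return json.dumps({"id": dialogue_id, "data": dialogue}, ensure_ascii=False)


# Stub models (assumptions) so the sketch runs without any API access.
def stub_opening(topic: str) -> str:
    return f"question about {topic}"


def stub_user(history: List[str]) -> str:
    return f"follow-up question {len(history) // 2}"


def stub_assistant(history: List[str]) -> str:
    return f"answer to: {history[-1]}"


line = to_json_line(
    "0", build_dialogue("running", stub_opening, stub_user, stub_assistant, num_rounds=3)
)
record = json.loads(line)
print(len(record["data"]))  # 3 rounds -> 6 utterances (user and assistant alternate)
```

In a real setup each stub would be replaced by a call to a separate model, with its own prompt for opening lines, user simulation, and assistant responses.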
<div align="center">
<img src="figures/ultra-process.png" width="700px">
</div>

<details><summary> <b>Questions about the World</b> </summary>
<p>

#### Meta Topics & Sub-Topics

- The data is derived from 30 representative and diverse meta topics (icons are from [flaticon](https://www.flaticon.com/)).

<div align="center">
<img src="figures/meta_topic.png" width="650px">
</div>

- Based on the above meta topics, we generate 1100+ subtopics for data construction.
- For each subtopic, we generate up to 10 specific questions.
- Then we use Turbo APIs to generate new relevant questions for each of the 10 questions. We use hand-crafted prompts to instruct the model to generate a diverse set of questions covering a wide range of common concepts and objects.
- For each question, we generate a 3~7-round dialogue using the two models iteratively as described above.

</p>
<p>

#### Common Real-world Entities

- We gather the 10,000 most frequent named entities from Wikidata.
- We generate 5 meta questions for each entity using the ChatGPT API.
- For each meta question, we generate 10 more specific questions and 20 related but general questions.
- We sample 200k specific questions and 250k general questions along with the 50k meta questions, and we generate a 3~7-round dialogue for each.

</p>
</details>

<details><summary> <b>Writing and Creation</b> </summary>
<p>

- We first collect 20 types of writing, as shown below (icons are from [flaticon](https://www.flaticon.com/)).

<div align="center">
<img src="https://github.com/thunlp/UltraChat/raw/main/figures/figure.png" width="650px">
</div>

- For each type of writing, we generate 200 different instructions that ask an AI assistant to generate text material; 80% of the instructions are further expanded and detailed.
- We use the generated instructions as initial input and generate a 2~4-round dialogue for each.

</p>
</details>

<details><summary> <b>Assistance on Existent Materials</b> </summary>
<p>

- We extract ~100k diverse materials from the C4 dataset.
- We generate up to 5 questions/instructions for each piece of material.
- We combine the material with each question/instruction using a set of manually designed templates to form the initial user input that starts a dialogue with the AI assistant.
- For each input, we generate a 2~4-round dialogue.

</p>
</details>

## To Do

- [x] Release the rest of the data for Questions about the World.
- [x] Continue to release the data of Writing and Creation.
- [x] Continue to release the data of Assistance on Existent Materials in the future.
- [x] Train a model on UltraChat and conduct in-detail analysis. Welcome to use it to train your chat model!
- [ ] There will be a Chinese version of UltraChat.

## Limitations

- Although UltraChat could perform promisingly on several benchmarks, it still has limitations such as hallucinations. In our experience, UltraLM can handle questions about world knowledge and creation. Its reasoning, math, and coding abilities still need to be explicitly enhanced.

## Citation

Feel free to cite the repo if you think UltraChat is useful.

```bibtex
@article{ding2023enhancing,
  title={Enhancing Chat Language Models by Scaling High-quality Instructional Conversations},
  author={Ding, Ning and Chen, Yulin and Xu, Bokai and Qin, Yujia and Zheng, Zhi and Hu, Shengding and Liu, Zhiyuan and Sun, Maosong and Zhou, Bowen},
  journal={arXiv preprint arXiv:2305.14233},
  year={2023}
}
```
