About Agent-FLAN

Agent-FLAN is an open-source initiative that fine-tunes Large Language Models to act as autonomous agents. Published in ACL 2024 Findings, it addresses the performance gap between open models and API-based agents by proposing a data generation pipeline and training methodology. The project identifies three challenges in current agent tuning: training formats that entangle format following with agent reasoning and shift away from the pre-training distribution, varying learning speeds across the required capabilities, and hallucination. To address these, Agent-FLAN decomposes and redesigns the training corpus using a mix of the AgentInstruct, ToolBench, and ShareGPT datasets. The resulting models, including a 7B-parameter version fine-tuned from Llama2, are evaluated on Held-In and Held-Out agent tasks. Key results include improved tool utilization, a 3.5% performance gain over prior best works, and reduced hallucination through comprehensively constructed negative samples.


Published by

internlm



Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

✨ Introduction

[🤗 HuggingFace] [🧰 OpenXLab] [📃 Paper] [🌐 Project Page]

Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents. How to integrate agent ability into general LLMs has become a crucial and urgent problem. This paper first delivers three key observations: (1) the current agent training corpus entangles format following with agent reasoning, which shifts it significantly from the distribution of the pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches to improving agent abilities have the side effect of introducing hallucinations. Based on these findings, we propose Agent-FLAN to effectively Fine-tune LANguage models for Agents. Through careful decomposition and redesign of the training corpus, Agent-FLAN enables Llama2-7B to outperform prior best works by 3.5% across various agent evaluation datasets. With comprehensively constructed negative samples, Agent-FLAN greatly alleviates hallucination issues, as measured on our established evaluation benchmark. In addition, it consistently improves the agent capability of LLMs as model size scales, while slightly enhancing their general capability.

🚀 What's New

[2024.3.21] Paper available on arXiv. 🔥🔥🔥
[2024.3.20] Released the dataset and model checkpoint for Agent-FLAN. 🎉🎉🎉

♟️ Agent-FLAN

The Agent-FLAN series is fine-tuned on AgentInstruct and ToolBench by applying the data generation pipeline proposed in the Agent-FLAN paper, and shows strong abilities on various agent tasks and tool utilization.

Comparison of recent agent tuning approaches on Held-In, Held-Out tasks. Performances are normalized with GPT-4 results for better visualization. * denotes our re-implementation for a fair comparison.

🤗 HuggingFace Model & Dataset

Agent-FLAN is produced by mixed training on the AgentInstruct, ToolBench, and ShareGPT datasets, starting from the Llama2-chat series.

The models follow the conversation format of Llama-2-chat, with the template protocol as:

dict(role='user', begin='<|Human|>െ', end='\n '),
dict(role='system', begin='<|Human|>െ', end='\n '),
dict(role='assistant', begin='<|Assistant|>െ', end='ി\n '),
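
As a rough illustration of how the template above might be applied, the sketch below concatenates each message as `begin + content + end`. The `build_prompt` helper, the `messages` structure, and the choice to append an open assistant `begin` marker for generation are assumptions made for this example; they are not taken from the Agent-FLAN codebase. Only the `begin`/`end` strings are copied from the protocol above.

```python
# Begin/end markers copied verbatim from the template protocol above.
# Everything else in this snippet is a hypothetical helper for illustration.
TEMPLATE = {
    "user": {"begin": "<|Human|>െ", "end": "\n "},
    "system": {"begin": "<|Human|>െ", "end": "\n "},
    "assistant": {"begin": "<|Assistant|>െ", "end": "ി\n "},
}


def build_prompt(messages, add_generation_prompt=True):
    """Join a list of {'role', 'content'} dicts into a single prompt string.

    Assumption: turns are simply concatenated in order, and an open
    assistant marker is appended so the model continues as the assistant.
    """
    parts = []
    for message in messages:
        markers = TEMPLATE[message["role"]]
        parts.append(markers["begin"] + message["content"] + markers["end"])
    if add_generation_prompt:
        parts.append(TEMPLATE["assistant"]["begin"])
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "You can call tools."},
        {"role": "user", "content": "What is the weather in Paris?"},
    ]
    # repr() makes the newline and trailing-space separators visible.
    print(repr(build_prompt(demo)))
```

Note that, per the template, the system and user roles share the same `<|Human|>` marker, so a system message is rendered like a user turn.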

The 7B model is available on the HuggingFace and OpenXLab model hubs.

Model	Huggingface Repo	OpenXLab Repo
Agent-FLAN-7B	Model Link	Model Link

The Agent-FLAN dataset is also available on the HuggingFace dataset hub.

Dataset	Huggingface Repo
Agent-FLAN	Dataset Link

💫 Detailed Results

Main results of Agent-FLAN. Agent-FLAN outperforms previous agent-tuning approaches by a large margin on both held-in and held-out tasks. * denotes our re-implementation with the same amount of training data for a fair comparison. Since FireAct does not train on the AgentInstruct dataset, we omit its performance on the held-in set. Bold: the best among API-based and open-sourced models.

❤️ Acknowledgements

Agent-FLAN is built with Lagent and T-Eval. Thanks for their awesome work!

🖊️ Citation

If you find this project useful in your research, please consider citing:

@article{chen2024agent,
  title={Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models},
  author={Chen, Zehui and Liu, Kuikun and Wang, Qiuchen and Zhang, Wenwei and Liu, Jiangning and Lin, Dahua and Chen, Kai and Zhao, Feng},
  journal={arXiv preprint arXiv:2403.12881},
  year={2024}
}

💳 License

This project is released under the Apache 2.0 license.
