bigscience-workshop

Open Source

petals

<img src="https://i.imgur.com/7eR7Pan.png" width="400">

Run large language models at home, BitTorrent-style.
Fine-tuning and inference <a href="https://github.com/bigscience-workshop/petals#benchmarks">up to 10x faster</a> than offloading

<a href="https://pypi.org/project/petals/"><img src="https://img.shields.io/pypi/v/petals.svg?color=green"></a> <a href="https://discord.gg/tfHfe8B34k"><img src="https://img.shields.io/discord/865254854262652969?label=discord&logo=discord&logoColor=white"></a>

Generate text with distributed **Llama 3.1** (up to 405B), **Mixtral** (8x22B), **Falcon** (40B+) or **BLOOM** (176B) and fine-tune them for your own tasks, right from your desktop computer or Google Colab:

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...
```

🚀 <a href="https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing">Try now in Colab</a>

🦙 **Want to run Llama?** [Request access](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our [chatbot app](https://chat.petals.dev).

🔏 **Privacy.** Your data will be processed with the help of other people in the public swarm. Learn more about privacy [here](https://github.com/bigscience-workshop/petals/wiki/Security,-privacy,-and-AI-safety).
For sensitive data, you can set up a [private swarm](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm) among people you trust.

💬 **Any questions?** Ping us in [our Discord](https://discord.gg/KdThf2bWVU)!

## Connect your GPU and increase Petals capacity

Petals is a community-run system: we rely on people sharing their GPUs. You can help by serving one of the [available models](https://health.petals.dev) or by hosting a new model from 🤗 [Model Hub](https://huggingface.co/models)!

As an example, here is how to host a part of [Llama 3.1 (405B) Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) on your GPU:

🦙 **Want to host Llama?** [Request access](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) to its weights, then run `huggingface-cli login` in the terminal before loading the model.

🐧 **Linux + Anaconda.** Run these commands for NVIDIA GPUs (or follow [this](https://github.com/bigscience-workshop/petals/wiki/Running-on-AMD-GPU) for AMD):

```bash
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
```

🪟 **Windows + WSL.** Follow [this guide](https://github.com/bigscience-workshop/petals/wiki/Run-Petals-server-on-Windows) on our Wiki.
🐋 **Docker.** Run our [Docker](https://www.docker.com) image for NVIDIA GPUs (or follow [this](https://github.com/bigscience-workshop/petals/wiki/Running-on-AMD-GPU) for AMD):

```bash
sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct
```

🍏 **macOS + Apple M1/M2 GPU.** Install [Homebrew](https://brew.sh/), then run these commands:

```bash
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
```

📚 <a href="https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#running-a-server">Learn more</a> (how to use multiple GPUs, start the server on boot, etc.)

🔒 **Security.** Hosting a server does not allow others to run custom code on your computer. Learn more [here](https://github.com/bigscience-workshop/petals/wiki/Security,-privacy,-and-AI-safety).

💬 **Any questions?** Ping us in [our Discord](https://discord.gg/X7DgtxgMhc)!

🏆 **Thank you!** Once you load and host 10+ blocks, we can show your name or link on the [swarm monitor](https://health.petals.dev) as a way to say thanks. You can specify your name or link with `--public_name YOUR_NAME`.

## How does it work?

- You load a small part of the model, then join a [network](https://health.petals.dev) of people serving the other parts. Single-batch inference runs at up to **6 tokens/sec** for **Llama 2** (70B) and up to **4 tokens/sec** for **Falcon** (180B), which is enough for [chatbots](https://chat.petals.dev) and interactive apps.
- You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of **PyTorch** and **🤗 Transformers**.
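
To make the first point concrete, here is a toy Python sketch of how a client could pick a chain of servers that together cover every transformer block, assuming each server holds one contiguous span of blocks. It is an illustration, not Petals's routing code: the `Server` class, the `build_chain` function and the server names are hypothetical, and the greedy rule used here (always take the server that reaches furthest) only minimizes the number of network hops, whereas the real client is more elaborate (for example, it also considers how fast each server is).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Server:
    """Hypothetical peer serving a contiguous span of blocks [start, end)."""
    name: str
    start: int  # first block held (inclusive)
    end: int    # one past the last block held (exclusive)


def build_chain(servers: List[Server], num_blocks: int) -> List[Tuple[str, int, int]]:
    """Greedily choose servers that together run blocks 0..num_blocks-1 in order.

    Each tuple is (server name, first block to run there, block to stop at).
    Taking the candidate that reaches furthest at each step minimizes the
    number of servers (network hops) in the chain.
    """
    chain: List[Tuple[str, int, int]] = []
    pos = 0  # next block that still needs a server
    while pos < num_blocks:
        candidates = [s for s in servers if s.start <= pos < s.end]
        if not candidates:
            raise RuntimeError(f"no server currently holds block {pos}")
        best = max(candidates, key=lambda s: s.end)
        stop = min(best.end, num_blocks)
        chain.append((best.name, pos, stop))
        pos = stop
    return chain


swarm = [
    Server("alice", 0, 30),
    Server("bob", 20, 60),
    Server("carol", 25, 50),
    Server("dave", 55, 80),
]
for name, first, stop in build_chain(swarm, num_blocks=80):
    print(f"{name}: blocks {first}..{stop - 1}")
```

With this toy swarm, the chain is alice (blocks 0..29), bob (30..59) and dave (60..79); carol is skipped because bob already covers everything she holds and reaches further. If no server holds some block, the sketch raises an error instead of waiting for a new peer to join.
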
<img src="https://i.imgur.com/RTYF3yW.png" width="800">

📜 <a href="https://arxiv.org/pdf/2209.01188.pdf">Read paper</a> &nbsp;&nbsp;&nbsp; 📚 <a href="https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions">See FAQ</a>

## 📚 Tutorials, examples, and more

Basic tutorials:

- Getting started: [tutorial](https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing)
- Prompt-tune Llama-65B for text semantic classification: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-sst2.ipynb)
- Prompt-tune BLOOM to create a personified chatbot: [tutorial](https://colab.research.google.com/github/bigscience-workshop/petals/blob/main/examples/prompt-tuning-personachat.ipynb)

Useful tools:

- [Chatbot web app](https://chat.petals.dev) (connects to Petals via an HTTP/WebSocket endpoint): [source code](https://github.com/petals-infra/chat.petals.dev)
- [Monitor](https://health.petals.dev) for the public swarm: [source code](https://github.com/petals-infra/health.petals.dev)

Advanced guides:

- Launch a private swarm: [guide](https://github.com/bigscience-workshop/petals/wiki/Launch-your-own-swarm)
- Run a custom model: [guide](https://github.com/bigscience-workshop/petals/wiki/Run-a-custom-model-with-Petals)

### Benchmarks

Please see **Section 3.3** of our [paper](https://arxiv.org/pdf/2209.01188.pdf).

### 🛠️ Contributing

Please see our [FAQ](https://github.com/bigscience-workshop/petals/wiki/FAQ:-Frequently-asked-questions#contributing) on contributing.

### 📜 Citations

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. [Petals: Collaborative Inference and Fine-tuning of Large Models.](https://arxiv.org/abs/2209.01188) _Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)._ 2023.
```bibtex
@inproceedings{borzunov2023petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages = {558--568},
  year = {2023},
  url = {https://arxiv.org/abs/2209.01188}
}
```

Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel. [Distributed inference and fine-tuning of large language models over the Internet.](https://arxiv.org/abs/2312.08361) _Advances in Neural Information Processing Systems_ 36 (2023).

```bibtex
@inproceedings{borzunov2023distributed,
  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},
  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {12312--12331},
  year = {2023},
  url = {https://arxiv.org/abs/2312.08361}
}
```

--------------------------------------------------------------------------------

This project is a part of the <a href="https://bigscience.huggingface.co/">BigScience</a> research workshop.

<img src="https://petals.dev/bigscience.png" width="150">

AI & Machine Learning LLM Tools & Chat UIs

10.2K GitHub Stars

promptsource

Toolkit for creating, sharing and using natural language prompts.

Developer Tools ML Frameworks

3K GitHub Stars

bigscience

# bigscience

[Research workshop on large language models - The Summer of Language Models 21](https://bigscience.huggingface.co/)

At the moment we have 2 code repos:

1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

- [JZ](./jz/) - lots of information about our work environment, which helps us evaluate, plan and get things done
- [Experiments](./experiments) - many experiments are being done. Documentation, result tables, scripts and logs are all there
- [Datasets info](./data/)
- [Train](./train) - all the information about the current trainings (see below for the most important ones)

We have READMEs for specific aspects, such as:

- [hub integration](./tools/README.md)

## Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: [Lessons learned](train/lessons-learned.md)

### Train 1 - 13B - unmodified Megatron gpt2 - baseline

* [the full spec and discussions](./train/tr1-13B-base)
* [the training script](./train/tr1-13B-base/tr1-13B-round1.slurm)
* checkpoints and logs:
   - [tensorboard](https://huggingface.co/bigscience/tr1-13B-tensorboard/tensorboard)
   - [logs](https://huggingface.co/bigscience/tr1-13B-logs/)
* [chronicles](./train/tr1-13B-base/chronicles.md)

You can watch the training logs live by running this `tail -f`-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
```

### Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2.
Here are links to the respective tensorboards:

| Dataset + warmup / model size | 1B3 | 760M | 350M | 125M |
|-------------------------------|-----|------|------|------|
| C4 + low warmup | [a](https://huggingface.co/bigscience/tr3-1B3-modeling-baseline-tensorboard) | [b](https://huggingface.co/bigscience/tr3b-760M-modeling-baseline-tensorboard) | [c](https://huggingface.co/bigscience/tr3c-350M-modeling-baseline-tensorboard) | |
| OSCAR + low warmup | [f](https://huggingface.co/bigscience/tr3f-1B3-diagnostic2-low-warmup-oscar-tensorboard) | | | |
| C4 + high warmup | [e](https://huggingface.co/bigscience/tr3e-1B3-diagnostic1-warmup-c4-tensorboard) | | | |
| OSCAR + high warmup | **[d (current baseline)](https://huggingface.co/bigscience/tr3d-1B3-more-warmup-tensorboard)** | [g](https://huggingface.co/bigscience/tr3g-760M-v2-tensorboard) | [h](https://huggingface.co/bigscience/tr3h-350M-v2-tensorboard) | [i](https://huggingface.co/bigscience/tr3i-125M-v2-tensorboard) |
| Pile + high warmup | [m](https://huggingface.co/bigscience/tr3m-1B3-pile-tensorboard) | [j](https://huggingface.co/bigscience/tr3j-760M-pile-tensorboard) | [k](https://huggingface.co/bigscience/tr3k-350M-pile-tensorboard) | [l](https://huggingface.co/bigscience/tr3l-125M-pile-tensorboard) |

### Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

* [the full spec and discussions](./train/tr8-104B-wide)
* [the training script](./train/tr8-104B-wide/tr8-104B.slurm)
* checkpoints and logs:
   - [tensorboard](https://huggingface.co/bigscience/tr8-104B-logs/tensorboard)
   - [logs](https://huggingface.co/bigscience/tr8-104B-logs/tree/main/logs)
* [chronicles](./train/tr8-104B-wide/chronicles.md)

You can watch the training logs live by running this `tail -f`-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9
```

### Train 11

**This is the current main training**

tr11-176B-ml

* [the full spec and discussions](./train/tr11-176B-ml/)
* [the training script](./train/tr11-176B-ml/tr11-176B-ml.slurm)
* checkpoints and logs:
   - [tensorboard](https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard)
   - [logs](https://huggingface.co/bigscience/tr11-176B-ml-logs/tree/main/logs/main)
* [chronicles-prequel](./train/tr11-176B-ml/chronicles-prequel.md)
* [chronicles](./train/tr11-176B-ml/chronicles.md)

You can watch the training logs live by running this `tail -f`-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt
```
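
For readers without Perl, here is a rough Python equivalent of the log-watching one-liners above, using only the standard library. It is a sketch, not a script from this repo, and it differs in one design choice: instead of a HEAD request for `content-length` followed by a ranged `curl`, it sends a single open-ended `Range: bytes=N-` request per poll and treats HTTP 416 (Range Not Satisfiable) as "nothing new yet". The helper names (`range_header`, `unseen_part`, `poll_once`, `tail`) are hypothetical, and the sketch assumes the Hub serves these log files with HTTP range support.

```python
import sys
import time
import urllib.error
import urllib.request


def range_header(offset: int) -> str:
    """Open-ended HTTP Range value: every byte from `offset` to the end."""
    return f"bytes={offset}-"


def unseen_part(status: int, body: bytes, offset: int) -> bytes:
    """Keep only the bytes after `offset`.

    206 Partial Content: the server honored Range, so the whole body is new.
    200 OK: the server ignored Range and sent the full file, so slice it.
    """
    if status == 206:
        return body
    if status == 200:
        return body[offset:]
    raise ValueError(f"unexpected HTTP status {status}")


def poll_once(url: str, offset: int) -> int:
    """Print any bytes appended since `offset` and return the new offset."""
    request = urllib.request.Request(url, headers={"Range": range_header(offset)})
    try:
        # urlopen follows redirects; the default redirect handler copies the
        # Range header onto the redirected request.
        with urllib.request.urlopen(request, timeout=60) as response:
            chunk = unseen_part(response.status, response.read(), offset)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # Range Not Satisfiable: no new bytes yet
            return offset
        raise
    # Write raw bytes so a multi-byte character split across polls is not mangled.
    sys.stdout.buffer.write(chunk)
    sys.stdout.buffer.flush()
    return offset + len(chunk)


def tail(url: str, interval: float = 300.0) -> None:
    """Poll forever, sleeping `interval` seconds like the Perl loop's `sleep 300`."""
    offset = 0
    while True:
        offset = poll_once(url, offset)
        time.sleep(interval)
```

Usage would be, for example, `tail("https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt")`. Like the Perl version, it prints the whole file on the first poll and only the appended tail afterwards; since the hub copy is synced hourly, polling every 5 minutes mostly returns nothing.
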

ML Frameworks

1K GitHub Stars
