minimax-ai

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Software by minimax-ai

Open Source

MiniMax-01

<div align="center">
  <picture>
    <source srcset="figures/MiniMaxLogo-Dark.png" media="(prefers-color-scheme: dark)">
    <img src="figures/MiniMaxLogo-Light.png" width="60%" alt="MiniMax">
  </picture>
</div>
<hr>

[Homepage](https://www.minimax.io) · [Paper](https://arxiv.org/abs/2501.08313) · [MiniMax Chat](https://chat.minimax.io/) · [API Platform](https://www.minimax.io/platform) · [MiniMax MCP](https://github.com/MiniMax-AI/MiniMax-MCP) · [Hugging Face](https://huggingface.co/MiniMaxAI) · [WeChat](https://github.com/MiniMax-AI/MiniMax-AI.github.io/blob/main/images/wechat-qrcode.jpeg) · [Model License](https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE-MODEL) · [Code License (MIT)](https://github.com/MiniMax-AI/MiniMax-01/blob/main/LICENSE-CODE)

# MiniMax-01

## 1. Introduction

We are delighted to introduce two remarkable models, **MiniMax-Text-01** and **MiniMax-VL-01**.

MiniMax-Text-01 is a powerful language model boasting 456 billion total parameters, with 45.9 billion activated per token. To unlock its long-context capabilities, it adopts a hybrid architecture integrating Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies like Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), its training context length extends to 1 million tokens, and it can handle up to 4 million tokens during inference.
Consequently, MiniMax-Text-01 showcases top-tier performance on various academic benchmarks.

Building on MiniMax-Text-01's prowess, we developed MiniMax-VL-01 for enhanced visual capabilities. It uses the "ViT-MLP-LLM" framework common in multimodal LLMs. It is initialized and trained using three key components: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM. This model features a dynamic resolution mechanism. Input images are resized according to a pre-set grid, with resolutions ranging from 336×336 to 2016×2016, while maintaining a 336×336 thumbnail. The resized images are split into non-overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined to form a full image representation. As a result, MiniMax-VL-01 has achieved top-level performance on multimodal leaderboards, demonstrating its edge in complex multimodal tasks.

<img width="100%" src="figures/TextBench.png">
<img width="100%" src="figures/VisionBench.png">

## 2. Model Architecture

The architecture of MiniMax-Text-01 is briefly described as follows:

- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number of layers: 80
- Hybrid Attention: a softmax attention layer is positioned after every 7 lightning attention layers.
  - Number of attention heads: 64
  - Attention head dimension: 128
- Mixture of Experts:
  - Number of experts: 32
  - Expert hidden dimension: 9216
  - Top-2 routing strategy
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064

For MiniMax-VL-01, the additional ViT architecture details are as follows:

- Total Parameters: 303M
- Number of layers: 24
- Patch size: 14
- Hidden size: 1024
- FFN hidden size: 4096
- Number of heads: 16
- Attention head dimension: 64

## 3. Evaluation

### Text Benchmarks

#### Core Academic Benchmarks

| **Tasks** | **GPT-4o (11-20)** | **Claude-3.5-Sonnet (10-22)** | **Gemini-1.5-Pro (002)** | **Gemini-2.0-Flash (exp)** | **Qwen2.5-72B-Inst.** | **DeepSeek-V3** | **Llama-3.1-405B-Inst.** | **MiniMax-Text-01** |
|---|---|---|---|---|---|---|---|---|
| **General** | | | | | | | | |
| MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | **88.6** | 88.5 |
| MMLU-Pro* | 74.4 | **78.0** | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | **39.0** | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | **67.4** |
| IFEval _(avg)_ | 84.1 | **90.1** | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | **92.4** | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| **Reasoning** | | | | | | | | |
| GPQA* _(diamond)_ | 46.0 | **65.0** | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP* _(F1)_ | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | **92.5** | 87.8 |
| **Mathematics** | | | | | | | | |
| GSM8k* | 95.6 | **96.9** | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH* | 76.6 | 74.1 | **84.6** | 83.9 | 81.8 | **84.6** | 73.8 | 77.4 |
| **Coding** | | | | | | | | |
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | **78.8** | 73.0 | 71.7 |
| HumanEval | 90.2 | **93.7** | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |

\* Evaluated following a _0-shot CoT_ setting.
#### Long Benchmarks

**4M Needle In A Haystack Test**

<img width="90%" src="figures/niah.png">

**Ruler**

| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|-------|----|----|-----|-----|-----|------|------|------|----|
| **GPT-4o (11-20)** | **0.970** | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| **Claude-3.5-Sonnet (10-22)** | 0.965 | 0.960 | 0.957 | 0.950 | **0.952** | 0.938 | - | - | - |
| **Gemini-1.5-Pro (002)** | 0.962 | 0.960 | **0.960** | **0.958** | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| **Gemini-2.0-Flash (exp)** | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| **MiniMax-Text-01** | 0.963 | **0.961** | 0.953 | 0.954 | 0.943 | **0.947** | **0.945** | **0.928** | **0.910** |

**LongBench v2**

| **Model** | **overall** | **easy** | **hard** | **short** | **medium** | **long** |
|---|---|---|---|---|---|---|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| **w/ CoT** | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| **MiniMax-Text-01** | **56.5** | **66.1** | **50.5** | **61.7** | **56.7** | **47.2** |
| **w/o CoT** | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | **44.4** |
| **MiniMax-Text-01** | **52.9** | **60.9** | **47.9** | **58.9** | **52.6** | 43.5 |

**MTOB**

| **Context Type** | **no context** | **half book** | **full book** | **Δ half book** | **Δ full book** |
|---|---|---|---|---|---|
| **eng → kalam (ChrF)** | | | | | |
| GPT-4o (11-20) | 9.90 | **54.30** | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | **57.90** | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| **MiniMax-Text-01** | 6.0 | 51.74 | 51.60 | **45.7** | **45.6** |
| **kalam → eng (BLEURT)** | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | **61.52** | **63.09** | **29.50** | **31.07** |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| **MiniMax-Text-01** | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |

### Vision Benchmarks

| Tasks | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2-VL-72B-Inst. | InternVL2.5-78B | Llama-3.2-90B | MiniMax-VL-01 |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| **Knowledge** | | | | | | | | |
| MMMU* | 63.5 | **72.0** | 68.4 | 70.6 | 64.5 | 66.5 | 62.1 | 68.5 |
| MMMU-Pro* | 54.5 | 54.7 | 50.9 | **57.0** | 43.2 | 47.3 | 36.0 | 52.7 |
| **Visual Q&A** | | | | | | | | |
| ChartQA* _(relaxed)_ | 88.1 | 90.8 | 88.7 | 88.3 | 91.2 | 91.5 | 85.5 | **91.7** |
| DocVQA* | 91.1 | 94.2 | 91.5 | 92.9 | **97.1** | 96.1 | 90.1 | 96.4 |
| OCRBench | 806 | 790 | 800 | 846 | 856 | 847 | 805 | **865** |
| **Mathematics & Sciences** | | | | | | | | |
| AI2D* | 83.1 | 82.0 | 80.9 | 85.1 | 84.4 | **86.8** | 78.9 | 83.3 |
| MathVista* | 62.1 | 65.4 | 70.6 | **73.1** | 69.6 | 68.4 | 57.3 | 68.6 |
| OlympiadBench _(full)_ | 25.2 | 28.4 | 32.1 | **46.1** | 21.9 | 25.1 | 19.3 | 24.2 |
| **Long Context** | | | | | | | | |
| M-LongDoc _(acc)_ | **41.4** | 31.4 | 26.2 | 31.4 | 11.6 | 19.7 | 13.9 | 32.5 |
| **Comprehensive** | | | | | | | | |
| MEGA-Bench _(macro)_ | 49.4 | 51.4 | 45.9 | **53.9** | 46.8 | 45.3 | 19.9 | 47.4 |
| **User Experience** | | | | | | | | |
| In-house Benchmark | 62.3 | 47.0 | 49.2 | **72.1** | 40.6 | 34.8 | 13.6 | 56.6 |

\* Evaluated following a _0-shot CoT_ setting.

## 4. Quickstart

Here we provide simple examples showing how to use MiniMax-Text-01 and MiniMax-VL-01.
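Both Quickstart examples build their `device_map` by hand: the embeddings go on the first GPU, the final norm and LM head on the last, and the decoder layers are split evenly in between. As a minimal sketch of that layout, the function below reproduces it without loading any weights. The name `build_device_map` and its `root` parameter are illustrative assumptions, not part of the MiniMax code. Unlike the inline loops, it also assigns the leftover layers when the layer count is not a multiple of the GPU count, and it does not cover the vision modules that the MiniMax-VL-01 example pins to `cuda:0`.

```python
def build_device_map(num_layers: int, world_size: int, root: str = "") -> dict[str, str]:
    """Hypothetical helper: place decoder layers on GPUs in contiguous blocks.

    root is "" for the text model and "language_model." for the VL model,
    matching the module names used in the Quickstart examples.
    """
    if num_layers < 1 or world_size < 1:
        raise ValueError("num_layers and world_size must be positive")

    last_gpu = f"cuda:{world_size - 1}"
    device_map = {
        f"{root}model.embed_tokens": "cuda:0",  # embeddings on the first GPU
        f"{root}model.norm": last_gpu,          # final norm on the last GPU
        f"{root}lm_head": last_gpu,             # LM head on the last GPU
    }

    # Each GPU gets `base` layers; the first `extra` GPUs get one more.
    base, extra = divmod(num_layers, world_size)
    layer = 0
    for gpu in range(world_size):
        count = base + (1 if gpu < extra else 0)
        for _ in range(count):
            device_map[f"{root}model.layers.{layer}"] = f"cuda:{gpu}"
            layer += 1
    return device_map


# 80 layers on 8 GPUs gives 10 layers per device.
text_map = build_device_map(80, 8)
print(text_map["model.layers.0"], text_map["model.layers.79"])  # cuda:0 cuda:7
```

With 80 layers and 8 GPUs this yields the same assignment as the loops in the examples. Passing `root="language_model."` produces the key names used in the MiniMax-VL-01 example.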
### MiniMax-Text-01

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig

# load hf config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)

# quantization config, int8 is recommended
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "lm_head",
        "embed_tokens",
    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
)

# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size

# set device map: embeddings on the first GPU, final norm and lm_head on the last
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")

# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)

# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")

# strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### MiniMax-VL-01

```python
from transformers import AutoModelForCausalLM, AutoProcessor, AutoConfig, QuantoConfig, GenerationConfig
import torch
import json
import os
from PIL import Image

# load hf config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-VL-01", trust_remote_code=True)

# quantization config, int8 is recommended
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "vision_tower",
        "image_newline",
        "multi_modal_projector",
        "lm_head",
        "embed_tokens",
    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.text_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.text_config.num_hidden_layers)]
)

# set device map
# (reads the weight index from a local "MiniMax-VL-01" checkpoint directory)
model_safetensors_index_path = os.path.join("MiniMax-VL-01", "model.safetensors.index.json")
with open(model_safetensors_index_path, "r") as f:
    model_safetensors_index = json.load(f)
weight_map = model_safetensors_index['weight_map']

# collect the vision-side modules so they can be pinned to a single GPU
vision_map = {}
for key, value in weight_map.items():
    if 'vision_tower' in key or 'image_newline' in key or 'multi_modal_projector' in key:
        new_key = key.replace('.weight', '').replace('.bias', '')
        if new_key not in vision_map:
            vision_map[new_key] = value

# assume 8 GPUs
world_size = 8
device_map = {
    'language_model.model.embed_tokens': 'cuda:0',
    'language_model.model.norm': f'cuda:{world_size - 1}',
    'language_model.lm_head': f'cuda:{world_size - 1}'
}
for key, value in vision_map.items():
    device_map[key] = 'cuda:0'
device_map['vision_tower.vision_model.post_layernorm'] = 'cuda:0'
layers_per_device = hf_config.text_config.num_hidden_layers // world_size
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'language_model.model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load processor
processor = AutoProcessor.from_pretrained("MiniMaxAI/MiniMax-VL-01", trust_remote_code=True)
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-VL-01 model."}]},
    {"role": "user", "content": [{"type": "image", "image": "placeholder"}, {"type": "text", "text": "Describe this image."}]},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
raw_image = Image.open("figures/image.jpg")

# tokenize and move to device
model_inputs = processor(images=[raw_image], text=prompt, return_tensors='pt').to('cuda').to(torch.bfloat16)

# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-VL-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)
generation_config = GenerationConfig(
    max_new_tokens=100,
    eos_token_id=200020,
    use_cache=True,
)

# generate response
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")

# strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## 5. Deployment Guide

For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/latest/) to serve MiniMax-Text-01 and MiniMax-VL-01. vLLM provides excellent performance for serving large language models with the following features:

- 🔥 Outstanding service throughput performance
- ⚡ Efficient and intelligent memory management
- 📦 Powerful batch request processing capability
- ⚙️ Deeply optimized underlying performance

For detailed vLLM deployment instructions, please refer to our [vLLM Deployment Guide](docs/vllm_deployment_guide.md).

Alternatively, you can deploy using Transformers directly. For detailed instructions, see our [MiniMax-Text-01 Transformers Deployment Guide](docs/transformers_deployment_guide.md).

## 6. Citation

```
@misc{minimax2025minimax01scalingfoundationmodels,
      title={MiniMax-01: Scaling Foundation Models with Lightning Attention},
      author={MiniMax and Aonian Li and Bangwei Gong and Bo Yang and Boji Shan and Chang Liu and Cheng Zhu and Chunhao Zhang and Congchao Guo and Da Chen and Dong Li and Enwei Jiao and Gengxin Li and Guojun Zhang and Haohai Sun and Houze Dong and Jiadai Zhu and Jiaqi Zhuang and Jiayuan Song and Jin Zhu and Jingtao Han and Jingyang Li and Junbin Xie and Junhao Xu and Junjie Yan and Kaishun Zhang and Kecheng Xiao and Kexi Kang and Le Han and Leyang Wang and Lianfei Yu and Liheng Feng and Lin Zheng and Linbo Chai and Long Xing and Meizhi Ju and Mingyuan Chi and Mozhi Zhang and Peikai Huang and Pengcheng Niu and Pengfei Li and Pengyu Zhao and Qi Yang and Qidi Xu and Qiexiang Wang and Qin Wang and Qiuhui Li and Ruitao Leng and Shengmin Shi and Shuqi Yu and Sichen Li and Songquan Zhu and Tao Huang and Tianrun Liang and Weigao Sun and Weixuan Sun and Weiyu Cheng and Wenkai Li and Xiangjun Song and Xiao Su and Xiaodong Han and Xinjie Zhang and Xinzhu Hou and Xu Min and Xun Zou and Xuyang Shen and Yan Gong and Yingjie Zhu and Yipeng Zhou and Yiran Zhong and Yongyi Hu and Yuanxiang Fan and Yue Yu and Yufeng Yang and Yuhao Li and Yunan Huang and Yunji Li and Yunpeng Huang and Yunzhi Xu and Yuxin Mao and Zehan Li and Zekang Li and Zewei Tao and Zewen Ying and Zhaoyang Cong and Zhen Qin and Zhenhua Fan and Zhihang Yu and Zhuo Jiang and Zijia Wu},
      year={2025},
      eprint={2501.08313},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.08313},
}
```

## 7. Chatbot & API

For general use and evaluation, we provide a [Chatbot](https://chat.minimax.io/) with online search capabilities and the [online API](https://www.minimax.io/platform) for developers.
We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) for developers, with video generation, image generation, speech synthesis, and voice cloning.

Contact us at [[email protected]](mailto:[email protected]).

LLM Tools & Chat UIs · ML Frameworks

3.4K GitHub Stars

Open Source

OpenRoom

# VibeApps

[中文](./README_zh.md) | English

> Imagine a desktop that lives in your browser, and an AI that knows how to use every app on it.

![License](https://img.shields.io/badge/license-MIT-blue.svg)

**[Website](https://www.openroom.ai)** · **[X / Twitter](https://x.com/openroom_ai_)**

https://github.com/user-attachments/assets/adb176a3-02db-41e0-ba71-c9f9cece13d5

## What is VibeApps?

VibeApps brings a full desktop experience into your browser: windows you can drag and resize, apps you can open side by side, all wrapped in a clean macOS-inspired interface.

But what makes it different is the **AI Agent** sitting inside. Instead of clicking through menus, just tell it what you want:

> *"Play some jazz"* → the Music app starts playing.
>
> *"Write a diary entry about today's hiking trip"* → Diary opens, a new entry appears.
>
> *"Let's play chess"* → the board is ready.

The Agent doesn't just launch apps; it **operates** them. It reads data, triggers actions, and updates state, all through a structured Action system that every app speaks.

Everything runs locally in your browser. No backend, no accounts, no setup headaches. Your data stays in IndexedDB, right where it belongs.

## Built-in Apps

Out of the box, you get a suite of apps ready to explore:

| App | Description |
|-----|-------------|
| 🎵 Music | Full-featured player with playlists, playback controls, and album art |
| ♟️ Chess | Classic chess with complete rule enforcement |
| ⚫ Gomoku | Five-in-a-row: simple rules, deep strategy |
| 🃏 FreeCell | The solitaire game that's all skill, no luck |
| 📧 Email | Inbox, sent, drafts: a familiar email experience |
| 📔 Diary | Journal with mood tracking to capture your days |
| 🐦 Twitter | A social feed you actually control |
| 📷 Album | Browse and organize your photo collections |
| 📰 CyberNews | Stay informed with a curated news aggregator |

Each app is fully integrated with the AI Agent, so you can interact with any of them through natural language.
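This README does not spell out the structured Action system, so the following is only a rough mental model: one way a reducer-style dispatch could route an Agent request to the right app and return updated state. It is written in Python for brevity (the project itself is React + TypeScript), and every name in it (`Action`, `REDUCERS`, `dispatch`, the `music` app id, the `NEXT_TRACK` and `PAUSE` action types) is hypothetical rather than taken from the OpenRoom codebase.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

State = dict[str, Any]


@dataclass(frozen=True)
class Action:
    """Hypothetical message the Agent sends to an app."""
    app_id: str
    type: str
    payload: dict[str, Any] = field(default_factory=dict)


Reducer = Callable[[State, Action], State]


def music_reducer(state: State, action: Action) -> State:
    """Return a new state for a made-up music app; unknown actions are no-ops."""
    if action.type == "NEXT_TRACK":
        next_index = (state["index"] + 1) % len(state["playlist"])
        return {**state, "index": next_index, "playing": True}
    if action.type == "PAUSE":
        return {**state, "playing": False}
    return state


# Registry mapping each app id to the reducer that understands its actions.
REDUCERS: dict[str, Reducer] = {"music": music_reducer}


def dispatch(states: dict[str, State], action: Action) -> dict[str, State]:
    """Route an action to its app and return the updated collection of states."""
    reducer = REDUCERS.get(action.app_id)
    if reducer is None:
        raise KeyError(f"no app registered for {action.app_id!r}")
    return {**states, action.app_id: reducer(states[action.app_id], action)}


states = {"music": {"playlist": ["a", "b", "c"], "index": 2, "playing": False}}
states = dispatch(states, Action("music", "NEXT_TRACK"))
print(states["music"]["index"], states["music"]["playing"])  # 0 True
```

In this picture, a request such as *"play the next song"* would be translated by the Agent into an action for the music app, and the app's reducer decides how its state changes.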
## Getting Started

### Prerequisites

| Tool | Version | Check | Install |
|------|---------|-------|---------|
| **Node.js** | 18+ | `node -v` | [nodejs.org](https://nodejs.org/) |
| **pnpm** | 9+ | `pnpm -v` | `npm install -g pnpm@9` |

> **In China?** Uncomment the mirror lines in `.npmrc` for faster downloads via npmmirror.

### Up and Running in 60 Seconds

```bash
# Clone & enter the project
git clone https://github.com/MiniMax-AI/OpenRoom.git
cd OpenRoom

# Install dependencies
pnpm install

# (Optional) Set up environment variables
cp apps/webuiapps/.env.example apps/webuiapps/.env

# Launch
pnpm dev
```

Open `http://localhost:3000` and you'll see a desktop with app icons. **Double-click** to open any app.

### Meet the AI Agent (In-App Chat)

Click the **chat icon** in the bottom-right corner. A panel slides open: that's your Agent. Type naturally: *"play the next song"*, *"show me my emails"*, *"start a new chess game"*. The Agent figures out which app to talk to, what action to take, and makes it happen.

> **Note:** You'll need an LLM API key. Configure it in the Chat Panel settings.
>
> This chat panel is for **using** existing apps. To **create** new apps, see the [Vibe Workflow](#build-your-own-apps-just-describe-them) section below, which runs in Claude Code CLI.

## Build Your Own Apps: Just Describe Them

This is where it gets interesting. With the **Vibe Workflow**, you can generate a complete, fully-integrated app just by describing what you want. No boilerplate, no scaffolding: [Claude Code](https://docs.anthropic.com/en/docs/claude-code) handles the entire process.

> **Important:** The Vibe Workflow runs in **Claude Code (CLI terminal)**, not in the browser's chat panel. The in-app chat panel is for operating existing apps; creating new apps happens in your development environment.
### Create from Scratch

```bash
/vibe WeatherApp Create a weather dashboard with 5-day forecasts and temperature charts
```

Behind the scenes, the workflow runs through **6 stages**, each one building on the last:

```
Requirement Analysis  →  What exactly are we building?
Architecture Design   →  Components, data models, state shape
Task Planning         →  Breaking it down into implementable chunks
Code Generation       →  Writing the actual React + TypeScript code
Asset Generation      →  Creating icons and images
Project Integration   →  Registering the app so it shows up on the desktop
```

When it's done, your new app is live, complete with AI Agent integration.

### Evolve Existing Apps

Already have an app but want more? Describe the change:

```bash
/vibe MusicApp Add a lyrics panel that shows synced lyrics during playback
```

This triggers a focused **4-stage change workflow**: Impact Analysis → Planning → Implementation → Verification.

### Resume or Replay

```bash
# Pick up where you left off
/vibe MyApp

# Jump to a specific stage
/vibe MyApp --from=04-codegen
```

## Under the Hood

### Project Layout

```
OpenRoom/
├── apps/webuiapps/          # The main desktop application
│   └── src/
│       ├── components/      # Shell, window manager, chat panel
│       ├── lib/             # Core SDK: file API, actions, app registry
│       ├── pages/           # Where each app lives
│       └── routers/         # Route definitions
├── packages/
│   └── vibe-container/      # iframe communication SDK (stub in open-source mode)
├── .claude/                 # AI workflow engine
│   ├── commands/vibe.md     # Workflow entry point
│   ├── workflow/            # Stage definitions & rules
│   └── rules/               # Code generation constraints
└── .github/workflows/       # CI pipeline
```

> **Note on `vibe-container`:** In the open-source standalone version, the real iframe SDK is replaced by a local mock (`src/lib/vibeContainerMock.ts`) that uses IndexedDB for storage and a local event bus for Agent communication. The package under `packages/vibe-container/` provides type definitions and the client-side SDK interface.
See its [README](./packages/vibe-container/README.md) for details.

### Anatomy of an App

Every app follows the same structure, which keeps things consistent, predictable, and easy to navigate:

```
pages/MusicApp/
├── components/          # UI building blocks
├── data/                # Seed data (JSON)
├── store/               # State management (Context + Reducer)
├── actions/             # How the AI Agent talks to this app
│   └── constants.ts     # APP_ID + action type definitions
├── i18n/                # Translations (en.ts + zh.ts)
├── meta/                # Metadata for the Vibe workflow
│   ├── meta_cn/         # guide.md + meta.yaml (Chinese)
│   └── meta_en/         # guide.md + meta.yaml (English)
├── index.tsx            # Entry point
├── types.ts             # TypeScript definitions
└── index.module.scss    # Scoped styles
```

## Development

| Command | Description |
|---------|-------------|
| `pnpm dev` | Start dev server → `http://localhost:3000` |
| `pnpm build` | Production build |
| `pnpm run lint` | Lint + auto-fix |
| `pnpm run pretty` | Format with Prettier |
| `pnpm clean` | Clean build artifacts |

## Tech Stack

| | |
|---|---|
| **Framework** | React 18 + TypeScript + Vite |
| **Styling** | Tailwind CSS + CSS Modules + Design Tokens |
| **Icons** | Lucide React |
| **State** | React Context + Reducer |
| **Storage** | IndexedDB (standalone) / Cloud NAS (production) |
| **i18n** | i18next + react-i18next |
| **Monorepo** | pnpm workspaces + Turborepo |
| **CI** | GitHub Actions |

## Environment Variables

```bash
cp apps/webuiapps/.env.example apps/webuiapps/.env
```

| Variable | Required | Description |
|----------|----------|-------------|
| `CDN_PREFIX` | No | CDN prefix for static assets |
| `VITE_RUM_SITE` | No | RUM monitoring endpoint |
| `VITE_RUM_CLIENT_TOKEN` | No | RUM client token |
| `SENTRY_AUTH_TOKEN` | No | Sentry auth token (enables error tracking when set) |
| `SENTRY_ORG` | No | Sentry organization slug |
| `SENTRY_PROJECT` | No | Sentry project slug |

All optional. The app runs fine without any of them.

## Contributing

We'd love your help. Whether it's fixing a bug, building a new app, or improving docs, check out [CONTRIBUTING.md](./CONTRIBUTING.md) to get started.

## License

[MIT](LICENSE). Copyright (c) 2025 MiniMax

AI Agents · Customer Engagement

1.2K GitHub Stars