yohasebe

Open Source

openai-chat-api-workflow

# OpenAI Chat API Workflow for Alfred <img src='./icons/openai.png' style='height:120px;'/> 🎩 An [Alfred 5](https://www.alfredapp.com/) Workflow for using the [OpenAI](https://platform.openai.com/) Chat API to interact with GPT models 🤖💬. It also allows file understanding 📎 (images, PDFs, Office documents, code, and more), image generation 🖼️, speech-to-text conversion 🎤, and text-to-speech synthesis 🔈. 📦 Download [**OpenAI Chat API Workflow**](https://github.com/yohasebe/openai-chat-api-workflow/raw/main/openai-chat-api.alfredworkflow) (version `5.1.0`) You can execute all the above features using: - Alfred UI 🖥️ - Selected text 📝 - A dedicated web UI 🌐 The web UI is constructed by the workflow and runs locally on your Mac 💻. The API call is made directly between the workflow and OpenAI, ensuring your chat messages are not shared online with anyone other than OpenAI 🔒. Furthermore, OpenAI does not use the data from the API Platform for training 🚫. All messages in a conversation are displayed on a single scrollable page 📜, making it easy to review the full context. You can export the chat data to an external file in simple JSON format 📄, and it is possible to continue the chat by importing it later 🔄. <img src="./docs/img/OpenAI-Alfred-Workflow.png" width="600" /> <kbd><img src="./docs/img/web-interface.png" width="700"></kbd> <kbd><img src="./docs/img/openai-chat-api-workflow.gif" width="700" /></kbd> ## Installation 1. Download and run [**OpenAI Chat API Workflow**](https://github.com/yohasebe/openai-chat-api-workflow/raw/main/openai-chat-api.alfredworkflow) 2. Set your [OpenAI API key](https://platform.openai.com/account/api-keys) 3. Enable accessibility settings for Alfred in `System Preferences` → `Security & Privacy` → `Privacy` → `Accessibility` <kbd><img src="./docs/img/accessibility.png" width="600"></kbd> **Setup Hotkeys** You can set up hotkeys in the settings screen of the workflow. To set up hotkeys, double-click on the light purple workflow elements. <kbd><img width="700" src="./docs/img/openai-workflow-overview.png"></kbd> 1. Open Web UI (Recommended) 2. Direct Query 3. Send Selected Text 4. Screen Capture for Image Editing 5. Screen Capture for Image Understanding 6. Speech to Text 7. Text to Speech (Selected text) There is also a "Stop text-to-speech playback" command to stop the playback of the text-to-speech audio stream. Current it needs to be assigned a hotkey different from that of the "Text to Speech" command. **Dependencies** - Alfred 5 [Powerpack](https://www.alfredapp.com/shop/) - OpenAI [API key](https://platform.openai.com/account/api-keys) No external dependencies (Homebrew, etc.) are required. All features work out of the box. To start using this workflow, you must set the environment variable `apikey`, which you can obtain by creating a new [OpenAI account](https://platform.openai.com/account/api-keys). See also the [Configuration](#configuration) section below. > **Note:** Voice input uses the browser's built-in Web Audio API in the Web UI. No external dependencies are required. **Recent Changelog** - 5.1.0: - New model `gpt-5.5` added as a selectable option (not the default) - Default chat model changed from `gpt-5-mini` to `gpt-5.4-mini` for a better balance of cost and capability - Removed unused expensive-model confirmation dialog mechanism (the `pro` flagship and o-series models are intentionally not bundled) - 5.0.0: - New image generation model `gpt-image-2` (now default); simplified to `gpt-image-2` and `gpt-image-1.5` - File input via OpenAI Files API (`file_id` reference) for images, PDFs, Office documents, text, code, and more - Uploaded files are automatically deleted from OpenAI's storage after each response - Structured error display with Status/Code/Message/Request ID and collapsible debug info - File input via Alfred Universal Action ("OpenAI File Input") for all supported file types - Iterative image editing (Refine Image) with conversation history - Automatic context truncation for long conversations - Simplified command list: removed 12 legacy prompt modes (Write Program Code, Grammar Correction, etc.) — use natural language prompts instead - Removed unused API parameters (Temperature, Top P, Frequency/Presence Penalty, Max Size for Image Understanding) from settings - Unified file upload architecture: both starter and chat UIs now use WEBrick `/upload` endpoint - Improved WebSocket stability for large file uploads - JSON export/import reliability improvements - Cache management: auto-cleanup on server start (7+ days) and manual "Clear Cache" button - No external dependencies required (Homebrew install step removed) - 4.8.0: - New models: `gpt-5.4-mini`, `gpt-5.4-nano` - Reasoning effort defaults optimized per model - 4.7.0: - New models: `gpt-5.4`, `gpt-5.3-chat-latest`, `gpt-5.3-codex` - Removed deprecated models and added invalid model fallback - Fix outdated OpenAI documentation URLs - 4.6.0: - Fix Large Type parallel firing issue on Alfred 5.7+ / macOS 16 - Remove date-suffixed TTS/STT model names; use base names as defaults - Fix typo in JSON validation error message - 4.5.0: - Simplified image models: `gpt-image-1.5` (default) and `chatgpt-image-latest` (lightweight) - Removed deprecated `gpt-image-1` and `gpt-image-1-mini` models - 4.4.0: - Added `gpt-image-1.5` as default image generation model - Removed DALL·E 2 and DALL·E 3 models (deprecated by OpenAI) - 4.3.0: - New TTS model: `gpt-4o-mini-tts-2025-12-15` (now default) - New STT model: `gpt-4o-mini-transcribe-2025-12-15` (now default) - Improved error rates, fewer hallucinations, better instruction following - Enhanced support for Chinese, Japanese, Indonesian, Hindi, Bengali, Italian - 4.2.0: - GPT-5.2 series models now supported (`gpt-5.2`, `gpt-5.2-chat-latest`, `gpt-5.2-pro`) - New `xhigh` reasoning effort level for maximum quality - `gpt-5.2-pro` has very high API pricing - confirmation dialog added - Model-specific reasoning effort constraints for GPT-5.2 series - Web UI updated with new models and pricing warning - 4.1.0: - GPT-5.1 series models now supported (`gpt-5.1`, `gpt-5.1-chat-latest`, `gpt-5.1-codex`, `gpt-5.1-codex-mini`) - `gpt-5.1` replaces `gpt-5` as the flagship model with enhanced reasoning capabilities - `gpt-5.1-chat-latest` replaces `chatgpt-4o-latest` for latest optimizations - New codex variants (`gpt-5.1-codex`, `gpt-5.1-codex-mini`) for code generation tasks - Model-specific reasoning effort constraints with dynamic UI adjustment - Removed older models: `gpt-5`, `gpt-4.1` series, `gpt-4o` series, `chatgpt-4o-latest` - Default model remains `gpt-5-mini` for balanced performance - `gpt-5-mini` and `gpt-5-nano` continue to be supported - 4.0.0: - GPT-5 series models (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`) supported with Responses API - GPT-5 models feature reasoning capabilities with configurable reasoning_effort (minimal/low/medium/high) - Note: Only `gpt-5` supports `minimal` reasoning_effort. `gpt-5-mini`, `gpt-5-nano`, and other reasoning-capable models do not support `minimal`. - Default model changed to `gpt-5-mini` - Removed support for o1, o3, o4 series reasoning models - GPT-4.1 and earlier models continue to use Chat Completion API - Full support for PDF and image understanding across all GPT-5 models [Complete Change Log](https://github.com/yohasebe/openai-chat-api-workflow/blob/main/CHANGELOG.md) ## Methods of Execution Here are three methods to run the workflow: 1) Using commands within the Alfred UI, 2) Passing selected text to the workflow, 3) Utilizing the Web UI. Additionally, there's a convenient method for making brief inquiries to GPT. All methods share the same conversation history — messages accumulate on a single scrollable page, regardless of how you send them. **Commands within the Alfred UI** You can enter a query directly into Alfred's textbox: - Method 1: Alfred textbox → keyword (`openai`) → space/tab → input query → select a command (see below) - Method 2: Alfred textbox → input query → select fallback search (`OpenAI Query`) **Passing Selected Text** You can select any text on your Mac and send it to the workflow: - Method 1: Select text → universal action hotkey → select `OpenAI Query` - Method 2: Set up a custom hotkey to `Send selected text to OpenAI` **Using Web Interface** You can open the web interface: - Method 1: Alfred textbox → keyword (`openai-webui`) - Method 2: Set up a custom hotkey to `Open web interface` **Using the Default Browser** If your default browser is set to one of the following, the web interface will automatically open in your chosen browser. If not, Safari will be used as the default. - Google Chrome (Stable, Beta, Dev, etc.) - Microsoft Edge (Stable, Beta, Dev, etc.) - Brave Browser Restart the OpenAI Workflow server by executing `openai-restart-server` if the web UI does not work as expected after changing the default browser. **Web UI Modes** Switch modes (`light`/`dark`/`auto`) with the `Web UI Mode` selector in the settings. <kbd><img width="700" src="./docs/img/web-interface-dark.png"></kbd> **Simple Direct Query/Chat** To quickly chat with GPT: - Method 1: Type keyword `gpt` → space/tab → input query (e.g., "**gpt** what is a large language model?") - Method 2: Set up a custom hotkey to `OpenAI Direct Query` <img src='./docs/img/direct-query.png' style='width:700px;'/> ## Basic Commands With `Direct Query`, the input text is sent directly to the OpenAI Chat API as a prompt. You can also create a query by prepending or appending text to the input. <img src='./icons/patch-question.png' style='height:1em;'/> **Direct Query** The input text is directly sent as a prompt to the OpenAI Chat API. <kbd><img src='./docs/img/direct-query.gif' style='width:700px;'/></kbd> <img src='./icons/arrow-bar-down.png' style='height:1em;'/> **Prepend Text + Query** After entering the initial text, you are prompted for additional text. The additional text is added *before* the initial text, and the resulting text is used as the query. <kbd><img src='./docs/img/prepend.gif' style='width:700px;'/></kbd> <img src='./icons/arrow-bar-up.png' style='height:1em;'/> **Append Text + Query** After entering the initial text, you are prompted for additional text. The additional text is added *after* the initial text, and the resulting text is used as the query. <img src='./icons/picture.png' style='height:1em;'/> **Generate Image** The GPT Image API (`gpt-image-2` or `gpt-image-1.5`) is used to generate images based on the entered prompts. See [Image Generation](#image-generation) below. ## Image Generation Image generation can be executed through one of the above commands. It is also possible to use the web UI. By using the web UI, you can interactively change the prompt to get closer to the desired image. <kbd><img width="700" src="./docs/img/image-generation-1.png"></kbd> To use the image generation mode with the `gpt-image-2` or `gpt-image-1.5` model, you may need to complete the <a href="https://help.openai.com/en/articles/10910291-api-organization-verification">API Organization Verification</a> from your <a href="https://platform.openai.com/settings/organization/general">developer console</a>. <kbd><img width="700" src="./docs/img/image-generation-2.png"></kbd> <kbd><img width="700" src="./docs/img/image-generation-3.png"></kbd> ## Image Editing There is a command to edit images using `gpt-image-2` or `gpt-image-1.5`. There is an Universal Action command `OpenAI Image Edit`. You can also use the web UI to upload an image file for editing. The image file is sent to the OpenAI Image Editing API, and the result is displayed after a while (at the maximum of 2 minutes). ### Iterative Image Refinement After an image is generated, a **Refine Image** panel appears below the result. You can type follow-up prompts to iteratively refine the image (e.g., "make the background blue", "zoom out"). The previously generated image is automatically used as the source for the next edit. The full conversation history (all prompts and images) is preserved on screen. Use **Cmd+Enter** or **Ctrl+Enter** to submit. <kbd><img width="700" src="./docs/img/image-editing-1.png"></kbd> <kbd><img width="700" src="./docs/img/image-editing-2.png"></kbd> ## File Understanding You can upload various file types for analysis through the web UI. Supported file types include: - **Images**: PNG, JPG, JPEG, GIF, WebP - **Documents**: PDF, Word (.doc, .docx), ODT, RTF - **Spreadsheets**: Excel (.xls, .xlsx), CSV, TSV - **Presentations**: PowerPoint (.ppt, .pptx) - **Text & Code**: .txt, .md, .json, .html, .xml, .py, .rb, .js, .ts, .java, .c, .cpp, .go, .rs, .swift, .sql, and many more Maximum file size is 50MB per file. Files are uploaded via OpenAI's Files API for processing and automatically deleted from OpenAI's storage after each response. Screen capture analysis can be executed through the `openai-vision` command, which starts capture mode and lets you specify a part of the screen to be analyzed. You can also send any supported file to OpenAI using the "OpenAI File Input" universal action in Finder. <kbd><img src="./docs/img/openai-workflow-vision.gif" width="700"></kbd> Alternatively, you can use the web UI to upload a file for analysis. The file is sent to the OpenAI API, and the result is displayed in the web UI. <kbd><img src="./docs/img/openai-vision-web-ui.png" width="700"></kbd> You can also specify a file using the universal action hotkey on the file in Finder. ## Speech Synthesis and Speech Recognition Most text-to-speech and speech-to-text features are available on the web UI. However, there are certain specific features provided as commands, such as audio file to text conversion and transcription with timestamps. <kbd><img width="700" src="./docs/img/speech-to-text-web.png"></kbd> **Text-to-Speech Synthesis** Text entered or response text from GPT can be read out in a natural voice using OpenAI's text-to-speech API. - Method 1: Press the `Play TTS` button on the web UI - Method 2: Select text → universal action hotkey → select `OpenAI Text-to-Speech` **Speech-to-Text Conversion** - Method 1: Press the `Voice Input` button on the web UI - Method 2: Alfred textbox → keyword (`openai-speech`) **Audio File to Text** You can select an audio file in `mp3`, `mp4`, `flac`, `webm`, `wav`, or `m4a` format (under 25MB) and send it to the workflow: - Select the file → universal action hotkey → select `OpenAI Speech-to-Text` **Record Voice Audio and Transcribe** You can record voice audio and send it to the Workflow for transcription using the speech-to-text API. - **Web UI (Recommended)**: Press the `Voice Input` button on the web UI to record and transcribe directly in the browser using the Web Audio API. No external tools required. - **Alfred keyword**: Alfred textbox → keyword (`openai-speech`) → redirects to the Web UI for recording. You can choose the format of the transcribed text as `text`, `srt`, or `vtt` in the workflow's settings. Below are examples in the `text` and `srt` formats: <kbd><img width="700" alt="transcript-text" src="./docs/img/transcript-text.png"></kbd> <kbd><img width="700" alt="transcript-srt" src="./docs/img/transcript-srt.png"></kbd> - **Reasoning Effort**: For GPT-5 series models, set the reasoning effort to control how many reasoning tokens the model generates before creating a response. Available values and defaults vary by model: - **gpt-5.5**: `none`, `low`, `medium`, `high`, `xhigh` (default: `none`) - **gpt-5.4**: `none`, `low`, `medium`, `high`, `xhigh` (default: `none`) - **gpt-5.4-mini**: `none`, `low`, `medium`, `high`, `xhigh` (default: `none`) - **gpt-5.4-nano**: `none`, `low`, `medium`, `high`, `xhigh` (default: `none`) - **gpt-5.3-chat-latest**: `medium` only (default: `medium`) - **gpt-5.3-codex**: `none`, `low`, `medium`, `high`, `xhigh` (default: `none`) - **gpt-5.1-codex-mini**: `low`, `medium`, `high` (default: `low`) - **gpt-5-mini**: `minimal`, `low`, `medium`, `high` (default: `minimal`) - **gpt-5-nano**: `minimal`, `low`, `medium`, `high` (default: `minimal`) The `none` setting provides lower-latency interactions similar to non-reasoning models. The `xhigh` setting provides maximum quality for complex tasks. The web UI automatically adjusts available options based on the selected model. **Note**: When using Alfred's Configuration Builder (not the Web UI), all reasoning effort options are shown regardless of the selected model. If an invalid combination is selected (e.g., `none` with `gpt-5.3-codex`), the workflow automatically falls back to the model's default reasoning effort at runtime. **Model selection policy**: This workflow targets quick-turnaround Alfred interactions. Flagship `pro` variants (e.g., `gpt-5.5-pro`) and the `o`-series reasoning models are intentionally **not** bundled because their pricing and latency profile do not fit the workflow's typical use case. The default model is a `mini` tier (`gpt-5.4-mini`) so common usage stays affordable; you can switch to a flagship (`gpt-5.5`, `gpt-5.4`) for harder tasks. See OpenAI's [documentation](https://platform.openai.com/docs/guides/reasoning#reasoning-effort). - **Max Tokens**: Maximum number of tokens to be generated upon completion (default: `2048`). If this parameter is set to `0`, `null` is sent to the API as the default value (the maximum number of tokens is not specified). - **Memory Span**: Set the number of past utterances sent to the API as context. Setting `4` for this parameter means 2 conversation turns (user → assistant → user → assistant) will be sent as context for a new query. The larger the value, the more tokens will be consumed. (default: `10`) - **Max Characters**: Maximum number of characters that can be included in a query (default: `50000`). - **Timeout**: The number of seconds (default: `10`) to wait before opening the socket and connecting to the API. If the connection fails, reconnection (up to 20 times) will be attempted after 1 second. - **Add Emoji**: If enabled, the response text from GPT will contain emoji characters appropriate for the content. This is realized by adding the following sentence at the end of the system content. (default: `enabled`) > Add emojis that are appropriate to the content of the response. - **System Content**: Text to send with every query sent to the API as general information about the specification of the chat. The default value is as follows: > You are a friendly but professional consultant who answers various questions, makes decent suggestions, and gives helpful advice in response to a prompt from the user. Your response must be concise, suggestive, and accurate. **Image Generation/Editing Parameters** Image editing feature is available for GPT Image models (`gpt-image-2`, `gpt-image-1.5`). - **Image Generation Model**: `gpt-image-2` (flagship) and `gpt-image-1.5` are available. (default: `gpt-image-2`) - **Image Size**: Set the size of images to generate: `auto`, `1024x1024`, `1536x1024`, or `1024x1536` (default: `auto`) - **Quality**: Choose the quality of the image: `auto`, `low`, `medium`, or `high` (default: `auto`) - **Content Moderation**: `auto` or `low` (default: `auto`) - **Background**: `auto`, `transparent`, or `opaque` (default: `auto`) **Speech-to-Text Parameters** - **Transcription Model**: One of the available transcription models: `whisper-1`, `gpt-4o-mini-transcribe`, or `gpt-4o-transcribe`. (default: `gpt-4o-mini-transcribe`) - **Transcription Format**: Set the format of the text transcribed from the microphone input or audio files to `text`, `srt`, or `vtt` (default: `text`). Since `srt` and `vtt` formats are supported by `whisper-1` only, the workflow will automatically switch to `whisper-1` when these formats are selected. - **Processes after Recording**: Set the default choice of what processes follow after audio recording finishes. (default: `Transcribe [+ delete recording]`). - Transcribe [+ delete recording] - Transcribe [+ save recording to desktop] - Transcribe and query [+ delete recording] - Transcribe and query [+ save recording to desktop] - **Audio to English**: When enabled, the speech-to-text (STT) API will transcribe the input audio and output text translated into English. (default: `disabled`) **Text-to-Speech Parameters** - **Text-to-Speech Model**: One of the available TTS models: `tts-1`, `tts-1-hd`, or `gpt-4o-mini-tts`. (default: `gpt-4o-mini-tts`) - **Text-to-Speech Voice**: The voice to use when generating the audio. Supported voices are: `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, and `shimmer`. (default: `alloy`) - **Text-to-Speech Speed**: The speed of the generated audio. Select a value from 0.25 to 4.0. (default: `1.0`) - **TTS Instruction**: Specify character or speaking style instructions for text-to-speech synthesis. - **Automatic Text to Speech**: If enabled, the results will be read aloud using the system's default text-to-speech language and voice. (default: `disabled`) **Other Settings** - **Sound**: If checked, a notification sound will play when the response is returned. (default: `disabled`) - **Save File Path**: If set, the results will be saved in the specified path as a markdown file. (default: `not set`) **Environment Variables** Environment variables can be accessed by clicking the `[x]` button located at the top right of the workflow settings screen. Normally, there is no need to change the values of the environment variables. - `http_keep_alive`: This workflow starts an HTTP server when the web UI is first displayed. After that, if the web UI is not used for the time (in seconds) set by this environment variable, the server will stop. (default: `7200` = 2 hours) - `http_port`: Specifies the port number for the web UI. (default: `8787`) - Note: Default changed to 8787 in v4.0.0 to avoid privileged port conflicts. - `http_server_wait`: Specifies the wait time from when the HTTP server is started until the page is displayed in the browser. (default: `2.5`) - `websocket_port`: Specifies the port number for websocket communication used to display responses in streaming on the web UI. (default: `8080`) ## Troubleshooting - Port conflict or permission error - The web UI binds to `127.0.0.1` on `http_port` (default `8787`). If startup fails with a port error, change the environment variable `http_port` to a free non‑privileged port (e.g., `8888`). - Logs location and rotation - Logs are written to `$alfred_workflow_cache/workflow.log` with simple rotation (up to ~1MB × 5 files). - macOS notification permission - If startup error notifications do not appear, check System Settings → Notifications → allow notifications for Alfred. - Cache management - Uploaded files, TTS audio, and temporary HTML are cached in `$alfred_workflow_cache`. Old files (7+ days) are automatically cleaned up when the server starts. - To manually clear all cached files, use the **Clear Cache** button on the web UI (starter page or chat page). This does not delete your current conversation data. - No external dependencies required - All features work out of the box with macOS built-in tools. ## Author Yoichiro Hasebe (<[email protected]>) ## License The MIT License ## Disclaimer The author assumes no responsibility for any potential damages arising from the use of this software.

AI Tools Workflow Automation

316 Github Stars

Software by yohasebe

openai-chat-api-workflow