YT Framework
PyPI | Docs | DeepWiki | Examples
Overview
Python helpers and conventions for YTsaurus pipelines: YAML config, ordered stages under stages/, dev mode that mirrors many prod behaviors on disk, and prod mode that uploads src/ bundles to the cluster.
Architecture
- Pipeline: loads config, builds the YT client, walks enabled_stages.
- Stage: one BaseStage subclass plus config.yaml (and optional src/ for jobs).
- Operations: map, vanilla, map-reduce/reduce, YQL via the client, S3 helpers, sorts, etc.
- Configuration: OmegaConf-backed YAML; secrets in configs/secrets.env.
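To make the Pipeline/Stage relationship concrete, here is a minimal, framework-free sketch of a pipeline walking enabled_stages in order. SketchStage and run_enabled_stages are hypothetical stand-ins, not the yt_framework API. The only detail taken from this README is that a stage exposes run(self, debug) and returns debug; passing that value from one stage to the next is an assumption.

```python
import logging
from typing import Any

logging.basicConfig(level=logging.INFO)


class SketchStage:
    """Hypothetical stand-in for a BaseStage subclass (not the real API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.logger = logging.getLogger(name)

    def run(self, debug: dict[str, Any]) -> dict[str, Any]:
        # Record that this stage ran, then hand the context back.
        self.logger.info("running stage %s", self.name)
        debug.setdefault("visited", []).append(self.name)
        return debug


def run_enabled_stages(
    available: dict[str, SketchStage], enabled_stages: list[str]
) -> dict[str, Any]:
    """Run stages in the order listed in enabled_stages.

    Assumption: the value returned by one stage is passed to the next.
    """
    debug: dict[str, Any] = {}
    for name in enabled_stages:
        if name not in available:
            raise KeyError(f"enabled stage {name!r} was not found")
        debug = available[name].run(debug)
    return debug


if __name__ == "__main__":
    stages = {n: SketchStage(n) for n in ("prepare", "train", "export")}
    # "export" is available but not enabled, so it is skipped.
    result = run_enabled_stages(stages, ["prepare", "train"])
    print(result["visited"])
```

The point of the sketch is the ordering rule: the list in the config decides which stages run and in what order, while stages that exist but are not listed are left alone.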
What ships in the box
- Stage discovery (DefaultPipeline) from the filesystem layout.
- dev/prod switch on the same code paths where possible.
- Map, vanilla, YQL helpers, S3 listing/download patterns, table helpers, checkpoint upload wiring.
- Optional custom Docker images, tokenizer tarballs, and multi-operation stages.
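Stage discovery from the filesystem layout can be pictured with a small standard-library sketch. It assumes a stage is any directory under stages/ that contains a stage.py file, which matches the quick-start layout in this README. The function name discover_stage_dirs is hypothetical, and DefaultPipeline may well do this differently.

```python
import tempfile
from pathlib import Path


def discover_stage_dirs(pipeline_root: Path) -> list[str]:
    """Return names of directories under stages/ that contain a stage.py.

    Assumption: one directory per stage, identified by its stage.py file.
    """
    stages_dir = pipeline_root / "stages"
    if not stages_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in stages_dir.iterdir()
        if child.is_dir() and (child / "stage.py").is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "stages" / "my_stage").mkdir(parents=True)
        (root / "stages" / "my_stage" / "stage.py").write_text("# stage code\n")
        (root / "stages" / "not_a_stage").mkdir()  # no stage.py, so ignored
        print(discover_stage_dirs(root))
```

Sorting here only makes the result deterministic. Per the Architecture section, the run order comes from enabled_stages in the config, not from directory names.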
Installation
For Users
Install from PyPI into any Python 3.11+ environment (system Python, a virtualenv, or a Conda env):
pip install yt-framework
For Developers and Contributors
Recommended: one Conda environment for tests, formatting, pre-commit, and local documentation builds (avoids reinstalling tooling for each task):
git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
conda create -n yt-framework python=3.11
conda activate yt-framework
pip install -e ".[dev,docs]"
Use conda-forge as the channel when creating the env if that matches your setup (conda create -n yt-framework python=3.11 -c conda-forge).
Alternative: pip only — install in editable mode from source:
git clone https://github.com/GregoryKogan/yt-framework.git
cd yt-framework
pip install -e .
For development with testing tools (without the docs extra):
pip install -e ".[dev]"
For local Sphinx builds without the full dev extra, use pip install -e ".[docs]".
See CONTRIBUTING.md for the full development setup and Installation Guide for prerequisites.
Quick start
Three steps: create the layout, add the entrypoint, then add a stage and the pipeline config.
- Layout

    mkdir my_pipeline && cd my_pipeline
    mkdir -p stages/my_stage configs

- pipeline.py

    from yt_framework.core.pipeline import DefaultPipeline

    if __name__ == "__main__":
        DefaultPipeline.main()

- Stage + config

    # stages/my_stage/stage.py
    from yt_framework.core.stage import BaseStage

    class MyStage(BaseStage):
        def run(self, debug):
            self.logger.info("Hello from YT Framework!")
            return debug

    # configs/config.yaml
    stages:
      enabled_stages:
        - my_stage
    pipeline:
      mode: "dev"  # Use "dev" for local development

python pipeline.py
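If you prefer to generate the quick-start tree from a script, the following sketch writes the same three files with the standard library only. The scaffold function is a hypothetical helper, not part of yt-framework. The file contents are copied from the quick start, and nothing here imports or runs the framework itself.

```python
import tempfile
from pathlib import Path

PIPELINE_PY = '''from yt_framework.core.pipeline import DefaultPipeline

if __name__ == "__main__":
    DefaultPipeline.main()
'''

STAGE_PY = '''from yt_framework.core.stage import BaseStage


class MyStage(BaseStage):
    def run(self, debug):
        self.logger.info("Hello from YT Framework!")
        return debug
'''

CONFIG_YAML = '''stages:
  enabled_stages:
    - my_stage
pipeline:
  mode: "dev"
'''


def scaffold(root: Path) -> list[Path]:
    """Write the three quick-start files under root and return their paths."""
    files = {
        root / "pipeline.py": PIPELINE_PY,
        root / "stages" / "my_stage" / "stage.py": STAGE_PY,
        root / "configs" / "config.yaml": CONFIG_YAML,
    }
    for path, content in files.items():
        # Create stages/my_stage and configs on demand.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return sorted(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold(Path(tmp) / "my_pipeline"):
            print(created.relative_to(tmp))
```

The demo writes into a temporary directory so it leaves nothing behind. To create a real project, call scaffold(Path("my_pipeline")) instead.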
Next: Docs quick start (table write), examples/, Pipelines and stages.
Examples
examples/ holds runnable trees; each folder has a README with scope and commands.
Requirements
Prerequisites
- Python 3.11+
- YT proxy + token when you run pipeline.mode: prod
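Before switching pipeline.mode to prod, it can help to fail fast when credentials are missing. The sketch below parses a dotenv-style file and reports missing keys. It assumes configs/secrets.env uses plain KEY=VALUE lines and that the proxy and token are stored under YT_PROXY and YT_TOKEN. The key names this framework actually expects are not stated in this README, so treat both as assumptions, and both helper functions as hypothetical.

```python
from pathlib import Path


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_secrets(text: str, required=("YT_PROXY", "YT_TOKEN")) -> list[str]:
    """Return required keys that are absent or empty (key names are assumed)."""
    values = parse_env_file(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    secrets_path = Path("configs/secrets.env")
    text = secrets_path.read_text() if secrets_path.is_file() else ""
    print("missing:", missing_secrets(text))
```

An empty value counts as missing, so a placeholder line such as YT_TOKEN= is reported rather than silently accepted.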
YT Cluster Requirements
When running pipelines in production mode, code from ytjobs executes on YT cluster nodes. The cluster's Docker image (default or custom) must include:
- Python 3.11+
- ytsaurus-client >= 0.13.0 (for checkpoint operations)
- boto3 == 1.35.99 (for S3 operations)
- botocore == 1.35.99 (auto-installed with boto3)
If the cell default image lacks those pins, build a custom Docker image. Background: Cluster requirements.
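One way to find out whether an image meets these pins is to run a small check inside it. The sketch below uses only the standard library (importlib.metadata) and the version pins listed above. The helper names are hypothetical, and the version comparison is deliberately simple (numeric dotted versions only), so treat it as a sanity check rather than a full version-specifier implementation.

```python
import sys
from importlib import metadata

# Pins copied from the cluster requirements list.
MIN_PYTHON = (3, 11)
REQUIREMENTS = {
    "ytsaurus-client": (">=", "0.13.0"),
    "boto3": ("==", "1.35.99"),
    "botocore": ("==", "1.35.99"),
}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.35.99' into (1, 35, 99); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op: str, wanted: str) -> bool:
    """Compare two dotted versions with '>=' or '=='."""
    a, b = version_tuple(installed), version_tuple(wanted)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.14" compares like "0.14.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b if op == ">=" else a == b


def check_environment(get_version=metadata.version, python=sys.version_info[:2]):
    """Return a list of human-readable problems; empty means all pins match."""
    problems = []
    if tuple(python) < MIN_PYTHON:
        problems.append(f"Python {python[0]}.{python[1]} is older than 3.11")
    for name, (op, wanted) in REQUIREMENTS.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, op, wanted):
            problems.append(f"{name} {installed} does not satisfy {op} {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["all requirements satisfied"]:
        print(problem)
```

The lookup function and Python version are parameters so the check can be exercised without installing anything; by default it inspects the interpreter it runs in.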
Documentation
- Published: yt-framework.readthedocs.io
- Source: docs/
- Examples
Getting help
- Troubleshooting
- GitHub Issues (bugs, features, questions with the question label)
Contributing
See CONTRIBUTING.md