
flintml

Open source · Python · 140 stars · 7 forks · 1 issue · 0 watchers · Last commit: 11 months ago

About flintml

FlintML is an all-in-one, self-hosted machine learning platform designed for real-world teams seeking a simple, flexible, and fast-to-deploy solution. It centralizes the entire MLOps lifecycle to reduce infrastructure overhead while providing a developer-centric experience. The platform features a Delta Lake storage layer that adds ACID guarantees and time travel to data pipelines, a unified data catalog that treats tables, models, and artifacts as first-class citizens, and Polars-powered processing for memory-efficient data operations. Integrated tools include experiment tracking via Aim for comparing runs, a Jupyter Lab environment for notebook development, and workflow orchestration through Dagster. Built on Docker Compose, FlintML offers reproducible compute with flexible drivers and composable, declarative infrastructure. It requires Docker and runs on Linux or WSL; ARM architectures are not currently supported. Users can deploy the platform with a single command.

Platforms

Web, Self-hosted

Languages

Python


Version 0.2.1 | License: BSL 1.1



FlintML is the all-in-one, self-hosted ML platform for real-world teams. Simple, flexible, fast to deploy, and built for people solving actual problems, not chasing hype.


🚀 Quickstart

curl -sL https://raw.githubusercontent.com/flintml/flint/main/flintml-quickstart.sh | bash

FlintML will become available at localhost:8701. The first time you execute code, it may take a couple of minutes while FlintML downloads the relevant worker image.

Note: ensure Docker is installed and that you are on a Linux (or WSL) machine. ARM is currently not supported.
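If you script the quickstart (for example in CI), a small readiness check can help. The sketch below polls the default address until the web UI answers, using only the Python standard library; `wait_until_up` is a hypothetical helper, not part of FlintML. Any HTTP response, including an error status, counts as "up" because it proves a server is listening. It does not tell you whether the worker image has finished downloading.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str = "http://localhost:8701", timeout: float = 300.0,
                  interval: float = 2.0) -> bool:
    """Poll `url` until an HTTP server answers or `timeout` seconds pass.

    Hypothetical helper, not part of FlintML.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # e.g. a 404 or 500: the server is reachable
        except OSError:
            pass  # connection refused, reset or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical call after running the quickstart script is `if not wait_until_up(): raise SystemExit("FlintML is not reachable")`.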

Why FlintML?

FlintML enables teams to deliver end-to-end ML quickly and with minimal infrastructure overhead. With FlintML, all key components of the MLOps process are centralised, providing an integrated and developer-centric experience.

Core features:

  • ✔ **[Delta Lake](https://github.com/delta-io/delta) storage layer** – adds ACID guarantees and time travel to your data pipelines with scalable, versioned storage.
  • ✔ **Unified data catalog** – tables, models, artefacts, and any other file types are treated as first-class data citizens.
  • ✔ **Efficient data processing with [Polars](https://github.com/pola-rs/polars)** – leverage lazy execution for memory-efficient data operations.

Integrated tools:

  • ✔ **Experiment tracking with [Aim](https://github.com/aimhubio/aim)** – run experiments and compare them all in one place.
  • ✔ **Familiar notebook development environment** – all functions are seamlessly integrated with Jupyter Lab.
  • ✔ **Workflow orchestration via [Dagster](https://github.com/dagster-io/dagster) (WIP\*)** – load data, retrain models, and run inference on a schedule.

Platform & Deployment:

  • ✔ **Flexible and reproducible compute** – switch between compute [drivers](docs/concepts.md#drivers) to fit your use case, or write your own.
  • ✔ **Composable, declarative infrastructure** – have full control over your Docker Compose deployment.

🔎 Demo

To get a sense of what you can do with FlintML, check out the Instacart Kaggle example. You can also read about FlintML concepts and browse the reference to learn more about the platform's capabilities.

⚙️ Customising Your Deployment

Data Storage

FlintML ships with its own Storage service, which depends on two mounts: storage_data and storage_meta. To use custom volumes, create an override file, docker-compose.override.yml, and include it when spinning up FlintML. See the docs.
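As an illustration, the sketch below writes such an override using a common Docker Compose technique: redeclaring the named volumes with the local volume driver's bind options so that their data lands in host directories you choose. The helper names and host paths are hypothetical, and it is an assumption that FlintML's storage_data and storage_meta are plain named volumes that accept these options; check the docs for the supported approach.

```python
import json
from pathlib import Path, PurePosixPath

# Volume names come from this section; the bind technique is generic Compose
# and is only ASSUMED to suit FlintML's Storage service.
STORAGE_VOLUMES = ("storage_data", "storage_meta")


def build_storage_override(host_dirs: dict[str, str]) -> dict:
    """Return a Compose override backing each storage volume with a host directory."""
    unknown = set(host_dirs) - set(STORAGE_VOLUMES)
    if unknown:
        raise ValueError(f"unknown storage volume(s): {sorted(unknown)}")
    volumes = {}
    for name, host_dir in host_dirs.items():
        if not PurePosixPath(host_dir).is_absolute():
            raise ValueError(f"bind source must be an absolute path: {host_dir!r}")
        volumes[name] = {
            "driver": "local",
            # Tells Docker's local volume driver to bind-mount host_dir.
            "driver_opts": {"type": "none", "o": "bind", "device": host_dir},
        }
    return {"volumes": volumes}


def write_override(override: dict, path: str = "docker-compose.override.yml") -> None:
    # JSON output is also valid YAML, so Compose can read it without PyYAML.
    Path(path).write_text(json.dumps(override, indent=2) + "\n")
```

Compose merges a docker-compose.override.yml automatically when it sits next to a docker-compose.yml; otherwise pass both files with `-f`. Docker does not change the options of a volume that already exists, so this only affects newly created volumes.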

Environment Variables

FLINT_PORT

Defines the port number at which the FlintML platform will be served.

Defaults to 8701.

STORAGE_USER and STORAGE_PASSWORD

These environment variables define the login credentials that Worker Containers and services in the Flint Control Plane use to authenticate to the Flint Catalog.

Defaults are admin and password respectively.

DRIVER_CONFIG

This must be the file path to the desired Worker Configuration.

No default. See the example env file.

DOCKER_SOCKET

FlintML uses Worker Containers to execute code and run jobs. These containers are orchestrated by the configured driver. If the Local Driver is selected, the Docker socket must be mounted to the Compute Manager service in the Flint Control Plane.

Since mounting the Docker socket has security implications, the default value for this environment variable is /dev/null, meaning that the Docker socket IS NOT mounted.

The quickstart uses the Local Driver for simplicity and therefore mounts the Docker socket. See the example env file.
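The variables above can also be assembled programmatically. The sketch below is a hypothetical helper (`render_env` is not part of FlintML) that starts from the documented defaults, insists on DRIVER_CONFIG, and only points DOCKER_SOCKET at a real socket when you opt into the Local Driver. The socket path /var/run/docker.sock is an assumption about your host. Because the default credentials are admin/password, overriding them is worth considering on anything but a local machine.

```python
# Documented defaults; DRIVER_CONFIG has none and must always be supplied.
DEFAULTS = {
    "FLINT_PORT": "8701",
    "STORAGE_USER": "admin",
    "STORAGE_PASSWORD": "password",
    "DOCKER_SOCKET": "/dev/null",  # socket NOT mounted
}

# ASSUMPTION: the usual Docker socket location on a Linux host.
HOST_DOCKER_SOCKET = "/var/run/docker.sock"


def render_env(driver_config: str, *, local_driver: bool = False, **overrides: str) -> str:
    """Build .env file contents for FlintML (hypothetical helper)."""
    if not driver_config:
        raise ValueError("DRIVER_CONFIG has no default and must be set")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown variable(s): {sorted(unknown)}")

    env = dict(DEFAULTS)
    if local_driver:
        # The Local Driver needs the socket mounted into the Compute Manager.
        env["DOCKER_SOCKET"] = HOST_DOCKER_SOCKET
    env.update({key: str(value) for key, value in overrides.items()})
    env["DRIVER_CONFIG"] = driver_config

    port = int(env["FLINT_PORT"])  # raises ValueError if not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"FLINT_PORT out of range: {port}")
    if local_driver and env["DOCKER_SOCKET"] == "/dev/null":
        raise ValueError("the Local Driver needs a real DOCKER_SOCKET")
    return "".join(f"{key}={value}\n" for key, value in env.items())
```

For example, `Path(".env").write_text(render_env("./worker-config.yml", local_driver=True, STORAGE_PASSWORD="change-me"))` writes an env file for a Local Driver setup, where worker-config.yml is a placeholder name for your Worker Configuration.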

Worker Configuration

For the full worker specification, see the schema. Some example configurations are provided below.

Local Driver with mounted NAS volume:

driver:
  type: local
  image: flintml/worker-base:latest
  mounts:
    # <mount_name>: <host_mount_point>
    image-data: /mnt/nas/images # Will be available at `/mnt/image-data` inside worker container.
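The mount comment above implies a simple naming rule: a mount called <mount_name> shows up at /mnt/<mount_name> inside the worker. The sketch below applies that rule to the driver mapping (the value under `driver:`, as loaded from the YAML, for example with PyYAML) and emits equivalent `docker run --volume` flags. The function names are hypothetical, and the real Local Driver may create its mounts differently.

```python
from pathlib import PurePosixPath

# Rule taken from the example comment: <mount_name> -> /mnt/<mount_name>.
WORKER_MOUNT_ROOT = PurePosixPath("/mnt")


def resolve_mounts(driver: dict) -> list[tuple[str, str]]:
    """Return (host_path, container_path) pairs for a driver config mapping."""
    resolved = []
    for name, host_path in (driver.get("mounts") or {}).items():
        if not name or "/" in name:
            raise ValueError(f"invalid mount name: {name!r}")
        if not PurePosixPath(host_path).is_absolute():
            raise ValueError(f"host mount point must be absolute: {host_path!r}")
        resolved.append((host_path, str(WORKER_MOUNT_ROOT / name)))
    return resolved


def as_docker_flags(resolved: list[tuple[str, str]]) -> list[str]:
    # Illustrative only: the docker CLI form of each bind mount.
    return [f"--volume={host}:{container}" for host, container in resolved]
```

With the NAS example, `resolve_mounts({"type": "local", "image": "flintml/worker-base:latest", "mounts": {"image-data": "/mnt/nas/images"}})` yields `[("/mnt/nas/images", "/mnt/image-data")]`.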

To customise compute environments, you will need to use a custom image, set via the driver's image field.

Building Locally

All FlintML source files live under src/. You will see two sets of Compose, env and worker config files: build and release-template. The release-template files define the release tarball and can be ignored for local development.

To build locally, first build the base worker image so that it exists in your local Docker image store:

docker build -f ./src/worker-base/Dockerfile -t worker-base:latest ./src/

Then spin up the platform using the build files:

docker compose -f ./src/docker-compose.build.yml --env-file ./src/.env.build up
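If you rebuild often, the two commands above can be wrapped in a small script. The sketch below is a hypothetical convenience wrapper, not something shipped with FlintML; it runs exactly those commands, rooted at a repository path you pass in, and stops before starting the stack if the image build fails.

```python
import shlex
import subprocess
from pathlib import Path


def local_build_commands(repo_root: str = ".") -> list[list[str]]:
    """The two commands from this section, with paths rooted at `repo_root`."""
    src = Path(repo_root) / "src"
    return [
        # 1. Build the base worker image into the local Docker image store.
        ["docker", "build", "-f", str(src / "worker-base" / "Dockerfile"),
         "-t", "worker-base:latest", str(src)],
        # 2. Start the platform from the build Compose and env files.
        ["docker", "compose", "-f", str(src / "docker-compose.build.yml"),
         "--env-file", str(src / ".env.build"), "up"],
    ]


def build_and_up(repo_root: str = ".", dry_run: bool = False) -> None:
    for cmd in local_build_commands(repo_root):
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so `compose up` never
            # runs after a failed image build.
            subprocess.run(cmd, check=True)
```

`docker compose ... up` runs in the foreground, so `build_and_up()` blocks until you stop the stack with Ctrl+C; `dry_run=True` only prints the commands.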

Note: If you update the dependencies of any package in src/common-lib, run ./update-common-lib.sh to update the dependent Poetry lock files.

🎯 Roadmap

  1. Workflows (in-progress)
  2. Data upload support (currently possible by defining a worker mount, but inconvenient)
  3. Libcloud driver
  4. Multi-user support

🀝 Contributing

We would be stoked to have you contribute to FlintML! If you'd like to get involved, please contact us at [email protected].