aporia-ai

Open Source

mlplatform-workshop

# ML Platform Workshop

This repo contains example code for a (very basic) ML platform.

* The **model-template** directory contains an example of a Cookiecutter-based template that data scientists can clone to start a new project.
* The **infra** directory contains Pulumi code that spins up the shared infrastructure of the ML platform, such as Kubernetes, MLflow, etc.

Made with :heart: by <a href="https://www.aporia.com?utm_source=github&utm_medium=github&utm_campaign=mlplatform-workshop" target="_blank">Aporia</a>

## The YouTube Video

[![ML Platform Workshop video on YouTube](https://img.youtube.com/vi/s8Jj9gzQ3xA/0.jpg)](https://www.youtube.com/watch?v=s8Jj9gzQ3xA)

## Why?

As data science teams mature and their models reach actual production, the need for proper infrastructure becomes crucial.

Leading companies in the field with massive engineering teams, like Uber, Netflix and Airbnb, have built multiple solutions for their infrastructure and named the combination of them an “ML Platform”.

We hope this repo can help you get started with building your own ML platform ❤️

## Architecture

<img src="docs/architecture.png">

### Based on the following projects:

* [FastAPI](https://fastapi.tiangolo.com/) - for model serving
* [MLflow](https://www.mlflow.org/) - for experiment tracking
* [DVC](https://dvc.org/) - for data versioning
* [Cookiecutter](https://cookiecutter.readthedocs.io/) - for the model template
* [Pulumi](https://www.pulumi.com/) - Infrastructure as Code
* [GitHub Actions](https://github.com/features/actions) - for CI/CD
* [Traefik](https://traefik.io/) - API gateway
* [Poetry](https://python-poetry.org/) - Python dependency management

When building your own ML platform, do not take these tools for granted! [Check out alternatives](https://mlops.toys) and find the best tools that solve each one of your problems.

## What's missing from this?

Well... a lot actually. Here's a partial list:

* HTTPS & Authentication
* Environments (staging, production)
* Common library for preprocessing, postprocessing, etc.
* Model input & validation
* Training orchestration
* and probably much more!

We would love your help!

ML Frameworks

445 Github Stars

mlops.toys

<img src="assets/icons/logo.png" width="100" />

Curated list of useful MLOps projects, tools and resources.

**Visit at [https://mlops.toys](https://mlops.toys)!**

Made with :heart: by <a href="https://www.aporia.com?utm_source=github&utm_medium=github&utm_campaign=mlops-toys" target="_blank">Aporia</a>

## Contribute

We'd love your help! If we missed a project, please [create an issue](https://github.com/aporia-ai/mlops.toys/issues/new) with the name of the project and we'll add it :)

You can also directly create a [pull request](https://github.com/aporia-ai/mlops.toys/edit/main/store/data/projects.yaml).

To run the project locally: `npm run dev`

Education & Learning ML Frameworks

188 Github Stars

inferencedb

<img src="logo.svg" width="400" />

---

**InferenceDB** makes it easy to stream inferences of real-time ML models in production to a data lake, based on Kafka. This data can later be used for model retraining, data drift monitoring, performance degradation detection, AI incident investigation and more.

### Quickstart

* [Flask](https://github.com/aporia-ai/inferencedb/wiki/Flask-Quickstart)
* [FastAPI](https://github.com/aporia-ai/inferencedb/wiki/FastAPI-Quickstart)
* [KServe](https://github.com/aporia-ai/inferencedb/wiki/KServe-Quickstart)

### Features

* **Cloud Native** - Runs on top of Kubernetes and supports any cloud infrastructure
* **Model Serving Integrations** - Connects to ML model serving tools like [KServe](https://kserve.github.io/website/)
* **Extensible** - Add your own model serving frameworks and database destinations
* **Horizontally Scalable** - Add more workers to support more models and more traffic
* **Python Ecosystem** - Written in Python using [Faust](https://faust.readthedocs.io/en/latest/), so you can add your own data transformations using NumPy, Pandas, etc.

Made with :heart: by <a href="https://www.aporia.com?utm_source=github&utm_medium=github&utm_campaign=inferencedb" target="_blank">Aporia</a>

**WARNING:** InferenceDB is still experimental, use at your own risk! 💀

## Installation

The only requirement for InferenceDB is a Kafka cluster with [Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html) and [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html).

To install InferenceDB using Helm, run:

```sh
helm install inferencedb inferencedb/inferencedb -n inferencedb --create-namespace \
  --set kafka.broker=kafka:9092 \
  --set kafka.schemaRegistryUrl=http://schema-registry:8081 \
  --set kafka.connectUrl=http://kafka-connect:8083
```

## Usage

To start logging your model inferences, create an **InferenceLogger** Kubernetes resource. This is a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) that is defined and controlled by InferenceDB.

**Example:**

```yaml
apiVersion: inferencedb.aporia.com/v1alpha1
kind: InferenceLogger
metadata:
  name: my-model-inference-logger
  namespace: default
spec:
  topic: my-model
  events:
    type: kserve
    config: {}
  destination:
    type: confluent-s3
    config:
      url: s3://my-bucket/inferencedb
      format: parquet
      awsRegion: us-east-2
```

This InferenceLogger will watch the `my-model` Kafka topic for events in KServe format, and log them to a Parquet file on S3.

See the [KServe quickstart guide](https://github.com/aporia-ai/inferencedb/wiki/KServe-Quickstart) for more details.

## Development

InferenceDB development is done using [Skaffold](https://skaffold.dev/).

Make sure you have a Kubernetes cluster with Kafka installed (can be local or remote), and edit [skaffold.yaml](skaffold.yaml) with the correct Kafka URLs and Docker image registry (for local, just use `local/inferencedb`).

To start development, run:

    skaffold dev --trigger=manual

This will build the Docker image, push it to the Docker registry you provided, and install the Helm chart on the cluster. Now you can make changes to the code and press Enter in the Skaffold CLI to update the cluster.

## Roadmap

### Core

* [ ] Add support for Spark Streaming in addition to Faust
* [ ] Add more input validations on the Kafka URLs

### Event Processors

* [x] JSON
* [x] KServe
* [ ] Seldon Core
* [ ] BentoML
* [ ] MLflow Deployments

### Destinations

* [x] Parquet on S3
* [ ] HDF5 on S3
* [ ] Azure Blob Storage
* [ ] Google Cloud Storage
* [ ] ADLS Gen2
* [ ] AWS Glue
* [ ] Delta Lake
* [ ] PostgreSQL
* [ ] Snowflake
* [ ] Iceberg

### Documentation

* [ ] How to set up Kafka using AWS / Azure / GCP managed services
* [ ] API Reference for the CRDs
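
If you need one InferenceLogger per model, the resource from the Usage section can also be generated programmatically. The following is a minimal sketch, not part of InferenceDB: the helper name `build_inference_logger` and its parameters are hypothetical, and the field names are copied from the YAML example in the Usage section. Whether InferenceDB accepts values other than the ones shown there is not confirmed here.

```python
import json


def build_inference_logger(name, topic, bucket_url, namespace="default",
                           events_type="kserve", aws_region="us-east-2"):
    """Return an InferenceLogger manifest as a plain dict.

    Hypothetical helper: the structure mirrors the YAML example in the
    Usage section; nothing here is validated against the actual CRD schema.
    """
    return {
        "apiVersion": "inferencedb.aporia.com/v1alpha1",
        "kind": "InferenceLogger",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Kafka topic that the logger watches for inference events
            "topic": topic,
            "events": {"type": events_type, "config": {}},
            "destination": {
                "type": "confluent-s3",
                "config": {
                    "url": bucket_url,
                    "format": "parquet",
                    "awsRegion": aws_region,
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = build_inference_logger(
        name="my-model-inference-logger",
        topic="my-model",
        bucket_url="s3://my-bucket/inferencedb",
    )
    # Emit JSON on stdout so the result can be piped to other tools
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests as well as YAML, the output can be applied with something like `python make_logger.py | kubectl apply -f -`, where `make_logger.py` is a placeholder file name for the script above.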

DevOps & Infrastructure ML Frameworks

81 Github Stars
