Open Source Software Directory

Discover self-hostable and developer-friendly software with rich filters for category, tags, features, and technical stack.

unstructured

14.9K stars | open source
AI & Machine Learning | Data Pipelines & ETL

dolphinscheduler

Apache DolphinScheduler

14.3K stars | open source
Cron & Job Scheduling | Data Pipelines & ETL

kedro

Python version: 3.10 | 3.11 | 3.12 | 3.13 | 3.14

10.9K stars | open source
ML Frameworks | Data Pipelines & ETL

seatunnel

Apache SeaTunnel

9.4K stars | open source
AI & Machine Learning | Data Pipelines & ETL

unstract

Unstract: Turn Unstructured Documents into Structured Data

6.6K stars | open source
AI Agents | Data Pipelines & ETL

data-juicer

Data-Juicer: The Data Operating System for the Foundation Model Era. Multimodal | Cloud-Native | AI-Ready | Large-Scale

6.5K stars | open source
Data Labeling | Data Pipelines & ETL

Parsr

Turn your documents into data! Warning: **This project is no longer maintained.** Security patches are not being...

6.2K stars | open source
ML Frameworks | Data Pipelines & ETL

trafilatura

Trafilatura: Discover and Extract Text Data on the Web

6.1K stars | open source
Browser Automation | Data Pipelines & ETL

flashtext

FlashText

5.7K stars | open source
ML Frameworks | Data Pipelines & ETL

Daft

Daft: High-Performance Data Engi...

5.6K stars | open source
ML Frameworks | Data Pipelines & ETL

SynapseML

Synapse Machine Learning: SynapseML (previously known as MMLSpark) is an open-source library that simplifies the...

5.2K stars | open source
ML Frameworks | Data Pipelines & ETL