About flowview

Visual data pipeline debugger for Polars.

g

Published by

guillermodotn

Visit View Profile

README.md

View on GitHub

flowview

Visual data pipeline debugger for Polars. Stop print-debugging your pipelines.

flowview demo

Install

pip install flowview

or with uv:

uv add flowview

Quick Start

Add @fv.trace to any function that transforms a Polars DataFrame. flowview traces every method call and renders a visual flow in your terminal.

import polars as pl
import flowview as fv

@fv.trace
def process(df: pl.DataFrame) -> pl.DataFrame:
    return (
        df.filter(pl.col("status") == "active")
          .with_columns((pl.col("price") * pl.col("quantity")).alias("revenue"))
          .group_by("category")
          .agg(pl.col("revenue").sum().alias("total_revenue"))
          .sort("total_revenue", descending=True)
    )

df = pl.DataFrame({
    "status": ["active", "inactive", "active"],
    "category": ["Books", "Books", "Electronics"],
    "price": [14.99, 599.99, 299.99],
    "quantity": [5, 1, 2],
})

result = process(df)

.pipe() chains work too:

@fv.trace
def process(df: pl.DataFrame) -> pl.DataFrame:
    return df.pipe(clean).pipe(filter_active).pipe(add_revenue)

What You See

Each step in your pipeline is displayed as a box showing:

Row count with diff from the previous step (e.g., 700 rows x 4 cols (-300 rows))
Schema changes — columns added or removed (e.g., +cols: revenue -cols: status)
Sample data — first N rows at each transformation
Execution time per step

Steps are connected with arrows to show the flow. A summary footer shows the total step count and wall-clock time.

Supported Operations

flowview traces any DataFrame method that returns a new DataFrame. These methods get human-readable step names:

Method	Step name example
`filter(expr)`	`filter((col("status")) == ("active"))`
`with_columns(exprs)`	`with_columns(revenue, tax)`
`select(cols)`	`select(status, price)`
`drop(cols)`	`drop(status, category)`
`rename(mapping)`	`rename(price->unit_price)`
`sort(cols)`	`sort(price, quantity)`
`head(n)` / `tail(n)`	`head(10)` / `tail(5)`
`unique(subset)`	`unique(id)`
`join(other, ...)`	`join(on=id, how=left)`
`group_by(cols).agg(exprs)`	`group_by(category).agg(total_revenue)`
`pipe(fn)`	uses the function name, e.g. `clean_data`

Other methods (e.g., explode, melt, unpivot) are traced with a fallback name like explode('tags').

Options

@fv.trace(sample_rows=3, show_sample=True, show_schema=True)
def process(df):
    ...

Option	Type	Default	Description
`sample_rows`	`int`	`5`	Number of sample rows to capture at each step
`show_sample`	`bool`	`True`	Display sample data tables in the output
`show_schema`	`bool`	`False`	Display the full schema at each step

How It Works

The @fv.trace decorator wraps the first DataFrame argument in a lightweight proxy before calling your function. The proxy intercepts every method call, captures a snapshot of the result (row count, schema, sample rows, timing), and delegates to the real Polars DataFrame underneath. When your function returns, the proxy is unwrapped and you get back a regular pl.DataFrame.

There is no monkey-patching and no global state. Each decorated call is fully isolated.

Limitations

LazyFrame is not supported — df.lazy() exits the proxy. Only eager DataFrames are traced.
Pipe internals are not individually traced — df.pipe(fn) produces a single step named after fn, not one step per operation inside fn.
IDE autocomplete may not show DataFrame methods inside the decorated function body.
type(df) returns TracedDataFrame inside the decorated function. isinstance(df, pl.DataFrame) works correctly.
Only the first DataFrame argument is wrapped when a function takes multiple DataFrames.

License

MIT

flowview

About flowview

Platforms

Languages

Links

README.md

flowview

Install

Quick Start

What You See

Supported Operations

Options

How It Works

Limitations

License