Performance & Benchmarks

quent adds dispatch overhead to each pipeline step. This page quantifies that overhead and puts it in context. The takeaway: for I/O-bound workloads -- quent's primary use case -- the overhead is a rounding error. It amounts to about 4% of a 500 us cache lookup and well under 1% of a typical 5 ms database query.

All benchmarks were run with Python 3.14.3 on an Apple M-series (ARM64) Mac running macOS. Each was measured with timeit, reporting the best of 5 repeats. The scripts are in benchmarks/.
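A best-of-5 timeit measurement of a per-call cost can be sketched with the standard library alone. This is an illustrative harness, not the contents of the scripts in benchmarks/. The function `fn`, the helper `best_of_5_ns` and the loop count are arbitrary choices made for the example.

```python
import timeit


def fn(x):
    # Trivial function standing in for a pipeline step (illustrative only)
    return x


def best_of_5_ns(stmt, globals_=None, number=100_000):
    """Return the best per-call time in nanoseconds over 5 repeats."""
    # timeit.repeat returns the total seconds for `number` executions, once per repeat
    totals = timeit.repeat(stmt, repeat=5, number=number, globals=globals_)
    # Best-of-N keeps the minimum: slower repeats reflect interference, not the code
    return min(totals) / number * 1e9


if __name__ == "__main__":
    print(f"raw fn(x) call: {best_of_5_ns('fn(1)', {'fn': fn}):.0f} ns")
```

Passing the statement as a string with `globals` avoids wrapping it in a lambda. A lambda wrapper would add a second function call to every measurement.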

At a Glance

| Pipeline | 1 step | 5 steps | 10 steps | Per additional step |
|---|---|---|---|---|
| Sync | 0.8 us | 1.7 us | 2.7 us | ~210 ns |
| Async | 17.5 us | 33 us | 53 us | ~4 us |
| Mixed (sync + async) | -- | 22 us | 34 us | -- |

Baselines: raw function call ~26 ns. Bare await coroutine() ~2.8 us.

Sync Pipeline

Fully synchronous pipelines execute in a tight while loop. No event loop, no coroutines, no async machinery.
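The sketch below makes the idea of a tight loop over a linked list concrete with a minimal sync-only executor. The `Link` class, its `v`/`next` fields and `run_sync` are hypothetical names chosen for illustration. This is not quent's engine.

```python
class Link:
    """One pipeline step in a singly linked list (hypothetical layout)."""

    __slots__ = ("v", "next")

    def __init__(self, v):
        self.v = v        # the callable for this step
        self.next = None  # the following step, or None at the tail


def run_sync(head, value):
    # Walk the list with a plain while loop: no event loop, no coroutines
    link = head
    while link is not None:
        value = link.v(value)
        link = link.next
    return value


# Build a 3-step chain: add 1, double, convert to str
head = Link(lambda x: x + 1)
head.next = Link(lambda x: x * 2)
head.next.next = Link(str)
print(run_sync(head, 3))  # prints 8 (the string "8")
```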

| Benchmark | Time |
|---|---|
| Raw `fn(x)` call | 26 ns |
| Raw 5-call chain `fn(fn(fn(fn(fn(x)))))` | 118 ns |
| Raw 10-call chain | 229 ns |
| `Q().run()` (empty pipeline) | 822 ns |
| 1-step pipeline | 817 ns |
| 5-step pipeline | 1,675 ns |
| 10-step pipeline | 2,734 ns |

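Per-step breakdown

The fixed pipeline overhead (empty Q().run()) is ~820 ns. Each additional step adds ~210 ns:

- ~26 ns -- the function call itself
- ~30 ns -- frozenset awaitable type check (_SYNC_TYPES lookup)
- ~154 ns -- linked-list traversal, calling convention dispatch, state management

The per-step cost scales linearly, so a 100-step sync pipeline takes approximately 820 + (100 x 210) ns = 21.8 us. The formula is slightly pessimistic for short pipelines. The 1-step pipeline (817 ns) measured no slower than the empty one, and the 10-step pipeline measured 2.73 us against a predicted 2.92 us.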
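The linear model can be written as a small estimator. The constants are the measured numbers from this page (Python 3.14.3, Apple M-series), so treat the output as a rough guide on other machines. `estimate_sync_us` is a hypothetical helper, not part of quent.

```python
FIXED_NS = 820     # empty Q().run() on the benchmark machine
PER_STEP_NS = 210  # marginal cost of each sync step


def estimate_sync_us(steps):
    """Rough wall time of a fully sync pipeline, in microseconds."""
    return (FIXED_NS + steps * PER_STEP_NS) / 1000


print(estimate_sync_us(100))  # 21.82
```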

Async Pipeline

Async benchmarks run inside a running event loop (no asyncio.run() startup cost per iteration). This measures quent's actual dispatch overhead, not event loop bootstrapping.
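One way to avoid paying asyncio.run() startup on every iteration is to start the loop once and time many awaits inside it. The sketch below is a minimal version of that approach. `async_fn` is a placeholder coroutine function, `time_awaits_ns` is a hypothetical helper, and `time.perf_counter_ns` is the clock. The scripts in benchmarks/ may structure this differently, so numbers from this sketch, which times a direct await of a fresh coroutine, are not necessarily comparable to the published figures.

```python
import asyncio
import time


async def async_fn(x):
    # Placeholder coroutine standing in for an async step
    return x


async def time_awaits_ns(iterations=10_000, repeats=5):
    """Best-of-`repeats` per-await cost in ns, measured inside one running loop."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for i in range(iterations):
            await async_fn(i)
        elapsed = time.perf_counter_ns() - start
        best = min(best, elapsed / iterations)
    return best


if __name__ == "__main__":
    # asyncio.run() is paid once here, not once per measured await
    print(f"await async_fn(x): {asyncio.run(time_awaits_ns()):.0f} ns")
```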

| Benchmark | Time |
|---|---|
| Bare `await async_fn(x)` | 2,846 ns |
| 1-step async pipeline | 17,499 ns |
| 5-step async pipeline | 33,425 ns |
| 10-step async pipeline | 52,890 ns |
| 5-step mixed (3 sync + 2 async) | 22,137 ns |
| 10-step mixed (5 sync + 5 async) | 34,388 ns |

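Per-step breakdown

The first async step pays a one-time transition cost of ~14.7 us. That figure is the 17.5 us of the 1-step async pipeline minus the 2.8 us bare-await baseline, and it also absorbs the ~0.8 us fixed pipeline overhead. Each subsequent async step adds ~4 us. The 10-step async pipeline takes 52.9 us, about 35.4 us more than the 1-step one, spread over 9 extra steps.

Mixed pipelines are faster than all-async pipelines of the same length because their sync steps still cost only ~210 ns each, not the ~4 us of an async step. This holds even for sync steps that run inside the async continuation.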
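The async numbers fit a simple additive model with three costs:

- the 1-step async pipeline cost (~17.5 us, which already includes the fixed overhead and the transition)
- ~4 us for each further async step
- ~210 ns for each sync step

The sketch below checks that model against the two mixed benchmarks. `estimate_mixed_us` is a hypothetical helper, and its constants come from this page's measurements.

```python
FIRST_ASYNC_US = 17.5  # 1-step async pipeline: fixed cost + transition + first step
NEXT_ASYNC_US = 4.0    # each additional async step
SYNC_STEP_US = 0.21    # each sync step, before or after the transition


def estimate_mixed_us(sync_steps, async_steps):
    """Rough wall time of a pipeline with at least one async step, in microseconds."""
    if async_steps < 1:
        raise ValueError("use the sync-only model when there are no async steps")
    return (
        FIRST_ASYNC_US
        + (async_steps - 1) * NEXT_ASYNC_US
        + sync_steps * SYNC_STEP_US
    )


print(round(estimate_mixed_us(3, 2), 2))  # 22.13 (measured: 22.14)
print(round(estimate_mixed_us(5, 5), 2))  # 34.55 (measured: 34.39)
```

The model also predicts 33.5 us for the all-async 5-step pipeline, close to the measured 33.4 us.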

Why mixed pipelines matter

Real-world pipelines are rarely all-async. A typical 5-step pipeline might have 2 I/O steps (async) and 3 computation/validation steps (sync). The mixed pipeline benchmark (22 us for 5 steps) is the most realistic number for most applications.

I/O-Bound Workloads

quent's primary use case is I/O-bound pipelines -- database queries, HTTP calls, cache lookups. Pipeline overhead is a roughly constant ~22 us for a typical 5-step mixed pipeline. I/O latency is far larger: roughly 20x for a 500 us cache lookup and about 9,000x for a 200 ms external API call.

Calculated overhead

| I/O operation | Typical latency | Pipeline overhead | Overhead as % of total |
|---|---|---|---|
| Slow cache (network Redis) | 500 us | 22 us | 4.2% |
| Fast database query | 1 ms | 22 us | 2.2% |
| Typical database query | 5 ms | 22 us | 0.4% |
| HTTP API (same region) | 10 ms | 22 us | 0.2% |
| HTTP API (cross-region) | 50 ms | 22 us | 0.04% |
| External API | 200 ms | 22 us | 0.01% |
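The percentage column is the overhead divided by the combined I/O and pipeline time. The short sketch below reproduces it. The latencies are the same illustrative figures as in the table, and `overhead_share` is a hypothetical helper.

```python
OVERHEAD_US = 22  # typical 5-step mixed pipeline, from the benchmarks above

LATENCIES_US = {
    "Slow cache (network Redis)": 500,
    "Fast database query": 1_000,
    "Typical database query": 5_000,
    "HTTP API (same region)": 10_000,
    "HTTP API (cross-region)": 50_000,
    "External API": 200_000,
}


def overhead_share(latency_us, overhead_us=OVERHEAD_US):
    """Fraction of total wall time spent in pipeline overhead."""
    return overhead_us / (latency_us + overhead_us)


for name, latency in LATENCIES_US.items():
    print(f"{name:<28} {overhead_share(latency):.2%}")
```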

Sync-only pipelines with sync I/O

If your pipeline is fully synchronous (e.g. using synchronous Redis or file I/O), the overhead drops to ~1.7 us for 5 steps instead of ~22 us. That is roughly 13x lower than a 5-step mixed async pipeline.

Measured with simulated I/O

These benchmarks use asyncio.sleep() to simulate real I/O. The pipeline has 3 steps: sync → async I/O → sync.

| Scenario | Raw (direct calls) | Pipeline (3 steps) | Measured delta |
|---|---|---|---|
| 1 ms I/O | 1.60 ms | 1.81 ms | ~0.2 ms |
| 5 ms I/O | 7.50 ms | 7.75 ms | ~0.25 ms |
| 50 ms I/O | 53.9 ms | 53.1 ms | noise (pipeline run was faster) |

Note

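The measured deltas for 1 ms and 5 ms I/O (0.2 to 0.25 ms) are more than ten times the ~18 us that the microbenchmarks predict for a pipeline with one async and two sync steps. The gap comes from asyncio.sleep() timer jitter at the millisecond scale: even the raw 1 ms case took 1.60 ms. At 50 ms the pipeline run came out faster than the raw one, which shows that pipeline overhead is lost in the noise for realistic network I/O.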
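To observe the jitter directly, time asyncio.sleep() on its own and compare the elapsed time with the requested delay. The sketch below does this. `sleep_overshoot_ms` is a hypothetical helper, and the overshoot you see depends on the OS timer resolution and system load. It can even be slightly negative, because the event loop may fire a timer up to one clock-resolution tick early.

```python
import asyncio
import time


async def sleep_overshoot_ms(delay_s=0.001, samples=20):
    """Return (min, max) extra milliseconds that asyncio.sleep(delay_s) took."""
    extras = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(delay_s)
        extras.append((time.perf_counter() - start - delay_s) * 1000)
    return min(extras), max(extras)


if __name__ == "__main__":
    low, high = asyncio.run(sleep_overshoot_ms())
    print(f"1 ms sleep overshoot: {low:.3f} ms to {high:.3f} ms")
```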

Pipeline Construction

Pipeline construction is a build-time cost, not a runtime cost. Pipelines are typically built once and executed many times.

| Benchmark | Time |
|---|---|
| `Q()` (empty) | 515 ns |
| `Q()` + 5x `.then()` | 7.4 us |
| `Q()` + 10x `.then()` | 14.6 us |

Each .then() call creates a Link node and appends it to the linked list. On the benchmark machine this costs roughly 1.4 us per call. If construction cost matters for your use case, build once and call .run() repeatedly -- or use .clone() to fork a pre-built base pipeline.
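The build-once pattern can be sketched with a toy builder. `MiniPipeline` and its `then`, `clone` and `run` methods are hypothetical stand-ins that only mirror the shape of the Q().then() / .run() / .clone() usage described here. quent's actual signatures and clone semantics may differ.

```python
class _Link:
    __slots__ = ("fn", "next")

    def __init__(self, fn):
        self.fn = fn
        self.next = None


class MiniPipeline:
    """Toy linked-list pipeline (hypothetical; not quent's Q)."""

    def __init__(self):
        self.head = self.tail = None

    def then(self, fn):
        # Construction cost: allocate one node and append it at the tail
        link = _Link(fn)
        if self.tail is None:
            self.head = self.tail = link
        else:
            self.tail.next = link
            self.tail = link
        return self

    def clone(self):
        # Copy the nodes so steps added to the fork never leak into the base
        copy = MiniPipeline()
        link = self.head
        while link is not None:
            copy.then(link.fn)
            link = link.next
        return copy

    def run(self, value):
        link = self.head
        while link is not None:
            value = link.fn(value)
            link = link.next
        return value


# Build the shared base once...
base = MiniPipeline().then(str.strip).then(str.lower)
# ...fork it for a variant, then run both as often as needed without rebuilding
slugify = base.clone().then(lambda s: s.replace(" ", "-"))

print(base.run("  Hello World "))     # hello world
print(slugify.run("  Hello World "))  # hello-world
```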

What Makes It Fast

quent's engine is optimized for the common case at every decision point:

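Fast awaitable check (~30 ns)

A frozenset of common sync return types (int, str, float, bool, list, dict, tuple, set, bytes) lets the engine rule out the most frequent non-awaitable results with a single O(1) lookup on the result's type. That is more than 10x faster than inspect.isawaitable() (~380 ns). inspect.isawaitable() ends in an isinstance() check against collections.abc.Awaitable and so goes through the ABC machinery.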

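```python
# _engine.py: hot path (excerpt)
from types import CoroutineType

_SYNC_TYPES = frozenset({int, str, float, bool, list, dict, tuple, set, bytes})

# After each step (`result` is the value the step just returned;
# `_isawaitable` is quent's general-purpose check, not shown here):
if type(result) is CoroutineType or (
    result is not None and type(result) not in _SYNC_TYPES and _isawaitable(result)
):
    ...  # async transition
```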
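The sketch below is a self-contained version of the check, wrapped in a hypothetical `needs_async_transition()` helper so it can be exercised directly. The logic matches the excerpt but is split into early returns. It uses `inspect.isawaitable` as the general fallback, which is an assumption, since quent's `_isawaitable` may be implemented differently.

```python
import inspect
from types import CoroutineType

_SYNC_TYPES = frozenset({int, str, float, bool, list, dict, tuple, set, bytes})


def needs_async_transition(result):
    """Return True if a step's result must be awaited (hypothetical helper)."""
    # Exact-type check for coroutines, the most common awaitable
    if type(result) is CoroutineType:
        return True
    # None and the common builtin return types are never awaitable
    if result is None or type(result) in _SYNC_TYPES:
        return False
    # Everything else takes the slower, general ABC-based check
    return inspect.isawaitable(result)


class Ticket:
    """A custom awaitable that is not a coroutine."""

    def __await__(self):
        return iter(())  # completes immediately, producing None


async def step():
    return 1


coro = step()
print(needs_async_transition(coro))      # True
coro.close()  # avoid a "coroutine was never awaited" warning
print(needs_async_transition(42))        # False
print(needs_async_transition(Ticket()))  # True
```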

Hot-path calling convention

~90% of pipeline steps use the default calling convention (no explicit args), so the dispatch checks for that case first:

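```python
# _eval.py: _evaluate_value hot path (excerpt from inside the function body)
if not link.args and not link.kwargs:  # ~2 ns truthiness check on None
    if link.is_callable:
        # With no current value (Null), the step is called with no arguments
        return link.v(current_value) if current_value is not Null else link.v()
    return link.v  # a non-callable step yields its stored value as-is
```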
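To exercise that branch in isolation, the sketch below supplies minimal stand-ins for the names the excerpt refers to. `Null` is a sentinel meaning "no current value", and `Link` is a dataclass with `v`, `args`, `kwargs` and a precomputed `is_callable` flag. These are assumptions about the shapes involved, not quent's actual classes, and the explicit-args path is deliberately left unimplemented.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

Null = object()  # hypothetical sentinel: "no current value" (distinct from None)


@dataclass
class Link:
    v: Any
    args: Optional[tuple] = None   # None when no explicit args were given
    kwargs: Optional[dict] = None  # None when no explicit kwargs were given
    is_callable: bool = field(init=False, default=False)

    def __post_init__(self):
        # Computed once at build time so the hot path reads a plain attribute
        self.is_callable = callable(self.v)


def evaluate_value(link, current_value):
    # Hot path: default calling convention (no explicit args or kwargs)
    if not link.args and not link.kwargs:
        if link.is_callable:
            return link.v(current_value) if current_value is not Null else link.v()
        return link.v
    # quent's rules for explicit args are not described here, so none are guessed
    raise NotImplementedError("explicit-args dispatch is out of scope for this sketch")


print(evaluate_value(Link(lambda x: x + 1), 41))    # 42
print(evaluate_value(Link(lambda: "start"), Null))  # start
print(evaluate_value(Link(7), 41))                  # 7
```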

Zero instrumentation overhead

When on_step is None (the default), all timing and callback logic is bypassed entirely -- not a no-op callback, but a short-circuited code path. Apart from the None check itself, you pay nothing for instrumentation you don't use.
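The difference between a no-op callback and a short-circuited path can be sketched as two loops, with one selected up front. `run_steps` and the `on_step(index, elapsed_ns)` callback signature are hypothetical. This page does not show quent's actual on_step signature.

```python
import time


def run_steps(steps, value, on_step=None):
    if on_step is None:
        # Fast path: no clock reads, no callback calls
        for step in steps:
            value = step(value)
        return value
    # Instrumented path: only taken when a callback was supplied
    for index, step in enumerate(steps):
        start = time.perf_counter_ns()
        value = step(value)
        on_step(index, time.perf_counter_ns() - start)
    return value


timings = []
steps = [lambda x: x + 1, lambda x: x * 3]
print(run_steps(steps, 1))                                           # 6
print(run_steps(steps, 1, on_step=lambda i, ns: timings.append(i)))  # 6
print(timings)                                                       # [0, 1]
```

A default no-op callback would still cost a function call and two clock reads per step. Branching on `None` keeps the uninstrumented loop as lean as a hand-written one.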

One-way async transition

The sync-to-async transition happens at most once per pipeline execution. Once in async mode, the engine never checks whether to go back to sync. This means N async steps pay one transition cost plus N marginal costs -- not N transition costs.
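Below is a minimal sketch of a one-way transition, assuming a list of steps where any step may return an awaitable. The sync loop runs until the first awaitable appears. It then hands the remaining steps to an async continuation, which awaits results as needed and never switches back. `run` and `_continue_async` are hypothetical names, and the awaitable check is simplified to `inspect.isawaitable`.

```python
import asyncio
import inspect


def run(steps, value):
    for i, step in enumerate(steps):
        value = step(value)
        if inspect.isawaitable(value):
            # One-way switch: the rest of the pipeline runs in the async continuation
            return _continue_async(steps, i + 1, value)
    return value  # never became async: a plain value, no coroutine created


async def _continue_async(steps, start, pending):
    value = await pending
    for step in steps[start:]:
        value = step(value)
        # Results are still awaited when needed, but there is no path back to sync
        if inspect.isawaitable(value):
            value = await value
    return value


async def fetch(x):
    await asyncio.sleep(0)  # stands in for real I/O
    return x * 10


print(run([lambda x: x + 1, str], 1))  # prints 2 (the string "2")
coro = run([lambda x: x + 1, fetch, lambda x: x - 5], 1)
print(asyncio.run(coro))  # 15
```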

No async imports on sync path

Sync pipelines never touch asyncio at evaluation time. No event loop interaction, no coroutine creation, no async frame allocation.

See Sync/Async Bridging -- Performance for more on the zero-overhead sync path.

Running Benchmarks

The benchmarks/ directory contains reproducible scripts:

```bash
# Quick overview (timeit-based, ~2 minutes)
python benchmarks/bench_for_docs.py

# Rigorous microbenchmarks (pyperf, ~15 minutes each)
python benchmarks/bench_core.py          # core operations
python benchmarks/bench_async.py         # async execution
python benchmarks/bench_ops.py           # foreach, gather, with_, if_
python benchmarks/bench_q_sizes.py       # scaling from 1 to 1000 steps

# Profiling
python benchmarks/profile_cprofile.py    # cProfile
bash benchmarks/flamegraph_pyspy.sh      # py-spy flamegraph
```

Tip

For the most stable results, close other applications and run the pyperf-based scripts with --rigorous. The bench_for_docs.py script uses timeit instead, which is quicker to run but less rigorous.
