chonkle

A Python host for Wasm codec pipelines. Pipelines are directed acyclic graphs (DAGs) of codec steps defined in JSON. The orchestrator parses the DAG, validates wiring against codec signatures, and executes the pipeline via Wasmtime.
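As a rough illustration of the parse, validate, and order flow described above, the sketch below derives an execution order for a small step graph using the standard library's graphlib. The JSON layout (the `steps`, `codec`, and `inputs` keys) and the step names are assumptions made for this example only; they are not chonkle's pipeline schema, which is documented in docs/reference/PIPELINE_SCHEMA.md.

```python
import json
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline layout for illustration only; not chonkle's schema.
PIPELINE_JSON = """
{
  "steps": {
    "unzip":  {"codec": "zlib",    "inputs": []},
    "unshuf": {"codec": "shuffle", "inputs": ["unzip"]},
    "cast":   {"codec": "astype",  "inputs": ["unshuf"]}
  }
}
"""


def execution_order(pipeline_text: str) -> list[str]:
    """Return step names in an order that respects their dependencies."""
    steps = json.loads(pipeline_text)["steps"]

    # Reject wiring that points at a step which does not exist.
    for name, step in steps.items():
        for dep in step["inputs"]:
            if dep not in steps:
                raise ValueError(f"step {name!r} depends on unknown step {dep!r}")

    # Map each step to the set of steps it depends on.
    graph = {name: set(step["inputs"]) for name, step in steps.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("pipeline is not acyclic") from exc


order = execution_order(PIPELINE_JSON)
```

A real orchestrator would also check port names and types against each codec's signature; this sketch only covers the graph-shape part of that validation.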

Status: proof of concept.

Codec backends

chonkle supports three codec backends. Each implements the same Codec ABC (call(direction, port_map) and signature()), so backends can be mixed freely within a single pipeline.
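To make the shared interface concrete, here is a minimal sketch of what an abstract base class with those two methods could look like. The type hints, the `"encode"`/`"decode"` direction values, the shape of the signature dictionary, and the `ReverseBytes` toy codec are all assumptions for illustration; the actual contract is specified under docs/reference/codec-contract/.

```python
from abc import ABC, abstractmethod


class Codec(ABC):
    """Hypothetical stand-in for the interface every backend implements."""

    @abstractmethod
    def call(self, direction: str, port_map: dict[str, bytes]) -> dict[str, bytes]:
        """Run the codec in the given direction on named input ports."""

    @abstractmethod
    def signature(self) -> dict:
        """Describe the codec's input and output ports."""


class ReverseBytes(Codec):
    """Toy codec: reversing bytes is its own inverse, so both directions match."""

    def call(self, direction: str, port_map: dict[str, bytes]) -> dict[str, bytes]:
        if direction not in ("encode", "decode"):
            raise ValueError(f"unknown direction: {direction!r}")
        return {"bytes": port_map["bytes"][::-1]}

    def signature(self) -> dict:
        # Assumed signature shape, chosen only for this example.
        return {"inputs": ["bytes"], "outputs": ["bytes"]}


codec = ReverseBytes()
encoded = codec.call("encode", {"bytes": b"abc"})
decoded = codec.call("decode", encoded)
```

Because callers only depend on `call` and `signature`, a pipeline step does not need to know which backend sits behind a given codec.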

Component Model Wasm — .wasm components implementing the chonkle:codec/transform@0.1.0 WIT interface. Any language with a Component Model toolchain (Rust, C, Python via componentize-py) can produce a conforming component. Data transfer uses the canonical ABI. The Wasmtime sandbox isolates each component from the host.

Core Wasm — wasm32-wasi reactor modules that exchange data in a binary port-map wire format through Memory.read/Memory.write. When two consecutive pipeline steps are both Core Wasm, data moves between their linear memories with ctypes.memmove (a single copy, with no serialization round-trip).

Native (numcodecs) — Python codecs from the numcodecs library. No Wasm overhead. numcodecs and numpy are optional dependencies, imported lazily. Adding a new numcodecs codec requires only adding a signature file.
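The single-copy transfer mentioned for the Core Wasm backend can be pictured with plain buffers. In the sketch below, two bytearrays stand in for the linear memories of adjacent steps, and `copy_region` is a hypothetical helper; it does not use Wasmtime's API and is not chonkle's implementation, only a demonstration of moving a byte range with one `ctypes.memmove` call.

```python
import ctypes


def copy_region(src: bytearray, src_off: int, dst: bytearray, dst_off: int, size: int) -> None:
    """Copy size bytes from src[src_off:] into dst[dst_off:] with one memmove."""
    if min(src_off, dst_off, size) < 0:
        raise ValueError("offsets and size must be non-negative")
    if src_off + size > len(src) or dst_off + size > len(dst):
        raise ValueError("copy would run past the end of a buffer")

    # Expose both bytearrays as ctypes arrays without copying their contents.
    src_view = (ctypes.c_ubyte * len(src)).from_buffer(src)
    dst_view = (ctypes.c_ubyte * len(dst)).from_buffer(dst)

    # memmove accepts raw integer addresses, so offsets are plain additions.
    ctypes.memmove(
        ctypes.addressof(dst_view) + dst_off,
        ctypes.addressof(src_view) + src_off,
        size,
    )


# Two bytearrays stand in for the linear memories of two adjacent steps.
memory_a = bytearray(b"\x00" * 8 + b"payload!")
memory_b = bytearray(32)
copy_region(memory_a, 8, memory_b, 4, 8)
```

The bounds checks matter here: `memmove` itself performs none, so an out-of-range offset would corrupt memory rather than raise.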

The Resolver selects among available implementations using a configurable backend preference list. The default preference order is ["native", "core", "component"].
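Preference-based selection of this kind can be sketched in a few lines. `pick_backend` and its arguments are hypothetical names for this example; only the backend labels and the default order come from the text above, and the real resolution chain (described in docs/reference/CODEC_RESOLUTION.md) may involve more steps.

```python
from collections.abc import Iterable, Sequence

# Default order as stated in this README.
DEFAULT_PREFERENCE = ("native", "core", "component")


def pick_backend(
    available: Iterable[str],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> str:
    """Return the first backend in the preference order that is available."""
    available_set = set(available)
    for backend in preference:
        if backend in available_set:
            return backend
    raise LookupError(f"no backend from {list(preference)} is available")


# A codec that ships only Wasm builds falls through to "core" by default.
chosen_default = pick_backend({"core", "component"})

# A custom order, like --preference core,component,native on the CLI.
chosen_custom = pick_backend({"native", "component"}, ["core", "component", "native"])
```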

Usage

CLI

# Run a pipeline
chonkle run pipeline.json --input bytes=chunk.bin --output bytes=out.bin

# With resolver options
chonkle run pipeline.json --input bytes=chunk.bin \
  --direction decode \
  --codec-store ./codec/ \
  --preference core,component,native \
  --override zlib=zlib-rs \
  --source zlib=https://example.com/zlib.wasm

# List installed codecs
chonkle codecs

# Show details for a specific codec
chonkle codecs zlib

# Embed a signature into a .wasm binary (build-time tool)
chonkle embed-signature codec.wasm signature.json

Python API

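from pathlib import Path

from chonkle.pipeline import prepare
from chonkle.executor import run

# Read the raw chunk to feed to the pipeline's "bytes" input
chunk_bytes = Path("chunk.bin").read_bytes()

prepared = prepare("pipeline.json", direction="decode")
outputs = run(prepared, {"bytes": chunk_bytes})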

Format drivers

The executor is format-agnostic. It accepts a pipeline DAG and chunk data, runs the codecs, and returns the result. It has no knowledge of Zarr, Parquet, COG, ORC, or any other file format.

A format driver is the layer above the executor that bridges a specific file format and the pipeline executor. It reads format-specific metadata, translates it into a pipeline DAG, supplies metadata-derived inputs, and manages chunk I/O. Format drivers are outside the scope of this repository.
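The division of labour between a driver and the executor might look roughly like the sketch below. Everything in it is hypothetical: the metadata keys, the pipeline layout, and the `read_chunk` and `run_pipeline` callables are placeholders for format-specific I/O and for the executor, not part of chonkle or of any real file format.

```python
from collections.abc import Callable

# Made-up format metadata; real formats (Zarr, Parquet, ...) look different.
METADATA = {"compressor": "zlib", "dtype": "<i4", "chunk_files": ["c0.bin", "c1.bin"]}


def build_pipeline(metadata: dict) -> dict:
    """Translate format metadata into a one-step pipeline (assumed layout)."""
    return {"steps": {"decompress": {"codec": metadata["compressor"], "inputs": []}}}


def decode_chunks(
    metadata: dict,
    read_chunk: Callable[[str], bytes],
    run_pipeline: Callable[[dict, dict[str, bytes]], dict[str, bytes]],
) -> list[dict[str, bytes]]:
    """Drive the executor once per chunk, adding metadata-derived inputs."""
    pipeline = build_pipeline(metadata)
    # Inputs that come from metadata rather than from the chunk itself.
    extra_inputs = {"dtype": metadata["dtype"].encode()}
    return [
        run_pipeline(pipeline, {"bytes": read_chunk(path), **extra_inputs})
        for path in metadata["chunk_files"]
    ]
```

The executor callable never sees file paths or format metadata directly, which mirrors the format-agnostic boundary described above.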

Documentation

docs/OVERVIEW.md — Architecture, design rationale, and execution model
docs/reference/PIPELINE_SCHEMA.md — Pipeline JSON schema
docs/reference/codec-contract/ — Codec interface specs (Component Model, Core Wasm, Native)
docs/reference/CODEC_RESOLUTION.md — Codec resolution chain and backend preference

See docs/README.md for the full index.

Development

Package manager: uv
Build backend: hatchling
Python: >= 3.13
Linting/formatting: ruff
Type checking: mypy
Testing: pytest
Pre-commit: ruff check, ruff format, mypy, yaml/toml validation
CI: GitHub Actions (lint on 3.14, test on 3.13 and 3.14)

# Install dependencies
uv sync

# Include native (numcodecs) backend
uv sync --extra native

# Run tests
uv run pytest

# Run linter
uv run ruff check

# Run network tests (downloads codecs from OCI registries)
uv run pytest --run-network

Acknowledgements

Partially supported by the NASA-IMPACT VEDA project.

