Global-Bandwidth Local Spectral Representation for continuous image reconstruction.
Paper: arXiv:2606.19617
A fixed-grid local spectral image representation: the image domain is partitioned into a fixed grid of non-overlapping square patches, each patch carries a small block of coefficients for a truncated Fourier basis predicted from shared convolutional-encoder features by a single linear projection, and a single trainable scalar bandwidth is shared globally across all patches. Reconstruction at any continuous coordinate is a fixed-size basis contraction whose cost is independent of image size.
gblsr/ repo root
├── pyproject.toml project metadata + dependencies + CLI entries
├── uv.lock reproducible dependency pins (uv)
├── README.md (this file)
├── LICENSE BSD 3-Clause
├── CITATION.cff GitHub-readable citation metadata
├── src/gblsr/ Python package
│ ├── models/ Basis, encoder, local-spectral decoder
│ ├── encoders/ Heavyweight image encoders (e.g. RDN)
│ ├── asr/ Arbitrary-scale SR extension (RDN + ASR decoder)
│ ├── latency/ Fixed GPU latency protocol
│ ├── training/ Training driver + losses
│ ├── data/ Image loaders + region slicing
│ ├── metrics/ PSNR / SSIM / LPIPS / edge-LPIPS / LSE
│ └── cli/ Console-script entry points
├── scripts/ Repo-root shims for the CLI commands
├── configs/ Example YAML configs
└── tests/ Unit + smoke + end-to-end tests
Python 3.10+.
uv gives reproducible installs via the
checked-in uv.lock and is the canonical dev workflow for this repo:
uv sync # resolve + install from uv.lock (reproducible)
uv sync --extra dev # also installs pytest + ruff
uv run pytest # run any command in the env without activating it
uv run gblsr-train --config configs/example.yamlTo bump dependencies, edit pyproject.toml and run uv lock to
refresh the lock file.
If you do not have uv installed, plain pip works too (you do not get
the pinned-versions reproducibility of uv.lock, but the package will
install fine):
pip install -e . # editable install for development
pip install -e ".[dev]" # also installs pytest + ruffCUDA toolkit and a matching torch build must be installed separately.
Six console commands are installed by pip install -e . / uv sync:
| Command | Purpose |
|---|---|
gblsr-train |
Train one or more (arm, seed) combinations from a RunConfig YAML. |
gblsr-eval |
Evaluate a saved checkpoint on the val split; prints aggregate as JSON. |
gblsr-measure-latency |
Measure per-image inference latency under the fixed GPU latency protocol. |
gblsr-reconstruct |
Run a trained checkpoint on one image and write the reconstruction. |
gblsr-encode |
Encode an image to a compact feature tensor (encoder-only forward). |
gblsr-decode |
Decode a feature tensor back to an image (decoder-only forward). |
Usage:
uv run gblsr-train --config configs/example.yaml
uv run gblsr-eval --config configs/example.yaml --checkpoint <path/to/model.pt>
uv run gblsr-measure-latency --config configs/example.yaml --device cuda
uv run gblsr-reconstruct \
--config configs/example.yaml \
--checkpoint <path/to/model.pt> \
--input image.png --output recon.png
uv run gblsr-encode \
--config configs/example.yaml \
--checkpoint <path/to/model.pt> \
--input image.png --output features.pt
uv run gblsr-decode \
--config configs/example.yaml \
--checkpoint <path/to/model.pt> \
--input features.pt --output recon.pngAlso reachable as python -m gblsr.cli.<name> or scripts/<name>.py.
The most-used entry points are re-exported at the package top level:
from gblsr import LocalSpectralArm, build_model, ModelConfig
from gblsr import BasisConfig, EncoderConfig
from gblsr import measure_latency, LatencyConfigFor specialized entry points, use the subpackages:
from gblsr.models import (
LocalSpectralArm, BaselineArm,
LocalSpectralDecoder, GlobalFourierMLPDecoder,
build_model, ModelConfig, EncoderConfig, BasisConfig,
)
from gblsr.encoders import RDNEncoder, RDNConfig, build_rdn_encoder
from gblsr.latency import measure_latency, LatencyConfig, LatencyResult
from gblsr.training import RunConfig, train_one, build_model_from_run_config
from gblsr.metrics import psnr, ssim, lpips_metric, edge_lpips, local_spectrum_error
from gblsr.data import DataConfig, build_datasets, label_patchesEach __init__.py declares an __all__ listing the canonical
public symbols; the deep-import paths (e.g.
gblsr.models.arms.LocalSpectralArm) continue to work for power
users.
import torch
from gblsr import LocalSpectralArm, ModelConfig, EncoderConfig, BasisConfig
from gblsr.latency import measure_latency
# Build a tiny model
mc = ModelConfig(
arm="local_spectral",
image_size=256,
patch_size=32,
basis=BasisConfig(patch_size=32, p_max=16),
encoder=EncoderConfig(d_feat=128),
)
model = LocalSpectralArm(mc, bandwidth_mode="global_scalar").eval()
# Forward pass
x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
out = model(x)
recon = out["recon"]
print(recon.shape) # (1, 3, 256, 256)
# Inference latency (10 warmup + 50 timed reps under the fixed protocol)
result = measure_latency(model, x, track_peak_memory=False)
print(f"median latency: {result.median_ms:.2f} ms")The GB-LSR family is parameterized by bandwidth_mode in
LocalSpectralDecoder:
bandwidth_mode |
Description |
|---|---|
fixed_midpoint |
Bandwidth pinned to a single fixed value (no training). |
global_scalar |
Main variant: one global trainable scalar shared across all patches. |
local_linear |
Per-patch bandwidth from a linear-sigmoid head. |
local_logspace |
Per-patch bandwidth from a log-space sigmoid head. |
gblsr.asr decodes one RDN-encoder feature map at any output resolution via
a LIIF/LTE-style continuous-query interface. All reported variants are
config-only (no separate code path):
import torch
from gblsr import GBLSRScalarASR
model = GBLSRScalarASR() # base, 22.02M params
noLE = GBLSRScalarASR(decoder_cfg={"local_ensemble": False}) # drop 4-corner ensemble
nf96 = GBLSRScalarASR(encoder_cfg={"num_features": 96}) # wider encoder, 24.93M
hr = model.predict_full(torch.rand(1, 3, 64, 64), H_q=256, W_q=256) # (1,3,256,256)encoder_cfg takes RDNConfig fields; decoder_cfg takes
GBLSRScalarASRDecoder arguments (p_max, bandwidth_init,
local_ensemble); variants compose (e.g. nf96+noLE).
Scope: ships the method (encoder + decoder + predict_full). The training
recipe (1M steps on DIV2K) is in the paper appendix; the timing harness and
the LIIF/LTE/SwinIR baselines are not bundled.
The data loaders read external datasets from disk (paths configured per run). Canonical sources:
- DTD: https://www.robots.ox.ac.uk/~vgg/data/dtd/
- DIV2K: https://data.vision.ee.ethz.ch/cvl/DIV2K/
- Kodak: https://r0k.us/graphics/kodak/
- Set14 / Set5 / Urban100: https://github.com/jbhuang0604/SelfExSR
- BSDS300 / B100: https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/
Kodak / Set14 / Urban100 are the native-reconstruction benchmark; Set5 / B100 (and Set14 / Urban100) are the arbitrary-scale SR evaluation datasets.
Defaults are deployment-conservative (batch=1, no AMP, no
torch.compile, no CUDA Graphs); gblsr-measure-latency uses the
same defaults. For production, layer these on top of
LocalSpectralArm:
torch.compile(model = torch.compile(model)): ~2.4x at 256x256 (1.43 ms -> 0.58 ms on NVIDIA H200 SXM 141GB). One-time ~60 s compile per input shape.- Batching: pass
(B, 3, H, W); per-image cost amortizes. - CUDA Graphs: capture + replay at a fixed input shape.
- AMP (bf16/fp16): use only inside a larger AMP pipeline.
If you use GB-LSR in your work, please cite:
@article{shad2026gblsr,
title = {GB-LSR: A Fast Local Spectral Image Representation with a
Single Global Bandwidth for Continuous Reconstruction and
Super-Resolution},
author = {Shad, Max and Khoshnevis, Naeem},
journal = {arXiv preprint arXiv:2606.19617},
year = {2026},
}BSD 3-Clause. Copyright (c) 2026, President and Fellows of Harvard College. See LICENSE for the full text.