Sync your Hugging Face model cache to Backblaze B2 (or any S3-compatible store), share it across machines, and keep local disk usage under control with LRU eviction.
hf-cache-sync is a CLI tool that syncs the local Hugging Face hub cache (`~/.cache/huggingface/hub`) to S3-compatible object storage. It is built for ML teams and CI pipelines that waste bandwidth and disk re-downloading the same 5-30 GB models on every machine.
It works as a two-tier cache: local disk + a shared remote bucket. One machine downloads a model, pushes the blobs to the bucket, and every other machine hydrates from there instead of pulling from the Hub again.
Hugging Face models and datasets are cached locally. Disks fill up fast, teams re-download identical artifacts on every machine, CI runners fetch the same weights every run, and Windows symlink limitations cause blob duplication. There is no native way to back the cache with shared object storage.
hf-cache-sync uploads content-addressed blobs and revision manifests to a Backblaze B2 bucket (or any S3-compatible store). Other machines pull from the bucket to reconstruct the local cache. An LRU prune command enforces a local disk budget by evicting old revisions.
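The content-addressing step can be sketched in a few lines of Python. This is an illustration, not the tool's actual code: the `blobs/<sha256>` key shape matches the remote bucket layout described below, and the `prefix` parameter is a stand-in for the optional team namespace.

```python
import hashlib
from pathlib import Path


def blob_key(path: Path, prefix: str = "") -> str:
    """Hash a local blob and derive its remote object key.

    Content addressing means two machines pushing the same file
    produce the same key, so blobs already in the bucket can be
    skipped on push.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so multi-GB weights don't load into RAM.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return f"{prefix}blobs/{h.hexdigest()}"
```

Because the key depends only on file content, a renamed or re-downloaded copy of the same weights deduplicates to a single remote object.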
- ML teams (5-50 engineers) sharing models across workstations
- CI/CD pipelines that repeatedly download the same model weights
- GPU cluster operators preloading models across nodes
- Anyone hitting disk limits from accumulated HF cache
- Push — Upload local cache blobs, manifests, and refs to remote storage. Content-addressed by SHA-256; existing blobs are skipped.
- Pull — Hydrate local cache from remote. Downloads blobs, reconstructs snapshot directories with symlinks. Hash-verified on download.
- Pull-all — Pull every model available in the remote bucket.
- Prune — LRU eviction to stay within a local disk budget. Evicts by revision, preserves active refs, cleans orphaned blobs.
- Team namespacing — Per-team prefix isolation within a single bucket. Read-only keys for devs, read-write for CI.
- Windows fallback — Symlink-first, with automatic hardlink/copy fallback when symlinks are unavailable.
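The prune policy above can be sketched as a simple LRU over revisions. This is a minimal model, not the tool's implementation; the names (`Revision`, `plan_eviction`) are hypothetical, and the real command also cleans orphaned blobs after evicting revisions.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    repo: str
    commit: str
    size_bytes: int
    last_access: float  # e.g. most recent access time of the snapshot's files
    has_ref: bool       # pointed to by an active ref (e.g. "main")?


def plan_eviction(revisions: list[Revision], max_bytes: int) -> list[Revision]:
    """Return revisions to evict, oldest-first, until under budget.

    Only detached revisions (no active ref) are candidates; ref'd
    revisions are preserved even if the cache stays over budget.
    """
    total = sum(r.size_bytes for r in revisions)
    victims = []
    for rev in sorted((r for r in revisions if not r.has_ref),
                      key=lambda r: r.last_access):
        if total <= max_bytes:
            break
        victims.append(rev)
        total -= rev.size_bytes
    return victims
```

Preserving ref'd revisions means `prune` never breaks a model that a ref still points at, which is why "prune removes nothing" when every cached revision is referenced.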
```
[Local HF Cache] --push--> [Backblaze B2 Bucket]
                 <--pull--
```
| Component | Description | Path |
|---|---|---|
| CLI | Click-based command interface | src/hf_cache_sync/cli.py |
| Cache scanner | Reads local HF hub cache structure (blobs, snapshots, refs) | src/hf_cache_sync/cache.py |
| Storage backend | S3-compatible client via boto3; sets a b2ai-hfcache user-agent for B2 endpoints | src/hf_cache_sync/storage.py |
| Manifest | JSON manifest per repo/revision mapping files to blob hashes | src/hf_cache_sync/manifest.py |
| Push | Uploads missing blobs + manifests + refs | src/hf_cache_sync/push.py |
| Pull | Downloads blobs, reconstructs snapshots | src/hf_cache_sync/pull.py |
| Prune | LRU eviction + orphan blob cleanup | src/hf_cache_sync/prune.py |
Remote bucket layout:

```
blobs/<sha256>                    # content-addressed model files
manifests/<repo>@<revision>.json  # file list per revision
refs/<repo>/<ref>                 # ref -> commit hash
```
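A manifest maps snapshot-relative file paths to blob hashes so pull can reconstruct a snapshot directory. The exact schema is internal to the tool; the field names below are illustrative only, and the hash values are placeholders, not real digests.

```python
import json

# Hypothetical manifest shape: repo + revision, plus a mapping from
# snapshot-relative path to the sha256 of the blob backing that file.
manifest = {
    "repo": "mistralai/Mistral-7B-v0.1",
    "revision": "<commit-sha>",
    "files": {
        "config.json": "<sha256-of-config>",
        "model.safetensors": "<sha256-of-weights>",
    },
}

# Remote object key for this manifest, following the layout above.
key = f"manifests/{manifest['repo']}@{manifest['revision']}.json"
payload = json.dumps(manifest, indent=2)
```

On pull, each entry is resolved to `blobs/<sha256>`, downloaded if missing, and linked into the snapshot directory under its original file name.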
Prerequisites:
- Python >= 3.9
- A Backblaze B2 bucket (or any S3-compatible store)
- B2 application key ID and key (or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars)
```bash
pip install hf-cache-sync
hf-cache-sync init                             # creates .hf-cache-sync.yaml
# edit .hf-cache-sync.yaml with your bucket details
hf-cache-sync push                             # upload local cache to remote
hf-cache-sync pull mistralai/Mistral-7B-v0.1   # hydrate on another machine
hf-cache-sync prune --max-gb 50                # enforce disk budget
```

Install with pip:

```bash
pip install hf-cache-sync
```

Then set up Backblaze B2:

- Create a bucket (private recommended)
- Create an application key scoped to that bucket
- Note the key ID, the key, and the bucket's S3 endpoint (e.g. https://s3.us-west-000.backblazeb2.com)
```bash
hf-cache-sync init
```

Edit `.hf-cache-sync.yaml`:
```yaml
storage:
  endpoint: https://s3.us-west-000.backblazeb2.com
  bucket: team-hf-cache
  region: us-west-000
  access_key: ""   # B2 application key ID
  secret_key: ""   # B2 application key
cache:
  max_local_gb: 50
  sync_xet: false
team:
  prefix: ""
allow_gated: false
```

Alternatively, set credentials via environment variables:
```bash
export AWS_ACCESS_KEY_ID=<your-b2-key-id>
export AWS_SECRET_ACCESS_KEY=<your-b2-key>
```

Push the local cache:

```bash
hf-cache-sync push
```

Pull on another machine:

```bash
hf-cache-sync pull mistralai/Mistral-7B-v0.1
hf-cache-sync pull-all   # pull everything available
```

Enforce the local disk budget:

```bash
hf-cache-sync prune --max-gb 50
```

| Variable | Description | Default | Required |
|---|---|---|---|
| `storage.endpoint` | S3-compatible endpoint URL | — | Yes |
| `storage.bucket` | Bucket name | — | Yes |
| `storage.region` | Bucket region | — | Yes |
| `storage.access_key` | Access key (or `AWS_ACCESS_KEY_ID` env) | — | Yes |
| `storage.secret_key` | Secret key (or `AWS_SECRET_ACCESS_KEY` env) | — | Yes |
| `cache.max_local_gb` | Local disk budget in GB for prune | 50 | No |
| `cache.sync_xet` | Sync xet-format blobs | false | No |
| `team.prefix` | Namespace prefix within the bucket | "" | No |
| `team.allow_gated` | Allow pushing gated model blobs | false | No |
Config file search order: `./.hf-cache-sync.yaml`, then `~/.hf-cache-sync.yaml`.
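The search-order rule above can be sketched as a small helper. This is an illustration under assumptions (the function name is hypothetical, and the real tool presumably also parses the YAML rather than just locating it):

```python
from pathlib import Path
from typing import Optional


def find_config(search_dirs=None) -> Optional[Path]:
    """Return the first .hf-cache-sync.yaml found: cwd first, then home."""
    dirs = search_dirs if search_dirs is not None else [Path.cwd(), Path.home()]
    for d in dirs:
        candidate = Path(d) / ".hf-cache-sync.yaml"
        if candidate.is_file():
            return candidate
    return None
```

A project-local config therefore shadows the per-user one, which lets each repository pin its own bucket and team prefix.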
```bash
hf-cache-sync init                      # Generate config template
hf-cache-sync push                      # Upload cache to remote
hf-cache-sync pull <repo_id>            # Hydrate a specific model
hf-cache-sync pull <repo_id> -r <hash>  # Hydrate a specific revision
hf-cache-sync pull-all                  # Hydrate all remote models
hf-cache-sync prune --max-gb 50         # Evict old revisions
hf-cache-sync status                    # Show cache size and config
hf-cache-sync list                      # List cached repos
```

For development, install in editable mode and run the tests:

```bash
pip install -e ".[dev]"
pytest
```

| Symptom | Fix |
|---|---|
| `hf-cache-sync push` uploads nothing | Verify the HF cache exists at `~/.cache/huggingface/hub` and contains model directories |
| Authentication errors on push/pull | Check `access_key` / `secret_key` in the config, or the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` env vars |
| Hash mismatch on pull | Blob was corrupted in transit or storage. Delete the remote blob and re-push from a known-good machine. |
| Prune removes nothing | All revisions are pointed to by active refs. Detached revisions are evicted first. |
| Windows symlink errors | hf-cache-sync falls back to hardlinks, then file copies automatically. No action needed. |
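The symlink-first placement strategy with its Windows fallbacks can be sketched as follows. This is an illustrative helper, not the tool's API: `place_file` is a hypothetical name, and the return value exists only so the example can report which strategy succeeded.

```python
import os
import shutil
from pathlib import Path


def place_file(blob: Path, dest: Path) -> str:
    """Link a blob into a snapshot dir: symlink, else hardlink, else copy.

    Symlinks may be unavailable on Windows without Developer Mode or
    admin rights, in which case os.symlink raises OSError and we fall
    through to the next strategy.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.symlink(blob, dest)
        return "symlink"
    except OSError:
        pass
    try:
        os.link(blob, dest)  # hardlink: same inode, no extra disk usage
        return "hardlink"
    except OSError:
        shutil.copy2(blob, dest)  # last resort: duplicates the bytes
        return "copy"
```

Only the final copy fallback duplicates data on disk, which is why symlink-capable machines keep exactly one copy of each blob no matter how many snapshots reference it.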
- Only public models are synced by default; gated models require `allow_gated: true`.
- All blobs are hash-verified (SHA-256) on pull.
- Use B2 application keys scoped to a single bucket. Create separate read-only keys for distribution.
- Never commit `.hf-cache-sync.yaml` with credentials. Use environment variables in CI.
MIT — See LICENSE for details.