
Conversation

@littleKitchen
Contributor

Summary

Fixes #616

Previously, HuggingFace models were always downloaded from the Hub even if the same model already existed in the local store. This caused unnecessary bandwidth usage and slower pull times.

Changes

  • Added cache check in PullModel() before calling pullNativeHuggingFace()
  • If model exists in local store, returns immediately with "Using cached model" message
  • Added hf.co to huggingface.co URL normalization in normalizeModelName() for consistent cache lookups (see the sketch after this list)
  • Added unit tests for both full URL and short URL cache scenarios
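
A minimal sketch of the normalization change, assuming normalizeModelName() operates on plain string references (the real function may take different parameters; the prefix constants here are illustrative):

package distribution

import "strings"

// normalizeModelName rewrites hf.co short references to the canonical
// huggingface.co form so both spellings resolve to the same cache key.
func normalizeModelName(ref string) string {
	const short = "hf.co/"
	const canonical = "huggingface.co/"
	if strings.HasPrefix(ref, short) {
		return canonical + strings.TrimPrefix(ref, short)
	}
	return ref
}

With this in place, pulls of hf.co/testorg/testmodel:latest and huggingface.co/testorg/testmodel:latest resolve to the same store entry.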

Testing

go test ./pkg/distribution/distribution/... -v -run "TestPullHuggingFace"

Both new test cases pass:

  • TestPullHuggingFaceModelFromCache - tests cache hit with full huggingface.co URL
  • TestPullHuggingFaceModelFromCacheWithShortURL - tests cache hit when pulling with hf.co URL

Result

After this change (see the usage sketch after this list):

  • First pull: downloads from HuggingFace Hub as normal
  • Subsequent pulls: uses cached model, no network request needed
  • Works with both huggingface.co/... and hf.co/... URL formats
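
For illustration, the intended flow in terms of the client API exercised by this PR's tests (a fragment; client, ctx, and the model reference are placeholders borrowed from those tests):

// First pull: no cache entry exists yet, so the model is downloaded
// from the HuggingFace Hub and written to the local store.
var buf bytes.Buffer
if err := client.PullModel(ctx, "hf.co/testorg/testmodel:latest", &buf); err != nil {
	return err
}

// Second pull: the reference normalizes to huggingface.co/..., which is
// found in the local store, so no network request is made and the
// progress output contains "Using cached model".
buf.Reset()
if err := client.PullModel(ctx, "huggingface.co/testorg/testmodel:latest", &buf); err != nil {
	return err
}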

Fixes docker#616

Previously, HuggingFace models were always downloaded from the Hub even
if the same model already existed in the local store. This caused
unnecessary bandwidth usage and slower pull times.

This change adds a cache check before calling pullNativeHuggingFace(),
similar to how OCI models are handled. If the model is found in the
local store, it returns immediately without downloading.

Also includes hf.co to huggingface.co URL normalization to ensure
consistent cache lookups regardless of which URL format is used.
@gemini-code-assist
Contributor

Summary of Changes

Hello @littleKitchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly optimizes the process of pulling HuggingFace models by introducing a local caching mechanism. It ensures that models are only downloaded once, reducing bandwidth consumption and accelerating subsequent pull operations. Additionally, it standardizes model identification by normalizing different HuggingFace URL formats, leading to more reliable cache utilization.

Highlights

  • HuggingFace Model Caching: Implemented a local cache check for HuggingFace models in PullModel(), preventing unnecessary re-downloads if the model already exists in the local store.
  • URL Normalization: Added logic to normalizeModelName() to convert hf.co short URLs to their canonical huggingface.co form, ensuring consistent cache lookups regardless of the URL format used.
  • Improved User Feedback: When a HuggingFace model is retrieved from the local cache, a 'Using cached model' message is now displayed to the user.
  • New Unit Tests: Introduced two new unit tests (TestPullHuggingFaceModelFromCache and TestPullHuggingFaceModelFromCacheWithShortURL) to validate the caching mechanism for both full and short HuggingFace URLs.


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • The hf.co normalization only handles bare hf.co/ prefixes; consider also normalizing common variants like https://hf.co/… or https://huggingface.co/… so cache hits are more robust to different user inputs.
  • The HuggingFace cache lookup logic inside PullModel is getting a bit verbose; consider extracting it into a small helper (e.g., tryHuggingFaceCacheHit) to keep PullModel focused on flow control and make future behavior changes easier (one possible shape is sketched after this list).
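
One possible shape for that helper, reusing the store and progress calls quoted later in this review (the name tryHuggingFaceCacheHit, the (bool, error) signature, and the *Client receiver are suggestions, not existing code):

// tryHuggingFaceCacheHit reports whether the reference could be served
// from the local store. (false, nil) means a cache miss; any store error
// other than ErrModelNotFound is propagated to the caller, which also
// addresses the error-handling point raised in the review below.
func (c *Client) tryHuggingFaceCacheHit(reference string, progressWriter io.Writer) (bool, error) {
	localModel, err := c.store.Read(reference)
	if errors.Is(err, ErrModelNotFound) {
		return false, nil // cache miss: fall through to the network pull
	}
	if err != nil {
		return false, fmt.Errorf("checking for cached HuggingFace model: %w", err)
	}
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return false, fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return true, nil
}

PullModel would then reduce to: if hit, err := c.tryHuggingFaceCacheHit(reference, progressWriter); err != nil || hit { return err }.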

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable optimization by checking for locally cached HuggingFace models before downloading them, which will improve performance and reduce bandwidth usage. The normalization of hf.co URLs is also a good addition for consistency. I've identified one critical issue with error handling in the cache check that could mask underlying problems, and a medium-severity suggestion to refactor the new tests for better maintainability. Overall, great work on this feature.

Comment on lines 254 to 264
if localModel, err := c.store.Read(reference); err == nil {
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return nil
}

critical

The current error handling for the cache check is incorrect. By using if localModel, err := c.store.Read(reference); err == nil, any error returned from c.store.Read is silently ignored, and the code proceeds to download the model from the remote source. This can hide underlying problems with the local store (e.g., file permissions, corrupted index). The error should be checked explicitly: only an ErrModelNotFound error should be treated as a cache miss, while any other error should be propagated to the caller.

localModel, err := c.store.Read(reference)
if err == nil {
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return nil
}
if !errors.Is(err, ErrModelNotFound) {
	return fmt.Errorf("checking for cached HuggingFace model: %w", err)
}

Comment on lines 1144 to 1212
func TestPullHuggingFaceModelFromCache(t *testing.T) {
	tempDir := t.TempDir()

	// Create client
	client, err := newTestClient(tempDir)
	if err != nil {
		t.Fatalf("Failed to create client: %v", err)
	}

	// Create a test model and write it to the store with a HuggingFace tag
	model, err := gguf.NewModel(testGGUFFile)
	if err != nil {
		t.Fatalf("Failed to create model: %v", err)
	}

	// Use a normalized HuggingFace tag (as it would be stored)
	hfTag := "huggingface.co/testorg/testmodel:latest"
	if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
		t.Fatalf("Failed to write model to store: %v", err)
	}

	// Now try to pull the same model - it should use the cache
	var progressBuffer bytes.Buffer
	err = client.PullModel(t.Context(), "huggingface.co/testorg/testmodel:latest", &progressBuffer)
	if err != nil {
		t.Fatalf("Failed to pull model from cache: %v", err)
	}

	// Verify that progress shows it was cached
	progressOutput := progressBuffer.String()
	if !strings.Contains(progressOutput, "Using cached model") {
		t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
	}
}

func TestPullHuggingFaceModelFromCacheWithShortURL(t *testing.T) {
	tempDir := t.TempDir()

	// Create client
	client, err := newTestClient(tempDir)
	if err != nil {
		t.Fatalf("Failed to create client: %v", err)
	}

	// Create a test model and write it to the store with a normalized HuggingFace tag
	model, err := gguf.NewModel(testGGUFFile)
	if err != nil {
		t.Fatalf("Failed to create model: %v", err)
	}

	// Store with normalized tag (huggingface.co)
	hfTag := "huggingface.co/testorg/testmodel:latest"
	if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
		t.Fatalf("Failed to write model to store: %v", err)
	}

	// Now try to pull using short URL (hf.co) - should normalize and find in cache
	var progressBuffer bytes.Buffer
	err = client.PullModel(t.Context(), "hf.co/testorg/testmodel:latest", &progressBuffer)
	if err != nil {
		t.Fatalf("Failed to pull model from cache: %v", err)
	}

	// Verify that progress shows it was cached
	progressOutput := progressBuffer.String()
	if !strings.Contains(progressOutput, "Using cached model") {
		t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
	}
}

medium

These two test functions, TestPullHuggingFaceModelFromCache and TestPullHuggingFaceModelFromCacheWithShortURL, contain a lot of duplicated code. They can be consolidated into a single table-driven test to improve maintainability and reduce redundancy. This makes it easier to add more test cases for different URL formats in the future.

func TestPullHuggingFaceModelFromCache(t *testing.T) {
	testCases := []struct {
		name    string
		pullRef string
	}{
		{
			name:    "full URL",
			pullRef: "huggingface.co/testorg/testmodel:latest",
		},
		{
			name:    "short URL",
			pullRef: "hf.co/testorg/testmodel:latest",
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			tempDir := t.TempDir()

			// Create client
			client, err := newTestClient(tempDir)
			if err != nil {
				t.Fatalf("Failed to create client: %v", err)
			}

			// Create a test model and write it to the store with a normalized HuggingFace tag
			model, err := gguf.NewModel(testGGUFFile)
			if err != nil {
				t.Fatalf("Failed to create model: %v", err)
			}

			// Store with normalized tag (huggingface.co)
			hfTag := "huggingface.co/testorg/testmodel:latest"
			if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
				t.Fatalf("Failed to write model to store: %v", err)
			}

			// Now try to pull using the test case's reference - it should use the cache
			var progressBuffer bytes.Buffer
			err = client.PullModel(t.Context(), tc.pullRef, &progressBuffer)
			if err != nil {
				t.Fatalf("Failed to pull model from cache: %v", err)
			}

			// Verify that progress shows it was cached
			progressOutput := progressBuffer.String()
			if !strings.Contains(progressOutput, "Using cached model") {
				t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
			}
		})
	}
}

- Fix error handling: only treat ErrModelNotFound as cache miss, propagate other errors
- Consolidate test functions into table-driven test for better maintainability
Contributor

@ilopezluna ilopezluna left a comment


Nice! @littleKitchen could you please address the Gemini comments?


Development

Successfully merging this pull request may close these issues.

Missing cache check for native HuggingFace pulls causes duplicate downloads
