
Conversation

@littleKitchen
Contributor

Summary

Fixes #616

Previously, HuggingFace models were always downloaded from the Hub even if the same model already existed in the local store. This caused unnecessary bandwidth usage and slower pull times.

Changes

  • Added cache check in PullModel() before calling pullNativeHuggingFace()
  • If model exists in local store, returns immediately with "Using cached model" message
  • Added hf.co to huggingface.co URL normalization in normalizeModelName() for consistent cache lookups (see the sketch after this list)
  • Added unit tests for both full URL and short URL cache scenarios
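
A minimal sketch of the normalization change, assuming normalizeModelName() operates on plain string references (the real function may take different parameters; the prefix constants here are illustrative):

package distribution

import "strings"

// normalizeModelName rewrites hf.co short references to the canonical
// huggingface.co form so both spellings resolve to the same cache key.
func normalizeModelName(ref string) string {
	const short = "hf.co/"
	const canonical = "huggingface.co/"
	if strings.HasPrefix(ref, short) {
		return canonical + strings.TrimPrefix(ref, short)
	}
	return ref
}

With this in place, pulls of hf.co/testorg/testmodel:latest and huggingface.co/testorg/testmodel:latest resolve to the same store entry.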

Testing

go test ./pkg/distribution/distribution/... -v -run "TestPullHuggingFace"

Both new test cases pass:

  • TestPullHuggingFaceModelFromCache - tests cache hit with full huggingface.co URL
  • TestPullHuggingFaceModelFromCacheWithShortURL - tests cache hit when pulling with hf.co URL

Result

After this change (see the usage sketch after this list):

  • First pull: downloads from HuggingFace Hub as normal
  • Subsequent pulls: uses cached model, no network request needed
  • Works with both huggingface.co/... and hf.co/... URL formats
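
For illustration, the intended flow in terms of the client API exercised by this PR's tests (a fragment; client, ctx, and the model reference are placeholders borrowed from those tests):

// First pull: no cache entry exists yet, so the model is downloaded
// from the HuggingFace Hub and written to the local store.
var buf bytes.Buffer
if err := client.PullModel(ctx, "hf.co/testorg/testmodel:latest", &buf); err != nil {
	return err
}

// Second pull: the reference normalizes to huggingface.co/..., which is
// found in the local store, so no network request is made and the
// progress output contains "Using cached model".
buf.Reset()
if err := client.PullModel(ctx, "huggingface.co/testorg/testmodel:latest", &buf); err != nil {
	return err
}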

Fixes docker#616

Previously, HuggingFace models were always downloaded from the Hub even
if the same model already existed in the local store. This caused
unnecessary bandwidth usage and slower pull times.

This change adds a cache check before calling pullNativeHuggingFace(),
similar to how OCI models are handled. If the model is found in the
local store, it returns immediately without downloading.

Also includes hf.co to huggingface.co URL normalization to ensure
consistent cache lookups regardless of which URL format is used.
@gemini-code-assist
Contributor

Summary of Changes

Hello @littleKitchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly optimizes the process of pulling HuggingFace models by introducing a local caching mechanism. It ensures that models are only downloaded once, reducing bandwidth consumption and accelerating subsequent pull operations. Additionally, it standardizes model identification by normalizing different HuggingFace URL formats, leading to more reliable cache utilization.

Highlights

  • HuggingFace Model Caching: Implemented a local cache check for HuggingFace models in PullModel(), preventing unnecessary re-downloads if the model already exists in the local store.
  • URL Normalization: Added logic to normalizeModelName() to convert hf.co short URLs to their canonical huggingface.co form, ensuring consistent cache lookups regardless of the URL format used.
  • Improved User Feedback: When a HuggingFace model is retrieved from the local cache, a 'Using cached model' message is now displayed to the user.
  • New Unit Tests: Introduced two new unit tests (TestPullHuggingFaceModelFromCache and TestPullHuggingFaceModelFromCacheWithShortURL) to validate the caching mechanism for both full and short HuggingFace URLs.


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • The hf.co normalization only handles bare hf.co/ prefixes; consider also normalizing common variants like https://hf.co/… or https://huggingface.co/… so cache hits are more robust to different user inputs.
  • The HuggingFace cache lookup logic inside PullModel is getting a bit verbose; consider extracting it into a small helper (e.g., tryHuggingFaceCacheHit) to keep PullModel focused on flow control and make future behavior changes easier (one possible shape is sketched after this list).
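
One possible shape for that helper, reusing the store and progress calls quoted later in this review (the name tryHuggingFaceCacheHit, the (bool, error) signature, and the *Client receiver are suggestions, not existing code):

// tryHuggingFaceCacheHit reports whether the reference could be served
// from the local store. (false, nil) means a cache miss; any store error
// other than ErrModelNotFound is propagated to the caller, which also
// addresses the error-handling point raised in the review below.
func (c *Client) tryHuggingFaceCacheHit(reference string, progressWriter io.Writer) (bool, error) {
	localModel, err := c.store.Read(reference)
	if errors.Is(err, ErrModelNotFound) {
		return false, nil // cache miss: fall through to the network pull
	}
	if err != nil {
		return false, fmt.Errorf("checking for cached HuggingFace model: %w", err)
	}
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return false, fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return true, nil
}

PullModel would then reduce to: if hit, err := c.tryHuggingFaceCacheHit(reference, progressWriter); err != nil || hit { return err }.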

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable optimization by checking for locally cached HuggingFace models before downloading them, which will improve performance and reduce bandwidth usage. The normalization of hf.co URLs is also a good addition for consistency. I've identified one critical issue with error handling in the cache check that could mask underlying problems, and a medium-severity suggestion to refactor the new tests for better maintainability. Overall, great work on this feature.

Comment on lines 254 to 264
if localModel, err := c.store.Read(reference); err == nil {
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return nil
}

critical

The current error handling for the cache check is incorrect. By using if localModel, err := c.store.Read(reference); err == nil, any error returned from c.store.Read is silently ignored, and the code proceeds to download the model from the remote source. This can hide underlying problems with the local store (e.g., file permissions, corrupted index). The error should be checked explicitly: only an ErrModelNotFound error should be treated as a cache miss, while any other error should be propagated to the caller.

localModel, err := c.store.Read(reference)
if err == nil {
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return nil
}
if !errors.Is(err, ErrModelNotFound) {
	return fmt.Errorf("checking for cached HuggingFace model: %w", err)
}

Comment on lines 1144 to 1212
func TestPullHuggingFaceModelFromCache(t *testing.T) {
	tempDir := t.TempDir()

	// Create client
	client, err := newTestClient(tempDir)
	if err != nil {
		t.Fatalf("Failed to create client: %v", err)
	}

	// Create a test model and write it to the store with a HuggingFace tag
	model, err := gguf.NewModel(testGGUFFile)
	if err != nil {
		t.Fatalf("Failed to create model: %v", err)
	}

	// Use a normalized HuggingFace tag (as it would be stored)
	hfTag := "huggingface.co/testorg/testmodel:latest"
	if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
		t.Fatalf("Failed to write model to store: %v", err)
	}

	// Now try to pull the same model - it should use the cache
	var progressBuffer bytes.Buffer
	err = client.PullModel(t.Context(), "huggingface.co/testorg/testmodel:latest", &progressBuffer)
	if err != nil {
		t.Fatalf("Failed to pull model from cache: %v", err)
	}

	// Verify that progress shows it was cached
	progressOutput := progressBuffer.String()
	if !strings.Contains(progressOutput, "Using cached model") {
		t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
	}
}

func TestPullHuggingFaceModelFromCacheWithShortURL(t *testing.T) {
	tempDir := t.TempDir()

	// Create client
	client, err := newTestClient(tempDir)
	if err != nil {
		t.Fatalf("Failed to create client: %v", err)
	}

	// Create a test model and write it to the store with a normalized HuggingFace tag
	model, err := gguf.NewModel(testGGUFFile)
	if err != nil {
		t.Fatalf("Failed to create model: %v", err)
	}

	// Store with normalized tag (huggingface.co)
	hfTag := "huggingface.co/testorg/testmodel:latest"
	if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
		t.Fatalf("Failed to write model to store: %v", err)
	}

	// Now try to pull using short URL (hf.co) - should normalize and find in cache
	var progressBuffer bytes.Buffer
	err = client.PullModel(t.Context(), "hf.co/testorg/testmodel:latest", &progressBuffer)
	if err != nil {
		t.Fatalf("Failed to pull model from cache: %v", err)
	}

	// Verify that progress shows it was cached
	progressOutput := progressBuffer.String()
	if !strings.Contains(progressOutput, "Using cached model") {
		t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
	}
}

medium

These two test functions, TestPullHuggingFaceModelFromCache and TestPullHuggingFaceModelFromCacheWithShortURL, contain a lot of duplicated code. They can be consolidated into a single table-driven test to improve maintainability and reduce redundancy. This makes it easier to add more test cases for different URL formats in the future.

func TestPullHuggingFaceModelFromCache(t *testing.T) {
	testCases := []struct {
		name    string
		pullRef string
	}{
		{
			name:    "full URL",
			pullRef: "huggingface.co/testorg/testmodel:latest",
		},
		{
			name:    "short URL",
			pullRef: "hf.co/testorg/testmodel:latest",
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			tempDir := t.TempDir()

			// Create client
			client, err := newTestClient(tempDir)
			if err != nil {
				t.Fatalf("Failed to create client: %v", err)
			}

			// Create a test model and write it to the store with a normalized HuggingFace tag
			model, err := gguf.NewModel(testGGUFFile)
			if err != nil {
				t.Fatalf("Failed to create model: %v", err)
			}

			// Store with normalized tag (huggingface.co)
			hfTag := "huggingface.co/testorg/testmodel:latest"
			if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
				t.Fatalf("Failed to write model to store: %v", err)
			}

			// Now try to pull using the test case's reference - it should use the cache
			var progressBuffer bytes.Buffer
			err = client.PullModel(t.Context(), tc.pullRef, &progressBuffer)
			if err != nil {
				t.Fatalf("Failed to pull model from cache: %v", err)
			}

			// Verify that progress shows it was cached
			progressOutput := progressBuffer.String()
			if !strings.Contains(progressOutput, "Using cached model") {
				t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
			}
		})
	}
}

- Fix error handling: only treat ErrModelNotFound as cache miss, propagate other errors
- Consolidate test functions into table-driven test for better maintainability
Contributor

@ilopezluna ilopezluna left a comment


Nice! @littleKitchen could you please address the Gemini comments?


Development

Successfully merging this pull request may close these issues.

Missing cache check for native HuggingFace pulls causes duplicate downloads
