fix: check local store before pulling HuggingFace models #617
Conversation
Fixes docker#616

Previously, HuggingFace models were always downloaded from the Hub even if the same model already existed in the local store. This caused unnecessary bandwidth usage and slower pull times.

This change adds a cache check before calling `pullNativeHuggingFace()`, similar to how OCI models are handled. If the model is found in the local store, it returns immediately without downloading.

Also includes `hf.co` to `huggingface.co` URL normalization to ensure consistent cache lookups regardless of which URL format is used.
Summary of Changes

Hello @littleKitchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly optimizes the process of pulling HuggingFace models by introducing a local caching mechanism. It ensures that models are only downloaded once, reducing bandwidth consumption and accelerating subsequent pull operations. Additionally, it standardizes model identification by normalizing different HuggingFace URL formats, leading to more reliable cache utilization.
Hey - I've left some high level feedback:

- The `hf.co` normalization only handles bare `hf.co/` prefixes; consider also normalizing common variants like `https://hf.co/…` or `https://huggingface.co/…` so cache hits are more robust to different user inputs.
- The HuggingFace cache lookup logic inside `PullModel` is getting a bit verbose; consider extracting it into a small helper (e.g., `tryHuggingFaceCacheHit`) to keep `PullModel` focused on flow control and make future behavior changes easier (a rough sketch follows below).
Code Review
This pull request introduces a valuable optimization by checking for locally cached HuggingFace models before downloading them, which will improve performance and reduce bandwidth usage. The normalization of `hf.co` URLs is also a good addition for consistency. I've identified one critical issue with error handling in the cache check that could mask underlying problems, and a medium-severity suggestion to refactor the new tests for better maintainability. Overall, great work on this feature.
```go
if localModel, err := c.store.Read(reference); err == nil {
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return nil
}
```
The current error handling for the cache check is incorrect. By using `if localModel, err := c.store.Read(reference); err == nil`, any error returned from `c.store.Read` is silently ignored, and the code proceeds to download the model from the remote source. This can hide underlying problems with the local store (e.g., file permissions, corrupted index). The error should be explicitly checked: only an `ErrModelNotFound` error should be treated as a cache miss, while any other error should be propagated to the caller.
```go
localModel, err := c.store.Read(reference)
if err == nil {
	c.log.Infoln("HuggingFace model found in local store:", utils.SanitizeForLog(reference))
	cfg, err := localModel.Config()
	if err != nil {
		return fmt.Errorf("getting cached model config: %w", err)
	}
	if err := progress.WriteSuccess(progressWriter, fmt.Sprintf("Using cached model: %s", cfg.GetSize()), oci.ModePull); err != nil {
		c.log.Warnf("Writing progress: %v", err)
	}
	return nil
}
if !errors.Is(err, ErrModelNotFound) {
	return fmt.Errorf("checking for cached HuggingFace model: %w", err)
}
```

The new tests added in this PR:

```go
func TestPullHuggingFaceModelFromCache(t *testing.T) {
	tempDir := t.TempDir()

	// Create client
	client, err := newTestClient(tempDir)
	if err != nil {
		t.Fatalf("Failed to create client: %v", err)
	}

	// Create a test model and write it to the store with a HuggingFace tag
	model, err := gguf.NewModel(testGGUFFile)
	if err != nil {
		t.Fatalf("Failed to create model: %v", err)
	}

	// Use a normalized HuggingFace tag (as it would be stored)
	hfTag := "huggingface.co/testorg/testmodel:latest"
	if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
		t.Fatalf("Failed to write model to store: %v", err)
	}

	// Now try to pull the same model - it should use the cache
	var progressBuffer bytes.Buffer
	err = client.PullModel(t.Context(), "huggingface.co/testorg/testmodel:latest", &progressBuffer)
	if err != nil {
		t.Fatalf("Failed to pull model from cache: %v", err)
	}

	// Verify that progress shows it was cached
	progressOutput := progressBuffer.String()
	if !strings.Contains(progressOutput, "Using cached model") {
		t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
	}
}

func TestPullHuggingFaceModelFromCacheWithShortURL(t *testing.T) {
	tempDir := t.TempDir()

	// Create client
	client, err := newTestClient(tempDir)
	if err != nil {
		t.Fatalf("Failed to create client: %v", err)
	}

	// Create a test model and write it to the store with a normalized HuggingFace tag
	model, err := gguf.NewModel(testGGUFFile)
	if err != nil {
		t.Fatalf("Failed to create model: %v", err)
	}

	// Store with normalized tag (huggingface.co)
	hfTag := "huggingface.co/testorg/testmodel:latest"
	if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
		t.Fatalf("Failed to write model to store: %v", err)
	}

	// Now try to pull using short URL (hf.co) - should normalize and find in cache
	var progressBuffer bytes.Buffer
	err = client.PullModel(t.Context(), "hf.co/testorg/testmodel:latest", &progressBuffer)
	if err != nil {
		t.Fatalf("Failed to pull model from cache: %v", err)
	}

	// Verify that progress shows it was cached
	progressOutput := progressBuffer.String()
	if !strings.Contains(progressOutput, "Using cached model") {
		t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
	}
}
```
These two test functions, `TestPullHuggingFaceModelFromCache` and `TestPullHuggingFaceModelFromCacheWithShortURL`, contain a lot of duplicated code. They can be consolidated into a single table-driven test to improve maintainability and reduce redundancy. This makes it easier to add more test cases for different URL formats in the future.
```go
func TestPullHuggingFaceModelFromCache(t *testing.T) {
	testCases := []struct {
		name    string
		pullRef string
	}{
		{
			name:    "full URL",
			pullRef: "huggingface.co/testorg/testmodel:latest",
		},
		{
			name:    "short URL",
			pullRef: "hf.co/testorg/testmodel:latest",
		},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			tempDir := t.TempDir()

			// Create client
			client, err := newTestClient(tempDir)
			if err != nil {
				t.Fatalf("Failed to create client: %v", err)
			}

			// Create a test model and write it to the store with a normalized HuggingFace tag
			model, err := gguf.NewModel(testGGUFFile)
			if err != nil {
				t.Fatalf("Failed to create model: %v", err)
			}

			// Store with normalized tag (huggingface.co)
			hfTag := "huggingface.co/testorg/testmodel:latest"
			if err := client.store.Write(model, []string{hfTag}, nil); err != nil {
				t.Fatalf("Failed to write model to store: %v", err)
			}

			// Now try to pull using the test case's reference - it should use the cache
			var progressBuffer bytes.Buffer
			err = client.PullModel(t.Context(), tc.pullRef, &progressBuffer)
			if err != nil {
				t.Fatalf("Failed to pull model from cache: %v", err)
			}

			// Verify that progress shows it was cached
			progressOutput := progressBuffer.String()
			if !strings.Contains(progressOutput, "Using cached model") {
				t.Errorf("Expected progress to indicate cached model, got: %s", progressOutput)
			}
		})
	}
}
```

Follow-up commits:

- Fix error handling: only treat `ErrModelNotFound` as cache miss, propagate other errors
- Consolidate test functions into table-driven test for better maintainability
ilopezluna left a comment
Nice! @littleKitchen, could you please address the Gemini comments?
Summary
Fixes #616
Previously, HuggingFace models were always downloaded from the Hub even if the same model already existed in the local store. This caused unnecessary bandwidth usage and slower pull times.
Changes
- Added a cache check in `PullModel()` before calling `pullNativeHuggingFace()`
- Added `hf.co` to `huggingface.co` URL normalization in `normalizeModelName()` for consistent cache lookups

Testing
Both new test cases pass:
- `TestPullHuggingFaceModelFromCache` - tests cache hit with full huggingface.co URL
- `TestPullHuggingFaceModelFromCacheWithShortURL` - tests cache hit when pulling with hf.co URL

Result
After this change:
- Models found in the local store are returned immediately instead of being re-downloaded
- Cache lookups work for both `huggingface.co/...` and `hf.co/...` URL formats