
bug: Qwen3.5 Cannot Run with GPU Acceleration on Apple M4 #571

@FliPPeDround


Issue description

Qwen3.5-4B-Q4_K_M.gguf cannot be loaded when gpuLayers is set to any value other than 0; an InsufficientMemoryError is thrown regardless of the context size.

Actual Behavior

The issue is NOT related to context size. The error occurs whenever gpuLayers is set to any value other than 0:

  • gpuLayers: 0 → Works ✅
  • gpuLayers: 'auto' → InsufficientMemoryError ❌
  • gpuLayers: 40 → InsufficientMemoryError ❌

Error Message

InsufficientMemoryError: A context size of XXXXX is too large for the available VRAM
    at resolveContextContextSizeOption (file:///.../node-llama-cpp/dist/gguf/insights/utils/resolveContextContextSizeOption.js:28:19)
    at async GgufInsightsConfigurationResolver.resolveContextContextSize (file:///.../node-llama-cpp/dist/gguf/insights/GgufInsightsConfigurationResolver.js:235:16)
    at async LlamaContext._create (file:///.../node-llama-cpp/dist/evaluator/LlamaContext/LlamaContext.js:581:27)
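
The failure can be isolated to just the model load and context creation. A minimal sketch (assuming the same model file and node-llama-cpp APIs used in the full example below; it needs the local GGUF file and a Metal device to run):

```javascript
import path from 'node:path'
import { getLlama } from 'node-llama-cpp'

// Same quantized model as in the full example below
const MODEL_PATH = './models/Qwen3.5-4B-UD-Q4_K_XL.gguf'

const llama = await getLlama()

// Any non-zero gpuLayers value ('auto', 40, ...) triggers the failure;
// only gpuLayers: 0 loads and runs successfully on this machine.
const model = await llama.loadModel({
  modelPath: path.resolve(MODEL_PATH),
  gpuLayers: 'auto',
})

// With Metal offload enabled, this throws InsufficientMemoryError
// regardless of how small contextSize is set.
const context = await model.createContext()
console.warn('Context created:', context.contextSize)
```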

Additional Issue: Function Calling Causes Context Size Error

When using gpuLayers: 0 (CPU-only mode), there is a second problem: no matter how large contextSize is set, any prompt that passes functions throws "Error: The context size is too small to generate a response".

Code:

import path from 'node:path'
import {
  defineChatSessionFunction,
  getLlama,
  LlamaChatSession,
} from 'node-llama-cpp'

const MODEL_PATH = './models/Qwen3.5-4B-UD-Q4_K_XL.gguf'

const functions = {
  getCurrentWeather: defineChatSessionFunction({
    description: 'Gets the current weather in the provided location.',
    params: {
      type: 'object',
      properties: {
        location: {
          type: 'string',
          description: 'The city and state, e.g. San Francisco, CA',
        },
        format: {
          enum: ['celsius', 'fahrenheit'],
        },
      },
    },
    handler({ location, format }) {
      console.warn(`Getting current weather for "${location}" in ${format}`)
      return {
        temperature: format === 'celsius' ? 20 : 68,
        format,
      }
    },
  }),
}

const llama = await getLlama()
const model = await llama.loadModel({
  modelPath: path.resolve(MODEL_PATH),
  gpuLayers: 0,
})

console.warn('Creating context without explicit contextSize...')
const context = await model.createContext()
console.warn('Context created successfully')

const session = new LlamaChatSession({
  contextSequence: context.getSequence(),
})

const q1 = 'What is the weather like in SF?'
console.warn(`User: ${q1}`)

const a1 = await session.prompt(q1, { functions })
console.warn(`AI: ${a1}`)

My Environment

OS: macOS 25.3.0 (arm64)
Node: 22.21.1 (arm64)
TypeScript: 5.9.3

node-llama-cpp: 3.17.1
Prebuilt binaries: b8179

Metal: available

Metal device: Apple M4
Metal used VRAM: 0% (464KB/11.84GB)
Metal free VRAM: 99.99% (11.84GB/11.84GB)

CPU model: Apple M4
Math cores: 4
Used RAM: 99.18% (15.87GB/16GB)
Free RAM: 0.81% (133.38MB/16GB)
Used swap: 78.92% (3.16GB/4GB)
Max swap size: dynamic
mmap: supported

Additional Context

No response

Relevant Features Used

  • Metal support
  • CUDA support
  • Vulkan support
  • Grammar
  • Function calling

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

Metadata

Labels: bug (Something isn't working)
Status: In Progress
Milestone: No milestone
Development: No branches or pull requests