A container image based on Wolfi OS for running OSS LLMs via llama.cpp
Full tutorial available here.
Download .gguf models from Hugging Face and place them in the models folder. Then run the image, using a volume to share the models and a port redirect (for the server).
- unsloth Qwen3 VL models
  - What I used: https://huggingface.co/unsloth/Qwen3-VL-2B-Instruct-GGUF (a smaller version that runs on a Raspberry Pi 5)
  - You also need to download the mmproj file to use the image capabilities
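If you have the Hugging Face CLI installed, you can fetch both files from the command line. This is just a sketch: the file names assume the Q8_0 model and F32 mmproj variants used in the run commands below.

```sh
# Download the model and its mmproj file into ./models
# (file names assume the Q8_0 / F32 variants referenced below)
huggingface-cli download unsloth/Qwen3-VL-2B-Instruct-GGUF \
  Qwen3-VL-2B-Instruct-Q8_0.gguf mmproj-F32.gguf \
  --local-dir models
```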
Feel free to experiment with larger models, but keep in mind memory limitations.
```sh
docker build . -t wolfi-llama
```

This might take several minutes depending on the platform. Once the image is built you can run it with Docker.
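Once the build finishes, you can confirm the image is available locally:

```sh
docker image ls wolfi-llama
```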
Run the image with the default entrypoint (llama-server) on port 8000, and create a port redirect so you can access it from your local network:
```sh
docker run --rm --device /dev/dri/card1 --device /dev/dri/renderD128 \
  -v ${PWD}/models:/models -p 8000:8000 wolfi-llama:latest \
  --no-mmap --no-warmup -m /models/Qwen3-VL-2B-Instruct-Q8_0.gguf \
  --mmproj /models/mmproj-F32.gguf --port 8000 --host 0.0.0.0 -n 512
```

You can also include the recommended settings for this specific model:
```sh
docker run --rm --device /dev/dri/card1 --device /dev/dri/renderD128 \
  -v ${PWD}/models:/models -p 8000:8000 wolfi-llama:latest \
  --no-mmap --no-warmup -m /models/Qwen3-VL-2B-Instruct-Q8_0.gguf \
  --mmproj /models/mmproj-F32.gguf --port 8000 --host 0.0.0.0 -n 512 \
  --temp 0.7 --top-p 0.8 --top-k 20 --presence-penalty 1.5
```

Once the container is up and running and all services have initialized, you can access the chat via localhost:8000 or local_network_ip:8000.
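Besides the web UI, llama-server also exposes an OpenAI-compatible HTTP API, so you can run a quick smoke test with curl. A minimal sketch (llama-server serves the single loaded model, so no model field is needed):

```sh
# Query the OpenAI-compatible chat endpoint exposed by llama-server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Say hello in one short sentence."}
    ],
    "max_tokens": 64
  }'
```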
You should see llama.cpp's built-in web chat UI.
You should be able to use the image features; just keep in mind that large images consume a lot of tokens (so they take longer to process as part of the prompt), so consider shrinking or optimizing your pictures before using them in a prompt.
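For example, one way to shrink a picture before attaching it is with ImageMagick, assuming you have it installed (on older ImageMagick versions, use convert instead of magick; the target size is just a reasonable starting point):

```sh
# Shrink only if larger than 1024x1024, preserving aspect ratio
magick input.jpg -resize "1024x1024>" resized.jpg
```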
The included docker-compose.yaml file makes it easy to run the setup with pre-defined values. Check the file and adjust the command line if you are using a different model or different parameters. Then, you can simply run:
```sh
docker compose up
```

This will get the llama server up and running the same way as before.
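If you want to adapt the compose file, a minimal sketch mirroring the docker run command above could look like the following; the repo's actual docker-compose.yaml may differ, and the device paths and file names are the same assumptions as before:

```yaml
# Minimal sketch of a compose service mirroring the docker run command above
services:
  llama:
    image: wolfi-llama:latest
    ports:
      - "8000:8000"
    volumes:
      - ./models:/models
    devices:
      - /dev/dri/card1:/dev/dri/card1
      - /dev/dri/renderD128:/dev/dri/renderD128
    # Arguments appended to the image's llama-server entrypoint
    command: >
      --no-mmap --no-warmup
      -m /models/Qwen3-VL-2B-Instruct-Q8_0.gguf
      --mmproj /models/mmproj-F32.gguf
      --port 8000 --host 0.0.0.0 -n 512
```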
