Install llama.cpp and copy the 'llama-server' binary into this folder. Then download 'Qwen3-4B-Q4_K_M.gguf' from Hugging Face and place it in this folder as well (link: https://huggingface.co/ggml-org/Qwen3-4B-GGUF/blob/main/Qwen3-4B-Q4_K_M.gguf).
You can use any other compatible model, but you will need to edit the code to point to that model instead.
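As a rough sketch, the setup above might look like this on the command line. This assumes 'curl' is installed, that the Hugging Face '/resolve/' URL is used to fetch the raw file, and that llama-server's '-m' (model path) and '--port' flags are used with their usual meanings; adjust paths and flags for your environment.

```shell
# Download the quantized model into the current folder.
# (The /resolve/ form of the Hugging Face URL serves the raw .gguf file.)
curl -L -o Qwen3-4B-Q4_K_M.gguf \
  "https://huggingface.co/ggml-org/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf"

# Start the local server with that model.
# -m / --model selects the .gguf file; --port 8080 is an assumed choice here.
./llama-server -m Qwen3-4B-Q4_K_M.gguf --port 8080
```

If you substitute a different GGUF model, change the filename in both the download URL and the '-m' argument, and update the code that references the model accordingly.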