
---
title: LLM/VLM Memory Estimator
emoji: 🧮
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---

LLM/VLM Memory Estimator

Estimate GPU memory requirements for fine-tuning Large Language Models and Vision-Language Models: plan full SFT or LoRA runs ahead of time and avoid both OOM failures and wasted GPU memory.

Features

  • ✅ LoRA / QLoRA (4-bit quantization)
  • ✅ FlashAttention & Gradient Checkpointing
  • ✅ DeepSpeed ZeRO (0/1/2/3)
  • ✅ Multi-GPU (DP/TP/PP)
  • ✅ Web UI & CLI
  • ✅ Supports Qwen, Llama, Mistral, etc.

Quick Start

Web UI

# Using uv (recommended)
uv sync
uv run python -m llm_memory_estimator.app
# Or: uv run python llm_memory_estimator/app.py
# Visit http://localhost:7860

# Using pip
pip install -e .
python -m llm_memory_estimator.app

Command Line

# Using uv (recommended)
uv sync
uv run llm-memory-estimator --help

# Using pip
pip install -e .

# Basic estimation: see the Examples section below for full invocations

Examples

Full Fine-Tuning (7B Model)

uv run llm-memory-estimator \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dtype bf16 \
  --seq-len 2048 \
  --per-device-batch 2 \
  --grad-accum 128 \
  --zero 2 \
  --dp 8 \
  --flashattn true \
  --grad-checkpoint true

Result: ~14 GiB/GPU (estimated, may vary based on actual runtime behavior)
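
These settings give an effective global batch size of per-device-batch × grad-accum × dp = 2 × 128 × 8 = 2048 sequences per optimizer step.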

QLoRA (7B Model)

uv run llm-memory-estimator \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dtype bf16 \
  --qlora true \
  --lora-rank 8 \
  --seq-len 2048 \
  --per-device-batch 1 \
  --grad-accum 64 \
  --zero 3 \
  --dp 4 \
  --tp 2 \
  --flashattn true \
  --grad-checkpoint true

Result: ~8 GiB/GPU (estimated, may vary based on actual runtime behavior)
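
Here dp × tp = 4 × 2 = 8 GPUs are used in total. With 4-bit (NF4) quantization the frozen base weights occupy roughly 7B × 0.5 bytes ≈ 3.5 GB before sharding, which is the main reason the per-GPU footprint drops so far below full fine-tuning.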

LoRA (72B Model)

uv run llm-memory-estimator \
  --model Qwen/Qwen2.5-72B-Instruct \
  --dtype bf16 \
  --lora true \
  --lora-rank 64 \
  --seq-len 2048 \
  --per-device-batch 1 \
  --grad-accum 512 \
  --zero 3 \
  --dp 32 \
  --tp 4 \
  --flashattn true \
  --grad-checkpoint true

Result: ~18 GiB/GPU (estimated, may vary based on actual runtime behavior)
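
This configuration assumes dp × tp = 32 × 4 = 128 GPUs in total. The bf16 base weights alone are about 72B × 2 bytes ≈ 144 GB, so the run only fits because TP and ZeRO-3 shard those weights across the cluster while LoRA keeps the trainable state small.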

Vision-Language Model

uv run llm-memory-estimator \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dtype bf16 \
  --seq-len 2048 \
  --per-device-batch 1 \
  --grad-accum 64 \
  --zero 2 \
  --dp 4 \
  --flashattn true \
  --grad-checkpoint true \
  --use-vision true \
  --vision-hidden 1024 \
  --vision-layers 24 \
  --vision-image-size 448 \
  --vision-patch 14
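
With these vision settings, each image is split into (448 / 14)² = 32 × 32 = 1024 patches, so every image adds on the order of a thousand vision tokens' worth of activations on top of the text sequence (before any token merging the model itself may apply).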

Empirical Probe (GPU Required)

uv run llm-memory-estimator \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dtype bf16 \
  --seq-len 2048 \
  --flashattn true \
  --grad-checkpoint true \
  --probe true \
  --warmup-steps 3 \
  --probe-steps 6

Compares estimated vs actual GPU memory usage.

Key Parameters

Parameter           Description                           Default
--model             HuggingFace model ID                  (required)
--dtype             Precision (bf16/fp16/fp32/int8/nf4)   bf16
--seq-len           Sequence length                       2048
--per-device-batch  Batch size per GPU                    1
--grad-accum        Gradient accumulation steps           1
--dp                Data parallelism degree               1
--tp                Tensor parallelism degree             1
--pp                Pipeline parallelism degree           1
--zero              DeepSpeed ZeRO stage (0/1/2/3)        0
--flashattn         Enable FlashAttention                 false
--grad-checkpoint   Enable gradient checkpointing         false
--lora              Enable LoRA                           false
--qlora             Enable QLoRA (4-bit)                  false
--lora-rank         LoRA rank                             8
--fused-loss        Enable fused loss (Liger)             false
--probe             Run empirical GPU probe               false
--use-vision        Enable vision encoder                 false
For the complete parameter list, run: uv run llm-memory-estimator --help

How It Works

The estimator calculates per-GPU memory as:

Total = Weights + Gradients + Optimizer States + Activations + Logits + Overheads

Key components:

  • Weights: Model parameters (sharded by TP/PP/ZeRO-3)
  • Trainable State: Gradients + optimizer states (sharded by ZeRO-1/2/3)
  • Activations: Attention + MLP intermediate tensors (reduced by gradient checkpointing & FlashAttention)
  • Logits: Output layer activations (reduced by fused loss)
  • Overheads: CUDA context, DeepSpeed, fragmentation
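
To make the decomposition concrete, here is a minimal back-of-the-envelope sketch in Python for full fine-tuning with Adam in bf16. It is not the estimator's actual implementation, and the constants (bytes per parameter, activation multiplier, flat overhead) are illustrative assumptions; the real tool also models attention variants, FlashAttention, fused loss, TP/PP sharding, and calibrated overheads, so its output will differ.

GiB = 1024 ** 3

def rough_full_ft_gib(n_params, hidden, n_layers, seq_len, micro_batch,
                      vocab, dp=1, zero=0, grad_ckpt=False, overhead_gib=2.0):
    """Crude per-GPU estimate for full fine-tuning with Adam in bf16."""
    weights = 2.0 * n_params   # bf16 weights: 2 bytes/param
    grads = 2.0 * n_params     # bf16 gradients
    optim = 12.0 * n_params    # Adam: fp32 master copy + momentum + variance
    if zero >= 1:
        optim /= dp            # ZeRO-1 shards optimizer states across dp ranks
    if zero >= 2:
        grads /= dp            # ZeRO-2 also shards gradients
    if zero >= 3:
        weights /= dp          # ZeRO-3 also shards the weights themselves
    # Activations: roughly a stack of (batch, seq_len, hidden) bf16 tensors
    # per layer; gradient checkpointing keeps ~1 of them instead of ~20.
    per_layer = (1 if grad_ckpt else 20) * micro_batch * seq_len * hidden * 2
    activations = n_layers * per_layer
    logits = micro_batch * seq_len * vocab * 4  # fp32 logits for the loss
    total = weights + grads + optim + activations + logits
    return total / GiB + overhead_gib  # flat allowance for CUDA context etc.

# Illustrative only, with made-up dimensions for a ~1B-parameter model:
print(f"~{rough_full_ft_gib(1.0e9, 2048, 24, 2048, 1, 151936):.0f} GiB/GPU")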

Requirements

Estimation (no GPU needed):

  • Python 3.8+
  • transformers >= 4.36.0
  • accelerate >= 0.25.0
  • torch >= 2.1.0 (CPU version OK)

Empirical Probe (GPU required):

  • CUDA-enabled PyTorch

Installation

Using uv (recommended)

uv is a fast Python package installer and resolver.

Note: This project installs CPU-only PyTorch by default (~650 MB wheel vs ~1.7 GB for the CUDA build, and roughly 1.x GB vs 7.x GB for the full virtual environment), which is sufficient for memory estimation. The empirical probe feature, which requires a GPU, is not yet implemented.

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/MiaoDX/llm-memory-estimator.git
cd llm-memory-estimator
uv sync

# Run the CLI
uv run llm-memory-estimator --help

# Run the web UI
uv run python -m llm_memory_estimator.app

Using pip

git clone https://github.com/MiaoDX/llm-memory-estimator.git
cd llm-memory-estimator
pip install -e .

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! This tool uses mathematical models calibrated against empirical measurements; estimates are typically accurate to within ±10-15%.
