VRAM (Video RAM)
The dedicated memory on a GPU used to store AI model data during processing. More VRAM = ability to run larger, more capable AI models locally.
VRAM is the GPU's working memory, and it's the primary bottleneck for running AI models locally. The entire model (or the active portion) must fit in VRAM during inference. A 7B parameter model might need 4-6GB of VRAM when quantized, while a 70B model needs 40-48GB.
Consumer GPU VRAM ranges: NVIDIA RTX 3060 (12GB), RTX 4070 (12GB), RTX 4080 (16GB), RTX 4090 (24GB). Professional GPUs: RTX A6000 (48GB), H100 (80GB). Apple Silicon Macs share system RAM with the GPU as unified memory, so AI workloads can use it directly — the M1 Max supports up to 64GB and the M2 Ultra up to 192GB.
Quantization helps fit larger models in less VRAM by reducing precision. A 4-bit quantized 70B model can run in 35-40GB VRAM instead of 140GB. Quality degrades somewhat, but for many tasks the tradeoff is worthwhile.
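The arithmetic behind these figures is simple: weight memory is roughly parameter count times bytes per weight, plus some overhead for the KV cache and framework buffers. A minimal sketch (the function name, and the 10% overhead factor, are illustrative assumptions, not a standard formula):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.1) -> float:
    """Rough VRAM estimate (in GB) for holding model weights.

    overhead is an assumed multiplier for KV cache, activations,
    and framework buffers; real usage varies with context length.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight * overhead

# 70B at 16-bit: weights alone need ~140 GB
print(estimate_vram_gb(70, 16, overhead=1.0))  # 140.0
# 70B at 4-bit: ~35 GB, which fits a 48GB card with headroom
print(estimate_vram_gb(70, 4, overhead=1.0))   # 35.0
```

This matches the figures above: 4-bit quantization cuts the 16-bit footprint by a factor of four, which is what brings a 70B model from 140GB down into the 35-40GB range.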
Real-World Example
If you want to run Stable Diffusion or Llama locally, VRAM is the key spec to check on your GPU — 8GB minimum for basic models, 24GB for comfortable use of larger ones.
FAQ
What is VRAM (Video RAM)?
The dedicated memory on a GPU used to store AI model data during processing. More VRAM = ability to run larger, more capable AI models locally.
How is VRAM (Video RAM) used in practice?
If you want to run Stable Diffusion or Llama locally, VRAM is the key spec to check on your GPU — 8GB minimum for basic models, 24GB for comfortable use of larger ones.
What concepts are related to VRAM (Video RAM)?
Key related concepts include GPU (Graphics Processing Unit), Parameters, Quantization, Self-hosting, Inference. Understanding these together gives a more complete picture of how VRAM (Video RAM) fits into the AI landscape.