Calculate KV cache memory requirements for transformer models. Supports MHA, GQA, and MLA attention mechanisms with fp16/bf16, fp8, and fp4 data types.
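As a rough sketch of what such a calculation involves (the function name and parameters here are illustrative, not the tool's actual API): for MHA and GQA the cache stores separate K and V tensors per layer per KV head, while MLA caches a single compressed latent vector per token per layer; the `latent_dim` parameter below is an assumption standing in for that compressed width.

```python
def kv_cache_bytes(num_layers, seq_len, batch_size, dtype_bytes,
                   attn="mha", num_kv_heads=None, head_dim=None,
                   latent_dim=None):
    """Approximate KV cache size in bytes (illustrative sketch).

    mha/gqa: 2 tensors (K and V) x num_kv_heads x head_dim per token
             per layer; GQA simply uses fewer KV heads than query heads.
    mla:     one compressed latent of width latent_dim per token per
             layer, covering both K and V (assumed simplification).
    dtype_bytes: 2 for fp16/bf16, 1 for fp8, 0.5 for fp4.
    """
    if attn in ("mha", "gqa"):
        per_token_per_layer = 2 * num_kv_heads * head_dim
    elif attn == "mla":
        per_token_per_layer = latent_dim
    else:
        raise ValueError(f"unknown attention type: {attn}")
    return int(num_layers * seq_len * batch_size
               * per_token_per_layer * dtype_bytes)

# Example: a GQA config with 32 layers, 8 KV heads of dim 128,
# an 8192-token context, batch 1, fp16 (2 bytes per element).
size = kv_cache_bytes(num_layers=32, seq_len=8192, batch_size=1,
                      dtype_bytes=2, attn="gqa",
                      num_kv_heads=8, head_dim=128)
print(size / 2**30, "GiB")  # → 1.0 GiB
```

Halving `dtype_bytes` (fp16 → fp8) or quartering it (fp16 → fp4) scales the result linearly, which is why quantized caches matter at long context lengths.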
💡 Tip: Type a model name such as 'llama', 'qwen', or 'mistral', then press Enter to search
Model Configuration