AI Models
Frontier LLMs for On-Premise Deployments
A curated selection of the most capable open-source AI models available today, optimized for deployment on NVIDIA Blackwell-powered hardware. These production-ready models deliver state-of-the-art performance while running entirely on your premises, ensuring complete data sovereignty and eliminating cloud dependencies.
| Model | Release date | Vendor | Total params | Active params | Size | Status |
|---|---|---|---|---|---|---|
| Qwen 3.6 VL | 2026-04-24 | Alibaba Cloud | 35B | 3B | 24 GB | Testing |
| Nemotron OCR v2 | 2026-04-02 | Nvidia | 0.1B | 0.1B | 0.4 GB | Testing |
| Gemma 4 | 2026-04-02 | Google | 31B | 31B | 32 GB | Testing |
| Cohere Transcribe | 2026-03-25 | Cohere Labs | 2B | 2B | 2 GB | Testing |
| Nemotron 3 Super | 2026-03-10 | Nvidia | 124B | 12B | 74 GB | Stable |
| LTX-2.3 | 2026-03-03 | Lightricks | 22B | 22B | 20 GB | Stable |
| Qwen 3.5 | 2026-02-16 | Alibaba Cloud | 397B | 17B | 233 GB | Stable |
| MiniMax M2.5 | 2026-02-12 | MiniMax AI | 229B | 10B | 130 GB | Stable |
| Step 3.5 Flash | 2026-02-11 | Stepfun AI | 199B | 11B | 194 GB | Stable |
| GLM 5 | 2026-02-10 | Z AI | 435B | 40B | 429 GB | Stable |
| Qwen 3 Coder Next | 2026-02-03 | Alibaba Cloud | 80B | 3B | 45 GB | Stable |
| Paddle OCR VL 1.5 | 2026-01-28 | Baidu | 1B | 1B | 1 GB | Stable |
| DeepSeek OCR v2 | 2026-01-27 | DeepSeek AI | 3B | 0.6B | 7 GB | Stable |
| Trinity Large | 2026-01-27 | Arcee AI | 398B | 13B | 376 GB | Experimental |
| Kimi K2.5 | 2026-01-26 | Moonshot AI | 1058B | 32B | 550 GB | Stable |
| GLM 4.7 | 2025-12-22 | Z AI | 358B | 32B | 203 GB | Stable |
| Devstral 2 | 2025-12-08 | Mistral | 123B | 123B | 119 GB | Experimental |
| Mistral Large 3 | 2025-12-01 | Mistral | 673B | 41B | 375 GB | Stable |
| DeepSeek V3.2 | 2025-11-30 | DeepSeek AI | 685B | 37B | 642 GB | Stable |
| FLUX.2 Dev | 2025-11-25 | Black Forest Labs | 32B | 32B | 60 GB | Stable |
| Kimi K2 Thinking | 2025-11-06 | Moonshot AI | 1058B | 32B | 553 GB | Stable |
| GLM 4.6 | 2025-09-30 | Z AI | 200B | 32B | 187 GB | Stable |
| Qwen 3 VL | 2025-09-23 | Alibaba Cloud | 235B | 22B | 125 GB | Stable |
| Qwen 3 Next Thinking | 2025-09-10 | Alibaba Cloud | 80B | 3B | 44 GB | Stable |
| Qwen 3 Next Instruct | 2025-09-10 | Alibaba Cloud | 80B | 3B | 44 GB | Stable |
| Apertus | 2025-09-01 | Swiss AI | 70B | 70B | 67 GB | Stable |
| GPT OSS | 2025-08-05 | OpenAI | 120B | 5B | 60 GB | Stable |
| Qwen 3 Coder | 2025-07-22 | Alibaba Cloud | 241B | 35B | 254 GB | Stable |
| Qwen 3 | 2025-04-28 | Alibaba Cloud | 235B | 22B | 133 GB | Stable |
| Llama 4 Maverick | 2025-04-05 | Meta | 401B | 17B | 379 GB | Stable |
| DeepSeek R1 | 2025-01-20 | DeepSeek AI | 396B | 37B | 394 GB | Stable |
| Mistral Large Instruct | 2024-07-24 | Mistral | 122B | 122B | 114 GB | Stable |
| Llama 3.1 | 2024-07-23 | Meta | 405B | 405B | 381 GB | Stable |
| Mixtral 8x22B | 2024-04-10 | Mistral | 176B | 40B | 68 GB | Stable |
Frequently Asked Questions
- **What is the difference between parameters and size?** Parameters are the number of learnable weights in a model, measured in billions (e.g., 70B). Size is the storage space the model files require, measured in GB. Quantized models use fewer bits per parameter, resulting in smaller files while retaining most of the model's capabilities.
- **How much VRAM do I need?** More than the model size alone, because the KV cache used for context handling also lives in VRAM. With an FP8-quantized KV cache (standard in production), plan for roughly 1.4–1.5× the model size; for example, a 550 GB model runs comfortably on 768 GB of VRAM. FP8 is virtually lossless, and NVIDIA H100/H200 GPUs have native FP8 tensor-core support, making it essentially free performance-wise. With the default BF16 KV cache, plan for 1.7–2× the model size instead.
- **What do the status labels mean?** Experimental models are newer quantizations or configurations that are still being validated; they may offer better performance or efficiency but have not been thoroughly tested in production environments. Stable models have been verified for reliable operation.
- **Which hardware tier should I choose?** Choose based on the models you need; larger models require more VRAM. The S tier (96 GB) handles most 70B models, the M tier (384 GB) supports multiple large models simultaneously, and the L (768 GB) and XL (1440 GB) tiers enable the largest frontier models such as Llama 4 Maverick and DeepSeek V3.2.
- **Can I run several models at once?** Yes, if you have sufficient VRAM: the total size of the loaded models must fit within your server's available memory. Larger tiers allow several models to run concurrently for different use cases.
- **What do the modalities mean?** Modalities indicate what types of data a model can process (input) and generate (output). Text models handle written content, image models can analyze or generate visuals, code models are optimized for programming tasks, and multimodal models combine multiple capabilities.
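The parameters-to-size relationship above is simple arithmetic: N billion parameters at b bits each take roughly N × b / 8 gigabytes on disk. A minimal sketch (the quantization levels chosen here are illustrative, not the catalog's actual formats):

```python
def model_file_size_gb(total_params_billion: float, bits_per_param: float) -> float:
    """Approximate file size: billions of params x bits each, /8 -> gigabytes."""
    return total_params_billion * bits_per_param / 8

# A 1058B-parameter model at ~4 bits/param lands near the ~550 GB listed above:
print(model_file_size_gb(1058, 4))   # 529.0
# Unquantized at 16 bits/param (BF16) it would be four times larger:
print(model_file_size_gb(1058, 16))  # 2116.0
```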
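The VRAM-planning and tier-fit rules above can be sketched in a few lines. The tier sizes come from this page; the 1.4× and 2.0× factors are one reading of the "1.4–1.5×" (FP8) and "1.7–2×" (BF16) guidance, not exact requirements:

```python
# Tier capacities in GB of VRAM, as listed in the FAQ above.
TIERS_GB = {"S": 96, "M": 384, "L": 768, "XL": 1440}

def vram_needed_gb(model_size_gb: float, kv_cache: str = "fp8") -> float:
    """VRAM to plan for: model weights plus KV-cache headroom."""
    factor = {"fp8": 1.4, "bf16": 2.0}[kv_cache]
    return model_size_gb * factor

def fits(tier: str, model_sizes_gb: list[float], kv_cache: str = "fp8") -> bool:
    """Concurrent models: the summed VRAM requirement must fit the tier."""
    need = sum(vram_needed_gb(s, kv_cache) for s in model_sizes_gb)
    return need <= TIERS_GB[tier]

print(fits("S", [60]))       # GPT OSS (60 GB) on the 96 GB tier -> True
print(fits("S", [67, 20]))   # Apertus + LTX-2.3 together exceed 96 GB -> False
```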