AI Models
Frontier LLMs for On-Premise Deployments
A curated selection of the most capable open-source AI models available today, optimized for deployment on NVIDIA Blackwell-powered hardware. These production-ready models deliver state-of-the-art performance while running entirely on your premises, ensuring complete data sovereignty and eliminating cloud dependencies.
| Model | Release date | Vendor | Total params | Active params | Size | Status |
|---|---|---|---|---|---|---|
| Qwen 3.6 VL | 2026-04-24 | Alibaba Cloud | 35B | 3B | 24 GB | Testing |
| Nemotron OCR v2 | 2026-04-02 | Nvidia | 0.1B | 0.1B | 0.4 GB | Testing |
| Gemma 4 | 2026-04-02 | Google | 31B | 31B | 32 GB | Testing |
| Cohere Transcribe | 2026-03-25 | Cohere Labs | 2B | 2B | 2 GB | Testing |
| Nemotron 3 Super | 2026-03-10 | Nvidia | 124B | 12B | 74 GB | Stable |
| LTX-2.3 | 2026-03-03 | Lightricks | 22B | 22B | 20 GB | Stable |
| Qwen 3.5 | 2026-02-16 | Alibaba Cloud | 397B | 17B | 233 GB | Stable |
| MiniMax M2.5 | 2026-02-12 | MiniMax AI | 229B | 10B | 130 GB | Stable |
| Step 3.5 Flash | 2026-02-11 | Stepfun AI | 199B | 11B | 194 GB | Stable |
| GLM 5 | 2026-02-10 | Z AI | 435B | 40B | 429 GB | Stable |
| Qwen 3 Coder Next | 2026-02-03 | Alibaba Cloud | 80B | 3B | 45 GB | Stable |
| Paddle OCR VL 1.5 | 2026-01-28 | Baidu | 1B | 1B | 1 GB | Stable |
| DeepSeek OCR v2 | 2026-01-27 | DeepSeek AI | 3B | 0.6B | 7 GB | Stable |
| Trinity Large | 2026-01-27 | Arcee AI | 398B | 13B | 376 GB | Experimental |
| Kimi K2.5 | 2026-01-26 | Moonshot AI | 1058B | 32B | 550 GB | Stable |
| GLM 4.7 | 2025-12-22 | Z AI | 358B | 32B | 203 GB | Stable |
| Devstral 2 | 2025-12-08 | Mistral | 123B | 123B | 119 GB | Experimental |
| Mistral Large 3 | 2025-12-01 | Mistral | 673B | 41B | 375 GB | Stable |
| DeepSeek V3.2 | 2025-11-30 | DeepSeek AI | 685B | 37B | 642 GB | Stable |
| FLUX.2 Dev | 2025-11-25 | Black Forest Labs | 32B | 32B | 60 GB | Stable |
| Kimi K2 Thinking | 2025-11-06 | Moonshot AI | 1058B | 32B | 553 GB | Stable |
| GLM 4.6 | 2025-09-30 | Z AI | 200B | 32B | 187 GB | Stable |
| Qwen 3 VL | 2025-09-23 | Alibaba Cloud | 235B | 22B | 125 GB | Stable |
| Qwen 3 Next Thinking | 2025-09-10 | Alibaba Cloud | 80B | 3B | 44 GB | Stable |
| Qwen 3 Next Instruct | 2025-09-10 | Alibaba Cloud | 80B | 3B | 44 GB | Stable |
| Apertus | 2025-09-01 | Swiss AI | 70B | 70B | 67 GB | Stable |
| GPT OSS | 2025-08-05 | OpenAI | 120B | 5B | 60 GB | Stable |
| Qwen 3 Coder | 2025-07-22 | Alibaba Cloud | 241B | 35B | 254 GB | Stable |
| Qwen 3 | 2025-04-28 | Alibaba Cloud | 235B | 22B | 133 GB | Stable |
| Llama 4 Maverick | 2025-04-05 | Meta | 401B | 17B | 379 GB | Stable |
| DeepSeek R1 | 2025-01-20 | DeepSeek AI | 396B | 37B | 394 GB | Stable |
| Mistral Large Instruct | 2024-07-24 | Mistral | 122B | 122B | 114 GB | Stable |
| Llama 3.1 | 2024-07-23 | Meta | 405B | 405B | 381 GB | Stable |
| Mixtral 8x22B | 2024-04-10 | Mistral | 176B | 40B | 68 GB | Stable |
Frequently Asked Questions
- **What is the difference between parameters and size?** Parameters are the number of learnable weights in a model, measured in billions (e.g., 70B). Size is the storage space the model files require, measured in GB. Quantized models use fewer bits per parameter, resulting in smaller files while retaining most of the model's capabilities.
- **How much VRAM do I need?** More than the model size alone, because the KV cache used for context handling also lives in VRAM. With an FP8-quantized KV cache (standard in production), plan for roughly 1.4–1.5× the model size; for example, a 550 GB model runs comfortably on 768 GB of VRAM. FP8 is virtually lossless, and NVIDIA H100/H200 GPUs have native FP8 tensor-core support, making it essentially free performance-wise. With the default BF16 KV cache, plan for 1.7–2× the model size instead.
- **What do the status labels mean?** Experimental models are newer quantizations or configurations that are still being validated; they may offer better performance or efficiency but have not been thoroughly tested in production environments. Stable models have been verified for reliable operation.
- **Which hardware tier should I choose?** Choose based on the models you need; larger models require more VRAM. The S tier (96 GB) handles most 70B models, the M tier (384 GB) supports multiple large models simultaneously, and the L (768 GB) and XL (1440 GB) tiers enable the largest frontier models such as Llama 4 Maverick and DeepSeek V3.2.
- **Can I run several models at once?** Yes, if you have sufficient VRAM: the total size of the loaded models must fit within your server's available memory. Larger tiers allow several models to run concurrently for different use cases.
- **What do the modalities mean?** Modalities indicate what types of data a model can process (input) and generate (output). Text models handle written content, image models can analyze or generate visuals, code models are optimized for programming tasks, and multimodal models combine multiple capabilities.
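The parameters-to-size relationship above is simple arithmetic: N billion parameters at b bits each take roughly N × b / 8 gigabytes on disk. A minimal sketch (the quantization levels chosen here are illustrative, not the catalog's actual formats):

```python
def model_file_size_gb(total_params_billion: float, bits_per_param: float) -> float:
    """Approximate file size: billions of params x bits each, /8 -> gigabytes."""
    return total_params_billion * bits_per_param / 8

# A 1058B-parameter model at ~4 bits/param lands near the ~550 GB listed above:
print(model_file_size_gb(1058, 4))   # 529.0
# Unquantized at 16 bits/param (BF16) it would be four times larger:
print(model_file_size_gb(1058, 16))  # 2116.0
```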
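The VRAM-planning and tier-fit rules above can be sketched in a few lines. The tier sizes come from this page; the 1.4× and 2.0× factors are one reading of the "1.4–1.5×" (FP8) and "1.7–2×" (BF16) guidance, not exact requirements:

```python
# Tier capacities in GB of VRAM, as listed in the FAQ above.
TIERS_GB = {"S": 96, "M": 384, "L": 768, "XL": 1440}

def vram_needed_gb(model_size_gb: float, kv_cache: str = "fp8") -> float:
    """VRAM to plan for: model weights plus KV-cache headroom."""
    factor = {"fp8": 1.4, "bf16": 2.0}[kv_cache]
    return model_size_gb * factor

def fits(tier: str, model_sizes_gb: list[float], kv_cache: str = "fp8") -> bool:
    """Concurrent models: the summed VRAM requirement must fit the tier."""
    need = sum(vram_needed_gb(s, kv_cache) for s in model_sizes_gb)
    return need <= TIERS_GB[tier]

print(fits("S", [60]))       # GPT OSS (60 GB) on the 96 GB tier -> True
print(fits("S", [67, 20]))   # Apertus + LTX-2.3 together exceed 96 GB -> False
```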