
GPT-OSS: OpenAI’s Open-Weight Language Models for Developers

OpenAI has released GPT-OSS, a set of open-weight language models under the Apache 2.0 license. You have full access to the model weights, so you can run them locally or on your own cloud setup without API restrictions or subscription fees.

What Is GPT-OSS?

GPT-OSS is a family of open-weight large language models that are free to use. Unlike GPT-3 or GPT-4, which are available only through paid APIs, GPT-OSS models can be deployed on your own hardware.

Available models:

  • gpt-oss-20b: 21 billion parameters, 3.6 billion active per token. Works on GPUs with 16GB VRAM.
  • gpt-oss-120b: 117 billion parameters, 5.1 billion active per token. Requires high-end GPUs with 80GB+ VRAM.

Both use a Mixture-of-Experts architecture, which activates only part of the model per token for efficiency.
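The routing idea behind this can be sketched in a few lines of NumPy. This is a simplified illustration of top-4 expert routing, not the actual GPT-OSS implementation; the expert counts follow the figures above, but the gating and expert functions here are toy stand-ins:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=4):
    """Simplified Mixture-of-Experts forward pass for one token.

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of n_experts callables, each mapping (d,) -> (d,)

    Only the top_k highest-scoring experts are evaluated, so the compute
    per token stays small even when n_experts is 32 or 128.
    """
    logits = x @ gate_w                       # router scores, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]         # indices of the 4 chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 32 experts, each a random linear map, but only 4 ever run per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 32
gate_w = rng.normal(size=(d, n_experts))
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [(lambda m: (lambda x: m @ x))(m) for m in mats]
out = moe_layer(rng.normal(size=d), gate_w, experts)
print(out.shape)  # (8,)
```

The output has the same shape as the input hidden state, but only 4 of the 32 expert networks were actually evaluated, which is why active-parameter counts (3.6B and 5.1B) are so much smaller than total parameter counts.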

Key Features

  • Mixture-of-Experts: Activates 4 experts per token out of 32 per layer (20b) or 128 per layer (120b), reducing compute per token.
  • Extended context: Handles up to 128,000 tokens for large documents or long chats.
  • Tokenization: Uses the o200k_harmony tokenizer with a roughly 200,000-token vocabulary, optimized for coding and technical language.
  • Quantization: MoE layers trained with 4-bit MXFP4 precision to save VRAM.
  • Instruction following: Supports step-by-step reasoning, tool calls, and long-form dialogue.
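To see why 4-bit quantization saves so much VRAM, here is a simplified block-wise scheme in the spirit of MXFP4. Real MXFP4 groups weights into blocks of 32 with a shared power-of-two scale per block and FP4 (E2M1) element values; this sketch keeps the FP4 value grid but uses a plain float scale so the idea stays visible:

```python
import numpy as np

# FP4 (E2M1) representable magnitudes, as used by MXFP4 elements.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_blocks(w, block=32):
    """Simplified MXFP4-style quantization: split weights into blocks,
    store one shared scale per block plus a sign and a 3-bit magnitude
    code per value (4 bits per weight in total).
    Note: real MXFP4 restricts the shared scale to a power of two; this
    sketch uses an arbitrary float scale for clarity."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0
    scaled = w / scale
    signs = np.sign(scaled)
    # Snap each magnitude to the nearest FP4 grid point.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return signs, idx, scale

def dequantize_blocks(signs, idx, scale):
    return (signs * FP4_GRID[idx] * scale).ravel()

rng = np.random.default_rng(1)
w = rng.normal(size=256).astype(np.float32)
signs, idx, scale = quantize_blocks(w)
w_hat = dequantize_blocks(signs, idx, scale)
err = np.abs(w - w_hat).max()
print(err)
```

Storage drops from 32 bits per weight to roughly 4 (plus a small per-block scale), at the cost of a bounded reconstruction error, which is what lets the 120b model's MoE weights fit on a single 80GB GPU.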

Performance and Use Cases

The gpt-oss-120b model achieves near-parity with OpenAI's o4-mini on reasoning, coding, and STEM benchmarks.
The gpt-oss-20b model reaches about 90 to 92 percent of the 120b model's quality while running far faster on consumer GPUs.

You can use GPT-OSS for:

  • AI chatbots with long memory
  • Technical support tools
  • Industry-specific AI in healthcare, biotech, or research
  • On-premise AI where data control is critical
  • Open-source AI software with custom training

Deployment

  • gpt-oss-20b: Runs on NVIDIA RTX 4090-class GPUs
  • gpt-oss-120b: Runs on NVIDIA A100 or H100 with 80GB VRAM
  • Works with Hugging Face Transformers, vLLM, llama.cpp, Ollama
  • Runs on Windows, Linux, macOS
  • Supports Docker and Kubernetes for scaling
  • Can be fine-tuned with LoRA or full parameter updates
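Both vLLM and Ollama expose an OpenAI-compatible HTTP API once the model is running, so no OpenAI account or API key is involved. A minimal client sketch using only the standard library (the base URL and model name below assume a default local Ollama setup and will differ for vLLM or a custom deployment):

```python
import json
import urllib.request

def build_chat_request(model, prompt):
    """Build an OpenAI-compatible /v1/chat/completions payload.
    Local servers such as vLLM and Ollama accept this same shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def chat(base_url, model, prompt):
    """POST the request to a locally running server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_chat_request("gpt-oss:20b", "Explain Mixture-of-Experts in one sentence.")

if __name__ == "__main__":
    # Assumes `ollama run gpt-oss:20b` is serving on its default port 11434;
    # for vLLM, point base_url at your server (commonly http://localhost:8000).
    print(chat("http://localhost:11434", "gpt-oss:20b", "Explain Mixture-of-Experts in one sentence."))
```

Because the request shape matches the OpenAI API, existing tooling built against that API can usually be pointed at a local GPT-OSS deployment by changing only the base URL and model name.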

License

The Apache License 2.0 allows you to:

  • Use, modify, and distribute the models
  • Sell products built with the models
  • Receive an express patent grant from contributors under the license terms
  • Share modified versions with proper attribution

Why GPT-OSS Matters

  • OpenAI's first open-weight language model release since GPT-2
  • Runs on consumer GPUs with the 20B model
  • Scales to enterprise workloads with the 120B model
  • Works with modern AI workflows without API lock-in

References

OpenAI – GPT-OSS Announcement

Intuition Labs – GPT-OSS Architecture Overview

Hugging Face – GPT-OSS-120B Model Card

Microsoft Azure – GPT-OSS Deployment Guide

OpenAI – GPT-OSS Model Card PDF