OPT 125M

facebook/opt-125m

published May 2022 · updated Sep 2023

OPT 125M is a text-generation model that uses a decoder-only transformer architecture for causal language modeling, trained on a large corpus of English text.

status

coming soon

API providers

downloads / mo

13.7M

license

other

specs

Task	Text Generation
Architecture	Decoder-only Transformer
Parameters	125M
License	Other

about this model

OPT-125M is a text-generation model based on a decoder-only transformer architecture, pretrained with a causal language modeling objective. It is the smallest member of the Open Pre-trained Transformers (OPT) suite developed by Meta AI, which spans models from 125M to 175B parameters. Like the rest of the suite, it can be used for zero- and few-shot prompting as well as fine-tuning. The largest model in the suite, OPT-175B, is reported to be comparable to GPT-3 in performance while requiring only one-seventh of the carbon footprint to develop (Zhang et al., 2022).

Training Data

The model was pretrained on a 180B-token corpus (800GB) combining BookCorpus, CC-Stories, subsets of The Pile, Pushshift.io Reddit data, and CCNewsV2. Texts are tokenized using the GPT-2 byte-level BPE with a vocabulary of 50,272 tokens, and the model processes sequences of up to 2,048 consecutive tokens.
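As a rough illustration of the 2,048-token context window, the sketch below splits a stream of token IDs into consecutive windows and checks each ID against the stated vocabulary size. The function name `chunk_token_ids` is hypothetical, and how the original training pipeline handled document boundaries or padding is not described here, so treat this as a sketch under those assumptions and not as the authors' preprocessing code.

```python
from typing import List

MAX_SEQ_LEN = 2048    # context length stated in the model description
VOCAB_SIZE = 50_272   # GPT-2 byte-level BPE vocabulary size stated above


def chunk_token_ids(token_ids: List[int], seq_len: int = MAX_SEQ_LEN) -> List[List[int]]:
    """Split token IDs into consecutive windows of at most `seq_len` tokens."""
    for tid in token_ids:
        if not 0 <= tid < VOCAB_SIZE:
            raise ValueError(f"token id {tid} is outside the vocabulary")
    return [token_ids[start:start + seq_len]
            for start in range(0, len(token_ids), seq_len)]


# Back-of-the-envelope figure from the corpus size quoted above:
# average bytes of raw text per token, treating 1 GB as 10**9 bytes.
bytes_per_token = 800e9 / 180e9
```

In practice the token IDs would come from the model's tokenizer; any list of integers below the vocabulary size works for trying out the windowing logic.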

Key Strengths

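·Open weights allow reproducible research, addressing the limitations of closed, API-only models.
·The largest model in the suite, OPT-175B, is reported to be comparable to GPT-3 on many NLP benchmarks (translation, question answering, cloze tasks) in zero- and few-shot settings. These results do not carry over directly to the much smaller 125M variant.
·The suite was trained with efficiency in mind: Zhang et al. (2022) report that OPT-175B required about one-seventh of the carbon footprint of GPT-3 to develop.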

Limitations

As noted in the original model card, the training data contains unfiltered internet content, leading to potential biases, toxicity, and hallucination in generations. The model is not immune to the known issues of large language models, and its outputs should be evaluated critically.

best for

·Zero-shot and few-shot text generation
·Fine-tuning for domain-specific language tasks
·Prompt-based evaluation of NLP benchmarks
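A minimal sketch of few-shot prompting with this model follows. The `Input:`/`Output:` template and both function names are assumptions chosen for illustration, not a format prescribed by the model card. The `generate` helper relies on the Hugging Face `transformers` text-generation pipeline and downloads the weights on first use, so it needs `transformers` and `torch` installed.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (input, output) demonstrations followed by an unanswered query.

    An empty `examples` list yields a zero-shot prompt.
    """
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Complete `prompt` with facebook/opt-125m via the transformers pipeline."""
    # Imported lazily so the prompt builder works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model="facebook/opt-125m")
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
```

For example, `generate(build_few_shot_prompt([("cat", "animal"), ("rose", "plant")], "oak"))` would return the prompt followed by the model's continuation. At 125M parameters the completions are often unreliable, so the output should be checked before use.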

FAQ

What is OPT 125M?

OPT 125M is a 125-million parameter decoder-only transformer model for causal language modeling, released by Meta AI as part of the Open Pre-trained Transformer suite.

How does OPT 125M compare to GPT-3?

OPT 125M is the smallest model in the OPT suite, which was designed to roughly match the sizes and performance of the GPT-3 class of models with a smaller carbon footprint. Only the 175B version is reported to be comparable to GPT-3; the 125M model is far smaller.

What input format does the model expect?

It expects text tokenized with the GPT-2 byte-level BPE tokenizer (vocabulary size 50,272), in sequences of up to 2,048 consecutive tokens.

How can I call OPT 125M via the gigarouter API?

The hosted API is listed as coming soon. Once it is live, use the OpenAI-compatible endpoint with your gigarouter API key, sending a POST request that contains a prompt and generation parameters.
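The sketch below shows what such a request might look like, built with the Python standard library only. The base URL is a placeholder, since the real gigarouter endpoint is not given on this page. The model identifier and the use of the OpenAI-style `/completions` route with `prompt`, `max_tokens`, and `temperature` fields are likewise assumptions based on the "OpenAI-compatible" description.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the endpoint from the provider's docs.
BASE_URL = "https://api.gigarouter.example/v1"


def build_completion_request(
    api_key: str,
    prompt: str,
    max_tokens: int = 32,
    temperature: float = 0.7,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style completions POST request."""
    payload = {
        "model": "facebook/opt-125m",  # assumed model identifier
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

When the endpoint exists, the request could be sent with `urllib.request.urlopen(build_completion_request(key, "Hello"))` and the JSON response parsed with `json.load`. Building the request separately from sending it makes it easy to inspect the payload first.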

What is the license for OPT 125M?

The license is listed as "other" on the Hugging Face model hub; the original model card does not specify a standard open-source license.

