OPT 125M
facebook/opt-125m
published May 2022 · updated Sep 2023
OPT 125M is a text-generation model that uses a decoder-only transformer architecture for causal language modeling, trained on a large corpus of English text.
specs
| Task | Text Generation |
| Architecture | Decoder-only Transformer |
| Parameters | 125M |
| License | Other |
about this model
OPT-125M is a text-generation model based on a decoder-only transformer architecture, pretrained with a causal language modeling objective. It is part of the Open Pre-trained Transformers (OPT) suite developed by Meta AI, which includes models from 125M to 175B parameters. The 125M variant is designed for efficient zero- and few-shot learning, and its larger counterpart (OPT-175B) has been shown to be comparable to GPT-3 in performance while requiring only one-seventh the carbon footprint to develop (Zhang et al., 2022).
Training Data
The model was pretrained on a 180B-token corpus (800GB) combining BookCorpus, CC-Stories, subsets of The Pile, Pushshift.io Reddit data, and CCNewsV2. Texts are tokenized using the GPT-2 byte-level BPE with a vocabulary of 50,272 tokens, and the model processes sequences of up to 2,048 consecutive tokens.
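To make the context limit and corpus figures concrete, here is a minimal sketch that splits a sequence of token IDs into consecutive windows of at most 2,048 tokens and derives the average bytes per token implied by the numbers above. The helper names `chunk_token_ids` and `average_bytes_per_token` are illustrative and do not come from the OPT codebase. The token IDs are stand-ins rather than real tokenizer output, and the arithmetic assumes decimal gigabytes (1 GB = 10^9 bytes).

```python
from typing import List

MAX_SEQ_LEN = 2048   # context length stated in the text above
VOCAB_SIZE = 50272   # vocabulary size stated in the text above


def chunk_token_ids(token_ids: List[int], max_len: int = MAX_SEQ_LEN) -> List[List[int]]:
    """Split token IDs into consecutive, non-overlapping windows of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


def average_bytes_per_token(corpus_bytes: float, corpus_tokens: float) -> float:
    """Average number of raw bytes represented by one token."""
    return corpus_bytes / corpus_tokens


# Stand-in IDs (assumption): a real pipeline would get these from the tokenizer.
fake_ids = [i % VOCAB_SIZE for i in range(5000)]
chunks = chunk_token_ids(fake_ids)

print([len(c) for c in chunks])                          # [2048, 2048, 904]
print(round(average_bytes_per_token(800e9, 180e9), 2))   # 4.44
```

Non-overlapping windows are the simplest choice. Whether the original training pipeline packed documents this way is not stated here, so treat the chunking strategy as an assumption.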
Key Strengths
- Open weights allow reproducible research, addressing a limitation of closed, API-only models.
- The largest model in the suite, OPT-175B, performs comparably to GPT-3 on many NLP benchmarks (translation, question-answering, cloze tasks) in zero- and few-shot settings; this comparison applies to OPT-175B, not to the 125M variant.
- Developed with a smaller carbon footprint: OPT-175B required about one-seventh the carbon footprint of GPT-3 to develop (Zhang et al., 2022).
Limitations
As noted in the original model card, the training data contains unfiltered internet content, leading to potential biases, toxicity, and hallucination in generations. The model is not immune to the known issues of large language models, and its outputs should be evaluated critically.
best for
- Zero-shot and few-shot text generation
- Fine-tuning for domain-specific language tasks
- Prompt-based evaluation of NLP benchmarks
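As an illustration of the few-shot use case, the sketch below assembles a prompt from worked examples and can optionally pass it to the model through the Hugging Face `transformers` `pipeline` API. The `Input:`/`Output:` prompt layout and the helper names are assumptions chosen for illustration, not a format prescribed by the model card. The `transformers` import is deferred so the prompt-building part runs without the library or a model download.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (input, output) demonstrations followed by an unanswered query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Run the prompt through facebook/opt-125m (downloads weights on first use)."""
    # Deferred import: requires `pip install transformers torch`.
    from transformers import pipeline

    generator = pipeline("text-generation", model="facebook/opt-125m")
    result = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return result[0]["generated_text"]


# Two antonym demonstrations, then the query the model should complete.
prompt = build_few_shot_prompt([("cold", "hot"), ("up", "down")], "open")
print(prompt)

# To query the model, uncomment the next line:
# print(generate(prompt))
```

A model of this size may follow such patterns unreliably, so completions should be checked rather than trusted.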
FAQ
OPT 125M is a 125-million-parameter decoder-only transformer model for causal language modeling, released by Meta AI as part of the Open Pre-trained Transformers (OPT) suite.
OPT 125M is the smallest model in the OPT suite, which was designed to match the sizes and performance of GPT-3 class models with a smaller carbon footprint. The comparison with GPT-3 refers to the 175B version, not to OPT 125M.
It expects tokenized text using GPT-2 byte-level BPE with a vocabulary size of 50,272, and sequences of up to 2,048 consecutive tokens.
Use the OpenAI-compatible endpoint with your gigarouter API key, sending a POST request with a prompt and generation parameters.
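Such a request might look like the following sketch. The base URL `https://api.example.com/v1`, the environment variable name `GIGAROUTER_API_KEY`, and the model identifier string are placeholders, not documented values. The payload follows the general shape of an OpenAI-style completions request (`model`, `prompt`, `max_tokens`, `temperature`). The sketch only constructs the request and does not send it.

```python
import json
import os
import urllib.request

# Placeholders (assumptions): substitute the real base URL and key from your account.
BASE_URL = "https://api.example.com/v1"
API_KEY = os.environ.get("GIGAROUTER_API_KEY", "YOUR_API_KEY")


def build_completion_request(
    prompt: str, max_tokens: int = 32, temperature: float = 0.7
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style completions POST request."""
    payload = {
        "model": "facebook/opt-125m",  # assumed identifier; check the provider's model list
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_completion_request("Hello, my name is")
print(request.get_method(), request.full_url)
print(json.loads(request.data))

# To send the request once a real endpoint is configured:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```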
The license is listed as "other" on the Hugging Face model hub; the original model card does not specify a standard open-source license.
We're benchmarking and onboarding OPT 125M as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.