
GPT-2

openai-community/gpt2

published Mar 2022 · updated Feb 2024

GPT-2 is a text-generation model that predicts the next word in a sequence, trained on a large corpus of English data using a causal language modeling objective.

status: coming soon
API providers: 0
downloads / mo: 13.3M
license: mit

specs

Task: Text Generation
Architecture: Transformer (decoder-only)
Parameters: 124M
License: Modified MIT

about this model

GPT-2 is a text-generation model that predicts the next token in a sequence, pretrained on a large corpus of English text using a causal language modeling objective. Released by OpenAI in February 2019, the family comes in several sizes; this is the smallest, with 124 million parameters. It was trained on the WebText dataset: approximately 40GB of text scraped from the pages behind 45 million outbound Reddit links, with Wikipedia pages excluded. The training data cutoff is the end of 2017.

The model uses a byte-level version of Byte Pair Encoding (BPE) with a vocabulary size of 50,257 and processes input sequences of up to 1024 consecutive tokens. It was trained to predict the next token in a sequence using a causal mask, learning an internal representation of English that is best suited for text generation from a prompt.
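The causal mask mentioned above can be pictured as a lower-triangular grid: position i may look at positions 0 through i and nothing later. The following is a minimal standard-library sketch of that idea only; the function names are illustrative and this is not the model's actual implementation.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """Return a seq_len x seq_len grid where mask[i][j] is True
    when position i is allowed to attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def render(mask: list[list[bool]]) -> str:
    """Draw the mask with '1' for visible positions and '.' for hidden ones."""
    return "\n".join(
        " ".join("1" if allowed else "." for allowed in row) for row in mask
    )


if __name__ == "__main__":
    # Each row is one query position; later positions stay hidden.
    print(render(causal_mask(4)))
```

Because every row only sees earlier columns, the prediction for each token depends solely on the tokens before it, which is what makes the model suitable for left-to-right generation.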

Zero-shot evaluation results

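| Dataset | LAMBADA (PPL) | LAMBADA (ACC) | CBT-CN (ACC) | CBT-NE (ACC) | WikiText2 (PPL) | PTB (PPL) | enwiki8 (BPB) | text8 (BPC) | WikiText103 (PPL) | 1BW (PPL) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-2 (124M) | 35.13 | 45.99 | 87.65 | 83.4 | 29.41 | 65.85 | 1.16 | 1.17 | 37.50 | 75.20 |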
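For readers unfamiliar with the metrics in these results, perplexity (PPL) is conventionally the exponential of the average negative log-likelihood per token, and bits per byte or character (BPB, BPC) is the total negative log-likelihood expressed in bits and divided by the number of bytes or characters. The sketch below uses those textbook definitions; the page does not state the exact normalization behind the reported numbers, so treat this as an illustration rather than the evaluation procedure.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp(mean negative log-likelihood), with natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def bits_per_unit(token_log_probs: list[float], n_units: int) -> float:
    """Total negative log-likelihood in bits, divided by the number of
    bytes (BPB) or characters (BPC) in the evaluated text."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    total_nll_nats = -sum(token_log_probs)
    return total_nll_nats / (math.log(2) * n_units)


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token is as uncertain
    # as a uniform choice among four options.
    print(perplexity([math.log(0.25)] * 4))
```

Lower is better for PPL, BPB, and BPC; higher is better for the accuracy (ACC) columns.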

Because WebText was scraped from unfiltered internet content, the model reflects the biases present in that data. As noted in the original model card, GPT-2 does not distinguish fact from fiction, and all versions of GPT-2 should be approached with similar caution regarding biases related to human attributes.


FAQ

What is GPT-2 best used for?

GPT-2 is best for generating coherent English text from a prompt, such as creative writing, autocompletion, or chatbots. It is not fine-tuned for factual accuracy.

How does GPT-2 compare to larger versions in size and speed?

GPT-2 (124M parameters) is the smallest version, making it faster and less resource-intensive than GPT-2 Medium (355M), Large (774M), or XL (1.5B).
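As a rough check on the 124M figure, the parameter count can be rebuilt from the layer shapes. The vocabulary size (50,257) and context length (1024) come from this page; the 12 layers and hidden width of 768 are taken from the published configuration of the smallest GPT-2 and are not stated here, so treat them as assumptions. The sketch also assumes the output projection shares weights with the token embedding and is therefore not counted twice.

```python
def gpt2_param_count(
    n_layer: int = 12,        # assumption: published GPT-2 small config
    d_model: int = 768,       # assumption: published GPT-2 small config
    n_ctx: int = 1024,        # context length stated on this page
    vocab_size: int = 50257,  # vocabulary size stated on this page
) -> int:
    token_emb = vocab_size * d_model
    pos_emb = n_ctx * d_model
    layer_norm = 2 * d_model  # scale and bias

    # Attention: fused query/key/value projection plus output projection.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)

    block = 2 * layer_norm + attn + mlp
    # The trailing layer_norm is the final normalization after the last block.
    return token_emb + pos_emb + n_layer * block + layer_norm


if __name__ == "__main__":
    print(f"{gpt2_param_count():,}")
```

Under these assumptions the total is 124,439,808, with about 38.6 million of those parameters in the token embedding alone.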

What is the license for GPT-2?

GPT-2 is released under a Modified MIT license, which permits use, modification, and distribution with attribution.

What input format does GPT-2 expect?

GPT-2 expects tokenized text using a byte-level BPE tokenizer with a vocabulary size of 50,257. Input sequences can be up to 1024 tokens.
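Since the window is limited to 1024 tokens, a caller typically has to trim long inputs and leave room for the tokens to be generated. The sketch below shows one possible policy, keeping the most recent tokens; the page does not say how over-long inputs are handled, so both the policy and the helper name `fit_to_context` are assumptions for illustration.

```python
CONTEXT_LEN = 1024  # maximum sequence length stated above


def fit_to_context(
    token_ids: list[int],
    max_new_tokens: int = 0,
    context_len: int = CONTEXT_LEN,
) -> list[int]:
    """Drop the oldest tokens so that the prompt plus max_new_tokens
    generated tokens fit inside the context window."""
    if not 0 <= max_new_tokens < context_len:
        raise ValueError("max_new_tokens must be in [0, context_len)")
    budget = context_len - max_new_tokens
    # Slicing from the end keeps the most recent context.
    return token_ids[-budget:]


if __name__ == "__main__":
    prompt = list(range(2000))  # stand-in for real token ids
    trimmed = fit_to_context(prompt, max_new_tokens=24)
    print(len(trimmed), trimmed[0])
```

Keeping the tail rather than the head suits autocompletion, where the text nearest the cursor matters most; other applications may prefer a different trimming rule.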

How can I call GPT-2 via the gigarouter API?

GPT-2 is not yet live on gigarouter. Once it launches, use the gigarouter OpenAI-compatible endpoint with your API key, sending a prompt in the standard chat completions or text completions format.
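A request in the text completions format might look like the sketch below, which only builds the HTTP request and does not send it. The base URL is a hypothetical placeholder because this page does not give one, and the model identifier `openai-community/gpt2` is assumed from the listing above; the `/completions` path and the `model`, `prompt`, and `max_tokens` fields follow the usual OpenAI-style completions format.

```python
import json
import urllib.request

# Hypothetical placeholder: the real base URL is not stated on this page.
BASE_URL = "https://gigarouter.example/v1"


def build_completion_request(
    prompt: str,
    api_key: str,
    model: str = "openai-community/gpt2",  # assumed model identifier
    max_tokens: int = 50,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text completions request."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("Once upon a time", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here because the model is not yet available.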

not yet live

We're benchmarking and onboarding GPT-2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
