Qwythos 9B
empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF
published Jun 2026 · updated Jun 2026
Qwythos 9B is a vlm model that fine-tunes Qwen3.5-9B on 500M tokens of Claude Mythos/Fable traces for enhanced reasoning, with vision input via a frozen CLIP tower, native function calling, and a 1M token context window.
specs
| Task | Vision-Language Reasoning |
| Architecture | Transformer (Qwen3.5-9B base) with YaRN rope-scaling and CLIP-style vision encoder |
| Parameters | 9 billion |
| Context Window | 1,048,576 tokens (1M) |
| License | Apache-2.0 |
about this model
Qwythos-9B-Claude-Mythos-5-1M-GGUF is a vision-language model (VLM) and reasoning model that processes both text and image inputs, post-trained on over 500 million tokens of high-quality Claude Mythos and Claude Fable chain-of-thought reasoning traces generated in-house by Empero AI. It inherits the vision tower of Qwen3.5-9B (frozen during training, so image-grounded behavior matches the base model) and adds a full-parameter reasoning fine-tune.
Under matched lm-eval-harness evaluation, Qwythos-9B outperforms base Qwen3.5-9B by +34 points on MMLU, +30 points on gsm8k-strict, and +19 points on gsm8k-flex. It supports native function calling per the Qwen3.5 chat template specification and ships with a 1,048,576-token (1M) context window via YaRN rope-scaling enabled by default. An optional multi-token prediction (MTP) head variant is available for speculative decoding in compatible runtimes.
For vision tasks, the model requires pairing a text quantization with the included mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf file (CLIP-style vision encoder and projector). Vision capabilities include detailed image description, OCR (printed and handwritten), chart and table reading, UI and document understanding, and basic spatial reasoning — matching the documented behavior of Qwen3.5-9B. The vision path was not fine-tuned during SFT, so image-grounded reasoning has not been independently evaluated for this release.
Recommended sampling parameters are temperature 0.6, top-p 0.95, top-k 20, and repeat penalty 1.05. Greedy decoding or temperatures at or below 0.3 can cause repetition loops during lengthy reasoning generations. The model is uncensored and engages seriously with technically demanding questions across cybersecurity, biology, pharmacology, and clinical medicine.
The following quantization files are available (normal text weights):
| File | Quant | Size | Notes |
|---|---|---|---|
Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf |
Q4_K_M | 5.24 GiB / 5.63 GB | Recommended default |
Qwythos-9B-Claude-Mythos-5-1M-Q5_K_M.gguf |
Q5_K_M | 6.02 GiB / 6.47 GB | Balanced quality and size |
Qwythos-9B-Claude-Mythos-5-1M-Q6_K.gguf |
Q6_K | 6.85 GiB / 7.36 GB | High quality |
Qwythos-9B-Claude-Mythos-5-1M-Q8_0.gguf |
Q8_0 | 8.87 GiB / 9.53 GB | Near-lossless |
Qwythos-9B-Claude-Mythos-5-1M-BF16.gguf |
BF16 | 16.69 GiB / 17.92 GB | Full precision conversion base |

best for
- ·Complex multimodal reasoning with text and images
- ·Tool-use and function calling in agentic workflows
- ·Uncensored technical Q&A in cybersecurity, biology, and medicine
- ·Long-document analysis with 1M token context
FAQ
Complex reasoning with vision inputs, native function calling for tool use, and uncensored technical Q&A across domains like cybersecurity and biology.
It supports up to 1,048,576 tokens (1M) via YaRN rope-scaling.
Yes, it inherits a CLIP-style vision encoder from Qwen3.5-9B for image description, OCR, chart reading, and document understanding.
Temperature 0.6, top_p 0.95, top_k 20, repeat_penalty 1.05. Avoid greedy or very low temperature sampling.
Use the OpenAI-compatible endpoint with your API key, following the standard chat completions format.
We're benchmarking and onboarding Qwythos 9B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.