Qwythos 9B

empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF

published Jun 2026 · updated Jun 2026

Qwythos 9B is a vlm model that fine-tunes Qwen3.5-9B on 500M tokens of Claude Mythos/Fable traces for enhanced reasoning, with vision input via a frozen CLIP tower, native function calling, and a 1M token context window.

status

coming soon

API providers

downloads / mo

1.4M

license

apache-2.0

specs

Task	Vision-Language Reasoning
Architecture	Transformer (Qwen3.5-9B base) with YaRN rope-scaling and CLIP-style vision encoder
Parameters	9 billion
Context Window	1,048,576 tokens (1M)
License	Apache-2.0

about this model

Qwythos-9B-Claude-Mythos-5-1M-GGUF is a vision-language model (VLM) and reasoning model that processes both text and image inputs, post-trained on over 500 million tokens of high-quality Claude Mythos and Claude Fable chain-of-thought reasoning traces generated in-house by Empero AI. It inherits the vision tower of Qwen3.5-9B (frozen during training, so image-grounded behavior matches the base model) and adds a full-parameter reasoning fine-tune.

Under matched lm-eval-harness evaluation, Qwythos-9B outperforms base Qwen3.5-9B by +34 points on MMLU, +30 points on gsm8k-strict, and +19 points on gsm8k-flex. It supports native function calling per the Qwen3.5 chat template specification and ships with a 1,048,576-token (1M) context window via YaRN rope-scaling enabled by default. An optional multi-token prediction (MTP) head variant is available for speculative decoding in compatible runtimes.

For vision tasks, the model requires pairing a text quantization with the included mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf file (CLIP-style vision encoder and projector). Vision capabilities include detailed image description, OCR (printed and handwritten), chart and table reading, UI and document understanding, and basic spatial reasoning — matching the documented behavior of Qwen3.5-9B. The vision path was not fine-tuned during SFT, so image-grounded reasoning has not been independently evaluated for this release.

Recommended sampling parameters are temperature 0.6, top-p 0.95, top-k 20, and repeat penalty 1.05. Greedy decoding or temperatures at or below 0.3 can cause repetition loops during lengthy reasoning generations. The model is uncensored and engages seriously with technically demanding questions across cybersecurity, biology, pharmacology, and clinical medicine.

The following quantization files are available (normal text weights):

File	Quant	Size	Notes
`Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf`	Q4_K_M	5.24 GiB / 5.63 GB	Recommended default
`Qwythos-9B-Claude-Mythos-5-1M-Q5_K_M.gguf`	Q5_K_M	6.02 GiB / 6.47 GB	Balanced quality and size
`Qwythos-9B-Claude-Mythos-5-1M-Q6_K.gguf`	Q6_K	6.85 GiB / 7.36 GB	High quality
`Qwythos-9B-Claude-Mythos-5-1M-Q8_0.gguf`	Q8_0	8.87 GiB / 9.53 GB	Near-lossless
`Qwythos-9B-Claude-Mythos-5-1M-BF16.gguf`	BF16	16.69 GiB / 17.92 GB	Full precision conversion base

Qwythos-9B model logo

best for

·Complex multimodal reasoning with text and images
·Tool-use and function calling in agentic workflows
·Uncensored technical Q&A in cybersecurity, biology, and medicine
·Long-document analysis with 1M token context

FAQ

What is Qwythos 9B best used for?

Complex reasoning with vision inputs, native function calling for tool use, and uncensored technical Q&A across domains like cybersecurity and biology.

What is the model's context window size?

It supports up to 1,048,576 tokens (1M) via YaRN rope-scaling.

Does it support image inputs?

Yes, it inherits a CLIP-style vision encoder from Qwen3.5-9B for image description, OCR, chart reading, and document understanding.

What are the recommended sampling parameters?

Temperature 0.6, top_p 0.95, top_k 20, repeat_penalty 1.05. Avoid greedy or very low temperature sampling.

How do I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, following the standard chat completions format.

not yet live

We're benchmarking and onboarding Qwythos 9B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →

Qwen2.5-VL-7B-Instruct

9.8M dl/mo

Qwen3.6-35B-A3B-FP8

6.2M dl/mo

Qwen2.5-VL-3B-Instruct

5.3M dl/mo

gemma-4-26B-A4B-it-AWQ-4bit