skip to content
gigarouter gigarouter
models / text generation · coming soon

Ornith 1.0 397B

deepreinforce-ai/Ornith-1.0-397B-FP8

published Jun 2026 · updated Jun 2026

Ornith 1.0 397B is a text-generation model designed for agentic coding, achieving state-of-the-art performance on coding benchmarks through a self-improving reinforcement learning framework.

status
coming soon
API providers
0
downloads / mo
65K
license
mit

specs

TaskText Generation (Coding Agent)
ArchitectureMixture of Experts (MoE) post-trained on Gemma 4 and Qwen 3.5
Parameters397B
LicenseMIT

about this model

Ornith-1.0-397B is a text-generation model specialized for agentic coding tasks, including repository-level code generation, software engineering, and tool-use scenarios. Built on Qwen 3.5 and Gemma 4 pretrained bases and released under the MIT license, it achieves state-of-the-art results among open-source models of its size.

Capabilities

The model employs a self-improving reinforcement learning framework that jointly optimizes both the scaffold (the search or tool-use strategy) and the solution rollout. This two-stage training loop allows Ornith to discover better search trajectories and generate higher-quality solutions. A three-layer defense against reward hacking is built in: an immutable trust boundary, a deterministic monitor, and a frozen LLM judge.

Benchmark Results

Ornith-1.0-397B surpasses comparably sized open models and competes with larger proprietary systems across multiple agentic coding benchmarks.

Ornith model architecture diagram
Benchmark Ornith-1.0-397B Qwen3.5-397B Qwen3.7-Max GLM-5.2-744B Minimax-M3-428B DeepSeek-V4-Pro Claude Opus 4.7 Claude Opus 4.8
Terminal-Bench 2.1 (Terminus-2)77.553.573.581.0646470.385
Terminal-Bench 2.1 (Claude Code)78.248.669.882.7-66.569.778.9
SWE-bench Verified82.476.480.4--80.680.887.6
SWE-bench Pro62.251.660.662.15955.464.369.2
SWE-bench Multilingual78.969.378.3--76.2--
NL2Repo48.236.847.248.942.1--69.7
Claw-eval Avg77.170.765.2--75.878.2-
SWE Atlas – QnA41.220.4--37.927.240.348.8
SWE Atlas – RF42.618.4------
Performance comparison chart showing Ornith-1.0-397B against other models

Training Methodology

The self-scaffolding RL approach first proposes a refined scaffold, then generates a solution rollout conditioned on that scaffold. Reward is propagated to both stages, enabling the model to improve its own search strategies. The built-in reward hacking defenses use an immutable outer trust boundary, a deterministic monitor to flag violations, and a frozen LLM judge to detect intent-level gaming within allowed tool surfaces.

best for

FAQ

What is Ornith 1.0 397B best used for?

Agentic coding tasks like SWE-bench, Terminal-Bench, and NL2Repo, where it generates both the scaffolding and solution rollouts via RL.

How many parameters does this model have?

397B parameters, using a Mixture of Experts (MoE) architecture.

What license is the model released under?

MIT license, globally accessible with no regional restrictions.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint at gigarouter with your API key, sending text prompts to the model's deployed endpoint.

What base models was Ornith 1.0 397B post-trained on?

It is post-trained on top of Gemma 4 and Qwen 3.5 pretrained models.

not yet live

We're benchmarking and onboarding Ornith 1.0 397B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

compare all →