Ornith 1.0 397B
deepreinforce-ai/Ornith-1.0-397B-FP8
published Jun 2026 · updated Jun 2026
Ornith 1.0 397B is a text-generation model designed for agentic coding, achieving state-of-the-art performance on coding benchmarks through a self-improving reinforcement learning framework.
specs
| Task | Text Generation (Coding Agent) |
| Architecture | Mixture of Experts (MoE) post-trained on Gemma 4 and Qwen 3.5 |
| Parameters | 397B |
| License | MIT |
about this model
Ornith-1.0-397B is a text-generation model specialized for agentic coding tasks, including repository-level code generation, software engineering, and tool-use scenarios. Built on Qwen 3.5 and Gemma 4 pretrained bases and released under the MIT license, it achieves state-of-the-art results among open-source models of its size.
Capabilities
The model employs a self-improving reinforcement learning framework that jointly optimizes both the scaffold (the search or tool-use strategy) and the solution rollout. This two-stage training loop allows Ornith to discover better search trajectories and generate higher-quality solutions. A three-layer defense against reward hacking is built in: an immutable trust boundary, a deterministic monitor, and a frozen LLM judge.
Benchmark Results
Ornith-1.0-397B surpasses comparably sized open models and competes with larger proprietary systems across multiple agentic coding benchmarks.
| Benchmark | Ornith-1.0-397B | Qwen3.5-397B | Qwen3.7-Max | GLM-5.2-744B | Minimax-M3-428B | DeepSeek-V4-Pro | Claude Opus 4.7 | Claude Opus 4.8 |
|---|---|---|---|---|---|---|---|---|
| Terminal-Bench 2.1 (Terminus-2) | 77.5 | 53.5 | 73.5 | 81.0 | 64 | 64 | 70.3 | 85 |
| Terminal-Bench 2.1 (Claude Code) | 78.2 | 48.6 | 69.8 | 82.7 | - | 66.5 | 69.7 | 78.9 |
| SWE-bench Verified | 82.4 | 76.4 | 80.4 | - | - | 80.6 | 80.8 | 87.6 |
| SWE-bench Pro | 62.2 | 51.6 | 60.6 | 62.1 | 59 | 55.4 | 64.3 | 69.2 |
| SWE-bench Multilingual | 78.9 | 69.3 | 78.3 | - | - | 76.2 | - | - |
| NL2Repo | 48.2 | 36.8 | 47.2 | 48.9 | 42.1 | - | - | 69.7 |
| Claw-eval Avg | 77.1 | 70.7 | 65.2 | - | - | 75.8 | 78.2 | - |
| SWE Atlas – QnA | 41.2 | 20.4 | - | - | 37.9 | 27.2 | 40.3 | 48.8 |
| SWE Atlas – RF | 42.6 | 18.4 | - | - | - | - | - | - |
Training Methodology
The self-scaffolding RL approach first proposes a refined scaffold, then generates a solution rollout conditioned on that scaffold. Reward is propagated to both stages, enabling the model to improve its own search strategies. The built-in reward hacking defenses use an immutable outer trust boundary, a deterministic monitor to flag violations, and a frozen LLM judge to detect intent-level gaming within allowed tool surfaces.
best for
- ·Building autonomous software engineering agents that can resolve real-world GitHub issues
- ·Automated code generation and repository-level programming from natural language descriptions
- ·Advanced terminal-based coding assistants and tool-use AI agents
FAQ
Agentic coding tasks like SWE-bench, Terminal-Bench, and NL2Repo, where it generates both the scaffolding and solution rollouts via RL.
397B parameters, using a Mixture of Experts (MoE) architecture.
MIT license, globally accessible with no regional restrictions.
Use the OpenAI-compatible endpoint at gigarouter with your API key, sending text prompts to the model's deployed endpoint.
It is post-trained on top of Gemma 4 and Qwen 3.5 pretrained models.
We're benchmarking and onboarding Ornith 1.0 397B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.