Ornith 1.0 397B

deepreinforce-ai/Ornith-1.0-397B-FP8

published Jun 2026 · updated Jun 2026

Ornith 1.0 397B is a text-generation model designed for agentic coding, achieving state-of-the-art performance on coding benchmarks through a self-improving reinforcement learning framework.

status

coming soon

API providers

downloads / mo

65K

license

mit

specs

Task	Text Generation (Coding Agent)
Architecture	Mixture of Experts (MoE) post-trained on Gemma 4 and Qwen 3.5
Parameters	397B
License	MIT

about this model

Ornith-1.0-397B is a text-generation model specialized for agentic coding tasks, including repository-level code generation, software engineering, and tool-use scenarios. Built on Qwen 3.5 and Gemma 4 pretrained bases and released under the MIT license, it achieves state-of-the-art results among open-source models of its size.

Capabilities

The model employs a self-improving reinforcement learning framework that jointly optimizes both the scaffold (the search or tool-use strategy) and the solution rollout. This two-stage training loop allows Ornith to discover better search trajectories and generate higher-quality solutions. A three-layer defense against reward hacking is built in: an immutable trust boundary, a deterministic monitor, and a frozen LLM judge.

Benchmark Results

Ornith-1.0-397B surpasses comparably sized open models and competes with larger proprietary systems across multiple agentic coding benchmarks.

Benchmark	Ornith-1.0-397B	Qwen3.5-397B	Qwen3.7-Max	GLM-5.2-744B	Minimax-M3-428B	DeepSeek-V4-Pro	Claude Opus 4.7	Claude Opus 4.8
Terminal-Bench 2.1 (Terminus-2)	77.5	53.5	73.5	81.0	64	64	70.3	85
Terminal-Bench 2.1 (Claude Code)	78.2	48.6	69.8	82.7	-	66.5	69.7	78.9
SWE-bench Verified	82.4	76.4	80.4	-	-	80.6	80.8	87.6
SWE-bench Pro	62.2	51.6	60.6	62.1	59	55.4	64.3	69.2
SWE-bench Multilingual	78.9	69.3	78.3	-	-	76.2	-	-
NL2Repo	48.2	36.8	47.2	48.9	42.1	-	-	69.7
Claw-eval Avg	77.1	70.7	65.2	-	-	75.8	78.2	-
SWE Atlas – QnA	41.2	20.4	-	-	37.9	27.2	40.3	48.8
SWE Atlas – RF	42.6	18.4	-	-	-	-	-	-

Performance comparison chart showing Ornith-1.0-397B against other models

Training Methodology

The self-scaffolding RL approach first proposes a refined scaffold, then generates a solution rollout conditioned on that scaffold. Reward is propagated to both stages, enabling the model to improve its own search strategies. The built-in reward hacking defenses use an immutable outer trust boundary, a deterministic monitor to flag violations, and a frozen LLM judge to detect intent-level gaming within allowed tool surfaces.

best for

·Building autonomous software engineering agents that can resolve real-world GitHub issues
·Automated code generation and repository-level programming from natural language descriptions
·Advanced terminal-based coding assistants and tool-use AI agents

FAQ

What is Ornith 1.0 397B best used for?

Agentic coding tasks like SWE-bench, Terminal-Bench, and NL2Repo, where it generates both the scaffolding and solution rollouts via RL.

How many parameters does this model have?

397B parameters, using a Mixture of Experts (MoE) architecture.

What license is the model released under?

MIT license, globally accessible with no regional restrictions.

How can I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint at gigarouter with your API key, sending text prompts to the model's deployed endpoint.

What base models was Ornith 1.0 397B post-trained on?

It is post-trained on top of Gemma 4 and Qwen 3.5 pretrained models.

not yet live

We're benchmarking and onboarding Ornith 1.0 397B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

tiny-Qwen2ForCausalLM-2.5

9.2M dl/mo

deepseek-v4-gguf

6.4M dl/mo

Qwen3.6-35B-A3B-NVFP4

6.2M dl/mo

gemma-3-270m

5.1M dl/mo