skip to content
gigarouter gigarouter
models / vision-language · coming soon

Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX

DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF

published May 2026 · updated Jun 2026

Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX is a vlm model that expands the Qwen 3.6 27B base to 40B parameters with Deckard/Heretic uncensored finetuning, Claude 4.6 Opus reasoning distillation, and dual-imatrix GGUF quants for high-precision, uncensored text and vision tasks.

status
coming soon
API providers
0
downloads / mo
519.4K
license
apache-2.0

specs

TaskText generation with vision (multimodal), reasoning, coding, creative writing
ArchitectureDense causal language model with vision encoder, 96 layers, 1275 tensors
Parameters40B (expanded from 27B)
LicenseUnspecified (uncensored, no safety alignment)

about this model

DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF is a 40-billion-parameter dense vision-language model (VLM) built from Qwen 3.6 27B, expanded and fine-tuned for uncensored, high-reasoning tasks with variable-length thinking and 256K native context.

Architecture and training

The model expands the original 64-layer, 27B Qwen 3.6 to 96 layers and 1275 tensors (approximately 40B parameters). Training proceeds in stages: first uncensored (Heretic), then on five Deckard internal datasets for character, intelligence, depth, observation, and point of view, followed by expansion to 40B, and finally distillation on a Claude 4.6 Opus high-reasoning dataset to shorten and stabilize reasoning. Vision capabilities are preserved and require an mmproj file for image inputs.

Key strengths

  • Fully uncensored with no safety alignment; no content restrictions.
  • Variable-length reasoning — shorter for simple queries, deeper for complex ones.
  • NEO-CODE-Di-IMatrix-MAX quants engineered for balance and precision, benchmarked against BF16 full precision: IQ2_M at 83-84%, IQ4XS at 94%, Q8_0 HIGH at 98.4%.
  • Outperforms the base Qwen 3.6 27B model in 6 out of 7 benchmarks in instruct mode.

Benchmark results (instruct mode)

BenchmarkThis model (mxfp8)Base Qwen 3.6 27B (mxfp8)
ARC-c0.6510.647
ARC/e0.8160.803
BoolQ0.9080.910
HellaSwag0.773
OBQA0.450
PIQA0.806
WinoGrande0.742

Note: instruct mode yields stronger scores; this model exceeds the base in 6 of 7 benchmarks despite a minor regression on BoolQ.

Model architecture diagram for Qwen3.6-40B Qwen3.6-27B base model banner

best for

FAQ

What is the context length of this model?

It supports 256K tokens natively.

Is this model censored or safety-aligned?

No, safety alignment is removed — it is fully uncensored and unfiltered.

What quant quality should I use for best results?

The card suggests a minimum of Q4_K_S (non-imatrix) or IQ3_S (imatrix) or higher; for toolcalls, Q5/Q6 minimum.

Does this model support image inputs?

Yes, it has a vision encoder and requires an mmproj file placed in the same folder as the GGUF for image processing.

How do I call this model via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, selecting the model name as listed on the platform.

not yet live

We're benchmarking and onboarding Qwen3.6-40B Claude 4.6 Opus Deckard Heretic Uncensored Thinking NEO CODE Di IMatrix MAX as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related vision-language models

compare all →