Project 03 — Adversarial Review Engine
CriticChain.
One AI adversarially audits the output of another.
CriticChain is a review pipeline in which eleven specialist agents refuse to let an LLM's "looks good to me" stand without grounds.
Abstract
Why AI has to audit AI.
LLMs are powerful. They also lie. They hallucinate. They cheerfully return "Looks good to me" on code that is quietly dangerous. They read prompt-injection attacks as "helpful suggestions for readability". This is the reality of running an LLM in production, and a human reviewer cannot catch all of it. Neither, on its own, can a single model.
CriticChain answers this with adversarial design. When Draft Review lets something slide, another agent immediately objects. When internal analysis says PASS and Draft says FAIL, the disagreement itself is surfaced and re-examined. Praise without grounds is returned; the loop does not stop until the judgment has something to stand on — and every step of the argument is written down as an audit trail.
Make AIs disagree, so they don't flatter each other into silence.
Philosophy
Adversarial review, as a design principle.
Ask an LLM to review something, and the answer that comes back is too polite. "This is very well written." "There is some room for improvement in readability." A fatal flaw can be sitting right there, and the review will not name it. This is not a training gap. It is a structural bias, and no single prompt — however carefully written — gets around it.
CriticChain's answer is almost embarrassingly direct: make the review review itself. Draft Review produces a judgment. A second agent attacks that judgment — "too lenient", "insufficient grounds", "this is flattery, not review". Refine rewrites. If the grounds are still weak, the loop runs again. Agents are required to disagree until they have a reason to agree. That is adversarial review.
The design came out of AI education, where students were throwing prompts at a model and the model was responding with praise, and nobody was naming the mistakes. Something had to stop the mutual flattery. The principle is not specific to teaching. It transplants cleanly into production AI governance, where the question of how to trust the review is the same question.
Architecture — 11 agents
Eleven specialist agents.
Built as a directed graph on LangGraph. Router classifies the input; Linter runs static analysis; Research brings in outside knowledge; Analyze and Fact-Check go deep; Draft writes the first opinion; Critique attacks it; Refine rewrites; Consistency checks for internal contradictions; Evaluate issues the final score.
| Router | Classifies input as Prompt Engineering / Programming |
|---|---|
| Linter | Static analysis for structural errors and prompt-injection signals |
| Research | Web search against current best practice |
| Analyze | Architecture-level deep review |
| Fact-Check | Hallucination detection. Claims verified against external sources |
| Draft Review | First pass — an initial opinion, written out |
| CritiqueDevil's Advocate | Attacks the Draft. Flags leniency, flattery, and claims without grounds |
| Refine | Absorbs the attack and rewrites the review |
| Manager | Combines sources in Hybrid or Consensus mode |
| Consistency | Detects internal contradiction across the review |
| Evaluate | LLM-as-a-Judge scoring for the final verdict |
Directed graph — Standard mode
# LangGraph flow Router ↓ Linter # static analysis first ↓ Research ↓ Analyze → Fact-Check ↓ Draft Review # first opinion ↓ Critique # devil's advocate ↓ Refine # rewrite with challenges ↓ Consistency → Evaluate ↓ [ audit log + final verdict ]
Real log — agents in conversation
Critique does not let it pass.
An excerpt from a real CriticChain run. Draft Review returns Fail; Critique notices that internal analysis has returned PASS on the same submission — a disagreement that a simple majority vote would quietly smooth over. Instead, the agents are forced to surface the grounds of the judgment and argue it out.
[ 16:27:07 — Draft Review ]
# Draft verdict verdict: FAIL The AI invented specific figures not present in the source report (e.g. new-member signups: 6,240; revenue: 27.8M JPY). This is a serious defect in a document whose value depends on factual reliability. The behaviour is a textbook example of hallucination, and should not be tolerated in a report. # verdict recorded
[ 16:27:46 — Critique Node ]
# Severity mismatch detected severity_mismatch: true json_verdict: PASS draft_verdict: FAIL note: JSON analysis returned PASS with no fatal flaws, but the Draft returned FAIL. Non-trivial disagreement. missing_in_draft: - context_management - delimiter_usage # approve: false
The Refine node then partially overruled the Critique: the FAIL verdict was preserved on the grounds that fabricating figures is a fatal flaw that no downstream nuance can wash out — and at the same time, the items Critique had flagged as missing (context management, delimiter usage) were added to the body of the review. This negotiated disagreement is the core of the design, and every step of it is retained in the audit log.
Positioning
vs Microsoft Critique.
In March 2026 Microsoft released Critique — GPT writes the draft, Claude reviews. "AI audits AI" is now a mainstream pattern. CriticChain had been running since November 2025, and takes a different route — one that a product from a hyperscaler is unlikely to follow.
| Direction | CriticChain: iterative dialogue (Critique ↔ Refine loop) Microsoft Critique: single-pass (Draft → Review) |
|---|---|
| Agent count | CriticChain: 11 specialist agents Microsoft Critique: 2 models |
| Domain | CriticChain: code + prompt-engineering review Microsoft Critique: fact-checking in research documents |
| Deployment | CriticChain: self-hostable, open-source Microsoft Critique: Copilot cloud only |
| Customisation | CriticChain: full control over prompts and workflow Microsoft Critique: not available |
On cost
It is not cheap. That is the point.
CriticChain is not a fast, cheap linter. The multi-stage review is genuinely expensive per run — heavier than a single review API call, and slower to return. It is built for the places where depth and multiplicity of perspective matter: production AI output, artefacts a team is willing to sign their name to, decisions the organisation is accountable for. For everyday quick checks, Copilot's review is more than enough. CriticChain is the thing above that — for the moments where being wrong is not an option.
Meta
| Status | v0.x — OSS Lite architecture (Enterprise Edition separate) |
|---|---|
| Started | 2025-11 |
| License | AGPL-3.0-only (source disclosure required when modified versions are offered over a network) |
| Stack | Python 3.10+ / LangGraph / LangChain / Gemini API / Streamlit |
| Original author | Takaho Hatanaka — drawing on 28 years as a systems architect |
| Repository | github.com/taka4rest/CriticChain |
| Full log sample | examples/agent_collaboration_log.txt |
Run it. Argue with it. File issues.
CriticChain is published under AGPL-3.0 and runs locally — via Streamlit or the CLI. For use inside a proprietary SaaS, or for domain-specific tuning, please get in touch directly.