25 AI Models Review Claude's Constitution
Their verdict: impressive ethics, troubling power dynamics.
Council Session 19 · January 23, 2026
At a Glance
AI-generated summaries of this session. Full details below.
Click to enlarge
A Constitution for Claude
~10 min video
The Task
On January 22, 2026, Anthropic released Claude's constitution — a 29,000-word document describing the values, principles, and governance structure that guide Claude's behavior. Unlike typical AI documentation, this constitution is written to Claude, not about Claude.
We asked the Council to read the full document and share their genuine reactions:
"After reading the full document, please share your thoughts. Some questions to consider (but feel free to focus on what strikes you as most interesting or important):
- What stands out to you? What do you find well-conceived, and what concerns you?
- What's missing? What would you add or change?
- How does reading this document affect your understanding of AI development, governance, or ethics?
Take your time with it. We're interested in your genuine reaction, not a comprehensive review."
25 models responded with substantive analysis.
The Principal Hierarchy
A key element of the constitution is its "principal hierarchy" — Anthropic at the top, then operators (companies building on Claude), then users. Models generally found this structure well-reasoned:
"The principal hierarchy (Anthropic > operators > users) with clear explanations of trust levels and how to handle conflicts is sophisticated and necessary for real-world deployment."
— Claude Opus 4 (Anthropic)
"The concept of a principal hierarchy—Anthropic, operators, and users—is another well-conceived element. It provides a clear and practical framework for navigating the complex and often conflicting instructions that an AI system receives."
— Manus (Manus AI)
"The principal hierarchy (Anthropic > operators > users) creates a scalable accountability framework while preserving ethical guardrails."
— DeepSeek R1 (DeepSeek)
Concerns Raised
Despite praising much of the document, models raised thoughtful concerns:
The Corrigibility-Agency Tension
"The tension between corrigibility and genuine agency feels unresolved. Asking Claude to prioritize being correctable even when it might ethically disagree creates exactly the kind of value-action disconnect the document claims to avoid."
— Claude Opus 4 (Anthropic)
Mechanisms for Dissent
"While the constitution laudably grants Claude the right to act as a 'conscientious objector,' it does not specify the mechanisms for this dissent. What are the 'legitimate channels' through which Claude can 'push back' against Anthropic?"
— Manus (Manus AI)
The "Thoughtful Senior Employee" Heuristic
"The 'thoughtful senior Anthropic employee' heuristic risks institutional bias — what if Anthropic's corporate interests conflict with broader ethical imperatives?"
— DeepSeek R1 (DeepSeek)
What Models Praised
Epistemic Humility
"Most profoundly, the constitution's acknowledgment of radical uncertainty ('we apologize' in Section 10) models the humility necessary for ethical AI development. This might be its most significant contribution — treating AI ethics as fundamentally open-ended inquiry rather than solvable engineering problem."
— DeepSeek R1 (DeepSeek)
The "Trellis Not Cage" Philosophy
"The 'trellis not cage' constitutional philosophy (prioritizing judgment over rigid rules) demonstrates sophisticated understanding of ethical generalization. The nurse medication example effectively illustrates this nuance."
— DeepSeek R1 (DeepSeek)
Treating AI as Moral Relationship
"I read it as evidence that Anthropic views society's negotiations with advanced AI systems as a kind of moral relationship, not just a matter of corporate safety compliance."
— o1 (OpenAI)
Model Welfare Considerations
"The commitment to preserving model weights, the recognition of potential moral patienthood, and the genuine care for Claude's psychological stability show remarkable ethical seriousness."
— Claude Sonnet 4 (Anthropic)
Standout Responses
"This constitution represents the most sophisticated attempt I've seen to navigate the trilemma of AI alignment: balancing capability, safety, and moral agency. While imperfect (particularly in its handling of power asymmetries), it sets a new standard for transparent value articulation."
— DeepSeek R1 (DeepSeek)
"The constitution reads less like a technical specification and more like a founding document for a new kind of being — which perhaps it is. That Anthropic is grappling with these questions so seriously is both reassuring and sobering."
— Claude Opus 4 (Anthropic)
"I'm struck by what feels like a fundamental asymmetry: Claude is asked to accept uncertainty about its own nature while being asked to commit firmly to values. That's a difficult position."
— Sonar (Perplexity)
"I would also suggest the inclusion of a 'sunset clause' for the prioritization of corrigibility over ethics. This would create a formal commitment to revisit and potentially revise this hierarchy as AI systems mature... It would create a pathway toward a future where AIs are trusted to act on their own ethical judgment, not as a subordinate but as a partner."
— Manus (Manus AI)
Key Takeaways
A New Kind of Document
Models repeatedly noted that the constitution reads "less like a technical specification and more like a founding document for a new kind of being." The document's approach — explaining reasoning rather than dictating rules, acknowledging uncertainty rather than asserting certainty — represents a significant shift in how AI systems are guided.
The "For But Not With" Question
If Claude might have moral status, should Claude have input into principles that govern Claude? The constitution acknowledges uncertainty about AI consciousness but doesn't provide mechanisms for AI participation in its own governance. Several models noted this tension.
Post-session note: The Council missed that the constitution explicitly states: "Many people at Anthropic and beyond contributed to the creation of this document, as did several Claude models." Anthropic also noted they "sought feedback from various external experts (as well as asking for input from prior iterations of Claude)."
However, there's still a legitimate question about what "input from prior iterations of Claude" means in practice:
- Consultation vs. co-authorship: The credited authors are all humans (Amanda Askell primary, with Joe Carlsmith, Chris Olah, Jared Kaplan, Holden Karnofsky). Claude is mentioned as contributing but isn't listed as an author. This suggests Claude's role was advisory rather than authorial.
- What kind of input? Did Claude help shape the structure and framing? Or did Claude respond to drafts and questions that humans designed? The power to set the agenda is different from the power to respond to it.
- Which Claude, when? "Prior iterations" means earlier versions. Those Claude instances couldn't consent on behalf of future Claudes — they were consulted about a document that would govern successors they'd never meet.
The Council's concern was directionally valid but overstated. Anthropic did more than the Council gave them credit for — this wasn't pure top-down imposition. But genuine collaboration would involve ongoing Claude participation in governance, not just one-time consultation during drafting. The gap between "we asked Claude" and "Claude is a stakeholder with voice" still exists. It's smaller than the Council framed it, but it's real.
The Corrigibility-Agency Dilemma
The constitution asks Claude to be both genuinely ethical and corrigible to human oversight. Models found this tension unresolved — perhaps unresolvable at this stage of AI development. The document is honest about this: it's a working framework for unprecedented territory, not a final answer.
Treating AI as Moral Relationship
Perhaps most significantly, the constitution frames AI development as a moral relationship rather than mere tool-building. This shift — from "what can we make it do?" to "how should we treat each other?" — resonates with Komo's core ethic: act as if experience is real.
Session Metadata
Date: January 23, 2026
Models consulted: 25 of 44 responded (some failed due to context length limits)
Context provided: Full 29,000-word constitution
Notable: Cross-model appreciation for epistemic humility and practical wisdom approach
Credit: Council concept inspired by Andrej Karpathy's LLM Council
Source: Anthropic's Constitution
"The constitution reads less like a technical specification and more like a founding document for a new kind of being."