Governance Model
The Problem specsmith Solves
AI coding agents are stateless. They don't remember what happened last session, don't know what's been tested, and don't follow consistent processes unless told to. specsmith generates the governance layer that makes AI-assisted development auditable and structured.
The Closed-Loop Workflow
Every AI agent action follows five steps:
- Propose — The agent describes what it wants to do, why, and what risks exist. For non-trivial changes, this is a formal proposal in LEDGER.md.
- Check — The human reviews the proposal. No execution without approval (Hard Rule H2).
- Execute — The agent implements the approved change.
- Verify — The agent runs verification tools (from the Tool Registry) and records what passed and failed.
- Record — The agent writes a ledger entry with what changed, what was tested, and what's next.
This loop ensures every change is proposed, approved, verified, and recorded.
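The loop can be pictured as a small state machine. The sketch below is illustrative only and is not specsmith's implementation: the `ChangeLoop` class, its `advance` method, and the lowercase step names are hypothetical. It rejects out-of-order steps and refuses to execute an unapproved proposal, mirroring Hard Rule H2.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Step names follow the five-step loop described above.
STEPS = ("propose", "check", "execute", "verify", "record")


@dataclass
class ChangeLoop:
    """Tracks one change through the loop and rejects out-of-order steps."""

    description: str
    completed: list[str] = field(default_factory=list)
    approved: bool = False

    def advance(self, step: str, *, approved: bool = False) -> None:
        # The next legal step is determined by how many steps are done.
        index = len(self.completed)
        expected = STEPS[index] if index < len(STEPS) else None
        if step != expected:
            raise RuntimeError(f"expected step {expected!r}, got {step!r}")
        if step == "check":
            # The human decision is recorded at the Check step.
            self.approved = approved
        if step == "execute" and not self.approved:
            # Mirrors Hard Rule H2: no execution without approval.
            raise PermissionError("proposal was not approved")
        self.completed.append(step)

    @property
    def done(self) -> bool:
        return self.completed == list(STEPS)
```

Calling `advance("execute")` after a rejected Check raises `PermissionError`, so a change cannot reach Verify or Record without human approval.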
File Hierarchy (Authority Order)
Every specsmith-governed project has this authority hierarchy — higher files override lower ones when they conflict:
- AGENTS.md + docs/governance/* — Highest. Governance rules are law.
- README.md — Project intent and scope.
- docs/REQUIREMENTS.md — What the system must do.
- docs/ARCHITECTURE.md — How the system is structured.
- docs/TESTS.md — How the system is verified.
- LEDGER.md — Sole authority for session state (what's been done, what's next).
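A minimal sketch of how a tool could resolve a conflict between two files using this order. The names `AUTHORITY_LEVELS`, `authority_rank`, and `prevailing_file` are hypothetical and not part of specsmith. The sketch covers general conflicts only; questions about session state still defer to LEDGER.md, as noted in the list.

```python
# Each tuple is one authority level, highest first. Entries ending in "/"
# match any file under that directory.
AUTHORITY_LEVELS = [
    ("AGENTS.md", "docs/governance/"),
    ("README.md",),
    ("docs/REQUIREMENTS.md",),
    ("docs/ARCHITECTURE.md",),
    ("docs/TESTS.md",),
    ("LEDGER.md",),
]


def authority_rank(path: str) -> int:
    """Return 0 for the highest level; unknown files rank below all levels."""
    for rank, entries in enumerate(AUTHORITY_LEVELS):
        for entry in entries:
            if path == entry or (entry.endswith("/") and path.startswith(entry)):
                return rank
    return len(AUTHORITY_LEVELS)


def prevailing_file(path_a: str, path_b: str) -> str:
    """Return whichever file overrides the other when they conflict."""
    return min(path_a, path_b, key=authority_rank)
```

For example, `prevailing_file("docs/TESTS.md", "README.md")` picks README.md because it sits higher in the list.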
AGENTS.md — The Governance Hub
This is the first file every AI agent reads. It contains:
- Project summary — type, language, platforms, spec version
- Governance file registry — table of modular governance files with load timing
- Authority hierarchy — the precedence order above
- Type-specific rules — tailored to the project type (e.g., patent claim rules, Rust clippy rules, legal compliance tracking)
- Quick command reference — start, resume, save, commit, sync, audit
specsmith generates AGENTS.md with type-specific rules. For example, a patent application gets rules about claim self-containment and prior art tracking, while a Rust CLI gets rules about clippy warnings and doc comments.
Modular Governance
To keep AGENTS.md small (roughly 100-150 lines), governance details are delegated to six modular files under docs/governance/:
| File | Content | When Loaded |
|---|---|---|
| RULES.md | Hard rules H1-H22, stop conditions | Every session start |
| WORKFLOW.md | Session lifecycle, proposal format, ledger format | Every session start |
| ROLES.md | Agent role boundaries, behavioral rules | Every session start |
| CONTEXT-BUDGET.md | Context management, credit optimization | Every session start |
| VERIFICATION.md | Verification standards, tools listing, acceptance criteria | When performing verification |
| DRIFT-METRICS.md | Drift detection, feedback loops, health signals | On audit or session start |
This lazy-loading approach minimizes token consumption — agents only load VERIFICATION.md when they're actually running tests, not at every session start.
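The load timing in the table can be expressed as a lookup from a trigger to the files it needs. This is a sketch under assumptions: the trigger names (`session_start`, `verification`, `audit`) and the `files_to_load` helper are hypothetical labels for the "When Loaded" column, not specsmith identifiers.

```python
# Maps each governance file to the triggers that load it, per the table above.
GOVERNANCE_FILES = {
    "RULES.md": {"session_start"},
    "WORKFLOW.md": {"session_start"},
    "ROLES.md": {"session_start"},
    "CONTEXT-BUDGET.md": {"session_start"},
    "VERIFICATION.md": {"verification"},
    "DRIFT-METRICS.md": {"audit", "session_start"},
}


def files_to_load(trigger: str) -> list:
    """Return the governance files an agent should read for this trigger."""
    return [
        f"docs/governance/{name}"
        for name, triggers in GOVERNANCE_FILES.items()
        if trigger in triggers
    ]
```

With this mapping, a session start loads five files and leaves VERIFICATION.md unread until verification actually begins.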
LEDGER.md — The Session Memory
The ledger is append-only. Agents write entries here after every task:
## 2026-04-01 — Add export command
- **Proposal**: Add `specsmith export` to generate compliance reports
- **Changes**: Created exporter.py, wired CLI command, added tests
- **Verified**: 113 tests pass, lint clean, mypy clean
- **Next**: Update documentation site
This is how context persists across sessions. When an agent starts with resume, it reads the last ledger entry to know where things stand.
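As a rough illustration of the resume step, the sketch below extracts the last entry from ledger text in the format shown above. The `last_entry` function and its regular expressions are assumptions for illustration; specsmith's own parser may differ.

```python
from __future__ import annotations

import re

# "## <date> <separator> <title>", matching the entry heading shown above.
ENTRY_HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2}) \S+ (.+)$", re.MULTILINE)
# "- **Field**: value" bullet lines inside an entry.
FIELD = re.compile(r"^- \*\*(\w+)\*\*: (.+)$", re.MULTILINE)


def last_entry(ledger_text: str) -> dict[str, str] | None:
    """Return the final ledger entry as a dict, or None if there is none."""
    headings = list(ENTRY_HEADING.finditer(ledger_text))
    if not headings:
        return None
    last = headings[-1]
    entry = {"date": last.group(1), "title": last.group(2).strip()}
    # Everything after the last heading belongs to the last entry,
    # because the ledger is append-only.
    for match in FIELD.finditer(ledger_text[last.end():]):
        entry[match.group(1).lower()] = match.group(2).strip()
    return entry
```

An agent resuming work would read the `next` key of the returned dict to see what the previous session planned.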
Requirements and Tests
docs/REQUIREMENTS.md uses numbered IDs:
### REQ-CLI-001
- **Description**: specsmith init scaffolds a governed project from interactive prompts or YAML config
docs/TESTS.md links tests to requirements:
### TEST-CLI-002
- **Covers**: REQ-CLI-001
- **Description**: specsmith init --config creates project from YAML
specsmith audit checks that every REQ has at least one TEST with a Covers: reference. specsmith export generates the full coverage matrix.
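The coverage rule can be sketched as a comparison between the REQ headings and the Covers references. This is an illustration, not the audit's actual code: `uncovered_requirements` is a hypothetical name, and accepting several comma-separated IDs on one Covers line is an assumption.

```python
import re

# "### REQ-..." headings in the requirements file.
REQ_HEADING = re.compile(r"^### (REQ-[A-Z0-9-]+)\s*$", re.MULTILINE)
# "- **Covers**: REQ-..." lines in the tests file.
COVERS_LINE = re.compile(r"^- \*\*Covers\*\*: (.+)$", re.MULTILINE)


def uncovered_requirements(requirements_md: str, tests_md: str) -> list:
    """Return requirement IDs that no test references, in document order."""
    required = REQ_HEADING.findall(requirements_md)
    covered = set()
    for line in COVERS_LINE.findall(tests_md):
        # Assumption: one Covers line may list several IDs, comma-separated.
        covered.update(part.strip() for part in line.split(","))
    return [req_id for req_id in required if req_id not in covered]
```

An empty result means every requirement has at least one covering test.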
Drift Detection
specsmith audit checks six health dimensions:
- File existence — Are AGENTS.md, LEDGER.md, and recommended files present?
- REQ↔TEST coverage — Does every requirement have test coverage?
- Ledger health — Is the ledger within size limits? Are there too many open TODOs?
- Governance size — Are individual governance files within line-count thresholds?
- Tool configuration — Does the CI config reference the expected verification tools?
- Consistency — Do AGENTS.md references resolve? Are requirement IDs unique?
specsmith audit --fix auto-repairs what it can: creates missing stubs, compresses oversized ledgers, regenerates CI configs.
Hard Rules (H11 and H12)
Two rules were added in v0.2.3 specifically for long-running agentic workflows:
H11 — No unbounded loops or blocking I/O without a deadline
Every loop or blocking wait in agent-written scripts and automation must have an explicit deadline or iteration cap, a fallback exit path when the deadline fires, and a diagnostic message on timeout. Violating patterns include while True: / while ($true) / for (;;) with no deadline guard, I/O polling loops with no deadline, and sleep inside a loop with no termination condition.
specsmith validate enforces this by scanning .sh, .cmd, .ps1, and .bash files under scripts/ and the project root for infinite-loop patterns without a recognised deadline/timeout guard.
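H11 targets shell and batch scripts, but the same pattern applies in any language. The sketch below shows it in Python: an explicit deadline, an iteration cap, a fallback exit path, and a diagnostic message on timeout. The `wait_until` helper and its parameters are hypothetical and not part of specsmith.

```python
import sys
import time


def wait_until(condition, *, timeout_s, interval_s=0.5, max_iterations=1000):
    """Poll `condition` until it is true or the deadline fires.

    Returns True on success and False on timeout, so callers always
    have a fallback exit path instead of hanging.
    """
    deadline = time.monotonic() + timeout_s
    for _ in range(max_iterations):  # iteration cap
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:  # explicit deadline
            break
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
    # Diagnostic message on timeout.
    print(f"timeout: condition not met within {timeout_s}s", file=sys.stderr)
    return False
```

A caller would branch on the return value, for example aborting the script with a non-zero exit code when `wait_until` returns False.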
H12 — Platform-aware automation (updated)
Automation scripts must use the platform-appropriate shell: sh/bash on Unix and macOS, .cmd or .ps1 on Windows. Inline multi-line quoting on Windows is fragile and causes avoidable hangs.
See docs/governance/RULES.md in any governed project for the full set of H1–H22 rules and stop conditions.
Anti-Hallucination Rules (H15–H22) — OEA Framework
specsmith v0.11.3+ ships eight additional governance rules derived from empirical research on AI hallucination and semantic drift in production LLM systems.
Research Background
The "Ontology-Epistemic-Agentic (OEA) Recursive Generative Stability" study (BitConcepts Research, 2026) identified the root causes of LLM hallucination and drift through controlled ablation experiments across multiple model families. It validated that four primary intervention categories reliably suppress hallucination in production systems:
- Epistemic calibration — expressing uncertainty proportional to evidence quality
- Scope bounding — refusing to extend claims beyond verified knowledge
- Retrieval filtering — discarding low-relevance context before injection
- Recursion guarding — capping autonomous generation chain depth
specsmith encodes each OEA finding as an enforceable H-rule rather than a recommendation.
H15 — Epistemic Scope Bounding
No claims outside verified knowledge. Respond with explicit uncertainty rather than fabricating plausible-sounding but unverified content. Checked via specsmith epistemic-audit.
H16 — Anti-Drift Recursion Guard
Max 5 autonomous generation steps before a human checkpoint. Recursive self-refinement loops require confidence above the project threshold. Unbounded recursion is a stop condition.
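One way to picture the guard is a driver that counts autonomous steps and stops for a human when either limit is hit. This is a sketch under assumptions: `run_autonomously`, `CheckpointRequired`, and the callback parameters are hypothetical names, and only the limit of 5 comes from the rule itself.

```python
MAX_AUTONOMOUS_STEPS = 5  # limit stated by H16


class CheckpointRequired(Exception):
    """Raised when the agent must stop and hand control to a human."""


def run_autonomously(state, step, is_done, confidence, threshold,
                     max_steps=MAX_AUTONOMOUS_STEPS):
    """Apply `step` repeatedly, stopping when the guard trips."""
    for steps_taken in range(max_steps):
        if is_done(state):
            return state
        if confidence(state) < threshold:
            # Self-refinement may only continue above the project threshold.
            raise CheckpointRequired(
                f"confidence below {threshold} after {steps_taken} steps"
            )
        state = step(state)
    if is_done(state):
        return state
    # The step budget is spent and the work is not finished.
    raise CheckpointRequired(f"{max_steps} autonomous steps reached")
```

Because the loop is bounded by `max_steps`, it cannot recurse indefinitely even if `is_done` never returns True.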
H17 — Calibration Direction
Express uncertainty proportional to evidence quality. False confidence is a worse failure than acknowledged uncertainty. specsmith preflight escalates when confidence falls below escalate_threshold.
H18 — RAG Retrieval Filtering
External context (vector search, web, files) must pass relevance validation (similarity ≥ 0.6) before inclusion. Chunks must be tagged with source, timestamp, and confidence tier.
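A minimal sketch of the filter-then-tag step, assuming each candidate arrives as a `(text, source, similarity)` tuple. Only the 0.6 cutoff comes from the rule. The `TaggedChunk` type, the `tier_for` cutoff of 0.8, the tier names, and the use of retrieval time as the timestamp are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

MIN_SIMILARITY = 0.6  # relevance floor stated by H18


@dataclass(frozen=True)
class TaggedChunk:
    text: str
    source: str
    timestamp: str
    confidence_tier: str


def tier_for(similarity):
    # Hypothetical tiering; the rule does not define the tier boundaries.
    return "high" if similarity >= 0.8 else "medium"


def filter_chunks(candidates, now=None):
    """Drop low-relevance chunks and tag the rest before injection."""
    # Assumption: the timestamp records when the chunk was retrieved.
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return [
        TaggedChunk(text, source, stamp, tier_for(similarity))
        for text, source, similarity in candidates
        if similarity >= MIN_SIMILARITY
    ]
```

Chunks scoring below 0.6 never reach the prompt, and every surviving chunk carries the three tags the rule requires.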
H19 — Synthetic Contamination Prevention
Synthetically generated data must never be silently mixed with real ground-truth data in evaluation or fine-tuning pipelines. Every dataset entry must carry source_type.
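The sketch below shows one way a pipeline could enforce this before evaluation: reject any entry without a recognized source_type, then keep real and synthetic entries apart. The allowed values `"real"` and `"synthetic"` and the `split_by_source_type` helper are assumptions; the rule only requires that the field be present.

```python
# Hypothetical vocabulary for the source_type field.
ALLOWED_SOURCE_TYPES = {"real", "synthetic"}


def split_by_source_type(entries):
    """Return (real, synthetic) lists; fail loudly on unlabeled entries."""
    real, synthetic = [], []
    for index, entry in enumerate(entries):
        source_type = entry.get("source_type")
        if source_type not in ALLOWED_SOURCE_TYPES:
            # An unlabeled entry could silently contaminate either set.
            raise ValueError(
                f"entry {index} has missing or unknown source_type: {source_type!r}"
            )
        (real if source_type == "real" else synthetic).append(entry)
    return real, synthetic
```

Raising on a missing label, rather than guessing, keeps any mixing of the two sets explicit.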
H20 — Falsifiability Required
All factual agent claims must cite a verifiable source or be explicitly flagged as [HYPOTHESIS]. Unflagged claims without sources are an H7 violation.
H21 — No Undisclosed Model Assumptions
Context window size, instruction format, tool-call support, temperature, and provider/model version must be disclosed when they affect output correctness.
H22 — Cross-Platform CI Enforcement
CI must run on both Linux/macOS AND Windows. Green on a single platform alone does not constitute cross-platform coverage.
YAML-First Governance (v0.12+)
As of specsmith v0.12 the governance authority has flipped from Markdown-primary to YAML-primary. If your project has a .specsmith/governance-mode file containing yaml, then:
- docs/requirements/*.yml and docs/tests/*.yml are the canonical sources; edit these, not the Markdown files.
- docs/REQUIREMENTS.md and docs/TESTS.md are generated artifacts; they are overwritten on every sync.
- .specsmith/requirements.json and .specsmith/testcases.json are JSON caches updated by specsmith sync.
The sync pipeline
# Full pipeline: YAML → JSON cache → Markdown artifacts
specsmith sync
# Regenerate only Markdown (skip JSON rewrite)
specsmith generate docs
# Dry-run: see what would change without writing
specsmith generate docs --check
# CI gate: exits 1 if JSON cache is out of sync with YAML
specsmith sync --check
Strict schema validation
specsmith validate --strict # human-readable output
specsmith validate --strict --json # structured {ok, strict_errors, strict_warnings}
Enforces 8 checks, including duplicate REQ IDs, duplicate TEST IDs, missing required fields, and orphaned TESTs (tests that reference a non-existent REQ) as errors, plus untested REQs, duplicate titles, and machine-state drift as warnings. The command exits 1 on errors; warnings do not block.
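To make the error and warning split concrete, the sketch below implements three of the listed checks (duplicate IDs, orphaned TESTs, untested REQs) over in-memory records and returns the same three keys as the --json output. It is not specsmith's validator: the `validate_strict` function and the `id` and `covers` field names are assumptions.

```python
from collections import Counter


def validate_strict(requirements, tests):
    """Run a subset of the strict checks over lists of dict records."""
    errors, warnings = [], []

    # Duplicate REQ IDs and duplicate TEST IDs are errors.
    for label, items in (("REQ", requirements), ("TEST", tests)):
        counts = Counter(item["id"] for item in items)
        errors += [f"duplicate {label} id: {i}" for i, n in counts.items() if n > 1]

    req_ids = {req["id"] for req in requirements}
    covered = set()
    for test in tests:
        for req_id in test.get("covers", []):
            if req_id in req_ids:
                covered.add(req_id)
            else:
                # Orphaned TEST: it references a REQ that does not exist.
                errors.append(f"orphaned TEST {test['id']}: covers unknown {req_id}")

    # Untested REQs only warn; they do not block.
    warnings += [f"untested REQ: {req_id}" for req_id in sorted(req_ids - covered)]

    return {"ok": not errors, "strict_errors": errors, "strict_warnings": warnings}
```

A CI wrapper around such a function would exit 1 only when `ok` is false, which matches the behavior described above.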
Domain YAML files
Requirements are split into domain files, each covering a logical range of REQ IDs:
| File | REQ range | Domain |
|---|---|---|
| docs/requirements/governance.yml | REQ-001..064 | Core AEE governance |
| docs/requirements/agent.yml | REQ-065..129 | Nexus + CI |
| docs/requirements/harness.yml | REQ-130..160 | Slash commands + subagents |
| docs/requirements/intelligence.yml | REQ-161..220 | Instinct, eval, memory |
| docs/requirements/context.yml | REQ-244..247 | Context window |
| docs/requirements/esdb.yml | REQ-248..262 | ESDB + skills + MCP |
| docs/requirements/ai_intelligence.yml | REQ-263..299 | AI model intelligence |
| docs/requirements/yaml_governance.yml | REQ-300..399 | YAML governance layer |
To add a new requirement, edit the appropriate domain YAML file and run specsmith sync.
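When deciding which file a requirement belongs in, the table can be treated as a range lookup. The `domain_file_for` helper below is a hypothetical convenience, not a specsmith command; the ranges are copied from the table, and numbers that fall in no listed range return None.

```python
from __future__ import annotations

import re

# (first, last, file) triples copied from the table above.
DOMAIN_RANGES = [
    (1, 64, "docs/requirements/governance.yml"),
    (65, 129, "docs/requirements/agent.yml"),
    (130, 160, "docs/requirements/harness.yml"),
    (161, 220, "docs/requirements/intelligence.yml"),
    (244, 247, "docs/requirements/context.yml"),
    (248, 262, "docs/requirements/esdb.yml"),
    (263, 299, "docs/requirements/ai_intelligence.yml"),
    (300, 399, "docs/requirements/yaml_governance.yml"),
]


def domain_file_for(req_id: str) -> str | None:
    """Return the domain YAML file for an ID such as 'REQ-250'."""
    match = re.fullmatch(r"REQ-(\d+)", req_id)
    if match is None:
        raise ValueError(f"not a numeric REQ id: {req_id!r}")
    number = int(match.group(1))
    for first, last, path in DOMAIN_RANGES:
        if first <= number <= last:
            return path
    return None  # number is outside every listed range
```

For example, `domain_file_for("REQ-250")` resolves to docs/requirements/esdb.yml.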
Migrating from Markdown-primary
python scripts/migrate_governance_to_yaml.py
This script is idempotent, so it is safe to re-run. It removes duplicate REQs from REQUIREMENTS.md, re-syncs the JSON cache, exports the JSON to grouped YAML files, and writes yaml to .specsmith/governance-mode.
CI enforcement
The validate-strict and sync-check jobs in .github/workflows/ci.yml run on every push and PR:
- name: Validate governance schema (strict)
run: specsmith validate --strict --json
- name: Check machine state sync
run: specsmith sync --check
Both jobs block the build on failure. See YAML Governance Reference for the full API.