On Trust and AI — Applied

Making Verification Tangible

Verification is the scarce resource. So I built a tool for it.

The argument I kept making

Across the last several posts, I’ve been circling the same idea from different angles. In The Verification Gap, I described how a project manager reviews AI-generated meeting notes, catches the hallucinated deadline, corrects the misattributed action item — and nobody ever knows it happened. In When You Stop Owning Your Words, I pushed the same logic into relationships and then into enterprise accountability: the moment you remove the human from the judgment seat, the system becomes hollow, even when the outputs are correct. And in When Coding Becomes Cheap, I argued that when software production costs collapse, the thing that stays expensive is the work of proving the output can be trusted.

Every one of those posts ends in roughly the same place: generation is abundant, belief is scarce, and the tools that matter are the ones that make verification visible.

At a certain point, you either build the thing or you stop talking about it.

What I actually built

The Governance Crosswalks are a working example of the pattern I’ve been describing. The concept is straightforward: take two documents, have an AI read both of them, and produce a structured argument about how they relate — mapping sections of one to sections of the other, classifying the type of relationship, and citing the exact lines in each source that support the claim.
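To make that concrete, here is a minimal sketch of what a single mapping record might look like. All field names and relationship labels are illustrative assumptions, not the tool's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Evidence:
    """One line-level citation supporting a mapping (illustrative schema)."""
    document: str           # which source document the lines come from
    lines: Tuple[int, int]  # inclusive (start, end) line range cited
    excerpt: str            # the quoted passage itself

@dataclass
class Mapping:
    """One AI-proposed relationship between sections of two documents."""
    source_section: str     # e.g. "Chapter 3"
    target_section: str     # e.g. "EU AI Act, Article 9"
    relationship: str       # e.g. "aligns", "extends", "conflicts"
    argument: str           # the AI's structured case for the link
    evidence: List[Evidence] = field(default_factory=list)

# A mapping is only as strong as the citations behind it, which is
# exactly what the review step described later will check.
m = Mapping("Chapter 3", "EU AI Act, Article 9", "aligns",
            "Both require a documented risk management process.",
            [Evidence("book", (120, 134), "quoted passage here")])
```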

That’s the generation step. An AI is good at this. It can read two long documents, hold both in context, and produce hundreds of plausible connections with cited evidence, faster than any person could. For the initial deployment, I pointed it at my book, On Trust and AI, and two major governance frameworks: the NIST AI Risk Management Framework and the EU AI Act. The AI read all three, argued for every connection it could find, and backed each argument with line-level citations from both documents.

The result in the software is a chord diagram that visualizes the relationships, a set of treemaps that break down the types and directions of the connections, and an analysis panel where you can drill into any specific mapping and read the AI’s argument alongside the cited evidence.

And none of that is the interesting part.

The new constraint on production: human judgment

The interesting part is what happens next: the Reviewer.

Every mapping the AI produced also flows into a review interface where a human can read the argument, follow the citations back to the source text, and record a judgment. Accept or reject. Per mapping, per evidence point. Your name goes on it. Your decision is visible to the next person who opens the same mapping.

This is verification made tangible. It isn’t a vague “human in the loop” checkbox. It’s a concrete workflow: the AI argued that Chapter 3 of the book aligns with Article 9 of the EU AI Act on the topic of risk management. It cited four specific passages. You can read each one, check whether the AI’s characterization holds up, and record whether you agree. If you don’t, that disagreement is visible. If you do, your confirmation becomes part of the trust signal attached to that mapping.

The crosswalk page itself then communicates the difference. Mappings carry verification badges: grey for unreviewed, green for human-verified. The more reviewers who confirm a mapping, the higher the confidence. You can see at a glance which parts of the AI’s analysis have been checked and which are still running on blind faith.
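As a sketch of how a badge and confidence signal might be derived from recorded verdicts (the thresholds and names are my assumptions, not the tool's actual logic):

```python
def badge(verdicts):
    """Map a list of 'accept'/'reject' verdicts to (colour, confidence).

    Illustrative logic only: grey means unreviewed or unconfirmed,
    green means at least one human has verified the mapping, and
    confidence is the fraction of reviewers who accepted it.
    """
    if not verdicts:
        return ("grey", 0.0)            # unreviewed: still blind faith
    accepts = verdicts.count("accept")
    colour = "green" if accepts > 0 else "grey"
    return (colour, accepts / len(verdicts))
```

Under this sketch, a mapping confirmed by two of three reviewers carries a green badge with roughly 0.67 confidence, while a mapping that has only been rejected stays grey, matching the idea that only human confirmation raises the trust signal.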

This is the pattern I kept describing in the abstract. Now it exists as a tool you can use.

Why I think this is necessary

There is a growing category of work where AI is genuinely good at the generative step but where the output carries real consequences if it’s wrong. Governance mapping is one example: if you’re a compliance team trying to understand how your internal policies align with a new regulation, an incorrect mapping isn’t just an inconvenience. It’s a gap in your compliance posture that you won’t discover until an auditor does.

The usual approach is one of two extremes. Either you trust the AI output and ship it, which is fast but fragile. Or you throw the AI away entirely and do the comparison by hand, which is thorough but brutally slow. Neither approach captures the value of the middle path: let the AI do the heavy lifting of generation, then give humans an efficient, structured way to verify the result before it drives a business decision.

That middle path is what the crosswalk tool implements: a practical, usable example. The AI does the work it’s good at — reading, comparing, arguing, citing. The human does the work that only humans can do — judging whether the argument holds, deciding whether the evidence is sufficient, and putting their name on the result.

Generation is the AI’s job. Verification is yours. We need new tools and interfaces built specifically to make that human bottleneck efficient and your judgment visible.

A generic pattern, not just for governance

I built this for governance because that’s a domain I write about, and a few people were directly interested in crosswalks to some of the emerging standards. But the pattern this tool supports is domain-agnostic. Any time you have two documents and need a structured, verifiable comparison between them, this tool and process can be applied.

Think about where this shows up in practice; these are all places I would use this type of tool:

Meeting transcript against prior commitments. Your team had a planning session last quarter where specific deliverables were committed. This quarter’s project review just happened. An AI could read both transcripts, map every commitment to its corresponding discussion point in the review, and flag which ones were addressed, which were quietly dropped, and which were reinterpreted. A project lead reviews the mappings and confirms or corrects them. Now the team has a verified record of follow-through, not just two transcripts sitting in separate folders.

Sales proposal against an RFP specification. A prospect issues an RFP with 200 requirements. Your sales team produces a proposal that claims to address each one. An AI reads both, maps every proposal claim to its corresponding RFP requirement, and classifies the relationship: direct match, partial coverage, aspirational claim, or gap. A solutions architect reviews the mappings before the proposal goes out the door. The customer receives a proposal where the coverage claims have actually been verified against the spec, not just asserted.

Internal policy against regulatory text. A new regulation lands. Your legal team needs to understand which existing policies already cover the new requirements and where there are gaps. An AI reads the regulation alongside your policy library and produces a structured gap analysis with citations. A compliance officer reviews each mapping. The board receives a gap assessment that carries human verification, not just an AI-generated report.

Contract terms against statement of work. A vendor delivers a contract with terms that should reflect what was agreed in the SOW. An AI maps every contractual clause to its corresponding SOW provision and flags discrepancies. A procurement lead reviews the mappings before signing. Misalignments surface before they become disputes.

Technical documentation against implementation. An architecture document describes how a system should behave. The actual codebase or configuration has drifted. An AI reads both and maps the documented behavior to what’s actually implemented, flagging divergences. An engineer reviews the mappings. Now you have a verified assessment of documentation debt, not just a hunch that the docs are stale.

The pattern is always the same: two bodies of text, a structured comparison, cited evidence, and a human verification layer that turns an AI generation into something you can actually stand behind.
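Stripped to its skeleton, that pattern could be wired together like this. The generator and reviewer here are stand-ins for an AI call and a human review interface; nothing in this sketch is the tool's real API.

```python
def crosswalk(doc_a, doc_b, generate_mappings, review):
    """Generic two-document crosswalk: the AI generates, a human verifies.

    generate_mappings(doc_a, doc_b) -> list of mapping dicts (the AI step)
    review(mapping) -> "accept" or "reject"  (the human judgment step)
    Both callables are illustrative stand-ins.
    """
    mappings = generate_mappings(doc_a, doc_b)
    for m in mappings:
        m["verdict"] = review(m)             # a named human decision
        m["verified"] = m["verdict"] == "accept"
    return mappings
```

The point of the shape is that the human never sees the raw documents side by side; they only see the AI's proposed mappings, each of which they can confirm or reject.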

The design principle

Every design choice in the crosswalk tool follows a single rule: AI does what AI is good at, and humans get tools that amplify and accelerate the judgment the new workflow needs from them.

AI is good at reading large volumes of text, holding multiple documents in context, identifying connections, and generating structured arguments with citations. It can do this at a speed and scale that no human team can match. The generation step is genuinely better when an AI does it.

Humans are good at judgment and alignment. They’re good at reading an argument and deciding whether it’s convincing to other humans. They’re good at noticing when a citation technically matches but misses the point. They’re good at catching the subtle cases where the AI’s logic is plausible but wrong, or plausible but missing context — the kind of error that passes every automated check but fails the smell test of someone who actually understands the domain.

The tool doesn’t ask the human to do the AI’s job. It doesn’t present two 300-page documents and say “compare these.” It presents the AI’s work and says “here’s what the model found and why it is arguing for each point — does this hold up?” That’s a fundamentally different task. It’s faster, more focused, and it produces a trust-bearing artifact: a verified mapping that carries a human’s professional judgment and reputation, not just a model’s statistical confidence.

The goal is not to remove humans from the loop. The goal is to make the loop worth their time.

Try it

The crosswalk viewer is live. You can explore the existing governance crosswalks, see how the AI mapped my book to NIST and the EU AI Act, and drill into any connection to read the argument and the evidence.

If you want to go further, the Reviewer lets you verify the mappings yourself. Accept or reject. Add your judgment to the community’s. See where the AI got it right on the first pass and where it didn’t.

And if the pattern is useful to you beyond governance, the tool supports it. You can submit your own crosswalk comparing any two documents. Upload the sources, let the AI generate the mappings, and use the same review workflow to verify the results. The intake process is open, and so is the Agent Kit for anyone who wants to build crosswalks programmatically.

I hope people find this practical. I built it because I got tired of arguing that verification needs better tooling without actually providing any. This is my attempt: a concrete tool, a reusable pattern, and an open invitation to put it to work on whatever comparison matters to you.

If you have thoughts, suggestions, or a use case you’d like to see supported, create an account and send me a comment. I’m happy to extend or adapt the tool for anyone who wants to use it seriously. The whole point is that this pattern should be widespread, not locked up in one application.
