The UK Government Built an AI Tool for Gateway Reviews


The UK Government's Incubator for AI (i.AI) built Scout — an open-source AI tool that analyses major project documents against NISTA assurance review handbooks. It processes document bundles of up to 150 items in under five minutes. It achieved over 90% accuracy in mimicking human judgement. And it's currently in beta testing with four government departments.

This isn't theoretical. The government spent real money building AI for programme assurance. That is a clear signal of where this market is heading.

What Scout does

Scout was developed by i.AI, the Cabinet Office's AI incubator, to support the National Infrastructure and Service Transformation Authority (NISTA) in conducting assurance reviews of major government projects. At a glance:

Ingests project document bundles — up to 150 documents per review
Evaluates documents against IPA/NISTA assurance criteria using large language models
Grounds its assessments in NISTA's Assurance Review handbooks
Links findings to specific document pages
Runs on secure Cabinet Office cloud infrastructure
Released as open source under the MIT license on GitHub
Trialled on 11 major government projects

The architecture is straightforward: documents go in, the LLM evaluates them against structured criteria from the assurance handbooks, and findings come out with page-level references. It's designed to augment the review team's preparation, not replace the review itself.
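The flow described above (documents in, criteria applied, findings out with page references) can be sketched in a few lines of Python. This is a minimal illustration, not Scout's actual code: the names `Criterion`, `Page`, `Finding`, `assess` and `keyword_evaluator` are all hypothetical, the sample criteria and file name are made up, and a simple keyword match stands in for the LLM call so that the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Criterion:
    """One structured check taken from an assurance handbook (hypothetical shape)."""
    criterion_id: str
    question: str
    keywords: Tuple[str, ...]


@dataclass(frozen=True)
class Page:
    """A single page of a document in the review bundle."""
    document: str
    page_number: int
    text: str


@dataclass(frozen=True)
class Finding:
    """Result for one criterion, with page-level evidence references."""
    criterion_id: str
    met: bool
    evidence: Tuple[Tuple[str, int], ...]  # (document, page_number) pairs


Evaluator = Callable[[Criterion, Page], bool]


def keyword_evaluator(criterion: Criterion, page: Page) -> bool:
    """Stand-in for the LLM call: true if any keyword appears on the page."""
    text = page.text.lower()
    return any(keyword.lower() in text for keyword in criterion.keywords)


def assess(
    criteria: List[Criterion],
    pages: List[Page],
    evaluate: Evaluator = keyword_evaluator,
) -> List[Finding]:
    """Check every criterion against every page and collect the evidence."""
    findings = []
    for criterion in criteria:
        evidence = tuple(
            (page.document, page.page_number)
            for page in pages
            if evaluate(criterion, page)
        )
        findings.append(Finding(criterion.criterion_id, bool(evidence), evidence))
    return findings


# Illustrative data only; not taken from any real review.
criteria = [
    Criterion("C1", "Is there a benefits realisation plan?", ("benefits realisation",)),
    Criterion("C2", "Is there a risk register?", ("risk register",)),
]
pages = [
    Page("business_case.pdf", 12, "The benefits realisation plan is set out in Annex B."),
    Page("business_case.pdf", 13, "Costs are summarised below."),
]

findings = assess(criteria, pages)
for finding in findings:
    status = "met" if finding.met else "no evidence found"
    print(finding.criterion_id, status, list(finding.evidence))
```

Passing the evaluator in as a function keeps the pipeline independent of the model behind it: in a real system the keyword check would be replaced by a model call that receives the criterion and the page text, while the page-reference bookkeeping stays the same.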

The numbers that matter

NISTA oversees projects representing £805 billion in whole-life government spending. 89% of major projects were rated amber or red for on-time delivery in 2023. Manual assurance reviews involve teams spending days analysing document bundles of up to 150 items.

Scout halves preparation time for review teams. That's not a projected benefit — it's what the trials demonstrated. The time saved is real, and it's being measured in days per review, not minutes.

"Valuable institutional knowledge often remains confined to a small group of experts, creating risks of inconsistency and oversight." — i.AI

That quote from i.AI captures the problem that Scout was built to solve. Assurance expertise is concentrated in a small pool of experienced reviewers. When those people aren't available, or when their knowledge isn't codified, review quality varies. Scout codifies the NISTA handbooks into a machine-readable assessment framework, making that institutional knowledge available at scale.

What this validates

Forget the technology for a moment. Focus on what the government's decision to build Scout tells you about the market.

The problem is real. The Cabinet Office didn't build Scout because AI is fashionable. They built it because major project assurance at scale is unsustainable with current methods. The volume of projects, the size of document bundles, and the shortage of experienced reviewers created a problem that needed a different approach.

The approach works. Scout achieved over 90% accuracy in mimicking human judgement across 11 major projects. That's not a lab result; it's a field trial on real government programmes, with real document bundles reviewed by real assurance teams who compared Scout's output to their own findings.

The technology is viable. LLM-powered document assessment against structured criteria produces useful, actionable results. The question of "can AI review programme documents against assurance frameworks?" has been answered. It can.

The market need exists. Four government departments are actively testing Scout, and a full departmental rollout is planned for 2026. The demand for AI-assisted assurance isn't speculative: departments are already using the tool.
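For readers wondering what an accuracy figure like "over 90% in mimicking human judgement" could mean in practice, one plausible reading is a simple agreement rate: the share of assessments where the tool and the human reviewer reached the same rating. The published material summarised here does not say exactly how the figure was calculated, so the sketch below is an assumption, and the ratings in it are invented purely to show the arithmetic. The function name `agreement_rate` is hypothetical.

```python
from typing import Sequence


def agreement_rate(tool: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of items where the tool's rating equals the reviewer's rating."""
    if len(tool) != len(human):
        raise ValueError("Both rating lists must cover the same items")
    if not tool:
        raise ValueError("At least one rated item is required")
    matches = sum(t == h for t, h in zip(tool, human))
    return matches / len(tool)


# Invented ratings for ten criteria; not real trial data.
tool_ratings = ["green", "amber", "red", "amber", "green",
                "amber", "red", "green", "amber", "green"]
human_ratings = ["green", "amber", "red", "amber", "green",
                 "red", "red", "green", "amber", "green"]

rate = agreement_rate(tool_ratings, human_ratings)
print(f"Agreement: {rate:.0%}")
```

In this made-up sample the two lists differ on one item out of ten. A plain agreement rate treats every disagreement equally, so it does not show whether the tool tends to be more lenient or more severe than the reviewer; a per-rating breakdown would be needed for that.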

What Scout doesn't do

Scout was built for a specific user, the NISTA assurance reviewer, and a specific context: assurance reviews of major government projects, run on Cabinet Office infrastructure. That scope leaves significant gaps.

Government-only access. The deployed service runs on Cabinet Office cloud infrastructure. The code is open source, but consultancies, programme teams, and directors working on government projects can't use that deployment; they would have to host and maintain their own copy. The tool helps the reviewer, not the reviewed.

Single framework. Scout assesses against IPA/NISTA criteria. It doesn't cover Green Book five-case assessment, CDM compliance, NEC contract requirements, nuclear licence conditions, or any custom framework. Government programmes are assessed against multiple frameworks simultaneously — Scout handles one.

Built for reviewers, not programme teams. Scout is designed for the people conducting the review, not the people preparing for it. Programme directors who want to self-assess before a gateway review — running their own documents through the same kind of assessment the reviewer will use — can't use Scout for that.

No configurable criteria engine. You can't upload your own framework, define your own criteria, or create custom assessment modules. Scout does one thing well. It doesn't adapt to other use cases.

Prototype status. Scout is a beta tool, not a commercial product. It comes with no commercial product roadmap, no SLA, and no support structure. It's a proof of concept that demonstrated what's possible, not a platform that organisations outside government can depend on for ongoing operations.

What this means for the market

Scout meets the need for one group: government reviewers. But the wider demand, from the organisations and teams who need AI-assisted assurance, is much larger than NISTA's review teams.

The preparation market is wide open. Scout helps the reviewers. Nobody helps the programme teams and consultancies who prepare for review. A programme director about to face a Gate 3 review wants to know what the reviewer will find — before the reviewer finds it. That's self-assessment, not review, and Scout doesn't serve it.

Consultancies need this capability. Turner & Townsend, Atkins, Faithful+Gould, Mott MacDonald: every consultancy doing programme assurance work needs the ability to assess documents against frameworks at scale. They can't use the government's Scout deployment. They need a commercial alternative that works with their frameworks, their clients, and their data handling requirements.

The government is creating demand it can't fill. By rolling Scout out across departments, i.AI is showing civil servants that AI assessment works. Programme teams will start expecting this capability from their consultancies and advisors. The government is training the market to want something that only commercial tools can provide outside Whitehall.

Self-assessment before review is the real use case. The highest-value application isn't the review itself; it's preparation. Programme directors can run their documents through the same kind of assessment, against the same frameworks, and fix the gaps before the reviewer arrives. That changes a gateway review from an adversarial audit into a confirmation exercise.

The model is proven. The market is open.

Scout proves that AI can assess programme documents against assurance frameworks with meaningful accuracy. The government answered that question. It invested real resources, tested on real projects, and published the results.

The question that remains isn't whether this approach works. It's who provides this capability to the consultancies, programme teams, and directors who need it most — and can't access the government's internal tools.

For the wider market, Scout is validation, not competition. It proves the model works. It doesn't serve the people who need it most.

See how Programme Insights compares

Multiple frameworks. Configurable criteria. Built for programme teams and consultancies, not just government reviewers.

Explore the frameworks
