The UK Government's Incubator for AI (i.AI) has published a library of 47 AI prompts for civil servants, including several specifically for programme assurance and project delivery. This is significant. The government itself is telling programme teams to use AI for document review.
For anyone working in programme assurance, this isn't a novelty. It's a signal. When the Cabinet Office publishes structured prompts that reference the Green Book, the IPA's gateway review criteria, and the NAO's assurance frameworks, they're validating that AI has a role in how major projects are assessed.
The question is whether prompts alone are enough to fulfil that role.
What the Knowledge Hub offers
The AI Knowledge Hub includes six project delivery prompts covering a meaningful range of assurance activities:
- Document assurance review against government standards
- Business case deliverability assessment
- Portfolio prioritisation frameworks
- Lifecycle mapping for major programmes
- Risk register analysis and gap identification
- Governance structure evaluation
These prompts reference nine government guidance sources: the Green Book, Magenta Book, Orange Book, Aqua Book, Rose Book, Teal Book, IPA guidance, GovS 002, and NAO frameworks. The depth of referencing is notable. These aren't throwaway prompts. Someone who understands programme assurance wrote them.
The assurance review prompt, for example, sets the AI up as "a senior-level programme assurance expert" and runs a three-step process: context gathering, structured analysis, and recommendations. It's designed for "any AI assistant (Microsoft Copilot or Gemini)" and explicitly targets civil servants preparing for or conducting reviews.
Why this matters
Three things are happening here simultaneously.
First, the government is validating that AI has a legitimate role in programme assurance. This isn't a startup claiming AI can review documents. It's the Cabinet Office publishing prompts that assume AI will be part of how reviews are conducted.
Second, they're educating the market. Programme teams across government are being told to experiment with AI for document review. That creates familiarity, reduces resistance, and establishes expectations about what AI-assisted assurance looks like.
Third, the frameworks referenced in these prompts define the standard. The Green Book's five-case model. The Magenta Book's monitoring and evaluation approach. IPA gateway criteria. These are the benchmarks that any assessment tool, prompt-based or otherwise, should be measured against.
89% of major government projects were rated amber or red for on-time delivery in the 2023 NISTA Annual Report. That represents programmes worth a combined £805 billion in whole-life spending. The scale of the assessment challenge is not theoretical.
Where the prompts fall short
The Knowledge Hub prompts are a good starting point. They are not a sufficient endpoint. Here's why.
Generic by design. The same three-step prompt structure is used for every document type. A Full Business Case gets the same treatment as a risk register. There's no framework-specific criteria decomposition — no mapping of individual Green Book requirements to specific evidence in your documents.
No persistence. Each session starts from scratch. There's no memory of what was assessed previously, no tracking of how findings change over time, no trend analysis showing whether your programme is getting more or less ready for review.
No evidence citation. ChatGPT and Copilot can summarise a document. They can identify themes. What they don't do reliably is cite specific page and paragraph references that a reviewer can trace back to the source material. In assurance, an uncited finding is an opinion.
No scoring. The prompts ask for qualitative review — narrative assessment rather than structured ratings per criterion. A gateway review team needs to know which criteria are green, which are amber, which are red, and what evidence supports each rating.
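To make the contrast with narrative review concrete, here is a minimal sketch of what a structured rating per criterion could look like. The names (Rating, CriterionResult, summarise, unsupported) and the criterion identifiers are illustrative assumptions, not taken from the Knowledge Hub or any IPA schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Rating(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass(frozen=True)
class CriterionResult:
    criterion_id: str            # hypothetical identifier, e.g. "C1"
    rating: Rating
    evidence: Tuple[str, ...]    # references that support the rating


def summarise(results: List[CriterionResult]) -> Dict[str, int]:
    """Count how many criteria fall under each rating."""
    counts = Counter(r.rating for r in results)
    return {rating.value: counts.get(rating, 0) for rating in Rating}


def unsupported(results: List[CriterionResult]) -> List[str]:
    """Return criteria whose rating has no supporting evidence attached."""
    return [r.criterion_id for r in results if not r.evidence]
```

A review team could read the summary for the overall picture and use the unsupported list to challenge any rating that has no evidence behind it.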
No audit trail. There's no record of what was assessed, when it was assessed, against which criteria, or what the results were. If you re-run the same prompt on the same document next week, you are likely to get a different response, and there's no baseline to compare either answer against.
Data security concerns. The prompts suggest using Microsoft Copilot or Google Gemini. For routine administrative documents, that may be acceptable. For sensitive programme documents — commercial terms, security assessments, cost models — pasting content into public AI tools raises significant data handling questions. Most government departments have restrictions on exactly this.
One document at a time. A real gateway review involves a document bundle of 20 to 150 items. The prompts handle one document per session. Assessing a programme's readiness requires cross-referencing findings across multiple documents — the risk register against the management case, the benefits realisation plan against the strategic case. A single-document prompt can't do that.
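As a rough illustration of cross-referencing, the sketch below compares the risk identifiers found in a risk register with those mentioned in a management case. The "R-<number>" identifier format and the function name cross_reference are assumptions made for the example; real documents would need their own parsing.

```python
import re
from typing import Dict, List

# Hypothetical risk ID convention: "R-" followed by digits.
RISK_ID = re.compile(r"\bR-\d+\b")


def cross_reference(register_text: str, management_case_text: str) -> Dict[str, List[str]]:
    """Compare risk IDs across two documents."""
    in_register = set(RISK_ID.findall(register_text))
    in_case = set(RISK_ID.findall(management_case_text))
    return {
        # Risks logged in the register but never mentioned in the management case.
        "unaddressed": sorted(in_register - in_case),
        # Risks the management case mentions that the register does not contain.
        "unknown": sorted(in_case - in_register),
    }
```

Even a check this simple needs both documents in scope at once, which is the point a single-document session cannot reach.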
What good AI assessment actually looks like
The gap between a generic prompt and purpose-built assessment isn't a matter of sophistication. It's a matter of architecture.
Not a prompt — a pipeline. Document ingestion, criteria decomposition, evidence retrieval, evaluation, citation, self-critique, and report generation. Each step is distinct, verifiable, and repeatable. The output of one stage feeds the next.
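The staged structure can be sketched as a chain of small functions, each taking the previous stage's output. This is a toy under stated assumptions: only four of the seven stages are shown, the criteria are hypothetical keywords, and keyword matching stands in for real evidence retrieval and evaluation.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Stage = Callable[[State], State]


def ingest(state: State) -> State:
    # Split raw text into paragraphs on blank lines.
    paragraphs = [p.strip() for p in str(state["raw_text"]).split("\n\n") if p.strip()]
    return {**state, "paragraphs": paragraphs}


def decompose_criteria(state: State) -> State:
    # Hypothetical criteria, each reduced to a single keyword for illustration.
    return {**state, "criteria": {"C1": "benefits", "C2": "risk"}}


def retrieve_evidence(state: State) -> State:
    # Record the 1-based paragraph numbers that mention each criterion's keyword.
    evidence = {
        cid: [i for i, p in enumerate(state["paragraphs"], start=1) if kw in p.lower()]
        for cid, kw in state["criteria"].items()
    }
    return {**state, "evidence": evidence}


def evaluate(state: State) -> State:
    # Toy rule: green if any evidence was found, red otherwise.
    ratings = {cid: ("green" if hits else "red") for cid, hits in state["evidence"].items()}
    return {**state, "ratings": ratings}


def run_pipeline(state: State, stages: List[Stage]) -> State:
    # Apply each stage in order and record which stages have completed.
    for stage in stages:
        state = stage(state)
        state = {**state, "completed": list(state.get("completed", [])) + [stage.__name__]}
    return state
```

Because each stage returns a new state, the intermediate outputs can be inspected or stored, which is what makes each step separately verifiable.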
Deterministic. Same input, same output, every time. If you assess a document today and re-assess it tomorrow without changes, the results should be identical. That's the foundation of any audit trail.
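One way to support that property is to fingerprint the inputs, so an unchanged document assessed against unchanged criteria maps to the same stored result. The sketch below is an assumption about how this might be done, not a description of any particular platform. The function name assessment_key and the pipeline_version parameter are hypothetical. Note that hashing does not make a language model deterministic by itself; it only lets a system detect that nothing has changed and reuse the recorded result.

```python
import hashlib
import json
from typing import Dict


def assessment_key(document_bytes: bytes, criteria: Dict[str, str], pipeline_version: str) -> str:
    """Derive a stable SHA-256 fingerprint from everything that affects the result."""
    # sort_keys makes the serialisation independent of dict insertion order.
    settings = json.dumps(
        {"criteria": criteria, "version": pipeline_version},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256()
    digest.update(document_bytes)
    digest.update(b"\x00")  # separator between document content and settings
    digest.update(settings.encode("utf-8"))
    return digest.hexdigest()
```

If the key for today's run matches the key stored with yesterday's results, the earlier assessment can be returned as-is; any change to the document, criteria, or pipeline version produces a different key.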
Evidence-cited. Every finding traced to a specific document, page, and paragraph. Not "the strategic case appears aligned." Instead: "Strategic alignment is evidenced in Full Business Case, page 14, paragraph 3, where the programme objectives are mapped to departmental priorities."
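A minimal sketch of how that rule could be enforced in a data model: a finding cannot be created without at least one citation. The class names Citation and Finding, and the "(source: ...)" rendering, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Citation:
    document: str
    page: int
    paragraph: int

    def __str__(self) -> str:
        return f"{self.document}, page {self.page}, paragraph {self.paragraph}"


@dataclass(frozen=True)
class Finding:
    statement: str
    citations: Tuple[Citation, ...]

    def __post_init__(self) -> None:
        # Reject uncited findings at construction time.
        if not self.citations:
            raise ValueError("a finding needs at least one citation")

    def render(self) -> str:
        refs = "; ".join(str(c) for c in self.citations)
        return f"{self.statement} (source: {refs})"
```

Making the citation mandatory in the type, rather than a convention in the prompt, means an uncited opinion never reaches the report.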
Multi-framework. Assess the same document bundle against IPA gateway criteria, Green Book five-case requirements, CDM compliance, NEC contract terms — all from the same platform, without reconfiguring or re-prompting.
Audit-trailed. Who assessed what, when, against which criteria, with what results. Every assessment logged, timestamped, and retrievable. When the review team asks "how did you arrive at this rating?" the answer is documented.
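As a sketch of what such a record might contain, the example below keeps an append-only list of entries with assessor, document, framework, ratings, and timestamp. Chaining each entry to the hash of the previous one is an added assumption here, a common tamper-evidence technique rather than something the text above prescribes. All field and function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def _digest(entry: Dict) -> str:
    # Hash every field except the stored hash itself.
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], assessor: str, document: str, framework: str,
                 ratings: Dict[str, str], timestamp: Optional[str] = None) -> Dict:
    """Append a timestamped record that links to the previous entry's hash."""
    entry = {
        "assessor": assessor,
        "document": document,
        "framework": framework,
        "ratings": ratings,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "previous_hash": log[-1]["hash"] if log else None,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Return True if no entry has been altered, removed, or reordered."""
    previous = None
    for entry in log:
        if entry["previous_hash"] != previous or entry["hash"] != _digest(entry):
            return False
        previous = entry["hash"]
    return True
```

With a log like this, the answer to "how did you arrive at this rating?" is a lookup, and any later edit to a stored rating causes verification to fail.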
UK data residency. Documents stay in UK infrastructure. They don't pass through US cloud services. For government and regulated-sector programmes, this isn't optional — it's a requirement.
The Knowledge Hub is a starting line
The AI Knowledge Hub is a good first step. It validates the problem, educates the market, and gives programme teams a way to start experimenting with AI-assisted review. That matters.
But the gap between a generic prompt and a purpose-built assessment platform is the gap between "AI said it's fine" and traceable, evidence-based assurance. Between a single-session conversation and a persistent assessment record. Between one document at a time and a full programme review.
The sector needs both. It shouldn't confuse one for the other.
See what purpose-built AI assessment looks like
From document ingestion to board-ready reports. Evidence-cited, audit-trailed, and built for the frameworks that matter.