Generate Content — Programme Insights User Guide

1. The Problem

Assessment tells you what is missing. But knowing the gap exists does not close it.

You run an assessment against IPA gateway criteria, Green Book requirements, or your organisation's own framework. The platform finds 14 RED and AMBER findings. Each one is specific: "No sensitivity analysis found." "Escalation procedures not defined." "Benefits not quantified."

Now what?

Today, you write the missing content from scratch. You open a blank document, look up the expected format, dig through your business case for the relevant figures, and spend two or three hours producing a sensitivity analysis table that should have been in the Financial Case from the start. Multiply that across 14 findings and you are looking at a week of remediation work before the next gateway.

Generate Content produces an 80% complete first draft in minutes. It assembles context from the criterion definition, the evidence requirements, and the data already in your uploaded documents. You get a structured draft with real figures pulled from your business case, citations back to source pages, and clear markers where you need to add information the AI could not find.

You review it, fill in the gaps, and download as Word. The platform already knows what good looks like for each criterion. Generate Content uses that knowledge to write the first draft so you can focus on the 20% that requires human judgement.

Traceable by design

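Every citation in the generated content traces back to a specific page in your uploaded documents. No invented data. No hallucinated figures. If a value cannot be found, it is marked as an action for you to complete. If a value had to be inferred rather than extracted, it is flagged with a lower confidence rating so you know to check it.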

2. How It Works

Five steps from assessment finding to downloadable first draft.

Assessment identifies a gap. A RED or AMBER finding flags a specific deficiency — for example, "No sensitivity analysis has been identified in the uploaded documents." The finding includes the criterion reference, evidence requirements, and a recommended action.
You click "Generate Draft Content." The button appears on qualifying findings — those where the platform knows the expected structure and can produce useful output. You select your output format (structured table, narrative section, or full appendix) and confirm which uploaded documents to draw from.
The AI assembles context. The system pulls together the criterion definition, the evidence requirements, the assessment finding, relevant chunks from your uploaded documents, and the framework-specific template for that content type. This is targeted retrieval — not your entire document set, but the specific sections that contain relevant data.
A first draft appears. Generated content streams into a preview panel within 5–15 seconds. Data values cite their source document and page number. Fields where data was not found are marked [ACTION REQUIRED] with an explanation of what you need to provide. Each section carries a confidence rating.
You review, edit, and download. Check the citations against your documents. Fill in the placeholders. Verify any yellow-highlighted values where the AI flagged uncertainty. Download as Word, incorporate into your documents, and re-upload for the next assessment run.
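
If you work with a draft as plain text, a short script can list every placeholder still to be resolved before you re-upload. The sketch below is illustrative only: it assumes the marker format shown in this guide ([ACTION REQUIRED] with an optional explanation after a colon), and the function name find_placeholders is hypothetical, not part of the platform.

```python
import re

# Matches "[ACTION REQUIRED]" and "[ACTION REQUIRED: some explanation]".
PLACEHOLDER = re.compile(r"\[ACTION REQUIRED(?::\s*([^\]]*))?\]")


def find_placeholders(draft_text: str) -> list[tuple[int, str]]:
    """Return (line_number, explanation) for every placeholder in the draft."""
    found = []
    for line_no, line in enumerate(draft_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            # group(1) is None when the marker carries no explanation
            found.append((line_no, (match.group(1) or "").strip()))
    return found


# Sample rows copied from the examples in this guide.
draft = (
    "Capital costs\t£340M\t[Source: FBC p.23]\n"
    "Demand forecast\t[ACTION REQUIRED: Insert base case demand]\tNot found in documents\n"
    "Improved air quality\t[ACTION REQUIRED]\n"
)

for line_no, note in find_placeholders(draft):
    print(f"line {line_no}: {note or '(no explanation given)'}")
```

An empty result means no markers remain in the text, which is a reasonable check to run before re-uploading.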

Re-assessment loop

When you upload the updated document containing the generated content, the platform re-assesses automatically. If the content closes the gap, the finding moves from RED or AMBER to GREEN. This creates a closed loop: assess, generate, fix, verify.

3. What Can Be Generated

Not every gap can be addressed with generated content. The system knows the difference.

The key distinction: can the AI produce the structure and populate it with data from your documents, or does it need information that only you possess? Each criterion in the assessment framework has a suitability rating that determines whether the Generate Content button appears and what kind of output you get.

Suitability Tiers

Tier	Definition	Examples	What You Get
HIGH	Structure is known and data can be extracted from your uploaded documents. The AI produces a populated draft.	Sensitivity analysis tables, escalation procedures, RACI matrices, benefits quantification tables, risk escalation triggers, evidence requirement checklists	70–90% complete draft with real figures, citations, and source references. You review and refine.
MEDIUM	Structure is known but significant data requires human judgement or input the AI cannot infer.	Stakeholder engagement plans, risk appetite statements, change control procedures, lessons learned summaries, benefits realisation plans	Structured template with standard content pre-filled and [ACTION REQUIRED] placeholders where you need to add information.
LOW / EXCLUDED	Gap is about missing raw data, decisions not yet made, or content requiring original research. Generating a template would be misleading.	Cost breakdown structures, environmental impact data, financial model outputs (NPV, BCR, IRR), technical specifications	The Generate button does not appear. These gaps require human work that AI cannot shortcut.

Defined per criterion, not guessed

Suitability is encoded in the assessment framework configuration by the framework author. The system does not guess whether it can generate useful content — it knows, because the suitability was defined when the framework was built. If the button appears, the content type has been validated for that criterion.
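
One way to picture a per-criterion suitability setting is as a simple lookup that decides whether the button is shown. This is a minimal sketch, not the platform's actual configuration format: the dictionary layout, the function name, and the criterion codes EX-1 and EX-2 are invented for illustration. The three G3 codes are the ones used in the examples in this guide, and their content types are listed under the HIGH tier above.

```python
from enum import Enum


class Suitability(Enum):
    HIGH = "high"          # populated draft with figures and citations
    MEDIUM = "medium"      # structured template with placeholders
    EXCLUDED = "excluded"  # Generate button is not shown


# Hypothetical framework configuration: criterion code -> suitability tier.
FRAMEWORK_SUITABILITY = {
    "G3-4.7": Suitability.HIGH,      # sensitivity analysis table
    "G3-2.4": Suitability.HIGH,      # escalation procedure
    "G3-3.1": Suitability.HIGH,      # benefits quantification table
    "EX-1": Suitability.MEDIUM,      # made-up code: stakeholder engagement plan
    "EX-2": Suitability.EXCLUDED,    # made-up code: cost breakdown structure
}


def show_generate_button(criterion: str, rag: str) -> bool:
    """Show the button only on RED/AMBER findings with a HIGH or MEDIUM tier."""
    # Assumption: a criterion with no configured tier is treated as excluded.
    tier = FRAMEWORK_SUITABILITY.get(criterion, Suitability.EXCLUDED)
    return rag in {"RED", "AMBER"} and tier is not Suitability.EXCLUDED


print(show_generate_button("G3-4.7", "RED"))
print(show_generate_button("EX-2", "RED"))
```

The point of the lookup is that nothing is decided at generation time: the tier is fixed when the framework is written.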

4. Example: Sensitivity Analysis

A Green Book business case is assessed. The sensitivity analysis is missing. Here is what Generate Content produces.

Criterion G3-4.7: Sensitivity Analysis | RED
No sensitivity analysis has been identified in the uploaded documents. The business case presents a single-point BCR of 2.1 but does not test the robustness of this figure against changes in key assumptions. HMT Green Book requires testing of optimism bias adjustments, demand forecasts, cost estimates, and discount rate sensitivity.

The AI assembles context from the Financial Case (capital costs, operating costs, discount rate), Economic Case (benefit estimates, BCR), and HMT Green Book sensitivity analysis template. The generated output:

AI-Generated Draft — Review Required
Generated from assessment of 12 uploaded documents on 15 April 2026 | Confidence: MEDIUM

1. Key Variables Tested

Variable	Base Case	OB Adjustment	Source
Capital costs	£340M	+24% (standard infrastructure)	[Source: FBC p.23]
Operating costs	£12M/year	+15% (non-novel operations)	[Source: FBC p.31]
Demand forecast	[ACTION REQUIRED: Insert base case demand]	[ACTION REQUIRED: Apply OB uplift]	Not found in documents
Benefit values	£714M PV (30 years)	-10% to -30% range test	[Source: Economic Appraisal p.18]
Discount rate	3.5% real	Test at 1.5% and 5.0%	[Source: FBC p.24]

2. Switching Values

Variable	Switching Value	Headroom	Confidence
Capital costs	+110% (to £714M)	£374M headroom	HIGH
Benefit values	-52% (to £340M PV)	£374M headroom	HIGH
Discount rate	[ACTION REQUIRED: Requires full cashflow profile]	—	LOW

3. Scenario Analysis

Scenario	Capital Cost	Benefits	BCR	VfM Rating
Base case	£340M	£714M	2.1	High
Optimistic	£272M (-20%)	£857M (+20%)	3.1	High
Central with OB	£422M (+24%)	£714M	1.7	Medium
Pessimistic	£442M (+30%)	£500M (-30%)	1.1	Low
Worst case	£510M (+50%)	£428M (-40%)	0.84	Poor (does not represent VfM)

Sources: Business Case v2.1 (pp. 23, 24, 31), Economic Appraisal (p. 18). OB adjustments from HMT Supplementary Green Book Guidance on Optimism Bias (2022). Confidence: Medium (demand forecast data not found).
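
When you review a draft like this, it helps to re-derive the arithmetic yourself. The sketch below checks the switching values and two of the scenario rows using only the base case figures from the example (£714M PV benefits, £340M capital cost). It assumes, as the example's BCR of 2.1 implies, that the ratio is benefits divided by capital cost alone. The function names are illustrative and not part of the platform.

```python
def bcr(benefits_pv: float, costs: float) -> float:
    """Benefit-cost ratio: present value of benefits divided by costs."""
    return benefits_pv / costs


def switching_changes(benefits_pv: float, costs: float) -> tuple[float, float]:
    """Percentage changes at which the BCR falls to exactly 1.0.

    Returns (cost increase %, benefit change %). Costs switch when they
    rise to equal benefits; benefits switch when they fall to equal costs.
    """
    cost_rise_pct = (benefits_pv / costs - 1) * 100
    benefit_change_pct = (costs / benefits_pv - 1) * 100
    return cost_rise_pct, benefit_change_pct


def scenario_bcr(benefits_pv: float, costs: float,
                 cost_change_pct: float, benefit_change_pct: float) -> float:
    """BCR after applying percentage changes to costs and benefits."""
    adjusted_costs = costs * (1 + cost_change_pct / 100)
    adjusted_benefits = benefits_pv * (1 + benefit_change_pct / 100)
    return bcr(adjusted_benefits, adjusted_costs)


BASE_BENEFITS = 714.0  # £M PV, from the example
BASE_CAPITAL = 340.0   # £M, from the example

cost_rise, benefit_change = switching_changes(BASE_BENEFITS, BASE_CAPITAL)
print(f"Base case BCR: {bcr(BASE_BENEFITS, BASE_CAPITAL):.1f}")
print(f"Capital cost switching value: +{cost_rise:.0f}%")
print(f"Benefit switching value: {benefit_change:.0f}%")
print(f"Central with OB (+24% cost): {scenario_bcr(BASE_BENEFITS, BASE_CAPITAL, 24, 0):.1f}")
print(f"Worst case (+50% cost, -40% benefits): {scenario_bcr(BASE_BENEFITS, BASE_CAPITAL, 50, -40):.2f}")
```

Any row that does not reproduce from the base case figures is worth querying before the table goes into a formal submission.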

5. Example: Escalation Procedure

A governance framework gap triggers a generated escalation procedure grounded in the project's terms of reference.

Criterion G3-2.4: Escalation and Exception Reporting | AMBER
The governance structure identifies the Project Board and SRO but does not define escalation triggers, response timeframes, or exception reporting thresholds. The terms of reference mention "escalation as appropriate" without defining what constitutes an exception.

The AI draws on the Terms of Reference (governance roles), the Risk Management Strategy (tolerance references), and the project value (£340M) to produce a structured procedure:

AI-Generated Draft — Review Required
Confidence: MEDIUM | Financial thresholds not specified in source documents

1. Escalation Levels

Level	Escalated To	Authority	Response Time
1 — Project	Programme Director [Source: ToR p.4]	Corrective action within delegated authority	5 working days
2 — Project Board	Project Board (Chair: SRO) [Source: ToR p.2]	Budget changes up to [ACTION REQUIRED: Insert limit]	Next board or 10 days (emergency)
3 — Sponsoring Body	[ACTION REQUIRED: Insert sponsor]	Scope changes, budget increases beyond delegated authority	15 working days
4 — Portfolio / Minister	[ACTION REQUIRED: Insert authority]	Reset or closure decisions	As required

2. Exception Triggers

Category	Amber Trigger (Level 1–2)	Red Trigger (Level 3–4)
Cost	Forecast outturn exceeds budget by >5%	Forecast outturn exceeds budget by >10%
Schedule	Critical path slippage >4 weeks	Key delivery date at risk or slippage >8 weeks
Benefits	Realisation <90% of forecast at review	Realisation <75% or strategic benefit unachievable
Risk	New residual rating ≥16 (4×4 matrix)	Risk materialised, impact exceeds contingency
Quality	Deliverable rejected at quality gate	Repeated failures indicating systemic capability gap

3. Exception Report Format

Each exception report must include: (1) Exception description, (2) Impact on cost, schedule, benefits, and risk, (3) Minimum 3 options including "do nothing", (4) Recommendation with rationale, (5) Specific decision required.

Sources: Terms of Reference (pp. 2, 4), Risk Management Strategy. Confidence: Medium (financial thresholds not specified in source documents — review all triggers against programme risk appetite).
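
The cost and schedule rows of the trigger table are simple threshold tests, which makes them easy to check against live project figures. The following is a sketch under the thresholds shown in the draft above (which you should review against your own risk appetite). The function names and the "NONE" return value are assumptions made for illustration.

```python
def cost_trigger(budget: float, forecast_outturn: float) -> str:
    """Classify a cost position using the draft's >5% and >10% thresholds."""
    overrun_pct = (forecast_outturn - budget) / budget * 100
    if overrun_pct > 10:
        return "RED"
    if overrun_pct > 5:
        return "AMBER"
    return "NONE"


def schedule_trigger(slippage_weeks: float, key_date_at_risk: bool = False) -> str:
    """Classify critical path slippage using the draft's 4 and 8 week thresholds."""
    if key_date_at_risk or slippage_weeks > 8:
        return "RED"
    if slippage_weeks > 4:
        return "AMBER"
    return "NONE"


# Illustrative forecasts against the example's £340M project value.
print(cost_trigger(budget=340.0, forecast_outturn=350.0))  # about 2.9% over
print(cost_trigger(budget=340.0, forecast_outturn=360.0))  # about 5.9% over
print(cost_trigger(budget=340.0, forecast_outturn=380.0))  # about 11.8% over
print(schedule_trigger(slippage_weeks=6))
```

The benefits, risk, and quality rows need judgement as well as numbers, so they are left out of this sketch.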

6. Example: Benefits Quantification

Four of six benefits are not monetised. Generate Content produces a structured table mixing extracted data with clear placeholders.

Criterion G3-3.1: Benefits Quantification and Monetisation | AMBER
The Strategic Case identifies 6 benefits but only 2 are monetised. The remaining 4 are described qualitatively without monetisation methodology, proxy values, or confidence levels. HMT Green Book requires all benefits to be quantified where possible, with non-monetised benefits justified.

AI-Generated Draft — Review Required
Confidence: MEDIUM | 4 of 6 benefits require human monetisation input

Benefits Quantification Table

#	Benefit	Value (PV)	Methodology	Confidence
B1	Reduced journey times	£412M [Source: Econ. Appraisal p.12]	DfT WebTAG values of time × forecast demand	MED
B2	Construction employment	£302M [Source: Econ. Appraisal p.15]	ONS multiplier applied to construction spend	MED
B3	Improved air quality	[ACTION REQUIRED]	Recommend: DEFRA air quality damage costs (2024 values)	—
B4	Reduced carbon emissions	[ACTION REQUIRED]	Recommend: BEIS traded/non-traded carbon values	—
B5	Community connectivity	Non-monetised	[ACTION REQUIRED: Justify per Green Book para 5.14]	LOW
B6	Skills development	Non-monetised	[ACTION REQUIRED: Consider lifetime earnings uplift]	LOW

Totals: Monetised: £714M PV (B1 + B2) [Source: Econ. Appraisal p.18]. To be monetised: B3, B4. Non-monetised (justify): B5, B6.

Sources: Economic Appraisal (pp. 12, 15, 18), Strategic Case benefit descriptions. Confidence: Medium.

Notice the pattern: cells with extracted data carry teal citations. Cells requiring your input carry red ACTION REQUIRED markers. The methodology column recommends the appropriate HMT valuation approach even where the value itself must come from you.
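
The totals line in the draft can be reproduced from the table rows, which is a quick consistency check when you review it. The sketch below holds the six benefits from the example in a small data structure and recomputes the three groups. The Benefit class and its field names are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Benefit:
    ref: str
    name: str
    value_pv_m: Optional[float]  # present value in £M; None when no figure exists
    to_be_monetised: bool        # True when a figure is expected but still missing


# Rows copied from the example table above.
BENEFITS = [
    Benefit("B1", "Reduced journey times", 412.0, False),
    Benefit("B2", "Construction employment", 302.0, False),
    Benefit("B3", "Improved air quality", None, True),
    Benefit("B4", "Reduced carbon emissions", None, True),
    Benefit("B5", "Community connectivity", None, False),
    Benefit("B6", "Skills development", None, False),
]


def summarise(benefits: list[Benefit]) -> tuple[float, list[str], list[str]]:
    """Return (monetised total in £M, refs to monetise, refs to justify)."""
    total = sum(b.value_pv_m for b in benefits if b.value_pv_m is not None)
    to_monetise = [b.ref for b in benefits if b.value_pv_m is None and b.to_be_monetised]
    to_justify = [b.ref for b in benefits if b.value_pv_m is None and not b.to_be_monetised]
    return total, to_monetise, to_justify


total, to_monetise, to_justify = summarise(BENEFITS)
print(f"Monetised: £{total:.0f}M PV")
print(f"To be monetised: {', '.join(to_monetise)}")
print(f"Non-monetised (justify): {', '.join(to_justify)}")
```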

7. Understanding Confidence Scores

Every piece of generated content tells you how much it trusts its own output. Here is how to read the signals.

Section-Level Confidence

Each generated section carries an overall confidence rating based on how much source data was available and how reliably it could be extracted.

High Confidence

Data was found in your documents and verified across multiple sources. Figures are directly extracted, not inferred. The structure matches the framework template exactly.

Your action: Verify the figures match your latest version of the source document. Spot-check two or three citations.

Medium Confidence

Data was found but not independently verified across documents. The structure is correct, but some values rest on a single source.

Your action: Check each cited value against the source. Confirm the context has not changed since the document was written.

Low Confidence

Values are inferred rather than explicitly stated in your documents. The AI has made a reasonable interpretation but it may not reflect your intent.

Your action: Treat these values as suggestions. Replace with verified data before using in any formal submission.

Field-Level Highlights

Even within a High-confidence section, individual values can carry uncertainty. These are highlighted in yellow in the generated output. A yellow highlight means the AI found something relevant but is not certain it extracted the right figure.

Common causes of yellow highlights:

Value appears in a table that was OCR-processed from a scanned PDF — possible digit errors
Multiple conflicting values found across documents (e.g., different cost estimates in different versions)
Value was inferred from narrative text rather than extracted from a structured table
Unit or currency not explicitly stated in the source — the AI assumed £ sterling

Do not skip yellow highlights

A High-confidence section can still contain one or two Low-confidence values. The yellow highlight is your signal to check that specific cell before using the draft in a formal submission. Every yellow-highlighted value includes the source reference so you can verify it directly.
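
The three confidence levels described above can be read as a small decision rule. The guide does not state the platform's actual scoring logic, so the sketch below is only one plausible reading of the descriptions: found in two or more documents is High, found in one is Medium, inferred is Low, and not found becomes an action. All names and the sample inputs are hypothetical.

```python
def value_confidence(documents_found_in: int, inferred: bool) -> str:
    """Rate a single value from how it was obtained (illustrative rule only)."""
    if documents_found_in == 0:
        return "ACTION REQUIRED"  # nothing to extract, so the user must supply it
    if inferred:
        return "LOW"              # interpreted from text rather than stated
    if documents_found_in >= 2:
        return "HIGH"             # extracted and confirmed in more than one document
    return "MEDIUM"               # extracted from a single source


def values_to_check_first(ratings: dict[str, str]) -> list[str]:
    """Names of values that should be resolved before any formal submission."""
    return [name for name, rating in ratings.items()
            if rating in {"LOW", "ACTION REQUIRED"}]


# Made-up inputs to show the rule in use.
ratings = {
    "value_a": value_confidence(documents_found_in=2, inferred=False),
    "value_b": value_confidence(documents_found_in=1, inferred=False),
    "value_c": value_confidence(documents_found_in=1, inferred=True),
    "value_d": value_confidence(documents_found_in=0, inferred=False),
}
print(ratings)
print(values_to_check_first(ratings))
```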

8. Provenance and Auditability

Every generation is fully traceable. If content is questioned during an audit, the full chain is reconstructable.

Programme documentation goes through gateway reviews, NAO scrutiny, and public accountability processes. AI-generated content in this context requires a complete audit trail. Programme Insights records every step of the generation process.

What Gets Recorded

Element	What Is Captured
Trigger	Which finding triggered the generation, the criterion code, the RAG rating, and the assessment run ID
Context assembled	The exact document chunks retrieved, their source documents and page numbers, the search queries used to find them, and the total token count
Prompt sent	The full system prompt including criterion definition, evidence requirements, framework template, and all retrieved context — the complete input to the AI
Model version	The specific AI model and version used, with a timestamp of the generation
Raw output	The unedited AI output exactly as generated, before any user modifications
User's final version	If the user edits and saves the content, the final version is stored alongside the original, with a diff showing what was changed
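
To make the table concrete, here is one way such a record could be modelled. This is a sketch for illustration, not the platform's schema: the class names, field names, model identifier, search queries, and token count are all assumptions. The criterion, rating, run number, date, and page references come from the sensitivity analysis example in this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RetrievedChunk:
    document: str
    page: int
    query: str  # the search query that surfaced this chunk


@dataclass
class ProvenanceRecord:
    criterion_code: str
    rag_rating: str
    assessment_run_id: int
    chunks: list[RetrievedChunk]
    token_count: int
    prompt: str
    model_version: str
    generated_at: datetime
    raw_output: str
    final_version: Optional[str] = None  # set once the user edits and saves

    def cited_pages(self) -> dict[str, list[int]]:
        """Group the retrieved pages by source document."""
        pages: dict[str, list[int]] = {}
        for chunk in self.chunks:
            pages.setdefault(chunk.document, []).append(chunk.page)
        return {doc: sorted(set(nums)) for doc, nums in pages.items()}


record = ProvenanceRecord(
    criterion_code="G3-4.7",
    rag_rating="RED",
    assessment_run_id=47,
    chunks=[
        RetrievedChunk("Financial Case", 23, "capital costs"),
        RetrievedChunk("Financial Case", 31, "operating costs"),
        RetrievedChunk("Financial Case", 24, "discount rate"),
        RetrievedChunk("Economic Appraisal", 18, "benefit estimates"),
    ],
    token_count=1200,                  # illustrative figure
    prompt="(full prompt text)",       # placeholder
    model_version="example-model-v1",  # hypothetical identifier
    generated_at=datetime(2026, 4, 15, tzinfo=timezone.utc),
    raw_output="(unedited AI output)",  # placeholder
)
print(record.cited_pages())
```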

Why This Matters

If a reviewer or auditor asks "where did this sensitivity analysis come from?", the answer is fully documented:

The finding that triggered it (criterion G3-4.7, rated RED, from assessment run #47)
The source data used (Financial Case pp. 23, 24, 31; Economic Appraisal p. 18)
The AI model and exact prompt that produced it
The raw AI output and the human-edited final version
Who generated it, when, and what edits were made

This is not an afterthought. The provenance system is designed for environments where generated content may be scrutinised by the IPA, the NAO, or a parliamentary committee.

See it in action

A full provenance walkthrough is available at programmeinsights.co.uk/mockup-generate-provenance.html

9. The Feedback Loop

Generated content improves over time because the system learns from your edits.

When you edit a generated draft and save it, the platform captures the difference between the AI's version and yours. This is not abstract machine learning. It is a concrete record of what you changed, and each kind of change signals something different:

Structural changes — did you reorder sections, add rows to a table, or remove a column? This tells the system the template needs adjusting for your context.
Value corrections — did you change a figure the AI extracted? This flags a potential extraction error for that document format.
Tone and style — did you rewrite narrative sections in a different register? This teaches the system your organisation's preferred language.
Placeholder resolution — what did you fill in for each ACTION REQUIRED marker? Over time, this builds a picture of what data your organisation typically provides.

What This Means in Practice

If your organisation consistently changes "5 working days" to "3 working days" for Level 1 escalation response times, future generations for your projects will default to 3 working days. If you always add a "Dependencies" column to RACI matrices, future RACI generations will include it.

The learning is scoped to your organisation. Your edits improve your future generations. They do not affect other users.
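
The difference between the AI's version and yours is an ordinary text diff. The sketch below uses Python's standard difflib module to show the idea, with the escalation response time example from above. It is not the platform's implementation, and the function names and sample text are assumptions.

```python
import difflib


def edit_diff(raw_output: str, final_version: str) -> list[str]:
    """Unified diff between the AI draft and the user's saved version."""
    return list(difflib.unified_diff(
        raw_output.splitlines(),
        final_version.splitlines(),
        fromfile="ai_draft",
        tofile="user_final",
        lineterm="",
    ))


def changed_lines(raw_output: str, final_version: str) -> list[tuple[str, str]]:
    """Pair each replaced line with the line the user wrote instead."""
    before, after = raw_output.splitlines(), final_version.splitlines()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.extend(zip(before[i1:i2], after[j1:j2]))
    return pairs


raw = "Level 1 response time: 5 working days\nLevel 3 response time: 15 working days"
final = "Level 1 response time: 3 working days\nLevel 3 response time: 15 working days"

for line in edit_diff(raw, final):
    print(line)
print(changed_lines(raw, final))
```

A value correction like this one shows up as a single replaced line, while structural edits such as added table rows appear as inserted lines.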

Better with every generation

The more you use Generate Content, the better the output becomes. The first draft on your first project is solid but generic. By your third project, the system has learned your preferences and the output is noticeably closer to your standards.

10. Getting Started

From assessment findings to first draft in seven steps.

Run an assessment on your project documents. Upload your business case, governance documents, and supporting material. Select the appropriate module (IPA Gateway, Green Book, NEC, or your custom framework) and run the assessment.
Review the RED and AMBER findings. Each finding tells you exactly what is missing and what the framework expects. Focus on the criteria rated RED first — these are the gaps that will fail a gateway review.
Look for the "Generate Draft Content" button. It appears on qualifying findings where the platform can produce useful output. If the button is not there, the gap requires data that only you can provide — no amount of AI will help.
Click it and choose your output format. Select structured table, narrative section, or full appendix depending on what the criterion requires. Confirm which uploaded documents to draw from — the platform pre-selects the most relevant ones.
Review the draft. Check the teal citations against your source documents. Fill in every [ACTION REQUIRED] placeholder. Verify any yellow-highlighted values where the AI flagged uncertainty.
Download as Word and incorporate into your project documents. Edit as needed — this is a first draft, not a final submission. Your domain expertise turns the 80% draft into a 100% deliverable.
Re-upload the updated document. The platform re-assesses automatically. If the generated content closes the gap, the finding moves from RED or AMBER to GREEN. You have a closed loop: assess, generate, fix, verify.

Questions?

Contact us at support@programmeinsights.co.uk or visit programmeinsights.co.uk/help for documentation, walkthroughs, and framework guidance.

Related Guides

User Guide — Full platform walkthrough
Assessment Guide — Running and interpreting assessments
Custom Criteria Guide — Building your own framework