feat(PBI-67): IDEA_REVIEW_PLAN — iterative multi-model plan review (#199)

* feat(ideas): upload-plan button: short-circuit of the Make-Plan AI flow

Adds an 'Upload plan' button to idea-row-actions (shown in both the
list and idea-detail). Click → file picker → choose .md → server-side parse +
save; the idea status jumps to PLAN_READY. From there, the existing
'Maak PBI' (Create PBI) button handles materialization.

Server (uploadPlanMdAction):
- Allowed from DRAFT, GRILLED, PLAN_FAILED, PLAN_READY
- DRAFT → skip-grill: status goes straight to PLAN_READY
- PLAN_READY overwrites the existing plan (consistent with
  updatePlanMdAction, no confirmation)
- Blocked in GRILLING/PLANNING (job running), PLANNED (already materialized)
- Parse failure → 422 + details (NOT saved, so an unparseable plan
  never ends up in the DB)
- Empty / >100k chars → 422
- Writes an IdeaLog NOTE with from_status + length
- Rate-limit + demo-guard + ownership check via loadOwnedIdea (same
  pattern as updatePlanMdAction)

UI (idea-row-actions.tsx):
- Hidden <input type=file accept=".md,.markdown,text/markdown,text/plain">
- FileReader → text → action
- Toast on success + router.refresh()
- Blocked tooltip in other statuses

Tests: 10 new tests in __tests__/actions/ideas-crud.test.ts covering:
happy paths (DRAFT/GRILLED/PLAN_READY-overwrite/PLAN_FAILED), blocks
(PLANNED/GRILLING), validation (empty/oversized/parse-fail), 404.
Full suite green: 849/849.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add reviews for Bootstrap-wizard plans v3.2 to v3.4

- Review v3.2: Addressed executor model, fire-and-forget issues, and PAT handling.
- Review v3.3: Improved transaction handling, stale recovery, and ID generation.
- Review v3.4: Finalized GitHub permissions, catalog versioning, and E2E verification queries.
- Updated recommendations for each version to enhance implementation readiness.

* docs(plans): M8 bootstrap-wizard upload variant v1.4: backticked paths

Upload variant of the full technical plan (docs/plans/M8-bootstrap-wizard.md),
intended for the "Upload plan" feature. Generates 1 PBI + 4 Stories + 22 Tasks
via materializeIdeaPlanAction.

v1.4 changes compared with the earlier generation iteration:
- All file paths in implementation_plan are in backticks (so the path extractor matches them)
- Explicit "Bestanden:" (Files) block per task before the steps
- All tasks set to verify_required: ALIGNED_OR_PARTIAL (was partly ALIGNED, which is too strict
  for ADR stubs and multi-file edits)

Forward-only fix: T-963 was cancelled_by_self because of a DIVERGENT verifier verdict.
Re-uploading this file produces tasks that verify_task_against_plan
can classify as ALIGNED or PARTIAL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* PBI-67: Add review-plan support to Idea model and job config

- Add plan_review_log and reviewed_at fields to Idea model
- Add REVIEWING_PLAN, PLAN_REVIEW_FAILED, PLAN_REVIEWED to IdeaStatus enum
- Add IDEA_REVIEW_PLAN to ClaudeJobKind enum
- Add IDEA_REVIEW_PLAN config to job-config.ts with model=opus, thinking_budget=6000
- Create migration record for schema changes (applied via db push)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* PBI-67 Phase 2: Add update-idea-plan-reviewed MCP tool

- Create src/tools/update-idea-plan-reviewed.ts: saves review-log and transitions idea status to PLAN_REVIEWED
- Add PLAN_REVIEW_RESULT to IdeaLogType enum (both repos)
- Register tool in src/index.ts
- Update Prisma schemas (both repos): add plan_review_log and reviewed_at fields to Idea model
- Add REVIEWING_PLAN, PLAN_REVIEW_FAILED, PLAN_REVIEWED to IdeaStatus enum (MCP schema)
- Add IDEA_REVIEW_PLAN to ClaudeJobKind enum (MCP schema)
- Tool includes transaction safety and convergence metrics logging

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat(PBI-67): IDEA_REVIEW_PLAN Phases 3-6 — server actions, UI components, prompt & tests

- Phase 3: startReviewPlanJobAction, cancelIdeaJobAction, status transitions
  (REVIEWING_PLAN / PLAN_REVIEWED / PLAN_REVIEW_FAILED), status colors,
  job-card/jobs-column filters, idea-list status tabs
- Phase 4: review-plan-job.md prompt (multi-model orchestration with codex
  injection + active plan revision via update_idea_plan_md after each round),
  runbook, 13 unit tests
- Phase 5: ReviewLogViewer component (rounds, convergence, approval, issues),
  idea-detail integration, proper ReviewLog TypeScript types exported from component
- Phase 6.1: wait-for-job discriminator wired (IDEA_REVIEW_PLAN), plan-revision
  step made mandatory in prompt (was previously optional/missing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Janpeter Visser 2026-05-14 01:35:02 +00:00 committed by GitHub
parent b8e22539f6
commit d84cdf664f
28 changed files with 4387 additions and 30 deletions


@@ -0,0 +1,285 @@
# Review-Plan Job Orchestration
> Implementation guide for the IDEA_REVIEW_PLAN job kind and multi-model iterative plan review.
---
## Overview
The review-plan job is an autonomous agent that performs iterative multi-model review of implementation plans (YAML frontmatter + markdown documents). It coordinates three review stages (structure, logic/patterns, risk assessment), detects convergence, and either approves the plan or returns it for manual refinement.
**Job Kind:** `IDEA_REVIEW_PLAN`
**Triggerable From:** `PLAN_READY`, `PLAN_REVIEWED` (re-review)
**Transitions To:** `PLAN_REVIEWED` (approved) or `PLAN_REVIEW_FAILED` (rejected/abandoned)
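
As a rough illustration of the trigger rule above, the gate could look like the following. This is a minimal sketch under assumptions: the `IdeaStatus` union lists only the statuses named in this guide, and `canStartReview` is a hypothetical helper, not the actual logic in `lib/idea-status.ts`.

```typescript
// Hypothetical sketch: only the statuses mentioned in this guide are listed.
type IdeaStatus =
  | "PLAN_READY"
  | "REVIEWING_PLAN"
  | "PLAN_REVIEWED"
  | "PLAN_REVIEW_FAILED";

// Statuses from which a review job may be started (per "Triggerable From").
const REVIEW_TRIGGERABLE_FROM: readonly IdeaStatus[] = [
  "PLAN_READY",
  "PLAN_REVIEWED", // re-review
];

// Hypothetical helper: true when a review job may be queued for this status.
function canStartReview(status: IdeaStatus): boolean {
  return REVIEW_TRIGGERABLE_FROM.indexOf(status) !== -1;
}
```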
---
## System Design
### Data Flow
```
User clicks "Review Plan" on PLAN_READY idea
startReviewPlanJobAction() queues IDEA_REVIEW_PLAN job
Worker claims job via wait_for_job (MCP)
Review-plan prompt orchestrates:
  - Round 1: Structure check (YAML parsing, format correctness)
  - Round 2: Logic & patterns (dependencies, architecture fit)
  - Round 3: Risk assessment (edge cases, refactoring, type-safety)
Convergence detection: if stable, ask approval
On approval: update_idea_plan_reviewed(approval_status='approved')
  → Idea transitions to PLAN_REVIEWED
  → IdeaLog entry created with PLAN_REVIEW_RESULT
On rejection: return for manual edit (status → PLAN_REVIEW_FAILED)
```
### Review-Log JSON Schema
The orchestrator produces a detailed JSON log stored in `idea.plan_review_log`:
```typescript
// Assumed alias: the original uses ISO8601 without defining it.
type ISO8601 = string;

interface ReviewLog {
  plan_file: string;              // Idea code (e.g., "I-042")
  created_at: ISO8601;            // Review start timestamp
  rounds: Array<{
    round: number;                // 0, 1, 2 (structure, logic, risk)
    model: string;                // claude-3-5-haiku | claude-3-5-sonnet | claude-opus-4-7
    role: string;                 // "Structure Review" | "Logic & Patterns" | "Risk Assessment"
    focus: string;                // Review focus summary
    plan_before: string;          // Original plan_md at round start
    plan_after: string;           // Revised plan after feedback
    issues: Array<{
      category: 'structure' | 'logic' | 'risk' | 'pattern';
      severity: 'error' | 'warning' | 'info';
      suggestion: string;         // Concrete fix recommendation
    }>;
    score: number;                // 0-100 review score
    plan_diff_lines: number;      // Changed lines in this round
    converged: boolean;           // Did this round trigger convergence?
    timestamp: ISO8601;           // Round completion time
  }>;
  convergence?: {
    stable_at_round: number;      // Round where convergence was detected
    final_diff_pct: number;       // Percentage of changed lines at convergence
    convergence_metric: string;   // "plan_stability" (constant for now)
  };
  approval: {
    status: 'pending' | 'approved' | 'rejected';
    timestamp?: ISO8601;          // When user made decision
  };
  summary: string;                // 1-2 sentence summary for IdeaLog
}
```
---
## Assumptions & Constraints
### Prompt Assumptions
1. **Plan Format:** Idea's `plan_md` field contains YAML frontmatter (parsed at PLAN_READY) + markdown body.
- Frontmatter keys: `pbi`, `stories`, `tasks`, `priority`, `verify_required`.
- If parse fails, orchestrator transitions idea to `PLAN_REVIEW_FAILED`.
2. **Context Availability:** The job payload includes:
- `idea.plan_md`: The plan to review (required)
- `idea.grill_md`: Context from grill phase (optional but recommended)
- `product.definition_of_done`: Product-level acceptance criteria
- `repo_url`: Local repository for pattern inspection
3. **User Availability:** At least one worker is active (server-side check via `countActiveWorkers`).
4. **No External APIs:** Orchestrator performs reviews entirely with information from job context. No external codex or multi-model APIs are called directly.
- Future improvement: Codex-injection from `docs/patterns/**/*.md` and `docs/architecture/**/*.md`.
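
To make assumption 1 concrete, the sketch below separates the YAML frontmatter from the markdown body of a `plan_md` string. It is illustrative only: `splitFrontmatter` and `SplitPlan` are hypothetical names, the function does no YAML parsing or key validation, and the real validation is done server-side by `parsePlanMd`.

```typescript
interface SplitPlan {
  frontmatter: string; // raw YAML text between the --- delimiters
  body: string;        // markdown after the closing delimiter
}

// Hypothetical helper: returns null when no complete frontmatter block is found.
function splitFrontmatter(planMd: string): SplitPlan | null {
  const lines = planMd.split(/\r?\n/);
  // The document must open with a --- delimiter line.
  if ((lines[0] ?? "").trim() !== "---") return null;
  for (let i = 1; i < lines.length; i++) {
    if ((lines[i] ?? "").trim() === "---") {
      return {
        frontmatter: lines.slice(1, i).join("\n"),
        body: lines.slice(i + 1).join("\n"),
      };
    }
  }
  return null; // opening delimiter without a closing one
}
```

A `null` result here would correspond to the parse-failure case described above; a non-null result still needs a real YAML parse before the keys can be trusted.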
### Convergence Detection Assumptions
1. **Stability Metric:** Two consecutive rounds with < 5% line changes = convergence.
- Threshold is hardcoded; future: make configurable per product.
- Diff percentage = `(changed_lines / total_lines) * 100`.
2. **Max Iterations:** 3 initial rounds + 2 optional extra rounds (total max 5) before forced approval.
3. **No Infinite Loops:** If max iterations reached, approval gate enforces a decision.
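
The stability rule above can be sketched as follows. This is a minimal illustration under assumptions: `diffPct`, `hasConverged`, and `mustAskApproval` are hypothetical names, each round is assumed to report its changed and total line counts, and "two consecutive rounds" is read as the two most recent rounds both falling under the 5% threshold.

```typescript
const CONVERGENCE_THRESHOLD_PCT = 5; // hardcoded threshold from the text
const MAX_ROUNDS = 5;                // 3 initial + 2 optional extra rounds

interface RoundDiff {
  changedLines: number;
  totalLines: number;
}

// Diff percentage = (changed_lines / total_lines) * 100.
function diffPct({ changedLines, totalLines }: RoundDiff): number {
  if (totalLines <= 0) return 0; // assumption: an empty plan counts as unchanged
  return (changedLines / totalLines) * 100;
}

// True when the two most recent rounds each changed fewer than 5% of lines.
function hasConverged(rounds: RoundDiff[]): boolean {
  if (rounds.length < 2) return false;
  return rounds.slice(-2).every((r) => diffPct(r) < CONVERGENCE_THRESHOLD_PCT);
}

// True when the approval gate must be reached, converged or not.
function mustAskApproval(rounds: RoundDiff[]): boolean {
  return hasConverged(rounds) || rounds.length >= MAX_ROUNDS;
}
```

Keeping the threshold and round cap as named constants is one way to prepare for the per-product configuration mentioned above.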
### Validation Assumptions
1. **Plan is Mutable:** Orchestrator can revise `plan_md` between rounds without breaking downstream parsing.
- If YAML structure is corrupted, `parsePlanMd` (server-side) will fail on approval.
- Orchestrator should never corrupt YAML syntax.
2. **IdeaLog Persistence:** MCP tool `update_idea_plan_reviewed` atomically saves:
- `idea.plan_review_log` (full JSON)
- `idea.reviewed_at` (timestamp)
- `idea.status` (transition)
- `IdeaLog` entry (audit)
3. **User Decisions are Final:** Once approved, plan-review log is immutable (until next re-review).
---
## Implementation Details
### Prompt Location
- **Main Repo:** `lib/idea-prompts/review-plan-job.md`
- **MCP Server:** `scrum4me-mcp/src/prompts/idea/review-plan.md`
- **Synchronization:** Manual (for now); future: sync-schema.sh-like mechanism.
### Job Config Snapshot
Job created with config from `lib/job-config.ts`:
```typescript
IDEA_REVIEW_PLAN: {
  model: 'claude-opus-4-7',        // Opus for final orchestration
  thinking_budget: 6000,           // Extended for multi-round analysis
  permission_mode: 'acceptEdits',
  max_turns: 1,
  allowed_tools: [
    'Read', 'Write', 'Grep', 'Glob',
    'mcp__scrum4me__update_idea_plan_reviewed',
    'mcp__scrum4me__log_idea_decision',
    'mcp__scrum4me__update_job_status',
    'mcp__scrum4me__ask_user_question',
  ],
}
```
**Note:** Model is fixed to Opus for orchestration. Individual review rounds are simulated (not actual model switching) within Opus's analysis. Future: Direct multi-model support via Claude API.
### MCP Tool: update_idea_plan_reviewed
**Location:** `scrum4me-mcp/src/tools/update-idea-plan-reviewed.ts`
**Input:**
```typescript
{
  idea_id: string;
  review_log: object;              // Full ReviewLog JSON
  approval_status?: 'pending' | 'approved' | 'rejected';
}
```
**Behavior:**
1. Validates user owns idea.
2. Transitions idea status:
- `approval_status='approved'` → `PLAN_REVIEWED`
- `approval_status='rejected'` → `PLAN_REVIEW_FAILED`
- Default → `PLAN_REVIEWED`
3. Saves `plan_review_log` and `reviewed_at` atomically.
4. Creates `IdeaLog` entry with type `PLAN_REVIEW_RESULT`.
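
The status mapping in step 2 amounts to a small pure function. The sketch below is an assumption about shape only: `resolveReviewedStatus` is a hypothetical name, `'pending'` is assumed to fall under the documented default, and the ownership check, atomic save, and `IdeaLog` write performed by the real tool are not shown.

```typescript
type ApprovalStatus = "pending" | "approved" | "rejected";
type ReviewOutcomeStatus = "PLAN_REVIEWED" | "PLAN_REVIEW_FAILED";

// Hypothetical helper mirroring the documented mapping.
function resolveReviewedStatus(approvalStatus?: ApprovalStatus): ReviewOutcomeStatus {
  // Only an explicit rejection fails the review; 'approved', 'pending',
  // and an omitted value all resolve to the default PLAN_REVIEWED.
  return approvalStatus === "rejected" ? "PLAN_REVIEW_FAILED" : "PLAN_REVIEWED";
}
```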
---
## Dependencies
### Database
- **Idea Model:** Must have fields `plan_review_log` (Json), `reviewed_at` (DateTime).
- **IdeaStatus Enum:** Must include `REVIEWING_PLAN`, `PLAN_REVIEW_FAILED`, `PLAN_REVIEWED`.
- **IdeaLogType Enum:** Must include `PLAN_REVIEW_RESULT`.
### Server Actions
- `startReviewPlanJobAction()` — Queues job, enforces status transitions.
- `cancelIdeaJobAction()` — Allows user to cancel mid-review (reverts to `PLAN_READY`).
### MCP Tools
- `update_idea_plan_reviewed()` — Saves review-log and transitions status.
- `log_idea_decision()` — Logs convergence/approval decisions.
- `update_job_status()` — Marks job as done/failed.
- `ask_user_question()` — Approval gate interaction.
### Files
- `lib/idea-prompts/review-plan-job.md` — Orchestrator prompt.
- `scrum4me-mcp/src/prompts/idea/review-plan.md` — MCP server copy.
- `scrum4me-mcp/src/lib/kind-prompts.ts` — Prompt loader.
- `scrum4me-mcp/src/tools/wait-for-job.ts` — Job context builder.
---
## Error Handling
### Parse Failures
If `plan_md` cannot be parsed as valid YAML frontmatter:
1. Orchestrator logs error in review_log.
2. Calls `update_job_status('failed', error: 'plan_parse_failed')`.
3. Idea remains in `REVIEWING_PLAN` (no transition).
4. User can manually edit `plan_md` and retry.
### User Cancellation
If user cancels job via UI:
1. Server sets job status → `CANCELLED`.
2. Worker receives no further answer from `ask_user_question`.
3. Orchestrator gracefully saves partial review_log.
4. Calls `update_job_status('skipped', ...)`.
5. Idea reverts to `PLAN_READY`.
### Question Timeout
If approval question expires (24h):
1. Orchestrator logs timeout in review_log.
2. Calls `update_job_status('failed', error: 'approval_timeout')`.
3. Idea reverts to `PLAN_READY`.
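
The three failure paths in this section can be restated as one lookup table. This is only the steps above in code form: `FailureKind`, `FailureOutcome`, and `FAILURE_OUTCOMES` are hypothetical names, and the cancellation entry carries no error code because the text elides the remaining arguments of that `update_job_status` call.

```typescript
type FailureKind = "parse_failure" | "user_cancelled" | "approval_timeout";

interface FailureOutcome {
  jobStatus: "failed" | "skipped";             // first argument to update_job_status
  error: string | null;                        // error code, where the text names one
  ideaStatus: "REVIEWING_PLAN" | "PLAN_READY"; // idea status afterwards
}

// Hypothetical table summarizing the error-handling steps in this section.
const FAILURE_OUTCOMES: Record<FailureKind, FailureOutcome> = {
  parse_failure: {
    jobStatus: "failed",
    error: "plan_parse_failed",
    ideaStatus: "REVIEWING_PLAN", // no transition; user edits plan_md and retries
  },
  user_cancelled: {
    jobStatus: "skipped",
    error: null, // not specified in the text
    ideaStatus: "PLAN_READY",
  },
  approval_timeout: {
    jobStatus: "failed",
    error: "approval_timeout",
    ideaStatus: "PLAN_READY",
  },
};
```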
---
## Testing Strategy
### Unit Tests
- **Mock ReviewLog Generation:** Verify review-log JSON structure matches schema.
- **Convergence Calculation:** Diff percentage computation, stability threshold.
- **Status Transitions:** Valid state machine paths (PLAN_READY → REVIEWING_PLAN → PLAN_REVIEWED).
### Integration Tests
- **End-to-End:** Draft idea → Grill → Plan → Review → PLAN_REVIEWED.
- **Re-Review:** PLAN_REVIEWED → REVIEWING_PLAN → PLAN_REVIEWED (no data loss).
- **Cancellation:** Mid-review cancellation → revert to PLAN_READY.
- **Parse Errors:** Malformed plan_md → PLAN_REVIEW_FAILED.
### Manual Testing
1. Create test idea with PLAN_READY status.
2. Click "Review Plan".
3. Monitor job in Jobs dashboard.
4. Verify review-log in idea detail page.
5. Accept/reject approval.
6. Confirm status transition and IdeaLog entry.
---
## Future Enhancements
1. **Direct Multi-Model Calls:** Use Claude API to invoke Haiku, Sonnet, Opus separately with model switching.
2. **Codex Injection:** Auto-load and inject `docs/patterns/**/*.md` and `docs/architecture/**/*.md` as context.
3. **Configurable Thresholds:** Allow product-level convergence percentage and max-rounds settings.
4. **Review History:** Preserve all review-logs for audit trail and re-review diffs.
5. **Feedback Loop:** Log user edits between review rounds and suggest re-run based on delta.
6. **Scheduled Re-Review:** Auto-trigger review after N days (staleness check).
---
## References
- `docs/architecture/jobs.md` — Job system architecture.
- `docs/patterns/server-action.md` — Server action pattern (startReviewPlanJobAction).
- `docs/api/rest-contract.md` — API surface for plan-review.
- `lib/idea-status.ts` — Status transition graph and state machine.
- `lib/idea-plan-parser.ts` — Plan YAML parsing (validator for approved plans).