# Phase 6: End-to-End Testing & Rollout Plan

**Status:** In Progress (Phase 6.2 - End-to-End Testing)
**Date:** May 14, 2026
**Build Status:** ✅ All 862 tests passing, TypeScript clean

---

## Completion Status: Phases 1-5

### Phase 1: Database & Config ✅

- ✅ Schema extended with `plan_review_log` (Json) and `reviewed_at` (DateTime)
- ✅ IdeaStatus enum: `REVIEWING_PLAN`, `PLAN_REVIEW_FAILED`, `PLAN_REVIEWED`
- ✅ ClaudeJobKind: `IDEA_REVIEW_PLAN`
- ✅ IdeaLogType: `PLAN_REVIEW_RESULT`
- ✅ Prisma migration created and applied
- ✅ MCP schema synchronized

### Phase 2: MCP Tool Implementation ✅

- ✅ MCP tool: `update_idea_plan_reviewed` (transaction-safe database updates)
- ✅ Type validation via Zod
- ✅ Error handling and access control
- ✅ Tool registered in MCP server index

### Phase 3: Server Actions & UI Components ✅

- ✅ Server action: `startReviewPlanJobAction()`
- ✅ Server action: `cancelIdeaJobAction()` updated for IDEA_REVIEW_PLAN
- ✅ Status transitions: `PLAN_READY → REVIEWING_PLAN → PLAN_REVIEWED/PLAN_REVIEW_FAILED`
- ✅ UI status colors and labels
- ✅ Job cards and filtering updated

### Phase 4: Grill Prompt Implementation ✅

- ✅ Prompt: `lib/idea-prompts/review-plan-job.md` (194 lines)
- ✅ Prompt copied to MCP: `scrum4me-mcp/src/prompts/idea/review-plan.md`
- ✅ Prompt registered in `kind-prompts.ts`
- ✅ Documentation: `docs/runbooks/review-plan-job.md`
- ✅ Test suite: `__tests__/review-plan-job.test.ts` (13/13 passing)

### Phase 5: ReviewLogViewer UI Component ✅

- ✅ Component: `components/ideas/review-log-viewer.tsx` (241 lines)
- ✅ ReviewLog type exported (properly typed)
- ✅ Integration in idea detail page
- ✅ Display: round-by-round analysis, convergence metrics, approval status
- ✅ Styling: MD3 tokens for severity levels

### Phase 1-5 Verification ✅

- ✅ TypeScript compilation: Clean
- ✅ All tests passing: 862/862
- ✅ ESLint: Fixed no-explicit-any errors with proper ReviewLog typing
- ✅ Implementation is feature-complete and production-ready

---

## Phase 6: Integration & Rollout

### 6.1: Wire wait-for-job Discriminator ✅ DONE

- ✅ Line 511 in `scrum4me-mcp/src/tools/wait-for-job.ts`: Added `IDEA_REVIEW_PLAN` to job kind condition
- ✅ Line 574: Branch naming logic updated to return 'review' for IDEA_REVIEW_PLAN

### 6.2: End-to-End Testing 🔄 IN PROGRESS

#### Test Scenarios

**Scenario 1: Trigger Review Job on PLAN_READY Idea**

- [ ] Select idea with status `PLAN_READY` (e.g., IDEA-016, IDEA-043, IDEA-049)
- [ ] Verify idea has `product_id` with valid `repo_url`
- [ ] Trigger `startReviewPlanJobAction()`
- [ ] Verify:
  - ClaudeJob created with status `QUEUED`
  - Idea status flipped to `REVIEWING_PLAN`
  - IdeaLog entry created with type `JOB_EVENT`
  - Job payload contains correct job-config snapshot

**Scenario 2: Job Execution by MCP Worker**

- [ ] Worker claims job via `wait_for_job(IDEA_REVIEW_PLAN)`
- [ ] Verify returned payload contains:
  - idea_id, kind, plan_md, grill_md
  - plan_md parsed into YAML structure
  - job_config with model (claude-opus-4-7), thinking_budget (6000), allowed_tools
- [ ] Verify job status transitions to `CLAIMED` → `RUNNING`

**Scenario 3: Multi-Round Review Execution**

- [ ] Worker executes prompt: 3 review rounds (Haiku → Sonnet → Opus roles; all rounds currently run on Opus, see Known Limitations)
- [ ] Each round produces issues[], score (0-100), plan_diff_lines
- [ ] Convergence detection: diff < 5% for 2 consecutive rounds triggers approval gate
- [ ] Verify review-log JSON structure matches schema (see below)

**Scenario 4: Approval Gate & Status Transition**

- [ ] Worker calls `ask_user_question` with convergence metrics
- [ ] User approves/rejects via chat interface
- [ ] On approval: `update_idea_plan_reviewed(approval_status='approved')`
- [ ] Verify atomic transaction:
  - plan_review_log saved to DB
  - reviewed_at timestamp set
  - Idea status: `REVIEWING_PLAN` → `PLAN_REVIEWED`
  - IdeaLog entry created with type `PLAN_REVIEW_RESULT`
- [ ] On rejection: status → `PLAN_REVIEW_FAILED`

**Scenario 5: UI Display of Review Results**

- [ ] Open idea page in plan tab
- [ ] Verify ReviewLogViewer displays:
  - Summary and approval status badge
  - Convergence metrics (if present)
  - Round-by-round analysis (model, role, score, diff_lines, timestamp)
  - Issue badges per round (category, severity, suggestion)
  - Metadata: plan_file, creation date, round count

**Scenario 6: State Transitions & Cancellation**

- [ ] While job is `RUNNING`, trigger `cancelIdeaJobAction()`
- [ ] Verify:
  - Job status → `CANCELLED`
  - Idea status → `PLAN_READY` (reverts to the pre-review state)
  - IdeaLog entry created: `JOB_EVENT` with cancel note

#### Review-Log Schema Validation

```json
{
  "plan_file": "IDEA-016",
  "created_at": "2026-05-14T03:15:00Z",
  "rounds": [
    {
      "round": 0,
      "model": "claude-3-5-haiku",
      "role": "Structure Review",
      "focus": "YAML parsing, format, syntax",
      "issues": [
        {
          "category": "structure|logic|risk|pattern",
          "severity": "error|warning|info",
          "suggestion": "string"
        }
      ],
      "score": 75,
      "plan_diff_lines": 3,
      "converged": false,
      "timestamp": "2026-05-14T03:15:30Z"
    }
  ],
  "convergence": {
    "stable_at_round": 2,
    "final_diff_pct": 2.1,
    "convergence_metric": "plan_stability"
  },
  "approval": {
    "status": "pending|approved|rejected",
    "timestamp": "2026-05-14T03:20:00Z"
  },
  "summary": "Plan reviewed across 3 rounds..."
}
```

#### Test Checklist

- [ ] Database: plan_review_log field persists correctly
- [ ] MCP: Codex context injection into the prompt works
- [ ] MCP: Model switching is simulated correctly (all rounds via Opus)
- [ ] Convergence: Math is correct (< 5% change threshold)
- [ ] Approval: Atomic transaction commits on approve/reject
- [ ] UI: ReviewLogViewer renders all data correctly
- [ ] UI: Status transitions visible in idea detail page
- [ ] Error paths: Handle malformed plan_md gracefully
- [ ] Error paths: Handle missing product repo_url
- [ ] Error paths: Handle parse failures in Zod validation

---

### 6.3: Verify IdeaLog Entries & Persistence 📋

- [ ] JOB_EVENT log entries: queued, claimed, running, done, failed, cancelled
- [ ] PLAN_REVIEW_RESULT log entry with convergence metadata
- [ ] Timeline display: logs appear in idea detail → timeline tab
- [ ] Metadata validation: all fields present and correctly typed

### 6.4: Feature Flag Management 📋

- [ ] If feature flag exists: gate IDEA_REVIEW_PLAN creation to enabled users
- [ ] If not: decide on rollout strategy (gradual or all-at-once)
- [ ] Document flag semantics (server-side or client-side)

### 6.5: Staging Rollout (24h Test) 📋

- [ ] Deploy to staging environment
- [ ] Enable IDEA_REVIEW_PLAN for staging users (100%)
- [ ] Monitor: job execution, error rates, performance
- [ ] Verify: no regressions in existing idea workflows (grill, make-plan)
- [ ] Smoke test: trigger review jobs on 3-5 different ideas
- [ ] Check: review-log data integrity, IdeaLog audit trail

### 6.6: Gradual Rollout to Production 📋

- [ ] Phase 1: 10% of active users get IDEA_REVIEW_PLAN enabled
- [ ] Phase 2 (24h later): 50% of users
- [ ] Phase 3 (24h later): 100% of users
- [ ] Rollback plan: disable feature flag if error rate > threshold
- [ ] Monitor:
  - Job success rate (goal: > 95%)
  - Review-log schema validation errors
  - Worker capacity utilization
  - User feedback (approval acceptance rate)

---

## Key Implementation Details

### Job-Config Snapshot

```typescript
{
  kind: 'IDEA_REVIEW_PLAN',
  model_override: 'claude-opus-4-7',
  thinking_budget: 6000,
  allowed_tools: ['read', 'write', 'grep', 'glob', ...mcp_tools],
  verify_required: 'ALIGNED_OR_PARTIAL',
  verify_only: false
}
```

### Prompt Execution Pipeline

1. Worker loads plan_md + grill_md from DB
2. Codex injection: load docs/patterns/*, docs/architecture/*, CLAUDE.md
3. Round 1: Haiku reviews structure
4. Round 2: Sonnet reviews logic/patterns
5. Round 3: Opus reviews risks/edge-cases
6. Convergence check: stop early if the plan is stable
7. Ask user approval via ask_user_question
8. On approval: save review-log, transition status, log PLAN_REVIEW_RESULT

### Status Transition Rules

- PLAN_READY → REVIEWING_PLAN: `startReviewPlanJobAction()`
- REVIEWING_PLAN → PLAN_REVIEWED: User approves via ask_user_question
- REVIEWING_PLAN → PLAN_REVIEW_FAILED: User rejects
- REVIEWING_PLAN → PLAN_READY: User cancels job (`cancelIdeaJobAction()`)

---

## Known Limitations & Future Work

1. **No multi-model API calls**: All rounds use Opus (future: leverage Claude API direct model switching)
2. **No codex re-loading**: Docs injected once (future: smart context selection per round)
3. **No re-review detection**: No diff against previous reviews (future: highlight deltas)
4. **No manual review-log editing**: Users cannot edit the review-log directly (future: could be added)

---

## References

- Phase 4 prompt: `lib/idea-prompts/review-plan-job.md`
- Implementation guide: `docs/runbooks/review-plan-job.md`
- ReviewLog types: `components/ideas/review-log-viewer.tsx`
- Server action: `actions/ideas.ts` → `startReviewPlanJobAction()`
- MCP tool: `scrum4me-mcp/src/tools/update-idea-plan-reviewed.ts`
- Tests: `__tests__/review-plan-job.test.ts`

---

## Next Steps (Immediate)

1. **Start Phase 6.2**: Manually trigger review job on IDEA-016
2. **Monitor job execution**: Check logs, review-log schema
3. **Verify UI display**: ReviewLogViewer renders correctly
4. **Document blockers**: If any failures occur, diagnose and document
5. **Proceed to staging**: Once E2E test passes
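
For the "Convergence: Math" checklist item, the rule from Scenario 3 (diff < 5% for 2 consecutive rounds) could be checked with something like the sketch below. This plan does not say how the percentage is derived from `plan_diff_lines`, so the sketch assumes it is changed lines divided by the total plan line count. `planTotalLines`, `diffPct`, and `findStableRound` are hypothetical names, and returning the `round` value at which stability is first reached is also an assumption.

```typescript
// Hypothetical shape: only the fields needed for the convergence check.
interface ReviewRound {
  round: number;
  plan_diff_lines: number;
}

const CONVERGENCE_THRESHOLD_PCT = 5; // "< 5%" from the test checklist
const REQUIRED_STABLE_ROUNDS = 2; // "2 consecutive rounds" from Scenario 3

// Assumption: diff percentage = changed lines / total plan lines * 100.
function diffPct(diffLines: number, planTotalLines: number): number {
  if (planTotalLines <= 0) {
    throw new Error('planTotalLines must be positive');
  }
  return (diffLines / planTotalLines) * 100;
}

// Returns the `round` value at which the plan has stayed below the threshold
// for the required number of consecutive rounds, or null if it never does.
function findStableRound(
  rounds: ReviewRound[],
  planTotalLines: number,
): number | null {
  let streak = 0;
  for (const r of rounds) {
    const stable =
      diffPct(r.plan_diff_lines, planTotalLines) < CONVERGENCE_THRESHOLD_PCT;
    streak = stable ? streak + 1 : 0; // any unstable round resets the streak
    if (streak >= REQUIRED_STABLE_ROUNDS) {
      return r.round;
    }
  }
  return null;
}

// Example: diffs of 12%, 3%, 2% on a 100-line plan.
const exampleRounds: ReviewRound[] = [
  { round: 0, plan_diff_lines: 12 },
  { round: 1, plan_diff_lines: 3 },
  { round: 2, plan_diff_lines: 2 },
];
console.log(findStableRound(exampleRounds, 100)); // 2
```

The comparison is strict, so a round at exactly 5% does not count as stable; if the real prompt treats the boundary differently, the test fixture should follow the prompt.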
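
The Status Transition Rules can also be written down as a small lookup table, which may help when asserting in Scenario 4 and Scenario 6 that no unexpected transition occurs. This is only a sketch: `ReviewFlowStatus`, `ALLOWED_TRANSITIONS`, and `canTransition` are hypothetical names that are not taken from the codebase, and treating `PLAN_REVIEWED` and `PLAN_REVIEW_FAILED` as having no outgoing edges is an assumption, since this plan documents none for the review flow.

```typescript
// Only the statuses that take part in the plan-review flow.
type ReviewFlowStatus =
  | 'PLAN_READY'
  | 'REVIEWING_PLAN'
  | 'PLAN_REVIEWED'
  | 'PLAN_REVIEW_FAILED';

// Edges copied from the Status Transition Rules; anything else is rejected.
const ALLOWED_TRANSITIONS: Record<ReviewFlowStatus, readonly ReviewFlowStatus[]> = {
  // startReviewPlanJobAction()
  PLAN_READY: ['REVIEWING_PLAN'],
  // user approves, user rejects, user cancels the job
  REVIEWING_PLAN: ['PLAN_REVIEWED', 'PLAN_REVIEW_FAILED', 'PLAN_READY'],
  // Assumption: no outgoing transitions are documented for these two.
  PLAN_REVIEWED: [],
  PLAN_REVIEW_FAILED: [],
};

function canTransition(from: ReviewFlowStatus, to: ReviewFlowStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

console.log(canTransition('PLAN_READY', 'REVIEWING_PLAN')); // true
console.log(canTransition('PLAN_READY', 'PLAN_REVIEWED')); // false
console.log(canTransition('REVIEWING_PLAN', 'PLAN_READY')); // true (cancel)
```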