Scrum4Me/docs/implementation-complete/PHASE6-END-TO-END-TEST-PLAN.md

# Phase 6: End-to-End Testing & Rollout Plan

**Status:** In Progress (Phase 6.2 - End-to-End Testing)
**Date:** May 14, 2026
**Build Status:** ✅ All 862 tests passing, TypeScript clean

---

## Completion Status: Phases 1-5

### Phase 1: Database & Config ✅
- ✅ Schema extended with `plan_review_log` (Json) and `reviewed_at` (DateTime)
- ✅ IdeaStatus enum: `REVIEWING_PLAN`, `PLAN_REVIEW_FAILED`, `PLAN_REVIEWED`
- ✅ ClaudeJobKind: `IDEA_REVIEW_PLAN`
- ✅ IdeaLogType: `PLAN_REVIEW_RESULT`
- ✅ Prisma migration created and applied
- ✅ MCP schema synchronized

### Phase 2: MCP Tool Implementation ✅
- ✅ MCP tool: `update_idea_plan_reviewed` (transaction-safe database updates)
- ✅ Type validation via Zod
- ✅ Error handling and access control
- ✅ Tool registered in MCP server index

### Phase 3: Server Actions & UI Components ✅
- ✅ Server action: `startReviewPlanJobAction()`
- ✅ Server action: `cancelIdeaJobAction()` updated for IDEA_REVIEW_PLAN
- ✅ Status transitions: `PLAN_READY → REVIEWING_PLAN → PLAN_REVIEWED/PLAN_REVIEW_FAILED`
- ✅ UI status colors and labels
- ✅ Job cards and filtering updated

### Phase 4: Grill Prompt Implementation ✅
- ✅ Prompt: `lib/idea-prompts/review-plan-job.md` (194 lines)
- ✅ Prompt copied to MCP: `scrum4me-mcp/src/prompts/idea/review-plan.md`
- ✅ Prompt registered in `kind-prompts.ts`
- ✅ Documentation: `docs/runbooks/review-plan-job.md`
- ✅ Test suite: `__tests__/review-plan-job.test.ts` (13/13 passing)

### Phase 5: ReviewLogViewer UI Component ✅
- ✅ Component: `components/ideas/review-log-viewer.tsx` (241 lines)
- ✅ ReviewLog type exported (properly typed)
- ✅ Integration in idea detail page
- ✅ Display: round-by-round analysis, convergence metrics, approval status
- ✅ Styling: MD3 tokens for severity levels

### Phase 1-5 Verification ✅
- ✅ TypeScript compilation: Clean
- ✅ All tests passing: 862/862
- ✅ ESLint: Fixed no-explicit-any errors with proper ReviewLog typing
- ✅ Implementation is feature-complete and production-ready

---

## Phase 6: Integration & Rollout

### 6.1: Wire wait-for-job Discriminator ✅ DONE
- ✅ Line 511 in `scrum4me-mcp/src/tools/wait-for-job.ts`: Added `IDEA_REVIEW_PLAN` to job kind condition
- ✅ Line 574: Branch naming logic updated to return 'review' for IDEA_REVIEW_PLAN

### 6.2: End-to-End Testing 🔄 IN PROGRESS

#### Test Scenarios

**Scenario 1: Trigger Review Job on PLAN_READY Idea**
- [ ] Select idea with status `PLAN_READY` (e.g., IDEA-016, IDEA-043, IDEA-049)
- [ ] Verify idea has `product_id` with valid `repo_url`
- [ ] Trigger `startReviewPlanJobAction()`
- [ ] Verify:
  - ClaudeJob created with status `QUEUED`
  - Idea status flipped to `REVIEWING_PLAN`
  - IdeaLog entry created with type `JOB_EVENT`
  - Job payload contains correct job-config snapshot

**Scenario 2: Job Execution by MCP Worker**
- [ ] Worker claims job via `wait_for_job(IDEA_REVIEW_PLAN)`
- [ ] Verify returned payload contains:
  - idea_id, kind, plan_md, grill_md
  - plan_md parsed into YAML structure
  - job_config with model (claude-opus-4-7), thinking_budget (6000), allowed_tools
- [ ] Verify job status transitions to `CLAIMED` → `RUNNING`

**Scenario 3: Multi-Round Review Execution**
- [ ] Worker executes prompt: 3 review rounds (Haiku → Sonnet → Opus)
- [ ] Each round produces issues[], score (0-100), plan_diff_lines
- [ ] Convergence detection: diff < 5% for 2 consecutive rounds triggers approval gate
- [ ] Verify review-log JSON structure matches schema (see below)

**Scenario 4: Approval Gate & Status Transition**
- [ ] Worker calls `ask_user_question` with convergence metrics
- [ ] User approves/rejects via chat interface
- [ ] On approval: `update_idea_plan_reviewed(approval_status='approved')`
- [ ] Verify atomic transaction:
  - plan_review_log saved to DB
  - reviewed_at timestamp set
  - Idea status: `REVIEWING_PLAN` → `PLAN_REVIEWED`
  - IdeaLog entry created with type `PLAN_REVIEW_RESULT`
- [ ] On rejection: status → `PLAN_REVIEW_FAILED`

**Scenario 5: UI Display of Review Results**
- [ ] Open idea page in plan tab
- [ ] Verify ReviewLogViewer displays:
  - Summary and approval status badge
  - Convergence metrics (if present)
  - Round-by-round analysis (model, role, score, diff_lines, timestamp)
  - Issue badges per round (category, severity, suggestion)
  - Metadata: plan_file, creation date, round count

**Scenario 6: State Transitions & Cancellation**
- [ ] While job is `RUNNING`, trigger `cancelIdeaJobAction()`
- [ ] Verify:
  - Job status → `CANCELLED`
  - Idea status → `PLAN_READY` (revert to before review)
  - IdeaLog entry created: `JOB_EVENT` with cancel note

#### Review-Log Schema Validation

```json
{
  "plan_file": "IDEA-016",
  "created_at": "2026-05-14T03:15:00Z",
  "rounds": [
    {
      "round": 0,
      "model": "claude-3-5-haiku",
      "role": "Structure Review",
      "focus": "YAML parsing, format, syntax",
      "issues": [
        {
          "category": "structure|logic|risk|pattern",
          "severity": "error|warning|info",
          "suggestion": "string"
        }
      ],
      "score": 75,
      "plan_diff_lines": 3,
      "converged": false,
      "timestamp": "2026-05-14T03:15:30Z"
    }
  ],
  "convergence": {
    "stable_at_round": 2,
    "final_diff_pct": 2.1,
    "convergence_metric": "plan_stability"
  },
  "approval": {
    "status": "pending|approved|rejected",
    "timestamp": "2026-05-14T03:20:00Z"
  },
  "summary": "Plan reviewed across 3 rounds..."
}
```

#### Test Checklist
- [ ] Database: plan_review_log field persists correctly
- [ ] MCP: Prompt injection (codex context) works
- [ ] MCP: Model switching simulates correctly (all rounds via Opus)
- [ ] Convergence: Math correct (< 5% change threshold)
- [ ] Approval: Atomic transaction commits on approve/reject
- [ ] UI: ReviewLogViewer renders all data correctly
- [ ] UI: Status transitions visible in idea detail page
- [ ] Error paths: Handle malformed plan_md gracefully
- [ ] Error paths: Handle missing product repo_url
- [ ] Error paths: Handle parse failures in Zod validation

---

### 6.3: Verify IdeaLog Entries & Persistence 📋
- [ ] JOB_EVENT log entries: queued, claimed, running, done, failed, cancelled
- [ ] PLAN_REVIEW_RESULT log entry with convergence metadata
- [ ] Timeline display: logs appear in idea detail → timeline tab
- [ ] Metadata validation: all fields present and correctly typed

### 6.4: Feature Flag Management 📋
- [ ] If feature flag exists: gate IDEA_REVIEW_PLAN creation to enabled users
- [ ] If not: decide on rollout strategy (gradual or all-at-once)
- [ ] Document flag semantics (server-side or client-side)

### 6.5: Staging Rollout (24h Test) 📋
- [ ] Deploy to staging environment
- [ ] Enable IDEA_REVIEW_PLAN for staging users (100%)
- [ ] Monitor: job execution, error rates, performance
- [ ] Verify: no regressions in existing idea workflows (grill, make-plan)
- [ ] Smoke test: trigger review jobs on 3-5 different ideas
- [ ] Check: review-log data integrity, IdeaLog audit trail

### 6.6: Gradual Rollout to Production 📋
- [ ] Phase 1: 10% of active users get IDEA_REVIEW_PLAN enabled
- [ ] Phase 2 (24h later): 50% of users
- [ ] Phase 3 (24h later): 100% of users
- [ ] Rollback plan: disable feature flag if error rate > threshold
- [ ] Monitor:
  - Job success rate (goal: > 95%)
  - Review-log schema validation errors
  - Worker capacity utilization
  - User feedback (approval acceptance rate)

---

## Key Implementation Details

### Job-Config Snapshot
```typescript
{
  kind: 'IDEA_REVIEW_PLAN',
  model_override: 'claude-opus-4-7',
  thinking_budget: 6000,
  allowed_tools: ['read', 'write', 'grep', 'glob', ...mcp_tools],
  verify_required: 'ALIGNED_OR_PARTIAL',
  verify_only: false
}
```

### Prompt Execution Pipeline
1. Worker loads plan_md + grill_md from DB
2. Codex injection: load docs/patterns/*, docs/architecture/*, CLAUDE.md
3. Round 1: Haiku reviews structure
4. Round 2: Sonnet reviews logic/patterns
5. Round 3: Opus reviews risks/edge-cases
6. Convergence check: break if stable
7. Ask user approval via ask_user_question
8. On approval: save review-log, transition status, log PLAN_REVIEW_RESULT

### Status Transition Rules
- PLAN_READY → REVIEWING_PLAN: `startReviewPlanJobAction()`
- REVIEWING_PLAN → PLAN_REVIEWED: User approves via ask_user_question
- REVIEWING_PLAN → PLAN_REVIEW_FAILED: User rejects
- REVIEWING_PLAN → PLAN_READY: User cancels job

---

## Known Limitations & Future Work

1. **No multi-model API calls**: All rounds use Opus (future: leverage Claude API direct model switching)
2. **No codex re-loading**: Docs injected once (future: smart context selection per round)
3. **No re-review detection**: No diff against previous reviews (future: highlight deltas)
4. **Manual review-log edit**: Users cannot edit review-log directly (future: could add)

---

## References

- Phase 4 prompt: `lib/idea-prompts/review-plan-job.md`
- Implementation guide: `docs/runbooks/review-plan-job.md`
- ReviewLog types: `components/ideas/review-log-viewer.tsx`
- Server action: `actions/ideas.ts` → `startReviewPlanJobAction()`
- MCP tool: `scrum4me-mcp/src/tools/update-idea-plan-reviewed.ts`
- Tests: `__tests__/review-plan-job.test.ts`

---

## Next Steps (Immediate)

1. **Start Phase 6.2**: Manually trigger review job on IDEA-016
2. **Monitor job execution**: Check logs, review-log schema
3. **Verify UI display**: ReviewLogViewer renders correctly
4. **Document blockers**: If any failures occur, diagnose and document
5. **Proceed to staging**: Once E2E test passes