Scrum4Me/docs/implementation-complete/PHASE6-END-TO-END-TEST-PLAN.md
Janpeter Visser d84cdf664f
feat(PBI-67): IDEA_REVIEW_PLAN — iterative multi-model plan review (#199)
2026-05-14 03:35:02 +02:00


Phase 6: End-to-End Testing & Rollout Plan

Status: In Progress (Phase 6.2 - End-to-End Testing)
Date: May 14, 2026
Build Status: All 862 tests passing, TypeScript clean


Completion Status: Phases 1-5

Phase 1: Database & Config

  • Schema extended with plan_review_log (Json) and reviewed_at (DateTime)
  • IdeaStatus enum: REVIEWING_PLAN, PLAN_REVIEW_FAILED, PLAN_REVIEWED
  • ClaudeJobKind: IDEA_REVIEW_PLAN
  • IdeaLogType: PLAN_REVIEW_RESULT
  • Prisma migration created and applied
  • MCP schema synchronized

Phase 2: MCP Tool Implementation

  • MCP tool: update_idea_plan_reviewed (transaction-safe database updates)
  • Type validation via Zod
  • Error handling and access control
  • Tool registered in MCP server index

Phase 3: Server Actions & UI Components

  • Server action: startReviewPlanJobAction()
  • Server action: cancelIdeaJobAction() updated for IDEA_REVIEW_PLAN
  • Status transitions: PLAN_READY → REVIEWING_PLAN → PLAN_REVIEWED/PLAN_REVIEW_FAILED
  • UI status colors and labels
  • Job cards and filtering updated

Phase 4: Review Prompt Implementation

  • Prompt: lib/idea-prompts/review-plan-job.md (194 lines)
  • Prompt copied to MCP: scrum4me-mcp/src/prompts/idea/review-plan.md
  • Prompt registered in kind-prompts.ts
  • Documentation: docs/runbooks/review-plan-job.md
  • Test suite: __tests__/review-plan-job.test.ts (13/13 passing)

Phase 5: ReviewLogViewer UI Component

  • Component: components/ideas/review-log-viewer.tsx (241 lines)
  • ReviewLog type exported (properly typed)
  • Integration in idea detail page
  • Display: round-by-round analysis, convergence metrics, approval status
  • Styling: MD3 tokens for severity levels

Phase 1-5 Verification

  • TypeScript compilation: Clean
  • All tests passing: 862/862
  • ESLint: Fixed no-explicit-any errors with proper ReviewLog typing
  • Implementation is feature-complete; production readiness depends on the Phase 6 end-to-end tests

Phase 6: Integration & Rollout

6.1: Wire wait-for-job Discriminator ✅ DONE

  • Line 511 in scrum4me-mcp/src/tools/wait-for-job.ts: Added IDEA_REVIEW_PLAN to job kind condition
  • Line 574: Branch naming logic updated to return 'review' for IDEA_REVIEW_PLAN

6.2: End-to-End Testing 🔄 IN PROGRESS

Test Scenarios

Scenario 1: Trigger Review Job on PLAN_READY Idea

  • Select idea with status PLAN_READY (e.g., IDEA-016, IDEA-043, IDEA-049)
  • Verify idea has product_id with valid repo_url
  • Trigger startReviewPlanJobAction()
  • Verify:
    • ClaudeJob created with status QUEUED
    • Idea status flipped to REVIEWING_PLAN
    • IdeaLog entry created with type JOB_EVENT
    • Job payload contains correct job-config snapshot

Scenario 2: Job Execution by MCP Worker

  • Worker claims job via wait_for_job(IDEA_REVIEW_PLAN)
  • Verify returned payload contains:
    • idea_id, kind, plan_md, grill_md
    • plan_md parsed into YAML structure
    • job_config with model (claude-opus-4-7), thinking_budget (6000), allowed_tools
  • Verify job status transitions to CLAIMED → RUNNING

Scenario 3: Multi-Round Review Execution

  • Worker executes prompt: 3 review rounds in the Haiku → Sonnet → Opus roles (all three currently run on Opus; see Known Limitations)
  • Each round produces issues[], score (0-100), plan_diff_lines
  • Convergence detection: diff < 5% for 2 consecutive rounds triggers approval gate
  • Verify review-log JSON structure matches schema (see below)
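
The convergence rule in Scenario 3 can be checked with a small pure function. This is a minimal sketch, not code from the repository: the names `RoundDiff`, `diffPct` and `stableAtRound` are hypothetical, and it assumes the diff percentage is `plan_diff_lines` divided by the total number of plan lines, which the plan does not spell out.

```typescript
// Hypothetical helper, not taken from the Scrum4Me codebase.
interface RoundDiff {
  round: number;
  planDiffLines: number; // lines changed in the plan during this round
  planTotalLines: number; // total plan lines after the round (assumed > 0)
}

// Percentage of the plan that changed in one round (assumed definition).
function diffPct(r: RoundDiff): number {
  return (r.planDiffLines / r.planTotalLines) * 100;
}

// Returns the round number that completes `window` consecutive rounds with a
// diff strictly below `thresholdPct`, or null if the plan never stabilised.
function stableAtRound(
  rounds: RoundDiff[],
  thresholdPct = 5,
  window = 2
): number | null {
  let streak = 0;
  for (const r of rounds) {
    streak = diffPct(r) < thresholdPct ? streak + 1 : 0;
    if (streak >= window) return r.round;
  }
  return null;
}
```

Under these assumptions, a test can feed recorded `plan_diff_lines` values into `stableAtRound` and compare the result with `convergence.stable_at_round` in the stored review log.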

Scenario 4: Approval Gate & Status Transition

  • Worker calls ask_user_question with convergence metrics
  • User approves/rejects via chat interface
  • On approval: update_idea_plan_reviewed(approval_status='approved')
  • Verify atomic transaction:
    • plan_review_log saved to DB
    • reviewed_at timestamp set
    • Idea status: REVIEWING_PLAN → PLAN_REVIEWED
    • IdeaLog entry created with type PLAN_REVIEW_RESULT
  • On rejection: status → PLAN_REVIEW_FAILED

Scenario 5: UI Display of Review Results

  • Open idea page in plan tab
  • Verify ReviewLogViewer displays:
    • Summary and approval status badge
    • Convergence metrics (if present)
    • Round-by-round analysis (model, role, score, diff_lines, timestamp)
    • Issue badges per round (category, severity, suggestion)
    • Metadata: plan_file, creation date, round count

Scenario 6: State Transitions & Cancellation

  • While job is RUNNING, trigger cancelIdeaJobAction()
  • Verify:
    • Job status → CANCELLED
    • Idea status → PLAN_READY (revert to before review)
    • IdeaLog entry created: JOB_EVENT with cancel note

Review-Log Schema Validation

{
  "plan_file": "IDEA-016",
  "created_at": "2026-05-14T03:15:00Z",
  "rounds": [
    {
      "round": 0,
      "model": "claude-3-5-haiku",
      "role": "Structure Review",
      "focus": "YAML parsing, format, syntax",
      "issues": [
        {
          "category": "structure|logic|risk|pattern",
          "severity": "error|warning|info",
          "suggestion": "string"
        }
      ],
      "score": 75,
      "plan_diff_lines": 3,
      "converged": false,
      "timestamp": "2026-05-14T03:15:30Z"
    }
  ],
  "convergence": {
    "stable_at_round": 2,
    "final_diff_pct": 2.1,
    "convergence_metric": "plan_stability"
  },
  "approval": {
    "status": "pending|approved|rejected",
    "timestamp": "2026-05-14T03:20:00Z"
  },
  "summary": "Plan reviewed across 3 rounds..."
}
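
For the schema check in Scenario 3, a dependency-free validator along these lines may be useful in a test. It is a sketch only: Phase 2 lists Zod for type validation, so a real check would probably reuse that schema. The function `validateReviewLog` and its helpers are hypothetical, and it covers only a subset of the fields in the example above.

```typescript
// Hypothetical validator: field names follow the example review log above,
// but the function itself is not part of the Scrum4Me codebase.
const CATEGORIES: readonly string[] = ["structure", "logic", "risk", "pattern"];
const SEVERITIES: readonly string[] = ["error", "warning", "info"];
const APPROVAL_STATUSES: readonly string[] = ["pending", "approved", "rejected"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function oneOf(v: unknown, allowed: readonly string[]): boolean {
  return typeof v === "string" && allowed.indexOf(v) !== -1;
}

// Returns human-readable problems; an empty list means the log passed.
function validateReviewLog(log: unknown): string[] {
  if (!isRecord(log)) return ["review log must be an object"];
  const errors: string[] = [];

  if (typeof log["plan_file"] !== "string") errors.push("plan_file must be a string");
  if (typeof log["summary"] !== "string") errors.push("summary must be a string");

  const approval = log["approval"];
  if (!isRecord(approval) || !oneOf(approval["status"], APPROVAL_STATUSES)) {
    errors.push("approval.status must be pending, approved or rejected");
  }

  const rounds = log["rounds"];
  if (!Array.isArray(rounds)) {
    errors.push("rounds must be an array");
    return errors;
  }
  rounds.forEach((round: unknown, i: number) => {
    if (!isRecord(round)) {
      errors.push(`rounds[${i}] must be an object`);
      return;
    }
    const score = round["score"];
    if (typeof score !== "number" || score < 0 || score > 100) {
      errors.push(`rounds[${i}].score must be a number in 0-100`);
    }
    if (typeof round["plan_diff_lines"] !== "number") {
      errors.push(`rounds[${i}].plan_diff_lines must be a number`);
    }
    const issues = round["issues"];
    if (!Array.isArray(issues)) {
      errors.push(`rounds[${i}].issues must be an array`);
      return;
    }
    issues.forEach((issue: unknown, j: number) => {
      if (
        !isRecord(issue) ||
        !oneOf(issue["category"], CATEGORIES) ||
        !oneOf(issue["severity"], SEVERITIES) ||
        typeof issue["suggestion"] !== "string"
      ) {
        errors.push(`rounds[${i}].issues[${j}] has an invalid category, severity or suggestion`);
      }
    });
  });
  return errors;
}
```

Returning a list of problems instead of a boolean keeps the failure output readable when the check runs against a `plan_review_log` value loaded from the database.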

Test Checklist

  • Database: plan_review_log field persists correctly
  • MCP: Prompt injection (codex context) works
  • MCP: Model switching is simulated correctly (all rounds run via Opus)
  • Convergence: Math correct (< 5% change threshold)
  • Approval: Atomic transaction commits on approve/reject
  • UI: ReviewLogViewer renders all data correctly
  • UI: Status transitions visible in idea detail page
  • Error paths: Handle malformed plan_md gracefully
  • Error paths: Handle missing product repo_url
  • Error paths: Handle parse failures in Zod validation

6.3: Verify IdeaLog Entries & Persistence 📋

  • JOB_EVENT log entries: queued, claimed, running, done, failed, cancelled
  • PLAN_REVIEW_RESULT log entry with convergence metadata
  • Timeline display: logs appear in idea detail → timeline tab
  • Metadata validation: all fields present and correctly typed

6.4: Feature Flag Management 📋

  • If feature flag exists: gate IDEA_REVIEW_PLAN creation to enabled users
  • If not: decide on rollout strategy (gradual or all-at-once)
  • Document flag semantics (server-side or client-side)

6.5: Staging Rollout (24h Test) 📋

  • Deploy to staging environment
  • Enable IDEA_REVIEW_PLAN for staging users (100%)
  • Monitor: job execution, error rates, performance
  • Verify: no regressions in existing idea workflows (grill, make-plan)
  • Smoke test: trigger review jobs on 3-5 different ideas
  • Check: review-log data integrity, IdeaLog audit trail

6.6: Gradual Rollout to Production 📋

  • Phase 1: 10% of active users get IDEA_REVIEW_PLAN enabled
  • Phase 2 (24h later): 50% of users
  • Phase 3 (24h later): 100% of users
  • Rollback plan: disable feature flag if error rate > threshold
  • Monitor:
    • Job success rate (goal: > 95%)
    • Review-log schema validation errors
    • Worker capacity utilization
    • User feedback (approval acceptance rate)
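
One way to implement the 10% → 50% → 100% steps is deterministic bucketing on the user id, so that a user enabled at 10% stays enabled at later steps. The sketch below is an assumption, not the project's flag mechanism (6.4 leaves open whether a flag exists at all); `stringHash`, `rolloutBucket`, `isEnabled` and the flag name in the usage note are hypothetical.

```typescript
// Hypothetical rollout helper; Scrum4Me's actual flag mechanism is undecided.

// Simple 31-multiplier string hash, kept within unsigned 32-bit range.
// It is stable across processes, so a user always lands in the same bucket.
function stringHash(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Maps a user to a bucket in 0..99. Including the flag name keeps buckets
// independent between different features.
function rolloutBucket(userId: string, flag: string): number {
  return stringHash(`${flag}:${userId}`) % 100;
}

// A user is enabled when their bucket falls below the rollout percentage.
function isEnabled(userId: string, flag: string, percent: number): boolean {
  return rolloutBucket(userId, flag) < percent;
}
```

Because the bucket is fixed per user, raising the percentage only ever adds users, and setting it to 0 acts as the rollback switch. A call might look like `isEnabled(user.id, "idea-review-plan", 10)`, evaluated server-side before `startReviewPlanJobAction()` creates the job.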

Key Implementation Details

Job-Config Snapshot

{
  kind: 'IDEA_REVIEW_PLAN',
  model_override: 'claude-opus-4-7',
  thinking_budget: 6000,
  allowed_tools: ['read', 'write', 'grep', 'glob', ...mcp_tools],
  verify_required: 'ALIGNED_OR_PARTIAL',
  verify_only: false
}

Prompt Execution Pipeline

  1. Worker loads plan_md + grill_md from DB
  2. Codex injection: load docs/patterns/, docs/architecture/, CLAUDE.md
  3. Round 1: Haiku reviews structure
  4. Round 2: Sonnet reviews logic/patterns
  5. Round 3: Opus reviews risks/edge-cases
  6. Convergence check: break if stable
  7. Ask user approval via ask_user_question
  8. On approval: save review-log, transition status, log PLAN_REVIEW_RESULT

Status Transition Rules

  • PLAN_READY → REVIEWING_PLAN: startReviewPlanJobAction()
  • REVIEWING_PLAN → PLAN_REVIEWED: User approves via ask_user_question
  • REVIEWING_PLAN → PLAN_REVIEW_FAILED: User rejects
  • REVIEWING_PLAN → PLAN_READY: User cancels job
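
The rules above map naturally onto a lookup table, which makes illegal transitions easy to assert in tests. The sketch is hypothetical: the type and function names are illustrative, and it treats PLAN_REVIEWED and PLAN_REVIEW_FAILED as having no outgoing review transitions only because none are listed here.

```typescript
// Hypothetical transition table built from the four rules listed above.
type IdeaStatus =
  | "PLAN_READY"
  | "REVIEWING_PLAN"
  | "PLAN_REVIEWED"
  | "PLAN_REVIEW_FAILED";

type ReviewEvent = "start" | "approve" | "reject" | "cancel";

const TRANSITIONS: Record<IdeaStatus, Partial<Record<ReviewEvent, IdeaStatus>>> = {
  PLAN_READY: { start: "REVIEWING_PLAN" },
  REVIEWING_PLAN: {
    approve: "PLAN_REVIEWED",
    reject: "PLAN_REVIEW_FAILED",
    cancel: "PLAN_READY",
  },
  // Assumption: no review transitions leave these states, since none are listed.
  PLAN_REVIEWED: {},
  PLAN_REVIEW_FAILED: {},
};

// Returns the next status, or null when the event is not allowed in `current`.
function nextStatus(current: IdeaStatus, event: ReviewEvent): IdeaStatus | null {
  return TRANSITIONS[current][event] ?? null;
}
```

A test for Scenario 6 could then assert that `nextStatus("REVIEWING_PLAN", "cancel")` yields `PLAN_READY` and that approving an idea that is not under review is rejected.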

Known Limitations & Future Work

  1. No multi-model API calls: All rounds use Opus (future: leverage Claude API direct model switching)
  2. No codex re-loading: Docs injected once (future: smart context selection per round)
  3. No re-review detection: No diff against previous reviews (future: highlight deltas)
  4. No manual review-log edit: Users cannot edit the review-log directly (future: could be added)

References

  • Phase 4 prompt: lib/idea-prompts/review-plan-job.md
  • Implementation guide: docs/runbooks/review-plan-job.md
  • ReviewLog types: components/ideas/review-log-viewer.tsx
  • Server action: actions/ideas.ts → startReviewPlanJobAction()
  • MCP tool: scrum4me-mcp/src/tools/update-idea-plan-reviewed.ts
  • Tests: __tests__/review-plan-job.test.ts

Next Steps (Immediate)

  1. Start Phase 6.2: Manually trigger review job on IDEA-016
  2. Monitor job execution: Check logs, review-log schema
  3. Verify UI display: ReviewLogViewer renders correctly
  4. Document blockers: If any failures occur, diagnose and document
  5. Proceed to staging: Once E2E test passes