Madhura68 e75bac9375 docs(PBI-58): add developer manual chapters under docs/manual/
Adds a 7-file English-language manual targeted at new human contributors:
index, overview, statuses & transitions (with mermaid state diagrams),
git workflow, MCP integration, docker, and troubleshooting. The manual
is the *map* — it cross-references existing runbooks/ADRs/architecture
docs rather than duplicating their content.

Regenerates docs/INDEX.md and validates with check-doc-links.mjs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:37:43 +02:00


| title | status | audience | language | last_updated | when_to_read |
|---|---|---|---|---|---|
| Troubleshooting | active | contributor | en | 2026-05-07 | When something breaks. Start with the symptom table; fall back to the error-code reference. |

06 — Troubleshooting

This chapter is the first place to look when something is wrong. Each row links to the authoritative source so you can dig deeper without losing your trail.

Error code reference

These three HTTP status codes are non-negotiable hardstops in the API surface — they always mean the same thing across every route handler.

| Code | Meaning | Where it comes from |
|---|---|---|
| 400 | JSON parse error | The request body couldn't be parsed as JSON. Usually a malformed request from a client. |
| 422 | Zod validation error | The body parsed, but failed schema validation. The response includes the offending field path. |
| 403 | Demo-user write blocked | An authenticated user with is_demo = true attempted a write. Three layers enforce this; see docs/adr/0006-demo-user-three-layer-policy.md. |

Hardstop: these codes are reserved. Do not use 400 for validation errors or 422 for unauthorised access. The contract is enforced at the route-handler level — see the Route Handler pattern.
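A minimal sketch of how a single write handler could keep the three hardstop codes apart. It is not the project's Route Handler pattern: the check order (401, then 403, then 400, then 422), the SessionUser shape and the classifyWrite name are assumptions, and SafeParser is a hypothetical structural stand-in for a Zod schema's safeParse.

```typescript
// Hypothetical sketch: decide which status a write request gets.
// Assumed check order: no session (401), demo user (403), bad JSON (400), bad shape (422).

interface SessionUser {
  id: string;
  is_demo: boolean;
}

// Stand-in for the result of a Zod schema's safeParse (assumption, not the Zod types).
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { issues: { path: (string | number)[]; message: string }[] } };

interface SafeParser<T> {
  safeParse(input: unknown): SafeParseResult<T>;
}

type Outcome<T> =
  | { status: 200; data: T }
  | { status: 401 | 403 | 400; error: string }
  | { status: 422; error: string; fields: string[] };

function classifyWrite<T>(
  user: SessionUser | null,
  rawBody: string,
  schema: SafeParser<T>,
): Outcome<T> {
  if (!user) return { status: 401, error: "No session" };
  if (user.is_demo) return { status: 403, error: "Demo user cannot write" };

  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    // Not JSON at all: always 400, never 422.
    return { status: 400, error: "Invalid JSON body" };
  }

  const result = schema.safeParse(parsed);
  if (!result.success) {
    // Valid JSON with the wrong shape: 422, reporting each offending field path.
    return {
      status: 422,
      error: "Validation failed",
      fields: result.error.issues.map((issue) => issue.path.join(".")),
    };
  }
  return { status: 200, data: result.data };
}
```

In a real route handler the outcome would become a Response with the matching status. The point of the ordering is that a body which fails JSON.parse never reaches the schema, so it can never be reported as 422.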

Other common codes:

| Code | Meaning |
|---|---|
| 401 | No session / invalid bearer token |
| 404 | Resource not found, or the token does not have access |
| 409 | State conflict, e.g. trying to claim a job that's already CLAIMED |
| 429 | Rate-limited, typically by the Anthropic quota cap rather than by Scrum4Me itself |
| 500 | Unhandled server error. Always check the Vercel function logs. |
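The 409 row can be illustrated with a claim that only succeeds while the job is still waiting. This in-memory version is a hypothetical sketch: the QUEUED and DONE statuses, the Job shape and claimJob are assumptions; only CLAIMED and claimed_by_token_id come from this manual. Against a database the same rule is usually an atomic conditional update (update only where the status is still QUEUED, and answer 409 when no row changed).

```typescript
// Hypothetical in-memory model of job claiming; real jobs live in the database.
type JobStatus = "QUEUED" | "CLAIMED" | "DONE"; // QUEUED and DONE are assumed names

interface Job {
  id: string;
  status: JobStatus;
  claimed_by_token_id: string | null;
}

type ClaimResult = { status: 200; job: Job } | { status: 404 | 409; error: string };

function claimJob(jobs: Map<string, Job>, jobId: string, tokenId: string): ClaimResult {
  const job = jobs.get(jobId);
  if (!job) return { status: 404, error: "Job not found" };

  // Only a QUEUED job can be claimed; any other status is a state conflict.
  if (job.status !== "QUEUED") {
    return { status: 409, error: `Job is already ${job.status}` };
  }

  // Record which token owns the claim, so later task updates can be checked against it.
  const claimed: Job = { ...job, status: "CLAIMED", claimed_by_token_id: tokenId };
  jobs.set(jobId, claimed);
  return { status: 200, job: claimed };
}
```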

Symptom → cause → fix

MCP

| Symptom | Likely cause | Fix |
|---|---|---|
| mcp__scrum4me__get_claude_context returns null or an empty story | The bearer token doesn't have access to that product | Run mcp__scrum4me__list_products to confirm scope; rotate the token if needed |
| mcp__scrum4me__update_task_status returns 403 | Demo user, or a token mismatch in a sprint run | Check the user identity; inside a sprint run, the bearer token must match claimed_by_token_id of the parent job |
| mcp__scrum4me__wait_for_job returns nothing for the full 600 s block | The queue is genuinely empty | This is normal: loop and call again. See runbooks/mcp-integration.md |
| Job stays CLAIMED for more than 30 minutes | The worker died mid-job | The job is requeued automatically on the next wait_for_job call; no manual action is needed |
| update_idea_plan_md flips the idea to PLAN_FAILED | parsePlanMd rejected the YAML frontmatter server-side | Inspect IdeaLog{JOB_EVENT, errors} for the parse error; re-run IDEA_MAKE_PLAN after fixing the prompt |
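The two wait_for_job rows describe a poll that may legitimately return nothing, and a stale-claim recovery that runs on the next poll. A hypothetical server-side sketch of both, assuming each job records when it was created (createdAt) and claimed (claimedAt); these field names, the QUEUED status and pollOnce are not taken from the actual MCP server.

```typescript
// Hypothetical sketch of one wait_for_job poll: requeue stale claims, then hand out work.
type JobStatus = "QUEUED" | "CLAIMED" | "DONE"; // QUEUED and DONE are assumed names

interface Job {
  id: string;
  status: JobStatus;
  claimedAt: number | null; // epoch ms; null when not claimed (assumed field)
  createdAt: number; // epoch ms (assumed field)
}

const STALE_CLAIM_MS = 30 * 60 * 1000; // 30 minutes, per the symptom table

// Put every CLAIMED job older than the threshold back in the queue; returns how many moved.
function requeueStale(jobs: Job[], now: number): number {
  let requeued = 0;
  for (const job of jobs) {
    if (job.status === "CLAIMED" && job.claimedAt !== null && now - job.claimedAt > STALE_CLAIM_MS) {
      job.status = "QUEUED";
      job.claimedAt = null;
      requeued++;
    }
  }
  return requeued;
}

// Requeue first, then claim the oldest QUEUED job, or return null for an empty queue.
function pollOnce(jobs: Job[], now: number): Job | null {
  requeueStale(jobs, now);
  const next = jobs
    .filter((job) => job.status === "QUEUED")
    .sort((a, b) => a.createdAt - b.createdAt)[0];
  if (!next) return null; // empty queue is normal: the worker simply polls again
  next.status = "CLAIMED";
  next.claimedAt = now;
  return next;
}
```

On the worker side this implies a plain loop: call wait_for_job, and when it returns nothing, call it again instead of treating the empty result as an error.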

Statuses & data integrity

| Symptom | Likely cause | Fix |
|---|---|---|
| Status displayed differently in the DB and the UI | Some code path bypassed lib/task-status.ts | Grep the codebase for direct enum string usage and force everything through the mappers. See adr/0004-status-enum-mapping.md |
| Story stuck IN_SPRINT when all its tasks are DONE | Auto-promotion was not triggered | Check the most recent update_task_status call; it may have failed silently. Re-issue it for the correct task |
| PBI not auto-promoting to DONE | Not all child stories are DONE yet | List the stories under the PBI; one is probably still OPEN or IN_SPRINT |
| 422 from create_pbi / create_story / create_task | Zod validation failed (length cap, missing required field) | The response body includes the field path; fix it and retry |
| IdeaStatus stays GRILLING long after the worker stopped | The job ended without calling update_idea_grill_md | Check the worker logs for an exception; manually requeue, or mark the idea GRILL_FAILED to allow a retry |
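The status rows above rest on two rules: every enum value crosses the DB/API boundary through one mapper, and promotion to DONE only happens once all children are DONE. A hypothetical sketch of both; the task status values (TODO, IN_PROGRESS, DONE), the lower-case API strings and every function name are assumptions, and the real mappers in lib/task-status.ts may look quite different.

```typescript
// Hypothetical sketch; the real mappers live in lib/task-status.ts and may differ.
// Assumption: DB enums are UPPER_SNAKE and API/MCP strings are lower_snake (e.g. 'done').
type TaskStatusDb = "TODO" | "IN_PROGRESS" | "DONE";
type TaskStatusApi = "todo" | "in_progress" | "done";
type StoryStatus = "OPEN" | "IN_SPRINT" | "DONE";

const DB_TO_API: Record<TaskStatusDb, TaskStatusApi> = {
  TODO: "todo",
  IN_PROGRESS: "in_progress",
  DONE: "done",
};

function taskStatusToApi(status: TaskStatusDb): TaskStatusApi {
  return DB_TO_API[status];
}

// Reverse lookup; unknown strings return null instead of leaking into the database.
function taskStatusFromApi(value: string): TaskStatusDb | null {
  const entries = Object.entries(DB_TO_API) as [TaskStatusDb, TaskStatusApi][];
  const match = entries.find(([, api]) => api === value);
  return match ? match[0] : null;
}

// Promotion rules from the table. Empty child lists never promote (assumption).
function storyShouldBeDone(tasks: TaskStatusDb[]): boolean {
  return tasks.length > 0 && tasks.every((t) => t === "DONE");
}

function pbiShouldBeDone(stories: StoryStatus[]): boolean {
  return stories.length > 0 && stories.every((s) => s === "DONE");
}

// The stories still blocking a PBI, i.e. the ones to look at first.
function blockingStories(stories: { id: string; status: StoryStatus }[]): string[] {
  return stories.filter((s) => s.status !== "DONE").map((s) => s.id);
}
```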

Git & deploy

| Symptom | Likely cause | Fix |
|---|---|---|
| Unexpected Vercel preview build appeared mid-batch | An interim push happened that shouldn't have | Inspect git log --all --graph for the offending push; review runbooks/branch-and-commit.md |
| PR has multiple Vercel deployments for the same commit range | Force-push, or push-then-revert | Don't force-push. If it is genuinely needed, document it in the PR description |
| Auto-PR didn't open after the story was DONE | Story not actually DONE, or auto-PR preconditions unmet | Walk through runbooks/auto-pr-flow.md; the usual culprit is a missing update_task_status('done') for the last task |
| Vercel skipped the deploy entirely | The skip-deploy label, or a path filter excluded the changed paths | See runbooks/deploy-control.md for the rules |
| Merge conflict between two parallel batches | Two branches touched the same files | Serialise: merge the first PR before pushing the second, then run git fetch origin main && git rebase origin/main |
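Whether Vercel builds a push comes down to a label check and a path filter. The sketch below is hypothetical: the literal skip-deploy label string, the ignored prefixes and decideDeploy are assumptions, and runbooks/deploy-control.md remains the authority on the real rules.

```typescript
// Hypothetical deploy gate; the project's real rules are in runbooks/deploy-control.md.
// Assumptions: the label is literally "skip-deploy" and these prefixes never need a deploy.
const SKIP_LABEL = "skip-deploy";
const IGNORED_PREFIXES = ["docs/", ".github/"];

interface DeployDecision {
  build: boolean;
  reason: string;
}

function decideDeploy(changedPaths: string[], labels: string[]): DeployDecision {
  // The label wins regardless of which files changed.
  if (labels.includes(SKIP_LABEL)) {
    return { build: false, reason: `label ${SKIP_LABEL} present` };
  }
  // Keep only paths that are not covered by an ignored prefix.
  const deployable = changedPaths.filter(
    (path) => !IGNORED_PREFIXES.some((prefix) => path.startsWith(prefix)),
  );
  if (deployable.length === 0) {
    return { build: false, reason: "all changed paths are path-filtered" };
  }
  return { build: true, reason: `${deployable.length} deployable path(s), e.g. ${deployable[0]}` };
}
```

If logic like this were wired into Vercel's Ignored Build Step, note that the exit codes there are inverted relative to intuition: exit 0 skips the build and exit 1 lets it proceed.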

Realtime

| Symptom | Likely cause | Fix |
|---|---|---|
| Solo Board doesn't update when a status changes | SSE connection dropped, or the NOTIFY payload is missing fields | Reload the page; if it persists, check DIRECT_URL (LISTEN/NOTIFY needs the pooler-bypass URL). See patterns/realtime-notify-payload.md |
| NavBar bell doesn't pulse on a new question | SSE/event channel mismatch, or the payload is missing required fields | Confirm the question was actually inserted (mcp__scrum4me__list_open_questions); inspect the Network tab for the SSE connection |
| Worker shows offline despite a running container | worker_heartbeat is not reaching MCP | Verify SCRUM4ME_BASE_URL and the bearer token; tail the container logs |
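For the "NOTIFY payload is missing fields" cause, it helps to validate the payload before forwarding it over SSE, so a bad payload gets logged instead of being silently ignored by the board. A hypothetical sketch; the four field names and parseNotifyPayload are assumptions, and patterns/realtime-notify-payload.md defines the actual payload.

```typescript
// Hypothetical sketch: validate a LISTEN/NOTIFY payload before pushing it over SSE.
// Field names are assumptions; patterns/realtime-notify-payload.md defines the real shape.
interface NotifyPayload {
  entity: string;
  id: string;
  status: string;
  product_id: string;
}

const REQUIRED_FIELDS: (keyof NotifyPayload)[] = ["entity", "id", "status", "product_id"];

type ParseResult = { ok: true; payload: NotifyPayload } | { ok: false; missing: string[] };

function parseNotifyPayload(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, missing: ["<invalid JSON>"] };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, missing: [...REQUIRED_FIELDS] };
  }
  const record = data as Record<string, unknown>;
  // A field counts as present only if it is a non-empty string.
  const missing = REQUIRED_FIELDS.filter(
    (field) => typeof record[field] !== "string" || record[field] === "",
  );
  if (missing.length > 0) return { ok: false, missing };
  return { ok: true, payload: record as unknown as NotifyPayload };
}
```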

Auth & sessions

| Symptom | Likely cause | Fix |
|---|---|---|
| Login redirects in a loop | Session cookie not set; usually a SESSION_SECRET mismatch between deployments | Check the Vercel env vars for SESSION_SECRET (must be at least 32 chars); see patterns/iron-session.md |
| All write buttons disabled with the "Niet beschikbaar in demo-modus" ("Not available in demo mode") tooltip | You're logged in as the demo user | Log out and log in with a real account |
| 403 on a route that should be allowed | The proxy or server-action layer rejected the request | Walk through the three layers in adr/0006-demo-user-three-layer-policy.md; each can independently say "no" |
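Because a short or missing SESSION_SECRET only surfaces as a redirect loop, a startup check can report it directly. A minimal sketch with a hypothetical checkSessionSecret name; the 32-character minimum matches iron-session's documented password requirement. It cannot detect a mismatch between deployments, so the Production and Preview values in Vercel still need comparing by hand.

```typescript
// Hypothetical startup check; iron-session requires a password of at least 32 characters.
function checkSessionSecret(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  const secret = env["SESSION_SECRET"];
  if (!secret) {
    problems.push("SESSION_SECRET is not set");
  } else if (secret.length < 32) {
    // Report only the length, never the secret itself.
    problems.push(`SESSION_SECRET is ${secret.length} chars; iron-session needs at least 32`);
  }
  return problems;
}
```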

Build & dev-server

| Symptom | Likely cause | Fix |
|---|---|---|
| npm run build fails with Cannot find module '@/...' | TypeScript path alias mismatch | Check the paths in tsconfig.json; rerun npm run prebuild if codegen is stale |
| Mermaid diagram renders as plain text in the in-app /manual viewer | MermaidBlock not picking up language-mermaid | Chapter 04 (MCP Integration) won't help here. Open app/(app)/manual/_components/mermaid-block.tsx and confirm the dynamic import uses ssr: false |
| "Server-only" import error in the browser | A *-server.ts module was imported into a client component | Refactor: split the server logic out, or use a server action. This is a hardstop in CLAUDE.md |
| npm run dev shows a hydration mismatch | Server and client renders diverge, usually because of time-based or random values | Wrap client-only state in useEffect, or pass the server time as a prop |
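The "pass the server time as a prop" fix works because the rendered text then depends only on its inputs. A hypothetical sketch in plain TypeScript (no React), assuming the server passes an ISO timestamp; the fixed en-GB locale and UTC time zone stop Node and the browser from formatting the same instant differently.

```typescript
// Hypothetical helper: text that renders identically on server and client.
// Assumption: the server passes its current time down as an ISO string prop.
interface LastUpdatedProps {
  serverNowIso: string;
}

// Fixed locale and time zone, so the output does not depend on where the code runs.
const formatter = new Intl.DateTimeFormat("en-GB", {
  day: "numeric",
  month: "short",
  year: "numeric",
  timeZone: "UTC",
});

function lastUpdatedLabel({ serverNowIso }: LastUpdatedProps): string {
  // No Date.now() or Math.random() here: the client would compute a different value.
  return `Last updated ${formatter.format(new Date(serverNowIso))}`;
}
```

Values that genuinely must come from the browser (the user's local time, a random id) belong in state set inside useEffect, which only runs after hydration.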

When in doubt

  1. Read the runbook. Each runbook in docs/runbooks/ starts with a when_to_read field — match the situation.
  2. Check the ADRs. The ADR index in docs/INDEX.md lists the rationale for every cross-cutting decision. If your fix would contradict an ADR, talk to a maintainer first.
  3. Read the agent-flow pitfalls log. docs/runbooks/agent-flow-pitfalls.md is a living list of issues found during agent runs and how they were resolved.
  4. Look at recent commits. git log --oneline --since='7 days ago' often reveals the very change that broke whatever you're debugging.

Escalation

If after the steps above the issue is still unresolved:

What's next

You've reached the end of the manual. Bookmark this troubleshooting chapter — it's the most-revisited page once you're past onboarding.

Back to index.