Adds a 7-file English-language manual targeted at new human contributors: index, overview, statuses & transitions (with mermaid state diagrams), git workflow, MCP integration, docker, and troubleshooting. The manual is the *map* — it cross-references existing runbooks/ADRs/architecture docs rather than duplicating their content. Regenerates docs/INDEX.md and validates with check-doc-links.mjs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| title | status | audience | language | last_updated | when_to_read |
|---|---|---|---|---|---|
| Troubleshooting | active | | en | 2026-05-07 | When something breaks. Start with the symptom table; fall back to the error-code reference. |
06 — Troubleshooting
This chapter is the first place to look when something is wrong. Each row links to the authoritative source so you can dig deeper without losing your trail.
Error code reference
These three HTTP status codes are non-negotiable hardstops in the API surface — they always mean the same thing across every route handler.
| Code | Meaning | Where it comes from |
|---|---|---|
| `400` | JSON parse error | Body couldn't be parsed as JSON. Usually a malformed request from a client. |
| `422` | Zod validation error | Body parsed, but failed schema validation. Response includes the offending field path. |
| `403` | Demo-user write blocked | Authenticated user with `is_demo = true` attempted a write. Three layers enforce this; see `docs/adr/0006-demo-user-three-layer-policy.md`. |
Hardstop: these codes are reserved. Do not use `400` for validation errors or `422` for unauthorised access. The contract is enforced at the route-handler level; see the Route Handler pattern.
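As a rough illustration of the reserved-code contract above, the sketch below classifies a write request into `400`, `422`, or `403`. It is not the project's route handler: `handleWrite`, `validate`, the `title` field, the error strings, and the order in which the checks run are all assumptions made for the example, and a hand-written check stands in for the Zod schema so the snippet runs without dependencies.

```typescript
// Hypothetical types for illustration; the real response bodies may differ.
type FieldIssue = { path: string; message: string };

type WriteResult =
  | { status: 200; data: { title: string } }
  | { status: 400; error: "json_parse_error" }
  | { status: 403; error: "demo_user_write_blocked" }
  | { status: 422; error: "validation_error"; issues: FieldIssue[] };

// Stand-in for a Zod schema: requires a non-empty string `title`.
function validate(body: unknown): FieldIssue[] {
  if (typeof body !== "object" || body === null) {
    return [{ path: "", message: "Expected an object" }];
  }
  const title = (body as Record<string, unknown>).title;
  if (typeof title !== "string" || title.length === 0) {
    return [{ path: "title", message: "Required non-empty string" }];
  }
  return [];
}

// Maps each failure mode to its reserved status code.
// Checking the demo flag first is an assumption, not documented behaviour.
function handleWrite(rawBody: string, user: { isDemo: boolean }): WriteResult {
  if (user.isDemo) {
    return { status: 403, error: "demo_user_write_blocked" };
  }

  let body: unknown;
  try {
    body = JSON.parse(rawBody);
  } catch {
    // The body is not JSON at all: 400, never 422.
    return { status: 400, error: "json_parse_error" };
  }

  const issues = validate(body);
  if (issues.length > 0) {
    // The body parsed but failed the schema: 422, with the field path.
    return { status: 422, error: "validation_error", issues };
  }

  return { status: 200, data: { title: (body as { title: string }).title } };
}
```

With this shape, `handleWrite('{oops', { isDemo: false })` yields status `400`, while `handleWrite('{"title":""}', { isDemo: false })` yields `422` with `title` as the offending path.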
Other common codes:
| Code | Meaning |
|---|---|
| `401` | No session / invalid bearer token |
| `404` | Resource not found, or token does not have access |
| `409` | State conflict, e.g. trying to claim a job that's already `CLAIMED` |
| `429` | Rate-limited; typically the Anthropic quota cap, not Scrum4Me itself |
| `500` | Unhandled server error. Always check Vercel function logs. |
Symptom → cause → fix
MCP
| Symptom | Likely cause | Fix |
|---|---|---|
| `mcp__scrum4me__get_claude_context` returns null or empty story | Bearer token doesn't have access to that product | Run `mcp__scrum4me__list_products` to confirm scope; rotate the token if needed |
| `mcp__scrum4me__update_task_status` returns `403` | Demo user, or token mismatch in a sprint run | Check user identity; if inside a sprint run, the bearer token must match `claimed_by_token_id` of the parent job |
| `mcp__scrum4me__wait_for_job` returns nothing for the full 600s block | Queue is genuinely empty | This is normal: loop and call again. See `runbooks/mcp-integration.md` |
| Job stays `CLAIMED` for >30 minutes | Worker died mid-job | Auto-requeue triggers on next `wait_for_job`; no manual action needed |
| `update_idea_plan_md` causes idea to flip to `PLAN_FAILED` | `parsePlanMd` server-side rejected the YAML frontmatter | Inspect `IdeaLog{JOB_EVENT, errors}` for the parse error; re-run `IDEA_MAKE_PLAN` after fixing the prompt |
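The `wait_for_job` row above says an empty result is normal and the caller should loop. A minimal sketch of such a loop follows; `pollForJob`, the `Job` shape, and the `maxEmptyRounds` cap are hypothetical, and the real tool is invoked through MCP rather than as a plain function.

```typescript
// Hypothetical job shape; the real MCP tool result may carry more fields.
type Job = { id: string };

// Stand-in for a call to mcp__scrum4me__wait_for_job, which (per the table
// above) blocks for up to 600s and returns nothing when the queue is empty.
type WaitForJob = () => Promise<Job | null>;

// Calls waitForJob repeatedly until a job arrives or the cap is reached.
// The cap exists only so the sketch terminates; a worker may loop forever.
async function pollForJob(
  waitForJob: WaitForJob,
  maxEmptyRounds: number,
): Promise<Job | null> {
  for (let round = 0; round < maxEmptyRounds; round++) {
    const job = await waitForJob();
    if (job !== null) {
      return job;
    }
    // Empty result: not an error, so simply call again.
  }
  return null;
}
```

Injecting `waitForJob` as a parameter keeps the loop testable without a live queue.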
Statuses & data integrity
| Symptom | Likely cause | Fix |
|---|---|---|
| Status displayed differently in DB vs UI | Some code path bypassed `lib/task-status.ts` | Grep the codebase for direct enum string usage; force everything through the mappers. See `adr/0004-status-enum-mapping.md` |
| Story stuck `IN_SPRINT` when all tasks are `DONE` | Auto-promotion not triggered | Check the most recent `update_task_status` call; it may have failed silently. Re-issue with the correct task |
| PBI not auto-promoting to `DONE` | Not all child stories are `DONE` yet | List stories under the PBI; one is probably still `OPEN` or `IN_SPRINT` |
| `422` from `create_pbi` / `create_story` / `create_task` | Zod validation failed (length cap, missing required field) | Response body includes field path; fix and retry |
| `IdeaStatus` stays `GRILLING` long after the worker stopped | The job ended without calling `update_idea_grill_md` | Check the worker logs for an exception; manually requeue or mark `GRILL_FAILED` to allow retry |
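The two auto-promotion rows above come down to one rule: a parent is promoted only when every child is `DONE`. The helper below is a debugging sketch for finding the child that blocks promotion; `blockingChildren`, `canPromote`, and the `Child` shape are hypothetical, and treating a parent with no children as not promotable is an assumption.

```typescript
// Hypothetical minimal shape for a task (under a story) or a story (under a PBI).
type Child = { id: string; status: string };

// Returns the children that are not yet DONE and therefore block promotion.
function blockingChildren(children: Child[]): Child[] {
  return children.filter((child) => child.status !== "DONE");
}

// A parent can be promoted only if it has children and none of them block.
// Rejecting an empty child list is an assumption made for this sketch.
function canPromote(children: Child[]): boolean {
  return children.length > 0 && blockingChildren(children).length === 0;
}
```

For a PBI whose stories are `DONE` and `IN_SPRINT`, `blockingChildren` returns only the `IN_SPRINT` story, which is the one to inspect.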
Git & deploy
| Symptom | Likely cause | Fix |
|---|---|---|
| Unexpected Vercel preview build appeared mid-batch | An interim push happened that shouldn't have | Inspect `git log --all --graph` for the offending push; review `runbooks/branch-and-commit.md` |
| PR has multiple Vercel deployments for the same commit range | Force-push, or push-then-revert | Don't force-push. If genuinely needed, document in the PR description |
| Auto-PR didn't open after story `DONE` | Story not actually `DONE`, or auto-PR pre-conditions unmet | Walk through `runbooks/auto-pr-flow.md`; typically a missing `update_task_status('done')` for the last task |
| Vercel skipped the deploy entirely | `skip-deploy` label or path-filter excluded the changed paths | See `runbooks/deploy-control.md` for the rules |
| Merge conflict between two parallel batches | Two branches touched the same files | Serialise: merge the first PR before pushing the second. Then `git fetch origin main && git rebase origin/main` |
Realtime
| Symptom | Likely cause | Fix |
|---|---|---|
| Solo Board doesn't update when status changes | SSE connection dropped, or `NOTIFY` payload missing fields | Reload the page; if it persists, check `DIRECT_URL` (`LISTEN`/`NOTIFY` needs the pooler-bypass URL). See `patterns/realtime-notify-payload.md` |
| NavBar bell doesn't pulse on new question | SSE/event channel mismatched, or payload missing required fields | Confirm the question was actually inserted (`mcp__scrum4me__list_open_questions`); inspect the Network tab for the SSE connection |
| Worker shows offline despite a running container | `worker_heartbeat` not reaching MCP | Verify `SCRUM4ME_BASE_URL` and bearer token; tail container logs |
Auth & sessions
| Symptom | Likely cause | Fix |
|---|---|---|
| Login redirects in a loop | Session cookie not set; usually `SESSION_SECRET` mismatch between deployments | Check Vercel env vars for `SESSION_SECRET` (must be ≥32 chars); see `patterns/iron-session.md` |
| All write buttons disabled with "Niet beschikbaar in demo-modus" ("Not available in demo mode") tooltip | You're logged in as the demo user | Log out and log in with a real account |
| `403` on a route that should be allowed | Proxy or server-action layer rejected the request | Walk through the three layers in `adr/0006-demo-user-three-layer-policy.md`; each can independently say "no" |
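The login-loop row above ties the failure to `SESSION_SECRET`, which must be at least 32 characters. A small startup check along these lines can surface a bad value early; `checkSessionSecret` is a hypothetical helper, not part of the codebase, and it cannot detect a mismatch between deployments, only a missing or too-short value.

```typescript
// Minimum length taken from the table above (SESSION_SECRET must be >= 32 chars).
const MIN_SECRET_LENGTH = 32;

// Returns a human-readable problem description, or null when the value passes.
// Hypothetical helper: the caller decides where the value comes from.
function checkSessionSecret(secret: string | undefined): string | null {
  if (secret === undefined || secret.length === 0) {
    return "SESSION_SECRET is not set";
  }
  if (secret.length < MIN_SECRET_LENGTH) {
    return `SESSION_SECRET is ${secret.length} chars; at least ${MIN_SECRET_LENGTH} are required`;
  }
  return null;
}
```

Returning a message instead of throwing lets the caller choose between logging a warning and failing the boot.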
Build & dev-server
| Symptom | Likely cause | Fix |
|---|---|---|
| `npm run build` fails with `Cannot find module '@/...'` | TypeScript path alias mismatch | Check `tsconfig.json` `paths`; rerun `npm run prebuild` if codegen is stale |
| Mermaid diagram renders as plain text in the in-app `/manual` viewer | `MermaidBlock` not picking up `language-mermaid` | The 04 MCP Integration chapter won't help here. Open `app/(app)/manual/_components/mermaid-block.tsx` and confirm the dynamic import uses `ssr: false` |
| "Server-only" import error in browser | A `*-server.ts` module was imported into a client component | Refactor: split server logic out, or use a server action. Hardstop in `CLAUDE.md` |
| `npm run dev` shows hydration mismatch | Server and client render diverge, usually because of time-based or random values | Wrap in `useEffect` for client-only state, or pass server time as a prop |
When in doubt
- Read the runbook. Each runbook in `docs/runbooks/` starts with a `when_to_read` field; match the situation.
- Check the ADRs. The ADR index in `docs/INDEX.md` lists the rationale for every cross-cutting decision. If your fix would contradict an ADR, talk to a maintainer first.
- Read the agent-flow pitfalls log. `docs/runbooks/agent-flow-pitfalls.md` is a living list of issues found during agent runs and how they were resolved.
- Look at recent commits. `git log --oneline --since='7 days ago'` often reveals the very change that broke whatever you're debugging.
Escalation
If after the steps above the issue is still unresolved:
- AI agent / MCP issues → file in the `scrum4me-mcp` repo.
- Worker container issues → file in the `scrum4me-docker` repo.
- App / data / status issues → file in the `Scrum4Me` repo.
What's next
You've reached the end of the manual. Bookmark this troubleshooting chapter — it's the most-revisited page once you're past onboarding.
Back to index.