Scrum4Me/docs/manual/06-troubleshooting.md
Janpeter Visser bd7478861b
PBI-58: Developer manual + in-app /manual page (#148)
2026-05-07 18:00:10 +02:00


---
title: "Troubleshooting"
status: active
audience: [contributor]
language: en
last_updated: 2026-05-07
when_to_read: "When something breaks. Start with the symptom table; fall back to the error-code reference."
---
# 06 — Troubleshooting
This chapter is the **first place to look** when something is wrong. Where a row has a link, it points to the authoritative source, so you can dig deeper without losing your trail.
## Error code reference
These three HTTP status codes are non-negotiable hardstops in the API surface: each one means the same thing in every route handler.

| Code | Meaning | Where it comes from |
|---|---|---|
| **`400`** | JSON parse error | Body couldn't be parsed as JSON. Usually a malformed request from a client. |
| **`422`** | Zod validation error | Body parsed, but failed schema validation. Response includes the offending field path. |
| **`403`** | Demo-user write blocked | An authenticated user with `is_demo = true` attempted a write. Three layers enforce this; see [`docs/adr/0006-demo-user-three-layer-policy.md`](../adr/0006-demo-user-three-layer-policy.md). |

> **Hardstop:** these codes are reserved. Do not use `400` for validation errors, and do not use `422` for access-control rejections such as a blocked demo-user write. The contract is enforced at the route-handler level; see the [Route Handler pattern](../patterns/route-handler.md).

Other common codes:

| Code | Meaning |
|---|---|
| `401` | No session / invalid bearer token |
| `404` | Resource not found, or token does not have access |
| `409` | State conflict — e.g. trying to claim a job that's already `CLAIMED` |
| `429` | Rate-limited — typically the Anthropic quota cap, not Scrum4Me itself |
| `500` | Unhandled server error. Always check Vercel function logs. |
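
The tables above can be sketched as a single decision function. This is a minimal, dependency-free illustration, not the code behind the [Route Handler pattern](../patterns/route-handler.md). The following are all hypothetical: `handleWrite`, `validateTaskInput`, the result shape, the `title` field and its 200-character cap. A hand-written check stands in for Zod, and running the demo-user guard before body parsing is an assumed order.

```typescript
// Hypothetical result shape; real route handlers return a Response object.
interface HandlerResult {
  status: number;
  body: Record<string, unknown>;
}

interface SessionUser {
  id: string;
  is_demo: boolean;
}

// Hypothetical payload for a write route, standing in for a Zod schema.
interface TaskInput {
  title: string;
}

type Validation =
  | { ok: true; data: TaskInput }
  | { ok: false; path: string[] };

// Mimics what a schema validator reports: either the data or the failing field path.
function validateTaskInput(value: unknown): Validation {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, path: [] }; // the body as a whole has the wrong shape
  }
  const title = (value as Record<string, unknown>).title;
  if (typeof title !== "string" || title.length === 0 || title.length > 200) {
    return { ok: false, path: ["title"] }; // the 200-character cap is an assumption
  }
  return { ok: true, data: { title } };
}

function handleWrite(user: SessionUser | null, rawBody: string): HandlerResult {
  if (user === null) {
    return { status: 401, body: { error: "No session" } };
  }
  // Assumed order: reject demo writes before touching the body.
  if (user.is_demo) {
    return { status: 403, body: { error: "Demo-user write blocked" } };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    // Not JSON at all: this, and only this, is a 400.
    return { status: 400, body: { error: "Malformed JSON" } };
  }
  const result = validateTaskInput(parsed);
  if (!result.ok) {
    // Valid JSON with the wrong shape: 422 plus the offending field path.
    return { status: 422, body: { error: "Validation failed", path: result.path } };
  }
  return { status: 200, body: { title: result.data.title } };
}
```

A body such as `[1]` parses fine but has the wrong shape, so it gets a `422`, not a `400`. The `400` code is reserved for bytes that are not JSON at all.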
## Symptom → cause → fix
### MCP
| Symptom | Likely cause | Fix |
|---|---|---|
| `mcp__scrum4me__get_claude_context` returns `null` or an empty story | Bearer token doesn't have access to that product | Run `mcp__scrum4me__list_products` to confirm scope; rotate the token if needed |
| `mcp__scrum4me__update_task_status` returns `403` | Demo user, or token mismatch in a sprint run | Check user identity; if inside a sprint run, the bearer token must match `claimed_by_token_id` of the parent job |
| `mcp__scrum4me__wait_for_job` returns nothing for the full 600-second block | Queue is genuinely empty | This is normal; loop and call again. See [`runbooks/mcp-integration.md`](../runbooks/mcp-integration.md#batch-loop-verplichte-agent-flow) |
| Job stays `CLAIMED` for more than 30 minutes | Worker died mid-job | Auto-requeue triggers on the next `wait_for_job` call; no manual action needed |
| `update_idea_plan_md` causes the idea to flip to `PLAN_FAILED` | The server-side `parsePlanMd` rejected the YAML front-matter | Inspect `IdeaLog{JOB_EVENT, errors}` for the parse error; re-run `IDEA_MAKE_PLAN` after fixing the prompt |
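
The two `CLAIMED` rows above describe one job lifecycle, and the `409` conflict from the error-code table belongs to the same lifecycle. Below is a minimal in-memory sketch. It assumes a `Job` record with `status`, `claimedAt` and `claimedByTokenId` fields (hypothetical names) and uses the 30-minute threshold from the table. The real queue behind `wait_for_job` may be implemented quite differently.

```typescript
type JobStatus = "QUEUED" | "CLAIMED" | "DONE"; // hypothetical subset

interface Job {
  id: string;
  status: JobStatus;
  claimedAt: number | null; // epoch milliseconds
  claimedByTokenId: string | null;
}

const STALE_CLAIM_MS = 30 * 60 * 1000; // the ">30 minutes" threshold from the table

// Returns jobs whose worker apparently died to the queue; reports how many.
function requeueStaleClaims(jobs: Job[], now: number): number {
  let requeued = 0;
  for (const job of jobs) {
    const stale =
      job.status === "CLAIMED" &&
      job.claimedAt !== null &&
      now - job.claimedAt > STALE_CLAIM_MS;
    if (stale) {
      job.status = "QUEUED";
      job.claimedAt = null;
      job.claimedByTokenId = null;
      requeued++;
    }
  }
  return requeued;
}

function markClaimed(job: Job, tokenId: string, now: number): void {
  job.status = "CLAIMED";
  job.claimedAt = now;
  job.claimedByTokenId = tokenId;
}

// One wait_for_job-style poll: requeue stale claims first, then claim the
// first queued job. null means "queue empty": the worker simply polls again.
function claimNext(jobs: Job[], tokenId: string, now: number): Job | null {
  requeueStaleClaims(jobs, now);
  const job = jobs.find((j) => j.status === "QUEUED");
  if (job === undefined) return null;
  markClaimed(job, tokenId, now);
  return job;
}

// Claiming a specific job that is not QUEUED is a state conflict (409).
function claimById(jobs: Job[], id: string, tokenId: string, now: number): 200 | 404 | 409 {
  requeueStaleClaims(jobs, now);
  const job = jobs.find((j) => j.id === id);
  if (job === undefined) return 404;
  if (job.status !== "QUEUED") return 409;
  markClaimed(job, tokenId, now);
  return 200;
}
```

Stale claims are requeued at the start of every poll, so a dead worker's job simply reappears on the next call. That is why the table says no manual action is needed.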
### Statuses & data integrity
| Symptom | Likely cause | Fix |
|---|---|---|
| Status displayed differently in DB vs UI | Some code path bypassed `lib/task-status.ts` | Grep the codebase for direct enum string usage; force everything through the mappers. See [`adr/0004-status-enum-mapping.md`](../adr/0004-status-enum-mapping.md) |
| Story stuck `IN_SPRINT` when all tasks are `DONE` | Auto-promotion not triggered | Check the most recent `update_task_status` call; it may have failed silently. Re-issue it for the affected task |
| PBI not auto-promoting to `DONE` | Not all child stories are `DONE` yet | List stories under the PBI; one is probably still `OPEN` or `IN_SPRINT` |
| `422` from `create_pbi` / `create_story` / `create_task` | Zod validation failed (length cap, missing required field) | Response body includes field path — fix and retry |
| `IdeaStatus` stays `GRILLING` long after the worker stopped | The job ended without calling `update_idea_grill_md` | Check the worker logs for an exception; manually requeue or mark `GRILL_FAILED` to allow retry |
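
The two auto-promotion rows describe a bottom-up rule. A story becomes `DONE` once every task is `DONE`, and a PBI becomes `DONE` once every child story is. Below is a minimal sketch of that rule under stated assumptions:

- Only `OPEN`, `IN_SPRINT` and `DONE` appear in this chapter; the task status values are hypothetical.
- PBIs are assumed to share the story values.
- An empty list is assumed never to promote.
- The function names are illustrative, not taken from the codebase.

```typescript
// Only OPEN, IN_SPRINT and DONE appear in this chapter; task values are hypothetical.
type StoryStatus = "OPEN" | "IN_SPRINT" | "DONE";
type TaskStatus = "TODO" | "IN_PROGRESS" | "DONE";

interface StoryRef {
  id: string;
  status: StoryStatus;
}

// Assumption: an empty list never promotes (a story with no tasks is not "done").
function allDone(statuses: readonly string[]): boolean {
  return statuses.length > 0 && statuses.every((s) => s === "DONE");
}

// Story rule: DONE once every task is DONE, otherwise unchanged.
function nextStoryStatus(current: StoryStatus, tasks: readonly TaskStatus[]): StoryStatus {
  return allDone(tasks) ? "DONE" : current;
}

// The stories that keep a PBI from promoting: the list to inspect
// when a PBI is "not auto-promoting to DONE".
function blockingStories(stories: readonly StoryRef[]): string[] {
  return stories.filter((s) => s.status !== "DONE").map((s) => s.id);
}

// PBI rule (PBI statuses assumed to share the story values): DONE once every
// child story is DONE, otherwise unchanged.
function nextPbiStatus(current: StoryStatus, stories: readonly StoryRef[]): StoryStatus {
  return allDone(stories.map((s) => s.status)) ? "DONE" : current;
}
```

Promotion only happens when something re-evaluates the rule. If the last `update_task_status` call failed, nothing triggers the check, and the story stays `IN_SPRINT`.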
### Git & deploy
| Symptom | Likely cause | Fix |
|---|---|---|
| Unexpected Vercel preview build appeared mid-batch | An interim push happened that shouldn't have | Inspect `git log --all --graph` for the offending push; review [`runbooks/branch-and-commit.md`](../runbooks/branch-and-commit.md) |
| PR has multiple Vercel deployments for the same commit range | Force-push, or push-then-revert | Don't force-push. If genuinely needed, document in the PR description |
| Auto-PR didn't open after story `DONE` | Story not actually `DONE`, or auto-PR pre-conditions unmet | Walk through [`runbooks/auto-pr-flow.md`](../runbooks/auto-pr-flow.md); typically a missing `update_task_status('done')` for the last task |
| Vercel skipped the deploy entirely | `skip-deploy` label or path-filter excluded the changed paths | See [`runbooks/deploy-control.md`](../runbooks/deploy-control.md) for the rules |
| Merge conflict between two parallel batches | Two branches touched the same files | Serialise: merge the first PR, then rebase the second branch onto the updated main with `git fetch origin main && git rebase origin/main` before pushing it |
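
The "Vercel skipped the deploy" row names two independent reasons. A minimal sketch of that decision follows. It assumes the label check wins, and that a deploy is skipped only when *every* changed path matches an ignored prefix. `decideDeploy` and its `ignoredPrefixes` parameter are hypothetical; the real rules live in [`runbooks/deploy-control.md`](../runbooks/deploy-control.md).

```typescript
interface DeployInput {
  labels: readonly string[];
  changedPaths: readonly string[];
  ignoredPrefixes: readonly string[]; // hypothetical; see deploy-control.md for the real list
}

type DeployDecision = { deploy: true } | { deploy: false; reason: string };

function decideDeploy(input: DeployInput): DeployDecision {
  // An explicit opt-out label short-circuits everything else.
  if (input.labels.includes("skip-deploy")) {
    return { deploy: false, reason: "skip-deploy label" };
  }
  // Paths that are not covered by any ignored prefix still need a build.
  const relevant = input.changedPaths.filter(
    (path) => !input.ignoredPrefixes.some((prefix) => path.startsWith(prefix)),
  );
  // Note: an empty change set also lands here and is skipped.
  if (relevant.length === 0) {
    return { deploy: false, reason: "all changed paths excluded by the path filter" };
  }
  return { deploy: true };
}
```

A single relevant path is enough to trigger a deploy. When you expected a deploy and got none, the question to answer is which of the two reasons applied.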
### Realtime
| Symptom | Likely cause | Fix |
|---|---|---|
| Solo Board doesn't update when status changes | SSE connection dropped, or NOTIFY payload missing fields | Reload the page; if it persists, check `DIRECT_URL` (LISTEN/NOTIFY needs the pooler-bypass URL). See [`patterns/realtime-notify-payload.md`](../patterns/realtime-notify-payload.md) |
| NavBar bell doesn't pulse on new question | SSE/event channel mismatched, or payload missing required fields | Confirm the question was actually inserted (`mcp__scrum4me__list_open_questions`); inspect the Network tab for the SSE connection |
| Worker shows offline despite a running container | `worker_heartbeat` not reaching MCP | Verify `SCRUM4ME_BASE_URL` and bearer token; tail container logs |
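
Two checks from the Realtime rows can be made explicit:

- `listenerUrl` refuses to fall back to another connection string when `DIRECT_URL` is missing.
- `parseNotifyPayload` drops payloads with missing fields instead of forwarding them to the SSE stream.

This is a minimal sketch. The required field names (`type`, `entityId`, `status`) and all function names are hypothetical; the real contract is in [`patterns/realtime-notify-payload.md`](../patterns/realtime-notify-payload.md). The 8000-byte ceiling is PostgreSQL's default limit on `NOTIFY` payloads.

```typescript
// Hypothetical event shape; see realtime-notify-payload.md for the real fields.
interface StatusEvent {
  type: string;
  entityId: string;
  status: string;
}

const REQUIRED_FIELDS = ["type", "entityId", "status"] as const;

// In PostgreSQL's default configuration a NOTIFY payload must be shorter than 8000 bytes.
const MAX_NOTIFY_BYTES = 8000;

function listenerUrl(env: Record<string, string | undefined>): string {
  const direct = env.DIRECT_URL;
  if (!direct) {
    // LISTEN needs a direct connection, so never silently use a pooled URL instead.
    throw new Error("DIRECT_URL is not set; LISTEN/NOTIFY needs the pooler-bypass URL");
  }
  return direct;
}

function encodeNotifyPayload(event: StatusEvent): string {
  const json = JSON.stringify(event);
  if (new TextEncoder().encode(json).length >= MAX_NOTIFY_BYTES) {
    throw new Error("NOTIFY payload exceeds the default 8000-byte limit");
  }
  return json;
}

// Returns null for anything that is not a complete event, so the caller can drop it.
function parseNotifyPayload(raw: string): StatusEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  for (const field of REQUIRED_FIELDS) {
    const v = record[field];
    if (typeof v !== "string" || v === "") return null;
  }
  return {
    type: record.type as string,
    entityId: record.entityId as string,
    status: record.status as string,
  };
}
```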
### Auth & sessions
| Symptom | Likely cause | Fix |
|---|---|---|
| Login redirects in a loop | Session cookie not set; usually `SESSION_SECRET` mismatch between deployments | Check Vercel env vars for `SESSION_SECRET` (must be ≥32 chars); see [`patterns/iron-session.md`](../patterns/iron-session.md) |
| All write buttons disabled with "Niet beschikbaar in demo-modus" tooltip | You're logged in as the demo user | Log out and log in with a real account |
| `403` on a route that should be allowed | Proxy or server-action layer rejected the request | Walk through the three layers in [`adr/0006-demo-user-three-layer-policy.md`](../adr/0006-demo-user-three-layer-policy.md); each can independently say "no" |
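
The login-loop row depends on two facts about `SESSION_SECRET`. First, iron-session refuses passwords shorter than 32 characters. Second, every deployment that reads the same cookie must use the same value. A fail-fast startup check catches the first problem before a user ever hits the redirect loop. This is a minimal sketch, assuming the secret is read from an environment record; `requireSessionSecret` and its messages are hypothetical.

```typescript
const MIN_SECRET_LENGTH = 32; // iron-session's minimum password length

function requireSessionSecret(env: Record<string, string | undefined>): string {
  const secret = env.SESSION_SECRET;
  if (secret === undefined || secret.length === 0) {
    throw new Error("SESSION_SECRET is not set");
  }
  if (secret.length < MIN_SECRET_LENGTH) {
    // Report the length only; never log the secret itself.
    throw new Error(
      `SESSION_SECRET must be at least ${MIN_SECRET_LENGTH} characters (got ${secret.length})`,
    );
  }
  return secret;
}
```

This check cannot detect a *mismatch* between deployments. For that, compare the Vercel env vars of each environment by hand.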
### Build & dev-server
| Symptom | Likely cause | Fix |
|---|---|---|
| `npm run build` fails with `Cannot find module '@/...'` | TypeScript path alias mismatch | Check `tsconfig.json` `paths`; rerun `npm run prebuild` if codegen is stale |
| Mermaid diagram renders as plain text in the in-app `/manual` viewer | `MermaidBlock` not picking up `language-mermaid` | Open `app/(app)/manual/_components/mermaid-block.tsx` and confirm the dynamic import is `ssr: false` |
| "Server-only" import error in browser | A `*-server.ts` module was imported into a client component | Refactor — split server logic out, or use a server action. Hardstop in [`CLAUDE.md`](../../CLAUDE.md#hardstop-regels) |
| `npm run dev` shows hydration mismatch | Server and client render diverge — usually time-based or random values | Wrap in `useEffect` for client-only state, or pass server time as a prop |
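
Behind the hydration row is one rule: a render must produce the same output on the server and in the browser. The sketch below illustrates the "pass server time as a prop" fix. It is framework-free so it runs on its own: the formatter takes `now` as an argument instead of calling `Date.now()` during render. `formatAge`, `renderAgeLabel` and the prop names are hypothetical.

```typescript
// Pure: the same inputs always give the same string, on the server and in the browser.
function formatAge(createdAtMs: number, nowMs: number): string {
  const minutes = Math.floor((nowMs - createdAtMs) / (60 * 1000));
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes} min ago`;
  return `${Math.floor(minutes / 60)} h ago`;
}

// Props a hypothetical server component would pass down. The server reads the
// clock once and ships that value, so the client's first render matches it.
interface AgeLabelProps {
  createdAtMs: number;
  serverNowMs: number;
}

function renderAgeLabel(props: AgeLabelProps): string {
  return formatAge(props.createdAtMs, props.serverNowMs);
}
```

In a React component, anything that must follow the browser's own clock afterwards, such as a ticking counter, belongs in `useEffect`. `useEffect` only runs on the client, after hydration.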
## When in doubt
1. **Read the runbook.** Each runbook in [`docs/runbooks/`](../runbooks/) starts with a `when_to_read` field; use it to pick the one that matches your situation.
2. **Check the ADRs.** The ADR index in [`docs/INDEX.md`](../INDEX.md) lists the rationale for every cross-cutting decision. If your fix would contradict an ADR, talk to a maintainer first.
3. **Read the agent-flow pitfalls log.** [`docs/runbooks/agent-flow-pitfalls.md`](../runbooks/agent-flow-pitfalls.md) is a living list of issues found during agent runs and how they were resolved.
4. **Look at recent commits.** `git log --oneline --since='7 days ago'` often reveals the very change that broke whatever you're debugging.
## Escalation
If after the steps above the issue is still unresolved:
- **AI agent / MCP issues** → file in the [`scrum4me-mcp` repo](https://github.com/madhura68/scrum4me-mcp).
- **Worker container issues** → file in the [`scrum4me-docker` repo](https://github.com/madhura68/scrum4me-docker).
- **App / data / status issues** → file in the [`Scrum4Me` repo](https://github.com/madhura68/Scrum4Me).
## What's next
You've reached the end of the manual. Bookmark this troubleshooting chapter; it's the most-revisited page once you're past onboarding.

Back to [index](./index.md).