fix(immich): scale people/duplicate sync (txn timeout + bind-param limits) #44
Symptom: the Immich sync buttons return the generic "Immich synchronisatie mislukt" ("Immich synchronisation failed") error, even though config and CSRF are fine.
Two scale bugs:
- The whole sync (`getPeople` + tens of thousands of per-person `getPersonStatistics` HTTP calls + every upsert) ran inside one Prisma interactive transaction (5s default timeout) via `runWithImmichSyncLock`. At this library's size (~28k people fetched, 16,100 synced; 53,278 duplicate groups) it died with "A start cannot be executed on an expired transaction (5000 ms, 14858 ms passed)".
- `deleteMany({ <id>: { notIn: [...] } })` blew the database bind-parameter limit at 53k+ ids.

Fix:

- Split each sync into `collect*` (Immich HTTP, outside any transaction, with concurrency-limited person-statistics calls) and `persist*` (DB writes only). The advisory-lock transaction now wraps only the writes; its timeout is raised to match the write volume. HTTP latency no longer counts against it.
- Replace the top-level `notIn` stale deletes with `synced_at < runTimestamp` (every upserted row is stamped with the run timestamp), which scales without a huge parameter list. Per-group asset cleanup stays scoped.

Verified against live Immich: people 16,100 synced (~19s); duplicates 53,278 groups / 125,388 assets (~4m23s); `npm run build`, lint, and the rewritten Immich unit tests all green; full suite unchanged (only the pre-existing DB-integration tests fail, no Immich tests fail).

Follow-up worth considering (not in this PR): the duplicates write still takes ~4m synchronously behind the button. That is fine functionally, but a background/job-queue trigger would be better UX at this scale.
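For readers who want a picture of the collect/persist split, here is a minimal sketch, not the code in this PR. `mapWithConcurrency`, `runSync`, `fetchOne`, and `persistInTransaction` are hypothetical names, and the default concurrency of 8 is an arbitrary assumption. The point is only the ordering: all HTTP work finishes, with a bounded number of requests in flight, before the write transaction is opened.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // next unclaimed index; safe to share because JS runs one task at a time

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  // Start up to `limit` runners; each pulls items until none are left.
  const runners: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // same order as `items`
}

// Hypothetical orchestration: collect over HTTP first, then persist in one transaction.
async function runSync<Id, Row>(
  ids: readonly Id[],
  fetchOne: (id: Id) => Promise<Row>, // assumed to wrap one Immich HTTP call
  persistInTransaction: (rows: Row[], runTimestamp: Date) => Promise<void>, // DB writes only
  concurrency = 8, // assumed value, tune to what the Immich server tolerates
): Promise<number> {
  const runTimestamp = new Date();
  // Collect phase: no transaction is open, so HTTP latency cannot expire one.
  const rows = await mapWithConcurrency(ids, concurrency, (id) => fetchOne(id));
  // Persist phase: the transaction (and its timeout) covers only the writes.
  await persistInTransaction(rows, runTimestamp);
  return rows.length;
}
```

In a Prisma setup, `persistInTransaction` would presumably be the place to call `prisma.$transaction(async (tx) => { ... }, { timeout })` with a timeout sized to the write volume; the value to use is not stated in this PR.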
The sync ran the whole job (getPeople + tens of thousands of per-person getPersonStatistics HTTP calls + every upsert) inside ONE Prisma interactive transaction (5s default), so at scale it died with 'A start cannot be executed on an expired transaction'. A second limit then surfaced: deleting stale rows via deleteMany({ <id>: { notIn: [...] } }) exceeds the DB bind-parameter limit when Immich has tens of thousands of people / duplicate groups.

 - Split each sync into collect* (Immich HTTP, OUTSIDE any transaction, concurrency-limited person statistics) and persist* (DB writes only). The advisory-lock transaction now wraps only the writes, with a raised timeout.
 - Replace top-level notIn stale-row deletes with 'synced_at < runTimestamp' (every upserted row is stamped with the run's syncedAt), which scales without a huge parameter list. Per-group asset cleanup stays scoped (small).

Verified against live Immich: people 16,100 (~19s); duplicates 53,278 groups / 125,388 assets (~4m23s); build + lint + immich unit tests green.
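The stale-row rule can be illustrated with a small in-memory stand-in. This is a sketch of the idea only: `InMemoryPeopleTable` and `persistPeople` are hypothetical names, and the real code writes through Prisma rather than a `Map`. Every row seen in the current run is stamped with the same run timestamp, so anything still carrying an older stamp was not returned by Immich this time and can be deleted with a single comparison instead of a list of ids.

```typescript
// Hypothetical row shape; `syncedAt` mirrors the stamp described in the commit message.
interface SyncedRow {
  id: string;
  name: string;
  syncedAt: Date;
}

// In-memory stand-in for the people table (assumption: the real table lives in the DB).
class InMemoryPeopleTable {
  private rows = new Map<string, SyncedRow>();

  upsert(id: string, name: string, syncedAt: Date): void {
    this.rows.set(id, { id, name, syncedAt });
  }

  // Equivalent in spirit to: DELETE ... WHERE synced_at < :cutoff (one bind parameter).
  deleteSyncedBefore(cutoff: Date): number {
    let deleted = 0;
    this.rows.forEach((row, id) => {
      if (row.syncedAt.getTime() < cutoff.getTime()) {
        this.rows.delete(id);
        deleted++;
      }
    });
    return deleted;
  }

  ids(): string[] {
    return Array.from(this.rows.keys()).sort();
  }
}

// Stamp every fetched row with the run timestamp, then drop whatever was not touched.
function persistPeople(
  table: InMemoryPeopleTable,
  fetched: ReadonlyArray<{ id: string; name: string }>,
  runTimestamp: Date,
): number {
  for (let i = 0; i < fetched.length; i++) {
    const person = fetched[i] as { id: string; name: string };
    table.upsert(person.id, person.name, runTimestamp);
  }
  return table.deleteSyncedBefore(runTimestamp);
}
```

With Prisma the delete would presumably look like `deleteMany({ where: { syncedAt: { lt: runTimestamp } } })`, assuming the model field is named `syncedAt`. The comparison has to be strictly less-than and every upsert in a run has to use the identical timestamp, otherwise rows written in the current run could be deleted.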