91/tests
nianzhibai e32da9016b feat: unify crawler pipeline and duplicate maintenance
Remove the legacy spider91-specific storage, route, migration, and admin upload-target handling so crawler imports are treated as generic scriptcrawler drives.

Replace the spider91 migrator with crawlerupload and update the nightly pipeline to run generic crawler crawling, crawler uploads, and full-library duplicate video maintenance.

Add exact duplicate removal by size_bytes plus sampled_sha256 and near-duplicate removal by title similarity, duration, and thumbnail SSIM, keeping the larger source and deleting duplicate catalog rows with tombstones.
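
As a rough illustration of the two dedup passes described above, here is a minimal Go sketch. It is not the code from this commit: the `Video` struct, the function names, the Jaccard word-overlap title measure, and the thresholds (0.8 title similarity, 2 seconds duration tolerance, 0.9 SSIM) are all assumptions chosen for the example, and the thumbnail SSIM score is taken as a precomputed input rather than calculated here.

```go
package main

import (
	"math"
	"strings"
)

// Video is a hypothetical catalog row; only the fields named in the
// commit message (size_bytes, sampled_sha256, title, duration) are modeled.
type Video struct {
	ID            string
	Title         string
	SizeBytes     int64
	SampledSHA256 string
	DurationSec   float64
}

// exactKey pairs the two values used for exact-duplicate matching.
type exactKey struct {
	size int64
	hash string
}

// ExactDuplicates returns the IDs of rows whose (size, sampled hash) pair
// was already seen earlier in the slice. Rows without a hash are skipped,
// because equal size alone does not prove equal content.
func ExactDuplicates(videos []Video) []string {
	seen := make(map[exactKey]struct{})
	var dupes []string
	for _, v := range videos {
		if v.SampledSHA256 == "" {
			continue
		}
		k := exactKey{size: v.SizeBytes, hash: v.SampledSHA256}
		if _, ok := seen[k]; ok {
			dupes = append(dupes, v.ID)
			continue
		}
		seen[k] = struct{}{}
	}
	return dupes
}

// tokens lowercases a title and splits it into a set of words.
func tokens(s string) map[string]bool {
	set := make(map[string]bool)
	for _, f := range strings.Fields(strings.ToLower(s)) {
		set[f] = true
	}
	return set
}

// titleSimilarity is the Jaccard overlap of the two word sets (an assumed
// measure; the commit does not say which similarity metric it uses).
func titleSimilarity(a, b string) float64 {
	ta, tb := tokens(a), tokens(b)
	if len(ta) == 0 || len(tb) == 0 {
		return 0
	}
	inter := 0
	for t := range ta {
		if tb[t] {
			inter++
		}
	}
	return float64(inter) / float64(len(ta)+len(tb)-inter)
}

// IsNearDuplicate requires all three signals to agree. The ssim argument
// is a precomputed thumbnail SSIM score in [0, 1]; thresholds are assumed.
func IsNearDuplicate(a, b Video, ssim float64) bool {
	return titleSimilarity(a.Title, b.Title) >= 0.8 &&
		math.Abs(a.DurationSec-b.DurationSec) <= 2 &&
		ssim >= 0.9
}

// KeepLarger returns the row to keep (larger file) and the row to drop.
// Ties keep the first argument.
func KeepLarger(a, b Video) (keep, drop Video) {
	if b.SizeBytes > a.SizeBytes {
		return b, a
	}
	return a, b
}
```

A caller would typically run `ExactDuplicates` first, then compare the surviving rows pairwise with `IsNearDuplicate`, and pass each matching pair to `KeepLarger` to decide which catalog row gets a tombstone.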

Mark automatically deduped tombstones with reason=duplicate and show a compact 重复文件 ("duplicate file") pill in the admin blacklist table while leaving manual blacklist entries unmarked.

Add media similarity helpers, scriptcrawler near-duplicate checks, file_name-backed public search, crawler upload UI updates, and tests for the new behavior.

Remove the old /p/spider91 playback route and frontend special casing after the dedicated spider91 drive implementation was removed.

Verified with: go test ./... -count=1; npm test; npm run build.
2026-06-22 22:49:18 +08:00