113 Commits

Author SHA1 Message Date
nianzhibai 992b20da93 chore: bump version to 0.1.6 2026-06-12 20:24:29 +08:00
nianzhibai 1770693666 Revise feature descriptions and remove documentation
Updated feature list and removed documentation section.
2026-06-12 20:18:12 +08:00
nianzhibai 177041633a docs: remove crawler module documentation 2026-06-12 20:07:02 +08:00
nianzhibai ae324d3752 feat: add Unicom cloud drive source operations
Add Unicom cloud drive support for source-file deletion and crawler uploads.

- Implement source-file removal for Unicom cloud drive so deleting videos can also remove the original cloud-drive file when requested.

- Resolve Unicom cloud drive source identifiers across file FID, object ID, directory ID, rename, and delete flows.

- Add upload support for Spider91 crawler imports targeting Unicom cloud drive storage.

- Add Unicom cloud drive QR login backend APIs, frontend form support, and tests.

- Extend drive capability metadata, scanner behavior, proxy handling, preview handling, and migration coverage for cloud-drive source operations.

- Rename Chinese display labels from 联通沃盘 to 联通网盘 and from 123 云盘 to 123网盘 while keeping the root README aligned with origin/main.

- Add referrer-policy coverage for 302 video playback and update related frontend playback tests.
2026-06-12 15:49:15 +08:00
nianzhibai 7f1e4eaa29 Fix formatting and update feature descriptions in README 2026-06-12 10:59:17 +08:00
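nianzhibai 811d87cc27 feat: improve script crawler import and management experience
- Rewrite the add/edit crawler dialog layout, tune the desktop width and the proportions of the script-source and test areas, and hide internal details such as local paths and import URLs.

- Adjust the crawler management page copy and the mobile stats card layout, show status cards in a uniform two-column layout, and stop overlay clicks from closing the dialog by accident.

- Convert GitHub blob links to raw links automatically, improving compatibility when importing scripts by URL.

- Add an ffprobe integrity check for script crawler downloads; on failure, delete the bad file and do not record it in seen, so it can be fetched again later.

- Support remuxing .m3u8/HLS media into local MP4 with ffmpeg, then continue through the fingerprint, thumbnail, preview, and upload-migration flow.

- Fix occasional loss of dry-run stderr logs, and add tests for GitHub URLs, bad-video cleanup, HLS download, dialog interaction, and responsive layout.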
2026-06-12 10:53:18 +08:00
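The blob-to-raw conversion mentioned in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `github_blob_to_raw` is hypothetical, and the sketch assumes the usual `https://github.com/<owner>/<repo>/blob/<ref>/<path>` URL shape and drops any query string or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com blob URL to raw.githubusercontent.com.

    Hypothetical helper: any URL that does not look like a blob link is
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.netloc.lower() != "github.com":
        return url
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / path...
    if len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _, ref, *path = segments
    raw_path = "/" + "/".join([owner, repo, ref, *path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

Returning non-matching URLs untouched keeps the helper safe to apply to every import URL, including links that are already raw.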
nianzhibai e4408f5655 Clarify script crawler support in README
Updated the script crawler section to clarify that the project supports importing custom scripts with specific guidelines, and removed the mention of built-in crawlers.
2026-06-11 23:25:04 +08:00
nianzhibai e93c906921 Add PR submission guidelines section
Added PR submission guidelines to README.
2026-06-11 23:16:15 +08:00
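nianzhibai 96e423b952 feat: improve crawler deduplication, upload progress, and source-file deletion
Add a candidate budget, duplicate source records, and a default crawler tag for script crawlers, so duplicate videos do not use up the target number of new videos.

Add upload-migration progress reporting for crawlers and an upload card on the management page, so each crawler can show how the current round of uploads is going.

Add an optional "also delete the cloud-drive source file" capability to video deletion, complete the playback-page and management-page interactions, and implement the Remove interface for several cloud-drive drivers.

Add related tests and update the crawler protocol documentation.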
2026-06-11 22:42:11 +08:00
nianzhibai a8ccc19e9e Fix script crawler migration to PikPak
Handle already-migrated crawler assets by binding local script crawler rows to equivalent files that already exist on the configured target drive. This keeps thumbnail, preview, and fingerprint readiness stable while removing local crawler videos once an equivalent target object is available.

Harden PikPak uploads by retrying failed upload sessions, requesting fresh resumable upload metadata between attempts, and using CNAME-style OSS requests for PikPak upload endpoints so the SDK does not generate invalid bucket-prefixed hosts such as vip-lixian-07.upload-a10b.mypikpak.com.

Add focused tests for duplicate target binding, retrying failed PikPak OSS uploads with a fresh session, and preserving the expected PikPak upload endpoint URL shape.
2026-06-11 14:03:37 +08:00
nianzhibai 7ddf33d726 Improve crawler asset stats and admin navigation
- Count crawler assets by crawler source ID prefix after cloud migration

- Add crawler API totals for cumulative, local, and migrated videos

- Let crawler thumbnail and preview readiness inherit equivalent canonical videos

- Show cumulative crawl data in crawler management cards

- Remove low-value expanded crawler metadata fields from the card body

- Move return-to-site into the main admin navigation with grouped sections

- Rename the content admin group to management and adjust footer icon sizing

- Update backend and frontend tests for crawler/admin behavior
2026-06-10 23:45:43 +08:00
nianzhibai c1355385e1 feat(crawler): simplify script crawler workflow
Redesign crawler management around imported Python scripts instead of built-in crawler storage. Crawler scripts now declare CRAWLER_NAME, imports validate metadata, crawler IDs are generated internally, and deleted crawler scripts are detached without deleting already imported videos.

Add backend support for file and URL script imports, dry-run testing, metadata parsing, safer job paths, original filename preservation, and crawler listing that ignores detached script records. Remove the legacy built-in Spider91 script path flow and hidden Python/config JSON fields from the crawler API.

Rework the admin crawler page into an independent crawler console with script import, dry-run testing, status metrics, spider iconography, and simplified controls. Update docs, examples, installer checks, Docker/release packaging, and tests for the new protocol.
2026-06-10 14:27:16 +08:00
nianzhibai ec5a01b6aa feat(crawler): redesign crawler scripts and admin workflow
- add generic scriptcrawler backend runner using the crawler.v1 JSONL protocol

- support crawler script upload and HTTP(S) URL import from the admin crawler page

- simplify the user-facing crawler contract to title, media_url, optional thumbnail_url and optional source_id

- convert Spider91 into a normal script crawler and reject new Spider91 storage-drive configs

- keep legacy Spider91 storage rows visible only for cleanup/deletion

- add crawler protocol docs, example script, admin UI, tests and migration coverage
2026-06-09 23:51:12 +08:00
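A script that follows the contract described in this commit might look roughly like the sketch below. The event envelope (`{"type": "item", ...}` followed by `{"type": "done", "stats": ...}`) and the job keys `protocol` and `target_new` mirror the spider script diff further down this page; the `run` and `emit` helpers, the `candidates` list, and passing `seen` as an in-memory set are assumptions made for illustration (the job in the diff passes a `seen_source_ids_file` path instead).

```python
import json
import sys

CRAWLER_NAME = "Example"  # scripts declare CRAWLER_NAME (per commit c1355385e1)
CRAWLER_PROTOCOL = "crawler.v1"


def emit(event, out=sys.stdout):
    """Write one JSON object per line and flush so the host can read it at once."""
    out.write(json.dumps(event, ensure_ascii=False) + "\n")
    out.flush()


def run(job, candidates, seen=frozenset(), out=sys.stdout):
    """Emit up to target_new unseen items, then a final done event."""
    if job.get("protocol") != CRAWLER_PROTOCOL:
        raise ValueError(f"unsupported crawler protocol: {job.get('protocol')!r}")
    target_new = int(job.get("target_new") or 15)
    emitted = skipped = 0
    for cand in candidates:
        if emitted >= target_new:
            break
        if cand.get("source_id") in seen:
            skipped += 1
            continue
        # Required fields first, then the optional ones only when present.
        item = {"title": cand["title"], "media_url": cand["media_url"]}
        for key in ("thumbnail_url", "source_id"):
            if cand.get(key):
                item[key] = cand[key]
        emit({"type": "item", "item": item}, out)
        emitted += 1
    stats = {"emitted": emitted, "failed": 0, "skipped": skipped}
    emit({"type": "done", "stats": stats}, out)
    return stats
```

Keeping stdout reserved for events and sending any logging to stderr is what lets the host parse the stream line by line.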
nianzhibai 71d4a16db1 chore: release v0.1.5 2026-06-08 23:55:05 +08:00
nianzhibai 940e5dd76d feat: support spider91 uploads to google drive 2026-06-08 23:50:19 +08:00
nianzhibai e826c05d5c chore: release v0.1.4 2026-06-08 19:25:27 +08:00
nianzhibai 3465b9e837 Fix drive card icon fallback 2026-06-08 19:07:53 +08:00
nianzhibai d33c1b1b20 Support custom Google Drive OAuth credentials 2026-06-08 18:58:05 +08:00
nianzhibai 5fc8e9ebb7 Improve drive scan task coordination 2026-06-08 17:37:58 +08:00
nianzhibai dc7d2a5de3 Release v0.1.3 for ArtPlayer video detail updates 2026-06-07 15:24:57 +08:00
nianzhibai 2f2bfbfcdc Improve video detail player controls and layout 2026-06-07 15:17:08 +08:00
nianzhibai 9def08b0c5 Enhance video detail player experience
Add ArtPlayer/HLS playback, resume prompts, mobile gestures, orientation toggle, and theme-aware controls. Hide author metadata from video detail headers.
2026-06-07 00:15:32 +08:00
nianzhibai c87208117e Fix scanner cancellation and shorts UI 2026-06-06 08:37:00 +00:00
nianzhibai a770b3af6b Support local STRM files 2026-06-06 07:50:43 +00:00
nianzhibai e1b8f0eae7 Fix drive form dirty state and media fallbacks 2026-06-05 14:42:12 +00:00
nianzhibai 2d907da07d Redesign admin drive/video management UI
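- New-drive dialog: switched to a brand-colored card picker with a two-step flow; the selected brand bar is shown after a choice is made
- Drive detail page: simplified header (type chip + status), generation status moved to a three-column layout, local storage shown as horizontal metrics
- Video management page: thumbnail added to the title column, tag column merged inline into the title, source column wrapping fixed, action buttons unified as icon-only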

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 14:09:43 +00:00
nianzhibai 78cfb0a9e5 Fix admin modal focus reset 2026-06-05 12:57:06 +00:00
nianzhibai fa7823ef3e Fix admin loading spinner and empty drive copy 2026-06-05 12:50:21 +00:00
nianzhibai 5b0afcfc6c Fix deploy script update exit status 2026-06-05 12:35:14 +00:00
nianzhibai 76ae3cea7d fix admin video batch delete and spider91 form 2026-06-04 23:18:53 +08:00
nianzhibai abe335cea0 chore: install spider91 runtime deps in deploy script 2026-06-04 16:24:28 +08:00
nianzhibai 8dff0f07b9 feat: add admin video deletion and mobile UI polish
Adds tombstone-backed video deletion with generated asset cleanup, plus responsive video management actions and centered confirmation dialogs.
2026-06-04 16:10:26 +08:00
nianzhibai 5080203b7c feat: add drive task stop controls
Add per-drive and global admin controls to stop scan, preview, thumbnail, and fingerprint work.

Keep stopped pending generation resumable, wire cancellation through workers and nightly runs, and refine mobile drive-management UI/history behavior.
2026-06-03 23:42:54 +08:00
nianzhibai df6f0ebbbf feat: support spider91 upload to 123pan 2026-06-03 21:49:27 +08:00
nianzhibai 8f0d52aec4 fix: hash long local media asset filenames 2026-06-03 20:35:53 +08:00
nianzhibai 53327c9b8e fix: cool down p115 transient stream errors 2026-06-03 20:04:49 +08:00
nianzhibai 57ed546b83 chore: remove obsolete project docs 2026-06-03 19:47:37 +08:00
nianzhibai 869c0d5f78 refactor: rename teaser UI copy to preview video 2026-06-03 19:45:15 +08:00
nianzhibai 397823bb8d refactor: polish admin mobile management UI 2026-06-03 19:28:00 +08:00
nianzhibai 9e1acd4e56 fix: prevent mobile admin video card text highlight 2026-06-03 19:23:03 +08:00
nianzhibai 2cd365acd4 Improve admin UI accessibility and feedback 2026-06-03 10:53:18 +08:00
nianzhibai 48808ec568 fix: wire admin video keyword filter 2026-06-02 23:41:36 +08:00
nianzhibai 5dc00e486d refactor: optimize admin UI usability and code structure
- Split DrivesPage.tsx (1821→594 lines) into modular components under src/admin/drive/
- Add Escape key to close any modal dialog
- Pause drive list polling when browser tab is hidden (Page Visibility API)
- Remove duplicate formatBytes from VideosPage, unify to storageFormat.ts
- Batch delete (TagsPage) and batch regen (VideosPage) now use Promise.allSettled for concurrency
- Add mobile bottom sheet for logout and check-update (previously hidden on <768px)
- Update adminDriveForm tests to cover extracted component files
2026-06-02 23:30:46 +08:00
nianzhibai 4ec1097496 Update supported cloud services in README 2026-06-02 16:02:30 +08:00
nianzhibai 95e46d8fbb fix: rename failed teaser retry action 2026-06-02 15:54:37 +08:00
nianzhibai fdfc4771df chore: verify runtime dependencies during install 2026-06-02 15:50:19 +08:00
nianzhibai c8c6812ae6 fix: prevent empty listing layout flicker 2026-06-02 15:39:21 +08:00
nianzhibai b938ff1221 fix: prevent hover animation flicker 2026-06-02 15:30:13 +08:00
nianzhibai 7d63a6d265 docs: add MIT license 2026-06-02 15:16:35 +08:00
nianzhibai a8de7d2f6b fix: improve local storage path diagnostics 2026-06-02 15:11:53 +08:00
nianzhibai d4fcff896e perf: optimize home page loading 2026-06-02 15:04:12 +08:00
nianzhibai cada336e96 Add 123云盘 support and optimize storage-deletion logic 2026-06-02 14:30:16 +08:00
nianzhibai 5bb93bd95b fix: install socks support for 91Spider proxy 2026-06-01 20:31:33 +08:00
nianzhibai b6be7d021c fix: reduce duplicate home recommendations 2026-06-01 19:02:41 +08:00
nianzhibai e36a17f99d fix: improve 91Spider tagging and deduped tag filters 2026-06-01 18:51:56 +08:00
nianzhibai e01b7cc3b7 perf: speed up catalog startup migrations 2026-06-01 18:03:21 +08:00
nianzhibai c78f22aedb feat: add 91Spider proxy support and drive improvements 2026-06-01 17:41:20 +08:00
nianzhibai cf9de5b40a Add failed fingerprint retry controls 2026-06-01 13:42:32 +08:00
nianzhibai be19f81e82 Document 302 redirect support for cloud drives 2026-05-31 19:42:10 +08:00
nianzhibai 4d679ef64f docs: update release version example 2026-05-31 17:53:38 +08:00
nianzhibai 4ba964b7e2 fix thumbnail status and frontend serving 2026-05-31 17:40:16 +08:00
nianzhibai cd3b3c6976 feat: use root id as drive scan root 2026-05-31 17:13:51 +08:00
nianzhibai 91c03947d1 fix: suppress deleted auto tags 2026-05-31 16:51:45 +08:00
nianzhibai 7f1c1a51a3 fix: remove setup login help text 2026-05-31 16:41:12 +08:00
nianzhibai 077c2e2c38 fix: make install script optional checks non-fatal 2026-05-31 16:32:58 +08:00
nianzhibai 30a62f265a fix: clean up install script uninstall 2026-05-31 16:19:41 +08:00
nianzhibai 38e62c6a2f feat: paginate admin tags 2026-05-31 16:07:49 +08:00
nianzhibai 6345cf74e0 fix: preserve shorts slide on fullscreen exit 2026-05-31 16:00:56 +08:00
nianzhibai f004b14d20 feat: add bulk tag deletion 2026-05-31 15:45:22 +08:00
nianzhibai a407312dfa fix: prevent duplicate scan-all jobs 2026-05-31 15:09:05 +08:00
nianzhibai a165605b0f Merge pull request #15 from thazjswe42700/fix/logout-icon-alignment
fix: remove extra margin-right on logout button icon
2026-05-31 14:34:46 +08:00
nianzhibai 0ac1a5b13e Merge pull request #16 from thazjswe42700/fix/scan-all-debounce
fix: debounce scan-all button and deduplicate toast notifications
2026-05-31 14:34:32 +08:00
nianzhibai a83449b129 fix: improve shorts preference and scrubbing 2026-05-31 12:59:21 +08:00
nianzhibai c68891e6f0 Merge pull request #14 from yancj9ya/feat/shorts-tag-preference
Optimize short-video recommendations by watched tags
2026-05-31 12:36:31 +08:00
hermes-agent 9892599412 fix: debounce scan-all button and deduplicate toast notifications
- Add scanningAll state to disable the 扫描所有网盘 button while the
  API request is in-flight, preventing repeated clicks from stacking
  independent requests.
- Deduplicate toast notifications: when show() is called with the same
  text that is already visible, reset its dismiss timer instead of
  adding a duplicate overlay.

Closes #13
2026-05-31 04:26:34 +00:00
hermes-agent 0cb2a7a1c2 fix: remove extra margin-right on logout button icon
The LogOut icon had an inline marginRight:4 that conflicted with the
flex gap:6 defined in CSS, causing the icon to be misaligned with the
Check Update button above it.

Closes #11
2026-05-31 04:19:32 +00:00
nianzhibai 87d197496b Limit thumbnail transient retries 2026-05-31 12:02:49 +08:00
nianzhibai 0e3a5bd5cd Add Google Drive support 2026-05-31 11:14:03 +08:00
nianzhibai d72bfee10f feat: prefer short videos by watched tags
Recommend shorts from the least-populated tag after a user watches a video long enough, while preserving random fallback behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-31 10:41:27 +08:00
nianzhibai 389dd981a8 Add manual tag deletion 2026-05-31 10:39:18 +08:00
nianzhibai 44d622d49c Revise upgrade instructions in README
Updated instructions for upgrading to the latest stable version.
2026-05-31 10:14:13 +08:00
nianzhibai d7ff0c98af chore: deploy Docker Compose from stable release image 2026-05-31 10:04:56 +08:00
nianzhibai 66adf444ba fix: detect Docker image version for update checks 2026-05-31 09:55:15 +08:00
nianzhibai 8f8037b838 Update README with service restart instruction
Add note about restarting service on first access.
2026-05-30 20:26:29 +08:00
nianzhibai 215d9596fd Update README.md 2026-05-30 20:17:18 +08:00
nianzhibai e57058db79 feat: prepare v0.0.4 storage release 2026-05-30 20:02:02 +08:00
nianzhibai 6ec61833f2 feat: probe video duration during thumbnail generation 2026-05-30 18:30:22 +08:00
nianzhibai 6e87f88d53 feat: support spider91 uploads to OneDrive 2026-05-30 18:04:15 +08:00
nianzhibai e78fa9d978 feat: improve media generation pipeline status 2026-05-30 17:37:31 +08:00
nianzhibai afbff9eb55 Add Docker Compose deployment support 2026-05-30 11:09:04 +08:00
nianzhibai 039ec2a988 Improve fingerprint dedupe maintenance 2026-05-29 23:58:36 +08:00
nianzhibai da0683344e Add sampled fingerprint deduplication 2026-05-29 23:19:52 +08:00
nianzhibai 1a1282382e Simplify OneDrive setup and redirect playback 2026-05-29 22:35:02 +08:00
nianzhibai 34b6fa8ea9 Release v0.0.3 improvements 2026-05-29 18:34:38 +08:00
nianzhibai 08e38bc4ca Recreate releases with assets 2026-05-29 16:46:02 +08:00
nianzhibai c93d193efe Fetch annotated tag notes for releases 2026-05-29 16:39:12 +08:00
nianzhibai 08568c3951 Use tag notes for release body 2026-05-29 16:34:29 +08:00
nianzhibai 7e394e2971 Prioritize ready thumbnails on home 2026-05-29 16:23:13 +08:00
nianzhibai d16e3168f9 Update README with upgrade instructions and cleanup
Added upgrade instructions for old version users and removed redundant access troubleshooting note.
2026-05-29 15:39:51 +08:00
nianzhibai 81f348b246 Document legacy update recovery 2026-05-29 15:37:40 +08:00
nianzhibai 1e71c1fb72 Wait for service readiness after install 2026-05-29 15:34:48 +08:00
nianzhibai d5122d289e Harden installer update flow 2026-05-29 15:23:42 +08:00
nianzhibai c146ad50ed Fix PikPak captcha recovery 2026-05-29 14:49:47 +08:00
nianzhibai f5c20f9594 Fix spider91 upload target and thumbnails 2026-05-29 06:28:18 +00:00
nianzhibai 62e69d4c06 Update mobile section images in README 2026-05-29 11:54:56 +08:00
nianzhibai 51725ba82f Update README.md 2026-05-29 11:28:02 +08:00
nianzhibai c06db836dd Update LinuxDo community link in README 2026-05-28 21:30:38 +08:00
nianzhibai b8717da4fd Include restart command for access issues
Add troubleshooting tip for project access issues.
2026-05-28 21:26:49 +08:00
nianzhibai 2d57545e87 Revise README content for clarity and updates
Updated the README to enhance the description and clarify features.
2026-05-28 21:23:29 +08:00
nianzhibai 6518d772c0 docs: polish README layout 2026-05-28 21:15:40 +08:00
nianzhibai f2c0e7f854 Enhance README with new features and preview images
Added preview images for desktop and mobile, included theme options and short video mode.
2026-05-28 21:11:13 +08:00
nianzhibai 3c7219ecd6 fix: reduce mobile admin content gap 2026-05-28 20:50:46 +08:00
nianzhibai 94669fd35e Revise README for project overview and setup
Updated project description and installation instructions in README.md.
2026-05-28 20:41:40 +08:00
159 changed files with 38803 additions and 6854 deletions
+21
@@ -0,0 +1,21 @@
.git
.github
.gitattributes
.gitignore
node_modules
dist
release
data
backend/data
backend/config.yaml
config.yaml
*.db
*.sqlite
*.sqlite3
*.log
*.tmp
tests
video-site-implementation-plan.md
+82
@@ -0,0 +1,82 @@
name: Docker
on:
push:
branches:
- main
tags:
- "v*"
pull_request:
branches:
- main
workflow_dispatch:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
permissions:
contents: read
packages: write
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to GHCR
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract Docker metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=tag
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha,prefix=sha-
type=raw,value=latest,enable=${{ startsWith(github.ref, 'refs/tags/v') }}
type=raw,value=stable,enable=${{ startsWith(github.ref, 'refs/tags/v') }}
- name: Determine image version
id: version
shell: bash
run: |
if [[ "$GITHUB_REF" == refs/tags/v* ]]; then
version="$GITHUB_REF_NAME"
else
version="$(git describe --tags --always --dirty 2>/dev/null || git rev-parse --short=12 HEAD)"
fi
echo "version=$version" >> "$GITHUB_OUTPUT"
- name: Build and push Docker image
uses: docker/build-push-action@v6
with:
context: .
platforms: linux/amd64,linux/arm64
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
build-args: |
VERSION=${{ steps.version.outputs.version }}
cache-from: type=gha
cache-to: type=gha,mode=max
+9 -4
@@ -15,6 +15,8 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Go
uses: actions/setup-go@v5
@@ -36,8 +38,11 @@ jobs:
GH_TOKEN: ${{ github.token }}
TAG: ${{ github.ref_name }}
run: |
if gh release view "$TAG" >/dev/null 2>&1; then
gh release upload "$TAG" release/*.tar.gz --clobber
else
gh release create "$TAG" release/*.tar.gz --title "$TAG" --notes "Prebuilt Linux release packages."
git tag -d "$TAG" >/dev/null 2>&1 || true
git fetch --force origin "refs/tags/$TAG:refs/tags/$TAG"
NOTES="$(git tag -l "$TAG" --format='%(contents)')"
if [ -z "$NOTES" ]; then
NOTES="Prebuilt Linux release packages."
fi
gh release delete "$TAG" --yes >/dev/null 2>&1 || true
gh release create "$TAG" release/*.tar.gz --title "$TAG" --notes "$NOTES" --verify-tag
+10
@@ -23,8 +23,10 @@ tools/
# Build artifacts
backend/server
backend/server.*
release/
tsconfig.tsbuildinfo
tmp/
# Default output file when the 91 crawler script runs standalone (the backend passes an explicit --output under backend/data/spider91/, so its output never lands here)
91porn_videos.json
@@ -33,3 +35,11 @@ tsconfig.tsbuildinfo
91VideoSpider/__pycache__/
__pycache__/
*.pyc
# Local scratch images
/image.jpg
/image003.jpg
/image004.jpg
/image005.png
/image006.png
/image02.png
+164 -10
@@ -9,9 +9,12 @@
- Direct video link (MP4)
Install dependencies:
pip install requests beautifulsoup4 lxml
pip install requests beautifulsoup4 lxml PySocks
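Usage:
# Run as a generic video-site-91 crawler script (the backend invokes it this way automatically)
python spider_91porn.py --job /path/to/job.json
# Full crawl (default behavior: crawl from page=1 to the end and write to OUTPUT_FILE)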
python spider_91porn.py
@@ -22,6 +25,7 @@
python spider_91porn.py --target-new 15 --seen-viewkeys-file /tmp/seen.txt --output /tmp/new.json
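CLI arguments:
--job FILE path to a crawler.v1 job JSON; the backend crawler manager uses this mode
--page N crawl only page N; combine with --output for manual debugging
--target-new N page forward from page 1 until N new videos (ones not in the seen list) have been collected
--seen-viewkeys-file FILE one known viewkey or mp4 source ID per line; matches are skipped; use together with --target-new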
@@ -37,6 +41,8 @@ CLI 参数:
- OUTPUT_FILE : output file name
Output format (JSON):
In --job mode, stdout carries crawler.v1 JSON Lines and all logs go to stderr.
Manual runs still write the legacy JSON file:
{
"videos": [
{
@@ -68,6 +74,7 @@ import time
import random
import json
import os
import socket
import sys
import html
from urllib.parse import urljoin, unquote, urlparse
@@ -76,10 +83,32 @@ from datetime import datetime
try:
from bs4 import BeautifulSoup
except ImportError:
print("错误: 缺少依赖库 beautifulsoup4")
print("请运行: pip install beautifulsoup4 lxml")
print("错误: 缺少依赖库 beautifulsoup4", file=sys.stderr)
print("请运行: pip install beautifulsoup4 lxml", file=sys.stderr)
sys.exit(1)
def prefer_ipv4_for_plain_socks5_proxy():
"""PySocks may pick IPv6 first for socks5://; some SOCKS5 servers only accept IPv4."""
proxy_envs = (
os.environ.get("HTTPS_PROXY", ""),
os.environ.get("HTTP_PROXY", ""),
os.environ.get("https_proxy", ""),
os.environ.get("http_proxy", ""),
)
uses_plain_socks5 = any(v.strip().lower().startswith("socks5://") for v in proxy_envs)
if not uses_plain_socks5 or getattr(socket, "_spider91_ipv4_first", False):
return
original_getaddrinfo = socket.getaddrinfo
def getaddrinfo_ipv4_first(*args, **kwargs):
infos = original_getaddrinfo(*args, **kwargs)
return sorted(infos, key=lambda info: 0 if info[0] == socket.AF_INET else 1)
socket.getaddrinfo = getaddrinfo_ipv4_first
socket._spider91_ipv4_first = True
# ===================== Configuration =====================
BASE_URL = "https://www.91porn.com/v.php"
LIST_PARAMS = {
@@ -125,9 +154,24 @@ OUTPUT_FILE = "91porn_videos.json"
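MAX_PAGES = None # None crawls every page; an integer such as 5 crawls only the first 5 pages
RESUME = True # Whether to skip viewkeys already present in the output file (resume an interrupted crawl)
MAX_EMPTY_PAGES = 2 # Stop crawling after this many consecutive empty pages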
CRAWLER_NAME = "91Porn"
CRAWLER_PROTOCOL = "crawler.v1"
# ===================================================
def crawler_source_id(raw: str) -> str:
"""Return a backend-safe source_id, preserving existing numeric 91 IDs."""
value = str(raw or "").strip()
if not value:
return ""
safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", value).strip("._-")
return safe[:160]
def write_jsonl(event: dict):
print(json.dumps(event, ensure_ascii=False), flush=True)
class Porn91Spider:
def __init__(
self,
@@ -140,6 +184,7 @@ class Porn91Spider:
target_new: int = None,
seen_viewkeys: list = None,
stream_output: bool = False,
stream_protocol: str = "legacy",
):
"""
Constructor. Every parameter has a default that matches the global configuration at the top of the script.
@@ -175,6 +220,7 @@ class Porn91Spider:
# (consumed in real time by a bufio.Scanner on the Go backend, which starts the next download as soon as one finishes).
# When enabled, all logs go to stderr.
self.stream_output = bool(stream_output)
self.stream_protocol = stream_protocol or "legacy"
# Add a retry adapter
try:
@@ -240,7 +286,28 @@ class Porn91Spider:
if not self.stream_output:
return
try:
print(json.dumps(video, ensure_ascii=False), flush=True)
if self.stream_protocol == "crawler.v1":
source_id = crawler_source_id(video.get("source_id") or video.get("viewkey") or "")
item = {
"title": video.get("title") or "",
"detail_url": video.get("detail_url") or "",
"author": "91porn",
"tags": ["91porn"],
"media_url": video.get("video_url") or "",
"thumbnail_url": video.get("thumb_url") or "",
"headers": {
"Referer": video.get("detail_url") or BASE_URL,
},
}
if source_id:
item["source_id"] = source_id
event = {
"type": "item",
"item": item,
}
write_jsonl(event)
else:
print(json.dumps(video, ensure_ascii=False), flush=True)
except Exception as e:
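# stdout errors essentially only happen when the pipe is broken (the consumer process has died);
# write to stderr so the backend sees it, then let the crawl loop break on its own.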
@@ -674,8 +741,9 @@ class Porn91Spider:
except Exception as e:
self.log(f"保存文件失败: {e}")
# Fall back to printing to the console as a backup
print("\n--- 备份输出 ---")
print(json.dumps(output_data, ensure_ascii=False, indent=2))
backup_out = sys.stderr if self.stream_output else sys.stdout
print("\n--- 备份输出 ---", file=backup_out, flush=True)
print(json.dumps(output_data, ensure_ascii=False, indent=2), file=backup_out, flush=True)
def _print_summary(self):
"""
@@ -706,7 +774,7 @@ def print_help():
- 视频直链 (MP4)
依赖安装:
pip install requests beautifulsoup4 lxml
pip install requests beautifulsoup4 lxml PySocks
使用方法:
python spider_91porn.py
@@ -728,6 +796,84 @@ def print_help():
""")
def run_job(job_path: str):
"""Run as a crawler.v1 script plugin.
The Go host passes a job JSON file and expects stdout JSONL events. Logs go
to stderr so stdout stays machine-readable.
"""
with open(job_path, "r", encoding="utf-8") as f:
job = json.load(f)
if job.get("protocol") != CRAWLER_PROTOCOL:
raise ValueError(f"unsupported crawler protocol: {job.get('protocol')!r}")
if job.get("mode") not in ("", None, "crawl"):
raise ValueError(f"unsupported crawler mode: {job.get('mode')!r}")
try:
target_new = int(job.get("target_new") or 15)
except (TypeError, ValueError):
target_new = 15
if target_new <= 0:
target_new = 15
seen_file = job.get("seen_source_ids_file") or ""
output_dir = job.get("output_dir") or os.getcwd()
run_id = job.get("run_id") or datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, f"spider91-{run_id}.json")
network = job.get("network") if isinstance(job.get("network"), dict) else {}
proxy_url = str(network.get("proxy_url") or "").strip()
if proxy_url:
os.environ["HTTP_PROXY"] = proxy_url
os.environ["HTTPS_PROXY"] = proxy_url
os.environ["http_proxy"] = proxy_url
os.environ["https_proxy"] = proxy_url
os.environ["NO_PROXY"] = ""
os.environ["no_proxy"] = ""
seen_viewkeys = []
if seen_file:
try:
with open(seen_file, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if line:
seen_viewkeys.append(line)
except FileNotFoundError:
print(f"警告: seen_source_ids_file 不存在: {seen_file}", file=sys.stderr, flush=True)
except Exception as e:
print(f"警告: 读取 seen_source_ids_file 失败: {e}", file=sys.stderr, flush=True)
prefer_ipv4_for_plain_socks5_proxy()
spider = Porn91Spider(
output_file=output_file,
start_page=1,
max_pages=None,
resume=False,
quiet=True,
target_new=target_new,
seen_viewkeys=seen_viewkeys,
stream_output=True,
stream_protocol="crawler.v1",
)
try:
spider.crawl()
done = {
"type": "done",
"stats": {
"emitted": spider.processed_videos,
"failed": spider.failed_videos,
"skipped": spider.skipped_videos,
},
}
write_jsonl(done)
except KeyboardInterrupt:
spider.log("\n用户中断,正在保存已爬取的数据...")
spider._save_results()
raise
def main():
if len(sys.argv) > 1 and sys.argv[1] in ('-h', '--help', 'help'):
print_help()
@@ -755,15 +901,23 @@ def main():
parser.add_argument("--stream-output", action="store_true",
help="流式模式:每解析一条视频直链就立即把它作为一行 JSON 写到 stdout 并 flush"
"日志改走 stderr。配合 backend 边读边下载使用。")
parser.add_argument("--job", type=str, default=None,
help="crawler.v1 job JSON 路径;作为通用脚本爬虫运行。")
args, _ = parser.parse_known_args()
if args.job:
run_job(args.job)
return
cli_out = sys.stderr if args.stream_output else sys.stdout
prefer_ipv4_for_plain_socks5_proxy()
print("""
================================================
91porn 视频爬虫启动中...
================================================
按 Ctrl+C 可随时中断并保存进度
""")
""", file=cli_out)
# Load known IDs (the backend's list of videos already in the catalog; the old parameter name is still accepted)
seen_viewkeys = []
@@ -775,9 +929,9 @@ def main():
if line:
seen_viewkeys.append(line)
except FileNotFoundError:
print(f"警告: --seen-viewkeys-file 不存在: {args.seen_viewkeys_file}")
print(f"警告: --seen-viewkeys-file 不存在: {args.seen_viewkeys_file}", file=cli_out)
except Exception as e:
print(f"警告: 读取 --seen-viewkeys-file 失败: {e}")
print(f"警告: 读取 --seen-viewkeys-file 失败: {e}", file=cli_out)
# Decide the run mode
if args.target_new is not None:
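To see what the `crawler_source_id` helper added in this diff does to typical inputs, the short sketch below reproduces its body as shown in the hunk and runs it on a few sample values. The sample strings are made up for illustration; only the function body comes from the diff.

```python
import re


def crawler_source_id(raw) -> str:
    """Same logic as the helper in the diff: keep [A-Za-z0-9_.-], cap at 160 chars."""
    value = str(raw or "").strip()
    if not value:
        return ""
    # Collapse each run of disallowed characters into one underscore,
    # then trim leading/trailing separators.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", value).strip("._-")
    return safe[:160]


if __name__ == "__main__":
    # Illustrative inputs only; none of these come from the project.
    for sample in ("123456", " ph5f3a/abc?x=1 ", "--a b--", "???", None):
        print(repr(sample), "->", repr(crawler_source_id(sample)))
```

Purely numeric IDs pass through unchanged, while an input made only of disallowed characters sanitizes to an empty string; in that case the `if source_id:` guard in the emitting code leaves `source_id` out of the item.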
+68
@@ -0,0 +1,68 @@
# ---- Stage 1: Build frontend ----
FROM node:20-slim AS frontend
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY tsconfig.json vite.config.ts index.html ./
COPY public/ public/
COPY src/ src/
RUN npm run build
# ---- Stage 2: Build backend ----
FROM golang:1.23-bookworm AS backend
WORKDIR /app/backend
COPY backend/go.mod backend/go.sum ./
COPY backend/vendor/ vendor/
COPY backend/cmd/ cmd/
COPY backend/internal/ internal/
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/server ./cmd/server
# ---- Stage 3: Runtime ----
FROM debian:bookworm-slim AS runtime
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
curl \
ffmpeg \
openssl \
python3 \
python3-bs4 \
python3-lxml \
python3-requests \
python3-socks \
tar \
tzdata \
&& rm -rf /var/lib/apt/lists/*
RUN python3 -c "import requests, bs4, lxml, socks"
WORKDIR /opt/video-site-91
COPY --from=backend /out/server ./server
COPY --from=frontend /app/dist ./dist
COPY backend/config.example.yaml ./config.example.yaml
COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
ARG VERSION=dev
ENV VIDEO_CONFIG=/opt/video-site-91/data/config.yaml \
VIDEO_FRONTEND_DIR=/opt/video-site-91/dist \
VIDEO_GITHUB_REPO=nianzhibai/91 \
VIDEO_IMAGE_VERSION=${VERSION} \
VIDEO_LISTEN_PORT=9191 \
VIDEO_VERSION_FILE=/opt/video-site-91/data/.version
RUN chmod +x ./server /usr/local/bin/docker-entrypoint.sh
VOLUME ["/opt/video-site-91/data"]
EXPOSE 9191
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["./server"]
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 nianzhibai
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+162 -126
@@ -1,171 +1,207 @@
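# Video Aggregation Site
# 91
Gather the videos scattered across different cloud drives into a private video site that you log in to, browse, and manage yourself.
<p align="center">
<img width="120" height="120" alt="91" src="https://github.com/user-attachments/assets/5b323c94-bbd3-4dce-bbc8-adc86935b7de" />
</p>
Cloud drives are good for storing things but poor for browsing them at leisure. Once the files pile up, it is hard to remember where they are, what they are called, whether you have watched them, and whether you can still preview them quickly. This project provides the layer in between: the files stay in their original cloud drives, but you can search, filter, preview, and manage them through an interface that feels more like a video site.
<p align="center">
😄 Personal private video site 😄
</p>
It is neither another cloud-drive client nor a content platform. It is closer to a front door for your own video collection: quiet, focused, and under your control.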
<p align="center">
<a href="#快速开始">快速开始</a> ·
<a href="#功能特性">功能特性</a> ·
<a href="#预览图">预览图</a> ·
<a href="#数据存放位置">数据目录</a> ·
<a href="#许可证">许可证</a>
</p>
## 它能做什么
---
- **统一入口**:把 115、PikPak、夸克、联通沃盘、OneDrive、本地上传和可选的 91 爬虫源放在同一个站里浏览。
- **像视频站一样浏览**:首页推荐、最新视频、列表页、搜索、标签筛选、详情播放和相关推荐都已经接好。
- **自动生成预览**:后端会用 ffmpeg 在本地生成封面和短 teaser,扫到新视频后不用一条条手动整理。
- **保留网盘本身**:视频文件不需要搬家,播放时由后端按来源取链和代理。
- **后台可管理**:在管理后台添加网盘、扫描所有网盘、编辑视频信息、维护标签、切换主题。
- **首次部署更直接**:第一次访问时会要求设置管理员用户名和密码,设置后保存到本地配置文件。
- **适合长期运行**:扫描、预览、隐藏视频、标签归类这些重复工作,都尽量交给系统处理。
## 功能特性
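## Who it is for
- **Multiple backends**: works with 115, PikPak, 123 Pan (123网盘), Unicom cloud drive (联通网盘), OneDrive, Google Drive, and local storage
- **Low-bandwidth playback**: 115, PikPak, 123 Pan, Unicom cloud drive, and OneDrive support 302 mode, so online playback does not consume server bandwidth and is unaffected by it; Google Drive does not support 302 mode and is relayed through the server, so playback quality depends on server bandwidth
- **Covers & preview clips**: a cover image and a preview clip are generated automatically for each video, for quick picking on the home page
- **Crawler scripts**: custom scripts can be imported, subject to some conventions; see [SpiderFor91](https://github.com/Just-Spider/SpiderFor91) for details. The project no longer bundles any crawler scripts
- **Shorts mode**: switch to a Douyin-style feed with one click for immersive browsing
If you have a batch of videos scattered across several cloud drives and want to organize them into a private site of your own, this project is a good fit.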
---
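If you only want to play a single file once, the cloud-drive client is simpler; if you want to run a public video site, this project is not designed for that either. Its focus is personal deployment, personal management, and personal viewing.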
## 预览图
## Supported sources
### Desktop
- 115
- PikPak
- 91 crawler source
- Quark
- Unicom Wopan (联通沃盘)
- OneDrive
- Local uploads
<p>
<img width="49%" alt="Home page" src="https://github.com/user-attachments/assets/9808fceb-760b-4dd5-b7d2-8622b95b90d5" />
<img width="49%" alt="Player page" src="https://github.com/user-attachments/assets/859db4aa-1fba-44f2-bb46-1db07c2f964f" />
</p>
The 91 crawler source is a special storage source that brings crawled videos and covers into the site catalog. It is optional; if you only want to manage your own cloud drives, you can leave it disabled.
<p>
<img width="49%" alt="Theme switching" src="https://github.com/user-attachments/assets/96bea37a-8764-413e-9b70-1856b4ae0cd2" />
<img width="49%" alt="Admin page" src="https://github.com/user-attachments/assets/29c1e27a-7651-4dfc-93dd-556331844214" />
</p>
### Mobile
<p align="center">
<img width="1284" height="1134" alt="Mobile" src="https://github.com/user-attachments/assets/bdb7a86c-a4e5-483e-a307-e02c0bb34dac" />
</p>
---
## 快速开始
Prerequisites:
- Node.js 18+
- Go 1.23+
- ffmpeg and ffprobe
Start the project:
### Option 1: One-click install script (recommended)
```bash
npm install
./start.sh
```
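Default addresses:
- Frontend: `http://127.0.0.1:9191/`
- Admin: `http://127.0.0.1:9191/admin`
- Backend: `127.0.0.1:9192`
On first open, if no admin account has been set, the page guides you through creating a username and password. They are saved to the local `backend/config.yaml`.
Common commands: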
```bash
./start.sh --status
./start.sh --restart
./start.sh --stop
```
When you need frontend hot reload:
```bash
FRONTEND_MODE=dev ./start.sh --restart
```
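## One-click install on a new server
If you just want to get it running quickly on an Ubuntu / Debian server, use the prebuilt install script. Regular users do not need to install Go or Node.js, or compile anything; the script downloads the prebuilt package for the server's CPU architecture from GitHub Releases, installs the runtime dependencies, writes a systemd service, and starts it.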
```bash
sudo apt update
sudo apt install -y curl ca-certificates
sudo apt update && sudo apt install -y curl ca-certificates
curl -fsSL https://raw.githubusercontent.com/nianzhibai/91/main/install.sh -o install.sh
sudo bash install.sh
```
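After deployment, visit:
- Frontend: `http://<server-ip>:9191/`
- Admin: `http://<server-ip>:9191/admin`
| Address | Description |
|------|------|
| `http://<server-ip>:9191/` | Frontend |
| `http://<server-ip>:9191/admin` | Admin panel |
The first time you open the admin panel you are asked to set an admin username and password. Common maintenance commands:
**Note: if the first visit shows a 502, run `91 restart` to restart the service.**
After installation the `91` management command is registered automatically: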
```bash
sudo bash install.sh status
sudo bash install.sh logs
sudo bash install.sh update
sudo bash install.sh restart
sudo bash install.sh stop
91 # Open the management menu
91 status # Show the running status
91 logs # View logs
91 update # Update to the latest version
91 restart # Restart the service
91 stop # Stop the service
```
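After installation a `91` command is created automatically, similar to OpenList's management command:
> `video-site-91` is an equivalent alias; the two are interchangeable.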
```bash
91 # Open the management menu
91 status # Show status
91 logs # View logs
91 update # Update
91 restart # Restart
91 stop # Stop
```
`video-site-91` is also kept as an equivalent alias.
To change the port:
**Custom port:**
```bash
FRONTEND_PORT=8080 sudo -E bash install.sh
```
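If the server also sits behind a cloud provider security group, remember to open the corresponding port; the default is `9191/tcp`.
**Upgrading from old versions (before v0.0.2):**
If you are a project maintainer, build the release packages in advance:
With old versions of the script, running `91 update` directly may fail; run the following repair commands first: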
```bash
scripts/build-release.sh
curl -fsSL https://raw.githubusercontent.com/nianzhibai/91/main/install.sh -o /tmp/install-91.sh
sudo bash /tmp/install-91.sh update
```
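It generates:
---
- `release/video-site-91-linux-amd64.tar.gz`
- `release/video-site-91-linux-arm64.tar.gz`
### Option 2: Docker Compose deployment
Once these two files are uploaded to a GitHub Release, `install.sh` can download them automatically. The repository also ships a GitHub Actions workflow: pushing a `v*` tag builds and uploads both Release packages automatically.
Source deployment is still available via `deploy.sh`, for when you want to clone, build, and debug directly on the server.
## First use
1. Open `http://127.0.0.1:9191/` and complete the admin account setup first.
2. Go to `/admin` and create a new source under drive management.
3. Fill in the name and the corresponding credentials, then save.
4. Click "Scan all drives" and wait for the videos to be ingested.
5. Return to the frontend and browse via the home page, search, tags, and detail pages.
## Where the data lives
The project keeps its runtime data locally:
- `backend/config.yaml`: local configuration, admin account, and drive credentials.
- `backend/data/video-site.db`: SQLite database.
- `backend/data/previews/`: locally generated covers and teasers.
These files should not be committed to a public repository. `backend/config.example.yaml` in the repository is only a template and should not contain real accounts, cookies, tokens, or passwords.
## More documentation
The root README keeps only the project introduction and the shortest path to getting started. For implementation details, APIs, drive fields, and deployment options, see:
- [backend/README.md](backend/README.md)
- [video-site-implementation-plan.md](video-site-implementation-plan.md)
## Development checks
**1. Prepare a directory**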
```bash
npm run lint
npm test
cd backend && go test ./... -count=1
mkdir -p video-site-91 && cd video-site-91
```
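## Usage boundaries
**2. Create `docker-compose.yml`**
This project is intended for personal private deployment. Only connect content you have the right to access and manage, and comply with the terms of service of the relevant cloud drives and sites as well as local laws and regulations.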
```yaml
services:
video-site-91:
image: ghcr.io/nianzhibai/91:stable
container_name: video-site-91
ports:
- "9191:9191"
volumes:
- ./data:/opt/video-site-91/data
restart: unless-stopped
```
After creating the YAML file, run:
```bash
docker compose pull
docker compose up -d
```
To pin a specific Release version, use an explicit tag, for example:
```yaml
image: ghcr.io/nianzhibai/91:v0.0.6
```
Or fetch the configuration bundled in the repository:
```bash
curl -fsSL https://raw.githubusercontent.com/nianzhibai/91/main/docker-compose.yml -o docker-compose.yml
```
**3. Start**
```bash
docker compose up -d
```
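**Common commands:**
```bash
docker compose logs -f # View logs
docker compose pull # Pull the latest stable release image
docker compose up -d # Update and restart
```
> All configuration, the database, covers, previews, and uploaded files are stored under `./data/`.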
---
## 数据存放位置
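### One-click script deployment
| Path | Contents |
|------|------|
| `/opt/video-site-91/config.yaml` | Configuration file, admin account, drive credentials |
| `/opt/video-site-91/data/video-site.db` | SQLite database |
| `/opt/video-site-91/data/previews/` | Cover images and preview clips |
### Docker Compose deployment
| Path | Contents |
|------|------|
| `./data/config.yaml` | Configuration file, admin account, drive credentials |
| `./data/video-site.db` | SQLite database |
| `./data/previews/` | Cover images and preview clips |
| `./data/uploads/` | Locally uploaded video files |
| `./data/spider91/` | Video files fetched by the 91 crawler |
---
## Usage notes
This project is intended for **personal private deployment**. Only connect content you have the right to access and manage, and comply with the terms of service of the relevant cloud drives and sites as well as local laws and regulations.
> Not for public distribution; personal use only.
---
## PR guidelines
PRs are welcome to help improve this project; please follow these guidelines when submitting one.
1. Keep each PR to a single change. A PR that modifies a large number of features at once is discouraged; one feature change per PR is also easier to merge.
2. PRs that round out the project are easier to merge than PRs that add new features (for example, if you find that uploading crawled videos to a particular cloud drive is not implemented and you need it, you can implement it and submit a PR; thank you for sharing the workload).
3. PRs that add new features are harder to merge, because not everyone needs every feature, and adding features without restraint would make the project too large. That said, if you are confident that your feature or idea is good and will improve the project substantially, your PR is very welcome.
---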
## 许可证
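This project is open source under the [MIT License](LICENSE).
---
## Acknowledgements
- [OpenList](https://github.com/OpenListTeam/OpenList): an excellent open-source project
- [LinuxDo](https://linux.do/): "学 AI 上 L 站" (learn AI on L Station)
- [NodeSeek](https://nodeseek.com/): "MJJ 上 N 站" (MJJ gather on N Station)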
+42 -19
@@ -2,8 +2,8 @@
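Go backend for the video aggregation site. It provides the following:
1. Unified abstraction over multiple cloud drives (Quark / 115 / PikPak / Unicom Wopan / OneDrive)
2. Video metadata catalog (SQLite) + scanning + teaser pre-generation
1. Unified abstraction over multiple cloud drives (Quark / 115 / PikPak / Unicom cloud drive / OneDrive / Google Drive / local storage)
2. Video metadata catalog (SQLite) + scanning + preview-video pre-generation
3. REST API (frontend) + admin panel + direct-link proxy
4. Tag pool, video hiding, per-drive statistics, and showing the source drive type on the detail page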
@@ -19,10 +19,12 @@ internal/
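quark/ Quark (own implementation, modeled on OpenList quark_uc)
p115/ 115 (thin wrapper + SheltonZhu/115driver)
pikpak/ PikPak (own implementation, modeled on OpenList pikpak)
wopan/ Unicom Wopan (thin wrapper + OpenListTeam/wopan-sdk-go)
wopan/ Unicom cloud drive (thin wrapper + OpenListTeam/wopan-sdk-go)
onedrive/ OneDrive (OpenList online token renewal + Microsoft Graph file API)
googledrive/ Google Drive (OpenList online token renewal + Google Drive API; playback goes through the backend proxy)
localstorage/ Local directory scanning (video directories already on the server)
scanner/ Walk directories → write to the database
preview/ ffmpeg extracts covers and generates multi-segment teasers
preview/ ffmpeg extracts covers and generates multi-segment preview videos
proxy/ /p/stream/*, /p/preview/* proxy
auth/ Admin sessions
api/ REST routes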
@@ -79,7 +81,7 @@ go run ./cmd/server 后端 9192
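## Adding a drive
The recommended way is to add drives in the admin panel at `/admin/drives`. Saving mounts the drive immediately and triggers a scan; results can be viewed per drive at `/admin/videos`, 100 per page, and the page also shows the number of teasers generated, pending, and failed for each drive.
The recommended way is to add drives in the admin panel at `/admin/drives`. Saving mounts the drive immediately and triggers a scan; results can be viewed per drive at `/admin/videos`, 100 per page, and the page also shows the number of preview videos generated, pending, and failed for each drive.
You can also call the backend API directly: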
@@ -91,7 +93,6 @@ go run ./cmd/server 后端 9192
"kind": "quark",
"name": "My Quark drive",
"rootId": "0",
"scanRootId": "0",
"credentials": {
"cookie": "Paste the pan.quark.cn cookie copied from the browser DevTools (F12)"
}
@@ -105,9 +106,11 @@ go run ./cmd/server 后端 9192
|--------|---------------------------------------------------------------|
| quark | `cookie` |
| p115 | `cookie` (of the form `UID=...; CID=...; SEID=...; KID=...`) |
| pikpak | `username`, `password`; optional `refresh_token`, `captcha_token`, `device_id`, `platform`, `disable_media_link` |
| pikpak | `username`, `password` (tokens, captcha, and device ID are handled and saved by the server automatically) |
| wopan | `access_token`, `refresh_token`; optional `family_id` |
| onedrive | `refresh_token`; optional `access_token`, `api_url_address`, `region`, `is_sharepoint`, `site_id` |
| onedrive | `refresh_token` |
| googledrive | only `refresh_token` by default; self-hosted OAuth client mode additionally needs `use_online_api=false`, `client_id`, `client_secret` |
| localstorage | `path` (an existing video directory on the server, e.g. `/mnt/videos`) |
### PikPak speed notes
@@ -115,29 +118,49 @@ go run ./cmd/server 后端 9192
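The current server also runs a sing-box TUN transparent proxy, so PikPak's default outbound traffic is taken over by `tun0`; forcing a direct connection through the physical NIC was not faster, though, and the main speed difference comes from how the PikPak link is obtained. The media/cache CDN nodes still fluctuate, and a slow node may turn up occasionally; if playback slows down, fetch a new direct link or remount PikPak and test again.
OneDrive follows OpenList's default approach and calls `https://api.oplist.org/onedrive/renewapi` to refresh the token online, so there is no need to configure an Azure app's `client_id` / `client_secret` / `redirect_uri`. A refresh token obtained through OpenList can be used in this project directly. For regular OneDrive, `rootId` / `scanRootId` can be set to `root`; SharePoint document libraries additionally need `is_sharepoint=true` and `site_id`.
OneDrive follows OpenList's default app approach and calls `https://api.oplist.org/onedrive/renewapi` to refresh the token online, so there is no need to configure an Azure app's `client_id` / `client_secret` / `redirect_uri`. When creating a OneDrive drive in the admin panel, only the `refresh_token` obtained through OpenList is needed; the server mounts the root directory by default and writes the renewed token back automatically.
Google Drive by default refreshes its token through the OpenList online API at `https://api.oplist.org/googleui/renewapi`. When creating a Google Drive in the admin panel, only the `refresh_token` obtained from OpenList's Google Drive flow is needed. If you prefer not to depend on the OpenList online API, turn off "Use OpenList online renewal API" and fill in the `refresh_token`, `client_id`, and `client_secret` issued by the same Google OAuth client; the server then renews directly against the Google OAuth token endpoint. Google Drive download URLs must carry an `Authorization` header, which a browser cannot supply when following a 302, so the backend proxies playback through `/p/stream` and Google Drive is not added to the zero-bandwidth 302 whitelist.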
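The redirect-or-proxy choice described above can be sketched roughly as follows. This is a minimal illustration, not the project's proxy code: `redirectKinds`, `useRedirect`, and `serveStream` are hypothetical names, and the kind list is only inferred from the drives the README describes as 302-capable.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// redirectKinds is an assumed whitelist of drive kinds whose links a browser
// can follow directly; the project's real list may differ.
var redirectKinds = map[string]bool{
	"p115": true, "pikpak": true, "p123": true, "wopan": true, "onedrive": true,
}

// useRedirect reports whether the browser can fetch the link on its own.
// A link that needs an Authorization header can never be redirected.
func useRedirect(kind, authHeader string) bool {
	return redirectKinds[kind] && authHeader == ""
}

// serveStream answers with a 302 (no server bandwidth) or relays the bytes.
func serveStream(w http.ResponseWriter, r *http.Request, kind, linkURL, authHeader string) {
	if useRedirect(kind, authHeader) {
		http.Redirect(w, r, linkURL, http.StatusFound)
		return
	}
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, linkURL, nil)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	if rng := r.Header.Get("Range"); rng != "" {
		req.Header.Set("Range", rng) // keep seeking working through the proxy
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	for _, k := range []string{"Content-Type", "Content-Length", "Content-Range", "Accept-Ranges"} {
		if v := resp.Header.Get(k); v != "" {
			w.Header().Set(k, v)
		}
	}
	w.WriteHeader(resp.StatusCode)
	_, _ = io.Copy(w, resp.Body)
}

func main() {
	fmt.Println(useRedirect("onedrive", ""))
	fmt.Println(useRedirect("googledrive", "Bearer token"))
}
```

Forwarding the `Range` header matters in the proxied branch: without it the player could not seek, since every request would restart from byte zero.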
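## File name conventions
The scanner parses file names in the following order:
The scanner parses file names in the following order to extract the title and author:
1. `[tag1,tag2] Title - Author.mp4`
2. `[tag1,tag2] Title.mp4`
1. `[prefix] Title - Author.mp4`
2. `[prefix] Title.mp4`
3. `Title - Author.mp4`
4. `Title.mp4`
Tag separators may be `, ` or a space. Parsed tags are matched against the system tag pool, and common catalog-number (番号) noise is merged into system tags such as `AV`, so that each catalog number does not become its own tag. Parsed results can be overridden in the admin panel.
The leading `[prefix]` is only stripped from the title; it is not split on separators and stored as arbitrary tags. Video tags come from three kinds of rules:
1. The file name, author, or directory name matches the name or an alias of a system tag or an existing tag.
2. Qualifying directory names automatically create a `collection` tag, which is applied to the videos in that directory.
3. Common catalog-number (番号) noise is merged into `AV`, so that each catalog number does not become its own tag.
The built-in system tags are currently: `后入`, `奶子`, `口交`, ``, `人妻`, `女大`, `AV`. Parsed results can be overridden in the admin panel; once saved manually, the video is marked as manually tagged and later scans no longer overwrite it automatically.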
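The four file name patterns above can be handled by one small function. The sketch below is an illustration under assumptions, not the scanner's actual code: `parseFileName` and `parsedName` are hypothetical names, and splitting on the last ` - ` is a guess at how the title and author are separated.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// parsedName holds the fields recovered from a file name.
type parsedName struct {
	Title  string
	Author string
}

// parseFileName strips the extension, drops a leading "[prefix]" block,
// and splits "Title - Author" on the last " - " separator.
func parseFileName(name string) parsedName {
	base := strings.TrimSpace(strings.TrimSuffix(name, filepath.Ext(name)))
	if strings.HasPrefix(base, "[") {
		if end := strings.Index(base, "]"); end >= 0 {
			base = strings.TrimSpace(base[end+1:]) // the prefix is discarded, not turned into tags
		}
	}
	if i := strings.LastIndex(base, " - "); i > 0 {
		return parsedName{
			Title:  strings.TrimSpace(base[:i]),
			Author: strings.TrimSpace(base[i+3:]),
		}
	}
	return parsedName{Title: base}
}

func main() {
	names := []string{
		"[prefix] Title - Author.mp4",
		"[prefix] Title.mp4",
		"Title - Author.mp4",
		"Title.mp4",
	}
	for _, n := range names {
		fmt.Printf("%q -> %+v\n", n, parseFileName(n))
	}
}
```

Because the prefix is removed before the author split, all four patterns share one code path and the list order only documents precedence.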
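## Video deduplication
The project deduplicates at three layers:
1. The same file on the same drive gets a stable video ID from `(drive_id, file_id)`, so repeated scans only update the same row.
2. During scanning, deduplication prefers the drive-side `content_hash`; without a hash it falls back to `file_name + size_bytes`.
3. After a scan, a local upload, or a drive mount at service startup, a background fingerprint worker asynchronously reads a few small Range slices of the video and generates `sampled_sha256`. Frontend lists, the home page, search, and recommendations show only the earliest-ingested canonical video for each `size_bytes + sampled_sha256`.
`sampled_sha256` is file-level deduplication: it is suited to recognizing the same video file copied to different drives such as 115 / PikPak / OneDrive / Google Drive. It does not delete any drive file, and it is not meant to recognize same-source videos that were transcoded, cropped, or watermarked.
Covers and preview videos are still generated first, without waiting for fingerprinting. The nightly pipeline ends with a duplicate-asset cleanup: for non-canonical videos that match on `size_bytes + sampled_sha256`, only the locally generated duplicate cover and preview video are deleted, and the corresponding fields are reset to `pending`. The original drive file and the video metadata record are not deleted; if the canonical video is removed later, these duplicates re-enter the generation queue.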
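A sampled fingerprint of this kind can be sketched as below. The README does not say which ranges the worker reads, so the three windows (head, middle, tail), the window size, and the names `sampledSHA256` and `fingerprint` are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// sampledSHA256 hashes three windows (head, middle, tail) instead of the
// whole file. Window count and size are illustrative assumptions.
func sampledSHA256(src io.ReaderAt, size, window int64) (string, error) {
	h := sha256.New()
	offsets := []int64{0, (size - window) / 2, size - window}
	if size <= 3*window {
		offsets, window = []int64{0}, size // small file: hash all of it
	}
	for _, off := range offsets {
		// Against a cloud drive, each section would be one HTTP Range request.
		if _, err := io.Copy(h, io.NewSectionReader(src, off, window)); err != nil {
			return "", fmt.Errorf("read %d bytes at %d: %w", window, off, err)
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// fingerprint pairs the size with the sample hash, mirroring the
// size_bytes + sampled_sha256 key.
func fingerprint(data []byte, window int64) (string, error) {
	sum, err := sampledSHA256(bytes.NewReader(data), int64(len(data)), window)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%d:%s", len(data), sum), nil
}

func main() {
	a := bytes.Repeat([]byte("video"), 1000)
	b := append([]byte(nil), a...) // same bytes, as if copied to another drive
	fa, _ := fingerprint(a, 256)
	fb, _ := fingerprint(b, 256)
	fmt.Println(fa == fb)
}
```

Two files that differ only outside the sampled windows would collide in this sketch, which is why the size is part of the key and why the scheme targets byte-identical copies rather than re-encoded ones.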
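## Admin capabilities
- `/admin/drives`: add, edit, and delete drives; trigger scans.
- `/admin/videos`: filter videos by drive, paginated at 100 per page, view per-drive teaser statistics, edit title/author/category/tags, and regenerate teasers for one video or all.
- `/admin/tags`: add tags and automatically match existing videos using the built-in rules.
- `/admin/videos`: filter videos by drive, paginated at 100 per page, view per-drive preview-video statistics, edit title/author/category/tags, and regenerate preview videos for one video or all.
- `/admin/tags`: add tags and automatically match existing videos using the built-in rules; deleting a non-system tag also removes it from all videos.
- The player page's video info shows the source drive type and also offers "Don't show again" (不再展示), which marks the video as globally hidden. Hidden videos no longer appear in the home page, lists, search, related recommendations, or the detail API. There is currently no restore entry in the admin panel; to restore a video, set its `hidden` field back to `0` in the database.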
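## Teaser generation
## Preview video generation
When the scanner finds a new video it pushes `(driveID, videoID)` onto the worker queue. The worker first probes the duration with `ffprobe`, then uses `ffmpeg` to extract the cover and generate a silent teaser:
When the scanner finds a new video it pushes `(driveID, videoID)` onto the worker queue. The worker first probes the duration with `ffprobe`, then uses `ffmpeg` to extract the cover and generate a silent preview video: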
```
ffmpeg -ss <start> -headers "UA/Cookie/Referer" -i <direct-link> \
@@ -145,9 +168,9 @@ ffmpeg -ss <起点> -headers "UA/Cookie/Referer" -i <直链> \
-movflags +faststart -y <local>.mp4
```
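The current strategy uses fixed 3-second segments: at most 3 segments for videos under 30 seconds, exactly 4 for 30 seconds and longer; for long videos the segments are taken evenly within the 20% to 80% range. Generated teasers and covers are stored only in the local `data/previews/` and are never written back to the drive; `preview_file_id` in old data is ignored.
The current strategy uses fixed 3-second segments: at most 3 segments for videos under 30 seconds, exactly 4 for 30 seconds and longer; for long videos the segments are taken evenly within the 20% to 80% range. Generated preview videos and covers are stored only in the local `data/previews/` and are never written back to the drive; `preview_file_id` in old data is ignored.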
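The segment rule can be turned into start offsets along these lines. This is a sketch under assumptions: `previewStarts` is a hypothetical name, and the placement for videos under 30 seconds (spread over the whole file) is a guess, since the README only specifies the 20% to 80% window for long videos.

```go
package main

import "fmt"

const segmentSeconds = 3.0 // fixed clip length stated in the README

// previewStarts returns the start offsets (in seconds) of the clips to cut.
// Placement for videos under 30 s is an assumption: clips are spread over the
// whole file, because a 20%-80% window would be too narrow to avoid overlap.
func previewStarts(duration float64) []float64 {
	if duration <= 0 {
		return nil
	}
	count, lo, hi := 4, duration/5, duration*4/5
	if duration < 30 {
		count = int(duration / segmentSeconds)
		if count > 3 {
			count = 3
		}
		if count < 1 {
			count = 1
		}
		lo, hi = 0, duration
	}
	span := hi - lo - segmentSeconds // room left so the last clip ends inside the window
	if span < 0 {
		span = 0
	}
	starts := make([]float64, 0, count)
	for i := 0; i < count; i++ {
		offset := 0.0
		if count > 1 {
			offset = span * float64(i) / float64(count-1)
		}
		starts = append(starts, lo+offset)
	}
	return starts
}

func main() {
	for _, d := range []float64{2, 9, 30, 100} {
		fmt.Println(d, previewStarts(d))
	}
}
```

Each returned offset would become one `-ss <start>` invocation of three seconds, with the resulting clips concatenated into the final preview file.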
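On service startup or drive remount, if the teaser switch is on, the backend re-enqueues historical `pending` tasks so they do not stay "pending" for long after a restart. Generating teasers from OneDrive direct links can trigger Microsoft 429 throttling; the backend recognizes such errors and puts the current drive into a cooldown, keeping the task `pending` so that continued requests do not trigger harsher throttling.
On service startup or drive remount, if the preview-video switch is on, the backend re-enqueues historical `pending` tasks so they do not stay "pending" for long after a restart. OneDrive scans and direct-link generation of preview videos / covers can trigger Microsoft Graph 429, `TooManyRequests`, `activityLimitReached`, or throttled text; Google Drive may return 429, `usageLimits`, `userRateLimitExceeded`, `downloadQuotaExceeded`, and similar limit markers. The backend recognizes these errors and puts the current drive into a cooldown, keeping the task `pending` so that continued requests do not trigger harsher throttling. During scanning it waits for `Retry-After` or the default cooldown and then continues with the current directory.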
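Choosing the cooldown from a throttled response might look like the sketch below. It is not the project's implementation: `cooldown` and `defaultCooldown` are hypothetical, the 30-second fallback is an assumed value, and only the 429 status is checked (body markers such as `activityLimitReached` would need additional matching).

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// defaultCooldown is an assumed fallback; the project's real value is not
// given in the README.
const defaultCooldown = 30 * time.Second

// cooldown reports how long to pause a drive after a response, and whether
// the response was a rate limit at all.
func cooldown(status int, retryAfter string) (time.Duration, bool) {
	if status != http.StatusTooManyRequests {
		return 0, false
	}
	value := strings.TrimSpace(retryAfter)
	// Retry-After is either a number of seconds...
	if secs, err := strconv.Atoi(value); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second, true
	}
	// ...or an HTTP date.
	if at, err := http.ParseTime(value); err == nil {
		if wait := time.Until(at); wait > 0 {
			return wait, true
		}
	}
	return defaultCooldown, true
}

func main() {
	wait, limited := cooldown(http.StatusTooManyRequests, "120")
	fmt.Println(wait, limited)
}
```

A caller would keep the task in `pending`, record `time.Now().Add(wait)` for the drive, and skip that drive until the deadline passes.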
The frontend card's `previewSrc` always points to `/p/preview/<videoID>`, and the backend reads only from the local `preview_local` file.
+37
@@ -67,3 +67,40 @@ func TestFrontendHandlerDoesNotSwallowBackendRoutes(t *testing.T) {
}
}
}
func TestResolveFrontendDirFallsBackToParentDist(t *testing.T) {
workspace := t.TempDir()
backendDir := filepath.Join(workspace, "backend")
distDir := filepath.Join(workspace, "dist")
if err := os.MkdirAll(backendDir, 0o755); err != nil {
t.Fatalf("mkdir backend: %v", err)
}
if err := os.MkdirAll(distDir, 0o755); err != nil {
t.Fatalf("mkdir dist: %v", err)
}
if err := os.WriteFile(filepath.Join(distDir, "index.html"), []byte("<html>app</html>"), 0o644); err != nil {
t.Fatalf("write index: %v", err)
}
oldWD, err := os.Getwd()
if err != nil {
t.Fatalf("getwd: %v", err)
}
t.Cleanup(func() {
if err := os.Chdir(oldWD); err != nil {
t.Fatalf("restore wd: %v", err)
}
})
t.Setenv("VIDEO_FRONTEND_DIR", "")
if err := os.Chdir(backendDir); err != nil {
t.Fatalf("chdir backend: %v", err)
}
got, ok := resolveFrontendDir()
if !ok {
t.Fatal("resolveFrontendDir ok = false, want true")
}
if got != "../dist" {
t.Fatalf("frontend dir = %q, want ../dist", got)
}
}
+1993 -334
File diff suppressed because it is too large.
+69
@@ -1,9 +1,13 @@
package main
import (
"context"
"io"
"testing"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
"github.com/video-site/backend/internal/proxy"
)
func TestSpider91IntCredFallbacks(t *testing.T) {
@@ -30,3 +34,68 @@ func TestSpider91IntCredFallbacks(t *testing.T) {
})
}
}
func TestSpider91UploadDriveIDDoesNotAutoSelectTarget(t *testing.T) {
reg := proxy.NewRegistry()
reg.Set("p115-one", &spider91UploadTargetFakeDrive{id: "p115-one", kind: "p115"})
reg.Set("p123-one", &spider91UploadTargetFakeDrive{id: "p123-one", kind: "p123"})
reg.Set("onedrive-one", &spider91UploadTargetFakeDrive{id: "onedrive-one", kind: "onedrive"})
reg.Set("wopan-one", &spider91UploadTargetFakeDrive{id: "wopan-one", kind: "wopan"})
app := &App{registry: reg}
if got := app.Spider91UploadDriveID(); got != "" {
t.Fatalf("empty upload target selected %q, want local-only empty target", got)
}
app.spider91UploadDriveID = "p115-one"
if got := app.Spider91UploadDriveID(); got != "p115-one" {
t.Fatalf("explicit upload target = %q, want p115-one", got)
}
app.spider91UploadDriveID = "p123-one"
if got := app.Spider91UploadDriveID(); got != "p123-one" {
t.Fatalf("explicit p123 upload target = %q, want p123-one", got)
}
app.spider91UploadDriveID = "onedrive-one"
if got := app.Spider91UploadDriveID(); got != "onedrive-one" {
t.Fatalf("explicit onedrive upload target = %q, want onedrive-one", got)
}
app.spider91UploadDriveID = "wopan-one"
if got := app.Spider91UploadDriveID(); got != "wopan-one" {
t.Fatalf("explicit wopan upload target = %q, want wopan-one", got)
}
app.spider91UploadDriveID = "missing"
if got := app.Spider91UploadDriveID(); got != "" {
t.Fatalf("missing upload target = %q, want empty", got)
}
}
type spider91UploadTargetFakeDrive struct {
id string
kind string
}
func (d *spider91UploadTargetFakeDrive) Kind() string { return d.kind }
func (d *spider91UploadTargetFakeDrive) ID() string { return d.id }
func (d *spider91UploadTargetFakeDrive) Init(context.Context) error {
return nil
}
func (d *spider91UploadTargetFakeDrive) List(context.Context, string) ([]drives.Entry, error) {
return nil, nil
}
func (d *spider91UploadTargetFakeDrive) Stat(context.Context, string) (*drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *spider91UploadTargetFakeDrive) StreamURL(context.Context, string) (*drives.StreamLink, error) {
return nil, drives.ErrNotSupported
}
func (d *spider91UploadTargetFakeDrive) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *spider91UploadTargetFakeDrive) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *spider91UploadTargetFakeDrive) RootID() string { return "root" }
File diff suppressed because it is too large.
+29 -13
@@ -22,7 +22,7 @@ server:
storage:
# SQLite database file path
db_path: "./data/video-site.db"
# Local teaser and cover directory
# Local preview video and cover directory
local_preview_dir: "./data/previews"
scanner:
@@ -33,33 +33,30 @@ scanner:
# Maximum directory recursion depth per drive in a single scan
max_depth: 5
# File extensions to scan
video_extensions: [".mp4", ".mkv", ".mov", ".webm", ".avi"]
video_extensions: [".mp4", ".mkv", ".mov", ".webm", ".avi", ".strm"]
nightly:
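# Hour (0-23) at which the nightly pipeline fires; default 1 means 01:00 every day. Flow:
# Phase 1 scan all non-spider91 / non-localupload drives → detect additions / deletions
# → enqueue covers and teasers → wait for all queues to go idle
# Phase 2 spider91 crawler (if configured) → enqueue teasers → wait for queues to go idle
# Phase 3 spider91 → cloud-drive migration (one-off sweep)
# Hour (0-23) at which the nightly pipeline fires; default 1 means 01:00 every day.
# At run time it orchestrates scanning, media asset generation, and the follow-up cleanup tasks together.
cron_hour: 1
# Upper bound on the total time of one pipeline run (soft timeout); once exceeded, the current phase finishes and later phases are not started.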
max_duration: 6h
preview:
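# Whether to use ffmpeg frame extraction to generate teasers
# Whether to use ffmpeg frame extraction to generate preview videos
enabled: true
# ffmpeg / ffprobe executable name or absolute path
ffmpeg_path: "ffmpeg"
ffprobe_path: "ffprobe"
# Teaser segment length (seconds); at generation time each segment is at most 3 seconds
# Preview video segment length (seconds); at generation time each segment is at most 3 seconds
duration_seconds: 3
# Kept for old configs; currently at most 3 segments under 30 seconds and exactly 4 for 30 seconds and longer
segments: 3
# Teaser video width
# Preview video width
width: 480
# Drive list. Once live, add drives through the admin panel; this file can be left empty.
# Supported kinds: quark / p115 / pikpak / wopan / onedrive.
# Supported kinds: quark / p115 / p123 / pikpak / wopan / onedrive / googledrive / localstorage
# OneDrive example: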
# - id: "my-onedrive"
# kind: "onedrive"
@@ -67,6 +64,25 @@ preview:
# root_id: "root"
# params:
# refresh_token: "..."
# api_url_address: "https://api.oplist.org/onedrive/renewapi"
# region: "global"
# Google Drive example:
# - id: "my-google"
# kind: "googledrive"
# name: "My Google Drive"
# root_id: "root"
# params:
# refresh_token: "..."
# # use_online_api defaults to true, which uses the OpenList online renewal API.
# # To use a Google OAuth client you created yourself, uncomment the three lines below:
# # use_online_api: "false"
# # client_id: "..."
# # client_secret: "..."
# Local storage example:
# - id: "local-media"
# kind: "localstorage"
# name: "Local video directory"
# root_id: "/"
# params:
# # With Docker deployment, both this path and the absolute paths inside .strm files must be container paths.
# # For example, if host /mnt/videos is mounted at /media, enter /media.
# path: "/mnt/videos"
drives: []
+4 -4
@@ -7,15 +7,18 @@ toolchain go1.23.4
require (
github.com/OpenListTeam/wopan-sdk-go v0.2.0
github.com/SheltonZhu/115driver v1.3.2
github.com/aliyun/aliyun-oss-go-sdk v3.0.2+incompatible
github.com/go-chi/chi/v5 v5.1.0
github.com/go-resty/resty/v2 v2.14.0
github.com/skip2/go-qrcode v0.0.0-20200617195104-da1b6568686e
golang.org/x/net v0.27.0
golang.org/x/sys v0.30.0
gopkg.in/yaml.v3 v3.0.1
modernc.org/sqlite v1.33.1
)
require (
github.com/aead/ecdh v0.2.0 // indirect
github.com/aliyun/aliyun-oss-go-sdk v3.0.2+incompatible // indirect
github.com/andreburgaud/crypt2go v1.1.0 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/google/uuid v1.6.0 // indirect
@@ -26,10 +29,7 @@ require (
github.com/pierrec/lz4/v4 v4.1.17 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
github.com/skip2/go-qrcode v0.0.0-20200617195104-da1b6568686e // indirect
golang.org/x/crypto v0.25.0 // indirect
golang.org/x/net v0.27.0 // indirect
golang.org/x/sys v0.30.0 // indirect
golang.org/x/time v0.8.0 // indirect
modernc.org/gc/v3 v3.0.0-20240107210532-573471604cb6 // indirect
modernc.org/libc v1.55.3 // indirect
File diff suppressed because it is too large.
File diff suppressed because it is too large.
+376 -118
@@ -11,18 +11,22 @@ import (
"io"
"math/rand/v2"
"net/http"
"net/url"
"os"
"path/filepath"
"strconv"
"strings"
"sync"
"time"
"github.com/go-chi/chi/v5"
"github.com/video-site/backend/internal/auth"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives/localstorage"
"github.com/video-site/backend/internal/drives/localupload"
"github.com/video-site/backend/internal/drives/spider91"
"github.com/video-site/backend/internal/mediaasset"
"github.com/video-site/backend/internal/proxy"
)
@@ -39,7 +43,7 @@ var allowedUploadExtensions = map[string]struct{}{
var allowedUploadTags = map[string]struct{}{
"奶子": {},
"臀": {},
"口": {},
"口": {},
"女大": {},
"人妻": {},
"AV": {},
@@ -52,6 +56,10 @@ type Server struct {
UploadDir string
OnVideoUploaded func(*catalog.Video)
tagCacheMu sync.Mutex
tagCacheUntil time.Time
tagCache []TagDTO
// GetTheme returns the currently active theme ("dark" | "pink"). Used by the public
// /api/settings/theme endpoint; no login required. Returns "dark" when nothing is injected.
GetTheme func() string
@@ -85,6 +93,12 @@ type VideoDTO struct {
Category string `json:"category,omitempty"`
}
type TagDTO struct {
ID string `json:"id"`
Label string `json:"label"`
Count int `json:"count"`
}
type VideoDetailDTO struct {
VideoDTO
VideoSrc string `json:"videoSrc"`
@@ -133,7 +147,7 @@ func (s *Server) RegisterRoutes(r chi.Router, a *auth.Authenticator) {
r.Post("/api/shorts/next", s.handleShortsNext)
// Proxy routes also require authentication, so they cannot be used to bypass it.
r.Get("/p/stream/{driveID}/{fileID}", s.handleStream)
r.Get("/p/stream/{driveID}/*", s.handleStream)
r.Get("/p/upload/{videoID}", s.handleUploadedVideo)
r.Get("/p/spider91/{videoID}", s.handleSpider91Video)
r.Get("/p/preview/{videoID}", s.handlePreview)
@@ -155,26 +169,117 @@ func (s *Server) handleGetTheme(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleHome(w http.ResponseWriter, r *http.Request) {
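// Fetch a batch of candidates (newest first, covering the latest 200), then shuffle and take the first homePageSize.
// If the library holds fewer than 200, the actual count is returned and then trimmed to homePageSize.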
const candidatePool = 200
items, _, err := s.Catalog.ListVideos(r.Context(), catalog.ListParams{
Sort: "latest", Page: 1, PageSize: candidatePool,
})
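// The home page prefers random picks from all videos that already have a cover, so the same small set of recent candidates does not keep reappearing.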
excludeIDs := parseVideoIDQuery(r, "exclude", 120)
items, err := s.Catalog.RandomVideosWithReadyThumbnailsExcluding(r.Context(), excludeIDs, homePageSize)
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
return
}
rand.Shuffle(len(items), func(i, j int) {
items[i], items[j] = items[j], items[i]
})
if len(items) > homePageSize {
items = items[:homePageSize]
if len(items) < homePageSize {
fallbackExclude := append([]string{}, excludeIDs...)
for _, item := range items {
if item != nil {
fallbackExclude = append(fallbackExclude, item.ID)
}
}
fallback, err := s.Catalog.RandomVideosExcluding(r.Context(), fallbackExclude, homePageSize-len(items))
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
return
}
items = appendUniqueVideos(items, fallback, homePageSize)
}
if len(items) < homePageSize && len(excludeIDs) > 0 {
// The browser keeps a recent-video exclude list so normal refreshes do not
// repeat too quickly. On small libraries that list can cover every visible
// video; when that happens, start a new random round instead of returning
// an empty home section.
roundExclude := videoIDs(items)
fallback, err := s.Catalog.RandomVideosWithReadyThumbnailsExcluding(r.Context(), roundExclude, homePageSize-len(items))
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
return
}
items = appendUniqueVideos(items, fallback, homePageSize)
}
if len(items) < homePageSize && len(excludeIDs) > 0 {
fallback, err := s.Catalog.RandomVideosExcluding(r.Context(), videoIDs(items), homePageSize-len(items))
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
return
}
items = appendUniqueVideos(items, fallback, homePageSize)
}
w.Header().Set("Cache-Control", "no-store")
writeJSON(w, http.StatusOK, mapVideos(items))
}
func parseVideoIDQuery(r *http.Request, key string, limit int) []string {
if r == nil {
return nil
}
values := r.URL.Query()[key]
if len(values) == 0 {
return nil
}
seen := map[string]struct{}{}
out := make([]string, 0, len(values))
for _, value := range values {
for _, id := range strings.Split(value, ",") {
id = strings.TrimSpace(id)
if id == "" {
continue
}
if _, ok := seen[id]; ok {
continue
}
seen[id] = struct{}{}
out = append(out, id)
if limit > 0 && len(out) >= limit {
return out
}
}
}
return out
}
func appendUniqueVideos(dst []*catalog.Video, candidates []*catalog.Video, limit int) []*catalog.Video {
if len(dst) >= limit {
return dst[:limit]
}
seen := make(map[string]struct{}, len(dst))
for _, v := range dst {
if v != nil {
seen[v.ID] = struct{}{}
}
}
for _, v := range candidates {
if v == nil {
continue
}
if _, ok := seen[v.ID]; ok {
continue
}
dst = append(dst, v)
seen[v.ID] = struct{}{}
if len(dst) >= limit {
return dst
}
}
return dst
}
func videoIDs(items []*catalog.Video) []string {
out := make([]string, 0, len(items))
for _, item := range items {
if item != nil && item.ID != "" {
out = append(out, item.ID)
}
}
return out
}
func (s *Server) handleList(w http.ResponseWriter, r *http.Request) {
q := r.URL.Query()
page, _ := strconv.Atoi(q.Get("page"))
@@ -182,13 +287,18 @@ func (s *Server) handleList(w http.ResponseWriter, r *http.Request) {
if size <= 0 {
size = 24
}
sort := q.Get("sort")
params := catalog.ListParams{
Keyword: q.Get("q"),
Tag: q.Get("tag"),
Category: q.Get("cat"),
Sort: q.Get("sort"),
Page: page,
PageSize: size,
Keyword: q.Get("q"),
Tag: q.Get("tag"),
Category: q.Get("cat"),
Sort: sort,
Page: page,
PageSize: size,
SkipTotal: strings.EqualFold(q.Get("count"), "false"),
}
if sort == "" || sort == "latest" {
params.PreferReadyThumbnails = true
}
items, total, err := s.Catalog.ListVideos(r.Context(), params)
if err != nil {
@@ -204,7 +314,7 @@ func (s *Server) handleList(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleVideoDetail(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
id := routeParam(r, "id")
v, err := s.Catalog.GetVideo(r.Context(), id)
if err != nil {
writeErr(w, http.StatusNotFound, err)
@@ -214,6 +324,15 @@ func (s *Server) handleVideoDetail(w http.ResponseWriter, r *http.Request) {
writeErr(w, http.StatusNotFound, sql.ErrNoRows)
return
}
if v.DriveID != localUploadDriveID {
if _, err := s.Catalog.GetDrive(r.Context(), v.DriveID); err != nil {
drives, listErr := s.Catalog.ListDrives(r.Context())
if listErr != nil || len(drives) > 0 {
writeErr(w, http.StatusNotFound, sql.ErrNoRows)
return
}
}
}
related := s.pickRelatedVideos(r.Context(), v, 6)
dto := mapVideo(v)
if d, err := s.Catalog.GetDrive(r.Context(), v.DriveID); err == nil {
@@ -225,7 +344,7 @@ func (s *Server) handleVideoDetail(w http.ResponseWriter, r *http.Request) {
VideoSrc: s.videoSource(v),
Poster: thumbnailURL(v),
Description: v.Description,
EmbedURL: fmt.Sprintf(`<iframe src="/embed/%s" width="640" height="360" frameborder="0" allowfullscreen></iframe>`, v.ID),
EmbedURL: fmt.Sprintf(`<iframe src="/embed/%s" width="640" height="360" frameborder="0" allowfullscreen></iframe>`, pathSegment(v.ID)),
AuthorProfile: AuthorProfile{
ID: "author-" + v.Author,
Name: v.Author,
@@ -241,7 +360,8 @@ func (s *Server) handleVideoDetail(w http.ResponseWriter, r *http.Request) {
}
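// pickRelatedVideos picks total recommended videos.
// Half (rounded up) come from same-tag matches and the rest are filled randomly from the whole library; no duplicates, and the current video is never included.
// Half come from same-tag matches and the rest are filled randomly from the whole library; both stages prefer videos that already have a cover
// and fall back to candidates without a generated cover only when short. No duplicates, and the current video is never included.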
func (s *Server) pickRelatedVideos(ctx context.Context, current *catalog.Video, total int) []*catalog.Video {
if total <= 0 || current == nil {
return nil
@@ -254,93 +374,163 @@ func (s *Server) pickRelatedVideos(ctx context.Context, current *catalog.Video,
picked := make([]*catalog.Video, 0, total)
seen := map[string]struct{}{current.ID: {}}
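// 1) Same-tag candidates: fetch a batch per tag, merge and dedupe, shuffle, then take tagQuota.
// 1) Same-tag candidates: take candidates with a cover first; if short, fill from all candidates.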
if tagQuota > 0 && len(current.Tags) > 0 {
var tagPool []*catalog.Video
for _, tag := range current.Tags {
if tag == "" {
continue
}
items, _, err := s.Catalog.ListVideos(ctx, catalog.ListParams{
Tag: tag, Sort: "latest", Page: 1, PageSize: 30,
})
if err != nil {
continue
}
for _, v := range items {
if v == nil {
continue
}
if _, ok := seen[v.ID]; ok {
continue
}
seen[v.ID] = struct{}{}
tagPool = append(tagPool, v)
}
picked = appendRandomRelated(
picked,
s.relatedTagPool(ctx, current.Tags, seen, true),
tagQuota,
seen,
)
if len(picked) < tagQuota {
picked = appendRandomRelated(
picked,
s.relatedTagPool(ctx, current.Tags, seen, false),
tagQuota,
seen,
)
}
rand.Shuffle(len(tagPool), func(i, j int) {
tagPool[i], tagPool[j] = tagPool[j], tagPool[i]
})
if len(tagPool) > tagQuota {
tagPool = tagPool[:tagQuota]
}
picked = append(picked, tagPool...)
}
// 2) Random fill: fetch a batch from the whole library (skipping already-picked IDs), shuffle, then take the remaining slots.
remaining := total - len(picked)
if remaining > 0 {
items, _, err := s.Catalog.ListVideos(ctx, catalog.ListParams{
Sort: "latest", Page: 1, PageSize: 200,
})
if err == nil {
var randomPool []*catalog.Video
for _, v := range items {
if v == nil {
continue
}
if _, ok := seen[v.ID]; ok {
continue
}
seen[v.ID] = struct{}{}
randomPool = append(randomPool, v)
}
rand.Shuffle(len(randomPool), func(i, j int) {
randomPool[i], randomPool[j] = randomPool[j], randomPool[i]
})
if len(randomPool) > remaining {
randomPool = randomPool[:remaining]
}
picked = append(picked, randomPool...)
}
// 2) Random fill: likewise prefer library-wide candidates with a cover, then fall back if short.
if len(picked) < total {
picked = appendRandomRelated(
picked,
s.relatedListPool(ctx, seen, true, 200),
total,
seen,
)
}
if len(picked) < total {
picked = appendRandomRelated(
picked,
s.relatedListPool(ctx, seen, false, 200),
total,
seen,
)
}
return picked
}
func (s *Server) relatedTagPool(ctx context.Context, tags []string, seen map[string]struct{}, readyOnly bool) []*catalog.Video {
var pool []*catalog.Video
poolSeen := make(map[string]struct{})
for _, tag := range tags {
if tag == "" {
continue
}
items, _, err := s.Catalog.ListVideos(ctx, catalog.ListParams{
Tag: tag,
Sort: "latest",
Page: 1,
PageSize: 30,
ThumbnailReadyOnly: readyOnly,
PreferReadyThumbnails: !readyOnly,
})
if err != nil {
continue
}
for _, v := range items {
if v == nil {
continue
}
if _, ok := seen[v.ID]; ok {
continue
}
if _, ok := poolSeen[v.ID]; ok {
continue
}
poolSeen[v.ID] = struct{}{}
pool = append(pool, v)
}
}
return pool
}
func (s *Server) relatedListPool(ctx context.Context, seen map[string]struct{}, readyOnly bool, pageSize int) []*catalog.Video {
items, _, err := s.Catalog.ListVideos(ctx, catalog.ListParams{
Sort: "latest",
Page: 1,
PageSize: pageSize,
ThumbnailReadyOnly: readyOnly,
PreferReadyThumbnails: !readyOnly,
})
if err != nil {
return nil
}
pool := make([]*catalog.Video, 0, len(items))
for _, v := range items {
if v == nil {
continue
}
if _, ok := seen[v.ID]; ok {
continue
}
pool = append(pool, v)
}
return pool
}
func appendRandomRelated(picked []*catalog.Video, pool []*catalog.Video, targetLen int, seen map[string]struct{}) []*catalog.Video {
if len(picked) >= targetLen || len(pool) == 0 {
return picked
}
rand.Shuffle(len(pool), func(i, j int) {
pool[i], pool[j] = pool[j], pool[i]
})
for _, v := range pool {
if len(picked) >= targetLen {
break
}
if v == nil {
continue
}
if _, ok := seen[v.ID]; ok {
continue
}
seen[v.ID] = struct{}{}
picked = append(picked, v)
}
return picked
}
func (s *Server) handleTags(w http.ResponseWriter, r *http.Request) {
now := time.Now()
s.tagCacheMu.Lock()
if s.tagCache != nil && now.Before(s.tagCacheUntil) {
out := append([]TagDTO(nil), s.tagCache...)
s.tagCacheMu.Unlock()
w.Header().Set("Cache-Control", "private, max-age=15")
writeJSON(w, http.StatusOK, out)
return
}
s.tagCacheMu.Unlock()
stats, err := s.Catalog.ListTags(r.Context())
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
return
}
type tag struct {
ID string `json:"id"`
Label string `json:"label"`
Count int `json:"count"`
}
out := make([]tag, 0, len(stats))
out := make([]TagDTO, 0, len(stats))
for _, stat := range stats {
out = append(out, tag{ID: stat.Label, Label: stat.Label, Count: stat.Count})
out = append(out, TagDTO{ID: stat.Label, Label: stat.Label, Count: stat.Count})
}
s.tagCacheMu.Lock()
s.tagCache = append([]TagDTO(nil), out...)
s.tagCacheUntil = now.Add(30 * time.Second)
s.tagCacheMu.Unlock()
w.Header().Set("Cache-Control", "private, max-age=15")
writeJSON(w, http.StatusOK, out)
}
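// shortsNextReq: the client sends the list of video IDs already seen in the current round,
// and the server returns count random videos that are not in that list.
// shortsNextReq: the client sends the list of video IDs already seen in the current round.
// PreferredFromVideoID is the most recently liked video on the shorts page; it is used to favor videos with similar tags.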
type shortsNextReq struct {
SeenIDs []string `json:"seenIds"`
Count int `json:"count"`
SeenIDs []string `json:"seenIds"`
Count int `json:"count"`
PreferredFromVideoID string `json:"preferredFromVideoId"`
}
// ShortsItemDTO is the trimmed-down structure for one item in the shorts feed. Compared with VideoDTO it adds videoSrc / poster
@@ -386,7 +576,12 @@ func (s *Server) handleShortsNext(w http.ResponseWriter, r *http.Request) {
exclude = nil
}
items, err := s.Catalog.RandomVideosExcluding(r.Context(), exclude, count)
var items []*catalog.Video
if strings.TrimSpace(body.PreferredFromVideoID) != "" {
items, err = s.Catalog.RandomVideosForPreferredVideoExcluding(r.Context(), body.PreferredFromVideoID, exclude, count)
} else {
items, err = s.Catalog.RandomVideosExcluding(r.Context(), exclude, count)
}
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
return
@@ -428,7 +623,7 @@ type updateVideoTagsReq struct {
}
func (s *Server) handleUpdateVideoTags(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
id := routeParam(r, "id")
var body updateVideoTagsReq
if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
writeErr(w, http.StatusBadRequest, err)
@@ -451,7 +646,7 @@ func (s *Server) handleUpdateVideoTags(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleLike(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
id := routeParam(r, "id")
likes, err := s.Catalog.IncrementLike(r.Context(), id)
if err != nil {
writeErr(w, http.StatusInternalServerError, err)
@@ -463,7 +658,7 @@ func (s *Server) handleLike(w http.ResponseWriter, r *http.Request) {
// handleUnlike removes a like: likes - 1 (floored at 0).
// Used when the heart button toggles state in shorts mode.
func (s *Server) handleUnlike(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
id := routeParam(r, "id")
likes, err := s.Catalog.DecrementLike(r.Context(), id)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
@@ -477,7 +672,7 @@ func (s *Server) handleUnlike(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleView(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
id := routeParam(r, "id")
views, err := s.Catalog.IncrementView(r.Context(), id)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
@@ -491,7 +686,7 @@ func (s *Server) handleView(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleHideVideo(w http.ResponseWriter, r *http.Request) {
id := chi.URLParam(r, "id")
id := routeParam(r, "id")
if err := s.Catalog.HideVideo(r.Context(), id); err != nil {
if errors.Is(err, sql.ErrNoRows) {
writeErr(w, http.StatusNotFound, err)
@@ -608,12 +803,12 @@ func (s *Server) handleUploadVideo(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleStream(w http.ResponseWriter, r *http.Request) {
driveID := chi.URLParam(r, "driveID")
fileID := chi.URLParam(r, "fileID")
driveID := routeParam(r, "driveID")
fileID := routeWildcardParam(r, "*")
s.Proxy.ServeStream(w, r, driveID, fileID)
}
func (s *Server) handleUploadedVideo(w http.ResponseWriter, r *http.Request) {
videoID := chi.URLParam(r, "videoID")
videoID := routeParam(r, "videoID")
v, err := s.Catalog.GetVideo(r.Context(), videoID)
if err != nil || v.Hidden || v.DriveID != localUploadDriveID {
http.NotFound(w, r)
@@ -637,7 +832,7 @@ func (s *Server) handleUploadedVideo(w http.ResponseWriter, r *http.Request) {
// Paths look like /p/spider91/<videoID>, where videoID = "spider91-<driveID>-<sourceID>".
// Look up file_id ("<sourceID>.mp4") through the catalog, then have the driver resolve it to an absolute path and ServeFile.
func (s *Server) handleSpider91Video(w http.ResponseWriter, r *http.Request) {
videoID := chi.URLParam(r, "videoID")
videoID := routeParam(r, "videoID")
v, err := s.Catalog.GetVideo(r.Context(), videoID)
if err != nil || v.Hidden {
http.NotFound(w, r)
@@ -672,7 +867,7 @@ func (s *Server) handleSpider91Video(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handlePreview(w http.ResponseWriter, r *http.Request) {
videoID := chi.URLParam(r, "videoID")
videoID := routeParam(r, "videoID")
v, err := s.Catalog.GetVideo(r.Context(), videoID)
if err != nil {
http.NotFound(w, r)
@@ -697,15 +892,20 @@ func (s *Server) handlePreview(w http.ResponseWriter, r *http.Request) {
}
func (s *Server) handleThumb(w http.ResponseWriter, r *http.Request) {
videoID := chi.URLParam(r, "videoID")
// Read <videoID>.jpg directly from the local thumbs directory
path := filepath.Join(s.LocalDir, "thumbs", videoID+".jpg")
clean := filepath.Clean(path)
if !strings.HasPrefix(clean, filepath.Clean(s.LocalDir)) {
http.Error(w, "invalid path", http.StatusForbidden)
return
videoID := routeParam(r, "videoID")
var clean string
for _, path := range mediaasset.ThumbnailPathCandidates(s.LocalDir, videoID) {
candidate := filepath.Clean(path)
if !strings.HasPrefix(candidate, filepath.Clean(s.LocalDir)) {
http.Error(w, "invalid path", http.StatusForbidden)
return
}
if _, err := os.Stat(candidate); err == nil {
clean = candidate
break
}
}
if _, err := os.Stat(clean); err != nil {
if clean == "" {
w.Header().Set("Cache-Control", "no-store")
http.NotFound(w, r)
return
@@ -727,7 +927,7 @@ func mapVideo(v *catalog.Video) VideoDTO {
}
return VideoDTO{
ID: v.ID,
Href: "/video/" + v.ID,
Href: "/video/" + pathSegment(v.ID),
Title: v.Title,
Thumbnail: thumbnailURL(v),
PreviewSrc: previewURL(v),
@@ -749,7 +949,7 @@ func mapVideo(v *catalog.Video) VideoDTO {
}
func previewURL(v *catalog.Video) string {
base := "/p/preview/" + v.ID
base := "/p/preview/" + pathSegment(v.ID)
if v.UpdatedAt.IsZero() {
return base
}
@@ -757,31 +957,83 @@ func previewURL(v *catalog.Video) string {
}
func thumbnailURL(v *catalog.Video) string {
base := "/p/thumb/" + pathSegment(v.ID)
if v.ThumbnailURL != "" {
return v.ThumbnailURL
base = v.ThumbnailURL
if thumbnailURLMatchesVideoID(base, v.ID) {
base = "/p/thumb/" + pathSegment(v.ID)
}
}
return "/p/thumb/" + v.ID
if !strings.HasPrefix(base, "/p/thumb/") || v.UpdatedAt.IsZero() {
return base
}
return base + "?v=" + strconv.FormatInt(v.UpdatedAt.UnixMilli(), 10)
}
func (s *Server) videoSource(v *catalog.Video) string {
if v.DriveID == localUploadDriveID {
return "/p/upload/" + v.ID
return "/p/upload/" + pathSegment(v.ID)
}
if s.Proxy != nil && s.Proxy.Registry != nil {
if d, ok := s.Proxy.Registry.Get(v.DriveID); ok && d.Kind() == spider91.Kind {
return "/p/spider91/" + v.ID
if d, ok := s.Proxy.Registry.Get(v.DriveID); ok {
switch d.Kind() {
case spider91.Kind:
return "/p/spider91/" + pathSegment(v.ID)
}
}
}
return fmt.Sprintf("/p/stream/%s/%s", v.DriveID, v.FileID)
return fmt.Sprintf("/p/stream/%s/%s", pathSegment(v.DriveID), pathSegment(v.FileID))
}
// videoSource is kept for legacy call sites; without a server context it falls back to /p/stream as before.
// New internal code should use (*Server).videoSource.
func videoSource(v *catalog.Video) string {
if v.DriveID == localUploadDriveID {
return "/p/upload/" + v.ID
return "/p/upload/" + pathSegment(v.ID)
}
return fmt.Sprintf("/p/stream/%s/%s", v.DriveID, v.FileID)
return fmt.Sprintf("/p/stream/%s/%s", pathSegment(v.DriveID), pathSegment(v.FileID))
}
func pathSegment(value string) string {
return url.PathEscape(value)
}
func routeParam(r *http.Request, key string) string {
value := chi.URLParam(r, key)
if value == "" {
return ""
}
if decoded, err := url.PathUnescape(value); err == nil {
return decoded
}
return value
}
func routeWildcardParam(r *http.Request, key string) string {
value := chi.URLParam(r, key)
if value == "" {
return ""
}
value = strings.TrimPrefix(value, "/")
if decoded, err := url.PathUnescape(value); err == nil {
return decoded
}
return value
}
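The escaping helpers above are meant to round-trip IDs that contain slashes or spaces. A minimal sketch of that round trip, using only `net/url`; `escapeSegment` and `decodeSegment` are hypothetical names that mirror pathSegment and the decode-with-fallback step in routeParam, without the chi dependency:

```go
package main

import (
	"fmt"
	"net/url"
)

// escapeSegment mirrors pathSegment: it escapes a value so it can be used
// as a single URL path segment ("/" becomes %2F, " " becomes %20).
func escapeSegment(value string) string {
	return url.PathEscape(value)
}

// decodeSegment mirrors the fallback in routeParam: it returns the decoded
// value, or the raw input when the escape sequence is malformed.
func decodeSegment(value string) string {
	if decoded, err := url.PathUnescape(value); err == nil {
		return decoded
	}
	return value
}

func main() {
	id := "wopan-drive-fid/with space"
	escaped := escapeSegment(id)
	fmt.Println(escaped)                      // wopan-drive-fid%2Fwith%20space
	fmt.Println(decodeSegment(escaped) == id) // true
	fmt.Println(decodeSegment("bad%zz"))      // bad%zz (malformed, returned as-is)
}
```

Falling back to the raw value on a decode error keeps malformed requests flowing to the normal not-found path rather than failing early; whether that is the intended behavior is a design choice of the handlers.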
func thumbnailURLMatchesVideoID(value, videoID string) bool {
if !strings.HasPrefix(value, "/p/thumb/") {
return false
}
tail := strings.TrimPrefix(value, "/p/thumb/")
if idx := strings.IndexByte(tail, '?'); idx >= 0 {
tail = tail[:idx]
}
if tail == videoID {
return true
}
decoded, err := url.PathUnescape(tail)
return err == nil && decoded == videoID
}
func driveKindLabel(kind string) string {
@@ -790,12 +1042,18 @@ func driveKindLabel(kind string) string {
return "夸克网盘"
case "p115":
return "115 网盘"
case "p123":
return "123网盘"
case "pikpak":
return "PikPak"
case "wopan":
return "联通沃盘"
return "联通网盘"
case "onedrive":
return "OneDrive"
case "googledrive":
return "Google Drive"
case localstorage.Kind:
return "本地存储"
case spider91.Kind:
return "91 爬虫"
default:
+603 -2
@@ -4,11 +4,13 @@ import (
"bytes"
"context"
"encoding/json"
"io"
"mime/multipart"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strconv"
"strings"
"testing"
"time"
@@ -16,6 +18,8 @@ import (
"github.com/go-chi/chi/v5"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
"github.com/video-site/backend/internal/mediaasset"
"github.com/video-site/backend/internal/proxy"
)
@@ -64,6 +68,68 @@ func TestVideoSourceKeepsDirectStreamForMp4(t *testing.T) {
}
}
func TestVideoURLsEscapePathSegments(t *testing.T) {
updated := time.UnixMilli(1778863000123)
v := &catalog.Video{
ID: "wopan-drive-fid/with space",
DriveID: "drive-1",
FileID: "fid/with space",
Title: "Video",
UpdatedAt: updated,
}
dto := mapVideo(v)
if dto.Href != "/video/wopan-drive-fid%2Fwith%20space" {
t.Fatalf("href = %q, want escaped video id", dto.Href)
}
if dto.PreviewSrc != "/p/preview/wopan-drive-fid%2Fwith%20space?v=1778863000123" {
t.Fatalf("preview = %q, want escaped video id", dto.PreviewSrc)
}
if dto.Thumbnail != "/p/thumb/wopan-drive-fid%2Fwith%20space?v=1778863000123" {
t.Fatalf("thumbnail = %q, want escaped video id", dto.Thumbnail)
}
if got := videoSource(v); got != "/p/stream/drive-1/fid%2Fwith%20space" {
t.Fatalf("video source = %q, want escaped file id", got)
}
}
func TestThumbnailURLRewritesStoredLocalURLForUnsafeVideoID(t *testing.T) {
got := thumbnailURL(&catalog.Video{
ID: "wopan-drive-fid/with space",
ThumbnailURL: "/p/thumb/wopan-drive-fid/with space",
UpdatedAt: time.UnixMilli(1778863000123),
})
if got != "/p/thumb/wopan-drive-fid%2Fwith%20space?v=1778863000123" {
t.Fatalf("thumbnail URL = %q, want escaped local URL", got)
}
}
func TestHandleStreamDecodesEscapedWildcardFileID(t *testing.T) {
local := filepath.Join(t.TempDir(), "video.mp4")
if err := os.WriteFile(local, []byte("ok"), 0o644); err != nil {
t.Fatalf("write local video: %v", err)
}
drv := &apiStreamFakeDrive{localPath: local}
reg := proxy.NewRegistry()
reg.Set("drive-1", drv)
srv := &Server{Proxy: proxy.New(reg)}
router := chi.NewRouter()
router.Get("/p/stream/{driveID}/*", srv.handleStream)
req := httptest.NewRequest(http.MethodGet, "/p/stream/drive-1/fid%2Fwith%20space", nil)
rr := httptest.NewRecorder()
router.ServeHTTP(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
if drv.fileID != "fid/with space" {
t.Fatalf("fileID = %q, want decoded original", drv.fileID)
}
}
func TestVideoSourceUsesLocalUploadRoute(t *testing.T) {
v := &catalog.Video{
ID: "video-1",
@@ -98,6 +164,340 @@ func TestPreviewURLFallsBackWithoutUpdatedAt(t *testing.T) {
}
}
func TestHandleVideoDetailDecodesEscapedVideoID(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: "wopan-drive-fid/with space",
DriveID: "drive-1",
FileID: "fid/with space",
Title: "Video",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed video: %v", err)
}
router := chi.NewRouter()
router.Get("/api/video/{id}", (&Server{Catalog: cat}).handleVideoDetail)
req := httptest.NewRequest(http.MethodGet, "/api/video/wopan-drive-fid%2Fwith%20space", nil)
rr := httptest.NewRecorder()
router.ServeHTTP(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got VideoDetailDTO
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode: %v", err)
}
if got.ID != "wopan-drive-fid/with space" {
t.Fatalf("id = %q, want original video id", got.ID)
}
}
func TestThumbnailURLVersionsLocalGeneratedThumbnails(t *testing.T) {
got := thumbnailURL(&catalog.Video{
ID: "video-1",
ThumbnailURL: "/p/thumb/video-1",
UpdatedAt: time.UnixMilli(1778863000123),
})
if got != "/p/thumb/video-1?v=1778863000123" {
t.Fatalf("thumbnail URL = %q, want versioned local URL", got)
}
remote := "https://thumb.example/video-1.jpg"
got = thumbnailURL(&catalog.Video{
ID: "video-1",
ThumbnailURL: remote,
UpdatedAt: time.UnixMilli(1778863000123),
})
if got != remote {
t.Fatalf("remote thumbnail URL = %q, want unchanged %q", got, remote)
}
}
func TestHandleHomePrioritizesVideosWithReadyThumbnails(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for i := 0; i < 20; i++ {
id := "pending-video-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
PublishedAt: now.Add(time.Duration(i) * time.Minute),
CreatedAt: now.Add(time.Duration(i) * time.Minute),
UpdatedAt: now.Add(time.Duration(i) * time.Minute),
}); err != nil {
t.Fatalf("seed pending video %s: %v", id, err)
}
}
for i := 0; i < homePageSize+2; i++ {
id := "ready-video-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
ThumbnailURL: "https://thumb.example/" + id + ".jpg",
PublishedAt: now.Add(-time.Duration(i+1) * time.Hour),
CreatedAt: now.Add(-time.Duration(i+1) * time.Hour),
UpdatedAt: now.Add(-time.Duration(i+1) * time.Hour),
}); err != nil {
t.Fatalf("seed ready video %s: %v", id, err)
}
}
rr := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodGet, "/api/home", nil)
(&Server{Catalog: cat}).handleHome(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got []VideoDTO
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode response: %v", err)
}
if len(got) != homePageSize {
t.Fatalf("home items = %d, want %d", len(got), homePageSize)
}
for _, item := range got {
if !strings.HasPrefix(item.ID, "ready-video-") {
t.Fatalf("home returned %q without a ready thumbnail; items=%#v", item.ID, got)
}
if !strings.HasPrefix(item.Thumbnail, "https://thumb.example/") {
t.Fatalf("thumbnail for %q = %q, want ready thumbnail URL", item.ID, item.Thumbnail)
}
}
}
func TestHandleHomeExcludesRecentlyShownVideos(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for i := 0; i < homePageSize+4; i++ {
id := "ready-video-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
ThumbnailURL: "https://thumb.example/" + id + ".jpg",
PublishedAt: now.Add(time.Duration(i) * time.Minute),
CreatedAt: now.Add(time.Duration(i) * time.Minute),
UpdatedAt: now.Add(time.Duration(i) * time.Minute),
}); err != nil {
t.Fatalf("seed ready video %s: %v", id, err)
}
}
rr := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodGet, "/api/home?exclude=ready-video-0&exclude=ready-video-1", nil)
(&Server{Catalog: cat}).handleHome(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got []VideoDTO
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode response: %v", err)
}
if len(got) != homePageSize {
t.Fatalf("home items = %d, want %d", len(got), homePageSize)
}
for _, item := range got {
if item.ID == "ready-video-0" || item.ID == "ready-video-1" {
t.Fatalf("home returned excluded video %q; items=%#v", item.ID, got)
}
if !strings.HasPrefix(item.ID, "ready-video-") {
t.Fatalf("home returned %q without a ready thumbnail; items=%#v", item.ID, got)
}
}
}
func TestHandleHomeStartsNewRoundWhenRecentExcludesAllVisibleVideos(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
excludes := make([]string, 0, homePageSize+2)
for i := 0; i < homePageSize+2; i++ {
id := "ready-video-" + strconv.Itoa(i)
excludes = append(excludes, "exclude="+id)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
ThumbnailURL: "https://thumb.example/" + id + ".jpg",
PublishedAt: now.Add(time.Duration(i) * time.Minute),
CreatedAt: now.Add(time.Duration(i) * time.Minute),
UpdatedAt: now.Add(time.Duration(i) * time.Minute),
}); err != nil {
t.Fatalf("seed ready video %s: %v", id, err)
}
}
rr := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodGet, "/api/home?"+strings.Join(excludes, "&"), nil)
(&Server{Catalog: cat}).handleHome(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got []VideoDTO
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode response: %v", err)
}
if len(got) != homePageSize {
t.Fatalf("home items = %d, want %d; body=%s", len(got), homePageSize, rr.Body.String())
}
seen := map[string]bool{}
for _, item := range got {
if seen[item.ID] {
t.Fatalf("home returned duplicate video %q; items=%#v", item.ID, got)
}
seen[item.ID] = true
if !strings.HasPrefix(item.ID, "ready-video-") {
t.Fatalf("home returned unexpected video %q; items=%#v", item.ID, got)
}
}
}
func TestHandleListLatestPrefersReadyThumbnails(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for i := 0; i < 20; i++ {
id := "pending-latest-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
PublishedAt: now.Add(time.Duration(i) * time.Minute),
CreatedAt: now.Add(time.Duration(i) * time.Minute),
UpdatedAt: now.Add(time.Duration(i) * time.Minute),
}); err != nil {
t.Fatalf("seed pending video %s: %v", id, err)
}
}
for i := 0; i < 12; i++ {
id := "ready-latest-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
ThumbnailURL: "https://thumb.example/" + id + ".jpg",
PublishedAt: now.Add(-time.Duration(i+1) * time.Hour),
CreatedAt: now.Add(-time.Duration(i+1) * time.Hour),
UpdatedAt: now.Add(-time.Duration(i+1) * time.Hour),
}); err != nil {
t.Fatalf("seed ready video %s: %v", id, err)
}
}
rr := httptest.NewRecorder()
req := httptest.NewRequest(http.MethodGet, "/api/list?page=1&size=12&sort=latest", nil)
(&Server{Catalog: cat}).handleList(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got struct {
Items []VideoDTO `json:"items"`
Total int `json:"total"`
}
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode response: %v", err)
}
if got.Total != 32 {
t.Fatalf("total = %d, want all matching videos included", got.Total)
}
if len(got.Items) != 12 {
t.Fatalf("items = %d, want 12", len(got.Items))
}
for _, item := range got.Items {
if !strings.HasPrefix(item.ID, "ready-latest-") {
t.Fatalf("latest list returned %q before ready thumbnails; items=%#v", item.ID, got.Items)
}
if !strings.HasPrefix(item.Thumbnail, "https://thumb.example/") {
t.Fatalf("thumbnail for %q = %q, want ready thumbnail URL", item.ID, item.Thumbnail)
}
}
rr = httptest.NewRecorder()
req = httptest.NewRequest(http.MethodGet, "/api/list?page=1&size=12&sort=latest&count=false", nil)
(&Server{Catalog: cat}).handleList(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("count=false status = %d, body = %s", rr.Code, rr.Body.String())
}
got = struct {
Items []VideoDTO `json:"items"`
Total int `json:"total"`
}{}
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode count=false response: %v", err)
}
if got.Total != 0 {
t.Fatalf("count=false total = %d, want 0", got.Total)
}
if len(got.Items) != 12 {
t.Fatalf("count=false items = %d, want 12", len(got.Items))
}
}
func TestHandleUploadVideoSavesFileVideoTagsAndQueuesPreview(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
@@ -120,7 +520,7 @@ func TestHandleUploadVideoSavesFileVideoTagsAndQueuesPreview(t *testing.T) {
}
req := multipartUploadRequest(t, map[string]string{
"title": "用户上传标题",
"tags": "奶子,AV,女大",
"tags": "奶子,口交,AV,女大",
}, "clip.mp4", "video-bytes")
rr := httptest.NewRecorder()
@@ -146,7 +546,7 @@ func TestHandleUploadVideoSavesFileVideoTagsAndQueuesPreview(t *testing.T) {
if got.Title != "用户上传标题" {
t.Fatalf("title = %q, want submitted title", got.Title)
}
if !sameStringSet(got.Tags, []string{"奶子", "AV", "女大"}) {
if !sameStringSet(got.Tags, []string{"奶子", "口交", "AV", "女大"}) {
t.Fatalf("tags = %#v, want selected tags", got.Tags)
}
if got.PreviewStatus != "pending" {
@@ -317,6 +717,34 @@ func TestHandlePreviewIgnoresRemotePreviewFileIDAndServesLocalFile(t *testing.T)
}
}
func TestHandleThumbServesHashedPathForLongVideoID(t *testing.T) {
localDir := t.TempDir()
longID := "localstorage-" + strings.Repeat("x", 240)
thumbPath := mediaasset.ThumbnailPath(localDir, longID)
if err := os.MkdirAll(filepath.Dir(thumbPath), 0o755); err != nil {
t.Fatalf("mkdir thumb dir: %v", err)
}
if err := os.WriteFile(thumbPath, []byte("thumb-bytes"), 0o644); err != nil {
t.Fatalf("write thumb: %v", err)
}
server := &Server{
LocalDir: localDir,
Proxy: proxy.New(proxy.NewRegistry()),
}
req := requestWithRouteParam(http.MethodGet, "/p/thumb/"+longID, "videoID", longID, strings.NewReader(``))
rr := httptest.NewRecorder()
server.handleThumb(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
if rr.Body.String() != "thumb-bytes" {
t.Fatalf("body = %q, want thumb bytes", rr.Body.String())
}
}
func TestHandleTagsReturnsUnifiedTagPool(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
@@ -382,6 +810,66 @@ func TestHandleTagsReturnsUnifiedTagPool(t *testing.T) {
}
}
func TestHandleShortsNextUsesPreferredVideoLeastPopulatedTag(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, v := range []*catalog.Video{
{ID: "current", DriveID: "drive", FileID: "f-current", Title: "current", Tags: []string{"common", "rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "common-1", DriveID: "drive", FileID: "f-common-1", Title: "common 1", Tags: []string{"common"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "common-2", DriveID: "drive", FileID: "f-common-2", Title: "common 2", Tags: []string{"common"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "rare-1", DriveID: "drive", FileID: "f-rare-1", Title: "rare 1", Tags: []string{"rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
req := httptest.NewRequest(http.MethodPost, "/api/shorts/next", strings.NewReader(`{"seenIds":["current"],"count":3,"preferredFromVideoId":"current"}`))
rr := httptest.NewRecorder()
(&Server{Catalog: cat}).handleShortsNext(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got struct {
Items []ShortsItemDTO `json:"items"`
Total int `json:"total"`
RoundComplete bool `json:"roundComplete"`
}
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode: %v", err)
}
ids := make([]string, 0, len(got.Items))
for _, item := range got.Items {
ids = append(ids, item.ID)
}
if got.Total != 4 {
t.Fatalf("total = %d, want 4", got.Total)
}
if got.RoundComplete {
t.Fatalf("roundComplete = true, want false with fallback-filled batch")
}
if !containsString(ids, "rare-1") {
t.Fatalf("ids = %#v, want rare-1 from least populated tag", ids)
}
if containsString(ids, "current") {
t.Fatalf("ids = %#v, should exclude current", ids)
}
if len(ids) != 3 {
t.Fatalf("ids = %#v, want 3 items", ids)
}
}
func TestHandleUpdateVideoTagsRejectsUnknownTags(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
@@ -509,6 +997,88 @@ func TestHandleVideoDetailIncludesDriveKindLabel(t *testing.T) {
}
}
func TestHandleVideoDetailRecommendationsPreferReadyThumbnails(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: "current-video",
DriveID: "drive",
FileID: "current-video",
Title: "Current",
Tags: []string{"same-tag"},
ThumbnailURL: "https://thumb.example/current-video.jpg",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed current video: %v", err)
}
for i := 0; i < 20; i++ {
id := "pending-related-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
Tags: []string{"same-tag"},
PublishedAt: now.Add(time.Duration(i+1) * time.Minute),
CreatedAt: now.Add(time.Duration(i+1) * time.Minute),
UpdatedAt: now.Add(time.Duration(i+1) * time.Minute),
}); err != nil {
t.Fatalf("seed pending related video %s: %v", id, err)
}
}
for i := 0; i < 8; i++ {
id := "ready-related-" + strconv.Itoa(i)
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: id,
Tags: []string{"same-tag"},
ThumbnailURL: "https://thumb.example/" + id + ".jpg",
PublishedAt: now.Add(-time.Duration(i+1) * time.Hour),
CreatedAt: now.Add(-time.Duration(i+1) * time.Hour),
UpdatedAt: now.Add(-time.Duration(i+1) * time.Hour),
}); err != nil {
t.Fatalf("seed ready related video %s: %v", id, err)
}
}
req := requestWithVideoID(http.MethodGet, "/api/video/current-video", "current-video", strings.NewReader(``))
rr := httptest.NewRecorder()
(&Server{Catalog: cat}).handleVideoDetail(rr, req)
if rr.Code != http.StatusOK {
t.Fatalf("status = %d, body = %s", rr.Code, rr.Body.String())
}
var got VideoDetailDTO
if err := json.NewDecoder(rr.Body).Decode(&got); err != nil {
t.Fatalf("decode: %v", err)
}
if len(got.RelatedVideos) != 6 {
t.Fatalf("related videos = %d, want 6; items=%#v", len(got.RelatedVideos), got.RelatedVideos)
}
for _, item := range got.RelatedVideos {
if !strings.HasPrefix(item.ID, "ready-related-") {
t.Fatalf("related returned %q before ready thumbnails; items=%#v", item.ID, got.RelatedVideos)
}
if !strings.HasPrefix(item.Thumbnail, "https://thumb.example/") {
t.Fatalf("thumbnail for %q = %q, want ready thumbnail URL", item.ID, item.Thumbnail)
}
}
}
func TestHandleHideVideoRemovesVideoFromPublicListAndDetail(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
@@ -621,6 +1191,37 @@ func sameStringSet(a, b []string) bool {
return true
}
type apiStreamFakeDrive struct {
localPath string
fileID string
}
func (d *apiStreamFakeDrive) Kind() string { return "fake" }
func (d *apiStreamFakeDrive) ID() string { return "drive-1" }
func (d *apiStreamFakeDrive) Init(context.Context) error {
return nil
}
func (d *apiStreamFakeDrive) List(context.Context, string) ([]drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *apiStreamFakeDrive) Stat(context.Context, string) (*drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *apiStreamFakeDrive) StreamURL(_ context.Context, fileID string) (*drives.StreamLink, error) {
d.fileID = fileID
return &drives.StreamLink{
URL: d.localPath,
Expires: time.Now().Add(time.Minute),
}, nil
}
func (d *apiStreamFakeDrive) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *apiStreamFakeDrive) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *apiStreamFakeDrive) RootID() string { return "root" }
func requestWithVideoID(method, target, videoID string, body *strings.Reader) *http.Request {
return requestWithRouteParam(method, target, "id", videoID, body)
}
File diff suppressed because it is too large
+126
@@ -0,0 +1,126 @@
package catalog
import (
"context"
"testing"
)
func TestUpsertDriveUsesRootIDAsScanRootID(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
if err := cat.UpsertDrive(ctx, &Drive{
ID: "drive",
Kind: "p115",
Name: "115",
RootID: "root-folder",
ScanRootID: "ignored-scan-root",
}); err != nil {
t.Fatalf("upsert drive: %v", err)
}
got, err := cat.GetDrive(ctx, "drive")
if err != nil {
t.Fatalf("get drive: %v", err)
}
if got.RootID != "root-folder" {
t.Fatalf("rootId = %q, want root-folder", got.RootID)
}
if got.ScanRootID != "root-folder" {
t.Fatalf("scanRootId = %q, want root-folder", got.ScanRootID)
}
}
func TestUpsertDriveDefaultsRootIDByKind(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
cases := []struct {
id string
kind string
want string
}{
{id: "p115", kind: "p115", want: "0"},
{id: "pikpak", kind: "pikpak", want: ""},
{id: "onedrive", kind: "onedrive", want: "root"},
{id: "googledrive", kind: "googledrive", want: "root"},
{id: "localstorage", kind: "localstorage", want: "/"},
{id: "spider91", kind: "spider91", want: "/"},
}
for _, tc := range cases {
if err := cat.UpsertDrive(ctx, &Drive{
ID: tc.id,
Kind: tc.kind,
Name: tc.kind,
}); err != nil {
t.Fatalf("upsert %s: %v", tc.kind, err)
}
got, err := cat.GetDrive(ctx, tc.id)
if err != nil {
t.Fatalf("get %s: %v", tc.kind, err)
}
if got.RootID != tc.want {
t.Fatalf("%s rootId = %q, want %q", tc.kind, got.RootID, tc.want)
}
if got.ScanRootID != tc.want {
t.Fatalf("%s scanRootId = %q, want %q", tc.kind, got.ScanRootID, tc.want)
}
}
}
func TestUpsertDriveIgnoresRootIDForLocalStorageAndSpider91(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
for _, tc := range []struct {
id string
kind string
}{
{id: "localstorage", kind: "localstorage"},
{id: "spider91", kind: "spider91"},
} {
if err := cat.UpsertDrive(ctx, &Drive{
ID: tc.id,
Kind: tc.kind,
Name: tc.kind,
RootID: "manual-root",
ScanRootID: "manual-scan-root",
}); err != nil {
t.Fatalf("upsert %s: %v", tc.kind, err)
}
got, err := cat.GetDrive(ctx, tc.id)
if err != nil {
t.Fatalf("get %s: %v", tc.kind, err)
}
if got.RootID != "/" {
t.Fatalf("%s rootId = %q, want /", tc.kind, got.RootID)
}
if got.ScanRootID != "/" {
t.Fatalf("%s scanRootId = %q, want /", tc.kind, got.ScanRootID)
}
}
}
+48
@@ -2,6 +2,7 @@ package catalog
import (
"context"
"database/sql"
"sort"
"testing"
"time"
@@ -126,3 +127,50 @@ func TestListSpider91ViewkeysFindsMigratedVideos(t *testing.T) {
t.Fatalf("non-existent drive: got %v, want empty", other)
}
}
func TestDeleteVideoWithTombstonePreventsReimport(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() { _ = cat.Close() })
now := time.Now()
if err := cat.UpsertVideo(ctx, &Video{
ID: "spider91-91Spider-vk004",
DriveID: "91Spider",
FileID: "vk004.mp4",
FileName: "vk004.mp4",
ContentHash: "ABCDEF",
Title: "Deleted Spider",
Size: 2048,
PreviewStatus: "ready",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("upsert: %v", err)
}
if err := cat.DeleteVideoWithTombstone(ctx, "spider91-91Spider-vk004"); err != nil {
t.Fatalf("delete with tombstone: %v", err)
}
if _, err := cat.GetVideo(ctx, "spider91-91Spider-vk004"); err != sql.ErrNoRows {
t.Fatalf("get deleted video error = %v, want sql.ErrNoRows", err)
}
deleted, err := cat.IsDeletedVideoCandidate(ctx, "spider91-91Spider-vk004", "91Spider", "vk004.mp4", "abcdef", "vk004.mp4", 2048)
if err != nil {
t.Fatalf("check deleted candidate: %v", err)
}
if !deleted {
t.Fatal("deleted candidate was not recognized")
}
viewkeys, err := cat.ListSpider91Viewkeys(ctx, "91Spider")
if err != nil {
t.Fatalf("ListSpider91Viewkeys: %v", err)
}
if len(viewkeys) != 1 || viewkeys[0] != "vk004" {
t.Fatalf("viewkeys = %#v, want [vk004]", viewkeys)
}
}
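The test above relies on a tombstone lookup whose implementation is not part of this diff. A minimal sketch of how such a match could work follows, assuming the lookup tries the video ID first and then the three drive-scoped keys that the `deleted_videos` indexes cover. The `tombstone` type and `matchesTombstone` function are hypothetical names, and the case-insensitive hash comparison is inferred only from the test storing `ABCDEF` and querying `abcdef`.

```go
package main

import (
	"fmt"
	"strings"
)

// tombstone mirrors the columns of the deleted_videos table (hypothetical type).
type tombstone struct {
	ID, DriveID, FileID, ContentHash, FileName string
	SizeBytes                                   int64
}

// matchesTombstone is an illustrative matcher, not the project's implementation.
// It reports whether a scan or crawler candidate refers to a deleted video.
func matchesTombstone(t tombstone, id, driveID, fileID, contentHash, fileName string, size int64) bool {
	if id != "" && t.ID == id {
		return true // same catalog ID
	}
	if t.DriveID != driveID {
		return false // remaining keys are scoped to one drive
	}
	if fileID != "" && t.FileID == fileID {
		return true // same file on the same drive
	}
	if contentHash != "" && strings.EqualFold(t.ContentHash, contentHash) {
		return true // same content hash, ignoring hex case
	}
	// Weakest signal: same file name and same non-zero size.
	return fileName != "" && size > 0 && t.FileName == fileName && t.SizeBytes == size
}

func main() {
	t := tombstone{
		ID: "spider91-91Spider-vk004", DriveID: "91Spider",
		FileID: "vk004.mp4", ContentHash: "ABCDEF",
		FileName: "vk004.mp4", SizeBytes: 2048,
	}
	// A renamed re-upload with the same hash on the same drive still matches.
	fmt.Println(matchesTombstone(t, "other-id", "91Spider", "renamed.mp4", "abcdef", "renamed.mp4", 2048))
}
```

Ordering the checks from strongest to weakest key keeps the name-and-size fallback from masking a mismatch on a more reliable identifier.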
@@ -0,0 +1,179 @@
package catalog
import (
"context"
"testing"
"time"
)
func TestListVideosDeduplicatesBySampledSHA256(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, v := range []*Video{
{
ID: "drive-a-file-a",
DriveID: "drive-a",
FileID: "file-a",
FileName: "first-name.mp4",
Title: "First",
Size: 1234,
PublishedAt: now.Add(-time.Minute),
CreatedAt: now.Add(-time.Minute),
UpdatedAt: now.Add(-time.Minute),
},
{
ID: "drive-b-file-b",
DriveID: "drive-b",
FileID: "file-b",
FileName: "second-name.mp4",
Title: "Second",
Size: 1234,
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("upsert %s: %v", v.ID, err)
}
}
items, total, err := cat.ListVideos(ctx, ListParams{Page: 1, PageSize: 10})
if err != nil {
t.Fatalf("list before fingerprint: %v", err)
}
if total != 2 || len(items) != 2 {
t.Fatalf("before fingerprint total=%d len=%d, want 2", total, len(items))
}
const sampled = "abc123"
if err := cat.UpdateVideoFingerprint(ctx, "drive-a-file-a", sampled, "ready", ""); err != nil {
t.Fatalf("update a fingerprint: %v", err)
}
if err := cat.UpdateVideoFingerprint(ctx, "drive-b-file-b", sampled, "ready", ""); err != nil {
t.Fatalf("update b fingerprint: %v", err)
}
items, total, err = cat.ListVideos(ctx, ListParams{Page: 1, PageSize: 10})
if err != nil {
t.Fatalf("list after fingerprint: %v", err)
}
if total != 1 || len(items) != 1 {
t.Fatalf("after fingerprint total=%d len=%d, want 1", total, len(items))
}
if items[0].ID != "drive-a-file-a" {
t.Fatalf("canonical id = %q, want earliest created video", items[0].ID)
}
}
func TestDuplicateAssetCleanupCandidates(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
base := time.Date(2026, 5, 29, 12, 0, 0, 0, time.UTC)
videos := []*Video{
{
ID: "drive-a-canonical",
DriveID: "drive-a",
FileID: "file-a",
FileName: "canonical.mp4",
Title: "Canonical",
Size: 1234,
ThumbnailURL: "/p/thumb/drive-a-canonical",
PreviewLocal: "/tmp/previews/canonical.mp4",
PreviewStatus: "ready",
PublishedAt: base,
CreatedAt: base,
UpdatedAt: base,
},
{
ID: "drive-b-duplicate",
DriveID: "drive-b",
FileID: "file-b",
FileName: "duplicate.mp4",
Title: "Duplicate",
Size: 1234,
ThumbnailURL: "/p/thumb/drive-b-duplicate",
PreviewLocal: "/tmp/previews/duplicate.mp4",
PreviewStatus: "ready",
PublishedAt: base.Add(time.Second),
CreatedAt: base.Add(time.Second),
UpdatedAt: base.Add(time.Second),
},
{
ID: "drive-c-remote-thumb",
DriveID: "drive-c",
FileID: "file-c",
FileName: "remote-thumb.mp4",
Title: "Remote Thumbnail",
Size: 1234,
ThumbnailURL: "https://thumb.example/file-c.jpg",
PreviewStatus: "ready",
PublishedAt: base.Add(2 * time.Second),
CreatedAt: base.Add(2 * time.Second),
UpdatedAt: base.Add(2 * time.Second),
},
}
for _, v := range videos {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
const sampled = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
for _, v := range videos {
if err := cat.UpdateVideoFingerprint(ctx, v.ID, sampled, "ready", ""); err != nil {
t.Fatalf("fingerprint %s: %v", v.ID, err)
}
}
items, err := cat.ListDuplicateAssetCleanupCandidates(ctx, 0)
if err != nil {
t.Fatalf("list cleanup candidates: %v", err)
}
if len(items) != 1 {
t.Fatalf("candidates = %#v, want only local duplicate", items)
}
item := items[0]
if item.VideoID != "drive-b-duplicate" || item.CanonicalID != "drive-a-canonical" {
t.Fatalf("candidate = %#v, want duplicate with canonical", item)
}
if err := cat.ClearGeneratedAssets(ctx, item.VideoID, true, true); err != nil {
t.Fatalf("clear generated assets: %v", err)
}
got, err := cat.GetVideo(ctx, item.VideoID)
if err != nil {
t.Fatalf("get duplicate: %v", err)
}
if got.PreviewLocal != "" || got.PreviewStatus != "pending" {
t.Fatalf("preview after cleanup local=%q status=%q, want empty pending", got.PreviewLocal, got.PreviewStatus)
}
if got.ThumbnailURL != "" {
t.Fatalf("thumbnail after cleanup = %q, want empty", got.ThumbnailURL)
}
var thumbStatus string
if err := cat.db.QueryRowContext(ctx, `SELECT thumbnail_status FROM videos WHERE id = ?`, item.VideoID).Scan(&thumbStatus); err != nil {
t.Fatalf("query thumbnail status: %v", err)
}
if thumbStatus != "pending" {
t.Fatalf("thumbnail_status = %q, want pending", thumbStatus)
}
}
@@ -0,0 +1,64 @@
package catalog
import (
"context"
"testing"
"time"
)
func TestListVideosHidesMissingDriveVideosWhenDrivesExist(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
if err := cat.UpsertDrive(ctx, &Drive{
ID: "active-drive",
Kind: "pikpak",
Name: "Active",
RootID: "root",
TeaserEnabled: true,
}); err != nil {
t.Fatalf("seed drive: %v", err)
}
now := time.Now()
for _, v := range []*Video{
{
ID: "visible-video",
DriveID: "active-drive",
FileID: "visible-file",
Title: "Visible",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
},
{
ID: "orphan-video",
DriveID: "deleted-drive",
FileID: "orphan-file",
Title: "Orphan",
PublishedAt: now.Add(time.Second),
CreatedAt: now.Add(time.Second),
UpdatedAt: now.Add(time.Second),
},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed video %s: %v", v.ID, err)
}
}
items, total, err := cat.ListVideos(ctx, ListParams{Page: 1, PageSize: 10, Sort: "latest"})
if err != nil {
t.Fatalf("list videos: %v", err)
}
if total != 1 || len(items) != 1 || items[0].ID != "visible-video" {
t.Fatalf("items total=%d items=%v, want only visible-video", total, items)
}
}
@@ -5,6 +5,9 @@ CREATE TABLE IF NOT EXISTS videos (
file_id TEXT NOT NULL,
file_name TEXT DEFAULT '', -- original file name on the drive side, used for same-name/same-size deduplication
content_hash TEXT DEFAULT '',
sampled_sha256 TEXT DEFAULT '', -- cross-drive sampled fingerprint (size + sampled bytes)
fingerprint_status TEXT DEFAULT 'pending', -- pending / ready / failed
fingerprint_error TEXT DEFAULT '',
parent_id TEXT,
title TEXT NOT NULL,
author TEXT,
@@ -14,9 +17,10 @@ CREATE TABLE IF NOT EXISTS videos (
ext TEXT,
quality TEXT, -- HD / SD
thumbnail_url TEXT,
thumbnail_status TEXT DEFAULT 'pending', -- pending / ready / failed
preview_file_id TEXT, -- deprecated: teaser file id written back to the drive by older versions
preview_local TEXT, -- local teaser path (fallback)
thumbnail_status TEXT DEFAULT 'pending', -- pending / ready / failed / skipped
thumbnail_failures INTEGER DEFAULT 0, -- consecutive transient thumbnail generation failures
preview_file_id TEXT, -- deprecated: preview-video file id written back to the drive by older versions
preview_local TEXT, -- local preview-video path (fallback)
preview_status TEXT DEFAULT 'pending', -- pending / ready / failed
views INTEGER DEFAULT 0,
favorites INTEGER DEFAULT 0,
@@ -58,17 +62,62 @@ CREATE TABLE IF NOT EXISTS video_tags (
CREATE INDEX IF NOT EXISTS idx_video_tags_tag ON video_tags(tag_id);
CREATE INDEX IF NOT EXISTS idx_video_tags_video ON video_tags(video_id);
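-- Non-system tags that a user deleted manually. Automatic scans and migrations do not recreate a tag with the same name;
-- the record here is removed when an administrator manually creates a tag with the same name.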
CREATE TABLE IF NOT EXISTS deleted_tags (
label TEXT PRIMARY KEY COLLATE NOCASE,
source TEXT NOT NULL DEFAULT '',
deleted_at INTEGER NOT NULL
);
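-- Videos explicitly deleted by an administrator. Prevents later scans or the spider91 crawler from importing the same source file
-- again; it does not imply that the original cloud-drive file was deleted.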
CREATE TABLE IF NOT EXISTS deleted_videos (
id TEXT PRIMARY KEY,
drive_id TEXT NOT NULL DEFAULT '',
file_id TEXT NOT NULL DEFAULT '',
content_hash TEXT NOT NULL DEFAULT '',
file_name TEXT NOT NULL DEFAULT '',
size_bytes INTEGER NOT NULL DEFAULT 0,
deleted_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_deleted_videos_drive_file
ON deleted_videos(drive_id, file_id);
CREATE INDEX IF NOT EXISTS idx_deleted_videos_drive_hash
ON deleted_videos(drive_id, content_hash);
CREATE INDEX IF NOT EXISTS idx_deleted_videos_drive_signature
ON deleted_videos(drive_id, file_name, size_bytes);
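-- Crawler source records. Used to write confirmed-duplicate source_ids back to the seen list,
-- so later crawler runs do not repeatedly download the same candidate video.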
CREATE TABLE IF NOT EXISTS crawler_seen_sources (
kind TEXT NOT NULL,
drive_id TEXT NOT NULL,
source_id TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'imported', -- imported / duplicate
canonical_video_id TEXT NOT NULL DEFAULT '',
sampled_sha256 TEXT NOT NULL DEFAULT '',
size_bytes INTEGER NOT NULL DEFAULT 0,
first_seen_at INTEGER NOT NULL,
last_seen_at INTEGER NOT NULL,
PRIMARY KEY (kind, drive_id, source_id)
);
CREATE INDEX IF NOT EXISTS idx_crawler_seen_sources_drive
ON crawler_seen_sources(kind, drive_id, status);
-- Cloud-drive accounts
CREATE TABLE IF NOT EXISTS drives (
id TEXT PRIMARY KEY,
kind TEXT NOT NULL, -- quark / p115 / pikpak / wopan / onedrive / spider91
kind TEXT NOT NULL, -- quark / p115 / p123 / pikpak / wopan / onedrive / googledrive / localstorage / spider91
name TEXT NOT NULL,
root_id TEXT NOT NULL DEFAULT '0',
scan_root_id TEXT, -- scan starting point (defaults to root_id)
scan_root_id TEXT, -- deprecated: the scan starting point is always root_id
credentials TEXT, -- JSON: cookie / refresh_token, etc.
status TEXT DEFAULT 'disconnected', -- disconnected / ok / error
last_error TEXT,
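-- Whether to generate teasers/covers for this drive: 1 on / 0 off.
-- Whether to generate preview videos/covers for this drive: 1 on / 0 off.
-- Replaces the earlier global preview.enabled setting (the old setting row is kept but no longer read).
teaser_enabled INTEGER NOT NULL DEFAULT 1,
-- Set of directory IDs to skip during scans (JSON array of strings). Any directory matching one of them, together with its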
@@ -109,3 +109,227 @@ func TestRandomVideosExcluding(t *testing.T) {
t.Fatalf("limit 0 should return nil, got %v", got4)
}
}
func TestRandomVideosWithReadyThumbnailsExcluding(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() { _ = cat.Close() })
now := time.Now()
for i := 0; i < 4; i++ {
id := "ready-" + string(rune('a'+i))
if err := cat.UpsertVideo(ctx, &Video{
ID: id,
DriveID: "drive",
FileID: "f-" + id,
Title: id,
ThumbnailURL: "/p/thumb/" + id,
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed %s: %v", id, err)
}
}
for i := 0; i < 4; i++ {
id := "pending-" + string(rune('a'+i))
if err := cat.UpsertVideo(ctx, &Video{
ID: id,
DriveID: "drive",
FileID: "f-" + id,
Title: id,
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed %s: %v", id, err)
}
}
got, err := cat.RandomVideosWithReadyThumbnailsExcluding(ctx, []string{"ready-a"}, 10)
if err != nil {
t.Fatalf("random ready excluding: %v", err)
}
if len(got) != 3 {
t.Fatalf("ready random count = %d, want 3", len(got))
}
for _, v := range got {
if v.ID == "ready-a" {
t.Fatal("excluded ready video was returned")
}
if v.ThumbnailURL == "" {
t.Fatalf("pending video %q was returned", v.ID)
}
}
}
func TestRandomVideosForPreferredVideoChoosesLeastPopulatedTag(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() { _ = cat.Close() })
now := time.Now()
for _, v := range []*Video{
{ID: "current", DriveID: "drive", FileID: "f-current", Title: "current", Tags: []string{"common", "rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "common-1", DriveID: "drive", FileID: "f-common-1", Title: "common 1", Tags: []string{"common"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "common-2", DriveID: "drive", FileID: "f-common-2", Title: "common 2", Tags: []string{"common"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "rare-1", DriveID: "drive", FileID: "f-rare-1", Title: "rare 1", Tags: []string{"rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
tag, err := cat.LeastPopulatedVisibleUniqueTag(ctx, []string{"common", "rare"})
if err != nil {
t.Fatalf("least populated tag: %v", err)
}
if tag != "rare" {
t.Fatalf("least populated tag = %q, want rare", tag)
}
got, err := cat.RandomVideosForPreferredVideoExcluding(ctx, "current", []string{"current"}, 1)
if err != nil {
t.Fatalf("random preferred: %v", err)
}
if len(got) != 1 || got[0].ID != "rare-1" {
t.Fatalf("preferred result = %#v, want rare-1", videoIDs(got))
}
got, err = cat.RandomVideosForPreferredVideoExcluding(ctx, "current", nil, 1)
if err != nil {
t.Fatalf("random preferred without explicit exclude: %v", err)
}
if len(got) != 1 || got[0].ID == "current" {
t.Fatalf("preferred result without explicit exclude = %#v, should not return current", videoIDs(got))
}
}
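The tag preference exercised above can be sketched without a database. This is an assumption-laden illustration, not the body of `LeastPopulatedVisibleUniqueTag`: it counts visible videos per tag, ignores tags with no visible videos, and breaks ties alphabetically for determinism. The `taggedVideo` type and `leastPopulatedTag` function are hypothetical, and the sketch does not model the duplicate collapsing implied by "Unique" in the real function name.

```go
package main

import "fmt"

// taggedVideo carries only what the selection needs (hypothetical type).
type taggedVideo struct {
	ID     string
	Tags   []string
	Hidden bool
}

// leastPopulatedTag returns the candidate label attached to the fewest visible
// videos, or "" when none of the candidates is attached to a visible video.
func leastPopulatedTag(videos []taggedVideo, candidates []string) string {
	counts := make(map[string]int)
	for _, v := range videos {
		if v.Hidden {
			continue // hidden videos do not count toward a tag's population
		}
		for _, tag := range v.Tags {
			counts[tag]++
		}
	}
	best := ""
	for _, label := range candidates {
		n := counts[label]
		if n == 0 {
			continue
		}
		if best == "" || n < counts[best] || (n == counts[best] && label < best) {
			best = label
		}
	}
	return best
}

func main() {
	videos := []taggedVideo{
		{ID: "current", Tags: []string{"common", "rare"}},
		{ID: "common-1", Tags: []string{"common"}},
		{ID: "common-2", Tags: []string{"common"}},
		{ID: "rare-1", Tags: []string{"rare"}},
	}
	fmt.Println(leastPopulatedTag(videos, []string{"common", "rare"}))
}
```

Preferring the rarest tag means recommendations for a video tagged both "common" and "rare" surface the smaller pool first, which is what the test asserts with `rare-1`.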
func TestRandomVideosForPreferredVideoFallsBackToFillBatch(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() { _ = cat.Close() })
now := time.Now()
for _, v := range []*Video{
{ID: "current", DriveID: "drive", FileID: "f-current", Title: "current", Tags: []string{"common", "rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "common-1", DriveID: "drive", FileID: "f-common-1", Title: "common 1", Tags: []string{"common"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "common-2", DriveID: "drive", FileID: "f-common-2", Title: "common 2", Tags: []string{"common"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "rare-1", DriveID: "drive", FileID: "f-rare-1", Title: "rare 1", Tags: []string{"rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "hidden-rare", DriveID: "drive", FileID: "f-hidden-rare", Title: "hidden rare", Tags: []string{"rare"}, PublishedAt: now, CreatedAt: now, UpdatedAt: now},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
if err := cat.HideVideo(ctx, "hidden-rare"); err != nil {
t.Fatalf("hide hidden-rare: %v", err)
}
got, err := cat.RandomVideosForPreferredVideoExcluding(ctx, "current", []string{"current"}, 3)
if err != nil {
t.Fatalf("random preferred: %v", err)
}
ids := videoIDs(got)
if len(ids) != 3 {
t.Fatalf("result ids = %#v, want 3 items", ids)
}
for _, excluded := range []string{"current", "hidden-rare"} {
if hasVideoID(ids, excluded) {
t.Fatalf("result ids = %#v, should not include %s", ids, excluded)
}
}
if !hasVideoID(ids, "rare-1") {
t.Fatalf("result ids = %#v, want rare-1 from least populated tag", ids)
}
if len(uniqueVideoIDs(ids)) != len(ids) {
t.Fatalf("result ids = %#v, want no duplicates", ids)
}
}
func TestRandomVideosForPreferredVideoFallbacksWhenPreferenceUnavailable(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() { _ = cat.Close() })
now := time.Now()
for _, v := range []*Video{
{ID: "untagged", DriveID: "drive", FileID: "f-untagged", Title: "untagged", PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "visible-1", DriveID: "drive", FileID: "f-visible-1", Title: "visible 1", PublishedAt: now, CreatedAt: now, UpdatedAt: now},
{ID: "visible-2", DriveID: "drive", FileID: "f-visible-2", Title: "visible 2", PublishedAt: now, CreatedAt: now, UpdatedAt: now},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
got, err := cat.RandomVideosForPreferredVideoExcluding(ctx, "missing", []string{"untagged"}, 2)
if err != nil {
t.Fatalf("random missing preferred: %v", err)
}
if !sameVideoIDSet(videoIDs(got), []string{"visible-1", "visible-2"}) {
t.Fatalf("missing preferred ids = %#v, want visible fallback videos", videoIDs(got))
}
got, err = cat.RandomVideosForPreferredVideoExcluding(ctx, "untagged", []string{"untagged"}, 2)
if err != nil {
t.Fatalf("random untagged preferred: %v", err)
}
if !sameVideoIDSet(videoIDs(got), []string{"visible-1", "visible-2"}) {
t.Fatalf("untagged preferred ids = %#v, want visible fallback videos", videoIDs(got))
}
}
func videoIDs(videos []*Video) []string {
ids := make([]string, 0, len(videos))
for _, v := range videos {
ids = append(ids, v.ID)
}
return ids
}
func hasVideoID(ids []string, want string) bool {
for _, id := range ids {
if id == want {
return true
}
}
return false
}
func uniqueVideoIDs(ids []string) map[string]struct{} {
seen := make(map[string]struct{}, len(ids))
for _, id := range ids {
seen[id] = struct{}{}
}
return seen
}
func sameVideoIDSet(a, b []string) bool {
if len(a) != len(b) {
return false
}
seen := make(map[string]int, len(a))
for _, value := range a {
seen[value]++
}
for _, value := range b {
if seen[value] == 0 {
return false
}
seen[value]--
}
return true
}
@@ -17,6 +17,8 @@ import (
)
var ErrUnknownTag = errors.New("unknown tag")
var ErrSystemTag = errors.New("system tag cannot be deleted")
var ErrDeletedTag = errors.New("tag was previously deleted")
const avTagLabel = "AV"
@@ -43,6 +45,15 @@ func (c *Catalog) migrate(ctx context.Context) error {
if err := c.addColumnIfMissing(ctx, "videos", "content_hash", "TEXT DEFAULT ''"); err != nil {
return err
}
if err := c.addColumnIfMissing(ctx, "videos", "sampled_sha256", "TEXT DEFAULT ''"); err != nil {
return err
}
if err := c.addColumnIfMissing(ctx, "videos", "fingerprint_status", "TEXT DEFAULT 'pending'"); err != nil {
return err
}
if err := c.addColumnIfMissing(ctx, "videos", "fingerprint_error", "TEXT DEFAULT ''"); err != nil {
return err
}
if err := c.addColumnIfMissing(ctx, "videos", "file_name", "TEXT DEFAULT ''"); err != nil {
return err
}
@@ -52,10 +63,13 @@ func (c *Catalog) migrate(ctx context.Context) error {
if err := c.addColumnIfMissing(ctx, "videos", "thumbnail_status", "TEXT DEFAULT 'pending'"); err != nil {
return err
}
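// drives.teaser_enabled: per-drive teaser switch, replacing the old global preview.enabled.
if err := c.addColumnIfMissing(ctx, "videos", "thumbnail_failures", "INTEGER DEFAULT 0"); err != nil {
return err
}
// drives.teaser_enabled: per-drive preview-video switch, replacing the old global preview.enabled.
// Upgrade path: rely on the ALTER TABLE DEFAULT 1, so every existing drive is enabled by default,
// without reading the old settings.preview.enabled field. Even users who had turned the global switch off
// get every drive restored to "generate teasers by default" after upgrading, consistent with new drives.
// get every drive restored to "generate preview videos by default" after upgrading, consistent with new drives.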
if _, err := c.addColumnIfMissingReportNew(ctx, "drives", "teaser_enabled", "INTEGER NOT NULL DEFAULT 1"); err != nil {
return err
}
@@ -65,6 +79,21 @@ func (c *Catalog) migrate(ctx context.Context) error {
if err := c.addColumnIfMissing(ctx, "drives", "skip_dir_ids", "TEXT NOT NULL DEFAULT '[]'"); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `
CREATE TABLE IF NOT EXISTS deleted_videos (
id TEXT PRIMARY KEY,
drive_id TEXT NOT NULL DEFAULT '',
file_id TEXT NOT NULL DEFAULT '',
content_hash TEXT NOT NULL DEFAULT '',
file_name TEXT NOT NULL DEFAULT '',
size_bytes INTEGER NOT NULL DEFAULT 0,
deleted_at INTEGER NOT NULL
)`); err != nil {
return err
}
if err := c.syncDriveScanRootIDToRootID(ctx); err != nil {
return err
}
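// One-time fix: an early, short-lived version synced each existing drive's teaser_enabled to
// the old global preview.enabled value, leaving every drive disabled after upgrading. Under the "enabled by default" convention,
// all drives are force-reset to 1 once here and a marker setting is recorded, so that later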
@@ -83,12 +112,36 @@ func (c *Catalog) migrate(ctx context.Context) error {
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_content_hash ON videos(content_hash)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_content_hash_created ON videos(content_hash, created_at, id)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_sampled_sha256 ON videos(size_bytes, sampled_sha256)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_sampled_sha256_created ON videos(size_bytes, sampled_sha256, created_at, id)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_hidden ON videos(hidden)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_visible_pub ON videos(COALESCE(hidden, 0), published_at DESC)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_file_name_size ON videos(file_name, size_bytes)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_videos_file_name_size_created ON videos(file_name, size_bytes, created_at, id)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_deleted_videos_drive_file ON deleted_videos(drive_id, file_id)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_deleted_videos_drive_hash ON deleted_videos(drive_id, content_hash)`); err != nil {
return err
}
if _, err := c.db.ExecContext(ctx, `CREATE INDEX IF NOT EXISTS idx_deleted_videos_drive_signature ON deleted_videos(drive_id, file_name, size_bytes)`); err != nil {
return err
}
if err := c.seedSystemTags(ctx); err != nil {
return err
}
@@ -107,6 +160,12 @@ func (c *Catalog) migrate(ctx context.Context) error {
if err := c.clearVolatileOneDriveThumbnails(ctx); err != nil {
return err
}
if err := c.clearRemoteP123ThumbnailsOnce(ctx); err != nil {
return err
}
if err := c.clearRemoteNonSpider91Thumbnails(ctx); err != nil {
return err
}
if err := c.hideZeroSizeVideosFromKnownDrives(ctx); err != nil {
return err
}
@@ -155,7 +214,7 @@ func (c *Catalog) addColumnIfMissingReportNew(ctx context.Context, table, column
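// to 1 (enabled), but only if this migration has never run before (tracked by a marker setting).
//
// Why this is needed: a short-lived early version synced the old global preview.enabled = "0" to
// teaser_enabled = 0 on every drive; users reported that every drive showed "Teaser off" after upgrading. The new version
// teaser_enabled = 0 on every drive; users reported that every drive showed "Preview video off" after upgrading. The new version
// enables each drive by default, so a one-time fix runs here.
//
// Idempotency: once the marker setting is set the fix never runs again, so a drive the user turned off in the UI is not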
@@ -189,8 +248,9 @@ func (c *Catalog) resetDriveTeaserEnabledToDefaultOnce(ctx context.Context) erro
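// - administrators reading the field name intuitively would be misled
//
// Fix strategy:
// - thumbnail_url non-empty + status not 'ready' + status not 'failed' → set to 'ready'
// - thumbnail_url non-empty + status not 'ready' + status not 'failed' + status not 'skipped' → set to 'ready'
// - status='failed' is left alone (the worker marked the failure explicitly; keep it so administrators can regenerate manually)
// - status='skipped' is left alone (a cover exists but duration probing is unavailable; avoids re-queueing after restart)
//
// Idempotency: once the marker setting is written the fix never runs again, avoiding an update on every restart.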
func (c *Catalog) reconcileThumbnailStatusOnce(ctx context.Context) error {
@@ -207,7 +267,7 @@ UPDATE videos
SET thumbnail_status = 'ready',
updated_at = ?
WHERE COALESCE(thumbnail_url, '') != ''
AND COALESCE(thumbnail_status, 'pending') NOT IN ('ready', 'failed')
AND COALESCE(thumbnail_status, 'pending') NOT IN ('ready', 'failed', 'skipped')
`, time.Now().UnixMilli())
if err != nil {
return fmt.Errorf("reconcile thumbnail_status: %w", err)
@@ -236,6 +296,85 @@ UPDATE videos
return err
}
func (c *Catalog) clearRemoteP123ThumbnailsOnce(ctx context.Context) error {
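// Thumbnails returned by 123pan (123网盘) listings are unsuitable as site covers in both size and stability; clear the previously stored
// remote URLs so the cover worker generates local /p/thumb/<id> covers by extracting frames from the video's direct link.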
const markerKey = "videos.p123.remote_thumbnails_cleared"
marker, err := c.GetSetting(ctx, markerKey, "")
if err != nil {
return fmt.Errorf("read %s marker: %w", markerKey, err)
}
if strings.TrimSpace(marker) == "1" {
return nil
}
var p123Drives int
if err := c.db.QueryRowContext(ctx, `SELECT COUNT(*) FROM drives WHERE kind = 'p123'`).Scan(&p123Drives); err != nil {
return fmt.Errorf("count p123 drives: %w", err)
}
if p123Drives == 0 {
return nil
}
res, err := c.db.ExecContext(ctx, `
UPDATE videos
SET thumbnail_url = '',
thumbnail_status = 'pending',
thumbnail_failures = 0,
updated_at = ?
WHERE EXISTS (
SELECT 1
FROM drives
WHERE drives.id = videos.drive_id
AND drives.kind = 'p123'
)
AND (
lower(COALESCE(thumbnail_url, '')) LIKE 'http://%'
OR lower(COALESCE(thumbnail_url, '')) LIKE 'https://%'
)
`, time.Now().UnixMilli())
if err != nil {
return err
}
if affected, err := res.RowsAffected(); err == nil && affected > 0 {
log.Printf("[catalog] cleared %d remote 123pan thumbnail(s) for local regeneration", affected)
}
if err := c.SetSetting(ctx, markerKey, "1"); err != nil {
return fmt.Errorf("write %s marker: %w", markerKey, err)
}
return nil
}
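The marker-guarded, run-once pattern used by `clearRemoteP123ThumbnailsOnce` can be reduced to a small helper. The sketch below is a simplification under stated assumptions: `settings` is an in-memory stand-in for the catalog's settings store, `runOnce` is a hypothetical name, and the marker is written only after the migration step succeeds, so a failed step is retried on the next start.

```go
package main

import "fmt"

// settings is a hypothetical in-memory stand-in for the catalog's settings table.
type settings map[string]string

// runOnce runs migrate unless markerKey is already "1". It records the marker
// only after migrate succeeds and reports whether migrate ran in this call.
func runOnce(s settings, markerKey string, migrate func() error) (bool, error) {
	if s[markerKey] == "1" {
		return false, nil // already applied on an earlier start
	}
	if err := migrate(); err != nil {
		return false, err // no marker written: retried on the next start
	}
	s[markerKey] = "1"
	return true, nil
}

func main() {
	s := settings{}
	calls := 0
	migrate := func() error { calls++; return nil }
	for i := 0; i < 3; i++ {
		ran, err := runOnce(s, "videos.p123.remote_thumbnails_cleared", migrate)
		fmt.Println(ran, err)
	}
	fmt.Println("migrate calls:", calls)
}
```

One difference from the function above is worth noting: the original also returns before writing the marker when no `p123` drive exists, so the cleanup can still run once such a drive is added later.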
func (c *Catalog) clearRemoteNonSpider91Thumbnails(ctx context.Context) error {
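// Non-91Spider videos no longer use remote thumbnails returned by the drive. After the historical http/https
// thumbnail_url values are cleared, the cover worker regenerates a local /p/thumb/<id> from a middle frame of the video.
// 91Spider covers are downloaded by the crawler and stored locally at /p/thumb/<id>, so this rule does not affect them.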
res, err := c.db.ExecContext(ctx, `
UPDATE videos
SET thumbnail_url = '',
thumbnail_status = 'pending',
thumbnail_failures = 0,
updated_at = ?
WHERE (
lower(COALESCE(thumbnail_url, '')) LIKE 'http://%'
OR lower(COALESCE(thumbnail_url, '')) LIKE 'https://%'
)
AND NOT EXISTS (
SELECT 1
FROM drives
WHERE drives.id = videos.drive_id
AND drives.kind = 'spider91'
)
`, time.Now().UnixMilli())
if err != nil {
return err
}
if affected, err := res.RowsAffected(); err == nil && affected > 0 {
log.Printf("[catalog] cleared %d remote non-91Spider thumbnail(s) for local regeneration", affected)
}
return nil
}
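Both cleanup queries above select rows whose lower-cased `thumbnail_url` starts with `http://` or `https://`. For callers that need the same check outside SQL, an equivalent Go predicate might look like the sketch below; `isRemoteThumbnail` is a hypothetical helper and does not appear in the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// isRemoteThumbnail reports whether a thumbnail URL points at a remote host
// rather than a locally generated /p/thumb/<id> path. It mirrors the SQL
// condition lower(thumbnail_url) LIKE 'http://%' OR ... LIKE 'https://%'.
func isRemoteThumbnail(url string) bool {
	u := strings.ToLower(url)
	return strings.HasPrefix(u, "http://") || strings.HasPrefix(u, "https://")
}

func main() {
	for _, u := range []string{
		"https://thumb.example/file-c.jpg",
		"HTTP://thumb.example/file-d.jpg",
		"/p/thumb/drive-a-canonical",
		"",
	} {
		fmt.Printf("%q remote=%v\n", u, isRemoteThumbnail(u))
	}
}
```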
func (c *Catalog) hideZeroSizeVideosFromKnownDrives(ctx context.Context) error {
_, err := c.db.ExecContext(ctx, `
UPDATE videos
@@ -281,7 +420,15 @@ func (c *Catalog) classifySystemTags(ctx context.Context) error {
}
func (c *Catalog) backfillVideoTags(ctx context.Context) error {
rows, err := c.db.QueryContext(ctx, `SELECT id, COALESCE(tags, '[]') FROM videos`)
rows, err := c.db.QueryContext(ctx, `
SELECT id, COALESCE(tags, '[]')
FROM videos
WHERE COALESCE(tags, '') NOT IN ('', '[]', 'null')
AND NOT EXISTS (
SELECT 1
FROM video_tags vt
WHERE vt.video_id = videos.id
)`)
if err != nil {
return err
}
@@ -298,11 +445,14 @@ func (c *Catalog) backfillVideoTags(ctx context.Context) error {
if len(labels) == 0 {
continue
}
if err := c.addVideoTags(ctx, videoID, labels, "legacy", true); err != nil {
added, err := c.addVideoTags(ctx, videoID, labels, "legacy", true)
if err != nil {
return err
}
if err := c.syncVideoTagsJSON(ctx, videoID, false); err != nil {
return err
if added {
if err := c.syncVideoTagsJSON(ctx, videoID, false); err != nil {
return err
}
}
}
return nil
@@ -350,6 +500,9 @@ GROUP BY category`)
if !LooksLikeCollectionTag(stat.category) {
continue
}
if c.tagDeleted(ctx, stat.category) {
continue
}
if _, err := c.ensureTag(ctx, stat.category, nil, "collection"); err != nil {
return err
}
@@ -368,12 +521,178 @@ func (c *Catalog) CreateTagAndClassify(ctx context.Context, label string, aliase
return c.classifyTag(ctx, tag)
}
func (c *Catalog) EnsureTagForVideoIDPrefix(ctx context.Context, prefix, label string, aliases []string, source string) (int, error) {
prefix = strings.TrimSpace(prefix)
if prefix == "" {
return 0, errors.New("video id prefix is required")
}
tag, err := c.ensureTag(ctx, label, aliases, source)
if err != nil {
return 0, err
}
rows, err := c.db.QueryContext(ctx, `
SELECT v.id
FROM videos v
WHERE v.id LIKE ? || '%'
AND COALESCE(v.tags_manual, 0) = 0
AND NOT EXISTS (
SELECT 1
FROM video_tags vt
WHERE vt.video_id = v.id
AND vt.tag_id = ?
)
ORDER BY v.id ASC`, prefix, tag.ID)
if err != nil {
return 0, err
}
var videoIDs []string
for rows.Next() {
var videoID string
if err := rows.Scan(&videoID); err != nil {
rows.Close()
return 0, err
}
videoIDs = append(videoIDs, videoID)
}
if err := rows.Err(); err != nil {
rows.Close()
return 0, err
}
if err := rows.Close(); err != nil {
return 0, err
}
for _, videoID := range videoIDs {
if err := c.insertVideoTag(ctx, videoID, tag.ID, "auto"); err != nil {
return 0, err
}
if err := c.syncVideoTagsJSON(ctx, videoID, false); err != nil {
return 0, err
}
}
return len(videoIDs), nil
}
func (c *Catalog) DeleteTag(ctx context.Context, tagID int64) (int, error) {
tx, err := c.db.BeginTx(ctx, nil)
if err != nil {
return 0, err
}
defer tx.Rollback()
tag, err := c.getTagByIDTx(ctx, tx, tagID)
if err != nil {
return 0, err
}
if tag.Source == "system" {
return 0, ErrSystemTag
}
rows, err := tx.QueryContext(ctx, `SELECT video_id FROM video_tags WHERE tag_id = ?`, tagID)
if err != nil {
return 0, err
}
var videoIDs []string
for rows.Next() {
var videoID string
if err := rows.Scan(&videoID); err != nil {
rows.Close()
return 0, err
}
videoIDs = append(videoIDs, videoID)
}
if err := rows.Err(); err != nil {
rows.Close()
return 0, err
}
if err := rows.Close(); err != nil {
return 0, err
}
if _, err := tx.ExecContext(ctx, `DELETE FROM video_tags WHERE tag_id = ?`, tagID); err != nil {
return 0, err
}
if _, err := tx.ExecContext(ctx, `DELETE FROM tags WHERE id = ?`, tagID); err != nil {
return 0, err
}
if err := markDeletedTagTx(ctx, tx, tag); err != nil {
return 0, err
}
for _, videoID := range videoIDs {
manual := hasManualTagsTx(ctx, tx, videoID)
if err := syncVideoTagsJSONTx(ctx, tx, videoID, manual); err != nil {
return 0, err
}
}
if err := tx.Commit(); err != nil {
return 0, err
}
return len(videoIDs), nil
}
func (c *Catalog) ListTags(ctx context.Context) ([]Tag, error) {
rows, err := c.db.QueryContext(ctx, `
SELECT t.id, t.label, t.aliases, t.source, COUNT(v.id) AS cnt
WITH tagged_tags AS (
SELECT vt.tag_id,
tagged.id,
COALESCE(tagged.content_hash, '') AS content_hash,
COALESCE(tagged.sampled_sha256, '') AS sampled_sha256,
tagged.size_bytes,
COALESCE(tagged.file_name, '') AS file_name
FROM video_tags vt
JOIN videos tagged ON tagged.id = vt.video_id
WHERE COALESCE(tagged.hidden, 0) = 0
),
tag_candidates AS (
SELECT tag_id, id AS video_id
FROM tagged_tags
UNION ALL
SELECT tag_id,
(SELECT canonical.id
FROM videos canonical
WHERE tagged_tags.content_hash != ''
AND canonical.content_hash = tagged_tags.content_hash
AND COALESCE(canonical.content_hash, '') != ''
ORDER BY canonical.created_at ASC, canonical.id ASC
LIMIT 1) AS video_id
FROM tagged_tags
WHERE content_hash != ''
UNION ALL
SELECT tag_id,
(SELECT canonical.id
FROM videos canonical
WHERE tagged_tags.sampled_sha256 != ''
AND tagged_tags.size_bytes > 0
AND canonical.sampled_sha256 = tagged_tags.sampled_sha256
AND canonical.size_bytes = tagged_tags.size_bytes
AND COALESCE(canonical.sampled_sha256, '') != ''
AND canonical.size_bytes > 0
ORDER BY canonical.created_at ASC, canonical.id ASC
LIMIT 1) AS video_id
FROM tagged_tags
WHERE sampled_sha256 != '' AND size_bytes > 0
UNION ALL
SELECT tag_id,
(SELECT canonical.id
FROM videos canonical
WHERE tagged_tags.file_name != ''
AND tagged_tags.size_bytes > 0
AND canonical.file_name = tagged_tags.file_name
AND canonical.size_bytes = tagged_tags.size_bytes
AND COALESCE(canonical.file_name, '') != ''
AND canonical.size_bytes > 0
ORDER BY canonical.created_at ASC, canonical.id ASC
LIMIT 1) AS video_id
FROM tagged_tags
WHERE file_name != '' AND size_bytes > 0
)
SELECT t.id, t.label, t.aliases, t.source, COUNT(DISTINCT videos.id) AS cnt
FROM tags t
LEFT JOIN video_tags vt ON vt.tag_id = t.id
LEFT JOIN videos v ON v.id = vt.video_id AND COALESCE(v.hidden, 0) = 0
LEFT JOIN tag_candidates tc ON tc.tag_id = t.id AND tc.video_id IS NOT NULL
LEFT JOIN videos ON videos.id = tc.video_id
AND COALESCE(videos.hidden, 0) = 0
AND `+uniqueVideoWhereSQL+`
GROUP BY t.id, t.label, t.aliases, t.source
ORDER BY cnt DESC, t.label ASC`)
if err != nil {
@@ -391,6 +710,66 @@ ORDER BY cnt DESC, t.label ASC`)
return out, nil
}
func videoMatchesTagLabelSQL(videoAlias string) string {
return fmt.Sprintf(`%s.id IN (
WITH tagged_videos AS (
SELECT tagged.id,
COALESCE(tagged.content_hash, '') AS content_hash,
COALESCE(tagged.sampled_sha256, '') AS sampled_sha256,
tagged.size_bytes,
COALESCE(tagged.file_name, '') AS file_name
FROM video_tags vt
JOIN tags tag_filter ON tag_filter.id = vt.tag_id
JOIN videos tagged ON tagged.id = vt.video_id
WHERE tag_filter.label = ? COLLATE NOCASE
AND COALESCE(tagged.hidden, 0) = 0
),
tag_candidates AS (
SELECT id AS video_id
FROM tagged_videos
UNION ALL
SELECT (SELECT canonical.id
FROM videos canonical
WHERE tagged_videos.content_hash != ''
AND canonical.content_hash = tagged_videos.content_hash
AND COALESCE(canonical.content_hash, '') != ''
ORDER BY canonical.created_at ASC, canonical.id ASC
LIMIT 1) AS video_id
FROM tagged_videos
WHERE content_hash != ''
UNION ALL
SELECT (SELECT canonical.id
FROM videos canonical
WHERE tagged_videos.sampled_sha256 != ''
AND tagged_videos.size_bytes > 0
AND canonical.sampled_sha256 = tagged_videos.sampled_sha256
AND canonical.size_bytes = tagged_videos.size_bytes
AND COALESCE(canonical.sampled_sha256, '') != ''
AND canonical.size_bytes > 0
ORDER BY canonical.created_at ASC, canonical.id ASC
LIMIT 1) AS video_id
FROM tagged_videos
WHERE sampled_sha256 != '' AND size_bytes > 0
UNION ALL
SELECT (SELECT canonical.id
FROM videos canonical
WHERE tagged_videos.file_name != ''
AND tagged_videos.size_bytes > 0
AND canonical.file_name = tagged_videos.file_name
AND canonical.size_bytes = tagged_videos.size_bytes
AND COALESCE(canonical.file_name, '') != ''
AND canonical.size_bytes > 0
ORDER BY canonical.created_at ASC, canonical.id ASC
LIMIT 1) AS video_id
FROM tagged_videos
WHERE file_name != '' AND size_bytes > 0
)
SELECT video_id
FROM tag_candidates
WHERE video_id IS NOT NULL
)`, videoAlias)
}
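The canonical-duplicate rule used by the subqueries above (among rows sharing a fingerprint, pick the one with the earliest `created_at`, breaking ties by `id`) can be sketched in memory as follows. This is an illustrative analogue only, not the project's code: the `video` struct and `canonicalID` helper are hypothetical names, and only the `sampled_sha256` + `size_bytes` branch is modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// video is a hypothetical, trimmed-down stand-in for a catalog row.
type video struct {
	ID        string
	Hash      string // plays the role of sampled_sha256
	Size      int64  // plays the role of size_bytes
	CreatedAt int64
}

// canonicalID returns the ID of the earliest video that shares v's
// non-empty hash and positive size, or "" when v has no usable fingerprint.
func canonicalID(all []video, v video) string {
	if v.Hash == "" || v.Size <= 0 {
		return ""
	}
	var matches []video
	for _, c := range all {
		if c.Hash == v.Hash && c.Size == v.Size {
			matches = append(matches, c)
		}
	}
	if len(matches) == 0 {
		return ""
	}
	// Same ordering as the SQL: created_at ASC, then id ASC.
	sort.Slice(matches, func(i, j int) bool {
		if matches[i].CreatedAt != matches[j].CreatedAt {
			return matches[i].CreatedAt < matches[j].CreatedAt
		}
		return matches[i].ID < matches[j].ID
	})
	return matches[0].ID
}

func main() {
	all := []video{
		{ID: "pikpak-canonical", Hash: "same", Size: 1024, CreatedAt: 1},
		{ID: "spider91-dup-1", Hash: "same", Size: 1024, CreatedAt: 2},
		{ID: "spider91-visible", Hash: "unique", Size: 2048, CreatedAt: 3},
	}
	for _, v := range all {
		fmt.Printf("%s -> %s\n", v.ID, canonicalID(all, v))
	}
}
```

A tag attached to a later duplicate therefore resolves to the earliest copy, which appears to be what `TestTagFilterMatchesCanonicalDuplicateVideo` asserts for the SQL version.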
func (c *Catalog) SetManualVideoTags(ctx context.Context, videoID string, labels []string) error {
if _, err := c.GetVideo(ctx, videoID); err != nil {
return err
@@ -441,6 +820,9 @@ func (c *Catalog) EnsureCollectionTag(ctx context.Context, label string) (string
if !LooksLikeCollectionTag(label) {
return "", false, nil
}
if c.tagDeleted(ctx, label) {
return "", false, nil
}
if !c.tagExists(ctx, label) {
count, err := c.categoryVideoCount(ctx, label)
if err != nil {
@@ -472,6 +854,14 @@ func (c *Catalog) ensureTag(ctx context.Context, label string, aliases []string,
if source == "" {
source = "user"
}
if source != "system" && source != "user" && c.tagDeleted(ctx, label) {
return Tag{}, ErrDeletedTag
}
if source == "system" || source == "user" {
if err := c.restoreDeletedTag(ctx, label); err != nil {
return Tag{}, err
}
}
aliases = cleanAliases(aliases, label)
aliasesJSON, _ := json.Marshal(aliases)
now := time.Now().UnixMilli()
@@ -498,6 +888,10 @@ func (c *Catalog) getTagByLabel(ctx context.Context, label string) (Tag, error)
}
func (c *Catalog) classifyTag(ctx context.Context, tag Tag) (int, error) {
existingIDs, err := c.videoIDSetForTagID(ctx, tag.ID)
if err != nil {
return 0, err
}
rows, err := c.db.QueryContext(ctx, `
SELECT id, title, COALESCE(author, ''), COALESCE(category, ''), COALESCE(tags_manual, 0)
FROM videos`)
@@ -529,13 +923,14 @@ FROM videos`)
continue
}
}
added, err := c.addVideoTag(ctx, videoID, tag.Label, "auto", false)
if err != nil {
if existingIDs[videoID] {
continue
}
if err := c.insertVideoTag(ctx, videoID, tag.ID, "auto"); err != nil {
return 0, err
}
if added {
classified++
}
existingIDs[videoID] = true
classified++
if err := c.syncVideoTagsJSON(ctx, videoID, false); err != nil {
return 0, err
}
@@ -545,9 +940,15 @@ FROM videos`)
func (c *Catalog) replaceVideoTags(ctx context.Context, videoID string, labels []string, source string, manual bool, createMissing bool) error {
labels = uniqueStrings(cleanLabels(labels))
if source != "manual" {
labels = c.filterDeletedTagLabels(ctx, labels)
}
if createMissing {
for _, label := range labels {
if _, err := c.ensureTag(ctx, label, nil, "legacy"); err != nil {
if errors.Is(err, ErrDeletedTag) {
continue
}
return err
}
}
@@ -589,18 +990,33 @@ func (c *Catalog) replaceVideoTags(ctx context.Context, videoID string, labels [
return c.syncVideoTagsJSON(ctx, videoID, manual)
}
func (c *Catalog) addVideoTags(ctx context.Context, videoID string, labels []string, source string, createMissing bool) error {
for _, label := range uniqueStrings(cleanLabels(labels)) {
if _, err := c.addVideoTag(ctx, videoID, label, source, createMissing); err != nil {
return err
func (c *Catalog) addVideoTags(ctx context.Context, videoID string, labels []string, source string, createMissing bool) (bool, error) {
labels = uniqueStrings(cleanLabels(labels))
if source != "manual" {
labels = c.filterDeletedTagLabels(ctx, labels)
}
changed := false
for _, label := range labels {
added, err := c.addVideoTag(ctx, videoID, label, source, createMissing)
if err != nil {
return false, err
}
if added {
changed = true
}
}
return nil
return changed, nil
}
func (c *Catalog) addVideoTag(ctx context.Context, videoID, label, source string, createMissing bool) (bool, error) {
if source != "manual" && c.tagDeleted(ctx, label) {
return false, nil
}
if createMissing {
if _, err := c.ensureTag(ctx, label, nil, "legacy"); err != nil {
if errors.Is(err, ErrDeletedTag) {
return false, nil
}
return false, err
}
}
@@ -619,12 +1035,33 @@ func (c *Catalog) addVideoTag(ctx context.Context, videoID, label, source string
return n > 0, nil
}
func (c *Catalog) insertVideoTag(ctx context.Context, videoID string, tagID int64, source string) error {
_, err := c.db.ExecContext(ctx,
`INSERT OR IGNORE INTO video_tags (video_id, tag_id, source, created_at) VALUES (?, ?, ?, ?)`,
videoID, tagID, source, time.Now().UnixMilli())
return err
}
func (c *Catalog) addCollectionTagToVideos(ctx context.Context, category string) error {
return c.addTagToVideosByCategory(ctx, category, category, "auto")
}
func (c *Catalog) addTagToVideosByCategory(ctx context.Context, category, label, source string) error {
rows, err := c.db.QueryContext(ctx, `SELECT id FROM videos WHERE category = ? AND COALESCE(tags_manual, 0) = 0`, category)
tag, err := c.getTagByLabel(ctx, label)
if err != nil {
return err
}
rows, err := c.db.QueryContext(ctx, `
SELECT v.id
FROM videos v
WHERE v.category = ?
AND COALESCE(v.tags_manual, 0) = 0
AND NOT EXISTS (
SELECT 1
FROM video_tags vt
WHERE vt.video_id = v.id
AND vt.tag_id = ?
)`, category, tag.ID)
if err != nil {
return err
}
@@ -643,7 +1080,7 @@ func (c *Catalog) addTagToVideosByCategory(ctx context.Context, category, label,
return err
}
for _, videoID := range videoIDs {
if _, err := c.addVideoTag(ctx, videoID, label, source, false); err != nil {
if err := c.insertVideoTag(ctx, videoID, tag.ID, source); err != nil {
return err
}
if err := c.syncVideoTagsJSON(ctx, videoID, false); err != nil {
@@ -727,6 +1164,23 @@ func (c *Catalog) videoIDsForTagID(ctx context.Context, tagID int64) ([]string,
return videoIDs, rows.Err()
}
func (c *Catalog) videoIDSetForTagID(ctx context.Context, tagID int64) (map[string]bool, error) {
rows, err := c.db.QueryContext(ctx, `SELECT video_id FROM video_tags WHERE tag_id = ?`, tagID)
if err != nil {
return nil, err
}
defer rows.Close()
out := map[string]bool{}
for rows.Next() {
var videoID string
if err := rows.Scan(&videoID); err != nil {
return nil, err
}
out[videoID] = true
}
return out, rows.Err()
}
func (c *Catalog) validateTagsExist(ctx context.Context, labels []string) error {
for _, label := range labels {
if _, err := c.getTagByLabel(ctx, label); err != nil {
@@ -792,6 +1246,39 @@ func (c *Catalog) tagExists(ctx context.Context, label string) bool {
return err == nil
}
func (c *Catalog) tagDeleted(ctx context.Context, label string) bool {
label = cleanTagLabel(label)
if label == "" {
return false
}
var exists int
err := c.db.QueryRowContext(ctx, `SELECT 1 FROM deleted_tags WHERE label = ? COLLATE NOCASE`, label).Scan(&exists)
return err == nil
}
func (c *Catalog) filterDeletedTagLabels(ctx context.Context, labels []string) []string {
if len(labels) == 0 {
return labels
}
out := labels[:0]
for _, label := range labels {
if c.tagDeleted(ctx, label) {
continue
}
out = append(out, label)
}
return out
}
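`filterDeletedTagLabels` filters with `labels[:0]`, so the result reuses the input's backing array. A minimal sketch of that idiom and its aliasing side effect, assuming a plain in-memory set in place of the `deleted_tags` lookup (`filterInPlace` and the `deleted` map are hypothetical names for illustration):

```go
package main

import "fmt"

// filterInPlace keeps the elements for which keep returns true.
// It writes into the backing array of in, so no new slice is allocated.
func filterInPlace(in []string, keep func(string) bool) []string {
	out := in[:0]
	for _, s := range in {
		if keep(s) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Assumption: a map stands in for the case-insensitive SQL lookup.
	deleted := map[string]bool{"sunny": true}
	labels := []string{"sunny", "AV", "legacy-tag"}

	kept := filterInPlace(labels, func(s string) bool { return !deleted[s] })

	fmt.Println(kept)   // filtered view
	fmt.Println(labels) // original slice, now partly overwritten
}
```

Because the original slice is overwritten, callers should keep using only the returned slice, as the call sites in this diff do by reassigning `labels`.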
func (c *Catalog) restoreDeletedTag(ctx context.Context, label string) error {
label = cleanTagLabel(label)
if label == "" {
return nil
}
_, err := c.db.ExecContext(ctx, `DELETE FROM deleted_tags WHERE label = ? COLLATE NOCASE`, label)
return err
}
func (c *Catalog) categoryVideoCount(ctx context.Context, category string) (int, error) {
var count int
err := c.db.QueryRowContext(ctx, `SELECT COUNT(*) FROM videos WHERE category = ?`, category).Scan(&count)
@@ -805,6 +1292,71 @@ func (c *Catalog) getTagByLabelTx(ctx context.Context, tx *sql.Tx, label string)
return scanTag(row)
}
func (c *Catalog) getTagByIDTx(ctx context.Context, tx *sql.Tx, id int64) (Tag, error) {
row := tx.QueryRowContext(ctx,
`SELECT id, label, aliases, source, 0 FROM tags WHERE id = ?`,
id)
return scanTag(row)
}
func hasManualTagsTx(ctx context.Context, tx *sql.Tx, videoID string) bool {
var manual int
err := tx.QueryRowContext(ctx, `SELECT COALESCE(tags_manual, 0) FROM videos WHERE id = ?`, videoID).Scan(&manual)
return err == nil && manual == 1
}
func markDeletedTagTx(ctx context.Context, tx *sql.Tx, tag Tag) error {
label := cleanTagLabel(tag.Label)
if label == "" {
return nil
}
now := time.Now().UnixMilli()
_, err := tx.ExecContext(ctx, `
INSERT INTO deleted_tags (label, source, deleted_at)
VALUES (?, ?, ?)
ON CONFLICT(label) DO UPDATE SET
source = excluded.source,
deleted_at = excluded.deleted_at`, label, tag.Source, now)
return err
}
func syncVideoTagsJSONTx(ctx context.Context, tx *sql.Tx, videoID string, manual bool) error {
rows, err := tx.QueryContext(ctx, `
SELECT t.label
FROM video_tags vt
JOIN tags t ON t.id = vt.tag_id
WHERE vt.video_id = ?
ORDER BY t.id ASC`, videoID)
if err != nil {
return err
}
var labels []string
for rows.Next() {
var label string
if err := rows.Scan(&label); err != nil {
rows.Close()
return err
}
labels = append(labels, label)
}
if err := rows.Err(); err != nil {
rows.Close()
return err
}
if err := rows.Close(); err != nil {
return err
}
labelsJSON, _ := json.Marshal(labels)
manualValue := 0
if manual {
manualValue = 1
}
_, err = tx.ExecContext(ctx,
`UPDATE videos SET tags = ?, tags_manual = ?, updated_at = ? WHERE id = ?`,
string(labelsJSON), manualValue, time.Now().UnixMilli(), videoID)
return err
}
type tagRowScanner interface {
Scan(dest ...any) error
}
+694 -8
@@ -3,10 +3,121 @@ package catalog
import (
"context"
"database/sql"
"errors"
"testing"
"time"
)
func TestListVideosNeedingThumbnailIncludesExistingThumbnailMissingDuration(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
videos := []*Video{
{
ID: "duration-only",
DriveID: "drive",
FileID: "file-duration-only",
Title: "Duration Only",
ThumbnailURL: "/p/thumb/duration-only",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
},
{
ID: "complete",
DriveID: "drive",
FileID: "file-complete",
Title: "Complete",
DurationSeconds: 12,
ThumbnailURL: "/p/thumb/complete",
PublishedAt: now.Add(time.Second),
CreatedAt: now.Add(time.Second),
UpdatedAt: now.Add(time.Second),
},
{
ID: "missing-thumb",
DriveID: "drive",
FileID: "file-missing-thumb",
Title: "Missing Thumb",
DurationSeconds: 18,
PublishedAt: now.Add(2 * time.Second),
CreatedAt: now.Add(2 * time.Second),
UpdatedAt: now.Add(2 * time.Second),
},
{
ID: "failed",
DriveID: "drive",
FileID: "file-failed",
Title: "Failed",
PublishedAt: now.Add(3 * time.Second),
CreatedAt: now.Add(3 * time.Second),
UpdatedAt: now.Add(3 * time.Second),
},
}
for _, v := range videos {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
if err := cat.UpdateVideoMeta(ctx, "failed", VideoMetaPatch{ThumbnailStatus: "failed"}); err != nil {
t.Fatalf("mark failed thumbnail: %v", err)
}
items, err := cat.ListVideosNeedingThumbnail(ctx, "drive", 0)
if err != nil {
t.Fatalf("list videos needing thumbnail: %v", err)
}
if len(items) != 2 {
t.Fatalf("items = %#v, want duration-only and missing-thumb", items)
}
if items[0].ID != "duration-only" || items[1].ID != "missing-thumb" {
t.Fatalf("item ids = %q, %q; want duration-only, missing-thumb", items[0].ID, items[1].ID)
}
count, err := cat.CountVideosNeedingThumbnail(ctx, "drive")
if err != nil {
t.Fatalf("count videos needing thumbnail: %v", err)
}
if count != 2 {
t.Fatalf("count = %d, want 2", count)
}
counts, err := cat.CountThumbnailsByDrive(ctx)
if err != nil {
t.Fatalf("count thumbnails by drive: %v", err)
}
if got := counts["drive"]; got.Ready != 2 || got.Pending != 1 || got.Failed != 1 || got.DurationPending != 1 {
t.Fatalf("thumbnail counts = %#v, want ready=2 pending=1 failed=1 durationPending=1", got)
}
if err := cat.UpdateVideoMeta(ctx, "duration-only", VideoMetaPatch{ThumbnailStatus: "skipped"}); err != nil {
t.Fatalf("mark duration-only skipped: %v", err)
}
count, err = cat.CountVideosNeedingThumbnail(ctx, "drive")
if err != nil {
t.Fatalf("count videos needing thumbnail after skip: %v", err)
}
if count != 1 {
t.Fatalf("count after skip = %d, want 1", count)
}
counts, err = cat.CountThumbnailsByDrive(ctx)
if err != nil {
t.Fatalf("count thumbnails by drive after skip: %v", err)
}
if got := counts["drive"]; got.Ready != 2 || got.Pending != 1 || got.Failed != 1 || got.DurationPending != 0 {
t.Fatalf("thumbnail counts after skip = %#v, want ready=2 pending=1 failed=1 durationPending=0", got)
}
}
func TestCreateTagAndClassifyAddsTagToMatchingExistingVideos(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
@@ -70,6 +181,242 @@ func TestCreateTagAndClassifyAddsTagToMatchingExistingVideos(t *testing.T) {
}
}
func TestDeleteTagRemovesTagFromVideos(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
if err := cat.UpsertVideo(ctx, &Video{
ID: "video-1",
DriveID: "drive",
FileID: "file-1",
Title: "清纯短发",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed video: %v", err)
}
if _, err := cat.CreateTagAndClassify(ctx, "清纯", nil, "user"); err != nil {
t.Fatalf("create tag: %v", err)
}
tag := mustTagByLabel(t, ctx, cat, "清纯")
removed, err := cat.DeleteTag(ctx, tag.ID)
if err != nil {
t.Fatalf("delete tag: %v", err)
}
if removed != 1 {
t.Fatalf("removed = %d, want 1", removed)
}
got, err := cat.GetVideo(ctx, "video-1")
if err != nil {
t.Fatalf("get video: %v", err)
}
if len(got.Tags) != 0 {
t.Fatalf("video tags = %#v, want none", got.Tags)
}
for _, tag := range mustListTags(t, ctx, cat) {
if tag.Label == "清纯" {
t.Fatal("deleted tag still appears in ListTags")
}
}
}
func TestDeleteTagSuppressesAutomaticCollectionRecreation(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, id := range []string{"video-1", "video-2"} {
if err := cat.UpsertVideo(ctx, &Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: "合集视频",
Category: "sunny",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed video %s: %v", id, err)
}
}
if label, ok, err := cat.EnsureCollectionTag(ctx, "sunny"); err != nil || !ok || label != "sunny" {
t.Fatalf("ensure collection = %q, %v, %v; want sunny true nil", label, ok, err)
}
tag := mustTagByLabel(t, ctx, cat, "sunny")
if _, err := cat.DeleteTag(ctx, tag.ID); err != nil {
t.Fatalf("delete tag: %v", err)
}
if label, ok, err := cat.EnsureCollectionTag(ctx, "sunny"); err != nil || ok || label != "" {
t.Fatalf("ensure deleted collection = %q, %v, %v; want empty false nil", label, ok, err)
}
for _, tag := range mustListTags(t, ctx, cat) {
if tag.Label == "sunny" {
t.Fatal("deleted collection tag was recreated automatically")
}
}
}
func TestCreateTagAndClassifyRestoresDeletedTag(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
if err := cat.UpsertVideo(ctx, &Video{
ID: "video-1",
DriveID: "drive",
FileID: "file-1",
Title: "清纯短发",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed video: %v", err)
}
if _, err := cat.CreateTagAndClassify(ctx, "清纯", nil, "user"); err != nil {
t.Fatalf("create tag: %v", err)
}
tag := mustTagByLabel(t, ctx, cat, "清纯")
if _, err := cat.DeleteTag(ctx, tag.ID); err != nil {
t.Fatalf("delete tag: %v", err)
}
classified, err := cat.CreateTagAndClassify(ctx, "清纯", nil, "user")
if err != nil {
t.Fatalf("recreate tag: %v", err)
}
if classified != 1 {
t.Fatalf("classified = %d, want 1", classified)
}
got, err := cat.GetVideo(ctx, "video-1")
if err != nil {
t.Fatalf("get video: %v", err)
}
if !sameStrings(got.Tags, []string{"清纯"}) {
t.Fatalf("video tags = %#v, want 清纯", got.Tags)
}
}
func TestEnsureTagForVideoIDPrefixBackfillsSourceTag(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, seed := range []struct {
id string
manual bool
}{
{id: "spider91-91-spider-1200001"},
{id: "spider91-91-spider-1200002", manual: true},
{id: "spider91-other-1200003"},
} {
if err := cat.UpsertVideo(ctx, &Video{
ID: seed.id,
DriveID: "91-spider",
FileID: seed.id + ".mp4",
Title: "legacy title without source text",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed %s: %v", seed.id, err)
}
if seed.manual {
if err := cat.SetManualVideoTags(ctx, seed.id, nil); err != nil {
t.Fatalf("mark %s manual: %v", seed.id, err)
}
}
}
added, err := cat.EnsureTagForVideoIDPrefix(ctx, "spider91-91-spider-", "91porn", nil, "system")
if err != nil {
t.Fatalf("ensure prefix tag: %v", err)
}
if added != 1 {
t.Fatalf("added = %d, want 1", added)
}
got, err := cat.GetVideo(ctx, "spider91-91-spider-1200001")
if err != nil {
t.Fatalf("get tagged video: %v", err)
}
if !sameStrings(got.Tags, []string{"91porn"}) {
t.Fatalf("tagged video tags = %#v, want 91porn", got.Tags)
}
manual, err := cat.GetVideo(ctx, "spider91-91-spider-1200002")
if err != nil {
t.Fatalf("get manual video: %v", err)
}
if len(manual.Tags) != 0 {
t.Fatalf("manual video tags = %#v, want unchanged", manual.Tags)
}
other, err := cat.GetVideo(ctx, "spider91-other-1200003")
if err != nil {
t.Fatalf("get other prefix video: %v", err)
}
if len(other.Tags) != 0 {
t.Fatalf("other prefix video tags = %#v, want unchanged", other.Tags)
}
}
func TestDeleteTagRejectsSystemTags(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
tag := mustTagByLabel(t, ctx, cat, "AV")
if _, err := cat.DeleteTag(ctx, tag.ID); !errors.Is(err, ErrSystemTag) {
t.Fatalf("delete system tag err = %v, want ErrSystemTag", err)
}
if tag := mustTagByLabel(t, ctx, cat, "AV"); tag.Source != "system" {
t.Fatalf("AV source = %q, want system", tag.Source)
}
}
func TestOpenClassifiesSystemTagsForExistingVideos(t *testing.T) {
path := t.TempDir() + "/catalog.db"
db, err := sql.Open("sqlite", path)
@@ -120,6 +467,84 @@ VALUES
}
}
func TestMigrateDoesNotRewriteAlreadySyncedVideoTags(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, id := range []string{"video-1", "video-2", "video-3"} {
if err := cat.UpsertVideo(ctx, &Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: "巨乳后入合集",
Category: "Better Call Saul S03",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed %s: %v", id, err)
}
}
if err := cat.migrate(ctx); err != nil {
t.Fatalf("first migrate: %v", err)
}
before := videoUpdatedAtByID(t, ctx, cat, "video-1", "video-2", "video-3")
time.Sleep(5 * time.Millisecond)
if err := cat.migrate(ctx); err != nil {
t.Fatalf("second migrate: %v", err)
}
after := videoUpdatedAtByID(t, ctx, cat, "video-1", "video-2", "video-3")
for id, want := range before {
if got := after[id]; got != want {
t.Fatalf("%s updated_at changed on no-op migrate: got %d, want %d", id, got, want)
}
}
}
func TestMigrateBackfillsLegacyTagsWithoutRelations(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now().UnixMilli()
if _, err := cat.db.ExecContext(ctx, `
INSERT INTO videos (id, drive_id, file_id, title, tags, tags_manual, published_at, created_at, updated_at)
VALUES ('legacy-video', 'drive', 'file-legacy', 'legacy title', '["legacy-tag"]', 0, ?, ?, ?)`,
now, now, now); err != nil {
t.Fatalf("seed legacy video: %v", err)
}
if err := cat.migrate(ctx); err != nil {
t.Fatalf("migrate: %v", err)
}
tag := mustTagByLabel(t, ctx, cat, "legacy-tag")
var count int
if err := cat.db.QueryRowContext(ctx,
`SELECT COUNT(*) FROM video_tags WHERE video_id = 'legacy-video' AND tag_id = ?`, tag.ID).Scan(&count); err != nil {
t.Fatalf("count video tag: %v", err)
}
if count != 1 {
t.Fatalf("legacy video tag relation count = %d, want 1", count)
}
}
func TestOpenMigratesLegacyVideosWithoutFileName(t *testing.T) {
path := t.TempDir() + "/catalog.db"
db, err := sql.Open("sqlite", path)
@@ -379,7 +804,7 @@ func TestMigrateCollapsesAVCodeTagsIntoAV(t *testing.T) {
}
}
func TestMigrateClearsVolatileOneDriveThumbnailURLs(t *testing.T) {
func TestMigrateClearsRemoteNonSpiderThumbnailURLs(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
@@ -402,6 +827,36 @@ func TestMigrateClearsVolatileOneDriveThumbnailURLs(t *testing.T) {
}); err != nil {
t.Fatalf("seed onedrive: %v", err)
}
if err := cat.UpsertDrive(ctx, &Drive{
ID: "p123-main",
Kind: "p123",
Name: "123Pan",
RootID: "root",
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed p123: %v", err)
}
if err := cat.UpsertDrive(ctx, &Drive{
ID: "pikpak-main",
Kind: "pikpak",
Name: "PikPak",
RootID: "root",
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed pikpak: %v", err)
}
if err := cat.UpsertDrive(ctx, &Drive{
ID: "spider91-main",
Kind: "spider91",
Name: "91Spider",
RootID: "root",
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed spider91: %v", err)
}
videos := []*Video{
{
@@ -425,6 +880,27 @@ func TestMigrateClearsVolatileOneDriveThumbnailURLs(t *testing.T) {
Title: "PikPak",
ThumbnailURL: "https://sg-thumbnail-drive.mypikpak.net/v0/screenshot-thumbnails/demo",
},
{
ID: "p123-remote-thumb-video",
DriveID: "p123-main",
FileID: "file-4",
Title: "123Pan remote thumb",
ThumbnailURL: "https://download.123pan.com/thumb/file_70_70?w=70&h=70",
},
{
ID: "p123-local-thumb-video",
DriveID: "p123-main",
FileID: "file-5",
Title: "123Pan local thumb",
ThumbnailURL: "/p/thumb/p123-local-thumb-video",
},
{
ID: "spider91-local-thumb-video",
DriveID: "spider91-main",
FileID: "file-6",
Title: "91Spider local thumb",
ThumbnailURL: "/p/thumb/spider91-local-thumb-video",
},
}
for _, v := range videos {
v.PublishedAt = now
@@ -459,8 +935,39 @@ func TestMigrateClearsVolatileOneDriveThumbnailURLs(t *testing.T) {
if err != nil {
t.Fatalf("get pikpak video: %v", err)
}
if pikpak.ThumbnailURL == "" {
t.Fatal("pikpak thumbnail was cleared")
if pikpak.ThumbnailURL != "" {
t.Fatalf("pikpak thumbnail = %q, want cleared", pikpak.ThumbnailURL)
}
p123Remote, err := cat.GetVideo(ctx, "p123-remote-thumb-video")
if err != nil {
t.Fatalf("get p123 remote thumb video: %v", err)
}
if p123Remote.ThumbnailURL != "" {
t.Fatalf("p123 remote thumbnail = %q, want cleared", p123Remote.ThumbnailURL)
}
var p123Status string
if err := cat.db.QueryRowContext(ctx, `SELECT thumbnail_status FROM videos WHERE id = ?`, "p123-remote-thumb-video").Scan(&p123Status); err != nil {
t.Fatalf("read p123 thumbnail status: %v", err)
}
if p123Status != "pending" {
t.Fatalf("p123 remote thumbnail_status = %q, want pending", p123Status)
}
p123Local, err := cat.GetVideo(ctx, "p123-local-thumb-video")
if err != nil {
t.Fatalf("get p123 local thumb video: %v", err)
}
if p123Local.ThumbnailURL != "/p/thumb/p123-local-thumb-video" {
t.Fatalf("p123 local thumbnail = %q, want preserved", p123Local.ThumbnailURL)
}
spider91Local, err := cat.GetVideo(ctx, "spider91-local-thumb-video")
if err != nil {
t.Fatalf("get spider91 local thumb video: %v", err)
}
if spider91Local.ThumbnailURL != "/p/thumb/spider91-local-thumb-video" {
t.Fatalf("spider91 local thumbnail = %q, want preserved", spider91Local.ThumbnailURL)
}
}
@@ -581,6 +1088,151 @@ func TestListVideosHidesDuplicateContentHashes(t *testing.T) {
}
}
func TestTagFilterMatchesCanonicalDuplicateVideo(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, v := range []*Video{
{
ID: "pikpak-canonical",
DriveID: "pikpak",
FileID: "canonical.mp4",
Title: "Canonical",
Size: 1024,
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
},
{
ID: "spider91-dup-1",
DriveID: "91-spider",
FileID: "dup-1.mp4",
Title: "Spider duplicate 1",
Tags: []string{"91porn"},
Size: 1024,
PublishedAt: now.Add(time.Second),
CreatedAt: now.Add(time.Second),
UpdatedAt: now.Add(time.Second),
},
{
ID: "spider91-dup-2",
DriveID: "91-spider",
FileID: "dup-2.mp4",
Title: "Spider duplicate 2",
Tags: []string{"91porn"},
Size: 1024,
PublishedAt: now.Add(2 * time.Second),
CreatedAt: now.Add(2 * time.Second),
UpdatedAt: now.Add(2 * time.Second),
},
{
ID: "spider91-visible",
DriveID: "91-spider",
FileID: "visible.mp4",
Title: "Spider visible",
Tags: []string{"91porn"},
Size: 2048,
PublishedAt: now.Add(3 * time.Second),
CreatedAt: now.Add(3 * time.Second),
UpdatedAt: now.Add(3 * time.Second),
},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed %s: %v", v.ID, err)
}
}
for _, id := range []string{"pikpak-canonical", "spider91-dup-1", "spider91-dup-2"} {
if err := cat.UpdateVideoFingerprint(ctx, id, "same-sampled-sha256", "ready", ""); err != nil {
t.Fatalf("fingerprint %s: %v", id, err)
}
}
if err := cat.UpdateVideoFingerprint(ctx, "spider91-visible", "unique-sampled-sha256", "ready", ""); err != nil {
t.Fatalf("fingerprint visible: %v", err)
}
items, total, err := cat.ListVideos(ctx, ListParams{Tag: "91porn", Page: 1, PageSize: 10})
if err != nil {
t.Fatalf("list videos by tag: %v", err)
}
if total != 2 || len(items) != 2 {
t.Fatalf("tagged videos total=%d len=%d, want 2", total, len(items))
}
gotIDs := map[string]bool{}
for _, item := range items {
gotIDs[item.ID] = true
}
for _, want := range []string{"pikpak-canonical", "spider91-visible"} {
if !gotIDs[want] {
t.Fatalf("tagged video ids = %#v, want %s", gotIDs, want)
}
}
if got := mustTagByLabel(t, ctx, cat, "91porn").Count; got != 2 {
t.Fatalf("91porn count = %d, want 2 visible canonical videos", got)
}
}
func TestListVideosCanFilterReadyThumbnails(t *testing.T) {
ctx := context.Background()
cat, err := Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, v := range []*Video{
{
ID: "ready-video",
DriveID: "drive",
FileID: "file-ready",
Title: "Ready",
ThumbnailURL: "/p/thumb/ready-video",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
},
{
ID: "pending-video",
DriveID: "drive",
FileID: "file-pending",
Title: "Pending",
PublishedAt: now.Add(time.Second),
CreatedAt: now.Add(time.Second),
UpdatedAt: now.Add(time.Second),
},
} {
if err := cat.UpsertVideo(ctx, v); err != nil {
t.Fatalf("seed video %s: %v", v.ID, err)
}
}
items, total, err := cat.ListVideos(ctx, ListParams{
Page: 1, PageSize: 10, ThumbnailReadyOnly: true,
})
if err != nil {
t.Fatalf("list videos: %v", err)
}
if total != 1 || len(items) != 1 {
t.Fatalf("ready videos total=%d len=%d, want 1", total, len(items))
}
if items[0].ID != "ready-video" {
t.Fatalf("ready video id = %q, want ready-video", items[0].ID)
}
}
func sameStrings(a, b []string) bool {
if len(a) != len(b) {
return false
@@ -593,6 +1245,39 @@ func sameStrings(a, b []string) bool {
return true
}
func mustListTags(t *testing.T, ctx context.Context, cat *Catalog) []Tag {
t.Helper()
tags, err := cat.ListTags(ctx)
if err != nil {
t.Fatalf("list tags: %v", err)
}
return tags
}
func mustTagByLabel(t *testing.T, ctx context.Context, cat *Catalog, label string) Tag {
t.Helper()
for _, tag := range mustListTags(t, ctx, cat) {
if tag.Label == label {
return tag
}
}
t.Fatalf("tag %q not found", label)
return Tag{}
}
func videoUpdatedAtByID(t *testing.T, ctx context.Context, cat *Catalog, ids ...string) map[string]int64 {
t.Helper()
out := make(map[string]int64, len(ids))
for _, id := range ids {
var updatedAt int64
if err := cat.db.QueryRowContext(ctx, `SELECT updated_at FROM videos WHERE id = ?`, id).Scan(&updatedAt); err != nil {
t.Fatalf("read updated_at for %s: %v", id, err)
}
out[id] = updatedAt
}
return out
}
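// After the last video referencing a collection tag is deleted, the tag should disappear from the tags table automatically.
// user/system tags are unaffected: their semantics are maintained by people, so they are kept even when orphaned.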
func TestDeleteVideoPrunesOrphanCollectionTag(t *testing.T) {
@@ -786,11 +1471,12 @@ func TestReconcileThumbnailStatusOnce(t *testing.T) {
id, url, status string
wantStatus string
}{
{"v-pending-url", "/p/thumb/v-pending-url", "pending", "ready"}, // primary fix target
{"v-empty-url-pending", "", "pending", "pending"}, // no url: left untouched
{"v-failed-with-url", "/p/thumb/v-failed-with-url", "failed", "failed"}, // explicit failure is preserved
{"v-empty-url-failed", "", "failed", "failed"}, // failed + no url is also preserved
{"v-already-ready", "/p/thumb/v-already-ready", "ready", "ready"}, // idempotent
{"v-pending-url", "/p/thumb/v-pending-url", "pending", "ready"}, // primary fix target
{"v-empty-url-pending", "", "pending", "pending"}, // no url: left untouched
{"v-failed-with-url", "/p/thumb/v-failed-with-url", "failed", "failed"}, // explicit failure is preserved
{"v-empty-url-failed", "", "failed", "failed"}, // failed + no url is also preserved
{"v-skipped-with-url", "/p/thumb/v-skipped-with-url", "skipped", "skipped"}, // skipped duration backfill is preserved
{"v-already-ready", "/p/thumb/v-already-ready", "ready", "ready"}, // idempotent
}
for _, c := range cases {
if err := cat.UpsertVideo(ctx, &Video{
+25 -2
@@ -16,6 +16,11 @@ const (
DefaultAdminPassword = "admin123"
)
var (
legacyDefaultVideoExtensions = []string{".mp4", ".mkv", ".mov", ".webm", ".avi"}
defaultVideoExtensions = []string{".mp4", ".mkv", ".mov", ".webm", ".avi", ".strm"}
)
type Config struct {
Server Server `yaml:"server"`
Storage Storage `yaml:"storage"`
@@ -202,7 +207,7 @@ type Nightly struct {
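// The static definition from yaml is kept here to pre-provision drives at startup. In production, maintaining drives only in the DB is recommended.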
type Drive struct {
ID string `yaml:"id"`
Kind string `yaml:"kind"` // quark / p115 / pikpak / wopan / onedrive
Kind string `yaml:"kind"` // quark / p115 / p123 / pikpak / wopan / onedrive / googledrive / localstorage
Name string `yaml:"name"`
RootID string `yaml:"root_id"`
Params map[string]string `yaml:"params,omitempty"`
@@ -247,7 +252,9 @@ func (c *Config) applyDefaults() {
c.Scanner.MaxDepth = 5
}
if len(c.Scanner.VideoExtensions) == 0 {
c.Scanner.VideoExtensions = []string{".mp4", ".mkv", ".mov", ".webm", ".avi"}
c.Scanner.VideoExtensions = append([]string{}, defaultVideoExtensions...)
} else if isLegacyDefaultVideoExtensions(c.Scanner.VideoExtensions) {
c.Scanner.VideoExtensions = append(c.Scanner.VideoExtensions, ".strm")
}
if c.Preview.FFmpegPath == "" {
c.Preview.FFmpegPath = "ffmpeg"
@@ -276,3 +283,19 @@ func (c *Config) applyDefaults() {
c.Nightly.CronHour = 1
}
}
func isLegacyDefaultVideoExtensions(exts []string) bool {
if len(exts) != len(legacyDefaultVideoExtensions) {
return false
}
seen := make(map[string]struct{}, len(exts))
for _, ext := range exts {
seen[strings.ToLower(strings.TrimSpace(ext))] = struct{}{}
}
for _, ext := range legacyDefaultVideoExtensions {
if _, ok := seen[ext]; !ok {
return false
}
}
return true
}
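A minimal, self-contained sketch of the upgrade rule that the `applyDefaults` change above appears to implement: an empty list gets the new defaults, a list equal to the legacy defaults gains `.strm`, and any custom list is left alone. The names `upgradeExtensions`, `isLegacyDefault`, and `legacyDefaults` are hypothetical stand-ins for illustration, not identifiers from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyDefaults mirrors the pre-.strm default list shown in the diff.
var legacyDefaults = []string{".mp4", ".mkv", ".mov", ".webm", ".avi"}

// isLegacyDefault reports whether exts is exactly the legacy default set,
// ignoring order, case, and surrounding whitespace.
func isLegacyDefault(exts []string) bool {
	if len(exts) != len(legacyDefaults) {
		return false
	}
	seen := make(map[string]struct{}, len(exts))
	for _, ext := range exts {
		seen[strings.ToLower(strings.TrimSpace(ext))] = struct{}{}
	}
	for _, ext := range legacyDefaults {
		if _, ok := seen[ext]; !ok {
			return false
		}
	}
	return true
}

// upgradeExtensions (hypothetical helper) returns the effective list:
// empty or legacy-default input gains ".strm"; custom lists are untouched.
func upgradeExtensions(exts []string) []string {
	switch {
	case len(exts) == 0:
		return append(append([]string{}, legacyDefaults...), ".strm")
	case isLegacyDefault(exts):
		return append(append([]string{}, exts...), ".strm")
	default:
		return exts
	}
}

func main() {
	fmt.Println(upgradeExtensions(nil))
	fmt.Println(upgradeExtensions([]string{".AVI", ".mp4", ".mkv", ".mov", ".webm"}))
	fmt.Println(upgradeExtensions([]string{".mp4"}))
}
```

Matching on the whole set rather than on "does not contain .strm" is what keeps a deliberately narrowed list such as `[".mp4"]` from being silently widened.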
+62
@@ -3,6 +3,7 @@ package config
import (
"os"
"path/filepath"
"strings"
"testing"
)
@@ -50,3 +51,64 @@ storage:
t.Fatalf("db path = %q, want preserved value", cfg.Storage.DBPath)
}
}
func TestLoadDefaultScannerVideoExtensionsIncludeSTRM(t *testing.T) {
path := filepath.Join(t.TempDir(), "config.yaml")
if err := os.WriteFile(path, []byte(`{}`), 0o644); err != nil {
t.Fatalf("write config: %v", err)
}
cfg, err := Load(path)
if err != nil {
t.Fatalf("load config: %v", err)
}
if !hasVideoExtension(cfg.Scanner.VideoExtensions, ".strm") {
t.Fatalf("video extensions = %#v, want .strm", cfg.Scanner.VideoExtensions)
}
}
func TestLoadLegacyDefaultScannerVideoExtensionsIncludeSTRM(t *testing.T) {
path := filepath.Join(t.TempDir(), "config.yaml")
if err := os.WriteFile(path, []byte(`
scanner:
video_extensions: [".mp4", ".mkv", ".mov", ".webm", ".avi"]
`), 0o644); err != nil {
t.Fatalf("write config: %v", err)
}
cfg, err := Load(path)
if err != nil {
t.Fatalf("load config: %v", err)
}
if !hasVideoExtension(cfg.Scanner.VideoExtensions, ".strm") {
t.Fatalf("video extensions = %#v, want .strm appended for legacy default list", cfg.Scanner.VideoExtensions)
}
}
func TestLoadCustomScannerVideoExtensionsArePreserved(t *testing.T) {
path := filepath.Join(t.TempDir(), "config.yaml")
if err := os.WriteFile(path, []byte(`
scanner:
video_extensions: [".mp4"]
`), 0o644); err != nil {
t.Fatalf("write config: %v", err)
}
cfg, err := Load(path)
if err != nil {
t.Fatalf("load config: %v", err)
}
if len(cfg.Scanner.VideoExtensions) != 1 || cfg.Scanner.VideoExtensions[0] != ".mp4" {
t.Fatalf("video extensions = %#v, want custom list preserved", cfg.Scanner.VideoExtensions)
}
}
func hasVideoExtension(exts []string, want string) bool {
want = strings.ToLower(strings.TrimSpace(want))
for _, ext := range exts {
if strings.ToLower(strings.TrimSpace(ext)) == want {
return true
}
}
return false
}
File diff suppressed because it is too large
@@ -0,0 +1,407 @@
package googledrive
import (
"context"
"crypto/md5"
"encoding/hex"
"encoding/json"
"errors"
"io"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"github.com/video-site/backend/internal/drives"
)
func TestInitUsesOnlineRenewAPI(t *testing.T) {
var savedAccess, savedRefresh string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/renew" {
t.Fatalf("unexpected path %s", r.URL.Path)
}
if got := r.URL.Query().Get("refresh_ui"); got != "old-refresh" {
t.Fatalf("refresh_ui = %q", got)
}
if got := r.URL.Query().Get("server_use"); got != "true" {
t.Fatalf("server_use = %q", got)
}
if got := r.URL.Query().Get("driver_txt"); got != "googleui_go" {
t.Fatalf("driver_txt = %q", got)
}
writeTestJSON(w, tokenResp{
AccessToken: "new-access",
RefreshToken: "new-refresh",
})
}))
defer srv.Close()
d := New(Config{
ID: "g",
RefreshToken: "old-refresh",
UseOnlineAPI: true,
RenewAPIURL: srv.URL + "/renew",
OnTokenUpdate: func(access, refresh string) {
savedAccess = access
savedRefresh = refresh
},
})
if err := d.Init(context.Background()); err != nil {
t.Fatalf("Init() error = %v", err)
}
if d.accessToken != "new-access" || d.refreshToken != "new-refresh" {
t.Fatalf("tokens not applied: access=%q refresh=%q", d.accessToken, d.refreshToken)
}
if savedAccess != "new-access" || savedRefresh != "new-refresh" {
t.Fatalf("tokens not persisted: access=%q refresh=%q", savedAccess, savedRefresh)
}
}
func TestListMapsGoogleDriveFiles(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if got := r.Header.Get("Authorization"); got != "Bearer access" {
t.Fatalf("Authorization = %q", got)
}
if r.URL.Path != "/drive/v3/files" {
t.Fatalf("unexpected path %s", r.URL.Path)
}
if !strings.Contains(r.URL.Query().Get("q"), "'root' in parents") {
t.Fatalf("unexpected q = %q", r.URL.Query().Get("q"))
}
writeTestJSON(w, filesResp{Files: []driveFile{
{ID: "folder-1", Name: "Movies", MimeType: "application/vnd.google-apps.folder"},
{
ID: "file-1",
Name: "clip.mp4",
MimeType: "video/mp4",
Size: "1234",
MD5Checksum: "abc",
ThumbnailLink: "https://thumb.example/1",
},
}})
}))
defer srv.Close()
d := New(Config{ID: "g", RootID: "root", APIBaseURL: srv.URL + "/drive/v3"})
d.accessToken = "access"
d.listInterval = -1
entries, err := d.List(context.Background(), "")
if err != nil {
t.Fatalf("List() error = %v", err)
}
if len(entries) != 2 {
t.Fatalf("len(entries) = %d", len(entries))
}
if !entries[0].IsDir || entries[0].ID != "folder-1" {
t.Fatalf("folder entry = %+v", entries[0])
}
if entries[1].ID != "file-1" || entries[1].Size != 1234 || entries[1].Hash != "abc" || entries[1].ThumbnailURL == "" {
t.Fatalf("file entry = %+v", entries[1])
}
}
func TestStreamURLReturnsAuthenticatedMediaLinkWithoutRedirectRequirement(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if got := r.Header.Get("Authorization"); got != "Bearer access" {
t.Fatalf("Authorization = %q", got)
}
if r.URL.Path != "/drive/v3/files/file-1" {
t.Fatalf("unexpected path %s", r.URL.Path)
}
writeTestJSON(w, driveFile{
ID: "file-1",
Name: "clip.mp4",
MimeType: "video/mp4",
Size: "1234",
})
}))
defer srv.Close()
d := New(Config{ID: "g", APIBaseURL: srv.URL + "/drive/v3"})
d.accessToken = "access"
link, err := d.StreamURL(context.Background(), "file-1")
if err != nil {
t.Fatalf("StreamURL() error = %v", err)
}
if !strings.HasPrefix(link.URL, srv.URL+"/drive/v3/files/file-1?") {
t.Fatalf("link URL = %q", link.URL)
}
if !strings.Contains(link.URL, "alt=media") {
t.Fatalf("link URL missing alt=media: %q", link.URL)
}
if got := link.Headers.Get("Authorization"); got != "Bearer access" {
t.Fatalf("link Authorization = %q", got)
}
}
func TestUploadAndReportHashUsesResumableSession(t *testing.T) {
body := "hello google drive"
wantHash := md5.Sum([]byte(body))
var sawSession bool
var sawUpload bool
var srv *httptest.Server
srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/upload/drive/v3/files":
sawSession = true
if got := r.Header.Get("Authorization"); got != "Bearer access" {
t.Fatalf("session Authorization = %q", got)
}
if got := r.URL.Query().Get("uploadType"); got != "resumable" {
t.Fatalf("uploadType = %q", got)
}
if got := r.Header.Get("X-Upload-Content-Length"); got != "18" {
t.Fatalf("X-Upload-Content-Length = %q", got)
}
var meta struct {
Name string `json:"name"`
Parents []string `json:"parents"`
}
if err := json.NewDecoder(r.Body).Decode(&meta); err != nil {
t.Fatalf("decode session metadata: %v", err)
}
if meta.Name != "clip.mp4" || len(meta.Parents) != 1 || meta.Parents[0] != "parent-1" {
t.Fatalf("metadata = %+v", meta)
}
w.Header().Set("Location", srv.URL+"/upload/session/1")
w.WriteHeader(http.StatusOK)
case "/upload/session/1":
sawUpload = true
if got := r.Header.Get("Authorization"); got != "Bearer access" {
t.Fatalf("upload Authorization = %q", got)
}
if got := r.Header.Get("Content-Range"); got != "bytes 0-17/18" {
t.Fatalf("Content-Range = %q", got)
}
gotBody, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("read upload body: %v", err)
}
if string(gotBody) != body {
t.Fatalf("upload body = %q", string(gotBody))
}
writeTestJSONStatus(w, http.StatusCreated, driveFile{
ID: "file-uploaded",
Name: "clip.mp4",
Size: "18",
MD5Checksum: hex.EncodeToString(wantHash[:]),
})
default:
t.Fatalf("unexpected path %s", r.URL.Path)
}
}))
defer srv.Close()
d := New(Config{ID: "g", APIBaseURL: srv.URL + "/drive/v3"})
d.accessToken = "access"
res, err := d.UploadAndReportHash(context.Background(), "parent-1", "clip.mp4", strings.NewReader(body), int64(len(body)))
if err != nil {
t.Fatalf("UploadAndReportHash() error = %v", err)
}
if !sawSession || !sawUpload {
t.Fatalf("saw session/upload = %v/%v, want both", sawSession, sawUpload)
}
if res.FileID != "file-uploaded" || res.Size != int64(len(body)) || res.Hash != hex.EncodeToString(wantHash[:]) {
t.Fatalf("upload result = %+v", res)
}
}
func TestEnsureDirAndRenameUseGoogleDriveFileAPI(t *testing.T) {
var madeDir bool
var renamed bool
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch {
case r.Method == http.MethodGet && r.URL.Path == "/drive/v3/files":
writeTestJSON(w, filesResp{})
case r.Method == http.MethodPost && r.URL.Path == "/drive/v3/files":
madeDir = true
var meta struct {
Name string `json:"name"`
Parents []string `json:"parents"`
MimeType string `json:"mimeType"`
}
if err := json.NewDecoder(r.Body).Decode(&meta); err != nil {
t.Fatalf("decode mkdir body: %v", err)
}
if meta.Name != "91 Spider" || len(meta.Parents) != 1 || meta.Parents[0] != "root" || meta.MimeType != "application/vnd.google-apps.folder" {
t.Fatalf("mkdir body = %+v", meta)
}
writeTestJSON(w, driveFile{ID: "folder-91", Name: "91 Spider", MimeType: "application/vnd.google-apps.folder"})
case r.Method == http.MethodPatch && r.URL.Path == "/drive/v3/files/file-1":
renamed = true
var body map[string]string
if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
t.Fatalf("decode rename body: %v", err)
}
if body["name"] != "new-name.mp4" {
t.Fatalf("rename body = %+v", body)
}
writeTestJSON(w, driveFile{ID: "file-1", Name: "new-name.mp4"})
default:
t.Fatalf("unexpected %s %s", r.Method, r.URL.Path)
}
}))
defer srv.Close()
d := New(Config{ID: "g", RootID: "root", APIBaseURL: srv.URL + "/drive/v3"})
d.accessToken = "access"
d.listInterval = -1
dirID, err := d.EnsureDir(context.Background(), "91 Spider")
if err != nil {
t.Fatalf("EnsureDir() error = %v", err)
}
if dirID != "folder-91" || !madeDir {
t.Fatalf("dirID/madeDir = %q/%v, want folder-91/true", dirID, madeDir)
}
if err := d.Rename(context.Background(), "file-1", "new-name.mp4"); err != nil {
t.Fatalf("Rename() error = %v", err)
}
if !renamed {
t.Fatal("rename endpoint was not called")
}
}
func TestRequestRefreshesOnUnauthorized(t *testing.T) {
var fileCalls int
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/renew":
writeTestJSON(w, tokenResp{
AccessToken: "new-access",
RefreshToken: "new-refresh",
})
case "/drive/v3/files/file-1":
fileCalls++
if fileCalls == 1 {
writeTestJSONStatus(w, http.StatusUnauthorized, apiErrorResp{Error: apiErrorBody{
Code: http.StatusUnauthorized,
Message: "Invalid Credentials",
}})
return
}
if got := r.Header.Get("Authorization"); got != "Bearer new-access" {
t.Fatalf("Authorization after refresh = %q", got)
}
writeTestJSON(w, driveFile{ID: "file-1", Name: "clip.mp4", Size: "1"})
default:
t.Fatalf("unexpected path %s", r.URL.Path)
}
}))
defer srv.Close()
d := New(Config{
ID: "g",
RefreshToken: "old-refresh",
UseOnlineAPI: true,
RenewAPIURL: srv.URL + "/renew",
APIBaseURL: srv.URL + "/drive/v3",
})
d.accessToken = "old-access"
if _, err := d.Stat(context.Background(), "file-1"); err != nil {
t.Fatalf("Stat() error = %v", err)
}
if fileCalls != 2 {
t.Fatalf("fileCalls = %d", fileCalls)
}
if d.accessToken != "new-access" || d.refreshToken != "new-refresh" {
t.Fatalf("tokens not refreshed: access=%q refresh=%q", d.accessToken, d.refreshToken)
}
}
func TestRateLimitReasonsFollowGoogleDriveErrorShape(t *testing.T) {
reasons := []string{
"rateLimitExceeded",
"userRateLimitExceeded",
"dailyLimitExceeded",
"dailyLimitExceededUnreg",
"downloadQuotaExceeded",
"sharingRateLimitExceeded",
"quotaExceeded",
}
for _, reason := range reasons {
body := apiErrorBody{
Code: http.StatusForbidden,
Message: "google drive quota or rate limited",
Errors: []struct {
Domain string `json:"domain"`
Reason string `json:"reason"`
Message string `json:"message"`
LocationType string `json:"location_type"`
Location string `json:"location"`
}{
{Domain: "usageLimits", Reason: reason, Message: reason},
},
}
if !isGoogleRateLimit(nil, body) {
t.Fatalf("reason %q not treated as rate limit", reason)
}
}
}
func TestStreamURLRateLimitStartsSharedLinkCooldown(t *testing.T) {
var calls int
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
calls++
w.Header().Set("Retry-After", "120")
writeTestJSONStatus(w, http.StatusForbidden, apiErrorResp{Error: apiErrorBody{
Code: http.StatusForbidden,
Message: "User rate limit exceeded.",
Errors: []struct {
Domain string `json:"domain"`
Reason string `json:"reason"`
Message string `json:"message"`
LocationType string `json:"location_type"`
Location string `json:"location"`
}{
{Domain: "usageLimits", Reason: "userRateLimitExceeded", Message: "User rate limit exceeded."},
},
}})
}))
defer srv.Close()
d := New(Config{ID: "g", APIBaseURL: srv.URL})
d.accessToken = "access"
d.linkCooldownDuration = time.Hour
_, err := d.StreamURL(context.Background(), "file-1")
if err == nil {
t.Fatal("first StreamURL succeeded, want rate limit")
}
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("first error = %T %[1]v, want RateLimitError", err)
}
if rateLimit.RetryAfter != 2*time.Minute {
t.Fatalf("retry after = %s, want 2m", rateLimit.RetryAfter)
}
_, err = d.StreamURL(context.Background(), "file-1")
if err == nil {
t.Fatal("second StreamURL succeeded during cooldown")
}
if !errors.As(err, &rateLimit) {
t.Fatalf("second error = %T %[1]v, want RateLimitError", err)
}
if calls != 1 {
t.Fatalf("remote calls = %d, want 1; second call should use shared cooldown", calls)
}
if rateLimit.RetryAfter <= 0 || rateLimit.RetryAfter > 2*time.Minute {
t.Fatalf("second retry after = %s, want remaining cooldown", rateLimit.RetryAfter)
}
}
func writeTestJSON(w http.ResponseWriter, v any) {
writeTestJSONStatus(w, http.StatusOK, v)
}
func writeTestJSONStatus(w http.ResponseWriter, status int, v any) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(status)
_ = json.NewEncoder(w).Encode(v)
}
@@ -0,0 +1,57 @@
package googledrive
import "time"
type tokenResp struct {
AccessToken string `json:"access_token"`
RefreshToken string `json:"refresh_token"`
ExpiresIn int64 `json:"expires_in"`
Error string `json:"error"`
ErrorDescription string `json:"error_description"`
Text string `json:"text"`
}
type filesResp struct {
NextPageToken string `json:"nextPageToken"`
Files []driveFile `json:"files"`
Error apiErrorBody `json:"error"`
}
type driveFile struct {
ID string `json:"id"`
Name string `json:"name"`
MimeType string `json:"mimeType"`
ModifiedTime time.Time `json:"modifiedTime"`
CreatedTime time.Time `json:"createdTime"`
Size string `json:"size"`
ThumbnailLink string `json:"thumbnailLink"`
MD5Checksum string `json:"md5Checksum"`
SHA1Checksum string `json:"sha1Checksum"`
SHA256Checksum string `json:"sha256Checksum"`
Shortcut struct {
TargetID string `json:"targetId"`
TargetMimeType string `json:"targetMimeType"`
} `json:"shortcutDetails"`
}
type apiErrorResp struct {
Error apiErrorBody `json:"error"`
}
type apiErrorBody struct {
Code int `json:"code"`
Message string `json:"message"`
Errors []struct {
Domain string `json:"domain"`
Reason string `json:"reason"`
Message string `json:"message"`
LocationType string `json:"location_type"`
Location string `json:"location"`
} `json:"errors"`
}
type UploadResult struct {
FileID string
Hash string
Size int64
}
+23 -2
@@ -10,7 +10,7 @@ import (
// Drive is the unified abstraction over multiple cloud-drive providers. Upper layers do not distinguish individual drives, only Kind.
type Drive interface {
// Kind returns the driver code name: "quark" / "p115" / "pikpak" / "wopan" / "onedrive"
// Kind returns the driver code name: "quark" / "p115" / "p123" / "pikpak" / "wopan" / "onedrive" / "googledrive" / "localstorage"
Kind() string
// ID returns this drive's unique identifier in the catalog.
@@ -30,7 +30,7 @@ type Drive interface {
StreamURL(ctx context.Context, fileID string) (*StreamLink, error)
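// Upload writes a local stream into the given directory and returns the new file's fileID.
// Teasers and covers are currently stored locally only and are no longer written back to the drive through this method.
// Preview videos and covers are currently stored locally only and are no longer written back to the drive through this method.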
Upload(ctx context.Context, parentID, name string, r io.Reader, size int64) (string, error)
// EnsureDir ensures the given path (relative to the root directory) exists and returns the final directory's fileID.
@@ -40,6 +40,27 @@ type Drive interface {
RootID() string
}
// Remover is an optional drive capability. It mirrors OpenList's optional
// Remove interface: callers must type-assert before deleting a source file.
type Remover interface {
Remove(ctx context.Context, fileID string) error
}
// SourceFile carries the catalog metadata available when an administrator
// requests deletion of the original source file.
type SourceFile struct {
FileID string
ParentID string
Name string
Size int64
}
// SourceRemover is an optional, richer removal capability for providers whose
// playback ID is not the same ID required by their delete API.
type SourceRemover interface {
RemoveSource(ctx context.Context, source SourceFile) error
}
type Entry struct {
ID string
Name string
@@ -0,0 +1,446 @@
// Package localstorage exposes an existing server-side directory as a Drive.
package localstorage
import (
"context"
"encoding/base64"
"errors"
"fmt"
"io"
"net/url"
"os"
"path/filepath"
"strings"
"time"
"github.com/video-site/backend/internal/drives"
)
const Kind = "localstorage"
const maxSTRMBytes = 64 * 1024
type Config struct {
ID string
RootPath string
}
type Driver struct {
id string
rootPath string
}
func New(c Config) *Driver {
return &Driver{
id: c.ID,
rootPath: c.RootPath,
}
}
func (d *Driver) Kind() string { return Kind }
func (d *Driver) ID() string { return d.id }
func (d *Driver) RootID() string { return "/" }
func (d *Driver) Init(context.Context) error {
root, err := d.root()
if err != nil {
return err
}
info, err := os.Stat(root)
if err != nil {
return fmt.Errorf("localstorage: stat root %q: %w%s", root, err, localStoragePathHint(d.rootPath))
}
if !info.IsDir() {
return fmt.Errorf("localstorage: root is not a directory: %s", root)
}
return nil
}
func (d *Driver) List(ctx context.Context, dirID string) ([]drives.Entry, error) {
dir, rel, err := d.pathForID(dirID)
if err != nil {
return nil, err
}
entries, err := os.ReadDir(dir)
if err != nil {
return nil, err
}
out := make([]drives.Entry, 0, len(entries))
for _, entry := range entries {
if err := ctx.Err(); err != nil {
return nil, err
}
// Symlinks can escape the configured root or create cycles. Keep the
// local storage drive predictable by scanning real files/directories only.
if entry.Type()&os.ModeSymlink != 0 {
continue
}
info, err := entry.Info()
if err != nil {
continue
}
if !info.IsDir() && !info.Mode().IsRegular() {
continue
}
childRel := joinRel(rel, entry.Name())
out = append(out, drives.Entry{
ID: encodeRel(childRel),
Name: entry.Name(),
Size: sizeForEntry(info),
IsDir: info.IsDir(),
ParentID: idForRel(rel),
ModTime: info.ModTime(),
})
}
return out, nil
}
func (d *Driver) Stat(ctx context.Context, fileID string) (*drives.Entry, error) {
p, rel, err := d.pathForID(fileID)
if err != nil {
return nil, err
}
info, err := os.Stat(p)
if err != nil {
return nil, err
}
return &drives.Entry{
ID: idForRel(rel),
Name: filepath.Base(p),
Size: sizeForEntry(info),
IsDir: info.IsDir(),
ParentID: idForRel(parentRel(rel)),
ModTime: info.ModTime(),
}, nil
}
func (d *Driver) StreamURL(ctx context.Context, fileID string) (*drives.StreamLink, error) {
p, _, err := d.pathForID(fileID)
if err != nil {
return nil, err
}
info, err := os.Stat(p)
if err != nil {
return nil, err
}
if info.IsDir() || !info.Mode().IsRegular() {
return nil, os.ErrNotExist
}
if strings.EqualFold(filepath.Ext(p), ".strm") {
return d.streamURLFromSTRM(ctx, p)
}
if info.Size() <= 0 {
return nil, os.ErrNotExist
}
return &drives.StreamLink{
URL: p,
Expires: time.Now().Add(24 * time.Hour),
}, nil
}
func (d *Driver) streamURLFromSTRM(ctx context.Context, strmPath string) (*drives.StreamLink, error) {
target, err := readSTRMTarget(strmPath)
if err != nil {
return nil, err
}
if err := ctx.Err(); err != nil {
return nil, err
}
if filepath.IsAbs(target) {
return d.localSTRMLink(strmPath, target)
}
u, err := url.Parse(target)
if err == nil {
switch strings.ToLower(u.Scheme) {
case "http", "https":
if u.Host == "" {
return nil, fmt.Errorf("localstorage: invalid strm url %q", target)
}
return &drives.StreamLink{
URL: target,
Expires: time.Now().Add(24 * time.Hour),
}, nil
case "file":
if u.Host != "" && !strings.EqualFold(u.Host, "localhost") {
return nil, fmt.Errorf("localstorage: unsupported strm file url host %q", u.Host)
}
return d.localSTRMLink(strmPath, u.Path)
case "":
// Local path below.
default:
return nil, fmt.Errorf("localstorage: unsupported strm target scheme %q", u.Scheme)
}
} else if strings.Contains(target, "://") {
return nil, fmt.Errorf("localstorage: invalid strm url %q: %w", target, err)
}
return d.localSTRMLink(strmPath, target)
}
func readSTRMTarget(path string) (string, error) {
f, err := os.Open(path)
if err != nil {
return "", err
}
defer f.Close()
data, err := io.ReadAll(io.LimitReader(f, maxSTRMBytes+1))
if err != nil {
return "", err
}
if len(data) > maxSTRMBytes {
return "", errors.New("localstorage: strm file is too large")
}
lines := strings.Split(string(data), "\n")
for i, line := range lines {
if i == 0 {
line = strings.TrimPrefix(line, "\ufeff")
}
line = strings.TrimSpace(line)
if line != "" {
return line, nil
}
}
return "", errors.New("localstorage: empty strm target")
}
func (d *Driver) localSTRMLink(strmPath, target string) (*drives.StreamLink, error) {
target = strings.TrimSpace(target)
if target == "" {
return nil, errors.New("localstorage: empty strm target")
}
var p string
if filepath.IsAbs(target) {
p = filepath.Clean(target)
} else {
p = filepath.Join(filepath.Dir(strmPath), filepath.FromSlash(target))
}
p, err := filepath.Abs(p)
if err != nil {
return nil, err
}
root, err := d.root()
if err != nil {
return nil, err
}
realPath, within, err := realPathWithinRoot(root, p)
if err != nil {
return nil, err
}
if !within {
return nil, errors.New("localstorage: strm target escapes root")
}
if strings.EqualFold(filepath.Ext(p), ".strm") || strings.EqualFold(filepath.Ext(realPath), ".strm") {
return nil, errors.New("localstorage: nested strm target is not supported")
}
info, err := os.Stat(realPath)
if err != nil {
return nil, err
}
if info.IsDir() || !info.Mode().IsRegular() || info.Size() <= 0 {
return nil, os.ErrNotExist
}
return &drives.StreamLink{
URL: realPath,
Expires: time.Now().Add(24 * time.Hour),
}, nil
}
func (d *Driver) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *Driver) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
if err := ctx.Err(); err != nil {
return err
}
p, rel, err := d.pathForID(fileID)
if err != nil {
if os.IsNotExist(err) {
return nil
}
return err
}
if rel == "" {
return errors.New("localstorage: refusing to remove root")
}
info, err := os.Stat(p)
if err != nil {
if os.IsNotExist(err) {
return nil
}
return err
}
if info.IsDir() {
return errors.New("localstorage: refusing to remove directory")
}
if !info.Mode().IsRegular() {
return errors.New("localstorage: refusing to remove non-regular file")
}
if err := os.Remove(p); err != nil && !os.IsNotExist(err) {
return err
}
return nil
}
func (d *Driver) root() (string, error) {
raw := strings.TrimSpace(d.rootPath)
if raw == "" {
return "", errors.New("localstorage: empty path")
}
raw = os.ExpandEnv(raw)
if strings.HasPrefix(raw, "~") {
if home, err := os.UserHomeDir(); err == nil && home != "" {
switch {
case raw == "~":
raw = home
case strings.HasPrefix(raw, "~/") || strings.HasPrefix(raw, `~\`):
raw = filepath.Join(home, raw[2:])
}
}
}
return filepath.Abs(raw)
}
var _ drives.Remover = (*Driver)(nil)
func (d *Driver) pathForID(id string) (string, string, error) {
root, err := d.root()
if err != nil {
return "", "", err
}
rel, err := decodeRel(id)
if err != nil {
return "", "", err
}
if rel == "" {
return root, "", nil
}
p, err := filepath.Abs(filepath.Join(root, filepath.FromSlash(rel)))
if err != nil {
return "", "", err
}
if !pathWithinRoot(root, p) {
return "", "", errors.New("localstorage: path escapes root")
}
if _, within, err := realPathWithinRoot(root, p); err != nil {
return "", "", err
} else if !within {
return "", "", errors.New("localstorage: path escapes root")
}
return p, rel, nil
}
func pathWithinRoot(root, path string) bool {
rel, err := filepath.Rel(root, path)
if err != nil {
return false
}
return rel == "." || (rel != ".." && !strings.HasPrefix(rel, ".."+string(os.PathSeparator)))
}
func realPathWithinRoot(root, path string) (string, bool, error) {
realRoot, err := filepath.EvalSymlinks(root)
if err != nil {
return "", false, err
}
realRoot, err = filepath.Abs(realRoot)
if err != nil {
return "", false, err
}
realPath, err := filepath.EvalSymlinks(path)
if err != nil {
return "", false, err
}
realPath, err = filepath.Abs(realPath)
if err != nil {
return "", false, err
}
return realPath, pathWithinRoot(realRoot, realPath), nil
}
func localStoragePathHint(configured string) string {
cwd, _ := os.Getwd()
parts := []string{}
if strings.TrimSpace(configured) != "" {
parts = append(parts, fmt.Sprintf("configured=%q", strings.TrimSpace(configured)))
}
if cwd != "" {
parts = append(parts, fmt.Sprintf("cwd=%q", cwd))
}
if _, err := os.Stat("/.dockerenv"); err == nil {
parts = append(parts, "docker=host paths must be bind-mounted into the container")
}
if len(parts) == 0 {
return ""
}
return " (" + strings.Join(parts, ", ") + ")"
}
func decodeRel(id string) (string, error) {
id = strings.TrimSpace(id)
if id == "" || id == "/" {
return "", nil
}
raw, err := base64.RawURLEncoding.DecodeString(id)
if err != nil {
return "", fmt.Errorf("localstorage: invalid file id: %w", err)
}
rel := filepath.ToSlash(filepath.Clean(filepath.FromSlash(string(raw))))
if rel == "." {
return "", nil
}
if strings.HasPrefix(rel, "../") || rel == ".." || strings.HasPrefix(rel, "/") {
return "", errors.New("localstorage: invalid relative path")
}
return rel, nil
}
func encodeRel(rel string) string {
rel = filepath.ToSlash(filepath.Clean(filepath.FromSlash(rel)))
if rel == "." || rel == "" {
return "/"
}
return base64.RawURLEncoding.EncodeToString([]byte(rel))
}
func idForRel(rel string) string {
if rel == "" {
return "/"
}
return encodeRel(rel)
}
func joinRel(parent, name string) string {
if parent == "" {
return filepath.ToSlash(name)
}
return filepath.ToSlash(filepath.Join(filepath.FromSlash(parent), name))
}
func parentRel(rel string) string {
if rel == "" {
return ""
}
parent := filepath.ToSlash(filepath.Dir(filepath.FromSlash(rel)))
if parent == "." {
return ""
}
return parent
}
func sizeForEntry(info os.FileInfo) int64 {
if info == nil || info.IsDir() {
return 0
}
return info.Size()
}
var _ drives.Drive = (*Driver)(nil)
@@ -0,0 +1,339 @@
package localstorage
import (
"context"
"encoding/base64"
"os"
"path/filepath"
"strings"
"testing"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/scanner"
)
func TestListEncodesRelativePathsAndStreamURLResolvesFile(t *testing.T) {
root := t.TempDir()
sub := filepath.Join(root, "clips")
if err := os.MkdirAll(sub, 0o755); err != nil {
t.Fatalf("mkdir: %v", err)
}
videoPath := filepath.Join(sub, "sample.mp4")
if err := os.WriteFile(videoPath, []byte("video"), 0o644); err != nil {
t.Fatalf("write video: %v", err)
}
drv := New(Config{ID: "local", RootPath: root})
if err := drv.Init(context.Background()); err != nil {
t.Fatalf("init: %v", err)
}
rootEntries, err := drv.List(context.Background(), drv.RootID())
if err != nil {
t.Fatalf("list root: %v", err)
}
if len(rootEntries) != 1 || !rootEntries[0].IsDir {
t.Fatalf("root entries = %#v, want one directory", rootEntries)
}
if strings.Contains(rootEntries[0].ID, "/") {
t.Fatalf("encoded dir id contains slash: %q", rootEntries[0].ID)
}
fileEntries, err := drv.List(context.Background(), rootEntries[0].ID)
if err != nil {
t.Fatalf("list subdir: %v", err)
}
if len(fileEntries) != 1 || fileEntries[0].Name != "sample.mp4" {
t.Fatalf("file entries = %#v, want sample.mp4", fileEntries)
}
if strings.Contains(fileEntries[0].ID, "/") {
t.Fatalf("encoded file id contains slash: %q", fileEntries[0].ID)
}
link, err := drv.StreamURL(context.Background(), fileEntries[0].ID)
if err != nil {
t.Fatalf("stream url: %v", err)
}
if link.URL != videoPath {
t.Fatalf("url = %q, want %q", link.URL, videoPath)
}
}
func TestStreamURLResolvesHTTPSTRM(t *testing.T) {
root := t.TempDir()
strmPath := filepath.Join(root, "movie.strm")
target := "https://media.example/clip.mp4?token=abc"
if err := os.WriteFile(strmPath, []byte("\ufeff\n "+target+"\n"), 0o644); err != nil {
t.Fatalf("write strm: %v", err)
}
drv := New(Config{ID: "local", RootPath: root})
link, err := drv.StreamURL(context.Background(), encodeRel("movie.strm"))
if err != nil {
t.Fatalf("stream url: %v", err)
}
if link.URL != target {
t.Fatalf("url = %q, want %q", link.URL, target)
}
}
func TestStreamURLResolvesRelativeLocalSTRM(t *testing.T) {
root := t.TempDir()
if err := os.MkdirAll(filepath.Join(root, "links"), 0o755); err != nil {
t.Fatalf("mkdir links: %v", err)
}
if err := os.MkdirAll(filepath.Join(root, "media"), 0o755); err != nil {
t.Fatalf("mkdir media: %v", err)
}
videoPath := filepath.Join(root, "media", "clip.mp4")
if err := os.WriteFile(videoPath, []byte("video"), 0o644); err != nil {
t.Fatalf("write video: %v", err)
}
if err := os.WriteFile(filepath.Join(root, "links", "movie.strm"), []byte("../media/clip.mp4\n"), 0o644); err != nil {
t.Fatalf("write strm: %v", err)
}
drv := New(Config{ID: "local", RootPath: root})
link, err := drv.StreamURL(context.Background(), encodeRel("links/movie.strm"))
if err != nil {
t.Fatalf("stream url: %v", err)
}
if link.URL != videoPath {
t.Fatalf("url = %q, want %q", link.URL, videoPath)
}
}
func TestStreamURLRejectsInvalidSTRMTargets(t *testing.T) {
tests := []struct {
name string
setup func(t *testing.T, root string) string
want string
}{
{
name: "empty",
setup: func(t *testing.T, root string) string {
t.Helper()
writeLocalStorageTestFile(t, filepath.Join(root, "empty.strm"), []byte("\n \r\n"))
return "empty.strm"
},
want: "empty strm target",
},
{
name: "escapes root",
setup: func(t *testing.T, root string) string {
t.Helper()
writeLocalStorageTestFile(t, filepath.Join(filepath.Dir(root), "outside.mp4"), []byte("video"))
writeLocalStorageTestFile(t, filepath.Join(root, "escape.strm"), []byte("../outside.mp4\n"))
return "escape.strm"
},
want: "escapes root",
},
{
name: "nested",
setup: func(t *testing.T, root string) string {
t.Helper()
writeLocalStorageTestFile(t, filepath.Join(root, "nested.strm"), []byte("https://media.example/clip.mp4\n"))
writeLocalStorageTestFile(t, filepath.Join(root, "outer.strm"), []byte("nested.strm\n"))
return "outer.strm"
},
want: "nested strm target",
},
{
name: "unsupported scheme",
setup: func(t *testing.T, root string) string {
t.Helper()
writeLocalStorageTestFile(t, filepath.Join(root, "ftp.strm"), []byte("ftp://media.example/clip.mp4\n"))
return "ftp.strm"
},
want: "unsupported strm target scheme",
},
{
name: "too large",
setup: func(t *testing.T, root string) string {
t.Helper()
writeLocalStorageTestFile(t, filepath.Join(root, "large.strm"), []byte(strings.Repeat("x", maxSTRMBytes+1)))
return "large.strm"
},
want: "strm file is too large",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
root := t.TempDir()
rel := tt.setup(t, root)
drv := New(Config{ID: "local", RootPath: root})
_, err := drv.StreamURL(context.Background(), encodeRel(rel))
if err == nil || !strings.Contains(err.Error(), tt.want) {
t.Fatalf("error = %v, want contain %q", err, tt.want)
}
})
}
}
func TestStreamURLRejectsSTRMTargetEscapingRootThroughSymlink(t *testing.T) {
root := t.TempDir()
outside := t.TempDir()
writeLocalStorageTestFile(t, filepath.Join(outside, "secret.mp4"), []byte("secret"))
if err := os.MkdirAll(filepath.Join(root, "links"), 0o755); err != nil {
t.Fatalf("mkdir links: %v", err)
}
if err := os.MkdirAll(filepath.Join(root, "real"), 0o755); err != nil {
t.Fatalf("mkdir real: %v", err)
}
if err := os.Symlink(outside, filepath.Join(root, "real", "outside")); err != nil {
t.Fatalf("symlink: %v", err)
}
writeLocalStorageTestFile(t, filepath.Join(root, "links", "movie.strm"), []byte("../real/outside/secret.mp4\n"))
drv := New(Config{ID: "local", RootPath: root})
_, err := drv.StreamURL(context.Background(), encodeRel("links/movie.strm"))
if err == nil || !strings.Contains(err.Error(), "strm target escapes root") {
t.Fatalf("error = %v, want strm target escapes root", err)
}
}
func TestStreamURLRejectsSymlinkFileIDEscapingRoot(t *testing.T) {
root := t.TempDir()
outside := t.TempDir()
writeLocalStorageTestFile(t, filepath.Join(outside, "secret.mp4"), []byte("secret"))
if err := os.Symlink(filepath.Join(outside, "secret.mp4"), filepath.Join(root, "link.mp4")); err != nil {
t.Fatalf("symlink: %v", err)
}
drv := New(Config{ID: "local", RootPath: root})
_, err := drv.StreamURL(context.Background(), encodeRel("link.mp4"))
if err == nil || !strings.Contains(err.Error(), "path escapes root") {
t.Fatalf("error = %v, want path escapes root", err)
}
}
func TestStreamURLRejectsEscapingID(t *testing.T) {
drv := New(Config{ID: "local", RootPath: t.TempDir()})
escaped := base64.RawURLEncoding.EncodeToString([]byte("../secret.mp4"))
_, err := drv.StreamURL(context.Background(), escaped)
if err == nil || !strings.Contains(err.Error(), "invalid relative path") {
t.Fatalf("error = %v, want invalid relative path", err)
}
}
func TestInitRequiresExistingDirectory(t *testing.T) {
missing := filepath.Join(t.TempDir(), "missing")
drv := New(Config{ID: "local", RootPath: missing})
err := drv.Init(context.Background())
if err == nil || !strings.Contains(err.Error(), "stat root") {
t.Fatalf("error = %v, want stat root failure", err)
}
if !strings.Contains(err.Error(), missing) || !strings.Contains(err.Error(), "configured=") {
t.Fatalf("error = %v, want diagnostic path details", err)
}
}
func TestPathForIDAllowsRootPathSlash(t *testing.T) {
drv := New(Config{ID: "local", RootPath: string(os.PathSeparator)})
childID := encodeRel("tmp")
path, rel, err := drv.pathForID(childID)
if err != nil {
t.Fatalf("pathForID: %v", err)
}
if rel != "tmp" {
t.Fatalf("rel = %q, want tmp", rel)
}
if path != filepath.Join(string(os.PathSeparator), "tmp") {
t.Fatalf("path = %q, want /tmp", path)
}
}
func TestScannerPersistsLocalStorageSTRM(t *testing.T) {
ctx := context.Background()
root := t.TempDir()
if err := os.MkdirAll(filepath.Join(root, "collection"), 0o755); err != nil {
t.Fatalf("mkdir collection: %v", err)
}
if err := os.WriteFile(filepath.Join(root, "collection", "clip.strm"), []byte("https://media.example/clip.mp4\n"), 0o644); err != nil {
t.Fatalf("write strm: %v", err)
}
cat, err := catalog.Open(filepath.Join(t.TempDir(), "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "local", RootPath: root})
sc := scanner.New(cat, drv, []string{".strm"}, nil, nil)
stats, err := sc.Run(ctx, drv.RootID())
if err != nil {
t.Fatalf("scan: %v", err)
}
if stats.Added != 1 {
t.Fatalf("added = %d, want 1", stats.Added)
}
fileID := encodeRel("collection/clip.strm")
got, err := cat.GetVideo(ctx, Kind+"-local-"+fileID)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.Ext != "strm" || got.FileID != fileID || got.Category != "collection" {
t.Fatalf("video = %#v, want local strm video in collection", got)
}
}
func TestScannerPersistsLocalStorageVideo(t *testing.T) {
ctx := context.Background()
root := t.TempDir()
if err := os.MkdirAll(filepath.Join(root, "collection"), 0o755); err != nil {
t.Fatalf("mkdir collection: %v", err)
}
if err := os.WriteFile(filepath.Join(root, "collection", "clip.mp4"), []byte("video"), 0o644); err != nil {
t.Fatalf("write video: %v", err)
}
cat, err := catalog.Open(filepath.Join(t.TempDir(), "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "local", RootPath: root})
sc := scanner.New(cat, drv, []string{".mp4"}, nil, nil)
stats, err := sc.Run(ctx, drv.RootID())
if err != nil {
t.Fatalf("scan: %v", err)
}
if stats.Added != 1 {
t.Fatalf("added = %d, want 1", stats.Added)
}
fileID := encodeRel("collection/clip.mp4")
got, err := cat.GetVideo(ctx, Kind+"-local-"+fileID)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.DriveID != "local" || got.FileID != fileID || got.Category != "collection" {
t.Fatalf("video = %#v, want local drive video in collection", got)
}
}
func writeLocalStorageTestFile(t *testing.T, path string, data []byte) {
t.Helper()
if err := os.WriteFile(path, data, 0o644); err != nil {
t.Fatalf("write %s: %v", path, err)
}
}
@@ -78,12 +78,38 @@ func (d *Driver) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
if err := ctx.Err(); err != nil {
return err
}
path, err := d.uploadPath(fileID)
if err != nil {
return err
}
info, err := os.Stat(path)
if err != nil {
if os.IsNotExist(err) {
return nil
}
return err
}
if info.IsDir() {
return errors.New("localupload: refusing to remove directory")
}
if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
return err
}
return nil
}
func (d *Driver) RootID() string { return d.uploadDir() }
func (d *Driver) uploadDir() string {
return d.uploadDirPath
}
var _ drives.Remover = (*Driver)(nil)
func (d *Driver) uploadPath(fileID string) (string, error) {
if strings.TrimSpace(fileID) == "" || filepath.Base(fileID) != fileID {
return "", errors.New("invalid upload file id")
+333 -22
@@ -3,14 +3,19 @@ package onedrive
import (
"bytes"
"context"
"crypto/sha1"
"encoding/hex"
"encoding/json"
"errors"
"fmt"
"io"
"log"
"net/http"
"net/url"
"path"
"strconv"
"strings"
"sync"
"time"
"github.com/go-resty/resty/v2"
@@ -18,8 +23,17 @@ import (
)
const (
maxSmallUploadSize = 250 * 1024 * 1024
defaultRenewAPIURL = "https://api.oplist.org/onedrive/renewapi"
maxSmallUploadSize = 250 * 1024 * 1024
defaultUploadSessionChunk = 10 * 1024 * 1024
uploadSessionRetryAttempts = 3
defaultRenewAPIURL = "https://api.oplist.org/onedrive/renewapi"
onedriveListCooldown = 5 * time.Minute
onedriveListInterval = 1 * time.Second
)
var (
smallUploadThreshold = int64(maxSmallUploadSize)
uploadSessionChunk = int64(defaultUploadSessionChunk)
)
type Driver struct {
@@ -34,6 +48,11 @@ type Driver struct {
renewAPIURL string
client *resty.Client
onTokenUpdate func(access, refresh string)
listMu sync.Mutex
lastListAt time.Time
listInterval time.Duration
listCooldown time.Duration
}
type Config struct {
@@ -85,6 +104,8 @@ func New(c Config) *Driver {
client: resty.New().
SetTimeout(30*time.Second).
SetHeader("Accept", "application/json, text/plain, */*"),
listInterval: onedriveListInterval,
listCooldown: onedriveListCooldown,
}
}
@@ -106,10 +127,16 @@ func (d *Driver) List(ctx context.Context, dirID string) ([]drives.Entry, error)
if dirID == "" {
dirID = d.rootID
}
d.listMu.Lock()
defer d.listMu.Unlock()
nextLink := d.childrenURL(dirID)
first := true
out := make([]drives.Entry, 0)
for nextLink != "" {
if err := d.waitForListSlotLocked(ctx); err != nil {
return nil, err
}
var resp filesResp
err := d.request(ctx, nextLink, http.MethodGet, func(req *resty.Request) {
if first {
@@ -120,6 +147,19 @@ func (d *Driver) List(ctx context.Context, dirID string) ([]drives.Entry, error)
}
}, &resp)
if err != nil {
if wait, ok := drives.RateLimitRetryAfter(err); ok {
if wait <= 0 {
wait = d.listCooldown
if wait <= 0 {
wait = onedriveListCooldown
}
}
log.Printf("[onedrive] list cooling down drive=%s dir=%s cooldown=%s err=%v", d.id, dirID, wait, err)
if err := sleepContext(ctx, wait); err != nil {
return nil, err
}
continue
}
return nil, fmt.Errorf("onedrive list: %w", err)
}
for _, item := range resp.Value {
@@ -131,6 +171,36 @@ func (d *Driver) List(ctx context.Context, dirID string) ([]drives.Entry, error)
return out, nil
}
func (d *Driver) waitForListSlotLocked(ctx context.Context) error {
if d.listInterval <= 0 || d.lastListAt.IsZero() {
d.lastListAt = time.Now()
return ctx.Err()
}
next := d.lastListAt.Add(d.listInterval)
now := time.Now()
if now.Before(next) {
if err := sleepContext(ctx, next.Sub(now)); err != nil {
return err
}
}
d.lastListAt = time.Now()
return ctx.Err()
}
func sleepContext(ctx context.Context, d time.Duration) error {
if d <= 0 {
return ctx.Err()
}
timer := time.NewTimer(d)
defer timer.Stop()
select {
case <-ctx.Done():
return ctx.Err()
case <-timer.C:
return nil
}
}
func (d *Driver) Stat(ctx context.Context, fileID string) (*drives.Entry, error) {
var item graphItem
if err := d.request(ctx, d.itemURL(fileID), http.MethodGet, nil, &item); err != nil {
@@ -156,15 +226,49 @@ func (d *Driver) StreamURL(ctx context.Context, fileID string) (*drives.StreamLi
}
func (d *Driver) Upload(ctx context.Context, parentID, name string, r io.Reader, size int64) (string, error) {
res, err := d.UploadAndReportHash(ctx, parentID, name, r, size)
if err != nil {
return "", err
}
return res.FileID, nil
}
func (d *Driver) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
parentID, name, err := d.normalizeUploadArgs(parentID, name, r, size)
if err != nil {
return UploadResult{}, err
}
threshold := smallUploadThreshold
if threshold <= 0 {
threshold = maxSmallUploadSize
}
if size <= threshold {
return d.uploadSmallAndReportHash(ctx, parentID, name, r, size, threshold)
}
return d.uploadSessionAndReportHash(ctx, parentID, name, r, size)
}
func (d *Driver) normalizeUploadArgs(parentID, name string, r io.Reader, size int64) (string, string, error) {
if r == nil {
return "", "", errors.New("onedrive upload: body is required")
}
if size < 0 {
return "", "", fmt.Errorf("onedrive upload: invalid size %d", size)
}
if parentID == "" {
parentID = d.rootID
}
if size > maxSmallUploadSize {
return "", fmt.Errorf("onedrive upload: files over %d bytes require upload session", maxSmallUploadSize)
name = strings.TrimSpace(name)
if name == "" {
return "", "", errors.New("onedrive upload: empty file name")
}
data, err := readSmallUpload(r)
return parentID, name, nil
}
func (d *Driver) uploadSmallAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size, limit int64) (UploadResult, error) {
data, hash, actualSize, err := readSmallUpload(r, size, limit)
if err != nil {
return "", err
return UploadResult{}, err
}
u := fmt.Sprintf("%s/items/%s:/%s:/content", d.driveBaseURL(), url.PathEscape(parentID), url.PathEscape(name))
var item graphItem
@@ -173,26 +277,159 @@ func (d *Driver) Upload(ctx context.Context, parentID, name string, r io.Reader,
req.SetContentLength(true)
}, &item)
if err != nil {
return "", fmt.Errorf("onedrive upload: %w", err)
return UploadResult{}, fmt.Errorf("onedrive upload: %w", err)
}
if item.ID == "" {
return "", errors.New("onedrive upload: empty item id")
return UploadResult{}, errors.New("onedrive upload: empty item id")
}
return item.ID, nil
return UploadResult{FileID: item.ID, Hash: hash, Size: actualSize}, nil
}
func readSmallUpload(r io.Reader) ([]byte, error) {
if r == nil {
return nil, errors.New("onedrive upload: body is required")
}
data, err := io.ReadAll(io.LimitReader(r, maxSmallUploadSize+1))
func (d *Driver) uploadSessionAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
session, err := d.createUploadSession(ctx, parentID, name)
if err != nil {
return nil, fmt.Errorf("onedrive upload: read body: %w", err)
return UploadResult{}, err
}
if len(data) > maxSmallUploadSize {
return nil, fmt.Errorf("onedrive upload: files over %d bytes require upload session", maxSmallUploadSize)
if strings.TrimSpace(session.UploadURL) == "" {
return UploadResult{}, errors.New("onedrive upload session: empty upload url")
}
return data, nil
chunkSize := uploadSessionChunk
if chunkSize <= 0 {
chunkSize = defaultUploadSessionChunk
}
buf := make([]byte, int(chunkSize))
hasher := sha1.New()
var finalItem graphItem
var offset int64
for offset < size {
partSize := minInt64(chunkSize, size-offset)
chunk := buf[:int(partSize)]
n, err := io.ReadFull(r, chunk)
if err != nil {
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
return UploadResult{}, fmt.Errorf("onedrive upload: size mismatch: declared %d, copied %d", size, offset+int64(n))
}
return UploadResult{}, fmt.Errorf("onedrive upload: read body: %w", err)
}
chunk = chunk[:n]
_, _ = hasher.Write(chunk)
item, err := d.putUploadSessionChunkWithRetry(ctx, session.UploadURL, offset, size, chunk)
if err != nil {
return UploadResult{}, err
}
if item != nil {
finalItem = *item
}
offset += int64(n)
}
if finalItem.ID == "" {
return UploadResult{}, errors.New("onedrive upload session: empty item id")
}
return UploadResult{
FileID: finalItem.ID,
Hash: hex.EncodeToString(hasher.Sum(nil)),
Size: offset,
}, nil
}
func (d *Driver) createUploadSession(ctx context.Context, parentID, name string) (uploadSessionResp, error) {
u := fmt.Sprintf("%s/items/%s:/%s:/createUploadSession", d.driveBaseURL(), url.PathEscape(parentID), url.PathEscape(name))
body := map[string]any{
"item": map[string]any{
"@microsoft.graph.conflictBehavior": "rename",
},
}
var out uploadSessionResp
err := d.request(ctx, u, http.MethodPost, func(req *resty.Request) {
req.SetBody(body)
}, &out)
if err != nil {
return uploadSessionResp{}, fmt.Errorf("onedrive upload session: %w", err)
}
return out, nil
}
func (d *Driver) putUploadSessionChunkWithRetry(ctx context.Context, uploadURL string, start, total int64, data []byte) (*graphItem, error) {
var last error
for attempt := 0; attempt < uploadSessionRetryAttempts; attempt++ {
if attempt > 0 {
if err := sleepContext(ctx, time.Duration(attempt)*time.Second); err != nil {
return nil, err
}
}
item, retryable, err := d.putUploadSessionChunk(ctx, uploadURL, start, total, data)
if err == nil {
return item, nil
}
last = err
if !retryable {
return nil, err
}
}
if last == nil {
last = errors.New("onedrive upload session: retry attempts exhausted")
}
return nil, last
}
func (d *Driver) putUploadSessionChunk(ctx context.Context, uploadURL string, start, total int64, data []byte) (*graphItem, bool, error) {
end := start + int64(len(data)) - 1
req, err := http.NewRequestWithContext(ctx, http.MethodPut, uploadURL, bytes.NewReader(data))
if err != nil {
return nil, false, err
}
req.ContentLength = int64(len(data))
req.Header.Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", start, end, total))
res, err := http.DefaultClient.Do(req)
if err != nil {
return nil, true, err
}
defer res.Body.Close()
switch res.StatusCode {
case http.StatusOK, http.StatusCreated:
var item graphItem
if err := json.NewDecoder(res.Body).Decode(&item); err != nil {
return nil, false, fmt.Errorf("onedrive upload session: decode completed item: %w", err)
}
return &item, false, nil
case http.StatusAccepted:
return nil, false, nil
default:
body, _ := io.ReadAll(io.LimitReader(res.Body, 4096))
err := fmt.Errorf("onedrive upload session: status=%d body=%s", res.StatusCode, strings.TrimSpace(string(body)))
retryable := res.StatusCode == http.StatusTooManyRequests || (res.StatusCode >= 500 && res.StatusCode <= 504)
return nil, retryable, err
}
}
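Each chunk PUT above carries a `Content-Range` header of the form `bytes start-end/total`, where `end` is inclusive. As a small illustration of the arithmetic, the hypothetical helper `contentRanges` below lists the header values for a given total size and chunk size; it is not part of the driver.

```go
package main

import "fmt"

// contentRanges (hypothetical) returns the Content-Range header values used
// when uploading total bytes in parts of at most chunk bytes each.
func contentRanges(total, chunk int64) []string {
	if total <= 0 || chunk <= 0 {
		return nil
	}
	var out []string
	for start := int64(0); start < total; start += chunk {
		end := start + chunk - 1 // inclusive end offset
		if end >= total {
			end = total - 1 // the final part may be shorter
		}
		out = append(out, fmt.Sprintf("bytes %d-%d/%d", start, end, total))
	}
	return out
}
```

For a 13-byte body with 4-byte chunks this yields `bytes 0-3/13`, `bytes 4-7/13`, `bytes 8-11/13`, and `bytes 12-12/13`, the same sequence the large-upload test in this change asserts.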
func readSmallUpload(r io.Reader, declaredSize, limit int64) ([]byte, string, int64, error) {
if r == nil {
return nil, "", 0, errors.New("onedrive upload: body is required")
}
if limit <= 0 {
limit = maxSmallUploadSize
}
data, err := io.ReadAll(io.LimitReader(r, limit+1))
if err != nil {
return nil, "", 0, fmt.Errorf("onedrive upload: read body: %w", err)
}
if int64(len(data)) > limit {
return nil, "", 0, fmt.Errorf("onedrive upload: files over %d bytes require upload session", limit)
}
if declaredSize >= 0 && int64(len(data)) != declaredSize {
return nil, "", 0, fmt.Errorf("onedrive upload: size mismatch: declared %d, copied %d", declaredSize, len(data))
}
sum := sha1.Sum(data)
return data, hex.EncodeToString(sum[:]), int64(len(data)), nil
}
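`readSmallUpload` reads at most `limit+1` bytes so it can tell "exactly at the limit" apart from "over the limit" without buffering an arbitrarily large body. A minimal sketch of that trick, with hypothetical names `readBounded` and `hashString`:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"io"
	"strings"
)

// readBounded (hypothetical) buffers r up to limit bytes and returns the data
// with its SHA-1 hex digest. Reading limit+1 bytes exposes oversized bodies.
func readBounded(r io.Reader, limit int64) ([]byte, string, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, "", err
	}
	if int64(len(data)) > limit {
		return nil, "", errors.New("body exceeds limit")
	}
	sum := sha1.Sum(data)
	return data, hex.EncodeToString(sum[:]), nil
}

// hashString is a convenience wrapper for trying readBounded on a string.
func hashString(s string, limit int64) (string, error) {
	_, hash, err := readBounded(strings.NewReader(s), limit)
	return hash, err
}
```

With a limit of 3, `hashString("abc", 3)` succeeds, while `hashString("abcd", 3)` fails because the fourth byte is read and detected.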
func minInt64(a, b int64) int64 {
if a < b {
return a
}
return b
}
func (d *Driver) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
@@ -245,6 +482,36 @@ func (d *Driver) makeDir(ctx context.Context, parentID, name string) (string, er
return item.ID, nil
}
func (d *Driver) Rename(ctx context.Context, fileID, newName string) error {
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return errors.New("onedrive rename: empty file id")
}
newName = strings.TrimSpace(newName)
if newName == "" {
return errors.New("onedrive rename: empty new name")
}
var item graphItem
err := d.request(ctx, d.itemURL(fileID), http.MethodPatch, func(req *resty.Request) {
req.SetBody(map[string]string{"name": newName})
}, &item)
if err != nil {
return fmt.Errorf("onedrive rename: %w", err)
}
return nil
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return errors.New("onedrive remove: empty file id")
}
if err := d.request(ctx, d.itemURL(fileID), http.MethodDelete, nil, nil); err != nil {
return fmt.Errorf("onedrive remove: %w", err)
}
return nil
}
func (d *Driver) request(ctx context.Context, rawURL, method string, configure func(*resty.Request), out any) error {
return d.requestOnce(ctx, rawURL, method, configure, out, true)
}
@@ -265,7 +532,7 @@ func (d *Driver) requestOnce(ctx context.Context, rawURL, method string, configu
if err != nil {
return err
}
if isRateLimitResponse(res, graphErr.Error.Code) {
if isRateLimitResponse(res, graphErr.Error.Code, graphErr.Error.Message) {
return onedriveRateLimitError(res, graphErr.Error.Message)
}
if graphErr.Error.Code != "" {
@@ -327,11 +594,54 @@ func (d *Driver) refresh(ctx context.Context) error {
return nil
}
func isRateLimitResponse(res *resty.Response, code string) bool {
if code == "TooManyRequests" || code == "activityLimitReached" {
func isRateLimitResponse(res *resty.Response, code, message string) bool {
if isRateLimitCode(code) || isRateLimitMessage(message) {
return true
}
return res != nil && res.StatusCode() == http.StatusTooManyRequests
if res == nil {
return false
}
if res.StatusCode() == http.StatusTooManyRequests {
return true
}
if res.Header().Get("Retry-After") == "" {
return false
}
switch res.StatusCode() {
case http.StatusServiceUnavailable, http.StatusGatewayTimeout:
return true
default:
return false
}
}
func isRateLimitCode(code string) bool {
normalized := strings.ToLower(strings.ReplaceAll(strings.TrimSpace(code), "_", ""))
normalized = strings.ReplaceAll(normalized, "-", "")
switch normalized {
case "toomanyrequests",
"activitylimitreached",
"throttledrequest",
"requestthrottled",
"resourcethrottled",
"applicationthrottled",
"tenantthrottled":
return true
default:
return false
}
}
func isRateLimitMessage(message string) bool {
text := strings.ToLower(strings.TrimSpace(message))
if text == "" {
return false
}
return strings.Contains(text, "too many requests") ||
strings.Contains(text, "throttl") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "activity limit") ||
strings.Contains(text, "temporarily blocked")
}
func onedriveRateLimitError(res *resty.Response, message string) error {
@@ -442,3 +752,4 @@ func guessMime(name string) string {
}
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
+199 -1
@@ -2,6 +2,8 @@ package onedrive
import (
"context"
"crypto/sha1"
"encoding/hex"
"encoding/json"
"errors"
"io"
@@ -199,7 +201,7 @@ func TestGraph429ReturnsRateLimitErrorWithRetryAfter(t *testing.T) {
APIBaseURL: srv.URL,
})
_, err := d.List(context.Background(), "root")
_, err := d.StreamURL(context.Background(), "file-id")
if err == nil {
t.Fatal("StreamURL succeeded, want rate limit error")
}
@@ -212,6 +214,92 @@ func TestGraph429ReturnsRateLimitErrorWithRetryAfter(t *testing.T) {
}
}
func TestGraphThrottleMessageReturnsRateLimitError(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusForbidden)
if err := json.NewEncoder(w).Encode(map[string]any{
"error": map[string]any{
"code": "generalException",
"message": "The request has been throttled. Please try again later.",
},
}); err != nil {
t.Fatalf("write json: %v", err)
}
}))
defer srv.Close()
d := New(Config{
ID: "od-main",
AccessToken: "access-token",
RefreshToken: "refresh-token",
APIBaseURL: srv.URL,
})
_, err := d.StreamURL(context.Background(), "file-id")
if err == nil {
t.Fatal("StreamURL succeeded, want rate limit error")
}
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want RateLimitError", err)
}
}
func TestListCoolsDownAndRetriesOneDriveRateLimit(t *testing.T) {
var calls int
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/v1.0/me/drive/items/root/children" {
t.Fatalf("unexpected request %s %s", r.Method, r.URL.String())
}
calls++
if calls == 1 {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusTooManyRequests)
if err := json.NewEncoder(w).Encode(map[string]any{
"error": map[string]any{
"code": "TooManyRequests",
"message": "throttled",
},
}); err != nil {
t.Fatalf("write json: %v", err)
}
return
}
writeJSON(t, w, map[string]any{
"value": []map[string]any{
{
"id": "file-id",
"name": "demo.mp4",
"size": 100,
"file": map[string]any{"mimeType": "video/mp4"},
},
},
})
}))
defer srv.Close()
d := New(Config{
ID: "od-main",
AccessToken: "access-token",
RefreshToken: "refresh-token",
APIBaseURL: srv.URL,
})
d.listInterval = 0
d.listCooldown = time.Millisecond
got, err := d.List(context.Background(), "root")
if err != nil {
t.Fatalf("list: %v", err)
}
if calls != 2 {
t.Fatalf("calls = %d, want retry after rate limit", calls)
}
if len(got) != 1 || got[0].ID != "file-id" {
t.Fatalf("entries = %#v, want retried file", got)
}
}
func TestStatAndStreamURLUseDriveItemMetadata(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if got := r.Header.Get("Authorization"); got != "Bearer access-token" {
@@ -320,6 +408,36 @@ func TestEnsureDirCreatesMissingFolders(t *testing.T) {
}
}
func TestRenamePatchesDriveItemName(t *testing.T) {
var body map[string]string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPatch || r.URL.EscapedPath() != "/v1.0/me/drive/items/file-id" {
t.Fatalf("unexpected request %s %s", r.Method, r.URL.String())
}
if got := r.Header.Get("Authorization"); got != "Bearer access-token" {
t.Fatalf("authorization = %q, want bearer token", got)
}
if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
t.Fatalf("decode body: %v", err)
}
writeJSON(t, w, map[string]any{"id": "file-id", "name": "new name.mp4"})
}))
defer srv.Close()
d := New(Config{
ID: "od-main",
AccessToken: "access-token",
RefreshToken: "refresh-token",
APIBaseURL: srv.URL,
})
if err := d.Rename(context.Background(), "file-id", "new name.mp4"); err != nil {
t.Fatalf("rename: %v", err)
}
if body["name"] != "new name.mp4" {
t.Fatalf("rename body = %#v, want new name", body)
}
}
func TestUploadSmallFileReturnsNewItemID(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if got := r.Header.Get("Authorization"); got != "Bearer access-token" {
@@ -358,6 +476,86 @@ func TestUploadSmallFileReturnsNewItemID(t *testing.T) {
}
}
func TestUploadLargeFileUsesUploadSessionAndReportsHash(t *testing.T) {
oldThreshold := smallUploadThreshold
oldChunk := uploadSessionChunk
smallUploadThreshold = 8
uploadSessionChunk = 4
t.Cleanup(func() {
smallUploadThreshold = oldThreshold
uploadSessionChunk = oldChunk
})
body := "0123456789abc"
var ranges []string
var chunks []string
var createdSession bool
var srv *httptest.Server
srv = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch {
case r.Method == http.MethodPost && r.URL.EscapedPath() == "/v1.0/me/drive/items/parent-id:/big.mp4:/createUploadSession":
createdSession = true
if got := r.Header.Get("Authorization"); got != "Bearer access-token" {
t.Fatalf("authorization = %q, want bearer token", got)
}
writeJSON(t, w, map[string]any{"uploadUrl": srv.URL + "/upload-session"})
case r.Method == http.MethodPut && r.URL.Path == "/upload-session":
ranges = append(ranges, r.Header.Get("Content-Range"))
data, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("read chunk: %v", err)
}
chunks = append(chunks, string(data))
if len(ranges) < 4 {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusAccepted)
if _, err := w.Write([]byte(`{"nextExpectedRanges":["0-"]}`)); err != nil {
t.Fatalf("write accepted: %v", err)
}
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusCreated)
if err := json.NewEncoder(w).Encode(map[string]any{"id": "uploaded-big-id"}); err != nil {
t.Fatalf("write final item: %v", err)
}
default:
t.Fatalf("unexpected request %s %s", r.Method, r.URL.String())
}
}))
defer srv.Close()
d := New(Config{
ID: "od-main",
AccessToken: "access-token",
RefreshToken: "refresh-token",
APIBaseURL: srv.URL,
})
got, err := d.UploadAndReportHash(context.Background(), "parent-id", "big.mp4", strings.NewReader(body), int64(len(body)))
if err != nil {
t.Fatalf("upload: %v", err)
}
if !createdSession {
t.Fatal("createUploadSession was not called")
}
wantRanges := []string{
"bytes 0-3/13",
"bytes 4-7/13",
"bytes 8-11/13",
"bytes 12-12/13",
}
if strings.Join(ranges, "|") != strings.Join(wantRanges, "|") {
t.Fatalf("ranges = %#v, want %#v", ranges, wantRanges)
}
if strings.Join(chunks, "") != body {
t.Fatalf("uploaded chunks = %q, want %q", strings.Join(chunks, ""), body)
}
sum := sha1.Sum([]byte(body))
if got.FileID != "uploaded-big-id" || got.Size != int64(len(body)) || got.Hash != hex.EncodeToString(sum[:]) {
t.Fatalf("upload result = %#v, want file id/hash/size for body", got)
}
}
func TestUploadRefreshesExpiredTokenAndReplaysBody(t *testing.T) {
var uploadAttempts int
var tokenRefreshes int
+10
@@ -82,3 +82,13 @@ type filesResp struct {
Value []graphItem `json:"value"`
NextLink string `json:"@odata.nextLink"`
}
type UploadResult struct {
FileID string
Hash string
Size int64
}
type uploadSessionResp struct {
UploadURL string `json:"uploadUrl"`
}
+36 -2
@@ -149,6 +149,10 @@ func sleepContext(ctx context.Context, d time.Duration) error {
}
func isTransient115ListError(err error) bool {
return isTransient115UpstreamError(err)
}
func isTransient115UpstreamError(err error) bool {
if err == nil {
return false
}
@@ -248,11 +252,11 @@ func (d *Driver) streamURLWithUA(ctx context.Context, fileID string, ua string)
// The pickCode must be fetched first.
f, err := d.client.GetFile(fileID)
if err != nil {
return nil, fmt.Errorf("115 get file: %w", err)
return nil, wrap115StreamTransientError("115 get file", err)
}
info, ua, err := d.downloadInfo(f.PickCode, ua)
if err != nil {
return nil, fmt.Errorf("115 download url: %w", err)
return nil, wrap115StreamTransientError("115 download url", err)
}
if info == nil || info.Url.Url == "" {
return nil, errors.New("115 download url: empty")
@@ -288,6 +292,18 @@ func (d *Driver) downloadInfo(pickCode string, ua string) (*sdk.DownloadInfo, st
return info, ua, nil
}
func wrap115StreamTransientError(op string, err error) error {
wrapped := fmt.Errorf("%s: %w", op, err)
if !isTransient115UpstreamError(err) {
return wrapped
}
return &drives.RateLimitError{
Provider: "p115",
RetryAfter: p115ListCooldown,
Err: wrapped,
}
}
func (d *Driver) Upload(ctx context.Context, parentID, name string, r io.Reader, size int64) (string, error) {
res, err := d.UploadAndReportSha1(ctx, parentID, name, r, size)
if err != nil {
@@ -445,6 +461,23 @@ func (d *Driver) Rename(ctx context.Context, fileID, newName string) error {
return nil
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
if d.client == nil {
return errors.New("p115 remove: driver not initialized")
}
if err := ctx.Err(); err != nil {
return err
}
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return errors.New("p115 remove: empty fileID")
}
if err := d.client.Delete(fileID); err != nil {
return fmt.Errorf("p115 remove: %w", err)
}
return nil
}
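// bufferAndHashSha1 copies all of r into a temporary file while computing its SHA1.
// It returns the temporary file (positioned at the end; the caller must Seek back to 0), the uppercase SHA1 hex digest, and the actual byte count.
//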
@@ -547,3 +580,4 @@ func guessMime(name string) string {
}
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
@@ -10,6 +10,9 @@ import (
"os"
"strings"
"testing"
"time"
"github.com/video-site/backend/internal/drives"
)
func TestIsTransient115ListError(t *testing.T) {
@@ -34,6 +37,42 @@ func TestIsTransient115ListError(t *testing.T) {
}
}
func TestWrap115StreamTransientError(t *testing.T) {
cases := []struct {
name string
err error
wantRateLimit bool
}{
{name: "unexpected", err: errors.New("unexpected error"), wantRateLimit: true},
{name: "405 blocked", err: errors.New("405 request has been blocked"), wantRateLimit: true},
{name: "429", err: errors.New("429 too many requests"), wantRateLimit: true},
{name: "blocked", err: errors.New("blocked by waf"), wantRateLimit: true},
{name: "auth", err: errors.New("invalid credential"), wantRateLimit: false},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
got := wrap115StreamTransientError("115 get file", tc.err)
var rateLimit *drives.RateLimitError
isRateLimit := errors.As(got, &rateLimit)
if isRateLimit != tc.wantRateLimit {
t.Fatalf("rate limit = %v, want %v; err=%v", isRateLimit, tc.wantRateLimit, got)
}
if !strings.Contains(got.Error(), "115 get file") {
t.Fatalf("err = %v, want operation prefix", got)
}
if tc.wantRateLimit {
if rateLimit.Provider != "p115" {
t.Fatalf("provider = %q, want p115", rateLimit.Provider)
}
if rateLimit.RetryAfter != 10*time.Minute {
t.Fatalf("retry after = %s, want 10m", rateLimit.RetryAfter)
}
}
})
}
}
// TestBufferAndHashSha1 verifies that bufferAndHashSha1:
//
// - writes every byte from the reader to the tmp file
File diff suppressed because it is too large
+487
@@ -0,0 +1,487 @@
package p123
import (
"bytes"
"context"
"crypto/md5"
"encoding/base64"
"encoding/json"
"errors"
"fmt"
"io"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"github.com/video-site/backend/internal/drives"
)
func TestStreamURLResolvesDownloadInfoRedirect(t *testing.T) {
ctx := context.Background()
var downloadReferer string
var download *httptest.Server
download = httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/resolve":
downloadReferer = r.Header.Get("Referer")
http.Redirect(w, r, download.URL+"/cdn/video.mp4", http.StatusFound)
case "/cdn/video.mp4":
t.Fatalf("driver followed redirect unexpectedly")
default:
http.NotFound(w, r)
}
}))
defer download.Close()
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
switch r.URL.Path {
case "/api/user/sign_in":
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 200,
"data": map[string]string{"token": "token-1"},
})
case "/b/api/user/info":
if got := r.Header.Get("Authorization"); got != "Bearer token-1" {
t.Fatalf("Authorization = %q, want bearer token", got)
}
_ = json.NewEncoder(w).Encode(map[string]any{"code": 0, "data": map[string]any{}})
case "/b/api/file/list/new":
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"Next": "-1",
"Total": 1,
"InfoList": []map[string]any{
{
"FileName": "video.mp4",
"Size": 1234,
"UpdateAt": "2026-01-02 03:04:05",
"FileId": 100,
"Type": 0,
"Etag": "ABCDEF",
"S3KeyFlag": "flag-1",
},
},
},
})
case "/b/api/file/download_info":
var body map[string]any
if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
t.Fatalf("decode download_info body: %v", err)
}
if got := body["fileName"]; got != "video.mp4" {
t.Fatalf("fileName = %#v, want cached file metadata", got)
}
if got := body["etag"]; got != "ABCDEF" {
t.Fatalf("etag = %#v, want cached etag", got)
}
entryURL := download.URL + "/entry?params=" + base64.StdEncoding.EncodeToString([]byte(download.URL+"/resolve"))
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]string{"DownloadUrl": entryURL},
})
default:
http.NotFound(w, r)
}
}))
defer api.Close()
var savedToken string
d := New(Config{
ID: "123-main",
Username: "user@example.com",
Password: "secret",
MainAPIBaseURL: api.URL + "/b/api",
LoginAPIBaseURL: api.URL + "/api",
OnTokenUpdate: func(access string) {
savedToken = access
},
})
if err := d.Init(ctx); err != nil {
t.Fatalf("Init() error = %v", err)
}
if savedToken != "token-1" {
t.Fatalf("saved token = %q, want token-1", savedToken)
}
if _, err := d.List(ctx, d.RootID()); err != nil {
t.Fatalf("List() error = %v", err)
}
link, err := d.StreamURL(ctx, "100")
if err != nil {
t.Fatalf("StreamURL() error = %v", err)
}
if got := link.URL; got != download.URL+"/cdn/video.mp4" {
t.Fatalf("URL = %q, want final CDN URL", got)
}
if got := link.Headers.Get("Referer"); !strings.HasPrefix(got, download.URL) {
t.Fatalf("Referer = %q, want original download host", got)
}
if downloadReferer != defaultReferer {
t.Fatalf("resolve Referer = %q, want %q", downloadReferer, defaultReferer)
}
}
func TestInitUsesAccessTokenWithoutLogin(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
switch r.URL.Path {
case "/api/user/sign_in":
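// Errorf rather than Fatalf: FailNow must run on the test goroutine, and this handler runs on a server goroutine.
t.Errorf("driver should not password-login when access_token is configured")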
case "/b/api/user/info":
if got := r.Header.Get("Authorization"); got != "Bearer token-1" {
t.Fatalf("Authorization = %q, want bearer token", got)
}
_ = json.NewEncoder(w).Encode(map[string]any{"code": 0, "data": map[string]any{}})
default:
http.NotFound(w, r)
}
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "Bearer token-1",
MainAPIBaseURL: api.URL + "/b/api",
LoginAPIBaseURL: api.URL + "/api",
})
if err := d.Init(context.Background()); err != nil {
t.Fatalf("Init() error = %v", err)
}
}
func TestLoginRiskErrorSuggestsAccessToken(t *testing.T) {
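// The message reads: "This account shows an overseas-login risk; please log in with an SMS code or WeChat."
err := loginError("当前账号存在境外登录风险,请使用短信验证码或者微信进行登录。")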
if err == nil || !strings.Contains(err.Error(), "access_token") {
t.Fatalf("loginError() = %v, want access_token guidance", err)
}
}
func TestRequestCode429ReturnsRateLimitError(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.Header().Set("Retry-After", "2")
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 429,
"message": "请求太频繁", // "requests too frequent"
})
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "token-1",
MainAPIBaseURL: api.URL,
})
_, err := d.request(context.Background(), endpointFileList, http.MethodGet, nil, nil)
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want RateLimitError", err)
}
if rateLimit.RetryAfter != 2*time.Second {
t.Fatalf("RetryAfter = %s, want 2s", rateLimit.RetryAfter)
}
}
func TestListCoolsDownAndRetriesRateLimit(t *testing.T) {
var listCalls int
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/file/list/new" {
http.NotFound(w, r)
return
}
listCalls++
if listCalls == 1 {
w.Header().Set("Retry-After", "1")
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 429,
"message": "请求太频繁", // "requests too frequent"
})
return
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"Next": "-1",
"Total": 1,
"InfoList": []map[string]any{
{
"FileName": "video.mp4",
"Size": 1234,
"UpdateAt": "2026-01-02 03:04:05",
"FileId": 100,
"Type": 0,
"Etag": "ABCDEF",
"S3KeyFlag": "flag-1",
},
},
},
})
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "token-1",
MainAPIBaseURL: api.URL,
})
entries, err := d.List(context.Background(), d.RootID())
if err != nil {
t.Fatalf("List() error = %v", err)
}
if listCalls != 2 {
t.Fatalf("list calls = %d, want 2", listCalls)
}
if len(entries) != 1 || entries[0].ID != "100" {
t.Fatalf("entries = %#v, want one file", entries)
}
}
func TestResolveDownloadURL429ReturnsRateLimitError(t *testing.T) {
download := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Retry-After", "3")
http.Error(w, "too many requests", http.StatusTooManyRequests)
}))
defer download.Close()
d := New(Config{ID: "123-main"})
_, err := d.resolveDownloadURL(context.Background(), download.URL)
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want RateLimitError", err)
}
if rateLimit.RetryAfter != 3*time.Second {
t.Fatalf("RetryAfter = %s, want 3s", rateLimit.RetryAfter)
}
}
func TestUploadAndReportHashUsesPresignedPUTAndComplete(t *testing.T) {
ctx := context.Background()
body := []byte("video bytes for 123 upload")
wantMD5 := fmt.Sprintf("%x", md5.Sum(body))
var putBody []byte
upload := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPut {
t.Fatalf("upload method = %s, want PUT", r.Method)
}
if r.ContentLength != int64(len(body)) {
t.Fatalf("ContentLength = %d, want %d", r.ContentLength, len(body))
}
got, err := io.ReadAll(r.Body)
if err != nil {
t.Fatalf("read upload body: %v", err)
}
putBody = got
w.WriteHeader(http.StatusOK)
}))
defer upload.Close()
var uploadRequest map[string]any
var uploadURLRequest map[string]any
var completeRequest map[string]any
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
switch r.URL.Path {
case "/file/upload_request":
if err := json.NewDecoder(r.Body).Decode(&uploadRequest); err != nil {
t.Fatalf("decode upload_request: %v", err)
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"FileId": 9001,
"Bucket": "bucket-1",
"Key": "key-1",
"StorageNode": "node-1",
"UploadId": "upload-1",
},
})
case "/file/s3_upload_object/auth":
if err := json.NewDecoder(r.Body).Decode(&uploadURLRequest); err != nil {
t.Fatalf("decode s3 auth: %v", err)
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"presignedUrls": map[string]string{
"1": upload.URL + "/part-1",
},
},
})
case "/file/upload_complete/v2":
if err := json.NewDecoder(r.Body).Decode(&completeRequest); err != nil {
t.Fatalf("decode complete: %v", err)
}
_ = json.NewEncoder(w).Encode(map[string]any{"code": 0, "data": map[string]any{}})
default:
http.NotFound(w, r)
}
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "token-1",
MainAPIBaseURL: api.URL,
})
res, err := d.UploadAndReportHash(ctx, "parent-1", "video.mp4", bytes.NewReader(body), int64(len(body)))
if err != nil {
t.Fatalf("UploadAndReportHash() error = %v", err)
}
if res.FileID != "9001" {
t.Fatalf("FileID = %q, want 9001", res.FileID)
}
if res.Hash != wantMD5 {
t.Fatalf("Hash = %q, want %q", res.Hash, wantMD5)
}
if res.Size != int64(len(body)) {
t.Fatalf("Size = %d, want %d", res.Size, len(body))
}
if !bytes.Equal(putBody, body) {
t.Fatalf("PUT body = %q, want %q", putBody, body)
}
if uploadRequest["etag"] != wantMD5 {
t.Fatalf("upload etag = %#v, want %q", uploadRequest["etag"], wantMD5)
}
if uploadRequest["fileName"] != "video.mp4" || uploadRequest["parentFileId"] != "parent-1" {
t.Fatalf("upload request = %#v, want fileName and parentFileId", uploadRequest)
}
if uploadURLRequest["partNumberStart"].(float64) != 1 || uploadURLRequest["partNumberEnd"].(float64) != 2 {
t.Fatalf("s3 auth request = %#v, want part range 1..2", uploadURLRequest)
}
if completeRequest["fileId"].(float64) != 9001 || completeRequest["fileSize"].(float64) != float64(len(body)) {
t.Fatalf("complete request = %#v, want file id and size", completeRequest)
}
if completeRequest["isMultipart"].(bool) {
t.Fatalf("complete isMultipart = true, want false")
}
}
func TestUploadAndReportHashReuseSkipsPUTAndComplete(t *testing.T) {
body := []byte("reused body")
var presignedCalled bool
var completeCalled bool
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
switch r.URL.Path {
case "/file/upload_request":
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"FileId": 7001,
"Reuse": true,
},
})
case "/file/s3_upload_object/auth", "/file/s3_repare_upload_parts_batch":
presignedCalled = true
_ = json.NewEncoder(w).Encode(map[string]any{"code": 0})
case "/file/upload_complete/v2":
completeCalled = true
_ = json.NewEncoder(w).Encode(map[string]any{"code": 0})
default:
http.NotFound(w, r)
}
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "token-1",
MainAPIBaseURL: api.URL,
})
res, err := d.UploadAndReportHash(context.Background(), "parent-1", "reused.mp4", bytes.NewReader(body), int64(len(body)))
if err != nil {
t.Fatalf("UploadAndReportHash() error = %v", err)
}
if res.FileID != "7001" {
t.Fatalf("FileID = %q, want 7001", res.FileID)
}
if presignedCalled {
t.Fatal("reuse upload should not request presigned URLs")
}
if completeCalled {
t.Fatal("reuse upload should not call upload_complete")
}
}
func TestUploadPresignedPUT429ReturnsRateLimitError(t *testing.T) {
upload := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Retry-After", "4")
http.Error(w, "too many requests", http.StatusTooManyRequests)
}))
defer upload.Close()
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
switch r.URL.Path {
case "/file/upload_request":
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"FileId": 9001,
"Bucket": "bucket-1",
"Key": "key-1",
"StorageNode": "node-1",
"UploadId": "upload-1",
},
})
case "/file/s3_upload_object/auth":
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"presignedUrls": map[string]string{"1": upload.URL},
},
})
default:
http.NotFound(w, r)
}
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "token-1",
MainAPIBaseURL: api.URL,
})
_, err := d.UploadAndReportHash(context.Background(), "parent-1", "limited.mp4", strings.NewReader("limited"), int64(len("limited")))
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want RateLimitError", err)
}
if rateLimit.RetryAfter != 4*time.Second {
t.Fatalf("RetryAfter = %s, want 4s", rateLimit.RetryAfter)
}
}
func TestRenameSendsExpectedBody(t *testing.T) {
var renameRequest map[string]any
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/file/rename" {
http.NotFound(w, r)
return
}
if err := json.NewDecoder(r.Body).Decode(&renameRequest); err != nil {
t.Fatalf("decode rename: %v", err)
}
_ = json.NewEncoder(w).Encode(map[string]any{"code": 0, "data": map[string]any{}})
}))
defer api.Close()
d := New(Config{
ID: "123-main",
AccessToken: "token-1",
MainAPIBaseURL: api.URL,
})
if err := d.Rename(context.Background(), "9001", "new name.mp4"); err != nil {
t.Fatalf("Rename() error = %v", err)
}
if renameRequest["driveId"].(float64) != 0 || renameRequest["fileId"] != "9001" || renameRequest["fileName"] != "new name.mp4" {
t.Fatalf("rename request = %#v, want driveId/fileId/fileName", renameRequest)
}
}
+285
@@ -0,0 +1,285 @@
package p123
import (
"context"
"crypto/rand"
"encoding/base64"
"encoding/hex"
"errors"
"fmt"
"net/http"
"net/url"
"strings"
"time"
"github.com/go-resty/resty/v2"
"github.com/skip2/go-qrcode"
)
const (
defaultUserAPIBase = "https://user.123pan.cn/api"
defaultQRLoginPage = "https://www.123pan.com/wx-app-login.html"
defaultQRReferer = "https://user.123pan.com/centerlogin"
defaultQROrigin = "https://user.123pan.com"
defaultQRUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0 Safari/537.36"
endpointQRCodeGenerate = "/user/qr-code/generate"
endpointQRCodeResult = "/user/qr-code/result"
endpointQRCodeWXCode = "/user/qr-code/wx_code"
)
type QRConfig struct {
UserAPIBaseURL string
HTTPClient *http.Client
Now func() time.Time
}
type QRClient struct {
userAPIBase string
client *resty.Client
now func() time.Time
}
type QRCodeSession struct {
LoginUUID string `json:"loginUuid"`
UniID string `json:"uniID"`
QRCodeURL string `json:"qrCodeUrl"`
QRImageDataURL string `json:"qrImageDataUrl"`
ExpiresAt string `json:"expiresAt,omitempty"`
}
type QRCodeStatus struct {
LoginStatus int `json:"loginStatus"`
StatusText string `json:"statusText"`
ScanPlatform int `json:"scanPlatform,omitempty"`
PlatformText string `json:"platformText,omitempty"`
AccessToken string `json:"accessToken,omitempty"`
}
func NewQRClient(c QRConfig) *QRClient {
userAPIBase := strings.TrimRight(strings.TrimSpace(c.UserAPIBaseURL), "/")
if userAPIBase == "" {
userAPIBase = defaultUserAPIBase
}
httpClient := c.HTTPClient
if httpClient == nil {
httpClient = &http.Client{Timeout: 20 * time.Second}
}
now := c.Now
if now == nil {
now = time.Now
}
return &QRClient{
userAPIBase: userAPIBase,
client: resty.NewWithClient(httpClient).
SetTimeout(20*time.Second).
SetHeader("Accept", "application/json, text/plain, */*"),
now: now,
}
}
func (c *QRClient) Generate(ctx context.Context) (QRCodeSession, error) {
loginUUID, err := newLoginUUID()
if err != nil {
return QRCodeSession{}, err
}
var resp qrGenerateResp
res, err := c.request(ctx, loginUUID).
SetResult(&resp).
Get(c.userAPIBase + endpointQRCodeGenerate)
if err != nil {
return QRCodeSession{}, err
}
if resp.Code != 0 {
return QRCodeSession{}, qrAPIError(resp.Message, res.StatusCode(), resp.Code)
}
uniID := strings.TrimSpace(resp.Data.UniID)
if uniID == "" {
return QRCodeSession{}, errors.New("123pan qr: empty uniID")
}
qrURL := buildQRLoginURL(resp.Data.URL, uniID)
png, err := qrcode.Encode(qrURL, qrcode.Medium, 220)
if err != nil {
return QRCodeSession{}, err
}
return QRCodeSession{
LoginUUID: loginUUID,
UniID: uniID,
QRCodeURL: qrURL,
QRImageDataURL: "data:image/png;base64," + base64.StdEncoding.EncodeToString(png),
ExpiresAt: c.now().Add(5 * time.Minute).Format(time.RFC3339),
}, nil
}
func (c *QRClient) Poll(ctx context.Context, loginUUID, uniID string) (QRCodeStatus, error) {
loginUUID = strings.TrimSpace(loginUUID)
uniID = strings.TrimSpace(uniID)
if loginUUID == "" {
return QRCodeStatus{}, errors.New("loginUuid is required")
}
if uniID == "" {
return QRCodeStatus{}, errors.New("uniID is required")
}
var resp qrResultResp
res, err := c.request(ctx, loginUUID).
SetQueryParam("uniID", uniID).
SetResult(&resp).
Get(c.userAPIBase + endpointQRCodeResult)
if err != nil {
return QRCodeStatus{}, err
}
if resp.Code != 0 && resp.Code != 200 {
return QRCodeStatus{}, qrAPIError(resp.Message, res.StatusCode(), resp.Code)
}
if resp.Code == 200 {
resp.Data.LoginStatus = 3
if resp.Data.ScanPlatform == 0 {
resp.Data.ScanPlatform = resp.Data.LoginType
}
}
status := QRCodeStatus{
LoginStatus: resp.Data.LoginStatus,
StatusText: qrLoginStatusText(resp.Data.LoginStatus),
ScanPlatform: resp.Data.ScanPlatform,
PlatformText: qrScanPlatformText(resp.Data.ScanPlatform),
}
if status.LoginStatus != 3 {
return status, nil
}
if token := resp.TokenValue(); token != "" {
status.AccessToken = normalizeAccessToken(token)
return status, nil
}
if resp.Data.ScanPlatform == 4 {
token, err := c.finishWechatLogin(ctx, loginUUID, uniID)
if err != nil {
return QRCodeStatus{}, err
}
status.AccessToken = normalizeAccessToken(token)
return status, nil
}
return QRCodeStatus{}, errors.New("123pan qr: confirmed login returned empty token")
}
func (c *QRClient) finishWechatLogin(ctx context.Context, loginUUID, uniID string) (string, error) {
var wxResp qrWXCodeResp
res, err := c.request(ctx, loginUUID).
SetBody(map[string]string{"uniID": uniID}).
SetResult(&wxResp).
Post(c.userAPIBase + endpointQRCodeWXCode)
if err != nil {
return "", err
}
if wxResp.Code != 0 {
return "", qrAPIError(wxResp.Message, res.StatusCode(), wxResp.Code)
}
wxCode := strings.TrimSpace(wxResp.WXCode())
if wxCode == "" {
return "", errors.New("123pan qr: empty wechat code")
}
var signIn loginResp
res, err = c.request(ctx, loginUUID).
SetBody(map[string]any{
"from": "web",
"wechat_code": wxCode,
"type": 4,
}).
SetResult(&signIn).
Post(c.userAPIBase + endpointSignIn)
if err != nil {
return "", err
}
if signIn.Code != 200 && signIn.Code != 0 {
return "", qrAPIError(signIn.Message, res.StatusCode(), signIn.Code)
}
token := strings.TrimSpace(signIn.Data.Token)
if token == "" {
return "", errors.New("123pan qr: empty token")
}
return token, nil
}
func (c *QRClient) request(ctx context.Context, loginUUID string) *resty.Request {
return c.client.R().
SetContext(ctx).
SetHeaders(map[string]string{
"Content-Type": "application/json;charset=UTF-8",
"platform": defaultPlatform,
"App-Version": defaultAppVersion,
"LoginUuid": loginUUID,
"Referer": defaultQRReferer,
"Origin": defaultQROrigin,
"User-Agent": defaultQRUserAgent,
})
}
func buildQRLoginURL(raw, uniID string) string {
raw = strings.TrimSpace(raw)
if raw == "" {
raw = defaultQRLoginPage
}
u, err := url.Parse(raw)
if err != nil {
return defaultQRLoginPage + "?env=production&uniID=" + url.QueryEscape(uniID) + "&source=123pan&type=login"
}
q := u.Query()
q.Set("env", "production")
q.Set("uniID", uniID)
q.Set("source", "123pan")
q.Set("type", "login")
u.RawQuery = q.Encode()
return u.String()
}
// newLoginUUID returns a random version 4 UUID string (version and variant bits set explicitly).
func newLoginUUID() (string, error) {
var b [16]byte
if _, err := rand.Read(b[:]); err != nil {
return "", err
}
b[6] = (b[6] & 0x0f) | 0x40
b[8] = (b[8] & 0x3f) | 0x80
parts := []string{
hex.EncodeToString(b[0:4]),
hex.EncodeToString(b[4:6]),
hex.EncodeToString(b[6:8]),
hex.EncodeToString(b[8:10]),
hex.EncodeToString(b[10:16]),
}
return strings.Join(parts, "-"), nil
}
func qrAPIError(message string, httpStatus, apiCode int) error {
message = strings.TrimSpace(message)
if message == "" {
message = fmt.Sprintf("HTTP %d code=%d", httpStatus, apiCode)
}
return errors.New(message)
}
func qrLoginStatusText(status int) string {
switch status {
case 0:
return "等待扫码" // waiting for scan
case 1:
return "已扫码,等待确认" // scanned, awaiting confirmation
case 2:
return "已拒绝" // rejected
case 3:
return "已确认" // confirmed
case 4:
return "已过期" // expired
default:
return "未知状态" // unknown status
}
}
func qrScanPlatformText(platform int) string {
switch platform {
case 4:
return "微信" // WeChat
case 7:
return "123网盘 App" // 123Pan app
default:
return ""
}
}
+182
@@ -0,0 +1,182 @@
package p123
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"strings"
"testing"
)
func TestQRCodeGenerateBuildsImage(t *testing.T) {
var seenLoginUUID string
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/api/user/qr-code/generate" {
http.NotFound(w, r)
return
}
seenLoginUUID = r.Header.Get("LoginUuid")
if seenLoginUUID == "" {
t.Fatalf("missing LoginUuid header")
}
if r.Header.Get("platform") != defaultPlatform {
t.Fatalf("platform header = %q, want %q", r.Header.Get("platform"), defaultPlatform)
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"message": "ok",
"data": map[string]string{
"uniID": "uni-1",
"url": "https://www.123pan.com/wx-app-login.html",
},
})
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{UserAPIBaseURL: api.URL + "/api"}).Generate(context.Background())
if err != nil {
t.Fatalf("Generate() error = %v", err)
}
if got.LoginUUID != seenLoginUUID {
t.Fatalf("loginUuid = %q, want header %q", got.LoginUUID, seenLoginUUID)
}
if got.UniID != "uni-1" {
t.Fatalf("uniID = %q, want uni-1", got.UniID)
}
if !strings.Contains(got.QRCodeURL, "uniID=uni-1") || !strings.Contains(got.QRCodeURL, "type=login") {
t.Fatalf("qrCodeUrl = %q, want login params", got.QRCodeURL)
}
if !strings.HasPrefix(got.QRImageDataURL, "data:image/png;base64,") {
t.Fatalf("qrImageDataUrl missing png data url prefix")
}
if got.ExpiresAt == "" {
t.Fatalf("expiresAt is empty")
}
}
func TestQRCodePollCompletesWechatLogin(t *testing.T) {
var wxCodeRequested bool
var signInRequested bool
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.Header.Get("LoginUuid") != "login-1" {
t.Fatalf("LoginUuid = %q, want login-1", r.Header.Get("LoginUuid"))
}
switch r.URL.Path {
case "/api/user/qr-code/result":
if r.URL.Query().Get("uniID") != "uni-1" {
t.Fatalf("uniID = %q, want uni-1", r.URL.Query().Get("uniID"))
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"loginStatus": 3,
"scanPlatform": 4,
},
})
case "/api/user/qr-code/wx_code":
wxCodeRequested = true
var body map[string]string
if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
t.Fatalf("decode wx_code body: %v", err)
}
if body["uniID"] != "uni-1" {
t.Fatalf("wx_code uniID = %q, want uni-1", body["uniID"])
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]string{"wxCode": "wx-code-1"},
})
case "/api/user/sign_in":
signInRequested = true
var body map[string]any
if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
t.Fatalf("decode sign_in body: %v", err)
}
if body["wechat_code"] != "wx-code-1" {
t.Fatalf("wechat_code = %#v, want wx-code-1", body["wechat_code"])
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 200,
"data": map[string]string{"token": "Bearer token-1"},
})
default:
http.NotFound(w, r)
}
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{UserAPIBaseURL: api.URL + "/api"}).Poll(context.Background(), "login-1", "uni-1")
if err != nil {
t.Fatalf("Poll() error = %v", err)
}
if !wxCodeRequested || !signInRequested {
t.Fatalf("wechat completion calls wx=%v signIn=%v, want both", wxCodeRequested, signInRequested)
}
if got.LoginStatus != 3 || got.AccessToken != "token-1" || got.PlatformText != "微信" {
t.Fatalf("status = %#v, want confirmed wechat token", got)
}
}
func TestQRCodePollUsesAppToken(t *testing.T) {
var wxCodeRequested bool
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
switch r.URL.Path {
case "/api/user/qr-code/result":
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 0,
"data": map[string]any{
"loginStatus": 3,
"scanPlatform": 7,
"token": "app-token",
},
})
case "/api/user/qr-code/wx_code":
wxCodeRequested = true
http.Error(w, "unexpected wx_code", http.StatusInternalServerError)
default:
http.NotFound(w, r)
}
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{UserAPIBaseURL: api.URL + "/api"}).Poll(context.Background(), "login-1", "uni-1")
if err != nil {
t.Fatalf("Poll() error = %v", err)
}
if wxCodeRequested {
t.Fatalf("wx_code should not be called when app token is already returned")
}
if got.AccessToken != "app-token" || got.PlatformText != "123网盘 App" {
t.Fatalf("status = %#v, want app token", got)
}
}
func TestQRCodePollUsesOfficialAppSuccessCode(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/api/user/qr-code/result" {
http.NotFound(w, r)
return
}
_ = json.NewEncoder(w).Encode(map[string]any{
"code": 200,
"data": map[string]any{
"login_type": 7,
"token": "app-token",
},
})
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{UserAPIBaseURL: api.URL + "/api"}).Poll(context.Background(), "login-1", "uni-1")
if err != nil {
t.Fatalf("Poll() error = %v", err)
}
if got.LoginStatus != 3 || got.ScanPlatform != 7 || got.AccessToken != "app-token" {
t.Fatalf("status = %#v, want official app success token", got)
}
}
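Two of the tests rely on `normalizeAccessToken`, whose body is not included in this diff. A sign-in token of `"Bearer token-1"` comes back as `AccessToken == "token-1"`, and a configured `AccessToken: "Bearer token-1"` is sent as `Authorization: Bearer token-1` with no doubled prefix. The sketch below is a guess at behavior consistent with those assertions, not the project's actual implementation. `authorizationHeader` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeAccessToken (a guess at the real helper) trims whitespace and
// removes one leading "Bearer " prefix, matched case-insensitively.
func normalizeAccessToken(raw string) string {
	raw = strings.TrimSpace(raw)
	const prefix = "bearer "
	if len(raw) >= len(prefix) && strings.EqualFold(raw[:len(prefix)], prefix) {
		raw = strings.TrimSpace(raw[len(prefix):])
	}
	return raw
}

// authorizationHeader (hypothetical) builds the header value from either
// a bare token or one that already carries the "Bearer " prefix.
func authorizationHeader(token string) string {
	return "Bearer " + normalizeAccessToken(token)
}

func main() {
	fmt.Println(normalizeAccessToken("Bearer token-1"))
	fmt.Println(authorizationHeader("Bearer token-1"))
	fmt.Println(authorizationHeader("token-1"))
}
```

Storing the bare token and adding the prefix only when building the header keeps persisted values uniform regardless of which login path produced them.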
+204
@@ -0,0 +1,204 @@
package p123
import (
"encoding/json"
"strconv"
"strings"
"time"
)
type apiEnvelope struct {
Code int `json:"code"`
Message string `json:"message"`
}
type loginResp struct {
Code int `json:"code"`
Message string `json:"message"`
Data struct {
Token string `json:"token"`
} `json:"data"`
}
type qrGenerateResp struct {
Code int `json:"code"`
Message string `json:"message"`
Data struct {
UniID string `json:"uniID"`
URL string `json:"url"`
} `json:"data"`
}
type qrResultResp struct {
Code int `json:"code"`
Message string `json:"message"`
Data struct {
LoginStatus int `json:"loginStatus"`
ScanPlatform int `json:"scanPlatform"`
LoginType int `json:"login_type"`
Token string `json:"token"`
AccessToken string `json:"accessToken"`
} `json:"data"`
}
func (r qrResultResp) TokenValue() string {
if strings.TrimSpace(r.Data.Token) != "" {
return r.Data.Token
}
return r.Data.AccessToken
}
type qrWXCodeResp struct {
Code int `json:"code"`
Message string `json:"message"`
Data struct {
WXCodeLower string `json:"wxCode"`
WXCodeTitle string `json:"WxCode"`
Code string `json:"code"`
} `json:"data"`
}
func (r qrWXCodeResp) WXCode() string {
if r.Data.WXCodeLower != "" {
return r.Data.WXCodeLower
}
if r.Data.WXCodeTitle != "" {
return r.Data.WXCodeTitle
}
return r.Data.Code
}
type fileListResp struct {
Data struct {
Next string `json:"Next"`
Total int `json:"Total"`
InfoList []panFile `json:"InfoList"`
} `json:"data"`
}
type panFile struct {
FileName string `json:"FileName"`
Size int64 `json:"Size"`
UpdateAt flexibleTime `json:"UpdateAt"`
FileID int64 `json:"FileId"`
Type int `json:"Type"`
Etag string `json:"Etag"`
S3KeyFlag string `json:"S3KeyFlag"`
}
type cachedFile struct {
file panFile
parentID string
}
type downloadInfoResp struct {
Data struct {
DownloadURL string `json:"DownloadUrl"`
DownloadURLLower string `json:"downloadUrl"`
} `json:"data"`
}
func (r downloadInfoResp) URL() string {
if r.Data.DownloadURL != "" {
return r.Data.DownloadURL
}
return r.Data.DownloadURLLower
}
type redirectResp struct {
Data struct {
RedirectURL string `json:"redirect_url"`
RedirectURLCamel string `json:"redirectUrl"`
RedirectURLTitle string `json:"RedirectUrl"`
} `json:"data"`
}
func (r redirectResp) URL() string {
if r.Data.RedirectURL != "" {
return r.Data.RedirectURL
}
if r.Data.RedirectURLCamel != "" {
return r.Data.RedirectURLCamel
}
return r.Data.RedirectURLTitle
}
type mkdirResp struct {
Data struct {
FileID int64 `json:"FileId"`
} `json:"data"`
}
type uploadResp struct {
Data struct {
AccessKeyID string `json:"AccessKeyId"`
Bucket string `json:"Bucket"`
Key string `json:"Key"`
SecretAccessKey string `json:"SecretAccessKey"`
SessionToken string `json:"SessionToken"`
FileID int64 `json:"FileId"`
Reuse bool `json:"Reuse"`
EndPoint string `json:"EndPoint"`
StorageNode string `json:"StorageNode"`
UploadID string `json:"UploadId"`
} `json:"data"`
}
type s3PreSignedURLsResp struct {
Data struct {
PreSignedURLs map[string]string `json:"presignedUrls"`
} `json:"data"`
}
type flexibleTime struct {
t time.Time
}
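// UnmarshalJSON accepts a quoted time string or a Unix timestamp in seconds or milliseconds.
// null, empty, or unparsable input is not treated as an error.
func (t *flexibleTime) UnmarshalJSON(data []byte) error {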
if string(data) == "null" || string(data) == `""` {
return nil
}
var s string
if err := json.Unmarshal(data, &s); err == nil {
t.t = parseTimeString(s)
return nil
}
var n int64
if err := json.Unmarshal(data, &n); err == nil {
if n > 1_000_000_000_000 {
t.t = time.UnixMilli(n)
} else {
t.t = time.Unix(n, 0)
}
return nil
}
return nil
}
func (t flexibleTime) Time() time.Time {
return t.t
}
func parseTimeString(s string) time.Time {
s = strings.TrimSpace(s)
if s == "" {
return time.Time{}
}
// Layouts without a zone offset are interpreted as UTC+8.
loc := time.FixedZone("UTC+8", 8*3600)
for _, layout := range []string{
time.RFC3339Nano,
time.RFC3339,
"2006-01-02 15:04:05",
"2006-01-02T15:04:05",
} {
if parsed, err := time.ParseInLocation(layout, s, loc); err == nil {
return parsed
}
}
if n, err := strconv.ParseInt(s, 10, 64); err == nil {
if n > 1_000_000_000_000 {
return time.UnixMilli(n)
}
return time.Unix(n, 0)
}
return time.Time{}
}
+3 -4
@@ -199,9 +199,8 @@ func (d *Driver) refreshCaptchaToken(ctx context.Context, action string, meta ma
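// refreshCaptchaTokenOnce calls /v1/shield/captcha/init to request a new captcha token.
//
-// If retry=true and the server returns 4002 (captcha_token expired, meaning the
-// d.captchaToken carried in the body has expired), clear the cached captcha_token and call again;
-// this time captcha_token in the body is empty, so the server issues a fresh one. This covers
+// If retry=true and the server returns a captcha-rejected error (4002 or 9), clear the cached
+// captcha_token and call again; this time captcha_token in the body is empty, so the server issues a fresh one. This covers
// the case where, after a driver restart, Init() calls captcha init with the persisted stale
// captcha_token and fails.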
func (d *Driver) refreshCaptchaTokenOnce(ctx context.Context, action string, meta map[string]string, retry bool) error {
@@ -230,7 +229,7 @@ func (d *Driver) refreshCaptchaTokenOnce(ctx context.Context, action string, met
return err
}
if e.isError() {
-if retry && e.ErrorCode == 4002 && d.captchaToken != "" {
+if retry && isCaptchaTokenRejectedCode(e.ErrorCode) && d.captchaToken != "" {
d.captchaToken = ""
return d.refreshCaptchaTokenOnce(ctx, action, meta, false)
}
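The hunk above swaps the literal `4002` check for `isCaptchaTokenRejectedCode`, whose definition is not shown in this diff. The sketch below assumes, based only on the updated comment, that it accepts 4002 and 9. It also shows the clear-and-retry-once pattern in isolation. `client`, `apiError`, and `initFunc` are hypothetical stand-ins for the driver's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// isCaptchaTokenRejectedCode is assumed to match 4002 and 9,
// as the updated doc comment suggests.
func isCaptchaTokenRejectedCode(code int) bool {
	return code == 4002 || code == 9
}

// apiError is a hypothetical error type carrying the server's error_code.
type apiError struct {
	Code int
}

func (e *apiError) Error() string {
	return fmt.Sprintf("api error code=%d", e.Code)
}

// initFunc is a hypothetical captcha-init call: it receives the currently
// cached token and returns a new one.
type initFunc func(current string) (string, error)

type client struct {
	captchaToken string
	call         initFunc
}

// refreshOnce retries at most once, and only if there was a cached token
// to clear. With an empty token the retry would send the same request
// again, so the error is returned instead.
func (c *client) refreshOnce(retry bool) error {
	token, err := c.call(c.captchaToken)
	if err != nil {
		var ae *apiError
		if retry && errors.As(err, &ae) && isCaptchaTokenRejectedCode(ae.Code) && c.captchaToken != "" {
			c.captchaToken = ""
			return c.refreshOnce(false)
		}
		return err
	}
	c.captchaToken = token
	return nil
}

func main() {
	calls := 0
	c := &client{captchaToken: "stale", call: func(cur string) (string, error) {
		calls++
		if cur != "" {
			return "", &apiError{Code: 9} // the server rejects the stale token
		}
		return "fresh", nil
	}}
	err := c.refreshOnce(true)
	fmt.Println(err, c.captchaToken, calls)
}
```

Passing `retry=false` on the second call is what bounds the recursion to a single extra request.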
@@ -96,6 +96,65 @@ func TestRefreshCaptchaTokenRecoversFrom4002(t *testing.T) {
}
}
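// TestRefreshCaptchaTokenRecoversFrom9 covers the path where PikPak returns error_code=9
// captcha_invalid. Like 4002, this error means the current captcha_token has been rejected;
// the old token must be cleared before retrying captcha/init, or the server keeps rejecting it.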
func TestRefreshCaptchaTokenRecoversFrom9(t *testing.T) {
var calls int32
type bodyShape struct {
CaptchaToken string `json:"captcha_token"`
}
var (
firstBody bodyShape
secondBody bodyShape
)
mux := http.NewServeMux()
mux.HandleFunc("/v1/shield/captcha/init", func(w http.ResponseWriter, r *http.Request) {
n := atomic.AddInt32(&calls, 1)
switch n {
case 1:
_ = json.NewDecoder(r.Body).Decode(&firstBody)
writeErrorJSON(w, `{
"error_code": 9,
"error": "captcha_invalid",
"error_description": "Verification code is invalid"
}`)
case 2:
_ = json.NewDecoder(r.Body).Decode(&secondBody)
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{
"captcha_token": "fresh-captcha",
"expires_in": 300
}`))
default:
t.Errorf("unexpected captcha init call #%d", n)
}
})
server := httptest.NewServer(mux)
defer server.Close()
d := newTestDriver(t, server)
d.captchaToken = "expired-captcha"
if err := d.refreshCaptchaTokenAtLogin(context.Background(), "GET:/drive/v1/files", "user-1"); err != nil {
t.Fatalf("refreshCaptchaTokenAtLogin: %v", err)
}
if got := atomic.LoadInt32(&calls); got != 2 {
t.Fatalf("captcha init called %d times, want 2", got)
}
if firstBody.CaptchaToken != "expired-captcha" {
t.Errorf("first body captcha_token = %q, want \"expired-captcha\"", firstBody.CaptchaToken)
}
if secondBody.CaptchaToken != "" {
t.Errorf("second body captcha_token = %q, want empty (cleared after error_code=9)", secondBody.CaptchaToken)
}
if d.captchaToken != "fresh-captcha" {
t.Errorf("d.captchaToken = %q, want \"fresh-captcha\"", d.captchaToken)
}
}
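// TestRefreshCaptchaTokenDoesNotLoopOn4002WithEmptyToken guards against degenerating into endless retries:
// if the caller starts with an empty captchaToken and still hits 4002, it must not clear and retry again
// (the token is still empty, so the retry would get the same error); it should return the error for the caller to handle.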
@@ -121,6 +180,141 @@ func TestRefreshCaptchaTokenDoesNotLoopOn4002WithEmptyToken(t *testing.T) {
}
}
func TestInitWithRefreshTokenDoesNotSendPersistedCaptchaToken(t *testing.T) {
var captchaCalls int32
var captchaBody struct {
CaptchaToken string `json:"captcha_token"`
}
var persisted struct {
access, refresh, captcha string
calls int
}
mux := http.NewServeMux()
mux.HandleFunc("/v1/auth/token", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{
"access_token": "fresh-access",
"refresh_token": "fresh-refresh",
"sub": "user-1"
}`))
})
mux.HandleFunc("/v1/shield/captcha/init", func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&captchaCalls, 1)
_ = json.NewDecoder(r.Body).Decode(&captchaBody)
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{
"captcha_token": "fresh-captcha",
"expires_in": 300
}`))
})
server := httptest.NewServer(mux)
defer server.Close()
d := newTestDriver(t, server)
d.captchaToken = "persisted-stale-captcha"
d.onTokenUpdate = func(access, refresh, captcha, deviceID string) {
persisted.access = access
persisted.refresh = refresh
persisted.captcha = captcha
persisted.calls++
}
if err := d.Init(context.Background()); err != nil {
t.Fatalf("Init: %v", err)
}
if got := atomic.LoadInt32(&captchaCalls); got != 1 {
t.Fatalf("captcha init calls = %d, want 1", got)
}
if captchaBody.CaptchaToken != "" {
t.Errorf("captcha init body captcha_token = %q, want empty", captchaBody.CaptchaToken)
}
if d.captchaToken != "fresh-captcha" {
t.Errorf("d.captchaToken = %q, want \"fresh-captcha\"", d.captchaToken)
}
if persisted.access != "fresh-access" || persisted.refresh != "fresh-refresh" || persisted.captcha != "fresh-captcha" {
t.Errorf("persisted tokens = (%q, %q, %q), want fresh values", persisted.access, persisted.refresh, persisted.captcha)
}
if persisted.calls < 2 {
t.Errorf("persist callback calls = %d, want at least 2 (clear stale + persist fresh)", persisted.calls)
}
}
func TestInitFallsBackToLoginWhenRefreshReturnsCaptchaInvalid(t *testing.T) {
var (
tokenCalls int32
captchaCalls int32
signinCalls int32
)
var signinBody struct {
CaptchaToken string `json:"captcha_token"`
}
mux := http.NewServeMux()
mux.HandleFunc("/v1/auth/token", func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&tokenCalls, 1)
writeErrorJSON(w, `{
"error_code": 4002,
"error": "captcha_invalid",
"error_description": "Code(4002) - captcha_token expired"
}`)
})
mux.HandleFunc("/v1/shield/captcha/init", func(w http.ResponseWriter, r *http.Request) {
n := atomic.AddInt32(&captchaCalls, 1)
w.Header().Set("Content-Type", "application/json")
switch n {
case 1:
_, _ = w.Write([]byte(`{
"captcha_token": "login-captcha",
"expires_in": 300
}`))
case 2:
_, _ = w.Write([]byte(`{
"captcha_token": "files-captcha",
"expires_in": 300
}`))
default:
t.Errorf("unexpected captcha init call #%d", n)
}
})
mux.HandleFunc("/v1/auth/signin", func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&signinCalls, 1)
_ = json.NewDecoder(r.Body).Decode(&signinBody)
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{
"access_token": "login-access",
"refresh_token": "login-refresh",
"sub": "user-1"
}`))
})
server := httptest.NewServer(mux)
defer server.Close()
d := newTestDriver(t, server)
d.captchaToken = "persisted-stale-captcha"
if err := d.Init(context.Background()); err != nil {
t.Fatalf("Init: %v", err)
}
if got := atomic.LoadInt32(&tokenCalls); got != 1 {
t.Fatalf("token refresh calls = %d, want 1", got)
}
if got := atomic.LoadInt32(&signinCalls); got != 1 {
t.Fatalf("signin calls = %d, want 1", got)
}
if got := atomic.LoadInt32(&captchaCalls); got != 2 {
t.Fatalf("captcha init calls = %d, want 2 (login + post-login files action)", got)
}
if signinBody.CaptchaToken != "login-captcha" {
t.Errorf("signin captcha_token = %q, want \"login-captcha\"", signinBody.CaptchaToken)
}
if d.accessToken != "login-access" || d.refreshToken != "login-refresh" || d.captchaToken != "files-captcha" {
t.Errorf("driver tokens = (%q, %q, %q), want login/files tokens", d.accessToken, d.refreshToken, d.captchaToken)
}
}
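// TestRequestOnceRecoversFrom4002OnAPICall verifies that when an ordinary API call receives 4002,
// requestOnce first clears captchaToken, then refreshes the captcha, and finally retries the
// request with the new token and returns successfully.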
@@ -196,6 +390,76 @@ func TestRequestOnceRecoversFrom4002OnAPICall(t *testing.T) {
}
}
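// TestRequestOnceRecoversFrom9OnAPICall verifies that when an ordinary API call receives error_code=9,
// the stale captchaToken is cleared first, then the captcha is refreshed and the original request is retried.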
func TestRequestOnceRecoversFrom9OnAPICall(t *testing.T) {
var (
filesCalls int32
captchaCalls int32
)
type capturedFiles struct {
captchaHeader string
}
var firstFiles, secondFiles capturedFiles
mux := http.NewServeMux()
mux.HandleFunc("/drive/v1/files", func(w http.ResponseWriter, r *http.Request) {
n := atomic.AddInt32(&filesCalls, 1)
switch n {
case 1:
firstFiles.captchaHeader = r.Header.Get("X-Captcha-Token")
writeErrorJSON(w, `{
"error_code": 9,
"error": "captcha_invalid",
"error_description": "Verification code is invalid"
}`)
case 2:
secondFiles.captchaHeader = r.Header.Get("X-Captcha-Token")
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{"files": [], "next_page_token": ""}`))
default:
t.Errorf("unexpected /drive/v1/files call #%d", n)
}
})
mux.HandleFunc("/v1/shield/captcha/init", func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&captchaCalls, 1)
var body struct {
CaptchaToken string `json:"captcha_token"`
}
_ = json.NewDecoder(r.Body).Decode(&body)
if body.CaptchaToken != "" {
t.Errorf("captcha init body captcha_token = %q, want empty (error_code=9 path should clear cache)", body.CaptchaToken)
}
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{"captcha_token": "fresh-captcha", "expires_in": 300}`))
})
server := httptest.NewServer(mux)
defer server.Close()
d := newTestDriver(t, server)
d.captchaToken = "expired-captcha"
if _, err := d.List(context.Background(), "any-parent"); err != nil {
t.Fatalf("List: %v", err)
}
if got := atomic.LoadInt32(&filesCalls); got != 2 {
t.Fatalf("/drive/v1/files calls = %d, want 2 (initial + retry)", got)
}
if got := atomic.LoadInt32(&captchaCalls); got != 1 {
t.Fatalf("captcha init calls = %d, want 1", got)
}
if firstFiles.captchaHeader != "expired-captcha" {
t.Errorf("first request X-Captcha-Token = %q, want \"expired-captcha\"", firstFiles.captchaHeader)
}
if secondFiles.captchaHeader != "fresh-captcha" {
t.Errorf("retry X-Captcha-Token = %q, want \"fresh-captcha\"", secondFiles.captchaHeader)
}
if d.captchaToken != "fresh-captcha" {
t.Errorf("d.captchaToken after recovery = %q, want \"fresh-captcha\"", d.captchaToken)
}
}
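// TestRequestOnceDoesNotRetryTwiceOn4002 verifies that the 4002 recovery path retries only once;
// if the retried request still fails (even with another 4002), it does not re-enter the recovery logic
// and instead returns the error, avoiding an infinite loop.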
+97 -8
@@ -4,6 +4,7 @@ import (
"context"
"errors"
"fmt"
"io"
"log"
"net/http"
"path"
@@ -43,8 +44,9 @@ type Driver struct {
algorithms []string
userAgent string
client *resty.Client
onTokenUpdate func(access, refresh, captcha, deviceID string)
client *resty.Client
onTokenUpdate func(access, refresh, captcha, deviceID string)
uploadToOSSFunc func(context.Context, *s3Params, io.Reader) error
// captchaMu serializes captcha-token refreshes triggered by 4002 / 9
// recovery in requestOnce. Without it, N concurrent callers all hitting
@@ -121,9 +123,28 @@ func (d *Driver) ID() string { return d.id }
func (d *Driver) RootID() string { return d.rootID }
func (d *Driver) Init(ctx context.Context) error {
clearPersistedCaptcha := func() {
if d.captchaToken == "" {
return
}
d.captchaToken = ""
d.persistTokens()
}
if d.refreshToken != "" {
if err := d.refresh(ctx, d.refreshToken); err != nil {
return err
if !IsCaptchaError(err) || d.username == "" || d.password == "" {
return err
}
clearPersistedCaptcha()
if err := d.login(ctx); err != nil {
return fmt.Errorf("pikpak refresh captcha recovery login: %w", err)
}
} else {
// Persisted captcha tokens are short-lived. With a refresh token we can
// safely request a fresh captcha token after auth, and avoiding the
// stored value prevents known-stale tokens from poisoning startup.
clearPersistedCaptcha()
}
} else {
if err := d.login(ctx); err != nil {
@@ -335,8 +356,74 @@ func (d *Driver) Rename(ctx context.Context, fileID, newName string) error {
return nil
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return errors.New("pikpak remove: empty file id")
}
if err := d.request(ctx, filesURL+":batchTrash", http.MethodPost, func(req *resty.Request) {
req.SetBody(map[string]any{"ids": []string{fileID}})
}, nil); err != nil {
return fmt.Errorf("pikpak remove: %w", err)
}
return nil
}
func (d *Driver) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return "", drives.ErrNotSupported
currentID := d.rootID
for _, name := range splitPath(pathFromRoot) {
childID, err := d.findChildDir(ctx, currentID, name)
if err != nil {
return "", err
}
if childID == "" {
childID, err = d.makeDir(ctx, currentID, name)
if err != nil {
return "", err
}
}
currentID = childID
}
return currentID, nil
}
func (d *Driver) findChildDir(ctx context.Context, parentID, name string) (string, error) {
entries, err := d.List(ctx, parentID)
if err != nil {
return "", err
}
for _, e := range entries {
if e.IsDir && e.Name == name {
return e.ID, nil
}
}
return "", nil
}
func (d *Driver) makeDir(ctx context.Context, parentID, name string) (string, error) {
var out file
err := d.request(ctx, filesURL, http.MethodPost, func(req *resty.Request) {
req.SetBody(map[string]any{
"kind": "drive#folder",
"parent_id": parentID,
"name": name,
})
}, &out)
if err != nil {
return "", fmt.Errorf("pikpak mkdir %s: %w", name, err)
}
if out.ID == "" {
return "", fmt.Errorf("pikpak mkdir %s: empty folder id", name)
}
return out.ID, nil
}
func splitPath(p string) []string {
p = strings.Trim(p, "/")
if p == "" {
return nil
}
return strings.Split(p, "/")
}
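The directory walk in EnsureDir can be illustrated without a network. The following is a minimal sketch, assuming a hypothetical in-memory `memDrive` type that stands in for the remote drive (it is not part of the pikpak package); only `splitPath` is copied from the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// memDrive is a hypothetical in-memory stand-in for a remote drive.
// It maps "parentID/name" to a folder ID and counts folder creations.
type memDrive struct {
	dirs    map[string]string
	created int
}

// splitPath trims leading and trailing slashes and splits on "/".
func splitPath(p string) []string {
	p = strings.Trim(p, "/")
	if p == "" {
		return nil
	}
	return strings.Split(p, "/")
}

// ensureDir walks the path one segment at a time, reusing a folder when it
// already exists and creating it otherwise. It returns the last folder's ID.
func (m *memDrive) ensureDir(rootID, pathFromRoot string) string {
	current := rootID
	for _, name := range splitPath(pathFromRoot) {
		key := current + "/" + name
		id, ok := m.dirs[key]
		if !ok {
			m.created++
			id = fmt.Sprintf("dir-%d", m.created)
			m.dirs[key] = id
		}
		current = id
	}
	return current
}

func main() {
	m := &memDrive{dirs: map[string]string{}}
	first := m.ensureDir("root-id", "/91 Spider/previews/")
	second := m.ensureDir("root-id", "91 Spider/previews")
	fmt.Println(first, second, m.created)
}
```

The second call creates nothing because both segments already exist, and an empty path returns the root ID unchanged. Note that with this splitting a doubled slash such as `a//b` produces an empty segment, so callers may want to normalize paths first.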
func (d *Driver) getFiles(ctx context.Context, parentID string) ([]file, error) {
@@ -408,14 +495,15 @@ func (d *Driver) requestOnce(ctx context.Context, url, method string, configure
// serialized. Once we hold the lock, if d.captchaToken has
// already moved past staleToken, another goroutine has refreshed
// it for us — we skip the refresh and just retry. Otherwise we
// clear the cached token (4002 means "the value in the body is
// expired"; sending it again will keep returning 4002) and ask
// /v1/shield/captcha/init for a fresh one.
// clear the cached token before asking /v1/shield/captcha/init
// for a fresh one. PikPak may report stale captcha as either
// 4002 or 9, and sending the rejected token into captcha init can
// keep returning captcha_invalid.
staleToken := d.captchaToken
d.captchaMu.Lock()
var refreshErr error
if d.captchaToken == staleToken {
if e.ErrorCode == 4002 {
if d.captchaToken != "" {
d.captchaToken = ""
}
refreshErr = d.refreshCaptchaTokenAtLogin(ctx, getAction(method, url), d.userID)
@@ -490,3 +578,4 @@ func ParseBoolDefault(raw string, def bool) bool {
}
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
+83 -7
@@ -1,10 +1,12 @@
package pikpak
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"time"
"github.com/video-site/backend/internal/drives"
)
func TestNewDefaults(t *testing.T) {
@@ -95,11 +97,85 @@ func TestFolderToEntry(t *testing.T) {
}
}
func TestEnsureDirStillUnsupported(t *testing.T) {
d := New(Config{ID: "pikpak-main"})
func TestEnsureDirReusesExistingFolder(t *testing.T) {
var postCalled bool
mux := http.NewServeMux()
mux.HandleFunc("/drive/v1/files", func(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case http.MethodGet:
if got := r.URL.Query().Get("parent_id"); got != "root-id" {
t.Fatalf("parent_id = %q, want root-id", got)
}
writePikPakJSON(t, w, map[string]any{
"files": []map[string]any{{
"id": "existing-folder-id",
"kind": "drive#folder",
"name": "91 Spider",
}},
})
case http.MethodPost:
postCalled = true
t.Fatalf("existing folder should not be created again")
default:
t.Fatalf("unexpected method %s", r.Method)
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
if _, err := d.EnsureDir(nil, "/previews"); err != drives.ErrNotSupported {
t.Fatalf("EnsureDir error = %v, want ErrNotSupported", err)
d := newTestDriver(t, srv)
got, err := d.EnsureDir(context.Background(), "91 Spider")
if err != nil {
t.Fatalf("ensure dir: %v", err)
}
if got != "existing-folder-id" {
t.Fatalf("dir id = %q, want existing-folder-id", got)
}
if postCalled {
t.Fatal("POST should not be called")
}
}
func TestEnsureDirCreatesMissingFolder(t *testing.T) {
var got uploadRequestBody
mux := http.NewServeMux()
mux.HandleFunc("/drive/v1/files", func(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case http.MethodGet:
writePikPakJSON(t, w, map[string]any{"files": []map[string]any{}})
case http.MethodPost:
if err := json.NewDecoder(r.Body).Decode(&got); err != nil {
t.Fatalf("decode create folder body: %v", err)
}
writePikPakJSON(t, w, map[string]any{
"id": "new-folder-id",
"kind": "drive#folder",
"name": "91 Spider",
})
default:
t.Fatalf("unexpected method %s", r.Method)
}
})
srv := httptest.NewServer(mux)
defer srv.Close()
d := newTestDriver(t, srv)
id, err := d.EnsureDir(context.Background(), "91 Spider")
if err != nil {
t.Fatalf("ensure dir: %v", err)
}
if id != "new-folder-id" {
t.Fatalf("dir id = %q, want new-folder-id", id)
}
if got.Kind != "drive#folder" || got.ParentID != "root-id" || got.Name != "91 Spider" {
t.Fatalf("create folder body = %#v", got)
}
}
func writePikPakJSON(t *testing.T, w http.ResponseWriter, body any) {
t.Helper()
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(body); err != nil {
t.Fatalf("write json: %v", err)
}
// For the real Upload implementation, see upload_test.go.
}
+5 -1
@@ -59,6 +59,10 @@ func (e *errResp) Error() string {
return fmt.Sprintf("pikpak error_code=%d error=%s description=%s", e.ErrorCode, e.ErrorMsg, e.ErrorDescription)
}
func isCaptchaTokenRejectedCode(code int64) bool {
return code == 9 || code == 4002
}
// APIError is the public alias for the PikPak API error response. Callers
// outside this package (e.g. the spider91→PikPak migrator, tests) can either
// construct it for fakes or unwrap it via errors.As. Prefer IsCaptchaError
@@ -76,7 +80,7 @@ func IsCaptchaError(err error) bool {
}
var e *errResp
if errors.As(err, &e) {
return e != nil && (e.ErrorCode == 4002 || e.ErrorCode == 9)
return e != nil && isCaptchaTokenRejectedCode(e.ErrorCode)
}
return false
}
+146 -10
@@ -6,7 +6,10 @@ import (
"errors"
"fmt"
"io"
"log"
"net"
"net/http"
"net/url"
"os"
"strings"
"time"
@@ -26,7 +29,7 @@ import (
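// - miss: resumable.params contains S3-compatible credentials (access_key / secret /
// bucket / endpoint / key / security_token
//
// 3. Use the Aliyun OSS SDK's PutObject to upload the bytes to endpoint+bucket+key
// 3. Use the Aliyun OSS SDK's PutObject to upload the bytes to the temporary OSS endpoint returned by PikPak
//
// 4. The PikPak server polls OSS and marks resp.File.ID as available once the upload completes,
// so Upload can simply return resp.File.ID when it finishes (the ID is present from the start,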
@@ -39,6 +42,9 @@ const (
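// spider91 videos are typically ~100MiB, far below this limit. Anything larger needs multipart upload,
// which is not implemented yet; such uploads fail with an explicit error.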
maxSinglePutSize = 5*1024*1024*1024 - 1
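// After the first upload fails, retry at most 3 more times. Each retry requests a new PikPak
// upload session to avoid temporary upload endpoints that are occasionally unresolvable or unreachable.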
pikpakUploadMaxAttempts = 4
)
// uploadTaskData is the response structure of POST /drive/v1/files.
@@ -129,13 +135,49 @@ func (d *Driver) UploadAndReportHash(ctx context.Context, parentID, name string,
_ = os.Remove(tmp.Name())
}()
// 2) Request an upload session.
result := UploadResult{Hash: gcidHex, Size: actualSize}
var lastErr error
for attempt := 1; attempt <= pikpakUploadMaxAttempts; attempt++ {
if err := ctx.Err(); err != nil {
return UploadResult{}, err
}
resp, err := d.requestUploadSession(ctx, parentID, name, actualSize, gcidHex)
if err != nil {
lastErr = fmt.Errorf("pikpak upload: request session: %w", err)
if !shouldRetryPikPakUploadAttempt(lastErr, attempt) {
return UploadResult{}, lastErr
}
d.logUploadRetry(name, attempt, lastErr)
if err := pikpakSleepContext(ctx, pikpakUploadRetryDelay(attempt)); err != nil {
return UploadResult{}, err
}
continue
}
out, err := d.completeUploadAttempt(ctx, tmp, parentID, name, result, resp)
if err == nil {
return out, nil
}
lastErr = err
if !shouldRetryPikPakUploadAttempt(lastErr, attempt) {
return UploadResult{}, lastErr
}
d.logUploadRetry(name, attempt, lastErr)
if err := pikpakSleepContext(ctx, pikpakUploadRetryDelay(attempt)); err != nil {
return UploadResult{}, err
}
}
return UploadResult{}, lastErr
}
func (d *Driver) requestUploadSession(ctx context.Context, parentID, name string, size int64, gcidHex string) (uploadTaskData, error) {
var resp uploadTaskData
if err := d.request(ctx, filesURL, http.MethodPost, func(req *resty.Request) {
req.SetBody(map[string]any{
"kind": "drive#file",
"name": name,
"size": actualSize,
"size": size,
"hash": gcidHex,
"upload_type": "UPLOAD_TYPE_RESUMABLE",
"objProvider": map[string]any{"provider": "UPLOAD_TYPE_UNKNOWN"},
@@ -143,12 +185,13 @@ func (d *Driver) UploadAndReportHash(ctx context.Context, parentID, name string,
"folder_type": "NORMAL",
})
}, &resp); err != nil {
return UploadResult{}, fmt.Errorf("pikpak upload: request session: %w", err)
return uploadTaskData{}, err
}
return resp, nil
}
result := UploadResult{Hash: gcidHex, Size: actualSize}
// 3) Instant-upload hit: the server already knows this hash, so return the new file ID directly.
func (d *Driver) completeUploadAttempt(ctx context.Context, tmp *os.File, parentID, name string, result UploadResult, resp uploadTaskData) (UploadResult, error) {
// Instant-upload hit: the server already knows this hash, so return the new file ID directly.
if resp.Resumable == nil {
if resp.File.ID != "" {
result.FileID = resp.File.ID
@@ -163,7 +206,7 @@ func (d *Driver) UploadAndReportHash(ctx context.Context, parentID, name string,
return result, nil
}
// 4) Instant-upload miss: upload the bytes to the S3-compatible storage.
// Instant-upload miss: upload the bytes to the S3-compatible storage.
if _, err := tmp.Seek(0, io.SeekStart); err != nil {
return UploadResult{}, fmt.Errorf("pikpak upload: seek tmp: %w", err)
}
@@ -171,7 +214,7 @@ func (d *Driver) UploadAndReportHash(ctx context.Context, parentID, name string,
return UploadResult{}, fmt.Errorf("pikpak upload: oss put: %w", err)
}
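// 5) Obtain the fileID. Prefer the pre-allocated ID in the response; if it is empty, look it up by listing the directory.
// Obtain the fileID. Prefer the pre-allocated ID in the response; if it is empty, look it up by listing the directory.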
if resp.File.ID != "" {
result.FileID = resp.File.ID
return result, nil
@@ -184,6 +227,58 @@ func (d *Driver) UploadAndReportHash(ctx context.Context, parentID, name string,
return result, nil
}
func shouldRetryPikPakUploadAttempt(err error, attempt int) bool {
return attempt < pikpakUploadMaxAttempts && isRetryablePikPakUploadError(err)
}
func pikpakUploadRetryDelay(attempt int) time.Duration {
if attempt <= 0 {
return 0
}
return time.Duration(attempt) * time.Second
}
func (d *Driver) logUploadRetry(name string, attempt int, err error) {
log.Printf("[pikpak] upload retry drive=%s name=%q next_attempt=%d/%d err=%v",
d.id, name, attempt+1, pikpakUploadMaxAttempts, err)
}
func isRetryablePikPakUploadError(err error) bool {
if err == nil {
return false
}
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return false
}
var serviceErr oss.ServiceError
if errors.As(err, &serviceErr) {
return serviceErr.StatusCode == http.StatusTooManyRequests || serviceErr.StatusCode >= 500
}
var netErr net.Error
if errors.As(err, &netErr) {
return true
}
text := strings.ToLower(err.Error())
return strings.Contains(text, "no such host") ||
strings.Contains(text, "temporary failure in name resolution") ||
strings.Contains(text, "server misbehaving") ||
strings.Contains(text, "connection reset") ||
strings.Contains(text, "connection refused") ||
strings.Contains(text, "broken pipe") ||
strings.Contains(text, "eof") ||
strings.Contains(text, "i/o timeout") ||
strings.Contains(text, "tls handshake timeout") ||
strings.Contains(text, "http 429") ||
strings.Contains(text, "http 500") ||
strings.Contains(text, "http 502") ||
strings.Contains(text, "http 503") ||
strings.Contains(text, "http 504") ||
strings.Contains(text, "http 509") ||
strings.Contains(text, "too many requests") ||
strings.Contains(text, "temporarily unavailable") ||
strings.Contains(text, "service unavailable")
}
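// bufferAndHashGCID copies r into a temporary file while computing the GCID.
// It returns the temporary file (positioned at the end; the caller must Seek back to 0), the uppercase GCID hex, and the number of bytes actually written.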
//
@@ -215,10 +310,13 @@ func bufferAndHashGCID(r io.Reader, size int64) (*os.File, string, int64, error)
//
// The parameters reuse PikPak's temporary credentials; the Security Token header and UserAgent are required, consistent with OpenList.
func (d *Driver) uploadToOSS(ctx context.Context, p *s3Params, body io.Reader) error {
if d.uploadToOSSFunc != nil {
return d.uploadToOSSFunc(ctx, p, body)
}
if p == nil {
return errors.New("pikpak upload: nil s3 params")
}
client, err := oss.New(p.Endpoint, p.AccessKeyID, p.AccessKeySecret)
client, err := newPikPakOSSClient(p)
if err != nil {
return fmt.Errorf("oss client: %w", err)
}
@@ -235,6 +333,44 @@ func (d *Driver) uploadToOSS(ctx context.Context, p *s3Params, body io.Reader) e
)
}
func newPikPakOSSClient(p *s3Params, options ...oss.ClientOption) (*oss.Client, error) {
if p == nil {
return nil, errors.New("pikpak upload: nil s3 params")
}
clientOptions := make([]oss.ClientOption, 0, len(options)+1)
if isPikPakCNAMEEndpoint(p.Endpoint) {
clientOptions = append(clientOptions, oss.UseCname(true))
}
clientOptions = append(clientOptions, options...)
return oss.New(p.Endpoint, p.AccessKeyID, p.AccessKeySecret, clientOptions...)
}
func isPikPakCNAMEEndpoint(endpoint string) bool {
host := endpointHost(endpoint)
if host == "" {
return false
}
host = strings.TrimSuffix(strings.ToLower(host), ".")
return host != "mypikpak.com" && host != "mypikpak.net" &&
(strings.HasSuffix(host, ".mypikpak.com") || strings.HasSuffix(host, ".mypikpak.net"))
}
func endpointHost(endpoint string) string {
endpoint = strings.TrimSpace(endpoint)
if endpoint == "" {
return ""
}
if u, err := url.Parse(endpoint); err == nil && u.Host != "" {
endpoint = u.Host
} else if idx := strings.IndexByte(endpoint, '/'); idx >= 0 {
endpoint = endpoint[:idx]
}
if host, _, err := net.SplitHostPort(endpoint); err == nil {
endpoint = host
}
return strings.Trim(endpoint, "[]")
}
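The host extraction and the strict-subdomain check can be exercised with the standard library alone. The sketch below reproduces the steps of endpointHost under the local name `hostOf` and generalizes the suffix test into a hypothetical `isStrictSubdomain` helper; the `oss-cn-example.aliyuncs.com` and `evilmypikpak.com` hosts are made-up inputs for illustration.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// hostOf extracts a bare host from an endpoint that may or may not carry a
// scheme, a port, or a path.
func hostOf(endpoint string) string {
	endpoint = strings.TrimSpace(endpoint)
	if endpoint == "" {
		return ""
	}
	if u, err := url.Parse(endpoint); err == nil && u.Host != "" {
		endpoint = u.Host
	} else if idx := strings.IndexByte(endpoint, '/'); idx >= 0 {
		endpoint = endpoint[:idx] // no scheme: drop everything after the first slash
	}
	if host, _, err := net.SplitHostPort(endpoint); err == nil {
		endpoint = host
	}
	return strings.Trim(endpoint, "[]")
}

// isStrictSubdomain reports whether host is a proper subdomain of apex.
// The leading dot in the suffix keeps lookalike domains from matching.
func isStrictSubdomain(host, apex string) bool {
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	return host != apex && strings.HasSuffix(host, "."+apex)
}

func main() {
	endpoints := []string{
		"http://upload-a10b.mypikpak.com",
		"https://upload-a10b.mypikpak.com:443/some/key",
		"upload-a10b.mypikpak.com/some/key",
		"https://mypikpak.com",
		"https://evilmypikpak.com",
		"https://oss-cn-example.aliyuncs.com",
	}
	for _, ep := range endpoints {
		h := hostOf(ep)
		fmt.Printf("%s -> host=%s cname=%v\n", ep, h, isStrictSubdomain(h, "mypikpak.com"))
	}
}
```

Only the first three endpoints are reported as CNAME candidates: the apex domain itself, the lookalike domain, and the unrelated host all fail the strict-subdomain test.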
type readerWithCtx struct {
ctx context.Context
r io.Reader
@@ -6,12 +6,15 @@ import (
"crypto/sha1"
"encoding/hex"
"encoding/json"
"fmt"
"io"
"net"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/aliyun/aliyun-oss-go-sdk/oss"
"github.com/go-resty/resty/v2"
)
@@ -181,6 +184,95 @@ func TestUploadInstantSuccessFallsBackToListWhenFileIDMissing(t *testing.T) {
}
}
func TestUploadRetriesWithNewSessionWhenOSSEndpointDNSFails(t *testing.T) {
sessionRequests := 0
mux := http.NewServeMux()
mux.HandleFunc("/drive/v1/files", func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
t.Errorf("method = %q, want POST", r.Method)
}
sessionRequests++
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(fmt.Sprintf(`{
"upload_type": "UPLOAD_TYPE_RESUMABLE",
"resumable": {
"kind": "drive#resumable",
"provider": "UPLOAD_TYPE_UNKNOWN",
"params": {
"access_key_id": "ak",
"access_key_secret": "sk",
"bucket": "bucket",
"endpoint": "https://vip-lixian-%02d.upload-a10b.mypikpak.com",
"key": "object-key-%02d",
"security_token": "token"
}
},
"file": {"id": "retry-file-%02d", "name": "retry.mp4", "kind": "drive#file"}
}`, sessionRequests, sessionRequests, sessionRequests)))
})
server := httptest.NewServer(mux)
defer server.Close()
d := newTestDriver(t, server)
uploadAttempts := 0
var uploaded []byte
d.uploadToOSSFunc = func(_ context.Context, _ *s3Params, body io.Reader) error {
uploadAttempts++
if uploadAttempts == 1 {
return &net.DNSError{Err: "no such host", Name: "vip-lixian-01.upload-a10b.mypikpak.com"}
}
var err error
uploaded, err = io.ReadAll(body)
return err
}
payload := []byte("retry payload body")
id, err := d.Upload(context.Background(), "parent-id", "retry.mp4", bytes.NewReader(payload), int64(len(payload)))
if err != nil {
t.Fatalf("upload: %v", err)
}
if id != "retry-file-02" {
t.Fatalf("file id = %q, want retry-file-02 from the second session", id)
}
if sessionRequests != 2 {
t.Fatalf("session requests = %d, want 2", sessionRequests)
}
if uploadAttempts != 2 {
t.Fatalf("upload attempts = %d, want 2", uploadAttempts)
}
if !bytes.Equal(uploaded, payload) {
t.Fatalf("uploaded body = %q, want %q", string(uploaded), string(payload))
}
}
func TestPikPakOSSClientUsesCNAMEForPikPakUploadEndpoint(t *testing.T) {
params := &s3Params{
AccessKeyID: "ak",
AccessKeySecret: "sk",
Bucket: "vip-lixian-07",
Endpoint: "http://upload-a10b.mypikpak.com",
Key: "upload_tmp/object-key",
}
client, err := newPikPakOSSClient(params)
if err != nil {
t.Fatalf("new oss client: %v", err)
}
bucket, err := client.Bucket(params.Bucket)
if err != nil {
t.Fatalf("bucket: %v", err)
}
signed, err := bucket.SignURL(params.Key, oss.HTTPPut, 60)
if err != nil {
t.Fatalf("sign url: %v", err)
}
if strings.Contains(signed, "vip-lixian-07.upload-a10b.mypikpak.com") {
t.Fatalf("signed url uses invalid bucket-prefixed PikPak host: %s", signed)
}
if !strings.Contains(signed, "http://upload-a10b.mypikpak.com/upload_tmp%2Fobject-key") {
t.Fatalf("signed url = %s, want PikPak endpoint host with object key path", signed)
}
}
func TestUploadRejectsInvalidArguments(t *testing.T) {
d := New(Config{ID: "x", Username: "u", Password: "p", Platform: "web"})
cases := []struct {
+30 -13
@@ -16,23 +16,23 @@ import (
)
const (
defaultUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) quark-cloud-drive/2.5.20 Chrome/100.0.4896.160 Electron/18.3.5.4-b478491100 Safari/537.36 Channel/pckk_other_ch"
defaultUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) quark-cloud-drive/2.5.20 Chrome/100.0.4896.160 Electron/18.3.5.4-b478491100 Safari/537.36 Channel/pckk_other_ch"
defaultReferer = "https://pan.quark.cn"
defaultAPI = "https://drive.quark.cn/1/clouddrive"
defaultPR = "ucpro"
)
type Driver struct {
id string
cookie string
rootID string
ua string
referer string
apiBase string
pr string
client *resty.Client
onCookieUpdate func(string)
useTranscodingAddress bool
id string
cookie string
rootID string
ua string
referer string
apiBase string
pr string
client *resty.Client
onCookieUpdate func(string)
useTranscodingAddress bool
}
type Config struct {
@@ -60,7 +60,7 @@ func New(c Config) *Driver {
onCookieUpdate: c.OnCookieUpdate,
}
d.client = resty.New().
SetTimeout(30 * time.Second).
SetTimeout(30*time.Second).
SetHeader("Accept", "application/json, text/plain, */*").
SetHeader("Referer", d.referer).
SetHeader("User-Agent", d.ua)
@@ -263,12 +263,28 @@ func (d *Driver) findChildDir(ctx context.Context, parent, name string) (string,
return "", nil
}
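// ---------- Upload (not implemented in the first version; falls back to the local teaser) ----------
// ---------- Upload (not implemented in the first version; falls back to the local preview video) ----------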
func (d *Driver) Upload(ctx context.Context, parentID, name string, r io.Reader, size int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return errors.New("quark remove: empty file id")
}
body := map[string]any{
"action_type": 1,
"exclude_fids": []string{},
"filelist": []string{fileID},
}
if err := d.request(ctx, "/file/delete", http.MethodPost, nil, body, nil); err != nil {
return fmt.Errorf("quark remove: %w", err)
}
return nil
}
// ---------- helpers ----------
func fileToEntry(f *file, parentID string) drives.Entry {
@@ -343,3 +359,4 @@ func setCookieValue(cookie, key, value string) string {
}
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
File diff suppressed because it is too large
@@ -0,0 +1,653 @@
package scriptcrawler
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
"time"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/fingerprint"
)
const (
scriptCrawlerDuplicateBytes = "duplicate-video-bytes"
scriptCrawlerUniqueBytes = "unique-video-bytes"
)
func writeScriptCrawlerFFprobeStub(t *testing.T, dir string, ok bool) string {
t.Helper()
name := "ffprobe-ok.sh"
body := "#!/bin/sh\necho video\nexit 0\n"
if !ok {
name = "ffprobe-fail.sh"
body = "#!/bin/sh\necho 'moov atom not found' >&2\nexit 1\n"
}
path := filepath.Join(dir, name)
if err := os.WriteFile(path, []byte(body), 0o755); err != nil {
t.Fatalf("write ffprobe stub: %v", err)
}
return path
}
func writeScriptCrawlerFFmpegStub(t *testing.T, dir string) string {
t.Helper()
path := filepath.Join(dir, "ffmpeg-hls.sh")
body := "#!/bin/sh\nout=\"\"\nfor arg do out=\"$arg\"; done\nprintf 'hls-video-bytes' > \"$out\"\n"
if err := os.WriteFile(path, []byte(body), 0o755); err != nil {
t.Fatalf("write ffmpeg stub: %v", err)
}
return path
}
func TestCrawlerRunOnceImportsLocalFileAndSkipsExisting(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "demo", RootDir: filepath.Join(tmp, "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
dummyScript := filepath.Join(tmp, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
CrawlerName: "Demo Crawler",
PythonPath: wrapper,
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, true),
ScriptPath: dummyScript,
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 1 || res.Skipped != 0 || res.Failed != 0 {
t.Fatalf("result = new:%d skipped:%d failed:%d, want 1/0/0", res.NewVideos, res.Skipped, res.Failed)
}
v, err := cat.GetVideo(ctx, BuildVideoID("demo", "abc-123"))
if err != nil {
t.Fatalf("get video: %v", err)
}
if v.Title != "Imported From Helper" || v.FileID != "abc-123.mp4" || v.Size == 0 {
t.Fatalf("video = title:%q file:%q size:%d", v.Title, v.FileID, v.Size)
}
if !hasString(v.Tags, "Demo Crawler") {
t.Fatalf("video tags = %#v, want crawler name tag", v.Tags)
}
if _, err := os.Stat(filepath.Join(drv.VideosDir(), "abc-123.mp4")); err != nil {
t.Fatalf("video file not copied: %v", err)
}
res, err = c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("second run: %v", err)
}
if res.NewVideos != 0 || res.Skipped != 1 {
t.Fatalf("second result = new:%d skipped:%d, want 0/1", res.NewVideos, res.Skipped)
}
if res.SeenSnapshot != 1 {
t.Fatalf("seen snapshot = %d, want 1", res.SeenSnapshot)
}
}
func TestCrawlerRunOnceUsesSourceKindNamespace(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "demo", RootDir: filepath.Join(tmp, "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
dummyScript := filepath.Join(tmp, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
SourceKind: "spider91",
PythonPath: wrapper,
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, true),
ScriptPath: dummyScript,
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 1 || res.SeenSnapshot != 0 {
t.Fatalf("result = new:%d seen:%d, want 1/0", res.NewVideos, res.SeenSnapshot)
}
videoID := BuildVideoIDForKind("spider91", "demo", "abc-123")
if _, err := cat.GetVideo(ctx, videoID); err != nil {
t.Fatalf("get source-kind video: %v", err)
}
if _, err := cat.GetVideo(ctx, BuildVideoID("demo", "abc-123")); err == nil {
t.Fatalf("default namespace video unexpectedly exists")
}
res, err = c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("second run: %v", err)
}
if res.NewVideos != 0 || res.Skipped != 1 || res.SeenSnapshot != 1 {
t.Fatalf("second result = new:%d skipped:%d seen:%d, want 0/1/1", res.NewVideos, res.Skipped, res.SeenSnapshot)
}
}
func TestCrawlerRunOncePassesAbsoluteJobPathsWhenWorkDirDiffers(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
t.Chdir(tmp)
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "demo", RootDir: filepath.Join("data", "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
scriptDir := filepath.Join(tmp, "scripts")
if err := os.MkdirAll(scriptDir, 0o755); err != nil {
t.Fatalf("mkdir script dir: %v", err)
}
dummyScript := filepath.Join(scriptDir, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
t.Setenv("GO_WANT_SCRIPTCRAWLER_ASSERT_ABS", "1")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
PythonPath: wrapper,
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, true),
ScriptPath: dummyScript,
WorkDir: scriptDir,
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 1 || res.Skipped != 0 || res.Failed != 0 {
t.Fatalf("result = new:%d skipped:%d failed:%d, want 1/0/0", res.NewVideos, res.Skipped, res.Failed)
}
if !filepath.IsAbs(res.JobFile) || !filepath.IsAbs(res.SeenFile) {
t.Fatalf("result paths should be absolute: job=%q seen=%q", res.JobFile, res.SeenFile)
}
}
func TestCrawlerRunOnceImportsSimpleMediaURLWithoutSourceID(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/video.mp4" {
http.NotFound(w, r)
return
}
_, _ = w.Write([]byte("simple-video-bytes"))
}))
defer srv.Close()
drv := New(Config{ID: "demo", RootDir: filepath.Join(tmp, "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
dummyScript := filepath.Join(tmp, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
t.Setenv("GO_WANT_SCRIPTCRAWLER_SIMPLE", "1")
t.Setenv("GO_SCRIPTCRAWLER_MEDIA_URL", srv.URL+"/video.mp4?token=first")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
PythonPath: wrapper,
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, true),
ScriptPath: dummyScript,
HTTPClient: srv.Client(),
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 1 || res.Skipped != 0 || res.Failed != 0 {
t.Fatalf("result = new:%d skipped:%d failed:%d, want 1/0/0", res.NewVideos, res.Skipped, res.Failed)
}
videos, err := cat.ListVideosByDrive(ctx, "demo")
if err != nil {
t.Fatalf("list videos: %v", err)
}
if len(videos) != 1 {
t.Fatalf("videos = %d, want 1", len(videos))
}
v := videos[0]
if !strings.HasPrefix(v.ID, BuildVideoID("demo", "auto-")) {
t.Fatalf("video id = %q, want generated auto source id", v.ID)
}
if v.Title != "Simple Protocol Video" || v.Ext != "mp4" || v.ThumbnailURL != "" || v.Size == 0 {
t.Fatalf("video = title:%q ext:%q thumb:%q size:%d", v.Title, v.Ext, v.ThumbnailURL, v.Size)
}
if _, err := os.Stat(filepath.Join(drv.VideosDir(), v.FileID)); err != nil {
t.Fatalf("video file not downloaded: %v", err)
}
t.Setenv("GO_SCRIPTCRAWLER_MEDIA_URL", srv.URL+"/video.mp4?token=second")
res, err = c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("second run: %v", err)
}
if res.NewVideos != 0 || res.Skipped != 1 {
t.Fatalf("second result = new:%d skipped:%d, want 0/1", res.NewVideos, res.Skipped)
}
}
func TestCrawlerRunOnceSkipsFingerprintDuplicateAndContinues(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "demo", RootDir: filepath.Join(tmp, "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
seedFile := "seed-canonical.mp4"
if err := os.WriteFile(filepath.Join(drv.VideosDir(), seedFile), []byte(scriptCrawlerDuplicateBytes), 0o644); err != nil {
t.Fatalf("write seed video: %v", err)
}
seed := &catalog.Video{
ID: "seed-for-hash",
DriveID: drv.ID(),
FileID: seedFile,
Title: "Seed",
Size: int64(len(scriptCrawlerDuplicateBytes)),
PublishedAt: time.Now(),
}
sampled, err := fingerprint.Compute(ctx, drv, seed, fingerprint.Config{}, nil)
if err != nil {
t.Fatalf("compute seed fingerprint: %v", err)
}
_ = os.Remove(filepath.Join(drv.VideosDir(), seedFile))
now := time.Now()
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: "existing-canonical",
DriveID: "other-drive",
FileID: "existing.mp4",
FileName: "existing.mp4",
Title: "Existing Canonical",
Size: int64(len(scriptCrawlerDuplicateBytes)),
Ext: "mp4",
SampledSHA256: sampled,
FingerprintStatus: "ready",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed canonical video: %v", err)
}
dummyScript := filepath.Join(tmp, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
t.Setenv("GO_WANT_SCRIPTCRAWLER_DUP_UNIQUE", "1")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
PythonPath: wrapper,
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, true),
ScriptPath: dummyScript,
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 1 || res.Skipped != 1 || res.Failed != 0 || res.TotalEntries != 2 {
t.Fatalf("result = total:%d new:%d skipped:%d failed:%d, want 2/1/1/0", res.TotalEntries, res.NewVideos, res.Skipped, res.Failed)
}
if res.CandidateBudget <= res.TargetNew {
t.Fatalf("candidate budget = %d, target = %d; want expanded budget", res.CandidateBudget, res.TargetNew)
}
if _, err := cat.GetVideo(ctx, BuildVideoID("demo", "dup-source")); err == nil {
t.Fatal("duplicate candidate should not be imported")
}
if _, err := os.Stat(filepath.Join(drv.VideosDir(), "dup-source.mp4")); !os.IsNotExist(err) {
t.Fatalf("duplicate local file stat = %v, want removed", err)
}
v, err := cat.GetVideo(ctx, BuildVideoID("demo", "unique-source"))
if err != nil {
t.Fatalf("unique video should be imported: %v", err)
}
if v.SampledSHA256 == "" || v.FingerprintStatus != "ready" {
t.Fatalf("unique fingerprint = %q status=%q, want ready sampled fingerprint", v.SampledSHA256, v.FingerprintStatus)
}
seen, err := cat.ListCrawlerSourceIDs(ctx, Kind, "demo")
if err != nil {
t.Fatalf("list seen source ids: %v", err)
}
seenSet := map[string]bool{}
for _, id := range seen {
seenSet[id] = true
}
if !seenSet["dup-source"] || !seenSet["unique-source"] {
t.Fatalf("seen ids = %#v, want duplicate and imported source ids", seen)
}
}
func TestCrawlerRunOnceRejectsInvalidDownloadedVideo(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "demo", RootDir: filepath.Join(tmp, "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
dummyScript := filepath.Join(tmp, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
CrawlerName: "Demo Crawler",
PythonPath: wrapper,
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, false),
ScriptPath: dummyScript,
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 0 || res.Skipped != 0 || res.Failed != 1 || res.TotalEntries != 1 {
t.Fatalf("result = total:%d new:%d skipped:%d failed:%d, want 1/0/0/1", res.TotalEntries, res.NewVideos, res.Skipped, res.Failed)
}
if _, err := cat.GetVideo(ctx, BuildVideoID("demo", "abc-123")); err == nil {
t.Fatal("invalid video should not be imported")
}
if _, err := os.Stat(filepath.Join(drv.VideosDir(), "abc-123.mp4")); !os.IsNotExist(err) {
t.Fatalf("invalid local video stat = %v, want removed", err)
}
seen, err := cat.ListCrawlerSourceIDs(ctx, Kind, "demo")
if err != nil {
t.Fatalf("list seen source ids: %v", err)
}
if len(seen) != 0 {
t.Fatalf("seen ids = %#v, want none for invalid video", seen)
}
}
func TestCrawlerRunOnceDownloadsHLSMediaURL(t *testing.T) {
ctx := context.Background()
tmp := t.TempDir()
cat, err := catalog.Open(filepath.Join(tmp, "catalog.db"))
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := New(Config{ID: "demo", RootDir: filepath.Join(tmp, "crawler")})
if err := drv.Init(ctx); err != nil {
t.Fatalf("driver init: %v", err)
}
dummyScript := filepath.Join(tmp, "helper-script")
if err := os.WriteFile(dummyScript, []byte("helper"), 0o755); err != nil {
t.Fatalf("write dummy script: %v", err)
}
wrapper := filepath.Join(tmp, "helper-wrapper.sh")
wrapperScript := fmt.Sprintf("#!/bin/sh\nexec %q -test.run=TestScriptCrawlerHelperProcess \"$@\"\n", os.Args[0])
if err := os.WriteFile(wrapper, []byte(wrapperScript), 0o755); err != nil {
t.Fatalf("write helper wrapper: %v", err)
}
t.Setenv("GO_WANT_SCRIPTCRAWLER_HELPER", "1")
t.Setenv("GO_WANT_SCRIPTCRAWLER_HLS", "1")
c := NewCrawler(CrawlerConfig{
Driver: drv,
Catalog: cat,
CrawlerName: "Demo Crawler",
PythonPath: wrapper,
FFmpegPath: writeScriptCrawlerFFmpegStub(t, tmp),
FFprobePath: writeScriptCrawlerFFprobeStub(t, tmp, true),
ScriptPath: dummyScript,
})
res, err := c.RunOnce(ctx, 1)
if err != nil {
t.Fatalf("run once: %v", err)
}
if res.NewVideos != 1 || res.Skipped != 0 || res.Failed != 0 {
t.Fatalf("result = new:%d skipped:%d failed:%d, want 1/0/0", res.NewVideos, res.Skipped, res.Failed)
}
v, err := cat.GetVideo(ctx, BuildVideoID("demo", "hls-source"))
if err != nil {
t.Fatalf("get hls video: %v", err)
}
if v.FileID != "hls-source.mp4" || v.Size != int64(len("hls-video-bytes")) {
t.Fatalf("video file=%q size=%d, want hls-source.mp4 size %d", v.FileID, v.Size, len("hls-video-bytes"))
}
data, err := os.ReadFile(filepath.Join(drv.VideosDir(), "hls-source.mp4"))
if err != nil {
t.Fatalf("read hls output: %v", err)
}
if string(data) != "hls-video-bytes" {
t.Fatalf("hls output = %q", string(data))
}
}
func TestScriptCrawlerHelperProcess(t *testing.T) {
if os.Getenv("GO_WANT_SCRIPTCRAWLER_HELPER") != "1" {
return
}
args := os.Args
jobPath := ""
for i := 0; i < len(args)-1; i++ {
if args[i] == "--job" {
jobPath = args[i+1]
break
}
}
if jobPath == "" {
fmt.Fprintln(os.Stderr, "missing --job")
os.Exit(2)
}
data, err := os.ReadFile(jobPath)
if err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(2)
}
var job Job
if err := json.Unmarshal(data, &job); err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(2)
}
if os.Getenv("GO_WANT_SCRIPTCRAWLER_ASSERT_ABS") == "1" {
if !filepath.IsAbs(jobPath) || !filepath.IsAbs(job.SeenSourceIDsFile) || !filepath.IsAbs(job.OutputDir) {
fmt.Fprintf(os.Stderr, "expected absolute paths, got job=%q seen=%q output=%q\n", jobPath, job.SeenSourceIDsFile, job.OutputDir)
os.Exit(2)
}
}
if os.Getenv("GO_WANT_SCRIPTCRAWLER_SIMPLE") == "1" {
event := map[string]any{
"title": "Simple Protocol Video",
"media_url": os.Getenv("GO_SCRIPTCRAWLER_MEDIA_URL"),
}
_ = json.NewEncoder(os.Stdout).Encode(event)
os.Exit(0)
}
if os.Getenv("GO_WANT_SCRIPTCRAWLER_HLS") == "1" {
event := Event{
Type: "item",
Item: Item{
SourceID: "hls-source",
Title: "HLS Protocol Video",
Author: "helper",
Media: MediaRef{
URL: "https://media.example.test/video.m3u8",
Headers: map[string]string{
"Referer": "https://example.test/",
},
},
},
}
_ = json.NewEncoder(os.Stdout).Encode(event)
os.Exit(0)
}
if os.Getenv("GO_WANT_SCRIPTCRAWLER_DUP_UNIQUE") == "1" {
duplicateFile := filepath.Join(job.OutputDir, "duplicate.mp4")
if err := os.WriteFile(duplicateFile, []byte(scriptCrawlerDuplicateBytes), 0o644); err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(2)
}
uniqueFile := filepath.Join(job.OutputDir, "unique.mp4")
if err := os.WriteFile(uniqueFile, []byte(scriptCrawlerUniqueBytes), 0o644); err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(2)
}
for _, event := range []Event{
{
Type: "item",
Item: Item{
SourceID: "dup-source",
Title: "Duplicate Candidate",
Author: "helper",
Media: MediaRef{LocalFile: duplicateFile},
},
},
{
Type: "item",
Item: Item{
SourceID: "unique-source",
Title: "Unique Candidate",
Author: "helper",
Media: MediaRef{LocalFile: uniqueFile},
},
},
} {
_ = json.NewEncoder(os.Stdout).Encode(event)
}
os.Exit(0)
}
localFile := filepath.Join(job.OutputDir, "helper.mp4")
if err := os.WriteFile(localFile, []byte("helper-video"), 0o644); err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(2)
}
event := Event{
Type: "item",
Item: Item{
SourceID: "abc-123",
Title: "Imported From Helper",
Author: "helper",
Media: MediaRef{LocalFile: localFile},
},
}
_ = json.NewEncoder(os.Stdout).Encode(event)
os.Exit(0)
}
// hasString reports whether values contains want.
func hasString(values []string, want string) bool {
	for _, value := range values {
		if value == want {
			return true
		}
	}
	return false
}
@@ -0,0 +1,213 @@
// Package scriptcrawler provides a generic local drive for script-based
// crawlers. A crawler script discovers videos; the Go runner downloads them
// into this drive and the existing preview/fingerprint workers consume them
// through the normal drives.Drive interface.
package scriptcrawler
import (
"context"
"errors"
"io"
"os"
"path/filepath"
"strings"
"time"
"github.com/video-site/backend/internal/drives"
)
const Kind = "scriptcrawler"
type Config struct {
ID string
RootDir string
}
type Driver struct {
id string
rootDir string
}
func New(c Config) *Driver {
return &Driver{id: c.ID, rootDir: c.RootDir}
}
func (d *Driver) Kind() string { return Kind }
func (d *Driver) ID() string { return d.id }
func (d *Driver) RootID() string { return "/" }
func (d *Driver) Init(context.Context) error {
if strings.TrimSpace(d.id) == "" {
return errors.New("scriptcrawler: empty drive id")
}
if strings.TrimSpace(d.rootDir) == "" {
return errors.New("scriptcrawler: empty root dir")
}
for _, sub := range []string{"videos", "thumbs", "output", ".crawl"} {
if err := os.MkdirAll(filepath.Join(d.rootDir, sub), 0o755); err != nil {
return err
}
}
return nil
}
func (d *Driver) RootDir() string { return d.rootDir }
func (d *Driver) VideosDir() string { return filepath.Join(d.rootDir, "videos") }
func (d *Driver) ThumbsDir() string { return filepath.Join(d.rootDir, "thumbs") }
func (d *Driver) OutputDir() string { return filepath.Join(d.rootDir, "output") }
func (d *Driver) CrawlDir() string { return filepath.Join(d.rootDir, ".crawl") }
func (d *Driver) VideoPath(fileID string) (string, error) {
return safeJoin(d.VideosDir(), fileID)
}
func (d *Driver) ThumbPath(fileID string) (string, error) {
return safeJoin(d.ThumbsDir(), fileID)
}
func (d *Driver) OutputPath(fileName string) (string, error) {
return safeJoin(d.OutputDir(), fileName)
}
func (d *Driver) List(context.Context, string) ([]drives.Entry, error) {
entries, err := os.ReadDir(d.VideosDir())
if err != nil {
if os.IsNotExist(err) {
return nil, nil
}
return nil, err
}
out := make([]drives.Entry, 0, len(entries))
for _, e := range entries {
if e.IsDir() {
continue
}
info, err := e.Info()
if err != nil {
continue
}
out = append(out, drives.Entry{
ID: e.Name(),
Name: e.Name(),
Size: info.Size(),
IsDir: false,
ModTime: info.ModTime(),
})
}
return out, nil
}
func (d *Driver) Stat(ctx context.Context, fileID string) (*drives.Entry, error) {
path, err := d.VideoPath(fileID)
if err != nil {
return nil, err
}
info, err := os.Stat(path)
if err != nil {
return nil, err
}
return &drives.Entry{
ID: fileID,
Name: fileID,
Size: info.Size(),
IsDir: info.IsDir(),
ModTime: info.ModTime(),
}, nil
}
func (d *Driver) StreamURL(ctx context.Context, fileID string) (*drives.StreamLink, error) {
path, err := d.VideoPath(fileID)
if err != nil {
return nil, err
}
info, err := os.Stat(path)
if err != nil {
return nil, err
}
if info.IsDir() || info.Size() == 0 {
return nil, os.ErrNotExist
}
return &drives.StreamLink{
URL: path,
Expires: time.Now().Add(24 * time.Hour),
}, nil
}
func (d *Driver) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *Driver) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
if err := ctx.Err(); err != nil {
return err
}
videoPath, err := d.VideoPath(fileID)
if err != nil {
return err
}
info, err := os.Stat(videoPath)
if err != nil {
if os.IsNotExist(err) {
removeThumbCandidates(d.ThumbPath, strings.TrimSuffix(fileID, filepath.Ext(fileID)))
return nil
}
return err
}
if info.IsDir() {
return errors.New("scriptcrawler: refusing to remove directory")
}
if err := os.Remove(videoPath); err != nil && !os.IsNotExist(err) {
return err
}
removeThumbCandidates(d.ThumbPath, strings.TrimSuffix(fileID, filepath.Ext(fileID)))
return nil
}
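// removeThumbCandidates deletes, on a best-effort basis, every thumbnail named
// stem plus one of the known image extensions. Errors are ignored.
func removeThumbCandidates(pathFor func(string) (string, error), stem string) {
	stem = strings.TrimSpace(stem)
	if stem == "" {
		return
	}
	for _, ext := range []string{".jpg", ".jpeg", ".png", ".webp"} {
		path, err := pathFor(stem + ext)
		if err != nil {
			continue
		}
		_ = os.Remove(path)
	}
}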
// safeJoin joins fileID onto root and returns the absolute path. It rejects
// empty IDs, IDs that are not a bare file name (anything containing a path
// separator), and any result that would fall outside root.
func safeJoin(root, fileID string) (string, error) {
	id := strings.TrimSpace(fileID)
	if id == "" || filepath.Base(id) != id {
		return "", errors.New("scriptcrawler: invalid file id")
	}
	if strings.TrimSpace(root) == "" {
		return "", errors.New("scriptcrawler: empty root")
	}
	rootAbs, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	pathAbs, err := filepath.Abs(filepath.Join(rootAbs, id))
	if err != nil {
		return "", err
	}
	if pathAbs != rootAbs && !strings.HasPrefix(pathAbs, rootAbs+string(os.PathSeparator)) {
		return "", errors.New("scriptcrawler: file id escapes root")
	}
	return pathAbs, nil
}
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
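The path-containment idea behind `safeJoin` can be tried in isolation. The following is a minimal standalone sketch, not the code from this diff: `containedJoin` is a hypothetical name, and unlike `safeJoin` it rejects `.` and `..` up front rather than relying on the prefix check alone.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// containedJoin is a hypothetical helper (not part of the diff). It joins a
// bare file name onto root and rejects anything that could escape root.
func containedJoin(root, name string) (string, error) {
	name = strings.TrimSpace(name)
	// filepath.Base(name) != name rejects names containing a separator.
	// "." and ".." are rejected explicitly because Base returns them unchanged.
	if name == "" || name == "." || name == ".." || filepath.Base(name) != name {
		return "", errors.New("invalid file name")
	}
	rootAbs, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	full := filepath.Join(rootAbs, name)
	// Second line of defence: the joined path must still sit under rootAbs.
	if !strings.HasPrefix(full, rootAbs+string(filepath.Separator)) {
		return "", errors.New("file name escapes root")
	}
	return full, nil
}

func main() {
	// "/srv/videos" is an illustrative root, not a path used by the project.
	for _, name := range []string{"clip.mp4", "../secret", "a/b.mp4", ".."} {
		_, err := containedJoin("/srv/videos", name)
		fmt.Printf("%q rejected=%v\n", name, err != nil)
	}
}
```

Checking the bare-name rule first keeps the common failure cheap; the prefix check is kept as a backstop in case the name rule is ever relaxed.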
@@ -0,0 +1,386 @@
package scriptcrawler
import (
"bufio"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"os/exec"
"path/filepath"
"strings"
"sync"
"syscall"
"time"
)
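// DryRun trial-runs a crawler script without writing anything to the catalog:
// it generates job.json in a temporary directory, starts the script process,
// stops it as soon as the first item event (or the first MaxItems item events)
// arrives, and then sends a small probe request to the direct media URL to
// verify that the script can actually reach a video.
// It backs the "Test script" button shown in the admin UI after a script is imported.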
const (
defaultDryRunTimeout = 2 * time.Minute
dryRunLogTailLines = 60
dryRunMediaProbeLimit = 20 * time.Second
dryRunStopGrace = 100 * time.Millisecond
)
type DryRunConfig struct {
PythonPath string
ScriptPath string
ProxyURL string
ConfigJSON string
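// MaxItems is the number of item events to collect before stopping the script; defaults to 1.
MaxItems int
// Timeout is the hard limit for the whole dry run; defaults to 2 minutes.
Timeout time.Duration
// SkipMediaProbe skips the media URL reachability probe (injected by unit tests).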
SkipMediaProbe bool
HTTPClient *http.Client
}
type DryRunItem struct {
Title string `json:"title"`
SourceID string `json:"sourceId,omitempty"`
MediaURL string `json:"mediaUrl,omitempty"`
MediaLocalFile string `json:"mediaLocalFile,omitempty"`
ThumbnailURL string `json:"thumbnailUrl,omitempty"`
DetailURL string `json:"detailUrl,omitempty"`
}
type DryRunMediaCheck struct {
OK bool `json:"ok"`
Status int `json:"status,omitempty"`
ContentType string `json:"contentType,omitempty"`
ContentLength int64 `json:"contentLengthBytes,omitempty"`
Error string `json:"error,omitempty"`
}
type DryRunResult struct {
OK bool `json:"ok"`
Items []DryRunItem `json:"items"`
MediaCheck *DryRunMediaCheck `json:"mediaCheck,omitempty"`
Error string `json:"error,omitempty"`
Log []string `json:"log,omitempty"`
DurationMs int64 `json:"durationMs"`
}
func DryRun(ctx context.Context, cfg DryRunConfig) *DryRunResult {
started := time.Now()
result := &DryRunResult{Items: []DryRunItem{}}
defer func() { result.DurationMs = time.Since(started).Milliseconds() }()
scriptPath := strings.TrimSpace(cfg.ScriptPath)
if scriptPath == "" {
result.Error = "脚本路径为空,请先导入脚本"
return result
}
if _, err := os.Stat(scriptPath); err != nil {
result.Error = fmt.Sprintf("脚本不存在: %v", err)
return result
}
pythonPath := strings.TrimSpace(cfg.PythonPath)
if pythonPath == "" {
pythonPath = "python3"
}
maxItems := cfg.MaxItems
if maxItems <= 0 {
maxItems = 1
}
timeout := cfg.Timeout
if timeout <= 0 {
timeout = defaultDryRunTimeout
}
tmpDir, err := os.MkdirTemp("", "crawler-dryrun-")
if err != nil {
result.Error = fmt.Sprintf("创建临时目录失败: %v", err)
return result
}
defer os.RemoveAll(tmpDir)
outputDir := filepath.Join(tmpDir, "output")
if err := os.MkdirAll(outputDir, 0o755); err != nil {
result.Error = fmt.Sprintf("创建输出目录失败: %v", err)
return result
}
seenPath := filepath.Join(tmpDir, "seen.txt")
if err := os.WriteFile(seenPath, nil, 0o644); err != nil {
result.Error = fmt.Sprintf("写入 seen 文件失败: %v", err)
return result
}
configJSON := json.RawMessage([]byte("{}"))
if raw := strings.TrimSpace(cfg.ConfigJSON); raw != "" {
if !json.Valid([]byte(raw)) {
result.Error = "自定义配置必须是合法 JSON"
return result
}
configJSON = json.RawMessage(raw)
}
job := Job{
Protocol: "crawler.v1",
Mode: "crawl",
RunID: "dryrun-" + started.UTC().Format("20060102T150405Z"),
CrawlerID: "dryrun",
TargetNew: maxItems,
SeenSourceIDsFile: seenPath,
OutputDir: outputDir,
Config: configJSON,
Network: JobNetwork{ProxyURL: strings.TrimSpace(cfg.ProxyURL)},
}
jobPath := filepath.Join(tmpDir, "job.json")
jobData, err := json.MarshalIndent(job, "", " ")
if err != nil {
result.Error = fmt.Sprintf("生成 job 文件失败: %v", err)
return result
}
if err := os.WriteFile(jobPath, jobData, 0o600); err != nil {
result.Error = fmt.Sprintf("写入 job 文件失败: %v", err)
return result
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
cmd := exec.CommandContext(runCtx, pythonPath, scriptPath, "--job", jobPath)
cmd.Dir = filepath.Dir(scriptPath)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Cancel = func() error {
return killDryRunProcess(cmd)
}
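// After a timeout or an early kill, child processes spawned by the script may
// still hold the stdout/stderr pipes; WaitDelay forces the pipes closed after a
// grace period so the readers cannot block forever.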
cmd.WaitDelay = 3 * time.Second
if proxyURL := strings.TrimSpace(cfg.ProxyURL); proxyURL != "" {
cmd.Env = append(os.Environ(),
"HTTP_PROXY="+proxyURL,
"HTTPS_PROXY="+proxyURL,
"http_proxy="+proxyURL,
"https_proxy="+proxyURL,
"NO_PROXY=",
"no_proxy=",
)
}
stdout, err := cmd.StdoutPipe()
if err != nil {
result.Error = fmt.Sprintf("启动脚本失败: %v", err)
return result
}
stderr, err := cmd.StderrPipe()
if err != nil {
_ = stdout.Close()
result.Error = fmt.Sprintf("启动脚本失败: %v", err)
return result
}
if err := cmd.Start(); err != nil {
_ = stdout.Close()
_ = stderr.Close()
result.Error = fmt.Sprintf("启动脚本失败: %v", err)
return result
}
// stderr carries the script's log; keep the last few lines so they can be echoed back for troubleshooting.
var logMu sync.Mutex
logTail := make([]string, 0, dryRunLogTailLines)
stderrDone := make(chan struct{})
go func() {
defer close(stderrDone)
scanner := bufio.NewScanner(stderr)
scanner.Buffer(make([]byte, 64*1024), 1024*1024)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
logMu.Lock()
if len(logTail) >= dryRunLogTailLines {
logTail = logTail[1:]
}
logTail = append(logTail, line)
logMu.Unlock()
}
}()
items := []DryRunItem{}
var firstMediaHeaders map[string]string
parseFailures := 0
scanner := bufio.NewScanner(stdout)
scanner.Buffer(make([]byte, 64*1024), 4*1024*1024)
for scanner.Scan() {
if runCtx.Err() != nil {
break
}
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
var event Event
if err := json.Unmarshal([]byte(line), &event); err != nil {
parseFailures++
continue
}
eventType := strings.ToLower(strings.TrimSpace(event.Type))
item := event.normalizedItem()
if eventType == "" && item.hasPayload() {
eventType = "item"
}
if eventType != "item" {
continue
}
normalized, _, err := normalizeItemForImport(item)
if err != nil {
result.Error = fmt.Sprintf("item 字段不完整: %v", err)
continue
}
mediaURL := strings.TrimSpace(normalized.Media.URL)
if len(items) == 0 {
firstMediaHeaders = normalized.Media.Headers
}
items = append(items, DryRunItem{
Title: strings.TrimSpace(normalized.Title),
SourceID: strings.TrimSpace(item.SourceID),
MediaURL: mediaURL,
MediaLocalFile: strings.TrimSpace(normalized.Media.LocalFile),
ThumbnailURL: strings.TrimSpace(normalized.Thumbnail.URL),
DetailURL: strings.TrimSpace(normalized.DetailURL),
})
if len(items) >= maxItems {
break
}
}
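// Once enough items are collected, stop the script so it does not keep paging.
// A script that has already exited on its own gets a very short grace period so
// its stderr log is drained from the pipe first; otherwise the dry-run log echo
// is occasionally empty.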
waitDone := make(chan struct{})
go func() {
_ = cmd.Wait()
close(waitDone)
}()
select {
case <-waitDone:
case <-time.After(dryRunStopGrace):
_ = killDryRunProcess(cmd)
<-waitDone
}
<-stderrDone
logMu.Lock()
result.Log = append([]string{}, logTail...)
logMu.Unlock()
result.Items = items
if len(items) == 0 {
if result.Error == "" {
switch {
case runCtx.Err() != nil && ctx.Err() == nil:
result.Error = fmt.Sprintf("测试超时(%s),脚本没有输出任何视频", timeout)
case parseFailures > 0:
result.Error = "脚本 stdout 不是合法的 crawler.v1 JSON Lines(日志应输出到 stderr)"
default:
result.Error = "脚本退出但没有输出任何视频"
}
}
return result
}
result.Error = ""
first := items[0]
switch {
case cfg.SkipMediaProbe:
result.OK = true
case first.MediaLocalFile != "":
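// Mode where the script downloads into output_dir itself: the dry run uses a
// temporary directory and the file is cleaned up with it, so emitting a valid
// local_file counts as a pass.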
result.OK = true
default:
check := probeMediaURL(ctx, cfg, first, firstMediaHeaders)
result.MediaCheck = check
result.OK = check.OK
}
return result
}
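// killDryRunProcess sends SIGKILL to the script's whole process group (the
// command is started with Setpgid). A group that is already gone (ESRCH) is not
// an error; on any other failure it falls back to killing only the process.
func killDryRunProcess(cmd *exec.Cmd) error {
	if cmd == nil || cmd.Process == nil {
		return nil
	}
	if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
		if err == syscall.ESRCH {
			return nil
		}
		return cmd.Process.Kill()
	}
	return nil
}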
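// probeMediaURL sends a small Range: bytes=0-0 request to the direct media URL
// to verify that it is reachable (using the anti-hotlinking headers supplied by
// the script, and the proxy).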
func probeMediaURL(ctx context.Context, cfg DryRunConfig, item DryRunItem, mediaHeaders map[string]string) *DryRunMediaCheck {
check := &DryRunMediaCheck{}
if item.MediaURL == "" {
check.Error = "item 没有视频直链"
return check
}
client := cfg.HTTPClient
if client == nil {
transport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
ResponseHeaderTimeout: dryRunMediaProbeLimit,
}
if err := configureExplicitProxy(transport, cfg.ProxyURL); err != nil {
check.Error = fmt.Sprintf("代理配置无效: %v", err)
return check
}
client = &http.Client{Transport: transport}
}
probeCtx, cancel := context.WithTimeout(ctx, dryRunMediaProbeLimit)
defer cancel()
req, err := http.NewRequestWithContext(probeCtx, http.MethodGet, item.MediaURL, nil)
if err != nil {
check.Error = fmt.Sprintf("视频直链无效: %v", err)
return check
}
req.Header.Set("User-Agent", defaultUserAgent)
req.Header.Set("Range", "bytes=0-0")
if item.DetailURL != "" {
req.Header.Set("Referer", item.DetailURL)
}
for k, v := range mediaHeaders {
k = strings.TrimSpace(k)
if k == "" {
continue
}
req.Header.Set(k, v)
}
resp, err := client.Do(req)
if err != nil {
check.Error = fmt.Sprintf("视频直链请求失败: %v", err)
return check
}
defer resp.Body.Close()
check.Status = resp.StatusCode
check.ContentType = resp.Header.Get("Content-Type")
if cr := resp.Header.Get("Content-Range"); cr != "" {
// Content-Range: bytes 0-0/12345 → take the total size
if idx := strings.LastIndex(cr, "/"); idx >= 0 {
var total int64
if _, err := fmt.Sscanf(cr[idx+1:], "%d", &total); err == nil {
check.ContentLength = total
}
}
}
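if check.ContentLength == 0 && resp.StatusCode == http.StatusOK && resp.ContentLength > 0 {
// resp.ContentLength is -1 when the length is unknown; leave the field at 0 in that case.
check.ContentLength = resp.ContentLength
}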
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
check.Error = fmt.Sprintf("视频直链返回 HTTP %d", resp.StatusCode)
return check
}
check.OK = true
return check
}
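The total-size extraction that `probeMediaURL` performs on the `Content-Range` header can be sketched on its own. This is a minimal standalone sketch rather than the code from the diff: `totalFromContentRange` is a hypothetical name, and it uses `strconv.ParseInt` instead of `fmt.Sscanf`, so a value with trailing junk or the unknown-total form `*` is reported as absent.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// totalFromContentRange is a hypothetical helper (not part of the diff). It
// returns the total size from a header value such as "bytes 0-0/12345".
// ok is false when the header is empty, malformed, or reports an unknown
// total ("*").
func totalFromContentRange(value string) (total int64, ok bool) {
	idx := strings.LastIndex(value, "/")
	if idx < 0 {
		return 0, false
	}
	n, err := strconv.ParseInt(strings.TrimSpace(value[idx+1:]), 10, 64)
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

func main() {
	for _, v := range []string{"bytes 0-0/12345", "bytes 0-0/*", ""} {
		total, ok := totalFromContentRange(v)
		fmt.Println(total, ok)
	}
}
```

Returning an explicit `ok` flag lets the caller tell "no total reported" apart from a genuine zero-byte resource.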
@@ -0,0 +1,153 @@
package scriptcrawler
import (
"context"
"fmt"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
"time"
)
// writeDryRunScript writes body as an executable /bin/sh script in a fresh
// temporary directory and returns its path.
func writeDryRunScript(t *testing.T, body string) string {
	t.Helper()
	dir := t.TempDir()
	path := filepath.Join(dir, "crawler.sh")
	if err := os.WriteFile(path, []byte("#!/bin/sh\n"+body), 0o755); err != nil {
		t.Fatalf("write script: %v", err)
	}
	return path
}
func TestDryRunCollectsFirstItem(t *testing.T) {
script := writeDryRunScript(t, `
echo '[log] fetching list page' >&2
echo '{"type":"item","item":{"title":"Test Video","media_url":"https://cdn.example.test/v.mp4","source_id":"123","thumbnail_url":"https://cdn.example.test/t.jpg"}}'
echo '{"type":"done","stats":{"emitted":1}}'
`)
result := DryRun(context.Background(), DryRunConfig{
PythonPath: "/bin/sh",
ScriptPath: script,
SkipMediaProbe: true,
})
if !result.OK {
t.Fatalf("ok = false, error = %q, log = %v", result.Error, result.Log)
}
if len(result.Items) != 1 {
t.Fatalf("items = %d, want 1", len(result.Items))
}
item := result.Items[0]
if item.Title != "Test Video" || item.MediaURL != "https://cdn.example.test/v.mp4" || item.SourceID != "123" {
t.Fatalf("item = %+v", item)
}
if len(result.Log) == 0 || !strings.Contains(result.Log[0], "fetching list page") {
t.Fatalf("log tail = %v, want stderr captured", result.Log)
}
}
func TestDryRunProbesMediaURL(t *testing.T) {
var gotRange, gotReferer string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
gotRange = r.Header.Get("Range")
gotReferer = r.Header.Get("Referer")
w.Header().Set("Content-Type", "video/mp4")
w.Header().Set("Content-Range", "bytes 0-0/4096")
w.WriteHeader(http.StatusPartialContent)
_, _ = w.Write([]byte("x"))
}))
t.Cleanup(srv.Close)
script := writeDryRunScript(t, fmt.Sprintf(
`echo '{"type":"item","title":"Probe Video","media_url":"%s/v.mp4","detail_url":"https://example.test/view"}'`,
srv.URL,
))
result := DryRun(context.Background(), DryRunConfig{
PythonPath: "/bin/sh",
ScriptPath: script,
})
if !result.OK {
t.Fatalf("ok = false, error = %q, mediaCheck = %+v", result.Error, result.MediaCheck)
}
if result.MediaCheck == nil || !result.MediaCheck.OK {
t.Fatalf("mediaCheck = %+v, want ok", result.MediaCheck)
}
if result.MediaCheck.Status != http.StatusPartialContent || result.MediaCheck.ContentLength != 4096 {
t.Fatalf("mediaCheck = %+v, want 206 with total 4096", result.MediaCheck)
}
if gotRange != "bytes=0-0" || gotReferer != "https://example.test/view" {
t.Fatalf("probe headers range=%q referer=%q", gotRange, gotReferer)
}
}
func TestDryRunReportsBrokenMediaURL(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
http.Error(w, "forbidden", http.StatusForbidden)
}))
t.Cleanup(srv.Close)
script := writeDryRunScript(t, fmt.Sprintf(
`echo '{"type":"item","title":"Dead Link","media_url":"%s/v.mp4"}'`,
srv.URL,
))
result := DryRun(context.Background(), DryRunConfig{
PythonPath: "/bin/sh",
ScriptPath: script,
})
if result.OK {
t.Fatal("ok = true, want false for HTTP 403 media url")
}
if result.MediaCheck == nil || result.MediaCheck.OK || result.MediaCheck.Status != http.StatusForbidden {
t.Fatalf("mediaCheck = %+v, want failed 403", result.MediaCheck)
}
if len(result.Items) != 1 {
t.Fatalf("items = %d, want item still returned for debugging", len(result.Items))
}
}
func TestDryRunRejectsNonJSONStdout(t *testing.T) {
script := writeDryRunScript(t, `echo 'plain text progress output'`)
result := DryRun(context.Background(), DryRunConfig{
PythonPath: "/bin/sh",
ScriptPath: script,
SkipMediaProbe: true,
})
if result.OK {
t.Fatal("ok = true, want false for non-JSON stdout")
}
if !strings.Contains(result.Error, "JSON Lines") {
t.Fatalf("error = %q, want JSON Lines hint", result.Error)
}
}
func TestDryRunTimesOut(t *testing.T) {
script := writeDryRunScript(t, `sleep 30`)
start := time.Now()
result := DryRun(context.Background(), DryRunConfig{
PythonPath: "/bin/sh",
ScriptPath: script,
Timeout: 2 * time.Second,
SkipMediaProbe: true,
})
if result.OK {
t.Fatal("ok = true, want false on timeout")
}
if !strings.Contains(result.Error, "超时") {
t.Fatalf("error = %q, want timeout message", result.Error)
}
if elapsed := time.Since(start); elapsed > 10*time.Second {
t.Fatalf("dry run took %s, script was not killed", elapsed)
}
}
func TestDryRunMissingScript(t *testing.T) {
result := DryRun(context.Background(), DryRunConfig{
PythonPath: "/bin/sh",
ScriptPath: filepath.Join(t.TempDir(), "missing.py"),
})
if result.OK || result.Error == "" {
t.Fatalf("result = %+v, want error for missing script", result)
}
}
@@ -0,0 +1,117 @@
package scriptcrawler
import (
"errors"
"fmt"
"os"
"path/filepath"
"strings"
)
const maxCrawlerNameRunes = 80
type Metadata struct {
Name string `json:"name"`
}
func ReadMetadata(scriptPath string) (Metadata, error) {
scriptPath = strings.TrimSpace(scriptPath)
if scriptPath == "" {
return Metadata{}, errors.New("脚本路径为空")
}
if filepath.Ext(scriptPath) != ".py" {
return Metadata{}, errors.New("目前只支持 .py 爬虫脚本")
}
data, err := os.ReadFile(scriptPath)
if err != nil {
return Metadata{}, fmt.Errorf("读取脚本失败: %w", err)
}
return ExtractMetadata(string(data))
}
func ExtractMetadata(source string) (Metadata, error) {
for _, line := range strings.Split(source, "\n") {
trimmed := strings.TrimSpace(line)
if trimmed == "" || strings.HasPrefix(trimmed, "#") {
continue
}
if !strings.HasPrefix(trimmed, "CRAWLER_NAME") {
continue
}
left, right, ok := strings.Cut(trimmed, "=")
if !ok || strings.TrimSpace(left) != "CRAWLER_NAME" {
continue
}
name, ok := parsePythonStringLiteral(right)
if !ok {
return Metadata{}, errors.New(`CRAWLER_NAME 必须是字符串字面量,例如 CRAWLER_NAME = "示例爬虫"`)
}
name = strings.TrimSpace(name)
if name == "" {
return Metadata{}, errors.New("CRAWLER_NAME 不能为空")
}
if len([]rune(name)) > maxCrawlerNameRunes {
return Metadata{}, fmt.Errorf("CRAWLER_NAME 不能超过 %d 个字符", maxCrawlerNameRunes)
}
return Metadata{Name: name}, nil
}
return Metadata{}, errors.New(`脚本必须声明 CRAWLER_NAME,例如 CRAWLER_NAME = "示例爬虫"`)
}
func parsePythonStringLiteral(raw string) (string, bool) {
s := strings.TrimSpace(raw)
if s == "" {
return "", false
}
rawString := false
for len(s) > 0 {
switch s[0] {
case 'r', 'R':
rawString = true
s = strings.TrimSpace(s[1:])
case 'u', 'U', 'b', 'B':
s = strings.TrimSpace(s[1:])
default:
goto parseQuote
}
}
parseQuote:
if len(s) < 2 || (s[0] != '"' && s[0] != '\'') {
return "", false
}
quote := s[0]
var b strings.Builder
escaped := false
for i := 1; i < len(s); i++ {
ch := s[i]
if escaped {
switch {
case rawString:
b.WriteByte('\\')
b.WriteByte(ch)
case ch == 'n':
b.WriteByte('\n')
case ch == 'r':
b.WriteByte('\r')
case ch == 't':
b.WriteByte('\t')
case ch == '\\' || ch == quote || ch == '"' || ch == '\'':
b.WriteByte(ch)
default:
b.WriteByte(ch)
}
escaped = false
continue
}
if ch == '\\' {
escaped = true
continue
}
if ch == quote {
return b.String(), true
}
b.WriteByte(ch)
}
return "", false
}
@@ -0,0 +1,39 @@
package scriptcrawler
import (
"strings"
"testing"
)
func TestExtractMetadataReadsCrawlerName(t *testing.T) {
meta, err := ExtractMetadata(`
# comment
CRAWLER_NAME = "示例爬虫"
`)
if err != nil {
t.Fatalf("extract metadata: %v", err)
}
if meta.Name != "示例爬虫" {
t.Fatalf("name = %q", meta.Name)
}
}
func TestExtractMetadataRejectsMissingCrawlerName(t *testing.T) {
_, err := ExtractMetadata(`print("hello")`)
if err == nil {
t.Fatal("expected error")
}
if !strings.Contains(err.Error(), "CRAWLER_NAME") {
t.Fatalf("error = %v, want CRAWLER_NAME guidance", err)
}
}
func TestExtractMetadataRejectsEmptyCrawlerName(t *testing.T) {
_, err := ExtractMetadata(`CRAWLER_NAME = " "`)
if err == nil {
t.Fatal("expected error")
}
if !strings.Contains(err.Error(), "不能为空") {
t.Fatalf("error = %v, want empty-name error", err)
}
}
+227 -30
@@ -8,6 +8,7 @@ import (
"fmt"
"io"
"log"
"net"
"net/http"
"net/url"
"os"
@@ -20,6 +21,8 @@ import (
"time"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/mediaasset"
"golang.org/x/net/proxy"
)
// Default author/tag labels, so that spider91-sourced videos can be filtered in the frontend.
@@ -59,8 +62,14 @@ type CrawlerConfig struct {
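// DownloadTimeout limits how long a single video or cover download may take.
DownloadTimeout time.Duration
// OnNewVideo is called after a new video is stored in the catalog; it triggers the teaser worker.
// OnNewVideo is called after a new video is stored in the catalog; it triggers the preview-video worker.
OnNewVideo func(v *catalog.Video)
// OnProgress fires whenever the crawl counters change, so the admin page can show live progress.
OnProgress func(progress CrawlProgress)
// OnCheckedVideo fires when the Python crawler starts checking a video from a listing page.
OnCheckedVideo func()
// OnExtractedVideo fires when the Python crawler extracts a direct link for a new video.
OnExtractedVideo func()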
}
// Crawler wraps the Python crawler's output in the catalog ingestion flow.
@@ -79,29 +88,123 @@ func NewCrawler(cfg CrawlerConfig) *Crawler {
cfg.DownloadTimeout = 30 * time.Minute
}
if cfg.HTTPClient == nil {
// Pick the proxy function: explicit ProxyURL > environment variables > direct connection.
proxyFn := http.ProxyFromEnvironment
if strings.TrimSpace(cfg.ProxyURL) != "" {
if u, err := url.Parse(cfg.ProxyURL); err == nil {
proxyFn = http.ProxyURL(u)
} else {
log.Printf("[spider91] invalid proxy URL %q, falling back to env: %v", cfg.ProxyURL, err)
}
transport := &http.Transport{
Proxy: http.ProxyFromEnvironment,
ResponseHeaderTimeout: 60 * time.Second,
MaxIdleConns: 10,
IdleConnTimeout: 90 * time.Second,
}
if err := configureExplicitProxy(transport, cfg.ProxyURL); err != nil {
log.Printf("[spider91] invalid configured proxy URL, falling back to env: %v", err)
}
cfg.HTTPClient = &http.Client{
// No overall download timeout; ctx controls that. Only dial / handshake / header are bounded.
Timeout: 0,
Transport: &http.Transport{
Proxy: proxyFn,
ResponseHeaderTimeout: 60 * time.Second,
MaxIdleConns: 10,
IdleConnTimeout: 90 * time.Second,
},
Timeout: 0,
Transport: transport,
}
}
return &Crawler{cfg: cfg}
}
func configureExplicitProxy(transport *http.Transport, raw string) error {
proxyURL := strings.TrimSpace(raw)
if proxyURL == "" {
return nil
}
u, err := url.Parse(proxyURL)
if err != nil || u.Scheme == "" || u.Host == "" {
return fmt.Errorf("invalid proxy URL")
}
switch strings.ToLower(u.Scheme) {
case "http", "https":
transport.Proxy = http.ProxyURL(u)
transport.DialContext = nil
return nil
case "socks5", "socks5h":
dialContext, err := socksProxyDialContext(u)
if err != nil {
return err
}
transport.Proxy = nil
transport.DialContext = dialContext
return nil
default:
return fmt.Errorf("unsupported proxy scheme %q", u.Scheme)
}
}
func socksProxyDialContext(proxyURL *url.URL) (func(context.Context, string, string) (net.Conn, error), error) {
var auth *proxy.Auth
if proxyURL.User != nil {
username := proxyURL.User.Username()
password, _ := proxyURL.User.Password()
auth = &proxy.Auth{User: username, Password: password}
}
dialer, err := proxy.SOCKS5("tcp", proxyURL.Host, auth, &net.Dialer{Timeout: 60 * time.Second})
if err != nil {
return nil, err
}
remoteDNS := strings.EqualFold(proxyURL.Scheme, "socks5h")
return func(ctx context.Context, network, addr string) (net.Conn, error) {
target := addr
if !remoteDNS {
resolved, err := resolveSocksTarget(ctx, addr)
if err != nil {
return nil, err
}
target = resolved
}
if ctxDialer, ok := dialer.(proxy.ContextDialer); ok {
return ctxDialer.DialContext(ctx, network, target)
}
type result struct {
conn net.Conn
err error
}
ch := make(chan result, 1)
go func() {
conn, err := dialer.Dial(network, target)
ch <- result{conn: conn, err: err}
}()
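select {
case <-ctx.Done():
// Close the connection if the dial completes after cancellation, so it is not leaked.
go func() {
if res := <-ch; res.conn != nil {
_ = res.conn.Close()
}
}()
return nil, ctx.Err()
case res := <-ch:
return res.conn, res.err
}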
}, nil
}
func resolveSocksTarget(ctx context.Context, addr string) (string, error) {
host, port, err := net.SplitHostPort(addr)
if err != nil || net.ParseIP(host) != nil {
return addr, nil
}
ips, err := net.DefaultResolver.LookupIPAddr(ctx, host)
if err != nil {
return "", err
}
ip := selectSocksTargetIP(ips)
if ip == nil {
return "", fmt.Errorf("resolve %s: no address", host)
}
return net.JoinHostPort(ip.String(), port), nil
}
func selectSocksTargetIP(ips []net.IPAddr) net.IP {
for _, addr := range ips {
if ip4 := addr.IP.To4(); ip4 != nil {
return ip4
}
}
for _, addr := range ips {
if addr.IP != nil {
return addr.IP
}
}
return nil
}
// CrawlResult summarizes the outcome of one RunOnce.
type CrawlResult struct {
// TargetNew is the target number of new videos for this RunOnce (from drive.Credentials.target_new).
@@ -122,6 +225,16 @@ type CrawlResult struct {
SeenFile string
}
// CrawlProgress holds the live counters that are safe to publish while RunOnce is running.
type CrawlProgress struct {
TargetNew int
TotalEntries int
NewVideos int
Skipped int
Failed int
SeenSnapshot int
}
// spiderVideoEntry corresponds to a single video in the JSON output of spider_91porn.py.
type spiderVideoEntry struct {
Title string `json:"title"`
@@ -139,7 +252,7 @@ type spiderVideoEntry struct {
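// 3. On the Go side a bufio.Scanner reads line by line: each line's video and cover are downloaded and stored immediately.
// This overlaps "Python pages ahead to find the next one" with "Go downloads the current one", shortening the whole round;
// more importantly, early downloads do not hold up later ones until their signed links (e=) expire.
// 4. Once everything is consumed and the child process exits, return CrawlResult. Teasers are not enqueued here;
// 4. Once everything is consumed and the child process exits, return CrawlResult. Preview videos are not enqueued here;
// the caller (App.runSpider91Crawl) calls enqueueDriveGeneration once after RunOnce.
//
// targetNew <= 0 is normalized to spider91DefaultTargetNew (15).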
@@ -169,6 +282,20 @@ func (c *Crawler) RunOnce(ctx context.Context, targetNew int) (*CrawlResult, err
result := &CrawlResult{TargetNew: targetNew, StartedAt: time.Now()}
defer func() { result.FinishedAt = time.Now() }()
emitProgress := func() {
if c.cfg.OnProgress == nil {
return
}
c.cfg.OnProgress(CrawlProgress{
TargetNew: result.TargetNew,
TotalEntries: result.TotalEntries,
NewVideos: result.NewVideos,
Skipped: result.Skipped,
Failed: result.Failed,
SeenSnapshot: result.SeenSnapshot,
})
}
emitProgress()
// 1. Prepare the .crawl/ directory and the list of known source video IDs
//
@@ -194,6 +321,7 @@ func (c *Crawler) RunOnce(ctx context.Context, targetNew int) (*CrawlResult, err
return result, fmt.Errorf("spider91 crawler: build seen list: %w", err)
}
result.SeenSnapshot = seenCount
emitProgress()
// 2-3. Start the Python crawler (streaming stdout protocol) and process entries as they are read.
//
@@ -224,9 +352,11 @@ func (c *Crawler) RunOnce(ctx context.Context, targetNew int) (*CrawlResult, err
continue
}
result.TotalEntries++
emitProgress()
sourceID := sourceIDForItem(item)
if sourceID == "" || strings.TrimSpace(item.VideoURL) == "" {
result.Failed++
emitProgress()
continue
}
if result.NewVideos >= targetNew {
@@ -234,16 +364,31 @@ func (c *Crawler) RunOnce(ctx context.Context, targetNew int) (*CrawlResult, err
break
}
videoID := buildVideoID(c.cfg.Driver.ID(), sourceID)
deleted, err := c.cfg.Catalog.IsVideoDeleted(ctx, videoID)
if err != nil {
log.Printf("[spider91] drive=%s viewkey=%s source_id=%s check deleted: %v", c.cfg.Driver.ID(), item.Viewkey, sourceID, err)
result.Failed++
emitProgress()
continue
}
if deleted {
result.Skipped++
emitProgress()
continue
}
if existing, _ := c.cfg.Catalog.GetVideo(ctx, videoID); existing != nil {
result.Skipped++
emitProgress()
continue
}
if perr := c.processOne(ctx, videoID, item); perr != nil {
log.Printf("[spider91] drive=%s viewkey=%s source_id=%s failed: %v", c.cfg.Driver.ID(), item.Viewkey, sourceID, perr)
result.Failed++
emitProgress()
continue
}
result.NewVideos++
emitProgress()
}
if scerr := scanner.Err(); scerr != nil {
log.Printf("[spider91] drive=%s stdout scan: %v", c.cfg.Driver.ID(), scerr)
@@ -324,6 +469,16 @@ func (c *Crawler) startSpiderTargetNew(ctx context.Context, targetNew int, seenP
if c.cfg.WorkDir != "" {
cmd.Dir = c.cfg.WorkDir
}
if proxyURL := strings.TrimSpace(c.cfg.ProxyURL); proxyURL != "" {
cmd.Env = append(os.Environ(),
"HTTP_PROXY="+proxyURL,
"HTTPS_PROXY="+proxyURL,
"http_proxy="+proxyURL,
"https_proxy="+proxyURL,
"NO_PROXY=",
"no_proxy=",
)
}
stdout, err := cmd.StdoutPipe()
if err != nil {
return nil, nil, fmt.Errorf("stdout pipe: %w", err)
@@ -341,12 +496,12 @@ func (c *Crawler) startSpiderTargetNew(ctx context.Context, targetNew int, seenP
return nil, nil, fmt.Errorf("start: %w", err)
}
// Forward stderr to the backend log. When the child process exits, the reader hits EOF and the goroutine ends naturally.
go forwardSpiderLog(c.cfg.Driver.ID(), stderr)
go forwardSpiderLog(c.cfg.Driver.ID(), stderr, c.cfg.OnCheckedVideo, c.cfg.OnExtractedVideo)
return cmd, stdout, nil
}
// forwardSpiderLog forwards Python stderr to the backend log line by line, for debugging.
func forwardSpiderLog(driveID string, r io.Reader) {
func forwardSpiderLog(driveID string, r io.Reader, onCheckedVideo func(), onExtractedVideo func()) {
scanner := bufio.NewScanner(r)
scanner.Buffer(make([]byte, 64*1024), 1024*1024)
for scanner.Scan() {
@@ -355,9 +510,23 @@ func forwardSpiderLog(driveID string, r io.Reader) {
continue
}
log.Printf("[spider91:py] drive=%s %s", driveID, line)
if onCheckedVideo != nil && isSpider91CheckedVideoLogLine(line) {
onCheckedVideo()
}
if onExtractedVideo != nil && isSpider91ExtractedVideoLogLine(line) {
onExtractedVideo()
}
}
}
func isSpider91CheckedVideoLogLine(line string) bool {
return checkedVideoLogRE.MatchString(line)
}
func isSpider91ExtractedVideoLogLine(line string) bool {
return strings.Contains(line, "[OK] 成功提取视频直链")
}
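// processOne handles a single 91 source video: download the video and cover, copy the cover, and store the record.
// If any step fails, temporary files already written are cleaned up so that no partial output remains.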
func (c *Crawler) processOne(ctx context.Context, videoID string, item spiderVideoEntry) error {
@@ -419,7 +588,7 @@ func (c *Crawler) processOne(ctx context.Context, videoID string, item spiderVid
log.Printf("[spider91] drive=%s mkdir common thumbs: %v", c.cfg.Driver.ID(), err)
thumbReady = false
} else {
dst := filepath.Join(c.cfg.CommonThumbDir, videoID+".jpg")
dst := mediaasset.ThumbnailPathInDir(c.cfg.CommonThumbDir, videoID)
if err := copyFileAtomic(thumbPath, dst); err != nil {
log.Printf("[spider91] drive=%s viewkey=%s source_id=%s copy thumb to common dir: %v", c.cfg.Driver.ID(), viewkey, sourceID, err)
thumbReady = false
@@ -427,6 +596,17 @@ func (c *Crawler) processOne(ctx context.Context, videoID string, item spiderVid
}
}
title := strings.TrimSpace(item.Title)
if title == "" {
title = sourceID
}
tags := []string{DefaultTag}
if matched, err := c.cfg.Catalog.MatchTags(ctx, title+" "+DefaultAuthor); err == nil {
tags = mergeCatalogTags(tags, matched)
} else {
log.Printf("[spider91] drive=%s viewkey=%s source_id=%s match tags: %v", c.cfg.Driver.ID(), viewkey, sourceID, err)
}
// Store in the catalog
now := time.Now()
v := &catalog.Video{
@@ -434,9 +614,9 @@ func (c *Crawler) processOne(ctx context.Context, videoID string, item spiderVid
DriveID: c.cfg.Driver.ID(),
FileID: videoFile,
FileName: videoFile,
Title: strings.TrimSpace(item.Title),
Title: title,
Author: DefaultAuthor,
Tags: []string{DefaultTag},
Tags: tags,
Ext: strings.TrimPrefix(videoExt, "."),
Quality: "HD",
Size: videoSize,
@@ -445,9 +625,6 @@ func (c *Crawler) processOne(ctx context.Context, videoID string, item spiderVid
CreatedAt: now,
UpdatedAt: now,
}
if v.Title == "" {
v.Title = sourceID
}
if thumbReady {
// Once ThumbnailURL is set, the thumb worker skips this video
// and no longer tries to extract a frame with ffmpeg (the cover is already the site's original image).
@@ -463,8 +640,7 @@ func (c *Crawler) processOne(ctx context.Context, videoID string, item spiderVid
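// Videos whose site cover failed to download: by design the spider91 drive's thumb worker does not
// handle spider91 videos (the cover should be the site's original image, saved directly), so nothing picks them up.
// Mark them 'failed' explicitly so that CountVideosNeedingThumbnail excludes them (condition: status
// != 'failed'); otherwise enqueueDriveGeneration → waitForThumbnailsBeforePreview
// would see count > 0 and leave the teaser enqueue stuck in its wait loop forever.
// != 'failed'), so that the later cover backfill queue does not keep picking up this video.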
_ = c.cfg.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{
ThumbnailStatus: "failed",
})
@@ -723,9 +899,10 @@ func spider91CookieHeader(cookies []*http.Cookie) string {
}
var (
strencode2RE = regexp.MustCompile(`strencode2\(["']([^"']+)["']\)`)
srcAttrRE = regexp.MustCompile(`src=['"]([^'"]+)['"]`)
mp4URLRE = regexp.MustCompile(`https?://[^\s"'<>]+\.mp4[^\s"'<>]*`)
checkedVideoLogRE = regexp.MustCompile(`处理视频\s+\d+/\d+:`)
strencode2RE = regexp.MustCompile(`strencode2\(["']([^"']+)["']\)`)
srcAttrRE = regexp.MustCompile(`src=['"]([^'"]+)['"]`)
mp4URLRE = regexp.MustCompile(`https?://[^\s"'<>]+\.mp4[^\s"'<>]*`)
)
func parseSpider91VideoURL(html string) string {
@@ -889,6 +1066,26 @@ func copyFileAtomic(src, dst string) error {
return os.Rename(tmp, dst)
}
func mergeCatalogTags(lists ...[]string) []string {
out := []string{}
seen := map[string]bool{}
for _, list := range lists {
for _, tag := range list {
tag = strings.TrimSpace(tag)
if tag == "" {
continue
}
key := strings.ToLower(tag)
if seen[key] {
continue
}
seen[key] = true
out = append(out, tag)
}
}
return out
}
// BuildVideoID builds the catalog videos.id from driveID and the 91 source video ID using the shared rule.
// Same as the scanner: <kind>-<driveID>-<fileID>.
func BuildVideoID(driveID, sourceID string) string {
@@ -3,6 +3,8 @@ package spider91
import (
"context"
"encoding/json"
"io"
"net"
"net/http"
"net/http/httptest"
"net/url"
@@ -53,7 +55,7 @@ func TestCrawlerRunOnceFullFlow(t *testing.T) {
// The --output file is still written as an archive.
videoEntries := []map[string]string{
{
"title": "Video One",
"title": "Video One 口交",
"thumb_url": srv.URL + "/thumb/not-120001.jpg",
"video_url": srv.URL + "/videos/120001.mp4",
"viewkey": "vk-001",
@@ -94,6 +96,9 @@ func TestCrawlerRunOnceFullFlow(t *testing.T) {
}); err != nil {
t.Fatalf("upsert drive: %v", err)
}
if _, err := cat.CreateTagAndClassify(context.Background(), "Video One", nil, "user"); err != nil {
t.Fatalf("create user tag: %v", err)
}
var newVideos []*catalog.Video
c := NewCrawler(CrawlerConfig{
@@ -188,6 +193,17 @@ func TestCrawlerRunOnceFullFlow(t *testing.T) {
if !hasDefaultTag {
t.Fatalf("video %s tags = %v, want contain %q", videoID, v.Tags, DefaultTag)
}
if sourceID == "120001" {
if !containsString(v.Tags, "口交") {
t.Fatalf("video %s tags = %v, want contain built-in tag 口交", videoID, v.Tags)
}
if !containsString(v.Tags, "Video One") {
t.Fatalf("video %s tags = %v, want contain user tag Video One", videoID, v.Tags)
}
}
if sourceID == "120002" && (containsString(v.Tags, "口交") || containsString(v.Tags, "Video One")) {
t.Fatalf("video %s tags = %v, should not inherit tags from other spider91 videos", videoID, v.Tags)
}
}
// 7. Second RunOnce: the source video IDs already exist → all skipped, no new files downloaded
@@ -233,14 +249,116 @@ func TestCrawlerRunOnceMissingScript(t *testing.T) {
}
}
func TestCrawlerPassesProxyToSpiderProcess(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("shell-based fake script only on unix")
}
tmp := t.TempDir()
scriptPath := filepath.Join(tmp, "print_proxy_env.sh")
script := `#!/bin/sh
printf 'HTTP_PROXY=%s\n' "$HTTP_PROXY"
printf 'HTTPS_PROXY=%s\n' "$HTTPS_PROXY"
printf 'http_proxy=%s\n' "$http_proxy"
printf 'https_proxy=%s\n' "$https_proxy"
printf 'NO_PROXY=%s\n' "$NO_PROXY"
printf 'no_proxy=%s\n' "$no_proxy"
`
if err := os.WriteFile(scriptPath, []byte(script), 0o755); err != nil {
t.Fatalf("write script: %v", err)
}
proxyURL := "socks5h://proxy.local:1080"
drv := New(Config{ID: "proxy-drive", RootDir: filepath.Join(tmp, "proxy-drive")})
c := NewCrawler(CrawlerConfig{
Driver: drv,
PythonPath: "sh",
ScriptPath: scriptPath,
ProxyURL: proxyURL,
})
cmd, stdout, err := c.startSpiderTargetNew(
context.Background(),
1,
filepath.Join(tmp, "seen.txt"),
filepath.Join(tmp, "out.json"),
)
if err != nil {
t.Fatalf("startSpiderTargetNew: %v", err)
}
raw, err := io.ReadAll(stdout)
if err != nil {
t.Fatalf("read stdout: %v", err)
}
if err := cmd.Wait(); err != nil {
t.Fatalf("wait: %v", err)
}
want := strings.Join([]string{
"HTTP_PROXY=" + proxyURL,
"HTTPS_PROXY=" + proxyURL,
"http_proxy=" + proxyURL,
"https_proxy=" + proxyURL,
"NO_PROXY=",
"no_proxy=",
}, "\n") + "\n"
if string(raw) != want {
t.Fatalf("proxy env = %q, want %q", string(raw), want)
}
}
func TestConfigureExplicitProxySupportsSocksSchemes(t *testing.T) {
for _, raw := range []string{
"socks5://127.0.0.1:1080",
"socks5h://proxy-user:proxy-pass@127.0.0.1:1080",
} {
t.Run(raw, func(t *testing.T) {
transport := &http.Transport{Proxy: http.ProxyFromEnvironment}
if err := configureExplicitProxy(transport, raw); err != nil {
t.Fatalf("configureExplicitProxy: %v", err)
}
if transport.Proxy != nil {
t.Fatalf("Transport.Proxy should be nil for SOCKS proxy")
}
if transport.DialContext == nil {
t.Fatalf("Transport.DialContext should be set for SOCKS proxy")
}
})
}
transport := &http.Transport{Proxy: http.ProxyFromEnvironment}
if err := configureExplicitProxy(transport, "http://127.0.0.1:7890"); err != nil {
t.Fatalf("configureExplicitProxy http: %v", err)
}
if transport.Proxy == nil {
t.Fatalf("Transport.Proxy should be set for HTTP proxy")
}
if transport.DialContext != nil {
t.Fatalf("Transport.DialContext should not be set for HTTP proxy")
}
if err := configureExplicitProxy(&http.Transport{}, "ftp://127.0.0.1:21"); err == nil {
t.Fatalf("expected unsupported proxy scheme error")
}
}
func TestSelectSocksTargetIPPrefersIPv4(t *testing.T) {
got := selectSocksTargetIP([]net.IPAddr{
{IP: net.ParseIP("2606:4700:20::681a:229")},
{IP: net.ParseIP("104.26.3.41")},
})
if got == nil || got.String() != "104.26.3.41" {
t.Fatalf("selectSocksTargetIP = %v, want IPv4 104.26.3.41", got)
}
}
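// TestCrawlerThumbDownloadFailureMarksStatusFailed verifies that when the site cover download fails,
// the crawler explicitly marks thumbnail_status 'failed', so that enqueueDriveGeneration's
// waitForThumbnailsBeforePreview does not stall the teaser on count > 0.
// the crawler explicitly marks thumbnail_status 'failed', so that the later cover backfill queue
// does not keep picking up this spider91 video.
//
// Historical bug: a failed thumb download used to only log; url stayed '' and status took the schema DEFAULT 'pending'.
// The CountVideosNeedingThumbnail condition is url='' AND status != 'failed' → count=1.
// By design the spider91 drive's thumb worker does not handle spider91 videos → nobody changes status,
// so the teaser stayed stuck at "[preview] waiting for 1 thumbnails before teaser generation".
// By design the spider91 drive's thumb worker does not handle spider91 videos → nobody changes status,
// and the backfill queue would keep treating it as missing a cover.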
func TestCrawlerThumbDownloadFailureMarksStatusFailed(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("shell-based fake script only on unix")
@@ -317,8 +435,7 @@ func TestCrawlerThumbDownloadFailureMarksStatusFailed(t *testing.T) {
// Key assertion: CountVideosNeedingThumbnail should return 0.
// The function's SQL condition is `url = '' AND status != 'failed'`; if the crawler did not mark
// status 'failed' (leaving the schema DEFAULT 'pending'), count would be 1, and the outer
// waitForThumbnailsBeforePreview would stall the teaser on count > 0.
// status 'failed' (leaving the schema DEFAULT 'pending'), count would be 1.
count, err := cat.CountVideosNeedingThumbnail(context.Background(), driveID)
if err != nil {
t.Fatalf("count: %v", err)
@@ -590,6 +707,18 @@ func TestSpider91CookieHeader(t *testing.T) {
}
}
func TestSpider91ProgressLogLineClassifiers(t *testing.T) {
if !isSpider91CheckedVideoLogLine("[2026-06-08 16:49:17] 处理视频 3/24: 标题") {
t.Fatal("checked video log line was not recognized")
}
if isSpider91CheckedVideoLogLine("[2026-06-08 16:49:17] [页 2] 发现 24 个视频") {
t.Fatal("page summary log line should not count as checked video")
}
if !isSpider91ExtractedVideoLogLine("[2026-06-08 16:49:39] [OK] 成功提取视频直链") {
t.Fatal("extracted video log line was not recognized")
}
}
func spider91DetailHTML(videoURL string) string {
fragment := `<video><source src="` + videoURL + `" type="video/mp4"></video>`
return `document.write(strencode2("` + url.PathEscape(fragment) + `"));`
@@ -659,3 +788,12 @@ func buildFakeSpiderScript(entries []map[string]string) string {
sb.WriteString("fi\n")
return sb.String()
}
func containsString(values []string, want string) bool {
for _, value := range values {
if value == want {
return true
}
}
return false
}
+42 -1
@@ -138,7 +138,7 @@ func (d *Driver) Stat(ctx context.Context, fileID string) (*drives.Entry, error)
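// StreamURL returns the local video file path for ffmpeg and upper-layer services.
// Note: proxy.serve cannot handle local paths directly; playback must go through api.handleSpider91Video.
// The teaser/cover workers fall back to local files via localPreviewLink, which happens to accept path-style URLs.
// The preview-video/cover workers fall back to local files via localPreviewLink, which happens to accept path-style URLs.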
func (d *Driver) StreamURL(ctx context.Context, fileID string) (*drives.StreamLink, error) {
path, err := d.VideoPath(fileID)
if err != nil {
@@ -167,6 +167,46 @@ func (d *Driver) EnsureDir(ctx context.Context, pathFromRoot string) (string, er
return "", drives.ErrNotSupported
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
if err := ctx.Err(); err != nil {
return err
}
videoPath, err := d.VideoPath(fileID)
if err != nil {
return err
}
info, err := os.Stat(videoPath)
if err != nil {
if os.IsNotExist(err) {
removeThumbCandidates(d.ThumbPath, strings.TrimSuffix(fileID, filepath.Ext(fileID)))
return nil
}
return err
}
if info.IsDir() {
return errors.New("spider91: refusing to remove directory")
}
if err := os.Remove(videoPath); err != nil && !os.IsNotExist(err) {
return err
}
removeThumbCandidates(d.ThumbPath, strings.TrimSuffix(fileID, filepath.Ext(fileID)))
return nil
}
func removeThumbCandidates(pathFor func(string) (string, error), stem string) {
stem = strings.TrimSpace(stem)
if stem == "" {
return
}
for _, ext := range []string{".jpg", ".jpeg", ".png", ".webp"} {
path, err := pathFor(stem + ext)
if err != nil {
continue
}
_ = os.Remove(path)
}
}
// safeJoin joins fileID under root and guarantees that the final path cannot escape root.
// fileID must be a plain file name (no / or .. components).
func safeJoin(root, fileID string) (string, error) {
@@ -192,3 +232,4 @@ func safeJoin(root, fileID string) (string, error) {
}
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
+352 -10
@@ -2,19 +2,23 @@ package wopan
import (
"context"
"errors"
"fmt"
"io"
"log"
"net/http"
"os"
"path"
"strings"
"sync"
"time"
sdk "github.com/OpenListTeam/wopan-sdk-go"
"github.com/go-resty/resty/v2"
"github.com/video-site/backend/internal/drives"
)
// Driver 封装联通
// Driver 封装联通
type Driver struct {
id string
rootID string
@@ -23,6 +27,14 @@ type Driver struct {
refreshToken string
client *sdk.WoClient
onTokenUpdate func(access, refresh string)
listMu sync.Mutex
lastListAt time.Time
listInterval time.Duration
listCooldown time.Duration
fileIDMu sync.RWMutex
fidToID map[string]string
}
type Config struct {
@@ -47,6 +59,9 @@ func New(c Config) *Driver {
accessToken: c.AccessToken,
refreshToken: c.RefreshToken,
onTokenUpdate: c.OnTokenUpdate,
listInterval: 800 * time.Millisecond,
listCooldown: 5 * time.Minute,
fidToID: make(map[string]string),
}
}
@@ -78,15 +93,41 @@ func (d *Driver) spaceType() string {
}
func (d *Driver) List(ctx context.Context, dirID string) ([]drives.Entry, error) {
d.listMu.Lock()
defer d.listMu.Unlock()
var result []drives.Entry
pageNum := 0
pageSize := 100
for {
data, err := d.client.QueryAllFiles(d.spaceType(), dirID, pageNum, pageSize, 0, d.familyID)
if err != nil {
return nil, fmt.Errorf("wopan list: %w", err)
var data *sdk.QueryAllFilesData
for attempt := 0; ; attempt++ {
if err := d.waitForListSlotLocked(ctx); err != nil {
return nil, err
}
var err error
data, err = d.client.QueryAllFiles(d.spaceType(), dirID, pageNum, pageSize, 0, d.familyID, func(req *resty.Request) {
req.SetContext(ctx)
})
if err == nil {
break
}
err = wopanRequestError("list", err)
wait, ok := drives.RateLimitRetryAfter(err)
if !ok {
return nil, err
}
if wait <= 0 {
wait = d.listCooldown
}
log.Printf("[wopan] list cooling down drive=%s dir=%s page=%d cooldown=%s attempt=%d err=%v",
d.id, dirID, pageNum, wait, attempt+1, err)
if err := sleepContext(ctx, wait); err != nil {
return nil, err
}
}
for _, f := range data.Files {
d.rememberFileID(f)
result = append(result, fileToEntry(f, dirID))
}
if len(data.Files) < pageSize {
@@ -103,9 +144,11 @@ func (d *Driver) Stat(ctx context.Context, fileID string) (*drives.Entry, error)
}
func (d *Driver) StreamURL(ctx context.Context, fileID string) (*drives.StreamLink, error) {
data, err := d.client.GetDownloadUrlV2([]string{fileID})
data, err := d.client.GetDownloadUrlV2([]string{fileID}, func(req *resty.Request) {
req.SetContext(ctx)
})
if err != nil {
return nil, fmt.Errorf("wopan download url: %w", err)
return nil, wopanRequestError("download url", err)
}
if len(data.List) == 0 {
return nil, fmt.Errorf("wopan download url: empty response")
@@ -142,9 +185,151 @@ func (d *Driver) Upload(ctx context.Context, parentID, name string, r io.Reader,
if err != nil {
return "", fmt.Errorf("wopan upload: %w", err)
}
if fid != "" {
if objectID, err := d.findDeleteFileIDInParent(ctx, parentID, drives.SourceFile{
FileID: fid,
Name: name,
Size: size,
}); err == nil {
d.rememberFIDMapping(fid, objectID)
} else {
log.Printf("[wopan] upload drive=%s parent=%s fid=%s resolve object id: %v", d.id, parentID, fid, err)
}
}
return fid, nil
}
func (d *Driver) Rename(ctx context.Context, fileID, newName string) error {
if d.client == nil {
return fmt.Errorf("wopan rename: driver not initialized")
}
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return fmt.Errorf("wopan rename: empty file id")
}
newName = strings.TrimSpace(newName)
if newName == "" {
return fmt.Errorf("wopan rename: empty new name")
}
renameID := fileID
if cached := d.cachedDeleteFileID(fileID); cached != "" {
renameID = cached
}
if err := d.client.RenameFileOrDirectory(d.spaceType(), 1, renameID, newName, d.familyID, func(req *resty.Request) {
req.SetContext(ctx)
}); err != nil {
return wopanRequestError("rename", err)
}
return nil
}
func (d *Driver) Remove(ctx context.Context, fileID string) error {
if d.client == nil {
return fmt.Errorf("wopan remove: driver not initialized")
}
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return fmt.Errorf("wopan remove: empty file id")
}
deleteID := fileID
if cached := d.cachedDeleteFileID(fileID); cached != "" {
deleteID = cached
}
if err := d.deleteFileByObjectID(ctx, deleteID); err != nil {
return fmt.Errorf("wopan remove: %w", err)
}
return nil
}
func (d *Driver) RemoveSource(ctx context.Context, source drives.SourceFile) error {
if d.client == nil {
return fmt.Errorf("wopan remove: driver not initialized")
}
fileID := strings.TrimSpace(source.FileID)
if fileID == "" {
return fmt.Errorf("wopan remove: empty file id")
}
deleteID, err := d.resolveDeleteFileID(ctx, source)
if err != nil {
return err
}
if err := d.deleteFileByObjectID(ctx, deleteID); err != nil {
return fmt.Errorf("wopan remove: %w", err)
}
return nil
}
func (d *Driver) deleteFileByObjectID(ctx context.Context, fileID string) error {
if err := d.client.DeleteFile(d.spaceType(), nil, []string{fileID}, func(req *resty.Request) {
req.SetContext(ctx)
}); err != nil {
return err
}
return nil
}
func (d *Driver) resolveDeleteFileID(ctx context.Context, source drives.SourceFile) (string, error) {
fileID := strings.TrimSpace(source.FileID)
if fileID == "" {
return "", fmt.Errorf("wopan remove: empty file id")
}
if cached := d.cachedDeleteFileID(fileID); cached != "" {
return cached, nil
}
parentID := strings.TrimSpace(source.ParentID)
if parentID == "" {
return fileID, nil
}
return d.findDeleteFileIDInParent(ctx, parentID, source)
}
func (d *Driver) findDeleteFileIDInParent(ctx context.Context, parentID string, source drives.SourceFile) (string, error) {
d.listMu.Lock()
defer d.listMu.Unlock()
pageNum := 0
pageSize := 100
for {
var data *sdk.QueryAllFilesData
for attempt := 0; ; attempt++ {
if err := d.waitForListSlotLocked(ctx); err != nil {
return "", err
}
var err error
data, err = d.client.QueryAllFiles(d.spaceType(), parentID, pageNum, pageSize, 0, d.familyID, func(req *resty.Request) {
req.SetContext(ctx)
})
if err == nil {
break
}
err = wopanRequestError("resolve delete id", err)
wait, ok := drives.RateLimitRetryAfter(err)
if !ok {
return "", err
}
if wait <= 0 {
wait = d.listCooldown
}
log.Printf("[wopan] resolve delete id cooling down drive=%s parent=%s page=%d cooldown=%s attempt=%d err=%v",
d.id, parentID, pageNum, wait, attempt+1, err)
if err := sleepContext(ctx, wait); err != nil {
return "", err
}
}
for _, f := range data.Files {
d.rememberFileID(f)
if id, ok := deleteFileIDFromWopanFile(f, source); ok {
return id, nil
}
}
if len(data.Files) < pageSize {
break
}
pageNum++
}
return "", fmt.Errorf("wopan remove: source file %q not found under parent %q", source.FileID, parentID)
}
func (d *Driver) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
parts := splitPath(pathFromRoot)
currentID := d.rootID
@@ -154,9 +339,11 @@ func (d *Driver) EnsureDir(ctx context.Context, pathFromRoot string) (string, er
return "", err
}
if childID == "" {
-resp, err := d.client.CreateDirectory(d.spaceType(), currentID, name, d.familyID)
+resp, err := d.client.CreateDirectory(d.spaceType(), currentID, name, d.familyID, func(req *resty.Request) {
+req.SetContext(ctx)
+})
if err != nil {
-return "", fmt.Errorf("wopan mkdir %s: %w", name, err)
+return "", wopanRequestError("mkdir "+name, err)
}
childID = resp.Id
}
@@ -190,9 +377,12 @@ func fileToEntry(f *sdk.File, parentID string) drives.Entry {
mod, _ := time.Parse("2006-01-02 15:04:05", f.CreateTime)
name := f.Name
isDir := f.Type == 0
-id := f.Fid
+id := f.Id
+if !isDir && f.Fid != "" {
+id = f.Fid
+}
if id == "" {
-id = f.Id
+id = f.Fid
}
if isDir && !strings.HasSuffix(name, "/") {
// Leave name unchanged; only flag the entry as a directory.
@@ -208,6 +398,156 @@ func fileToEntry(f *sdk.File, parentID string) drives.Entry {
}
}
func (d *Driver) rememberFileID(f *sdk.File) {
if f == nil || f.Type == 0 {
return
}
objectID := strings.TrimSpace(f.Id)
fid := strings.TrimSpace(f.Fid)
if objectID == "" {
return
}
d.fileIDMu.Lock()
if d.fidToID == nil {
d.fidToID = make(map[string]string)
}
d.fidToID[objectID] = objectID
if fid != "" {
d.fidToID[fid] = objectID
}
d.fileIDMu.Unlock()
}
func (d *Driver) rememberFIDMapping(fid, objectID string) {
fid = strings.TrimSpace(fid)
objectID = strings.TrimSpace(objectID)
if fid == "" || objectID == "" {
return
}
d.fileIDMu.Lock()
if d.fidToID == nil {
d.fidToID = make(map[string]string)
}
d.fidToID[fid] = objectID
d.fidToID[objectID] = objectID
d.fileIDMu.Unlock()
}
func (d *Driver) cachedDeleteFileID(fileID string) string {
fileID = strings.TrimSpace(fileID)
if fileID == "" {
return ""
}
d.fileIDMu.RLock()
defer d.fileIDMu.RUnlock()
return strings.TrimSpace(d.fidToID[fileID])
}
func deleteFileIDFromWopanFile(f *sdk.File, source drives.SourceFile) (string, bool) {
if f == nil || f.Type == 0 {
return "", false
}
sourceID := strings.TrimSpace(source.FileID)
if sourceID == "" {
return "", false
}
objectID := strings.TrimSpace(f.Id)
fid := strings.TrimSpace(f.Fid)
if objectID == "" {
return "", false
}
if sourceID != objectID && sourceID != fid {
return "", false
}
return objectID, true
}
func (d *Driver) waitForListSlotLocked(ctx context.Context) error {
if d.listInterval <= 0 || d.lastListAt.IsZero() {
d.lastListAt = time.Now()
return ctx.Err()
}
next := d.lastListAt.Add(d.listInterval)
now := time.Now()
if now.Before(next) {
if err := sleepContext(ctx, next.Sub(now)); err != nil {
return err
}
}
d.lastListAt = time.Now()
return ctx.Err()
}
func sleepContext(ctx context.Context, d time.Duration) error {
if d <= 0 {
return ctx.Err()
}
timer := time.NewTimer(d)
defer timer.Stop()
select {
case <-ctx.Done():
return ctx.Err()
case <-timer.C:
return nil
}
}
func wopanRequestError(step string, err error) error {
if err == nil {
return nil
}
wrapped := fmt.Errorf("wopan %s: %w", step, err)
if isWopanRateLimitError(err) {
return &drives.RateLimitError{
Provider: "wopan",
Err: wrapped,
}
}
return wrapped
}
func isWopanRateLimitError(err error) bool {
if err == nil || errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return false
}
text := strings.ToLower(strings.TrimSpace(err.Error()))
if text == "" {
return false
}
return strings.Contains(text, "status: 429") ||
strings.Contains(text, "status 429") ||
strings.Contains(text, "http status: 429") ||
strings.Contains(text, "status: 500") ||
strings.Contains(text, "status 500") ||
strings.Contains(text, "status: 502") ||
strings.Contains(text, "status 502") ||
strings.Contains(text, "status: 503") ||
strings.Contains(text, "status 503") ||
strings.Contains(text, "status: 504") ||
strings.Contains(text, "status 504") ||
strings.Contains(text, "status: 509") ||
strings.Contains(text, "status 509") ||
strings.Contains(text, "too many request") ||
strings.Contains(text, "too many requests") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "rate-limit") ||
strings.Contains(text, "throttl") ||
strings.Contains(text, "blocked") ||
strings.Contains(text, "request has been blocked") ||
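strings.Contains(text, "操作频繁") || // "operation too frequent"
strings.Contains(text, "请求频繁") || // "requests too frequent"
strings.Contains(text, "请求太频繁") || // "requests far too frequent"
strings.Contains(text, "请求过于频繁") || // "requests excessively frequent"
strings.Contains(text, "频率限制") || // "rate limit"
strings.Contains(text, "请求次数过多") || // "too many requests"
strings.Contains(text, "系统繁忙") || // "system busy"
strings.Contains(text, "服务繁忙") || // "service busy"
strings.Contains(text, "稍后再试") || // "try again later"
strings.Contains(text, "稍后重试") || // "retry later"
strings.Contains(text, "访问被阻断") || // "access blocked"
strings.Contains(text, "风控") // "risk control"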
}
func guessMime(name string) string {
ext := strings.ToLower(path.Ext(name))
switch ext {
@@ -229,3 +569,5 @@ func guessMime(name string) string {
// Compile-time checks that Driver implements the interfaces.
var _ drives.Drive = (*Driver)(nil)
var _ drives.Remover = (*Driver)(nil)
var _ drives.SourceRemover = (*Driver)(nil)
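The rate-limit handling above wraps SDK errors so that callers can detect throttling with `errors.As` and pick a cooldown. The sketch below shows that pattern in isolation. `RateLimitError`, `wrapRequestError`, and `retryDelay` are hypothetical stand-ins for `drives.RateLimitError`, `wopanRequestError`, and `drives.RateLimitRetryAfter`, whose real definitions are not shown in this diff, and the marker list is deliberately shortened.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// RateLimitError is a hypothetical stand-in for drives.RateLimitError.
type RateLimitError struct {
	Provider   string
	RetryAfter time.Duration
	Err        error
}

func (e *RateLimitError) Error() string { return e.Provider + " rate limited: " + e.Err.Error() }
func (e *RateLimitError) Unwrap() error { return e.Err }

// defaultCooldown is an assumed fallback, used when no retry hint is available.
const defaultCooldown = 30 * time.Second

// wrapRequestError adds the step name and, when the message looks like
// throttling, wraps the result in a RateLimitError.
func wrapRequestError(step string, err error) error {
	if err == nil {
		return nil
	}
	wrapped := fmt.Errorf("wopan %s: %w", step, err)
	text := strings.ToLower(err.Error())
	if strings.Contains(text, "status: 429") || strings.Contains(text, "too many request") {
		return &RateLimitError{Provider: "wopan", Err: wrapped}
	}
	return wrapped
}

// retryDelay reports whether err is a rate-limit error and how long to wait.
func retryDelay(err error, fallback time.Duration) (time.Duration, bool) {
	var rl *RateLimitError
	if !errors.As(err, &rl) {
		return 0, false
	}
	if rl.RetryAfter > 0 {
		return rl.RetryAfter, true
	}
	return fallback, true
}

// Sample errors invented for the demonstration.
var (
	errThrottled = errors.New("request failed with status: 429 Too Many Requests")
	errAuth      = errors.New("invalid access token")
)

func main() {
	for _, err := range []error{errThrottled, errAuth} {
		wait, retry := retryDelay(wrapRequestError("list", err), defaultCooldown)
		fmt.Printf("retry=%v wait=%s\n", retry, wait)
	}
}
```

Keeping the original error reachable through `Unwrap` means the step prefix and the SDK message both survive in logs, while the retry loop only needs the typed check.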
@@ -0,0 +1,113 @@
package wopan
import (
"errors"
"testing"
sdk "github.com/OpenListTeam/wopan-sdk-go"
"github.com/video-site/backend/internal/drives"
)
func TestFileToEntryUsesDirectoryIDAndFileFID(t *testing.T) {
dir := fileToEntry(&sdk.File{
Id: "dir-object-id",
Fid: "0",
Type: 0,
Name: "collection",
}, "root")
if !dir.IsDir {
t.Fatal("directory entry IsDir = false")
}
if dir.ID != "dir-object-id" {
t.Fatalf("directory id = %q, want object id", dir.ID)
}
file := fileToEntry(&sdk.File{
Id: "file-object-id",
Fid: "fid/with/slash",
Type: 1,
Name: "clip.mp4",
Size: 123,
}, "dir-object-id")
if file.IsDir {
t.Fatal("file entry IsDir = true")
}
if file.ID != "fid/with/slash" {
t.Fatalf("file id = %q, want fid for download", file.ID)
}
}
func TestDeleteFileIDFromWopanFileUsesObjectIDForFID(t *testing.T) {
got, ok := deleteFileIDFromWopanFile(&sdk.File{
Id: "file-object-id",
Fid: "fid/with/slash",
Type: 1,
Name: "clip.mp4",
Size: 123,
}, drives.SourceFile{
FileID: "fid/with/slash",
Name: "clip.mp4",
Size: 123,
})
if !ok {
t.Fatal("delete file id not resolved")
}
if got != "file-object-id" {
t.Fatalf("delete file id = %q, want object id", got)
}
}
func TestDeleteFileIDFromWopanFileAcceptsObjectID(t *testing.T) {
got, ok := deleteFileIDFromWopanFile(&sdk.File{
Id: "file-object-id",
Fid: "fid-1",
Type: 1,
Name: "clip.mp4",
Size: 123,
}, drives.SourceFile{
FileID: "file-object-id",
Name: "clip.mp4",
Size: 123,
})
if !ok {
t.Fatal("delete file id not resolved")
}
if got != "file-object-id" {
t.Fatalf("delete file id = %q, want object id", got)
}
}
func TestDeleteFileIDFromWopanFileRejectsIDMismatch(t *testing.T) {
if _, ok := deleteFileIDFromWopanFile(&sdk.File{
Id: "file-object-id",
Fid: "fid-1",
Type: 1,
Name: "clip.mp4",
Size: 123,
}, drives.SourceFile{
FileID: "other-fid",
Name: "clip.mp4",
Size: 123,
}); ok {
t.Fatal("delete file id resolved despite id mismatch")
}
}
func TestWopanRequestErrorWrapsRateLimit(t *testing.T) {
err := wopanRequestError("list", errors.New("request failed with status: 429 Too Many Requests"))
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want RateLimitError", err)
}
if rateLimit.Provider != "wopan" {
t.Fatalf("provider = %q, want wopan", rateLimit.Provider)
}
}
func TestWopanRequestErrorLeavesNormalErrors(t *testing.T) {
err := wopanRequestError("download url", errors.New("invalid access token"))
var rateLimit *drives.RateLimitError
if errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want non-rate-limit error", err)
}
}
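The tests above rely on one idea: a file can be addressed by its FID (used for download) or by its object ID (used for delete), and deletion must always resolve to the object ID. A minimal sketch of such a lookup cache follows. The `idCache` type and its method names are illustrative assumptions, not the driver's actual fields.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// idCache maps either identifier of a file (FID or object ID) to its object ID.
// The zero value is ready to use.
type idCache struct {
	mu      sync.RWMutex
	toObjID map[string]string
}

// remember records both identifiers; entries without an object ID are ignored.
func (c *idCache) remember(fid, objectID string) {
	fid, objectID = strings.TrimSpace(fid), strings.TrimSpace(objectID)
	if objectID == "" {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.toObjID == nil {
		c.toObjID = make(map[string]string)
	}
	c.toObjID[objectID] = objectID
	if fid != "" {
		c.toObjID[fid] = objectID
	}
}

// resolve returns the object ID for either identifier, if known.
func (c *idCache) resolve(id string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	objectID, ok := c.toObjID[strings.TrimSpace(id)]
	return objectID, ok
}

func main() {
	var cache idCache
	cache.remember("fid/with/slash", "file-object-id")
	id, ok := cache.resolve("fid/with/slash")
	fmt.Println(id, ok)
	id, ok = cache.resolve("unknown")
	fmt.Println(id, ok)
}
```

Storing the object ID under its own key as well lets callers pass whichever identifier they hold without knowing which kind it is.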
@@ -0,0 +1,349 @@
package wopan
import (
"context"
"encoding/json"
"errors"
"fmt"
"net/http"
"sort"
"strconv"
"strings"
"time"
"github.com/go-resty/resty/v2"
)
const (
defaultQRCodeAPIBase = "https://panservice.mail.wo.cn/wohome/open/v1/QRCode"
defaultQRCodeClient = "1001000021"
)
type QRConfig struct {
APIBaseURL string
HTTPClient *http.Client
Now func() time.Time
}
type QRClient struct {
apiBase string
client *resty.Client
now func() time.Time
}
type QRCodeSession struct {
UUID string `json:"uuid"`
QRImageDataURL string `json:"qrImageDataUrl"`
ExpiresAt string `json:"expiresAt,omitempty"`
}
type QRCodeStatus struct {
State int `json:"state"`
StatusText string `json:"statusText"`
AccessToken string `json:"accessToken,omitempty"`
RefreshToken string `json:"refreshToken,omitempty"`
FamilyID string `json:"familyID,omitempty"`
}
func NewQRClient(c QRConfig) *QRClient {
apiBase := strings.TrimRight(strings.TrimSpace(c.APIBaseURL), "/")
if apiBase == "" {
apiBase = defaultQRCodeAPIBase
}
httpClient := c.HTTPClient
if httpClient == nil {
httpClient = &http.Client{Timeout: 20 * time.Second}
}
now := c.Now
if now == nil {
now = time.Now
}
return &QRClient{
apiBase: apiBase,
client: resty.NewWithClient(httpClient).
SetTimeout(20*time.Second).
SetHeader("Accept", "application/json"),
now: now,
}
}
func (c *QRClient) Generate(ctx context.Context) (QRCodeSession, error) {
var envelope qrEnvelope
res, err := c.request(ctx).
SetResult(&envelope).
Get(c.apiBase + "/generate")
if err != nil {
return QRCodeSession{}, err
}
if res.IsError() {
return QRCodeSession{}, qrAPIError(envelope.message(), res.StatusCode())
}
var result qrGenerateResult
if err := decodeResult(envelope.Result, &result); err != nil {
return QRCodeSession{}, err
}
result.UUID = strings.TrimSpace(result.UUID)
result.Image = strings.TrimSpace(result.Image)
if result.UUID == "" {
return QRCodeSession{}, errors.New("wopan qr: empty uuid")
}
if result.Image == "" {
return QRCodeSession{}, errors.New("wopan qr: empty image")
}
return QRCodeSession{
UUID: result.UUID,
QRImageDataURL: qrImageDataURL(result.Image),
ExpiresAt: c.now().Add(60 * time.Second).Format(time.RFC3339),
}, nil
}
func (c *QRClient) Poll(ctx context.Context, uuid string) (QRCodeStatus, error) {
uuid = strings.TrimSpace(uuid)
if uuid == "" {
return QRCodeStatus{}, errors.New("uuid is required")
}
var envelope qrEnvelope
res, err := c.request(ctx).
SetQueryParam("uuid", uuid).
SetResult(&envelope).
Get(c.apiBase + "/query")
if err != nil {
return QRCodeStatus{}, err
}
if res.IsError() {
return QRCodeStatus{}, qrAPIError(envelope.message(), res.StatusCode())
}
result, err := decodeResultMap(envelope.Result)
if err != nil {
return QRCodeStatus{}, err
}
state := intValue(result["state"])
status := QRCodeStatus{
State: state,
StatusText: qrStateText(state),
}
if state != 3 {
return status, nil
}
status.AccessToken = findStringByKeys(result, "access_token", "accessToken", "token", "tokenValue")
status.RefreshToken = findStringByKeys(result, "refresh_token", "refreshToken")
status.FamilyID = findStringByKeys(result, "family_id", "familyId", "familyID", "defaultFamilyId", "defaultHomeId", "homeId")
if status.AccessToken == "" || status.RefreshToken == "" {
missing := make([]string, 0, 2)
if status.AccessToken == "" {
missing = append(missing, "access_token")
}
if status.RefreshToken == "" {
missing = append(missing, "refresh_token")
}
return QRCodeStatus{}, fmt.Errorf("wopan qr: login succeeded but missing %s; available keys: %s",
strings.Join(missing, ", "), strings.Join(collectJSONKeys(result), ", "))
}
return status, nil
}
func (c *QRClient) request(ctx context.Context) *resty.Request {
return c.client.R().
SetContext(ctx).
SetHeaders(map[string]string{
"client-id": defaultQRCodeClient,
"x-yp-client-id": defaultQRCodeClient,
"Accept": "application/json",
"Accept-Language": "zh-CN,zh;q=0.9",
})
}
type qrEnvelope struct {
Meta qrMeta `json:"meta"`
Result json.RawMessage `json:"result"`
Code any `json:"code,omitempty"`
Message string `json:"message,omitempty"`
Msg string `json:"msg,omitempty"`
}
type qrMeta struct {
Code any `json:"code,omitempty"`
Message string `json:"message,omitempty"`
Msg string `json:"msg,omitempty"`
}
type qrGenerateResult struct {
UUID string `json:"uuid"`
Image string `json:"image"`
}
func (e qrEnvelope) message() string {
for _, s := range []string{e.Message, e.Msg, e.Meta.Message, e.Meta.Msg} {
if strings.TrimSpace(s) != "" {
return strings.TrimSpace(s)
}
}
return ""
}
func decodeResult(raw json.RawMessage, dst any) error {
if len(raw) == 0 || string(raw) == "null" {
return errors.New("wopan qr: empty result")
}
if err := json.Unmarshal(raw, dst); err != nil {
return fmt.Errorf("wopan qr: decode result: %w", err)
}
return nil
}
func decodeResultMap(raw json.RawMessage) (map[string]any, error) {
var result map[string]any
if err := decodeResult(raw, &result); err != nil {
return nil, err
}
if result == nil {
return nil, errors.New("wopan qr: empty result")
}
return result, nil
}
func qrImageDataURL(image string) string {
image = strings.TrimSpace(image)
if strings.HasPrefix(strings.ToLower(image), "data:image/") {
return image
}
return "data:image/png;base64," + image
}
func qrAPIError(message string, httpStatus int) error {
message = strings.TrimSpace(message)
if message == "" {
message = fmt.Sprintf("HTTP %d", httpStatus)
}
return errors.New(message)
}
func qrStateText(state int) string {
switch state {
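case 1:
return "等待扫码" // waiting for the QR code to be scanned
case 2:
return "已扫码,请在联通网盘 App 确认" // scanned; confirm in the Unicom cloud drive app
case 3:
return "登录成功" // login succeeded
case 4:
return "二维码已过期" // QR code expired
default:
return "未知状态" // unknown state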
}
}
func intValue(v any) int {
switch x := v.(type) {
case int:
return x
case int64:
return int(x)
case float64:
return int(x)
case json.Number:
n, _ := x.Int64()
return int(n)
case string:
n, _ := strconv.Atoi(strings.TrimSpace(x))
return n
default:
return 0
}
}
func findStringByKeys(v any, keys ...string) string {
targets := make(map[string]struct{}, len(keys))
for _, key := range keys {
targets[normalizeJSONKey(key)] = struct{}{}
}
return findStringByNormalizedKeys(v, targets)
}
func findStringByNormalizedKeys(v any, targets map[string]struct{}) string {
switch x := v.(type) {
case map[string]any:
for key, value := range x {
if _, ok := targets[normalizeJSONKey(key)]; ok {
if s := stringValue(value); s != "" {
return s
}
}
}
for _, value := range x {
if s := findStringByNormalizedKeys(value, targets); s != "" {
return s
}
}
case []any:
for _, value := range x {
if s := findStringByNormalizedKeys(value, targets); s != "" {
return s
}
}
}
return ""
}
func stringValue(v any) string {
switch x := v.(type) {
case string:
return strings.TrimSpace(x)
case int:
return strconv.Itoa(x)
case int64:
return strconv.FormatInt(x, 10)
case float64:
if x == float64(int64(x)) {
return strconv.FormatInt(int64(x), 10)
}
return strconv.FormatFloat(x, 'f', -1, 64)
case json.Number:
return strings.TrimSpace(x.String())
default:
return ""
}
}
func normalizeJSONKey(key string) string {
key = strings.ToLower(strings.TrimSpace(key))
key = strings.ReplaceAll(key, "_", "")
key = strings.ReplaceAll(key, "-", "")
key = strings.ReplaceAll(key, " ", "")
return key
}
func collectJSONKeys(v any) []string {
seen := map[string]struct{}{}
var walk func(any)
walk = func(value any) {
switch x := value.(type) {
case map[string]any:
for key, child := range x {
if strings.TrimSpace(key) != "" {
seen[key] = struct{}{}
}
walk(child)
}
case []any:
for _, child := range x {
walk(child)
}
}
}
walk(v)
keys := make([]string, 0, len(seen))
for key := range seen {
keys = append(keys, key)
}
sort.Strings(keys)
if len(keys) > 16 {
keys = append(keys[:16], "...")
}
return keys
}
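The token lookup above tolerates different spellings of the same field (`access_token`, `accessToken`, `token`) by normalizing keys before comparing them. The sketch below illustrates that idea on a flat object only. `normalizeKey`, `lookupString`, `parseTokens`, and the sample payload are assumptions made for the example, and the payload is invented rather than captured from the real service.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// normalizeKey lowercases a key and removes "_", "-" and spaces so that
// "refresh_token", "refreshToken" and "Refresh-Token" all compare equal.
func normalizeKey(key string) string {
	key = strings.ToLower(strings.TrimSpace(key))
	return strings.NewReplacer("_", "", "-", "", " ", "").Replace(key)
}

// lookupString returns a non-empty string stored under any alias at the top
// level of obj. If several keys match, which one wins is unspecified because
// Go map iteration order is random.
func lookupString(obj map[string]any, aliases ...string) string {
	want := make(map[string]bool, len(aliases))
	for _, a := range aliases {
		want[normalizeKey(a)] = true
	}
	for key, value := range obj {
		s, ok := value.(string)
		if ok && want[normalizeKey(key)] && strings.TrimSpace(s) != "" {
			return strings.TrimSpace(s)
		}
	}
	return ""
}

// samplePayload is an invented response body used only for this demonstration.
const samplePayload = `{"state": 3, "token": "access-1", "refresh_token": "refresh-1"}`

// parseTokens decodes the payload and extracts the two token fields.
func parseTokens(payload string) (string, string, error) {
	var obj map[string]any
	if err := json.Unmarshal([]byte(payload), &obj); err != nil {
		return "", "", err
	}
	access := lookupString(obj, "access_token", "accessToken", "token")
	refresh := lookupString(obj, "refresh_token", "refreshToken")
	return access, refresh, nil
}

func main() {
	access, refresh, err := parseTokens(samplePayload)
	fmt.Println(access, refresh, err)
}
```

Unlike the code above, this sketch does not recurse into nested objects or arrays, and it only accepts string values.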
@@ -0,0 +1,128 @@
package wopan
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"strings"
"testing"
)
func TestQRCodeGenerateUsesServiceImage(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/QRCode/generate" {
http.NotFound(w, r)
return
}
// Use t.Errorf here: the handler runs outside the test goroutine, where t.Fatalf must not be called.
if r.Header.Get("client-id") != defaultQRCodeClient {
t.Errorf("client-id = %q, want %q", r.Header.Get("client-id"), defaultQRCodeClient)
}
if r.Header.Get("x-yp-client-id") != defaultQRCodeClient {
t.Errorf("x-yp-client-id = %q, want %q", r.Header.Get("x-yp-client-id"), defaultQRCodeClient)
}
_ = json.NewEncoder(w).Encode(map[string]any{
"meta": map[string]string{"code": "0000", "message": "ok"},
"result": map[string]string{
"uuid": "uuid-1",
"image": "iVBORw0KGgo=",
},
})
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{APIBaseURL: api.URL + "/QRCode"}).Generate(context.Background())
if err != nil {
t.Fatalf("Generate() error = %v", err)
}
if got.UUID != "uuid-1" {
t.Fatalf("uuid = %q, want uuid-1", got.UUID)
}
if got.QRImageDataURL != "data:image/png;base64,iVBORw0KGgo=" {
t.Fatalf("qrImageDataUrl = %q, want PNG data URL", got.QRImageDataURL)
}
if got.ExpiresAt == "" {
t.Fatalf("expiresAt is empty")
}
}
func TestQRCodePollPending(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/QRCode/query" {
http.NotFound(w, r)
return
}
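// Use t.Errorf here: the handler runs outside the test goroutine, where t.Fatalf must not be called.
if r.URL.Query().Get("uuid") != "uuid-1" {
t.Errorf("uuid query = %q, want uuid-1", r.URL.Query().Get("uuid"))
}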
_ = json.NewEncoder(w).Encode(map[string]any{
"meta": map[string]string{"code": "0000", "message": "ok"},
"result": map[string]any{
"state": 1,
"token": nil,
"refreshToken": nil,
},
})
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{APIBaseURL: api.URL + "/QRCode"}).Poll(context.Background(), "uuid-1")
if err != nil {
t.Fatalf("Poll() error = %v", err)
}
if got.State != 1 || got.StatusText != "等待扫码" || got.AccessToken != "" || got.RefreshToken != "" {
t.Fatalf("status = %#v, want pending without tokens", got)
}
}
func TestQRCodePollSuccessMapsTokenFields(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
if r.URL.Path != "/QRCode/query" {
http.NotFound(w, r)
return
}
_ = json.NewEncoder(w).Encode(map[string]any{
"meta": map[string]string{"code": "0000", "message": "ok"},
"result": map[string]any{
"state": 3,
"token": "access-1",
"refreshToken": "refresh-1",
},
})
}))
t.Cleanup(api.Close)
got, err := NewQRClient(QRConfig{APIBaseURL: api.URL + "/QRCode"}).Poll(context.Background(), "uuid-1")
if err != nil {
t.Fatalf("Poll() error = %v", err)
}
if got.State != 3 || got.AccessToken != "access-1" || got.RefreshToken != "refresh-1" {
t.Fatalf("status = %#v, want token and refreshToken mapped", got)
}
}
func TestQRCodePollSuccessReportsMissingTokenKeys(t *testing.T) {
api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(map[string]any{
"meta": map[string]string{"code": "0000", "message": "ok"},
"result": map[string]any{
"state": 3,
"user": map[string]string{"name": "demo"},
},
})
}))
t.Cleanup(api.Close)
_, err := NewQRClient(QRConfig{APIBaseURL: api.URL + "/QRCode"}).Poll(context.Background(), "uuid-1")
if err == nil {
t.Fatal("Poll() error is nil, want missing token error")
}
if !strings.Contains(err.Error(), "missing access_token, refresh_token") ||
!strings.Contains(err.Error(), "available keys") {
t.Fatalf("error = %q, want missing token keys", err.Error())
}
}
@@ -0,0 +1,575 @@
package fingerprint
import (
"context"
"crypto/sha256"
"encoding/hex"
"errors"
"fmt"
"io"
"log"
"net/http"
"net/url"
"os"
"strconv"
"strings"
"sync"
"time"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
)
const (
defaultSampleSizeBytes int64 = 512 * 1024
defaultFullHashMaxSize int64 = 8 * 1024 * 1024
defaultCooldown = 5 * time.Minute
defaultWorkerQueueSize = 10000
)
type Config struct {
SampleSizeBytes int64
FullHashMaxSize int64
RateLimitCooldown time.Duration
HTTPClient *http.Client
}
type Worker struct {
Catalog *catalog.Catalog
Drive drives.Drive
Config Config
ch chan *catalog.Video
queue videoQueue
activity taskActivity
cooldown cooldownState
http *http.Client
}
type TaskStatus struct {
State string
CurrentTitle string
QueueLength int
CooldownUntil time.Time
}
func NewWorker(cat *catalog.Catalog, drv drives.Drive, cfg Config) *Worker {
hc := cfg.HTTPClient
if hc == nil {
hc = &http.Client{Timeout: 0}
}
if cfg.SampleSizeBytes <= 0 {
cfg.SampleSizeBytes = defaultSampleSizeBytes
}
if cfg.FullHashMaxSize <= 0 {
cfg.FullHashMaxSize = defaultFullHashMaxSize
}
if cfg.RateLimitCooldown <= 0 {
cfg.RateLimitCooldown = defaultCooldown
}
return &Worker{
Catalog: cat,
Drive: drv,
Config: cfg,
ch: make(chan *catalog.Video, defaultWorkerQueueSize),
http: hc,
}
}
func (w *Worker) Enqueue(v *catalog.Video) bool {
if v == nil {
return false
}
if !w.queue.reserve(v.ID) {
return true
}
select {
case w.ch <- v:
return true
default:
w.queue.release(v.ID)
return false
}
}
func (w *Worker) EnqueueBlocking(ctx context.Context, v *catalog.Video) bool {
if v == nil {
return false
}
if !w.queue.reserve(v.ID) {
return true
}
select {
case w.ch <- v:
return true
case <-ctx.Done():
w.queue.release(v.ID)
return false
}
}
func (w *Worker) Run(ctx context.Context) {
for {
select {
case <-ctx.Done():
return
case v := <-w.ch:
w.processQueued(ctx, v)
select {
case <-ctx.Done():
return
case <-time.After(500 * time.Millisecond):
}
}
}
}
func (w *Worker) Status() TaskStatus {
if w == nil {
return TaskStatus{State: "idle"}
}
currentID, currentTitle := w.activity.current()
status := TaskStatus{
State: "idle",
CurrentTitle: currentTitle,
QueueLength: w.queue.lengthExcluding(currentID),
}
if until, ok := w.cooldown.active(time.Now()); ok {
status.State = "cooling"
status.CooldownUntil = until
return status
}
if currentID != "" {
status.State = "generating"
return status
}
if status.QueueLength > 0 {
status.State = "queued"
}
return status
}
// WaitIdle blocks until the fingerprint queue is empty and no item is being processed.
func (w *Worker) WaitIdle(ctx context.Context) error {
if w == nil {
return nil
}
if w.queue.lengthExcluding("") == 0 {
return nil
}
ticker := time.NewTicker(200 * time.Millisecond)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return ctx.Err()
case <-ticker.C:
if w.queue.lengthExcluding("") == 0 {
return nil
}
}
}
}
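func (w *Worker) processQueued(ctx context.Context, v *catalog.Video) {
// Check v before the defer: deferred call arguments are evaluated immediately,
// so v.ID would panic on a nil video.
if v == nil {
return
}
defer w.queue.release(v.ID)
if w.Catalog == nil || w.Drive == nil || v.ID == "" {
return
}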
current, err := w.Catalog.GetVideo(ctx, v.ID)
if err != nil {
return
}
if current.SampledSHA256 != "" || current.FingerprintStatus == "ready" || current.Hidden {
return
}
w.activity.start(current)
defer w.activity.done()
sum, err := Compute(ctx, w.Drive, current, w.Config, w.http)
if err != nil {
var rl *drives.RateLimitError
if errors.As(err, &rl) {
wait := rl.RetryAfter
if wait <= 0 {
wait = w.Config.RateLimitCooldown
}
until := time.Now().Add(wait)
w.cooldown.set(until)
log.Printf("[fingerprint] drive=%s rate limited; keep video=%s pending and cool down for %s: %v", w.Drive.ID(), current.ID, wait, err)
sleepContext(ctx, wait)
w.cooldown.clear(until)
return
}
log.Printf("[fingerprint] video=%s failed: %v", current.ID, err)
_ = w.Catalog.UpdateVideoFingerprint(ctx, current.ID, "", "failed", err.Error())
return
}
if err := w.Catalog.UpdateVideoFingerprint(ctx, current.ID, sum, "ready", ""); err != nil {
log.Printf("[fingerprint] update video=%s: %v", current.ID, err)
return
}
log.Printf("[fingerprint] video=%s ready sampled_sha256=%s", current.ID, sum)
}
func Compute(ctx context.Context, drv drives.Drive, v *catalog.Video, cfg Config, hc *http.Client) (string, error) {
if drv == nil {
return "", errors.New("fingerprint: nil drive")
}
if v == nil {
return "", errors.New("fingerprint: nil video")
}
if v.Size <= 0 {
return "", errors.New("fingerprint: video size is empty")
}
if cfg.SampleSizeBytes <= 0 {
cfg.SampleSizeBytes = defaultSampleSizeBytes
}
if cfg.FullHashMaxSize <= 0 {
cfg.FullHashMaxSize = defaultFullHashMaxSize
}
if hc == nil {
hc = &http.Client{Timeout: 0}
}
link, err := drv.StreamURL(ctx, v.FileID)
if err != nil {
return "", fmt.Errorf("fingerprint: stream url: %w", err)
}
if link == nil || strings.TrimSpace(link.URL) == "" {
return "", errors.New("fingerprint: empty stream url")
}
ranges := sampleRanges(v.Size, cfg.SampleSizeBytes, cfg.FullHashMaxSize)
h := sha256.New()
writeHashHeader(h, v.Size, ranges)
for _, r := range ranges {
data, err := readRange(ctx, hc, link, r)
if err != nil {
return "", err
}
if int64(len(data)) != r.length {
return "", fmt.Errorf("fingerprint: short sample at %d: got %d want %d", r.start, len(data), r.length)
}
_, _ = h.Write([]byte(fmt.Sprintf("offset=%d length=%d\n", r.start, r.length)))
_, _ = h.Write(data)
_, _ = h.Write([]byte("\n"))
}
return hex.EncodeToString(h.Sum(nil)), nil
}
type byteRange struct {
start int64
length int64
}
func sampleRanges(size, sampleSize, fullHashMax int64) []byteRange {
if size <= fullHashMax {
return []byteRange{{start: 0, length: size}}
}
if sampleSize > size {
sampleSize = size
}
maxStart := size - sampleSize
percents := []int64{0, 20, 40, 60, 80}
out := make([]byteRange, 0, len(percents))
seen := make(map[int64]struct{}, len(percents))
for _, pct := range percents {
start := maxStart * pct / 100
if _, ok := seen[start]; ok {
continue
}
seen[start] = struct{}{}
out = append(out, byteRange{start: start, length: sampleSize})
}
return out
}
func writeHashHeader(w io.Writer, size int64, ranges []byteRange) {
_, _ = fmt.Fprintf(w, "video-site-sampled-sha256-v1\nsize=%d\nsamples=%d\n", size, len(ranges))
}
func readRange(ctx context.Context, hc *http.Client, link *drives.StreamLink, r byteRange) ([]byte, error) {
u, err := url.Parse(link.URL)
if err == nil && (u.Scheme == "http" || u.Scheme == "https") {
return readHTTPRange(ctx, hc, link, r)
}
path := link.URL
if err == nil && u.Scheme == "file" {
path = u.Path
}
return readLocalRange(path, r)
}
func readLocalRange(path string, r byteRange) ([]byte, error) {
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("fingerprint: open local stream: %w", err)
}
defer f.Close()
buf := make([]byte, r.length)
n, err := f.ReadAt(buf, r.start)
if err != nil && !errors.Is(err, io.EOF) {
return nil, fmt.Errorf("fingerprint: read local sample: %w", err)
}
if int64(n) != r.length {
return nil, fmt.Errorf("fingerprint: read local sample at %d: got %d want %d", r.start, n, r.length)
}
return buf, nil
}
func readHTTPRange(ctx context.Context, hc *http.Client, link *drives.StreamLink, r byteRange) ([]byte, error) {
end := r.start + r.length - 1
req, err := http.NewRequestWithContext(ctx, http.MethodGet, link.URL, nil)
if err != nil {
return nil, err
}
for k, vs := range link.Headers {
for _, v := range vs {
req.Header.Add(k, v)
}
}
req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", r.start, end))
resp, err := hc.Do(req)
if err != nil {
return nil, fmt.Errorf("fingerprint: read remote sample: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusTooManyRequests {
return nil, &drives.RateLimitError{
Provider: "fingerprint",
RetryAfter: parseRetryAfter(resp.Header.Get("Retry-After")),
Err: fmt.Errorf("remote sample rate limited: status=%d", resp.StatusCode),
}
}
if resp.StatusCode != http.StatusPartialContent {
if resp.StatusCode == http.StatusOK && r.start == 0 {
data, err := io.ReadAll(io.LimitReader(resp.Body, r.length+1))
if err != nil {
return nil, err
}
if int64(len(data)) == r.length {
return data, nil
}
}
body, _ := io.ReadAll(io.LimitReader(resp.Body, 64*1024))
if remoteRangeResponseLooksRateLimited(link.URL, resp.StatusCode, body) {
return nil, &drives.RateLimitError{
Provider: "fingerprint",
RetryAfter: parseRetryAfter(resp.Header.Get("Retry-After")),
Err: fmt.Errorf("remote sample rate limited: status=%d body=%s", resp.StatusCode, strings.TrimSpace(string(body))),
}
}
return nil, fmt.Errorf("fingerprint: range request got status=%d for bytes=%d-%d", resp.StatusCode, r.start, end)
}
return io.ReadAll(io.LimitReader(resp.Body, r.length))
}
func remoteRangeResponseLooksRateLimited(rawURL string, status int, body []byte) bool {
if status == http.StatusTooManyRequests {
return true
}
if isWopanMediaURL(rawURL) && (status == http.StatusForbidden || status == http.StatusTooManyRequests ||
status == http.StatusInternalServerError || status == http.StatusBadGateway ||
status == http.StatusServiceUnavailable || status == http.StatusGatewayTimeout ||
status == 509) {
return true
}
text := strings.ToLower(strings.TrimSpace(string(body)))
compact := compactRemoteRangeErrorText(text)
if strings.Contains(text, "too many request") ||
strings.Contains(text, "too many requests") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "quota exceeded") ||
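strings.Contains(text, "操作频繁") || // "operation too frequent"
strings.Contains(text, "请求频繁") || // "requests too frequent"
strings.Contains(text, "请求太频繁") || // "requests far too frequent"
strings.Contains(text, "请求过于频繁") || // "requests excessively frequent"
strings.Contains(text, "频率限制") || // "rate limit"
strings.Contains(text, "请求次数过多") || // "too many requests"
strings.Contains(text, "系统繁忙") || // "system busy"
strings.Contains(text, "服务繁忙") || // "service busy"
strings.Contains(text, "稍后再试") || // "try again later"
strings.Contains(text, "稍后重试") || // "retry later"
strings.Contains(text, "访问被阻断") || // "access blocked"
strings.Contains(text, "风控") || // "risk control"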
strings.Contains(text, "download quota") ||
strings.Contains(text, "sharing rate") ||
strings.Contains(text, "daily limit") ||
strings.Contains(text, "user rate") ||
strings.Contains(text, "usage limit") ||
strings.Contains(compact, "ratelimitexceeded") ||
strings.Contains(compact, "userratelimitexceeded") ||
strings.Contains(compact, "dailylimitexceeded") ||
strings.Contains(compact, "downloadquotaexceeded") ||
strings.Contains(compact, "sharingratelimitexceeded") ||
strings.Contains(compact, "quotaexceeded") ||
strings.Contains(compact, "toomanyrequests") ||
strings.Contains(compact, "usagelimits") {
return true
}
if status == http.StatusForbidden && isGoogleDriveMediaURL(rawURL) {
return true
}
return false
}
func isWopanMediaURL(rawURL string) bool {
u, err := url.Parse(rawURL)
if err != nil {
return false
}
host := strings.ToLower(u.Hostname())
path := strings.ToLower(u.Path)
return (strings.HasSuffix(host, "pan.wo.cn") ||
strings.HasSuffix(host, "smartont.net") ||
strings.Contains(host, "wo.cn")) &&
strings.Contains(path, "/openapi/download")
}
func isGoogleDriveMediaURL(rawURL string) bool {
u, err := url.Parse(rawURL)
if err != nil {
return false
}
host := strings.ToLower(u.Host)
path := strings.ToLower(u.Path)
return strings.Contains(host, "googleapis.com") && strings.Contains(path, "/drive/")
}
func compactRemoteRangeErrorText(text string) string {
replacer := strings.NewReplacer("_", "", "-", "", " ", "", ".", "", ":", "")
return replacer.Replace(strings.ToLower(strings.TrimSpace(text)))
}
func parseRetryAfter(raw string) time.Duration {
raw = strings.TrimSpace(raw)
if raw == "" {
return 0
}
if seconds, err := strconv.Atoi(raw); err == nil && seconds > 0 {
return time.Duration(seconds) * time.Second
}
if when, err := http.ParseTime(raw); err == nil {
d := time.Until(when)
if d > 0 {
return d
}
}
return 0
}
func sleepContext(ctx context.Context, d time.Duration) bool {
if d <= 0 {
return true
}
timer := time.NewTimer(d)
defer timer.Stop()
select {
case <-ctx.Done():
return false
case <-timer.C:
return true
}
}
type taskActivity struct {
mu sync.Mutex
currentID string
currentTitle string
}
func (a *taskActivity) start(v *catalog.Video) {
a.mu.Lock()
defer a.mu.Unlock()
if v == nil {
a.currentID = ""
a.currentTitle = ""
return
}
a.currentID = v.ID
a.currentTitle = v.Title
}
func (a *taskActivity) done() {
a.mu.Lock()
a.currentID = ""
a.currentTitle = ""
a.mu.Unlock()
}
func (a *taskActivity) current() (string, string) {
a.mu.Lock()
defer a.mu.Unlock()
return a.currentID, a.currentTitle
}
type cooldownState struct {
mu sync.Mutex
until time.Time
}
func (s *cooldownState) set(until time.Time) {
s.mu.Lock()
s.until = until
s.mu.Unlock()
}
func (s *cooldownState) clear(until time.Time) {
s.mu.Lock()
if s.until.Equal(until) {
s.until = time.Time{}
}
s.mu.Unlock()
}
func (s *cooldownState) active(now time.Time) (time.Time, bool) {
s.mu.Lock()
defer s.mu.Unlock()
if s.until.IsZero() || !s.until.After(now) {
return time.Time{}, false
}
return s.until, true
}
type videoQueue struct {
mu sync.Mutex
ids map[string]struct{}
}
func (q *videoQueue) reserve(id string) bool {
if id == "" {
return true
}
q.mu.Lock()
defer q.mu.Unlock()
if q.ids == nil {
q.ids = make(map[string]struct{})
}
if _, ok := q.ids[id]; ok {
return false
}
q.ids[id] = struct{}{}
return true
}
func (q *videoQueue) release(id string) {
if id == "" {
return
}
q.mu.Lock()
delete(q.ids, id)
q.mu.Unlock()
}
func (q *videoQueue) lengthExcluding(currentID string) int {
q.mu.Lock()
defer q.mu.Unlock()
n := len(q.ids)
if currentID != "" {
if _, ok := q.ids[currentID]; ok {
n--
}
}
if n < 0 {
return 0
}
return n
}
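Note on parseRetryAfter: the header is tried as delta-seconds first and as an HTTP-date second, and anything unparseable or in the past yields 0. The sketch below reproduces that order in a standalone program. The `retryAfter` and `demo` names and the injected `now` parameter are assumptions made here for determinism; the code in the diff calls time.Until directly.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// retryAfter follows the same two-step parse as parseRetryAfter:
// delta-seconds first, then an HTTP-date. Taking now as a parameter is an
// assumption of this sketch; the original uses time.Until.
func retryAfter(raw string, now time.Time) time.Duration {
	raw = strings.TrimSpace(raw)
	if secs, err := strconv.Atoi(raw); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	if when, err := http.ParseTime(raw); err == nil && when.After(now) {
		return when.Sub(now)
	}
	return 0
}

// demo parses three sample header values against a fixed clock.
func demo() []string {
	now := time.Date(2026, 6, 12, 12, 0, 0, 0, time.UTC)
	inputs := []string{
		"60", // delta-seconds form
		now.Add(90 * time.Second).Format(http.TimeFormat), // HTTP-date form
		"soon", // neither form: treated as "no hint"
	}
	out := make([]string, 0, len(inputs))
	for _, in := range inputs {
		out = append(out, retryAfter(in, now).String())
	}
	return out
}

func main() {
	fmt.Println(demo()) // [1m0s 1m30s 0s]
}
```

A zero result means "no usable hint", so a caller would normally substitute its own default cooldown before sleeping.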
+158
@@ -0,0 +1,158 @@
package fingerprint
import (
"context"
"errors"
"fmt"
"io"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"time"
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
)
func TestComputeLocalFilesWithSameContentMatch(t *testing.T) {
ctx := context.Background()
dir := t.TempDir()
body := []byte("same video bytes")
a := filepath.Join(dir, "a.mp4")
b := filepath.Join(dir, "b.mp4")
if err := os.WriteFile(a, body, 0o644); err != nil {
t.Fatalf("write a: %v", err)
}
if err := os.WriteFile(b, body, 0o644); err != nil {
t.Fatalf("write b: %v", err)
}
sumA, err := Compute(ctx, &fakeDrive{paths: map[string]string{"a": a}}, &catalog.Video{ID: "a", FileID: "a", Size: int64(len(body))}, Config{}, nil)
if err != nil {
t.Fatalf("compute a: %v", err)
}
sumB, err := Compute(ctx, &fakeDrive{paths: map[string]string{"b": b}}, &catalog.Video{ID: "b", FileID: "b", Size: int64(len(body))}, Config{}, nil)
if err != nil {
t.Fatalf("compute b: %v", err)
}
if sumA == "" || sumA != sumB {
t.Fatalf("fingerprints = %q / %q, want same non-empty", sumA, sumB)
}
}
func TestComputeRemoteUsesRangeSamples(t *testing.T) {
ctx := context.Background()
data := make([]byte, 10*1024*1024)
for i := range data {
data[i] = byte(i % 251)
}
var ranges []string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
rawRange := r.Header.Get("Range")
ranges = append(ranges, rawRange)
var start, end int
if _, err := fmt.Sscanf(rawRange, "bytes=%d-%d", &start, &end); err != nil {
t.Fatalf("bad range %q: %v", rawRange, err)
}
w.Header().Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", start, end, len(data)))
w.WriteHeader(http.StatusPartialContent)
_, _ = w.Write(data[start : end+1])
}))
defer srv.Close()
drv := &fakeDrive{paths: map[string]string{"remote": srv.URL + "/video.mp4"}}
sum, err := Compute(ctx, drv, &catalog.Video{ID: "remote", FileID: "remote", Size: int64(len(data))}, Config{
SampleSizeBytes: 4,
FullHashMaxSize: 8,
HTTPClient: srv.Client(),
}, srv.Client())
if err != nil {
t.Fatalf("compute remote: %v", err)
}
if sum == "" {
t.Fatal("fingerprint should not be empty")
}
want := []string{
"bytes=0-3",
"bytes=2097151-2097154",
"bytes=4194302-4194305",
"bytes=6291453-6291456",
"bytes=8388604-8388607",
}
if fmt.Sprint(ranges) != fmt.Sprint(want) {
t.Fatalf("ranges = %#v, want %#v", ranges, want)
}
}
func TestComputeRemoteGoogleQuotaExceededReturnsRateLimit(t *testing.T) {
ctx := context.Background()
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Retry-After", "60")
w.WriteHeader(http.StatusForbidden)
_, _ = w.Write([]byte(`{"error":{"code":403,"message":"The download quota for this file has been exceeded.","errors":[{"domain":"usageLimits","reason":"downloadQuotaExceeded","message":"The download quota for this file has been exceeded."}]}}`))
}))
defer srv.Close()
drv := &fakeDrive{paths: map[string]string{"remote": srv.URL + "/drive/v3/files/file-1?alt=media"}}
_, err := Compute(ctx, drv, &catalog.Video{ID: "remote", FileID: "remote", Size: 1024 * 1024}, Config{
SampleSizeBytes: 4,
FullHashMaxSize: 8,
HTTPClient: srv.Client(),
}, srv.Client())
if err == nil {
t.Fatal("compute succeeded, want rate limit")
}
var rateLimit *drives.RateLimitError
if !errors.As(err, &rateLimit) {
t.Fatalf("error = %T %[1]v, want RateLimitError", err)
}
if rateLimit.RetryAfter != time.Minute {
t.Fatalf("retry after = %s, want 1m", rateLimit.RetryAfter)
}
}
func TestWopanRemoteRangeErrorsLookRateLimited(t *testing.T) {
for _, tc := range []struct {
rawURL string
status int
}{
{rawURL: "https://gxdownload.pan.wo.cn:8445/openapi/download?fid=encoded", status: http.StatusForbidden},
{rawURL: "https://du.smartont.net:8445/openapi/download?fid=encoded", status: http.StatusServiceUnavailable},
{rawURL: "https://du.smartont.net:8445/openapi/download?fid=encoded", status: 509},
} {
if !remoteRangeResponseLooksRateLimited(tc.rawURL, tc.status, nil) {
t.Fatalf("remoteRangeResponseLooksRateLimited(%q, %d) = false, want true", tc.rawURL, tc.status)
}
}
if remoteRangeResponseLooksRateLimited("https://example.com/video.mp4", http.StatusForbidden, nil) {
t.Fatal("generic 403 should not be treated as wopan rate limit")
}
}
type fakeDrive struct {
paths map[string]string
}
func (d *fakeDrive) Kind() string { return "fake" }
func (d *fakeDrive) ID() string { return "fake" }
func (d *fakeDrive) Init(context.Context) error {
return nil
}
func (d *fakeDrive) List(context.Context, string) ([]drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *fakeDrive) Stat(context.Context, string) (*drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *fakeDrive) StreamURL(_ context.Context, fileID string) (*drives.StreamLink, error) {
return &drives.StreamLink{URL: d.paths[fileID], Expires: time.Now().Add(time.Minute)}, nil
}
func (d *fakeDrive) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *fakeDrive) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *fakeDrive) RootID() string { return "root" }
+69
@@ -0,0 +1,69 @@
package mediaasset
import (
"crypto/sha256"
"encoding/hex"
"path/filepath"
"strings"
)
const maxPlainStemBytes = 180
const maxLegacyFilenameBytes = 255
func PreviewPath(localDir, videoID string) string {
return filepath.Join(localDir, PreviewFilename(videoID))
}
func ThumbnailPath(localDir, videoID string) string {
return ThumbnailPathInDir(filepath.Join(localDir, "thumbs"), videoID)
}
func ThumbnailPathInDir(thumbDir, videoID string) string {
return filepath.Join(thumbDir, ThumbnailFilename(videoID))
}
func PreviewPathCandidates(localDir, videoID string) []string {
return pathCandidates(localDir, videoID, ".mp4", "")
}
func ThumbnailPathCandidates(localDir, videoID string) []string {
return pathCandidates(localDir, videoID, ".jpg", "thumbs")
}
func PreviewFilename(videoID string) string {
return safeFilename(videoID, ".mp4")
}
func ThumbnailFilename(videoID string) string {
return safeFilename(videoID, ".jpg")
}
func pathCandidates(localDir, videoID, ext, subdir string) []string {
safe := safeFilename(videoID, ext)
legacy := videoID + ext
base := localDir
if subdir != "" {
base = filepath.Join(base, subdir)
}
out := []string{filepath.Join(base, safe)}
if legacy != safe && isPlainSafeStem(videoID) && len([]byte(legacy)) <= maxLegacyFilenameBytes {
out = append(out, filepath.Join(base, legacy))
}
return out
}
func safeFilename(videoID, ext string) string {
if isPlainSafeStem(videoID) && len([]byte(videoID))+len(ext) <= maxPlainStemBytes {
return videoID + ext
}
sum := sha256.Sum256([]byte(videoID))
return "v-" + hex.EncodeToString(sum[:]) + ext
}
func isPlainSafeStem(value string) bool {
value = strings.TrimSpace(value)
if value == "" || value == "." || value == ".." {
return false
}
return !strings.ContainsAny(value, `/\`+"\x00")
}
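Note on the candidate lists: PreviewPathCandidates and ThumbnailPathCandidates return the safe (possibly hashed) path first and the legacy `videoID + ext` path second. One way a reader might consume such a list is to probe the candidates in order and take the first file that exists. `firstExisting` and `demo` below are hypothetical helpers written for this sketch, not functions from the diff, and the file names are stand-ins.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// firstExisting is a hypothetical helper (not part of the diff): it returns
// the first candidate path that exists as a regular file.
func firstExisting(candidates []string) (string, bool) {
	for _, p := range candidates {
		info, err := os.Stat(p)
		if err == nil && info.Mode().IsRegular() {
			return p, true
		}
	}
	return "", false
}

// demo creates only the legacy file and resolves it through the list.
func demo() (string, bool) {
	dir, err := os.MkdirTemp("", "thumbs")
	if err != nil {
		return "", false
	}
	defer os.RemoveAll(dir)

	legacy := filepath.Join(dir, "legacy-id.jpg")
	if err := os.WriteFile(legacy, []byte("jpg"), 0o644); err != nil {
		return "", false
	}
	// Stand-ins for a candidate list: hashed name first, legacy name second.
	candidates := []string{filepath.Join(dir, "v-0123abcd.jpg"), legacy}
	path, ok := firstExisting(candidates)
	return filepath.Base(path), ok
}

func main() {
	name, ok := demo()
	fmt.Println(name, ok) // legacy-id.jpg true
}
```

Ordering the safe name first means newly generated assets win, while assets written under the old naming scheme are still found.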
+56
@@ -0,0 +1,56 @@
package mediaasset
import (
"path/filepath"
"strings"
"testing"
)
func TestFilenamesKeepShortSafeIDs(t *testing.T) {
if got := ThumbnailFilename("video-1"); got != "video-1.jpg" {
t.Fatalf("thumbnail filename = %q, want video-1.jpg", got)
}
if got := PreviewFilename("video-1"); got != "video-1.mp4" {
t.Fatalf("preview filename = %q, want video-1.mp4", got)
}
}
func TestFilenamesHashLongOrUnsafeIDs(t *testing.T) {
longID := "localstorage-" + strings.Repeat("x", 240)
got := ThumbnailFilename(longID)
if !strings.HasPrefix(got, "v-") || !strings.HasSuffix(got, ".jpg") {
t.Fatalf("thumbnail filename = %q, want hashed jpg", got)
}
if len([]byte(got)) >= len([]byte(longID+".jpg")) {
t.Fatalf("thumbnail filename = %q should be shorter than original id", got)
}
unsafe := ThumbnailFilename("dir/video")
if unsafe == "dir/video.jpg" || strings.ContainsAny(unsafe, `/\`) {
t.Fatalf("unsafe thumbnail filename = %q, want hashed single filename", unsafe)
}
}
func TestThumbnailPathCandidatesIncludeLegacyForHashedIDs(t *testing.T) {
localDir := t.TempDir()
mediumID := "localstorage-" + strings.Repeat("x", 190)
got := ThumbnailPathCandidates(localDir, mediumID)
if len(got) != 2 {
t.Fatalf("candidates = %#v, want hashed and legacy paths", got)
}
if got[0] != ThumbnailPath(localDir, mediumID) {
t.Fatalf("first candidate = %q, want safe path %q", got[0], ThumbnailPath(localDir, mediumID))
}
if filepath.Base(got[1]) != mediumID+".jpg" {
t.Fatalf("legacy candidate = %q, want original id jpg", got[1])
}
}
func TestThumbnailPathCandidatesSkipOverlongLegacy(t *testing.T) {
localDir := t.TempDir()
longID := "localstorage-" + strings.Repeat("x", 240)
got := ThumbnailPathCandidates(localDir, longID)
if len(got) != 1 {
t.Fatalf("candidates = %#v, want only hashed path for overlong id", got)
}
}
+145 -12
@@ -5,13 +5,15 @@
// "扫描所有网盘", i.e. "Scan all drives"):
//
// Phase 1: for each non-spider91 cloud drive
// scan + delete-detection + enqueue thumb + enqueue teaser
// wait until all thumb / teaser queues are idle
// scan + delete-detection + enqueue thumb + enqueue preview video
// wait until all thumb / preview-video queues are idle
// Phase 2: if any spider91 drive configured
// crawl + enqueue teaser for new videos
// wait until teaser queues are idle
// crawl + enqueue preview video for new videos
// wait until preview-video queues are idle
// Phase 3: spider91 → cloud migration (single sweep, captcha cooldown still
// honored within this call)
// Phase 4: cleanup duplicate local preview/thumbnail assets after sampled
// fingerprints have identified canonical videos
//
// A 6h soft deadline guards each pipeline run; phases check deadline at their
// boundaries and exit cleanly if exceeded (no in-flight ffmpeg / upload is
@@ -74,10 +76,10 @@ type Config struct {
ListSpider91Drives func(ctx context.Context) []string
// RunSpider91Crawl synchronously runs one crawl cycle (downloads + thumbs +
// teaser enqueue) for a single spider91 drive.
// preview-video enqueue) for a single spider91 drive.
RunSpider91Crawl func(ctx context.Context, driveID string)
// WaitPreviewQueuesIdle blocks until both the thumbnail and teaser queues
// WaitPreviewQueuesIdle blocks until both the thumbnail and preview-video queues
// across all drives are drained (queue empty + no in-flight task). It must
// honor ctx cancellation.
WaitPreviewQueuesIdle func(ctx context.Context) error
@@ -85,15 +87,35 @@ type Config struct {
// RunMigration runs spider91migrate.Migrator.RunOnce for Phase 3.
RunMigration func(ctx context.Context) error
// RunDedupeAssetCleanup removes generated local assets from non-canonical
// videos in size+sampled_sha256 duplicate groups. It must not delete cloud
// files or catalog rows.
RunDedupeAssetCleanup func(ctx context.Context) error
// Now is injected for tests; nil → time.Now.
Now func() time.Time
}
type Status struct {
State string
Running bool
Queued bool
StartedAt time.Time
LastFinishedAt time.Time
}
// Runner drives the nightly pipeline.
type Runner struct {
cfg Config
trigger chan struct{} // buffered(1); manual "run now"
runMu sync.Mutex // prevents overlapping pipeline runs
stateMu sync.Mutex
running bool
queued bool
startedAt time.Time
lastFinishedAt time.Time
currentCancel context.CancelFunc
}
// New constructs a Runner. cfg is shallow-copied; defaults are applied.
@@ -131,13 +153,75 @@ func (r *Runner) Run(ctx context.Context) {
}
}
// TriggerNow asks the running loop to fire a pipeline ASAP. If a pipeline is
// already in progress (or another trigger is already pending), the request
// is dropped — the in-progress run will absorb the intent.
func (r *Runner) TriggerNow() {
// TriggerNow asks the running loop to fire a pipeline ASAP. Only one manual
// trigger can be active at a time: if a pipeline is already running or waiting
// in the trigger channel, the request is ignored and returns false.
func (r *Runner) TriggerNow() bool {
r.stateMu.Lock()
if r.running || r.queued {
r.stateMu.Unlock()
return false
}
r.queued = true
r.stateMu.Unlock()
select {
case r.trigger <- struct{}{}:
return true
default:
r.stateMu.Lock()
r.queued = false
r.stateMu.Unlock()
return false
}
}
// StopCurrent cancels the currently running pipeline and drops one queued
// manual trigger, if present. It returns true when there was something to stop.
func (r *Runner) StopCurrent() bool {
r.stateMu.Lock()
wasRunning := r.running
wasQueued := r.queued
cancel := r.currentCancel
r.queued = false
r.stateMu.Unlock()
if wasQueued {
select {
case <-r.trigger:
default:
}
}
if cancel != nil {
cancel()
}
return wasRunning || wasQueued || cancel != nil
}
func (r *Runner) Status() Status {
r.stateMu.Lock()
running := r.running
queued := r.queued
startedAt := r.startedAt
lastFinishedAt := r.lastFinishedAt
r.stateMu.Unlock()
state := "idle"
switch {
case running && queued:
state = "running_queued"
case running:
state = "running"
case queued:
state = "queued"
}
return Status{
State: state,
Running: running,
Queued: queued,
StartedAt: startedAt,
LastFinishedAt: lastFinishedAt,
}
}
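Note on TriggerNow and StopCurrent: the pattern is a one-slot buffered channel guarded by a `queued` flag, so at most one manual request is pending and a stop can withdraw it. The sketch below isolates that pattern under simplified assumptions; the `trigger` type and its method names are illustrative only, and the `running` state and context cancellation from the diff are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// trigger is an illustrative stand-in for the Runner's manual-trigger state.
type trigger struct {
	mu     sync.Mutex
	queued bool
	ch     chan struct{} // buffered(1): holds at most one pending request
}

func newTrigger() *trigger {
	return &trigger{ch: make(chan struct{}, 1)}
}

// request reports whether the caller's request was accepted.
func (t *trigger) request() bool {
	t.mu.Lock()
	if t.queued {
		t.mu.Unlock()
		return false
	}
	t.queued = true
	t.mu.Unlock()

	select {
	case t.ch <- struct{}{}:
		return true
	default:
		// Channel unexpectedly full: roll the flag back.
		t.mu.Lock()
		t.queued = false
		t.mu.Unlock()
		return false
	}
}

// cancel withdraws a pending request, if any, and reports whether it did.
func (t *trigger) cancel() bool {
	t.mu.Lock()
	was := t.queued
	t.queued = false
	t.mu.Unlock()
	if was {
		select {
		case <-t.ch:
		default:
		}
	}
	return was
}

func main() {
	t := newTrigger()
	fmt.Println(t.request(), t.request()) // true false
	fmt.Println(t.cancel(), t.request())  // true true
}
```

Setting the flag before the non-blocking send is what lets concurrent callers agree on a single winner without holding the mutex across the channel operation.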
@@ -171,13 +255,28 @@ func shouldRun(now time.Time, lastRunDate string) bool {
//
// The pipeline has no overall time limit: it runs until ctx is cancelled (process exit) or every phase completes.
func (r *Runner) runPipelineLocked(ctx context.Context, manual bool) {
if manual {
r.stateMu.Lock()
queued := r.queued
r.stateMu.Unlock()
if !queued {
log.Printf("[nightly] manual trigger was canceled before start")
return
}
}
if !r.runMu.TryLock() {
log.Printf("[nightly] another pipeline is already running, skipping this trigger")
return
}
defer r.runMu.Unlock()
started := r.cfg.Now()
runCtx, cancel := context.WithCancel(ctx)
r.markStarted(started, cancel)
defer func() {
cancel()
r.markFinished(r.cfg.Now())
r.runMu.Unlock()
}()
mode := "scheduled"
if manual {
@@ -185,7 +284,7 @@ func (r *Runner) runPipelineLocked(ctx context.Context, manual bool) {
}
log.Printf("[nightly] pipeline (%s) start", mode)
r.runPipeline(ctx)
r.runPipeline(runCtx)
finished := r.cfg.Now()
log.Printf("[nightly] pipeline (%s) finish; took=%s", mode, finished.Sub(started).Round(time.Second))
@@ -199,6 +298,24 @@ func (r *Runner) runPipelineLocked(ctx context.Context, manual bool) {
}
}
func (r *Runner) markStarted(started time.Time, cancel context.CancelFunc) {
r.stateMu.Lock()
defer r.stateMu.Unlock()
r.running = true
r.queued = false
r.startedAt = started
r.currentCancel = cancel
}
func (r *Runner) markFinished(finished time.Time) {
r.stateMu.Lock()
defer r.stateMu.Unlock()
r.running = false
r.startedAt = time.Time{}
r.lastFinishedAt = finished
r.currentCancel = nil
}
// runPipeline executes the four phases. It returns when the pipeline finishes
// OR ctx is done (deadline / cancel). Errors are logged but not propagated —
// each phase is best-effort; downstream phases still attempt to run unless ctx
@@ -240,6 +357,7 @@ func (r *Runner) runPipeline(ctx context.Context) {
}
if len(spiderIDs) == 0 {
log.Printf("[nightly] phase 2/3 skipped: no spider91 drive configured")
r.runDedupeAssetCleanupPhase(ctx)
return
}
log.Printf("[nightly] phase 2: crawling %d spider91 drive(s)", len(spiderIDs))
@@ -266,6 +384,8 @@ func (r *Runner) runPipeline(ctx context.Context) {
log.Printf("[nightly] phase 3 migration: %v", err)
}
}
r.runDedupeAssetCleanupPhase(ctx)
}
// checkDeadline returns true when ctx is already done (runner shutting down or
@@ -291,6 +411,19 @@ func (r *Runner) waitIdle(ctx context.Context, phase string) error {
return nil
}
func (r *Runner) runDedupeAssetCleanupPhase(ctx context.Context) {
if r.checkDeadline(ctx, "phase 4") {
return
}
if r.cfg.RunDedupeAssetCleanup == nil {
return
}
log.Printf("[nightly] phase 4: duplicate asset cleanup")
if err := r.cfg.RunDedupeAssetCleanup(ctx); err != nil {
log.Printf("[nightly] phase 4 duplicate asset cleanup: %v", err)
}
}
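Note on phase sequencing: each phase is best-effort (errors are logged, not propagated) and later phases are skipped once the context is done. The sketch below shows that control flow in isolation. The `phase` type, `runPhases`, and the phase names are assumptions for illustration; the real runner also has the "no spider91 drive" shortcut that jumps straight to phase 4, which is not modeled here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
)

// phase is an illustrative pairing of a label and a best-effort step.
type phase struct {
	name string
	run  func(ctx context.Context) error
}

// runPhases runs phases in order, logging errors without stopping, and
// stops at the first boundary where ctx is already done. It returns the
// names of the phases that were started.
func runPhases(ctx context.Context, phases []phase) []string {
	var ran []string
	for _, p := range phases {
		if err := ctx.Err(); err != nil {
			log.Printf("skipping %s and later phases: %v", p.name, err)
			break
		}
		ran = append(ran, p.name)
		if err := p.run(ctx); err != nil {
			log.Printf("%s: %v", p.name, err)
		}
	}
	return ran
}

// demo fails the first phase and cancels the context during the second.
func demo() []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	return runPhases(ctx, []phase{
		{"scan", func(context.Context) error { return errors.New("drive offline") }},
		{"crawl", func(context.Context) error { cancel(); return nil }},
		{"migrate", func(context.Context) error { return nil }},
		{"dedupe-cleanup", func(context.Context) error { return nil }},
	})
}

func main() {
	fmt.Println(demo()) // [scan crawl]
}
```

The failing "scan" step does not prevent "crawl" from running, while the cancellation inside "crawl" prevents both remaining phases from starting.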
// readLastRunDate reads the persisted last_run_date or returns "" when unset.
func (r *Runner) readLastRunDate(ctx context.Context) (string, error) {
if r.cfg.Settings == nil {
+176 -2
@@ -114,6 +114,10 @@ func TestRunPipelineHonoursPhaseOrder(t *testing.T) {
rec.push("migrate")
return nil
},
RunDedupeAssetCleanup: func(context.Context) error {
rec.push("dedupe-cleanup")
return nil
},
})
r.runPipeline(context.Background())
@@ -128,6 +132,7 @@ func TestRunPipelineHonoursPhaseOrder(t *testing.T) {
"crawl:sp-1",
"wait-idle", // after phase 2
"migrate",
"dedupe-cleanup",
}
if len(got) != len(want) {
t.Fatalf("call sequence len = %d, want %d; got=%v", len(got), len(want), got)
@@ -156,6 +161,10 @@ func TestRunPipelineSkipsMigrationWhenNoSpider91(t *testing.T) {
rec.push("migrate")
return nil
},
RunDedupeAssetCleanup: func(context.Context) error {
rec.push("dedupe-cleanup")
return nil
},
})
r.runPipeline(context.Background())
@@ -165,6 +174,15 @@ func TestRunPipelineSkipsMigrationWhenNoSpider91(t *testing.T) {
t.Fatalf("phase 2/3 should be skipped when no spider91 drive, got call %q", c)
}
}
foundCleanup := false
for _, c := range rec.snapshot() {
if c == "dedupe-cleanup" {
foundCleanup = true
}
}
if !foundCleanup {
t.Fatalf("dedupe cleanup should still run when spider91 is absent; calls=%v", rec.snapshot())
}
}
func TestRunPipelineExitsWhenContextCancelledMidPhase(t *testing.T) {
@@ -186,6 +204,7 @@ func TestRunPipelineExitsWhenContextCancelledMidPhase(t *testing.T) {
RunSpider91Crawl: func(context.Context, string) { rec.push("crawl") },
WaitPreviewQueuesIdle: func(context.Context) error { rec.push("wait-idle"); return nil },
RunMigration: func(context.Context) error { rec.push("migrate"); return nil },
RunDedupeAssetCleanup: func(context.Context) error { rec.push("dedupe-cleanup"); return nil },
})
r.runPipeline(ctx)
@@ -200,6 +219,9 @@ func TestRunPipelineExitsWhenContextCancelledMidPhase(t *testing.T) {
if c == "crawl" || c == "migrate" {
t.Fatalf("subsequent phase should not run after cancel, got call %q", c)
}
if c == "dedupe-cleanup" {
t.Fatalf("dedupe cleanup should not run after cancel, got call %q", c)
}
}
}
@@ -290,11 +312,14 @@ func TestCtxCancelPreventsLaterPhases(t *testing.T) {
func TestTriggerNowIsNonBlocking(t *testing.T) {
r := New(Config{Settings: newStubSettings()})
// fill the trigger channel
r.TriggerNow()
if !r.TriggerNow() {
t.Fatal("first TriggerNow should be accepted")
}
// Second call must not block
done := make(chan struct{})
var accepted bool
go func() {
r.TriggerNow()
accepted = r.TriggerNow()
close(done)
}()
select {
@@ -302,4 +327,153 @@ func TestTriggerNowIsNonBlocking(t *testing.T) {
case <-time.After(100 * time.Millisecond):
t.Fatal("TriggerNow blocked when channel is full")
}
if accepted {
t.Fatal("second TriggerNow should be ignored when trigger channel is full")
}
}
func TestStatusTracksQueuedRunningAndFinished(t *testing.T) {
blockScan := make(chan struct{})
scanStarted := make(chan struct{})
var startedOnce sync.Once
r := New(Config{
Settings: newStubSettings(),
ListScanTargets: func(context.Context) []string {
return []string{"drive"}
},
RunScan: func(context.Context, string) {
startedOnce.Do(func() { close(scanStarted) })
<-blockScan
},
})
if got := r.Status(); got.State != "idle" || got.Running || got.Queued {
t.Fatalf("initial status = %#v, want idle", got)
}
if !r.TriggerNow() {
t.Fatal("TriggerNow should queue a manual run")
}
if got := r.Status(); got.State != "queued" || got.Running || !got.Queued {
t.Fatalf("queued status = %#v, want queued", got)
}
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go r.Run(ctx)
select {
case <-scanStarted:
case <-time.After(time.Second):
t.Fatal("pipeline did not start")
}
if got := r.Status(); got.State != "running" || !got.Running || got.Queued || got.StartedAt.IsZero() {
t.Fatalf("running status = %#v, want running with startedAt", got)
}
if r.TriggerNow() {
t.Fatal("TriggerNow during a run should be ignored")
}
if got := r.Status(); got.State != "running" || !got.Running || got.Queued {
t.Fatalf("status after ignored trigger = %#v, want running", got)
}
close(blockScan)
deadline := time.After(time.Second)
for {
got := r.Status()
if !got.Running && !got.Queued && !got.LastFinishedAt.IsZero() {
return
}
select {
case <-deadline:
t.Fatalf("status did not finish; got=%#v", got)
default:
time.Sleep(10 * time.Millisecond)
}
}
}
func TestStopCurrentCancelsRunningPipeline(t *testing.T) {
scanStarted := make(chan struct{})
scanCanceled := make(chan struct{})
var startedOnce sync.Once
r := New(Config{
Settings: newStubSettings(),
ListScanTargets: func(context.Context) []string {
return []string{"drive"}
},
RunScan: func(ctx context.Context, _ string) {
startedOnce.Do(func() { close(scanStarted) })
<-ctx.Done()
close(scanCanceled)
},
})
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go r.Run(ctx)
if !r.TriggerNow() {
t.Fatal("TriggerNow should queue a manual run")
}
select {
case <-scanStarted:
case <-time.After(time.Second):
t.Fatal("pipeline did not start")
}
if !r.StopCurrent() {
t.Fatal("StopCurrent should report a running pipeline")
}
select {
case <-scanCanceled:
case <-time.After(time.Second):
t.Fatal("StopCurrent did not cancel pipeline context")
}
}
func TestStopCurrentDropsQueuedTrigger(t *testing.T) {
r := New(Config{Settings: newStubSettings()})
if !r.TriggerNow() {
t.Fatal("TriggerNow should queue a manual run")
}
if !r.StopCurrent() {
t.Fatal("StopCurrent should report a queued pipeline")
}
if got := r.Status(); got.State != "idle" || got.Running || got.Queued {
t.Fatalf("status = %#v, want idle after dropping queued trigger", got)
}
if !r.TriggerNow() {
t.Fatal("TriggerNow should accept a new request after queued stop")
}
}
func TestTriggerNowAcceptsOnlyOneConcurrentRequest(t *testing.T) {
r := New(Config{Settings: newStubSettings()})
const callers = 16
start := make(chan struct{})
results := make(chan bool, callers)
for i := 0; i < callers; i++ {
go func() {
<-start
results <- r.TriggerNow()
}()
}
close(start)
accepted := 0
for i := 0; i < callers; i++ {
if <-results {
accepted++
}
}
if accepted != 1 {
t.Fatalf("accepted triggers = %d, want 1", accepted)
}
if got := r.Status(); got.State != "queued" || got.Running || !got.Queued {
t.Fatalf("status = %#v, want one queued trigger", got)
}
}
+331 -64
@@ -1,6 +1,7 @@
package preview
import (
"bytes"
"context"
"encoding/json"
"errors"
@@ -20,15 +21,16 @@ import (
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
"github.com/video-site/backend/internal/mediaasset"
)
type Config struct {
FFmpegPath string
FFprobePath string
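DurationSeconds int // kept for compatibility with old configs; each teaser segment is currently fixed at 3 seconds
DurationSeconds int // kept for compatibility with old configs; each preview-video segment is currently fixed at 3 seconds
Width int
Segments int // kept for compatibility with old configs; videos of 30 seconds or longer currently always use 4 segments
LocalDir string // local directory for teasers and thumbnails
LocalDir string // local directory for preview videos and thumbnails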
}
type Generator struct {
@@ -235,23 +237,43 @@ func appendUniqueStart(starts []float64, start, eachSec float64) []float64 {
return append(starts, start)
}
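// thumbnailOffsets picks the timestamps (in seconds) at which thumbnail frames are grabbed. Independent of the teaser.
func thumbnailOffsets() []float64 {
return []float64{5, 1, 0}
// thumbnailOffsets picks the timestamps (in seconds) at which thumbnail frames are grabbed. Independent of the preview video.
// Defaults to the middle frame of the video; falls back to early frames when the duration is unknown.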
func thumbnailOffsets(duration float64) []float64 {
if duration <= 0 {
return []float64{5, 1, 0}
}
mid := duration / 2
out := []float64{mid}
for _, fallback := range []float64{5, 1, 0} {
if !containsOffset(out, fallback) {
out = append(out, fallback)
}
}
return out
}
func containsOffset(offsets []float64, target float64) bool {
for _, offset := range offsets {
if math.Abs(offset-target) < 0.01 {
return true
}
}
return false
}
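Note on the new offset selection: the midpoint is tried first, then the fixed early fallbacks 5, 1 and 0, with near-duplicates (within 0.01 s) dropped. The standalone sketch below works through three durations. The function name `offsets` is an assumption; the logic is intended to match thumbnailOffsets and containsOffset as shown in the diff.

```go
package main

import (
	"fmt"
	"math"
)

// offsets returns the frame timestamps to try, in order of preference.
func offsets(duration float64) []float64 {
	fallbacks := []float64{5, 1, 0}
	if duration <= 0 {
		// Unknown duration: only the early frames are available.
		return fallbacks
	}
	out := []float64{duration / 2}
	for _, f := range fallbacks {
		dup := false
		for _, o := range out {
			if math.Abs(o-f) < 0.01 {
				dup = true
				break
			}
		}
		if !dup {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fmt.Println(offsets(120)) // [60 5 1 0]
	fmt.Println(offsets(10))  // [5 1 0]  (midpoint 5 coincides with a fallback)
	fmt.Println(offsets(0))   // [5 1 0]  (duration unknown)
}
```

For a 10-second video the midpoint equals the first fallback, so the list stays at three entries rather than retrying the same frame twice.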
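// --- Thumbnail ---
// GenerateThumbnail grabs a single jpg thumbnail. By default it grabs the frame at second 5 and falls back to earlier timestamps on failure.
// GenerateThumbnail grabs a single jpg thumbnail. By default it grabs a frame from the middle of the video and falls back to earlier timestamps on failure.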
func (g *Generator) GenerateThumbnail(ctx context.Context, link *drives.StreamLink, videoID string, duration float64) (string, error) {
dir := filepath.Join(g.cfg.LocalDir, "thumbs")
if err := os.MkdirAll(dir, 0o755); err != nil {
return "", err
}
dst := filepath.Join(dir, videoID+".jpg")
dst := mediaasset.ThumbnailPath(g.cfg.LocalDir, videoID)
var lastErr error
offsets := thumbnailOffsets()
offsets := thumbnailOffsets(duration)
for i, offset := range offsets {
if i > 0 {
_ = os.Remove(dst)
@@ -289,7 +311,7 @@ func (g *Generator) generateThumbnailAtOffset(ctx context.Context, link *drives.
args = append(args,
"-i", ffmpegLink.URL,
"-frames:v", "1",
"-vf", fmt.Sprintf("scale=%d:-2", g.cfg.Width),
"-vf", thumbnailVideoFilter(g.cfg.Width),
"-q:v", "3",
"-y", dst,
)
@@ -307,6 +329,12 @@ func (g *Generator) generateThumbnailAtOffset(ctx context.Context, link *drives.
return nil
}
func thumbnailVideoFilter(width int) string {
// FFmpeg 7 rejects non-full-range YUV for MJPEG/JPEG output. Force the
// scaled frame into a JPEG-friendly full-range pixel format before encode.
return fmt.Sprintf("scale=%d:-2:out_range=pc,format=yuvj420p", width)
}
func thumbnailOffsetFallbackAllowed(err error) bool {
if err == nil {
return false
@@ -339,9 +367,15 @@ func (g *Generator) Probe(ctx context.Context, link *drives.StreamLink) (float64
args = append(args, ffmpegLink.URL)
cmd := exec.CommandContext(ctx2, g.cfg.FFprobePath, args...)
out, err := cmd.CombinedOutput()
var stderr bytes.Buffer
cmd.Stderr = &stderr
out, err := cmd.Output()
if err != nil {
return 0, ffmpegCommandError("ffprobe", err, out)
errOut := stderr.Bytes()
if len(errOut) == 0 {
errOut = out
}
return 0, ffmpegCommandError("ffprobe", err, errOut)
}
raw := strings.TrimSpace(string(out))
if raw == "" || raw == "N/A" {
@@ -350,9 +384,9 @@ func (g *Generator) Probe(ctx context.Context, link *drives.StreamLink) (float64
return strconv.ParseFloat(raw, 64)
}
// --- Teaser ---
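// --- Preview video ---
// Generate fetches the teaser into a local temporary file and returns its path.
// Generate fetches the preview video into a local temporary file and returns its path.
// Config.Segments and the video duration decide whether the result is a single segment or several segments concatenated.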
func (g *Generator) Generate(ctx context.Context, link *drives.StreamLink, duration float64) (string, error) {
return g.generate(ctx, duration, func(int) (*drives.StreamLink, error) {
@@ -923,6 +957,7 @@ func ffmpegOutputLooksRateLimited(output []byte) bool {
return false
}
return strings.Contains(text, "too many requests") ||
strings.Contains(text, "throttl") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "rate-limit") ||
strings.Contains(text, "server returned 429")
@@ -932,7 +967,10 @@ func ffmpegOutputLooksRateLimited(output []byte) bool {
// MoveToLocal renames the temporary file to its stable location and returns the final path.
func (g *Generator) MoveToLocal(tmpPath, videoID string) (string, error) {
dst := filepath.Join(g.cfg.LocalDir, videoID+".mp4")
if err := os.MkdirAll(g.cfg.LocalDir, 0o755); err != nil {
return "", err
}
dst := mediaasset.PreviewPath(g.cfg.LocalDir, videoID)
if err := os.Rename(tmpPath, dst); err != nil {
// rename can fail across filesystems; fall back to copy
if cerr := copyFile(tmpPath, dst); cerr != nil {
@@ -968,7 +1006,6 @@ type Worker struct {
queue videoQueue
RateLimitCooldown time.Duration
BeforeTask func(context.Context) bool
rateLimit rateLimitState
activity taskActivity
}
@@ -978,7 +1015,7 @@ func NewWorker(gen TeaserGenerator, cat *catalog.Catalog, drv drives.Drive) *Wor
Gen: gen,
Catalog: cat,
Drive: drv,
ch: make(chan *catalog.Video, 4096),
ch: make(chan *catalog.Video, defaultWorkerQueueSize),
}
}
@@ -1027,10 +1064,12 @@ type ThumbWorker struct {
}
const (
defaultTransientMediaCooldown = 5 * time.Minute
defaultGenerationRateLimitCooldown = 5 * time.Minute
maxPreviewTeaserSizeBytes int64 = 5 * 1024 * 1024 * 1024
previewStatusSkipped = "skipped"
defaultTransientMediaCooldown = 5 * time.Minute
defaultGenerationRateLimitCooldown = 5 * time.Minute
defaultThumbTransientMediaMaxFailures = 3
defaultWorkerQueueSize = 10000
maxPreviewTeaserSizeBytes int64 = 5 * 1024 * 1024 * 1024
previewStatusSkipped = "skipped"
)
type rateLimitState struct {
@@ -1168,7 +1207,7 @@ func NewThumbWorker(gen ThumbnailGenerator, cat *catalog.Catalog, drv drives.Dri
Gen: gen,
Catalog: cat,
Drive: drv,
ch: make(chan *catalog.Video, 4096),
ch: make(chan *catalog.Video, defaultWorkerQueueSize),
}
}
@@ -1323,26 +1362,32 @@ func (w *ThumbWorker) Run(ctx context.Context) {
func (w *Worker) processQueued(ctx context.Context, v *catalog.Video) {
defer w.queue.release(v)
if w.BeforeTask != nil && !w.BeforeTask(ctx) {
if w.Catalog == nil || v == nil || v.ID == "" {
return
}
w.activity.start(v)
current, err := w.Catalog.GetVideo(ctx, v.ID)
if err != nil || current.Hidden {
return
}
w.activity.start(current)
defer w.activity.done()
if !waitForRateLimitCooldown(ctx, &w.rateLimit, "preview", w.Drive) {
return
}
w.process(ctx, v)
w.process(ctx, current)
}
func (w *ThumbWorker) processQueued(ctx context.Context, v *catalog.Video) {
defer w.queue.release(v)
w.activity.start(v)
defer w.activity.done()
if !waitForRateLimitCooldown(ctx, &w.rateLimit, "thumb", w.Drive) {
return
retry := false
if waitForRateLimitCooldown(ctx, &w.rateLimit, "thumb", w.Drive) {
retry = w.process(ctx, v)
}
w.activity.done()
w.queue.release(v)
if retry && ctx.Err() == nil {
w.EnqueueBlocking(ctx, v)
}
w.process(ctx, v)
}
func waitForRateLimitCooldown(ctx context.Context, state *rateLimitState, label string, drive drives.Drive) bool {
@@ -1382,11 +1427,17 @@ func (w *Worker) skipIfRateLimited(v *catalog.Video) bool {
}
func (w *Worker) pauseForRateLimit(err error, step, title string) bool {
_, ok := drives.RateLimitRetryAfter(err)
wait, ok := drives.RateLimitRetryAfter(err)
if !ok {
return false
}
until := w.rateLimit.pause(time.Now(), defaultGenerationRateLimitCooldown)
if wait <= 0 {
wait = w.RateLimitCooldown
if wait <= 0 {
wait = defaultGenerationRateLimitCooldown
}
}
until := w.rateLimit.pause(time.Now(), wait)
log.Printf("[preview] drive=%s rate-limited until=%s step=%s video=%s: %v", w.Drive.ID(), until.Format(time.RFC3339), step, title, err)
return true
}
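Note on the new cooldown choice in pauseForRateLimit: the server-provided Retry-After wins, then the worker's RateLimitCooldown, then the package default of 5 minutes. The sketch below expresses that fallback chain as a pure function. `cooldownFor` and `defaultCooldown` are names assumed for this example; `defaultCooldown` stands in for defaultGenerationRateLimitCooldown.

```go
package main

import (
	"fmt"
	"time"
)

// defaultCooldown stands in for defaultGenerationRateLimitCooldown.
const defaultCooldown = 5 * time.Minute

// cooldownFor picks the pause length: server hint first, then the
// per-worker setting, then the package default.
func cooldownFor(retryAfter, workerCooldown time.Duration) time.Duration {
	switch {
	case retryAfter > 0:
		return retryAfter
	case workerCooldown > 0:
		return workerCooldown
	default:
		return defaultCooldown
	}
}

func main() {
	fmt.Println(cooldownFor(time.Minute, 10*time.Minute)) // 1m0s
	fmt.Println(cooldownFor(0, 10*time.Minute))           // 10m0s
	fmt.Println(cooldownFor(0, 0))                        // 5m0s
}
```

Before this change the pause was always the 5-minute default, so a short Retry-After from the drive could not shorten the wait and a long one could not extend it.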
@@ -1415,24 +1466,49 @@ func (w *ThumbWorker) skipIfRateLimited(v *catalog.Video) bool {
}
func (w *ThumbWorker) pauseForRateLimit(err error, step, title string) bool {
_, ok := drives.RateLimitRetryAfter(err)
wait, ok := drives.RateLimitRetryAfter(err)
if !ok {
return false
}
until := w.rateLimit.pause(time.Now(), defaultGenerationRateLimitCooldown)
if wait <= 0 {
wait = w.RateLimitCooldown
if wait <= 0 {
wait = defaultGenerationRateLimitCooldown
}
}
until := w.rateLimit.pause(time.Now(), wait)
log.Printf("[thumb] drive=%s rate-limited until=%s step=%s video=%s: %v", w.Drive.ID(), until.Format(time.RFC3339), step, title, err)
return true
}
func (w *ThumbWorker) pauseForRecoverableError(err error, step, title string) bool {
func (w *ThumbWorker) pauseForRecoverableError(ctx context.Context, v *catalog.Video, err error, step string) bool {
title := ""
videoID := ""
if v != nil {
title = v.Title
videoID = v.ID
}
if w.pauseForRateLimit(err, step, title) {
return true
}
if !driveErrorShouldCooldown(w.Drive, err) {
return false
}
failures := 1
if w.Catalog != nil && videoID != "" {
count, countErr := w.Catalog.IncrementThumbnailFailures(ctx, videoID)
if countErr != nil {
log.Printf("[thumb] drive=%s transient media source error count failed step=%s video=%s: %v", w.Drive.ID(), step, title, countErr)
} else {
failures = count
}
}
if failures >= defaultThumbTransientMediaMaxFailures {
log.Printf("[thumb] drive=%s transient media source error reached retry limit failures=%d/%d step=%s video=%s: %v", w.Drive.ID(), failures, defaultThumbTransientMediaMaxFailures, step, title, err)
return false
}
until := w.rateLimit.pause(time.Now(), w.RateLimitCooldown)
log.Printf("[thumb] drive=%s transient media source error until=%s step=%s video=%s: %v", w.Drive.ID(), until.Format(time.RFC3339), step, title, err)
log.Printf("[thumb] drive=%s transient media source error until=%s failures=%d/%d step=%s video=%s: %v", w.Drive.ID(), until.Format(time.RFC3339), failures, defaultThumbTransientMediaMaxFailures, step, title, err)
return true
}
@@ -1453,7 +1529,7 @@ func driveErrorShouldCooldown(d drives.Drive, err error) bool {
strings.Contains(text, "request has been blocked") ||
strings.Contains(text, "访问被阻断")
case "pikpak":
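// During teaser / thumbnail generation (resolving the link or reading bytes from the direct link), PikPak may hit:
// During preview-video / thumbnail generation (resolving the link or reading bytes from the direct link), PikPak may hit:
// - error_code=10: operations too frequent
// - HTTP 429 / 5xx / 509: rate limiting and server unavailability
// - generic text: rate limit / too many requests / blocked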
@@ -1471,65 +1547,252 @@ func driveErrorShouldCooldown(d drives.Drive, err error) bool {
strings.Contains(text, "too many requests") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "blocked") ||
strings.Contains(text, "moov atom not found") ||
strings.Contains(text, "partial file") ||
strings.Contains(text, "service unavailable")
case "p123":
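// 123网盘 direct-link resolution and ffmpeg reads may return 429, 5xx, or WAF-style
// "blocked" / access-denied text. Cool down on a match so thumbnail and preview-video generation do not keep hitting the API.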
text := strings.ToLower(err.Error())
return strings.Contains(text, "请求太频繁") ||
strings.Contains(text, "请求过于频繁") ||
strings.Contains(text, "请求频繁") ||
strings.Contains(text, "操作频繁") ||
strings.Contains(text, "频率限制") ||
strings.Contains(text, "请求次数过多") ||
strings.Contains(text, "429") ||
strings.Contains(text, "http 500") ||
strings.Contains(text, "http 502") ||
strings.Contains(text, "http 503") ||
strings.Contains(text, "http 504") ||
strings.Contains(text, "server returned 403") ||
strings.Contains(text, "403 forbidden") ||
strings.Contains(text, "too many request") ||
strings.Contains(text, "too many requests") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "blocked") ||
strings.Contains(text, "访问被阻断") ||
strings.Contains(text, "service unavailable")
case "wopan":
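// Both the link-fetching API and the direct download links of 联通网盘 (Unicom cloud drive) may return "操作频繁" (operation too frequent), 429, 5xx,
// or WAF block text. Cool down first when a thumbnail/preview fails, to avoid repeatedly triggering risk control.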
text := strings.ToLower(err.Error())
return strings.Contains(text, "请求太频繁") ||
strings.Contains(text, "请求过于频繁") ||
strings.Contains(text, "请求频繁") ||
strings.Contains(text, "操作频繁") ||
strings.Contains(text, "频率限制") ||
strings.Contains(text, "请求次数过多") ||
strings.Contains(text, "系统繁忙") ||
strings.Contains(text, "服务繁忙") ||
strings.Contains(text, "稍后再试") ||
strings.Contains(text, "稍后重试") ||
strings.Contains(text, "429") ||
strings.Contains(text, "http 500") ||
strings.Contains(text, "http 502") ||
strings.Contains(text, "http 503") ||
strings.Contains(text, "http 504") ||
strings.Contains(text, "http 509") ||
strings.Contains(text, "server returned 403") ||
strings.Contains(text, "403 forbidden") ||
strings.Contains(text, "server returned 429") ||
strings.Contains(text, "server returned 500") ||
strings.Contains(text, "server returned 502") ||
strings.Contains(text, "server returned 503") ||
strings.Contains(text, "server returned 504") ||
strings.Contains(text, "too many request") ||
strings.Contains(text, "too many requests") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "rate-limit") ||
strings.Contains(text, "throttl") ||
strings.Contains(text, "blocked") ||
strings.Contains(text, "request has been blocked") ||
strings.Contains(text, "访问被阻断") ||
strings.Contains(text, "风控") ||
strings.Contains(text, "service unavailable")
case "googledrive":
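// During download/sampling, Google Drive often wraps rate-limit and quota problems in a 403;
// the specific identifier is in error.errors[].reason/message (OpenList parses the same structure).
// When ffmpeg/ffprobe only exposes stderr text, fall back to cooling down on these reasons/text.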
text := strings.ToLower(err.Error())
return googleDriveMediaErrorShouldCooldown(text)
}
return false
}
func (w *ThumbWorker) process(ctx context.Context, v *catalog.Video) {
if w.skipIfRateLimited(v) {
return
func googleDriveMediaErrorShouldCooldown(text string) bool {
if text == "" {
return false
}
if current, err := w.Catalog.GetVideo(ctx, v.ID); err == nil {
if current.ThumbnailURL != "" {
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "ready"})
return
compact := compactGoogleDriveErrorText(text)
return strings.Contains(text, "server returned 403") ||
strings.Contains(text, "403 forbidden") ||
strings.Contains(text, "server returned 429") ||
strings.Contains(text, "http 429") ||
strings.Contains(text, "http 500") ||
strings.Contains(text, "http 502") ||
strings.Contains(text, "http 503") ||
strings.Contains(text, "http 504") ||
strings.Contains(text, "too many request") ||
strings.Contains(text, "too many requests") ||
strings.Contains(text, "rate limit") ||
strings.Contains(text, "quota exceeded") ||
strings.Contains(text, "download quota") ||
strings.Contains(text, "sharing rate") ||
strings.Contains(text, "daily limit") ||
strings.Contains(text, "user rate") ||
strings.Contains(text, "usage limit") ||
strings.Contains(text, "service unavailable") ||
strings.Contains(compact, "ratelimitexceeded") ||
strings.Contains(compact, "userratelimitexceeded") ||
strings.Contains(compact, "dailylimitexceeded") ||
strings.Contains(compact, "downloadquotaexceeded") ||
strings.Contains(compact, "sharingratelimitexceeded") ||
strings.Contains(compact, "quotaexceeded") ||
strings.Contains(compact, "toomanyrequests") ||
strings.Contains(compact, "usagelimits")
}
func compactGoogleDriveErrorText(text string) string {
replacer := strings.NewReplacer("_", "", "-", "", " ", "", ".", "", ":", "")
return replacer.Replace(strings.ToLower(strings.TrimSpace(text)))
}
func (w *ThumbWorker) process(ctx context.Context, v *catalog.Video) bool {
if w.skipIfRateLimited(v) {
return false
}
if w.Catalog == nil || v == nil || v.ID == "" {
return false
}
queued := v
loaded, err := w.Catalog.GetVideo(ctx, v.ID)
if err != nil || loaded.Hidden {
return false
}
if loaded.PreviewLocal == "" {
loaded.PreviewLocal = queued.PreviewLocal
}
current := loaded
v = loaded
if loaded.ThumbnailURL != "" && loaded.DurationSeconds > 0 {
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "ready"})
return false
}
if current.ThumbnailURL != "" {
durationBackfillFailed := false
if current.DurationSeconds <= 0 {
link, err := w.streamLink(ctx, current)
if err != nil {
if w.pauseForRecoverableError(ctx, current, err, "streamURL") {
return true
}
log.Printf("[thumb] probe streamURL %s: %v", current.Title, err)
durationBackfillFailed = true
} else if w.probeDuration(ctx, current, link) {
return true
} else if current.DurationSeconds <= 0 {
durationBackfillFailed = true
}
}
if durationBackfillFailed {
log.Printf("[thumb] skip duration backfill %s: thumbnail already exists but duration could not be probed", current.Title)
_ = w.Catalog.UpdateVideoMeta(ctx, current.ID, catalog.VideoMetaPatch{ThumbnailStatus: "skipped"})
return false
}
_ = w.Catalog.UpdateVideoMeta(ctx, current.ID, catalog.VideoMetaPatch{ThumbnailStatus: "ready"})
return false
}
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "pending"})
link, err := w.Drive.StreamURL(ctx, v.FileID)
if isSpider91OriginVideo(v) {
log.Printf("[thumb] skip %s: spider91-origin video must use crawled thumbnail", v.Title)
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "failed"})
return false
}
link, err := w.streamLink(ctx, v)
if err != nil {
if localLink, ok := localPreviewLink(v); ok {
link = localLink
} else {
if w.pauseForRecoverableError(err, "streamURL", v.Title) {
return
}
log.Printf("[thumb] streamURL %s: %v", v.Title, err)
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "failed"})
return
if w.pauseForRecoverableError(ctx, v, err, "streamURL") {
return true
}
log.Printf("[thumb] streamURL %s: %v", v.Title, err)
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "failed"})
return false
}
if w.probeDuration(ctx, v, link) {
return true
}
if err := w.generateThumbnailFromLink(ctx, v, link); err != nil {
if localLink, ok := localPreviewLink(v); ok && link.URL != localLink.URL {
if w.probeDuration(ctx, v, localLink) {
return true
}
if localErr := w.generateThumbnailFromLink(ctx, v, localLink); localErr == nil {
return
return false
}
}
if w.pauseForRecoverableError(err, "generate", v.Title) {
return
if w.pauseForRecoverableError(ctx, v, err, "generate") {
return true
}
log.Printf("[thumb] generate %s: %v", v.Title, err)
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{ThumbnailStatus: "failed"})
return
return false
}
return false
}
func (w *ThumbWorker) streamLink(ctx context.Context, v *catalog.Video) (*drives.StreamLink, error) {
link, err := w.Drive.StreamURL(ctx, v.FileID)
if err == nil {
return link, nil
}
if localLink, ok := localPreviewLink(v); ok {
return localLink, nil
}
return nil, err
}
func (w *ThumbWorker) probeDuration(ctx context.Context, v *catalog.Video, link *drives.StreamLink) bool {
if v.DurationSeconds > 0 {
return false
}
dur, err := w.Gen.Probe(ctx, link)
if err == nil {
if dur > 0 {
v.DurationSeconds = int(dur)
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{
DurationSeconds: int(dur),
})
}
return false
}
if w.pauseForRecoverableError(ctx, v, err, "probe") {
return true
}
log.Printf("[thumb] probe %s: %v", v.Title, err)
return false
}
func (w *ThumbWorker) generateThumbnailFromLink(ctx context.Context, v *catalog.Video, link *drives.StreamLink) error {
if _, err := w.Gen.GenerateThumbnail(ctx, link, v.ID, 0); err != nil {
local, err := w.Gen.GenerateThumbnail(ctx, link, v.ID, float64(v.DurationSeconds))
if err != nil {
return err
}
_ = w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{
if err := w.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{
ThumbnailURL: "/p/thumb/" + v.ID,
ThumbnailStatus: "ready",
})
}); err != nil {
_ = os.Remove(local)
log.Printf("[thumb] update %s after generate: %v", v.Title, err)
return nil
}
log.Printf("[thumb] ready %s", v.Title)
return nil
}
func isSpider91OriginVideo(v *catalog.Video) bool {
return v != nil && strings.HasPrefix(v.ID, "spider91-")
}
func localPreviewLink(v *catalog.Video) (*drives.StreamLink, bool) {
if v.PreviewLocal == "" {
return nil, false
@@ -1578,7 +1841,7 @@ func (w *Worker) process(ctx context.Context, v *catalog.Video) {
}
}
// 2) teaser
// 2) preview video
tmp, err := w.generateTeaser(ctx, v, link, duration)
if err != nil {
if w.pauseForRecoverableError(err, "generate", v.Title) {
@@ -1596,7 +1859,11 @@ func (w *Worker) process(ctx context.Context, v *catalog.Video) {
}
removePreviousLocalTeaser(v.PreviewLocal, local)
w.Catalog.UpdatePreview(ctx, v.ID, local, "ready")
if err := w.Catalog.UpdatePreview(ctx, v.ID, local, "ready"); err != nil {
removePreviousLocalTeaser(local, "")
log.Printf("[preview] update %s after generate: %v", v.Title, err)
return
}
log.Printf("[preview] ready %s (duration=%.1fs)", v.Title, duration)
}
+52 -9
@@ -5,6 +5,8 @@ import (
"errors"
"math"
"net/http"
"os"
"path/filepath"
"strings"
"testing"
@@ -95,6 +97,24 @@ func TestTinyVideoPreviewPlanUsesWholeVideoAsSingleSegment(t *testing.T) {
}
}
func TestProbeIgnoresStderrWarnings(t *testing.T) {
dir := t.TempDir()
ffprobePath := filepath.Join(dir, "ffprobe")
script := "#!/bin/sh\nprintf '%s\\n' 'h264 warning' >&2\nprintf '%s\\n' '364.800000'\n"
if err := os.WriteFile(ffprobePath, []byte(script), 0o755); err != nil {
t.Fatalf("write ffprobe stub: %v", err)
}
gen := New(Config{FFprobePath: ffprobePath})
got, err := gen.Probe(context.Background(), &drives.StreamLink{URL: filepath.Join(dir, "video.mp4")})
if err != nil {
t.Fatalf("probe: %v", err)
}
if got != 364.8 {
t.Fatalf("duration = %v, want 364.8", got)
}
}
func TestTeaserCandidateStartsKeepPrimaryAndAddFallbacks(t *testing.T) {
primary := []float64{10.2, 64.65, 119.1, 173.55}
got := teaserCandidateStarts(204, primary, 3)
@@ -148,16 +168,39 @@ func TestMediumAndLongVideosStillRequirePlannedTeaserSegments(t *testing.T) {
}
}
func TestThumbnailOffsetsUseFiveSecondsWithEarlyFallbacks(t *testing.T) {
got := thumbnailOffsets()
want := []float64{5, 1, 0}
if len(got) != len(want) {
t.Fatalf("offsets = %#v, want %#v", got, want)
func TestThumbnailOffsetsPreferMiddleFrame(t *testing.T) {
tests := []struct {
name string
duration float64
want []float64
}{
{name: "unknown duration", duration: 0, want: []float64{5, 1, 0}},
{name: "long video", duration: 2804.9, want: []float64{1402.45, 5, 1, 0}},
{name: "short video", duration: 8.9, want: []float64{4.45, 5, 1, 0}},
{name: "middle equals fallback", duration: 10, want: []float64{5, 1, 0}},
}
for i := range want {
if got[i] != want[i] {
t.Fatalf("offset[%d] = %.2f, want %.2f", i, got[i], want[i])
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := thumbnailOffsets(tt.duration)
if len(got) != len(tt.want) {
t.Fatalf("offsets = %#v, want %#v", got, tt.want)
}
for i := range tt.want {
if math.Abs(got[i]-tt.want[i]) > 0.001 {
t.Fatalf("offset[%d] = %.2f, want %.2f", i, got[i], tt.want[i])
}
}
})
}
}
func TestThumbnailVideoFilterUsesFullRangeJPEGPixelFormat(t *testing.T) {
got := thumbnailVideoFilter(480)
if !strings.Contains(got, "scale=480:-2:out_range=pc") {
t.Fatalf("thumbnail filter = %q, want full-range scale output", got)
}
if !strings.Contains(got, "format=yuvj420p") {
t.Fatalf("thumbnail filter = %q, want JPEG-friendly pixel format", got)
}
}
+314 -13
@@ -13,11 +13,11 @@ import (
"github.com/video-site/backend/internal/drives"
)
func TestThumbWorkerUpdatesThumbnailWithoutChangingPreviewStatus(t *testing.T) {
func TestThumbWorkerUpdatesThumbnailAndDurationWithoutChangingPreviewStatus(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-worker-video")
gen := &fakeThumbGenerator{}
gen := &fakeThumbGenerator{probeDuration: 42}
drv := &previewFakeDrive{}
worker := NewThumbWorker(gen, cat, drv)
@@ -33,23 +33,131 @@ func TestThumbWorkerUpdatesThumbnailWithoutChangingPreviewStatus(t *testing.T) {
if got.PreviewStatus != "pending" {
t.Fatalf("preview status = %q, want pending", got.PreviewStatus)
}
if got.DurationSeconds != 0 {
t.Fatalf("duration = %d, want unchanged", got.DurationSeconds)
if got.DurationSeconds != 42 {
t.Fatalf("duration = %d, want probed duration", got.DurationSeconds)
}
if gen.thumbnailVideoID != video.ID {
t.Fatalf("thumbnail video id = %q, want %q", gen.thumbnailVideoID, video.ID)
}
if gen.thumbnailDuration != 0 {
t.Fatalf("thumbnail duration = %.1f, want fixed-offset thumbnail generation", gen.thumbnailDuration)
if gen.thumbnailDuration != 42 {
t.Fatalf("thumbnail duration = %.1f, want probed duration", gen.thumbnailDuration)
}
if gen.probeCalls != 0 {
t.Fatalf("probe calls = %d, want 0 for thumbnail generation", gen.probeCalls)
if gen.probeCalls != 1 {
t.Fatalf("probe calls = %d, want 1 for thumbnail generation", gen.probeCalls)
}
if drv.streamFileID != video.FileID {
t.Fatalf("stream file id = %q, want %q", drv.streamFileID, video.FileID)
}
}
func TestThumbWorkerBackfillsDurationWhenThumbnailAlreadyExists(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-worker-existing-thumbnail")
video.ThumbnailURL = "/p/thumb/" + video.ID
if err := cat.UpsertVideo(ctx, video); err != nil {
t.Fatalf("update video: %v", err)
}
gen := &fakeThumbGenerator{probeDuration: 19}
drv := &previewFakeDrive{}
worker := NewThumbWorker(gen, cat, drv)
worker.process(ctx, video)
got, err := cat.GetVideo(ctx, video.ID)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.DurationSeconds != 19 {
t.Fatalf("duration = %d, want probed duration", got.DurationSeconds)
}
if got.ThumbnailURL != "/p/thumb/"+video.ID {
t.Fatalf("thumbnail = %q, want unchanged existing thumbnail", got.ThumbnailURL)
}
ready, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "ready", 0)
if err != nil {
t.Fatalf("list ready thumbnails: %v", err)
}
if len(ready) != 1 || ready[0].ID != video.ID {
t.Fatalf("ready thumbnails = %#v, want only %s", ready, video.ID)
}
if gen.probeCalls != 1 {
t.Fatalf("probe calls = %d, want 1", gen.probeCalls)
}
if gen.thumbnailVideoID != "" {
t.Fatalf("thumbnail generation video id = %q, want no regeneration", gen.thumbnailVideoID)
}
}
func TestThumbWorkerDoesNotGenerateThumbnailForSpider91OriginVideo(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "spider91-91-spider-1200001")
gen := &fakeThumbGenerator{probeDuration: 42}
drv := &previewFakeDrive{kind: "pikpak"}
worker := NewThumbWorker(gen, cat, drv)
worker.process(ctx, video)
got, err := cat.GetVideo(ctx, video.ID)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.ThumbnailURL != "" {
t.Fatalf("thumbnail = %q, want empty when crawled spider91 thumbnail is missing", got.ThumbnailURL)
}
failed, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "failed", 0)
if err != nil {
t.Fatalf("list failed thumbnails: %v", err)
}
if len(failed) != 1 || failed[0].ID != video.ID {
t.Fatalf("failed thumbnails = %#v, want only %s", failed, video.ID)
}
if gen.probeCalls != 0 || gen.generateCalls != 0 {
t.Fatalf("generator calls probe=%d generate=%d, want no ffmpeg work for spider91-origin thumbnail", gen.probeCalls, gen.generateCalls)
}
}
func TestThumbWorkerSkipsDurationBackfillWhenExistingThumbnailCannotBeProbed(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-worker-existing-thumbnail-probe-fails")
video.ThumbnailURL = "/p/thumb/" + video.ID
if err := cat.UpsertVideo(ctx, video); err != nil {
t.Fatalf("update video: %v", err)
}
gen := &fakeThumbGenerator{probeErr: errors.New("invalid media")}
drv := &previewFakeDrive{}
worker := NewThumbWorker(gen, cat, drv)
worker.process(ctx, video)
got, err := cat.GetVideo(ctx, video.ID)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.ThumbnailURL != "/p/thumb/"+video.ID {
t.Fatalf("thumbnail = %q, want unchanged existing thumbnail", got.ThumbnailURL)
}
if got.DurationSeconds != 0 {
t.Fatalf("duration = %d, want still unknown", got.DurationSeconds)
}
skipped, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "skipped", 0)
if err != nil {
t.Fatalf("list skipped thumbnails: %v", err)
}
if len(skipped) != 1 || skipped[0].ID != video.ID {
t.Fatalf("skipped thumbnails = %#v, want only %s", skipped, video.ID)
}
missing, err := cat.CountVideosNeedingThumbnail(ctx, video.DriveID)
if err != nil {
t.Fatalf("count videos needing thumbnail: %v", err)
}
if missing != 0 {
t.Fatalf("missing thumbnails = %d, want 0 after duration backfill is skipped", missing)
}
}
func TestThumbWorkerFallsBackToLocalPreviewWhenDriveStreamFails(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-worker-local-preview")
@@ -334,7 +442,7 @@ func TestPreviewWorkerRateLimitLeavesCurrentPendingAndSkipsNextVideo(t *testing.
if gen.generateCalls != 1 {
t.Fatalf("generate calls = %d, want 1", gen.generateCalls)
}
assertCooldownAround(t, worker.Status().CooldownUntil, before, 5*time.Minute)
assertCooldownAround(t, worker.Status().CooldownUntil, before, 2*time.Hour)
gen.generateErr = nil
worker.process(ctx, &second)
@@ -350,7 +458,7 @@ func TestPreviewWorkerRateLimitLeavesCurrentPendingAndSkipsNextVideo(t *testing.
}
}
func TestThumbWorkerRateLimitCoolsDownFiveMinutes(t *testing.T) {
func TestThumbWorkerRateLimitHonorsRetryAfter(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-rate-limit")
@@ -374,7 +482,143 @@ func TestThumbWorkerRateLimitCoolsDownFiveMinutes(t *testing.T) {
if got.ThumbnailURL != "" {
t.Fatalf("thumbnail = %q, want unchanged after rate limit", got.ThumbnailURL)
}
assertCooldownAround(t, worker.Status().CooldownUntil, before, 5*time.Minute)
assertCooldownAround(t, worker.Status().CooldownUntil, before, 2*time.Hour)
}
func TestThumbWorkerP115TransientErrorFailsAfterRetryLimit(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-p115-transient")
gen := &fakeThumbGenerator{
generateErr: errors.New("ffmpeg thumb: exit status 183, stderr: partial file Cannot determine format of input 0:0 after EOF"),
}
drv := &previewFakeDrive{kind: "p115"}
worker := NewThumbWorker(gen, cat, drv)
for attempt := 1; attempt <= defaultThumbTransientMediaMaxFailures; attempt++ {
worker.rateLimit = rateLimitState{}
worker.process(ctx, video)
if attempt < defaultThumbTransientMediaMaxFailures {
pending, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "pending", 0)
if err != nil {
t.Fatalf("list pending thumbnails: %v", err)
}
if len(pending) != 1 || pending[0].ID != video.ID {
t.Fatalf("attempt %d pending thumbnails = %#v, want only %s", attempt, pending, video.ID)
}
missing, err := cat.CountVideosNeedingThumbnail(ctx, video.DriveID)
if err != nil {
t.Fatalf("count missing thumbnails: %v", err)
}
if missing != 1 {
t.Fatalf("attempt %d missing thumbnails = %d, want 1 before retry limit", attempt, missing)
}
continue
}
failed, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "failed", 0)
if err != nil {
t.Fatalf("list failed thumbnails: %v", err)
}
if len(failed) != 1 || failed[0].ID != video.ID {
t.Fatalf("failed thumbnails = %#v, want only %s", failed, video.ID)
}
missing, err := cat.CountVideosNeedingThumbnail(ctx, video.DriveID)
if err != nil {
t.Fatalf("count missing thumbnails: %v", err)
}
if missing != 0 {
t.Fatalf("missing thumbnails = %d, want 0 after retry limit marks failed", missing)
}
}
if gen.generateCalls != defaultThumbTransientMediaMaxFailures {
t.Fatalf("generate calls = %d, want %d", gen.generateCalls, defaultThumbTransientMediaMaxFailures)
}
if err := cat.UpdateVideoMeta(ctx, video.ID, catalog.VideoMetaPatch{
ThumbnailStatus: "pending",
ResetThumbnailFailures: true,
}); err != nil {
t.Fatalf("reset thumbnail status: %v", err)
}
worker.rateLimit = rateLimitState{}
worker.process(ctx, video)
pending, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "pending", 0)
if err != nil {
t.Fatalf("list pending thumbnails after reset: %v", err)
}
if len(pending) != 1 || pending[0].ID != video.ID {
t.Fatalf("pending thumbnails after reset = %#v, want only %s", pending, video.ID)
}
}
func TestThumbWorkerRequeuesP115TransientErrorBeforeRetryLimit(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-p115-requeue")
gen := &fakeThumbGenerator{
generateErr: errors.New("ffmpeg thumb: partial file Cannot determine format of input 0:0 after EOF"),
}
drv := &previewFakeDrive{kind: "p115"}
worker := NewThumbWorker(gen, cat, drv)
worker.processQueued(ctx, video)
select {
case queued := <-worker.ch:
if queued.ID != video.ID {
t.Fatalf("requeued video id = %q, want %q", queued.ID, video.ID)
}
default:
t.Fatal("expected transient thumbnail failure to requeue the same video")
}
got, err := cat.GetVideo(ctx, video.ID)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.ThumbnailURL != "" {
t.Fatalf("thumbnail = %q, want empty after transient failure", got.ThumbnailURL)
}
pending, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "pending", 0)
if err != nil {
t.Fatalf("list pending thumbnails: %v", err)
}
if len(pending) != 1 || pending[0].ID != video.ID {
t.Fatalf("pending thumbnails = %#v, want only %s", pending, video.ID)
}
}
func TestThumbWorkerPikPakMoovAtomErrorFailsWithoutCooldown(t *testing.T) {
ctx := context.Background()
cat, video := seedPreviewTestVideo(t, "thumb-pikpak-missing-moov")
mediaErr := errors.New("ffprobe: exit status 1, stderr: moov atom not found Invalid data found when processing input")
gen := &fakeThumbGenerator{
probeErr: mediaErr,
generateErr: mediaErr,
}
drv := &previewFakeDrive{kind: "pikpak"}
worker := NewThumbWorker(gen, cat, drv)
worker.process(ctx, video)
failed, err := cat.ListVideosByThumbnailStatus(ctx, video.DriveID, "failed", 0)
if err != nil {
t.Fatalf("list failed thumbnails: %v", err)
}
if len(failed) != 1 || failed[0].ID != video.ID {
t.Fatalf("failed thumbnails = %#v, want only %s", failed, video.ID)
}
if !worker.Status().CooldownUntil.IsZero() {
t.Fatalf("cooldown until = %s, want no cooldown for invalid PikPak MP4", worker.Status().CooldownUntil)
}
if gen.generateCalls != 1 {
t.Fatalf("generate calls = %d, want 1", gen.generateCalls)
}
}
func TestPreviewWorkerP115TransientErrorKeepsVideoPending(t *testing.T) {
@@ -401,6 +645,57 @@ func TestPreviewWorkerP115TransientErrorKeepsVideoPending(t *testing.T) {
}
}
func TestP123TransientErrorsShouldCooldown(t *testing.T) {
drv := &previewFakeDrive{kind: "p123"}
for _, err := range []error{
errors.New("Server returned 403 Forbidden"),
errors.New("请求太频繁"),
errors.New("http 503 service unavailable"),
} {
if !driveErrorShouldCooldown(drv, err) {
t.Fatalf("driveErrorShouldCooldown(%v) = false, want true", err)
}
}
if driveErrorShouldCooldown(drv, errors.New("invalid credential")) {
t.Fatal("invalid credential should not trigger p123 cooldown")
}
}
func TestWopanTransientErrorsShouldCooldown(t *testing.T) {
drv := &previewFakeDrive{kind: "wopan"}
for _, err := range []error{
errors.New("ffmpeg: Server returned 403 Forbidden"),
errors.New("wopan download url: request failed with status: 429 Too Many Requests"),
errors.New("操作频繁,请稍后重试"),
errors.New("http 503 service unavailable"),
} {
if !driveErrorShouldCooldown(drv, err) {
t.Fatalf("driveErrorShouldCooldown(%v) = false, want true", err)
}
}
if driveErrorShouldCooldown(drv, errors.New("invalid access token")) {
t.Fatal("invalid access token should not trigger wopan cooldown")
}
}
func TestGoogleDriveMediaErrorsShouldCooldown(t *testing.T) {
drv := &previewFakeDrive{kind: "googledrive"}
for _, err := range []error{
errors.New("google drive api error: usageLimits userRateLimitExceeded"),
errors.New("ffmpeg: Server returned 403 Forbidden"),
errors.New("downloadQuotaExceeded: The download quota for this file has been exceeded"),
errors.New("sharingRateLimitExceeded"),
errors.New("http 503 service unavailable"),
} {
if !driveErrorShouldCooldown(drv, err) {
t.Fatalf("driveErrorShouldCooldown(%v) = false, want true", err)
}
}
if driveErrorShouldCooldown(drv, errors.New("invalid credentials")) {
t.Fatal("invalid credentials should not trigger googledrive cooldown")
}
}
func assertCooldownAround(t *testing.T, until time.Time, before time.Time, want time.Duration) {
t.Helper()
if until.IsZero() {
@@ -469,15 +764,22 @@ type fakeThumbGenerator struct {
thumbnailDuration float64
thumbnailURL string
probeCalls int
generateCalls int
probeDuration float64
probeErr error
generateErr error
}
func (g *fakeThumbGenerator) Probe(context.Context, *drives.StreamLink) (float64, error) {
g.probeCalls++
return 42, nil
if g.probeErr != nil {
return 0, g.probeErr
}
return g.probeDuration, nil
}
func (g *fakeThumbGenerator) GenerateThumbnail(_ context.Context, link *drives.StreamLink, videoID string, duration float64) (string, error) {
g.generateCalls++
g.thumbnailVideoID = videoID
g.thumbnailDuration = duration
if link != nil {
@@ -568,7 +870,6 @@ func (d *previewFakeDrive) EnsureDir(context.Context, string) (string, error) {
}
func (d *previewFakeDrive) RootID() string { return "root" }
func TestWorkerWaitIdleReturnsImmediatelyWhenQueueEmpty(t *testing.T) {
worker := NewWorker(&fakeTeaserGenerator{}, nil, &previewFakeDrive{})
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+28 -3
@@ -5,6 +5,7 @@ import (
"io"
"net/http"
"net/url"
"path/filepath"
"sync"
"time"
@@ -144,13 +145,19 @@ func (p *Proxy) ServeStream(w http.ResponseWriter, r *http.Request, driveID, fil
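// after the 302 the browser connects directly with its own UA, and the CDN still honors the signature
// - pikpak: as in OpenList, WebContentLink / media link are both self-signed URLs;
// the CDN does not validate request headers, so a direct connection gives the best bandwidth and avoids using backend egress
// - onedrive: the @microsoft.graph.downloadUrl returned by Microsoft Graph is a short-lived,
// pre-authenticated download URL, so the backend does not need to keep relaying video bytes
// - p123: the download page returned by 123网盘 download_info redirects again to the CDN; the driver has already
// resolved the final Location on the backend, so the browser can be 302-redirected straight to that short-lived address
// - wopan: 联通网盘 GetDownloadUrlV2 returns a short-lived direct link, and OpenList likewise hands
// that URL straight to the client; the backend does not need to keep relaying video bytes
//
// Other drives (such as OneDrive / Wopan (沃盘) / Quark (夸克)) still go through the reverse proxy, because their download
// Other drives (such as Quark (夸克)) still go through the reverse proxy, because their download
// links usually require the backend-held Cookie / Authorization, or special Range handling, to accompany each request,
// and the browser has no access to that context.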
func shouldRedirect(d drives.Drive) bool {
switch d.Kind() {
case "p115", "pikpak":
case "p115", "pikpak", "onedrive", "p123", "wopan":
return true
}
return false
@@ -169,6 +176,11 @@ func (p *Proxy) serve(w http.ResponseWriter, r *http.Request, link *drives.Strea
http.Error(w, "bad upstream url", http.StatusBadGateway)
return
}
if localPath, ok := localFilePath(u, link.URL); ok {
w.Header().Set("Cache-Control", "private, max-age=300")
http.ServeFile(w, r, localPath)
return
}
req, err := http.NewRequestWithContext(r.Context(), r.Method, u.String(), nil)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
@@ -206,11 +218,24 @@ func (p *Proxy) serve(w http.ResponseWriter, r *http.Request, link *drives.Strea
_, _ = io.Copy(w, resp.Body)
}
// ServeLocal serves local teaser files
// ServeLocal serves local preview-video files
func (p *Proxy) ServeLocal(w http.ResponseWriter, r *http.Request, path string) {
http.ServeFile(w, r, path)
}
func localFilePath(u *url.URL, raw string) (string, bool) {
if u == nil {
return "", false
}
if u.Scheme == "file" && u.Path != "" {
return u.Path, true
}
if u.Scheme == "" && u.Host == "" && filepath.IsAbs(raw) {
return raw, true
}
return "", false
}
var errDriveNotFound = &httpError{Code: http.StatusNotFound, Msg: "drive not found"}
type httpError struct {
+140
@@ -5,6 +5,8 @@ import (
"io"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"time"
@@ -149,6 +151,111 @@ func TestServeStreamPikPakSetsRedirectHeaders(t *testing.T) {
}
}
func TestServeStreamRedirectsOneDrive(t *testing.T) {
reg := NewRegistry()
drv := &proxyFakeSimpleDrive{
kind: "onedrive",
url: "https://public.onedrive.example/video.mp4",
}
reg.Set("onedrive", drv)
p := New(reg)
req := httptest.NewRequest(http.MethodGet, "/p/stream/onedrive/file-1", nil)
rr := httptest.NewRecorder()
p.ServeStream(rr, req, "onedrive", "file-1")
if rr.Code != http.StatusFound {
t.Fatalf("status = %d, want %d", rr.Code, http.StatusFound)
}
if got := rr.Header().Get("Location"); got != "https://public.onedrive.example/video.mp4" {
t.Fatalf("Location = %q", got)
}
if drv.calls != 1 {
t.Fatalf("link calls = %d, want 1", drv.calls)
}
}
func TestServeStreamRedirectsP123(t *testing.T) {
reg := NewRegistry()
drv := &proxyFakeSimpleDrive{
kind: "p123",
url: "https://cdn.123pan.example/video.mp4",
}
reg.Set("p123", drv)
p := New(reg)
req := httptest.NewRequest(http.MethodGet, "/p/stream/p123/file-1", nil)
rr := httptest.NewRecorder()
p.ServeStream(rr, req, "p123", "file-1")
if rr.Code != http.StatusFound {
t.Fatalf("status = %d, want %d", rr.Code, http.StatusFound)
}
if got := rr.Header().Get("Location"); got != "https://cdn.123pan.example/video.mp4" {
t.Fatalf("Location = %q", got)
}
if drv.calls != 1 {
t.Fatalf("link calls = %d, want 1", drv.calls)
}
}
func TestServeStreamRedirectsWopan(t *testing.T) {
reg := NewRegistry()
drv := &proxyFakeSimpleDrive{
kind: "wopan",
url: "https://du.smartont.net:8445/openapi/download?fid=encoded",
}
reg.Set("wopan", drv)
p := New(reg)
req := httptest.NewRequest(http.MethodGet, "/p/stream/wopan/file-1", nil)
rr := httptest.NewRecorder()
p.ServeStream(rr, req, "wopan", "file-1")
if rr.Code != http.StatusFound {
t.Fatalf("status = %d, want %d", rr.Code, http.StatusFound)
}
if got := rr.Header().Get("Location"); got != "https://du.smartont.net:8445/openapi/download?fid=encoded" {
t.Fatalf("Location = %q", got)
}
if drv.calls != 1 {
t.Fatalf("link calls = %d, want 1", drv.calls)
}
}
func TestServeStreamServesLocalFilePath(t *testing.T) {
path := filepath.Join(t.TempDir(), "video.mp4")
if err := os.WriteFile(path, []byte("0123456789"), 0o644); err != nil {
t.Fatalf("write local file: %v", err)
}
reg := NewRegistry()
drv := &proxyFakeSimpleDrive{
kind: "localstorage",
url: path,
}
reg.Set("local", drv)
p := New(reg)
req := httptest.NewRequest(http.MethodGet, "/p/stream/local/file-1", nil)
req.Header.Set("Range", "bytes=2-5")
rr := httptest.NewRecorder()
p.ServeStream(rr, req, "local", "file-1")
if rr.Code != http.StatusPartialContent {
t.Fatalf("status = %d, want %d", rr.Code, http.StatusPartialContent)
}
if got := rr.Body.String(); got != "2345" {
t.Fatalf("body = %q, want range bytes", got)
}
if drv.calls != 1 {
t.Fatalf("link calls = %d, want 1", drv.calls)
}
}
func requestPikPak(t *testing.T, p *Proxy, driveID, fileID, ua string) {
t.Helper()
req := httptest.NewRequest(http.MethodGet, "/p/stream/"+driveID+"/"+fileID, nil)
@@ -192,3 +299,36 @@ func (d *proxyFakePikPakDrive) EnsureDir(context.Context, string) (string, error
return "", drives.ErrNotSupported
}
func (d *proxyFakePikPakDrive) RootID() string { return "0" }
type proxyFakeSimpleDrive struct {
kind string
url string
calls int
}
func (d *proxyFakeSimpleDrive) Kind() string { return d.kind }
func (d *proxyFakeSimpleDrive) ID() string { return d.kind }
func (d *proxyFakeSimpleDrive) Init(context.Context) error {
return nil
}
func (d *proxyFakeSimpleDrive) List(context.Context, string) ([]drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *proxyFakeSimpleDrive) Stat(context.Context, string) (*drives.Entry, error) {
return nil, drives.ErrNotSupported
}
func (d *proxyFakeSimpleDrive) StreamURL(context.Context, string) (*drives.StreamLink, error) {
d.calls++
return &drives.StreamLink{
URL: d.url,
Headers: http.Header{},
Expires: time.Now().Add(10 * time.Minute),
}, nil
}
func (d *proxyFakeSimpleDrive) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *proxyFakeSimpleDrive) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
}
func (d *proxyFakeSimpleDrive) RootID() string { return "0" }
+2 -2
@@ -16,11 +16,11 @@ type ParsedName struct {
}
var (
reTags = regexp.MustCompile(`^\s*\[([^\]]+)\]\s*`) // [tag1,tag2]
reTags = regexp.MustCompile(`^\s*\[([^\]]+)\]\s*`) // [prefix]
reAuthor = regexp.MustCompile(`\s*-\s*([^-]+?)\s*$`) // - author
)
// Parse parses names by convention: [tag1,tag2] title - author.ext
// Parse parses names by convention: [prefix] title - author.ext
// Any missing field degrades gracefully.
func Parse(filename string) ParsedName {
name := strings.TrimSuffix(filename, path.Ext(filename))
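The hunk above shows only the two regular expressions and the first line of `Parse`. As a rough sketch of how the `[prefix] title - author.ext` convention could be applied with those same patterns, assuming a hypothetical `parsedName` struct and `parseSketch` function (the real `ParsedName` fields and the rest of `Parse` are not visible in this diff):

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

var (
	reTags   = regexp.MustCompile(`^\s*\[([^\]]+)\]\s*`)   // [prefix]
	reAuthor = regexp.MustCompile(`\s*-\s*([^-]+?)\s*$`) // - author
)

// parsedName is a hypothetical result type for this sketch only.
type parsedName struct {
	Prefix string
	Title  string
	Author string
}

// parseSketch strips the extension, then peels off an optional leading
// [prefix] and an optional trailing "- author". Whatever remains is the
// title, so a name with neither part degrades to title-only.
func parseSketch(filename string) parsedName {
	name := strings.TrimSuffix(filename, path.Ext(filename))
	var out parsedName
	if m := reTags.FindStringSubmatch(name); m != nil {
		out.Prefix = strings.TrimSpace(m[1])
		name = name[len(m[0]):]
	}
	if loc := reAuthor.FindStringSubmatchIndex(name); loc != nil {
		out.Author = strings.TrimSpace(name[loc[2]:loc[3]])
		name = name[:loc[0]]
	}
	out.Title = strings.TrimSpace(name)
	return out
}

func main() {
	fmt.Printf("%+v\n", parseSketch("[HD] My Clip - Alice.mp4"))
	fmt.Printf("%+v\n", parseSketch("plain.mp4"))
}
```

Because `reAuthor` excludes hyphens from the captured group, only the text after the last hyphen is treated as the author.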
+64 -18
@@ -2,6 +2,7 @@ package scanner
import (
"context"
"encoding/base64"
"fmt"
"log"
"path"
@@ -23,8 +24,10 @@ type Scanner struct {
//
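// nil / empty set → behaves as if no directory is skipped.
SkipDirIDs map[string]struct{}
// Callback: fires after a new video is added and triggers teaser generation
// Callback: fires after a new video is added and triggers preview-video generation
OnNewVideo func(v *catalog.Video)
// OnProgress fires whenever scan progress changes. The callback should only read the counters in Stats and must not modify its map fields.
OnProgress func(stats Stats)
// ProgressInterval controls the minimum output interval of the scanner's internal heartbeat.
// 0 → default 30s; < 0 → disables the heartbeat (only the outer start / done lines remain).
// heartbeat single-line format: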
@@ -91,6 +94,9 @@ func (s *Scanner) Run(ctx context.Context, startDirID string) (Stats, error) {
driveID = s.Drive.ID()
}
progress := func(currentDir string) {
if s.OnProgress != nil {
s.OnProgress(stats)
}
if interval < 0 {
return
}
@@ -127,8 +133,11 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
}
for _, e := range entries {
if err := ctx.Err(); err != nil {
return err
}
if e.IsDir {
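// Skip the previews directory so the scanner does not pick up its own generated teasers
// Skip the previews directory so the scanner does not pick up its own generated preview videos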
if strings.EqualFold(e.Name, "previews") {
continue
}
@@ -137,13 +146,15 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
continue
}
if err := s.walk(ctx, e.ID, e.Name, stats, progress); err != nil {
if ctxErr := ctx.Err(); ctxErr != nil {
return ctxErr
}
stats.Errors++
log.Printf("[scanner] walk %s error: %v", e.Name, err)
}
continue
}
stats.Scanned++
ext := strings.ToLower(path.Ext(e.Name))
if !s.Exts[ext] {
continue
@@ -151,9 +162,22 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
if e.Size <= 0 {
continue
}
stats.Scanned++
progress(dirName)
stats.SeenFileIDs[e.ID] = struct{}{}
id := s.Drive.Kind() + "-" + s.Drive.ID() + "-" + e.ID
id := s.Drive.Kind() + "-" + s.Drive.ID() + "-" + videoIDFilePart(e.ID)
if deleted, err := s.Catalog.IsDeletedVideoCandidate(ctx, id, s.Drive.ID(), e.ID, e.Hash, e.Name, e.Size); err != nil {
if ctxErr := ctx.Err(); ctxErr != nil {
return ctxErr
}
stats.Errors++
log.Printf("[scanner] check deleted video %s error: %v", id, err)
continue
} else if deleted {
continue
}
parsed := Parse(e.Name)
if parsed.Title == "" {
parsed.Title = strings.TrimSuffix(e.Name, ext)
@@ -162,11 +186,20 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
if matched, err := s.Catalog.MatchTags(ctx, e.Name+" "+dirName+" "+parsed.Author); err == nil {
tags = mergeTags(tags, matched)
}
if err := ctx.Err(); err != nil {
return err
}
if label, ok, err := s.Catalog.EnsureCollectionTag(ctx, dirName); err == nil && ok {
tags = mergeTags(tags, []string{label})
}
if err := ctx.Err(); err != nil {
return err
}
existing, _ := s.Catalog.GetVideo(ctx, id)
if err := ctx.Err(); err != nil {
return err
}
if existing != nil {
patch := catalog.VideoMetaPatch{}
if e.Hash != "" && existing.ContentHash == "" {
@@ -181,26 +214,33 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
if existing.Category == "" && dirName != "" {
patch.Category = dirName
}
if existing.ThumbnailURL == "" && e.ThumbnailURL != "" {
patch.ThumbnailURL = e.ThumbnailURL
}
if patch.Category != "" || patch.ThumbnailURL != "" || patch.ContentHash != "" || patch.FileName != "" {
if patch.Category != "" || patch.ContentHash != "" || patch.FileName != "" {
_ = s.Catalog.UpdateVideoMeta(ctx, id, patch)
if err := ctx.Err(); err != nil {
return err
}
}
if dup := s.findDuplicate(ctx, e.Hash, e.Name, e.Size, id); dup != nil {
s.backfillDuplicateThumbnail(ctx, dup, e.ThumbnailURL)
continue
}
if err := ctx.Err(); err != nil {
return err
}
if !sameTags(existing.Tags, tags) {
_ = s.Catalog.SetAutoVideoTags(ctx, id, tags)
if err := ctx.Err(); err != nil {
return err
}
}
continue
}
if dup := s.findDuplicate(ctx, e.Hash, e.Name, e.Size, id); dup != nil {
s.backfillDuplicateThumbnail(ctx, dup, e.ThumbnailURL)
continue
}
if err := ctx.Err(); err != nil {
return err
}
now := time.Now()
v := &catalog.Video{
@@ -216,7 +256,6 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
Ext: strings.TrimPrefix(ext, "."),
Quality: "HD",
Size: e.Size,
ThumbnailURL: e.ThumbnailURL,
PreviewStatus: "pending",
Category: dirName,
PublishedAt: now,
@@ -224,10 +263,17 @@ func (s *Scanner) walk(ctx context.Context, dirID, dirName string, stats *Stats,
UpdatedAt: now,
}
if err := s.Catalog.UpsertVideo(ctx, v); err != nil {
if ctxErr := ctx.Err(); ctxErr != nil {
return ctxErr
}
log.Printf("[scanner] upsert %s error: %v", v.Title, err)
continue
}
if err := ctx.Err(); err != nil {
return err
}
stats.Added++
progress(dirName)
if s.OnNewVideo != nil {
s.OnNewVideo(v)
}
@@ -268,13 +314,6 @@ func (s *Scanner) findDuplicateByFileSignature(ctx context.Context, fileName str
return dup
}
func (s *Scanner) backfillDuplicateThumbnail(ctx context.Context, canonical *catalog.Video, thumbnailURL string) {
if canonical.ThumbnailURL != "" || thumbnailURL == "" {
return
}
_ = s.Catalog.UpdateVideoMeta(ctx, canonical.ID, catalog.VideoMetaPatch{ThumbnailURL: thumbnailURL})
}
func sameTags(a, b []string) bool {
if len(a) != len(b) {
return false
@@ -301,3 +340,10 @@ func mergeTags(lists ...[]string) []string {
}
return out
}
func videoIDFilePart(fileID string) string {
if !strings.ContainsAny(fileID, `/\`+"\x00") {
return fileID
}
return "b64_" + base64.RawURLEncoding.EncodeToString([]byte(fileID))
}
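`videoIDFilePart` leaves ordinary file IDs untouched and base64url-encodes only those containing `/`, `\`, or a NUL byte, so the resulting video ID stays safe to embed in a URL path. The standalone sketch below copies the function from the hunk above and runs it on the two file IDs used by the scanner tests; the `main` wrapper is added for illustration only.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// videoIDFilePart is copied from the diff: IDs without path-unsafe
// characters pass through, everything else gets a "b64_" prefix plus
// unpadded URL-safe base64.
func videoIDFilePart(fileID string) string {
	if !strings.ContainsAny(fileID, `/\`+"\x00") {
		return fileID
	}
	return "b64_" + base64.RawURLEncoding.EncodeToString([]byte(fileID))
}

func main() {
	for _, id := range []string{"file-1", "fid/with space"} {
		fmt.Printf("%q -> %q\n", id, videoIDFilePart(id))
	}
	// Output:
	// "file-1" -> "file-1"
	// "fid/with space" -> "b64_ZmlkL3dpdGggc3BhY2U"
}
```

Note that the space alone would not trigger encoding; it is the slash in `fid/with space` that does. The original file ID is still what the scanner records in `SeenFileIDs` and passes to the catalog as `FileID`.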
+271 -6
@@ -3,6 +3,7 @@ package scanner
import (
"context"
"database/sql"
"errors"
"fmt"
"io"
"log"
@@ -14,7 +15,7 @@ import (
"github.com/video-site/backend/internal/drives"
)
func TestRunPersistsRemoteThumbnailFromDriveEntry(t *testing.T) {
func TestRunIgnoresRemoteThumbnailFromDriveEntry(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
@@ -50,8 +51,8 @@ func TestRunPersistsRemoteThumbnailFromDriveEntry(t *testing.T) {
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.ThumbnailURL != "https://thumbnail.example/clip.jpg" {
t.Fatalf("thumbnail = %q, want remote thumbnail", got.ThumbnailURL)
if got.ThumbnailURL != "" {
t.Fatalf("thumbnail = %q, want empty so local thumbnail worker regenerates it", got.ThumbnailURL)
}
}
@@ -90,7 +91,184 @@ func TestRunIgnoresZeroSizeVideoFiles(t *testing.T) {
}
}
func TestRunBackfillsRemoteThumbnailForExistingVideo(t *testing.T) {
func TestRunScannedCountsOnlyVideoCandidates(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := &scannerFakeDrive{
entries: []drives.Entry{
{ID: "file-1", Name: "clip.mp4", Size: 123},
{ID: "file-2", Name: "notes.txt", Size: 123},
{ID: "file-3", Name: "empty.mp4", Size: 0},
},
}
sc := New(cat, drv, []string{".mp4"}, nil, nil)
stats, err := sc.Run(ctx, "")
if err != nil {
t.Fatalf("scan: %v", err)
}
if stats.Scanned != 1 {
t.Fatalf("scanned = %d, want one non-empty video candidate", stats.Scanned)
}
if stats.Added != 1 {
t.Fatalf("added = %d, want one added video", stats.Added)
}
}
func TestRunUsesPathSafeVideoIDForUnsafeFileID(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := &scannerFakeDrive{
entries: []drives.Entry{{
ID: "fid/with space",
Name: "clip.mp4",
Size: 123,
}},
}
sc := New(cat, drv, []string{".mp4"}, nil, nil)
stats, err := sc.Run(ctx, "")
if err != nil {
t.Fatalf("scan: %v", err)
}
if stats.Added != 1 {
t.Fatalf("added = %d, want 1", stats.Added)
}
if _, ok := stats.SeenFileIDs["fid/with space"]; !ok {
t.Fatalf("seen file ids = %#v, want original file id", stats.SeenFileIDs)
}
wantID := "fake-drive-b64_ZmlkL3dpdGggc3BhY2U"
got, err := cat.GetVideo(ctx, wantID)
if err != nil {
t.Fatalf("get video %s: %v", wantID, err)
}
if strings.Contains(got.ID, "/") {
t.Fatalf("video id = %q, must not contain slash", got.ID)
}
if got.FileID != "fid/with space" {
t.Fatalf("file id = %q, want original", got.FileID)
}
}
func TestRunStopsWhenContextCanceledDuringFileLoop(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
drv := &scannerFakeDrive{
entries: []drives.Entry{
{ID: "file-1", Name: "one.mp4", Size: 123},
{ID: "file-2", Name: "two.mp4", Size: 123},
{ID: "file-3", Name: "three.mp4", Size: 123},
},
}
callbacks := 0
sc := New(cat, drv, []string{".mp4"}, nil, func(*catalog.Video) {
callbacks++
cancel()
})
stats, err := sc.Run(ctx, "")
if !errors.Is(err, context.Canceled) {
t.Fatalf("scan error = %v, want context.Canceled", err)
}
if stats.Added != 1 || callbacks != 1 {
t.Fatalf("added=%d callbacks=%d, want exactly one video before cancellation", stats.Added, callbacks)
}
if _, err := cat.GetVideo(context.Background(), "fake-drive-file-1"); err != nil {
t.Fatalf("first video should be persisted before cancellation: %v", err)
}
if _, err := cat.GetVideo(context.Background(), "fake-drive-file-2"); err != sql.ErrNoRows {
t.Fatalf("second video lookup error = %v, want sql.ErrNoRows", err)
}
if _, err := cat.GetVideo(context.Background(), "fake-drive-file-3"); err != sql.ErrNoRows {
t.Fatalf("third video lookup error = %v, want sql.ErrNoRows", err)
}
}
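The cancellation test above relies on the scanner re-checking `ctx.Err()` between files, so a cancel issued from the first `OnNewVideo` callback stops the loop before the second file is touched. A reduced sketch of that pattern follows; `processUntilCanceled` and `demoCancel` are hypothetical names and stand in for the much larger `walk` loop.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// processUntilCanceled handles items one at a time and checks the context
// before each item, returning how many items completed and the context
// error if the run was interrupted.
func processUntilCanceled(ctx context.Context, items []string, handle func(string)) (int, error) {
	done := 0
	for _, item := range items {
		if err := ctx.Err(); err != nil {
			return done, err
		}
		handle(item)
		done++
	}
	return done, ctx.Err()
}

// demoCancel cancels from inside the first callback, mirroring the
// OnNewVideo hook used by the test.
func demoCancel() (int, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	n, err := processUntilCanceled(ctx, []string{"one", "two", "three"}, func(string) {
		cancel()
	})
	return n, errors.Is(err, context.Canceled)
}

func main() {
	n, canceled := demoCancel()
	fmt.Println(n, canceled)
}
```

The item being processed when the cancel arrives still completes, which matches the test's expectation that exactly one video is persisted.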
func TestRunSkipsAdminDeletedVideo(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: "fake-drive-file-1",
DriveID: "drive",
FileID: "file-1",
FileName: "clip.mp4",
ContentHash: "HASH-1",
Title: "Deleted Clip",
Size: 123,
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed video: %v", err)
}
if err := cat.DeleteVideoWithTombstone(ctx, "fake-drive-file-1"); err != nil {
t.Fatalf("delete with tombstone: %v", err)
}
drv := &scannerFakeDrive{
entries: []drives.Entry{{
ID: "file-1",
Name: "clip.mp4",
Size: 123,
Hash: "hash-1",
MimeType: "video/mp4",
ModTime: now,
}},
}
sc := New(cat, drv, []string{".mp4"}, nil, nil)
stats, err := sc.Run(ctx, "")
if err != nil {
t.Fatalf("scan: %v", err)
}
if stats.Added != 0 {
t.Fatalf("added = %d, want 0", stats.Added)
}
if _, err := cat.GetVideo(ctx, "fake-drive-file-1"); err != sql.ErrNoRows {
t.Fatalf("deleted video was recreated, get error = %v", err)
}
}
func TestRunDoesNotBackfillRemoteThumbnailForExistingVideo(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
@@ -140,8 +318,8 @@ func TestRunBackfillsRemoteThumbnailForExistingVideo(t *testing.T) {
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.ThumbnailURL != "https://thumbnail.example/backfilled.jpg" {
t.Fatalf("thumbnail = %q, want backfilled remote thumbnail", got.ThumbnailURL)
if got.ThumbnailURL != "" {
t.Fatalf("thumbnail = %q, want empty so local thumbnail worker regenerates it", got.ThumbnailURL)
}
}
@@ -254,6 +432,93 @@ func TestRunAddsShortCollectionDirectoryAsTag(t *testing.T) {
}
}
func TestRunDoesNotRecreateDeletedCollectionDirectoryTag(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
if err != nil {
t.Fatalf("open catalog: %v", err)
}
t.Cleanup(func() {
if err := cat.Close(); err != nil {
t.Fatalf("close catalog: %v", err)
}
})
now := time.Now()
for _, id := range []string{"existing-1", "existing-2"} {
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: "drive",
FileID: id,
Title: "Existing",
Category: "sunny",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("seed existing sunny video: %v", err)
}
}
if label, ok, err := cat.EnsureCollectionTag(ctx, "sunny"); err != nil || !ok || label != "sunny" {
t.Fatalf("ensure collection = %q, %v, %v; want sunny true nil", label, ok, err)
}
tags, err := cat.ListTags(ctx)
if err != nil {
t.Fatalf("list tags: %v", err)
}
var tagID int64
for _, tag := range tags {
if tag.Label == "sunny" {
tagID = tag.ID
break
}
}
if tagID == 0 {
t.Fatal("sunny tag not found before delete")
}
if _, err := cat.DeleteTag(ctx, tagID); err != nil {
t.Fatalf("delete tag: %v", err)
}
drv := &scannerTreeFakeDrive{
entries: map[string][]drives.Entry{
"root": {{
ID: "dir-1",
Name: "sunny",
IsDir: true,
}},
"dir-1": {{
ID: "file-1",
ParentID: "dir-1",
Name: "clip.mp4",
Size: 123,
ModTime: now,
}},
},
}
sc := New(cat, drv, []string{".mp4"}, nil, nil)
if _, err := sc.Run(ctx, ""); err != nil {
t.Fatalf("scan: %v", err)
}
got, err := cat.GetVideo(ctx, "fake-drive-file-1")
if err != nil {
t.Fatalf("get video: %v", err)
}
if len(got.Tags) != 0 {
t.Fatalf("tags = %#v, want none", got.Tags)
}
tags, err = cat.ListTags(ctx)
if err != nil {
t.Fatalf("list tags after scan: %v", err)
}
for _, tag := range tags {
if tag.Label == "sunny" {
t.Fatal("deleted collection tag was recreated during scan")
}
}
}
func TestRunMapsAVCodeDirectoryToAVTag(t *testing.T) {
ctx := context.Background()
cat, err := catalog.Open(t.TempDir() + "/catalog.db")
+625 -81
@@ -1,10 +1,11 @@
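// Package spider91migrate periodically uploads the videos that a spider91 drive has downloaded locally
// to a designated directory on a target drive (PikPak or 115). After a successful upload it:
// to a designated directory on a target drive (PikPak, 115, 123, OneDrive, Google Drive, or Unicom cloud drive). After a successful upload it:
//
// - rewrites the catalog row: drive_id / file_id / content_hash switch to the target drive's values;
// the video's own id is unchanged (still spider91-<driveID>-<viewkey>), so video_tags,
// favorites, likes, views, and other associated data are all preserved
// - deletes the local mp4 (spider91/<id>/videos/<viewkey>.<ext>) and thumb (spider91/<id>/thumbs/<viewkey>.jpg)
// - deletes the local mp4 (spider91/<id>/videos/<viewkey>.<ext>) and thumb
// (spider91/<id>/thumbs/<viewkey>.jpg); the public /p/thumb/<videoID> copy is kept
//
// On later playback, videoSource() automatically resolves to /p/stream/<target>/<file_id>,
// and the proxy layer uses that drive's direct link / 302 redirect.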
@@ -15,6 +16,7 @@ package spider91migrate
import (
"context"
"database/sql"
"errors"
"fmt"
"io"
@@ -28,31 +30,53 @@ import (
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
"github.com/video-site/backend/internal/drives/googledrive"
"github.com/video-site/backend/internal/drives/onedrive"
"github.com/video-site/backend/internal/drives/p115"
"github.com/video-site/backend/internal/drives/p123"
"github.com/video-site/backend/internal/drives/pikpak"
"github.com/video-site/backend/internal/drives/scriptcrawler"
"github.com/video-site/backend/internal/drives/spider91"
"github.com/video-site/backend/internal/drives/wopan"
"github.com/video-site/backend/internal/mediaasset"
)
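// uploadTarget is the minimal interface the migrator uses to call a target drive. Every cloud drive that accepts spider91 uploads
// must implement it; currently PikPak and 115 each satisfy it through an adapter.
// must implement it; currently PikPak, 115, 123, OneDrive, Google Drive, and Unicom cloud drive each satisfy it through an adapter.
//
// This abstraction decouples the migration caller from each drive's SDK protocol:
// - PikPak uses GCID + OSS PutObject (pikpak.UploadResult)
// - 115 uses SHA1 + instant upload / OSS / multipart (p115.UploadResult)
// - 123 uses MD5 + instant upload / S3 presigned multipart (p123.UploadResult)
// - OneDrive uses SHA1 + simple PUT for small files / upload session for large files
// - Google Drive uses MD5 + resumable upload session
// - Unicom cloud drive uses the SDK's Upload2C; the upstream currently returns no content hash
//
// Both return values are normalized into the local UploadResult and handled uniformly in the catalog rewrite stage.
// Each drive's return value is normalized into the local UploadResult and handled uniformly in the catalog rewrite stage.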
type uploadTarget interface {
ID() string
Kind() string
RootID() string
EnsureDir(ctx context.Context, pathFromRoot string) (string, error)
UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error)
Rename(ctx context.Context, fileID, newName string) error
}
// Spider91LocalSource is the local source interface used by the migration
// worker. Legacy spider91.Driver and the new scriptcrawler.Driver both satisfy
// it when they are mounted for the Spider91 built-in crawler.
type Spider91LocalSource interface {
drives.Drive
VideosDir() string
ThumbsDir() string
VideoPath(fileID string) (string, error)
ThumbPath(fileID string) (string, error)
}
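// UploadResult is the normalized return value of uploadTarget.UploadAndReportHash.
//
// FileID: the new file ID on the target drive;
// Hash: GCID (PikPak) or uppercase SHA1 hex (115), written to catalog.content_hash for cross-drive deduplication;
// Hash: GCID (PikPak), MD5 hex (123 / Google Drive), or SHA1 hex (115 / OneDrive), written to catalog.content_hash for cross-drive deduplication; currently empty for Unicom cloud drive;
// Size: the number of bytes actually uploaded.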
type UploadResult struct {
FileID string
@@ -60,7 +84,33 @@ type UploadResult struct {
Size int64
}
// pikpakAdapter / p115Adapter wrap the concrete driver as an uploadTarget.
type UploadProgress struct {
DriveID string
State string
CurrentTitle string
QueueLength int
DoneCount int
TotalCount int
}
const (
spider91UploadDirName = "91 Spider"
scriptCrawlerUploadRootDirName = "Script Crawlers"
)
type migrationPlan struct {
source Spider91LocalSource
row *catalog.Drive
sourceKinds []string
targetDriveID string
target uploadTarget
uploadDir string
keepLatestN int
requireAssetsReady bool
legacyBackfill bool
}
// pikpakAdapter / p115Adapter / p123Adapter / onedriveAdapter / googledriveAdapter / wopanAdapter wrap the concrete driver as an uploadTarget.
//
// Why the driver does not implement uploadTarget directly:
//
@@ -74,6 +124,9 @@ type pikpakAdapter struct {
func (a *pikpakAdapter) ID() string { return a.d.ID() }
func (a *pikpakAdapter) Kind() string { return a.d.Kind() }
func (a *pikpakAdapter) RootID() string { return a.d.RootID() }
func (a *pikpakAdapter) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return a.d.EnsureDir(ctx, pathFromRoot)
}
func (a *pikpakAdapter) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
res, err := a.d.UploadAndReportHash(ctx, parentID, name, r, size)
if err != nil {
@@ -92,6 +145,9 @@ type p115Adapter struct {
func (a *p115Adapter) ID() string { return a.d.ID() }
func (a *p115Adapter) Kind() string { return a.d.Kind() }
func (a *p115Adapter) RootID() string { return a.d.RootID() }
func (a *p115Adapter) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return a.d.EnsureDir(ctx, pathFromRoot)
}
func (a *p115Adapter) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
res, err := a.d.UploadAndReportSha1(ctx, parentID, name, r, size)
if err != nil {
@@ -103,6 +159,90 @@ func (a *p115Adapter) Rename(ctx context.Context, fileID, newName string) error
return a.d.Rename(ctx, fileID, newName)
}
type p123Adapter struct {
d *p123.Driver
}
func (a *p123Adapter) ID() string { return a.d.ID() }
func (a *p123Adapter) Kind() string { return a.d.Kind() }
func (a *p123Adapter) RootID() string { return a.d.RootID() }
func (a *p123Adapter) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return a.d.EnsureDir(ctx, pathFromRoot)
}
func (a *p123Adapter) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
res, err := a.d.UploadAndReportHash(ctx, parentID, name, r, size)
if err != nil {
return UploadResult{}, err
}
return UploadResult{FileID: res.FileID, Hash: res.Hash, Size: res.Size}, nil
}
func (a *p123Adapter) Rename(ctx context.Context, fileID, newName string) error {
return a.d.Rename(ctx, fileID, newName)
}
type onedriveAdapter struct {
d *onedrive.Driver
}
func (a *onedriveAdapter) ID() string { return a.d.ID() }
func (a *onedriveAdapter) Kind() string { return a.d.Kind() }
func (a *onedriveAdapter) RootID() string { return a.d.RootID() }
func (a *onedriveAdapter) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return a.d.EnsureDir(ctx, pathFromRoot)
}
func (a *onedriveAdapter) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
res, err := a.d.UploadAndReportHash(ctx, parentID, name, r, size)
if err != nil {
return UploadResult{}, err
}
return UploadResult{FileID: res.FileID, Hash: res.Hash, Size: res.Size}, nil
}
func (a *onedriveAdapter) Rename(ctx context.Context, fileID, newName string) error {
return a.d.Rename(ctx, fileID, newName)
}
type googledriveAdapter struct {
d *googledrive.Driver
}
func (a *googledriveAdapter) ID() string { return a.d.ID() }
func (a *googledriveAdapter) Kind() string { return a.d.Kind() }
func (a *googledriveAdapter) RootID() string { return a.d.RootID() }
func (a *googledriveAdapter) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return a.d.EnsureDir(ctx, pathFromRoot)
}
func (a *googledriveAdapter) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
res, err := a.d.UploadAndReportHash(ctx, parentID, name, r, size)
if err != nil {
return UploadResult{}, err
}
return UploadResult{FileID: res.FileID, Hash: res.Hash, Size: res.Size}, nil
}
func (a *googledriveAdapter) Rename(ctx context.Context, fileID, newName string) error {
return a.d.Rename(ctx, fileID, newName)
}
type wopanAdapter struct {
d *wopan.Driver
}
func (a *wopanAdapter) ID() string { return a.d.ID() }
func (a *wopanAdapter) Kind() string { return a.d.Kind() }
func (a *wopanAdapter) RootID() string { return a.d.RootID() }
func (a *wopanAdapter) EnsureDir(ctx context.Context, pathFromRoot string) (string, error) {
return a.d.EnsureDir(ctx, pathFromRoot)
}
func (a *wopanAdapter) UploadAndReportHash(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error) {
fileID, err := a.d.Upload(ctx, parentID, name, r, size)
if err != nil {
return UploadResult{}, err
}
return UploadResult{FileID: fileID, Size: size}, nil
}
func (a *wopanAdapter) Rename(ctx context.Context, fileID, newName string) error {
return a.d.Rename(ctx, fileID, newName)
}
// adaptUploadTarget wraps a generic drive as an uploadTarget.
// Unsupported drive kinds return an error; callers skip them silently.
func adaptUploadTarget(d drives.Drive) (uploadTarget, error) {
@@ -111,6 +251,14 @@ func adaptUploadTarget(d drives.Drive) (uploadTarget, error) {
return &pikpakAdapter{d: v}, nil
case *p115.Driver:
return &p115Adapter{d: v}, nil
case *p123.Driver:
return &p123Adapter{d: v}, nil
case *onedrive.Driver:
return &onedriveAdapter{d: v}, nil
case *googledrive.Driver:
return &googledriveAdapter{d: v}, nil
case *wopan.Driver:
return &wopanAdapter{d: v}, nil
case uploadTarget:
// Tests or custom implementations can be passed in directly; the concrete-type branches come first so those drivers get an adapter.
return v, nil
@@ -140,8 +288,10 @@ type Config struct {
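// CaptchaCooldown is how long the whole migration worker cools down after hitting a PikPak captcha error (error_code
// 4002 / 9). During the cooldown runOnce returns immediately and issues no
// PikPak API requests, to avoid triggering further risk control. 0 means the default of 5 minutes; < 0 disables the cooldown (tests only).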
CaptchaCooldown time.Duration
OnMigrated func(videoID string)
CaptchaCooldown time.Duration
CommonThumbDir string
OnMigrated func(videoID string)
OnUploadProgress func(UploadProgress)
}
type Migrator struct {
@@ -270,59 +420,79 @@ func (m *Migrator) runOnce(ctx context.Context) {
log.Printf("[spider91migrate] captcha cooldown ended at %s, resuming migration", until.Format(time.RFC3339))
}
target, pp, err := m.resolveTarget()
if err != nil {
// No target, so stay silent: the user may not have configured a PikPak drive yet
plans := m.migrationPlans(ctx)
if len(plans) == 0 {
// No target, so stay silent: the user chose local storage, or the target drive is not mounted yet.
return
}
migrated := 0
for _, src := range m.spider91Drives() {
backfillTargets := map[string]uploadTarget{}
for _, plan := range plans {
if err := ctx.Err(); err != nil {
return
}
n, err := m.migrateDrive(ctx, src, target, pp)
n, err := m.migrateDrive(ctx, plan)
if err != nil {
log.Printf("[spider91migrate] drive=%s migrate batch error: %v", src.ID(), err)
log.Printf("[spider91migrate] drive=%s migrate batch error: %v", plan.source.ID(), err)
}
migrated += n
if active, _ := m.inCooldown(); active {
if migrated > 0 {
log.Printf("[spider91migrate] migrated %d video(s) to drive=%s", migrated, target)
log.Printf("[spider91migrate] migrated %d video(s)", migrated)
}
return
}
if plan.legacyBackfill {
backfillTargets[plan.targetDriveID] = plan.target
}
}
if migrated > 0 {
log.Printf("[spider91migrate] migrated %d video(s) to drive=%s", migrated, target)
log.Printf("[spider91migrate] migrated %d video(s)", migrated)
}
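// Final pass: scan each spider91 drive's local directory and remove orphan files that the catalog has already migrated elsewhere but that
// Final pass: scan each local crawler drive's videos directory and remove orphan files that the catalog has already migrated elsewhere but that
// still linger locally. This is a purely defensive fallback: on the normal path migrateDrive
// already calls CleanupSpider91Local right after a successful migration and leaves no orphans.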
for _, src := range m.spider91Drives() {
for _, plan := range plans {
if err := ctx.Err(); err != nil {
return
}
deleted, err := m.cleanupOldLocalVideos(ctx, src)
deleted, err := m.cleanupOldLocalVideos(ctx, plan)
if err != nil {
log.Printf("[spider91migrate] cleanup drive=%s: %v", src.ID(), err)
log.Printf("[spider91migrate] cleanup drive=%s: %v", plan.source.ID(), err)
}
if deleted > 0 {
log.Printf("[spider91migrate] cleanup drive=%s deleted %d orphan local file(s)", src.ID(), deleted)
log.Printf("[spider91migrate] cleanup drive=%s deleted %d orphan local file(s)", plan.source.ID(), deleted)
}
}
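// Backfill: for spider91-* videos already migrated to PikPak whose file names are still in the old format
// (e.g. just migrated without renaming, or imported manually), rename them to the format expected by scheme B.
// This step is idempotent: names already in the expected format do not trigger another Rename.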
if renamed, err := m.backfillFileNames(ctx, target, pp); err != nil {
log.Printf("[spider91migrate] backfill names: %v", err)
} else if renamed > 0 {
log.Printf("[spider91migrate] backfilled %d %s file name(s) to desired format", renamed, m.targetKindForLog())
for targetDriveID, pp := range backfillTargets {
if renamed, err := m.backfillFileNames(ctx, targetDriveID, pp); err != nil {
log.Printf("[spider91migrate] backfill names: %v", err)
} else if renamed > 0 {
log.Printf("[spider91migrate] backfilled %d %s file name(s) to desired format", renamed, pp.Kind())
}
}
}
func (m *Migrator) reportUploadProgress(progress UploadProgress) {
if m == nil || m.cfg.OnUploadProgress == nil {
return
}
progress.DriveID = strings.TrimSpace(progress.DriveID)
if progress.DriveID == "" {
return
}
if progress.State == "" {
progress.State = "idle"
}
m.cfg.OnUploadProgress(progress)
}
// targetKindForLog converts the current target drive's kind into a human-friendly short name for logging.
// Falls back to "target" when resolution fails.
func (m *Migrator) targetKindForLog() string {
@@ -347,9 +517,17 @@ func (m *Migrator) resolveTarget() (string, uploadTarget, error) {
return "", nil, errors.New("no target getter")
}
id := m.cfg.GetTargetDriveID()
return m.resolveTargetID(id)
}
func (m *Migrator) resolveTargetID(id string) (string, uploadTarget, error) {
id = strings.TrimSpace(id)
if id == "" {
return "", nil, errors.New("target drive not configured")
}
if m.cfg.Registry == nil {
return "", nil, errors.New("registry not configured")
}
d, ok := m.cfg.Registry.Get(id)
if !ok {
return "", nil, fmt.Errorf("target drive %q not in registry", id)
@@ -361,33 +539,142 @@ func (m *Migrator) resolveTarget() (string, uploadTarget, error) {
return id, t, nil
}
// spider91Drives returns all currently registered spider91 drivers.
func (m *Migrator) spider91Drives() []*spider91.Driver {
func (m *Migrator) migrationPlans(ctx context.Context) []migrationPlan {
if m == nil || m.cfg.Catalog == nil || m.cfg.Registry == nil {
return nil
}
all := m.cfg.Registry.All()
out := make([]*spider91.Driver, 0, len(all))
out := make([]migrationPlan, 0, len(all))
for _, d := range all {
if d.Kind() != spider91.Kind {
if d == nil {
continue
}
if sd, ok := d.(*spider91.Driver); ok {
src, ok := d.(Spider91LocalSource)
if !ok {
continue
}
row, err := m.cfg.Catalog.GetDrive(ctx, d.ID())
if (err != nil || row == nil) && d.Kind() == spider91.Kind {
row = &catalog.Drive{ID: d.ID(), Kind: spider91.Kind, RootID: "/"}
}
if row == nil {
continue
}
switch row.Kind {
case scriptcrawler.Kind:
targetID := strings.TrimSpace(row.Credentials["upload_drive_id"])
if targetID == "" {
continue
}
resolvedID, target, err := m.resolveTargetID(targetID)
if err != nil {
log.Printf("[spider91migrate] crawler=%s upload target=%q unavailable: %v", row.ID, targetID, err)
continue
}
out = append(out, migrationPlan{
source: src,
row: row,
sourceKinds: crawlerSourceKindsForRow(row),
targetDriveID: resolvedID,
target: target,
uploadDir: scriptCrawlerUploadDir(row.ID),
keepLatestN: 0,
requireAssetsReady: true,
})
case spider91.Kind:
if m.cfg.GetTargetDriveID == nil {
continue
}
targetID := strings.TrimSpace(m.cfg.GetTargetDriveID())
if targetID == "" {
continue
}
resolvedID, target, err := m.resolveTargetID(targetID)
if err != nil {
continue
}
out = append(out, migrationPlan{
source: src,
row: row,
sourceKinds: []string{spider91.Kind},
targetDriveID: resolvedID,
target: target,
uploadDir: spider91UploadDirName,
keepLatestN: m.cfg.KeepLatestN,
legacyBackfill: true,
})
}
}
return out
}
func crawlerSourceKindsForRow(d *catalog.Drive) []string {
kinds := []string{scriptcrawler.Kind}
if d != nil && strings.EqualFold(strings.TrimSpace(d.Credentials["builtin"]), spider91.Kind) {
kinds = append(kinds, spider91.Kind)
}
return kinds
}
func scriptCrawlerUploadDir(driveID string) string {
driveID = sanitizeUploadDirSegment(driveID)
if driveID == "" {
driveID = "crawler"
}
return scriptCrawlerUploadRootDirName + "/" + driveID
}
func sanitizeUploadDirSegment(raw string) string {
clean := sanitizeTitle(raw)
clean = strings.Trim(clean, "/")
if clean == "." || clean == ".." {
return ""
}
return clean
}
// spider91Drives returns all currently registered local crawler drivers that act as Spider91 sources.
func (m *Migrator) spider91Drives(ctx context.Context) []Spider91LocalSource {
all := m.cfg.Registry.All()
out := make([]Spider91LocalSource, 0, len(all))
for _, d := range all {
if !m.isSpider91SourceDrive(ctx, d) {
continue
}
if sd, ok := d.(Spider91LocalSource); ok {
out = append(out, sd)
}
}
return out
}
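// migrateDrive runs one migration batch for a single spider91 drive and returns the number of successfully migrated videos.
//
// Strategy (consistent with the "cache the latest N locally" semantics):
// - list every mp4 file in the spider91 drive's local videos/ directory, sorted by mtime descending
// - skip the newest KeepLatestN: these are the latest crawls the user wants to keep locally
// - for the remaining (older) files, one by one:
// - not yet migrated (drive_id is still src.ID()) → upload to PikPak + rewrite the catalog + delete the local file
// - already migrated but still present locally → only delete the local file (fallback)
//
// When KeepLatestN < 0, no local file is protected and every file is a migration candidate (old behavior, mainly for tests).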
func (m *Migrator) migrateDrive(ctx context.Context, src *spider91.Driver, targetDriveID string, pp uploadTarget) (int, error) {
keepN := m.cfg.KeepLatestN
func (m *Migrator) isSpider91SourceDrive(ctx context.Context, d drives.Drive) bool {
if d == nil {
return false
}
if d.Kind() == spider91.Kind {
return true
}
if d.Kind() != scriptcrawler.Kind || m.cfg.Catalog == nil {
return false
}
row, err := m.cfg.Catalog.GetDrive(ctx, d.ID())
if err != nil || row == nil {
return false
}
if row.Kind == spider91.Kind {
return true
}
return row.Kind == scriptcrawler.Kind && strings.EqualFold(strings.TrimSpace(row.Credentials["builtin"]), spider91.Kind)
}
// migrateDrive runs one migration batch for a single local crawler drive and returns the number of successfully migrated videos.
func (m *Migrator) migrateDrive(ctx context.Context, plan migrationPlan) (int, error) {
src := plan.source
if src == nil || plan.target == nil || plan.targetDriveID == "" {
return 0, nil
}
keepN := plan.keepLatestN
if keepN < 0 {
keepN = 0
}
@@ -417,28 +704,46 @@ func (m *Migrator) migrateDrive(ctx context.Context, src *spider91.Driver, targe
files = append(files, localFile{name: e.Name(), modTime: info.ModTime()})
}
// Touch nothing when the local file count does not exceed keepN; this is the core of the KeepLatestN semantics.
if m.cfg.KeepLatestN >= 0 && len(files) <= keepN {
if plan.keepLatestN >= 0 && len(files) <= keepN {
return 0, nil
}
// Sort by mtime descending: newest first, keeping the first keepN.
sort.Slice(files, func(i, j int) bool { return files[i].modTime.After(files[j].modTime) })
// Candidates = everything after the newest keepN (the older files). When KeepLatestN < 0, candidates = files.
skip := keepN
if m.cfg.KeepLatestN < 0 {
if plan.keepLatestN < 0 {
skip = 0
}
candidates := files
if skip < len(files) {
candidates = files[skip:]
} else {
m.reportUploadProgress(UploadProgress{DriveID: src.ID(), State: "idle"})
return 0, nil
}
totalCandidates := len(candidates)
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
QueueLength: totalCandidates,
TotalCount: totalCandidates,
})
defer m.reportUploadProgress(UploadProgress{DriveID: src.ID(), State: "idle"})
localVideos, err := m.cfg.Catalog.ListVideosByDriveID(ctx, src.ID(), 100000)
if err != nil {
return 0, fmt.Errorf("list local catalog videos: %w", err)
}
byFileID := make(map[string]*catalog.Video, len(localVideos))
for _, v := range localVideos {
if v != nil && strings.TrimSpace(v.FileID) != "" {
byFileID[v.FileID] = v
}
}
migrated := 0
for _, f := range candidates {
processed := 0
for index, f := range candidates {
if err := ctx.Err(); err != nil {
return migrated, err
}
@@ -446,21 +751,87 @@ func (m *Migrator) migrateDrive(ctx context.Context, src *spider91.Driver, targe
break
}
viewkey := stripExt(f.name)
videoID := "spider91-" + src.ID() + "-" + viewkey
v, err := m.cfg.Catalog.GetVideo(ctx, videoID)
if err != nil || v == nil {
// No catalog row found: keep the local file to be safe, so an administrator can see it.
v := m.findVideoForLocalFile(ctx, plan, f.name, byFileID)
if v == nil {
processed++
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
QueueLength: maxInt(totalCandidates-processed, 0),
DoneCount: processed,
TotalCount: totalCandidates,
})
continue
}
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
CurrentTitle: v.Title,
QueueLength: maxInt(totalCandidates-index-1, 0),
DoneCount: processed,
TotalCount: totalCandidates,
})
if v.DriveID != src.ID() {
// The catalog row already points at another drive but a local leftover remains → delete the local copy as a fallback.
CleanupSpider91Local(src, v.FileID)
CleanupSpider91Local(src, f.name)
processed++
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
QueueLength: maxInt(totalCandidates-processed, 0),
DoneCount: processed,
TotalCount: totalCandidates,
})
continue
}
ok, err := m.migrateOne(ctx, v, src, targetDriveID, pp)
if targetDuplicate, err := m.cfg.Catalog.FindEquivalentVideoOnDrive(ctx, v, plan.targetDriveID); err != nil {
if !errors.Is(err, sql.ErrNoRows) {
log.Printf("[spider91migrate] %s find target duplicate: %v", v.ID, err)
}
} else if targetDuplicate != nil {
ok, err := m.bindToExistingTarget(ctx, v, targetDuplicate, plan)
if err != nil {
log.Printf("[spider91migrate] %s: %v", v.ID, err)
continue
}
if ok {
migrated++
if m.cfg.OnMigrated != nil {
m.cfg.OnMigrated(v.ID)
}
}
processed++
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
QueueLength: maxInt(totalCandidates-processed, 0),
DoneCount: processed,
TotalCount: totalCandidates,
})
continue
}
if plan.requireAssetsReady {
ready, err := m.crawlerVideoAssetsReady(ctx, v)
if err != nil {
log.Printf("[spider91migrate] %s check generated assets: %v", v.ID, err)
continue
}
if !ready {
processed++
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
QueueLength: maxInt(totalCandidates-processed, 0),
DoneCount: processed,
TotalCount: totalCandidates,
})
continue
}
}
ok, err := m.migrateOne(ctx, v, plan)
if err != nil {
log.Printf("[spider91migrate] %s: %v", v.ID, err)
// A captcha error (4002 / 9) means PikPak is currently rejecting us; continuing to
@@ -480,14 +851,64 @@ func (m *Migrator) migrateDrive(ctx context.Context, src *spider91.Driver, targe
m.cfg.OnMigrated(v.ID)
}
}
processed++
m.reportUploadProgress(UploadProgress{
DriveID: src.ID(),
State: "uploading",
QueueLength: maxInt(totalCandidates-processed, 0),
DoneCount: processed,
TotalCount: totalCandidates,
})
}
return migrated, nil
}
// migrateOne uploads a single spider91 video to PikPak and rewrites its catalog row.
func maxInt(a, b int) int {
if a > b {
return a
}
return b
}
func (m *Migrator) findVideoForLocalFile(ctx context.Context, plan migrationPlan, localFile string, byFileID map[string]*catalog.Video) *catalog.Video {
if v := byFileID[localFile]; v != nil {
return v
}
sourceID := stripExt(localFile)
driveID := ""
if plan.source != nil {
driveID = plan.source.ID()
}
for _, kind := range plan.sourceKinds {
id := scriptcrawler.BuildVideoIDForKind(kind, driveID, sourceID)
v, err := m.cfg.Catalog.GetVideo(ctx, id)
if err == nil && v != nil {
return v
}
}
return nil
}
func (m *Migrator) crawlerVideoAssetsReady(ctx context.Context, v *catalog.Video) (bool, error) {
if v == nil {
return false, nil
}
fingerprintReady := strings.EqualFold(strings.TrimSpace(v.FingerprintStatus), "ready") || strings.TrimSpace(v.SampledSHA256) != ""
if !fingerprintReady {
return false, nil
}
if strings.EqualFold(strings.TrimSpace(v.PreviewStatus), "ready") {
return true, nil
}
return m.cfg.Catalog.HasReadyEquivalentPreview(ctx, v)
}
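// migrateOne uploads a single local crawler video to the target drive and rewrites its catalog row.
// It returns (true, nil) when a video was actually migrated; (false, nil) when it was skipped (for example, the local file no longer exists);
// and (false, err) when a real error occurred.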
func (m *Migrator) migrateOne(ctx context.Context, v *catalog.Video, src *spider91.Driver, targetDriveID string, pp uploadTarget) (bool, error) {
func (m *Migrator) migrateOne(ctx context.Context, v *catalog.Video, plan migrationPlan) (bool, error) {
src := plan.source
pp := plan.target
path, err := src.VideoPath(v.FileID)
if err != nil {
return false, fmt.Errorf("resolve local path: %w", err)
@@ -511,16 +932,11 @@ func (m *Migrator) migrateOne(ctx context.Context, v *catalog.Video, src *spider
}
defer f.Close()
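// Upload to the target drive's root directory (the rootID of the user-configured target drive).
// The upload name uses the scheme B format computed by desiredPikPakName:
//
// <sanitized title>-<last 8 characters of viewkey>.<ext>
//
// This way the file name listed in the cloud drive's web UI identifies the video directly,
// while the last 8 characters of the viewkey avoid collisions between identical titles. Both target drives (PikPak / 115) share the same format,
// which keeps things simple for the frontend and the catalog.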
parent := pp.RootID()
uploadName := desiredPikPakName(v.Title, extractViewKey(v.ID), v.Ext)
parent, err := pp.EnsureDir(ctx, plan.uploadDir)
if err != nil {
return false, fmt.Errorf("%s ensure %q dir: %w", pp.Kind(), plan.uploadDir, err)
}
uploadName := desiredPikPakName(v.Title, sourceIDForUploadName(v, plan), v.Ext)
res, err := pp.UploadAndReportHash(ctx, parent, uploadName, f, info.Size())
if err != nil {
return false, fmt.Errorf("%s upload: %w", pp.Kind(), err)
@@ -530,28 +946,157 @@ func (m *Migrator) migrateOne(ctx context.Context, v *catalog.Video, src *spider
}
// Transactionally rewrite the catalog row: drive_id / file_id / content_hash
if err := m.cfg.Catalog.MigrateVideoToDrive(ctx, v.ID, targetDriveID, res.FileID, res.Hash); err != nil {
if err := m.cfg.Catalog.MigrateVideoToDrive(ctx, v.ID, plan.targetDriveID, res.FileID, res.Hash); err != nil {
return false, fmt.Errorf("catalog migrate: %w", err)
}
m.preserveCrawledThumbnail(ctx, src, v)
// Sync file_name in the catalog so that the next scan of the target drive can also match on (file_name, size).
if err := m.cfg.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{FileName: uploadName}); err != nil {
log.Printf("[spider91migrate] %s update file_name after migrate: %v", v.ID, err)
}
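// Delete the local mp4 and thumb (a copy of the thumb remains in previews/thumbs/, so display is unaffected).
// Delete the local mp4 and thumb (the public /p/thumb copy was already preserved by preserveCrawledThumbnail).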
CleanupSpider91Local(src, v.FileID)
log.Printf("[spider91migrate] %s migrated to drive=%s(kind=%s) file=%s name=%q", v.ID, targetDriveID, pp.Kind(), res.FileID, uploadName)
log.Printf("[spider91migrate] %s migrated to drive=%s(kind=%s) file=%s name=%q", v.ID, plan.targetDriveID, pp.Kind(), res.FileID, uploadName)
return true, nil
}
func (m *Migrator) bindToExistingTarget(ctx context.Context, v, target *catalog.Video, plan migrationPlan) (bool, error) {
if v == nil || target == nil || plan.source == nil {
return false, nil
}
if plan.targetDriveID == "" || target.FileID == "" {
return false, nil
}
if err := m.cfg.Catalog.MigrateVideoToDrive(ctx, v.ID, plan.targetDriveID, target.FileID, firstNonEmpty(target.ContentHash, v.ContentHash)); err != nil {
return false, fmt.Errorf("catalog bind existing target: %w", err)
}
if target.FileName != "" {
if err := m.cfg.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{FileName: target.FileName}); err != nil {
log.Printf("[spider91migrate] %s update file_name after duplicate bind: %v", v.ID, err)
}
}
m.preserveCrawledThumbnail(ctx, plan.source, v)
CleanupSpider91Local(plan.source, v.FileID)
log.Printf("[spider91migrate] %s bound to existing drive=%s(kind=%s) file=%s duplicate=%s", v.ID, plan.targetDriveID, plan.target.Kind(), target.FileID, target.ID)
return true, nil
}
func firstNonEmpty(values ...string) string {
for _, value := range values {
if strings.TrimSpace(value) != "" {
return value
}
}
return ""
}
func sourceIDForUploadName(v *catalog.Video, plan migrationPlan) string {
if v == nil {
return ""
}
if plan.legacyBackfill {
return extractViewKey(v.ID)
}
for _, kind := range plan.sourceKinds {
prefix := kind + "-" + plan.source.ID() + "-"
if strings.HasPrefix(v.ID, prefix) {
return strings.TrimPrefix(v.ID, prefix)
}
}
if v.FileID != "" {
return stripExt(v.FileID)
}
return extractViewKey(v.ID)
}
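The prefix-stripping branch of sourceIDForUploadName matters when a source ID itself contains dashes, as in the `source-with-dash-001` test case. The sketch below is an illustration under assumptions, not code from this commit: `sourceIDFromVideoID` is a hypothetical helper, and the kind strings `"spider91"` and `"scriptcrawler"` are assumed values for the `Kind` constants. It relies only on the `<kind>-<driveID>-<sourceID>` layout implied by the prefix built in the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceIDFromVideoID (hypothetical) recovers the source ID from a video ID
// shaped like "<kind>-<driveID>-<sourceID>". Stripping the full known prefix,
// rather than splitting on "-", keeps dashes inside the source ID intact.
func sourceIDFromVideoID(videoID, driveID string, kinds []string) (string, bool) {
	for _, kind := range kinds {
		prefix := kind + "-" + driveID + "-"
		if strings.HasPrefix(videoID, prefix) {
			return strings.TrimPrefix(videoID, prefix), true
		}
	}
	return "", false
}

func main() {
	// The kind names here are assumptions made for the example.
	kinds := []string{"spider91", "scriptcrawler"}
	id, ok := sourceIDFromVideoID(
		"scriptcrawler-crawler-alpha-source-with-dash-001",
		"crawler-alpha",
		kinds,
	)
	fmt.Println(id, ok)
}
```

When no prefix matches, the function in the diff falls back to the file ID without its extension and then to extractViewKey; the sketch reports the miss through its boolean result instead.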
func (m *Migrator) preserveCrawledThumbnail(ctx context.Context, src Spider91LocalSource, v *catalog.Video) {
if m == nil || m.cfg.Catalog == nil || src == nil || v == nil || v.ID == "" || v.FileID == "" {
return
}
commonDir := strings.TrimSpace(m.cfg.CommonThumbDir)
if commonDir == "" {
return
}
thumbPath, ok := findSpider91ThumbPath(src, v.FileID)
if !ok {
if v.ThumbnailURL == "" {
log.Printf("[spider91migrate] %s crawled thumbnail missing before migration cleanup", v.ID)
}
return
}
if err := os.MkdirAll(commonDir, 0o755); err != nil {
log.Printf("[spider91migrate] %s mkdir common thumbs: %v", v.ID, err)
return
}
dst := mediaasset.ThumbnailPathInDir(commonDir, v.ID)
if _, err := os.Stat(dst); err != nil {
if !os.IsNotExist(err) {
log.Printf("[spider91migrate] %s stat common thumb: %v", v.ID, err)
return
}
if err := copyFileAtomic(thumbPath, dst); err != nil {
log.Printf("[spider91migrate] %s preserve crawled thumbnail: %v", v.ID, err)
return
}
}
if err := m.cfg.Catalog.UpdateVideoMeta(ctx, v.ID, catalog.VideoMetaPatch{
ThumbnailURL: "/p/thumb/" + v.ID,
}); err != nil {
log.Printf("[spider91migrate] %s update crawled thumbnail url: %v", v.ID, err)
return
}
v.ThumbnailURL = "/p/thumb/" + v.ID
}
func findSpider91ThumbPath(src Spider91LocalSource, fileID string) (string, bool) {
thumbBase := stripExt(fileID)
for _, ext := range []string{".jpg", ".jpeg", ".png", ".webp"} {
thumbPath, err := src.ThumbPath(thumbBase + ext)
if err != nil {
continue
}
info, statErr := os.Stat(thumbPath)
if statErr == nil && info.Mode().IsRegular() && info.Size() > 0 {
return thumbPath, true
}
}
return "", false
}
func copyFileAtomic(src, dst string) error {
in, err := os.Open(src)
if err != nil {
return err
}
defer in.Close()
tmp := dst + ".part"
out, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
if err != nil {
return err
}
_, copyErr := io.Copy(out, in)
closeErr := out.Close()
if copyErr != nil {
_ = os.Remove(tmp)
return copyErr
}
if closeErr != nil {
_ = os.Remove(tmp)
return closeErr
}
return os.Rename(tmp, dst)
}
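// CleanupSpider91Local deletes the local mp4 and thumb of a migrated video.
//
// Thumb deletion is best-effort: a missing thumb is simply ignored (the spider91 thumb file name carries an extension,
// and we do not know whether it is .jpg or something else, so common extensions are tried one by one).
//
// Exposed as a package-level function so the cleanup module can reuse it (task 6).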
func CleanupSpider91Local(src *spider91.Driver, fileID string) {
func CleanupSpider91Local(src Spider91LocalSource, fileID string) {
videoPath, err := src.VideoPath(fileID)
if err == nil {
if err := os.Remove(videoPath); err != nil && !os.IsNotExist(err) {
@@ -588,7 +1133,11 @@ func stripExt(name string) string {
// find the orphans.
//
// Returns the number of files actually deleted.
func (m *Migrator) cleanupOldLocalVideos(ctx context.Context, src *spider91.Driver) (int, error) {
func (m *Migrator) cleanupOldLocalVideos(ctx context.Context, plan migrationPlan) (int, error) {
src := plan.source
if src == nil {
return 0, nil
}
entries, err := os.ReadDir(src.VideosDir())
if err != nil {
if os.IsNotExist(err) {
@@ -605,18 +1154,13 @@ func (m *Migrator) cleanupOldLocalVideos(ctx context.Context, src *spider91.Driv
if e.IsDir() {
continue
}
viewkey := stripExt(e.Name())
videoID := "spider91-" + src.ID() + "-" + viewkey
v, err := m.cfg.Catalog.GetVideo(ctx, videoID)
if err != nil || v == nil {
// No catalog row found: keep the file to be safe and leave it for an administrator.
v := m.findVideoForLocalFile(ctx, plan, e.Name(), nil)
if v == nil {
continue
}
if v.DriveID == src.ID() {
// Not migrated yet: migrateDrive owns this file, so leave it alone here.
continue
}
// Already migrated to another drive but still present locally → delete.
path, perr := src.VideoPath(e.Name())
if perr != nil {
continue
@@ -639,7 +1183,7 @@ func (m *Migrator) cleanupOldLocalVideos(ctx context.Context, src *spider91.Driv
return deleted, nil
}
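// backfillFileNames scans the target drive (PikPak or 115) for all videos whose ID starts with spider91-*,
// backfillFileNames scans the target drive (PikPak, 115, 123, OneDrive, Google Drive, or Unicom cloud drive) for all videos whose ID starts with spider91-*,
// calls target.Rename to fix any whose file name is not in the format expected from desiredPikPakName(...),
// and syncs catalog.file_name to the new name.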
//
@@ -14,8 +14,12 @@ import (
"github.com/video-site/backend/internal/catalog"
"github.com/video-site/backend/internal/drives"
"github.com/video-site/backend/internal/drives/googledrive"
"github.com/video-site/backend/internal/drives/p123"
"github.com/video-site/backend/internal/drives/pikpak"
"github.com/video-site/backend/internal/drives/scriptcrawler"
"github.com/video-site/backend/internal/drives/spider91"
"github.com/video-site/backend/internal/drives/wopan"
)
// fakeRegistry is a minimal implementation of the Registry interface.
@@ -53,6 +57,8 @@ type fakePikPak struct {
uploadFunc func(ctx context.Context, parentID, name string, r io.Reader, size int64) (UploadResult, error)
mu sync.Mutex
gotBodies map[string][]byte
gotParents map[string]string
ensureCalls []string
// renameCalls records the fileID->newName history of every Rename call, for assertions in the backfill tests.
renameCalls map[string]string
}
@@ -62,6 +68,7 @@ func newFakePikPak(id, rootID string) *fakePikPak {
id: id,
rootID: rootID,
gotBodies: make(map[string][]byte),
gotParents: make(map[string]string),
renameCalls: make(map[string]string),
}
}
@@ -80,8 +87,11 @@ func (d *fakePikPak) StreamURL(context.Context, string) (*drives.StreamLink, err
func (d *fakePikPak) Upload(context.Context, string, string, io.Reader, int64) (string, error) {
return "", drives.ErrNotSupported
}
func (d *fakePikPak) EnsureDir(context.Context, string) (string, error) {
return "", drives.ErrNotSupported
func (d *fakePikPak) EnsureDir(_ context.Context, pathFromRoot string) (string, error) {
d.mu.Lock()
defer d.mu.Unlock()
d.ensureCalls = append(d.ensureCalls, pathFromRoot)
return d.rootID + "/" + pathFromRoot, nil
}
func (d *fakePikPak) Rename(_ context.Context, fileID, newName string) error {
d.mu.Lock()
@@ -99,6 +109,7 @@ func (d *fakePikPak) UploadAndReportHash(ctx context.Context, parentID, name str
body, _ := io.ReadAll(r)
d.mu.Lock()
d.gotBodies[name] = body
d.gotParents[name] = parentID
d.mu.Unlock()
return UploadResult{
FileID: "remote-" + name,
@@ -127,6 +138,32 @@ func (d *fakeP115) Kind() string { return "p115" }
var _ drives.Drive = (*fakeP115)(nil)
var _ uploadTarget = (*fakeP115)(nil)
type fakeP123 struct {
*fakePikPak
}
func newFakeP123(id, rootID string) *fakeP123 {
return &fakeP123{fakePikPak: newFakePikPak(id, rootID)}
}
func (d *fakeP123) Kind() string { return "p123" }
var _ drives.Drive = (*fakeP123)(nil)
var _ uploadTarget = (*fakeP123)(nil)
type fakeOneDrive struct {
*fakePikPak
}
func newFakeOneDrive(id, rootID string) *fakeOneDrive {
return &fakeOneDrive{fakePikPak: newFakePikPak(id, rootID)}
}
func (d *fakeOneDrive) Kind() string { return "onedrive" }
var _ drives.Drive = (*fakeOneDrive)(nil)
var _ uploadTarget = (*fakeOneDrive)(nil)
// TestBackfillFileNamesRenamesOnlyMismatchedSpider91Videos verifies the backfill logic:
//
// - Videos already in the expected format do not trigger another Rename call (idempotent)
@@ -308,6 +345,81 @@ func writeSpider91Video(t *testing.T, cat *catalog.Catalog, d *spider91.Driver,
return id
}
func setupScriptCrawler(t *testing.T, id string) *scriptcrawler.Driver {
t.Helper()
d := scriptcrawler.New(scriptcrawler.Config{ID: id, RootDir: t.TempDir()})
if err := d.Init(context.Background()); err != nil {
t.Fatalf("scriptcrawler init: %v", err)
}
return d
}
func seedScriptCrawlerDrive(t *testing.T, cat *catalog.Catalog, d *scriptcrawler.Driver, uploadDriveID string) {
t.Helper()
if err := cat.UpsertDrive(context.Background(), &catalog.Drive{
ID: d.ID(),
Kind: scriptcrawler.Kind,
Name: "Script Crawler",
RootID: "/",
Credentials: map[string]string{
"script_path": "/tmp/crawler.py",
"upload_drive_id": uploadDriveID,
},
}); err != nil {
t.Fatalf("seed scriptcrawler drive: %v", err)
}
}
func writeScriptCrawlerVideo(t *testing.T, cat *catalog.Catalog, d *scriptcrawler.Driver, sourceID, ext string, content []byte, readyAssets bool) string {
t.Helper()
fileID := sourceID + ext
path, err := d.VideoPath(fileID)
if err != nil {
t.Fatalf("video path: %v", err)
}
if err := os.WriteFile(path, content, 0o644); err != nil {
t.Fatalf("write video: %v", err)
}
thumbPath, err := d.ThumbPath(sourceID + ".jpg")
if err != nil {
t.Fatalf("thumb path: %v", err)
}
if err := os.WriteFile(thumbPath, []byte("thumb"), 0o644); err != nil {
t.Fatalf("write thumb: %v", err)
}
now := time.Now()
id := scriptcrawler.BuildVideoID(d.ID(), sourceID)
previewStatus := "pending"
if readyAssets {
previewStatus = "ready"
}
v := &catalog.Video{
ID: id,
DriveID: d.ID(),
FileID: fileID,
FileName: fileID,
Title: "Crawler " + sourceID,
Author: "tester",
Ext: strings.TrimPrefix(ext, "."),
Quality: "HD",
Size: int64(len(content)),
ThumbnailURL: "/p/thumb/" + id,
PreviewStatus: previewStatus,
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}
if err := cat.UpsertVideo(context.Background(), v); err != nil {
t.Fatalf("upsert scriptcrawler video: %v", err)
}
if readyAssets {
if err := cat.UpdateVideoFingerprint(context.Background(), id, "sampled-"+sourceID, "ready", ""); err != nil {
t.Fatalf("mark fingerprint ready: %v", err)
}
}
return id
}
func TestRunOnceMigratesSpider91VideosAndCleansLocalFiles(t *testing.T) {
cat := setupCatalog(t)
src, _ := setupSpider91(t)
@@ -319,12 +431,14 @@ func TestRunOnceMigratesSpider91VideosAndCleansLocalFiles(t *testing.T) {
now := time.Now()
id := writeSpider91Video(t, cat, src, "vk001", ".mp4", []byte("video bytes here"), now)
commonThumbDir := t.TempDir()
m := New(Config{
Catalog: cat,
Registry: reg,
GetTargetDriveID: func() string { return pp.ID() },
KeepLatestN: -1, // Disable "keep the latest N" so that even a single video is uploaded immediately
CommonThumbDir: commonThumbDir,
})
m.runOnce(context.Background())
@@ -347,6 +461,12 @@ func TestRunOnceMigratesSpider91VideosAndCleansLocalFiles(t *testing.T) {
if _, ok := pp.gotBodies[wantName]; !ok {
t.Fatalf("PikPak did not receive expected upload name %q (got names: %v)", wantName, keysOf(pp.gotBodies))
}
if gotParent := pp.gotParents[wantName]; gotParent != "pikpak-root-id/"+spider91UploadDirName {
t.Fatalf("upload parent = %q, want root/91 Spider", gotParent)
}
if len(pp.ensureCalls) != 1 || pp.ensureCalls[0] != spider91UploadDirName {
t.Fatalf("ensure calls = %#v, want %q", pp.ensureCalls, spider91UploadDirName)
}
if got.FileID != "remote-"+wantName {
t.Fatalf("file_id = %q, want %q", got.FileID, "remote-"+wantName)
}
@@ -356,8 +476,15 @@ func TestRunOnceMigratesSpider91VideosAndCleansLocalFiles(t *testing.T) {
if got.ContentHash == "" {
t.Fatalf("content_hash should be set after migration")
}
if got.ThumbnailURL != "/p/thumb/"+id {
t.Fatalf("thumbnail_url = %q, want preserved crawled thumbnail URL", got.ThumbnailURL)
}
commonThumbPath := filepath.Join(commonThumbDir, id+".jpg")
if data, err := os.ReadFile(commonThumbPath); err != nil || string(data) != "thumb" {
t.Fatalf("common thumb = %q, %v; want copied crawled thumb", string(data), err)
}
// 3) Both the local video and thumb have been deleted
// 3) Both the local video and thumb have been deleted; the public /p/thumb copy is kept.
videoPath, _ := src.VideoPath("vk001.mp4")
if _, err := os.Stat(videoPath); !os.IsNotExist(err) {
t.Fatalf("local mp4 still exists or stat error %v", err)
@@ -368,6 +495,174 @@ func TestRunOnceMigratesSpider91VideosAndCleansLocalFiles(t *testing.T) {
}
}
func TestRunOnceMigratesReadyScriptCrawlerVideoToConfiguredUploadDrive(t *testing.T) {
cat := setupCatalog(t)
src := setupScriptCrawler(t, "crawler-alpha")
pp := newFakePikPak("pikpak-target", "pikpak-root-id")
seedScriptCrawlerDrive(t, cat, src, pp.ID())
reg := newFakeRegistry()
reg.Add(src)
reg.Add(pp)
id := writeScriptCrawlerVideo(t, cat, src, "source-with-dash-001", ".mp4", []byte("script video bytes"), true)
commonThumbDir := t.TempDir()
m := New(Config{
Catalog: cat,
Registry: reg,
CommonThumbDir: commonThumbDir,
})
m.runOnce(context.Background())
if pp.uploadCalls != 1 {
t.Fatalf("upload calls = %d, want 1", pp.uploadCalls)
}
wantDir := "Script Crawlers/crawler-alpha"
if len(pp.ensureCalls) != 1 || pp.ensureCalls[0] != wantDir {
t.Fatalf("ensure calls = %#v, want %q", pp.ensureCalls, wantDir)
}
wantName := desiredPikPakName("Crawler source-with-dash-001", "source-with-dash-001", "mp4")
if gotParent := pp.gotParents[wantName]; gotParent != "pikpak-root-id/"+wantDir {
t.Fatalf("upload parent = %q, want root/%s", gotParent, wantDir)
}
got, err := cat.GetVideo(context.Background(), id)
if err != nil {
t.Fatalf("get migrated video: %v", err)
}
if got.DriveID != pp.ID() {
t.Fatalf("drive_id = %q, want %q", got.DriveID, pp.ID())
}
if got.FileID != "remote-"+wantName {
t.Fatalf("file_id = %q, want remote upload id", got.FileID)
}
if got.FileName != wantName {
t.Fatalf("file_name = %q, want %q", got.FileName, wantName)
}
if got.PreviewStatus != "ready" || got.FingerprintStatus != "ready" || got.SampledSHA256 == "" {
t.Fatalf("generated assets not preserved after migration: preview=%q fingerprint=%q sampled=%q", got.PreviewStatus, got.FingerprintStatus, got.SampledSHA256)
}
videoPath, _ := src.VideoPath("source-with-dash-001.mp4")
if _, err := os.Stat(videoPath); !os.IsNotExist(err) {
t.Fatalf("local scriptcrawler video still exists or stat error %v", err)
}
thumbPath, _ := src.ThumbPath("source-with-dash-001.jpg")
if _, err := os.Stat(thumbPath); !os.IsNotExist(err) {
t.Fatalf("local scriptcrawler thumb still exists or stat error %v", err)
}
commonThumbPath := filepath.Join(commonThumbDir, id+".jpg")
if data, err := os.ReadFile(commonThumbPath); err != nil || string(data) != "thumb" {
t.Fatalf("common thumb = %q, %v; want copied crawled thumb", string(data), err)
}
}
func TestRunOnceSkipsScriptCrawlerVideoUntilPreviewAndFingerprintReady(t *testing.T) {
cat := setupCatalog(t)
src := setupScriptCrawler(t, "crawler-beta")
pp := newFakePikPak("pikpak-target", "pikpak-root-id")
seedScriptCrawlerDrive(t, cat, src, pp.ID())
reg := newFakeRegistry()
reg.Add(src)
reg.Add(pp)
id := writeScriptCrawlerVideo(t, cat, src, "pending-assets", ".mp4", []byte("script video bytes"), false)
m := New(Config{Catalog: cat, Registry: reg})
m.runOnce(context.Background())
if pp.uploadCalls != 0 {
t.Fatalf("upload calls = %d, want 0 while generated assets are pending", pp.uploadCalls)
}
got, err := cat.GetVideo(context.Background(), id)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.DriveID != src.ID() {
t.Fatalf("drive_id = %q, want local crawler drive %q", got.DriveID, src.ID())
}
videoPath, _ := src.VideoPath("pending-assets.mp4")
if _, err := os.Stat(videoPath); err != nil {
t.Fatalf("local video should remain while assets pending: %v", err)
}
}
func TestRunOnceBindsScriptCrawlerDuplicateToExistingTargetWithoutUpload(t *testing.T) {
cat := setupCatalog(t)
src := setupScriptCrawler(t, "crawler-duplicate")
pp := newFakePikPak("pikpak-target", "pikpak-root-id")
seedScriptCrawlerDrive(t, cat, src, pp.ID())
reg := newFakeRegistry()
reg.Add(src)
reg.Add(pp)
content := []byte("duplicate script video bytes")
id := writeScriptCrawlerVideo(t, cat, src, "duplicate-source", ".mp4", content, false)
sampled := "same-sampled-fingerprint"
if err := cat.UpdateVideoFingerprint(context.Background(), id, sampled, "ready", ""); err != nil {
t.Fatalf("mark source fingerprint ready: %v", err)
}
now := time.Now()
target := &catalog.Video{
ID: "pikpak-existing-duplicate",
DriveID: pp.ID(),
FileID: "existing-target-file",
FileName: "existing-target-name.mp4",
ContentHash: "existing-content-hash",
Title: "Existing duplicate",
Ext: "mp4",
Size: int64(len(content)),
PreviewStatus: "ready",
PublishedAt: now.Add(-time.Hour),
CreatedAt: now.Add(-time.Hour),
UpdatedAt: now.Add(-time.Hour),
}
if err := cat.UpsertVideo(context.Background(), target); err != nil {
t.Fatalf("upsert existing target: %v", err)
}
if err := cat.UpdateVideoFingerprint(context.Background(), target.ID, sampled, "ready", ""); err != nil {
t.Fatalf("mark target fingerprint ready: %v", err)
}
commonThumbDir := t.TempDir()
m := New(Config{Catalog: cat, Registry: reg, CommonThumbDir: commonThumbDir})
m.runOnce(context.Background())
if pp.uploadCalls != 0 {
t.Fatalf("upload calls = %d, want 0 when equivalent target file already exists", pp.uploadCalls)
}
got, err := cat.GetVideo(context.Background(), id)
if err != nil {
t.Fatalf("get bound video: %v", err)
}
if got.DriveID != pp.ID() {
t.Fatalf("drive_id = %q, want %q", got.DriveID, pp.ID())
}
if got.FileID != target.FileID {
t.Fatalf("file_id = %q, want existing target file %q", got.FileID, target.FileID)
}
if got.FileName != target.FileName {
t.Fatalf("file_name = %q, want existing target name %q", got.FileName, target.FileName)
}
if got.ContentHash != target.ContentHash {
t.Fatalf("content_hash = %q, want %q", got.ContentHash, target.ContentHash)
}
videoPath, _ := src.VideoPath("duplicate-source.mp4")
if _, err := os.Stat(videoPath); !os.IsNotExist(err) {
t.Fatalf("local duplicate video still exists or stat error %v", err)
}
thumbPath, _ := src.ThumbPath("duplicate-source.jpg")
if _, err := os.Stat(thumbPath); !os.IsNotExist(err) {
t.Fatalf("local duplicate thumb still exists or stat error %v", err)
}
commonThumbPath := filepath.Join(commonThumbDir, id+".jpg")
if data, err := os.ReadFile(commonThumbPath); err != nil || string(data) != "thumb" {
t.Fatalf("common thumb = %q, %v; want copied crawled thumb", string(data), err)
}
}
func TestRunOnceSkipsWhenLocalFileMissing(t *testing.T) {
cat := setupCatalog(t)
src, _ := setupSpider91(t)
@@ -527,7 +822,10 @@ func TestCleanupRemovesAllAlreadyMigratedOrphans(t *testing.T) {
GetTargetDriveID: func() string { return pp.ID() },
})
deleted, err := m.cleanupOldLocalVideos(context.Background(), src)
deleted, err := m.cleanupOldLocalVideos(context.Background(), migrationPlan{
source: src,
sourceKinds: []string{spider91.Kind},
})
if err != nil {
t.Fatalf("cleanup: %v", err)
}
@@ -549,6 +847,95 @@ func TestCleanupRemovesAllAlreadyMigratedOrphans(t *testing.T) {
}
}
func TestRunOnceMigratesBuiltInSpider91ScriptCrawlerSource(t *testing.T) {
ctx := context.Background()
cat := setupCatalog(t)
src := scriptcrawler.New(scriptcrawler.Config{ID: "spider-script", RootDir: t.TempDir()})
if err := src.Init(ctx); err != nil {
t.Fatalf("scriptcrawler init: %v", err)
}
if err := cat.UpsertDrive(ctx, &catalog.Drive{
ID: src.ID(),
Kind: scriptcrawler.Kind,
Name: "Built-in Spider91",
Credentials: map[string]string{
"builtin": "spider91",
"script_path": "/tmp/spider91.py",
"upload_drive_id": "pikpak-target",
},
}); err != nil {
t.Fatalf("upsert source drive: %v", err)
}
pp := newFakePikPak("pikpak-target", "pikpak-root-id")
reg := newFakeRegistry()
reg.Add(src)
reg.Add(pp)
fileID := "vk-script.mp4"
videoPath, err := src.VideoPath(fileID)
if err != nil {
t.Fatalf("video path: %v", err)
}
if err := os.WriteFile(videoPath, []byte("scriptcrawler spider91 video"), 0o644); err != nil {
t.Fatalf("write video: %v", err)
}
thumbPath, err := src.ThumbPath("vk-script.jpg")
if err != nil {
t.Fatalf("thumb path: %v", err)
}
if err := os.WriteFile(thumbPath, []byte("thumb"), 0o644); err != nil {
t.Fatalf("write thumb: %v", err)
}
now := time.Now()
id := "spider91-" + src.ID() + "-vk-script"
if err := cat.UpsertVideo(ctx, &catalog.Video{
ID: id,
DriveID: src.ID(),
FileID: fileID,
FileName: fileID,
Title: "Scriptcrawler Spider91",
Author: "91porn",
Ext: "mp4",
Quality: "HD",
Size: int64(len("scriptcrawler spider91 video")),
PreviewStatus: "ready",
PublishedAt: now,
CreatedAt: now,
UpdatedAt: now,
}); err != nil {
t.Fatalf("upsert video: %v", err)
}
if err := cat.UpdateVideoFingerprint(ctx, id, "sampled-vk-script", "ready", ""); err != nil {
t.Fatalf("mark fingerprint ready: %v", err)
}
m := New(Config{
Catalog: cat,
Registry: reg,
GetTargetDriveID: func() string { return pp.ID() },
KeepLatestN: -1,
CommonThumbDir: t.TempDir(),
})
m.runOnce(ctx)
if pp.uploadCalls != 1 {
t.Fatalf("upload calls = %d, want 1", pp.uploadCalls)
}
got, err := cat.GetVideo(ctx, id)
if err != nil {
t.Fatalf("get migrated video: %v", err)
}
if got.DriveID != pp.ID() {
t.Fatalf("drive_id = %q, want %q", got.DriveID, pp.ID())
}
if _, err := os.Stat(videoPath); !os.IsNotExist(err) {
t.Fatalf("local video stat err = %v, want not exist", err)
}
if _, err := os.Stat(thumbPath); !os.IsNotExist(err) {
t.Fatalf("local thumb stat err = %v, want not exist", err)
}
}
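// TestRunOnceKeepsAllLocalWhenWithinKeepWindow verifies that when the local file count ≤ KeepLatestN,
// nothing is uploaded and everything stays as the "latest N" cache. This is the user's core requirement: the 15 freshly crawled videos must not be uploaded away immediately.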
func TestRunOnceKeepsAllLocalWhenWithinKeepWindow(t *testing.T) {
@@ -588,7 +975,7 @@ func TestRunOnceKeepsAllLocalWhenWithinKeepWindow(t *testing.T) {
}
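// TestRunOnceMigratesOnlyOlderFilesBeyondKeepWindow verifies that when the local file count > KeepLatestN,
// the newest N by mtime descending are kept and only the excess (older) files are uploaded to PikPak
// the newest N by mtime descending are kept and only the excess (older) files are uploaded to the target drive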
func TestRunOnceMigratesOnlyOlderFilesBeyondKeepWindow(t *testing.T) {
cat := setupCatalog(t)
src, _ := setupSpider91(t)
@@ -841,7 +1228,6 @@ func TestNonCaptchaErrorDoesNotTriggerCooldown(t *testing.T) {
}
}
// TestRunOnceMigratesToP115Target verifies that when the target drive is 115 (kind="p115"),
// the migrator still uploads spider91 videos to it correctly and rewrites the catalog.
//
@@ -885,6 +1271,12 @@ func TestRunOnceMigratesToP115Target(t *testing.T) {
if _, ok := target.gotBodies[wantName]; !ok {
t.Fatalf("p115 did not receive expected upload name %q (got names: %v)", wantName, keysOf(target.gotBodies))
}
if gotParent := target.gotParents[wantName]; gotParent != "p115-root-cid/"+spider91UploadDirName {
t.Fatalf("p115 upload parent = %q, want root/91 Spider", gotParent)
}
if len(target.ensureCalls) != 1 || target.ensureCalls[0] != spider91UploadDirName {
t.Fatalf("p115 ensure calls = %#v, want %q", target.ensureCalls, spider91UploadDirName)
}
if got.FileID != "remote-"+wantName {
t.Fatalf("file_id = %q, want %q", got.FileID, "remote-"+wantName)
}
@@ -906,7 +1298,173 @@ func TestRunOnceMigratesToP115Target(t *testing.T) {
}
}
// TestResolveTargetRejectsUnsupportedKind verifies that when the target drive is neither PikPak nor 115,
func TestRunOnceMigratesToP123Target(t *testing.T) {
cat := setupCatalog(t)
src, _ := setupSpider91(t)
target := newFakeP123("p123-target", "p123-root-id")
reg := newFakeRegistry()
reg.Add(src)
reg.Add(target)
now := time.Now()
id := writeSpider91Video(t, cat, src, "vk-123-001", ".mp4", []byte("video bytes 123"), now)
m := New(Config{
Catalog: cat,
Registry: reg,
GetTargetDriveID: func() string { return target.ID() },
KeepLatestN: -1,
})
m.runOnce(context.Background())
if target.uploadCalls != 1 {
t.Fatalf("p123 upload calls = %d, want 1", target.uploadCalls)
}
got, err := cat.GetVideo(context.Background(), id)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.DriveID != target.ID() {
t.Fatalf("drive_id = %q, want %q", got.DriveID, target.ID())
}
wantName := "Sample vk-123-001-001.mp4"
if _, ok := target.gotBodies[wantName]; !ok {
t.Fatalf("p123 did not receive expected upload name %q (got names: %v)", wantName, keysOf(target.gotBodies))
}
if gotParent := target.gotParents[wantName]; gotParent != "p123-root-id/"+spider91UploadDirName {
t.Fatalf("p123 upload parent = %q, want root/91 Spider", gotParent)
}
if len(target.ensureCalls) != 1 || target.ensureCalls[0] != spider91UploadDirName {
t.Fatalf("p123 ensure calls = %#v, want %q", target.ensureCalls, spider91UploadDirName)
}
if got.FileID != "remote-"+wantName {
t.Fatalf("file_id = %q, want %q", got.FileID, "remote-"+wantName)
}
if got.FileName != wantName {
t.Fatalf("file_name = %q, want %q", got.FileName, wantName)
}
if got.ContentHash == "" {
t.Fatal("content_hash should be set after p123 migration")
}
videoPath, _ := src.VideoPath("vk-123-001.mp4")
if _, err := os.Stat(videoPath); !os.IsNotExist(err) {
t.Fatalf("local mp4 still exists after p123 migration or stat error: %v", err)
}
thumbPath, _ := src.ThumbPath("vk-123-001.jpg")
if _, err := os.Stat(thumbPath); !os.IsNotExist(err) {
t.Fatalf("local thumb still exists after p123 migration or stat error: %v", err)
}
}
func TestRunOnceMigratesToOneDriveTarget(t *testing.T) {
cat := setupCatalog(t)
src, _ := setupSpider91(t)
target := newFakeOneDrive("onedrive-target", "onedrive-root")
reg := newFakeRegistry()
reg.Add(src)
reg.Add(target)
now := time.Now()
id := writeSpider91Video(t, cat, src, "vk-od-001", ".mp4", []byte("video bytes onedrive"), now)
m := New(Config{
Catalog: cat,
Registry: reg,
GetTargetDriveID: func() string { return target.ID() },
KeepLatestN: -1,
})
m.runOnce(context.Background())
if target.uploadCalls != 1 {
t.Fatalf("onedrive upload calls = %d, want 1", target.uploadCalls)
}
got, err := cat.GetVideo(context.Background(), id)
if err != nil {
t.Fatalf("get video: %v", err)
}
if got.DriveID != target.ID() {
t.Fatalf("drive_id = %q, want %q", got.DriveID, target.ID())
}
wantName := "Sample vk-od-001-001.mp4"
if _, ok := target.gotBodies[wantName]; !ok {
t.Fatalf("onedrive did not receive expected upload name %q (got names: %v)", wantName, keysOf(target.gotBodies))
}
if gotParent := target.gotParents[wantName]; gotParent != "onedrive-root/"+spider91UploadDirName {
t.Fatalf("onedrive upload parent = %q, want root/91 Spider", gotParent)
}
if len(target.ensureCalls) != 1 || target.ensureCalls[0] != spider91UploadDirName {
t.Fatalf("onedrive ensure calls = %#v, want %q", target.ensureCalls, spider91UploadDirName)
}
if got.FileID != "remote-"+wantName {
t.Fatalf("file_id = %q, want %q", got.FileID, "remote-"+wantName)
}
if got.FileName != wantName {
t.Fatalf("file_name = %q, want %q", got.FileName, wantName)
}
if got.ContentHash == "" {
t.Fatal("content_hash should be set after onedrive migration")
}
videoPath, _ := src.VideoPath("vk-od-001.mp4")
if _, err := os.Stat(videoPath); !os.IsNotExist(err) {
t.Fatalf("local mp4 still exists after onedrive migration or stat error: %v", err)
}
thumbPath, _ := src.ThumbPath("vk-od-001.jpg")
if _, err := os.Stat(thumbPath); !os.IsNotExist(err) {
t.Fatalf("local thumb still exists after onedrive migration or stat error: %v", err)
}
}
func TestAdaptUploadTargetSupportsP123Driver(t *testing.T) {
d := p123.New(p123.Config{
ID: "p123-target",
RootID: "root-123",
AccessToken: "token-1",
})
target, err := adaptUploadTarget(d)
if err != nil {
t.Fatalf("adaptUploadTarget() error = %v", err)
}
if target.ID() != "p123-target" || target.Kind() != "p123" || target.RootID() != "root-123" {
t.Fatalf("target id/kind/root = %q/%q/%q, want p123-target/p123/root-123", target.ID(), target.Kind(), target.RootID())
}
}
func TestAdaptUploadTargetSupportsGoogleDriveDriver(t *testing.T) {
d := googledrive.New(googledrive.Config{
ID: "google-target",
RootID: "root-google",
RefreshToken: "refresh-token",
})
target, err := adaptUploadTarget(d)
if err != nil {
t.Fatalf("adaptUploadTarget() error = %v", err)
}
if target.ID() != "google-target" || target.Kind() != "googledrive" || target.RootID() != "root-google" {
t.Fatalf("target id/kind/root = %q/%q/%q, want google-target/googledrive/root-google", target.ID(), target.Kind(), target.RootID())
}
}
func TestAdaptUploadTargetSupportsWopanDriver(t *testing.T) {
d := wopan.New(wopan.Config{
ID: "wopan-target",
RootID: "root-wopan",
AccessToken: "access-token",
RefreshToken: "refresh-token",
})
target, err := adaptUploadTarget(d)
if err != nil {
t.Fatalf("adaptUploadTarget() error = %v", err)
}
if target.ID() != "wopan-target" || target.Kind() != "wopan" || target.RootID() != "root-wopan" {
t.Fatalf("target id/kind/root = %q/%q/%q, want wopan-target/wopan/root-wopan", target.ID(), target.Kind(), target.RootID())
}
}
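// TestResolveTargetRejectsUnsupportedKind verifies that when the target drive is not PikPak, 115, 123, OneDrive, Google Drive, or Unicom cloud drive (联通网盘),
// resolveTarget rejects it and returns an error, so runOnce skips it silently (no destructive changes are made).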
func TestResolveTargetRejectsUnsupportedKind(t *testing.T) {
cat := setupCatalog(t)
+11 -8
@@ -5,6 +5,8 @@ import (
"os"
"path/filepath"
"strings"
"github.com/video-site/backend/internal/mediaasset"
)
type VideoAssetRef struct {
@@ -71,14 +73,15 @@ func Compute(
continue
}
driveUsage := out.Drives[ref.DriveID]
thumbPath := filepath.Join(localDir, "thumbs", ref.ID+".jpg")
if size, exists, err := regularFileSize(thumbPath); err != nil {
return Usage{}, err
} else if exists {
key := ref.DriveID + "\x00thumb\x00" + thumbPath
if !seen[key] {
driveUsage.ThumbnailBytes += size
seen[key] = true
for _, thumbPath := range mediaasset.ThumbnailPathCandidates(localDir, ref.ID) {
if size, exists, err := regularFileSize(thumbPath); err != nil {
return Usage{}, err
} else if exists {
key := ref.DriveID + "\x00thumb\x00" + thumbPath
if !seen[key] {
driveUsage.ThumbnailBytes += size
seen[key] = true
}
}
}
@@ -3,7 +3,10 @@ package storageusage
import (
"os"
"path/filepath"
"strings"
"testing"
"github.com/video-site/backend/internal/mediaasset"
)
func TestComputeCountsLocalThumbnailsAndTeasersByDrive(t *testing.T) {
@@ -13,6 +16,8 @@ func TestComputeCountsLocalThumbnailsAndTeasersByDrive(t *testing.T) {
}
writeSizedFile(t, filepath.Join(localDir, "thumbs", "video-a.jpg"), 3)
writeSizedFile(t, filepath.Join(localDir, "thumbs", "video-b.jpg"), 5)
longID := "localstorage-" + strings.Repeat("x", 240)
writeSizedFile(t, mediaasset.ThumbnailPath(localDir, longID), 13)
teaserA := filepath.Join(localDir, "video-a.mp4")
teaserB := filepath.Join(localDir, "video-b.mp4")
writeSizedFile(t, teaserA, 7)
@@ -24,6 +29,7 @@ func TestComputeCountsLocalThumbnailsAndTeasersByDrive(t *testing.T) {
{ID: "video-a", DriveID: "drive-a", PreviewLocal: teaserA},
{ID: "video-a-copy", DriveID: "drive-a", PreviewLocal: teaserA},
{ID: "video-b", DriveID: "drive-b", PreviewLocal: teaserB},
{ID: longID, DriveID: "drive-b"},
{ID: "outside", DriveID: "drive-b", PreviewLocal: outside},
{ID: "unknown-drive-video", DriveID: "missing", PreviewLocal: teaserB},
}, []string{"drive-a", "drive-b"}, func(string) (DiskStats, error) {
@@ -41,11 +47,11 @@ func TestComputeCountsLocalThumbnailsAndTeasersByDrive(t *testing.T) {
t.Fatalf("drive-a usage = %#v, want thumbnails=3 teaser=7 total=10", driveA)
}
driveB := got.Drives["drive-b"]
if driveB.ThumbnailBytes != 5 || driveB.TeaserBytes != 11 || driveB.TotalBytes != 16 {
t.Fatalf("drive-b usage = %#v, want thumbnails=5 teaser=11 total=16", driveB)
if driveB.ThumbnailBytes != 18 || driveB.TeaserBytes != 11 || driveB.TotalBytes != 29 {
t.Fatalf("drive-b usage = %#v, want thumbnails=18 teaser=11 total=29", driveB)
}
if got.ThumbnailBytes != 8 || got.TeaserBytes != 18 || got.TotalBytes != 26 {
t.Fatalf("totals = %#v, want thumbnails=8 teaser=18 total=26", got)
if got.ThumbnailBytes != 21 || got.TeaserBytes != 18 || got.TotalBytes != 39 {
t.Fatalf("totals = %#v, want thumbnails=21 teaser=18 total=39", got)
}
}
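The updated expectations above (drive-b thumbnails 5 + 13 = 18, overall 3 + 18 = 21) follow from counting every candidate thumbnail path once per drive. Below is a minimal sketch of that de-duplication pattern using in-memory fixtures. The names assetRef, thumbnailBytesByDrive, candidates, sizeOf and demoUsage are hypothetical stand-ins for illustration; they are not the real storageusage or mediaasset APIs, and the rule for long IDs is an assumption.

```go
package main

import "fmt"

// assetRef is a hypothetical stand-in for VideoAssetRef with only the
// fields this sketch needs.
type assetRef struct {
	ID      string
	DriveID string
}

// thumbnailBytesByDrive sums thumbnail sizes per drive. Each (drive, path)
// pair is counted at most once, even when several refs resolve to it.
func thumbnailBytesByDrive(
	refs []assetRef,
	candidates func(id string) []string, // hypothetical path resolver
	sizeOf func(path string) (int64, bool), // hypothetical stat helper
) map[string]int64 {
	out := make(map[string]int64)
	seen := make(map[string]bool)
	for _, ref := range refs {
		for _, path := range candidates(ref.ID) {
			size, exists := sizeOf(path)
			if !exists {
				continue
			}
			key := ref.DriveID + "\x00thumb\x00" + path
			if seen[key] {
				continue
			}
			seen[key] = true
			out[ref.DriveID] += size
		}
	}
	return out
}

// demoUsage wires the helper to in-memory fixtures that mirror the byte
// counts used in the test above.
func demoUsage() map[string]int64 {
	files := map[string]int64{
		"thumbs/video-a.jpg":     3,
		"thumbs/video-b.jpg":     5,
		"thumbs/hashed-long.jpg": 13,
	}
	// Assumption: long IDs get an extra, shortened candidate path.
	candidates := func(id string) []string {
		if len(id) > 32 {
			return []string{"thumbs/" + id + ".jpg", "thumbs/hashed-long.jpg"}
		}
		return []string{"thumbs/" + id + ".jpg"}
	}
	sizeOf := func(path string) (int64, bool) {
		size, ok := files[path]
		return size, ok
	}
	refs := []assetRef{
		{ID: "video-a", DriveID: "drive-a"},
		{ID: "video-a", DriveID: "drive-a"}, // duplicate ref, counted once
		{ID: "video-b", DriveID: "drive-b"},
		{ID: "an-identifier-that-is-longer-than-thirty-two-bytes", DriveID: "drive-b"},
	}
	return thumbnailBytesByDrive(refs, candidates, sizeOf)
}

func main() {
	usage := demoUsage()
	fmt.Println(usage["drive-a"], usage["drive-b"]) // prints "3 18"
}
```

Keying the seen map on drive ID plus path keeps two drives that share a path independent, while repeated refs on one drive do not inflate the total.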
+168
@@ -0,0 +1,168 @@
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package socks
import (
"context"
"errors"
"io"
"net"
"strconv"
"time"
)
var (
noDeadline = time.Time{}
aLongTimeAgo = time.Unix(1, 0)
)
func (d *Dialer) connect(ctx context.Context, c net.Conn, address string) (_ net.Addr, ctxErr error) {
host, port, err := splitHostPort(address)
if err != nil {
return nil, err
}
if deadline, ok := ctx.Deadline(); ok && !deadline.IsZero() {
c.SetDeadline(deadline)
defer c.SetDeadline(noDeadline)
}
if ctx != context.Background() {
errCh := make(chan error, 1)
done := make(chan struct{})
defer func() {
close(done)
if ctxErr == nil {
ctxErr = <-errCh
}
}()
go func() {
select {
case <-ctx.Done():
c.SetDeadline(aLongTimeAgo)
errCh <- ctx.Err()
case <-done:
errCh <- nil
}
}()
}
b := make([]byte, 0, 6+len(host)) // the size here is just an estimate
b = append(b, Version5)
if len(d.AuthMethods) == 0 || d.Authenticate == nil {
b = append(b, 1, byte(AuthMethodNotRequired))
} else {
ams := d.AuthMethods
if len(ams) > 255 {
return nil, errors.New("too many authentication methods")
}
b = append(b, byte(len(ams)))
for _, am := range ams {
b = append(b, byte(am))
}
}
if _, ctxErr = c.Write(b); ctxErr != nil {
return
}
if _, ctxErr = io.ReadFull(c, b[:2]); ctxErr != nil {
return
}
if b[0] != Version5 {
return nil, errors.New("unexpected protocol version " + strconv.Itoa(int(b[0])))
}
am := AuthMethod(b[1])
if am == AuthMethodNoAcceptableMethods {
return nil, errors.New("no acceptable authentication methods")
}
if d.Authenticate != nil {
if ctxErr = d.Authenticate(ctx, c, am); ctxErr != nil {
return
}
}
b = b[:0]
b = append(b, Version5, byte(d.cmd), 0)
if ip := net.ParseIP(host); ip != nil {
if ip4 := ip.To4(); ip4 != nil {
b = append(b, AddrTypeIPv4)
b = append(b, ip4...)
} else if ip6 := ip.To16(); ip6 != nil {
b = append(b, AddrTypeIPv6)
b = append(b, ip6...)
} else {
return nil, errors.New("unknown address type")
}
} else {
if len(host) > 255 {
return nil, errors.New("FQDN too long")
}
b = append(b, AddrTypeFQDN)
b = append(b, byte(len(host)))
b = append(b, host...)
}
b = append(b, byte(port>>8), byte(port))
if _, ctxErr = c.Write(b); ctxErr != nil {
return
}
if _, ctxErr = io.ReadFull(c, b[:4]); ctxErr != nil {
return
}
if b[0] != Version5 {
return nil, errors.New("unexpected protocol version " + strconv.Itoa(int(b[0])))
}
if cmdErr := Reply(b[1]); cmdErr != StatusSucceeded {
return nil, errors.New("unknown error " + cmdErr.String())
}
if b[2] != 0 {
return nil, errors.New("non-zero reserved field")
}
l := 2
var a Addr
switch b[3] {
case AddrTypeIPv4:
l += net.IPv4len
a.IP = make(net.IP, net.IPv4len)
case AddrTypeIPv6:
l += net.IPv6len
a.IP = make(net.IP, net.IPv6len)
case AddrTypeFQDN:
if _, err := io.ReadFull(c, b[:1]); err != nil {
return nil, err
}
l += int(b[0])
default:
return nil, errors.New("unknown address type " + strconv.Itoa(int(b[3])))
}
if cap(b) < l {
b = make([]byte, l)
} else {
b = b[:l]
}
if _, ctxErr = io.ReadFull(c, b); ctxErr != nil {
return
}
if a.IP != nil {
copy(a.IP, b)
} else {
a.Name = string(b[:len(b)-2])
}
a.Port = int(b[len(b)-2])<<8 | int(b[len(b)-1])
return &a, nil
}
func splitHostPort(address string) (string, int, error) {
host, port, err := net.SplitHostPort(address)
if err != nil {
return "", 0, err
}
portnum, err := strconv.Atoi(port)
if err != nil {
return "", 0, err
}
if 1 > portnum || portnum > 0xffff {
return "", 0, errors.New("port number out of range " + port)
}
return host, portnum, nil
}
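The second half of connect above assembles the SOCKS5 request: version, command, a reserved zero byte, an address-type tag, the address itself, and the port in big-endian order. As a rough illustration, the same layout can be built with the standard library alone. buildConnectRequest and the lower-case constants are names made up for this sketch; they are not part of the vendored package.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Wire constants from RFC 1928, redeclared locally so the sketch is
// self-contained.
const (
	version5     = 0x05
	cmdConnect   = 0x01
	addrTypeIPv4 = 0x01
	addrTypeFQDN = 0x03
	addrTypeIPv6 = 0x04
)

// buildConnectRequest returns the bytes of a SOCKS5 CONNECT request for
// host:port. IP literals are sent as raw addresses, anything else as a
// length-prefixed domain name.
func buildConnectRequest(host string, port int) ([]byte, error) {
	if port < 1 || port > 0xffff {
		return nil, errors.New("port number out of range")
	}
	b := []byte{version5, cmdConnect, 0} // third byte is the reserved field
	if ip := net.ParseIP(host); ip != nil {
		if ip4 := ip.To4(); ip4 != nil {
			b = append(b, addrTypeIPv4)
			b = append(b, ip4...)
		} else {
			b = append(b, addrTypeIPv6)
			b = append(b, ip.To16()...)
		}
	} else {
		if len(host) > 255 {
			return nil, errors.New("FQDN too long")
		}
		b = append(b, addrTypeFQDN, byte(len(host)))
		b = append(b, host...)
	}
	// Port is two bytes, most significant first.
	return append(b, byte(port>>8), byte(port)), nil
}

func main() {
	req, err := buildConnectRequest("example.com", 443)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", req)
}
```

Because a host name is sent as an FQDN rather than resolved locally, name resolution happens on the proxy side for such targets.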
+317
@@ -0,0 +1,317 @@
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Package socks provides a SOCKS version 5 client implementation.
//
// SOCKS protocol version 5 is defined in RFC 1928.
// Username/Password authentication for SOCKS version 5 is defined in
// RFC 1929.
package socks
import (
"context"
"errors"
"io"
"net"
"strconv"
)
// A Command represents a SOCKS command.
type Command int
func (cmd Command) String() string {
switch cmd {
case CmdConnect:
return "socks connect"
case cmdBind:
return "socks bind"
default:
return "socks " + strconv.Itoa(int(cmd))
}
}
// An AuthMethod represents a SOCKS authentication method.
type AuthMethod int
// A Reply represents a SOCKS command reply code.
type Reply int
func (code Reply) String() string {
switch code {
case StatusSucceeded:
return "succeeded"
case 0x01:
return "general SOCKS server failure"
case 0x02:
return "connection not allowed by ruleset"
case 0x03:
return "network unreachable"
case 0x04:
return "host unreachable"
case 0x05:
return "connection refused"
case 0x06:
return "TTL expired"
case 0x07:
return "command not supported"
case 0x08:
return "address type not supported"
default:
return "unknown code: " + strconv.Itoa(int(code))
}
}
// Wire protocol constants.
const (
Version5 = 0x05
AddrTypeIPv4 = 0x01
AddrTypeFQDN = 0x03
AddrTypeIPv6 = 0x04
CmdConnect Command = 0x01 // establishes an active-open forward proxy connection
cmdBind Command = 0x02 // establishes a passive-open forward proxy connection
AuthMethodNotRequired AuthMethod = 0x00 // no authentication required
AuthMethodUsernamePassword AuthMethod = 0x02 // use username/password
AuthMethodNoAcceptableMethods AuthMethod = 0xff // no acceptable authentication methods
StatusSucceeded Reply = 0x00
)
// An Addr represents a SOCKS-specific address.
// Either Name or IP is used exclusively.
type Addr struct {
Name string // fully-qualified domain name
IP net.IP
Port int
}
func (a *Addr) Network() string { return "socks" }
func (a *Addr) String() string {
if a == nil {
return "<nil>"
}
port := strconv.Itoa(a.Port)
if a.IP == nil {
return net.JoinHostPort(a.Name, port)
}
return net.JoinHostPort(a.IP.String(), port)
}
// A Conn represents a forward proxy connection.
type Conn struct {
net.Conn
boundAddr net.Addr
}
// BoundAddr returns the address assigned by the proxy server for
// connecting to the command target address from the proxy server.
func (c *Conn) BoundAddr() net.Addr {
if c == nil {
return nil
}
return c.boundAddr
}
// A Dialer holds SOCKS-specific options.
type Dialer struct {
cmd Command // either CmdConnect or cmdBind
proxyNetwork string // network between a proxy server and a client
proxyAddress string // proxy server address
// ProxyDial specifies the optional dial function for
// establishing the transport connection.
ProxyDial func(context.Context, string, string) (net.Conn, error)
// AuthMethods specifies the list of request authentication
// methods.
// If empty, SOCKS client requests only AuthMethodNotRequired.
AuthMethods []AuthMethod
// Authenticate specifies the optional authentication
// function. It must be non-nil when AuthMethods is not empty.
// It must return an error when authentication fails.
Authenticate func(context.Context, io.ReadWriter, AuthMethod) error
}
// DialContext connects to the provided address on the provided
// network.
//
// The returned error value may be a net.OpError. When the Op field of
// net.OpError contains "socks", the Source field contains a proxy
// server address and the Addr field contains a command target
// address.
//
// See func Dial of the net package of standard library for a
// description of the network and address parameters.
func (d *Dialer) DialContext(ctx context.Context, network, address string) (net.Conn, error) {
if err := d.validateTarget(network, address); err != nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
if ctx == nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: errors.New("nil context")}
}
var err error
var c net.Conn
if d.ProxyDial != nil {
c, err = d.ProxyDial(ctx, d.proxyNetwork, d.proxyAddress)
} else {
var dd net.Dialer
c, err = dd.DialContext(ctx, d.proxyNetwork, d.proxyAddress)
}
if err != nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
a, err := d.connect(ctx, c, address)
if err != nil {
c.Close()
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
return &Conn{Conn: c, boundAddr: a}, nil
}
// DialWithConn initiates a connection from SOCKS server to the target
// network and address using the connection c that is already
// connected to the SOCKS server.
//
// It returns the connection's local address assigned by the SOCKS
// server.
func (d *Dialer) DialWithConn(ctx context.Context, c net.Conn, network, address string) (net.Addr, error) {
if err := d.validateTarget(network, address); err != nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
if ctx == nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: errors.New("nil context")}
}
a, err := d.connect(ctx, c, address)
if err != nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
return a, nil
}
// Dial connects to the provided address on the provided network.
//
// Unlike DialContext, it returns a raw transport connection instead
// of a forward proxy connection.
//
// Deprecated: Use DialContext or DialWithConn instead.
func (d *Dialer) Dial(network, address string) (net.Conn, error) {
if err := d.validateTarget(network, address); err != nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
var err error
var c net.Conn
if d.ProxyDial != nil {
c, err = d.ProxyDial(context.Background(), d.proxyNetwork, d.proxyAddress)
} else {
c, err = net.Dial(d.proxyNetwork, d.proxyAddress)
}
if err != nil {
proxy, dst, _ := d.pathAddrs(address)
return nil, &net.OpError{Op: d.cmd.String(), Net: network, Source: proxy, Addr: dst, Err: err}
}
if _, err := d.DialWithConn(context.Background(), c, network, address); err != nil {
c.Close()
return nil, err
}
return c, nil
}
func (d *Dialer) validateTarget(network, address string) error {
switch network {
case "tcp", "tcp6", "tcp4":
default:
return errors.New("network not implemented")
}
switch d.cmd {
case CmdConnect, cmdBind:
default:
return errors.New("command not implemented")
}
return nil
}
func (d *Dialer) pathAddrs(address string) (proxy, dst net.Addr, err error) {
for i, s := range []string{d.proxyAddress, address} {
host, port, err := splitHostPort(s)
if err != nil {
return nil, nil, err
}
a := &Addr{Port: port}
a.IP = net.ParseIP(host)
if a.IP == nil {
a.Name = host
}
if i == 0 {
proxy = a
} else {
dst = a
}
}
return
}
// NewDialer returns a new Dialer that dials through the provided
// proxy server's network and address.
func NewDialer(network, address string) *Dialer {
return &Dialer{proxyNetwork: network, proxyAddress: address, cmd: CmdConnect}
}
const (
authUsernamePasswordVersion = 0x01
authStatusSucceeded = 0x00
)
// UsernamePassword are the credentials for the username/password
// authentication method.
type UsernamePassword struct {
Username string
Password string
}
// Authenticate authenticates a pair of username and password with the
// proxy server.
func (up *UsernamePassword) Authenticate(ctx context.Context, rw io.ReadWriter, auth AuthMethod) error {
switch auth {
case AuthMethodNotRequired:
return nil
case AuthMethodUsernamePassword:
if len(up.Username) == 0 || len(up.Username) > 255 || len(up.Password) > 255 {
return errors.New("invalid username/password")
}
b := []byte{authUsernamePasswordVersion}
b = append(b, byte(len(up.Username)))
b = append(b, up.Username...)
b = append(b, byte(len(up.Password)))
b = append(b, up.Password...)
// TODO(mikio): handle IO deadlines and cancelation if
// necessary
if _, err := rw.Write(b); err != nil {
return err
}
if _, err := io.ReadFull(rw, b[:2]); err != nil {
return err
}
if b[0] != authUsernamePasswordVersion {
return errors.New("invalid username/password version")
}
if b[1] != authStatusSucceeded {
return errors.New("username/password authentication failed")
}
return nil
}
return errors.New("unsupported authentication method " + strconv.Itoa(int(auth)))
}
+54
@@ -0,0 +1,54 @@
// Copyright 2019 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package proxy
import (
"context"
"net"
)
// A ContextDialer dials using a context.
type ContextDialer interface {
DialContext(ctx context.Context, network, address string) (net.Conn, error)
}
// Dial works like DialContext on net.Dialer but using a dialer returned by FromEnvironment.
//
// The passed ctx is only used for returning the Conn, not the lifetime of the Conn.
//
// Custom dialers (registered via RegisterDialerType) that do not implement ContextDialer
// can leak a goroutine for as long as it takes the underlying Dialer implementation to timeout.
//
// A Conn returned from a successful Dial after the context has been cancelled will be immediately closed.
func Dial(ctx context.Context, network, address string) (net.Conn, error) {
d := FromEnvironment()
if xd, ok := d.(ContextDialer); ok {
return xd.DialContext(ctx, network, address)
}
return dialContext(ctx, d, network, address)
}
// WARNING: this can leak a goroutine for as long as the underlying Dialer implementation takes to time out.
// A Conn returned from a successful Dial after the context has been cancelled will be immediately closed.
func dialContext(ctx context.Context, d Dialer, network, address string) (net.Conn, error) {
var (
conn net.Conn
done = make(chan struct{}, 1)
err error
)
go func() {
conn, err = d.Dial(network, address)
close(done)
if conn != nil && ctx.Err() != nil {
conn.Close()
}
}()
select {
case <-ctx.Done():
err = ctx.Err()
case <-done:
}
return conn, err
}
+31
@@ -0,0 +1,31 @@
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package proxy
import (
"context"
"net"
)
type direct struct{}
// Direct implements Dialer by making network connections directly using net.Dial or net.DialContext.
var Direct = direct{}
var (
_ Dialer = Direct
_ ContextDialer = Direct
)
// Dial directly invokes net.Dial with the supplied parameters.
func (direct) Dial(network, addr string) (net.Conn, error) {
return net.Dial(network, addr)
}
// DialContext instantiates a net.Dialer and invokes its DialContext receiver with the supplied parameters.
func (direct) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
var d net.Dialer
return d.DialContext(ctx, network, addr)
}
+151
@@ -0,0 +1,151 @@
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package proxy
import (
"context"
"net"
"strings"
)
// A PerHost directs connections to a default Dialer unless the host name
// requested matches one of a number of exceptions.
type PerHost struct {
def, bypass Dialer
bypassNetworks []*net.IPNet
bypassIPs []net.IP
bypassZones []string
bypassHosts []string
}
// NewPerHost returns a PerHost Dialer that directs connections to either
// defaultDialer or bypass, depending on whether the connection matches one of
// the configured rules.
func NewPerHost(defaultDialer, bypass Dialer) *PerHost {
return &PerHost{
def: defaultDialer,
bypass: bypass,
}
}
// Dial connects to the address addr on the given network through either
// defaultDialer or bypass.
func (p *PerHost) Dial(network, addr string) (c net.Conn, err error) {
host, _, err := net.SplitHostPort(addr)
if err != nil {
return nil, err
}
return p.dialerForRequest(host).Dial(network, addr)
}
// DialContext connects to the address addr on the given network through either
// defaultDialer or bypass.
func (p *PerHost) DialContext(ctx context.Context, network, addr string) (c net.Conn, err error) {
host, _, err := net.SplitHostPort(addr)
if err != nil {
return nil, err
}
d := p.dialerForRequest(host)
if x, ok := d.(ContextDialer); ok {
return x.DialContext(ctx, network, addr)
}
return dialContext(ctx, d, network, addr)
}
func (p *PerHost) dialerForRequest(host string) Dialer {
if ip := net.ParseIP(host); ip != nil {
for _, net := range p.bypassNetworks {
if net.Contains(ip) {
return p.bypass
}
}
for _, bypassIP := range p.bypassIPs {
if bypassIP.Equal(ip) {
return p.bypass
}
}
return p.def
}
for _, zone := range p.bypassZones {
if strings.HasSuffix(host, zone) {
return p.bypass
}
if host == zone[1:] {
// For a zone ".example.com", we match "example.com"
// too.
return p.bypass
}
}
for _, bypassHost := range p.bypassHosts {
if bypassHost == host {
return p.bypass
}
}
return p.def
}
// AddFromString parses a string that contains comma-separated values
// specifying hosts that should use the bypass proxy. Each value is either an
// IP address, a CIDR range, a zone (*.example.com) or a host name
// (localhost). A best effort is made to parse the string and errors are
// ignored.
func (p *PerHost) AddFromString(s string) {
hosts := strings.Split(s, ",")
for _, host := range hosts {
host = strings.TrimSpace(host)
if len(host) == 0 {
continue
}
if strings.Contains(host, "/") {
// We assume that it's a CIDR address like 127.0.0.0/8
if _, net, err := net.ParseCIDR(host); err == nil {
p.AddNetwork(net)
}
continue
}
if ip := net.ParseIP(host); ip != nil {
p.AddIP(ip)
continue
}
if strings.HasPrefix(host, "*.") {
p.AddZone(host[1:])
continue
}
p.AddHost(host)
}
}
// AddIP specifies an IP address that will use the bypass proxy. Note that
// this will only take effect if a literal IP address is dialed. A connection
// to a named host will never match an IP.
func (p *PerHost) AddIP(ip net.IP) {
p.bypassIPs = append(p.bypassIPs, ip)
}
// AddNetwork specifies an IP range that will use the bypass proxy. Note that
// this will only take effect if a literal IP address is dialed. A connection
// to a named host will never match.
func (p *PerHost) AddNetwork(net *net.IPNet) {
p.bypassNetworks = append(p.bypassNetworks, net)
}
// AddZone specifies a DNS suffix that will use the bypass proxy. A zone of
// "example.com" matches "example.com" and all of its subdomains.
func (p *PerHost) AddZone(zone string) {
zone = strings.TrimSuffix(zone, ".")
if !strings.HasPrefix(zone, ".") {
zone = "." + zone
}
p.bypassZones = append(p.bypassZones, zone)
}
// AddHost specifies a host name that will use the bypass proxy.
func (p *PerHost) AddHost(host string) {
host = strings.TrimSuffix(host, ".")
p.bypassHosts = append(p.bypassHosts, host)
}
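AddFromString and dialerForRequest above together implement NO_PROXY-style matching: each comma-separated entry is classified as a CIDR range, an IP literal, a "*." zone, or a plain host name, and a dialed host is then checked against the matching category. A condensed, standard-library-only sketch of the same classification follows. bypassRules, parseBypass and matches are hypothetical names, and the sketch returns a boolean instead of choosing between two dialers.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// bypassRules holds parsed NO_PROXY-style entries.
type bypassRules struct {
	networks []*net.IPNet
	ips      []net.IP
	zones    []string // stored with a leading dot, e.g. ".example.com"
	hosts    []string
}

// parseBypass classifies each comma-separated entry. Entries that fail to
// parse are ignored, matching the best-effort behaviour described above.
func parseBypass(s string) *bypassRules {
	r := &bypassRules{}
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		switch {
		case entry == "":
			// skip empty entries
		case strings.Contains(entry, "/"):
			if _, n, err := net.ParseCIDR(entry); err == nil {
				r.networks = append(r.networks, n)
			}
		case net.ParseIP(entry) != nil:
			r.ips = append(r.ips, net.ParseIP(entry))
		case strings.HasPrefix(entry, "*."):
			r.zones = append(r.zones, entry[1:])
		default:
			r.hosts = append(r.hosts, strings.TrimSuffix(entry, "."))
		}
	}
	return r
}

// matches reports whether host should bypass the proxy. IP literals are
// only compared against IP and CIDR rules, names only against zones and hosts.
func (r *bypassRules) matches(host string) bool {
	if ip := net.ParseIP(host); ip != nil {
		for _, n := range r.networks {
			if n.Contains(ip) {
				return true
			}
		}
		for _, b := range r.ips {
			if b.Equal(ip) {
				return true
			}
		}
		return false
	}
	for _, z := range r.zones {
		// ".example.com" matches subdomains and the bare "example.com".
		if strings.HasSuffix(host, z) || host == z[1:] {
			return true
		}
	}
	for _, h := range r.hosts {
		if h == host {
			return true
		}
	}
	return false
}

func main() {
	rules := parseBypass("127.0.0.0/8, *.example.com, localhost")
	for _, h := range []string{"127.0.0.1", "api.example.com", "example.com", "example.org"} {
		fmt.Println(h, rules.matches(h))
	}
}
```

As in the original, a named host is never resolved before matching, so an IP or CIDR rule only applies when a literal IP address is dialed.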
+149
@@ -0,0 +1,149 @@
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Package proxy provides support for a variety of protocols to proxy network
// data.
package proxy // import "golang.org/x/net/proxy"
import (
"errors"
"net"
"net/url"
"os"
"sync"
)
// A Dialer is a means to establish a connection.
// Custom dialers should also implement ContextDialer.
type Dialer interface {
// Dial connects to the given address via the proxy.
Dial(network, addr string) (c net.Conn, err error)
}
// Auth contains authentication parameters that specific Dialers may require.
type Auth struct {
User, Password string
}
// FromEnvironment returns the dialer specified by the proxy-related
// variables in the environment and makes underlying connections
// directly.
func FromEnvironment() Dialer {
return FromEnvironmentUsing(Direct)
}
// FromEnvironmentUsing returns the dialer specified by the proxy-related
// variables in the environment and makes underlying connections
// using the provided forwarding Dialer (for instance, a *net.Dialer
// with desired configuration).
func FromEnvironmentUsing(forward Dialer) Dialer {
allProxy := allProxyEnv.Get()
if len(allProxy) == 0 {
return forward
}
proxyURL, err := url.Parse(allProxy)
if err != nil {
return forward
}
proxy, err := FromURL(proxyURL, forward)
if err != nil {
return forward
}
noProxy := noProxyEnv.Get()
if len(noProxy) == 0 {
return proxy
}
perHost := NewPerHost(proxy, forward)
perHost.AddFromString(noProxy)
return perHost
}
// proxySchemes is a map from URL schemes to a function that creates a Dialer
// from a URL with such a scheme.
var proxySchemes map[string]func(*url.URL, Dialer) (Dialer, error)
// RegisterDialerType takes a URL scheme and a function to generate Dialers from
// a URL with that scheme and a forwarding Dialer. Registered schemes are used
// by FromURL.
func RegisterDialerType(scheme string, f func(*url.URL, Dialer) (Dialer, error)) {
if proxySchemes == nil {
proxySchemes = make(map[string]func(*url.URL, Dialer) (Dialer, error))
}
proxySchemes[scheme] = f
}
// FromURL returns a Dialer given a URL specification and an underlying
// Dialer for it to make network requests.
func FromURL(u *url.URL, forward Dialer) (Dialer, error) {
var auth *Auth
if u.User != nil {
auth = new(Auth)
auth.User = u.User.Username()
if p, ok := u.User.Password(); ok {
auth.Password = p
}
}
switch u.Scheme {
case "socks5", "socks5h":
addr := u.Hostname()
port := u.Port()
if port == "" {
port = "1080"
}
return SOCKS5("tcp", net.JoinHostPort(addr, port), auth, forward)
}
// If the scheme doesn't match any of the built-in schemes, see if it
// was registered by another package.
if proxySchemes != nil {
if f, ok := proxySchemes[u.Scheme]; ok {
return f(u, forward)
}
}
return nil, errors.New("proxy: unknown scheme: " + u.Scheme)
}
var (
allProxyEnv = &envOnce{
names: []string{"ALL_PROXY", "all_proxy"},
}
noProxyEnv = &envOnce{
names: []string{"NO_PROXY", "no_proxy"},
}
)
// envOnce looks up an environment variable (optionally by multiple
// names) once. It mitigates expensive lookups on some platforms
// (e.g. Windows).
// (Borrowed from net/http/transport.go)
type envOnce struct {
names []string
once sync.Once
val string
}
func (e *envOnce) Get() string {
e.once.Do(e.init)
return e.val
}
func (e *envOnce) init() {
for _, n := range e.names {
e.val = os.Getenv(n)
if e.val != "" {
return
}
}
}
// reset is used by tests
func (e *envOnce) reset() {
e.once = sync.Once{}
e.val = ""
}
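FromURL above accepts the socks5 and socks5h schemes and falls back to port 1080 when the proxy URL does not name one. The address-derivation step on its own can be sketched with net/url. socksProxyAddr is a hypothetical helper for illustration; unlike FromURL it ignores credentials and schemes registered through RegisterDialerType.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// socksProxyAddr extracts the host:port a SOCKS5 dialer would connect to
// from a proxy URL such as the value of ALL_PROXY.
func socksProxyAddr(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "socks5", "socks5h":
		port := u.Port()
		if port == "" {
			port = "1080" // default SOCKS port, as in FromURL above
		}
		return net.JoinHostPort(u.Hostname(), port), nil
	}
	return "", errors.New("unknown scheme: " + u.Scheme)
}

func main() {
	for _, raw := range []string{
		"socks5://user:pw@127.0.0.1",
		"socks5h://proxy.local:9050",
		"http://proxy.local:3128",
	} {
		addr, err := socksProxyAddr(raw)
		fmt.Printf("%s -> %q, err=%v\n", raw, addr, err)
	}
}
```

Using u.Hostname() together with net.JoinHostPort keeps IPv6 proxy addresses correctly bracketed.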
+42
@@ -0,0 +1,42 @@
// Copyright 2011 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package proxy
import (
"context"
"net"
"golang.org/x/net/internal/socks"
)
// SOCKS5 returns a Dialer that makes SOCKSv5 connections to the given
// address with an optional username and password.
// See RFC 1928 and RFC 1929.
func SOCKS5(network, address string, auth *Auth, forward Dialer) (Dialer, error) {
d := socks.NewDialer(network, address)
if forward != nil {
if f, ok := forward.(ContextDialer); ok {
d.ProxyDial = func(ctx context.Context, network string, address string) (net.Conn, error) {
return f.DialContext(ctx, network, address)
}
} else {
d.ProxyDial = func(ctx context.Context, network string, address string) (net.Conn, error) {
return dialContext(ctx, forward, network, address)
}
}
}
if auth != nil {
up := socks.UsernamePassword{
Username: auth.User,
Password: auth.Password,
}
d.AuthMethods = []socks.AuthMethod{
socks.AuthMethodNotRequired,
socks.AuthMethodUsernamePassword,
}
d.Authenticate = up.Authenticate
}
return d, nil
}
+2
@@ -67,6 +67,8 @@ github.com/skip2/go-qrcode/reedsolomon
golang.org/x/crypto/curve25519
# golang.org/x/net v0.27.0
## explicit; go 1.18
golang.org/x/net/internal/socks
golang.org/x/net/proxy
golang.org/x/net/publicsuffix
# golang.org/x/sys v0.30.0
## explicit; go 1.18
+26 -5
@@ -49,7 +49,7 @@ Common overrides:
FRONTEND_PORT=9191 Public web port
FRONTEND_HOST=0.0.0.0 Public web bind address
GO_VERSION=1.23.12
INSTALL_DEPS=0 Do not install missing Node/Go/ffmpeg
INSTALL_DEPS=0 Do not install missing Node/Go/ffmpeg/Python runtime deps
CONFIGURE_UFW=0 Do not open UFW port automatically
DEPLOY_USER=<user> Service user; defaults to sudo user or root
@@ -130,7 +130,25 @@ apt_install() {
export DEBIAN_FRONTEND=noninteractive
log "installing base packages"
apt-get update
apt-get install -y ca-certificates curl git ffmpeg openssl iproute2 build-essential
apt-get install -y ca-certificates curl git ffmpeg openssl iproute2 build-essential \
python3 python3-requests python3-bs4 python3-lxml python3-socks
}
verify_crawler_python_deps() {
command -v python3 >/dev/null 2>&1 || die "python3 is required for crawler scripts"
python3 - <<'PY' || die "missing Python modules for crawler scripts: requests, bs4, lxml, socks"
import importlib.util
import sys
missing = [
name
for name in ("requests", "bs4", "lxml", "socks")
if importlib.util.find_spec(name) is None
]
if missing:
print("missing Python modules: " + ", ".join(missing), file=sys.stderr)
sys.exit(1)
PY
}
install_node() {
@@ -182,6 +200,7 @@ install_dependencies() {
install_go
command -v ffmpeg >/dev/null 2>&1 || die "ffmpeg is required"
command -v ffprobe >/dev/null 2>&1 || die "ffprobe is required"
verify_crawler_python_deps
}
ensure_ownership() {
@@ -315,8 +334,8 @@ EOF
}
open_firewall_port() {
[[ "$CONFIGURE_UFW" == "1" ]] || return
command -v ufw >/dev/null 2>&1 || return
[[ "$CONFIGURE_UFW" == "1" ]] || return 0
command -v ufw >/dev/null 2>&1 || return 0
if ufw status 2>/dev/null | grep -qi "Status: active"; then
log "UFW is active; allowing ${FRONTEND_PORT}/tcp"
ufw allow "${FRONTEND_PORT}/tcp"
@@ -359,7 +378,9 @@ install_or_update() {
open_firewall_port
restart_services
show_status
[[ "$mode" == "install" ]] && show_summary
if [[ "$mode" == "install" ]]; then
show_summary
fi
}
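The `return 0` and `if` changes in this file appear to target the same shell pitfall: a function's exit status is that of its last command, so a guard whose test is false makes the function report failure. The sketch below reproduces both patterns with hypothetical stand-in functions. It assumes the installer runs under `set -e` (not visible in this excerpt), in which case a non-zero status from such a function would abort the script.

```shell
#!/bin/sh
# Hypothetical stand-in for the installer's show_summary.
show_summary() { echo "summary"; }

# Old pattern: a false test makes the and-list, and so the function, return 1.
finish_with_and() {
  [ "$1" = "install" ] && show_summary
}

# New pattern: an if with a false condition and no else branch returns 0.
finish_with_if() {
  if [ "$1" = "install" ]; then
    show_summary
  fi
}

# Old guard: a bare return reuses the status of the failed test.
guard_bare_return() {
  [ "$1" = "1" ] || return
  echo "configuring"
}

# New guard: skipping the work is reported as success.
guard_return_zero() {
  [ "$1" = "1" ] || return 0
  echo "configuring"
}

and_status=0;  finish_with_and update || and_status=$?
if_status=0;   finish_with_if update  || if_status=$?
bare_status=0; guard_bare_return 0    || bare_status=$?
zero_status=0; guard_return_zero 0    || zero_status=$?

echo "and-list=$and_status if-block=$if_status bare-return=$bare_status return-0=$zero_status"
# prints: and-list=1 if-block=0 bare-return=1 return-0=0
```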
uninstall_services() {
+9
@@ -0,0 +1,9 @@
services:
video-site-91:
image: ghcr.io/nianzhibai/91:stable
container_name: video-site-91
ports:
- "9191:9191"
volumes:
- ./data:/opt/video-site-91/data
restart: unless-stopped
+38
@@ -0,0 +1,38 @@
#!/bin/sh
set -eu
APP_DIR="/opt/video-site-91"
DATA_DIR="${VIDEO_DATA_DIR:-$APP_DIR/data}"
CONFIG="${VIDEO_CONFIG:-$DATA_DIR/config.yaml}"
EXAMPLE="$APP_DIR/config.example.yaml"
PORT="${VIDEO_LISTEN_PORT:-9191}"
mkdir -p "$DATA_DIR" "$DATA_DIR/previews" "$DATA_DIR/uploads" "$DATA_DIR/spider91"
if [ ! -f "$CONFIG" ]; then
if [ ! -f "$EXAMPLE" ]; then
echo "[entrypoint] missing config template: $EXAMPLE" >&2
exit 1
fi
mkdir -p "$(dirname "$CONFIG")"
cp "$EXAMPLE" "$CONFIG"
SECRET="$(openssl rand -hex 32)"
sed -i -E "s#^([[:space:]]*listen:[[:space:]]*).*\$#\1\"0.0.0.0:${PORT}\"#" "$CONFIG"
sed -i -E "s#^([[:space:]]*session_secret:[[:space:]]*).*\$#\1\"${SECRET}\"#" "$CONFIG"
sed -i -E "s#^([[:space:]]*db_path:[[:space:]]*).*\$#\1\"${DATA_DIR}/video-site.db\"#" "$CONFIG"
sed -i -E "s#^([[:space:]]*local_preview_dir:[[:space:]]*).*\$#\1\"${DATA_DIR}/previews\"#" "$CONFIG"
chmod 600 "$CONFIG"
echo "[entrypoint] generated $CONFIG"
else
echo "[entrypoint] using existing $CONFIG"
fi
if [ -n "${VIDEO_VERSION_FILE:-}" ] && [ -n "${VIDEO_IMAGE_VERSION:-}" ]; then
mkdir -p "$(dirname "$VIDEO_VERSION_FILE")"
printf '%s\n' "$VIDEO_IMAGE_VERSION" > "$VIDEO_VERSION_FILE"
fi
exec "$@"
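The entrypoint fills in the generated config by rewriting whole `key: value` lines with `sed`. A minimal sketch of the `listen` substitution follows, run against a hypothetical two-key template (the real `config.example.yaml` is not shown in this diff) and written to stdout instead of using `sed -i`.

```shell
#!/bin/sh
PORT=9191

# Hypothetical template content, for illustration only.
sample='server:
  listen: "127.0.0.1:8080"
  session_secret: "change-me"'

# Same expression as the entrypoint: keep the indentation and key (\1),
# replace everything after it with the quoted bind address.
rewritten="$(printf '%s\n' "$sample" \
  | sed -E "s#^([[:space:]]*listen:[[:space:]]*).*\$#\1\"0.0.0.0:${PORT}\"#")"

printf '%s\n' "$rewritten"
```

Using `#` as the delimiter avoids escaping the slashes in paths such as `$DATA_DIR`. A value containing `#` or `&` would still need escaping, which is not a concern for the hex output of `openssl rand -hex 32`.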
+1
@@ -2,6 +2,7 @@
<html lang="zh-CN">
<head>
<meta charset="UTF-8" />
<meta name="referrer" content="no-referrer" />
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=cover" />
<meta name="description" content="91 视频站" />
+443 -41
@@ -11,6 +11,12 @@ VERSION="${VERSION:-latest}"
GH_PROXY="${GH_PROXY:-}"
CONFIGURE_UFW="${CONFIGURE_UFW:-1}"
INSTALL_DEPS="${INSTALL_DEPS:-1}"
SELF_UPDATE="${SELF_UPDATE:-1}"
FORCE_UPDATE="${FORCE_UPDATE:-0}"
INSTALL_SCRIPT_REF="${INSTALL_SCRIPT_REF:-main}"
INSTALL_SCRIPT_URL="${INSTALL_SCRIPT_URL:-${GH_PROXY}https://raw.githubusercontent.com/${GITHUB_REPO}/${INSTALL_SCRIPT_REF}/install.sh}"
VIDEO_SITE_SKIP_SELF_UPDATE="${VIDEO_SITE_SKIP_SELF_UPDATE:-0}"
SERVICE_READY_TIMEOUT="${SERVICE_READY_TIMEOUT:-90}"
VERSION_FILE="$INSTALL_PATH/.version"
MANAGER_PATH="/usr/local/sbin/${APP_NAME}-manager"
COMMAND_LINK="/usr/local/bin/91"
@@ -47,7 +53,7 @@ Default action:
Actions:
install Install to $INSTALL_PATH
update Download latest release and replace program files, keeping config/data
update Refresh manager script, download latest release, and keep config/data
restart Restart service
stop Stop service
status Show service status
@@ -62,6 +68,12 @@ Options via environment:
GH_PROXY=$GH_PROXY
INSTALL_DEPS=$INSTALL_DEPS
CONFIGURE_UFW=$CONFIGURE_UFW
SELF_UPDATE=$SELF_UPDATE
FORCE_UPDATE=$FORCE_UPDATE
UNINSTALL_DELETE_FILES=0 Set to 1 for non-interactive uninstall to delete $INSTALL_PATH
INSTALL_SCRIPT_REF=$INSTALL_SCRIPT_REF
INSTALL_SCRIPT_URL=$INSTALL_SCRIPT_URL
SERVICE_READY_TIMEOUT=$SERVICE_READY_TIMEOUT
Examples:
sudo bash install.sh
@@ -110,6 +122,27 @@ asset_name() {
printf '%s-linux-%s.tar.gz' "$APP_NAME" "$ARCH"
}
verify_runtime_deps() {
local cmd
for cmd in curl tar ffmpeg ffprobe openssl python3; do
command -v "$cmd" >/dev/null 2>&1 || die "missing command: $cmd"
done
python3 - <<'PY' || die "missing Python modules for crawler scripts: requests, bs4, lxml, socks"
import importlib.util
import sys
missing = [
name
for name in ("requests", "bs4", "lxml", "socks")
if importlib.util.find_spec(name) is None
]
if missing:
print("missing Python modules: " + ", ".join(missing), file=sys.stderr)
sys.exit(1)
PY
}
install_deps() {
if [[ "$INSTALL_DEPS" != "1" ]]; then
return
@@ -118,13 +151,12 @@ install_deps() {
export DEBIAN_FRONTEND=noninteractive
log "installing runtime dependencies"
apt-get update
apt-get install -y ca-certificates curl tar ffmpeg openssl iproute2 python3 python3-requests python3-bs4 python3-lxml
apt-get install -y ca-certificates curl tar ffmpeg openssl iproute2 python3 python3-requests python3-bs4 python3-lxml python3-socks
verify_runtime_deps
return
fi
for cmd in curl tar ffmpeg ffprobe openssl; do
command -v "$cmd" >/dev/null 2>&1 || die "missing command: $cmd"
done
verify_runtime_deps
}
check_system() {
@@ -158,6 +190,30 @@ download_file() {
return 1
}
backup_install_files() {
local backup="$1"
mkdir -p "$backup"
cp -a "$INSTALL_PATH/server" "$backup/server"
for item in dist config.example.yaml 91VideoSpider config.yaml .version; do
if [[ -e "$INSTALL_PATH/$item" ]]; then
cp -a "$INSTALL_PATH/$item" "$backup/$item"
fi
done
}
restore_install_files() {
local backup="$1"
mkdir -p "$INSTALL_PATH"
cp -a "$backup/server" "$INSTALL_PATH/server"
for item in dist config.example.yaml 91VideoSpider config.yaml .version; do
rm -rf "${INSTALL_PATH:?}/$item"
if [[ -e "$backup/$item" ]]; then
cp -a "$backup/$item" "$INSTALL_PATH/$item"
fi
done
chmod +x "$INSTALL_PATH/server"
}
prepare_config() {
local cfg="$INSTALL_PATH/config.yaml"
local example="$INSTALL_PATH/config.example.yaml"
@@ -200,6 +256,8 @@ RestartSec=5
TimeoutStopSec=20
Environment=VIDEO_CONFIG=${INSTALL_PATH}/config.yaml
Environment=VIDEO_FRONTEND_DIR=${INSTALL_PATH}/dist
Environment=VIDEO_VERSION_FILE=${VERSION_FILE}
Environment=VIDEO_GITHUB_REPO=${GITHUB_REPO}
Environment=HOME=/root
Environment=PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
LimitNOFILE=65536
@@ -217,23 +275,297 @@ EOF
install_cli() {
local src
src="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/$(basename "${BASH_SOURCE[0]}")"
if [[ -f "$src" ]]; then
cp "$src" "$MANAGER_PATH"
chmod 755 "$MANAGER_PATH"
ln -sf "$MANAGER_PATH" "$COMMAND_LINK"
ln -sf "$MANAGER_PATH" "$APP_COMMAND_LINK"
install_cli_from_file "$src"
}
install_cli_from_file() {
local src="$1"
local tmp
[[ -f "$src" ]] || return 0
mkdir -p "$(dirname "$MANAGER_PATH")" "$(dirname "$COMMAND_LINK")" "$(dirname "$APP_COMMAND_LINK")"
tmp="${MANAGER_PATH}.tmp.$$"
cp "$src" "$tmp"
chmod 755 "$tmp"
mv "$tmp" "$MANAGER_PATH"
ln -sfn "$MANAGER_PATH" "$COMMAND_LINK"
ln -sfn "$MANAGER_PATH" "$APP_COMMAND_LINK"
}
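`install_cli_from_file` copies the new script to a temporary name next to the target and then renames it into place, presumably so that a manager script that is currently executing is never overwritten in place. A minimal sketch of that copy-then-rename step, using a hypothetical `atomic_install` helper and a scratch directory:

```shell
#!/bin/sh
# Hypothetical helper mirroring the cp + chmod + mv sequence.
atomic_install() {
  src="$1"
  dest="$2"
  tmp="${dest}.tmp.$$"
  # The rename swaps the directory entry in one step; readers see either
  # the old file or the complete new one, never a half-written copy.
  cp "$src" "$tmp" && chmod 755 "$tmp" && mv "$tmp" "$dest"
}

workdir="$(mktemp -d)"
printf '#!/bin/sh\necho v1\n' > "$workdir/manager"
printf '#!/bin/sh\necho v2\n' > "$workdir/new.sh"

atomic_install "$workdir/new.sh" "$workdir/manager"
result="$(sh "$workdir/manager")"
echo "$result"

rm -rf "$workdir"
```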
self_update_manager() {
[[ "$SELF_UPDATE" == "1" ]] || return 1
[[ "$VIDEO_SITE_SKIP_SELF_UPDATE" != "1" ]] || return 1
[[ -n "$INSTALL_SCRIPT_URL" ]] || return 1
local tmp
tmp="$(mktemp)"
log "checking latest manager script"
if ! download_file "$INSTALL_SCRIPT_URL" "$tmp"; then
warn "manager self-update skipped: cannot download $INSTALL_SCRIPT_URL"
rm -f "$tmp"
return 1
fi
if ! bash -n "$tmp"; then
warn "manager self-update skipped: downloaded script has syntax errors"
rm -f "$tmp"
return 1
fi
if [[ -f "$MANAGER_PATH" ]] && cmp -s "$tmp" "$MANAGER_PATH"; then
rm -f "$tmp"
return 1
fi
install_cli_from_file "$tmp"
rm -f "$tmp"
log "manager script updated"
return 0
}
exec_latest_manager_update() {
local env_args=(
"VIDEO_SITE_SKIP_SELF_UPDATE=1"
"APP_NAME=$APP_NAME"
"GITHUB_REPO=$GITHUB_REPO"
"INSTALL_PATH=$INSTALL_PATH"
"SERVICE_NAME=$SERVICE_NAME"
"VERSION=$VERSION"
"GH_PROXY=$GH_PROXY"
"CONFIGURE_UFW=$CONFIGURE_UFW"
"INSTALL_DEPS=$INSTALL_DEPS"
"SELF_UPDATE=$SELF_UPDATE"
"FORCE_UPDATE=$FORCE_UPDATE"
"INSTALL_SCRIPT_REF=$INSTALL_SCRIPT_REF"
"INSTALL_SCRIPT_URL=$INSTALL_SCRIPT_URL"
"SERVICE_READY_TIMEOUT=$SERVICE_READY_TIMEOUT"
)
if [[ -n "$FRONTEND_PORT_WAS_SET" ]]; then
env_args+=("FRONTEND_PORT=$FRONTEND_PORT")
fi
exec env "${env_args[@]}" bash "$MANAGER_PATH" update
}
open_firewall_port() {
[[ "$CONFIGURE_UFW" == "1" ]] || return
command -v ufw >/dev/null 2>&1 || return
[[ "$CONFIGURE_UFW" == "1" ]] || return 0
command -v ufw >/dev/null 2>&1 || return 0
if ufw status 2>/dev/null | grep -qi "Status: active"; then
log "allowing ${FRONTEND_PORT}/tcp in UFW"
ufw allow "${FRONTEND_PORT}/tcp"
fi
}
listen_port_from_config() {
local cfg="$INSTALL_PATH/config.yaml"
local listen="" port
if [[ -f "$cfg" ]]; then
listen="$(sed -nE 's/^[[:space:]]*listen:[[:space:]]*"?([^" #]+)"?.*/\1/p' "$cfg" | head -n1)"
fi
port="${listen##*:}"
if [[ "$port" =~ ^[0-9]+$ ]]; then
printf '%s' "$port"
return
fi
printf '%s' "$FRONTEND_PORT"
}
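`listen_port_from_config` takes the first `listen:` value, keeps the text after the last colon, and falls back to `$FRONTEND_PORT` when that text is not numeric. The sketch below applies the same `sed` expression to a single line passed as an argument. `port_from_listen_line` is a hypothetical name, and the numeric check uses a POSIX `case` instead of the `[[ =~ ]]` test in the script.

```shell
#!/bin/sh
# $1: a config line, $2: fallback port (both are illustrative inputs).
port_from_listen_line() {
  listen="$(printf '%s\n' "$1" \
    | sed -nE 's/^[[:space:]]*listen:[[:space:]]*"?([^" #]+)"?.*/\1/p' \
    | head -n1)"
  port="${listen##*:}"
  case "$port" in
    ''|*[!0-9]*) printf '%s' "$2" ;;   # empty or non-numeric: use fallback
    *)           printf '%s' "$port" ;;
  esac
}

quoted="$(port_from_listen_line '  listen: "0.0.0.0:8080"  # bind address' 9191)"
bare="$(port_from_listen_line 'listen: :9000' 9191)"
missing="$(port_from_listen_line 'db_path: "/data/video-site.db"' 9191)"
echo "$quoted $bare $missing"
# prints: 8080 9000 9191
```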
append_unique() {
local value="$1"
shift
for existing in "$@"; do
[[ "$existing" == "$value" ]] && return 1
done
printf '%s' "$value"
}
app_service_names() {
local names=()
local name
for name in "$SERVICE_NAME" "$APP_NAME" video-site-91 video-site-backend video-site-frontend; do
[[ -n "$name" ]] || continue
if append_unique "$name" "${names[@]}" >/dev/null; then
names+=("$name")
fi
done
printf '%s\n' "${names[@]}"
}
stop_app_services() {
local name unit
while IFS= read -r name; do
[[ -n "$name" ]] || continue
unit="${name}.service"
systemctl disable --now "$unit" 2>/dev/null || systemctl stop "$unit" 2>/dev/null || true
rm -f "/etc/systemd/system/$unit"
done < <(app_service_names)
systemctl daemon-reload
}
remove_app_containers() {
command -v docker >/dev/null 2>&1 || return 0
local names=()
local name
for name in "$SERVICE_NAME" "$APP_NAME" video-site-91; do
[[ -n "$name" ]] || continue
if append_unique "$name" "${names[@]}" >/dev/null; then
names+=("$name")
fi
done
for name in "${names[@]}"; do
if docker ps -a --format '{{.Names}}' 2>/dev/null | grep -Fxq "$name"; then
log "removing docker container $name"
docker rm -f "$name" >/dev/null 2>&1 || true
fi
done
}
pids_listening_on_port() {
local port="$1"
[[ "$port" =~ ^[0-9]+$ ]] || return 0
command -v ss >/dev/null 2>&1 || return 0
ss -ltnp 2>/dev/null \
| awk -v port="$port" '$4 ~ ":" port "$" {print}' \
| grep -oE 'pid=[0-9]+' \
| cut -d= -f2 \
| sort -u || true
}
process_looks_like_app() {
local pid="$1"
local exe="" cmd=""
exe="$(readlink "/proc/$pid/exe" 2>/dev/null || true)"
cmd="$(tr '\0' ' ' <"/proc/$pid/cmdline" 2>/dev/null || true)"
[[ "$exe" == "$INSTALL_PATH/server" ]] && return 0
[[ "$cmd" == *"$INSTALL_PATH"* ]] && return 0
[[ "$cmd" == *"VIDEO_FRONTEND_DIR=$INSTALL_PATH/dist"* ]] && return 0
[[ "$cmd" == *"VIDEO_CONFIG=$INSTALL_PATH/config.yaml"* ]] && return 0
[[ "$cmd" == *"video-site-91"* ]] && return 0
[[ "$cmd" == *"91VideoSpider"* ]] && return 0
return 1
}
stop_lingering_app_processes() {
local ports=("$@")
local port pid pids=()
for port in "${ports[@]}"; do
[[ "$port" =~ ^[0-9]+$ ]] || continue
while IFS= read -r pid; do
[[ -n "$pid" ]] || continue
process_looks_like_app "$pid" || continue
if append_unique "$pid" "${pids[@]}" >/dev/null; then
pids+=("$pid")
fi
done < <(pids_listening_on_port "$port")
done
if (( ${#pids[@]} == 0 )); then
return
fi
warn "stopping lingering app process(es): ${pids[*]}"
kill "${pids[@]}" 2>/dev/null || true
sleep 1
local alive=()
for pid in "${pids[@]}"; do
if kill -0 "$pid" 2>/dev/null; then
alive+=("$pid")
fi
done
if (( ${#alive[@]} > 0 )); then
warn "force killing lingering app process(es): ${alive[*]}"
kill -9 "${alive[@]}" 2>/dev/null || true
fi
}
warn_remaining_listeners() {
local ports=("$@")
local port pid cmd
for port in "${ports[@]}"; do
[[ "$port" =~ ^[0-9]+$ ]] || continue
while IFS= read -r pid; do
[[ -n "$pid" ]] || continue
cmd="$(tr '\0' ' ' <"/proc/$pid/cmdline" 2>/dev/null || true)"
warn "port $port is still listening after uninstall: pid=$pid ${cmd:-unknown}"
done < <(pids_listening_on_port "$port")
done
}
has_interactive_tty() {
[[ -t 0 ]]
}
confirm_uninstall_app() {
if ! has_interactive_tty; then
return 0
fi
local confirm=""
printf 'Uninstall 91? This stops the service, removes the management commands, and lets you choose whether to delete the project files. [y/N]: ' >/dev/tty
IFS= read -r confirm </dev/tty || confirm=""
case "$confirm" in
[yY]) return 0 ;;
*)
log "uninstall cancelled"
return 1
;;
esac
}
delete_install_path_requested() {
if [[ "${UNINSTALL_DELETE_FILES:-0}" == "1" ]]; then
return 0
fi
if ! has_interactive_tty; then
return 1
fi
local confirm=""
printf 'Delete the program, config, and data in %s? [y/N]: ' "$INSTALL_PATH" >/dev/tty
IFS= read -r confirm </dev/tty || confirm=""
case "$confirm" in
[yY]) return 0 ;;
*) return 1 ;;
esac
}
service_health_url() {
printf 'http://127.0.0.1:%s/admin/api/setup' "$(listen_port_from_config)"
}
wait_for_service_ready() {
local url deadline
url="$(service_health_url)"
deadline=$((SECONDS + SERVICE_READY_TIMEOUT))
log "waiting for service at $url"
while (( SECONDS < deadline )); do
if curl -fsS --connect-timeout 2 --max-time 5 "$url" >/dev/null 2>&1; then
log "service is ready"
return 0
fi
sleep 2
done
return 1
}
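`wait_for_service_ready` is a poll-until-deadline loop built on bash's `SECONDS` counter. A portable sketch of the same idea is shown below, with the probe passed in as a command. `wait_until` is a hypothetical name, and the deadline uses `date +%s`, which is widely available but an assumption about the target system.

```shell
#!/bin/sh
# Usage: wait_until <timeout-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once the deadline passes.
wait_until() {
  timeout="$1"
  shift
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

if wait_until 2 true; then echo "ready"; fi
if ! wait_until 1 false; then echo "timed out"; fi
```

With the probe used in the script, the call would look like `wait_until 90 curl -fsS --connect-timeout 2 --max-time 5 "$url"`, where `$url` is the health URL built by `service_health_url`.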
restart_service_ready() {
if systemctl restart "${SERVICE_NAME}.service" && wait_for_service_ready; then
return 0
fi
warn "service did not become ready; retrying restart"
if systemctl restart "${SERVICE_NAME}.service" && wait_for_service_ready; then
return 0
fi
warn "service failed to become ready"
systemctl --no-pager --full status "${SERVICE_NAME}.service" || true
journalctl -u "${SERVICE_NAME}.service" -n 80 --no-pager || true
return 1
}
fetch_and_unpack() {
local tmp archive url root
tmp="$(mktemp -d)"
@@ -271,19 +603,63 @@ fetch_and_unpack() {
rm -rf "$tmp"
}
current_version_from_github() {
installed_version() {
if [[ -f "$VERSION_FILE" ]]; then
head -n1 "$VERSION_FILE" 2>/dev/null | tr -d '\r'
fi
}
target_version() {
if [[ "$VERSION" != "latest" ]]; then
printf '%s' "$VERSION"
return
fi
curl -fsSL "https://api.github.com/repos/${GITHUB_REPO}/releases/latest" \
local body version effective_url
body="$(curl -fsSL \
-H "Accept: application/vnd.github+json" \
-H "User-Agent: video-site-91-installer" \
"https://api.github.com/repos/${GITHUB_REPO}/releases/latest" 2>/dev/null || true)"
version="$(printf '%s\n' "$body" \
| sed -nE 's/.*"tag_name"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p' \
| head -n1)"
if [[ -n "$version" ]]; then
printf '%s' "$version"
return
fi
effective_url="$(curl -fsSLI -o /dev/null -w '%{url_effective}' "$(download_base_url)/$(asset_name)" 2>/dev/null || true)"
printf '%s\n' "$effective_url" \
| sed -nE 's#.*/releases/download/([^/]+)/.*#\1#p' \
| head -n1
}
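`target_version` resolves the release tag in two ways: it first reads `tag_name` from the GitHub API response, and if that yields nothing it follows the download redirect and takes the tag from the final URL. The two `sed` expressions can be tried in isolation as below. The JSON body, the URL, and the helper names are made-up samples, not real API output.

```shell
#!/bin/sh
# Hypothetical wrappers around the two sed expressions used by target_version.
extract_tag_name() {
  printf '%s\n' "$1" \
    | sed -nE 's/.*"tag_name"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p' \
    | head -n1
}

tag_from_download_url() {
  printf '%s\n' "$1" \
    | sed -nE 's#.*/releases/download/([^/]+)/.*#\1#p' \
    | head -n1
}

# Sample inputs, for illustration only.
api_body='{"id": 1, "tag_name": "v1.2.3", "name": "example"}'
asset_url='https://github.com/example/repo/releases/download/v1.2.3/app-linux-amd64.tar.gz'

from_api="$(extract_tag_name "$api_body")"
from_url="$(tag_from_download_url "$asset_url")"
echo "$from_api $from_url"
# prints: v1.2.3 v1.2.3
```

Parsing JSON with `sed` keeps the installer free of a `jq` dependency, at the cost of relying on the `"tag_name": "..."` pair appearing on a single line.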
should_skip_update() {
[[ "$FORCE_UPDATE" != "1" ]] || return 1
local current target
current="$(installed_version)"
target="$(target_version || true)"
if [[ -z "$target" ]]; then
warn "cannot determine target version; continuing update"
return 1
fi
if [[ -z "$current" ]]; then
log "installed version: unknown"
log "target version: $target"
return 1
fi
log "installed version: $current"
log "target version: $target"
[[ "$current" == "$target" ]]
}
record_version() {
local version
version="$(current_version_from_github || true)"
version="$(target_version || true)"
[[ -n "$version" ]] || version="$VERSION"
{
echo "$version"
@@ -298,7 +674,7 @@ show_success() {
version="$(head -n1 "$VERSION_FILE" 2>/dev/null || echo unknown)"
echo
printf "${GREEN}Installation complete${RESET}\n"
printf '%bInstallation complete%b\n' "$GREEN" "$RESET"
echo "Version: $version"
[[ -n "$local_ip" ]] && echo "LAN: http://${local_ip}:${FRONTEND_PORT}/"
[[ -n "$public_ip" ]] && echo "Public: http://${public_ip}:${FRONTEND_PORT}/"
@@ -319,57 +695,79 @@ install_app() {
write_service
install_cli
open_firewall_port
restart_service_ready || die "service failed to start"
record_version
systemctl restart "${SERVICE_NAME}.service"
show_success
}
update_app() {
check_system
check_disk_space
install_deps
[[ -f "$INSTALL_PATH/server" ]] || die "not installed at $INSTALL_PATH"
if self_update_manager; then
log "re-running update with latest manager script"
exec_latest_manager_update
fi
install_deps
if should_skip_update; then
log "already up to date; skipped app update"
return 0
fi
check_disk_space
local backup
backup="$(mktemp -d)"
cp "$INSTALL_PATH/server" "$backup/server"
[[ -d "$INSTALL_PATH/dist" ]] && cp -R "$INSTALL_PATH/dist" "$backup/dist"
backup_install_files "$backup"
systemctl stop "${SERVICE_NAME}.service" 2>/dev/null || true
if ! fetch_and_unpack; then
if ! (fetch_and_unpack && prepare_config && write_service && install_cli); then
warn "update failed; restoring previous files"
cp "$backup/server" "$INSTALL_PATH/server"
rm -rf "$INSTALL_PATH/dist"
[[ -d "$backup/dist" ]] && cp -R "$backup/dist" "$INSTALL_PATH/dist"
restore_install_files "$backup"
systemctl start "${SERVICE_NAME}.service" 2>/dev/null || true
rm -rf "$backup"
exit 1
fi
prepare_config
write_service
install_cli
if ! restart_service_ready; then
warn "new version failed to start; restoring previous files"
restore_install_files "$backup"
restart_service_ready 2>/dev/null || true
rm -rf "$backup"
exit 1
fi
record_version
systemctl restart "${SERVICE_NAME}.service"
rm -rf "$backup"
log "updated"
}
uninstall_app() {
systemctl disable --now "${SERVICE_NAME}.service" 2>/dev/null || true
rm -f "/etc/systemd/system/${SERVICE_NAME}.service"
systemctl daemon-reload
local listen_port port ports=()
confirm_uninstall_app || return 1
listen_port="$(listen_port_from_config)"
for port in "$listen_port" "$FRONTEND_PORT" 9191 9192; do
[[ "$port" =~ ^[0-9]+$ ]] || continue
if append_unique "$port" "${ports[@]}" >/dev/null; then
ports+=("$port")
fi
done
stop_app_services
remove_app_containers
stop_lingering_app_processes "${ports[@]}"
rm -f "$COMMAND_LINK" "$APP_COMMAND_LINK" "$MANAGER_PATH"
if [[ -t 0 ]]; then
read -r -p "Delete the program, config, and data in $INSTALL_PATH? [y/N]: " confirm
case "$confirm" in
[yY]) rm -rf "$INSTALL_PATH" ;;
*) log "kept $INSTALL_PATH" ;;
esac
if delete_install_path_requested; then
rm -rf "$INSTALL_PATH"
log "removed $INSTALL_PATH"
else
log "removed service; kept $INSTALL_PATH"
log "kept $INSTALL_PATH"
fi
warn_remaining_listeners "${ports[@]}"
}
show_menu() {
@@ -399,7 +797,11 @@ show_menu() {
3) main update ;;
4) main restart ;;
5) main stop ;;
6) main uninstall ;;
6)
if main uninstall; then
exit 0
fi
;;
0) exit 0 ;;
*) echo "Invalid option" ;;
esac
@@ -430,7 +832,7 @@ main() {
;;
restart)
need_root "$@"
systemctl restart "${SERVICE_NAME}.service"
restart_service_ready || die "service failed to start"
;;
stop)
need_root "$@"
