feat(user): simplify user response structure in JSON output

Merge pull request #2524 from seefs001/fix/revert-model-ratio
fix: revert model ratio
2025-12-25 15:39:58 +08:00 · 2025-12-25 15:38:36 +08:00 · 2025-12-25 15:37:54 +08:00 · 2025-12-24 16:59:23 +08:00 · 2025-12-24 16:52:24 +08:00 · 2025-12-24 16:51:51 +08:00
133 changed files with 2407 additions and 966 deletions
@@ -16,6 +16,7 @@ new-api
 tiktoken_cache
 .eslintcache
 .gocache
+.gomodcache/
 .cache
 web/bun.lock

@@ -14,7 +14,7 @@ ENV GO111MODULE=on CGO_ENABLED=0
 ARG TARGETOS
 ARG TARGETARCH
 ENV GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH:-amd64}
-
+ENV GOEXPERIMENT=greenteagc

 WORKDIR /build

@@ -25,10 +25,11 @@ COPY . .
 COPY --from=builder /build/dist ./web/dist
 RUN go build -ldflags "-s -w -X 'github.com/QuantumNous/new-api/common.Version=$(cat VERSION)'" -o new-api

-FROM alpine
+FROM debian:bookworm-slim

-RUN apk upgrade --no-cache \
-    && apk add --no-cache ca-certificates tzdata \
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ca-certificates tzdata libasan8 wget \
+    && rm -rf /var/lib/apt/lists/* \
    && update-ca-certificates

 COPY --from=builder2 /build/new-api /
@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 After deployment is complete, visit `http://localhost:3000` to start using!

-📖 For more deployment methods, please refer to [Deployment Guide](https://docs.newapi.pro/installation)
+📖 For more deployment methods, please refer to [Deployment Guide](https://docs.newapi.pro/en/docs/installation)

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [Official Documentation](https://docs.newapi.pro/) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [Official Documentation](https://docs.newapi.pro/en/docs) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | Category | Link |
 |------|------|
-| 🚀 Deployment Guide | [Installation Documentation](https://docs.newapi.pro/installation) |
-| ⚙️ Environment Configuration | [Environment Variables](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 API Documentation | [API Documentation](https://docs.newapi.pro/api) |
-| ❓ FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 Deployment Guide | [Installation Documentation](https://docs.newapi.pro/en/docs/installation) |
+| ⚙️ Environment Configuration | [Environment Variables](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables) |
+| 📡 API Documentation | [API Documentation](https://docs.newapi.pro/en/docs/api) |
+| ❓ FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/en/docs/support/community-interaction) |

 ---

 ## ✨ Key Features

-> For detailed features, please refer to [Features Introduction](https://docs.newapi.pro/wiki/features-introduction)
+> For detailed features, please refer to [Features Introduction](https://docs.newapi.pro/en/docs/guide/wiki/basic-concepts/features-introduction)

 ### 🎨 Core Functions

@@ -201,11 +201,11 @@ docker run --name new-api -d --restart always \
 ### 🚀 Advanced Features

 **API Format Support:**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime) (including Azure)
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerank Models](https://docs.newapi.pro/api/jinaai-rerank) (Cohere, Jina)
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session) (including Azure)
+- ⚡ [Claude Messages](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/en/api/google-gemini-chat)
+- 🔄 [Rerank Models](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) (Cohere, Jina)

 **Intelligent Routing:**
 - ⚖️ Channel weighted random
@@ -238,6 +238,7 @@ docker run --name new-api -d --restart always \
 - `gemini-2.5-flash-nothinking` - Disable thinking mode
 - `gemini-2.5-pro-thinking` - Enable thinking mode
 - `gemini-2.5-pro-thinking-128` - Enable thinking mode with thinking budget of 128 tokens
+- You can also append `-low`, `-medium`, or `-high` to any Gemini model name to request the corresponding reasoning effort (no extra thinking-budget suffix needed).

 </details>

@@ -245,16 +246,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 Model Support

-> For details, please refer to [API Documentation - Relay Interface](https://docs.newapi.pro/api)
+> For details, please refer to [API Documentation - Relay Interface](https://docs.newapi.pro/en/docs/api)

 | Model Type | Description | Documentation |
 |---------|------|------|
 | 🤖 OpenAI GPTs | gpt-4-gizmo-* series | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Messages format | [Documentation](https://docs.newapi.pro/api/anthropic-chat) |
-| 🌐 Gemini | Google Gemini format | [Documentation](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://doc.newapi.pro/en/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://doc.newapi.pro/en/api/suno-music) |
+| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Messages format | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Google Gemini format | [Documentation](https://doc.newapi.pro/en/api/google-gemini-chat) |
 | 🔧 Dify | ChatFlow mode | - |
 | 🎯 Custom | Supports complete call address | - |

@@ -263,16 +264,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>View complete interface list</summary>

- [Chat Interface (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [Response Interface (Responses)](https://docs.newapi.pro/api/openai-responses)
- [Image Interface (Image)](https://docs.newapi.pro/api/openai-image)
- [Audio Interface (Audio)](https://docs.newapi.pro/api/openai-audio)
- [Video Interface (Video)](https://docs.newapi.pro/api/openai-video)
- [Embedding Interface (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [Rerank Interface (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [Realtime Conversation (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Claude Chat](https://docs.newapi.pro/api/anthropic-chat)
- [Google Gemini Chat](https://docs.newapi.pro/api/google-gemini-chat/)
+- [Chat Interface (Chat Completions)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-chat-completion)
+- [Response Interface (Responses)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- [Image Interface (Image)](https://docs.newapi.pro/en/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [Audio Interface (Audio)](https://docs.newapi.pro/en/docs/api/ai-model/audio/openai/create-transcription)
+- [Video Interface (Video)](https://docs.newapi.pro/en/docs/api/ai-model/videos/create-video-generation)
+- [Embedding Interface (Embeddings)](https://docs.newapi.pro/en/docs/api/ai-model/embeddings/create-embedding)
+- [Rerank Interface (Rerank)](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank)
+- [Realtime Conversation (Realtime)](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session)
+- [Claude Chat](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- [Google Gemini Chat](https://doc.newapi.pro/en/api/google-gemini-chat)

 </details>

@@ -304,10 +305,11 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Redis connection string | - |
 | `STREAMING_TIMEOUT` | Streaming timeout (seconds) | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | Max per-line buffer (MB) for the stream scanner; increase when upstream sends huge image/base64 payloads | `64` |
+| `MAX_REQUEST_BODY_MB` | Max request body size (MB, counted **after decompression**; prevents huge requests/zip bombs from exhausting memory). Exceeding it returns `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Azure API version | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | Error log switch | `false` |

-📖 **Complete configuration:** [Environment Variables Documentation](https://docs.newapi.pro/installation/environment-variables)
+📖 **Complete configuration:** [Environment Variables Documentation](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -409,10 +411,10 @@ docker run --name new-api -d --restart always \

 | Resource | Link |
 |------|------|
-| 📘 FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 Issue Feedback | [Issue Feedback](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 Complete Documentation | [Official Documentation](https://docs.newapi.pro/support) |
+| 📘 FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/en/docs/support/community-interaction) |
+| 🐛 Issue Feedback | [Issue Feedback](https://docs.newapi.pro/en/docs/support/feedback-issues) |
+| 📚 Complete Documentation | [Official Documentation](https://docs.newapi.pro/en/docs) |

 ### 🤝 Contribution Guide

@@ -441,7 +443,7 @@ Welcome all forms of contribution!

 If this project is helpful to you, welcome to give us a ⭐️ Star！

-**[Official Documentation](https://docs.newapi.pro/)** • **[Issue Feedback](https://github.com/Calcium-Ion/new-api/issues)** • **[Latest Release](https://github.com/Calcium-Ion/new-api/releases)**
+**[Official Documentation](https://docs.newapi.pro/en/docs)** • **[Issue Feedback](https://github.com/Calcium-Ion/new-api/issues)** • **[Latest Release](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>Built with ❤️ by QuantumNous</sub>

@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 Après le déploiement, visitez `http://localhost:3000` pour commencer à utiliser!

-📖 Pour plus de méthodes de déploiement, veuillez vous référer à [Guide de déploiement](https://docs.newapi.pro/installation)
+📖 Pour plus de méthodes de déploiement, veuillez vous référer à [Guide de déploiement](https://docs.newapi.pro/en/docs/installation)

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [Documentation officielle](https://docs.newapi.pro/) | [![Demander à DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [Documentation officielle](https://docs.newapi.pro/en/docs) | [![Demander à DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | Catégorie | Lien |
 |------|------|
-| 🚀 Guide de déploiement | [Documentation d'installation](https://docs.newapi.pro/installation) |
-| ⚙️ Configuration de l'environnement | [Variables d'environnement](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 Documentation de l'API | [Documentation de l'API](https://docs.newapi.pro/api) |
-| ❓ FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 Guide de déploiement | [Documentation d'installation](https://docs.newapi.pro/en/docs/installation) |
+| ⚙️ Configuration de l'environnement | [Variables d'environnement](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables) |
+| 📡 Documentation de l'API | [Documentation de l'API](https://docs.newapi.pro/en/docs/api) |
+| ❓ FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/en/docs/support/community-interaction) |

 ---

 ## ✨ Fonctionnalités clés

-> Pour les fonctionnalités détaillées, veuillez vous référer à [Présentation des fonctionnalités](https://docs.newapi.pro/wiki/features-introduction) |
+> Pour les fonctionnalités détaillées, veuillez vous référer à [Présentation des fonctionnalités](https://docs.newapi.pro/en/docs/guide/wiki/basic-concepts/features-introduction) |

 ### 🎨 Fonctions principales

@@ -200,11 +200,11 @@ docker run --name new-api -d --restart always \
 ### 🚀 Fonctionnalités avancées

 **Prise en charge des formats d'API:**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime) (y compris Azure)
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Modèles Rerank](https://docs.newapi.pro/api/jinaai-rerank) (Cohere, Jina)
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session) (y compris Azure)
+- ⚡ [Claude Messages](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/en/api/google-gemini-chat)
+- 🔄 [Modèles Rerank](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) (Cohere, Jina)

 **Routage intelligent:**
 - ⚖️ Sélection aléatoire pondérée des canaux
@@ -234,6 +234,7 @@ docker run --name new-api -d --restart always \
 - `gemini-2.5-flash-nothinking` - Désactiver le mode de pensée
 - `gemini-2.5-pro-thinking` - Activer le mode de pensée
 - `gemini-2.5-pro-thinking-128` - Activer le mode de pensée avec budget de pensée de 128 tokens
+- Vous pouvez également ajouter les suffixes `-low`, `-medium` ou `-high` aux modèles Gemini pour fixer le niveau d’effort de raisonnement (sans suffixe de budget supplémentaire).

 </details>

@@ -241,16 +242,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 Prise en charge des modèles

-> Pour les détails, veuillez vous référer à [Documentation de l'API - Interface de relais](https://docs.newapi.pro/api)
+> Pour les détails, veuillez vous référer à [Documentation de l'API - Interface de relais](https://docs.newapi.pro/en/docs/api)

 | Type de modèle | Description | Documentation |
 |---------|------|------|
 | 🤖 OpenAI GPTs | série gpt-4-gizmo-* | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Format Messages | [Documentation](https://docs.newapi.pro/api/anthropic-chat) |
-| 🌐 Gemini | Format Google Gemini | [Documentation](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://doc.newapi.pro/en/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://doc.newapi.pro/en/api/suno-music) |
+| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Format Messages | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Format Google Gemini | [Documentation](https://doc.newapi.pro/en/api/google-gemini-chat) |
 | 🔧 Dify | Mode ChatFlow | - |
 | 🎯 Personnalisé | Prise en charge de l'adresse d'appel complète | - |

@@ -259,16 +260,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>Voir la liste complète des interfaces</summary>

- [Interface de discussion (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [Interface de réponse (Responses)](https://docs.newapi.pro/api/openai-responses)
- [Interface d'image (Image)](https://docs.newapi.pro/api/openai-image)
- [Interface audio (Audio)](https://docs.newapi.pro/api/openai-audio)
- [Interface vidéo (Video)](https://docs.newapi.pro/api/openai-video)
- [Interface d'incorporation (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [Interface de rerank (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [Conversation en temps réel (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Discussion Claude](https://docs.newapi.pro/api/anthropic-chat)
- [Discussion Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
+- [Interface de discussion (Chat Completions)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-chat-completion)
+- [Interface de réponse (Responses)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- [Interface d'image (Image)](https://docs.newapi.pro/en/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [Interface audio (Audio)](https://docs.newapi.pro/en/docs/api/ai-model/audio/openai/create-transcription)
+- [Interface vidéo (Video)](https://docs.newapi.pro/en/docs/api/ai-model/videos/create-video-generation)
+- [Interface d'incorporation (Embeddings)](https://docs.newapi.pro/en/docs/api/ai-model/embeddings/create-embedding)
+- [Interface de rerank (Rerank)](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank)
+- [Conversation en temps réel (Realtime)](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session)
+- [Discussion Claude](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- [Discussion Google Gemini](https://doc.newapi.pro/en/api/google-gemini-chat)

 </details>

@@ -300,10 +301,11 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Chaine de connexion Redis | - |
 | `STREAMING_TIMEOUT` | Délai d'expiration du streaming (secondes) | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | Taille max du buffer par ligne (Mo) pour le scanner SSE ; à augmenter quand les sorties image/base64 sont très volumineuses (ex. images 4K) | `64` |
+| `MAX_REQUEST_BODY_MB` | Taille maximale du corps de requête (Mo, comptée **après décompression** ; évite les requêtes énormes/zip bombs qui saturent la mémoire). Dépassement ⇒ `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Version de l'API Azure | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | Interrupteur du journal d'erreurs | `false` |

-📖 **Configuration complète:** [Documentation des variables d'environnement](https://docs.newapi.pro/installation/environment-variables)
+📖 **Configuration complète:** [Documentation des variables d'environnement](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -403,10 +405,10 @@ docker run --name new-api -d --restart always \

 | Ressource | Lien |
 |------|------|
-| 📘 FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 Commentaires sur les problèmes | [Commentaires sur les problèmes](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 Documentation complète | [Documentation officielle](https://docs.newapi.pro/support) |
+| 📘 FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/en/docs/support/community-interaction) |
+| 🐛 Commentaires sur les problèmes | [Commentaires sur les problèmes](https://docs.newapi.pro/en/docs/support/feedback-issues) |
+| 📚 Documentation complète | [Documentation officielle](https://docs.newapi.pro/en/docs) |

 ### 🤝 Guide de contribution

@@ -435,7 +437,7 @@ Bienvenue à toutes les formes de contribution!

 Si ce projet vous est utile, bienvenue à nous donner une ⭐️ Étoile！

-**[Documentation officielle](https://docs.newapi.pro/)** • **[Commentaires sur les problèmes](https://github.com/Calcium-Ion/new-api/issues)** • **[Dernière version](https://github.com/Calcium-Ion/new-api/releases)**
+**[Documentation officielle](https://docs.newapi.pro/en/docs)** • **[Commentaires sur les problèmes](https://github.com/Calcium-Ion/new-api/issues)** • **[Dernière version](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>Construit avec ❤️ par QuantumNous</sub>

@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 デプロイが完了したら、`http://localhost:3000` にアクセスして使用を開始してください！

-📖 その他のデプロイ方法については[デプロイガイド](https://docs.newapi.pro/installation)を参照してください。
+📖 その他のデプロイ方法については[デプロイガイド](https://docs.newapi.pro/ja/docs/installation)を参照してください。

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [公式ドキュメント](https://docs.newapi.pro/) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [公式ドキュメント](https://docs.newapi.pro/ja/docs) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | カテゴリ | リンク |
 |------|------|
-| 🚀 デプロイガイド | [インストールドキュメント](https://docs.newapi.pro/installation) |
-| ⚙️ 環境設定 | [環境変数](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 APIドキュメント | [APIドキュメント](https://docs.newapi.pro/api) |
-| ❓ よくある質問 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 デプロイガイド | [インストールドキュメント](https://docs.newapi.pro/ja/docs/installation) |
+| ⚙️ 環境設定 | [環境変数](https://docs.newapi.pro/ja/docs/installation/config-maintenance/environment-variables) |
+| 📡 APIドキュメント | [APIドキュメント](https://docs.newapi.pro/ja/docs/api) |
+| ❓ よくある質問 | [FAQ](https://docs.newapi.pro/ja/docs/support/faq) |
+| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/ja/docs/support/community-interaction) |

 ---

 ## ✨ 主な機能

-> 詳細な機能については[機能説明](https://docs.newapi.pro/wiki/features-introduction)を参照してください。
+> 詳細な機能については[機能説明](https://docs.newapi.pro/ja/docs/guide/wiki/basic-concepts/features-introduction)を参照してください。

 ### 🎨 コア機能

@@ -202,15 +202,15 @@ docker run --name new-api -d --restart always \
 ### 🚀 高度な機能

 **APIフォーマットサポート:**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime)（Azureを含む）
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerankモデル](https://docs.newapi.pro/api/jinaai-rerank)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime)
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerankモデル](https://docs.newapi.pro/api/jinaai-rerank)（Cohere、Jina）
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/ja/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/ja/docs/api/ai-model/realtime/create-realtime-session)（Azureを含む）
+- ⚡ [Claude Messages](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/ja/api/google-gemini-chat)
+- 🔄 [Rerankモデル](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/ja/docs/api/ai-model/realtime/create-realtime-session)
+- ⚡ [Claude Messages](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/ja/api/google-gemini-chat)
+- 🔄 [Rerankモデル](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank)（Cohere、Jina）

 **インテリジェントルーティング:**
 - ⚖️ チャネル重み付けランダム
@@ -243,6 +243,7 @@ docker run --name new-api -d --restart always \
 - `gemini-2.5-flash-nothinking` - 思考モードを無効にする
 - `gemini-2.5-pro-thinking` - 思考モードを有効にする
 - `gemini-2.5-pro-thinking-128` - 思考モードを有効にし、思考予算を128トークンに設定する
+- Gemini モデル名の末尾に `-low` / `-medium` / `-high` を付けることで推論強度を直接指定できます（追加の思考予算サフィックスは不要です）。

 </details>

@@ -250,16 +251,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 モデルサポート

-> 詳細については[APIドキュメント - 中継インターフェース](https://docs.newapi.pro/api)
+> 詳細については[APIドキュメント - 中継インターフェース](https://docs.newapi.pro/ja/docs/api)

 | モデルタイプ | 説明 | ドキュメント |
 |---------|------|------|
 | 🤖 OpenAI GPTs | gpt-4-gizmo-* シリーズ | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [ドキュメント](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [ドキュメント](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere、Jina | [ドキュメント](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Messagesフォーマット | [ドキュメント](https://docs.newapi.pro/api/suno-music) |
-| 🌐 Gemini | Google Geminiフォーマット | [ドキュメント](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [ドキュメント](https://doc.newapi.pro/ja/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [ドキュメント](https://doc.newapi.pro/ja/api/suno-music) |
+| 🔄 Rerank | Cohere、Jina | [ドキュメント](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Messagesフォーマット | [ドキュメント](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Google Geminiフォーマット | [ドキュメント](https://doc.newapi.pro/ja/api/google-gemini-chat) |
 | 🔧 Dify | ChatFlowモード | - |
 | 🎯 カスタム | 完全な呼び出しアドレスの入力をサポート | - |

@@ -268,16 +269,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>完全なインターフェースリストを表示</summary>

- [チャットインターフェース (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [レスポンスインターフェース (Responses)](https://docs.newapi.pro/api/openai-responses)
- [イメージインターフェース (Image)](https://docs.newapi.pro/api/openai-image)
- [オーディオインターフェース (Audio)](https://docs.newapi.pro/api/openai-audio)
- [ビデオインターフェース (Video)](https://docs.newapi.pro/api/openai-video)
- [エンベッドインターフェース (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [再ランク付けインターフェース (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [リアルタイム対話インターフェース (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Claudeチャット](https://docs.newapi.pro/api/anthropic-chat)
- [Google Geminiチャット](https://docs.newapi.pro/api/google-gemini-chat/)
+- [チャットインターフェース (Chat Completions)](https://docs.newapi.pro/ja/docs/api/ai-model/chat/openai/create-chat-completion)
+- [レスポンスインターフェース (Responses)](https://docs.newapi.pro/ja/docs/api/ai-model/chat/openai/create-response)
+- [イメージインターフェース (Image)](https://docs.newapi.pro/ja/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [オーディオインターフェース (Audio)](https://docs.newapi.pro/ja/docs/api/ai-model/audio/openai/create-transcription)
+- [ビデオインターフェース (Video)](https://docs.newapi.pro/ja/docs/api/ai-model/videos/create-video-generation)
+- [エンベッドインターフェース (Embeddings)](https://docs.newapi.pro/ja/docs/api/ai-model/embeddings/create-embedding)
+- [再ランク付けインターフェース (Rerank)](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank)
+- [リアルタイム対話インターフェース (Realtime)](https://docs.newapi.pro/ja/docs/api/ai-model/realtime/create-realtime-session)
+- [Claudeチャット](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message)
+- [Google Geminiチャット](https://doc.newapi.pro/ja/api/google-gemini-chat)

 </details>

@@ -309,10 +310,11 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Redis接続文字列 | - |
 | `STREAMING_TIMEOUT` | ストリーミング応答のタイムアウト時間（秒） | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | ストリームスキャナの1行あたりバッファ上限（MB）。4K画像など巨大なbase64 `data:` ペイロードを扱う場合は値を増加させてください | `64` |
+| `MAX_REQUEST_BODY_MB` | リクエストボディ最大サイズ（MB、**解凍後**に計測。巨大リクエスト/zip bomb によるメモリ枯渇を防止）。超過時は `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Azure APIバージョン | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | エラーログスイッチ | `false` |

-📖 **完全な設定:** [環境変数ドキュメント](https://docs.newapi.pro/installation/environment-variables)
+📖 **完全な設定:** [環境変数ドキュメント](https://docs.newapi.pro/ja/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -412,10 +414,10 @@ docker run --name new-api -d --restart always \

 | リソース | リンク |
 |------|------|
-| 📘 よくある質問 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 問題のフィードバック | [問題フィードバック](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 完全なドキュメント | [公式ドキュメント](https://docs.newapi.pro/support) |
+| 📘 よくある質問 | [FAQ](https://docs.newapi.pro/ja/docs/support/faq) |
+| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/ja/docs/support/community-interaction) |
+| 🐛 問題のフィードバック | [問題フィードバック](https://docs.newapi.pro/ja/docs/support/feedback-issues) |
+| 📚 完全なドキュメント | [公式ドキュメント](https://docs.newapi.pro/ja/docs) |

 ### 🤝 貢献ガイド

@@ -444,7 +446,7 @@ docker run --name new-api -d --restart always \

 このプロジェクトがあなたのお役に立てたなら、ぜひ ⭐️ スターをください！

-**[公式ドキュメント](https://docs.newapi.pro/)** • **[問題フィードバック](https://github.com/Calcium-Ion/new-api/issues)** • **[最新リリース](https://github.com/Calcium-Ion/new-api/releases)**
+**[公式ドキュメント](https://docs.newapi.pro/ja/docs)** • **[問題フィードバック](https://github.com/Calcium-Ion/new-api/issues)** • **[最新リリース](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>❤️ で構築された QuantumNous</sub>

@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 部署完成后，访问 `http://localhost:3000` 即可使用！

-📖 更多部署方式请参考 [部署指南](https://docs.newapi.pro/installation)
+📖 更多部署方式请参考 [部署指南](https://docs.newapi.pro/zh/docs/installation)

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [官方文档](https://docs.newapi.pro/) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [官方文档](https://docs.newapi.pro/zh/docs) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | 分类 | 链接 |
 |------|------|
-| 🚀 部署指南 | [安装文档](https://docs.newapi.pro/installation) |
-| ⚙️ 环境配置 | [环境变量](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 接口文档 | [API 文档](https://docs.newapi.pro/api) |
-| ❓ 常见问题 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 部署指南 | [安装文档](https://docs.newapi.pro/zh/docs/installation) |
+| ⚙️ 环境配置 | [环境变量](https://docs.newapi.pro/zh/docs/installation/config-maintenance/environment-variables) |
+| 📡 接口文档 | [API 文档](https://docs.newapi.pro/zh/docs/api) |
+| ❓ 常见问题 | [FAQ](https://docs.newapi.pro/zh/docs/support/faq) |
+| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/zh/docs/support/community-interaction) |

 ---

 ## ✨ 主要特性

-> 详细特性请参考 [特性说明](https://docs.newapi.pro/wiki/features-introduction)
+> 详细特性请参考 [特性说明](https://docs.newapi.pro/zh/docs/guide/wiki/basic-concepts/features-introduction)

 ### 🎨 核心功能

@@ -202,11 +202,11 @@ docker run --name new-api -d --restart always \
 ### 🚀 高级功能

 **API 格式支持：**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime)（含 Azure）
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerank 模型](https://docs.newapi.pro/api/jinaai-rerank)（Cohere、Jina）
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/zh/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/zh/docs/api/ai-model/realtime/create-realtime-session)（含 Azure）
+- ⚡ [Claude Messages](https://docs.newapi.pro/zh/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/api/google-gemini-chat)
+- 🔄 [Rerank 模型](https://docs.newapi.pro/zh/docs/api/ai-model/rerank/create-rerank)（Cohere、Jina）

 **智能路由：**
 - ⚖️ 渠道加权随机
@@ -239,6 +239,7 @@ docker run --name new-api -d --restart always \
 - `gemini-2.5-flash-nothinking` - 禁用思考模式
 - `gemini-2.5-pro-thinking` - 启用思考模式
 - `gemini-2.5-pro-thinking-128` - 启用思考模式，并设置思考预算为128tokens
+- 也可以直接在 Gemini 模型名称后追加 `-low` / `-medium` / `-high` 来控制思考力度（无需再设置思考预算后缀）

 </details>

@@ -246,16 +247,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 模型支持

-> 详情请参考 [接口文档 - 中继接口](https://docs.newapi.pro/api)
+> 详情请参考 [接口文档 - 中继接口](https://docs.newapi.pro/zh/docs/api)

 | 模型类型 | 说明 | 文档 |
 |---------|------|------|
 | 🤖 OpenAI GPTs | gpt-4-gizmo-* 系列 | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [文档](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [文档](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere、Jina | [文档](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Messages 格式 | [文档](https://docs.newapi.pro/api/anthropic-chat) |
-| 🌐 Gemini | Google Gemini 格式 | [文档](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [文档](https://doc.newapi.pro/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [文档](https://doc.newapi.pro/api/suno-music) |
+| 🔄 Rerank | Cohere、Jina | [文档](https://docs.newapi.pro/zh/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Messages 格式 | [文档](https://docs.newapi.pro/zh/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Google Gemini 格式 | [文档](https://doc.newapi.pro/api/google-gemini-chat) |
 | 🔧 Dify | ChatFlow 模式 | - |
 | 🎯 自定义 | 支持完整调用地址 | - |

@@ -264,16 +265,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>查看完整接口列表</summary>

- [聊天接口 (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [响应接口 (Responses)](https://docs.newapi.pro/api/openai-responses)
- [图像接口 (Image)](https://docs.newapi.pro/api/openai-image)
- [音频接口 (Audio)](https://docs.newapi.pro/api/openai-audio)
- [视频接口 (Video)](https://docs.newapi.pro/api/openai-video)
- [嵌入接口 (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [重排序接口 (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [实时对话 (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Claude 聊天](https://docs.newapi.pro/api/anthropic-chat)
- [Google Gemini 聊天](https://docs.newapi.pro/api/google-gemini-chat)
+- [聊天接口 (Chat Completions)](https://docs.newapi.pro/zh/docs/api/ai-model/chat/openai/create-chat-completion)
+- [响应接口 (Responses)](https://docs.newapi.pro/zh/docs/api/ai-model/chat/openai/create-response)
+- [图像接口 (Image)](https://docs.newapi.pro/zh/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [音频接口 (Audio)](https://docs.newapi.pro/zh/docs/api/ai-model/audio/openai/create-transcription)
+- [视频接口 (Video)](https://docs.newapi.pro/zh/docs/api/ai-model/videos/create-video-generation)
+- [嵌入接口 (Embeddings)](https://docs.newapi.pro/zh/docs/api/ai-model/embeddings/create-embedding)
+- [重排序接口 (Rerank)](https://docs.newapi.pro/zh/docs/api/ai-model/rerank/create-rerank)
+- [实时对话 (Realtime)](https://docs.newapi.pro/zh/docs/api/ai-model/realtime/create-realtime-session)
+- [Claude 聊天](https://docs.newapi.pro/zh/docs/api/ai-model/chat/create-message)
+- [Google Gemini 聊天](https://doc.newapi.pro/api/google-gemini-chat)

 </details>

@@ -305,10 +306,11 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Redis 连接字符串                                                  | - |
 | `STREAMING_TIMEOUT` | 流式超时时间（秒）                                                    | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | 流式扫描器单行最大缓冲（MB），图像生成等超大 `data:` 片段（如 4K 图片 base64）需适当调大 | `64` |
+| `MAX_REQUEST_BODY_MB` | 请求体最大大小（MB，**解压后**计；防止超大请求/zip bomb 导致内存暴涨），超过将返回 `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Azure API 版本                                                 | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | 错误日志开关                                                       | `false` |

-📖 **完整配置：** [环境变量文档](https://docs.newapi.pro/installation/environment-variables)
+📖 **完整配置：** [环境变量文档](https://docs.newapi.pro/zh/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -410,10 +412,10 @@ docker run --name new-api -d --restart always \

 | 资源 | 链接 |
 |------|------|
-| 📘 常见问题 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 反馈问题 | [问题反馈](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 完整文档 | [官方文档](https://docs.newapi.pro/support) |
+| 📘 常见问题 | [FAQ](https://docs.newapi.pro/zh/docs/support/faq) |
+| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/zh/docs/support/community-interaction) |
+| 🐛 反馈问题 | [问题反馈](https://docs.newapi.pro/zh/docs/support/feedback-issues) |
+| 📚 完整文档 | [官方文档](https://docs.newapi.pro/zh/docs) |

 ### 🤝 贡献指南

@@ -442,7 +444,7 @@ docker run --name new-api -d --restart always \

 如果这个项目对你有帮助，欢迎给我们一个 ⭐️ Star！

-**[官方文档](https://docs.newapi.pro/)** • **[问题反馈](https://github.com/Calcium-Ion/new-api/issues)** • **[最新发布](https://github.com/Calcium-Ion/new-api/releases)**
+**[官方文档](https://docs.newapi.pro/zh/docs)** • **[问题反馈](https://github.com/Calcium-Ion/new-api/issues)** • **[最新发布](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>Built with ❤️ by QuantumNous</sub>

@@ -71,15 +71,66 @@ func getMP3Duration(r io.Reader) (float64, error) {

 // getWAVDuration 解析 WAV 文件头以获取时长。
 func getWAVDuration(r io.ReadSeeker) (float64, error) {
+	// 1. 强制复位指针
+	r.Seek(0, io.SeekStart)
+
 	dec := wav.NewDecoder(r)
+
+	// IsValidFile 会读取 fmt 块
 	if !dec.IsValidFile() {
 		return 0, errors.New("invalid wav file")
 	}
-	d, err := dec.Duration()
-	if err != nil {
-		return 0, errors.Wrap(err, "failed to get wav duration")
+
+	// 尝试寻找 data 块
+	if err := dec.FwdToPCM(); err != nil {
+		return 0, errors.Wrap(err, "failed to find PCM data chunk")
 	}
-	return d.Seconds(), nil
+
+	pcmSize := int64(dec.PCMSize)
+
+	// 如果读出来的 Size 是 0，尝试用文件大小反推
+	if pcmSize == 0 {
+		// 获取文件总大小
+		currentPos, _ := r.Seek(0, io.SeekCurrent) // 当前通常在 data chunk header 之后
+		endPos, _ := r.Seek(0, io.SeekEnd)
+		fileSize := endPos
+
+		// 恢复位置（虽然如果不继续读也没关系）
+		r.Seek(currentPos, io.SeekStart)
+
+		// 数据区大小 ≈ 文件总大小 - 当前指针位置(即Header大小)
+		// 注意：FwdToPCM 成功后，CurrentPos 应该刚好指向 Data 区数据的开始
+		// 或者是 Data Chunk ID + Size 之后。
+		// WAV Header 一般 44 字节。
+		if fileSize > 44 {
+			// 如果 FwdToPCM 成功，Reader 应该位于 data 块的数据起始处
+			// 所以剩余的所有字节理论上都是音频数据
+			pcmSize = fileSize - currentPos
+
+			// 简单的兜底：如果算出来还是负数或0，强制按文件大小-44计算
+			if pcmSize <= 0 {
+				pcmSize = fileSize - 44
+			}
+		}
+	}
+
+	numChans := int64(dec.NumChans)
+	bitDepth := int64(dec.BitDepth)
+	sampleRate := float64(dec.SampleRate)
+
+	if sampleRate == 0 || numChans == 0 || bitDepth == 0 {
+		return 0, errors.New("invalid wav header metadata")
+	}
+
+	bytesPerFrame := numChans * (bitDepth / 8)
+	if bytesPerFrame == 0 {
+		return 0, errors.New("invalid byte depth calculation")
+	}
+
+	totalFrames := pcmSize / bytesPerFrame
+
+	durationSeconds := float64(totalFrames) / sampleRate
+	return durationSeconds, nil
 }

 // getFLACDuration 解析 FLAC 文件的 STREAMINFO 块。
@@ -121,6 +121,9 @@ var BatchUpdateInterval int

 var RelayTimeout int // unit is second

+var RelayMaxIdleConns int
+var RelayMaxIdleConnsPerHost int
+
 var GeminiSafetySetting string

 // https://docs.cohere.com/docs/safety-modes Type; NONE/CONTEXTUAL/STRICT
@@ -32,7 +32,7 @@ func SendEmail(subject string, receiver string, content string) error {
 	}
 	encodedSubject := fmt.Sprintf("=?UTF-8?B?%s?=", base64.StdEncoding.EncodeToString([]byte(subject)))
 	mail := []byte(fmt.Sprintf("To: %s\r\n"+
-		"From: %s<%s>\r\n"+
+		"From: %s <%s>\r\n"+
 		"Subject: %s\r\n"+
 		"Date: %s\r\n"+
 		"Message-ID: %s\r\n"+ // 添加 Message-ID 头
@@ -2,7 +2,7 @@ package common

 import (
 	"bytes"
-	"errors"
+	"fmt"
 	"io"
 	"mime"
 	"mime/multipart"
@@ -12,24 +12,61 @@ import (
 	"time"

 	"github.com/QuantumNous/new-api/constant"
+	"github.com/pkg/errors"

 	"github.com/gin-gonic/gin"
 )

 const KeyRequestBody = "key_request_body"

-func GetRequestBody(c *gin.Context) ([]byte, error) {
-	requestBody, _ := c.Get(KeyRequestBody)
-	if requestBody != nil {
-		return requestBody.([]byte), nil
+var ErrRequestBodyTooLarge = errors.New("request body too large")
+
+func IsRequestBodyTooLargeError(err error) bool {
+	if err == nil {
+		return false
 	}
-	requestBody, err := io.ReadAll(c.Request.Body)
+	if errors.Is(err, ErrRequestBodyTooLarge) {
+		return true
+	}
+	var mbe *http.MaxBytesError
+	return errors.As(err, &mbe)
+}
+
+func GetRequestBody(c *gin.Context) ([]byte, error) {
+	cached, exists := c.Get(KeyRequestBody)
+	if exists && cached != nil {
+		if b, ok := cached.([]byte); ok {
+			return b, nil
+		}
+	}
+	maxMB := constant.MaxRequestBodyMB
+	if maxMB < 0 {
+		// no limit
+		body, err := io.ReadAll(c.Request.Body)
+		_ = c.Request.Body.Close()
+		if err != nil {
+			return nil, err
+		}
+		c.Set(KeyRequestBody, body)
+		return body, nil
+	}
+	maxBytes := int64(maxMB) << 20
+
+	limited := io.LimitReader(c.Request.Body, maxBytes+1)
+	body, err := io.ReadAll(limited)
 	if err != nil {
+		_ = c.Request.Body.Close()
+		if IsRequestBodyTooLargeError(err) {
+			return nil, errors.Wrap(ErrRequestBodyTooLarge, fmt.Sprintf("request body exceeds %d MB", maxMB))
+		}
 		return nil, err
 	}
 	_ = c.Request.Body.Close()
-	c.Set(KeyRequestBody, requestBody)
-	return requestBody.([]byte), nil
+	if int64(len(body)) > maxBytes {
+		return nil, errors.Wrap(ErrRequestBodyTooLarge, fmt.Sprintf("request body exceeds %d MB", maxMB))
+	}
+	c.Set(KeyRequestBody, body)
+	return body, nil
 }

 func UnmarshalBodyReusable(c *gin.Context, v any) error {
@@ -90,6 +90,8 @@ func InitEnv() {
 	SyncFrequency = GetEnvOrDefault("SYNC_FREQUENCY", 60)
 	BatchUpdateInterval = GetEnvOrDefault("BATCH_UPDATE_INTERVAL", 5)
 	RelayTimeout = GetEnvOrDefault("RELAY_TIMEOUT", 0)
+	RelayMaxIdleConns = GetEnvOrDefault("RELAY_MAX_IDLE_CONNS", 500)
+	RelayMaxIdleConnsPerHost = GetEnvOrDefault("RELAY_MAX_IDLE_CONNS_PER_HOST", 100)

 	// Initialize string variables with GetEnvOrDefaultString
 	GeminiSafetySetting = GetEnvOrDefaultString("GEMINI_SAFETY_SETTING", "BLOCK_NONE")
@@ -115,6 +117,8 @@ func initConstantEnv() {
 	constant.DifyDebug = GetEnvOrDefaultBool("DIFY_DEBUG", true)
 	constant.MaxFileDownloadMB = GetEnvOrDefault("MAX_FILE_DOWNLOAD_MB", 20)
 	constant.StreamScannerMaxBufferMB = GetEnvOrDefault("STREAM_SCANNER_MAX_BUFFER_MB", 64)
+	// MaxRequestBodyMB 请求体最大大小（解压后），用于防止超大请求/zip bomb导致内存暴涨
+	constant.MaxRequestBodyMB = GetEnvOrDefault("MAX_REQUEST_BODY_MB", 64)
 	// ForceStreamOption 覆盖请求参数，强制返回usage信息
 	constant.ForceStreamOption = GetEnvOrDefaultBool("FORCE_STREAM_OPTION", true)
 	constant.CountToken = GetEnvOrDefaultBool("CountToken", true)
@@ -129,6 +133,8 @@ func initConstantEnv() {
 	constant.GenerateDefaultToken = GetEnvOrDefaultBool("GENERATE_DEFAULT_TOKEN", false)
 	// 是否启用错误日志
 	constant.ErrorLogEnabled = GetEnvOrDefaultBool("ERROR_LOG_ENABLED", false)
+	// 任务轮询时查询的最大数量
+	constant.TaskQueryLimit = GetEnvOrDefault("TASK_QUERY_LIMIT", 1000)

 	soraPatchStr := GetEnvOrDefaultString("TASK_PRICE_PATCH", "")
 	if soraPatchStr != "" {
@@ -2,6 +2,15 @@ package common

 import "net"

+func IsIP(s string) bool {
+	ip := net.ParseIP(s)
+	return ip != nil
+}
+
+func ParseIP(s string) net.IP {
+	return net.ParseIP(s)
+}
+
 func IsPrivateIP(ip net.IP) bool {
 	if ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() {
 		return true
@@ -20,3 +29,23 @@ func IsPrivateIP(ip net.IP) bool {
 	}
 	return false
 }
+
+func IsIpInCIDRList(ip net.IP, cidrList []string) bool {
+	for _, cidr := range cidrList {
+		_, network, err := net.ParseCIDR(cidr)
+		if err != nil {
+			// 尝试作为单个IP处理
+			if whitelistIP := net.ParseIP(cidr); whitelistIP != nil {
+				if ip.Equal(whitelistIP) {
+					return true
+				}
+			}
+			continue
+		}
+
+		if network.Contains(ip) {
+			return true
+		}
+	}
+	return false
+}
@@ -23,11 +23,11 @@ func Marshal(v any) ([]byte, error) {
 }

 func GetJsonType(data json.RawMessage) string {
-	data = bytes.TrimSpace(data)
-	if len(data) == 0 {
+	trimmed := bytes.TrimSpace(data)
+	if len(trimmed) == 0 {
 		return "unknown"
 	}
-	firstChar := bytes.TrimSpace(data)[0]
+	firstChar := trimmed[0]
 	switch firstChar {
 	case '{':
 		return "object"
@@ -17,6 +17,13 @@ var (
 		"flux-",
 		"flux.1-",
 	}
+	OpenAITextModels = []string{
+		"gpt-",
+		"o1",
+		"o3",
+		"o4",
+		"chatgpt",
+	}
 )

 func IsOpenAIResponseOnlyModel(modelName string) bool {
@@ -40,3 +47,13 @@ func IsImageGenerationModel(modelName string) bool {
 	}
 	return false
 }
+
+func IsOpenAITextModel(modelName string) bool {
+	modelName = strings.ToLower(modelName)
+	for _, m := range OpenAITextModels {
+		if strings.Contains(modelName, m) {
+			return true
+		}
+	}
+	return false
+}
@@ -186,23 +186,7 @@ func isIPListed(ip net.IP, list []string) bool {
 		return false
 	}

-	for _, whitelistCIDR := range list {
-		_, network, err := net.ParseCIDR(whitelistCIDR)
-		if err != nil {
-			// 尝试作为单个IP处理
-			if whitelistIP := net.ParseIP(whitelistCIDR); whitelistIP != nil {
-				if ip.Equal(whitelistIP) {
-					return true
-				}
-			}
-			continue
-		}
-
-		if network.Contains(ip) {
-			return true
-		}
-	}
-	return false
+	return IsIpInCIDRList(ip, list)
 }

 // IsIPAccessAllowed 检查IP是否允许访问
@@ -3,12 +3,19 @@ package common
 import (
 	"encoding/base64"
 	"encoding/json"
-	"math/rand"
 	"net/url"
 	"regexp"
 	"strconv"
 	"strings"
 	"unsafe"
+
+	"github.com/samber/lo"
+)
+
+var (
+	maskURLPattern    = regexp.MustCompile(`(http|https)://[^\s/$.?#].[^\s]*`)
+	maskDomainPattern = regexp.MustCompile(`\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\b`)
+	maskIPPattern     = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
 )

 func GetStringIfEmpty(str string, defaultValue string) string {
@@ -19,12 +26,10 @@ func GetStringIfEmpty(str string, defaultValue string) string {
 }

 func GetRandomString(length int) string {
-	//rand.Seed(time.Now().UnixNano())
-	key := make([]byte, length)
-	for i := 0; i < length; i++ {
-		key[i] = keyChars[rand.Intn(len(keyChars))]
+	if length <= 0 {
+		return ""
 	}
-	return string(key)
+	return lo.RandomString(length, lo.AlphanumericCharset)
 }

 func MapToJsonStr(m map[string]interface{}) string {
@@ -170,8 +175,7 @@ func maskHostForPlainDomain(domain string) string {
 // api.openai.com -> ***.***.com
 func MaskSensitiveInfo(str string) string {
 	// Mask URLs
-	urlPattern := regexp.MustCompile(`(http|https)://[^\s/$.?#].[^\s]*`)
-	str = urlPattern.ReplaceAllStringFunc(str, func(urlStr string) string {
+	str = maskURLPattern.ReplaceAllStringFunc(str, func(urlStr string) string {
 		u, err := url.Parse(urlStr)
 		if err != nil {
 			return urlStr
@@ -224,14 +228,12 @@ func MaskSensitiveInfo(str string) string {
 	})

 	// Mask domain names without protocol (like openai.com, www.openai.com)
-	domainPattern := regexp.MustCompile(`\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\b`)
-	str = domainPattern.ReplaceAllStringFunc(str, func(domain string) string {
+	str = maskDomainPattern.ReplaceAllStringFunc(str, func(domain string) string {
 		return maskHostForPlainDomain(domain)
 	})

 	// Mask IP addresses
-	ipPattern := regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
-	str = ipPattern.ReplaceAllString(str, "***.***.***.***")
+	str = maskIPPattern.ReplaceAllString(str, "***.***.***.***")

 	return str
 }
@@ -217,11 +217,6 @@ func IntMax(a int, b int) int {
 	}
 }

-func IsIP(s string) bool {
-	ip := net.ParseIP(s)
-	return ip != nil
-}
-
 func GetUUID() string {
 	code := uuid.New().String()
 	code = strings.Replace(code, "-", "", -1)
@@ -3,8 +3,9 @@ package constant
 type ContextKey string

 const (
-	ContextKeyTokenCountMeta ContextKey = "token_count_meta"
-	ContextKeyPromptTokens   ContextKey = "prompt_tokens"
+	ContextKeyTokenCountMeta  ContextKey = "token_count_meta"
+	ContextKeyPromptTokens    ContextKey = "prompt_tokens"
+	ContextKeyEstimatedTokens ContextKey = "estimated_tokens"

 	ContextKeyOriginalModel    ContextKey = "original_model"
 	ContextKeyRequestStartTime ContextKey = "request_start_time"
@@ -17,6 +18,7 @@ const (
 	ContextKeyTokenSpecificChannelId ContextKey = "specific_channel_id"
 	ContextKeyTokenModelLimitEnabled ContextKey = "token_model_limit_enabled"
 	ContextKeyTokenModelLimit        ContextKey = "token_model_limit"
+	ContextKeyTokenCrossGroupRetry   ContextKey = "token_cross_group_retry"

 	/* channel related keys */
 	ContextKeyChannelId                ContextKey = "channel_id"
@@ -36,6 +38,10 @@ const (
 	ContextKeyChannelMultiKeyIndex     ContextKey = "channel_multi_key_index"
 	ContextKeyChannelKey               ContextKey = "channel_key"

+	ContextKeyAutoGroup           ContextKey = "auto_group"
+	ContextKeyAutoGroupIndex      ContextKey = "auto_group_index"
+	ContextKeyAutoGroupRetryIndex ContextKey = "auto_group_retry_index"
+
 	/* user related keys */
 	ContextKeyUserId      ContextKey = "id"
 	ContextKeyUserSetting ContextKey = "user_setting"
@@ -9,12 +9,14 @@ var CountToken bool
 var GetMediaToken bool
 var GetMediaTokenNotStream bool
 var UpdateTask bool
+var MaxRequestBodyMB int
 var AzureDefaultAPIVersion string
 var GeminiVisionMaxImageNum int
 var NotifyLimitCount int
 var NotificationLimitDurationMinute int
 var GenerateDefaultToken bool
 var ErrorLogEnabled bool
+var TaskQueryLimit int

 // temporary variable for sora patch, will be removed in future
 var TaskPricePatches []string
@@ -15,6 +15,7 @@ const (
 	TaskActionTextGenerate      = "textGenerate"
 	TaskActionFirstTailGenerate = "firstTailGenerate"
 	TaskActionReferenceGenerate = "referenceGenerate"
+	TaskActionRemix             = "remixGenerate"
 )

 var SunoModel2Action = map[string]string{
@@ -2,9 +2,9 @@ package controller

 import (
 	"github.com/QuantumNous/new-api/common"
-	"github.com/QuantumNous/new-api/dto"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/setting/operation_setting"
+	"github.com/QuantumNous/new-api/types"
 	"github.com/gin-gonic/gin"
 )

@@ -29,7 +29,7 @@ func GetSubscription(c *gin.Context) {
 		expiredTime = 0
 	}
 	if err != nil {
-		openAIError := dto.OpenAIError{
+		openAIError := types.OpenAIError{
 			Message: err.Error(),
 			Type:    "upstream_error",
 		}
@@ -81,7 +81,7 @@ func GetUsage(c *gin.Context) {
 		quota, err = model.GetUserUsedQuota(userId)
 	}
 	if err != nil {
-		openAIError := dto.OpenAIError{
+		openAIError := types.OpenAIError{
 			Message: err.Error(),
 			Type:    "new_api_error",
 		}
@@ -351,7 +351,7 @@ func testChannel(channel *model.Channel, testModel string, endpointType string)
 			newAPIError: types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError),
 		}
 	}
-	info.PromptTokens = usage.PromptTokens
+	info.SetEstimatePromptTokens(usage.PromptTokens)

 	quota := 0
 	if !priceData.UsePrice {
@@ -165,6 +165,30 @@ func GetAllChannels(c *gin.Context) {
 	return
 }

+func buildFetchModelsHeaders(channel *model.Channel, key string) (http.Header, error) {
+	var headers http.Header
+	switch channel.Type {
+	case constant.ChannelTypeAnthropic:
+		headers = GetClaudeAuthHeader(key)
+	default:
+		headers = GetAuthHeader(key)
+	}
+
+	headerOverride := channel.GetHeaderOverride()
+	for k, v := range headerOverride {
+		str, ok := v.(string)
+		if !ok {
+			return nil, fmt.Errorf("invalid header override for key %s", k)
+		}
+		if strings.Contains(str, "{api_key}") {
+			str = strings.ReplaceAll(str, "{api_key}", key)
+		}
+		headers.Set(k, str)
+	}
+
+	return headers, nil
+}
+
 func FetchUpstreamModels(c *gin.Context) {
 	id, err := strconv.Atoi(c.Param("id"))
 	if err != nil {
@@ -223,14 +247,13 @@ func FetchUpstreamModels(c *gin.Context) {
 	}
 	key = strings.TrimSpace(key)

-	// 获取响应体 - 根据渠道类型决定是否添加 AuthHeader
-	var body []byte
-	switch channel.Type {
-	case constant.ChannelTypeAnthropic:
-		body, err = GetResponseBody("GET", url, channel, GetClaudeAuthHeader(key))
-	default:
-		body, err = GetResponseBody("GET", url, channel, GetAuthHeader(key))
+	headers, err := buildFetchModelsHeaders(channel, key)
+	if err != nil {
+		common.ApiError(c, err)
+		return
 	}
+
+	body, err := GetResponseBody("GET", url, channel, headers)
 	if err != nil {
 		common.ApiError(c, err)
 		return
@@ -114,7 +114,7 @@ func DiscordOAuth(c *gin.Context) {
 		DiscordBind(c)
 		return
 	}
-		if !system_setting.GetDiscordSettings().Enabled {
+	if !system_setting.GetDiscordSettings().Enabled {
 		c.JSON(http.StatusOK, gin.H{
 			"success": false,
 			"message": "管理员未开启通过 Discord 登录以及注册",
@@ -18,6 +18,7 @@ import (
 	"github.com/QuantumNous/new-api/service"
 	"github.com/QuantumNous/new-api/setting/operation_setting"
 	"github.com/QuantumNous/new-api/setting/ratio_setting"
+	"github.com/QuantumNous/new-api/types"
 	"github.com/gin-gonic/gin"
 	"github.com/samber/lo"
 )
@@ -275,7 +276,7 @@ func RetrieveModel(c *gin.Context, modelType int) {
 			c.JSON(200, aiModel)
 		}
 	} else {
-		openAIError := dto.OpenAIError{
+		openAIError := types.OpenAIError{
 			Message: fmt.Sprintf("The model '%s' does not exist", modelId),
 			Type:    "invalid_request_error",
 			Param:   "model",
@@ -3,12 +3,10 @@ package controller
 import (
 	"errors"
 	"fmt"
-	"time"

-	"github.com/QuantumNous/new-api/common"
-	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/middleware"
 	"github.com/QuantumNous/new-api/model"
+	relaycommon "github.com/QuantumNous/new-api/relay/common"
 	"github.com/QuantumNous/new-api/types"

 	"github.com/gin-gonic/gin"
@@ -31,8 +29,11 @@ func Playground(c *gin.Context) {
 		return
 	}

-	group := common.GetContextKeyString(c, constant.ContextKeyUsingGroup)
-	modelName := c.GetString("original_model")
+	relayInfo, err := relaycommon.GenRelayInfo(c, types.RelayFormatOpenAI, nil, nil)
+	if err != nil {
+		newAPIError = types.NewError(err, types.ErrorCodeInvalidRequest, types.ErrOptionWithSkipRetry())
+		return
+	}

 	userId := c.GetInt("id")

@@ -46,16 +47,10 @@ func Playground(c *gin.Context) {

 	tempToken := &model.Token{
 		UserId: userId,
-		Name:   fmt.Sprintf("playground-%s", group),
-		Group:  group,
+		Name:   fmt.Sprintf("playground-%s", relayInfo.UsingGroup),
+		Group:  relayInfo.UsingGroup,
 	}
 	_ = middleware.SetupContextForToken(c, tempToken)
-	_, newAPIError = getChannel(c, group, modelName, 0)
-	if newAPIError != nil {
-		return
-	}
-	//middleware.SetupContextForSelectedChannel(c, channel, playgroundRequest.Model)
-	common.SetContextKey(c, constant.ContextKeyRequestStartTime, time.Now())

 	Relay(c, types.RelayFormatOpenAI)
 }
@@ -2,6 +2,7 @@ package controller

 import (
 	"bytes"
+	"errors"
 	"fmt"
 	"io"
 	"log"
@@ -64,8 +65,8 @@ func geminiRelayHandler(c *gin.Context, info *relaycommon.RelayInfo) *types.NewA
 func Relay(c *gin.Context, relayFormat types.RelayFormat) {

 	requestId := c.GetString(common.RequestIdKey)
-	group := common.GetContextKeyString(c, constant.ContextKeyUsingGroup)
-	originalModel := common.GetContextKeyString(c, constant.ContextKeyOriginalModel)
+	//group := common.GetContextKeyString(c, constant.ContextKeyUsingGroup)
+	//originalModel := common.GetContextKeyString(c, constant.ContextKeyOriginalModel)

 	var (
 		newAPIError *types.NewAPIError
@@ -104,7 +105,12 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {

 	request, err := helper.GetAndValidateRequest(c, relayFormat)
 	if err != nil {
-		newAPIError = types.NewError(err, types.ErrorCodeInvalidRequest)
+		// Map "request body too large" to 413 so clients can handle it correctly
+		if common.IsRequestBodyTooLargeError(err) || errors.Is(err, common.ErrRequestBodyTooLarge) {
+			newAPIError = types.NewErrorWithStatusCode(err, types.ErrorCodeReadRequestBodyFailed, http.StatusRequestEntityTooLarge, types.ErrOptionWithSkipRetry())
+		} else {
+			newAPIError = types.NewError(err, types.ErrorCodeInvalidRequest)
+		}
 		return
 	}

@@ -114,9 +120,17 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {
 		return
 	}

-	meta := request.GetTokenCountMeta()
+	needSensitiveCheck := setting.ShouldCheckPromptSensitive()
+	needCountToken := constant.CountToken
+	// Avoid building huge CombineText (strings.Join) when token counting and sensitive check are both disabled.
+	var meta *types.TokenCountMeta
+	if needSensitiveCheck || needCountToken {
+		meta = request.GetTokenCountMeta()
+	} else {
+		meta = fastTokenCountMetaForPricing(request)
+	}

-	if setting.ShouldCheckPromptSensitive() {
+	if needSensitiveCheck && meta != nil {
 		contains, words := service.CheckSensitiveText(meta.CombineText)
 		if contains {
 			logger.LogWarn(c, fmt.Sprintf("user sensitive words detected: %s", strings.Join(words, ", ")))
@@ -125,13 +139,13 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {
 		}
 	}

-	tokens, err := service.CountRequestToken(c, meta, relayInfo)
+	tokens, err := service.EstimateRequestToken(c, meta, relayInfo)
 	if err != nil {
 		newAPIError = types.NewError(err, types.ErrorCodeCountTokenFailed)
 		return
 	}

-	relayInfo.SetPromptTokens(tokens)
+	relayInfo.SetEstimatePromptTokens(tokens)

 	priceData, err := helper.ModelPriceHelper(c, relayInfo, tokens, meta)
 	if err != nil {
@@ -157,16 +171,32 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {
 		}
 	}()

-	for i := 0; i <= common.RetryTimes; i++ {
-		channel, err := getChannel(c, group, originalModel, i)
-		if err != nil {
-			logger.LogError(c, err.Error())
-			newAPIError = err
+	retryParam := &service.RetryParam{
+		Ctx:        c,
+		TokenGroup: relayInfo.TokenGroup,
+		ModelName:  relayInfo.OriginModelName,
+		Retry:      common.GetPointer(0),
+	}
+
+	for ; retryParam.GetRetry() <= common.RetryTimes; retryParam.IncreaseRetry() {
+		channel, channelErr := getChannel(c, relayInfo, retryParam)
+		if channelErr != nil {
+			logger.LogError(c, channelErr.Error())
+			newAPIError = channelErr
 			break
 		}

 		addUsedChannel(c, channel.Id)
-		requestBody, _ := common.GetRequestBody(c)
+		requestBody, bodyErr := common.GetRequestBody(c)
+		if bodyErr != nil {
+			// Ensure consistent 413 for oversized bodies even when error occurs later (e.g., retry path)
+			if common.IsRequestBodyTooLargeError(bodyErr) || errors.Is(bodyErr, common.ErrRequestBodyTooLarge) {
+				newAPIError = types.NewErrorWithStatusCode(bodyErr, types.ErrorCodeReadRequestBodyFailed, http.StatusRequestEntityTooLarge, types.ErrOptionWithSkipRetry())
+			} else {
+				newAPIError = types.NewErrorWithStatusCode(bodyErr, types.ErrorCodeReadRequestBodyFailed, http.StatusBadRequest, types.ErrOptionWithSkipRetry())
+			}
+			break
+		}
 		c.Request.Body = io.NopCloser(bytes.NewBuffer(requestBody))

 		switch relayFormat {
@@ -186,7 +216,7 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {

 		processChannelError(c, *types.NewChannelError(channel.Id, channel.Type, channel.Name, channel.ChannelInfo.IsMultiKey, common.GetContextKeyString(c, constant.ContextKeyChannelKey), channel.GetAutoBan()), newAPIError)

-		if !shouldRetry(c, newAPIError, common.RetryTimes-i) {
+		if !shouldRetry(c, newAPIError, common.RetryTimes-retryParam.GetRetry()) {
 			break
 		}
 	}
@@ -211,8 +241,35 @@ func addUsedChannel(c *gin.Context, channelId int) {
 	c.Set("use_channel", useChannel)
 }

-func getChannel(c *gin.Context, group, originalModel string, retryCount int) (*model.Channel, *types.NewAPIError) {
-	if retryCount == 0 {
+func fastTokenCountMetaForPricing(request dto.Request) *types.TokenCountMeta {
+	if request == nil {
+		return &types.TokenCountMeta{}
+	}
+	meta := &types.TokenCountMeta{
+		TokenType: types.TokenTypeTokenizer,
+	}
+	switch r := request.(type) {
+	case *dto.GeneralOpenAIRequest:
+		if r.MaxCompletionTokens > r.MaxTokens {
+			meta.MaxTokens = int(r.MaxCompletionTokens)
+		} else {
+			meta.MaxTokens = int(r.MaxTokens)
+		}
+	case *dto.OpenAIResponsesRequest:
+		meta.MaxTokens = int(r.MaxOutputTokens)
+	case *dto.ClaudeRequest:
+		meta.MaxTokens = int(r.MaxTokens)
+	case *dto.ImageRequest:
+		// Pricing for image requests depends on ImagePriceRatio; safe to compute even when CountToken is disabled.
+		return r.GetTokenCountMeta()
+	default:
+		// Best-effort: leave CombineText empty to avoid large allocations.
+	}
+	return meta
+}
+
+func getChannel(c *gin.Context, info *relaycommon.RelayInfo, retryParam *service.RetryParam) (*model.Channel, *types.NewAPIError) {
+	if info.ChannelMeta == nil {
 		autoBan := c.GetBool("auto_ban")
 		autoBanInt := 1
 		if !autoBan {
@@ -225,14 +282,18 @@ func getChannel(c *gin.Context, group, originalModel string, retryCount int) (*m
 			AutoBan: &autoBanInt,
 		}, nil
 	}
-	channel, selectGroup, err := service.CacheGetRandomSatisfiedChannel(c, group, originalModel, retryCount)
+	channel, selectGroup, err := service.CacheGetRandomSatisfiedChannel(retryParam)
+
+	info.PriceData.GroupRatioInfo = helper.HandleGroupRatio(c, info)
+
 	if err != nil {
-		return nil, types.NewError(fmt.Errorf("获取分组 %s 下模型 %s 的可用渠道失败（retry）: %s", selectGroup, originalModel, err.Error()), types.ErrorCodeGetChannelFailed, types.ErrOptionWithSkipRetry())
+		return nil, types.NewError(fmt.Errorf("获取分组 %s 下模型 %s 的可用渠道失败（retry）: %s", selectGroup, info.OriginModelName, err.Error()), types.ErrorCodeGetChannelFailed, types.ErrOptionWithSkipRetry())
 	}
 	if channel == nil {
-		return nil, types.NewError(fmt.Errorf("分组 %s 下模型 %s 的可用渠道不存在（retry）", selectGroup, originalModel), types.ErrorCodeGetChannelFailed, types.ErrOptionWithSkipRetry())
+		return nil, types.NewError(fmt.Errorf("分组 %s 下模型 %s 的可用渠道不存在（retry）", selectGroup, info.OriginModelName), types.ErrorCodeGetChannelFailed, types.ErrOptionWithSkipRetry())
 	}
-	newAPIError := middleware.SetupContextForSelectedChannel(c, channel, originalModel)
+
+	newAPIError := middleware.SetupContextForSelectedChannel(c, channel, info.OriginModelName)
 	if newAPIError != nil {
 		return nil, newAPIError
 	}
@@ -285,7 +346,7 @@ func processChannelError(c *gin.Context, channelError types.ChannelError, err *t
 	logger.LogError(c, fmt.Sprintf("channel error (channel #%d, status code: %d): %s", channelError.ChannelId, err.StatusCode, err.Error()))
 	// 不要使用context获取渠道信息，异步处理时可能会出现渠道信息不一致的情况
 	// do not use context to get channel info, there may be inconsistent channel info when processing asynchronously
-	if service.ShouldDisableChannel(channelError.ChannelId, err) && channelError.AutoBan {
+	if service.ShouldDisableChannel(channelError.ChannelType, err) && channelError.AutoBan {
 		gopool.Go(func() {
 			service.DisableChannel(channelError, err.Error())
 		})
@@ -366,7 +427,7 @@ func RelayMidjourney(c *gin.Context) {
 }

 func RelayNotImplemented(c *gin.Context) {
-	err := dto.OpenAIError{
+	err := types.OpenAIError{
 		Message: "API not implemented",
 		Type:    "new_api_error",
 		Param:   "",
@@ -378,7 +439,7 @@ func RelayNotImplemented(c *gin.Context) {
 }

 func RelayNotFound(c *gin.Context) {
-	err := dto.OpenAIError{
+	err := types.OpenAIError{
 		Message: fmt.Sprintf("Invalid URL (%s %s)", c.Request.Method, c.Request.URL.Path),
 		Type:    "invalid_request_error",
 		Param:   "",
@@ -392,8 +453,6 @@ func RelayNotFound(c *gin.Context) {
 func RelayTask(c *gin.Context) {
 	retryTimes := common.RetryTimes
 	channelId := c.GetInt("channel_id")
-	group := c.GetString("group")
-	originalModel := c.GetString("original_model")
 	c.Set("use_channel", []string{fmt.Sprintf("%d", channelId)})
 	relayInfo, err := relaycommon.GenRelayInfo(c, types.RelayFormatTask, nil, nil)
 	if err != nil {
@@ -403,8 +462,14 @@ func RelayTask(c *gin.Context) {
 	if taskErr == nil {
 		retryTimes = 0
 	}
-	for i := 0; shouldRetryTaskRelay(c, channelId, taskErr, retryTimes) && i < retryTimes; i++ {
-		channel, newAPIError := getChannel(c, group, originalModel, i)
+	retryParam := &service.RetryParam{
+		Ctx:        c,
+		TokenGroup: relayInfo.TokenGroup,
+		ModelName:  relayInfo.OriginModelName,
+		Retry:      common.GetPointer(0),
+	}
+	for ; shouldRetryTaskRelay(c, channelId, taskErr, retryTimes) && retryParam.GetRetry() < retryTimes; retryParam.IncreaseRetry() {
+		channel, newAPIError := getChannel(c, relayInfo, retryParam)
 		if newAPIError != nil {
 			logger.LogError(c, fmt.Sprintf("CacheGetRandomSatisfiedChannel failed: %s", newAPIError.Error()))
 			taskErr = service.TaskErrorWrapperLocal(newAPIError.Err, "get_channel_failed", http.StatusInternalServerError)
@@ -414,10 +479,18 @@ func RelayTask(c *gin.Context) {
 		useChannel := c.GetStringSlice("use_channel")
 		useChannel = append(useChannel, fmt.Sprintf("%d", channelId))
 		c.Set("use_channel", useChannel)
-		logger.LogInfo(c, fmt.Sprintf("using channel #%d to retry (remain times %d)", channel.Id, i))
+		logger.LogInfo(c, fmt.Sprintf("using channel #%d to retry (remain times %d)", channel.Id, retryParam.GetRetry()))
 		//middleware.SetupContextForSelectedChannel(c, channel, originalModel)

-		requestBody, _ := common.GetRequestBody(c)
+		requestBody, err := common.GetRequestBody(c)
+		if err != nil {
+			if common.IsRequestBodyTooLargeError(err) || errors.Is(err, common.ErrRequestBodyTooLarge) {
+				taskErr = service.TaskErrorWrapperLocal(err, "read_request_body_failed", http.StatusRequestEntityTooLarge)
+			} else {
+				taskErr = service.TaskErrorWrapperLocal(err, "read_request_body_failed", http.StatusBadRequest)
+			}
+			break
+		}
 		c.Request.Body = io.NopCloser(bytes.NewBuffer(requestBody))
 		taskErr = taskRelayHandler(c, relayInfo)
 	}
@@ -29,7 +29,7 @@ func UpdateTaskBulk() {
 		time.Sleep(time.Duration(15) * time.Second)
 		common.SysLog("任务进度轮询开始")
 		ctx := context.TODO()
-		allTasks := model.GetAllUnFinishSyncTasks(500)
+		allTasks := model.GetAllUnFinishSyncTasks(constant.TaskQueryLimit)
 		platformTask := make(map[constant.TaskPlatform][]*model.Task)
 		for _, t := range allTasks {
 			platformTask[t.Platform] = append(platformTask[t.Platform], t)
@@ -88,7 +88,7 @@ func UpdateSunoTaskAll(ctx context.Context, taskChannelM map[int][]string, taskM
 	for channelId, taskIds := range taskChannelM {
 		err := updateSunoTaskAll(ctx, channelId, taskIds, taskM)
 		if err != nil {
-			logger.LogError(ctx, fmt.Sprintf("渠道 #%d 更新异步任务失败: %d", channelId, err.Error()))
+			logger.LogError(ctx, fmt.Sprintf("渠道 #%d 更新异步任务失败: %s", channelId, err.Error()))
 		}
 	}
 	return nil
@@ -116,9 +116,10 @@ func updateSunoTaskAll(ctx context.Context, channelId int, taskIds []string, tas
 	if adaptor == nil {
 		return errors.New("adaptor not found")
 	}
+	proxy := channel.GetSetting().Proxy
 	resp, err := adaptor.FetchTask(*channel.BaseURL, channel.Key, map[string]any{
 		"ids": taskIds,
-	})
+	}, proxy)
 	if err != nil {
 		common.SysLog(fmt.Sprintf("Get Task Do req error: %v", err))
 		return err
@@ -140,7 +141,7 @@ func updateSunoTaskAll(ctx context.Context, channelId int, taskIds []string, tas
 		return err
 	}
 	if !responseItems.IsSuccess() {
-		common.SysLog(fmt.Sprintf("渠道 #%d 未完成的任务有: %d, 成功获取到任务数: %d", channelId, len(taskIds), string(responseBody)))
+		common.SysLog(fmt.Sprintf("渠道 #%d 未完成的任务有: %d, 成功获取到任务数: %s", channelId, len(taskIds), string(responseBody)))
 		return err
 	}

@@ -67,6 +67,7 @@ func updateVideoSingleTask(ctx context.Context, adaptor channel.TaskAdaptor, cha
 	if channel.GetBaseURL() != "" {
 		baseURL = channel.GetBaseURL()
 	}
+	proxy := channel.GetSetting().Proxy

 	task := taskM[taskId]
 	if task == nil {
@@ -76,7 +77,7 @@ func updateVideoSingleTask(ctx context.Context, adaptor channel.TaskAdaptor, cha
 	resp, err := adaptor.FetchTask(baseURL, channel.Key, map[string]any{
 		"task_id": taskId,
 		"action":  task.Action,
-	})
+	}, proxy)
 	if err != nil {
 		return fmt.Errorf("fetchTask failed for task %s: %w", taskId, err)
 	}
@@ -142,7 +142,7 @@ func AddToken(c *gin.Context) {
 		common.ApiError(c, err)
 		return
 	}
-	if len(token.Name) > 30 {
+	if len(token.Name) > 50 {
 		c.JSON(http.StatusOK, gin.H{
 			"success": false,
 			"message": "令牌名称过长",
@@ -171,6 +171,7 @@ func AddToken(c *gin.Context) {
 		ModelLimits:        token.ModelLimits,
 		AllowIps:           token.AllowIps,
 		Group:              token.Group,
+		CrossGroupRetry:    token.CrossGroupRetry,
 	}
 	err = cleanToken.Insert()
 	if err != nil {
@@ -208,7 +209,7 @@ func UpdateToken(c *gin.Context) {
 		common.ApiError(c, err)
 		return
 	}
-	if len(token.Name) > 30 {
+	if len(token.Name) > 50 {
 		c.JSON(http.StatusOK, gin.H{
 			"success": false,
 			"message": "令牌名称过长",
@@ -248,6 +249,7 @@ func UpdateToken(c *gin.Context) {
 		cleanToken.ModelLimits = token.ModelLimits
 		cleanToken.AllowIps = token.AllowIps
 		cleanToken.Group = token.Group
+		cleanToken.CrossGroupRetry = token.CrossGroupRetry
 	}
 	err = cleanToken.Update()
 	if err != nil {
@@ -7,12 +7,12 @@ import (
 	"encoding/hex"
 	"encoding/json"
 	"fmt"
-	"io"
-	"log"
-	"net/http"
 	"github.com/QuantumNous/new-api/common"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/setting"
+	"io"
+	"log"
+	"net/http"
 	"time"

 	"github.com/gin-gonic/gin"
@@ -110,18 +110,17 @@ func setupLogin(user *model.User, c *gin.Context) {
 		})
 		return
 	}
-	cleanUser := model.User{
-		Id:          user.Id,
-		Username:    user.Username,
-		DisplayName: user.DisplayName,
-		Role:        user.Role,
-		Status:      user.Status,
-		Group:       user.Group,
-	}
 	c.JSON(http.StatusOK, gin.H{
 		"message": "",
 		"success": true,
-		"data":    cleanUser,
+		"data": map[string]any{
+			"id":           user.Id,
+			"username":     user.Username,
+			"display_name": user.DisplayName,
+			"role":         user.Role,
+			"status":       user.Status,
+			"group":        user.Group,
+		},
 	})
 }

@@ -1,6 +1,7 @@
 package controller

 import (
+	"context"
 	"fmt"
 	"io"
 	"net/http"
@@ -10,6 +11,7 @@ import (
 	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/logger"
 	"github.com/QuantumNous/new-api/model"
+	"github.com/QuantumNous/new-api/service"

 	"github.com/gin-gonic/gin"
 )
@@ -75,11 +77,22 @@ func VideoProxy(c *gin.Context) {
 	}

 	var videoURL string
-	client := &http.Client{
-		Timeout: 60 * time.Second,
+	proxy := channel.GetSetting().Proxy
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		logger.LogError(c.Request.Context(), fmt.Sprintf("Failed to create proxy client for task %s: %s", taskID, err.Error()))
+		c.JSON(http.StatusInternalServerError, gin.H{
+			"error": gin.H{
+				"message": "Failed to create proxy client",
+				"type":    "server_error",
+			},
+		})
+		return
 	}

-	req, err := http.NewRequestWithContext(c.Request.Context(), http.MethodGet, "", nil)
+	ctx, cancel := context.WithTimeout(c.Request.Context(), 60*time.Second)
+	defer cancel()
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "", nil)
 	if err != nil {
 		logger.LogError(c.Request.Context(), fmt.Sprintf("Failed to create request: %s", err.Error()))
 		c.JSON(http.StatusInternalServerError, gin.H{
@@ -35,10 +35,11 @@ func getGeminiVideoURL(channel *model.Channel, task *model.Task, apiKey string)
 		return "", fmt.Errorf("api key not available for task")
 	}

+	proxy := channel.GetSetting().Proxy
 	resp, err := adaptor.FetchTask(baseURL, apiKey, map[string]any{
 		"task_id": task.TaskID,
 		"action":  task.Action,
-	})
+	}, proxy)
 	if err != nil {
 		return "", fmt.Errorf("fetch task failed: %w", err)
 	}
@@ -16,13 +16,13 @@
      "name": "OpenAI格式(Responses)"
    },
    {
-      "name": "图片生成/编辑(Images)"
+      "name": "图片生成"
    },
    {
-      "name": "图片生成/编辑(Images)/OpenAI兼容格式"
+      "name": "图片生成/OpenAI兼容格式"
    },
    {
-      "name": "图片生成/编辑(Images)/Qwen千问"
+      "name": "图片生成/Qwen千问"
    },
    {
      "name": "视频生成"
@@ -292,7 +292,7 @@
        "description": " 百炼qwen-image系列图片生成",
        "operationId": "createImage",
        "tags": [
-          "图片生成/编辑(Images)/Qwen千问"
+          "图片生成/Qwen千问"
        ],
        "parameters": [],
        "requestBody": {
@@ -408,7 +408,7 @@
        "description": " 百炼qwen-image系列图片生成",
        "operationId": "createImage",
        "tags": [
-          "图片生成/编辑(Images)/Qwen千问"
+          "图片生成/Qwen千问"
        ],
        "parameters": [],
        "requestBody": {
@@ -2,6 +2,7 @@ package dto

 import (
 	"encoding/json"
+	"strings"

 	"github.com/QuantumNous/new-api/types"

@@ -24,11 +25,14 @@ func (r *AudioRequest) GetTokenCountMeta() *types.TokenCountMeta {
 		CombineText: r.Input,
 		TokenType:   types.TokenTypeTextNumber,
 	}
+	if strings.Contains(r.Model, "gpt") {
+		meta.TokenType = types.TokenTypeTokenizer
+	}
 	return meta
 }

 func (r *AudioRequest) IsStream(c *gin.Context) bool {
-	return false
+	return r.StreamFormat == "sse"
 }

 func (r *AudioRequest) SetModelName(modelName string) {
@@ -1,26 +1,31 @@
 package dto

-import "github.com/QuantumNous/new-api/types"
+import (
+	"encoding/json"

-type OpenAIError struct {
-	Message string `json:"message"`
-	Type    string `json:"type"`
-	Param   string `json:"param"`
-	Code    any    `json:"code"`
-}
+	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/types"
+)
+
+//type OpenAIError struct {
+//	Message string `json:"message"`
+//	Type    string `json:"type"`
+//	Param   string `json:"param"`
+//	Code    any    `json:"code"`
+//}

 type OpenAIErrorWithStatusCode struct {
-	Error      OpenAIError `json:"error"`
-	StatusCode int         `json:"status_code"`
+	Error      types.OpenAIError `json:"error"`
+	StatusCode int               `json:"status_code"`
 	LocalError bool
 }

 type GeneralErrorResponse struct {
-	Error    types.OpenAIError `json:"error"`
-	Message  string            `json:"message"`
-	Msg      string            `json:"msg"`
-	Err      string            `json:"err"`
-	ErrorMsg string            `json:"error_msg"`
+	Error    json.RawMessage `json:"error"`
+	Message  string          `json:"message"`
+	Msg      string          `json:"msg"`
+	Err      string          `json:"err"`
+	ErrorMsg string          `json:"error_msg"`
 	Header   struct {
 		Message string `json:"message"`
 	} `json:"header"`
@@ -31,9 +36,35 @@ type GeneralErrorResponse struct {
 	} `json:"response"`
 }

+func (e GeneralErrorResponse) TryToOpenAIError() *types.OpenAIError {
+	var openAIError types.OpenAIError
+	if len(e.Error) > 0 {
+		err := common.Unmarshal(e.Error, &openAIError)
+		if err == nil && openAIError.Message != "" {
+			return &openAIError
+		}
+	}
+	return nil
+}
+
 func (e GeneralErrorResponse) ToMessage() string {
-	if e.Error.Message != "" {
-		return e.Error.Message
+	if len(e.Error) > 0 {
+		switch common.GetJsonType(e.Error) {
+		case "object":
+			var openAIError types.OpenAIError
+			err := common.Unmarshal(e.Error, &openAIError)
+			if err == nil && openAIError.Message != "" {
+				return openAIError.Message
+			}
+		case "string":
+			var msg string
+			err := common.Unmarshal(e.Error, &msg)
+			if err == nil && msg != "" {
+				return msg
+			}
+		default:
+			return string(e.Error)
+		}
 	}
 	if e.Message != "" {
 		return e.Message
@@ -142,7 +142,7 @@ type GeminiThinkingConfig struct {
 	IncludeThoughts bool `json:"includeThoughts,omitempty"`
 	ThinkingBudget  *int `json:"thinkingBudget,omitempty"`
 	// TODO Conflict with thinkingbudget.
-	ThinkingLevel json.RawMessage `json:"thinkingLevel,omitempty"`
+	ThinkingLevel string `json:"thinkingLevel,omitempty"`
 }

 // UnmarshalJSON allows GeminiThinkingConfig to accept both snake_case and camelCase fields.
@@ -150,9 +150,9 @@ func (c *GeminiThinkingConfig) UnmarshalJSON(data []byte) error {
 	type Alias GeminiThinkingConfig
 	var aux struct {
 		Alias
-		IncludeThoughtsSnake *bool           `json:"include_thoughts,omitempty"`
-		ThinkingBudgetSnake  *int            `json:"thinking_budget,omitempty"`
-		ThinkingLevelSnake   json.RawMessage `json:"thinking_level,omitempty"`
+		IncludeThoughtsSnake *bool  `json:"include_thoughts,omitempty"`
+		ThinkingBudgetSnake  *int   `json:"thinking_budget,omitempty"`
+		ThinkingLevelSnake   string `json:"thinking_level,omitempty"`
 	}

 	if err := common.Unmarshal(data, &aux); err != nil {
@@ -169,7 +169,7 @@ func (c *GeminiThinkingConfig) UnmarshalJSON(data []byte) error {
 		c.ThinkingBudget = aux.ThinkingBudgetSnake
 	}

-	if len(aux.ThinkingLevelSnake) > 0 {
+	if aux.ThinkingLevelSnake != "" {
 		c.ThinkingLevel = aux.ThinkingLevelSnake
 	}

@@ -27,8 +27,11 @@ type ImageRequest struct {
 	OutputCompression json.RawMessage `json:"output_compression,omitempty"`
 	PartialImages     json.RawMessage `json:"partial_images,omitempty"`
 	// Stream            bool            `json:"stream,omitempty"`
-	Watermark *bool           `json:"watermark,omitempty"`
-	Image     json.RawMessage `json:"image,omitempty"`
+	Watermark *bool `json:"watermark,omitempty"`
+	// zhipu 4v
+	WatermarkEnabled json.RawMessage `json:"watermark_enabled,omitempty"`
+	UserId           json.RawMessage `json:"user_id,omitempty"`
+	Image            json.RawMessage `json:"image,omitempty"`
 	// 用匿名参数接收额外参数
 	Extra map[string]json.RawMessage `json:"-"`
 }
@@ -83,6 +83,7 @@ type GeneralOpenAIRequest struct {
 	// Ali Qwen Params
 	VlHighResolutionImages json.RawMessage `json:"vl_high_resolution_images,omitempty"`
 	EnableThinking         any             `json:"enable_thinking,omitempty"`
+	ChatTemplateKwargs     json.RawMessage `json:"chat_template_kwargs,omitempty"`
 	// ollama Params
 	Think json.RawMessage `json:"think,omitempty"`
 	// baidu v2
@@ -33,7 +33,7 @@ require (
 	github.com/mewkiz/flac v1.0.13
 	github.com/pkg/errors v0.9.1
 	github.com/pquerna/otp v1.5.0
-	github.com/samber/lo v1.39.0
+	github.com/samber/lo v1.52.0
 	github.com/shirou/gopsutil v3.21.11+incompatible
 	github.com/shopspring/decimal v1.4.0
 	github.com/stripe/stripe-go/v81 v81.4.0
@@ -99,6 +99,7 @@ require (
 	github.com/mitchellh/mapstructure v1.5.0 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
+	github.com/ncruces/go-strftime v0.1.9 // indirect
 	github.com/pelletier/go-toml/v2 v2.2.1 // indirect
 	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
 	github.com/tidwall/match v1.1.1 // indirect
@@ -110,13 +111,13 @@ require (
 	github.com/x448/float16 v0.8.4 // indirect
 	github.com/yusufpapurcu/wmi v1.2.3 // indirect
 	golang.org/x/arch v0.21.0 // indirect
-	golang.org/x/exp v0.0.0-20240404231335-c0f41cb1a7a0 // indirect
+	golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b // indirect
 	golang.org/x/sys v0.38.0 // indirect
 	golang.org/x/text v0.31.0 // indirect
 	google.golang.org/protobuf v1.34.2 // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
-	modernc.org/libc v1.22.5 // indirect
-	modernc.org/mathutil v1.5.0 // indirect
-	modernc.org/memory v1.5.0 // indirect
-	modernc.org/sqlite v1.23.1 // indirect
+	modernc.org/libc v1.66.10 // indirect
+	modernc.org/mathutil v1.7.1 // indirect
+	modernc.org/memory v1.11.0 // indirect
+	modernc.org/sqlite v1.40.1 // indirect
 )
@@ -120,6 +120,7 @@ github.com/google/go-tpm v0.9.5/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u
 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
 github.com/google/pprof v0.0.0-20221118152302-e6195bd50e26 h1:Xim43kblpZXfIBQsbuBVKCudVG457BR2GZFIz3uw3hQ=
 github.com/google/pprof v0.0.0-20221118152302-e6195bd50e26/go.mod h1:dDKJzRmX4S37WGHujM7tX//fmj1uioxKzKxz3lo4HJo=
+github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
 github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
 github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
 github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
@@ -193,6 +194,8 @@ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJ
 github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0=
 github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
 github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
+github.com/ncruces/go-strftime v0.1.9 h1:bY0MQC28UADQmHmaF5dgpLmImcShSi2kHU9XLdhx/f4=
+github.com/ncruces/go-strftime v0.1.9/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
 github.com/nxadm/tail v1.4.8 h1:nPr65rt6Y5JFSKQO7qToXr7pePgD6Gwiw05lkbyAQTE=
 github.com/nxadm/tail v1.4.8/go.mod h1:+ncqLTQzXmGhMZNUePPaPqPvBxHAIsmXswZKocGu+AU=
 github.com/onsi/ginkgo v1.16.5 h1:8xi0RTUf59SOSfEtZMvwTvXYMzG4gV23XVHOZiXNtnE=
@@ -219,6 +222,8 @@ github.com/rogpeppe/go-internal v1.8.0 h1:FCbCCtXNOY3UtUuHUYaghJg4y7Fd14rXifAYUA
 github.com/rogpeppe/go-internal v1.8.0/go.mod h1:WmiCO8CzOY8rg0OYDC4/i/2WRWAB6poM+XZ2dLUbcbE=
 github.com/samber/lo v1.39.0 h1:4gTz1wUhNYLhFSKl6O+8peW0v2F4BCY034GRpU9WnuA=
 github.com/samber/lo v1.39.0/go.mod h1:+m/ZKRl6ClXCE2Lgf3MsQlWfh4bn1bz6CXEOxnEXnEA=
+github.com/samber/lo v1.52.0 h1:Rvi+3BFHES3A8meP33VPAxiBZX/Aws5RxrschYGjomw=
+github.com/samber/lo v1.52.0/go.mod h1:4+MXEGsJzbKGaUEQFKBq2xtfuznW9oz/WrgyzMzRoM0=
 github.com/shirou/gopsutil v3.21.11+incompatible h1:+1+c1VGhc88SSonWP6foOcLhvnKlUeu/erjjvaPEYiI=
 github.com/shirou/gopsutil v3.21.11+incompatible/go.mod h1:5b4v6he4MtMOwMlS0TUMTu2PcXUg8+E1lC7eC3UO/RA=
 github.com/shopspring/decimal v1.4.0 h1:bxl37RwXBklmTi0C79JfXCEBD1cqqHt0bbgBAGFp81k=
@@ -285,6 +290,8 @@ golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
 golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
 golang.org/x/exp v0.0.0-20240404231335-c0f41cb1a7a0 h1:985EYyeCOxTpcgOTJpflJUwOeEz0CQOdPt73OzpE9F8=
 golang.org/x/exp v0.0.0-20240404231335-c0f41cb1a7a0/go.mod h1:/lliqkxwWAhPjf5oSOIJup2XcqJaw8RGS6k3TGEc7GI=
+golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b h1:M2rDM6z3Fhozi9O7NWsxAkg/yqS/lQJ6PmkyIV3YP+o=
+golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b/go.mod h1:3//PLf8L/X+8b4vuAfHzxeRUl04Adcb341+IGKfnqS8=
 golang.org/x/image v0.23.0 h1:HseQ7c2OpPKTPVzNjG5fwJsOTCiiwS4QdsYi5XU6H68=
 golang.org/x/image v0.23.0/go.mod h1:wJJBTdLfCCf3tiHa1fNxpZmUI4mmoZvwMCPP0ddoNKY=
 golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
@@ -345,9 +352,17 @@ gorm.io/gorm v1.25.2 h1:gs1o6Vsa+oVKG/a9ElL3XgyGfghFfkKA2SInQaCyMho=
 gorm.io/gorm v1.25.2/go.mod h1:L4uxeKpfBml98NYqVqwAdmV1a2nBtAec/cf3fpucW/k=
 modernc.org/libc v1.22.5 h1:91BNch/e5B0uPbJFgqbxXuOnxBQjlS//icfQEGmvyjE=
 modernc.org/libc v1.22.5/go.mod h1:jj+Z7dTNX8fBScMVNRAYZ/jF91K8fdT2hYMThc3YjBY=
+modernc.org/libc v1.66.10 h1:yZkb3YeLx4oynyR+iUsXsybsX4Ubx7MQlSYEw4yj59A=
+modernc.org/libc v1.66.10/go.mod h1:8vGSEwvoUoltr4dlywvHqjtAqHBaw0j1jI7iFBTAr2I=
 modernc.org/mathutil v1.5.0 h1:rV0Ko/6SfM+8G+yKiyI830l3Wuz1zRutdslNoQ0kfiQ=
 modernc.org/mathutil v1.5.0/go.mod h1:mZW8CKdRPY1v87qxC/wUdX5O1qDzXMP5TH3wjfpga6E=
+modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
+modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
 modernc.org/memory v1.5.0 h1:N+/8c5rE6EqugZwHii4IFsaJ7MUhoWX07J5tC/iI5Ds=
 modernc.org/memory v1.5.0/go.mod h1:PkUhL0Mugw21sHPeskwZW4D6VscE/GQJOnIpCnW6pSU=
+modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
+modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw=
 modernc.org/sqlite v1.23.1 h1:nrSBg4aRQQwq59JpvGEQ15tNxoO5pX/kUjcRNwSAGQM=
 modernc.org/sqlite v1.23.1/go.mod h1:OrDj17Mggn6MhE+iPbBNf7RGKODDE9NFT0f3EwDzJqk=
+modernc.org/sqlite v1.40.1 h1:VfuXcxcUWWKRBuP8+BR9L7VnmusMgBNNnBYGEe9w/iY=
+modernc.org/sqlite v1.40.1/go.mod h1:9fjQZ0mB1LLP0GYrp39oOJXx/I2sxEnZtzCmEQIKvGE=
@@ -2,12 +2,14 @@ package middleware

 import (
 	"fmt"
+	"net"
 	"net/http"
 	"strconv"
 	"strings"

 	"github.com/QuantumNous/new-api/common"
 	"github.com/QuantumNous/new-api/constant"
+	"github.com/QuantumNous/new-api/logger"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/service"
 	"github.com/QuantumNous/new-api/setting/ratio_setting"
@@ -240,13 +242,20 @@ func TokenAuth() func(c *gin.Context) {
 			return
 		}

-		allowIpsMap := token.GetIpLimitsMap()
-		if len(allowIpsMap) != 0 {
+		allowIps := token.GetIpLimits()
+		if len(allowIps) > 0 {
 			clientIp := c.ClientIP()
-			if _, ok := allowIpsMap[clientIp]; !ok {
+			logger.LogDebug(c, "Token has IP restrictions, checking client IP %s", clientIp)
+			ip := net.ParseIP(clientIp)
+			if ip == nil {
+				abortWithOpenAiMessage(c, http.StatusForbidden, "无法解析客户端 IP 地址")
+				return
+			}
+			if common.IsIpInCIDRList(ip, allowIps) == false {
 				abortWithOpenAiMessage(c, http.StatusForbidden, "您的 IP 不在令牌允许访问的列表中")
 				return
 			}
+			logger.LogDebug(c, "Client IP %s passed the token IP restrictions check", clientIp)
 		}

 		userCache, err := model.GetUserCache(token.UserId)
@@ -307,7 +316,8 @@ func SetupContextForToken(c *gin.Context, token *model.Token, parts ...string) e
 	} else {
 		c.Set("token_model_limit_enabled", false)
 	}
-	c.Set("token_group", token.Group)
+	common.SetContextKey(c, constant.ContextKeyTokenGroup, token.Group)
+	common.SetContextKey(c, constant.ContextKeyTokenCrossGroupRetry, token.CrossGroupRetry)
 	if len(parts) > 1 {
 		if model.IsAdmin(token.UserId) {
 			c.Set("specific_channel_id", parts[1])
@@ -97,7 +97,12 @@ func Distribute() func(c *gin.Context) {
 						common.SetContextKey(c, constant.ContextKeyUsingGroup, usingGroup)
 					}
 				}
-				channel, selectGroup, err = service.CacheGetRandomSatisfiedChannel(c, usingGroup, modelRequest.Model, 0)
+				channel, selectGroup, err = service.CacheGetRandomSatisfiedChannel(&service.RetryParam{
+					Ctx:        c,
+					ModelName:  modelRequest.Model,
+					TokenGroup: usingGroup,
+					Retry:      common.GetPointer(0),
+				})
 				if err != nil {
 					showGroup := usingGroup
 					if usingGroup == "auto" {
@@ -157,7 +162,7 @@ func getModelRequest(c *gin.Context) (*ModelRequest, bool, error) {
 			}
 			midjourneyModel, mjErr, success := service.GetMjRequestModel(relayMode, &midjourneyRequest)
 			if mjErr != nil {
-				return nil, false, fmt.Errorf(mjErr.Description)
+				return nil, false, fmt.Errorf("%s", mjErr.Description)
 			}
 			if midjourneyModel == "" {
 				if !success {
@@ -181,6 +186,10 @@ func getModelRequest(c *gin.Context) (*ModelRequest, bool, error) {
 		}
 		c.Set("platform", string(constant.TaskPlatformSuno))
 		c.Set("relay_mode", relayMode)
+	} else if strings.Contains(c.Request.URL.Path, "/v1/videos/") && strings.HasSuffix(c.Request.URL.Path, "/remix") {
+		relayMode := relayconstant.RelayModeVideoSubmit
+		c.Set("relay_mode", relayMode)
+		shouldSelectChannel = false
 	} else if strings.Contains(c.Request.URL.Path, "/v1/videos") {
 		//curl https://api.openai.com/v1/videos \
 		//  -H "Authorization: Bearer $OPENAI_API_KEY" \
@@ -5,32 +5,69 @@ import (
 	"io"
 	"net/http"

+	"github.com/QuantumNous/new-api/constant"
 	"github.com/andybalholm/brotli"
 	"github.com/gin-gonic/gin"
 )

+type readCloser struct {
+	io.Reader
+	closeFn func() error
+}
+
+func (rc *readCloser) Close() error {
+	if rc.closeFn != nil {
+		return rc.closeFn()
+	}
+	return nil
+}
+
 func DecompressRequestMiddleware() gin.HandlerFunc {
 	return func(c *gin.Context) {
 		if c.Request.Body == nil || c.Request.Method == http.MethodGet {
 			c.Next()
 			return
 		}
+		maxMB := constant.MaxRequestBodyMB
+		if maxMB <= 0 {
+			maxMB = 32
+		}
+		maxBytes := int64(maxMB) << 20
+
+		origBody := c.Request.Body
+		wrapMaxBytes := func(body io.ReadCloser) io.ReadCloser {
+			return http.MaxBytesReader(c.Writer, body, maxBytes)
+		}
+
 		switch c.GetHeader("Content-Encoding") {
 		case "gzip":
-			gzipReader, err := gzip.NewReader(c.Request.Body)
+			gzipReader, err := gzip.NewReader(origBody)
 			if err != nil {
+				_ = origBody.Close()
 				c.AbortWithStatus(http.StatusBadRequest)
 				return
 			}
-			defer gzipReader.Close()
-
-			// Replace the request body with the decompressed data
-			c.Request.Body = io.NopCloser(gzipReader)
+			// Replace the request body with the decompressed data, and enforce a max size (post-decompression).
+			c.Request.Body = wrapMaxBytes(&readCloser{
+				Reader: gzipReader,
+				closeFn: func() error {
+					_ = gzipReader.Close()
+					return origBody.Close()
+				},
+			})
 			c.Request.Header.Del("Content-Encoding")
 		case "br":
-			reader := brotli.NewReader(c.Request.Body)
-			c.Request.Body = io.NopCloser(reader)
+			reader := brotli.NewReader(origBody)
+			c.Request.Body = wrapMaxBytes(&readCloser{
+				Reader: reader,
+				closeFn: func() error {
+					return origBody.Close()
+				},
+			})
 			c.Request.Header.Del("Content-Encoding")
+		default:
+			// Even for uncompressed bodies, enforce a max size to avoid huge request allocations.
+			c.Request.Body = wrapMaxBytes(origBody)
 		}

 		// Continue processing the request
@@ -254,6 +254,9 @@ func (channel *Channel) Save() error {
 }

 func (channel *Channel) SaveWithoutKey() error {
+	if channel.Id == 0 {
+		return errors.New("channel ID is 0")
+	}
 	return DB.Omit("key").Save(channel).Error
 }

@@ -6,7 +6,6 @@ import (
 	"strings"

 	"github.com/QuantumNous/new-api/common"
-
 	"github.com/bytedance/gopkg/util/gopool"
 	"gorm.io/gorm"
 )
@@ -27,6 +26,7 @@ type Token struct {
 	AllowIps           *string        `json:"allow_ips" gorm:"default:''"`
 	UsedQuota          int            `json:"used_quota" gorm:"default:0"` // used quota
 	Group              string         `json:"group" gorm:"default:''"`
+	CrossGroupRetry    bool           `json:"cross_group_retry" gorm:"default:false"` // 跨分组重试，仅auto分组有效
 	DeletedAt          gorm.DeletedAt `gorm:"index"`
 }

@@ -34,26 +34,26 @@ func (token *Token) Clean() {
 	token.Key = ""
 }

-func (token *Token) GetIpLimitsMap() map[string]any {
+func (token *Token) GetIpLimits() []string {
 	// delete empty spaces
 	//split with \n
-	ipLimitsMap := make(map[string]any)
+	ipLimits := make([]string, 0)
 	if token.AllowIps == nil {
-		return ipLimitsMap
+		return ipLimits
 	}
 	cleanIps := strings.ReplaceAll(*token.AllowIps, " ", "")
 	if cleanIps == "" {
-		return ipLimitsMap
+		return ipLimits
 	}
 	ips := strings.Split(cleanIps, "\n")
 	for _, ip := range ips {
 		ip = strings.TrimSpace(ip)
 		ip = strings.ReplaceAll(ip, ",", "")
-		if common.IsIP(ip) {
-			ipLimitsMap[ip] = true
+		if ip != "" {
+			ipLimits = append(ipLimits, ip)
 		}
 	}
-	return ipLimitsMap
+	return ipLimits
 }

 func GetAllUserTokens(userId int, startIdx int, num int) ([]*Token, error) {
@@ -112,7 +112,12 @@ func ValidateUserToken(key string) (token *Token, err error) {
 		}
 		return token, nil
 	}
-	return nil, errors.New("无效的令牌")
+	common.SysLog("ValidateUserToken: failed to get token: " + err.Error())
+	if errors.Is(err, gorm.ErrRecordNotFound) {
+		return nil, errors.New("无效的令牌")
+	} else {
+		return nil, errors.New("无效的令牌，数据库查询出错，请联系管理员")
+	}
 }

 func GetTokenByIds(id int, userId int) (*Token, error) {
@@ -185,7 +190,7 @@ func (token *Token) Update() (err error) {
 		}
 	}()
 	err = DB.Model(token).Select("name", "status", "expired_time", "remain_quota", "unlimited_quota",
-		"model_limits_enabled", "model_limits", "allow_ips", "group").Updates(token).Error
+		"model_limits_enabled", "model_limits", "allow_ips", "group", "cross_group_retry").Updates(token).Error
 	return err
 }

@@ -67,8 +67,11 @@ func AudioHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *type
 		service.ResetStatusCode(newAPIError, statusCodeMappingStr)
 		return newAPIError
 	}
-
-	postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	if usage.(*dto.Usage).CompletionTokenDetails.AudioTokens > 0 || usage.(*dto.Usage).PromptTokensDetails.AudioTokens > 0 {
+		service.PostAudioConsumeQuota(c, info, usage.(*dto.Usage), "")
+	} else {
+		postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	}

 	return nil
 }
@@ -47,7 +47,7 @@ type TaskAdaptor interface {
 	GetChannelName() string

 	// FetchTask
-	FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error)
+	FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error)

 	ParseTaskResult(respBody []byte) (*relaycommon.TaskInfo, error)
 }
@@ -18,7 +18,7 @@ var awsModelIDMap = map[string]string{
 	"claude-opus-4-1-20250805":   "anthropic.claude-opus-4-1-20250805-v1:0",
 	"claude-sonnet-4-5-20250929": "anthropic.claude-sonnet-4-5-20250929-v1:0",
 	"claude-haiku-4-5-20251001":  "anthropic.claude-haiku-4-5-20251001-v1:0",
-	"claude-opus-4-5-20251101":  "anthropic.claude-opus-4-5-20251101-v1:0",
+	"claude-opus-4-5-20251101":   "anthropic.claude-opus-4-5-20251101-v1:0",
 	// Nova models
 	"nova-micro-v1:0":   "amazon.nova-micro-v1:0",
 	"nova-lite-v1:0":    "amazon.nova-lite-v1:0",
@@ -18,6 +18,7 @@ import (
 	"github.com/gin-gonic/gin"
 	"github.com/pkg/errors"

+	"github.com/QuantumNous/new-api/setting/model_setting"
 	"github.com/aws/aws-sdk-go-v2/aws"
 	"github.com/aws/aws-sdk-go-v2/credentials"
 	"github.com/aws/aws-sdk-go-v2/service/bedrockruntime"
@@ -129,7 +130,7 @@ func doAwsClientRequest(c *gin.Context, info *relaycommon.RelayInfo, a *Adaptor,
 				Accept:      aws.String("application/json"),
 				ContentType: aws.String("application/json"),
 			}
-			awsReq.Body, err = common.Marshal(awsClaudeReq)
+			awsReq.Body, err = buildAwsRequestBody(c, info, awsClaudeReq)
 			if err != nil {
 				return nil, types.NewError(errors.Wrap(err, "marshal aws request fail"), types.ErrorCodeBadRequestBody)
 			}
@@ -141,7 +142,7 @@ func doAwsClientRequest(c *gin.Context, info *relaycommon.RelayInfo, a *Adaptor,
 				Accept:      aws.String("application/json"),
 				ContentType: aws.String("application/json"),
 			}
-			awsReq.Body, err = common.Marshal(awsClaudeReq)
+			awsReq.Body, err = buildAwsRequestBody(c, info, awsClaudeReq)
 			if err != nil {
 				return nil, types.NewError(errors.Wrap(err, "marshal aws request fail"), types.ErrorCodeBadRequestBody)
 			}
@@ -151,6 +152,24 @@ func doAwsClientRequest(c *gin.Context, info *relaycommon.RelayInfo, a *Adaptor,
 	}
 }

+// buildAwsRequestBody prepares the payload for AWS requests, applying passthrough rules when enabled.
+func buildAwsRequestBody(c *gin.Context, info *relaycommon.RelayInfo, awsClaudeReq any) ([]byte, error) {
+	if model_setting.GetGlobalSettings().PassThroughRequestEnabled || info.ChannelSetting.PassThroughBodyEnabled {
+		body, err := common.GetRequestBody(c)
+		if err != nil {
+			return nil, errors.Wrap(err, "get request body for pass-through fail")
+		}
+		var data map[string]interface{}
+		if err := common.Unmarshal(body, &data); err != nil {
+			return nil, errors.Wrap(err, "pass-through unmarshal request body fail")
+		}
+		delete(data, "model")
+		delete(data, "stream")
+		return common.Marshal(data)
+	}
+	return common.Marshal(awsClaudeReq)
+}
+
 func getAwsRegionPrefix(awsRegionId string) string {
 	parts := strings.Split(awsRegionId, "-")
 	regionPrefix := ""
@@ -150,7 +150,7 @@ func baiduHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respon
 		return types.NewError(err, types.ErrorCodeBadResponseBody), nil
 	}
 	if baiduResponse.ErrorMsg != "" {
-		return types.NewError(fmt.Errorf(baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
+		return types.NewError(fmt.Errorf("%s", baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
 	}
 	fullTextResponse := responseBaidu2OpenAI(&baiduResponse)
 	jsonResponse, err := json.Marshal(fullTextResponse)
@@ -175,7 +175,7 @@ func baiduEmbeddingHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *ht
 		return types.NewError(err, types.ErrorCodeBadResponseBody), nil
 	}
 	if baiduResponse.ErrorMsg != "" {
-		return types.NewError(fmt.Errorf(baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
+		return types.NewError(fmt.Errorf("%s", baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
 	}
 	fullTextResponse := embeddingResponseBaidu2OpenAI(&baiduResponse)
 	jsonResponse, err := json.Marshal(fullTextResponse)
@@ -9,6 +9,7 @@ var ModelList = []string{
 	"claude-3-opus-20240229",
 	"claude-3-haiku-20240307",
 	"claude-3-5-haiku-20241022",
+	"claude-haiku-4-5-20251001",
 	"claude-3-5-sonnet-20240620",
 	"claude-3-5-sonnet-20241022",
 	"claude-3-7-sonnet-20250219",
@@ -483,9 +483,11 @@ func StreamResponseClaude2OpenAI(reqMode int, claudeResponse *dto.ClaudeResponse
 				}
 			}
 		} else if claudeResponse.Type == "message_delta" {
-			finishReason := stopReasonClaude2OpenAI(*claudeResponse.Delta.StopReason)
-			if finishReason != "null" {
-				choice.FinishReason = &finishReason
+			if claudeResponse.Delta != nil && claudeResponse.Delta.StopReason != nil {
+				finishReason := stopReasonClaude2OpenAI(*claudeResponse.Delta.StopReason)
+				if finishReason != "null" {
+					choice.FinishReason = &finishReason
+				}
 			}
 			//claudeUsage = &claudeResponse.Usage
 		} else if claudeResponse.Type == "message_stop" {
@@ -673,7 +675,7 @@ func HandleStreamResponseData(c *gin.Context, info *relaycommon.RelayInfo, claud
 func HandleStreamFinalResponse(c *gin.Context, info *relaycommon.RelayInfo, claudeInfo *ClaudeResponseInfo, requestMode int) {

 	if requestMode == RequestModeCompletion {
-		claudeInfo.Usage = service.ResponseText2Usage(c, claudeInfo.ResponseText.String(), info.UpstreamModelName, info.PromptTokens)
+		claudeInfo.Usage = service.ResponseText2Usage(c, claudeInfo.ResponseText.String(), info.UpstreamModelName, info.GetEstimatePromptTokens())
 	} else {
 		if claudeInfo.Usage.PromptTokens == 0 {
 			//上游出错
@@ -734,10 +736,7 @@ func HandleClaudeResponseData(c *gin.Context, info *relaycommon.RelayInfo, claud
 		return types.WithClaudeError(*claudeError, http.StatusInternalServerError)
 	}
 	if requestMode == RequestModeCompletion {
-		completionTokens := service.CountTextToken(claudeResponse.Completion, info.OriginModelName)
-		claudeInfo.Usage.PromptTokens = info.PromptTokens
-		claudeInfo.Usage.CompletionTokens = completionTokens
-		claudeInfo.Usage.TotalTokens = info.PromptTokens + completionTokens
+		claudeInfo.Usage = service.ResponseText2Usage(c, claudeResponse.Completion, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	} else {
 		claudeInfo.Usage.PromptTokens = claudeResponse.Usage.InputTokens
 		claudeInfo.Usage.CompletionTokens = claudeResponse.Usage.OutputTokens
@@ -74,7 +74,7 @@ func cfStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Res
 	if err := scanner.Err(); err != nil {
 		logger.LogError(c, "error_scanning_stream_response: "+err.Error())
 	}
-	usage := service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.PromptTokens)
+	usage := service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	if info.ShouldIncludeUsage {
 		response := helper.GenerateFinalUsageResponse(id, info.StartTime.Unix(), info.UpstreamModelName, *usage)
 		err := helper.ObjectData(c, response)
@@ -105,7 +105,7 @@ func cfHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Response)
 	for _, choice := range response.Choices {
 		responseText += choice.Message.StringContent()
 	}
-	usage := service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.PromptTokens)
+	usage := service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	response.Usage = *usage
 	response.Id = helper.GetResponseID(c)
 	jsonResponse, err := json.Marshal(response)
@@ -142,10 +142,6 @@ func cfSTTHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respon
 	c.Writer.WriteHeader(resp.StatusCode)
 	_, _ = c.Writer.Write(jsonResponse)

-	usage := &dto.Usage{}
-	usage.PromptTokens = info.PromptTokens
-	usage.CompletionTokens = service.CountTextToken(cfResp.Result.Text, info.UpstreamModelName)
-	usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
-
+	usage := service.ResponseText2Usage(c, cfResp.Result.Text, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	return nil, usage
 }
@@ -165,7 +165,7 @@ func cohereStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http
 		}
 	})
 	if usage.PromptTokens == 0 {
-		usage = service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.PromptTokens)
+		usage = service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	}
 	return usage, nil
 }
@@ -225,9 +225,9 @@ func cohereRerankHandler(c *gin.Context, resp *http.Response, info *relaycommon.
 	}
 	usage := dto.Usage{}
 	if cohereResp.Meta.BilledUnits.InputTokens == 0 {
-		usage.PromptTokens = info.PromptTokens
+		usage.PromptTokens = info.GetEstimatePromptTokens()
 		usage.CompletionTokens = 0
-		usage.TotalTokens = info.PromptTokens
+		usage.TotalTokens = info.GetEstimatePromptTokens()
 	} else {
 		usage.PromptTokens = cohereResp.Meta.BilledUnits.InputTokens
 		usage.CompletionTokens = cohereResp.Meta.BilledUnits.OutputTokens
@@ -208,7 +208,7 @@ func handleCozeEvent(c *gin.Context, event string, data string, responseText *st
 			return
 		}

-		common.SysLog(fmt.Sprintf("stream event error: ", errorData.Code, errorData.Message))
+		common.SysLog(fmt.Sprintf("stream event error: %v %v", errorData.Code, errorData.Message))
 	}
 }

@@ -246,7 +246,7 @@ func difyStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.R
 	})
 	helper.Done(c)
 	if usage.TotalTokens == 0 {
-		usage = service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.PromptTokens)
+		usage = service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	}
 	usage.CompletionTokens += nodeToken
 	return usage, nil
@@ -137,6 +137,8 @@ func (a *Adaptor) GetRequestURL(info *relaycommon.RelayInfo) (string, error) {
 			info.UpstreamModelName = strings.TrimSuffix(info.UpstreamModelName, "-thinking")
 		} else if strings.HasSuffix(info.UpstreamModelName, "-nothinking") {
 			info.UpstreamModelName = strings.TrimSuffix(info.UpstreamModelName, "-nothinking")
+		} else if baseModel, level := parseThinkingLevelSuffix(info.UpstreamModelName); level != "" {
+			info.UpstreamModelName = baseModel
 		}
 	}

@@ -5,7 +5,6 @@ import (
 	"net/http"

 	"github.com/QuantumNous/new-api/common"
-	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/dto"
 	"github.com/QuantumNous/new-api/logger"
 	relaycommon "github.com/QuantumNous/new-api/relay/common"
@@ -70,12 +69,7 @@ func NativeGeminiEmbeddingHandler(c *gin.Context, resp *http.Response, info *rel
 		println(string(responseBody))
 	}

-	usage := &dto.Usage{
-		PromptTokens: info.PromptTokens,
-		TotalTokens:  info.PromptTokens,
-	}
-
-	common.SetContextKey(c, constant.ContextKeyLocalCountTokens, true)
+	usage := service.ResponseText2Usage(c, "", info.UpstreamModelName, info.GetEstimatePromptTokens())

 	if info.IsGeminiBatchEmbedding {
 		var geminiResponse dto.GeminiBatchEmbeddingResponse
@@ -100,10 +94,10 @@ func GeminiTextGenerationStreamHandler(c *gin.Context, info *relaycommon.RelayIn
 	helper.SetEventStreamHeaders(c)

 	return geminiStreamHandler(c, info, resp, func(data string, geminiResponse *dto.GeminiChatResponse) bool {
-		// 直接发送 GeminiChatResponse 响应
 		err := helper.StringData(c, data)
 		if err != nil {
-			logger.LogError(c, err.Error())
+			logger.LogError(c, "failed to write stream data: "+err.Error())
+			return false
 		}
 		info.SendResponseCount++
 		return true
@@ -19,8 +19,8 @@ import (
 	"github.com/QuantumNous/new-api/relay/helper"
 	"github.com/QuantumNous/new-api/service"
 	"github.com/QuantumNous/new-api/setting/model_setting"
+	"github.com/QuantumNous/new-api/setting/reasoning"
 	"github.com/QuantumNous/new-api/types"
-
 	"github.com/gin-gonic/gin"
 )

@@ -122,6 +122,14 @@ func clampThinkingBudgetByEffort(modelName string, effort string) int {
 	return clampThinkingBudget(modelName, maxBudget)
 }

+func parseThinkingLevelSuffix(modelName string) (string, string) {
+	base, level, ok := reasoning.TrimEffortSuffix(modelName)
+	if !ok {
+		return modelName, ""
+	}
+	return base, level
+}
+
 func ThinkingAdaptor(geminiRequest *dto.GeminiChatRequest, info *relaycommon.RelayInfo, oaiRequest ...dto.GeneralOpenAIRequest) {
 	if model_setting.GetGeminiSettings().ThinkingAdapterEnabled {
 		modelName := info.UpstreamModelName
@@ -178,6 +186,12 @@ func ThinkingAdaptor(geminiRequest *dto.GeminiChatRequest, info *relaycommon.Rel
 					ThinkingBudget: common.GetPointer(0),
 				}
 			}
+		} else if _, level := parseThinkingLevelSuffix(modelName); level != "" {
+			geminiRequest.GenerationConfig.ThinkingConfig = &dto.GeminiThinkingConfig{
+				IncludeThoughts: true,
+				ThinkingLevel:   level,
+			}
+			info.ReasoningEffort = level
 		}
 	}
 }
@@ -1115,7 +1129,7 @@ func geminiStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http
 	if usage.CompletionTokens <= 0 {
 		str := responseText.String()
 		if len(str) > 0 {
-			usage = service.ResponseText2Usage(c, responseText.String(), info.UpstreamModelName, info.PromptTokens)
+			usage = service.ResponseText2Usage(c, responseText.String(), info.UpstreamModelName, info.GetEstimatePromptTokens())
 		} else {
 			usage = &dto.Usage{}
 		}
@@ -1288,11 +1302,7 @@ func GeminiEmbeddingHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *h
 	// Google has not yet clarified how embedding models will be billed
 	// refer to openai billing method to use input tokens billing
 	// https://platform.openai.com/docs/guides/embeddings#what-are-embeddings
-	usage := &dto.Usage{
-		PromptTokens:     info.PromptTokens,
-		CompletionTokens: 0,
-		TotalTokens:      info.PromptTokens,
-	}
+	usage := service.ResponseText2Usage(c, "", info.UpstreamModelName, info.GetEstimatePromptTokens())
 	openAIResponse.Usage = *usage

 	jsonResponse, jsonErr := common.Marshal(openAIResponse)
@@ -163,7 +163,7 @@ func handleTTSResponse(c *gin.Context, resp *http.Response, info *relaycommon.Re
 	}

 	usage = &dto.Usage{
-		PromptTokens:     info.PromptTokens,
+		PromptTokens:     info.GetEstimatePromptTokens(),
 		CompletionTokens: 0,
 		TotalTokens:      int(minimaxResp.ExtraInfo.UsageCharacters),
 	}
@@ -42,7 +42,7 @@ type Adaptor struct {
 // support OAI models: o1-mini/o3-mini/o4-mini/o1/o3 etc...
 // minimal effort only available in gpt-5
 func parseReasoningEffortFromModelSuffix(model string) (string, string) {
-	effortSuffixes := []string{"-high", "-minimal", "-low", "-medium", "-none"}
+	effortSuffixes := []string{"-high", "-minimal", "-low", "-medium", "-none", "-xhigh"}
 	for _, suffix := range effortSuffixes {
 		if strings.HasSuffix(model, suffix) {
 			effort := strings.TrimPrefix(suffix, "-")
@@ -306,10 +306,11 @@ func (a *Adaptor) ConvertOpenAIRequest(c *gin.Context, info *relaycommon.RelayIn
 			request.Temperature = nil
 		}

+		// gpt-5系列模型适配 归零不再支持的参数
 		if strings.HasPrefix(info.UpstreamModelName, "gpt-5") {
-			if info.UpstreamModelName != "gpt-5-chat-latest" {
-				request.Temperature = nil
-			}
+			request.Temperature = nil
+			request.TopP = 0 // oai 的 top_p 默认值是 1.0，但是为了 omitempty 属性直接不传，这里显式设置为 0
+			request.LogProbs = false
 		}

 		// 转换模型推理力度后缀
@@ -0,0 +1,145 @@
+package openai
+
+import (
+	"bytes"
+	"fmt"
+	"io"
+	"math"
+	"net/http"
+
+	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/constant"
+	"github.com/QuantumNous/new-api/dto"
+	"github.com/QuantumNous/new-api/logger"
+	relaycommon "github.com/QuantumNous/new-api/relay/common"
+	"github.com/QuantumNous/new-api/relay/helper"
+	"github.com/QuantumNous/new-api/service"
+	"github.com/QuantumNous/new-api/types"
+	"github.com/gin-gonic/gin"
+)
+
+func OpenaiTTSHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) *dto.Usage {
+	// the status code has been judged before, if there is a body reading failure,
+	// it should be regarded as a non-recoverable error, so it should not return err for external retry.
+	// Analogous to nginx's load balancing, it will only retry if it can't be requested or
+	// if the upstream returns a specific status code, once the upstream has already written the header,
+	// the subsequent failure of the response body should be regarded as a non-recoverable error,
+	// and can be terminated directly.
+	defer service.CloseResponseBodyGracefully(resp)
+	usage := &dto.Usage{}
+	usage.PromptTokens = info.GetEstimatePromptTokens()
+	usage.TotalTokens = info.GetEstimatePromptTokens()
+	for k, v := range resp.Header {
+		c.Writer.Header().Set(k, v[0])
+	}
+	c.Writer.WriteHeader(resp.StatusCode)
+
+	if info.IsStream {
+		helper.StreamScannerHandler(c, resp, info, func(data string) bool {
+			if service.SundaySearch(data, "usage") {
+				var simpleResponse dto.SimpleResponse
+				err := common.Unmarshal([]byte(data), &simpleResponse)
+				if err != nil {
+					logger.LogError(c, err.Error())
+				}
+				if simpleResponse.Usage.TotalTokens != 0 {
+					usage.PromptTokens = simpleResponse.Usage.InputTokens
+					usage.CompletionTokens = simpleResponse.OutputTokens
+					usage.TotalTokens = simpleResponse.TotalTokens
+				}
+			}
+			_ = helper.StringData(c, data)
+			return true
+		})
+	} else {
+		common.SetContextKey(c, constant.ContextKeyLocalCountTokens, true)
+		// 读取响应体到缓冲区
+		bodyBytes, err := io.ReadAll(resp.Body)
+		if err != nil {
+			logger.LogError(c, fmt.Sprintf("failed to read TTS response body: %v", err))
+			c.Writer.WriteHeaderNow()
+			return usage
+		}
+
+		// 写入响应到客户端
+		c.Writer.WriteHeaderNow()
+		_, err = c.Writer.Write(bodyBytes)
+		if err != nil {
+			logger.LogError(c, fmt.Sprintf("failed to write TTS response: %v", err))
+		}
+
+		// 计算音频时长并更新 usage
+		audioFormat := "mp3" // 默认格式
+		if audioReq, ok := info.Request.(*dto.AudioRequest); ok && audioReq.ResponseFormat != "" {
+			audioFormat = audioReq.ResponseFormat
+		}
+
+		var duration float64
+		var durationErr error
+
+		if audioFormat == "pcm" {
+			// PCM 格式没有文件头，根据 OpenAI TTS 的 PCM 参数计算时长
+			// 采样率: 24000 Hz, 位深度: 16-bit (2 bytes), 声道数: 1
+			const sampleRate = 24000
+			const bytesPerSample = 2
+			const channels = 1
+			duration = float64(len(bodyBytes)) / float64(sampleRate*bytesPerSample*channels)
+		} else {
+			ext := "." + audioFormat
+			reader := bytes.NewReader(bodyBytes)
+			duration, durationErr = common.GetAudioDuration(c.Request.Context(), reader, ext)
+		}
+
+		usage.PromptTokensDetails.TextTokens = usage.PromptTokens
+
+		if durationErr != nil {
+			logger.LogWarn(c, fmt.Sprintf("failed to get audio duration: %v", durationErr))
+			// 如果无法获取时长，则设置保底的 CompletionTokens，根据body大小计算
+			sizeInKB := float64(len(bodyBytes)) / 1000.0
+			estimatedTokens := int(math.Ceil(sizeInKB)) // 粗略估算每KB约等于1 token
+			usage.CompletionTokens = estimatedTokens
+			usage.CompletionTokenDetails.AudioTokens = estimatedTokens
+		} else if duration > 0 {
+			// 计算 token: ceil(duration) / 60.0 * 1000，即每分钟 1000 tokens
+			completionTokens := int(math.Round(math.Ceil(duration) / 60.0 * 1000))
+			usage.CompletionTokens = completionTokens
+			usage.CompletionTokenDetails.AudioTokens = completionTokens
+		}
+		usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
+	}
+
+	return usage
+}
+
+func OpenaiSTTHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo, responseFormat string) (*types.NewAPIError, *dto.Usage) {
+	defer service.CloseResponseBodyGracefully(resp)
+
+	responseBody, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError), nil
+	}
+	// 写入新的 response body
+	service.IOCopyBytesGracefully(c, resp, responseBody)
+
+	var responseData struct {
+		Usage *dto.Usage `json:"usage"`
+	}
+	if err := common.Unmarshal(responseBody, &responseData); err == nil && responseData.Usage != nil {
+		if responseData.Usage.TotalTokens > 0 {
+			usage := responseData.Usage
+			if usage.PromptTokens == 0 {
+				usage.PromptTokens = usage.InputTokens
+			}
+			if usage.CompletionTokens == 0 {
+				usage.CompletionTokens = usage.OutputTokens
+			}
+			return nil, usage
+		}
+	}
+
+	usage := &dto.Usage{}
+	usage.PromptTokens = info.GetEstimatePromptTokens()
+	usage.CompletionTokens = 0
+	usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
+	return nil, usage
+}
@@ -172,7 +172,7 @@ func handleLastResponse(lastStreamData string, responseId *string, createAt *int
 	shouldSendLastResp *bool) error {

 	var lastStreamResponse dto.ChatCompletionsStreamResponse
-	if err := json.Unmarshal(common.StringToByteSlice(lastStreamData), &lastStreamResponse); err != nil {
+	if err := common.Unmarshal(common.StringToByteSlice(lastStreamData), &lastStreamResponse); err != nil {
 		return err
 	}

@@ -1,7 +1,6 @@
 package openai

 import (
-	"encoding/json"
 	"fmt"
 	"io"
 	"net/http"
@@ -151,7 +150,7 @@ func OaiStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Re
 		var streamResp struct {
 			Usage *dto.Usage `json:"usage"`
 		}
-		err := json.Unmarshal([]byte(secondLastStreamData), &streamResp)
+		err := common.Unmarshal([]byte(secondLastStreamData), &streamResp)
 		if err == nil && streamResp.Usage != nil && service.ValidUsage(streamResp.Usage) {
 			usage = streamResp.Usage
 			containStreamUsage = true
@@ -183,7 +182,7 @@ func OaiStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Re
 	}

 	if !containStreamUsage {
-		usage = service.ResponseText2Usage(c, responseTextBuilder.String(), info.UpstreamModelName, info.PromptTokens)
+		usage = service.ResponseText2Usage(c, responseTextBuilder.String(), info.UpstreamModelName, info.GetEstimatePromptTokens())
 		usage.CompletionTokens += toolCount * 7
 	}

@@ -245,9 +244,9 @@ func OpenaiHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respo
 			}
 		}
 		simpleResponse.Usage = dto.Usage{
-			PromptTokens:     info.PromptTokens,
+			PromptTokens:     info.GetEstimatePromptTokens(),
 			CompletionTokens: completionTokens,
-			TotalTokens:      info.PromptTokens + completionTokens,
+			TotalTokens:      info.GetEstimatePromptTokens() + completionTokens,
 		}
 		usageModified = true
 	}
@@ -327,68 +326,6 @@ func streamTTSResponse(c *gin.Context, resp *http.Response) {
 	}
 }

-func OpenaiTTSHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) *dto.Usage {
-	// the status code has been judged before, if there is a body reading failure,
-	// it should be regarded as a non-recoverable error, so it should not return err for external retry.
-	// Analogous to nginx's load balancing, it will only retry if it can't be requested or
-	// if the upstream returns a specific status code, once the upstream has already written the header,
-	// the subsequent failure of the response body should be regarded as a non-recoverable error,
-	// and can be terminated directly.
-	defer service.CloseResponseBodyGracefully(resp)
-	usage := &dto.Usage{}
-	usage.PromptTokens = info.PromptTokens
-	usage.TotalTokens = info.PromptTokens
-	for k, v := range resp.Header {
-		c.Writer.Header().Set(k, v[0])
-	}
-	c.Writer.WriteHeader(resp.StatusCode)
-
-	isStreaming := resp.ContentLength == -1 || resp.Header.Get("Content-Length") == ""
-	if isStreaming {
-		streamTTSResponse(c, resp)
-	} else {
-		c.Writer.WriteHeaderNow()
-		_, err := io.Copy(c.Writer, resp.Body)
-		if err != nil {
-			logger.LogError(c, err.Error())
-		}
-	}
-	return usage
-}
-
-func OpenaiSTTHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo, responseFormat string) (*types.NewAPIError, *dto.Usage) {
-	defer service.CloseResponseBodyGracefully(resp)
-
-	responseBody, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError), nil
-	}
-	// 写入新的 response body
-	service.IOCopyBytesGracefully(c, resp, responseBody)
-
-	var responseData struct {
-		Usage *dto.Usage `json:"usage"`
-	}
-	if err := json.Unmarshal(responseBody, &responseData); err == nil && responseData.Usage != nil {
-		if responseData.Usage.TotalTokens > 0 {
-			usage := responseData.Usage
-			if usage.PromptTokens == 0 {
-				usage.PromptTokens = usage.InputTokens
-			}
-			if usage.CompletionTokens == 0 {
-				usage.CompletionTokens = usage.OutputTokens
-			}
-			return nil, usage
-		}
-	}
-
-	usage := &dto.Usage{}
-	usage.PromptTokens = info.PromptTokens
-	usage.CompletionTokens = 0
-	usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
-	return nil, usage
-}
-
 func OpenaiRealtimeHandler(c *gin.Context, info *relaycommon.RelayInfo) (*types.NewAPIError, *dto.RealtimeUsage) {
 	if info == nil || info.ClientWs == nil || info.TargetWs == nil {
 		return types.NewError(fmt.Errorf("invalid websocket connection"), types.ErrorCodeBadResponse), nil
@@ -659,7 +596,7 @@ func applyUsagePostProcessing(info *relaycommon.RelayInfo, usage *dto.Usage, res
 		if usage.PromptTokensDetails.CachedTokens == 0 && usage.PromptCacheHitTokens != 0 {
 			usage.PromptTokensDetails.CachedTokens = usage.PromptCacheHitTokens
 		}
-	case constant.ChannelTypeZhipu_v4:
+	case constant.ChannelTypeZhipu_v4, constant.ChannelTypeMoonshot:
 		if usage.PromptTokensDetails.CachedTokens == 0 {
 			if usage.InputTokensDetails != nil && usage.InputTokensDetails.CachedTokens > 0 {
 				usage.PromptTokensDetails.CachedTokens = usage.InputTokensDetails.CachedTokens
@@ -687,7 +624,7 @@ func extractCachedTokensFromBody(body []byte) (int, bool) {
 		} `json:"usage"`
 	}

-	if err := json.Unmarshal(body, &payload); err != nil {
+	if err := common.Unmarshal(body, &payload); err != nil {
 		return 0, false
 	}

@@ -141,7 +141,7 @@ func OaiResponsesStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp
 	}

 	if usage.PromptTokens == 0 && usage.CompletionTokens != 0 {
-		usage.PromptTokens = info.PromptTokens
+		usage.PromptTokens = info.GetEstimatePromptTokens()
 	}

 	usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
@@ -81,7 +81,7 @@ func (a *Adaptor) DoResponse(c *gin.Context, resp *http.Response, info *relaycom
 	if info.IsStream {
 		var responseText string
 		err, responseText = palmStreamHandler(c, resp)
-		usage = service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.PromptTokens)
+		usage = service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.GetEstimatePromptTokens())
 	} else {
 		usage, err = palmHandler(c, info, resp)
 	}
@@ -121,13 +121,8 @@ func palmHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respons
 		}, resp.StatusCode)
 	}
 	fullTextResponse := responsePaLM2OpenAI(&palmResponse)
-	completionTokens := service.CountTextToken(palmResponse.Candidates[0].Content, info.UpstreamModelName)
-	usage := dto.Usage{
-		PromptTokens:     info.PromptTokens,
-		CompletionTokens: completionTokens,
-		TotalTokens:      info.PromptTokens + completionTokens,
-	}
-	fullTextResponse.Usage = usage
+	usage := service.ResponseText2Usage(c, palmResponse.Candidates[0].Content, info.UpstreamModelName, info.GetEstimatePromptTokens())
+	fullTextResponse.Usage = *usage
 	jsonResponse, err := common.Marshal(fullTextResponse)
 	if err != nil {
 		return nil, types.NewError(err, types.ErrorCodeBadResponseBody)
@@ -135,5 +130,5 @@ func palmHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respons
 	c.Writer.Header().Set("Content-Type", "application/json")
 	c.Writer.WriteHeader(resp.StatusCode)
 	service.IOCopyBytesGracefully(c, resp, jsonResponse)
-	return &usage, nil
+	return usage, nil
 }
@@ -393,7 +393,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 }

 // FetchTask 查询任务状态
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -408,7 +408,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http

 	req.Header.Set("Authorization", "Bearer "+key)

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -146,7 +146,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 }

 // FetchTask fetch task status
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -163,7 +163,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req.Header.Set("Content-Type", "application/json")
 	req.Header.Set("Authorization", "Bearer "+key)

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -200,7 +200,7 @@ func (a *TaskAdaptor) GetChannelName() string {
 }

 // FetchTask fetch task status
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -223,7 +223,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req.Header.Set("Accept", "application/json")
 	req.Header.Set("x-goog-api-key", key)

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) ParseTaskResult(respBody []byte) (*relaycommon.TaskInfo, error) {
@@ -110,7 +110,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 	return hResp.TaskID, responseBody, nil
 }

-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -126,7 +126,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req.Header.Set("Accept", "application/json")
 	req.Header.Set("Authorization", "Bearer "+key)

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -196,7 +196,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 	}

 	if jResp.Code != 10000 {
-		taskErr = service.TaskErrorWrapper(fmt.Errorf(jResp.Message), fmt.Sprintf("%d", jResp.Code), http.StatusInternalServerError)
+		taskErr = service.TaskErrorWrapper(fmt.Errorf("%s", jResp.Message), fmt.Sprintf("%d", jResp.Code), http.StatusInternalServerError)
 		return
 	}

@@ -210,7 +210,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 }

 // FetchTask fetch task status
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -251,7 +251,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 			return nil, errors.Wrap(err, "sign request failed")
 		}
 	}
-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -186,7 +186,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 		return
 	}
 	if kResp.Code != 0 {
-		taskErr = service.TaskErrorWrapperLocal(fmt.Errorf(kResp.Message), "task_failed", http.StatusBadRequest)
+		taskErr = service.TaskErrorWrapperLocal(fmt.Errorf("%s", kResp.Message), "task_failed", http.StatusBadRequest)
 		return
 	}
 	ov := dto.NewOpenAIVideo()
@@ -199,7 +199,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 }

 // FetchTask fetch task status
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -228,7 +228,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req.Header.Set("Authorization", "Bearer "+token)
 	req.Header.Set("User-Agent", "kling-sdk/1.0")

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -5,8 +5,10 @@ import (
 	"fmt"
 	"io"
 	"net/http"
+	"strings"

 	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/dto"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/relay/channel"
@@ -67,11 +69,30 @@ func (a *TaskAdaptor) Init(info *relaycommon.RelayInfo) {
 	a.apiKey = info.ApiKey
 }

+func validateRemixRequest(c *gin.Context) *dto.TaskError {
+	var req struct {
+		Prompt string `json:"prompt"`
+	}
+	if err := common.UnmarshalBodyReusable(c, &req); err != nil {
+		return service.TaskErrorWrapperLocal(err, "invalid_request", http.StatusBadRequest)
+	}
+	if strings.TrimSpace(req.Prompt) == "" {
+		return service.TaskErrorWrapperLocal(fmt.Errorf("field prompt is required"), "invalid_request", http.StatusBadRequest)
+	}
+	return nil
+}
+
 func (a *TaskAdaptor) ValidateRequestAndSetAction(c *gin.Context, info *relaycommon.RelayInfo) (taskErr *dto.TaskError) {
+	if info.Action == constant.TaskActionRemix {
+		return validateRemixRequest(c)
+	}
 	return relaycommon.ValidateMultipartDirect(c, info)
 }

 func (a *TaskAdaptor) BuildRequestURL(info *relaycommon.RelayInfo) (string, error) {
+	if info.Action == constant.TaskActionRemix {
+		return fmt.Sprintf("%s/v1/videos/%s/remix", a.baseURL, info.OriginTaskID), nil
+	}
 	return fmt.Sprintf("%s/v1/videos", a.baseURL), nil
 }

@@ -125,7 +146,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, _ *relayco
 }

 // FetchTask fetch task status
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -140,7 +161,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http

 	req.Header.Set("Authorization", "Bearer "+key)

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -105,7 +105,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 		return
 	}
 	if !sunoResponse.IsSuccess() {
-		taskErr = service.TaskErrorWrapper(fmt.Errorf(sunoResponse.Message), sunoResponse.Code, http.StatusInternalServerError)
+		taskErr = service.TaskErrorWrapper(fmt.Errorf("%s", sunoResponse.Message), sunoResponse.Code, http.StatusInternalServerError)
 		return
 	}

@@ -132,7 +132,7 @@ func (a *TaskAdaptor) GetChannelName() string {
 	return ChannelName
 }

-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	requestUrl := fmt.Sprintf("%s/suno/fetch", baseUrl)
 	byteBody, err := json.Marshal(body)
 	if err != nil {
@@ -153,11 +153,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req = req.WithContext(ctx)
 	req.Header.Set("Content-Type", "application/json")
 	req.Header.Set("Authorization", "Bearer "+key)
-	resp, err := service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
 	if err != nil {
-		return nil, err
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
 	}
-	return resp, nil
+	return client.Do(req)
 }

 func actionValidate(c *gin.Context, sunoRequest *dto.SunoSubmitReq, action string) (err error) {
@@ -120,7 +120,11 @@ func (a *TaskAdaptor) BuildRequestHeader(c *gin.Context, req *http.Request, info
 		return fmt.Errorf("failed to decode credentials: %w", err)
 	}

-	token, err := vertexcore.AcquireAccessToken(*adc, "")
+	proxy := ""
+	if info != nil {
+		proxy = info.ChannelSetting.Proxy
+	}
+	token, err := vertexcore.AcquireAccessToken(*adc, proxy)
 	if err != nil {
 		return fmt.Errorf("failed to acquire access token: %w", err)
 	}
@@ -216,7 +220,7 @@ func (a *TaskAdaptor) GetModelList() []string { return []string{"veo-3.0-generat
 func (a *TaskAdaptor) GetChannelName() string { return "vertex" }

 // FetchTask fetch task status
-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -249,7 +253,7 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	if err := json.Unmarshal([]byte(key), adc); err != nil {
 		return nil, fmt.Errorf("failed to decode credentials: %w", err)
 	}
-	token, err := vertexcore.AcquireAccessToken(*adc, "")
+	token, err := vertexcore.AcquireAccessToken(*adc, proxy)
 	if err != nil {
 		return nil, fmt.Errorf("failed to acquire access token: %w", err)
 	}
@@ -261,7 +265,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req.Header.Set("Accept", "application/json")
 	req.Header.Set("Authorization", "Bearer "+token)
 	req.Header.Set("x-goog-user-project", adc.ProjectID)
-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) ParseTaskResult(respBody []byte) (*relaycommon.TaskInfo, error) {
@@ -188,7 +188,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 	return vResp.TaskId, responseBody, nil
 }

-func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http.Response, error) {
+func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any, proxy string) (*http.Response, error) {
 	taskID, ok := body["task_id"].(string)
 	if !ok {
 		return nil, fmt.Errorf("invalid task_id")
@@ -204,7 +204,11 @@ func (a *TaskAdaptor) FetchTask(baseUrl, key string, body map[string]any) (*http
 	req.Header.Set("Accept", "application/json")
 	req.Header.Set("Authorization", "Token "+key)

-	return service.GetHttpClient().Do(req)
+	client, err := service.GetHttpClientWithProxy(proxy)
+	if err != nil {
+		return nil, fmt.Errorf("new proxy http client failed: %w", err)
+	}
+	return client.Do(req)
 }

 func (a *TaskAdaptor) GetModelList() []string {
@@ -105,7 +105,7 @@ func tencentStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *htt
 		data = strings.TrimPrefix(data, "data:")

 		var tencentResponse TencentChatResponse
-		err := json.Unmarshal([]byte(data), &tencentResponse)
+		err := common.Unmarshal([]byte(data), &tencentResponse)
 		if err != nil {
 			common.SysLog("error unmarshalling stream response: " + err.Error())
 			continue
@@ -130,7 +130,7 @@ func tencentStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *htt

 	service.CloseResponseBodyGracefully(resp)

-	return service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.PromptTokens), nil
+	return service.ResponseText2Usage(c, responseText, info.UpstreamModelName, info.GetEstimatePromptTokens()), nil
 }

 func tencentHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Response) (*dto.Usage, *types.NewAPIError) {
@@ -17,6 +17,7 @@ import (
 	relaycommon "github.com/QuantumNous/new-api/relay/common"
 	"github.com/QuantumNous/new-api/relay/constant"
 	"github.com/QuantumNous/new-api/setting/model_setting"
+	"github.com/QuantumNous/new-api/setting/reasoning"
 	"github.com/QuantumNous/new-api/types"

 	"github.com/gin-gonic/gin"
@@ -50,10 +51,43 @@ type Adaptor struct {
 }

 func (a *Adaptor) ConvertGeminiRequest(c *gin.Context, info *relaycommon.RelayInfo, request *dto.GeminiChatRequest) (any, error) {
+	// Vertex AI does not support functionResponse.id; keep it stripped here for consistency.
+	if model_setting.GetGeminiSettings().RemoveFunctionResponseIdEnabled {
+		removeFunctionResponseID(request)
+	}
 	geminiAdaptor := gemini.Adaptor{}
 	return geminiAdaptor.ConvertGeminiRequest(c, info, request)
 }

+func removeFunctionResponseID(request *dto.GeminiChatRequest) {
+	if request == nil {
+		return
+	}
+
+	if len(request.Contents) > 0 {
+		for i := range request.Contents {
+			if len(request.Contents[i].Parts) == 0 {
+				continue
+			}
+			for j := range request.Contents[i].Parts {
+				part := &request.Contents[i].Parts[j]
+				if part.FunctionResponse == nil {
+					continue
+				}
+				if len(part.FunctionResponse.ID) > 0 {
+					part.FunctionResponse.ID = nil
+				}
+			}
+		}
+	}
+
+	if len(request.Requests) > 0 {
+		for i := range request.Requests {
+			removeFunctionResponseID(&request.Requests[i])
+		}
+	}
+}
+
 func (a *Adaptor) ConvertClaudeRequest(c *gin.Context, info *relaycommon.RelayInfo, request *dto.ClaudeRequest) (any, error) {
 	if v, ok := claudeModelMap[info.UpstreamModelName]; ok {
 		c.Set("request_model", v)
@@ -181,6 +215,8 @@ func (a *Adaptor) GetRequestURL(info *relaycommon.RelayInfo) (string, error) {
 				info.UpstreamModelName = strings.TrimSuffix(info.UpstreamModelName, "-thinking")
 			} else if strings.HasSuffix(info.UpstreamModelName, "-nothinking") {
 				info.UpstreamModelName = strings.TrimSuffix(info.UpstreamModelName, "-nothinking")
+			} else if baseModel, level, ok := reasoning.TrimEffortSuffix(info.UpstreamModelName); ok && level != "" {
+				info.UpstreamModelName = baseModel
 			}
 		}

@@ -184,9 +184,9 @@ func handleTTSResponse(c *gin.Context, resp *http.Response, info *relaycommon.Re
 	c.Data(http.StatusOK, contentType, audioData)

 	usage = &dto.Usage{
-		PromptTokens:     info.PromptTokens,
+		PromptTokens:     info.GetEstimatePromptTokens(),
 		CompletionTokens: 0,
-		TotalTokens:      info.PromptTokens,
+		TotalTokens:      info.GetEstimatePromptTokens(),
 	}

 	return usage, nil
@@ -284,9 +284,9 @@ func handleTTSWebSocketResponse(c *gin.Context, requestURL string, volcRequest V
 			if msg.Sequence < 0 {
 				c.Status(http.StatusOK)
 				usage = &dto.Usage{
-					PromptTokens:     info.PromptTokens,
+					PromptTokens:     info.GetEstimatePromptTokens(),
 					CompletionTokens: 0,
-					TotalTokens:      info.PromptTokens,
+					TotalTokens:      info.GetEstimatePromptTokens(),
 				}
 				return usage, nil
 			}
@@ -297,9 +297,9 @@ func handleTTSWebSocketResponse(c *gin.Context, requestURL string, volcRequest V

 	c.Status(http.StatusOK)
 	usage = &dto.Usage{
-		PromptTokens:     info.PromptTokens,
+		PromptTokens:     info.GetEstimatePromptTokens(),
 		CompletionTokens: 0,
-		TotalTokens:      info.PromptTokens,
+		TotalTokens:      info.GetEstimatePromptTokens(),
 	}
 	return usage, nil
 }
@@ -70,7 +70,7 @@ func xAIStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Re
 	})

 	if !containStreamUsage {
-		usage = service.ResponseText2Usage(c, responseTextBuilder.String(), info.UpstreamModelName, info.PromptTokens)
+		usage = service.ResponseText2Usage(c, responseTextBuilder.String(), info.UpstreamModelName, info.GetEstimatePromptTokens())
 		usage.CompletionTokens += toolCount * 7
 	}

@@ -36,8 +36,7 @@ func (a *Adaptor) ConvertAudioRequest(c *gin.Context, info *relaycommon.RelayInf
 }

 func (a *Adaptor) ConvertImageRequest(c *gin.Context, info *relaycommon.RelayInfo, request dto.ImageRequest) (any, error) {
-	//TODO implement me
-	return nil, errors.New("not implemented")
+	return request, nil
 }

 func (a *Adaptor) Init(info *relaycommon.RelayInfo) {
@@ -63,6 +62,8 @@ func (a *Adaptor) GetRequestURL(info *relaycommon.RelayInfo) (string, error) {
 				return fmt.Sprintf("%s/embeddings", specialPlan.OpenAIBaseURL), nil
 			}
 			return fmt.Sprintf("%s/api/paas/v4/embeddings", baseURL), nil
+		case relayconstant.RelayModeImagesGenerations:
+			return fmt.Sprintf("%s/api/paas/v4/images/generations", baseURL), nil
 		default:
 			if hasSpecialPlan && specialPlan.OpenAIBaseURL != "" {
 				return fmt.Sprintf("%s/chat/completions", specialPlan.OpenAIBaseURL), nil
@@ -114,6 +115,9 @@ func (a *Adaptor) DoResponse(c *gin.Context, resp *http.Response, info *relaycom
 			return claude.ClaudeHandler(c, resp, info, claude.RequestModeMessage)
 		}
 	default:
+		if info.RelayMode == relayconstant.RelayModeImagesGenerations {
+			return zhipu4vImageHandler(c, resp, info)
+		}
 		adaptor := openai.Adaptor{}
 		return adaptor.DoResponse(c, resp, info)
 	}
@@ -4,6 +4,7 @@ import (
 	"time"

 	"github.com/QuantumNous/new-api/dto"
+	"github.com/QuantumNous/new-api/types"
 )

 //	type ZhipuMessage struct {
@@ -37,7 +38,7 @@ type ZhipuV4Response struct {
 	Model               string                         `json:"model"`
 	TextResponseChoices []dto.OpenAITextResponseChoice `json:"choices"`
 	Usage               dto.Usage                      `json:"usage"`
-	Error               dto.OpenAIError                `json:"error"`
+	Error               types.OpenAIError              `json:"error"`
 }

 //
@@ -0,0 +1,127 @@
+package zhipu_4v
+
+import (
+	"io"
+	"net/http"
+
+	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/dto"
+	"github.com/QuantumNous/new-api/logger"
+	relaycommon "github.com/QuantumNous/new-api/relay/common"
+	"github.com/QuantumNous/new-api/service"
+	"github.com/QuantumNous/new-api/types"
+
+	"github.com/gin-gonic/gin"
+)
+
+type zhipuImageRequest struct {
+	Model            string `json:"model"`
+	Prompt           string `json:"prompt"`
+	Quality          string `json:"quality,omitempty"`
+	Size             string `json:"size,omitempty"`
+	WatermarkEnabled *bool  `json:"watermark_enabled,omitempty"`
+	UserID           string `json:"user_id,omitempty"`
+}
+
+type zhipuImageResponse struct {
+	Created       *int64            `json:"created,omitempty"`
+	Data          []zhipuImageData  `json:"data,omitempty"`
+	ContentFilter any               `json:"content_filter,omitempty"`
+	Usage         *dto.Usage        `json:"usage,omitempty"`
+	Error         *zhipuImageError  `json:"error,omitempty"`
+	RequestID     string            `json:"request_id,omitempty"`
+	ExtendParam   map[string]string `json:"extendParam,omitempty"`
+}
+
+type zhipuImageError struct {
+	Code    string `json:"code"`
+	Message string `json:"message"`
+}
+
+type zhipuImageData struct {
+	Url      string `json:"url,omitempty"`
+	ImageUrl string `json:"image_url,omitempty"`
+	B64Json  string `json:"b64_json,omitempty"`
+	B64Image string `json:"b64_image,omitempty"`
+}
+
+type openAIImagePayload struct {
+	Created int64             `json:"created"`
+	Data    []openAIImageData `json:"data"`
+}
+
+type openAIImageData struct {
+	B64Json string `json:"b64_json"`
+}
+
+func zhipu4vImageHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) (*dto.Usage, *types.NewAPIError) {
+	responseBody, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError)
+	}
+	service.CloseResponseBodyGracefully(resp)
+
+	var zhipuResp zhipuImageResponse
+	if err := common.Unmarshal(responseBody, &zhipuResp); err != nil {
+		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
+	}
+
+	if zhipuResp.Error != nil && zhipuResp.Error.Message != "" {
+		return nil, types.WithOpenAIError(types.OpenAIError{
+			Message: zhipuResp.Error.Message,
+			Type:    "zhipu_image_error",
+			Code:    zhipuResp.Error.Code,
+		}, resp.StatusCode)
+	}
+
+	payload := openAIImagePayload{}
+	if zhipuResp.Created != nil && *zhipuResp.Created != 0 {
+		payload.Created = *zhipuResp.Created
+	} else {
+		payload.Created = info.StartTime.Unix()
+	}
+	for _, data := range zhipuResp.Data {
+		url := data.Url
+		if url == "" {
+			url = data.ImageUrl
+		}
+		if url == "" {
+			logger.LogWarn(c, "zhipu_image_missing_url")
+			continue
+		}
+
+		var b64 string
+		switch {
+		case data.B64Json != "":
+			b64 = data.B64Json
+		case data.B64Image != "":
+			b64 = data.B64Image
+		default:
+			_, downloaded, err := service.GetImageFromUrl(url)
+			if err != nil {
+				logger.LogError(c, "zhipu_image_get_b64_failed: "+err.Error())
+				continue
+			}
+			b64 = downloaded
+		}
+
+		if b64 == "" {
+			logger.LogWarn(c, "zhipu_image_empty_b64")
+			continue
+		}
+
+		imageData := openAIImageData{
+			B64Json: b64,
+		}
+		payload.Data = append(payload.Data, imageData)
+	}
+
+	jsonResp, err := common.Marshal(payload)
+	if err != nil {
+		return nil, types.NewError(err, types.ErrorCodeBadResponseBody)
+	}
+
+	service.IOCopyBytesGracefully(c, resp, jsonResp)
+
+	return &dto.Usage{}, nil
+}
@@ -11,6 +11,8 @@ import (
 	"github.com/tidwall/sjson"
 )

+var negativeIndexRegexp = regexp.MustCompile(`\.(-\d+)`)
+
 type ConditionOperation struct {
 	Path           string      `json:"path"`             // JSON路径
 	Mode           string      `json:"mode"`             // full, prefix, suffix, contains, gt, gte, lt, lte
@@ -186,8 +188,7 @@ func checkSingleCondition(jsonStr, contextJSON string, condition ConditionOperat
 }

 func processNegativeIndex(jsonStr string, path string) string {
-	re := regexp.MustCompile(`\.(-\d+)`)
-	matches := re.FindAllStringSubmatch(path, -1)
+	matches := negativeIndexRegexp.FindAllStringSubmatch(path, -1)

 	if len(matches) == 0 {
 		return path
@@ -11,6 +11,7 @@ import (
 	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/dto"
 	relayconstant "github.com/QuantumNous/new-api/relay/constant"
+	"github.com/QuantumNous/new-api/setting/model_setting"
 	"github.com/QuantumNous/new-api/types"

 	"github.com/gin-gonic/gin"
@@ -73,11 +74,17 @@ type ChannelMeta struct {
 	SupportStreamOptions bool // 是否支持流式选项
 }

+type TokenCountMeta struct {
+	//promptTokens int
+	estimatePromptTokens int
+}
+
 type RelayInfo struct {
 	TokenId           int
 	TokenKey          string
+	TokenGroup        string
 	UserId            int
-	UsingGroup        string // 使用的分组
+	UsingGroup        string // 使用的分组，当auto跨分组重试时，会变动
 	UserGroup         string // 用户所在分组
 	TokenUnlimited    bool
 	StartTime         time.Time
@@ -91,7 +98,6 @@ type RelayInfo struct {
 	RelayMode              int
 	OriginModelName        string
 	RequestURLPath         string
-	PromptTokens           int
 	ShouldIncludeUsage     bool
 	DisablePing            bool // 是否禁止向下游发送自定义 Ping
 	ClientWs               *websocket.Conn
@@ -115,6 +121,7 @@ type RelayInfo struct {
 	Request dto.Request

 	ThinkingContentInfo
+	TokenCountMeta
 	*ClaudeConvertInfo
 	*RerankerInfo
 	*ResponsesUsageInfo
@@ -189,7 +196,7 @@ func (info *RelayInfo) ToString() string {
 	fmt.Fprintf(b, "IsPlayground: %t, ", info.IsPlayground)
 	fmt.Fprintf(b, "RequestURLPath: %q, ", info.RequestURLPath)
 	fmt.Fprintf(b, "OriginModelName: %q, ", info.OriginModelName)
-	fmt.Fprintf(b, "PromptTokens: %d, ", info.PromptTokens)
+	fmt.Fprintf(b, "EstimatePromptTokens: %d, ", info.estimatePromptTokens)
 	fmt.Fprintf(b, "ShouldIncludeUsage: %t, ", info.ShouldIncludeUsage)
 	fmt.Fprintf(b, "DisablePing: %t, ", info.DisablePing)
 	fmt.Fprintf(b, "SendResponseCount: %d, ", info.SendResponseCount)
@@ -368,6 +375,12 @@ func genBaseRelayInfo(c *gin.Context, request dto.Request) *RelayInfo {
 	//channelId := common.GetContextKeyInt(c, constant.ContextKeyChannelId)
 	//paramOverride := common.GetContextKeyStringMap(c, constant.ContextKeyChannelParamOverride)

+	tokenGroup := common.GetContextKeyString(c, constant.ContextKeyTokenGroup)
+	// 当令牌分组为空时，表示使用用户分组
+	if tokenGroup == "" {
+		tokenGroup = common.GetContextKeyString(c, constant.ContextKeyUserGroup)
+	}
+
 	startTime := common.GetContextKeyTime(c, constant.ContextKeyRequestStartTime)
 	if startTime.IsZero() {
 		startTime = time.Now()
@@ -391,11 +404,11 @@ func genBaseRelayInfo(c *gin.Context, request dto.Request) *RelayInfo {
 		UserEmail:  common.GetContextKeyString(c, constant.ContextKeyUserEmail),

 		OriginModelName: common.GetContextKeyString(c, constant.ContextKeyOriginalModel),
-		PromptTokens:    common.GetContextKeyInt(c, constant.ContextKeyPromptTokens),

 		TokenId:        common.GetContextKeyInt(c, constant.ContextKeyTokenId),
 		TokenKey:       common.GetContextKeyString(c, constant.ContextKeyTokenKey),
 		TokenUnlimited: common.GetContextKeyBool(c, constant.ContextKeyTokenUnlimited),
+		TokenGroup:     tokenGroup,

 		isFirstResponse: true,
 		RelayMode:       relayconstant.Path2RelayMode(c.Request.URL.Path),
@@ -408,6 +421,10 @@ func genBaseRelayInfo(c *gin.Context, request dto.Request) *RelayInfo {
 			IsFirstThinkingContent:  true,
 			SendLastThinkingContent: false,
 		},
+		TokenCountMeta: TokenCountMeta{
+			//promptTokens: common.GetContextKeyInt(c, constant.ContextKeyPromptTokens),
+			estimatePromptTokens: common.GetContextKeyInt(c, constant.ContextKeyEstimatedTokens),
+		},
 	}

 	if info.RelayMode == relayconstant.RelayModeUnknown {
@@ -463,8 +480,16 @@ func GenRelayInfo(c *gin.Context, relayFormat types.RelayFormat, request dto.Req
 	}
 }

-func (info *RelayInfo) SetPromptTokens(promptTokens int) {
-	info.PromptTokens = promptTokens
+//func (info *RelayInfo) SetPromptTokens(promptTokens int) {
+//	info.promptTokens = promptTokens
+//}
+
+func (info *RelayInfo) SetEstimatePromptTokens(promptTokens int) {
+	info.estimatePromptTokens = promptTokens
+}
+
+func (info *RelayInfo) GetEstimatePromptTokens() int {
+	return info.estimatePromptTokens
 }

 func (info *RelayInfo) SetFirstResponseTime() {
@@ -610,3 +635,47 @@ func RemoveDisabledFields(jsonData []byte, channelOtherSettings dto.ChannelOther
 	}
 	return jsonDataAfter, nil
 }
+
+// RemoveGeminiDisabledFields removes disabled fields from Gemini request JSON data
+// Currently supports removing functionResponse.id field which Vertex AI does not support
+func RemoveGeminiDisabledFields(jsonData []byte) ([]byte, error) {
+	if !model_setting.GetGeminiSettings().RemoveFunctionResponseIdEnabled {
+		return jsonData, nil
+	}
+
+	var data map[string]interface{}
+	if err := common.Unmarshal(jsonData, &data); err != nil {
+		common.SysError("RemoveGeminiDisabledFields Unmarshal error: " + err.Error())
+		return jsonData, nil
+	}
+
+	// Process contents array
+	// Handle both camelCase (functionResponse) and snake_case (function_response)
+	if contents, ok := data["contents"].([]interface{}); ok {
+		for _, content := range contents {
+			if contentMap, ok := content.(map[string]interface{}); ok {
+				if parts, ok := contentMap["parts"].([]interface{}); ok {
+					for _, part := range parts {
+						if partMap, ok := part.(map[string]interface{}); ok {
+							// Check functionResponse (camelCase)
+							if funcResp, ok := partMap["functionResponse"].(map[string]interface{}); ok {
+								delete(funcResp, "id")
+							}
+							// Check function_response (snake_case)
+							if funcResp, ok := partMap["function_response"].(map[string]interface{}); ok {
+								delete(funcResp, "id")
+							}
+						}
+					}
+				}
+			}
+		}
+	}
+
+	jsonDataAfter, err := common.Marshal(data)
+	if err != nil {
+		common.SysError("RemoveGeminiDisabledFields Marshal error: " + err.Error())
+		return jsonData, nil
+	}
+	return jsonDataAfter, nil
+}
@@ -57,8 +57,8 @@ func RerankHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respo
 		jinaResp = dto.RerankResponse{
 			Results: jinaRespResults,
 			Usage: dto.Usage{
-				PromptTokens: info.PromptTokens,
-				TotalTokens:  info.PromptTokens,
+				PromptTokens: info.GetEstimatePromptTokens(),
+				TotalTokens:  info.GetEstimatePromptTokens(),
 			},
 		}
 	} else {
@@ -181,7 +181,7 @@ func TextHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *types
 		return newApiErr
 	}

-	if strings.HasPrefix(info.OriginModelName, "gpt-4o-audio") {
+	if usage.(*dto.Usage).CompletionTokenDetails.AudioTokens > 0 || usage.(*dto.Usage).PromptTokensDetails.AudioTokens > 0 {
 		service.PostAudioConsumeQuota(c, info, usage.(*dto.Usage), "")
 	} else {
 		postConsumeQuota(c, info, usage.(*dto.Usage), "")
@@ -192,9 +192,9 @@ func TextHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *types
 func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage *dto.Usage, extraContent string) {
 	if usage == nil {
 		usage = &dto.Usage{
-			PromptTokens:     relayInfo.PromptTokens,
+			PromptTokens:     relayInfo.GetEstimatePromptTokens(),
 			CompletionTokens: 0,
-			TotalTokens:      relayInfo.PromptTokens,
+			TotalTokens:      relayInfo.GetEstimatePromptTokens(),
 		}
 		extraContent += "（可能是请求出错）"
 	}
@@ -300,14 +300,20 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 	if !relayInfo.PriceData.UsePrice {
 		baseTokens := dPromptTokens
 		// 减去 cached tokens
+		// Anthropic API 的 input_tokens 已经不包含缓存 tokens，不需要减去
+		// OpenAI/OpenRouter 等 API 的 prompt_tokens 包含缓存 tokens，需要减去
 		var cachedTokensWithRatio decimal.Decimal
 		if !dCacheTokens.IsZero() {
-			baseTokens = baseTokens.Sub(dCacheTokens)
+			if relayInfo.ChannelType != constant.ChannelTypeAnthropic {
+				baseTokens = baseTokens.Sub(dCacheTokens)
+			}
 			cachedTokensWithRatio = dCacheTokens.Mul(dCacheRatio)
 		}
 		var dCachedCreationTokensWithRatio decimal.Decimal
 		if !dCachedCreationTokens.IsZero() {
-			baseTokens = baseTokens.Sub(dCachedCreationTokens)
+			if relayInfo.ChannelType != constant.ChannelTypeAnthropic {
+				baseTokens = baseTokens.Sub(dCachedCreationTokens)
+			}
 			dCachedCreationTokensWithRatio = dCachedCreationTokens.Mul(dCachedCreationRatio)
 		}

@@ -14,15 +14,28 @@ import (
 	"github.com/gorilla/websocket"
 )

-func FlushWriter(c *gin.Context) error {
-	if c.Writer == nil {
+func FlushWriter(c *gin.Context) (err error) {
+	defer func() {
+		if r := recover(); r != nil {
+			err = fmt.Errorf("flush panic recovered: %v", r)
+		}
+	}()
+
+	if c == nil || c.Writer == nil {
 		return nil
 	}
-	if flusher, ok := c.Writer.(http.Flusher); ok {
-		flusher.Flush()
-		return nil
+
+	if c.Request != nil && c.Request.Context().Err() != nil {
+		return fmt.Errorf("request context done: %w", c.Request.Context().Err())
 	}
-	return errors.New("streaming error: flusher not found")
+
+	flusher, ok := c.Writer.(http.Flusher)
+	if !ok {
+		return errors.New("streaming error: flusher not found")
+	}
+
+	flusher.Flush()
+	return nil
 }

 func SetEventStreamHeaders(c *gin.Context) {
@@ -66,17 +79,31 @@ func ResponseChunkData(c *gin.Context, resp dto.ResponsesStreamResponse, data st
 }

 func StringData(c *gin.Context, str string) error {
-	//str = strings.TrimPrefix(str, "data: ")
-	//str = strings.TrimSuffix(str, "\r")
+	if c == nil || c.Writer == nil {
+		return errors.New("context or writer is nil")
+	}
+
+	if c.Request != nil && c.Request.Context().Err() != nil {
+		return fmt.Errorf("request context done: %w", c.Request.Context().Err())
+	}
+
 	c.Render(-1, common.CustomEvent{Data: "data: " + str})
-	_ = FlushWriter(c)
-	return nil
+	return FlushWriter(c)
 }

 func PingData(c *gin.Context) error {
-	c.Writer.Write([]byte(": PING\n\n"))
-	_ = FlushWriter(c)
-	return nil
+	if c == nil || c.Writer == nil {
+		return errors.New("context or writer is nil")
+	}
+
+	if c.Request != nil && c.Request.Context().Err() != nil {
+		return fmt.Errorf("request context done: %w", c.Request.Context().Err())
+	}
+
+	if _, err := c.Writer.Write([]byte(": PING\n\n")); err != nil {
+		return fmt.Errorf("write ping data failed: %w", err)
+	}
+	return FlushWriter(c)
 }

 func ObjectData(c *gin.Context, object interface{}) error {
@@ -99,7 +99,10 @@ func ModelPriceHelper(c *gin.Context, info *relaycommon.RelayInfo, promptTokens
 	// check if free model pre-consume is disabled
 	if !operation_setting.GetQuotaSetting().EnableFreeModelPreConsume {
 		// if model price or ratio is 0, do not pre-consume quota
-		if usePrice {
+		if groupRatioInfo.GroupRatio == 0 {
+			preConsumedQuota = 0
+			freeModel = true
+		} else if usePrice {
 			if modelPrice == 0 {
 				preConsumedQuota = 0
 				freeModel = true
@@ -72,6 +72,8 @@ func StreamScannerHandler(c *gin.Context, resp *http.Response, info *relaycommon
 	if common.DebugEnabled {
 		// print timeout and ping interval for debugging
 		println("relay timeout seconds:", common.RelayTimeout)
+		println("relay max idle conns:", common.RelayMaxIdleConns)
+		println("relay max idle conns per host:", common.RelayMaxIdleConnsPerHost)
 		println("streaming timeout seconds:", int64(streamingTimeout.Seconds()))
 		println("ping interval seconds:", int64(pingInterval.Seconds()))
 	}
@@ -32,7 +32,94 @@ func RelayTaskSubmit(c *gin.Context, info *relaycommon.RelayInfo) (taskErr *dto.
 	if info.TaskRelayInfo == nil {
 		info.TaskRelayInfo = &relaycommon.TaskRelayInfo{}
 	}
+	path := c.Request.URL.Path
+	if strings.Contains(path, "/v1/videos/") && strings.HasSuffix(path, "/remix") {
+		info.Action = constant.TaskActionRemix
+	}
+
+	// 提取 remix 任务的 video_id
+	if info.Action == constant.TaskActionRemix {
+		videoID := c.Param("video_id")
+		if strings.TrimSpace(videoID) == "" {
+			return service.TaskErrorWrapperLocal(fmt.Errorf("video_id is required"), "invalid_request", http.StatusBadRequest)
+		}
+		info.OriginTaskID = videoID
+	}
+
 	platform := constant.TaskPlatform(c.GetString("platform"))
+
+	// 获取原始任务信息
+	if info.OriginTaskID != "" {
+		originTask, exist, err := model.GetByTaskId(info.UserId, info.OriginTaskID)
+		if err != nil {
+			taskErr = service.TaskErrorWrapper(err, "get_origin_task_failed", http.StatusInternalServerError)
+			return
+		}
+		if !exist {
+			taskErr = service.TaskErrorWrapperLocal(errors.New("task_origin_not_exist"), "task_not_exist", http.StatusBadRequest)
+			return
+		}
+		if info.OriginModelName == "" {
+			if originTask.Properties.OriginModelName != "" {
+				info.OriginModelName = originTask.Properties.OriginModelName
+			} else if originTask.Properties.UpstreamModelName != "" {
+				info.OriginModelName = originTask.Properties.UpstreamModelName
+			} else {
+				var taskData map[string]interface{}
+				_ = json.Unmarshal(originTask.Data, &taskData)
+				if m, ok := taskData["model"].(string); ok && m != "" {
+					info.OriginModelName = m
+					platform = originTask.Platform
+				}
+			}
+		}
+		if originTask.ChannelId != info.ChannelId {
+			channel, err := model.GetChannelById(originTask.ChannelId, true)
+			if err != nil {
+				taskErr = service.TaskErrorWrapperLocal(err, "channel_not_found", http.StatusBadRequest)
+				return
+			}
+			if channel.Status != common.ChannelStatusEnabled {
+				taskErr = service.TaskErrorWrapperLocal(errors.New("the channel of the origin task is disabled"), "task_channel_disable", http.StatusBadRequest)
+				return
+			}
+			key, _, newAPIError := channel.GetNextEnabledKey()
+			if newAPIError != nil {
+				taskErr = service.TaskErrorWrapper(newAPIError, "channel_no_available_key", newAPIError.StatusCode)
+				return
+			}
+			common.SetContextKey(c, constant.ContextKeyChannelKey, key)
+			common.SetContextKey(c, constant.ContextKeyChannelType, channel.Type)
+			common.SetContextKey(c, constant.ContextKeyChannelBaseUrl, channel.GetBaseURL())
+			common.SetContextKey(c, constant.ContextKeyChannelId, originTask.ChannelId)
+
+			info.ChannelBaseUrl = channel.GetBaseURL()
+			info.ChannelId = originTask.ChannelId
+			info.ChannelType = channel.Type
+			info.ApiKey = key
+			platform = originTask.Platform
+		}
+
+		// 使用原始任务的参数
+		if info.Action == constant.TaskActionRemix {
+			var taskData map[string]interface{}
+			_ = json.Unmarshal(originTask.Data, &taskData)
+			secondsStr, _ := taskData["seconds"].(string)
+			seconds, _ := strconv.Atoi(secondsStr)
+			if seconds <= 0 {
+				seconds = 4
+			}
+			sizeStr, _ := taskData["size"].(string)
+			if info.PriceData.OtherRatios == nil {
+				info.PriceData.OtherRatios = map[string]float64{}
+			}
+			info.PriceData.OtherRatios["seconds"] = float64(seconds)
+			info.PriceData.OtherRatios["size"] = 1
+			if sizeStr == "1792x1024" || sizeStr == "1024x1792" {
+				info.PriceData.OtherRatios["size"] = 1.666667
+			}
+		}
+	}
 	if platform == "" {
 		platform = GetTaskPlatform(c)
 	}
@@ -94,34 +181,6 @@ func RelayTaskSubmit(c *gin.Context, info *relaycommon.RelayInfo) (taskErr *dto.
 		return
 	}

-	if info.OriginTaskID != "" {
-		originTask, exist, err := model.GetByTaskId(info.UserId, info.OriginTaskID)
-		if err != nil {
-			taskErr = service.TaskErrorWrapper(err, "get_origin_task_failed", http.StatusInternalServerError)
-			return
-		}
-		if !exist {
-			taskErr = service.TaskErrorWrapperLocal(errors.New("task_origin_not_exist"), "task_not_exist", http.StatusBadRequest)
-			return
-		}
-		if originTask.ChannelId != info.ChannelId {
-			channel, err := model.GetChannelById(originTask.ChannelId, true)
-			if err != nil {
-				taskErr = service.TaskErrorWrapperLocal(err, "channel_not_found", http.StatusBadRequest)
-				return
-			}
-			if channel.Status != common.ChannelStatusEnabled {
-				return service.TaskErrorWrapperLocal(errors.New("该任务所属渠道已被禁用"), "task_channel_disable", http.StatusBadRequest)
-			}
-			c.Set("base_url", channel.GetBaseURL())
-			c.Set("channel_id", originTask.ChannelId)
-			c.Request.Header.Set("Authorization", fmt.Sprintf("Bearer %s", channel.Key))
-
-			info.ChannelBaseUrl = channel.GetBaseURL()
-			info.ChannelId = originTask.ChannelId
-		}
-	}
-
 	// build body
 	requestBody, err := adaptor.BuildRequestBody(c, info)
 	if err != nil {
@@ -137,7 +196,7 @@ func RelayTaskSubmit(c *gin.Context, info *relaycommon.RelayInfo) (taskErr *dto.
 	// handle response
 	if resp != nil && resp.StatusCode != http.StatusOK {
 		responseBody, _ := io.ReadAll(resp.Body)
-		taskErr = service.TaskErrorWrapper(fmt.Errorf(string(responseBody)), "fail_to_fetch_task", resp.StatusCode)
+		taskErr = service.TaskErrorWrapper(fmt.Errorf("%s", string(responseBody)), "fail_to_fetch_task", resp.StatusCode)
 		return
 	}

@@ -326,6 +385,7 @@ func videoFetchByIDRespBodyBuilder(c *gin.Context) (respBody []byte, taskResp *d
 		if channelModel.GetBaseURL() != "" {
 			baseURL = channelModel.GetBaseURL()
 		}
+		proxy := channelModel.GetSetting().Proxy
 		adaptor := GetTaskAdaptor(constant.TaskPlatform(strconv.Itoa(channelModel.Type)))
 		if adaptor == nil {
 			return
@@ -333,7 +393,7 @@ func videoFetchByIDRespBodyBuilder(c *gin.Context) (respBody []byte, taskResp *d
 		resp, err2 := adaptor.FetchTask(baseURL, channelModel.Key, map[string]any{
 			"task_id": originTask.TaskID,
 			"action":  originTask.Action,
-		})
+		}, proxy)
 		if err2 != nil || resp == nil {
 			return
 		}
@@ -14,6 +14,7 @@ func SetVideoRouter(router *gin.Engine) {
 		videoV1Router.GET("/videos/:task_id/content", controller.VideoProxy)
 		videoV1Router.POST("/video/generations", controller.RelayTask)
 		videoV1Router.GET("/video/generations/:task_id", controller.RelayTask)
+		videoV1Router.POST("/videos/:video_id/remix", controller.RelayTask)
 	}
 	// openai compatible API video routes
 	// docs: https://platform.openai.com/docs/api-reference/videos/create
@@ -11,31 +11,151 @@ import (
 	"github.com/gin-gonic/gin"
 )

-func CacheGetRandomSatisfiedChannel(c *gin.Context, group string, modelName string, retry int) (*model.Channel, string, error) {
+type RetryParam struct {
+	Ctx          *gin.Context
+	TokenGroup   string
+	ModelName    string
+	Retry        *int
+	resetNextTry bool
+}
+
+func (p *RetryParam) GetRetry() int {
+	if p.Retry == nil {
+		return 0
+	}
+	return *p.Retry
+}
+
+func (p *RetryParam) SetRetry(retry int) {
+	p.Retry = &retry
+}
+
+func (p *RetryParam) IncreaseRetry() {
+	if p.resetNextTry {
+		p.resetNextTry = false
+		return
+	}
+	if p.Retry == nil {
+		p.Retry = new(int)
+	}
+	*p.Retry++
+}
+
+func (p *RetryParam) ResetRetryNextTry() {
+	p.resetNextTry = true
+}
+
+// CacheGetRandomSatisfiedChannel tries to get a random channel that satisfies the requirements.
+// 尝试获取一个满足要求的随机渠道。
+//
+// For "auto" tokenGroup with cross-group Retry enabled:
+// 对于启用了跨分组重试的 "auto" tokenGroup：
+//
+//   - Each group will exhaust all its priorities before moving to the next group.
+//     每个分组会用完所有优先级后才会切换到下一个分组。
+//
+//   - Uses ContextKeyAutoGroupIndex to track current group index.
+//     使用 ContextKeyAutoGroupIndex 跟踪当前分组索引。
+//
+//   - Uses ContextKeyAutoGroupRetryIndex to track the global Retry count when current group started.
+//     使用 ContextKeyAutoGroupRetryIndex 跟踪当前分组开始时的全局重试次数。
+//
+//   - priorityRetry = Retry - startRetryIndex, represents the priority level within current group.
+//     priorityRetry = Retry - startRetryIndex，表示当前分组内的优先级级别。
+//
+//   - When GetRandomSatisfiedChannel returns nil (priorities exhausted), moves to next group.
+//     当 GetRandomSatisfiedChannel 返回 nil（优先级用完）时，切换到下一个分组。
+//
+// Example flow (2 groups, each with 2 priorities, RetryTimes=3):
+// 示例流程（2个分组，每个有2个优先级，RetryTimes=3）：
+//
+//	Retry=0: GroupA, priority0 (startRetryIndex=0, priorityRetry=0)
+//	         分组A, 优先级0
+//
+//	Retry=1: GroupA, priority1 (startRetryIndex=0, priorityRetry=1)
+//	         分组A, 优先级1
+//
+//	Retry=2: GroupA exhausted → GroupB, priority0 (startRetryIndex=2, priorityRetry=0)
+//	         分组A用完 → 分组B, 优先级0
+//
+//	Retry=3: GroupB, priority1 (startRetryIndex=2, priorityRetry=1)
+//	         分组B, 优先级1
+func CacheGetRandomSatisfiedChannel(param *RetryParam) (*model.Channel, string, error) {
 	var channel *model.Channel
 	var err error
-	selectGroup := group
-	userGroup := common.GetContextKeyString(c, constant.ContextKeyUserGroup)
-	if group == "auto" {
+	selectGroup := param.TokenGroup
+	userGroup := common.GetContextKeyString(param.Ctx, constant.ContextKeyUserGroup)
+
+	if param.TokenGroup == "auto" {
 		if len(setting.GetAutoGroups()) == 0 {
 			return nil, selectGroup, errors.New("auto groups is not enabled")
 		}
-		for _, autoGroup := range GetUserAutoGroup(userGroup) {
-			logger.LogDebug(c, "Auto selecting group:", autoGroup)
-			channel, _ = model.GetRandomSatisfiedChannel(autoGroup, modelName, retry)
-			if channel == nil {
-				continue
-			} else {
-				c.Set("auto_group", autoGroup)
-				selectGroup = autoGroup
-				logger.LogDebug(c, "Auto selected group:", autoGroup)
-				break
+		autoGroups := GetUserAutoGroup(userGroup)
+
+		// startGroupIndex: the group index to start searching from
+		// startGroupIndex: 开始搜索的分组索引
+		startGroupIndex := 0
+		crossGroupRetry := common.GetContextKeyBool(param.Ctx, constant.ContextKeyTokenCrossGroupRetry)
+
+		if lastGroupIndex, exists := common.GetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex); exists {
+			if idx, ok := lastGroupIndex.(int); ok {
+				startGroupIndex = idx
 			}
 		}
+
+		for i := startGroupIndex; i < len(autoGroups); i++ {
+			autoGroup := autoGroups[i]
+			// Calculate priorityRetry for current group
+			// 计算当前分组的 priorityRetry
+			priorityRetry := param.GetRetry()
+			// If moved to a new group, reset priorityRetry and update startRetryIndex
+			// 如果切换到新分组，重置 priorityRetry 并更新 startRetryIndex
+			if i > startGroupIndex {
+				priorityRetry = 0
+			}
+			logger.LogDebug(param.Ctx, "Auto selecting group: %s, priorityRetry: %d", autoGroup, priorityRetry)
+
+			channel, _ = model.GetRandomSatisfiedChannel(autoGroup, param.ModelName, priorityRetry)
+			if channel == nil {
+				// Current group has no available channel for this model, try next group
+				// 当前分组没有该模型的可用渠道，尝试下一个分组
+				logger.LogDebug(param.Ctx, "No available channel in group %s for model %s at priorityRetry %d, trying next group", autoGroup, param.ModelName, priorityRetry)
+				// 重置状态以尝试下一个分组
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex, i+1)
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupRetryIndex, 0)
+				// Reset retry counter so outer loop can continue for next group
+				// 重置重试计数器，以便外层循环可以为下一个分组继续
+				param.SetRetry(0)
+				continue
+			}
+			common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroup, autoGroup)
+			selectGroup = autoGroup
+			logger.LogDebug(param.Ctx, "Auto selected group: %s", autoGroup)
+
+			// Prepare state for next retry
+			// 为下一次重试准备状态
+			if crossGroupRetry && priorityRetry >= common.RetryTimes {
+				// Current group has exhausted all retries, prepare to switch to next group
+				// This request still uses current group, but next retry will use next group
+				// 当前分组已用完所有重试次数，准备切换到下一个分组
+				// 本次请求仍使用当前分组，但下次重试将使用下一个分组
+				logger.LogDebug(param.Ctx, "Current group %s retries exhausted (priorityRetry=%d >= RetryTimes=%d), preparing switch to next group for next retry", autoGroup, priorityRetry, common.RetryTimes)
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex, i+1)
+				// Reset retry counter so outer loop can continue for next group
+				// 重置重试计数器，以便外层循环可以为下一个分组继续
+				param.SetRetry(0)
+				param.ResetRetryNextTry()
+			} else {
+				// Stay in current group, save current state
+				// 保持在当前分组，保存当前状态
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex, i)
+			}
+			break
+		}
 	} else {
-		channel, err = model.GetRandomSatisfiedChannel(group, modelName, retry)
+		channel, err = model.GetRandomSatisfiedChannel(param.TokenGroup, param.ModelName, param.GetRetry())
 		if err != nil {
-			return nil, group, err
+			return nil, param.TokenGroup, err
 		}
 	}
 	return channel, selectGroup, nil
@@ -201,6 +201,10 @@ func generateStopBlock(index int) *dto.ClaudeResponse {
 }

 func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamResponse, info *relaycommon.RelayInfo) []*dto.ClaudeResponse {
+	if info.ClaudeConvertInfo.Done {
+		return nil
+	}
+
 	var claudeResponses []*dto.ClaudeResponse
 	if info.SendResponseCount == 1 {
 		msg := &dto.ClaudeMediaMessage{
@@ -209,7 +213,7 @@ func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamRespon
 			Type:  "message",
 			Role:  "assistant",
 			Usage: &dto.ClaudeUsage{
-				InputTokens:  info.PromptTokens,
+				InputTokens:  info.GetEstimatePromptTokens(),
 				OutputTokens: 0,
 			},
 		}
@@ -218,45 +222,117 @@ func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamRespon
 			Type:    "message_start",
 			Message: msg,
 		})
-		claudeResponses = append(claudeResponses)
 		//claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
 		//	Type: "ping",
 		//})
 		if openAIResponse.IsToolCall() {
 			info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeTools
+			var toolCall dto.ToolCallResponse
+			if len(openAIResponse.Choices) > 0 && len(openAIResponse.Choices[0].Delta.ToolCalls) > 0 {
+				toolCall = openAIResponse.Choices[0].Delta.ToolCalls[0]
+			} else {
+				first := openAIResponse.GetFirstToolCall()
+				if first != nil {
+					toolCall = *first
+				} else {
+					toolCall = dto.ToolCallResponse{}
+				}
+			}
 			resp := &dto.ClaudeResponse{
 				Type: "content_block_start",
 				ContentBlock: &dto.ClaudeMediaMessage{
-					Id:    openAIResponse.GetFirstToolCall().ID,
+					Id:    toolCall.ID,
 					Type:  "tool_use",
-					Name:  openAIResponse.GetFirstToolCall().Function.Name,
+					Name:  toolCall.Function.Name,
 					Input: map[string]interface{}{},
 				},
 			}
 			resp.SetIndex(0)
 			claudeResponses = append(claudeResponses, resp)
+			// 首块包含工具 delta，则追加 input_json_delta
+			if toolCall.Function.Arguments != "" {
+				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+					Index: &info.ClaudeConvertInfo.Index,
+					Type:  "content_block_delta",
+					Delta: &dto.ClaudeMediaMessage{
+						Type:        "input_json_delta",
+						PartialJson: &toolCall.Function.Arguments,
+					},
+				})
+			}
 		} else {

 		}
 		// 判断首个响应是否存在内容（非标准的 OpenAI 响应）
-		if len(openAIResponse.Choices) > 0 && len(openAIResponse.Choices[0].Delta.GetContentString()) > 0 {
+		if len(openAIResponse.Choices) > 0 {
+			reasoning := openAIResponse.Choices[0].Delta.GetReasoningContent()
+			content := openAIResponse.Choices[0].Delta.GetContentString()
+
+			if reasoning != "" {
+				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+					Index: &info.ClaudeConvertInfo.Index,
+					Type:  "content_block_start",
+					ContentBlock: &dto.ClaudeMediaMessage{
+						Type:     "thinking",
+						Thinking: common.GetPointer[string](""),
+					},
+				})
+				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+					Index: &info.ClaudeConvertInfo.Index,
+					Type:  "content_block_delta",
+					Delta: &dto.ClaudeMediaMessage{
+						Type:     "thinking_delta",
+						Thinking: &reasoning,
+					},
+				})
+				info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeThinking
+			} else if content != "" {
+				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+					Index: &info.ClaudeConvertInfo.Index,
+					Type:  "content_block_start",
+					ContentBlock: &dto.ClaudeMediaMessage{
+						Type: "text",
+						Text: common.GetPointer[string](""),
+					},
+				})
+				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+					Index: &info.ClaudeConvertInfo.Index,
+					Type:  "content_block_delta",
+					Delta: &dto.ClaudeMediaMessage{
+						Type: "text_delta",
+						Text: common.GetPointer[string](content),
+					},
+				})
+				info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeText
+			}
+		}
+
+		// 如果首块就带 finish_reason，需要立即发送停止块
+		if len(openAIResponse.Choices) > 0 && openAIResponse.Choices[0].FinishReason != nil && *openAIResponse.Choices[0].FinishReason != "" {
+			info.FinishReason = *openAIResponse.Choices[0].FinishReason
+			claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
+			oaiUsage := openAIResponse.Usage
+			if oaiUsage == nil {
+				oaiUsage = info.ClaudeConvertInfo.Usage
+			}
+			if oaiUsage != nil {
+				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+					Type: "message_delta",
+					Usage: &dto.ClaudeUsage{
+						InputTokens:              oaiUsage.PromptTokens,
+						OutputTokens:             oaiUsage.CompletionTokens,
+						CacheCreationInputTokens: oaiUsage.PromptTokensDetails.CachedCreationTokens,
+						CacheReadInputTokens:     oaiUsage.PromptTokensDetails.CachedTokens,
+					},
+					Delta: &dto.ClaudeMediaMessage{
+						StopReason: common.GetPointer[string](stopReasonOpenAI2Claude(info.FinishReason)),
+					},
+				})
+			}
 			claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-				Index: &info.ClaudeConvertInfo.Index,
-				Type:  "content_block_start",
-				ContentBlock: &dto.ClaudeMediaMessage{
-					Type: "text",
-					Text: common.GetPointer[string](""),
-				},
+				Type: "message_stop",
 			})
-			claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-				Index: &info.ClaudeConvertInfo.Index,
-				Type:  "content_block_delta",
-				Delta: &dto.ClaudeMediaMessage{
-					Type: "text_delta",
-					Text: common.GetPointer[string](openAIResponse.Choices[0].Delta.GetContentString()),
-				},
-			})
-			info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeText
+			info.ClaudeConvertInfo.Done = true
 		}
 		return claudeResponses
 	}
@@ -264,7 +340,7 @@ func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamRespon
 	if len(openAIResponse.Choices) == 0 {
 		// no choices
 		// 可能为非标准的 OpenAI 响应，判断是否已经完成
-		if info.Done {
+		if info.ClaudeConvertInfo.Done {
 			claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
 			oaiUsage := info.ClaudeConvertInfo.Usage
 			if oaiUsage != nil {
@@ -288,16 +364,114 @@ func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamRespon
 		return claudeResponses
 	} else {
 		chosenChoice := openAIResponse.Choices[0]
-		if chosenChoice.FinishReason != nil && *chosenChoice.FinishReason != "" {
-			// should be done
+		doneChunk := chosenChoice.FinishReason != nil && *chosenChoice.FinishReason != ""
+		if doneChunk {
 			info.FinishReason = *chosenChoice.FinishReason
-			if !info.Done {
-				return claudeResponses
+		}
+
+		var claudeResponse dto.ClaudeResponse
+		var isEmpty bool
+		claudeResponse.Type = "content_block_delta"
+		if len(chosenChoice.Delta.ToolCalls) > 0 {
+			toolCalls := chosenChoice.Delta.ToolCalls
+			if info.ClaudeConvertInfo.LastMessagesType != relaycommon.LastMessageTypeTools {
+				claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
+				info.ClaudeConvertInfo.Index++
+			}
+			info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeTools
+
+			for i, toolCall := range toolCalls {
+				blockIndex := info.ClaudeConvertInfo.Index
+				if toolCall.Index != nil {
+					blockIndex = *toolCall.Index
+				} else if len(toolCalls) > 1 {
+					blockIndex = info.ClaudeConvertInfo.Index + i
+				}
+
+				idx := blockIndex
+				if toolCall.Function.Name != "" {
+					claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+						Index: &idx,
+						Type:  "content_block_start",
+						ContentBlock: &dto.ClaudeMediaMessage{
+							Id:    toolCall.ID,
+							Type:  "tool_use",
+							Name:  toolCall.Function.Name,
+							Input: map[string]interface{}{},
+						},
+					})
+				}
+				
+				if len(toolCall.Function.Arguments) > 0 {
+					claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+						Index: &idx,
+						Type:  "content_block_delta",
+						Delta: &dto.ClaudeMediaMessage{
+							Type:        "input_json_delta",
+							PartialJson: &toolCall.Function.Arguments,
+						},
+					})
+				}
+
+				info.ClaudeConvertInfo.Index = blockIndex
+			}
+		} else {
+			reasoning := chosenChoice.Delta.GetReasoningContent()
+			textContent := chosenChoice.Delta.GetContentString()
+			if reasoning != "" || textContent != "" {
+				if reasoning != "" {
+					if info.ClaudeConvertInfo.LastMessagesType != relaycommon.LastMessageTypeThinking {
+						claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+							Index: &info.ClaudeConvertInfo.Index,
+							Type:  "content_block_start",
+							ContentBlock: &dto.ClaudeMediaMessage{
+								Type:     "thinking",
+								Thinking: common.GetPointer[string](""),
+							},
+						})
+					}
+					info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeThinking
+					claudeResponse.Delta = &dto.ClaudeMediaMessage{
+						Type:     "thinking_delta",
+						Thinking: &reasoning,
+					}
+				} else {
+					if info.ClaudeConvertInfo.LastMessagesType != relaycommon.LastMessageTypeText {
+						if info.ClaudeConvertInfo.LastMessagesType == relaycommon.LastMessageTypeThinking || info.ClaudeConvertInfo.LastMessagesType == relaycommon.LastMessageTypeTools {
+							claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
+							info.ClaudeConvertInfo.Index++
+						}
+						claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+							Index: &info.ClaudeConvertInfo.Index,
+							Type:  "content_block_start",
+							ContentBlock: &dto.ClaudeMediaMessage{
+								Type: "text",
+								Text: common.GetPointer[string](""),
+							},
+						})
+					}
+					info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeText
+					claudeResponse.Delta = &dto.ClaudeMediaMessage{
+						Type: "text_delta",
+						Text: common.GetPointer[string](textContent),
+					}
+				}
+			} else {
+				isEmpty = true
 			}
 		}
-		if info.Done {
+
+		claudeResponse.Index = &info.ClaudeConvertInfo.Index
+		if !isEmpty && claudeResponse.Delta != nil {
+			claudeResponses = append(claudeResponses, &claudeResponse)
+		}
+
+		if doneChunk || info.ClaudeConvertInfo.Done {
 			claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
-			oaiUsage := info.ClaudeConvertInfo.Usage
+			oaiUsage := openAIResponse.Usage
+			if oaiUsage == nil {
+				oaiUsage = info.ClaudeConvertInfo.Usage
+			}
 			if oaiUsage != nil {
 				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
 					Type: "message_delta",
@@ -315,83 +489,8 @@ func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamRespon
 			claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
 				Type: "message_stop",
 			})
-		} else {
-			var claudeResponse dto.ClaudeResponse
-			var isEmpty bool
-			claudeResponse.Type = "content_block_delta"
-			if len(chosenChoice.Delta.ToolCalls) > 0 {
-				if info.ClaudeConvertInfo.LastMessagesType != relaycommon.LastMessageTypeTools {
-					claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
-					info.ClaudeConvertInfo.Index++
-					claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-						Index: &info.ClaudeConvertInfo.Index,
-						Type:  "content_block_start",
-						ContentBlock: &dto.ClaudeMediaMessage{
-							Id:    openAIResponse.GetFirstToolCall().ID,
-							Type:  "tool_use",
-							Name:  openAIResponse.GetFirstToolCall().Function.Name,
-							Input: map[string]interface{}{},
-						},
-					})
-				}
-				info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeTools
-				// tools delta
-				claudeResponse.Delta = &dto.ClaudeMediaMessage{
-					Type:        "input_json_delta",
-					PartialJson: &chosenChoice.Delta.ToolCalls[0].Function.Arguments,
-				}
-			} else {
-				reasoning := chosenChoice.Delta.GetReasoningContent()
-				textContent := chosenChoice.Delta.GetContentString()
-				if reasoning != "" || textContent != "" {
-					if reasoning != "" {
-						if info.ClaudeConvertInfo.LastMessagesType != relaycommon.LastMessageTypeThinking {
-							//info.ClaudeConvertInfo.Index++
-							claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-								Index: &info.ClaudeConvertInfo.Index,
-								Type:  "content_block_start",
-								ContentBlock: &dto.ClaudeMediaMessage{
-									Type:     "thinking",
-									Thinking: common.GetPointer[string](""),
-								},
-							})
-						}
-						info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeThinking
-						// text delta
-						claudeResponse.Delta = &dto.ClaudeMediaMessage{
-							Type:     "thinking_delta",
-							Thinking: &reasoning,
-						}
-					} else {
-						if info.ClaudeConvertInfo.LastMessagesType != relaycommon.LastMessageTypeText {
-							if info.LastMessagesType == relaycommon.LastMessageTypeThinking || info.LastMessagesType == relaycommon.LastMessageTypeTools {
-								claudeResponses = append(claudeResponses, generateStopBlock(info.ClaudeConvertInfo.Index))
-								info.ClaudeConvertInfo.Index++
-							}
-							claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-								Index: &info.ClaudeConvertInfo.Index,
-								Type:  "content_block_start",
-								ContentBlock: &dto.ClaudeMediaMessage{
-									Type: "text",
-									Text: common.GetPointer[string](""),
-								},
-							})
-						}
-						info.ClaudeConvertInfo.LastMessagesType = relaycommon.LastMessageTypeText
-						// text delta
-						claudeResponse.Delta = &dto.ClaudeMediaMessage{
-							Type: "text_delta",
-							Text: common.GetPointer[string](textContent),
-						}
-					}
-				} else {
-					isEmpty = true
-				}
-			}
-			claudeResponse.Index = &info.ClaudeConvertInfo.Index
-			if !isEmpty {
-				claudeResponses = append(claudeResponses, &claudeResponse)
-			}
+			info.ClaudeConvertInfo.Done = true
+			return claudeResponses
 		}
 	}

@@ -734,12 +833,18 @@ func StreamResponseOpenAI2Gemini(openAIResponse *dto.ChatCompletionsStreamRespon
 	geminiResponse := &dto.GeminiChatResponse{
 		Candidates: make([]dto.GeminiChatCandidate, 0, len(openAIResponse.Choices)),
 		UsageMetadata: dto.GeminiUsageMetadata{
-			PromptTokenCount:     info.PromptTokens,
+			PromptTokenCount:     info.GetEstimatePromptTokens(),
 			CandidatesTokenCount: 0, // 流式响应中可能没有完整的 usage 信息
-			TotalTokenCount:      info.PromptTokens,
+			TotalTokenCount:      info.GetEstimatePromptTokens(),
 		},
 	}

+	if openAIResponse.Usage != nil {
+		geminiResponse.UsageMetadata.PromptTokenCount = openAIResponse.Usage.PromptTokens
+		geminiResponse.UsageMetadata.CandidatesTokenCount = openAIResponse.Usage.CompletionTokens
+		geminiResponse.UsageMetadata.TotalTokenCount = openAIResponse.Usage.TotalTokens
+	}
+
 	for _, choice := range openAIResponse.Choices {
 		candidate := dto.GeminiChatCandidate{
 			Index:         int64(choice.Index),
@@ -96,19 +96,21 @@ func RelayErrorHandler(ctx context.Context, resp *http.Response, showBodyWhenFai
 		if showBodyWhenFail {
 			newApiErr.Err = fmt.Errorf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody))
 		} else {
-			if common.DebugEnabled {
-				logger.LogInfo(ctx, fmt.Sprintf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody)))
-			}
+			logger.LogError(ctx, fmt.Sprintf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody)))
 			newApiErr.Err = fmt.Errorf("bad response status code %d", resp.StatusCode)
 		}
 		return
 	}
-	if errResponse.Error.Message != "" {
+
+	if common.GetJsonType(errResponse.Error) == "object" {
 		// General format error (OpenAI, Anthropic, Gemini, etc.)
-		newApiErr = types.WithOpenAIError(errResponse.Error, resp.StatusCode)
-	} else {
-		newApiErr = types.NewOpenAIError(errors.New(errResponse.ToMessage()), types.ErrorCodeBadResponseStatusCode, resp.StatusCode)
+		oaiError := errResponse.TryToOpenAIError()
+		if oaiError != nil {
+			newApiErr = types.WithOpenAIError(*oaiError, resp.StatusCode)
+			return
+		}
 	}
+	newApiErr = types.NewOpenAIError(errors.New(errResponse.ToMessage()), types.ErrorCodeBadResponseStatusCode, resp.StatusCode)
 	return
 }

@@ -34,12 +34,20 @@ func checkRedirect(req *http.Request, via []*http.Request) error {
 }

 func InitHttpClient() {
+	transport := &http.Transport{
+		MaxIdleConns:        common.RelayMaxIdleConns,
+		MaxIdleConnsPerHost: common.RelayMaxIdleConnsPerHost,
+		ForceAttemptHTTP2:   true,
+	}
+
 	if common.RelayTimeout == 0 {
 		httpClient = &http.Client{
+			Transport:     transport,
 			CheckRedirect: checkRedirect,
 		}
 	} else {
 		httpClient = &http.Client{
+			Transport:     transport,
 			Timeout:       time.Duration(common.RelayTimeout) * time.Second,
 			CheckRedirect: checkRedirect,
 		}
@@ -50,6 +58,14 @@ func GetHttpClient() *http.Client {
 	return httpClient
 }

+// GetHttpClientWithProxy returns the default client or a proxy-enabled one when proxyURL is provided.
+func GetHttpClientWithProxy(proxyURL string) (*http.Client, error) {
+	if proxyURL == "" {
+		return GetHttpClient(), nil
+	}
+	return NewProxyHttpClient(proxyURL)
+}
+
 // ResetProxyClientCache 清空代理客户端缓存，确保下次使用时重新初始化
 func ResetProxyClientCache() {
 	proxyClientLock.Lock()
@@ -84,7 +100,10 @@ func NewProxyHttpClient(proxyURL string) (*http.Client, error) {
 	case "http", "https":
 		client := &http.Client{
 			Transport: &http.Transport{
-				Proxy: http.ProxyURL(parsedURL),
+				MaxIdleConns:        common.RelayMaxIdleConns,
+				MaxIdleConnsPerHost: common.RelayMaxIdleConnsPerHost,
+				ForceAttemptHTTP2:   true,
+				Proxy:               http.ProxyURL(parsedURL),
 			},
 			CheckRedirect: checkRedirect,
 		}
@@ -116,6 +135,9 @@ func NewProxyHttpClient(proxyURL string) (*http.Client, error) {

 		client := &http.Client{
 			Transport: &http.Transport{
+				MaxIdleConns:        common.RelayMaxIdleConns,
+				MaxIdleConnsPerHost: common.RelayMaxIdleConnsPerHost,
+				ForceAttemptHTTP2:   true,
 				DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
 					return dialer.Dial(network, addr)
 				},
@@ -108,7 +108,7 @@ func PreWssConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usag
 	groupRatio := ratio_setting.GetGroupRatio(relayInfo.UsingGroup)
 	modelRatio, _, _ := ratio_setting.GetModelRatio(modelName)

-	autoGroup, exists := ctx.Get("auto_group")
+	autoGroup, exists := common.GetContextKey(ctx, constant.ContextKeyAutoGroup)
 	if exists {
 		groupRatio = ratio_setting.GetGroupRatio(autoGroup.(string))
 		log.Printf("final group ratio: %f", groupRatio)
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
CaIon	ceb7ebe5cd	feat(user): simplify user response structure in JSON output	2025-12-25 15:39:58 +08:00
Calcium-Ion	c7125480ac	Merge pull request #2524 from seefs001/fix/revert-model-ratio fix: revert model ratio	2025-12-25 15:38:36 +08:00
Seefs	0edef97413	fix: revert model ratio	2025-12-25 15:37:54 +08:00
Calcium-Ion	3955d61b3e	Merge pull request #2477 from 1420970597/fix/anthropic-cache-billing fix: 修复 Anthropic 渠道缓存计费错误	2025-12-24 16:59:23 +08:00
Seefs	07cb6e9626	Merge pull request #2493 from shikaiwei1/patch-1	2025-12-24 16:52:24 +08:00
Seefs	be047a7053	Merge pull request #2511 from JerryKwan/issue2499	2025-12-24 16:51:51 +08:00
Jerry	2504e9ad04	Resolving event mismatch in OpenAI2Claude add stricter validation for content_block_start corresponding to tool call and fix the crash issue when Claude Code is processing tool call	2025-12-24 14:52:39 +08:00
Calcium-Ion	8ee5c23b48	Merge pull request #2510 from feitianbubu/pr/0e7050dc89c1b761069f5e528d8ecf786e7008ae 修复claudeResponse流式请求空指针Panic	2025-12-24 14:15:51 +08:00
feitianbubu	1dc7ab9a97	fix: check claudeResponse delta StopReason nil point	2025-12-24 11:54:23 +08:00
CaIon	d488a19ed7	feat(token): enhance error handling in ValidateUserToken for better clarity	2025-12-22 18:01:38 +08:00
John Chen	6dbe89f1cf	为Moonshot添加缓存tokens读取逻辑为Moonshot添加缓存tokens读取逻辑。其与智普V4的逻辑相同，所以共用逻辑	2025-12-22 17:05:16 +08:00
Calcium-Ion	4bcdb1ec07	Merge pull request #2486 from QuantumNous/docs/readme-update-doc-links-new-routing 🔗 docs(readme): update documentation links to new site routing	2025-12-21 21:28:35 +08:00
t0ng7u	1f6527e91a	🔗 docs(readme): revert missing docs links to legacy site Keep new-site links (/{lang}/docs/...) where matching pages exist in the current docs repo Revert links that have no equivalent in the new docs to the legacy paths on doc.newapi.pro: Google Gemini Chat Midjourney-Proxy image docs Suno music docs Apply the same rule consistently across all README translations (zh/en/ja/fr)	2025-12-21 21:18:59 +08:00
t0ng7u	93b3cfc0f7	🔗 docs(readme): update documentation links to new site routing - Replace legacy `docs.newapi.pro` paths with the new `/{lang}/docs/...` structure across all README translations - Point key sections (installation, env vars, API, support, features) to their new locations - Ensure language-specific links use the correct locale prefix (zh/en/ja) and keep FR aligned with English routes	2025-12-21 21:00:33 +08:00
Calcium-Ion	7af80ff043	Merge pull request #2483 from seefs001/fix/vertex-function-response-id fix: 模型设置增加针对Vertex渠道过滤content[].part[].functionResponse.id的选项，默认启用	2025-12-21 17:24:07 +08:00
Seefs	45649249b2	fix: 在Vertex Adapter过滤content[].part[].functionResponse.id	2025-12-21 17:22:04 +08:00
Seefs	219b13af70	fix: 模型设置增加针对Vertex渠道过滤content[].part[].functionResponse.id的选项，默认启用	2025-12-21 17:09:49 +08:00
长安	6e3bc06fa6	fix: 修复 Anthropic 渠道缓存计费错误 ## 问题描述当使用 Anthropic 渠道通过 `/v1/chat/completions` 端点调用且启用缓存功能时，计费逻辑错误地减去了缓存 tokens，导致严重的收入损失（94.5%）。 ## 根本原因不同 API 的 `prompt_tokens` 定义不同： - Anthropic API: `input_tokens` 字段已经是纯输入 tokens（不包含缓存） - OpenAI API: `prompt_tokens` 字段包含所有 tokens（包含缓存） - OpenRouter API: `prompt_tokens` 字段包含所有 tokens（包含缓存）当前 `postConsumeQuota` 函数对所有渠道都减去缓存 tokens，这对 Anthropic 渠道是错误的，因为其 `input_tokens` 已经不包含缓存。 ## 修复方案在 `relay/compatible_handler.go` 的 `postConsumeQuota` 函数中，添加渠道类型判断： ```go if relayInfo.ChannelType != constant.ChannelTypeAnthropic { baseTokens = baseTokens.Sub(dCacheTokens) } ``` 只对非 Anthropic 渠道减去缓存 tokens。 ## 影响分析 ### ✅ 不受影响的场景 1. 无缓存调用（所有渠道） - cache_tokens = 0 - 减去 0 = 不减去 - 结果：完全一致 2. OpenAI/OpenRouter 渠道 + 缓存 - 继续减去缓存（因为 ChannelType != Anthropic） - 结果：完全一致 3. Anthropic 渠道 + /v1/messages 端点 - 使用 PostClaudeConsumeQuota（不修改） - 结果：完全不受影响 ### ✅ 修复的场景 4. Anthropic 渠道 + /v1/chat/completions + 缓存 - 修复前：错误地减去缓存，导致 94.5% 收入损失 - 修复后：不减去缓存，计费正确 ## 验证数据以实际记录 143509 为例： \| 项目 \| 修复前 \| 修复后 \| 差异 \| \|------\|--------\|--------\|------\| \| Quota \| 10,489 \| 191,330 \| +180,841 \| \| 费用 \| ¥0.020978 \| ¥0.382660 \| +¥0.361682 \| \| 收入恢复 \| - \| - \| +1724.1% \| ## 测试建议 1. 测试 Anthropic 渠道 + 缓存场景 2. 测试 OpenAI 渠道 + 缓存场景（确保不受影响） 3. 测试无缓存场景（确保不受影响） ## 相关 Issue 修复 Anthropic 渠道使用 prompt caching 时的计费错误。	2025-12-20 14:17:12 +08:00
CaIon	c2a6193497	feat(gin): improve request body handling and error reporting	2025-12-20 13:34:10 +08:00
CaIon	3523acfc2c	feat(init): increase MaxRequestBodyMB to enhance request handling	2025-12-20 13:27:55 +08:00
CaIon	f2d2b6e7fc	feat(channel): add error handling for SaveWithoutKey when channel ID is 0	2025-12-20 13:26:40 +08:00
Seefs	7a9cfa38ff	Merge pull request #2476 from TinsFox/chore/code-inspector-plugin	2025-12-20 11:04:40 +08:00
TinsFox	c06a216a14	chore: add code-inspector-plugin integration	2025-12-19 23:04:53 +08:00
Seefs	6d0cb5df75	Merge pull request #2474 from TinsFox/main	2025-12-19 21:39:56 +08:00
TinsFox	530b3eff11	style: add card spacing	2025-12-19 21:00:31 +08:00
Seefs	2f28f265a9	Merge pull request #2452 from QuantumNous/fix/oom-request-body-limit	2025-12-16 18:21:59 +08:00
t0ng7u	fa814b80fe	🧹 fix: harden request-body size handling and error unwrapping Tighten oversized request handling across relay paths and make error matching reliable. - Align `MAX_REQUEST_BODY_MB` fallback to `32` in request body reader and decompression middleware - Stop ignoring `GetRequestBody` errors in relay retry paths; return consistent 413 on oversized bodies (400 for other read errors) - Add `Unwrap()` to `types.NewAPIError` so `errors.Is/As` can match wrapped underlying errors - `go test ./...` passes	2025-12-16 18:10:00 +08:00
t0ng7u	c2ed76ddfd	🛡️ fix: prevent OOM on large/decompressed requests; skip heavy prompt meta when token count is disabled Clamp request body size (including post-decompression) to avoid memory exhaustion caused by huge payloads/zip bombs, especially with large-context Claude requests. Add a configurable `MAX_REQUEST_BODY_MB` (default `32`) and document it. - Enforce max request body size after gzip/br decompression via `http.MaxBytesReader` - Add a secondary size guard in `common.GetRequestBody` and cache-safe handling - Return 413 Request Entity Too Large on oversized bodies in relay entry - Avoid building large `TokenCountMeta.CombineText` when both token counting and sensitive check are disabled (use lightweight meta for pricing) - Update READMEs (CN/EN/FR/JA) with `MAX_REQUEST_BODY_MB` - Fix a handful of vet/formatting issues encountered during the change - `go test ./...` passes	2025-12-16 17:00:19 +08:00
Calcium-Ion	87195a07b0	Merge pull request #2445 from QuantumNous/feat/token-ip-whitelist-cidr feat(auth): enhance IP restriction handling with CIDR support	2025-12-15 20:14:09 +08:00
CaIon	692b5ff5ac	feat(auth): refactor IP restriction handling to use clearer variable naming	2025-12-15 20:13:09 +08:00
CaIon	947a763a1a	feat(auth): enhance IP restriction handling with CIDR support	2025-12-15 17:24:09 +08:00
CaIon	1bd0d8de02	Revert "feat(audio): replace SysLog with logger for improved logging in GetAudioDuration" This reverts commit `2a980bbcf5`.	2025-12-14 00:04:40 +08:00
CaIon	2a980bbcf5	feat(audio): replace SysLog with logger for improved logging in GetAudioDuration	2025-12-13 23:59:58 +08:00
CaIon	da0d3ea93c	fix(audio): improve WAV duration calculation with enhanced PCM size handling	2025-12-13 23:57:32 +08:00
CaIon	06d1bd404b	feat(model_ratio): add default ratios for gpt-4o-mini-tts	2025-12-13 19:14:27 +08:00
CaIon	284ce42c88	refactor(channel_select): improve retry logic with reset functionality	2025-12-13 18:09:10 +08:00
Calcium-Ion	ef5e4a056c	Merge pull request #2434 from QuantumNous/feat/gpt-4o-mini-tts feat: support gpt tts series model quota calculate	2025-12-13 17:55:16 +08:00
CaIon	3822f4577c	fix(audio): correct TotalTokens calculation for accurate usage reporting	2025-12-13 17:49:57 +08:00
CaIon	be2a863b9b	feat(audio): enhance audio request handling with token type detection and streaming support	2025-12-13 17:24:23 +08:00
CaIon	29565c837f	feat(token): add CrossGroupRetry field to token insertion	2025-12-13 16:45:42 +08:00
CaIon	a1299114a6	refactor(error): replace dto.OpenAIError with types.OpenAIError for consistency	2025-12-13 16:43:57 +08:00
CaIon	6175f254a2	refactor(channel_select): enhance retry logic and context key usage for channel selection	2025-12-13 16:43:38 +08:00
CaIon	7d586ef507	fix(helper): improve error handling in FlushWriter and related functions	2025-12-13 13:29:21 +08:00
CaIon	e0a79e853d	refactor(auth): replace direct token group setting with context key retrieval	2025-12-13 01:38:12 +08:00
Calcium-Ion	6e998eaf82	Merge pull request #2430 from QuantumNous/fix/cross-group-retry fix(channel_select): adjust priority retry logic for cross-group	2025-12-13 01:05:40 +08:00
CaIon	51f3a936e4	fix(channel_select): adjust priority retry logic for cross-group channel selection	2025-12-13 01:04:10 +08:00
Calcium-Ion	2a01d1c996	Merge pull request #2429 from QuantumNous/feat/xhigh feat(adaptor): add '-xhigh' suffix to reasoning effort options	2025-12-12 22:06:19 +08:00
CaIon	413968a0fd	refactor(relay): update channel retrieval to use RelayInfo structure	2025-12-12 22:04:38 +08:00
Calcium-Ion	281ebacb8c	Merge pull request #2424 from ion1ze/main fix: correct sender format issues fix #1347	2025-12-12 20:55:22 +08:00
CaIon	27dd42718b	feat(adaptor): add '-xhigh' suffix to reasoning effort options for model parsing	2025-12-12 20:53:48 +08:00
Calcium-Ion	3c5edc54b7	Merge pull request #2426 from QuantumNous/feat/auto-cross-group-retry feat(token): add cross-group retry option for token processing	2025-12-12 20:45:54 +08:00
Calcium-Ion	8b0e710053	Merge pull request #2428 from seefs001/fix/health-check fix: health check	2025-12-12 20:45:34 +08:00
Seefs	ae30b4d15f	fix: health check	2025-12-12 20:37:32 +08:00
CaIon	ffb1931906	feat: implement cross-group retry functionality and update translations	2025-12-12 18:28:33 +08:00
CaIon	c87deaa7d9	feat(token): add cross-group retry option for token processing	2025-12-12 17:59:21 +08:00
zdwy5	85ecad90a7	fix: 支持aws 通过全局参数透传或者渠道参数透传来调用 (#2423 ) * fix: 支持aws 通过全局参数透传或者渠道参数透传来调用 * fix(aws): replace json.Unmarshal with common.Unmarshal for request body processing --------- Co-authored-by: r0 <liangchunlei@01.ai> Co-authored-by: CaIon <i@caion.me>	2025-12-12 17:09:27 +08:00
zhiheng.wang	8279be2380	fix: correct sender format issues - Adjust sender field format, add space to separate nickname and email address - Ensure email header format complies with standard RFC specifications - Fix potential email client sending exceptions (Tencent Cloud)	2025-12-12 16:19:14 +08:00
Seefs	ee53a7b6bf	Merge pull request #2412 from seefs001/pr-2372 feat: add openai video remix endpoint	2025-12-11 23:35:23 +08:00
Seefs	d3086c8752	Merge pull request #2194 from NoahCodeGG/fix/process_channel_error	2025-12-11 18:12:06 +08:00
Calcium-Ion	4886c74097	Merge pull request #2397 from seefs001/fix/tool-call-claude fix: try to fix tool call issues	2025-12-09 16:57:24 +08:00
Seefs	7c16c20870	Merge pull request #2360 from feitianbubu/pr2/fix-price-currency	2025-12-09 14:10:26 +08:00
Calcium-Ion	a0f127496d	Merge pull request #2398 from seefs001/fix/video-proxy fix: Use channel proxy settings for task query scenarios	2025-12-09 14:05:30 +08:00
Calcium-Ion	36d0fd6fc5	Merge pull request #2396 from seefs001/fix/login fix: Try to fix login error "already logged in" issue	2025-12-09 14:04:48 +08:00
Calcium-Ion	b98c9f404c	Merge pull request #2395 from seefs001/fix/siderbar fix: sidebar color overlap	2025-12-09 14:04:26 +08:00
Calcium-Ion	8e8fbc8e61	Merge pull request #2394 from seefs001/fix/fetch-model-header-overide fix: fetch upstream models	2025-12-09 14:03:34 +08:00
Calcium-Ion	0cfaf8c625	Merge pull request #2359 from seefs001/fix/qwen-chat-args fix: qwen chat_template_kwargs	2025-12-09 14:01:26 +08:00
Calcium-Ion	3676b9f2bc	Merge pull request #2358 from seefs001/fix/regrex-repeat-compile fix: regex repeat compile	2025-12-09 14:01:07 +08:00
Calcium-Ion	6ade0b8211	Merge pull request #2357 from seefs001/feature/go1.25-greengc chore(go): enable greenteagc	2025-12-09 14:00:52 +08:00
Calcium-Ion	4e5c6297cb	Merge pull request #2356 from seefs001/feature/zhipiu_4v_image feat: zhipu 4v image generations	2025-12-09 14:00:20 +08:00
Seefs	ede2e0e94e	fix:try to fix tool call issues	2025-12-09 13:55:52 +08:00
Seefs	920e005048	fix: Use channel proxy settings for task query scenarios	2025-12-09 11:15:27 +08:00
Seefs	1ebbf5171f	fix: Add styles only on mobile	2025-12-09 10:46:16 +08:00
Seefs	019ae700ab	fix: Try to fix login error "already logged in" issue	2025-12-08 22:32:45 +08:00
Seefs	063597c191	fix: sidebar color overlap	2025-12-08 21:25:21 +08:00
Seefs	0d23d599b8	fix: fetch upstream models	2025-12-08 21:14:50 +08:00
Seefs	52c9e0edcf	Merge pull request #2387 from binorxin/fix-bug fix(go.mod): 更新modernc.org/sqlite依赖项版本	2025-12-08 21:02:18 +08:00
Seefs	cf243588fa	Merge pull request #2229 from HynoR/chore/v1 fix: Set default to unsupported value for gpt-5 model series requests	2025-12-08 20:59:30 +08:00
Seefs	0b896d4fe6	Merge pull request #2368 from oudi/main Increase token name length limit from 30 to 50	2025-12-08 20:48:40 +08:00
Seefs	43c1068e50	Merge pull request #2375 from FlowerRealm/feat/add-claude-haiku-4-5 feat: add claude-haiku-4-5-20251001 model support	2025-12-08 20:46:02 +08:00
Seefs	309b0d84f8	Merge pull request #2388 from FirstMelody/main fix(adaptor): fix reasoning suffix not processing in vertex adapter	2025-12-08 20:45:37 +08:00
borx	0be0b36503	fix(go.mod): 更新modernc.org/sqlite依赖项版本	2025-12-08 01:16:30 +08:00
firstmelody	06c23ea562	fix(adaptor): fix reasoning suffix not processing in vertex adapter	2025-12-08 01:12:29 +08:00
FlowerRealm	a655801017	feat: add claude-haiku-4-5-20251001 model support - Add model to Claude ModelList - Add model ratio (0.5, $1/1M input tokens) - Add completion ratio support (5x, $5/1M output tokens) - Add cache read ratio (0.1, $0.10/1M tokens) - Add cache write ratio (1.25, $1.25/1M tokens) Model specs: - Context window: 200K tokens - Max output: 64K tokens - Release date: October 1, 2025	2025-12-05 18:54:20 +08:00
oudi	7cd4de3f24	Merge pull request #1 from oudi/token-length-patch Increase token name length limit from 30 to 50	2025-12-04 11:21:46 +08:00
oudi	6680517aef	Increase token name length limit from 30 to 50	2025-12-04 11:18:51 +08:00
CaIon	ea3e9a01a0	feat: 将任务查询数量改为可配置环境变量 TASK_QUERY_LIMIT	2025-12-03 19:27:15 +08:00
feitianbubu	e030ac827a	feat: update price display use current currency symbol	2025-12-03 10:51:03 +08:00
Seefs	41b1499d41	fix: qwen chat_template_kwargs	2025-12-03 00:47:40 +08:00
Seefs	1dd8c0f569	fix: regex repeat compile	2025-12-03 00:41:47 +08:00
CaIon	64bf6d1e95	fix(token_counter): correct model name reference in image token estimation	2025-12-03 00:25:05 +08:00
Seefs	2430a8e036	chore(go): enable greenteagc	2025-12-02 23:15:20 +08:00
Seefs	634651b463	feat: zhipu v4 image generations	2025-12-02 22:56:58 +08:00
CaIon	049d3f2193	fix(price): adjust pre-consume quota logic for free models based on group ratio	2025-12-02 22:09:48 +08:00
Calcium-Ion	4c54836a53	Merge pull request #2344 from seefs001/feature/gemini-thinking-level feat: gemini 3 thinking level gemini-3-pro-preview-high	2025-12-02 21:55:43 +08:00
Calcium-Ion	cb40004d86	Merge pull request #2355 from QuantumNous/feat/optimize-token-counter feat: refactor token estimation logic	2025-12-02 21:51:09 +08:00
Calcium-Ion	bd7ee7e505	Merge pull request #2351 from prnake/fix-max-conns fix: try resolve the high concurrency issue to a single host	2025-12-02 21:44:24 +08:00
CaIon	2e4d27e001	feat(token_estimator): add concurrency support for multipliers retrieval	2025-12-02 21:38:58 +08:00
CaIon	1fededceb3	feat: refactor token estimation logic - Introduced new OpenAI text models in `common/model.go`. - Added `IsOpenAITextModel` function to check for OpenAI text models. - Refactored token estimation methods across various channels to use estimated prompt tokens instead of direct prompt token counts. - Updated related functions and structures to accommodate the new token estimation approach, enhancing overall token management.	2025-12-02 21:34:39 +08:00
Calcium-Ion	2073a907dd	Merge pull request #2353 from QuantumNous/openapi chore: update the relay openapi file	2025-12-02 18:18:35 +08:00
t0ng7u	90fcf2d20c	chore: update the relay openapi file	2025-12-02 18:17:01 +08:00
Papersnake	bf1a31df80	fix: add ForceAttemptHTTP2	2025-12-02 10:08:58 +08:00
Papersnake	14622aede6	fix: set MaxIdleConnsPerHost to 100	2025-12-02 09:55:03 +08:00
Seefs	607f7305b7	feat: gemini 3 thinking level gemini-3-pro-preview-high	2025-12-01 16:40:46 +08:00
HynoR	601d257b80	fix: Set default to unsupported value for gpt-5 model series requests	2025-11-15 13:28:38 +08:00
NoahCode	ac8b40a276	fix(channel): update channel identification logic in error processing	2025-11-08 20:33:14 +08:00