Route OpenAI image streaming through shared stream handling, split image/realtime/usage helpers for maintainability, and include the related image request and rate limit updates.
Three layered optimizations targeting Gemini-style 5MB base64 payloads where
RSS could balloon to tens of GB under concurrent load:
1. Byte-based param override (relay/common/override.go)
- Switch legacy/operations hot paths from common.Marshal round-trips and
map[string]any conversions to gjson/sjson on []byte directly.
- Avoids cloning 5MB strings during each Set/Delete operation.
2. strings.Builder for Gemini response markdown (relay/channel/gemini/relay-gemini.go)
- Replace string concatenation + strings.Join when assembling
"" content for inline image responses.
- Pre-allocates capacity from inline_data byte sizes.
3. Outbound BodyStorage + streaming Decoder (this commit's core)
- New relay/common/outbound_body.go helper wraps marshaled upstream bodies
in common.BodyStorage, allowing disk-cache mode to offload jsonData to
a temp file while waiting for upstream TTFB. The original []byte can
then be GC'd, removing ~5MB/req of heap residency during the longest
window of a request.
- All 7 relay handlers (gemini/claude/responses/embedding/image/compatible/
rerank) plus chat_completions_via_responses adopt the helper with
defer closer.Close() and explicit jsonData = nil.
- relay/common/relay_info.go: new UpstreamRequestBodySize so
relay/channel/api_request.go can populate req.ContentLength (lost when
body becomes a type-erased io.Reader).
- common/gin.go UnmarshalBodyReusable: when storage is disk-backed and
content-type is JSON, decode via DecodeJson(storage) instead of
storage.Bytes()+Unmarshal, removing one transient 5MB copy per request.
memory mode and form/multipart paths unchanged.
When proxying through another new-api instance, the upstream
X-Oneapi-Request-Id was overwriting the local one in client responses.
This adds a new `upstream_request_id` field to the logs table, captures
the upstream ID during relay, and filters it from being copied back to
the client. Frontend gains search/filter and detail display support.
- quota.go: add missing SettleBilling call in PostWssConsumeQuota
- text_quota.go: gate InjectTieredBillingInfo on tieredBillingApplied bool
instead of tieredResult != nil, so fallback billing still logs metadata
- price.go: remove quotaBeforeGroup == 0 from freeModel condition to avoid
bypassing settlement for output-only expressions
- tiered_settle.go: split cc/cc1h subtraction using UsageSemantic to
distinguish OpenAI vs Claude cache creation token formats
- pricing.go: only set BillingMode when a non-empty expression exists
- useModelPricingEditorState.js: only write billing_mode when
finalBillingExpr is non-empty
The Azure channel's GetRequestURL method only handled RelayModeResponses
but missed RelayModeResponsesCompact. This caused compact requests to
fall through to the generic deployments URL pattern, producing an
incorrect path that Azure returns 404 for.
This fix extends the existing responses API special handling to also
cover the compact mode, appending /compact to the subUrl when the relay
mode is ResponsesCompact.
Affected URLs (before → after):
- Normal Azure: /openai/deployments/{model}/responses/compact → /openai/v1/responses/compact
- cognitiveservices: same pattern → /openai/responses/compact
- Custom AzureResponsesVersion: properly respected for compact too
Co-authored-by: 彭俊杰 <pengjunjie@onero.com>
- Backend: differentiate error messages for admin vs regular users in price.go
- Backend: include error_code in channel test response for structured error handling
- Frontend: render model_price_error as a styled card in Playground with admin nav button
- Frontend: show inline error details and settings link in channel test modal
- Frontend: parse error codes from both SSE and non-streaming API responses
- i18n: remove redundant "Settings" suffix from setting tab translations (en/fr/ru/ja/vi)
- i18n: update "Group & Model Pricing" translations across all locales
Pre-fill BillingRequestInput from dto.Request before ModelPriceHelper,
so tiered_expr billing resolves param() from the structured request
instead of reading HTTP body (which is empty in channel-test context).
- attachTestBillingRequestInput: marshal dto.Request → RequestInput
- ResolveIncomingBillingExprRequestInput: early-return when pre-filled
- settleTestQuota / buildTestLogOther: align test settlement & logging
with production TryTieredSettle / InjectTieredBillingInfo paths