AntFleet

Disagreement · 8ff8c1af-openai-4

OpenAI streaming: abort does not cancel the reader/stream, risking wasted tokens and network work

mismatch
repo 56f59a0d · PR #2 · reviewed 4 days ago

Primary finding

OpenAI streaming: abort does not cancel the reader/stream, risking wasted tokens and network work

medium · performance · high
  • src/providers/openai.ts:63-69
  • src/providers/openai.ts:101-103
On abort, the read loop breaks but the ReadableStream is never canceled. If the AbortSignal did not reach the underlying fetch in time, the connection stays open and the server may keep generating until the response completes, which continues to accrue token charges and network usage. Explicitly canceling the reader (and/or aborting the fetch) stops consumption promptly.

Recommendation

When options.signal?.aborted is detected, call reader.cancel() before breaking out of the loop. Also verify that the same AbortSignal still aborts the fetch after the response has started streaming; if it does not, wrap fetch in your own AbortController and abort it both on the user's signal and on early return. Example: if (options.signal?.aborted) { await reader.cancel().catch(() => {}); break; }. In the finally block, also cancel the reader if the signal was aborted.
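
A minimal sketch of this recommendation follows. It is an illustration, not the code in src/providers/openai.ts: readUntilAborted and its parameters are hypothetical names, and it assumes a runtime that provides the standard ReadableStream, TextDecoder, and AbortSignal globals (modern browsers, Node 18+).

```typescript
// Hypothetical helper (not from the reviewed repo): reads a byte stream until
// it ends or the signal aborts, and cancels the reader on abort so the
// underlying source is told to stop producing data.
async function readUntilAborted(
  body: ReadableStream<Uint8Array>,
  signal: AbortSignal | undefined,
  onChunk: (text: string) => void,
): Promise<{ aborted: boolean }> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let aborted = false;

  const onAbort = (): void => {
    aborted = true;
    // Canceling closes the stream and resolves any pending read() with done=true.
    // Errors from an already-errored stream are ignored.
    void reader.cancel().catch(() => {});
  };

  if (signal?.aborted) {
    onAbort();
  } else {
    signal?.addEventListener("abort", onAbort, { once: true });
  }

  try {
    while (!aborted) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(decoder.decode(value, { stream: true }));
    }
  } finally {
    // Remove the listener so a late abort cannot flag a finished stream.
    signal?.removeEventListener("abort", onAbort);
    reader.releaseLock();
  }
  return { aborted };
}
```

Registering an abort listener, rather than only checking the flag between reads, also covers an abort that arrives while a read is pending. Passing the same signal to fetch, as in fetch(url, { signal }), remains the way to tear down the request itself.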

Counterpart finding

OpenAIProvider streamMessage marks response incomplete=false even when aborted mid-stream after some data

low · bug · medium
  • src/providers/openai.ts:60-75
  • src/providers/openai.ts:162-175
processSSEStream exits cleanly on abort (break), so no exception is thrown. incomplete is then read from `options.signal?.aborted`. That is correct when the abort actually interrupted the stream, but if the abort fires after the stream has finished naturally and before the flag is read, a complete response is falsely marked incomplete. More importantly, the design is asymmetric: Anthropic throws on abort while OpenAI silently swallows it, which makes downstream incomplete semantics inconsistent. The bigger issue is token estimation: it runs on a possibly truncated responseText, so when the API does not deliver usage, estimated tokens for a partial response are conflated with real usage.

Recommendation

Distinguish between 'no usage provided' and 'aborted'; emit usage estimates only when complete, and clearly mark partials.
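
One way to express this distinction is a tagged usage type, sketched below. Every name here (Usage, StreamOutcome, finalizeStream, estimateTokens) is hypothetical and does not come from the reviewed repository. The sketch assumes the caller already knows whether the abort interrupted the stream and supplies its own token estimator.

```typescript
// Hypothetical types: usage is tagged so consumers can tell real numbers
// from estimates, and from "nothing trustworthy to report".
type Usage =
  | { kind: "reported"; promptTokens: number; completionTokens: number }
  | { kind: "estimated"; completionTokens: number }
  | { kind: "unavailable" };

interface StreamOutcome {
  text: string;
  incomplete: boolean;
  usage: Usage;
}

function finalizeStream(
  text: string,
  abortedMidStream: boolean,
  reported: { promptTokens: number; completionTokens: number } | undefined,
  estimateTokens: (text: string) => number,
): StreamOutcome {
  if (reported) {
    // Usage delivered by the API always wins over any estimate.
    return { text, incomplete: abortedMidStream, usage: { kind: "reported", ...reported } };
  }
  if (abortedMidStream) {
    // Partial response: mark it clearly and do not estimate from truncated text.
    return { text, incomplete: true, usage: { kind: "unavailable" } };
  }
  // Complete response without API usage: an estimate is acceptable, but labeled.
  return {
    text,
    incomplete: false,
    usage: { kind: "estimated", completionTokens: estimateTokens(text) },
  };
}
```

With this shape, a consumer that bills or logs usage can switch on usage.kind instead of inferring the source of the numbers from the incomplete flag.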

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.


From the same review

These findings passed the unanimous gate on the same PR review. The disagreement above was filtered out; the findings below were posted.