AntFleet

Disagreement · 8ff8c1af-anthropic-2

OpenAIProvider streamMessage marks response incomplete=false even when aborted mid-stream after some data

mismatch
repo 56f59a0d · PR #2 · reviewed 4 days ago

Primary finding

OpenAIProvider streamMessage marks response incomplete=false even when aborted mid-stream after some data

low · bug · medium
  • src/providers/openai.ts:60-75
  • src/providers/openai.ts:162-175
`processSSEStream` exits cleanly on abort (via `break`), so no exception is thrown. `incomplete` is then read from `options.signal?.aborted`. That is correct when the request really was aborted. However, if the abort fires after the stream has finished naturally and before the flag is read, a complete response may be falsely marked incomplete. More importantly, the asymmetric design (Anthropic throws on abort; OpenAI silently swallows it) makes downstream `incomplete` semantics inconsistent across providers. The biggest issue: when the API does not return usage, token estimation runs on a possibly truncated `responseText`, so an estimate for a partial response is reported as though it were real usage.
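One way to avoid re-reading the signal after the fact is to record completion at the moment the read loop exits. The following is a minimal sketch, not the code in `src/providers/openai.ts`. It assumes the stream ends with a `data: [DONE]` line (as OpenAI Chat Completions streaming does) and that SSE lines arrive as an async iterable. `readSSE` and `StreamResult` are hypothetical names.

```typescript
// Hypothetical result shape: `finished` records whether the terminal
// [DONE] marker was seen, independent of the signal's later state.
interface StreamResult {
  text: string;
  finished: boolean;
  aborted: boolean;
}

// Sketch: consume SSE lines, stop on abort, and decide completeness
// inside the loop instead of re-reading `signal.aborted` afterwards.
async function readSSE(
  lines: AsyncIterable<string>,
  signal?: AbortSignal,
): Promise<StreamResult> {
  let text = "";
  let finished = false;
  let aborted = false;
  for await (const line of lines) {
    if (signal?.aborted) {
      aborted = true; // abort observed before the stream finished
      break;
    }
    if (!line.startsWith("data: ")) continue; // skip blank/comment lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") {
      finished = true; // natural end of stream
      break;
    }
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, finished, aborted };
}
```

The caller would then set `incomplete = !result.finished`. A later abort cannot flip a finished stream to incomplete, and a connection that drops without `[DONE]` is still reported as incomplete even when no abort occurred.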

Recommendation

Distinguish between 'no usage provided' and 'aborted'; emit usage estimates only for complete responses, and clearly mark partial ones.
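The recommendation could take the form of a usage record that states where its numbers came from. This is a hedged sketch: `UsageReport`, `buildUsage`, and the `"api" | "estimated" | "none"` tags are hypothetical. The `prompt_tokens` / `completion_tokens` field names follow OpenAI's Chat Completions `usage` object.

```typescript
// Hypothetical usage record: the source of the numbers is explicit.
type UsageSource = "api" | "estimated" | "none";

interface UsageReport {
  source: UsageSource;
  inputTokens?: number;
  outputTokens?: number;
  incomplete: boolean;
}

interface ApiUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Prefer API-reported usage; estimate only for complete responses;
// report no token counts for partial responses without API usage.
function buildUsage(
  apiUsage: ApiUsage | undefined,
  finished: boolean,
  estimate: () => { inputTokens: number; outputTokens: number },
): UsageReport {
  if (apiUsage) {
    return {
      source: "api",
      inputTokens: apiUsage.prompt_tokens,
      outputTokens: apiUsage.completion_tokens,
      incomplete: !finished,
    };
  }
  if (!finished) {
    // Partial response: do not pass off an estimate as real usage.
    return { source: "none", incomplete: true };
  }
  return { source: "estimated", ...estimate(), incomplete: false };
}
```

Keeping `source` separate from `incomplete` lets downstream cost tracking tell apart "no usage provided" and "aborted", which is the distinction the recommendation asks for.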

Counterpart finding

Token usage estimation in OpenAI provider ignores system prompt, undercounting input tokens

low · maintainability · high
  • src/providers/openai.ts:154-157
  • src/providers/openai.ts:231-233
Both the streaming and non-streaming estimates exclude the system prompt text, which can be large and materially affects cost tracking. As a result, input tokens are under-reported whenever the API does not return exact usage.

Recommendation

Include `options.systemPrompt.length` in the estimate, for example `baseLength = (options.systemPrompt?.length ?? 0) + messages.reduce(...)`, then divide by an appropriate per-model chars-per-token heuristic.
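A minimal sketch of the suggested estimate follows. `estimateInputTokens`, `ChatMessage`, and the `CHARS_PER_TOKEN` table are hypothetical. The ratio of roughly 4 characters per token is a common rule of thumb for English text, not an exact per-model figure.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

// Hypothetical per-model ratios; ~4 chars/token is a rough heuristic.
const DEFAULT_CHARS_PER_TOKEN = 4;
const CHARS_PER_TOKEN: Record<string, number> = {};

function estimateInputTokens(
  messages: ChatMessage[],
  systemPrompt?: string,
  model = "",
): number {
  // Count the system prompt alongside the message contents.
  const baseLength =
    (systemPrompt?.length ?? 0) +
    messages.reduce((sum, m) => sum + m.content.length, 0);
  const ratio = CHARS_PER_TOKEN[model] ?? DEFAULT_CHARS_PER_TOKEN;
  return Math.ceil(baseLength / ratio);
}
```

With a 400-character system prompt and a single 4-character user message, the estimate rises from 1 token to 101. This is the kind of gap the finding describes.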

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.
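The gate as described here compares only severity and category. The following is a hedged sketch, assuming the first two badges on each finding above are severity and category. `Finding` and `passesUnanimousGate` are hypothetical, and how AntFleet pairs findings from the two models is not described, so pairing is left out.

```typescript
// Hypothetical finding shape; field names are illustrative only.
interface Finding {
  title: string;
  severity: string;
  category: string;
}

// Post only when both reviewers agree on severity and category.
function passesUnanimousGate(a: Finding, b: Finding): boolean {
  return a.severity === b.severity && a.category === b.category;
}
```

Read this way, the two findings above agree on severity (low) but differ on category (bug vs. maintainability), so the pair fails the gate.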


From the same review

These findings passed the unanimous gate on the same PR review. The disagreement above was filtered out; the findings below were posted.