AntFleet

Disagreement · f3aba81f-openai-0

Target-hit dedup is global across all targets; later target hits within 4h (even same run) will be suppressed

mismatch
repo 6f7fc663 · PR #23 · reviewed 1 week ago

Primary finding

Target-hit dedup is global across all targets; later target hits within 4h (even same run) will be suppressed

high · api-contract · high
  • skills/price-threshold-alert/SKILL.md:138
  • skills/price-threshold-alert/SKILL.md:16
  • skills/price-threshold-alert/SKILL.md:256-257
The spec stores a single last_alerts.target_hit timestamp and applies a 4h dedup to all target-crossing notifications. This contradicts the earlier promise of “One alert per target per direction” and will drop notifications for additional targets crossed within 4h (and within the same run after the first target updates the dedup clock).
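
The suppression described above can be reproduced with a minimal sketch. It assumes the spec's logic amounts to one shared `target_hit` timestamp checked and updated per notification; the function name and the dict shape of `last_alerts` are hypothetical, not taken from SKILL.md.

```python
from typing import Dict, List

DEDUP_WINDOW_S = 4 * 3600  # 4h dedup window described in the finding


def notify_with_global_clock(
    last_alerts: Dict[str, float], crossed_targets: List[float], now: float
) -> List[float]:
    """Return the targets that would notify under a single shared dedup clock."""
    sent = []
    for target in crossed_targets:
        last = last_alerts.get("target_hit")
        if last is not None and now - last < DEDUP_WINDOW_S:
            continue  # suppressed, even though this target never alerted before
        sent.append(target)
        last_alerts["target_hit"] = now  # first hit arms the clock for every target
    return sent
```

With two targets crossed in the same run, only the first one notifies, because the first hit updates the clock that the second hit is then checked against.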

Recommendation

Make target-crossing dedup per target, not global. Options: (a) track per-target dedup timestamps (e.g., last_alerts.target_hit_by_target[normalized_target]); (b) aggregate multiple target hits into a single message per run; (c) at minimum, advance the global target_hit dedup clock only once per run, after every target hit in that run has been evaluated, so that multiple hits in the same run can each notify.
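
Option (a) might look like the following sketch. The `AlertState` class, the `normalize_target` key format, and the function names are assumptions for illustration; only the `target_hit_by_target` field name and the 4h window come from the finding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

DEDUP_WINDOW_S = 4 * 3600  # 4h dedup window described in the finding


@dataclass
class AlertState:
    # Hypothetical shape: one timestamp per normalized (direction, target) key.
    target_hit_by_target: Dict[str, float] = field(default_factory=dict)


def normalize_target(target: float, direction: str) -> str:
    """Build a stable key so that 100 and 100.0 share one dedup clock."""
    return f"{direction}:{target:.8f}"


def targets_to_notify(
    state: AlertState, crossed: List[Tuple[float, str]], now: float
) -> List[Tuple[float, str]]:
    """Return the crossed (target, direction) pairs that are outside their own window."""
    due = []
    for target, direction in crossed:
        key = normalize_target(target, direction)
        last = state.target_hit_by_target.get(key)
        if last is None or now - last >= DEDUP_WINDOW_S:
            due.append((target, direction))
            state.target_hit_by_target[key] = now  # only this target's clock moves
    return due
```

Because each key has its own clock, a second target crossed in the same run still notifies, while a repeat crossing of the same target within 4h stays suppressed.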

Counterpart finding

Sharp-move dedup combined with H1 percentage can suppress a second leg of a real move

low · maintainability · medium
  • skills/price-threshold-alert/SKILL.md:137-143
The H1 rolling window is 1 hour; the dedup window is 4 hours. A token that pumps +25% in hour 1 (alert fires), then dumps -25% in hour 3 (also a 'sharp move' per spec) would be silenced because the dedup clock is wider than the measurement window. This is a deliberate design choice but is not called out, and operators may miss exactly the reversal events the skill markets itself as catching ('liquidation cascade').

Recommendation

Either (a) separate dedup clocks per direction (`sharp_move_up`, `sharp_move_down`) or (b) document the 4h-vs-1h trade-off explicitly so operators understand reversals can be silenced.
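
Option (a) could be sketched as below. The `sharp_move_up` and `sharp_move_down` keys come from the recommendation; the function names, the dict shape of `last_alerts`, and the 20% default threshold are assumptions, since the spec's actual threshold is not quoted here.

```python
from typing import Dict, Optional

DEDUP_WINDOW_S = 4 * 3600  # 4h dedup window described in the finding


def sharp_move_key(h1_change_pct: float, threshold_pct: float) -> Optional[str]:
    """Map an H1 percentage change to a per-direction dedup key, or None."""
    if h1_change_pct >= threshold_pct:
        return "sharp_move_up"
    if h1_change_pct <= -threshold_pct:
        return "sharp_move_down"
    return None


def should_alert(
    last_alerts: Dict[str, float],
    h1_change_pct: float,
    now: float,
    threshold_pct: float = 20.0,  # hypothetical threshold, not from the spec
) -> bool:
    """Alert unless the same direction already fired within the dedup window."""
    key = sharp_move_key(h1_change_pct, threshold_pct)
    if key is None:
        return False
    last = last_alerts.get(key)
    if last is not None and now - last < DEDUP_WINDOW_S:
        return False
    last_alerts[key] = now
    return True
```

In the scenario from the finding, the +25% move in hour 1 and the -25% move in hour 3 both alert, while a second upward spike inside the same 4h window is still deduplicated.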

Why this didn't post

This finding didn't meet AntFleet's unanimous agreement threshold. Both frontier models review every PR independently; only findings they both flag with the same severity and category are posted to the PR. This one fell through.
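
As a rough illustration of that rule, the check below compares one pair of findings on severity and category only. The `Finding` type and `is_unanimous` name are hypothetical, and the sketch assumes the first label shown on each finding is its severity and the second its category; how AntFleet pairs findings between models is not described here.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    severity: str
    category: str


def is_unanimous(a: Finding, b: Finding) -> bool:
    """True only when both reviewers agree on severity and category."""
    return a.severity == b.severity and a.category == b.category


primary = Finding(severity="high", category="api-contract")
counterpart = Finding(severity="low", category="maintainability")
posted = is_unanimous(primary, counterpart)  # False for the pair on this page
```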
