-
Notifications
You must be signed in to change notification settings - Fork 16
chore(tdigest): better handle Centroid adds #88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: tison <wander4096@gmail.com>
| self.weight = self | ||
| .weight | ||
| .checked_add(other.weight.get()) | ||
| .expect("weight overflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of silently saturating_add, panic if overflow.
This should never happen with real inputs, and let it crash should be better than silently holding inaccurate values.
| let delta = other_mean - self_mean; | ||
| self.mean = if delta.is_finite() { | ||
| delta.mul_add(ratio_other, self_mean) | ||
| } else { | ||
| let ratio_self = self_weight / total_weight; | ||
| self_mean.mul_add(ratio_self, other_mean * ratio_other) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If other_mean - self_mean doesn't overflow/underflow, save one mul + one div to reduce precision loss.
This can overflow/underflow, so we fallback to conservative weighted average computed in that case. In both cases, use mul_add to best effort reduce precision loss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR updates the internal Centroid::add logic in the T-Digest implementation to more robustly combine centroids, particularly around numerical edge cases during mean updates.
Changes:
- Switch centroid weight accumulation from saturating addition to checked addition with an explicit overflow expectation.
- Update centroid mean computation to use a numerically stable formulation (
self_mean + (other_mean - self_mean) * ratio_other) with a fallback when the subtraction overflows to non-finite. - Tighten centroid mean invariant checking from “not NaN” to “finite”.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cc @andylokandy