perf - Reduce TSO waker churn and quantify impact with Criterion #529
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing the bot's approval command in a comment.
Welcome @mingley!
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings and manage reviews with the review commands.
📝 Walkthrough

Adds a Criterion benchmark and docs for TSO waker policies, and modifies TSO timestamp handling to track batch sizes, introduce an AtomicBool for lock-wait detection, adjust wake/register semantics based on pending-queue fullness transitions, and add unit tests for the new wake behavior.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 0
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pd/timestamp.rs (1)
78-82: ⚠️ Potential issue | 🟡 Minor — Background task errors are silently discarded due to missing `JoinHandle` handling.

The `Result<()>` return type and explicit `Ok(())` at line 116 align cleanly with the `?` usage inside the function. However, the `JoinHandle` returned by `tokio::spawn(run_tso(...))` at line 62 is never stored or awaited. Since errors can occur at both the `pd_client.tso()` call (line 99) and within `allocate_timestamps()` (line 105), failures in the background task will go unnoticed, and the connection closure will only be discovered when callers receive a channel-closed error instead of the root cause.

Consider storing the `JoinHandle` and handling its potential error, or spawning a task that logs/propagates failures.
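A minimal sketch of one way to do this, assuming the `log` crate is available for error reporting; `run_tso` and the wrapper below are illustrative stand-ins, not the crate's actual code:

```rust
// Hedged sketch: wrap the spawned future so a failure is logged instead of
// vanishing when the JoinHandle is dropped. `run_tso` here is a placeholder
// for the real background loop, not the actual function body.
use tokio::task::JoinHandle;

type BoxError = Box<dyn std::error::Error + Send + Sync>;

async fn run_tso() -> Result<(), BoxError> {
    // Placeholder for the real TSO stream loop, which can fail in
    // pd_client.tso() or allocate_timestamps().
    Ok(())
}

// Must be called from within a Tokio runtime.
fn spawn_tso_task() -> JoinHandle<()> {
    tokio::spawn(async {
        if let Err(err) = run_tso().await {
            // Surfacing the root cause here means callers are not left with
            // only a later channel-closed error.
            log::error!("TSO background task exited with error: {err}");
        }
    })
}
```

The returned handle could then be stored on the client and awaited (or aborted) during shutdown so the task's outcome is not lost.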
🧹 Nitpick comments (3)
benches/tso_waker_policy.rs (2)
20-36: The "old" and "new" response benchmarks have asymmetric work, which is expected but worth noting.

In `response_policy_old`, `wake()` is called unconditionally on every iteration, while in `response_policy_new` it is called only on the full→non-full transition (~once per 1024 iterations). The reported speedup primarily measures the cost of not calling `wake()`, rather than the overhead of the conditional check itself. This is fine for validating the optimization's effect, but the doc and PR description should be clear that the speedup reflects the amortized skip rate under this specific simulation pattern.

Also applies to: 38-57
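To make the asymmetry concrete, here is a simplified, non-Criterion sketch of the two policies; the names and fullness pattern are illustrative, not copied from `benches/tso_waker_policy.rs`:

```rust
// Hedged sketch of the asymmetry: the "old" policy wakes on every response,
// while the "new" policy wakes only when the simulated queue transitions from
// full to non-full, so most iterations skip the wake call entirely.
use futures::task::noop_waker;
use std::task::Waker;

const ITERS: usize = 10_000;
const FULL_EVERY: usize = 1024; // illustrative fullness period, not the bench's value

fn response_policy_old_sketch(waker: &Waker) {
    for _ in 0..ITERS {
        waker.wake_by_ref(); // unconditional wake on every iteration
    }
}

fn response_policy_new_sketch(waker: &Waker) {
    let mut was_full = false;
    for i in 0..ITERS {
        let is_full = i % FULL_EVERY == 0; // simulated fullness pattern
        if was_full && !is_full {
            waker.wake_by_ref(); // wake only on the full -> non-full transition
        }
        was_full = is_full;
    }
}

fn main() {
    let waker = noop_waker();
    response_policy_old_sketch(&waker);
    response_policy_new_sketch(&waker);
}
```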
9-11: Consider documenting what `FULL_EVERY` and `FULL_WINDOW` represent.

These simulation parameters control how often the queue becomes full in the benchmark, directly affecting the measured speedup ratio. A brief comment explaining their role would help future readers understand and tune the benchmark.
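One possible shape for such comments; the values and wording below are illustrative, not the benchmark's actual definitions:

```rust
/// Hedged sketch: the simulated pending queue reports "full" once every
/// `FULL_EVERY` iterations, which drives how often the transition-gated wake
/// path fires (illustrative value).
const FULL_EVERY: usize = 1024;

/// Hedged sketch: once full, the queue stays "full" for `FULL_WINDOW`
/// consecutive iterations before draining back to non-full (illustrative value).
const FULL_WINDOW: usize = 8;
```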
doc/tso_waker_criterion.md (1)
1-58: Consider noting that results should be re-run when the benchmark or production code changes.

Hardcoded benchmark results in committed documentation risk becoming stale as the code evolves. Consider adding a note that these numbers are a point-in-time snapshot and should be re-collected after significant changes to `timestamp.rs` or the benchmark itself.
Force-pushed: c2fdbf5 → a66b5e6
Addressed the CodeRabbit feedback in commit
@coderabbitai review
✅ Actions performed: Review triggered.
Addressed local review finding #1 in
Force-pushed: 839c263 → 8d5bb72
Force-pushed: 558b5b5 → cd0eaa5
Force-pushed: cd0eaa5 → 5e37adb
Signed-off-by: Michael Ingley <michael.ingley@gmail.com>
Force-pushed: 5e37adb → a1312ac
Summary
This PR improves the correctness and efficiency of TSO request-stream wake coordination in `src/pd/timestamp.rs`.

Quality objective:
Rationale
Problem statement
The prior sender/response coordination could generate redundant wake operations and had a lock-contention interleaving that could miss a needed wake signal.
Design changes
- `try_lock` plus a register-and-retry handshake: `sender_waiting_on_lock`, `observe_tso_batch(...)`.
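A simplified sketch of the register-and-retry handshake described above, assuming a tokio `Mutex` around the pending queue and a `futures` `AtomicWaker`; the types and method names below are stand-ins, not the actual `timestamp.rs` structures:

```rust
// Hedged sketch of the sender-side handshake: on lock contention, set
// `sender_waiting_on_lock`, register the waker, then retry once so a release
// that races with the flag store cannot leave the sender unwoken.
use futures::task::AtomicWaker;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::{Context, Poll};
use tokio::sync::{Mutex, MutexGuard};

struct PendingRequests; // stand-in for the real pending-request queue

struct Shared {
    pending: Mutex<PendingRequests>,
    sender_waiting_on_lock: AtomicBool,
    sender_waker: AtomicWaker,
}

impl Shared {
    /// Sender side: non-blocking lock attempt with register-and-retry.
    fn poll_lock<'a>(&'a self, cx: &mut Context<'_>) -> Poll<MutexGuard<'a, PendingRequests>> {
        if let Ok(guard) = self.pending.try_lock() {
            self.sender_waiting_on_lock.store(false, Ordering::Release);
            return Poll::Ready(guard);
        }
        // Contended: mark that the sender is parked on the lock, then register.
        self.sender_waiting_on_lock.store(true, Ordering::Release);
        self.sender_waker.register(cx.waker());
        // Retry once in case the holder released between the store and now.
        if let Ok(guard) = self.pending.try_lock() {
            self.sender_waiting_on_lock.store(false, Ordering::Release);
            return Poll::Ready(guard);
        }
        Poll::Pending
    }

    /// Response side: wake the sender only if it actually parked on the lock.
    fn release_side_notify(&self) {
        if self.sender_waiting_on_lock.swap(false, Ordering::AcqRel) {
            self.sender_waker.wake();
        }
    }
}
```

On the response side, the flag swap means a wake is issued only when the sender actually parked, which is the waker-churn reduction this PR targets.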
Risk analysis

File Scope
- `src/pd/timestamp.rs`
- `Cargo.toml` (branch-level context change already present in PR)

Testing Done
Executed locally:
- `cargo fmt` -> pass
- `cargo clippy --all-targets --all-features -- -D warnings` -> pass
- `cargo test` -> pass

Focused concurrency coverage in `src/pd/timestamp.rs` includes:

- `poll_next_marks_waiting_flag_when_lock_is_contended_and_response_wakes`
- `register_sender_wait_sets_waiting_flag_and_registers_waker_on_retry_failure`
- `register_sender_wait_retries_once_and_clears_waiting_flag_when_lock_reacquires`
- `poll_next_clears_waiting_flag_on_lock_acquire`
- `poll_next_registers_self_waker_when_pending_queue_is_full`
- `poll_next_does_not_register_self_waker_when_queue_not_full`

Compatibility
No public API surface change is introduced by this PR.