Make sure we never context switch while holding VM lock.#735
Open
luke-gruber wants to merge 1 commit intomasterfrom
Open
Make sure we never context switch while holding VM lock.#735luke-gruber wants to merge 1 commit intomasterfrom
luke-gruber wants to merge 1 commit intomasterfrom
Conversation
7ca26e8 to
1fa307d
Compare
Author
|
There's 1 bug in bigdecimal right now that's crashing due to |
36d6901 to
f8887a8
Compare
Author
|
The BigDecimal bug has been fixed. |
|
I wonder if this could be useful: https://github.com/ruby/ruby/blob/0bb6a8bea49fed8ccef0a70aca5f2ea05af94292/vm_core.h#L73-L103 |
938a94c to
29575c4
Compare
We were seeing errors in our application that looked like:
```
[BUG] unexpected situation - recordd:1 current:0
/error.c:1097 rb_bug_without_die_internal
/vm_sync.c:275 disallow_reentry
/eval_intern.h:136 rb_ec_vm_lock_rec_check
/eval_intern.h:147 rb_ec_tag_state
/vm.c:2619 rb_vm_exec
/vm.c:1702 rb_yield
/eval.c:1173 rb_ensure
```
We concluded that there was context switching going on while a thread
held the VM lock. During the investigation into the issue, we added
assertions that we never yield to another thread with the VM lock held.
We enabled these VM lock assertions even in single ractor mode. These
assertions were failing in a few places, but most notably in finalizers.
We were running finalizers with the VM lock held, and they were context
switching and causing this issue.
These rules must be held going forward to ensure we don't context switch unexpectedly:
If you have the VM lock held,
* Don't enter the interpreter loop.
* Don't call ruby methods whether or not they are defined in ruby
* Don't call `rb_nogvl`, `rb_mutex_lock`, etc.
* Don't check interrupts
Rework global variables, don't lock when calling getter or setter.
Add a test that fails without these lock_rec changes.
Add ASSERT_vm_unlocking() to vm_call0_body
This uncovered many more test failures.
Revert changes introduced in 2f6c694
This didn't appear to be a correct fix. We should allow raising
NoMemoryError even if we're under the VM lock. It will automatically
unlock us.
29575c4 to
8cf74f1
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
We were seeing errors in our application that looked like:
We concluded that there was context switching going on while a thread held the VM lock. During the investigation into the issue, we added assertions that we never yield to another thread with the VM lock held. We enabled these VM lock assertions even in single ractor mode. These assertions were failing in a few places, but most notably in finalizers. We were running finalizers with the VM lock held, and they were context switching and causing this issue.
These rules must be held going forward to ensure we don't context switch unexpectedly:
If you have the VM lock held,
* Don't enter the interpreter loop.
* Don't yield to ruby code.
* Don't call rb_nogvl (it will context switch you and will not unlock the VM lock).
* Don't check your own interrupts, it can switch you.
If you don't have the GVL:
* Don't call rb_ensure/rb_protect, etc (these are old rules but good to have assertions for).