MDEV-36436 Avoid undo log on insertion of intermediate table during ignore DDL #4620
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem:
An assertion failure occurs in InnoDB during consecutive ALTER TABLE operations using the COPY algorithm. The crash happens during the table rename phase because the
source table (dict_table_t::n_ref_count) is non-zero, despite the thread holding an exclusive Metadata Lock (MDL).
Reason:
When ALTER IGNORE TABLE is executed via the COPY algorithm, it generates undo logs for every row inserted into the intermediate table (e.g., #sql-alter-...). The background Purge Thread, responsible for cleaning up these undo logs, attempts to take an MDL on the table to prevent the table from being dropped while in use.
Race condition:
First ALTER: Creates #sql-alter-, copies data, and renames it to t1.
Purge Activation: The Purge thread picks up the undo logs from step 1. It takes an MDL on the temporary name (#sql-alter-) and increments the table's n_ref_count.
Identity Shift: InnoDB renames the physical table object to t1, but the Purge thread still holds a reference to this object.
Second ALTER: Starts a new copy process. When it attempts to rename the "new" t1 to a backup name, it checks if n_ref_count == 0. Because the Purge thread is still "pinning" the object to clean up logs from the first ALTER, the count is > 0, triggering the assertion failure.
Solution:
ALTER IGNORE TABLE needs row-level undo logging to easily roll back the last inserted row in case of duplicate key errors. By discarding the last undo log record after inserting each row, purge will not process any log records generated by ALTER IGNORE TABLE, preventing unexpected access from the purge subsystem during subsequent DDL operations.
Rename skip_alter_undo (1-bit) to alter_undo_mode (2-bit enum) to support different ALTER operation modes:
trx_undo_report_row_operation(): Add ALTER IGNORE undo rewriting logic. Store old undo record info before writing new records for IGNORE_UNDO mode. Reset undo top_offset and top_undo_no to maintain only latest insert undo