@Thirunarayanan

Problem:

An assertion failure occurs in InnoDB during consecutive ALTER TABLE operations using the COPY algorithm. The crash happens during the table rename phase because the source table's reference count (dict_table_t::n_ref_count) is non-zero, even though the thread holds an exclusive Metadata Lock (MDL).

Reason:

When ALTER IGNORE TABLE is executed via the COPY algorithm, it generates undo logs for every row inserted into the intermediate table (e.g., #sql-alter-...). The background Purge Thread, responsible for cleaning up these undo logs, attempts to take an MDL on the table to prevent the table from being dropped while in use.

Race condition:

First ALTER: Creates #sql-alter-, copies data, and renames it to t1.

Purge Activation: The Purge thread picks up the undo logs generated by the first ALTER. It takes an MDL on the temporary name (#sql-alter-) and increments the table's n_ref_count.

Identity Shift: InnoDB renames the physical table object to t1, but the Purge thread still holds a reference to this object.

Second ALTER: Starts a new copy process. When it attempts to rename the "new" t1 to a backup name, it checks if n_ref_count == 0. Because the Purge thread is still "pinning" the object to clean up logs from the first ALTER, the count is > 0, triggering the assertion failure.
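
The sequence above can be modelled in a few lines. This is only an illustrative sketch, not InnoDB code: ToyTable, purge_pin(), purge_unpin(), rename_table() and can_rename_to_backup() are hypothetical names, and the only detail taken from the description is that the reference count travels with the table object across a rename.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for dict_table_t, reduced to the two fields
// that matter for this race.
struct ToyTable {
  std::string name;
  unsigned n_ref_count = 0;
};

// Hypothetical: purge opens the table to process its undo log records.
inline void purge_pin(ToyTable &t) { ++t.n_ref_count; }

// Hypothetical: purge is done with the table.
inline void purge_unpin(ToyTable &t) {
  assert(t.n_ref_count > 0);
  --t.n_ref_count;
}

// Renaming changes only the name; the object and its pins stay the same.
inline void rename_table(ToyTable &t, const std::string &new_name) {
  t.name = new_name;
}

// The condition the second ALTER expects to hold before its rename.
inline bool can_rename_to_backup(const ToyTable &t) {
  return t.n_ref_count == 0;
}
```

If purge pins the object while it is still called #sql-alter- and the object is then renamed to t1, can_rename_to_backup() stays false until purge releases its reference, which corresponds to the window in which the second ALTER hits the assertion.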

Solution:

ALTER IGNORE TABLE needs row-level undo logging to easily roll back the last inserted row in case of duplicate key errors. By discarding the last undo log record after inserting each row, purge will not process any log records generated by ALTER IGNORE TABLE, preventing unexpected access from the purge subsystem during subsequent DDL operations.

Rename skip_alter_undo (1-bit) to alter_undo_mode (2-bit enum) to support different ALTER operation modes:

  • NO_UNDO (0): Normal mode with standard undo logging
  • SKIP_UNDO (1): ALTER mode that skips undo logging
  • IGNORE_UNDO (2): ALTER IGNORE mode that rewrites undo blocks
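
As a rough sketch of what a 2-bit mode field could look like, assuming a plain unsigned bit-field with typed accessors: the enumerator names and values come from the list above, while ToyTrxFlags, set_mode(), mode() and writes_undo() are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Values as listed in the description.
enum alter_undo_mode_t : uint8_t {
  NO_UNDO = 0,     // normal mode with standard undo logging
  SKIP_UNDO = 1,   // ALTER mode that skips undo logging
  IGNORE_UNDO = 2  // ALTER IGNORE mode that rewrites undo blocks
};

// Hypothetical container for the flag; two bits can hold values 0..3.
struct ToyTrxFlags {
  unsigned alter_undo_mode : 2;

  ToyTrxFlags() : alter_undo_mode(NO_UNDO) {}

  void set_mode(alter_undo_mode_t m) { alter_undo_mode = m; }

  alter_undo_mode_t mode() const {
    return static_cast<alter_undo_mode_t>(alter_undo_mode);
  }

  // Undo records are written in every mode except SKIP_UNDO.
  bool writes_undo() const { return mode() != SKIP_UNDO; }
};
```

A 1-bit flag can only distinguish "skip" from "do not skip", so the third state needed for ALTER IGNORE is what forces the extra bit.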

trx_undo_report_row_operation(): Add ALTER IGNORE undo rewriting logic. For IGNORE_UNDO mode, store the old undo record info before writing new records, then reset the undo top_offset and top_undo_no so that only the latest insert undo record is maintained.
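
The idea of keeping only the latest insert undo record can be illustrated with a toy log. This is a heavily simplified sketch under stated assumptions: the real function operates on undo pages, whereas ToyUndoLog and report_insert() here are hypothetical and model the log as a vector with one entry per row. Only the names top_offset and top_undo_no come from the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory undo log: one entry per insert undo record.
struct ToyUndoLog {
  std::vector<uint64_t> records;  // row ids, oldest first
  std::size_t top_offset = 0;     // position where the next record goes
  uint64_t top_undo_no = 0;       // number of records currently kept
};

// Hypothetical: record the insert of row_id. In ignore mode the
// previous record is overwritten, so at most one record exists.
inline void report_insert(ToyUndoLog &log, uint64_t row_id,
                          bool ignore_undo) {
  // Remember where the previous record ended before writing a new one.
  const std::size_t old_top_offset = log.top_offset;
  const uint64_t old_top_undo_no = log.top_undo_no;

  if (ignore_undo && old_top_offset > 0) {
    // Rewind so the new record replaces the previous one.
    log.top_offset = old_top_offset - 1;
    log.top_undo_no = old_top_undo_no - 1;
    log.records.resize(log.top_offset);
  }

  log.records.push_back(row_id);
  ++log.top_offset;
  ++log.top_undo_no;
}
```

With ignore_undo set, the log never grows past one record, so the last inserted row can still be rolled back on a duplicate key error while nothing accumulates for purge to process later.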

…gnore DDL

Problem:
=========
An assertion failure occurs in InnoDB during consecutive
ALTER TABLE operations using the COPY algorithm. The crash
happens during the table rename phase because the
source table's reference count (dict_table_t::n_ref_count) is
non-zero, even though the thread holds an exclusive Metadata
Lock (MDL).

Reason:
========
When ALTER IGNORE TABLE is executed via the COPY algorithm,
it generates undo logs for every row inserted into the
intermediate table (e.g., #sql-alter-...). The background Purge
Thread, responsible for cleaning up these undo logs, attempts
to take an MDL on the table to prevent the table from
being dropped while in use.

Race condition:
==================
First ALTER: Creates #sql-alter-, copies data, and renames it to t1.

Purge Activation: The Purge thread picks up the undo logs generated
by the first ALTER.
It takes an MDL on the temporary name (#sql-alter-) and increments
the table's n_ref_count.

Identity Shift: InnoDB renames the physical table object to t1, but
the Purge thread still holds a reference to this object.

Second ALTER: Starts a new copy process. When it attempts to rename
the "new" t1 to a backup name, it checks if n_ref_count == 0.
Because the Purge thread is still "pinning" the object to
clean up logs from the first ALTER, the count is > 0,
triggering the assertion failure.

Solution:
========
ALTER IGNORE TABLE needs row-level undo logging to easily
roll back the last inserted row in case of duplicate key errors.
By discarding the last undo log record after inserting each row,
purge will not process any log records generated by
ALTER IGNORE TABLE, preventing unexpected access from the purge
subsystem during subsequent DDL operations.

Change skip_alter_undo (1-bit) to alter_undo_mode (2-bit enum)
to support different ALTER operation modes:

- NO_UNDO (0): Normal mode with standard undo logging
- SKIP_UNDO (1): ALTER mode that skips undo logging
- IGNORE_UNDO (2): ALTER IGNORE mode that rewrites undo blocks

trx_undo_report_row_operation(): Add ALTER IGNORE undo
rewriting logic. For IGNORE_UNDO mode, store the old undo
record info before writing new records, then reset the undo
top_offset so that only the latest insert undo record is
maintained.
@Thirunarayanan Thirunarayanan marked this pull request as ready for review February 9, 2026 08:07
@Thirunarayanan Thirunarayanan requested a review from dr-m February 9, 2026 08:07