How to Handle MySQL ( AWS RDS ) Replication Errors Caused by Data Deletion or Modification on Write-Enabled Replicas

Replication is a critical feature in MySQL database, where changes made in the master database are propagated to the replicas. However, there could be instances where, by mistake, we delete some rows from the replica (that is write-enabled), and when the same rows are now deleted from the master, the replication will try to execute these statements in the replica and fail as the entries with the specific keys are not present. This blog post discusses the solution to this issue.

While our focus is on deleted data in the replica, the solution below can be applied to any scenario where there is a data mismatch between the replica and master due to independent modifications. This solution helps resume replication by skipping the current error-causing step, in cases where the replication can continue without inconsistency.

Identifying the Replication Error

To check the status of the slave, use the following command:

show slave status

If there is an error like the following, it means that the replication is not able to find the entries:

Could not execute Delete_rows event on table exampledb.EXAMPLE_TABLE; Can't find record in 'POTENTIAL_EARNING_DETAIL', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin-changelog.086030, end_log_pos 308128

In the above error, we can see that entries are missing (or deleted in replica) from the exampledb.EXAMPLE_TABLE table.

Solution

IMPORTANT: This option should only be used when we know what commands were executed in the replica.

There is an option to skip the replication error using the following statement:

stop slave;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
start slave;

The above set of commands does the following:

  1. Stop the replication.
  2. Skip that last error causing replication command ( statement ) and continue with the next.
  3. Start the replication ( slave ).

Once executed, this should restart the replication provided you only had one command executed in the replica that is now conflicting with the master.

However, this will give an error in the AWS RDS as we cannot set the global option. Therefore, if we are using RDS, we need to use the following command:

CALL mysql.rds_skip_repl_error;

The above command is a utility stored procedure available in the RDS that does the same 3 steps as above.

This will show a message like “Skipping the error” and “Slave is running successfully.” If there are multiple batches of delete, we will need to execute this command multiple times.

Note: This method can only work if the delete was run as a single transaction and not as an individual method. If executed as individual transactions, then we are required to run the skip for each statement.

Permanent Fix

There is always a chance of data modification on the replica if the replica is write-enabled. The only option to avoid this issue is to revoke the insert, delete, and update commands for all the users that are using the replica. We can use the following command:

revoke insert,delete,update on `database_name`.* from 'user'@'%';

Execute the command for each user and each database.

Footnotes

In conclusion, replication errors in MySQL databases can be troublesome and could lead to data loss. However, identifying the issue and following the right steps can prevent such errors and ensure a smooth replication process.

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *