Friday, April 22, 2016

[WAT] The weird intersection between Foreign Keys, Replication, Unique Indexes, set-based updates, and missing rows.

We are in process of doing a migration from an ancient creaky server to a shiny new VM.  Rather than just rebuild it and restore everything, we're taking the (painful) opportunity to clean things up and improve several systems.

As part of this, we're replicating data from the old server to the new server, so that we can migrate processes piecemeal, so that rollback is not "OH CRAP TURN IT OFF TURN IT OFF ROLL BACK TO THE OLD SERVER".

But we ran into a weird problem.  On the target server, we had a many-to-many table that sits between, let's say, stores and orders.  We have a stores table, we have an orders table, and this one (call it STORE_ORDERS for simplicity) is just a linking table between the two.  ID, stores_id, orders_id.  Everything scripted identically between the two databases (aside from the NOT FOR REPLICATION flag).

We found, while double-checking everything, that there were records missing from the stores_orders table on our new box, that were just fine on the source table.  What. The. Hell.  After looking for a few, we determined that it had to be related to one of our users doing a bunch of changes.... but just to the stores table.

What could it be?  They're updating the END_ASOF field in STORES, which is part of a unique constraint


The table affected does have a FOREIGN KEY constraint on STORES.  Because we're trying to migrate over, the DDL is the same.


And just to make it more fun, there are some AFTER UPDATE triggers involved.  (But those are unrelated).

Any ideas?  Replication, UNIQUE constraint, Foreign Keys.

Now, go read this article:

I'll wait.

In the end, we figured out what had to be happening, then confirmed via trace.  Care to guess?

So our source table, when you update that unique constraint (AS PART OF A SET; doing a singleton update, or "update stores where ID IN (1,2)" doesn't do it, we tried), replication will push a BOUNDED UPDATE, aka sp_MSdel/sp_MSins.


Which normally wouldn't be a big deal.  

Except for that foreign key.  Which has a CASCADE DELETE.

So, to recap: user does an UPDATE on the source table.  Because they updated a field in the unique constraint, the delete becomes a BOUNDED UPDATE.  Then, replication pushes this update, via a DEL/INS pair, to the target server.  That DEL hits the foreign key's CASCADE DELETE, which then DELETEs the stores_orders row.  Voila!  We updated Table A, and Table B loses records!

Moral of the story?  Beware complications and interactions, and test even when you know everything matches.

No comments: