Thursday, April 16, 2015

[Code] Finding the lowest ID (identity) for a date when there's no index on the date, like Zeno

Needed this today.  I remember seeing someone at a SQLSaturday (which for me this year, doesn't narrow it down much) with this idea, but couldn't figure out who it was, so I wound up implementing my own.

Say you have, like most of us, a table with an "id INT IDENTITY PRIMARY KEY," so your PK is the ID, but the table also has a date column.  Nobody cares about the date.  Until they do, months later.  So now you have a busy table that you can't easily add an index on, and an ADHOC OMGWTFBBQ project comes in, where you need to query a date range, and no idea how to get there quickly. .

*raises hand* Yup, that'd be me.  So I wrote this. 

What it does: cycle down through a table, using the ID column.  Start by decrementing the max ID from the table by 10 million.  Does that go below the date you need?  Yes?  Okay, try 1 million. Yes? 100k.  No? How about 200k? Yes. 110k? No. 120k? Yes. 111k? No. 112k? Yes. 111100? (and so on). 

It'll pop down even a terabyte table pretty quickly, since it only does a handful of queries, and they're all against the PK.  Granted, I could make it faster by doing a (min+max)/2, then (newmin+max)/2 etc, but this works.  I've nicknamed it "Zeno's Arrow", although technically my code doesn't go halfsies - but it is fast and direct.

Also, (importantly!) it may not cope with missing rows (which could happen due to stuff like, for instance, a failed transaction).  I don't have a good fix for that, yet.  Maybe in V2

Hope this helps.

1 comment:

Andrew said...

Thanks for that, nice approximation technique. Please do an update post when you have a V2 version (gaps are always an issue)