Needed this today. I remember seeing someone at a SQLSaturday (which for me this year, doesn't narrow it down much) with this idea, but couldn't figure out who it was, so I wound up implementing my own.
Say you have, like most of us, a table with an "id INT IDENTITY PRIMARY KEY," so your PK is the ID, but the table also has a date column. Nobody cares about the date. Until they do, months later. So now you have a busy table that you can't easily add an index on, and an ADHOC OMGWTFBBQ project comes in, where you need to query a date range, and no idea how to get there quickly. .
*raises hand* Yup, that'd be me. So I wrote this.
What it does: cycle down through a table, using the ID column. Start by decrementing the max ID from the table by 10 million. Does that go below the date you need? Yes? Okay, try 1 million. Yes? 100k. No? How about 200k? Yes. 110k? No. 120k? Yes. 111k? No. 112k? Yes. 111100? (and so on).
It'll pop down even a terabyte table pretty quickly, since it only does a handful of queries, and they're all against the PK. Granted, I could make it faster by doing a (min+max)/2, then (newmin+max)/2 etc, but this works. I've nicknamed it "Zeno's Arrow", although technically my code doesn't go halfsies - but it is fast and direct.
Also, (importantly!) it may not cope with missing rows (which could happen due to stuff like, for instance, a failed transaction). I don't have a good fix for that, yet. Maybe in V2
Hope this helps.
Thursday, April 16, 2015
Subscribe to:
Post Comments (Atom)
1 comment:
Thanks for that, nice approximation technique. Please do an update post when you have a V2 version (gaps are always an issue)
Post a Comment