Thursday, July 28, 2016

[WAT] Fun with table variables

Run this code.  Did you expect it to act a certain way?  Why?  Because table variables.  Thanks to (crap, who was it? *sigh*) who pointed this out in his tempDB talk a year or so ago.

DECLARE @blah TABLE (fielda VARCHAR(20))
INSERT INTO @blah
        (fielda)
SELECT 'a'
UNION ALL
SELECT 'b'
SELECT * FROM @blah

BEGIN TRANSACTION
UPDATE @blah
SET fielda = 'c'
SELECT * FROM @blah
ROLLBACK TRANSACTION

SELECT * FROM @blah

Friday, July 8, 2016

[Powershell] Killing that one instance of SSMS

WMI in action!  I was running SSMS 2016 and it locked up on me.  So while I waited for it to become responsive I started up SSMS 2014. Still locked, but which do I kill in taskman? (The easy answer, is, of course: change the open query and then look in Applications, not Processes). I didn't think of that, so used WMI.  You could also get fancy and figure out which is the oldest instance of SSMS and do it that way.

Pretty simple: Tell it the path (which is 130 for SSMS 2016), and kill that.

$processes = Get-WmiObject -query "select * from Win32_Process where executablepath like '%130%ssms%'"

foreach ($process in $processes) {
  $returnval = $process.terminate()
  $processid = $process.handle

if($returnval.returnvalue -eq 0) {
  write-host "The process $ProcessName `($processid`) terminated successfully"
}
else {
  write-host "The process $ProcessName `($processid`) termination has some problems"
}
}

[EN] Troubleshooting and fun with differential backups

Came in this morning to one of my EN alerts going off.  No new messages added in the past hour.  (I have enough servers that this is a really good failsafe)

Queue appears up.
SELECT * FROM sys.service_queues

No records in sys.transmission_queue.  Not a total failsafe, but a good start.

Sys.databases is... interesting.  "ACTIVE_BACKUP_OR_RESTORE" for my EN database.  We'll come back to that.

Any backups running?  NO.

Check the actual queue itself
sp_spaceused '[dbo].[ENAudit_SBQueue]'

Well, it's growing, which is actually a good sign (means, IMHO, that the MSDBs on other servers shouldn't be growing, since the messages are making it over here.

How about the activated stored procedure, then?  What's it doing?
SELECT * FROM sys.dm_broker_activated_tasks 
No records, which means that the activated SP isn't firing.  What the hey?  

Means that for some reason it's not activating properly.

Select * from sys.dm_broker_queue_monitors; 
The state for that database's monitor is "NOTIFIED". Huh, that seems odd.

As per http://rusanu.com/2008/08/03/understanding-queue-monitors/ , this means that it's waiting for the activated stored procedure to do a RECEIVE on the database, but won't fire it until that occurs.  Fortunately for us, possibly (I'm writing this as I diagnose the problem), ANY receive on the queue will work.  We may have to do that.

Okay, let's check what we should've started with - event logs:
The queue 517576882 in database 9 has activation enabled and contains unlocked messages but no RECEIVE has been executed for 26964 seconds.

Well, well, well.  Crap.

And then we see these from earlier:

The transaction log for database 'EventNotificationRec' is full due to 'ACTIVE_BACKUP_OR_RESTORE'.

and

The activated proc '[dbo].[Parse_EN_Messages]' running on queue 'EventNotificationRec.dbo.ENAudit_SBQueue' output the following:  'The transaction log for database 'EventNotificationRec' is full due to 'ACTIVE_BACKUP_OR_RESTORE'. '

Looks like we're onto something.

Aha!

BACKUP failed to complete the command BACKUP DATABASE EventNotificationRec. Check the backup application log for detailed messages.

Okay, so we have several things.  Let's look at job status.  Ahhhh.  The DIFF job is "cancelled". 

I think here's what happened - someone saw a failed backup, and ran the wrong job.  The DIFF backup job is disabled - it's the standard Ola Hallengren job.  Which means it tried to do a DIFF of every DB.  Which... would take a while, since we don't do them, so it could take longer than doing full backups.  They panicked and cancelled it.  And for whatever reason, that left the stored procedure/queue in a weird state where it had been notified but wasn't running.

UPDATE: While I thought the DIFF was the culprit, I was wrong.  The FULL backup on that database started at 00:40:00, so it looks like the TLOG (which is capped) filled up somehow while the backup was running.  No idea why - the cap was 3gb which has been totally fine until now.  Yes, the TLOG can fill during a backup, but it typically doesn't have enough going on for that to matter.

Fix?  Easy, and in Rusanu's post:

ALTER QUEUE ENAudit_SBQueue WITH ACTIVATION (STATUS = OFF);
ALTER QUEUE ENAudit_SBQueue WITH ACTIVATION (STATUS = ON);


Now?  Look in sys.dm_broker_queue_monitors.  All we care about is database_id 9, the other ones are the InternalMailQueue/ExternalMailQueue/syspolicy_event_queue in msdb (everybody has those).


And in sys.broker_activated_tasks.  Wow, it's using all 5 SPIDs.  I must've changed it after the last time the other team broke EN.  And I see new messages showing in my ENAudit_Events table.  Yay!






Wednesday, June 29, 2016

[Presenting] SQL Saturday San Antonio, here I come!

Woot!  August 13th, see you there!  Looking through the sessions, there's going to be a ton of awesome.  R services, Hekaton, Window Functions, BIML, IaaS, Query Store, Event Notifications(plug, plug) and more! 

And looking at just part of the lineup makes me happy - Amy Herold, Tim Costello, Tim Mitchell, Steve Jones, John Sterrett... holy crap, Boles is going to be there?! And Kalen Delaney?!!?!! And I'm not presenting against them?!  Squee!!






@SQLSatSanAntone and #SQLSatSA



Powershell - kill processes remotely

Simple problem: we had to replace the config files for an app that hits the database, pointing it at a CNAME. 

However, the file could be in use because the app was running.  And the app only loads the file on startup.  So we came up with this, which worked quite well.  Note that we use WMI (see my other posts on it!) to grab the application info then a WMI method to terminate the app.


Thursday, May 19, 2016

[Presenting] Know what your machines are doing (besides SQL Server) - WMI, WQL, & Powershell.

Howdy, NTSSUG!  Here's the slide deck for the presentation tonight (May 19th 2016).  Holler if questions!
(Note: live slide deck below; may change without notice as I update it)


Tuesday, May 10, 2016

[Presentation] A Cancellation and a Presentation

Bad news first:
Due to personal issues, I have had to cancel my presentation at SQLSaturday #514, Houston 2016.  I really hate cancelling.  But family comes first, and the team in Houston has been super understanding.

For some good news:
I am presenting the same material (WMI, WQL, PoSH, & SQL Server) at the North Texas SQL Server Users Group on May 19th.  The issues that prevent me from making Houston are manageable for a smaller commitment.  I said I'd speak there a while back (I think they finally cornered me at SQLSat Austin, IIRC), and realized that while I can't do Houston, I _can_ make this one.

After this, the next SQLSat for me is going to be several months.  I will submit my current two presentations for SQLSat San Antonio, which would be cool.  Unsure if I'll submit for Denver again - while their event and town is a blast (Voodoo Doughnuts! Tattered Cover!), that drive is a killer and I'm too much a tightwad to fly.  Finally, I do want to go to Kansas City again (their SQLSat is honestly the high point of all the ones I've been to, which means it's exceptionally high), which means I am going to have to come up with another presentation in the next 2.5 months.  Wonder what problem I'm solving that I can speak on.....

Friday, April 22, 2016

[WAT] The weird intersection between Foreign Keys, Replication, Unique Indexes, set-based updates, and missing rows.

We are in process of doing a migration from an ancient creaky server to a shiny new VM.  Rather than just rebuild it and restore everything, we're taking the (painful) opportunity to clean things up and improve several systems.

As part of this, we're replicating data from the old server to the new server, so that we can migrate processes piecemeal, so that rollback is not "OH CRAP TURN IT OFF TURN IT OFF ROLL BACK TO THE OLD SERVER".

But we ran into a weird problem.  On the target server, we had a many-to-many table that sits between, let's say, stores and orders.  We have a stores table, we have an orders table, and this one (call it STORE_ORDERS for simplicity) is just a linking table between the two.  ID, stores_id, orders_id.  Everything scripted identically between the two databases (aside from the NOT FOR REPLICATION flag).

We found, while double-checking everything, that there were records missing from the stores_orders table on our new box, that were just fine on the source table.  What. The. Hell.  After looking for a few, we determined that it had to be related to one of our users doing a bunch of changes.... but just to the stores table.

What could it be?  They're updating the END_ASOF field in STORES, which is part of a unique constraint

CREATE UNIQUE NONCLUSTERED INDEX [STORE_NUMBER_ENDASOF_IDX] ON [dbo].[STORE] ([STORE_NUMBER], [END_ASOF])






The table affected does have a FOREIGN KEY constraint on STORES.  Because we're trying to migrate over, the DDL is the same.

ALTER TABLE [dbo].[STORES_ORDERS]
ADD CONSTRAINT [FK_STORES_ORDERS_STORES] FOREIGN KEY ([STORES_ID]) REFERENCES [dbo].[STORES] ([ID]) ON DELETE CASCADE

And just to make it more fun, there are some AFTER UPDATE triggers involved.  (But those are unrelated).


Any ideas?  Replication, UNIQUE constraint, Foreign Keys.



Now, go read this article: https://blogs.msdn.microsoft.com/repltalk/2011/10/04/sql-server-transaction-replication-bounded-update-or-why-my-update-was-transformed-into-an-deleteinsert-pair/

I'll wait.



In the end, we figured out what had to be happening, then confirmed via trace.  Care to guess?





So our source table, when you update that unique constraint (AS PART OF A SET; doing a singleton update, or "update stores where ID IN (1,2)" doesn't do it, we tried), replication will push a BOUNDED UPDATE, aka sp_MSdel/sp_MSins.



Which normally wouldn't be a big deal.  

Except for that foreign key.  Which has a CASCADE DELETE.

So, to recap: user does an UPDATE on the source table.  Because they updated a field in the unique constraint, the delete becomes a BOUNDED UPDATE.  Then, replication pushes this update, via a DEL/INS pair, to the target server.  That DEL hits the foreign key's CASCADE DELETE, which then DELETEs the stores_orders row.  Voila!  We updated Table A, and Table B loses records!

Moral of the story?  Beware complications and interactions, and test even when you know everything matches.

Wednesday, April 13, 2016

SSISDB - making a more useful email (additional code so you don't need SQL Sentry)

As you may know by reading http://thebakingdba.blogspot.com/2012/11/sql-server-2012-ssis-getting-useful.html, I built a process a while ago that will scan the SSISDB history, when passed a job, and send out an email with the details of the most recent failure.  I love it and it means you get a useful email instead of a "go run the Integration Services Dashboard and start hunting" that it currently sends.

However, we run a LOT of SSIS jobs, multiple per minute, and so the query now takes over 30 seconds to return.  Which sucks, because SQL Sentry's "execute SQL" has a hard cut-off of 30 seconds.  So we stopped getting the useful emails.  I wanted to fix that.

First attempt: go look in sysjobhistory for failed jobs, and then walk through the jobs that failed and run my SSISDB code explicitly.

Enjoy!



Thursday, April 7, 2016

#BacktoBasics - a list of my basic precepts/tenets/rules

So, last December, Tim Ford (aka @SQLAgentMan) posted this challenge:

tim_challenge

We're now in April and I've "been meaning to" for 3 months now.  So let's get some #backtobasics.

Here's my personal list of rules.  My goal is to expound upon them, 1 post per month.  We'll see if I hit that.

Rule 0: Backup
Rule 1: UPDATE STATS
Rule 2: Data/Log ALWAYS separated
Rule 3: Complexity is the enemy.
Rule 4: Layers of Abstraction
Rule 5: Question Assumptions
Rule 6: NULL is NULL
Rule 7: Once, Adhoc. Twice, Automate.
Rule 8: If you have time, build the framework. (See #3)

Some of the stuff to cover, off the top of my head:
0: Ola, testing, tweaks, diff/full/log
1: Why it matters, why it works, when it doesn't, alternatives, flushing the cache
2: if you can't grow one or the other...
3: find some stories. The 5 hops for replication come to mind.
4: not just views. Synonyms, CNAMEs, views, etc.
5: "it works this way in SQL 200X" "well, SP1" "well, CU5 but not SP1 CU4"
6: Find some examples about getting burned by it.
7: pretty explanatory.  You get one whack at the pie for free.
8: contradicts 3, but reinforces 7. The EPR, partitioning, EN's blacklist.