Thursday, January 27, 2011

[Replication] Orphaned distrubution agent after SQL Agent crash

I need to figure out where THIS particular error is stored in the logs, since none of my normal processes or alerts flagged this...

"Error messages:
The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Di"

How'd I find it? The big red X in the Replication Monitor, for the server whose SQL Agent crashed earlier today. Nothing in the Agent Log or the SQL Server Log. So it's buried somewhere in sysjobhistory

I believe the extant process was running properly, but why chance it?

Login to server, find "qrdrsvc.exe" in Task Manager. Kill process. Run the Queue Reader job. On my server, it's [MYServerName].6 (?), and was Between Retries. Also, the job category was: "REPL-QueueReader"

And you didn't think you'd learn anything new today...

[SQL] BULK INSERT and adding an identity

So I have a file with two columns, and I need to import it into a table with an IDENTITY column. Here's the easy way:


CREATE TABLE blah (narf VARCHAR(4000), poit VARCHAR(4000))

BULK INSERT blah
from "c:\files\blah.txt"

ALTER TABLE blah
ADD ID int IDENTITY(1,1) NOT NULL

[Replication] Orphaned agents after Agent crash.

Had SQL Server agent crash on me, and then my home-grown replication monitoring gave me the following error via email (interestingly enough, the Alerts we created didn't fire):


Agent 'MYSERVERNAME-MYDatabase-MyPublication-Servername-525' is
retrying after an error. 14 retries attempted. See agent
job history in the Jobs folder for more details.


Since that's the error you'd see in the Replication Monitor (start->run->"sqlmonitor"), I went and looked at the agent which showed our now-at-16 retries:

[...]
2011-01-27 22:46:17.596 Agent message code 21036. Another distribution agent for the subscription or subscriptions is running, or the server is working on a previous request by the same agent.


Interesting. My guess is that since the SQL Agent crashed, it left behind agents that are still running their original orders, blocking the new agents from starting. Well, let's look at what's connected.


SELECT * FROM master..sysprocesses
WHERE program_name LIKE @@servername + '%'
AND login_time < CONVERT(CHAR(8),GETDATE(),112)--before today
AND hostname = @@servername


Bingo. Two SPIDs, both with program names like 'MYSERVERNAME-MYDatabase-MyPublication-Servername' - and which match my original email above.

Kill the two SPIDs, and the next go-round the distribution agent spins up successfully.

Tuesday, January 11, 2011

Rule 5

Rule 5: Software scales faster than hardware.

(Case in point: had a service broker issue. Several fixes were made, but changing the code increased performance substantially with an overall lower CPU)

CHAR() vs CHAR()

Another of the "I forget the tool so let's put it in the blog" posts.

Obviously there's the datatype. But you can also create high-ascii & low-ascii symbols that might show up as a tiny square, or (at least in SSMS) not show up at all.

A map:
http://web.cs.mun.ca/~michael/c/ascii-table.html
This also explains some of the other code you may have seen for BULK INSERT, such as 0x0a (which is "newline" or "linefeed", the Unix "return"; Windows uses CR/LF). It can also be used to clean up code. Instead of something like
+ ''' +

you could use
+ char(39) + 


The value in CHAR has to be a number. Which is, naturally, both good and bad.

more examples:
select char(20) --random low-ascii
select char(32) --space

Monday, January 3, 2011

The Rules: 0-4

Here are my current DBA rules. Yours will differ, but these hold me in good stead.

Rule 0: Verify your backups. Obviously I assume you are making backups - you _are_, right?
Rule 1: UPDATE STATISTICS. That will frequently save your hide.
Rule 2: Data/Log always on separate drives. Know what happens if your data and log are on the same drive, and it fills up? No? You don't want to.
Rule 3: Complexity is the enemy. Not quite KISS, but definitely related.
Rule 4: SQL Sentry's "Disable Until" is a godsend. Make sure to use that instead of simply disabling the job. And double-check the time it restarts, since SQL Sentry adds 1 day by default - not useful if you're trying to disable something for 2-3 hours.

Varchar(n) vs Varchar(Max)

Two questions:
1) You have a 1tb table with 1m rows, and need to change a field from varchar(5) to varchar(10). How long does it take, and how much log space is used?
2) You have a 1tb table with 1m rows, and need to change a field from varchar(5) to varchar(MAX). How long does it take, and how much log space is used?


The answer for (1) is simple: instantly, and none. Fortunately, it's a metadata operation, and since it knows everything already in the database must be the right size, no work needs be done.
For (2), I can't tell you - we rolled back after 90 minutes (it then took another 3+ hours to roll back), and it had consumed 200+gb of log space. On the plus side, it's table-partitioned, so we'll upgrade it that way.