1) It has to be running all the time, and it consumes a large number of window resources.
2) Due to the way SQL reports issues in replication, it can show a publication as good, even if it's been erroring for 2 days.
3) It's active, not passive - I have to go look at it. It can't warn me via email or whatnot.
So, half a day digging through code and error messages, and I've come up with this. This is not fullproof by any stretch - I don't deal with Merge Replication, for instance. But in MY environment, it seems to work pretty well.
I'll see about adding a couple more fields, but I want to get this on paper.
(And yes, this is a request: if there's a better way than this, let me know)
--Courtesy of The Baking DBA
--Replication query that shows what's currently in error.
--1.01 added "delivering" exclusion, email, fixed
-- line-too-long-breaks-blogspot formatting
IF OBJECT_ID('tempdb.dbo.##replication_errors') IS NOT NULL
DROP TABLE ##replication_errors
SELECT
errors.agent_id,
errors.last_time,
agentinfo.name,
agentinfo.publication,
agentinfo.subscriber_db,
error_messages.comments AS ERROR
INTO ##replication_errors
FROM
--find our errors; note that a runstatus 3
--can be the last message, even if it's actually idle and good
(SELECT agent_id, MAX(TIME) AS last_time
FROM distribution.dbo.MSdistribution_history with (nolock)
WHERE runstatus IN (3,5,6)
AND comments NOT LIKE '%were delivered.'
AND comments NOT LIKE 'GROUP BY agent_id) errors
INNER JOIN
(SELECT agent_id, MAX(TIME) AS last_time
FROM distribution.dbo.MSdistribution_history with (nolock)
WHERE runstatus IN (1,2,4)
OR comments LIKE '%were delivered.'
GROUP BY agent_id) clean
ON errors.agent_id = clean.agent_id
AND errors.last_TIME > clean.last_time
--grab the agent information
INNER JOIN distribution.dbo.MSdistribution_agents agentinfo
ON agentinfo.id = errors.agent_id
--and the actual message we'd see in the monitor
inner JOIN distribution.dbo.MSdistribution_history error_messages
ON error_messages.agent_id = errors.agent_id
AND error_messages.time = errors.last_time
AND comments NOT LIKE '%TCP Provider%'
AND comments NOT LIKE '%Delivering replicated transactions%'
IF (SELECT COUNT(*) FROM ##replication_errors) > 0
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'email_profile',
@recipients = 'blah@tbd.com',
@subject = 'Replication errors'
,@query = 'select * from ##replication_errors'
,@query_result_header = 0
DROP TABLE ##replication_errors