[System_Health] 2 dynamic and full parsers for the system_health session

Monday, April 13, 2015

[System_Health] 2 dynamic and full parsers for the system_health session

I needed to look at some locking info in the system_health session recently, and realized nobody ever wrote one - all the ones I've seen are explicit calls.
Everybody does stuff like '/event/data[3]', but there was no comprehensive shredding done. I decided to fix that.

While I've written something for SP_SERVER_DIAGNOSTICS a while ago (which will handle 2012+), I needed something specifically for 2008/2008R2, .
At it's core is a piece of code I picked up a while ago from stack_overflow and a couple other Xquery posts:

            SELECT
                   A.B.value('@name[1]', 'varchar(128)') AS EventType,
                   A.B.value('./@timestamp[1]', 'datetime') AS UTC_Time,
                    X.N.value('local-name(.)', 'varchar(128)') AS NodeName,
                    X.N.value('../@name[1]', 'varchar(128)') AS ParsedName,
                    X.N.value('./text()[1]', 'varchar(max)') AS NodeValue
            FROM    cte_healthsession
            CROSS APPLY EventXML.nodes('/*') AS A (B)
            CROSS APPLY EventXML.nodes('//*') AS X (N)

All this really does is shred out every node at the various levels, including names and values. Which for system_health seems to work pretty well (not perfectly; there are some values I need to filter in my pivot)

But once it's gotten all the potential details, the code automatically goes through each group and shreds it. As a bonus, I realized that by doing it slightly differently, I could return all the details in one table. The table does have a ton of fields, since many are specific to each type of event - error_reported doesn't have duration or wait_type, for instance, whereas waitinfo doesn't include error or messsage. However, it means that you can read them in time order quite easily, which makes it easier to correlate issues.

However, I also wound up with a version that returns multiple tables, one for each event type. (This looks similar to the sp_server_diag parser I wrote)

Now, when you run this (if you're not already doing a ton of stuff with system_health) you may be wondering why you're not seeing anything recent. There are two potential reasons:

1) events that are too large, most likely deadlocks. There's no good way around it other than to modify the system_health session to also write to a file (which 2012 does do). In order to make sure I'm not susceptible to it, I added deadlock monitoring via Event Notifications (add the event DEADLOCK_GRAPH). I will probably modify the system_health session again and add a file.

2) a really nasty bug in system_health, which Jonathon Kehayias mentioned in http://www.sqlskills.com/blogs/jonathan/incorrect-timestamp-on-events-in-extended-events/ . I can easily tell it's happening due to crappy monitoring software - I show a new error message every 5 minutes, although the time is several days ago (and not exactly X days ago, either). So, I added the system time, as suggested in the post.
Here's the code, holler if you have any questions!

One table:

Multiple tables:

The Baking DBA - select * from flour

Monday, April 13, 2015

[System_Health] 2 dynamic and full parsers for the system_health session

No comments:

Blog Archive