Friday, December 28, 2012

sp_server_diag_event_parser - Make SP_Server_Diagnostics EVENTS human-readable!

(update 2013/01/08 - just made another speed fix - now runs in under 20 seconds!)

This is my third post about sp_server_diagnostics, in which I've written two parsers to look at the various data returned from it.  This post is about the events component_type. Next post: query_processing.

50000 foot view: my code in this post takes the data from "events" component type, and returns:

For more info on sp_server_diagnostics in general, since it's going to be the replacement for the system_health session (SQL 2008) and the black box trace (SQL 2005), I'd recommend reading:


My first post on sp_server_diagnostics ( covered two of the rows & was pretty simple - it simply returned the memory info from "resource", and the waits from "query_processing".  Which showed sp_server_diag's power.

However, while 3-4 of the rows are fairly readable, even in XML, the component_name "events" is a bit trickier - there are multiple types of event, each one with its own different elements:

  • connectivity_ring_buffer_recorded
  • error_reported
  • resource_monitor_ring_buffer_recorded
  • scheduler_monitor_system_health_ring_buffer_recorded
  • security_error_ring_buffer_recorded
  • scheduler_monitor_non_yielding_ring_buffer_recorded
  • ...and even more

My thought was, rather than to hard-code it (which could mean I miss something), I would parse the XML dynamically, in two passes.  The first pass gets the various elements for each event type, the second uses that info to write another query to get the info from each event type.  As a side effect, it also allows me to format it better.  This is a modified version from my original post, which had some flaws.

What flaws?  Mostly that it really slow on servers with a lot of info, and it didn't return the full error message in some cases.  Fortunately, I was able to speed it up substantially through a trivial change that took me the better part of an afternoon to find - went from a pulling the XML in a derived sub-queries, to a variable.  From that one change, it now runs between 6 seconds and 2 minutes.  I also tweaked it to return more data in long error strings.

You will see cryptic fields like "callstack : : " in the results.
It comes from (sample XML, formatted strange so that it shows up here):

<data name="call_stack">

<type name="callstack" package="package0" />

<value />


As you can see, there's not really anything to report.  At the same point, I'm not sure why that's there, and I'd rather leave it in case things change in the future.  The nice side-effect is that it converts from:
<data name="tds_flags">

<type name="connectivity_record_tds_flag" package="sqlserver" />


<text>DisconnectDueToReadError, NetworkErrorFoundInInputStream, ErrorFoundBeforeLogin, SessionIsKilled, NormalDisconnect</text>

connectivity_record_tds_flag : DisconnectDueToReadError, NetworkErrorFoundInInputStream, ErrorFoundBeforeLogin, SessionIsKilled, No : 0x0000003e

If you do see a field with a value like: "int8: : 15" (where there are two semicolons), please let me know; while I've trapped all the various event datatypes that I could find, I literally found one (int16) while writing this post.

To use: Copy/paste the entire code below, create the SP, then run it (or just run all the code save the CREATE.  It can return several sets of data.  You'll also see where they reference each other: a row in connectivity_ring_buffer_recorded can reference a TDs error, which can then show up in error_reported as a network error message.

I hope this helps you troubleshoot SQL Server 2012.

Feel free to share, feel free to send me changes and enhancements.  Tell me what you like and dislike!

(Note that when copy/pasting, you may see "WHILE @min <= @max;" in the code, which SSMS can't use, and the proc won't compile. That should be a LESS-THAN symbol, followed by an EQUALS sign.  This should be fixed now - converted it to a GIST on 2014/02/24)

No comments: