Friday, December 28, 2012

sp_server_diag_event_parser - Make SP_Server_Diagnostics EVENTS human-readable!

(update 2013/01/08 - just made another speed fix - now runs in under 20 seconds!)

This is my third post about sp_server_diagnostics, in which I've written two parsers to look at the various data returned from it.  This post is about the events component_type. Next post: query_processing.


50000 foot view: my code in this post takes the data from "events" component type, and returns:



For more info on sp_server_diagnostics in general, since it's going to be the replacement for the system_health session (SQL 2008) and the black box trace (SQL 2005), I'd recommend reading:


  •  http://msdn.microsoft.com/en-us/library/ff878233.aspx
  • http://blogs.msdn.com/b/psssql/archive/2012/03/08/sql-server-2012-true-black-box-recorder.aspx
  •  http://www.mytechmantra.com/LearnSQLServer/Using-sp_server_diagnostics-Quickly-Capture-Diagnostic-Data-and-Health-Information-SQL-Server-2012.html
  •  http://www.sqlskills.com/blogs/joe/post/Under-the-covers-with-sp_server_diagnostics-data-(Part-I).aspx




My first post on sp_server_diagnostics (http://thebakingdba.blogspot.com/2012/09/spserverdiagnostics-parsing-part-1-of-2.html) covered two of the rows & was pretty simple - it simply returned the memory info from "resource", and the waits from "query_processing".  Which showed sp_server_diag's power.

However, while 3-4 of the rows are fairly readable, even in XML, the component_name "events" is a bit trickier - there are multiple types of event, each one with its own different elements:

  • connectivity_ring_buffer_recorded
  • error_reported
  • resource_monitor_ring_buffer_recorded
  • scheduler_monitor_system_health_ring_buffer_recorded
  • security_error_ring_buffer_recorded
  • scheduler_monitor_non_yielding_ring_buffer_recorded
  • ...and even more


My thought was, rather than to hard-code it (which could mean I miss something), I would parse the XML dynamically, in two passes.  The first pass gets the various elements for each event type, the second uses that info to write another query to get the info from each event type.  As a side effect, it also allows me to format it better.  This is a modified version from my original post, which had some flaws.

What flaws?  Mostly that it really slow on servers with a lot of info, and it didn't return the full error message in some cases.  Fortunately, I was able to speed it up substantially through a trivial change that took me the better part of an afternoon to find - went from a pulling the XML in a derived sub-queries, to a variable.  From that one change, it now runs between 6 seconds and 2 minutes.  I also tweaked it to return more data in long error strings.

You will see cryptic fields like "callstack : : " in the results.
It comes from (sample XML, formatted strange so that it shows up here):

       
<data name="call_stack">

<type name="callstack" package="package0" />

<value />

</data>

As you can see, there's not really anything to report.  At the same point, I'm not sure why that's there, and I'd rather leave it in case things change in the future.  The nice side-effect is that it converts from:
       
         
<data name="tds_flags">

<type name="connectivity_record_tds_flag" package="sqlserver" />

<value>0x0000003e</value>

<text>DisconnectDueToReadError, NetworkErrorFoundInInputStream, ErrorFoundBeforeLogin, SessionIsKilled, NormalDisconnect</text>

</data>
to:
connectivity_record_tds_flag : DisconnectDueToReadError, NetworkErrorFoundInInputStream, ErrorFoundBeforeLogin, SessionIsKilled, No : 0x0000003e


If you do see a field with a value like: "int8: : 15" (where there are two semicolons), please let me know; while I've trapped all the various event datatypes that I could find, I literally found one (int16) while writing this post.

To use: Copy/paste the entire code below, create the SP, then run it (or just run all the code save the CREATE.  It can return several sets of data.  You'll also see where they reference each other: a row in connectivity_ring_buffer_recorded can reference a TDs error, which can then show up in error_reported as a network error message.


I hope this helps you troubleshoot SQL Server 2012.

Feel free to share, feel free to send me changes and enhancements.  Tell me what you like and dislike!


(Note that when copy/pasting, you may see "WHILE @min <= @max;" in the code, which SSMS can't use, and the proc won't compile. That should be a LESS-THAN symbol, followed by an EQUALS sign.  This should be fixed now - converted it to a GIST on 2014/02/24)


Thursday, December 13, 2012

Powershell - run a query against multiple servers in parallel, saving the data to a central table

(update 2013/06/13: I've superceded this with a new framework that will run almost any piece of code, powershell or sql, saving the results to a table, and running with multiple threads. Please give it a look!
http://thebakingdba.blogspot.com/2013/04/servers-extensible-powershell.html )


Update at 8:30pm: tweaked to add a throttled scheduler!  Runs 8 simultaneous threads to prevent too much workload on the box running the PS script.

SO...
I have a repository server.
Said repository server has a list of servers I monitor.
I want to be able to run a piece of code against all of them, AT THE SAME TIME, and save those results to  a table on my repository.  The code currently takes an hour to run, I'd like to shrink that.  Looking online, it looks like this is fairly basic code, but I hadn't seen it put together like this.

First, we need a table to save the results to.  Simple results, simple table.  Proof of concept, here, so only two fields.

CREATE TABLE myserverinfo (servername sysname, thedate datetime)

The next part needs two existing Powershell scripts written by "Hey Scripting Guy", invoke-sqlcmd2 and write-datatable.  You will need to get those.  I save (and reference) them in c:\sql_tools.

The powershell script.  Feel free to drag, copy, cut, paste and save as a .ps1 file:

#Run SQL query against multiple servers in parallel, saving results to a central table
# 1.00 2012/12/13 mdb / TBD
# 1.10 2012/12/13 mdb / TBD.  Adding throttling scheduler
# 1.11 2012/12/13 mdb adding comment on WHY you want the wait-job
# Code cribbed and lessons learned courtesy of: Hey Scripting Guy, 
# Aaron Bertrand (the bit with $args[0], his "sad panda face" post)
# Kendra Little.  Throttling scheduler is from "start-automating" on 
# Stackoverflow.  Errors are mine, not theirs.  
# Please keep this header and let me know how it works. thebakingdba.blogspot.com
. C:\sql_tools\invoke-sqlcmd2.ps1;
$serverlist = invoke-sqlcmd2 -serverinstance "myrepositoryserver" -query "SELECT server FROM dba.dbo.myserverlist WHERE active = 1" #get the list of your servers to monitor
foreach ($server in $serverlist)
{
    $jobsrunning = @(Get-Job | Where-Object { $_.JobStateInfo.State -eq 'Running' })
    if ($jobsrunning.Count -le 8) #keeps 8 jobs going simultaneously
    {
        start-job -argumentlist $server.server -scriptblock `
        {
        #have to reimport functions here due to the scriptblock IIRC
        . C:\sql_tools\invoke-sqlcmd2.ps1;
        . C:\sql_tools\write-datatable.ps1;
        $quer = invoke-sqlcmd2 -serverinstance $args[0] -database "master" `
            -query "waitfor delay '00:00:10'; select @@servername as servername, getdate() as thedate" -As 'DataTable'
        Write-DataTable -ServerInstance "myrepositoryserver" -Database "dba" -TableName "myserverinfo" -Data $quer
        } 
    } 
    else
    {
        $jobsrunning | Wait-Job -Any  #as soon as any job finishes, do the next
    }
}

    
get-job | wait-job -timeout 10         #wait 10 seconds or until all jobs are 
# done, whichever comes first.  Gives us a timeout before killing the slowpokes.
#additionally, if you schedule a job to run this, not setting a -timeout allows 
#it to wait for all the jobs to finish before quitting.  They wont run in the 
#background like you think they should.  

get-job|Remove-Job -force #cleanup and remove all jobs.  Kills stragglers.


SQLSentry - new feature in 7 (7.1?) that old customers need to know about

So, we lost our SQL Sentry database server last week - thing took a dive.  And, oddly enough, we didn't get a notification from SQL Sentry, even though the database server was being monitored.

Grrrr.

However, from talking with their tech support (who, honestly, are pretty awesome), there's a new notification that can be enabled, that fixes that.  It's not on by default (though I told them they need to change that), so you SHOULD enable it.  

In the Navigator Pane, go to Shared Groups (Global)
Open Failsafe Actions (on mine, it's a tab on the right side of the app)
Add "Monitoring Service: SQL Sentry Database Offline", Send Email
Add "Monitoring Service: SQL Sentry Database Online", Send Email

It also seems like they've improved the way the monitoring services work these days - it was scary easy to add a secondary site that uses our main SQL Sentry database, but monitors only servers at that site.  It's also really easy to add a backup/load-balancing monitoring service in general - install the service on another box, point it at your main SQL Sentry database, and away you go.

May my pain be your gain.