Showing posts with label system_health. Show all posts
Showing posts with label system_health. Show all posts

Thursday, March 25, 2021

System_health - parsing out queryProcessing node

Needed this today, my pain = your gain.


System_health has a TON of useful info.  However, it's buried in XML, and my shredder (see other posts with the system_health tag) doesn't go this deep.  

sp_server_diagnostics runs every 5 minutes, and there are other posts on getting some of the data out.  But I needed more, and I needed more than just "right now".  System_health saves 32 or so, in 5 minute increments.

This is not complete.  But you do get max workers, max in use, max idle, memory allocations, and non-preemptive wait types, both by count and duration.

Enjoy!


if object_id('tempdb..#systemhealthsessiondata') is not null

    DROP TABLE #systemhealthsessiondata


SELECT CAST(xet.target_data AS XML) AS XMLDATA

INTO #SystemHealthSessionData

FROM sys.dm_xe_session_targets xet

JOIN sys.dm_xe_sessions xe

ON (xe.address = xet.event_session_address)

WHERE xe.name = 'system_health'


SELECT C.query('.') EventXML

, T.C.value('@timestamp', 'datetime')


, T.C.value('(data/value/queryProcessing/@maxWorkers)[1]','varchar(255)') as maxWorkers

, T.C.value('(data/value/queryProcessing/@workersCreated)[1]','varchar(255)') as workersCreated

, T.C.value('(data/value/queryProcessing/@workersIdle)[1]','varchar(255)') as workersIdle

, T.C.value('(data/value/queryProcessing/@tasksCompletedWithinInterval)[1]','varchar(255)') as tasksCompletedWithinInterval

, T.C.value('(data/value/queryProcessing/@pendingTasks)[1]','varchar(255)') as pendingTasks

, T.C.value('(data/value/queryProcessing/@oldestPendingTaskWaitingTime)[1]','varchar(255)') as oldestPendingTaskWaitingTime

, T.C.value('(data/value/queryProcessing/@hasUnresolvableDeadlockOccurred)[1]','varchar(255)') as hasUnresolvableDeadlockOccurred

, T.C.value('(data/value/queryProcessing/@hasDeadlockedSchedulersOccurred)[1]','varchar(255)') as hasDeadlockedSchedulersOccurred

, T.C.value('(data/value/queryProcessing/@trackingNonYieldingScheduler)[1]','varchar(255)') as trackingNonYieldingScheduler

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="MEMORY_ALLOCATION_EXT"]/@waits)[1]','varchar(255)') as MEMORY_ALLOCATION_EXT_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="RESERVED_MEMORY_ALLOCATION_EXT"]/@waits)[1]','varchar(255)') as RESERVED_MEMORY_ALLOCATION_EXT_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="OLEDB"]/@waits)[1]','varchar(255)') as OLEDB_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="WRITELOG"]/@waits)[1]','varchar(255)') as WRITELOG_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="PAGEIOLATCH_SH"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_SH_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="ASYNC_NETWORK_IO"]/@waits)[1]','varchar(255)') as ASYNC_NETWORK_IO_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="IO_COMPLETION"]/@waits)[1]','varchar(255)') as IO_COMPLETION_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="PAGELATCH_EX"]/@waits)[1]','varchar(255)') as PAGELATCH_EX_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="PAGEIOLATCH_EX"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_EX_ct

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byCount/wait[@waitType="CXPACKET"]/@waits)[1]','varchar(255)') as CXPACKET_ct


, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="SQLTRACE_WAIT_ENTRIES"]/@waits)[1]','varchar(255)') as SQLTRACE_WAIT_ENTRIES_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="CLR_AUTO_EVENT"]/@waits)[1]','varchar(255)') as CLR_AUTO_EVENT_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION"]/@waits)[1]','varchar(255)') as HADR_FILESTREAM_IOMGR_IOCOMPLETION_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="TRACEWRITE"]/@waits)[1]','varchar(255)') as TRACEWRITE_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="ASYNC_NETWORK_IO"]/@waits)[1]','varchar(255)') as ASYNC_NETWORK_IO_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="WRITELOG"]/@waits)[1]','varchar(255)') as WRITELOG_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="PAGEIOLATCH_SH"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_SH_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="OLEDB"]/@waits)[1]','varchar(255)') as OLEDB_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="PAGEIOLATCH_EX"]/@waits)[1]','varchar(255)') as PAGEIOLATCH_EX_ms

, T.C.value('(data/value/queryProcessing/topWaits/nonPreemptive/byDuration/wait[@waitType="LCK_M_SCH_S"]/@waits)[1]','varchar(255)') as LCK_M_SCH_S_ms


/*

            <byDuration>

              <wait waitType="SQLTRACE_WAIT_ENTRIES" waits="2559734" averageWaitTime="5898" maxWaitTime="304986" />

              <wait waitType="CLR_AUTO_EVENT" waits="42" averageWaitTime="148452579" maxWaitTime="1161135755" />

              <wait waitType="HADR_FILESTREAM_IOMGR_IOCOMPLETION" waits="6283026" averageWaitTime="500" maxWaitTime="38514" />

              <wait waitType="TRACEWRITE" waits="1656275" averageWaitTime="1892" maxWaitTime="15678" />

              <wait waitType="ASYNC_NETWORK_IO" waits="170646385" averageWaitTime="4" maxWaitTime="34341" />

              <wait waitType="WRITELOG" waits="322871416" averageWaitTime="1" maxWaitTime="166927" />

              <wait waitType="PAGEIOLATCH_SH" waits="204943649" averageWaitTime="1" maxWaitTime="157813" />

              <wait waitType="OLEDB" waits="996781932" averageWaitTime="0" maxWaitTime="158157" />

              <wait waitType="PAGEIOLATCH_EX" waits="32693084" averageWaitTime="1" maxWaitTime="121494" />

              <wait waitType="LCK_M_SCH_S" waits="10554" averageWaitTime="4387" maxWaitTime="94983" />

            </byDuration>

          </nonPreemptive>

          <preemptive>

            <byCount>

              <wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="1935312" averageWaitTime="0" maxWaitTime="206" />

              <wait waitType="PREEMPTIVE_OS_CRYPTOPS" waits="1391671" averageWaitTime="0" maxWaitTime="143" />

              <wait waitType="PREEMPTIVE_OS_QUERYREGISTRY" waits="912266" averageWaitTime="0" maxWaitTime="188" />

              <wait waitType="PREEMPTIVE_OS_WRITEFILE" waits="817138" averageWaitTime="30" maxWaitTime="145310" />

              <wait waitType="PREEMPTIVE_OS_GETPROCADDRESS" waits="637470" averageWaitTime="0" maxWaitTime="15" />

              <wait waitType="PREEMPTIVE_XE_CALLBACKEXECUTE" waits="629446" averageWaitTime="0" maxWaitTime="9" />

              <wait waitType="PREEMPTIVE_OS_AUTHORIZATIONOPS" waits="560759" averageWaitTime="0" maxWaitTime="59" />

              <wait waitType="PREEMPTIVE_OS_CLOSEHANDLE" waits="494332" averageWaitTime="1" maxWaitTime="124284" />

              <wait waitType="PREEMPTIVE_OS_CRYPTACQUIRECONTEXT" waits="464831" averageWaitTime="2" maxWaitTime="129" />

              <wait waitType="PREEMPTIVE_OS_REVERTTOSELF" waits="96931" averageWaitTime="0" maxWaitTime="36" />

            </byCount>

            <byDuration>

              <wait waitType="PREEMPTIVE_OS_WRITEFILE" waits="817138" averageWaitTime="30" maxWaitTime="145310" />

              <wait waitType="PREEMPTIVE_OS_CRYPTACQUIRECONTEXT" waits="464831" averageWaitTime="2" maxWaitTime="129" />

              <wait waitType="PREEMPTIVE_OS_CLOSEHANDLE" waits="494332" averageWaitTime="1" maxWaitTime="124284" />

              <wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="1935312" averageWaitTime="0" maxWaitTime="206" />

              <wait waitType="PREEMPTIVE_OS_QUERYREGISTRY" waits="912266" averageWaitTime="0" maxWaitTime="188" />

              <wait waitType="PREEMPTIVE_OS_FILEOPS" waits="18032" averageWaitTime="16" maxWaitTime="3488" />

              <wait waitType="PREEMPTIVE_OS_REPORTEVENT" waits="2032" averageWaitTime="135" maxWaitTime="102489" />

              <wait waitType="PREEMPTIVE_OS_CRYPTOPS" waits="1391671" averageWaitTime="0" maxWaitTime="143" />

              <wait waitType="PREEMPTIVE_OS_FLUSHFILEBUFFERS" waits="1924" averageWaitTime="60" maxWaitTime="3880" />

              <wait waitType="PREEMPTIVE_OS_CREATEFILE" waits="3048" averageWaitTime="26" maxWaitTime="6397" />

            </byDuration>

          </preemptive>

        </topWaits>

*/


FROM #SystemHealthSessionData a

CROSS APPLY a.XMLDATA.nodes('/RingBufferTarget/event') as T(C)

--cross apply T.C.nodes('/event/data') as E(D)

where 

 T.C.query('.').value('(/event/@name)[1]', 'varchar(255)') in ('sp_server_diagnostics_component_result')

and  C.value('(data/text)[1]','varchar(255)')= 'QUERY_PROCESSING'


Monday, April 13, 2015

[System_Health] 2 dynamic and full parsers for the system_health session




I needed to look at some locking info in the system_health session recently, and realized nobody ever wrote one - all the ones I've seen are explicit calls. 
Everybody does stuff like '/event/data[3]', but there was no comprehensive shredding done.  I decided to fix that.  

While I've written something for SP_SERVER_DIAGNOSTICS a while ago (which will handle 2012+), I needed something specifically for 2008/2008R2, . 
 At it's core is a piece of code I picked up a while ago from stack_overflow and a couple other Xquery posts:

              SELECT  
                    A.B.value('@name[1]', 'varchar(128)') AS EventType,
                    A.B.value('./@timestamp[1]', 'datetime') AS UTC_Time,
                    X.N.value('local-name(.)', 'varchar(128)') AS NodeName,
                    X.N.value('../@name[1]', 'varchar(128)') AS ParsedName,
                    X.N.value('./text()[1]', 'varchar(max)') AS NodeValue
             FROM    cte_healthsession
             CROSS APPLY EventXML.nodes('/*') AS A (B)
             CROSS APPLY EventXML.nodes('//*') AS X (N)



All this really does is shred out every node at the various levels, including names and values.  Which for system_health seems to work pretty well (not perfectly; there are some values I need to filter in my pivot)

But once it's gotten all the potential details, the code automatically goes through each group and shreds it.  As a bonus, I realized that by doing it slightly differently, I could return all the details in one table.  The table does have a ton of fields, since many are specific to each type of event - error_reported doesn't have duration or wait_type, for instance, whereas waitinfo doesn't include error or messsage.  However, it means that you can read them in time order quite easily, which makes it easier to correlate issues.



However, I also wound up with a version that returns multiple tables, one for each event type.  (This looks similar to the sp_server_diag parser I wrote)





 


Now, when you run this (if you're not already doing a ton of stuff with system_health) you may be wondering why you're not seeing anything recent.  There are two potential reasons:

1) events that are too large, most likely deadlocks.  There's no good way around it other than to modify the system_health session to also write to a file (which 2012 does do).  In order to make sure I'm not susceptible to it, I added deadlock monitoring via Event Notifications (add the event DEADLOCK_GRAPH).  I will probably modify the system_health session again and add a file.

2) a really nasty bug in system_health, which Jonathon Kehayias mentioned in http://www.sqlskills.com/blogs/jonathan/incorrect-timestamp-on-events-in-extended-events/ . I can easily tell it's happening due to crappy monitoring software - I show a new error message every 5 minutes, although the time is several days ago (and not exactly X days ago, either).  So, I added the system time, as suggested in the post. 
Here's the code, holler if you have any questions!

One table:




Multiple tables: