Thursday, February 23, 2023

Severity 10 - using Extended Events to get code errors

I wish I knew whom to thank for this.  An absolute genius idea. Probably Ozar, possibly Bertrand, could be White or a bunch of other #SQLFamily.  Warning: once you use this, you'll want it on all your servers. 

Premise: run an Extended Event looking for all errors over Severity 10, saving in a 10mb memory buffer within the SQL Server instance. Keep the rolling last errors so you have a decent amount of history. I believe the original was Severity 10 and above, but that one was too chatty for me, whereas 11+ seems to be ideal. But I refer to it as my "Sev10" code, so here we are. This includes all sorts of stuff you won't normally see - errors from stored procedures, errors in replication, errors in SSMS, etc. 

Testing it is easy: create session & start it, run "select 1/0", then run the query.

I do have a version that I run that saves logs out, but honestly, that's a different post. This Xevent uses 10mb from the Ring Buffer, constantly filling and keeping the last 10mb of errors. Performance hit seems minimal, and we're doing millions of transactions a day.

Performance has always been the kicker on this - using Xquery directly against the ring buffer is slow as heck, and even after solving that via temp tables, I still ran into problems with one particular server. That made me realize I don't appear to have posted this. By splitting it via NODE to multiple rows to a temp table, then querying the temp table to shred the XML, the performance is vastly improved.  In my case, from 8 minutes to 16 seconds. And on most servers, under 5. 

Other thoughts - you might consider adding another WHERE clause to the ADD EVENT; maybe filter end-users' machines or SSMS. I like having it, but you do you. Turns out SSMS is damn chatty sometimes, also. "View server state permission was denied " and "Cannot drop the table '#SVer'" abound.

Want to run this in Azure SQL DB? Sure, just do it against each database, replacing the "ON SERVER" with "ON DATABASE". The table names you query against change slightly (it's commented), and the time zone calculation doesn't work yet. Booooooooo.

I'm working on a collector, just haven't finished yet.

Bonus! Frames! Take the Frames provided in the Event_Data, and go use their amazing code to tell you exactly what piece of code in what function/stored procedure, and what line, caused the issue. Is it in a stored procedure calling a stored procedure calling a function? This will tell you where and what. Genius, and kudos to Tom.  https://straightforwardsql.com/posts/investigating-errors-with-extended-events/

Cheers!


--1.1, 20191210 first version I know of
--1.2, 20230223 major changes to speed it up, after dealing with a particular server, added Azure SQL DB details.
--1.21 20230403 adding event_sequence
--1.22 20230427 adding CPID Client PID. Find app with posh: gwmi win32_process -cn theclientnamehere -filter "ProcessId=2900"
--thebakingdba.blogspot.com
/*
CREATE EVENT SESSION
severity_10plus_errors_XE_memory
ON server --use "on database" for Azure SQL DB
ADD EVENT sqlserver.error_reported
(
ACTION(package0.event_sequence,sqlserver.client_app_name,sqlserver.client_hostname,sqlserver.database_id,sqlserver.sql_text,sqlserver.tsql_stack,sqlserver.username,sqlserver.client_pid)
--ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username)
WHERE ([severity]> 10)
)
ADD TARGET
package0.ring_buffer
(SET
max_memory = 10000 ) -- Units of KB.
WITH (MAX_DISPATCH_LATENCY = 60SECONDS,STARTUP_STATE = on)
GO
ALTER EVENT SESSION severity_10plus_errors_XE_memory
ON server --use "on database" for Azure SQL DB
STATE = START;
GO
*/
if object_id('tempdb..#xevent') is not null
DROP TABLE #xevent
CREATE TABLE #xevent (target_data XML)
INSERT INTO #xevent (target_data)
SELECT target_data
FROM sys.dm_xe_session_targets AS st --For Azure SQL DB, use dm_xe_database_session_targets
INNER JOIN sys.dm_xe_sessions AS se --For Azure SQL DB, use dm_xe_database_sessions
ON CAST(se.address AS BINARY(8)) = CAST(st.event_session_address AS BINARY(8))
WHERE
se.name = 'severity_10plus_errors_XE_memory' AND st.target_name = 'ring_buffer'
DECLARE @count VARCHAR(10)
SELECT @count = target_data.value('/RingBufferTarget[1]/@eventCount','varchar(10)')
FROM #xevent --null means it's not on or not "installed"
PRINT 'eventCount = ' + @count
--need two temp tables to speed shredding; this one will hold one row per event.
if object_id('tempdb..#xevent2') is not null
DROP TABLE #xevent2
CREATE TABLE #xevent2 (err_timestamp DATETIME2, target_data XML)
CREATE INDEX ncidx__xevent2__err_timestamp ON #xevent2(err_timestamp)
--shredded-to-node for speed and to filter
INSERT INTO #xevent2
SELECT DATEADD(MINUTE, DATEPART(TZoffset, SYSDATETIMEOFFSET()),
ed.c.value('(@timestamp)[1]', 'nvarchar(max)')) AS err_timestamp, ed.c.query('.') AS event_data
FROM #xevent a
cross apply (select CAST(target_data as XML) as event_data) as xevents
CROSS APPLY xevents.event_data.nodes('RingBufferTarget/event') ed(c)
--final select out. Put filters in the SELECT * WHERE clause
;with events_cte as(
select err_timestamp,
target_data.value('(/event/data[@name="severity"]/value)[1]', 'bigint') AS [err_severity],
target_data.value('(/event/data[@name="error_number"]/value)[1]', 'bigint') AS [err_number],
target_data.value('(/event/data[@name="message"]/value)[1]', 'nvarchar(512)') AS [err_message],
target_data.value('(/event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text],
target_data.value('(/event/action[@name="client_app_name"]/value)[1]', 'nvarchar(max)') AS client_app_name,
target_data.value('(/event/action[@name="client_hostname"]/value)[1]', 'nvarchar(max)') AS client_hostname,
target_data.value('(/event/action[@name="database_id"]/value)[1]', 'int') AS database_id,
target_data.value('(/event/action[@name="nt_username"]/value)[1]', 'nvarchar(max)') AS nt_username,
target_data.value('(/event/action[@name="username"]/value)[1]', 'nvarchar(max)') AS username,
target_data.value('(/event/action[@name="session_nt_username"]/value)[1]', 'nvarchar(max)') AS session_nt_username,
target_data.value('(/event/action[@name="event_sequence"]/value)[1]', 'nvarchar(max)') AS event_sequence,
target_data.value('(/event/action[@name="client_pid"]/value)[1]', 'int') AS client_pid
, a.target_data AS event_data --we can leave this off since it makes this run SO MUCH LONGER.
FROM #xevent2 a
)
SELECT events_cte.[err_timestamp], events_cte.[err_severity], events_cte.[err_number], events_cte.[err_message], events_cte.[sql_text], events_cte.client_app_name, events_cte.client_hostname
, client_pid, DB_NAME(events_cte.database_id) as database_name, events_cte.nt_username, username, events_cte.session_nt_username, events_cte.event_sequence
, events_cte.event_data
from events_cte
WHERE
NOT (events_cte.err_number = 17830 AND events_cte.err_severity = 20) --that network error disconnect message
--and (events_cte.client_hostname IS NULL OR events_cte.client_hostname not IN ('thebakingdba'))
--and err_timestamp >='20210930'
order by err_timestamp desc;

Friday, February 17, 2023

Azure SQL Managed Instance - a simple Extended Event to track logins

 This is a barebones Xevent (Extended Event) . I'm not using blob storage, just the existing "eh, you have a couple hundred meg worth of ring buffer you can use". But I needed to see who was using one of our dev instances. I thought we'd moved everything over to the Azure SQL DBs, since there's a ton of reasons to do so.

Here's some quick off-the-cuff reasons why I prefer Azure SQL DB to Managed Instance:

  1. Spins up/down FAST.  Seconds to minutes. Have a 20gb database? Takes 2 minutes. 30gb? 3.
  2. More availability! Technically more servers!
  3. Serverless options! Allows you to just pay for the price of storage, not compute. Doubly useful for QA/Dev boxes that aren't always used. Also allows us to auto scaleup/down as traffic changes.
  4. Works with our super-limited networking.

Yes, there are downsides. CLR, Replication, Agent, Mail, others. But the wins of using Azure SQL DB are very tangible.  < / soapbox > 




Anyhow, script:

@@

/* 


CREATE EVENT SESSION [Logins] ON SERVER 

ADD EVENT sqlserver.login(

ACTION(sqlserver.client_hostname,sqlserver.client_app_name,sqlserver.database_name,sqlserver.username))

ADD TARGET package0.ring_buffer(SET max_memory = 4096)  --max_event_limit threw error!

go

ALTER EVENT SESSION [Logins] 

ON SERVER 

STATE = start ;

GO*/



if object_id('tempdb..#xevent') is not null

    DROP TABLE #xevent

CREATE TABLE #xevent (target_data XML)

INSERT INTO #xevent (    target_data)

SELECT target_data

 FROM sys.dm_xe_session_targets        AS st

    INNER JOIN sys.dm_xe_sessions       AS se

         ON CAST(se.address AS BINARY(8)) = CAST(st.event_session_address AS BINARY(8))

WHERE

se.name = 'Logins'


DECLARE @count VARCHAR(10) 

SELECT @count = target_data.value('/RingBufferTarget[1]/@eventCount','varchar(10)') 

FROM #xevent --null means it's not on or not "installed"

PRINT 'eventCount = ' + @count


;with events_cte as(

SELECT 

ed.c.value('(@timestamp)[1]', 'DATETIMEOFFSET') at TIME ZONE 'Central Standard Time' AS err_timestamp,

--xevents.event_data.value('(RingBufferTarget/event/@timestamp)[1]', 'datetime2')) AS [err_timestamp],

ed.c.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(max)') AS client_app_name,

ed.c.value('(action[@name="client_hostname"]/value)[1]', 'nvarchar(max)') AS client_hostname,

ed.c.value('(action[@name="database_id"]/value)[1]', 'int') AS database_id,

ed.c.value('(action[@name="database_name (Action)"]/value)[1]', 'nvarchar(max)') AS nt_username,

ed.c.value('(action[@name="username"]/value)[1]', 'nvarchar(max)') AS username

, ed.c.query('.') AS event_data  --we leave this off since it makes this run SO MUCH LONGER.

FROM #xevent a

cross apply (select CAST(target_data as XML) as event_data) AS xevents

CROSS APPLY event_data.nodes('/RingBufferTarget/event') AS ed(c)

)

SELECT *

--, events_cte.event_data

from events_cte 


order by err_timestamp desc;

@@


Wednesday, February 8, 2023

Azure SQL Database Auditing gotchas.

A couple of notes when using Azure SQL DB Auditing 
1) this is actually kinda cool, overall. I plan on mining this in the future for code changes, etc. 
2) You can filter it! predicateexpression is the term you're looking for, available via the powershell module 3) if you're trying to add predicates, it uses SQL Audit syntax. In our case, we wanted to ignore a particular stored procedure call, so we used: 
"(NOT [statement] like '%myspnamehere%')" 
4) BEFORE YOU DO THIS and add the predicate: turn all the audits off on that "server" first. 
 The database AND the server-level. The regular AND the microsoft. 

We had a heck of a time figuring out why it wasn't working right. 
Between this and making sure the Database-Level Auditing was off, this worked. 

Disconnect-AzAccount myguidhere
Connect-AzAccount -Tenant myguidhere

Set-AzContext -SubscriptionId Get-AzSqlServerAudit -ResourceGroupName "ourRGname" -Servername "ourSERVERname" 

Set-AzSqlServerAudit -ResourceGroupName "ourRGname" -ServerName "ourSERVERname" -PredicateExpression "(NOT [statement] like '%myspnamehere%')" -WorkspaceResourceId "ourworkspaceidforourLogAnalyticsTarget" -LogAnalyticsTargetState ENABLED

Set-AzSqlServerMSSupportAudit -ResourceGroupName "ourRGname" -ServerName "ourSERVERname" -WorkspaceResourceId "ourworkspaceidforourLogAnalyticsTarget" -LogAnalyticsTargetState ENABLED