Detecting Query Plan Changes

The solution to detect the query execution plan changes. There were instances execution plan changed over the period of time for stored procedure and created a slowness in production. SQL server 2012 and prior versions not provides out of box capability to detect the query plan changes. This solution developed for that reason. The collected data useful for detailed analysis of query execution metrics apart from plan change detection. Its similar to query store feature available in SQL 2014 which is not available in SQL 2012 and prior versions.

Pre-requisites and supported versions

  • Supported Versions: SQL Server 2008, 2008 R2, 2012
  • Pre-requisites: SQL agent service should be up and running Note: Query store feature in SQL 2014 and 2016+ provides similar feature. If QueryStore feature not enabled for some reason, This script can be used for 2014 and 2016 as well.

List of objects

  • Table(s):
    • [querystats]      
    • [intervals]
    • [activitylog]
    • [Plans] 
    • [Queries]                      
    • [PlanChangeDetected]
  • DBADB.dbo.CollectQuerystats – This is a stored procedure does the data collection and plan change detection.

Supported functionalities

  • Collect all the queries running in the system and execution plans
  • The execution metrics collected for queries every time the stored procedure runs
  • Execution plans will be collected only if current queryplan not in the [Plans] table
  • Detect the plan change by checking [Query_Plan_Hash] and compare execution metrics to report as bad plan(When new execution metrics > 1.5 times of 1 month average)
  • The metrics considered for comparison are as below, if the new result is 1.5 times higher than previous for either of the below metrics, the new plan is deemed as bad.
    • avg_elapsed_time
    • avg_worker_time
    • avg_logical_reads
    • avg_physical_reads
  • [dbo]. [Plans] table stores the execution plan for every statement within procedure ran in system. PlanGuide can be used with this data to force a good plan as advanced tuning method

            Required parameters

  • @secdecisionfactor – This parameter defines number of times the execution metrics of new plan greater than previous plan. Default value is 1.5 times if not specified

Sample information

  • Query [dbo]. [PlanChangeDetected] table. It gives details about changed queryplan at different times.
  • Planguide can be used. If the requirement is to force a plan, the good plan can be retrieved from a dbo.[plans] table for a specific query and pass it to sp_create_plan_guide system stored procedure.

Space utilization to retain history data

  • Space requirement varies for each system and depends on no. of queries being executed and plan size etc.
  • Default retention is 30 days for collected query execution metrics. This clean up runs between 6pm – 5m. Not allowed to run in day time.

Implementation procedure & scripts

            1) Run the following code( it creates stored procedure )

/*****************************************************************************
-------------------------------------------------------------------------------
Script		:   PlanChangeDetection_CollectQuerystats
Author		:   Mahadevan N
Created		:   14/02/2020
Purpose		:   Capture potential plan changes
Parameter(s)	:   @secdecisionfactor float=1.5
Prerequisites	:   a) SQL agent service should be up and running. 
		   
Usage       	:   The stored proc dbo.[CollectQuerystats] does data collection. Schedule this procedure to run every 4 hours(recommended) using SQL agent job. 
		    DBA/Developer can review the potential plan changes by querying [PlanChangeDetected] table manually. This plan change information will be useful 
		    to identify and review queries which requires fine tuning as part of ongoing performance tuning exercise
		    No alert will be triggered.   
------------------------------------------------------------------------------

*****************************************************************************/


use DBADB /*Change to right DB here*/

go
/*Creation of Objects to store collection data*/
 IF OBJECT_ID( '[querystats]' ) IS NOT NULL DROP TABLE [querystats];  	
 IF OBJECT_ID( '[intervals]' ) IS NOT NULL DROP TABLE [intervals];
 IF OBJECT_ID( '[activitylog]' ) IS NOT NULL DROP TABLE [activitylog];
 IF OBJECT_ID( '[Plans]' ) IS NOT NULL DROP TABLE [Plans];  	
 IF OBJECT_ID( '[Queries]' ) IS NOT NULL DROP TABLE [Queries];  		
 IF OBJECT_ID( '[PlanChangeDetected]' ) IS NOT NULL DROP TABLE [PlanChangeDetected];  	


 CREATE TABLE [activitylog](
	[log_id] [int] IDENTITY(1,1) NOT NULL,
	[log_run_id] [int] NULL,
	[log_timestamp] [datetime] NULL,
	[log_message] [varchar](250) NULL,
 CONSTRAINT [pk_log_id] PRIMARY KEY CLUSTERED 
(
	[log_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [intervals](
	[interval_id] [int] IDENTITY(1,1) NOT NULL,
	[interval_start] [datetime] NULL,
	[interval_end] [datetime] NULL,
 CONSTRAINT [pk_intervals] PRIMARY KEY CLUSTERED 
(
	[interval_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO 
CREATE TABLE [dbo].[querystats](
	[query_id] [int] NOT NULL,
	[interval_id] [int] references [intervals] (interval_id) NOT NULL,
	[creation_time] [datetime] NOT NULL,
	[last_execution_time] [datetime] NOT NULL,
	[execution_count] [bigint] NOT NULL,
	[total_elapsed_time] [bigint] NOT NULL,
	[last_elapsed_time] [bigint] NOT NULL,
	[min_elapsed_time] [bigint] NOT NULL,
	[max_elapsed_time] [bigint] NOT NULL,
	[avg_elapsed_time] [bigint] NOT NULL,
	[total_rows] [bigint] NOT NULL,
	[last_rows] [bigint] NOT NULL,
	[min_rows] [bigint] NOT NULL,
	[max_rows] [bigint] NOT NULL,
	[avg_rows] [bigint] NOT NULL,
	[total_worker_time] [bigint] NOT NULL,
	[last_worker_time] [bigint] NOT NULL,
	[min_worker_time] [bigint] NOT NULL,
	[max_worker_time] [bigint] NOT NULL,
	[avg_worker_time] [bigint] NOT NULL,
	[total_physical_reads] [bigint] NOT NULL,
	[last_physical_reads] [bigint] NOT NULL,
	[min_physical_reads] [bigint] NOT NULL,
	[max_physical_reads] [bigint] NOT NULL,
	[avg_physical_reads] [bigint] NOT NULL,
	[total_logical_reads] [bigint] NOT NULL,
	[last_logical_reads] [bigint] NOT NULL,
	[min_logical_reads] [bigint] NOT NULL,
	[max_logical_reads] [bigint] NOT NULL,
	[avg_logical_reads] [bigint] NOT NULL,
	[total_logical_writes] [bigint] NOT NULL,
	[last_logical_writes] [bigint] NOT NULL,
	[min_logical_writes] [bigint] NOT NULL,
	[max_logical_writes] [bigint] NOT NULL,
	[avg_logical_writes] [bigint] NOT NULL,
	[query_plan_hash] [varbinary](8) NOT NULL,
 CONSTRAINT [pk_querystats] PRIMARY KEY CLUSTERED 
(
	[query_id] ASC,
	[interval_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [Queries]
	(
	[queryId] bigint Identity(1,1) Primary Key,
	[query_hash] binary(8) NOT NULL,
	[statement_start_offset] int NOT NULL,
	[statement_end_offset] int NOT NULL,
	[statement_text]  nvarchar(max),
	[dbid] int   NULL,
	[objectID] int  NOT NULL
	)

	create index IX_Queries on [Queries] ( objectid,[query_hash],[statement_start_offset],[statement_end_offset])


CREATE TABLE [Plans]
(
	[PlanID] bigint IDENTITY(1,1) Primary key,
	[QueryID] bigint references [Queries]([queryId]) ,
	[Query_Plan_Hash] binary(8),
	query_plan_fingerprintid binary(16),
	[sql_plan] varchar(max),
	[plan_foundfirst] datetime default GETDATE() NOT NULL
)

create index [ix_plans] on [Plans] ([QueryID],[query_plan_fingerprintid])

GO
CREATE TABLE [dbo].[PlanChangeDetected](
	[queryId] [bigint] NOT NULL,
	[query_hash] [binary](8) NOT NULL,
	[statement_start_offset] [int] NOT NULL,
	[statement_end_offset] [int] NOT NULL,
	[statement_text] [nvarchar](max) NULL,
	[dbid] [int] NULL,
	[objectID] [int] NOT NULL,
	[PlanID] [bigint] NOT NULL,
	[Query_Plan_Hash] [binary](8) NULL,
	[sql_plan] [varchar](max) NULL,
	[plan_foundfirst] [datetime] NOT NULL,
	[interval_id] [int] NOT NULL,
	[avg_elapsed_time] [bigint] NOT NULL,
	[avg_worker_time] [bigint] NOT NULL,
	[avg_physical_reads] [bigint] NOT NULL,
	[avg_logical_reads] [bigint] NOT NULL,
	[avg_logical_writes] [bigint] NOT NULL,
	[dbname] [nvarchar](128) NULL,
	[object_Name] [nvarchar](128) NULL,
	[sql_handle] [varbinary](64)  NULL,
	[plan_handle] [varbinary](64) NULL,
	[timestamp] [datetime] NULL
)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
GO 

/*Stored procedure which does collection and plan change deduction*/

IF NOT EXISTS ( SELECT  TOP 1 1 FROM    sys.procedures WHERE   name = 'CollectQuerystats' )
BEGIN 
EXEC ('CREATE PROCEDURE CollectQuerystats as RETURN')
END 
GO

ALTER PROCEDURE CollectQuerystats
@debug int =0,
@secdecisionfactor float=1.5
as 

DECLARE @log_logrunid int,
@log_queries int;


	SELECT [sql_handle],
       [statement_start_offset],
       [statement_end_offset],
       [plan_generation_num],
       [plan_handle],
       [creation_time],
       [last_execution_time],
       [execution_count],
       [total_worker_time],
       [last_worker_time],
       [min_worker_time],
       [max_worker_time],
       [total_physical_reads],
       [last_physical_reads],
       [min_physical_reads],
       [max_physical_reads],
       [total_logical_writes],
       [last_logical_writes],
       [min_logical_writes],
       [max_logical_writes],
       [total_logical_reads],
       [last_logical_reads],
       [min_logical_reads],
       [max_logical_reads],
       --[total_clr_time],
       --[last_clr_time],
       --[min_clr_time],
       --[max_clr_time],
       [total_elapsed_time],
       [last_elapsed_time],
       [min_elapsed_time],
       [max_elapsed_time],[query_hash],[query_plan_hash],[total_rows],[last_rows],[min_rows],[max_rows],
	   --[statement_sql_handle],[statement_context_id],
	   --[total_dop],[last_dop],[min_dop],[max_dop],
	   
	   --[total_grant_kb],[last_grant_kb],[min_grant_kb],[max_grant_kb],[total_used_grant_kb],[last_used_grant_kb],[min_used_grant_kb],[max_used_grant_kb],[total_ideal_grant_kb],[last_ideal_grant_kb],[min_ideal_grant_kb],[max_ideal_grant_kb],[total_reserved_threads],[last_reserved_threads],[min_reserved_threads],[max_reserved_threads],[total_used_threads],[last_used_threads],[min_used_threads],[max_used_threads],
	   SUBSTRING(st.text, (qs.statement_start_offset/2)+1,   
        ((CASE qs.statement_end_offset  
          WHEN -1 THEN DATALENGTH(st.text)  
         ELSE qs.statement_end_offset  
         END - qs.statement_start_offset)/2) + 1) AS statement_text,
		 b.dbid,
		 ISNULL(b.objectid,0) objectID,
		 [query_hash]+[query_plan_hash] query_plan_fingerprintid,
		 b.query_plan
		    INTO #querystats
		 FROM sys.dm_exec_query_stats qs cross apply 
				  sys.dm_exec_text_query_plan  (plan_handle,statement_start_offset,statement_end_offset) b
				  cross apply
				  sys.dm_exec_sql_text(sql_handle) st
				  where b.dbid >4 --and b.dbid=db_id()

		DELETE FROM #querystats 
		where execution_count=1 and objectID=0

		DELETE FROM #querystats 
		where  objectID=object_ID('CollectQuerystats')

   SET @log_logrunid = (   SELECT ISNULL( MAX( [AL].[log_run_id] ), 0 ) + 1
                                                FROM   [activitylog] AS [AL] );
INSERT INTO [activitylog] ( [log_run_id],
                                                                       [log_timestamp],
                                                                       [log_message] )
                                    VALUES ( @log_logrunid, GETDATE(), 'Querystats capture script started...' );
IF @debug=1
BEGIN
SELECT 'queries'
SELECT DISTINCT [query_hash],[statement_start_offset],[statement_end_offset],[statement_text],[dbid], ISNULL([objectID],0) [objectID]
from #querystats a
WHERE NOT EXISTS  (SELECT [query_hash] from [Queries]  b WHERE  
a.dbid=b.dbid 
and a.objectid=b.objectID 
and a.[query_hash]=b.[query_hash]
and a.[statement_start_offset]=b.[statement_start_offset]
and a.[statement_end_offset]=b.[statement_end_offset]
)
END

INSERT INTO [Queries]
SELECT DISTINCT [query_hash],[statement_start_offset],[statement_end_offset],[statement_text],[dbid], ISNULL([objectID],0) [objectID]
from #querystats a
WHERE NOT EXISTS  (SELECT [query_hash] from [Queries]  b WHERE  
a.dbid=b.dbid 
and a.objectid=b.objectID 
and a.[query_hash]=b.[query_hash]
and a.[statement_start_offset]=b.[statement_start_offset]
and a.[statement_end_offset]=b.[statement_end_offset]
)


   SET @log_queries = @@RowCount;

                         
                    
                                    INSERT INTO [activitylog] ( [log_run_id],
                                                                       [log_timestamp],
                                                                       [log_message] )
                                    VALUES ( @log_logrunid, GETDATE(), 'Querystats captured ' + CONVERT( varchar, @log_queries ) + ' new queries(s)...' );
                             

IF @debug=1
BEGIN
SELECT 'plans'
select distinct b.queryId,a.[Query_Plan_Hash],a.Query_Plan_FingerprintID,a.query_plan from #querystats a join queries b
on a.dbid=b.dbid 
and a.objectid=b.objectID 
and a.[query_hash]=b.[query_hash]
and a.[statement_start_offset]=b.[statement_start_offset]
and a.[statement_end_offset]=b.[statement_end_offset]
WHERE NOT EXISTS  (SELECT [query_plan_fingerprintid] from [Plans]  c 
WHERE  a.[query_plan_fingerprintid]=c.[query_plan_fingerprintid]
and b.queryId=c.QueryID
)
END


INSERT INTO Plans (QueryID,[Query_Plan_Hash],Query_Plan_FingerprintID,sql_plan)
select distinct b.queryId,a.[Query_Plan_Hash],a.Query_Plan_FingerprintID,a.query_plan from #querystats a join queries b
on a.dbid=b.dbid 
and a.objectid=b.objectID 
and a.[query_hash]=b.[query_hash]
and a.[statement_start_offset]=b.[statement_start_offset]
and a.[statement_end_offset]=b.[statement_end_offset]
WHERE NOT EXISTS  (SELECT [query_plan_fingerprintid] from [Plans]  c 
WHERE  a.[query_plan_fingerprintid]=c.[query_plan_fingerprintid]
and b.queryId=c.QueryID
)

 SET @log_queries = @@RowCount;

                         
                    
                                    INSERT INTO [activitylog] ( [log_run_id],
                                                                       [log_timestamp],
                                                                       [log_message] )
                                    VALUES ( @log_logrunid, GETDATE(), 'QueryStats captured ' + CONVERT( varchar, @log_queries ) + ' new plans(s)...' );
                             


   INSERT INTO [intervals] ( [interval_start] ) VALUES ( GETDATE());
   



                            DECLARE @Interval_ID int;
                            SET @Interval_ID = IDENT_CURRENT( '[intervals]' );
						

	INSERT INTO [querystats] ( [query_id],
                                                                      [interval_id],
                                                                      [creation_time],
                                                                      [last_execution_time],
                                                                      [execution_count],
                                                                      [total_elapsed_time],
                                                                      [last_elapsed_time],
                                                                      [min_elapsed_time],
                                                                      [max_elapsed_time],
                                                                      [avg_elapsed_time],
                                                                      [total_rows],
                                                                      [last_rows],
                                                                      [min_rows],
                                                                      [max_rows],
                                                                      [avg_rows],
                                                                      [total_worker_time],
                                                                      [last_worker_time],
                                                                      [min_worker_time],
                                                                      [max_worker_time],
                                                                      [avg_worker_time],
                                                                      [total_physical_reads],
                                                                      [last_physical_reads],
                                                                      [min_physical_reads],
                                                                      [max_physical_reads],
                                                                      [avg_physical_reads],
                                                                      [total_logical_reads],
                                                                      [last_logical_reads],
                                                                      [min_logical_reads],
                                                                      [max_logical_reads],
                                                                      [avg_logical_reads],
                                                                      [total_logical_writes],
                                                                      [last_logical_writes],
                                                                      [min_logical_writes],
                                                                      [max_logical_writes],
                                                                      [avg_logical_writes],
																	  [query_plan_hash] )
                                        SELECT [oqs_q].[queryid],
                                               @Interval_ID,
                                               MIN([qs].[creation_time]),
                                               MAX([qs].[last_execution_time]),
                                               SUM([qs].[execution_count]),
                                               SUM([qs].[total_elapsed_time]),
                                               SUM([qs].[last_elapsed_time]),
                                               MIN([qs].[min_elapsed_time]),
                                               MAX([qs].[max_elapsed_time]),
                                               SUM(ISNULL(( qs.[total_elapsed_time] / NULLIF(( qs.[execution_count] ), 0)), 0 ))     AS [Avg. Time],
                                               SUM([qs].[total_rows]),
                                               SUM([qs].[last_rows]),
                                               MIN([qs].[min_rows]),
                                               MAX([qs].[max_rows]),
											   SUM(ISNULL(( qs.[total_rows] / NULLIF(( qs.[execution_count] ), 0)), 0 ))     AS [avg_rows],
                                               SUM([qs].[total_worker_time]),
                                               SUM([qs].[last_worker_time]),
                                               MIN([qs].[min_worker_time]),
                                               MAX([qs].[max_worker_time]),
											   SUM(ISNULL(( qs.[total_worker_time] / NULLIF(( qs.[execution_count] ), 0)), 0 ))     AS [Avg. Worker Time],
                                               SUM([qs].[total_physical_reads]),
                                               SUM([qs].[last_physical_reads]),
                                               MIN([qs].[min_physical_reads]),
                                               MAX([qs].[max_physical_reads]),
											   SUM(ISNULL(( qs.[total_physical_reads] / NULLIF(( qs.[execution_count] ), 0)), 0 ))     AS [Avg. Phys reads],
                                               SUM([qs].[total_logical_reads]),
                                               SUM([qs].[last_logical_reads]),
                                               MIN([qs].[min_logical_reads]),
                                               MAX([qs].[max_logical_reads]),
											   SUM(ISNULL(( qs.[total_logical_reads] / NULLIF(( qs.[execution_count] ), 0)), 0 ))     AS [Avg. Log reads],
                                               SUM([qs].[total_logical_writes]),
                                               SUM([qs].[last_logical_writes]),
                                               MIN([qs].[min_logical_writes]),
                                               MAX([qs].[max_logical_writes]),
											   SUM(ISNULL(( qs.[total_logical_writes] / NULLIF(( qs.[execution_count] ), 0)), 0 ))     AS [Avg. Log writes],
											   max(qs.[query_plan_hash]) AS		[query_plan_hash]
                                        FROM   [queries]                AS [oqs_q]
                                               INNER JOIN [#querystats] AS [qs] ON 
											   (   [oqs_q].[query_hash] = [qs].[query_hash]
                                                                                             
													AND [oqs_q].[statement_start_offset] = [qs].[statement_start_offset]
                                                    AND [oqs_q].[statement_end_offset] = [qs].[statement_end_offset]
													AND oqs_q.dbid = qs.dbid
													AND oqs_q.objectID =qs.objectID
                                                    )
                                        GROUP BY [oqs_q].[queryid];



										       
                            UPDATE [intervals]
                            SET    [interval_end] = GETDATE()
                            WHERE  [interval_id] = ( SELECT MAX( [interval_id] ) - 1 FROM [intervals] );


IF OBJECT_ID( 'tempdb..#OQS_Runtime_Stats' ) IS NOT NULL DROP TABLE #OQS_Runtime_Stats;
  WITH [CTE_Update_Runtime_Stats] ( [query_id], [interval_id],[creation_time], [execution_count], [total_elapsed_time], [total_rows], [total_worker_time], [total_physical_reads], [total_logical_reads], [total_logical_writes] )
                            AS ( SELECT [QRS].[query_id],
                                        [QRS].[interval_id],
										[QRS].[creation_time],
                                        [QRS].[execution_count],	
                                        [QRS].[total_elapsed_time],
                                        [QRS].[total_rows],
                                        [QRS].[total_worker_time],
                                        [QRS].[total_physical_reads],
                                        [QRS].[total_logical_reads],
                                        [QRS].[total_logical_writes]
                                 FROM   [querystats] AS [QRS]
                                 WHERE  [QRS].[interval_id] = ( SELECT MAX( [interval_id] ) - 1 FROM [intervals] ))
                            SELECT [cte].[query_id],
                                   [qrs].[interval_id],
								   [QRS].[creation_time],
                                   ( [qrs].[execution_count] - [cte].[execution_count] )                                                                                            AS [Delta Exec Count],
                                   ( [qrs].[total_elapsed_time] - [cte].[total_elapsed_time] )                                                                                      AS [Delta Time],
                                   ISNULL((( [qrs].[total_elapsed_time] - [cte].[total_elapsed_time] ) / NULLIF(( [qrs].[execution_count] - [cte].[execution_count] ), 0)), 0 )     AS [Avg. Time],
                                   ( [qrs].[total_rows] - [cte].[total_rows] )                                                                                                      AS [Delta Total Rows],
                                   ISNULL((( [qrs].[total_rows] - [cte].[total_rows] ) / NULLIF(( [qrs].[execution_count] - [cte].[execution_count] ), 0)), 0 )                     AS [Avg. Rows],
                                   ( [qrs].[total_worker_time] - [cte].[total_worker_time] )                                                                                        AS [Delta Total Worker Time],
                                   ISNULL((( [qrs].[total_worker_time] - [cte].[total_worker_time] ) / NULLIF(( [qrs].[execution_count] - [cte].[execution_count] ), 0)), 0 )       AS [Avg. Worker Time],
                                   ( [qrs].[total_physical_reads] - [cte].[total_physical_reads] )                                                                                  AS [Delta Total Phys Reads],
                                   ISNULL((( [qrs].[total_physical_reads] - [cte].[total_physical_reads] ) / NULLIF(( [qrs].[execution_count] - [cte].[execution_count] ), 0)), 0 ) AS [Avg. Phys reads],
                                   ( [qrs].[total_logical_reads] - [cte].[total_logical_reads] )                                                                                    AS [Delta Total Log Reads],
                                   ISNULL((( [qrs].[total_logical_reads] - [cte].[total_logical_reads] ) / NULLIF(( [qrs].[execution_count] - [cte].[execution_count] ), 0)), 0 )   AS [Avg. Log reads],
                                   ( [qrs].[total_logical_writes] - [cte].[total_logical_writes] )                                                                                  AS [Delta Total Log Writes],
                                   ISNULL((( [qrs].[total_logical_writes] - [cte].[total_logical_writes] ) / NULLIF(( [qrs].[execution_count] - [cte].[execution_count] ), 0)), 0 ) AS [Avg. Log writes]
                            INTO   [#OQS_Runtime_Stats]
                            FROM   [CTE_Update_Runtime_Stats]             AS [cte]
                                   INNER JOIN [querystats] AS [qrs] ON [cte].[query_id] = [qrs].[query_id]
								   and [QRS].[creation_time]=[cte].creation_time
                            WHERE  [qrs].[interval_id] = ( SELECT MAX( [interval_id] ) FROM [intervals] );

							  UPDATE [qrs]
                            SET    [qrs].[execution_count] = [tqrs].[Delta Exec Count],
                                   [qrs].[total_elapsed_time] = [tqrs].[Delta Time],
                                   [qrs].[avg_elapsed_time] = [tqrs].[Avg. Time],
                                   [qrs].[total_rows] = [tqrs].[Delta Total Rows],
                                   [qrs].[avg_rows] = [tqrs].[Avg. Rows],
                                   [qrs].[total_worker_time] = [tqrs].[Delta Total Worker Time],
                                   [qrs].[avg_worker_time] = [tqrs].[Avg. Worker Time],
                                   [qrs].[total_physical_reads] = [tqrs].[Delta Total Phys Reads],
                                   [qrs].[avg_physical_reads] = [tqrs].[Avg. Phys reads],
                                   [qrs].[total_logical_reads] = [tqrs].[Delta Total Log Reads],
                                   [qrs].[avg_logical_reads] = [tqrs].[Avg. Log reads],
                                   [qrs].[total_logical_writes] = [tqrs].[Delta Total Log Writes],
                                   [qrs].[avg_logical_writes] = [tqrs].[Avg. Log writes]
                            FROM   [querystats]     AS [qrs]
                                   INNER JOIN [#OQS_Runtime_Stats] AS [tqrs] ON (   [qrs].[interval_id] = [tqrs].[interval_id]
                                                                                    AND [qrs].[query_id] = [tqrs].[query_id] );

;WITH Compare_CTE AS			(

										SELECT [QRS].[query_id],
                                        [QRS].[interval_id],								
                                        [QRS].[avg_elapsed_time],
                                        [QRS].[avg_worker_time],
                                        [QRS].[avg_physical_reads],
                                        [QRS].[avg_logical_reads],
                                        [QRS].[avg_logical_writes],
										[qrs].query_plan_hash
                                 FROM   [querystats] AS [QRS]
                                 WHERE  [QRS].[interval_id] = ( SELECT MAX( [interval_id] ) - 1 FROM [intervals] )
								 --and qrs.execution_count >2
								 )

					SELECT 
					cte.query_id,cte.interval_id,cte.avg_elapsed_time,
					cte.avg_worker_time,cte.avg_physical_reads,
					cte.avg_logical_reads,cte.avg_logical_writes,cte.query_plan_hash,
					qrs.query_id cquery_id,qrs.interval_id cinterval_id,
					qrs.avg_elapsed_time cavg_elapsed_time,
					qrs.avg_worker_time cavg_worker_time,
					qrs.avg_physical_reads cavg_physical_reads,
					qrs.avg_logical_reads cavg_logical_reads,
					qrs.avg_logical_writes cavg_logical_writes,
					qrs.query_plan_hash cquery_plan_hash
					
					into #PlanChanges
					FROM Compare_CTE             AS [cte]
					INNER JOIN [querystats] AS [qrs] ON [cte].[query_id] = [qrs].[query_id]
                    WHERE  [qrs].[interval_id] = ( SELECT MAX( [interval_id] ) FROM [intervals] )
					--and qrs.execution_count >2
					and ( [QRS].[avg_elapsed_time] > @secdecisionfactor*cte.avg_elapsed_time or qrs.avg_worker_time >
					@secdecisionfactor*cte.[avg_worker_time] or qrs.avg_logical_reads >@secdecisionfactor*cte.avg_logical_reads or
					 qrs.avg_physical_reads > @secdecisionfactor*cte.avg_physical_reads)
					and qrs.query_plan_hash <>cte.query_plan_hash 
							
			
			DECLARE @Timest datetime = GETDATE()

			SELECT 
				q.queryId,
				q.query_hash,
				q.statement_start_offset,
				q.statement_end_offset,
				q.statement_text,
				q.dbid,
				q.objectID,
				p.PlanID,
				p.Query_Plan_Hash,
				p.sql_plan,
				p.plan_foundfirst,
				s.interval_id,s.avg_elapsed_time,
				s.avg_worker_time,s.avg_physical_reads,
				s.avg_logical_reads,s.avg_logical_writes
			 INTO #PlanChangeDetected from queries q
			JOIN plans p 
			on q.queryId=p.QueryID
--			drop table #PlanChangeDetected
			JOIN 
			
			(			

				SELECT 		cte.query_id,cte.interval_id,cte.avg_elapsed_time,
					cte.avg_worker_time,cte.avg_physical_reads,
					cte.avg_logical_reads,cte.avg_logical_writes,cte.query_plan_hash from #PlanChanges cte

				UNION 
						SELECT 		 cquery_id, cinterval_id,
					 cavg_elapsed_time,
					 cavg_worker_time,
					 cavg_physical_reads,
					 cavg_logical_reads,
					 cavg_logical_writes,
					 cquery_plan_hash
					 from #PlanChanges qrs
			 ) s
			 on q.queryId=s.query_id
			 and p.Query_Plan_Hash=s.query_plan_hash



			 INSERT INTO PlanChangeDetected
			 select pd.*,db_name(pd.dbid) dbname, OBJECT_NAME(pd.objectID,pd.dbid) [object_Name],
			 qs.sql_handle,qs.plan_handle,
			 @Timest as [timestamp]
			 FROM #PlanChangeDetected PD 
						LEFT JOIN sys.dm_exec_query_stats qs ON
						 PD.query_hash=qs.query_hash 
						 and pd.Query_Plan_Hash=qs.query_plan_hash
						 and pd.statement_start_offset=qs.statement_start_offset
						 and pd.statement_end_offset=qs.statement_end_offset

--IF EXISTS ( SELECT top 1 1 from #PlanChangeDetected)
--BEGIN
--SELECT *, 
--case when charindex('<ParameterList>', sql_plan) >0 then 
--CAST (
		  
--		  substring(sql_plan,
		  
--		  (charindex('<ParameterList>', sql_plan) + len('<ParameterList>')),
		  
--		  (
		  

--		  (charindex('</ParameterList>', sql_plan))-(charindex('<ParameterList>', sql_plan) + len('<ParameterList>'))
		  
--		  )
--		  ) 
		  
--		  as XML)  end as CompliledParams
--		  ,CAST((SELECT statement_text FOR XML PATH('')) as XML) StatementTextXml

--FROM PlanChangeDetected WHERE [timestamp]=@Timest

--DROP TABLE #PlanChangeDetected
--END

            It does following action.

  • Creation of stored procedure and permanent tables listed in objects section

Script designed to run from [DBADB] database. This can be changed to other DBA specific database if exists in the environment.

2) DBA to create a SQL agent job to execute DBADB.dbo.CollectQuerystats stored procedure at different intervals. This monitoring is considerably heavy weight. The job frequency varies for each system and depends on load on the system. Recommended to run every 4 hours. If the system don’t allow to run monitoring during business hours, run at off hours specifically few hours’ after update statistics job completes.

Troubleshoot SQL CPU spikes

In this blog, I am covering an approach to troubleshoot and find SQL queries when there is spike in Server CPU.

There would be scenario where the CPU on server was pegged at 100% for approx. 10 minutes yesterday at 4:00pm. In this case how do you find out what sql was running against the database at that particular time of yesterday?. If you able to find those queries were running at that specific time with other run time details, it helps to fine tune the system

Here is the solution

Many of you know that SQL server supports WMI alerts as part Alerting module. We are going to take advantage of WMI alert when there is CPU spike at the Windows server level.

With help of WMI alert, we can trigger a SQL agent job when CPU usage greater certain threshold. We have an custom stored proc (dbo.sp_Get_CPU_ConsumingSQL_Snapshot) which called in SQL agent job which does data collection of current running SQLs, text, query plan and number of thread etc along with CPUtime.

Sometimes there would be momentary spike for short time, those cases our WMI alert triggers which result into data collection. We mayn’t need to capture data during momentary spike.

Hence another logic added in our script to do cpu usage calculation with in SQL server again, and if it above threshold proceed to data collection.

There are functions in SQL server to find CPU activity. We are using in our script to calculate CPU.

The functions are

@@CPU_BUSY – Time that SQL Server has spent in active operation since its latest start.This value is cumulative for all CPUs.
@@IDLE – Returns the time that SQL Server has been idle since it was last started

Select @CPU_Busy = @@CPU_BUSY,
@Idle = @@IDLE
-- Delay for 1 second
WaitFor Delay '000:00:01'
-- Get current CPU counts and calculate % CPU utilization over the last 1 second
Set @CPUPercent = Cast(Round((@@CPU_BUSY - @CPU_Busy)/((@@IDLE - @Idle + @@CPU_BUSY - @CPU_Busy) * 1.0) * 100, 2) As decimal(5, 2))

@@CPU_BUSY , @@IDLE values are accurate up to only 49 days from last restart. It was mentioned in MSDN note.

https://docs.microsoft.com/en-us/sql/t-sql/functions/idle-transact-sql?view=sql-server-2017

Hence, when you calculate CPU activity using @@CPU_BUSY and @@IDLE you will get arithmetic overflow error.
As a workaround, when its 49 days after system restart, we can retrieve CPU usage from ring buffers and proceed our logic. The ring buffers provides last 256 minutes for CPU utilization information. This logic mentioned many of articles

DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks)FROM sys.dm_os_sys_info); 
SELECT TOP(1) @CPUPercent=SQLProcessUtilization
FROM (
SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
AS [SystemIdle],
record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]',
'int')
AS [SQLProcessUtilization], [timestamp]
FROM (
SELECT [timestamp], CONVERT(xml, record) AS [record]
FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
AND record LIKE N'%%') AS x
) AS y
ORDER BY record_id DESC OPTION (RECOMPILE)

So, next step is when CPU usage is above threshold, pull a data from DMVs currently running SQLs and their execution metrics.

SELECT  TOP 25 
qs.Session_Id as SessionID,
qs.CPU_Time as CPU_TimeMS,
DBName = db_name(qs.database_id),
ObjectName = object_name(qt.objectid, qs.database_id),
QueryText = SUBSTRING(qt.text, qs.statement_start_offset/2,
(Case When qs.statement_end_offset = -1
Then len(convert(nvarchar(max), qt.text)) * 2
Else qs.statement_end_offset
End - qs.statement_start_offset)/2),
qs.Reads,
qs.Logical_reads as LogicalReads,
qs.writes as Writes,
sp.login_time as LoginTime ,
qs.start_time as StartTime,
GETDATE() as CollectionTime,
(select max(isnull(ot.exec_context_id, 0))
From sys.dm_os_tasks ot where ot.session_id=qs.session_ID )+1
as NumberOfworkers
,@CPUPercent as CPUPercent
,qt.Text
,sp.host_name
,sp.program_name
,sp.login_name
,qs.last_wait_type
,qs.wait_type
,qs.wait_time
,qs.wait_resource
,qp.query_plan
FROM sys.dm_exec_requests qs
JOIN sys.dm_exec_sessions sp on qs.session_id = sp.session_id
Cross Apply sys.dm_exec_sql_text(qs.sql_handle) as qt
cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp
WHERE qs.session_id <>@@SPID
ORDER BY qs.CPU_Time Desc, qs.logical_reads Desc;

The result data from above script to be stored in permanent table dbo.tblCPUconsumingQueries. Suggested to keep this table DBA specific database.

The data can be purged after certain day intervals. I usually purge older than a week data. This can be customized as per your requirement.

You can edit the CPU threshold value in in the WMI alert below highlighted
EXEC msdb.dbo.sp_add_alert @name=@TEXT, 
 @message_id=0, 
 @severity=0, 
 @enabled=1, 
 @delay_between_responses=0, 
 @include_event_description_in=0, 
 @category_name=N'[Uncategorized]', 
 @wmi_namespace=@wmi_namespace, 
 @wmi_query=N'select * from __instancemodificationevent within 10 where targetinstance isa ''Win32_Processor'' and targetinstance.LoadPercentage > 65', 
 @job_name=N'Infra_Capture_High_CPU_Queries'
 END 

Here is the solution flow

Objects Involved:

Object Name  Object TYPE
sp_Get_CPU_ConsumingSQL_Snapshot Stored Procedure
Infra_Capture_High_CPU_Queries  SQL agent job
Respond to High CPU WMI Alert
tblCPUconsumingQueries   Permanent Table

The complete setup script has provided here

In my script I created stored proc in master database. Feel free take the code and test and modify accordingly for your environment. if you change the database for stored procedure, you need to update in agent job as well as stored proc called in agent job

use master
 GO
 /Creating a stored proc to check CPU busy and capture currently running SQL to a table. Change the @Threshold value in Stored proc input parameter. Default @Threshold value set as 50 percent. I.e In the event of CPU alert fire, this stored proc again checks how busy CPU is and if > 65 then capture currently running SQL/
 IF NOT EXISTS ( SELECT TOP 1 1 FROM sys.procedures where name = 'sp_Get_CPU_ConsumingSQL_Snapshot')
 BEGIN 
  EXEC ( 'CREATE PROCEDURE sp_Get_CPU_ConsumingSQL_Snapshot as RETURN')
 END 
 GO
 ALTER PROCEDURE [dbo].[sp_Get_CPU_ConsumingSQL_Snapshot]
 @Threshold int = 30
 as
 DECLARE @CPU_Busy INT,
 @Idle INT,
 @CPUPercent decimal(5, 2) 
 -- Get current CPU counts
 --For systems which not restarted for long time
 IF exists ( SELECT top 1 1 FROM sys.dm_os_sys_info where sqlserver_start_time < getdate()-49 ) BEGIN DECLARE @ts_now bigint = (SELECT cpu_ticks/(cpu_ticks/ms_ticks)FROM sys.dm_os_sys_info);  SELECT TOP(1) @CPUPercent=SQLProcessUtilization               FROM (        SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,              record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')              AS [SystemIdle],              record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]',              'int')              AS [SQLProcessUtilization], [timestamp]        FROM (              SELECT [timestamp], CONVERT(xml, record) AS [record]              FROM sys.dm_os_ring_buffers              WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'              AND record LIKE N'%%') AS x 
       ) AS y 
 ORDER BY record_id DESC OPTION (RECOMPILE)
 END
 ELSE
 BEGIN
 Select @CPU_Busy = @@CPU_BUSY,
 @Idle = @@IDLE
 -- Delay for 1 second
 WaitFor Delay '000:00:01'
 -- Get current CPU counts and calculate % CPU utilization over the last 1 second
 Set @CPUPercent = Cast(Round((@@CPU_BUSY - @CPU_Busy)/((@@IDLE - @Idle + @@CPU_BUSY - @CPU_Busy) * 1.0) * 100, 2) As decimal(5, 2))
 END
 If @CPUPercent >= @Threshold
 BEGIN
 If object_id('dbo.tblCPUconsumingQueries') Is Null
 BegiN
 CREATE TABLE dbo.tblCPUconsumingQueries(
 ID int identity(1, 1) not null primary key,
 SessionID INT,
 CPUTimeMs Bigint not null,
 DBName sysname null,
 ObjectName sysname null,
 QueryText nvarchar(max) null,
 Reads bigint null,
 LogicalReads bigint null,
 Writes bigint null,
 LoginTime datetime,
 StartTime datetime,
 CollectionTime datetime,
 NumberOfworkers int,
 CPUPercent decimal(5, 2),
 Text nvarchar(max),
 HostName nvarchar(120),
 ProgramName nvarchar(120),
 LoginName nvarchar(128),
 LastWaitType nvarchar(60),
 WaitType nvarchar(60),
 WaitTime int,
 WaitResource nvarchar(256),
 QueryPlan xml
 )
 END
 ---Capture Top 25 queries by CPU time
 Insert Into dbo.tblCPUconsumingQueries 
 SELECT  TOP 25 
 qs.Session_Id as SessionID,
 qs.CPU_Time as CPU_TimeMS,
 DBName = db_name(qs.database_id),
 ObjectName = object_name(qt.objectid, qs.database_id),
 QueryText = SUBSTRING(qt.text, qs.statement_start_offset/2,
 (Case When qs.statement_end_offset = -1
 Then len(convert(nvarchar(max), qt.text)) * 2
 Else qs.statement_end_offset
 End - qs.statement_start_offset)/2),
 qs.Reads,
 qs.Logical_reads as LogicalReads,
 qs.writes as Writes,
 sp.login_time as LoginTime ,
 qs.start_time as StartTime,
 GETDATE() as CollectionTime,
 (select max(isnull(ot.exec_context_id, 0))  
 From sys.dm_os_tasks ot where ot.session_id=qs.session_ID )+1  
 as NumberOfworkers 
 ,@CPUPercent as CPUPercent
 ,qt.Text
 ,sp.host_name
 ,sp.program_name
 ,sp.login_name
 ,qs.last_wait_type
 ,qs.wait_type
 ,qs.wait_time
 ,qs.wait_resource
 ,qp.query_plan
 FROM sys.dm_exec_requests qs
 JOIN sys.dm_exec_sessions sp on qs.session_id = sp.session_id
 Cross Apply sys.dm_exec_sql_text(qs.sql_handle) as qt
 cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp
 WHERE qs.session_id <>@@SPID
 ORDER BY qs.CPU_Time Desc, qs.logical_reads Desc;
 DELETE FROM tblCPUconsumingQueries 
 WHERE CollectionTime < GETDATE()-7
 END
 GO
 GO
 USE [msdb]
 GO
 IF NOT EXISTS (SELECT name FROM sysjobs WHERE name = 'Infra_Capture_High_CPU_Queries')
 BEGIN
 DECLARE @jobId BINARY(16)
 DECLARE @schedule_id INT
 EXEC  msdb.dbo.sp_add_job @job_name=N'Infra_Capture_High_CPU_Queries', 
     @enabled=1, 
         @notify_level_eventlog=0, 
         @notify_level_email=3, 
         @notify_level_netsend=0, 
         @notify_level_page=0, 
         @delete_level=0, 
         @description=N'Fire on High CPU', 
         @category_name=N'[Uncategorized (Local)]', 
         @owner_login_name=N'sa', 
         @job_id = @jobId OUTPUT
 EXEC msdb.dbo.sp_add_jobserver @job_name=N'Infra_Capture_High_CPU_Queries', @server_name = N'(local)'
 END
 IF NOT EXISTS (SELECT J.name,JS.step_name FROM sysjobs J INNER JOIN sysjobsteps JS ON J.job_id=JS.job_id WHERE J.name='Infra_Capture_High_CPU_Queries' AND JS.step_name='Infra_Capture_High_CPU_Queries')
 BEGIN
 EXEC msdb.dbo.sp_add_jobstep @job_name=N'Infra_Capture_High_CPU_Queries', @step_name=N'Infra_Capture_High_CPU_Queries', 
         @step_id=1, 
         @cmdexec_success_code=0, 
         @on_success_action=1, 
         @on_fail_action=2, 
         @retry_attempts=0, 
         @retry_interval=0, 
         @os_run_priority=0, @subsystem=N'TSQL', 
         @command=N'EXEC sp_Get_CPU_ConsumingSQL_Snapshot', 
         @database_name=N'master', 
         @flags=0
 END 
 GO 
 USE [msdb]
 GO
 IF EXISTS(SELECT name FROM msdb..sysalerts WHERE name='Respond to High CPU') 
 BEGIN
 EXEC dbo.sp_delete_alert
 @name = N'Respond to High CPU'
 END
 DECLARE @wmi_namespace varchar(max)
 DECLARE @instancename varchar(1000)
 declare @server nvarchar(50)
 DECLARE @productversion varchar(1000)
 DECLARE @TEXT VARCHAR(1000)
 SET @TEXT = 'Respond to High CPU'
 SELECT @server=convert(nvarchar(50),(SERVERPROPERTY('ComputerNamePhysicalNetBIOS')))
 SET @wmi_namespace=N'\'+@server+'\root\CIMV2'
 /*"LoadPercentage" value defines threshold for CPU alert. Here hardcoded to report if cpu usage > 65 percent continously for 20 seocnds */
 IF NOT EXISTS(SELECT name FROM msdb..sysalerts WHERE name=@TEXT) 
 BEGIN
 EXEC msdb.dbo.sp_add_alert @name=@TEXT, 
 @message_id=0, 
 @severity=0, 
 @enabled=1, 
 @delay_between_responses=0, 
 @include_event_description_in=0, 
 @category_name=N'[Uncategorized]', 
 @wmi_namespace=@wmi_namespace, 
 @wmi_query=N'select * from __instancemodificationevent within 20 where targetinstance isa ''Win32_Processor'' and targetinstance.LoadPercentage > 65', 
 @job_name=N'Infra_Capture_High_CPU_Queries'
 END 
 GO
Design a site like this with WordPress.com
Get started