SQL Server Indexes With Unused INCLUDE Columns

2022-09-03 1 Comments

SQL Server Nonclustered Indexes and INCLUDE Columns

Periodic index analysis for SQL Server typically involves tasks such as checking for missing/unused/overlapping indexes, checking for heaps that maybe should have been designed with a clustered index, analyzing ROW/PAGE compression, etc. There are numerous DMVs that you can use as a starting point for those tasks. There's also some good open source tools and scripts that members of the SQL Server community have created and shared. One task that I don't recall ever seeing was an analysis of INCLUDE columns for nonclustered indexes. What I really wanted to do was to find INCLUDE columns that were never being used, and remove them from index definitions.

So I started searching. I found an old thread from a website forum where someone had the same idea. Tara Kizer posted a #SQLHelp question on Twitter and there was just one response:

@TaraKizer There is nothing that tracks index usage to that degree. You'd have to parse the query plans using the index. #sqlhelp
— Robert L Davis (@SQLSoldier) May 13, 2014

That was more than eight years ago. Maybe someone has come up with scripts that implement Robert's suggestion since then? Or maybe there's a different solution? Who knows? So I went to Twitter and asked again. The thread wasn't getting much attention. But about a day later, Jonathan Kehayias (blog | Twitter) joined the discussion. He graciously emailed me a query that shredded query plan XML into a result set of index names and index columns referenced by a query plan. With this, I had a path to meet my goal.

Here is the base query, which parses query plans in the plan cache:

;WITH XMLNAMESPACES
	(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT DISTINCT
	idx.value('(Object/@Database)[1]', 'VARCHAR(128)') AS DatabaseName,
	idx.value('(Object/@Schema)[1]', 'VARCHAR(128)') AS SchemaName,
	idx.value('(Object/@Table)[1]', 'VARCHAR(128)') AS TableName,
	idx.value('(Object/@Index)[1]', 'VARCHAR(128)') AS IndexName,
	c.value('(@Column)[1]', 'VARCHAR(128)') AS ColumnName
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS p
CROSS APPLY p.query_plan.nodes('//IndexScan') AS i(idx)
CROSS APPLY idx.nodes('.//ColumnReference') AS x(c)

After testing out the query, a few things became apparent to me:

The column names returned by the query don't indicate whether they are index key columns or INCLUDE columns. But we can cross-reference other DMVs to make that determination.
The XML parsing code/logic could also be used to examine query plans from Query Store.
In my environment, there are a lot of cross-database queries. I'd need to look at query plans in Query Store for every database where it's on/enabled.

I put together a plan and wrote some scripts that involved querying and parsing the plan cache and every Query Store-enabled database. It worked, but performance was miserable (it was measured in days). So I changed my design and tried again. This time I found success in less than an hour on an instance with ~50 Query Store-enabled databases. I was so encouraged with the results, I created a Github repository with the scripts, which are as follows:

01 Create, Populate Table.sql: Creates a "permanent" table within [tempdb] and populates it with meta data for all nonclustered index INCLUDE columns.
02 Create Table.sql: Creates a "permanent" table to hold all query plans.
03 Load Query Plans from Plan Cache.sql: Loads query plans from plan cache into a table.
04 Load Query Plans from Query Store.sql: Loads query plans from Query Store in every database (if enabled/on) into a table.
05 Parse Query Plans.sql: Parses the table of query plans, and updates a table when INCLUDE columns are found.
06 Results.sql: Queries that help analyze the results.

Personal Results and Findings

On the primary SQL Server instance where I ran the scripts, there were a lot of unused INCLUDE columns for indexes on "dormant" tables. These are tables that showed zero [user_seeks], zero [user_scans], zero [user_lookups], and zero [user_updates] in [sys].[dm_db_index_usage_stats] for the table itself and for all of its indexes. Of course, when you see a table name like [dbo].[OrderLines bak 2013-10-25], you have a pretty good idea why no one is using it. Outside of the dormant tables, there were many indexes I found that were simply not in use. Instead of modifying them, I disabled or dropped them. As for the rest, I was surprised to realize the vast majority of the INCLUDE columns were in use. Frankly, I was a little disappointed--I was hoping to "thin the herd", speed up I/O operations, and improve memory usage.

But I did find some indexes that were candidates for modification. A few large tables in particular had indexes I modified to remove the gratuitous INCLUDE columns. A total of four indexes each shrank in size by a GB or two. And a fifth index on another large table was downright egregious. Its size was reduced from 34GB down to 16GB.

Overall, I am pleased with the results. I do have some other SQL instances where I can perform the analysis and perhaps find some candidates for optimization. Going forward I'll plan to run the scripts periodically as my environments have evolving workloads. Perhaps they can help you too.

Happy index tuning!

Comments

9/3/2022 6:09:53 PM

Glad you got the help to make this very helpful idea get published. I am working with it on SQL2019

Chris Wood