Index Health and Data Loads in SQL Server

Most databases have maintenance plans running that monitor the health of the database, including indexes, and perform maintenance tasks as needed. But, what happens when data loads are sufficient enough to fragment the indexes and degrade performance during the data load? Is there a method to measure index health, and rebuild or reorganize the indexes, at strategic points in the data load process dynamically so that only troubled indexes are rebuilt or reorganized?

A web search will reveal a wealth of blogs, white papers and technical documentation on querying system tables within SQL Server to determine index status and health. Even better, many provide information on when to reorganize or rebuild an index. As with most online examples, you will likely find an 80%, sometimes more, solution. This example is a compilation of my research and works well for our data loads.

After monitoring large data loads and identifying points at which the loads begin to slow, it was obvious that the data loads could benefit from index maintenance at strategic points during the data load. Dimensional databases may benefit from index maintenance on the DIM tables prior to loading FACT tables, or on FACT tables at some point during the data load. The DataVault design pattern may benefit from index maintenance on HUB tables after their load completes and prior to loading HUB Satellite and Link tables. Likewise, maintenance on Link table indexes after their load completes and prior to their Satellite tables can also improve load performance. The stored procedure can be parameterized to receive a partial table name (prefix) to control which tables are considered. For example: ‘DIM%’, ‘FACT%’, ‘HUB%’, ‘LNK%’ and ‘SAT%’.

Through research, you will find generally accepted thresholds and strategies that work best as a starting point. Once implemented, you may also modify these settings and measure their effect on performance. Generally, you reorganize an index at 5% fragmentation and rebuild an index at 30% fragmentation. Many articles mention other considerations that can be drivers from the same set of system tables, which I have not incorporated. If you attempt to reorganize or rebuild an index in SQL Server, and based on other factors SQL Server doesn’t calculate that the action will be beneficial, SQL Server will abort the action. You may run the stored procedure and find that some indexes are still fragmented. This is because SQL Server does not believe the action to be beneficial for that index.

We are going to examine and accomplish a few things, in one stored procedure, that performs the maintenance only if the index fragmentation meets the threshold to reorganize or rebuild.

First, we will set our thresholds.

CREATE PROCEDURE [dbo].[sp_RebuildIndexes]

AS

DECLARE @Schema VARCHAR(130)
DECLARE @Table VARCHAR(130)
DECLARE @Index VARCHAR(130)
DECLARE @PctFragmented float
DECLARE @PartitionNumber bigint
DECLARE @PartitionCount bigint
DECLARE @v_FrgReOrgLmt float
DECLARE @v_FrgReBuildLmt float
DECLARE @v_CMD VARCHAR(max)
DECLARE @IndexType VARCHAR(130)

— ***** Set Reorg and Rebuild Thresholds *****
SET @v_FrgReOrgLmt = 5.00
SET @v_FrgReBuildLmt = 30.00

Then we need to identify indexes that were disabled during the data load and rebuild those indexes. This isn’t common, and you can remove that block of code if you do not disable indexes, but it is included if you need it. This information is best pulled from a group of system tables identified in the query.

— ***** Rebuild disabled indexes *****DECLARE DisabledIndexes CURSOR FOR
SELECT dbschemas.[name] AS ‘Schema’,

dbtables.[name] AS ‘Table’,
sysindexes.[name] AS ‘Index’,
sysindexes.[type_desc] AS ‘IndexType’

FROM sys.indexes sysindexes
INNER JOIN sys.tables dbtables on dbtables.[object_id] = sysindexes.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
WHERE dbschemas.name != ‘dbo’ — Assumes you don’t want to rebuild indexes in the dbo schema
AND sysindexes.is_disabled = 1
ORDER BY sysindexes.[type_desc]

OPEN DisabledIndexes

FETCH NEXT FROM DisabledIndexes INTO @Schema, @Table, @Index, @IndexType

WHILE @@FETCH_STATUS = 0
BEGIN

SET @v_CMD = ‘ALTER INDEX ‘ + ‘[‘ + @Index + ‘]’ + ‘ ON [‘ + RTRIM(LTRIM(@Schema)) + ‘].[‘ + RTRIM(LTRIM(@Table)) + ‘] REBUILD ‘

EXEC (@v_CMD);

PRINT N’Disabled Executed Rebuild: ‘ + @v_CMD;

FETCH NEXT FROM DisabledIndexes INTO @Schema, @Table, @Index, @IndexType

END

CLOSE DisabledIndexes

DEALLOCATE DisabledIndexes

 

The next step would be to identify indexes that require maintenance. I do this by getting a list of all indexes and reorganize or rebuild them conditionally. Again, the information is available in a group of system tables identified by the query. This process also takes partitioned tables into consideration, which are handled slightly different because of their partitions.
We want to rebuild the index if 1) the rebuild limit is met, or 2) the reorg limit is met and the table is partitioned. We cannot reorganize a since partition, so we revert to a rebuild of just that partition.

We want to reorganize the index if 1) the reorg limit is met, or 2) the table is not partitioned. Statistics are not calculated when an index is reorganized, so an Update Statistics command must follow the reorganize command.

— ***** Rebuild or Reorganize indexes if necessary *****DECLARE AllIndexes CURSOR FOR
SELECT dbschemas.[name] as ‘Schema’,

dbtables.[name] as ‘Table’,
dbindexes.[name] as ‘Index’,
indexstats.avg_fragmentation_in_percent as ‘PctFragmented’,
indexstats.partition_number AS ‘PartitionNumber’,
partitioncount.partition_number AS ‘PartitionCount’

FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id] AND indexstats.index_id = dbindexes.index_id
INNER JOIN (SELECT object_id, partition_number, index_id

FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) a
WHERE partition_number =
(SELECT max(partition_number) AS ‘PartitionCount’
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) b
WHERE b.object_id = a.object_id
AND b.index_id = a.index_id) ) AS partitioncount ON partitioncount.[object_id] = indexstats.[object_id]
AND indexstats.index_id = partitioncount.index_id

WHERE indexstats.database_id = DB_ID()
AND dbindexes.[name] IS NOT NULL
ORDER BY indexstats.avg_fragmentation_in_percent desc, dbschemas.[name], dbtables.[name], dbindexes.[name]

OPEN AllIndexes

FETCH NEXT FROM AllIndexes INTO @Schema, @Table, @Index, @PctFragmented, @PartitionNumber, @PartitionCount
WHILE @@FETCH_STATUS = 0
BEGIN

— Rebuild the index if: 1) Rebuild limit is met; 2) Reorg limit is met and the table is partitioned.
IF (@PctFragmented >= @v_FrgReBuildLmt) OR
( @PctFragmented >= @v_FrgReOrgLmt and @PartitionCount > 1) –30
BEGIN

SET @v_CMD = ‘ALTER INDEX ‘ + ‘[‘ + @Index + ‘] ON [‘ + RTRIM(LTRIM(@Schema)) + ‘].[‘ + RTRIM(LTRIM(@Table)) + ‘] REBUILD ‘

IF(@PartitionCount > 1)

BEGIN
SET @v_CMD = @v_CMD + ‘Partition = ‘ + RTRIM(LTRIM(@PartitionNumber))
END

EXEC (@v_CMD);

PRINT N’Executed REBUILD: ‘ + @v_CMD;

END

— Reorganize the index if: 1) Reorg limit is met.
2) Table is not partitioned
IF (@PctFragmented < @v_FrgReBuildLmt and @PctFragmented >= @v_FrgReOrgLmt

and @PartitionCount = 1 ) –5
BEGIN

SET @v_CMD = ‘ALTER INDEX ‘ + ‘[‘ + @Index + ‘] ON [‘ + RTRIM(LTRIM(@Schema)) + ‘].[‘ + RTRIM(LTRIM(@Table)) + ‘] REORGANIZE ‘

EXEC (@v_CMD)

PRINT N’Executed REORGANIZE: ‘ + @v_CMD;

— Update statistics for the reorganized index
SET @v_CMD = ‘UPDATE STATISTICS [‘ + RTRIM(LTRIM(@Schema)) + ‘].[‘ +
RTRIM(LTRIM(@Table)) + ‘] ‘ + ‘[‘ + @Index + ‘] ‘

EXEC (@v_CMD)

PRINT N’Executed CALC STATS: ‘ + @v_CMD;

END

FETCH NEXT FROM AllIndexes INTO @Schema, @Table, @Index, @PctFragmented, @PartitionNumber, @PartitionCount

END

CLOSE AllIndexes

DEALLOCATE AllIndexes

GO

 

Other, and possibly more common, uses for this strategy may be to execute the stored procedure prior to beginning the data load process to ensure that indexes are healthy, and again after the data load process to ensure that the indexes are again healthy, so that reporting solutions that pull data from this data source execute as efficiently as possible.

Although this is just one possible solution, it provides an additional approach to index challenges during and after a large data load.



« Back to blog