Categories
Database Development

How to avoid tempdb spill

I have several tempdb spills in queries which are currently serving business purposes. I am told by online searching that tempdb spills should never be ignored as they can cause performance problems. It’s still a bit unclear to me as to why that is. Since I have also read that SQL Server by itself will request additional memory for a query if it notices its estimation was wrong for a certain operator. It would seem logical then, that there would only be a performance problem when SQL Server is actually out of memory, which would happen regardless as the memory requirement and available memory would not change if only the estimation is changed to be correct.

Do tempdb spills mean that the SQL Server needs more memory added to it or is there some other reason why tempdb spills are bad that I am not accounting for? I do notice performance drops for queries where the size of the tempdb spills is large, so I am assuming it is a problem which will only get worse as data is added.

To avoid tempdb spills I see the following mentioned online:

  1. Make sure statistics are up to date.
  2. Build indexes in a correct manner.
  3. Adjust query if possible.

I believe I’ve done this. Though I am definitely happy to be proven wrong. It is also worth noting that I also receive tempdb spills when converting said queries into parameterized queries.

I’d really like insights into how a sort operator, and more specifically a tempdb spill can be completely avoided. Either by table design or any other possibility I can’t think of. I am finding very little info about this online apart from the rules above which I have already applied. Even a prominent book on SQL Server performance only mentions the word spill twice. Is there any good resource on this which really goes into depth?

This is the sort operator on the production server:

enter image description here

Below you will find the query I am trying to execute which leads to a sort operator, which does not spill locally with 390 thousand Attribute records, but does with 5 million Attribute records. However, the execution plan has a sort operator in it. This sort operator is what I am trying to remove due to it spilling on the production server. How can I remove the sort operator or avoid the tempdb spill it causes?

Execute this script if you wish to follow along with an example. Otherwise skip it.

USE [master]
GO

DROP DATABASE IF EXISTS [TempDbSpill]
GO

CREATE DATABASE [TempDbSpill]
GO

USE [TempDbSpill]

CREATE TABLE [Product] (
    [Id] UNIQUEIDENTIFIER NOT NULL
)

CREATE TABLE [AttributeType] (
    [Id] UNIQUEIDENTIFIER NOT NULL,
    [Description] NVARCHAR(MAX) NOT NULL
)

CREATE TABLE [AttributeValue] (
    [Id] UNIQUEIDENTIFIER NOT NULL,
    [Value] NVARCHAR(MAX) NOT NULL
)

CREATE TABLE [Attribute] (
    [Id] UNIQUEIDENTIFIER NOT NULL,
    [AttributeTypeId] UNIQUEIDENTIFIER NOT NULL,
    [ProductId] UNIQUEIDENTIFIER NOT NULL,
    [ValueId] UNIQUEIDENTIFIER NULL
)

INSERT INTO
    [Product] ([Id])
SELECT
    NEWID()
FROM
    INFORMATION_SCHEMA.COLUMNS AS [a]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [b]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [c]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [d]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [e]

INSERT INTO
    [AttributeType] ([Id], [Description])
SELECT
    NEWID(), CONCAT(NEWID(), NEWID(), NEWID(), NEWID())
FROM
    INFORMATION_SCHEMA.COLUMNS AS [a]
INSERT INTO
    [AttributeType] ([Id], [Description])
SELECT
    '960BE057-EB5B-4746-9A96-0806723433E9', 'Description'

INSERT INTO
    [AttributeValue] ([Id], [Value])
SELECT
    NEWID(), CONCAT(NEWID(), NEWID(), NEWID(), NEWID())
FROM
    INFORMATION_SCHEMA.COLUMNS AS [a]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [b]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [c]
CROSS JOIN
    INFORMATION_SCHEMA.COLUMNS AS [d]
INSERT INTO
    [AttributeValue] ([Id], [Value])
SELECT
    '98082f59-2cbf-439a-a0c5-b3f56aa8d71a', 'Value'

INSERT INTO
    [Attribute] ([Id], [AttributeTypeId], [ProductId], [ValueId])
SELECT
    NEWID(), [ChecksumAttribute].[AttributeTypeId], [Product].[Id], [ChecksumAttribute].[AttributeValueId]
FROM
    [Product]
JOIN (
    SELECT
        [AttributeType].[Id] AS [AttributeTypeId]
        ,[AttributeValue].[Id] AS [AttributeValueId]
        ,ABS(CHECKSUM(CONCAT([AttributeType].[Id], [AttributeValue].[Id]))) % 10000 AS [Checksum]
    FROM
        [AttributeType]
    CROSS JOIN
        [AttributeValue]
) AS [ChecksumAttribute] ON ABS(CHECKSUM([Product].[Id])) % 10000 = [ChecksumAttribute].[Checksum]


ALTER TABLE [Product] ADD CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED ([Id])
ALTER TABLE [Attribute] ADD CONSTRAINT [PK_Attribute] PRIMARY KEY CLUSTERED ([Id])
ALTER TABLE [AttributeValue] ADD CONSTRAINT [PK_AttributeValue] PRIMARY KEY CLUSTERED ([Id])

Below is the query demonstrating the problem.

SELECT
    [GroupByResult].[AttributeTypeId]
    ,[AttributeValue].[Value]
    ,[GroupByResult].[Count]
FROM (
    SELECT
        [AttributeTypeId]
        ,[ValueId]
        ,COUNT(*) AS [Count]
    FROM
        [Attribute]
    WHERE
        [ProductId] IN 
        (
            SELECT
                [Id]
            FROM
                [Product]
            WHERE
                EXISTS (
                    SELECT
                        1
                    FROM
                        [Attribute]
                    JOIN
                        [AttributeValue] ON [Attribute].[ValueId] = [AttributeValue].[Id]
                    WHERE
                        [Product].[Id] = [Attribute].[ProductId] AND
                        ([Attribute].[AttributeTypeId] = '960BE057-EB5B-4746-9A96-0806723433E9') AND [Attribute].[ValueId] IN ('98082f59-2cbf-439a-a0c5-b3f56aa8d71a'))
        )
    GROUP BY
        [AttributeTypeId]
        ,[ValueId]
) AS [GroupByResult]
JOIN
    [AttributeValue] ON [GroupByResult].[ValueId] = [AttributeValue].[Id]
OPTION (MAXDOP 1, RECOMPILE)

As expected, clearly seen from the execution plan, an index is recommended.

enter image description here

CREATE NONCLUSTERED INDEX [IX_Attribute_SuggestedIndex1] ON [dbo].[Attribute] ([AttributeTypeId],[ValueId]) INCLUDE ([ProductId])

However after applying this index the sort operator remains. A second execution leads to another index being recommended.

enter image description here

CREATE NONCLUSTERED INDEX [IX_Attribute_SuggestedIndex2] ON [dbo].[Attribute] ([ProductId]) INCLUDE ([AttributeTypeId],[ValueId])

However after applying this index, again no luck.

enter image description here

I’ve pretty much tried every combination of indexes possible. I’ve tried making AttributeTypeId, ProductId, ValueId the primary key, to no avail.

Additionally, below is another query with a sort operator which spills on the production server.

SELECT TOP 1001
    *
FROM
    [Product]
WHERE
    EXISTS (
        SELECT
            1
        FROM
            [Attribute]
        WHERE
            [Product].[Id] = [Attribute].[ProductId] AND
            [Attribute].[AttributeTypeId] = '960BE057-EB5B-4746-9A96-0806723433E9' AND
            [Attribute].[ValueId] IN ('98082f59-2cbf-439a-a0c5-b3f56aa8d71a'))
OPTION (MAXDOP 1, RECOMPILE)

Again, I want to understand how I can avoid a tempdb spill by any means necessary.

Leave a Reply

Your email address will not be published. Required fields are marked *