Problem Description / Requirement

The customer operates a 45 TB data warehouse that is updated 24/7 with a frequence of 5 minutes. Partitioning is used on the biggest table to increase efficiency in loading processes. Before the loading starts the loading process must evaluate the last value from the attribute [o_orderkey] from the table [dbo].[orders].

This request is causing long runtimes and massive resources (CPU / Memory) when determining the maximum value.
As impact of the long running evaluation process the timeframe for the load sometimes exceed.
You are asked to find out the root cause of the long runtime and find an adequate solution to the problem.

The developers get the last value from o_orderkey with the following statement:
SELECT MAX(o_orderkey) FROM dbo.orders;

The time analysis returns the following information:

SQL Server Execution Times:
    CPU time = 1359 ms, elapsed time = 793 ms.
SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 14 ms.

Parallel execution plan for recieving the MAX value on an indexed column

The query is very expensive because it is parallelized. Parallelization is unusual when it comes to determining an aggregated value from the values of an index.

Challenge

Analyse the execution plan and check the applied index on the table [dbo].[orders].

Explain why the – simple – query takes to much time for getting the max value.
Create a solution for the customer.

The solution must recieve the max value from o_orderkey in less than 10 ms

Note

The customer is using SWITCH(), MERGE() and TRUNCATE() on partitions.
The DBA of the customer is running Maintenance on the tables by partitions because they have a small window for maintenance (Indexes / Statistics).

Your solution must not affect the customer’s processes!

Technical information

When creating the challenge, one table will be created. The table has a Primary Key on [o_orderkey] and [o_orderdate].

The table is partitioned by [o_orderdate]. The partition boundaries were defined per year.

Table Name	Number of Rows	Storage Space
dbo.orders	8.165.704	559 MB

How to start the challenge

When you are connected to the SQL Server instance…

EXECUTE sqlchallenge.dbo.create_challenge 'list';	will give you a list of open challenges
EXECUTE sqlchallenge.dbo.create_challenge 'name_of_challenge';	will create the challenge for you

After the creation just get into the database for your challenge

Solution

The problem described in this SQL Challenge is a common scenario when you work with partitioned tables and must run aggregation over attributes.

SQL Challenge #2

Connect to Microsoft SQL Server

SQL Server:	sqlchallenges.germanywestcentral.cloudapp.azure.com
Port:	1433
Version:	16.0.4165.4
Edition:	Developer Edition
Authentication Type:	SQL Server Authentication
Login Name:	sqlchallenge
Password:	sqlchallenge#2025!
Encryption	Mandatory
Trust Server Certificate:	True

Associated Workshops

Workshop for Partitioning in Microsoft SQL Server

In this comprehensive workshop, we will delve deep into the principles, strategies, and practical implementation of data partitioning in SQL Server. Whether you’re a seasoned database administrator, developer, or data analyst, this workshop will equip you with the knowledge and hands-on experience needed to harness the benefits of partitioning for your database environments.

4 Comments

Seba on 2025-01-28 at 14:47

maybe columstore index can help
- Uwe Ricken on 2025-01-28 at 17:35
  
  Hey Seba – good approach but it will not work efficiently. Only the compression can optimize (a little bit).
  The real problem stays existent – the full scan.
  We want an efficient index use on the PK (o_orderkey).
  
  It’s difficult. I will take your suggestion into to the solution.
  
  Thank you for your valued feedback, Uwe
  - Seba on 2025-01-29 at 12:54
    
    CREATE NONCLUSTERED columnstore INDEX IX_orders_o_orderkey_datexxx2
    ON dbo.orders ( o_orderdate , o_orderkey )
    WITH (maxdop =1)
    ON ps_o_orderdate(o_orderdate);
    
    set statistics io, time on
    select max(o_orderkey) fROM dbo.orders
    
    (1 row affected)
    Table ‘orders’. Scan count 2, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 4009, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table ‘orders’. Segment reads 11, segment skipped 0.
    
    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 12 ms.
    
    not great not terrible 😉
Henrik Staun Poulsen on 2025-02-06 at 13:53

This runs a bit quicker:

DROP TABLE IF EXISTS #p
CREATE TABLE #p (PartitionNumber INT NOT NULL PRIMARY KEY CLUSTERED) WITH (DATA_COMPRESSION=page)

INSERT INTO #p(PartitionNumber)
SELECT p.Partition_Number
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id = p.object_id
WHERE o.name=’orders’
AND p.rows>0

SELECT ISNULL(MAX(o_orderkey),0)
FROM (
SELECT * FROM (
SELECT * FROM #p p
) t1
CROSS APPLY (
SELECT ISNULL(MAX(o_orderkey),0) AS o_orderkey
FROM dbo.orders o
WHERE $PARTITION.pf_o_orderdate(o.o_orderdate) = t1.PartitionNumber
) AS o
) AS t2

SQL Challenge #2 – Aggregations in Partitioned Tables