Challenge – Aggregations in Partitioned Tables
The customer operates a 45 TB data warehouse that is updated 24/7 with a frequence of 5 minutes. Partitioning is used on the biggest table to increase efficiency in loading processes. Before the loading starts the loading process must evaluate the last value from the attribute [o_orderkey] from the table [dbo].[orders]:
- This request is causing long runtimes and massive resources (CPU / Memory) when determining the maximum value.
- As impact of the long running evaluation process the timeframe for the load sometimes exceed.
- You are asked to find out the root cause of the long runtime and find an adequate solution to the problem.
Business Condition
The business regularly imports new data into the table at short intervals and needs to update existing data. Therefore, they use the SWITCH function to detach existing partitions from the table to edit them outside the table. Using a NON-aligned index cannot be used for the existing problem because NON-aligned indexes do not support SWITCH() and MERGE().
Query to be optimized
SELECT MAX(o_orderkey) FROM dbo.orders;
Table 'orders'. Scan count 5, logical reads 71828
Table 'Worktable'. Scan count 0, logical reads 0
SQL Server Execution Times:
CPU time = 1250 ms, elapsed time = 707 ms.
Execution Plan

Solution
OK, I admit. This challenge is tough, because it is not a problem that can be easily solved with an index. The problem for this SQL Challenge is partitioning. For partitioned tables, partition elimination is performed before applying predicates. Although the primary key for the table [dbo].[orders] is set to [o_orderkey], [o_orderdate], SQL Server must completely search each partition. This must be done because it is not clear in which partition the maximum value is present.
The query itself does not have the partition key as a criteria so all partitions must be scanned. Paul White and Izik Ben Gan describes this “lack of optimization” in detail:
Author | Blog Post |
Paul White | https://www.sql.kiwi/2013/07/aggregates-and-partitioning/ |
Izik Ben Gan | https://www.itprotoday.com/early-versions/max-and-min-aggregates-against-partitioned-tables |
To optimize the query we cannot help with an additional indexe for two reasons:
- A NON-aligned index prevent the business to switch partitions out for updating data
- An additional index will behave the same way (SCAN) than the Primary Key. There is a small benefit if only the key attribute o_orderkey is part of the index but the root cause cannot be solved with an additional index!
/*
The trick is to do the aggregation by partition
and evaluate the aggregation at the end!
*/
SELECT MAX(part_agg.o_orderkey) AS max_o_orderkey
FROM sys.partitions AS p
CROSS APPLY
(
SELECT MAX(o_orderkey) AS o_orderkey
FROM dbo.orders
WHERE $PARTITION.pf_o_orderdate(o_orderdate) = p.partition_number
) AS part_agg;
GO
Original Query
SQL Server Execution Times:
CPU time = 1548 ms, elapsed time = 863 ms.
Optimized Query
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
Solution by Seba
In the comments section of the SQL Challenge #2, the user Seba suggested using a columnstore index to solve the problem. Thank you very much for the suggestion!
I also tested this variant as an alternative and came to the following conclusion:
- Due to the nature of a columnstore index (no sorting), a HASH operator must be used for aggregation. The HASH operator is a STOP operator and requires additional memory.
- The columnstore index will slow down the ETL processes, as UPDATE operations always initiate a DELETE -> INSERT process. (-)
- The columnstore index is implemented as an aligned index and does not prevent the SWITCH operations (+)
- The columnstore index does not prevent all partitions from being scanned. (-)
- The columnstore index is significantly faster compared to the original query. However, this is more due to compression than to efficient access to the data
Demo
/* Create the columnstore index */
CREATE NONCLUSTERED COLUMNSTORE INDEX nccs_orders_o_orderkey
ON dbo.orders (o_orderkey)
ON ps_o_orderdate (o_orderdate);
GO
/* as aligned - partitioned index */
SELECT partition_number,
data_compression_desc,
rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.orders')
AND index_id = 3;
GO
/* run the query to test the performance */
SELECT MAX(o_orderkey) FROM dbo.orders;
GO
IO and CPU statistics
Table 'orders'. Scan count 2, logical reads 0
Table 'orders'. Segment reads 11, segment skipped 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 53 ms.
Execution Plan

Thank you for reading.