Challenge – When the marketing department calls
The marketing department is planning a private event for its 10 TOP customers. The marketing department provides the following guidelines for customer selection:
- Only customers with at least 3 orders in the last quarter of 2022 may be considered
- The 10 customers with the most orders in the last quarter of 2022 should be considered
- If more customers had the same number of orders in the last quarter of 2022, they should also be invited – regardless of sales.
- For the total amount of sales, only the last 3 orders from 2022 may be taken into account.
Solution
When you are challenged with multiple kind of requirements, you shall it break down into as small parts as possible. This allows you to concentrate exclusively on the relevant sub-area.
Only Customers with at least 3 orders in the last quarter of 2022
It is intended to be an event for the company’s top 10 customers. The selection criteria is the number of orders, regardless of the order value. This creates a conflict for those customers who – as can be seen in the example – also have 38 orders, but have made less revenue. The marketing department wants to avoid this and has set an additional condition that – if there are several customers with the same number of orders – these customers must also be invited.
SELECT TOP (10)
o_custkey,
COUNT_BIG(*) AS num_orders
FROM dbo.orders
WHERE o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
GROUP BY
o_custkey
HAVING COUNT_BIG(*) >= 3
ORDER BY
COUNT_BIG(*) DESC
SELECT TOP (10) WITH TIES
o_custkey,
COUNT_BIG(*) AS num_orders
FROM dbo.orders
WHERE o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
GROUP BY
o_custkey
HAVING COUNT_BIG(*) >= 3
ORDER BY
COUNT_BIG(*) DESC
This solution approach takes into account that only the 10 customers with the most orders are selected. However, this solution does not allow for customers with the same number of orders to also be listed. The WITH TIES
option in a SELECT
statement, when used with TOP (n)
, ensures that if there are additional rows that have the same values as the last row in the result set according to the ORDER BY
clause, those rows are included in the final output as well, potentially resulting in more than n
rows being returned.
Without Ties | With Ties |
---|---|
![]() | ![]() |
Only the last 3 orders are considered for the aggregation
This part of the challenge presents us with bigger problems. It should be made clear that there are different approaches to a solution! A clear hint in the challenge was that this query is frequently updated in a dashboard. It has to be performant!
;WITH l
AS
(
SELECT ROW_NUMBER() OVER
(
PARTITION BY o_custkey
ORDER BY o_orderdate DESC
) AS rn,
o_custkey,
o_totalprice
FROM dbo.orders
WHERE o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
)
SELECT o_custkey,
o_totalprice
FROM l
WHERE o_custkey = 1184546
AND rn <= 3;
GO
SELECT TOP (3)
o_custkey,
o_totalprice
FROM dbo.orders
WHERE o_custkey = 1184546
AND o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
ORDER BY
o_orderdate DES
SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 243 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
SQL Server Execution Times:
CPU time = 437 ms, elapsed time = 224 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Both solutions behave almost identically (for ONE customer) in their query behavior. Because we are not usin any indexes at this time, this behavior should not be surprising. But even with a suitable index, there are no significant differences in runtime behavior.
/* Creation of an nonclustered index covering the predicates and the output */
CREATE NONCLUSTERED INDEX nix_orders_o_custkey
ON dbo.orders
(
o_custkey,
o_orderdate DESC
)
INCLUDE (o_totalprice)
WITH
(
DATA_COMPRESSION = PAGE,
SORT_IN_TEMPDB = ON
);
GO
Let’s assemble the elements
There is hardly a measurable difference in the performance of solutions. Nevertheless, the solution you choose is crucial for further processing.
The variant using TOP (n) cannot be used in a JOIN, as the number of records to be delivered is always determined depending on the current customer number. In a JOIN, it is not possible to use attributes of the outer table in the subquery. Only the APPLY operator has this option.
;WITH cl
AS
(
/*
All customers with more than 3 orders
in the last quarter
*/
SELECT TOP (10) WITH TIES
o_custkey,
COUNT_BIG(*) AS num_orders
FROM dbo.orders
WHERE o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
GROUP BY
o_custkey
HAVING COUNT_BIG(*) >= 3
ORDER BY
COUNT_BIG(*) DESC
)
SELECT c.c_custkey,
c.c_name,
cl.num_orders,
SUM(lo.o_totalprice) AS total_amount
FROM dbo.customers AS c
INNER JOIN cl ON (c.c_custkey = cl.o_custkey)
INNER JOIN
(
/* All orders from each of the TOP n customers */
SELECT ROW_NUMBER() OVER (PARTITION BY o.o_custkey ORDER BY o.o_orderdate DESC) AS rn,
o.o_custkey,
o.o_totalprice
FROM dbo.orders AS o
WHERE o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
) AS lo
ON (c.c_custkey = lo.o_custkey)
WHERE lo.rn <= 3
GROUP BY
c.c_custkey,
c.c_name,
cl.num_orders
ORDER BY
cl.num_orders DESC,
SUM(lo.o_totalprice) DESC;
GO
;WITH cl
AS
(
/*
All customers with more than 3 orders
in the last quarter
*/
SELECT TOP (10) WITH TIES
o_custkey,
COUNT_BIG(*) AS num_orders
FROM dbo.orders
WHERE o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
GROUP BY
o_custkey
HAVING COUNT_BIG(*) >= 3
ORDER BY
COUNT_BIG(*) DESC
)
SELECT c.c_custkey,
c.c_name,
cl.num_orders,
SUM(lo.o_totalprice) AS total_amount
FROM dbo.customers AS c
INNER JOIN cl
ON (c.c_custkey = cl.o_custkey)
CROSS APPLY
(
/* The last 3 orders */
SELECT TOP (3)
o.o_totalprice
FROM dbo.orders AS o
WHERE o.o_custkey = c.c_custkey
AND o_orderdate >= '2022-10-01'
AND o_orderdate <= '2022-12-31'
ORDER BY
o_orderdate DESC
) AS lo
GROUP BY
c.c_custkey,
c.c_name,
cl.num_orders
ORDER BY
cl.num_orders DESC,
SUM(lo.o_totalprice) DESC;
GO
JOIN variant | APPLY variant |
---|---|
SQL Server Execution Times: | SQL Server Execution Times: |
![]() | ![]() |
Let’s make it perfect
Finally, let’s implement indexes to access the tables as efficiently as possible.
/* Placing a clustered index by adding the PRIMARY KEY to dbo.customers */
ALTER TABLE dbo.customers
ADD CONSTRAINT pk_customers PRIMARY KEY CLUSTERED (c_custkey)
WITH
(
DATA_COMPRESSION = PAGE,
SORT_IN_TEMPDB = ON
);
GO
/* This index is for the optimization of the search for the TOP (10) customers */
CREATE NONCLUSTERED INDEX nix_orders_o_orderdate
ON dbo.orders
(o_orderdate)
INCLUDE (o_custkey)
WITH
(
DATA_COMPRESSION = PAGE,
SORT_IN_TEMPDB = ON
);
GO
SQL Server Execution Times:
CPU time = 1078 ms, elapsed time = 575 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 484 ms, elapsed time = 263 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Thank you for reading.