SQL Challenge #4 – Solution

Problem Description / Requirement

The customer has developed an application that he himself has developed and swears by the use of user-defined functions. The developers argue that user-defined functions create reusable code that makes work easier.

A calculated attribute [sales_volume] is used in the [dbo].[customers] table. Users complain that the search for individual customers is satisfactory, but reports that have to list several customers sometimes take more than 30 seconds to display the data.

Challenge

Analyze the situation. Check the runtime behavior and implement the resolution practices that seem best to you to solve the problem. You may use any techniques that you know.

  • The table [dbo].[customers] must continue to have a calculated attribute [sales_volume].
  • The parameters used to calculate [sales_volume] must not be changed!

The business is not satisfied with the performance when using the User Defined Function. They complain that even searching for a single customer takes a long time.

Solution

Step 1: Analyze the Execution Plan

Every analysis of the performance of a query starts with an execution plan. This makes it very easy to identify the problem in this case. The first query is looking for only ONE customer with an INDEX SEEK. The INDEX SEEK is quick but the COMPUTE SCALAR takes more than 500ms. The second query is looking for 91 customers – also with an INDEX SEEK. But look to the performance of the COMPUTE SCALAR operation!

DECLARE	@c_custkey BIGINT = 10;

/* Search pattern for one customer using the primary key */
SELECT * FROM [dbo].[customers]
WHERE c_custkey = @c_custkey
OPTION	(RECOMPILE);
GO
The depiction shows the costs for the search of a single record wth 500ms for the calculated value.
/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE	@to_c_custkey BIGINT = 100;

SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION	(RECOMPILE);
GO
The depiction shows the costs for the search of a 91 records wth 44,5 seconds for the calculated value.

Step 2: Figure out what User Defined Function is used in the table

To find out how the value of [sales_volume] is calculated, simply double-click on the attribute to open the properties window. The function used is called [dbo].[cf_customer_sales_volume].

Step 3: Analyze the User Defined Function

CREATE OR ALTER FUNCTION [dbo].[cf_customer_sales_volume]
(
	@o_custkey BIGINT,
	@sales_year INT
)
RETURNS MONEY
AS
BEGIN
	DECLARE	@result_sales	MONEY;
	DECLARE @result_table TABLE (sales_volume MONEY DEFAULT (0));

	INSERT INTO @result_table (sales_volume)
	SELECT	SUM (o_totalprice)
	FROM	dbo.orders
	WHERE	o_custkey = @o_custkey
			AND YEAR(o_orderdate) = @sales_year

	IF EXISTS (SELECT * FROM @result_table)
		SELECT	@result_sales = sales_volume
		FROM	@result_table;
	ELSE
		SET	@result_sales = 0;

	RETURN @result_sales;
END
GO

The user-defined function has several weaknesses that significantly affect performance:

Line 10: A table variable is used that must write its data to TEMPDB
Line 16: The transformation of the attribute [o_orderdate] leads to NONSargable behavior and an index cannot be used efficiently
Line 18: The previously created table variable is queried for checking and value output (2 accesses!)

Step 4: Optimize the User Defined Function

The user defined function must remain a scalar function if the calculated attribute [sales_volume] must be retained in the table [dbo].[customers]. An inline function cannot help in this specific case. The user defined function must be rewritten in such a way that the problems mentioned above are eliminated.

CREATE OR ALTER FUNCTION [dbo].[cf_customer_sales_volume]
(
	@o_custkey BIGINT,
	@sales_year INT
)
RETURNS MONEY
AS
BEGIN
	DECLARE	@result_sales	MONEY = 0;
	
	SELECT	@result_sales = ISNULL(SUM (o_totalprice), 0)
	FROM	dbo.orders
	WHERE	o_custkey = @o_custkey
			AND o_orderdate >= DATEFROMPARTS(@sales_year, 1, 1)
			AND o_orderdate < DATEFROMPARTS(@sales_year + 1, 1, 1)

	RETURN @result_sales;
END
GO

I don’t like User Defined Functions as calculated columns in tables; if you have no choice, they should be as efficient as possible. With the rewritten function, the cost of the query is reduced by about 50%. But that was still too slow for me.

/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE	@to_c_custkey BIGINT = 100;

SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION	(RECOMPILE);
GO
Execution plan with a performant INDEX SEED and a COMPUTE SCALAR Operator with less than half of the previous execution time.
Cost reduced by 50%

Step 5: Optimize the query inside the User Defined Function

19 seconds for reading ~100 customer records is way to long although the NON Sargable argument has been removed. The response times only allow one conclusion: No efficient index is used for the search!

DECLARE	@o_custkey BIGINT = 10;
DECLARE @sales_year INT = 2023;

SELECT	SUM (o_totalprice)
FROM	dbo.orders
WHERE	o_custkey = @o_custkey
		AND o_orderdate >= DATEFROMPARTS(@sales_year, 1, 1)
		AND o_orderdate < DATEFROMPARTS(@sales_year + 1, 1, 1)
OPTION	(RECOMPILE, QUERYTRACEON 9130);
GO
The execution plan shows a full scan on the clustered index with a filter function to get only data from one dedicated customer.
Full Scan with Pushdown Filter

It is obvious that a necessary index for optimal access to the required data is not available. Once the appropriate index has been implemented, the query will exhibit satisfactory runtime behavior.

/* Index on o_custkey and o_orderdate */
CREATE NONCLUSTERED INDEX nix_orders_o_custkey_o_orderdate
ON dbo.orders
(
	o_custkey,
	o_orderdate
)
INCLUDE (o_totalprice)
WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

Before / After comparison

Before

After

/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE	@to_c_custkey BIGINT = 100;

SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION	(RECOMPILE);
GO
/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE	@to_c_custkey BIGINT = 100;

SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION	(RECOMPILE);
GO

Special thank for their comments / Email with possible solutions goes to

Henrik Staun Poulsen, Damir Matešić, Art and Calin

Associated Workshops