Problem Description / Requirement
The customer has developed an application that he himself has developed and swears by the use of user-defined functions. The developers argue that user-defined functions create reusable code that makes work easier.
A calculated attribute [sales_volume] is used in the [dbo].[customers] table. Users complain that the search for individual customers is satisfactory, but reports that have to list several customers sometimes take more than 30 seconds to display the data.
Challenge
Analyze the situation. Check the runtime behavior and implement the resolution practices that seem best to you to solve the problem. You may use any techniques that you know.
- The table [dbo].[customers] must continue to have a calculated attribute [sales_volume].
- The parameters used to calculate [sales_volume] must not be changed!
The business is not satisfied with the performance when using the User Defined Function. They complain that even searching for a single customer takes a long time.
Solution
Step 1: Analyze the Execution Plan
Every analysis of the performance of a query starts with an execution plan. This makes it very easy to identify the problem in this case. The first query is looking for only ONE customer with an INDEX SEEK. The INDEX SEEK is quick but the COMPUTE SCALAR takes more than 500ms. The second query is looking for 91 customers – also with an INDEX SEEK. But look to the performance of the COMPUTE SCALAR operation!
DECLARE @c_custkey BIGINT = 10;
/* Search pattern for one customer using the primary key */
SELECT * FROM [dbo].[customers]
WHERE c_custkey = @c_custkey
OPTION (RECOMPILE);
GO

/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE @to_c_custkey BIGINT = 100;
SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION (RECOMPILE);
GO

Step 2: Figure out what User Defined Function is used in the table
To find out how the value of [sales_volume] is calculated, simply double-click on the attribute to open the properties window. The function used is called [dbo].[cf_customer_sales_volume].
Step 3: Analyze the User Defined Function
CREATE OR ALTER FUNCTION [dbo].[cf_customer_sales_volume]
(
@o_custkey BIGINT,
@sales_year INT
)
RETURNS MONEY
AS
BEGIN
DECLARE @result_sales MONEY;
DECLARE @result_table TABLE (sales_volume MONEY DEFAULT (0));
INSERT INTO @result_table (sales_volume)
SELECT SUM (o_totalprice)
FROM dbo.orders
WHERE o_custkey = @o_custkey
AND YEAR(o_orderdate) = @sales_year
IF EXISTS (SELECT * FROM @result_table)
SELECT @result_sales = sales_volume
FROM @result_table;
ELSE
SET @result_sales = 0;
RETURN @result_sales;
END
GO
The user-defined function has several weaknesses that significantly affect performance:
Line 10: A table variable is used that must write its data to TEMPDB
Line 16: The transformation of the attribute [o_orderdate] leads to NONSargable behavior and an index cannot be used efficiently
Line 18: The previously created table variable is queried for checking and value output (2 accesses!)
Step 4: Optimize the User Defined Function
The user defined function must remain a scalar function if the calculated attribute [sales_volume] must be retained in the table [dbo].[customers]. An inline function cannot help in this specific case. The user defined function must be rewritten in such a way that the problems mentioned above are eliminated.
CREATE OR ALTER FUNCTION [dbo].[cf_customer_sales_volume]
(
@o_custkey BIGINT,
@sales_year INT
)
RETURNS MONEY
AS
BEGIN
DECLARE @result_sales MONEY = 0;
SELECT @result_sales = ISNULL(SUM (o_totalprice), 0)
FROM dbo.orders
WHERE o_custkey = @o_custkey
AND o_orderdate >= DATEFROMPARTS(@sales_year, 1, 1)
AND o_orderdate < DATEFROMPARTS(@sales_year + 1, 1, 1)
RETURN @result_sales;
END
GO
I don’t like User Defined Functions as calculated columns in tables; if you have no choice, they should be as efficient as possible. With the rewritten function, the cost of the query is reduced by about 50%. But that was still too slow for me.
/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE @to_c_custkey BIGINT = 100;
SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION (RECOMPILE);
GO

Step 5: Optimize the query inside the User Defined Function
19 seconds for reading ~100 customer records is way to long although the NON Sargable argument has been removed. The response times only allow one conclusion: No efficient index is used for the search!
DECLARE @o_custkey BIGINT = 10;
DECLARE @sales_year INT = 2023;
SELECT SUM (o_totalprice)
FROM dbo.orders
WHERE o_custkey = @o_custkey
AND o_orderdate >= DATEFROMPARTS(@sales_year, 1, 1)
AND o_orderdate < DATEFROMPARTS(@sales_year + 1, 1, 1)
OPTION (RECOMPILE, QUERYTRACEON 9130);
GO

It is obvious that a necessary index for optimal access to the required data is not available. Once the appropriate index has been implemented, the query will exhibit satisfactory runtime behavior.
/* Index on o_custkey and o_orderdate */
CREATE NONCLUSTERED INDEX nix_orders_o_custkey_o_orderdate
ON dbo.orders
(
o_custkey,
o_orderdate
)
INCLUDE (o_totalprice)
WITH (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO

Before / After comparison
Before
After
/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE @to_c_custkey BIGINT = 100;
SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION (RECOMPILE);
GO
/* Search pattern for multiple customers using a range */
DECLARE @from_c_custkey BIGINT = 10;
DECLARE @to_c_custkey BIGINT = 100;
SELECT * FROM [dbo].[customers]
WHERE c_custkey BETWEEN @from_c_custkey AND @to_c_custkey
OPTION (RECOMPILE);
GO


Special thank for their comments / Email with possible solutions goes to
Henrik Staun Poulsen, Damir Matešić, Art and Calin