Problem Description / Requirement
The customer has developed an application that he himself has developed and swears by the use of user-defined functions. The developers argue that user-defined functions create reusable code that makes work easier.
A calculated attribute [sales_volume] is used in the [dbo].[customers] table. Users complain that the search for individual customers is satisfactory, but reports that have to list several customers sometimes take more than 30 seconds to display the data.
The customer was advised to equip the server with more CPUs so that queries can be executed more quickly. Initial tests have shown, however, that despite faster CPUs, performance is not noticeably improved.
They are now turning to you to solve the problem.
The basis for the analysis are the following SQL statements, which you have been able to evaluate:
/* Search pattern for searching for a customer using the primary key */ SELECT * FROM [dbo].[customers] WHERE c_custkey = @c_custkey;
/* Search pattern for list of customers in a reporting application */ SELECT * FROM [dbo].[customers] WHERE c_custkey BETWEEN @c_custkey_start AND @c_custkey_end;
Challenge
Analyze the situation. Check the runtime behavior and implement the resolution practices that seem best to you to solve the problem. You may use any techniques that you know.
Condition
- The table [dbo].[customers] must continue to have a calculated attribute [sales_volume].
- The parameters used to calculate [sales_volume] must not be changed!
Category
User Defined Functions
Level 300
Technical Information
Two tables are created in the challenge database that are relevant for this task:
Table Name | Number of Rows | Storage Space |
dbo.customers | 1.607.957 | 219 MB |
dbo.orders | 2.275.781 | 296 MB |
Both tables have a clustered primary key and several nonclustered indexes. The tables use a FOREIGN KEY to enforce referential integrity.
How to start the challenge
When you are connected to the SQL Server instance…
EXECUTE sqlchallenge.dbo.create_challenge 'list'; |
will give you a list of open challenges |
EXECUTE sqlchallenge.dbo.create_challenge 'name_of_challenge'; |
will create the challenge for you |
After the creation just get into the database for your challenge
customer expectations
The customer expects queries across the customer base of up to 10,000 customer records to be executed in less than 1 second.
Solution
I hope you had fun with this challenge. You want to know one possible solution. I helped the customer with this solution:
Connect to Microsoft SQL Server
SQL Server: | sqlchallenges.germanywestcentral.cloudapp.azure.com |
Port: | 1433 |
Version: |
16.0.4165.4
|
Edition: | Developer Edition |
Authentication Type: | SQL Server Authentication |
Login Name: | sqlchallenge |
Password: | sqlchallenge#2025! |
Encryption | Mandatory |
Trust Server Certificate: | True |
Improve Your Skills as Query Tuner for SQL Server
During this workshop, you’ll gain the expertise and confidence needed to diagnose and fine-tune slow-performing queries, resulting in improved database responsiveness and end-user satisfaction. Whether you’re an experienced database administrator, developer, or data analyst, this workshop will equip you with the knowledge and practical skills to tackle the most challenging query optimization tasks.
Improve Your Skills as a Database Developer for SQL Server
In this workshop, we will cover a wide range of topics and techniques that are essential for modern database developers working with SQL Server. From fundamental concepts to advanced best practices, you’ll gain the knowledge and practical experience needed to excel in your role.
Here is my KISS solution (it works in dev!):
CREATE NONCLUSTERED INDEX nix_orders_o_custkey_o_totalprice
ON dbo.orders(o_custkey, o_orderdate) INCLUDE (o_totalprice)
GO
ALTER FUNCTION [dbo].[cf_customer_sales_volume]
(
@o_custkey BIGINT,
@sales_year INT
)
RETURNS MONEY
AS
BEGIN
DECLARE @result_sales MONEY;
DECLARE @year_lower_limit date = CAST(CAST(@sales_year AS CHAR(4)) AS DATE)
DECLARE @year_upper_limit date = CAST(CAST(@sales_year + 1 AS CHAR(4)) AS DATE)
SELECT @result_sales = SUM (o_totalprice)
FROM dbo.orders
WHERE o_custkey = @o_custkey
AND (o_orderdate >= @year_lower_limit AND o_orderdate < @year_upper_limit)
RETURN @result_sales;
END
GO
Since posting, I have learned about DATEFROMPARTS function which should be used instead of casting.
I have learned this from relevant video by Erik Darling
https://erikdarling.com/better-filtering-with-year-and-month-functions-in-sql-server/
I’m not sure if I understand your condition:
The parameters used to calculate [sales_volume] must not be changed!
But here is my suggestion:
A trigger on dbo.Orders.
Brute force, so any update on a customer, requires a read of all the orders for this customer.
So it can be made faster, if I dared try to figure out all the combinations of updates, deletes, inserts.
alter TABLE dbo.customers DROP COLUMN sales_volume
–alter TABLE dbo.customers add [sales_volume] AS ([dbo].[cf_customer_sales_volume]([c_custkey],(2023)))
alter TABLE dbo.customers ADD sales_volume MONEY NOT NULL CONSTRAINT cust_SalesVolume DEFAULT (0)
go
CREATE OR ALTER TRIGGER dbo.SalesVolume ON dbo.Orders
AFTER INSERT, UPDATE, DELETE AS
BEGIN
SET XACT_ABORT, NOCOUNT ON
UPDATE c
SET sales_volume = ISNULL(o.o_totalprice,0)
FROM dbo.customers c
INNER JOIN (
SELECT o.o_custkey, SUM (o_totalprice) AS o_totalprice
FROM dbo.orders o
WHERE YEAR(o.o_orderdate) = 2023
GROUP BY o.o_custkey
) o ON o.o_custkey = c.c_custkey
WHERE EXISTS(SELECT * FROM inserted i where i.o_custkey = c.c_custkey)
OR
EXISTS(SELECT * FROM deleted d where d.o_custkey = c.c_custkey)
END
UPDATE orders SET o_orderdate=o_orderdate WHERE 1=1
Assuming you are NOT allowed to change the queries, the answer is an indexed view (I know, I know, I am just opening a new can of worms here), but let’s test this scenario in AdventureWorks:
CREATE FUNCTION dbo.CalculateTotalDueForCustomer(@CustomerID INT)
RETURNS DECIMAL(18, 2)
AS
BEGIN
DECLARE @TotalDue DECIMAL(18, 2);
SELECT @TotalDue = SUM(TotalDue) — just to have a scalar function
FROM Sales.SalesOrderHeader
WHERE CustomerID = @CustomerID;
RETURN ISNULL(@TotalDue, 0);
END;
ALTER TABLE Sales.SalesOrderHeader
ADD TotalDueForCustomer AS dbo.CalculateTotalDueForCustomer(CustomerID);
Let us consider the following query:
SELECT SalesOrderID, CustomerID, TotalDue, TotalDueForCustomer
FROM Sales.SalesOrderHeader
WHERE CustomerID BETWEEN 11000 AND 30000 — intervals are giving us headaches
ORDER BY CustomerID;
I only put ORDER BY to have a clearer picture.
On my machine, the query takes 326312 reads for 860 ms for CPU (Profiler, RPC completed). DO NOT use SET STATISTICS IO, TIME ON, does not reveal useful results here.
Now let’s create the indexed view on the column of interest.
CREATE VIEW [dbo].[vw_CustomerSales]
WITH SCHEMABINDING
AS
SELECT
so.CustomerID,
COUNT_BIG(*) AS TotalOrders, — COUNT_BIG for it be be indexed
SUM(so.TotalDue) AS TotalSales
FROM
Sales.SalesOrderHeader AS so
GROUP BY
so.CustomerID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_CustomerSales ON dbo.vw_CustomerSales (CustomerID);
Now re-run the exact same query as above.
On my machine, the IO is now 62412 with a CPU of 485. So 5 times less IO with 40%+ drop in CPU consumption as well. Again, this is WITHOUT touching any other on the server.