Problem Description / Requirement
The software manufacturer’s database uses the [SQL_Latin1_General_CP1_CI_AS] collation for the application. The development team wrote a custom function that can be used to find customers based on an address search. The department had requested that the search must be case sensitive.
The development team created a User Defined Function dbo.get_customer_address() for the business to find addresses with case sensitive conditions.
Code of the User Defined Function
CREATE FUNCTION dbo.get_customer_address(@address VARCHAR(40)) RETURNS TABLE AS RETURN ( SELECT * FROM dbo.customers WHERE c_address LIKE @address + '%' COLLATE Latin1_General_100_BIN );
Challenge
The business is not satisfied with the performance when using the User Defined Function. They complain that even searching for a single customer takes a long time.
Examine the query and optimize query performance. The following rules apply:
- The metadata of the tables must NOT be adjusted according to manufacturer requirements
- Indexing can be used freely
- The User Defined Function can be rewritten as desired
- You can create and use your own objects.
Category
- Optimization of code
- Level: 200
Technical Information
When creating the challenge, one table will be created. The table has a Primary Key ([pk_customers]) on [c_custkey]. An additional nonclustered index ([nix_customers_c_address]) has been implemented to support wildcard searches in the c_address attribute.
Table Name | Number of Rows | Storage Space |
dbo.customers | 1.607.957 | 219 MB |
How to start the challenge
When you are connected to the SQL Server instance…
EXECUTE sqlchallenge.dbo.create_challenge 'list'; |
will give you a list of open challenges |
EXECUTE sqlchallenge.dbo.create_challenge 'name_of_challenge'; |
will create the challenge for you |
After the creation just get into the database for your challenge
Solution
The problem can be solved by seeking all affected rows independent from the collation and afterwards taking the few found records and filter them by case sensitive searches.
Connect to Microsoft SQL Server
SQL Server: | sqlchallenges.germanywestcentral.cloudapp.azure.com |
Port: | 1433 |
Version: |
16.0.4165.4
|
Edition: | Developer Edition |
Authentication Type: | SQL Server Authentication |
Login Name: | sqlchallenge |
Password: | sqlchallenge#2025! |
Encryption | Mandatory |
Trust Server Certificate: | True |
The query by default scans the primary key.
select * from dbo.get_customer_address(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
Table ‘customers’. Scan count 3, logical reads 28271, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead r
SQL Server Execution Times:
CPU time = 516 ms, elapsed time = 275 ms.
Add hint with (index = [nix_customers_c_address])
select * from dbo.get_customer_address_NEW(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
Table ‘customers’. Scan count 3, logical reads 8843, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 168 ms.
This is some improvement. Maybe applying Full-Text Search could help better.
With just clustered index
(1 row affected)
Table ‘customers’. Scan count 3, logical reads 56780, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Wint non clustered index on c_address column
Table ‘customers’. Scan count 3, logical reads 8843, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
After adding the computed column and the non clustered index on that newly created computed column
ALTER TABLE dbo.customers
ADD c_address_case AS (c_address COLLATE Latin1_General_100_BIN)
create INDEX IX_Customers_CaseAddress
ON dbo.customers (c_address_case);
Function was modified to use the new computed column in the where clause and the logical reads went down drastically.
Table ‘customers’. Scan count 1, logical reads 7, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Hence for this scenario, computed column and index on that column would perhaps be the most efficient optimization.
Dear MaPa,
Thank you for trying this challenge. A computed column will work but the rule is NOT to change the Meta Data.
A computed column may break the application.
Try to rewrite the function or add additional index(es).
Another idea
alter FUNCTION [dbo].[get_customer_address_new2](@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
SELECT x.*
FROM dbo.customers y with (index = [nix_customers_c_address])
cross apply
(select * from dbo.customers x where x.c_custkey = y.c_custkey and x.c_address COLLATE Latin1_General_100_BIN LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN ) x
WHERE y.c_address LIKE @address + ‘%’
);
GO
What a cool idea, Seba. I love it and it is 99% near to my solution I’ve implemented.
Stay tuned. The solution for this challenge will be online on 9th of February.
BTW: A brilliant idea to cover the “prefiltering”. Really cool!
This significantly reduces number of pages read:
ALTER FUNCTION dbo.get_customer_address(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
WITH Pass_One as (
SELECT c_custkey
FROM dbo.customers as C
WHERE c.c_address like @address + ‘%’
)
SELECT C.*
FROM Pass_One as PO
JOIN dbo.Customers as C on c.c_custkey = PO.c_custkey
WHERE C.c_address LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN
Brilliant – thank you very much for your effords.
hi Uwe,
I could not see any tables nor views on your server, so I’ve used a copy of StackOverflow.
Are we allowed to create views?
I assume “yes”.
What about indexed views?
If “yes”, then this is quite fast:
GO
CREATE OR ALTER VIEW dbo.MyUserView
WITH SCHEMABINDING
AS
SELECT Id
, AboutMe
–, some other fields from the users table
, ValidToUTC
, address = CAST(u.DisplayName AS VARCHAR(40)) COLLATE Latin1_General_100_BIN
FROM dbo.Users u
go
CREATE UNIQUE CLUSTERED INDEX ix_cl ON dbo.MyUserView (id) WITH (DATA_COMPRESSION=PAGE)
CREATE nonCLUSTERED INDEX ix_ncl ON dbo.MyUserView (address) WITH (DATA_COMPRESSION=PAGE)
go
CREATE OR ALTER FUNCTION dbo.get_customer_address_Via_MyView(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
SELECT *
FROM dbo.MyUserView WITH (NOEXPAND, INDEX=ix_ncl)
WHERE address LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN
);
go
select * from dbo.get_customer_address_Via_MyView(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
hi Uwe,
I got your database working.
Are we allowed to create views?
I assume “yes”.
What about indexed views?
If “yes”, then this is quite fast:
GO
CREATE OR ALTER VIEW dbo.MyUserView
WITH SCHEMABINDING
AS
SELECT u.c_custkey
, u.c_mktsegment
, u.c_nationkey
, u.c_name
, u.c_address
, u.c_phone
, u.c_acctbal
, u.c_comment
, address = CAST(u.c_address AS VARCHAR(40)) COLLATE Latin1_General_100_BIN
FROM dbo.customers u
go
CREATE UNIQUE CLUSTERED INDEX ix_cl ON dbo.MyUserView (c_custkey) WITH (DATA_COMPRESSION=PAGE)
CREATE nonCLUSTERED INDEX ix_ncl ON dbo.MyUserView (address) WITH (DATA_COMPRESSION=PAGE)
go
CREATE OR ALTER FUNCTION dbo.get_customer_address_Via_MyView(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
SELECT *
FROM dbo.MyUserView WITH (NOEXPAND, INDEX=ix_ncl)
WHERE address LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN
);
go
select * from dbo.get_customer_address_Via_MyView(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)