SQL Challenge #3 – Case Sensitive Searches

by Uwe Ricken | Feb 2, 2025 | Level 200, SQL Challenges | 9 comments

Depiction for a challenge with a level of 200

Problem Description / Requirement

The software manufacturer’s database uses the [SQL_Latin1_General_CP1_CI_AS] collation for the application. The development team wrote a custom function that can be used to find customers based on an address search. The department had requested that the search must be case sensitive.

The development team created a User Defined Function dbo.get_customer_address() for the business to find addresses with case sensitive conditions.

Code of the User Defined Function

CREATE FUNCTION dbo.get_customer_address(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
   SELECT *
   FROM   dbo.customers
   WHERE  c_address LIKE @address + '%' COLLATE Latin1_General_100_BIN
);

Challenge

The business is not satisfied with the performance when using the User Defined Function. They complain that even searching for a single customer takes a long time.

Examine the query and optimize query performance. The following rules apply:

The metadata of the tables must NOT be adjusted according to manufacturer requirements
Indexing can be used freely
The User Defined Function can be rewritten as desired
You can create and use your own objects.

Technical Information

When creating the challenge, one table will be created. The table has a Primary Key ([pk_customers]) on [c_custkey]. An additional nonclustered index ([nix_customers_c_address]) has been implemented to support wildcard searches in the c_address attribute.

Table Name	Number of Rows	Storage Space
dbo.customers	1.607.957	219 MB

How to start the challenge

When you are connected to the SQL Server instance…

EXECUTE sqlchallenge.dbo.create_challenge 'list';	will give you a list of open challenges
EXECUTE sqlchallenge.dbo.create_challenge 'name_of_challenge';	will create the challenge for you

After the creation just get into the database for your challenge

Solution

The problem can be solved by seeking all affected rows independent from the collation and afterwards taking the few found records and filter them by case sensitive searches.

https://www.sqlchallenges.de/sql-challenge-3-solution

Connect to Microsoft SQL Server

SQL Server:	sqlchallenges.germanywestcentral.cloudapp.azure.com
Port:	1433
Version:	16.0.4165.4
Edition:	Developer Edition
Authentication Type:	SQL Server Authentication
Login Name:	sqlchallenge
Password:	sqlchallenge#2025!
Encryption	Mandatory
Trust Server Certificate:	True

Associated Workshops

Improve your skills as a database developer

In this workshop, we will cover a wide range of topics and techniques that are essential for modern database developers working with SQL Server. From fundamental concepts to advanced best practices, you’ll gain the knowledge and practical experience needed to excel in your role.

9 Comments

Seba on 2025-02-03 at 08:30

The query by default scans the primary key.

select * from dbo.get_customer_address(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
Table ‘customers’. Scan count 3, logical reads 28271, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead r
SQL Server Execution Times:
CPU time = 516 ms, elapsed time = 275 ms.

Add hint with (index = [nix_customers_c_address])

select * from dbo.get_customer_address_NEW(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
Table ‘customers’. Scan count 3, logical reads 8843, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 168 ms.

This is some improvement. Maybe applying Full-Text Search could help better.
Reply
MaPa on 2025-02-03 at 13:38

With just clustered index

(1 row affected)
Table ‘customers’. Scan count 3, logical reads 56780, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Wint non clustered index on c_address column

Table ‘customers’. Scan count 3, logical reads 8843, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

After adding the computed column and the non clustered index on that newly created computed column

ALTER TABLE dbo.customers
ADD c_address_case AS (c_address COLLATE Latin1_General_100_BIN)

create INDEX IX_Customers_CaseAddress
ON dbo.customers (c_address_case);

Function was modified to use the new computed column in the where clause and the logical reads went down drastically.

Table ‘customers’. Scan count 1, logical reads 7, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Hence for this scenario, computed column and index on that column would perhaps be the most efficient optimization.
Reply
- Uwe Ricken on 2025-02-03 at 13:49
  
  Dear MaPa,
  
  Thank you for trying this challenge. A computed column will work but the rule is NOT to change the Meta Data.
  
  A computed column may break the application.
  
  Try to rewrite the function or add additional index(es).
  Reply
Seba on 2025-02-03 at 14:58

Another idea

alter FUNCTION [dbo].[get_customer_address_new2](@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
SELECT x.*
FROM dbo.customers y with (index = [nix_customers_c_address])
cross apply
(select * from dbo.customers x where x.c_custkey = y.c_custkey and x.c_address COLLATE Latin1_General_100_BIN LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN ) x
WHERE y.c_address LIKE @address + ‘%’
);
GO
Reply
- Uwe Ricken on 2025-02-03 at 15:44
  
  What a cool idea, Seba. I love it and it is 99% near to my solution I’ve implemented.
  Stay tuned. The solution for this challenge will be online on 9th of February.
  
  BTW: A brilliant idea to cover the “prefiltering”. Really cool!
  Reply
Art on 2025-02-03 at 20:58

This significantly reduces number of pages read:

ALTER FUNCTION dbo.get_customer_address(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
WITH Pass_One as (
SELECT c_custkey
FROM dbo.customers as C
WHERE c.c_address like @address + ‘%’
)
SELECT C.*
FROM Pass_One as PO
JOIN dbo.Customers as C on c.c_custkey = PO.c_custkey
WHERE C.c_address LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN
Reply
- Uwe Ricken on 2025-02-04 at 06:12
  
  Brilliant – thank you very much for your effords.
  Reply
Henrik Staun Poulsen on 2025-02-06 at 13:04

hi Uwe,
I could not see any tables nor views on your server, so I’ve used a copy of StackOverflow.

Are we allowed to create views?
I assume “yes”.
What about indexed views?
If “yes”, then this is quite fast:

GO
CREATE OR ALTER VIEW dbo.MyUserView
WITH SCHEMABINDING
AS
SELECT Id
, AboutMe
–, some other fields from the users table
, ValidToUTC
, address = CAST(u.DisplayName AS VARCHAR(40)) COLLATE Latin1_General_100_BIN
FROM dbo.Users u

go
CREATE UNIQUE CLUSTERED INDEX ix_cl ON dbo.MyUserView (id) WITH (DATA_COMPRESSION=PAGE)
CREATE nonCLUSTERED INDEX ix_ncl ON dbo.MyUserView (address) WITH (DATA_COMPRESSION=PAGE)

go
CREATE OR ALTER FUNCTION dbo.get_customer_address_Via_MyView(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
SELECT *
FROM dbo.MyUserView WITH (NOEXPAND, INDEX=ix_ncl)
WHERE address LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN
);

go

select * from dbo.get_customer_address_Via_MyView(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
Reply
Henrik Staun Poulsen on 2025-02-06 at 14:54

hi Uwe,
I got your database working.

Are we allowed to create views?
I assume “yes”.
What about indexed views?
If “yes”, then this is quite fast:

GO
CREATE OR ALTER VIEW dbo.MyUserView
WITH SCHEMABINDING
AS
SELECT u.c_custkey
, u.c_mktsegment
, u.c_nationkey
, u.c_name
, u.c_address
, u.c_phone
, u.c_acctbal
, u.c_comment
, address = CAST(u.c_address AS VARCHAR(40)) COLLATE Latin1_General_100_BIN
FROM dbo.customers u

go
CREATE UNIQUE CLUSTERED INDEX ix_cl ON dbo.MyUserView (c_custkey) WITH (DATA_COMPRESSION=PAGE)
CREATE nonCLUSTERED INDEX ix_ncl ON dbo.MyUserView (address) WITH (DATA_COMPRESSION=PAGE)

go
CREATE OR ALTER FUNCTION dbo.get_customer_address_Via_MyView(@address VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
SELECT *
FROM dbo.MyUserView WITH (NOEXPAND, INDEX=ix_ncl)
WHERE address LIKE @address + ‘%’ COLLATE Latin1_General_100_BIN
);

go

select * from dbo.get_customer_address_Via_MyView(‘jlkVvO7zLByLT3EuTem1bwE2PJtli NVVn5’)
Reply