SQL Challenge #8 – Overlapping Time Series

by Uwe Ricken | Jul 7, 2025 | Level 400, SQL Challenges | 12 comments

Problem Description / Requirement

This example comes from a customer from the automtive sector. Whenever a vehicle is built, EVERY single step must be precisely recorded. So-called “telegrams” are used for this purpose. These telegrams can convey simple information (e.g., entry into an assembly station) or complex information. The crucial thing is that EVERY piece of information must be stored in a traceable manner!

Each process step in a vehicle”s production is logged in a table as “telegrams.”
– The type of telegram is identified by a flag.
– Each telegram has a start and end date.

In the event of a malfunction, you want to know which processes were active at that exact time (timeline).
– These can be very short processes (login, tool change, etc.).
– These can be very long processes (painting, wedding, etc.).

The customer complains about very long response times. Especially if a timeline spans several days (which is possible), the query sometimes takes several minutes.

Problem

The problem with executing the query is the unpredictable amount of data that can be returned depending on the time specification. For example, if a query is started with a time range that is very close to the end or beginning of the table, the query runs very well. However, the closer you get to the middle stored date value, the longer the query takes.

Quick running query

SELECT id, [t_status], begin_ts, end_ts
FROM   dbo.telegram
WHERE  begin_ts <= '2025-06-11 14:00'
       AND end_ts >= '2025-06-11 13:00';
GO

Long running query

SELECT id, [t_status], begin_ts, end_ts
FROM   dbo.telegram
WHERE  begin_ts <= '2025-05-26 14:00'
       AND end_ts >= '2025-05-26 13:00';
GO

Note

When you run both queries without optimization you will see the same IO and CPU statistics. When you try the same with an appropriate index the first query should run faster than the second one.
The customer wants a performance for the query independent from the date and time but only accepts longer runtime when more records are found!

Challenge

The table [dbo].[telegram] has only a clustered Primary Key on [internal_id], no other indexes are set
Use ANY technique you know to optimize the queries.
The customer wants a predictable runtime for the query.
The runtime must be independent of the time period in which a process is queried.

Condition

none. You can do all to improve the query (adding indexes, tables, partitions, …)

Technical Information

The creation of the challenge can take up to 5 Minutes and creates the following objects / data

Table [dbo].[telegram]
40 Mio rows

How to start the challenge

When you are connected to the SQL Server instance, run the following commands:

EXECUTE sqlchallenge.dbo.create_challenge 'list';	will give you a list of open challenges
EXECUTE sqlchallenge.dbo.create_challenge 'SQL Challenge #8 - querying time ranges';	will create the challenge for you

customer expectations

The runtime may not vary due to a different time window; however, it may vary due to the amount of data.

Solution

This challenge is certainly one of the most difficult. Whoever finds this solution can call themselves an “experienced DBA.”

SQL Challenge #8 – Solution

Connect to Microsoft SQL Server

SQL Server:	sqlchallenges.germanywestcentral.cloudapp.azure.com
Port:	1433
Version:	16.0.4165.4
Edition:	Developer Edition
Authentication Type:	SQL Server Authentication
Login Name:	sqlchallenge
Password:	sqlchallenge#2025!
Encryption	Mandatory
Trust Server Certificate:	True

Improve Your Skills as Query Tuner for SQL Server During this workshop, you’ll gain the expertise and confidence needed to diagnose and fine-tune slow-performing queries, resulting in improved database responsiveness and end-user satisfaction. Whether you’re an experienced database administrator, developer, or data analyst, this workshop will equip you with the knowledge and practical skills to tackle the most challenging query optimization tasks.

Improve Your Skills as a Database Developer for SQL Server In this workshop, we will cover a wide range of topics and techniques that are essential for modern database developers working with SQL Server. From fundamental concepts to advanced best practices, you’ll gain the knowledge and practical experience needed to excel in your role.

12 Comments

Gabriel Köhl on 2025-07-10 at 12:18

Tried a lot, ended up with a columnstore index because it’s a bit of analytics for timeseries. Furthermore, the amount of telegrams in May is massively more than in June. I don’t have experience with columnstore indexes and don’t know the dependencies in a production environment. For my understanding, it’s more for DWH and analytics. Fast Query 40ms, Slow Query 1300ms.
Reply
- Uwe Ricken on 2025-07-11 at 09:09
  
  Dear Gabriel,
  
  god idea and if it works it’s fine. There is a “better” approach when you think of the problem with the two columns begin_ts and end_ts which cannot be covered fully in one index.
  Think about a solution where you can access the time range with ONE column – maybe that’s the trick here.
  
  Good luck and THANK YOU for accepting the callenge.
  Reply
Peter Porskjær on 2025-07-11 at 13:18

Working on a solution. But its not easy. Having problems running out of log-space when adding + updating columns.
Hope i have a solution later today. Google seems to be my friend on this one.
Reply
Peter Porskjær on 2025-07-11 at 15:27

Would like to add that log-space is not the problem. Sorry for unclear post
Reply
Art on 2025-07-12 at 06:34

Hi Uwe,

Many thanks for designing new challenge!

This is interesting one, few tricks I had in mind did not deliver.
At the end, good old KISS worked just fine:

short running query:
(604 rows affected)
Table ‘telegram’. Scan count 1, logical reads 8,…
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.

long running query:
(135946 rows affected)
Table ‘telegram’. Scan count 1, logical reads 49044
SQL Server Execution Times:
CPU time = 1219 ms, elapsed time = 1251 ms.

Index seek for both queries.

Not posting script yet, I am still looking at NOTE section (I have the feeling something is lurking there 🙂
Reply
Peter Porskjær on 2025-07-14 at 08:20

I have spend quite some time on this one. Really interesting.

Google helped me to get here:
https://lucient.com/en/blog/interval-queries-in-sql-server-part-1/
https://lucient.com/blog/a-static-relational-interval-tree/
https://web.archive.org/web/20220526022251/https://blogs.solidq.com/en/businessanalytics/using-static-relational-interval-tree-time-intervals/

I have:
Added lower_int, upper_int and node (persisted) to telegram – my time starts at 2020-01-01 and the grain is minutes. Its not fast so patience is virtue.

Created a function to calculate the node

Indexed the node column with a covering index for the output query.

Created a bitmaptable with appropriate depth.

I have created the helper functions to get the different parts of the tree…… So far so good

Then populated a new table with all intervals and corresponding overlapping nodes.

On the positive note: Its seems to be blazingly fast on searching

Bad news: I’m clearly off by a bit in the node maths – some windows return a few rows too many, others a few too few.

Its been really fun – but i’m getting in too deep here – trying to implement bit-logic i’m not on top of. So for my part it would not get to production – even if i could make it work

So. thanks for a great challenge – so much fun playing around with it.
Hope its ok to submit this with no code – there is too much to fit in this box.
Reply
- Uwe Ricken on 2025-07-14 at 08:30
  
  Dear Peter,
  
  thank you so much for your valued comment. I came to my solution by reading the great article from Itzik Ben Gan but our tests with this feature didn’t scale good when writing data.
  So our approach was a new table with the intervalls (15 min) in one table. You can read the details of my solution in link [Solution] in the challenge.
  
  Once again thank you for taking the challenge. I love your idea and will give it a try when I find the time for it.
  All the best, Uwe
  Reply
  - Peter Porskjær on 2025-07-14 at 09:00
    
    I didn’t manage to find the original article from Itzik Ben Gan – i only managed to dig up broken links.
    
    Looking forward to reading your solution later this week.
    Reply
Art on 2025-07-15 at 14:57

Hi Uwe,

How big is the real table?
The one at customer’s database?
I have a feeling it is bit larger than the one in the challenge.

Thank you.
Reply
- Uwe Ricken on 2025-07-15 at 15:03
  
  Hey Art,
  
  it’s nice to read from you. Your assumption is correct – the customers table contains app. 1.2 Billion rows and the table is much wider than this example which only contains one simple attribute which is not necessary [t_status].
  
  Thank you very much for participating to this challenge…
  Reply
Uwe Köhler on 2025-09-01 at 16:01

SELECT MIN(begin_ts),MAX(end_ts)FROM dbo.telegram –2025-05-12 12:00:00.000 2025-06-11 21:00:00.000
SELECT AVG(DATEDIFF(minute,begin_ts,end_ts)) AS AvgNumberMinDuration FROM dbo.telegram –0Min is avg.!
SELECT AVG(DATEDIFF(s,begin_ts,end_ts)) AS AvgNumberMinDuration FROM dbo.telegram –28s is avg.!
SELECT DATEDIFF(hour,MIN(begin_ts),MAX(end_ts))FROM dbo.telegram –729 => to less groups
SELECT DATEDIFF(minute,MIN(begin_ts),MAX(end_ts))FROM dbo.telegram –43740 => sounds good => Lokkuptable by minutes

–Quick and dirty: create 2 new lookup columns into the telegram table which holds the number minutes after first entry in telegram table
–Good for write, bad for future (lookuptable with minutes will reach max(int))
ALTER TABLE dbo.telegram ADD MinAfterForBegin int NULL;
ALTER TABLE dbo.telegram ADD MinAfterForEnd int NULL;
go
UPDATE dbo.telegram SET
[MinAfterForBegin]=DATEDIFF(s,CAST(‘2025-05-12 12:00:00.000’AS DATETIME2(3)),begin_ts)/60 –With minute not get right result (try min of 13:00:00.5 – 13:01:00.0 returns 1 instead 0)
,[MinAfterForEnd]= DATEDIFF(s,CAST(‘2025-05-12 12:00:00.000’AS DATETIME2(3)),end_ts)/60 –ms is over int capacity

CREATE NONCLUSTERED INDEX [IX_MinutesBeginEnd_Include_All] ON [dbo].[telegram]
(
[MinAfterForBegin] DESC,
[MinAfterForEnd] ASC
)
INCLUDE([id],[t_status],[begin_ts],[end_ts]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)
go

SELECT id, [t_status], begin_ts, end_ts
FROM dbo.telegram
WHERE [MinAfterForEnd] >= DATEDIFF(minute,CAST(‘2025-05-12 12:00:00.000’AS DATETIME2(3)),CAST(‘2025-05-26 13:00’AS DATETIME2(3)))
AND [MinAfterForBegin] = DATEDIFF(minute,CAST(‘2025-05-12 12:00:00.000’AS DATETIME2(3)),CAST(‘2025-05-26 13:00’AS DATETIME2(3)))
AND [MinAfterForBegin] = DATEDIFF(minute,CAST(‘2025-05-12 12:00:00.000’AS DATETIME2(3)),CAST(‘2025-05-26 14:00’AS DATETIME2(3)))
AND
(
begin_ts = CAST(‘2025-05-26 13:00’AS DATETIME2(3))
)

–135946 rows
–SQL Server Execution Times:
–CPU time = 2719 ms, elapsed time = 1808 ms. ==>this is not the fastest sql ;-(
Reply
- Uwe Ricken on 2025-09-01 at 17:59
  
  Dear Uwe,
  
  thank you very much for your valued idea for the optimization. It’s an interesting idea. I’m gonna check it on my side to see whether it is a usable approach for the customer.
  Reply