I was recently asked to help tune a stored procedure that has been historically taking between 55 and 60 seconds to complete. Overall the code wasn’t too complex but getting to the root cause did surprise me a bit.
We run a mixture of SQL Server 2008 R2 and SQL Server 2016. Sadly, this procedure resides on one of the SQL Server 2008 R2 instances, which means the live query plan and many other goodies aren't available to help troubleshoot. That's OK, though; we'll still follow the methodical flow of picking apart a batch and getting to root cause that I've grown to love about my job.
Step 1: Understand the issue
It's really simple to say "just go get the query plan and tune what you find." But what if you did all that and the issue wasn't the query? I like to understand what the perceived issue is before I start any tuning.
Ask yourself or the user: Is this an issue with only this query or is it everything?
Your next set of steps could vary greatly depending on the answer you come up with.
If everything is slow, I’d start with Task Manager or Perfmon to quickly validate the hardware.
Another good question to ask: Is the issue repeatable? Can you repro the issue on demand or does this appear more random?
Let’s say that this issue is with a single query and it’s always slow…
Step 2: Get the query plan
You can get the query plan any number of ways. Here’s how to grab the graphical plan from Management Studio (SSMS) https://technet.microsoft.com/en-us/library/ms178071(v=sql.105).aspx
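If you'd rather script it than click through SSMS, SET STATISTICS XML returns the actual plan with the results. This is just a sketch; the procedure name and parameter values below are placeholders, not the real proc:

```sql
-- Return the actual execution plan as an XML column alongside the results.
SET STATISTICS XML ON;

EXEC dbo.usp_SlowProc @StartDate = '20170101', @EndDate = '20170131';

SET STATISTICS XML OFF;
```

Clicking the XML link in the results grid opens the familiar graphical plan.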
Step 3: 1000 ft view of the plan
Examine the query plan and look for any obvious issues. Do the basics like looking for scans, lookups, warnings, and high cost operators.
When looking at this troubled stored procedure the obvious was glaring. Query #8 was showing a cost relative to the batch of 90% and there was a missing index that could improve it by 99.943%. Easy work!
I added the index and ran the proc. No change! It still took 55 seconds to complete.
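For reference, applying a missing-index suggestion looks something like this. The table and column names here are hypothetical, since the real schema isn't shown:

```sql
-- Hypothetical index modeled on a typical missing-index suggestion;
-- the table and columns are placeholders for the real schema.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)
INCLUDE (CustomerID, TotalDue);
```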
Here's how the query looked after the index was added:
There’s a total of 75% of the batch shown here. Where would you go next? Would you start looking at the two queries in the batch with 26% cost?
Step 4: Digging a bit deeper
To reduce the likelihood of parameter sniffing I recompiled the procedure and it didn’t help.
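If you want to rule out a stale, sniffed plan yourself, either flag the procedure for recompilation or recompile a single test run. The procedure name and parameters below are placeholders:

```sql
-- Flag the procedure so it compiles a fresh plan on its next execution.
EXEC sp_recompile N'dbo.usp_SlowProc';

-- Or compile a throwaway plan for just this one test run.
EXEC dbo.usp_SlowProc @StartDate = '20170101', @EndDate = '20170131'
    WITH RECOMPILE;
```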
Since the query with the highest cost wasn’t actually the query causing the issue it’s time to figure out which one is. It’d be really nice to have SQL 2016 for this task, since you’d be able to simply run the proc using the Live query plan and see what was causing the delay.
You could use SET STATISTICS TIME ON. Looking a little more closely, I noticed the query had commented-out PRINT blocks that read "print convert(varchar(20), getdate(), 109)". This was repeated before each query in the batch, so it looked like someone had tried to troubleshoot the performance previously. I uncommented the print statements and reran. This time I noticed that query #11 was taking 54 seconds to complete.
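The instrumentation pattern was roughly this: a timestamp printed before each statement, so the gaps between timestamps in the Messages tab show where the time goes. Recreated here with placeholder comments standing in for the real queries:

```sql
-- Print a timestamp before each statement; the gap between two
-- timestamps in the Messages tab is that statement's duration.
PRINT 'Before query #10: ' + CONVERT(varchar(30), GETDATE(), 109);
-- ... query #10 runs here ...

PRINT 'Before query #11: ' + CONVERT(varchar(30), GETDATE(), 109);
-- ... query #11 runs here ...

PRINT 'Done: ' + CONVERT(varchar(30), GETDATE(), 109);
```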
The query used table variables that appeared to contain millions of rows, so I changed them to temp tables, which reduced the run time to 24 seconds. Not bad, but still terrible.
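The change itself is mechanical; here's a sketch with hypothetical columns. On SQL Server 2008 R2 a table variable carries no statistics, so the optimizer typically estimates it at one row no matter how many millions it actually holds, while a temp table gets real statistics:

```sql
-- Before: table variable, no statistics, estimated at ~1 row.
-- DECLARE @Staging TABLE (ID int, Amount money);

-- After: temp table with real statistics and far better estimates.
CREATE TABLE #Staging (ID int, Amount money);

INSERT INTO #Staging (ID, Amount)
SELECT ID, Amount FROM dbo.SourceTable;  -- placeholder source

-- ... joins that previously used @Staging now use #Staging ...

DROP TABLE #Staging;
```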
I noticed that the query had some expressions as seen here:
A closer look at the XML plan showed this interesting function (GetRangeWithMismatchedTypes):
If you didn’t already notice, take a look at the plan and you’ll see the very first operators are listed as Constant Scans and are followed by Compute Scalars. This is actually the Start and End date parameters being converted and something you should watch for when tuning.
Peeking at the parameters and the table DDL, I found that they didn't match each other.
I wouldn’t normally be too concerned with this because the column isn’t being converted, it’s the parameter; so, no problem right?
Not so fast. I added local variables for the start and end date parameters and reran the query.
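The pattern, sketched with hypothetical names and types since the real proc isn't shown: declare local variables whose type matches the column being filtered and copy the incoming parameters into them, so the predicate compares like types:

```sql
-- Hypothetical sketch; the real proc's names and types are placeholders.
CREATE PROCEDURE dbo.usp_SlowProc
    @StartDate date,   -- parameter type doesn't match the column
    @EndDate   date
AS
BEGIN
    -- Local variables declared with the column's type (datetime here).
    DECLARE @LocalStart datetime = @StartDate,
            @LocalEnd   datetime = @EndDate;

    SELECT o.ID, o.Amount
    FROM dbo.Orders AS o
    WHERE o.OrderDate >= @LocalStart   -- no conversion in the predicate
      AND o.OrderDate <  @LocalEnd;
END;
```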
I was a bit surprised when the stored procedure completed in 1 second.
Here’s the new plan:
Note that the table variables output much less data now.
Strangely enough, the cost and memory grant were much higher for the new plan than for the old, much slower plan.
I took a moment and reset the test proc completely, erasing all my changes except the local date variables, and it still ran in 1 second. You wouldn't normally think that converting the parameters would cause so much fuss, but in this case it certainly did, and it's something you should be looking for when tuning.
Summing this up.
1) When testing make sure you do so in test or dev.
2) Don’t always dig directly into tuning, you may be wasting your time. Understand the issue first.
3) When looking at a query plan understand that the cost doesn’t always represent fact. From time to time the highest cost operator or query may not represent the item that is actually taking the most time or that costs the most.
4) Look in sys.dm_exec_query_stats for the stored procedure first. This can save some time trying to find the longest running query in the procedure. You’d simply have to order by total_elapsed_time desc.
5) Pick apart your query piece by piece. It’s often a good idea to run the query piece by piece outside of the proc context. To do this, you’d simply run the procedure query by query in SSMS instead of running the whole proc at once. This can help identify parameter sniffing issues and a whole lot of other things. It also helps break the proc into easy to manage sections so you can run and rerun the troubled piece easily.
6) I’ve posted about this before and I’ll say it again here. Make sure your data types match between all joined or unioned columns and for all the parameters you use as well.
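For item 4, a query like the following lists the statements inside a single procedure ordered by elapsed time. The procedure name is a placeholder, and keep in mind the DMV only covers plans still in cache:

```sql
-- Longest-running statements within one procedure, from cached stats.
SELECT TOP (10)
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    qs.execution_count
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.objectid = OBJECT_ID(N'dbo.usp_SlowProc')
ORDER BY qs.total_elapsed_time DESC;
```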