I’ve read a lot of things lately pointing to scalar functions as if they were the devil. In this blog I’m going to explore if that’s the case. Let’s have a look.
It’s true that in many situations a scalar function is often a performance bottleneck; but, is there a situation where they could be responsibly used?
What if you had a lookup table that almost never changed? Is it worth doing a join on the lookup to get the data you need?
Let’s examine a simple join between a customer address and a state lookup table.
We have the following tables
To join these you’d usually use a very simple inner join to get the state code, which could also be the state name which is longer than the 4 bytes for int. You may also include other metrics on the States table like geo data or capitol. For now let’s pretend like Abbr is much larger than State (int) so it would make sense to dump it to a lookup table.
Here’s what the join query may look like
To test the value of scalar functions we created the the following code (note that it cuts off but you get the point)
Now that we have a function to test with, we want to create a query that returns the same results as our join query. Let’s have a look.
How do these compare with an actual plan?
Wow! The query with the function has much less cost! Incredible results, right? Not so fast! Sometimes cost can be misleading. To perform a true test we’ll need to do a bit more. Let’s turn off the plan and turn on STATISTICS TIME.
206 ms total elapsed time for the join query.
Only 92 ms for the funtion query. That’s good!
Let’s examine the query stats to see how these two compare.
Here we find some unexpected results. The execution time was less for this single run; however, CPU was greater by a significant amount.
We should dig deeper. You may have not noticed in the query plans that the function isn’t represented in the plan at all. Note that this is SQL Server 2017 and the database is running in the latest compatibility level.
To find out more we’re going to look at the function stats DMV.
SELECT db_name(database_id) as DB,
object_name(object_id) as fn,
type_desc,
last_execution_time,
execution_count,
total_elapsed_time,
total_worker_time,
total_logical_reads,
total_physical_reads,
total_logical_writes
FROM sys.dm_exec_function_stats
What we find is the function was executed 192 times. This is because the function must run once for each row in the result and it’s primarily this reason why scalar functions aren’t usually a great choice.
One final test is to run the queries 100 times. To do this, we’re going to use GO 100.
What have we learned? While the function may look great in the query plan it doesn’t stack up to actual runtime metrics. If there’s one thing a DBA should always say it’s “It depends” but in this case it doesn’t and you should instead add this quote to your toolbox, “You should never fully trust cost in query plans”.
[…] Daniel Janik head-fakes us a few times when looking at scalar user-defined function performance: […]
LikeLike
You don’t answer the question posed in the title–are scalar function good, bad, or does it depend? I’m not sure how your conclusion of not trusting the query plan cost data answers this.
LikeLike
Hi Ron, Thanks for reading and leaving a comment. If you consider that the execution time and CPU usage are much higher over many executions then it should be clear that the scalar function is not a good choice for this scenario. In general, scalar functions are not great for performance.
LikeLike
Good article except that you must not use SET STATISTICS to measure the performance of Scalar or Multi-statement Functions. Please see the following article for the demonstrable proof of that…
http://www.sqlservercentral.com/articles/T-SQL/91724/
Still, I agree and so does that article.. Scalar functions are still bad for performance… just not as bad as everyone makes them out to be. They’re “only” 7 times slower than inline code. 😀
LikeLike
I noticed you use schemabinding in your function, that in itself can cause some issues. When you leave it out scalar functions perform a lot worse.
LikeLike
Scalar functions are quite bad but they’re not as bad as SET STATISTICS makes them look. Please see the following article.
http://www.sqlservercentral.com/articles/T-SQL/91724/
LikeLike