Are scalar functions really a bad thing?

I’ve read a lot of things lately pointing to scalar functions as if they were the devil. In this blog I’m going to explore if that’s the case. Let’s have a look.

It’s true that in many situations a scalar function is often a performance bottleneck; but, is there a situation where they could be responsibly used?

What if you had a lookup table that almost never changed? Is it worth doing a join on the lookup to get the data you need?

Let’s examine a simple join between a customer address and a state lookup table.

We have the following tables

image

To join these you’d usually use a very simple inner join to get the state code, which could also be the state name which is longer than the 4 bytes for int. You may also include other metrics on the States table like geo data or capitol. For now let’s pretend like Abbr is much larger than State (int) so it would make sense to dump it to a lookup table.

Here’s what the join query may look like

image

To test the value of scalar functions we created the the following code (note that it cuts off but you get the point)

image

Now that we have a function to test with, we want to create a query that returns the same results as our join query. Let’s have a look.

image

How do these compare with an actual plan?

image

Wow! The query with the function has much less cost! Incredible results, right? Not so fast! Sometimes cost can be misleading. To perform a true test we’ll need to do a bit more. Let’s turn off the plan and turn on STATISTICS TIME.

image

206 ms total elapsed time for the join query.

image

Only 92 ms for the funtion query. That’s good!

Let’s examine the query stats to see how these two compare.

image

Here we find some unexpected results. The execution time was less for this single run; however, CPU was greater by a significant amount.

We should dig deeper. You may have not noticed in the query plans that the function isn’t represented in the plan at all. Note that this is SQL Server 2017 and the database is running in the latest compatibility level.

To find out more we’re going to look at the function stats DMV.

SELECT db_name(database_id) as DB,
object_name(object_id) as fn,
type_desc,
last_execution_time,
execution_count,
total_elapsed_time,
total_worker_time,
total_logical_reads,
total_physical_reads,
total_logical_writes

FROM
sys.dm_exec_function_stats

What we find is the function was executed 192 times. This is because the function must run once for each row in the result and it’s primarily this reason why scalar functions aren’t usually a great choice.

image

One final test is to run the queries 100 times. To do this, we’re going to use GO 100.

image

What have we learned? While the function may look great in the query plan it doesn’t stack up to actual runtime metrics. If there’s one thing a DBA should always say it’s “It depends” but in this case it doesn’t and you should instead add this quote to your toolbox, “You should never fully trust cost in query plans”.

6 thoughts on “Are scalar functions really a bad thing?

  1. You don’t answer the question posed in the title–are scalar function good, bad, or does it depend? I’m not sure how your conclusion of not trusting the query plan cost data answers this.

    Like

    • Hi Ron, Thanks for reading and leaving a comment. If you consider that the execution time and CPU usage are much higher over many executions then it should be clear that the scalar function is not a good choice for this scenario. In general, scalar functions are not great for performance.

      Like

  2. Good article except that you must not use SET STATISTICS to measure the performance of Scalar or Multi-statement Functions. Please see the following article for the demonstrable proof of that…
    http://www.sqlservercentral.com/articles/T-SQL/91724/

    Still, I agree and so does that article.. Scalar functions are still bad for performance… just not as bad as everyone makes them out to be. They’re “only” 7 times slower than inline code. 😀

    Like

  3. I noticed you use schemabinding in your function, that in itself can cause some issues. When you leave it out scalar functions perform a lot worse.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s