Long ago there were locking / blocking problems with the SELECT INTO statement. That’s not the case anymore and for AdHoc operations and investigation of data SELECT INTO is very helpful.
If you’re not aware of what SELECT INTO is or its benefits here’s an example:
SELECT City, COUNT(1) as CNT
GROUP BY City
HAVING COUNT(1) > 1
This query for AdventureWorks will dump all of its results into a table named #MyDuplicateCities. Note that there is no CREATE TABLE statement. The INTO [tablename] will create the table for you.
Running this query a second time will result in failure if you haven’t dropped the #MyDuplicateCities table.
Using this syntax can be really helpful if you just need to do some quick and dirty cleanup; however, it should be avoided for stored procedures. Here’s why…
Have you ever tried to get an Estimated plan from a stored procedure and it results in an ERROR 208 Invalid object name?
When you’re attempting to get an estimated plan the query / stored procedure isn’t actually running. Since the DDL doesn’t exist SQL Server errors with a 208 message and cannot continue to analyze the statement.
Creating the DDL at the beginning of your procedure will always ensure that the estimated plan can be retrieved and as an added bonus the code will be easier to debug for someone that isn’t familiar with the data.
Avoid using SELECT INTO for code you’re going to push to production for the following reasons:
- It can make estimated plans fail.
- It’s slightly harder to debug or modify since the schema isn’t easily shown.
Be lazy and efficient when investigating data. For code that you’re not going to reuse SELECT INTO is a champ. It’s faster than building the typing the CREATE TABLE statement and works just as well for AdHoc scenarios.
*Update for Mr Ozar*
When I originally wrote this post I thought I did include a counter point or at least some mention of parallelism for SELECT INTO as a benefit. I think it’s easy to say that in the world of SQL Server “It depends.” and this is always why a talking point like this has two sides and both are valid. I typically always select the highest performing path. I think it’s up to you, the reader, to decide which is best for your environment. Given the new Parallel Insert operation in SQL Server 2016, I’d say why not create your temp table first and have the best of both worlds?
4 thoughts on “Why I don’t love SELECT INTO”
Well, one problem there – starting with SQL 2012 SP1 CU10 and SQL 2014, using SELECT…INTO gets you parallelism, and it ends up going way faster in those scenarios:
Totally agree! This was one of those 1 AM posts. I thought I included something about parallelism as a counter point. Maybe one day they can change the estimated plan to not fail on temporary tables that are created on the fly with the INTO.
Does anyone really care about estimated plans? That seems like a minor point.
As for schema obfuscation…I think this is actually a benefit of using SELECT INTO. I know I hate having to grok source code looking for every table/col reference where I need to change a datatype. Using SELECT…INTO avoids that and is the equivalent of using %ROWTYPE in Oracle. Inheriting a type is a common OO pattern. Why risk bizarre behaviors because you increased a VARCHAR(20) to VARCHAR(200) and now are getting goofy truncation problems.
It’s a good point you make. I can tell you that I do care about estimated plans. A large part of what I’ve done for the past 8 years is tuning and often times that’s for queries that take a very long time to run and an actual plan isn’t always readily available. I see some great points being made and have changed the title of this post because I completely agree with the points made. If you create all your DDL you’d need to go back and change it if the base table changes and that can be costly and time consuming; so, your point is very valid and helpful. Thanks for the comment.