Sunday, 13 March 2011

How Do Join Operaters In SQL Server Work?

I was asked a question this week, that I could only answer at a very high level, so of course my instant reaction was to research a more detailed answer and blog about it! The question was "How do the join operators in SQL Server work?"

So first for a little background... In SQL Server, if you join two table together, in the old days, there was only one way that the optimizer could perform this task, and that was with an operation called Nested Loops (or loop join). In more recent versions however, in order to support larger databases, with larger tables, two additional operators have been added. These are Merge Join or Hash Match (Hash join). But how do these operators work?

Well, first up is Nested Loops. This operator will use statistics to decide which is the smallest of the two tables (in terms of rows) and will mark this table as the "outer table", and mark the larger table as the "inner table". For each row in the outer table, it will loop through every row in the inner table, looking for a matching key, based on the join column(s), and join the tables based on this value.

Next is Hash Match. This operator works by choosing one of the tables and running a Hash Function against the key column(s). It then puts these hashed values into an in-memory hash table, although if this becomes to large, it can be written to TempDB. It then runs the Hash Function against the second table, and joins the rows based on a match of the two hashed values.

Lastly is Merge Join. This operator relies on both of the tables being sorted by the join key. Because both keys are sorted in the same way, SQL can look at the first value in the first table and the first value in the second table. If they match, then it can output them to the result set, if not, then it can take the next row from the first table and repeat the process, until all of the rows from the first table have been matched.

So when is each operator best? Well Nested Loops is generally better for smaller tables, and also needs to be able to perform an index seek on the "inner table", to ensure any kind of performance.

Hash Match can be better for larger tables, because it requires less IO. This is because each key only needs to be read once, rather than multiple times. However, the hash function causes a fair bit of CPU, and hash table can require a lot of memory, and if you are coming under memory pressure, it may need to write the table to TempDB, which can cause more IO, as well as the memory pressure, making it a less desirable option.

Merge Join is generally regarded to be the most efficient of the operators for larger datasets, because it only needs to read the keys once, and also does not require a lot of CPU cycles. However, it does require both tables to be sorted. If they are (especially by the Clustered Index) then the operator is very fast, but if they are not, then SQL needs to perform a sort operation on them, before they enter the join operation. This, of course, is very inefficient.

It is possible to "override" the optimizer's choice of operator, by using OPTION(Loop Join), OPTION(Hash join) or OPTION(Merge Join) after the join in your query. However, remember that optimizer is very smart, and does not make that many mistakes. Updating Statistics and rebuilding indexes is usually the best way to ensure the optimizer uses the correct option. If you really do need to use these hints, however, then you should always work with SQL, rather than against it.

What do I mean by that? Well, if we take the example that SQL is deciding to use a Merge Join, but we know that this will never be the most effective option, because one of the tables (for some reason) will never be sorted by the join key, then instead of telling the optimizer that it must use a Hash Match, we should tell it to use anything it wants except a Merge Join. We can do this, by comma separating the list of operators that it can use. In this example, we would say OPTION(Hash join, Loop Join)


Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

No comments:

Post a Comment