SQL Server Physical Joins

SQL Server uses following physical joins to retrieve the data from tables:

Nested Loops

Nested Loop joins the tables by making the one with least rows as an outer table to optimize performance. For each row in this outer table a one-by-one comparison is done to all the rows in the inner table. If there is a match, the row is added to the result set. Nested Loop joins will perform best when the outer table is relatively small, and in case of large inner table, when it has proper indexing. Nested loop is most efficient loop for joining a small data stream with less resource consumption. Nested Loop is ideal for OLTP where you join two tables on indexed columns and filtering specific rows.

Hash Match

Hash joins are performed in two phases called Build and Probe. During the Build phase all the rows are read from the input, which is often the table with the least rows in it. This process creates an in-memory hash table. In the Probe phase rows from the second input are read and hashes are created using the same function, after which the results are matched to hash table. If there is a match, the results will be added to output.

While hash joins perform best with very large data sets, as it uses parallelism and scale better than the two other joins, it is also the most intensive workload to perform. There are few reasons to this. Sometimes the hashing functions can be complex, out-of-data statistics can cause bad estimates and if your hash table is too large to fit into memory, it might be spooled to tempdb increasing I/O load.

If I conclude Hash join, they are most effective for joining large unsorted sets. Hash joins are perfect for OLAP or batch processes. They are fast but very expensive and requires parallelism so not good for OLTP.

Merge Join

Merge join merges two sorted lists together. Merge joins are very common when two tables are joined by already sorted predicates or joined keys. They might also occur even when one of the join predicates needs to be sorted. With merge join two input rows are read and compared simultaneously, one row at the time. Equal rows are joined and with rows that are not, the lesser of the inputs is discarded. Merge joins are fast, however if there are sort operations involved they can be expensive. With good indexing, merge join is often the fastest algorithm, but expensive. In high transactional OLTP they are better than Hash Join but not good as Nested Loop.

The Adaptive Join operator was added in SQL Server 2017 as an alternative to the traditional join operators

Adaptive Join

The adaptive join is intended to be used when there is no efficient way to fulfill the order requirement of the Merge Join, and the optimizer cannot reliably predict which of the remaining algorithms (Hash Match or Nested Loops) would perform best. Because it has to be able to join the data using either the Nested Loops or the Hash Match algorithm, Adaptive Join suffers from the combined restrictions of these operators. As such, Adaptive Join supports only four logical join operations: inner join, left outer join (but not the probed version), left semi join, and left anti semi join; it requires at least one equality-based join predicate, it uses lots of memory, and it is semi-blocking.

This is very important to understand all these joins as some time a bad join operator can create a massive impact on database performance.

Please leave your comment if you any query or feedback.

 

9 thoughts on “SQL Server Physical Joins

Add yours

    1. Thanks Sumit for reading my blogs consistently. The participation from you guys encourages me to share my professional experience with everyone and might help someone in the world.

      Like

Leave a reply to sqlwizard Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

Up ↑