What are the main strategies for implementing the Join operation?

- July 26, 2022

Like selection, the join operation (joining) can be implemented in a variety of ways. In terms of disk accesses, the joining can be very expensive, so implementing and utilizing efficient join algorithms is critical in minimizing a query's execution time. The following are 5 well-known strategies for implementing the join operation/types of join operaton are described below:-

Nested-Loop Join
Block Nested-Loop Join
Indexed Nested-Loop Join
Sort-Merge Join
Hash Join
Nested-Loop Join

Nested-Loop Join

This algorithm consists of an inner for loop nested within an outer for loop. To illustrate this

the algorithm, we will use the following notations:

R, S Relations r and s

tr Tuple (record) in relation r

ts Tuple (record) in relation s

nr Number of records in relation r

ns Number of records in relation s

br Number of blocks with records in relation r

bs Number of blocks with records in relation s

Here is a sample pseudo-code listing for joining the two relations r and s utilizing the nested-for loop:

In the algorithm, t, and t. are the tuples of relations r and s, respectively. The notation texts is a tuple constructed by concatenating the attribute values of tuples tr and t.. With the help of the algorithm, we understood the following points:

The nested-loop join does not need any indexing similar to a linear file scan for accessing the data.
Nested-loop join does not care about the given join condition. It is suitable for each given join condition.
The nested-loop join algorithm is expensive in nature. It is because it computes and examines each pair of tuples in the given two relations.

Block Nested-Loop Join:

If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per-block basis, rather than on a per tuple basis. Figure 3.5 shows a block nested-loop join, which is a variant of the nested-loop join where every block of the inner relation is paired with every block of the outer relation. Within each pair of blocks, ever tuple in one block is paired with every tuple in the other block, to generate all pairs of tuples. As before, all pairs of tuples that satisfy the join condition are added to the result.

The primary difference in cost between the block nested-loop join and the basic nested-loop join is that, in the worst case, each block in the inner relation s is read only once for each block in the outer relation, instead of once for each tuple in the outer relation. Clearly, it is more efficient to use the smaller relation as the outer relation, in case neither of the relations fits in memory.

Index Nested-Loop Join

This algorithm is the same as the Nested-Loop Join, except an index file on the inner relation's (s) join attribute is used versus a data-file scan on s each index lookup in the inner loop is essentially an equality selection on s utilizing one of the selection algorithms.

Sort-Merge Join

This algorithm can be used to perform natural joins and equijoin and requires that each relation (r and s) be sorted by the common attributes between them (R^S). The details of how this algorithm works will not be presented here. However, it is notable to point out that each record in r and s is only scanned once, thus producing a worst and best-case cost of br+ ba. Variations of the Sort-Merge Join algorithm are used, for instance, when the data files are in unsorted order, but there exist secondary indices for the two relations.

Hash Join

Like the sort-merge join, the hash join algorithm can be used to perform natural joins and equi joins. The concept behind the Hash join algorithm is to partition the tuples of each given relation into sets. The partition is done on the basis of the same hash value on the join attributes. The hash function provides the hash value. The main goal of using the hash function in the algorithm is to reduce the number of comparisons and increase the efficiency to complete the join operation on the relations.

For example, suppose there are two tuples a and b where both of them satisfy the join condition. It means they have the same value for the join attributes. Suppose that both a and b tuples consist of a hash value as i. It implies that tuple a should be in ai, and tuple b should be in bi. Thus, only compare tuples in a; with b tuples of bi. There is no need to compare the b tuples in any other partition. Therefore, in this way, the hash join operation works.

Search This Blog

Notes for BSc CSIT