What are the main strategies for implementing the Join operation?

 Like selection, the join operation (joining) can be implemented in a variety of ways. In terms of disk accesses, the joining can be very expensive, so implementing and utilizing efficient join algorithms is critical in minimizing a query's execution time. The following are 5 well-known strategies for implementing the join operation/types of join operaton are described below:-


  • Nested-Loop Join
  • Block Nested-Loop Join
  • Indexed Nested-Loop Join
  • Sort-Merge Join
  • Hash Join
  • Nested-Loop Join

Nested-Loop Join

This algorithm consists of an inner for loop nested within an outer for loop. To illustrate this
the algorithm, we will use the following notations:


R, S   Relations r and s
tr      Tuple (record) in relation r
ts     Tuple (record) in relation s
nr     Number of records in relation r
ns     Number of records in relation s
br     Number of blocks with records in relation r
bs      Number of blocks with records in relation s
Here is a sample pseudo-code listing for joining the two relations r and s utilizing the nested-for loop:

In the algorithm, t, and t. are the tuples of relations r and s, respectively. The notation texts is a tuple constructed by concatenating the attribute values of tuples tr and t.. With the help of the algorithm, we understood the following points:

  • The nested-loop join does not need any indexing similar to a linear file scan for accessing the data.
  • Nested-loop join does not care about the given join condition. It is suitable for each given join condition.
  •  The nested-loop join algorithm is expensive in nature. It is because it computes and examines each pair of tuples in the given two relations.

Block Nested-Loop Join:

If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per-block basis, rather than on a per tuple basis. Figure 3.5 shows a block nested-loop join, which is a variant of the nested-loop join where every block of the inner relation is paired with every block of the outer relation. Within each pair of blocks, ever tuple in one block is paired with every tuple in the other block, to generate all pairs of tuples. As before, all pairs of tuples that satisfy the join condition are added to the result.

The primary difference in cost between the block nested-loop join and the basic nested-loop join is that, in the worst case, each block in the inner relation s is read only once for each block in the outer relation, instead of once for each tuple in the outer relation. Clearly, it is more efficient to use the smaller relation as the outer relation, in case neither of the relations fits in memory.

Index Nested-Loop Join

This algorithm is the same as the Nested-Loop Join, except an index file on the inner relation's (s) join attribute is used versus a data-file scan on s each index lookup in the inner loop is essentially an equality selection on s utilizing one of the selection algorithms.


Sort-Merge Join

This algorithm can be used to perform natural joins and equijoin and requires that each relation (r and s) be sorted by the common attributes between them (R^S). The details of how this algorithm works will not be presented here. However, it is notable to point out that each record in r and s is only scanned once, thus producing a worst and best-case cost of br+ ba. Variations of the Sort-Merge Join algorithm are used, for instance, when the data files are in unsorted order, but there exist secondary indices for the two relations.

Hash Join

Like the sort-merge join, the hash join algorithm can be used to perform natural joins and equi joins. The concept behind the Hash join algorithm is to partition the tuples of each given relation into sets. The partition is done on the basis of the same hash value on the join attributes. The hash function provides the hash value. The main goal of using the hash function in the algorithm is to reduce the number of comparisons and increase the efficiency to complete the join operation on the relations.

For example, suppose there are two tuples a and b where both of them satisfy the join condition. It means they have the same value for the join attributes. Suppose that both a and b tuples consist of a hash value as i. It implies that tuple a should be in ai, and tuple b should be in bi. Thus, only compare tuples in a; with b tuples of bi. There is no need to compare the b tuples in any other partition. Therefore, in this way, the hash join operation works.

Comments

Popular posts from this blog

Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. a) Draw a snowflake schema diagram for the data warehouse. b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each BigUniversity student. c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Suppose that a data warehouse consists of the four dimensions; date, spectator, location, and game, and the two measures, count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. a) Draw a star schema diagram for the data b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should perform in order to list the total charge paid by student spectators at GM Place in 2004?

Explain network topology .Explain tis types with its advantages and disadvantges.