discuss semi-join and anti-join as operations to which nested queries may be mapped; provide an example of each.

 Semi-Join Defined

A “semi-join” between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.

Suppose you have the DEPT and EMP tables in the SCOTT schema and you want a list of departments with at least one employee. You could write the query with a conventional join:


        SELECT   D.deptno, D.dname

        FROM     dept D, emp E

        WHERE    E.deptno = D.deptno

        ORDER BY D.deptno;

Unfortunately, if a department has 400 employees then that department will appear in the query output 400 times. You could eliminate the duplicate rows by using the DISTINCT keyword, but you would be making Oracle do more work than necessary. Really what you want to do is specify a semi-join between the DEPT and EMP tables instead of a conventional join:


        SELECT   D.deptno, D.dname

        FROM     dept D

        WHERE    EXISTS

                 (

                 SELECT 1

                 FROM   emp E

                 WHERE  E.deptno = D.deptno

                 )

        ORDER BY D.deptno;

The above query will list the departments that have at least one employee. Whether a department has one employee or 100, the department will appear only once in the query output. Moreover, Oracle will move on to the next department as soon as it finds the first employee in a department, instead of finding all of the employees in each department.


Anti-Join Defined

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. An anti-join is essentially the opposite of a semi-join: While a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found. Anti-joins are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls—a subtle but very important distinction that we will discuss later.

Suppose you want a list of empty departments—departments with no employees. You could write a query that finds all departments and subtracts off the department of each employee:


        SELECT   D1.deptno, D1.dname

        FROM     dept D1

        MINUS

        SELECT   D2.deptno, D2.dname

        FROM     dept D2, emp E2

        WHERE    D2.deptno = E2.deptno

        ORDER BY 1;

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

        SELECT   D.deptno, D.dname

        FROM     dept D

        WHERE    NOT EXISTS

                 (

                 SELECT 1

                 FROM   emp E

                 WHERE  E.deptno = D.deptno

                )

           ORDER BY D.deptno;

The above query is more efficient because Oracle can employ an anti-join access path. The difference in efficiency here is akin to the difference between a nested loops join and a hash join when you are joining every row in one table to another.

Comments

Popular posts from this blog

Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. a) Draw a snowflake schema diagram for the data warehouse. b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each BigUniversity student. c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Suppose that a data warehouse consists of the four dimensions; date, spectator, location, and game, and the two measures, count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. a) Draw a star schema diagram for the data b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should perform in order to list the total charge paid by student spectators at GM Place in 2004?

Explain network topology .Explain tis types with its advantages and disadvantges.