Discuss semi-join and anti-join as operations to which nested queries may be mapped; provide an example of each.

- July 27, 2022

Semi-Join Defined

A “semi-join” between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.

Suppose you have the DEPT and EMP tables in the SCOTT schema and you want a list of departments with at least one employee. You could write the query with a conventional join:

    SELECT D.deptno, D.dname
    FROM dept D, emp E
    WHERE E.deptno = D.deptno
    ORDER BY D.deptno;

Unfortunately, if a department has 400 employees then that department will appear in the query output 400 times. You could eliminate the duplicate rows by using the DISTINCT keyword, but you would be making Oracle do more work than necessary. Really what you want to do is specify a semi-join between the DEPT and EMP tables instead of a conventional join:

    SELECT D.deptno, D.dname
    FROM dept D
    WHERE EXISTS
    (
        SELECT 1
        FROM emp E
        WHERE E.deptno = D.deptno
    )
    ORDER BY D.deptno;

The above query will list the departments that have at least one employee. Whether a department has one employee or 100, the department will appear only once in the query output. Moreover, Oracle will move on to the next department as soon as it finds the first employee in a department, instead of finding all of the employees in each department.
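The difference between the conventional join and the semi-join can be checked with a small experiment. The following is a minimal sketch, not the author's setup: it assumes Python's built-in `sqlite3` module as a stand-in for Oracle, and the sample rows are made up for illustration. It also includes the IN form of the semi-join, which the text mentions but does not show.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the SCOTT schema,
# and the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
    INSERT INTO emp  VALUES (1, 'CLARK', 10), (2, 'KING', 10), (3, 'SMITH', 20);
""")

# Conventional join: a department appears once per matching employee.
conventional_join = conn.execute("""
    SELECT D.deptno, D.dname
    FROM dept D, emp E
    WHERE E.deptno = D.deptno
    ORDER BY D.deptno
""").fetchall()

# Semi-join written with EXISTS: each department appears at most once.
semi_join_exists = conn.execute("""
    SELECT D.deptno, D.dname
    FROM dept D
    WHERE EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno)
    ORDER BY D.deptno
""").fetchall()

# The same semi-join written with IN.
semi_join_in = conn.execute("""
    SELECT D.deptno, D.dname
    FROM dept D
    WHERE D.deptno IN (SELECT E.deptno FROM emp E)
    ORDER BY D.deptno
""").fetchall()

print(conventional_join)  # department 10 is listed twice (two employees)
print(semi_join_exists)   # departments 10 and 20, once each
print(semi_join_in)       # same rows as the EXISTS form
```

With this sample data, department 10 shows up twice in the conventional join but only once in either semi-join form, and department 40 (no employees) is absent from all three results.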

Anti-Join Defined

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. An anti-join is essentially the opposite of a semi-join: while a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found. Anti-joins are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls, a subtle but very important distinction: if the subquery of a NOT IN returns even one null, the NOT IN condition can never evaluate to true and the query returns no rows, whereas a NOT EXISTS with an equality correlation simply treats a null as a non-match.
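The null-handling difference between NOT IN and NOT EXISTS is easy to miss, so a small experiment may help. This is a sketch under stated assumptions: Python's built-in `sqlite3` module stands in for Oracle, and the data, including the hypothetical employee `NEWHIRE` with a null department, is made up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (40, 'OPERATIONS');
    -- Hypothetical employee 2 has not been assigned to a department yet.
    INSERT INTO emp  VALUES (1, 'CLARK', 10), (2, 'NEWHIRE', NULL);
""")

# NOT EXISTS: the null deptno never satisfies E.deptno = D.deptno,
# so it is simply ignored and the empty department is found.
not_exists_rows = conn.execute("""
    SELECT D.deptno, D.dname FROM dept D
    WHERE NOT EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno)
""").fetchall()

# NOT IN: "40 NOT IN (10, NULL)" evaluates to unknown rather than true,
# so no department passes the filter.
not_in_rows = conn.execute("""
    SELECT D.deptno, D.dname FROM dept D
    WHERE D.deptno NOT IN (SELECT E.deptno FROM emp E)
""").fetchall()

# Excluding nulls inside the subquery makes NOT IN behave like NOT EXISTS.
not_in_filtered_rows = conn.execute("""
    SELECT D.deptno, D.dname FROM dept D
    WHERE D.deptno NOT IN
          (SELECT E.deptno FROM emp E WHERE E.deptno IS NOT NULL)
""").fetchall()

print(not_exists_rows)
print(not_in_rows)
print(not_in_filtered_rows)
```

The NOT EXISTS query reports department 40 as empty, the plain NOT IN query returns nothing at all, and the NOT IN query with the `IS NOT NULL` filter agrees with NOT EXISTS again.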

Suppose you want a list of empty departments—departments with no employees. You could write a query that finds all departments and subtracts off the department of each employee:

    SELECT D1.deptno, D1.dname
    FROM dept D1
    MINUS
    SELECT D2.deptno, D2.dname
    FROM dept D2, emp E2
    WHERE D2.deptno = E2.deptno
    ORDER BY 1;

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

    SELECT D.deptno, D.dname
    FROM dept D
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM emp E
        WHERE E.deptno = D.deptno
    )
    ORDER BY D.deptno;

The above query is also usually more efficient, because Oracle can employ an anti-join access path rather than building the full join result and subtracting it. The difference in efficiency here is akin to the difference between a nested loops join and a hash join when you are joining every row in one table to another.
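To confirm that the set-difference query and the anti-join return the same departments, the two can be run side by side. This is a minimal sketch with assumptions: Python's built-in `sqlite3` module stands in for Oracle, SQLite spells Oracle's MINUS operator as EXCEPT, and the sample rows are invented. It checks only that the results match and says nothing about Oracle's execution plans.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'),
                            (30, 'SALES'), (40, 'OPERATIONS');
    INSERT INTO emp  VALUES (1, 'CLARK', 10), (2, 'KING', 10), (3, 'SMITH', 20);
""")

# Set-difference formulation (EXCEPT is SQLite's counterpart of MINUS).
minus_rows = conn.execute("""
    SELECT D1.deptno, D1.dname FROM dept D1
    EXCEPT
    SELECT D2.deptno, D2.dname FROM dept D2, emp E2
    WHERE D2.deptno = E2.deptno
    ORDER BY 1
""").fetchall()

# Anti-join formulation with NOT EXISTS.
anti_join_rows = conn.execute("""
    SELECT D.deptno, D.dname
    FROM dept D
    WHERE NOT EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno)
    ORDER BY D.deptno
""").fetchall()

print(minus_rows)
print(anti_join_rows)
```

Both queries list departments 30 and 40, the two departments without employees in this sample data.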
