IN vs EXISTS in Oracle

IN vs EXISTS in Oracle

This is really a typical question as the oracle optimizer is changing the plans in optimized version by version by rewriting the query, but still I have done some work on this.

I am using 11.2.0.1 version.

First create the below tables in the database.

CREATE TABLE T1 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

CREATE TABLE T2 (
  C1 NUMBER,
  FILLER VARCHAR2(300),
  PRIMARY KEY (C1));

INSERT INTO
  T1
SELECT
  ROWNUM,
  LPAD(‘A’,300,’A’)
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=1000) V2;

INSERT INTO
  T2
SELECT
  ROWNUM*3,
  LPAD(‘A’,300,’A’)
FROM
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=333) V1,
  (SELECT
    ROWNUM NR
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10) V2;

COMMIT;

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>’T1′);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER, TABNAME=>’T2′);


Now we have more number of rows in first table T1 and less number of rows in T2.

Now I am disabling the constraint for the tables. So that it will automatically remove the index.(disable).

alter table T1 disable constraint SYS_C0011053;
alter table T2 disable constraint SYS_C0011054;


explain plan for DELETE FROM T1 WHERE C1 IN (SELECT C1 FROM T2);

PLAN_TABLE_OUTPUT
——————————————————————–
Plan hash value: 3070334650
——————————————————————–
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT      |      |   333K|  3251K|       | 17278   (1)| 00:03:28 |
|   1 |  DELETE               | T1   |       |       |       |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|      |   333K|  3251K|  5536K| 17278   (1)| 00:03:28 |
|   3 |    TABLE ACCESS FULL  | T2   |   333K|  1625K|       |  4129   (1)| 00:00:50 |
|   4 |    TABLE ACCESS FULL  | T1   |  1000K|  4882K|       | 12058   (1)| 00:02:25 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   2 – access(“C1″=”C1”)

 
SQL> explain plan for DELETE FROM T1 WHERE exists (SELECT 1 FROM T2 where T2.C1=T1.C1);

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
——————————————————————–
Plan hash value: 3070334650
——————————————————————–
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT      |      |   333K|  3251K|       | 17278   (1)| 00:03:28 |
|   1 |  DELETE               | T1   |       |       |       |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|      |   333K|  3251K|  5536K| 17278   (1)| 00:03:28 |
|   3 |    TABLE ACCESS FULL  | T2   |   333K|  1625K|       |  4129   (1)| 00:00:50 |
|   4 |    TABLE ACCESS FULL  | T1   |  1000K|  4882K|       | 12058   (1)| 00:02:25 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   2 – access(“T2″.”C1″=”T1”.”C1″)

16 rows selected.

SQL> explain plan for DELETE FROM T2 WHERE exists (SELECT 1 FROM T1 where T2.C1=T1.C1);

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
——————————————————————–
Plan hash value: 3678400186
——————————————————————–
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT    |      |   333K|  3251K|       | 17278   (1)| 00:03:28 |
|   1 |  DELETE             | T2   |       |       |       |            |          |
|*  2 |   HASH JOIN SEMI    |      |   333K|  3251K|  5536K| 17278   (1)| 00:03:28 |
|   3 |    TABLE ACCESS FULL| T2   |   333K|  1625K|       |  4129   (1)| 00:00:50 |
|   4 |    TABLE ACCESS FULL| T1   |  1000K|  4882K|       | 12058   (1)| 00:02:25 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   2 – access(“T2″.”C1″=”T1”.”C1″)

16 rows selected.

All the queries are having the same exection plan , doing full table scan on all the tables.

SQL> delete from t1 where c1 in (select c1 from t2);

333000 rows deleted.

Elapsed: 00:00:33.87

SQL> delete from t1 where exists (select 1 from t2 where t1.c1=t2.c1);

333000 rows deleted.

Elapsed: 00:00:33.14

“Both the execution times are same but little bit IN cluase is completed in less time than exists”.

Now I am enabling the constraint on the tables which automatically enable the index on the table.


SQL> explain plan for delete from t1 where c1 in (select c1 from t2);

Explained.

Elapsed: 00:00:00.02
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
——————————————————————–
Plan hash value: 55652975

——————————————————————–
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT    |              |   333K|  3251K|  2238   (7)| 00:00:27 |
|   1 |  DELETE             | T1           |       |       |            |          |
|   2 |   NESTED LOOPS      |              |   333K|  3251K|  2238   (7)| 00:00:27 |
|   3 |    INDEX FULL SCAN  | SYS_C0011053 |  1000K|  4882K|  2104   (1)| 00:00:26 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0011054 |     1 |     5 |     0   (0)| 00:00:01 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   4 – access(“C1″=”C1”)

16 rows selected.


SQL> delete from t1 where c1 in (select c1 from t2);

333000 rows deleted.

Elapsed: 00:00:22.29

SQL> explain plan for delete from t1 where exists (select 1 from t2 where t1.c1=t2.c1);

Explained.

Elapsed: 00:00:00.05
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
——————————————————————–

Plan hash value: 337034320

——————————————————————–
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT    |              |   333K|  3251K|  2238   (7)| 00:00:27 |
|   1 |  DELETE             | T1           |       |       |            |          |
|   2 |   NESTED LOOPS SEMI |              |   333K|  3251K|  2238   (7)| 00:00:27 |
|   3 |    INDEX FULL SCAN  | SYS_C0011053 |  1000K|  4882K|  2104   (1)| 00:00:26 |
|*  4 |    INDEX UNIQUE SCAN| SYS_C0011054 |   110K|   541K|     0   (0)| 00:00:01 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   4 – access(“T1″.”C1″=”T2”.”C1″)

16 rows selected.

SQL> delete from t1 where exists (select 1 from t2 where t1.c1=t2.c1);

333000 rows deleted.

Elapsed: 00:00:49.30


NOTE : If the sub query is retrieving less number of rows then “IN” is working better than “EXISTS”.

====================================================================

Now I am testing the deletion with sub query retrieving more rows in subquery.

SQL> explain plan for delete from t2 where c1 in (select c1 from t1);

Explained.

Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
——————————————————————–
Plan hash value: 2291474641
——————————————————————–
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT       |              |   333K|  3251K|   710  (21)| 00:00:09 |
|   1 |  DELETE                | T2           |       |       |            |          |
|   2 |   NESTED LOOPS         |              |   333K|  3251K|   710  (21)| 00:00:09 |
|   3 |    INDEX FAST FULL SCAN| SYS_C0011053 |  1000K|  4882K|   576   (2)| 00:00:07 |
|*  4 |    INDEX UNIQUE SCAN   | SYS_C0011054 |     1 |     5 |     0   (0)| 00:00:01 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   4 – access(“C1″=”C1”)

16 rows selected.


SQL> delete from t2 where c1 in (select c1 from t1);

333000 rows deleted.

Elapsed: 00:00:25.28

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
——————————————————————–

SQL_ID  4nqvaj3zafkv1, child number 0
————————————-
delete from t2 where exists (select 1 from t1 where t1.c1=t2.c1)

Plan hash value: 3082207826

——————————————————————–
| Id  | Operation              | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
——————————————————————–
|   0 | DELETE STATEMENT       |              |       |       |       |  2368 (100)|          |
|   1 |  DELETE                | T2           |       |       |       |            |          |
|*  2 |   HASH JOIN SEMI       |              |   333K|  3251K|  5536K|  2368   (2)| 00:00:29 |
|   3 |    INDEX FULL SCAN     | SYS_C0011054 |   333K|  1625K|       |   701   (1)| 00:00:09 |
|   4 |    INDEX FAST FULL SCAN| SYS_C0011053 |  1000K|  4882K|       |   576   (2)| 00:00:07 |
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

   2 – access(“T1″.”C1″=”T2”.”C1″)


SQL> delete from t2 where exists (select 1 from t1 where t1.c1=t2.c1);

333000 rows deleted.

Elapsed: 00:00:27.30

Here I have observed that again IN is performing better than the exists clause.

Now I have truncated the table t2 and inserted less number of rows  than existing and tested the same and found that exists is working is better than the IN clause.

I reduced the rows from 3 lac to 33 thousand.

SQL> select count(1) from t2;

  COUNT(1)
———-
     33300

Elapsed: 00:00:40.71
SQL>
SQL> exec dbms_stats.gather_table_stats(‘TEST’,’T2′);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.22
SQL>
SQL> delete from t2 where exists (select 1 from t1 where t1.c1=t2.c1);

33300 rows deleted.

Elapsed: 00:00:00.78
SQL> rollback;

Rollback complete.

Elapsed: 00:00:00.26
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:02.05
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.18
SQL> delete from t2 where c1 in  (select c1 from t1);

33300 rows deleted.


Elapsed: 00:00:02.04

This is entirely depends on the optimizer how it is generating the plans.

Finally the conclusion is :-

*) If we do not have any index on the column we are using to join there is no change in the execution time by IN or EXISTS.

If we have created index on the column we use in the query to join.


*) If the subquery is giving less number of rows and outer table has more number of rows “IN clause is working better than EXISTS”.


*) If the subquery is giving more rows and outer query has less number of rows “EXISTS is working better than IN”.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s