Join cardinality computation case

      4 Comments on Join cardinality computation case

Hi there
Giving an SQL tuning training session, I came to the following point:
This query:
 
 
[code language=”sql”]
select count(prs.firstname),
count(adr.zip_additional),
count(mlg.dispatch_date)
from addresses adr
join recipients rec on (rec.adr_id = adr.id)
join mailings mlg on (mlg.id = rec.mlg_id)
join persons prs on (prs.id = adr.prs_id)
join formsofaddress foa on (foa.id = prs.foa_id)
where mlg.product = ‘Easter bunny’
and adr.zip like ‘8%’;
[/code]
Gives this plan:
[code language=”sql”]
Plan hash value: 13115236
—————————————————————————————————————–
| Id  | Operation                       | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
—————————————————————————————————————–
|   0 | SELECT STATEMENT                |            |      1 |        |      1 |00:00:00.13 |    3233 |    177 |
|   1 |  SORT AGGREGATE                 |            |      1 |      1 |      1 |00:00:00.13 |    3233 |    177 |
|*  2 |   HASH JOIN                     |            |      1 |   5741 |     35 |00:00:00.23 |    3233 |    177 |
|*  3 |    HASH JOIN                    |            |      1 |   5741 |     35 |00:00:00.08 |    2225 |    108 |
|*  4 |     TABLE ACCESS FULL           | ADDRESSES  |      1 |  29677 |  27211 |00:00:00.06 |    2212 |    107 |
|   5 |     NESTED LOOPS                |            |      1 |  49384 |    324 |00:00:00.01 |      13 |      1 |
|   6 |      NESTED LOOPS               |            |      1 |  49384 |    324 |00:00:00.01 |      11 |      1 |
|*  7 |       TABLE ACCESS FULL         | MAILINGS   |      1 |      1 |      1 |00:00:00.01 |       7 |      1 |
|*  8 |       INDEX RANGE SCAN          | REC_MLG_ID |      1 |  49384 |    324 |00:00:00.01 |       4 |      0 |
|   9 |      TABLE ACCESS BY INDEX ROWID| RECIPIENTS |    324 |  49384 |    324 |00:00:00.01 |       2 |      0 |
|  10 |    TABLE ACCESS FULL            | PERSONS    |      1 |    154K|    154K|00:00:00.04 |    1008 |     69 |
—————————————————————————————————————–
[/code]
Considering that the solution is:
[code language=”sql”]
select /*+   leading(mlg) use_nl (adr prs)  */ count(prs.firstname),
       count(adr.zip_additional),
       count(mlg.dispatch_date)
from   addresses adr
       join recipients rec on (rec.adr_id = adr.id)
       join mailings mlg on (mlg.id = rec.mlg_id)
       join persons prs on (prs.id = adr.prs_id)
       join formsofaddress foa on (foa.id = prs.foa_id)
where  mlg.product = ‘Easter bunny’
and    adr.zip like ‘8%’;
SELECT * FROM table(dbms_xplan.display_cursor(NULL, NULL, ‘iostats last’));
[/code]
Giving the following execution plan:
Plan hash value: 79467701
[code language=”sql”]
—————————————————————————————————————–
  Id  | Operation                                | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
—————————————————————————————————————-
|   0 | SELECT STATEMENT                         |            |      1 |        |      1 |00:00:00.01 |     735 |
|   1 |  SORT AGGREGATE                          |            |      1 |      1 |      1 |00:00:00.01 |     735 |
|   2 |   NESTED LOOPS                           |            |      1 |   5741 |     35 |00:00:00.01 |     735 |
|   3 |    NESTED LOOPS                          |            |      1 |   5741 |     35 |00:00:00.01 |     700 |
|   4 |     NESTED LOOPS                         |            |      1 |   5741 |     35 |00:00:00.01 |     663 |
|   5 |      NESTED LOOPS                        |            |      1 |  49384 |    324 |00:00:00.01 |      13 |
|*  6 |       TABLE ACCESS FULL                  | MAILINGS   |      1 |      1 |      1 |00:00:00.01 |       7 |
|   7 |       TABLE ACCESS BY INDEX ROWID BATCHED| RECIPIENTS |      1 |  49384 |    324 |00:00:00.01 |       6 |
|*  8 |        INDEX RANGE SCAN                  | REC_MLG_ID |      1 |  49384 |    324 |00:00:00.01 |       4 |
|*  9 |      TABLE ACCESS BY INDEX ROWID         | ADDRESSES  |    324 |      1 |     35 |00:00:00.01 |     650 |
|* 10 |       INDEX UNIQUE SCAN                  | ADR_PK     |    324 |      1 |    324 |00:00:00.01 |     326 |
|* 11 |     INDEX UNIQUE SCAN                    | PRS_PK     |     35 |      1 |     35 |00:00:00.01 |      37 |
|  12 |    TABLE ACCESS BY INDEX ROWID           | PERSONS    |     35 |      1 |     35 |00:00:00.01 |      35 |
—————————————————————————————————————–
[/code]
We have frequency histogram on the column MLG_ID with 10 buckets as we have 10 distinct values.
The cardinality with mlg=id = 3 is 324. So 324 is the result of the join between mlg and rec base on mlg.product = ‘Easter bunny’
The estimate of the optimizer is the same in 12c or 11g.
Join Cardinality = Join Selectivity * cardinality rec * cardinality mlg
where join Selectivity = 1 / greater(num_distinct(rec.mlg_id), num_distinct(mlg.id))  1 – (greater (10,10) = 0.1
Join cardinality = 0.1 * 493835 * 1 (1 because mlg.product = ‘Easter bunny’) = 49384
Question behind, how comes the optimizer at parse cannot use the good cardinality that comes from the mailings table and do a projection base on histograms on recipients to directly have the good estimate?
I mean, at this stage, it has everything to clearly evaluate precisely the join!
Now if I just add “and rec.mlg_id=3″ like this
[code language=”sql”]
select /*+  leading(mlg) use_nl (adr prs)  */ count(prs.firstname),
       count(adr.zip_additional),
       count(mlg.dispatch_date)
from   addresses adr
       join recipients rec on (rec.adr_id = adr.id)
       join mailings mlg on (mlg.id = rec.mlg_id)
       join persons prs on (prs.id = adr.prs_id)
       join formsofaddress foa on (foa.id = prs.foa_id)
where  mlg.product = ‘Easter bunny’ and rec.mlg_id=3
and    adr.zip like ‘8%’;
SELECT * FROM table(dbms_xplan
[/code]
 
This is the plan I have in 12c
[code language=”sql”]
————————————————————————————————————————–
| Id  | Operation                                | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
————————————————————————————————————————–
|   0 | SELECT STATEMENT                         |            |      1 |        |      1 |00:00:00.01 |     730 |      1 |
|   1 |  SORT AGGREGATE                          |            |      1 |      1 |      1 |00:00:00.01 |     730 |      1 |
|   2 |   NESTED LOOPS                           |            |      1 |    324 |     35 |00:00:00.01 |     730 |      1 |
|   3 |    NESTED LOOPS                          |            |      1 |    324 |     35 |00:00:00.01 |     695 |      1 |
|   4 |     NESTED LOOPS                         |            |      1 |    324 |     35 |00:00:00.01 |     658 |      1 |
|   5 |      NESTED LOOPS                        |            |      1 |    324 |    324 |00:00:00.01 |       8 |      0 |
|*  6 |       TABLE ACCESS BY INDEX ROWID        | MAILINGS   |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |
|*  7 |        INDEX UNIQUE SCAN                 | MLG_PK     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
|   8 |       TABLE ACCESS BY INDEX ROWID BATCHED| RECIPIENTS |      1 |    324 |    324 |00:00:00.01 |       6 |      0 |
|*  9 |        INDEX RANGE SCAN                  | REC_MLG_ID |      1 |    324 |    324 |00:00:00.01 |       4 |      0 |
|* 10 |      TABLE ACCESS BY INDEX ROWID         | ADDRESSES  |    324 |      1 |     35 |00:00:00.01 |     650 |      1 |
|* 11 |       INDEX UNIQUE SCAN                  | ADR_PK     |    324 |      1 |    324 |00:00:00.01 |     326 |      0 |
|* 12 |     INDEX UNIQUE SCAN                    | PRS_PK     |     35 |      1 |     35 |00:00:00.01 |      37 |      0 |
|  13 |    TABLE ACCESS BY INDEX ROWID           | PERSONS    |     35 |      1 |     35 |00:00:00.01 |      35 |      0 |
————————————————————————————————————————–
[/code]
Which is a good estimation using the histogram on mlg_id!
But in 11.2.0.3 it gives this:
[code language=”sql”]
——————————————————————————————————————-
| Id  | Operation                         | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
——————————————————————————————————————-
|   0 | SELECT STATEMENT                  |            |      1 |        |      1 |00:00:00.02 |     735 |    368 |
|   1 |  SORT AGGREGATE                   |            |      1 |      1 |      1 |00:00:00.02 |     735 |    368 |
|   2 |   NESTED LOOPS                    |            |      1 |        |     35 |00:00:00.02 |     735 |    368 |
|   3 |    NESTED LOOPS                   |            |      1 |    454 |     35 |00:00:00.02 |     700 |    346 |
|   4 |     NESTED LOOPS                  |            |      1 |    454 |     35 |00:00:00.02 |     663 |    336 |
|*  5 |      HASH JOIN                    |            |      1 |    454 |    324 |00:00:00.01 |      13 |      9 |
|*  6 |       TABLE ACCESS BY INDEX ROWID | MAILINGS   |      1 |      1 |      1 |00:00:00.01 |       2 |      1 |
|*  7 |        INDEX UNIQUE SCAN          | MLG_PK     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
|   8 |       TABLE ACCESS BY INDEX ROWID | RECIPIENTS |      1 |    454 |    324 |00:00:00.01 |      11 |      8 |
|   9 |        BITMAP CONVERSION TO ROWIDS|            |      1 |        |    324 |00:00:00.01 |       3 |      0 |
|* 10 |         BITMAP INDEX SINGLE VALUE | REC_MLG_ID |      1 |        |      1 |00:00:00.01 |       3 |      0 |
|* 11 |      TABLE ACCESS BY INDEX ROWID  | ADDRESSES  |    324 |      1 |     35 |00:00:00.02 |     650 |    327 |
|* 12 |       INDEX UNIQUE SCAN           | ADR_PK     |    324 |      1 |    324 |00:00:00.01 |     326 |     21 |
|* 13 |     INDEX UNIQUE SCAN             | PRS_PK     |     35 |      1 |     35 |00:00:00.01 |      37 |     10 |
|  14 |    TABLE ACCESS BY INDEX ROWID    | PERSONS    |     35 |      1 |     35 |00:00:00.01 |      35 |     22 |
——————————————————————————————————————-
[/code]
Why 454? Where this is coming from?  Why not using the histogram?
Anyone have an opinion on this?
Cheers
jko


     

Leave a Reply

Your email address will not be published. Required fields are marked *