Sometimes a histogram existence may mislead to a suboptimal plan
INTRO
In one of our critical OLTP database, we faced "Latch: cache buffer chains" wait event issue(a long running query) on a commonly used page. Long running query was called inside a PL/SQL package function. To alleviate the situation, first we changed the function to a deterministic function and rewrite the query's function call into a scalar subquery allows Oracle to use scalar subquery caching to optimize the number of function calls. As these cautions could not cure the issue and segment access statistics were not even, we changed table partitioning strategy from list partitioned to hash-partitioned. For the problematic query issue was solved, but a batch job which completes in 5 minutes before, started to experience performance problems and i digged down the issue. Root cause was the change of partitioning strategy. One query suffered deeply. This is how i investigated poor query performance and cured it.
ANALYSIS
After changing the partitioning strategy optimizer started to choose nested loop in the outer query instead of hash join. Plan output and query is not available. Query is not at all easy to understand, consists of lots of different subqueries. I will not share the real query to comply with data protection regulations. Most expensive section of the query was as below. On T_ACC_DOC and T_ACC_DOC_ENT tables, full table scan is performed.
SQL> SELECT D.RECV_ACC_ID AS RECEIVER, | |
DE.ENTITY_ID AS ENTITY, | |
NVL (SUM (DE.TRANS_CNT), 0) AS TRANS_COUNT | |
FROM S_ACC_DOC_ENT DE, | |
S_ACC_DOC D, | |
S_ACC_DEF S, | |
S_ACC_DEF S2 | |
WHERE D.RECV_ACC_ID = S.ACC_ID | |
AND D.ACC_ID = S2.ACC_ID | |
AND S2.STATE_CODE = 1 | |
AND S.STATE_CODE = 1 | |
AND D.DOC_ID = DE.DOC_ID | |
AND D.IN_OUT = 'O' | |
AND D.STATE_CODE = '1' | |
AND D.RELATED_DOC_ID IS NULL | |
AND D.RELATED_DOC IS NULL | |
AND D.ACC_ID <> D.RECV_ACC_ID | |
AND D.DOC_TYPE IN ('X11', | |
'X21', | |
'X71', | |
'X81') | |
AND D.DOC_REC_DATE >= TO_DATE ('01.01.2014', 'DD.MM.YYYY') | |
GROUP BY D.RECV_ACC_ID, DE.ENTITY_ID ; | |
-------------------------------------------------------------------------------------------------------------------- | |
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | | |
-------------------------------------------------------------------------------------------------------------------- | |
| 0 | SELECT STATEMENT | | 2564K| 176M| | 1577K (1)| 00:01:02 | | |
| 1 | HASH GROUP BY | | 2564K| 176M| 196M| 1577K (1)| 00:01:02 | | |
| 2 | NESTED LOOPS | | 2564K| 176M| | 1530K (1)| 00:01:00 | | |
| 3 | NESTED LOOPS | | 2599K| 156M| | 1529K (1)| 00:01:00 | | |
|* 4 | HASH JOIN | | 2724K| 140M| 42M| 1529K (1)| 00:01:00 | | |
| 5 | JOIN FILTER CREATE | :BF0000 | 883K| 32M| | 397K (1)| 00:00:16 | | |
|* 6 | TABLE ACCESS STORAGE FULL| T_ACC_DOC | 883K| 32M| | 397K (1)| 00:00:16 | | |
| 7 | JOIN FILTER USE | :BF0000 | 178M| 2723M| | 771K (1)| 00:00:31 | | |
|* 8 | TABLE ACCESS STORAGE FULL| T_ACC_DOC_ENT | 178M| 2723M| | 771K (1)| 00:00:31 | | |
|* 9 | INDEX UNIQUE SCAN | IU_ACCDEF_ACCIDSTATECODE | 1 | 9 | | 0 (0)| 00:00:01 | | |
|* 10 | INDEX UNIQUE SCAN | IU_ACCDEF_ACCIDSTATECODE | 1 | 9 | | 0 (0)| 00:00:01 | | |
-------------------------------------------------------------------------------------------------------------------- | |
Predicate Information (identified by operation id): | |
--------------------------------------------------- | |
4 - access("D"."DOC_ID"="DE"."DOC_ID") | |
6 - storage("D"."RELATED_DOC_ID" IS NULL AND "D"."IN_OUT"='O' AND | |
"D"."DOC_REC_DATE">=TO_DATE(' 2014-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND | |
("D"."DOC_TYPE"='X11' OR "D"."DOC_TYPE"='X21' OR "D"."DOC_TYPE"='X71' OR | |
"D"."DOC_TYPE"='X81') AND "D"."RELATED_DOC" IS NULL AND "D"."STATE_CODE"='1' AND | |
"D"."ACC_ID"<>"D"."RECV_ACC_ID") | |
filter("D"."RELATED_DOC_ID" IS NULL AND "D"."IN_OUT"='O' AND | |
"D"."DOC_REC_DATE">=TO_DATE(' 2014-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND | |
("D"."DOC_TYPE"='X11' OR "D"."DOC_TYPE"='X21' OR "D"."DOC_TYPE"='X71' OR | |
"D"."DOC_TYPE"='X81') AND "D"."RELATED_DOC" IS NULL AND "D"."STATE_CODE"='1' AND | |
"D"."ACC_ID"<>"D"."RECV_ACC_ID") | |
8 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"DE"."DOC_ID")) | |
filter(SYS_OP_BLOOM_FILTER(:BF0000,"DE"."DOC_ID")) | |
9 - access("D"."ACC_ID"="S2"."ACC_ID" AND "S2"."STATE_CODE"=1) | |
10 - access("D."RECV_ACC_ID"="S"."ACC_ID" AND "S"."STATE_CODE"=1) | |
Statistics | |
---------------------------------------------------------- | |
26 recursive calls | |
26 db block gets | |
4184952 consistent gets | |
0 physical reads | |
8972 redo size | |
2746126 bytes sent via SQL*Net to client | |
75298 bytes received via SQL*Net from client | |
6742 SQL*Net roundtrips to/from client | |
0 sorts (memory) | |
0 sorts (disk) | |
101108 rows processed |
It looked pretty easy at first glance, there were two "IS NULL" predicate filters which causes full table scan operation. By default null values are not stored in index. For that reason, no index is used. FTS chosen.
Query was called inside a PL/SQL package function which is solely managed by dba team. For that reason, we have the chance to rewrite query as we want, so i examined selectivity of each condition on S_ACC_DOC table. I calculated selectivity based on the assumption of "Replacing NULL values with an unused value" as below. On RELATED_DOC_ID column, i replaced NULL values with a negative value (-1903). On RELATED_DOC column, i replaced NULL values with 'BJK'
Selectivity of DOC_TYPE IN ('X11', 'X21','X71', 'X81') => 0,175438596491228
Selectivity of NVL ("RELATED_DOC_ID", -1903)=-1903 => 0,543859649122807
Selectivity of DOC_REC_DATE >= TO_DATE ('01.01.2014', 'DD.MM.YYYY') => 0,719298245614035
Selectivity of NVL ("RELATED_DOC", 'BJK') ='BJK' => 0,929824561403509
I created a concatenated index according to the selectivity of conditions and rewrite the query as below.
SQL> CREATE INDEX ACCS.IN_ACCDOC_DCTYPRLTDDDTRLTD ON ACCS.T_ACC_DOC ( | |
DOC_TYPE, | |
NVL ("RELATED_DOC_ID", -1903), -- Casted nulls to -1903 (no negative values in table) | |
DOC_REC_DATE, | |
NVL ("RELATED_DOC", 'BJK') -- Casted nulls to BJK (no values like 'BJK' in table) | |
) | |
TABLESPACE USERS_IDX | |
ONLINE PARALLEL 64; | |
SQL> ALTER INDEX ACCS.IN_ACCDOC_DCTYPRLTDDDTRLTD NOPARALLEL; | |
SQL> SQL> SELECT D.RECV_ACC_ID AS RECEIVER, | |
DE.ENTITY_ID AS ENTITY, | |
NVL (SUM (DE.TRANS_CNT), 0) AS TRANS_COUNT | |
FROM S_ACC_DOC_ENT DE, | |
S_ACC_DOC D, | |
S_ACC_DEF S, | |
S_ACC_DEF S2 | |
WHERE D.RECV_ACC_ID = S.ACC_ID | |
AND D.ACC_ID = S2.ACC_ID | |
AND S2.STATE_CODE = 1 | |
AND S.STATE_CODE = 1 | |
AND D.DOC_ID = DE.DOC_ID | |
AND D.IN_OUT = 'O' | |
AND D.STATE_CODE = '1' | |
AND NVL(RELATED_DOC_ID,-1903)=-1903 | |
AND NVL(D.RELATED_DOC,'BJK')='BJK' | |
AND D.ACC_ID <> D.RECV_ACC_ID | |
AND D.DOC_TYPE IN ('X11', | |
'X21', | |
'X71', | |
'X81') | |
AND D.DOC_REC_DATE >= TO_DATE ('01.01.2014', 'DD.MM.YYYY') | |
GROUP BY D.RECV_ACC_ID, DE.ENTITY_ID ; | |
---------------------------------------------------------------------------------------------------------------------------- | |
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | | |
---------------------------------------------------------------------------------------------------------------------------- | |
| 0 | SELECT STATEMENT | | 1 | 98 | 109 (1)| 00:00:01 | | |
| 1 | HASH GROUP BY | | 1 | 98 | 109 (1)| 00:00:01 | | |
| 2 | NESTED LOOPS | | 1 | 98 | 108 (0)| 00:00:01 | | |
| 3 | NESTED LOOPS | | 1 | 98 | 108 (0)| 00:00:01 | | |
| 4 | NESTED LOOPS | | 1 | 77 | 104 (0)| 00:00:01 | | |
| 5 | NESTED LOOPS | | 1 | 68 | 104 (0)| 00:00:01 | | |
| 6 | INLIST ITERATOR | | | | | | | |
|* 7 | TABLE ACCESS BY INDEX ROWID BATCHED| T_ACC_DOC | 1 | 59 | 104 (0)| 00:00:01 | | |
|* 8 | INDEX RANGE SCAN | IN_ACCDOC_DCTYPRLTDDDTRLTD | 1 | | 103 (0)| 00:00:01 | | |
|* 9 | INDEX UNIQUE SCAN | IU_ACCDEF_ACCIDSTATECODE | 1 | 9 | 0 (0)| 00:00:01 | | |
|* 10 | INDEX UNIQUE SCAN | IU_ACCDEF_ACCIDSTATECODE | 1 | 9 | 0 (0)| 00:00:01 | | |
|* 11 | INDEX RANGE SCAN | IN_ACCDOCENT_ACCIDDOCID | 1 | | 3 (0)| 00:00:01 | | |
| 12 | TABLE ACCESS BY INDEX ROWID | T_ACC_DOC_ENT | 1 | 21 | 4 (0)| 00:00:01 | | |
---------------------------------------------------------------------------------------------------------------------------- | |
Predicate Information (identified by operation id): | |
--------------------------------------------------- | |
7 - filter("D"."IN_OUT"='O' AND "D"."STATE_CODE"='1' AND "D"."ACC_ID"<>"D"."RECV_ACC_ID") | |
8 - access(("D"."DOC_TYPE"='X11' OR "D"."DOC_TYPE"='X21' OR "D"."DOC_TYPE"='X71' OR | |
"D"."DOC_TYPE"='X81') AND NVL("RELATED_DOC_ID",(-1903))=(-1903) AND "D"."DOC_REC_DATE">=TO_DATE(' | |
2014-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND NVL("RELATED_DOC",'BJK')='BJK') | |
filter(NVL("RELATED_DOC",'BJK')='BJK') | |
9 - access("D"."RECV_ACC_ID"="S"."ACC_ID" AND "S"."STATE_CODE"=1) | |
10 - access("D"."ACC_ID"="S2"."ACC_ID" AND "S2"."STATE_CODE"=1) | |
11 - access("D"."ACC_ID"="DE"."ACC_ID" AND "D"."DOC_ID"="DE"."DOC_ID") | |
Statistics | |
---------------------------------------------------------- | |
30 recursive calls | |
22 db block gets | |
548422 consistent gets | |
0 physical reads | |
9772 redo size | |
2746389 bytes sent via SQL*Net to client | |
75491 bytes received via SQL*Net from client | |
6742 SQL*Net roundtrips to/from client | |
0 sorts (memory) | |
0 sorts (disk) | |
101108 rows processed |
Initially, it looked like solved. Block gets were decreased heavily %87. Also execution time decreased dramatically from 60 seconds to 1 second. But after two or three days later, batch job did not complete in time again. Problem revived. Examined the execution plan in cursor cache and saw that plan_hash_value was changed. Our concatenated index was not used in the new execution plan. Why optimizer changed its choice was the new mystery. When searching for a clue, I noticed that table statistics of one of the base tables was refreshed yesterday. Gathering of the table_stats triggered a change of execution plan and it resulted with a degraded performance for our query.
I abbreviated the sample query with the below one to simplify testing.
SQL> SELECT RECV_ACC_ID FROM S_ACC_DOC D | |
WHERE D.DOC_TYPE IN ('X11', | |
'X21', | |
'X71', | |
'X81') | |
AND NVL (D.RELATED_DOC_ID, -1903) = -1903 | |
AND D.DOC_REC_DATE >= TO_DATE ('01.01.2014', 'DD.MM.YYYY') | |
AND NVL (D.RELATED_DOC, 'BJK') = 'BJK'; | |
Elapsed: 00:00:08.51 | |
Execution Plan | |
---------------------------------------------------------- | |
Plan hash value: 2849118445 | |
---------------------------------------------------------------------------------------- | |
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | | |
---------------------------------------------------------------------------------------- | |
| 0 | SELECT STATEMENT | | 2762K| 73M| 397K (2)| 00:00:16 | | |
|* 1 | TABLE ACCESS STORAGE FULL| T_ACC_DOC | 2762K| 73M| 397K (2)| 00:00:16 | | |
---------------------------------------------------------------------------------------- | |
Predicate Information (identified by operation id): | |
--------------------------------------------------- | |
1 - storage("D"."DOC_REC_DATE">=TO_DATE(' 2014-01-01 00:00:00', | |
'syyyy-mm-dd hh24:mi:ss') AND ("D"."DOC_TYPE"='X11' OR | |
"D"."DOC_TYPE"='X21' OR "D"."DOC_TYPE"='X71' OR | |
"D"."DOC_TYPE"='X81')) | |
filter("D"."DOC_REC_DATE">=TO_DATE(' 2014-01-01 00:00:00', | |
'syyyy-mm-dd hh24:mi:ss') AND ("D"."DOC_TYPE"='X11' OR | |
"D"."DOC_TYPE"='X21' OR "D"."DOC_TYPE"='X71' OR | |
"D"."DOC_TYPE"='X81') AND NVL("RELATED_DOC_ID",(-1903))=(-1903) AND | |
NVL("RELATED_DOC",'BJK')='BJK') | |
Statistics | |
---------------------------------------------------------- | |
8 recursive calls | |
9 db block gets | |
1352052 consistent gets | |
0 physical reads | |
1728 redo size | |
1627073 bytes sent via SQL*Net to client | |
65034 bytes received via SQL*Net from client | |
5853 SQL*Net roundtrips to/from client | |
0 sorts (memory) | |
0 sorts (disk) | |
87776 rows processed |
Our concatenated index is not preferred by CBO. I dropped the index and recreated it and executed query.
SQL> SELECT RECV_ACC_ID FROM S_ACC_DOC D | |
WHERE D.DOC_TYPE IN ('X11', | |
'X21', | |
'X71', | |
'X81') | |
AND NVL (D.RELATED_DOC_ID, -1903) = -1903 | |
AND D.DOC_REC_DATE >= TO_DATE ('01.01.2014', 'DD.MM.YYYY') | |
AND NVL (D.RELATED_DOC, 'BJK') = 'BJK'; | |
87776 rows selected. | |
Elapsed: 00:00:00.43 | |
Execution Plan | |
---------------------------------------------------------- | |
Plan hash value: 3313358201 | |
----------------------------------------------------------------------------------------------------------------------- | |
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | | |
----------------------------------------------------------------------------------------------------------------------- | |
| 0 | SELECT STATEMENT | | 1 | 44 | 104 (0)| 00:00:01 | | |
| 1 | INLIST ITERATOR | | | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T_ACC_DOC | 1 | 44 | 104 (0)| 00:00:01 | | |
|* 3 | INDEX RANGE SCAN | IN_ACCDOC_DCTYPRLTDDDTRLTD | 1 | | 103 (0)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------- | |
Predicate Information (identified by operation id): | |
--------------------------------------------------- | |
3 - access(("D"."DOC_TYPE"='A11' OR "D"."DOC_TYPE"='A21' OR "D"."DOC_TYPE"='A71' OR | |
"D"."DOC_TYPE"='A81') AND NVL("RELATED_DOC_ID",(-1903))=(-1903) AND | |
"D"."DOC_REC_DATE">=TO_DATE(' 2014-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND | |
NVL("RELATED_DOC",'BJK')='BJK') | |
filter(NVL("RELATED_DOC",'BJK')='BJK') | |
Statistics | |
---------------------------------------------------------- | |
382 recursive calls | |
5 db block gets | |
90813 consistent gets | |
472 physical reads | |
37732 redo size | |
1627073 bytes sent via SQL*Net to client | |
65034 bytes received via SQL*Net from client | |
5853 SQL*Net roundtrips to/from client | |
6 sorts (memory) | |
0 sorts (disk) | |
87776 rows processed |
Strange but true, index is preferred again and query performed better. I gathered table statistics (EXEC dbms_stats.gather_table_stats( 'ACCS', 'T_ACC_DOC')) and executed query again and query performed poorly. That was reproducible.
I tried to clarify why optimizer changes the decision after statistics gathered freshly. And compared the 10053 analysis of the query, before and after of the statistics gather operation. I noticed that the calculations are made according to the hybrid histogram statistics after statistics gathered freshly. Plan statistics of the query is as below.
SQL> SELECT /*+ gather_plan_statistics */ | |
RECV_ACC_ID | |
FROM S_ACC_DOC D | |
WHERE D.DOC_TYPE IN ('X11', | |
'X21', | |
'X71', | |
'X81') | |
AND D.DOC_REC_DATE >= TO_DATE ('01.01.2021', 'DD.MM.YYYY') | |
AND NVL (D.RELATED_DOC_ID, -1903) = -1903 | |
AND NVL (D.RELATED_DOC, 'BJK') = 'BJK'; | |
SQL> select * from dbms_xplan.display_cursor('bk5qq5jnhxs0g',0,'+ALLSTATS LAST') | |
PLAN_TABLE_OUTPUT | |
----------------- | |
SQL_ID bk5qq5jnhxs0g, child number 0 | |
------------------------------------- | |
SELECT /*+ gather_plan_statistics */ RECV_ACC_ID FROM | |
S_ACC_DOC D WHERE D.DOC_TYPE IN ('X11', 'X21', | |
'X71', 'X81') AND D.DOC_REC_DATE >= TO_DATE | |
('01.01.2021', 'DD.MM.YYYY') AND NVL (D.RELATED_DOC_ID, | |
-1903) = -1903 AND NVL (D.RELATED_DOC, 'BJK') = 'BJK' | |
Plan hash value: 2849118445 | |
-------------------------------------------------------------------------------------------------- | |
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | | |
-------------------------------------------------------------------------------------------------- | |
| 0 | SELECT STATEMENT | | 1 | | 85619 |00:00:01.42 | 1351K| | |
|* 1 | TABLE ACCESS STORAGE FULL| T_ACC_DOC | 1 | 623K| 85619 |00:00:01.42 | 1351K| | |
-------------------------------------------------------------------------------------------------- | |
Predicate Information (identified by operation id): | |
--------------------------------------------------- | |
1 - storage(("D"."DOC_REC_DATE">=TO_DATE(' 2021-01-01 00:00:00', 'syyyy-mm-dd | |
hh24:mi:ss') AND INTERNAL_FUNCTION("D"."DOC_TYPE"))) | |
filter(("D"."DOC_REC_DATE">=TO_DATE(' 2021-01-01 00:00:00', 'syyyy-mm-dd | |
hh24:mi:ss') AND INTERNAL_FUNCTION("D"."DOC_TYPE") AND | |
NVL("RELATED_DOC_ID",(-1903))=(-1903) AND NVL("RELATED_DOC",'BJK')='BJK')) |
Estimated rows(623000) and affected rows(85619) differ highly. Optimizer estimate cardinalities looks a little bit inaccurate and i tried with /*+ dynamic_sampling(11) */ hint whether optimizer may generate an optimal execution plan. Here is the output.
SQL> SELECT /*+ dynamic_sampling(11) gather_plan_statistics */ | |
RECV_ACC_ID | |
FROM S_ACC_DOC B | |
WHERE D.DOC_TYPE IN ('X11', | |
'X21', | |
'X71', | |
'X81') | |
AND D.DOC_REC_DATE >= TO_DATE ('01.01.2021', 'DD.MM.YYYY') | |
AND NVL (D.RELATED_DOC_ID, -1903) = -1903 | |
AND NVL (D.RELATED_DOC, 'BJK') = 'BJK'; | |
SQL> select * from dbms_xplan.display_cursor('brgh4jm2xppfn',0,'+ALLSTATS LAST'); | |
SQL_ID brgh4jm2xppfn, child number 0 | |
------------------------------------- | |
SELECT /*+ dynamic_sampling(11) gather_plan_statistics */ | |
RECV_ACC_ID FROM S_ACC_DOC B WHERE D.DOC_TYPE IN | |
('X11', 'X21', 'X71', 'X81') AND | |
D.DOC_REC_DATE >= TO_DATE ('01.01.2021', 'DD.MM.YYYY') AND | |
NVL (D.RELATED_DOC_ID, -1903) = -1903 AND NVL | |
(D.RELATED_DOC, 'BJK') = 'BJK' | |
Plan hash value: 3313358201 | |
--------------------------------------------------------------------------------------------------------------------------------- | |
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | | |
--------------------------------------------------------------------------------------------------------------------------------- | |
| 0 | SELECT STATEMENT | | 1 | | 85619 |00:00:00.13 | 89215 | | |
| 1 | INLIST ITERATOR | | 1 | | 85619 |00:00:00.13 | 89215 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T_ACC_DOC | 4 | 86000 | 85619 |00:00:00.13 | 89215 | | |
|* 3 | INDEX RANGE SCAN | IN_ACCDOC_DCTYPRLTDDDTRLTD | 4 | 85619 | 85619 |00:00:00.03 | 6082 | | |
--------------------------------------------------------------------------------------------------------------------------------- | |
Predicate Information (identified by operation id): | |
--------------------------------------------------- | |
3 - access((("D"."DOC_TYPE"='X11' OR "D"."DOC_TYPE"='X21' OR "D"."DOC_TYPE"='X71' OR | |
"D"."DOC_TYPE"='X81')) AND "D"."SYS_NC00076$"=(-1903) AND "D"."DOC_REC_DATE">=TO_DATE(' 2021-01-01 00:00:00', | |
'syyyy-mm-dd hh24:mi:ss') AND "D"."SYS_NC00077$"='BJK') | |
filter("D"."SYS_NC00077$"='BJK') | |
Note | |
----- | |
- dynamic statistics used: dynamic sampling (level=AUTO) | |
85619 rows processed |
When dynamic statistics with level 11 used, Optimizer estimations have been more accurate(86000), almost perfect. On predicate information, SYS_NC00076$ and SYS_NC00077$ virtual columns are used. With default dynamic_statistics level 2 is in charge, what happened step by step is explained here.
* A concatenated index (IN_ACCDOC_DCTYPRLTDDDTRLTD) created according to filter conditions.
* In this concatenated index, 4 columns are referenced from base table(T_ACC_DOC), two of them are referenced in NVL function.
* Oracle created two virtual columns internally for the NVL functions. (SYS_NC00076$, SYS_NC00077$) This is default behaviour when a formula or expression is used. These virtual columns may also be seen in dba_tab_cols dictionary view and will not be seen in dba_tab_columns view.
* As no table statistics gathering preference is in use for T_ACC_DOC table, table statistics are collected with default parameters (METHOD_OPT => 'FOR ALL COLUMNS AUTO').
* Oracle evaluated column usage statistics (select * from sys.col_usage$ where obj#=83662 and intcol# in (76,77)) and decided to collect a hybrid histogram statistic to help optimizer make better estimates. As problematic query's predicate filters are recorded in col_usage$ table, Oracle was aware of these virtual columns statistics may be useful for subsequent executions. Nigel Bayliss blog post (How does the Oracle Database choose what histograms to create?) has more information about it.
* Also as data is highly skewed on some values such as NVL ("RELATED_DOC_ID", -1903)=-1903 and NVL ("RELATED_DOC", 'BJK')=BJK. These values are spotted as popular values in histogram statistics.
* In the light of all these information, Full table scan looked reasonable to CBO, when -1903 and BJK values are queried all together.
We have lots of possible solutions, some of them are-- Set table preference not to collect histogram on specified column(virtual) | |
BEGIN | |
DBMS_STATS.SET_TABLE_PREFS ('ACCS' ,'T_ACC_DOC','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 SYS_NC00076$,SYS_NC00077$'); | |
END; | |
-- To reset table statistics collection preference | |
BEGIN | |
DBMS_STATS.DELETE_TABLE_PREFS ('ACCS' ,'T_ACC_DOC','METHOD_OPT'); | |
END; |
I was just wondering what level of dynamic sampling oracle decided to go with when you set it to 11. Was it available from the optimizer trace?
ReplyDeleteSorry Fadai, Unfortunately trace is not available. If i have a chance, i will try to check.
Delete