Oracle - Adaptive Query Optimization(Adaptive Plans)
by Yasser Khan
Introduction
Adaptive Plans is one of the new feature in introduced in 12c and is part of Adaptive Query Optimization feature. Adaptive Plans can be further categorized into two methods as Join method and Parallel distribution methods. In this article we will focus only on Adaptive Join method.
Adaptive Join method is an mechanism to learn from its mistakes and got introduced into Optimizer to perform run-time adjustments for the execution plans when the derived plan through statistics is not optimal. Thus Optimizer will defer its decision of chosing the final join method to the execution phase of the sql. In earlier releases this was only possible at parse phase of the sql and once sql is into execution phase plan is fixed and it can’t be changed. It is important to note that Adaptive plan will delay its decision of chosing final join method untill execution phase and introduces the new operation in the execution plan called as 'STATISTICS COLLECTOR'
which buffers the rows and switches from the default estimated join method when buffered rows are more than the inflection point.
So it is not that executing join method will change to other join method on the fly, infact it is the decision of final join method which is evaluated at run-time. When final join method is decided then 'STATISTICS COLLECTOR'
will be disabled.
Adaptive plan in action:
Table DEMO
is having below structure along with index on column LOCATION
SQL> desc demo
Name Null? Type
-------------------- -------- ----------------------------
SSN NUMBER
LOCATION NUMBER
DETAILS VARCHAR2(200)
The data distribution is as follows.
select count(*),location from demo group by location order by 1;
COUNT(*) LOCATION
---------- ----------
10 10
100 20
1000 30
10000 40
100000 50
1000000 60
10000000 70
Now let’s execute below sql and check its execution plan. Since LOCATION
column data is skewed Optimizer will not be able to estimate the cardinality precisely and thus it may perform wrong join method.
SELECT /*+ gather_plan_statistics */ count(*)
FROM DEMO a,
DEMO b
WHERE a.ssn = b.ssn AND
a.location = 40 AND
b.location = 50;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 9w38rcsj3ph39, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM DEMO a,DEMO b WHERE a.ssn = b.ssn AND a.location = 40 AND b.location = 50
Plan hash value: 2967406341
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.64 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.64 |
|* 2 | HASH JOIN | | 1 | 1587K| 0 |00:00:00.64 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 10000 |00:00:00.01 |
|* 4 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 10000 |00:00:00.01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 100K|00:00:00.07 |
|* 6 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 100K|00:00:00.03 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."SSN"="B"."SSN")
4 - access("A"."LOCATION"=40)
6 - access("B"."LOCATION"=50)
Note
-----
- this is an adaptive plan
So the note section says this plan is the adaptive planand and HASH JOIN has been used as the final joining method, but we dont know yet what was the estimated plan and where did the 'STATISTICS COLLECTOR'
occur in this execution plan. Also note that Actual and Estimated rows are still way off, this concludes that Adaptive joins are meant only for deciding final join method but not for rectifying estimated cardinality. To get all these details we can use ‘adaptive’ as reporting type in dbms_xplan
package.
SQL> select * from table(dbms_xplan.display_cursor('9w38rcsj3ph39',null,'adaptive allstats'));
PLAN_TABLE_OUTPUT
-----------------
SQL_ID 9w38rcsj3ph39, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM DEMO a, DEMO b WHERE a.ssn = b.ssn AND a.location = 40 AND b.location = 50
Plan hash value: 2967406341
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.64 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.64 |
| * 2 | HASH JOIN | | 1 | 1587K| 0 |00:00:00.64 |
|- 3 | NESTED LOOPS | | 1 | 1587K| 10000 |00:00:00.01 |
|- 4 | NESTED LOOPS | | 1 | | 10000 |00:00:00.01 |
|- 5 | STATISTICS COLLECTOR | | 1 | | 10000 |00:00:00.01 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 10000 |00:00:00.01 |
| * 7 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 10000 |00:00:00.01 |
|- * 8 | INDEX RANGE SCAN | DEMO_IDX | 0 | 1587K| 0 |00:00:00.01 |
|- * 9 | TABLE ACCESS BY INDEX ROWID | DEMO | 0 | 1 | 0 |00:00:00.01 |
| 10 | TABLE ACCESS BY INDEX ROWID BATCHED | DEMO | 1 | 1587K| 100K|00:00:00.07 |
| * 11 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 100K|00:00:00.03 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."SSN"="B"."SSN")
7 - access("A"."LOCATION"=40)
8 - access("B"."LOCATION"=50)
9 - filter("A"."SSN"="B"."SSN")
11 - access("B"."LOCATION"=50)
Note
-----
- this is an adaptive plan (rows marked '-' are inactive)
Adaptive report says Nested Loop join method was not chosen and 'STATISTICS COLLECTOR'
was placed as parent operation for line ID 6 and 7. According to Predicate information line ID 7 is having ‘access’ predicate for "A"."LOCATION"=40
. In principle 'STATISTICS COLLECTOR'
will buffer the rows from the index access path of DEMO_IDX
for the LOCATION
value 40 and if total number of buffered rows exceeds the inflection point then HASH JOIN
will be chosed else NESTED LOOP
join will be chosed as final join method.
But what is the inflection point and how do I find it? Enable 10053 tracing and search for lines starting with AP/DP as shown below.
SQL> !egrep "^AP:|^DP:" cdbt0001_ora_21971.trc
AP: Checking validity for query block SEL$1, sqlid=9w38rcsj3ph39.
AP: Computing costs for inflection point at min value 0.00
AP: Using binary search for inflection point search
AP: Costing Nested Loops Join for inflection point at card 0.00
AP: Costing Hash Join for inflection point at card 0.00
AP: lcost=108275.29, rcost=108278.83
AP: Computing costs for inflection point at max value 1587301.43
AP: Costing Nested Loops Join for inflection point at card 1587301.43
AP: Costing Hash Join for inflection point at card 1587301.43
AP: lcost=85933997509.98, rcost=111444.49
AP: Searching for inflection point at value 1.00
AP: Costing Nested Loops Join for inflection point at card 793650.71
AP: Costing Hash Join for inflection point at card 793650.71
AP: lcost=42967052892.63, rcost=110652.44
AP: Searching for inflection point at value 793650.71
AP: Costing Nested Loops Join for inflection point at card 396825.36
AP: Costing Hash Join for inflection point at card 396825.36
AP: lcost=21483526445.56, rcost=110256.42
AP: Searching for inflection point at value 396825.36
AP: Costing Nested Loops Join for inflection point at card 198412.68
AP: Costing Hash Join for inflection point at card 198412.68
AP: lcost=10741817360.42, rcost=110058.41
....
....
....
AP: Costing Nested Loops Join for inflection point at card 48.44
AP: Costing Hash Join for inflection point at card 48.44
AP: lcost=2652780.29, rcost=108278.83
AP: Searching for inflection point at value 48.44
AP: Costing Nested Loops Join for inflection point at card 24.22
AP: Costing Hash Join for inflection point at card 24.22
AP: lcost=1353458.59, rcost=108278.83
AP: Searching for inflection point at value 24.22
AP: Costing Nested Loops Join for inflection point at card 12.11
AP: Costing Hash Join for inflection point at card 12.11
AP: lcost=703797.74, rcost=108278.83
AP: Searching for inflection point at value 12.11
AP: Costing Nested Loops Join for inflection point at card 6.06
AP: Costing Hash Join for inflection point at card 6.06
AP: lcost=378967.31, rcost=108278.83
AP: Searching for inflection point at value 6.06
AP: Costing Nested Loops Join for inflection point at card 3.03
AP: Costing Hash Join for inflection point at card 3.03
AP: lcost=216552.10, rcost=108278.83
AP: Searching for inflection point at value 3.03
AP: Costing Nested Loops Join for inflection point at card 1.51
AP: Costing Hash Join for inflection point at card 1.51
AP: lcost=162413.69, rcost=108278.83
AP: Searching for inflection point at value 1.51
AP: Costing Nested Loops Join for inflection point at card 0.76
AP: Costing Hash Join for inflection point at card 0.76
AP: lcost=108275.29, rcost=108278.83
AP: Costing Nested Loops Join for inflection point at card 0.76
DP: Found point of inflection for NLJ vs. HJ: card = 0.76
According to tracing information estimated join cardinality 1587301 has been halved every time and performed cost calculation of Hash join and Nested Loop join untill the other join exceeds the cost. In this case inflection point has been found at cardinality 0.76 which means if buffering of rows by 'STATISTICS COLLECTOR'
for ‘access’ predicate "A"."LOCATION"=40
is more than 0.76 then HASH join will be chosen. Amount of work done to find the inflection point will depends upon the size of the tables and clustering factor of the index, and it increases linearly when table size and clustering factor of index delve each other to reach the inflection point.
NOTE: We can also use optimizer_adaptive_reporting_only
parameter to control the reporting-only mode. When this parameter is enabled all the adaptive optimizations information will be gathered but will not be implemented. To view the report of adaptive optimizations we can use DBMS_XPLAN
package.
Breaking point
This raises an interesting question, what happens if my data is volatile enough to change this inflection point ? Well, inflection point is calculated only at hard parse phase fo the sql, so if change in data is presumed to change the inflection point then hard parse has to occur else inflection point will not be re-calculated.
Let’s continue with the previous example and see what happens if we delete all the rows having "A"."LOCATION"=40
SQL> delete from demo where location=40;
10000 rows deleted.
SELECT /*+ gather_plan_statistics */ count(*)
FROM DEMO a,
DEMO b
WHERE a.ssn = b.ssn AND
a.location = 40 AND
b.location = 50;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 9w38rcsj3ph39, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM DEMO a,DEMO b WHERE a.ssn = b.ssn AND a.location = 40 AND b.location = 50
Plan hash value: 2967406341
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 |
|* 2 | HASH JOIN | | 1 | 1587K| 0 |00:00:00.01 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 0 |00:00:00.01 |
|* 4 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 0 |00:00:00.01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 0 | 1587K| 0 |00:00:00.01 |
|* 6 | INDEX RANGE SCAN | DEMO_IDX | 0 | 1587K| 0 |00:00:00.01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."SSN"="B"."SSN")
4 - access("A"."LOCATION"=40)
6 - access("B"."LOCATION"=50)
Note
-----
- this is an adaptive plan
Plan has not changed to use NESTED LOOP
join method as there are no rows with location=40 and is less than inflection point cardinality 0.76, but if we force this sql to go for hard parse then STATISTICS COLLECTOR
will detect this and switches the plan to use NESTED LOOP
join.
SQL> alter system flush shared_pool;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
SQL_ID 9w38rcsj3ph39, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM DEMO a,DEMO b WHERE a.ssn = b.ssn AND a.location = 40 AND b.location = 50
Plan hash value: 3074188767
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 |
| 2 | NESTED LOOPS | | 1 | 1587K| 0 |00:00:00.01 |
| 3 | NESTED LOOPS | | 1 | | 0 |00:00:00.01 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 0 |00:00:00.01 |
|* 5 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 0 |00:00:00.01 |
|* 6 | INDEX RANGE SCAN | DEMO_IDX | 0 | 1587K| 0 |00:00:00.01 |
|* 7 | TABLE ACCESS BY INDEX ROWID | DEMO | 0 | 1 | 0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("A"."LOCATION"=40)
6 - access("B"."LOCATION"=50)
7 - filter("A"."SSN"="B"."SSN")
Note
-----
- this is an adaptive plan
This confirms that inflection point is calculated only at hard parse of the sql statement and thus contradicts this feature with higly volatile data which may influence inflection point. The threats from this feature in databases where data is higly volatile resembles similar to that long live bind variable peeking issue. So one of the best method to solve these kind of issues is to know your data well and then gain complete control over the cursors by modelling them which requires special care. Better to closely monitor the performance of sql’s using adaptive plans, information can be found at many places and one of the important column is IS_RESOLVED_ADAPTIVE_PLAN
in V$SQL/V$SQL_PLAN
.
Persisting Adaptive plan by using SQL Plan Baseline
Since adaptive plans doesn’t persist its learning information we can use SPM to instruct sql’s for using the ultimate adaptive plans without going through buffering of rows by 'STATISTICS COLLECTOR'
for finding the inflection point. Using SPM to fix the adaptive plans when you are using cursor_sharing=EXACT
will enforce to create multiple baselines for each literal value which seems to be impractical, in such cases we can leverage the force_match
option of Sql profile to moderate the plan for all the similar sql’s which differs only by literal values. Infact Sql profile will use force_match_signature
when force_match option is enabled for the sql. Using band aid features like SPM, Sql profile are all not meant for providing permanent fix as it will go sour over the period of time and thus its an additional overhead for maintaining it.
Lets implement SPM for previous sql and check whether buffering of rows will be done by STATISTICS COLLECTOR
or not.
DECLARE
return_var binary_integer;
BEGIN
return_var :=
dbms_spm.load_plans_from_cursor_cache(
sql_id => '9w38rcsj3ph39',
plan_hash_value => 2967406341,
fixed => 'NO',
enabled => 'YES');
END;
/
SQL> select sql_text,enabled,LAST_EXECUTED,ADAPTIVE from dba_sql_plan_baselines;
SQL_TEXT ENABLED LAST_EXECUTED ADA
-------------------------------------------------- ------- ---------------- ---
SELECT /*+ gather_plan_statistics */ count(*) YES NO
SELECT /*+ gather_plan_statistics */ count(*)
FROM DEMO a,
DEMO b
WHERE a.ssn = b.ssn AND
a.location = 40 AND
b.location = 50;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
SQL_ID 9w38rcsj3ph39, child number 2
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM DEMO a,DEMO b WHERE a.ssn = b.ssn AND a.location = 40 AND b.location = 50
Plan hash value: 2967406341
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.27 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.27 |
|* 2 | HASH JOIN | | 1 | 1587K| 0 |00:00:00.27 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 10000 |00:00:00.22 |
|* 4 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 10000 |00:00:00.16 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO | 1 | 1587K| 100K|00:00:00.06 |
|* 6 | INDEX RANGE SCAN | DEMO_IDX | 1 | 1587K| 100K|00:00:00.02 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."SSN"="B"."SSN")
4 - access("A"."LOCATION"=40)
6 - access("B"."LOCATION"=50)
Note
-----
- SQL plan baseline SQL_PLAN_7yjjgk7bkg32yf1102560 used for this statement
SQL> select sql_text,enabled,LAST_EXECUTED,ADAPTIVE from dba_sql_plan_baselines;
SQL_TEXT ENABLED LAST_EXECUTED ADA
--------------------------------------------- ------- ---------------------------- ---
SELECT /*+ gather_plan_statistics */ count(*) YES 07-JAN-16 04.09.45.000000 AM NO
As you see SQL plan baseline has been used to use final adaptive plan without STATISTICS COLLECTOR
operator. The Adaptive column in this view states whether this plan is adaptive or not when it was automatically captured. If adaptive plan is found for the sql already having baseline then adaptive column will be set to YES, and this adaptive plan will be tested and will be accepted if it performs better than existing baseline plans, at last adaptive column for this plan will set to NO since it is resolved plan and no longer an adaptive plan.
In case of Adaptive plan it doesn’t mean that all the time taken by STATISTICS COLLECTOR
for buffering the rows until final join method found is waste of time/resource and overhead. Because the buffered time will be negligible when compared to the penalty caused by using wrong join method with disastrous effect on the total response time of the sql.
Plan hash value for Adaptive plans
There is a new column FULL_PLAN_HASH_VALUE
in V$SQL/V$SQL_PLAN/V$SQLAREA/V$SQL_PLAN_STATISTICS
which represents the plan hash value for adaptive plans. We need to be careful enough and not get confused as each PLAN_HASH_VALUE
can have multiple FULL_PLAN_HASH_VALUE
and viceversa.
In case of sql using SQL plan baseline FULL_PLAN_HASH_VALUE
will be the result of PHV2
and this PHV2
is derived from OTHER_XML
column of V$SQL_PLAN
. SPM will always considers PHV2
of the plan and then compare it with stored 'Plan ID'
to check the possibility of reproducing accepted plan stored in baseline, if PHV2
and "Plan ID"
doesn’t match then baseline will be ignored. SPM uses PHV2
but not PLAN_HASH_VALUE
because PHV2
is independent of the intermediate objects name conventions which Oracle creates. In case of SQL profile/patches it never uses PHV2
, it just applies the hints stored to come up with the execution plan. SPM also applies hints but later on compares the PHV2
and stored 'Plan ID'
. Interesting thing is all these hints are stored in sqlobj$data but used by SPM/Profile/Patches in their own distintive way.
Below is the report derived by using different scenarios to generate multiple combination of FULL_PLAN_HASH_VALUE
and PLAN_HASH_VALUE
.
SELECT p.sql_id,
p.child_number,
p.plan_hash_value,
t.phv2,
p.full_plan_hash_value
FROM V$SQL_PLAN p,
XMLTABLE(' for $i in /other_xml/info
where $i/@type eq "plan_hash_2"
return $i'
passing xmltype(p.other_xml)
COLUMNS phv2 NUMBER path '/') t
WHERE p.sql_id = '9w38rcsj3ph39' AND
p.other_xml IS NOT NULL
SQL_ID CHILD_NUM PLAN_HASH_VALUE PHV2 FULL_PLAN_HASH_VALUE
------------- --------- --------------- ---------- --------------------
9w38rcsj3ph39 0 2967406341 4044367200 1001692354 -> Adaptive plan using HASH JOIN
9w38rcsj3ph39 0 3074188767 2183012257 1001692354 -> Adaptive plan using NESTED LOOP JOIN
9w38rcsj3ph39 1 2967406341 4044367200 4044367200 -> Not Adaptive using HASH JOIN
9w38rcsj3ph39 2 2967406341 4044367200 4044367200 -> Not Adaptive but using baseline(HASH JOIN)
In above report we can obeserve that for Adaptive plans FULL_PLAN_HASH_VALUE
is same when PLAN_HASH_VALUE/PHV2
is different in terms of joining methods(Hash and Nested), this concludes FULL_PLAN_HASH_VALUE
is calculated for all the access operations in the execution plan regardless of whether they were active or not. To avoid confusion consider FULL_PLAN_HASH_VALUE
for a child cursor only when it is Adaptive plan.
Conclusion
This Adaptive Query Optimization feature learns from its mistakes and choses the most efficient join method between two datasets by buffering few records until inflection point is reached during runtime of the sql. Adaptive join method will influence only the join methods(HASH and NESTED) but not join orders like cardinality feedback or ACS does. In upcoming post we will discuss about how Adaptive Query Optimization feature influences the Parallel ditribution methods.
Subscribe via RSS