How ‘Consistent Read’ processing is dealt by Oracle Exadata cell server ?

Concept

Consistent read is one of the main concept among the ACID properties provided by Oracle. The one which supports this Consistency is UNDO in Oracle. But according to concepts of Exadata, cell/storage servers doesn’t communicate each other and thus constructing the consistent image of the database blocks by using UNDO data is not possible from storage layer as UNDO data itself is striped accross multiple storage servers which can’t be shared/pass among the cell servers to construct the consistent data. This has direct impact on Smart Scan efficiency as those entire blocks has to be shipped back to database servers for processing of consistent read with the help UNDO data. Oracle has implemented new optimizations in Exadata to minimize the impact on Smart Scan by reducing the back and forth communication between cell server and database server whenever Smart Scan finds the blocks having active lock byte set(ITL).

When Smart Scan reaches a row which has lock byte set it has to ship entire block back to database layer for constructing consistent blocks by taking advantage of UNDO data. But when lock byte is set and not cleared for the rows by the transaction which has already been committed then Exadata will not ship those blocks to database layer everytime due to implementation of new enhancements ‘commit cache’ and ‘minscn’ optimization. So Oracle has taken few steps to avoid slow down of Smart Scan when lock byte is set for the rows by transaction which is committed already, but when Smart Scan encounters the blocks where lock byte is set and transaction is not yet committed then it has no way of dealing it in cell server level instead it has to transfer all such blocks back to the database layer for consistent read processing.

Whenever query starts it usually compares it’s query start SCN with the cleanout scn of the blocks where the lock byte has been cleared, if query scn is greater than the cleanout scn found in the block then cell server will come to know that there is no need of rolling back that block for consistent read purpose. But if query scn is less than the cleanout scn found in the block, it has to be shipped back to database layer for consistent read processing which will have huge impact on Smart Scan as it have to ship back many such blocks to database layer where lock byte is set and at places where lock byte is set and has not been cleaned out. This ensures that whenever query starts using Smart Scan it sends its query scn to cell servers through iDB protocol. Even in these types of scenarios Exadata optimization like 'commit cache' and 'minscn' at cell server level will help to reduce the amount of blocks needs to be shipped back to database layer for consistent read processing.

Investigating it with session level statistics

Let’s create a table and set the storage property to not cache blocks of this table into flash cache for having stable performance metrics.

CREATE TABLE DEMO
  (
     id   NUMBER,
     pad  VARCHAR2(500)
  ) TABLESPACE users STORAGE (flash_cache NONE)
/  

Load the table such that we have about 10,000 blocks for easier conputation of various metrics.

INSERT INTO DEMO
SELECT rownum,
       rpad( '*', 500, '*' )
FROM   DUAL
CONNECT BY LEVEL <= 1.4e5
/

COMMIT
/

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'DEMO')
/

Before moving ahead with the setup let’s check how many blocks have been allocated for this table and how many of them are in use and free, as this information will be helpful while interpreting Smart Scan related statistics. The best way to get such details is by using dbms_space package as shown below.

declare
  UNFOR_BLK number;
  UNFOR_BYT number;
  FS1_BLK   number;
  FS1_BYT   number;
  FS2_BLK   number;
  FS2_BYT   number;
  FS3_BLK   number;
  FS3_BYT   number;
  FS4_BLK   number;
  FS4_BYT   number;
  FULL_BLK  number;
  FULL_BYT  number;
begin
  dbms_space.space_usage('TEST','DEMO','TABLE',UNFOR_BLK,UNFOR_BYT,FS1_BLK,FS1_BYT,FS2_BLK,FS2_BYT,FS3_BLK,FS3_BYT,FS4_BLK,FS4_BYT,FULL_BLK,FULL_BYT);
  dbms_output.put_line('Unformatted Blocks                  ==> '||UNFOR_BLK);
  dbms_output.put_line('Blocks having 0  to 25%  free space ==> '||FS1_BLK  );
  dbms_output.put_line('Blocks having 25 to 50%  free space ==> '||FS2_BLK  );
  dbms_output.put_line('Blocks having 50 to 75%  free space ==> '||FS3_BLK  );
  dbms_output.put_line('Blocks having 75 to 100% free space ==> '||FS4_BLK  );
  dbms_output.put_line('Blocks which are full               ==> '||FULL_BLK );
end;
/

Unformatted Blocks                  ==> 0
Blocks having 0  to 25%  free space ==> 1
Blocks having 25 to 50%  free space ==> 0
Blocks having 50 to 75%  free space ==> 0
Blocks having 75 to 100% free space ==> 97
Blocks which are full               ==> 9999

PL/SQL procedure successfully completed.

With above output we could say that this table has 9999+1=10000 blocks which are fully or atleast 75% used and has 97 free blocks which doesn’t contain any data.

Now setup a base table to capture all the session level staistics by using V$SESSTAT for a target session simulating different cases so that we can perform our analysis on these metrics later. This base table will have unique captured identifier to represent each snapshot of V$SESSTAT.

CREATE TABLE CAP_SES_STAT AS 
SELECT 0 AS ID, name, value
FROM v$statname NATURAL JOIN v$sesstat
WHERE 1=2
/

From session 1

Execute the UPDATE statement - [ update demo set id=id*50 ]
Flush buffer cache - [ alter system flush buffer_cache ]
Commit the transaction - [ commit ]
Perform conventional read of the table to clear the transaction lock byte set in the blocks. - [ alter session set "_serial_direct_read"=FALSE; select count(*) from demo]

From session 2 set the "_serial_direct_read" to ALWAYS to force direct path of the table regardless of table size and other factors, this ensures Smart Scan would happen on this table. After enabling this parameter this session has to perform "SELECT count(*) FROM DEMO" after each activity of session 1.

From session 3 capture the session level statistics of session 2 for each select statement execution with unique ID as shown below.

INSERT INTO CAP_SES_STAT
SELECT 1, name, value
FROM v$statname NATURAL JOIN v$sesstat
WHERE SID=2
/

When all the three sessions are done with their tasks we will have snapshots of session statistics for each different simulated case stored in table CAP_SES_STAT. With this information we can derive the result for each different case and do our analysis of Smart Scan dealing with Consistent Read of the blocks. Below sql is selecting only few statistics which are deemed to be worthful and is reporting the result of each case in columnar format using PIVOT so that results of each case can be compared easily.

SELECT   *
FROM (
      SELECT   id,
               name,
               value - Lag ( value ) over ( ORDER BY name, id ) AS value
      FROM     cap_ses_stat
      WHERE    name IN ('CPU used by this session',
                        'active txn count during cleanout',
                        'cell blocks helped by minscn optimization',
                        'cell blocks processed by cache layer',
                        'cell blocks processed by data layer',
                        'cell blocks processed by txn layer',
                        'cell commit cache queries',
                        'cell physical IO bytes eligible for predicate offload',
                        'cell physical IO interconnect bytes',
                        'cell physical IO interconnect bytes returned by smart scan',
                        'cell scans',
                        'cell transactions found in commit cache',
                        'cleanouts and rollbacks - consistent read gets',
                        'consistent gets',
                        'data blocks consistent reads - undo records applied',
                        'physical read total IO requests',
                        'physical read total multi block requests',
                        'cleanouts only - consistent read gets',
                        'session logical reads',
                        'physical reads',
                        'physical reads direct',
                        'cell blocks helped by commit cache',
                        'table scan blocks gotten',
                        'table scans (direct read)',
                        'table scan rows gotten' )
     ) 
	 pivot ( SUM ( value ) FOR id IN ( 1 AS before_commit,
                                       2 AS after_flush1,
                                       3 AS after_flush2,
                                       4 AS after_commit,
                                       5 AS after_scan ) )
/

NAME                                                       BEFORE_COMMIT AFTER_FLUSH1 AFTER_FLUSH2 AFTER_COMMIT AFTER_SCAN
---------------------------------------------------------- ------------- ------------ ------------ ------------ ----------
CPU used by this session                                              48           89           55            4          4
active txn count during cleanout                                   10000        10000        10000            0          0
cell scans                                                             1            1            1            1          1
cell physical IO bytes eligible for predicate offload           82714624     82714624     82714624     82714624   82714624
cell physical IO interconnect bytes                             81937784    105792280     81937632      2581416    2573224
cell physical IO interconnect bytes returned by smart scan      81937784     81937176     81937632      2573224    2573224
cell blocks processed by cache layer                               10127        10123        10126        10097      10097
cell blocks processed by data layer                                   97           97           97        10097      10097
cell blocks processed by txn layer                                    97           97           97        10097      10097
cell commit cache queries                                          10030        10026        10029        10000          0
cell transactions found in commit cache                                0            0            0        10000          0
cell blocks helped by commit cache                                     0            0            0        10000          0
cell blocks helped by minscn optimization                             97           97           97           97         97
cleanouts and rollbacks - consistent read gets                     10000        10000        10000            0          0
cleanouts only - consistent read gets                                  0            0            0            0          0
consistent gets                                                   300003       300003       300003        10100      10100
data blocks consistent reads - undo records applied               279903       279903       279903            0          0
physical read total IO requests                                      117         3025          116           88         87
physical read total multi block requests                              88           84           85           79         79
physical reads                                                     10097        13009        10097        10098      10097
physical reads direct                                              10097        10097        10097        10097      10097
session logical reads                                             300003       300003       300003        10100      10100
table scan blocks gotten                                           10000        10000        10000        10000      10000
table scan rows gotten                                            140000       140000       140000       140000     140000
table scans (direct read)                                              1            1            1            1          1

With above output it will be easy to conclude the concepts of Smart Scan dealing with consistent reads of the blocks. Before we proceed it’s important to understand the concepts of 'commit cache' and 'minscn' optimization of cell server for consistent read.

Commit Cache

It is an memory area in cell servers which keeps track of recently committed transaction. When smart scan hits the blocks having lock byte set then it will check commit cache area to find information of the transaction found in ITL section of the block, this check/query of commit cache area will increase the statistic 'cell commit cache queries' and if it finds the information of this transaction then it will increase the statistic 'cell blocks helped by commit cache' or 'cell transactions found in commit cache'. Without this optimzation it would overhead for cell server to ship all the blocks back to database server for consistent read processing which infact is very slow due to single block I/O.

Minscn

This is one more kind of optimization used by cell servers which keeps track of oldest open transaction SCN. This information will be helpful when smart scan hits the blocks having lock byte set, it can get the SCN from ITL entry and compare it to the oldest open transaction SCN and conclude whether this transaction has been committed or not. If smart scan is optimized with this optimization then statistic 'cell blocks helped by minscn optimization' will be increased. Before smart scan operation database server will send this minscn information to the cell servers to take advantage of it and avoid interaction with database layer for finding whether particular transaction has been committed or not. This minscn is maintained by MMON and is global in RAC, it can be queried from x$ktumascn table.

Let’s walk through each case and see how consistent read requests were handled intelligently.

Before Commit

In this case it did smart scan of the table DEMO after updating all the rows without committing the transaction. By looking at statistics 'cell scans' and 'table scans (direct read)' we can say for sure that smart scan has occurred but there was no reduction in bytes returned by the smart scan as statistics 'cell physical IO bytes eligible for predicate offload' , 'cell physical IO interconnect bytes' and 'cell physical IO interconnect bytes returned by smart scan' are all having same amount of processed bytes. Reason for poor performing smart scan can be found by looking at statistics related to cell blocks processed at three different layers(cache, data, txn) in the cell servers respectively. So in this case all the blocks(10127) were passed through cache layer but only 97 blocks out of them survived through data and txn layer due to minnscn optimization 'cell blocks helped by minscn optimization' and these 97 blocks are the one which doesn’t contain any data. Further we could see that consistent read were performed by constructing the consistent blocks using undo records, related statistic is 'data blocks consistent reads - undo records applied' because we found lock byte set for all the 10000 blocks according to statistic 'active txn count during cleanout'. There was no optimization done for this case by the cell server and it did tried to optimize it by querying the commit cache 'cell commit cache queries' but failed to get the details from commit cache as statistic 'cell transactions found in commit cache' is zero. Impact of this can be measured directly with the single block I/O operations, in this case there were 117 total I/O requests out of which 88 were multi block reads and remaining 117-88=29 were single block reads as per the staistics 'physical read total IO requests' and 'physical read total multi block requests'.

After buffer cache flush 1 and 2 run:

In this case it did smart scan after updating the table and flushing the buffer cache. As per the statistics impact on smart scan is most worst in this case when compared with other cases. This case has used highest CPU due to 3025 physical I/O (physical read total IO requests) out of which there were just 84 multi block reads(physical read total multi block requests) and thus there were whopping 3025-84=2968 single block reads. All other behavioural statistics are similar to previous case. So if most of the blocks needed by smart scan are on disk having open transaction lock byte set then impact will be worst as it has to perform single block physical reads for these blocks. But if smart scan is performed immediately after first smart scan then impact will not be huge as consistent blocks already exists in cache because of first smart scan. As per the statisticcs for second run it’s CPU usage less compared to first run but still high when considered in general with other cases. Even in this case there was no optimization done for this case by the cell server and it did tried to optimize it by querying the commit cache ‘cell commit cache queries’ but failed to get the details from commit cache as statistic 'cell transactions found in commit cache' is zero.

After commit of the transaction:

This case is the most interesting one due to optimzation trick used by the cell server and it also demonstrate the internal working of smart scan when dealing with 'delayed block cleanout'. Smart scan was performed after updating the table, flushing buffer cache and then commiting the transaction. This ensures that blocks on the disk will have lock byte set though the transaction has been committed. Since Exadata leverages the commit cache optimization for this committed transaction, smart scan was very efficient by returning just 2573224 bytes(cell physical IO interconnect bytes returned by smart scan) out of 82714624 bytes (cell physical IO bytes eligible for predicate offload). Commit cache was queried for each block having lock byte set(cell commit cache queries) and was success in getting the details of the committed transaction(cell blocks helped by commit cache) and thus back and forth trips to database layer for consistent read were avoided. The staistic 'active txn count during cleanout' says no active transaction found in all of the blocks though the lock byte is set, this clearly states that cell server has the detail of this committed transation in its commit cache. Due to this optimization there are no consistent block construction(data blocks consistent reads - undo records applied) and the 97 empty blocks were optimized due to minscn. This optimization is not present in non-exadata environment, but greatly used and helpful in Exadata environment while smart scan detects that block consist of lock byte set though transaction has been committed.

After clearing all the lock byte set in the blocks by performing conventional read of the table DEMO: This case is nothing special, we update the table, flush buffer cache, commit the transaction and then perform cleanout of the lock byte set in the blocks residing on the disk by querying the entire table through conventional path. This is the reason why sometimes select statement generates redo as it has to clear lock byte set which inturn generates the redo for the activity. In this case minscn optimization was used for 97 empty blocks and smart scan has returned just 2573224 bytes out of 82714624 bytes similar to previous case. Also in this case we don’t see any queries on commit cache(cell commit cache queries) as none of the blocks has lock byte set. This is a prefect demonstration of a case where there are no hurdles for smart scan unlike previous cases.

Conclusion

Exadata cell server optimizations like Commit cache and minscn are very helpful when smart scan encounters active transaction in the blocks. Minscn is the intial approach where it compares the SCN found in the ITL of the block with the oldest active transaction scn(minscn) and decides whether it can read the block as it is or it has to construct the consistent block. When minscn is not helpful it will try to optimize smart scan by using commit cache consisting details of all the recent committed transaction.

This article shows the impact of consistent read processing on smart scan and few optimization used by the cell server to overcome the overhead of communicating with database layer for consistent read processing. All these details will be very helpful when dealing with Smart Scan efficiency issues, most prominent indication of these kind of issues will be single block I/O along with Smart Scan. So whenever we see high contribution of 'cell single block physical reads' along with 'smart scan' we need to drill down further for finding the reason behind it.