Sunday, July 27, 2008

Oracle RAC Wait Events

RAC Differences
The main difference to keep in mind when monitoring a RAC database versus a single-instance database, is the buffer cache and its operation. In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs. When a process in a RAC database needs to modify or read data, Oracle will first check to see if it already exists in the local buffer cache. If the data is not in the local buffer cache the global buffer cache will be reviewed to see if another instance already has it in their buffer cache. In this case the remote instance will send the data to the local instance via the high-speed interconnect, thus avoiding a disk read. Monitoring a RAC database often means monitoring this situation and the amount of requests going back and forth over the RAC interconnect. The most common wait events related to this are gc cr request and gc buffer busy.
gc cr request

This wait event, also known as global cache cr request prior to Oracle 10g, specifies the time it takes to retrieve the data from the remote cache. High wait times for this wait event often are because of:
RAC Traffic Using Slow Connection - typically RAC traffic should use a high-speed interconnect to transfer data between instances, however, sometimes Oracle may not pick the correct connection and instead route traffic over the slower public network. This will significantly increase the amount of wait time for the gc rc request event. The oradebug command can be used to verify which network is being used for RAC traffic:

SQL> oradebug setmypid
SQL> oradebug ipc

This will dump a trace file to the location specified by the user_dump_dest Oracle parameter containing information about the network and protocols being used for the RAC interconnect.
Inefficient Queries � poorly tuned queries will increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.
gc buffer busy
This wait event, also known as global cache buffer busy prior to Oracle 10g, specifies the time the remote instance locally spends accessing the requested data block. This wait event is very similar to the buffer busy waits wait event in a single-instance database and are often the result of:
Hot Blocks - multiple sessions may be requesting a block that is either not in buffer cache or is in an incompatible mode. Deleting some of the hot rows and re-inserting them back into the table may alleviate the problem. Most of the time the rows will be placed into a different block and reduce contention on the block. The DBA may also need to adjust the pctfree and/or pctused parameters for the table to ensure the rows are placed into a different block.
Inefficient Queries � as with the gc cr request wait event, the more blocks requested from the buffer cache the more likelihood of a session having to wait for other sessions. Tuning queries to access fewer blocks will often result in less contention for the same block.

2 comments:

dbacas said...

simple and nice explanation. keep it up. Enjoyed your post.

Vladimir said...

This is an excellent article.
Still, if you want something more hands-on, try these:
http://vgrigorian.com/11gsimulator/1_rac11gr2.htm
http://vgrigorian.com/11gsimulator/2_rac11gr2rdbms1.htm
http://vgrigorian.com/11gsimulator/3_rac11gasm.htm
http://vgrigorian.com/11gsimulator/4_11gr2dbcreate.htm

You can find more demos (including dataguard, goldengate, streams) there at http://vgrigorian.com/

Thanks.
Vladimir Grigorian