Thursday 22 November 2012

'log file sync' wait troubleshooting

 'log file sync' wait

When a user session commits, the session's redo information needs to be flushed from memory to the redo logfile to make it permanent.

At the time of commit, the user session will post the LGWR to write the log buffer (containing the current unwritten redo, including this session's redo information) to the redo log file. When the LGWR has finished writing, it will post the user session to notify it that this has completed. The user session waits on 'log file sync' while waiting for LGWR to post it back to confirm all redo changes have made it safely on to disk.

The time between the user session posting the LGWR and the LGWR posting the user after the write has completed is the wait time for 'log file sync' that the user session will show.
Note that if a sync is already in progress, other sessions that want to commit (and thus flush redo information) will also wait for the LGWR to complete the sync, and will likewise wait on 'log file sync'.
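While the problem is occurring, the sessions affected and their accumulated wait times can be checked from the session-level statistics. A minimal sketch using V$SESSION_EVENT, assuming 10g or later where TIME_WAITED_MICRO is available:

SELECT SID, TOTAL_WAITS,
       ROUND(TIME_WAITED_MICRO/1000) "TIME_MS",
       ROUND(TIME_WAITED_MICRO/NULLIF(TOTAL_WAITS,0)/1000, 2) "AVG_MS"
FROM V$SESSION_EVENT
WHERE EVENT = 'log file sync'
ORDER BY TIME_WAITED_MICRO DESC;

The figures are cumulative over the life of each session.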

What should be collected for initial diagnosis of 'log file sync' waits?

To initially analyse 'log file sync' waits the following information is helpful:
• AWR report from a similar time frame and period where 'log file sync' waits are not experienced, to serve as a baseline of reasonable performance for comparison purposes
• AWR report when 'log file sync' waits are occurring
Note: The two reports should each cover a period of 10-30 minutes.
• LGWR trace file
The LGWR trace file will show warning messages for periods when 'log file parallel write' times are high

What causes high waits for 'log file sync'?
Waits for the 'log file sync' event can occur at any stage between the user process posting the LGWR to write redo information, the LGWR writing the redo from the log buffer to disk and posting the user process back, and the user process waking up.

• Issues affecting LGWR's IO Performance
The primary question we are looking to answer here is "Is LGWR slow in writing to disk?". The following steps can assist in determining whether this is the case.
Compare the average wait time for 'log file sync' to the average wait time for 'log file parallel write'.


Wait event 'log file parallel write' is waited on by LGWR while the actual write
operation to the redo logs is occurring. The duration of the event shows the time
waited for the IO portion of the operation.


If the proportion of the 'log file sync' time spent on 'log file parallel write' is high, then most of the wait time is due to IO (waiting for the redo to be written). The performance of LGWR in terms of IO should be examined. As a rule of thumb, an average time for 'log file parallel write' over 20 milliseconds suggests a problem with the IO subsystem.
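As a quick cross-check outside of AWR, the two averages can also be compared from the cumulative instance-wide statistics. A minimal sketch against V$SYSTEM_EVENT (figures are cumulative since instance startup, so for a live problem compare the deltas between two snapshots):

SELECT EVENT, TOTAL_WAITS,
       ROUND(TIME_WAITED_MICRO/1000) "TIME_MS",
       ROUND(TIME_WAITED_MICRO/NULLIF(TOTAL_WAITS,0)/1000, 2) "AVG_MS"
FROM V$SYSTEM_EVENT
WHERE EVENT IN ('log file sync', 'log file parallel write');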

Recommendations
o Work with the system administrator to examine the filesystems where the redologs are located with a view to improving the performance of IO.
o Do not put redo logs on RAID 5.
o Do not put redo logs on Solid State Disk (SSD)
Although Solid State Disk write performance is generally good on average, SSDs may endure write peaks which will sharply increase waits on 'log file sync'
o Look for other processes that may be writing to the same location and ensure that the disks have sufficient bandwidth to cope with the required throughput. If they do not, then move the other activity or the redo logs elsewhere.
o Ensure that the log_buffer is not too big. A very large log_buffer can have an adverse effect, as waits will be longer when flushes occur: when the buffer fills up, all of its contents have to be written to the redo log file, and the LGWR will wait until the last I/O has completed.
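The current log buffer size can be checked with a simple query against V$PARAMETER (the value is reported in bytes):

SELECT NAME, TO_NUMBER(VALUE)/1024/1024 "SIZE_MB"
FROM V$PARAMETER
WHERE NAME = 'log_buffer';
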
Check LGWR Traces
Even if the average wait for 'log file parallel write' is in the normal range, there may be peaks where the write time is much longer and which will therefore influence waits on 'log file sync'. From 10.2.0.4 onwards, messages are written to the LGWR trace file when a write to the log file takes more than 500 ms. This is quite a high threshold, so a lack of messages does not necessarily mean there is no problem. The messages look similar to the following:

*** 2012-11-21 23:18:25.669
Warning: log write elapsed time 1292ms, size 1151KB

*** 2012-11-21 23:18:53.543
Warning: log write elapsed time 556ms, size 741KB

*** 2012-11-21 23:21:29.785
Warning: log write elapsed time 2392ms, size 723KB

*** 2012-11-22 00:24:35.743
Warning: log write elapsed time 618ms, size 2009KB

*** 2012-11-22 00:30:30.525
Warning: log write elapsed time 539ms, size 5943KB

Note: Peaks like the above may not have much influence on the average 'log file parallel write' time if they are few and far between. However, if 100s of sessions are waiting for the write to complete, the total wait for 'log file sync' can be high because the wait time is multiplied across those 100s of sessions. It is therefore worth investigating the reason for the high peaks in IO for the log writer.

Recommendations
o Work with the system administrator to examine what else is happening at this time that may be causing the peaks in LGWR writing to disk
o A truss (or strace on Linux) of the LGWR process while the slowdown is occurring may help identify where the time is going

Note: These warnings can be particularly useful for pre-empting potential issues. Even if a general problem in terms of the average wait time has not been seen, highlighting extreme peaks in IO performance gives a DBA a useful indicator that LGWR is encountering intermittent issues. These can then be resolved before they cause outages or similar.

Check to see if redo logs are large enough

A 'log file sync' operation is performed every time the redo logs switch to the next log to ensure that everything is written before the next log is started. Standard recommendations are that a log switch should occur at most once every 15 to 20 minutes. If switches occur more frequently than this, then more 'log file sync' operations will occur meaning more waiting for individual sessions.
o Check the time between log file switches in alert.log
Thu Nov 03 14:57:01 2011
Thread 1 advanced to log sequence 2501 (LGWR switch)
Current log# 5 seq# 2501 mem# 0: /opt/oracle/oradata/orcl/redo05a.log
Current log# 5 seq# 2501 mem# 1: /opt/oracle/logs/orcl/redo05b.log
Thu Nov 03 14:59:12 2011
Thread 1 advanced to log sequence 2502 (LGWR switch)
Current log# 6 seq# 2502 mem# 0: /opt/oracle/oradata/orcl/redo06a.log
Current log# 6 seq# 2502 mem# 1: /opt/oracle/logs/orcl/redo06b.log
Thu Nov 03 15:03:01 2011
Thread 1 advanced to log sequence 2503 (LGWR switch)
Current log# 4 seq# 2503 mem# 0: /opt/oracle/oradata/orcl/redo04a.log
Current log# 4 seq# 2503 mem# 1: /opt/oracle/logs/orcl/redo04b.log

o You can also check the average time for log switch in the AWR report

For example, if the information in AWR shows 29.98 redo log switches per hour (roughly one switch every 2 minutes), this is far more frequent than the recommended rate of one switch every 15-20 minutes, and it will have an effect on the time foreground processes wait for 'log file sync' to complete because of the overhead of initiating the sync operation more often than necessary.
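Outside of AWR, the switch rate can be derived per hour from V$LOG_HISTORY; a minimal sketch:

SELECT TO_CHAR(FIRST_TIME, 'YYYY-MM-DD HH24') "HOUR",
       COUNT(*) "SWITCHES"
FROM V$LOG_HISTORY
GROUP BY TO_CHAR(FIRST_TIME, 'YYYY-MM-DD HH24')
ORDER BY 1;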

• Excessive Application Commits
In this case the question to answer is "Is the Application Committing too Frequently?".
If it is, then the excessive commit activity can cause performance issues, since commits flush redo from the log buffer to the redo logs and this causes waits on 'log file sync'.

To identify a potentially high commit rate: if the average wait time for 'log file sync' is much higher than the average wait time for 'log file parallel write', then most of the time waiting is not due to waiting for the redo to be written, and thus slow IO is not the cause of the problem. The surplus time is CPU activity, and is most commonly contention caused by over-committing.

Additionally, if the average time waited on 'log file sync' is low, but the number of waits is high, then the application might be committing too frequently.

Compare the average user commits to user calls using the AWR report.

If user calls/user commits (the average number of user calls per commit) is less than 30, then commits are happening too frequently.
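The same ratio can be computed directly from the cumulative statistics in V$SYSSTAT; a minimal sketch (for a specific period, deltas between two snapshots are more accurate):

SELECT ROUND(C.VALUE / NULLIF(M.VALUE, 0), 1) "USER_CALLS_PER_COMMIT"
FROM (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'user calls') C,
     (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'user commits') M;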

 Recommendations

o If there are lots of short duration transactions, see if it is possible to group transactions together so there are fewer distinct COMMIT operations. Since it is mandatory for each commit to receive confirmation that the relevant REDO is on disk, additional commits can add significantly to the overhead. Although commits can be "piggybacked" by Oracle, reducing the overall number of commits by batching transactions can have a very beneficial effect.
o See if any of the processing can use the COMMIT NOWAIT option (be sure to understand the semantics of this before using it); a syntax sketch follows this list.
o See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
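For reference, a minimal sketch of the asynchronous commit syntax available from 10gR2 onwards. Because the session no longer waits for LGWR to confirm the write, recent commits can be lost if the instance fails, so this is only appropriate where the application can tolerate that:

-- Standard commit: the session waits on 'log file sync'
COMMIT;

-- Asynchronous commit: control returns without waiting for the redo write
COMMIT WRITE BATCH NOWAIT;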

Tuesday 20 November 2012

CPU WAIT EVENT

Scheduling and Switching

The operating system may spend excessive time scheduling and switching processes. Examine the way in which you are using the operating system, because you could be using too many processes. On NT systems, do not overload your server with too many non-Oracle processes.


Oracle CPU Utilization

This section explains how to examine the processes running in Oracle. Three dynamic performance views provide information on Oracle processes:

V$SYSSTAT shows Oracle CPU usage for all sessions. The statistic "CPU used by this session" shows the aggregate CPU used by all sessions.
V$SESSTAT shows Oracle CPU usage per session. You can use this view to determine which particular session is using the most CPU.
V$RSRC_CONSUMER_GROUP shows CPU utilization statistics on a per consumer group basis, if you are running the Oracle Database Resource Manager.
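For example, the aggregate figure can be read with a simple query (the value is in centiseconds, i.e. units of 10 ms):

SELECT NAME, VALUE
FROM V$SYSSTAT
WHERE NAME = 'CPU used by this session';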

Reparsing SQL Statements

When Oracle executes a SQL statement, it parses it to determine whether the syntax and its contents are correct. This process can consume significant overhead. Once parsed, Oracle does not parse the statement again unless the parsing information is aged from the memory cache and is no longer available. Ineffective memory sharing among SQL statements can result in reparsing. Use the following procedure to determine whether reparsing is occurring:

Get the parse time CPU and 'CPU used by this session' figures from the "Statistics" section of the estat report or from V$SYSSTAT. For example:

SELECT NAME, VALUE
FROM V$SYSSTAT
WHERE NAME IN ('parse time cpu', 'parse time elapsed',
               'parse count (hard)', 'CPU used by this session');


Now you can detect the general response time on parsing. The more your application is parsing, the more contention exists, and the more time your system spends waiting. If parse time CPU represents a large percentage of the CPU time, then time is being spent parsing instead of executing statements. If this is the case, then it is likely that the application is using literal SQL and not sharing it, or the shared pool is poorly configured.
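As a rough worked check, the parse CPU percentage can be computed directly from the two statistics; a minimal sketch:

SELECT ROUND(100 * P.VALUE / NULLIF(T.VALUE, 0), 2) "PARSE_CPU_PCT"
FROM (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'parse time cpu') P,
     (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'CPU used by this session') T;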

Query V$SQLAREA to find frequently reparsed statements. For example:

SELECT SQL_TEXT, PARSE_CALLS, EXECUTIONS
FROM V$SQLAREA
ORDER BY PARSE_CALLS;


Tune the statements with the highest numbers of parse calls.

If the parse time CPU is only a small percentage of the total CPU used, then you should determine where the CPU resources are going. There are several things you can do to help with this.

Find statements with large numbers of buffer gets, because these are typically heavy on CPU.

The following statement finds SQL statements which frequently access database buffers. Such statements are probably looking at many rows of data.

SELECT ADDRESS, HASH_VALUE, BUFFER_GETS, EXECUTIONS,
BUFFER_GETS/EXECUTIONS "GETS/EXEC", SQL_TEXT
FROM V$SQLAREA
WHERE BUFFER_GETS > 50000
AND EXECUTIONS > 0
ORDER BY 3;


This example shows which SQL statements have the most buffer gets and hence use the most CPU. The statements of interest are those with a large number of gets per execution, especially where the number of executions is also high. It is very beneficial to have an understanding of the application components, so as to know which statements are expected to be expensive.

After candidate statements have been isolated, the full statement text can be obtained using the following query, substituting relevant values for ADDRESS and HASH_VALUE pairs. For example:

SELECT SQL_TEXT
FROM V$SQLTEXT
WHERE ADDRESS = '&ADDRESS_WANTED'
AND HASH_VALUE = &HASH_VALUE
ORDER BY PIECE;


The statement can then be explained (using EXPLAIN PLAN) or isolated for further testing to see how CPU-intensive it really is. If the statement uses bind variables and if your data is highly skewed, then the statement may only be CPU-intensive for certain bind values.

Find which sessions are responsible for most CPU usage. The following statement helps locate sessions which have used the most CPU:

SELECT v.SID, SUBSTR(s.NAME,1,30) "Statistic", v.VALUE
FROM V$STATNAME s, V$SESSTAT v
WHERE s.NAME = 'CPU used by this session'
AND v.STATISTIC# = s.STATISTIC#
AND v.VALUE > 0
ORDER BY 3;


Note:
CPU time is cumulative; therefore, a session that has been connected for several days may appear to be heavier on CPU than one that has only been connected for a short period of time. Thus, it is better to write a script to sample the difference in the statistic between two known points in time, letting you see how much CPU was used by each session in a known time frame.

After any CPU-intensive sessions have been identified, the V$SESSION view can be used to get more information. At this stage, it is generally best to revert to user session tracing (SQL_TRACE) to determine where the CPU is being used.

Trace typical user sessions using the SQL_TRACE option to see how CPU is apportioned amongst the main application statements.

After these statements have been identified, you have the following three options for tuning them:

Rewrite the application so that statements do not continually reparse.
Reduce parsing by using the initialization parameter SESSION_CACHED_CURSORS.
If the parse count is small, the execute count is small, and the SQL statements are very similar except for the WHERE clause, then you may find that hard coded values are being used instead of bind variables. Use bind variables to reduce parsing.
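To illustrate the bind variable point, a minimal SQL*Plus sketch; the EMP table and its columns are assumptions used purely for illustration:

-- Literal version: every distinct value is a new statement and a new hard parse
-- SELECT ENAME FROM EMP WHERE EMPNO = 7839;
-- SELECT ENAME FROM EMP WHERE EMPNO = 7566;

-- Bind variable version: one shared cursor, parsed once and re-executed
VARIABLE EMPNO NUMBER
EXEC :EMPNO := 7839
SELECT ENAME FROM EMP WHERE EMPNO = :EMPNO;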



If your system must scan too many buffers in the foreground to find a free buffer, then it wastes CPU resources. To alleviate this problem, tune the DBWn process(es) to write more frequently.

Wait Detection

Whenever an Oracle process waits for something, it records it as a wait using one of a set of predefined wait events. (See V$EVENT_NAME for a list of all wait events). Some of these events can be considered idle events; i.e., the process is waiting for work. Other events indicate time spent waiting for a resource or action to complete. By comparing the relative time spent waiting on each wait event and the "CPU used by this session" (from above), you can see where the Oracle instance is spending most of its time. To get an indication of where time is spent, follow these steps:

Review either the V$SYSTEM_EVENT view or the wait events section of the AWR report.
Ignore any idle wait events. Common idle wait events include:
client message
SQL*Net message from client
SQL*Net more data from client
rdbms ipc message
pipe get
null event
pmon timer
smon timer
parallel query dequeue
Ignore any wait events that represent a very small percentage of the total time waited.
Add the remaining wait event times, and calculate each one as a percentage of total time waited.
Compare the total time waited with the CPU used by this session figure.
Find the event with the largest wait event time. This may be the first item you want to tune.
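The steps above can be sketched directly against V$SYSTEM_EVENT; a minimal example, assuming 10g or later where the WAIT_CLASS column is available to filter out idle events:

SELECT EVENT, TIME_WAITED,
       ROUND(100 * RATIO_TO_REPORT(TIME_WAITED) OVER (), 2) "PCT"
FROM V$SYSTEM_EVENT
WHERE WAIT_CLASS <> 'Idle'
ORDER BY TIME_WAITED DESC;
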
Latch Contention

Latch contention is a symptom of CPU problems; it is not usually a cause. To resolve it, you must locate the latch contention within your application, identify its cause, and determine which part of your application is poorly written.

In some cases, the spin count may be set too high. It's also possible that one process may be holding a latch that another process is attempting to secure. The process attempting to secure the latch may be endlessly spinning. After a while, this process may go to sleep and later resume processing and repeat its ineffectual spinning. To resolve this:

Check the Oracle latch statistics. The "latch free" event in V$SYSTEM_EVENT shows how long processes have been waiting for latches. If there is no latch contention, then this statistic does not appear. If there is a lot of contention, then it may be better for a process to go to sleep at once when it cannot obtain a latch, rather than use CPU time by spinning.
Look at the ratio of CPUs to processes. If there are large numbers of both, then many processes can run. But if a single process is holding a latch on a system with ten CPUs, rescheduling that process so it is not running does not help: ten other processes may still run ineffectively, spinning on the same latch and wasting CPU in parallel.
Check V$LATCH_MISSES, which indicates where in the Oracle code most contention occurs.
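As a starting point, the summary latch statistics in V$LATCH show which latches are being slept on, before drilling down into V$LATCH_MISSES; a minimal sketch:

SELECT NAME, GETS, MISSES, SLEEPS
FROM V$LATCH
WHERE SLEEPS > 0
ORDER BY SLEEPS DESC;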

VMSTAT

$ vmstat -a

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free inact active si so bi bo in cs us sy id wa st
1 0 3018448 444680 41001060 54450332 1 1 1363 514 0 0 5 0 93 2 0


FIELD DESCRIPTION FOR VM MODE

Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.


Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)

Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).

IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).

System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.

CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.

$ vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
total merged sectors ms total merged sectors ms cur sec
ram0 0 0 0 0 0 0 0 0 0 0
ram1 0 0 0 0 0 0 0 0 0 0
sda 23491424 38630865 1269137454 229607384 640208708 443248801 47228429413 2423012429 0 420137
loop0 0 0 0 0 0 0 0 0 0 0
loop1 0 0 0 0 0 0 0 0 0 0

FIELD DESCRIPTION FOR DISK MODE

Reads
total: Total reads completed successfully
merged: grouped reads (resulting in one I/O)
sectors: Sectors read successfully
ms: milliseconds spent reading

Writes
total: Total writes completed successfully
merged: grouped writes (resulting in one I/O)
sectors: Sectors written successfully
ms: milliseconds spent writing

IO
cur: I/O in progress
sec: seconds spent for I/O