|
Description
|
The customer's Oracle database is terminating after an async i/o issued by
an Oracle dbwriter has not returned in 10 minutes. The error 'ORA-27062'
is logged.
A forced crash dump at the point where Oracle flags the error shows the
following:
We have 10 dbwriters all issuing async i/o to the database which is on a
vxfs filesystem. When the crash dump was taken, there were 1457 threads,
belonging to the 10 dbwriter processes, which were all waiting for an exclusive
writer lock on the vxfs lock which serialises writes to a file
( the lock is of type vx_rwsleep_rec, which is a vxfs implementation of a
reader/writer lock ). When a thread needs to wait for a writer lock, we call
cv_wait() which results in the thread being inserted in a sleep queue. When
the owner releases the lock, cv_signal() is called resulting in the first
thread on the sleep queue being woken up.
The threads for dbwriters 2 to 10 are predominantly at priority 59, while
the threads for dbwriter 1 are at a variety of lower priorities ( I understand
the first dbwriter has some 'master' responsibilities, and hence probably uses
more cpu than dbwriters 2 to 10, which have a more dedicated i/o function ).
Three threads in particular, belonging to the first dbwriter, have been waiting
for 10 minutes on the sleep queue, at priority 17.
cv_wait()/cv_block() will call ts_sleep() to bump up the thread's priority a
little ( using ts_slpret from the ts dispatch table ), before calling
sleepq_insert() to insert the thread onto the sleep queue in priority
order. A thread with low priority is put toward the back of the queue.
Because the priority of threads is currently not adjusted while on the sleepq
( ts_update_list() updates dispwait, but skips the recalculation of priority
while a thread is in state TS_SLEEP ), low priority threads can be
victimised when higher priority threads are continuously joining the same
sleep queue.
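The starvation mechanism described above can be illustrated with a small user-level model. This is only a sketch, not kernel code: the function names mirror sleepq_insert() and cv_signal() for readability, the thread names are hypothetical, and the arrival pattern (one new priority-59 waiter per wakeup) is an assumption chosen to represent sustained load.

```python
def sleepq_insert(queue, thread):
    """Insert in descending priority order; equal priorities stay FIFO."""
    pos = len(queue)
    for i, queued in enumerate(queue):
        if queued["pri"] < thread["pri"]:
            pos = i
            break
    queue.insert(pos, thread)


def simulate(wakeups, new_pri=59, starved_pri=17):
    """Model a lock whose waiters sleep on one priority-ordered queue."""
    queue = []
    # The low-priority waiter arrives first, followed by one high-priority waiter.
    sleepq_insert(queue, {"name": "victim", "pri": starved_pri})
    sleepq_insert(queue, {"name": "hi-0", "pri": new_pri})
    woken = []
    for n in range(1, wakeups + 1):
        # Assumption: one new high-priority waiter joins before each release.
        sleepq_insert(queue, {"name": f"hi-{n}", "pri": new_pri})
        # The lock owner releases; the model's cv_signal() wakes the queue head.
        woken.append(queue.pop(0)["name"])
    return woken, [t["name"] for t in queue]


if __name__ == "__main__":
    woken, remaining = simulate(1000)
    print(len(woken), "wakeups; victim served:", "victim" in woken)
    print("still queued:", remaining)
```

Because the victim's priority is never recalculated while it sleeps, every new arrival is inserted ahead of it, and it stays at the tail for as long as the arrival rate matches the wakeup rate.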
james.mcpherson@aus 2001-07-10
The above condition does not just affect Oracle and async I/O! Under Solaris 2.6 KT Freetel's
heavily loaded domain papa2 failed to respond to pings. Analysis of the core revealed that
there were several threads on the sleepq waiting for the muxifier mutex and the thread which
held muxifier was prioritised down to 5. Most of the other threads on the sleepq were
prioritised way, way down (below 10) from what one might expect them to be at.
The result was therefore that the customer perceived the domain to have hung so they dropped
it and generated a crash dump.
While trying to determine how pathological the scheduler / priority determination algorithm
can get I came across bugids 4042155 and 4246211 which are earlier accounts of this current bug.
This customer is particularly susceptible to this problem due to (1) incredibly aggressive
system tuning and (2) heavily overloading the domain - in spite of APAC TSG and Geo SSE recommendations.
Cores and explorers are available at
/net/necrom.aus/tsg/calls/10096228
Radiance/APAC case number is 10096228.
xxxxx@xxxxx.com 2001-07-12
Company xxxxx
[Bob Sneed wrote ...]
The Telephone Data Systems (TDS) case (Radiance #62562532) demonstrates
that a large number of threads are not required to provoke this thread
starvation issue. In the TDS case, a small set of Oracle client processes
(< 100) concurrently trying to use the same file for disk sorting resulted
in client process failures from ORA-27062.
Also of great concern to the customer is that the same workload completes
with no client failures on their NT system, and that the variance of
client completion times is dramatically less under NT.
The threads on the sleep queue with low priority will never get a chance to run. A built-in mechanism is needed to
give low-priority threads a fair chance to run in a timely manner.
Oracle data, explorer output and unix.1 and vmunix.1 files at /net/cores.central/cores/62562532.
System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 4096Mb
SunOS metaunix 5.7 Generic_106541-16
ORACLE RDBMS Version: 8.1.6.1.0.
The system was forced to panic to produce a core file while Oracle clients were logging 10-minute timeouts in the trace file:
WARNING: aiowait timed out 1 times
Oracle is running with default (1) db_writer_process.
Threads summary:
156 threads ran in the last second (106 user, 50 kernel)
377 threads ran in the last minute (318 user, 59 kernel)
19 runnable threads (16 user, 3 kernel)
0 zombied threads
1 stopped threads (0 user, 1 kernel)
55 free_threads (0 user, 55 kernel)
0 mutexes pending
1* rwlocks pending (1 user, 0 kernel)
404 condition variables pending (285 user, 119 kernel)
2* semaphores pending (1 user, 1 kernel)
0 user-level sobjs pending
16 shuttle (doors) sobjs pending (16 user, 0 kernel)
2* threads in biowait (2 user, 0 kernel)
19 threads in dispatch queues (16 user, 3 kernel)
0 swapped threads
0 interrupt threads running
1067 total threads (876 user, 191 kernel)
There are 71 blocked oracle threads, 68 via pwrite(), 3 via pread().
8 have run in the last 2.5 seconds (tspri>=49), while the other 63 (tspri<=46)
have not run for >= 598 seconds.
cmd: oracleTBSP tid: 0x300080cd9e0 pri: 58(TS) idle: 0.03 sec pread+0x118
cmd: oracleTBSP tid: 0x300080cd760 pri: 58(TS) idle: 0.03 sec pread+0x118
cmd: oracleTBSP tid: 0x30008072320 pri: 58(TS) idle: 0.13 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007deafa0 pri: 58(TS) idle: 0.22 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007ad79a0 pri: 58(TS) idle: 0.08 sec pread+0x118
cmd: oracleTBSP tid: 0x300075f1d00 pri: 58(TS) idle: 0.09 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300075c6160 pri: 58(TS) idle: 0.09 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007656e40 pri: 49(TS) idle: 2.22 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d5ea60 pri: 46(TS) idle: 598.58 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300078fcde0 pri: 46(TS) idle: 599.48 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300076ac700 pri: 46(TS) idle: 600.04 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300076ac480 pri: 46(TS) idle: 600.03 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e6fc80 pri: 45(TS) idle: 598.39 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e6f780 pri: 45(TS) idle: 598.39 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e374e0 pri: 45(TS) idle: 598.48 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e37260 pri: 45(TS) idle: 598.48 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e134c0 pri: 45(TS) idle: 598.35 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e13240 pri: 45(TS) idle: 598.35 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d90800 pri: 45(TS) idle: 598.57 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d90580 pri: 45(TS) idle: 598.57 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300075a5a40 pri: 45(TS) idle: 601.24 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f47ac0 pri: 44(TS) idle: 598.4 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f47840 pri: 44(TS) idle: 598.4 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f461c0 pri: 44(TS) idle: 598.37 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007edb560 pri: 44(TS) idle: 598.08 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007edb2e0 pri: 44(TS) idle: 598.08 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007eb1040 pri: 44(TS) idle: 598.49 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007eb0dc0 pri: 44(TS) idle: 598.49 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e7a8a0 pri: 44(TS) idle: 598.49 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e7a620 pri: 44(TS) idle: 598.4 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d411a0 pri: 44(TS) idle: 598.22 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d40f20 pri: 44(TS) idle: 598.22 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300079d7100 pri: 44(TS) idle: 599.28 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300079d6e80 pri: 44(TS) idle: 599.28 sec pwrite+0x148
cmd: oracleTBSP tid: 0x3000771aca0 pri: 44(TS) idle: 600.35 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300074ab200 pri: 44(TS) idle: 601.29 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f37320 pri: 43(TS) idle: 598.4 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f370a0 pri: 43(TS) idle: 598.4 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007ef5080 pri: 43(TS) idle: 598.45 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e6e600 pri: 43(TS) idle: 598.48 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007e6e380 pri: 43(TS) idle: 598.49 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d522c0 pri: 43(TS) idle: 598.57 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d52040 pri: 43(TS) idle: 598.59 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007891ca0 pri: 43(TS) idle: 599.71 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007868100 pri: 43(TS) idle: 599.3 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300074f7c40 pri: 43(TS) idle: 601.21 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300074b40a0 pri: 43(TS) idle: 601.16 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300077c6820 pri: 38(TS) idle: 600.05 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30008047c00 pri: 36(TS) idle: 598.29 sec pwrite+0x148
cmd: oracleTBSP tid: 0x3000802a060 pri: 36(TS) idle: 598.3 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007bdba60 pri: 36(TS) idle: 598.8 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007bdb7e0 pri: 36(TS) idle: 598.4 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007ab0d00 pri: 36(TS) idle: 599.09 sec pwrite+0x148
cmd: oracleTBSP tid: 0x3000757f520 pri: 36(TS) idle: 601.27 sec pwrite+0x148
cmd: oracleTBSP tid: 0x3000757f2a0 pri: 36(TS) idle: 601.27 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007fdcf20 pri: 35(TS) idle: 598.31 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300080fba00 pri: 34(TS) idle: 598.22 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300080fb780 pri: 34(TS) idle: 598.21 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30008046580 pri: 34(TS) idle: 598.27 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30008046300 pri: 34(TS) idle: 598.26 sec pwrite+0x148
cmd: oracleTBSP tid: 0x3000802b960 pri: 34(TS) idle: 598.31 sec pwrite+0x148
cmd: oracleTBSP tid: 0x3000802b6e0 pri: 34(TS) idle: 598.31 sec pwrite+0x148
cmd: oracleTBSP tid: 0x300080011c0 pri: 34(TS) idle: 598.31 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007faaa00 pri: 34(TS) idle: 598.17 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007faa780 pri: 34(TS) idle: 598.17 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f72480 pri: 34(TS) idle: 598.19 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007f72200 pri: 34(TS) idle: 598.19 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30008092fc0 pri: 24(TS) idle: 598.24 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30008092d40 pri: 24(TS) idle: 598.24 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007ef5d00 pri: 24(TS) idle: 598.43 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007ef5a80 pri: 24(TS) idle: 598.33 sec pwrite+0x148
All threads with pri <= 46 are about to log the aiowait timed out error message in
the Oracle trace files.
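The split between recently-run and starved threads in the listing above can be reproduced mechanically. The following is a minimal sketch that assumes the exact line layout shown in the listing; the regular expression, the function names and the 598-second threshold are illustrative choices, not part of any crash-analysis tool. The four sample lines are copied from the listing.

```python
import re

# Assumed layout: "cmd: NAME tid: 0x... pri: N(CLASS) idle: S sec func+0xOFF"
LINE = re.compile(
    r"cmd: (\S+) tid: (0x[0-9a-f]+) pri: (\d+)\((\w+)\) "
    r"idle: ([\d.]+) sec (\w+)\+0x[0-9a-f]+"
)


def parse(text):
    """Return one dict per thread line found in the text."""
    threads = []
    for m in LINE.finditer(text):
        cmd, tid, pri, cls, idle, func = m.groups()
        threads.append({"cmd": cmd, "tid": tid, "pri": int(pri),
                        "cls": cls, "idle": float(idle), "func": func})
    return threads


def starved(threads, min_idle=598.0):
    """Threads that have not run for at least min_idle seconds."""
    return [t for t in threads if t["idle"] >= min_idle]


SAMPLE = """\
cmd: oracleTBSP tid: 0x300080cd9e0 pri: 58(TS) idle: 0.03 sec pread+0x118
cmd: oracleTBSP tid: 0x30007656e40 pri: 49(TS) idle: 2.22 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007d5ea60 pri: 46(TS) idle: 598.58 sec pwrite+0x148
cmd: oracleTBSP tid: 0x30007ef5a80 pri: 24(TS) idle: 598.33 sec pwrite+0x148
"""

if __name__ == "__main__":
    for t in starved(parse(SAMPLE)):
        print(t["tid"], t["pri"], t["idle"], t["func"])
```

Run against the full listing, the same filter would be expected to separate the threads at pri <= 46 from those at pri >= 49.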
james.mcpherson@aus 2001-08-06
xxxxx had an ORA-27062 failover a few days ago. The first core
(0) showed a failfast timeout panic with the system mpsun10 having been up for
around 40 days. I checked the rt and ts classes. The RT class had 31 threads
in it, and these were for the usual suncluster processes. The TS class showed
one slightly interesting thread - 0x3001dea9040, pid 12812 for the reboot
command which was in suspend and idle for 153 ticks (1.53 seconds). The process
table when viewed tree-style shows this:
fm3(vmcore.0):6> proc tree
0 sched
3 fsflush
2 pageout
1 /etc/init -
12833 /usr/lib/saf/sac -t 300
12850 /usr/lib/saf/ttymon
8601 ksh -o vi
12812 reboot
3037 /opt/openFT/ft/ftvfsm -a
3033 /opt/bin/fta -sn
1886 /opt/SUNWpnm/bin/pnmd -s -c mpsunc01 -l 0
1427 /opt/SUNWcluster/bin/ccdd -f /etc/opt/SUNWcluster/conf/ccd.database.init
1373 /opt/SUNWcluster/bin/clustd -f /etc/opt/SUNWcluster/conf/mpsunc01.cdb
12813 /bin/ksh -p /opt/SUNWcluster/bin/reconf_ener cmmabort mpsunc01
12861 /bin/ksh /opt/SUNWcluster/etc/reconf/conf.d/rcA.d/05_loghost
12893 /opt/SUNWcluster/bin/timed_run 10 /opt/SUNWcluster/ha/oracle/oracle_fm_stop cw
12894 /bin/ksh /opt/SUNWcluster/ha/oracle/oracle_fm_stop cwhlh2,cwhlh1 60
12905 /bin/ksh /opt/SUNWcluster/bin/oracle_status_svcs -mode all -hosts cwhlh2,cwhlh1
12916 tr \012
12917 hareg -q oracle -H
12918 sh -c /opt/SUNWcluster/bin/ccdadm mpsunc01 -w
1093 -sh
1065 /opt/SUNWsma/bin/smad
1066 /opt/SUNWsma/bin/smad
783 /opt/etc/tnsxd
60 /usr/lib/devfsadm/devfseventd
18 vxconfigd -m boot
which I take to indicate that the cluster software was trying to shutdown the
system because the reboot command had been issued. There are some hardware error
messages listed in the msgbuf for ssd14 (/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@w2100002037901d75,0)
which is c4t19d0. There are also some read and vxfs vx_nospace errors,
presumably as a result of c4t19d0's unrecoverable media errors.
I didn't see anything else of interest in core 0.
The second core (2) did show some interesting things. Firstly, there are three
Oracle instances running - DWHI, DMPOI and DWH_TEST. Both DWHI and DWH_TEST have
three dbwrs, and DMPOI only has one. Interestingly, each dbwr had 258 threads
associated with it. DWHI's ckpt had 23 threads, its lgwr had 24. DMPOI's ckpt
and lgwr each had 11 threads. DWH_TEST's ckpt had 11 while its lgwr had 38
threads. Out of the 203 processes in the system 88 were oracle-owned or
oracle-related.
Three threads were in biowait, one listener and two dbwr but all for the DWHI instance:
thread: 0x30021c59820 pid: 6311 cmd: oracleDWHI (LOCAL=NO)
idle: 2 ticks (0.02 seconds)
age: 14 ticks (0.14 seconds)
buf @ 0x3000dec4730
b_bcount: 49152
b_edev: 209(vxio),85004
b_vp: 0x0
b_blkno: 0x3e610e0
thread: 0x3000ef68d40 pid: 5276 cmd: ora_dbw2_DWHI
idle: 0 ticks (0 seconds)
age: 49271 ticks (8 minutes 12.71 seconds)
buf @ 0x3000de347e8
b_bcount: 16384
b_edev: 209(vxio),85004
b_vp: 0x0
b_blkno: 0x3358400
thread: 0x30007d89a60 pid: 5276 cmd: ora_dbw2_DWHI
idle: 0 ticks (0 seconds)
age: 43595 ticks (7 minutes 15.95 seconds)
buf @ 0x300113077b8
b_bcount: 16384
b_edev: 209(vxio),85001
b_vp: 0x0
b_blkno: 0x4f8b4f0
3 matching threads found.
threads in biowait() by device:
count device
2 209(vxio),85004
1 209(vxio),85001
And the same three threads are waiting on semaphores. The filesystems which
have 209,8500[4|1] as major,minor are
0x30008316c18 0x300002cabc8/vx_vfs 209(vxio),85004 vxfs /cwhlh2/(0x3000821dc48<VROOT>)/oradata05
0x300034dfeb8 0x30007709480/vx_vfs 209(vxio),85001 vxfs /cwhlh2/(0x3000821dc48<VROOT>)/oradata02
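The tick counts in the biowait output convert to wall-clock time at 100 ticks per second, which is the rate implied by pairs such as "153 ticks (1.53 seconds)". A small sketch of that conversion follows; the function name is hypothetical and the 100 Hz default is an assumption taken from those pairs.

```python
def format_ticks(ticks, hz=100):
    """Render a tick count as 'M minutes S.hh seconds' using integer arithmetic."""
    whole_seconds, leftover_ticks = divmod(ticks, hz)
    minutes, seconds = divmod(whole_seconds, 60)
    hundredths = leftover_ticks * 100 // hz
    return f"{minutes} minutes {seconds}.{hundredths:02d} seconds"


if __name__ == "__main__":
    # Ages of the two ora_dbw2_DWHI threads in biowait.
    for ticks in (49271, 43595):
        print(ticks, "ticks =", format_ticks(ticks))
```

Both dbwr threads had been waiting on their buffers for more than seven minutes, against an Oracle timeout of ten.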
For condition variables (cv) there are 1052 threads, with 651 waiting on the
same one: 0x3000e304fc8. Amazingly (not!) these are all for
the DWHI instance:
228 ora_dbw0_DWHI
231 ora_dbw1_DWHI
188 ora_dbw2_DWHI
4 oracleDWHI
And the list of those which were waiting more than 1 minute is
thread pri idle pid wchan command
0x3001706c000 31 10m1.41s 3462 0x3000e304fc8 oracleDWHI (LOCAL=NO)
0x30011334800 31 10m1.39s 3462 0x3000e304fc8 oracleDWHI (LOCAL=NO)
0x30016553280 31 10m1.38s 3462 0x3000e304fc8 oracleDWHI (LOCAL=NO)
0x3001707e320 31 10m1.37s 3462 0x3000e304fc8 oracleDWHI (LOCAL=NO)
0x3000ef41800 49 26m32.41s 5268 0x3000e304fc8 ora_dbw0_DWHI
0x3000ee5f2e0 49 28m25.27s 5268 0x3000e304fc8 ora_dbw0_DWHI
0x30003bcba40 49 34m55.72s 5268 0x3000e304fc8 ora_dbw0_DWHI
0x3000ef002e0 49 2m55.90s 5268 0x3000e304fc8 ora_dbw0_DWHI
0x3000dd7d820 49 14m42.80s 5272 0x3000e304fc8 ora_dbw1_DWHI
0x3000ee20560 49 34m44.80s 5272 0x3000e304fc8 ora_dbw1_DWHI
0x3000eefcae0 49 5m38.21s 5272 0x3000e304fc8 ora_dbw1_DWHI
0x3000ee765c0 49 34m9.99s 5276 0x3000e304fc8 ora_dbw2_DWHI
0x3000edf8820 49 34m9.73s 5276 0x3000e304fc8 ora_dbw2_DWHI
0x3000eda6da0 49 25m38.95s 5276 0x3000e304fc8 ora_dbw2_DWHI
This core also shows the vx_nospace message, consistently for the /dev/vx/dsk/cwhlh2dg/vol01
filesystem which is mounted as /cwhlh2/oradata01.... Let's check the memory...
meminfo shows that priority_paging was not operational, desfree and lotsfree
were not set. They are using vxfs and have not tuned ncsize to be between
50-80% of the vxfs_ninode value. With the defaults for this system ncsize is
set at approx 7.9% of vxfs_ninode which is not quite good enough.
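The ncsize check above is simple arithmetic. The sketch below uses hypothetical values (the real ncsize and vxfs_ninode from this core are not quoted here) chosen only so that the ratio comes out at the 7.9% mentioned; the 50-80% band is the guideline stated above.

```python
def ncsize_percent(ncsize, vxfs_ninode):
    """ncsize expressed as a percentage of vxfs_ninode."""
    return 100.0 * ncsize / vxfs_ninode


def recommended_ncsize_range(vxfs_ninode, low_pct=50, high_pct=80):
    """Band of ncsize values that fall within low_pct..high_pct of vxfs_ninode."""
    return (vxfs_ninode * low_pct // 100, vxfs_ninode * high_pct // 100)


def ncsize_ok(ncsize, vxfs_ninode):
    low, high = recommended_ncsize_range(vxfs_ninode)
    return low <= ncsize <= high


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    vxfs_ninode, ncsize = 100000, 7900
    print(f"ncsize is {ncsize_percent(ncsize, vxfs_ninode):.1f}% of vxfs_ninode")
    print("recommended range:", recommended_ncsize_range(vxfs_ninode))
    print("within range:", ncsize_ok(ncsize, vxfs_ninode))
```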
The RT class (52 threads) shows the usual cluster processes (clustd, rpc.pnmd
etc), nothing unusual or interesting. The TS class (2440 threads) shows that
most of the threads it contains are at priority 58:
num priority
9 0
5 10
2 12
4 20
2 21
5 24
11 25
2 28
1 29
4 30
13 31
6 32
3 33
30 34
26 38
1 42
2 43
1 44
3 45
31 48
43 49
5 50
6 52
18 53
6 54
2 55
2037 58
161 59
Just as a frinstance if we check the threads at priority 48, then we see that
23 of these are lgwr threads for the DWH_TEST instance, and they have been idle
for approx 2 days 17 hours 34 minutes and 43 seconds. I would not normally have
said that priority 48 was all that low. Priority 58 has the interesting stuff -
heaps of threads from DWH_TEST idle for more than 13 hours, and lots of threads
from DWHI which have been idle for more than 1 day.
Each dbwr has some threads in the kaio stack but the majority of the kaio
threads are all from the one instance - DWHI. Those threads are also all (except
for one listener) at priority 58 and have been idle in aio_cleanup_thread for
more than 10 minutes. The threads in aiowait have been idle for between 1 tick
and 145 ticks.
Coming back to the fm thread summary:
173 threads ran in the last second (105 user, 68 kernel)
976 threads ran in the last minute (886 user, 90 kernel)
2 runnable threads (0 user, 2 kernel)
0 zombied threads
1 stopped threads (0 user, 1 kernel)
80 free_threads (0 user, 80 kernel)
0 mutexes pending
0 rwlocks pending
1052 condition variables pending (926 user, 126 kernel)
4* semaphores pending (3 user, 1 kernel)
1538 user-level sobjs pending (1538 user, 0 kernel)
0 shuttle (doors) sobjs pending
3* threads in biowait (3 user, 0 kernel)
2 threads in dispatch queues (0 user, 2 kernel)
2* threads in dispq of cpu running idle thread (0 user, 2 kernel)
0 swapped threads
0 interrupt threads running
2712 total threads (2486 user, 226 kernel)
This clearly isn't a hung system - I'd say it was just overloaded ;|
Infineon's system is an e6500, 8x400MHz/8Mb cache cpus, 8Gb ram. The root/usr/var fs and swap are all mirrored with vxvm. The cluster filesystems (shared dgs etc) are all on vxfs.
|
|
Work Around
|
Use vxfs quick i/o which avoids the bottleneck of a lock serialising write
access to the file.
bob.sneed@East 2001-07-16
UFS forcedirectio (S8 @ U3) also avoids the single-writer lock, but also
removes any benefit Oracle might gain from use of the UNIX buffer cache, and
so cannot be recommended blindly. VxFS QIO, beyond avoiding the single
writer lock, causes the KAIO call from libaio to succeed, thus avoiding the
LWP AIO inheritance of the TS scheduler bug by fully avoiding the LWP AIO
code path. VxFS QIO incurs additional administrative overhead, and also
complicates the matter of growing file sizes. Thus, each of these actions
has side effects that complicate promoting them as "workarounds".
While the broader class of ORA-27062 timeouts can only be worked around by
disabling AIO in Oracle, disabling AIO in Oracle requires alternate Oracle
tuning be used (e.g. multiple DB writers or use of I/O slaves), and for all
but the most read-biased workloads cannot result in performance equivalent
to that obtainable with AIO.
In most cases with Oracle, optimal tuning and disk layout can dramatically
reduce the probability of an ORA-27062 incident from this bug, but these measures
hardly qualify as a "workaround". With or without optimal tuning, this bug
impacts Oracle users seemingly at random, and can be provoked by transient
I/O competition, soft hardware errors, etc - at I/O levels far below what one
would predict could possibly cause 10 minute timeouts.
The single biggest tunable in Oracle which throttles the probability of dying
due to this bug is db_writer_processes. This should not be tuned past 1 when
using AIO unless measurable business benefit is attained by doing so.
The only real workaround to the TS scheduling bug per se (as it impacts
Oracle by the bug being inherited by the LWP-based AIO) is to manipulate
the scheduler so as to implement a fixed-priority scheduling scheme for
Oracle background processes. This should prevent Oracle processes and
their associated AIO LWPs from priority shuffling that can lead to thread
starvation. Cases such as the TDS case demonstrate that client processes
are also vulnerable to thread starvation per this bug. One might approach
the client scenario by ensuring that the Oracle listener runs with a flat
scheduling priority scheme (perhaps at a priority just below that of the
Oracle background processes). Beware however in the case of client processes
that local client processes can be children of various server-side processes
which may be launched from shells that rightfully belong in the TS class.
Thus, reining in all these shadow processes to a fixed priority scheme could
be problematic.
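Why a fixed-priority scheme should help can be seen in a toy model of a priority-ordered sleep queue: when every waiter has the same priority, insertion degenerates to FIFO and the wait is bounded by the backlog ahead of the thread. This is a sketch under stated assumptions (one new waiter per wakeup, hypothetical names, user-level Python rather than kernel code) and says nothing about how the scheduler would actually be configured.

```python
def sleepq_insert(queue, name, pri):
    """Insert in descending priority order; equal priorities stay FIFO."""
    pos = next((i for i, (_, p) in enumerate(queue) if p < pri), len(queue))
    queue.insert(pos, (name, pri))


def wakeups_until_served(victim_pri, others_pri, backlog=10, limit=10000):
    """Count wakeups before 'victim' reaches the head; None if it never does."""
    queue = []
    for n in range(backlog):
        sleepq_insert(queue, f"t{n}", others_pri)
    sleepq_insert(queue, "victim", victim_pri)
    for wakeup in range(1, limit + 1):
        # Assumption: one new waiter joins before each release of the lock.
        sleepq_insert(queue, f"new{wakeup}", others_pri)
        name, _ = queue.pop(0)
        if name == "victim":
            return wakeup
    return None


if __name__ == "__main__":
    print("flat priorities:", wakeups_until_served(59, 59))
    print("victim at 17:   ", wakeups_until_served(17, 59))
```

With flat priorities the victim is served after the ten threads ahead of it; at priority 17 it is still waiting when the model gives up.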
|