Description
In the past, it was possible for an otherwise idle machine with many CPUs
to show 100% I/O wait if just one thread was blocked on long-term I/O,
e.g. mt -f /dev/rmt/<N> rewind
This was addressed by the fix for
4116873: I/O wait statistic misleading
However, that fix is incomplete, and I/O wait continues to be a misleading
and confusing statistic for many customers.
For example, let's take the case of an otherwise idle, 64-way E10K which
just happens to have 64 tape drives ...
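#!/bin/ksh
# script to show 100% I/O wait on 64-way with 64 tape drives:
# one rewind started from each CPU, all running concurrently
n=0
while (( n < 64 ))
do
    pbind -b $n $$                  # bind this shell (and the mt it starts) to CPU n
    mt -f /dev/rmt/$n rewind &      # rewind tape n in the background
    (( n += 1 ))
done
wait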
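#!/bin/ksh
# script to show 1-2% I/O wait on 64-way with 64 tape drives:
# the same 64 concurrent rewinds, all started from CPU 0
pbind -b 0 $$                       # bind this shell (and every mt it starts) to CPU 0
n=0
while (( n < 64 ))
do
    mt -f /dev/rmt/$n rewind &      # rewind tape n in the background
    (( n += 1 ))
done
wait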
OK, this is contrived ... but it does show that the same I/O load could
exhibit anywhere from 1% to 100% I/O wait on the same system, depending
only on which CPU(s) the I/O was submitted from.
Also, 100% I/O wait caused by one application can be masked entirely by
another, unrelated application that keeps every CPU 100% busy in user mode ...
#!/bin/ksh
# script to consume 100% user on 64-way machine:
# start one CPU-bound background loop per CPU
n=0
while (( n < 64 ))
do
    ( while :; do :; done ) &
    (( n += 1 ))
done
wait
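Running this script would have no effect on the 64 tape rewinds, but the
system would then report 0% I/O wait (instead of 1% to 100%), because a
CPU that is busy running user code is never counted as waiting for I/O.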
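Both effects follow from a simple accounting rule, sketched below as a simplified model based only on the behaviour described in this report: a CPU doing user or system work is counted as busy, an idle CPU is counted as I/O wait if at least one outstanding I/O was submitted from it, and any other CPU is counted as idle. The function name iowait_pct and its input format are hypothetical; this is not the Solaris kernel code, and it ignores per-tick sampling.

```shell
#!/bin/ksh
# Hypothetical model of the per-CPU I/O wait accounting described in
# this report; not the Solaris kernel implementation.
# Reads one line per CPU on standard input:
#   B    CPU is busy with user/system work (never counted as wait)
#   <n>  CPU is idle with <n> outstanding I/Os submitted from it
# Prints the system-wide I/O wait percentage.
iowait_pct() {
    awk '
        { ncpu++ }
        $1 != "B" && $1 > 0 { nwait++ }
        END { printf "%.1f\n", 100 * nwait / ncpu }'
}

# First script: one rewind submitted from each of 64 idle CPUs
awk 'BEGIN { for (i = 0; i < 64; i++) print 1 }' | iowait_pct        # 100.0

# Second script: all 64 rewinds submitted from CPU 0
awk 'BEGIN { print 64; for (i = 1; i < 64; i++) print 0 }' | iowait_pct   # 1.6

# Masking: same rewinds, but every CPU is busy in user mode
awk 'BEGIN { for (i = 0; i < 64; i++) print "B" }' | iowait_pct      # 0.0
```

Under this model the same 64 rewinds report 100%, about 1.6% or 0% I/O wait, matching the range described above; nothing about the tapes changes, only where the I/O was submitted and what else the CPUs are doing.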
I/O wait continues to cause confusion to customers. Here is a recent
example:
Customer: My system runs faster when I turn off CPUs.
Me: How are you measuring this?
Customer: I start with 4 CPUs and see 60% I/O wait and 0% idle. Then I
turn off 2 CPUs and get just 20% I/O wait and 0% idle.
Me: There is no difference ... 60% I/O wait and 0% idle indicates
40% utilisation, i.e. 1.6 CPUs' worth of work. On a 2-CPU machine
that is 80% utilisation ... which is where the 20% figure comes from.
Did you measure any difference in application performance?
Customer: No.
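The arithmetic in this exchange can be checked with a short sketch. It assumes the application needs the same amount of CPU time (usr + sys, measured in CPUs' worth) whichever number of CPUs is online; the function name rescale is hypothetical. Wait and idle are reported together because only their sum is fixed by the busy time; in the customer's case idle was 0%, so all of it appeared as I/O wait.

```shell
#!/bin/ksh
# Hypothetical helper: given the %usr+sys measured on <from> CPUs,
# estimate the same figures on <to> CPUs, assuming the amount of CPU
# work stays constant. Only meaningful while that work still fits on
# the remaining CPUs (the result is not capped at 100%).
rescale() {
    awk -v busy="$1" -v from="$2" -v to="$3" 'BEGIN {
        work = busy / 100 * from        # CPUs worth of usr+sys work
        new_busy = 100 * work / to      # same work spread over <to> CPUs
        printf "busy=%.0f%% wait+idle=%.0f%%\n", new_busy, 100 - new_busy
    }'
}

# Customer example: 40% busy on 4 CPUs, then 2 CPUs taken offline
rescale 40 4 2    # busy=80% wait+idle=20%
```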
This is not uncommon. The confusion would be avoided if the current I/O
wait statistic were incorporated into the idle statistic.
There is no need to change the utilities; indeed, they should probably
be left unchanged for the time being for compatibility reasons. This fix
can easily be applied at the kstat level, simply by changing how the
CPU_WAIT and CPU_IDLE counters are maintained, so that time currently
charged to CPU_WAIT is charged to CPU_IDLE instead.
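What the unchanged utilities would then show can be illustrated on the consumer side. The sketch below takes usr, sys, wait and idle tick deltas for an interval, charges the wait ticks to idle, and prints percentages with wait fixed at 0. The function name fold_wait is hypothetical, and the actual change would be made where the kernel maintains CPU_WAIT and CPU_IDLE, not in a script. The customer reported only total utilisation, so the examples put all of it under usr.

```shell
#!/bin/ksh
# Hypothetical sketch: given usr, sys, wait and idle tick deltas for an
# interval, report percentages with the wait ticks charged to idle.
fold_wait() {
    awk -v usr="$1" -v sys="$2" -v wt="$3" -v idl="$4" 'BEGIN {
        total = usr + sys + wt + idl
        idl += wt    # time spent waiting for I/O is counted as idle
        wt = 0
        printf "usr=%.0f sys=%.0f wt=%.0f idl=%.0f\n",
            100 * usr / total, 100 * sys / total,
            100 * wt / total, 100 * idl / total
    }'
}

# Customer example, 4 CPUs: 40% busy, 60% wait, 0% idle
fold_wait 40 0 60 0    # usr=40 sys=0 wt=0 idl=60

# Same workload on 2 CPUs: 80% busy, 20% wait, 0% idle
fold_wait 80 0 20 0    # usr=80 sys=0 wt=0 idl=20
```

With wait folded into idle, the two runs show 60% and 20% idle respectively, which makes it plain that the same amount of work was done in both cases.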