performance / tuning tips. to the point.                


© 2006-2007 PerformanceWiki.com
All Rights Reserved.







Disk I/O Monitoring



  Linux  Windows  AIX  Solaris

Linux

Both 'sar' and 'iostat' are available on Linux to monitor disk activities:

# sar -d 1 5
Linux 2.4.21-27.ELsmp (pw101)         12/04/2005

06:33:54 PM       DEV       tps  rd_sec/s  wr_sec/s
06:33:55 PM    dev8-0      0.00      0.00      0.00
06:33:55 PM   dev8-16      0.00      0.00      0.00
06:33:55 PM   dev8-32      0.00      0.00      0.00
06:33:55 PM   dev8-48      0.00      0.00      0.00
06:33:55 PM   dev8-64      0.00      0.00      0.00
06:33:55 PM   dev8-65      0.00      0.00      0.00
06:33:55 PM   dev8-66      0.00      0.00      0.00


# iostat 1 3
Linux 2.4.21-27.ELsmp (pw101)         12/04/2005

avg-cpu:  %user   %nice    %sys %iowait   %idle
           7.14    0.00    4.53    0.01   88.32

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.23       1066    2377572
sdb               0.01         0.39         0.00    3941054        118
sdc               0.00         0.23         0.00    2364386         16
sdd               0.00         0.00         0.00         24          0
sde               0.35         0.09         6.88     933722   70340208
sde1              0.35         0.09         6.88     933218   70340208
sde2              0.00         0.00         0.00        280          0
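'%idle' in the 'avg-cpu' line is the CPU idle percentage. If it goes below 20% while '%iowait' is rising, the system may be queuing up disk I/Os and response time suffers.
The '1 5' and '1 3' parameters are the interval and the number of iterations (e.g., '1 5' means a 1-second interval for 5 iterations).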
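
The '%idle' check above can be scripted. This is only a sketch: 'check_idle' is a made-up helper name, it assumes '%idle' is the last column of the line that follows 'avg-cpu:' (true for the output shown here), and the default of 20 is just the rule of thumb from this page.

```shell
# check_idle [threshold]
# Reads 'iostat' output on stdin and prints a warning for every avg-cpu
# sample whose %idle (assumed to be the last column) is below the threshold.
check_idle() {
    threshold="${1:-20}"
    awk -v t="$threshold" '
        /^avg-cpu:/ {
            # The numbers are on the line right after the avg-cpu header.
            if ((getline line) > 0) {
                n = split(line, f, " ")
                if (f[n] + 0 < t) printf "LOW idle: %s%%\n", f[n]
            }
        }
    '
}
```

A typical use would be 'iostat 1 5 | check_idle 20'. It prints nothing while '%idle' stays at or above the threshold.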

Windows

'Perfmon' is a Windows tool that can be configured to monitor system resources and log them to a file.  Physical disk counters are one group that can be added to the resources being monitored.  You can start it simply by typing 'perfmon' in the "Start->Run..." window.


Once you have created a counter log (for example, one named 'perf'), you can add counters to it.
Double-click 'perf' to open it, click "Add Counters", and select
"PhysicalDisk" in the 'Performance object' dropdown list.  Then select the counters you need.

Two indicators should be monitored for each hard disk in your system. First, watch the PhysicalDisk(instance)\Disk Transfers/sec counter for each physical disk; if it goes above 25 disk I/Os per second, response time for that disk is likely poor. A disk bottleneck can significantly slow the applications running on your system, so investigate further by tracking PhysicalDisk(instance)\% Idle Time, which measures the percentage of the measurement interval during which the disk is idle. If this counter falls below 20%, read/write requests are probably queuing up for a disk that cannot service them in a timely fashion. In that case it is time to upgrade to faster disks or scale out your application to better handle the load.
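
The two-step check above can be written down as a small rule. This is a sketch, not a Perfmon feature: 'disk_verdict' is a made-up helper that takes two values you have read off the counters (Disk Transfers/sec and % Idle Time), and the limits of 25 and 20 are the rules of thumb from the paragraph above. It is written as a shell function so it can run wherever the logged values are reviewed.

```shell
# disk_verdict transfers_per_sec idle_pct
# Applies the two thresholds in order: transfers first, then idle time.
disk_verdict() {
    awk -v tps="$1" -v idle="$2" 'BEGIN {
        if (tps + 0 <= 25)        print "ok"        # transfer rate is modest
        else if (idle + 0 >= 20)  print "busy"      # high rate, disk still has idle time
        else                      print "queuing"   # high rate and almost no idle time
    }'
}

disk_verdict 40 10   # prints "queuing"
```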


 

Then start the 'perf' counter log by clicking the play button (the icon that looks like the play button on a VCR).  This starts monitoring and logging the counters you added.  You can stop the log later and play it back.
 

AIX

'sar -d' sends a snapshot of disk activities to STDOUT.  You can supply the interval and iteration parameters for the command to repeat:

# sar -d 1 1

AIX cmcs101 3 5 002FBF7D4C00    01/04/06

System configuration: lcpu=8 drives=22

14:25:18     device    %busy    avque    r+w/s    Kbs/s   avwait   avserv

14:25:19     hdisk6      0      0.0        0        0      0.0      0.0
             hdisk7      0      0.0        0        0      0.0      0.0
             hdisk4      0      0.0        0        0      0.0      0.0
             hdisk5      0      0.0        0        0      0.0      0.0
             hdisk8      0      0.0        0        0      0.0      0.0
            hdisk11      0      0.0        0        0      0.0      0.0
             hdisk9      0      0.0        0        0      0.0      0.0
            hdisk10      0      0.0        0        0      0.0      0.0
            hdisk12      0      0.0        0        0      0.0      0.0
            hdisk13      0      0.0        0        0      0.0      0.0
            hdisk14      0      0.0        0        0      0.0      0.0
             hdisk0      0      0.0        0        0      0.0      0.0
            hdisk15      0      0.0        0        0      0.0      0.0
             hdisk2      0      0.0        0        0      0.0      0.0
             hdisk3      0      0.0        0        0      0.0      0.0
             hdisk1      0      0.0        0        0      0.0      0.0
               dac0      0      0.0        0        0      0.0      0.0
           dac0-utm      0      0.0        0        0      0.0      0.0
               dac1      0      0.0        0        0      0.0      0.0
           dac1-utm      0      0.0        0        0      0.0      0.0
            hdisk16      0      0.0        0        0      0.0      0.0
            hdisk17      0      0.0        0        0      0.0      0.0
** This example shows no disk activity.
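
With 22 drives the listing gets long, so it can help to filter it. The sketch below assumes the column layout shown above (device, then %busy, followed by five more columns) and uses a made-up helper name, 'busy_disks'. The default limit of 80 is an arbitrary assumption, not an AIX recommendation.

```shell
# busy_disks [limit]
# Reads AIX 'sar -d' output on stdin and prints "device %busy" for every
# device at or above the limit. The first row of each sample carries a
# timestamp (8 fields), the remaining rows do not (7 fields), so the
# columns are counted from the right.
busy_disks() {
    limit="${1:-80}"
    awk -v limit="$limit" '
        (NF == 7 || NF == 8) && $(NF-5) ~ /^[0-9.]+$/ {
            if ($(NF-5) + 0 >= limit) print $(NF-6), $(NF-5)
        }
    '
}
```

A typical use would be 'sar -d 1 5 | busy_disks 80'. On an idle system like the one above it prints nothing.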

'iostat aux' outputs similar information on disk I/O:

# iostat aux 1 1

System configuration: lcpu=8 drives=22

18:34:05     device    %busy    avque    r+w/s    Kbs/s   avwait   avserv

18:34:06     hdisk6      0      0.0        0        0      0.0      0.0
             hdisk7      0      0.0        0        0      0.0      0.0
             hdisk4      0      0.0        0        0      0.0      0.0
             hdisk5      0      0.0        0        0      0.0      0.0
             hdisk8      0      0.0        0        0      0.0      0.0
            hdisk11      0      0.0        0        0      0.0      0.0
             hdisk9      0      0.0        0        0      0.0      0.0
            hdisk10      0      0.0        0        0      0.0      0.0
            hdisk12      0      0.0        0        0      0.0      0.0
            hdisk13      0      0.0        0        0      0.0      0.0
            hdisk14      0      0.0        0        0      0.0      0.0
             hdisk0      0      0.0        0        0      0.0      0.0
            hdisk15      0      0.0        0        0      0.0      0.0
             hdisk2      0      0.0        0        0      0.0      0.0
             hdisk3      0      0.0        0        0      0.0      0.0
             hdisk1      0      0.0        0        0      0.0      0.0
               dac0      0      0.0        0        0      0.0      0.0
           dac0-utm      0      0.0        0        0      0.0      0.0
               dac1      0      0.0        0        0      0.0      0.0
           dac1-utm      0      0.0        0        0      0.0      0.0
            hdisk16      0      0.0        0        0      0.0      0.0
            hdisk17      0      0.0        0        0      0.0      0.0
The '1 1' parameters are the interval and the number of iterations.

To measure disk Read throughput:

# time dd if=/tmp/f66mb.out of=/dev/null bs=1024k
63+1 records in.
63+1 records out.

real    0m2.04s
user    0m0.00s
sys     0m1.05s
The 'time' command shows how long the read took.  The read throughput
in this example is about 32MB per second (66MB / 2.04 seconds of real time).
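
The division can be scripted so the 'real' line from 'time' is used as printed. A sketch only: 'read_rate' is a made-up helper, and it assumes the 'NmS.SSs' time format shown above.

```shell
# read_rate size_mb real_time
# real_time is the 'real' value printed by time, e.g. 0m2.04s.
read_rate() {
    echo "$2" | awk -F'[ms]' -v mb="$1" '{
        secs = $1 * 60 + $2          # minutes and seconds to seconds
        if (secs > 0) printf "%.1f\n", mb / secs
        else          print "n/a"    # too fast to measure
    }'
}

read_rate 66 0m2.04s   # prints 32.4
```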

To measure disk Write throughput:

# sync; date; dd if=/dev/zero of=/tmp/1000m bs=1024k count=1000; date; sync; date
Thu Jan  5 23:02:41 PST 2006
1000+0 records in.
1000+0 records out.
Thu Jan  5 23:02:59 PST 2006
Thu Jan  5 23:02:59 PST 2006
In this example, dd completed after 18 seconds (23:02:59 - 23:02:41), so it wrote at about 55MB
per second (1000MB / 18 seconds).  The final sync returned within the same second, so it added no measurable time.
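
The same arithmetic from the two 'date' timestamps, as a sketch. 'hms_to_secs' and 'write_rate' are made-up helper names, and the calculation assumes both timestamps fall on the same day.

```shell
# hms_to_secs HH:MM:SS  -> seconds since midnight
hms_to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# write_rate size_mb start_time end_time  -> MB per second
write_rate() {
    start=$(hms_to_secs "$2")
    end=$(hms_to_secs "$3")
    awk -v mb="$1" -v s="$start" -v e="$end" 'BEGIN {
        if (e > s) printf "%.1f\n", mb / (e - s)
        else       print "n/a"    # zero or negative elapsed time
    }'
}

write_rate 1000 23:02:41 23:02:59   # prints 55.6
```

Using the timestamp taken after the final sync as the end time includes the flush to disk in the measurement.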

Solaris

The 'iostat -xc' command can be used to view disk activity on a Solaris machine. The -x option reports extended statistics, and the -c option reports CPU utilization for the interval.


# iostat -xc
                  extended device statistics                        cpu
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
fd0          0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   8 64 27  0
sd0         51.1    0.2 6545.1    1.6  0.0  1.8   34.7   0 100 
sd1         84.7    0.0 10615.1   0.0  0.0  1.6   19.0   1  98 
sd4         27.6    6.8  220.5   51.6  0.0  2.9   83.0   0  98 
sd6          0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
nfs1         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
Disk 'sd0' is saturated (100% busy), and 'sd1' and 'sd4' are close behind at 98%.  The next step is to find out what is using them.
The fields have the following meanings:

          device  name of the disk
          r/s     reads per second
          w/s     writes per second
          kr/s    kilobytes read per second
          kw/s    kilobytes written per second
          wait    average number of transactions waiting for service
                  (queue length)
          actv    average number of transactions actively being
                  serviced (removed from the queue but not yet
                  completed)
          svc_t   average service time, in milliseconds
          %w      percent of time there are transactions waiting
                  for service (queue non-empty)
          %b      percent of time the disk is busy (transactions
                  in progress)
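
To pick out the busy disks without reading every row, the '%b' column can be filtered. A sketch, assuming the 'iostat -xc' layout shown above, where '%b' is the 10th column. 'hot_disks' is a made-up helper name and the default limit of 90 is an arbitrary assumption.

```shell
# hot_disks [limit]
# Reads Solaris 'iostat -xc' output on stdin and prints "device %b"
# for every device whose %b is at or above the limit.
hot_disks() {
    limit="${1:-90}"
    awk -v limit="$limit" '
        NF >= 10 && $1 != "device" && $10 ~ /^[0-9.]+$/ {
            if ($10 + 0 >= limit) print $1, $10
        }
    '
}
```

Fed the sample output above, 'hot_disks 90' would list sd0, sd1 and sd4.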