SAN Storage Performance from the Host Perspective: Shedding Light on Confusing Counters
Last month we looked at performance from a holistic, high-level view and at the importance of assessing the stack of related components that determine it. This month I hope to shed some light on some confusing performance counters.
We will look at Microsoft Windows hosts and the performance tools and counters that we need to look at to establish a baseline and gain an understanding of how to apply host symptoms to SAN storage devices. We will also look at a couple of best practice configuration recommendations to facilitate performance tuning.
For Microsoft Windows systems, the tool most people use to assess how a system is performing is perfmon. It is a Microsoft Management Console (MMC) snap-in, launched as perfmon.msc, that provides real-time graphs and can log performance counters. The performance counters that we will focus on primarily are the PhysicalDisk counters. It is important, however, not to ignore other key system performance factors such as Processor, Memory, and Network values, as these can mask or exacerbate disk performance problems. There are third-party tools that leverage WMI (Microsoft's CIM implementation) or SNMP interfaces, but we will look at those in future installments.
When looking at perfmon disk counters in a SAN environment, the first thing you should ignore is the disk utilization counter (% Disk Time). While not clearly documented, perfmon calculates this counter as 100 * Avg. Disk Queue Length, so it reflects queue depth rather than how busy the disk is. Instead, take 100 minus the % Idle Time counter to get a truer indication of disk utilization: the closer % Idle Time gets to 0, the busier the disk. In the same category, and just as deceptive in a SAN environment, are the queue length counters. Because Windows has no idea how many drives are servicing a particular device, a high queue length may seem to indicate a problem where there is none. An effective rule of thumb is that a queue length of up to 2 * the number of spindles behind the device is not a cause for concern.
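The two calculations above are simple enough to sketch in a few lines of Python. This is only an illustration, not part of perfmon: the function names are hypothetical, and the inputs are assumed to be values you have already read from the % Idle Time and Avg. Disk Queue Length counters, plus a spindle count obtained from your storage administrator.

```python
def disk_busy_percent(idle_time_pct: float) -> float:
    """Approximate utilization as 100 minus '% Idle Time', clamped to 0..100."""
    return max(0.0, min(100.0, 100.0 - idle_time_pct))


def queue_length_exceeds_rule(avg_queue_length: float, spindles: int) -> bool:
    """Apply the rule of thumb of 2 outstanding I/Os per spindle.

    'spindles' is the number of drives behind the device, which Windows
    cannot see; it is an assumed input supplied by whoever manages the array.
    """
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return avg_queue_length > 2 * spindles


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    print(disk_busy_percent(35.0))
    print(queue_length_exceeds_rule(12.0, spindles=8))
    print(queue_length_exceeds_rule(12.0, spindles=4))
```

The same queue length of 12 is within the rule of thumb for a device backed by 8 spindles but exceeds it for one backed by 4, which is why the raw counter is misleading without knowledge of the back end.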
The next set of counters to look at relates to response time, or how quickly a disk transaction is serviced. The perfmon counter to look at is Avg. Disk sec/Transfer (or Avg. Disk sec/Read and Avg. Disk sec/Write for the respective response times). The counter reports values in seconds, so 0.001 (1/1000) equals one millisecond; scale the graph accordingly to read values in milliseconds. While large I/O operations and spikes may take longer, this value should typically be under 10 ms for properly running systems.
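As a rough illustration of the response-time check, the sketch below converts logged samples from seconds to milliseconds and picks out those above the 10 ms guideline. The function names and the sample list are hypothetical; the samples are assumed to be response-time values, in seconds, taken from a perfmon log.

```python
def to_milliseconds(seconds: float) -> float:
    """Convert a response time reported in seconds to milliseconds."""
    return seconds * 1000.0


def slow_samples(samples_sec, threshold_ms: float = 10.0):
    """Return (index, milliseconds) pairs for samples above the threshold."""
    result = []
    for index, seconds in enumerate(samples_sec):
        ms = to_milliseconds(seconds)
        if ms > threshold_ms:
            result.append((index, ms))
    return result


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    logged = [0.004, 0.025, 0.008]
    for index, ms in slow_samples(logged):
        print(f"sample {index}: {ms:.1f} ms exceeds the guideline")
```

A single slow sample is not necessarily a problem, since spikes and large I/O operations are expected; it is a pattern of values above the threshold that deserves attention.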
After utilization and response times, the next important set of counters covers the read/write balance and the number of operations that you need to size for. Perfmon provides the PhysicalDisk Disk Reads/sec and Disk Writes/sec counters. With these values you can start to establish the number of drives required to support the I/O load.
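A first-pass sizing estimate from these two counters might look like the sketch below. It is deliberately simplistic: the function names are hypothetical, the per-drive IOPS figure is an assumption you would need to get from your storage vendor, and the calculation ignores array cache and any RAID overhead on writes.

```python
import math


def read_write_mix(reads_per_sec: float, writes_per_sec: float):
    """Return the (read %, write %) split of the total I/O load."""
    total = reads_per_sec + writes_per_sec
    if total == 0:
        return (0.0, 0.0)
    return (100.0 * reads_per_sec / total, 100.0 * writes_per_sec / total)


def drives_required(reads_per_sec: float, writes_per_sec: float,
                    iops_per_drive: float) -> int:
    """Estimate the drive count as total IOPS divided by per-drive IOPS.

    'iops_per_drive' is an assumed planning figure, not a perfmon value.
    """
    if iops_per_drive <= 0:
        raise ValueError("iops_per_drive must be positive")
    return math.ceil((reads_per_sec + writes_per_sec) / iops_per_drive)


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    print(read_write_mix(750, 250))
    print(drives_required(750, 250, iops_per_drive=150))
```

Treat the result as a starting point for a conversation with your storage vendor rather than a final answer, since the write share of the load can change the real requirement considerably.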
With these three categories of Windows performance counters you can start establishing a baseline and a comparison to the SAN performance counters further down the stack. As you look at these performance counters, make sure that your system is up to date with the latest HBA drivers, firmware, and OS patches/hotfixes as recommended by your storage vendor. This will ensure the best performance and support.

