On my Ubuntu Dell D620 laptop (which dual-boots into Windows XP occasionally), I run some pretty demanding software. Sometimes. As a systems architect, I spend a lot of time in standard “Information Worker” tools – email, office suite, web browser for white papers and product reviews. For me, OpenOffice.org, Evolution, and Firefox work great. I even like Evolution’s Exchange plugin better than Outlook – everything updates on my Windows Mobile phone just like Outlook, it’s faster than Outlook, and it threads my messages. I love message threading for all the reasons that it was invented.

However, because I mostly design Windows networks, I also run VMWare Workstation 6 with the following VMs: Windows XP Pro joined to primary domain (full workstation with Office 2003, Visio, Outlook (and EMC EmailExtender plugins), Windows Admin Pack, Resource Kit, SQL Enterprise Manager, and Exchange Admin Pack), Windows Vista Enterprise, Windows 2003 DC, Exchange, SQL server, Windows XP Pro joined to VM domain, RHEL 4, RHEL 5, and a Live-CD system that I often use to test bootable CDs. And a sysprep’d Windows install.

All that, plus Evolution caching my email and Firefox being disk-happy (did you ever “strace -p $(pgrep firefox) -e trace=open,close,read,write”? It’s busier than Evolution during an offline mail check!), means my 7200 RPM 160GB SATA disk gets hammered. My use of laptop-mode-tools, which changes my vm.dirty_writeback_* settings and my read/write caching, probably isn’t helping either.

Today and yesterday I ran into an issue where my disk would begin a sync that lasted 10-20 minutes, leaving me unable to work the whole time. Hunting down WHAT was causing this, however, was even more frustrating than the problem itself (if I shut down any of the 3 above-mentioned programs, the problem went away – it only happened with all 3 open). In Windows, you can open Task Manager, go to the “Processes” tab, click “View -> Select Columns”, add “I/O Write Bytes” and “I/O Read Bytes”, and watch the numbers count up. Or you can use Perfmon and look at IO reads/writes/bytes per second or in total, and know immediately what’s causing all your disk IO pain. I still don’t know how to do this in Linux.

First, any hunt for “disk utilization” and “Linux” on Google directs you to hundreds of sites, forums, and blogs evangelizing the wonders of “df” for disk utilization. Yes, it’s really nice to know how much free space I have on my hard drives; that’s why I have SuperKaramba to tell me. But when a problem hits and leaves me unable to work, knowing my free space is useless.

“iostat -k 1” is great – you’ll know immediately which disk is being used, and how hard. But on a laptop with a single disk, you already know.
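
To show where a number like iostat’s tps comes from, here is a minimal Python sketch that takes two snapshots of the text of /proc/diskstats and divides the change in completed reads plus writes by the interval. The function names (parse_diskstats, transfers_per_second) are made up for illustration, and the field positions assume the usual 2.6-era whole-disk layout of 14 or more columns; treat it as a rough approximation of what iostat does, not its actual implementation.

```python
def parse_diskstats(diskstats_text):
    """Map device name -> completed I/Os (reads + writes) from /proc/diskstats text."""
    totals = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip short lines (assumption: those are not whole-disk entries)
        name = parts[2]
        reads_completed = int(parts[3])   # 1st counter after the device name
        writes_completed = int(parts[7])  # 5th counter after the device name
        totals[name] = reads_completed + writes_completed
    return totals


def transfers_per_second(before, after, interval_seconds):
    """Per-device transfers per second between two parse_diskstats() results."""
    return {
        name: (after[name] - before[name]) / interval_seconds
        for name in after
        if name in before
    }
```

Reading /proc/diskstats twice, one second apart, and feeding both texts through these functions gives one figure per device. Like iostat, it says how busy the disk is, not which process is responsible.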

“top” sorted by process-state will show you what’s in “waiting on IO” state, but not what’s CAUSING the IO that’s causing everything else to wait.
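
What top reports here can also be read straight from /proc: the field right after the parenthesized command name in /proc/<pid>/stat is the process state, and “D” means uninterruptible sleep, which usually means waiting on IO. A minimal Python sketch, with function names (parse_stat, blocked_processes) that are my own and not part of any tool:

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    command = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, command, state


def blocked_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state 'D'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    blocked = []
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unparseable
        if state == "D":
            blocked.append((pid, command))
    return blocked
```

Calling blocked_processes() every second or so during a stall lists the victims. It has the same limitation as top: it shows who is waiting, not who issued the IO.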

“sar” seems to be the only tool that can provide per-process IO stats, but it has to be pre-set up to write to a log. And I can’t begin to guess how well that will work when my disk is at 100% utilization (peaked at 120tps today).

So if anyone knows of a way to see what’s causing disk IO in a “right now” fashion, please comment or email me. And if you’re curious to know more about my problem:

  1. Only happens when VMWare (with a guest), Firefox, and Evolution are all running.
  2. VMWare with multiple guests runs fine, and never has this issue.
  3. rauch@lt00-bofh:~$ free
                 total       used       free     shared    buffers     cached
    Mem:       3348960    1099216    2249744          0      68668     533124
    -/+ buffers/cache:     497424    2851536
    Swap:      6000268          0    6000268
  4. Happens with Laptop_mode disabled or enabled, on AC or on battery
  5. “sync” causes the exact same symptoms, leading me to believe that somehow I’m accumulating a LOT more dirty pages than my settings should allow.
  6. dirty_background_ratio
  7. For now, I just close Firefox when I have VMWare open, which means I spend a lot more time in IE than I want to.
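
To check the dirty-page theory from item 5, one option is to compare the Dirty and Writeback lines of /proc/meminfo against the vm.dirty_* settings under /proc/sys/vm while the stall is happening. Here is a minimal Python sketch; the function names and the choice of four knobs are mine, and a knob missing on a given kernel is simply reported as None.

```python
import os

# Knobs under /proc/sys/vm that govern when dirty pages get written back.
VM_KNOBS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def parse_meminfo(meminfo_text, fields=("Dirty", "Writeback")):
    """Return {field: kB} for the requested fields of /proc/meminfo text."""
    wanted = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields and rest.split():
            wanted[name] = int(rest.split()[0])  # value is reported in kB
    return wanted


def read_vm_settings(vm_dir="/proc/sys/vm"):
    """Return {knob: int or None} for each entry in VM_KNOBS."""
    settings = {}
    for knob in VM_KNOBS:
        try:
            with open(os.path.join(vm_dir, knob)) as handle:
                settings[knob] = int(handle.read())
        except (OSError, ValueError):
            settings[knob] = None  # knob not present or not readable
    return settings
```

Passing the contents of /proc/meminfo to parse_meminfo once a second during a stall would show whether Dirty really climbs far past what dirty_background_ratio suggests, or whether the pages are already queued and Writeback is the number that stays high.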

As a final note, I’ve updated my Linux EVDO post here with my new built-in card’s info.