Practical Professional Linux — Professional

Chapter 13 · Skill Level: Professional

Performance Tuning & Monitoring

Measure resource usage, locate bottlenecks, and tune systematically.

Performance work is measurement-driven. This chapter covers the key metrics (CPU, memory, disk I/O), the tools that expose them, and a disciplined method for locating the real bottleneck before tuning.

By the end of this chapter you will be able to

  • Interpret load average relative to core count.
  • Use top/htop, vmstat, iostat, and free to profile resources.
  • Distinguish CPU-, memory-, and I/O-bound situations.
  • Apply the USE method to localise bottlenecks.
  • Tune kernel parameters and limits responsibly.

13.1 Load Average

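Load average shows demand over the last 1, 5, and 15 minutes: the average number of tasks that are running, waiting for a CPU, or (on Linux) blocked in uninterruptible sleep, usually on disk I/O. Compare it to your CPU core count: a load roughly equal to the core count means the CPUs are fully busy; a load well above it means work is queuing. Because blocked tasks count too, a high load alone does not prove that the CPU is the bottleneck.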
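As a rough illustration of the comparison, the sketch below divides a load figure by a core count. The function name load_per_core and the 1.0 cut-off are assumptions made for this example; the cut-off simply encodes the rule of thumb above.

```shell
# Print load divided by core count, with a rule-of-thumb verdict.
# Usage: load_per_core LOAD CORES
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores <= 0) exit 1                      # refuse a nonsensical core count
        ratio = load / cores
        verdict = (ratio > 1.0) ? "more demand than cores" : "within core count"
        printf "%.2f %s\n", ratio, verdict
    }'
}

# On a live Linux system, feed it the 1-minute figure and the core count:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A ratio above 1.0 only says that tasks are queuing; it does not say whether they are queuing for CPU or for disk.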

13.2 Core Tools

Resource                         Tool
Overview / per-process           top or htop
CPU + memory + swap over time    vmstat 1 5
Memory (look at ‘available’)     free -h
Disk I/O per device              iostat -xz 1
Logs / errors                    journalctl -p err -b
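The ‘available’ figure that free reports comes from the MemAvailable line in /proc/meminfo. A minimal sketch of reading it directly is shown here; the function name mem_available_pct is hypothetical, and the optional file argument exists only so the function can be tried on saved output.

```shell
# Print MemAvailable as a percentage of MemTotal.
# Reads /proc/meminfo-style text from the file in $1 (default: /proc/meminfo).
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }        # values are in kB
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1                  # no MemTotal line found
            printf "%d%% available (%d of %d MiB)\n",
                   avail * 100 / total, avail / 1024, total / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```

Called with no argument on a Linux host, it reports the same quantity as the ‘available’ column of free, expressed as a share of total memory.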

13.3 Finding the Bottleneck

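The whole game is identifying *which* resource is the limit. The USE method gives the search a structure: for each resource, check Utilization (how busy it is), Saturation (how much work is queued for it), and Errors. In practice the result usually falls into one of three patterns: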

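  • CPU-bound: high user (us) and system (sy) time in top, load above core count, low iowait (wa).
  • Memory-bound: low ‘available’ in free and sustained swapping (non-zero si/so in vmstat); in the worst case the system thrashes.
  • I/O-bound: high iowait and a disk near 100% %util in iostat, so the CPU is waiting on storage. On SSDs, NVMe drives, and RAID arrays, which serve requests in parallel, also check queue size and await, because %util alone can overstate saturation.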
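The three patterns above can be sketched as a small filter over vmstat output. This is an illustration only: the function name classify_vmstat and the thresholds (any swap traffic, iowait of 20% or more, us+sy of 80% or more) are assumptions chosen for the example, not standard values. Columns are located by header name because their positions differ between vmstat versions.

```shell
# Summarise `vmstat` output read from stdin and name the likeliest bottleneck.
# Thresholds are illustrative assumptions, not standards.
classify_vmstat() {
    awk '
        $1 == "r" && $2 == "b" {                      # header row: map names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        !("us" in col) || $1 !~ /^[0-9]+$/ { next }   # skip banner and non-data lines
        ++seen == 1 { next }                          # first sample is the since-boot average
        {
            n++
            cpu  += $col["us"] + $col["sy"]
            wait += $col["wa"]
            swap += $col["si"] + $col["so"]
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            cpu /= n; wait /= n; swap /= n
            if      (swap > 0)   verdict = "memory-bound (swapping)"
            else if (wait >= 20) verdict = "I/O-bound (high iowait)"
            else if (cpu >= 80)  verdict = "CPU-bound (high us+sy)"
            else                 verdict = "no obvious bottleneck"
            printf "cpu=%d%% iowait=%d%% swap=%d/s -> %s\n", cpu, wait, swap, verdict
        }
    '
}

# Typical use on a live system:
#   vmstat 1 5 | classify_vmstat
```

The verdict is a starting point for investigation; confirm it with free and iostat before changing anything.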

13.4 Tuning

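Only tune what you’ve proven is the bottleneck. Kernel parameters are read and changed at runtime with sysctl and made persistent in /etc/sysctl.conf or a file under /etc/sysctl.d/. Per-service limits belong in systemd unit directives such as LimitNOFILE=; /etc/security/limits.conf applies to PAM login sessions, not to services started by systemd.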
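One cautious sequence for a kernel parameter is: record the current value, change it at runtime only, measure, and persist it once the benefit is proven. The sketch below prints those commands for review instead of running them. The function name sysctl_plan and the file name 99-local-tuning.conf are hypothetical, and vm.swappiness=10 in the usage line is an illustration, not a recommendation.

```shell
# Print (do not run) the commands for a measured, reversible sysctl change.
# Usage: sysctl_plan KEY VALUE [PERSIST_FILE]
sysctl_plan() {
    # 1. Record the current value so the change can be reverted.
    printf 'sysctl %s\n' "$1"
    # 2. Change the running kernel only; the setting is lost at reboot.
    printf 'sudo sysctl -w %s=%s\n' "$1" "$2"
    # 3. After measuring a benefit, persist it (file name is an assumption).
    printf "echo '%s = %s' | sudo tee -a %s\n" "$1" "$2" \
        "${3:-/etc/sysctl.d/99-local-tuning.conf}"
    # 4. Reload all sysctl configuration files.
    printf 'sudo sysctl --system\n'
}

# Example: sysctl_plan vm.swappiness 10
```

Printing the plan first keeps the lab habit of read-only investigation and leaves the decision to apply each step with the operator.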

13.5 Guided Lab: Performance Triage

Estimated time: 25 minutes. Profile a system across CPU, memory, swap, and disk I/O, then decide where a bottleneck would be. Every step is read-only and safe to run anywhere.

  • Read load vs cores: uptime then nproc. Is load near, below, or above core count?
  • Watch processes live: top — note the top CPU and memory consumers, then quit with q.
  • Sample CPU/memory/swap: vmstat 1 5. Any swap activity (si/so)?
  • Check memory honestly: free -h — compare ‘available’ to ‘total’.
  • Check disk I/O: iostat -xz 1 3 (install sysstat if needed). Any device near 100% util?
  • Scan for recent errors: journalctl -p err -b | tail.
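The lab steps above can be collected into one read-only report. The following is a minimal sketch: the function names step and triage are made up for this example, top is left out because it is interactive, and journalctl is limited with -n 10 in place of piping through tail. Tools that are not installed are skipped, not treated as errors.

```shell
# Run one titled step; skip it if the tool is missing, never abort the report.
step() {
    title=$1; shift
    printf '== %s ==\n' "$title"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(command failed: %s)\n' "$*"
    else
        printf '(skipped: %s not installed)\n' "$1"
    fi
}

# Read-only triage in the same order as the lab.
triage() {
    step 'load average'    uptime
    step 'core count'      nproc
    step 'cpu/memory/swap' vmstat 1 5
    step 'memory'          free -h
    step 'disk I/O'        iostat -xz 1 3
    step 'recent errors'   journalctl -p err -b --no-pager -n 10
}

# Run it with:  triage
# Or keep a baseline for later comparison:  triage > triage-baseline.txt
```

Saving a baseline on a healthy system makes later comparisons far easier than reading the numbers cold during an incident.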

Troubleshooting

Symptom                                   Likely cause and fix
High load but low CPU usage               Likely I/O wait. Check iostat for a saturated disk and high await; the CPU is waiting on storage, not computing.
Memory looks ‘full’ but system is fine    That’s file cache. Look at ‘available’ in free -h; cache is reclaimable. Real pressure shows as swapping.
Sudden slowdowns with heavy swapping      Insufficient RAM for the workload. Reduce memory use, add RAM, or move load; tune swappiness only after measuring.
One process pegs a core                   Profile that PID; it may be a bug, a tight loop, or legitimate work. Consider nice/renice or fixing the app.
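For the high-load, low-CPU case, a quick way to spot a saturated disk is to filter iostat output by its %util column. This is a sketch under assumptions: the function name busy_disks is hypothetical and the default threshold of 90 is an arbitrary cut-off. The column is found by header name because its position varies between sysstat versions.

```shell
# Read `iostat -x` output on stdin; print devices whose %util is at or
# above a threshold (default 90, an arbitrary illustrative cut-off).
busy_disks() {
    awk -v limit="${1:-90}" '
        $1 ~ /^Device:?$/ {                   # header row: find the %util column
            util = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") util = i
            next
        }
        util && NF >= util && $util + 0 >= limit {
            printf "%s %.1f\n", $1, $util     # device name and its utilisation
        }
    '
}

# Typical use; note that the first iostat report covers the time since boot:
#   iostat -xz 1 3 | busy_disks
```

A device that appears in every interval report is a strong candidate; one that appears only in the since-boot report is not.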

Practice & Prove It

Write-the-command drills

  • Show the current load average and uptime.
  • Show memory usage including the available figure, human-readable.
  • Sample CPU, memory, and swap every second, five times.
  • Show extended per-device disk I/O three times at one-second intervals.
  • Show all error-priority journal messages from the current boot.

Quick quiz

  • What do the three load-average numbers represent?
  • Which memory column should you actually watch?
  • What does sustained high iowait indicate?
  • What does the USE method stand for?
  • What’s the golden rule before tuning?

Key Takeaways

  • Compare load average to core count to judge overload.
  • Use the right tool per resource: top, vmstat, free, iostat, journalctl.
  • Identify whether the system is CPU-, memory-, or I/O-bound before acting.
  • Watch ‘available’ memory; cache is reclaimable, swapping signals real pressure.
  • Tune one proven bottleneck at a time and measure the effect.

Next — Chapter 14: virtualisation and containers.