Performance Tuning & Monitoring
Measure resource usage, locate bottlenecks, and tune systematically.
Performance work is measurement-driven. This chapter covers the key metrics (CPU, memory, disk I/O), the tools that expose them, and a disciplined method for locating the real bottleneck before tuning.
By the end of this chapter you will be able to:
- Interpret load average relative to core count.
- Use top/htop, vmstat, iostat, and free to profile resources.
- Distinguish CPU-, memory-, and I/O-bound situations.
- Apply the USE method to localise bottlenecks.
- Tune kernel parameters and limits responsibly.
13.1 Load Average
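Load average is the average number of tasks demanding the system over the last 1, 5, and 15 minutes. On Linux it counts tasks that are running or waiting for a CPU plus tasks in uninterruptible sleep (usually blocked on disk I/O), so it measures demand rather than CPU utilisation alone. Compare it to your CPU core count: load roughly equal to cores means the system is fully busy; load far above cores means it is overloaded.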
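As a minimal sketch of that comparison, the Python snippet below divides a load figure by the core count. The function names and the 1.0 and 2.0 cut-offs are illustrative assumptions, not fixed rules; pick thresholds that match your own workload.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Return a load-average figure normalised by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def describe_load(load: float, cores: int) -> str:
    """Label a load figure. The 1.0 and 2.0 cut-offs are assumptions for illustration."""
    ratio = load_per_core(load, cores)
    if ratio < 1.0:
        return "headroom"
    if ratio < 2.0:
        return "busy"
    return "overloaded"


if __name__ == "__main__":
    # os.getloadavg() returns the same 1-, 5-, and 15-minute figures that uptime prints.
    one_min, five_min, fifteen_min = os.getloadavg()
    # os.cpu_count() reports logical CPUs; nproc may report fewer if affinity is restricted.
    cores = os.cpu_count() or 1
    print(f"1-min load {one_min:.2f} on {cores} cores: {describe_load(one_min, cores)}")
```

A load of 4.0 on a 4-core machine lands in the "busy" band here, while the same 4.0 on a single core would be labelled "overloaded".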
13.2 Core Tools
| Resource | Tool |
|---|---|
| Overview / per-process | top or htop |
| CPU + memory + swap over time | vmstat 1 5 |
| Memory (look at ‘available’) | free -h |
| Disk I/O per device | iostat -xz 1 |
| Logs / errors | journalctl -p err -b |
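To illustrate why the table points at 'available' rather than 'free', here is a small Python sketch that parses text in the format of /proc/meminfo, the file that free reads its figures from. The sample values are made up for illustration, and the helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value} (most fields are in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo: dict[str, int]) -> float:
    """Fraction of total RAM the kernel estimates can be used without swapping."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


# Made-up sample: very little memory is 'free', but most of it is reclaimable cache.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          400000 kB
MemAvailable:    6000000 kB
Cached:          5200000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    free_fraction = info["MemFree"] / info["MemTotal"]
    print(f"free: {free_fraction:.0%}, available: {available_fraction(info):.0%}")
```

On a real Linux system the same functions could be fed the contents of /proc/meminfo instead of the sample string.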
13.3 Finding the Bottleneck
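The whole game is identifying *which* resource is the limit. The USE method makes the search systematic: for each resource, check Utilisation, Saturation, and Errors. The usual patterns are: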
- CPU-bound: high %us in top, load above core count, low iowait.
- Memory-bound: low ‘available’, heavy swapping (si/so in vmstat) — the system thrashes.
- I/O-bound: high iowait, near-100% %util on a disk in iostat — the CPU is waiting on storage.
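The three patterns above can be sketched as a simple decision function. Everything here is an assumption for illustration: the `Sample` fields, the function name, and especially the numeric thresholds, which should be replaced with values that suit your system. Memory pressure is checked first because heavy swapping also drives up iowait and disk utilisation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One set of readings, collected by hand from the tools in this chapter."""
    cpu_busy_pct: float    # us + sy from top or vmstat
    iowait_pct: float      # wa from top or vmstat
    available_frac: float  # available / total from free
    swap_kb_per_s: float   # si + so from vmstat
    disk_util_pct: float   # highest %util from iostat -x


def classify(s: Sample) -> str:
    """Return a rough label. All thresholds are illustrative assumptions."""
    if s.available_frac < 0.10 and s.swap_kb_per_s > 0:
        return "memory-bound"
    if s.iowait_pct > 20 or s.disk_util_pct > 90:
        return "io-bound"
    if s.cpu_busy_pct > 80:
        return "cpu-bound"
    return "no clear bottleneck"


if __name__ == "__main__":
    reading = Sample(cpu_busy_pct=12, iowait_pct=35, available_frac=0.5,
                     swap_kb_per_s=0, disk_util_pct=97)
    print(classify(reading))
```

A label from a function like this is a starting hypothesis to confirm with further measurement, not a diagnosis.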
13.4 Tuning
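Only tune what you’ve proven is the bottleneck. Kernel parameters are read and set with sysctl, and made persistent in /etc/sysctl.conf or /etc/sysctl.d/. Per-service limits belong in systemd unit directives such as LimitNOFILE=; limits for login sessions belong in /etc/security/limits.conf.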
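Before changing any kernel parameter it helps to record its current value. The sketch below reads a value without modifying it, relying on the fact that a dotted sysctl name such as vm.swappiness maps to a file under /proc/sys. The helper names are hypothetical, and the simple dot-to-slash mapping is an assumption that does not hold for keys whose components contain dots themselves (for example some network interface names).

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name such as vm.swappiness to its file under /proc/sys."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as text, or None if the key cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Read-only: record the baseline before any tuning experiment.
    for key in ("vm.swappiness", "fs.file-max"):
        print(f"{key} = {read_sysctl(key)}")
```

Keeping the recorded baseline next to your measurements makes it straightforward to revert a change that did not help.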
13.5 Guided Lab: Performance Triage
Estimated time: 25 minutes. Profile a system across CPU, memory, swap, and disk I/O, then decide where a bottleneck would be. Every step is read-only and safe to run anywhere.
- Read load vs cores: `uptime`, then `nproc`. Is load near, below, or above core count?
- Watch processes live: `top`. Note the top CPU and memory consumers, then quit with q.
- Sample CPU/memory/swap: `vmstat 1 5`. Any swap activity (si/so)?
- Check memory honestly: `free -h`. Compare ‘available’ to ‘total’.
- Check disk I/O: `iostat -xz 1 3` (install sysstat if needed). Any device near 100% util?
- Scan for recent errors: `journalctl -p err -b | tail`.
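If you want to check the vmstat step programmatically, the following Python sketch parses captured `vmstat 1 5` output and reports whether any sample shows swap activity. The sample text is made-up data, the function names are hypothetical, and the parser assumes the usual layout in which a header row beginning with r and b names the columns.

```python
def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn vmstat output into one dict per sample, keyed by the header's column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = next((ln.split() for ln in lines if ln.split()[:2] == ["r", "b"]), None)
    if header is None:
        raise ValueError("no vmstat column header found")
    rows = []
    for ln in lines:
        fields = ln.split()
        # Keep only numeric data rows; this also skips repeated headers.
        if len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def swap_active(rows: list[dict[str, int]], skip_first: bool = True) -> bool:
    """True if any sample swapped in or out. The first vmstat row is an average since boot."""
    samples = rows[1:] if skip_first and len(rows) > 1 else rows
    return any(r["si"] > 0 or r["so"] > 0 for r in samples)


# Made-up capture for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0   2048 812340  10240 934512    3    5    12    30  120  250  3  1 95  1  0
 0  0   2048 811900  10240 934520    0    0     0     8  110  230  1  0 99  0  0
 2  1   2048 640000  10240 934600    0  512     0  4096  900 1500 10  5 60 25  0
"""

if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    print("swap activity:", swap_active(rows))
    print("peak iowait:", max(r["wa"] for r in rows[1:]))
```

Skipping the first row matters because it reflects the whole uptime, so a system that swapped hours ago can show non-zero si/so there while being healthy now.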
Troubleshooting
| Symptom | Likely cause and fix |
|---|---|
| High load but low CPU usage | Likely I/O wait. Check iostat for a saturated disk and high await; the CPU is waiting on storage, not computing. |
| Memory looks ‘full’ but system is fine | That’s file cache. Look at ‘available’ in free -h; cache is reclaimable. Real pressure shows as swapping. |
| Sudden slowdowns with heavy swapping | Insufficient RAM for the workload. Reduce memory use, add RAM, or move load; tune swappiness only after measuring. |
| One process pegs a core | Profile that PID; it may be a bug, a tight loop, or legitimate work. Consider nice/renice or fixing the app. |
Practice & Prove It
Write-the-command drills
- Show the current load average and uptime.
- Show memory usage including the available figure, human-readable.
- Sample CPU, memory, and swap every second, five times.
- Show extended per-device disk I/O three times at one-second intervals.
- Show all error-priority journal messages from the current boot.
Quick quiz
- What do the three load-average numbers represent?
- Which memory column should you actually watch?
- What does sustained high iowait indicate?
- What does the USE method stand for?
- What’s the golden rule before tuning?
Key Takeaways
- Compare load average to core count to judge overload.
- Use the right tool per resource: top, vmstat, free, iostat, journalctl.
- Identify whether the system is CPU-, memory-, or I/O-bound before acting.
- Watch ‘available’ memory; cache is reclaimable, swapping signals real pressure.
- Tune one proven bottleneck at a time and measure the effect.
Next — Chapter 14: virtualisation and containers.