We should get rid of average CPU utilization

www.theocharis.dev

32 points by JeremyTheo 1 day ago

arianvanp 1 day ago

A more general metric that is useful to watch for is pressure stall information for CPU, IO and Memory.

https://docs.kernel.org/accounting/psi.html

I made a Prometheus exporter for it:

https://github.com/arianvp/cgroup-exporter
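For anyone who wants to look at PSI without an exporter, here is a minimal sketch of parsing the `/proc/pressure/<resource>` files described in the kernel docs linked above. The function names and the sample values are made up for illustration, and the files may be absent if PSI is not enabled on the kernel.

```python
def parse_psi(text):
    """Parse the contents of a PSI file into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # avg10/avg60/avg300 are percentages; total is stall time in microseconds.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_psi(resource="cpu"):
    """Read live PSI data; resource is one of "cpu", "io" or "memory"."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())


# Invented sample in the documented format, so the sketch runs anywhere.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

if __name__ == "__main__":
    psi = parse_psi(SAMPLE)
    print("share of last 10s with some task stalled:", psi["some"]["avg10"], "%")
```

The "some" line reports the share of time at least one task was stalled on the resource, which is the number that a plain utilization percentage does not reveal.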

JeremyTheo 1 day ago

Yes!

JanMa 1 day ago

I've learned the hard way that CPU resource limits in K8S are a bad idea, as can be seen in this post. Just use CPU requests without limits, so the scheduler has an estimate of your application's CPU requirements but the application can still burst to use more CPU when it's available.

With memory, of course, you should set a limit, and from experience it should be the same as your memory request.
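To make the CPU side of this concrete, here is a rough back-of-the-envelope model of how a CPU limit stretches a single request. It is a simplified sketch, not how the kernel actually accounts for time: it assumes the default 100 ms CFS period, one single-threaded request that starts at the beginning of a period with a full quota, and nothing else in the container consuming that quota. The function name and the numbers are hypothetical.

```python
def wall_time_us(cpu_work_us, limit_millicores, period_us=100_000):
    """Wall-clock time for one thread to finish cpu_work_us of CPU work
    under a CPU limit, in a simplified quota-per-period model."""
    quota_us = limit_millicores * period_us // 1000
    if quota_us >= period_us:
        # A single thread cannot burn more than one period of CPU per period,
        # so it never exhausts a quota of one core or more.
        return cpu_work_us
    full_periods, remainder = divmod(cpu_work_us, quota_us)
    if remainder == 0:
        # The work finishes exactly when the last quota slice runs out.
        return (full_periods - 1) * period_us + quota_us
    # Each exhausted quota costs a whole period; the rest runs in the next one.
    return full_periods * period_us + remainder


if __name__ == "__main__":
    work = 200_000  # a request that needs 200 ms of CPU
    print("no effective limit:", wall_time_us(work, 1000) / 1000, "ms")
    print("limit of 400m:     ", wall_time_us(work, 400) / 1000, "ms")
```

Under these assumptions the 200 ms request takes 440 ms with a 400m limit, even though one such request per second would only use about 20% of a core on average.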

JeremyTheo 1 day ago

There is also the concern that a single pod shouldn’t be able to take down an entire node, so there need to be some safety limits. But then again, maybe not. I find this a really complex issue that is not widely known (only within the Kubernetes bubble).
- ralgozino 1 day ago
  
  You can reserve node resources for system processes with some kubelet parameters, so the pods don't kill the node: https://kubernetes.io/docs/tasks/administer-cluster/reserve-...
cassianoleal 1 day ago

This, very much. With memory, I have seen one or two use cases where it made sense to have bigger limits than requests but it's the exception rather than the norm.

nairboon 1 day ago

No, not at all. Why get rid of a low-level statistical measure? It's not even quite clear what the article argues against. htop doesn't even show you "average CPU utilization", it provides a sample of the current CPU utilization.

To me the problem appears to be that they try to do some hard realtime computing with strict time guarantees, but are so far up the stack (golang library, golang scheduler, docker, kubernetes, virtualization, etc.), that they don't realize that this stack can't guarantee you realtime computing. CPU utilization is a very low-level measure and, in this stack, is only indirectly related to the observed timeouts.

joshspankit 1 day ago

> It's not even quite clear what the article argues against.
I think it can be summed up as “average CPU utilization, which is the common and intuitive first check, doesn’t tell you the real story.”
I would also suggest that these are “outdated” measurements, as common CPU metrics were really designed for moderately multi-threaded, single-foreground-application workloads on bare metal.
To your point, someone who deeply understands the stack already knows these are not the metrics to look at, but this is clearly aimed at people who have not (yet) had to dive deep to figure out a scheduling issue.

CodesInChaos 1 day ago

It's well known that many throttling implementations are broken, usually by design. You shouldn't blame the CPU utilization metric for that footgun.

In a well-designed scheduler, a task that has been granted an allotment of at least n cores should never get throttled to less than n cores at any time. It can be limited to less than n cores if CPU utilization is at 100% and another task gets scheduled at the same time, since that's unavoidable when you oversubscribe the available resources.

zeafoamrun 1 day ago

Same thing when it comes to memory. The rabbit hole goes on forever, and metrics lie to you if you don't know how to interpret them properly.

ahartmetz 1 day ago

No, we shouldn't. We should measure latency if we care about latency.
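A small sketch of what measuring latency directly can look like: compute tail percentiles from per-request durations instead of relying on an average. The sample latencies are invented, and the nearest-rank percentile used here is just one common definition.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100: the smallest
    sample such that at least p% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical request durations in ms: most are fast, a few are badly delayed.
latencies_ms = [20] * 95 + [440] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

if __name__ == "__main__":
    print(f"mean={mean_ms} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

With this made-up data the mean and median look harmless while the 99th percentile exposes the delayed requests, which is the signal a timeout-sensitive service cares about.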

jiggawatts 1 day ago

I’ve come to realise that “wide logs” like OpenTelemetry traces are the only way to go, despite the expense of collecting and storing them with current technology.
As open source columnar databases improve, the cost will drop.

cyclonereef 1 day ago

I've worked with plenty of companies that provide some sort of hosting for enterprise customers, and the number of times I've seen even senior admins use only CPU Utilisation and Memory In-Use when investigating an issue is disheartening. And given that CPU Utilisation is an aggregate of all time != CPU idle, the same utilisation number can mean very different underlying system states.

There's something like a dozen different CPU metrics reported by the OS alone.
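As an illustration of that aggregate, here is a minimal sketch that splits the aggregate `cpu` line of Linux's `/proc/stat` into its individual states between two samples. The two sample lines are invented, the function names are hypothetical, and tools differ on whether iowait counts as idle, so treat the "utilisation" figure as one possible definition.

```python
# First eight counters of the aggregate "cpu" line in /proc/stat. The trailing
# guest and guest_nice counters are skipped because the kernel already
# includes them in user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu  ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * ticks / total for name, ticks in deltas.items()}


# Two invented samples; on a real system read the first line of /proc/stat twice.
BEFORE = parse_cpu_line("cpu  1000 0 500 8000 500 0 0 0 0 0")
AFTER = parse_cpu_line("cpu  1300 0 600 8400 500 0 0 200 0 0")

shares = breakdown(BEFORE, AFTER)
utilisation = 100.0 - shares["idle"]

if __name__ == "__main__":
    print(f"utilisation (everything except idle): {utilisation}%")
    for name in FIELDS:
        print(f"  {name:8s} {shares[name]:5.1f}%")
```

In this made-up sample a third of the 60% "utilisation" is steal time, so the same headline number could come from a very different mix of user, system and steal.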

VimEscapeArtist 1 day ago

Let’s measure temperature :)

techpression 1 day ago

Lovely read. If you’ve ever had even remotely similar issues (you think you’re looking in the right places but you’re not), it reads like a detective novel.

rimworld 1 day ago

great article thanks

ksk23 1 day ago

TLDR; if app slow, give more resources

luipugs 1 day ago

Or just don't put CPU limits: https://home.robusta.dev/blog/stop-using-cpu-limits
- JeremyTheo 1 day ago
  
  Yeah, that is mainly the point there. But it's difficult if internal company policies require limits (for security, etc.).
andrepd 1 day ago

Writing better code is of course out of the question.
- dgellow 1 day ago
  
  What do you mean, I always append “make it excellent” to all my prompts!
- inglor_cz 1 day ago
  
  Shockingly many developers have never profiled any code in their life.
  
  ahartmetz 1 day ago
  
  The nice thing about such code is that, when you come in to improve it, you can make huge improvements in no time. As a user, though...