Hugo Hacker News

Kubelet spends 20% of its time in GC in most environments where it runs

nonameiguess 2021-08-17 17:57:50 +0000 UTC [ - ]

This doesn't seem corroborated just from that chart. runtime.systemstack() is one of the calls that allows code to switch from the user stack to the system stack, so that non-preemptible (by the goroutine scheduler) code can run, but this doesn't only include garbage collection. It includes all system calls. The kubelet works by synchronizing node state with the definition retrieved from the apiserver across the network, so I would expect it spends a lot of time making system calls just to read from and write to whatever port and/or socket it uses to do that.
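
For reference, instead of inferring the GC share from systemstack frames in a profile, the runtime will report it directly. A minimal sketch (runtime.MemStats.GCCPUFraction covers only GC, not syscalls):

    // Print the runtime's own estimate of how much CPU the GC has used.
    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        // GCCPUFraction is the fraction of this program's available CPU time
        // consumed by the GC since the program started.
        fmt.Printf("GC CPU fraction: %.2f%%\n", ms.GCCPUFraction*100)
    }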

klysm 2021-08-17 17:39:00 +0000 UTC [ - ]

I’m still convinced GC is not a good idea as a language’s single memory model. It is possible to know memory behavior at compile time and avoid paying for it in runtime latency.

jayd16 2021-08-17 17:57:23 +0000 UTC [ - ]

You need to lean into a fully managed model if you want to support things like compaction, no? (Speaking generally. I don't think Go supports compaction.)

You can still squeeze out some tricks though. Span<T> in C# seems to be pretty successful.
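
Go has no Span<T>, but the rough equivalent trick in a managed Go program is reusing buffers instead of allocating per call. A minimal sketch using sync.Pool (not a direct analogue, just the same spirit of dodging the allocator):

    // Reuse scratch buffers across calls instead of allocating a fresh
    // buffer each time, which keeps garbage off the GC's plate.
    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    var bufPool = sync.Pool{
        New: func() interface{} { return new(bytes.Buffer) },
    }

    func render(name string) string {
        buf := bufPool.Get().(*bytes.Buffer)
        defer bufPool.Put(buf)
        buf.Reset()
        buf.WriteString("hello, ")
        buf.WriteString(name)
        return buf.String() // String() copies, so the buffer can be safely reused
    }

    func main() {
        fmt.Println(render("world"))
    }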

uluyol 2021-08-17 17:57:11 +0000 UTC [ - ]

Go will allocate some things on the stack by performing escape analysis. That can be improved, but it's incorrect to say that GC is the only way to manage memory in Go.
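
The compiler will show you those escape decisions if you ask. A tiny sketch (the function names are just for illustration):

    // escape.go -- build with: go build -gcflags=-m escape.go
    // The -m flag prints the compiler's escape-analysis decisions.
    package main

    type point struct{ x, y int }

    // Returned by value: the struct can live on the caller's stack.
    func byValue() point { return point{1, 2} }

    // Returning a pointer to a local forces it to the heap; -m reports
    // something like "&point{...} escapes to heap".
    //
    //go:noinline
    func byPointer() *point { return &point{1, 2} }

    func main() {
        a := byValue()
        p := byPointer()
        _, _ = a, p
    }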

felipellrocha 2021-08-17 23:03:01 +0000 UTC [ - ]

You're right, and I think Rust proved that. I do wonder if we could turn that on and off at compile time, though, to simplify most use cases until one needs a more manual approach (using the borrow checker or whatnot).

PedroBatista 2021-08-17 17:54:49 +0000 UTC [ - ]

I kinda see your point, but before we talk about the GC, can we address kubelet and the absolute mindless frenzy, the resume-driven technological orgy that Kubernetes and "DevOps" have turned into?

outworlder 2021-08-17 17:59:33 +0000 UTC [ - ]

No. Kubernetes actually solves problems. You'll know that's the case for you when people start preferring to deploy their stuff on K8s.

It's just that people have been using it where it doesn't belong. If you don't need it, you don't need it.

wcarss 2021-08-17 18:19:14 +0000 UTC [ - ]

To be fair, it also can get overwrought: it's far from impossible to end up using a programming language to emit a templating language that emits a configuration language that itself describes configuration files.

Sometimes, for simple things, those could have been written by hand at the bottom without the extra tools. But even when writing them by hand would have been very hard or not the right choice, the devops stack is a daunting thing to reckon with!

A developer coming from "jenkins invokes a script that runs npm install && npm start", or even the mildly more modern "jenkins invokes docker-compose on a file" has a lot of very abstract and difficult-to-play-around-with context to pick up.

It's hard to learn fully even if you're at a shop with an established approach where someone knows what they're doing, but imagine moving yourself from Rackspace or Heroku onto AWS or GCP, and having to figure out IAM permissions and private VPC networking ingress configurations while also deciding if you really need to use helm. Especially if you're doing this because the team "doesn't have time to spend on ops".

At the end of the day I of course agree that using k8s to manage moderately complex things can melt a lot of toil right off. In the right scenarios, more abstraction can help for more complex stuff too. But there's also a lot of marketing out there for tools-on-tools, and businesses that would love for folks to adopt solutions to problems which few people have, let alone really deeply understand.

outworlder 2021-08-17 19:12:28 +0000 UTC [ - ]

> A developer coming from "jenkins invokes a script that runs npm install && npm start", or even the mildly more modern "jenkins invokes docker-compose on a file" has a lot of very abstract and difficult-to-play-around-with context to pick up.

Let's see.

If you are deploying this on a physical machine: you have to connect to the machine somehow. You have to set up the correct permissions. You need auditing. You need to configure logging (and log rotation). You need to ensure your npm start script can be restarted if needed. You may need cronjobs for various housekeeping chores. And then suddenly you need to run more instances of your app. What to do? Are you going to set up a proxy now and a bunch of different ports? Is your script picking them up, or are they hard-coded? Are you instead doing network namespaces and iptables trickery? How do you upgrade your service without downtime? Are you going to write scripts to do that? What if your app needs storage?

And what if you now need more machines? How are you going to spread the workloads around?

And how do you automate the above? One team prefers Ansible, the other goes with Chef, a third one likes to embed shell scripts inside a Terraform provisioner.

These things are _hard_. There is a whole lot of context. It's just that we have gotten used to the way things have always been done. It does not help that K8s development moves very fast and that far too many people and companies are jumping on the bandwagon, each with their own unique spin. The ecosystem is growing to the point of being unmanageable.

Also, Kubernetes merges some of the work that was done by development or release teams with work that was traditionally done by operations teams, and makes it all visible. Suddenly you have to _know_ what the heck a liveness probe is. You could get by without knowing this in a standard deployment; your app would simply lack the check (and cause issues in production). You need to know exactly what your app needs to persist. Previously it would just dump data in whatever directory it had permissions to, and this became tribal knowledge.
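
For anyone who hasn't met one: on the app side a liveness probe is usually just an HTTP endpoint the kubelet hits on a schedule, with the actual probe configured in the pod spec. A minimal sketch (the /healthz path and port are arbitrary choices here):

    // App-side half of an HTTP liveness probe: answer 200 while healthy.
    // The kubelet treats any 2xx/3xx response as success and anything
    // else (or a timeout) as a failed probe.
    package main

    import "net/http"

    func main() {
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
        })
        http.ListenAndServe(":8080", nil)
    }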

But K8s by itself? My heuristic is: am I running containers? Do I need to spread said containers across machines? Just use K8s. Trying to replicate this with homegrown scripts will be very painful. It may not seem like it's painful now, but trust me, it will be.

Will K8s exist in 15 years? Hard to say. Probably, but for the reasons you describe, it's likely that there will at least be a high-level abstraction to hide decisions most people don't need to make.

wcarss 2021-08-17 21:40:57 +0000 UTC [ - ]

I'll clarify that I'm on board with what you're saying. Definitely if you need N machines to split the traffic load, or an application that will auto-restart under certain conditions, you ought to just use kubernetes. It's a lot easier than rolling it all yourself.

What I was trying to target above was more the tooling built atop the tooling -- complex instances of "the devops stack", rather than _just_ kubernetes. Take, for instance, Dhall[1] -- just look at the kind of places that goes[2]!

A person coming from a "run docker-compose" world maybe wasn't ever thinking about log rotation, let alone audits, and maybe they should have been. But after they read a marketing blog post somewhere that convinces them it is "modern best practice" to use something like Dhall -- well, as the saying goes, now they have two problems.

When the person up above mentioned the devops resume madness, that's what I was thinking of, and I felt it was worth noting that it does exist, while agreeing that vanilla k8s can be great.

1 - https://dhall-lang.org/

2 - (shudder) https://docs.dhall-lang.org/howtos/How-to-translate-recursiv...

blacktriangle 2021-08-17 18:23:36 +0000 UTC [ - ]

It's not that Kube doesn't solve problems.

It's that Kube solves problems 99% of projects using it do not and never will have.

ltbarcly3 2021-08-17 18:09:00 +0000 UTC [ - ]

Agreed. People are spending weeks of effort figuring out how to deploy their app, which has 3 webservers and one RDS database, via Kubernetes because 'that's the right way to do it'.

Kubernetes is a solution to a problem hardly anyone has, it is maddeningly complex, and large parts of it are poorly designed. If you have a large team deploying many services and can afford full-time devops staff, it is a good solution to many problems, but probably 90% of the people using it would find things simpler and more reliable without it.

scottlamb 2021-08-17 18:05:28 +0000 UTC [ - ]

This is amusing next to "Go code that uses 20% of CPU time in GC is usually bad Go code," but how much does it matter? Kubelet runs on every machine in the cluster, but I'd expect few of each machine's cycles to be spent on it. I wouldn't be surprised if relatively little attention has been paid to optimizing its GC and/or if someone has deliberately tuned it to reduce RAM at the expense of GC CPU.

Put another way: this percentage uses the wrong denominator for ranking optimization targets. Don't rank by percentage of the binary's cycles but instead by percentage of overall cluster cycles.

This is particularly true for Kubernetes itself. It affects the efficiency of the cluster as a whole via its bin-packing decisions. E.g., the whole cluster becomes more efficient when Kubernetes minimizes stranded capacity, and also when it isolates workloads that heavily thrash the CPU cache to separate NUMA nodes or machines. Thus if I were to dive into Kubernetes optimization, I'd focus on bin-packing much more than on its own GC cycles.
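
On the "tuned to reduce RAM at the expense of GC CPU" point: in Go that trade-off is usually made with GOGC (or its programmatic equivalent). A rough sketch:

    // GOGC is the standard RAM-vs-GC-CPU knob: lower values collect more
    // often (smaller heap, more GC CPU), higher values collect less often.
    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Equivalent to running with GOGC=50: trigger a collection when the
        // heap has grown roughly 50% past the last live set, trading extra
        // GC CPU for a smaller peak heap.
        old := debug.SetGCPercent(50)
        fmt.Println("previous GOGC:", old)
    }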

ltbarcly3 2021-08-17 18:11:17 +0000 UTC [ - ]

I don't think the point is that this is causing wasted CPU cycles; it's that if something shows symptoms of being written by incompetent programmers, they were likely incompetent in many other ways too.

scottlamb 2021-08-17 18:18:44 +0000 UTC [ - ]

That's what makes it amusing, but I don't necessarily agree with "Go code that uses 20% of CPU time in GC is usually bad Go code", and I certainly would disagree if you omitted the word "usually".

Quality code is optimized based on what really matters, and I'm skeptical GC cycles really matter here.

smarterclayton 2021-08-17 22:06:46 +0000 UTC [ - ]

Right now it’s mostly the CRI/kubelet interaction (which polls the container runtime for containers) and metrics scraping generating the garbage.

Having an incremental CRI watch (vs poll) has been on the backlog for years, but correctness and features have been more important for most people.
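
A simplified sketch of why that matters for garbage (the Runtime interface below is hypothetical, not the real CRI gRPC API): polling rebuilds a full container list every tick even when nothing changed, while a watch only produces work when state actually changes.

    package sketch

    import "time"

    type Container struct{ ID, State string }

    // Hypothetical stand-in for the container runtime, not the real CRI.
    type Runtime interface {
        ListContainers() []Container       // poll: full snapshot each call
        WatchContainers() <-chan Container // watch: incremental events
    }

    // pollLoop allocates a fresh snapshot every tick, changed or not.
    func pollLoop(rt Runtime, every time.Duration, sync func([]Container)) {
        for range time.Tick(every) {
            sync(rt.ListContainers())
        }
    }

    // watchLoop only does work (and creates garbage) on actual changes.
    func watchLoop(rt Runtime, onChange func(Container)) {
        for ev := range rt.WatchContainers() {
            onChange(ev)
        }
    }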

Several people were looking at this recently as we were squeezing management pods + system processes onto two dedicated cores for telco/edge real-time use cases (the other tens of cores being reserved for workloads). It's not that hard to fix; it's just never been the top priority.

Real software has curves.