perf(kube): shrink binary 15.5MB and drop the k8s.io/kubernetes dependency#989
perf(kube): shrink binary 15.5MB and drop the k8s.io/kubernetes dependency#989dangrondahl wants to merge 2 commits into
Conversation
The k8s environment reporter only lists pods and namespaces, but using the aggregate typed clientset (kubernetes.NewForConfig) transitively imported the global client-go scheme, which registers all 53 API groups and linked the entire k8s.io/api package (~15MB) into the binary. Switch production code to the dynamic client, which has no dependency on k8s.io/api, and convert unstructured objects to typed corev1.Pod/Namespace via runtime.DefaultUnstructuredConverter so downstream logic and tests are unchanged. Only k8s.io/api/core/v1 remains linked now. The test suite keeps a separate typed clientset for cluster setup/teardown; test-only imports do not affect the release binary. Binary size: 83.2MB -> 67.8MB (-15.5MB, ~18.6%).
The kube test only used k8s.io/kubernetes for e2epod.WaitForPodNameRunningInNamespace. Importing that package pulled in the entire Kubernetes monorepo, which in turn forced ~33 replace directives (its staging modules are pinned to v0.0.0 and must be replaced) and dozens of indirect dependencies. Replace the single helper with a small local waitForPodRunning poll loop using the typed clientset already created for test setup. This lets go mod tidy remove k8s.io/kubernetes and 36 require entries, and drops all staging-module replaces. The kind library (hermetic cluster setup) is kept, so CI needs no changes. go.mod: require entries 246 -> 210, replace directives 35 -> 2 (kept client-go v1.5.2->v0.36.0, which MVS still needs, and the unrelated docker->moby replace). Verified: go build ./..., go vet ./..., golangci-lint, and the full KIND integration suite all pass.
|
Claude finished @dangrondahl's task in 4m 45s —— View job Review: PR #989 — shrink k8s binary size
This is a clean, well-scoped, behaviour-preserving refactor with a genuinely impressive result (-15.5MB / -19% and 33 fewer Static analysis — all clean ✅
Findings (all non-blocking)I left three inline comments:
Things done well
Verdict: LGTM. None of the above blocks merge; #1 is the only one I'd consciously decide on (confirm-on-large-cluster), the other two are polish. Note: I couldn't run the KIND integration suite here (needs a live cluster + Docker), so I relied on your reported run for runtime verification. |
| if len(filter.IncludeNames) == 0 && len(filter.IncludeNamesRegex) == 0 && | ||
| len(filter.ExcludeNames) == 0 && len(filter.ExcludeNamesRegex) == 0 { | ||
| list, err := clientset.Clientset.CoreV1().Pods("").List(context.Background(), metav1.ListOptions{}) | ||
| ul, err := clientset.dynamicClient.Resource(podsGVR).Namespace("").List(context.Background(), metav1.ListOptions{}) |
There was a problem hiding this comment.
Performance tradeoff worth noting (non-blocking). The dynamic client always negotiates JSON, not protobuf, and runtime.DefaultUnstructuredConverter.FromUnstructured decodes each object into a map[string]interface{} and then reflection-converts it into a typed Pod. Versus the old typed clientset (protobuf + direct decode), this means larger wire payloads and noticeably more CPU/allocations per pod.
The QPS/burst tuning comments above (lines 103-106) say this path targets large clusters, so a cluster-wide list of tens of thousands of pods is exactly where the extra cost shows up. Given the goal here is binary size and the behavior is preserved, this is an acceptable trade — but it'd be good to confirm the memory/latency profile on your largest real cluster (the KIND suite won't surface it), and maybe leave a one-line comment acknowledging the JSON/reflection cost so a future reader doesn't "optimize" it back to the typed client unaware of the size implication.
| } | ||
|
|
||
| return namespaces.Items, nil | ||
| namespaces := make([]corev1.Namespace, 0, len(ul.Items)) |
There was a problem hiding this comment.
Minor symmetry nit: pods get a reusable unstructuredToPods helper, but namespaces are converted with an inline loop here. Consider a small unstructuredToNamespaces helper (or a generic converter) so both resources read the same way and the conversion error wrapping stays consistent. Purely cosmetic — not blocking.
| select { | ||
| case <-deadline: | ||
| return fmt.Errorf("timed out after %s waiting for pod %s/%s to be running", timeout, namespace, podName) | ||
| case <-ticker.C: | ||
| } |
There was a problem hiding this comment.
Nice, correct helper — it checks the pod before the first sleep and treats transient Get errors (e.g. the brief NotFound right after create) as "keep polling", which is right.
One tiny robustness gap: the ctx passed in is never observed. If it's ever cancelled, the loop ignores the resulting Get error and keeps spinning every 2s until the full timeout elapses. Today the only caller uses context.Background(), so it can't happen — but a future caller with a real context would be surprised. Cheap to make it honor cancellation:
| select { | |
| case <-deadline: | |
| return fmt.Errorf("timed out after %s waiting for pod %s/%s to be running", timeout, namespace, podName) | |
| case <-ticker.C: | |
| } | |
| select { | |
| case <-ctx.Done(): | |
| return ctx.Err() | |
| case <-deadline: | |
| return fmt.Errorf("timed out after %s waiting for pod %s/%s to be running", timeout, namespace, podName) | |
| case <-ticker.C: | |
| } |
|
Thanks @dangrondahl , but I wonder if this is worth doing. Especially with removing structured types feels like we're taking more on ourselves not using existing libraries, which might already solved this and also have good performance optimisations in place 🤔 |
Thanks @mbevc1 Happy to close it with a "won't fix" or "not now" decision. 😊 |
|
Tradeoff exists! Chose to stay weith what we know for now. |
What
The k8s environment reporter only lists pods and namespaces, but the CLI was linking the entire Kubernetes API surface and, via tests, the whole Kubernetes monorepo. Two focused, behaviour-preserving changes:
Use the dynamic client instead of the aggregate typed clientset.
kubernetes.NewForConfigtransitively imports the global client-go scheme, which registers all 53 API groups and linked the entirek8s.io/apipackage (~15MB) into the binary. The dynamic client has zero dependency onk8s.io/api; unstructured objects are converted to typedcorev1.Pod/corev1.Namespaceviaruntime.DefaultUnstructuredConverter, so all downstream logic and tests are unchanged. Onlyk8s.io/api/core/v1remains linked.Drop the test-only
k8s.io/kubernetesmonorepo dependency. The kube test used it solely fore2epod.WaitForPodNameRunningInNamespace. Importing that one helper dragged in the whole monorepo, which forced ~33replacedirectives (its staging modules are pinned tov0.0.0and must be replaced) and dozens of indirect deps. Replaced it with a small localwaitForPodRunningpoll loop using the typed clientset already created for test setup.Impact
go.modhygiene: require entries 246 → 210,replacedirectives 35 → 2. The two kept areclient-go v1.5.2 → v0.36.0(MVS still needs it) and the pre-existing, unrelateddocker → mobyreplace.kindlibrary (hermetic cluster setup) is kept, so CI needs no kind-CLI install.What was measured and deliberately NOT changed
The
_ "k8s.io/client-go/plugin/pkg/client/auth"blank import was measured and left as-is: removing all three providers saved 0 bytes (their only notable dep,golang.org/x/oauth2, is already linked by the GCP/Azure integrations), and keeping it avoids breaking legacyauth-providerkubeconfigs.Verification
go build ./...,go vet ./...,golangci-lint run ./internal/kube/— all cleango mod verify— all modules verifiedFollow-ups (not in this PR)
Largest remaining binary contributors are non-k8s:
aws-sdk-go-v2(~11.5MB) andopen-policy-agent/opa(~4MB).