perf(kube): shrink binary 15.5MB and drop the k8s.io/kubernetes dependency by dangrondahl · Pull Request #989 · kosli-dev/cli

dangrondahl · 2026-07-03T09:31:27Z

What

The k8s environment reporter only lists pods and namespaces, but the CLI was linking the entire Kubernetes API surface and, via tests, the whole Kubernetes monorepo. Two focused, behaviour-preserving changes:

Use the dynamic client instead of the aggregate typed clientset. kubernetes.NewForConfig transitively imports the global client-go scheme, which registers all 53 API groups and linked the entire k8s.io/api package (~15MB) into the binary. The dynamic client has zero dependency on k8s.io/api; unstructured objects are converted to typed corev1.Pod/corev1.Namespace via runtime.DefaultUnstructuredConverter, so all downstream logic and tests are unchanged. Only k8s.io/api/core/v1 remains linked.
Drop the test-only k8s.io/kubernetes monorepo dependency. The kube test used it solely for e2epod.WaitForPodNameRunningInNamespace. Importing that one helper dragged in the whole monorepo, which forced ~33 replace directives (its staging modules are pinned to v0.0.0 and must be replaced) and dozens of indirect deps. Replaced it with a small local waitForPodRunning poll loop using the typed clientset already created for test setup.

Impact

Binary: 83.2MB → 67.8MB (-15.5MB, ~19%). Lighter downloads, CI installs, and the k8s-reporter image.
Smaller supply-chain / CVE surface: we no longer ship ~50 unused Kubernetes API groups or the Kubernetes monorepo source tree.
go.mod hygiene: require entries 246 → 210, replace directives 35 → 2. The two kept are client-go v1.5.2 → v0.36.0 (MVS still needs it) and the pre-existing, unrelated docker → moby replace.
No behaviour change and no CI change. The kind library (hermetic cluster setup) is kept, so CI needs no kind-CLI install.

What was measured and deliberately NOT changed

The _ "k8s.io/client-go/plugin/pkg/client/auth" blank import was measured and left as-is: removing all three providers saved 0 bytes (their only notable dep, golang.org/x/oauth2, is already linked by the GCP/Azure integrations), and keeping it avoids breaking legacy auth-provider kubeconfigs.

Verification

go build ./..., go vet ./..., golangci-lint run ./internal/kube/ — all clean
go mod verify — all modules verified
Full KIND integration suite passed against a live cluster for both changes (list pods cluster-wide + per-namespace, namespace filtering, throttling, and the new wait helper)

Follow-ups (not in this PR)

Largest remaining binary contributors are non-k8s: aws-sdk-go-v2 (~11.5MB) and open-policy-agent/opa (~4MB).

The k8s environment reporter only lists pods and namespaces, but using the aggregate typed clientset (kubernetes.NewForConfig) transitively imported the global client-go scheme, which registers all 53 API groups and linked the entire k8s.io/api package (~15MB) into the binary. Switch production code to the dynamic client, which has no dependency on k8s.io/api, and convert unstructured objects to typed corev1.Pod/Namespace via runtime.DefaultUnstructuredConverter so downstream logic and tests are unchanged. Only k8s.io/api/core/v1 remains linked now. The test suite keeps a separate typed clientset for cluster setup/teardown; test-only imports do not affect the release binary. Binary size: 83.2MB -> 67.8MB (-15.5MB, ~18.6%).

The kube test only used k8s.io/kubernetes for e2epod.WaitForPodNameRunningInNamespace. Importing that package pulled in the entire Kubernetes monorepo, which in turn forced ~33 replace directives (its staging modules are pinned to v0.0.0 and must be replaced) and dozens of indirect dependencies. Replace the single helper with a small local waitForPodRunning poll loop using the typed clientset already created for test setup. This lets go mod tidy remove k8s.io/kubernetes and 36 require entries, and drops all staging-module replaces. The kind library (hermetic cluster setup) is kept, so CI needs no changes. go.mod: require entries 246 -> 210, replace directives 35 -> 2 (kept client-go v1.5.2->v0.36.0, which MVS still needs, and the unrelated docker->moby replace). Verified: go build ./..., go vet ./..., golangci-lint, and the full KIND integration suite all pass.

claude · 2026-07-03T09:31:52Z

Claude finished @dangrondahl's task in 4m 45s —— View job

Review: PR #989 — shrink k8s binary size

Gather context & read changed files
Static analysis (build, vet, mod tidy)
Code review — kube.go dynamic-client conversion
Code review — test waitForPodRunning helper
Post review

This is a clean, well-scoped, behaviour-preserving refactor with a genuinely impressive result (-15.5MB / -19% and 33 fewer replace directives). The PR description is excellent — measurements, what was deliberately not changed, and follow-ups. Nice work. 👏

Static analysis — all clean ✅

Check	Result
`go build ./...`	✅ passes
`go vet ./internal/kube/...`	✅ no findings (also type-checks the test file)
`go mod tidy` → `git diff go.mod go.sum`	✅ no changes — already tidy
Remaining `replace` directives	✅ just the two expected (`client-go → v0.36.0`, `docker → moby`)
External API impact	✅ only `NewK8sClientSet` / `GetPodsData` are used outside the package (`cmd/kosli/snapshotK8S.go`); the removed embedded `*Clientset` field had no external users

Findings (all non-blocking)

I left three inline comments:

kube.go:140 — performance tradeoff (worth confirming, not fixing). The dynamic client always uses JSON (not protobuf), and DefaultUnstructuredConverter.FromUnstructured goes via map[string]interface{} + reflection. On the large clusters this path is tuned for (the QPS/burst comments), that's more wire bytes and more CPU/allocs per pod than the old typed path. Acceptable given the size win — just worth spot-checking against your largest real cluster (KIND won't surface it) and maybe leaving a one-line note so nobody "optimizes" it back later.
kube.go:344 — symmetry nit. Pods get an unstructuredToPods helper; namespaces use an inline loop. A matching unstructuredToNamespaces (or a small generic converter) would read more consistently. Purely cosmetic.
kube_test.go:337 — honor ctx cancellation. The waitForPodRunning poll loop ignores the passed ctx and would spin until the full timeout if the context were cancelled. Harmless today (caller uses context.Background()), but a cheap case <-ctx.Done(): return ctx.Err() future-proofs it. Suggestion posted inline.

Things done well

Correctness of the conversion is integration-covered — getK8sClient now returns the dynamic-backed client, so the KIND suite exercises the real unstructured→typed path (cluster-wide list, per-namespace, filtering, throttling).
The waitForPodRunning replacement is a faithful, minimal stand-in: it checks before the first sleep, treats transient Get errors as "keep polling" (correct for the post-create NotFound window), and fails fast on terminal phases.
Good use of an unexported dynamicClient field + package-level GVRs with an explanatory comment on why the dynamic client was chosen.
Test/production separation is clear: the typed clientset is confined to setup/teardown with a comment saying so.

Verdict: LGTM. None of the above blocks merge; #1 is the only one I'd consciously decide on (confirm-on-large-cluster), the other two are polish.

Note: I couldn't run the KIND integration suite here (needs a live cluster + Docker), so I relied on your reported run for runtime verification.
· branch reduce-k8s-binary-size

claude · 2026-07-03T09:36:14Z

 	if len(filter.IncludeNames) == 0 && len(filter.IncludeNamesRegex) == 0 &&
 		len(filter.ExcludeNames) == 0 && len(filter.ExcludeNamesRegex) == 0 {
-		list, err := clientset.Clientset.CoreV1().Pods("").List(context.Background(), metav1.ListOptions{})
+		ul, err := clientset.dynamicClient.Resource(podsGVR).Namespace("").List(context.Background(), metav1.ListOptions{})


Performance tradeoff worth noting (non-blocking). The dynamic client always negotiates JSON, not protobuf, and runtime.DefaultUnstructuredConverter.FromUnstructured decodes each object into a map[string]interface{} and then reflection-converts it into a typed Pod. Versus the old typed clientset (protobuf + direct decode), this means larger wire payloads and noticeably more CPU/allocations per pod.

The QPS/burst tuning comments above (lines 103-106) say this path targets large clusters, so a cluster-wide list of tens of thousands of pods is exactly where the extra cost shows up. Given the goal here is binary size and the behavior is preserved, this is an acceptable trade — but it'd be good to confirm the memory/latency profile on your largest real cluster (the KIND suite won't surface it), and maybe leave a one-line comment acknowledging the JSON/reflection cost so a future reader doesn't "optimize" it back to the typed client unaware of the size implication.

claude · 2026-07-03T09:36:16Z

 	}

-	return namespaces.Items, nil
+	namespaces := make([]corev1.Namespace, 0, len(ul.Items))


Minor symmetry nit: pods get a reusable unstructuredToPods helper, but namespaces are converted with an inline loop here. Consider a small unstructuredToNamespaces helper (or a generic converter) so both resources read the same way and the conversion error wrapping stays consistent. Purely cosmetic — not blocking.

claude · 2026-07-03T09:36:21Z

+		select {
+		case <-deadline:
+			return fmt.Errorf("timed out after %s waiting for pod %s/%s to be running", timeout, namespace, podName)
+		case <-ticker.C:
+		}


Nice, correct helper — it checks the pod before the first sleep and treats transient Get errors (e.g. the brief NotFound right after create) as "keep polling", which is right.

One tiny robustness gap: the ctx passed in is never observed. If it's ever cancelled, the loop ignores the resulting Get error and keeps spinning every 2s until the full timeout elapses. Today the only caller uses context.Background(), so it can't happen — but a future caller with a real context would be surprised. Cheap to make it honor cancellation:

Suggested change

select {

case <-deadline:

return fmt.Errorf("timed out after %s waiting for pod %s/%s to be running", timeout, namespace, podName)

case <-ticker.C:

}

select {

case <-ctx.Done():

return ctx.Err()

case <-deadline:

return fmt.Errorf("timed out after %s waiting for pod %s/%s to be running", timeout, namespace, podName)

case <-ticker.C:

}

mbevc1 · 2026-07-03T09:38:13Z

Thanks @dangrondahl , but I wonder if this is worth doing. Especially with removing structured types feels like we're taking more on ourselves not using existing libraries, which might already solved this and also have good performance optimisations in place 🤔

dangrondahl · 2026-07-03T10:10:27Z

Thanks @dangrondahl , but I wonder if this is worth doing. Especially with removing structured types feels like we're taking more on ourselves not using existing libraries, which might already solved this and also have good performance optimisations in place 🤔

Thanks @mbevc1
I agree and actually also why I hesitated on opening a PR in the fist place. So this was just to show what the changes actually was.
It comes at a price and the full trade-off is not clear to me.

Happy to close it with a "won't fix" or "not now" decision. 😊

dangrondahl · 2026-07-03T10:31:47Z

Tradeoff exists! Chose to stay weith what we know for now.

dangrondahl added 2 commits July 3, 2026 08:43

claude Bot reviewed Jul 3, 2026

View reviewed changes

dangrondahl closed this Jul 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(kube): shrink binary 15.5MB and drop the k8s.io/kubernetes dependency#989

perf(kube): shrink binary 15.5MB and drop the k8s.io/kubernetes dependency#989
dangrondahl wants to merge 2 commits into
mainfrom
reduce-k8s-binary-size

dangrondahl commented Jul 3, 2026

Uh oh!

claude Bot commented Jul 3, 2026 •

edited

Loading

Uh oh!

claude Bot Jul 3, 2026

Uh oh!

claude Bot Jul 3, 2026

Uh oh!

claude Bot Jul 3, 2026

Uh oh!

mbevc1 commented Jul 3, 2026

Uh oh!

dangrondahl commented Jul 3, 2026

Uh oh!

dangrondahl commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dangrondahl commented Jul 3, 2026

What

Impact

What was measured and deliberately NOT changed

Verification

Follow-ups (not in this PR)

Uh oh!

claude Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: PR #989 — shrink k8s binary size

Static analysis — all clean ✅

Findings (all non-blocking)

Things done well

Uh oh!

claude Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

mbevc1 commented Jul 3, 2026

Uh oh!

dangrondahl commented Jul 3, 2026

Uh oh!

dangrondahl commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Jul 3, 2026 •

edited

Loading