Performance

jhelm renders Helm charts to Kubernetes manifests in pure Java. Its performance work is profile-driven: rather than synthetic micro-benchmarks, jhelm is measured against the real workload it exists for — rendering a large corpus of real-world Helm charts — and tuned where a profiler points, with the byte-for-byte parity suite as the correctness gate for every change.

Numbers below are indicative. They come from sampled JFR profiles of a chart-render batch on an otherwise-idle developer workstation (JDK 21). Sampling profilers report shares, and shares move as the total moves: a site whose absolute cost is unchanged can show a rising share once a bigger neighbour shrinks (see Reading a profile diff). When comparing two profiles, always anchor on the absolute total (bytes, GC ms), and re-measure on your own box; what carries over from the figures here is the direction and magnitude of each change, not the exact values.

1. The layers

jhelm’s render cost splits across two codebases, and it helps to keep them separate:

| Layer | What it owns |
|---|---|
| jhelm (jhelm-core, jhelm-gotemplate-helm) | Chart loading, value merging, subchart recursion, named-template collection, manifest assembly, and the Helm function library (toYaml, include, tpl, lookup, …). |
| gotmpl4j | The underlying Go text/template + Sprig engine: the lexer, parser, and AST executor that jhelm drives once per template. |

The engine layer has its own benchmark suite and optimization history — see the gotmpl4j performance page. This page is about the jhelm layer: the cost of turning a chart into manifests.

2. The workload

jhelm’s render benchmark is the same corpus that guards its correctness: the chart-parity suite (KpsComparisonTest), which renders 540+ real-world Helm charts — bitnami, grafana, prometheus, gitlab, cilium, harbor, datadog, and many more — and compares each byte-for-byte against helm template. For profiling, those charts are pre-fetched and driven through jhelm’s render path alone (load → install --dry-run → manifest), with no helm subprocess and no diffing, so the profile reflects jhelm’s own work on a representative spread of chart shapes.

3. Methodology — profiling with jvmlens

jhelm is profiled with jvmlens, which turns a JFR recording into a compact, ranked hot-path / allocation / GC summary (and an A/B diff between two recordings). The loop is: capture a render batch, read the ranked summary, fix the top lever, re-capture, diff to prove the delta, and re-run the full parity suite to prove correctness. The flags and diff semantics used below are on the jvmlens usage page; for the same loop worked end-to-end on the engine layer see the jvmlens case study. Several of jvmlens’s safeguards were filed from this workload — the live-attach-over-dumponexit tip, the child-process-pipe I/O hint, and the flat-total profile-diff hedge described next.

A few hard-won practicalities:

  • Capture by live attach, not a JVM flag. A surefire-forked test JVM is killed, not exited cleanly, so -XX:StartFlightRecording=…​,dumponexit=true silently produces no file. Attaching with jvmlens profile <pid> to the running fork captures reliably.

  • Warm the chart cache first. A first pass populates the local chart cache so that network fetch (which is jhelm code, in RepoManager) doesn’t dominate the profiled pass.

  • Scope to org.alexmond.jhelm so the engine (org.alexmond.gotmpl4j) and crypto frames fall out of the application roll-up and jhelm’s own frames are legible.

JVMLENS=path/to/jvmlens.jar

# Attach to a running render-batch JVM, capture 40s, keep the recording, scope to jhelm
java -jar "$JVMLENS" profile <pid> -d 40 -w 6 -a org.alexmond.jhelm -k before.jfr

# After a change, capture again the same way (keeping the recording as after.jfr), then diff to see exactly what moved
java -jar "$JVMLENS" analyze after.jfr -b before.jfr -a org.alexmond.jhelm

4. Where render time goes

Profiling a render-only batch (no helm, no diff) shows that jhelm’s own glue is not the bottleneck. Render cost concentrates in three places:

| Bucket | ~Share | Notes |
|---|---|---|
| Template parse / lex | ~30–40% | The gotmpl4j lexer/parser, driven once per template. The dominant cost; an engine-layer lever (tracked in gotmpl4j). |
| Chart-invoked crypto | ~30% | genCA/genSignedCert/bcrypt/htpasswd called by the charts (RSA key-gen, bcrypt rounds). Inherent: Helm pays the same; not a jhelm cost to optimise. |
| jhelm orchestration | remainder | Value merge, named-template collection, manifest assembly, the Helm function library. Small individually; the place jhelm can actually move the needle. |

The practical implication: jhelm-side wins come from not doing redundant work around the parse, not from the parse itself.

5. Reading a profile diff

Optimising shrinks the total, which inflates the share of everything that stayed the same. A neighbour can read as "▲ slower" in a diff while its absolute work is flat — and a faster render completes more batch rounds in a fixed capture window, so the unchanged crypto frames collect more samples. Always cross-check a share move against the absolute total (bytes / GC ms) before trusting it. This share-inversion is a known profiling artifact, not a regression.

jvmlens now flags it for you: a hot-path row under a ~flat exec-sample total is annotated (possible sampling redistribution …), and the diff adds a one-line caution that fixed-duration exec-sample deltas conflate per-op cost with throughput — pointing at a fixed-iteration bench A/B for a clean per-op comparison (see the jvmlens usage page). Both safeguards were filed from this jhelm workload.
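
The share inversion described in this section is plain arithmetic, and a worked example may make it easier to spot. The sketch below uses made-up sample counts (they are not measured jhelm figures), and the class and method names are hypothetical. It halves one site and leaves the others untouched, then recomputes the share of an unchanged site.

```java
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a share can rise while the absolute cost stays flat. */
final class ShareInversion {

    /** Share of one site as a percentage of the total across all sites. */
    static double sharePercent(Map<String, Long> samples, String site) {
        long total = samples.values().stream().mapToLong(Long::longValue).sum();
        return 100.0 * samples.get(site) / total;
    }

    public static void main(String[] args) {
        // Hypothetical sample counts, chosen for round numbers.
        Map<String, Long> before =
                Map.of("parse", 400L, "crypto", 300L, "orchestration", 300L);

        // After optimising parse: parse halves, every other site is unchanged.
        Map<String, Long> after =
                Map.of("parse", 200L, "crypto", 300L, "orchestration", 300L);

        // crypto did the same absolute work in both runs (300 samples),
        // yet its share grows because the total shrank from 1000 to 800.
        System.out.println(String.format(Locale.ROOT,
                "crypto before: %.1f%%", sharePercent(before, "crypto"))); // 30.0%
        System.out.println(String.format(Locale.ROOT,
                "crypto after:  %.1f%%", sharePercent(after, "crypto")));  // 37.5%
    }
}
```

Read on shares alone, crypto looks 7.5 points "slower"; read on absolute samples, it is flat. That is why the absolute total is the anchor.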

6. Optimization history

Each change below was found with jvmlens, verified with a before/after JFR diff, and gated on the full byte-for-byte parity suite (no manifest may change).

| Change | Measured (jvmlens diff) | Correctness gate |
|---|---|---|
| Skip the redundant render-pass re-parse of define-free templates (#573) | parseWithCache CPU 42% → 31% (−28% samples); parse allocation 2.3 GB → 1.3 GB (−45%); Engine.* allocation 3.6 GB → 3.0 GB (−18%); GC pause −20% | 540/540 parity charts byte-identical |
| Quote-normalize regex fast-path in toYaml/toJson (#509) | removeUnnecessaryQuotes GONE from hot paths (was 6% CPU) and allocation (was 1.2 GB); total allocation 23.4 GB → 22.2 GB (−5%) over a 120× gitlab render | jhelm-gotemplate-helm tests + chart parity |

6.1. Notes on the wins

  • Double-parse elimination (#573). jhelm parses each template twice per render: a collect pass to gather define blocks, then a render pass before executing. The two passes key the parse differently, so a parse cache can’t dedupe them within one render. A per-render memo now skips the render-pass re-parse when the identical text is already parsed and declares no define (templates with define are still re-parsed, preserving define precedence). This roughly halves parse allocation (2.3 GB → 1.3 GB in the table above), and parsing is the single hottest render path.

  • Quote-normalize fast-path (#509). toYaml/toJson ran a quote-stripping regex on every output line, but most rendered-manifest lines carry no quotes. An indexOf('"') < 0 fast-path skips the regex for quoteless lines — behaviour-identical, since a quoteless line took the old no-match branch anyway.
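
The quote-normalize fast-path (#509) can be sketched as follows. This is a minimal illustration, not jhelm's code: the regex is a hypothetical stand-in for the real quote-stripping rule, and the class and method names are invented. The only part taken from the description above is the `indexOf('"') < 0` guard in front of the regex.

```java
import java.util.regex.Pattern;

/** Illustrative sketch of a "skip the regex when no quote is present" fast-path. */
final class QuoteFastPath {

    // Hypothetical stand-in for the real quote-stripping regex: unquotes a
    // double-quoted run of letters, digits, '_', '.', '/' or '-'.
    private static final Pattern QUOTED_PLAIN =
            Pattern.compile("\"([A-Za-z0-9_./-]+)\"");

    /** Old behaviour in this sketch: always run the regex. */
    static String normalizeWithRegex(String line) {
        return QUOTED_PLAIN.matcher(line).replaceAll("$1");
    }

    /** Fast-path: every match needs a '"', so a quoteless line cannot change. */
    static String normalize(String line) {
        if (line.indexOf('"') < 0) {
            return line; // same result as the no-match branch, without the matcher
        }
        return normalizeWithRegex(line);
    }

    public static void main(String[] args) {
        String[] lines = { "replicas: 3", "name: \"web\"", "image: nginx:1.27" };
        for (String line : lines) {
            // Both paths must agree on every line for the change to be safe.
            boolean same = normalize(line).equals(normalizeWithRegex(line));
            System.out.println(normalize(line) + "  (identical to regex path: " + same + ")");
        }
    }
}
```

The guard is safe for any regex whose every match contains a double quote; a rule that could also rewrite quoteless text would need a different guard.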

Profiling jhelm also surfaces engine-layer levers — e.g. the gotmpl4j lexer’s per-parse Token allocation — which are filed against gotmpl4j rather than worked around in jhelm.
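
On the jhelm side, the per-render memo behind the double-parse elimination (#573) can be pictured with a short sketch. Everything named here is hypothetical (the `Parsed` record, the `collect`/`forRender` methods, the naive define detection); it shows only the rule stated in the notes above: reuse the collect-pass parse when the text is identical and declares no define, otherwise parse again.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative per-render memo; create one instance per render and discard it after. */
final class RenderParseMemo {

    /** Hypothetical stand-in for a parsed template. */
    record Parsed(String text, boolean declaresDefine) {}

    private final Map<String, Parsed> parsedByText = new HashMap<>();
    private int parseCount = 0;

    /** Stand-in for the expensive engine parse; counts how often it runs. */
    private Parsed parse(String text) {
        parseCount++;
        // Naive define detection, for illustration only.
        boolean hasDefine = text.contains("{{ define ") || text.contains("{{- define ");
        return new Parsed(text, hasDefine);
    }

    /** Collect pass: always parses, and remembers the result by template text. */
    Parsed collect(String text) {
        Parsed parsed = parse(text);
        parsedByText.put(text, parsed);
        return parsed;
    }

    /** Render pass: reuses the collect-pass result only for identical, define-free text. */
    Parsed forRender(String text) {
        Parsed cached = parsedByText.get(text);
        if (cached != null && !cached.declaresDefine()) {
            return cached;
        }
        return parse(text); // templates with define are still parsed again
    }

    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        RenderParseMemo memo = new RenderParseMemo();
        String plain = "kind: ConfigMap";
        String withDefine = "{{- define \"app.name\" -}}web{{- end -}}";

        memo.collect(plain);        // parse 1
        memo.collect(withDefine);   // parse 2
        memo.forRender(plain);      // reused, no parse
        memo.forRender(withDefine); // parse 3

        System.out.println("parses: " + memo.parseCount()); // parses: 3 (4 without the memo)
    }
}
```

Scoping the memo to a single render keeps it from outliving the chart it was built for, which matches the "per-render" wording above.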

7. Reproducing

The render corpus is the parity suite (tagged comparison, excluded from the normal build):

# Render + compare a single chart (or a small batch) against `helm template`
./mvnw test -pl jhelm-core -Dtest=KpsComparisonTest#compareSingleChart

# The full corpus (run in chunks; it shells out to `helm` per chart)
./mvnw test -pl jhelm-core -Dgroups=comparison -Dtest=KpsComparisonTest#compareAllTopCharts

To profile, warm the chart cache with one pass, then attach jvmlens to a second render pass as shown under Methodology above.