Performance
jhelm renders Helm charts to Kubernetes manifests in pure Java. Its performance work is profile-driven: rather than synthetic micro-benchmarks, jhelm is measured against the real workload it exists for — rendering a large corpus of real-world Helm charts — and tuned where a profiler points, with the byte-for-byte parity suite as the correctness gate for every change.
Numbers below are indicative. They come from sampled JFR profiles of a chart-render batch on an otherwise-idle developer workstation (JDK 21). Sampling profilers report shares, and shares move as the total moves: a site whose absolute cost is unchanged can show a rising share once a bigger neighbour shrinks (see Reading a profile diff). Always anchor on the absolute total (bytes, GC ms) and re-measure on your own box; the point is the direction and magnitude of a change, not the absolute figures.
1. The layers
jhelm’s render cost splits across two codebases, and it helps to keep them separate:
| Layer | What it owns |
|---|---|
| jhelm | Chart loading, value merging, subchart recursion, named-template collection, manifest assembly, and the Helm function library |
| gotmpl4j (engine) | The underlying Go template engine |
The engine layer has its own benchmark suite and optimization history — see the gotmpl4j performance page. This page is about the jhelm layer: the cost of turning a chart into manifests.
2. The workload
jhelm’s render benchmark is the same corpus that guards its correctness: the chart-parity suite
(KpsComparisonTest), which renders 540+ real-world Helm charts — bitnami, grafana,
prometheus, gitlab, cilium, harbor, datadog, and many more — and compares each byte-for-byte
against helm template. For profiling, those charts are pre-fetched and driven through jhelm’s
render path alone (load → install --dry-run → manifest), with no helm subprocess and no
diffing, so the profile reflects jhelm’s own work on a representative spread of chart shapes.
3. Methodology — profiling with jvmlens
jhelm is profiled with jvmlens, which turns a JFR recording
into a compact, ranked hot-path / allocation / GC summary (and an A/B diff between two
recordings). The loop is: capture a render batch, read the ranked summary, fix the top lever,
re-capture, diff to prove the delta, and re-run the full parity suite to prove correctness. The
flags and diff semantics used below are on the
jvmlens usage page; for the same loop worked
end-to-end on the engine layer see the
jvmlens case study. Several
of jvmlens’s safeguards were filed from this workload — the live-attach-over-dumponexit tip,
the child-process-pipe I/O hint, and the flat-total profile-diff hedge described next.
A few hard-won practicalities:
- Capture by live attach, not a JVM flag. A surefire-forked test JVM is killed, not exited cleanly, so `-XX:StartFlightRecording=…,dumponexit=true` silently produces no file. Attaching with `jvmlens profile <pid>` to the running fork captures reliably.
- Warm the chart cache first. A first pass populates the local chart cache so that network fetch (which is jhelm code, in `RepoManager`) doesn’t dominate the profiled pass.
- Scope to `org.alexmond.jhelm` so the engine (`org.alexmond.gotmpl4j`) and crypto frames fall out of the application roll-up and jhelm’s own frames are legible.
JVMLENS=path/to/jvmlens.jar
# Attach to a running render-batch JVM, capture 40s, keep the recording, scope to jhelm
java -jar "$JVMLENS" profile <pid> -d 40 -w 6 -a org.alexmond.jhelm -k before.jfr
# After a change, capture again and diff — names exactly what moved
java -jar "$JVMLENS" analyze after.jfr -b before.jfr -a org.alexmond.jhelm
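The attach command needs the pid of the forked test JVM. As a minimal sketch, the standard `ProcessHandle` API can list candidates. The class name `ForkFinder` is hypothetical, and matching on `surefirebooter` in the command line is an assumption about how the fork is launched; verify it against your own process list before relying on it.

```java
import java.util.List;

final class ForkFinder {

    // Assumption: a surefire fork is started from a "surefirebooter" jar.
    static boolean looksLikeSurefireFork(String commandLine) {
        return commandLine.contains("surefirebooter");
    }

    // Pids of visible processes whose command line matches the heuristic.
    // commandLine() can be empty on some platforms; those processes are skipped.
    static List<Long> candidatePids() {
        return ProcessHandle.allProcesses()
                .filter(ph -> ph.info().commandLine()
                        .map(ForkFinder::looksLikeSurefireFork)
                        .orElse(false))
                .map(ProcessHandle::pid)
                .toList();
    }

    public static void main(String[] args) {
        // Print one candidate pid per line.
        candidatePids().forEach(System.out::println);
    }
}
```

Run it while the render batch is executing and pass the printed pid to the attach command above. The JDK's `jps -l` tool gives the same information without any code.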
4. Where render time goes
Profiling a render-only batch (no helm, no diff) shows that jhelm’s own glue is not the
bottleneck. Render cost concentrates in three places:
| Bucket | ~Share | Notes |
|---|---|---|
| Template parse / lex | ~30–40% | The gotmpl4j lexer/parser, driven once per template. The dominant cost; an engine-layer lever (tracked in gotmpl4j). |
| Chart-invoked crypto | ~30% | |
| jhelm orchestration | remainder | Value merge, named-template collection, manifest assembly, the Helm function library. Small individually; the place jhelm can actually move the needle. |
The practical implication: jhelm-side wins come from not doing redundant work around the parse, not from the parse itself.
5. Reading a profile diff
Optimising shrinks the total, which inflates the share of everything that stayed the same. A neighbour can read as "▲ slower" in a diff while its absolute work is flat — and a faster render completes more batch rounds in a fixed capture window, so the unchanged crypto frames collect more samples. Always cross-check a share move against the absolute total (bytes / GC ms) before trusting it. This share-inversion is a known profiling artifact, not a regression.
jvmlens now flags it for you: a ▲ hot-path row under a ~flat exec-sample total is annotated
(possible sampling redistribution …), and the diff adds a one-line caution that fixed-duration
exec-sample deltas conflate per-op cost with throughput — pointing at a fixed-iteration bench
A/B for a clean per-op comparison (see the
jvmlens usage page). Both safeguards were
filed from this jhelm workload.
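The share inversion described in this section can be made concrete with a little arithmetic. The sketch below uses made-up per-round costs (they are illustrative assumptions, not measurements from jhelm): only the parse cost is halved, while crypto and glue stay flat.

```java
final class ShareInversion {

    /** Share of one site in the total, as a percentage. */
    static double sharePercent(long siteMs, long totalMs) {
        return 100.0 * siteMs / totalMs;
    }

    /** Whole render rounds that fit in a fixed capture window. */
    static long roundsInWindow(long windowMs, long roundMs) {
        return windowMs / roundMs;
    }

    public static void main(String[] args) {
        // Hypothetical per-round costs in ms; only parse changes.
        long crypto = 30;
        long glue = 30;
        long parseBefore = 40;
        long parseAfter = 20;
        long totalBefore = parseBefore + crypto + glue;
        long totalAfter = parseAfter + crypto + glue;
        long windowMs = 4_000; // fixed capture window

        // Crypto's share rises although its per-round cost is unchanged.
        System.out.printf("crypto share: %.1f%% -> %.1f%%%n",
                sharePercent(crypto, totalBefore),
                sharePercent(crypto, totalAfter));

        // A faster round also means more rounds per window, so crypto
        // accumulates more absolute time (and samples) in the capture.
        long cryptoBefore = roundsInWindow(windowMs, totalBefore) * crypto;
        long cryptoAfter = roundsInWindow(windowMs, totalAfter) * crypto;
        System.out.printf("crypto time in window: %d ms -> %d ms%n",
                cryptoBefore, cryptoAfter);
    }
}
```

With these numbers crypto's share moves from 30% to 37.5%, and its time inside the fixed window moves from 1200 ms to 1500 ms, even though each round spends the same 30 ms on it. That is why a fixed-iteration comparison is the cleaner per-op measure.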
6. Optimization history
Each change below was found with jvmlens, verified with a before/after JFR diff, and gated on the full byte-for-byte parity suite (no manifest may change).
| Change | Measured (jvmlens diff) | Correctness gate |
|---|---|---|
| Skip the redundant render-pass re-parse of define-free templates (#573) | | 540/540 parity charts byte-identical |
| Quote-normalize regex fast-path in | | |
6.1. Notes on the wins
- Double-parse elimination (#573). jhelm parses each template twice per render: a collect pass to gather `define` blocks, then a render pass before executing. The two passes key the parse differently, so a parse cache can’t dedupe them within one render. A per-render memo now skips the render-pass re-parse when the identical text is already parsed and declares no `define` (templates with `define` still parse, preserving define precedence). This roughly halves parse allocation, the single hottest render path.
- Quote-normalize fast-path (#509). `toYaml`/`toJson` ran a quote-stripping regex on every output line, but most rendered-manifest lines carry no quotes. An `indexOf('"') < 0` fast-path skips the regex for quoteless lines. This is behaviour-identical, since a quoteless line took the old no-match branch anyway.
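Both wins can be sketched in a few lines. This is an illustration of the two ideas under stated assumptions, not jhelm's actual code: `RenderWinsSketch`, `ParseMemo`, `Parsed`, the crude `contains("define")` check, and the `QUOTED_KEY` pattern are all hypothetical stand-ins for the real parser and the real quote-stripping regex.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RenderWinsSketch {

    /** Hypothetical parse result; a real engine returns a template tree. */
    record Parsed(String text, boolean declaresDefine) {}

    /** Per-render memo: lives for one render, keyed by template text. */
    static final class ParseMemo {
        private final Map<String, Parsed> parsedByText = new HashMap<>();
        private int parseCalls;

        // Stand-in parser (assumption): counts calls, detects define crudely.
        private Parsed parse(String text) {
            parseCalls++;
            return new Parsed(text, text.contains("define"));
        }

        /** Collect pass: always parses and remembers the result. */
        Parsed collect(String text) {
            Parsed parsed = parse(text);
            parsedByText.put(text, parsed);
            return parsed;
        }

        /** Render pass: reuse only identical, define-free text. */
        Parsed forRender(String text) {
            Parsed memo = parsedByText.get(text);
            if (memo != null && !memo.declaresDefine()) {
                return memo;
            }
            return parse(text); // templates with define still re-parse
        }

        int parseCalls() {
            return parseCalls;
        }
    }

    // Hypothetical quote-stripping pattern: unquotes a simple quoted key.
    private static final Pattern QUOTED_KEY =
            Pattern.compile("^(\\s*)\"([A-Za-z0-9_.-]+)\":");

    /** Old shape: run the regex on every line. */
    static String normalizeAlways(String line) {
        Matcher m = QUOTED_KEY.matcher(line);
        return m.find() ? m.replaceFirst("$1$2:") : line;
    }

    /** Fast path: a line without a quote cannot match, so skip the regex. */
    static String normalizeFast(String line) {
        if (line.indexOf('"') < 0) {
            return line;
        }
        return normalizeAlways(line);
    }

    public static void main(String[] args) {
        ParseMemo memo = new ParseMemo();
        String plain = "kind: ConfigMap";
        String helper = "{{ define \"x\" }}y{{ end }}";
        memo.collect(plain);
        memo.collect(helper);
        memo.forRender(plain);  // served from the memo
        memo.forRender(helper); // re-parsed, it declares a define
        System.out.println("parse calls: " + memo.parseCalls());
        System.out.println(normalizeFast("  \"name\": web"));
    }
}
```

In this toy run the two templates cost three parses instead of four, because only the define-free one is reused. The fast path returns the same string as the regex path for every input, since the pattern requires a quote character to match at all.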
Profiling jhelm also surfaces engine-layer levers — e.g. the gotmpl4j lexer’s per-parse Token
allocation — which are filed against
gotmpl4j rather than worked around in jhelm.
7. Reproducing
The render corpus is the parity suite (tagged comparison, excluded from the normal build):
# Render + compare a single chart (or a small batch) against `helm template`
./mvnw test -pl jhelm-core -Dtest=KpsComparisonTest#compareSingleChart
# The full corpus (run in chunks; it shells out to `helm` per chart)
./mvnw test -pl jhelm-core -Dgroups=comparison -Dtest=KpsComparisonTest#compareAllTopCharts
To profile, warm the chart cache with one pass, then attach jvmlens to a second render pass as shown under Methodology above.