For a year, cold-start was the number that kept us honest. Every other benchmark could be tuned in isolation. Cold-start collapsed the whole runtime into one observable.
We measured the path from KetoyRuntime.load() to the first recomposition finishing. No network. Local bundle, cached on disk. A 142-composable tree. Pixel 6, AOSP 15. This post is an accounting of how that number moved.
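The measurement itself is just wall-clock time between the load call and a first-frame signal. A minimal harness might look like the sketch below; the `CountDownLatch` hand-off is an assumption, since the post doesn't say how the runtime reports that the first recomposition finished.

```java
import java.util.concurrent.CountDownLatch;

public class ColdStartTimer {
    /**
     * Runs the load action, blocks until firstFrame is signalled (in the real
     * harness, by a callback fired when the first recomposition finishes),
     * and returns elapsed wall-clock milliseconds.
     */
    public static long measure(Runnable load, CountDownLatch firstFrame)
            throws InterruptedException {
        long t0 = System.nanoTime();
        load.run();          // stands in for KetoyRuntime.load()
        firstFrame.await();  // released when the first frame is done
        return (System.nanoTime() - t0) / 1_000_000;
    }
}
```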
§ 01 — BASELINE
910 ms, mostly in verification
Our first honest measurement was 910 ms. The bulk of it was verifier overhead. We were running an over-general dataflow pass because we had inherited JVM-style invariants that KBC had already retired.
The fastest verifier is the one that proves the fewest theorems.
— Pinned above the runtime team's desk
§ 02 — THE WINS
What actually moved the number
- Merged parse + verify. We walk the section table once. 168 ms → 21 ms.
- Hoisted constant pools out of per-function tables. Link time dropped to 18 ms.
- Lazy slot-table allocation. We don't allocate until a composable actually runs, so the first frame stopped paying for scopes it never enters.
- Compose stability metadata in the header. We can skip entire subtrees during first recomposition.
The single largest win was moving verification out of the critical path. It now runs in parallel with disk read for the next section, so by the time we are ready to link, verification is done.
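The overlap described here can be sketched as a two-stage pipeline: the loading thread reads section N+1 from disk while a worker verifies section N, and linking waits on the outstanding verification futures. This is a minimal sketch of the idea, not the runtime's actual code; `readSection` and `verify` are stand-ins for the real IO and verifier paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipelinedLoad {
    /** Reads sections on the calling thread while a worker verifies the
     *  previous one; linking only proceeds once every proof has finished. */
    public static List<byte[]> loadAll(int sectionCount) throws Exception {
        ExecutorService verifier = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        List<byte[]> sections = new ArrayList<>();
        try {
            for (int i = 0; i < sectionCount; i++) {
                byte[] section = readSection(i); // disk read on this thread
                pending.add(verifier.submit(() -> verify(section))); // overlaps next read
                sections.add(section);
            }
            for (Future<?> f : pending) f.get(); // verification done before link
        } finally {
            verifier.shutdown();
        }
        return sections;
    }

    static byte[] readSection(int i) { return new byte[]{(byte) i, 1}; } // IO stand-in

    static void verify(byte[] s) {
        if (s.length == 0) throw new IllegalStateException("empty section");
    }
}
```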
That took us to 142 ms. On a cold device cache, with no tricks. The floor, we think, is somewhere near 90 ms, bounded by IO and the native Compose layout pass. We are not there yet.
Kenji Park
Runtime engineer. Spends most weeks deleting code that the verifier no longer needs.