Cells to Pixels: EPFL and Google Research Just Broke the NCA Resolution Ceiling
The dirty secret of Neural Cellular Automata has always been the same one that killed early neural volume rendering: if every output element requires its own stored state, your memory bill scales quadratically with resolution, and you hit a wall long before you reach anything a game engine would call "production quality." For NCA researchers, that wall has sat at roughly 128×128 — visually compelling in a paper figure, practically useless in a shipping product.
A paper accepted at SIGGRAPH 2026 moves the wall. "Cells to Pixels," co-authored by researchers at EPFL and Alexander Mordvintsev at Google Research (the same Mordvintsev whose 2020 work on growing NCAs, and the self-organizing texture follow-up a year later, sparked most of the subsequent research), introduces a hybrid architecture that decouples the NCA's computational grid from its output resolution entirely. The result is a system that runs self-organizing dynamics on a coarse grid and renders at arbitrary target resolution. The NCA's state memory and update cost stay flat as you scale up; only a lightweight per-pixel decoder grows with the pixel count. The interactive WebGL demo loads in a browser tab. That detail alone tells you the inference budget is real.
The Wall NCAs Have Always Run Into
To understand why this matters, it helps to understand what classical NCAs actually do and where they structurally break down.
A Neural Cellular Automaton models a grid of cells, each carrying a state vector. At each timestep, an update rule — implemented as a small convolutional neural network — reads each cell's local neighborhood and produces a new state. The system is trained end-to-end so that the update rule, applied repeatedly from a seed state, converges to a target appearance. The appeal is biological plausibility: the same local rule, applied in parallel across every cell, produces globally coherent emergent structure. No cell knows about the overall pattern; the pattern emerges anyway.
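To make the update loop concrete, here is a minimal sketch of one NCA step in plain Python. It is an illustration under stated assumptions, not the paper's model: the grid wraps around at the edges, "perception" is just the cell's own state plus the mean of its 3×3 neighborhood, and the update rule is a hypothetical single linear layer with random weights standing in for a trained network.

```python
import math
import random

def nca_step(grid, weights):
    """Apply one synchronous update to every cell of a toroidal grid.

    grid: H x W list of state vectors (lists of C floats).
    weights: C x 2C matrix, a stand-in for the trained update network.
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    new_grid = []
    for y in range(h):
        row = []
        for x in range(w):
            # Perception: mean state over the 3x3 neighbourhood (wrapping at edges).
            mean = [0.0] * c
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nb = grid[(y + dy) % h][(x + dx) % w]
                    for k in range(c):
                        mean[k] += nb[k] / 9.0
            features = grid[y][x] + mean  # own state + neighbourhood mean, length 2C
            # Update rule: small residual step through a linear layer and tanh.
            delta = [math.tanh(sum(wk * f for wk, f in zip(weights[k], features)))
                     for k in range(c)]
            row.append([s + 0.1 * d for s, d in zip(grid[y][x], delta)])
        new_grid.append(row)
    return new_grid

def make_seed(h, w, c):
    """All-zero grid with a single active cell in the centre."""
    grid = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    grid[h // 2][w // 2] = [1.0] * c
    return grid

rng = random.Random(0)
C = 4  # channels per cell (assumed for the example)
weights = [[rng.uniform(-1, 1) for _ in range(2 * C)] for _ in range(C)]
state = make_seed(8, 8, C)
for _ in range(3):
    state = nca_step(state, weights)
```

After three steps the seed's influence has reached cells up to three steps away and no further, which is the locality property the rest of this section leans on.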
The problem is the conflation of two things that should be separate: state dynamics and appearance. In a classical NCA, every cell is a pixel. Grid resolution is output resolution. If you want a 1024×1024 output, you need a 1024×1024 grid, which means a 1024×1024 state tensor. For an N×N output, memory and compute at training time therefore scale as O(N²), quadratically in the side length N, and backpropagating through an unrolled rollout multiplies that by the number of steps kept in memory. A 4× increase in side length costs 16× in memory; going from 128×128 to 1024×1024 costs 64×.
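The arithmetic is easy to check. The sketch below assumes 16 float32 channels per cell, a figure chosen for illustration rather than taken from the paper, and counts only a single copy of the state tensor.

```python
def state_bytes(side, channels=16, bytes_per_value=4):
    """Bytes for one square state tensor of shape side x side x channels."""
    return side * side * channels * bytes_per_value

for side in (128, 256, 512, 1024):
    mib = state_bytes(side) / 2**20
    ratio = state_bytes(side) // state_bytes(128)
    print(f"{side:>4} x {side:<4} {mib:6.1f} MiB  ({ratio}x the 128 grid)")
```

Training typically keeps activations for many unrolled steps, so the real footprint is a multiple of these numbers; the quadratic ratio between resolutions is what matters.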
There are two other structural problems beyond the memory cost. First, NCA update rules are strictly local: each cell only sees its immediate neighbors, so information travels roughly one cell per step. Long-range pattern coordination, the kind of global structure that distinguishes a convincing organic texture from noise, has to emerge through a number of iterative propagation steps that grows with the grid's side length, which is slow and sensitive to initialization. Second, real-time inference at high resolution compounds both problems: you need the full high-resolution state tensor in GPU memory every frame, and you need to run the update network across every pixel every step.
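A back-of-the-envelope bound shows why locality hurts at high resolution. Assuming a perception radius of one cell (the 3×3 neighborhood case), a signal needs at least as many steps as there are cells between its source and its destination. The helper name below is hypothetical.

```python
import math

def min_steps_to_cross(grid_side, radius=1):
    """Lower bound on update steps before one edge of the grid can influence the other."""
    return math.ceil((grid_side - 1) / radius)

for side in (64, 128, 1024):
    print(side, min_steps_to_cross(side))
```

This is only a lower bound on raw signal travel; learned coordination usually takes considerably longer than one crossing.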
The field has known about these constraints since the original Mordvintsev NCA work. MeshNCA extended the architecture to 3D mesh surfaces and produced striking results, but the fundamental scaling limitation remained. What was missing was an architectural insight, not just an engineering optimization.
The LPPN: Decoupling Dynamics from Appearance
The core contribution of "Cells to Pixels" is the Local Pattern Producing Network (LPPN) — a shared lightweight MLP that maps coarse NCA cell states and local coordinates to high-resolution appearance attributes: color, surface normals, and whatever else the rendering pipeline needs.
The architecture works in two stages. The NCA still runs on a grid, but that grid is coarse — you can think of it as a latent field encoding the dynamics of the system. The LPPN then acts as a decoder: given the interpolated cell states surrounding an output pixel and the local coordinates of that pixel within the coarse grid, it produces the final appearance attributes for that pixel. Because the LPPN is shared across all output pixels — the same network weights evaluate every query — it adds no per-pixel state overhead. And because it is conditioned on local coordinates, it can interpolate and hallucinate fine-grained detail between coarse grid samples at whatever output resolution you request.
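The two-stage idea can be sketched in a few lines. This is a toy reading of the description above, not the paper's implementation: the bilinear interpolation scheme, the wrap-around border, the layer sizes, and the random weights are all assumptions made for the example.

```python
import math
import random

def bilinear_state(grid, gy, gx):
    """Interpolate cell states at continuous grid position (gy, gx).

    Returns the blended state and the local coordinates inside the cell.
    """
    h, w = len(grid), len(grid[0])
    y0, x0 = math.floor(gy), math.floor(gx)
    fy, fx = gy - y0, gx - x0              # local coordinates in [0, 1)
    y1, x1 = (y0 + 1) % h, (x0 + 1) % w    # wrap at the border (assumption)
    y0, x0 = y0 % h, x0 % w
    state = [
        (1 - fy) * ((1 - fx) * a + fx * b) + fy * ((1 - fx) * c + fx * d)
        for a, b, c, d in zip(grid[y0][x0], grid[y0][x1], grid[y1][x0], grid[y1][x1])
    ]
    return state, (fy, fx)

def mlp(x, layers):
    """Tiny fully connected network: tanh hidden layers, linear output."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

def render(grid, layers, out_h, out_w):
    """Decode a coarse state grid into an out_h x out_w image with shared weights."""
    h, w = len(grid), len(grid[0])
    image = []
    for py in range(out_h):
        row = []
        for px in range(out_w):
            # Map the pixel centre into coarse-grid coordinates (cell centres at integers).
            gy = (py + 0.5) * h / out_h - 0.5
            gx = (px + 0.5) * w / out_w - 0.5
            state, (fy, fx) = bilinear_state(grid, gy, gx)
            row.append(mlp(state + [fy, fx], layers))  # same weights for every pixel
        image.append(row)
    return image

def random_layer(rng, n_in, n_out):
    w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return w, b

rng = random.Random(0)
C = 4  # channels per coarse cell (assumed)
coarse = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(4)] for _ in range(4)]
layers = [random_layer(rng, C + 2, 8), random_layer(rng, 8, 3)]  # state + (fy, fx) -> RGB
low = render(coarse, layers, 8, 8)
high = render(coarse, layers, 32, 32)
```

Both renders read the same 4×4 state grid. Only the number of decoder queries changes with output size, which is the decoupling the section describes.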
The three NCA scaling barriers each get addressed cleanly. Quadratic memory cost drops because the state grid stays coarse regardless of output resolution; the LPPN is stateless per-pixel. Strictly local information propagation is ameliorated because the coarse grid can represent larger-scale structure with fewer cells covering more spatial area. Real-time inference cost stays manageable because the LPPN is a small MLP evaluated in parallel across pixels — exactly the workload a fragment shader or CUDA kernel is built for.
Critically, both the NCA update step and the LPPN evaluation are local operations. This preserves the GPU parallelism that makes NCAs practically deployable in the first place. The system also preserves all the characteristic NCA behaviors that make the architecture interesting: regeneration from damage, spontaneous dynamics, and the capacity to run across 2D grids, 3D grids, and mesh surfaces.
The interactive demo and Colab notebook are already public at cells2pixels.github.io; the GitHub code release is forthcoming.
The NeRF Insight Nobody in the NCA Community Is Talking About
Here is the non-obvious read on this architecture: the LPPN is not a new kind of network. It is a coordinate-conditioned implicit neural representation — structurally, it is a NeRF query function with NCA cell states serving as the scene encoding.
The convergence between the NCA community and the implicit neural representation (INR/NeRF) community is more complete than either group is currently advertising. The core insight that made NeRF tractable — you do not need to store the output representation explicitly if you can query it cheaply — is precisely the insight that "Cells to Pixels" imports into NCAs. The coarse NCA grid plays the role of the scene's latent encoding; the LPPN plays the role of the radiance field decoder.
This reframing has a practical implication that teams should exploit immediately: LPPN-style decoders are already well-understood objects with a mature toolbox for accelerating inference. Hash-grid encodings (Müller et al., Instant-NGP, 2022) can dramatically reduce the coordinate encoding cost. Factored feature grids can decompose the state-to-appearance mapping into cheaper tensor operations. Distillation into explicit textures works for scenes or surfaces that do not change frame-to-frame, letting you bake the LPPN output into a static asset and pay the inference cost exactly once.
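The distillation option is the simplest of the three to picture. In the sketch below, `decode` is a hypothetical stand-in for an expensive per-pixel decoder query; the point is only the pattern of evaluating it once per texel and replacing later queries with an array read.

```python
import math

def decode(u, v):
    """Hypothetical stand-in for an expensive decoder query at (u, v) in [0, 1)."""
    return (0.5 + 0.5 * math.sin(12.0 * u), 0.5 + 0.5 * math.cos(9.0 * v), u * v)

def bake(decode_fn, height, width):
    """Evaluate the decoder once per texel centre and keep the result as a texture."""
    return [[decode_fn((x + 0.5) / width, (y + 0.5) / height) for x in range(width)]
            for y in range(height)]

def sample_nearest(texture, u, v):
    """Runtime lookup: no network evaluation, just an array read."""
    height, width = len(texture), len(texture[0])
    y = min(int(v * height), height - 1)
    x = min(int(u * width), width - 1)
    return texture[y][x]

texture = bake(decode, 64, 64)
texel = sample_nearest(texture, 10.5 / 64, 20.5 / 64)
```

Baking only makes sense once the underlying state has stopped changing; a surface that is still evolving would need re-baking whenever its state moves.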
None of this toolbox is being discussed in the NCA literature, because the NCA community has not historically thought of itself as building implicit neural representations. That gap is the fastest path to production-grade performance for teams adopting this work — importing NeRF-derived acceleration techniques into the LPPN stage without waiting for the NCA research community to catch up.
The architectural separation between state dynamics and appearance rendering is, in retrospect, the obvious move. The fact that it took until 2026 to appear in a SIGGRAPH paper reflects how tightly the NCA community has been anchored to the pixel-equals-cell metaphor from the original formulation.
What This Means in Production
For teams seriously evaluating adoption, the honest accounting looks like this.
Two inference passes per frame, not one. Shipping this in a game engine means a coarse NCA update step on compute shaders followed by an LPPN pixel shader pass. Your frame budget analysis must account for both. The useful architectural fact is that the NCA step frequency does not need to match the render frame rate: you can run the NCA at 10 Hz and the LPPN at 60 Hz, with the LPPN decoding states interpolated between consecutive NCA steps for the intermediate frames. This is a legitimate LOD strategy that classical NCAs structurally cannot support.
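A sketch of that scheduling, under assumptions: the blend is a plain linear interpolation between the two most recent coarse states, and `toy_nca_step` is a hypothetical 1D stand-in for the real update. Whether a trained decoder tolerates blended states well is something to validate on your own content.

```python
def blend_states(prev, nxt, t):
    """Linear blend of two flat state vectors; t=0 gives prev, t=1 gives nxt."""
    return [(1.0 - t) * a + t * b for a, b in zip(prev, nxt)]

def frame_schedule(render_hz, nca_hz, frames):
    """For each render frame, return (frame, nca_tick, blend_weight)."""
    out = []
    for f in range(frames):
        tick, rem = divmod(f * nca_hz, render_hz)
        out.append((f, tick, rem / render_hz))
    return out

def toy_nca_step(state):
    """Hypothetical stand-in for the coarse NCA update (a 1D diffusion step)."""
    n = len(state)
    return [0.5 * state[i] + 0.25 * (state[i - 1] + state[(i + 1) % n]) for i in range(n)]

prev = [0.0, 0.0, 1.0, 0.0, 0.0]
nxt = toy_nca_step(prev)
tick_now = 0
decoder_inputs = []  # what the per-pixel decoder would consume each render frame
for f, tick, t in frame_schedule(render_hz=60, nca_hz=10, frames=13):
    while tick_now < tick:            # advance the NCA only when a new tick is due
        prev, nxt = nxt, toy_nca_step(nxt)
        tick_now += 1
    decoder_inputs.append(blend_states(prev, nxt, t))
```

Thirteen render frames trigger only three NCA updates here. The price is one NCA step of latency, because each frame is drawn between the previous state and one that has already been computed.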
Profile on your target GPU before setting resolution targets. The LPPN is a shared MLP evaluated per output pixel, which maps well to fragment shaders and CUDA kernels. However, its cost grows linearly with the number of output pixels, so it will bottleneck first on hardware with limited fill rate or shader throughput. The WebGL demo running in-browser is a strong signal that consumer hardware can handle it, but WebGL shaders have no bfloat16 type: arithmetic runs in float32, or in reduced-precision mediump floats on some mobile GPUs. If your pipeline was designed assuming bfloat16, that precision difference can surface as subtle quality regressions that automated evals may not catch.
Train at your production upscale factor. The paper's claim of arbitrary resolution holds within the training distribution of upscale factors. If you train the LPPN at 8× coarse-to-fine and then query at 32× in production, you are sampling the decoder four times more densely per axis than it was ever supervised at, which is extrapolation as far as the network is concerned. The result is subtle interpolation artifacts: easy to miss in automated metric comparisons, obvious to a human artist. Train with the upscale factor range you actually intend to deploy.
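One way to act on this is to sample the upscale factor per training step from the range you plan to ship, and to refuse factors outside that range at deploy time. The sketch below is an assumption about how a team might wire this up; the function names and the 8× to 32× range are illustrative, and log-uniform sampling is a design choice rather than anything the paper prescribes.

```python
import math
import random

def sample_upscale(rng, lo, hi):
    """Log-uniform sample, so 8x-16x and 16x-32x are visited equally often."""
    u = rng.uniform(math.log(lo), math.log(hi))
    return min(hi, max(lo, math.exp(u)))  # clamp away floating-point overshoot

def check_deploy_factor(factor, lo, hi):
    """Fail loudly instead of silently extrapolating beyond the trained range."""
    if not lo <= factor <= hi:
        raise ValueError(f"upscale {factor}x is outside the trained range [{lo}, {hi}]")
    return factor

rng = random.Random(0)
TRAIN_RANGE = (8.0, 32.0)  # illustrative production range
factors = [sample_upscale(rng, *TRAIN_RANGE) for _ in range(1000)]
```

Calling `check_deploy_factor(64.0, *TRAIN_RANGE)` raises, which is the behavior you want from a build step that would otherwise ship an untested resolution.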
Treat the NCA and LPPN as a coupled artifact. The LPPN weights are trained against a specific NCA weight checkpoint. You cannot independently fine-tune one without potentially invalidating the other; the LPPN has learned to decode a state distribution that depends on the specific NCA dynamics it was co-trained with. In your model registry, version them together. Pipelines that checkpoint models independently will silently produce degraded outputs when the two drift out of sync.
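A registry-side guard for this can be very small. The sketch below assumes the two weight files are available as raw bytes and uses a hypothetical manifest layout; the idea is simply that a single pair identifier changes whenever either half changes.

```python
import hashlib

def digest(blob):
    """SHA-256 hex digest of a weight file's bytes."""
    return hashlib.sha256(blob).hexdigest()

def make_manifest(nca_blob, lppn_blob):
    """Record both checkpoints plus one identifier for the pair."""
    nca_id, lppn_id = digest(nca_blob), digest(lppn_blob)
    pair_id = digest((nca_id + ":" + lppn_id).encode("ascii"))
    return {"nca": nca_id, "lppn": lppn_id, "pair": pair_id}

def verify_pair(manifest, nca_blob, lppn_blob):
    """True only if both checkpoints are exactly the ones that were registered together."""
    return make_manifest(nca_blob, lppn_blob) == manifest

# Placeholder bytes standing in for real checkpoint files.
nca_v1, lppn_v1 = b"nca-weights-v1", b"lppn-weights-v1"
manifest = make_manifest(nca_v1, lppn_v1)
ok = verify_pair(manifest, nca_v1, lppn_v1)
drifted = verify_pair(manifest, b"nca-weights-v2", lppn_v1)
```

Loading code that checks `verify_pair` before rendering turns a silent quality regression into an explicit failure.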
NCA dynamics are a feature and a liability. The self-organizing regeneration behavior that makes NCAs compelling for "living texture" use cases — spreading fire damage, growing moss, biological surface materials — also means that two runs from similar seeds can diverge visually over time. This breaks standard asset pipeline validation workflows that diff against a reference render. If your content pipeline requires deterministic, art-directable outputs, you need explicit seed management and either accept the divergence or constrain the NCA dynamics to a near-static regime, at which point you should ask whether you needed an NCA at all.
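Explicit seed management looks roughly like the sketch below. It uses a toy 1D automaton with a random per-cell update mask, an assumption made to mimic the stochastic updates many NCA variants use, and fingerprints the final state so that two runs can be compared in a validation step.

```python
import hashlib
import random
import struct

def run(seed, steps=50, cells=64):
    """Run a toy stochastic automaton and return a fingerprint of its final state."""
    rng = random.Random(seed)  # every random decision flows from this one seed
    state = [0.0] * cells
    state[cells // 2] = 1.0
    for _ in range(steps):
        new = state[:]
        for i in range(cells):
            if rng.random() < 0.5:  # stochastic update mask: only some cells fire
                new[i] = 0.5 * state[i] + 0.25 * (state[i - 1] + state[(i + 1) % cells])
        state = new
    return hashlib.sha256(struct.pack(f"{cells}d", *state)).hexdigest()

same_seed_matches = run(1) == run(1)
different_seed_matches = run(1) == run(2)
```

On a CPU this gives bit-identical reruns for a fixed seed. On a GPU, nondeterministic reduction order can still introduce small differences, so a tolerance-based comparison may be the more realistic check there.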
For mesh surfaces, watch seams. The MeshNCA generalization carries a structural limitation: the LPPN relies on locally smooth cell-state gradients across the mesh. High-curvature regions and UV seams will produce discontinuities that the local-only architecture cannot resolve without global information injection. This is not a tuning problem; it is a consequence of the locality constraint. Budget geometry cleanup time before applying this to hero assets with complex topology.
Know when to reach for something else. For static high-resolution texture synthesis where you need maximum fidelity and full art direction, diffusion-based methods — Adobe Firefly's texture pipeline, Stable Diffusion with ControlNet — outperform this approach and are easier to integrate into existing content pipelines. Traditional procedural approaches (Perlin noise, Wang tiles, Substance-style graphs) are faster and fully deterministic. Choose "Cells to Pixels" when you specifically need real-time, self-organizing dynamics at game resolution: living surface materials, simulations where the texture must respond to and recover from damage, generative environments where static assets would break the believability. The co-training complexity and reduced art-directability are real costs that only pay off if the dynamic behavior is load-bearing in your product.
The Architecture That Should Have Existed in 2020
"Cells to Pixels" is not a revolutionary paper in the sense of introducing an entirely new class of model. It is something arguably more valuable: the right architectural decomposition, applied to a problem the community had been solving with the wrong abstraction for six years.
The LPPN breaks the pixel-equals-cell coupling that has constrained every NCA paper since Mordvintsev's original texture work. By treating the NCA as a coarse latent field and delegating appearance to a shared implicit decoder, the system inherits the best properties of both paradigms: the emergent, self-organizing dynamics of NCAs and the resolution-decoupled, GPU-parallel inference of implicit neural representations. The fact that the WebGL demo runs on consumer hardware is not a footnote — it is the thesis statement.
For graphics engineers and procedural generation researchers, the practical action is clear: the interactive demo and Colab notebook are live now, the GitHub release is forthcoming, and the coarse grid resolution is the first hyperparameter to understand. Set it too coarse and you lose long-range pattern expressivity; set it too fine and you pay quadratic cost again. The sweet spot depends on the scale of self-organizing structure your application actually needs, and no paper can answer that for your specific use case.
The teams that will get the most out of this work are the ones that import the NeRF inference acceleration toolbox into the LPPN stage before the NCA community officially acknowledges it exists. That gap between communities is a six-to-twelve month window of competitive advantage for anyone willing to read two literatures at once.
Sources & Editorial Disclosure
This article was researched and written with AI assistance (Claude by Anthropic) as part of StackRadar's automated editorial pipeline. Content was synthesised from the following public developer community sources: Hacker News — Show HN · Dev.to.
All technical claims, version numbers, benchmarks, and project details should be independently verified against official documentation or the original sources listed above. StackRadar analyses and synthesises publicly available information and does not claim original authorship of the underlying events, projects, or research described. Mention of any project, product, or organisation does not constitute an endorsement by StackRadar. This content is provided for informational purposes only — 2026-06-18.