
Large language models captivate for a variety of reasons; I am particularly intrigued by the fact that we know how to build them even without really knowing how they work. So I tend to gravitate less towards discussions of what they can (or cannot) do and more towards discussions of how & why. (Which may have something to do with the fact that the “what” changes rapidly, while progress on “how” and “why” feels a bit more tractable for an outsider to follow.) Hence my interest in this podcast with Ken Stanley on how neural networks represent the knowledge they are learning (transcript if you prefer). If you don’t have time to listen to the full chat, at least take a minute or three to review the figures in the preprint paper under discussion; they tell a fairly compelling story just on their own.
The core of the paper is a comparison of neural networks trained in different ways to generate the same images. One method is from Stanley’s PicBreeder project, where the neural networks evolve as users pick images they’d like to “reproduce” further. Under this method, there is not necessarily a specific target image (at least at the start; eventually when something recognizable begins to emerge, the users might deliberately refine it). Rather, the criteria are more generally related to interest or novelty or whatever it is that grabs the user’s attention. The second method is the gradient descent approach typically used in one form or another in training the various neural networks at the core of all the headline-generating AI tools. In the second case, there is a target: the specific images that different PicBreeder networks generate. The researchers then went on to examine the behavior of the different networks.
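To make the second method a bit more concrete, here is a minimal sketch of gradient descent toward a fixed target image. This is my own illustration, not code from the paper: a small coordinate-to-pixel network (loosely in the spirit of the networks PicBreeder evolves) is adjusted step by step until the image it renders matches the target. The network size, step count, and names like CoordNet and train_to_target are all illustrative assumptions.

```python
# A sketch (not the authors' code) of the gradient-descent half of the comparison:
# fit a small coordinate-based network so the image it renders matches a target.
import torch
import torch.nn as nn

class CoordNet(nn.Module):
    """Maps an (x, y) pixel coordinate to a grayscale intensity in [0, 1]."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, xy):
        return self.net(xy)

def train_to_target(target, steps=2000, lr=1e-2):
    """Train the network to reproduce `target`, an H x W tensor of values in [0, 1]."""
    h, w = target.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs.flatten(), ys.flatten()], dim=1)
    pixels = target.flatten().unsqueeze(1)

    model = CoordNet()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(coords), pixels)
        loss.backward()
        opt.step()
    return model
```

The PicBreeder side of the comparison has no loss function against a target like this; those networks were shaped by repeated rounds of user selection, so only the gradient-descent half of the experiment looks anything like the sketch above.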
The key observation is that, while both sets of networks can generate the exact same images using comparable numbers of parameters and similar network structure, the way in which they accomplish the end result is very different. This is illustrated by nudging individual parameters in each model and seeing what images the networks then generate. For the PicBreeder networks, these tweaks produce images that are modified in ways that lend themselves to compact descriptions: the shape of a mouth changes, eyes get bigger or smaller, butterfly wings change color. For the gradient descent networks, the tweaks tend to change the images in more complicated ways. The changes might be asymmetric even if the base image is symmetric, or they may apply to the entire image or to portions of it that don’t correspond to what a human would identify as the component shapes. In other words, while the base image results are indistinguishable, the images that are near them in the space defined by the networks and their parameters are very different.
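As a rough illustration of that probing procedure (again my own sketch rather than the authors' code, and assuming a coordinate-to-pixel model like the one in the earlier snippet), you can pick a single weight, shift it by a small amount, and re-render the image to see what changed:

```python
# Sketch of the perturbation probe: nudge one weight in a trained network,
# re-render the image, then restore the weight. Parameter choice and nudge
# size here are arbitrary illustrations.
import torch

def render(model, h=64, w=64):
    """Render the image a coordinate-to-pixel network currently encodes."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs.flatten(), ys.flatten()], dim=1)
    with torch.no_grad():
        return model(coords).reshape(h, w)

def nudge_and_render(model, param_name, index, delta=0.5):
    """Return the rendered image after shifting one scalar weight by `delta`."""
    flat = dict(model.named_parameters())[param_name].data.view(-1)
    flat[index] += delta
    image = render(model)
    flat[index] -= delta  # put the weight back afterwards
    return image

# Example usage (assumes `model` came from train_to_target in the earlier sketch):
# before = render(model)
# after = nudge_and_render(model, "net.0.weight", index=3, delta=0.5)
```

The paper’s observation, in these terms, is that for the PicBreeder networks such a nudge tends to change one recognizable feature at a time, while for the gradient-descent networks it tends to distort the image in ways that are harder to describe.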
What Stanley and colleagues infer from this is that the two networks have meaningfully different representations of their subjects. The PicBreeder models seem to have representations of distinct parts and of their relationships. For the face image, there seems to be a concept of eyes and mouth, and a notion of symmetry; for the butterfly image, a notion of wings and body and so on. The gradient descent networks don’t show evidence of the same modularity; instead, they seem to build up the final image from a complicated series of skull-ish images that happen to add up to the target. They describe these two situations as unified factored representation and fractured entangled representation, respectively.

The hypothesis is that the different training routes might have something to do with how these different styles of representation arise. After all, the PicBreeder networks evolved in an open-ended fashion analogous to how humans evolved or to how humans learn, and the representations of those networks seem to align better with how humans represent concepts. I’m inclined to agree that the method is significant, but perhaps not for the exact same reasons. The representation style may align with humans’ because the images were chosen by humans; images that changed in ways humans expected or could recognize or understand were probably more likely to be selected than images that changed haphazardly, or at least were perceived as such by humans. In other words, there may have been a selection bias for the observed feature that nearby images are similar in the ways that humans judge similarity. The researchers describe this as an increase in evolvability, but perhaps those networks simply have more capacity to evolve in recognizable ways; maybe the gradient descent networks have greater capacity to evolve towards a new class of images.
And then there is the fact that while unified factored representation may better reflect how humans learn, it doesn’t necessarily reflect how humans evolved. While Stanley draws comparisons to biological development and how it can benefit from modularity, it is not clear to me that biology actually is as modular as he is describing. The more we learn about human genetics and the genetics of other large multicellular animals, the more it seems that many of our traits are influenced by many of our genes, an arrangement that sounds more like an entangled representation at least, and perhaps a fractured one. So an evolutionary process in general may not tend to produce unified factored representations; it may in fact be the case that evolution has produced genomes with fractured, entangled representations of their environments which can nevertheless produce neural networks capable of learning unified factored representations of those same environments.
Now perhaps you don’t care about these technical details of neural networks and large language models. But if Stanley and colleagues are correct about the relative merits of different representations, there could be broader implications. As they discuss, some portions of human learning, especially the formal education components, are structured to produce unified factored representations. But plenty of our experiences are haphazard and may leave us with inefficient mental models. We may be paying the price for that right now. As this Scientific American piece articulates, we don’t all have the same understanding of how science works or what scientists do, which in turn is contributing to the growing divide in priorities about how to fund science and what science is worth pursuing.
Religious education and spiritual formation face these issues as well. What is the correct form and sequence for introducing various ideas, stories, and experiences to induce the desired representation of God? I think we are seeing that some paths lead to an understanding of God which, if exposed to slightly different inputs or contexts, becomes unrecognizable. Other paths, however, lead to an understanding of God which is more robust. The costly process of refactoring one representation into another might look like deconstruction and reconstruction. All the better to be shown the path leading to the preferred representation in the first place.
Andy has worn many hats in his life. He knows this is a dreadfully clichéd notion, but since it is also literally true he uses it anyway. Among his current metaphorical hats: husband of one wife, father of two teenagers, reader of science fiction and science fact, enthusiast of contemporary symphonic music, and chief science officer. Previous metaphorical hats include: comp bio postdoc, molecular biology grad student, InterVarsity chapter president (that one came with a literal hat), music store clerk, house painter, and mosquito trapper. Among his more unique literal hats: British bobby, captain’s hats (of varying levels of authenticity) of several specific vessels, a deerstalker from 221B Baker St, and a railroad engineer’s cap. His monthly Science in Review is drawn from his weekly Science Corner posts — Wednesdays, 8am (Eastern) on the Emerging Scholars Network Blog. His book Faith across the Multiverse is available from Hendrickson.