One of the strangest things about generative AI models is that, since their advent, ordinary users have reported the discovery of recurring ... and sometimes terrifying ... entities which seem hidden within them. "The Crungus." "Loab." An entire internal language that DALL-E seems to have created, and can interpret.
Hidden possibilities aren’t surprising. A fascinating aspect of this AI summer, with respect to both diffusion models and large language models, is their “capability overhang” — meaning many of their remarkable powers are not easily accessed, but must be teased out by precise and elaborate prompting. “Prompt whispering” has become its own art, and/or science, and/or new form of programming language.
But hidden creatures and languages are another thing entirely. Are such entities really hidden within DALL-E's vast capability overhang? Or is this just apophenia, the human tendency to see patterns where none exist, and construct subjective narratives out of what is actually just stochastic noise?
Unsatisfyingly, the answer is both yes and no. There are now three semi-famous examples of what I'll call “AI cryptids”; let’s consider them in turn.
The Crungus
A semi-human monster with a semi-consistently misshapen face appears fairly consistently on Craiyon when the non-word “crungus” — and/or, even more curiously, really any word which ends in “-rungus” — appears in the prompt.
It seems very unlikely that such words appear extensively in the captions on which any diffusion models are trained. The theory that it was an interpretation of “Krampus” was quickly dismissed. Nobody yet seems to have a good explanation of why that particular suffix has such a powerful effect on a diffusion model.
Loab
No less terrifying than The Crungus is “Loab,” a woman originally created from the prompt text “opposite of Brando” followed by a request for the opposite of that generated image. But the opposite of the opposite of Brando is not Brando; rather…
…it seems to be a disfigured woman with wild staring eyes, and moreover, whenever an image with Loab is used as part of a subsequent prompt (recall that diffusion models can use both text and images as prompts), Loab recurs in the results — she seems to have much more diffusion persistence than other images.
At least one extensive, detailed explanation has been proposed for Loab, suggesting that this is because “she” was created by asking for opposites. If you imagine the possibility space as an actual 3-D space, opposites — the points furthest away from any given possibility — are more likely to be on the very edges of that space, and so there are relatively few of them. Think of a huge square room; the farthest point from anywhere is always going to be one of the four corners. This is a very interesting idea … but almost certainly wrong, as I’ll explain below.
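For what it's worth, the geometric half of that claim is easy to check. Here's a minimal sketch in Python (my illustration, not part of the original theory) showing that, from any point inside a cube, the farthest point in the cube is always one of its corners:

```python
import numpy as np

rng = np.random.default_rng(0)
dims = 3  # the "square room" analogy; the same holds in any number of dimensions
# All 2**dims corners of the unit cube.
corners = np.array(np.meshgrid(*[[0.0, 1.0]] * dims)).T.reshape(-1, dims)

for _ in range(5):
    p = rng.random(dims)  # a random point inside the unit cube
    # The farthest point from p: push every coordinate to the opposite wall.
    farthest = np.where(p < 0.5, 1.0, 0.0)
    # Sanity check: no corner is farther away than that one.
    distances = np.linalg.norm(corners - p, axis=1)
    assert np.isclose(distances.max(), np.linalg.norm(farthest - p))
    print(p.round(2), "-> farthest point is the corner", farthest)
```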
A Secret Language
Not technically an AI cryptid, as it’s not a creature … but perhaps strangest of all … are the reports that DALL-E has a secret internal language, in which, for instance, “apoploe vesrreaitais” means birds, and “contarra ccetnxniams luryca tanniounons” means bugs — and, furthermore, that DALL-E both generates this language in outputs and responds to it in prompts.
There are plenty of naysayers who debunk this claim … mostly. But even the naysayers tend to grudgingly agree that while these aren't persistent entities per se, there does seem to be something weird happening with them. Just not ... that weird.
What is going on?
So is any of this AI creepypasta actually true? Again — sorry — yes and no.
This is all an excellent example of why I think knowing how diffusion models work is actually important. I wrote about that at length two weeks ago, but let's review it lightning-quick: both text and image prompts are converted into embeddings, coordinates in an AI's “latent space,” which is its view of the common features and aspects inherent in the data on which it is trained.
Imagine trying to describe every painting in a large museum in words. You'd find yourself using a lot of recurring phrases: “thick brushstrokes,” “still life,” “luminous sky,” “woman’s face.” Now imagine being a monk whose life's work is to encode all of this museum's images in words so specific and voluminous — and yet efficient — that the paintings could almost be recreated from those descriptions. You'd probably wind up inventing some entirely new vocabulary words to describe common features, right? Well, that monk's life's work is, to oversimplify, what an AI is trained to do, except it uses an entirely made-up vocabulary. That vocabulary is its latent space, and the description-in-invented-words of any given painting is its embedding.
It's important to note that here the language is a metaphor; the AI is not recording actual words. Instead this knowledge is ultimately encoded as numeric "weights" as my previous posts described. Prompts are transformed into embeddings, which in turn become the metaphorical magnets guiding the “drift” of the denoising process that diffusion models use to generate images out of nothing.
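To make that concrete, here's a minimal sketch of the text-to-embedding step, using the openly released CLIP model via Hugging Face's transformers library. (DALL-E's own pipeline isn't public; CLIP is just a stand-in that plays a similar text-encoder role in many diffusion systems.)

```python
# pip install transformers torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A text prompt goes in...
inputs = processor(text=["a luminous sky over a still life"],
                   return_tensors="pt", padding=True)

# ...and a point in latent space comes out: just a vector of numbers.
embedding = model.get_text_features(**inputs)
print(embedding.shape)  # torch.Size([1, 512]) for this particular checkpoint
```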
Some tentative explanations
So: where in this process are the weirdnesses above happening? Well, we don’t know. A fun thing about modern AI: very few explanations of its behavior are certain! Recall that, for instance, even DALL-E's creators don't really have any idea why it's so bad at spelling. We can, however, make some reasonably educated guesses. Let’s do so for DALL-E’s secret language, Loab, and the Crungus.
The secret language explained
The “secret language” has the simplest probable explanation: it’s most likely due to the text tokenization phase of DALL-E’s pipeline, a part of the process I haven't really discussed yet. Let’s address that now.
We humans divide written language into words, and words into letters. But it turns out the most effective way to partition text prompts for AI is as neither words nor letters, but something in between: tokens. (You can see how GPT-3 tokenizes text here.) DALL-E's "secret language" isn’t secret per se; it’s encrypted. A seemingly nonsense word like “Apoploe,” when tokenized, closely resembles meaningful scientific names for birds. It’s very vaguely sort of like rhyming.
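DALL-E's own BPE vocabulary isn't something you can poke at directly, but a quick sketch with GPT-2's tokenizer (via the tiktoken library) shows the general idea: an unfamiliar "word" gets chopped into sub-word pieces that also show up inside real words.

```python
# pip install tiktoken
import tiktoken

# GPT-2's byte pair encoding, standing in here for DALL-E's own vocabulary.
enc = tiktoken.get_encoding("gpt2")

for text in ["Apoploe vesrreaitais", "birds"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {pieces}")

# The nonsense phrase splits into several sub-word fragments; the claim is
# only that those fragments overlap with pieces of real (e.g. Latin
# scientific) bird names, not that the phrase itself means anything.
```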
Note that the “byte pair encoding” used to tokenize text happens before the inputs hit any neural network at all; it’s a classical software algorithm. As such, the “secret language” is more likely pure chance, something bound to happen from time to time, than a real discovery. DALL-E randomly created an image of birds which included nonsense words that, when tokenized, "rhyme" with scientific bird words — that’s all. As more and more people experiment with DALL-E, we'll come across similar situations more often, but that doesn't mean that a secret language has emerged.
Loab semi-explained
The most interesting thing about “Loab” is her prompt persistence: when she is combined with other images as a prompt, she tends to recur more often than other recognizable entities. Remember that any prompt becomes, essentially, magnets scattered over a diffusion model's vast probability space, guiding its drift as it slowly denoises a new image into existence. Loab's embeddings seem to engender more powerful "magnetic drift" than most.
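The Loab experiments were done with an unnamed commercial tool, but the image-as-prompt mechanism itself is easy to see in open-source form. Here's a hedged sketch using Stable Diffusion's img2img pipeline from the diffusers library; the prompt text and input filename are placeholders, and this illustrates the mechanism rather than reproducing Loab.

```python
# pip install diffusers transformers torch pillow
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A previously generated image, used here as part of the prompt (placeholder path).
init_image = Image.open("previous_output.png").convert("RGB").resize((512, 512))

# Both the text and the image steer the denoising "drift";
# `strength` controls how far the result may wander from the input image.
result = pipe(
    prompt="a portrait in a dark hallway",
    image=init_image,
    strength=0.6,
).images[0]
result.save("recurring_output.png")
```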
OK, you may say, but why? The theory mentioned above that opposites cluster into only a few possibilities is interesting. But Occam’s Razor suggests that this is, again, essentially a random artifact of training: if you trained a neural network a hundred times, the "magnetic power" of various images would come out slightly different each time.
The Crungus … not explained
Finally, the Crungus. This is the most striking of all the AI cryptids, because it's the most recurrent, meaning it exerts the most powerful magnetic influence ... but it's hard to know why. We do know that some embeddings are inherently more powerful, e.g. “in the style of” an artist. (Likely because authorship is marked as important, both implicitly and explicitly, in the training data.) Craiyon seems to have been trained to recognize “crungus” as a marker as significant and dispositive as an artist's name. But, again, why?
The Lovecraftian horror-novel explanation is that the AI has discovered a real pattern, a named yet previously invisible monster that has long been lurking and latent across a broad swathe of human art, even though no individual human mind had ever recognized or known of or consciously crafted it.
But this seems ... unlikely. The prospect of AIs detecting very real connections and patterns, previously invisible to humans, is both fascinating and chilling; but the more prosaic truth is that just because a pattern has been detected doesn't mean it's real. When you connect everything to everything, from time to time you will find strange patterns which are entirely coincidental. Perhaps the most interesting thing we're learning from these cryptids is that, just as we humans are often over-eager to see patterns where none exist, AI models can experience apophenia too.