Let’s look back, a little less breathlessly, at the crazy period of six months ago, when new and epochal AI research/models/applications seemed to drop every week. The pace has since slowed. We now find ourselves with a whole new set of computing tools at our disposal (scratch that: an entirely new kind of computing tool) and it is our job, as emissaries of ‘progress’ or ‘productivity’ or ‘capitalism’ or whatever label you put on Building New/Better Things, to use them to Build New/Better Things.
It turns out this is hard! Well: sort of. New Things are actually easy, as is showing them off. As a friend recently wrote re an LLM app, “it’s pretty wild how good it is out of the box!” But a New Thing that uses LLMs in predictable, repeatable, and legible ways, one that is liberal in what it accepts and conservative in what it generates, as Jon Postel might have put it, is really quite hard to build.
Our new kind of tool frequently twists and shapeshifts in our metaphorical hands, even as we use it. LLMs are very prompt-sensitive, and it’s very hard to quantitatively measure the output quality of a given prompt. Furthermore, prompts are very LLM-sensitive, so if you want to use a different model (because you’ve fine-tuned, or you worry about OpenAI dependence, or OpenAI just deprecated your foundation model) all your prompts may have to change.
Meanwhile, we still have no real idea what exactly is going on inside those models … but we do know GPT-4 is non-deterministic even at temperature 0, the setting that is supposed to make it deterministic. Stepping up an abstraction level, it’s hard not to conclude that LLMs are, currently, fundamentally agents of chaos, not order … and if one were to draw up a list of social cohorts for whom ‘order’ seems pretty essential, “businesspeople” and “engineers” would be near the top.
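You can see that chaos for yourself. Here’s a minimal sketch (assuming the openai Python SDK, v1 style, with an OPENAI_API_KEY in the environment; the prompt is an arbitrary placeholder) that calls GPT-4 repeatedly at temperature 0 and counts how many distinct answers come back:

```python
# Minimal sketch: hammer GPT-4 with the same prompt at temperature=0
# and count the distinct completions. Assumes the openai Python SDK
# (v1 style) and OPENAI_API_KEY set; the prompt is a placeholder.
from collections import Counter

from openai import OpenAI

client = OpenAI()
PROMPT = "Name three uses for a brick, one per line."

completions = Counter()
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # nominally deterministic
    )
    completions[resp.choices[0].message.content] += 1

# If temperature=0 were truly deterministic, this would print one line.
for text, count in completions.most_common():
    print(f"{count}x: {text[:60]!r}")
```

In principle that counter should hold exactly one entry; in practice, with GPT-4, it often doesn’t.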
Nonetheless we engineers have found ways to impose order, to extremely useful effect. Most of the non-technical public still thinks of LLMs as one and the same as ChatGPT: a service you can ask questions of, from which you receive fluently written (but often factually incorrect) replies. Useful for writing essays, suggesting alternatives, sparking ideas, but still nothing more than a smart chatbot that often gets things wrong.
I don’t think the public appreciates that to modern AI engineers, the flawed knowledge base built into LLMs, what the great but in this case surprisingly blinkered Ted Chiang calls a “blurry JPEG,” is an interesting side effect of their training, not something particularly important or relevant. What LLMs are mostly actually used for is transformations. They are “anything from anything machines”: converting or collating one format into another, or one language into another; RAG search, not chat search.
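“RAG search” deserves a concrete illustration. Below is a bare-bones sketch of retrieval-augmented generation, assuming the openai Python SDK (v1 style) and OpenAI’s text-embedding-ada-002 embedding model; the documents and the question are placeholder stand-ins for a real corpus:

```python
# Bare-bones RAG sketch: embed the docs, retrieve the most relevant one,
# then ask the model to answer *from that snippet* -- a transformation,
# not open-ended chat. Assumes the openai Python SDK (v1 style);
# DOCS and the question are illustrative placeholders.
import math

from openai import OpenAI

client = OpenAI()

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email, weekdays 9am-5pm Pacific.",
    "Enterprise plans include SSO and a dedicated account manager.",
]

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-ada-002", input=[text]
    ).data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

question = "How long do I have to return something?"
q_vec = embed(question)
best = max(DOCS, key=lambda d: cosine(q_vec, embed(d)))

answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Answer using ONLY this context:\n{best}\n\nQuestion: {question}",
    }],
    temperature=0,
).choices[0].message.content
print(answer)
```

The model isn’t being asked what it knows; it’s being asked to transform retrieved context into an answer, which is exactly the “anything from anything” framing above.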
Even for coding (where, incidentally, fine-tuned open-source LLMs that run locally are at least trending towards being genuinely competitive with GPT-4) I find myself using GitHub Copilot more than ChatGPT, partly because the latter’s understanding of software libraries ends as of September 2021, but mostly because the former automagically takes the context I’m working in and transforms it into suggestions. (And I can confirm that it feels like it has been getting steadily better … mostly, as it turns out, through prompt engineering.)
All of which is fine: LLMs being used by businesses and engineers as “gray boxes,” mining information and transforming it into more useful forms, building whole superstructures of carefully structured (and brittle) prompts to channel these agents of chaos into consistent and legible outputs. A bit boring, though, and more than a bit Rube Goldberg, which is why going from “killer demo” to “killer product” is such a long and difficult process.
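What does such a superstructure look like at its smallest? Something like the sketch below: pin the model to a rigid output schema, validate, and retry. (Again assuming the openai Python SDK, v1 style; the extraction task, schema, and review text are hypothetical placeholders.)

```python
# Sketch of the "superstructure" pattern: demand a rigid output schema,
# validate the response, retry on failure. Assumes the openai Python SDK
# (v1 style); the task, schema, and review are hypothetical placeholders.
import json

from openai import OpenAI

client = OpenAI()

PROMPT = """Extract the product and sentiment from this review.
Reply with ONLY a JSON object: {"product": str, "sentiment": "pos"|"neg"}.

Review: "The Foobar 3000 kettle died after two days. Avoid."
"""

def extract(max_retries: int = 3) -> dict:
    for _ in range(max_retries):
        raw = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": PROMPT}],
            temperature=0,
        ).choices[0].message.content
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict) and {"product", "sentiment"} <= parsed.keys():
                return parsed  # legible, machine-consumable output
        except json.JSONDecodeError:
            pass  # the model went off-script; try again
    raise ValueError(f"no valid JSON after {max_retries} attempts")

print(extract())
```

The retry loop is the tell: the prompt alone can’t guarantee well-formed output, so the scaffolding around it has to.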
But if one were to draw up a list of social cohorts for whom ‘order’ is not essential, shouldn’t the groups at the top of that list be making much better, and more natural, use of these agents of chaos? Shouldn’t artists be romping through this wild new weird world of LLMs and diffusion models, wreaking glorious havoc?
On paper, maybe. In practice, no, not least because those cohorts tend to be incredibly angry at this new family of tools, viewing their mere existence as proof of unforgivable and illegal theft. I don’t think most engineers recognize the depth and intensity of this fury.
This is of course largely because they fear their work and income will be superseded. I suspect courts will find that scraping training data is not illegal (in nations with fair-use exceptions to copyright). But I also suspect it will take years, maybe even a whole new generation, plus new business model(s) for artists, before those cohorts countenance the use of such agents of chaos. (Modulo some already extant interesting exceptions, such as improvised video-game dialogue.) Which is a real shame, because I also suspect that when the art world finally does turn to them (and it’s when, not if) we’ll get some really interesting art.
That is, once we get some different models. There’s a fascinating TIME article by a comedian with a buddy at OpenAI who shows him how LLMs can write killer comedy, though not via any LLM you can access. Rather, via one called “base4,” presumably the base model of GPT-4 before it was RLHFed to ensure it never says anything remotely edgy, much less offensive. (I’ve seen Llama-2 be very reluctant to so much as discuss Romeo and Juliet.) If we want artists to use AI, we’ll have to make much edgier AI available. I for one look forward to them being angry at OpenAI for excessive RLHFing rather than for scraping training data, but I suspect that won’t happen anytime soon.