I quit my job last week. Not that I didn’t like Metaculus; I’m still no forecasting true believer, despite my year there, but it’s a fascinating field; it was great to write code daily for the first time in years; my co-workers were excellent; and we even shipped some LLM-powered features. In fact, Metaculus’s former CTO (about whom I can’t say enough good things) left last year to build a startup focused on forecasting by LLMs. How do they do? Spectacularly well, actually, per Astral Codex Ten yesterday.
So why quit? Well — with apologies to my deep-tech / climate-tech / author / journalist / artist / international affairs friends — in the words of Sarah Guo:
So a friend / former colleague and I are working on a thing. A very business-y (and, I think, lucrative) thing, but with quixotic starry-eyed sociopolitical aspirations too, way down the road. Details to come. The more we work on it, the more obvious it seems that, despite modern AI’s many, many frustrations and infelicities, it will be the world’s next great transformative force.1 So I’m jumping in with both feet.
Zeitgeist-wise, the timing seems eerily good; literally the same day, the two biggest AI announcements in eleven months hit —
As Rohit Krishnan put it:
He was talking about Gemini, Google’s mindblowing new model, and he was right.
That day’s other big announcement was, of course, OpenAI’s Sora text-to-video breakthrough. I don’t want to underplay that. With one stroke they lapped entire companies such as RunwayML. We’re now only a few steps — a few years? — from generative AI literally turning scripts into compelling scenes, no intermediaries required. This is astoundingly cool, and I personally have a whole trunk of screenplays I’ve written that I cannot wait to turn into feature-length AI-generated “model movies,” or “talking storyboards,” or whatever we wind up calling that new form.
But, probably because I’ve been beating my head against the limitations of today’s models, it was Google’s Gemini 1.5 that really widened my eyes. I’m on record as underwhelmed by December’s Gemini 1.0/Ultra, but this … is something else. (It’s clearly “2.0,” but it came too soon for them to brand it that. I suspect it’s also from a wholly separate team: that December’s model was Google Brain’s, and this one is DeepMind’s.)
Anyway. Regardless of the inside baseball of its genesis/launch, I mean just look at it. Less than a year ago GPT-4 launched with a maximum input size of 32K tokens / ~24K words. That seemed like a lot! …Today Gemini 1.5 can handle 10 million tokens.2 I’ve published nine books and half a million words of journalism over my life; you could stuff my entire published oeuvre into a single input four times over.
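The arithmetic checks out, as a quick back-of-envelope sketch shows. (The tokens-per-word ratio comes from the 32K-tokens-to-~24K-words figure above; the ~150K-words-per-book estimate is my assumption, not the author’s.)

```python
# Back-of-envelope check: does the published oeuvre fit ~4x in 10M tokens?
# Assumption (mine): an average book runs ~150K words.

TOKENS_PER_WORD = 32_000 / 24_000  # from "32K tokens / ~24K words", ~1.33

oeuvre_words = 9 * 150_000 + 500_000          # nine books + journalism
oeuvre_tokens = oeuvre_words * TOKENS_PER_WORD  # ~2.5M tokens

window = 10_000_000  # Gemini 1.5's claimed maximum context
copies = window / oeuvre_tokens
print(round(copies, 1))  # → 4.1
```

Roughly four copies of the whole body of work fit in a single prompt, matching the claim.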
And when Google calls it multimodal, it means multimodal.
Yes, it’s still imperfect and flawed and will have blind spots and frustrations. No, it won’t write all your code for you, or run your business for you. But the single largest limitation of LLMs — which, I remind you, were already a mindblowing and transformative technology! — was the context window: the hard limit on how much input they would accept at all, and the frustrating soft limit of how their outputs subtly got worse as you stuffed more into that window.
And last week we saw that limitation … not vanish, exactly, 10M tokens is still a very finite number, and I’m sure there are still quantity/quality tradeoffs, but … diminish by two orders of magnitude overnight. And of course Google/DeepMind is only one of several frontier labs. We can assume OpenAI, Anthropic (remember them?), and Meta will add to our arsenal of futuristic epistemological tools very soon, too.
Seriously, how can you work on (almost) anything else right now?
Contemporaneous with another, mind you, that being solar power.
Albeit only 1M in production right now. But still!