Extropia's Children, Chapter 5
(Each chapter of Extropia’s Children can stand alone given cursory familiarity with the subject matter, but see also Chapter 1, “The Wunderkind”; Chapter 2, “This Demon-Haunted World”; Chapter 3, “Extropicoin Extrapolated”; and Chapter 4, “What You Owe The Future.”)
The golden age of LessWrong, then the hub of rationalist discussion, was 2011 to early 2015. This is confirmed by data as well as community history. According to the Timelines Wiki, in 2013, the Machine Intelligence Research Institute “change[d] focus to put less effort into public outreach and shift its research to Friendly AI math research.” 2013 is also when the authoritarian monarchist neoreactionary movement on LessWrong — a small, but tolerated, minority — rose to public prominence. In early 2015, Eliezer Yudkowsky finally finished his rationalist Harry Potter fanfic. By late 2015, it was generally accepted that LessWrong “seems dying” and “is far less active than it once was.”
You'll note, however, that there has been a great deal of conversation on LessWrong over the last year or two, including e.g. discussion of the Zoe Curzi and Jessica Taylor posts describing cultlike behavior in the rationalist community. (Which seems to have led to some retrenchment at both CFAR and MIRI.) The data confirms that a trough in 2015-17 was followed by a resurgence, albeit not to the previous frenzied levels of activity. It seems that, basically, rationalism got big enough to splinter, and did so, in a ‘Rationalist Diaspora’ described in detail by Scott Alexander. In particular, much of the community migrated to Alexander's own Slate Star Codex, of which more anon.
What also seems to have happened in this era is that Yudkowsky and MIRI, flush with newly donated millions, decided to try to evolve from theory and community-building to actual practical AI research. It did not go well.
ii. Decision, theory
In 2012, before the Singularity Institute became MIRI, its executive director Luke Muehlhauser wrote a post entitled “AI Risk and Opportunity: Humanity's Efforts So Far.” It included the provocative claim:
From 2005-2007, Yudkowsky worked at various times ... on the technical problems of AGI necessary for technical FAI ['Friendly AI,' aka AI that won't exterminate us] [...] Almost none of this research has been published, in part because of the desire not to accelerate AGI research without having made corresponding safety progress.
This was quite a claim: essentially, that this research would, if published, meaningfully accelerate perhaps the most extraordinarily difficult challenge in the history of human innovation. It is a claim which remains 100% unsubstantiated.
The one bit of research which did emerge from the Singularity Institute days was TDT, or Timeless Decision Theory. “Decision theory” is a branch of either philosophy or probability theory, depending on one’s point of view, which occupied much of MIRI's intellectual horsepower for years. MIRI tends to portray it as technical, practical, immediately applicable work. Others, even EA/rationalists, describe it as “philosophical decision theory” which “has gained significantly fewer advocates among professional philosophers than I’d expect it to if it were very promising.”
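To give a concrete sense of what this work involves: the canonical puzzle animating TDT and its successors is Newcomb's problem, in which a highly accurate predictor fills an opaque box with $1,000,000 only if it predicts you will refuse the accompanying transparent $1,000 box. Here is a minimal sketch of the expected-value arithmetic (illustrative only; the payoffs are the standard ones from the literature, and the 99% accuracy figure is an arbitrary assumption of mine):

```python
# Newcomb's problem: a predictor fills an opaque box with $1,000,000
# only if it predicts you will take just that box; a transparent box
# always holds $1,000. Causal decision theory says "take both" (the
# boxes are already filled); TDT/UDT-style reasoning says "take one,"
# because agents whose decision procedure one-boxes end up richer.

def expected_value(one_box: bool, predictor_accuracy: float = 0.99) -> float:
    """Expected payoff, given the predictor guesses your choice
    correctly with probability `predictor_accuracy`."""
    if one_box:
        # The opaque box is full iff the predictor foresaw one-boxing.
        return predictor_accuracy * 1_000_000
    else:
        # You always get the transparent $1,000; the opaque box is
        # full only if the predictor wrongly expected you to one-box.
        return 1_000 + (1 - predictor_accuracy) * 1_000_000

print(expected_value(one_box=True))   # roughly $990,000
print(expected_value(one_box=False))  # roughly $11,000
```

Under an accurate predictor, the agent whose decision procedure one-boxes (in effect, precommits) comes out far ahead; formalizing reasoning that reliably reaches this answer is much of what TDT, UDT, and FDT are about.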
Yudkowsky wrote of TDT in 2009:
people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No", but this is actually the one exception that I can think of.
Shortly afterwards, Wei Dai — him again! — posted (with the snarky preamble “Since there seems to be little hope that Eliezer will publish his Timeless Decision Theory any time soon”) his own variant / iteration, Updateless Decision Theory. In 2012, responding to Holden Karnofsky's sharp criticism, Muehlhauser conceded:
[We have] no write-up of our major public technical breakthrough (TDT) using the mainstream format and vocabulary comprehensible to most researchers in the field.
Years later, Rob Bensinger of MIRI casually mentioned:
UDT is updateless decision theory, a theory proposed by Wei Dai in 2009 ... TDT is timeless decision theory, a theory proposed by Eliezer Yudkowsky in 2010 ... FDT is functional decision theory, an umbrella term introduced by Yudkowsky and Nate Soares in 2017 to refer to UDT-ish approaches ... TDT was superseded by FDT/UDT.
The claim that Yudkowsky’s theory was ‘superseded’ by Dai’s, which preceded it by a year, is ... remarkable. What this timeline actually shows is that, by MIRI's own estimation, their “major public technical breakthrough” as of 2012 was in fact dead on arrival, the exact opposite of a breakthrough, thanks to Dai’s earlier, superior theory. The mental gymnastics performed to preserve Yudkowsky’s primacy are spectacular.
In 2013, Yudkowsky wrote:
nobody appears to be doing similar work or care sufficiently to do so. In the world taken at face value, MIRI is the only organization running MIRI's workshops and trying to figure out things like tiling self-modifying agents ... it would be very dangerous to have any organizational model in which we were not trying to [construct a Friendly AI].
The 2014 version of their research guide prominently mentions tiling agents ... but the ‘canonical’ version, which “has only been lightly updated since 2015,” does not. It’s unclear whether this was because the research was so fruitful that the concept became too dangerous to the world to even discuss … or because it hit a dead end … but Occam’s Razor is pretty unambiguous about which way to bet. Of course, giving up on avenues which turn out to be fruitless is a natural outcome of much research! However, it's an awkward fit with the simultaneous claim that your research would endanger the entire world if published.
iii. What did MIRI actually do?
From the late 80s to the 2000s, Yudkowsky’s formative period, AI research was largely driven by so-called “Symbolic AI,” based on formal reasoning, comprehensible algorithms, theoretical logic, and structured ontologies. It didn't really go anywhere — this period is now called an “AI Winter” — but it was thought to be the future, while neural networks were interesting curiosities. Then, to oversimplify, in 2012, three University of Toronto researchers published the landmark paper “ImageNet Classification with Deep Convolutional Neural Networks” and birthed the era of modern machine learning. (That paper has been cited 117,148 times as of this writing.) The leading edge of AI research moved — seemingly for good — from “fully specified formal reasoning” to “training black-box neural networks without necessarily understanding how exactly they arrive at their outputs.”
MIRI did not join this new world. To this day, MIRI’s Research page leads with
“MIRI focuses on AI approaches that can be made transparent (e.g., precisely specified decision algorithms, not genetic algorithms.)”
In 2014, they published their initial research agenda, focusing on “Agent Foundations” and in particular a subtopic called “Highly Reliable Agent Design.” It is fair to say HRAD never took off. In 2017 it was subjected to scathing criticism from Daniel Dewey of Open Philanthropy. Describing HRAD as “work that aims to describe basic aspects of reasoning and decision-making in a complete, principled, and theoretically satisfying way,” he argues
HRAD won't be useful as a description of how [early advanced AI] systems should reason and make decisions [...] HRAD has gained fewer strong advocates among AI researchers than I'd expect it to if it were very promising [...] very few researchers think this approach is promising relative to other kinds of theory work.
Three months later, Open Philanthropy awarded MIRI $3.75 million, based, according to their own explanation, almost entirely on a single “very positive review of MIRI's work on ‘logical induction’.” (Reviews of other MIRI papers had been at best lukewarm.) In fairness, “Logical Induction” does seem an interesting paper, albeit one with only 36 citations, remarkably few for a paper which singlehandedly unlocked millions of dollars.
Jessica Taylor, who co-wrote that paper, had previously observed
MIRI has a strong intuition that [the future will need HRAD], and personally I'm somewhat confused about the details [...] it isn't currently cleanly argued that the right way to research good consequentialist reasoning is to study the particular MIRI research topics such as decision theory [...] I think the focus on problems like decision theory is mostly based on intuitions that are (currently) hard to explicitly argue for.
It's worth noting the repetition of intuition regarding the strategic technical direction of a rationalist institute.
A 2015 post by Nate Soares, then executive director, claims “We specialize almost entirely in technical research. We select our researchers for their proficiency in mathematics and computer science.” But essentially all of MIRI's published papers seem extremely theoretical rather than even remotely practical, and devoid of any accompanying source code or proof-of-concept implementations. It’s hard not to be reminded of Yudkowsky’s juvenilia which labeled highly abstract theory as practical guides. The most recently updated public software repository at MIRI's GitHub which isn't forked from elsewhere, a plain web site, or a discussion forum, is five years stale. They did hire a prominent software engineer in 2018 ... but said “we don't consider [his work] part of our core research focus,” and that engineer has since moved on.
In 2016, apparently in response to the rocket-ship hockey-stick rise of neural networks, MIRI announced a new research agenda focusing on the question “What if techniques similar in character to present-day work in ML succeed in creating AGI?” But then, two years later, they updated their agenda again, to roll back that new approach, stressing “[Agent Foundations] research problems continue to be a major focus at MIRI.” (To this day, a sizable majority of their publications focus on HRAD.) Their new agenda re-abandoned neural networks in favor of:
Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations. Note that this does not entail trying to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. However, it does mean developing new foundations.
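For contrast, the “gradient-descent-style machine learning foundations” being rejected there can be caricatured in a few lines. This is a toy sketch of my own, not anyone's actual system: fitting a single parameter by repeatedly nudging it against the gradient of the error.

```python
import random

# Toy gradient descent: recover the slope of y = 3x from noisy samples.
# Modern deep learning is essentially this loop, scaled up to billions
# of parameters, which is precisely what makes the resulting models
# opaque "black boxes" rather than precisely specified algorithms.

random.seed(0)
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in range(-5, 6)]

w = 0.0    # the single trainable parameter
lr = 0.01  # learning rate

for _ in range(200):
    # d/dw of the squared error (w*x - y)^2 is 2*(w*x - y)*x
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 2))  # converges close to the true slope, 3.0
```

Nothing in this loop “reasons”; it just descends an error surface, and it works anyway. MIRI's stated bet was that safe foundations would have to be designed for transparency from the start, rather than reverse-engineered out of loops like this one.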
They also said that henceforth their research would be “nondisclosed-by-default.” Jessica Taylor confirmed: “MIRI became very secretive about research.” Others agree “MIRI is famously secretive.” As a result, MIRI's subsequent research diverged sharply from that of the entire rest of the world … even from other researchers also focusing on AI safety. People explicitly stated:
General AI safety programs and support ... might not even have the capability to vet MIRI-style research. If you [are a MIRI-style researcher and] want to feel part of the AI safety community and join in the conversations people are having, you will have to spend time learning about ML-style research
There are signs MIRI eventually, grudgingly, accepted modern machine learning might be important. An external researcher writes “[MIRI have been] pretty public that they've made a shift towards transformer alignment as a result of OpenAI's work.” (Transformers are an extraordinarily effective form of deep learning model.) In 2019 they hired Evan Hubinger from OpenAI to continue his work on aligning modern AI/ML, and agreed to make it public. But he's open about being an exception, saying “my view is pretty distinct from the view of a lot of other people at MIRI” and “This is in contrast with the way I think a lot of other people at MIRI view this.”
I suppose it is theoretically possible that MIRI has a secret vault full of fantastically dangerous groundbreaking AI research. But the available evidence suggests something more like “MIRI continued work on a long-obsolete branch of abstract theory, while a few crumbs went towards failing to keep up with the vastly more advanced state of the art. Their ‘undisclosed’ work is yet more abstract theory, with no real path to any practical use in any foreseeable future (much less any significant running software) kept hidden behind a pointless veil of secrecy.” This seems especially likely given that, in 2021, in response to the suggestion “it was reasonable [for MIRI] to develop precursors to AGI in-house to compete with organizations such as DeepMind in terms of developing AGI first,” Yudkowsky commented
I haven't been shy over the course of my entire career about saying that I'd do this [develop precursors to AGI in-house at MIRI] if I could; it's looking less hopeful in 2020 than in 2010 due to the trajectory of machine learning and timelines.
Given all the above, it's hard to interpret this as anything but an admission that MIRI has not succeeded at any meaningful practical AI research at all.
In 2017, Jessica Taylor referred to MIRI as focusing on ‘principled’ AI and modern machine learning as ‘messy.’ This is a reasonably compelling taxonomy, but ‘messy’ AI is now a gargantuan and extraordinary field in which major breakthroughs seem to occur every month, whereas ‘principled’ AI, on which MIRI has focused almost all of its efforts, is … not.
(That said, Eliezer Yudkowsky has had very significant, albeit indirect, influence on modern ‘messy’ AI as well. Arguably the two most significant AI organizations in the world today are DeepMind, which recently mostly-solved biology's longstanding protein-folding problem, and OpenAI, best known for their breakthrough GPT-3 language model and DALL-E image generator, but home to much other pioneering work as well. Yudkowsky introduced DeepMind’s co-founders to their lead investor Peter Thiel, and OpenAI was largely populated by effective altruists. As such one can make a (tenuous) case that both also trace their origins back to the 90s extropians.)
iv. Here there be demons
With some reluctance — mostly because I can't write this much about rationalism without at least mentioning Roko’s Basilisk — let us briefly pause to discuss the most infamous concept in the history of rationalism, one which dropped like a decision-theory neutron bomb during those Singularity Institute years. Then, more interestingly, let's talk about infohazards in general.
The Roko’s Basilisk post on LessWrong, published in July 2010, caused so much consternation that it was subsequently deleted and discussion of it banned by Yudkowsky; but copies exist. Like most LessWrong posts, it is painfully long, but the key line is:
there is the ominous possibility that if a positive singularity does occur, the resultant singleton may have precommitted to punish all potential donors who knew about existential risks but who didn't give 100% of their disposable incomes to x-risk motivation.
This probably sounds pretty ludicrous … if you haven't spent several years talking about ‘precommitments’ as an aspect of decision theory, the philosophy you believe might save the world. The notion that superintelligences with mental capacities vastly beyond ours will make ‘precommitments’ — i.e. prevent themselves from changing their minds regarding a decision — is not one most people take particularly seriously. The “extended Roko’s Basilisk” idea that a godlike superintelligence would spend valuable computing power on simulating copies of zillions of past puny humans, purely to punish those copies for their originals' … well … original sin of not having built the superintelligence faster, and the dictum that people should treat these hypothetical future copies of themselves as if they are in fact the themselves who are reading this right now, are actually comical, an absurd caricature of a technology-enabled religious terror. To most.
But not to some LessWrongers, or to Yudkowsky, who in the comments thundered:
Listen to me very closely, you idiot. YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL [...] Until we have a better worked-out version of TDT and we can prove that formally, it should just be OBVIOUS that you DO NOT THINK ABOUT DISTANT BLACKMAILERS in SUFFICIENT DETAIL that they have a motive to ACTUALLY BLACKMAIL YOU.
He then deleted the thread. (When rationalists complain that outsiders fixate on Roko’s Basilisk, which rationalists themselves no longer care about at all, that’s likely true! ... but Yudkowsky’s eruptive reaction, and the resulting Streisand Effect, is the reason why.) And thus, the concept of the infohazard — notions so dangerous that to think about them, or sometimes even be aware of their existence, is to invite catastrophe — was promulgated.
The concept of “information hazards” was subsequently formalized by Nick Bostrom — him again! — as: “A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.” We can all agree information hazards exist. North Korea acquiring the information required to create invisible backpack nukes, for instance. (Indeed, the very first short story about Internet-enabled AIs, Murray Leinster's insanely prophetic "A Logic Named Joe," published in 1946(!), features information hazards.)
But, descriptively, “infohazard” is used differently — not “dangerous information if used” so much as “something dangerous just to know or think about,” or “information which is hazardous to have in your mind at all.” This of course connects directly to the demons that Leverage, MIRI, and other rationalism-associated people believed they had in their minds, implanted by others, which required ‘debugging’ to exorcise (if exorcism was even possible).
I'm far from the first to point out the religious nature of infohazards or the strong eschatological overtones of, well, everything remotely associated with AI risk. What we talk about when we talk about Roko's Basilisk, or similar infohazards, is hell. Hell is an extraordinarily effective short-term motivational concept; just ask any fire-and-brimstone preacher, or read the sermon in Joyce's A Portrait of the Artist as a Young Man. It is also one notorious for breaking vulnerable minds.
I'll pause here to wave my hands wildly and speculate that religion itself is a kind of infohazard, or at least a minefield full of infohazards (such as hell), and as more and more people grow up weakly religious or areligious, they fail to develop defense mechanisms against infohazard fixation. Maybe there's even a kind of “hygiene hypothesis,” such that if your first serious encounter with the concept of hell is the discovery of Roko’s Basilisk, your “infoimmune system” is at greater risk of triggering terror, nightmares, or even psychosis. However much rationalists deny it, there are strong religious overtones to their focus on AI risk. Again, this is hardly an original observation; the Singularity has been called “the Rapture of the Nerds” for many years.
The paradox of rationalism is that it stresses that evidence should cause one to update one's belief, except for one's belief in rationalism. One can see this in Yudkowsky's uncharacteristically prolix (for a single essay) and handwavey response to the probably correct suggestion that rationalism should just be one of many tools in a ‘toolbox’ of cognitive approaches. (Albeit, in fairness, an unusually useful one.) Similarly, it's notable that a philosophy allegedly all about updating one's beliefs took ten years to grudgingly accept some minor updates to its ur-texts, the Sequences, making them “more optimized for new readers and less focused on extreme fidelity to Eliezer's original blog posts, as this was one of the largest requests we got.”
This paradox is seemingly what makes it relatively easy for rationalists to overcome biases, avoid social programming, and come to different and sometimes original conclusions — which is excellent, and admirable! — but also means they can find it hard to update or iterate from conclusions they believe to be the outcomes of proper rationalist thought, even in the face of mounting evidence that those outcomes are suboptimal. This is perhaps why MIRI seems to have wasted a decade stubbornly maintaining focus on symbolic AI rather than deep learning / modern ML … while continuing to ominously suggest that their own research was so groundbreaking and important it was probably a terribly dangerous information hazard.
v. Ruin and despair
Looking at MIRI's blog today, we find it leads with:
A howl of despair by Yudkowsky entitled “AGI Ruin: A List of Lethalities”
An equally despairing post by Nate Soares.
A five-year-old Yudkowsky essay about "operational adequacy"
Nothing else Elon Musk has done can possibly make up for how hard the "OpenAI" launch trashed humanity's chances of survival.
His April 1 post this year began
tl;dr: It's obvious at this point that humanity isn't going to solve the alignment problem, or even try very hard, or even go out with much of a fight. Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with with [sic] slightly more dignity.
Funny, right? Ish? Darkly? ...Except he then asks and answers:
All of this is just an April Fool's joke, right? — Why, of course! Or rather, it's a preview of what might be needful to say later, if matters really do get that desperate. You don't want to drop that on people suddenly and with no warning. Only you can decide whether to live in one mental world or the other.
His subsequent “AGI Ruin” essay makes it pretty clear which world he really means:
in practice, using the techniques we actually have, "please don't disassemble literally everyone with probability roughly 1" [i.e. a roughly 100% chance] is an overly large ask that we are not on course to get [...] The big ask from AGI alignment, the basic challenge I am saying is too difficult, is to obtain by any strategy whatsoever a significant chance of there being any survivors.
This would be quite worrisome if you had great faith in Eliezer Yudkowsky’s predictive power. However, it seems reasonable to factor into that faith things like the history of Yudkowsky’s beliefs, the success level of previous predictions, and the efficacy of his research institute. Given those, while AI risk absolutely is an issue worth taking quite seriously ... I don't find myself overly concerned.
I've been pretty hard on Yudkowsky and MIRI in this chapter, largely because I think it's important to explain why I don't think we should take his despair seriously. He seems to most fear an “intelligence explosion,” an AGI smart enough to make itself smarter, through pure reason. That made more sense in the (thus far) failed paradigm of Symbolic AI, with an AGI rewriting/evolving its own code on the fly. In fact, “reasoning oneself into godhood” sounds a lot like the notion of rationalism itself — teaching humans how to teach themselves to reason better. (Of course, we quickly hit the limits of our physical substrate; neurons, synapses, etc. It's unclear why a self-evolving AI would not suffer from similar limitations en route to superintelligence.) The basic concept of limitless betterment through pure reason seems the central thesis underpinning both rationalism and Symbolic AI research ... but both seem to have hit their limits.
Even if MIRI's symbolic approach was a dead end, that doesn't mean they were wrong to be wrong. Science is all about being wrong! But practical research is also all about quickly conceding when you are wrong, changing direction accordingly, and working on useful research which leads to effective engineering, rather than abstract theory with no apparent path to implementation. It's hard to ignore that while near-obsessive focus, and an extraordinary gift for theoretical exploration, have been Yudkowsky’s great strengths, their converse — refusal to let go quickly, and an inability to focus on or respond to practical outcomes — seem to remain fatal flaws.
Are they flaws in the greater movement that he created, as well? We'll examine that in the next chapter, “Slate Star Cortex and the Geeks for Monarchy.”