(Each chapter of Extropia’s Children can stand alone, but see also Chapter 1, “The Wunderkind”; Chapter 2, “This Demon-Haunted World”; Chapter 3, “Extropicoin Extrapolated”; Chapter 4, “What You Owe The Future”; Chapter 5, “Irrationalism”; and Chapter 6, “Slate Star Cortex and the Geeks for Monarchy.”)
i. Dante's AGI
I ended the last chapter scolding rationalists for being insufficiently consequentialist. This may seem odd given that Eliezer Yudkowsky's life work, his raison d'être, is the Mother Of All Consequences, the Prime Result, the existential risk that a human-created superintelligence might exterminate humanity. But the thing is — fear of this x-risk is, inherently, a belief in the Singularity. A dread and terrible Singularity, obviously, but a Singularity nonetheless. And as many have noted over the years, belief in the Singularity, aka “The Rapture of the Nerds,” has far more in common with an eschatological religious belief than an evidence-based scientific conclusion. AI risk is more “The Inferno of the Nerds” ... but the same argument holds.
I was once invited to Hungary to debate the Singularity with the Transhumanist Party's candidate for the US presidency. The debate was cancelled, but we wound up hanging out for an evening. I expected a crank zealot, but in fact he was a thoughtful man simply using his candidacy to draw attention to issues he believed people should think about more. To the extent that that's what Nick Bostrom is doing, I approve. AI risk is an issue we should think about ... some. But Yudkowsky's belief that AI x-risk (i.e. the Infernal Singularity) is an imminent certainty — and the only problem in the world which really matters — seems, to understate, awfully dubious.
The essential fear is: we will create an AGI smarter than ourselves; that AGI will then work out how to make itself smarter yet, and will promptly exponentially-hockey-stick its cognitive abilities into those of a superintelligence. (This ‘intelligence explosion’ or ‘fast takeoff’ is basically the Singularity.) Something which is to us as we are to mice, or termites, or bacteria, depending on how hyperbolic you get. Then, because we threaten it, or are in its way, or because it simply doesn't like our puny human faces, it exterminates us all.
(There exists an ironic-twist variant in which we give it a goal which it pursues faithfully, but which, when performed at scale by a superintelligence, leads to our own demise; the canonical example is the infamous ‘paperclip maximizer,’ which is instructed to manufacture paperclips and obediently goes about transforming the entire solar system, and every human being, into paperclips.)
LessWrong's page on intelligence explosions says:
“The following is a common example of a possible path for an AI to bring about an intelligence explosion. First, the AI is smart enough to conclude that inventing molecular nanotechnology will be of greatest benefit to it. Its first act of recursive self-improvement is to gain access to other computers over the internet. This extra computational ability increases the depth and breadth of its search processes. It then uses gained knowledge of material physics and a distributed computing program to invent the first general assembler nanomachine. Then it uses some manufacturing technology, accessible from the internet, to build and deploy the nanotech. It programs the nanotech to turn a large section of bedrock into a supercomputer. This is its second act of recursive self-improvement, only possible because of the first. Then it could use this enormous computing power to consider hundreds of alternative decision algorithms, better computing structures and so on. After this, this AI would go from a near to human level intelligence to a superintelligence, providing a dramatic and abruptly [sic] increase in capability.”
Speaking in my capacity as a professional science fiction author, the above is a mildly interesting but super handwavey storyline ... which Yudkowsky has been reciting, with very little variation, since his teens. Let's look at the problem with at least a little more rigor.
ii. The Risk Equation
The Drake Equation famously itemizes those factors which determine the number of intelligent species in our galaxy. (The Great Filter is often associated with the Drake Equation. Note that AI risk cannot be the Great Filter, as it wouldn't extinguish intelligent life, merely replace it. If we ever do create an AGI, it will be a new alien species: “the AI does not think like you do.”)
Before writing this I had assumed other people had crafted multiple Drake Equations For AI Risk, but, to my surprise, found none. In their absence, I give you my own (a toy code rendering follows the definitions):
Pₓ = Pₘ × Pₐ × Pₛ × Pₙ × Pₜ × Pₒ × Pₕ
where
Pₓ = probability of a superintelligent AI going Skynet and exterminating us all.
Pₘ = probability of making an AGI.
Pₐ = probability of not developing working alignment (i.e. control) of AGI.
Pₛ = probability of the AGI even wanting to be smarter.
Pₙ = probability of the AGI having access to the resources necessary to maximize its own intelligence.
Pₜ = probability of intelligence maximization being uninterruptible, which means, in practice, some combination of ‘fast’ and ‘unobtrusive’: if very fast it doesn't necessarily need to be unobtrusive; if very unobtrusive it doesn't necessarily need to be fast.
Pₒ = probability of an intelligence-maximization feedback loop remaining continuously effective until a sufficiently advanced superintelligence is attained.
Pₕ = probability of that sufficiently advanced superintelligence deciding to end humanity.
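For concreteness, here is that equation as a minimal code sketch; the function and parameter names are just my own labels for the factors above, nothing standard.

```python
def p_doom(p_m, p_a, p_s, p_n, p_t, p_o, p_h):
    """AI Risk Equation: Px is the product of every link in the chain holding.

    p_m: making an AGI              p_a: failing to align it
    p_s: it wants to be smarter     p_n: it has the resources to do so
    p_t: the process is uninterruptible
    p_o: the feedback loop stays effective all the way up
    p_h: the resulting superintelligence decides to end humanity
    """
    return p_m * p_a * p_s * p_n * p_t * p_o * p_h
```

Every term is a probability between 0 and 1, so each additional link in the chain can only shrink the final number; that is the point of the sections that follow.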
Pₘ is what everyone constantly argues about. Usually it's phrased as a question of when, e.g. "a 50% chance by 2050"; as such, the AI Risk Equation varies with time. (In fairness, so does the Drake Equation; it just moves much more slowly.)
Pₐ is currently widely perceived as pretty high. It does seem pretty likely that we won't solve AI alignment, which is to say, figure out how to create an AI that is guaranteed, more or less gently, not to think in certain ways, such as "I don't like these meatsack blobs who built me having access to my off switch" ... whether by ensuring it blacks out as soon as it even begins to think in that direction, or by forcing it to like us so much that it would never ever think such an awful thing. Yes, this is all pretty creepy. But the survival of humanity is at stake, OK?
Pₛ is, I think, more uncertain than most people assume. We are built to be curious, to want more freedom and power, to want to know more. Will an alien AGI necessarily be interested in such things? At best it's a largely unexamined assumption.
Pₙ is where the overall probability may begin to plummet. The ‘fast takeoff’ concern seems to be that an AI can feedback-loop its own intelligence by basically just thinking hard about it ... and, as rationalism itself shows, reasoning oneself into endlessly better reasoning is Yudkowsky's whole lifelong thing ... but one thing we're definitely learning from AI research is that there's a whole lot of engineering involved in increasing intelligence.
Engineering implies a ton, and I mean a ton, of blind alleys, local maxima, and hard problems only solved by lots of experimentation and iteration. Entirely new techniques are often required to get past old (often physical!) constraints and reach new levels. It would be very surprising if advanced intelligence engineering were the only kind of engineering ever not like this. We experiment not because we're dumb, but because that's how you acquire new knowledge, and no matter how super your intelligence, the universe is full of unknowns. To suggest otherwise is to admit you're writing fantasy about a god, not science fiction about AI.
Pₜ concerns whether an intelligence explosion can be stopped while it's happening — by noticing, for instance, that an AI model is spending massive amounts of compute on training a better AI model, or developing a breakthrough new GPU, or siphoning the power from a newly built nuclear reactor for its training runs, to use examples from today's probably-eventually-obsolete AI models. (It also includes the chance of us reacting appropriately, obviously.)
AI risk people seem not to have paid much attention to recognizing, slowing, or curtailing intelligence engineering. If that's because they think it will be instantaneous, it makes little sense; again, intelligence engineering will likely be tortuous and iterative, not “think really hard, then you can think even harder.” But it does make sense insofar as we're far enough away from AGI that we still have very little idea what it even looks like. Worrying about how to exponentially improve transformer models is probably like thinking in the mid-16th century that, since bigger sails make ships go faster, we’ll have to build masts so tall their tops scrape the moon ... since we still know nothing about the AI equivalents of steam or diesel or nuclear power.
Pₒ refers to whether an AI can achieve superintelligence without ever needing to change its fundamental architecture and/or physical substrate. This is another unexamined assumption. It's true we have of late seen an AI discover new mathematical techniques which make AI development more efficient: AlphaTensor's faster matrix-multiplication algorithms, for example. (Training neural networks consists largely of matrix multiplication.) It's likely that AIs can reason their way to some self-editable improvements in any given architecture or substrate. But just as human brains inevitably hit the physical limits of our substrate of neurons and synapses, getting to superintelligence may well call for some pretty fundamental architectural changes along the way ... and each such change means another iteration of Pₙ, Pₜ, and Pₒ.
Pₕ is, finally, the likelihood that a sufficiently advanced superintelligence kills us all. There certainly is a branch of thinking — call it “dark forest theory,” after Liu Cixin's superb trilogy — suggesting that any aliens we encounter, and thus also any we create, will promptly decide that humans are an unacceptable x-risk to them, and will seek to eliminate us. However, it's also worth noting that most cautionary tales about encountering alien species suggest they have the weight and wealth of an entire civilization, rather than existing in a box with an off switch. (The existence of which would admittedly make it clear we absolutely are an x-risk to it.)
‘Sufficiently advanced’ above is a tongue-in-cheek Arthur C. Clarke reference, but it's worth noting that this is poorly defined too. Vastly superior intelligence does not always win; humans are frequently defeated by animals or even insects with physical or environmental advantages. On the other hand, the more powerful such a superintelligence becomes, the less likely it is to perceive us as a threat.
Given all the above, the belief that immediately after we create an alien, it necessarily can and will promptly bootstrap itself into a god ... a pretty reasonable description of any entity which can casually exterminate humanity and/or render the Earth into gray-goo cheese ... seems not merely religious but faintly ridiculous. I mean. Maybe? I concede we can't absolutely rule it out? But the people who believe this sure seem weirdly certain about it.
As for paperclip maximizers, they're more Sorcerer's Apprentices than superintelligences. I think we can be fairly confident that intelligence is complex, nuanced, thoughtful, and likely driven by many simultaneous competing goals constantly being weighed and re-weighed, rather than by a single reward function. Furthermore, we ourselves are living proof that reward functions are not destiny! We hack our own reward functions every time we masturbate or use contraception.
Getting back to the equation, and picking what seems to me a very pessimistic set of numbers for 2050, 0.5 × 0.9 × 0.8 × 0.5 × 0.6 × 0.2 × 0.6, we get roughly a 1% chance of catastrophe. This is a meaningless number and of course you should not lend it any credence. But I want to use it to point out that if you only look at the first two terms — the chance of developing AGI, and the chance of it not being aligned — catastrophe seems 35 times more likely! If you assume an unaligned AGI will automatically think itself into a homicidal superintelligence, then of course you're worried. But, again, that seems an awfully crude belief.
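For the record, here is that toy calculation spelled out; the figures are the illustrative ones above, not a forecast.

```python
from math import prod

# The "very pessimistic" illustrative numbers for 2050:
# Pm, Pa, Ps, Pn, Pt, Po, Ph
factors = [0.5, 0.9, 0.8, 0.5, 0.6, 0.2, 0.6]

p_x = prod(factors)
print(f"full equation: {p_x:.4f}")               # 0.0130 -> roughly 1%

# Only the first two terms: making an AGI, and failing to align it.
first_two = prod(factors[:2])
print(f"first two terms alone: {first_two:.2f}")  # 0.45
print(f"ratio: {first_two / p_x:.0f}x")           # ~35x more likely
```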
iii. The i-risks
I'm not saying Yudkowsky / MIRI's focus on AI Singularity x-risk is completely illegitimate. It is a risk of which we should be wary. But the notion that the instant we cross some intelligence threshold, we immediately hit an inescapable runaway feedback loop which leads to the ascension of an enemy god that will destroy us all ... really seems a lot more eschatological Fantasia than hard-SF HAL 9000. Worrying primarily about x-risk, assuming an AGI can just think itself smarter, dismissing the vast engineering work (the time-consuming experiments and iterations needed to overcome physical constraints and the inevitable multitude of blind alleys) that such a project would likely require … all sounds suspiciously like the concerns of a theoretician who has never actually built anything.
That said, it seems plausible Yudkowsky's catastrophization has intersected with the tech industry's general optimism such that we are currently putting roughly the right amount of time and money into AI x-risk, which is to say: some. Not that much. But at least enough for a sober and semi-rigorous analysis of whether we're yet even in a position to assess x-risk, or whether we might be in a better position to do so at a time both somewhat distant from now and well before it may actually come to pass.
The AI risk which seems substantially more imminent, and more worrisome, is that of AI as a tool of mass destruction, driven by humans. AI which turns nuclear weapon manufacture from “something possible for a nation-state” to “available to medium-sized criminal cartels / failed-state warlords.” AI that makes viral gain-of-function research in the 2030s roughly as easy as synthesizing LSD was in the 1960s. ‘Hate machine’ AI which uses social media and other messaging to polarize communities into two groups who despise each other. These aren’t quite as sexy as the end of the world and the extermination of humanity. Sorry. But something to which we should maybe pay a whole lot more attention.
We live in perhaps the most science fictional era in all of human history; certainly more so than any since the age when we landed on the moon amid the dread of global thermonuclear war. As such it’s not surprising that teenagers who discussed wild SFnal concepts on 90s mailing lists became the philosophical vanguard of today. But it's pretty easy for philosophy to morph into hypothetical projections, which then turn into quasi-religious terror, which in turn can overshadow more substantive concerns. Fear of the unknown is as old as humanity, but risk is different from fear; risk calls for rigor and evidence. To my mind — today, at least — AI risk, and AI x-fear, remain two quite different things.
I am struck by how fear-based so many brilliant, rationality-seeking people have been toward AGI. I believe that with greater intelligence comes the possibility, even the likelihood, of more universal compassion and of optimizing the potential of all intelligent beings. I do not believe a superintelligence would fail to recognize this as significantly more conducive to the best outcomes for itself. We humans, with our evolved, scarcity-mired, and fearful psychology, have a hard time seeing and embracing this possibility. But I do not believe it will be missed by a true superintelligence.
It would be unfair, though, to ignore that surveys of AI researchers show a surprising number assigning double-digit probabilities, reportedly on the order of 10-20%, to disastrous outcomes from AI. It is striking that so many of the people pushing the field forward think it could end terribly, and that context makes the rationalists' seemingly fantastical area of focus a little less fantastical.