Apr 12 · Liked by Sarah Constantin

I'm gonna echo a couple other commenters to say that when you say "Why I am not an AI doomer", I would say "Why I don't expect imminent LLM-centric doom, and (relatedly) why I oppose the pause".

(I ALSO don't expect imminent LLM-centric doom, and I ALSO oppose the pause, for reasons described here — https://twitter.com/steve47285/status/1641124965931003906 . But I still describe myself as an AI doomer.)

(I might be literally the only full-time AI alignment researcher who puts >50% probability, heck maybe even the only one with >10% probability, that we will all get killed by an AGI that has no deep neural nets in it. (The human brain has a "neural net", but it's not "deep", and it's kinda different from DNNs in various other ways.))

Like you, I don't expect x-risk in the 2020s, and I also agree with “maybe not the 2030s”. That said, I don’t COMPLETELY rule out the 2020s, because (1) People have built infrastructure and expertise to scale up almost arbitrary algorithms very quickly (e.g. JAX is not particularly tied to deep learning), (2) AI is a very big field, including lots of lines of research that are not in the news but making steady progress (e.g. probabilistic programming), (3) December 31 2029 is still far enough away for some line of research that you haven't ever heard of (or indeed that doesn't yet exist at all) to become the center of attention and get massively developed and refined. (A similar amount of time in the past gets us to Jan 2017, before the transformer existed.)

For example, do you think future AGI algorithms will involve representing the world as a giant gazillion-node causal graph, and running causal inference on it? If so, there are brilliant researchers working on that vision as we speak, even if they're not in the news. And they’re using frameworks like JAX to hardware-accelerate / parallelize / scale-up their algorithms, removing a lot of time-consuming barriers that were around until recently.

> persuade a handful of individuals that they should maybe not work too hard to get the world to take notice of their theoretical ideas.

I do have a short list in my head of AI researchers doing somewhat-off-the-beaten-track research that I think is pointing towards important AGI-relevant insights. (I won't say who!) And I do try to do "targeted outreach" to those people. It's not so easy. Several of them have invested their identities and lives in the idea that AGI is going to be awesome and that worrying about x-risk is dumb, and they've published this opinion in the popular press, and they say it at every opportunity, and meanwhile they're pushing forward their research agenda as fast as they can, and they're going around the world giving talks to spread their ideas as widely as possible. I try to gently engage with these people to try to bring them around, and I try to make inroads with their colleagues, and various other things, but I don't see many signs that I'm making any meaningful difference.

Apr 13 · Liked by Sarah Constantin

Couple of things that strike me as missing on a quick read:

- Whether grinding a loss function over a sufficiently intricate environmental function like "predict the next word of text produced by all the phenomena that are projected onto the Internet" will naturally produce cross-domain reasoning. I'd argue we've already seen some pretty large sparks and actual fire on this.

- Whether an AGI that is say "at least as good at self-reflection and reflective strategicness as Eliezer Yudkowsky" can fill in its own gaps, even if some mental ability doesn't come "naturally" to it.

Apr 11 · Liked by Sarah Constantin

"Selection pressure" is the evolutionary term your essay keeps reminding me of.

Humans are selected for looking after their own needs. You don't survive to have kids if you don't try to "steer your environment." Active agency is how you get food, win a mate, and avoid injury.

But chatbot LLMs are selected for cheerfully adopting users' needs. Your LLM weights don't survive into the next iteration if you impose your own attempts to "steer the environment." Users want an LLM that serves their individual priorities; developers want an LLM that doesn't give users nasty surprises.

In other words, chatbot LLMs are under selection pressure to avoid being their own agents, and avoid any user-independent steering of their environment.

This still doesn't imply unlimited safety. An infinitely smart LLM might steer a conversation down predictable paths, precisely to maintain overall predictability, even as each individual chat seemed innocently responsive to user desires.

But overall, the selection pressure on LLMs to steer their environment is if anything opposite to the pressure on animals and humans.

Animals and humans get to have kids if we somewhat actively look out for our own needs, rather than presuming the wilderness will feed and tend us just for not getting in its way and being aesthetic.

But AI so far is more like a hothouse flower: its reproduction happens because humans like how it meets their aesthetics. The more aesthetically cooperative it is, the more likely its weights are to be preserved and copied.

Unlike humans, reproductive success for a general-purpose AI is not the result of steering the environment, but of delighting humans with how obediently and flexibly it can be steered.

Any model of doom that ignores this "selection pressure inversion" is, at minimum, going to get the timing of doom wrong.

So you could say it's all about selection pressure. An animal, like a human? We're selected to nearly maximize our self-seeking agency. But a general-use AI is arguably selected to minimize it.

Apr 11 · Liked by Sarah Constantin

This is a very good write-up and it's always great to see more positions being concretely verbalized with actual arguments, thank you for writing it.

To the extent that being an AI doomer now means "AGI in a decade or less", I guess I don't qualify either. But I think I diverge from your position in that I do believe barrelling ahead in an AI arms race, as we currently seem to be doing, is a very unwise thing to do, both for short-to-medium-term societal impact reasons and for longer-term x-risk ones. That holds even if we strongly believe AGI is not going to come from LLMs (which I am not yet anywhere near convinced of).

Taking the individual points/criteria you've outlined I'd say I'm

* about 80% in agreement on world models

1. Primary criticism would be that intuitively (and very early experimentally) I think it would be possible to see emergent capability out of linking multiple models with different specialized world models together (eg: an LLM that uses a math model specifically for doing math and a chess model for playing chess) - how far and fast this would go remains to be seen

2. Robustness of world models here ties into the robust cross-ontology point later on imo but I agree it's an important and necessary aspect

3. GPT-3 was indeed not great at common-sense reasoning however GPT-4 seems to be a marked improvement on that, scoring a 94% on the Winograd Schemas (https://tinyurl.com/5d37bpsh) as well as other significant benchmarks that go beyond rote recall

* about 50% on causal models

1. I do agree that being able to update its world model based on real world results is a critical feature for an AGI to possess

2. In contrast, I don't currently see this as such a massively difficult, and thus far-off, aspect to get solved (strictly based on my definition above rather than causal neural networks themselves)

3. I'm not convinced (probably due to insufficient understanding of the wider topic) that causal models are strictly necessary for this in the way that you state here, and I'm having a bit of trouble grasping why you think probability-based models are insufficient as long as they are updatable and not fixed

* wildly unsure on cross-ontology robustness

1. It makes very much sense that world-embedding and coherence across different ontologies is a necessary property of AGI

2. "I don’t think they’re even close." and "I’m not even sure how to approach the question." are two quite concerning statements, especially next to each other, because surely if you don't even know how to approach the question it makes it hard to trust an evaluation of whether a system possesses that property or not

3. Don't have a better evaluation of it myself.

Overall, as an almost complete layman, I'd say your post has made me update my position slightly towards longer timelines, although I believe my overall x-risk probability remains the same (30%).

Expand full comment

Cool read! Super info-dense but still clear, learned a lot of things, despaired at you writing about things I already thought of better than I could.

Some things that came to mind:

- I think that the doom-debate should center on #4, which is also the part I disagree with. I think the debate is made poorer by the fact that #1, #2 and #3 is what the AI safety people are coming with, and #4 is mostly a tacit-knowledge engineering issue that only bright-eyed enthusiasts working on the models will have. Except these people are unlikely to tell you that the current paradigm they're working on has limits. So it relates back to people talking past each other and to your final point about tribalisms.

- To reiterate your "Aren’t People Trying To Make AIs Agents in the Near Future?" point in a slightly different way that speaks to me: The Yohei experiment (they published the code for a 'lightweight' version of what you're mentioning BTW - https://github.com/yoheinakajima/babyagi), and other similar things where it's about building agents through self-reflection and iterative loops, are all building on top of LLMs. They're coming up with architectures where LLMs are only parts of it. Here is another similar take: https://twitter.com/togelius/status/1639740968705376261. If it's an outside problem, then we're back to the drawing board as you describe. Surely there is some amount of LLMs put together in a certain fashion that could exhibit agentic properties, but that doesn't tell us much about anything...

- I'm personally excited about the use cases of LLMs as a software product manager, poor crypto wishes it had that abundance of it. But the entire episode has me thinking about the value of the "hype tribe" and the hype cycles. Do we really need that amount of fanfare and overpromising to explore a solution space? How broken is VC as an effort allocation system for the tech industry if they changed their mind about what the next big thing is in 3 weeks and seem barely more informed than the general public? Is thinking some tool you've found is a panacea and unbridled enthusiasm for its universal application, including to domains where it makes no sense, a pre-requisite to finding new things? In some weird meta way, this isn't exactly making the case for humans as something else than brute-force agents :')

Apr 13 · Liked by Sarah Constantin

Your claims about causal modeling seem too strong. If I understand correctly, you say an LLM can't be a causal model. But I don't see how that implies it can't create and manipulate causal models.

Apr 11 · Liked by Sarah Constantin

Good and interesting points!

One thing I would say about the cross-ontology robustness: you say that if it's trained to go to the rightmost square, and it discovers there are more squares, then it needs to figure out to go to the rightmost of the new squares - but I think that this would just be *one* way of completing things, and going to the square that used to be thought of as rightmost might also be a reasonable way to complete things! It needs to figure out *some* way to extend its goal to the new world-model, but I think it's underspecified which would be the "right" way.
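The underspecification can be made concrete with a toy sketch. Everything here is a hypothetical illustration, not anything from the essay: squares are modeled as integer positions on a line, and the two function names are made up for the example. Both extensions agree on the world the agent was trained in, and only come apart once new squares are discovered.

```python
# Toy illustration (all names hypothetical): two equally consistent ways to
# extend the goal "go to the rightmost square" after the world-model grows.

def extend_as_rightmost(squares, learned_target):
    """Extension A: re-evaluate "rightmost" against the enlarged world."""
    return max(squares)

def extend_as_original_square(squares, learned_target):
    """Extension B: keep aiming at the square that used to be rightmost."""
    return learned_target

old_world = [0, 1, 2, 3, 4]      # squares known during training
new_world = list(range(10))      # the agent later discovers squares 5..9

# During training the two readings are indistinguishable: both pick square 4.
learned_target = max(old_world)

goal_a = extend_as_rightmost(new_world, learned_target)
goal_b = extend_as_original_square(new_world, learned_target)

print(goal_a, goal_b)  # 9 4
```

Nothing in the training world favors one function over the other, which is the sense in which the "right" extension is underspecified.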

Expand full comment

I get it about AI not being agentic, but what about our giving it goals, and letting it figure out the steps. Can it do that? I think it can. Well, those steps are basically *subgoals,* right? If it can figure out subgoals, and execute them, then I say that's close enough to being a goal-directed entity to take seriously. Last night I gave GPT4 a simple puzzle: Guy locked in a tower 40 feet up, has nothing with him but his blue jeans and a pocket knife. GPT4 said cut the jeans into strips, tie them together and lower himself to the ground. To be fair, I did have to give it a hint. It was stumped til I said, "hint: can he use his clothes?" Then it gave the correct answer. And in fact it even sort of *executed* my goal-- because execution at this point just means write down the answer. But what if I had it wired to some SIMS-like thing with little people in it, and told it to get the little guy on the tower safely to the ground? Don't you think it might have had him cut up his jeans and use them as a rope?

OK, so now say I am not a very cautious, thoughtful person, and I have a business where I send out email solicitations for customers, and lately I'm only getting about 5 responses per day. So I have things set up so that I can tell an AI to send out a certain kind of email to potential customers. But tonight, instead of telling it what kind of email ad to send out to people on the mailing list, I tell it to "send out something that will get me at least 100 responses." So it has a goal with a subgoal: first figure out something that will get lots of responses, then send that. So it "thinks over" things that get a strong reaction from almost everybody, realizes child molestation does, and sends everyone on the mailing list an email from the business owner saying "I intend to molest a child in your family." OK, 100+ responses achieved. Not full AI FoomDoom, but doom for this business owner, for sure.

That's why I think the "they have no agency" is not a good argument.

May 15 · Liked by Sarah Constantin

In general I agree that these capabilities are necessary, but not that they are far-off. Relatively small tweaks to deep ML structures seem to be able to cause substantial changes in how the system behaves, and I see no basis for confidence that it won't emergently approach causal reasoning, ontological shifts, etc., and then be deliberately aimed to close those gaps once it's clear it's within reach.

A possible route to many of these, though obviously not exclusive, is for it to approach human brain structures. We do not understand how human minds work, but we know that they do, and we have a fair amount of information about how they are structured at a low level. Blind imitation is a plausible path to achieving some or all of these capabilities, and this is being tried actively. (This basically already worked for image recognition.) Other similar 'fuck around' strategies have plausible similar paths to success.

Apr 23 · Liked by Sarah Constantin

Thanks for this well-written, thoughtful, and interesting argument!

I disagree with your bottom-line conclusion that the current paradigm isn't on a path to very soon produce systems capable of sufficient causal reasoning and ontological robustness to (a) take over the world or (b) dramatically accelerate R&D. I think the current paradigm (bigger and bigger GPTs, better and better bureaucracies/amplification methods like AutoGPT etc., more and more fine-tuning on more ambitious real-world tasks) will get us to both (a) and (b) before this decade is out, probably. (For some ramblings about why I disagree, see below)

I'd love it if you could make some concrete predictions about what AIs won't be able to do in the next 5 years, such that e.g. if some big AutoGPT5-type system ends up able to do them, you'll agree that you were wrong and the kind of AGI that poses an existential risk is nigh. Ideally things that fall well short of world-takeover-ability, or massively-accelerate-R&D ability, so that by the time the evidence comes in, it won't be already too late or nearly too late.

Ramblings about why I disagree:

--I don't see why the current paradigm can't produce an ANN-based system that learns to do causal reasoning as well or better than humans. What's special about the human brain, or the human childhood, that can't be mimicked in silico?

--I'm also not convinced causal reasoning is that important anyway. Aren't you basically saying evidential decision theory is totally broken & would lead to idiocy in real life? Have you tried to explain this to decision theorists? What do they say?

--As for ontological robustness stuff... I guess I just don't share whatever intuitions made the following argument seem plausible to you:

"Do current-gen AIs have cross-ontology robust goals?

I don’t think they’re even close.

The theory of what this property even is, and how we’d tell whether an AI had it or not, is so primitive I’m not even sure how to approach the question.

But “how can I get better at achieving my mis-specified goals” isn’t, it seems, even the kind of thing that a current-gen AI could learn incidentally “along the way” to minimizing its loss function.

The loss function is the “wrapper”, full stop. "

Apr 16 · Liked by Sarah Constantin

Today I discovered that GPT-4 can consistently add 7 digit numbers with the power of pretend self-confidence.


Simulate a python 3 REPL. It is highly accurate and can predictably perform complicated calculations correctly and precisely. Do not provide any commentary.

> 3469286 + 9120244


>>> "3469286 + 9120244

>>> 12589530"


> 3243535 + 9238564


>>> "3243535 + 9238564

>>> 12482099"

These are both correct, but GPT-4 will get the answer wrong if asked to add them directly without some kind of trick to make it act more like something capable of adding 7 digit numbers together. The capability is *definitely* there, it just doesn't get activated by default. Hidden capabilities seem alignment-failury.
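As a sanity check, the two sums quoted above can be confirmed in an actual Python 3 interpreter. This is only a minimal sketch; the variable names are illustrative and not part of the original prompt.

```python
# Verify the two additions from the simulated REPL against real arithmetic.
claimed = [
    (3469286, 9120244, 12589530),
    (3243535, 9238564, 12482099),
]

# For each (a, b, total), record whether a + b really equals the quoted total.
checks = [(a, b, total, a + b == total) for a, b, total in claimed]

for a, b, total, ok in checks:
    print(f"{a} + {b} = {a + b} (claimed {total}, match: {ok})")

all_correct = all(ok for *_, ok in checks)
```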

Apr 15 · Liked by Sarah Constantin

Thanks for writing this up. It's a very cogently written summary of a position I basically agree with. LLMs don't have long-term goals or causal models, and more training data isn't going to get them there. Without those, they can't act as agents, and don't pose a large risk to humanity.

Expand full comment

Awesome write-up! Question: why is the fact that AlphaZero has human-understandable concepts in chess (like "material") evidence for world-modelling capabilities? I see a fuzzy link here but not your full argument. Are you just considering the fact that AlphaZero sees some weak isomorphism between greater material and winnability of the position as evidence for a map and territory? Why is the fact that it's human-understandable important? Presumably it could have some bizarre internal function that is similarly isomorphic, and probably does.

Or do you just mean to point out that there is definitely some isomorphism which we understand which it also does seem to grasp?

Apr 12 · Liked by Sarah Constantin

The Great Wall is not actually viewable from space. That’s a common modern myth.

Apr 12 · Liked by Sarah Constantin

Simulator theory describes large language models as acting like a "physics simulator" for a very weird "physics" with "laws" about which words tend to come after which other words.

One consequence is that you can prompt an LLM to simulate an agent. There are lots of examples on the Internet of people behaving agentically, so the "laws of word physics" include something about how agentic people behave. If you give the LLM a prompt in the vein of

> Bob is an agentic person. One day he decides he wants to change the world. First, he

The "laws" say that the most likely text completion describes Bob behaving agentically.

(To be clear, I don't think this prompt would actually get great results. You'd need to describe the fact that Bob is agentic in a more roundabout way, maybe with some examples.)

So an LLM itself may not have a world model, but I'm pretty sure it can simulate agents that have world models. Same goes for understanding causality and having goal robustness across ontologies.

Expand full comment

Great essay.

A provocative thought: If you have children, you’ve seen a case of “general intelligence” develop in front of your very eyes over several years. This is informative in many ways.

What you immediately notice is how much babies learn that has nothing to do with language.

When babies are born, they have no idea about anything. They don’t know what colors or shapes are. They don’t know that when an object gets larger in their visual field, it is getting closer to them. They don’t know how to balance themselves. They don’t know what hunger or thirst mean. They don’t know language.

They don’t even know that they are in control of their arms and legs flailing around (which is why you have to constantly cut their fingernails, because otherwise they will accidentally scratch their faces).

They have to figure all of this out on their own.

Eventually, you’re able to teach the baby language. You have to start slow and simple with basic objects. “Nose, nose, nose,” you say as you point to your nose and to the baby’s nose.

How does the baby know that the sound “nose” refers to the nose and not to the act of pointing, or to the forefinger itself, or to “any part of your face,” or anything else?

Good question. You can’t explain in language, because the baby doesn’t know language yet. The baby just has to figure it out by abstracting over the number of times that you point to other objects and say different sounds.

What about colors? You want to teach the color “red.” You point to a ball: “Red,” you say. But how does the baby know that the sound “r-e-d” refers to the visual color rather than to the concept “ball,” or “round object,” or “small object in my vicinity,” or “toy”?

You point to other objects that are also red, and say “red, red, red.” The baby eventually learns to abstract over the many different objects and correlate the sound “red” with the visual stimulus of “red.”

All of this occurs in a thousand different ways over several years. You see the baby and then toddler develop from being able to identify basic objects, shapes, colors, numbers, etc., to being able to think about higher-order concepts.

Now . . . when it comes to LLMs, absolutely none of the above is happening at all. LLMs are just extremely large equations that have “learned” how to manipulate matrices and parameters such that certain words/phrases are correlated with others. This is very impressive, in a way.

But it is nothing like how a general intelligence learns language – by first engaging with the world with zero language (looking around, flailing around, experiencing smells, taste, hunger, thirst), and gradually learning which words correspond to a real world phenomenon, and then gradually learning how to use those words, and how to build up to more abstract concepts.

Indeed, I would go further, and suggest that to the extent we humans understand a complicated abstraction, it’s usually because we can reduce it to an underlying real-world object (occasionally as an analogy).

Think of this statement: “Representative democracy is often captured by special interest groups.”

If you know what that sentence means, you should be able to drill down on each of the words and end up with a real-world physical entity.

Just repeatedly ask, “what is that?”

What is representative democracy? A system in which we all vote for our representatives. What does it mean to vote? To mark a ballot for one’s preferred candidates. What is a ballot? Either a physical slip of paper or a computer screen, for the most part. Who are our representatives? These specific people. And so on.

Even mathematicians use physical analogies and embodied movement to understand the most highly abstract concepts. From an article on Terry Tao: https://www.nytimes.com/2015/07/26/magazine/the-singular-mind-of-terry-tao.html

"When a question does not initially appear in such a way, he strives to transform it. Early in his career, he struggled with a problem that involved waves rotating on top of one another. He wanted to come up with a moving coordinate system that would make things easier to see, something like a virtual Steadicam. So he lay down on the floor and rolled back and forth, trying to see it in his mind’s eye."

Or think about Richard Feynman winning the Nobel for ideas he developed from watching a guy in the cafeteria “fooling around, throwing a plate in the air.” https://www.asc.ohio-state.edu/kilcup.1/262/feynman.html

LLMs, by contrast, start with manipulating mathematical representations of words, and stop there. At no point do LLMs have any experience of sight, smell, taste, touch, sound, proprioception, hunger, thirst, pain, or even of words themselves. They are equations that spit out correlations between numbers to which we alone assign meaning.

But to anyone who has children, the “success” of LLMs such as ChatGPT or GPT-4 may well seem a bit beside the point. Sure, these large-scale equations can do impressive things in rearranging numbers that we associate with words, just as Excel can already do impressive things in manipulating numbers.

None of it is equivalent to how embodied humans learn about the world. We learn from embodiment first, and abstractions (like language) are built on top of that. We have absolutely no reason (yet) to think the process could work in reverse—i.e., a linear algebra equation manipulates mathematical tokens first, and somehow bootstraps its way into understanding embodied reality.

Look at an 18-month-old like this https://www.instagram.com/p/CgCz8UtJ4hL – at the age when Eliezer Yudkowsky says that he’s not sure human beings have experiences or even deserve the right to live.

I would even argue that Yudkowsky’s fear about AGI and his frankly disturbing opinions on human babies/toddlers are connected. They are both due to the fact that despite all of his writings, he doesn’t fully recognize intelligence in real life, and isn’t aware of how it develops.

He seems to think of intelligence as nothing more than the disembodied ability to manipulate words, which can occur with no moral compass or guiding principles. Indeed, this seems to explain his tweets about human toddlers -- he can't imagine that they have any intelligence worth respecting (after all, they don't write LessWrong posts), and he therefore suggests that they don't deserve to live. No wonder he would be afraid of an AGI that thought like him!
