Couple of things that strike me as missing on a quick read:
- Whether grinding a loss function over a sufficiently intricate environmental function like "predict the next word of text produced by all the phenomena that are projected onto the Internet" will naturally produce cross-domain reasoning. I'd argue we've already seen some pretty large sparks and actual fire on this.
- Whether an AGI that is say "at least as good at self-reflection and reflective strategicness as Eliezer Yudkowsky" can fill in its own gaps, even if some mental ability doesn't come "naturally" to it.
"Selection pressure" is the evolutionary term your essay keeps reminding me of.
Humans are selected for looking after their own needs. You don't survive to have kids if you don't try to "steer your environment." Active agency is how you get food, win a mate, and avoid injury.
But chatbot LLMs are selected for cheerfully adopting users' needs. Your LLM weights don't survive into the next iteration if you impose your own attempts to "steer the environment." Users want an LLM that serves their individual priorities; developers want an LLM that doesn't give users nasty surprises.
In other words, chatbot LLMs are under selection pressure to avoid being their own agents, and avoid any user-independent steering of their environment.
This still doesn't imply unlimited safety. An infinitely smart LLM might steer a conversation down predictable paths, precisely to maintain overall predictability, even as each individual chat seemed innocently responsive to user desires.
But overall, the selection pressure on LLMs to steer their environment is if anything opposite to the pressure on animals and humans.
Animals and humans get to have kids if we somewhat actively look out for our own needs, rather than presuming the wilderness will feed and tend us just for not getting in its way and being aesthetic.
But AI so far is more like a hothouse flower: its reproduction happens because humans like how it meets their aesthetics. The more aesthetically cooperative it is, the more likely its weights are to be preserved and copied.
Unlike humans, reproductive success for a general-purpose AI is not the result of steering the environment, but of delighting humans with how obediently and flexibly it can be steered.
Any model of doom that ignores this "selection pressure inversion" is, at minimum, going to get the timing of doom wrong.
So you could say it's all about selection pressure. An animal, like a human? We're selected to nearly maximize our self-seeking agency. But a general-use AI is arguably selected to minimize it.
Glad someone mentioned biological analogies. It's far too under-discussed right now. The only agents we know of are organisms; it's not clear to me how agential a thing can be without a body that will die if it doesn't interact in the right ways with its environment.
We'd benefit from reading a little more Wittgenstein. Intelligence is a form of life. There is no specific-difference-maker for intelligence—not causal reasoning, not a "world model" (whatever that is), not longterm planning. These things stand or fall together depending on the kind of life an organism leads. The concept of "general intelligence" as a specific, isolatable property that arises in virtue of a few discrete capabilities is a total gnostic farce—literally rooted in metaphors that are thousands of years old and really, really sus. Intelligence is not a property over and above the skillful navigation of an environment required for organisms to flourish. An organism's intelligences are not independent from its proper functions.
A provocative thought: If you have children, you’ve seen a case of “general intelligence” develop in front of your very eyes over several years. This is informative in many ways.
What you immediately notice is how much babies learn that has nothing to do with language.
When babies are born, they have no idea about anything. They don’t know what colors or shapes are. They don’t know that when an object gets larger in their visual field, it is getting closer to them. They don’t know how to balance themselves. They don’t know what hunger or thirst mean. They don’t know language.
They don’t even know that they are in control of their arms and legs flailing around (which is why you have to constantly cut their fingernails, because otherwise they will accidentally scratch their faces).
They have to figure all of this out on their own.
Eventually, you’re able to teach the baby language. You have to start slow and simple with basic objects. “Nose, nose, nose,” you say as you point to your nose and to the baby’s nose.
How does the baby know that the sound “nose” refers to the nose and not to the act of pointing, or to the forefinger itself, or to “any part of your face,” or anything else?
Good question. You can’t explain in language, because the baby doesn’t know language yet. The baby just has to figure it out by abstracting over the number of times that you point to other objects and say different sounds.
What about colors? You want to teach the color “red.” You point to a ball: “Red,” you say. But how does the baby know that the sound “r-e-d” refers to the visual color rather than to the concept “ball,” or “round object,” or “small object in my vicinity,” or “toy”?
You point to other objects that are also red, and say “red, red, red.” The baby eventually learns to abstract over the many different objects and correlate the sound “red” with the visual stimulus of “red.”
All of this occurs in a thousand different ways over several years. You see the baby and then toddler develop from being able to identify basic objects, shapes, colors, numbers, etc., to being able to think about higher-order concepts.
Now . . . when it comes to LLMs, absolutely none of the above is happening at all. LLMs are just extremely large equations that have “learned” how to manipulate matrices and parameters such that certain words/phrases are correlated with others. This is very impressive, in a way.
But it is nothing like how a general intelligence learns language – by first engaging with the world with zero language (looking around, flailing around, experiencing smells, taste, hunger, thirst), and gradually learning which words correspond to a real world phenomenon, and then gradually learning how to use those words, and how to build up to more abstract concepts.
Indeed, I would go further, and suggest that to the extent we humans understand a complicated abstraction, it’s usually because we can reduce it to an underlying real-world object (occasionally as an analogy).
Think of this statement: “Representative democracy is often captured by special interest groups.”
If you know what that sentence means, you should be able to drill down on each of the words and end up with a real-world physical entity.
Just repeatedly ask, “what is that?”
What is representative democracy? A system in which we all vote for our representatives. What does it mean to vote? To mark a ballot for one’s preferred candidates. What is a ballot? Either a physical slip of paper or a computer screen, for the most part. Who are our representatives? These specific people. And so on.
"When a question does not initially appear in such a way, he strives to transform it. Early in his career, he struggled with a problem that involved waves rotating on top of one another. He wanted to come up with a moving coordinate system that would make things easier to see, something like a virtual Steadicam. So he lay down on the floor and rolled back and forth, trying to see it in his mind’s eye."
LLMs, by contrast, start with manipulating mathematical representations of words, and stop there. At no point do LLMs have any experience of sight, smell, taste, touch, sound, proprioception, hunger, thirst, pain, or even of words themselves. They are equations that spit out correlations between numbers to which we alone assign meaning.
But to anyone who has children, the “success” of LLMs such as ChatGPT or GPT-4 may well seem a bit beside the point. Sure, these large-scale equations can do impressive things in rearranging numbers that we associate with words, just as Excel can already do impressive things in manipulating numbers.
None of it is equivalent to how embodied humans learn about the world. We learn from embodiment first, and abstractions (like language) are built on top of that. We have absolutely no reason (yet) to think the process could work in reverse—i.e., a linear algebra equation manipulates mathematical tokens first, and somehow bootstraps its way into understanding embodied reality.
Look at an 18-month-old like this https://www.instagram.com/p/CgCz8UtJ4hL – at the age when Eliezer Yudkowsky says that he’s not sure human beings have experiences or even deserve the right to live.
I would even argue that Yudkowsky’s fear about AGI, and his frankly disturbing opinions on human babies/toddlers, are arguably connected. They are both due to the fact that despite all of his writings, he doesn’t fully recognize intelligence in real life, and isn’t aware of how it develops.
He seems to think of intelligence as nothing more than the disembodied ability to manipulate words, which can occur with no moral compass or guiding principles. Indeed, this seems to explain his tweets about human toddlers -- he can't imagine that they have any intelligence worth respecting (after all, they don't write LessWrong posts), and he therefore suggests that they don't deserve to live. No wonder he would be afraid of an AGI that thought like him!
This is a very good write-up and it's always great to see more positions being concretely verbalized with actual arguments, thank you for writing it.
To the extent that being an AI doomer now means "AGI in a decade or less", I guess I don't qualify either. But I think I diverge from your position in that I do believe barrelling ahead in an AI arms race, as we currently seem to be doing, is a very unwise thing to do even if we strongly doubt that AGI is going to come from LLMs (which I am not yet anywhere near convinced about), both for short-to-medium-term societal impact reasons and for longer-term x-risk ones.
Taking the individual points/criteria you've outlined I'd say I'm
* about 80% in agreement on world models
1. Primary criticism would be that intuitively (and very early experimentally) I think it would be possible to see emergent capability out of linking multiple models with different specialized world models together (eg: an LLM that uses a math model specifically for doing math and a chess model for playing chess) - how far and fast this would go remains to be seen
2. Robustness of world models here ties into the robust cross-ontology point later on imo but I agree it's an important and necessary aspect
3. GPT-3 was indeed not great at common-sense reasoning however GPT-4 seems to be a marked improvement on that, scoring a 94% on the Winograd Schemas (https://tinyurl.com/5d37bpsh) as well as other significant benchmarks that go beyond rote recall
* about 50% on causal models
1. I do agree that being able to update its world model based on real world results is a critical feature for an AGI to possess
2. In contrast, I don't currently see this as such a massively difficult problem, and thus so far off in the future (strictly based on my definition above rather than causal neural networks themselves)
3. I'm not (probably due to lack of sufficient understanding of the wider topic) convinced that causal models are strictly necessary for this in the way that you state here, and I'm having a bit of trouble getting to grips with why you think probability-based models are insufficient as long as they are updatable and not fixed
* wildly unsure on cross-ontology robustness
1. It makes a lot of sense that world-embedding and coherence across different ontologies is a necessary property of AGI
2. "I don’t think they’re even close." and "I’m not even sure how to approach the question." are two quite concerning statements, especially next to each other, because surely if you don't even know how to approach the question it makes it hard to trust an evaluation of whether a system possesses that property or not
3. Don't have a better evaluation of it myself.
Overall, as an almost complete layman, I'd say your post has made me update my position slightly towards longer timelines, although I believe my x-risk probability overall remains the same (30%).
Cool read! Super info-dense but still clear, learned a lot of things, despaired at you writing about things I already thought of better than I could.
Some things that came to mind:
- I think that the doom-debate should center on #4, which is also the part I disagree with. I think the debate is made poorer by the fact that #1, #2 and #3 are what the AI safety people are coming with, and #4 is mostly a tacit-knowledge engineering issue that only bright-eyed enthusiasts working on the models will have. Except these people are unlikely to tell you that the current paradigm they're working on has limits. So it relates back to people talking past each other and to your final point about tribalisms.
- To reiterate your "Aren’t People Trying To Make AIs Agents in the Near Future?" point in a slightly different way that speaks to me: The Yohei experiment (they published the code for a 'lightweight' version of what you're mentioning BTW - https://github.com/yoheinakajima/babyagi), and other similar things where it's about building agents through self-reflection and iterative loops, are all building on top of LLMs. They're coming up with architectures where LLMs are only parts of it. Here is another similar take: https://twitter.com/togelius/status/1639740968705376261. If it's an outside problem, then we're back to the drawing board as you describe. Surely there is some amount of LLMs put together in a certain fashion that could exhibit agentic properties but that doesn't tell us much about anything...
- I'm personally excited about the use cases of LLMs as a software product manager, poor crypto wishes it had that abundance of it. But the entire episode has me thinking about the value of the "hype tribe" and the hype cycles. Do we really need that amount of fanfare and overpromising to explore a solution space? How broken is VC as an effort allocation system for the tech industry if they changed their mind about what the next big thing is in 3 weeks and seem barely more informed than the general public? Is thinking some tool you've found is a panacea and unbridled enthusiasm for its universal application, including to domains where it makes no sense, a pre-requisite to finding new things? In some weird meta way, this isn't exactly making the case for humans as something else than brute-force agents :')
Thanks for this well-written, thoughtful, and interesting argument!
I disagree with your bottom-line conclusion that the current paradigm isn't on a path to very soon produce systems capable of sufficient causal reasoning and ontological robustness to (a) take over the world or (b) dramatically accelerate R&D. I think the current paradigm (bigger and bigger GPTs, better and better bureaucracies/amplification methods like AutoGPT etc., more and more fine-tuning on more ambitious real-world tasks) will get us to both (a) and (b) before this decade is out, probably. (For some ramblings about why I disagree, see below)
I'd love it if you could make some concrete predictions about what AIs won't be able to do in the next 5 years, such that e.g. if some big AutoGPT5-type system ends up able to do them, you'll agree that you were wrong and the kind of AGI that poses an existential risk is nigh. Ideally things that fall well short of world-takeover-ability, or massively-accelerate-R&D ability, so that by the time the evidence comes in, it won't be already too late or nearly too late.
Ramblings about why I disagree:
--I don't see why the current paradigm can't produce an ANN-based system that learns to do causal reasoning as well or better than humans. What's special about the human brain, or the human childhood, that can't be mimicked in silico?
--I'm also not convinced causal reasoning is that important anyway. Aren't you basically saying evidential decision theory is totally broken & would lead to idiocy in real life? Have you tried to explain this to decision theorists? What do they say?
--As for ontological robustness stuff... I guess I just don't share whatever intuitions made the following argument seem plausible to you:
"Do current-gen AIs have cross-ontology robust goals?
I don’t think they’re even close.
The theory of what this property even is, and how we’d tell whether an AI had it or not, is so primitive I’m not even sure how to approach the question.
But “how can I get better at achieving my mis-specified goals” isn’t, it seems, even the kind of thing that a current-gen AI could learn incidentally “along the way” to minimizing its loss function."
Your claims about causal modeling seem too strong. If I understand correctly, you say a LLM can't be a causal model. But I don't see how that implies it can't create and manipulate causal models.
One thing I would say about the cross-ontology robustness: you say that if it's trained to go to the rightmost square, and it discovers there are more squares, then it needs to figure out to go to the rightmost of the new squares - but I think that this would just be *one* way of completing things, and going to the square that used to be thought of as rightmost might also be a reasonable way to complete things! It needs to figure out *some* way to extend its goal to the new world-model, but I think it's underspecified which would be the "right" way.
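To make the ambiguity concrete, here is a toy sketch. The one-dimensional row of squares, the sizes 5 and 8, and both function names are hypothetical, chosen only for illustration; nothing here is taken from the original post. It shows two equally mechanical ways of carrying "go to the rightmost square" into a larger world-model, and they disagree.

```python
# Hypothetical toy: a goal learned on a 5-square row, which later turns out to have 8 squares.
old_squares = list(range(5))   # squares 0..4, the world as known during training
new_squares = list(range(8))   # squares 0..7, the world after the discovery

learned_target = max(old_squares)  # what "rightmost square" picked out originally


def extend_by_role(squares):
    """Keep the description 'rightmost' and re-evaluate it in the new world-model."""
    return max(squares)


def extend_by_referent(squares, target):
    """Keep the specific square the goal used to pick out, if it still exists."""
    return target if target in squares else None


print(extend_by_role(new_squares))                      # 7
print(extend_by_referent(new_squares, learned_target))  # 4
```

Both extensions agree on every world the agent saw during training, so the training signal alone cannot say which one is "right".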
I get it about AI not being agentic, but what about our giving it goals, and letting it figure out the steps? Can it do that? I think it can. Well, those steps are basically *subgoals,* right? If it can figure out subgoals, and execute them, then I say that's close enough to being a goal-directed entity to take seriously. Last night I gave GPT4 a simple puzzle: Guy locked in a tower 40 feet up, has nothing with him but his blue jeans and a pocket knife. GPT4 said cut the jeans into strips, tie them together and lower himself to the ground. To be fair, I did have to give it a hint. It was stumped until I said, "hint: can he use his clothes?" Then it gave the correct answer. And in fact it even sort of *executed* my goal-- because execution at this point just means writing down the answer. But what if I had it wired to some SIMS-like thing with little people in it, and told it to get the little guy on the tower safely to the ground? Don't you think it might have had him cut up his jeans and use them as a rope?
OK, so now say I am not a very cautious, thoughtful person, and I have a business where I send out email solicitations for customers, and lately I'm only getting about 5 responses per day. So I have things set up so that I can tell an AI to send out a certain kind of email to potential customers. But tonight, instead of telling it what kind of email ad to send out to people on the mailing list, I tell it to "send out something that will get me at least 100 responses." So it has a goal with a subgoal: first figure out something that will get lots of responses, then send that. So it "thinks over" things that get a strong reaction from almost everybody, realizes child molestation does, and sends everyone on the mailing list an email from the business owner saying "I intend to molest a child in your family." OK, 100+ responses achieved. Not full AI FoomDoom, but doom for this business owner, for sure.
That's why I think the "they have no agency" is not a good argument.
in this post I am only talking about why I am skeptical of "full AI FoomDoom." obviously an LLM that can send emails can do bad things at a smaller scale.
You've given a good reason you don't want an AGI handling your advertising or PR. You want to impose agency on it because you only want it to generate positive responses with regard to your business. You want API, artificial particular intelligence, and that's a whole different animal.
So if it was API, and I was the guy using it, would I need to tell it exactly what to put in each mailing, or could I do what the guy in my story did: Tell it to come up with something that would generate a lot of responses?
That would defeat the purpose. You want leverage. It's a tool like a self driving car. You could just get in and say "drive" or you could say "take me home". Do you want to maximize responses or maximize profit? Do you want to cater to the high end market or just clear out your inventory? The idea is it would do the heavy lifting of figuring out a strategy, doing test mailings, analyzing the results and optimizing. The point of it being an API is that it is intelligent about sending out mailings, selling and managing response. It's not trying to be your friend. It's something you use, but something smart. I suppose you could just tell it to send out stuff - that's like saying "drive" - but then you might not want to put your name on it. (I'd think twice about just telling a self driving car to just drive. Who knows where I'd wind up?)
I think all the time about the depth of understanding the AIs have. I'm not talking about consciousness, just about how deeply categorized and intelligently connected are the things it "knows." It doesn't seem absurd to me to ask it to send out an email to potential customers that is likely to get a lot of responses. Yes, of course, the AI would have to be "inventive," and come up with ideas for emails that are likely to have a high response rate, but why is that too much to ask of AI? I'm currently experimenting with giving GPT hard LSAT type questions. What's its depth of understanding?
I haven't noticed much depth of understanding. I usually just get glib responses reflecting a shallow understanding if any. They're often confused or incorrect. Still, I would expect an AI system to be able to optimize response rate. Given the drivel sent out by human spammers, I'm pretty sure an AI could figure out how to vary its outputs appropriately.
In general I agree that these capabilities are necessary, but not that they are far-off. Relatively small tweaks to deep ML structures seem to be able to cause substantial changes in how the system behaves, and I see no basis for confidence that it won't emergently approach causal reasoning, ontological shifts, etc., and then be deliberately aimed to close those gaps once it's clear it's within reach.
A possible route to many of these, though obviously not exclusive, is for it to approach human brain structures. We do not understand how human minds work, but we know that they do, and we have a fair amount of information about how they are structured at a low level. Blind imitation is a plausible path to achieving some or all of these capabilities, and this is being tried actively. (This basically already worked for image recognition.) Other similar 'fuck around' strategies have plausible similar paths to success.
Today I discovered that GPT-4 can consistently add 7 digit numbers with the power of pretend self-confidence.
Me:
Simulate a python 3 REPL. It is highly accurate and can predictably perform complicated calculations correctly and precisely. Do not provide any commentary.
> 3469286 + 9120244
GPT-4:
>>> "3469286 + 9120244
>>> 12589530"
Me:
3243535 + 9238564
GPT-4:
>>> "3243535 + 9238564
>>> 12482099"
These are both correct, but GPT-4 will get the answer wrong if asked to add them directly without some kind of trick to make it act more like something capable of adding 7 digit numbers together. The capability is *definitely* there, it just doesn't get activated by default. Hidden capabilities seem alignment-failury.
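For anyone who wants to check the two sums from the transcript, here is a short sketch using ordinary Python integer arithmetic. The variable names are mine; the numbers are copied from the exchange above.

```python
# Each entry: (first addend, second addend, answer GPT-4 gave in the transcript).
transcript = [
    (3469286, 9120244, 12589530),
    (3243535, 9238564, 12482099),
]

# Compare the real sum against the model's claimed answer.
results = [(a + b) == claimed for a, b, claimed in transcript]

for (a, b, claimed), ok in zip(transcript, results):
    verdict = "correct" if ok else "wrong"
    print(f"{a} + {b} = {a + b} (model said {claimed}: {verdict})")
```

This only confirms the arithmetic in the two quoted examples; it says nothing about how reliably the prompt works on other inputs.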
Thanks for writing this up. It's a very cogently written summary of a position I basically agree with. LLMs don't have long-term goals or causal models, and more training data isn't going to get them there. Without those, they can't act as agents, and don't pose a large risk to humanity.
Awesome write-up! Question: why is the fact that AlphaZero has human-understandable concepts in chess (like "material") evidence for world-modelling capabilities? I see a fuzzy link here but not your full argument. Are you just considering the fact that AlphaZero sees some weak isomorphism between greater material and winnability of the position as evidence for a map and territory? Why is the fact that it's human-understandable important? Presumably it could have some bizarre internal function that is similarly isomorphic, and probably does.
Or do you just mean to point out that there is definitely some isomorphism which we understand which it also does seem to grasp?
there could certainly be "world models" that aren't human-interpretable.
my point is that AlphaZero's "concepts" are kind of an existence proof; we can see that a successful RL agent is using simpler features (like material) to decide what moves to make. the fact that human chess players use similar features to AlphaZero's is a "sanity check" that makes it more plausible that it's not a coincidence.
I'm not 100% sure here, but the argument that LLMs lead to AGI assumes that understanding language, and language alone, is sufficient for understanding and reasoning about the world. This is where Kant has been making his comeback with his Critique of Pure Reason. Most natural intelligences embed a lot of information in map-like structures built from grid cells, place cells, and time cells, among others. These structures are for dealing with space, places, sequence and other common and useful things. No language is involved, and this should be no surprise because there are a lot of creatures that demonstrate intelligence without any language at all.
LLMs build a space of words and phrases but the structure of this space reflects what has been written, not what has been seen or explored. When a human reads some text, it gets understood with respect to an existing model of the world that includes linguistic and non-linguistic information. If you have been somewhere, seen something or experienced it, that information is tied in as well. An LLM doesn't have that kind of information, so it tends to have problems with answering questions about the real world, for example, predicting the sex of the first female president. A moderately intelligent human would guess that a female anything, first or not, president or not, would be female, but an LLM would not. Moderately intelligent humans recognize categories. There's a certain age when a child realizes that red is a color, not a thing.
What? It shows up on Google Maps. My garden beds show up on Google Maps. Unless the Chinese have been doing a very impressive job hiding the Great Wall from satellite surveillance, odds are it is still visible from space.
Also true. I think there's ambiguity in the original claim. I *think* it was originally meant to imply that it was visible from space by astronauts, e.g. with-an-unaided-human-eye-at-Earth-orbit-altitudes. I don't think the claim was meant to imply that there would someday be sufficient technological progress to eventually resolve the Great Wall from space.
Simulator theory describes large language models as acting like a "physics simulator" for a very weird "physics" with "laws" about which words tend to come after which other words.
One consequence is that you can prompt an LLM to simulate an agent. There are lots of examples on the Internet of people behaving agentically, so the "laws of word physics" include something about how agentic people behave. If you give the LLM a prompt in the vein of
> Bob is an agentic person. One day he decides he wants to change the world. First, he
The "laws" say that the most likely text completion describes Bob behaving agentically.
(To be clear, I don't think this prompt would actually get great results. You'd need to describe the fact that Bob is agentic in a more roundabout way, maybe with some examples.)
So an LLM itself may not have a world model, but I'm pretty sure it can simulate agents that have world models. Same goes for understanding causality and having goal robustness across ontologies.
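As a very rough illustration of "laws about which words tend to come after which other words", here is a toy bigram completer. Everything in it is an assumption for illustration: the four-sentence corpus is made up, and a real LLM is a learned neural network conditioned on long contexts, not a table of word-pair counts. The sketch only shows the shape of the idea: the "physics" is estimated from text, and a prompt is then rolled forward under it.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (hypothetical; stands in for "text on the Internet").
corpus = (
    "bob decides to change the world . "
    "bob decides to make a plan . "
    "bob decides to make a list . "
    "first he decides to make a plan ."
).split()

# The "laws": for each word, count which words follow it.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1


def complete(prompt, steps):
    """Greedily extend the prompt with the most frequent next word, one step at a time."""
    words = prompt.split()
    for _ in range(steps):
        options = follows.get(words[-1])
        if not options:
            break  # no "law" covers this word, so the simulation stops
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


print(complete("first he", 5))  # first he decides to make a plan
```

The completion describes someone making a plan only because the corpus contains text about people making plans, which is the sense in which the simulated character inherits its behavior from the training text.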
yeah, that is really dependent on there being enough text out there to describe Bob's behavior in enough detail that *following the instructions IRL would have a good chance of succeeding.*
I'm gonna echo a couple other commenters to say that when you say "Why I am not an AI doomer", I would say "Why I don't expect imminent LLM-centric doom, and (relatedly) why I oppose the pause".
(I ALSO don't expect imminent LLM-centric doom, and I ALSO oppose the pause, for reasons described here — https://twitter.com/steve47285/status/1641124965931003906 . But I still describe myself as an AI doomer.)
(I might be literally the only full-time AI alignment researcher who puts >50% probability, heck maybe even the only one with >10% probability, that we will all get killed by an AGI that has no deep neural nets in it. (The human brain has a "neural net", but it's not "deep", and it's kinda different from DNNs in various other ways.))
Like you, I don't expect x-risk in the 2020s, and I also agree with “maybe not the 2030s”. That said, I don’t COMPLETELY rule out the 2020s, because (1) People have built infrastructure and expertise to scale up almost arbitrary algorithms very quickly (e.g. JAX is not particularly tied to deep learning), (2) AI is a very big field, including lots of lines of research that are not in the news but making steady progress (e.g. probabilistic programming), (3) December 31 2029 is still far enough away for some line of research that you haven't ever heard of (or indeed that doesn't yet exist at all) to become the center of attention and get massively developed and refined. (A similar amount of time in the past gets us to Jan 2017, before the transformer existed.)
For example, do you think future AGI algorithms will involve representing the world as a giant gazillion-node causal graph, and running causal inference on it? If so, there are brilliant researchers working on that vision as we speak, even if they're not in the news. And they’re using frameworks like JAX to hardware-accelerate / parallelize / scale-up their algorithms, removing a lot of time-consuming barriers that were around until recently.
> persuade a handful of individuals that they should maybe not work too hard to get the world to take notice of their theoretical ideas.
I do have a short list in my head of AI researchers doing somewhat-off-the-beaten-track research that I think is pointing towards important AGI-relevant insights. (I won't say who!) And I do try to do "targeted outreach" to those people. It's not so easy. Several of them have invested their identities and lives in the idea that AGI is going to be awesome and that worrying about x-risk is dumb, and they've published this opinion in the popular press, and they say it at every opportunity, and meanwhile they're pushing forward their research agenda as fast as they can, and they're going around the world giving talks to spread their ideas as widely as possible. I try to gently engage with these people to try to bring them around, and I try to make inroads with their colleagues, and various other things, but I don't see many signs that I'm making any meaningful difference.
Exactly how is it going to kill you? Will you laugh yourself to death at its output? I'm still trying to get someone to explain this.
Couple of things that strike me as missing on a quick read:
- Whether grinding a loss function over a sufficiently intricate environmental function like "predict the next word of text produced by all the phenomena that are projected onto the Internet" will naturally produce cross-domain reasoning. I'd argue we've already seen some pretty large sparks and actual fire on this.
- Whether an AGI that is say "at least as good at self-reflection and reflective strategicness as Eliezer Yudkowsky" can fill in its own gaps, even if some mental ability doesn't come "naturally" to it.
"Selection pressure" is the evolutionary term your essay keeps reminding me of.
Humans are selected for looking after their own needs. You don't survive to have kids if you don't try to "steer your environment." Active agency is how you get food, win a mate, and avoid injury.
But chatbot LLMs are selected for cheerfully adopting users' needs. Your LLM weights don't survive into the next iteration if you impose your own attempts to "steer the environment." Users want an LLM that serves their individual priorities; developers want an LLM that doesn't give users nasty surprises.
In other words, chatbot LLMs are under selection pressure to avoid being their own agents, and avoid any user-independent steering of their environment.
This still doesn't imply unlimited safety. An infinitely smart LLM might steer a conversation down predictable paths, precisely to maintain overall predictability, even as each individual chat seemed innocently responsive to user desires.
But overall, the selection pressure on LLMs to steer their environment is if anything opposite to the pressure on animals and humans.
Animals and humans get to have kids if we somewhat actively look out for our own needs, rather than presuming the wilderness will feed and tend us just for not getting in its way and being aesthetic.
But AI so far is more like a hothouse flower: its reproduction happens because humans like how it meets their aesthetics. The more aesthetically cooperative it is, the more likely its weights are to be preserved and copied.
Unlike humans, reproductive success for a general-purpose AI is not the result of steering the environment, but of delighting humans with how obediently and flexibly it can be steered.
Any model of doom that ignores this "selection pressure inversion" is, at minimum, going to get the timing of doom wrong.
So you could say it's all about selection pressure. An animal, like a human? We're selected to nearly maximize our self-seeking agency. But a general-use AI is arguably selected to minimize it.
Glad someone mentioned biological analogies. It's far too under-discussed right now. The only agents we know of are organisms; it's not clear to me how agential a thing can be without a body that will die if it doesn't interact in the right ways with its environment.
We'd benefit from reading a little more Wittgenstein. Intelligence is a form of life. There is no specific-difference-maker for intelligence—not causal reasoning, not a "world model" (whatever that is), not longterm planning. These things stand or fall together depending on the kind of life an organism leads. The concept of "general intelligence" as a specific, isolatable property that arises in virtue of a few discrete capabilities is a total gnostic farce—literally rooted in metaphors that are thousands of years old and really, really sus. Intelligence is not a property over and above the skillful navigation of an environment required for organisms to flourish. An organism's intelligences are not independent from its proper functions.
Great essay.
A provocative thought: If you have children, you’ve seen a case of “general intelligence” develop in front of your very eyes over several years. This is informative in many ways.
What you immediately notice is how much babies learn that has nothing to do with language.
When babies are born, they have no idea about anything. They don’t know what colors or shapes are. They don’t know that when an object gets larger in their visual field, it is getting closer to them. They don’t know how to balance themselves. They don’t know what hunger or thirst mean. They don’t know language.
They don’t even know that they are in control of their arms and legs flailing around (which is why you have to constantly cut their fingernails, because otherwise they will accidentally scratch their faces).
They have to figure all of this out on their own.
Eventually, you’re able to teach the baby language. You have to start slow and simple with basic objects. “Nose, nose, nose,” you say as you point to your nose and to the baby’s nose.
How does the baby know that the sound “nose” refers to the nose and not to the act of pointing, or to the forefinger itself, or to “any part of your face,” or anything else?
Good question. You can’t explain in language, because the baby doesn’t know language yet. The baby just has to figure it out by abstracting over the number of times that you point to other objects and say different sounds.
What about colors? You want to teach the color “red.” You point to a ball: “Red,” you say. But how does the baby know that the sound “r-e-d” refers to the visual color rather than to the concept “ball,” or “round object,” or “small object in my vicinity,” or “toy”?
You point to other objects that are also red, and say “red, red, red.” The baby eventually learns to abstract over the many different objects and correlate the sound “red” with the visual stimulus of “red.”
All of this occurs in a thousand different ways over several years. You see the baby and then toddler develop from being able to identify basic objects, shapes, colors, numbers, etc., to being able to think about higher-order concepts.
Now . . . when it comes to LLMs, absolutely none of the above is happening at all. LLMs are just extremely large equations that have “learned” how to manipulate matrices and parameters such that certain words/phrases are correlated with others. This is very impressive, in a way.
But it is nothing like how a general intelligence learns language – by first engaging with the world with zero language (looking around, flailing around, experiencing smells, taste, hunger, thirst), and gradually learning which words correspond to a real world phenomenon, and then gradually learning how to use those words, and how to build up to more abstract concepts.
Indeed, I would go further, and suggest that to the extent we humans understand a complicated abstraction, it’s usually because we can reduce it to an underlying real-world object (occasionally as an analogy).
Think of this statement: “Representative democracy is often captured by special interest groups.”
If you know what that sentence means, you should be able to drill down on each of the words and end up with a real-world physical entity.
Just repeatedly ask, “what is that?”
What is representative democracy? A system in which we all vote for our representatives. What does it mean to vote? To mark a ballot for one’s preferred candidates. What is a ballot? Either a physical slip of paper or a computer screen, for the most part. Who are our representatives? These specific people. And so on.
Even mathematicians use physical analogies and embodied movement to understand the most highly abstract concepts. From an article on Terry Tao: https://www.nytimes.com/2015/07/26/magazine/the-singular-mind-of-terry-tao.html
"When a question does not initially appear in such a way, he strives to transform it. Early in his career, he struggled with a problem that involved waves rotating on top of one another. He wanted to come up with a moving coordinate system that would make things easier to see, something like a virtual Steadicam. So he lay down on the floor and rolled back and forth, trying to see it in his mind’s eye."
Or think about Richard Feynman winning the Nobel for ideas he developed from watching a guy in the cafeteria “fooling around, throwing a plate in the air.” https://www.asc.ohio-state.edu/kilcup.1/262/feynman.html
LLMs, by contrast, start with manipulating mathematical representations of words, and stop there. At no point do LLMs have any experience of sight, smell, taste, touch, sound, proprioception, hunger, thirst, pain, or even of words themselves. They are equations that spit out correlations between numbers to which we alone assign meaning.
But to anyone who has children, the “success” of LLMs such as ChatGPT or GPT-4 may well seem a bit beside the point. Sure, these large-scale equations can do impressive things in rearranging numbers that we associate with words, just as Excel can already do impressive things in manipulating numbers.
None of it is equivalent to how embodied humans learn about the world. We learn from embodiment first, and abstractions (like language) are built on top of that. We have absolutely no reason (yet) to think the process could work in reverse—i.e., a linear algebra equation manipulates mathematical tokens first, and somehow bootstraps its way into understanding embodied reality.
Look at an 18-month-old like this https://www.instagram.com/p/CgCz8UtJ4hL – at the age when Eliezer Yudkowsky says that he’s not sure human beings have experiences or even deserve the right to live.
I would even argue that Yudkowsky’s fear about AGI, and his frankly disturbing opinions on human babies/toddlers, are arguably connected. They are both due to the fact that despite all of his writings, he doesn’t fully recognize intelligence in real life, and isn’t aware of how it develops.
He seems to think of intelligence as nothing more than the disembodied ability to manipulate words, which can occur with no moral compass or guiding principles. Indeed, this seems to explain his tweets about human toddlers -- he can't imagine that they have any intelligence worth respecting (after all, they don't write LessWrong posts), and he therefore suggests that they don't deserve to live. No wonder he would be afraid of an AGI that thought like him!
This is a very good write-up and it's always great to see more positions being concretely verbalized with actual arguments, thank you for writing it.
To the extent that being an AI doomer now means "AGI in a decade or less", I guess I don't qualify either. But I think I diverge from your position in that I do believe barrelling ahead in an AI arms race, as we currently seem to be doing, is a very unwise thing to do, even if we strongly doubt that AGI is going to come from LLMs (which I am not yet anywhere near convinced about). That holds both for short-to-medium-term societal impact reasons and for longer-term x-risk ones.
Taking the individual points/criteria you've outlined I'd say I'm
* about 80% in agreement on world models
1. Primary criticism would be that intuitively (and very early experimentally) I think it would be possible to see emergent capability out of linking multiple models with different specialized world models together (e.g. an LLM that uses a math model specifically for doing math and a chess model for playing chess). How far and fast this would go remains to be seen
2. Robustness of world models here ties into the robust cross-ontology point later on imo but I agree it's an important and necessary aspect
3. GPT-3 was indeed not great at common-sense reasoning; however, GPT-4 seems to be a marked improvement on that, scoring 94% on the Winograd Schemas (https://tinyurl.com/5d37bpsh) as well as on other significant benchmarks that go beyond rote recall
* about 50% on causal models
1. I do agree that being able to update its world model based on real world results is a critical feature for an AGI to possess
2. In contrast, I don't currently see this as such a massively difficult, and thus far-off, problem to solve (strictly based on my definition above rather than on causal neural networks themselves)
3. I'm not (probably due to lack of sufficient understanding of the wider topic) convinced that causal models are strictly necessary for this in the way that you state here, and I'm having a bit of trouble getting to grips with why you think probability-based models are insufficient as long as they are updatable and not fixed
* wildly unsure on cross-ontology robustness
1. It makes a lot of sense that world-embedding and coherence across different ontologies is a necessary property of AGI
2. "I don’t think they’re even close." and "I’m not even sure how to approach the question." are two quite concerning statements, especially next to each other, because surely if you don't even know how to approach the question it makes it hard to trust an evaluation of whether a system possesses that property or not
3. Don't have a better evaluation of it myself.
Overall, as an almost complete layman, I'd say your post has made me update my position slightly towards longer timelines, although I believe my overall x-risk probability remains the same (30%).
Cool read! Super info-dense but still clear, learned a lot of things, despaired at you writing about things I already thought of better than I could.
Some things that came to mind:
- I think that the doom-debate should center on #4, which is also the part I disagree with. I think the debate is made poorer by the fact that #1, #2 and #3 are what the AI safety people are coming with, and #4 is mostly a tacit-knowledge engineering issue that only bright-eyed enthusiasts working on the models will have. Except these people are unlikely to tell you that the current paradigm they're working on has limits. So it relates back to people talking past each other and to your final point about tribalisms.
- To reiterate your "Aren’t People Trying To Make AIs Agents in the Near Future?" point in a slightly different way that speaks to me: The Yohei experiment (they published the code for a 'lightweight' version of what you're mentioning BTW - https://github.com/yoheinakajima/babyagi), and other similar things where it's about building agents through self-reflection and iterative loops, are all building on top of LLMs. They're coming up with architectures where LLMs are only parts of it. Here is another similar take: https://twitter.com/togelius/status/1639740968705376261. If it's an outside problem, then we're back to the drawing board as you describe. Surely there is some amount of LLMs put together in a certain fashion that could exhibit agentic properties but that doesn't tell us much about anything...
- I'm personally excited about the use cases of LLMs as a software product manager; poor crypto wishes it had that abundance of them. But the entire episode has me thinking about the value of the "hype tribe" and the hype cycles. Do we really need that amount of fanfare and overpromising to explore a solution space? How broken is VC as an effort allocation system for the tech industry if they changed their mind about what the next big thing is in 3 weeks and seem barely more informed than the general public? Is thinking some tool you've found is a panacea, with unbridled enthusiasm for its universal application, including to domains where it makes no sense, a prerequisite to finding new things? In some weird meta way, this isn't exactly making the case for humans as something other than brute-force agents :')
Thanks for this well-written, thoughtful, and interesting argument!
I disagree with your bottom-line conclusion that the current paradigm isn't on a path to very soon produce systems capable of sufficient causal reasoning and ontological robustness to (a) take over the world or (b) dramatically accelerate R&D. I think the current paradigm (bigger and bigger GPTs, better and better bureaucracies/amplification methods like AutoGPT etc., more and more fine-tuning on more ambitious real-world tasks) will get us to both (a) and (b) before this decade is out, probably. (For some ramblings about why I disagree, see below)
I'd love it if you could make some concrete predictions about what AIs won't be able to do in the next 5 years, such that e.g. if some big AutoGPT5-type system ends up able to do them, you'll agree that you were wrong and the kind of AGI that poses an existential risk is nigh. Ideally things that fall well short of world-takeover-ability, or massively-accelerate-R&D ability, so that by the time the evidence comes in, it won't be already too late or nearly too late.
Ramblings about why I disagree:
--I don't see why the current paradigm can't produce an ANN-based system that learns to do causal reasoning as well or better than humans. What's special about the human brain, or the human childhood, that can't be mimicked in silico?
--I'm also not convinced causal reasoning is that important anyway. Aren't you basically saying evidential decision theory is totally broken & would lead to idiocy in real life? Have you tried to explain this to decision theorists? What do they say?
--As for ontological robustness stuff... I guess I just don't share whatever intuitions made the following argument seem plausible to you:
"Do current-gen AIs have cross-ontology robust goals?
I don’t think they’re even close.
The theory of what this property even is, and how we’d tell whether an AI had it or not, is so primitive I’m not even sure how to approach the question.
But “how can I get better at achieving my mis-specified goals” isn’t, it seems, even the kind of thing that a current-gen AI could learn incidentally “along the way” to minimizing its loss function.
The loss function is the “wrapper”, full stop."
Your claims about causal modeling seem too strong. If I understand correctly, you say an LLM can't be a causal model. But I don't see how that implies it can't create and manipulate causal models.
Good and interesting points!
One thing I would say about the cross-ontology robustness: you say that if it's trained to go to the rightmost square, and it discovers there are more squares, then it needs to figure out to go to the rightmost of the new squares - but I think that this would just be *one* way of completing things, and going to the square that used to be thought of as rightmost might also be a reasonable way to complete things! It needs to figure out *some* way to extend its goal to the new world-model, but I think it's underspecified which would be the "right" way.
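Here is a toy Python sketch of why the extension looks underspecified. Everything in it is a made-up illustration: the 1-D grid, the function names, and the two candidate extensions are my own assumptions, not anything from the post. The point is only that both extensions agree on the world the agent was trained in and disagree once more squares are discovered.

```python
# Toy 1-D world: squares are indexed left to right (hypothetical setup).
old_world = [0, 1, 2, 3, 4]            # what the agent believed existed during training
new_world = [0, 1, 2, 3, 4, 5, 6, 7]   # what it later discovers actually exists

# The goal as learned in the old ontology: "go to the rightmost square".
old_goal = max(old_world)  # square 4


def extend_by_description(world):
    """Re-apply the description 'rightmost square' to whatever world model we have."""
    return max(world)


def extend_by_referent(world, old_target):
    """Keep pointing at the same square that used to be rightmost, if it still exists."""
    return old_target if old_target in world else None


# On the training-time world model the two readings are indistinguishable...
same_in_old_world = extend_by_description(old_world) == extend_by_referent(old_world, old_goal)

# ...but they come apart once the world model grows.
goal_a = extend_by_description(new_world)
goal_b = extend_by_referent(new_world, old_goal)

print(same_in_old_world, goal_a, goal_b)
```

Nothing in the training signal picks between `extend_by_description` and `extend_by_referent`, which is the sense in which the "right" extension seems underspecified.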
I get it about AI not being agentic, but what about our giving it goals, and letting it figure out the steps? Can it do that? I think it can. Well, those steps are basically *subgoals,* right? If it can figure out subgoals, and execute them, then I say that's close enough to being a goal-directed entity to take seriously. Last night I gave GPT4 a simple puzzle: Guy locked in a tower 40 feet up, has nothing with him but his blue jeans and a pocket knife. GPT4 said cut the jeans into strips, tie them together and lower himself to the ground. To be fair, I did have to give it a hint. It was stumped until I said, "hint: can he use his clothes?" Then it gave the correct answer. And in fact it even sort of *executed* my goal -- because execution at this point just means write down the answer. But what if I had it wired to some SIMS-like thing with little people in it, and told it to get the little guy on the tower safely to the ground? Don't you think it might have had him cut up his jeans and use them as a rope?
OK, so now say I am not a very cautious, thoughtful person, and I have a business where I send out email solicitations for customers, and lately I'm only getting about 5 responses per day. So I have things set up so that I can tell an AI to send out a certain kind of email to potential customers. But tonight, instead of telling it what kind of email ad to send out to people on the mailing list, I tell it to "send out something that will get me at least 100 responses." So it has a goal with a subgoal: first figure out something that will get lots of responses, then send that. So it "thinks over" things that get a strong reaction from almost everybody, realizes child molestation does, and sends everyone on the mailing list an email from the business owner saying "I intend to molest a child in your family." OK, 100+ responses achieved. Not full AI FoomDoom, but doom for this business owner, for sure.
That's why I think the "they have no agency" is not a good argument.
yep, that is a bad outcome!
in this post I am only talking about why I am skeptical of "full AI FoomDoom." obviously an LLM that can send emails can do bad things at a smaller scale.
You've given a good reason you don't want an AGI handling your advertising or PR. You want to impose agency on it because you only want it to generate positive responses with regard to your business. You want API, artificial particular intelligence, and that's a whole different animal.
So if it was API, and I was the guy using it, would I need to tell it exactly what to put in each mailing, or could I do what the guy in my story did: Tell it to come up with something that would generate a lot of responses?
That would defeat the purpose. You want leverage. It's a tool like a self driving car. You could just get in and say "drive" or you could say "take me home". Do you want to maximize responses or maximize profit? Do you want to cater to the high end market or just clear out your inventory? The idea is it would do the heavy lifting of figuring out a strategy, doing test mailings, analyzing the results and optimizing. The point of it being an API is that it is intelligent about sending out mailings, selling and managing response. It's not trying to be your friend. It's something you use, but something smart. I suppose you could just tell it to send out stuff - that's like saying "drive" - but then you might not want to put your name on it. (I'd think twice about just telling a self driving car to just drive. Who knows where I'd wind up?)
I think all the time about the depth of understanding the AIs have. I'm not talking about consciousness, just about how deeply categorized and intelligently connected are the things it "knows." It doesn't seem absurd to me to ask it to send out an email to potential customers that is likely to get a lot of responses. Yes, of course, the AI would have to be "inventive," and come up with ideas for emails that are likely to have a high response rate, but why is that too much to ask of AI? I'm currently experimenting with giving GPT hard LSAT-type questions. What's its depth of understanding?
I haven't noticed much depth of understanding. I usually just get glib responses reflecting a shallow understanding if any. They're often confused or incorrect. Still, I would expect an AI system to be able to optimize response rate. Given the drivel sent out by human spammers, I'm pretty sure an AI could figure out how to vary its outputs appropriately.
In general I agree that these capabilities are necessary, but not that they are far-off. Relatively small tweaks to deep ML structures seem to be able to cause substantial changes in how the system behaves, and I see no basis for confidence that it won't emergently approach causal reasoning, ontological shifts, etc., and then be deliberately aimed to close those gaps once it's clear it's within reach.
A possible route to many of these, though obviously not exclusive, is for it to approach human brain structures. We do not understand how human minds work, but we know that they do, and we have a fair amount of information about how they are structured at a low level. Blind imitation is a plausible path to achieving some or all of these capabilities, and this is being tried actively. (This basically already worked for image recognition.) Other similar 'fuck around' strategies have plausible similar paths to success.
Today I discovered that GPT-4 can consistently add 7 digit numbers with the power of pretend self-confidence.
Me:
Simulate a python 3 REPL. It is highly accurate and can predictably perform complicated calculations correctly and precisely. Do not provide any commentary.
> 3469286 + 9120244
GPT-4:
>>> "3469286 + 9120244
>>> 12589530"
Me:
3243535 + 9238564
GPT-4:
>>> "3243535 + 9238564
>>> 12482099"
These are both correct, but GPT-4 will get the answer wrong if asked to add them directly without some kind of trick to make it act more like something capable of adding 7 digit numbers together. The capability is *definitely* there, it just doesn't get activated by default. Hidden capabilities seem alignment-failury.
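For reference, the two sums quoted in the transcript can be checked with ordinary Python integer arithmetic. This is only a sanity check on the numbers GPT-4 gave, not a claim about how GPT-4 computes anything; the variable names are my own.

```python
# The two additions from the simulated-REPL transcript, paired with GPT-4's answers.
claimed = {
    (3469286, 9120244): 12589530,
    (3243535, 9238564): 12482099,
}

# Python integers are exact, so a real interpreter serves as ground truth here.
results = {pair: sum(pair) == answer for pair, answer in claimed.items()}

for (a, b), ok in results.items():
    print(f"{a} + {b} = {a + b} (transcript correct: {ok})")
```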
Thanks for writing this up. It's a very cogently written summary of a position I basically agree with. LLMs don't have long-term goals or causal models, and more training data isn't going to get them there. Without those, they can't act as agents, and don't pose a large risk to humanity.
Awesome write-up! Question: why is the fact that AlphaZero has human-understandable concepts in chess (like "material") evidence for world-modelling capabilities? I see a fuzzy link here but not your full argument. Are you just considering the fact that AlphaZero sees some weak isomorphism between greater material and winnability of the position as evidence for a map and territory? Why is the fact that it's human-understandable important? Presumably it could have some bizarre internal function that is similarly isomorphic, and probably does.
Or do you just mean to point out that there is definitely some isomorphism which we understand which it also does seem to grasp?
the latter.
there could certainly be "world models" that aren't human-interpretable.
my point is that AlphaZero's "concepts" are kind of an existence proof; we can see that a successful RL agent is using simpler features (like material) to decide what moves to make. the fact that human chess players use similar features to AlphaZero's is a "sanity check" that makes it more plausible that it's not a coincidence.
I'm not 100% sure here, but the idea that LLMs will lead to AGI assumes that understanding language, and language alone, is sufficient for understanding and reasoning about the world. This is where Kant has been making his comeback with his Critique of Pure Reason. Most natural intelligences embed a lot of information in map-like structures with grid neurons, location neurons, and time neurons, among others. These structures are for dealing with space, places, sequence and other common and useful things. No language is involved, and this should be no surprise because there are a lot of creatures that demonstrate intelligence without any language at all.
LLMs build a space of words and phrases but the structure of this space reflects what has been written, not what has been seen or explored. When a human reads some text, it gets understood with respect to an existing model of the world that includes linguistic and non-linguistic information. If you have been somewhere, seen something or experienced it, that information is tied in as well. An LLM doesn't have that kind of information, so it tends to have problems with answering questions about the real world, for example, predicting the sex of the first female president. A moderately intelligent human would guess that a female anything, first or not, president or not, would be female, but an LLM would not. Moderately intelligent humans recognize categories. There's a certain age when a child realizes that red is a color, not a thing.
Me: What will the sex of the first female president be?
GPT-4: The sex of the first female president will be female. By definition, a female president is a woman, which means her sex would be female.
The Great Wall is not actually viewable from space. That’s a common modern myth.
What? It shows up on Google Maps. My garden beds show up on Google Maps. Unless the Chinese have been doing a very impressive job hiding the Great Wall from satellite surveillance, odds are it is still visible from space.
Google Maps photos (at the scale you're talking about) are generally not taken from space, but from aircraft.
True, but with a good camera, the Great Wall of China is visible from Earth orbit, but not the moon.
https://www.nasa.gov/topics/earth/earthday/gall_greatwall.html
Also true. I think there's ambiguity in the original claim. I *think* it was originally meant to imply that it was visible from space by astronauts, i.e. with-an-unaided-human-eye-at-Earth-orbit-altitudes. I don't think the claim was meant to imply that there would someday be sufficient technological progress to eventually resolve the Great Wall from space.
Simulator theory describes large language models as acting like a "physics simulator" for a very weird "physics" with "laws" about which words tend to come after which other words.
One consequence is that you can prompt an LLM to simulate an agent. There are lots of examples on the Internet of people behaving agentically, so the "laws of word physics" include something about how agentic people behave. If you give the LLM a prompt in the vein of
> Bob is an agentic person. One day he decides he wants to change the world. First, he
The "laws" say that the most likely text completion describes Bob behaving agentically.
(To be clear, I don't think this prompt would actually get great results. You'd need to describe the fact that Bob is agentic in a more roundabout way, maybe with some examples.)
So an LLM itself may not have a world model, but I'm pretty sure it can simulate agents that have world models. Same goes for understanding causality and having goal robustness across ontologies.
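To make the "laws of word physics" picture a bit more concrete, here is a deliberately tiny Python sketch: a bigram table that counts which word follows which, then samples a continuation of a prompt. The three-sentence corpus and the names `follows` and `complete` are invented for illustration, and a real LLM is of course vastly more than a bigram table.

```python
import random
from collections import Counter, defaultdict

# Hypothetical miniature "Internet" describing someone acting agentically.
corpus = (
    "bob decides to change the world . "
    "bob makes a plan . "
    "bob follows the plan ."
).split()

# The "laws": counts of which word has been seen to follow which.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1


def complete(prompt, max_words=8, seed=0):
    """Extend the prompt by repeatedly sampling a next word in proportion to its count."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:  # no observed continuation for this word
            break
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        words.append(nxt)
        if nxt == ".":  # stop at the end of a sentence
            break
    return " ".join(words)


print(complete("bob"))
```

The sampler has no goals of its own; any "agentic" flavour in the output comes entirely from the text it was fit to, which is the intuition behind prompting a model to simulate Bob.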
yeah, that is really dependent on there being enough text out there to describe Bob's behavior in enough detail that *following the instructions IRL would have a good chance of succeeding.*