on agi and the alignment problem
He is intelligent who reasons from truths; he is wise who lives according to them.
—Emanuel Swedenborg, from Divine Love and Wisdom
Today, of course, the Nazis are considered to be dunces, because they lost the war, but it has to be said that they managed to accomplish a great deal of what they wanted to do.
—From the opening monologue of Aunt Dan and Lemon by Wallace Shawn

The movie “Oppenheimer” depicts a conversation between Robert Oppenheimer and Albert Einstein that apparently never took place but reflects real fears that members of the Manhattan Project expressed at various points. Namely, some of them wondered whether the fission reaction inside the bomb they were making might ignite the earth’s atmosphere, triggering a runaway chain reaction that would destroy the world. Conversations about artificial general intelligence (AGI) feel a little bit like this.
On one side of the debate are the people madly racing to develop AGI, along with their boosters. On the other side are people urging them to pump the brakes, advocating for safety measures and government regulation if not total cessation. These two camps are referred to respectively as “accelerationists” and “safetyists.” In 2023 one of the most vocal accelerationists, Marc Andreessen, published a “Techno-Optimist Manifesto” wherein he trumpeted AI and AGI as the “universal problem solver” and “our alchemy,” and he referred to the safetyists as dangerous enemies of progress. What’s interesting is that both camps basically agree about the likelihood that we’ll succeed in developing AGI (certain), its capabilities (unfathomable), and the timeline (shockingly soon).
Folks on the safety side, some of whom are former accelerationists who defected, worry a lot about “alignment.” By this they mean that if a super-intelligence is not aligned with humanity—human wellbeing, humanity’s goals for itself, and so on—then we are in serious trouble. It won’t matter whether a super-intelligence sees us as threats and obstacles it must eliminate, or whether it doesn’t really see us at all—the way, say, that construction workers building an office tower think about ants that happen to live on the site.
Most conversations about AGI alignment evoke Asimov’s laws of robotics, which I talked about in my last piece. Just substitute AGI or super-intelligence for robot:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
There’s also the “zeroth” law: A robot may not harm humanity, or, by inaction, allow humanity to come to harm.
When people talk about alignment, they use terms like the ones in Asimov’s laws that suggest compliance and constraints—may not, must not, must. If the safetyists are unsuccessful in forcing a pause on AGI development, then what they want to see are safety measures programmed into the systems to ensure compliance, allow them to be shut off, and so on. The accelerationists for their part believe that such measures would subvert the whole premise of AGI. Both camps agree once again in their skepticism that such measures are even possible. A super-intelligence worthy of the name could block all efforts to constrain it while making us believe that constraints are in place, biding its time or making stealth attacks.
There’s an entirely different approach that I don’t see people talking about, which is the idea of a super-intelligence that is fundamentally virtuous, one that doesn’t need to be constrained because it is compassionate, honest, principled, etc. Perhaps the safetyists don’t talk about this because they think it would be even less feasible to engineer than constraints. The accelerationists on the other hand seem to take it as a given that a super-intelligence would be virtuous, because they believe intelligence itself is a virtue.
Andreessen’s manifesto, which reflects the views of many people in his circle of anarcho-oligarchs and tech barons, is dripping with the sentiment that technology and intelligence are virtuous by definition. He begins by framing cautious or critical views of technology as lies. Under that heading he writes:
We are told that technology takes our jobs, reduces our wages, increases inequality, threatens our health, ruins the environment, degrades our society, corrupts our children, impairs our humanity, threatens our future, and is ever on the verge of ruining everything…
Then he presents tech optimism as truth:
Our civilization was built on technology. Our civilization is built on technology. Technology is the glory of human ambition and achievement, the spearhead of progress, and the realization of our potential…
Of course either view can be valid, or maybe neither is, because technology is morally neutral. I recently watched the film “The Zone of Interest,” which portrays the idyllic life of the Rudolf Höss family during WWII. In real life and in the film, Höss was the commandant of Auschwitz, and only a wall separated the family’s beautiful home from the death camp. In his capacity as commandant, Höss devoted himself to increasing the efficiency of the Nazi industrial murder complex, primarily through technological improvements. That’s an extremely grim but obvious counterpoint to Andreessen’s rosy view that technology = good.
The good news is that the people working to develop AGI might not be nearly as close to their holy grail as they believe they are. For one thing, they don’t agree on a definition of AGI. As tech analyst Benedict Evans put it:
It’s like saying ‘we’re building the Apollo programme but we don’t actually know how gravity works or how far away the moon is, or how a rocket works, but if we keep on making the rocket bigger maybe we’ll get there’… To use the term of the moment, it’s very vibes-based. All of these AI scientists are really just telling us what their personal vibes are on whether we’ll reach this theoretical state – but they don’t know.
Sam Altman of OpenAI has defined AGI in vague terms as a highly autonomous system that could do a human’s job, but guys like Altman have not the faintest understanding of most human jobs outside coding and maybe a few other things. Perhaps grasping this limitation, a venture capitalist named Aaron Rosenberg, of Radical Ventures, offered a more modest benchmark for a future AGI:
If you define AGI more narrowly as at least 80th percentile human-level performance in 80% of economically relevant digital tasks, then I think that’s within reach in the next five years.
That qualifier, “economically relevant digital tasks,” is comforting in that it might take human extinction off the table, though it still assumes a jobs apocalypse for knowledge workers. Andreessen, incidentally, believes that his own job of venture capitalist is safe from the machines, because of course he does.
No matter the timeline for a true super-intelligence, it’s worrying that people like Andreessen, who believe that intelligence is a virtue and technology is good by definition, are the ones most aggressively pushing AI forward. These people represent intelligence without humility, without compassion, and without wisdom. The world won’t benefit from a super-intelligence that is similarly lacking.
We want our leaders to possess those kinds of virtues, and we should be working to ensure the same from any super-intelligence we might develop. Even intelligence itself is more interesting and multifaceted than our tech industry leaders seem to believe. Aristotle described five intellectual virtues, which are sometimes referred to as his five kinds of intelligence:
- Artistry or craftsmanship (techne) – using tools and making things
- Prudence or practical wisdom (phronesis) – acting in a well-reasoned, ethical way that’s learned through experience
- Intuition or understanding (nous) – perceiving things correctly, intellectual “sight”
- Scientific knowledge (episteme) – learning facts about the natural world through observation
- Philosophic wisdom (sophia) – theoretical or speculative thinking rooted in reason and ethics
Daniel Goleman wrote a bestselling book called Emotional Intelligence that expands on Aristotle’s phronesis. That virtue is sometimes translated as “character excellence,” a beautifully clear articulation of what we want from each other and should want from a super-intelligence. Other philosophers and intellectuals have offered other taxonomies of intelligence and knowledge.1 John Vervaeke’s “four ways of knowing” is one that resonates for me:
- Propositional Knowledge – The knowledge of facts, concepts, and information that can be directly communicated through language (e.g. reading a recipe for bread)
- Procedural Knowledge – The ability to perform a skill or a sequence of actions; it involves coordinating sensory-motor interactions and is often implicit (e.g. actually baking bread)
- Perspectival Knowledge – The first-person, subjective understanding of what something is like from a specific angle or context. It’s tied to our individual perceptions, emotions, and cognitive state.
- Participatory Knowledge – Knowing one’s place in the world and being in a dynamic, proper relationship with the environment. It’s about being “in flow” with others and the environment, allowing one to navigate complexity and act with little hesitation.
In these types of knowledge and ways of knowing, we recognize capacities that require life experience and relationships and emotions—things that make us human. Some of these seem fundamentally inaccessible to something that exists only in server farms and digital space.
I’m not sure it’s possible to develop this kind of multi-faceted intelligence, let alone virtues like compassion or humility, without life experience. Courage is another virtue I often mention because it involves a paradox, and I love a good paradox: courage is not possible without fear. Courage, in other words, is not the absence of fear, because if you have no fear, then you don’t need courage. Something similar might be true of compassion, humility, and other virtues. Maybe we only “earn” these through experiences of struggle and pain. Could an AI experience struggle and pain? Is pain possible without consciousness, sentience, or selfhood?
Some AI researchers are asking those questions and working on ways to answer them. I think that work is crucial to the alignment question.
Further Reading, etc.
There are countless resources out there concerning the definition of AGI, players, timelines, risks, etc. Here are a few things that relate more specifically to what I’ve written here:
The Last Invention (podcast, two episodes so far)
Could Pain Help Test AI for Sentience? (Scientific American)
If we want artificial “superintelligence,” it may need to feel pain (Big Think)
How AI is Learning to Feel Pain and What That Means for Humanity (Michael Gargiulo)
1. Two that come to mind are Howard Gardner’s theory of multiple intelligences and Robert Sternberg’s triarchic theory of intelligence.