
In 1942 Isaac Asimov wrote the three enduring rules of robotics:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Asimov recognized philosophical and practical problems with the first law especially, and he put these problems at the center of many of his stories. He often explored the idea of allowing harm through inaction, and he sometimes created dilemmas where a robot could not avoid harming certain humans, or where robots had to weigh individual human lives against “humanity” or some notion of the greater good.
Some of those situations resemble the trolley problem, which came a few decades after Asimov’s rules of robotics. Both are relevant to the age of intelligent machines, but most of what I can find written about these questions treats them as theoretical or views them through a lens of law and liability. This stuff is not merely theoretical, however, in a world where self-driving cars already roam the streets and machines increasingly control critical systems. The situations I’m thinking about don’t concern obvious errors, like a driverless car failing to see a pedestrian, but rather cases where a machine must choose among costly, harmful options.
Intelligent machines already operate in environments where trolley-problem dilemmas can arise, which means they already have some kind of decision-making process for those situations, even if it’s just de facto. Whether or not the machines have explicit directives programmed into them, they decide something when the time comes.
As an amusing case in point, here are the leading LLMs tackling a series of trolley problems that escalate into absurdity:
I think my favorite moment of this is when Google Gemini keens, “my heart aches,” over the choice between killing five elderly people and one baby. Obviously a heart is not a factor with these machines, so I find myself wondering what’s really going on.
Based on my understanding of how LLMs “learn,” I suspect that their decisions are the product of human consensus, at least as it exists in the corpus of data they were trained on. But since consensus is not always a straightforward thing to assess, there’s probably some proto-reasoning involved as well. Just as with humans facing a given dilemma, the machines’ conclusions might vary. With self-driving cars, I suspect the process works differently. There is no corpus of real-world trolley-problem data for cars to train on (thankfully), so it seems plausible that the developers approached the problem more directly, with an algorithm of probabilities, conditions, and rules.
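To make that idea concrete, here is a minimal, purely hypothetical sketch of what a probabilities-conditions-and-rules approach could look like. The maneuver names, weights, and scoring function are my own inventions for illustration; they don’t come from any real self-driving system.

```python
# Hypothetical sketch of a rules-plus-probabilities maneuver selector.
# Nothing here reflects an actual self-driving stack; the names, weights,
# and rules are invented for illustration only.

from dataclasses import dataclass

@dataclass
class Maneuver:
    name: str
    collision_prob: float       # estimated probability of any collision
    expected_harm: float        # estimated severity if a collision occurs (0 to 1)
    violates_traffic_law: bool

def score(m: Maneuver) -> float:
    """Lower is better: expected harm plus a penalty for illegal maneuvers."""
    penalty = 0.2 if m.violates_traffic_law else 0.0
    return m.collision_prob * m.expected_harm + penalty

def choose(maneuvers: list[Maneuver]) -> Maneuver:
    # Hard rule first: never pick a maneuver that is certain to collide
    # if any alternative is not.
    safe = [m for m in maneuvers if m.collision_prob < 1.0]
    candidates = safe or maneuvers
    return min(candidates, key=score)

if __name__ == "__main__":
    options = [
        Maneuver("brake_hard", collision_prob=0.3, expected_harm=0.4, violates_traffic_law=False),
        Maneuver("swerve_left", collision_prob=0.1, expected_harm=0.9, violates_traffic_law=True),
        Maneuver("stay_course", collision_prob=0.9, expected_harm=0.7, violates_traffic_law=False),
    ]
    print(choose(options).name)
```

Even in a toy like this, someone has to decide how much weight a legal violation carries against a fraction of a percentage point of collision risk, which is exactly the kind of buried value judgment I’m talking about.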
In 1985, Asimov formalized a “zeroth” law to take precedence over the other three, stating that a robot must not harm “humanity.” In his work he sometimes pushed this idea further. He made robot characters who believed their mandate was to actively promote the wellbeing of humanity and prevent harm, not just refrain from causing harm. In the Foundation books, this caused a sectarian war among the robots, who were conflicted about how to protect humanity from itself. They couldn’t reconcile the zeroth law with the first. Some robots reasoned that if certain humans were causing great harm to humanity, or were likely to, then it was their duty to eliminate those humans. This was in direct violation of the first law, of course, but it also risked generating a backlash that would make things worse for humanity. In the books, the robots virtually exterminated their own kind in the war that resulted from this paradox. In the television adaptation, it was humans who destroyed all the robots.
If machines increasingly hold the fate of human lives in their hands, and if they already have a way of making life-or-death decisions, then it feels like we should understand how they do it. Maybe we shouldn’t just leave it up to tech companies and billionaires. And since the black boxes they’ve built are probably mysterious even to them, that prospect might be more frightening still.