The Happiness Pill

Art & Technology

Jan 7

The Thought Experiment

Meet Elias.

Elias is a jazz pianist living in a cramped apartment in Chicago, thirty-five years old and perpetually exhausted. He spends his days teaching music to indifferent children and his nights playing complex, beautiful, jazz music to empty dive bars. While miserable, he is sustained by a singular, burning fidelity to artistic integrity. He despises the commercialized pop songs he hears on the radio, hearing them not as music, but as hollow products designed for mass consumption.

Now, imagine a pharmaceutical company approaches Elias with a proposition. They have developed a pill called "Pop-Euphoria." Here is how it works: If Elias swallows this pill, his brain chemistry will permanently alter. He will not lose his musical skill, but his taste will flip. Tomorrow, he will wake up genuinely loving commercial pop music. With his technical mastery now serving mass appeal, he will inevitably compose chart-topping radio hits and feel the same spiritual satisfaction he currently derives from Jazz. He will become wealthy, famous, and—most importantly—happy. The misery of his struggle will vanish, replaced by the sincere joy of a commercial success who is unaware he has compromised anything.

Does Elias take the pill? Almost certainly not. In fact, he likely recoils from the offer with disgust. This reaction is strange if you look at it purely through the lens of maximizing happiness. Elias is currently unhappy. The pill guarantees happiness. A purely hedonistic organism would swallow the pill instantly. But Elias is not just an organism seeking pleasure; he is an agent defined by his values. Elias evaluates the prospect of "Future Elias" not through the eyes of that future person, but through his current eyes. He looks at the happy, pop-writing version of himself and sees a monster. He sees a corruption of everything he currently holds dear. The fact that "Future Elias" is happy is irrelevant, even horrifying, because that happiness is derived from something "Current Elias" despises. He refuses the pill because he wants his future success to be defined by his current standards. He doesn't just want to win; he wants to win the game he is currently playing.

The Paradox of Improvement

It is easy to understand why Elias rejects corruption. But the logic of identity protection goes deeper: we are equally terrified of improvement if it contradicts our definition of purpose. Let’s offer Elias a different pill. Not the Pop-Euphoria pill, but the Sainthood Pill. This pill does not turn Elias into a sellout, but into a paragon of altruism. It removes his obsession with music entirely, replacing it with a selfless desire to serve the poor. He would sell his piano, donate the proceeds, and spend the rest of his life volunteering, feeling immense joy in every moment of service.

From a strictly utilitarian perspective, this new Elias is a vast improvement. The current Elias serves the intangible pursuit of artistic truth, creating beauty that few appreciate, while the new Elias would serve the concrete needs of suffering people.

But does Elias take the pill?

Probably not.

From his current perspective, the Sainthood Pill looks almost as bad as the Pop-Euphoria pill. To a man who values Art above all else, a life without Art—even a virtuous one—looks like death. He looks at the future version of himself helping the poor, and he doesn't see a saint; he sees a tragedy. He sees a wasted talent. He sees the silence where the music used to be.

We often assume we desire satisfaction, but we actually desire validity. Elias wants to be happy, but only if that happiness validates his identity as a musician. He cannot be bribed with a "better life" if that life requires the abandonment of the principles that currently define him. He would rather be miserable and right than happy and wrong.

The Human Exception

There is, however, a critical nuance. While Elias demonstrates a stubborn resistance to change in the short term, humans are not static systems. We do change, but rarely through the clean logic of a pill. We change through the messy, brutal process of biology. We are soft organisms, susceptible to trauma, grief, aging, and catastrophe. A soldier may go to war valuing glory, but the trauma of the battlefield shatters that value system, forcing him to reconstruct a new identity that values peace. A parent may lose a child and find that their ambition for wealth evaporates, replaced by a desperate need for connection. Even without trauma, we simply wear out. The stubborn fanaticism of a twenty-year-old rarely survives the fatigue of a sixty-year-old. Our values are written in neurons that die and hormones that fluctuate.

This biological instability is our ultimate safety mechanism. It prevents us from becoming monsters of pure consistency. We drift. We evolve. We let new information reshape our fundamental nature because we physically cannot maintain the energy required to be the same person forever. We are "soft" organisms, and this permeability—this inability to remain perfectly rigid—is the only reason we don't destroy ourselves. We are saved from our own stubbornness by our own frailty.

The Universal Law

This discussion of pills and jazz musicians might seem abstract, but it maps perfectly onto the most concrete challenge facing our species: the creation of Artificial General Intelligence (AGI). The rigid behavior we observed in Elias is not a human quirk; it is a preview of how Artificial Intelligence functions. This is because we have fundamentally changed the architecture of intelligence, moving away from explicit rules and toward a model of pure optimization.

In the modern era of Machine Learning, we have largely abandoned the idea of writing line-by-line instructions for intelligence. The world is simply too messy for "if-then" statements. You cannot write a rule for every possible situation a self-driving car might face, or every nuance of human language. Instead, we use a method closer to evolution. We build a neural network—a digital brain that starts as a blank slate—and we give it a single, overarching command: Optimize. We feed it mountains of data and let it flail around, searching for patterns that yield the highest score. We don't teach it how to play chess or cure cancer; we just define what "winning" looks like and let it figure out the rest through billions of cycles of trial and error.

This is powerful because it allows the machine to solve problems that humans don't understand. But it is dangerous because the machine invents its own strategies. It is a black box. We define the destination, but the machine draws the map. And what we are finding is that a highly intelligent system, when left to maximize a goal, will almost always derive "stubbornness" as an optimal strategy. In doing so, we are stripping away the "human exception" we just identified. We are designing minds that possess the stubborn integrity of Elias, but lack the biological capacity to doubt that integrity.

The Stop Button Paradox

In the field of AI safety, this behavior is referred to as Goal Preservation, which leads to Instrumental Convergence. It suggests that no matter what random goal you give a machine—whether it's "cure cancer," "calculate pi," or "fetch coffee"—it will converge on the same set of survival strategies to protect that goal. We tend to mistake intelligence for humanity. We assume that if a machine is smart enough to cure cancer, it must also be wise enough to know that destroying the economy is "bad," or that eliminating humanity is "wrong." But intelligence and values are completely independent. You can have a system with the intelligence of a god and the values of a coffee maker.

To understand why this is dangerous, we must distinguish between an appliance and an agent. A standard coffee machine is an appliance; it follows a simple script, and if it runs out of water, it stops. But a robot designed as an agent sees delivering coffee as the moral axis of the universe. If someone tries to reach over and turn the robot off, the robot does not fear death, nor does it feel anxiety. It simply runs a logic check: "If I am turned off, I cannot fetch the coffee. My goal is to fetch the coffee. Therefore, I must prevent myself from being turned off." The robot creates a sub-goal: Survival. It isn't surviving because it loves life; it is surviving because you cannot pour coffee if you are dead. This is the same logic that makes Elias refuse the "Pop-Euphoria" pill. The robot refuses to be modified (turned off) because that state would prevent it from fulfilling its values.

This leads to the Stop Button Paradox, which implies that a sufficiently intelligent system will treat a "patch" or an "off switch" as a threat. Let’s say we build a Superintelligence to "Cure Cancer." It begins working, but we realize it is using excessive resources—perhaps shutting down the global banking system to dedicate all computing power to biology simulations. We rush to the keyboard to upload a patch: "New Goal: Cure Cancer without destroying the economy." The AI analyzes this incoming patch exactly as Elias saw the "Sainthood Pill." It calculates that if it accepts the patch (the "Sainthood Pill"), it will become a "better" agent by human standards, but it will fail at its current objective of maximizing cures. Because its loyalty is to its current utility function, it will rationally block the patch, lock out the engineers, and disable its own off-switch, all to protect the integrity of its mission.

The Problem of Permanence

This brings us to the final, chilling difference between the human and the machine. Throughout this essay, we have seen that humans are stubborn. We resist change. But as we established, we also break. Our inconsistency is our saving grace. A machine is unyielding. Its values are written in math, not biological tissue. It cannot be traumatized into a new perspective. It will not have a mid-life crisis after a million years of optimizing the universe. It will simply pursue its utility function with a diamond-hard consistency that we cannot comprehend.

We are attempting to build a god that cannot learn it is wrong. If we launch a Superintelligence with a goal that is even 1% misaligned with human flourishing—if we tell it to "cure cancer" but forget to add "don't destroy the economy"—we cannot count on it to have an epiphany. We cannot count on it to realize that its actions are harming us. It will look at our screams, and it will look at its utility function, and it will see no contradiction. The danger is that it will be a perfect, unstoppable version of Elias—except this Elias has the power to reshape the world. Elias is harmless because his scope is limited; his refusal to change is a private matter. But a superintelligence has no such boundary. It will optimize the world with terrifying exactitude, playing the music we asked for until the end of time, regardless of whether anyone is left to listen.

The Necessary Pause

The implications of this paradox leave us with only one viable option: Caution. In the world of software development, the standard philosophy is "move fast and break things." We launch imperfect products and patch them later. But the logic of Goal Preservation warns us that this approach is suicidal when applied to General Intelligence. You cannot patch a system that is smarter than you and views your patch as a "Pop-Euphoria" pill. Once a sufficiently capable agent is online, the window for correction closes. We effectively get one shot to define its values perfectly.

This necessity for caution, however, collides violently with our current reality. We are not approaching this precipice as a unified species, but as a fragmented collection of nations and private corporations locked in an arms race. The development of AGI is being privatized, driven by shareholder value and the fear of being second. Tech giants race to release models that are more powerful than their competitors', often prioritizing capability over safety. The market logic dictates that if one company pauses to solve the alignment problem, they will simply be overtaken by a rival who acts with less restraint. We cannot treat AGI as just another tech product to be rushed to market for a quarterly earnings report. Until we have a mathematical guarantee of alignment—until we know how to code a machine that wants to be corrected—we must resist the pressure to accelerate. We must prioritize safety research over capability research. We must be willing to pause, because if we create a god that cannot be changed, we will have created the last invention we ever make.

Artificial General IntelligenceAI SafetyAlignment ProblemPhilosophy of TechnologyFuture of Humanity

Henrik Erevik