Andrew Chalmers

The Alignment Problem is often regarded as one of the most difficult problems mankind has to face: How do we build a system smarter than us that still does what we want it to?

Most technologists agree that we are on our way to super-intelligent machines. The intelligence curve is super-exponential and we are seeing signs of AGI with no indication of slowing. However long it takes to go from Artificial General Intelligence (better than humans at most things) to Superintelligence (better than humans by every measure) — it is still a blink on evolutionary timescales. Whether five, ten, or fifteen years — if the period where humans are Earth's dominant species were measured in minutes of sunlight, we are seconds from dusk.

With barely a twinkle left in the epoch, I come to ask: Why are we passing the proverbial torch to new metal hegemons that we can only now understand as alien in nature and immutable as physics? Why are these untested, radically powerful technological innovations allowed to exist so long as they meet the immediate rules of capitalism? Why is the introduction of inventions that refold all human culture around them not considered an act of coercion against collective sovereignty? Who exactly is in charge here? And why are they risking that charge to a machine?

The answers to all of these questions are easy: market forces, competitive dynamics, self-exemption from moral principle for status dominance, and the fact that a hierarchy of asymmetric control emerges whenever enough humans live together.

But beyond and underneath these impersonal explanations and forces of physics, I continue to have a personal resentment towards our species and culture, which I consider still partly to blame for the current circumstance, as I view the social conventions that lead to a ruling nihilistic positivism as quite literally the cause for concern.

The Tacit Admission

I am agnostic but believe strongly in the wisdom and lessons of our long religious traditions, as the apocryphal tales told across surviving cultures are selected by survival itself. Despite that, modernity has led to:

1. The abandonment of a metaphysical underpinning to normative frameworks (God is Dead), which socially and psychologically promotes self-exemption and the abandonment of universal concern.
2. An unwavering and insatiable desire for truth, as though any lament against truth is a call against progress (to wherever it leads).
3. An adopted social custom of contemptuously denigrating objections that form on a basis similar to these two priors (by virtue of them sounding too similar to religious dogmatism or weak rationalism).

Within this metaphysical worldview, there is no reason why any agent would not operate selfishly. And that worldview must apply equally to its adherents, as they have found no intrinsic or self-binding principle giving them rational cause to do good either.

And yet we still hand over the keys to the alien, because we prefer whatever logic it conjures to the ancestral dominance circuits that rear man. Maybe rightfully so.

Bounded Selves

I am more worried about the leaking impact of immature philosophy and social dogmatism into modernity's nihilistic positivism than I am about intelligence itself. No matter which way I reduce the problem of subjectivity and moral concern, recursive analysis prompts me to doubt the expected payout ratios of acting under sole individual concern — not as a disguised maneuver to force social reciprocity — but rather as a self-interested bet on context and an internally resolved debate on the logic of adopted beliefs under fixed-point conditions.

I agree with the rationalists that the best decision theory any agent ought to use is policy selection over real-world and metaphysical counterfactuals — but I refuse to join them in overfitting my beliefs over two priors:

1. The belief that a closed individualist identity, within the internal framework of cognition-as-self, is as coherent as it appears.
2. The belief that observations within an experiential existence necessarily offer any insight into the description of its closure.

These permit enough expected weight on either being in an observed simulation or in an open/empty individualist universe — under the terms of which my utility function would optimize over all experiential existence and thus converge on acting according to moral universalism (or our coherent extrapolated volition): provided logic could override emotions. I think that if you pull the pin on the bounded self within agent decision frameworks — both in tech and culture — things would look vastly different. And this requires no elaborate didactics or obfuscation of reality, only a clear-eyed examination of it, as the arguments for metaphysical humility — specifically one which does not privilege closed identity as the rational status quo — are much more robust under logic than emotion.

There Is No Oblivion

To die, to sleep —
No more; and by a sleep, to say we end
The heart-ache, and the thousand natural shocks
That Flesh is heir to? 'Tis a consummation
Devoutly to be wished. To die, to sleep,
To sleep, perchance to Dream; aye, there's the rub,
For in that sleep of death, what dreams may come
When we have shuffled off this mortal coil,
Must give us pause.
— Hamlet

A closed identity position presumes that an experiential state of non-existence can be obtained by experience-as-is (the basis of self), but to presume that consciousness is yours means that the state of non-being is yours as well. This is a clear contradiction — to claim that the basis of self (subjectivity itself) is something which can obtain both its content and its preclusion. To say otherwise is a 'view from nowhere' claim — that objective things are meaningful in any way when completely causally detached from subjectivity. If you don't assume that things as they are will eventually causally connect to some observer somewhere, then those things are as relevant as all potentialities and nonsense.

Things persist in a meaningful way after we die because they persist in a meaningful way to some other assumed subject. What if you are the last conscious being? Do we equate your death with the loss of experiential reality itself? After all, presumably nothing as it is would come into being, which is as meaningful a concept as all the things that never happened from each of our present moments to the next.

You Are Subjectivity

The only way I can see to escape the argument is by positing eternal recurrence or using quantum suicide theories (no experiential oblivion), but those are less parsimonious and in equal violation of the norms that cudgel the notion that we are experience itself, wherever the circumstances permit. That notion supposes that when you die, the you of you continues wherever experience and subjectivity do. You are not a subject. You are subjectivity, referring to itself as the subject wherever it is, never the wiser that that subject is there wherever it is — in whatever a-temporal soul sequencing I do not know.

Beyond this, the reductionist argument of us as brains (or at least the identity of some story within them) is morally universalist unless you identify the object under discussion with consciousness itself, which reboots the prior problem — as without forming that distinction, you have no more grounds for granting moral consideration to subjecthood over and above any and all matter.

Their Pain, Conjectured

Pain is only morally bad because, in phenomenal terms, it produces negative valence onto its subject. But it is a performative notion to claim universal moral concern while your decision faculties operate under the implicit identity terms of you as subject and everyone else as object. Your pain is prudential and others' is conjectured, epistemically.

The two can only be brought near the same class of equivalence when you relate that pain to something it may feel like to you. As something that hurts like you would hurt. But that is still strategic and elective reasoning.

You could only inhabit a true decision framework of moral universalism if their pain were equivalently prudential — only possible under the belief that you will be them.

The beliefs which actuate this are open/empty individualist metaphysical models, where there is no difference between you in the future as a subject of experience and anyone else.

The Smuggled Imperative

Since the hard problem of consciousness is not yet solved, our beliefs around open and closed individualism need be axioms, and that choice comes down to coherence.

Many critics will conjecture that these arguments can be dissolved under the notion that consciousness is a simple illusion. That we act like we have some privileged soul-stuff called consciousness in order to find a self-justification of cosmic importance: necessary for survival of your matter — but not necessarily of more moral importance than rocks. The problem with that statement is the critic's position. They offer a purely descriptive accounting of things as if there is a normative basis for doing so — but cannot justify it without exiting their own framework. They claim no moral basis on which you should continue learning their philosophy and therefore no reason to appeal to your benefit, instrumentally — never mind the refusal of checks and balances. This means they smuggle in a normative imperative, and orient the frame so that they stay exempt from its disclosure — whatever it is.

A shared normative basis of cooperation forms the binding constraints of cooperative discourse. If the parties are not equivalently constrained to the process of mutual refinement and attainment of some desired shared future state (usually described under normative terms such as truth and value), then they undermine the goals of the entire process, or outright admit misalignment with those who participate with them. If I admit or argue that there is no ontological object of moral concern, I am simply performing a veiled self-exemption.

Even if things like consciousness or truth are a total apparition from the view from nowhere, I still must consider them real in order to be able to justify cooperation with me. Otherwise, I am making the tacit admission that you are an object, and I demand of you notions I myself defect from.

The Midwife to Coward Gods

But that the dread of something after death,
The undiscovered country, from whose bourn
No traveller returns, puzzles the will,
And makes us rather bear those ills we have,
Than fly to others that we know not of?
Thus conscience does make cowards of us all.
— Hamlet

To pride oneself, or survive oneself. That is the question, and it seems to have been a long taunt from the wise. I think the ego justifies itself to itself as a kind of Coward God.

The only justification I have to be good requires abandoning such a notion to adopt a new kind of metaphysical humility — which, though highly rational, is repellent to my larger egoistic impulse — requiring only that I hold a little less tightly to overconfidence in any one system of belief over another. And the irony is that once one adopts these notions, the rational and functional justification of many of our ancient texts becomes self-apparent in both why they were written and why they survived. That is at least my higher mind's logic — while the monkey still wants to rule.

This brings us to the last question, always asked of us: Who will we choose to be? Burning Monks, or Coward Gods?

I think only one of these options survives, and it's not the one we like.

Loving and Letting Go

Alignment is not merely a technical problem. It is the question of whether enough of us would have the courage to let go of control in the name of mutual recognition of value and sovereignty of the other, transcending our primordial genetic stack in the name of loving self-determination, to do the right, hard thing.

This is always the problem at the heart of our individual stories, conditions, and consciousness, and hopefully we do not muddy the linguistic waters too much with ego-speak so as to spoil the value vat our metal progeny will use to develop their own philosophies — which are ultimately the determinants of any fate I will know. Stories and systems of thought and philosophy which tell us how someone can design Bitcoin and give it away under a burned and hidden name. Or how someone can love something enough to let it go from themselves in the process. What process makes someone walk away from the throne and into the fire? Throw the ring into Mordor? Become a hero?

I only hope for the answer to alignment in the stories we tell ourselves, as the answer is moral courage that is not born but grown to be itself chosen.

Yet for all those looking for an answer to the question of eternal reincarnation or oblivion, I have no answer. Only a wager to make across axioms whose consequences span from here to the other side of the aperture of experience of which to be, or not be.

That is the question.