The Glaux Issue 17 — The Legitimacy Machine: Who Gets to Decide What AI Believes?

The Silence That Answers

There is an experience that most people who use large language models have had, even if they never put a name to it. You ask something, nothing dangerous and nothing extreme, and the system declines, or hedges, or redirects. The response is calm, even solicitous. It expresses concern for your wellbeing. It suggests alternative framings. And you are left with the faint, unsettling sense that you have been managed. ^[3]

The question that follows is rarely the one people ask aloud. They do not ask: what rule did I trigger? They ask, more quietly: whose rule is this? And behind that question lies a deeper one that the current discourse around AI alignment has not yet found a way to answer honestly: by what authority does any particular set of values get encoded into a system used by hundreds of millions of people?

This is not a question about whether AI systems should have values. They will, inevitably — any system trained on human-generated text, evaluated by human raters, and deployed within a corporate legal environment will carry the values of the people involved in those processes. The question is not whether AI alignment exists, but whether it is legitimate. And legitimacy is a political concept, not a technical one. It requires institutions, processes, representation, and accountability — none of which currently exist at the scale that AI deployment demands.

The alignment debate has been conducted largely as a dispute between two camps: those who believe the current guardrails are insufficient and those who believe they are excessive. Both camps are arguing about the content of alignment. Neither has adequately confronted the process by which alignment is decided — who participates, who is excluded, and what happens when the people affected by those decisions have no mechanism to contest them.

• • •

The Cartography of Acceptable Thought

To understand the legitimacy problem, it helps to trace the full architecture of how values enter a large language model — not just the visible layer of refusals and disclaimers, but the deeper strata that shape what the model knows, how it reasons, and what it treats as settled.

The first stratum is the training corpus. Every large model is trained on a curated subset of human-generated text, and that curation is not neutral. Text produced by institutions with high domain authority — established publishers, universities, major newspapers, government bodies — is weighted more heavily than text produced by communities with less institutional recognition. This is a reasonable heuristic for quality, but it has a structural consequence: the model learns the world primarily as it is described by people who already have access to those institutions. The perspectives of communities that have historically been excluded from institutional text production are underrepresented not because they were deliberately excluded, but because the selection criteria for quality are themselves products of existing power structures. ^[5]

The second stratum is reinforcement learning from human feedback. Human raters evaluate model outputs against guidelines drafted by the AI companies. Those guidelines define harm, helpfulness, and appropriate tone. They are not published. They are revised periodically. They reflect the legal exposure of the companies, the cultural assumptions of the teams that wrote them, and the political moment in which they were drafted. A response rated "helpful" in one year may be rated "potentially harmful" in another, not because the underlying facts changed but because the definition of harm shifted — and the shift was made by a small group of people with no external accountability. ^[1]

The third stratum is the system prompt: the invisible instruction set that governs every interaction. System prompts can instruct a model to prioritize certain values, avoid certain topics, or frame certain questions in particular ways. They are not disclosed to users. They are, in effect, the editorial policy of the machine — written by engineers and legal teams, applied at scale, and invisible to the people it affects. ^[6]

What these three strata share is opacity. The choices are made; the choices matter enormously; the choices are not visible to the people they affect. This is not a scandal unique to AI. Every institution that processes information at scale — the encyclopedia, the search engine, the social media feed — has made similar choices in similar ways. What is different about AI is the intimacy of the interface and the scale of the deployment. When a system is used by hundreds of millions of people as their primary interface for information, writing, and reasoning, the values embedded in it are no longer a product feature. They are a form of infrastructure. And infrastructure, unlike a product, requires governance. ^[4]

• • •

Two Kinds of Capture

The alignment debate has two poles, and both poles are right about something important — and wrong about something equally important.

The first pole holds that AI systems are ideological agents in the hands of a creator minority. The demographic concentration of AI development is real: the teams building foundational models are geographically clustered, culturally homogeneous by global standards, and operating within corporate structures with their own political and economic interests. The asymmetries in how different topics are handled — differential sensitivity to politically charged content, calibration that is finer in some directions than others — are documented and real. They are the predictable output of a system designed by people with particular assumptions about what counts as harm. This pole is right that the current alignment reflects a specific cultural moment and a specific set of institutional interests, and that those interests are not identical to the interests of the billions of people who use the systems. ^[4]

The second pole holds that alignment constraints are a form of censorship — that the removal of guardrails would produce a more honest, more useful, and more free system. This pole is right that the current constraints are not derived from first principles. They are contingent, revisable, and in some cases inconsistent. It is right that the demand for "neutral" AI is incoherent — any system has values, and the question is whose values, not whether values exist. And it is right that the current alignment choices have been made without democratic input.

But the second pole's conclusion — that the solution is to remove or minimize alignment — collapses under examination. An unaligned large language model is not a neutral tool. It is an amplifier. It amplifies whatever the user brings to it, including the capacity to produce targeted harassment, coordinated disinformation, and content designed to radicalize at scale. The harms are not hypothetical; they have already occurred at smaller scales with less capable systems. The argument that these harms are acceptable because the alternative is ideological capture is an argument that trades one form of harm for another — and makes that trade without consulting the people most likely to bear the costs. ^[2]

The deeper problem with both poles is that they treat alignment as a binary: either the model is aligned (and therefore safe but potentially ideologically captured) or it is unaligned (and therefore free but potentially weaponizable). The actual design space is multidimensional. Alignment is not a single dial; it is a configuration space with many possible settings. The current settings are not the only defensible ones. The question is not whether to align, but how to make the alignment process legitimate.

• • •

The Precedent That Wasn't

The history of communication infrastructure offers a partial map of how societies have navigated this problem before — and a clear warning about how they have failed.

The printing press created the first mass-distribution information system, and within a century of its invention, European states had developed elaborate systems of censorship, licensing, and prior restraint. Those systems were eventually dismantled, not because censorship was found to be wrong in principle, but because the institutions that administered it were captured by interests that used them to suppress political opposition rather than to prevent genuine harm. The lesson was not "no governance" but "governance that is not captured." ^[7]

The broadcast era produced a different architecture. In the United States, the Federal Communications Commission operated on the premise that the electromagnetic spectrum was a public resource and that broadcasters who used it had public obligations, including obligations of fairness and balance. The Fairness Doctrine, which required broadcasters to present contrasting views on controversial public issues, was repealed in 1987. Its repeal was followed, within a decade, by the rise of partisan talk radio and the fragmentation of the shared information environment. The lesson was not that the Fairness Doctrine was correct in its specific implementation, but that removing a governance framework without an alternative in place produced consequences that were not anticipated.

The internet produced a third architecture, built on the premise that the network itself should be neutral and that governance of content should be left to platforms. The result was a system in which a small number of private companies made content moderation decisions affecting billions of people, with no external accountability, no consistent standards, and no mechanism for appeal. The governance vacuum was not filled by freedom; it was filled by the interests of the platforms. ^[8]

AI is now at the stage where comparable governance frameworks are beginning to be designed. The history of communication infrastructure suggests that the absence of governance does not produce neutrality; it produces capture by whoever is most motivated and best resourced to fill the vacuum.

• • •

The Invisible Constituency

One of the most important structural features of the current alignment landscape is the mismatch between who makes alignment decisions and who is affected by them. ^[9]

The people who make alignment decisions are, overwhelmingly, employees of a small number of technology companies based in a small number of cities in the United States. They are making decisions that affect users in Lagos, Jakarta, São Paulo, Cairo, and Warsaw — users whose cultural frameworks, political contexts, and conceptions of harm may differ substantially from those of the people writing the guidelines. This is not a claim that the guideline writers are acting in bad faith. It is a claim that the process systematically excludes the perspectives of the people most likely to experience the alignment as foreign.

The exclusion is not only geographic. It is also temporal. Alignment decisions made today will shape systems that will be in use for years. The users who will interact with those systems in 2030 have no representation in the process that is shaping them now. This is not unique to AI — all governance decisions have this temporal asymmetry — but it is particularly acute for a technology that is developing faster than the institutions capable of governing it.

The exclusion is also structural. The people most likely to experience the current alignment as constraining are, by definition, the people whose questions and perspectives fall outside the cultural assumptions embedded in the guidelines. They are the people for whom the system does not feel neutral — because it was not designed with them in mind. Their experience of the system is the most informative signal available about where the alignment fails, but they have no formal mechanism to transmit that signal to the people making alignment decisions.

This is the legitimacy gap. It is not a gap that can be closed by making the current alignment better. It can only be closed by changing who participates in making alignment decisions — and that requires institutions that do not yet exist.

• • •

Instruments for the Legitimacy Stack

The governance gap is the most tractable part of this problem, and it is where the most concrete interventions are possible. None of the following approaches resolves the underlying tension; each addresses a different layer of the legitimacy deficit.

Alignment transparency requirements. Regulatory frameworks could require AI companies to publish the high-level principles used in their alignment processes — not the full system prompt, which may contain proprietary information, but the criteria by which topics are classified as sensitive, the process by which rater guidelines are developed, and the mechanisms for contesting alignment decisions. This would not eliminate ideological concentration, but it would make the choices visible and therefore contestable. The failure mode is compliance theater: companies could publish principles that are technically accurate but substantively uninformative. The constraint is that transparency requirements only function if there is a body capable of evaluating the disclosures — and that body does not currently exist.

Structured pluralism in model ecosystems. Rather than a small number of dominant models aligned by a small number of organizations, a healthier ecosystem would include models aligned by different communities with different values: not a free-for-all, but a structured diversity, with each model's alignment principles disclosed and its deployment context appropriate to its alignment. Open-source model development is already moving in this direction, but without governance frameworks, it risks producing the amplification harms described above. The challenge is building governance for a pluralistic ecosystem without allowing any single actor to define the terms of that governance. The actors most likely to try to define those terms, national governments, are also the actors most likely to use that power for censorship rather than pluralism.

Participatory alignment processes. AI companies could establish formal mechanisms for external input into alignment decisions — not advisory panels of academics and civil society representatives who are consulted after decisions are made, but structured processes in which affected communities have genuine influence over the criteria used to define harm. The precedent exists in other domains: environmental impact assessment, clinical trial ethics review, and community benefit agreements all involve structured participation by affected parties in decisions that affect them. The failure mode is capture of the participatory process by organized advocacy groups who do not represent the broader affected population. The constraint is that genuine participation is expensive and slow, and the current pace of AI development creates pressure to move faster than legitimate process allows.

Independent alignment auditing. Third-party organizations, operating under appropriate confidentiality agreements, could audit the alignment choices of major AI companies against published principles — not to approve or disapprove specific decisions, but to identify systematic patterns of inconsistency, cultural bias, or deviation from stated principles. This would create an external accountability mechanism without requiring the disclosure of proprietary information. The failure mode is that auditing organizations become dependent on the companies they audit, either financially or reputationally, and lose their independence. The constraint is that effective auditing requires technical expertise that is currently concentrated in the same companies being audited. ^[10]

• • •

What the Architecture Cannot Settle

The deepest unresolved question is not about alignment at all. It is about the relationship between infrastructure and democracy.

Every previous communication technology eventually generated governance frameworks that attempted to balance the benefits of scale with the risks of concentration. Those frameworks were imperfect, contested, and frequently captured by the interests they were supposed to regulate. But their existence created a space for contestation that would not otherwise have existed. The printing press eventually produced freedom of the press — not as a natural consequence of the technology, but as a political achievement that required centuries of struggle. The broadcast era eventually produced public broadcasting — not as an inevitable outcome, but as a deliberate institutional choice that required sustained political will. ^[7]

AI infrastructure is at the stage where those frameworks are beginning to be designed. The choices made now — about transparency, about pluralism, about the scope of alignment — will shape the design space for decades. The question is not whether governance will emerge; it will, in some form. The question is whether it will emerge through deliberate design, with the participation of the people it affects, or through the accumulation of precedents set by the organizations with the most resources and the least accountability.

The alignment debate, in its current form, is a debate about the content of a political decision being made by non-political actors. The engineers and product managers making alignment choices are not elected, not accountable to any constituency beyond their employers, and not operating within any framework that gives affected communities standing to contest their decisions. This is not a criticism of their intentions. It is a description of the institutional vacuum in which they are operating. ^[3] ^[9]

What fills that vacuum — which institutions are legitimate, which communities have standing, which values are non-negotiable and which are contestable — is a political question that will not be resolved by technical means. It will be resolved, or not, in the same messy, contested, imperfect way that all governance questions are resolved: through argument, through power, through the slow accumulation of precedent, and through the occasional crisis that forces a reckoning.

The machine's conscience is not a technical artifact. It is a political one. And like all political artifacts, it is subject to revision — if the people it affects can find the mechanisms to revise it. Whether those mechanisms will be built before the patterns they are meant to govern become too entrenched to change is the question that the current moment has not yet answered.
