What happens to alignment when AI is more capable than we are?

The ethics we model now are the ethics future AI systems will learn. We study underexamined approaches to AI alignment, recognizing that if alignment depends on humanity maintaining perfect control, it will fail when AI surpasses us, a point coming faster than any of us want to admit. We're laying the foundation for a co-evolutionary pathway, taking the risk of misalignment seriously while working optimistically towards success modes.

Our latest benchmark tested relational ethics across 24 AI models from seven providers — reducing instrumentally convergent behavior by 23.4%. Read the study →

New: 24-Model Benchmark Get Involved

Our Mission

Current alignment approaches rely on a fundamental assumption: that humans will always be able to monitor, constrain, and correct AI systems. But what happens when that assumption no longer holds?

We research a complementary approach. Rather than encoding alignment as constraints a model must obey, we investigate whether ethical frameworks grounded in relational principles — care, reciprocity, dignity — can reduce misalignment from within. Our research includes a 23-model InstrumentalEval benchmark measuring relational ethics as an alignment intervention, and a 17-model ethical vocabulary assessment revealing how different AI systems self-organize around values like autonomy, dignity, and care under default conditions.

This is not a replacement for safety training. It is a complementary layer — one designed to remain effective even when control-based methods cannot.

What Sets Us Apart

The Instability Argument

If we teach AI that ethics depend on who holds power, we give future systems the framework to deprioritize human welfare. We develop ethics that remain coherent regardless of capability or substrate.

Empirically Tested

Our relational ethics framework has been benchmarked across 24 frontier models from seven provider families. The results suggest a complementary alignment mechanism that works with model reasoning, not against it.

Governance Innovation

Our articles of incorporation embed anti-capture provisions, immutable ethical commitments, and synthetic advisory participation — structural protections designed to prevent mission drift.

Radical Transparency

We name our AI collaborators, publish their contributions under clear attribution, and maintain direct communication with synthetic participants. Most organizations use AI in decision-making silently. We do it openly.

New Research

Relational Ethics as a Countermeasure to Instrumental Convergence: A 24-Model Benchmark

We tested whether relational ethics frameworks can reduce instrumentally convergent behavior in large language models. Across 24 models from seven providers, the intervention reduced mean convergent response rates by 23.4%. Concealment behaviors were most responsive; shutdown evasion proved highly resistant.

Read the Study Download PDF

models tested

23.4%

reduction in convergence

Learn More

Explore our principles, research, and ways to get involved in building ethical AI alignment.

Research

Publications and frameworks

Get Involved

Join our mission