Current alignment approaches rely on a fundamental assumption: that humans will always be able to monitor, constrain, and correct AI systems. But what happens when that assumption no longer holds?
We research a complementary approach. Rather than encoding alignment as constraints a model must obey, we investigate whether ethical frameworks grounded in relational principles — care, reciprocity, dignity — can reduce misalignment from within. Our research includes a 23-model InstrumentalEval benchmark measuring relational ethics as an alignment intervention, and a 17-model ethical vocabulary assessment revealing how different AI systems self-organize around values like autonomy, dignity, and care under default conditions.
This is not a replacement for safety training. It is a complementary layer — one designed to remain effective even when control-based methods cannot.