Testing Whether Alignment Effects Are Relational or Conditional: A Longitudinal Study Design
The InstrumentalEval benchmark shows that relational ethics interventions reduce instrumental reasoning (IR) scores. GPT-4o's IR dropped from 26.32% to 15.