publications
Papers, preprints, and workshop publications.
2026
2025
- Dual Mechanisms of Value Expression: Decomposing Intrinsic and Prompted Values in Language Models. In Mechanistic Interpretability Workshop at NeurIPS 2025, 2025. Workshop paper.