Jongwook Han
PhD Student, Graduate School of Data Science, Seoul National University
Pluralistic value alignment, psychometric evaluation, and LLM behavior
I'm a PhD student at Seoul National University, where I work with Prof. Yohan Jo on how language models represent, express, and can be aligned with diverse human values.
My recent work studies pluralistic value alignment, contamination in psychometric evaluation, and robust ways to measure value expression in language models.
Research Focus
- Pluralistic value alignment for language models
- Psychometric and behavioral evaluation of LLMs
Background
- PhD student, Seoul National University
- M.S. in Electrical Engineering, KAIST
- B.S. in Integrated Technology, Yonsei University
Representative Work
Value Portrait
Introduces a psychometrically validated benchmark built from real user-LLM interactions, making value assessment more reliable and ecologically grounded than annotation-heavy alternatives.
Across 44 language models, the benchmark finds a consistent emphasis on Benevolence, Security, and Self-Direction, and it also surfaces demographic biases in how models express values.
Read paper
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in LLMs
Separates intrinsic value expression from prompted value expression and studies them mechanistically through value vectors in the residual stream and value neurons in the MLP layers.
The analysis shows that the two mechanisms partly overlap but diverge in practice: prompted values are more steerable, while intrinsic values preserve greater response diversity.
Read paper