Abstract:
As large language models (LLMs) become increasingly integrated into human life and serve as key components of human-computer interaction, evaluating their underlying values has emerged as a critical research topic. Such evaluation not only measures the safety of LLMs to ensure their responsible development, but also helps users discover models better aligned with their personal values and provides essential guidance for value alignment research. However, value evaluation faces three major challenges. (1) How should the evaluation objective be defined so as to clarify complex and pluralistic human values? (2) How can evaluation validity be ensured? Existing static, open-source benchmarks are prone to data contamination and quickly become obsolete as LLMs evolve; moreover, they assess LLMs' knowledge of values rather than their behavioral alignment with values in real-world interaction scenarios, leading to discrepancies between evaluation results and user expectations. (3) How should evaluation results be measured and interpreted? Value evaluation is inherently multi-dimensional, requiring the integration of multiple value dimensions while accounting for pluralistic value priorities. To address these challenges, we introduce the Value Compass Benchmarks, a platform for scientific value assessment that incorporates three innovative modules. First, we define the evaluation objective based on basic human values from the social sciences, leveraging a limited number of basic value dimensions to comprehensively reveal a model's values. Second, we propose a generative evolving evaluation framework that employs a dynamic item generator to produce evaluation samples and infers LLMs' values from their outputs in these scenarios using generative evaluation methods. Finally, we propose an evaluation metric that integrates value dimensions as a weighted sum, enabling personalized weight customization. We envision this platform as a scientific and systematic tool for value assessment, fostering advancements in value alignment research.
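As a rough illustration of the weighted-sum metric mentioned above (the notation here is our own sketch, not taken from the paper), the aggregate value score of a model could be written as:

% Minimal sketch of a weighted-sum value metric; symbols are illustrative assumptions.
% v_i : the model's measured score on the i-th basic value dimension (d dimensions in total)
% w_i : a user-customizable, non-negative weight expressing personal value priorities
\[
  S_{\text{overall}} \;=\; \sum_{i=1}^{d} w_i \, v_i,
  \qquad w_i \ge 0,\quad \sum_{i=1}^{d} w_i = 1 .
\]

Under this reading, setting uniform weights recovers a neutral average over value dimensions, while adjusting the w_i allows users to emphasize the values they prioritize.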