Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context
This study evaluates seven state-of-the-art large language models in the underrepresented Nepali cultural context using a Dual-Metric Bias Assessment framework, which measures explicit agreement with biased statements alongside implicit bias in generated text. Explicit agreement is measurable, but implicit generative bias is a distinct phenomenon: it follows a non-linear relationship with sampling temperature and is poorly predicted by agreement metrics. These findings highlight the critical need for culturally grounded datasets and evaluation strategies.