A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting
Renate Burema, Anne Schuth, Christopher Spelt, and Dong Nguyen. In Proceedings of the Language Resources and Evaluation Conference (LREC 2026), 2026.
Abstract
We present a Dutch benchmark for assessing whether large language models (LLMs) exhibit social biases in hiring decisions, focusing on gender and country of origin. We experiment with two approaches: describing the applicants’ demographics explicitly and using first names as proxies. We evaluate both monolingual and multilingual LLMs and find that all tested models (gpt-4o-mini, claude-3.5-haiku, Geitje-7B-Ultra, and EuroLLM-9B-Instruct) exhibit some degree of social bias in their decisions. Furthermore, all tested models are sensitive to how the prompts are worded. The benchmark is publicly available under an EUPL-1.2 license at https://github.com/MinBZK/llm-benchmark/tree/main/benchmarks/social-bias.
Links
A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting
https://github.com/MinBZK/llm-benchmark/tree/main/benchmarks/social-bias
https://doi.org/10.63317/3gdjhdj7otjm
Supervised Student
Renate Burema
Intern, 2024-2025
Bib
@inproceedings{burema2026,
  title = {A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting},
  author = {Renate Burema and Anne Schuth and Christopher Spelt and Dong Nguyen},
  year = {2026},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC 2026)},
  doi = {10.63317/3gdjhdj7otjm}
}