stat.ML 2026-06-17

Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

Authors

Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh

Abstract

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-judge pipelines.
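
The alignment step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: it is a generic entropic partial optimal transport solver written with NumPy only, using the standard trick of adding a dummy source point that absorbs the unmatched mass. The function name `partial_ot_scores`, the parameters `reg` and `n_iter`, the squared Euclidean cost, and the synthetic 2-D "embeddings" are all assumptions made for illustration. Each unlabelled point receives a score in [0, 1] equal to the mass transported to it from the human-verified positives.

```python
import numpy as np

def partial_ot_scores(pos, unl, reg=0.05, n_iter=500):
    """Entropic partial OT from positives to unlabelled points (hypothetical helper).

    Every positive carries unit mass and must be transported in full; every
    unlabelled point can absorb at most unit mass. Assumes len(unl) > len(pos).
    Returns the mass each unlabelled point receives from the positives.
    """
    n, N = len(pos), len(unl)
    # Squared Euclidean cost, rescaled to [0, 1] for numerical stability.
    cost = ((pos[:, None, :] - unl[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    # A zero-cost dummy source row absorbs the N - n units of unmatched mass,
    # which turns the partial problem into a balanced one.
    cost_ext = np.vstack([cost, np.zeros((1, N))])
    a = np.append(np.ones(n), N - n)  # source masses (positives + dummy)
    b = np.ones(N)                    # target masses (unlabelled points)
    K = np.exp(-cost_ext / reg)
    u = np.ones(n + 1)
    for _ in range(n_iter):           # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    # Drop the dummy row: what remains is the mass sent by real positives.
    return plan[:n].sum(axis=0)

# Synthetic stand-in for a fixed embedding space (assumed data, not from the paper).
rng = np.random.default_rng(0)
pos = rng.normal(0.0, 0.3, size=(10, 2))    # human-verified positives
good = rng.normal(0.0, 0.3, size=(15, 2))   # unlabelled, close to the positives
bad = rng.normal(3.0, 0.3, size=(25, 2))    # unlabelled, far from the positives
unl = np.vstack([good, bad])

scores = partial_ot_scores(pos, unl)
print("mean score, near-positive group:", scores[:15].mean())
print("mean score, far group:", scores[15:].mean())
```

In this toy setup, unlabelled points that lie close to the verified positives absorb most of the transported mass, so the scores can be read as a soft, geometry-based confidence that an output is human-consistent. The paper's actual cost function, mass budget, and correction procedure may differ.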

Paper information
arXiv ID 2606.19057v1
Published 2026-06-17
Updated 2026-06-17
Difficulty level
Intermediate