stat.CO

Latest papers in this category

stat.ML 2026-06-17
Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notabl...

Zilong Zhang, Yi-Ting Hung, Lei Ding et al.