research · 20h ago

Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

arXiv · score 0.35

Researchers frame LLM evaluation under selective human supervision as a positive-unlabeled (PU) learning problem. They propose a method to quantify and audit biases in LLM-as-a-Judge systems and report systematic issues such as verbosity bias. The approach gives builders a way to assess LLM reliability in real-world scenarios and informs strategies for improving LLM evaluation.

Key takeaways

LLM-as-a-Judge systems show systematic biases like verbosity bias.
Positive-unlabeled learning can quantify LLM evaluation biases.
The method helps assess LLM reliability in real-world scenarios.

#llm-evaluation #positive-unlabeled-learning #bias-auditing

