Predicting Search Satisfaction Metrics with Interleaved Comparisons
Beer&Tech, Criteo, Paris, France. Oct 28, 2015.
Summary
This presentation addresses the challenge of efficiently evaluating search system performance by introducing methods to predict sophisticated user satisfaction metrics using interleaved comparisons. A/B testing remains the gold standard for online retrieval evaluation, but high variance across users means it requires millions of queries to detect statistically significant differences. Interleaved comparisons, which present each user with results from both the control and treatment systems simultaneously, are roughly 100 times more sensitive, yet they have previously been limited to simple click-based metrics. The research demonstrates how interleaved methods can be extended to predict the more sophisticated satisfaction measures traditionally used in A/B testing, combining the statistical power of interleaving with the comprehensive evaluation capabilities of established satisfaction metrics.
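The summary above does not specify which interleaving algorithm the work builds on; as an illustration only, here is a minimal sketch of team-draft interleaving, a standard method for merging two rankings and attributing clicks back to the system that contributed each result. The function names and the simple click-credit scheme are assumptions for this sketch, not the method presented in the talk.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings via team-draft interleaving.

    Returns the merged result list and, for each position, which
    system ('A' or 'B') contributed that document.
    """
    rng = rng or random.Random()
    rankings = {'A': ranking_a, 'B': ranking_b}
    merged, teams, seen = [], [], set()
    count = {'A': 0, 'B': 0}
    while True:
        # Teams that still have documents not yet shown.
        avail = [t for t in ('A', 'B')
                 if any(d not in seen for d in rankings[t])]
        if not avail:
            break
        # The team with fewer contributions so far picks next;
        # ties are broken by a coin flip (the core team-draft rule).
        avail.sort(key=lambda t: (count[t], rng.random()))
        team = avail[0]
        # That team's highest-ranked document not yet shown.
        doc = next(d for d in rankings[team] if d not in seen)
        merged.append(doc)
        teams.append(team)
        seen.add(doc)
        count[team] += 1
    return merged, teams

def credit_clicks(teams, clicked_positions):
    """Attribute each click to the team that placed the clicked result."""
    wins = {'A': 0, 'B': 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1
    return wins
```

In an online experiment, the system whose contributed documents attract more clicks over many impressions is inferred to be better; because every user sees both systems at once, between-user variance largely cancels out, which is the source of the sensitivity gain mentioned above.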
Slides
Links
Related Publications
Multileaved Comparisons for Fast Online Evaluation
Anne Schuth, Floor Sietsma, Shimon Whiteson, Damien Lefortier, and Maarten de Rijke.
In Proceedings of CIKM'14, 2014.