Predicting Search Satisfaction Metrics with Interleaved Comparisons

SIGIR'15, Santiago, Chile. Aug 11, 2015.

Summary

This work addresses the problem of evaluating search systems efficiently by introducing methods that predict sophisticated user satisfaction metrics from interleaved comparisons. A/B testing remains the gold standard for online retrieval evaluation, but because variance across users is high, it requires millions of queries to detect statistically significant differences. Interleaved comparisons are substantially more sensitive because each user sees results from both the control and the treatment system at once. Interleaving was previously limited to simple click-based metrics; the work shows how it can be extended to predict the more sophisticated satisfaction measures traditionally used in A/B testing, combining the statistical power of interleaving with the richer evaluation that established satisfaction metrics provide.
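
To make the idea of an interleaved comparison concrete, here is a minimal sketch of team-draft interleaving, one well-known interleaving method, with the simple click-count credit that earlier interleaving work relied on. It is an illustration only and not the method proposed in the paper: the function names, the document IDs, and the unweighted credit rule are all assumptions made for this example.

```python
import random

def team_draft_interleave(control, treatment, rng, length=10):
    """Merge two rankings into one list; return (list, team of each doc).

    Hypothetical helper for illustration, not code from the paper.
    """
    interleaved, team_of = [], {}
    picks = {"control": 0, "treatment": 0}
    rankings = {"control": control, "treatment": treatment}
    while len(interleaved) < length:
        # Documents each team could still contribute (not yet shown).
        remaining = {t: [d for d in r if d not in team_of]
                     for t, r in rankings.items()}
        if not remaining["control"] and not remaining["treatment"]:
            break
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["control"] != picks["treatment"]:
            team = min(picks, key=picks.get)
        else:
            team = rng.choice(["control", "treatment"])
        # If that team has nothing left, the other team picks instead.
        if not remaining[team]:
            team = "treatment" if team == "control" else "control"
        doc = remaining[team][0]
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of

def click_credit(clicked_docs, team_of):
    """Count clicks per team: the simple, unweighted click-based credit."""
    credit = {"control": 0, "treatment": 0}
    for doc in clicked_docs:
        if doc in team_of:
            credit[team_of[doc]] += 1
    return credit

# Example: one impression with made-up document IDs.
rng = random.Random(0)
shown, team_of = team_draft_interleave(["a", "b", "c", "d"],
                                       ["b", "e", "a", "f"], rng)
print(shown, click_credit(["e"], team_of))
```

Over many impressions, the system whose team collects more credit is preferred. The unweighted count above is the kind of simple click-based signal that the paper aims to move beyond by bringing interleaving outcomes into agreement with A/B satisfaction metrics.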

Related Publications

Predicting Search Satisfaction Metrics with Interleaved Comparisons
Anne Schuth, Katja Hofmann, and Filip Radlinski. In Proceedings of SIGIR'15, 2015.