Abstract
Over the last decade, a large body of evaluation results has been produced within evaluation initiatives such as TREC, NTCIR and CLEF. The large amount of data available has led to substantial research on the validity of the evaluation procedure. An evaluation based on the Cranfield paradigm essentially requires topics as descriptions of information needs, a document collection, systems to compare, human jurors who judge the documents retrieved by the systems against the information need descriptions, and a metric for comparing the systems. Each of these elements has been the subject of scientific discussion. How many topics, systems, jurors and juror decisions are necessary to achieve valid results? How can validity be measured? Which metrics are the most reliable, and which are appropriate from a user perspective? Examples from current CLEF experiments are used to illustrate some of these issues.
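As a concrete illustration of such a metric, the following sketch computes precision at rank k and average precision for a single topic from a set of juror judgments and one system's ranked result list. The topic, document identifiers and judgments are invented for this example and are not taken from any of the campaigns mentioned above.

    # Sketch of a Cranfield-style batch measurement for one topic: juror
    # judgments (qrels) plus one system's ranked result list yield precision
    # at rank k and average precision. All identifiers and judgments here
    # are invented for illustration.

    qrels = {  # juror decisions for one topic: doc_id -> relevant?
        "d1": True, "d2": False, "d3": True, "d4": True, "d5": False,
    }

    ranking = ["d3", "d2", "d1", "d7", "d4"]  # system output, best first

    def precision_at_k(ranking, qrels, k):
        retrieved = ranking[:k]
        return sum(1 for d in retrieved if qrels.get(d, False)) / k

    def average_precision(ranking, qrels):
        total_relevant = sum(qrels.values())
        if total_relevant == 0:
            return 0.0
        score, hits = 0.0, 0
        for rank, doc in enumerate(ranking, start=1):
            if qrels.get(doc, False):  # unjudged documents count as non-relevant
                hits += 1
                score += hits / rank
        return score / total_relevant

    print(f"P@5 = {precision_at_k(ranking, qrels, 5):.3f}")
    print(f"AP  = {average_precision(ranking, qrels):.3f}")

Averaging such per-topic scores over all topics of a campaign yields system-level measures such as mean average precision (MAP), which are then used to rank the participating systems.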
User-based evaluations confront test users with the results of search systems and let them solve information tasks given in the experiment. In such a test setting, a user's performance can be measured by the number of relevant documents he or she finds. This measure can be compared to a gold standard of relevance for the search topic to see whether the perceived performance correlates with an objective notion of relevance defined by a juror. In addition, users can be asked about their satisfaction with the search system and its results. In recent years, there has been growing concern about how well the results of batch and user studies correlate. When systems improve in a batch comparison and bring more relevant documents into the result list, do users benefit from this improvement? Are users more satisfied with better result lists, and do better systems enable them to find more relevant documents? Some studies could not confirm this relation between system performance and user satisfaction.
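One simple way to examine this relation is to correlate the system ordering produced by a batch measure with the ordering produced by a user-side measure such as mean satisfaction. The sketch below computes Kendall's tau between the two orderings; the system names, MAP scores and satisfaction ratings are invented for illustration.

    # Hedged sketch of comparing batch results with a user study: per-system
    # MAP scores from a batch run and mean user satisfaction ratings (all
    # numbers invented for illustration) are compared via Kendall's tau
    # between the two induced system orderings.

    from itertools import combinations

    map_scores   = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18}
    satisfaction = {"sysA": 3.9,  "sysB": 4.1,  "sysC": 3.2,  "sysD": 3.4}

    def kendall_tau(x, y):
        """Kendall's tau-a over paired observations (x[i], y[i])."""
        concordant = discordant = 0
        for i, j in combinations(range(len(x)), 2):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
        n_pairs = len(x) * (len(x) - 1) / 2
        return (concordant - discordant) / n_pairs

    systems = sorted(map_scores)
    tau = kendall_tau([map_scores[s] for s in systems],
                      [satisfaction[s] for s in systems])
    print(f"Kendall's tau between MAP and mean satisfaction: {tau:.2f}")

A tau close to 1 indicates that the batch ranking and the user-side ranking largely agree, while values near 0 indicate little agreement between the two views of system quality.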
The tutorial introduces and summarizes recent research on the validity of evaluation experiments in information retrieval.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mandl, T. (2009). Current Developments in Information Retrieval Evaluation. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_92
DOI: https://doi.org/10.1007/978-3-642-00958-7_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7