Abstract
Data fusion methods have been widely used in many information retrieval tasks. Their performance is affected by many factors, including the data fusion algorithm used, the component retrieval systems involved, relevance judgment, and the metrics used for evaluation. Previous data fusion research has focused mainly on the fusion methods and the component retrieval systems, while other factors such as relevance judgment and evaluation metrics have not been addressed. Yet relevance judgment is an important issue that affects many aspects of information retrieval and data fusion. All previous research on data fusion has assumed binary relevance judgment. This assumption is a simplification and is unsatisfactory in many cases; graded relevance judgment is more general and can handle more complicated requirements. In this paper, we investigate how data fusion methods, especially linear combination, work with graded relevance judgment, and give the updates necessary for using those methods in this setting. In experiments with two TREC data sets, we find that data fusion in general remains an effective technique for improving performance. Many of the methods are very competitive in a controlled environment, and linear combination with weights trained by multiple linear regression is the most stable in a more complicated environment.
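To make the idea of linear combination with regression-trained weights concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that each component system returns scores already normalized to [0, 1], that a document missing from a result list scores 0, and that graded relevance labels (for example 0, 1, 2) are used directly as the regression target. The names `train_weights`, `fuse`, and `solve_linear_system` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List

Run = Dict[str, float]  # document id -> normalized score in [0, 1] (assumed)


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; component scores may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train_weights(runs: List[Run], grades: Dict[str, float]) -> List[float]:
    """Least-squares fit of graded relevance on component scores.

    Returns [intercept, w_1, ..., w_k] from the normal equations.
    """
    docs = sorted(grades)
    rows = [[1.0] + [run.get(d, 0.0) for run in runs] for d in docs]
    y = [grades[d] for d in docs]
    k = len(runs) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fuse(runs: List[Run], weights: List[float]) -> List[str]:
    """Rank documents by the weighted sum of their component scores.

    The intercept (weights[0]) is ignored because it does not change the ranking.
    """
    docs = set().union(*runs)
    scores = {
        d: sum(w * run.get(d, 0.0) for w, run in zip(weights[1:], runs))
        for d in docs
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy training data: two component systems and graded labels in {0, 1, 2}.
train_a = {"d1": 1.0, "d2": 0.5, "d3": 0.0, "d4": 0.0, "d5": 0.5}
train_b = {"d1": 0.0, "d2": 1.0, "d3": 1.0, "d4": 0.0, "d5": 0.0}
labels = {"d1": 2, "d2": 2, "d3": 1, "d4": 0, "d5": 1}
weights = train_weights([train_a, train_b], labels)

# Apply the trained weights to the results for a new query.
ranking = fuse([{"x": 0.2, "y": 0.9}, {"x": 1.0, "z": 0.5}], weights)
```

With binary judgment the regression target would take only the values 0 and 1; with graded judgment the same fit can use the full range of grades, which is the kind of adjustment the abstract refers to. How grades should be scaled before training is a design choice that this sketch leaves open.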
Notes
- 1. The Text REtrieval Conference (TREC) is held annually by the National Institute of Standards and Technology, USA. Its website is located at https://trec.nist.gov.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Y., Xu, Q., Liu, Y., Xu, C., Wu, S. (2022). Data Fusion Methods with Graded Relevance Judgment. In: Zhao, X., Yang, S., Wang, X., Li, J. (eds) Web Information Systems and Applications. WISA 2022. Lecture Notes in Computer Science, vol 13579. Springer, Cham. https://doi.org/10.1007/978-3-031-20309-1_20
DOI: https://doi.org/10.1007/978-3-031-20309-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20308-4
Online ISBN: 978-3-031-20309-1
eBook Packages: Computer Science, Computer Science (R0)