Abstract
With the rapid growth of online shopping services, millions or even billions of commercial item images are available on the Internet. Effectively leveraging visual search to find the items that interest users is an important yet challenging task. Besides global appearance (e.g., color, shape or pattern), users often pay more attention to the local styles of certain products, so an ideal visual item search engine should support detailed and precise search of similar images, which is beyond the capabilities of current search systems. In this paper, we propose a novel system named iSearch, which combines global and local matching of local features to retrieve item images precisely in an interactive manner. We extract multiple local features, including scale-invariant feature transform (SIFT) descriptors, regional color moments and object contour fragments, to represent the visual appearance of items sufficiently, and we support both global and local matching over large-scale image datasets. To this end, we develop an effective method for encoding and indexing contour fragments. Meanwhile, to improve the matching robustness of local features, we encode the spatial context with grid representations and propose a simple but effective verification approach that uses triangle relation constraints for spatial consistency filtering. Experimental evaluations show promising results for our approach and system.
Acknowledgments
The authors would like to thank the anonymous reviewers for their constructive comments on the improvement of this manuscript. This work was supported by National Natural Science Funds of China (61033012, 61173104 and 61103059).
Cite this article
Li, H., Wang, X., Tang, J. et al. Combining global and local matching of multiple features for precise item image retrieval. Multimedia Systems 19, 37–49 (2013). https://doi.org/10.1007/s00530-012-0265-1
DOI: https://doi.org/10.1007/s00530-012-0265-1