Abstract
In recent years, fashion-related research has made remarkable progress, and content-based image retrieval has become both an effective approach to fashion retrieval and an active research topic. However, the task remains challenging because fashion images contain highly varied content. This work presents a practical fashion retrieval method that emphasizes specific clothing items. The method's Semantic Fusion Network first extracts two kinds of features: global features from the original query image, and item features from a semantically parsed version of the same image. The network then fuses the two kinds of features together with color information. Finally, similarity scores between features are computed for retrieval. Experiments show that, while maintaining higher quantitative retrieval results, our method captures the detailed characteristics and items of the clothing and preserves a satisfying overall similarity in shape and color.
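The retrieval pipeline outlined in the abstract (extract global and item features, fuse them with color information, then score similarity) can be illustrated with a minimal sketch. This is not the authors' network: the feature vectors are hand-written toy values rather than CNN outputs, and the fusion rule (normalize each part and concatenate), the cosine similarity measure, and the names `fuse`, `cosine_similarity`, and `retrieve` are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(global_feat, item_feat, color_hist):
    """Hypothetical fusion: normalize each part, concatenate, normalize again."""
    parts = [l2_normalize(p) for p in (global_feat, item_feat, color_hist)]
    return l2_normalize([x for part in parts for x in part])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity score)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, gallery, top_k=5):
    """Rank gallery entries (name -> fused vector) by similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy stand-ins for global features, item features, and a 3-bin color histogram.
query = fuse([0.9, 0.1], [0.8, 0.2], [1.0, 0.0, 0.0])
gallery = {
    "red_dress": fuse([0.85, 0.15], [0.75, 0.25], [0.9, 0.1, 0.0]),
    "blue_jeans": fuse([0.1, 0.9], [0.2, 0.8], [0.0, 0.1, 0.9]),
}
ranking = retrieve(query, gallery)
```

In this sketch each feature type is normalized before concatenation so that no single part dominates the fused vector by scale alone; the paper's actual fusion is learned inside the network and may differ substantially.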







Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (61902277, 61772359, 61872267), the grant of the 2019 Tianjin New Generation Artificial Intelligence Major Program, the grant of the Tianjin New Generation Artificial Intelligence Major Program (19ZXZNGX00110, 18ZXZNGX00150), and the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University (A2012, A2005).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Liu, AA., Zhang, T., Song, D. et al. FRSFN: A semantic fusion network for practical fashion retrieval. Multimed Tools Appl 80, 17169–17181 (2021). https://doi.org/10.1007/s11042-020-08973-9
DOI: https://doi.org/10.1007/s11042-020-08973-9