FRSFN: A semantic fusion network for practical fashion retrieval

Multimedia Tools and Applications

Abstract

In recent years, fashion-related research has made remarkable progress, and retrieving fashion items from image content has become both an effective approach and a research hot spot. However, the task remains challenging because fashion images contain highly varied content. This work presents a practical fashion retrieval method that emphasizes specific clothing items. The method's Semantic Fusion Network first extracts two kinds of features: global features from the original query image, and item features from a semantically parsed version of the same query image. The network then fuses the two kinds of features together with color information. Finally, similarity scores between features are computed for retrieval. Experiments show that, while maintaining higher retrieval scores, the method captures the detailed characteristics and items of the clothing and keeps a satisfying overall similarity in shape and color.
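The fuse-then-rank flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the features are fused or which similarity measure is used, so the concatenation-based `fuse` function, the cosine score, and all vector values below are assumptions made purely for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(global_feat: Sequence[float], item_feat: Sequence[float],
         color_hist: Sequence[float]) -> List[float]:
    """Hypothetical fusion: normalize each part, concatenate, renormalize.

    This is an assumed stand-in for the network's fusion step.
    """
    fused = l2_normalize(global_feat) + l2_normalize(item_feat) + l2_normalize(color_hist)
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query: Sequence[float], gallery: List[List[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for network outputs (made-up values).
query = fuse([1.0, 0.0], [0.0, 1.0], [1.0, 0.0, 0.0])
gallery = [
    fuse([0.0, 1.0], [1.0, 0.0], [0.0, 0.0, 1.0]),  # differs in every part
    fuse([1.0, 0.0], [0.0, 1.0], [1.0, 0.0, 0.0]),  # identical to the query
    fuse([1.0, 0.0], [0.0, 1.0], [0.0, 1.0, 0.0]),  # same features, other color
]
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the gallery entry that matches the query in global and item features but differs in color ranks between the exact match and the fully different entry, which mirrors the role the abstract assigns to color information.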

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (61902277, 61772359, 61872267), the 2019 Tianjin New Generation Artificial Intelligence Major Program, the Tianjin New Generation Artificial Intelligence Major Program (19ZXZNGX00110, 18ZXZNGX00150), and the Open Project Program of the State Key Lab of CAD & CG, Zhejiang University (A2012, A2005).

Author information

Corresponding authors

Correspondence to Dan Song or Wenhui Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, AA., Zhang, T., Song, D. et al. FRSFN: A semantic fusion network for practical fashion retrieval. Multimed Tools Appl 80, 17169–17181 (2021). https://doi.org/10.1007/s11042-020-08973-9

  • DOI: https://doi.org/10.1007/s11042-020-08973-9