Small Object Recognition Based on the Generative Adversarial Network and Multi-instance Learning

Zhiyong, Lin

doi:10.1007/978-3-030-84522-3_39

Lin Zhiyong¹³

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12836))

Included in the following conference series:

International Conference on Intelligent Computing

1841 Accesses

Abstract

In recent years, object recognition has experienced impressive progress. Despite these improvements, there is still a significant gap in the performance between the detection of small and large objects. We analyze that the limitation of existing algorithms for small target detection, such as: (1) the high computational overhead of image resolution increase and (2) the non-semantic data augmentation of small-object- copy-based strategy, leading a worse result in mAP. So, we figure out that the limited number of semantic training samples is a key impediment for this task due to the high cost of collecting and labelling nature images. In this paper, we propose a simple but effective framework for small object recognition. With an improved generative model, we propose a multiply instance learning detector based on CNN, which jointly learns from the labeled nature datasets and unlabeled generated images. Our method shows a state-of-the-art performance for small objects, obtained by Mask R-CNN, on MS COCO.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Multi-task Generative Adversarial Network for Detecting Small Objects in the Wild

Article 18 February 2020

SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network

Downsampling GAN for Small Object Data Augmentation

References

Krizhevsky, A, Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Google Scholar
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Deng, L., Hinton, G., Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: an overview. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8599–8603 (2013)
Google Scholar
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Article Google Scholar
Vaswani, A., et al.: Tensor2Tensor for neural machine translation. arXiv preprint arXiv:1803.07416 (2018)
Caesar, H., Bankiti, V., Lang, A.H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621–11631 (2020)
Google Scholar
Agarwal, S., Awan, A., Roth, D.: Learning to detect objects in images via a sparse, part-based representation. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1475–1490 (2004)
Article Google Scholar
Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1842, pp. 18–32. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45054-8_2
Chapter Google Scholar
Fu, K., Gong, C., Gu, I.Y.H., et al.: Spectral salient object detection. In: IEEE International Conference on Multimedia and Expo, pp. 1–6 (2014)
Google Scholar
He, K., Zhang, X., Ren, S., et al.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
Article Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Uijlings, J.R.R., Van De Sande, K.E.A., Gevers, T., et al.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
Article Google Scholar
Cheng, M.M., Zhang, Z., Lin, W.Y., et al.: BING: binarized normed gradients for objectness estimation at 300fps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286–3293 (2014)
Google Scholar
Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Google Scholar
Ledig, C., Theis, L., Huszar, F., et al.: Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802 (2016)
Zhu, J.-Y., Krähenbühl, P., Shechtman, E., Efros, A.A.: Generative visual manipulation on the natural image manifold. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 597–613. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_36
Chapter Google Scholar
Brock, A., Lim, T., Ritchie, J.M., et al.: Neural photo editing with introspective adversarial networks. arXiv preprint arXiv:1609.07093 (2016)
Isola, P., Zhu, J.Y., Zhou, T., et al.: Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004 (2016)
Li, J., Monroe, W., Shi, T., et al.: Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547 (2017)
Reed, S., Akata, Z., Yan, X., et al.: Generative adversarial text to image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning (2016)
Google Scholar
Ho, J., Ermon, S.: Generative adversarial imitation learning. In: Advances in Neural Information Processing Systems, pp. 4565–4573 (2016)
Google Scholar
Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
Maron, O., Lozano-Pérez, T.: A framework for multiple-instance learning. In: Advances in Neural Information Processing Systems, pp. 570–576 (1998)
Google Scholar
Babenko, B., Yang, M.H., Belongie, S.: Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1619–1632 (2010)
Article Google Scholar
Wu, J., Yu, Y., Huang, C., et al.: Deep multiple instance learning for image classification and auto-annotation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3460–3469 (2015)
Google Scholar
He, K., Gkioxari, G., Dollár, P., et al.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar

Download references

Acknowledgments

This work is supported by Xiamen Major Science and Technology Projects (No. 3502Z20201017).

Author information

Authors and Affiliations

Xiamen Road and Bridge Information Co. Ltd., Floor 18, Building A06, Software Park Phase III, Xiamen, 361000, Fujian, China
Lin Zhiyong

Authors

Lin Zhiyong
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Shenzhen University, Shenzhen, China
Jianqiang Li
Far Eastern Branch of the Russian Academy of Sciences, Vladivostok, Russia
Valeriya Gribova
Polytechnic University of Bari, Bari, Italy
Vitoantonio Bevilacqua

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhiyong, L. (2021). Small Object Recognition Based on the Generative Adversarial Network and Multi-instance Learning. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Bevilacqua, V. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science(), vol 12836. Springer, Cham. https://doi.org/10.1007/978-3-030-84522-3_39

Download citation

DOI: https://doi.org/10.1007/978-3-030-84522-3_39
Published: 09 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84521-6
Online ISBN: 978-3-030-84522-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics