Abstract
Self-attention mechanisms are widely used in object detection to weigh the importance of different channels and reinforce the informative parts of a feature map, and they have led to strong results at all scales. However, most self-attention mechanisms and their variants focus only on the channel dimension, and so tend to ignore the width and height dimensions of the feature map, which play an important role in capturing local contextual information. To alleviate this problem, we propose a one-dimensional feature supervision network for object detection (1DSNet). Specifically, we first propose a one-dimensional feature supervision module (1DSM), which uses lightweight one-dimensional feature vectors to weight the features along the width and the height, respectively, so that the two jointly reinforce the important information in the features. Moreover, to improve the representation of multi-scale contextual information, we construct a receptive field dilated pyramid pooling (RFD-SPP) module that builds on spatial pyramid pooling to obtain a larger field of view. Finally, experimental results demonstrate that the proposed 1DSNet is effective and competitive with representative methods.
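The abstract does not spell out how the 1DSM computes its weights, so the following is only an illustrative sketch of the general idea of weighting a feature map with one vector along the height and one along the width. The mean pooling, the sigmoid gate, the multiplicative combination, and the names `one_d_weights`, `reweight`, and `effective_kernel` are all assumptions made for this example and are not taken from the paper. The last helper uses the standard relation for the span of a dilated window, which is one way a pooling window can cover a larger field of view.

```python
import math


def sigmoid(v):
    """Logistic function, used here as an assumed gating non-linearity."""
    return 1.0 / (1.0 + math.exp(-v))


def one_d_weights(x):
    """Return (height_weights, width_weights) for a C x H x W nested list.

    Assumption: each weight vector is the sigmoid of a mean-pooled
    descriptor. The module in the paper may use learned layers instead.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # One descriptor per row: average over channels and width.
    row_mean = [
        sum(x[c][h][w] for c in range(C) for w in range(W)) / (C * W)
        for h in range(H)
    ]
    # One descriptor per column: average over channels and height.
    col_mean = [
        sum(x[c][h][w] for c in range(C) for h in range(H)) / (C * H)
        for w in range(W)
    ]
    return [sigmoid(v) for v in row_mean], [sigmoid(v) for v in col_mean]


def reweight(x):
    """Scale every position by its row weight and its column weight."""
    a_h, a_w = one_d_weights(x)
    return [
        [[x[c][h][w] * a_h[h] * a_w[w] for w in range(len(a_w))]
         for h in range(len(a_h))]
        for c in range(len(x))
    ]


def effective_kernel(k, dilation):
    """Span covered by a k-wide window whose taps are `dilation` apart."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical 1 x 2 x 2 feature map whose second row is more active.
    feature = [[[0.0, 0.0], [4.0, 4.0]]]
    print(one_d_weights(feature))
    print(reweight(feature))
    print(effective_kernel(5, 2))
```

In this sketch the two weight vectors together hold only H + W values, compared with H × W for a full spatial attention map, which is one reading of why the abstract calls the one-dimensional vectors lightweight. With dilation 2, a 5-wide pooling window spans 9 positions without adding taps.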
Acknowledgment
This work was supported by the Natural Science Foundation of Henan under Grant 232300421023.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shen, L., Dong, Y., Pei, Y., Yang, H., Zheng, L., Ma, J. (2023). One-Dimensional Feature Supervision Network for Object Detection. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14090. Springer, Singapore. https://doi.org/10.1007/978-981-99-4761-4_13
DOI: https://doi.org/10.1007/978-981-99-4761-4_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4760-7
Online ISBN: 978-981-99-4761-4
eBook Packages: Computer Science, Computer Science (R0)