A multi-scale integrated learning model with attention mechanisms for UAV audio signal detection

  • Original Paper
Signal, Image and Video Processing

Abstract

With the widespread use of unmanned aerial vehicles (UAVs), the safety issues they raise have become increasingly prominent in recent years, and UAV detection and identification has become an active research topic. Radar-based methods struggle to monitor low-flying UAVs, and video-based methods require high imaging quality. Acoustic signal-based detection can compensate for these shortcomings. This paper proposes an integrated (ensemble) learning model based on multi-scale convolution and global-local attention that identifies UAVs from their audio signals. The model adopts an integrated learning framework that processes raw UAV audio directly, without manual feature extraction. It consists of two first-level expert models and a meta-classifier. First, the two expert models each extract features from the data and produce classification results. These results are then fed to the meta-classifier, which fuses them and outputs the final UAV monitoring and recognition result. Both first-level expert models add a multi-scale global-local attention module to a backbone built from residual and depthwise separable convolutional structures. The proposed method is compared with other methods for processing one-dimensional signals on a self-built UAV dataset, and the experiments verify its effectiveness and superiority.
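The two-level data flow described in the abstract (raw audio into two first-level experts, their class probabilities into a meta-classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names, the kernel sizes, the averaging kernels standing in for learned multi-scale convolutions, and the hand-picked, untrained weights are all hypothetical, and the attention, residual, and depthwise separable components are omitted.

```python
import math
import random


def conv1d(signal, kernel):
    """Valid 1D cross-correlation of a raw signal with a single kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def multi_scale_features(signal, kernel_sizes):
    """One feature per scale: mean absolute response of an averaging kernel.

    Assumption: averaging kernels stand in for learned convolution filters.
    """
    feats = []
    for k in kernel_sizes:
        response = conv1d(signal, [1.0 / k] * k)
        feats.append(sum(abs(r) for r in response) / len(response))
    return feats


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert(signal, kernel_sizes, weights):
    """Toy first-level expert: multi-scale features -> 2-class probabilities."""
    feats = multi_scale_features(signal, kernel_sizes)
    logits = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return softmax(logits)


def meta_classifier(p_a, p_b, meta_weights):
    """Fuse both experts' probability vectors with a linear layer and softmax."""
    stacked = p_a + p_b  # concatenate the two first-level outputs
    logits = [sum(w * x for w, x in zip(row, stacked)) for row in meta_weights]
    return softmax(logits)


# Synthetic stand-in for a raw audio frame (not real UAV data).
random.seed(0)
signal = [math.sin(0.3 * n) + 0.1 * random.gauss(0, 1) for n in range(256)]

# Hypothetical, untrained weights: rows are classes (UAV, no UAV).
w_a = [[0.5, -0.2, 0.1], [-0.5, 0.2, -0.1]]
w_b = [[0.3, 0.1, -0.4], [-0.3, -0.1, 0.4]]
meta_w = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]

p_a = expert(signal, (3, 9, 27), w_a)   # expert A looks at shorter scales
p_b = expert(signal, (5, 15, 45), w_b)  # expert B looks at longer scales
final = meta_classifier(p_a, p_b, meta_w)
print(final)
```

In a trained system of this kind, the experts would be fitted first and the meta-classifier would then be fitted on their outputs, typically on held-out data so that it does not simply learn the experts' overfitting.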

Data availability

A link to download the data has been provided.

Funding

This work is supported in part by the Laboratory of Aerodynamic Noise Control Program (Grant No. ANCL20230204), in part by the National Natural Science Foundation of China (Grant Nos. 62201478 and 61971100), in part by the Sichuan Science and Technology Program (Grant Nos. 2024NSFSC1434 and 2022YFG0148), in part by the Southwest University of Science and Technology Doctor Fund (Grant No. 20zx7119), and in part by the Heilongjiang Provincial Science and Technology Program (Grant No. 2022ZX01A16).

Author information

Contributions

JL: Methodology, Validation, Writing - original draft, Software, Investigation, Conceptualization. JZ: Writing - review and editing, Supervision, Funding acquisition. JR: Methodology, Investigation, Software, Conceptualization. XG: Writing - review, Visualization. ZL: Writing - review and editing, Visualization.

Corresponding author

Correspondence to Ji Zhao.

Ethics declarations

Conflict of interest

There is no potential conflict of interest.

Ethics approval and consent to participate

There are no ethical or right-to-know issues with the data used in the article.

Consent for publication

Written informed consent for publication was obtained from all participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, J., Zhao, J., Ren, J. et al. A multi-scale integrated learning model with attention mechanisms for UAV audio signal detection. SIViP 19, 344 (2025). https://doi.org/10.1007/s11760-025-03944-9

  • DOI: https://doi.org/10.1007/s11760-025-03944-9