Abstract
With the widespread use of unmanned aerial vehicles (UAVs), their safety issues have become increasingly prominent in recent years, making UAV detection and identification a research hot spot. Radar-based methods struggle to monitor low-flying UAVs, and video-based methods require high imaging quality; acoustic UAV detection can compensate for these shortcomings of traditional methods. This paper proposes an integrated learning model based on multi-scale convolution and global-local attention that processes audio signals. The model identifies UAVs accurately from their acoustic emissions and is intended to complement other detection modalities. It adopts an integrated learning framework that operates directly on raw UAV audio signals without manual feature extraction, and consists of two first-level expert models and a meta-classifier. First, the two expert models extract features from the data independently; their classification results are then passed to the meta-classifier, which fuses them and outputs the final UAV detection and recognition result. Both expert models add a multi-scale global-local attention module on top of residual and depthwise separable convolutional structures. The proposed method is compared with other one-dimensional signal processing methods on a self-built UAV dataset, and experiments verify its effectiveness and superiority.
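The two-level flow described above (two first-level experts whose outputs are fused by a meta-classifier) can be sketched as follows. This is a minimal illustration of the stacking idea only: the expert networks, which in the paper are 1D CNNs with residual/depthwise-separable convolutions and multi-scale global-local attention, are replaced here by hypothetical fixed linear maps, and all names, shapes, and weights are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def expert(x, w):
    # Stand-in for one first-level expert model: maps a raw audio frame
    # to 2-class probabilities (UAV / no UAV). In the paper this would be
    # a deep 1D CNN, not a linear map.
    return softmax(x @ w)

def stacked_predict(x, w1, w2, w_meta):
    # Level 1: each expert scores the raw signal independently.
    p1 = expert(x, w1)
    p2 = expert(x, w2)
    # Level 2: the meta-classifier fuses the concatenated expert outputs
    # and produces the final detection result.
    z = np.concatenate([p1, p2], axis=-1)
    return softmax(z @ w_meta)

rng = np.random.default_rng(0)
frame = rng.standard_normal(16)          # one raw audio frame (16 samples)
w1 = rng.standard_normal((16, 2))        # hypothetical expert 1 weights
w2 = rng.standard_normal((16, 2))        # hypothetical expert 2 weights
w_meta = rng.standard_normal((4, 2))     # hypothetical meta-classifier weights

probs = stacked_predict(frame, w1, w2, w_meta)
print(probs.shape, float(probs.sum()))   # a 2-class probability vector
```

In practice the meta-classifier would be trained on held-out expert predictions so that it learns how much to trust each expert, which is what distinguishes stacking from simple output averaging.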
Data availability
A link to download the data has been provided.
Funding
This work is supported in part by the Laboratory of Aerodynamic Noise Control Program (Grant No. ANCL20230204), in part by the National Natural Science Foundation of China (Grant Nos. 62201478 and 61971100), in part by the Sichuan Science and Technology Program (Grant Nos. 2024NSFSC1434 and 2022YFG0148), in part by the Southwest University of Science and Technology Doctor Fund (Grant No. 20zx7119), and in part by the Heilongjiang Provincial Science and Technology Program (Grant No. 2022ZX01A16).
Author information
Authors and Affiliations
Contributions
JL: Methodology, Validation, Writing—original draft, Software, Investigation, Conceptualization. JZ: Writing—review and editing, Supervision, Funding acquisition. JR: Methodology, Investigation, Software, Conceptualization. XG: Writing—review, Visualization. ZL: Writing—review and editing, Visualization.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval and consent to participate
The data used in this article raise no ethical or privacy issues.
Consent for publication
Written informed consent for publication was obtained from all participants.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, J., Zhao, J., Ren, J. et al. A multi-scale integrated learning model with attention mechanisms for UAV audio signal detection. SIViP 19, 344 (2025). https://doi.org/10.1007/s11760-025-03944-9