FastNER: Speeding up Inferences for Named Entity Recognition Tasks

  • Conference paper
Advanced Data Mining and Applications (ADMA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14176)

Abstract

BERT and its variants are the best-performing models for named entity recognition (NER), a fundamental information extraction task. However, inference speedup methods must be applied before BERT-based NER models can be deployed in industrial settings. Early exiting allows the model to process easy samples with only its shallow layers, thus reducing average latency. In this work, we introduce FastNER, a novel framework for early exiting with a BERT biaffine NER model, which supports both flat and nested NER tasks. First, we introduce a convolutional bypass module that provides suitable features for the current layer's biaffine prediction head, so that the intermediate layer itself can focus on delivering high-quality semantic representations to the next layer. Second, we introduce a series of early exiting mechanisms for the BERT biaffine model, the first such attempt in the literature. We conduct extensive experiments on 6 benchmark NER datasets, 3 of which are nested NER tasks. The experiments show that: (a) our proposed convolutional bypass method significantly improves the overall performance of the multi-exit BERT biaffine NER model, and (b) our proposed early exiting mechanisms effectively speed up the inference of the BERT biaffine model. Comprehensive ablation studies further demonstrate the validity of our design choices in the FastNER framework.
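
To make the early-exiting setup concrete, below is a minimal sketch of confidence-based early exiting with one biaffine span-scoring head per BERT layer. It is an illustration under assumptions, not the authors' released implementation: the module names (BiaffineSpanScorer, EarlyExitBiaffineNER), the head width, the maximum-probability exit criterion, and the omission of the convolutional bypass are all simplifications for exposition.

```python
# Illustrative sketch only: confidence-based early exiting with a biaffine
# span-scoring exit head per BERT layer. Names and the exit criterion are
# assumptions, not the paper's exact design.
import torch
import torch.nn as nn
from transformers import BertModel


class BiaffineSpanScorer(nn.Module):
    """Scores every (start, end) token pair for each entity label."""

    def __init__(self, hidden_size: int, num_labels: int, head_size: int = 150):
        super().__init__()
        self.start_mlp = nn.Sequential(nn.Linear(hidden_size, head_size), nn.GELU())
        self.end_mlp = nn.Sequential(nn.Linear(hidden_size, head_size), nn.GELU())
        # Biaffine tensor U: score(start, end, label) = h_s^T U_label h_e,
        # with a constant 1 appended so bias terms are folded into U.
        self.bilinear = nn.Parameter(torch.zeros(num_labels, head_size + 1, head_size + 1))
        nn.init.xavier_uniform_(self.bilinear)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size)
        hs = self.start_mlp(hidden)
        he = self.end_mlp(hidden)
        ones = hs.new_ones(hs.shape[:-1] + (1,))
        hs = torch.cat([hs, ones], dim=-1)  # (b, n, h+1)
        he = torch.cat([he, ones], dim=-1)  # (b, n, h+1)
        # (batch, n_start, n_end, num_labels) span scores.
        return torch.einsum("bxi,lij,byj->bxyl", hs, self.bilinear, he)


class EarlyExitBiaffineNER(nn.Module):
    """Multi-exit BERT: one biaffine exit head on top of every encoder layer."""

    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.exits = nn.ModuleList(
            [BiaffineSpanScorer(hidden, num_labels)
             for _ in range(self.bert.config.num_hidden_layers)]
        )

    @torch.no_grad()
    def infer(self, input_ids, attention_mask, exit_threshold: float = 0.9):
        """Run encoder layers one by one and stop once an exit is confident enough."""
        ext_mask = self.bert.get_extended_attention_mask(attention_mask, input_ids.shape)
        hidden = self.bert.embeddings(input_ids)
        for layer, exit_head in zip(self.bert.encoder.layer, self.exits):
            hidden = layer(hidden, attention_mask=ext_mask)[0]
            span_logits = exit_head(hidden)                    # (b, n, n, L)
            probs = span_logits.softmax(dim=-1)
            # One simple confidence criterion: mean max label probability
            # over all candidate spans (padding is ignored for brevity).
            confidence = probs.max(dim=-1).values.mean()
            if confidence >= exit_threshold:
                break                                          # exit early
        return span_logits
```

At inference time, infer() stops as soon as the current layer's span predictions are confident enough, which is the source of the latency savings; lowering exit_threshold trades accuracy for speed.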

Y. Zhang, X. Gao and W. Zhu—Equal contributions.

Notes

  1. https://catalog.ldc.upenn.edu/LDC2005T09.

  2. https://catalog.ldc.upenn.edu/LDC2006T06.

  3. https://catalog.ldc.upenn.edu/LDC2011T03.

  4. Some literature (e.g., DeeBERT [23]) also refers to exits as off-ramps.

  5. The reason why we set \(d_1 = d / 4\) is that a smaller \(d_1\) would result in significant performance drops, according to our initial experiments. (An illustrative sketch of a bypass with this bottleneck width is given after these notes.)

  6. https://catalog.ldc.upenn.edu/LDC2005T09.

  7. https://catalog.ldc.upenn.edu/LDC2006T06.

  8. https://catalog.ldc.upenn.edu/LDC2011T03.

  9. https://huggingface.co/bert-base-uncased.

  10. https://github.com/ymcui/Chinese-BERT-wwm.
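
As a reading aid for Note 5, here is a minimal sketch of what a convolutional bypass with bottleneck width \(d_1 = d / 4\) could look like in PyTorch. Only the d → d/4 → d shape follows the note; the kernel size, GELU activation, residual connection, and the class name ConvBypass are assumptions, not the paper's exact design.

```python
# Illustrative sketch of a convolutional bypass with bottleneck width
# d_1 = d / 4 (Note 5). Kernel size, activation, and the residual
# connection are assumptions; only the d -> d/4 -> d shape follows the note.
import torch
import torch.nn as nn


class ConvBypass(nn.Module):
    def __init__(self, d: int, kernel_size: int = 3):
        super().__init__()
        d1 = d // 4  # bottleneck width d_1 = d / 4
        self.down = nn.Conv1d(d, d1, kernel_size, padding=kernel_size // 2)
        self.act = nn.GELU()
        self.up = nn.Conv1d(d1, d, kernel_size, padding=kernel_size // 2)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d); Conv1d expects (batch, d, seq_len).
        x = hidden.transpose(1, 2)
        x = self.up(self.act(self.down(x)))
        # The bypass output feeds the layer's biaffine exit head, while the
        # untouched hidden states continue on to the next transformer layer.
        return hidden + x.transpose(1, 2)


if __name__ == "__main__":
    bypass = ConvBypass(d=768)
    out = bypass(torch.randn(2, 16, 768))
    print(out.shape)  # torch.Size([2, 16, 768])
```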

References

  1. Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: ICML (2017)

  2. Cui, Y., Che, W., Liu, T., Qin, B., Wang, S., Hu, G.: Revisiting pre-trained models for Chinese natural language processing. arXiv preprint arXiv:2004.13922 (2020)

  3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  4. Gao, X., Zhu, W., Gao, J., Yin, C.: F-PABEE: flexible-patience-based early exiting for single-label and multi-label text classification tasks. In: ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023)

  5. Hendrycks, D., Gimpel, K.: Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. arXiv preprint arXiv:1606.08415 (2016)

  6. Huang, G., Chen, D., Li, T., Wu, F., Maaten, L.V.D., Weinberger, K.Q.: Multi-scale dense convolutional networks for efficient prediction. arXiv preprint arXiv:1703.09844 (2017)

  7. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)

  8. Kaya, Y., Hong, S., Dumitras, T.: Shallow-deep networks: understanding and mitigating network overthinking. In: ICML (2019)

  9. Kim, J.D., Ohta, T., Tateisi, Y., Tsujii, J.: GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics 19(Suppl. 1), i180-i182 (2003). https://doi.org/10.1093/bioinformatics/btg1023

  10. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2020)

  11. Levow, G.A.: The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 108–117. Association for Computational Linguistics, Sydney, Australia, July 2006. https://aclanthology.org/W06-0115

  12. Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1891–1900 (2019)

  13. Li, X., et al.: Pingan smart health and SJTU at COIN - shared task: utilizing pre-trained language models and common-sense knowledge in machine reading tasks. In: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pp. 93–98. Association for Computational Linguistics, Hong Kong, China, November 2019. https://doi.org/10.18653/v1/D19-6011, https://aclanthology.org/D19-6011

  14. Liu, W., Zhou, P., Zhao, Z., Wang, Z., Deng, H., Ju, Q.: FastBERT: a self-distilling BERT with adaptive inference time. arXiv preprint arXiv:2004.02178 (2020)

  15. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)

  16. Schwartz, R., Stanovsky, G., Swayamdipta, S., Dodge, J., Smith, N.A.: The right tool for the job: matching model and instance complexities. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6640–6651. Association for Computational Linguistics, Online, July 2020. https://doi.org/10.18653/v1/2020.acl-main.593, https://aclanthology.org/2020.acl-main.593

  17. Sun, T., et al.: Learning sparse sharing architectures for multiple tasks. In: AAAI (2020)

  18. Teerapittayanon, S., McDanel, B., Kung, H.T.: BranchyNet: fast inference via early exiting from deep neural networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2464–2469 (2016)

  19. Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pp. 142–147 (2003). https://aclanthology.org/W03-0419

  20. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  21. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: a multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018)

  22. Wolf, T., et al.: Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45. Association for Computational Linguistics, Online, October 2020. www.aclweb.org/anthology/2020.emnlp-demos.6

  23. Xin, J., Tang, R., Lee, J., Yu, Y., Lin, J.: DeeBERT: dynamic early exiting for accelerating BERT inference. arXiv preprint arXiv:2004.12993 (2020)

  24. Xin, J., Tang, R., Yu, Y., Lin, J.: BERxiT: early exiting for BERT with better fine-tuning and extension to regression. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 91–104 (2021)

  25. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. In: NeurIPS (2019)

  26. Yu, J., Bohnet, B., Poesio, M.: Named entity recognition as dependency parsing. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6470–6476. Association for Computational Linguistics, Online, July 2020. https://doi.org/10.18653/v1/2020.acl-main.577, https://aclanthology.org/2020.acl-main.577

  27. Zhang, Z., Zhu, W., Zhang, J., Wang, P., Jin, R., Chung, T.S.: PCEE-BERT: accelerating BERT inference via patient and confident early exiting. In: Findings of the Association for Computational Linguistics: NAACL 2022, pp. 327–338. Association for Computational Linguistics, Seattle, United States, July 2022. https://doi.org/10.18653/v1/2022.findings-naacl.25, https://aclanthology.org/2022.findings-naacl.25

  28. Zhou, W., Xu, C., Ge, T., McAuley, J., Xu, K., Wei, F.: BERT loses patience: fast and robust inference with early exit. arXiv preprint arXiv:2006.04152 (2020)

  29. Zhu, W., Wang, P., Ni, Y., Xie, G., Wang, X.: BADGE: speeding up BERT inference after deployment via block-wise bypasses and divergence-based early exiting. In: ACL Industry (2023)

  30. Zhu, W., Wang, X., Ni, Y., Xie, G.: GAML-BERT: improving BERT early exiting by gradient aligned mutual learning. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3033–3044. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, November 2021. https://aclanthology.org/2021.emnlp-main.242

  31. Zhu, W., et al.: PANLP at MEDIQA 2019: pre-trained language models, transfer learning and knowledge distillation. In: Proceedings of the 18th BioNLP Workshop and Shared Task, pp. 380–388. Association for Computational Linguistics, Florence, Italy, August 2019. https://doi.org/10.18653/v1/W19-5040, https://aclanthology.org/W19-5040

Acknowledgement

This work was supported by NSFC grants (No. 61972155 and 62136002), the National Key R&D Program of China (No. 2021YFC3340700), and the Shanghai Trusted Industry Internet Software Collaborative Innovation Center.

Author information

Corresponding author

Correspondence to Xiaoling Wang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Gao, X., Zhu, W., Wang, X. (2023). FastNER: Speeding up Inferences for Named Entity Recognition Tasks. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol 14176. Springer, Cham. https://doi.org/10.1007/978-3-031-46661-8_13

  • DOI: https://doi.org/10.1007/978-3-031-46661-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46660-1

  • Online ISBN: 978-3-031-46661-8

  • eBook Packages: Computer Science, Computer Science (R0)
