FastNER: Speeding up Inferences for Named Entity Recognition Tasks

  • Conference paper
Advanced Data Mining and Applications (ADMA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14176)

Abstract

BERT and its variants are the best-performing models for named entity recognition (NER), a fundamental information extraction task. However, inference speedup methods must be applied before BERT-based NER models can be deployed in industrial settings. Early exiting allows the model to process easy samples with only its shallow layers, thus reducing average latency. In this work, we introduce FastNER, a novel framework for early exiting with a BERT biaffine NER model, which supports both flat and nested NER tasks. First, we introduce a convolutional bypass module that provides suitable features for the current layer's biaffine prediction head, so that the intermediate layer itself can focus on delivering high-quality semantic representations to the next layer. Second, we introduce a series of early exiting mechanisms for the BERT biaffine model, the first such attempt in the literature. We conduct extensive experiments on 6 benchmark NER datasets, 3 of which are nested NER tasks. The experiments show that: (a) our proposed convolutional bypass method significantly improves the overall performance of the multi-exit BERT biaffine NER model, and (b) our proposed early exiting mechanisms effectively speed up the inference of the BERT biaffine model. Comprehensive ablation studies further demonstrate the validity of our design choices in the FastNER framework.
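
To make the early-exiting setup concrete, below is a minimal sketch of confidence-based early exiting with one biaffine span-scoring head per BERT layer. It is an illustration under assumptions, not the authors' released implementation: the module names (BiaffineSpanScorer, EarlyExitBiaffineNER), the head width, the maximum-probability exit criterion, and the omission of the convolutional bypass are all simplifications for exposition.

```python
# Illustrative sketch only: confidence-based early exiting with a biaffine
# span-scoring exit head per BERT layer. Names and the exit criterion are
# assumptions, not the paper's exact design.
import torch
import torch.nn as nn
from transformers import BertModel


class BiaffineSpanScorer(nn.Module):
    """Scores every (start, end) token pair for each entity label."""

    def __init__(self, hidden_size: int, num_labels: int, head_size: int = 150):
        super().__init__()
        self.start_mlp = nn.Sequential(nn.Linear(hidden_size, head_size), nn.GELU())
        self.end_mlp = nn.Sequential(nn.Linear(hidden_size, head_size), nn.GELU())
        # Biaffine tensor U: score(start, end, label) = h_s^T U_label h_e,
        # with a constant 1 appended so bias terms are folded into U.
        self.bilinear = nn.Parameter(torch.zeros(num_labels, head_size + 1, head_size + 1))
        nn.init.xavier_uniform_(self.bilinear)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size)
        hs = self.start_mlp(hidden)
        he = self.end_mlp(hidden)
        ones = hs.new_ones(hs.shape[:-1] + (1,))
        hs = torch.cat([hs, ones], dim=-1)  # (b, n, h+1)
        he = torch.cat([he, ones], dim=-1)  # (b, n, h+1)
        # (batch, n_start, n_end, num_labels) span scores.
        return torch.einsum("bxi,lij,byj->bxyl", hs, self.bilinear, he)


class EarlyExitBiaffineNER(nn.Module):
    """Multi-exit BERT: one biaffine exit head on top of every encoder layer."""

    def __init__(self, num_labels: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.exits = nn.ModuleList(
            [BiaffineSpanScorer(hidden, num_labels)
             for _ in range(self.bert.config.num_hidden_layers)]
        )

    @torch.no_grad()
    def infer(self, input_ids, attention_mask, exit_threshold: float = 0.9):
        """Run encoder layers one by one and stop once an exit is confident enough."""
        ext_mask = self.bert.get_extended_attention_mask(attention_mask, input_ids.shape)
        hidden = self.bert.embeddings(input_ids)
        for layer, exit_head in zip(self.bert.encoder.layer, self.exits):
            hidden = layer(hidden, attention_mask=ext_mask)[0]
            span_logits = exit_head(hidden)                    # (b, n, n, L)
            probs = span_logits.softmax(dim=-1)
            # One simple confidence criterion: mean max label probability
            # over all candidate spans (padding is ignored for brevity).
            confidence = probs.max(dim=-1).values.mean()
            if confidence >= exit_threshold:
                break                                          # exit early
        return span_logits
```

At inference time, infer() stops as soon as the current layer's span predictions are confident enough, which is the source of the latency savings; lowering exit_threshold trades accuracy for speed.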

Y. Zhang, X. Gao and W. Zhu—Equal contributions.

Notes

  1. https://catalog.ldc.upenn.edu/LDC2005T09.

  2. https://catalog.ldc.upenn.edu/LDC2006T06.

  3. https://catalog.ldc.upenn.edu/LDC2011T03.

  4. Some literature (e.g., DeeBERT [23]) also refers to exits as off-ramps.

  5. The reason why we set \(d_1 = d / 4\) is that a smaller \(d_1\) would result in significant performance drops, according to our initial experiments. (An illustrative sketch of a bypass with this bottleneck width is given after these notes.)

  6. https://catalog.ldc.upenn.edu/LDC2005T09.

  7. https://catalog.ldc.upenn.edu/LDC2006T06.

  8. https://catalog.ldc.upenn.edu/LDC2011T03.

  9. https://huggingface.co/bert-base-uncased.

  10. https://github.com/ymcui/Chinese-BERT-wwm.
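
As a reading aid for Note 5, here is a minimal sketch of what a convolutional bypass with bottleneck width \(d_1 = d / 4\) could look like in PyTorch. Only the d → d/4 → d shape follows the note; the kernel size, GELU activation, residual connection, and the class name ConvBypass are assumptions, not the paper's exact design.

```python
# Illustrative sketch of a convolutional bypass with bottleneck width
# d_1 = d / 4 (Note 5). Kernel size, activation, and the residual
# connection are assumptions; only the d -> d/4 -> d shape follows the note.
import torch
import torch.nn as nn


class ConvBypass(nn.Module):
    def __init__(self, d: int, kernel_size: int = 3):
        super().__init__()
        d1 = d // 4  # bottleneck width d_1 = d / 4
        self.down = nn.Conv1d(d, d1, kernel_size, padding=kernel_size // 2)
        self.act = nn.GELU()
        self.up = nn.Conv1d(d1, d, kernel_size, padding=kernel_size // 2)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d); Conv1d expects (batch, d, seq_len).
        x = hidden.transpose(1, 2)
        x = self.up(self.act(self.down(x)))
        # The bypass output feeds the layer's biaffine exit head, while the
        # untouched hidden states continue on to the next transformer layer.
        return hidden + x.transpose(1, 2)


if __name__ == "__main__":
    bypass = ConvBypass(d=768)
    out = bypass(torch.randn(2, 16, 768))
    print(out.shape)  # torch.Size([2, 16, 768])
```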

References

  1. Bolukbasi, T., Wang, J., Dekel, O., Saligrama, V.: Adaptive neural networks for efficient inference. In: ICML (2017)

  2. Cui, Y., Che, W., Liu, T., Qin, B., Wang, S., Hu, G.: Revisiting pre-trained models for Chinese natural language processing. arXiv preprint arXiv:2004.13922 (2020)

  3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  4. Gao, X., Zhu, W., Gao, J., Yin, C.: F-PABEE: flexible-patience-based early exiting for single-label and multi-label text classification tasks. In: ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023)

  5. Hendrycks, D., Gimpel, K.: Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. arXiv preprint arXiv:1606.08415 (2016)

  6. Huang, G., Chen, D., Li, T., Wu, F., Maaten, L.V.D., Weinberger, K.Q.: Multi-scale dense convolutional networks for efficient prediction. arXiv preprint arXiv:1703.09844 (2017)

  7. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)

  8. Kaya, Y., Hong, S., Dumitras, T.: Shallow-deep networks: understanding and mitigating network overthinking. In: ICML (2019)

  9. Kim, J.D., Ohta, T., Tateisi, Y., Tsujii, J.: GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics 19(Suppl. 1), i180-i182 (2003). https://doi.org/10.1093/bioinformatics/btg1023

  10. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2020)

  11. Levow, G.A.: The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 108–117. Association for Computational Linguistics, Sydney, Australia, July 2006. https://aclanthology.org/W06-0115

  12. Li, H., Zhang, H., Qi, X., Yang, R., Huang, G.: Improved techniques for training adaptive deep networks. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1891–1900 (2019)

  13. Li, X., et al.: Pingan smart health and SJTU at COIN - shared task: utilizing pre-trained language models and common-sense knowledge in machine reading tasks. In: Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing, pp. 93–98. Association for Computational Linguistics, Hong Kong, China, November 2019. https://doi.org/10.18653/v1/D19-6011, https://aclanthology.org/D19-6011

  14. Liu, W., Zhou, P., Zhao, Z., Wang, Z., Deng, H., Ju, Q.: FastBERT: a self-distilling BERT with adaptive inference time. arXiv preprint arXiv:2004.02178 (2020)

  15. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9 (2019)

  16. Schwartz, R., Stanovsky, G., Swayamdipta, S., Dodge, J., Smith, N.A.: The right tool for the job: matching model and instance complexities. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6640–6651. Association for Computational Linguistics, Online, July 2020. https://doi.org/10.18653/v1/2020.acl-main.593, https://aclanthology.org/2020.acl-main.593

  17. Sun, T., et al.: Learning sparse sharing architectures for multiple tasks. In: AAAI (2020)

  18. Teerapittayanon, S., McDanel, B., Kung, H.T.: BranchyNet: fast inference via early exiting from deep neural networks. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2464–2469 (2016)

  19. Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pp. 142–147 (2003). https://aclanthology.org/W03-0419

  20. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  21. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: GLUE: a multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018)

  22. Wolf, T., et al.: Transformers: state-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45. Association for Computational Linguistics, Online, October 2020. www.aclweb.org/anthology/2020.emnlp-demos.6

  23. Xin, J., Tang, R., Lee, J., Yu, Y., Lin, J.: DeeBERT: dynamic early exiting for accelerating BERT inference. arXiv preprint arXiv:2004.12993 (2020)

  24. Xin, J., Tang, R., Yu, Y., Lin, J.: BERxiT: early exiting for BERT with better fine-tuning and extension to regression. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 91–104 (2021)

  25. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. In: NeurIPS (2019)

  26. Yu, J., Bohnet, B., Poesio, M.: Named entity recognition as dependency parsing. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6470–6476. Association for Computational Linguistics, Online, July 2020. https://doi.org/10.18653/v1/2020.acl-main.577, https://aclanthology.org/2020.acl-main.577

  27. Zhang, Z., Zhu, W., Zhang, J., Wang, P., Jin, R., Chung, T.S.: PCEE-BERT: accelerating BERT inference via patient and confident early exiting. In: Findings of the Association for Computational Linguistics: NAACL 2022, pp. 327–338. Association for Computational Linguistics, Seattle, United States, July 2022. https://doi.org/10.18653/v1/2022.findings-naacl.25, https://aclanthology.org/2022.findings-naacl.25

  28. Zhou, W., Xu, C., Ge, T., McAuley, J., Xu, K., Wei, F.: BERT loses patience: fast and robust inference with early exit. arXiv preprint arXiv:2006.04152 (2020)

  29. Zhu, W., Wang, P., Ni, Y., Xie, G., Wang, X.: BADGE: speeding up BERT inference after deployment via block-wise bypasses and divergence-based early exiting. In: ACL Industry (2023)

  30. Zhu, W., Wang, X., Ni, Y., Xie, G.: GAML-BERT: improving BERT early exiting by gradient aligned mutual learning. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3033–3044. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, November 2021. https://aclanthology.org/2021.emnlp-main.242

  31. Zhu, W., et al.: PANLP at MEDIQA 2019: pre-trained language models, transfer learning and knowledge distillation. In: Proceedings of the 18th BioNLP Workshop and Shared Task, pp. 380–388. Association for Computational Linguistics, Florence, Italy, August 2019. https://doi.org/10.18653/v1/W19-5040, https://aclanthology.org/W19-5040

Acknowledgement

This work was supported by NSFC grants (No. 61972155 and 62136002), the National Key R&D Program of China (No. 2021YFC3340700), and the Shanghai Trusted Industry Internet Software Collaborative Innovation Center.

Author information

Corresponding author

Correspondence to Xiaoling Wang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Gao, X., Zhu, W., Wang, X. (2023). FastNER: Speeding up Inferences for Named Entity Recognition Tasks. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol 14176. Springer, Cham. https://doi.org/10.1007/978-3-031-46661-8_13

  • DOI: https://doi.org/10.1007/978-3-031-46661-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46660-1

  • Online ISBN: 978-3-031-46661-8

  • eBook Packages: Computer Science, Computer Science (R0)
