Abstract
Position embeddings, which encode the positional relationships among tokens in a text sequence, contribute greatly to modeling local context features in Transformer-based pre-trained language models. However, in Extractive Question Answering, position embeddings trained on instances of varied context lengths may not perform as well as we expect. Since the embeddings of rear positions are updated fewer times than those of front positions, the rear ones may not be properly trained. In this paper, we propose a simple but effective strategy, Random Padding, which requires no modification to the architecture of existing pre-trained language models. We adjust the token order of input sequences during fine-tuning to balance the number of updates each position embedding receives. Experiments show that Random Padding significantly improves model performance on instances whose answers are located at rear positions, especially when models are trained on short contexts but evaluated on long contexts. Our code and data will be released for future research.
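To make the strategy concrete, below is a minimal sketch of how such a rearrangement could be implemented during fine-tuning. It is written in plain Python over token-id lists; the function name `random_padding`, the assumed input layout `[CLS] question [SEP] context [SEP] [PAD]…`, the `context_start` parameter, and the choice to move a random number of trailing [PAD] tokens in front of the context are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the Random Padding idea: move a random number of the
# trailing [PAD] tokens in front of the context so that content tokens
# (and the answer span) land on later, less frequently updated positions.
# The names and input layout here are assumptions for illustration only.
import random

PAD_ID = 0  # assumed id of the [PAD] token (0 in BERT-style vocabularies)


def random_padding(input_ids, answer_start, answer_end, context_start,
                   pad_id=PAD_ID, rng=random):
    """Return rearranged input_ids plus the shifted answer span indices."""
    seq_len = len(input_ids)

    # Count the trailing [PAD] tokens.
    n_pad = 0
    while n_pad < seq_len and input_ids[seq_len - 1 - n_pad] == pad_id:
        n_pad += 1
    if n_pad == 0:
        return input_ids, answer_start, answer_end

    # Choose how many pads to relocate before the context (0..n_pad, inclusive).
    k = rng.randint(0, n_pad)

    head = input_ids[:context_start]                 # e.g. [CLS] question [SEP]
    body = input_ids[context_start:seq_len - n_pad]  # context tokens (+ final [SEP])
    new_ids = head + [pad_id] * k + body + [pad_id] * (n_pad - k)
    return new_ids, answer_start + k, answer_end + k


if __name__ == "__main__":
    # Toy example: ids 101/102 stand in for [CLS]/[SEP]; the answer is ids 22-23.
    ids = [101, 11, 12, 102, 21, 22, 23, 24, 102, 0, 0, 0]
    new_ids, start, end = random_padding(ids, answer_start=5, answer_end=6,
                                         context_start=4)
    assert len(new_ids) == len(ids) and new_ids[start:end + 1] == [22, 23]
```

Because the abstract applies the adjustment only when fine-tuning, evaluation can keep ordinary rear padding; in either case the attention mask must continue to mark the relocated [PAD] tokens as masked so they do not contribute to the prediction or the loss.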
Notes
- 1. The grey parts represent the weights corresponding to masked tokens, whose gradients cannot be back-propagated. The representation vectors, embedding vectors, and attention scores of non-padding tokens are shown by the coloured areas.
- 2. The underlined texts are wrong predictions given by the baseline models; the other highlighted texts are the correct predictions given by the improved model and the golden answers, respectively.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tao, M., Feng, Y., Zhao, D. (2023). A Frustratingly Easy Improvement for Position Embeddings via Random Padding. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol. 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_24
DOI: https://doi.org/10.1007/978-3-031-44696-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44695-5
Online ISBN: 978-3-031-44696-2