Question about the paper. #195

Open
zhhao1 opened this issue Dec 15, 2021 · 0 comments

zhhao1 commented Dec 15, 2021

The paper gave me a lot of inspiration, but I have some questions about the structure.
First, why does the text encoder use masked self-attention? The paper states: "Masked self-attention was used in the text encoder to preserve the ability to initialize with a pre-trained language model or add language modeling as an auxiliary objective, though exploration of this is left as future work." And the [EOS] token embedding is taken as the whole-sentence representation. Would it be better to use (unmasked) self-attention and take the mean of the last transformer layer's outputs?
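To make the comparison concrete, here is a minimal sketch (not the repo's code) of the two pooling strategies I mean, using a toy `nn.MultiheadAttention` layer; the `eot_idx` positions are made up for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, width = 2, 8, 64

# Toy token features standing in for the last transformer layer's input.
x = torch.randn(batch, seq_len, width)
attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)

# Causal (masked) self-attention: each position only attends to earlier
# positions, so the end-of-text position is the only one that has "seen"
# the whole sentence -- which is why its embedding can serve as the
# sentence feature.
causal_mask = torch.full((seq_len, seq_len), float("-inf")).triu(diagonal=1)
masked_out, _ = attn(x, x, x, attn_mask=causal_mask)

# (a) CLIP-style pooling: take the feature at the end-of-text position.
eot_idx = torch.tensor([5, 7])                                 # hypothetical EOT positions
eot_feature = masked_out[torch.arange(batch), eot_idx]         # [batch, width]

# (b) The alternative in the question: bidirectional self-attention
#     followed by mean pooling over the non-padding positions.
bi_out, _ = attn(x, x, x)                                      # no mask
pad_mask = torch.arange(seq_len)[None, :] <= eot_idx[:, None]  # True = real token
mean_feature = (bi_out * pad_mask[..., None]).sum(1) / pad_mask.sum(1, keepdim=True)

print(eot_feature.shape, mean_feature.shape)                   # both [batch, width]
```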
Second, why is an attention pooling mechanism used instead of global average pooling? The paper states: "We also replace the global average pooling layer with an attention pooling mechanism." But I can't find an explanation for this.
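Again as a point of reference, here is a minimal sketch of the difference as I understand it; this is only an illustration of the idea, not the repo's `AttentionPool2d` (which also adds positional embeddings and learned projection layers):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, channels, h, w = 2, 256, 7, 7
feat = torch.randn(batch, channels, h, w)              # final conv feature map

# (a) Global average pooling: every spatial location gets equal weight.
gap = feat.mean(dim=(2, 3))                             # [batch, channels]

# (b) Attention pooling: a single query (here, the mean of the spatial
#     features) attends over all locations, so the weighting is
#     content-dependent rather than uniform.
tokens = feat.flatten(2).permute(0, 2, 1)               # [batch, h*w, channels]
query = tokens.mean(dim=1, keepdim=True)                # [batch, 1, channels]
attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
pooled, weights = attn(query, tokens, tokens)           # [batch, 1, channels]
attn_pooled = pooled.squeeze(1)

print(gap.shape, attn_pooled.shape)                     # both [batch, channels]
print(weights.shape)                                    # [batch, 1, h*w] weights over locations
```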
I am quite confused about this, and I hope you can take the time to answer my questions.
