The paper gave me a lot of inspiration, but I have some questions about the structure.
First, why does the text encoder use masked self-attention? The paper says: "Masked self-attention was used in the text encoder to preserve the ability to initialize with a pre-trained language model or add language modeling as an auxiliary objective, though exploration of this is left as future work." And the activation at the [EOS] token is taken as the representation of the whole sentence. Would it be better to use full (bidirectional) self-attention and take the mean of the last transformer layer's output instead?
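To make this question concrete, here is a minimal sketch in PyTorch (not the repo's code) of the two choices; the shapes and `eos_index` are illustrative values I made up:

```python
import torch

seq_len, width = 77, 512
last_hidden = torch.randn(1, seq_len, width)  # last transformer layer output

# Causal mask used with masked self-attention: position i can only attend
# to positions <= i, the same pattern as a GPT-style language model.
causal_mask = torch.full((seq_len, seq_len), float("-inf")).triu(1)

# CLIP-style pooling: under the causal mask, only the [EOS] position has
# attended to every token, so its activation represents the sentence.
eos_index = 10                                # hypothetical [EOS] position
eos_feature = last_hidden[:, eos_index, :]    # shape: (1, width)

# The alternative I am asking about: bidirectional self-attention (no mask)
# plus mean pooling over all positions of the last layer.
mean_feature = last_hidden.mean(dim=1)        # shape: (1, width)
```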
Second, why is an attention pooling mechanism used instead of global average pooling? The paper says: "We also replace the global average pooling layer with an attention pooling mechanism." But I can't find an explanation for this choice.
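Here is how I picture the difference, again as a minimal PyTorch sketch; `SimpleAttentionPool` is my own illustrative module and only approximates the spirit of the repo's `AttentionPool2d`:

```python
import torch
import torch.nn as nn

class SimpleAttentionPool(nn.Module):
    """Attention pooling over a flattened spatial feature map: the
    global-average feature acts as the query and attends over all positions."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, H*W, dim)
        query = x.mean(dim=1, keepdim=True)  # mean feature as the query token
        pooled, _ = self.attn(query, x, x)   # learned, content-dependent weights
        return pooled.squeeze(1)             # (batch, dim)

feats = torch.randn(2, 49, 512)              # e.g. a 7x7 feature map, flattened
gap_feature = feats.mean(dim=1)              # global average pooling: uniform weights
attn_feature = SimpleAttentionPool(512)(feats)
```

My guess is that attention pooling lets the model weight spatial positions by content instead of uniformly, but the paper does not say this explicitly.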
I am very confused about these two points and hope you can take the time to answer my questions.