Question about the paper. #195

Open
zhhao1 opened this issue Dec 15, 2021 · 0 comments

zhhao1 commented Dec 15, 2021

The paper gave me a lot of inspiration, but I have some questions about the structure.
First, why does the text encoder use masked self-attention? The paper states: "Masked self-attention was used in the text encoder to preserve the ability to initialize with a pre-trained language model or add language modeling as an auxiliary objective, though exploration of this is left as future work." And the [EOS] token embedding is taken as the whole-sentence representation. Would it be better to use (unmasked) self-attention and take the mean of the last transformer layer's outputs?
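To make the comparison concrete, here is a minimal sketch (not the repo's code) of the two pooling strategies I mean, using a toy `nn.MultiheadAttention` layer; the `eot_idx` positions are made up for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, width = 2, 8, 64

# Toy token features standing in for the last transformer layer's input.
x = torch.randn(batch, seq_len, width)
attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)

# Causal (masked) self-attention: each position only attends to earlier
# positions, so the end-of-text position is the only one that has "seen"
# the whole sentence -- which is why its embedding can serve as the
# sentence feature.
causal_mask = torch.full((seq_len, seq_len), float("-inf")).triu(diagonal=1)
masked_out, _ = attn(x, x, x, attn_mask=causal_mask)

# (a) CLIP-style pooling: take the feature at the end-of-text position.
eot_idx = torch.tensor([5, 7])                                 # hypothetical EOT positions
eot_feature = masked_out[torch.arange(batch), eot_idx]         # [batch, width]

# (b) The alternative in the question: bidirectional self-attention
#     followed by mean pooling over the non-padding positions.
bi_out, _ = attn(x, x, x)                                      # no mask
pad_mask = torch.arange(seq_len)[None, :] <= eot_idx[:, None]  # True = real token
mean_feature = (bi_out * pad_mask[..., None]).sum(1) / pad_mask.sum(1, keepdim=True)

print(eot_feature.shape, mean_feature.shape)                   # both [batch, width]
```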
Second, why is an attention pooling mechanism used instead of global average pooling? The paper states: "We also replace the global average pooling layer with an attention pooling mechanism." But I can't find an explanation for this.
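Again as a point of reference, here is a minimal sketch of the difference as I understand it; this is only an illustration of the idea, not the repo's `AttentionPool2d` (which also adds positional embeddings and learned projection layers):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, channels, h, w = 2, 256, 7, 7
feat = torch.randn(batch, channels, h, w)              # final conv feature map

# (a) Global average pooling: every spatial location gets equal weight.
gap = feat.mean(dim=(2, 3))                             # [batch, channels]

# (b) Attention pooling: a single query (here, the mean of the spatial
#     features) attends over all locations, so the weighting is
#     content-dependent rather than uniform.
tokens = feat.flatten(2).permute(0, 2, 1)               # [batch, h*w, channels]
query = tokens.mean(dim=1, keepdim=True)                # [batch, 1, channels]
attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
pooled, weights = attn(query, tokens, tokens)           # [batch, 1, channels]
attn_pooled = pooled.squeeze(1)

print(gap.shape, attn_pooled.shape)                     # both [batch, channels]
print(weights.shape)                                    # [batch, 1, h*w] weights over locations
```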
I am quite confused about this, and I hope you can take the time to answer my questions.
