Hi, I'm reading this code to learn from it, and it has helped me a lot. I'm confused by this line:
torch-light/BERT/model.py
Line 74 in 254c133
In the original BERT paper, I haven't found any mention of using a conv1d layer in the transformer instead of a linear transformation.
And according to http://nlp.seas.harvard.edu/2018/04/03/attention.html#position-wise-feed-forward-networks, the position-wise feed-forward network is implemented as an MLP.
Could anyone kindly help me with this question?
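For context on what I'm comparing, here is a minimal sketch (not the repository's actual code) of the two formulations: a `nn.Conv1d` with `kernel_size=1` applies the same affine map independently at every position, which is what the `nn.Linear`-based MLP in the annotated-transformer post computes. The dimensions below are made up for illustration.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not taken from the repo).
d_model, d_ff, seq_len, batch = 8, 32, 5, 2

linear = nn.Linear(d_model, d_ff)
conv = nn.Conv1d(d_model, d_ff, kernel_size=1)

# Copy the same weights into both layers so the outputs can be compared directly.
with torch.no_grad():
    conv.weight.copy_(linear.weight.unsqueeze(-1))  # (d_ff, d_model) -> (d_ff, d_model, 1)
    conv.bias.copy_(linear.bias)

x = torch.randn(batch, seq_len, d_model)
out_linear = linear(x)                               # (batch, seq_len, d_ff)
out_conv = conv(x.transpose(1, 2)).transpose(1, 2)   # Conv1d expects (batch, channels, seq_len)

print(torch.allclose(out_linear, out_conv, atol=1e-6))  # True: same per-position transform
```

So my question is really whether the conv1d here is just this kind of equivalent rewrite, or something intentionally different from the paper.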