NER with Llama 3 #426
d-kleine
started this conversation in Show and tell
Replies: 1 comment · 3 replies
-
I have been working on a private project that adapts LLaMA 3.2, a decoder-only (autoregressive) transformer, for Named Entity Recognition (NER) using Hugging Face (so not implemented "from scratch"). Traditionally, encoder-only models like BERT have dominated NER tasks because they process input text bidirectionally, capturing rich contextual information. By removing the causal mask in LLaMA, however, we enable it to leverage bidirectional context as well, while maintaining its strengths in generative tasks, making it a versatile solution for NER.
What do you think about this implementation?
Project: https://github.com/d-kleine/NER_decoder
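Here's a minimal sketch of the masking idea (not the actual project code): it builds a full, non-causal 4D attention mask from the padding mask and passes that to the model instead of the usual 2D mask. It assumes a recent transformers release (roughly ≥ 4.44, which added `LlamaForTokenClassification` and uses a user-supplied 4D float mask as-is, skipping the internal causal-mask construction); the checkpoint name and the 9-label tag set are placeholders.

```python
# Minimal sketch: bidirectional (non-causal) attention for Llama token
# classification. NOT the project's actual code; it relies on recent
# transformers versions passing a user-supplied 4D float attention mask
# through unchanged, which bypasses the internal causal-mask construction.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.2-1B"  # assumed (gated) checkpoint
NUM_LABELS = 9  # hypothetical BIO tag set, e.g. CoNLL-2003

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS, attn_implementation="eager"
)


def bidirectional_mask(padding_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """Expand a 2D padding mask (batch, seq_len) into an additive 4D mask
    (batch, 1, seq_len, seq_len) with no causal triangle: every real token
    may attend to every other real token; padded keys get the dtype minimum."""
    batch, seq_len = padding_mask.shape
    keys = padding_mask[:, None, None, :].to(dtype)       # (batch, 1, 1, seq)
    additive = (1.0 - keys) * torch.finfo(dtype).min      # 0.0 where attendable
    return additive.expand(batch, 1, seq_len, seq_len)


enc = tokenizer(["Angela Merkel visited Paris in May"], return_tensors="pt")
mask_4d = bidirectional_mask(enc["attention_mask"], model.dtype)

with torch.no_grad():
    logits = model(input_ids=enc["input_ids"], attention_mask=mask_4d).logits
print(logits.shape)  # (1, seq_len, NUM_LABELS)
```

During finetuning, the same mask would be built per batch in the data collator, so every training step also runs with bidirectional attention.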
-
That's pretty cool! I think that removing the causal mask is reasonable here. I've seen something similar in recent classification-finetuning papers that used Llama models. Based on the loss and the qualitative eval at the end, it definitely looks like it works! Btw, how long did it take to finetune? If it's not too long, I'd be curious how it would perform if you left the causal mask in place.