Replies: 2 comments
-
So basically the equivalent of Grad-CAM for audio with Whisper?
-
Any updates on this?
-
Does anyone know how to visualize the encoder attention maps with respect to the input spectrograms?
I'm interested in understanding which portions of the spectrogram a fine-tuned whisper-base model is focusing on when making a prediction.
I can extract the attention maps in the forward pass (each is 1500x1500), but I don't know how to map them back to the input spectrogram.
Any ideas?
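For reference, here is a rough sketch of one possible mapping, not a definitive answer. It assumes the standard Whisper front end: 30 s of audio becomes an 80 x 3000 log-mel spectrogram (10 ms hop), and the second convolution in the encoder has stride 2, so each of the 1500 encoder positions covers 2 spectrogram frames (about 20 ms). Under that assumption, the attention each key position receives can be averaged over heads and queries and then repeated along the time axis. The function names `attention_to_frame_saliency` and `saliency_overlay` are made up for this illustration, and the random array stands in for a real attention map. Note that this only gives resolution along time, not along the mel-frequency axis, and raw attention weights are not the same thing as a Grad-CAM style attribution.

```python
import numpy as np

def attention_to_frame_saliency(attn, n_frames=3000):
    """Collapse encoder self-attention into one weight per spectrogram frame.

    attn: array of shape (heads, positions, positions) or (positions, positions),
          with rows as query positions and columns as key positions.
    n_frames: number of spectrogram frames (assumed 3000 for a 30 s Whisper input).
    """
    attn = np.asarray(attn, dtype=np.float64)
    if attn.ndim == 3:
        attn = attn.mean(axis=0)          # average over heads
    received = attn.mean(axis=0)          # average over queries: attention each key receives
    n_positions = received.shape[0]
    if n_frames % n_positions != 0:
        raise ValueError("n_frames must be a multiple of the number of positions")
    stride = n_frames // n_positions      # 2 under the assumptions above
    return np.repeat(received, stride)    # one value per spectrogram frame

def saliency_overlay(frame_saliency, n_mels=80):
    """Broadcast the per-frame weights over all mel bins for plotting on the spectrogram."""
    return np.tile(frame_saliency, (n_mels, 1))

# Placeholder attention map (2 heads, 1500 x 1500) with rows normalized to sum to 1.
rng = np.random.default_rng(0)
attn = rng.random((2, 1500, 1500))
attn /= attn.sum(axis=-1, keepdims=True)

saliency = attention_to_frame_saliency(attn)
overlay = saliency_overlay(saliency)
print(saliency.shape, overlay.shape)
```

The `overlay` array has the same shape as the input spectrogram, so it could be drawn on top of it as a semi-transparent heatmap (for example with `matplotlib`'s `imshow` and an `alpha` value). Averaging over queries is only one choice; picking a single query row, or a single layer or head, would show different things.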