(1) MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. [link]
(2) Contrastive Distillation on Intermediate Representations for Language Model Compression. [link]
(3) DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings. [link]
(1) Adversarial Retriever-Ranker for Dense Text Retrieval. [link]
(2) LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval. [link]
(3) EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval. [link]
(1) In Defense of Dual-Encoders for Neural Ranking. [link]
(1) Masked Autoencoders Enable Efficient Knowledge Distillers. [link]
(2) Relational Knowledge Distillation. [link]
(3) MiniVLM: A Smaller and Faster Vision-Language Model. [link]
(4) ADVL: Adaptive Distillation For Vision-Language Task. [link]
(5) The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. [link]
(6) Compressing Visual-linguistic Model via Knowledge Distillation. [link]
(7) Distilled Dual-Encoder Model for Vision-Language Understanding. [link]
(8) XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. [link]
(9) Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks. [link]
(10) Multi-modal Alignment using Representation Codebook. [link]
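For quick reference on the theme shared by the entries above, here is a minimal sketch of the generic soft-label knowledge distillation objective (temperature-scaled KL divergence between teacher and student logits). It is the common baseline that many of the listed compression works build on, not the specific method of any paper above; the function name, shapes, and temperature value are illustrative assumptions.

```python
# Generic soft-label knowledge distillation loss (sketch).
# NOT the specific method of any paper listed above; names and shapes are illustrative.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened distributions."""
    t = temperature
    # Soften both distributions; detach the teacher so no gradient flows into it.
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)

if __name__ == "__main__":
    # Illustrative usage: in practice this term is mixed with the task loss.
    student_logits = torch.randn(8, 30522)   # small student's outputs over a vocab
    teacher_logits = torch.randn(8, 30522)   # frozen teacher's outputs
    print(kd_loss(student_logits, teacher_logits).item())
```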