
multi-modal-continual-learning

Used repositories

Evaluation of continual learning models:

  1. Retrieval evaluation

    1. Assesses how well the model can retrieve relevant items
    2. The purpose is:
      • to measure the model’s ability to understand and relate multimodal data (like image-to-text or text-to-image retrieval)
      • and to evaluate how well the model has learned to map images and text into a shared embedding space
    3. Process:
      • image-to-text retrieval: given an image, retrieve the most relevant text descriptions
      • text-to-image retrieval: given a text description, retrieve the most relevant images
      • the performance is usually measured using metrics like Recall@K (see the sketch after this list)
    4. Objective:
      • the model’s ability to retrieve relevant data across modalities
  2. Transfer evaluation

    1. Measures how well the model can adapt its learned representations to new tasks or datasets
    2. The purpose is to
      • assess the generalization capability of the model’s representations to new and diverse tasks
    3. Process:
      • fine-tune the pre-trained model on a new dataset or task like classification
      • evaluate the performance on the new task using task-specific metrics like accuracy
    4. Objective:
      • model’s ability to transfer its learned representations to new, often task-specific contexts (a sketch of this protocol follows the summary below)
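To make the Recall@K measurement concrete, below is a minimal sketch of cross-modal Recall@K, assuming the image and text embeddings are already L2-normalized arrays and that caption i belongs to image i. It illustrates the metric only and is not the repository's evaluation code; datasets such as Flickr30k and MSCOCO pair several captions with each image, which a real evaluation has to account for.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=5):
    """Fraction of queries whose matching gallery item (same index)
    appears among the k most similar gallery items."""
    sims = query_emb @ gallery_emb.T              # cosine similarity (unit vectors)
    top_k = np.argsort(-sims, axis=1)[:, :k]      # indices of the k best matches
    targets = np.arange(query_emb.shape[0])[:, None]
    return float((top_k == targets).any(axis=1).mean())

# Toy example with random unit vectors; in practice these would be
# CLIP image and text features for a retrieval dataset.
rng = np.random.default_rng(0)
img = rng.normal(size=(100, 512)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(100, 512)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print("image-to-text R@5:", recall_at_k(img, txt))   # image query, text gallery
print("text-to-image R@5:", recall_at_k(txt, img))   # text query, image gallery
```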

Both methods play an important role in assessing the performance of CL models:

  • Retrieval evaluation helps in understanding how well the model retains and uses its multimodal embedding space over time
  • Transfer evaluation helps in assessing the adaptability and robustness of the model’s learned features when applied to new and varied tasks, providing insights into the model's ability to generalize and prevent catastrophic forgetting
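As a rough sketch of the transfer protocol, the snippet below trains a linear classification head on top of a frozen pre-trained encoder and reports test accuracy. The names backbone, train_loader, and test_loader are placeholders, and the actual transfer evaluation may fine-tune more of the model than a linear probe.

```python
import torch
import torch.nn as nn

def linear_probe_accuracy(backbone, train_loader, test_loader,
                          feat_dim, num_classes, epochs=5, device="cpu"):
    """Fit a linear head on frozen features, then return test accuracy."""
    backbone.eval().to(device)                        # pre-trained encoder stays frozen
    head = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = backbone(images)              # features are not updated
            loss = loss_fn(head(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            logits = head(backbone(images.to(device)))
            correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
            total += labels.numel()
    return correct / total                            # task-specific accuracy
```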

Evaluation done so far

  • The authors evaluate MTIL using the following metrics: Transfer, Average, and Last.
    • The Transfer metric assesses the model’s zero-shot transfer capability on unseen data.
    • Last evaluates the model’s memorization ability on historical knowledge.
    • Average is a composite metric measuring the mean performance across “Transfer” and “Last”.
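For reference, one way these metrics could be computed from an accuracy matrix is sketched below. The matrix layout is an assumption, and the Average computation follows the description above; the original MTIL evaluation may instead average accuracy over all training steps.

```python
import numpy as np

def mtil_metrics(acc):
    """acc[i, j]: accuracy on task j measured after training on task i."""
    acc = np.asarray(acc)
    n = acc.shape[0]
    # Transfer: zero-shot accuracy on tasks the model has not been trained on yet.
    transfer = np.mean([acc[i, j] for i in range(n) for j in range(n) if j > i])
    # Last: accuracy on every task after the final training step.
    last = acc[-1].mean()
    # Average: mean of Transfer and Last, following the description above.
    average = (transfer + last) / 2
    return {"Transfer": transfer, "Average": average, "Last": last}
```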

New components

  • This repository extends existing code with the zero-shot retrieval capability.

  • In the paper Learning Transferable Visual Models From Natural Language Supervision, the authors report the zero-shot transfer performance of CLIP for both text and image retrieval on the Flickr30k and MSCOCO datasets (a sketch of how such CLIP embeddings could be extracted follows this list).

Figure: clip_retrieval_results.png

  • Retrieval evaluation - TODO
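As a starting point for that TODO, the sketch below shows one way to obtain normalized CLIP image and text embeddings using the Hugging Face transformers CLIP implementation. The checkpoint name and the helper function are assumptions for illustration, not the implementation used in this repository; the returned arrays can be fed into the Recall@K routine sketched earlier.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical checkpoint choice; any CLIP checkpoint works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_pairs(image_paths, captions):
    """Return L2-normalized CLIP image and text embeddings as numpy arrays."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return img_emb.numpy(), txt_emb.numpy()
```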
