
Static Curriculum Learning Using Popularity Labels #228

Open
rezaBarzgar opened this issue Jan 3, 2024 · 3 comments
Labels: curriculum, experiment (Running a study or baseline for results)

@rezaBarzgar (Member)

To define a static difficulty measurer for the task of neural team formation, we can use the popularity labels for each team. Assuming that we have a popularity label for each team, we can use torch.utils.data.SubsetRandomSampler to customize the proportion of popular and non-popular teams in each batch. There are two possible approaches to applying CL to this task:

  • Batch-based: change the proportion of popular and non-popular teams every k batches within each epoch. Each epoch starts with batches that contain more popular (easy) teams and fewer non-popular teams, and ends with batches that contain fewer popular and more non-popular (hard) teams. In general, every epoch consists of batches of varying difficulty.

  • Epoch-based (more common): the difficulty level changes across epochs, not within individual batches. In the early epochs, more popular (easy) examples are presented to the model; as training progresses, more non-popular (challenging) examples are introduced, encouraging the model to generalize and learn more complex patterns. A sketch of this variant follows the list.
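To make the epoch-based variant concrete, here is a minimal sketch of how the per-epoch sampler could be built with torch.utils.data.SubsetRandomSampler, assuming we already have a boolean team_is_popular array (the team-level labels discussed below). The helper name make_epoch_loader, the linear schedule, and the dataset object are illustrative assumptions, not a fixed design:

import numpy as np
from torch.utils.data import DataLoader, SubsetRandomSampler


def make_epoch_loader(dataset, team_is_popular, epoch, n_epochs, batch_size=128):
    # Popular teams are treated as easy, non-popular teams as hard.
    easy_idx = np.where(team_is_popular)[0]
    hard_idx = np.where(~team_is_popular)[0]
    # The fraction of hard teams grows linearly from 0 (first epoch) to 1 (last epoch).
    hard_fraction = epoch / max(n_epochs - 1, 1)
    n_hard = int(hard_fraction * len(hard_idx))
    indices = np.concatenate([easy_idx, np.random.choice(hard_idx, n_hard, replace=False)])
    # SubsetRandomSampler shuffles within the selected subset on every epoch.
    sampler = SubsetRandomSampler(indices.tolist())
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


# Usage: rebuild the loader at the start of every epoch.
# for epoch in range(n_epochs):
#     loader = make_epoch_loader(train_dataset, team_is_popular, epoch, n_epochs)
#     for batch in loader:
#         ...  # training step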

Currently, we only have popularity labels for each individual expert, not for teams. One possible solution is to assign a popularity label to a team based on the number of popular/non-popular experts in it; for example, a team in which the majority of experts are popular can be considered a popular team.

@hosseinfani, since the epoch-based approach is more common in the CL literature, I'm starting with it. I'm posting this here to confirm the team popularity labeling and the static CL approach with you.

@hosseinfani (Member)

@rezaBarzgar "a team with a majority of popular experts" >> you need to specify what "majority" means, i.e., 60%, ..., 90%, 100% of a team? Also it may depend on domain. Like in a paper, a team with 1-2 popular authors out of 4-5 authors (teams' average size), in movies, a popular movie's casncrow are all (90-100%) popular.

Anyways, you need to specify a reasonable percentage and see the results.

@hosseinfani added the "experiment" (Running a study or baseline for results) label on Jan 4, 2024
@rezaBarzgar (Member, Author) commented Jan 4, 2024

I calculated popularity labels for each team based on the proportion of popular experts in the team for imdb. If the proportion of popular experts in a team is greater than the specified threshold, the team is labelled as popular; otherwise, it is labelled as non-popular.

Here is the code (I'll also push with my next updates):

import pickle

import numpy as np
import pandas as pd


def label_generator(vecs_path, expert_popularity_label_path, proportion):
    # Load the sparse team-member matrix (lil_matrix: one row per team).
    with open(vecs_path + '/teamsvecs.pkl', 'rb') as file:
        teamsvecs = pickle.load(file)
    # One popularity label per expert, indexed by memberidx.
    experts_popularity_label = pd.read_csv(expert_popularity_label_path, index_col='memberidx').to_numpy().squeeze()
    team_popularity_label = []
    for team in teamsvecs['member']:
        experts = team.rows[0]  # column indices of the experts in this team
        populars_count = experts_popularity_label[experts].sum()
        # A team is popular if its fraction of popular experts exceeds the threshold.
        team_popularity_label.append((populars_count / len(experts)) > proportion)

    team_popularity_label = np.array(team_popularity_label)
    print(f'percentage of popular teams: {(team_popularity_label.sum() / len(team_popularity_label)) * 100}')
    return team_popularity_label


if __name__ == '__main__':
    vecs_pth = './data/preprocessed/imdb/title.basics.tsv.filtered.mt75.ts3'
    expert_popularity_label_pth = './data/preprocessed/imdb/popularity.imdb.mt75.csv'
    for proportion in [0.1, 0.3, 0.5, 0.7, 0.9]:
        print(f'proportion: {proportion}')
        label_generator(vecs_pth, expert_popularity_label_pth, proportion)

Here are the results for different proportions:

proportion: 0.1 → percentage of popular teams: 86.4
proportion: 0.3 → percentage of popular teams: 82.7
proportion: 0.5 → percentage of popular teams: 66.8
proportion: 0.7 → percentage of popular teams: 52.7
proportion: 0.8 → percentage of popular teams: 42.8
proportion: 0.9 → percentage of popular teams: 40.7

@hosseinfani (Member)

So go ahead with 0.7, but schedule the runs for all other proportions; also include 0.0 and 1.0 for testing purposes.
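A minimal sketch of that schedule, reusing label_generator from the previous comment (the exact list of proportions is an assumption):

# Sketch: sweep all proportions, including the 0.0 and 1.0 sanity checks.
for proportion in [0.0, 0.1, 0.3, 0.5, 0.7, 0.8, 0.9, 1.0]:
    print(f'proportion: {proportion}')
    label_generator(vecs_pth, expert_popularity_label_pth, proportion)

Note that if the strict > comparison is kept, proportion 1.0 labels every team as non-popular and 0.0 labels any team with at least one popular expert as popular, which should make the two endpoints easy to verify.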
