
retrieval evaluation #57

Open
sjk0825 opened this issue Oct 10, 2024 · 3 comments


sjk0825 commented Oct 10, 2024

ircot_hipporag.py includes a recall computation.

I have a question about the evaluation process in the source code below.
The code performs title-level recall evaluation (i.e., recall increases when a retrieved passage's title matches a gold supporting-fact title).

Is the retrieval evaluation score title-level in your project?

    # calculate recall
    if args.dataset in ['hotpotqa', 'hotpotqa_train']:
        gold_passages = [item for item in sample['supporting_facts']]
        gold_items = set([item[0] for item in gold_passages])
        retrieved_items = [passage.split('\n')[0].strip() for passage in retrieved_passages]
    elif args.dataset in ['2wikimultihopqa']:
        gold_passages = [item for item in sample['supporting_facts']]
        gold_items = set([item[0] for item in gold_passages])
        retrieved_items = [passage.split('\n')[0].strip() for passage in retrieved_passages]
    else:
        gold_passages = [item for item in sample['paragraphs'] if item['is_supporting']]
        gold_items = set([item['title'] + '\n' + item['text'] for item in gold_passages])
        retrieved_items = retrieved_passages

    # calculate metrics
    recall = dict()
    print(f'idx: {sample_idx + 1} ', end='')
    for k in k_list:
        recall[k] = round(sum(1 for t in gold_items if t in retrieved_items[:k]) / len(gold_items), 4)
        total_recall[k] += recall[k]
        print(f'R@{k}: {total_recall[k] / (sample_idx + 1):.4f} ', end='')
    print()
    print('[ITERATION]', it, '[PASSAGE]', len(retrieved_passages), '[THOUGHT]', thoughts)

    # record results
bernaljg (Collaborator) commented:

Hi, thanks for the question. We used the appropriate evaluation framework for each dataset: we used passage titles for evaluation in both HotpotQA and 2WikiMultiHopQA, since titles are unique there, but for MuSiQue we used the entire passage, since many passages share a title.
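For illustration, here is a minimal sketch of the two recall@k modes described above (title-level matching vs. full-passage matching). The helper name `recall_at_k` and the sample data are hypothetical, not from the repository:

```python
def recall_at_k(gold_items, retrieved_items, k):
    """Fraction of gold items found among the top-k retrieved items."""
    if not gold_items:
        return 0.0
    top_k = retrieved_items[:k]
    return sum(1 for g in gold_items if g in top_k) / len(gold_items)

# Title-level matching (HotpotQA / 2WikiMultiHopQA): compare only the
# passage title, i.e. the first line of each "title\ntext" passage.
gold_titles = {"Teutberga", "Lothair II"}
retrieved = ["Teutberga\nTeutberga (died 875) ...", "Hucbert\n..."]
retrieved_titles = [p.split("\n")[0].strip() for p in retrieved]
print(recall_at_k(gold_titles, retrieved_titles, 2))  # 0.5

# Full-passage matching (MuSiQue): compare the whole "title\ntext" string,
# since multiple passages may share a title.
gold_passages = {"Teutberga\nTeutberga (died 875) ..."}
print(recall_at_k(gold_passages, retrieved, 2))  # 1.0
```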


sjk0825 commented Oct 14, 2024

Thank you for your kind answer; I have another question.

What is a "passage" in your paper?
For example, in 2WikiMultiHop the title Teutberga has two sentences under one title, so the sentences share the same title and the title is not unique per passage:

['Teutberga', ['Teutberga( died 11 November 875) was a queen of Lotharingia by marriage to Lothair II.', "She was a daughter of Bosonid Boso the Elder and sister of Hucbert, the lay- abbot of St. Maurice's Abbey."]]

So does "passage" mean the concatenated sentences, or each individual sentence under the same title?

bernaljg (Collaborator) commented:

Right, for 2WikiMultiHop we concatenate the sentences to make a passage and determine passage relevance by whether it contains a supporting sentence.
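A minimal sketch of that construction, assuming the `[title, [sentence, ...]]` format from the example above and supporting facts given as `(title, sentence_index)` pairs; the helper names are hypothetical:

```python
def make_passage(title, sentences):
    """Concatenate a title's sentences into one passage string."""
    return title + "\n" + " ".join(sentences)

def is_relevant(title, sentences, supporting_facts):
    """A passage is relevant if any supporting fact falls inside it."""
    return any(sp_title == title and sp_idx < len(sentences)
               for sp_title, sp_idx in supporting_facts)

entry = ["Teutberga",
         ["Teutberga( died 11 November 875) was a queen of Lotharingia "
          "by marriage to Lothair II.",
          "She was a daughter of Bosonid Boso the Elder and sister of "
          "Hucbert, the lay- abbot of St. Maurice's Abbey."]]
supporting_facts = [("Teutberga", 0)]

passage = make_passage(*entry)       # one "title\ntext" passage
print(is_relevant(entry[0], entry[1], supporting_facts))  # True
```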
