# Fair_Comparison

## Information Box

During our survey, we found that it is quite challenging to conduct fair comparisons among different PEFT algorithms.

  1. Under nominally the same settings, the same method may perform poorly in some papers but show much better results in others.

  2. Different papers may use different settings for different tasks, including which layers are fine-tuned, the number of trainable parameters, and the choice of task datasets, among other factors.

  3. For some methods, our replication attempts could not reproduce the results reported in the original papers, and the authors have not released their code.

To address these issues, we have decided to make a preliminary attempt: we test different methods under the same settings. Given that tasks such as NLU are highly sensitive to random seeds, we conduct a unified evaluation on commonsense reasoning (CR) tasks. Every result we report satisfies one of the following three conditions:

  1. The method’s code is open-source, and a download link for the CR task weights is available. We have verified that these weights reproduce the results reported in the original paper.

  2. The method’s code is open-source, but no official weights have been released. We report the results obtained by running the official code on the CR tasks.

  3. The method is not open-source, but we have successfully reproduced results consistent with the original paper.

In conclusion, we will not report any results that we are unable to reproduce. We will also provide the weights corresponding to the CR results reported in our paper. We hope that this attempt can assist researchers and contribute to the advancement of the field.
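To make "the same settings" concrete, the sketch below shows one way to pin a single adapter configuration (rank, scaling, dropout, target modules) and reuse it for every LoRA-style method being compared, using the Hugging Face peft library. The checkpoint name and all hyperparameters here are illustrative assumptions, not the exact configuration behind the results below.

```python
# A minimal sketch of fixing one shared configuration for all LoRA-style
# methods, using Hugging Face's peft library. The base checkpoint, rank,
# alpha, dropout, and target modules are illustrative assumptions, not the
# exact settings used for the results reported in this repository.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # assumed checkpoint

shared_config = LoraConfig(
    r=32,                       # same rank for every method being compared
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, shared_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```

Keeping one such configuration fixed across methods removes the layer-selection and parameter-budget differences listed in point 2 above as sources of variance.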

## Results

| Method | Params | BoolQ | PIQA | SIQA | HellaS. | WinoG. | ARC-e | ARC-c | OBQA | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ChatGPT | - | 73.1 | 85.4 | 68.5 | 78.5 | 66.1 | 89.8 | 79.9 | 74.8 | 77.0 |
| **Fine-tuning LLaMA-7B** | | | | | | | | | | |
| Fully FT | 100% | 69.9 | 84.2 | 78.9 | 92.3 | 83.3 | 86.6 | 72.8 | 83.4 | 81.4 |
| Prefix | 0.11% | 64.3 | 76.8 | 73.9 | 42.1 | 72.1 | 72.9 | 54.0 | 60.6 | 64.6 |
| Series | 0.99% | 63.0 | 79.2 | 76.3 | 67.9 | 75.7 | 74.5 | 57.1 | 72.4 | 70.8 |
| Parallel | 3.54% | 67.9 | 76.4 | 78.8 | 69.8 | 78.9 | 73.7 | 57.3 | 75.2 | 72.2 |
| LoRA (r=4) | 0.10% | 2.3 | 46.1 | 18.3 | 19.7 | 55.2 | 65.4 | 51.9 | 57.0 | 39.5 |
| AdaLoRA (r=4) | + 0.6k | 66.1 | 78.1 | 74.3 | 34.0 | 74.4 | 76.7 | 57.5 | 71.2 | 66.5 |
| FLoRA (r=4) | + 2.6k | 67.2 | 78.0 | 72.9 | 65.4 | 73.8 | 73.8 | 55.3 | 71.8 | 69.8 |
| DoRA (r=4) | + 877k | 51.3 | 42.2 | 77.8 | 25.4 | 78.8 | 78.7 | 62.5 | 78.6 | 61.9 |
| LoRA-Dash (r=4) | + 1.3k | 65.2 | 79.9 | 78.3 | 82.8 | 77.1 | 78.6 | 65.4 | 78.4 | 75.7 |
| LoRA (r=32) | 0.83% | 68.9 | 80.7 | 77.4 | 78.1 | 78.8 | 77.8 | 61.3 | 74.8 | 74.7 |
| AdaLoRA (r=32) | + 5.1k | 69.1 | 82.2 | 77.2 | 78.3 | 78.2 | 79.7 | 61.9 | 77.2 | 75.5 |
| FLoRA (r=32) | + 164k | 66.4 | 81.3 | 77.1 | 75.6 | 77.1 | 77.2 | 62.4 | 77.6 | 74.3 |
| DoRA (r=32) | + 877k | 69.7 | 83.4 | 78.6 | 87.2 | 81.0 | 81.9 | 66.2 | 79.2 | 78.4 |
| LoRA-Dash (r=32) | + 1.3k | 69.9 | 82.8 | 78.6 | 84.9 | 81.6 | 82.3 | 66.5 | 80.8 | 78.4 |
| **Fine-tuning LLaMA3-8B** | | | | | | | | | | |
| LoRA (r=16) | 0.35% | 72.3 | 86.7 | 79.3 | 93.5 | 84.8 | 87.7 | 75.7 | 82.8 | 82.8 |
| AdaLoRA (r=16) | + 2.6k | 90.4 | 85.0 | 76.7 | 79.1 | 83.3 | 86.4 | 75.1 | 75.4 | 81.4 |
| FLoRA (r=16) | + 41k | 90.2 | 84.2 | 79.9 | 79.3 | 85.1 | 86.7 | 74.8 | 93.9 | 84.2 |
| LoRA-Dash (r=16) | + 1.3k | 74.8 | 88.0 | 80.6 | 95.2 | 85.6 | 89.0 | 77.4 | 84.8 | 84.4 |
| LoRA (r=32) | 0.70% | 70.8 | 85.2 | 79.9 | 91.7 | 84.3 | 84.2 | 71.2 | 79.0 | 80.8 |
| PISSA (r=32) | + 0 | 67.1 | 81.1 | 77.2 | 83.6 | 78.9 | 77.7 | 63.2 | 74.6 | 75.4 |
| MiLoRA (r=32) | + 0 | 68.8 | 86.7 | 77.2 | 92.9 | 85.6 | 86.8 | 75.5 | 81.8 | 81.9 |
| DoRA (r=32) | + 784k | 74.6 | 89.3 | 79.9 | 95.5 | 85.6 | 90.5 | 80.4 | 85.8 | 85.2 |
| LoRA-Dash (r=32) | + 1.3k | 75.3 | 88.5 | 80.2 | 95.7 | 86.8 | 90.7 | 80.2 | 85.6 | 85.4 |
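In the table, "Params" gives trainable parameters as a percentage of the base model, and the "+ Nk" entries appear to denote the extra parameters a method adds on top of the LoRA baseline at the same rank. As a rough sanity check, the sketch below recovers the 0.83% figure for LoRA (r=32) on LLaMA-7B, assuming adapters are applied to the q/k/v and up/down projections of all 32 layers; the module list and dimensions are assumptions for illustration, not taken from this repository.

```python
# Rough sanity check of the "Params" column for LoRA (r=32) on LLaMA-7B.
# Assumes adapters on the q/k/v and up/down projections of all 32 layers;
# these module choices and dimensions are assumptions for illustration.
hidden, intermediate, layers, r = 4096, 11008, 32, 32

per_layer = (
    3 * r * (hidden + hidden)          # q_proj, k_proj, v_proj
    + 2 * r * (hidden + intermediate)  # up_proj, down_proj
)
lora_params = layers * per_layer       # ~56.1M trainable parameters
base_params = 6.74e9                   # approximate LLaMA-7B parameter count

print(f"{lora_params / base_params:.2%}")  # ~0.83%, matching the table
```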

## Download for Weights

We provide some of the weights from our work.

| Method | Links |
| --- | --- |
| LoRA-Dash | Google Drive |
| DoRA | GitHub |