During our survey, we found it quite challenging to conduct fair comparisons among different PEFT algorithms:

- Under the same settings, the same method performs poorly in some papers yet noticeably better in others.
- Different papers may use different settings for different tasks, involving factors such as which layers are fine-tuned, the number of trainable parameters, and the choice of task datasets.
- During our replication attempts, some methods could not reproduce the results reported in their original papers, and some have not released their code.
To address these issues, we make a preliminary attempt and test different methods under the same settings. Since tasks like NLU are highly sensitive to random seeds, we conduct a unified evaluation on commonsense reasoning (CR) tasks. Every result we report satisfies one of the following three conditions:

- The method's code is open-source and a download link for the CR task weights is available; we have verified that these weights produce results consistent with the original paper.
- The method's code is open-source, but no official weights have been released; we report the results obtained by running the official code on the CR tasks.
- The method is not open-source, but we have successfully reproduced results consistent with the original paper.
In short, we do not report any results that we are unable to reproduce, and we provide the weights corresponding to the CR results reported in our paper (see the loading sketch below). We hope this attempt assists researchers and contributes to the advancement of the field.
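For the first condition, verification means attaching the released adapter weights to the base model and re-running the CR evaluation. Below is a minimal sketch of that step, assuming the checkpoint follows the standard HuggingFace PEFT adapter layout; the base-model name and adapter path are placeholders, not the exact artifacts used here.

```python
# Minimal verification sketch: attach released CR adapter weights to the base
# model and spot-check one example. Names and paths below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "yahma/llama-7b-hf"          # placeholder LLaMA-7B checkpoint
ADAPTER_PATH = "./downloaded_cr_adapter"  # placeholder: downloaded CR weights

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_PATH)  # load adapter on top
model.eval()

# Spot-check one BoolQ-style prompt; a full run loops over each CR dataset and
# compares accuracy against the numbers reported in the original paper.
prompt = (
    "Please answer the following question with true or false.\n"
    "Question: is the Earth larger than the Moon?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```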
Method | Params | BoolQ | PIQA | SIQA | HellaS. | WinoG. | ARC-e | ARC-c | OBQA | Avg. |
---|---|---|---|---|---|---|---|---|---|---|
ChatGPT | - | 73.1 | 85.4 | 68.5 | 78.5 | 66.1 | 89.8 | 79.9 | 74.8 | 77.0 |
Fine-tuning LLaMA-7B | ||||||||||
Fully FT | 100% | 69.9 | 84.2 | 78.9 | 92.3 | 83.3 | 86.6 | 72.8 | 83.4 | 81.4 |
Prefix | 0.11% | 64.3 | 76.8 | 73.9 | 42.1 | 72.1 | 72.9 | 54.0 | 60.6 | 64.6 |
Series | 0.99% | 63.0 | 79.2 | 76.3 | 67.9 | 75.7 | 74.5 | 57.1 | 72.4 | 70.8 |
Parallel | 3.54% | 67.9 | 76.4 | 78.8 | 69.8 | 78.9 | 73.7 | 57.3 | 75.2 | 72.2 |
LoRA (r=4) | 0.10% | 2.3 | 46.1 | 18.3 | 19.7 | 55.2 | 65.4 | 51.9 | 57.0 | 39.5 |
AdaLoRA (r=4) | + 0.6k | 66.1 | 78.1 | 74.3 | 34.0 | 74.4 | 76.7 | 57.5 | 71.2 | 66.5 |
FLoRA (r=4) | + 2.6k | 67.2 | 78.0 | 72.9 | 65.4 | 73.8 | 73.8 | 55.3 | 71.8 | 69.8 |
DoRA (r=4) | + 877k | 51.3 | 42.2 | 77.8 | 25.4 | 78.8 | 78.7 | 62.5 | 78.6 | 61.9 |
LoRA-Dash (r=4) | + 1.3k | 65.2 | 79.9 | 78.3 | 82.8 | 77.1 | 78.6 | 65.4 | 78.4 | 75.7 |
LoRA (r=32) | 0.83% | 68.9 | 80.7 | 77.4 | 78.1 | 78.8 | 77.8 | 61.3 | 74.8 | 74.7 |
AdaLoRA (r=32) | + 5.1k | 69.1 | 82.2 | 77.2 | 78.3 | 78.2 | 79.7 | 61.9 | 77.2 | 75.5 |
FLoRA (r=32) | + 164k | 66.4 | 81.3 | 77.1 | 75.6 | 77.1 | 77.2 | 62.4 | 77.6 | 74.3 |
DoRA (r=32) | + 877k | 69.7 | 83.4 | 78.6 | 87.2 | 81.0 | 81.9 | 66.2 | 79.2 | 78.4 |
LoRA-Dash (r=32) | + 1.3k | 69.9 | 82.8 | 78.6 | 84.9 | 81.6 | 82.3 | 66.5 | 80.8 | 78.4 |
Fine-tuning LLaMA3-8B | ||||||||||
LoRA (r=16) | 0.35% | 72.3 | 86.7 | 79.3 | 93.5 | 84.8 | 87.7 | 75.7 | 82.8 | 82.8 |
AdaLoRA (r=16) | + 2.6k | 90.4 | 85.0 | 76.7 | 79.1 | 83.3 | 86.4 | 75.1 | 75.4 | 81.4 |
FLoRA (r=16) | + 41k | 90.2 | 84.2 | 79.9 | 79.3 | 85.1 | 86.7 | 74.8 | 93.9 | 84.2 |
LoRA-Dash (r=16) | + 1.3k | 74.8 | 88.0 | 80.6 | 95.2 | 85.6 | 89.0 | 77.4 | 84.8 | 84.4 |
LoRA (r=32) | 0.70% | 70.8 | 85.2 | 79.9 | 91.7 | 84.3 | 84.2 | 71.2 | 79.0 | 80.8 |
PiSSA (r=32) | + 0 | 67.1 | 81.1 | 77.2 | 83.6 | 78.9 | 77.7 | 63.2 | 74.6 | 75.4 |
MiLoRA (r=32) | + 0 | 68.8 | 86.7 | 77.2 | 92.9 | 85.6 | 86.8 | 75.5 | 81.8 | 81.9 |
DoRA (r=32) | + 784k | 74.6 | 89.3 | 79.9 | 95.5 | 85.6 | 90.5 | 80.4 | 85.8 | 85.2 |
LoRA-Dash (r=32) | + 1.3k | 75.3 | 88.5 | 80.2 | 95.7 | 86.8 | 90.7 | 80.2 | 85.6 | 85.4 |
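In the "Params" column, percentages give the share of trainable parameters relative to the full model, and "+ k" entries give parameters added on top of the corresponding LoRA baseline of the same rank (hence "+ 0" for PiSSA and MiLoRA, which only change LoRA's initialization). As a reference for how such a share can be computed, here is a minimal sketch using the `peft` library; the base model and target modules are assumptions, since the exact fine-tuned layers vary across papers.

```python
# Minimal sketch: count the trainable-parameter share after wrapping a base
# model with LoRA, as in the "Params" column. Target modules are an assumption.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")  # placeholder
config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(model, config)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} = {100 * trainable / total:.2f}%")
# peft's model.print_trainable_parameters() prints the same summary.
```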
We also provide some of the weights from this work:
Method | Links |
---|---|
LoRA-Dash | Google Drive |
DoRA | GitHub |
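If you only need the fine-tuned model for inference, the adapter can be folded back into the base weights. A short sketch, assuming the checkpoints above are standard PEFT adapters and reusing `model` and `tokenizer` from the verification sketch earlier; the output path is hypothetical:

```python
# Merge the adapter into the base weights so inference needs no peft at runtime.
# `model` is the PeftModel from the verification sketch; the path is hypothetical.
merged = model.merge_and_unload()
merged.save_pretrained("./llama-7b-cr-merged")
tokenizer.save_pretrained("./llama-7b-cr-merged")
```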