It has been a while since HRank was published. Let me start by thanking you for sharing this interesting piece of work and bringing a novel perspective to the pruning literature. However, while trying to replicate the HRank results for benchmarking, we noticed the following issue.
From the lines below, it appears that at the start of every epoch, the checkpoint with the best test accuracy is automatically loaded and training then resumes from it:
HRank/main.py
Lines 232 to 244 in 33050a1
HRank/main.py
Lines 305 to 306 in 33050a1
We also confirmed this empirically by checking the accuracy and printing out a portion of a conv tensor:
While I understand that it is common and accepted practice to report the epoch with the best test accuracy [1], resuming every epoch's training from the best-test-accuracy checkpoint looks like a potential data leak, since test-set information is being used to decide training operations. HRank seems to perform reasonably well without this setting (i.e. by simply continuing training from the latest epoch). Is this behavior accidental?
[1] Li et al. Pruning Filters for Efficient ConvNets. ICLR 2017