This project applies and evaluates two local search optimization techniques, Simulated Annealing and Genetic Algorithm, to the problem of neural network pruning. The primary objective is to reduce model complexity and improve computational efficiency without significantly compromising performance.
👉 Check the source.ipynb and report.pdf for the full implementation details.
Pruning is a crucial technique in Deep Learning model optimization: it reduces the number of weights, leading to smaller model sizes, faster inference times, and lower memory usage. Traditional pruning techniques rely on predefined rules or assumptions about which weights to prune. This project bypasses these limitations by leveraging local search algorithms to dynamically learn the best pruning masks.
Simulated Annealing and Genetic Algorithm are used here to solve this optimization problem, automatically discovering pruning masks that eliminate unnecessary weights without any rule-based assumptions about weight importance:
- Simulated Annealing (`SA`): A probabilistic local search method inspired by the physics of annealing in metallurgy. It explores the pruning-mask space by accepting suboptimal solutions at higher temperatures to escape local minima; as the temperature is gradually lowered, the probability of accepting worse solutions shrinks and the algorithm progressively refines its pruning masks (a minimal sketch of the acceptance rule follows this list).
- Genetic Algorithm (`GA`): An evolutionary algorithm simulating processes such as natural selection, crossover, and mutation. It maintains a population of candidate pruning solutions (chromosomes) and evolves them over several generations to discover optimal pruning masks. `GA` is particularly effective at exploring a larger solution space thanks to its population-based nature, allowing for more aggressive pruning strategies.
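To make the `SA` side concrete, here is a minimal sketch of the classic Metropolis acceptance rule and a mask-perturbation step over binary pruning masks. The function and variable names are illustrative, not the exact ones used in source.ipynb:

```python
import numpy as np

rng = np.random.default_rng(0)

def sa_accept(current_cost, candidate_cost, temperature):
    """Metropolis acceptance rule: always accept improvements, and accept
    worse masks with probability exp(-delta / T) so SA can escape local minima."""
    delta = candidate_cost - current_cost
    if delta <= 0:
        return True
    return rng.random() < np.exp(-delta / max(temperature, 1e-12))

def perturb_mask(mask, mutation_rate):
    """Flip a random subset of bits in a binary pruning mask (0 = pruned weight)."""
    flips = rng.random(mask.shape) < mutation_rate
    return np.where(flips, 1 - mask, mask)
```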
The cost function that guides both algorithms evaluates pruning masks based on two key criteria:
- Loss difference: The performance degradation of the pruned model compared to the original model.
- Effective sparsity: The level of sparsity achieved in each layer, with a small epsilon to avoid division by zero.
This formulation encourages the algorithm to balance minimizing loss against maximizing sparsity, with a high penalty for excessive loss (exceeding `max_loss`). The loss difference (`loss_diff`) is also squared so that large degradations are penalized more heavily than small ones.
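Based on this description, a minimal sketch of such a cost function could look like the following; the relative weighting of the two terms and the size of the penalty are assumptions, with the exact constants living in source.ipynb:

```python
import numpy as np

def pruning_cost(loss_pruned, loss_original, mask, max_loss, eps=1e-8):
    """Cost of a candidate pruning mask: penalize squared loss degradation,
    reward effective sparsity (fraction of zeroed weights in the layer)."""
    loss_diff = loss_pruned - loss_original
    sparsity = 1.0 - np.count_nonzero(mask) / mask.size
    cost = loss_diff ** 2 + 1.0 / (sparsity + eps)  # eps avoids division by zero
    if loss_diff > max_loss:
        cost += 1e6                                  # high penalty for excessive loss
    return cost
```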
A unique feature of my implementation is the layer-wise evolution strategy, which allows focused optimization of each layer's pruning mask. Both techniques are applied sequentially to each layer, with a novel warmup strategy in which the adaptive maximum allowable loss threshold gradually increases as pruning progresses through the layers:
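A minimal sketch of such a warmup schedule, assuming a simple linear ramp between a starting and a final threshold (the actual schedule and constants are defined in source.ipynb):

```python
def warmup_max_loss(layer_idx, num_layers, start=0.05, end=0.5):
    """Linearly ramp the maximum allowable loss increase as pruning
    progresses through the layers."""
    if num_layers <= 1:
        return end
    frac = layer_idx / (num_layers - 1)
    return start + frac * (end - start)

# Example: thresholds for the 5 prunable layers of LeNet-5
thresholds = [warmup_max_loss(i, 5) for i in range(5)]
```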
This approach allows for more aggressive pruning in the earlier layers while being more conservative in deeper layers, which are typically more sensitive to pruning. Both techniques are also applied with an `adaptive_mutation_rate` to enhance their effectiveness over time:
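A sketch of one possible decay scheme for this rate; the exponential decay below is an assumption for illustration, with the exact update rule in source.ipynb:

```python
def adaptive_mutation_rate(initial_rate, step, total_steps, min_rate=0.01):
    """Decay the mutation rate over the run: broad exploration early,
    finer adjustments near the end."""
    decay = (min_rate / initial_rate) ** (step / max(total_steps - 1, 1))
    return initial_rate * decay

# Example: rate over 200 SA iterations starting from 0.15
rates = [adaptive_mutation_rate(0.15, t, 200) for t in range(200)]
```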
These adaptive rates allow for broader exploration in early stages and finer tuning towards the end, contributing to the algorithms' ability to find good pruning configurations.
The experiment was conducted in Colab Pro using TensorFlow, with LeNet as the baseline architecture. Using the layer-wise pruning strategy and adaptive parameters described above, both algorithms prune a LeNet-5 model trained on the MNIST dataset, which achieved a baseline accuracy of 97.23% with 61,470 weights. To keep the comparison fair, the following settings were chosen deliberately:
- Initial `mutation_rate` was set to 0.15, meaning 15% of the weights in the pruning mask could change at the start of the process. Each algorithm then calculates its own adaptive version of this rate to fine-tune solutions over time.
- The number of pruning loops in each algorithm is 1000 in total:
  - Total loops (`SA`) = `iterations` * number of prunable layers = 200 * 5 = 1000.
  - Total loops (`GA`) = `num_generations` * (`population_size` - `elite_size`) * number of prunable layers = 20 * (12 - 2) * 5 = 1000.
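As a quick sanity check on the matched evaluation budgets, the sketch below counts fitness evaluations, assuming (my assumption, not necessarily the notebook's exact bookkeeping) that the `elite_size` best masks carry their fitness over and only new offspring are re-evaluated:

```python
num_generations, population_size, elite_size = 20, 12, 2
prunable_layers = 5

evaluations = 0
for layer in range(prunable_layers):          # layer-wise evolution
    for generation in range(num_generations):
        # elites are copied unchanged; only the offspring replacing the
        # rest of the population need a fresh fitness evaluation
        evaluations += population_size - elite_size

print(evaluations)  # 20 * (12 - 2) * 5 = 1000, matching SA's 200 * 5 iterations
```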
| Metric | Simulated Annealing (SA) | Genetic Algorithm (GA) |
|---|---|---|
| Baseline accuracy | 97.23% | 97.23% |
| Final accuracy | 87.44% | 89.02% |
| Initial loss | 0.0959 | 0.0953 |
| Final loss | 0.9923 | 0.9442 |
| Sparsity gained | 0.4384 | 0.4133 |
| Time taken | 11 mins 43 secs | 27 mins 46 secs |
- Pruning Effectiveness: Both `SA` and `GA` demonstrated significant model compression while maintaining reasonable accuracy. `SA` achieves a final sparsity of 43.84% with a 9.79-percentage-point drop in accuracy, while `GA` reaches 41.33% sparsity with an 8.21-percentage-point drop. To quantify this trade-off, I use a simple trade-off score (see the sketch after this list). `SA` achieves a slightly better trade-off score, indicating that it may be more effective at balancing sparsity and accuracy. However, the difference is marginal, and both techniques prove effective for the task, striking a reasonable balance between pruning aggressiveness (increased sparsity) and preserved accuracy/loss.
- Efficiency: `SA` proved to be faster than `GA`, requiring less than half the execution time. It completed the pruning process in 11 mins 43 secs, whereas `GA` required 27 mins 46 secs. This makes `SA` more suitable for applications where time efficiency is critical.
- Convergence Behaviour: `SA` exhibited rapid initial improvement, particularly beneficial for quick results, while `GA` showed more gradual, steady improvements across generations.
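The exact trade-off score is computed in source.ipynb and report.pdf; the sketch below shows one plausible formulation consistent with the reported ordering, and its additive form is an assumption made purely for illustration:

```python
results = {
    "SA": {"sparsity_gained": 43.84, "accuracy_drop": 9.79},  # percentage points
    "GA": {"sparsity_gained": 41.33, "accuracy_drop": 8.21},
}

def tradeoff_score(sparsity_gained, accuracy_drop):
    """Higher is better: reward sparsity gained, penalize accuracy lost.
    This additive form is an illustrative assumption, not necessarily the
    exact score used in report.pdf."""
    return sparsity_gained - accuracy_drop

for name, r in results.items():
    print(f"{name}: {tradeoff_score(**r):.2f}")
# SA: 34.05, GA: 33.12 -> SA comes out slightly ahead under this formulation
```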
👉 Check the source.ipynb and report.pdf for more experiment details and comparison.