A list of the papers that introduced the most popular advancements in deep learning, curated to present only the essentials: the hit singles of deep learning, if you will. I decided not to include advanced research papers, as I consider them of little use to beginners and to practitioners with no interest in advanced research. The papers below are the ones I consider essential to read and understand: each presents the challenge it set out to tackle and, in doing so, gives some insight into the practical use of neural networks.
Generative adversarial nets, Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014) [pdf]
Backpropagation applied to handwritten zip code recognition, Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel (1989) [pdf]
ImageNet classification with deep convolutional neural networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (2012) [pdf]
Very deep convolutional networks for large-scale image recognition, K. Simonyan, A. Zisserman, arXiv preprint arXiv:1409.1556 (2014) [pdf]
Going deeper with convolutions, Christian Szegedy et al., Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) [pdf]
Deep residual learning for image recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2016) [pdf]
Rethinking the inception architecture for computer vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna (2016) [pdf]
Inception-v4, Inception-ResNet and the impact of residual connections on learning, Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander A. Alemi (2017) [pdf]
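The key idea behind He et al.'s residual learning is small enough to sketch: rather than learning a mapping H(x) directly, each block learns a residual F(x) and adds its input back, which keeps very deep stacks trainable. A minimal pure-Python illustration (the two-layer `residual_block` and the identity-weight example are my own toy construction, not the paper's architecture):

```python
def relu_vec(v):
    return [max(0.0, x) for x in v]

def linear(v, w, b):
    # naive dense layer: w is an out-by-in weight matrix, b a bias vector
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi
            for row, bi in zip(w, b)]

def residual_block(x, w1, b1, w2, b2):
    # y = relu(F(x) + x): the stacked layers only have to learn the residual F
    f = linear(relu_vec(linear(x, w1, b1)), w2, b2)
    return relu_vec([fi + xi for fi, xi in zip(f, x)])

# sanity check with identity weights and zero biases, so F(x) = relu(x)
I = [[1.0, 0.0], [0.0, 1.0]]
z = [0.0, 0.0]
print(residual_block([1.0, -2.0], I, z, I, z))  # → [2.0, 0.0]
```

With identity weights the block reduces to relu(relu(x) + x), which makes the skip connection easy to see: the input passes through untouched and only the correction is learned.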
Deep sparse rectifier neural networks, Xavier Glorot, Antoine Bordes, Yoshua Bengio (2011) [pdf]
Rectifier Nonlinearities Improve Neural Network Acoustic Models, Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2013) [pdf]
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015) [pdf]
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter (2016) [pdf]
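The activation papers above each propose a different treatment of negative inputs: ReLU zeroes them out, Leaky ReLU keeps a small slope, and ELU saturates smoothly. A quick side-by-side sketch (the α defaults below are the commonly used values, not something this list prescribes):

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small negative slope keeps a gradient flowing for x < 0
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    # smooth exponential saturation toward -alpha for very negative x
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for f in (relu, leaky_relu, elu):
    print(f.__name__, f(2.0), f(-2.0))
```

All three agree for positive inputs; the differences only show up on the negative side, which is exactly where dying-unit and bias-shift arguments in these papers apply.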
An overview of gradient descent optimization algorithms, Sebastian Ruder (2016) [pdf]
Efficient Backprop, Yann A. LeCun, Léon Bottou, Genevieve B. Orr, Klaus-Robert Müller (2012) [pdf]
Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba (2014) [pdf]
Ad Click Prediction: a View from the Trenches, H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica (2013) [pdf]
Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot, Yoshua Bengio (2010) [pdf]
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015) [pdf]
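The two initialization papers above give the variance rules most frameworks ship today: Glorot and Bengio scale by fan-in plus fan-out, while He et al. use 2/fan_in to compensate for ReLU zeroing half the activations. A sketch of both rules (the function names and seeded RNG are mine, chosen for reproducibility):

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot & Bengio: Var(W) = 2 / (fan_in + fan_out);
    # a uniform(-limit, limit) draw with limit below has exactly that variance
    rng = rng or random.Random(0)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def he_normal(fan_in, fan_out, rng=None):
    # He et al.: Var(W) = 2 / fan_in, suited to ReLU layers
    rng = rng or random.Random(0)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

w = he_normal(512, 256)
```

Either rule keeps the variance of activations (and of backpropagated gradients) roughly constant from layer to layer, which is the failure mode the Glorot paper diagnoses in deep feedforward nets.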
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov (2014) [pdf]
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Sergey Ioffe, Christian Szegedy (2015) [pdf]
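Dropout and batch normalization, the two entries above, can each be sketched in a few lines. Note the dropout below is the "inverted" variant common in modern frameworks (it scales at train time), which differs slightly from the paper's test-time rescaling; `batch_norm` normalizes a single feature across a mini-batch:

```python
import random

def inverted_dropout(v, p=0.5, rng=None, train=True):
    # drop each unit with probability p at train time; dividing the
    # survivors by (1 - p) keeps the expected activation unchanged,
    # so nothing needs rescaling at test time
    if not train:
        return list(v)
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in v]

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # normalize one feature across the mini-batch, then rescale with
    # the learnable parameters gamma and beta
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]

print(batch_norm([1.0, 2.0, 3.0, 4.0]))  # roughly zero mean, unit variance
```

In a real layer, batch norm keeps running averages of mean and variance during training and uses those at inference time; the sketch shows only the training-time normalization.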
Overtraining, regularization, and searching for minimum in neural networks, Jonas Sjöberg, Lennart Ljung (1992) [pdf]
Regularization of neural networks using dropconnect, Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus (2013) [pdf]
Layer normalization, Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton (2016) [pdf]
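Layer normalization, the last entry, differs from batch normalization only in the axis: statistics are computed per example across its own features, so it works with batch size 1 and in recurrent nets. A minimal sketch (the learnable gamma/beta default to identity here):

```python
def layer_norm(features, gamma=None, beta=None, eps=1e-5):
    # statistics are computed per example over its own features,
    # so the result does not depend on the rest of the batch
    n = len(features)
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    inv = (var + eps) ** -0.5
    return [g * (x - mean) * inv + b
            for x, g, b in zip(features, gamma, beta)]

print(layer_norm([1.0, 2.0, 3.0, 4.0]))  # zero mean, unit variance per example
```

Because no batch statistics are involved, training-time and inference-time behavior are identical, which is one reason the paper targets recurrent models.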