This repository is a collection of useful materials for Computer Vision.
Updates will continue, and if you would like to become a contributor, I would greatly appreciate a pull request.
Contributions are welcome! Please read the contribution guidelines first. Thank you for your interest.
- Image Classification
- Object Detection
- Semantic Segmentation
- Fine Grained Visual Categorization
- Meta Learning
- Image Self Supervised Learning

## Image Classification

Year | Name | Arxiv | CODE |
---|---|---|---|
1998 | LeNet : Gradient-based learning applied to document recognition | ||
2012 | AlexNet : ImageNet Classification with Deep Convolutional Neural Networks | ||
2014 | VGGNet : Very Deep Convolutional Networks for Large-Scale Image Recognition | ||
2015 | GoogLeNet (Inception v1) : Going Deeper with Convolutions | ||
2016 | ResNet : Deep Residual Learning for Image Recognition | ||
2016 | SqueezeNet : AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size | ||
2017 | DenseNet : Densely Connected Convolutional Networks | ||
2017 | Xception : Deep Learning with Depthwise Separable Convolutions | ||
2017 | MobileNetV1 : Efficient Convolutional Neural Networks for Mobile Vision Applications | ||
2018 | ShuffleNet : An Extremely Efficient Convolutional Neural Network for Mobile Devices | ||
2018 | MobileNetV2 : Inverted Residuals and Linear Bottlenecks | ||
2018 | NASNet : Learning Transferable Architectures for Scalable Image Recognition | ||
2018 | Squeeze Excitation Network : Squeeze-and-Excitation Networks | ||
2017 | Residual Attention Network : Residual Attention Network for Image Classification | ||
2018 | Relative Position Attention : Self-Attention with Relative Position Representations | ||
2018 | CBAM : Convolutional Block Attention Module | ||
2019 | EfficientNet : Rethinking Model Scaling for Convolutional Neural Networks | ||
2021 | Vision Transformer : An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ||
2021 | DeiT : Training data-efficient image transformers & distillation through attention | ||
2021 | PVT : Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | ||
2021 | T2T : Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ||
2021 | DeepVit : DeepViT: Towards Deeper Vision Transformer | ||
2021 | CvT: Introducing Convolutions to Vision Transformers | ||
2021 | CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | ||
2021 | Focal-T : Focal Attention for Long-Range Interactions in Vision Transformers | ||
2021 | Hybrid Swin Transformers : Efficient large-scale image retrieval with deep feature orthogonality and Hybrid-Swin-Transformers | ||
2021 | Swin Transformer : Hierarchical Vision Transformer using Shifted Windows | ||
2022 | ConvNeXt : A ConvNet for the 2020s | ||
2023 | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders |
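
Most of the CNN backbones above build on the residual shortcut introduced by ResNet (2016). As a minimal sketch of that idea (assuming PyTorch is available; the channel count is illustrative, not any paper's exact configuration):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """An identity-shortcut residual block in the spirit of ResNet (2016)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut: add the input back before the final ReLU

x = torch.randn(1, 64, 32, 32)
print(BasicResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

The shortcut lets gradients flow around the convolutions, which is what made much deeper networks trainable.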

## Object Detection

Year | Name | Arxiv | CODE |
---|---|---|---|
2013 | R-CNN : Rich feature hierarchies for accurate object detection and semantic segmentation | ||
2015 | Faster R-CNN : Towards Real-Time Object Detection with Region Proposal Networks | ||
2016 | OHEM : Training Region-based Object Detectors with Online Hard Example Mining | ||
2016 | YOLOv1 : You Only Look Once: Unified, Real-Time Object Detection | ||
2016 | SSD : Single Shot MultiBox Detector | ||
2017 | FPN (Feature Pyramid Network) : Feature Pyramid Networks for Object Detection | ||
2017 | RetinaNet : Focal Loss for Dense Object Detection | ||
2017 | Mask R-CNN | ||
2018 | YOLOv3 : An Incremental Improvement | ||
2018 | RefineDet : Single-Shot Refinement Neural Network for Object Detection | ||
2018 | M2Det : A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | ||
2018 | MetaAnchor: Learning to Detect Objects with Customized Anchors | ||
2019 | Mask Scoring R-CNN | ||
2019 | FSAF : Feature Selective Anchor-Free Module for Single-Shot Object Detection | ||
2019 | ScratchDet : Exploring to Train Single-Shot Object Detectors from Scratch | ||
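
Almost every detector above relies on intersection-over-union (IoU), both for matching predictions to ground truth and for mAP evaluation. A small self-contained Python sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```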

## Semantic Segmentation

Year | Name | Arxiv | CODE |
---|---|---|---|
2014 | DeepLabV1 : Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs | ||
2015 | FCN (Fully Convolutional Network) : Fully Convolutional Networks for Semantic Segmentation | ||
2015 | DeconvNet (Deconvolution Network) : Learning Deconvolution Network for Semantic Segmentation | ||
2015 | U-Net : Convolutional Networks for Biomedical Image Segmentation | ||
2016 | DilatedNet : Multi-Scale Context Aggregation by Dilated Convolutions | ||
2016 | ENet : A Deep Neural Network Architecture for Real-Time Semantic Segmentation | ||
2017 | ICNet : ICNet for Real-Time Semantic Segmentation on High-Resolution Images | ||
2017 | GCN : Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network | ||
2017 | PSPNet : Pyramid Scene Parsing Network | ||
2017 | LinkNet : Exploiting Encoder Representations for Efficient Semantic Segmentation | ||
2017 | DUC, HDC : Understanding Convolution for Semantic Segmentation | ||
2017 | SegNet : A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | ||
2018 | ShuffleSeg : Real-Time Semantic Segmentation Network | ||
2018 | AdaptSegNet : Learning to Adapt Structured Output Space for Semantic Segmentation | ||
2018 | R2U-Net : Recurrent Residual Convolutional Neural Network based on U-Net for Medical Image Segmentation | ||
2018 | Attention U-Net : Learning Where to Look for the Pancreas | ||
2019 | MultiResUNet : Rethinking the U-Net architecture for multimodal biomedical image segmentation | ||
2021 | SETR : Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers | ||
2021 | UTNet : A Hybrid Transformer Architecture for Medical Image Segmentation |
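
A recurring pattern in this table is the encoder-decoder with skip connections popularized by U-Net (2015). A toy PyTorch sketch with one downsampling level (channel counts are arbitrary, chosen only to show the skip concatenation):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A two-level encoder-decoder with one U-Net-style skip connection."""

    def __init__(self, in_ch: int = 3, num_classes: int = 2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        # decoder input: upsampled bottleneck features concatenated with the encoder skip
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)  # per-pixel class logits

    def forward(self, x):
        e = self.enc(x)                                  # full-resolution features
        m = self.mid(self.down(e))                       # half-resolution bottleneck
        d = self.dec(torch.cat([self.up(m), e], dim=1))  # skip connection
        return self.head(d)

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```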

## Fine Grained Visual Categorization

Year | Name | Arxiv | CODE |
---|---|---|---|
2017 | MA-CNN : Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition | ||
2017 | RA-CNN : Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition | ||
2018 | NTS-Net : Learning to Navigate for Fine-grained Classification | ||
2019 | MGE-CNN : Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization | ||
2019 | DCL-Net : Destruction and Construction Learning for Fine-grained Image Recognition | ||
2020 | CIN : Channel Interaction Networks for Fine-Grained Image Categorization | ||
2020 | LIO (Look-into-Object) : Self-supervised Structure Modeling for Object Recognition | ||
2020 | PMG (Progressive Multi-Granularity) : Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches | ||
2020 | PIM (Progressive Image Recognition) : Learning Rich Part Hierarchies With Progressive Attention Networks for Fine-Grained Image Recognition | ||
2021 | CAL (Counterfactual Attention Learning) : Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification | ||
2021 | TransFG : A Transformer Architecture for Fine-grained Recognition | ||
2021 | ProtoTrees : Neural Prototype Trees for Interpretable Fine-grained Image Recognition | ||
2021 | RAMS-Trans : Recurrent Attention Multi-scale Transformer for Fine-grained Image Recognition | ||
2021 | FFVT : Feature Fusion Vision Transformer for Fine-Grained Visual Categorization | ||
2022 | PIM (Plug-in Module) : A Novel Plug-in Module for Fine-Grained Visual Classification | ||
2022 | FeNet : A Spatial Feature-Enhanced Attention Neural Network with High-Order Pooling Representation for Application in Pest and Disease Recognition | ||
2022 | MetaFormer : A Unified Meta Framework for Fine-Grained Recognition | ||
2022 | DCAL : Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification |
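
Several entries above (DCL 2019, PMG 2020) train on jigsaw-shuffled inputs that destroy global structure while keeping local parts intact. A rough PyTorch sketch of that patch shuffle, not any paper's exact procedure (PMG, for instance, varies the grid size across training stages):

```python
import torch

def jigsaw_shuffle(images: torch.Tensor, n: int = 4) -> torch.Tensor:
    """Split each image into an n x n grid of patches and permute them randomly."""
    b, c, h, w = images.shape
    assert h % n == 0 and w % n == 0, "image size must be divisible by the grid size"
    ph, pw = h // n, w // n
    # (B, C, H, W) -> (B, n*n, C, ph, pw): one row per patch
    patches = images.reshape(b, c, n, ph, n, pw).permute(0, 2, 4, 1, 3, 5)
    patches = patches.reshape(b, n * n, c, ph, pw)
    patches = patches[:, torch.randperm(n * n)]  # shared permutation across the batch
    # reassemble the shuffled patches back into (B, C, H, W)
    patches = patches.reshape(b, n, n, c, ph, pw).permute(0, 3, 1, 4, 2, 5)
    return patches.reshape(b, c, h, w)

print(jigsaw_shuffle(torch.randn(2, 3, 224, 224), n=4).shape)  # torch.Size([2, 3, 224, 224])
```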

## Meta Learning

Year | Name | Arxiv | CODE |
---|---|---|---|
2014 | Neural Turing Machines | ||
2015 | Siamese Neural Networks for One-shot Image Recognition | ||
2016 | Matching Networks for One Shot Learning | ||
2016 | Learning to learn by gradient descent by gradient descent | ||
2016 | One-shot Learning with Memory-Augmented Neural Networks | ||
2017 | A Simple Neural Attentive Meta-Learner | ||
2017 | Meta Networks | ||
2017 | ProtoNet : Prototypical Networks for Few-shot Learning | ||
2017 | RelationNet : Learning to Compare: Relation Network for Few-Shot Learning | ||
2017 | Optimization as a Model for Few-Shot Learning | ||
2017 | MAML : Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks | ||
2017 | Meta-SGD: Learning to Learn Quickly for Few-Shot Learning | ||
2019 | R2-D2/LR-D2 : Meta-learning with differentiable closed-form solvers | ||
2019 | MetaOptNet : Meta-Learning with Differentiable Convex Optimization |
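
Many of the optimization-based methods above refine the bi-level scheme of MAML (2017): adapt to each task on its support set in an inner loop, then update the shared initialization from the query-set loss. A toy second-order sketch on a linear model (real implementations adapt full networks and batch the tasks):

```python
import torch

def maml_step(w, tasks, inner_lr=0.01, outer_lr=0.001):
    """One MAML-style meta-update for a linear model y = x @ w."""
    meta_grad = torch.zeros_like(w)
    for x_s, y_s, x_q, y_q in tasks:  # (support, query) data per task
        # inner loop: one gradient step on the support set
        support_loss = ((x_s @ w - y_s) ** 2).mean()
        (g,) = torch.autograd.grad(support_loss, w, create_graph=True)
        w_adapted = w - inner_lr * g  # still differentiable w.r.t. w
        # outer loop: evaluate the adapted weights on the query set
        query_loss = ((x_q @ w_adapted - y_q) ** 2).mean()
        (mg,) = torch.autograd.grad(query_loss, w)
        meta_grad += mg
    new_w = w - outer_lr * meta_grad / len(tasks)
    return new_w.detach().requires_grad_()

w = torch.zeros(5, 1, requires_grad=True)
tasks = [(torch.randn(10, 5), torch.randn(10, 1),
          torch.randn(10, 5), torch.randn(10, 1)) for _ in range(4)]
w = maml_step(w, tasks)
```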

## Image Self Supervised Learning

Year | Name | Arxiv | CODE |
---|---|---|---|
2014 | Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks | ||
2015 | Unsupervised Visual Representation Learning by Context Prediction | ||
2016 | Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles | ||
2018 | Unsupervised Representation Learning by Predicting Image Rotations | ||
2019 | Revisiting self-supervised visual representation learning | ||
2019 | PIRL: Self-Supervised Learning of Pretext-Invariant Representations | ||
2020 | MoCo : Momentum Contrast for Unsupervised Visual Representation Learning | ||
2020 | SimCLR : A Simple Framework for Contrastive Learning of Visual Representations | ||
2020 | SimCLRv2 : Big Self-Supervised Models are Strong Semi-Supervised Learners | ||
2020 | MoCo v2 : Improved Baselines with Momentum Contrastive Learning | ||
2020 | BYOL : Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning | ||
2020 | SimSiam : Exploring Simple Siamese Representation Learning | ||
2020 | SwAV : Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | ||
2021 | Barlow Twins : Self-Supervised Learning via Redundancy Reduction | ||
2021 | VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning | ||
2021 | Jigsaw Clustering for Unsupervised Visual Representation Learning | ||
2021 | MAE : Masked Autoencoders Are Scalable Vision Learners | ||
2021 | SimMIM: A Simple Framework for Masked Image Modeling | ||
2021 | iBOT 🤖: Image BERT Pre-Training with Online Tokenizer | ||
2022 | Tailoring Self-Supervision for Supervised Learning | ||
2022 | BEiT: BERT Pre-Training of Image Transformers | ||
2023 | Denoising Masked AutoEncoders Help Robust Classification | ||
2023 | Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | ||
2023 | CIM : Corrupted Image Modeling for Self-Supervised Visual Pre-Training | ||
2023 | PCAE : Progressively Compressed Auto-Encoder for Self-supervised Representation Learning | ||
2023 | HiViT : A Simple and More Efficient Design of Hierarchical Vision Transformer | ||
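
The contrastive branch of this table (SimCLR, MoCo, and their descendants) optimizes an InfoNCE-style objective over two augmented views of each image. A sketch of SimCLR's NT-Xent loss in PyTorch (batch and embedding sizes are illustrative):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent loss over two batches of view embeddings, as in SimCLR (2020)."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2N unit-norm embeddings
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude each sample vs. itself
    # the positive for row i is the other view of the same image, offset by N
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)  # e.g. projector outputs of two views
print(nt_xent(z1, z2).item())
```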
@misc{awesome_computervision,
  title  = {Awesome ComputerVision},
  author = {Wongi Park},
  url    = {https://github.com/kalelpark/Awesome-ComputerVision},
  year   = {2022},
}