OpenMMLab Detection Toolbox and Benchmark
Updated Aug 21, 2024 - Python
pix2tex: Using a ViT to convert images of equations into LaTeX code.
This repository contains demos I made with the Transformers library by HuggingFace.
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
SwinIR: Image Restoration Using Swin Transformer (official repository)
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
OpenMMLab Pre-training Toolbox and Benchmark
Scenic: A Jax Library for Computer Vision Research and Beyond
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Efficient vision foundation models for high-resolution generation and perception.
EVA Series: Visual Representation Fantasies from BAAI
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
An all-in-one toolkit for computer vision
This is a collection of our NAS and Vision Transformer work.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"