Paper Review (30)
- [Paper Review] Deep Neural Networks for YouTube Recommendations
- [Paper Review] SSD: Single Shot MultiBox Detector
- [Paper Review] You Only Look Once: Unified, Real-Time Object Detection
- [Paper Review] Very Deep Convolutional Networks for Large-Scale Image Recognition
- [Paper Review] Rich feature hierarchies for accurate object detection and semantic segmentation
- [Paper Review] Aggregated Residual Transformations for Deep Neural Networks
- [Paper Review] Deep Residual Learning for Image Recognition
- [Paper Review] Dropout Reduces Underfitting
- [Paper Review] DeiT III: Revenge of the ViT
- [Paper Review] Training data-efficient image transformers & distillation through attention
- TODO list
- [Paper Review] Vision Transformers Need Registers
- [Paper Review] DINOv2: Learning Robust Visual Features without Supervision
- [Paper Review] iBOT: Image BERT Pre-Training with Online Tokenizer
- [Paper Review] BEiT: BERT Pre-Training of Image Transformers
- [Paper Review] Emerging Properties in Self-Supervised Vision Transformers
- [Paper Review] Vision Transformer Adapter for Dense Predictions
- [Paper Review] Bootstrap your own latent: A new approach to self-supervised Learning
- [Paper Review] MLP-Mixer: An all-MLP Architecture for Vision
- [Paper Review] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- [Paper Review] Expanding Language-Image Pretrained Models for General Video Recognition
- [Paper Review] Learning Transferable Visual Models From Natural Language Supervision
- [Paper Review] DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
- [Paper Review] Deformable DETR: Deformable Transformers for End-to-End Object Detection
- [Paper Review] Attention Bottlenecks for Multimodal Fusion
- [Paper Review] Perceiver: General Perception with Iterative Attention
- [Paper Review] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
- [Paper Review] Masked Autoencoders Are Scalable Vision Learners
- [Paper Review] End-to-End Object Detection with Transformers
- [Paper Review] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation