amusi

Open Source

CVPR2026-Papers-with-Code

# CVPR 2026 Papers and Open-Source Projects (Papers with Code)

CVPR 2026 decisions are now available on OpenReview! 25.42% = 4090 / 16092

> Note 1: Issues are welcome! Please share CVPR 2026 papers and open-source projects.
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)

Scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群) for the latest work from CVPR 2026 and beyond. This is the largest computer vision and AI Knowledge Planet (知识星球) community. It is updated daily and is the first to share the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join and start learning!

![](CVer学术交流群.png)

# CVPR 2026 Open-Source Paper Index

- [3DGS(Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [Spatial Intelligence](#SI)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Object Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [3D Visual Grounding](#3DVG)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Remote Sensing](#Remote)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [Video Compression](#VC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Low-light Image Enhancement](#Low-light)
- [Scene Graph Generation](#SGG)
- [Image Retrieval](#Image-Retrieval)
- [Style Transfer](#ST)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Compressive Sensing](#CS)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

<a name="3DGS"></a>

# 3DGS(Gaussian Splatting)

**Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting**

- Paper: https://arxiv.org/abs/2602.20933
- Code:
- Project: https://sk-fun.fun/DropAnSH-GS

**Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking**

- Paper: https://arxiv.org/abs/2512.01329
- Project: https://haza628.github.io/tagSplat/

**FastGS: Training 3D Gaussian Splatting in 100 Seconds**

- Paper: https://arxiv.org/pdf/2511.04283
- Code: https://github.com/fastgs/FastGS
- Project: https://fastgs.github.io/

<a name="Agent"></a>

# Agent

<a name="Avatars"></a>

# Avatars

<a name="Backbone"></a>

# Backbone

<a name="CLIP"></a>

# CLIP

<a name="Mamba"></a>

# Mamba

<a name="GAN"></a>

# GAN

<a name="OCR"></a>

# OCR

<a name="NeRF"></a>

# NeRF

<a name="DETR"></a>

# DETR

<a name="Prompt"></a>

# Prompt

<a name="MLLM"></a>

# Multimodal Large Language Models (MLLM)

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

**UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark**

- Paper: https://arxiv.org/abs/2603.05075
- Code:
- Project: https://any2any-mllm.github.io/unim/

<a name="LLM"></a>

# Large Language Models (LLM)

<a name="Embodied-AI"></a>

# Embodied AI

**Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI**

- Paper: https://arxiv.org/abs/2511.20620
- Code: https://github.com/ai4ce/wanderland
- Project: https://ai4ce.github.io/wanderland/

<a name="SI"></a>

# Spatial Intelligence

**Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning**

- Paper: https://arxiv.org/abs/2510.27606
- Code: https://github.com/InternLM/Spatial-SSRL
- Model: https://huggingface.co/internlm/Spatial-SSRL-7B

<a name="NAS"></a>

# NAS

<a name="ReID"></a>

# ReID (Re-Identification)

**MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification**

- Paper: https://arxiv.org/abs/2512.03404
- Code: https://github.com/yjzhao1019/MOS

<a name="Diffusion"></a>

# Diffusion Models

<a name="Vision-Transformer"></a>

# Vision Transformer

<a name="VL"></a>

# Vision-Language

**StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues**

- Paper: https://arxiv.org/abs/2602.20089
- Code: https://github.com/intelligolabs/StructXLIP

**ApET: Approximation-Error Guided Token Compression for Efficient VLMs**

- Paper: https://arxiv.org/abs/2602.19870
- Code: https://github.com/MaQianKun0/ApET

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

<a name="Object-Detection"></a>

# Object Detection

<a name="Anomaly-Detection"></a>

# Anomaly Detection

<a name="VT"></a>

# Object Tracking

<a name="MI"></a>

# Medical Image

<a name="MIS"></a>

# Medical Image Segmentation

**MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2602.20423
- Code: https://github.com/HealthX-Lab/MedCLIPSeg
- Project: https://tahakoleilat.github.io/MedCLIPSeg

<a name="Autonomous-Driving"></a>

# Autonomous Driving

**Open-Vocabulary Domain Generalization in Urban-Scene Segmentation**

- Paper: https://arxiv.org/pdf/2602.18853
- Code: https://github.com/DZhaoXd/s2_corr

**U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences**

- Paper: https://arxiv.org/abs/2512.02982
- Code: https://github.com/worldbench/U4D

<a name="3D-Point-Cloud"></a>

# 3D Point Cloud

**CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation**

- Paper: https://arxiv.org/abs/2602.20409
- Code: https://github.com/SarthakM320/CLIPoint3D

<a name="3DOD"></a>

# 3D Object Detection

<a name="3DSS"></a>

# 3D Semantic Segmentation

<a name="LLV"></a>

# Low-level Vision

<a name="SR"></a>

# Super-Resolution

<a name="Denoising"></a>

# Denoising

## Image Denoising

<a name="3D-Human-Pose-Estimation"></a>

# 3D Human Pose Estimation

<a name="3DVG"></a>

# 3D Visual Grounding

<a name="Image-Generation"></a>

# Image Generation

**ExpPortrait: Expressive Portrait Generation via Personalized Representation**

- Paper: https://arxiv.org/abs/2602.19900
- Code:

<a name="Video-Generation"></a>

# Video Generation

<a name="Image-Editing"></a>

# Image Editing

<a name="Video-Editing"></a>

# Video Editing

<a name="3D-Generation"></a>

# 3D Generation

<a name="3D-Reconstruction"></a>

# 3D Reconstruction

**tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction**

- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM

**Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning**

- Project: https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r

**RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing**

- Paper: https://arxiv.org/abs/2602.19753
- Code: https://github.com/yyyykf/RAP

<a name="HMG"></a>

# Human Motion Generation

<a name="Video-Understanding"></a>

# Video Understanding

<a name="Remote"></a>

# Remote Sensing

**Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation**

- Paper: https://arxiv.org/abs/2602.19863
- Code: None

<a name="KD"></a>

# Knowledge Distillation

<a name="Depth-Estimation"></a>

# Depth Estimation

<a name="Stereo-Matching"></a>

# Stereo Matching

<a name="Low-light"></a>

# Low-light Image Enhancement

<a name="IC"></a>

# Image Compression

<a name="VC"></a>

# Video Compression

**UniComp: Rethinking Video Compression Through Informational Uniqueness**

- Paper: https://arxiv.org/abs/2512.03575
- Code: https://github.com/TimeMarker-LLM/UniComp

<a name="SGG"></a>

# Scene Graph Generation

<a name="Image-Retrieval"></a>

# Image Retrieval

**PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing**

- Paper: https://arxiv.org/abs/2603.04598
- Code:

<a name="ST"></a>

# Style Transfer

<a name="IQA"></a>

# Image Quality Assessment

<a name="Video-Quality-Assessment"></a>

# Video Quality Assessment

<a name="CS"></a>

# Compressive Sensing

<a name="Datasets"></a>

# Datasets

<a name="Others"></a>

# Others

**Decoupling Defense Strategies for Robust Image Watermarking**

- Paper: https://arxiv.org/abs/2602.20053
- Code: None

**Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery**

- Paper: https://arxiv.org/abs/2602.19910
- Code:

**The Invisible Gorilla Effect in Out-of-distribution Detection**

- Paper: https://arxiv.org/abs/2602.20068
- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect

**SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images**

- Paper: https://arxiv.org/abs/2602.20412
- Code:

**RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces**

- Paper: https://arxiv.org/abs/2602.20618
- Code:

**Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models**

- Paper:
- Code:

**GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement**

- Paper: https://arxiv.org/abs/2603.05095
- Code:

**FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation**

- Paper: https://arxiv.org/abs/2603.04733
- Code: https://github.com/eVI-group-SCU/FOZO

**Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning**

- Paper: https://arxiv.org/abs/2603.04825
- Code: https://github.com/RyanZhaoIc/CAD

Education & Learning ML Frameworks

22.7K Github Stars


Deep-Learning-Interview-Book

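# Deep Learning Interview Book

- :star: [Job-Hunting Guide](https://github.com/amusi/AI-Job-Notes)
- :smiley: [Self-Introduction](docs/自我介绍.md)
- :1234: [Mathematics](docs/数学.md)
- :mortar_board: [Machine Learning](docs/机器学习.md)
- :closed_book: [Deep Learning](docs/深度学习.md)
- :green_book: [Reinforcement Learning](docs/强化学习.md)
- :eyes: [Computer Vision](docs/计算机视觉.md)
- :camera: [Traditional Image Processing](docs/传统图像处理.md)
- :mahjong: [Natural Language Processing](docs/自然语言处理.md)
- :surfer: [SLAM](docs/SLAM.md)
- :busts_in_silhouette: [Recommendation Algorithms](docs/推荐算法.md)
- :bar_chart: [Data Structures and Algorithms](docs/数据结构与算法.md)
- :snake: [Programming Languages: C/C++/Python](docs/编程语言.md)
- :fireworks: [Deep Learning Frameworks](docs/深度学习框架.md)
- :pencil2: [Interview Experiences](docs/面试经验.md)
- :bulb: [Interview Tips](docs/面试技巧.md)
- :mega: [Other (Computer Networks, Linux, etc.)](docs/其它.md)
- [2024 AI Algorithm and Development Job-Hunting Group](https://mp.weixin.qq.com/s/sK_oSU1PmbUJ5ZGeMmY27A)

# How to Join the 2024 AI Algorithm and Development Job-Hunting Group

**Price: originally 199 yuan, 50 yuan off for a limited time, so only 149 yuan (about 0.4 yuan per day)**

**Duration: one year (counted from the moment you join)**

**How to join: scan the QR code below with WeChat to join the AI algorithm and development job-hunting group (Knowledge Planet, 知识星球)**

> Tip: after joining, we recommend downloading and using the Knowledge Planet app. The mini program and the Knowledge Planet official account also work. You can post, ask questions, discuss, and answer, and you get quick access to the group's resources.

![](docs/imgs/2024年AI求职群优惠券二维码.png)

![](docs/imgs/DLIB-Mindmap.png)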

Education & Learning

8.9K Github Stars


daily-paper-computer-vision

# daily-paper-computer-vision

**A daily record of curated papers on computer vision, deep learning, and machine learning**

- [High-Quality CV Paper Digest](#PaperDaily)
- [Top CV Conferences/Journals (2017-2023)](#TopPaper)

<a name="PaperDaily"></a>

## High-Quality CV Paper Digest

- [2023 (updated daily)](2023-Paper.md)

To make the content easier to archive and search, the daily digest of **high-quality CV/AI papers, projects, and applications** is now published in [CVer Computer Vision (CVer计算机视觉)](https://github.com/amusi/CVPR2023-Papers-with-Code/blob/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png). All CVers are welcome to join, learn from each other, and improve together.

[CVer Computer Vision (CVer计算机视觉)](https://github.com/amusi/CVPR2023-Papers-with-Code/blob/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png) is the largest computer vision and AI Knowledge Planet (知识星球) community, updated daily. Topics covered include: object detection, semantic segmentation, object tracking, Transformer, multimodal learning, large models, NeRF, diffusion models, depth estimation, super-resolution, 3D object detection, CNN, GAN, competition solutions, face recognition, data augmentation, face detection, datasets, NAS, AutoML, image segmentation, SLAM, instance segmentation, human pose estimation, video object segmentation, Re-ID, medical image segmentation, salient object detection, autonomous driving, crowd density estimation, PyTorch, faces, lane detection, dehazing, panoptic segmentation, pedestrian detection, text detection, OCR, 6D pose estimation, edge detection, scene text detection, video instance segmentation, 3D point clouds, model compression, face alignment, denoising, reinforcement learning, action recognition, OpenCV, scene text recognition, deraining, machine learning, style transfer, video object detection, deblurring, saliency detection, pruning, liveness detection, facial landmark detection, 3D object tracking, video restoration, facial expression recognition, temporal action detection, image retrieval, anomaly detection, and more.

![CVer Academic Exchange Group](./CVer学术交流群.png)

<a name="TopPaper"></a>

## Top CV Conferences/Journals

### 2023

**CVPR 2023**

- Paper list: https://openaccess.thecvf.com/CVPR2023?day=all
- Papers and code: https://github.com/amusi/CVPR2023-Papers-with-Code

**IJCAI 2023**

- Paper list: https://ijcai-23.org/main-track-accepted-papers/

**ICLR 2023**

- Paper list: https://openreview.net/group?id=ICLR.cc/2023/Conference#notable-top-5-

### 2022

**NeurIPS 2022**

- Paper list: https://nips.cc/Conferences/2022/Schedule?type=Poster and https://openreview.net/group?id=NeurIPS.cc/2022/Conference

**CVPR 2022**

- Paper list: https://openaccess.thecvf.com/CVPR2022?day=all
- Papers and code: https://github.com/amusi/CVPR2023-Papers-with-Code/blob/master/CVPR2022-Papers-with-Code.md

**ECCV 2022**

- Paper list: https://www.ecva.net/papers.php and https://eccv2022.ecva.net/program/accepted-papers/
- Papers and code: https://github.com/amusi/ECCV2022-Papers-with-Code

**ACM MM 2022**

- Paper list: https://2022.acmmm.org/accepted-papers/

**WACV 2022**

- Paper list: https://openaccess.thecvf.com/WACV2023

**MICCAI 2022**

- Paper list: https://conferences.miccai.org/2022/papers/ and https://link.springer.com/book/10.1007/978-3-031-16431-6

**AAAI 2022**

- Paper list: https://aaai-2022.virtualchair.net/papers.html?filter=keywords&search=Poster+Session+12&cluster=Red+3

**ICLR 2022**

- Paper list: https://openreview.net/group?id=ICLR.cc/2022/Conference#oral-submissions

### 2021

**ICLR 2021**

- Paper list: https://docs.google.com/spreadsheets/d/1n58O0lgGI5kI0QQY9f4BDDpNB4oFjb5D51yMr9fHAK4/edit#gid=1546418007
- OpenReview data: https://github.com/evanzd/ICLR2021-OpenReviewData
- [ICLR 2021 Stats & Graphs](https://github.com/sharonzhou/ICLR2021-Stats)

**AAAI 2021**

- Paper list: https://aaai.org/Conferences/AAAI-21/wp-content/uploads/2020/12/AAAI-21_Accepted-Paper-List.Main_.Technical.Track_.pdf

**WACV 2021**

- Paper list: http://wacv2021.thecvf.com/program

### 2020

**CVPR 2020**

- [List of all accepted CVPR 2020 papers](http://openaccess.thecvf.com/CVPR2020.py)
- CVPR 2020 paper PDFs (1467 papers): [Baidu Cloud link](https://pan.baidu.com/s/1DoPNWXpwEkzQdPOrLsO21w), password: te6h
- [Open-source code collection for CVPR 2020 papers](https://github.com/amusi/CVPR2020-Code)

**ECCV 2020**

- [Open-source code collection for ECCV 2020 papers](https://github.com/amusi/ECCV2020-Code)

**NeurIPS 2020**

- Paper collection: https://neurips.cc/Conferences/2020/AcceptedPapersInitial
- Papers with code: https://www.paperdigest.org/2020/11/neurips-2020-papers-with-code-data/

**ACM MM 2020**

- Paper collection: https://dblp.org/db/conf/mm/mm2020.html
- Paper collection: https://2020.acmmm.org/main-track-list.html

**MICCAI 2020**

- Paper collection: https://drive.google.com/drive/folders/1GDKe2raJf4ylWqb1jxGmnsR384kmjYBb?usp=sharing

### 2019

**CVPR 2019**

- [List of all accepted CVPR 2019 papers](<http://openaccess.thecvf.com/CVPR2019.py>)
- CVPR 2019 paper PDFs (1294 papers): [Baidu Cloud link](https://pan.baidu.com/s/19ef0HOz4hduDpcEK2PY9Kw), password: mwgv
- [Open-source code collection for CVPR 2019](<https://github.com/amusi/CVPR2019-Code>)

**ICCV 2019**

- [List of all accepted ICCV 2019 papers](<http://openaccess.thecvf.com/ICCV2019.py>)
- ICCV 2019 paper PDFs (1075 papers): [Baidu Cloud link](https://pan.baidu.com/s/1snDhED1Y-6qbV1ImQoYIPA), password: h7c2

**NeurIPS 2019**

- List of accepted NeurIPS 2019 papers (1427 papers): [Baidu Cloud link](https://pan.baidu.com/s/1TxD263qqXmja3fBZVwtP3g), password: 04wn

**IJCAI 2019**

- List of all accepted IJCAI 2019 papers (847 papers): [Baidu Cloud link](https://pan.baidu.com/s/1mVEowSZLBcz3X-_CZt7svA), password: v6ps

### 2018

**CVPR 2018**

- [List of all accepted CVPR 2018 papers](2018/cvpr2018-paper-list.csv)
- CVPR 2018 paper PDFs (979 papers): [Baidu Cloud link](https://pan.baidu.com/s/1lYEM_kkw1PWTkQzUvjG2pw), password: 6pgk

**ECCV 2018**

- [List of all accepted ECCV 2018 papers](http://openaccess.thecvf.com/ECCV2018.py)
- ECCV 2018 paper PDFs: [Baidu Cloud link](https://pan.baidu.com/s/1Mg0Kw9bepUK6_vqqVSOjNQ), password: mh97

### 2017

**CVPR 2017**

- CVPR 2017 paper PDFs: [Baidu Cloud link](https://pan.baidu.com/s/1RP1wQBFxs8BT0KBLiukxBw), password: hnzg

Education & Learning

6.8K Github Stars


AI-Job-Notes

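# AI-Job-Notes

A job-hunting guide for AI algorithm roles, covering campus recruiting groups, the campus recruiting timeline, preparation strategy, a coding practice guide, referrals, a list of AI companies, and Q&A.

AI algorithm roles cover: AIGC, large models, deep learning, machine learning, computer vision, NLP, embodied AI, multimodal learning, image processing, autonomous driving, SLAM, and more. Development roles cover: Java, C/C++, Python, Go, embedded systems, and more.

# Contents

- [0 2025 Campus Recruiting Group](#Group)
- [1 Campus Recruiting Timeline](#Scheduled)
- [2 Preparation Strategy](#Strategy)
- [3 AI Interview Notes and Coding Practice Guide](#Coding)
- [4 AI Algorithm and Development Job-Hunting Group (Referrals)](#Recommend)
- [5 Résumé Templates](#Resume)
- [6 List of AI Companies (Mainly CV Roles)](#Company)
- [7 Salaries of Previous AI Algorithm Cohorts](#Salary)
- [8 Q&A (130 Questions and Answers)](#Q&A)

<a name="Group"></a>

## 0 2025 Campus Recruiting Group

**The 2025 AI algorithm and development job-hunting group is now open!** Details: [2025 AI Algorithm Job-Hunting Group](https://mp.weixin.qq.com/s/oN9nrIY5_fRVUgHc6ly0nw)

**Price: originally 199 yuan, 60 yuan off for a limited time, so only 139 yuan (about 0.4 yuan per day)**

**Duration: one year (counted from the moment you join)**

**How to join: scan the QR code below with WeChat to join the AI algorithm and development job-hunting group (Knowledge Planet, 知识星球)**

> Tip: after joining, we recommend downloading and using the Knowledge Planet app. The mini program and the Knowledge Planet official account also work. You can post, ask questions, discuss, and answer, and you get quick access to the group's resources.

![](imgs/2025年AI求职群优惠券二维码.png)

<a name="Scheduled"></a>

## 1 Campus Recruiting Timeline

![](imgs/校招时间表.png)

Taking this year (2024) as the example, the default reader is a student in the class of 2025 (the class of 2024 is called "the previous class").

| Period | Task |
| -------------- | ----------------------------------- |
| February to May 2024 | Find a summer internship / spring recruiting (make-up round) for the previous class |
| June to August 2024 | Autumn recruiting, early batch (clash of the titans) |
| August to November 2024 | Autumn recruiting, regular batch (the titans keep clashing + rookies scrapping with each other) |

### 1.1 Summer Internships

February to May 2024: summer internships. Internships generally come in two kinds:

- Regular internships
- Summer internships

![](imgs/实习.png)

**Regular internships**: these can be found at any time. They usually arise from a specific team's needs, with openings posted by the company's HR, a team lead, or team members, so they are fairly scattered but also flexible.

**Summer internships**: many companies, especially large ones (BAT and other big tech firms), run dedicated **summer intern** hiring programs. On one hand this fits students' situations (many students are only free in the summer, or their advisors only let them go then). On the other hand it attracts talent ahead of autumn campus recruiting (the large-scale hiring round). A summer internship matters a great deal. For students, the most direct benefit is the chance to convert to a full-time offer. Interns usually start around the end of June (earlier if your schedule allows), and a dedicated internship review is usually held at the end of August or in September. Based on your overall performance, passing the review largely ends your autumn job hunt. Note: while interning, you should still take part in the early and regular batches of autumn recruiting and apply to more companies. Even if the internship keeps you busy and leaves no time to prepare, apply widely anyway. The other benefit of a summer internship is valuable internship experience, which makes your résumé look much better.

> There are also "winter internships", but large-scale winter intern hiring is rare. At most they are regular internships concentrated over the winter break.

### 1.2 Autumn Recruiting, Early Batch

**June to August 2024: autumn recruiting, early batch (clash of the titans)**

Each year the first shot of autumn recruiting is usually fired by vivo or DJI, and most big companies such as BAT start in July. At this stage campus recruiting is almost entirely referral/early batch rather than regular batch, so make the most of this window: June to August. I joke that it is a clash of the titans, but this period offers particularly good value. First, pay is generally higher: SP/SSP offers are typically issued at this point. Second, not many people have applied yet, because some don't realize how important the early batch is and keep planning to prepare a bit more and go all out in the regular batch. Bear in mind that many very strong candidates take part in the early batch while headcount (hc) is limited (companies hold back positions with the regular batch in mind), which is why I compare the early batch to a clash of the titans. Also, the early batch relies mostly on referrals. Later sections explain how to find openings and how to get referred. Note: if you fail in the early batch, you can apply again in the regular batch (check each company's hiring policy for details).

### 1.3 Autumn Recruiting, Regular Batch

**August to November 2024: autumn recruiting, regular batch (the titans keep clashing + rookies scrapping with each other)**

There is a saying, "golden September, silver October", meaning a September offer is worth more than an October one. There is real truth in it, so you can imagine what level offers from July and August belong to. This period also tests your mindset: if it is already September or October and you still have no offer while classmates around you do, you are bound to feel envious.

> Note: some companies open the regular batch as early as August, so Amusi strongly recommends seizing the **early batch**. Of course, if you still have no offer in September, don't lose heart. Keep applying and keep working, and remember: applying widely never hurts! In fact most students receive their offers in September and October, so keep applying and keep at it, and the results will come.

<a name="Strategy"></a>

## 2 Preparation Strategy

Preparation is like a study plan: everyone has their own habits, and what works for me may not work for you (new material is coming soon). So I'll summarize it with one compact formula.

**Formula: coding practice (LeetCode / Coding Interviews (剑指Offer)) + AI fundamentals + programming fundamentals + standard interview questions (CS/AI) + projects + internships + competitions + top conference/journal papers**

In general, the more of these you have the better, especially for AI algorithm roles, where the bar keeps rising.

<a name="Coding"></a>

## 3 AI Interview Notes and Coding Practice Guide

### 3.1 Deep Learning Interview Book

See: [Deep Learning Interview Book (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and more)](<https://github.com/amusi/Deep-Learning-Interview-Book>)

Part of the **Deep Learning Interview Book** is listed below:

- 😃 [Self-Introduction](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E8%87%AA%E6%88%91%E4%BB%8B%E7%BB%8D.md)
- 🔢 [Mathematics](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%95%B0%E5%AD%A6.md)
- 🎓 [Machine Learning](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0.md)
- 📕 [Deep Learning](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0.md)
- 📗 [Reinforcement Learning](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0.md)
- 👀 [Computer Vision](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89.md)
- 📷 [Traditional Image Processing](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E4%BC%A0%E7%BB%9F%E5%9B%BE%E5%83%8F%E5%A4%84%E7%90%86.md)
- 🀄️ [Natural Language Processing](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86.md)
- 🏄 [SLAM](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/SLAM.md)
- 👥 [Recommendation Algorithms](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%8E%A8%E8%8D%90%E7%AE%97%E6%B3%95.md)
- 📊 [Data Structures and Algorithms](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84%E4%B8%8E%E7%AE%97%E6%B3%95.md)
- 🐍 [Programming Languages: C/C++/Python](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80.md)
- 🎆 [Deep Learning Frameworks](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6.md)
- ✏️ [Interview Experiences](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E9%9D%A2%E8%AF%95%E7%BB%8F%E9%AA%8C.md)
- 💡 [Interview Tips](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E9%9D%A2%E8%AF%95%E6%8A%80%E5%B7%A7.md)
- 📣 [Other (Computer Networks, Linux, etc.)](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E5%85%B6%E5%AE%83.md)

### 3.2 Coding Practice Guide

The purpose of coding practice is to learn data structures and algorithms, build programming ability, and get familiar with problem-solving techniques.

Recommended order: start with [Coding Interviews (剑指Offer)](https://www.nowcoder.com/ta/coding-interviews) (66 problems), then move on to [LeetCode](https://leetcode.com/) (LeetCode already has 1000+ problems, which you can work through by category, but it is strongly recommended to finish the [LeetCode top interview questions](https://leetcode.com/problemset/top-interview-questions/) first).

> Note: judging by last year's early batch, you should solve at least 200-300 LeetCode problems, so anyone job hunting in 2024 (class of 2025) needs to get started!

#### 3.2.1 Programming Languages for Practice

- C/C++
- Python
- Java (not recommended)

> Note: if you have plenty of time and some C++ background, it is strongly recommended to practice in both C++ and Python.
>
> Judging by the 2023 early batch (class of 2024), candidates who know C++ have a certain advantage.

#### 3.2.2 Recommended Books

| Book | Douban rating | Recommendation |
| ------------------------------------------------------------ | -------- | -------- |
| [Coding Interviews (剑指Offer)](https://book.douban.com/subject/25910559/) | 9.1 | ☆☆☆☆☆ |
| [Data Structures, C++ Edition (数据结构(C++语言版))](https://book.douban.com/subject/25859528/) | 9.4 | ☆☆☆☆ |
| [Grokking Algorithms (算法图解)](https://book.douban.com/subject/26979890/) | 8.4 | ☆☆☆☆ |
| [大话数据结构](https://book.douban.com/subject/6424904/) | 7.9 | ☆☆☆ |
| [Algorithms, 4th Edition (算法)](https://book.douban.com/subject/19952400/) | 9.4 | ☆☆☆ |

> Note: many areas are not covered here, such as Linux and databases. These are the recommendations for now, and more will be added later.

#### 3.2.3 Online Practice Sites

- [LeetCode (English)](https://leetcode.com/)
- [LeetCode (Chinese)](https://leetcode-cn.com/)
- [Nowcoder (牛客网)](https://www.nowcoder.com/): recommended for the Coding Interviews problem set and companies' past question banks. Its advantage is that many companies use it as their online coding platform, so practicing there helps you get used to conventions such as input/output handling.

#### 3.2.4 Practice Method

- Finish all of Coding Interviews (剑指Offer)
- Work through LeetCode selectively: by category, such as arrays or linked lists, or by the top interview questions

#### 3.2.5 Practice Period

From now until 2024-11-15

#### 3.2.6 Why Practice Matters

A normal campus recruiting process includes an online written test, and interviews may also involve writing code by hand, so coding practice strongly affects your interview results.

<a name="Recommend"></a>

## 4 AI Algorithm and Development Job-Hunting Group and Referrals

Referral opportunities for AI roles at companies in China, including machine learning, deep learning, computer vision, and natural language processing.

### 4.1 Why Referrals Matter

Referrals really are important, and the same goes for finding internships. For example, my contacts allow me to refer candidates to BAT, SenseTime, Megvii, and other companies. The usual route is to submit your résumé online, whereas the fast, direct route is to put it in front of the team lead or manager. A referral also rests on a degree of mutual trust (however small): you still go through the full process, but your chance of passing the interviews quietly goes up. Keep in mind that many résumés get stuck on the official site or on third-party job boards and never go any further.

### 4.2 How to Get Referred

There are many ways to get referred, for example:

1. Strong ties: ask senior schoolmates who have already graduated, or friends, to refer you (the drawback is that your friends work at a limited set of companies, and many people are among the first in their circle to pursue algorithm roles, so they may have no seniors in the field).
2. The standard approach: look for referral posts by company employees on the Nowcoder forum, or follow recruiting WeChat official accounts (I won't recommend any here, because many come with strings attached: to get one referral you have to forward their articles to other groups and send them screenshots. Still, for most people this is the only way to get a referral).
3. Amusi referrals: this may sound like an advertisement, but it is a real option, because I have many contacts, know people at many companies, and can refer you directly. If you are interested, take a look at this AI algorithm and development job-hunting group: [2024 AI Algorithm Job-Hunting Group](https://mp.weixin.qq.com/s/sK_oSU1PmbUJ5ZGeMmY27A)

### 4.3 AI Algorithm and Development Job-Hunting Group

**Price: originally 199 yuan, 60 yuan off for a limited time, so only 139 yuan (about 0.4 yuan per day)**

**Duration: one year (counted from the moment you join)**

**How to join: scan the QR code below with WeChat to join the AI algorithm and development job-hunting group (Knowledge Planet, 知识星球)**

> Tip: after joining, we recommend downloading and using the Knowledge Planet app. The mini program and the Knowledge Planet official account also work. You can post, ask questions, discuss, and answer, and you get quick access to the group's resources.

![](imgs/2025年AI求职群优惠券二维码.png)

<a name="Resume"></a>

## 5 Résumé Templates

Three résumé templates are provided. See: [AI algorithm résumé templates](https://github.com/amusi/AI-Job-Resume)

![](imgs/Resume-Demo.png)

<a name="Company"></a>

## 6 List of AI Companies (Mainly CV Roles)

First, AI is broader than CV (AI > CV), so any company that offers CV roles also offers AI roles. Whether these companies also have NLP, machine learning, speech recognition, recommendation, or SLAM roles is something you will need to check on their official sites.

For the list of companies with computer vision (CV) algorithm roles, see: https://github.com/amusi/CV-Jobs

<a name="Salary"></a>

## 7 Salaries of Previous AI Algorithm Cohorts

Here are the salaries for AI algorithm roles for the class of 2024. I only use **master's graduates in first-tier or near-first-tier cities** as the example (Beijing, Shanghai, Guangzhou, Shenzhen, Nanjing, Hangzhou, etc.), because in cities such as Wuhan or Chengdu the salary for the same AI algorithm role still differs somewhat. You clearly cannot look at money alone and ignore the city.

- **Entry-level ("cabbage price") offer: 25w~35w** (w = 10,000 yuan)
- **SP: 35w~45w**
- **SSP: 45w+**

Annual salary alone is a bit vague, so here is more detail that you can get familiar with in advance. A typical compensation package is made up of:

- Total annual package = monthly salary * (12 + X) + housing allowance + stock/options + signing bonus

X is usually 2 to 5 months of salary, and in many cases 3. Note: when negotiating salary with HR, if they ask what salary you expect, be sure to aim high: at least 30% above what you actually want. Trust me on this, otherwise...

![](imgs/salary.png)

<a name="Q&A"></a>

## 8 Q&A

For the 130 questions and answers, see [Q&A](Q&A.md)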
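
As a worked example of the compensation formula in section 7, here is a minimal Python sketch. The function names, parameter names, and the sample figures are hypothetical and chosen only for illustration. Only the formula itself (monthly salary * (12 + X) plus allowance, equity, and signing bonus) and the "ask at least 30% higher" rule of thumb come from the text above.

```python
def total_annual_package(monthly_salary, bonus_months,
                         housing_allowance=0, equity=0, signing_bonus=0):
    """Total annual package = monthly salary * (12 + X) + extras.

    bonus_months is the X in the formula (the text says usually 2 to 5).
    All amounts are assumed to be in the same currency unit.
    """
    return (monthly_salary * (12 + bonus_months)
            + housing_allowance + equity + signing_bonus)


def minimum_ask(desired_package, markup_percent=30):
    """Figure to quote to HR: at least markup_percent above the desired package."""
    return desired_package * (100 + markup_percent) / 100


# Hypothetical offer: 30,000 per month with X = 3 and no extras.
package = total_annual_package(30_000, 3)
print(package)               # 450000
print(minimum_ask(400_000))  # 520000.0
```

With these sample numbers, 30,000 per month at X = 3 falls at the boundary between the SP band (35w~45w) and the SSP band (45w+) quoted above.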

Productivity ML Frameworks

6.1K Github Stars


awesome-object-detection

# object-detection [TOC] This is a list of awesome articles about object detection. If you want to read the paper according to time, you can refer to [Date](Date.md). - R-CNN - Fast R-CNN - Faster R-CNN - Mask R-CNN - Light-Head R-CNN - Cascade R-CNN - SPP-Net - YOLO - YOLOv2 - YOLOv3 - YOLT - SSD - DSSD - FSSD - ESSD - MDSSD - Pelee - Fire SSD - R-FCN - FPN - DSOD - RetinaNet - MegDet - RefineNet - DetNet - SSOD - CornerNet - M2Det - 3D Object Detection - ZSD（Zero-Shot Object Detection） - OSD（One-Shot object Detection） - Weakly Supervised Object Detection - Softer-NMS - 2018 - 2019 - Other Based on handong1587's github: https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html # Survey **Imbalance Problems in Object Detection: A Review** - intro: under review at TPAMI - arXiv: <https://arxiv.org/abs/1909.00169> **Recent Advances in Deep Learning for Object Detection** - intro: From 2013 (OverFeat) to 2019 (DetNAS) - arXiv: <https://arxiv.org/abs/1908.03673> **A Survey of Deep Learning-based Object Detection** - intro：From Fast R-CNN to NAS-FPN - arXiv：<https://arxiv.org/abs/1907.09408> **Object Detection in 20 Years: A Survey** - intro：This work has been submitted to the IEEE TPAMI for possible publication - arXiv：<https://arxiv.org/abs/1905.05055> **《Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks》** - intro: awesome - arXiv: https://arxiv.org/abs/1809.03193 **《Deep Learning for Generic Object Detection: A Survey》** - intro: Submitted to IJCV 2018 - arXiv: https://arxiv.org/abs/1809.02165 # Papers&Codes ## R-CNN **Rich feature hierarchies for accurate object detection and semantic segmentation** - intro: R-CNN - arxiv: <http://arxiv.org/abs/1311.2524> - supp: <http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf> - slides: <http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf> - slides: <http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf> - github: 
<https://github.com/rbgirshick/rcnn> - notes: <http://zhangliliang.com/2014/07/23/paper-note-rcnn/> - caffe-pr("Make R-CNN the Caffe detection example"): <https://github.com/BVLC/caffe/pull/482> ## Fast R-CNN **Fast R-CNN** - arxiv: <http://arxiv.org/abs/1504.08083> - slides: <http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf> - github: <https://github.com/rbgirshick/fast-rcnn> - github(COCO-branch): <https://github.com/rbgirshick/fast-rcnn/tree/coco> - webcam demo: <https://github.com/rbgirshick/fast-rcnn/pull/29> - notes: <http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/> - notes: <http://blog.csdn.net/linj_m/article/details/48930179> - github("Fast R-CNN in MXNet"): <https://github.com/precedenceguo/mx-rcnn> - github: <https://github.com/mahyarnajibi/fast-rcnn-torch> - github: <https://github.com/apple2373/chainer-simple-fast-rnn> - github: <https://github.com/zplizzi/tensorflow-fast-rcnn> **A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection** - intro: CVPR 2017 - arxiv: <https://arxiv.org/abs/1704.03414> - paper: <http://abhinavsh.info/papers/pdfs/adversarial_object_detection.pdf> - github(Caffe): <https://github.com/xiaolonw/adversarial-frcnn> ## Faster R-CNN **Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks** - intro: NIPS 2015 - arxiv: <http://arxiv.org/abs/1506.01497> - gitxiv: <http://www.gitxiv.com/posts/8pfpcvefDYn2gSgXk/faster-r-cnn-towards-real-time-object-detection-with-region> - slides: <http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722/slides/w05-FasterR-CNN.pdf> - github(official, Matlab): <https://github.com/ShaoqingRen/faster_rcnn> - github(Caffe): <https://github.com/rbgirshick/py-faster-rcnn> - github(MXNet): <https://github.com/msracver/Deformable-ConvNets/tree/master/faster_rcnn> - github(PyTorch--recommend): <https://github.com//jwyang/faster-rcnn.pytorch> - github: <https://github.com/mitmul/chainer-faster-rcnn> - github(Torch):: 
<https://github.com/andreaskoepf/faster-rcnn.torch> - github(Torch):: <https://github.com/ruotianluo/Faster-RCNN-Densecap-torch> - github(TensorFlow): <https://github.com/smallcorgi/Faster-RCNN_TF> - github(TensorFlow): <https://github.com/CharlesShang/TFFRCNN> - github(C++ demo): <https://github.com/YihangLou/FasterRCNN-Encapsulation-Cplusplus> - github(Keras): <https://github.com/yhenon/keras-frcnn> - github: <https://github.com/Eniac-Xie/faster-rcnn-resnet> - github(C++): <https://github.com/D-X-Y/caffe-faster-rcnn/tree/dev> **R-CNN minus R** - intro: BMVC 2015 - arxiv: <http://arxiv.org/abs/1506.06981> **Faster R-CNN in MXNet with distributed implementation and data parallelization** - github: <https://github.com/dmlc/mxnet/tree/master/example/rcnn> **Contextual Priming and Feedback for Faster R-CNN** - intro: ECCV 2016. Carnegie Mellon University - paper: <http://abhinavsh.info/context_priming_feedback.pdf> - poster: <http://www.eccv2016.org/files/posters/P-1A-20.pdf> **An Implementation of Faster RCNN with Study for Region Sampling** - intro: Technical Report, 3 pages. CMU - arxiv: <https://arxiv.org/abs/1702.02138> - github: <https://github.com/endernewton/tf-faster-rcnn> - github: https://github.com/ruotianluo/pytorch-faster-rcnn **Interpretable R-CNN** - intro: North Carolina State University & Alibaba - keywords: AND-OR Graph (AOG) - arxiv: <https://arxiv.org/abs/1711.05226> **Domain Adaptive Faster R-CNN for Object Detection in the Wild** - intro: CVPR 2018. 
ETH Zurich & ESAT/PSI - arxiv: <https://arxiv.org/abs/1803.03243> ## Mask R-CNN - arxiv: <http://arxiv.org/abs/1703.06870> - github(Keras): https://github.com/matterport/Mask_RCNN - github(Caffe2): https://github.com/facebookresearch/Detectron - github(Pytorch): <https://github.com/wannabeOG/Mask-RCNN> - github(MXNet): https://github.com/TuSimple/mx-maskrcnn - github(Chainer): https://github.com/DeNA/Chainer_Mask_R-CNN ## Light-Head R-CNN **Light-Head R-CNN: In Defense of Two-Stage Object Detector** - intro: Tsinghua University & Megvii Inc - arxiv: <https://arxiv.org/abs/1711.07264> - github(offical): https://github.com/zengarden/light_head_rcnn - github: <https://github.com/terrychenism/Deformable-ConvNets/blob/master/rfcn/symbols/resnet_v1_101_rfcn_light.py#L784> ## Cascade R-CNN **Cascade R-CNN: Delving into High Quality Object Detection** - arxiv: <https://arxiv.org/abs/1712.00726> - github: <https://github.com/zhaoweicai/cascade-rcnn> ## SPP-Net **Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition** - intro: ECCV 2014 / TPAMI 2015 - arxiv: <http://arxiv.org/abs/1406.4729> - github: <https://github.com/ShaoqingRen/SPP_net> - notes: <http://zhangliliang.com/2014/09/13/paper-note-sppnet/> **DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection** - intro: PAMI 2016 - intro: an extension of R-CNN. 
box pre-training, cascade on region proposals, deformation layers and context representations
- project page: <http://www.ee.cuhk.edu.hk/%CB%9Cwlouyang/projects/imagenetDeepId/index.html>
- arxiv: <http://arxiv.org/abs/1412.5661>

**Object Detectors Emerge in Deep Scene CNNs**

- intro: ICLR 2015
- arxiv: <http://arxiv.org/abs/1412.6856>
- paper: <https://www.robots.ox.ac.uk/~vgg/rg/papers/zhou_iclr15.pdf>
- paper: <https://people.csail.mit.edu/khosla/papers/iclr2015_zhou.pdf>
- slides: <http://places.csail.mit.edu/slide_iclr2015.pdf>

**segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection**

- intro: CVPR 2015
- project(code+data): <https://www.cs.toronto.edu/~yukun/segdeepm.html>
- arxiv: <https://arxiv.org/abs/1502.04275>
- github: <https://github.com/YknZhu/segDeepM>

**Object Detection Networks on Convolutional Feature Maps**

- intro: TPAMI 2015
- keywords: NoC
- arxiv: <http://arxiv.org/abs/1504.06066>

**Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction**

- arxiv: <http://arxiv.org/abs/1504.03293>
- slides: <http://www.ytzhang.net/files/publications/2015-cvpr-det-slides.pdf>
- github: <https://github.com/YutingZhang/fgs-obj>

**DeepBox: Learning Objectness with Convolutional Networks**

- keywords: DeepBox
- arxiv: <http://arxiv.org/abs/1505.02146>
- github: <https://github.com/weichengkuo/DeepBox>

## YOLO

**You Only Look Once: Unified, Real-Time Object Detection**

[![img](https://camo.githubusercontent.com/e69d4118b20a42de4e23b9549f9a6ec6dbbb0814/687474703a2f2f706a7265646469652e636f6d2f6d656469612f66696c65732f6461726b6e65742d626c61636b2d736d616c6c2e706e67)](https://camo.githubusercontent.com/e69d4118b20a42de4e23b9549f9a6ec6dbbb0814/687474703a2f2f706a7265646469652e636f6d2f6d656469612f66696c65732f6461726b6e65742d626c61636b2d736d616c6c2e706e67)

- arxiv: <http://arxiv.org/abs/1506.02640>
- code: <https://pjreddie.com/darknet/yolov1/>
- github:
<https://github.com/pjreddie/darknet>
- blog: <https://pjreddie.com/darknet/yolov1/>
- slides: <https://docs.google.com/presentation/d/1aeRvtKG21KHdD5lg6Hgyhx5rPq_ZOsGjG5rJ1HP7BbA/pub?start=false&loop=false&delayms=3000&slide=id.p>
- reddit: <https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/>
- github: <https://github.com/gliese581gg/YOLO_tensorflow>
- github: <https://github.com/xingwangsfu/caffe-yolo>
- github: <https://github.com/frankzhangrui/Darknet-Yolo>
- github: <https://github.com/BriSkyHekun/py-darknet-yolo>
- github: <https://github.com/tommy-qichang/yolo.torch>
- github: <https://github.com/frischzenger/yolo-windows>
- github: <https://github.com/AlexeyAB/yolo-windows>
- github: <https://github.com/nilboy/tensorflow-yolo>

**darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++**

- blog: <https://thtrieu.github.io/notes/yolo-tensorflow-graph-buffer-cpp>
- github: <https://github.com/thtrieu/darkflow>

**Start Training YOLO with Our Own Data**

[![img](https://camo.githubusercontent.com/2f99b692dd7ce47d7832385f3e8a6654e680d92a/687474703a2f2f6775616e6768616e2e696e666f2f626c6f672f656e2f77702d636f6e74656e742f75706c6f6164732f323031352f31322f696d616765732d34302e6a7067)](https://camo.githubusercontent.com/2f99b692dd7ce47d7832385f3e8a6654e680d92a/687474703a2f2f6775616e6768616e2e696e666f2f626c6f672f656e2f77702d636f6e74656e742f75706c6f6164732f323031352f31322f696d616765732d34302e6a7067)

- intro: train with customized data and class numbers/labels. Linux / Windows version for darknet.
- blog: <http://guanghan.info/blog/en/my-works/train-yolo/>
- github: <https://github.com/Guanghan/darknet>

**YOLO: Core ML versus MPSNNGraph**

- intro: Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
- blog: <http://machinethink.net/blog/yolo-coreml-versus-mps-graph/>
- github: <https://github.com/hollance/YOLO-CoreML-MPSNNGraph>

**TensorFlow YOLO object detection on Android**

- intro: Real-time object detection on Android using the YOLO network with TensorFlow
- github: <https://github.com/natanielruiz/android-yolo>

**Computer Vision in iOS – Object Detection**

- blog: <https://sriraghu.com/2017/07/12/computer-vision-in-ios-object-detection/>
- github: <https://github.com/r4ghu/iOS-CoreML-Yolo>

## YOLOv2

**YOLO9000: Better, Faster, Stronger**

- arxiv: <https://arxiv.org/abs/1612.08242>
- code: <http://pjreddie.com/yolo9000/> https://pjreddie.com/darknet/yolov2/
- github(Chainer): <https://github.com/leetenki/YOLOv2>
- github(Keras): <https://github.com/allanzelener/YAD2K>
- github(PyTorch): <https://github.com/longcw/yolo2-pytorch>
- github(Tensorflow): <https://github.com/hizhangp/yolo_tensorflow>
- github(Windows): <https://github.com/AlexeyAB/darknet>
- github: <https://github.com/choasUp/caffe-yolo9000>
- github: <https://github.com/philipperemy/yolo-9000>
- github(TensorFlow): <https://github.com/KOD-Chen/YOLOv2-Tensorflow>
- github(Keras): <https://github.com/yhcc/yolo2>
- github(Keras): <https://github.com/experiencor/keras-yolo2>
- github(TensorFlow): <https://github.com/WojciechMormul/yolo2>

**darknet_scripts**

- intro: Auxiliary scripts to work with the (YOLO) darknet deep learning framework. AKA -> How to generate YOLO anchors?
- github: <https://github.com/Jumabek/darknet_scripts>

**Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2**

- github: <https://github.com/AlexeyAB/Yolo_mark>

**LightNet: Bringing pjreddie's DarkNet out of the shadows**

<https://github.com/explosion/lightnet>

**YOLO v2 Bounding Box Tool**

- intro: Bounding box labeler tool to generate the training data in the format YOLO v2 requires.
- github: <https://github.com/Cartucho/yolo-boundingbox-labeler-GUI>

**Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors**

- intro: **LRM** is the first hard example mining strategy which could fit YOLOv2 perfectly and make it better applied in a series of real scenarios where both real-time rates and accurate detection are strongly demanded.
- arxiv: https://arxiv.org/abs/1804.04606

**Object detection at 200 Frames Per Second**

- intro: faster than Tiny-Yolo-v2
- arxiv: https://arxiv.org/abs/1805.06361

**Event-based Convolutional Networks for Object Detection in Neuromorphic Cameras**

- intro: YOLE--Object Detection in Neuromorphic Cameras
- arxiv: https://arxiv.org/abs/1805.07931

**OmniDetector: With Neural Networks to Bounding Boxes**

- intro: a person detector on n fish-eye images of indoor scenes (NIPS 2018)
- arxiv: https://arxiv.org/abs/1805.08503
- datasets: https://gitlab.com/omnidetector/omnidetector

## YOLOv3

**YOLOv3: An Incremental Improvement**

- arxiv: https://arxiv.org/abs/1804.02767
- paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf
- code: <https://pjreddie.com/darknet/yolo/>
- github(Official): https://github.com/pjreddie/darknet
- github: https://github.com/mystic123/tensorflow-yolo-v3
- github: https://github.com/experiencor/keras-yolo3
- github: https://github.com/qqwweee/keras-yolo3
- github: https://github.com/marvis/pytorch-yolo3
- github: https://github.com/ayooshkathuria/pytorch-yolo-v3
- github: https://github.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch
- github: https://github.com/eriklindernoren/PyTorch-YOLOv3
- github: https://github.com/ultralytics/yolov3
- github: https://github.com/BobLiu20/YOLOv3_PyTorch
- github: https://github.com/andy-yun/pytorch-0.4-yolov3
- github: https://github.com/DeNA/PyTorch_YOLOv3

## YOLT

**You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery**

- intro: Small Object Detection
- arxiv: https://arxiv.org/abs/1805.09512
- 
github: https://github.com/avanetten/yolt

## SSD

**SSD: Single Shot MultiBox Detector**

[![img](https://camo.githubusercontent.com/ad9b147ed3a5f48ffb7c3540711c15aa04ce49c6/687474703a2f2f7777772e63732e756e632e6564752f7e776c69752f7061706572732f7373642e706e67)](https://camo.githubusercontent.com/ad9b147ed3a5f48ffb7c3540711c15aa04ce49c6/687474703a2f2f7777772e63732e756e632e6564752f7e776c69752f7061706572732f7373642e706e67)

- intro: ECCV 2016 Oral
- arxiv: <http://arxiv.org/abs/1512.02325>
- paper: <http://www.cs.unc.edu/~wliu/papers/ssd.pdf>
- slides: [http://www.cs.unc.edu/%7Ewliu/papers/ssd_eccv2016_slide.pdf](http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf)
- github(Official): <https://github.com/weiliu89/caffe/tree/ssd>
- video: <http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973>
- github: <https://github.com/zhreshold/mxnet-ssd>
- github: <https://github.com/zhreshold/mxnet-ssd.cpp>
- github: <https://github.com/rykov8/ssd_keras>
- github: <https://github.com/balancap/SSD-Tensorflow>
- github: <https://github.com/amdegroot/ssd.pytorch>
- github(Caffe): <https://github.com/chuanqi305/MobileNet-SSD>

**What's the diffience in performance between this new code you pushed and the previous code?
#327**

<https://github.com/weiliu89/caffe/issues/327>

## DSSD

**DSSD: Deconvolutional Single Shot Detector**

- intro: UNC Chapel Hill & Amazon Inc
- arxiv: <https://arxiv.org/abs/1701.06659>
- github: <https://github.com/chengyangfu/caffe/tree/dssd>
- github: <https://github.com/MTCloudVision/mxnet-dssd>
- demo: <http://120.52.72.53/www.cs.unc.edu/c3pr90ntc0td/~cyfu/dssd_lalaland.mp4>

**Enhancement of SSD by concatenating feature maps for object detection**

- intro: rainbow SSD (R-SSD)
- arxiv: <https://arxiv.org/abs/1705.09587>

**Context-aware Single-Shot Detector**

- keywords: CSSD, DiCSSD, DeCSSD, effective receptive fields (ERFs), theoretical receptive fields (TRFs)
- arxiv: <https://arxiv.org/abs/1707.08682>

**Feature-Fused SSD: Fast Detection for Small Objects**

<https://arxiv.org/abs/1709.05054>

## FSSD

**FSSD: Feature Fusion Single Shot Multibox Detector**

<https://arxiv.org/abs/1712.00960>

**Weaving Multi-scale Context for Single Shot Detector**

- intro: WeaveNet
- keywords: fuse multi-scale information
- arxiv: <https://arxiv.org/abs/1712.03149>

## ESSD

**Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network**

<https://arxiv.org/abs/1801.05918>

**Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection**

<https://arxiv.org/abs/1802.06488>

## MDSSD

**MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects**

- arxiv: https://arxiv.org/abs/1805.07009

## Pelee

**Pelee: A Real-Time Object Detection System on Mobile Devices**

https://github.com/Robert-JunWang/Pelee

- intro: (ICLR 2018 workshop track)
- arxiv: https://arxiv.org/abs/1804.06882
- github: https://github.com/Robert-JunWang/Pelee

## Fire SSD

**Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device**

- intro: low cost, fast speed and high mAP on factor edge computing devices
- arxiv: https://arxiv.org/abs/1806.05363

## R-FCN

**R-FCN: Object Detection via Region-based Fully
Convolutional Networks**

- arxiv: <http://arxiv.org/abs/1605.06409>
- github: <https://github.com/daijifeng001/R-FCN>
- github(MXNet): <https://github.com/msracver/Deformable-ConvNets/tree/master/rfcn>
- github: <https://github.com/Orpine/py-R-FCN>
- github: <https://github.com/PureDiors/pytorch_RFCN>
- github: <https://github.com/bharatsingh430/py-R-FCN-multiGPU>
- github: <https://github.com/xdever/RFCN-tensorflow>

**R-FCN-3000 at 30fps: Decoupling Detection and Classification**

<https://arxiv.org/abs/1712.01802>

**Recycle deep features for better object detection**

- arxiv: <http://arxiv.org/abs/1607.05066>

## FPN

**Feature Pyramid Networks for Object Detection**

- intro: Facebook AI Research
- arxiv: <https://arxiv.org/abs/1612.03144>

**Action-Driven Object Detection with Top-Down Visual Attentions**

- arxiv: <https://arxiv.org/abs/1612.06704>

**Beyond Skip Connections: Top-Down Modulation for Object Detection**

- intro: CMU & UC Berkeley & Google Research
- arxiv: <https://arxiv.org/abs/1612.06851>

**Wide-Residual-Inception Networks for Real-time Object Detection**

- intro: Inha University
- arxiv: <https://arxiv.org/abs/1702.01243>

**Attentional Network for Visual Object Detection**

- intro: University of Maryland & Mitsubishi Electric Research Laboratories
- arxiv: <https://arxiv.org/abs/1702.01478>

**Learning Chained Deep Features and Classifiers for Cascade in Object Detection**

- keywords: CC-Net
- intro: chained cascade network (CC-Net).
81.1% mAP on PASCAL VOC 2007
- arxiv: <https://arxiv.org/abs/1702.07054>

**DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling**

- intro: ICCV 2017 (poster)
- arxiv: <https://arxiv.org/abs/1703.10295>

**Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries**

- intro: CVPR 2017
- arxiv: <https://arxiv.org/abs/1704.03944>

**Spatial Memory for Context Reasoning in Object Detection**

- arxiv: <https://arxiv.org/abs/1704.04224>

**Accurate Single Stage Detector Using Recurrent Rolling Convolution**

- intro: CVPR 2017. SenseTime
- keywords: Recurrent Rolling Convolution (RRC)
- arxiv: <https://arxiv.org/abs/1704.05776>
- github: <https://github.com/xiaohaoChen/rrc_detection>

**Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection**

<https://arxiv.org/abs/1704.05775>

**LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems**

- intro: Embedded Vision Workshop in CVPR. UC San Diego & Qualcomm Inc
- arxiv: <https://arxiv.org/abs/1705.05922>

**Point Linking Network for Object Detection**

- intro: Point Linking Network (PLN)
- arxiv: <https://arxiv.org/abs/1706.03646>

**Perceptual Generative Adversarial Networks for Small Object Detection**

<https://arxiv.org/abs/1706.05274>

**Few-shot Object Detection**

<https://arxiv.org/abs/1706.08249>

**Yes-Net: An effective Detector Based on Global Information**

<https://arxiv.org/abs/1706.09180>

**SMC Faster R-CNN: Toward a scene-specialized multi-object detector**

<https://arxiv.org/abs/1706.10217>

**Towards lightweight convolutional neural networks for object detection**

<https://arxiv.org/abs/1707.01395>

**RON: Reverse Connection with Objectness Prior Networks for Object Detection**

- intro: CVPR 2017
- arxiv: <https://arxiv.org/abs/1707.01691>
- github: <https://github.com/taokong/RON>

**Mimicking Very Efficient Network for Object Detection**

- intro: CVPR 2017.
SenseTime & Beihang University
- paper: <http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf>

**Residual Features and Unified Prediction Network for Single Stage Detection**

<https://arxiv.org/abs/1707.05031>

**Deformable Part-based Fully Convolutional Network for Object Detection**

- intro: BMVC 2017 (oral). Sorbonne Universités & CEDRIC
- arxiv: <https://arxiv.org/abs/1707.06175>

**Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors**

- intro: ICCV 2017
- arxiv: <https://arxiv.org/abs/1707.06399>

**Recurrent Scale Approximation for Object Detection in CNN**

- intro: ICCV 2017
- keywords: Recurrent Scale Approximation (RSA)
- arxiv: <https://arxiv.org/abs/1707.09531>
- github: <https://github.com/sciencefans/RSA-for-object-detection>

## DSOD

**DSOD: Learning Deeply Supervised Object Detectors from Scratch**

![img](https://user-images.githubusercontent.com/3794909/28934967-718c9302-78b5-11e7-89ee-8b514e53e23c.png)

- intro: ICCV 2017. Fudan University & Tsinghua University & Intel Labs China
- arxiv: <https://arxiv.org/abs/1708.01241>
- github: <https://github.com/szq0214/DSOD>
- github: https://github.com/Windaway/DSOD-Tensorflow
- github: https://github.com/chenyuntc/dsod.pytorch

**Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids**

- arxiv: https://arxiv.org/abs/1712.00886
- github: https://github.com/szq0214/GRP-DSOD

**Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages**

- intro: BMVC 2018
- arXiv: https://arxiv.org/abs/1807.11013

**Object Detection from Scratch with Deep Supervision**

- intro: This is an extended version of DSOD
- arXiv: https://arxiv.org/abs/1809.09294

## RetinaNet

**Focal Loss for Dense Object Detection**

- intro: ICCV 2017 Best student paper award.
Facebook AI Research
- keywords: RetinaNet
- arxiv: <https://arxiv.org/abs/1708.02002>

**CoupleNet: Coupling Global Structure with Local Parts for Object Detection**

- intro: ICCV 2017
- arxiv: <https://arxiv.org/abs/1708.02863>

**Incremental Learning of Object Detectors without Catastrophic Forgetting**

- intro: ICCV 2017. Inria
- arxiv: <https://arxiv.org/abs/1708.06977>

**Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection**

<https://arxiv.org/abs/1709.04347>

**StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection**

<https://arxiv.org/abs/1709.05788>

**Dynamic Zoom-in Network for Fast Object Detection in Large Images**

<https://arxiv.org/abs/1711.05187>

**Zero-Annotation Object Detection with Web Knowledge Transfer**

- intro: NTU, Singapore & Amazon
- keywords: multi-instance multi-label domain adaption learning framework
- arxiv: <https://arxiv.org/abs/1711.05954>

## MegDet

**MegDet: A Large Mini-Batch Object Detector**

- intro: Peking University & Tsinghua University & Megvii Inc
- arxiv: <https://arxiv.org/abs/1711.07240>

**Receptive Field Block Net for Accurate and Fast Object Detection**

- intro: RFBNet
- arxiv: <https://arxiv.org/abs/1711.07767>
- github: <https://github.com/ruinmessi/RFBNet>

**An Analysis of Scale Invariance in Object Detection - SNIP**

- arxiv: <https://arxiv.org/abs/1711.08189>
- github: <https://github.com/bharatsingh430/snip>

**Feature Selective Networks for Object Detection**

<https://arxiv.org/abs/1711.08879>

**Learning a Rotation Invariant Detector with Rotatable Bounding Box**

- arxiv: <https://arxiv.org/abs/1711.09405>
- github: <https://github.com/liulei01/DRBox>

**Scalable Object Detection for Stylized Objects**

- intro: Microsoft AI & Research Munich
- arxiv: <https://arxiv.org/abs/1711.09822>

**Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids**

- arxiv: <https://arxiv.org/abs/1712.00886>
- github: <https://github.com/szq0214/GRP-DSOD>

**Deep Regionlets for Object Detection**

- keywords: region selection network, gating network
- arxiv: <https://arxiv.org/abs/1712.02408>

**Training and Testing Object Detectors with Virtual Images**

- intro: IEEE/CAA Journal of Automatica Sinica
- arxiv: <https://arxiv.org/abs/1712.08470>

**Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video**

- keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation
- arxiv: <https://arxiv.org/abs/1712.08832>

**Spot the Difference by Object Detection**

- intro: Tsinghua University & JD Group
- arxiv: <https://arxiv.org/abs/1801.01051>

**Localization-Aware Active Learning for Object Detection**

- arxiv: <https://arxiv.org/abs/1801.05124>

**Object Detection with Mask-based Feature Encoding**

- arxiv: <https://arxiv.org/abs/1802.03934>

**LSTD: A Low-Shot Transfer Detector for Object Detection**

- intro: AAAI 2018
- arxiv: <https://arxiv.org/abs/1803.01529>

**Pseudo Mask Augmented Object Detection**

<https://arxiv.org/abs/1803.05858>

**Revisiting RCNN: On Awakening the Classification Power of Faster RCNN**

<https://arxiv.org/abs/1803.06799>

**Learning Region Features for Object Detection**

- intro: Peking University & MSRA
- arxiv: <https://arxiv.org/abs/1803.07066>

**Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection**

- intro: Singapore Management University & Zhejiang University
- arxiv: <https://arxiv.org/abs/1803.08208>

**Object Detection for Comics using Manga109 Annotations**

- intro: University of Tokyo & National Institute of Informatics, Japan
- arxiv: <https://arxiv.org/abs/1803.08670>

**Task-Driven Super Resolution: Object Detection in Low-resolution Images**

- arxiv: <https://arxiv.org/abs/1803.11316>

**Transferring Common-Sense Knowledge for Object Detection**

- arxiv: <https://arxiv.org/abs/1804.01077>

**Multi-scale Location-aware Kernel Representation for Object Detection**

- intro: CVPR
2018
- arxiv: <https://arxiv.org/abs/1804.00428>
- github: <https://github.com/Hwang64/MLKP>

**Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors**

- intro: National University of Defense Technology
- arxiv: https://arxiv.org/abs/1804.04606

**Robust Physical Adversarial Attack on Faster R-CNN Object Detector**

- arxiv: https://arxiv.org/abs/1804.05810

## RefineDet

**Single-Shot Refinement Neural Network for Object Detection**

- intro: CVPR 2018
- arxiv: <https://arxiv.org/abs/1711.06897>
- github: <https://github.com/sfzhang15/RefineDet>
- github: https://github.com/lzx1413/PytorchSSD
- github: https://github.com/ddlee96/RefineDet_mxnet
- github: https://github.com/MTCloudVision/RefineDet-Mxnet

## DetNet

**DetNet: A Backbone network for Object Detection**

- intro: Tsinghua University & Face++
- arxiv: https://arxiv.org/abs/1804.06215

## SSOD

**Self-supervisory Signals for Object Discovery and Detection**

- intro: Google Brain
- arxiv: https://arxiv.org/abs/1806.03370

## CornerNet

**CornerNet: Detecting Objects as Paired Keypoints**

- intro: ECCV 2018
- arXiv: https://arxiv.org/abs/1808.01244
- github: <https://github.com/umich-vl/CornerNet>

## M2Det

**M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network**

- intro: AAAI 2019
- arXiv: https://arxiv.org/abs/1811.04533
- github: https://github.com/qijiezhao/M2Det

## 3D Object Detection

**3D Backbone Network for 3D Object Detection**

- arXiv: https://arxiv.org/abs/1901.08373

**LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDARs**

- arxiv: https://arxiv.org/abs/1805.04902
- github: https://github.com/CPFL/Autoware/tree/feature/cnn_lidar_detection

## ZSD (Zero-Shot Object Detection)

**Zero-Shot Detection**

- intro: Australian National University
- keywords: YOLO
- arxiv: <https://arxiv.org/abs/1803.07113>

**Zero-Shot Object Detection**

- arxiv: https://arxiv.org/abs/1804.04340

**Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize
Novel Concepts**

- arxiv: https://arxiv.org/abs/1803.06049

**Zero-Shot Object Detection by Hybrid Region Embedding**

- arxiv: https://arxiv.org/abs/1805.06157

## OSD (One-Shot Object Detection)

**Comparison Network for One-Shot Conditional Object Detection**

- arXiv: https://arxiv.org/abs/1904.02317

**One-Shot Object Detection**

RepMet: Representative-based metric learning for classification and one-shot object detection

- intro: IBM Research AI
- arxiv: https://arxiv.org/abs/1806.04728
- github: TODO

## Weakly Supervised Object Detection

**Weakly Supervised Object Detection in Artworks**

- intro: ECCV 2018 Workshop Computer Vision for Art Analysis
- arXiv: https://arxiv.org/abs/1810.02569
- Datasets: https://wsoda.telecom-paristech.fr/downloads/dataset/IconArt_v1.zip

**Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation**

- intro: CVPR 2018
- arXiv: https://arxiv.org/abs/1803.11365
- homepage: https://naoto0804.github.io/cross_domain_detection/
- paper: http://openaccess.thecvf.com/content_cvpr_2018/html/Inoue_Cross-Domain_Weakly-Supervised_Object_CVPR_2018_paper.html
- github: https://github.com/naoto0804/cross-domain-detection

## Softer-NMS

**Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection**

- intro: CMU & Face++
- arXiv: https://arxiv.org/abs/1809.08545
- github: https://github.com/yihui-he/softer-NMS

## 2019

**Feature Selective Anchor-Free Module for Single-Shot Object Detection**

- intro: CVPR 2019
- arXiv: https://arxiv.org/abs/1903.00621

**Object Detection based on Region Decomposition and Assembly**

- intro: AAAI 2019
- arXiv: https://arxiv.org/abs/1901.08225

**Bottom-up Object Detection by Grouping Extreme and Center Points**

- intro: one stage 43.2% on COCO test-dev
- arXiv: https://arxiv.org/abs/1901.08043
- github: https://github.com/xingyizhou/ExtremeNet

**ORSIm Detector: A Novel Object Detection Framework in Optical Remote Sensing Imagery Using Spatial-Frequency Channel Features**

- 
intro: IEEE Transactions on Geoscience and Remote Sensing
- arXiv: https://arxiv.org/abs/1901.07925

**Consistent Optimization for Single-Shot Object Detection**

- intro: improves RetinaNet from 39.1 AP to 40.1 AP on the COCO dataset
- arXiv: https://arxiv.org/abs/1901.06563

**Learning Pairwise Relationship for Multi-object Detection in Crowded Scenes**

- arXiv: https://arxiv.org/abs/1901.03796

**RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free**

- arXiv: https://arxiv.org/abs/1901.03353
- github: https://github.com/chengyangfu/retinamask

**Region Proposal by Guided Anchoring**

- intro: CUHK - SenseTime Joint Lab
- arXiv: https://arxiv.org/abs/1901.03278

**Scale-Aware Trident Networks for Object Detection**

- intro: mAP of **48.4** on the COCO dataset
- arXiv: https://arxiv.org/abs/1901.01892

## 2018

**Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions**

- arXiv: https://arxiv.org/abs/1812.11901

**Strong-Weak Distribution Alignment for Adaptive Object Detection**

- arXiv: https://arxiv.org/abs/1812.04798

**AutoFocus: Efficient Multi-Scale Inference**

- intro: AutoFocus obtains an **mAP of 47.9%** (68.3% at 50% overlap) on the **COCO test-dev** set while processing **6.4 images per second on a Titan X (Pascal) GPU**
- arXiv: https://arxiv.org/abs/1812.01600

**NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection**

- intro: Google Cloud
- arXiv: https://arxiv.org/abs/1812.00124

**SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection**

- intro: UC Berkeley
- arXiv: https://arxiv.org/abs/1812.00929

**Grid R-CNN**

- intro: SenseTime
- arXiv: https://arxiv.org/abs/1811.12030

**Deformable ConvNets v2: More Deformable, Better Results**

- intro: Microsoft Research Asia
- arXiv: https://arxiv.org/abs/1811.11168

**Anchor Box Optimization for Object Detection**

- intro: Microsoft Research
- arXiv: https://arxiv.org/abs/1812.00469

**Efficient Coarse-to-Fine
Non-Local Module for the Detection of Small Objects**

- arXiv: https://arxiv.org/abs/1811.12152

**Learning RoI Transformer for Detecting Oriented Objects in Aerial Images**

- arXiv: https://arxiv.org/abs/1812.00155

**Integrated Object Detection and Tracking with Tracklet-Conditioned Detection**

- intro: Microsoft Research Asia
- arXiv: https://arxiv.org/abs/1811.11167

**Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection**

- arXiv: https://arxiv.org/abs/1811.11318

**Gradient Harmonized Single-stage Detector**

- intro: AAAI 2019
- arXiv: https://arxiv.org/abs/1811.05181

**CFENet: Object Detection with Comprehensive Feature Enhancement Module**

- intro: ACCV 2018
- github: https://github.com/qijiezhao/CFENet

**DeRPN: Taking a further step toward more general object detection**

- intro: AAAI 2019
- arXiv: https://arxiv.org/abs/1811.06700
- github: https://github.com/HCIILAB/DeRPN

**Hybrid Knowledge Routed Modules for Large-scale Object Detection**

- intro: Sun Yat-Sen University & Huawei Noah’s Ark Lab
- arXiv: https://arxiv.org/abs/1810.12681
- github: https://github.com/chanyn/HKRM

**Receptive Field Block Net for Accurate and Fast Object Detection**

- intro: ECCV 2018
- arXiv: [https://arxiv.org/abs/1711.07767](https://arxiv.org/abs/1711.07767)
- github: [https://github.com/ruinmessi/RFBNet](https://github.com/ruinmessi/RFBNet)

**Deep Feature Pyramid Reconfiguration for Object Detection**

- intro: ECCV 2018
- arXiv: https://arxiv.org/abs/1808.07993

**Unsupervised Hard Example Mining from Videos for Improved Object Detection**

- intro: ECCV 2018
- arXiv: https://arxiv.org/abs/1808.04285

**Acquisition of Localization Confidence for Accurate Object Detection**

- intro: ECCV 2018
- arXiv: https://arxiv.org/abs/1807.11590
- github: https://github.com/vacancy/PreciseRoIPooling

**Toward Scale-Invariance and
Position-Sensitive Region Proposal Networks**

- intro: ECCV 2018
- arXiv: https://arxiv.org/abs/1807.09528

**MetaAnchor: Learning to Detect Objects with Customized Anchors**

- arxiv: https://arxiv.org/abs/1807.00980

**Relation Network for Object Detection**

- intro: CVPR 2018
- arxiv: https://arxiv.org/abs/1711.11575
- github: https://github.com/msracver/Relation-Networks-for-Object-Detection

**Quantization Mimic: Towards Very Tiny CNN for Object Detection**

- intro: Tsinghua University & The Chinese University of Hong Kong & SenseTime
- arxiv: https://arxiv.org/abs/1805.02152

**Learning Rich Features for Image Manipulation Detection**

- intro: CVPR 2018 Camera Ready
- arxiv: https://arxiv.org/abs/1805.04953

**SNIPER: Efficient Multi-Scale Training**

- arxiv: https://arxiv.org/abs/1805.09300
- github: https://github.com/mahyarnajibi/SNIPER

**Soft Sampling for Robust Object Detection**

- intro: the robustness of object detection under the presence of missing annotations
- arxiv: https://arxiv.org/abs/1806.06986

**Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria**

- intro: TNNLS 2018
- arxiv: https://arxiv.org/abs/1807.00147
- code: http://kezewang.com/codes/ASM_ver1.zip

## Other

**R3-Net: A Deep Network for Multi-oriented Vehicle Detection in Aerial Images and Videos**

- arxiv: https://arxiv.org/abs/1808.05560
- youtube: https://youtu.be/xCYD-tYudN0

# Detection Toolbox

- [Detectron(FAIR)](https://github.com/facebookresearch/Detectron): Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask R-CNN](https://arxiv.org/abs/1703.06870). It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework.
- [Detectron2](https://github.com/facebookresearch/detectron2): Detectron2 is FAIR's next-generation research platform for object detection and segmentation.
- [maskrcnn-benchmark(FAIR)](https://github.com/facebookresearch/maskrcnn-benchmark): Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
- [mmdetection(SenseTime&CUHK)](https://github.com/open-mmlab/mmdetection): mmdetection is an open source object detection toolbox based on PyTorch. It is a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http://mmlab.ie.cuhk.edu.hk/).


ICCV2025-Papers-with-Code

# ICCV 2025 Papers and Open-Source Projects (Papers with Code)

ICCV 2025 Acceptance Rate of 24% = 2699 / 11239

> Note 1: Everyone is welcome to open an issue to share ICCV 2025 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [CVPR 2025](https://github.com/amusi/CVPR2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)

Scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群) for the latest work from ICCV 2025 and beyond! It is the largest computer vision AI Knowledge Planet (知识星球) community, updated daily, and the first to share the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. Join now and start learning!

![](CVer学术交流群.png)

# ICCV 2025 Papers and Open-Source Code: Table of Contents

- [3DGS(Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [World Model](#WM)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- 
[3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) - [3D配准(3D Registration)](#3D-Registration) - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation) - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation) - [3D Visual Grounding(3D视觉定位)](#3DVG) - [医学图像(Medical Image)](#Medical-Image) - [图像生成(Image Generation)](#Image-Generation) - [视频生成(Video Generation)](#Video-Generation) - [3D生成(3D Generation)](#3D-Generation) - [视频理解(Video Understanding)](#Video-Understanding) - [行为检测(Action Detection)](#Action-Detection) - [具身智能(Embodied AI)](#Embodied) - [文本检测(Text Detection)](#Text-Detection) - [知识蒸馏(Knowledge Distillation)](#KD) - [模型剪枝(Model Pruning)](#Pruning) - [图像压缩(Image Compression)](#IC) - [三维重建(3D Reconstruction)](#3D-Reconstruction) - [深度估计(Depth Estimation)](#Depth-Estimation) - [轨迹预测(Trajectory Prediction)](#TP) - [车道线检测(Lane Detection)](#Lane-Detection) - [图像描述(Image Captioning)](#Image-Captioning) - [视觉问答(Visual Question Answering)](#VQA) - [手语识别(Sign Language Recognition)](#SLR) - [视频预测(Video Prediction)](#Video-Prediction) - [新视点合成(Novel View Synthesis)](#NVS) - [Zero-Shot Learning(零样本学习)](#ZSL) - [立体匹配(Stereo Matching)](#Stereo-Matching) - [特征匹配(Feature Matching)](#Feature-Matching) - [暗光图像增强(Low-light Image Enhancement)](#Low-light) - [场景图生成(Scene Graph Generation)](#SGG) - [风格迁移(Style Transfer)](#ST) - [隐式神经表示(Implicit Neural Representations)](#INR) - [图像质量评价(Image Quality Assessment)](#IQA) - [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment) - [压缩感知(Compressive Sensing)](#CS) - [数据集(Datasets)](#Datasets) - [新任务(New Tasks)](#New-Tasks) - [其他(Others)](#Others) <a name="3DGS"></a> # 3DGS(Gaussian Splatting) <a name="Agent"></a> # Agent <a name="Avatars"></a> # Avatars # Backbone **TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba** - Paper: https://arxiv.org/abs/2411.17473 - Code: https://github.com/xwmaxwma/TinyViM <a name="CLIP"></a> # CLIP <a name="Mamba"></a> # Mamba **TinyViM: Frequency Decoupling for Tiny 
Hybrid Vision Mamba** - Paper: https://arxiv.org/abs/2411.17473 - Code: https://github.com/xwmaxwma/TinyViM **Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers** - Project：https://tiger-ai-lab.github.io/Vamba/ - Paper：https://arxiv.org/abs/2503.11579 - Code：https://github.com/TIGER-AI-Lab/Vamba <a name="Embodied-AI"></a> # Embodied AI <a name="GAN"></a> # GAN <a name="OCR"></a> # OCR <a name="NeRF"></a> # NeRF <a name="DETR"></a> # DETR <a name="Prompt"></a> # Prompt <a name="MLLM"></a> # 多模态大语言模型(MLLM) **FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers** - Paper: https://arxiv.org/abs/2501.16297 - Code: https://github.com/JiuTian-VL/JiuTian-FALCON - Project: https://jiutian-vl.github.io/FALCON.github.io/ <a name="LLM"></a> # 大语言模型(LLM) <a name="WM"></a> # World Model(世界模型) **Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning** - Project: https://yijun-yang.github.io/MeWM/ - Paper: https://arxiv.org/abs/2506.02327 - Code: https://github.com/scott-yjyang/MeWM <a name="ReID"></a> # ReID(重识别) <a name="Diffusion"></a> # 扩散模型(Diffusion Models) **From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers** - Paper: https://arxiv.org/abs/2503.06923 - Code: https://github.com/Shenyi-Z/TaylorSeer <a name="Vision-Transformer"></a> # Vision Transformer <a name="VL"></a> # 视觉和语言(Vision-Language) <a name="Object-Detection"></a> # 目标检测(Object Detection) <a name="Anomaly-Detection"></a> # 异常检测(Anomaly Detection) <a name="VT"></a> # 目标跟踪(Object Tracking) <a name="MI"></a> # 医学图像(Medical Image) **Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning** - Project: https://yijun-yang.github.io/MeWM/ - Paper: https://arxiv.org/abs/2506.02327 - Code: https://github.com/scott-yjyang/MeWM # 医学图像分割(Medical Image Segmentation) <a name="Autonomous-Driving"></a> # 自动驾驶(Autonomous Driving) **Where, What, Why: Towards 
Explainable Driver Attention Prediction** - Paper: https://arxiv.org/abs/2506.23088 - Code: https://github.com/yuchen2199/Explainable-Driver-Attention-Prediction - Project: https://github.com/yuchen2199/Explainable-Driver-Attention-Prediction **ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones** - Paper: https://arxiv.org/abs/2406.07661 - Code: https://github.com/anuragxel/roadwork-dataset - Project: https://www.cs.cmu.edu/~ILIM/roadwork_dataset/ **DriveMM: All-in-One Large Multimodal Model for Autonomous Driving** - Project: https://zhijian11.github.io/DriveMM/ - Paper: https://arxiv.org/abs/2412.07689 - Code: https://github.com/zhijian11/DriveMM # 3D点云(3D-Point-Cloud) <a name="3DOD"></a> # 3D目标检测(3D Object Detection) <a name="3DOD"></a> # 3D语义分割(3D Semantic Segmentation) <a name="LLV"></a> # Low-level Vision **EAMamba: Efficient All-Around Vision State Space Model for Image Restoration** - Paper: https://arxiv.org/abs/2506.22246 - Code: https://github.com/daidaijr/EAMamba <a name="SR"></a> # 超分辨率(Super-Resolution) <a name="Denoising"></a> # 去噪(Denoising) ## 图像去噪(Image Denoising) <a name="3D-Human-Pose-Estimation"></a> # 3D人体姿态估计(3D Human Pose Estimation) <a name="3DVG"></a> #3D Visual Grounding(3D视觉定位) <a name="Image-Generation"></a> # 图像生成(Image Generation) **DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models** - Paper: https://github.com/limuloo/DreamRenderer - Code: https://arxiv.org/abs/2503.12885 <a name="Video-Generation"></a> # 视频生成(Video Generation) <a name="Image-Editing"></a> # 图像编辑(Image Editing) **Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing** - Project: https://eff-edit.github.io - Paper: https://arxiv.org/abs/2503.10270 - Code: https://github.com/yuriYanZeXuan/EEdit <a name="Video-Editing"></a> # 视频编辑(Video Editing) <a name="3D-Generation"></a> # 3D生成(3D Generation) <a name="3D-Reconstruction"></a> # 3D重建(3D Reconstruction) <a name="HMG"></a> # 
人体运动生成(Human Motion Generation) <a name="Video-Understanding"></a> # 视频理解(Video Understanding) **Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers** - Project：https://tiger-ai-lab.github.io/Vamba/ - Paper：https://arxiv.org/abs/2503.11579 - Code：https://github.com/TIGER-AI-Lab/Vamba <a name="Embodied"></a> # 具身智能(Embodied AI) <a name="KD"></a> # 知识蒸馏(Knowledge Distillation) <a name="Depth-Estimation"></a> # 深度估计(Depth Estimation) <a name="Stereo-Matching"></a> # 立体匹配(Stereo Matching) <a name="Low-light"></a> # 暗光图像增强(Low-light Image Enhancement) <a name="IC"></a> # 图像压缩(Image Compression)](#IC) <a name="SGG"></a> # 场景图生成(Scene Graph Generation) <a name="ST"></a> # 风格迁移(Style Transfer) <a name="IQA"></a> # 图像质量评价(Image Quality Assessment) <a name="Video-Quality-Assessment"></a> # 视频质量评价(Video Quality Assessment) <a name="CS"></a> # 压缩感知(Compressive Sensing) <a name="Datasets"></a> # 数据集(Datasets) **ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones** - Paper: https://arxiv.org/abs/2406.07661 - Code: https://github.com/anuragxel/roadwork-dataset - Project: https://www.cs.cmu.edu/~ILIM/roadwork_dataset/ <a name="Others"></a> # 其他(Others) **Music Grounding by Short Video** - Project: https://rucmm.github.io/VMMR/ - Paper: https://arxiv.org/abs/2408.16990 - Code link: https://github.com/xxayt/MGSV

Education & Learning

2.9K Github Stars

Open Source

ECCV2024-Papers-with-Code

# ECCV 2024 Papers and Open-Source Projects (Papers with Code)

ECCV 2024 decisions are now available!

> Note 1: Issues sharing ECCV 2024 papers and open-source projects are welcome!
>
> Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [CVPR 2024](https://github.com/amusi/CVPR2024-Papers-with-Code)
> - [ECCV 2022](ECCV2022-Papers-with-Code.md)
> - [ECCV 2020](ECCV2020-Papers-with-Code.md)

To follow ECCV 2024 and the latest work from top conferences, scan the QR code to join the CVer academic exchange group (CVer学术交流群). It bills itself as the largest computer vision AI community on Knowledge Planet (知识星球). It is updated daily and shares the newest learning materials on computer vision, deep learning, autonomous driving, medical imaging, AIGC, and related topics.

![](CVer学术交流群.png)

# ECCV 2024 Papers with Open-Source Code: Table of Contents

- [3DGS (Gaussian Splatting)](#3DGS)
- [Mamba / SSM](#Mamba)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Recognition](#Action-Recognition)
- [Action Detection](#Action-Detection)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Scene Graph Generation](#SGG)
- [Counting](#Counting)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

<a name="3DGS"></a>

# 3DGS (Gaussian Splatting)

**MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images**

- Project: https://donydchen.github.io/mvsplat
- Paper: https://arxiv.org/abs/2403.14627
- Code: https://github.com/donydchen/mvsplat

**CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians**

- Paper: https://arxiv.org/abs/2404.01133
- Code: https://github.com/DekuLiuTesla/CityGaussian

**FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting**

- Project: https://zehaozhu.github.io/FSGS/
- Paper: https://arxiv.org/abs/2312.00451
- Code: https://github.com/VITA-Group/FSGS

<a name="Mamba"></a>

# Mamba / SSM

**VideoMamba: State Space Model for Efficient Video Understanding**

- Paper: https://arxiv.org/abs/2403.06977
- Code: https://github.com/OpenGVLab/VideoMamba

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/

<a name="Avatars"></a>

# Avatars

<a name="Backbone"></a>

# Backbone

<a name="CLIP"></a>

# CLIP

<a name="MAE"></a>

# MAE

<a name="Embodied-AI"></a>

# Embodied AI

<a name="GAN"></a>

# GAN

<a name="OCR"></a>

# OCR

**Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors**

- Paper: https://arxiv.org/pdf/2312.05286
- Code: https://github.com/SJTU-DeepVisionLab/FreeReal

**PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer**

- Paper: https://arxiv.org/abs/2407.07764
- Code: https://github.com/SJTU-DeepVisionLab/PosFormer

<a name="Occupancy"></a>

# Occupancy

**Fully Sparse 3D Occupancy Prediction**

- Paper: https://arxiv.org/abs/2312.17118
- Code: https://github.com/MCG-NJU/SparseOcc

<a name="NeRF"></a>

# NeRF

**NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields**

- Project: https://nerf-mae.github.io/
- Paper: https://arxiv.org/pdf/2404.01300
- Code: https://github.com/zubair-irshad/NeRF-MAE

<a name="DETR"></a>

# DETR

<a name="Prompt"></a>

# Prompt

<a name="MLLM"></a>

# Multimodal Large Language Models (MLLM)

**SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant**

- Paper: https://arxiv.org/abs/2403.11299
- Code: https://github.com/heliossun/SQ-LLaVA

**ControlCap: Controllable Region-level Captioning**

- Paper: https://arxiv.org/abs/2401.17910
- Code: https://github.com/callsys/ControlCap

<a name="LLM"></a>

# Large Language Models (LLM)

<a name="NAS"></a>

# NAS

<a name="ReID"></a>

# ReID (Re-Identification)

<a name="Diffusion"></a>

# Diffusion Models

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/

**Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2403.16394
- Code: https://github.com/zdxdsw/skewed_relations_T2I

**The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization**

- Project: https://ut-mao.github.io/noise.github.io/
- Paper: https://arxiv.org/abs/2312.08872
- Code: https://github.com/UT-Mao/Initial-Noise-Construction

<a name="Vision-Transformer"></a>

# Vision Transformer

**GiT: Towards Generalist Vision Transformer through Universal Language Interface**

- Paper: https://arxiv.org/abs/2403.09394
- Code: https://github.com/Haiyang-W/GiT

<a name="VL"></a>

# Vision-Language

**GalLoP: Learning Global and Local Prompts for Vision-Language Models**

- Paper: https://arxiv.org/abs/2407.01400

<a name="Object-Detection"></a>

# Object Detection

**Relation DETR: Exploring Explicit Position Relation Prior for Object Detection**

- Paper: https://arxiv.org/abs/2407.11699v1
- Code: https://github.com/xiuqhou/Relation-DETR
- Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k

**Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector**

- Project: http://yuqianfu.com/CDFSOD-benchmark/
- Paper: https://arxiv.org/pdf/2402.03094
- Code: https://github.com/lovelyqian/CDFSOD-benchmark

<a name="Anomaly-Detection"></a>

# Anomaly Detection

<a name="VT"></a>

# Object Tracking

<a name="Semantic-Segmentation"></a>

# Semantic Segmentation

**Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation**

- Paper: https://arxiv.org/abs/2405.06228
- Code: https://github.com/nizhenliang/CGRSeg

<a name="MI"></a>

# Medical Image

**Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging**

- Paper: https://arxiv.org/abs/2311.16914
- Code: https://github.com/peirong26/Brain-ID

**FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification**

- Project: https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k
- Paper: https://arxiv.org/abs/2407.08813
- Dataset: https://drive.google.com/drive/u/1/folders/1huH93JVeXMj9rK6p1OZRub868vv0UK0O
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

<a name="MIS"></a>

# Medical Image Segmentation

**ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image**

- Project: https://scribbleprompt.csail.mit.edu/
- Paper: https://arxiv.org/abs/2312.07381
- Code: https://github.com/halleewong/ScribblePrompt

**AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking**

- Paper: https://arxiv.org/abs/2407.06468
- Code: https://github.com/ricklisz/AnatoMask

**Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures**

- Paper: https://arxiv.org/abs/2407.14754
- Code: https://github.com/cbmi-group/FFM-Multi-Decoder-Network

<a name="VOS"></a>

# Video Object Segmentation

**DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries**

- Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/
- Paper: https://arxiv.org/abs/2404.00086
- Code: https://github.com/zhang-tao-whu/DVIS_Plus

<a name="Autonomous-Driving"></a>

# Autonomous Driving

**Fully Sparse 3D Occupancy Prediction**

- Paper: https://arxiv.org/abs/2312.17118
- Code: https://github.com/MCG-NJU/SparseOcc

**milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing**

- Paper: https://arxiv.org/abs/2306.17010
- Code: https://github.com/Toytiny/milliFlow/

**4D Contrastive Superflows are Dense 3D Representation Learners**

- Paper: https://arxiv.org/abs/2407.06190
- Code: https://github.com/Xiangxu-0103/SuperFlow

<a name="3D-Point-Cloud"></a>

# 3D Point Cloud

<a name="3DOD"></a>

# 3D Object Detection

**3D Small Object Detection with Dynamic Spatial Pruning**

- Project: https://xuxw98.github.io/DSPDet3D/
- Paper: https://arxiv.org/abs/2305.03716
- Code: https://github.com/xuxw98/DSPDet3D

**Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection**

- Paper: https://arxiv.org/abs/2402.03634
- Code: https://github.com/LiewFeng/RayDN

<a name="3DSS"></a>

# 3D Semantic Segmentation

<a name="Image-Editing"></a>

# Image Editing

<a name="Image-Inpainting"></a>

# Image Inpainting

**BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion**

- Project: https://tencentarc.github.io/BrushNet/
- Paper: https://arxiv.org/abs/2403.06976
- Code: https://github.com/TencentARC/BrushNet

<a name="Video-Editing"></a>

# Video Editing

<a name="LLV"></a>

# Low-level Vision

**Restoring Images in Adverse Weather Conditions via Histogram Transformer**

- Paper: https://arxiv.org/abs/2407.10172
- Code: https://github.com/sunshangquan/Histoformer

**OneRestore: A Universal Restoration Framework for Composite Degradation**

- Project: https://gy65896.github.io/projects/ECCV2024_OneRestore
- Paper: https://arxiv.org/abs/2407.04621
- Code: https://github.com/gy65896/OneRestore

# Super-Resolution

<a name="Denoising"></a>

# Denoising

## Image Denoising

<a name="3D-Human-Pose-Estimation"></a>

# 3D Human Pose Estimation

<a name="Image-Generation"></a>

# Image Generation

**Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models**

- Paper: https://arxiv.org/abs/2404.07389
- Code: https://github.com/YasminZhang/EBAMA

**Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization**

- Project: https://kaminyou.com/Dense-Normalization/
- Paper: https://arxiv.org/abs/2407.04245
- Code: https://github.com/Kaminyou/Dense-Normalization

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/

**Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2403.16394
- Code: https://github.com/zdxdsw/skewed_relations_T2I

<a name="Video-Generation"></a>

# Video Generation

**VideoStudio: Generating Consistent-Content and Multi-Scene Videos**

- Project: https://vidstudio.github.io/
- Code: https://github.com/FuchenUSTC/VideoStudio

<a name="3D-Generation"></a>

# 3D Generation

<a name="Video-Understanding"></a>

# Video Understanding

**VideoMamba: State Space Model for Efficient Video Understanding**

- Paper: https://arxiv.org/abs/2403.06977
- Code: https://github.com/OpenGVLab/VideoMamba

**C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition**

- Paper: https://arxiv.org/abs/2407.06113
- Code: https://github.com/RongchangLi/ZSCAR_C2C

<a name="Action-Recognition"></a>

# Action Recognition

**SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders**

- Paper: https://arxiv.org/abs/2407.13460
- Code: https://github.com/pha123661/SA-DVAE

<a name="KD"></a>

# Knowledge Distillation

<a name="IC"></a>

# Image Compression

**Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation**

- Code: https://github.com/qingshi9974/ECCV2024-AdpatICMH
- Paper: http://arxiv.org/abs/2407.09853

<a name="Stereo-Matching"></a>

# Stereo Matching

<a name="SGG"></a>

# Scene Graph Generation

<a name="Counting"></a>

# Counting

**Zero-shot Object Counting with Good Exemplars**

- Paper: https://arxiv.org/abs/2407.04948
- Code: https://github.com/HopooLinZ/VA-Count

<a name="Video-Quality-Assessment"></a>

# Video Quality Assessment

<a name="Datasets"></a>

# Datasets

# Others

**Multi-branch Collaborative Learning Network for 3D Visual Grounding**

- Paper: https://arxiv.org/abs/2407.05363v2
- Code: https://github.com/qzp2018/MCLN

**PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers**

- Code: https://github.com/ananthu-aniraj/pdiscoformer
- Paper: https://arxiv.org/abs/2407.04538

**SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments**

- Project: https://fraunhoferhhi.github.io/spvloc/
- Paper: https://arxiv.org/abs/2404.10527
- Code: https://github.com/fraunhoferhhi/spvloc

**REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices**

- Project: https://xdimlab.github.io/REFRAME/
- Paper: https://arxiv.org/abs/2403.16481
- Code: https://github.com/MARVELOUSJI/REFRAME

AI & Machine Learning Education & Learning

2.3K Github Stars

Open Source

awesome-data-label-tools

# A Comprehensive List of Data Labeling Tools

- [2D Images](#Image)
- [Video](#Video)
- [3D](#3D)

<a name="Image"></a>

## 2D Images

<a name="Video"></a>

## Video

<a name="3D"></a>

## 3D

Data Labeling

52 Github Stars

Software by amusi