
amusi

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
8

Software by amusi

CVPR2026-Papers-with-Code
Open Source


# CVPR 2026 Papers and Open-Source Projects (Papers with Code)

CVPR 2026 decisions are now available on OpenReview! Acceptance rate: 25.42% = 4090 / 16092

> Note 1: Everyone is welcome to open an issue to share CVPR 2026 papers and open-source projects!
>
> Note 2: For papers from earlier top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [ICCV 2025](https://github.com/amusi/ICCV2025-Papers-with-Code)
> - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code)

Scan the QR code to join the CVer academic exchange group and get the latest work from CVPR 2026 and beyond. It is billed as the largest computer vision AI community on Knowledge Planet (知识星球), updated daily with the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more.

![](CVer学术交流群.png)

# CVPR 2026 Open-Source Paper Index

- [3DGS (Gaussian Splatting)](#3DGS)
- [Agent](#Agent)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [Mamba](#Mamba)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [Spatial Intelligence](#SI)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [3D Visual Grounding](#3DVG)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Detection](#Action-Detection)
- [Remote Sensing](#Remote)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [Video Compression](#VC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Low-light Image Enhancement](#Low-light)
- [Scene Graph Generation](#SGG)
- [Image Retrieval](#Image-Retrieval)
- [Style Transfer](#ST)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Compressive Sensing](#CS)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

<a name="3DGS"></a>
# 3DGS (Gaussian Splatting)

**Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting**

- Paper: https://arxiv.org/abs/2602.20933
- Code:
- Project: https://sk-fun.fun/DropAnSH-GS

**Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking**

- Paper: https://arxiv.org/abs/2512.01329
- Project: https://haza628.github.io/tagSplat/

**FastGS: Training 3D Gaussian Splatting in 100 Seconds**

- Paper: https://arxiv.org/pdf/2511.04283
- Code: https://github.com/fastgs/FastGS
- Project: https://fastgs.github.io/

<a name="Agent"></a>
# Agent

<a name="Avatars"></a>
# Avatars

<a name="Backbone"></a>
# Backbone

<a name="CLIP"></a>
# CLIP

<a name="Mamba"></a>
# Mamba

<a name="GAN"></a>
# GAN

<a name="OCR"></a>
# OCR

<a name="NeRF"></a>
# NeRF

<a name="DETR"></a>
# DETR

<a name="Prompt"></a>
# Prompt

<a name="MLLM"></a>
# Multimodal Large Language Models (MLLM)

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

**UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark**

- Paper: https://arxiv.org/abs/2603.05075
- Code:
- Project: https://any2any-mllm.github.io/unim/

<a name="LLM"></a>
# Large Language Models (LLM)

<a name="Embodied-AI"></a>
# Embodied AI

**Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI**

- Paper: https://arxiv.org/abs/2511.20620
- Code: https://github.com/ai4ce/wanderland
- Project: https://ai4ce.github.io/wanderland/

<a name="SI"></a>
# Spatial Intelligence

**Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning**

- Paper: https://arxiv.org/abs/2510.27606
- Code: https://github.com/InternLM/Spatial-SSRL
- Model: https://huggingface.co/internlm/Spatial-SSRL-7B

<a name="NAS"></a>
# NAS

<a name="ReID"></a>
# ReID (Re-Identification)

**MOS: Mitigating Optical-SAR Modality Gap for Cross-Modal Ship Re-Identification**

- Paper: https://arxiv.org/abs/2512.03404
- Code: https://github.com/yjzhao1019/MOS

<a name="Diffusion"></a>
# Diffusion Models

<a name="Vision-Transformer"></a>
# Vision Transformer

<a name="VL"></a>
# Vision-Language

**StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues**

- Paper: https://arxiv.org/abs/2602.20089
- Code: https://github.com/intelligolabs/StructXLIP

**ApET: Approximation-Error Guided Token Compression for Efficient VLMs**

- Paper: https://arxiv.org/abs/2602.19870
- Code: https://github.com/MaQianKun0/ApET

**Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking**

- Paper: https://arxiv.org/abs/2602.20330
- Code: https://github.com/UIUC-MONET/vlm-circuit-tracing

<a name="Object-Detection"></a>
# Object Detection

<a name="Anomaly-Detection"></a>
# Anomaly Detection

<a name="VT"></a>
# Object Tracking

<a name="MI"></a>
# Medical Image

<a name="MIS"></a>
# Medical Image Segmentation

**MedCLIPSeg: Probabilistic Vision–Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2602.20423
- Code: https://github.com/HealthX-Lab/MedCLIPSeg
- Project: https://tahakoleilat.github.io/MedCLIPSeg

<a name="Autonomous-Driving"></a>
# Autonomous Driving

**Open-Vocabulary Domain Generalization in Urban-Scene Segmentation**

- Paper: https://arxiv.org/pdf/2602.18853
- Code: https://github.com/DZhaoXd/s2_corr

**U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences**

- Paper: https://arxiv.org/abs/2512.02982
- Code: https://github.com/worldbench/U4D

<a name="3D-Point-Cloud"></a>
# 3D Point Cloud

**CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation**

- Paper: https://arxiv.org/abs/2602.20409
- Code: https://github.com/SarthakM320/CLIPoint3D

<a name="3DOD"></a>
# 3D Object Detection

<a name="3DSS"></a>
# 3D Semantic Segmentation

<a name="LLV"></a>
# Low-level Vision

<a name="SR"></a>
# Super-Resolution

<a name="Denoising"></a>
# Denoising

## Image Denoising

<a name="3D-Human-Pose-Estimation"></a>
# 3D Human Pose Estimation

<a name="3DVG"></a>
# 3D Visual Grounding

<a name="Image-Generation"></a>
# Image Generation

**ExpPortrait: Expressive Portrait Generation via Personalized Representation**

- Paper: https://arxiv.org/abs/2602.19900
- Code:

<a name="Video-Generation"></a>
# Video Generation

<a name="Image-Editing"></a>
# Image Editing

<a name="Video-Editing"></a>
# Video Editing

<a name="3D-Generation"></a>
# 3D Generation

<a name="3D-Reconstruction"></a>
# 3D Reconstruction

**tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction**

- Project: https://cwchenwang.github.io/tttLRM/
- Paper: https://arxiv.org/abs/2602.20160
- Code: https://github.com/cwchenwang/tttLRM

**Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning**

- Project: https://flow3r-project.github.io/
- Paper: https://arxiv.org/abs/2602.20157
- Code: https://github.com/Kidrauh/flow3r

**RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing**

- Paper: https://arxiv.org/abs/2602.19753
- Code: https://github.com/yyyykf/RAP

<a name="HMG"></a>
# Human Motion Generation

<a name="Video-Understanding"></a>
# Video Understanding

<a name="Remote"></a>
# Remote Sensing

**Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation**

- Paper: https://arxiv.org/abs/2602.19863
- Code: None

<a name="KD"></a>
# Knowledge Distillation

<a name="Depth-Estimation"></a>
# Depth Estimation

<a name="Stereo-Matching"></a>
# Stereo Matching

<a name="Low-light"></a>
# Low-light Image Enhancement

<a name="IC"></a>
# Image Compression

<a name="VC"></a>
# Video Compression

**UniComp: Rethinking Video Compression Through Informational Uniqueness**

- Paper: https://arxiv.org/abs/2512.03575
- Code: https://github.com/TimeMarker-LLM/UniComp

<a name="SGG"></a>
# Scene Graph Generation

<a name="Image-Retrieval"></a>
# Image Retrieval

**PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing**

- Paper: https://arxiv.org/abs/2603.04598
- Code:

<a name="ST"></a>
# Style Transfer

<a name="IQA"></a>
# Image Quality Assessment

<a name="Video-Quality-Assessment"></a>
# Video Quality Assessment

<a name="CS"></a>
# Compressive Sensing

<a name="Datasets"></a>
# Datasets

<a name="Others"></a>
# Others

**Decoupling Defense Strategies for Robust Image Watermarking**

- Paper: https://arxiv.org/abs/2602.20053
- Code: None

**Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery**

- Paper: https://arxiv.org/abs/2602.19910
- Code:

**The Invisible Gorilla Effect in Out-of-distribution Detection**

- Paper: https://arxiv.org/abs/2602.20068
- Code: https://github.com/HarryAnthony/Invisible_Gorilla_Effect

**SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images**

- Paper: https://arxiv.org/abs/2602.20412
- Code:

**RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces**

- Paper: https://arxiv.org/abs/2602.20618
- Code:

**Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models**

- Paper:
- Code:

**GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement**

- Paper: https://arxiv.org/abs/2603.05095
- Code:

**FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation**

- Paper: https://arxiv.org/abs/2603.04733
- Code: https://github.com/eVI-group-SCU/FOZO

**Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning**

- Paper: https://arxiv.org/abs/2603.04825
- Code: https://github.com/RyanZhaoIc/CAD
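
The figure quoted at the top of the CVPR 2026 list (25.42% = 4090 / 16092) can be reproduced with a short Python sketch. Reading 4090 as accepted papers and 16092 as submissions is an assumption based on how such figures are usually reported, and the function name `acceptance_rate` is illustrative only.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the acceptance rate as a percentage."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive number")
    return 100.0 * accepted / submitted


# Numbers taken from the list header; their interpretation is an assumption.
rate = acceptance_rate(4090, 16092)
print(f"{rate:.2f}%")  # prints "25.42%"
```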

Education & Learning ML Frameworks
22.7K Github Stars
Deep-Learning-Interview-Book
Open Source


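# Deep Learning Interview Book

**Deep Learning Interview Book**

- :star: [Job-hunting guide](https://github.com/amusi/AI-Job-Notes)
- :smiley: [Self-introduction](docs/自我介绍.md)
- :1234: [Mathematics](docs/数学.md)
- :mortar_board: [Machine learning](docs/机器学习.md)
- :closed_book: [Deep learning](docs/深度学习.md)
- :green_book: [Reinforcement learning](docs/强化学习.md)
- :eyes: [Computer vision](docs/计算机视觉.md)
- :camera: [Traditional image processing](docs/传统图像处理.md)
- :mahjong: [Natural language processing](docs/自然语言处理.md)
- :surfer: [SLAM](docs/SLAM.md)
- :busts_in_silhouette: [Recommendation algorithms](docs/推荐算法.md)
- :bar_chart: [Data structures and algorithms](docs/数据结构与算法.md)
- :snake: [Programming languages: C/C++/Python](docs/编程语言.md)
- :fireworks: [Deep learning frameworks](docs/深度学习框架.md)
- :pencil2: [Interview experience](docs/面试经验.md)
- :bulb: [Interview tips](docs/面试技巧.md)
- :mega: [Other (computer networks, Linux, etc.)](docs/其它.md)
- [2024 AI algorithm and development job-hunting group](https://mp.weixin.qq.com/s/sK_oSU1PmbUJ5ZGeMmY27A)

# How to Join the 2024 AI Algorithm and Development Job-Hunting Group

**Price: originally 199 yuan, 50 yuan off for a limited time, so only 149 yuan (about 0.4 yuan per day)**

**Duration: one year (counted from the moment you join)**

**How to join: scan the QR code below with WeChat to join the AI algorithm and development job-hunting group (Knowledge Planet, 知识星球)**

> Tip: after joining, we recommend using the Knowledge Planet app. The mini program and the Knowledge Planet official account also work. You can post, ask questions, discuss, and answer, and quickly access the group's resources.

![](docs/imgs/2024年AI求职群优惠券二维码.png)

![](docs/imgs/DLIB-Mindmap.png)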

Education & Learning
8.9K Github Stars
daily-paper-computer-vision
Open Source


# daily-paper-computer-vision

**A daily record of curated papers on computer vision, deep learning, and machine learning**

- [High-Quality CV Paper Digest](#PaperDaily)
- [Top CV Conferences/Journals (2017-2023)](#TopPaper)

<a name="PaperDaily"></a>
## High-Quality CV Paper Digest

- [2023 (updated daily)](2023-Paper.md)

To make the content easier to archive and search, the daily **digest of high-quality CV/AI papers, projects, and applications** is now published in [CVer Computer Vision](https://github.com/amusi/CVPR2023-Papers-with-Code/blob/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png). All CVers are welcome to join, learn from each other, and improve together.

[CVer Computer Vision](https://github.com/amusi/CVPR2023-Papers-with-Code/blob/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png) is billed as the largest computer vision AI community on Knowledge Planet (知识星球) and is updated daily. Topics covered include: object detection, semantic segmentation, object tracking, Transformer, multimodal learning, large models, NeRF, diffusion models, depth estimation, super-resolution, 3D object detection, CNN, GAN, competition solutions, face recognition, data augmentation, face detection, datasets, NAS, AutoML, image segmentation, SLAM, instance segmentation, human pose estimation, video object segmentation, Re-ID, medical image segmentation, salient object detection, autonomous driving, crowd density estimation, PyTorch, faces, lane detection, dehazing, panoptic segmentation, pedestrian detection, text detection, OCR, 6D pose estimation, edge detection, scene text detection, video instance segmentation, 3D point clouds, model compression, face alignment, denoising, reinforcement learning, action recognition, OpenCV, scene text recognition, deraining, machine learning, style transfer, video object detection, deblurring, saliency detection, pruning, liveness detection, facial landmark detection, 3D object tracking, video inpainting, facial expression recognition, temporal action detection, image retrieval, anomaly detection, and more.

![CVer学术交流群](./CVer学术交流群.png)

<a name="TopPaper"></a>
## Top CV Conferences/Journals

### 2023

**CVPR 2023**

- Paper list: https://openaccess.thecvf.com/CVPR2023?day=all
- Papers and code: https://github.com/amusi/CVPR2023-Papers-with-Code

**IJCAI 2023**

- Paper list: https://ijcai-23.org/main-track-accepted-papers/

**ICLR 2023**

- Paper list: https://openreview.net/group?id=ICLR.cc/2023/Conference#notable-top-5-

### 2022

**NIPS 2022**

- Paper list: https://nips.cc/Conferences/2022/Schedule?type=Poster and https://openreview.net/group?id=NeurIPS.cc/2022/Conference

**CVPR 2022**

- Paper list: https://openaccess.thecvf.com/CVPR2022?day=all
- Papers and code: https://github.com/amusi/CVPR2023-Papers-with-Code/blob/master/CVPR2022-Papers-with-Code.md

**ECCV 2022**

- Paper list: https://www.ecva.net/papers.php and https://eccv2022.ecva.net/program/accepted-papers/
- Papers and code: https://github.com/amusi/ECCV2022-Papers-with-Code

**ACM MM 2022**

- Paper list: https://2022.acmmm.org/accepted-papers/

**WACV 2022**

- Paper list: https://openaccess.thecvf.com/WACV2023

**MICCAI 2022**

- Paper list: https://conferences.miccai.org/2022/papers/ and https://link.springer.com/book/10.1007/978-3-031-16431-6

**AAAI 2022**

- Paper list: https://aaai-2022.virtualchair.net/papers.html?filter=keywords&search=Poster+Session+12&cluster=Red+3

**ICLR 2022**

- Paper list: https://openreview.net/group?id=ICLR.cc/2022/Conference#oral-submissions

### 2021

**ICLR 2021**

- Paper list: https://docs.google.com/spreadsheets/d/1n58O0lgGI5kI0QQY9f4BDDpNB4oFjb5D51yMr9fHAK4/edit#gid=1546418007
- OpenReview data: https://github.com/evanzd/ICLR2021-OpenReviewData
- [ICLR 2021 Stats & Graphs](https://github.com/sharonzhou/ICLR2021-Stats)

**AAAI 2021**

- Paper list: https://aaai.org/Conferences/AAAI-21/wp-content/uploads/2020/12/AAAI-21_Accepted-Paper-List.Main_.Technical.Track_.pdf

**WACV 2021**

- Paper list: http://wacv2021.thecvf.com/program

### 2020

**CVPR 2020**

- [Full list of CVPR 2020 accepted papers](http://openaccess.thecvf.com/CVPR2020.py)
- CVPR 2020 paper PDF download (1467 papers): [Baidu Cloud link](https://pan.baidu.com/s/1DoPNWXpwEkzQdPOrLsO21w), password: te6h
- [CVPR 2020 open-source code collection](https://github.com/amusi/CVPR2020-Code)

**ECCV 2020**

- [ECCV 2020 open-source code collection](https://github.com/amusi/ECCV2020-Code)

**NIPS 2020**

- Paper collection: https://neurips.cc/Conferences/2020/AcceptedPapersInitial
- Papers with code: https://www.paperdigest.org/2020/11/neurips-2020-papers-with-code-data/

**ACM MM 2020**

- Paper collection: https://dblp.org/db/conf/mm/mm2020.html
- Paper collection: https://2020.acmmm.org/main-track-list.html

**MICCAI 2020**

- Paper collection: https://drive.google.com/drive/folders/1GDKe2raJf4ylWqb1jxGmnsR384kmjYBb?usp=sharing

### 2019

**CVPR 2019**

- [Full list of CVPR 2019 accepted papers](<http://openaccess.thecvf.com/CVPR2019.py>)
- CVPR 2019 paper PDF download (1294 papers): [Baidu Cloud link](https://pan.baidu.com/s/19ef0HOz4hduDpcEK2PY9Kw), password: mwgv
- [CVPR 2019 open-source code collection](<https://github.com/amusi/CVPR2019-Code>)

**ICCV 2019**

- [Full list of ICCV 2019 accepted papers](<http://openaccess.thecvf.com/ICCV2019.py>)
- ICCV 2019 paper PDF download (1075 papers): [Baidu Cloud link](https://pan.baidu.com/s/1snDhED1Y-6qbV1ImQoYIPA), password: h7c2

**NeurIPS 2019**

- NeurIPS 2019 accepted paper list (1427 papers): [Baidu Cloud link](https://pan.baidu.com/s/1TxD263qqXmja3fBZVwtP3g), password: 04wn

**IJCAI 2019**

- Full list of IJCAI 2019 accepted papers (847 papers): [Baidu Cloud link](https://pan.baidu.com/s/1mVEowSZLBcz3X-_CZt7svA), password: v6ps

### 2018

**CVPR 2018**

- [Full list of CVPR 2018 accepted papers](2018/cvpr2018-paper-list.csv)
- CVPR 2018 paper PDF download (979 papers): [Baidu Cloud link](https://pan.baidu.com/s/1lYEM_kkw1PWTkQzUvjG2pw), password: 6pgk

**ECCV 2018**

- [Full list of ECCV 2018 accepted papers](http://openaccess.thecvf.com/ECCV2018.py)
- ECCV 2018 paper PDF download: [Baidu Cloud link](https://pan.baidu.com/s/1Mg0Kw9bepUK6_vqqVSOjNQ), password: mh97

### 2017

**CVPR 2017**

- CVPR 2017 paper PDF download: [Baidu Cloud link](https://pan.baidu.com/s/1RP1wQBFxs8BT0KBLiukxBw), password: hnzg

Education & Learning
6.8K Github Stars
AI-Job-Notes
Open Source


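# AI-Job-Notes

A job-hunting guide for AI algorithm roles, covering campus-recruitment groups, the recruitment timeline, preparation strategy, a coding-practice guide, referrals, a list of AI companies, and Q&A.

AI algorithm roles cover: AIGC, large models, deep learning, machine learning, computer vision, NLP, embodied AI, multimodal learning, image processing, autonomous driving, SLAM, and more.

Development roles cover: Java, C/C++, Python, Go, embedded systems, and more.

# Contents

<!-- MarkdownTOC depth=4 -->

- [0 2025 Campus Recruitment Group](#Group)
- [1 Campus Recruitment Timeline](#Scheduled)
- [2 Preparation Strategy](#Strategy)
- [3 AI Interview Notes and Coding-Practice Guide](#Coding)
- [4 AI Algorithm and Development Job-Hunting Group (Referrals)](#Recommend)
- [5 Resume Templates](#Resume)
- [6 List of AI Companies (Mainly CV Roles)](#Company)
- [7 Salaries for Previous AI Algorithm Cohorts](#Salary)
- [8 Q&A (130 Questions and Answers)](#Q&A)

<a name="Group"></a>
## 0 2025 Campus Recruitment Group

**The 2025 AI algorithm and development job-hunting group is now open!**

For details, see: [2025 AI Algorithm Job-Hunting Group](https://mp.weixin.qq.com/s/oN9nrIY5_fRVUgHc6ly0nw)

**Price: originally 199 yuan, 60 yuan off for a limited time, so only 139 yuan (about 0.4 yuan per day)**

**Duration: one year (counted from the moment you join)**

**How to join: scan the QR code below with WeChat to join the AI algorithm and development job-hunting group (Knowledge Planet, 知识星球)**

> Tip: after joining, we recommend using the Knowledge Planet app. The mini program and the Knowledge Planet official account also work. You can post, ask questions, discuss, and answer, and quickly access the group's resources.

![](imgs/2025年AI求职群优惠券二维码.png)

<a name="Scheduled"></a>
## 1 Campus Recruitment Timeline

![](imgs/校招时间表.png)

Taking this year (2024) as the example, the default audience is the class of 2025 (the class of 2024 is called the previous cohort).

| Period | Task |
| -------------- | ----------------------------------- |
| Feb to May 2024 | Find a summer internship / spring recruitment (supplementary hiring) for the previous cohort |
| Jun to Aug 2024 | Autumn recruitment, early batch (a battle of the gods) |
| Aug to Nov 2024 | Autumn recruitment, regular batch (the gods keep battling, plus rookies pecking at each other) |

### 1.1 Summer Internships

February to May 2024: summer internships.

Internships generally come in two kinds:

- Regular internships
- Summer internships

![](imgs/实习.png)

**Regular internships**: these can be found at any time of year. They usually arise from a specific team's needs, and openings are posted by the company's HR, team leads, or team members, so they are scattered but flexible.

**Summer internships**: many companies, especially large ones (such as BAT), run dedicated **summer intern** hiring campaigns. Partly this fits students' situations (many students are free only in summer, or their advisors release them only then), and partly it attracts talent for autumn campus recruitment (large-scale hiring). A summer internship matters a lot: the most direct benefit for students is the chance of a return offer. Summer interns usually start around the end of June (earlier if your schedule allows), and a dedicated internship defense typically takes place at the end of August or in September. Depending on overall performance, passing the defense essentially ends your autumn job hunt.

Note: even while doing a summer internship, you should still take part in the early and regular batches of autumn recruitment and apply to more companies. Even if the internship keeps you busy and leaves little time to prepare, apply widely anyway. Another benefit of a summer internship is valuable experience, which makes a resume look much better.

> "Winter internships" also exist, but large-scale winter intern hiring is rare. At most they are regular internships concentrated in the winter break.

### 1.2 Autumn Recruitment, Early Batch

**June to August 2024: autumn recruitment, early batch (a battle of the gods)**

Each year the first shot of autumn recruitment is usually fired by vivo or DJI, and most large companies such as BAT start in July. At this stage almost all hiring is through referrals or the early batch rather than the regular batch, so make the most of this window: June to August. Although I jokingly call it a battle of the gods, the payoff at this stage is especially high. On one hand, salaries are generally higher, and SP/SSP offers are usually issued at this point. On the other hand, fewer people have applied yet, because some do not realize how important the early batch is and keep planning to prepare a bit more and go all out in the regular batch.

Keep in mind that many very strong candidates take part in the early batch while headcount is limited (companies hold positions back for the regular batch), which is why I compare the early batch to a battle of the gods. The early batch also relies mostly on referrals. Later sections explain how to find openings and how to get referred.

Note: if you fail in the early batch, you can apply again in the regular batch (check each company's recruitment rules).

### 1.3 Autumn Recruitment, Regular Batch

**August to November 2024: autumn recruitment, regular batch (the gods keep battling, plus rookies pecking at each other)**

There is a saying, "golden September, silver October", meaning an offer in September is worth more than one in October. It is quite true, so you can imagine what level offers from July and August belong to. This period also tests your mindset: if it is September or October and you still have no offer while classmates around you do, you will certainly feel envious.

> Note: some companies open the regular batch as early as August.

So Amusi strongly recommends seizing the **early batch**. If you still have no offer in September, do not lose heart. Keep applying and keep going, and remember one thing: applying widely never hurts. In fact most students receive their offers in September and October, so if you keep applying and working at it, results will come.

<a name="Strategy"></a>
## 2 Preparation Strategy

Preparation is like a study plan: everyone has their own habits, and mine will not necessarily suit you (new material is coming soon). So I will summarize it with a compact formula.

**Formula: coding practice (LeetCode / 剑指Offer) + AI fundamentals + programming fundamentals + standard interview questions (CS/AI) + projects + internships + competitions + top conference/journal papers**

For these dimensions, the more you have the better, especially for AI algorithm roles, where the bar keeps rising.

<a name="Coding"></a>
## 3 AI Interview Notes and Coding-Practice Guide

### 3.1 Deep Learning Interview Book

See: [Deep Learning Interview Book (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and more)](<https://github.com/amusi/Deep-Learning-Interview-Book>)

**Deep Learning Interview Book**

Part of the contents:

- 😃 [Self-introduction](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E8%87%AA%E6%88%91%E4%BB%8B%E7%BB%8D.md)
- 🔢 [Mathematics](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%95%B0%E5%AD%A6.md)
- 🎓 [Machine learning](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0.md)
- 📕 [Deep learning](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0.md)
- 📗 [Reinforcement learning](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%A0.md)
- 👀 [Computer vision](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89.md)
- 📷 [Traditional image processing](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E4%BC%A0%E7%BB%9F%E5%9B%BE%E5%83%8F%E5%A4%84%E7%90%86.md)
- 🀄️ [Natural language processing](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86.md)
- 🏄 [SLAM](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/SLAM.md)
- 👥 [Recommendation algorithms](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%8E%A8%E8%8D%90%E7%AE%97%E6%B3%95.md)
- 📊 [Data structures and algorithms](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84%E4%B8%8E%E7%AE%97%E6%B3%95.md)
- 🐍 [Programming languages: C/C++/Python](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80.md)
- 🎆 [Deep learning frameworks](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6.md)
- ✏️ [Interview experience](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E9%9D%A2%E8%AF%95%E7%BB%8F%E9%AA%8C.md)
- 💡 [Interview tips](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E9%9D%A2%E8%AF%95%E6%8A%80%E5%B7%A7.md)
- 📣 [Other (computer networks, Linux, etc.)](https://github.com/amusi/Deep-Learning-Interview-Book/blob/master/docs/%E5%85%B6%E5%AE%83.md)

### 3.2 Coding-Practice Guide

The purpose of coding practice is to learn data structures and algorithms, build programming skill, and get familiar with problem-solving techniques.

Suggested order: first work through [剑指Offer](https://www.nowcoder.com/ta/coding-interviews) (66 problems), then [LeetCode](https://leetcode.com/) (LeetCode now has 1000+ problems, which you can tackle by category, but it is strongly recommended to finish the [LeetCode top interview questions](https://leetcode.com/problemset/top-interview-questions/) first).

> Note: judging from last year's early batch, you should solve at least 200 to 300 LeetCode problems, so students job hunting in 2024 (class of 2025) need to get going!

#### 3.2.1 Programming Languages for Practice

- C/C++
- Python
- Java (not recommended)

> Note: if you have enough time and some C++ background, it is strongly recommended to practice in both C++ and Python.
>
> Judging from the 2023 early batch (class of 2024), students who know C++ have a certain advantage.

#### 3.2.2 Recommended Books

| Book | Douban rating | Recommendation |
| ------------------------------------------------------------ | -------- | -------- |
| [剑指Offer](https://book.douban.com/subject/25910559/) | 9.1 | ☆☆☆☆☆ |
| [Data Structures (C++ Edition)](https://book.douban.com/subject/25859528/) | 9.4 | ☆☆☆☆ |
| [Grokking Algorithms](https://book.douban.com/subject/26979890/) | 8.4 | ☆☆☆☆ |
| [大话数据结构](https://book.douban.com/subject/6424904/) | 7.9 | ☆☆☆ |
| [Algorithms (4th Edition)](https://book.douban.com/subject/19952400/) | 9.4 | ☆☆☆ |

> Note: many areas are not covered yet, such as Linux and databases. These are the recommendations for now, and more will be added later.

#### 3.2.3 Online Practice Sites

- [LeetCode (English)](https://leetcode.com/)
- [LeetCode (Chinese)](https://leetcode-cn.com/)
- [Nowcoder (牛客网)](https://www.nowcoder.com/): recommended for 剑指Offer and for past question banks from major companies. Nowcoder's advantage is that many companies use it as their online test platform, so practicing there helps you learn the conventions for handling input and output.

#### 3.2.4 How to Practice

- Finish all of 剑指Offer
- Practice LeetCode selectively: by category, such as arrays or linked lists, or by the high-frequency interview set

#### 3.2.5 Practice Period

From now until 2024-11-15

#### 3.2.6 Why Practice Matters

A normal campus recruitment process includes an online written test, and interviews may also involve live coding, so coding practice strongly affects interview results.

<a name="Recommend"></a>
## 4 AI Algorithm and Development Job-Hunting Group and Referrals

Referral opportunities for AI positions at companies in China, covering machine learning, deep learning, computer vision, natural language processing, and related areas.

### 4.1 Why Referrals Matter

Referrals really are important, and the same holds when looking for an internship. For example, the resources I have can refer candidates to BAT, SenseTime, Megvii, and other companies. The usual route is to submit a resume online, whereas the fast and direct route is to get the resume straight to a team lead or manager. A referral is also built on a degree of mutual trust (small as it may be). You still go through every step of the process, but the chance of passing interviews quietly increases. Bear in mind that many resumes get stuck on the official site or on third-party job sites and go no further.

### 4.2 How to Get Referred

There are many ways to get a referral, for example:

1. Strong connections: ask senior students who have already graduated, or friends, to refer you. The drawback is that the companies your friends have joined are limited, and many people are the first in their group to pursue algorithm roles, so there may be no seniors in this field.
2. The standard route: watch the Nowcoder forum for referral posts from company staff, and follow recruitment-focused official accounts. I will not recommend specific ones, because many have strings attached: to be referred to a company you must forward an article to other groups and send them a screenshot. For most people, though, this is the only option.
3. Referral through Amusi. This may sound like an advertisement, but it is a real option: I have many resources and know people at many companies, so I can refer you directly. If you are interested, take a look at this AI algorithm and development job-hunting group: [2024 AI Algorithm Job-Hunting Group](https://mp.weixin.qq.com/s/sK_oSU1PmbUJ5ZGeMmY27A)

### 4.3 AI Algorithm and Development Job-Hunting Group

**Price: originally 199 yuan, 60 yuan off for a limited time, so only 139 yuan (about 0.4 yuan per day)**

**Duration: one year (counted from the moment you join)**

**How to join: scan the QR code below with WeChat to join the AI algorithm and development job-hunting group (Knowledge Planet, 知识星球)**

> Tip: after joining, we recommend using the Knowledge Planet app. The mini program and the Knowledge Planet official account also work. You can post, ask questions, discuss, and answer, and quickly access the group's resources.

![](imgs/2025年AI求职群优惠券二维码.png)

<a name="Resume"></a>
## 5 Resume Templates

Three resume templates are provided. See: [AI algorithm resume templates](https://github.com/amusi/AI-Job-Resume)

![](imgs/Resume-Demo.png)

<a name="Company"></a>
## 6 List of AI Companies (Mainly CV Roles)

First of all, AI is broader than CV, so any company offering CV roles offers AI roles. Whether these companies also have roles in NLP, machine learning, speech recognition, recommendation algorithms, or SLAM is something you need to check on their official sites.

For the list of companies with computer vision (CV) algorithm roles, see: https://github.com/amusi/CV-Jobs

<a name="Salary"></a>
## 7 Salaries for Previous AI Algorithm Cohorts

This section covers salaries for AI algorithm roles for the class of 2024.

I use only **master's graduates in roughly first-tier cities** as the example (Beijing, Shanghai, Guangzhou, Shenzhen, Nanjing, Hangzhou, and so on). In cities such as Wuhan or Chengdu, even for the same AI algorithm role, salaries differ somewhat, and you clearly cannot look at money alone without considering the city.

- **Standard offer ("cabbage price"): 25w~35w** (250,000 to 350,000 yuan per year)
- **SP: 35w~45w**
- **SSP: 45w+**

Annual salary alone is a bit vague, so here is more detail to get familiar with in advance.

A typical compensation package is structured as:

- Total annual package = monthly salary * (12 + X) + housing allowance + stock/options + signing bonus

X is usually 2 to 5 months of salary, most often 3.

Note: when negotiating with HR, if they ask what salary you expect, be sure to ask high, at least 30% above what you actually want. Trust me on this, otherwise...

![](imgs/salary.png)

<a name="Q&A"></a>
## 8 Q&A

For the 130 questions and answers, see [Q&A](Q&A.md)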
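
The total-package formula from section 7 (monthly salary * (12 + X) + housing allowance + stock/options + signing bonus) can be sketched in Python as follows. The function name, parameter names, and the sample figures are hypothetical and chosen only to illustrate the arithmetic. They are not real offer data.

```python
def total_annual_package(
    monthly_salary: float,
    bonus_months: float,          # the "X" in the formula, usually 2 to 5
    housing_allowance: float = 0.0,
    equity: float = 0.0,          # stock/options, valued per year
    signing_bonus: float = 0.0,
) -> float:
    """Total package = monthly * (12 + X) + housing + equity + signing bonus."""
    return (
        monthly_salary * (12 + bonus_months)
        + housing_allowance
        + equity
        + signing_bonus
    )


# Hypothetical example: 25,000 yuan per month with X = 3 and no extras.
package = total_annual_package(monthly_salary=25_000, bonus_months=3)
print(package)  # 375000, i.e. 37.5w, inside the SP band of 35w~45w
```

Keeping the extras as separate keyword arguments makes it easy to compare two offers that differ only in, say, signing bonus or the number of bonus months.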

Productivity ML Frameworks
6.1K Github Stars
awesome-object-detection
Open Source


# object-detection

[TOC]

This is a list of awesome articles about object detection. If you want to read the papers in chronological order, refer to [Date](Date.md).

- R-CNN
- Fast R-CNN
- Faster R-CNN
- Mask R-CNN
- Light-Head R-CNN
- Cascade R-CNN
- SPP-Net
- YOLO
- YOLOv2
- YOLOv3
- YOLT
- SSD
- DSSD
- FSSD
- ESSD
- MDSSD
- Pelee
- Fire SSD
- R-FCN
- FPN
- DSOD
- RetinaNet
- MegDet
- RefineNet
- DetNet
- SSOD
- CornerNet
- M2Det
- 3D Object Detection
- ZSD (Zero-Shot Object Detection)
- OSD (One-Shot Object Detection)
- Weakly Supervised Object Detection
- Softer-NMS
- 2018
- 2019
- Other

Based on handong1587's github: https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html

# Survey

**Imbalance Problems in Object Detection: A Review**

- intro: under review at TPAMI
- arXiv: <https://arxiv.org/abs/1909.00169>

**Recent Advances in Deep Learning for Object Detection**

- intro: From 2013 (OverFeat) to 2019 (DetNAS)
- arXiv: <https://arxiv.org/abs/1908.03673>

**A Survey of Deep Learning-based Object Detection**

- intro: From Fast R-CNN to NAS-FPN
- arXiv: <https://arxiv.org/abs/1907.09408>

**Object Detection in 20 Years: A Survey**

- intro: This work has been submitted to the IEEE TPAMI for possible publication
- arXiv: <https://arxiv.org/abs/1905.05055>

**《Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks》**

- intro: awesome
- arXiv: https://arxiv.org/abs/1809.03193

**《Deep Learning for Generic Object Detection: A Survey》**

- intro: Submitted to IJCV 2018
- arXiv: https://arxiv.org/abs/1809.02165

# Papers&Codes

## R-CNN

**Rich feature hierarchies for accurate object detection and semantic segmentation**

- intro: R-CNN
- arxiv: <http://arxiv.org/abs/1311.2524>
- supp: <http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf>
- slides: <http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf>
- slides: <http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf>
- github: <https://github.com/rbgirshick/rcnn>
- notes: <http://zhangliliang.com/2014/07/23/paper-note-rcnn/>
- caffe-pr("Make R-CNN the Caffe detection example"): <https://github.com/BVLC/caffe/pull/482>

## Fast R-CNN

**Fast R-CNN**

- arxiv: <http://arxiv.org/abs/1504.08083>
- slides: <http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf>
- github: <https://github.com/rbgirshick/fast-rcnn>
- github(COCO-branch): <https://github.com/rbgirshick/fast-rcnn/tree/coco>
- webcam demo: <https://github.com/rbgirshick/fast-rcnn/pull/29>
- notes: <http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/>
- notes: <http://blog.csdn.net/linj_m/article/details/48930179>
- github("Fast R-CNN in MXNet"): <https://github.com/precedenceguo/mx-rcnn>
- github: <https://github.com/mahyarnajibi/fast-rcnn-torch>
- github: <https://github.com/apple2373/chainer-simple-fast-rnn>
- github: <https://github.com/zplizzi/tensorflow-fast-rcnn>

**A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection**

- intro: CVPR 2017
- arxiv: <https://arxiv.org/abs/1704.03414>
- paper: <http://abhinavsh.info/papers/pdfs/adversarial_object_detection.pdf>
- github(Caffe): <https://github.com/xiaolonw/adversarial-frcnn>

## Faster R-CNN

**Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks**

- intro: NIPS 2015
- arxiv: <http://arxiv.org/abs/1506.01497>
- gitxiv: <http://www.gitxiv.com/posts/8pfpcvefDYn2gSgXk/faster-r-cnn-towards-real-time-object-detection-with-region>
- slides: <http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722/slides/w05-FasterR-CNN.pdf>
- github(official, Matlab): <https://github.com/ShaoqingRen/faster_rcnn>
- github(Caffe): <https://github.com/rbgirshick/py-faster-rcnn>
- github(MXNet): <https://github.com/msracver/Deformable-ConvNets/tree/master/faster_rcnn>
- github(PyTorch--recommend): <https://github.com/jwyang/faster-rcnn.pytorch>
- github: <https://github.com/mitmul/chainer-faster-rcnn>
- github(Torch): <https://github.com/andreaskoepf/faster-rcnn.torch>
- github(Torch): <https://github.com/ruotianluo/Faster-RCNN-Densecap-torch>
- github(TensorFlow): <https://github.com/smallcorgi/Faster-RCNN_TF>
- github(TensorFlow): <https://github.com/CharlesShang/TFFRCNN>
- github(C++ demo): <https://github.com/YihangLou/FasterRCNN-Encapsulation-Cplusplus>
- github(Keras): <https://github.com/yhenon/keras-frcnn>
- github: <https://github.com/Eniac-Xie/faster-rcnn-resnet>
- github(C++): <https://github.com/D-X-Y/caffe-faster-rcnn/tree/dev>

**R-CNN minus R**

- intro: BMVC 2015
- arxiv: <http://arxiv.org/abs/1506.06981>

**Faster R-CNN in MXNet with distributed implementation and data parallelization**

- github: <https://github.com/dmlc/mxnet/tree/master/example/rcnn>

**Contextual Priming and Feedback for Faster R-CNN**

- intro: ECCV 2016. Carnegie Mellon University
- paper: <http://abhinavsh.info/context_priming_feedback.pdf>
- poster: <http://www.eccv2016.org/files/posters/P-1A-20.pdf>

**An Implementation of Faster RCNN with Study for Region Sampling**

- intro: Technical Report, 3 pages. CMU
- arxiv: <https://arxiv.org/abs/1702.02138>
- github: <https://github.com/endernewton/tf-faster-rcnn>
- github: https://github.com/ruotianluo/pytorch-faster-rcnn

**Interpretable R-CNN**

- intro: North Carolina State University & Alibaba
- keywords: AND-OR Graph (AOG)
- arxiv: <https://arxiv.org/abs/1711.05226>

**Domain Adaptive Faster R-CNN for Object Detection in the Wild**

- intro: CVPR 2018. ETH Zurich & ESAT/PSI
- arxiv: <https://arxiv.org/abs/1803.03243>

## Mask R-CNN

- arxiv: <http://arxiv.org/abs/1703.06870>
- github(Keras): https://github.com/matterport/Mask_RCNN
- github(Caffe2): https://github.com/facebookresearch/Detectron
- github(Pytorch): <https://github.com/wannabeOG/Mask-RCNN>
- github(MXNet): https://github.com/TuSimple/mx-maskrcnn
- github(Chainer): https://github.com/DeNA/Chainer_Mask_R-CNN

## Light-Head R-CNN

**Light-Head R-CNN: In Defense of Two-Stage Object Detector**

- intro: Tsinghua University & Megvii Inc
- arxiv: <https://arxiv.org/abs/1711.07264>
- github(official): https://github.com/zengarden/light_head_rcnn
- github: <https://github.com/terrychenism/Deformable-ConvNets/blob/master/rfcn/symbols/resnet_v1_101_rfcn_light.py#L784>

## Cascade R-CNN

**Cascade R-CNN: Delving into High Quality Object Detection**

- arxiv: <https://arxiv.org/abs/1712.00726>
- github: <https://github.com/zhaoweicai/cascade-rcnn>

## SPP-Net

**Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition**

- intro: ECCV 2014 / TPAMI 2015
- arxiv: <http://arxiv.org/abs/1406.4729>
- github: <https://github.com/ShaoqingRen/SPP_net>
- notes: <http://zhangliliang.com/2014/09/13/paper-note-sppnet/>

**DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection**

- intro: PAMI 2016
- intro: an extension of R-CNN. box pre-training, cascade on region proposals, deformation layers and context representations
- project page: <http://www.ee.cuhk.edu.hk/%CB%9Cwlouyang/projects/imagenetDeepId/index.html>
- arxiv: <http://arxiv.org/abs/1412.5661>

**Object Detectors Emerge in Deep Scene CNNs**

- intro: ICLR 2015
- arxiv: <http://arxiv.org/abs/1412.6856>
- paper: <https://www.robots.ox.ac.uk/~vgg/rg/papers/zhou_iclr15.pdf>
- paper: <https://people.csail.mit.edu/khosla/papers/iclr2015_zhou.pdf>
- slides: <http://places.csail.mit.edu/slide_iclr2015.pdf>

**segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection**

- intro: CVPR 2015
- project(code+data): <https://www.cs.toronto.edu/~yukun/segdeepm.html>
- arxiv: <https://arxiv.org/abs/1502.04275>
- github: <https://github.com/YknZhu/segDeepM>

**Object Detection Networks on Convolutional Feature Maps**

- intro: TPAMI 2015
- keywords: NoC
- arxiv: <http://arxiv.org/abs/1504.06066>

**Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction**

- arxiv: <http://arxiv.org/abs/1504.03293>
- slides: <http://www.ytzhang.net/files/publications/2015-cvpr-det-slides.pdf>
- github: <https://github.com/YutingZhang/fgs-obj>

**DeepBox: Learning Objectness with Convolutional Networks**

- keywords: DeepBox
- arxiv: <http://arxiv.org/abs/1505.02146>
- github: <https://github.com/weichengkuo/DeepBox>

## YOLO

**You Only Look Once: Unified, Real-Time Object Detection**

[![img](https://camo.githubusercontent.com/e69d4118b20a42de4e23b9549f9a6ec6dbbb0814/687474703a2f2f706a7265646469652e636f6d2f6d656469612f66696c65732f6461726b6e65742d626c61636b2d736d616c6c2e706e67)](https://camo.githubusercontent.com/e69d4118b20a42de4e23b9549f9a6ec6dbbb0814/687474703a2f2f706a7265646469652e636f6d2f6d656469612f66696c65732f6461726b6e65742d626c61636b2d736d616c6c2e706e67)

- arxiv: <http://arxiv.org/abs/1506.02640>
- code: <https://pjreddie.com/darknet/yolov1/>
- github: <https://github.com/pjreddie/darknet>
- blog: <https://pjreddie.com/darknet/yolov1/>
- slides: <https://docs.google.com/presentation/d/1aeRvtKG21KHdD5lg6Hgyhx5rPq_ZOsGjG5rJ1HP7BbA/pub?start=false&loop=false&delayms=3000&slide=id.p>
- reddit: <https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/>
- github: <https://github.com/gliese581gg/YOLO_tensorflow>
- github: <https://github.com/xingwangsfu/caffe-yolo>
- github: <https://github.com/frankzhangrui/Darknet-Yolo>
- github: <https://github.com/BriSkyHekun/py-darknet-yolo>
- github: <https://github.com/tommy-qichang/yolo.torch>
- github: <https://github.com/frischzenger/yolo-windows>
- github: <https://github.com/AlexeyAB/yolo-windows>
- github: <https://github.com/nilboy/tensorflow-yolo>

**darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++**

- blog: <https://thtrieu.github.io/notes/yolo-tensorflow-graph-buffer-cpp>
- github: <https://github.com/thtrieu/darkflow>

**Start Training YOLO with Our Own Data**

[![img](https://camo.githubusercontent.com/2f99b692dd7ce47d7832385f3e8a6654e680d92a/687474703a2f2f6775616e6768616e2e696e666f2f626c6f672f656e2f77702d636f6e74656e742f75706c6f6164732f323031352f31322f696d616765732d34302e6a7067)](https://camo.githubusercontent.com/2f99b692dd7ce47d7832385f3e8a6654e680d92a/687474703a2f2f6775616e6768616e2e696e666f2f626c6f672f656e2f77702d636f6e74656e742f75706c6f6164732f323031352f31322f696d616765732d34302e6a7067)

- intro: train with customized data and class numbers/labels. Linux / Windows version for darknet.
- blog: <http://guanghan.info/blog/en/my-works/train-yolo/>
- github: <https://github.com/Guanghan/darknet>

**YOLO: Core ML versus MPSNNGraph**

- intro: Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
- blog: <http://machinethink.net/blog/yolo-coreml-versus-mps-graph/> - github: <https://github.com/hollance/YOLO-CoreML-MPSNNGraph> **TensorFlow YOLO object detection on Android** - intro: Real-time object detection on Android using the YOLO network with TensorFlow - github: <https://github.com/natanielruiz/android-yolo> **Computer Vision in iOS – Object Detection** - blog: <https://sriraghu.com/2017/07/12/computer-vision-in-ios-object-detection/> - github: <https://github.com/r4ghu/iOS-CoreML-Yolo> ## YOLOv2 **YOLO9000: Better, Faster, Stronger** - arxiv: <https://arxiv.org/abs/1612.08242> - code: <http://pjreddie.com/yolo9000/> https://pjreddie.com/darknet/yolov2/ - github(Chainer): <https://github.com/leetenki/YOLOv2> - github(Keras): <https://github.com/allanzelener/YAD2K> - github(PyTorch): <https://github.com/longcw/yolo2-pytorch> - github(TensorFlow): <https://github.com/hizhangp/yolo_tensorflow> - github(Windows): <https://github.com/AlexeyAB/darknet> - github: <https://github.com/choasUp/caffe-yolo9000> - github: <https://github.com/philipperemy/yolo-9000> - github(TensorFlow): <https://github.com/KOD-Chen/YOLOv2-Tensorflow> - github(Keras): <https://github.com/yhcc/yolo2> - github(Keras): <https://github.com/experiencor/keras-yolo2> - github(TensorFlow): <https://github.com/WojciechMormul/yolo2> **darknet_scripts** - intro: Auxiliary scripts for working with the (YOLO) darknet deep learning framework, e.g. how to generate YOLO anchors. - github: <https://github.com/Jumabek/darknet_scripts> **Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2** - github: <https://github.com/AlexeyAB/Yolo_mark> **LightNet: Bringing pjreddie's DarkNet out of the shadows** <https://github.com//explosion/lightnet> **YOLO v2 Bounding Box Tool** - intro: Bounding box labeler tool to generate the training data in the format YOLO v2 requires. 
- github: <https://github.com/Cartucho/yolo-boundingbox-labeler-GUI> **Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors** - intro: **LRM** is the first hard example mining strategy which could fit YOLOv2 perfectly and make it better applied in series of real scenarios where both real-time rates and accurate detection are strongly demanded. - arxiv: https://arxiv.org/abs/1804.04606 **Object detection at 200 Frames Per Second** - intro: faster than Tiny-Yolo-v2 - arxiv: https://arxiv.org/abs/1805.06361 **Event-based Convolutional Networks for Object Detection in Neuromorphic Cameras** - intro: YOLE--Object Detection in Neuromorphic Cameras - arxiv:https://arxiv.org/abs/1805.07931 **OmniDetector: With Neural Networks to Bounding Boxes** - intro: a person detector on n fish-eye images of indoor scenes(NIPS 2018) - arxiv:https://arxiv.org/abs/1805.08503 - datasets:https://gitlab.com/omnidetector/omnidetector ## YOLOv3 **YOLOv3: An Incremental Improvement** - arxiv:https://arxiv.org/abs/1804.02767 - paper:https://pjreddie.com/media/files/papers/YOLOv3.pdf - code: <https://pjreddie.com/darknet/yolo/> - github(Official):https://github.com/pjreddie/darknet - github:https://github.com/mystic123/tensorflow-yolo-v3 - github:https://github.com/experiencor/keras-yolo3 - github:https://github.com/qqwweee/keras-yolo3 - github:https://github.com/marvis/pytorch-yolo3 - github:https://github.com/ayooshkathuria/pytorch-yolo-v3 - github:https://github.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch - github:https://github.com/eriklindernoren/PyTorch-YOLOv3 - github:https://github.com/ultralytics/yolov3 - github:https://github.com/BobLiu20/YOLOv3_PyTorch - github:https://github.com/andy-yun/pytorch-0.4-yolov3 - github:https://github.com/DeNA/PyTorch_YOLOv3 ## YOLT **You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery** - intro: Small Object Detection - arxiv:https://arxiv.org/abs/1805.09512 - 
github:https://github.com/avanetten/yolt ## SSD **SSD: Single Shot MultiBox Detector** [![img](https://camo.githubusercontent.com/ad9b147ed3a5f48ffb7c3540711c15aa04ce49c6/687474703a2f2f7777772e63732e756e632e6564752f7e776c69752f7061706572732f7373642e706e67)](https://camo.githubusercontent.com/ad9b147ed3a5f48ffb7c3540711c15aa04ce49c6/687474703a2f2f7777772e63732e756e632e6564752f7e776c69752f7061706572732f7373642e706e67) - intro: ECCV 2016 Oral - arxiv: <http://arxiv.org/abs/1512.02325> - paper: <http://www.cs.unc.edu/~wliu/papers/ssd.pdf> - slides: [http://www.cs.unc.edu/%7Ewliu/papers/ssd_eccv2016_slide.pdf](http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf) - github(Official): <https://github.com/weiliu89/caffe/tree/ssd> - video: <http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973> - github: <https://github.com/zhreshold/mxnet-ssd> - github: <https://github.com/zhreshold/mxnet-ssd.cpp> - github: <https://github.com/rykov8/ssd_keras> - github: <https://github.com/balancap/SSD-Tensorflow> - github: <https://github.com/amdegroot/ssd.pytorch> - github(Caffe): <https://github.com/chuanqi305/MobileNet-SSD> **What's the diffience in performance between this new code you pushed and the previous code? 
#327** <https://github.com/weiliu89/caffe/issues/327> ## DSSD **DSSD : Deconvolutional Single Shot Detector** - intro: UNC Chapel Hill & Amazon Inc - arxiv: <https://arxiv.org/abs/1701.06659> - github: <https://github.com/chengyangfu/caffe/tree/dssd> - github: <https://github.com/MTCloudVision/mxnet-dssd> - demo: <http://120.52.72.53/www.cs.unc.edu/c3pr90ntc0td/~cyfu/dssd_lalaland.mp4> **Enhancement of SSD by concatenating feature maps for object detection** - intro: rainbow SSD (R-SSD) - arxiv: <https://arxiv.org/abs/1705.09587> **Context-aware Single-Shot Detector** - keywords: CSSD, DiCSSD, DeCSSD, effective receptive fields (ERFs), theoretical receptive fields (TRFs) - arxiv: <https://arxiv.org/abs/1707.08682> **Feature-Fused SSD: Fast Detection for Small Objects** <https://arxiv.org/abs/1709.05054> ## FSSD **FSSD: Feature Fusion Single Shot Multibox Detector** <https://arxiv.org/abs/1712.00960> **Weaving Multi-scale Context for Single Shot Detector** - intro: WeaveNet - keywords: fuse multi-scale information - arxiv: <https://arxiv.org/abs/1712.03149> ## ESSD **Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network** <https://arxiv.org/abs/1801.05918> **Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection** <https://arxiv.org/abs/1802.06488> ## MDSSD **MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects** - arxiv: https://arxiv.org/abs/1805.07009 ## Pelee **Pelee: A Real-Time Object Detection System on Mobile Devices** https://github.com/Robert-JunWang/Pelee - intro: (ICLR 2018 workshop track) - arxiv: https://arxiv.org/abs/1804.06882 - github: https://github.com/Robert-JunWang/Pelee ## Fire SSD **Fire SSD: Wide Fire Modules based Single Shot Detector on Edge Device** - intro:low cost, fast speed and high mAP on factor edge computing devices - arxiv:https://arxiv.org/abs/1806.05363 ## R-FCN **R-FCN: Object Detection via Region-based Fully 
Convolutional Networks** - arxiv: <http://arxiv.org/abs/1605.06409> - github: <https://github.com/daijifeng001/R-FCN> - github(MXNet): <https://github.com/msracver/Deformable-ConvNets/tree/master/rfcn> - github: <https://github.com/Orpine/py-R-FCN> - github: <https://github.com/PureDiors/pytorch_RFCN> - github: <https://github.com/bharatsingh430/py-R-FCN-multiGPU> - github: <https://github.com/xdever/RFCN-tensorflow> **R-FCN-3000 at 30fps: Decoupling Detection and Classification** <https://arxiv.org/abs/1712.01802> **Recycle deep features for better object detection** - arxiv: <http://arxiv.org/abs/1607.05066> ## FPN **Feature Pyramid Networks for Object Detection** - intro: Facebook AI Research - arxiv: <https://arxiv.org/abs/1612.03144> **Action-Driven Object Detection with Top-Down Visual Attentions** - arxiv: <https://arxiv.org/abs/1612.06704> **Beyond Skip Connections: Top-Down Modulation for Object Detection** - intro: CMU & UC Berkeley & Google Research - arxiv: <https://arxiv.org/abs/1612.06851> **Wide-Residual-Inception Networks for Real-time Object Detection** - intro: Inha University - arxiv: <https://arxiv.org/abs/1702.01243> **Attentional Network for Visual Object Detection** - intro: University of Maryland & Mitsubishi Electric Research Laboratories - arxiv: <https://arxiv.org/abs/1702.01478> **Learning Chained Deep Features and Classifiers for Cascade in Object Detection** - keywords: CC-Net - intro: chained cascade network (CC-Net). 
81.1% mAP on PASCAL VOC 2007 - arxiv: <https://arxiv.org/abs/1702.07054> **DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling** - intro: ICCV 2017 (poster) - arxiv: <https://arxiv.org/abs/1703.10295> **Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries** - intro: CVPR 2017 - arxiv: <https://arxiv.org/abs/1704.03944> **Spatial Memory for Context Reasoning in Object Detection** - arxiv: <https://arxiv.org/abs/1704.04224> **Accurate Single Stage Detector Using Recurrent Rolling Convolution** - intro: CVPR 2017. SenseTime - keywords: Recurrent Rolling Convolution (RRC) - arxiv: <https://arxiv.org/abs/1704.05776> - github: <https://github.com/xiaohaoChen/rrc_detection> **Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection** <https://arxiv.org/abs/1704.05775> **LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems** - intro: Embedded Vision Workshop in CVPR. UC San Diego & Qualcomm Inc - arxiv: <https://arxiv.org/abs/1705.05922> **Point Linking Network for Object Detection** - intro: Point Linking Network (PLN) - arxiv: <https://arxiv.org/abs/1706.03646> **Perceptual Generative Adversarial Networks for Small Object Detection** <https://arxiv.org/abs/1706.05274> **Few-shot Object Detection** <https://arxiv.org/abs/1706.08249> **Yes-Net: An effective Detector Based on Global Information** <https://arxiv.org/abs/1706.09180> **SMC Faster R-CNN: Toward a scene-specialized multi-object detector** <https://arxiv.org/abs/1706.10217> **Towards lightweight convolutional neural networks for object detection** <https://arxiv.org/abs/1707.01395> **RON: Reverse Connection with Objectness Prior Networks for Object Detection** - intro: CVPR 2017 - arxiv: <https://arxiv.org/abs/1707.01691> - github: <https://github.com/taokong/RON> **Mimicking Very Efficient Network for Object Detection** - intro: CVPR 2017. 
SenseTime & Beihang University - paper: <http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf> **Residual Features and Unified Prediction Network for Single Stage Detection** <https://arxiv.org/abs/1707.05031> **Deformable Part-based Fully Convolutional Network for Object Detection** - intro: BMVC 2017 (oral). Sorbonne Universités & CEDRIC - arxiv: <https://arxiv.org/abs/1707.06175> **Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors** - intro: ICCV 2017 - arxiv: <https://arxiv.org/abs/1707.06399> **Recurrent Scale Approximation for Object Detection in CNN** - intro: ICCV 2017 - keywords: Recurrent Scale Approximation (RSA) - arxiv: <https://arxiv.org/abs/1707.09531> - github: <https://github.com/sciencefans/RSA-for-object-detection> ## DSOD **DSOD: Learning Deeply Supervised Object Detectors from Scratch** ![img](https://user-images.githubusercontent.com/3794909/28934967-718c9302-78b5-11e7-89ee-8b514e53e23c.png) - intro: ICCV 2017. Fudan University & Tsinghua University & Intel Labs China - arxiv: <https://arxiv.org/abs/1708.01241> - github: <https://github.com/szq0214/DSOD> - github:https://github.com/Windaway/DSOD-Tensorflow - github:https://github.com/chenyuntc/dsod.pytorch **Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids** - arxiv:https://arxiv.org/abs/1712.00886 - github:https://github.com/szq0214/GRP-DSOD **Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages** - intro: BMVC 2018 - arXiv: https://arxiv.org/abs/1807.11013 **Object Detection from Scratch with Deep Supervision** - intro: This is an extended version of DSOD - arXiv: https://arxiv.org/abs/1809.09294 ## RetinaNet **Focal Loss for Dense Object Detection** - intro: ICCV 2017 Best student paper award. 
Facebook AI Research - keywords: RetinaNet - arxiv: <https://arxiv.org/abs/1708.02002> **CoupleNet: Coupling Global Structure with Local Parts for Object Detection** - intro: ICCV 2017 - arxiv: <https://arxiv.org/abs/1708.02863> **Incremental Learning of Object Detectors without Catastrophic Forgetting** - intro: ICCV 2017. Inria - arxiv: <https://arxiv.org/abs/1708.06977> **Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection** <https://arxiv.org/abs/1709.04347> **StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection** <https://arxiv.org/abs/1709.05788> **Dynamic Zoom-in Network for Fast Object Detection in Large Images** <https://arxiv.org/abs/1711.05187> **Zero-Annotation Object Detection with Web Knowledge Transfer** - intro: NTU, Singapore & Amazon - keywords: multi-instance multi-label domain adaption learning framework - arxiv: <https://arxiv.org/abs/1711.05954> ## MegDet **MegDet: A Large Mini-Batch Object Detector** - intro: Peking University & Tsinghua University & Megvii Inc - arxiv: <https://arxiv.org/abs/1711.07240> **Receptive Field Block Net for Accurate and Fast Object Detection** - intro: RFBNet - arxiv: <https://arxiv.org/abs/1711.07767> - github: <https://github.com//ruinmessi/RFBNet> **An Analysis of Scale Invariance in Object Detection - SNIP** - arxiv: <https://arxiv.org/abs/1711.08189> - github: <https://github.com/bharatsingh430/snip> **Feature Selective Networks for Object Detection** <https://arxiv.org/abs/1711.08879> **Learning a Rotation Invariant Detector with Rotatable Bounding Box** - arxiv: <https://arxiv.org/abs/1711.09405> - github: <https://github.com/liulei01/DRBox> **Scalable Object Detection for Stylized Objects** - intro: Microsoft AI & Research Munich - arxiv: <https://arxiv.org/abs/1711.09822> **Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids** - arxiv: <https://arxiv.org/abs/1712.00886> - github: <https://github.com/szq0214/GRP-DSOD> 
**Deep Regionlets for Object Detection** - keywords: region selection network, gating network - arxiv: <https://arxiv.org/abs/1712.02408> **Training and Testing Object Detectors with Virtual Images** - intro: IEEE/CAA Journal of Automatica Sinica - arxiv: <https://arxiv.org/abs/1712.08470> **Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video** - keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation - arxiv: <https://arxiv.org/abs/1712.08832> **Spot the Difference by Object Detection** - intro: Tsinghua University & JD Group - arxiv: <https://arxiv.org/abs/1801.01051> **Localization-Aware Active Learning for Object Detection** - arxiv: <https://arxiv.org/abs/1801.05124> **Object Detection with Mask-based Feature Encoding** - arxiv: <https://arxiv.org/abs/1802.03934> **LSTD: A Low-Shot Transfer Detector for Object Detection** - intro: AAAI 2018 - arxiv: <https://arxiv.org/abs/1803.01529> **Pseudo Mask Augmented Object Detection** <https://arxiv.org/abs/1803.05858> **Revisiting RCNN: On Awakening the Classification Power of Faster RCNN** <https://arxiv.org/abs/1803.06799> **Learning Region Features for Object Detection** - intro: Peking University & MSRA - arxiv: <https://arxiv.org/abs/1803.07066> **Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection** - intro: Singapore Management University & Zhejiang University - arxiv: <https://arxiv.org/abs/1803.08208> **Object Detection for Comics using Manga109 Annotations** - intro: University of Tokyo & National Institute of Informatics, Japan - arxiv: <https://arxiv.org/abs/1803.08670> **Task-Driven Super Resolution: Object Detection in Low-resolution Images** - arxiv: <https://arxiv.org/abs/1803.11316> **Transferring Common-Sense Knowledge for Object Detection** - arxiv: <https://arxiv.org/abs/1804.01077> **Multi-scale Location-aware Kernel Representation for Object Detection** - intro: CVPR 
2018 - arxiv: <https://arxiv.org/abs/1804.00428> - github: <https://github.com/Hwang64/MLKP> **Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors** - intro: National University of Defense Technology - arxiv: https://arxiv.org/abs/1804.04606 **Robust Physical Adversarial Attack on Faster R-CNN Object Detector** - arxiv: https://arxiv.org/abs/1804.05810 ## RefineDet **Single-Shot Refinement Neural Network for Object Detection** - intro: CVPR 2018 - arxiv: <https://arxiv.org/abs/1711.06897> - github: <https://github.com/sfzhang15/RefineDet> - github: https://github.com/lzx1413/PytorchSSD - github: https://github.com/ddlee96/RefineDet_mxnet - github: https://github.com/MTCloudVision/RefineDet-Mxnet ## DetNet **DetNet: A Backbone network for Object Detection** - intro: Tsinghua University & Face++ - arxiv: https://arxiv.org/abs/1804.06215 ## SSOD **Self-supervisory Signals for Object Discovery and Detection** - intro: Google Brain - arxiv: https://arxiv.org/abs/1806.03370 ## CornerNet **CornerNet: Detecting Objects as Paired Keypoints** - intro: ECCV 2018 - arXiv: https://arxiv.org/abs/1808.01244 - github: <https://github.com/umich-vl/CornerNet> ## M2Det **M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network** - intro: AAAI 2019 - arXiv: https://arxiv.org/abs/1811.04533 - github: https://github.com/qijiezhao/M2Det ## 3D Object Detection **3D Backbone Network for 3D Object Detection** - arXiv: https://arxiv.org/abs/1901.08373 **LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDARs** - arxiv: https://arxiv.org/abs/1805.04902 - github: https://github.com/CPFL/Autoware/tree/feature/cnn_lidar_detection ## ZSD (Zero-Shot Object Detection) **Zero-Shot Detection** - intro: Australian National University - keywords: YOLO - arxiv: <https://arxiv.org/abs/1803.07113> **Zero-Shot Object Detection** - arxiv: https://arxiv.org/abs/1804.04340 **Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize 
Novel Concepts** - arxiv: https://arxiv.org/abs/1803.06049 **Zero-Shot Object Detection by Hybrid Region Embedding** - arxiv: https://arxiv.org/abs/1805.06157 ## OSD (One-Shot Object Detection) **Comparison Network for One-Shot Conditional Object Detection** - arXiv: https://arxiv.org/abs/1904.02317 **One-Shot Object Detection** RepMet: Representative-based metric learning for classification and one-shot object detection - intro: IBM Research AI - arxiv: https://arxiv.org/abs/1806.04728 - github: TODO ## Weakly Supervised Object Detection **Weakly Supervised Object Detection in Artworks** - intro: ECCV 2018 Workshop Computer Vision for Art Analysis - arXiv: https://arxiv.org/abs/1810.02569 - Datasets: https://wsoda.telecom-paristech.fr/downloads/dataset/IconArt_v1.zip **Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation** - intro: CVPR 2018 - arXiv: https://arxiv.org/abs/1803.11365 - homepage: https://naoto0804.github.io/cross_domain_detection/ - paper: http://openaccess.thecvf.com/content_cvpr_2018/html/Inoue_Cross-Domain_Weakly-Supervised_Object_CVPR_2018_paper.html - github: https://github.com/naoto0804/cross-domain-detection ## Softer-NMS **Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection** - intro: CMU & Face++ - arXiv: https://arxiv.org/abs/1809.08545 - github: https://github.com/yihui-he/softer-NMS ## 2019 **Feature Selective Anchor-Free Module for Single-Shot Object Detection** - intro: CVPR 2019 - arXiv: https://arxiv.org/abs/1903.00621 **Object Detection based on Region Decomposition and Assembly** - intro: AAAI 2019 - arXiv: https://arxiv.org/abs/1901.08225 **Bottom-up Object Detection by Grouping Extreme and Center Points** - intro: one stage 43.2% on COCO test-dev - arXiv: https://arxiv.org/abs/1901.08043 - github: https://github.com/xingyizhou/ExtremeNet **ORSIm Detector: A Novel Object Detection Framework in Optical Remote Sensing Imagery Using Spatial-Frequency Channel Features** - 
intro: IEEE Transactions on Geoscience and Remote Sensing - arXiv: https://arxiv.org/abs/1901.07925 **Consistent Optimization for Single-Shot Object Detection** - intro: improves RetinaNet from 39.1 AP to 40.1 AP on the COCO dataset - arXiv: https://arxiv.org/abs/1901.06563 **Learning Pairwise Relationship for Multi-object Detection in Crowded Scenes** - arXiv: https://arxiv.org/abs/1901.03796 **RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free** - arXiv: https://arxiv.org/abs/1901.03353 - github: https://github.com/chengyangfu/retinamask **Region Proposal by Guided Anchoring** - intro: CUHK - SenseTime Joint Lab - arXiv: https://arxiv.org/abs/1901.03278 **Scale-Aware Trident Networks for Object Detection** - intro: mAP of **48.4** on the COCO dataset - arXiv: https://arxiv.org/abs/1901.01892 ## 2018 **Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions** - arXiv: https://arxiv.org/abs/1812.11901 **Strong-Weak Distribution Alignment for Adaptive Object Detection** - arXiv: https://arxiv.org/abs/1812.04798 **AutoFocus: Efficient Multi-Scale Inference** - intro: AutoFocus obtains an **mAP of 47.9%** (68.3% at 50% overlap) on the **COCO test-dev** set while processing **6.4 images per second on a Titan X (Pascal) GPU** - arXiv: https://arxiv.org/abs/1812.01600 **NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection** - intro: Google Cloud - arXiv: https://arxiv.org/abs/1812.00124 **SPLAT: Semantic Pixel-Level Adaptation Transforms for Detection** - intro: UC Berkeley - arXiv: https://arxiv.org/abs/1812.00929 **Grid R-CNN** - intro: SenseTime - arXiv: https://arxiv.org/abs/1811.12030 **Deformable ConvNets v2: More Deformable, Better Results** - intro: Microsoft Research Asia - arXiv: https://arxiv.org/abs/1811.11168 **Anchor Box Optimization for Object Detection** - intro: Microsoft Research - arXiv: https://arxiv.org/abs/1812.00469 **Efficient Coarse-to-Fine 
Non-Local Module for the Detection of Small Objects** - arXiv: https://arxiv.org/abs/1811.12152 **Learning RoI Transformer for Detecting Oriented Objects in Aerial Images** - arXiv: https://arxiv.org/abs/1812.00155 **Integrated Object Detection and Tracking with Tracklet-Conditioned Detection** - intro: Microsoft Research Asia - arXiv: https://arxiv.org/abs/1811.11167 **Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection** - arXiv: https://arxiv.org/abs/1811.11318 **Gradient Harmonized Single-stage Detector** - intro: AAAI 2019 - arXiv: https://arxiv.org/abs/1811.05181 **CFENet: Object Detection with Comprehensive Feature Enhancement Module** - intro: ACCV 2018 - github: https://github.com/qijiezhao/CFENet **DeRPN: Taking a further step toward more general object detection** - intro: AAAI 2019 - arXiv: https://arxiv.org/abs/1811.06700 - github: https://github.com/HCIILAB/DeRPN **Hybrid Knowledge Routed Modules for Large-scale Object Detection** - intro: Sun Yat-Sen University & Huawei Noah’s Ark Lab - arXiv: https://arxiv.org/abs/1810.12681 - github: https://github.com/chanyn/HKRM **Receptive Field Block Net for Accurate and Fast Object Detection** - intro: ECCV 2018 - arXiv: [https://arxiv.org/abs/1711.07767](https://arxiv.org/abs/1711.07767) - github: [https://github.com/ruinmessi/RFBNet](https://github.com/ruinmessi/RFBNet) **Deep Feature Pyramid Reconfiguration for Object Detection** - intro: ECCV 2018 - arXiv: https://arxiv.org/abs/1808.07993 **Unsupervised Hard Example Mining from Videos for Improved Object Detection** - intro: ECCV 2018 - arXiv: https://arxiv.org/abs/1808.04285 **Acquisition of Localization Confidence for Accurate Object Detection** - intro: ECCV 2018 - arXiv: https://arxiv.org/abs/1807.11590 - github: https://github.com/vacancy/PreciseRoIPooling **Toward Scale-Invariance and 
Position-Sensitive Region Proposal Networks** - intro: ECCV 2018 - arXiv: https://arxiv.org/abs/1807.09528 **MetaAnchor: Learning to Detect Objects with Customized Anchors** - arxiv: https://arxiv.org/abs/1807.00980 **Relation Network for Object Detection** - intro: CVPR 2018 - arxiv: https://arxiv.org/abs/1711.11575 - github: https://github.com/msracver/Relation-Networks-for-Object-Detection **Quantization Mimic: Towards Very Tiny CNN for Object Detection** - intro: Tsinghua University & The Chinese University of Hong Kong & SenseTime - arxiv: https://arxiv.org/abs/1805.02152 **Learning Rich Features for Image Manipulation Detection** - intro: CVPR 2018 Camera Ready - arxiv: https://arxiv.org/abs/1805.04953 **SNIPER: Efficient Multi-Scale Training** - arxiv: https://arxiv.org/abs/1805.09300 - github: https://github.com/mahyarnajibi/SNIPER **Soft Sampling for Robust Object Detection** - intro: the robustness of object detection under the presence of missing annotations - arxiv: https://arxiv.org/abs/1806.06986 **Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria** - intro: TNNLS 2018 - arxiv: https://arxiv.org/abs/1807.00147 - code: http://kezewang.com/codes/ASM_ver1.zip ## Other **R3-Net: A Deep Network for Multi-oriented Vehicle Detection in Aerial Images and Videos** - arxiv: https://arxiv.org/abs/1808.05560 - youtube: https://youtu.be/xCYD-tYudN0 # Detection Toolbox - [Detectron(FAIR)](https://github.com/facebookresearch/Detectron): Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask R-CNN](https://arxiv.org/abs/1703.06870). It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework. - [Detectron2](https://github.com/facebookresearch/detectron2): Detectron2 is FAIR's next-generation research platform for object detection and segmentation. 
- [maskrcnn-benchmark(FAIR)](https://github.com/facebookresearch/maskrcnn-benchmark): Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch. - [mmdetection(SenseTime&CUHK)](https://github.com/open-mmlab/mmdetection): mmdetection is an open source object detection toolbox based on PyTorch. It is a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http://mmlab.ie.cuhk.edu.hk/).

Developer Tools ML Frameworks
7.5K Github Stars
ICCV2025-Papers-with-Code
Open Source

# ICCV 2025 Papers and Open-Source Projects (Papers with Code) ICCV 2025 acceptance rate: 24% = 2699 / 11239 > Note 1: Issues sharing ICCV 2025 papers and open-source projects are welcome! > > Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundup surveys, see: https://github.com/amusi/daily-paper-computer-vision > > - [CVPR 2025](https://github.com/amusi/CVPR2025-Papers-with-Code) > - [ECCV 2024](https://github.com/amusi/ECCV2024-Papers-with-Code) Scan the QR code to join the CVer academic exchange group (CVer学术交流群) and get the latest work from ICCV 2025 and beyond! This is the largest computer vision AI Knowledge Planet community, updated daily and the first to share the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more. ![](CVer学术交流群.png) # ICCV 2025 Papers and Open-Source Code: Contents - [3DGS (Gaussian Splatting)](#3DGS) - [Agent](#Agent) - [Avatars](#Avatars) - [Backbone](#Backbone) - [CLIP](#CLIP) - [Mamba](#Mamba) - [Embodied AI](#Embodied-AI) - [GAN](#GAN) - [GNN](#GNN) - [Multimodal Large Language Models (MLLM)](#MLLM) - [Large Language Models (LLM)](#LLM) - [World Model](#WM) - [OCR](#OCR) - [NeRF](#NeRF) - [DETR](#DETR) - [Diffusion Models](#Diffusion) - [ReID (Re-identification)](#ReID) - [Long-Tail Distribution](#Long-Tail) - [Vision Transformer](#Vision-Transformer) - [Vision-Language](#VL) - [Self-supervised Learning](#SSL) - [Data Augmentation](#DA) - [Object Detection](#Object-Detection) - [Anomaly Detection](#Anomaly-Detection) - [Visual Tracking](#VT) - [Semantic Segmentation](#Semantic-Segmentation) - [Instance Segmentation](#Instance-Segmentation) - [Panoptic Segmentation](#Panoptic-Segmentation) - [Medical Image](#MI) - [Medical Image Segmentation](#MIS) - [Video Object Segmentation](#VOS) - [Video Instance Segmentation](#VIS) - [Referring Image Segmentation](#RIS) - [Image Matting](#Matting) - [Image Editing](#Image-Editing) - [Low-level Vision](#LLV) - [Super-Resolution](#SR) - [Denoising](#Denoising) - [Deblur](#Deblur) - [Autonomous Driving](#Autonomous-Driving) - [3D Point Cloud](#3D-Point-Cloud) - [3D Object Detection](#3DOD) - [3D Semantic Segmentation](#3DSS) - [3D Object Tracking](#3D-Object-Tracking) - 
[3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) - [3D配准(3D Registration)](#3D-Registration) - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation) - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation) - [3D Visual Grounding(3D视觉定位)](#3DVG) - [医学图像(Medical Image)](#Medical-Image) - [图像生成(Image Generation)](#Image-Generation) - [视频生成(Video Generation)](#Video-Generation) - [3D生成(3D Generation)](#3D-Generation) - [视频理解(Video Understanding)](#Video-Understanding) - [行为检测(Action Detection)](#Action-Detection) - [具身智能(Embodied AI)](#Embodied) - [文本检测(Text Detection)](#Text-Detection) - [知识蒸馏(Knowledge Distillation)](#KD) - [模型剪枝(Model Pruning)](#Pruning) - [图像压缩(Image Compression)](#IC) - [三维重建(3D Reconstruction)](#3D-Reconstruction) - [深度估计(Depth Estimation)](#Depth-Estimation) - [轨迹预测(Trajectory Prediction)](#TP) - [车道线检测(Lane Detection)](#Lane-Detection) - [图像描述(Image Captioning)](#Image-Captioning) - [视觉问答(Visual Question Answering)](#VQA) - [手语识别(Sign Language Recognition)](#SLR) - [视频预测(Video Prediction)](#Video-Prediction) - [新视点合成(Novel View Synthesis)](#NVS) - [Zero-Shot Learning(零样本学习)](#ZSL) - [立体匹配(Stereo Matching)](#Stereo-Matching) - [特征匹配(Feature Matching)](#Feature-Matching) - [暗光图像增强(Low-light Image Enhancement)](#Low-light) - [场景图生成(Scene Graph Generation)](#SGG) - [风格迁移(Style Transfer)](#ST) - [隐式神经表示(Implicit Neural Representations)](#INR) - [图像质量评价(Image Quality Assessment)](#IQA) - [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment) - [压缩感知(Compressive Sensing)](#CS) - [数据集(Datasets)](#Datasets) - [新任务(New Tasks)](#New-Tasks) - [其他(Others)](#Others) <a name="3DGS"></a> # 3DGS(Gaussian Splatting) <a name="Agent"></a> # Agent <a name="Avatars"></a> # Avatars # Backbone **TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba** - Paper: https://arxiv.org/abs/2411.17473 - Code: https://github.com/xwmaxwma/TinyViM <a name="CLIP"></a> # CLIP <a name="Mamba"></a> # Mamba **TinyViM: Frequency Decoupling for Tiny 
Hybrid Vision Mamba** - Paper: https://arxiv.org/abs/2411.17473 - Code: https://github.com/xwmaxwma/TinyViM **Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers** - Project:https://tiger-ai-lab.github.io/Vamba/ - Paper:https://arxiv.org/abs/2503.11579 - Code:https://github.com/TIGER-AI-Lab/Vamba <a name="Embodied-AI"></a> # Embodied AI <a name="GAN"></a> # GAN <a name="OCR"></a> # OCR <a name="NeRF"></a> # NeRF <a name="DETR"></a> # DETR <a name="Prompt"></a> # Prompt <a name="MLLM"></a> # 多模态大语言模型(MLLM) **FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers** - Paper: https://arxiv.org/abs/2501.16297 - Code: https://github.com/JiuTian-VL/JiuTian-FALCON - Project: https://jiutian-vl.github.io/FALCON.github.io/ <a name="LLM"></a> # 大语言模型(LLM) <a name="WM"></a> # World Model(世界模型) **Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning** - Project: https://yijun-yang.github.io/MeWM/ - Paper: https://arxiv.org/abs/2506.02327 - Code: https://github.com/scott-yjyang/MeWM <a name="ReID"></a> # ReID(重识别) <a name="Diffusion"></a> # 扩散模型(Diffusion Models) **From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers** - Paper: https://arxiv.org/abs/2503.06923 - Code: https://github.com/Shenyi-Z/TaylorSeer <a name="Vision-Transformer"></a> # Vision Transformer <a name="VL"></a> # 视觉和语言(Vision-Language) <a name="Object-Detection"></a> # 目标检测(Object Detection) <a name="Anomaly-Detection"></a> # 异常检测(Anomaly Detection) <a name="VT"></a> # 目标跟踪(Object Tracking) <a name="MI"></a> # 医学图像(Medical Image) **Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning** - Project: https://yijun-yang.github.io/MeWM/ - Paper: https://arxiv.org/abs/2506.02327 - Code: https://github.com/scott-yjyang/MeWM # 医学图像分割(Medical Image Segmentation) <a name="Autonomous-Driving"></a> # 自动驾驶(Autonomous Driving) **Where, What, Why: Towards 
Explainable Driver Attention Prediction** - Paper: https://arxiv.org/abs/2506.23088 - Code: https://github.com/yuchen2199/Explainable-Driver-Attention-Prediction - Project: https://github.com/yuchen2199/Explainable-Driver-Attention-Prediction **ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones** - Paper: https://arxiv.org/abs/2406.07661 - Code: https://github.com/anuragxel/roadwork-dataset - Project: https://www.cs.cmu.edu/~ILIM/roadwork_dataset/ **DriveMM: All-in-One Large Multimodal Model for Autonomous Driving** - Project: https://zhijian11.github.io/DriveMM/ - Paper: https://arxiv.org/abs/2412.07689 - Code: https://github.com/zhijian11/DriveMM # 3D点云(3D-Point-Cloud) <a name="3DOD"></a> # 3D目标检测(3D Object Detection) <a name="3DOD"></a> # 3D语义分割(3D Semantic Segmentation) <a name="LLV"></a> # Low-level Vision **EAMamba: Efficient All-Around Vision State Space Model for Image Restoration** - Paper: https://arxiv.org/abs/2506.22246 - Code: https://github.com/daidaijr/EAMamba <a name="SR"></a> # 超分辨率(Super-Resolution) <a name="Denoising"></a> # 去噪(Denoising) ## 图像去噪(Image Denoising) <a name="3D-Human-Pose-Estimation"></a> # 3D人体姿态估计(3D Human Pose Estimation) <a name="3DVG"></a> #3D Visual Grounding(3D视觉定位) <a name="Image-Generation"></a> # 图像生成(Image Generation) **DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models** - Paper: https://github.com/limuloo/DreamRenderer - Code: https://arxiv.org/abs/2503.12885 <a name="Video-Generation"></a> # 视频生成(Video Generation) <a name="Image-Editing"></a> # 图像编辑(Image Editing) **Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing** - Project: https://eff-edit.github.io - Paper: https://arxiv.org/abs/2503.10270 - Code: https://github.com/yuriYanZeXuan/EEdit <a name="Video-Editing"></a> # 视频编辑(Video Editing) <a name="3D-Generation"></a> # 3D生成(3D Generation) <a name="3D-Reconstruction"></a> # 3D重建(3D Reconstruction) <a name="HMG"></a> # 
人体运动生成(Human Motion Generation) <a name="Video-Understanding"></a> # 视频理解(Video Understanding) **Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers** - Project:https://tiger-ai-lab.github.io/Vamba/ - Paper:https://arxiv.org/abs/2503.11579 - Code:https://github.com/TIGER-AI-Lab/Vamba <a name="Embodied"></a> # 具身智能(Embodied AI) <a name="KD"></a> # 知识蒸馏(Knowledge Distillation) <a name="Depth-Estimation"></a> # 深度估计(Depth Estimation) <a name="Stereo-Matching"></a> # 立体匹配(Stereo Matching) <a name="Low-light"></a> # 暗光图像增强(Low-light Image Enhancement) <a name="IC"></a> # 图像压缩(Image Compression)](#IC) <a name="SGG"></a> # 场景图生成(Scene Graph Generation) <a name="ST"></a> # 风格迁移(Style Transfer) <a name="IQA"></a> # 图像质量评价(Image Quality Assessment) <a name="Video-Quality-Assessment"></a> # 视频质量评价(Video Quality Assessment) <a name="CS"></a> # 压缩感知(Compressive Sensing) <a name="Datasets"></a> # 数据集(Datasets) **ROADWork Dataset: Learning to Recognize, Observe, Analyze and Drive Through Work Zones** - Paper: https://arxiv.org/abs/2406.07661 - Code: https://github.com/anuragxel/roadwork-dataset - Project: https://www.cs.cmu.edu/~ILIM/roadwork_dataset/ <a name="Others"></a> # 其他(Others) **Music Grounding by Short Video** - Project: https://rucmm.github.io/VMMR/ - Paper: https://arxiv.org/abs/2408.16990 - Code link: https://github.com/xxayt/MGSV

Education & Learning
2.9K GitHub Stars
ECCV2024-Papers-with-Code
Open Source

ECCV2024-Papers-with-Code

# ECCV 2024 Papers and Open-Source Projects (Papers with Code)

ECCV 2024 decisions are now available!

> Note 1: Everyone is welcome to open an issue to share ECCV 2024 papers and open-source projects!
>
> Note 2: For papers from past top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [CVPR 2024](https://github.com/amusi/CVPR2024-Papers-with-Code)
> - [ECCV 2022](ECCV2022-Papers-with-Code.md)
> - [ECCV 2020](ECCV2020-Papers-with-Code.md)

To follow ECCV 2024 and the latest work from top conferences, scan the QR code to join the CVer Academic Exchange Group. Billed as the largest computer vision AI community on Zhishi Xingqiu (Knowledge Planet), it is updated daily with the newest learning materials on computer vision, deep learning, autonomous driving, medical imaging, AIGC, and related topics.

![](CVer学术交流群.png)

# ECCV 2024 Open-Source Papers Index

- [3DGS (Gaussian Splatting)](#3DGS)
- [Mamba / SSM](#Mamba)
- [Avatars](#Avatars)
- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [Embodied AI](#Embodied-AI)
- [GAN](#GAN)
- [GNN](#GNN)
- [Multimodal Large Language Models (MLLM)](#MLLM)
- [Large Language Models (LLM)](#LLM)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models](#Diffusion)
- [ReID (Re-Identification)](#ReID)
- [Long-Tail Distribution](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [Vision-Language](#VL)
- [Self-Supervised Learning](#SSL)
- [Data Augmentation](#DA)
- [Object Detection](#Object-Detection)
- [Anomaly Detection](#Anomaly-Detection)
- [Visual Tracking](#VT)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Medical Image](#MI)
- [Medical Image Segmentation](#MIS)
- [Video Object Segmentation](#VOS)
- [Video Instance Segmentation](#VIS)
- [Referring Image Segmentation](#RIS)
- [Image Matting](#Matting)
- [Image Editing](#Image-Editing)
- [Low-level Vision](#LLV)
- [Super-Resolution](#SR)
- [Denoising](#Denoising)
- [Deblur](#Deblur)
- [Autonomous Driving](#Autonomous-Driving)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3DOD)
- [3D Semantic Segmentation](#3DSS)
- [3D Object Tracking](#3D-Object-Tracking)
- [3D Semantic Scene Completion](#3DSSC)
- [3D Registration](#3D-Registration)
- [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
- [3D Human Mesh Estimation](#3D-Human-Pose-Estimation)
- [Medical Image](#Medical-Image)
- [Image Generation](#Image-Generation)
- [Video Generation](#Video-Generation)
- [3D Generation](#3D-Generation)
- [Video Understanding](#Video-Understanding)
- [Action Recognition](#Action-Recognition)
- [Action Detection](#Action-Detection)
- [Text Detection](#Text-Detection)
- [Knowledge Distillation](#KD)
- [Model Pruning](#Pruning)
- [Image Compression](#IC)
- [3D Reconstruction](#3D-Reconstruction)
- [Depth Estimation](#Depth-Estimation)
- [Trajectory Prediction](#TP)
- [Lane Detection](#Lane-Detection)
- [Image Captioning](#Image-Captioning)
- [Visual Question Answering](#VQA)
- [Sign Language Recognition](#SLR)
- [Video Prediction](#Video-Prediction)
- [Novel View Synthesis](#NVS)
- [Zero-Shot Learning](#ZSL)
- [Stereo Matching](#Stereo-Matching)
- [Feature Matching](#Feature-Matching)
- [Scene Graph Generation](#SGG)
- [Counting](#Counting)
- [Implicit Neural Representations](#INR)
- [Image Quality Assessment](#IQA)
- [Video Quality Assessment](#Video-Quality-Assessment)
- [Datasets](#Datasets)
- [New Tasks](#New-Tasks)
- [Others](#Others)

<a name="3DGS"></a>

# 3DGS (Gaussian Splatting)

**MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images**

- Project: https://donydchen.github.io/mvsplat
- Paper: https://arxiv.org/abs/2403.14627
- Code: https://github.com/donydchen/mvsplat

**CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians**

- Paper: https://arxiv.org/abs/2404.01133
- Code: https://github.com/DekuLiuTesla/CityGaussian

**FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting**

- Project: https://zehaozhu.github.io/FSGS/
- Paper: https://arxiv.org/abs/2312.00451
- Code: https://github.com/VITA-Group/FSGS

<a name="Mamba"></a>

# Mamba / SSM

**VideoMamba: State Space Model for Efficient Video Understanding**

- Paper: https://arxiv.org/abs/2403.06977
- Code: https://github.com/OpenGVLab/VideoMamba

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/

<a name="Avatars"></a>

# Avatars

<a name="Backbone"></a>

# Backbone

<a name="CLIP"></a>

# CLIP

<a name="MAE"></a>

# MAE

<a name="Embodied-AI"></a>

# Embodied AI

<a name="GAN"></a>

# GAN

<a name="OCR"></a>

# OCR

**Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors**

- Paper: https://arxiv.org/pdf/2312.05286
- Code: https://github.com/SJTU-DeepVisionLab/FreeReal

**PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer**

- Paper: https://arxiv.org/abs/2407.07764
- Code: https://github.com/SJTU-DeepVisionLab/PosFormer

<a name="Occupancy"></a>

# Occupancy

**Fully Sparse 3D Occupancy Prediction**

- Paper: https://arxiv.org/abs/2312.17118
- Code: https://github.com/MCG-NJU/SparseOcc

<a name="NeRF"></a>

# NeRF

**NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields**

- Project: https://nerf-mae.github.io/
- Paper: https://arxiv.org/pdf/2404.01300
- Code: https://github.com/zubair-irshad/NeRF-MAE

<a name="DETR"></a>

# DETR

<a name="Prompt"></a>

# Prompt

<a name="MLLM"></a>

# Multimodal Large Language Models (MLLM)

**SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant**

- Paper: https://arxiv.org/abs/2403.11299
- Code: https://github.com/heliossun/SQ-LLaVA

**ControlCap: Controllable Region-level Captioning**

- Paper: https://arxiv.org/abs/2401.17910
- Code: https://github.com/callsys/ControlCap

<a name="LLM"></a>

# Large Language Models (LLM)

<a name="NAS"></a>

# NAS

<a name="ReID"></a>

# ReID (Re-Identification)

<a name="Diffusion"></a>

# Diffusion Models

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/

**Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2403.16394
- Code: https://github.com/zdxdsw/skewed_relations_T2I

**The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization**

- Project: https://ut-mao.github.io/noise.github.io/
- Paper: https://arxiv.org/abs/2312.08872
- Code: https://github.com/UT-Mao/Initial-Noise-Construction

<a name="Vision-Transformer"></a>

# Vision Transformer

**GiT: Towards Generalist Vision Transformer through Universal Language Interface**

- Paper: https://arxiv.org/abs/2403.09394
- Code: https://github.com/Haiyang-W/GiT

<a name="VL"></a>

# Vision-Language

**GalLoP: Learning Global and Local Prompts for Vision-Language Models**

- Paper: https://arxiv.org/abs/2407.01400

<a name="Object-Detection"></a>

# Object Detection

**Relation DETR: Exploring Explicit Position Relation Prior for Object Detection**

- Paper: https://arxiv.org/abs/2407.11699v1
- Code: https://github.com/xiuqhou/Relation-DETR
- Dataset: https://huggingface.co/datasets/xiuqhou/SA-Det-100k

**Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector**

- Project: http://yuqianfu.com/CDFSOD-benchmark/
- Paper: https://arxiv.org/pdf/2402.03094
- Code: https://github.com/lovelyqian/CDFSOD-benchmark

<a name="Anomaly-Detection"></a>

# Anomaly Detection

<a name="VT"></a>

# Object Tracking

<a name="Semantic-Segmentation"></a>

# Semantic Segmentation

**Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation**

- Paper: https://arxiv.org/abs/2405.06228
- Code: https://github.com/nizhenliang/CGRSeg

<a name="MI"></a>

# Medical Image

**Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging**

- Paper: https://arxiv.org/abs/2311.16914
- Code: https://github.com/peirong26/Brain-ID

**FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification**

- Project: https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k
- Paper: https://arxiv.org/abs/2407.08813
- Dataset: https://drive.google.com/drive/u/1/folders/1huH93JVeXMj9rK6p1OZRub868vv0UK0O
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

<a name="MIS"></a>

# Medical Image Segmentation

**ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image**

- Project: https://scribbleprompt.csail.mit.edu/
- Paper: https://arxiv.org/abs/2312.07381
- Code: https://github.com/halleewong/ScribblePrompt

**AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking**

- Paper: https://arxiv.org/abs/2407.06468
- Code: https://github.com/ricklisz/AnatoMask

**Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures**

- Paper: https://arxiv.org/abs/2407.14754
- Code: https://github.com/cbmi-group/FFM-Multi-Decoder-Network

<a name="VOS"></a>

# Video Object Segmentation

**DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries**

- Project: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/
- Paper: https://arxiv.org/abs/2404.00086
- Code: https://github.com/zhang-tao-whu/DVIS_Plus

<a name="Autonomous-Driving"></a>

# Autonomous Driving

**Fully Sparse 3D Occupancy Prediction**

- Paper: https://arxiv.org/abs/2312.17118
- Code: https://github.com/MCG-NJU/SparseOcc

**milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing**

- Paper: https://arxiv.org/abs/2306.17010
- Code: https://github.com/Toytiny/milliFlow/

**4D Contrastive Superflows are Dense 3D Representation Learners**

- Paper: https://arxiv.org/abs/2407.06190
- Code: https://github.com/Xiangxu-0103/SuperFlow

<a name="3D-Point-Cloud"></a>

# 3D Point Cloud

<a name="3DOD"></a>

# 3D Object Detection

**3D Small Object Detection with Dynamic Spatial Pruning**

- Project: https://xuxw98.github.io/DSPDet3D/
- Paper: https://arxiv.org/abs/2305.03716
- Code: https://github.com/xuxw98/DSPDet3D

**Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection**

- Paper: https://arxiv.org/abs/2402.03634
- Code: https://github.com/LiewFeng/RayDN

<a name="3DSS"></a>

# 3D Semantic Segmentation

<a name="Image-Editing"></a>

# Image Editing

<a name="Image-Inpainting"></a>

# Image Inpainting

**BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion**

- Project: https://tencentarc.github.io/BrushNet/
- Paper: https://arxiv.org/abs/2403.06976
- Code: https://github.com/TencentARC/BrushNet

<a name="Video-Editing"></a>

# Video Editing

<a name="LLV"></a>

# Low-level Vision

**Restoring Images in Adverse Weather Conditions via Histogram Transformer**

- Paper: https://arxiv.org/abs/2407.10172
- Code: https://github.com/sunshangquan/Histoformer

**OneRestore: A Universal Restoration Framework for Composite Degradation**

- Project: https://gy65896.github.io/projects/ECCV2024_OneRestore
- Paper: https://arxiv.org/abs/2407.04621
- Code: https://github.com/gy65896/OneRestore

<a name="SR"></a>

# Super-Resolution

<a name="Denoising"></a>

# Denoising

## Image Denoising

<a name="3D-Human-Pose-Estimation"></a>

# 3D Human Pose Estimation

<a name="Image-Generation"></a>

# Image Generation

**Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models**

- Paper: https://arxiv.org/abs/2404.07389
- Code: https://github.com/YasminZhang/EBAMA

**Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization**

- Project: https://kaminyou.com/Dense-Normalization/
- Paper: https://arxiv.org/abs/2407.04245
- Code: https://github.com/Kaminyou/Dense-Normalization

**ZIGMA: A DiT-style Zigzag Mamba Diffusion Model**

- Paper: https://arxiv.org/abs/2403.13802
- Code: https://taohu.me/zigma/

**Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation**

- Paper: https://arxiv.org/abs/2403.16394
- Code: https://github.com/zdxdsw/skewed_relations_T2I

<a name="Video-Generation"></a>

# Video Generation

**VideoStudio: Generating Consistent-Content and Multi-Scene Videos**

- Project: https://vidstudio.github.io/
- Code: https://github.com/FuchenUSTC/VideoStudio

<a name="3D-Generation"></a>

# 3D Generation

<a name="Video-Understanding"></a>

# Video Understanding

**VideoMamba: State Space Model for Efficient Video Understanding**

- Paper: https://arxiv.org/abs/2403.06977
- Code: https://github.com/OpenGVLab/VideoMamba

**C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition**

- Paper: https://arxiv.org/abs/2407.06113
- Code: https://github.com/RongchangLi/ZSCAR_C2C

<a name="Action-Recognition"></a>

# Action Recognition

**SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders**

- Paper: https://arxiv.org/abs/2407.13460
- Code: https://github.com/pha123661/SA-DVAE

<a name="KD"></a>

# Knowledge Distillation

<a name="IC"></a>

# Image Compression

**Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation**

- Code: https://github.com/qingshi9974/ECCV2024-AdpatICMH
- Paper: http://arxiv.org/abs/2407.09853

<a name="Stereo-Matching"></a>

# Stereo Matching

<a name="SGG"></a>

# Scene Graph Generation

<a name="Counting"></a>

# Counting

**Zero-shot Object Counting with Good Exemplars**

- Paper: https://arxiv.org/abs/2407.04948
- Code: https://github.com/HopooLinZ/VA-Count

<a name="Video-Quality-Assessment"></a>

# Video Quality Assessment

<a name="Datasets"></a>

# Datasets

<a name="Others"></a>

# Others

**Multi-branch Collaborative Learning Network for 3D Visual Grounding**

- Paper: https://arxiv.org/abs/2407.05363v2
- Code: https://github.com/qzp2018/MCLN

**PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers**

- Code: https://github.com/ananthu-aniraj/pdiscoformer
- Paper: https://arxiv.org/abs/2407.04538

**SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments**

- Project: https://fraunhoferhhi.github.io/spvloc/
- Paper: https://arxiv.org/abs/2404.10527
- Code: https://github.com/fraunhoferhhi/spvloc

**REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices**

- Project: https://xdimlab.github.io/REFRAME/
- Paper: https://arxiv.org/abs/2403.16481
- Code: https://github.com/MARVELOUSJI/REFRAME

AI & Machine Learning, Education & Learning
2.3K GitHub Stars
awesome-data-label-tools
Open Source

awesome-data-label-tools

# A Comprehensive List of Annotation Tools

- [2D Images](#Image)
- [Video](#Video)
- [3D](#3D)

<a name="Image"></a>

## 2D Images

<a name="Video"></a>

## Video

<a name="3D"></a>

## 3D

Data Labeling
52 GitHub Stars