awesome-diffusion-categorized

Open source. 2.2K stars, 102 forks, 1 issue, 79 watchers. Last commit: 3 months ago.

About awesome-diffusion-categorized

A collection of diffusion model papers, categorized by subarea.

Platforms

Web Self-hosted

Awesome Diffusion Categorized

Contents

  • Visual Illusion
  • Color
  • Count
  • Poster
  • Accelerate
  • Image Restoration
  • Storytelling
  • Virtual Try On
  • Drag Edit
  • Text-Guided Editing
  • Continual Learning
  • Remove Concept
  • In Context Learning
    • Multi-Concept
    • Decomposition
    • ID Encoder
    • General Personalization
    • AR-based
    • Video Customization

      Relational Diffusion Distillation for Efficient Image Generation \ [ACM MM 2024 (Oral)] [Code]

      Autoregressive Distillation of Diffusion Transformers \ [CVPR 2025 Oral] [Code]

      UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs \ [CVPR 2024] [Code]

      Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition \ [CVPR 2025] [Code]

      ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs \ [AAAI 2026] [Code]

      SADA: Stability-guided Adaptive Diffusion Acceleration \ [ICML 2025] [Code]

      SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow \ [ECCV 2024] [Code]

      Accelerating Image Generation with Sub-path Linear Approximation Model \ [ECCV 2024] [Code]

      Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models \ [NeurIPS 2023] [Code]

      Fast and Memory-Efficient Video Diffusion Using Streamlined Inference \ [NeurIPS 2024] [Code]

      Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling \ [CVPR 2026] [Code]

      A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models \ [ICML 2024] [Code]

      Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation \ [ICML 2024] [Code]

      On the Trajectory Regularity of ODE-based Diffusion Sampling \ [ICML 2024] [Code]

      InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation \ [ICLR 2024] [Code]

      Improved Training Technique for Latent Consistency Models \ [ICLR 2025] [Code]

      ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration \ [AAAI 2026] [Code]

      Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching \ [ACM MM 2025] [Code]

      CacheQuant: Comprehensively Accelerated Diffusion Models \ [CVPR 2025] [Code]

      SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models \ [CVPR 2026] [Code]

      Accelerating Vision Diffusion Transformers with Skip Branches \ [Website] [Code]

      Accelerating Diffusion Transformers with Dual Feature Caching \ [Website] [Code]

      From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers \ [Website] [Code]

      Exposure Bias Reduction for Enhancing Diffusion Transformer Feature Caching \ [Website] [Code]

      One Step Diffusion via Shortcut Models \ [Website] [Code]

      DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach \ [Website] [Code]

      DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance \ [Website] [Code]

      A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training \ [Website] [Code]

      Stable Consistency Tuning: Understanding and Improving Consistency Models \ [Website] [Code]

      SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models \ [Website] [Code]

      Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching \ [Website] [Code]

      SDXL-Lightning: Progressive Adversarial Diffusion Distillation \ [Website] [Code]

      Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation \ [Website] [Code]

      Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation \ [Website] [Code]

      Diffusion Models Are Innate One-Step Generators \ [Website] [Code]

      Optimal Stepsize for Diffusion Sampling \ [Website] [Code]

      Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models \ [Website] [Code]

      Few-Step Diffusion via Score identity Distillation \ [Website] [Code]

      FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation \ [Website] [Code]

      SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation \ [Website] [Code]

      Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers \ [Website] [Code]

      Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models \ [Website] [Code]

      SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching \ [Website] [Code]

      QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification \ [Website] [Code]

      DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space \ [Website] [Code]

      Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy \ [Website] [Code]

      pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation \ [Website] [Code]

      Towards One-step Causal Video Generation via Adversarial Self-Distillation \ [Website] [Code]

      RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pruning \ [Website] [Code]

      Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield \ [Website] [Code]

      TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times \ [Website] [Code]

      CorGi: Contribution-Guided Block-Wise Interval Caching for Training-Free Acceleration of Diffusion Transformers \ [Website] [Code]

      ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation \ [Website] [Code]

      Jano: Adaptive Diffusion Generation with Early-stage Convergence Awareness \ [Website] [Code]

      SODA: Sensitivity-Oriented Dynamic Acceleration for Diffusion Transformer \ [Website] [Code]

      TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward \ [Website] [Code]

      Distilling Diffusion Models into Conditional GANs \ [ECCV 2024] [Project]

      Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch \ [NeurIPS 2025] [Project]

      Cache Me if You Can: Accelerating Diffusion Models through Block Caching \ [CVPR 2024] [Project]

      Plug-and-Play Diffusion Distillation \ [CVPR 2024] [Project]

      SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds \ [NeurIPS 2023] [Project]

      One-step Diffusion Models with f-Divergence Distribution Matching \ [Website] [Project]

      MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis \ [Website] [Project]

      Diffusion Adversarial Post-Training for One-Step Video Generation \ [Website] [Project]

      SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance \ [Website] [Project]

      NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training \ [Website] [Project]

      Truncated Consistency Models \ [Website] [Project]

      Multi-student Diffusion Distillation for Better One-step Generators \ [Website] [Project]

      Effortless Efficiency: Low-Cost Pruning of Diffusion Models \ [Website] [Project]

      SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training \ [Website] [Project]

      SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device \ [Website] [Project]

      Align Your Flow: Scaling Continuous-Time Flow Map Distillation \ [Website] [Project]

      Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation \ [Website] [Project]

      Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor \ [Website] [Project]

      POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models \ [Website] [Project]

      Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation \ [Website] [Project]

      Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency \ [Website] [Project]

      Self-Evaluation Unlocks Any-Step Text-to-Image Generation \ [Website] [Project]

      Transition Matching Distillation for Fast Video Generation \ [Website] [Project]

      Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention \ [Website] [Project]

      FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion \ [Website] [Project]

      Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis \ [ICCV 2025 (Highlight)]

      OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models \ [ICCV 2025]

      FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification \ [NeurIPS 2024]

      One-Step Diffusion Distillation through Score Implicit Matching \ [NeurIPS 2024]

      Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation \ [AAAI 2025]

      TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration \ [CVPR 2026]

      BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers \ [CVPR 2025]

      PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models \ [CVPR 2025]

      MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation \ [CVPR 2025]

      Revisiting Diffusion Models: From Generative Pre-training to One-Step Generation \ [ICML 2025]

      Accelerate High-Quality Diffusion Models with Inner Loop Feedback \ [Website]

      Accelerating Diffusion Transformer via Error-Optimized Cache \ [Website]

      DICE: Distilling Classifier-Free Guidance into Text Embeddings \ [Website]

      ProReflow: Progressive Reflow with Decomposed Velocity \ [Website]

      Inference-Time Diffusion Model Distillation \ [Website]

      Taming Consistency Distillation for Accelerated Human Image Animation \ [Website]

      Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free \ [Website]

      HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration \ [Website]

      Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models \ [Website]

      MLCM: Multistep Consistency Distillation of Latent Diffusion Model \ [Website]

      EM Distillation for One-step Diffusion Models \ [Website]

      AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration \ [Website]

      Score-of-Mixture Training: Training One-Step Generative Models Made Simple \ [Website]

      Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference \ [Website]

      Importance-based Token Merging for Diffusion Models \ [Website]

      Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation \ [Website]

      Accelerating Diffusion Models with One-to-Many Knowledge Distillation \ [Website]

      Accelerating Video Diffusion Models via Distribution Matching \ [Website]

      TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution \ [Website]

      DDIL: Improved Diffusion Distillation With Imitation Learning \ [Website]

      OSV: One Step is Enough for High-Quality Image to Video Generation \ [Website]

      Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance \ [Website]

      Token Caching for Diffusion Transformer Acceleration \ [Website]

      DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization \ [Website]

      LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers \ [Website]

      Flow Generator Matching \ [Website]

      Multistep Distillation of Diffusion Models via Moment Matching \ [Website]

      SFDDM: Single-fold Distillation for Diffusion models \ [Website]

      LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models \ [Website]

      CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion \ [Website]

      SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation \ [Website]

      Ditto: Accelerating Diffusion Model via Temporal Value Similarity \ [Website]

      Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training \ [Website]

      TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution \ [Website]

      Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity \ [Website]

      Efficient Distillation of Classifier-Free Guidance using Adapters \ [Website]

      Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation \ [Website]

      Inductive Moment Matching \ [Website]

      High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching \ [Website]

      DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers \ [Website]

      Mean Flows for One-step Generative Modeling \ [Website]

      Faster Video Diffusion with Trainable Sparse Attention \ [Website]

      Accelerating Diffusion-based Super-Resolution with Dynamic Time-Spatial Sampling \ [Website]

      SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation \ [Website]

      Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation \ [Website]

      RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy \ [Website]

      Foresight: Adaptive Layer Reuse for Accelerated and High-Quality Text-to-Video Generation \ [Website]

      Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles \ [Website]

      Diffusion Transformer-to-Mamba Distillation for High-Resolution Image Generation \ [Website]

      Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers \ [Website]

      Accelerating Parallel Diffusion Model Serving with Residual Compression \ [Website]

      SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment \ [Website]

      MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration \ [Website]

      Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers \ [Website]

      DiCache: Let Diffusion Model Determine Its Own Cache \ [Website]

      HiCache: Training-free Acceleration of Diffusion Models via Hermite Polynomial-based Feature Caching \ [Website]

      SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation \ [Website]

      BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching \ [Website]

      RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer \ [Website]

      SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention \ [Website]

      CLQ: Cross-Layer Guided Orthogonal-based Quantization for Diffusion Transformers \ [Website]

      Score Distillation of Flow Matching Models \ [Website]

      Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers \ [Website]

      LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation \ [Website]

      FreqCa: Accelerating Diffusion Models via Frequency-Aware Caching \ [Website]

      Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory \ [Website]

      Test-Time Iterative Error Correction for Efficient Diffusion Models \ [Website]

      From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model \ [Website]

      PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling \ [Website]

      Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning \ [Website]

      GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers \ [Website]

      InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models \ [Website]

      USV: Unified Sparsification for Accelerating Video Diffusion Models \ [Website]

      TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows \ [Website]

      Few-Step Distillation for Text-to-Image Generation: A Practical Guide \ [Website]

      On the Design of One-step Diffusion via Shortcutting Flow Paths \ [Website]

      OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration \ [Website]

      Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization \ [Website]

      Forecast the Principal, Stabilize the Residual: Subspace-Aware Feature Caching for Efficient Diffusion Transformers \ [Website]

      DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching \ [Website]

      NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices \ [Website]

      DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers \ [Website]

      LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration \ [Website]

      Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models \ [Website]

      Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration \ [Website]

      TC-Padé: Trajectory-Consistent Padé Approximation for Diffusion Acceleration \ [Website]

      Train-Free

      AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising \ [NeurIPS 2024] [Project] [Code]

      Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy \ [NeurIPS 2024] [Project] [Code]

      DeepCache: Accelerating Diffusion Models for Free \ [CVPR 2024] [Project] [Code]

      Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers \ [Website] [Project] [Code]

      Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference \ [NeurIPS 2024] [Code]

      DiTFastAttn: Attention Compression for Diffusion Transformer Models \ [NeurIPS 2024] [Code]

      Structural Pruning for Diffusion Models \ [NeurIPS 2023] [Code]

      AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration \ [ICCV 2023] [Code]

      Agent Attention: On the Integration of Softmax and Linear Attention \ [ECCV 2024] [Code]

      Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration \ [CVPR 2025] [Code]

      Token Merging for Fast Stable Diffusion \ [CVPRW 2024] [Code]

      LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation \ [Website] [Code]

      FORA: Fast-Forward Caching in Diffusion Transformer Acceleration \ [Website] [Code]

      Real-Time Video Generation with Pyramid Attention Broadcast \ [Website] [Code]

      Accelerating Diffusion Transformers with Token-wise Feature Caching \ [Website] [Code]

      TGATE-V1: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models \ [Website] [Code]

      TGATE-V2: Faster Diffusion via Temporal Attention Decomposition \ [Website] [Code]

      SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers \ [Website] [Code]

      Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models \ [CVPR 2024] [Project]

      Training-free Diffusion Acceleration with Bottleneck Sampling \ [Website] [Project]

      Cache Me if You Can: Accelerating Diffusion Models through Block Caching \ [Website] [Project]

      Fewer Denoising Steps or Cheaper Per-Step Inference: Towards Compute-Optimal Diffusion Model Deployment \ [ICCV 2025]

      Token Fusion: Bridging the Gap between Token Pruning and Token Merging \ [WACV 2024]

      Flexiffusion: Training-Free Segment-Wise Neural Architecture Search for Efficient Diffusion Models \ [Website]

      PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future \ [Website]

      Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers \ [Website]

      Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step \ [Website]

      Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences \ [Website]

      Fast constrained sampling in pre-trained diffusion models \ [Website]

      Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas \ [Website]

      ETC: training-free diffusion models acceleration with Error-aware Trend Consistency \ [Website]

      AR model

      Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching \ [ICLR 2025] [Project] [Code]

      Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding \ [ICLR 2025] [Code]

      LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding \ [ICLR 2025] [Code]

      Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation \ [Website] [Code]

      SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL \ [Website] [Code]

      Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation \ [Website]

      SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation \ [Website]

      Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation \ [Website]

      Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation \ [Website]

      VAR model

      Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient \ [CVPR 2025] [Project] [Code]

      FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning \ [ICCV 2025] [Project] [Code]

      Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression \ [Website] [Code]

      SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping \ [Website] [Code]

      Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis \ [Website] [Code]

      LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization \ [Website]

      Image Restoration

      Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems \ [ECCV 2024 Oral] [Project] [Code]

      Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model \ [ICLR 2023 Oral] [Project] [Code]

      Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild \ [CVPR 2024] [Project] [Code]

      Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model \ [CVPR 2024] [Project] [Code]

      Zero-Reference Low-Light Enhancement via Physical Quadruple Priors \ [CVPR 2024] [Project] [Code]

      From Posterior Sampling to Meaningful Diversity in Image Restoration \ [ICLR 2024] [Project] [Code]

      Generative Diffusion Prior for Unified Image Restoration and Enhancement \ [CVPR 2023] [Project] [Code]

      MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration \ [ECCV 2024] [Project] [Code]

      Image Restoration with Mean-Reverting Stochastic Differential Equations \ [ICML 2023] [Project] [Code]

      PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging \ [NeurIPS 2024 Spotlight] [Project] [Code]

      Denoising Diffusion Models for Plug-and-Play Image Restoration \ [CVPR 2023 Workshop NTIRE] [Project] [Code]

      FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration \ [Website] [Project] [Code]

      Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing \ [Website] [Project] [Code]

      SVFR: A Unified Framework for Generalized Video Face Restoration \ [Website] [Project] [Code]

      DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models \ [Website] [Project] [Code]

      Solving Video Inverse Problems Using Image Diffusion Models \ [Website] [Project] [Code]

      RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration \ [Website] [Project] [Code]

      Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration \ [Website] [Project] [Code]

      GenDR: Lightning Generative Detail Restorator \ [Website] [Project] [Code]

      AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion \ [Website] [Project] [Code]

      SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training \ [Website] [Project] [Code]

      Text-Aware Image Restoration with Diffusion Models \ [Website] [Project] [Code]

      LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer \ [Website] [Project] [Code]

      Zero-Shot Video Deraining with Video Diffusion Models \ [Website] [Project] [Code]

      TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration \ [Website] [Project] [Code]

      FlowIE: Efficient Image Enhancement via Rectified Flow \ [CVPR 2024 Oral] [Code]

      ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting \ [NeurIPS 2023 (Spotlight)] [Code]

      GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration \ [ICML 2023 Oral] [Code]

      Diffusion Priors for Variational Likelihood Estimation and Image Denoising \ [NeurIPS 2024 Spotlight] [Code]

      DiffIR: Efficient Diffusion Model for Image Restoration \ [ICCV 2023] [Code]

      Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal \ [ICCV 2025] [Code]

      Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance \ [CVPR 2024] [Code]

      InstaRevive: One-Step Image Enhancement via Dynamic Score Matching \ [ICLR 2025] [Code]

      LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models \ [ECCV 2024] [Code]

      Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model \ [ECCV 2024] [Code]

      DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems \ [ECCV 2024] [Code]

      Low-Light Image Enhancement with Wavelet-based Diffusion Models \ [SIGGRAPH Asia 2023] [Code]

      Residual Denoising Diffusion Models \ [CVPR 2024] [Code]

      Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks \ [CVPR 2024] [Code]

      Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing \ [CVPR 2025] [Code]

      Deep Equilibrium Diffusion Restoration with Parallel Sampling \ [CVPR 2024] [Code]

      Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing \ [ECCV 2024] [Code]

      An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations \ [NeurIPS 2024] [Code]

      ReFIR: Grounding Large Restoration Models with Retrieval Augmentation \ [NeurIPS 2024] [Code]

      DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation \ [NeurIPS 2024] [Code]

      Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual \ [CVPR 2025] [Code]

      Learning to See in the Extremely Dark \ [ICCV 2025] [Code]

      Exploiting Diffusion Prior for Real-World Image Dehazing with Unpaired Training \ [AAAI 2025] [Code]

      Seeing Through the Rain: Resolving High-Frequency Conflicts in Deraining and Super-Resolution via Diffusion Guidance \ [AAAI 2026] [Code]

      Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal \ [CVPR 2024] [Code]

      Enhancing Diffusion Model Stability for Image Restoration via Gradient Management \ [ACM MM 2025] [Code]

      PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching \ [AAAI 2026] [Code]

      Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models \ [CVPR 2023 Workshop NTIRE] [Code]

      Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement \ [CVPR 2024 Workshop NTIRE] [Code]

      JPEG Artifact Correction using Denoising Diffusion Restoration Models \ [NeurIPS 2022 Workshop] [Code]

      FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems \ [Website] [Code]

      InstructRestore: Region-Customized Image Restoration with Human Instructions \ [Website] [Code]

      Decoupled Data Consistency with Diffusion Purification for Image Restoration \ [Website] [Code]

      One-Step Diffusion Model for Image Motion-Deblurring \ [Website] [Code]

      Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression \ [Website] [Code]

      Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems \ [Website] [Code]

      Improving Diffusion-based Inverse Algorithms under Few-Step Constraint via Learnable Linear Extrapolation \ [Website] [Code]

      DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration \ [Website] [Code]

      DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models \ [Website] [Code]

      UniProcessor: A Text-induced Unified Low-level Image Processor \ [Website] [Code]

      Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) \ [Website] [Code]

      Varformer: Adapting VAR's Generative Prior for Image Restoration \ [Website] [Code]

      Low-Light Image Enhancement via Generative Perceptual Priors \ [Website] [Code]

      PnP-Flow: Plug-and-Play Image Restoration with Flow Matching \ [Website] [Code]

      VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement \ [Website] [Code]

      Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems \ [Website] [Code]

      Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration \ [Website] [Code]

      ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring \ [Website] [Code]

      Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling \ [Website] [Code]

      Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models \ [Website] [Code]

      Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior \ [Website] [Code]

      Frequency Compensated Diffusion Model for Real-scene Dehazing \ [Website] [Code]

      Efficient Image Deblurring Networks based on Diffusion Models \ [Website] [Code]

      Blind Image Restoration via Fast Diffusion Inversion \ [Website] [Code]

      DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models \ [Website] [Code]

      Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling \ [Website] [Code]

      Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration \ [Website] [Code]

      Unlimited-Size Diffusion Restoration \ [Website] [Code]

      UniDB++: Fast Sampling of Unified Diffusion Bridge \ [Website] [Code]

      VmambaIR: Visual State Space Model for Image Restoration \ [Website] [Code]

      InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems \ [Website] [Code]

      Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model \ [Website] [Code]

      Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model \ [Website] [Code]

      BD-Diff: Generative Diffusion Model for Image Deblurring on Unknown Domains with Blur-Decoupled Learning \ [Website] [Code]

      IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models \ [Website] [Code]

      Degradation-Consistent Learning via Bidirectional Diffusion for Low-Light Image Enhancement \ [Website] [Code]

      Residual Diffusion Bridge Model for Image Restoration \ [Website] [Code]

      Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond \ [Website] [Code]

      EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion \ [Website] [Code]

      Equivariant Sampling for Improving Diffusion Model-based Image Restoration \ [Website] [Code]

      Fose: Fusion of One-Step Diffusion and End-to-End Network for Pansharpening \ [Website] [Code]

      MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration \ [Website] [Code]

      Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption \ [CVPR 2025] [Project]

      TIP: Text-Driven Image Processing with Semantic and Restoration Instructions \ [ECCV 2024] [Project]

      Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models \ [NeurIPS 2024] [Project]

      GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration \ [Website] [Project]

      VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models \ [Website] [Project]

      SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration \ [Website] [Project]

      SILO: Solving Inverse Problems with Latent Operators \ [Website] [Project]

      Proxies for Distortion and Consistency with Applications for Real-World Image Restoration \ [Website] [Project]

      UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations \ [Website] [Project]

      Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision \ [Website] [Project]

      From Events to Clarity: The Event-Guided Diffusion Framework for Dehazing \ [Website] [Project]

      FlowSteer: Conditioning Flow Field for Consistent Image Restoration \ [Website] [Project]

      CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos \ [Website] [Project]

      BlurDM: A Blur Diffusion Model for Image Deblurring \ [NeurIPS 2025]

      DreamClean: Restoring Clean Image Using Deep Diffusion Prior \ [ICLR 2025]

      Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model \ [ICCV 2023]

      Multiscale Structure Guided Diffusion for Image Deblurring \ [ICCV 2023]

      Boosting Image Restoration via Priors from Pre-trained Models \ [CVPR 2024]

      Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration \ [CVPR 2025]

      Visual-Instructed Degradation Diffusion for All-in-One Image Restoration \ [CVPR 2025]

      Reversing Flow for Image Restoration \ [CVPR 2025]

      Dual Prompting Image Restoration with Diffusion Transformers \ [CVPR 2025]

      Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models \ [ICML 2025]

      Exploiting Diffusion Prior for Task-driven Image Restoration \ [ICCV 2025]

      Seeing Beyond Haze: Generative Nighttime Image Dehazing \ [Website]

      Inverse Problem Sampling in Latent Space Using Sequential Monte Carlo \ [Website]

      Human Body Restoration with One-Step Diffusion Model and A New Benchmark \ [Website]

      A Modular Conditional Diffusion Framework for Image Reconstruction \ [Website]

      Solving Inverse Problems using Diffusion with Fast Iterative Renoising \ [Website]

      Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model \ [Website]

      Particle-Filtering-based Latent Diffusion for Inverse Problems \ [Website]

      Bayesian Conditioned Diffusion Models for Inverse Problem \ [Website]

      ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement \ [Website]

      Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration \ [Website]

      Tell Me What You See: Text-Guided Real-World Image Denoising \ [Website]

      Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement \ [Website]

      Prototype Clustered Diffusion Models for Versatile Inverse Problems \ [Website]

      AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement \ [Website]

      Taming Generative Diffusion for Universal Blind Image Restoration \ [Website]

      Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL \ [Website]

      TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration \ [Website]

      Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior \ [Website]

      Enhancing Diffusion Models for Inverse Problems with Covariance-Aware Posterior Sampling \ [Website]

      Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration \ [Website]

      FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process \ [Website]

      Diffusion State-Guided Projected Gradient for Inverse Problems \ [Website]

      InstantIR: Blind Image Restoration with Instant Generative Reference \ [Website]

      Score-Based Variational Inference for Inverse Problems \ [Website]

      Towards Flexible and Efficient Diffusion Low Light Enhancer \ [Website]

      AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting \ [Website]

      G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving \ [Website]

      AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations \ [Website]

      STeP: A General and Scalable Framework for Solving Video Inverse Problems with Spatiotemporal Diffusion Priors \ [Website]

      DiffMVR: Diffusion-based Automated Multi-Guidance Video Restoration \ [Website]

      Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion \ [Website]

      DIVD: Deblurring with Improved Video Diffusion Model \ [Website]

      Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration \ [Website]

      Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization \ [Website]

      Are Conditional Latent Diffusion Models Effective for Image Restoration? \ [Website]

      Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration \ [Website]

      DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration \ [Website]

      Diffusion Restoration Adapter for Real-World Image Restoration \ [Website]

      Noise Synthesis for Low-Light Image Denoising with Diffusion Models \ [Website]

      A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising \ [Website]

      Temporal-Consistent Video Restoration with Pre-trained Diffusion Models \ [Website]

      Diffusion Image Prior \ [Website]

      Invert2Restore: Zero-Shot Degradation-Blind Image Restoration \ [Website]

      Blind Inversion using Latent Diffusion Priors \ [Website]

      CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image \ [Website]

      IDDM: Bridging Synthetic-to-Real Domain Gap from Physics-Guided Diffusion for Real-world Image Dehazing \ [Website]

      DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor \ [Website]

      LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration \ [Website]

      Dual Ascent Diffusion for Inverse Problems \ [Website]

      Restoring Real-World Images with an Internal Detail Enhancement Diffusion Model \ [Website]

      HAODiff: Human-Aware One-Step Diffusion via Dual-Prompt Guidance \ [Website]

      DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP \ [Website]

      Latent Guidance in Diffusion Models for Perceptual Evaluations \ [Website]

      Solving Inverse Problems with FLAIR \ [Website]

      Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach \ [Website]

      Restereo: Diffusion stereo video generation and restoration \ [Website]

      UniRes: Universal Image Restoration for Complex Degradations \ [Website]

      Zero-Shot Solving of Imaging Inverse Problems via Noise-Refined Likelihood Guided Diffusion Models \ [Website]

      Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration \ [Website]

      LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling \ [Website]

      Harnessing Diffusion-Yielded Score Priors for Image Restoration \ [Website]

      UniLDiff: Unlocking the Power of Diffusion Priors for All-in-One Image Restoration \ [Website]

      ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration \ [Website]

      Diffusion Once and Done: Degradation-Aware LoRA for Efficient All-in-One Image Restoration \ [Website]

      DiTVR: Zero-Shot Diffusion Transformer for Video Restoration \ [Website]

      Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration \ [Website]

      OS-DiffVSR: Towards One-step Latent Diffusion Model for High-detailed Real-world Video Super-Resolution \ [Website]

      Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement \ [Website]

      Noise is All You Need: Solving Linear Inverse Problems by Noise Combination Sampling with Diffusion Models \ [Website]

      Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward \ [Website]

      Integrating Reweighted Least Squares with Plug-and-Play Diffusion Priors for Noisy Image Restoration \ [Website]

      InstantViR: Real-Time Video Inverse Problem Solver with Distilled Diffusion Prior \ [Website]

      BokehFlow: Depth-Free Controllable Bokeh Rendering via Flow Matching \ [Website]

      UnfoldLDM: Deep Unfolding-based Blind Image Restoration with Latent Diffusion Priors \ [Website]

      CARD: Correlation Aware Restoration with Diffusion \ [Website]

      SURE Guided Posterior Sampling: Trajectory Correction for Diffusion-Based Inverse Problems \ [Website]

      Measurement-Consistent Langevin Corrector: A Remedy for Latent Diffusion Inverse Solvers \ [Website]

      Unifying Heterogeneous Degradations: Uncertainty-Aware Diffusion Bridge Model for All-in-One Image Restoration \ [Website]

      Zero-Shot Video Restoration and Enhancement with Assistance of Video Diffusion Models \ [Website]

      LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration \ [Website]

      Colorization

      LVCD: Reference-based Lineart Video Colorization with Diffusion Models \ [SIGGRAPH Asia 2024] [Project] [Code]

      Cobra: Efficient Line Art COlorization with BRoAder References \ [SIGGRAPH 2025] [Project] [Code]

      MagicColor: Multi-Instance Sketch Colorization \ [Website] [Project] [Code]

      ColorFlow: Retrieval-Augmented Image Sequence Colorization \ [Website] [Project] [Code]

      Control Color: Multimodal Diffusion-based Interactive Image Colorization \ [Website] [Project] [Code]

      Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior \ [Website] [Project] [Code]

      MangaNinja: Line Art Colorization with Precise Reference Following \ [Website] [Project] [Code]

      VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization \ [Website] [Project] [Code]

      SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation \ [Website] [Project] [Code]

      L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors \ [Website] [Code]

      SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization \ [Website] [Code]

      Image Referenced Sketch Colorization Based on Animation Creation Workflow \ [Website] [Code]

      ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text \ [Website] [Code]

      ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities \ [Website] [Code]

      Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization \ [Website] [Code]

      Diffusing Colors: Image Colorization with Text Guided Diffusion \ [SIGGRAPH Asia 2023] [Project]

      Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements \ [Website]

      DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models \ [Website]

      Consistent Video Colorization via Palette Guidance \ [Website]

      L-C4: Language-Based Video Colorization for Creative and Consistent Color \ [Website]

      LatentColorization: Latent Diffusion-Based Speaker Video Colorization \ [Website]

      Controllable Image Colorization with Instance-aware Texts and Masks \ [Website]

      AnimeColor: Reference-based Animation Colorization with Diffusion Transformers \ [Website]

      MangaDiT: Reference-Guided Line Art Colorization with Hierarchical Attention in Diffusion Transformers \ [Website]

      Enhancing Reference-based Sketch Colorization via Separating Reference Representations \ [Website]

      Prompt-based Consistent Video Colorization \ [Website]

      Face Restoration

      InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration \ [ICLR 2025] [Website] [Project] [Code] [Demo]

      Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration \ [Website] [Project] [Code]

      DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior \ [Website] [Project] [Code]

      OSDFace: One-Step Diffusion Model for Face Restoration \ [Website] [Project] [Code]

      DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration \ [CVPR 2023] [Code]

      PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance \ [NeurIPS 2023] [Code]

      AT-DDPM: Restoring Faces degraded by Atmospheric Turbulence using Denoising Diffusion Probabilistic Models \ [WACV 2023] [Code]

      FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration \ [WACV 2025] [Code]

      HonestFace: Towards Honest Face Restoration with One-Step Diffusion Model \ [Website] [Code]

      DifFace: Blind Face Restoration with Diffused Error Contraction \ [Website] [Code]

      AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior \ [Website] [Code]

      Towards Real-World Blind Face Restoration with Generative Diffusion Prior \ [Website] [Code]

      QuantFace: Low-Bit Post-Training Quantization for One-Step Diffusion Face Restoration \ [Website] [Code]

      Towards Unsupervised Blind Face Restoration using Diffusion Prior \ [Website] [Project]

      DynFaceRestore: Balancing Fidelity and Quality in Diffusion-Guided Blind Face Restoration with Dynamic Blur-Level Mapping and Guidance \ [ICCV 2025]

      InfoBFR: Real-World Blind Face Restoration via Information Bottleneck \ [Website]

      DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration \ [Website]

      CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models \ [Website]

      DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration \ [Website]

      Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling \ [Website]

      Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model \ [Website]

      DR-BFR: Degradation Representation with Diffusion Models for Blind Face Restoration \ [Website]

      Face2Face: Label-driven Facial Retouching Restoration \ [Website]

      WaveFace: Authentic Face Restoration with Efficient Frequency Recovery \ [Website]

      DiffusionReward: Enhancing Blind Face Restoration through Reward Feedback Learning \ [Website]

      LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter \ [Website]

      Unlocking the Potential of Diffusion Priors in Blind Face Restoration \ [Website]

      BIR-Adapter: A Low-Complexity Diffusion Model Adapter for Blind Image Restoration \ [Website]

      Image Compression

      Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion \ [ICML 2025] [Project] [Code]

      DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression \ [CVPR 2026] [Project] [Code]

      OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates \ [NeurIPS 2025] [Code]

      Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior \ [IEEE TCSVT 2024] [Code]

      Taming Large Multimodal Agents for Ultra-low Bitrate Semantically Disentangled Image Compression \ [CVPR 2025] [Code]

      Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing \ [CVPR 2025] [Code]

      StableCodec: Taming One-Step Diffusion for Extreme Image Compression \ [ICCV 2025] [Code]

      PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling \ [Website] [Code]

      Diffusion-based Extreme Image Compression with Compressed Feature Initialization \ [Website] [Code]

      Lossy Compression with Pretrained Diffusion Models \ [Website] [Code]

      DiffO: Single-step Diffusion for Image Compression at Ultra-Low Bitrates \ [Website] [Code]

      Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression \ [Website] [Code]

      Compressed Image Generation with Denoising Diffusion Codebook Models \ [Website] [Project]

      Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression \ [Website] [Project]

      DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement \ [ICLR 2025]

      Controllable Distortion-Perception Tradeoff Through Latent Diffusion for Neural Image Compression \ [AAAI 2025]

      Invertible Diffusion Models for Compressed Sensing \ [TPAMI 2025]

      Diffusion-based Perceptual Neural Video Compression with Temporal Diffusion Information Reuse \ [Website]

      Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression \ [Website]

      Leveraging Diffusion Knowledge for Generative Image Compression with Fractal Frequency-Aware Band Learning \ [Website]

      Towards Facial Image Compression with Consistency Preserving Diffusion Prior \ [Website]

      Higher fidelity perceptual image and video compression with a latent conditioned residual denoising diffusion model \ [Website]

      One-Step Diffusion-Based Image Compression with Semantic Distillation \ [Website]

      Fast Training-free Perceptual Image Compression \ [Website]

      Conditional Video Generation for High-Efficiency Video Compression \ [Website]

      CoD: A Diffusion Foundation Model for Image Compression \ [Website]

      Low-Bitrate Video Compression through Semantic-Conditioned Diffusion \ [Website]

      Generative Neural Video Compression via Video Diffusion Prior \ [Website]

      SLIM: Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion \ [Website]

      Towards Efficient Low-rate Image Compression with Frequency-aware Diffusion Prior Refinement \ [Website]

      DiffVC-RT: Towards Practical Real-Time Diffusion-based Perceptual Neural Video Compression \ [Website]

      CADC: Content Adaptive Diffusion-Based Generative Image Compression \ [Website]

      Super Resolution

      ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting \ [NeurIPS 2023 spotlight] [Website] [Project] [Code]

      Image Super-Resolution via Iterative Refinement \ [TPAMI] [Website] [Project] [Code]

      DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution \ [ECCV 2024] [Project] [Code]

      Kalman-Inspired Feature Propagation for Video Face Super-Resolution \ [ECCV 2024] [Project] [Code]

      LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only Four RTX 4090s \ [Website] [Project] [Code]

      HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior \ [Website] [Project] [Code]

      Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution \ [Website] [Project] [Code]

      MatchDiffusion: Training-free Generation of Match-cuts \ [Website] [Project] [Code]

      Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling \ [Website] [Project] [Code]

      STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution \ [Website] [Project] [Code]

      AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation \ [Website] [Project] [Code]

      FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution \ [Website] [Project] [Code]

      Exploiting Diffusion Prior for Real-World Image Super-Resolution \ [Website] [Project] [Code]

      FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution \ [Website] [Project] [Code]

      Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion \ [Website] [Project] [Code]

      Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling \ [ICLR 2026] [Code]

      SinSR: Diffusion-Based Image Super-Resolution in a Single Step \ [CVPR 2024] [Code]

      CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution \ [CVPR 2024] [Code]

      Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs \ [NeurIPS 2024] [Code]

      SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution \ [NeurIPS 2024] [Code]

      Iterative Token Evaluation and Refinement for Real-World Super-Resolution \ [AAAI 2024] [Code]

      MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution \ [Website] [Code]

      One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation \ [Website] [Code]

      Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution \ [Website] [Code]

      PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution \ [Website] [Code]

      Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution \ [Website] [Code]

      Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors \ [Website] [Code]

      One Step Diffusion-based Super-Resolution with Time-Aware Distillation \ [Website] [Code]

      Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution \ [Website] [Code]

      Visual Autoregressive Modeling for Image Super-Resolution \ [Website] [Code]

      StructSR: Refuse Spurious Details in Real-World Image Super-Resolution \ [Website] [Code]

      Hero-SR: One-Step Diffusion for Super-Resolution with Human Perception Priors \ [Website] [Code]

      RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution \ [Website] [Code]

      Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model \ [Website] [Code]

      One-Step Effective Diffusion Network for Real-World Image Super-Resolution \ [Website] [Code]

      Binarized Diffusion Model for Image Super-Resolution \ [Website] [Code]

      Does Diffusion Beat GAN in Image Super Resolution? \ [Website] [Code]

      PatchScaler: An Efficient Patch-independent Diffusion Model for Super-Resolution \ [Website] [Code]

      DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion \ [Website] [Code]

      Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach \ [Website] [Code]

      OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs \ [Website] [Code]

      Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution \ [Website] [Code]

      Arbitrary-steps Image Super-resolution via Diffusion Inversion \ [Website] [Code]

      Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization \ [Website] [Code]

      DSR-Diff: Depth Map Super-Resolution with Diffusion Model \ [Website] [Code]

      Semantic-Guided Diffusion Model for Single-Step Image Super-Resolution \ [Website] [Code]

      Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach \ [Website] [Code]

      BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution \ [Website] [Code]

      RFSR: Improving ISR Diffusion Models via Reward Feedback Learning \ [Website] [Code]

      SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution \ [Website] [Code]

      XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution \ [Website] [Code]

      QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution \ [Website] [Code]

      Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution \ [Website] [Code]

      BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution \ [Website] [Code]

      Consistency Trajectory Matching for One-Step Generative Super-Resolution \ [Website] [Code]

      TASR: Timestep-Aware Diffusion Model for Image Super-Resolution \ [Website] [Code]

      Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models \ [Website] [Code]

      Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders \ [Website] [Code]

      One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution \ [Website] [Code]

      QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution \ [Website] [Code]

      OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution \ [Website] [Code]

      Ultra-High-Definition Reference-Based Landmark Image Super-Resolution with Generative Diffusion Prior \ [Website] [Code]

      InfVSR: Breaking Length Limits of Generic Video Super-Resolution \ [Website] [Code]

      SCEESR: Semantic-Control Edge Enhancement for Diffusion-Based Super-Resolution \ [Website] [Code]

      PGP-DiffSR: Phase-Guided Progressive Pruning for Efficient Diffusion-based Image Super-Resolution \ [Website] [Code]

      OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution \ [Website] [Code]

      Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution \ [Website] [Code]

      FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution \ [Website] [Code]

      Spectral and Trajectory Regularization for Diffusion Transformer Super-Resolution \ [Website] [Code]

      Disentangled Textual Priors for Diffusion-based Image Super-Resolution \ [Website] [Code]

      The Power of Context: How Multimodality Improves Image Super-Resolution \ [Website] [Project]

      SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution \ [Website] [Project]

      DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution \ [Website] [Project]

      DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency \ [Website] [Project]

      RealisVSR: Detail-enhanced Diffusion for Real-World 4K Video Super-Resolution \ [Website] [Project]

      Time-Aware One Step Diffusion Network for Real-World Image Super-Resolution \ [Website] [Project]

      UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution \ [Website] [Project]

      SkipSR: Faster Super Resolution with Token Skipping \ [Website] [Project]

      STCDiT: Spatio-Temporally Consistent Diffusion Transformer for High-Quality Video Super-Resolution \ [Website] [Project]

      FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution \ [Website] [Project]

      HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models \ [ICCV 2023] [Website]

      TurboVSR: Fantastic Video Upscalers and Where to Find Them \ [ICCV 2025]

      SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning \ [NeurIPS 2025]

      PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution \ [CVPR 2025]

      Text-guided Explorable Image Super-resolution \ [CVPR 2024]

      Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder \ [CVPR 2024]

      AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution \ [CVPR 2024]

      Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network \ [AAAI 2024]

      BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution \ [AAAI 2025]

      DP2O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution \ [NeurIPS 2025]

      Detail-Enhancing Framework for Reference-Based Image Super-Resolution \ [Website]

      You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation \ [Website]

      Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution \ [Website]

      Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models \ [Website]

      Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning \ [Website]

      YODA: You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution \ [Website]

      Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model \ [Website]

      TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution \ [Website]

      QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution \ [Website]

      ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution \ [Website]

      Image Super-Resolution with Text Prompt Diffusion \ [Website]

      DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution \ [Website]

      DREAM: Diffusion Rectification and Estimation-Adaptive Models \ [Website]

      Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution \ [Website]

      Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution \ [Website]

      CasSR: Activating Image Power for Real-World Image Super-Resolution \ [Website]

      Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution \ [Website]

      Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution \ [Website]

      ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer \ [Website]

      Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution \ [Website]

      Adversarial Diffusion Compression for Real-World Image Super-Resolution \ [Website]

      HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution \ [Website]

      Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution \ [Website]

      RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution \ [Website]

      CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution \ [Website]

      Spatial Degradation-Aware and Temporal Consistent Diffusion Model for Compressed Video Super-Resolution \ [Website]

      AdaptSR: Low-Rank Adaptation for Efficient and Scalable Real-World Super-Resolution \ [Website]

      One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation \ [Website]

      KernelFusion: Assumption-Free Blind Super-Resolution via Patch Diffusion \ [Website]

      SupResDiffGAN a new approach for the Super-Resolution task \ [Website]

      GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution \ [Website]

      EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution \ [Website]

      Creatively Upscaling Images with Global-Regional Priors \ [Website]

      DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution \ [Website]

      UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space \ [Website]

      Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment \ [Website]

      One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation \ [Website]

      Self-Cascaded Diffusion Models for Arbitrary-Scale Image Super-Resolution \ [Website]

      Efficient Burst Super-Resolution with One-step Diffusion \ [Website]

      RASR: Retrieval-Augmented Super Resolution for Practical Reference-based Image Restoration \ [Website]

      RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution \ [Website]

      TinySR: Pruning Diffusion for Real-World Image Super-Resolution \ [Website]

      Realism Control One-step Diffusion for Real-World Image Super-Resolution \ [Website]

      LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution \ [Website]

      Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution \ [Website]

      HDW-SR: High-Frequency Guided Diffusion Model based on Wavelet Decomposition for Image Super-Resolution \ [Website]

      One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution \ [Website]

      Low-Resolution Editing is All You Need for High-Resolution Editing \ [Website]

      Zero-shot Adaptation of Stable Diffusion via Plug-in Hierarchical Degradation Representation for Real-World Super-Resolution \ [Website]

      Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution \ [Website]

      F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model \ [Website]

      OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion \ [Website]

      Tiled Prompts: Overcoming Prompt Underspecification in Image and Video Super-Resolution \ [Website]

      Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution \ [Website]

      Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution \ [Website]

      Personalized Restoration

      Restoration by Generation with Constrained Priors \ [CVPR 2024 Highlight] [Project] [Code]

      ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration \ [NeurIPS 2024] [Project] [Code]

      InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention \ [Website] [Project] [Code]

      Personalized Restoration via Dual-Pivot Tuning \ [Website] [Project] [Code]

      FaceMe: Robust Blind Face Restoration with Personal Identification \ [AAAI 2025] [Code]

      RestorerID: Towards Tuning-Free Face Restoration with ID Preservation \ [Website] [Code]

      PFStorer: Personalized Face Restoration and Super-Resolution \ [CVPR 2024]

      Reference-Guided Identity Preserving Face Restoration \ [Website]

      Storytelling

      One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt \ [ICLR 2025] [Website] [Project] [Code]

      CharaConsist: Fine-Grained Consistent Character Generation \ [ICCV 2025] [Project] [Code]

      Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models \ [CVPR 2024] [Project] [Code]

      Training-Free Consistent Text-to-Image Generation \ [SIGGRAPH 2024] [Project] [Code]

      The Chosen One: Consistent Characters in Text-to-Image Diffusion Models \ [SIGGRAPH 2024] [Project] [Code]

      StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation \ [NeurIPS 2024] [Project] [Code]

      OneActor: Consistent Character Generation via Cluster-Conditioned Guidance \ [NeurIPS 2024] [Project] [Code]

      StoryGPT-V: Large Language Models as Consistent Story Visualizers \ [CVPR 2025] [Project] [Code]

      DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation \ [CVPR 2025] [Project] [Code]

      TaleCrafter: Interactive Story Visualization with Multiple Characters \ [SIGGRAPH Asia 2023] [Project] [Code]

      ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions \ [CVPR 2025] [Project] [Code]

      AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation \ [Website] [Project] [Code]

      Consistent Subject Generation via Contrastive Instantiated Concepts \ [Website] [Project] [Code]

      Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation \ [Website] [Project] [Code]

      Story-Adapter: A Training-free Iterative Framework for Long Story Visualization \ [Website] [Project] [Code]

      DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation \ [Website] [Project] [Code]

      Manga Generation via Layout-controllable Diffusion \ [Website] [Project] [Code]

      Story2Board: A Training-Free Approach for Expressive Storyboard Generation \ [Website] [Project] [Code]

      MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising \ [Website] [Project] [Code]

      Why Settle for One? Text-to-ImageSet Generation and Evaluation \ [Website] [Project] [Code]

      DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion \ [Website] [Project] [Code]

      ASemConsist: Adaptive Semantic Feature Control for Training-Free Identity-Consistent Generation \ [Website] [Project] [Code]

      StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion \ [ECCV 2024] [Code]

      Make-A-Story: Visual Memory Conditioned Consistent Story Generation \ [CVPR 2023] [Code]

      StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization \ [AAAI 2025] [Code]

      Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models \ [AAAI 2025] [Code]

      StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation \ [Website] [Code]

      Consistent Story Generation with Asymmetry Zigzag Sampling \ [Website] [Code]

      SEED-Story: Multimodal Long Story Generation with Large Language Model \ [Website] [Code]

      Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models \ [Website] [Code]

      Masked Generative Story Transformer with Character Guidance and Caption Augmentation \ [Website] [Code]

      StoryBench: A Multifaceted Benchmark for Continuous Story Visualization \ [Website] [Code]

      Subject-Consistent and Pose-Diverse Text-to-Image Generation \ [Website] [Code]

      SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency \ [NeurIPS 2025] [Project]

      Multi-Shot Character Consistency for Text-to-Video Generation \ [Website] [Project]

      ConsiStyle: Style Diversity in Training-Free Consistent T2I Generation \ [Website] [Project]

      Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models \ [Website] [Project]

      DreamingComics: A Story Visualization Pipeline via Subject and Layout Customized Generation using Video Models \ [Website] [Project]

      PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling \ [Website] [Project]

      Infinite-Story: A Training-Free Consistent Text-to-Image Generation \ [AAAI 2026 Oral]

      Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation \ [ICCV 2025]

      Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis \ [ICASSP 2024]

      CharCom: Composable Identity Control for Multi-Character Story Illustration \ [ACM MMAsia 2025]

      Bringing Characters to New Stories: Training-Free Theme-Specific Image Generation via Dynamic Visual Prompting \ [Website]

      CogCartoon: Towards Practical Story Visualization \ [Website]

      Generating coherent comic with rich story using ChatGPT and Stable Diffusion \ [Website]

      Improved Visual Story Generation with Adaptive Context Modeling \ [Website]

      Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control \ [Website]

      Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models \ [Website]

      Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models \ [Website]

      ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models \ [Website]

      Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection \ [Website]

      StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration \ [Website]

      Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention \ [Website]

      Text2Story: Advancing Video Storytelling with Text Guidance \ [Website]

      Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling \ [Website]

      ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models \ [Website]

      Retrieval Augmented Comic Image Generation \ [Website]

      StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization \ [Website]

      Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models \ [Website]

      TaleDiffusion: Multi-Character Story Generation with Dialogue Rendering \ [Website]

      Consistent text-to-image generation via scene de-contextualization \ [Website]

      ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation \ [Website]

      DeCorStory: Gram-Schmidt Prompt Embedding Decorrelation for Consistent Storytelling \ [Website]

      StoryState: Agent-Based State Control for Consistent and Editable Storybooks \ [Website]

      AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist? \ [Website]

      Try On

      TryOnDiffusion: A Tale of Two UNets \ [CVPR 2023] [Website] [Project] [Official Code] [Unofficial Code]

      ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On \ [CVPR 2025] [Project] [Code]

      StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On \ [CVPR 2024] [Project] [Code]

      Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off \ [Website] [Project] [Code]

      VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding \ [Website] [Project] [Code]

      IMAGDressing-v1: Customizable Virtual Dressing \ [Website] [Project] [Code]

      OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person \ [Website] [Project] [Code]

      AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models \ [Website] [Project] [Code]

      ViViD: Video Virtual Try-on using Diffusion Models \ [Website] [Project] [Code]

      FashionComposer: Compositional Fashion Image Generation \ [Website] [Project] [Code]

      GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting \ [Website] [Project] [Code]

      Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images \ [Website] [Project] [Code]

      From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation \ [Website] [Project] [Code]

      PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns \ [Website] [Project] [Code]

      StableGarment: Garment-Centric Generation via Stable Diffusion \ [Website] [Project] [Code]

      Improving Diffusion Models for Virtual Try-on \ [Website] [Project] [Code]

      MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input \ [Website] [Project] [Code]

      Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off \ [Website] [Project] [Code]

      JCo-MVTON: Jointly Controllable Multi-Modal Diffusion Transformer for Mask-Free Virtual Try-on \ [Website] [Project] [Code]

      D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On \ [ECCV 2024] [Code]

      Improving Virtual Try-On with Garment-focused Diffusion Models \ [ECCV 2024] [Code]

      Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On \ [CVPR 2024] [Code]

      Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On \ [ICLR 2025] [Code]

      OmniVTON: Training-Free Universal Virtual Try-On \ [ICCV 2025] [Code]

      Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow \ [ACM MM 2023] [Code]

      LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On \ [ACM MM 2023] [Code]

      CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation \ [Website] [Code]

      OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on \ [Website] [Code]

      CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Model \ [Website] [Code]

      Learning Flow Fields in Attention for Controllable Person Image Generation \ [Website] [Code]

      DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling \ [Website] [Code]

      CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model \ [Website] [Code]

      Consistent Human Image and Video Generation with Spatially Conditioned Diffusion \ [Website] [Code]

      MV-VTON: Multi-View Virtual Try-On with Diffusion Models \ [Website] [Code]

      PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask \ [Website] [Code]

      Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images \ [Website] [Code]

      FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models \ [Website] [Code]

      Clothing agnostic Pre-inpainting Virtual Try-ON \ [Website] [Code]

      M&M VTO: Multi-Garment Virtual Try-On and Editing \ [CVPR 2024 Highlight] [Project]

      WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models \ [ECCV 2024] [Project]

      Fashion-VDM: Video Diffusion Model for Virtual Try-On \ [SIGGRAPH Asia 2024] [Project]

      MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on \ [Website] [Project]

      Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos \ [Website] [Project]

      Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild \ [Website] [Project]

      TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models \ [Website] [Project]

      3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models \ [Website] [Project]

      Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All \ [Website] [Project]

      Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment \ [Website] [Project]

      VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers \ [Website] [Project]

      AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario \ [Website] [Project]

      Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism \ [Website] [Project]

      DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework \ [Website] [Project]

      One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose \ [Website] [Project]

      Dress&Dance: Dress up and Dance as You Like It - Technical Preview \ [Website] [Project]

      InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On \ [Website] [Project]

      DiT-VTON: Diffusion Transformer Framework for Unified Multi-Category Virtual Try-On and Virtual Try-All with Integrated Image Editing \ [Website] [Project]

      Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction \ [CVPR 2025]

      FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on \ [IJCAI 2024]

      Fine-Grained Controllable Apparel Showcase Image Generation via Garment-Centric Outpainting \ [Website]

      GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon \ [Website]

      WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on \ [Website]

      Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles \ [Website]

      Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models \ [Website]

      Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models \ [Website]

      ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On \ [Website]

      ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model \ [Website]

      AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion \ [Website]

      DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing \ [Website]

      TED-VITON: Transformer-Empowered Diffusion Models for Virtual Try-On \ [Website]

      Controllable Human Image Generation with Personalized Multi-Garments \ [Website]

      RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation \ [Website]

      SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models \ [Website]

      IGR: Improving Diffusion Model for Garment Restoration from Person Image \ [Website]

      DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On \ [Website]

      DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder \ [Website]

      Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models \ [Website]

      MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer \ [Website]

      EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process \ [Website]

      Training-Free Consistency Pipeline for Fashion Repose \ [Website]

      IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter \ [Website]

      MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer \ [Website]

      Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model \ [Website]

      ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On \ [Website]

      Video Virtual Try-on with Conditional Diffusion Transformer Inpainter \ [Website]

      Two-Way Garment Transfer: Unified Diffusion Framework for Dressing and Undressing Synthesis \ [Website]

      MuGa-VTON: Multi-Garment Virtual Try-On via Diffusion Transformers with Prompt Customization \ [Website]

      LUIVITON: Learned Universal Interoperable VIrtual Try-ON \ [Website]

      ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On \ [Website]

      Rethinking Garment Conditioning in Diffusion-based Virtual Try-On \ [Website]

      The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection \ [Website]

      Drag Edit

      DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models \ [ICLR 2024] [Website] [Project] [Code]

      Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold \ [SIGGRAPH 2023] [Project] [Code]

      Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping \ [ICCV 2025] [Project] [Code]

      Readout Guidance: Learning Control from Diffusion Features \ [CVPR 2024 Highlight] [Project] [Code]

      FreeDrag: Feature Dragging for Reliable Point-based Image Editing \ [CVPR 2024] [Project] [Code]

      DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing \ [CVPR 2024] [Project] [Code]

      PoseTraj: Pose-Aware Trajectory Control in Video Diffusion \ [CVPR 2025] [Project] [Code]

      InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos \ [Website] [Project] [Code]

      GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models \ [Website] [Project] [Code]

      Repositioning the Subject within Image \ [Website] [Project] [Code]

      Drag-A-Video: Non-rigid Video Editing with Point-based Interaction \ [Website] [Project] [Code]

      ObjCtrl-2.5D: Training-free Object Control with Camera Poses \ [Website] [Project] [Code]

      DragAnything: Motion Control for Anything using Entity Representation \ [Website] [Project] [Code]

      InstantDrag: Improving Interactivity in Drag-based Image Editing \ [Website] [Project] [Code]

      DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment \ [Website] [Project] [Code]

      DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing \ [CVPR 2024] [Code]

      Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation \ [CVPR 2024] [Code]

      DragVideo: Interactive Drag-style Video Editing \ [ECCV 2024] [Code]

      RotationDrag: Point-based Image Editing with Rotated Diffusion Features \ [Website] [Code]

      DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model \ [Website] [Code]

      AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing \ [Website] [Code]

      Training-free Geometric Image Editing on Diffusion Models \ [Website] [Code]

      TrackGo: A Flexible and Efficient Method for Controllable Video Generation \ [Website] [Project]

      DragText: Rethinking Text Embedding in Point-based Image Editing \ [Website] [Project]

      OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation \ [Website] [Project]

      FastDrag: Manipulate Anything in One Step \ [Website] [Project]

      DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory \ [Website] [Project]

      StableDrag: Stable Dragging for Point-based Image Editing \ [Website] [Project]

      DiffUHaul: A Training-Free Method for Object Dragging in Images \ [Website] [Project]

      Dragging with Geometry: From Pixels to Geometry-Guided Image Editing \ [Website] [Project]

      Real-Time Motion-Controllable Autoregressive Video Diffusion \ [Website] [Project]

      RegionDrag: Fast Region-Based Image Editing with Diffusion Models \ [Website]

      Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators \ [Website]

      Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing \ [Website]

      AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing \ [Website]

      A Diffusion-Based Framework for Occluded Object Movement \ [Website]

      DragNeXt: Rethinking Drag-Based Image Editing \ [Website]

      LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence \ [Website]

      TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation \ [Website]

      DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing \ [Website]

      Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! \ [Website]

      InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing \ [Website]

      ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention \ [Website]

      RealDrag: The First Dragging Benchmark with Real Target Image \ [Website]

      Text Guided Image Editing

      Prompt-to-Prompt Image Editing with Cross Attention Control \ [ICLR 2023] [Website] [Project] [Code] [Replicate Demo]

      Zero-shot Image-to-Image Translation \ [SIGGRAPH 2023] [Project] [Code] [Replicate Demo] [Diffusers Doc] [Diffusers Code]

      InstructPix2Pix: Learning to Follow Image Editing Instructions \ [CVPR 2023 (Highlight)] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Official Code] [Dataset]

      Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation \ [CVPR 2023] [Website] [Project] [Code] [Dataset] [Replicate Demo] [Demo]

      DiffEdit: Diffusion-based semantic image editing with mask guidance \ [ICLR 2023] [Website] [Unofficial Code] [Diffusers Doc] [Diffusers Code]

      Imagic: Text-Based Real Image Editing with Diffusion Models \ [CVPR 2023] [Website] [Project] [Diffusers]

      Inpaint Anything: Segment Anything Meets Image Inpainting \ [Website] [Code 1] [Code 2]

      MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing \ [ICCV 2023] [Website] [Project] [Code] [Demo]

      SINE: SINgle Image Editing with Text-to-Image Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]

      Collaborative Score Distillation for Consistent Visual Synthesis \ [NeurIPS 2023] [Website] [Project] [Code]

      Visual Instruction Inversion: Image Editing via Visual Prompting \ [NeurIPS 2023] [Website] [Project] [Code]

      Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models \ [NeurIPS 2023] [Website] [Code]

      Localizing Object-level Shape Variations with Text-to-Image Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]

      Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance \ [Website] [Code 1] [Code 2] [Diffusers Code]

      PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models \ [Website] [Project] [Code] [Demo]

      SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models \ [CVPR 2024] [Project] [Code]

      Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing \ [CVPR 2024] [Project] [Code]

      Text-Driven Image Editing via Learnable Regions \ [CVPR 2024] [Project] [Code]

      Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators \ [ICLR 2024] [Project] [Code]

      TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models \ [SIGGRAPH Asia 2024] [Project] [Code]

      Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps \ [NeurIPS 2024] [Project] [Code]

      Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing \ [ICCV 2025] [Project] [Code]

      ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation \ [ICCV 2025] [Project] [Code]

      ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement \ [Website] [Project] [Code]

      FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing \ [Website] [Project] [Code]

      Zero-shot Image Editing with Reference Imitation \ [Website] [Project] [Code]

      OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision \ [Website] [Project] [Code]

      MultiBooth: Towards Generating All Your Concepts in an Image from Text \ [Website] [Project] [Code]

      Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting \ [Website] [Project] [Code]

      R-Genie: Reasoning-Guided Generative Image Editing \ [Website] [Project] [Code]

      EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing \ [Website] [Project] [Code]

      StyleBooth: Image Style Editing with Multimodal Instruction \ [Website] [Project] [Code]

      SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing \ [Website] [Project] [Code]

      In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer \ [Website] [Project] [Code]

      EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods \ [Website] [Project] [Code]

      InsightEdit: Towards Better Instruction Following for Image Editing \ [Website] [Project] [Code]

      InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions \ [Website] [Project] [Code]

      MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path \ [Website] [Project] [Code]

      HIVE: Harnessing Human Feedback for Instructional Visual Editing \ [Website] [Project] [Code]

      FaceStudio: Put Your Face Everywhere in Seconds \ [Website] [Project] [Code]

      Edicho: Consistent Image Editing in the Wild \ [Website] [Project] [Code]

      IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout \ [Website] [Project] [Code]

      Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach \ [Website] [Project] [Code]

      Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models \ [Website] [Project] [Code]

      EditCLIP: Representation Learning for Image Editing \ [Website] [Project] [Code]

      SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing \ [Website] [Project] [Code]

      FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction \ [Website] [Project] [Code]

      MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance \ [Website] [Project] [Code]

      Edit Transfer: Learning Image Editing via Vision In-Context Relations \ [Website] [Project] [Code]

      LIME: Localized Image Editing via Attention Regularization in Diffusion Models \ [Website] [Project] [Code]

      MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond \ [Website] [Project] [Code]

      MagicQuill: An Intelligent Interactive Image Editing System \ [Website] [Project] [Code]

      Scaling Concept With Text-Guided Diffusion Models \ [Website] [Project] [Code]

      Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control \ [Website] [Project] [Code]

      FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models \ [Website] [Project] [Code]

      FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning \ [Website] [Project] [Code]

      Steering Rectified Flow Models in the Vector Field for Controlled Image Generation \ [Website] [Project] [Code]

      Delta Denoising Score \ [Website] [Project] [Code]

      InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences \ [Website] [Project] [Code]

      KV-Edit: Training-Free Image Editing for Precise Background Preservation \ [Website] [Project] [Code]

      FreSca: Unveiling the Scaling Space in Diffusion Models \ [Website] [Project] [Code]

      Concept Lancet: Image Editing with Compositional Representation Transplant \ [Website] [Project] [Code]

      ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions \ [Website] [Project] [Code]

      RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions \ [Website] [Project] [Code]

      Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing \ [Website] [Project] [Code]

      CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing \ [Website] [Project] [Code]

      Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control \ [Website] [Project] [Code]

      EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing \ [Website] [Project] [Code]

      ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation \ [Website] [Project] [Code]

      Group Relative Attention Guidance for Image Editing \ [Website] [Project] [Code]

      EditThinker: Unlocking Iterative Reasoning for Any Image Editor \ [Website] [Project] [Code]

      EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing \ [Website] [Project] [Code]

      SpotEdit: Selective Region Editing in Diffusion Transformers \ [Website] [Project] [Code]

      Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling \ [Website] [Project] [Code]

      ChordEdit: One-Step Low-Energy Transport for Image Editing \ [Website] [Project] [Code]

      UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image \ [SIGGRAPH 2023] [Code]

      Learning to Follow Object-Centric Image Editing Instructions Faithfully \ [EMNLP 2023] [Code]

      Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling \ [ICCV 2025] [Code]

      Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning \ [CVPR 2025] [Code]

      GroupDiff: Diffusion-based Group Portrait Editing \ [ECCV 2024] [Code]

      GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion \ [CVPR 2024] [Code]

      TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing \ [CVPR 2024] [Code]

      ZONE: Zero-Shot Instruction-Guided Local Editing \ [CVPR 2024] [Code]

      Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation \ [CVPR 2024] [Code]

      DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation \ [ECCV 2024] [Code]

      FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing \ [ECCV 2024] [Code]

      Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing \ [ECCV 2024] [Code]

      Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks \ [AAAI 2024] [Code]

      FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference \ [AAAI 2024] [Code]

      Face Aging via Diffusion-based Editing \ [BMVC 2023] [Code]

      Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing \ [Website] [Code]

      Step1X-Edit: A Practical Framework for General Image Editing \ [Website] [Code]

      GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing \ [Website] [Code]

      Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing \ [Website] [Code]

      MoEdit: On Learning Quantity Perception for Multi-object Image Editing \ [Website] [Code]

      REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models \ [Website] [Code]

      FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing \ [Website] [Code]

      Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing \ [Website] [Code]

      PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing \ [Website] [Code]

      DiT4Edit: Diffusion Transformer for Image Editing \ [Website] [Code]

      FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors \ [Website] [Code]

      Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization \ [Website] [Code]

      Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing \ [Website] [Code]

      EditWorld: Simulating World Dynamics for Instruction-Following Image Editing \ [Website] [Code]

      ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing \ [Website] [Code]

      Differential Diffusion: Giving Each Pixel Its Strength \ [Website] [Code]

      Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model \ [Website] [Code]

      MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing \ [Website] [Code]

      Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing \ [Website] [Code]

      LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing \ [Website] [Code]

      InstructDiffusion: A Generalist Modeling Interface for Vision Tasks \ [Website] [Code]

      Region-Aware Diffusion for Zero-shot Text-driven Image Editing \ [Website] [Code]

      Forgedit: Text Guided Image Editing via Learning and Forgetting \ [Website] [Code]

      AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing \ [Website] [Code]

      An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control \ [Website] [Code]

      FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models \ [Website] [Code]

      Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance \ [Website] [Code]

      SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing \ [Website] [Code]

      SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding \ [Website] [Code]

      Image-Editing Specialists: An RLAIF Approach for Diffusion Models \ [Website] [Code]

      FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing \ [Website] [Code]

      PromptFix: You Prompt and We Fix the Photo \ [Website] [Code]

      FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation \ [Website] [Code]

      Single Image Iterative Subject-driven Generation and Editing \ [Website] [Code]

      Image Editing As Programs with Diffusion Models \ [Website] [Code]

      PairEdit: Learning Semantic Variations for Exemplar-based Image Editing \ [Website] [Code]

      Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models \ [Website] [Code]

      Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning \ [Website] [Code]

      The Promise of RL for Autoregressive Image Editing \ [Website] [Code]

      X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning \ [Website] [Code]

      Visual Autoregressive Modeling for Instruction-Guided Image Editing \ [Website] [Code]

      SpotEdit: Evaluating Visually-Guided Image Editing Methods \ [Website] [Code]

      Delta Velocity Rectified Flow for Text-to-Image Editing \ [Website] [Code]

      Lego-Edit: A General Image Editing Framework with Model-Level Bricks and MLLM Builder \ [Website] [Code]

      AgeBooth: Controllable Facial Aging and Rejuvenation via Diffusion Models \ [Website] [Code]

      Region in Context: Text-condition Image editing with Human-like semantic reasoning \ [Website] [Code]

      RegionE: Adaptive Region-Aware Generation for Efficient Image Editing \ [Website] [Code]

      Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing \ [Website] [Code]

      LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning \ [Website] [Code]

      REASONEDIT: Towards Reasoning-Enhanced Image Editing Models \ [Website] [Code]

      Refaçade: Editing Object with Given Reference Texture \ [Website] [Code]

      Towards Generalized Multi-Image Editing for Unified Multimodal Models \ [Website] [Code]

      Conditional Score Guidance for Text-Driven Image-to-Image Translation \ [NeurIPS 2023] [Website]

      ConsistEdit: Highly Consistent and Precise Training-free Visual Editing \ [SIGGRAPH Asia 2025] [Project]

      Emu Edit: Precise Image Editing via Recognition and Generation Tasks \ [CVPR 2024] [Project]

      ByteEdit: Boost, Comply and Accelerate Generative Image Editing \ [ECCV 2024] [Project]

      Watch Your Steps: Local Image and Scene Editing by Text Instructions \ [ECCV 2024] [Project]

      TurboEdit: Instant text-based image editing \ [ECCV 2024] [Project]

      FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers \ [CVPR 2025] [Project]

      Novel Object Synthesis via Adaptive Text-Image Harmony \ [NeurIPS 2024] [Project]

      Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization \ [NeurIPS 2025] [Project]

      Textualize Visual Prompt for Image Editing via Diffusion Bridge \ [AAAI 2025] [Project]

      PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models \ [Website] [Project]

      InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images \ [Website] [Project]

      UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics \ [Website] [Project]

      HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads \ [Website] [Project]

      POEM: Precise Object-level Editing via MLLM control \ [Website] [Project]

      MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models \ [Website] [Project]

      Object-level Visual Prompts for Compositional Image Generation \ [Website] [Project]

      Instruction-based Image Manipulation by Watching How Things Move \ [Website] [Project]

      BrushEdit: All-In-One Image Inpainting and Editing \ [Website] [Project]

      Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models \ [Website] [Project]

      SeedEdit: Align Image Re-Generation to Image Editing \ [Website] [Project]

      Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection \ [Website] [Project]

      Generative Image Layer Decomposition with Visual Effects \ [Website] [Project]

      Editable Image Elements for Controllable Synthesis \ [Website] [Project]

      SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing \ [Website] [Project]

      SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion \ [Website] [Project]

      ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation \ [Website] [Project]

      UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency \ [Website] [Project]

      GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models \ [Website] [Project]

      MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers \ [Website] [Project]

      FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing \ [Website] [Project]

      GeoDiffuser: Geometry-Based Image Editing with Diffusion Models \ [Website] [Project]

      SOEDiff: Efficient Distillation for Small Object Editing \ [Website] [Project]

      UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models \ [Website] [Project]

      Click2Mask: Local Editing with Dynamic Mask Generation \ [Website] [Project]

      Stable Flow: Vital Layers for Training-Free Image Editing \ [Website] [Project]

      FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing \ [Website] [Project]

      FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model \ [Website] [Project]

      CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models \ [Website] [Project]

      DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing \ [Website] [Project]

      SeedEdit 3.0: Fast and High-Quality Generative Image Editing \ [Website] [Project]

      VINCIE: Unlocking In-context Image Editing from Video \ [Website] [Project]

      Flux-Sculptor: Text-Driven Rich-Attribute Portrait Editing through Decomposed Spatial Flow Control \ [Website] [Project]

      Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing \ [Website] [Project]

      Moodifier: MLLM-Enhanced Emotion-Driven Image Editing \ [Website] [Project]

      NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining \ [Website] [Project]

      NEP: Autoregressive Image Editing via Next Editing Token Prediction \ [Website] [Project]

      Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing \ [Website] [Project]

      Learning an Image Editing Model without Image Editing Pairs \ [Website] [Project]

      FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing \ [Website] [Project]

      MIRA: Multimodal Iterative Reasoning Agent for Image Editing \ [Website] [Project]

      FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing \ [Website] [Project]

      Alterbute: Editing Intrinsic Attributes of Objects in Images \ [Website] [Project]

      Controlling Your Image via Simplified Vector Graphics \ [Website] [Project]

      From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging \ [NeurIPS 2025]

      InstantEdit: Text-Guided Few-Step Image Editing with Piecewise Rectified Flow \ [ICCV 2025]

      h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform \ [CVPR 2025]

      SceneCrafter: Controllable Multi-View Driving Scene Editing \ [CVPR 2025]

      InstantPortrait: One-Step Portrait Editing via Diffusion Multi-Objective Distillation \ [ICLR 2025]

      Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent \ [ICCV 2025]

      AutoEdit: Automatic Hyperparameter Tuning for Image Editing \ [NeurIPS 2025]

      Iterative Multi-granular Image Editing using Diffusion Models \ [WACV 2024]

      Text-to-image Editing by Image Information Removal \ [WACV 2024]

      TexSliders: Diffusion-Based Texture Editing in CLIP Space \ [SIGGRAPH 2024]

      Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models \ [CVPR 2023 AI4CC Workshop]

      Locally Controlled Face Aging with Latent Diffusion Models \ [Website]

      TimeMachine: Fine-Grained Facial Age Editing with Identity Preservation \ [Website]

      Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping \ [Website]

      Learning Feature-Preserving Portrait Editing from Generated Pairs \ [Website]

      EmoEdit: Evoking Emotions through Image Manipulation \ [Website]

      DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images \ [Website]

      LayerDiffusion: Layered Controlled Image Editing with Diffusion Models \ [Website]

      iEdit: Localised Text-guided Image Editing with Weak Supervision \ [Website]

      User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques \ [Website]

      PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing \ [Website]

      PRedItOR: Text Guided Image Editing with Diffusion Prior \ [Website]

      FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing \ [Website]

      The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing \ [Website]

      Image Translation as Diffusion Visual Programmers \ [Website]

      Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing \ [Website]

      LoMOE: Localized Multi-Object Editing via Multi-Diffusion \ [Website]

      Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing \ [Website]

      DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation \ [Website]

      InstructGIE: Towards Generalizable Image Editing \ [Website]

      LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing \ [Website]

      Uncovering the Text Embedding in Text-to-Image Diffusion Models \ [Website]

      Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer \ [Website]

      Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion \ [Website]

      Text Guided Image Editing with Automatic Concept Locating and Forgetting \ [Website]

      The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP \ [Website]

      LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing \ [Website]

      Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing \ [Website]

      Achieving Complex Image Edits via Function Aggregation with Diffusion Models \ [Website]

      Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing \ [Website]

      InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models \ [Website]

      PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM \ [Website]

      Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing \ [Website]

      Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing \ [Website]

      ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing \ [Website]

      ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models \ [Website]

      ColorEdit: Training-free Image-Guided Color editing with diffusion model \ [Website]

      GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter \ [Website]

      Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing \ [Website]

      Pathways on the Image Manifold: Image Editing via Video Generation \ [Website]

      LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair \ [Website]

      Action-based image editing guided by human instructions \ [Website]

      Addressing Attribute Leakages in Diffusion-based Image Editing without Training \ [Website]

      Prompt Augmentation for Self-supervised Text-guided Image Manipulation \ [Website]

      PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation \ [Website]

      Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance \ [Website]

      PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data \ [Website]

      DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics \ [Website]

      Guidance Free Image Editing via Explicit Conditioning \ [Website]

      Training-Free Text-Guided Image Editing with Visual Autoregressive Model \ [Website]

      Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing \ [Website]

      SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow \ [Website]

      Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing \ [Website]

      Towards Generalized and Training-Free Text-Guided Semantic Manipulation \ [Website]

      InstructAttribute: Fine-grained Object Attributes editing with Instruction \ [Website]

      MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models \ [Website]

      GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing \ [Website]

      MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection \ [Website]

      Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions \ [Website]

      Affective Image Editing: Shaping Emotional Factors via Text Descriptions \ [Website]

      FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing \ [Website]

      Cora: Correspondence-aware image editing using few step diffusion \ [Website]

      FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation \ [Website]

      CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing \ [Website]

      Improving Diffusion-Based Image Editing Faithfulness via Guidance and Scheduling \ [Website]

      S2Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control \ [Website]

      MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing \ [Website]

      LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing \ [Website]

      UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying \ [Website]

      DreamVE: Unified Instruction-based Image and Video Editing \ [Website]

      Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing \ [Website]

      TweezeEdit: Consistent and Efficient Image Editing with Path Regularization \ [Website]

      LatentEdit: Adaptive Latent Control for Consistent Semantic Editing \ [Website]

      CAMILA: Context-Aware Masking for Image Editing with Language Alignment \ [Website]

      EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning \ [Website]

      Semantic Editing with Coupled Stochastic Differential Equations \ [Website]

      FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing \ [Website]

      Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling \ [Website]

      Video4Edit: Viewing Image Editing as a Degenerate Temporal Process \ [Website]

      NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing \ [Website]

      Are Image-to-Video Models Good Zero-Shot Image Editors \ [Website]

      CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing \ [Website]

      LAMS-Edit: Latent and Attention Mixing with Schedulers for Improved Content Preservation in Diffusion-Based Image and Style Editing \ [Website]

      RemEdit: Efficient Diffusion Editing with Riemannian Geometry \ [Website]

      Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss \ [Website]

      Shifting the Breaking Point of Flow Matching for Multi-Instance Editing \ [Website]

      Diffusion Models Inversion

      Null-text Inversion for Editing Real Images using Guided Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]

      Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code \ [ICLR 2024] [Website] [Project] [Code]

      Inversion-Based Creativity Transfer with Diffusion Models \ [CVPR 2023] [Website] [Code]

      EDICT: Exact Diffusion Inversion via Coupled Transformations \ [CVPR 2023] [Website] [Code]

      Improving Negative-Prompt Inversion via Proximal Guidance \ [Website] [Code]

      An Edit Friendly DDPM Noise Space: Inversion and Manipulations \ [CVPR 2024] [Project] [Code] [Demo]

      Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing \ [NeurIPS 2023] [Website] [Code]

      Inversion-Free Image Editing with Natural Language \ [CVPR 2024] [Project] [Code]

      LEDITS++: Limitless Image Editing using Text-to-Image Models \ [CVPR 2024] [Project] [Code]

      Noise Map Guidance: Inversion with Spatial Context for Real Image Editing \ [ICLR 2024] [Website] [Code]

      ReNoise: Real Image Inversion Through Iterative Noising \ [ECCV 2024] [Project] [Code]

      IterInv: Iterative Inversion for Pixel-Level T2I Models \ [NeurIPS-W 2023] [Openreview] [NeuripsW] [Website] [Code]

      DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models \ [Website] [Project] [Code]

      Object-aware Inversion and Reassembly for Image Editing \ [Website] [Project] [Code]

      Don't Forget your Inverse DDIM for Image Editing \ [Website] [Project] [Code]

      Taming Rectified Flow for Inversion and Editing \ [Website] [Project] [Code]

      POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models \ [Website] [Project] [Code]

      A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance \ [ICCV 2023] [Code]

      Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models \ [ECCV 2024] [Code]

      EditInfinity: Image Editing with Binary-Quantized Generative Models \ [NeurIPS 2025] [Code]

      LocInv: Localization-aware Inversion for Text-Guided Image Editing \ [CVPR 2024 AI4CC Workshop] [Code]

      Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling \ [IJCAI 2024] [Code]

      StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing \ [CVMJ] [Code]

      Generating Non-Stationary Textures using Self-Rectification \ [Website] [Code]

      Exact Diffusion Inversion via Bi-directional Integration Approximation \ [Website] [Code]

      IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models \ [Website] [Code]

      Fixed-point Inversion for Text-to-image diffusion models \ [Website] [Code]

      Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing \ [Website] [Code]

      Transport-Guided Rectified Flow Inversion: Improved Image Editing Using Optimal Transport Theory \ [Website] [Code]

      Runge-Kutta Approximation and Decoupled Attention for Rectified Flow Inversion and Semantic Editing \ [Website] [Code]

      FlowCycle: Pursuing Cycle-Consistent Flows for Text-based Editing \ [Website] [Code]

      FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing \ [Website] [Code]

      Reversible Inversion for Training-Free Exemplar-guided Image Editing \ [Website] [Code]

      DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion \ [Website] [Code]

      Tight Inversion: Image-Conditioned Inversion for Real Image Editing \ [Website] [Project]

      The Devil is in Attention Sharing: Improving Complex Non-rigid Image Editing Faithfulness via Attention Synergy \ [Website] [Project]

      Effective Real Image Editing with Accelerated Iterative Diffusion Inversion \ [ICCV 2023 Oral] [Website]

      BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models \ [NeurIPS 2024]

      Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing \ [NeurIPS 2024]

      Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models \ [ICLR 2025]

      Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation \ [ICML 2025]

      BARET: Balanced Attention based Real image Editing driven by Target-text Inversion \ [WACV 2024]

      Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing \ [ICASSP 2024]

      Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing \ [Website]

      Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations \ [Website]

      Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models \ [Website]

      Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models \ [Website]

      SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing \ [Website]

      Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models \ [Website]

      KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing \ [Website]

      Tuning-Free Inversion-Enhanced Control for Consistent Image Editing \ [Website]

      LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance \ [Website]

      Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing \ [Website]

      Exploring Optimal Latent Trajectory for Zero-shot Image Editing \ [Website]

      Identity-preserving Distillation Sampling by Fixed-Point Iterator \ [Website]

      LUSD: Localized Update Score Distillation for Text-Guided Image Editing \ [Website]

      Adams Bashforth Moulton Solver for Inversion and Editing in Rectified Flow \ [Website]

      DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing \ [Website]

      FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing \ [Website]

      Training-Free Reward-Guided Image Editing via Trajectory Optimal Control \ [Website]

      DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models \ [Website]

      SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing \ [Website]

      FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing \ [Website]

      On Exact Editing of Flow-Based Diffusion Models \ [Website]

      FlowBypass: Rectified Flow Trajectory Bypass for Training-Free Image Editing \ [Website]

      SSI-DM: Singularity Skipping Inversion of Diffusion Models \ [Website]

      Continual Learning

      RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models \ [CVPR 2023] [Website] [Project] [Code]

      Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective \ [Website] [Project] [Code]

      Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning \ [ECCV 2024 Oral] [Code]

      How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? \ [NeurIPS 2024] [Code]

      ReplayCAD: Generative Diffusion Replay for Continual Anomaly Detection \ [IJCAI 2025] [Code]

      CLoG: Benchmarking Continual Learning of Image Generation Models \ [Website] [Code]

      Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models \ [Website] [Code]

      Continual Learning of Diffusion Models with Generative Distillation \ [Website] [Code]

      Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning \ [Website] [Code]

      Bring Your Dreams to Life: Continual Text-to-Video Customization \ [Website] [Code]

      Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA \ [TMLR] [Project]

      Assessing Open-world Forgetting in Generative Image Model Customization \ [Website] [Project]

      ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation \ [CVPR 2025]

      Class-Incremental Learning using Diffusion Model for Distillation and Replay \ [ICCV 2023 VCL workshop best paper]

      Create Your World: Lifelong Text-to-Image Diffusion \ [Website]

      Low-Rank Continual Personalization of Diffusion Models \ [Website]

      Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models \ [Website]

      Online Continual Learning of Video Diffusion Models From a Single Video Stream \ [Website]

      Exploring Continual Learning of Diffusion Models \ [Website]

      DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency \ [Website]

      DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation \ [Website]

      Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters \ [Website]

      Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning \ [Website]

      MuseumMaker: Continual Style Customization without Catastrophic Forgetting \ [Website]

      Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion \ [Website]

      Diffusion Model Meets Non-Exemplar Class-Incremental Learning and Beyond \ [Website]

      Diffusion Meets Few-shot Class Incremental Learning \ [Website]

      How can Diffusion Models Evolve into Continual Generators? \ [Website]

      Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental Learning \ [Website]

      VidCLearn: A Continual Learning Approach for Text-to-Video Generation \ [Website]

      Continual Personalization for Diffusion Models \ [Website]

      Breaking Forgetting: Training-Free Few-Shot Class-Incremental Learning via Conditional Diffusion \ [Website]

      Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers \ [Website]

      Remove Concept

      Ablating Concepts in Text-to-Image Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]

      Erasing Concepts from Diffusion Models \ [ICCV 2023] [Website] [Project] [Code]

      When Are Concepts Erased From Diffusion Models? \ [Website] [Project] [Code]

      One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications \ [Website] [Project] [Code]

      Editing Massive Concepts in Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      Memories of Forgotten Concepts \ [Website] [Project] [Code]

      STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models \ [Website] [Project] [Code]

      ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling \ [Website] [Project] [Code]

      ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer \ [Website] [Project] [Code]

      ACE: Anti-Editing Concept Erasure in Text-to-Image Models \ [Website] [Code]

      ACE: Concept Editing in Diffusion Models without Performance Degradation \ [Website] [Code]

      Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models \ [CVPR 2023] [Code1] [Code2]

      Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models \ [CVPR 2025] [Code]

      Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection \ [ICLR 2026] [Code]

      Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate \ [ICLR 2025] [Code]

      Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models \ [ICML 2023 workshop] [Code]

      One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework \ [ICCV 2025] [Code]

      Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models \ [ECCV 2024] [Code]

      Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion \ [ECCV 2024] [Code]

      The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models \ [CVPR 2025] [Code]

      Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation \ [NeurIPS 2024] [Code]

      Unveiling Concept Attribution in Diffusion Models \ [Website] [Code]

      TraSCE: Trajectory Steering for Concept Erasure \ [Website] [Code]

      Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them \ [Website] [Code]

      On the Vulnerability of Concept Erasure in Diffusion Models \ [Website] [Code]

      TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models \ [Website] [Code]

      T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models \ [Website] [Code]

      SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models \ [Website] [Code]

      CASteer: Steering Diffusion Models for Controllable Generation \ [Website] [Code]

      Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations \ [Website] [Code]

      Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts \ [Website] [Code]

      ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion \ [Website] [Code]

      Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models \ [Website] [Code]

      Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models \ [Website] [Code]

      ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning \ [Website] [Code]

      Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models \ [Website] [Code]

      DuMo: Dual Encoder Modulation Network for Precise Concept Erasure \ [Website] [Code]

      Add-SD: Rational Generation without Manual Reference \ [Website] [Code]

      Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation \ [Website] [Code]

      MUNBa: Machine Unlearning via Nash Bargaining \ [Website] [Code]

      Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts \ [Website] [Code]

      SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing \ [Website] [Code]

      Personalized Safety Alignment for Text-to-Image Diffusion Models \ [Website] [Code]

      NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation \ [Website] [Code]

      Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models \ [Website] [Code]

      Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models \ [Website] [Code]

      Rethinking Robust Adversarial Concept Erasure in Diffusion Models \ [Website] [Code]

      PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation \ [Website] [Code]

      Mass Concept Erasure in Diffusion Models with Concept Hierarchy \ [Website] [Code]

      LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models \ [Website] [Code]

      ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models \ [Website] [Code]

      When Safety Collides: Resolving Multi-Category Harmful Conflicts in Text-to-Image Diffusion via Adaptive Safety Guidance \ [Website] [Code]

      Prototype-Guided Concept Erasure in Diffusion Models \ [Website] [Code]

      Implicit Concept Removal of Diffusion Models \ [ECCV 2024] [Project]

      RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining \ [Website] [Project]

      MACE: Mass Concept Erasure in Diffusion Models \ [CVPR 2024]

      Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation \ [CVPR 2025]

      Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models \ [CVPR 2025]

      Erasing Concept Combination from Text-to-Image Diffusion Model \ [ICLR 2025]

      An h-space Based Adversarial Attack for Protection Against Few-shot Personalization \ [ACM MM 2025]

      EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers \ [Website]

      Continuous Concepts Removal in Text-to-image Diffusion Models \ [Website]

      Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models \ [Website]

      Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models \ [Website]

      Direct Unlearning Optimization for Robust and Safe Text-to-Image Models \ [Website]

      Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Model \ [Website]

      Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning \ [Website]

      Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models \ [Website]

      Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers \ [Website]

      All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models \ [Website]

      EraseDiff: Erasing Data Influence in Diffusion Models \ [Website]

      UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models \ [Website]

      Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts \ [Website]

      R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model \ [Website]

      Pruning for Robust Concept Erasing in Diffusion Models \ [Website]

      Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models \ [Website]

      Unlearning Concepts from Text-to-Video Diffusion Models \ [Website]

      EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts \ [Website]

      Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning \ [Website]

      Understanding the Impact of Negative Prompts: When and How Do They Take Effect? \ [Website]

      Model Integrity when Unlearning with T2I Diffusion Models \ [Website]

      Learning to Forget using Hypernetworks \ [Website]

      Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters \ [Website]

      AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors \ [Website]

      EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques \ [Website]

      CE-SDWV: Effective and Efficient Concept Erasure for Text-to-Image Diffusion Models via a Semantic-Driven Word Vocabulary \ [Website]

      SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders \ [Website]

      Erasing with Precision: Evaluating Specific Concept Erasure from Text-to-Image Generative Models \ [Website]

      Concept Corrector: Erase concepts on the fly for text-to-image diffusion models \ [Website]

      SafeText: Safe Text-to-image Models via Aligning the Text Encoder \ [Website]

      Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models \ [Website]

      CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models \ [Website]

      Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion \ [Website]

      Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization \ [Website]

      Safe and Reliable Diffusion Models via Subspace Projection \ [Website]

      Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization \ [Website]

      Towards NSFW-Free Text-to-Image Generation via Safety-Constraint Direct Preference Optimization \ [Website]

      CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models \ [Website]

      Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions \ [Website]

      Erased or Dormant? Rethinking Concept Erasure Through Reversibility \ [Website]

      Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression \ [Website]

      Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning \ [Website]

      Localizing Knowledge in Diffusion Transformers \ [Website]

      TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models \ [Website]

      NSFW-Classifier Guided Prompt Sanitization for Safe Text-to-Image Generation \ [Website]

      Concept Unlearning by Modeling Key Steps of Diffusion Process \ [Website]

      FADE: Adversarial Concept Erasure in Flow Models \ [Website]

      Towards Resilient Safety-driven Unlearning for Diffusion Models against Downstream Fine-tuning \ [Website]

      PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation \ [Website]

      Zero-Residual Concept Erasure via Progressive Alignment in Text-to-Image Model \ [Website]

      UnGuide: Learning to Forget with LoRA-Guided Diffusion Models \ [Website]

      SafeCtrl: Region-Based Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress \ [Website]

      VideoEraser: Concept Erasure in Text-to-Video Diffusion Models \ [Website]

      Side Effects of Erasing Concepts from Diffusion Models \ [Website]

      SuMa: A Subspace Mapping Approach for Robust and Effective Concept Erasure in Text-to-Image Diffusion Models \ [Website]

      VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation \ [Website]

      A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models \ [Website]

      Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models \ [Website]

      SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders \ [Website]

      DyME: Dynamic Multi-Concept Erasure in Diffusion Models with Bi-Level Orthogonal LoRA Adaptation \ [Website]

      Erased, But Not Forgotten: Erased Rectified Flow Transformers Still Remain Unsafe Under Concept Attack \ [Website]

      Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations \ [Website]

      Beyond Fixed Anchors: Precisely Erasing Concepts with Sibling Exclusive Counterparts \ [Website]

      GrOCE: Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models \ [Website]

      Coffee: Controllable Diffusion Fine-tuning \ [Website]

      Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation \ [Website]

      Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models \ [Website]

      EMMA: Concept Erasure Benchmark with Comprehensive Semantic Metrics and Diverse Categories \ [Website]

      M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models \ [Website]

      ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching \ [Website]

      Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models \ [Website]

      Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking \ [Website]

      Differential Vector Erasure: Unified Training-Free Concept Erasure for Flow Matching Models \ [Website]

      The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization \ [Website]

      Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation \ [Website]

      Selective Fine-Tuning for Targeted and Robust Concept Unlearning \ [Website]

      In Context Learning

      ReVersion: Diffusion-Based Relation Inversion from Images \ [SIGGRAPH Asia 2024] [Project] [Code]

      BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing \ [NeurIPS 2023] [Website] [Project] [Code]

      Photoswap: Personalized Subject Swapping in Images \ [NeurIPS 2023] [Website] [Project] [Code]

      ITI-GEN: Inclusive Text-to-Image Generation \ [ICCV 2023 Oral] [Website] [Project] [Code]

      Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models \ [ICCV 2023] [Website] [Project] [Code]

      Total Selfie: Generating Full-Body Selfies \ [CVPR 2024 Highlight] [Project] [Code]

      Style Aligned Image Generation via Shared Attention \ [CVPR 2024] [Project] [Code]

      Diffusion Self-Distillation for Zero-Shot Customized Image Generation \ [CVPR 2025] [Project] [Code]

      Material Palette: Extraction of Materials from a Single Image \ [CVPR 2024] [Project] [Code]

      Learning Continuous 3D Words for Text-to-Image Generation \ [CVPR 2024] [Project] [Code]

      DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models \ [ICLR 2025] [Project] [Code]

      ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models \ [AAAI 2024] [Project] [Code]

      InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser \ [ECCV 2024] [Project] [Code]

      The Hidden Language of Diffusion Models \ [ICLR 2024] [Project] [Code]

      ZeST: Zero-Shot Material Transfer from a Single Image \ [ECCV 2024] [Project] [Code]

      RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization \ [CVPR 2024] [Project] [Code]

      Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models \ [ECCV 2024] [Project] [Code]

      Kosmos-G: Generating Images in Context with Multimodal Large Language Models \ [ICLR 2024] [Project] [Code]

      StyleBlend: Enhancing Style-Specific Content Creation in Text-to-Image Diffusion Models \ [Eurographics 2025] [Project] [Code]

      Personalize Anything for Free with Diffusion Transformer \ [Website] [Project] [Code]

      RealCustom++: Representing Images as Real-Word for Real-Time Customization \ [Website] [Project] [Code]

      Generating Multi-Image Synthetic Data for Text-to-Image Customization \ [Website] [Project] [Code]

      DreamO: A Unified Framework for Image Customization \ [Website] [Project] [Code]

      EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM \ [Website] [Project] [Code]

      Conceptrol: Concept Control of Zero-shot Personalized Image Generation \ [Website] [Project] [Code]

      Customizing Text-to-Image Diffusion with Camera Viewpoint Control \ [Website] [Project] [Code]

      K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs \ [Website] [Project] [Code]

      StyleDrop: Text-to-Image Generation in Any Style \ [Website] [Project] [Code]

      Personalized Representation from Personalized Generation \ [Website] [Project] [Code]

      Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion \ [Website] [Project] [Code]

      CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning \ [Website] [Project] [Code]

      When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation \ [Website] [Project] [Code]

      ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs \ [Website] [Project] [Code]

      CSGO: Content-Style Composition in Text-to-Image Generation \ [Website] [Project] [Code]

      InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework \ [Website] [Project] [Code]

      Visual Style Prompting with Swapping Self-Attention \ [Website] [Project] [Code]

      MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation \ [Website] [Project] [Code]

      MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models \ [Website] [Project] [Code]

      DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models \ [NeurIPS 2024] [Code]

      DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation \ [ICLR 2024] [Code]

      Customized Generation Reimagined: Fidelity and Editability Harmonized \ [ECCV 2024] [Code]

      ProSpect: Expanded Conditioning for the Personalization of Attribute-aware Image Generation \ [SIGGRAPH Asia 2023] [Code]

      Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier \ [WACV 2025] [Code]

      Memory-Efficient Personalization using Quantized Diffusion Model \ [ECCV 2024] [Code]

      DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning \ [NeurIPS 2024] [Code]

      Concept-centric Personalization with Large-scale Diffusion Priors \ [CVPR 2025] [Code]

      PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation \ [Website] [Code]

      FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation \ [Website] [Code]

      BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models \ [Website] [Code]

      AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image \ [Website] [Code]

      Cross-domain Compositing with Pretrained Diffusion Models \ [Website] [Code]

      Customization Assistant for Text-to-image Generation \ [Website] [Code]

      FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers \ [Website] [Code]

      TARA: Token-Aware LoRA for Composable Personalization in Diffusion Models \ [Website] [Code]

      TIDE: Achieving Balanced Subject-Driven Image Generation via Target-Instructed Diffusion Enhancement \ [Website] [Code]

      MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image Personalization \ [Website] [Code]

      Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization \ [Website] [Code]

      OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation \ [Website] [Code]

      InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation \ [Website] [Code]

      Reverse Personalization \ [Website] [Code]

      PureCC: Pure Learning for Text-to-Image Concept Customization \ [Website] [Code]

      AssetDropper: Asset Extraction via Diffusion Models with Reward-Driven Optimization \ [SIGGRAPH 2025] [Project]

      HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation \ [ECCV 2024] [Project]

      Key-Locked Rank One Editing for Text-to-Image Personalization \ [SIGGRAPH 2023] [Project]

      Diffusion in Style \ [ICCV 2023] [Project]

      TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion \ [CVPR 2024] [Project]

      Personalized Residuals for Concept-Driven Text-to-Image Generation \ [CVPR 2024] [Project]

      LogoSticker: Inserting Logos into Diffusion Models for Customized Generation \ [ECCV 2024] [Project]

      SerialGen: Personalized Image Generation by First Standardization Then Personalization \ [CVPR 2025] [Project]

      Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation \ [Website] [Project]

      RelationBooth: Towards Relation-Aware Customized Object Generation \ [Website] [Project]

      InstructBooth: Instruction-following Personalized Text-to-Image Generation \ [Website] [Project]

      MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation \ [Website] [Project]

      ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation \ [Website] [Project]

      TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space \ [Website] [Project]

      PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization \ [Website] [Project]

      Subject-driven Text-to-Image Generation via Apprenticeship Learning \ [Website] [Project]

      IC-Custom: Diverse Image Customization via In-Context Learning \ [Website] [Project]

      Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation \ [Website] [Project]

      Nested Attention: Semantic-aware Attention Values for Concept Personalization \ [Website] [Project]

      HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models \ [Website] [Project]

      Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model \ [Website] [Project]

      Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models \ [Website] [Project]

      CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models \ [Website] [Project]

      Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA \ [Website] [Project]

      PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models \ [Website] [Project]

      InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning \ [Website] [Project]

      DreamTuner: Single Image is Enough for Subject-Driven Generation \ [Website] [Project]

      PALP: Prompt Aligned Personalization of Text-to-Image Models \ [Website] [Project]

      Per-Query Visual Concept Learning \ [Website] [Project]

      Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing \ [Website] [Project]

      Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models \ [NeurIPS 2024]

      ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization \ [CVPR 2026]

      ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image \ [ECCV 2024]

      Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models \ [CVPR 2024]

      JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation \ [CVPR 2024]

      DynASyn: Multi-Subject Personalization Enabling Dynamic Action Synthesis \ [AAAI 2025]

      DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models \ [AAAI 2024]

      LumiCtrl: Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models \ [Website]

      FreeTuner: Any Subject in Any Style with Training-free Diffusion \ [Website]

      Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework \ [Website]

      Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models \ [Website]

      Gradient-Free Textual Inversion \ [Website]

      Identity Encoder for Personalized Diffusion \ [Website]

      Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation \ [Website]

      ELODIN: Naming Concepts in Embedding Spaces \ [Website]

      Generate Anything Anywhere in Any Scene \ [Website]

      Face0: Instantaneously Conditioning a Text-to-Image Model on a Face \ [Website]

      MagiCapture: High-Resolution Multi-Concept Portrait Customization \ [Website]

      A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization \ [Website]

      DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics \ [Website]

      ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization \ [Website]

      An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis \ [Website]

      LLM-Enabled Style and Content Regularization for Personalized Text-to-Image Generation \ [Website]

      Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization \ [Website]

      Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding \ [Website]

      RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation \ [Website]

      SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation \ [Website]

      Visual Concept-driven Image Generation with Text-to-Image Diffusion Model \ [Website]

      Flux Already Knows - Activating Subject-Driven Image Generation without Training \ [Website]

      IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models \ [Website]

      MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration \ [Website]

      DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation \ [Website]

      StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models \ [Website]

      Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks \ [Website]

      Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter \ [Website]

      PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction \ [Website]

      AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models \ [Website]

      Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation \ [Website]

      PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control \ [Website]

      MagicID: Flexible ID Fidelity Generation System \ [Website]

      CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization \ [Website]

      ArtiFade: Learning to Generate High-quality Subject from Blemished Images \ [Website]

      CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization \ [Website]

      Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis \ [Website]

      Event-Customized Image Generation \ [Website]

      Learning to Customize Text-to-Image Diffusion in Diverse Context \ [Website]

      HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects \ [Website]

      Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator \ [Website]

      Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency \ [Website]

      Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects \ [Website]

      DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models \ [Website]

      RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation \ [Website]

      P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision \ [Website]

      Efficient Personalization of Quantized Diffusion Model without Backpropagation \ [Website]

      ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models \ [Website]

      Multi-party Collaborative Attention Control for Image Customization \ [Website]

      PIDiff: Image Customization for Personalized Identities with Diffusion Models \ [Website]

      BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation \ [Website]

      IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation \ [Website]

      Regularized Personalization of Text-to-Image Diffusion Models without Distributional Drift \ [Website]

      In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation \ [Website]

      Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects \ [Website]

      DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization \ [Website]

      FastFace: Tuning Identity Preservation in Distilled Diffusion via Guidance and Attention \ [Website]

      AlignGen: Boosting Personalized Image Generation with Cross-Modality Prior Alignment \ [Website]

      Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion \ [Website]

      Noise Consistency Regularization for Improved Subject-Driven Image Synthesis \ [Website]

      AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant T2I Adversarial Patches \ [Website]

      Steering Guidance for Personalized Text-to-Image Diffusion Models \ [Website]

      Subject or Style: Adaptive and Training-Free Mixture of LoRAs \ [Website]

      Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion \ [Website]

      Stencil: Subject-Driven Generation with Context Guidance \ [Website]

      CusEnhancer: A Zero-Shot Scene and Controllability Enhancement Method for Photo Customization via ResInversion \ [Website]

      EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model \ [Website]

      From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation \ [Website]

      Multi-View Consistent Human Image Customization via In-Context Learning \ [Website]

      Finetuning-Free Personalization of Text to Image Generation via Hypernetworks \ [Website]

      HiCoGen: Hierarchical Compositional Text-to-Image Generation in Diffusion Models via Reinforcement Learning \ [Website]

      PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation \ [Website]

      DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation \ [Website]

      Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits \ [Website]

      VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers \ [Website]

      Multiple Concepts

      Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models \ [NeurIPS 2023] [Website] [Project] [Code]

      LatexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending \ [CVPR 2025 Highlight] [Project] [Code]

      Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models \ [CVPR 2024] [Project] [Code]

      FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition \ [CVPR 2024] [Project] [Code]

      OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models \ [ECCV 2024] [Project] [Code]

      MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance \ [Website] [Project] [Code]

      Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter \ [Website] [Project] [Code]

      λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space \ [Website] [Project] [Code]

      Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition \ [Website] [Project] [Code]

      Non-confusing Generation of Customized Concepts in Diffusion Models \ [Website] [Project] [Code]

      XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation \ [Website] [Project] [Code]

      MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement \ [Website] [Project] [Code]

      Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation \ [AAAI 2025] [Code]

      TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation \ [ICLR 2025] [Code]

      ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement \ [ICCV 2025] [Code]

      Cached Multi-Lora Composition for Multi-Concept Image Generation \ [Website] [Code]

      Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis \ [Website] [Code]

      LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models \ [Website] [Code]

      MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing \ [Website] [Code]

      LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer \ [Website] [Code]

      PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards \ [Website] [Code]

      LoRACLR: Contrastive Adaptation for Customization of Diffusion Models \ [CVPR 2025] [Project]

      Orthogonal Adaptation for Modular Customization of Diffusion Models \ [Website] [Project]

      LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers \ [Website] [Project]

      FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus \ [Website] [Project]

      FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time \ [Website] [Project]

      FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion \ [Website]

      Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation \ [Website]

      FaR: Enhancing Multi-Concept Text-to-Image Diffusion via Concept Fusion and Localized Refinement \ [Website]

      ShowFlow: From Robust Single Concept to Condition-Free Multi-Concept Generation \ [Website]

      Ar2Can: An Architect and an Artist Leveraging a Canvas for Multi-Human Generation \ [Website]

      AnyMS: Bottom-up Attention Decoupling for Layout-guided and Training-free Multi-subject Customization \ [Website]

      PLACID: Identity-Preserving Multi-Object Compositing via Video Diffusion with Synthetic Trajectories \ [Website]

      Hierarchical Concept-to-Appearance Guidance for Multi-Subject Image Generation \ [Website]

      Decomposition

      Break-A-Scene: Extracting Multiple Concepts from a Single Image \ [SIGGRAPH Asia 2023] [Project] [Code]

      Concept Decomposition for Visual Exploration and Inspiration \ [SIGGRAPH Asia 2023] [Project] [Code]

      ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction \ [ECCV 2024] [Project] [Code]

      Customizing Text-to-Image Models with a Single Image Pair \ [SIGGRAPH Asia 2024] [Project] [Code]

      Decoupled Textual Embeddings for Customized Image Generation \ [AAAI 2024] [Code]

      AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization \ [Website] [Code]

      CusConcept: Customized Visual Concept Decomposition with Diffusion Models \ [Website] [Code]

      Language-Informed Visual Concept Learning \ [ICLR 2024] [Project]

      QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation \ [ICCV 2025]

      Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models \ [Website]

      PartComposer: Learning and Composing Part-Level Concepts from Single-Image Examples \ [Website]

      ID Encoder

      Inserting Anybody in Diffusion Models via Celeb Basis \ [NeurIPS 2023] [Website] [Project] [Code]

      Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models \ [SIGGRAPH 2023] [Project] [Code]

      Face2Diffusion for Fast and Editable Face Personalization \ [CVPR 2024] [Project] [Code]

      CapHuman: Capture Your Moments in Parallel Universes \ [CVPR 2024] [Project] [Code]

      MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation \ [ECCV 2024] [Project] [Code]

      FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention \ [IJCV 2024] [Project] [Code]

      MagicNaming: Consistent Identity Generation by Finding a "Name Space" in T2I Diffusion Models \ [AAAI 2025] [Project] [Code]

      PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding \ [CVPR 2024] [Project] [Code]

      Visual Persona: Foundation Model for Full-Body Human Customization \ [CVPR 2025] [Project] [Code]

      InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity \ [Website] [Project] [Code]

      Concat-ID: Towards Universal Identity-Preserving Video Synthesis \ [Website] [Project] [Code]

      UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization \ [Website] [Project] [Code]

      MagicFace: Training-free Universal-Style Human Image Customized Synthesis \ [Website] [Project] [Code]

      LCM-Lookahead for Encoder-based Text-to-Image Personalization \ [Website] [Project] [Code]

      ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving \ [Website] [Project] [Code]

      ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning \ [Website] [Project] [Code]

      CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models \ [Website] [Project] [Code]

      InstantID: Zero-shot Identity-Preserving Generation in Seconds \ [Website] [Project] [Code]

      StableIdentity: Inserting Anybody into Anywhere at First Sight \ [Website] [Project] [Code]

      Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction \ [Website] [Project] [Code]

      UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward \ [Website] [Project] [Code]

      VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis \ [Website] [Project] [Code]

      ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation \ [Website] [Project] [Code]

      WithAnyone: Towards Controllable and ID Consistent Image Generation \ [Website] [Project] [Code]

      Chimera: Compositional Image Generation using Part-based Concepting \ [Website] [Project] [Code]

      High-fidelity Person-centric Subject-to-Image Synthesis \ [CVPR 2024] [Code]

      RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance \ [NeurIPS 2024] [Code]

      PuLID: Pure and Lightning ID Customization via Contrastive Alignment \ [NeurIPS 2024] [Code]

      FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization \ [Website] [Code]

      ModelScope Text-to-Video Technical Report \ [Website] [Code]

      PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium \ [Website] [Code]

      Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach \ [Website] [Code]

      ID-Booth: Identity-consistent Face Generation with Diffusion Models \ [Website] [Code]

      Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention \ [Website] [Code]

      Face-MakeUpV2: Facial Consistency Learning for Controllable Text-to-Image Generation \ [Website] [Code]

      ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation \ [SIGGRAPH Asia 2025] [Project]

      Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm \ [Website] [Project]

      FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation \ [Website] [Project]

      MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning \ [Website] [Project]

      Preventing Shortcuts in Adapter Training via Providing the Shortcuts \ [Website] [Project]

      The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment \ [Website] [Project]

      DP-Adapter: Dual-Pathway Adapter for Boosting Fidelity and Text Consistency in Customizable Human Image Generation \ [Website]

      DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability \ [Website]

      EditID: Training-Free Editable ID Customization for Text-to-Image Generation \ [Website]

      Meta-LoRA: Meta-Learning LoRA Components for Domain-Aware ID Personalization \ [Website]

      Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis \ [Website]

      Generating Synthetic Data via Augmentations for Improved Facial Resemblance in DreamBooth and InstantID \ [Website]

      FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion \ [Website]

      IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait \ [Website]

      PositionIC: Unified Position and Identity Consistency for Image Customization \ [Website]

      EditIDv2: Editable ID Customization with Data-Lubricated ID Feature Integration for Text-to-Image Generation \ [Website]

      ReMix: Towards a Unified View of Consistent Character Generation and Editing \ [Website]

      A Training-Free Approach for Multi-ID Customization via Attention Adjustment and Spatial Control \ [Website]

      HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion \ [Website]

      Diff-PC: Identity-preserving and 3D-aware Controllable Diffusion for Zero-shot Portrait Customization \ [Website]

      FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization \ [Website]

      Inject Where It Matters: Training-Free Spatially-Adaptive Identity Preservation for Text-to-Image Personalization \ [Website]

      General Concept

      DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation \ [CVPR 2023 Honorable Mention] [Website] [Project] [Official Dataset] [Unofficial Code] [Diffusers Doc] [Diffusers Code]

      An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion \ [ICLR 2023 top-25%] [Website] [Diffusers Doc] [Diffusers Code] [Code]

      Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion \ [CVPR 2023] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Code]

      Cones: Concept Neurons in Diffusion Models for Customized Generation \ [ICML 2023 Oral] [Website] [Code]

      Controlling Text-to-Image Diffusion by Orthogonal Finetuning \ [NeurIPS 2023] [Website] [Project] [Code]

      ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation \ [ICCV 2023 Oral] [Website] [Code]

      A Neural Space-Time Representation for Text-to-Image Personalization \ [SIGGRAPH Asia 2023] [Project] [Code]

      Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation \ [NeurIPS 2023] [Website] [Code]

      DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization \ [CVPR 2024] [Project] [Code]

      Direct Consistency Optimization for Compositional Text-to-Image Personalization \ [NeurIPS 2024] [Project] [Code]

      SVDiff: Compact Parameter Space for Diffusion Fine-Tuning \ [ICCV 2023] [Project] [Code]

      ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance \ [ICLR 2025] [Project] [Code]

      DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation \ [ICLR 2025] [Project] [Code]

      Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning \ [SIGGRAPH 2024] [Project] [Code]

      CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization \ [TCSVT 2025] [Project] [Code]

      AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation \ [NeurIPS 2024] [Project] [Code]

      AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation \ [Website] [Project] [Code]

      Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization \ [Website] [Project] [Code]

      SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing \ [Website] [Project] [Code]

      DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model \ [Website] [Project] [Code]

      TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder \ [Website] [Project] [Code]

      EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance \ [Website] [Project] [Code]

      Directional Textual Inversion for Personalized Text-to-Image Generation \ [Website] [Project] [Code]

      Cones 2: Customizable Image Synthesis with Multiple Subjects \ [NeurIPS 2023] [Code]

      Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning \ [ECCV 2024] [Code]

      Multiresolution Textual Inversion \ [NeurIPS 2022 workshop] [Code]

      Compositional Inversion for Stable Diffusion Models \ [AAAI 2024] [Code]

      T-LoRA: Single Image Diffusion Model Customization Without Overfitting \ [Website] [Code]

      Cross Initialization for Personalized Text-to-Image Generation \ [Website] [Code]

      ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation \ [Website] [Code]

      A Closer Look at Parameter-Efficient Tuning in Diffusion Models \ [Website] [Code]

      Controllable Textual Inversion for Personalized Text-to-Image Generation \ [Website] [Code]

      Mitigating Semantic Collapse in Generative Personalization with a Surprisingly Simple Test-Time Embedding Adjustment \ [Website] [Code]

      CoAR: Concept Injection into Autoregressive Models for Personalized Text-to-Image Generation \ [Website] [Code]

      EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization \ [Website] [Project]

      $P+$: Extended Textual Conditioning in Text-to-Image Generation \ [Website] [Project]

      Beyond Fine-Tuning: A Systematic Study of Sampling Techniques in Personalized Image Generation \ [Website]

      Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias \ [Website]

      Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models \ [Website]

      AR-based

      Personalized Text-to-Image Generation with Auto-Regressive Models \ [Website] [Code]

      Proxy-Tuning: Tailoring Multimodal Autoregressive Models for Subject-Driven Image Generation \ [Website]

      Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation \ [Website]

      Video Customization

      CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities \ [AAAI 2025] [Project] [Code]

      PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation \ [Website] [Project] [Code]

      DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control \ [Website] [Project] [Code]

      MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching \ [Website] [Project] [Code]

      VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models \ [Website] [Project] [Code]

      Motion Inversion for Video Customization \ [Website] [Project] [Code]

      AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance \ [Website] [Project] [Code]

      Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers \ [Website] [Project] [Code]

      SkyReels-A2: Compose Anything in Video Diffusion Transformers \ [Website] [Project] [Code]

      Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion \ [Website] [Project] [Code]

      MotionDirector: Motion Customization of Text-to-Video Diffusion Models \ [Website] [Project] [Code]

      Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance \ [Website] [Project] [Code]

      VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models \ [Website] [Project] [Code]

      MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling \ [Website] [Project] [Code]

      Proteus-ID: ID-Consistent and Motion-Coherent Video Customization \ [Website] [Project] [Code]

      OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions \ [Website] [Project] [Code]

      First Frame Is the Place to Go for Video Content Customization \ [Website] [Project] [Code]

      V-Warper: Appearance-Consistent Video Diffusion Personalization via Value Warping \ [Website] [Project] [Code]

      LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation \ [ACM MM 2025] [Code]

      Magic-Me: Identity-Specific Video Customized Diffusion \ [Website] [Code]

      VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence \ [Website] [Project]

      CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training \ [Website] [Code]

      Multi-subject Open-set Personalization in Video Generation \ [CVPR 2025] [Project]

      Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts \ [CVPR 2025] [Project]

      VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models \ [CVPR 2025] [Project]

      APT: Adaptive Personalized Training for Diffusion Models with Limited Data \ [CVPR 2025] [Project]

      Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models \ [Website] [Project]

      SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner \ [Website] [Project]

      MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis \ [Website] [Project]

      ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning \ [Website] [Project]

      Dynamic Concepts Personalization from Single Videos \ [Website] [Project]

      JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation \ [Website] [Project]

      PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement \ [Website] [Project]

      Lynx: Towards High-Fidelity Personalized Video Generation \ [Website] [Project]

      BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration \ [Website] [Project]

      DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization \ [Website]

      CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers \ [Website]

      Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations \ [Website]

      MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention \ [Website]

      DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models \ [NeurIPS 2023] [Website] [Code]

      Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment \ [NeurIPS 2023] [Website] [Code]

      DemoFusion: Democratising High-Resolution Image Generation With No $$$ \ [CVPR 2024] [Project] [Code]

      Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation \ [CVPR 2024] [Project] [Code]

      Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis \ [CVPR 2025] [Project] [Code]

      Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning \ [CVPR 2025] [Project] [Code]

      Training Diffusion Models with Reinforcement Learning \ [ICLR 2024] [Project] [Code]

      ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning \ [ICCV 2025] [Project] [Code]

      Divide & Bind Your Attention for Improved Generative Semantic Nursing \ [BMVC 2023 Oral] [Project] [Code]

      MultiRef: Controllable Image Generation with Multiple Visual References \ [Website] [Project] [Code]

      IP-Composer: Semantic Composition of Visual Concepts \ [Website] [Project] [Code]

      Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing \ [Website] [Project] [Code]

      Region-Adaptive Sampling for Diffusion Transformers \ [Website] [Project] [Code]

      TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning in Text-to-Image Models \ [Website] [Project] [Code]

      OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction \ [Website] [Project] [Code]

      Margin-aware Preference Optimization for Aligning Diffusion Models without Reference \ [Website] [Project] [Code]

      Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step \ [Website] [Project] [Code]

      Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation \ [Website] [Project] [Code]

      Aligning Text to Image in Diffusion Models is Easier Than You Think \ [Website] [Project] [Code]

      Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      Less-to-More Generalization: Unlocking More Controllability by In-Context Generation \ [Website] [Project] [Code]

      MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts \ [Website] [Project] [Code]

      Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets \ [Website] [Project] [Code]

      Scaling Image and Video Generation via Test-Time Evolutionary Search \ [Website] [Project] [Code]

      CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching \ [Website] [Project] [Code]

      Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions \ [Website] [Project] [Code]

      MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO \ [Website] [Project] [Code]

      Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance \ [Website] [Project] [Code]

      Real-World Image Variation by Aligning Diffusion Inversion Chain \ [Website] [Project] [Code]

      FreeU: Free Lunch in Diffusion U-Net \ [Website] [Project] [Code]

      GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis \ [Website] [Project] [Code]

      ConceptLab: Creative Generation using Diffusion Prior Constraints \ [Website] [Project] [Code]

      Aligning Text-to-Image Diffusion Models with Reward Backpropagation \ [Website] [Project] [Code]

      Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models \ [Website] [Project] [Code]

      VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control \ [Website] [Project] [Code]

      Tiled Diffusion \ [Website] [Project] [Code]

      ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models \ [Website] [Project] [Code]

      One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls \ [Website] [Project] [Code]

      TokenCompose: Grounding Diffusion with Token-level Supervision \ [Website] [Project] [Code]

      DiffusionGPT: LLM-Driven Text-to-Image Generation System \ [Website] [Project] [Code]

      Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support \ [Website] [Project] [Code]

      ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations \ [Website] [Project] [Code]

      Be Decisive: Noise-Induced Layouts for Multi-Subject Generation \ [Website] [Project] [Code]

      Not All Thats Rare Is Lost: Causal Paths to Rare Concept Synthesis \ [Website] [Project] [Code]

      MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion \ [Website] [Project] [Code]

      ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models \ [Website] [Project] [Code]

      Stylus: Automatic Adapter Selection for Diffusion Models \ [Website] [Project] [Code]

      MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      Negative Token Merging: Image-based Adversarial Feature Guidance \ [Website] [Project] [Code]

      LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation \ [Website] [Project] [Code]

      ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment \ [Website] [Project] [Code]

      HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts \ [Website] [Project] [Code]

      Towards Transformer-Based Aligned Generation with Self-Coherence Guidance \ [Website] [Project] [Code]

      Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis \ [Website] [Project] [Code]

      TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation \ [Website] [Project] [Code]

      Image Generation from Contextually-Contradictory Prompts \ [Website] [Project] [Code]

      TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation \ [Website] [Project] [Code]

      Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences \ [Website] [Project] [Code]

      Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers \ [Website] [Project] [Code]

      CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems \ [Website] [Project] [Code]

      Detail++: Training-Free Detail Enhancer for Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models \ [Website] [Project] [Code]

      Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning \ [Website] [Project] [Code]

      PractiLight: Practical Light Control Using Foundational Diffusion Models \ [Website] [Project] [Code]

      CARINOX: Inference-time Scaling with Category-Aware Reward-based Initial Noise Optimization and Exploration \ [Website] [Project] [Code]

      Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel \ [Website] [Project] [Code]

      Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation \ [Website] [Project] [Code]

      Match-and-Fuse: Consistent Generation from Unstructured Image Sets \ [Website] [Project] [Code]

      NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation \ [Website] [Project] [Code]

      PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design \ [Website] [Project] [Code]

      Composing Concepts from Images and Videos via Concept-prompt Binding \ [Website] [Project] [Code]

      Direct Diffusion Score Preference Optimization via Stepwise Contrastive Policy-Pair Supervision \ [Website] [Project] [Code]

      CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation \ [Website] [Project] [Code]

      Iterative Refinement Improves Compositional Image Generation \ [Website] [Project] [Code]

      Minority-Focused Text-to-Image Generation via Prompt Optimization \ [CVPR 2025 Oral] [Code]

      InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment \ [CVPR 2025] [Code]

      Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models \ [ICLR 2024] [Code]

      SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models \ [ACM MM 2023 Oral] [Code]

      Enhancing Creative Generation on Stable Diffusion-based Models \ [CVPR 2025] [Code]

      Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis \ [NeurIPS 2024] [Code]

      DSPO: Direct Score Preference Optimization for Diffusion Model Alignment \ [ICLR 2025] [Code]

      Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models \ [ICLR 2025] [Code]

      Dynamic Prompt Optimizing for Text-to-Image Generation \ [CVPR 2024] [Code]

      Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models \ [CVPR 2024] [Code]

      Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance \ [CVPR 2024] [Code]

      InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization \ [CVPR 2024] [Code]

      Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards \ [CVPR 2025] [Code]

      Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models \ [ECCV 2024] [Code]

      On Discrete Prompt Optimization for Diffusion Models \ [ICML 2024] [Code]

      Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function \ [NeurIPS 2024] [Code]

      Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization \ [ACM MM 2024] [Code]

      DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models \ [NeurIPS 2023] [Code]

      Improving Compositional Generation with Diffusion Models Using Lift Scores \ [ICML 2025] [Code]

      Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas \ [ECCV 2024] [Code]

      T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation \ [ICCV 2025] [Code]

      Multimodal LLMs as Customized Reward Models for Text-to-Image Generation \ [ICCV 2025] [Code]

      RAISE: Requirement-Adaptive Evolutionary Refinement for Training-Free Text-to-Image Alignment \ [CVPR 2026] [Code]

      Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards \ [CVPR 2026] [Code]

      Alfie: Democratising RGBA Image Generation With No $$$ \ [ECCVW 2024] [Code]

      Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis \ [Website] [Code]

      Diffusion Model Alignment Using Direct Preference Optimization \ [Website] [Code]

      SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization \ [Website] [Code]

      SePPO: Semi-Policy Preference Optimization for Diffusion Alignment \ [Website] [Code]

      Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback \ [Website] [Code]

      Zigzag Diffusion Sampling: The Path to Success Is Zigzag \ [Website] [Code]

      Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models \ [Website] [Code]

      RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning \ [Website] [Code]

      Progressive Compositionality In Text-to-Image Generative Models \ [Website] [Code]

      Improving Long-Text Alignment for Text-to-Image Diffusion Models \ [Website] [Code]

      Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization \ [Website] [Code]

      RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images \ [Website] [Code]

      Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation \ [Website] [Code]

      Aggregation of Multi Diffusion Models for Enhancing Learned Representations \ [Website] [Code]

      AID: Attention Interpolation of Text-to-Image Diffusion \ [Website] [Code]

      Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance \ [Website] [Code]

      FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis \ [Website] [Code]

      ORES: Open-vocabulary Responsible Visual Synthesis \ [Website] [Code]

      Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation \ [Website] [Code]

      Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization \ [Website] [Code]

      Alignment without Over-optimization: Training-Free Solution for Diffusion Models \ [Website] [Code]

      Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness \ [Website] [Code]

      Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models \ [Website] [Code]

      IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation \ [Website] [Code]

      InstructG2I: Synthesizing Images from Multimodal Attributed Graphs \ [Website] [Code]

      Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening \ [Website] [Code]

      Detector Guidance for Multi-Object Text-to-Image Generation \ [Website] [Code]

      Designing a Better Asymmetric VQGAN for StableDiffusion \ [Website] [Code]

      T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT \ [Website] [Code]

      FABRIC: Personalizing Diffusion Models with Iterative Feedback \ [Website] [Code]

      Improving Physical Object State Representation in Text-to-Image Generative Systems \ [Website] [Code]

      IPGO: Indirect Prompt Gradient Optimization on Text-to-Image Generative Models with High Data Efficiency \ [Website] [Code]

      Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models \ [Website] [Code]

      Progressive Text-to-Image Diffusion with Soft Latent Direction \ [Website] [Code]

      Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy \ [Website] [Code]

      Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning \ [Website] [Code]

      Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models \ [Website] [Code]

      TraDiffusion: Trajectory-Based Training-Free Image Generation \ [Website] [Code]

      If at First You Don’t Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection \ [Website] [Code]

      CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models \ [Website] [Code]

      LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration \ [Website] [Code]

      LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts \ [Website] [Code]

      A General Framework for Inference-time Scaling and Steering of Diffusion Models \ [Website] [Code]

      Making Multimodal Generation Easier: When Diffusion Models Meet LLMs \ [Website] [Code]

      Enhancing Diffusion Models with Text-Encoder Reinforcement Learning \ [Website] [Code]

      You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts \ [Website] [Code]

      AltDiffusion: A Multilingual Text-to-Image Diffusion Model \ [Website] [Code]

      It is all about where you start: Text-to-image generation with seed selection \ [Website] [Code]

      End-to-End Diffusion Latent Optimization Improves Classifier Guidance \ [Website] [Code]

      ReNeg: Learning Negative Embedding with Reward Guidance \ [Website] [Code]

      Correcting Diffusion Generation through Resampling \ [Website] [Code]

      Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs \ [Website] [Code]

      Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation \ [Website] [Code]

      A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis \ [Website] [Code]

      Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models \ [Website] [Code]

      PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement \ [Website] [Code]

      Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models \ [Website] [Code]

      Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation \ [Website] [Code]

      Text-to-Image Alignment in Denoising-Based Models through Step Selection \ [Website] [Code]

      Aligning Few-Step Diffusion Models with Dense Reward Difference Learning \ [Website] [Code]

      VPO: Aligning Text-to-Video Generation Models with Prompt Optimization \ [Website] [Code]

      ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models \ [Website] [Code]

      NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models \ [Website] [Code]

      Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models \ [Website] [Code]

      Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models \ [Website] [Code]

      UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation \ [Website] [Code]

      Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance \ [Website] [Code]

      Interleaving Reasoning for Better Text-to-Image Generation \ [Website] [Code]

      Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs \ [Website] [Code]

      IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance \ [Website] [Code]

      Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity \ [Website] [Code]

      G2RPO: Granular GRPO for precise reward in flow models \ [Website] [Code]

      PEO: Training-Free Aesthetic Quality Enhancement in Pre-Trained Text-to-Image Diffusion Models with Prompt Embedding Optimization \ [Website] [Code]

      Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation \ [Website] [Code]

      World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge \ [Website] [Code]

      Reinforcing Diffusion Models by Direct Group Preference Optimization \ [Website] [Code]

      Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation \ [Website] [Code]

      Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback \ [Website] [Code]

      Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models \ [Website] [Code]

      TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning \ [Website] [Code]

      DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation \ [Website] [Code]

      Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders \ [Website] [Code]

      PromptRL: Prompt Matters in RL for Flow-Based Image Generation \ [Website] [Code]

      Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis \ [Website] [Code]

      LightIt: Illumination Modeling and Control for Diffusion Models \ [CVPR 2024] [Project]

      Compass Control: Multi Object Orientation Control for Text-to-Image Generation \ [CVPR 2025] [Project]

      Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis \ [NeurIPS 2024] [Project]

      MotiF: Making Text Count in Image Animation with Motion Focal Loss \ [CVPR 2025] [Project]

      Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment \ [ICLR 2025] [Project]

      InstanceGen: Image Generation with Instance-level Instructions \ [SIGGRAPH 2025] [Project]

      Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models \ [Website] [Project]

      PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting \ [Website] [Project]

      Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG \ [Website] [Project]

      Scalable Ranked Preference Optimization for Text-to-Image Generation \ [Website] [Project]

      PreciseCam: Precise Camera Control for Text-to-Image Generation \ [Website] [Project]

      A Noise is Worth Diffusion Guidance \ [Website] [Project]

      Generating Fine Details of Entity Interactions \ [Website] [Project]

      LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors \ [Website] [Project]

      DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model \ [Website] [Project]

      ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation \ [Website] [Project]

      CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design \ [Website] [Project]

      LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation \ [Website] [Project]

      RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance \ [Website] [Project]

      UniFL: Improve Stable Diffusion via Unified Feedback Learning \ [Website] [Project]

      Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis \ [Website] [Project]

      ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting \ [Website] [Project]

      Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation \ [Website] [Project]

      Semantic Guidance Tuning for Text-To-Image Diffusion Models \ [Website] [Project]

      ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation \ [Website] [Project]

      Dual-Process Image Generation \ [Website] [Project]

      From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning \ [Website] [Project]

      Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation \ [Website] [Project]

      Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation \ [Website] [Project]

      DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling \ [Website] [Project]

      Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation \ [Website] [Project]

      FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes \ [Website] [Project]

      Lazy Diffusion Transformer for Interactive Image Editing \ [Website] [Project]

      Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis \ [Website] [Project]

      DanceGRPO: Unleashing GRPO on Visual Generation \ [Website] [Project]

      Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models \ [Website] [Project]

      Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation \ [Website] [Project]

      FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL \ [Website] [Project]

      ComposeAnything: Composite Object Priors for Text-to-Image Generation \ [Website] [Project]

      Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models \ [Website] [Project]

      CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance \ [Website] [Project]

      OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning \ [Website] [Project]

      Data-Driven Loss Functions for Inference-Time Optimization in Text-to-Image Generation \ [Website] [Project]

      Fine-grained Defocus Blur Control for Generative Image Models \ [Website] [Project]

      MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency \ [Website] [Project]

      Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions \ [Website] [Project]

      Test-time scaling of diffusions with flow maps \ [Website] [Project]

      Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation \ [Website] [Project]

      Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model \ [Website] [Project]

      GARDO: Reinforcing Diffusion Models without Reward Hacking \ [Website] [Project]

      3D Space as a Scratchpad for Editable Text-to-Image Generation \ [Website] [Project]

      HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models \ [Website] [Project]

      Creative Image Generation with Diffusion Model \ [Website] [Project]

      DIAMOND: Directed Inference for Artifact Mitigation in Flow Matching Models \ [Website] [Project]

      Norm-guided latent space exploration for text-to-image generation \ [NeurIPS 2023] [Website]

      Improving Diffusion-Based Image Synthesis with Context Prediction \ [NeurIPS 2023] [Website]

      LaRender: Training-Free Occlusion Control in Image Generation via Latent Rendering \ [ICCV 2025 Oral]

      Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds \ [ICLR 2025 Spotlight]

      Rethinking Layered Graphic Design Generation with a Top-Down Approach \ [ICCV 2025]

      GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections \ [ECCV 2024]

      MultiGen: Zero-shot Image Generation from Multi-modal Prompt \ [ECCV 2024]

      Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation \ [ICCV 2025]

      D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples \ [ICML 2025]

      On Mechanistic Knowledge Localization in Text-to-Image Generative Models \ [ICML 2024]

      Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation \ [NeurIPS 2024]

      Generating Compositional Scenes via Text-to-image RGBA Instance Generation \ [NeurIPS 2024]

      DEFT: Decompositional Efficient Fine-Tuning for Text-to-Image Models \ [NeurIPS 2025]

      DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models \ [AAAI 2025]

      T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting \ [CVPR 2025]

      Crafting Parts for Expressive Object Composition \ [CVPR 2025]

      Preserve Anything: Controllable Image Synthesis with Object Preservation \ [ICCV 2025]

      Rare Text Semantics Were Always There in Your Diffusion Transformer \ [NeurIPS 2025]

      Diverse Text-to-Image Generation via Contrastive Noise Optimization \ [Website]

      Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance \ [Website]

      Investigating and Improving Counter-Stereotypical Action Relation in Text-to-Image Diffusion Models \ [Website]

      PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity \ [Website]

      ROCM: RLHF on consistency models \ [Website]

      A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning \ [Website]

      DiffBrush: Just Painting the Art by Your Hands \ [Website]

      Decoder-Only LLMs are Better Controllers for Diffusion Models \ [Website]

      A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization \ [Website]

      PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation \ [Website]

      Exposure Diffusion: HDR Image Generation by Consistent LDR denoising \ [Website]

      Information Theoretic Text-to-Image Alignment \ [Website]

      Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers \ [Website]

      Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control \ [Website]

      Aligning Diffusion Models by Optimizing Human Utility \ [Website]

      Instruct-Imagen: Image Generation with Multi-modal Instruction \ [Website]

      CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models \ [Website]

      MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask \ [Website]

      ESPLoRA: Enhanced Spatial Precision with Low-Rank Adaption in Text-to-Image Diffusion Models for High-Definition Synthesis \ [Website]

      Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images \ [Website]

      Text2Layer: Layered Image Generation using Latent Diffusion Model \ [Website]

      Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling \ [Website]

      A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation \ [Website]

      UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion \ [Website]

      Heterogeneous Image GNN: Graph-Conditioned Diffusion for Image Synthesis \ [Website]

      RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning \ [Website]

      Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer \ [Website]

      Weak-to-Strong Diffusion with Reflection \ [Website]

      Improving Compositional Text-to-image Generation with Large Vision-Language Models \ [Website]

      Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else \ [Website]

      Unseen Image Synthesis with Diffusion Models \ [Website]

      AnyLens: A Generative Diffusion Model with Any Rendering Lens \ [Website]

      Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering \ [Website]

      Text2Street: Controllable Text-to-image Generation for Street Views \ [Website]

      Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation \ [Website]

      Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Model \ [Website]

      Debiasing Text-to-Image Diffusion Models \ [Website]

      Stochastic Conditional Diffusion Models for Semantic Image Synthesis \ [Website]

      Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion \ [Website]

      Transparent Image Layer Diffusion using Latent Transparency \ [Website]

      Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation \ [Website]

      HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances \ [Website]

      StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models \ [Website]

      Make Me Happier: Evoking Emotions Through Image Diffusion Models \ [Website]

      Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model \ [Website]

      LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model \ [Website]

      AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation \ [Website]

      U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models \ [Website]

      ECNet: Effective Controllable Text-to-Image Diffusion Models \ [Website]

      TextCraftor: Your Text Encoder Can be Image Quality Controller \ [Website]

      Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding \ [Website]

      Towards Better Text-to-Image Generation Alignment via Attention Modulation \ [Website]

      Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model \ [Website]

      SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance \ [Website]

      Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance \ [Website]

      Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models \ [Website]

      FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting \ [Website]

      Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models \ [Website]

      SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation \ [Website]

      Training-Free Sketch-Guided Diffusion with Latent Optimization \ [Website]

      Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization \ [Website]

      Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models \ [Website]

      Training-free Diffusion Model Alignment with Sampling Demons \ [Website]

      MinorityPrompt: Text to Minority Image Generation via Prompt Optimization \ [Website]

      Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models \ [Website]

      PiCo: Enhancing Text-Image Alignment with Improved Noise Selection and Precise Mask Control in Diffusion Models \ [Website]

      Distribution-Conditional Generation: From Class Distribution to Creative Generation \ [Website]

      Saliency Guided Optimization of Diffusion Latents \ [Website]

      Preference Optimization with Multi-Sample Comparisons \ [Website]

      CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning \ [Website]

      Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation \ [Website]

      Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation \ [Website]

      Improving image synthesis with diffusion-negative sampling \ [Website]

      Golden Noise for Diffusion Models: A Learning Framework \ [Website]

      Test-time Conditional Text-to-Image Synthesis Using Diffusion Models \ [Website]

      Decoupling Training-Free Guided Diffusion by ADMM \ [Website]

      Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps \ [Website]

      InstructEngine: Instruction-driven Text-to-Image Alignment \ [Website]

      Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis \ [Website]

      TKG-DM: Training-free Chroma Key Content Generation Diffusion Model \ [Website]

      Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image \ [Website]

      Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory \ [Website]

      CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis \ [Website]

      Reward Incremental Learning in Text-to-Image Generation \ [Website]

      QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain \ [Website]

      Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models \ [Website]

      The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation \ [Website]

      ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance \ [Website]

      Visual Lexicon: Rich Image Features in Language Space \ [Website]

      BudgetFusion: Perceptually-Guided Adaptive Diffusion Models \ [Website]

      ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction \ [Website]

      TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization \ [Website]

      Personalized Preference Fine-tuning of Diffusion Models \ [Website]

      Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation \ [Website]

      Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods \ [Website]

      Calibrated Multi-Preference Optimization for Aligning Diffusion Models \ [Website]

      Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning \ [Website]

      IPO: Iterative Preference Optimization for Text-to-Video Generation \ [Website]

      Dual Caption Preference Optimization for Diffusion Models \ [Website]

      Generating on Generated: An Approach Towards Self-Evolving Diffusion Models \ [Website]

      CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation \ [Website]

      Fine-Tuning Diffusion Generative Models via Rich Preference Optimization \ [Website]

      Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection \ [Website]

      When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO \ [Website]

      Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation \ [Website]

      On Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation \ [Website]

      ADT: Tuning Diffusion Models with Adversarial Supervision \ [Website]

      Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment \ [Website]

      VSC: Visual Search Compositional Text-to-Image Diffusion Model \ [Website]

      MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation \ [Website]

      HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation \ [Website]

      Towards Self-Improvement of Diffusion Models via Group Preference Optimization \ [Website]

      NOFT: Test-Time Noise Finetune via Information Bottleneck for Highly Correlated Asset Creation \ [Website]

      Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations \ [Website]

      IA-T2I: Internet-Augmented Text-to-Image Generation \ [Website]

      VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL \ [Website]

      Harnessing Caption Detailness for Data-Efficient Text-to-Image Generation \ [Website]

      Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion \ [Website]

      Alchemist: Turning Public Text-to-Image Data into Generative Gold \ [Website]

      Rethinking Direct Preference Optimization in Diffusion Models \ [Website]

      ISAC: Training-Free Instance-to-Semantic Attention Control for Improving Multi-Instance Generation \ [Website]

      Policy Optimized Text-to-Image Pipeline Design \ [Website]

      Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation \ [Website]

      Rhetorical Text-to-Image Generation via Two-layer Diffusion Policy Optimization \ [Website]

      A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models \ [Website]

      DreamLight: Towards Harmonious and Consistent Image Relighting \ [Website]

      VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis \ [Website]

      CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step \ [Website]

      FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization \ [Website]

      Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis \ [Website]

      Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment \ [Website]

      Test-time Prompt Refinement for Text-to-Image Models \ [Website]

      AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models \ [Website]

      Diffusion Models with Adaptive Negative Sampling Without External Resources \ [Website]

      Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models \ [Website]

      Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models \ [Website]

      SAGA: Learning Signal-Aligned Distributions for Improved Text-to-Image Generation \ [Website]

      CTA-Flux: Integrating Chinese Cultural Semantics into High-Quality English Text-to-Image Communities \ [Website]

      TransLight: Image-Guided Customized Lighting Control with Generative Decoupling \ [Website]

      Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference \ [Website]

      BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models \ [Website]

      Reconstruction Alignment Improves Unified Multimodal Models \ [Website]

      Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration \ [Website]

      DiffusionNFT: Online Diffusion Reinforcement with Forward Process \ [Website]

      HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models \ [Website]

      UniAlignment: Semantic Alignment for Unified Image Generation, Understanding, Manipulation and Perception \ [Website]

      Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings \ [Website]

      CO3: Contrasting Concepts Compose Better \ [Website]

      Plug-and-Play Prompt Refinement via Latent Feedback for Diffusion Model Alignment \ [Website]

      MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models \ [Website]

      Towards Better Optimization For Listwise Preference in Diffusion Models \ [Website]

      NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation \ [Website]

      Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers \ [Website]

      Improving Text-to-Image Generation with Input-Side Inference-Time Scaling \ [Website]

      DOS: Directional Object Separation in Text Embeddings for Multi-Object Image Generation \ [Website]

      Noise Projection: Closing the Prompt-Agnostic Gap Behind Text-to-Image Misalignment in Diffusion Models \ [Website]

      DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models \ [Website]

      D2D: Detector-to-Differentiable Critic for Improved Numeracy in Text-to-Image Generation \ [Website]

      Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models \ [Website]

      Beyond Randomness: Understand the Order of the Noise in Diffusion \ [Website]

      ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation \ [Website]

      Personalized Reward Modeling for Text-to-Image Generation \ [Website]

      RubricRL: Simple Generalizable Rewards for Text-to-Image Generation \ [Website]

      Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization \ [Website]

      PromptMoG: Enhancing Diversity in Long-Prompt Image Generation via Prompt Embedding Mixture-of-Gaussian Sampling \ [Website]

      Test-Time Alignment of Text-to-Image Diffusion Models via Null-Text Embedding Optimisation \ [Website]

      Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage \ [Website]

      Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards \ [Website]

      Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment \ [Website]

      TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models \ [Website]

      AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models \ [Website]

      Geometry-Aware Scene-Consistent Image Generation \ [Website]

      VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis \ [Website]

      CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation \ [Website]

      Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning \ [Website]

      Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion \ [Website]

      GDRO: Group-level Reward Post-training Suitable for Diffusion Models \ [Website]

      It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models \ [Website]

      Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes \ [Website]

      SIDiffAgent: Self-Improving Diffusion Agent \ [Website]

      SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback \ [Website]

      Di3PO -- Diptych Diffusion DPO for Targeted Improvements in Image \ [Website]

      DeDPO: Debiased Direct Preference Optimization for Diffusion Models \ [Website]

      AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models \ [Website]

      Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation \ [Website]

      GASS: Geometry-Aware Spherical Sampling for Disentangled Diversity Enhancement in Text-to-Image Generation \ [Website] -->

      I2I translation

      SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations \ [ICLR 2022] [Website] [Project] [Code]

      DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation \ [CVPR 2022] [Website] [Code]

      CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation \ [NeurIPS 2023] [Website] [Project] [Code]

      DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations \ [CVPR 2024] [Project] [Code]

      Diffusion-based Image Translation using Disentangled Style and Content Representation \ [ICLR 2023] [Website] [Code]

      FlexIT: Towards Flexible Semantic Image Translation \ [CVPR 2022] [Website] [Code]

      Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer \ [ICCV 2023] [Website] [Code]

      E2GAN: Efficient Training of Efficient GANs for Image-to-Image Translation \ [ICML 2024] [Project] [Code]

      Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models \ [Website] [Project] [Code]

      Cross-Image Attention for Zero-Shot Appearance Transfer \ [Website] [Project] [Code]

      FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models \ [Website] [Project] [Code]

      Diffusion Guided Domain Adaptation of Image Generators \ [Website] [Project] [Code]

      Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models \ [Website] [Project] [Code]

      FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models \ [Website] [Project] [Code]

      FilterPrompt: Guiding Image Transfer in Diffusion Models \ [Website] [Project] [Code]

      Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization \ [ECCV 2024] [Code]

      One-Shot Structure-Aware Stylized Image Synthesis \ [CVPR 2024] [Code]

      BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models \ [CVPR 2023] [Code]

      Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile \ [AAAI 2024] [Code]

      Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation \ [AAAI 2024] [Code]

      ZePo: Zero-Shot Portrait Stylization with Faster Sampling \ [ACM MM 2024] [Code]

      DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer \ [ACM MM Asia 2024] [Code]

      TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control \ [Website] [Code]

      Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance \ [Website] [Code]

      Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis \ [Website] [Code]

      PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions \ [Website] [Code]

      GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis \ [Website] [Code]

      CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion \ [Website] [Code]

      PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering \ [Website] [Code]

      One-Step Image Translation with Text-to-Image Models \ [Website] [Code]

      D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods \ [Website] [Code]

      Single-Step Bidirectional Unpaired Image Translation Using Implicit Bridge Consistency Distillation \ [Website] [Project]

      StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models \ [ICCV 2023] [Website]

      ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors \ [ACM MM 2023]

      High-Fidelity Diffusion-based Image Editing \ [AAAI 2024]

      EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models \ [ECCV 2024]

      Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer \ [Website]

      UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators \ [Website]

      Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation \ [Website]

      TEXTOC: Text-driven Object-Centric Style Transfer \ [Website]

      Seed-to-Seed: Image Translation in Diffusion Seed Space \ [Website]

      Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation \ [Website]

      Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation \ [Website]

      A Diffusion Model Translator for Efficient Image-to-Image Translation \ [Website]

      Bidirectional Diffusion Bridge Models \ [Website]

      LBM: Latent Bridge Matching for Fast Image-to-Image Translation \ [Website]

      Segmentation Detection Tracking

      ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models \ [CVPR 2023 Highlight] [Project] [Code] [Demo]

      LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation \ [ICCV 2023] [Website] [Project] [Code]

      Text-Image Alignment for Diffusion-Based Perception \ [CVPR 2024] [Website] [Project] [Code]

      Stochastic Segmentation with Conditional Categorical Diffusion Models \ [ICCV 2023] [Website] [Code]

      DDP: Diffusion Model for Dense Visual Prediction \ [ICCV 2023] [Website] [Code]

      DiffusionDet: Diffusion Model for Object Detection \ [ICCV 2023] [Website] [Code]

      OVTrack: Open-Vocabulary Multiple Object Tracking \ [CVPR 2023] [Website] [Project]

      Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion \ [CVPR 2024] [Project] [Code]

      SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process \ [NeurIPS 2023] [Website] [Code]

      DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction \ [CVPR 2024] [Project] [Code]

      Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features \ [Website] [Project] [Code]

      InstaGen: Enhancing Object Detection by Training on Synthetic Dataset \ [Website] [Project] [Code]

      InvSeg: Test-Time Prompt Inversion for Semantic Segmentation \ [Website] [Project] [Code]

      SMITE: Segment Me In TimE \ [Website] [Project] [Code]

      Studying Image Diffusion Features for Zero-Shot Video Object Segmentation \ [Website] [Project] [Code]

      gen2seg: Generative Models Enable Generalizable Instance Segmentation \ [Website] [Project] [Code]

      ReCon: Region-Controllable Data Augmentation with Rectification and Alignment for Object Detection \ [NeurIPS 2025 (Spotlight)] [Code]

      Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation \ [NeurIPS 2024] [Code]

      Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model \ [ECCV 2024] [Code]

      Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer \ [ICAR 2025] [Code]

      ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model \ [Website] [Code]

      SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow \ [Website] [Code]

      Delving into the Trajectory Long-tail Distribution for Muti-object Tracking \ [Website] [Code]

      Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models \ [Website] [Code]

      Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label \ [Website] [Code]

      Personalize Segment Anything Model with One Shot \ [Website] [Code]

      DiffusionTrack: Diffusion Model For Multi-Object Tracking \ [Website] [Code]

      MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation \ [Website] [Code]

      A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting \ [Website] [Code]

      Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation \ [Website] [Code]

      UniGS: Unified Representation for Image Generation and Segmentation \ [Website] [Code]

      Placing Objects in Context via Inpainting for Out-of-distribution Segmentation \ [Website] [Code]

      MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation \ [Website] [Code]

      Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation \ [Website] [Code]

      Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models \ [Website] [Code]

      No Annotations for Object Detection in Art through Stable Diffusion \ [Website] [Code]

      PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes \ [Website] [Code]

      Correspondence as Video: Test-Time Adaption on SAM2 for Reference Segmentation in the Wild \ [Website] [Code]

      Object-Centric Data Synthesis for Category-level Object Detection \ [Website] [Code]

      EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models \ [ICLR 2024] [Website] [Project]

      Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers \ [NeurIPS 2025] [Project]

      Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation \ [CVPR 2024] [Project]

      FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models \ [Website] [Project]

      ReferEverything: Towards Segmenting Everything We Can Speak of in Videos \ [Website] [Project]

      DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models \ [Website] [Project]

      RefAM: Attention Magnets for Zero-Shot Referral Segmentation \ [Website] [Project]

      DM3T: Harmonizing Modalities via Diffusion for Multi-Object Tracking \ [Website] [Project]

      Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation \ [ICCV 2023] [Website]

      Conditional Latent Diffusion Models for Zero-Shot Instance Segmentation \ [ICCV 2025]

      SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection \ [CVPR 2024]

      Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers \ [ECCV 2024]

      Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation \ [NeurIPS 2024]

      Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation \ [NeurIPS 2025]

      Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation \ [WACV 2024]

      Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis \ [ACCV 2024]

      A Simple Background Augmentation Method for Object Detection with Diffusion Model \ [Website]

      Unveiling the Power of Diffusion Features For Personalized Segmentation and Retrieval \ [Website]

      SLiMe: Segment Like Me \ [Website]

      ASAM: Boosting Segment Anything Model with Adversarial Tuning \ [Website]

      Diffusion Features to Bridge Domain Gap for Semantic Segmentation \ [Website]

      MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation \ [Website]

      DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery \ [Website]

      Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models \ [Website]

      Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter \ [Website]

      Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion \ [Website]

      From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models \ [Website]

      Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation \ [Website]

      Patch-based Selection and Refinement for Early Object Detection \ [Website]

      TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models \ [Website]

      Towards Granularity-adjusted Pixel-level Semantic Annotation \ [Website]

      Gen2Det: Generate to Detect \ [Website]

      Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors \ [Website]

      ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model \ [Website]

      Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection \ [Website]

      Generative Edge Detection with Stable Diffusion \ [Website]

      DINTR: Tracking via Diffusion-based Interpolation \ [Website]

      Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking \ [Website]

      DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability \ [Website]

      Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation \ [Website]

      Panoptic Diffusion Models: co-generation of images and segmentation maps \ [Website]

      Tuning-Free Amodal Segmentation via the Occlusion-Free Bias of Inpainting Models \ [Website]

      Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model \ [Website]

      GS: Generative Segmentation via Label Diffusion \ [Website]

      Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision \ [Website]

      Additional conditions

      Adding Conditional Control to Text-to-Image Diffusion Models \ [ICCV 2023 Best Paper] [Website] [Official Code] [Diffusers Doc] [Diffusers Code]

      T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models \ [Website] [Official Code] [Diffusers Code]

      SketchKnitter: Vectorized Sketch Generation with Diffusion Models \ [ICLR 2023 Spotlight] [Project] [Website] [Code]

      Freestyle Layout-to-Image Synthesis \ [CVPR 2023 Highlight] [Website] [Project] [Code]

      Collaborative Diffusion for Multi-Modal Face Generation and Editing \ [CVPR 2023] [Website] [Project] [Code]

      HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation \ [ICCV 2023] [Website] [Project] [Code]

      FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model \ [ICCV 2023] [Website] [Code]

      AnyI2V: Animating Any Conditional Image with Motion Control \ [ICCV 2025] [Project] [Code]

      Sketch-Guided Text-to-Image Diffusion Models \ [SIGGRAPH 2023] [Project] [Code]

      Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive \ [ICLR 2024] [Project] [Code]

      SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet \ [BMVC 2025] [Project] [Code]

      IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts \ [Website] [Project] [Code]

      ControlNeXt: Powerful and Efficient Control for Image and Video Generation \ [Website] [Project] [Code]

      Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance \ [Website] [Project] [Code]

      Jodi: Unification of Visual Generation and Understanding via Joint Modeling \ [Website] [Project] [Code]

      Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model \ [Website] [Project] [Code]

      IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis \ [Website] [Project] [Code]

      DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation \ [Website] [Project] [Code]

      A Simple Approach to Unifying Diffusion-based Conditional Generation \ [Website] [Project] [Code]

      HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion \ [Website] [Project] [Code]

      Late-Constraint Diffusion Guidance for Controllable Image Synthesis \ [Website] [Project] [Code]

      PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation \ [Website] [Project] [Code]

      Composer: Creative and Controllable Image Synthesis with Composable Conditions \ [Website] [Project] [Code]

      DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation \ [Website] [Project] [Code]

      UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild \ [Website] [Project] [Code]

      Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      LooseControl: Lifting ControlNet for Generalized Depth Conditioning \ [Website] [Project] [Code]

      X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model \ [Website] [Project] [Code]

      ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models \ [Website] [Project] [Code]

      ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet \ [Website] [Project] [Code]

      SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior \ [Website] [Project] [Code]

      Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback \ [Website] [Project] [Code]

      RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation \ [Website] [Project] [Code]

      SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation \ [Website] [Project] [Code]

      BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment \ [Website] [Project] [Code]

      Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis \ [ICLR 2024] [Code]

      It's All About Your Sketch: Democratising Sketch Control in Diffusion Models \ [CVPR 2024] [Code]

      VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis \ [AAAI 2025] [Code]

      Efficient Text-Guided Convolutional Adapter for the Diffusion Model \ [Website] [Code]

      CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation \ [Website] [Code]

      Universal Guidance for Diffusion Models \ [Website] [Code]

      Meta ControlNet: Enhancing Task Adaptation via Meta Learning \ [Website] [Code]

      EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer \ [Website] [Code]

      Local Conditional Controlling for Text-to-Image Diffusion Models \ [Website] [Code]

      KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models \ [Website] [Code]

      Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC \ [Website] [Code]

      OminiControl: Minimal and Universal Control for Diffusion Transformer \ [Website] [Code]

      OminiControl2: Efficient Conditioning for Diffusion Transformers \ [Website] [Code]

      UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer \ [Website] [Code]

      ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning \ [Website] [Code]

      Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls \ [Website] [Code]

      Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion \ [Website] [Code]

      Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation \ [Website] [Code]

      Universal Few-Shot Spatial Control for Diffusion Models \ [Website] [Code]

      LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing \ [ICCV 2025 Oral] [Project]

      Modulating Pretrained Diffusion Models for Multimodal Image Synthesis \ [SIGGRAPH 2023] [Project]

      SpaText: Spatio-Textual Representation for Controllable Image Generation \ [CVPR 2023] [Project]

      CCM: Adding Conditional Controls to Text-to-Image Consistency Models \ [ICML 2024] [Project]

      Dreamguider: Improved Training free Diffusion-based Conditional Generation \ [Website] [Project]

      ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback \ [Website] [Project]

      AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation \ [Website] [Project]

      BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion \ [Website] [Project]

      FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection \ [Website] [Project]

      Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor \ [Website] [Project]

      SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing \ [Website] [Project]

      CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models \ [Website] [Project]

      EditAR: Unified Conditional Generation with Autoregressive Models \ [Website] [Project]

      DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models \ [Website] [Project]

      RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers \ [Website] [Project]

      Context-Aware Autoregressive Models for Multi-Conditional Image Generation \ [Website] [Project]

      Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis \ [Website] [Project]

      Sketch-Guided Scene Image Generation \ [Website]

      SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation \ [Website]

      Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation \ [Website]

      Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt \ [Website]

      Adding 3D Geometry Control to Diffusion Models \ [Website]

      LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation \ [Website]

      JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling \ [Website]

      Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons \ [Website]

      Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt \ [Website]

      FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation \ [Website]

      Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation \ [Website]

      Label-free Neural Semantic Image Synthesis \ [Website]

      UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation \ [Website]

      FlexControl: Computation-Aware ControlNet with Differentiable Router for Text-to-Image Generation \ [Website]

      Adding Additional Control to One-Step Diffusion with Joint Distribution Matching \ [Website]

      UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models \ [Website]

      Rethink Sparse Signals for Pose-guided Text-to-image Generation \ [Website]

      LLMControl: Grounded Control of Text-to-Image Diffusion-based Synthesis with Multimodal LLMs \ [Website]

      DivControl: Knowledge Diversion for Controllable Image Generation \ [Website]

      NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer \ [Website]

      Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation \ [Website]

      ScaleWeaver: Weaving Efficient Controllable T2I Generation with Multi-Scale Reference Attention \ [Website]

      Few-Shot

      Discriminative Diffusion Models as Few-shot Vision and Language Learners \ [Website] [Code]

      Few-Shot Diffusion Models \ [Website] [Code]

      Noise Matters: Optimizing Matching Noise for Diffusion Classifiers \ [Website] [Code]

      Few-shot Semantic Image Synthesis with Class Affinity Transfer \ [CVPR 2023] [Website]

      DiffAlign: Few-shot learning using diffusion based synthesis and alignment \ [Website]

      Few-shot Image Generation with Diffusion Models \ [Website]

      Lafite2: Few-shot Text-to-Image Generation \ [Website]

      Few-Shot Task Learning through Inverse Generative Modeling \ [Website]

      SD-inpaint

      Paint by Example: Exemplar-based Image Editing with Diffusion Models \ [CVPR 2023] [Website] [Code] [Diffusers Doc] [Diffusers Code]

      GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models \ [ICML 2022 Spotlight] [Website] [Code]

      Blended Diffusion for Text-driven Editing of Natural Images \ [CVPR 2022] [Website] [Project] [Code]

      Blended Latent Diffusion \ [SIGGRAPH 2023] [Project] [Code]

      GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion \ [NeurIPS 2025] [Project] [Code]

      CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models \ [NeurIPS 2024] [Project] [Code]

      LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal \ [ICCV 2025] [Project] [Code]

      TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition \ [ICCV 2023] [Website] [Project] [Code]

      Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting \ [CVPR 2023] [Website] [Code]

      Improving Editability in Image Generation with Layer-wise Memory \ [CVPR 2025] [Website] [Code]

      Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models \ [ICML 2023] [Website] [Code]

      Inst-Inpaint: Instructing to Remove Objects with Diffusion Models \ [Website] [Project] [Code] [Demo]

      Coherent and Multi-modality Image Inpainting via Latent Space Optimization \ [Website] [Project] [Code]

      Paint by Inpaint: Learning to Add Image Objects by Removing Them First \ [Website] [Project] [Code]

      ObjectClear: Complete Object Removal via Object-Effect Attention \ [Website] [Project] [Code]

      Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting \ [Website] [Project] [Code]

      AnyDoor: Zero-shot Object-level Image Customization \ [Website] [Project] [Code]

      Insert Anything: Image Insertion via In-Context Editing in DiT \ [Website] [Project] [Code]

      Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion \ [Website] [Project] [Code]

      MTV-Inpaint: Multi-Task Long Video Inpainting \ [Website] [Project] [Code]

      A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting \ [Website] [Project] [Code]

      Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation \ [Website] [Project] [Code]

      Towards Language-Driven Video Inpainting via Multimodal Large Language Models \ [Website] [Project] [Code]

      Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections \ [Website] [Project] [Code]

      MiniMax-Remover: Taming Bad Noise Helps Video Object Removal \ [Website] [Project] [Code]

      EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition \ [Website] [Project] [Code]

      Improving Text-guided Object Inpainting with Semantic Pre-inpainting \ [ECCV 2024] [Code]

      FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior \ [ECCV 2024] [Code]

      One Stone with Two Birds: A Null-Text-Null Frequency-Aware Diffusion Models for Text-Guided Image Inpainting \ [NeurIPS 2025] [Code]

      360-Degree Panorama Generation from Few Unregistered NFoV Images \ [ACM MM 2023] [Code]

      Delving Globally into Texture and Structure for Image Inpainting \ [ACM MM 2022] [Code]

      FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting \ [AAAI 2026] [Code]

      PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery \ [AAAI 2025] [Code]

      ControlEdit: A MultiModal Local Clothing Image Editing Method \ [Website] [Code]

      CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing \ [Website] [Code]

      DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting \ [Website] [Code]

      Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance \ [Website] [Code]

      Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing \ [Website] [Code]

      What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer \ [Website] [Code]

      A Large-scale AI-generated Image Inpainting Benchmark \ [Website] [Code]

      Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model \ [Website] [Code]

      Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting \ [Website] [Code]

      Reference-based Image Composition with Sketch via Structure-aware Diffusion Model \ [Website] [Code]

      Image Inpainting via Iteratively Decoupled Probabilistic Modeling \ [Website] [Code]

      ControlCom: Controllable Image Composition using Diffusion Model \ [Website] [Code]

      Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model \ [Website] [Code]

      MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models \ [Website] [Code]

      HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models \ [Website] [Code]

      BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion \ [Website] [Code]

      Sketch-guided Image Inpainting with Partial Discrete Diffusion Process \ [Website] [Code]

      ReMOVE: A Reference-free Metric for Object Erasure \ [Website] [Code]

      Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting \ [Website] [Code]

      MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior \ [Website] [Code]

      Yuan: Yielding Unblemished Aesthetics Through A Unified Network for Visual Imperfections Removal in Generated Images \ [Website] [Code]

      OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting \ [Website] [Code]

      GuidPaint: Class-Guided Image Inpainting with Diffusion Models \ [Website] [Code]

      Efficient Zero-Shot Inpainting with Decoupled Diffusion Guidance \ [Website] [Code]

      AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes \ [ECCV 2024] [Project]

      Text2Place: Affordance-aware Text Guided Human Placement \ [ECCV 2024] [Project]

      IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation \ [CVPR 2024] [Project]

      Matting by Generation \ [SIGGRAPH 2024] [Project]

      TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting \ [CVPR 2025] [Project]

      PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference \ [NeurIPS 2024] [Project]

      Get In Video: Add Anything You Want to the Video \ [Website] [Project]

      Taming Latent Diffusion Model for Neural Radiance Field Inpainting \ [Website] [Project]

      VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control \ [Website] [Project]

      CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models \ [Website] [Project]

      DreamFuse: Adaptive Image Fusion with Diffusion Transformer \ [Website] [Project]

      SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control \ [Website] [Project]

      Towards Stable and Faithful Inpainting \ [Website] [Project]

      Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos \ [Website] [Project]

      ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion \ [Website] [Project]

      EraserDiT: Fast Video Inpainting with Diffusion Transformer Model \ [Website] [Project]

      MILD: Multi-Layer Diffusion Strategy for Complex and Precise Multi-IP Aware Human Erasing \ [Website] [Project]

      ROSE: Remove Objects with Side Effects in Videos \ [Website] [Project]

      LoVoRA: Text-guided and Mask-free Video Object Removal and Addition with Learnable Object-aware Localization \ [Website] [Project]

      LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization \ [Website] [Project]

      PixPerfect: Seamless Latent Diffusion Local Editing with Discriminative Pixel-Space Refinement \ [NeurIPS 2025]

      TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization \ [ACM MM 2024]

      Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways \ [CVPR 2025]

      ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting \ [CVPR 2025]

      MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting \ [CVPR 2025]

      Semantically Consistent Video Inpainting with Conditional Diffusion Models \ [Website]

      Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention \ [Website]

      Outline-Guided Object Inpainting with Diffusion Models \ [Website]

      SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model \ [Website]

      Gradpaint: Gradient-Guided Inpainting with Diffusion Models \ [Website]

      Infusion: Internal Diffusion for Video Inpainting \ [Website]

      Rethinking Referring Object Removal \ [Website]

      Tuning-Free Image Customization with Image and Text Guidance \ [Website]

      VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model \ [Website]

      FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image \ [Website]

      InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture \ [Website]

      Thinking Outside the BBox: Unconstrained Generative Object Compositing \ [Website]

      Content-aware Tile Generation using Exterior Boundary Inpainting \ [Website]

      AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status \ [Website]

      TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning \ [Website]

      MagicEraser: Erasing Any Objects via Semantics-Aware Control \ [Website]

      I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting \ [Website]

      VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference \ [Website]

      FreeCond: Free Lunch in the Input Conditions of Text-Guided Inpainting \ [Website]

      PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control \ [Website]

      Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment \ [Website]

      Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion \ [Website]

      Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting \ [Website]

      AsyncDSB: Schedule-Asynchronous Diffusion Schrödinger Bridge for Image Inpainting \ [Website]

      RAD: Region-Aware Diffusion Models for Image Inpainting \ [Website]

      MObI: Multimodal Object Inpainting Using Diffusion Models \ [Website]

      DiffuEraser: A Diffusion Model for Video Inpainting \ [Website]

      VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models \ [Website]

      E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization \ [Website]

      Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models \ [Website]

      DiTPainter: Efficient Video Inpainting with Diffusion Transformers \ [Website]

      PixelHacker: Image Inpainting with Structural and Semantic Consistency \ [Website]

      Geometry-Editable and Appearance-Preserving Object Composition \ [Website]

      Towards Seamless Borders: A Method for Mitigating Inconsistencies in Image Inpainting and Outpainting \ [Website]

      DreamPainter: Image Background Inpainting for E-commerce Scenarios \ [Website]

      FreeInsert: Personalized Object Insertion with Geometric and Style Control \ [Website]

      Does FLUX Already Know How to Perform Physically Plausible Image Composition? \ [Website]

      Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models \ [Website]

      CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement \ [Website]

      Teleportraits: Training-Free People Insertion into Any Scene \ [Website]

      VidSplice: Towards Coherent Video Inpainting via Explicit Spaced Frame Guidance \ [Website]

      Unified Long Video Inpainting and Outpainting via Overlapping High-Order Co-Denoising \ [Website]

      Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition \ [Website]

      Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers \ [Website]

      Layout Generation

      LayoutDM: Discrete Diffusion Model for Controllable Layout Generation \ [CVPR 2023] [Website] [Project] [Code]

      Desigen: A Pipeline for Controllable Design Template Generation \ [CVPR 2024] [Project] [Code]

      PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation \ [CVPR 2025] [Project] [Code]

      DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer \ [ICCV 2023] [Website] [Code]

      LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models \ [ICCV 2023] [Website] [Code]

      DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation \ [Website] [Code]

      LayoutDM: Transformer-based Diffusion Model for Layout Generation \ [CVPR 2023] [Website]

      Unifying Layout Generation with a Decoupled Diffusion Model \ [CVPR 2023] [Website]

      PLay: Parametrically Conditioned Layout Generation using Latent Diffusion \ [ICML 2023] [Website]

      Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints \ [ICLR 2024]

      SLayR: Scene Layout Generation with Rectified Flow \ [Website]

      CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model \ [Website]

      Diffusion-based Document Layout Generation \ [Website]

      Dolfin: Diffusion Layout Transformers without Autoencoder \ [Website]

      LayoutFlow: Flow Matching for Layout Generation \ [Website]

      Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model \ [Website]

      Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers \ [Website]

      LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation \ [Website]

      UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation \ [Website]