TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control

1University of Michigan 2NVIDIA
Work done during internship
SpaVLE @ NeurIPS’25

Abstract

Current controllable diffusion models typically rely on fixed architectures that modify intermediate activations to inject guidance conditioned on a new modality. This approach uses a static conditioning strategy for a dynamic, multi-stage denoising process, limiting the model's ability to adapt its response as the generation evolves from coarse structure to fine detail. We introduce TC-LoRA (Temporally Modulated Conditional LoRA), a new paradigm that enables dynamic, context-aware control by conditioning the model's weights directly. Our framework uses a hypernetwork to generate LoRA adapters on-the-fly, tailoring weight modifications for the frozen backbone at each diffusion step based on time and the user's condition. This mechanism enables the model to learn and execute an explicit, adaptive strategy for applying conditional guidance throughout the entire generation process. Through experiments on various data domains, we demonstrate that this dynamic, parametric control significantly enhances generative fidelity and adherence to spatial conditions compared to static, activation-based methods. TC-LoRA establishes an alternative approach in which the model's conditioning strategy is modified through a deeper functional adaptation of its weights, allowing control to align with the dynamic demands of the task and generative stage.

Background & Motivation

Why rely on a static conditioning strategy for a dynamic, multi-stage denoising process?


Motivation
  • Control is critical for making generative AI predictable and professional.
  • Existing methods like ControlNet [1] rely on static activation-space conditioning with a frozen adapter.
  • Diffusion models have evolving needs throughout the generation steps: from early semantic alignment to later spatial refinement [2].

[1] Adding Conditional Control to Text-to-Image Diffusion Models
[2] eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

TC-LoRA Overview

Can we adapt conditional guidance to the model's changing needs across the denoising process?


Overview
  • Dynamic weight conditioning (adaptation)

  • Comparison with ControlNet-style conditioning:
                                      ControlNet-style    Ours
      Primary site of intervention    Activation space    Weight space
      Conditioning strategy           Static              Dynamic
  • Context-aware generation via hypernetwork (H)
    • Adapters conditioned on diffusion timestep (t) and input condition (c)
    • Hypernetwork shared across different layers
  • LoRA vs. TC-LoRA

LoRA :    W = W0 + BA

TC-LoRA : A(t, c) = H_A(t, c)
          B(t, c) = H_B(t, c)

          W(t, c) = W0 + B(t, c)A(t, c)
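The formulation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's architecture: the layer sizes, LoRA rank, and the small MLP standing in for the hypernetworks H_A and H_B are all assumptions chosen only to make the shapes concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, cond_dim, hidden = 8, 2, 4, 16  # illustrative: layer width, LoRA rank, condition size, hypernet width

# Hypothetical shared hypernetwork: a tiny MLP mapping (timestep t, condition c)
# to the LoRA factors A(t, c) and B(t, c).
W1  = rng.standard_normal((1 + cond_dim, hidden)) * 0.1
W2A = rng.standard_normal((hidden, r * d)) * 0.1
W2B = rng.standard_normal((hidden, d * r)) * 0.1

def hypernet(t, c):
    """Generate timestep- and condition-dependent LoRA factors."""
    h = np.tanh(np.concatenate(([t], c)) @ W1)
    A = (h @ W2A).reshape(r, d)  # A(t, c)
    B = (h @ W2B).reshape(d, r)  # B(t, c)
    return A, B

W0 = rng.standard_normal((d, d))  # frozen backbone weight

def adapted_weight(t, c):
    A, B = hypernet(t, c)
    return W0 + B @ A  # W(t, c) = W0 + B(t, c) A(t, c)

c = rng.standard_normal(cond_dim)
W_early = adapted_weight(0.9, c)  # early, noisy denoising step
W_late  = adapted_weight(0.1, c)  # late, refinement step
# Unlike static LoRA, the effective weight changes across timesteps:
print(np.allclose(W_early, W_late))
```

The key contrast with plain LoRA is that A and B are not learned parameters but hypernetwork outputs, so the same frozen backbone receives a different weight modification at every diffusion step and for every condition.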

Qualitative Comparison

  • Input: Depth Map
  • Baseline: Cosmos-Transfer1 (= Cosmos-Predict1 + ControlNet)
  • Ours: Cosmos-Predict1 + TC-LoRA
[Figures: Qualitative Results 1–3]

Qualitative Comparison: Each example set comprises a text prompt (top), a depth condition (left), and the corresponding generation results obtained using ControlNet (middle) and TC-LoRA conditioning (right). Overall, TC-LoRA exhibits improved fidelity in adhering to the spatial condition.


Effect of TC-LoRA

[Figure: effect of TC-LoRA, panels (a)–(f)]

Effect of TC-LoRA: (a) and (b) show images generated from the given text prompt without conditioning. With the condition input shown in (c), the generation process can be conditioned accordingly, demonstrating TC-LoRA’s role in enabling spatial correction through LoRA. Panels (d), (e), and (f) present visualization results at different TC-LoRA post-training durations. After 150k iterations, the generated image is well aligned with the condition in (c).


BibTeX

@article{cho2025tclora,
      title={TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control},
      author={Minkyoung Cho and Ruben Ohana and Christian Jacobsen and Adityan Jothi and Min-Hung Chen and Z. Morley Mao and Ethem Can},
      journal={arXiv preprint arXiv:2510.09561},
      year={2025}
}