SpectraDiff:

Enhancing the Fidelity of Infrared Image Translation with Object-Aware Diffusion



Seamless Trans-X Lab, Yonsei University

Abstract

Autonomous systems commonly rely on RGB cameras, which are susceptible to failure in low-light and adverse conditions. Infrared (IR) imaging provides a viable alternative by capturing thermal signatures independent of visible illumination. However, its high cost and integration complexities limit widespread adoption. To address these challenges, we introduce SpectraDiff, a diffusion-based framework that synthesizes realistic IR images by fusing RGB inputs with refined semantic segmentation. Through our RGB-Seg Object-Aware (RSOA) module, SpectraDiff learns object-specific IR intensities by leveraging object-aware features. The SpectraDiff architecture, featuring a novel Spectral Attention Block, enforces self-attention among semantically similar pixels while leveraging cross-attention with the original RGB to preserve high-frequency details. Extensive evaluations on FLIR, FMB, MFNet, IDD-AW, and RANUS demonstrate SpectraDiff's superior performance over existing methods, as measured by both perceptual (FID, LPIPS, DISTS) and fidelity (SSIM, SAM) metrics.


Method

SpectraDiff is a diffusion-based framework that synthesizes realistic IR images by fusing RGB inputs with refined semantic segmentation. The RGB-Seg Object-Aware (RSOA) module conditions the diffusion process on object-aware features, allowing the model to learn object-specific IR intensities. At the core of the architecture is the Spectral Attention Block, which enforces self-attention among semantically similar pixels while applying cross-attention with the original RGB image to preserve high-frequency details. Trained in this object-aware manner, the model produces IR images that remain faithful to both the thermal characteristics of each object class and the fine structure of the input scene.
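As a minimal NumPy sketch of the attention mechanism described in the abstract, the snippet below restricts self-attention to pixels that share a semantic label and then cross-attends to RGB-derived features. The function name, flattened feature shapes, and residual fusion are illustrative assumptions; the paper's actual block operates inside a diffusion network.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries get zero weight.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spectral_attention(feats, rgb_feats, seg_labels):
    """Illustrative sketch of a segmentation-masked attention block.

    feats:      (N, d) per-pixel features being denoised
    rgb_feats:  (N, d) features from the original RGB image
    seg_labels: (N,)   semantic class id per pixel
    """
    n, d = feats.shape
    scale = 1.0 / np.sqrt(d)

    # Self-attention restricted to semantically similar pixels:
    # scores between pixels of different classes are masked out.
    scores = feats @ feats.T * scale
    same_class = seg_labels[:, None] == seg_labels[None, :]
    scores = np.where(same_class, scores, -np.inf)
    out = softmax(scores, axis=-1) @ feats

    # Cross-attention with RGB features to recover high-frequency detail.
    cross = softmax(out @ rgb_feats.T * scale, axis=-1) @ rgb_feats
    return out + cross  # residual fusion (assumption)
```

With this masking, a pixel's self-attention output is a convex combination of features from its own semantic class only, which is one way to realize "self-attention among semantically similar pixels".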


Qualitative Results

Ours (FLIR)
GT (FLIR)
Ours (FMB)
GT (FMB)
Ours (MFNet)
GT (MFNet)
Ours (MFNet)
GT (MFNet)


Qualitative comparisons of different models on the FLIR dataset

FLIR Dataset results
The FLIR results illustrate the effectiveness of our SpectraDiff model in capturing the thermal properties of objects, including people, across both day and night conditions, producing clearer and more detailed images than other methods.


Qualitative comparisons with different models on the FMB dataset (top two rows) and MFNet dataset (bottom two rows) for LWIR translation.

FMB and MFNet Dataset results
SpectraDiff is effective at capturing IR intensities and produces clearer images. In particular, our model captures the thermal emissive properties of diverse objects and scenes in the long-wave infrared (LWIR) domain. Results for the NIR datasets (IDD-AW, RANUS) are available in the supplementary material.


Quantitative Results

Quantitative comparison of the proposed model's performance across various infrared (IR) range datasets. The results demonstrate performance variations across different IR ranges and highlight where our model outperforms other methods based on SAM, FID, LPIPS, and DISTS metrics. The best results are shaded in green and the second-best results are shaded in yellow.

BibTeX

@article{park2026spectradiff,
title = {SpectraDiff: Enhancing the Fidelity of Infrared Image Translation with Object-Aware Diffusion},
journal = {Computer Vision and Image Understanding},
pages = {104709},
year = {2026},
issn = {1077-3142},
doi = {10.1016/j.cviu.2026.104709},
url = {https://www.sciencedirect.com/science/article/pii/S1077314226000767},
author = {Incheol Park and Youngwan Jin and Nalcakan Yagiz and Hyeongjin Ju and Sanghyeop Yeo and Shiho Kim},
keywords = {Image-to-image translation, Data augmentation, Infrared imaging, Diffusion models},
}