
👀✨🖼️ Disentangled Reinforcement Learning for Robust Visual Quality Assessment

Project Page
Zehui Feng, Tian Qiu, Tong Wu, Huayuan Xu, Ting Han*
Shanghai Jiao Tong University, Zhejiang University (* denotes the corresponding author)

The first NR-QA model empowered by RL2RS, capable of performing both quality reasoning and rating across IQA and VQA tasks.

fig-genexample

🏁 Method Overview. (a) Existing score/ranking reward functions assign minimal differences, which leads to distribution collapse or robustness failures. (b) PreResIQA-R1 focuses on a fine-grained balance between response-ranking rewards and preference. (c) PreResIQA-R1 achieves state-of-the-art performance and stable image quality assessment with a discriminative reward. (d) A typical qualitative and quantitative comparison between VisualQuality-R1 and PreResIQA-R1, demonstrating superior performance on image quality description and scoring.

Framework

🏁 Overall training framework of PreResIQA-R1 via reinforcement-learning-to-rank-score (RL2RS). Given an image batch with a shared text prompt, PreResIQA-R1 generates K responses. To quickly activate CoT differences and then assess generation stability, we introduce a response penalty and a fine-grained triplet-response balance reward. To jointly enhance the robustness of ranking and scoring, we introduce preference pairwise-and-triplet score-and-ranking rewards for GRPO.
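GRPO optimizes the policy with group-relative advantages: each of the K sampled responses is rewarded, then normalized against the group's mean and standard deviation. A minimal sketch of that normalization step, assuming each response has already been assigned a scalar reward (names here are illustrative, not the repo's API):

```python
# Hypothetical sketch: GRPO-style group-relative advantages over K responses.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize per-response rewards within one sampled group."""
    mu = mean(rewards)            # group baseline
    sigma = pstdev(rewards)       # group spread
    return [(r - mu) / (sigma + eps) for r in rewards]

# K = 4 responses for one image; advantages sum to ~0 within the group.
advs = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Responses rewarded above the group mean receive positive advantages and are reinforced; identical rewards across the group yield zero advantages, which is why a discriminative reward matters.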

Framework
🏁 Pipeline of Preference-Response Disentangled Policy Optimization (PRPO), which applies a response-ranking and response-balance reward, a preference pairwise score-and-ranking reward, and a preference triplet ranking reward to optimize group policy learning.
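The pairwise ranking component can be illustrated with a simple agreement check: two images from the batch earn reward when the policy's predicted scores order them the same way as their ground-truth MOS. A hedged sketch, with all names hypothetical rather than taken from the repo:

```python
# Hypothetical sketch of a pairwise ranking reward: 1.0 when the predicted
# ordering of two images agrees with the MOS ordering, else 0.0.
def pairwise_ranking_reward(pred_i, pred_j, mos_i, mos_j, margin=0.0):
    if mos_i == mos_j:
        # Equal-quality pair: reward predictions that stay within the margin.
        return 1.0 if abs(pred_i - pred_j) <= margin else 0.0
    agree = (pred_i - pred_j) * (mos_i - mos_j) > 0
    return 1.0 if agree else 0.0
```

A triplet variant extends the same idea to three images by checking both pairwise orderings, which is what makes the reward discriminative across a group rather than per sample.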

✨ Update

[2025/10/30] 💻💻💻 We release the training and inference code of PreResVQA-R1 for video quality assessment. To extend beyond static imagery, we introduce a global–temporal and local–spatial data-flow strategy. With only 28K samples, it achieves state-of-the-art performance across 5 VQA datasets while providing an interpretable CoT process.

[2025/10/27] 🤗🤗🤗 We release [PreResIQA-R1-7B], fine-tuned from Qwen2.5-VL-7B-Instruct.

[2025/10/23] 💻💻💻 We release the training and inference code of PreResIQA-R1 for image quality assessment, a preference–response disentangled reinforcement learning framework that unifies score regression and ranking consistency via reasoning-driven optimization. With only 6K samples, it achieves state-of-the-art performance across 10 IQA datasets while providing an interpretable CoT process.

🔧 Environment Setup

Quickly create a conda environment that contains the packages necessary to run our scripts on A100 and A800 GPUs:

conda create -n PreResQ python=3.11
conda activate PreResQ

bash setup.sh

🚀 Quick Training and Inference

1. Quick Reinforcement-Learning Fine-Tuning

For IQA task:

bash run_scripts/KADID-10K/one_node_run_KADID_PreResIQA_R1.sh \
--model_name_or_path [Qwen2.5-VL-7B-Instruct path] \
--image_folders [dataset images path] \
--data_file_paths [JSON MOS ground-truth file path]

For VQA task:

bash run_scripts/KADID-10K/one_node_run_LSVQ_PreResVQA_R1.sh \
--model_name_or_path [your PreResIQA-R1 path] \
--image_folders [dataset images path] \
--data_file_paths [JSON MOS ground-truth file path]
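The exact schema of the MOS ground-truth JSON is defined by the repo's data loader; as a purely hypothetical illustration (field names assumed, not taken from the repo), such a file typically pairs each sample path with its MOS:

```json
[
  {"image": "kadid10k/I01_01_01.png", "mos": 4.43},
  {"image": "kadid10k/I01_01_05.png", "mos": 1.12}
]
```

Check the repo's provided data files for the authoritative format before preparing your own.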

2. Quick Batch-Sample Inference

For IQA task:

python src/inference_PreResIQA_R1.py \
--MODEL_PATH [PreResIQA-R1_path] \
--image_root_path [test_image_root_path] \
--output_root_path [output_root_path]

For VQA task:

python src/inference_PreResVQA_R1.py \
--MODEL_PATH [PreResVQA-R1_path] \
--image_root_path [test_image_root_path] \
--output_root_path [output_root_path]
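IQA/VQA results are conventionally evaluated by correlating predicted scores with MOS via SRCC (rank correlation) and PLCC (linear correlation). A self-contained sketch of both metrics (tied ranks are not averaged here, unlike `scipy.stats.spearmanr`):

```python
# Minimal SRCC/PLCC computation for comparing predicted scores with MOS.
def ranks(xs):
    """Rank positions 1..n; ties are assigned in encounter order (no averaging)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank + 1)
    return r

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))
```

In practice `scipy.stats.spearmanr` and `scipy.stats.pearsonr` are the standard implementations; the sketch above just makes the definitions explicit.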

😺 Acknowledgements

We sincerely thank the following outstanding works and contributors:

  1. Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank. Authors: Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, Kede Ma.

  2. VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model. Authors: Haozhan Shen, Peng Liu, Jingcheng Li, Chunxin Fang, Yibo Ma, Jiajia Liao, Qiaoli Shen, Zilun Zhang, Kangjia Zhao, Qianqian Zhang, Ruochen Xu, Tiancheng Zhao.


🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.