CubiD: Cubic Discrete Diffusion for High-Dimensional Representation Tokens
_{Official PyTorch Implementation}

Can we generate high-dimensional semantic representations discretely, just like language models generate text?

Generating high-dimensional semantic representations has long been a pursuit for visual generation, yet discrete methods, the paradigm shared with language models, remain stuck with low-dimensional tokens. CubiD breaks this barrier with fine-grained cubic masking across the h×w×d tensor, directly modeling dependencies across both spatial and dimensional axes in 768 dim representation space, while the discretized tokens preserve their original understanding capabilities.

This is a PyTorch/GPU implementation of the paper Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens:

@article{wang2025cubic,
  title={Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens},
  author={Wang, Yuqing and Ma, Chuofan and Lin, Zhijie and Teng, Yao and Yu, Lijun and Wang, Shuai and Han, Jiaming and Feng, Jiashi and Jiang, Yi and Liu, Xihui},
  journal={arXiv preprint arXiv:2603.19232},
  year={2026}
}

Preparation

Dataset

Download ImageNet dataset, and place it in your IMAGENET_PATH.

Installation

Download the code:

git clone https://github.com/YuqingWang1029/CubiD.git
cd CubiD

Please refer to TokenBridge and RAE for environment setup.

Pre-trained Models

Download pre-trained CubiD models and RAE weights from Hugging Face.

Generation

Evaluation (ImageNet 256x256)

For example, evaluate CubiD-Large (without CFG):

torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
main_cubid.py \
--img_size 256 --encoder_size 224 \
--encoder_name facebook/dinov2-with-registers-base \
--decoder_path ${RAE_DECODER_PATH} \
--stats_path ${RAE_STATS_PATH} \
--vae_embed_dim 768 --vae_stride 14 \
--model cubid_large \
--quant_bits 3 --quant_min -9.0 --quant_max 9.0 \
--eval_bsz 32 --num_images 50000 \
--num_iter 1536 --cfg 1.0 --cfg_schedule constant --temperature 1.0 \
--output_dir ${OUTPUT_DIR} \
--resume cubid_ckpts/cubid_large \
--data_path ${IMAGENET_PATH} --evaluate

The --resume argument points to a folder (e.g., cubid_ckpts/cubid_large), which automatically loads the checkpoint inside.
Generation steps can be set from 256 to 1536. More steps generally lead to better results.

(Optional) Caching RAE Latents

The RAE latents can be pre-computed and saved to CACHED_PATH to accelerate training:

torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
main_cache.py \
--img_size 256 --encoder_size 224 \
--encoder_name facebook/dinov2-with-registers-base \
--decoder_path ${RAE_DECODER_PATH} \
--stats_path ${RAE_STATS_PATH} \
--batch_size 128 \
--data_path ${IMAGENET_PATH} --cached_path ${CACHED_PATH}

Training

Script for the default setting (CubiD-Large, 800 epochs, 64 GPUs):

torchrun --nproc_per_node=8 --nnodes=8 --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \
main_cubid.py \
--img_size 256 --encoder_size 224 \
--encoder_name facebook/dinov2-with-registers-base \
--decoder_path ${RAE_DECODER_PATH} \
--stats_path ${RAE_STATS_PATH} \
--vae_embed_dim 768 --vae_stride 14 --patch_size 1 \
--model cubid_large \
--quant_bits 3 --quant_min -9.0 --quant_max 9.0 \
--mask_ratio_min 0.5 --mask_std 0.1 \
--epochs 800 --warmup_epochs 100 --batch_size 32 --blr 5e-5 --lr_schedule cosine \
--output_dir ${OUTPUT_DIR} --resume ${OUTPUT_DIR} \
--data_path ${IMAGENET_PATH}

(Optional) To train with cached RAE latents, add --use_cached --cached_path ${CACHED_PATH} to the arguments.
(Optional) To save GPU memory during training, add --grad_checkpointing to the arguments.

Acknowledgements

Part of the code is based on MAR and TokenBridge. We use RAE for representation encoding and decoding. Thanks for their awesome work!

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
models		models
util		util
README.md		README.md
demo.jpg		demo.jpg
engine.py		engine.py
main_cache.py		main_cache.py
main_cubid.py		main_cubid.py
rae.py		rae.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CubiD: Cubic Discrete Diffusion for High-Dimensional Representation Tokens
_{Official PyTorch Implementation}

Preparation

Dataset

Installation

Pre-trained Models

Generation

Evaluation (ImageNet 256x256)

(Optional) Caching RAE Latents

Training

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

CubiD: Cubic Discrete Diffusion for High-Dimensional Representation Tokens Official PyTorch Implementation

Preparation

Dataset

Installation

Pre-trained Models

Generation

Evaluation (ImageNet 256x256)

(Optional) Caching RAE Latents

Training

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

CubiD: Cubic Discrete Diffusion for High-Dimensional Representation Tokens
_{Official PyTorch Implementation}

Packages