1 School of Artificial Intelligence, Xidian University, Xi'an, China
2 Guangzhou Institute of Technology, Xidian University, Xi'an, China
Recent diffusion-based methods have made significant progress in Human Image Generation (HIG) with various control signals such as pose priors. In HIG, both accurate human poses and coherent visual quality are crucial. However, most existing methods focus on pose accuracy and neglect overall image quality, often improving pose alignment at the cost of image quality. To address this, we propose Knowledge-Based Global Guidance and Dynamic pose Masking for human image Generation (KB-DMGen). The Knowledge Base (KB), implemented as a visual codebook, provides coarse, global guidance from visual features related to the input text, improving pose accuracy while maintaining image quality. The Dynamic pose Masking (DM) offers fine-grained local control that further improves pose accuracy. By injecting KB and DM at different stages of the diffusion process, our framework enhances pose accuracy through both global and local control without compromising image quality. Experiments demonstrate the effectiveness of KB-DMGen, which achieves new state-of-the-art results in terms of AP and CAP on the Human-Art dataset. The project page and code are available at https://lushbng.github.io/KBDMGen.
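To make the idea of a visual codebook concrete, the following is a minimal sketch of a generic codebook lookup. It is not the KB-DMGen implementation: the cosine-similarity retrieval, the top-k averaging, the function names `cosine` and `query_codebook`, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_codebook(query, codebook, top_k=2):
    """Average the top_k codebook entries most similar to the query.

    Hypothetical retrieval rule: the source only states that the KB is a
    visual codebook queried with text-related visual features.
    """
    ranked = sorted(codebook, key=lambda entry: cosine(query, entry), reverse=True)
    selected = ranked[:top_k]
    return [sum(e[i] for e in selected) / len(selected) for i in range(len(query))]


# Toy 3-dimensional codebook and query feature (illustrative values only).
codebook = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
text_feature = [1.0, 0.05, 0.0]

# Coarse, global guidance vector built from the two closest entries.
guidance = query_codebook(text_feature, codebook, top_k=2)
```

In this toy setting the guidance vector is the mean of the two entries closest to the query; how such a vector would be injected into a diffusion model is outside the scope of the sketch.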
| Dataset | Method | Pose Accuracy: AP(%)↑ | Pose Accuracy: CAP(%)↑ | Pose Accuracy: PCE↓ | Image Quality: FID↓ | Image Quality: KID↓ | T2I Alignment: CLIP-score↑ |
|---|---|---|---|---|---|---|---|
| Human-Art | SD* [StableDiffusion] | 0.24 | 55.71 | 2.30 | 11.53 | 3.36 | 33.33 |
| Human-Art | T2I-Adapter [T2I-Adapter] | 27.22 | 65.65 | 1.75 | 11.92 | 2.73 | 33.27 |
| Human-Art | ControlNet [ControlNet] | 39.52 | 65.19 | 1.54 | 11.01 | 2.23 | 32.65 |
| Human-Art | Uni-ControlNet [Uni-ControlNet] | 41.94 | 69.32 | 1.48 | 14.63 | 2.30 | 32.51 |
| Human-Art | HumanSD [HumanSD] | 44.57 | 69.68 | 1.37 | 10.03 | 2.70 | 32.24 |
| Human-Art | GRPose [GRPose] | 49.50 | 70.84 | 1.43 | 13.76 | 2.53 | 32.31 |
| Human-Art | Stable-Pose [Stable-Pose] | 48.88 | 70.83 | 1.50 | 11.12 | 2.35 | 32.60 |
| Human-Art | KB-DMGen | 53.47 | 72.33 | 1.56 | 10.54 | 2.54 | 32.43 |
| Laion-Human† | SD* [StableDiffusion] | 0.24 | 55.71 | 2.30 | 11.53 | 3.36 | 33.33 |
| Laion-Human† | Stable-Pose [Stable-Pose] | 48.88 | 70.83 | 1.50 | 11.12 | 2.35 | 32.60 |
The table shows that KB-DMGen achieves the best pose accuracy on Human-Art in terms of AP (53.47%) and CAP (72.33%), while its FID (10.54) is second only to that of HumanSD (10.03).
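The per-metric comparison can be checked with a short script. The sketch below copies the Human-Art rows from the table and reports the best method for each metric, respecting the ↑/↓ direction; the names `RESULTS`, `METRICS`, and `best_per_metric` are introduced here for illustration only.

```python
# Human-Art rows from the table: (AP, CAP, PCE, FID, KID, CLIP-score).
RESULTS = {
    "SD*": (0.24, 55.71, 2.30, 11.53, 3.36, 33.33),
    "T2I-Adapter": (27.22, 65.65, 1.75, 11.92, 2.73, 33.27),
    "ControlNet": (39.52, 65.19, 1.54, 11.01, 2.23, 32.65),
    "Uni-ControlNet": (41.94, 69.32, 1.48, 14.63, 2.30, 32.51),
    "HumanSD": (44.57, 69.68, 1.37, 10.03, 2.70, 32.24),
    "GRPose": (49.50, 70.84, 1.43, 13.76, 2.53, 32.31),
    "Stable-Pose": (48.88, 70.83, 1.50, 11.12, 2.35, 32.60),
    "KB-DMGen": (53.47, 72.33, 1.56, 10.54, 2.54, 32.43),
}

# Metric name and whether a higher value is better (↑) or not (↓).
METRICS = [
    ("AP", True),
    ("CAP", True),
    ("PCE", False),
    ("FID", False),
    ("KID", False),
    ("CLIP-score", True),
]


def best_per_metric(results):
    """Return {metric: method with the best value for that metric}."""
    best = {}
    for idx, (name, higher_is_better) in enumerate(METRICS):
        pick = max if higher_is_better else min
        best[name] = pick(results, key=lambda method: results[method][idx])
    return best


best = best_per_metric(RESULTS)
for metric, method in best.items():
    print(f"{metric}: {method}")
```

Running it confirms that KB-DMGen leads on AP and CAP, while the remaining metrics are led by other methods.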
@article{KB-DMGen,
title={KB-DMGen: Knowledge-Based Global Guidance and Dynamic Pose Masking for Human Image Generation},
author={Liu, Shibang and Xie, Xuemei and Shi, Guangming},
journal={arXiv preprint arXiv:2507.20083},
year={2025}
}