Image customization, a crucial technique for industrial media production, aims to generate content consistent with reference images. However, current approaches conventionally split image customization into separate position-aware and position-free paradigms and lack a universal framework covering both, which limits their applicability across diverse scenarios.
To overcome these limitations, we propose IC-Custom, a unified framework that seamlessly integrates position-aware and position-free image customization through in-context learning. IC-Custom concatenates reference images with target images into a polyptych, leveraging DiT's multi-modal attention mechanism for fine-grained token-level interactions. We introduce the In-context Multi-Modal Attention (ICMA) mechanism, which uses learnable task-oriented register tokens and boundary-aware positional embeddings to let the model correctly handle different task types and distinguish the various inputs within a polyptych. To bridge the data gap, we carefully curated a high-quality dataset of 12k identity-consistent samples, comprising 8k samples from real-world sources and 4k high-quality synthetic samples, avoiding the overly glossy, over-saturated appearance typical of synthetic data.
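The core idea, concatenating reference and target tokens into one sequence, prefixing task-oriented register tokens, and offsetting each panel with a boundary-aware embedding before joint attention, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; all module names, shapes, and hyperparameters here are assumptions.

```python
import torch
import torch.nn as nn

class ICMASketch(nn.Module):
    """Illustrative sketch of In-context Multi-Modal Attention (ICMA).

    Reference and target image tokens are concatenated into one
    'polyptych' sequence, prefixed with learnable task-oriented
    register tokens, and shifted by boundary-aware panel embeddings
    so attention can distinguish which panel each token came from.
    Names and dimensions are hypothetical, not from the paper.
    """

    def __init__(self, dim=64, heads=4, num_tasks=2, num_registers=4):
        super().__init__()
        # One set of learnable register tokens per task type
        # (e.g. position-aware vs. position-free customization).
        self.registers = nn.Parameter(
            torch.randn(num_tasks, num_registers, dim) * 0.02)
        # Boundary-aware embeddings: one learned offset per panel
        # (index 0 = reference panel, index 1 = target panel).
        self.panel_embed = nn.Parameter(torch.randn(2, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ref_tokens, tgt_tokens, task_id):
        # ref_tokens, tgt_tokens: (batch, num_tokens, dim)
        batch = ref_tokens.size(0)
        ref = ref_tokens + self.panel_embed[0]   # mark reference panel
        tgt = tgt_tokens + self.panel_embed[1]   # mark target panel
        reg = self.registers[task_id].unsqueeze(0).expand(batch, -1, -1)
        # Polyptych sequence: [registers | reference | target]
        seq = torch.cat([reg, ref, tgt], dim=1)
        out, _ = self.attn(seq, seq, seq)        # token-level interaction
        # Return only the updated target-panel tokens.
        return out[:, -tgt_tokens.size(1):]
```

In a full DiT, this joint attention would run inside every transformer block (with text tokens in the same sequence); the sketch isolates only the polyptych-plus-registers layout.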
IC-Custom supports various industrial applications, including try-on, accessory placement, furniture arrangement, and IP customization. Extensive evaluations on our proposed ProductBench and the publicly available DreamBench show that IC-Custom significantly outperforms community workflows, closed-source models such as GPT-4o, and state-of-the-art open-source approaches. IC-Custom achieves approximately 73% higher human preference across identity consistency, harmonicity, and text alignment, while training only 0.4% of the original model's parameters.
Hover over the images to see the generated results. Our model seamlessly integrates reference content into target scenes.
Hover over the reference images to see the generated results. Our model creates images from text prompts while maintaining reference identity.
@article{li2025iccustom,
  title={IC-Custom: Diverse Image Customization via In-Context Learning},
  author={Li, Yaowei and Li, Xiaoyu and Zhang, Zhaoyang and Bian, Yuxuan and Liu, Gan and Li, Xinyuan and Xu, Jiale and Hu, Wenbo and Liu, Yating and Li, Lingen and Cai, Jing and Zou, Yuexian and He, Yancheng and Shan, Ying},
  journal={arXiv preprint arXiv:2507.00000},
  year={2025}
}