IC-Custom: Diverse Image Customization via In-Context Learning

IC-Custom Team

Abstract

Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally split image customization into separate position-aware and position-free paradigms and lack a universal framework for diverse customization, which limits their applicability across scenarios.

To overcome these limitations, we propose IC-Custom, a unified framework that integrates position-aware and position-free image customization through in-context learning. IC-Custom concatenates reference images with target images into a polyptych and leverages DiT's multi-modal attention mechanism for fine-grained token-level interactions. We introduce the In-context Multi-Modal Attention (ICMA) mechanism, which uses learnable task-oriented register tokens and boundary-aware positional embeddings so that the model handles different task types correctly and distinguishes the individual inputs in a polyptych configuration. To bridge the data gap, we curate a high-quality dataset of 12k identity-consistent samples, 8k from real-world sources and 4k from high-quality synthetic data, avoiding an overly glossy, over-saturated synthetic appearance.
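To make the polyptych idea concrete, here is a minimal sketch of laying a reference panel and a target panel side by side on one canvas. It is an illustration under stated assumptions, not the IC-Custom implementation: images are toy nested lists of grayscale values, and the names `make_polyptych` and `panel_ids` are hypothetical. The per-column panel index is only a simple stand-in for the kind of information that lets a model tell the panels apart; it is not the boundary-aware positional embedding described above.

```python
from typing import List

# Toy grayscale image: a list of rows, each row a list of pixel values.
Image = List[List[int]]


def make_polyptych(panels: List[Image]) -> Image:
    """Concatenate equally tall panels left to right into one canvas."""
    if not panels:
        raise ValueError("need at least one panel")
    height = len(panels[0])
    if any(len(panel) != height for panel in panels):
        raise ValueError("all panels must have the same height")
    # Join row r of every panel into row r of the canvas.
    return [sum((panel[r] for panel in panels), []) for r in range(height)]


def panel_ids(panels: List[Image]) -> List[int]:
    """Return, for each canvas column, the index of the panel it came from.

    Hypothetical helper: a crude marker of where one panel ends and the
    next begins, used here only for illustration.
    """
    ids: List[int] = []
    for index, panel in enumerate(panels):
        ids.extend([index] * len(panel[0]))
    return ids


reference = [[1, 2], [3, 4]]         # 2x2 reference panel
target = [[0, 0, 0], [0, 0, 0]]      # 2x3 target panel (to be generated)

canvas = make_polyptych([reference, target])
print(canvas)                           # [[1, 2, 0, 0, 0], [3, 4, 0, 0, 0]]
print(panel_ids([reference, target]))   # [0, 0, 1, 1, 1]
```

With two panels the canvas is a diptych; passing more reference panels to the same function gives a wider polyptych.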

IC-Custom supports various industrial applications, including try-on, accessory placement, furniture arrangement, and IP customization. Extensive evaluations on our proposed ProductBench and the publicly available DreamBench demonstrate that IC-Custom significantly outperforms community workflows, closed-source models like GPT-4o, and state-of-the-art open-source approaches. IC-Custom achieves approximately 73% higher human preference across identity consistency, harmonicity, and text alignment metrics, while training only 0.4% of the original model parameters.


Position-aware Image Customization

Our model seamlessly integrates reference content into target scenes.


Position-free Image Customization

Our model creates images based on text prompts while maintaining reference identity.

Reference Image Generated Image

"Soft plush toy is joyfully wandering through a lush jungle..."

"The long-haired dachshund lies in a sunny garden..."

"A cat is lying on some ancient books..."

"...rests on a rustic wooden table, filled with fresh blueberries that glisten in the morning sunlight..."

"The bright yellow alarm clock is perched on a snowy mountain peak at sunrise..."

"The elegant vase stands in the center of a dining table..."

"A Lego figure is sitting on a weathered wooden park bench, surrounded by lush green grass and blooming flowers"

"The crocheted gingerbread man is perched on a tree branch in a dense forest..."

"The kitten is lounging on a lush, green meadow surrounded by wildflowers..."


BibTeX

@article{li2025iccustom,
  title={IC-Custom: Diverse Image Customization via In-Context Learning},
  author={Li, Yaowei and Li, Xiaoyu and Zhang, Zhaoyang and Bian, Yuxuan and Liu, Gan and Li, Xinyuan and Xu, Jiale and Hu, Wenbo and Liu, Yating and Li, Lingen and Cai, Jing and Zou, Yuexian and He, Yancheng and Shan, Ying},
  journal={arXiv preprint arXiv:2507.00000},
  year={2025}
}