
Details

Author(s) / Contributors
Title
Separate-and-Enhance: Compositional Finetuning for Text-to-Image Diffusion Models
Is Part of
  • ACM SIGGRAPH 2024 Conference Papers, 2024, p.1-10
Place / Publisher
New York, NY, USA: ACM
Year of Publication
2024
Source
ACM Digital Library
Descriptions / Notes
  • Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. In this work, we first show the fundamental reasons for such misalignment by identifying issues related to low attention activation and mask overlaps. Then we propose a compositional finetuning framework with two novel objectives, the Separate loss and the Enhance loss, which reduce object mask overlaps and maximize attention scores, respectively. Unlike conventional test-time adaptation methods, our model, once finetuned on critical parameters, can directly perform inference given an arbitrary multi-object prompt, which enhances scalability and generalizability. Through comprehensive evaluations, our model demonstrates superior performance in image realism, text-image alignment, and adaptability, significantly surpassing established baselines. Furthermore, we show that training our model with a diverse range of concepts enables it to generalize effectively to novel concepts, exhibiting enhanced performance compared to models trained on individual concept pairs.
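  • To make the two objectives concrete, the following is a minimal NumPy sketch of what losses of this kind could look like. It is an illustrative assumption, not the paper's implementation: the function names, the normalized-overlap formulation of the Separate loss, and the peak-activation formulation of the Enhance loss are all hypothetical stand-ins operating on per-token cross-attention maps.

```python
import numpy as np

def separate_loss(attn_a, attn_b):
    # Hypothetical "Separate"-style objective: penalize spatial overlap
    # between two object tokens' cross-attention maps by summing the
    # pointwise minimum of the maps after normalizing each to sum to 1.
    # Returns 0 for disjoint maps, 1 for identical maps.
    a = attn_a / (attn_a.sum() + 1e-8)
    b = attn_b / (attn_b.sum() + 1e-8)
    return float(np.minimum(a, b).sum())

def enhance_loss(attn_maps):
    # Hypothetical "Enhance"-style objective: encourage each object token
    # to have a strong peak activation. The loss decreases as the maximum
    # attention score of each map approaches 1.
    return float(np.mean([1.0 - m.max() for m in attn_maps]))

# Usage: two 4x4 attention maps peaking at opposite corners.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[3, 3] = 1.0
print(separate_loss(a, b))   # disjoint maps -> 0.0
print(enhance_loss([a, b]))  # both peaks already at 1.0 -> 0.0
```

    In a finetuning loop, objectives of this form would be summed over all object-token pairs and backpropagated into the model parameters rather than evaluated on fixed arrays as here.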
Language
English
Identifiers
ISBN: 9798400705250
DOI: 10.1145/3641519.3657527
Titel-ID: cdi_acm_books_10_1145_3641519_3657527
