
Details

Author(s) / Contributors
Title
Separate-and-Enhance: Compositional Finetuning for Text-to-Image Diffusion Models
Is Part of
  • ACM SIGGRAPH 2024 Conference Papers, 2024, p.1-10
Place / Publisher
New York, NY, USA: ACM
Year of Publication
2024
Source
ACM Digital Library
Descriptions / Notes
  • Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for multi-object generation. In this work, we first show the fundamental reasons for such misalignment by identifying issues related to low attention activation and mask overlaps. Then we propose a compositional finetuning framework with two novel objectives, the Separate loss and the Enhance loss, which reduce object mask overlaps and maximize attention scores, respectively. Unlike conventional test-time adaptation methods, our model, once finetuned on critical parameters, can directly perform inference given an arbitrary multi-object prompt, which enhances scalability and generalizability. Through comprehensive evaluations, our model demonstrates superior performance in image realism, text-image alignment, and adaptability, significantly surpassing established baselines. Furthermore, we show that training our model with a diverse range of concepts enables it to generalize effectively to novel concepts, exhibiting enhanced performance compared to models trained on individual concept pairs.
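  • To make the two objectives concrete, the following is a minimal NumPy sketch of what losses of this kind could look like. It is an illustrative assumption, not the paper's implementation: the function names, the normalized-overlap formulation of the Separate loss, and the peak-activation formulation of the Enhance loss are all hypothetical stand-ins operating on per-token cross-attention maps.

```python
import numpy as np

def separate_loss(attn_a, attn_b):
    # Hypothetical "Separate"-style objective: penalize spatial overlap
    # between two object tokens' cross-attention maps by summing the
    # pointwise minimum of the maps after normalizing each to sum to 1.
    # Returns 0 for disjoint maps, 1 for identical maps.
    a = attn_a / (attn_a.sum() + 1e-8)
    b = attn_b / (attn_b.sum() + 1e-8)
    return float(np.minimum(a, b).sum())

def enhance_loss(attn_maps):
    # Hypothetical "Enhance"-style objective: encourage each object token
    # to have a strong peak activation. The loss decreases as the maximum
    # attention score of each map approaches 1.
    return float(np.mean([1.0 - m.max() for m in attn_maps]))

# Usage: two 4x4 attention maps peaking at opposite corners.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[3, 3] = 1.0
print(separate_loss(a, b))   # disjoint maps -> 0.0
print(enhance_loss([a, b]))  # both peaks already at 1.0 -> 0.0
```

    In a finetuning loop, objectives of this form would be summed over all object-token pairs and backpropagated into the model parameters rather than evaluated on fixed arrays as here.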
Language
English
Identifiers
ISBN: 9798400705250
DOI: 10.1145/3641519.3657527
Titel-ID: cdi_acm_books_10_1145_3641519_3657527
