Collaposer

Collaposer streamlines asset preparation for collage-based storytelling: given a photo collection and a story description, it automatically selects, cuts out, and presents diverse visual assets for individual creators. From a curated overview of the extracted visual elements, users can compose a static collage or animate the elements to create expressive visual stories.

Abstract

Digital collage is an artistic practice that combines image cutouts to tell stories. However, preparing cutouts from a set of photos remains a tedious and time-consuming task. A formative study identified three main challenges: 1) inefficient search for relevant photos, 2) manual image cutout, and 3) difficulty in organizing large sets of cutouts. To meet these challenges and facilitate asset preparation for collage, we propose Collaposer, a tool that transforms a collection of photos into organized, ready-to-use visual cutouts based on user-provided story descriptions. Collaposer tags, detects, and segments photos, and then uses an LLM to select central and related labels based on the user-provided story description. Collaposer presents the resulting visuals in varying sizes, clustered according to semantic hierarchy. Our evaluation shows that Collaposer effectively automates the preparation process to produce diverse sets of visual cutouts adhering to the storyline, allowing users to focus on collaging these assets for storytelling.

Our pipeline consists of three stages and takes two inputs: an image collection and a story description. In Stage I, valid visual elements are cut out and each is tagged with an object name. In Stage II, the visual elements relevant to the story are selected and clustered into semantic groups; elements classified as characters additionally undergo part segmentation and pose estimation for later manipulation. In Stage III, the visual assets are displayed in a compact view to facilitate navigation and composition.
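
The three stages can be sketched as a simple data flow. This is a minimal illustration only, not Collaposer's implementation: the `Cutout` class, the function names, the `TIER_SIZES` values, and the rule that central elements are shown larger than related ones are all assumptions made for this example. The tagging, detection, and segmentation models and the LLM are replaced by plain inputs (object names per photo and pre-chosen label sets).

```python
from dataclasses import dataclass

# Hypothetical display sizes per semantic tier (pixels); not values from the paper.
TIER_SIZES = {"central": 128, "related": 64}


@dataclass
class Cutout:
    image_id: str               # photo the element was cut from
    label: str                  # object name assigned in Stage I
    is_character: bool = False  # characters would get part segmentation and pose estimation


def stage1_extract(photos):
    """Stage I (stub): one tagged cutout per detected object.

    `photos` maps an image id to the object names found in it; a real system
    would run tagging, detection, and segmentation models here.
    """
    return [Cutout(image_id, label)
            for image_id, labels in photos.items()
            for label in labels]


def stage2_select(cutouts, central, related, character_labels=frozenset()):
    """Stage II (stub): keep story-relevant cutouts, grouped by tier and label.

    `central` and `related` stand in for the label sets an LLM would choose
    from the story description.
    """
    groups = {"central": {}, "related": {}}
    for cutout in cutouts:
        if cutout.label in central:
            tier = "central"
        elif cutout.label in related:
            tier = "related"
        else:
            continue  # not relevant to the story, so dropped
        cutout.is_character = cutout.label in character_labels
        groups[tier].setdefault(cutout.label, []).append(cutout)
    return groups


def stage3_layout(groups, sizes=TIER_SIZES):
    """Stage III (stub): flatten the groups into (image_id, label, size) entries."""
    return [(c.image_id, c.label, sizes[tier])
            for tier, by_label in groups.items()
            for items in by_label.values()
            for c in items]
```

For example, with `photos = {"a.jpg": ["dog", "tree"], "b.jpg": ["dog", "car"]}`, central label `dog`, and related label `tree`, the `car` cutout is discarded and the two `dog` cutouts are laid out ahead of the `tree` cutout at the larger size.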

Animated Collage Results

* Hover over a thumbnail to play the animation (web only).

* Click the thumbnail to see details.

User Evaluation Questionnaire Results

User ratings across four evaluation dimensions (Consistency, Diversity, Presentation, and Usability) covering eleven question items (Q1–Q11) for three system variants: Collaposer (c), Ablated-Select (as), and Ablated-Present (ap). Ratings were collected on a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree). Asterisks (*) indicate statistically significant differences in mean ratings (p < .05 / 3). Overall, the results indicate that Collaposer supports more effective story-aligned asset selection and presentation than the two ablated baselines. Ablated-Select shows the weakest asset-story consistency, often failing to provide relevant elements and occasionally including unrelated ones. Ablated-Present delivers the least satisfactory presentation results, though removing the presentation component has a relatively smaller impact on system usability.
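
The threshold p < .05 / 3 can be read as a significance level of .05 divided across three comparisons. The sketch below shows that check under one assumption not stated in the caption: that the divisor 3 corresponds to the three pairwise comparisons among the variants (c vs. as, c vs. ap, as vs. ap). The function name and the p-values in the usage lines are invented for illustration and are not results from the study.

```python
from itertools import combinations


def flag_significant(p_values, alpha=0.05):
    """Mark each comparison as significant if its p-value is below alpha / m,
    where m is the number of comparisons (here, the number of entries)."""
    threshold = alpha / len(p_values)
    return {pair: p < threshold for pair, p in p_values.items()}


# The three system variants yield three pairwise comparisons.
pairs = list(combinations(["c", "as", "ap"], 2))

# Made-up p-values for illustration only; NOT the study's data.
example_p = dict(zip(pairs, [0.01, 0.02, 0.04]))
flags = flag_significant(example_p)
```

With these placeholder values the threshold is .05 / 3 ≈ .0167, so only the first comparison would receive an asterisk; a p-value of .02 would pass an uncorrected .05 threshold but not the corrected one.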

Usage Scenario Video