Visual Lyrics: Generating Animated Text for Music Lyric Videos with an Augmented Text Editor
Animated lyric videos transform song lyrics into dynamic visual experiences, offering a powerful medium for artistic expression and audience engagement. However, creating these videos is challenging: it requires expertise in audio, typography, graphic design, and animation, which makes it inaccessible to novices. To address this challenge, we introduce Visual Lyrics, a proof-of-concept system for generating animated lyric videos controlled with an augmented text editor interface. We examined existing lyric videos to distill a taxonomy and design guidelines, informing the design of Visual Lyrics. Our key insight is a multimodal music analysis pipeline that builds on the taxonomy and leverages LLMs' strong natural language understanding and code generation capabilities to synthesize creative and semantically meaningful animations. We collected a dataset of over 300 code-driven creative text animations to serve as inspiration for our LLM-driven pipeline, which we open source. In a user study, Visual Lyrics enabled novices to easily create high-quality animated lyric videos with high ratings of enjoyment, inspiration, and exploration.
2026 | David Chuan-En Lin et al. | Carnegie Mellon University | AI-Assisted Creative Writing; Video Production & Editing; Creative Collaboration & Feedback Systems | IUI

RankCut: A Ranking-Based LLM Approach to Extractive Summarization for Transcript-Based Video Editing
Video recordings of interviews, lectures, and meetings contain valuable moments surrounded by less essential talk. Making a shareable and meaningful shorter version of this content requires significant effort because it combines tedious, repeated operations with personal editorial decisions, which require human judgment. We introduce an editing approach that operates on video transcripts and combines a three-stage large language model pipeline with a timeline-anchored, marker-based interface so editors can inspect and refine suggestions before final assembly. The pipeline first produces an overview summary to maximize content coverage, then induces plain-language selection rules that encode editorial intent, and finally applies rule-conditioned ranking on small transcript windows to mitigate long-context limits, yielding strictly extractive, time-aligned spans under duration constraints. The interface displays groupings of short excerpts using markers with priorities and confidence cues, converting opaque model output into verifiable units within standard video editing workflows. On MeetingBank and MeetingBank-QA datasets, our method outperforms practical extractive baselines at matched lengths. In a within-subjects study with experienced video editors familiar with Premiere Pro video editing software, we found that our marker-based interface provided editors higher efficiency, control, and satisfaction than both a manual editing baseline and an opaque auto-cut condition.
2026 | Sana Shah et al. | University of Hamburg | AI-Assisted Writing & Text Generation; Video Production & Editing; Prototyping & User Testing | IUI

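The last stage of the RankCut pipeline (ranking within small transcript windows, then keeping strictly extractive, time-aligned spans under a duration limit) can be illustrated with a minimal sketch. This is not the authors' implementation: `Segment`, `windows`, `select_spans`, and the window and budget parameters are hypothetical names chosen for illustration, and `keyword_score` is a trivial keyword counter standing in for the rule-conditioned LLM ranker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-aligned transcript span (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def windows(segments, size):
    """Split the transcript into consecutive, non-overlapping windows."""
    for i in range(0, len(segments), size):
        yield segments[i:i + size]


def keyword_score(segment, rules):
    """Stand-in for the LLM ranker: counts rule keywords in the segment text."""
    words = segment.text.lower().split()
    return sum(words.count(rule.lower()) for rule in rules)


def select_spans(segments, rules, max_seconds, window_size=4,
                 keep_per_window=1, score=keyword_score):
    """Rank inside each window, then fill a duration budget greedily."""
    candidates = []
    for window in windows(segments, window_size):
        ranked = sorted(window, key=lambda s: score(s, rules), reverse=True)
        candidates.extend(ranked[:keep_per_window])

    chosen, used = [], 0.0
    for seg in sorted(candidates, key=lambda s: score(s, rules), reverse=True):
        # Skip irrelevant spans and anything that would exceed the budget.
        if score(seg, rules) > 0 and used + seg.duration <= max_seconds:
            chosen.append(seg)
            used += seg.duration

    # Return spans in timeline order; the text is copied, never rewritten.
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    transcript = [
        Segment(0, 10, "Welcome everyone to the meeting"),
        Segment(10, 25, "The budget vote passed five to two"),
        Segment(25, 40, "Let us take a short break"),
        Segment(40, 60, "Next the budget amendment on parks"),
    ]
    for seg in select_spans(transcript, ["budget", "vote"], max_seconds=40, window_size=2):
        print(f"{seg.start:>5.1f}-{seg.end:<5.1f} {seg.text}")
```

Because each ranking call sees only one window, the input to the ranker stays short regardless of transcript length, which is the property the abstract attributes to windowed ranking. Swapping `score` for a model-backed function would leave the selection logic unchanged.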
Script2Screen: Supporting Dialogue-Centric Scriptwriting with Interactive Audiovisual Generation
Scriptwriting has traditionally been text-centric, a modality that only partially conveys the produced audiovisual experience. A formative study with professional writers informed us that connecting textual and audiovisual modalities can aid ideation and iteration, especially for writing dialogues. In this work, we present Script2Screen, an AI-assisted tool that integrates scriptwriting with audiovisual scene creation in a unified, synchronized workflow. Focusing on dialogues in scripts, Script2Screen generates expressive scenes with emotional speech and animated characters through a novel text-to-audiovisual-scene pipeline. The user interface provides fine-grained controls, allowing writers to fine-tune audiovisual elements such as character gestures, speech emotions, and camera angles. A user study with both novice and professional writers from various domains demonstrated that Script2Screen’s interactive audiovisual generation enhances the scriptwriting process, facilitating iterative refinement while complementing, rather than replacing, their creative efforts.
2026 | Zhecheng Wang et al. | University of Toronto | AI-Assisted Creative Writing; Video Production & Editing; Creative Collaboration & Feedback Systems | IUI

UI Remix: Supporting UI Design Through Interactive Example Retrieval and Remixing
Designing user interfaces (UIs) is a critical step when launching products, building portfolios, or personalizing projects, yet end users without design expertise often struggle to articulate their intent and to trust design choices. Existing example-based tools either promote broad exploration, which can cause overwhelm and design drift, or require adapting a single example, risking design fixation. We present UI Remix, an interactive system that supports mobile UI design through an example-driven design workflow. Powered by a multimodal retrieval-augmented generation (MMRAG) model, UI Remix enables iterative search, selection, and adaptation of examples at both the global (whole interface) and local (component) level. To foster trust, it presents source transparency cues such as ratings, download counts, and developer information. In an empirical study with 24 end users, UI Remix significantly improved participants' ability to achieve their design goals, facilitated effective iteration, and encouraged exploration of alternative designs. Participants also reported that source transparency cues enhanced their confidence in adapting examples. Our findings suggest new directions for AI-assisted, example-driven systems that empower end users to design with greater control, trust, and openness to exploration.
2026 | Junling Wang et al. | Department of Computer Science, ETH AI Center | Prototyping & User Testing; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration | IUI

Rewriting Video: Text-Driven Reauthoring of Video Footage
Video is a powerful medium for communication and storytelling, yet reauthoring existing footage remains challenging. Even simple edits often demand expertise, time, and careful planning, constraining how creators envision and shape their narratives. Recent advances in generative AI suggest a new paradigm: what if editing a video were as straightforward as rewriting text? To investigate this, we present a tech probe and a study on text-driven video reauthoring. Our approach involves two technical contributions: (1) a generative reconstruction algorithm that reverse-engineers video into an editable text prompt, and (2) an interactive probe, Rewrite Kit, that allows creators to manipulate these prompts. A technical evaluation of the algorithm reveals a critical human-AI perceptual gap. A probe study with 12 creators surfaced novel use cases such as virtual reshooting, synthetic continuity, and aesthetic restyling. It also highlighted key tensions around coherence, control, and creative alignment in this new paradigm. Our work contributes empirical insights into the opportunities and challenges of text-driven video reauthoring, offering design implications for future co-creative video tools.
2026 | Sitong Wang et al. | Columbia University | Generative AI (Text, Image, Music, Video); Video Production & Editing; Creative Collaboration & Feedback Systems | IUI

Feedback by Design: Understanding and Overcoming User Feedback Barriers in Conversational Agents
High-quality feedback is essential for effective human–AI interaction. It bridges knowledge gaps, corrects digressions, and shapes system behavior, both during interaction and throughout model development. Yet despite its importance, human feedback to AI is often infrequent and low quality. This gap motivates a critical examination of human feedback during interactions with AIs. To understand and overcome the challenges preventing users from giving high-quality feedback, we conducted two studies examining feedback dynamics between humans and conversational agents (CAs). Our formative study, through the lens of Grice’s maxims, identified four Feedback Barriers (Common Ground, Verifiability, Communication, and Informativeness) that prevent users from giving high-quality feedback. Building on these findings, we derive three design desiderata and show that systems incorporating scaffolds aligned with these desiderata enabled users to provide higher-quality feedback. Finally, we detail a call for action to the broader AI community for advances in Large Language Model capabilities to overcome Feedback Barriers.
2026 | Nikhil Sharma et al. | Johns Hopkins University | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Explainable AI (XAI) | CHI

VidTune: Creating Video Soundtracks with Generative Music and Video-Based Thumbnails
Music shapes the tone of videos, yet creators struggle to find soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator’s prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track’s valence and energy onto visual cues like color and brightness, and depicts prominent genres and instruments. Creators can refine tracks with natural language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an exploratory case study (N=6), participants found VidTune helpful for efficiently reviewing and comparing music options and described the process as playful and enriching.
2026 | Mina Huh et al. | University of Texas, Austin | Generative AI (Text, Image, Music, Video); Music Composition & Sound Design Tools; Video Production & Editing | CHI

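The VidTune abstract mentions mapping a track's valence and energy onto color and brightness but does not specify the mapping. The sketch below shows one possible mapping purely for illustration. Every detail is an assumption: the function name `track_color`, inputs normalized to the range 0 to 1, the hue range (blue for low valence, yellow for high valence), the fixed saturation, and the brightness floor.

```python
import colorsys


def track_color(valence: float, energy: float) -> str:
    """Map valence to hue and energy to brightness; return a hex color.

    Assumed convention (not from the paper): both inputs lie in [0, 1].
    """
    v = min(max(valence, 0.0), 1.0)  # clamp out-of-range inputs
    e = min(max(energy, 0.0), 1.0)

    # Low valence -> blue (240 degrees), high valence -> yellow (60 degrees).
    hue = (240.0 - 180.0 * v) / 360.0
    # Low energy -> dim, high energy -> bright; never fully black.
    brightness = 0.3 + 0.7 * e

    r, g, b = colorsys.hsv_to_rgb(hue, 0.8, brightness)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


if __name__ == "__main__":
    for name, valence, energy in [("calm and sad", 0.1, 0.2), ("upbeat", 0.9, 0.9)]:
        print(name, track_color(valence, energy))
```

A lookup of this kind keeps thumbnails comparable at a glance: two tracks with similar mood estimates receive similar colors, so differences between candidates show up visually before any audio is played.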
I Can SE Clearly Now: Investigating the Effectiveness of GUI-based Symbolic Execution for Software Vulnerability Discovery
While symbolic execution (SE) can discover software vulnerabilities, it has received limited practical adoption. A key barrier is that SE requires human expertise to understand the program’s state and prioritize paths to analyze. Traditionally, users controlled SE through programmatic API calls, but recent tooling now implements graphical user interfaces (GUIs). However, it is unclear how these new features affect human-SE performance. To understand this impact, we conducted a controlled experiment where 24 vulnerability discovery experts were tasked with analyzing a binary using an SE tool with either API or GUI-based features. From this study, we identify (1) experts' SE process, and (2) the impact of GUI-based features on human-SE performance. Then we propose recommendations to improve SE tool design.
2026 | Yi Jou Li et al. | Arizona State University | Computational Methods in HCI; User Research Methods (Interviews, Surveys, Observation); Prototyping & User Testing | CHI

Narrix: Remixing Narrative Strategies from Examples for Story Writing
Experienced storytellers decompose stories into local narrative strategies and examine how these strategies shape higher-level arcs. This decomposition helps writers recognize patterns in others' work and adapt those patterns to tell new stories. Novices, however, struggle to identify these strategies or to reuse them effectively. We present Narrix, a novel writing tool that helps novice writers recognize narrative strategies in example stories and repurpose these strategies in their own writing. Narrix analyzes strategies in example stories, highlights them with color-coded lexical cues and explanations, and situates them on an interactive story arc for exploration by emotional shifts and turning points. Writers then drag strategies onto multi-dimensional tracks and apply block-scoped edits to revise or continue their drafts through controlled generation steered by specified strategies. In a within-subjects study (N=12), Narrix improved participants' retention, confidence, and creative adaptation of narrative strategies compared to a baseline chat-based writing interface.
2026 | Chao Zhang et al. | Cornell University | AI-Assisted Creative Writing; AI-Assisted Writing & Text Generation; Creative Collaboration & Feedback Systems | CHI

From Conversation to Human-AI Common Ground: Extracting Cognitive Workflows for Reuse in Sense-making Tasks
Knowledge workers increasingly rely on conversational AI for sense-making tasks (e.g., conducting market analysis), yet must repeatedly reconstruct context and intent to meet their goals. A formative study (N=10) showed that workflow reuse with AI often failed. Current tools either only remember preferences or enforce rigid, predefined workflows; neither adapts to evolving goals. We present ThinkFlow, a system that maintains a dynamic common ground through a cognitive workflow schema, enabling users to express intent and AI to adapt and reuse workflows across contexts. An expert-rating study shows that the schema can accurately capture the collocutor's reasoning process and that, when reused for a similar task, it improves the AI's responses compared to when the schema is absent. A user study with eight knowledge workers demonstrates that ThinkFlow supports awareness of evolving workflows, intent expression, and flexible application across contexts.
2026 | Xinyue Chen et al. | University of Michigan | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Participatory Design | CHI

MoSound: An Interactive Tool for Generative Sound Design in Motion Graphics
Motion graphics, which bring logos, text, and other illustrations to life, are greatly enhanced with sound effects. Sound design for motion graphics presents unique challenges due to their short, abstract nature. Sound designers must identify opportunities for adding sound, decide on the sound's character to match the visual graphics, synchronize sounds with events, and align sonic properties with motions. We introduce MoSound, an interactive system that helps with all steps of this creation process. We designed the interface of MoSound based on formative studies with practitioners and implemented the system as a combination of visual event detection, spatial attribute mapping, and generative sound stylization. We demonstrate MoSound on a variety of examples, showing that it is capable of creating high-quality soundtracks while being accessible to novices.
2026 | Jialin Huang et al. | George Mason University | Music Composition & Sound Design Tools; 3D Modeling & Animation; Creative Collaboration & Feedback Systems | CHI

SoundStager: Interactive Design of Story-Driven GenAI Soundscapes for Video
Sound effects (SFX) are critical to video storytelling: they immerse viewers, direct attention, and shape emotion. However, crafting an effective soundscape is difficult: creators must decide how to source, place, layer, and mix sounds to support the narrative. Generative text-to-SFX tools enable users to create custom sounds, but creators often struggle to describe sounds with words and lack control over individual stems in premixed outputs. We propose SoundStager, an AI-assisted tool for designing generative soundscapes for video. SoundStager analyzes the video narrative to create layered audio scenes (of keynote, signal, soundmark, and archetypal sounds) and supports iterative refinement through a combination of conversational and analog controls. SoundStager’s design was informed by formative studies with six professional sound designers, six video creators, and insights from sound design literature. Our user evaluation with twelve video creators shows that SoundStager enables users to quickly create satisfactory soundscapes while retaining creative control.
2026 | Suhyeon Yoo et al. | University of Toronto | Generative AI (Text, Image, Music, Video); Music Composition & Sound Design Tools; Creative Collaboration & Feedback Systems | CHI

Vidmento: Scaffolded Expansion for Video Storytelling with Generative Video
Video storytelling is often constrained by available material, limiting creative expression and leaving undesired narrative gaps. Generative video offers a new way to address these limitations by augmenting captured media with tailored visuals. To explore this potential, we interviewed eight video creators to identify opportunities and challenges in integrating generative video into their workflows. Building on these insights and established filmmaking principles, we developed Vidmento, a tool for authoring hybrid video stories that combine captured and generated media through context-aware expansion. Vidmento surfaces opportunities for story development, generates clips that blend stylistically and narratively with surrounding media, and provides controls for refinement. In a study with 12 creators, Vidmento supported narrative development and exploration by systematically expanding initial materials with generative media, enabling expressive video storytelling aligned with creative intent. We highlight how creators bridge story gaps with generative content and where they find this blending capability most valuable.
2026 | Catherine Yeh et al. | Harvard University | Generative AI (Text, Image, Music, Video); Video Production & Editing; Creative Collaboration & Feedback Systems | CHI

Interview-Informed Generative Agents for Product Discovery: A Validation Study
Large language models (LLMs) have shown strong performance on standardized social science instruments, but their value for product discovery remains unclear. We investigate whether interview-informed generative agents can simulate user responses in concept testing scenarios. Using in-depth workflow interviews with knowledge workers, we created personalized agents and compared their evaluations of novel AI concepts against the same participants’ responses. Our results show that agents are distribution-calibrated but identity-imprecise: they fail to replicate the specific individual they are grounded in, yet approximate population-level response distributions. These findings highlight both the potential and the limits of LLM simulation in design research. While unsuitable as a substitute for individual-level insights, simulation may provide value for early-stage concept screening and iteration, where distributional accuracy suffices. We discuss implications for integrating simulation responsibly into product development workflows.
2026 | Zichao Wang et al. | Adobe | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; User Research Methods (Interviews, Surveys, Observation) | CHI

Notational Animating: An Interactive Approach to Creating and Editing Animation Keyframes
We introduce the concept of notational animating, an interaction paradigm for animation authoring where users sketch high-level notations over static drawings to indicate intended motions, which are then interpreted by automatic methods (e.g., GenAI models) to generate animation keyframes. Sketched notations have long served as cognitive instruments for animators, capturing forces, poses, dynamics, paths, and other animation features. However, our analysis of 135 real-world sketches shows that such notations are often contextual, ambiguous, and combinational. To facilitate interpretation, we first formalize these notations into a structured animation representation (i.e., source, path, and target). We then built an animation authoring system that translates high-level notations into the formalized intended animation, provides dynamic UI widgets for fine-grained parameter control, and establishes a closed feedback loop to resolve ambiguity. Finally, through a preliminary study with animators, we assess the usability of notational animating, reflect on its affordances, and identify its contexts of use.
2026 | Xinyu Shi et al. | University of Waterloo | 3D Modeling & Animation; Creative Collaboration & Feedback Systems | CHI

Improving User Behavior Prediction: Leveraging Annotator Metadata in Supervised Machine Learning Models
Supervised machine-learning models often underperform in predicting user behaviors from conversational text, hindered by poor crowdsourced label quality and low NLP task accuracy. We introduce the Metadata-Sensitive Weighted-Encoding Ensemble Model (MSWEEM), which integrates annotator meta-features like fatigue and speeding. First, our results show MSWEEM outperforms standard ensembles by 14% on held-out data and 12% on an alternative dataset. Second, we find that incorporating signals of annotator behavior, such as speed and fatigue, significantly boosts model performance. Third, we find that annotators with higher qualifications, such as Master's, deliver more consistent and faster annotations. Given the increasing uncertainty over annotation quality, our experiments show that understanding annotator patterns is crucial for enhancing model accuracy in user behavior prediction.
2025 | Lynnette Hui Xian Ng et al. | Communicating properly, interpreting signs | CSCW

PosterMate: Audience-driven Collaborative Persona Agents for Poster Design
Poster design can benefit from synchronous feedback from target audiences. However, gathering audiences with diverse perspectives and reconciling them on design edits can be challenging. Recent generative AI models present opportunities to simulate human-like interactions, but it is unclear how they may be used for feedback processes in design. We introduce PosterMate, a poster design assistant that facilitates collaboration by creating audience-driven persona agents constructed from marketing documents. PosterMate gathers feedback from each persona agent regarding poster components, and stimulates discussion with the help of a moderator to reach a conclusion. These agreed-upon edits can then be directly integrated into the poster design. Through our user study (N=12), we identified the potential of PosterMate to capture overlooked viewpoints, while serving as an effective prototyping tool. Additionally, our controlled online evaluation (N=100) revealed that the feedback from an individual persona agent is appropriate given its persona identity, and the discussion effectively synthesizes the different persona agents' perspectives.
2025 | Donghoon Shin et al. | AI-Assisted Creative Writing; Creative Collaboration & Feedback Systems | UIST

MapStory: Prototyping Editable Map Animations with LLM Agents
We introduce MapStory, an LLM-powered animation prototyping tool that generates editable map animation sequences directly from natural language text by leveraging a dual-agent LLM architecture. Given a user-written script, MapStory automatically produces a scene breakdown, which decomposes the text into key map animation primitives such as camera movements, visual highlights, and animated elements. Our system includes a researcher agent that accurately queries geospatial information by leveraging an LLM with web search, enabling automatic extraction of relevant regions, paths, and coordinates while allowing users to edit and query for changes or additional information to refine the results. Additionally, users can fine-tune parameters of these primitive blocks through an interactive timeline editor. We detail the system’s design and architecture, informed by formative interviews with professional animators and by an analysis of 200 existing map animation videos. Our evaluation, which includes expert interviews (N=5) and a usability study (N=12), demonstrates that MapStory enables users to create map animations with ease, facilitates faster iteration, encourages creative exploration, and lowers barriers to creating map-centric stories.
2025 | Aditya Gunturu et al. | Geospatial & Map Visualization; Computational Methods in HCI | UIST

Morae: Proactively Pausing UI Agents for User Choices
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompts users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.
2025 | Yi-Hao Peng et al. | Intelligent Voice Assistants (Alexa, Siri, etc.); Voice Accessibility | UIST

Refashion: Reconfigurable Garments via Modular Design
While bodies change over time and trends vary, most store-bought clothing comes in fixed sizes and styles and fails to adapt to these changes. Alterations can enable small changes to otherwise static garments, but these changes often require sewing and are non-reversible. We propose a modular approach to garment design that considers resizing, restyling, and reusability earlier in the clothing design process. Our contributions include a compact set of modules and connectors that form the building blocks of modular garments, a method to decompose a garment into modules via integer linear programming, and a digital design tool that supports modular garment design and simulation. Our user evaluation suggests that our approach to modular clothing design can support the creation of a wide range of garments and can help users transform clothing into different sizes and styles while reusing the same building blocks.
2025 | Rebecca Lin et al. | Customizable & Personalized Objects; Design Fiction | UIST