Protosampling: Enabling Free-Form Convergence of Sampling and Prototyping through Canvas-Driven Visual AI Generation
As an emergent process, creativity relies on exploration via sampling and prototyping for problem construction. These activities compile knowledge, provide a context enveloping the solution, and answer questions. With Generative AI, practitioners can go beyond sampling existing media towards instantly generating and remixing new ones. We refer to this convergence as 'Protosampling'. Using existing literature, we ground a definition for protosampling and operationalize it through Atelier, a canvas-like system that leverages a variety of generative image and video models for visual creation. Atelier: (1) blends the spaces for thinking and creation, where both references and generated assets co-exist in one space, (2) provides various encapsulated technical workflows that focus on the activity at hand, and (3) enables navigating emergence through interactive visualizations, smart search, and collections. Protosampling as a lens reframes creative work to emphasize the process itself and how seemingly disjointed thoughts can tightly interweave into a final solution.
2026 | Alicia Guo et al. | Autodesk Research | Generative AI (Text, Image, Music, Video); Creative Collaboration & Feedback Systems; Video Production & Editing | CHI
PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization
In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before full-scale production, yet conventional approaches involve trade-offs in efficiency and expressiveness. Hand-drawn storyboards often lack the spatial precision needed for complex cinematography, while 3D previsualization demands expertise and high-quality rigged assets. To address this gap, we present PrevizWhiz, a system that leverages rough 3D scenes in combination with generative image and video models to create stylized video previews. The workflow integrates frame-level image restyling with adjustable resemblance, time-based editing through motion paths or external video inputs, and refinement into high-fidelity video clips. A study with filmmakers demonstrates that our system lowers technical barriers, accelerates creative iteration, and effectively bridges the communication gap, while also surfacing challenges of continuity, authorship, and ethical consideration in AI-assisted filmmaking.
2026 | Erzhen Hu et al. | Autodesk Research | Generative AI (Text, Image, Music, Video); 3D Modeling & Animation; Creative Collaboration & Feedback Systems | CHI
GroundLink: Exploring How Contextual Meeting Snippets Can Close Common Ground Gaps in Editing 3D Scenes for Virtual Production
Virtual Production (VP) professionals often face challenges accessing tacit knowledge and creative intent, which are important in forming common ground with collaborators and in contributing more effectively and efficiently to the team. From our formative study (N=23) with a follow-up interview (N=6), we identified the significance and prevalence of this challenge. To help professionals access knowledge, we present GroundLink, a Unity add-on that surfaces meeting-derived knowledge directly in the editor to support establishing common ground. It features a meeting knowledge dashboard for capturing and reviewing decisions and comments, constraint-aware feedforward that proactively informs the editor environment, and cross-modal synchronization that provides referential links between the dashboard and the editor. A comparative study (N=12) suggested that GroundLink helps users build common ground with their team while improving perceived confidence and ease of editing the 3D scene. An expert evaluation with VP professionals (N=5) indicated strong potential for GroundLink in real-world workflows.
2026 | Gun Woo (Warren) Park et al. | Autodesk Research | Mixed Reality Workspaces; Creative Collaboration & Feedback Systems; 3D Modeling & Animation | CHI
Lost in Translation: The Value of Verbalizations in Interpreting 3D Computer-Aided Design Workflows
AI assistants are transforming creative and knowledge domains, holding similar promise for mechanical design via 3D CAD software. Yet, current AI assistance for CAD relies on geometry or command history, lacking rich design intent. We investigate think-aloud computing as a lightweight approach to capture designers' spoken intent and inform how future AI assistance could leverage this to provide in-situ feedback. Through a three-part study with 10 designers and 10 experts, we (1) recorded designers' think-aloud verbalizations during 3D modelling, (2) compared expert feedback with and without think-aloud recordings, and (3) interviewed the original designers to evaluate feedback quality. Findings show that verbalizations surface rationale, future actions, and challenges (insights absent from geometric and command data) that enable feedback attuned to designers' goals. By harnessing think-aloud data, we uncover when to intervene, what to prompt, and characteristics of effective feedback, paving the way for context-aware AI assistance for CAD.
2026 | Kathy Cheng et al. | Autodesk Research | Generative AI (Text, Image, Music, Video); AI-Assisted Decision-Making & Automation; Prototyping & User Testing | CHI
PointAloud: An Interaction Suite for AI-Supported Pointer-Centric Think-Aloud Computing
Think-Aloud Computing, a method for capturing users’ verbalized thoughts during software tasks, allows eliciting rich contextual insights into evolving intentions, struggles, and decision-making processes of users in real-time. However, existing approaches face practical challenges: users often lack awareness of what is captured by the system, are not effectively encouraged to speak, and miss or are interrupted by system feedback. Additionally, thinking aloud should feel worthwhile for users due to the gained contextual AI assistance. To better support and harness Think-Aloud Computing, we introduce PointAloud, a suite of novel AI-driven pointer-centric interactions for in-the-moment verbalization encouragement, low-distraction system feedback, and contextually rich work process documentation alongside proactive AI assistance. Our user study with 12 participants provides insights into the value of pointer-centric think-aloud computing for work process documentation and human-AI co-creation. We conclude by discussing the broader implications of our findings and design considerations for pointer-centric and AI-supported Think-Aloud Computing workflows.
2026 | Frederic Gmeiner et al. | Autodesk Research | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Prototyping & User Testing | CHI
PlayWrite: A Multimodal System for AI Supported Narrative Co-Authoring Through Play in XR
Current AI writing tools, which rely on text prompts, poorly support the spatial and interactive nature of storytelling where ideas emerge from direct manipulation and play. We present PlayWrite, a mixed-reality system where users author stories by directly manipulating virtual characters and props. A multi-agent AI pipeline interprets these actions into Intent Frames: structured narrative beats visualized as rearrangeable story marbles on a timeline. A large language model then transforms the user’s assembled sequence into a final narrative. A user study (N=13) with writers from varying domains found that PlayWrite fosters a highly improvisational and playful process. Users treated the AI as a collaborative partner, using its unexpected responses to spark new ideas and overcome creative blocks. PlayWrite demonstrates an approach for co-creative systems that move beyond text to embrace direct manipulation and play as core interaction modalities.
2026 | Esen K. Tütüncü et al. | Autodesk Research | Identity & Avatars in XR; Creative Collaboration & Feedback Systems; Social & Collaborative VR | CHI
WhatIF: Branched Narrative Fiction Visualization for Authoring Emergent Narratives using Large Language Models
Branched Narrative Fiction (BNF) refers to non-linear, text-based narrative games in which the player is an active participant shaping the story. Unlike linear narratives, BNF allows players to influence the direction, outcomes, and progression of the plot. A narrative fiction developer designs these branching storylines, creating a dynamic interaction between the player and the narrative, which requires significant time and skill. In this work, we build and investigate the use of a visual analytics tool to help narrative fiction developers generate and plan these parallel worlds within a BNF. We present WhatIF, a visual analytics tool that helps BNF developers create BNF graphs, edit the graphs, obtain recommendations, visualize differences between storylines, and finally verify their BNF on custom metrics. Through a formative study (3 participants) and a user study (11 participants), we observe that WhatIF helps users plan and prototype their BNF, provides avenues to support iterative refinement of narrative, and also aids in removing writer's block. Furthermore, we explore how contemporary generative AI (GenAI) tools can empower game developers to build richer and more immersive narratives.
2025 | Aditi Mishra et al. | Generative AI (Text, Image, Music, Video); AI-Assisted Creative Writing | C&C
Paratrouper: Exploratory Creation of Character Cast Visuals Using Generative AI
Great characters are critical to the success of many forms of media, such as comics, games, and films. Designing visually compelling casts of characters requires significant skill and consideration, and there is a lack of specialized tools to support this endeavor. We investigate how AI-driven image-generation techniques can empower creatives to explore a variety of visual design possibilities for individual characters and groups of characters. Informed by interviews with character designers, Paratrouper is a multi-modal system that enables creating and experimenting with multiple permutations for character casts and visualizing them in various contexts as part of a holistic approach to design. We demonstrate how Paratrouper supports different aspects of the character design process, and share insights from its use by eight creators. Our work highlights the interplay between creative agency and serendipity, as well as the visual interrelationships among character aesthetics.
2025 | Joanne Leong et al. | MIT, MIT Media Lab | Generative AI (Text, Image, Music, Video); 3D Modeling & Animation | CHI
To Use or Not to Use: Impatience and Overreliance When Using Generative AI Productivity Support Tools
Generative AI has the potential to assist people with completing various tasks, but increased productivity is not guaranteed due to challenges such as uncertainty in output quality and unclear processing time. Through an online crowdsourced experiment (N=508), leveraging a “paint by numbers” task to simulate properties of GenAI assistance, we explore how, and how well, users make decisions on whether to use or not use automation to maximize their productivity given varying waiting times and output quality. We observed gaps between users’ actual choices and their optimal choices and characterized these gaps as the “gulf of impatience” and the “gulf of overreliance”. We also distilled strategies that participants adopted when making their decisions. We discuss design considerations for supporting users in making more informed decisions when interacting with GenAI tools and for making these tools more useful for improving users’ task performance, productivity, and satisfaction.
2025 | Han Qiao et al. | Autodesk Research | Generative AI (Text, Image, Music, Video); AI-Assisted Decision-Making & Automation | CHI
AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors
Tutorial videos are a popular help source for learning feature-rich software. However, getting quick answers to questions about tutorial videos is difficult. We present an automated approach for responding to tutorial questions. By analyzing 633 questions found in 5,944 video comments, we identified different question types and observed that users frequently described parts of the video in questions. We then asked participants (N=24) to watch tutorial videos and ask questions while annotating the video with relevant visual anchors. Most visual anchors referred to UI elements and the application workspace. Based on these insights, we built AQuA, a pipeline that generates useful answers to questions with visual anchors. We demonstrate this for Fusion 360, showing that we can recognize UI elements in visual anchors and generate answers using GPT-4 augmented with that visual information and software documentation. An evaluation study (N=16) demonstrates that our approach provides better answers than baseline methods.
2024 | Saelyne Yang et al. | Autodesk Research, School of Computing, KAIST | Human-LLM Collaboration; Online Learning & MOOC Platforms | CHI
SwitchSpace: Understanding Context-Aware Peeking Between VR and Desktop Interfaces
Cross-reality tasks, like creating or consuming virtual reality (VR) content, often involve inconvenient or distracting switches between desktop and VR. An initial formative study explores cross-reality switching habits, finding most switches are momentary "peeks" between interfaces, with specific habits determined by current context. The results inform a design space for context-aware "peeking" techniques that allow users to view or interact with desktop from VR, and vice versa, without fully switching. We implemented a set of peeking techniques and evaluated them in two levels of a cross-reality task: one requiring only viewing, and another requiring input and viewing. Peeking techniques made task completion faster, with increased input accuracy and reduced perceived workload.
2024 | Johann Wentzel et al. | University of Waterloo | Mixed Reality Workspaces; Context-Aware Computing | CHI
TimeTunnel: Integrating Spatial and Temporal Motion Editing for Character Animation in Virtual Reality
Editing character motion in Virtual Reality is challenging as it requires working with both spatial and temporal data using controls with multiple degrees of freedom. The spatial and temporal controls are separated, making it difficult to adjust poses over time and predict the effects across adjacent frames. To address this challenge, we propose TimeTunnel, an immersive motion editing interface that integrates spatial and temporal control for 3D character animation in VR. TimeTunnel provides an approachable editing experience via KeyPoses and Trajectories. KeyPoses are a set of representative poses automatically computed to concisely depict motion. Trajectories are 3D animation curves that pass through the joints of KeyPoses to represent in-betweens. TimeTunnel integrates spatial and temporal control by superimposing Trajectories and KeyPoses onto a 3D character. We conducted two studies to evaluate TimeTunnel. In our quantitative study, TimeTunnel reduced the amount of time required for editing motion, and saved effort in locating target poses. Our qualitative study with domain experts demonstrated how TimeTunnel is an approachable interface that can simplify motion editing, while still preserving a direct representation of motion.
2024 | Qian Zhou et al. | Autodesk Research | Immersion & Presence Research; 3D Modeling & Animation | CHI
WorldSmith: A Multi-Modal Image Synthesis Tool for Fictional World Building
Crafting a rich and unique environment is crucial for fictional world-building, but can be difficult to achieve since illustrating a world from scratch requires time and significant skill. We investigate the use of recent multi-modal image generation systems to enable users to iteratively visualize and modify elements of their fictional world using a combination of text input, sketching, and region-based filling. WorldSmith enables novice world builders to quickly visualize a fictional world with layered edits and hierarchical compositions. Through a formative study (4 participants) and a first-use study (13 participants), we demonstrate that WorldSmith offers more expressive interactions with prompt-based models. With this work, we explore how creatives can be empowered to leverage prompt-based generative AI as a tool in their creative process, beyond current "click-once" prompting UI paradigms.
2023 | Hai Duong Dang et al. | Generative AI (Text, Image, Music, Video); AI-Assisted Creative Writing; Graphic Design & Typography Tools | UIST
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
Text-to-image AI systems are capable of generating novel images for inspiration, but their applications for 3D design workflows and how designers can build 3D models using AI-provided inspiration have not yet been explored. To investigate this, we integrated DALL-E, GPT-3, and CLIP within CAD software in 3DALL-E, a plugin that generates 2D image inspiration for 3D design. 3DALL-E allows users to construct text and image prompts based on what they are modeling. In a study with 13 designers, we found that designers saw great potential in 3DALL-E within their workflows and could use text-to-image AI to produce reference images, prevent design fixation, and inspire design considerations. We elaborate on prompting patterns observed across 3D modeling tasks and provide measures of prompt complexity observed across participants. From our findings, we discuss how 3DALL-E can merge with existing generative design workflows and propose prompt bibliographies as a form of human-AI design history.
2023 | Vivian Liu et al. | Generative AI (Text, Image, Music, Video); Customizable & Personalized Objects | DIS
Immersive Sampling: Exploring Sampling for Future Creative Practices in Media-Rich, Immersive Spaces
Creative practitioners rely on sampling to understand, explore, and construct problems, or to gather resources for later use. Despite practitioners' ability to experience immersive environments, sampling from them remains limited to primarily visual captures (e.g., screenshots, videos), which overlook the richness and variety of available media. To address these challenges, we describe "Immersive Sampling" as a new way to frame information gathering in the context of immersive environments. In the context of Immersive Sampling, practitioners engage in experiencing immersive environments, while capturing, organizing, revisiting, and remixing found content. We situate this subset of tasks in literature and argue for their importance for emerging, future content creation domains. To further explore how Immersive Sampling might take place, we created VRicolage, a proof-of-concept prototype showcasing a set of interactions in Virtual Reality to sample, revisit, and remix captures. Given the democratization of immersive environments, Immersive Sampling provides practitioners with a means to collect, revisit, and remix digital materials.
2023 | Evgeny Stemasov et al. | Immersion & Presence Research; Interactive Narrative & Immersive Storytelling | DIS
Tesseract: Querying Spatial Design Recordings by Manipulating Worlds in Miniature
New immersive 3D design tools enable the creation of spatial design recordings, capturing collaborative design activities. By reviewing captured spatial design sessions, which include user activities, workflows, and tool use, users can reflect on their own design processes, learn new workflows, and understand others' design rationale. However, finding interesting moments in design activities can be challenging: they contain multimodal data (such as user motion and logged events) occurring over time which can be difficult to specify when searching, and are typically distributed over many sessions or recordings. We present Tesseract, a Worlds-in-Miniature-based system to expressively query VR spatial design recordings. Tesseract consists of the Search Cube interface acting as a centralized stage-to-search container, and four querying tools for specifying multimodal data to enable users to find interesting moments in past design activities. We studied ten participants who used Tesseract and found support for our miniature-based stage-to-search approach.
2023 | Karthik Mahadevan et al. | University of Toronto | Mixed Reality Workspaces; Computational Methods in HCI | CHI
AvatAR: An Immersive Analysis Environment for Human Motion Data Combining Interactive 3D Avatars and Trajectories
Analysis of human motion data can reveal valuable insights about the utilization of space and interaction of humans with their environment. To support this, we present AvatAR, an immersive analysis environment for the in-situ visualization of human motion data that combines 3D trajectories, virtual avatars of people’s movement, and a detailed representation of their posture. Additionally, we describe how to embed visualizations directly into the environment, showing what a person looked at or what surfaces they touched, and how the avatar’s body parts can be used to access and manipulate those visualizations. AvatAR combines an AR HMD with a tablet to provide both mid-air and touch interaction for system control, as well as an additional overview to help users navigate the environment. We implemented a prototype and present several scenarios to show that AvatAR can enhance the analysis of human motion data by making data not only explorable, but experienceable.
2022 | Patrick Reipschläger et al. | Autodesk Research, Technische Universität Dresden | Human Pose & Activity Recognition; Social & Collaborative VR; AR Navigation & Context Awareness | CHI
Supercharging Trial-and-Error for Learning Complex Software Applications
Despite an abundance of carefully-crafted tutorials, trial-and-error remains many people’s preferred way to learn complex software. Yet, approaches to facilitate trial-and-error (such as tooltips) have evolved very little since the 1980s. While existing mechanisms work well for simple software, they scale poorly to large feature-rich applications. In this paper, we explore new techniques to support trial-and-error in complex applications. We identify key benefits and challenges of trial-and-error, and introduce a framework with a conceptual model and design space. Using this framework, we developed three techniques: ToolTrack to keep track of trial-and-error progress; ToolTrip to go beyond trial-and-error of single commands by highlighting related commands that are frequently used together; and ToolTaste to quickly and safely try commands. We demonstrate how these techniques facilitate trial-and-error, as illustrated through a proof-of-concept implementation in the CAD software Fusion 360. We conclude by discussing possible scenarios and outline directions for future research on trial-and-error.
2022 | Damien Masson et al. | Autodesk Research, University of Waterloo | Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS); Privacy by Design & User Control; Knowledge Worker Tools & Workflows | CHI
In-Depth Mouse: Integrating Desktop Mouse into Virtual Reality
Virtual Reality (VR) has potential for productive knowledge work; however, midair pointing with controllers or hand gestures does not offer the precision and comfort of traditional 2D mice. Directly integrating mice into VR is difficult as selecting targets in a 3D space is negatively impacted by binocular rivalry, perspective mismatch, and improperly calibrated control-display (CD) gain. To address these issues, we developed Depth-Adaptive Cursor, a 2D-mouse driven pointing technique for 3D selection with depth-adaptation that continuously interpolates the cursor depth by inferring what users intend to select based on the cursor position, the viewpoint, and the selectable objects. Depth-Adaptive Cursor uses a novel CD gain tool to compute a usable range of CD gains for general mouse-based pointing in VR. A user study demonstrated that Depth-Adaptive Cursor significantly improved performance compared with an existing mouse-based pointing technique without depth-adaptation in terms of time (21.2%), error (48.3%), perceived workload, and user satisfaction.
2022 | Qian Zhou et al. | Autodesk Research | Eye Tracking & Gaze Interaction; Mixed Reality Workspaces | CHI
"I don't want to feel like I'm working in a 1960s factory": The Practitioner Perspective on Creativity Support Tool AdoptionWith the rapid development of creativity support tools, creative practitioners (e.g., designers, artists, architects) have to constantly explore and adopt new tools into their practice. While HCI research has focused on developing novel creativity support tools, little is known about creative practitioner's values when exploring and adopting these tools. We collect and analyze 23 videos, 13 interviews, and 105 survey responses of creative practitioners reflecting on their values to derive a value framework. We find that practitioners value the tools' functionality, integration into their current workflow, performance, user interface and experience, learning support, costs and emotional connection, in that order. They largely discover tools through personal recommendations. To help unify and encourage reflection from the wider community of CST stakeholders (e.g., systems creators, researchers, marketers, educators), we situate the framework within existing research on systems, creativity support tools and technology adoption.2022SPSrishti Palani et al.Autodesk Research, University of CaliforniaGenerative AI (Text, Image, Music, Video)Creative Collaboration & Feedback SystemsCHI