Canvas3D: Empowering Precise Spatial Control for Image Generation with Constraints from a 3D Virtual Canvas
Generative AI (GenAI) has significantly advanced the ease and flexibility of image creation. However, it remains a challenge to precisely control spatial compositions, including object arrangement and scene conditions. To bridge this gap, we propose Canvas3D, an interactive system leveraging a 3D engine to enable precise spatial manipulation for image generation. Upon user prompt, Canvas3D automatically converts textual descriptions into interactive objects within a 3D engine-driven virtual canvas, empowering direct and precise spatial configuration. These user-defined arrangements generate explicit spatial constraints that guide generative models in accurately reflecting user intentions in the resulting images. We conducted a closed-ended comparative study between Canvas3D and a baseline system, and an open-ended, free-form study to assess overall system usability. The results indicate that Canvas3D outperforms the baseline on spatial control, interactivity, and overall user experience.
2026 | Yuzhao Chen et al. | Purdue University | Generative AI (Text, Image, Music, Video); 3D Modeling & Animation; Creative Collaboration & Feedback Systems | IUI

JustShape: Exploring Co-Speech Gestures for Multimodal LLM-Powered 3D Parametric Modeling
Parametric modeling is a prevailing 3D modeling approach in design, architecture, and engineering. The emergence of multimodal large language models (LLMs) brings a new opportunity to lower the entry barriers to this powerful tool. However, describing 3D geometries through natural language can be fuzzy and challenging. We introduce co-speech gesture, a natural and expressive interaction modality to complement text prompts for LLM-empowered generative parametric modeling. We first conducted an elicitation study to explore and categorize co-speech gesture expressions. Based on the findings, we designed a multimodal fusion pipeline that parametrizes gestures and synthesizes them with speech. This approach reduces language ambiguity by translating implicit user intentions into explicit parametric attributes, thus lifting the model generation performance. We conducted a two-session user study testing and comparing it with traditional language and sketch inputs. This work streamlines the parametric modeling workflow and explores novel multimodal interaction paradigms for LLM-empowered design and creation.
2026 | Runlin Duan et al. | Purdue University | Hand Gesture Recognition; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration | CHI

ARify: Leveraging Narrated Instructional Videos to Create Augmented Reality Tutorials for Procedural Tasks
Augmented Reality (AR) tutorials enhance procedural task learning by providing situated, step-by-step guidance. Yet, creating such tutorials requires AR authoring expertise, posing a significant entry barrier. To lower this barrier, we introduce ARify, an authoring system that semi-automatically transforms narrated instructional videos into AR tutorials. To guide system design, we conducted a content analysis of video tutorials and derived a design space of instructional intents, tactics, and AR representations. Building on this, ARify generates AR tutorials by integrating a vision-language model to plan tutorial structures and an AR builder to configure AR representations, and offers interfaces that allow users to refine and customize the results. A numerical study on three machine tasks and a user study with 18 participants showed that ARify achieves promising performance across task types, and allows novices to author effective AR tutorials, validating its effectiveness and usability.
2026 | Xiyun Hu et al. | Purdue University | AR Navigation & Context Awareness; Prototyping & User Testing; Mixed Reality Workspaces | CHI

AgentCoach: LLM-Based Adaptive Coaching Feedback for Motor Skill Learning
We present AgentCoach, an LLM-powered system that provides adaptive feedback for motor skill learning from tutorial videos. The system works by extracting key coaching points (CPs) and compiling CP-specific evaluators that map each cue to measurable kinematic parameters. This process allows AgentCoach to connect high-level semantic meaning with low-level postural estimation for accurate, context-aware evaluation. During practice, learners receive concise visual diagnostics of their mistakes paired with prescriptive verbal feedback that adapts based on their performance history. We technically validate the CP extraction and evaluator compilation across a wide range of common sports and exercise videos. A user study confirms the system's usability and shows the potential effectiveness of its adaptive feedback across multiple skills.
2026 | Dizhi Ma et al. | Purdue University | Human Pose & Activity Recognition; Fitness Tracking & Physical Activity Monitoring; Behavior Change & Reflection Technology | CHI

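To make the idea of mapping a coaching cue to a measurable kinematic parameter concrete, here is a minimal sketch. It is not taken from AgentCoach: the function names, the pose format (a dictionary of 2D keypoints), and the angle thresholds are all hypothetical, and the abstract does not say which parameters or thresholds the system actually uses. The sketch assumes a cue such as "keep the knee bent at roughly a right angle" can be checked as a joint angle computed from three pose keypoints.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide with the joint")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def make_angle_evaluator(joints, low_deg, high_deg):
    """Build an evaluator for one coaching point (hypothetical design).

    joints: names of three keypoints; the angle is measured at the middle one.
    low_deg, high_deg: acceptable range for that angle, in degrees.
    """
    def evaluate(pose):
        angle = joint_angle(*(pose[name] for name in joints))
        return {"angle": angle, "ok": low_deg <= angle <= high_deg}
    return evaluate


# Hypothetical coaching point: knee angle should stay between 80 and 100 degrees.
knee_check = make_angle_evaluator(("hip", "knee", "ankle"), 80, 100)
result = knee_check({"hip": (1.0, 0.0), "knee": (0.0, 0.0), "ankle": (0.0, 1.0)})
print(result["ok"], round(result["angle"], 1))
```

A closure per coaching point keeps each check independent, so a list of such evaluators could be run over every frame of estimated poses; how the real system compiles and combines its evaluators is not described in the abstract.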
AmIWrite: Exploring Scalable One-on-One Handwriting-Based Tutoring for Mathematical Problem-Solving with an LLM-Powered AI Tutor
Real-time handwriting interactions between tutors and students, in which tutors observe individual problem-solving processes, provide personalized annotations, and adapt explanations based on students' work, are fundamental to effective STEM tutoring. However, scaling such personalized handwriting-based tutoring remains challenging: human tutors cannot be available to every student on demand, and current online platforms often fail to recreate equivalent learning experiences. As an initial step toward tackling this challenge, we present AmIWrite, an LLM-powered AI tutoring system for mathematical problem-solving that provides real-time co-speech handwriting interactions on tablet devices, instantiated here as a case study in linear algebra. We conducted a within-subjects study (N = 40) comparing AmIWrite to a text-based AI tutor on two linear algebra topics. Our case study demonstrates how a multimodal AI tutor can preserve the pedagogical benefits of handwriting-based math tutoring and offer a potential path toward more scalable one-on-one STEM tutoring.
2026 | Ziyi Liu et al. | Purdue University | Hand Gesture Recognition; Intelligent Tutoring Systems & Learning Analytics; Tangible Interaction in Education | CHI

agentAR: Creating Augmented Reality Applications with Tool-Augmented LLM-based Autonomous Agents
Creating Augmented Reality (AR) applications requires expertise in both design and implementation, posing significant barriers to entry for non-expert users. While existing methods reduce some of this burden, they often fall short in flexibility or usability for complex or varied use cases. To address this, we introduce agentAR, an AR authoring system that leverages a tool-augmented large language model (LLM)-based autonomous agent to support end-to-end, in-situ AR application creation from natural language input. Built on an application structure and tool library derived from state-of-the-art AR research, the agent autonomously creates AR applications from natural language dialogue. We demonstrate the effectiveness of agentAR through a case study of six AR applications and a user study with twelve participants, showing that it significantly reduces user effort while supporting the creation of diverse and functional AR experiences.
2025 | Chenfei Zhu et al. | AR Navigation & Context Awareness; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration | UIST

GesPrompt: Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality
Large Language Model (LLM)-based copilots have shown great potential in Extended Reality (XR) applications. However, users face challenges when describing 3D environments to the copilots because spatial-temporal information is difficult to convey through text or speech alone. To address this, we introduce GesPrompt, a multimodal XR interface that combines co-speech gestures with speech, allowing end-users to communicate more naturally and accurately with LLM-based copilots in XR environments. GesPrompt extracts spatial-temporal references from co-speech gestures, reducing the need for precise textual prompts and minimizing cognitive load for end-users. Our contributions include (1) a workflow to integrate gesture and speech input in the XR environment, (2) a prototype VR system that implements the workflow, and (3) a user study demonstrating its effectiveness in improving user communication in VR environments.
2025 | Xiyun Hu et al. | Hand Gesture Recognition; Mixed Reality Workspaces; Human-LLM Collaboration | DIS

DesignFromX: Empowering Consumer-Driven Design Space Exploration through Feature Composition of Referenced Products
Industrial products are designed to satisfy the needs of consumers. The rise of generative artificial intelligence (GenAI) enables consumers to easily modify a product by prompting a generative model, opening up opportunities to incorporate consumers in exploring the product design space. However, consumers often struggle to articulate their preferred product features due to their unfamiliarity with terminology and their limited understanding of the structure of product features. We present DesignFromX, a system that empowers consumer-driven design space exploration by helping consumers to design a product based on their preferences. Leveraging an effective GenAI-based framework, the system allows users to easily identify design features from product images and compose those features to generate conceptual images and 3D models of a new product. A user study with 24 participants demonstrates that DesignFromX lowers the barriers and frustration for consumer-driven design space explorations by enhancing both engagement and enjoyment for the participants.
2025 | Runlin Duan et al. | Generative AI (Text, Image, Music, Video); Creative Collaboration & Feedback Systems; Customizable & Personalized Objects | DIS

CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence
Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in Generative Artificial Intelligence (GenAI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that current AIGC lacks the contextual information needed to adapt to varying application scenarios and is therefore limited for authoring. To utilize the strong generative power of GenAI to ease the authoring of AR instruction while capturing the context, we developed CARING-AI, an AR system to author context-aware humanoid-avatar-based instructions with GenAI. By navigating in the environment, users naturally provide contextual information to generate humanoid-avatar animation as AR instructions that blend in the context spatially and temporally. We showcased three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions based on a design space of AIGC in AR Instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the ease and effectiveness of authoring with GenAI.
2025 | Jingyu Shi et al. | Purdue University, Elmore Family School of Electrical and Computer Engineering | AR Navigation & Context Awareness; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration | CHI

Transparent Barriers: Natural Language Access Control Policies for XR-Enhanced Everyday Objects
Extended Reality (XR)-enabled headsets, which overlay digital content onto the physical world, are gradually finding their way into our daily lives. This integration raises significant concerns about privacy and access control, especially in shared spaces where XR applications interact with everyday objects. Such issues remain subtle in the absence of widespread XR applications, and studies in shared spaces are required for smooth progress. This study evaluated a prototype system facilitating natural language policy creation for flexible, context-aware access control of personal objects. We assessed its usability, focusing on balancing precision and user effort in creating access control policies. Qualitative interviews and task-based interactions provided insights into users' preferences and behaviors, informing future design directions. Findings revealed diverse user needs for controlling access to personal items in various situations, emphasizing the need for flexible, user-friendly access control in XR-enhanced shared spaces that respects boundaries and considers social contexts.
2025 | Kentaro Taninaka et al. | Keio University, Graduate School of Media and Governance | AR Navigation & Context Awareness; Privacy by Design & User Control | CHI

avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality
Table tennis stroke training is a critical aspect of player development. We designed a new augmented reality (AR) system, avaTTAR, for table tennis stroke training. The system provides both “on-body” (first-person view) and “detached” (third-person view) visual cues, enabling users to visualize target strokes and correct their attempts effectively with this dual-perspective setup. By employing a combination of pose estimation algorithms and IMU sensors, avaTTAR captures and reconstructs the 3D body pose and paddle orientation of users during practice, allowing real-time comparison with expert strokes. Through a user study, we affirm avaTTAR's capacity to amplify player experience and training results.
2024 | Dizhi Ma et al. | Full-Body Interaction & Embodied Input; AR Navigation & Context Awareness; VR Medical Training & Rehabilitation | UIST

ChatDirector: Enhancing Video Conferencing with Space-Aware Scene Rendering and Speech-Driven Layout Transition
Remote video conferencing systems (RVCS) are widely adopted in personal and professional communication. However, they often lack the co-presence experience of in-person meetings. This is largely due to the absence of intuitive visual cues and clear spatial relationships among remote participants, which can lead to speech interruptions and loss of attention. This paper presents ChatDirector, a novel RVCS that overcomes these limitations by incorporating space-aware visual presence and speech-aware attention transition assistance. ChatDirector employs a real-time pipeline that converts participants' RGB video streams into 3D portrait avatars and renders them in a virtual 3D scene. We also contribute a decision tree algorithm that directs the avatar layouts and behaviors based on participants' speech states. We report on results from a user study (N=16) where we evaluated ChatDirector. The satisfactory algorithm performance and complimentary subjective user feedback imply that ChatDirector significantly enhances communication efficacy and user engagement.
2024 | Xun Qian et al. | Purdue University | Social & Collaborative VR; Mixed Reality Workspaces | CHI

ClassMeta: Designing Interactive Virtual Classmate to Promote VR Classroom Participation
Peer influence plays a crucial role in promoting classroom participation, where behaviors from active students can contribute to a collective classroom learning experience. However, the presence of these active students depends on several conditions and is not consistently available across all circumstances. Recently, Large Language Models (LLMs) such as GPT have demonstrated the ability to simulate diverse human behaviors convincingly due to their capacity to generate contextually coherent responses based on their role settings. Inspired by this advancement in technology, we designed ClassMeta, a GPT-4 powered agent to help promote classroom participation by playing the role of an active student. These agents, which are embodied as 3D avatars in virtual reality, interact with actual instructors and students with both spoken language and body gestures. We conducted a comparative study to investigate the potential of ClassMeta for improving the overall learning experience of the class.
2024 | Ziyi Liu et al. | Purdue University | Social & Collaborative VR; Human-LLM Collaboration; Intelligent Tutoring Systems & Learning Analytics | CHI

Ubi-TOUCH: Ubiquitous Tangible Object Utilization through Consistent Hand-object interaction in Augmented Reality
Utilizing everyday objects as tangible proxies for Augmented Reality (AR) provides users with haptic feedback while interacting with virtual objects. Yet, existing methods focus on the attributes of the objects, constraining the possible proxies and yielding inconsistency in user experience. Therefore, we propose Ubi-TOUCH, an AR system that assists users in seeking a wider range of tangible proxies for AR applications based on the hand-object interaction (HOI) they desire. Given the target interaction with a virtual object, the system scans the users' vicinity and recommends object proxies with similar interactions. Upon user selection, the system simultaneously tracks and maps users' physical HOI to the virtual HOI, adaptively optimizing object 6 DoF and the hand gesture to provide consistency between the interactions. We showcase promising use cases of Ubi-TOUCH, such as remote tutorials, AR gaming, and Smart Home control. Finally, we evaluate the performance and usability of Ubi-TOUCH with a user study.
2023 | Rahul Jain et al. | Full-Body Interaction & Embodied Input; AR Navigation & Context Awareness; Mixed Reality Workspaces | UIST

LearnIoTVR: An End-to-end Virtual Reality Environment Providing Authentic Learning Experiences for Internet of Things
The rapid growth of Internet-of-Things (IoT) applications has generated interest from many industries and a need for graduates with relevant knowledge. An IoT system is comprised of spatially distributed interactions between humans and various interconnected IoT components. These interactions are contextualized within their ambient environment, thus impeding educators from recreating authentic tasks for hands-on IoT learning. We propose LearnIoTVR, an end-to-end virtual reality (VR) learning environment which helps students to acquire IoT knowledge through immersive design, programming, and exploration of real-world environments empowered by IoT (e.g., a smart house). The students start the learning process by installing virtual IoT components we created in different locations inside the VR environment so that the learning will be situated in the same context where the IoT is applied. With our custom-designed 3D block-based language, students can program IoT behaviors directly within VR and get immediate feedback on their programming outcome. In the user study, we evaluated the learning outcomes among students using LearnIoTVR with a pre- and post-test to understand to what extent engagement in LearnIoTVR leads to gains in programming skills and IoT competencies. Additionally, we examined which aspects of LearnIoTVR support usability and the learning of programming skills compared to a traditional desktop-based learning environment. The results from these studies were promising. We also acquired insightful user feedback which provides inspiration for further expansions of this system.
2023 | Zhengzhe Zhu et al. | Purdue University | AR Navigation & Context Awareness; Programming Education & Computational Thinking; K-12 Digital Education Tools | CHI

InstruMentAR: Auto-Generation of Augmented Reality Tutorials for Operating Digital Instruments Through Recording Embodied Demonstration
Augmented Reality tutorials, which provide necessary context by directly superimposing visual guidance on the physical referent, represent an effective way of scaffolding complex instrument operations. However, current AR tutorial authoring processes are not seamless as they require users to continuously alternate between operating instruments and interacting with virtual elements. We present InstruMentAR, a system that automatically generates AR tutorials through recording user demonstrations. We design a multimodal approach that fuses gestural information and hand-worn pressure sensor data to detect and register the user's step-by-step manipulations on the control panel. With this information, the system autonomously generates virtual cues with designated scales to respective locations for each step. Voice recognition and background capture are employed to automate the creation of text and images as AR content. For novice users receiving the authored AR tutorials, we facilitate immediate feedback through haptic modules. We compared InstruMentAR with traditional systems in the user study.
2023 | Ziyi Liu et al. | Purdue University | In-Vehicle Haptic, Audio & Multimodal Feedback; Hand Gesture Recognition; AR Navigation & Context Awareness | CHI

Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in Augmented Reality
Edges are one of the most ubiquitous geometric features of physical objects. They provide accurate haptic feedback and easy-to-track features for camera systems, making them an ideal basis for Tangible User Interfaces (TUI) in Augmented Reality (AR). We introduce Ubi Edge, an AR authoring tool that allows end-users to customize edges on daily objects as TUI inputs to control varied digital functions. We develop an integrated AR device and an integrated vision-based detection pipeline that can track 3D edges and detect the touch interaction between fingers and edges. Leveraging the spatial awareness of AR, users can simply select an edge by sliding fingers along it and then make the edge interactive by connecting it to various digital functions. We demonstrate four use cases including multi-function controllers, smart homes, games, and TUI-based tutorials. We also evaluated our system's usability through a two-session user study, in which both qualitative and quantitative results were positive.
2023 | Fengming He et al. | Purdue University | Shape-Changing Interfaces & Soft Robotic Materials; AR Navigation & Context Awareness | CHI

ARnnotate: An Augmented Reality Interface for Collecting Custom Dataset of 3D Hand-Object Interaction Pose Estimation
Vision-based 3D pose estimation has substantial potential in hand-object interaction applications and requires user-specified datasets to achieve robust performance. We propose ARnnotate, an Augmented Reality (AR) interface enabling end-users to create custom data using a hand-tracking-capable AR device. Unlike other dataset collection strategies, ARnnotate first guides a user to manipulate a virtual bounding box and records its poses and the user's hand joint positions as the labels. By leveraging the spatial awareness of AR, the user then manipulates the corresponding physical object while following the in-situ AR animation of the bounding box and hand model, and ARnnotate captures the user's first-person view as the images of the dataset. A 12-participant user study was conducted, and the results proved the system's usability in terms of the spatial accuracy of the labels, the satisfactory performance of the deep neural networks trained with the data collected by ARnnotate, and the users' subjective feedback.
2022 | Xun Qian et al. | Hand Gesture Recognition; Eye Tracking & Gaze Interaction; Human Pose & Activity Recognition | UIST

MechARspace: An Authoring System Enabling Bidirectional Binding of AR with Toys in Real-time
Augmented Reality (AR), which blends physical and virtual worlds, presents the possibility of enhancing traditional toy design. By leveraging bidirectional virtual-physical interactions between humans and the designed artifact, such AR-enhanced toys can provide more playful and interactive experiences than traditional toys. However, designers are constrained by the complexity and technical difficulties of the current AR content creation processes. We propose MechARspace, an immersive authoring system that supports users in creating toy-AR interactions through direct manipulation and visual programming. Based on an elicitation study, we propose a bidirectional interaction model which maps both ways: from the toy inputs to reactions of AR content, and also from the AR content to the toy reactions. This model guides the design of our system, which includes a plug-and-play hardware toolkit and an in-situ authoring interface. We present multiple use cases enabled by MechARspace to validate this interaction model. Finally, we evaluate our system with a two-session user study where users first recreated a set of predefined toy-AR interactions and then implemented their own AR-enhanced toy designs.
2022 | Zhengzhe Zhu et al. | Mixed Reality Workspaces | UIST

ScalAR: Authoring Semantically Adaptive Augmented Reality Experiences in Virtual Reality
Augmented Reality (AR) experiences tightly associate virtual contents with environmental entities. However, the dissimilarity of different environments limits the adaptive AR content behaviors under large-scale deployment. We propose ScalAR, an integrated workflow enabling designers to author semantically adaptive AR experiences in Virtual Reality (VR). First, potential AR consumers collect local scenes with a semantic understanding technique. ScalAR then synthesizes numerous similar scenes. In VR, a designer authors the AR contents' semantic associations and validates the design while being immersed in the provided scenes. We adopt a decision-tree-based algorithm to fit the designer's demonstrations as a semantic adaptation model to deploy the authored AR experience in a physical scene. We further showcase two application scenarios authored by ScalAR and conduct a two-session user study where the quantitative results prove the accuracy of the AR content rendering and the qualitative results show the usability of ScalAR.
2022 | Xun Qian et al. | Purdue University | AR Navigation & Context Awareness; Mixed Reality Workspaces | CHI