Point & Grasp: Flexible Selection of Out-of-Reach Objects Through Probabilistic Cue IntegrationSelecting out-of-reach objects is a fundamental task in mixed reality (MR). Existing methods rely on a single cue or deterministically fuse multiple cues, leading to performance degradation when the dominant cue becomes unreliable. In this work, we introduce a probabilistic cue integration framework that enables flexible combination of multiple user-generated cues for intent inference. Inspired by natural grasping behavior, we instantiate the framework with pointing direction and grasp gestures as a new interaction technique, \textsc{Point\&Grasp}. To this end, we collect the \datasetfullname~(\dataset) dataset to train a robust likelihood model of the gestural cue, which captures grasping patterns not present in existing in-reach datasets. User studies demonstrate that our selection method with cue integration not only improves accuracy and speed over single-cue baselines, but also remains practically effective compared to state-of-the-art methods across various sources of ambiguity. The dataset and code are available at \url{https://github.com/drlxj/point-and-grasp}.2026XLXuejing Luo et al.Aalto UniversityFull-Body Interaction & Embodied InputMixed Reality WorkspacesPhysical-Digital Hybrid InteractionCHI
Automating UI Optimization through Multi-Agentic ReasoningWe present AutoOptimization, a novel multi-objective optimization framework for adapting user interfaces. From a user’s verbal preferences for changing a UI, our framework guides a prioritization-based Pareto frontier search over candidate layouts. It selects suitable objective functions for UI placement while simultaneously parameterizing them according to the user's instructions to define the optimization problem. A solver then generates a series of optimal UI layouts, which our framework validates against the user's instructions to adapt the UI with the final solution. Our approach thus overcomes the previous need for manual inspection of layouts and the use of population averages for objective parameters. We integrate a Vision-Language Model into our framework whose reasoning capabilities allow us to focus on the Pareto optimization, prioritize results, and validate outcomes. We evaluate each step of our framework inside a Mixed Reality use case and demonstrate that AutoOptimization effectively increases the usability of UI adaptation schemes.2026ZLZhipeng Li et al.ETH ZürichHuman-LLM CollaborationMixed Reality WorkspacesPrototyping & User TestingCHI
Preference-Guided Prompt Optimization for Text-to-Image GenerationGenerative models are increasingly powerful, yet users struggle to guide them through prompts. The generative process is difficult to control and unpredictable, and user instructions may be ambiguous or under-specified. Prior prompt refinement tools heavily rely on human effort, while prompt optimization methods focus on numerical functions and are not designed for human-centered generative tasks, where feedback is better expressed as binary preferences and demands convergence within few iterations. We present APPO, a preference-guided prompt optimization algorithm. Instead of iterating prompts, users only provide binary preferential feedback. APPO adaptively balances its strategies between exploiting user feedback and exploring new directions, yielding effective and efficient optimization. We evaluate APPO on image generation, and the results show APPO enables achieving satisfactory outcomes in fewer iterations with lower cognitive load than manual prompt editing. We anticipate APPO will advance human-AI collaboration in generative tasks by leveraging user preferences to guide complex content creation.2026ZLZhipeng Li et al.ETH ZürichGenerative AI (Text, Image, Music, Video)Human-LLM CollaborationCreative Collaboration & Feedback SystemsCHI
Efficient Visual Appearance Optimization by Learning from Prior PreferencesAdjusting visual parameters such as brightness and contrast is common in our everyday experiences. Finding the optimal parameter setting is challenging due to the large search space and the lack of an explicit objective function, leaving users to rely solely on their implicit preferences. Prior work has explored Preferential Bayesian Optimization (PBO) to address this challenge, involving users to iteratively select preferred designs from candidate sets. However, PBO often requires many rounds of preference comparisons, making it more suitable for designers than everyday end-users. We propose Meta-PO, a novel method that integrates PBO with meta-learning to improve sample efficiency. Specifically, Meta-PO infers prior users' preferences and stores them as models, which are leveraged to intelligently suggest design candidates for the new users, enabling faster convergence and more personalized results. An experimental evaluation of our method for appearance design tasks on 2D and 3D content showed that participants achieved satisfactory appearance in 5.86 iterations using Meta-PO when participants shared similar goals with a population (e.g., tuning for a ``warm'' look) and in 8 iterations even generalizes across divergent goals (e.g., from ``vintage'', ``warm'', to ``holiday''). Meta-PO makes personalized visual optimization more applicable to end-users through a generalizable, more efficient optimization conditioned on preferences, with the potential to scale interface personalization more broadly.2025ZLZhipeng Li et al.Explainable AI (XAI)AI-Assisted Decision-Making & AutomationUIST
Redefining Affordance via Computational RationalityAffordances, a foundational concept in human-computer interaction and design, have traditionally been explained by direct-perception theories, which assume that individuals perceive action possibilities directly from the environment. However, these theories fall short of explaining how affordances are perceived, learned, refined, or misperceived, and how users choose between multiple affordances in dynamic contexts. This paper introduces a novel affordance theory grounded in Computational Rationality, positing that humans construct internal representations of the world based on bounded sensory inputs. Within these internal models, affordances are inferred through two core mechanisms: feature recognition and hypothetical motion trajectories. Our theory redefines affordance perception as a decision-making process, driven by two components: confidence (the perceived likelihood of successfully executing an action) and predicted utility (the expected value of the outcome). By balancing these factors, individuals make informed decisions about which actions to take. Our theory frames affordances perception as dynamic, continuously learned, and refined through reinforcement and feedback. We validate the theory via thought experiments and demonstrate its applicability across diverse types of affordances (e.g., physical, digital, social). Beyond clarifying and generalizing the understanding of affordances across contexts, our theory serves as a foundation for improving design communication and guiding the development of more adaptive and intuitive systems that evolve with user capabilities.2025YLYi-Chi Liao et al.Explainable AI (XAI)Privacy by Design & User ControlUser Research Methods (Interviews, Surveys, Observation)IUI
Continual Human-in-the-Loop OptimizationOptimal input settings vary across users due to differences in motor abilities and personal preferences, which are typically addressed by manual tuning or calibration. Although human-in-the-loop optimization has the potential to identify optimal settings during use, it is rarely applied due to its long optimization process. A more efficient approach would continually leverage data from previous users to accelerate optimization, exploiting shared traits while adapting to individual characteristics. We introduce the concept of Continual Human-in-the-Loop Optimization and a Bayesian optimization-based method that leverages a Bayesian-neural-network surrogate model to capture population-level characteristics while adapting to new users. We propose a generative replay strategy to mitigate catastrophic forgetting. We demonstrate our method by optimizing virtual reality keyboard parameters for text entry using direct touch, showing reduced adaptation times with a growing user base. Our method opens the door for next-generation personalized input systems that improve with accumulated experience.2025YLYi-Chi Liao et al.ETH ZürichHuman-LLM CollaborationCHI
The Personality Dimensions GPT-3 Expresses During Human-Chatbot InteractionsKovacevic等人分析GPT-3在对话中表现的人格特征,揭示大型语言模型在社交交互中的行为模式与个性表达。2024NKNikola Kovacevic et al.Agent Personality & AnthropomorphismUbiComp
SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM ReasoningMixed Reality is increasingly used in mobile settings beyond controlled home and office spaces. This mobility introduces the need for user interface layouts that adapt to varying contexts. However, existing adaptive systems are designed only for static environments. In this paper, we introduce SituationAdapt, a system that adjusts Mixed Reality UIs to real-world surroundings by considering environmental and social cues in shared settings. Our system consists of perception, reasoning, and optimization modules for UI adaptation. Our perception module identifies objects and individuals around the user, while our reasoning module leverages a Vision-and-Language Model to assess the placement of interactive UI elements. This ensures that adapted layouts do not obstruct relevant environmental cues or interfere with social norms. Our optimization module then generates Mixed Reality interfaces that account for these considerations as well as temporal constraints The evaluation of SituationAdapt is two-fold: We first validate our reasoning component’s capability in assessing UI contexts comparable to human expert users. In an online user study, we then established our system’s capability of producing context-aware MR layouts, where it outperformed adaptive methods from previous work. We further demonstrate the versatility and applicability of SituationAdapt with a set of application scenarios.2024ZLZhipeng Li et al.AR Navigation & Context AwarenessHuman-LLM CollaborationUIST
Chatbots With Attitude: Enhancing Chatbot Interactions Through Dynamic Personality InfusionEquipping chatbots with personality has the potential of transforming user interactions from mere transactions to engaging conversations, enhancing user satisfaction and experience. In this work, we introduce dynamic personality infusion, a novel intermediate stage between the chatbot and the user that adjusts the chatbot's response using a dedicated chatbot personality model and GPT-4 without altering the chatbot's semantic capabilities. To test the effectiveness of our method, we first collected human-chatbot conversations from 33 participants while they interacted with three LLM-based chatbots (GPT-3.5, Llama-2 13B, and Mistral 7B). Then, we conducted an online rating survey with 725 participants on the collected conversations. We analyze the impact of the personality infusion on the perceived trustworthiness of the chatbots and the suitability of different personality profiles for real-world chatbot use cases. Our work paves the way for dynamic, personalized chatbots, enhancing user trust and real-world applicability.2024NKNikola Kovacevic et al.Conversational ChatbotsAgent Personality & AnthropomorphismHuman-LLM CollaborationCUI
PressurePick: Muscle Tension Estimation for Guitar Players Using Unobtrusive Pressure SensingWhen learning to play an instrument, it is crucial for the learner's muscles to be in a relaxed state when practicing. Identifying, which parts of a song lead to increased muscle tension requires self-awareness during an already cognitively demanding task. In this work, we investigate unobtrusive pressure sensing for estimating muscle tension while practicing songs with the guitar. First, we collected data from twelve guitarists. Our apparatus consisted of three pressure sensors (one on each side of the guitar pick and one on the guitar neck) to determine the sensor that is most suitable for automatically estimating muscle tension. Second, we extracted features from the pressure time series that are indicative of muscle tension. Third, we present the hardware and software design of our PressurePick prototype, which is directly informed by the data collection and subsequent analysis.2023AFAndreas Rene Fender et al.Force Feedback & Pseudo-Haptic WeightBiosensors & Physiological MonitoringUIST
ViGather: Inclusive Virtual Conferencing with a Joint Experience Across Traditional Screen Devices and Mixed Reality HeadsetsTeleconferencing is poised to become one of the most frequent use cases of immersive platforms, since it supports high levels of presence and embodiment in collaborative settings. On desktop and mobile platforms, teleconferencing solutions are already among the most popular apps and accumulate significant usage time---not least due to the pandemic or as a desirable substitute for air travel or commuting. In this paper, we present ViGather, an immersive teleconferencing system that integrates users of all platform types into a joint experience via equal representation and a first-person experience. ViGather renders all participants as embodied avatars in one shared scene to establish co-presence and elicit natural behavior during collocated conversations, including nonverbal communication cues such as eye contact between participants as well as body language such as turning one's body to another person or using hand gestures to emphasize parts of a conversation during the virtual hangout. Since each user embodies an avatar and experiences situated meetings from an egocentric perspective no matter the device they join from, ViGather alleviates potential concerns about self-perception and appearance while mitigating potential `Zoom fatigue', as users' self-views are not shown. For participants in Mixed Reality, our system leverages the rich sensing and reconstruction capabilities of today's headsets. For users of tablets, laptops, or PCs, ViGather reconstructs the user's pose from the device's front-facing camera, estimates eye contact with other participants, and relates these non-verbal cues to immediate avatar animations in the shared scene. Our evaluation compared participants' behavior and impressions while videoconferencing in groups of four inside ViGather with those in Meta Horizon as a baseline for a social VR setting. Participants who participated on traditional screen devices (e.g., laptops and desktops) using ViGather reported a significantly higher sense of physical, spatial, and self-presence than when using Horizon, while all perceived similar levels of active social presence when using Virtual Reality headsets. Our follow-up study confirmed the importance of representing users on traditional screen devices as reconstructed avatars for perceiving self-presence.2023HQHuajian Qiu et al.Social & Collaborative VRMixed Reality WorkspacesIdentity & Avatars in XRMobileHCI
Reality Rifts: Wonder-ful Interfaces by Disrupting Perceptual CausalityReality Rifts are interfaces between the physical and the virtual reality, where incoherent observations of physical behavior lead users to imagine comprehensive and plausible end-to-end dynamics. Reality Rifts emerge in interactive physical systems that lack one or more components that are central to their operation, yet where the physical end-to-end interaction persists with plausible outcomes. Even in the presence of a Reality Rift, users can still interact with a system—much like they would with the unaltered and complete counterpart—leading them to implicitly infer the existence and imagine the behavior of the lacking components from observable phenomena and outcomes. Therefore, dynamic systems with Reality Rifts trigger doubt, curiosity, and rumination—a sense of wonder that users experience when observing a Reality Rift due to their innate curiosity. In this paper, we explore how interactive systems can elicit and guide the user's imagination by integrating Reality Rifts. We outline the design process for opening a Reality Rift in interactive physical systems, describe the resulting design space, and explore it through six characteristic prototypes. To understand to what extent and with which qualities these prototypes indeed induce a sense of wonder during an interaction, we evaluated \projectName\ in the form of a field deployment with 50 participants. We discuss participants' behavior and derive factors for the implementation of future wonder-ful experiences.2023LCLung-Pan Cheng et al.National Taiwan UniversityDesign FictionDigital Art Installations & Interactive PerformanceCHI
InfinitePaint: Painting in Virtual Reality with Passive Haptics Using Wet Brushes and a Physical Proxy CanvasDigital painting interfaces require an input fidelity that preserves the artistic expression of the user. Drawing tablets allow for precise and low-latency sensing of pen motions and other parameters like pressure to convert them to fully digitized strokes. A drawback is that those interfaces are rigid. While soft brushes can be simulated in software, the haptic sensation of the rigid pen input device is different compared to using a soft wet brush on paper. We present InfinitePaint, a system that supports digital painting in Virtual Reality on real paper with a real wet brush. We use special paper that turns black wherever it comes into contact with water and turns blank again upon drying. A single camera captures those temporary strokes and digitizes them while applying properties like color or other digital effects. We tested our system with artists and compared the subjective experience with a drawing tablet.2023AFAndreas Rene Fender et al.ETH ZürichHaptic WearablesDigital Art Installations & Interactive PerformanceCHI
HOOV: Hand Out-Of-View Tracking for Proprioceptive Interaction Using Inertial SensingCurrent Virtual Reality systems are designed for interaction under visual control. Using built-in cameras, headsets track the user's hands or hand-held controllers while they are inside the field of view. Current systems thus ignore the user's interaction with off-screen content---virtual objects that the user could quickly access through proprioception without requiring laborious head motions to bring them into focus. In this paper, we present HOOV, a wrist-worn sensing method that allows VR users to interact with objects outside their field of view. Based on the signals of a single wrist-worn inertial sensor, HOOV continuously estimates the user's hand position in 3-space to complement the headset's tracking as the hands leave the tracking range. Our novel data-driven method predicts hand positions and trajectories from just the continuous estimation of hand orientation, which by itself is stable based solely on inertial observations. Our inertial sensing simultaneously detects finger pinching to register off-screen selection events, confirms them using a haptic actuator inside our wrist device, and thus allows users to select, grab, and drop virtual content. We compared HOOV's performance with a camera-based optical motion capture system in two folds. In the first evaluation, participants interacted based on tracking information from the motion capture system to assess the accuracy of their proprioceptive input, whereas in the second, they interacted based on HOOV's real-time estimations. We found that HOOV's target-agnostic estimations had a mean tracking error of 7.7 cm, which allowed participants to reliably access virtual objects around their body without first bringing them into focus. We demonstrate several applications that leverage the larger input space HOOV opens up for quick proprioceptive interaction, and conclude by discussing the potential of our technique.2023PSPaul Streli et al.ETH ZürichIn-Vehicle Haptic, Audio & Multimodal FeedbackFoot & Wrist InteractionCHI
DeltaPen: A Device with Integrated High-Precision Translation and Rotation Sensing on Passive SurfacesWe present DeltaPen, a pen device that operates on passive surfaces without the need for external tracking systems or active sensing surfaces. DeltaPen integrates two adjacent lens-less optical flow sensors at its tip, from which it reconstructs accurate directional motion as well as yaw rotation. DeltaPen also supports tilt interaction using a built-in inertial sensor. A pressure sensor and high-fidelity haptic actuator complements our pen device while retaining a compact form factor that supports mobile use on uninstrumented surfaces. We present a processing pipeline that reliably extracts fine-grained pen translations and rotations from the two optical flow sensors. To asses the accuracy of our translation and angle estimation pipeline, we conducted a technical evaluation in which we compared our approach with ground-truth measurements of participants' pen movements during typical pen interactions. We conclude with several example applications that leverage our device's capabilities. Taken together, we demonstrate novel input dimensions with DeltaPen that have so far only existed in systems that require active sensing surfaces or external tracking.2022GLGuy Lüthi et al.Shape-Changing Interfaces & Soft Robotic MaterialsPrototyping & User TestingUIST
Understanding Multi-Device Usage Patterns: Physical Device Configurations and Fragmented WorkflowsTo better ground technical (systems) investigation and interaction design of cross-device experiences, we contribute an in-depth survey of existing multi-device practices, including fragmented workflows across devices and the way people physically organize and configure their workspaces to support such activity. Further, this survey documents a historically significant moment of transition to a new future of remote work, an existing trend dramatically accelerated by the abrupt switch to work-from-home (and having to contend with the demands of home-at-work) during the COVID-19 pandemic. We surveyed 97 participants, and collected photographs of home setups and open-ended answers to 50 questions categorized in 5 themes. We characterize the wide range of multi-device physical configurations and identify five usage patterns, including: partitioning tasks, integrating multi-device usage, cloning tasks to other devices, expanding tasks and inputs to multiple devices, and migrating between devices. Our analysis also sheds light on the benefits and challenges people face when their workflow is fragmented across multiple devices. These insights have implications for the design of multi-device experiences that support people's fragmented workflows.2022YYYe Yuan et al.Microsoft Research, University of MinnesotaRemote Work Tools & ExperienceDistributed Team CollaborationNotification & Interruption ManagementCHI
Affective State Prediction from Smartphone Touch and Sensor Data in the WildKnowledge of users' affective states can improve their interaction with smartphones by providing more personalized experiences (e.g., search results and news articles). We present an affective state classification model based on data gathered on smartphones in real-world environments. From touch events during keystrokes and the signals from the inertial sensors, we extracted two-dimensional heat maps as input into a convolutional neural network to predict the affective states of smartphone users. For evaluation, we conducted a data collection in the wild with 82 participants over 10 weeks. Our model accurately predicts three levels (low, medium, high) of valence (AUC up to 0.83), arousal (AUC up to 0.85), and dominance (AUC up to 0.84). We also show that using the inertial sensor data alone, our model achieves a similar performance (AUC up to 0.83), making our approach less privacy-invasive. By personalizing our model to the user, we show that performance increases by an additional 0.07 AUC.2022RWRafael Wampfler et al.ETH ZurichHuman Pose & Activity RecognitionMultilingual & Cross-Cultural Voice InteractionBiosensors & Physiological MonitoringCHI
TapType: Ten-finger text entry on everyday surfaces via Bayesian inferenceDespite the advent of touchscreens, typing on physical keyboards remains most efficient for entering text, because users can leverage all fingers across a full-size keyboard for convenient typing. As users increasingly type on the go, text input on mobile and wearable devices has had to compromise on full-size typing. In this paper, we present TapType, a mobile text entry system for full-size typing on passive surfaces—without an actual keyboard. From the inertial sensors inside a band on either wrist, TapType decodes and relates surface taps to a traditional QWERTY keyboard layout. The key novelty of our method is to predict the most likely character sequences by fusing the finger probabilities from our Bayesian neural network classifier with the characters' prior probabilities from an n-gram language model. In our online evaluation, participants on average typed 19 words per minute with a character error rate of 0.6 % after 30 minutes of training. Expert typists thereby consistently achieved more than 25 WPM at a similar error rate. We demonstrate applications of TapType in mobile use around smartphones and tablets, as a complement to interaction in situated Mixed Reality outside visual control, and as an eyes-free mobile text input method using an audio feedback-only interface.2022PSPaul Streli et al.ETH ZürichHand Gesture RecognitionFoot & Wrist InteractionContext-Aware ComputingCHI
Causality-preserving Asynchronous RealityMixed Reality is gaining interest as a platform for collaboration and focused work to a point where it may supersede current office settings in future workplaces. At the same time, we expect that interaction with physical objects and face-to-face communication will remain crucial for future work environments, which is a particular challenge in fully immersive Virtual Reality. In this work, we reconcile those requirements through a user's individual Asynchronous Reality, which enables seamless physical interaction across time. When a user is unavailable, e.g., focused on a task or in a call, our approach captures co-located or remote physical events in real-time, constructs a causality graph of co-dependent events, and lets immersed users revisit them at a suitable time in a causally accurate way. Enabled by our system AsyncReality, we present a workplace scenario that includes walk-in interruptions during a person's focused work, physical deliveries, and transient spoken messages. We then generalize our approach to a use-case agnostic concept and system architecture. We conclude by discussing the implications of an Asynchronous Reality for future offices.2022AFAndreas Rene Fender et al.ETH ZürichMixed Reality WorkspacesImmersion & Presence ResearchContext-Aware ComputingCHI
CapContact: Super-resolution Contact Areas from Capacitive TouchscreensTouch input is dominantly detected using mutual-capacitance sensing, which measures the proximity of close-by objects that change the electric field between the sensor lines. The exponential drop-off in intensities with growing distance enables software to detect touch events, but does not reveal true contact areas. In this paper, we introduce CapContact, a novel method to precisely infer the contact area between the user's finger and the surface from a single capacitive image. At 8x super-resolution, our convolutional neural network generates refined touch masks from 16-bit capacitive images as input, which can even discriminate adjacent touches that are not distinguishable with existing methods. We trained and evaluated our method using supervised learning on data from 10 participants who performed touch gestures. Our capture apparatus integrates optical touch sensing to obtain ground-truth contact through high-resolution frustrated total internal reflection. We compare our method with a baseline using bicubic upsampling as well as the ground truth from FTIR images. We separately evaluate our method's performance in discriminating adjacent touches. CapContact successfully separated closely adjacent touch contacts in 494 of 570 cases (87%) compared to the baseline's 43 of 570 cases (8%). Importantly, we demonstrate that our method accurately performs even at half of the sensing resolution at twice the grid-line pitch across the same surface area, challenging the current industry-wide standard of a ~4mm sensing pitch. We conclude this paper with implications for capacitive touch sensing in general and for touch-input accuracy in particular.2021PSPaul Streli et al.ETH ZürichHand Gesture RecognitionCHI