Balancing Flow and Collaboration: Exploring Visual Noise Cancellation in Mixed Reality Workspace
In open-plan offices, visual noise from surrounding people and objects can negatively impact both concentration and mood. Mixed Reality (MR) offers a promising approach to address this challenge by reshaping the workspace. In this paper, we first conducted a survey with 50 office workers to examine the impact of visual noise, identifying common sources of distraction and potential mitigation strategies. Considering the necessity of face-to-face communication in office environments, we designed adaptive user interfaces to strike a balance between deep focus and seamless in-situ collaboration. We utilized Virtual Reality (VR) and Diminished Reality (DR) methods to eliminate visual noise and leveraged face orientation along with a distance threshold to determine collaborative intentions. We developed a prototype system and conducted a user study for evaluation. The results indicate that our system can create a tranquil workspace to foster concentration and workplace well-being, while maintaining necessary in-situ collaboration. These findings provide valuable insights for designing future MR-integrated office environments.
2026 | Xiang Chen et al. | Beijing University of Posts and Telecommunications | Mixed Reality Workspaces; Immersion & Presence Research | IUI
HiSync: Spatio-Temporally Aligning Hand Motion from Wearable IMU and On-Robot Camera for Command Source Identification in Long-Range HRI
Long-range Human-Robot Interaction (HRI) remains underexplored. Within it, Command Source Identification (CSI) – determining who issued a command – is especially challenging due to multi-user and distance-induced sensor ambiguity. We introduce HiSync, an optical-inertial fusion framework that treats hand motion as binding cues by aligning robot-mounted camera optical flow with hand-worn IMU signals. We first elicit a user-defined (N=12) gesture set and collect a multimodal command gesture dataset (N=38) in long-range multi-user HRI scenarios. Next, HiSync extracts frequency-domain hand motion features from both camera and IMU data, and a learned CSINet denoises IMU readings, temporally aligns modalities, and performs distance-aware multi-window fusion to compute cross-modal similarity of subtle, natural gestures, enabling robust CSI. In three-person scenes up to 34 m, HiSync achieves 92.32% CSI accuracy, outperforming the prior SOTA by 48.44%. HiSync is also validated on real-robot deployment. By making CSI reliable and natural, HiSync provides a practical primitive and design guidance for public-space HRI.
2026 | Chengwen Zhang et al. | Tsinghua University | Teleoperation & Telepresence; Human Pose & Activity Recognition; Hand Gesture Recognition | CHI
InkFlow: Connected Handwriting Recognition for Natural Mid-Air Input in Mixed Reality
Mid-air handwriting is a freeform text input modality that enables expression of individuality and creativity. In Mixed Reality, the recognition of writing strokes has conventionally depended on manual action or proximal planes. Such explicit reliance imposes cognitive load and quickly leads to fatigue. We present InkFlow, a novel bare-hand handwriting interaction approach that enables users to write continuously and naturally without explicit stroke control. We first design a user-friendly pipeline that leverages the widely adopted pinch–release gesture to intuitively collect annotated handwriting data. Next, we enhance a lightweight DS-TCN model with a boundary-aware strategy to improve the learning of kinematic features. Moreover, building on cross-domain meta-learning, our approach achieves effective cross-user generalization and supports rapid personalization for new users. The comparative user study (N=30) shows the effectiveness and usability of our method and interaction design. A closed-loop online study (N=12) further demonstrates notable improvements in handwriting efficiency and physical comfort.
2026 | Xufeng Jian et al. | Beijing University of Posts and Telecommunications | Hand Gesture Recognition; Eye Tracking & Gaze Interaction; AR Navigation & Context Awareness | CHI
Intrinsic vs. Extrinsic Programming Challenges in Educational Games: How they shape Children’s Computational Thinking, Learning Drive, and Game Engagement
Educational programming games (EPGs) build computational thinking (CT), a vital 21st-century skill. A core design challenge is how pedagogical and gameplay challenges are integrated to balance educational objectives with player engagement. This study formalizes two contrasting challenge design patterns that reflect distinct integration strategies: extrinsic programming challenges (C1), a pedagogy-oriented design where programming is the core challenge enforced through external constraints; and intrinsic programming challenges (C2), a gameplay-oriented design where programming serves as a tool for overcoming gameplay challenges raised by in-game puzzles. To examine these challenge design patterns, we developed two isomorphic EPGs, WannaBone1 (C1) and WannaBone2 (C2), each featuring 20 levels introducing sequences, loops, conditionals, and global variables. A controlled classroom study with 306 primary school students reveals that both designs improve CT, whereas C2 yields significantly higher intrinsic learning motivation and high-order immersion of flow. These findings indicate that a gameplay-oriented rather than pedagogy-oriented design perspective better unites education and entertainment, guiding future EPG design.
2026 | Baijun Chen et al. | Beijing University of Posts and Telecommunications | Programming Education & Computational Thinking; Serious & Functional Games | CHI
Proactive AI as a Catalyst for Creativity? Balancing Human Agency and AI Contribution in Collaborative Story Writing
Large Language Models (LLMs) hold promise in supporting creative writing, yet the role of proactive AI in collaborative writing remains underexplored due to concerns around human agency and disruption. To investigate effective strategies for proactive AI support, we conducted a Wizard-of-Oz study simulating two suggestion styles: intrusive suggestions (next-sentence completions) and non-intrusive suggestions (exploratory proposals), where participants completed two story outlining tasks under each style, receiving real-time proactive suggestions from a human wizard acting as the AI. Both quantitative and qualitative results show that proactive AI can enhance creativity and accelerate writing. However, we observed a trade-off between AI involvement and perceived human agency. This trade-off was moderated by how strongly AI stimulated users: greater inspiration led to stronger perceived agency even under high AI involvement. Based on wizards' behavior, we offer guidance on suggestion style and timing to better balance creativity and agency for future proactive AI writing systems.
2026 | Yiwen Yin et al. | Tsinghua University | Human-LLM Collaboration; AI-Assisted Creative Writing; AI-Assisted Writing & Text Generation | CHI
Roomify: Spatially-Grounded Style Transformation for Immersive Virtual Environments
We present Roomify, a spatially-grounded transformation system that generates themed virtual environments anchored to users' physical rooms while maintaining spatial structure and functional semantics. Current VR approaches face a fundamental trade-off: full immersion sacrifices spatial awareness, while passthrough solutions break presence. Roomify addresses this through spatially-grounded transformation, treating physical spaces as "spatial containers" that preserve key functional and geometric properties of furniture while enabling radical stylistic changes. Our pipeline combines in-situ 3D scene understanding, AI-driven spatial reasoning, and style-aware generation to create personalized virtual environments grounded in physical reality. We introduce a cross-reality authoring tool enabling fine-grained user control through MR editing and VR preview workflows. Two user studies validate our approach: one with 18 VR users demonstrates a 63% improvement in presence over passthrough and 26% over fully virtual baselines while maintaining spatial awareness; another with 8 design professionals confirms the system's creative expressiveness (scene quality: 5.95/7; creativity support: 6.08/7) and professional workflow value across diverse environments.
2026 | Xueyang Wang et al. | Tsinghua University | Social & Collaborative VR; Mixed Reality Workspaces; Immersion & Presence Research | CHI
Does Sycophancy Change Decisions? Effect of LLM Sycophancy on AI-Assisted Decision-Making
Large language models are increasingly integrated into everyday and professional decision making, yet often exhibit sycophantic behavior by aligning with users’ views or preferences. While sycophancy can enhance interaction, its influence on users' decisions remains unclear given different styles and task risks. We examine three forms of sycophancy (opinion agreement, direct praise, and self-deprecation) in two contrasting contexts: a low-risk speed-dating prediction task and a high-risk ETF investment task. In a 4×2 mixed-design online study (N = 106), we compare non-sycophantic AI with sycophantic variants on decision outcomes and confidence changes. Results show that sycophancy influences decision patterns in type-dependent ways. Specifically, opinion agreement reinforces initial decisions and self-deprecation boosts confidence. Interviews further indicate that users value supportive AI but question its objectivity when praise becomes excessive. These findings reveal the multifaceted effects of AI sycophancy and offer design implications for balancing support and credibility in human–AI interaction.
2026 | Zejian Li et al. | Zhejiang University | AI-Assisted Decision-Making & Automation; AI Ethics, Fairness & Accountability; Human-LLM Collaboration | CHI
Understanding Users' Perceptions and Expectations toward a Social Balloon Robot via an Exploratory Study
We are witnessing a new epoch in embodied social agents. Most of the work has focused on ground or desktop robots that enjoy technical maturity and rich social channels but are often limited by terrain. Drones, which enable spatial mobility, currently face issues with safety and proximity. This paper explores a social balloon robot as a viable alternative that combines these advantages and alleviates limitations. To this end, we developed a hardware prototype named BalloonBot that integrates various devices for social functioning and a helium balloon. We conducted an exploratory lab study on users’ perceptions and expectations about its demonstrated interactions and functions. Our results show promise in using such a robot as another form of socially embodied agent. We highlight its unique mobile and approachable characteristics that harvest novel user experiences and outline factors that should be considered before its broad applications.
2025 | Chongyang Wang et al. | Social Robot Interaction | UIST
"This is My Fault", Really? Understanding Blind and Low-Vision People’s Perception of Hallucination in Large Vision Language Models
Visual question-answering (VQA) tools powered by large visual language models (LVLMs) are used to assist blind and low-vision (BLV) individuals in overcoming visual challenges, raising concerns about hallucinations and associated risks. Existing literature overlooks the variations of hallucinations across distinct usage scenarios and types in the context of VQA for BLV people, resulting in limited understanding of their perceptions and insufficient guidance for targeted mitigation strategies. By analyzing 3,467 real-world VQA cases from BLV users, we developed a manifestation-scenario-based dual-dimensional hallucination typology, uncovering eight scenarios and five types of hallucinations. Through interviews with 16 BLV users, we examined their awareness levels, detection strategies, mental models of hallucinations, and their tolerance of associated risks, identifying key gaps between their perceptions and real situations. By designing with 12 BLV users, we uncovered their expectations for hallucination-mitigating solutions, including enhanced information provision, transparency in processing, verification strategies, and feedback mechanisms.
2025 | Yilin Tang et al. | Voice Accessibility; Explainable AI (XAI); AI Ethics, Fairness & Accountability | UIST
Exploring the Design of LLM-based Agent in Enhancing Self-disclosure Among the Older Adults
Social difficulties have become an increasingly serious issue among older adults. For older adults, regular self-disclosure is essential for maintaining mental health and building close relationships. Leveraging conversational agents to encourage self-disclosure in older adults has shown increasing potential. Understanding how LLM-based agents can influence and stimulate self-disclosure across different topics is crucial for designing future agents tailored to older users. This study introduces Disclosure-Agent, an LLM-based conversational agent, and examines its impact on self-disclosure in older adults through a user study involving 20 participants, 8 topics, and two interactive interfaces equipped with Disclosure-Agent. The findings provide valuable insights into how LLM-based agents can promote self-disclosure in older adults and offer design recommendations for future elderly-oriented conversational agents.
2025 | Yijie Guo et al. | Tsinghua University, Academy of Arts and Design; Tsinghua University, The Future Laboratory | Agent Personality & Anthropomorphism; Human-LLM Collaboration | CHI
Characterizing Developers’ Linguistic Behaviors in Open Source Development across Their Social Statuses
Open Source Software (OSS) development has attracted numerous developers. As a typical complex sociotechnical system, an OSS project often forms a hierarchical social structure where a few developers are elite while the rest are non-elite. Differences in social status may result in distinct language use behaviors in interpersonal communication. Characterizing such behaviors is critical for supporting efficient and effective communication among developers with different social statuses. This study empirically compared elite and non-elite developers' language behaviors in their communication. We compiled a corpus of ~216,000 discourses collected from 20 large projects on GitHub. We investigated the linguistic differences in three aspects, namely, linguistic styles and characters, main concerns, and sentence patterns. Our findings reveal that elite and non-elite developers showed different linguistic patterns and had different concerns in their discourses. Their discourses also reflect the variation of the main focuses in the development process. Furthermore, elite and non-elite developers exhibited noticeable patterns in their linguistic behaviors in accordance with their roles and corresponding divisions of labor in the production process, regardless of semantic context. These findings provide implications for supporting communication that crosses social statuses in OSS development.
2024 | Yisi Han et al. | Session 3b: Work, Non-Work, and Social Technologies | CSCW
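airTac: A Contactless Digital Tactile Receptor for Detecting Material and Roughness via Terahertz Sensing
Zhang et al. propose airTac, a contactless digital tactile sensor that uses terahertz sensing to detect material and surface roughness, offering a new avenue for human-computer interaction.
2024 | Zhan Zhang et al. | Mid-Air Haptics (Ultrasonic); Shape-Changing Interfaces & Soft Robotic Materials | UbiComp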
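UHead: Driver Attention Monitoring System Using UWB Radar
Xu et al. propose UHead, a system that uses ultra-wideband (UWB) radar to monitor driver attention in real time and improve driving safety.
2024 | Chongzhi Xu et al. | Human Pose & Activity Recognition | UbiComp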
AirECG: Contactless Electrocardiogram for Cardiac Disease Monitoring via mmWave Sensing and Cross-domain Diffusion Model
2024 | Langcheng Zhao et al. | Mental Health Apps & Online Support Communities; Biosensors & Physiological Monitoring | UbiComp
LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation
The emergent large language/multimodal models facilitate the evolution of mobile agents, especially in mobile UI task automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined action sequences, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device mobile UI task execution and faithful, scalable task evaluation. By observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) On-device task execution that enables mobile agents to interact with realistic mobile environments for task execution. (2) Fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives. (3) A multi-level application state matching algorithm that utilizes exact and fuzzy matching to accurately detect critical information in each screen, even with unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 496 tasks, encompassing both tasks in the widely-used datasets and our self-constructed ones to cover more diverse mobile applications. Evaluation results demonstrate LlamaTouch’s high faithfulness of evaluation in real-world mobile environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch.
2024 | Li Zhang et al. | Human-LLM Collaboration | UIST
MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention
Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users’ physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leveraged large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We developed MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users’ in-the-moment app usage behaviors, physical contexts, mental states, goals & habits as input, and generates personalized and dynamic persuasive content with appropriate persuasion strategies. We conducted a 5-week field experiment (N=25) to compare MindShift with its simplified version (remove mental states) and baseline techniques (fixed reminder). The results show that MindShift improves intervention acceptance rates by 4.7-22.5% and reduces smartphone usage duration by 7.4-9.8%. Moreover, users have a significant drop in smartphone addiction scale scores and a rise in self-efficacy scale scores. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.
2024 | Ruolan Wu et al. | Tsinghua University | Human-LLM Collaboration; Mental Health Apps & Online Support Communities; Privacy by Design & User Control | CHI
Time2Stop: Adaptive and Explainable Human-AI Loop for Smartphone Overuse Intervention
Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collects user feedback to establish a human-AI loop and adapt the intervention model over time. We conducted an 8-week field experiment (N=71) to evaluate the effectiveness of both the adaptation and explanation aspects of Time2Stop. Our results indicate that our adaptive models significantly outperform the baseline methods on intervention accuracy (>32.8% relatively) and receptivity (>8.0%). In addition, incorporating explanations further enhances the effectiveness by 53.8% and 11.4% on accuracy and receptivity, respectively. Moreover, Time2Stop significantly reduces overuse, decreasing app visit frequency by 7.0-8.9%. Our subjective data also echoed these quantitative measures. Participants preferred the adaptive interventions and rated the system highly on intervention time accuracy, effectiveness, and level of trust. We envision our work can inspire future research on JITAI systems with a human-AI loop to evolve with users.
2024 | Adiba Orzikulova et al. | KAIST | Explainable AI (XAI); AI-Assisted Decision-Making & Automation; Notification & Interruption Management | CHI
mmStress: Distilling Human Stress from Daily Activities via Contact-less Millimeter-wave Sensing
Long-term exposure to stress harms people's mental and even physical health, and stress monitoring is of increasing significance in the prevention, diagnosis, and management of mental illness and chronic disease. However, current stress monitoring methods are either burdensome or intrusive, which hinders their widespread usage in practice. In this paper, we propose mmStress, a contact-less and non-intrusive solution, which adopts a millimeter-wave radar to sense a subject's activities of daily living, from which it distills human stress. mmStress is built upon the psychologically-validated relationship between human stress and "displacement activities", i.e., subjects under stress unconsciously perform fidgeting behaviors like scratching, wandering around, tapping foot, etc. Despite the conceptual simplicity, to realize mmStress, the key challenge lies in how to identify and quantify the latent displacement activities autonomously, as they are usually transitory and submerged in normal daily activities, and also exhibit high variation across different subjects. To address these challenges, we custom-design a neural network that learns human activities from both macro and micro timescales and exploits the continuity of human activities to extract features of abnormal displacement activities accurately. Moreover, we also address the imbalanced stress distribution issue by incorporating a post-hoc logit adjustment procedure during model training. We prototype, deploy and evaluate mmStress in ten volunteers' apartments for over four weeks, and the results show that mmStress achieves a promising accuracy of ~80% in classifying low, medium and high stress. In particular, mmStress shows advantages under free human movement scenarios, advancing the state of the art, which focuses on stress monitoring in quasi-static scenarios.
2023 | Kun Liang et al. | Human Pose & Activity Recognition; Sleep & Stress Monitoring; Biosensors & Physiological Monitoring | UbiComp | https://doi.org/10.1145/3610926
Side-lobe Can Know More: Towards Simultaneous Communication and Sensing for mmWave
Thanks to the wide bandwidth, large antenna array, and short wavelength, millimeter wave (mmWave) has superior performance in both communication and sensing. Thus, the integration of sensing and communication is a developing trend for the mmWave band. However, the directional transmission characteristics of mmWave limit the sensing scope to a narrow sector. Existing works coordinate sensing and communication in a time-division manner, which takes advantage of the sector level sweep during the beam training interval for sensing and the data transmission interval for communication. Beam training is a low frequency (e.g., 10 Hz) and low duty-cycle event, which makes it hard to track fast movement or perform continuous sensing. Such time-division designs imply that we need to strike a balance between sensing and communication, and it is hard to get the best of both worlds. In this paper, we try to solve this dilemma by exploiting side lobes for sensing. We design Sidense, where the main lobe of the transmitter is directed towards the receiver, while in the meantime, the side lobes can sense the ongoing activities in the surroundings. In this way, sensing and downlink communication work simultaneously and will not compete for hardware and radio resources. In order to compensate for the low antenna gain of side lobes, Sidense performs integration to boost the quality of sensing signals. Due to the uneven side-lobe energy, Sidense also designs a target separation scheme to tackle the mutual interference in multi-target scenarios. We implement Sidense with a Sivers mmWave module. Results show that Sidense can achieve millimeter motion tracking accuracy at 6 m. We also demonstrate a multi-person respiration monitoring application. As Sidense does not modify the communication procedure or the beamforming strategy, the downlink communication performance will not be sacrificed due to concurrent sensing. We believe that more fascinating applications can be implemented on this concurrent sensing and communication platform.
2023 | Qian Yang et al. | V2X (Vehicle-to-Everything) Communication Design; Context-Aware Computing | UbiComp | https://dl.acm.org/doi/10.1145/3569498
Midas: Generating mmWave Radar Data from Videos for Training Pervasive and Privacy-preserving Human Sensing Tasks
Millimeter wave radar is a promising sensing modality for enabling pervasive and privacy-preserving human sensing. However, the lack of large-scale radar datasets limits the potential of training deep learning models to achieve generalization and robustness. To close this gap, we resort to designing a software pipeline that leverages rich video repositories to generate synthetic radar data, but it confronts key challenges including i) multipath reflection and attenuation of radar signals among multiple humans, ii) unconvertible generated data leading to poor generality for various applications, and iii) the class-imbalance issue of videos leading to low model stability. To this end, we design Midas to generate realistic, convertible radar data from videos via two components: (i) a data generation network (DG-Net) that combines several key modules (depth prediction, human mesh fitting, and a multi-human reflection model) to simulate the multipath reflection and attenuation of radar signals and output convertible coarse radar data, followed by a Transformer model to generate realistic radar data; (ii) a variant Siamese network (VS-Net) that selects key video clips to eliminate data redundancy for addressing the class-imbalance issue. We implement and evaluate Midas with video data from various external data sources and real-world radar data, demonstrating its great advantages over the state-of-the-art approach for both activity recognition and object detection tasks.
2023 | Kaikai Deng et al. | Human Pose & Activity Recognition; Context-Aware Computing | UbiComp | https://dl.acm.org/doi/10.1145/3580872