OpenCD: Empowering Diagnosis of Children's Mathematical Cognition through Open-ended Multimodal TasksAssessing children’s cognitive development in early mathematics is vital for effective teaching. Compared to closed-ended questions, which may fail to capture the nuanced developmental spectrum, open-ended elicitation tasks (e.g., asking students to manipulate objects or draw to represent numbers) serve as a promising approach to reveal deeper cognitive processes. However, their diverse and unstructured nature makes systematic analysis challenging for teachers. We present OpenCD, a teacher-facing system that automatically analyzes multimodal student responses to capture individualized insights. Based on Evidence-Centered Design, it combines Vision-Language Models (VLMs) and expert models to generate interactive diagnostic graphs and reports with traceability back to behavioral evidence. In our two-part evaluation, a validation study found 90.3% of the system’s diagnoses “completely reasonable,” and a user study showed that OpenCD reduced teachers’ analysis burden and enhanced their insights into student thinking. Our work contributes to scalable process-based assessment for mathematical literacy.2026ZZZhi Zheng et al.Tsinghua UniversityIntelligent Tutoring Systems & Learning AnalyticsProgramming Education & Computational ThinkingUser Research Methods (Interviews, Surveys, Observation)CHI
HiSync: Spatio-Temporally Aligning Hand Motion from Wearable IMU and On-Robot Camera for Command Source Identification in Long-Range HRILong-range Human-Robot Interaction (HRI) remains underexplored. Within it, Command Source Identification (CSI) – determining who issued a command – is especially challenging due to multi-user and distance-induced sensor ambiguity. We introduce HiSync, an optical-inertial fusion framework that treats hand motion as binding cues by aligning robot-mounted camera optical flow with hand-worn IMU signals. We first elicit a user-defined (N=12) gesture set and collect a multimodal command gesture dataset (N=38) in long-range multi-user HRI scenarios. Next, HiSync extracts frequency-domain hand motion features from both camera and IMU data, and a learned CSINet denoises IMU readings, temporally aligns modalities, and performs distance-aware multi-window fusion to compute cross-modal similarity of subtle, natural gestures, enabling robust CSI. In three-person scenes up to 34m, HiSync achieves 92.32% CSI accuracy, outperforming the prior SOTA by 48.44%. HiSync is also validated on real-robot deployment. By making CSI reliable and natural, HiSync provides a practical primitive and design guidance for public-space HRI.2026CZChengwen Zhang et al.Tsinghua UniversityTeleoperation & TelepresenceHuman Pose & Activity RecognitionHand Gesture RecognitionCHI
ActivitySeeker: Towards Collaborative Personalized Human Activity Discovery and Recognition on SmartphonesSmartphones provide an attractive yet challenging platform for human activity recognition (HAR). They are ubiquitous, but also limit the input of HAR systems to a single IMU. These systems are also challenged by the inherent diversity of human activities and varying phone placement on the user's body. This results in traditional smartphone HAR systems having limited personalization potential or imposing a high user burden. We propose ActivitySeeker, a personalized smartphone HAR system that combines self-supervised activity discovery and low-burden user interaction to collaboratively label IMU data and adapt HAR models to individual users on-device through transfer learning. We evaluated ActivitySeeker through simulated online learning and in-the-wild user experiments, where it discovered 95.5% of personal activity types and achieved high recognition accuracy (93.3%) while maintaining a positive user experience. Leveraging the synergy between user and smartphone, ActivitySeeker opens up new possibilities for HAR-based applications like fitness, health and personalized recommendation.2026ZYZhoutong Ye et al.Tsinghua UniversityHuman Pose & Activity RecognitionFitness Tracking & Physical Activity MonitoringBehavior Change & Reflection TechnologyCHI
TraceRing: Touchpad-like Pointing with a Single IMU Ring through Personalized LearningAchieving touchpad-like pointing with a single IMU ring is highly desirable for portable and wearable interaction, yet challenging due to incomplete motion data and significant user variability. We present TraceRing, a finger-worn IMU system that enables precise two-dimensional cursor control. To address the limitations of generic end-to-end models, we propose a personalized training framework that learns user-specific representations through joint multi-task and contrastive learning, while dynamically selecting the most suitable expert model. This approach enables personalization without requiring per-user fine-tuning, and reduces velocity prediction error by 33.9% over state-of-the-art baselines. Furthermore, a real-time study shows it delivers speed and accuracy far exceeding those of AirMouse (2.26s vs. 3.01s in average task completion time). These results demonstrate TraceRing as a portable and comfortable alternative for mobile computing and AR interaction applications.2026ZHZhe He et al.Tsinghua UniversityHaptic WearablesHand Gesture RecognitionMobile Augmented RealityCHI
Division of Labor and Collaboration Between Parents in Family Education: The Case of Homework Involvement in Chinese FamiliesHomework tutoring work is a demanding and often conflict-prone practice in family life, and parents often lack targeted support for managing its cognitive and emotional burdens. Through interviews with 18 parents of children in grades 1–3, we examine how homework-related labor is divided and coordinated between parents, and where AI might meaningfully intervene. We found three key insights: (1) Homework labor encompasses distinct dimensions: physical, cognitive, and emotional, with the latter two often remaining invisible. (2) We identified father-mother-child triadic dynamics in labor division, with children’s feedback as the primary factor shaping parental labor adjustments. (3) Building on prior HCI research, we propose an AI design that prioritizes relationship maintenance over task automation or broad labor mitigation. By employing labor as a lens that integrates care work, we explore the complexities of labor within family contexts, contributing to feminist and care-oriented HCI and to the development of context-sensitive coparenting practices.2026ZWZiyi Wang et al.Beijing University of Civil Engineering and ArchitectureParticipatory DesignInclusive DesignEmpowerment of Marginalized GroupsCHI
SituFont: A Just-in-Time Adaptive Intervention Interface for Enhancing Mobile Readability in Situational Visual ImpairmentsSituational visual impairments (SVIs) hinder mobile readability, causing discomfort and limiting information access. Building on prior work in adaptive typography and accessibility, this paper presents SituFont, a context-aware and human-in-the-loop adaptive typography adjustment approach that enhances smartphone mobile readability by dynamically adjusting font parameters based on real-time contextual changes. Using smartphone sensors and a human-in-the-loop approach, SituFont personalizes text presentation to accommodate personal factors (e.g., fatigue, distraction) and environmental conditions (e.g., lighting, motion, location). To inform its design, we conducted formative interviews (N=15) to identify key SVI factors and controlled experiments (N=18) to quantify their impact on optimal text parameters. A comparative user study (N=12) across eight simulated SVI scenarios demonstrated SituFont's effectiveness in improving smartphone mobile readability in terms of improved efficiency and reduced workload compared with a non-trivial manual adjustment baseline.2026JCJingruo Chen et al.Cornell UniversityMobile Accessibility DesignBehavior Change & Reflection TechnologyContext-Aware ComputingCHI
KeySense: LLM-Powered Hands-Down, Ten-Finger Typing on Commodity TouchscreensExisting touchscreen software keyboards prevent users from resting their hands, forcing slow and fatiguing index-finger tapping (“chicken typing”) instead of familiar hands-down ten-finger typing. We present KeySense, a purely software solution that preserves physical keyboard motor skills. KeySense isolates intentional taps from resting-finger noise with cognitive–motor timing patterns, and then uses a fine-tuned LLM decoder to turn the resulting noisy letter sequence into the intended word. In controlled component tests, this decoder substantially outperforms two statistical baselines (top-1 accuracy 84.8% vs 75.7% and 79.3%). A 12-participant study shows clear ergonomic and performance benefits: compared with the conventional hover-style keyboard, users rated KeySense as markedly less physically demanding (NASA-TLX median 1.5 vs 4.0), and after brief practice, typed significantly faster (WPM 28.3 vs 26.2, p < 0.01). These results indicate that KeySense enables accurate, efficient and comfortable ten-finger text entry on commodity touchscreens, without any extra hardware.2026TLTony Li et al.Stony Brook UniversitySoft Keyboard & Virtual Keyboard DesignLanguage Model-Assisted Text InputCHI
3DRing: Enabling Low-Cost 3D Hand Position Tracking by Fusing Inertial and Low-Framerate Optical SensingCurrent mobile hand tracking systems primarily rely on high-framerate (HFR) optical sensors to capture hand positions, resulting in high computational cost and limiting the applicability in end devices. We propose 3DRing, a 3D hand position tracking method that requires only low-framerate (LFR, <10 FPS) optical data and a single IMU ring. It consists of two stages: (1) a Deep Extended Kalman Filter module that predicts high-framerate hand positions from LFR optical measurements and a single IMU; (2) a Reinforcement Learning module that adaptively selects minimal keyframes for calibration, further reducing the average optical framerate. Using only 6.61 FPS optical data, 3DRing achieves an average real-time tracking error of 1.75 cm and an interaction efficiency of 86.0% in a 3D target selection task, compared to the 67 FPS hand tracking system of Meta Quest Pro, demonstrating a strong potential to reduce the reliance on optical data in mobile hand tracking tasks.2026ZLZhuojun Li et al.Tsinghua UniversityHand Gesture RecognitionFull-Body Interaction & Embodied InputForce Feedback & Pseudo-Haptic WeightCHI
GazeCoT: Unleashing Social Intelligence in Multimodal LLMs With Gaze-Informed Chain-of-Thought ReasoningSocial intelligence is vital for effective human-AI interaction. While LLMs demonstrate strong text-based social intelligence, the vision modality remains challenging due to the presence of non-verbal social cues. For example, gaze is the primary conveyor of social attention, yet it cannot be accurately perceived and understood by multimodal LLMs (MLLMs). Therefore, we propose GazeCoT, a pipeline using gaze estimation models to provide MLLMs with the attention of people in images or videos. The gaze information is provided as visual and text prompts compiled into a structured context to support MLLM social reasoning. Benchmark evaluation confirms that GazeCoT enhances MLLMs’ social intelligence by improving gaze perception. A user study in a challenging application involving parent-child interactions demonstrates that GazeCoT improves perceived explainability and trustworthiness by aligning MLLM social perception and social reasoning with human norms. We hope that GazeCoT, a versatile plug-and-play pipeline, can enable socially aware, MLLM-based HCI applications.2026ZYZhoutong Ye et al.Tsinghua UniversityEye Tracking & Gaze InteractionHuman-LLM CollaborationAI-Assisted Decision-Making & AutomationCHI
EchoMind: Supporting Real-time Complex Problem Discussions through Human-AI Collaborative FacilitationTeams often engage in group discussions to leverage collective intelligence when solving complex problems. However, in real-time discussions, such as face-to-face meetings, participants frequently struggle with managing diverse perspectives and structuring content, which can lead to unproductive outcomes like forgetfulness and off-topic conversations. Through a formative study, we explore a human-AI collaborative facilitation approach, where AI assists in establishing a shared knowledge framework to provide a guiding foundation. We present EchoMind, a system that visualizes discussion knowledge through real-time issue mapping. EchoMind empowers participants to maintain focus on specific issues, review key ideas or thoughts, and collaboratively expand the discussion. The system leverages large language models (LLMs) to dynamically organize dialogues into nodes based on the current context recorded on the map. Our user study with four teams (N=16) reveals that EchoMind helps clarify discussion objectives, trace knowledge pathways, and enhance overall productivity. We also discuss the design implications for human-AI collaborative facilitation and the potential of shared knowledge visualization to transform group dynamics in future collaborations.2025WCWeihao Chen et al.Human-AI (and Robot!) CollaborationCSCW
Understanding Users' Perceptions and Expectations toward a Social Balloon Robot via an Exploratory StudyWe are witnessing a new epoch in embodied social agents. Most of the work has focused on ground or desktop robots that enjoy technical maturity and rich social channels but are often limited by terrain. Drones, which enable spatial mobility, currently face issues with safety and proximity. This paper explores a social balloon robot as a viable alternative that combines these advantages and alleviates limitations. To this end, we developed a hardware prototype named BalloonBot that integrates various devices for social functioning and a helium balloon. We conducted an exploratory lab study on users’ perceptions and expectations about its demonstrated interactions and functions. Our results show promise in using such a robot as another form of socially embodied agent. We highlight its unique mobile and approachable characteristics that harvest novel user experiences and outline factors that should be considered before its broad applications.2025CWChongyang Wang et al.Social Robot InteractionUIST
InterQuest: A Mixed-Initiative Framework for Dynamic User Interest Modeling in Conversational SearchIn online information-seeking tasks (e.g., for products and restaurants), users seek information that aligns with their individual preferences to make informed decisions. However, existing systems often struggle to infer users' implicit interests—unstated yet essential preference factors that directly impact decision quality. Our formative study reveals that User-Centric Knowledge—cross-task persistent preference attributes of users (e.g., "user cares about functionality details for electronics")—serves as a key indicator for resolving users' implicit interests. However, constructing such knowledge from task-specific data alone is insufficient due to three types of uncertainties—cold-start limitation, content accuracy, and scope applicability—which require user-provided information for knowledge alignment. Based on these insights, we present InterQuest, an LLM-based conversational search agent that dynamically models user interests. InterQuest combines two strategies: (1) Dynamic User Knowledge Modeling, which infers and adjusts the content and scope of User-Centric Knowledge, and (2) Uncertainty-Driven Questioning, where InterQuest proactively asks questions to resolve knowledge uncertainties. A user study with 18 participants demonstrates that InterQuest outperforms the baselines in user interest inference, accuracy of user knowledge modeling, and the overall information-seeking experience. Additionally, our findings provide valuable design implications for improving mixed-initiative user modeling in future systems.2025YMYu Mei et al.Human-LLM CollaborationRecommender System UXAlgorithmic Fairness & BiasUIST
From Operation to Cognition: Automatic Modeling Cognitive Dependencies from User Demonstrations for GUI Task AutomationTraditional Programming by Demonstration (PBD) systems primarily automate tasks by recording and replaying operations on Graphical User Interfaces (GUIs), without fully considering the cognitive processes behind operations. This limits their ability to generalize tasks with interdependent operations to new contexts (e.g. collecting and summarizing introductions depending on different search keywords from varied websites). We propose TaskMind, a system that automatically identifies the semantics of operations, and the cognitive dependencies between operations from demonstrations, building a user-interpretable task graph. Users modify this graph to define new task goals, and TaskMind executes the graph to dynamically generalize new parameters for operations, with the integration of Large Language Models (LLMs). We compared TaskMind with a baseline end-to-end LLM which automates tasks from demonstrations and natural language commands, without task graph. In studies with 20 participants on both predefined and customized tasks, TaskMind significantly outperforms the baseline in both success rate and controllability.2025YYYiwen Yin et al.Tsinghua University, Department of Computer Science and TechnologyHuman-LLM CollaborationAI-Assisted Decision-Making & AutomationCHI
Palmpad: Enabling Real-Time Index-to-Palm Touch Interaction with a Single RGB CameraIndex-to-palm interaction plays a crucial role in Mixed Reality (MR) interactions. However, achieving a satisfactory inter-hand interaction experience is challenging with existing vision-based hand tracking technologies, especially in scenarios where only a single camera is available. Therefore, we introduce Palmpad, a novel sensing method utilizing a single RGB camera to detect the touch of an index finger on the opposite palm. Our exploration reveals that the incorporation of optical flow techniques to extract motion information between consecutive frames for the index finger and palm leads to a significant improvement in touch status determination. By doing so, our CNN model achieves 97.0% recognition accuracy and a 96.1% F1 score. In usability evaluation, we compare Palmpad with Quest's inherent hand gesture algorithms. Palmpad not only delivers superior accuracy (95.3%) but also reduces operational demands and significantly improves users’ willingness and confidence. Palmpad aims to enhance accurate touch detection for lightweight MR devices.2025ZHZhe He et al.Tsinghua University, Department of Computer Science and TechnologyHand Gesture RecognitionFull-Body Interaction & Embodied InputMixed Reality WorkspacesCHI
WritingRing: Enabling Natural Handwriting Input with a Single IMU RingTracking continuous 2D sequential handwriting trajectories accurately using a single IMU ring is extremely challenging due to the significant displacement between the IMU's wearing position and the location of the tracked fingertip. We propose WritingRing, a system that uses a single IMU ring worn at the base of the finger to support natural handwriting input and provide real-time 2D trajectories. To achieve this, we first built a handwriting dataset using a touchpad and an IMU ring (N=20). Next, we improved the LSTM model by incorporating streaming input and a TCN network, significantly enhancing accuracy and computational efficiency, and achieving an average trajectory accuracy of 1.63mm. Real-time usability studies demonstrated that the system achieved 88.7% letter recognition accuracy and 68.2% word recognition accuracy, which reached 84.36% when restricting the output to words within a vocabulary of size 3000. WritingRing can also be embedded into existing ring systems, providing a natural and real-time solution for various applications.2025ZHZhe He et al.Tsinghua University, Department of Computer Science and TechnologyElectrical Muscle Stimulation (EMS)Hand Gesture RecognitionFoot & Wrist InteractionCHI
AutoPBL: An LLM-powered Platform to Guide and Support Individual Learners Through Self Project-based LearningSelf project-based learning (SPBL) is a popular learning style where learners follow tutorials and build projects by themselves. SPBL combines project-based learning’s benefit of being engaging and effective with the flexibility of self-learning. However, insufficient guidance and support during SPBL may lead to unsatisfactory learning experiences and outcomes. While LLM chatbots (e.g., ChatGPT) could potentially serve as SPBL tutors, we have yet to see an SPBL platform with responsible and systematic LLM integration. To address this gap, we present AutoPBL, an interactive learning platform for SPBL learners. We examined human PBL tutors’ roles through formative interviews to inform our design. AutoPBL features an LLM-guided learning process with checkpoint questions and in-context Q&A. In a user study where 29 beginners learned machine learning through entry-level projects, we found that AutoPBL effectively improves learning outcomes and elicits better learning behavior and metacognition by clarifying current priorities and providing timely assistance.2025YZYihao Zhu et al.Tsinghua University, Department of Computer Science and TechnologyHuman-LLM CollaborationProgramming Education & Computational ThinkingIntelligent Tutoring Systems & Learning AnalyticsCHI
Enhancing Smartphone Eye Tracking with Cursor-Based Interactive Implicit CalibrationThe limited accuracy of eye-tracking on smartphones restricts its use. Existing RGB-camera-based eye-tracking relies on extensive datasets, which could be enhanced by continuous fine-tuning using calibration data implicitly collected from the interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and utilizes the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the ground truth of gaze and fine-tuning the eye-tracking model with corresponding images, COMETIC enhances accuracy during the interaction. Both filtering and fine-tuning use pre-trained models and could be facilitated using personalized, dynamically updated data. Results show COMETIC achieves an average eye-tracking error of 278.3 px (1.60 cm, 2.29°), representing a 27.2% improvement compared to that without fine-tuning. We found that filtering cursor points whose actual distance to gaze is 150.0 px (0.86 cm) yields the best eye-tracking results.2025CLChang Liu et al.Tsinghua University, Department of Computer Science and TechnologyEye Tracking & Gaze InteractionHuman-LLM CollaborationVisualization Perception & CognitionCHI
Investigating Context-Aware Collaborative Text Entry on Smartphones using Large Language ModelsText entry is a fundamental and ubiquitous task, but users often face challenges such as situational impairments or difficulties in sentence formulation. Motivated by this, we explore the potential of large language models (LLMs) to assist with text entry in real-world contexts. We propose a collaborative smartphone-based text entry system, CATIA, that leverages LLMs to provide text suggestions based on contextual factors, including screen content, time, location, activity, and more. In a 7-day in-the-wild study with 36 participants, the system offered appropriate text suggestions in over 80% of cases. Users exhibited different collaborative behaviors depending on whether they were composing text for interpersonal communication or information services. Additionally, the relevance of contextual factors beyond screen content varied across scenarios. We identified two distinct mental models: AI as a supportive facilitator or as a more equal collaborator. These findings outline the design space for human-AI collaborative text entry on smartphones.2025WCWeihao Chen et al.Tsinghua University, Department of Computer Science and TechnologyVoice User Interface (VUI) DesignHuman-LLM CollaborationContext-Aware ComputingCHI
UbiPhysio: Support Daily Functioning, Fitness, and Rehabilitation with Action Understanding and Feedback in Natural LanguageWang et al. developed UbiPhysio, a system that uses action understanding and natural-language feedback to help users with daily functional exercise, fitness, and rehabilitation training.2024CWChongyang Wang et al.Vibrotactile Feedback & Skin StimulationFull-Body Interaction & Embodied InputUbiComp
G-VOILA: Gaze-Facilitated Information Querying in Daily ScenariosWang et al. propose G-VOILA, a system that uses eye tracking to facilitate information-querying interaction in daily scenarios.2024ZWZeyu Wang et al.Eye Tracking & Gaze InteractionUbiComp