PrivWeb: Unobtrusive and Content-aware Privacy Protection For Web Agents. While web agents have gained popularity by automating web interactions, their requirement for interface access introduces privacy risks that are understudied, particularly from users' perspective. Through a formative study (N=15), we found that users frequently misunderstand agent data practices and desire unobtrusive, transparent data management. To achieve this, we developed PrivWeb, a trusted add-on for web agents that utilizes a localized LLM to anonymize private information on interfaces based on user preferences. It employs tiered delegation to balance automation and intrusiveness, using ambient notifications for low-sensitivity data and enforcing a mandatory pause for high-sensitivity data. The user study (N=14) across travel, information retrieval, shopping, and entertainment tasks showed that PrivWeb enhances perceived privacy protection and trust compared to transparency-only baselines, without increasing cognitive load. Crucially, we identified user delegation strategies: users prefer to manually execute sensitive steps for high-sensitivity data, while granting the agent access to low-sensitivity data. 2026. Shuning Zhang et al. Tsinghua University. Privacy by Design & User Control; Privacy Perception & Decision-Making; Human-LLM Collaboration. CHI.
VisGuardian: A Lightweight Group-based Visual Privacy Control Technique For Smart Glasses in Home Environments. Always-on sensing by AI applications on AR glasses makes traditional permission techniques inefficient for context-dependent private visual data within home environments. The home presents a challenging privacy context due to the large number of sensitive objects and the intimate nature of daily routines. We propose VisGuardian, a fine-grained content-based visual permission technique for AR glasses. VisGuardian features a group-based control mechanism that enables users to efficiently manage permissions for multiple private objects. VisGuardian detects objects using YOLO and adopts a pre-classified schema to group them. By selecting a single object, users can obscure groups of related objects based on criteria including privacy sensitivity, object category, or spatial proximity. A technical evaluation shows VisGuardian achieves mAP50 of 0.6704 with only 14.0 ms latency and a 1.7% increase in battery consumption per hour. Furthermore, a user study (N=24) comparing VisGuardian to slider-based and object-based baselines found that it was significantly faster for setting permissions and was preferred by users for its efficiency, effectiveness, and ease of use. 2026. Shuning Zhang et al. Tsinghua University. Smart Home Privacy & Security; Privacy by Design & User Control; AR Navigation & Context Awareness. CHI.
"Privacy across the boundary": Examining Perceived Privacy Risk Across Data Transmission and Sharing Ranges of Smart Home Personal Assistants. As Smart Home Personal Assistants (SPAs) evolve into social agents, understanding user privacy necessitates interpersonal communication frameworks, such as Privacy Boundary Theory (PBT). To ground our investigation, our three-phase preliminary study (1) identified transmission and sharing ranges as key boundary-related risk factors, (2) categorized relevant SPA functions and data types, and (3) analyzed commercial practices, revealing widespread data sharing and non-transparent safeguards. A subsequent mixed-methods study (N=412 survey, N=40 interviews among the survey participants) assessed users' perceived privacy risks across data types, transmission ranges and sharing ranges. Results demonstrate a significant, non-linear escalation in perceived risk when data crosses two critical boundaries: the 'public network' (transmission) and 'third parties' (sharing). This boundary effect holds across data types and demographics. Furthermore, risk perception is modulated by data attributes and contextual privacy calculus. Conversely, anonymization shows limited efficacy, especially for third-party sharing, a finding attributed to user distrust. These findings empirically ground PBT in the SPA context and inform the design of boundary-aware privacy protection. 2026. Shuning Zhang et al. Tsinghua University. Privacy by Design & User Control; Smart Home Privacy & Security. CHI.
Characterizing Unintended Consequences of GUI Agents For Web Browsing. The integration of LLMs into GUI agents promises to revolutionize web browsing automation, yet the practical user experience remains challenging. This paper systematically characterizes user-reported issues with GUI agents by focusing on three dimensions: phenomena, influences, and user-centric mitigation. We adopted a two-phase method combining social media analysis (N=221 posts) and semi-structured interviews (N=21). Our findings reveal a taxonomy of complaints unique to GUI agents, including deficits in grounding abstract intent into concrete interface affordances, the inability to adapt to dynamic visual states, and the execution of erroneous actions. These lead to influences distinct from text-based hallucinations, ranging from task abandonment to security risks like uncontrolled file system access. In response, users are forced to employ ad-hoc mitigation strategies, including ecological sandboxing and cursor shadowing, to correct GUI agents' behaviors. We contribute: (1) a comprehensive characterization of complaints specific to GUI agent interaction, (2) an analysis of how these phenomena degrade interaction integrity, and (3) design implications for creating consequence-aware agents. 2026. Shuning Zhang et al. Tsinghua University. Human-LLM Collaboration; Explainable AI (XAI); Privacy by Design & User Control. CHI.
Collab: Fostering Critical Identification of Deepfake Videos on Social Media via Synergistic Annotation. Identifying deepfake videos on social media platforms is challenged by dynamic spatio-temporal artifacts and inadequate user tools. This hinders both critical viewing by users and scalable moderation on platforms. Here, we present Collab, a web plugin enabling users to collaboratively annotate deepfake videos. Collab integrates three key components: (i) an intuitive interface for spatio-temporal labeling where users provide confidence scores and rationales, facilitating detailed input even from non-experts, (ii) a novel confidence-weighted spatio-temporal Intersection-over-Union (IoU) algorithm to aggregate diverse user annotations into accurate aggregated results, and (iii) a hierarchical demonstration strategy presenting aggregated results to guide attention toward contentious regions and foster critical evaluation. A seven-day online study (N=90), in which participants annotated suspicious videos while viewing an online experimental platform, compared Collab against two conditions without aggregation or demonstration, respectively. Collab significantly improved identification accuracy and enhanced reflection compared to the non-demonstration condition, while outperforming the non-aggregation condition in novelty and effectiveness. 2026. Shuning Zhang et al. Tsinghua University. Deepfake & Synthetic Media Detection; Content Moderation & Platform Governance; Misinformation & Fact-Checking. CHI.
Roomify: Spatially-Grounded Style Transformation for Immersive Virtual Environments. We present Roomify, a spatially-grounded transformation system that generates themed virtual environments anchored to users' physical rooms while maintaining spatial structure and functional semantics. Current VR approaches face a fundamental trade-off: full immersion sacrifices spatial awareness, while passthrough solutions break presence. Roomify addresses this through spatially-grounded transformation, treating physical spaces as "spatial containers" that preserve key functional and geometric properties of furniture while enabling radical stylistic changes. Our pipeline combines in-situ 3D scene understanding, AI-driven spatial reasoning, and style-aware generation to create personalized virtual environments grounded in physical reality. We introduce a cross-reality authoring tool enabling fine-grained user control through MR editing and VR preview workflows. Two user studies validate our approach: one with 18 VR users demonstrates a 63% improvement in presence over passthrough and 26% over fully virtual baselines while maintaining spatial awareness; another with 8 design professionals confirms the system's creative expressiveness (scene quality: 5.95/7; creativity support: 6.08/7) and professional workflow value across diverse environments. 2026. Xueyang Wang et al. Tsinghua University. Social & Collaborative VR; Mixed Reality Workspaces; Immersion & Presence Research. CHI.
Request a Note: How the Request Function Shapes X's Community Notes System. X's Community Notes is a crowdsourced fact-checking system. To improve its scalability, X introduced a "Request Community Note" feature, enabling users to solicit fact-checks from contributors on specific posts. Yet, its implications for the system (what gets checked, by whom, and with what quality) remain unclear. Using 98,685 requested posts and their associated notes, we evaluate how requests shape the Community Notes system. We find that requested posts with higher GPT-estimated misleadingness and from authors with greater misinformation exposure are more likely to receive notes. Conversely, requested political posts (vs. non-political) are less likely to receive notes. We also observe partisan asymmetries: posts from Republicans are more likely to receive notes than those from Democrats. Although only 12% of requested posts receive request-fostered notes from top contributors, these notes are rated as more helpful and less polarized than others, partly reflecting top contributors' selective fact-checking of misleading posts. Our findings highlight both the limitations and promise of requests for scaling high-quality community-based fact-checking. 2026. Yuwei Chuai et al. University of Luxembourg. Content Moderation & Platform Governance; Misinformation & Fact-Checking; Volunteer Coordination & Crowdsourced Disaster Relief. CHI.
A Scoping Review and Guidelines on Privacy Policy's Visualization from an HCI Perspective. Privacy policies are a cornerstone of informed consent, yet a persistent gap exists between their legal intent and practical efficacy. Despite decades of research proposing various visualizations, user comprehension remains low, and designs rarely see widespread adoption. To understand this landscape and chart a path forward, we synthesized 65 top-tier papers using a framework adapted from user-centered design lifecycles. Our analysis yielded four findings about the field's evolution: (1) a trade-off between information load and decision efficacy, which shows a shift from augmenting disclosures to cognitive load management, (2) a co-evolutionary dynamic of design and automation, revealing that designs such as context-awareness drove automation needs, while LLM breakthroughs enable the semantic interpretation required to realize those designs, (3) a tension between generality and specificity, highlighting the divergence between standardized solutions and the increasing necessity for specialized interaction in IoT and immersive environments, and (4) the balancing of stakeholder opinions, where visualization efficacy is constrained by the interplay of regulatory mandates, developer capabilities and provider incentives. 2026. Shuning Zhang et al. Tsinghua University. Privacy Perception & Decision-Making; Privacy by Design & User Control; Explainable AI (XAI). CHI.
Mind the Gap: Mapping Wearer–Bystander Privacy Tensions and Context-Adaptive Pathways for Camera Glasses. Camera glasses create fundamental privacy tensions between wearers seeking recording functionality and bystanders concerned about unauthorized surveillance. We present a systematic multi-stakeholder evaluation of privacy mechanisms through surveys (N=525) and paired interviews (N=20) in China. Study 1 quantifies expectation-willingness gaps: bystanders consistently demand stronger information transparency and protective measures than wearers will provide, with disparities intensifying in sensitive contexts where 65–90% of bystanders would take defensive action. Study 2 evaluates twelve privacy-enhancing technologies, revealing four fundamental trade-offs that undermine current approaches: visibility versus disruption, empowerment versus burden, protection versus agency, and accountability versus exposure. These gaps reflect structural incompatibilities rather than inadequate goodwill, with context emerging as the primary determinant of privacy acceptability. We propose context-adaptive pathways that dynamically adjust protection strategies: minimal-friction visibility in public spaces, structured negotiation in semi-public environments, and automatic protection in sensitive contexts. Our findings contribute a diagnostic framework for evaluating privacy mechanisms and implications for context-aware design in ubiquitous sensing. 2026. Xueyang Wang et al. Tsinghua University. Privacy by Design & User Control; Privacy Perception & Decision-Making; Context-Aware Computing. CHI.
Exploring Collaboration Patterns and Strategies in Human-AI Co-creation through the Lens of Agency: A Scoping Review of the Top-tier HCI Literature. As Artificial Intelligence (AI) increasingly becomes an active collaborator in co-creation, understanding the distribution and dynamics of agency is paramount. The Human-Computer Interaction (HCI) perspective is crucial for this analysis, as it uniquely reveals the interaction dynamics and specific control mechanisms that dictate how agency manifests in practice. Despite this importance, a systematic synthesis mapping agency configurations and control mechanisms within the HCI/CSCW literature is lacking. Addressing this gap, we reviewed 134 papers from top-tier HCI/CSCW venues (e.g., CHI, UIST, CSCW) over the past 20 years. This review yields four primary contributions: (1) an integrated theoretical framework structuring agency patterns, control mechanisms, and interaction contexts; (2) a comprehensive operational catalog of control mechanisms detailing how agency is implemented; (3) an actionable cross-context map linking agency configurations to diverse co-creative practices; and (4) grounded implications and guidance for future CSCW research and the design of co-creative systems, addressing aspects like trust and ethics. 2025. Shuning Zhang et al. Getting Things Done With AI. CSCW.
PrivCAPTCHA: Interactive CAPTCHA to Facilitate Effective Comprehension of APP Privacy Policy. Traditional app privacy policies are often lengthy and non-interactive, leading users to skip them and remain uninformed. To address this, we proposed PrivCAP, a technique to enhance user comprehension by presenting policies in a concise, interactive format. PrivCAP adopted a CAPTCHA-based design, requiring users to interact with clickable chunks of concise policy content, thus reducing physical and cognitive load. A formative study (N=38) demonstrated that participants valued informed consent alongside concerns over data collection and sharing, marking the first such evaluation among Chinese users. This study further found a preference for concise visualizations and interactable formats. PrivCAP, leveraging few-shot prompting on Large Language Models (LLMs), accurately translates privacy policies into clickable, chunked formats optimized for smartphone screens. In an evaluation (N=28), PrivCAP outperformed traditional policy presentations in improving user understanding, reducing cognitive load, and maintaining efficiency, with participants favoring its engaging design and reporting more informed decision-making. 2025. Shuning Zhang et al. Tsinghua University, Institute for Network Sciences and Cyberspace. VR Medical Training & Rehabilitation; Privacy by Design & User Control; Privacy Perception & Decision-Making. CHI.
Actual Achieved Gain and Optimal Perceived Gain: Modeling Human Take-over Decisions Towards Automated Vehicles' Suggestions. Driver decision quality in take-overs is critical for effective human-Autonomous Driving System (ADS) collaboration. However, current research lacks detailed analysis of its variations. This paper introduces two metrics, Actual Achieved Gain (AAG) and Optimal Perceived Gain (OPG), to assess decision quality, with OPG representing optimal decisions and AAG reflecting actual outcomes. Both are calculated as weighted averages of perceived gains and losses, influenced by ADS accuracy. Study 1 (N=315) used a 21-point Thurstone scale to measure perceived gains and losses (key components of AAG and OPG) across typical tasks: route selection, overtaking, and collision avoidance. Studies 2 (N=54) and 3 (N=54) modeled decision quality under varying ADS accuracy and decision time. Results show that with sufficient time (>3.5 s), AAG converges towards OPG, indicating rational decision-making, while limited time leads to intuitive and deterministic choices. Study 3 also linked AAG-OPG deviations to irrational behaviors. An intervention study (N=8) and a pilot (N=4) employing voice alarms and multi-modal alarms based on these deviations demonstrated AAG's potential to improve decision quality. 2025. Shuning Zhang et al. Tsinghua University, Institute for Network Sciences and Cyberspace. Automated Driving Interface & Takeover Design; Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS); AI-Assisted Decision-Making & Automation. CHI.
Raise Your Eyebrows Higher: Facilitating Emotional Communication in Social Virtual Reality Through Region-Specific Facial Expression Exaggeration. While exaggerated facial expressions in cartoon avatars can enhance emotional communication in social virtual reality (VR), they risk triggering the uncanny valley effect. Our research reveals that this effect varies significantly across different emotions. In Study 1 (N=30), participants evaluated scaled facial expressions during simulated VR conversations. We found that expression exaggeration had opposing effects: it decreased facial realism for joy, surprise, and disgust due to overly dramatic mouth movements, while enhancing realism for fear, sadness, and anger, emotions that rely on upper facial expressions typically constrained by HMD pressure. Based on these findings, we developed a region-specific facial expression exaggeration strategy that enhances under-expressed upper facial features while maintaining natural lower facial movements. Study 2 (N=20) validated this approach, demonstrating enhanced emotional intensity and contagion for negative emotions while mitigating the uncanny valley effect. Our research provides practical guidelines for optimizing avatar-mediated emotional communication in social VR environments. 2025. Xueyang Wang et al. Tsinghua University, Institute for Network Sciences and Cyberspace. Social & Collaborative VR; Immersion & Presence Research; Identity & Avatars in XR. CHI.
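The EarSAVAS Dataset: Enabling Subject-Aware Vocal Activity Sensing on Earables. Zhang et al. constructed the EarSAVAS dataset, which supports subject-aware vocal activity detection on smart earable devices and advances research on related algorithms. 2024. Xiyuxing Zhang et al. Biosensors & Physiological Monitoring. UbiComp.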
From 2D to 3D: Facilitating Single-Finger Mid-Air Typing on QWERTY Keyboards with Probabilistic Touch Modeling. Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, which brings challenges to both tap detection and input prediction. In this paper, we explored the feasibility of single-finger typing on virtual QWERTY keyboards in mid-air. We first conducted a study to examine users' 3D typing behavior on different sizes of virtual keyboards. Results showed that the participants perceived the vertical projection of the lowest point on the keyboard during a tap as the target location, and that inferring taps based on the intersection between the finger and the keyboard was not applicable. To address this challenge, we derived a novel input prediction algorithm that incorporated the uncertainty in tap detection as a probability and performed probabilistic decoding that could tolerate false detection. We analyzed the performance of the algorithm through a full-factorial simulation. Results showed that the SVM-based probabilistic touch detection together with a 2D elastic probabilistic decoding algorithm (elasticity = 2) could achieve the optimal top-5 accuracy of 94.2%. In the evaluation user study, the participants reached a single-finger typing speed of 26.1 WPM with a 3.2% uncorrected word-level error rate, which was significantly better than both tap-based and gesture-based baseline techniques. Also, the proposed technique received the highest preference score from the users, demonstrating its usability in real text entry tasks. https://dl.acm.org/doi/10.1145/3580829 2023. Xin Yi et al. Mid-Air Haptics (Ultrasonic); Hand Gesture Recognition; Voice User Interface (VUI) Design. UbiComp.
Modeling the Trade-off of Privacy Preservation and Activity Recognition on Low-Resolution Images. A computer vision system using low-resolution image sensors can provide intelligent services (e.g., activity recognition) while protecting unnecessary visual privacy information at the hardware level. However, preserving visual privacy and enabling accurate machine recognition place opposing demands on image resolution. Modeling the trade-off between privacy preservation and machine recognition performance can guide future privacy-preserving computer vision systems using low-resolution image sensors. In this paper, using at-home activities of daily living (ADLs) as the scenario, we first obtained the most important visual privacy features through a user survey. Then we quantified and analyzed the effects of image resolution on human and machine recognition performance in activity recognition and privacy awareness tasks. We also investigated how modern image super-resolution techniques influence these effects. Based on the results, we proposed a method for modeling the trade-off of privacy preservation and activity recognition on low-resolution images. 2023. Yuntao Wang et al. Tsinghua University. Human Pose & Activity Recognition; Privacy Perception & Decision-Making. CHI.
Squeez'In: Private Authentication on Smartphones based on Squeezing Gestures. In this paper, we proposed Squeez'In, a technique on smartphones that enabled private authentication by holding and squeezing the phone with a unique pattern. We first explored the design space of practical squeezing gestures for authentication by analyzing the participants' self-designed gestures and squeezing behavior. Results showed that varying-length gestures with two levels of touch pressure and duration were the most natural and unambiguous. We then implemented Squeez'In on an off-the-shelf capacitive sensing smartphone, and employed an SVM-GBDT model for recognizing gestures and user-specific behavioral patterns, achieving 99.3% accuracy and 0.93 F1-score when tested on 21 users. A subsequent 14-day study validated the memorability and long-term stability of Squeez'In. During usability evaluation, compared with gesture and PIN code, Squeez'In achieved significantly faster authentication speed and higher user preference in terms of privacy and security. 2023. Xin Yi et al. Tsinghua University. Force Feedback & Pseudo-Haptic Weight; Passwords & Authentication. CHI.
DEEP: 3D Gaze Pointing in Virtual Reality Leveraging Eyelid Movement. Gaze-based target selection suffers from low input precision and target occlusion. In this paper, we explored leveraging continuous eyelid movement to support highly efficient and occlusion-robust dwell-based gaze pointing in virtual reality. We first conducted two user studies to examine the users' eyelid movement patterns in both unintentional and intentional conditions. The results demonstrated the feasibility of using intentional eyelid movement, which was distinguishable from natural movements, for input. We also tested the participants' dwelling patterns for targets with different sizes and locations. Based on these results, we propose DEEP, a novel technique that enables users to see through occlusions by controlling the aperture angle of their eyelids and to dwell to select targets with the help of a probabilistic input prediction model. Evaluation results showed that DEEP incorporating dynamic depth and location selection significantly outperformed its static variants, as well as a naive dwelling baseline technique. Even for 100% occluded targets, it could achieve an average selection time of 2.5 s with an error rate of 2.3%. 2022. Xin Yi et al. Eye Tracking & Gaze Interaction; Immersion & Presence Research. UIST.
SemanticAdapt: Optimization-based Adaptation of Mixed Reality Layouts Leveraging Virtual-Physical Semantic Connections. We present an optimization-based approach that automatically adapts Mixed Reality (MR) interfaces to different physical environments. Current MR layouts, including the position and scale of virtual interface elements, need to be manually adapted by users whenever they move between environments, and whenever they switch tasks. This process is tedious and time-consuming, and arguably needs to be automated for MR systems to be beneficial for end users. We contribute an approach that formulates this challenge as a combinatorial optimization problem and automatically decides the placement of virtual interface elements in new environments. To achieve this, we exploit the semantic association between the virtual interface elements and physical objects in an environment. Our optimization furthermore considers the utility of elements for users' current task, layout factors, and spatio-temporal consistency to previous layouts. All those factors are combined in a single linear program, which is used to adapt the layout of MR interfaces in real time. We demonstrate a set of application scenarios, showcasing the versatility and applicability of our approach. Finally, we show that compared to a naive adaptive baseline approach that does not take semantic associations into account, our approach decreased the number of manual interface adaptations by 33%. 2021. Yifei Cheng et al. AR Navigation & Context Awareness; Mixed Reality Workspaces; Context-Aware Computing. UIST.
Facilitating Text Entry on Smartphones with QWERTY Keyboard for Users with Parkinson’s Disease. QWERTY is the primary smartphone text input keyboard configuration. However, insertion and substitution errors caused by hand tremors, often experienced by users with Parkinson's disease, can severely affect typing efficiency and user experience. In this paper, we investigated Parkinson's users' typing behavior on smartphones. In particular, we identified and compared the typing characteristics generated by users with and without Parkinson's symptoms. We then proposed an elastic probabilistic model for input prediction. By incorporating both spatial and temporal features, this model generalized the classical statistical decoding algorithm to correct insertion, substitution and omission errors, while maintaining direct physical interpretation. User study results confirmed that the proposed algorithm outperformed baseline techniques: users reached 22.8 WPM typing speed with a significantly lower error rate and higher user-perceived performance and preference. We concluded that our method could effectively improve the text entry experience on smartphones for users with Parkinson's disease. 2021. Yuntao Wang et al. Tsinghua University, University of Washington. Motor Impairment Assistive Input Technologies; Shape-Changing Materials & 4D Printing. CHI.