"Un-default" Behavior Tuning: Specifying Model Behavior outside the Norm with LLM Self-Playing and Self-ImprovingSpecifying model behavior is challenging—especially when the desired behavior is unpopular relative to the model’s training data. Reversing the influence of massive training corpora is both time-consuming and costly, and such interventions are typically inaccessible to end users. While Large Language Models (LLMs) make it easier to write instructions using natural language, specifying unpopular behaviors remains a difficult task. We introduce \Undefault{}, a human-in-the-loop framework that combines self-play with self-refinement to better specify such behaviors. Our system enables users to identify popular (but undesired) model behaviors through self-play, then iteratively guide the model toward preferred alternatives by refining prompts in a self-improving loop. Our first evaluation involves user study conducted on a system implementation of \Undefault{} within the context of chatbot behavior. Our system self-play itself by simulating user interactions to identify patterns and create effective prompts based on the pattern. In a within-subject study (N=12), participants pinpointed more patterns through self-playing and crafted better prompts. Surprisingly, users felt more or less success level in specifying the model behavior. Follow-up crowd studies (N=60) confirmed that the chatbot adhered to instructions without sacrificing quality. Our second evaluation is a case study on a real-world implementation using a movie rating dataset with \Undefault{}, demonstrating its effectiveness and robustness in modeling a critic's preferences across the spectrum of low to highly rated movies. Together, these results suggest how AI improves the design process of interactive AI systems. Furthermore, they suggest how the benefits of these tools may be non-obvious to end-users. 
We reflect on these findings and suggest future directions.2026SPSoya Park et al.MITHuman-LLM CollaborationAI-Assisted Decision-Making & AutomationExplainable AI (XAI)IUI
PopSignAI: Towards Using Sign Language Recognition Games to Improve American Sign Language Learning in Novice Signers
To help novice signers learn American Sign Language, we develop PopSignAI, a proof-of-concept smartphone-based bubble-shooter game that facilitates real-time interaction through isolated sign language recognition. In a 20-person user study, we demonstrate that encouraging novice signers to practice generating sign in PopSignAI is more efficient for teaching ASL skills than a version of PopSign focused on receptive signing ability. We use over 200,000 examples of 250 signs from 47 signers to train and test a user-independent LSTM recognizer that achieves 82.9% accuracy on an independent test set. For the purposes of the game, the recognizer averages 99.6% accuracy with a 7ms inference time using a 2.5MB model. Ablation studies suggest that as few as eight signers are needed for training to achieve adequate recognition accuracy for PopSignAI's gameplay. To encourage future sign language recognition games, we release the PopSignAI recognition pipeline and software. We identify hearing parents of deaf children as important potential users of sign games and conduct interviews with eight of these parents, investigating their motivation and challenges in learning sign.
2026 | David Martin et al. | Georgia Institute of Technology | Hand Gesture Recognition; Special Education Technology; Child-Computer Interaction Design | IUI
Strategic Tradeoffs Between Humans and AI in Multi-Agent Bargaining
Markets increasingly accommodate large language models (LLMs) as autonomous decision-making agents. As this transition occurs, it becomes critical to evaluate how these agents behave relative to their human and task-specific statistical predecessors. In this work, we present results from an empirical study comparing humans (N=216), multiple frontier LLMs, and customized Bayesian agents in dynamic multi-player bargaining games under identical conditions. Bayesian agents extract the highest surplus with aggressive trade proposals that are frequently rejected. Humans and LLMs achieve comparable aggregate surplus within their groups, but exhibit different trading strategies. LLMs favor conservative, concessionary proposals that are usually accepted by other LLMs, while humans propose trades that are consistent with fairness norms but are more likely to be rejected. These findings highlight that performance parity, a common benchmark in agent evaluation, can mask substantive procedural differences in how LLMs behave in complex multi-agent interactions.
2026 | Crystal Qian et al. | Google DeepMind | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Explainable AI (XAI) | IUI
Black LLMirror: User (Self) Perceptions in Black American English Interactions with LLMs
The increasing personalization of LLMs to users’ language style raises both excitement and concerns for minority users such as Black American English (BAE) speakers. Yet, previous work has predominantly focused on user perceptions of out-of-context BAE statements by LLMs rather than naturalistic multi-turn interactions, and has ignored such systems’ effects on users’ self-perception. In this work, we examine the effects that multi-turn interactions with speech and text BAE-producing LLMs have on BAE speakers’ perceptions of the LLM and of themselves. We observe a significant change in participant self-esteem following the interactions, and notable qualitative differences between BAE-LLM and Standard American English (SAE) LLM interactions. We also observe significant effects of BAE-usage on user perception of the model within speech-based interactions. Our findings suggest that the effects of BAE-usage by an LLM agent on model- and self-perception among BAE-speaking users are complex and widely varied.
2026 | Mikayla Campbell et al. | Carnegie Mellon University | Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Algorithmic Fairness & Bias | CHI
How Well Can 3D Accessibility Guidelines Support XR Development? An Interview Study with XR Practitioners in Industry
While accessibility (a11y) guidelines exist for 3D games and virtual worlds, their applicability to extended reality (XR)'s unique interaction paradigms (e.g., spatial tracking, kinesthetic interactions) remains unexplored. XR practitioners need practical guidance to successfully implement a11y guidelines under real-world constraints. We present the first evaluation of existing 3D a11y guidelines applied to XR development through semi-structured interviews with 25 XR practitioners across diverse organizational contexts. We assessed 20 commonly-agreed a11y guidelines from six major resources across visual, motor, cognitive, speech, and hearing domains, comparing practitioners' development practices against guideline applicability to XR. Our investigation reveals that guidelines can be highly effective when designed as transformation catalysts rather than compliance checklists, but fundamental mismatches exist between existing 3D guidelines and XR requirements, creating both implementation barriers and design gaps. This work provides foundational insights towards developing a11y guidelines and support tools that address XR's distinct characteristics.
2026 | Daniel Killough et al. | University of Wisconsin-Madison | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Universal & Inclusive Design; Immersion & Presence Research | CHI
TALES: A Taxonomy and Analysis of Cultural Representations in LLM-generated Stories
Millions of users across the globe turn to AI chatbots for their creative needs, inviting widespread interest in understanding how they represent diverse cultures. However, evaluating cultural representations in open-ended tasks remains challenging and underexplored. In this work, we present TALES, an evaluation of cultural misrepresentations in LLM-generated stories for diverse Indian cultural identities. First, we develop TALES-Tax, a taxonomy of cultural misrepresentations by collating insights from participants with lived experiences in India through focus groups (N=9) and individual surveys (N=15). Using TALES-Tax, we evaluate 6 models through a large-scale annotation study spanning 2,925 annotations from 108 annotators with lived experience and native language proficiency from across 71 regions in India and 14 languages. Concerningly, we find that 88% of the generated stories contain misrepresentations, and such errors are more prevalent in mid- and low-resourced languages and stories based in peri-urban regions in India. We also transform the annotations into TALES-QA, a standalone question bank to evaluate the cultural knowledge of models.
2026 | Kirti Bhagat et al. | Indian Institute of Science | Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Low-Resource Languages & Digital Inclusion | CHI
Ability Heuristics for Conducting Accessibility Inspections
The accessibility of interactive technologies is often evaluated using checklists that are low-level, numerous, and platform-specific. Such checklists are typically used by accessibility experts, leaving everyday designers and developers with little support for assessing their own interfaces. To make accessibility evaluations easier to conduct, we devised a set of nine "ability heuristics" that prompt designers to engage with accessibility throughout the design process. We empirically evaluated these ability heuristics with 37 design students, comparing them to usability heuristics and WCAG. The ability heuristics emphasized the quality of accessibility features compared to the other methods, and surfaced issues that were more broadly dispersed across disability groups. Further, the students found the heuristics were as easy to use as the alternative methods. We argue that the heuristics help to move beyond binary notions of accessibility, pushing designers to consider the quality of features across diverse disabilities and the range of abilities within.
2026 | Claire L. Mitchell et al. | University of Washington | Universal & Inclusive Design; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Participatory Design | CHI
Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference
Large Language Models (LLMs) such as ChatGPT can infer personal attributes from seemingly innocuous text, raising privacy risks beyond memorized data leakage. While prior work has demonstrated these risks, little is known about how users estimate and respond to them. We conducted a survey with 240 U.S. participants who judged text snippets for inference risks, reported concern levels, and attempted rewrites to block inference. We compared their rewrites with those generated by ChatGPT and Rescriber, a state-of-the-art sanitization tool. Results show that participants struggled to anticipate inference, performing only a little better than chance. User rewrites were effective in just 28% of cases, better than Rescriber but worse than ChatGPT. We examined our participants' rewriting strategies and observed that while paraphrasing was the most common strategy, it was also the least effective; abstraction and adding ambiguity were more successful. Our work highlights the importance of inference-aware design in LLM interactions.
2026 | Synthia Qia Wang et al. | University of Chicago | Explainable AI (XAI); Privacy by Design & User Control; Privacy Perception & Decision-Making | CHI
Improving Low-Vision Chart Accessibility via On-Cursor Visual Context
Despite widespread use, charts remain largely inaccessible for Low-Vision Individuals (LVI). Reading charts requires viewing data points within a global context, which is difficult for LVI who may rely on magnification or experience a partial field of vision. We aim to improve exploration by providing visual access to critical context. To inform this, we conducted a formative study with five LVI. We identified four fundamental contextual elements common across chart types: axes, legend, grid lines, and the overview. We propose two pointer-based interaction methods to provide this context: Dynamic Context, a novel focus+context interaction, and Mini-map, which adapts overview+detail principles for LVI. In a study with N=22 LVI, we compared both methods and evaluated their integration with current tools. Our results show that Dynamic Context had a significant positive impact on access, usability, and effort reduction, but worsened visual load. Mini-map strengthened spatial understanding, but was less preferred for this task. We offer design insights to guide the development of future systems that support LVI with visual context while balancing visual load.
2026 | Yotam Sechayk et al. | The University of Tokyo | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Interactive Data Visualization; Uncertainty Visualization | CHI
ADCanvas: Accessible and Conversational Audio Description Authoring for Blind and Low Vision Creators
Audio Description (AD) provides essential access to visual media for blind and low vision (BLV) audiences. Yet current AD production tools remain largely inaccessible to BLV video creators, who possess valuable expertise but face barriers due to visually-driven interfaces. We present ADCanvas, a multimodal authoring system that supports non-visual control over AD creation. ADCanvas combines conversational interaction with keyboard-based playback control and a plain-text, screen reader–accessible editor to support end-to-end AD authoring and visual question answering (VQA). Combining screen-reader-friendly controls with a multimodal LLM agent, ADCanvas supports live VQA, script generation, and AD modification. Through a user study with 12 BLV video creators, we find that users adopt the conversational agent as an informational aide and drafting assistant, while maintaining agency through verification and editing. For example, participants saw themselves as curators who received information from the model and filtered it down for their audience. Our findings offer design implications for accessible media tools, including precise editing controls, accessibility support for creative ideation, and configurable rules for human-AI collaboration.
2026 | Franklin Mingzhe Li et al. | Carnegie Mellon University | Voice Accessibility; AI-Assisted Creative Writing; AI-Assisted Writing & Text Generation | CHI
DeltaDorsal: Enhancing Hand Pose Estimation with Dorsal Features in Egocentric Views
The proliferation of XR devices has made egocentric hand pose estimation a vital task, yet this perspective is inherently challenged by frequent finger occlusions. To address this, we propose a novel approach that leverages the rich information in dorsal hand skin deformation, unlocked by recent advances in dense visual featurizers. We introduce a dual-stream delta encoder that learns pose by contrasting features from a dynamic hand with a baseline relaxed position. Our evaluation demonstrates that, using only cropped dorsal images, our method reduces the Mean Per Joint Angle Error (MPJAE) by 18% in self-occluded scenarios (fingers >= 50% occluded) compared to state-of-the-art techniques that depend on the whole hand's geometry and large model backbones. Consequently, our method not only enhances the reliability of downstream tasks like index finger pinch and tap estimation in occluded scenarios but also unlocks new interaction paradigms, such as detecting isometric force for a surface "click" without visible movement while minimizing model size.
2026 | William Huang et al. | University of California, Los Angeles | Eye Tracking & Gaze Interaction; Hand Gesture Recognition; Immersion & Presence Research | CHI
Peeking Ahead of the Field Study: Exploring VLM Personas as Support Tools for Embodied Studies in HCI
Field studies are irreplaceable but costly, time-consuming, and error-prone, and they require careful preparation. Inspired by rapid-prototyping in manufacturing, we propose a fast, low-cost evaluation method using Vision-Language Model (VLM) personas to simulate outcomes comparable to field results. While LLMs show human-like reasoning and language capabilities, autonomous vehicle (AV)-pedestrian interaction requires spatial awareness, emotional empathy, and behavioral generation. This raises our research question: To what extent can VLM personas mimic human responses in field studies? We conducted parallel studies: 1) one real-world study with 20 participants, and 2) one video-study using 20 VLM personas, both on a street-crossing task. We compared their responses and interviewed five HCI researchers on potential applications. Results show that VLM personas mimic human response patterns (e.g., average crossing times of 5.25 s vs. 5.07 s) but lack behavioral variability and depth. They show promise for formative studies, field study preparation, and human data augmentation.
2026 | Xinyue Gui et al. | The University of Tokyo | Automated Driving Interface & Takeover Design; External HMI (eHMI): Communication with Pedestrians & Cyclists; User Research Methods (Interviews, Surveys, Observation) | CHI
Designing Privacy Choice in Generative AI Chatbot Ecosystems
Generative AI (GenAI) is evolving from standalone tools to interconnected ecosystems that integrate chatbots, cloud platforms, and third-party services. While this ecosystem model enables personalization and extended services, it also introduces complex information flows and amplifies privacy risks. Existing solutions focus on system-level protections, offering little support for users to make meaningful privacy choices. To address this gap, we conducted two vignette-based survey studies with 486 participants and a follow-up interview study with 16 participants. We also explored users’ needs and preferences for privacy choice design across both GenAI personalization and data-sharing. Our results reveal paradoxical patterns: participants sometimes trusted third-party ecosystems more for personalization but perceived greater control in first-party ecosystems when data was shared externally. We discuss design implications for privacy choice interfaces that enhance transparency, control, and trust in GenAI ecosystems.
2026 | Lanjing Liu et al. | Johns Hopkins University | Generative AI (Text, Image, Music, Video); Privacy by Design & User Control; Privacy Perception & Decision-Making | CHI
Surfacing and Applying Meaning: Supporting Hermeneutical Autonomy for LGBTQ+ People in Taiwan
After Taiwan’s legalization of same-sex marriage in 2019, LGBTQ+ communities continue to face hostility on social media. Using the lens of hermeneutical injustice and autonomy, we examine how technological conditions affect LGBTQ+ individuals’ identity exploration, narrative seeking, and community resilience. We conducted a multi-stage study with Taiwanese LGBTQ+ individuals, including in-depth interviews, participatory design workshops, and evaluation sessions. Participants described fragile yet creative strategies such as seeking validation in online interactions, reframing hostile content through theory, and relying on allies. Building on these insights, we designed and evaluated a retrieval-augmented, LLM-powered chatbot with four modes of interaction: reflection, validation, discussion, and allyship. Findings show that the system fosters hermeneutical autonomy by helping participants reframe hostile narratives, validate lived experiences, and scaffold identity exploration, while reducing the hermeneutical labor of navigating social media hostility. We conclude by outlining design implications for AI systems that advance hermeneutical autonomy through fluid self-representation, contextualized dialogue, and inclusive community participation.
2026 | Yi-Tong Chen et al. | National Taiwan University | Agent Personality & Anthropomorphism; Human-LLM Collaboration; AI Ethics, Fairness & Accountability | CHI
Barriers that Programming Instructors Face While Performing Emergency Pedagogical Design to Shape Student-AI Interactions with Generative AI Tools
Generative AI (GenAI) tools are increasingly pervasive, pushing instructors to redesign how students use GenAI tools in coursework. We conceptualize this work as emergency pedagogical design: reactive, indirect efforts by instructors to shape student-AI interactions without control over commercial interfaces. To understand practices of lead users conducting emergency pedagogical design, we conducted interviews (n=13) and a survey (n=169) of computing instructors. These instructors repeatedly encountered five barriers: fragmented buy-in for revising courses; policy crosswinds from non-prescriptive institutional guidance; implementation challenges as instructors attempt interventions; assessment misfit as student-AI interactions are only partially visible to instructors; and lack of resources, including time, staffing, and paid tool access. We use these findings to present emergency pedagogical design as a distinct design setting for HCI and outline recommendations for HCI researchers, academic institutions, and organizations to effectively support instructors in adapting courses to GenAI.
2026 | Sam Lau et al. | University of California San Diego | Human-LLM Collaboration; Programming Education & Computational Thinking; Intelligent Tutoring Systems & Learning Analytics | CHI
Who am I Talking to? A Large-Scale Measurement of Surface Attribution Across Real-World Security and Privacy Interfaces
Modern user interfaces are complex composites, with elements originating from various sources, such as the operating system, apps, a web browser, or websites. We posit that security and privacy decisions can to some extent depend on users correctly identifying an element's source, a concept we term "surface attribution." Through two large-scale vignette-based surveys (N=4,400 and N=3,057), we present the first empirical measurement of this ability. We find that users struggle, correctly attributing UI source only 55% of the time on desktop and 53% on mobile. Familiarity and strong brand cues are associated with improved accuracy, whereas UI positioning, a long-held security design concept especially for browsers, has minimal impact. Furthermore, simply adding a "Security & Privacy" brand cue to Android permission prompts failed to improve attribution. These findings demonstrate a fundamental gap in users' mental models, indicating that relying on them to distinguish trusted UI is a fragile security paradigm.
2026 | Marian Harbach et al. | Google | Privacy by Design & User Control; Privacy Perception & Decision-Making; Explainable AI (XAI) | CHI
“It Became My Buddy, But I’m Not Afraid to Disagree”: A Multi-Session Study of UX Evaluators Collaborating with Conversational AI Assistants
AI-assisted usability analysis can potentially reduce the time and effort of finding usability problems, yet little is known about how AI's perceived expertise influences evaluators' analytic strategies and perceptions over time. We ran a within-subjects, five-session study (six hours per participant) with 12 professional UX evaluators who worked with two conversational assistants designed to appear novice- or expert-like (differing in suggestion quantity and response accuracy). We logged behavioral measures (number of passes, suggestion acceptance rate), collected subjective ratings (trust, perceived efficiency), and conducted semi-structured interviews. Participants experienced an initial novelty effect and a subsequent dip in trust that recovered over time. Their efficiency improved as they shifted from a two-pass to a one-pass video inspection approach. Evaluators ultimately rated the expert-like conversational assistant as significantly more efficient, trustworthy, and comprehensive, despite not perceiving expertise differences early on. We conclude with design implications for adapting AI expertise to enable calibrated human-AI collaboration.
2026 | Emily Kuang et al. | York University | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Agent Personality & Anthropomorphism | CHI
"It didn’t feel right but I needed a job so desperately": Understanding People’s Emotions and Help Needs During ScamsOnline financial scams represent a long-standing and serious threat for which people seek help. We present a study to understand people’s in situ motivations for engaging with scams and the help needs they express before, during, and after encountering a scam. We identify the main emotions scammers exploited (e.g., fear, hope) and characterize how they did so. We examine factors -- such as financial insecurity and legal precarity -- which elevate people’s risk of engaging with specific scams and experiencing harm. We indicate when people sought help and describe their help-seeking needs and emotions at different stages of the scam. We discuss how these needs could be met through the design of contextually-specific prevention, diagnostic, mitigation, and recovery interventions.2026JCJake Chanenson et al.GooglePrivacy Perception & Decision-MakingOnline Harassment & Counter-ToolsPrivacy by Design & User ControlCHI
AgentHands: Generating Interactive Hand Gestures for Spatially Grounded Agent Conversations in XR
Communicating spatial tasks via text or speech creates "a mental mapping gap" that limits an agent’s expressiveness. Inspired by co-speech gestures in face-to-face conversation, we propose AgentHands, an LLM-powered XR system that equips agents with hands to render responses clearer and more engaging. Guided by a design taxonomy distilled from a formative study (N=10), we implement a novel pipeline to generate and render a hand agent that augments conversational responses with synchronized, space-aware, and interactive hand gestures: using a meta-instruction, AgentHands generates verbal responses embedded with GestureEvents aligned to specific words; each event specifies gesture type and parameters. At runtime, a parser converts events into time-stamped poses and motions, driving an animation system that renders expressive hands synchronized with speech. In a within-subjects study (N=12), AgentHands increased engagement and made spatially grounded conversations easier to follow compared to a speech-only baseline.
2026 | Ziyi Liu et al. | Purdue University | Identity & Avatars in XR; Affective Human-Computer Dialogue; Immersion & Presence Research | CHI
Preshaping Hand Behaviour for Direct and Indirect Manipulation of 3D Objects
Effortless manipulation informs and relies on preshaping: the subconscious posing of the hand before grasping. Virtual environments and the design of interaction techniques alter interaction requirements like contact and reach, forcing behavioural adaptation. We present a comparative study investigating preshaping behaviour across direct versus indirect (gaze-assisted) and bare-hand versus controller techniques on a docking task. Results reveal that response patterns scale with anticipated task difficulty, and that direct techniques elicit effective posing of the hand. Indirect techniques shortcut hand transport and, in turn, lack the sensory feedback to guide planning, inducing efficient but attenuated responses that necessitate compensatory manipulation and clutching. Notably, controllers that afford in-hand rotation allow users to extend their range of motion. These findings can inform interaction design to better afford preshaping and optimise 3D manipulation tasks.
2026 | Thorbjørn Mikkelsen et al. | Aarhus University | Hand Gesture Recognition; Full-Body Interaction & Embodied Input; 3D Modeling & Animation | CHI