The Behavioral Fabric of LLM-Powered GUI Agents: Human Values and Interaction Outcomes
Large Language Model (LLM)-powered web GUI agents are increasingly automating everyday online tasks. Despite their popularity, little is known about how users' preferences and values impact agents' reasoning and behavior. In this work, we investigate how both explicit and implicit user preferences, as well as the underlying user values, influence agent decision-making and action trajectories. We built a controlled testbed of 14 common interactive web tasks, spanning shopping, travel, dining, and housing, each replicated from real websites and integrated with a low-fidelity LLM-based recommender system. We injected 12 human preferences and values as personas into four state-of-the-art agents and systematically analyzed their task behaviors. Our results show that preference- and value-infused prompts consistently guided agents toward outcomes that reflected these preferences and values. While the absence of user preference or value guidance led agents to exhibit a strong efficiency bias and employ shortest-path strategies, their presence steered agents' behavior trajectories through the greater use of corresponding filters and interactive web features. Despite this influence, dominant interface cues, such as discounts and advertisements, frequently overrode these effects, shortening the agents' action trajectories and inducing rationalizations that masked rather than reflected value-consistent reasoning. The contributions of this paper are twofold: (1) an open-source testbed for studying the influence of values in agent behaviors, and (2) an empirical investigation of how user preferences and values shape web agent behaviors.
2026 | Simret Araya Gebreegziabher et al. | University of Notre Dame | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; AI Ethics, Fairness & Accountability | IUI
"Over-the-Hood" AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed ThemWhile much research has shown the presence of AI's "under-the-hood'" biases (e.g., algorithmic, training data, etc.), what about "over-the-hood" inclusivity biases: barriers in user-facing AI products that disproportionately exclude users with certain problem-solving approaches? Recent research has begun to report the existence of such biases—but what do they look like, how prevalent are they, and how can developers find and fix them? To find out, we conducted a field study with 3 AI product teams, to investigate what kinds of AI inclusivity bugs exist uniquely in user-facing AI products, and whether/how AI product teams might harness an existing (non-AI-oriented) inclusive design method to find and fix them. The teams' work revealed 83 instances of 6 AI inclusivity bug types unique to user-facing AI products, their fixes covering 47 bug instances, and a new GenderMag inclusive design method variant, GenderMag-for-AI, that is especially effective at detecting AI inclusivity bugs when the AI's output is not necessarily believed.2026AAAndrew A. Anderson et al.IBM ResearchAI Ethics, Fairness & AccountabilityInclusive DesignParticipatory DesignIUI
"Better Ask for Forgiveness than Permission": Practices and Policies of AI Disclosure in Freelance WorkThe growing use of AI applications among freelance workers is reshaping trust and relationships with clients. This paper investigates how both workers and clients perceive AI use and disclosure in the freelance economy through a three-stage study: interviews with workers and two survey studies with workers and clients. Findings first reveal a key expectation gap around disclosure: Workers often adopt passive disclosure practices, revealing AI use only when asked, as they assume clients can already detect it. Clients, however, are far less confident in recognizing AI-assisted work and prefer proactive disclosure. A second finding highlights the role of unclear or absent client AI policies, which leave workers consistently misinterpreting clients' expectations for AI use and disclosure. Together, these gaps point to the need for clearer guidelines and practices for AI disclosure. Insights extend beyond freelancing, offering implications for trust, accountability, and policy design in other AI-mediated work domains.2026AHAngel Hsing-Chi Hwang et al.University of Southern CaliforniaAI-Assisted Decision-Making & AutomationAI Ethics, Fairness & AccountabilityPrivacy by Design & User ControlCHI
How Does Delegation in Social Interaction Evolve Over Time? Navigation with a Robot for Blind People
Autonomy and independent navigation are vital to daily life but remain challenging for individuals with blindness. Robotic systems can enhance mobility and confidence by providing intelligent navigation assistance. However, fully autonomous systems may reduce users’ sense of control, even when they wish to remain actively involved. Although collaboration between user and robot has been recognized as important, little is known about how perceptions of this relationship change with repeated use. We present a repeated exposure study with six blind participants who interacted with a navigation-assistive robot in a real-world museum. Participants completed tasks such as navigating crowds, approaching lines, and encountering obstacles. Findings show that participants refined their strategies over time, developing clearer preferences about when to rely on the robot versus act independently. This work provides insights into how strategies and preferences evolve with repeated interaction and offers design implications for robots that adapt to user needs over time.
2026 | Rayna Hata et al. | Carnegie Mellon University | Robots in Education & Healthcare; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Elderly Care & Dementia Support | CHI
From Reflection to Repair: A Scoping Review of Dataset Documentation Tools
Dataset documentation is widely recognized as essential for the responsible development of automated systems. Despite growing efforts to support documentation through different kinds of artifacts, little is known about the motivations shaping documentation tool design or the factors hindering their adoption. We present a systematic review supported by mixed-methods analysis of 59 dataset documentation publications to examine the motivations behind building documentation tools, how authors conceptualize documentation practices, and how these tools connect to existing systems, regulations, and cultural norms. Our analysis shows four persistent patterns in dataset documentation conceptualization that potentially impede adoption and standardization: unclear operationalizations of documentation’s value, decontextualized designs, unaddressed labor demands, and a tendency to treat integration as future work. Building on these findings, we propose a shift in Responsible AI tool design toward institutional rather than individual solutions, and outline actions the HCI community can take to enable sustainable documentation practices.
2026 | Pedro Reynolds-Cuéllar et al. | Robotics and AI Institute | Explainable AI (XAI); Research Ethics & Open Science; AI-Assisted Decision-Making & Automation | CHI
Eyes on the Finger: Investigating a Ring-Shaped Camera for Seamless Accessible Tactile Exploration
Tactile exploration is essential for blind and low vision (BLV) individuals to understand objects and spaces. Yet little is known about how camera-based devices can support hand-centric exploration: tactilely examining exhibits while inquiring about and processing information. We investigate a finger-worn ring camera that captures images from the palm side while allowing tactile exploration, comparing it with hand-centered smartphones. We conducted a Wizard-of-Oz study with 11 BLV participants in a science museum. Results showed that the ring camera supported effective bimanual strategies: exploring with both hands, lifting the camera-worn hand while keeping the other as an anchor during inquiry, and resuming bimanual touch for information processing. In contrast, smartphones led to effortful, fragmented exploration. Building on these findings, we developed an interactive system and evaluated its reliability and practicality with 6 BLV participants. We contribute insights and design implications for wearable camera systems that augment tactile exploration in real-world settings.
2026 | Ayaka Tsutsui et al. | University of Tsukuba | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Haptic Wearables; Smartwatches & Fitness Bands | CHI
Robot-Assisted Group Tours for Blind People
Group interactions are essential to social functioning, yet effective engagement relies on the ability to recognize and interpret visual cues, making such engagement a significant challenge for blind people. In this paper, we investigate how a mobile robot can support group interactions for blind people. We used the scenario of a guided tour with mixed-visual groups involving blind and sighted visitors. Based on insights from an interview study with blind people (n=5) and museum experts (n=5), we designed and prototyped a robotic system that supported blind visitors in joining group tours. We conducted a field study in a science museum where each blind participant (n=8) joined a group tour with one guide and two sighted participants (n=8). Findings indicated users' sense of safety from the robot's navigational support, concerns about group participation, and preferences for obtaining environmental information. We present design implications for future robotic systems to support blind people's mixed-visual group participation.
2026 | Yaxin Hu et al. | University of Wisconsin-Madison | Social Robot Interaction; Robots in Education & Healthcare; Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | CHI
Transformer Explainer: Learning LLM Transformers with Interactive Visual Explanation and Experimentation
The Transformer architecture underpins modern large language models powering state-of-the-art text generation and AI applications. However, its complexity makes it difficult for non-experts to learn. Existing resources often lack interactivity, rely on static descriptions of simplified architectures, or fail to reflect models’ behavior with real data. To address this gap, we introduce Transformer Explainer, an interactive visualization tool for non-experts to learn Transformers. The tool integrates an overview illustrating the Transformer's data flow with on-demand explanations that gradually reveal mathematical details. Smooth transitions across abstraction levels highlight the interplay between high-level structures and low-level operations. Running a live GPT-2 instance directly in the browser, Transformer Explainer empowers learners to experiment with custom input and hyperparameters without setup, observing next-token predictions in real time. A 90-participant user study showed that our tool offered significant advantages in improving user understanding and engagement. Transformer Explainer has attracted over 490,000 users.
2026 | Aeree Cho et al. | Georgia Institute of Technology | Generative AI (Text, Image, Music, Video); Interactive Data Visualization; Prototyping & User Testing | CHI
Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
Recently, red teaming, with roots in security, has become a key evaluative approach to ensure the safety and reliability of Generative Artificial Intelligence. However, most existing work emphasizes technical benchmarks and attack success rates, leaving the socio-technical practices of how red teaming datasets are defined, created, and evaluated under-examined. Drawing on 22 interviews with practitioners who design and evaluate red teaming datasets, we examine the data practices and standards that underpin this work. Because adversarial datasets determine the scope and accuracy of model evaluations, they are critical artifacts for assessing potential harms from large language models. Our contributions are twofold. First, we provide empirical evidence of how practitioners conceptualize red teaming and develop and evaluate red teaming datasets. Second, we reflect on how practitioners’ conceptualization of risk leads to overlooking context, interaction type, and user specificity. We conclude with three opportunities for HCI researchers to expand the conceptualization and data practices for red teaming.
2026 | Adriana Alvarado Garcia et al. | IBM Research | Explainable AI (XAI); AI Ethics, Fairness & Accountability; Algorithmic Transparency & Auditability | CHI
Who Gets to Define Safety? A Systematic Review of How Generative AI Research Addresses Youth Online Safety
Generative AI is rapidly reshaping young people’s digital experiences, from providing emotional support to introducing new dimensions of risks. Yet, existing safety frameworks are not equipped to handle the unique risks posed by GenAI. To investigate how youth safety is being addressed in this new landscape, we conducted a systematic review of GenAI-youth studies (N=30) from 2014-2025. We found that GenAI-youth-related research was primarily led by AI experts with minimal involvement from youth development experts or young people themselves. Safety was typically framed as a technical system feature, optimized through filters, benchmarks, or guardrails, rather than a relational, contextual, and developmentally grounded concern. We call on the HCI community to re-evaluate its approach to participation in AI. We must move beyond reactive, system-driven GenAI approaches to youth safety towards a more holistic, proactive model where multistakeholder inclusion is a core aspect throughout the AI lifecycle, leading to safer and more equitable systems. Content Warning: This paper discusses sensitive topics, such as self-harm, which may be triggering.
2026 | Ozioma Collins Oguine et al. | University of Notre Dame | Generative AI (Text, Image, Music, Video); Youth Online Safety & Privacy; AI Ethics, Fairness & Accountability | CHI
Current and Future Use of Large Language Models for Knowledge Work
Large Language Models (LLMs) have introduced a paradigm shift in interaction with AI technology, enabling knowledge workers to complete tasks by specifying their desired outcome in natural language. LLMs have the potential to increase productivity and reduce tedious tasks in an unprecedented way. A systematic study of LLM adoption for work can provide insight into how LLMs can best support these workers. To explore knowledge workers' current and desired usage of LLMs, we ran a survey (n=216). Workers described tasks they already used LLMs for, like generating code or improving text, but imagined a future with LLMs integrated into their workflows and data. We ran a second survey (n=107) a year later that validated our initial findings and provided insight into up-to-date LLM use by knowledge workers. We discuss implications for adoption and design of generative AI technologies for knowledge work.
2025 | Michelle Brachman et al. | Working with AI | CSCW
Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback
Millions of users come to online peer counseling platforms to seek support. However, studies show that online peer support groups are not always as effective as expected, largely due to users' negative experiences with unhelpful counselors. Peer counselors are key to the success of online peer counseling platforms, but most often do not receive appropriate training. Hence, we introduce CARE: an AI-based tool to empower and train peer counselors through practice and feedback. Concretely, CARE helps diagnose which counseling strategies are needed in a given situation and suggests example responses to counselors during their practice sessions. Building upon the Motivational Interviewing framework, CARE utilizes large-scale counseling conversation data with text generation techniques to enable these functionalities. We demonstrate the efficacy of CARE by performing quantitative evaluations and qualitative user studies through simulated chats and semi-structured interviews, finding that CARE especially helps novice counselors in challenging situations. The code is available at https://app.box.com/s/z3a4dwgmeqfy8vbzi9cgmg0yhn6t4j53.
2025 | Shang-Ling (Kate) Hsu et al. | Caring at a Distance | CSCW
Togedule: Scheduling Meetings with Large Language Models and Adaptive Representations of Group Availability
Scheduling is a perennial, and often challenging, problem for many groups. Existing tools are mostly static, showing an identical set of choices to everyone, regardless of the current status of attendees' inputs and preferences. In this paper, we propose Togedule, an adaptive scheduling tool that uses large language models to dynamically adjust the pool of choices and their presentation format. With the initial prototype, we conducted a formative study (N=10) and identified the potential benefits and risks of such an adaptive scheduling tool. Then, after enhancing the system, we conducted two controlled experiments, one each for attendees and organizers (total N=66). For each experiment, we compared scheduling with verbal messages, shared calendars, or Togedule. Results show that Togedule significantly reduces the cognitive load of attendees indicating their availability and improves the speed and quality of the decisions made by organizers.
2025 | Jaeyoon Song et al. | Working with AI | CSCW
Online Safety for All: Sociocultural Insights from a Systematic Review of Youth Online Safety in the Global South
Youth online safety research in HCI has historically centered on perspectives from the Global North, often overlooking the unique particularities and cultural contexts of regions in the Global South. This paper presents a systematic review of 66 youth online safety studies published between 2014 and 2024, specifically focusing on regions in the Global South. Our findings reveal a concentrated research focus in Asian countries and a predominance of quantitative methods. We also found limited research on marginalized youth populations and a primary focus on risks related to cyberbullying. Our analysis underscores the critical role of cultural factors in shaping online safety, highlighting the need for educational approaches that integrate social dynamics and awareness. We propose methodological recommendations and a future research agenda that encourages the adoption of situated, culturally sensitive methodologies and youth-centered approaches to researching youth online safety in regions of the Global South. This paper advocates for greater inclusivity in youth online safety research, emphasizing the importance of addressing varied sociocultural contexts to better understand and meet the online safety needs of youth in the Global South.
2025 | Ozioma Collins Oguine et al. | Trust, Safety, and Privacy in Online Communities | CSCW
EvalAssist: Insights on Task-Specific Evaluations and AI-Assisted Judgment Strategy Preferences
With the broad availability of large language models and their ability to generate vast outputs using varied prompts and configurations, determining the best output for a given task requires an intensive evaluation process, one where machine learning practitioners must decide how to assess the outputs and then carefully carry out the evaluation. This process is both time-consuming and costly. As practitioners work with an increasing number of models, they must now evaluate outputs to determine which model performs best for a given task. LLMs are increasingly used as evaluators to filter training data, evaluate model performance, or assist human evaluators with detailed assessments. Our application, EvalAssist, supports this process by aiding users in interactively refining evaluation criteria. In our study with machine learning practitioners (n=15), each completing 6 tasks yielding 131 evaluations, we explore how task-related factors and judgment strategies influence criteria refinement and user perceptions. Findings show that users performed more evaluations with direct assessment by making criteria task-specific, modifying judgments, and changing the AI evaluator model. We conclude with recommendations for how systems can better support practitioners with AI-assisted evaluations.
2025 | Zahra Ashktorab et al. | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation | UIST
"The Diagram is like Guardrails": Structuring GenAI-assisted Hypotheses Exploration with an Interactive Shared RepresentationData analysis encompasses a spectrum of tasks, from high-level conceptual reasoning to lower-level execution. While AI-powered tools increasingly support execution tasks, there remains a need for intelligent assistance in conceptual tasks. This paper investigates the design of an ordered node-link tree interface augmented with AI-generated information hints and visualizations, as a potential shared representation for hypothesis exploration. Through a design probe (n=22), participants generated diagrams averaging 21.82 hypotheses. Our findings showed that the node-link diagram acts as "guardrails" for hypothesis exploration, facilitating structured workflows, providing comprehensive overviews, and enabling efficient backtracking. The AI-generated information hints, particularly visualizations, aided users in transforming abstract ideas into data-backed concepts while reducing cognitive load. We further discuss how node-link diagrams can support both parallel exploration and iterative refinement in hypothesis formulation, potentially enhancing the breadth and depth of human-AI collaborative data analysis.2025ZDZijian Ding et al.Human-LLM CollaborationInteractive Data VisualizationC&C
Can LLMs Recommend More Responsible Prompts?
Human-Computer Interaction practitioners have been proposing best practices in user interface design for decades. However, generative Artificial Intelligence (GenAI) brings additional design considerations and currently lacks sufficient user guidance regarding affordances, inputs, and outputs. In this context, we developed a recommender system to promote responsible AI (RAI) practices while people prompt GenAI systems, by recommending the addition of sentences based on social values and the removal of harmful sentences. We detail a lightweight recommender system designed to be used at prompting time and compare its recommendations to the ones provided by three base large language models (LLMs) and two LLMs fine-tuned for the task, i.e., recommending inclusion of sentences based on social values and removal of harmful sentences from a given prompt. Results indicate that our approach has the best F1-score balance in terms of recommendations for additions and removals of sentences to promote responsible prompts, while a fine-tuned model obtained the best F1-score for additions, and our approach obtained the best F1-score for removals of harmful sentences. In addition, fine-tuned models improved the objectiveness of responses by reducing the verbosity of generated content by 93% when compared to the content generated by base models. The presented findings contribute to RAI by showing the limits and bias of existing LLMs in terms of recommendations on how to create more responsible prompts and how open-source technologies can fill this gap at prompting time.
2025 | Vagner Figueredo de Santana et al. | Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Recommender System UX | IUI
Building Appropriate Mental Models: What Users Know and Want to Know about an Agentic AI Chatbot
Agentic systems aim to handle complex problems with increasing system autonomy using generative AI. These new agentic systems are becoming more feasible and easier to build. Yet we know little about what end-users need to know to use these systems appropriately. We study one such agentic system, "Gent," which can break down complex problems into a set of actions, provide a rationale for each action, interact with external information, and cite its sources. Our goals were to understand users' mental models of the agentic system, the information users leveraged to evaluate the accuracy of the system, and users' information needs. In our study (N=24), participants interacted with Gent for four information-seeking tasks where they could see Gent’s actions, rationale, and sources. Participants' mental models centered around the search-like qualities of the system, with their confidence impacted by the website sources. Participants' mental models often lacked insight into the workings of the generative AI model and agentic framework that impact the actions the system takes. Participants used the descriptions of the system's actions to support their evaluation of the accuracy of the system and wanted to know more about how the system got to its answers. Participants also relied on their own personal knowledge and the style or length of Gent's responses to evaluate the accuracy. Our results highlight the need for further transparency in agentic AI systems to support end-users in evaluating system outputs and help them build effective mental models.
2025 | Michelle Brachman et al. | Conversational Chatbots; Agent Personality & Anthropomorphism; Explainable AI (XAI) | IUI
Controlling AI Agent Participation in Group Conversations: A Human-Centered Approach
Conversational AI agents are commonly applied within single-user, turn-taking scenarios. The interaction mechanics of these scenarios are trivial: when the user enters a message, the AI agent produces a response. However, the interaction dynamics are more complex within group settings. How should an agent behave in these settings? We report on two experiments aimed at uncovering users' experiences of an AI agent's participation within a group, in the context of group ideation (brainstorming). In the first study, participants benefited from and preferred having the AI agent in the group, but participants disliked when the agent seemed to dominate the conversation and they desired various controls over its interactive behaviors. In the second study, we created functional controls over the agent's behavior, operable by group members, to validate their utility and probe for additional requirements. Integrating our findings across both studies, we developed a taxonomy of controls for when, what, and where a conversational AI agent in a group should respond, who can control its behavior, and how those controls are specified and implemented. Our taxonomy is intended to help AI creators think through important considerations in the design of mixed-initiative conversational agents.
2025 | Stephanie Houde et al. | Conversational Chatbots; Agent Personality & Anthropomorphism | IUI
Which Contributions Deserve Credit? Perceptions of Attribution in Human-AI Co-Creation
AI systems powered by large language models can act as capable assistants for writing and editing. In these tasks, the AI system acts as a co-creative partner, making novel contributions to an artifact-under-creation alongside its human partner(s). One question that arises in these scenarios is the extent to which AI should be credited for its contributions. We examined knowledge workers' views of attribution through a survey study (N=155) and found that they assigned different levels of credit across different contribution types, amounts, and initiative. We observed a consistent pattern in which AI was assigned less credit than a human partner for equivalent contributions. Participants felt that disclosing AI involvement was important and used a variety of criteria to make attribution judgments, including the quality of contributions, personal values, and technology considerations. Our results motivate and inform new approaches for crediting AI contributions to co-created work.
2025 | Jessica He et al. | IBM Research | Human-LLM Collaboration; Explainable AI (XAI); AI-Assisted Decision-Making & Automation | CHI