"Over-the-Hood" AI Inclusivity Bugs and How 3 AI Product Teams Found and Fixed ThemWhile much research has shown the presence of AI's "under-the-hood'" biases (e.g., algorithmic, training data, etc.), what about "over-the-hood" inclusivity biases: barriers in user-facing AI products that disproportionately exclude users with certain problem-solving approaches? Recent research has begun to report the existence of such biases—but what do they look like, how prevalent are they, and how can developers find and fix them? To find out, we conducted a field study with 3 AI product teams, to investigate what kinds of AI inclusivity bugs exist uniquely in user-facing AI products, and whether/how AI product teams might harness an existing (non-AI-oriented) inclusive design method to find and fix them. The teams' work revealed 83 instances of 6 AI inclusivity bug types unique to user-facing AI products, their fixes covering 47 bug instances, and a new GenderMag inclusive design method variant, GenderMag-for-AI, that is especially effective at detecting AI inclusivity bugs when the AI's output is not necessarily believed.2026AAAndrew A. Anderson et al.IBM ResearchAI Ethics, Fairness & AccountabilityInclusive DesignParticipatory DesignIUI
Prompt Coaching for Inclusiveness: A Media Literacy Approach to Increase Users’ Awareness of Algorithmic Bias and Prompting EfficacyLarge language models often produce biased or stereotypical outputs. One way to reduce this possibility is to be more inclusive in our prompts, but doing so may not come naturally to most users. Therefore, we designed a tool that coaches users to write more inclusive prompts—a strategy that leverages design friction to provide a media literacy intervention. Data from a user study (N=344) show that compared to no coaching, inclusive prompt coaching directly increased users’ awareness of algorithmic bias and their perceived prompting efficacy. It also indirectly enhanced their trust in the system and perceived trust calibration through cognitive elaboration. However, inclusive prompt coaching resulted in a less satisfying user experience. These findings have implications for ethical interventions in prompting for better communicating and combating algorithmic bias. We discuss the benefits and limitations of inclusive prompt coaching, as well as ways to balance usability for long-term adoption of generative AI systems.2026CCCheng Chen et al.Oregon State UniversityHuman-LLM CollaborationAI Ethics, Fairness & AccountabilityInclusive DesignCHI
Relational Gains, Privacy Strains: Exploring Users’ Perceptions and Experiences with ChatGPT’s Memory FeatureChatGPT’s memory feature is designed to provide users with greater control and more helpful responses. Yet, it remains unclear how users perceive this feature in relation to privacy. To address this gap, we conducted interviews with 20 ChatGPT users from diverse backgrounds. Our findings revealed four major characteristics that distinguish ChatGPT's memory from human memory: perceived unforgetfulness, detailedness, accuracy, and lack of emotions, highlighting the machine-like nature of AI memory. Moreover, both ChatGPT's memory and human memory were perceived as beneficial for relationship building. Notably, most participants experienced negative expectancy violations after learning what ChatGPT remembered about them. They expressed a strong need for greater visibility, accessibility, transparency, and user control in the design of future memory features. Drawing on users' suggestions and theoretical frameworks on privacy management, we provide design implications for developing a more transparent, responsible, and user-aligned memory experience that helps them navigate privacy-personalization trade-offs when interacting with LLM-based memories.2026CCCheng Chen et al.Oregon State UniversityHuman-LLM CollaborationPrivacy by Design & User ControlPrivacy Perception & Decision-MakingCHI
"Fast, easy, simple"? SES-diverse transfer students' sociotechnical experiences registering for classesRecruiting, retaining, and educating students in computing is a frequent research topic in CHI. However, students' sociotechnical experiences of registering for classes are understudied -- especially those of socioeconomic-diverse students. These experiences matter: research shows that registration problems bring long-term consequences to student successes. We investigate students' socioeconomic status (SES) impact on registration experiences through three studies: a case study with education professionals using an emerging analytic method, SocioeconomicMag (SESMag); interviews with faculty/staff/students from 8 universities; and observations of 14 SES-diverse students registering for classes. Results showed: (1) 5 SES-inclusivity bugs which arose 30 times, 72% more often by lower-SES students than by higher-SES students. (2) 6/7 lower-SES students (but only 2/7 higher-SES students) expected downstream problems from the registration issues. (3) The risk-to-negative-outcomes rate was 3 times higher for lower-SES students.2026ABAlec Busteed et al.Oregon State UniversityProgramming Education & Computational ThinkingIntelligent Tutoring Systems & Learning AnalyticsInclusive DesignCHI
Outcomes, Perceptions, and Interaction Strategies of Novice Programmers Studying with ChatGPTLarge Language Model (LLM) conversational agents are increasingly used in programming education, yet we still lack insight into how novices engage with them for conceptual learning compared with human tutoring. This mixed-methods study compared learning outcomes and interaction strategies of novices using ChatGPT or human tutors. A controlled lab study with 20 students enrolled in introductory programming courses revealed that students employ markedly different interaction strategies with AI versus human tutors: ChatGPT users relied on brief, zero-shot prompts and received lengthy, context-rich responses but showed minimal prompt refinement, while those working with human tutors provided more contextual information and received targeted explanations. Although students distrusted ChatGPT’s accuracy, they paradoxically preferred it for basic conceptual questions due to reduced social anxiety. We offer empirically grounded recommendations for developing AI literacy in computer science education and designing learning-focused conversational agents that balance trust-building with maintaining the social safety that facilitates uninhibited inquiry.2025JPJacob Penney et al.Human-LLM CollaborationProgramming Education & Computational ThinkingCUI
Is Innovation Shaped by Masculine Norms? A Longitudinal Case Study of a Consumer ProductIn theories and metrics of product innovation, gender is invisible or ignored, and innovative products are presumed to be gender-neutral or agnostic. Yet, many ostensibly-innovative consumer products overlook the needs of women and gender non-conforming individuals, suggesting an implicit masculine framing. This research introduces a mixed-methods approach for analyzing gender scripts in product features and marketing, applied to a case study of the Apple Watch (2015–2024). Findings reveal a sustained reinforcement of gender norms: masculine-coded language and industrial design dominate how innovation is presented, even as objective technical improvements decline. In contrast, feminine-coded features, especially relational or user-centered ones, receive less emphasis in innovation framing. This work demonstrates how masculine value systems shape perceptions and theories of innovation and offers opportunities for future research on gender and design.2025CBCaseysimone Ballestas et al.Inclusive DesignGender & Race Issues in HCITechnology Ethics & Critical HCIDIS
Measuring User Experience Inclusivity in Human-AI Interaction via Five User Problem-Solving StylesMotivations: Recent research has emerged on generally how to improve AI products’ human-AI interaction (HAI) user experience (UX), but relatively little is known about HAI-UX inclusivity. For example, what kinds of users are supported, and who are left out? What product changes would make it more inclusive? Objectives: To help fill this gap, we present an approach to measuring what kinds of diverse users an AI product leaves out and how to act upon that knowledge. To bring actionability to the results, the approach focuses on users’ problem-solving diversity. Thus, our specific objectives were (1) to show how the measure can reveal which participants with diverse problem-solving styles were left behind in a set of AI products and (2) to relate participants’ problem-solving diversity to their demographic diversity, specifically gender and age. Methods: We performed 18 experiments, discarding two that failed manipulation checks. Each experiment was a 2x2 factorial experiment with online participants, comparing two AI products: one deliberately violating 1 of 18 HAI guidelines and the other applying the same guideline. For our first objective, we used our measure to analyze how much each AI product gained/lost HAI-UX inclusivity compared to its counterpart, where inclusivity meant supportiveness to participants with particular problem-solving styles. For our second objective, we analyzed how participants’ problem-solving styles aligned with their gender identities and ages. 
Results and Implications: Participants’ diverse problem-solving styles revealed six types of inclusivity results: (1) the AI products that followed an HAI guideline were almost always more inclusive across diversity of problem-solving styles than the products that did not follow that guideline—but “who” got most of the inclusivity varied widely by guideline and by problem-solving style; (2) when an AI product had risk implications, four variables’ values varied in tandem: participants’ feelings of control, their (lack of) suspicion, their trust in the product, and their certainty while using the product; (3) the more control an AI product offered users, the more inclusive it was; (4) whether an AI product was learning from “my” data or other people’s affected how inclusive that product was; (5) participants’ problem-solving styles skewed differently by gender and age group; and (6) almost all of the results suggested actions that HAI practitioners could take to improve their products’ inclusivity further. Together, these results suggest that a key to improving the demographic inclusivity of an AI product (e.g., across a wide range of genders, ages) can often be obtained by improving the product’s support of diverse problem-solving styles.2025AAAndrew Anderson et al.AI Ethics, Fairness & AccountabilityAlgorithmic Fairness & BiasInclusive DesignIUI
Analyzing the Shifts in Users Data Focus in Exploratory Visual AnalysisUsers often begin exploratory visual analysis (EVA) without clear analysis goals but iteratively refine them as they learn more about their data. As an essential step in data science, researchers want to aid EVA by developing responsive and personalized visualization tools. For this, accurate models of users’ exploration behavior are becoming increasingly vital. However, many computational models assume that the human exploration behavior is static, which goes against the dynamic nature of EVA. In this benchmark study, we investigate how users dynamically shift their data focus in EVA and seek to find the best online learning methods for modeling users’ data focus shifts. Through empirical analyses, we find reinforcement learning algorithms are better in this regard than existing approaches from visualization research. Furthermore, we discuss our findings and their impact on the future of user modeling for visualization system design.2025SSSanad Saha et al.Interactive Data VisualizationVisualization Perception & CognitionIUI
Incorporating Sustainability in Electronics Design: Obstacles and OpportunitiesLife cycle assessment (LCA) is a methodology for holistically measuring the environmental impact of a product from initial manufacturing to end-of-life disposal. However, the extent to which LCA informs the design of computing devices remains unclear. To understand how this information is collected and applied, we interviewed 17 industry professionals with experience in LCA or electronics design, systematically coded the interviews, and investigated common themes. These themes highlight the challenge of LCA data collection and reveal distributed decision-making processes where responsibility for sustainable design choices—and their associated costs—is often ambiguous. Our analysis identifies opportunities for HCI technologies to support LCA computation and its integration into the design process to facilitate sustainability-oriented decision-making. While this work provides a nuanced discussion about sustainable design in the information and communication technologies (ICT) hardware industry, we hope our insights will also be valuable to other sectors.2025ZEZachary Englhardt et al.University of Washington, Computer Science and EngineeringSustainable HCIEcological Design & Green ComputingCHI
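Reenvisioning Patient Education with Smart Hospital Patient RoomsDawson et al. propose a new patient education system for smart hospital patient rooms that uses interactive interfaces and real-time data displays to improve patients' health literacy and treatment adherence and to improve the healthcare service experience.2024JDJoshua Dawson et al.Intelligent Tutoring Systems & Learning AnalyticsMental Health Apps & Online Support CommunitiesTelemedicine & Remote Patient MonitoringUbiComp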
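Lateralization Effects in Electrodermal Activity Data Collected Using Wearable DevicesAlchieri et al. study lateralization effects in electrodermal activity (EDA) data collected with wearable devices, finding significant differences in EDA signals across body locations and providing a theoretical basis for emotion and stress monitoring.2024LALeonardo Alchieri et al.Haptic WearablesBiosensors & Physiological MonitoringUbiComp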
The Matchmaker Inclusive Design Curriculum: A Faculty-Enabling Curriculum to Teach Inclusive Design Throughout Undergraduate CSDespite efforts to raise awareness of societal and ethical issues in CS education, research shows students often do not act upon their new awareness (Problem 1). One such issue, well-established by HCI research, is that much of technology contains barriers impacting numerous populations—such as minoritized genders, races, ethnicities, and more. HCI has inclusive design methods that help—but these skills are rarely taught, even in HCI classes (Problem 2). To address Problems 1 and 2, we created the Matchmaker Curriculum to pair CS faculty—including non-HCI faculty—with inclusive design elements to allow for inclusive design skill-building throughout their CS program. We present the curriculum and a field study, in which we followed 18 faculty along their journey. The results show how the Matchmaker Curriculum equipped 88% of these faculty with enough inclusive design teaching knowledge to successfully embed actionable inclusive design skill-building into 13 CS courses.2024RGRosalinda Garcia et al.Oregon State UniversityCollaborative Learning & Peer TeachingSpecial Education TechnologyInclusive DesignCHI
Modelling Experts' Sampling Strategy to Balance Multiple Objectives During Scientific ExplorationsDuring scientific explorations, scientists often hold multiple and often conflicting objectives. Understanding how scientists prioritize and balance these objectives is crucial for developing cognitively-compatible robotic teammates and fostering effective human-robot collaboration. In this study, we seek to improve the cognitive compatibility of robotic algorithms by modelling humans' decision-making processes under multiple objectives. Collected human decision data from 141 sampling steps indicate that the majority of scientists adopt one of the following objective balancing strategies: (i) A Focus mode, where experts select sampling locations to primarily optimize their primary objective; (ii) A Hierarchy mode, where experts hierarchically satisfy foremost their primary objective, then, to a lesser extent, their secondary objective; and (iii) A Trade-off mode, where experts select sampling locations to satisfy all objectives, even if the location was not ideal for either objective. To understand how experts choose among the different modes, we quantitatively characterize the three types of strategies by representing the decision data from each sampling step in an objective function space. Analysis of the strategy types reveals that experts' adaptation of multi-objective coordinating strategies is primarily governed by two key decision factors: current stages of sampling, and outstanding reward values. This discovery allows the robot to use an extremely simple decision algorithm to connect experts' high-level objectives to desired sampling locations when balancing multiple objectives. Deployment of this algorithm at a planetary-analogue field exploration mission on Mt. Hood demonstrates the potential for robots to use cognitively-compatible algorithms to participate in decision making and aid with the adaptation of sampling plans that align with scientists' high-level goals.2024SLShipeng Liu et al.Human-LLM CollaborationAI-Assisted Decision-Making & AutomationComputational Methods in HCIHRI
Iterative Robot Waiter Algorithm Design: Service Expectations and Social FactorsMobile robots carrying food in restaurants are here. What service behavior norms do people expect them to follow? This paper evaluates robot waiter algorithms and service parameters for scenarios with two participants at a simulated cocktail event. We varied body-storming-inspired context variables such as "hunger level" and "relationship to each other," robot delivery algorithms (lead, follow, ambient), and participant pose (standing, seated). Due to increasing deployment of robotic systems, companies may need to rapidly iterate on situated, human-cognizant robotic behaviors that take functional and social considerations into account. We utilized a within-subjects design and improvisational methods, in which pairs of people were given a series of context prompts and told to participate as felt natural. Output variables included whether they took food and post-trial survey ratings of the robot. The results show a positive correlation between food taking (or feelings of obligation to take food) and human or robot initiative, and a negative correlation in the mixed-ambient algorithm with no explicit leader. The robot waiter that comes to the table is the clearest and most noticeable. Bringing food one person ordered to the other person was unforgivable. When in doubt, head to the center-point.2024HKHeather Knight et al.Head-Up Display (HUD) & Advanced Driver Assistance Systems (ADAS)Social Robot InteractionHuman-Robot Collaboration (HRC)HRI
Zeno: An Interactive Framework for Behavioral Evaluation of Machine LearningMachine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners discover real-world patterns and validate systematic failures. We conducted 18 semi-structured interviews with ML practitioners to better understand the challenges of behavioral evaluation and found that it is a collaborative, use-case-first process that is not adequately supported by existing task- and domain-specific tools. Using these findings, we designed Zeno, a general-purpose framework for visualizing and testing AI systems across diverse use cases. In four case studies with participants using Zeno on real-world models, we found that practitioners were able to reproduce previous manual analyses and discover new systematic failures.2023ÁCÁngel Alexander Cabrera et al.Carnegie Mellon UniversityExplainable AI (XAI)AI-Assisted Decision-Making & AutomationCHI
Perceptual Pat: A Virtual Human Visual System for Iterative Visualization DesignDesigning a visualization is often a process of iterative refinement where the designer improves a chart over time by adding features, improving encodings, and fixing mistakes. However, effective design requires external critique and evaluation. Unfortunately, such critique is not always available on short notice and evaluation can be costly. To address this need, we present Perceptual Pat, an extensible suite of AI and computer vision techniques that forms a virtual human visual system for supporting iterative visualization design. The system analyzes snapshots of a visualization using an extensible set of filters—including gaze maps, text recognition, color analysis, etc—and generates a report summarizing the findings. The web-based Pat Design Lab provides a version tracking system that enables the designer to track improvements over time. We validate Perceptual Pat using a longitudinal qualitative study involving 4 professional visualization designers that used the tool over a few days to design a new visualization.2023SSSungbok Shin et al.University of MarylandInteractive Data VisualizationUser Research Methods (Interviews, Surveys, Observation)Prototyping & User TestingCHI
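TiiS: After-Action Review for AI (AAR/AI)Dodge et al. propose AAR/AI, an after-action review framework that aims to improve the review of and reflection on decisions through AI-assisted techniques.2022JDJonathan Dodge et al.AI-Assisted Decision-Making & AutomationIUI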
How Do People Rank Multiple Mutant Agents?How might a person decide on which of several AI-powered sequential decision-making systems to rely? For example, imagine car buyer Blair shopping for a self-driving car, or developer Dillon trying to choose an appropriate ML model to use in their application. Their first choice might be infeasible (e.g., too expensive in money or execution time), so they may need to select their second or third choice. To address this question, this paper presents: 1) a new XAI empirical task to measure explanations: "the Ranking Task"; 2) a new strategy for inducing controllable agent variations: Mutant Agent Generation; 3) novel explanations for sequential decision-making agents; 4) an adaptation to the AAR/AI assessment process; and 5) a qualitative study around these devices with 10 participants to investigate how they performed the Ranking Task on our mutant agents, using our explanations, and structured by AAR/AI. From an XAI researcher's perspective, just as mutation testing can be applied to any code, mutant agent generation can be applied to essentially any neural network for which one wants to evaluate an assessment process or explanation type. From an XAI user's perspective, the participants ranked the agents well overall, but showed the importance of high explanation resolution for close differences between agents. The participants also revealed the importance of supporting a wide diversity of explanation diets and agent "test selection" strategies.2022JDJonathan Dodge et al.Explainable AI (XAI)AI-Assisted Decision-Making & AutomationIUI
FitVid: Responsive and Flexible Video Content AdaptationMobile video-based learning attracts many learners with its mobility and ease of access. However, most lectures are designed for desktops. Our formative study reveals mobile learners' two major needs: more readable content and customizable video design. To support mobile-optimized learning, we present FitVid, a system that provides responsive and customizable video content. Our system consists of (1) an adaptation pipeline that reverse-engineers pixels to retrieve design elements (e.g., text, images) from videos, leveraging deep learning with a custom dataset, which powers (2) a UI that enables resizing, repositioning, and toggling in-video elements. The content adaptation improves the guideline compliance rate by 24% and 8% for word count and font size. The content evaluation study (n=198) shows that the adaptation significantly increases readability and user satisfaction. The user study (n=31) indicates that FitVid significantly improves learning experience, interactivity, and concentration. We discuss design implications for responsive and customizable video adaptation.2022JKJeongyeon Kim et al.KAISTInteractive Data VisualizationOnline Learning & MOOC PlatformsCHI
The Long Road Ahead: Ongoing Challenges in Contributing to Large OSS Organizations and What to DoOpen source communities hosted in large foundations operate in a complex socio-technical ecosystem, which includes a heterogeneous mix of projects and stakeholders. Previous work has thus far investigated the challenges faced in OSS communities from the point of view of specific stakeholders, primarily at the level of individual projects. None have yet studied the challenges faced within a large, federated open source organization. In this paper, we aim to bridge this gap to identify ongoing challenges contributors face in a mature OSS organization. To do so, we surveyed 624 contributors at the Apache Software Foundation (ASF) and ran 11 semi-structured follow up interviews. We validated our findings through member checking with the interviewees as well as the ASF Diversity and Inclusion (D&I) committee. The contributions of this paper include: (1) an empirically-evidenced conceptual model of the 88 challenges that contributors face in a mature OSS foundation and (2) a set of 48 community-recommended strategies for alleviating these challenges. Our results show that even well-established and mature organizations still face a variety of individual and project-specific challenges and that it is difficult to design a comprehensive set of processes and guidelines to match the needs and expectations of a diverse and large federated community. Our conceptual challenges model and associated strategies to mitigate them can provide guidance to other OSS foundations and projects helping them in building better support processes and tools to create a successful, thriving community of contributors.2021MGMariam Guizani et al.Open CollaborationCSCW