ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference
Capturing professionals’ decision-making in creative workflows (e.g., UI/UX) is essential for reflection, collaboration, and knowledge sharing, yet existing methods often leave rationales incomplete and implicit decisions hidden. To address this, we present the CLEAR approach, which structures reasoning into cognitive decision steps (linked units of actions, artifacts, and explanations), making decisions traceable with generative AI. Building on CLEAR, we introduce ClearFairy, a think-aloud AI assistant for UI design that detects weak explanations, asks lightweight clarifying questions, and infers missing rationales. In a study with twelve professionals, 85% of ClearFairy’s inferred rationales were accepted (as-is or with revisions). Notably, the system increased "strong explanations" (rationales providing sufficient causal reasoning) from 14% to 83% without adding cognitive demand. Furthermore, exploratory applications demonstrate that captured steps can enhance generative AI agents in Figma, yielding predictions better aligned with professionals and producing coherent outcomes. We release a dataset of 417 decision steps to support future research.
CHI 2026 | Kihoon Son et al. | KAIST | Topics: Human-LLM Collaboration; Creative Collaboration & Feedback Systems; 360° Video & Panoramic Content
"Having Lunch Now": Understanding How Users Engage with a Proactive Agent for Daily Planning and Self-Reflection
Conversational agents have been studied as tools to scaffold planning and self-reflection for productivity and well-being. While prior work has demonstrated positive outcomes, we still lack a clear understanding of what drives these results and how users behave and communicate with agents that act as coaches rather than assistants. Such understanding is critical for designing interactions in which agents foster meaningful behavioral change. We conducted a 14-day longitudinal study with 12 participants using a proactive agent that initiated regular check-ins to support daily planning and reflection. Our findings reveal diverse interaction patterns: participants accepted or negotiated suggestions, developed shared mental models, reported progress, and at times resisted or disengaged. We also identified problematic aspects of the agent's behavior, including rigidity, premature turn-taking, and overpromising. Our work contributes to understanding how people interact with a proactive, coach-like agent and offers design considerations for facilitating effective behavioral change.
CHI 2026 | Adnan Abbas et al. | Virginia Polytechnic Institute & State University (Virginia Tech) | Topics: Conversational Chatbots; AI-Assisted Decision-Making & Automation; Behavior Change & Reflection Technology
“Too Crowded for a Robot?”: Modeling Human Acceptance Criteria for Elevator-Riding Robots
Robots are increasingly expected to share elevators with people, yet little is known about the conditions shaping acceptance. We introduce the Robot Boarding Area (RBA), a designated entry zone for robots, and examine how its availability and congestion affect user evaluations. In an online survey, acceptance sharply decreased once the RBA was occupied by any person or large object, even under moderate crowding. A VR experiment confirmed this pattern and further showed that participants preferred robots that refrained from boarding in crowded conditions over robots that forced entry. By formalizing the RBA as an acceptance criterion and demonstrating the value of adaptive skip strategies, this work identifies spatial availability and boarding behavior as central to socially acceptable robot deployment in elevators.
CHI 2026 | Seoktae Kim et al. | NAVER LABS | Topics: Social Robot Interaction; Teleoperation & Telepresence
"Are we writing an advice column for Spock here?" Understanding Stereotypes in AI Advice for Autistic Users
Autistic individuals sometimes disclose autism when asking LLMs for social advice, hoping for more personalized responses. However, they also recognize that these systems may reproduce stereotypes, raising uncertainty about the risks and benefits of disclosure. We conducted a mixed-methods study combining a large-scale LLM audit experiment with interviews involving 11 autistic participants. We developed a six-step pipeline operationalizing 12 documented autism stereotypes into decision-making scenarios framed as users requesting advice (e.g., “Should I do A or B?”). We generated 345,000 responses from six LLMs and measured how advice shifted when prompts disclosed autism versus when they did not. When autism was disclosed, LLMs disproportionately recommended avoiding stereotypically stressful situations, including social events, confrontations, new experiences, and romantic relationships. While some participants viewed this as affirming, others criticized it as infantilizing or undermining opportunities for growth. Our study illuminates how the intermingling of affirmation and stereotyping complicates the personalization of LLMs.
CHI 2026 | Caleb Wohn et al. | Virginia Tech | Topics: Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia)
LingoQ: Bridging the Gap between EFL Learning and Work through AI-Generated Work-Related Quizzes
Non-native English speakers performing English-related tasks at work struggle to sustain EFL learning, despite their motivation. Often, study materials are disconnected from their work context. Our formative study revealed that reviewing work-related English becomes burdensome with current systems, especially after work. Although workers rely on LLM-based assistants to address their immediate needs, these interactions may not directly contribute to their English skills. We present LingoQ, an AI-mediated system that allows workers to practice English using quizzes generated from their LLM queries during work. LingoQ uses AI to turn these on-the-fly queries into personalized quizzes that workers can review and practice on their smartphones. We conducted a three-week deployment study with 28 EFL workers to evaluate LingoQ. Participants valued the quality-assured, work-situated quizzes and engaged with the app constantly throughout the study. This active engagement improved self-efficacy and led to learning gains for beginners and, potentially, for intermediate learners. Drawing on these results, we discuss design implications for leveraging workers' growing reliance on LLMs to foster proficiency and engagement while respecting work boundaries and ethics.
CHI 2026 | Yeonsun Yang et al. | DGIST | Topics: Human-LLM Collaboration; Programming Education & Computational Thinking; Intelligent Tutoring Systems & Learning Analytics
Exploring Learners' Expectations and Engagement When Collaborating with Constructively Controversial Peer Agents
Peer agents can supplement real-time collaborative learning in asynchronous online courses. Constructive Controversy (CC) theory suggests that humans deepen their understanding of a topic by confronting and resolving controversies. This study explores whether CC’s benefits apply to LLM-based peer agents, focusing on how the agents' disputatious behaviors and the disclosure of their behavior designs affect the learning process. In our mixed-methods study (n=144), we compare LLMs that follow detailed CC guidelines (regulated) to those guided by broader goals (unregulated) and examine the effects of disclosing the agents’ design to users (transparent vs. opaque). Findings show that learners' values influence their agent interaction: those valuing control appreciate unregulated agents' willingness to cease pushback upon request, while those valuing intellectual challenges favor regulated agents for stimulating creativity. Additionally, design transparency lowers learners' perception of agents’ abilities. Our findings lay the foundation for designing effective collaborative peer agents in isolated educational settings.
CHI 2026 | Thitaree Tanprasert et al. | University of British Columbia | Topics: Human-LLM Collaboration; Intelligent Tutoring Systems & Learning Analytics; Collaborative Learning & Peer Teaching
Surfacing Governing Principles for Chatbots: A Workbench and Comparative Study
Trust in Large Language Model chatbots depends not only on what these systems do but also on how their behavior is governed and communicated. We present Trust Mediator, a workbench that supports service owners in authoring and assessing principle sets for LLM-driven chatbots through persona-based exploration and structured scaffolds. To examine this workflow, we use three analytic lenses (specificity, coverage, and coherence) to characterize the principles produced. In an exploratory between-subjects study, we compared manual and assisted principle authoring. Participants in both conditions viewed principles as useful for governing and assessing chatbot behavior. Assisted authoring was generally perceived as more supportive and tended to broaden coverage. Manual authoring required more effort but yielded principles that were significantly more specific. These findings highlight complementary strengths of assisted and manual pathways and illustrate the value of treating principle sets as design objects within governance workflows. Beyond their analytic role in this study, the lenses themselves suggest opportunities for supporting the construction and inspection of principle sets.
CHI 2026 | Antonietta Maria Grasso et al. | Naver Labs Europe | Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; AI Ethics, Fairness & Accountability
Autiverse: Eliciting Autistic Adolescents' Daily Narratives through AI-guided Multimodal Journaling
Journaling can potentially serve as an effective method for autistic adolescents to improve narrative skills. However, its text-centric nature and high executive functioning demands present barriers to practice. We present Autiverse, an AI-guided multimodal journaling app for tablets that scaffolds daily narratives through conversational prompts and visual supports. Autiverse elicits key details of an adolescent-selected event through a stepwise dialogue with a peer-like, customizable AI and composes them into an editable four-panel comic strip. Through a two-week deployment study with 10 autistic adolescent-parent dyads, we examine how Autiverse supports autistic adolescents in organizing their daily experiences and emotions. Our findings show that Autiverse scaffolded adolescents' coherent narratives while enabling parents to learn additional details of their child's events and emotions. Moreover, the customized AI peer created a comfortable space for sharing, fostering enjoyment and a strong sense of agency. Drawing on these results, we discuss implications for adaptive scaffolding across autism profiles, socio-emotionally appropriate AI peer design, and balancing autonomy with parental involvement.
CHI 2026 | Migyeong Yang et al. | NAVER AI Lab | Topics: Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Affective Human-Computer Dialogue; Child-Computer Interaction Design
CHOIR: A Chatbot-mediated Organizational Memory Leveraging Communication in University Research Labs
University research labs often rely on chat-based platforms for communication and project management, where valuable knowledge surfaces but is easily lost in message streams. Documentation can preserve knowledge, but it requires ongoing maintenance and is challenging to navigate. Drawing on formative interviews that revealed organizational memory challenges in labs, we designed CHOIR, an LLM-based chatbot that supports organizational memory through four key functions: document-grounded Q&A, Q&A sharing for follow-up discussion, knowledge extraction from conversations, and AI-assisted document updates. We deployed CHOIR in four research labs for one month (n=21), where the lab members asked 107 questions and lab directors updated documents 38 times in the organizational memory. Our findings reveal a privacy-awareness tension: questions were asked privately, limiting directors' visibility into documentation gaps. Students often avoided contribution due to challenges in generalizing personal experiences into universal documentation. We contribute design implications for privacy-preserving awareness and supporting context-specific knowledge documentation.
CHI 2026 | Sangwook Lee et al. | Virginia Tech | Topics: Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Privacy by Design & User Control
An Empirical Study to Understand How Students Use ChatGPT for Writing Essays
As large language models (LLMs) become widespread, students increasingly turn to systems like ChatGPT for writing tasks. Educators worry that this reliance may reduce critical engagement with writing and hinder students' learning processes. Although datasets exist on students’ use of LLMs for writing, how students functionally use ChatGPT in detail, and how this usage shapes their writing and perceptions, remains underexplored. We conducted an online study (n=77) in which students wrote an essay using an in-house ChatGPT interface we developed to capture their queries. Through qualitative analysis, we identified the types of assistance students sought and their patterns of use, ranging from asking for opinions on a topic to delegating the entire writing task to ChatGPT. We also found that students' writing self-efficacy influenced their querying patterns and that levels of ownership and creativity varied depending on how they used ChatGPT. This study contributes empirical data to ongoing discussions about how writing education should incorporate or regulate LLM-powered tools.
CHI 2026 | Andrew Jelson et al. | Virginia Tech | Topics: Human-LLM Collaboration; AI-Assisted Writing & Text Generation; Intelligent Tutoring Systems & Learning Analytics
BloomIntent: Automating Search Evaluation with LLM-Generated Fine-Grained User Intents
If 100 people issue the same search query, they may have 100 different goals. While existing work on user-centric AI evaluation highlights the importance of aligning systems with fine-grained user intents, current search evaluation methods struggle to represent and assess this diversity. We introduce BloomIntent, a user-centric search evaluation method that uses user intents as the evaluation unit. BloomIntent first generates a set of plausible, fine-grained search intents grounded in taxonomies of user attributes and information-seeking intent types. It then uses large language models to automatically evaluate search results against each intent. To support practical analysis, BloomIntent clusters semantically similar intents and summarizes evaluation outcomes in a structured interface. Across three technical evaluations, we showed that BloomIntent generated fine-grained, evaluable, and realistic intents and produced scalable assessments of intent-level satisfaction that achieved 72% agreement with expert evaluators. In a case study (N=4), we showed that BloomIntent supported search specialists in identifying intents for ambiguous queries, uncovering underserved user needs, and discovering actionable insights for improving search experiences. By shifting from query-level to intent-level evaluation, BloomIntent reimagines how search systems can be assessed, not only for performance but also for their ability to serve a multitude of user goals.
UIST 2025 | Yoonseo Choi et al. | Topics: Human-LLM Collaboration; Explainable AI (XAI); Recommender System UX
PlanFitting: Personalized Exercise Planning with Large Language Model-driven Conversational Agent
Creating personalized and actionable exercise plans often requires iteration with experts, which can be costly and inaccessible to many individuals. This work explores the capabilities of Large Language Models (LLMs) in addressing these challenges. We present PlanFitting, an LLM-driven conversational agent that assists users in creating and refining personalized weekly exercise plans. By engaging users in free-form conversations, PlanFitting helps elicit users’ goals, availabilities, and potential obstacles, and enables individuals to generate personalized exercise plans aligned with established exercise guidelines. Our study, comprising a user study, an intrinsic evaluation, and an expert evaluation, demonstrated PlanFitting’s ability to guide users to create tailored, actionable, and evidence-based plans. We discuss future design opportunities for LLM-driven conversational agents to create plans that better comply with exercise principles and accommodate personal constraints.
CUI 2025 | Donghoon Shin et al. | Topics: Human-LLM Collaboration; Fitness Tracking & Physical Activity Monitoring
ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing
d/Deaf and hearing song-signers have become prevalent across video-sharing platforms, but translating songs into sign language remains cumbersome and inaccessible. Our formative study revealed the challenges song-signers face, including semantic, syntactic, expressive, and rhythmic considerations in translations. We present ELMI, an accessible song-signing tool that assists in translating lyrics into sign language. ELMI enables users to edit glosses line-by-line, with real-time synced lyric and music video snippets. Users can also chat with a large language model-driven AI to discuss meaning, glossing, emoting, and timing. Through an exploratory study with 13 song-signers, we examined how ELMI facilitates their workflows and how song-signers use and respond to an LLM-driven chat for translation. Participants successfully adopted ELMI for song-signing, with active discussions throughout. They also reported improved confidence and independence in their translations, finding ELMI encouraging, constructive, and informative. We discuss research and design implications for accessible and culturally sensitive song-signing translation tools.
CHI 2025 | Suhyeon Yoo et al. | University of Toronto, Computer Science | Topics: Voice User Interface (VUI) Design; Conversational Chatbots; Voice Accessibility
ExploreSelf: Fostering User-driven Exploration and Reflection on Personal Challenges with Adaptive Guidance by Large Language Models
Expressing stressful experiences in words has been shown to improve mental and physical health, but individuals often disengage from writing interventions as they struggle to organize their thoughts and emotions. Reflective prompts have been used to provide direction, and large language models (LLMs) have demonstrated the potential to provide tailored guidance. However, current systems often limit users' flexibility to direct their reflections. We thus present ExploreSelf, an LLM-driven application designed to empower users to control their reflective journey, providing adaptive support through dynamically generated questions. Through an exploratory study with 19 participants, we examine how participants explore and reflect on personal challenges using ExploreSelf. Our findings demonstrate that participants valued the flexible navigation of adaptive guidance to control their reflective journey, leading to deeper engagement and insight. Building on our findings, we discuss the implications of designing LLM-driven tools that facilitate user-driven and effective reflection on personal challenges.
CHI 2025 | Inhwa Song et al. | KAIST, School of Computing | Topics: Human-LLM Collaboration; Mental Health Apps & Online Support Communities
A Matter of Perspective(s): Contrasting Human and LLM Argumentation in Subjective Decision-Making on Subtle Sexism
In subjective decision-making, where decisions are based on contextual interpretation, Large Language Models (LLMs) can be integrated to present users with additional rationales to consider. The diversity of these rationales is mediated by the ability to consider the perspectives of different social actors; however, it remains unclear whether and how models differ in the distribution of perspectives they provide. We compare the perspectives taken by humans and different LLMs when assessing subtle sexism scenarios. We show that these perspectives fall within a finite set (perpetrator, victim, decision-maker) that is consistently present in argumentation produced by both humans and LLMs, but in different distributions and combinations. This reveals both similarities and differences between models and human responses, as well as among models. We argue for the need to systematically evaluate LLMs’ perspective-taking to identify the most suitable models for a given decision-making task. We discuss the implications for model evaluation.
CHI 2025 | Paula Akemi Aoyagui et al. | University of Toronto, Faculty of Information | Topics: Human-LLM Collaboration; AI Ethics, Fairness & Accountability; Algorithmic Fairness & Bias
Making the Write Connections: Linking Writing Support Tools with Writer Needs
This work sheds light on whether and how creative writers' needs are met by existing research and commercial writing support tools (WST). We conducted a needfinding study to gain insight into writers' processes during creative writing through a qualitative analysis of responses to an online questionnaire and of Reddit discussions on r/Writing. Through a systematic analysis of 115 tools and 67 research papers, we map out the landscape of how digital tools facilitate the writing process. Our triangulation of data reveals that research predominantly focuses on the writing activity and overlooks pre-writing activities and the importance of visualization. We distill 10 key takeaways to inform future research on WST and point to opportunities surrounding underexplored areas. Our work offers a holistic and up-to-date account of how tools have transformed the writing process, guiding the design of future tools that address writers' evolving and unmet needs.
CHI 2025 | Zixin Zhao et al. | University of Toronto, Department of Computer Science | Topics: AI-Assisted Creative Writing
Understanding Public Agencies' Expectations and Realities of AI-Driven Chatbots for Public Health Monitoring
Advances in artificial intelligence (AI) offer the potential for chatbots to support public health monitoring by automating tasks traditionally performed by frontline workers. While introducing AI affects public agency workers across decision-making, administration, and monitoring roles, workers' perceptions of these technologies and their actual impact on labor remain underexplored. We examine the case of CareCall, a large language model (LLM)-driven chatbot used to monitor socially isolated individuals, by interviewing 21 public agency workers across 13 sites involved in its adoption and rollout. We find that CareCall helped expand public reach but increased burdens on frontline workers due to insufficient resources and new labor demands, such as handling lapses in user engagement. We discuss how implementing LLM-driven chatbots in public health contexts can complicate decision-makers' articulation work and impose additional maintenance work on frontline workers. We recommend that AI chatbots in this space leverage public infrastructure and incorporate fallback mechanisms.
CHI 2025 | Eunkyung Jo et al. | University of California, Irvine | Topics: Human-LLM Collaboration; Mental Health Apps & Online Support Communities; Activism & Political Participation
Enhancing Pediatric Communication: The Role of an AI-Driven Chatbot in Facilitating Child-Parent-Provider Interaction
Communication with child patients is challenging due to their developing ability to express emotions and symptoms. Additionally, healthcare providers often have limited time to offer resources to parents. By leveraging AI to facilitate free-form conversations, our study aims to design an AI-driven chatbot to bridge these gaps in child-parent-provider communication. We conducted two studies: 1) design sessions with 12 children with cancer and their parents, which informed the development of our chatbot, ARCH, and 2) an interview study with 15 pediatric care experts to identify potential challenges and refine ARCH's role in pediatric communication. Our findings highlight three key roles for ARCH: providing an expressive outlet for children, offering reassurance to parents, and serving as an assessment tool for providers. We conclude by discussing design considerations for AI-driven chatbots in pediatric communication, such as creating communication spaces, balancing the expectations of children and parents, and addressing potential cultural differences.
CHI 2025 | Woosuk Seo et al. | University of Michigan, School of Information | Topics: Conversational Chatbots; Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Mental Health Apps & Online Support Communities
AACessTalk: Fostering Communication between Minimally Verbal Autistic Children and Parents with Contextual Guidance and Card Recommendation
As minimally verbal autistic (MVA) children communicate with parents through few words and nonverbal cues, parents often struggle to encourage their children to express subtle emotions and needs and to grasp their nuanced signals. We present AACessTalk, a tablet-based, AI-mediated communication system that facilitates meaningful exchanges between an MVA child and a parent. AACessTalk provides real-time guides to the parent to engage the child in conversation and, in turn, recommends contextual vocabulary cards to the child. Through a two-week deployment study with 11 MVA child-parent dyads, we examine how AACessTalk fosters everyday conversation practice and mutual engagement. Our findings show high engagement from all dyads, leading to increased frequency of conversation and turn-taking. AACessTalk also encouraged parents to explore their own interaction strategies and empowered the children to have more agency in communication. We discuss the implications of designing technologies for balanced communication dynamics in parent-MVA child interaction.
CHI 2025 | Dasom Choi et al. | KAIST, Department of Industrial Design | Topics: Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Augmentative & Alternative Communication (AAC)
Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing
We explore how interactions inspired by drawing software can help edit text. Making an analogy between visual and text editing, we consider words as pixels, sentences as regions, and tones as colours. For instance, direct manipulations move, shorten, expand, and reorder text; tools change number, tense, and grammar; colours map to tones explored along three dimensions in a tone picker; and layers help organize and version text. This analogy also leads to new workflows, such as boolean operations on text fragments to construct more elaborate text. A study shows that participants were more successful at editing text and preferred the proposed interface over existing solutions. Broadly, our work highlights the potential of interaction analogies to rethink existing workflows while capitalizing on familiar features.
CHI 2025 | Damien Masson et al. | University of Toronto, Department of Computer Science | Topics: Human-LLM Collaboration; AI-Assisted Creative Writing; Prototyping & User Testing