Improving Human Verification of LLM Reasoning through Interactive Explanation InterfacesThe reasoning capabilities of Large Language Models (LLMs) have led to their increasing employment in several critical applications, particularly education, where they support problem-solving, tutoring, and personalized study. While a plethora of work shows the effectiveness of LLMs in generating step-by-step solutions through chain-of-thought (CoT) reasoning on reasoning benchmarks, little is understood about whether the generated CoT helps end-users improve their ability to comprehend mathematical reasoning problems and detect errors/hallucinations in LLM-generated solutions. To address this gap and contribute to understanding how reasoning can improve human-AI interaction, we present three new interactive reasoning interfaces: interactive CoT (iCoT), interactive Program-of-Thought (iPoT), and interactive Graph (iGraph), and a novel framework that transforms the LLM's reasoning from traditional CoT into alternative, interactive formats. Across 125 participants, we found that interactive interfaces significantly improved performance. Specifically, the iGraph interface yielded the highest clarity and error detection rate (85.6%), followed by iPoT (82.5%) and iCoT (80.6%), all outperforming standard CoT (73.5%). Interactive interfaces also led to faster response times: participants using iGraph were fastest (57.9 secs), compared to iCoT and iPoT (60 secs) and the standard CoT baseline (64.7 secs). Furthermore, participants preferred the iGraph reasoning interface, citing its superior ability to enable users to follow the LLM's reasoning process. We discuss the implications of these results and provide recommendations for the future design of reasoning models.
The code and interfaces for this project can be found here: https://github.com/Runtaozhou/Interactive-CoT. 2026RZRuntao Zhou et al.University of VirginiaHuman-LLM CollaborationExplainable AI (XAI)Prototyping & User TestingIUI
Three Modalities, Two Design Probes, One Prototype, and No Vision: Experience-Based Co-Design of a Multi-modal 3D Data Visualization ToolThree-dimensional (3D) data visualizations, such as surface plots, are vital in STEM fields from biomedical imaging to spectroscopy, yet remain largely inaccessible to blind and low-vision (BLV) people. To address this gap, we conducted an Experience-Based Co-Design study with BLV co-designers with expertise in non-visual data representations to create an accessible, multi-modal, web-native visualization tool. Using a multi-phase methodology, our team of five BLV researchers and one non-BLV researcher participated in two iterative sessions, comparing a low-fidelity tactile probe with a high-fidelity digital prototype. This process produced a prototype with empirically grounded features, including reference sonification, stereo and volumetric audio, and configurable buffer aggregation, which our co-designers validated as improving analytic accuracy and learnability. In this study, we target core analytic tasks essential for non-visual 3D data exploration: orientation, landmark and peak finding, comparing local maxima versus global trends, gradient tracing, and identifying occluded or partially hidden features. Our work offers accessibility researchers and developers a co-design protocol for translating tactile knowledge to digital interfaces, concrete design guidance for future systems, and opportunities to extend accessible 3D visualization into embodied data environments.2026SKSanchita S. Kamath et al.University of Illinois Urbana-ChampaignVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Interactive Data VisualizationMedical & Scientific Data VisualizationCHI
Needling Through the Threads: A Visualization Tool for Navigating Threaded Online DiscussionsNavigating large-scale online discussions is difficult due to their rapid pace and high volume of content. Platforms like Reddit employ "threads" to visually organize parallel discussions, but deep nesting obscures conversation flow. For moderators, this fragmentation compounds the difficulty of following evolving conversations and maintaining context across threads, which limits timely and effective moderation. In this paper, we present Needle, an interactive system that applies visual analytics to summarize key conversational metrics: activity, toxicity, and voting trends over time. Needle provides both high-level overviews and detailed breakdowns of threads, enabling moderators to identify priority areas without reading through entire nested conversations. Through a user study with ten Reddit moderators, we find that Needle provides a practical solution to maintain contextual understanding when navigating threaded discussions. Based on these findings, we propose design guidelines for future visualization-based tools that shape how people consume, interpret, and make sense of large-scale online discussions.2026YLYijun Liu et al.University of Illinois Urbana-ChampaignInteractive Data VisualizationSocial Platform Design & User BehaviorContent Moderation & Platform GovernanceCHI
From Crafting Text to Crafting Thought: Grounding AI Writing Support to Writing Center PedagogyAs AI writing tools evolve from fixing surface errors to creating language with writers, new capabilities raise concerns about negative impacts on student writers, such as replacing their voices and undermining critical thinking skills. To address these challenges, we look at a parallel transition in university writing centers from focusing on fixing errors to preserving student voices. We develop design guidelines informed by writing center literature and interviews with 10 writing tutors. We illustrate these guidelines in a prototype AI tool, Writor. Writor helps writers revise text by setting goals, providing balanced feedback, and engaging in conversations without generating text verbatim. We conducted an expert review with 30 writing instructors, tutors, and AI researchers on Writor to assess the pedagogical soundness, alignment with writing center pedagogy, and integration contexts. We distill our findings into design implications for future AI writing feedback systems, including designing for trust among AI-skeptical educators.2026YLYijun Liu et al.University of Illinois Urbana-ChampaignHuman-LLM CollaborationAI-Assisted Writing & Text GenerationParticipatory DesignCHI
"I Don't Trust Any Professional Research Tool": A Re-Imagination of Knowledge Production Workflows by, with, and for Blind and Low-Vision ResearchersResearch touts universal participation through accessibility initiatives, yet blind and low-vision (BLV) researchers face systematic exclusion as visual representations dominate modern research workflows. To materialize inclusive processes, we, as BLV researchers, examined how our peers combat inaccessible infrastructures. Through an explanatory sequential mixed-methods approach, we conducted a cross-sectional, observational survey (n=57) and follow-up semi-structured interviews (n=15), analyzing open-ended data using reflexive thematic analysis and framing findings through activity theory to highlight research's systemic shortcomings. We expose how BLV researchers sacrifice autonomy and shoulder physical burdens, with nearly one-fifth unable to independently perform literature review or evaluate visual outputs, delegating tasks to sighted colleagues or relying on AI-driven retrieval to circumvent fatigue. Researchers also voiced frustration with specialized tools, citing developers' performative responses and the loss of deserved professional accolades. We seek follow-through on research's promises through design recommendations that reconceptualize accessibility as fundamental to successful research and support BLV scholars' workflows.2026OKOmar Khan et al.University of Illinois Urbana-ChampaignVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Universal & Inclusive DesignUser Research Methods (Interviews, Surveys, Observation)CHI
Principles of Safe AI Companions for Youth: Parent and Expert PerspectivesAI companions are increasingly popular among teenagers, yet current platforms lack safeguards to address developmental risks and harmful normalization. Despite growing concerns, little is known about how parents and developmental psychology experts assess these interactions or what protections they consider necessary. We conducted 26 semi-structured interviews with parents and experts, who reviewed real-world youth–AI companion conversation snippets. We found that stakeholders assessed risks contextually, attending to factors such as youth maturity, AI character age, and how AI characters modeled values and norms. We also identified distinct logics of assessment: parent participants flagged single events, such as a mention of suicide or flirtation, as high risk, whereas expert participants looked for patterns over time, such as repeated references to self-harm or sustained dependence. Both groups proposed interventions, with parents favoring broader oversight and experts preferring cautious, crisis-only escalation paired with youth-facing safeguards. These findings provide directions for embedding safety into AI companion design.2026YYYaman Yu et al.University of Illinois at Urbana ChampaignMental Health Technology for YouthAffective Human-Computer DialogueAI Ethics, Fairness & AccountabilityCHI
Towards Understanding Children’s Collaborative Interaction Patterns in Child-AI Co-creative InterfacesChildren are increasingly using generative AI for co-creative activities, such as storytelling. While co-creativity is inherently about collaboration between children and AI, little is known about how children naturally engage, respond, and negotiate collaboration with AI. To address this gap, we conducted a participatory design study with children (ages 8–13) to examine the roles children and AI take and the strategies children use to align AI’s output with their intent. Our findings introduce four novel child–AI collaboration profiles. We found that children were open to technical AI refinements (e.g., adding details to their drawings) as scaffolds for developing drawing skills, but resisted conceptual transformations (e.g., changing objects) that altered their original ideas. We introduce the Child-Centered Co-creative AI (CCAI, “Kai”) framework, grounded in children’s natural collaborative behaviors during co-creation with AI, to inform the design of future child–AI co-creativity interfaces.2026FFFrancesca Fusco et al.SUPSIGenerative AI (Text, Image, Music, Video)Children's AI Literacy & Data LiteracyParticipatory DesignCHI
AReframedChair: Reframing the Empty Chair through Dyadic and Triadic AR-Mediated Self-EmbodimentImmersive technologies are increasingly applied in therapeutic and well-being practices, yet most AR systems focus on dyadic client–avatar interactions and overlook richer therapeutic structures that involve therapists. We introduce AReframedChair, an AR system that reimagines the traditional Empty Chair technique by enabling self-dialogue with a personalized avatar representing one's past or future self. In a between-subjects study with 60 adults, we compared the traditional Empty Chair method with two AR-reframed modes: Dyadic (client–avatar) and Triadic (client–avatar–therapist). Participants' survey responses showed that the Dyadic mode elicited greater positive affect and self-compassion in the past-self scenarios, whereas the Triadic mode produced stronger gains in motivation and reflections in future-self scenarios. Thematic analysis further revealed distinct roles: the Avatar facilitated emotional entry, reassurance, and cognitive reframing, while the Therapists intervened at critical moments to down-regulate intensity, redirect attention, and enhance reflection. These findings open up new design pathways for mental health technologies.2026YLYongming Li et al.Xi'an Jiaotong UniversityVR Medical Training & RehabilitationMental Health Apps & Online Support CommunitiesAffective Feedback & Emotion Regulation InterfacesCHI
I Can SE Clearly Now: Investigating the Effectiveness of GUI-based Symbolic Execution for Software Vulnerability DiscoveryWhile symbolic execution (SE) can discover software vulnerabilities, it has received limited practical adoption. A key barrier is that SE requires human expertise to understand the program’s state and prioritize paths to analyze. Traditionally, users controlled SE through programmatic API calls, but recent tooling now implements graphical user interfaces (GUIs). However, it is unclear how these new features affect human-SE performance. To understand this impact, we conducted a controlled experiment where 24 vulnerability discovery experts were tasked with analyzing a binary using an SE tool with either API-based or GUI-based features. From this study, we identify (1) experts' SE process, and (2) the impact of GUI-based features on human-SE performance. Then we propose recommendations to improve SE tool design.2026YLYi Jou Li et al.Arizona State UniversityComputational Methods in HCIUser Research Methods (Interviews, Surveys, Observation)Prototyping & User TestingCHI
A Design Space for Live Music AgentsLive music provides a uniquely rich setting for studying creativity and interaction due to its spontaneous nature. The pursuit of live music agents, intelligent systems supporting real-time music performance and interaction, has captivated researchers across HCI, AI, and computer music for decades, and recent advancements in AI suggest unprecedented opportunities to evolve their design. However, the interdisciplinary nature of music has led to fragmented development across research communities, hindering effective communication and collaborative progress. In this work, we bring together perspectives from these diverse fields to map the current landscape of live music agents. Based on our analysis of 184 systems across both academic literature and video, we develop a comprehensive design space that categorizes dimensions spanning usage contexts, interactions, technologies, and ecosystems. By highlighting trends and gaps in live music agents, our design space offers researchers, designers, and musicians a structured lens to understand existing systems and shape future directions in real-time human-AI music co-creation. We release our annotated systems as a living artifact at https://live-music-agents.github.io. 2026YKYewon Kim et al.Carnegie Mellon UniversityMusic Composition & Sound Design ToolsGenerative AI (Text, Image, Music, Video)Creative Collaboration & Feedback SystemsCHI
From Code Generation to Conceptual Learning: Student Use of LLMs in a Web Programming CourseAs AI-assisted coding becomes standard in software development, computer science educators need a clearer understanding of how Large Language Models (LLMs) can support the learning process. Recent work has examined how students can benefit from using LLMs in courses, but most studies rely on self-reported usage or controlled experiments. To complement these approaches, this paper investigates how students organically leverage LLMs in an advanced CS course where assignments reflect real-world complexity. We analyze 448 LLM chat logs from 147 students across two offerings of a web programming course at a U.S. university. Through open coding, we identified 14 distinct prompt–response pair types that cluster into three categories: to generate code, debug code, and explain programming concepts. Our analysis reveals that how students interact with LLMs correlates with academic performance. High-effort detailed specifications for code generation positively correlated with final grades (r = 0.25, p < 0.01), whereas low-effort behaviors such as pasting raw error messages showed negative correlations (r = -0.34, p < 0.01). We also observed a temporal shift toward explanation-oriented interactions, suggesting that students increasingly use LLMs as conceptual tutors.2026HIHajara-Yasmin Isa et al.University of Illinois at Urbana-ChampaignHuman-LLM CollaborationProgramming Education & Computational ThinkingIntelligent Tutoring Systems & Learning AnalyticsCHI
Living Contracts: Beyond Document-Centric Interaction with Legal AgreementsUser interaction with legal contracts has been limited to document reading, which is often complicated by complex, ambiguous legal language. We explore possible futures where contract interfaces go beyond single document interfaces to (1) educate users about legal rights not stated in the contract, (2) transform legal language into alternative representations to aid information tasks before, during, and after signing, and (3) proactively supply contractual information at relevant moments. We refer to these future interfaces collectively as Living Contracts. Using residential leases as a case study, we created three design probes representing different possible Living Contracts. A three-part qualitative study (N=18) revealed participants' barriers to interacting with contracts, including interpreting complex language, uncertainty about legal rights, and the pressure to sign quickly. Participants’ feedback on the probes highlighted how Living Contracts have the potential to address these challenges and open new design opportunities for human-contract interactions beyond document reading.2026ZHZiheng Huang et al.University of Illinois Urbana-ChampaignParticipatory DesignUser Research Methods (Interviews, Surveys, Observation)Prototyping & User TestingCHI
A Shared Look: Detecting Deepfakes with Inter-Subject Neural SynchronyThe rapid evolution of generative AI presents a significant challenge for deepfake detection. While most research focuses on face-swapping, the emerging threat of "image-to-video" (I2V) forgeries is harder to detect and poses a greater risk. Traditional computer vision detectors rely on transient digital artifacts, which often lack interpretability and robustness against the new generation of techniques. This study introduces a neuro-cognitive method, using dyadic electroencephalogram (EEG) to decode the human perception of authenticity. We recorded inter-brain synchrony via EEG hyperscanning from 15 participant pairs as they viewed a balanced set of authentic and AI-generated videos. Results showed that these shared neural responses can classify video authenticity with an accuracy of up to 89.23% using our proposed Hyper-FusionNet. In addition, the biomarkers exhibited distinct patterns for different emotional valences, highlighting their versatility. These findings highlight the potential of inter-brain synchrony for detecting emerging deepfakes, offering a new perspective for enhancing user trust and digital literacy.2026SHShiang Hu et al.Anhui UniversityDeepfake & Synthetic Media DetectionEmotion Recognition & DetectionAffective Feedback & Emotion Regulation InterfacesCHI
It Shouldn't Be This Difficult: Researcher Perspectives on Diversity and Inclusion in Usable Privacy and Security ResearchWhile recent usable privacy and security (UPS) research has made progress in moving beyond “the average user,” a systematic account of how UPS researchers navigate diversity and inclusion in their work remains lacking. Through 20 in-depth semi-structured interviews with experienced researchers, we examine how and why they recruit diverse, underserved populations in their work, as well as the challenges they face in doing so, including conceptual difficulties in defining who is underserved, limited access to target populations, and inflexible peer review and publishing norms. Participants also reflected on their own positionality when planning and conducting studies, often expressing uncertainty about how to account for and articulate their positionality. We identify strategies researchers use to overcome challenges and highlight areas where collective action from the research community and institutions is needed to foster greater inclusion in UPS research practices.2026PCPriyasha Chatterjee et al.MPI-SPPrivacy by Design & User ControlPrivacy Perception & Decision-MakingInclusive DesignCHI
Envisioning an Ethical and Sustainable Metaverse Workplace: Beyond AI-Driven SurveillanceEight of the ten largest American companies now use employee tracking software, which has raised concerns about invasive monitoring. Metaverse platforms then emerged as a potential alternative to restore natural workplace visibility without keystroke logging or screen capture. While most metaverse workplace implementations were abandoned quickly, Zigbang, a South Korean company operating entirely through its metaverse platform since 2022, stands as a notable exception. Through our mixed-method analysis of employee experiences and stakeholder perspectives, we identify three factors that undermine metaverse workplace sustainability: the persistence of surveillance proxies over meaningful performance assessment, design choices that prioritize realism over digital innovation, and the absence of governance frameworks specific to metaverse workplaces. Our findings reveal that metaverse workplaces often perpetuate and amplify problematic management paradigms rather than transcend them. Using these insights, we propose frameworks for task-specific monitoring, digital-first design, and governance guidelines to aid development of ethical and sustainable metaverse workplaces.2026HPHyanghee Park et al.University of Illinois at Urbana-ChampaignMixed Reality WorkspacesImpact of Automation on WorkTechnology Ethics & Critical HCICHI
"Think about it like you're a firefighter": Understanding How Reddit Moderators Use the ModqueueOn Reddit, the moderation queue (modqueue) is the platform’s primary interface for reviewing user-reported and automatically flagged content. Despite its central role in Reddit’s community-reliant moderation model, little is known about how moderators use it. To address this gap, we surveyed 110 moderators, who collectively oversee more than 400 subreddits, to understand how the modqueue fits into their workflows and what its design enables or constrains. We find substantial variation in modqueue use: some moderators treat it as a daily checklist, others use it to identify patterns or emerging issues, and many routinely leave the interface to gather additional context or coordinate with teammates. Respondents also described challenges, including coordination issues such as collisions, incomplete or noisy information signals, and friction from fragmented interface versions and reliance on third-party tools. Taken together, we show the modqueue is neither a one-size-fits-all solution nor sufficient on its own for supporting moderator review. We outline opportunities for more modular, better-integrated moderation infrastructures that support both item-level review and broader governance activities, and that better align with the collaborative and value-driven nature of volunteer moderation on Reddit.2026TBTanvi Bajpai et al.University of Illinois Urbana-ChampaignContent Moderation & Platform GovernanceCommunity Collaboration & WikipediaUser Research Methods (Interviews, Surveys, Observation)CHI
Why Don't People Follow Robot Leaders? Understanding the Effects of Power Legitimacy on Compliance with AgentsArtificially intelligent systems such as robots are increasingly integrated into the workplace and gaining more power. Yet studies on robot power and compliance report mixed findings. To address these inconsistencies, we introduced legitimacy as people's psychological acceptance of power. Three preregistered experiments were conducted (N = 431). In Experiments 1 and 2, we manipulated power assignment (robot power vs. human power) and legitimacy of power (legitimate, illegitimate, no explanation) through competence and procedural fairness. The results showed that participants complied more with legitimate robot power than with illegitimate robot power. In Experiment 3, we examined whether perceptions of legitimacy would emerge naturally in more ecologically valid collaboration. Results of a multigroup mediation model showed that the robot leader was perceived as less legitimate than the human leader, which accounted for the reduced compliance with the robot’s decisions. In all three experiments, people’s perceived social attributes of robots with power and their affective responses were negatively affected. Theoretical and design implications are discussed.2026HCHuajie Jay Cao et al.Michigan State UniversityHuman-Robot Collaboration (HRC)Technology Ethics & Critical HCISocial Robot InteractionCHI
Representation on Our Terms: How Trans and Gender Diverse People Prioritize Inclusivity in University Information SystemsHCI research increasingly advocates for fluid models of gender to better represent trans and gender diverse people. In many organizations, such flexibility must operate within regulatory, technical, and resource constraints. For example, public universities are legally required to report gender data using binary categories for compliance and oversight. To understand how inclusivity should be implemented under these conditions, we investigated trans and gender diverse people’s representational priorities within a public university's information systems. Interviews with 23 participants revealed nuanced priorities on acceptable representational trade-offs. Participants preferred accurate representation in back-end systems and other unseen realms over inclusive user interfaces. They also desired agency in how their identities were grouped or simplified in contexts where categorical reduction was unavoidable. We argue for pragmatic notions of inclusivity that balance community values with organizational constraints. Implementing inclusivity in long-lived organizational systems requires discretion and restraint alongside affirming representational practices amid shifting sociopolitical landscapes.2026DKDrew N. Kirks-Cler et al.University of Illinois Urbana-ChampaignInclusive DesignGender & Race Issues in HCITechnology Ethics & Critical HCICHI
FretFlow: Adaptive Haptics for Rhythm and Articulation in Guitar LearningRhythm and articulation are essential for expressive guitar performance. Existing tools provide basic beat cues, but beginners often struggle to align with these cues when playing complex techniques, such as strumming and muting. Informed by a formative study with five instructors and grounded in embodied learning theories, we present FretFlow, a haptic vest-based tool that simulates common instructional practices to guide learners through physical interactions like tapping. The key to FretFlow is its design space that maps rhythmic and articulation patterns in various playing techniques to distinct haptic patterns, enabling authoring of haptic scores. FretFlow further dynamically adapts haptic intensity based on learners' real-time performance accuracy, accompanied by multimodal guidance across haptic, visual, and audio channels. We iteratively refined haptic designs across two rounds with 46 participants, followed by a two-week user study with 20 beginners. Results show that FretFlow improves learners’ rhythmic accuracy and expressive performance.2026XSXin Shu et al.Newcastle UniversityHaptic WearablesBehavior Change & Reflection TechnologyFull-Body Interaction & Embodied InputCHI
Towards AI as Colleagues: Multi-Agent System Improves Structured Ideation ProcessesMost AI systems today are designed to manage tasks and execute predefined steps. This makes them effective for process coordination but limited in their ability to engage in joint problem-solving with humans or contribute new ideas. We introduce MultiColleagues, a multi-agent conversational system that shows how AI agents can act as colleagues by conversing with each other, sharing new ideas, and actively involving users in collaborative ideation processes. In a within-subjects study with 20 participants, we compared MultiColleagues to a single-agent baseline. Results show that MultiColleagues fostered stronger perceived social presence, and participants rated their outcomes as higher in quality and novelty, with more elaboration during ideation. These findings demonstrate the potential of AI agents to move beyond process partners toward colleagues that share intent, strengthen group dynamics, and collaborate with humans to advance ideas.2026KQKexin Quan et al.University of Illinois, Urbana-ChampaignHuman-LLM CollaborationCreative Collaboration & Feedback SystemsAI-Assisted Decision-Making & AutomationCHI