Improving Human Verification of LLM Reasoning through Interactive Explanation Interfaces
The reasoning capabilities of Large Language Models (LLMs) have led to their increasing use in several critical applications, particularly education, where they support problem-solving, tutoring, and personalized study. While many works show that LLMs can generate effective step-by-step solutions through chain-of-thought (CoT) reasoning on reasoning benchmarks, little is understood about whether the generated CoT helps end-users comprehend mathematical reasoning problems and detect errors or hallucinations in LLM-generated solutions. To address this gap and contribute to understanding how reasoning can improve human-AI interaction, we present three new interactive reasoning interfaces: interactive CoT (iCoT), interactive Program-of-Thought (iPoT), and interactive Graph (iGraph). We also present a novel framework that translates the LLM's reasoning from traditional CoT into alternative, interactive formats. Across 125 participants, we found that interactive interfaces significantly improved performance. Specifically, the iGraph interface yielded the highest clarity and error detection rate (85.6%), followed by iPoT (82.5%) and iCoT (80.6%), all outperforming standard CoT (73.5%). Interactive interfaces also led to faster response times: participants using iGraph were fastest (57.9 s), compared with iCoT and iPoT (60 s) and the standard CoT baseline (64.7 s). Furthermore, participants preferred the iGraph interface, citing its superior ability to help them follow the LLM's reasoning process. We discuss the implications of these results and provide recommendations for the future design of reasoning models. The code and interfaces for this project are available at https://github.com/Runtaozhou/Interactive-CoT.
2026 | Runtao Zhou et al. | University of Virginia | Human-LLM Collaboration; Explainable AI (XAI); Prototyping & User Testing | IUI
Design Considerations for Human Oversight of AI: Insights from Co-Design Workshops and Work Design Theory
As AI systems become increasingly capable and autonomous, domain experts' roles are shifting from performing tasks themselves to overseeing AI-generated outputs. Such oversight is critical, as undetected errors can have serious consequences or undermine the benefits of AI. Effective oversight, however, depends not only on detecting and correcting AI errors but also on the motivation and engagement of the oversight personnel and the meaningfulness they see in their work. Yet little is known about how domain experts approach and experience the oversight task, or what should be considered when designing effective and motivating interfaces that support human oversight. To address these questions, we conducted four co-design workshops with domain experts from psychology and computer science. We asked them to first oversee an AI-based grading system and then discuss their experiences and needs during oversight. Finally, they collaboratively prototyped interfaces that could support them in their oversight task. Our thematic analysis revealed four key user requirements: understanding tasks and responsibilities, gaining insight into the AI's decision-making, contributing meaningfully to the process, and collaborating with peers and the AI. We integrated these empirical insights with the SMART model of work design to develop a framework of twelve design considerations that are more transferable than the identified user requirements. Our framework links interface characteristics and user requirements to the psychological processes underlying effective and satisfying work. Because these considerations are grounded in work design theory and overlap with existing guidelines for human–AI interaction, we expect them to apply across domains. We also discuss how they go beyond those guidelines to inform the design of engaging and meaningful interfaces that support human oversight of AI-based systems.
2026 | Cedric Faas et al. | Saarland University | AI-Assisted Decision-Making & Automation; Explainable AI (XAI); Participatory Design | IUI
Mental Models in Human-AI Interaction: Systematic Review of Empirical Methodologies and Guidelines
The notion of mental models has long been used in HCI to capture people's understanding of and reasoning about computing systems. Eliciting users' mental models can explain their behaviors and attitudes toward a system: why and how they use, rely on, trust, or reject it. However, the concept's use remains conceptually fragmented and methodologically diverse, and it has not been revisited in light of modern AI systems, whose opacity and new abilities may challenge human understanding. To address this gap, we systematically review 88 empirical studies that elicit humans' mental models of AI systems. We extracted and analyzed how studies define and elicit mental models, the type of mental model their method presupposes, and how these vary across AI system types. Drawing on the framing of mental models in cognitive psychology and HCI, and based on descriptive and relational analyses of the extracted variables, we find that:
(1) the goals of mental model elicitation bifurcate into system-specific evaluation and class-level probes that surface lay theories.
(2) Epistemic assumptions extend beyond the classic functional–structural lens (how the system behaves vs. how it works internally) to include analogical and anthropomorphic framings of AI systems.
(3) Elicitation methods are shaped more by system characteristics and community-specific practices than by theoretical commitments, particularly for predictive and explainable AI systems and for autonomous or driver-assist vehicles.
We derive nine practical guidelines to support more deliberate and reflective methods for eliciting mental models of AI systems. In doing so, we aim to reestablish continuity between the cognitive theory of mental models and their empirical use in HCI, improving the transparency and comparability of research surrounding the concept.
2026 | Téo Sanchez et al. | Ludwig Maximilian University of Munich | Explainable AI (XAI); AI-Assisted Decision-Making & Automation; Automated Driving Interface & Takeover Design | IUI
No Code, No Cloud: On-Device Mockup-to-Code with Lightweight Vision-Language AI
Bridging the gap between visual design and functional code remains a persistent challenge in modern UI workflows, especially for small teams and non-programmers. Existing solutions, such as Figma-to-code tools and recent vision language models (VLMs), often depend on proprietary cloud APIs or large-scale architectures, which limits offline operation, privacy, and control. We present LiteViT5, a lightweight, on-device vision language model that directly generates HTML from images of design mockups, enabling private, no-code prototyping without cloud infrastructure. Built on a compact ViT–T5 encoder–decoder framework with 235M parameters, LiteViT5 achieves competitive results on both in-distribution (WebSight) and out-of-distribution (Design2Code) benchmarks. We evaluate its performance across structure, position, color, and CLIP-based similarity metrics and report performance comparable to models 10–30× larger, such as PaliGemma-3B, LLaVA-7B, and DeepSeek-VL-7B. We further assess LiteViT5 in a user study with 24 participants that measures perceived accuracy, code quality, and editability. Our findings show that LiteViT5 supports rapid design iteration and reduces reliance on developer handoff, making it a practical, assistive tool for democratizing web interface creation. This work highlights the potential of efficient, human-centered generative AI to empower interface design beyond expert-only workflows. To support transparency and reproducibility, we release LiteViT5 as an open-source model on Hugging Face: OSTswiss/LiteViT5.
2026 | Abinas Kuganathan et al. | Institut für Interaktive Informatik | Generative AI (Text, Image, Music, Video); Creative Collaboration & Feedback Systems; AI-Assisted Writing & Text Generation | IUI
When Help Hurts: Verification Load and Fatigue with AI Coding Assistants
AI coding assistants help, but developers still spend effort verifying model output. We isolate interface effects by holding a single LLM fixed while N=60 participants solve three Python tasks with Inline, Chat, or Structured prompting, plus a no-AI control. AI reduced workload by 18.2 TLX points and time by 22% (25.0 vs. 32.1 min), and improved correctness (OR=1.71). Within the AI conditions, Inline is fastest and lowest-load on simple work; Chat yields higher correctness beyond a per-observation complexity threshold (z≈+0.41) without a time cost; and Structured benefits novices at mid complexity. We introduce a mode-agnostic verification-load index (failures, time-to-first-compile, churn, pauses, switches) that partially mediates rising stress and fatigue across tasks. We translate these findings into design guidance (adaptive mode orchestration, transparency on demand, and verification-aware packaging) and propose reporting verification load alongside outcomes to evaluate interfaces as models evolve.
2026 | Guangrui Fan et al. | Taiyuan University of Science and Technology | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Explainable AI (XAI) | CHI
Take the Power Back: Screen-Based Personal Moderation Against Hate Speech on Instagram
Hate speech remains a pressing challenge on social media, where platform moderation often fails to protect targeted users. Personal moderation tools that let users decide how content is filtered can address some of these shortcomings. However, it remains unclear on which screens (e.g., the comments, the reels tab, or the home feed) users want personal moderation and which features they value most. To address these gaps, we conducted a three-wave Delphi study with 40 activists who had experienced hate speech. We combined quantitative ratings and rankings with open questions about required features. Participants prioritized personal moderation for conversational and algorithmically curated screens. They valued features allowing for reversibility and oversight across all screens, whereas the value of input-based, content-type-specific, and highly automated features depended more on the screen. We discuss the importance of personal moderation and offer user-centered design recommendations for personal moderation on Instagram.
2026 | Anna Ricarda Luther et al. | ifib | Online Harassment & Counter-Tools; Social Platform Design & User Behavior; Participatory Design | CHI
Play/Destroy: A portfolio of sound destruction devices
Digital media operates on a curious boundary between storage and loss. While each new storage format promises a permanent solution to our exponentially expanding media libraries, each eventually fails or otherwise becomes unusable. This paper reflects on a long-term design process that attempts to bring a different paradigm to the experience of personal digital media: destruction. We present an annotated portfolio of a set of sound listening devices, critically unpacking the particular temporal, perceptual, and experiential qualities that emerge when designing for the loss of personal media. These annotations show how destruction comes to matter in designing against the traditional bias towards growth and accumulation in HCI.
2026 | Yann Seznec et al. | KTH Royal Institute of Technology | Digital Art Installations & Interactive Performance; Tangible User Interface Design; Empathy & Emotional Design | CHI
FlueBricks: A Construction Kit of Flute-like Instruments for Acoustic Reasoning
We present FlueBricks, a construction kit for acoustic reasoning via building and customizing flute-like instruments. By assembling generator, resonator, and connector modules that embody various aeroacoustic properties, users gain a deeper understanding, through hands-on experimentation, of how the blowhole, tube length, and tone-hole placement alter onset, pitch, and timbre. This forms a designer–player loop of configuring and playing to form, test, and refine acoustic behaviors (acoustic reasoning), shifting acoustic instruments from static artifacts to dynamic systems. To understand how users engage with this system, we conducted an exploratory study with 12 participants ranging from novices to professional musicians. During their explorations, we observed participants:
- fluently switching between designer and player roles;
- scaffolding designs from familiar instruments;
- forming and refining their acoustic understanding of length, tone holes, and generator geometry;
- reinterpreting modules beyond their intended functions;
- using their creations for performative acts such as pedagogical showing and musical expression.
Together, these observations demonstrate FlueBricks's potential as a pedagogical tool for embodied acoustic reasoning.
2026 | Bo-Yu Chen et al. | National Taiwan University | Digital Art Installations & Interactive Performance; Tangible Programming & Physical Computing | CHI
Privy: Envisioning and Mitigating Privacy Risks for Consumer-facing AI Product Concepts
AI creates and exacerbates privacy risks, yet practitioners lack effective resources to identify and mitigate these risks. We present Privy, a tool that guides practitioners without privacy expertise through structured privacy impact assessments to (i) identify relevant risks in novel AI product concepts and (ii) propose appropriate mitigations. Privy was shaped by a formative study with 11 practitioners, which informed two versions: one LLM-powered and one template-based. We evaluated these two versions of Privy through a between-subjects, controlled study with 24 separate practitioners, whose assessments were reviewed by 13 independent privacy experts. Results show that Privy helps practitioners produce privacy assessments that experts deemed high quality: practitioners identified relevant risks and proposed appropriate mitigation strategies. These effects were stronger with the LLM-powered version. Practitioners themselves rated Privy as useful and usable, and their feedback illustrates how it helps overcome long-standing awareness, motivation, and ability barriers in privacy work.
2026 | Hao-Ping (Hank) Lee et al. | Carnegie Mellon University | Explainable AI (XAI); Privacy by Design & User Control; Privacy Perception & Decision-Making | CHI
Laughing Through the Struggles: Understanding ADHD Experience and Community Engagement Through Memes and Comments on Instagram
While public discourse often reduces Attention-Deficit Hyperactivity Disorder (ADHD) to stereotypes that overlook the invisible struggles of those who live with it, ADHD people are increasingly using social media to express their experiences on their own terms. On platforms like Instagram, memes have become a powerful and accessible medium for expressing everyday challenges through humor and relatability. This study analyzed 350 ADHD-related memes and over 28,000 associated comments to explore how ADHD was expressed and engaged with in online spaces. It also drew on consultation with a researcher in neurodevelopmental science and clinical research. Findings show that memes depict behavioral inconsistencies, internal conflicts, and societal pressures, while comments reveal strong resonance, personal identification, and peer support, including informal self-diagnosis and shared experiences. By combining meme and comment analyses, this study contributes to digital mental health research by demonstrating how memes serve as an interactional mechanism for neurodivergent storytelling and identity formation, and by informing future platform design.
2026 | Fan Zhang et al. | Independent Researcher | Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia); Social Platform Design & User Behavior; Mental Health Apps & Online Support Communities | CHI
ViRAS: Design Artifacts to Explore Socio-Material Configurations through a Research-through-Design Approach in Robot-Assisted Surgery
Robot-assisted surgery (RAS) has raised concerns within human-computer interaction, particularly regarding the socio-material configurations of robotic systems and their impact on surgical practices. To investigate these configurations in the context of the da Vinci robotic system, we used a research-through-design approach. Through this approach, we developed six Visual RAS (ViRAS) scenarios derived from RAS observations. These scenarios represent different configurations of interacting surgical team members and the robotic system in distinct adverse RAS events. In ViRAS-guided interviews with experienced RAS surgeons, we found that ViRAS scenarios help reflect on surgical practices and the specific material and spatial properties of RAS. Our findings indicated that the team's cognitive engagement during surgery could be improved by providing sensory augmentation to facilitate task perception and individual skill development. Through our research, we show how ViRAS scenarios, as a tool for reflection, can reveal opportunities for designing socio-material configurations in RAS and beyond.
2026 | Peter Sörries et al. | Freie Universität Berlin | Surgical Assistance & Medical Training; Robots in Education & Healthcare; Prototyping & User Testing | CHI
SituFont: A Just-in-Time Adaptive Intervention Interface for Enhancing Mobile Readability in Situational Visual Impairments
Situational visual impairments (SVIs) hinder mobile readability, causing discomfort and limiting information access. Building on prior work in adaptive typography and accessibility, this paper presents SituFont, a context-aware, human-in-the-loop adaptive typography approach that enhances smartphone readability by dynamically adjusting font parameters in response to real-time contextual changes. Using smartphone sensors and user input, SituFont personalizes text presentation to accommodate personal factors (e.g., fatigue, distraction) and environmental conditions (e.g., lighting, motion, location). To inform its design, we conducted formative interviews (N=15) to identify key SVI factors and controlled experiments (N=18) to quantify their impact on optimal text parameters. A comparative user study (N=12) across eight simulated SVI scenarios showed that SituFont improved smartphone readability, with higher efficiency and lower workload than a non-trivial manual adjustment baseline.
2026 | Jingruo Chen et al. | Cornell University | Mobile Accessibility Design; Behavior Change & Reflection Technology; Context-Aware Computing | CHI
Giving Meaning to Movements: Challenges and Opportunities in Expanding Communication by Pairing Unaided AAC with Speech Generated Messages
Augmentative and Alternative Communication (AAC) technologies fall into two categories. Aided AAC uses external devices, such as speech-generating systems, to produce standardized output. Unaided AAC relies on body-based gestures for natural expression but requires shared understanding. We investigate how to combine these approaches to harness the speed and naturalness of unaided AAC while maintaining the intelligibility of aided AAC, a largely unexplored area for individuals with communication and motor impairments. Through 18 months of participatory design with AAC users, we identified key challenges and opportunities and developed AllyAAC, a wearable system with a wrist-worn IMU paired with a smartphone app. We evaluated AllyAAC in a field study with 14 participants and produced a dataset of over 600,000 multimodal data points featuring atypical gestures, the first of its kind. Our findings reveal challenges in recognizing personalized, idiosyncratic gestures and demonstrate how to address them using large Transformer-based machine learning (ML) models with different pretraining strategies. In sum, we contribute design principles and a reference implementation for adaptive, personalized systems that combine aided and unaided AAC.
2026 | Imran Kabir et al. | Pennsylvania State University | Electrical Muscle Stimulation (EMS); Haptic Wearables; Behavior Change & Reflection Technology | CHI
A Cantilevered DeltaXY Positioning Mechanism Enabling Rackable Digital Fabrication Form Factors
Desktop digital fabrication presumes form factors designed for workbenches, limiting suitability for other spaces and workflows. We propose a class of physically narrow and deep "rackable" digital fabrication machines that offers opportunities for new applications and interactions. Flexible and inconspicuous placement supports ubiquitous fabrication, including site- and context-specific tools. Shelf-optimized rackable digital fabrication technologies could enable personal factories that improve the organization and functionality of collections of machines. These explorations necessitate new positioning mechanisms and machine architectures. We contribute the Cantilevered DeltaXY mechanism, which enables rackable digital fabrication form factors with high lateral spatial efficiency (LSE). We develop first-order design tools to aid the implementation of DeltaXY machines. We demonstrate DeltaXY by creating Fab Unit, a "bookshelf 3D printer" with an LSE significantly higher than that of similar commercial desktop machines. Together, DeltaXY and Fab Unit open the design space of rackable digital fabrication for future HCI fabrication research.
2026 | Ilan E Moyer et al. | Massachusetts Institute of Technology | Desktop 3D Printing & Personal Fabrication; Customizable & Personalized Objects; Circuit Making & Hardware Prototyping | CHI
TableTale: Reviving the Narrative Interplay Between Tables and Text in Scientific Papers
Data tables play a central role in scientific papers. However, their meaning is often co-constructed with the surrounding text through narrative interplay, making comprehension cognitively demanding for readers. In this work, we explore how interfaces can better support this reading process. We conducted a formative study that revealed key characteristics of text–table narrative interplay, including linking mechanisms, multi-granularity alignments, and mention typologies, as well as a layered framework of readers' intents. Informed by these insights, we present TableTale, an augmented reading interface that links text to data tables at multiple granularities: paragraphs, sentences, and mentions. TableTale automatically constructs a document-level linking schema within the paper and progressively renders cascading visual cues on text and tables that unfold as readers move through the text. A within-subject study with 24 participants showed that TableTale reduced cognitive workload and improved reading efficiency, demonstrating its potential to enhance paper reading and inform future reading interface design.
2026 | Liangwei Wang et al. | The Hong Kong University of Science and Technology (Guangzhou) | Interactive Data Visualization; Data Storytelling; Visualization Perception & Cognition | CHI
No Spirituality Please, We’re HCI: Challenges for HCI Research on Religion and Spirituality
Religion and spirituality (R/S) shape billions of lives, yet they remain marginal in Human–Computer Interaction (HCI) research. Prior literature reviews mapped fragments of this space but missed key contributions and the lived realities of its researchers. We extend this picture through a review of 206 ACM and IEEE publications and a survey of R/S scholars in HCI (n=19). Our analysis shows a field in transition: research on R/S is growing slightly in volume and diversity, with design-oriented work emerging as the dominant form of engagement. Yet the ACM and IEEE corpora remain largely separate, reflecting distinct epistemic traditions. Researchers report persistent challenges, including marginalization, which expose a deeper tension in HCI: while HCI claims to center the full range of human experience, R/S experience is still treated with suspicion. Our findings call for a reconsideration: if HCI is serious about human experience, it must take R/S experiences seriously as well.
2026 | Sara Wolf et al. | Julius-Maximilians-Universität Würzburg | Technology Ethics & Critical HCI; Developing Countries & HCI for Development (HCI4D); Gender & Race Issues in HCI | CHI
Engaging Communities Meaningfully in Defining Disability Representation for AI Image Generation
Media representations of people with disabilities profoundly influence societal perceptions, yet they have historically been absent, stereotyped, or inaccurate. As AI-generated visual media becomes increasingly prevalent, there is a critical opportunity to address these misrepresentations. Responding to the lack of collectively negotiated representation standards, this paper presents our human-centric approach to engaging disability communities meaningfully in AI data practices. Over three months, we worked closely with three disability organizations across the Global North and South to develop the Community Library Creator. This tool introduces design scaffolds that support communities in defining "good" representation and curating community-centric AI datasets, laying the foundations for community-specific evaluation metrics and future model adaptations. We contribute qualitative insights into the complexities of community-led data curation, discuss the value and practical challenges of intersecting human insights with AI requirements, and reflect on human-centered AI approaches that empower communities to share their perspectives and actively shape AI data practices.
2026 | Anja Thieme et al. | Microsoft Research | AI Ethics, Fairness & Accountability; Algorithmic Fairness & Bias; Developing Countries & HCI for Development (HCI4D) | CHI
Beyond Content Exposure: Systemic Factors Driving Moderators' Mental Health Crisis in Africa
Content moderators review disturbing content to protect social media users, often at significant cost to their mental health. Recent reports describe the mental health of African moderators as notably poor. Beyond the content itself, what factors contribute to the deteriorating mental health of these workers? We surveyed 134 moderators across Africa to understand their mental health and interviewed 15 moderators to contextualize their experiences. We found that African moderators suffer from higher psychological distress and lower well-being than moderators in other regions. Former moderators showed significantly higher distress levels, indicating long-term effects that persist beyond their moderation work. Our interviews showed that systemic and structural labor conditions contribute to moderators' severe psychological distress and diminished mental well-being. Corporate wellness programs promoted by platforms were found to be ineffective and inadequate. We argue that improving moderators' mental health requires holistic attention and structural solutions from all involved parties.
2026 | Nuredin Ali et al. | University of Minnesota | Cyberbullying & Online Harassment; Mid-Air Haptics (Ultrasonic) | CHI
My Favorite Streamer is an LLM: Discovering, Bonding, and Co-Creating in AI VTuber Fandom
AI VTubers, whose performers are algorithmically generated rather than human, introduce a new context for fandom. While human VTubers have been studied extensively for their cultural appeal, parasocial dynamics, and community economies, little is known about how audiences engage with their AI counterparts. To address this gap, we present a qualitative study of Neuro-sama, the most prominent AI VTuber. Our findings show that engagement is anchored in active co-creation. Audiences are drawn in by the AI's unpredictable yet entertaining interactions. They cement their loyalty through collective emotional events that trigger anthropomorphic projection. They sustain attachment through the AI's consistent persona. Financial support emerges not as a reward for performance but as a participatory mechanism for shaping livestream content, establishing a resilient fan economy built on ongoing interaction. These dynamics reveal how AI VTuber fandom reshapes fan–creator relationships and offer implications for designing transparent and sustainable AI-mediated communities.
2026 | Jiayi Ye et al. | Independent Researcher | Intelligent Voice Assistants (Alexa, Siri, etc.); Agent Personality & Anthropomorphism; Live Streaming & Content Creators | CHI
From Vulnerable to Resilient: Examining Parent and Teen Perceptions on How to Respond to Unwanted Cybergrooming Advances
Cybergrooming is a form of online abuse that threatens teens' mental health and physical safety. Yet most prior work has focused on detecting perpetrators' behaviors, leaving limited understanding of how teens might respond to such unwanted advances. To address this gap, we conducted an online survey with 74 participants (51 parents and 23 teens). Participants responded to simulated cybergrooming scenarios in two ways: with responses they thought would make teens more vulnerable to unwanted sexual advances, and with responses they thought would make teens more resilient to them. Through a mixed-methods analysis, we identified four types of vulnerable responses (encouraging escalation, accepting an advance, displaying vulnerability, and negating risk concern) and four types of protective strategies (setting boundaries, directly declining, signaling risk awareness, and leveraging avoidance techniques). As the cybergrooming risk escalated, both vulnerable responses and protective strategies showed a corresponding progression. This study contributes a teen-centered understanding of cybergrooming, a labeled dataset, and a stage-based taxonomy of perceived protective strategies, while offering implications for educational programs and sociotechnical interventions.
2026 | Xinyi Zhang et al. | Virginia Tech | Youth Online Safety & Privacy; Digital Parenting & Screen Time Management; Mental Health Technology for Youth | CHI