Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study
Large Language Models (LLMs) are increasingly developed for use in complex professional domains, yet little is known about how teams design and evaluate these systems in practice. This paper examines the challenges and trade-offs in LLM development through a 12-week ethnographic study of a team building a pedagogical chatbot. The researcher observed design and evaluation activities and conducted interviews with both developers and domain experts. Analysis revealed four key practices: creating workarounds for data collection, turning to augmentation when expert input was limited, co-developing evaluation criteria with experts, and adopting hybrid expert–developer–LLM evaluation strategies. These practices show how teams made strategic decisions under constraints and demonstrate the central role of domain expertise in shaping the system. Challenges included expert motivation and trust, difficulties structuring participatory design, and questions around ownership and integration of expert knowledge. We propose design opportunities for future LLM development workflows that emphasize AI literacy, transparent consent, and frameworks recognizing evolving expert roles.
2026 | Annalisa Szymanski et al. | University of Notre Dame | Human-LLM Collaboration; Participatory Design; User Research Methods (Interviews, Surveys, Observation) | IUI
Gazeify Then Voiceify: Physical Object Referencing Through Gaze and Voice Interaction with Displayless Smart Glasses
Smart glasses enhance interactions with the environment by using head-mounted cameras to observe the user’s viewpoint, but lack the visual feedback used for common interactions. We introduce "Gazeify then Voiceify", a multimodal approach allowing object selection via gaze and voice using displayless smart glasses. Users can select a physical object with their gaze, and the system generates a digital mask and a voice description of the object's semantics. Users can further correct errors through free-form conversation. To demonstrate our approach, we develop an interactive system by integrating advanced object segmentation and detection with a visual-language model. User studies reveal that participants achieve correct gaze selection in 53% of the task trials and use voice disambiguation to correct 58% of the remaining errors. Participants also rated the system as likable, useful, and easy to use.
2026 | Zheng Zhang et al. | University of Notre Dame | Eye Tracking & Gaze Interaction; Voice User Interface (VUI) Design; Context-Aware Computing | IUI
The Behavioral Fabric of LLM-Powered GUI Agents: Human Values and Interaction Outcomes
Large Language Model (LLM)-powered web GUI agents are increasingly automating everyday online tasks. Despite their popularity, little is known about how users' preferences and values impact agents' reasoning and behavior. In this work, we investigate how both explicit and implicit user preferences, as well as the underlying user values, influence agent decision-making and action trajectories. We built a controlled testbed of 14 common interactive web tasks, spanning shopping, travel, dining, and housing, each replicated from real websites and integrated with a low-fidelity LLM-based recommender system. We injected 12 human preferences and values as personas into four state-of-the-art agents and systematically analyzed their task behaviors. Our results show that preference- and value-infused prompts consistently guided agents toward outcomes that reflected these preferences and values. While the absence of user preference or value guidance led agents to exhibit a strong efficiency bias and employ shortest-path strategies, their presence steered agents' behavior trajectories through the greater use of corresponding filters and interactive web features. Despite their influence, dominant interface cues, such as discounts and advertisements, frequently overrode these effects, shortening the agents' action trajectories and inducing rationalizations that masked rather than reflected value-consistent reasoning. The contributions of this paper are twofold: (1) an open-source testbed for studying the influence of values in agent behaviors, and (2) an empirical investigation of how user preferences and values shape web agent behaviors.
2026 | Simret Araya Gebreegziabher et al. | University of Notre Dame | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; AI Ethics, Fairness & Accountability | IUI
Narrative Scaffolding: A Narrative-First Framework for Data-Driven Sensemaking
When exploring data, analysts construct narratives about what the data means by asking questions, generating visualizations, reflecting on patterns, and revising their interpretations as new insights emerge. Yet existing analysis tools treat narrative as an afterthought, breaking the link between reasoning, reflection, and the evolving story from exploration. Consequently, analysts lose the ability to see how their reasoning evolves, making it harder to reflect systematically or build coherent explanations. To address this gap, we propose Narrative Scaffolding (NS), a framework for narrative-driven exploration that positions narrative construction as the primary interface for exploration and reasoning. We implemented this framework in a system that externalizes iterative reasoning through narrative-first entry, semantically aligned view generation, and reflection support via insight provenance and inquiry tracking. In a within-subject study (N=20), we demonstrated that narrative scaffolding facilitates broader exploration, deeper reflection, and more defensible narratives. An evaluation with visualization literacy experts (N=6) confirmed that the system produced outputs aligned with narrative intent and facilitated intentional exploration.
2026 | Oliver Huang et al. | University of Toronto | Interactive Data Visualization; Data Storytelling; Visualization Perception & Cognition | IUI
Crepe: A Mobile Screen Data Collector Using Graph Query
Collecting mobile screen information datasets remains challenging for academic researchers. Commercial organizations often have exclusive access to mobile data, leading to a “data monopoly” that restricts academic research and user transparency. Existing open-source mobile data collection frameworks primarily focus on mobile sensing data rather than screen content. We present Crepe, a no-code Android app that enables researchers to collect information displayed on screen through simple demonstrations of target data. Crepe utilizes a novel Graph Query technique, which augments mobile UI structures to support flexible identification, location, and collection of specific data pieces. The tool emphasizes participants' privacy and agency by providing full transparency over collected data and allowing easy opt-out. We designed and built Crepe for research purposes only and in scenarios where researchers obtain explicit consent from participants. Code for Crepe will be open-sourced to support future academic research data collection.
2026 | Yuwen Lu et al. | University of Notre Dame | User Research Methods (Interviews, Surveys, Observation); Computational Methods in HCI; Research Ethics & Open Science | CHI
Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight
Dark patterns, deceptive interface designs that manipulate user behavior, have been extensively studied for their effects on human decision-making and autonomy. Yet, with the rising prominence of LLM-powered GUI agents that automate tasks from high-level intents, understanding how dark patterns affect agents is increasingly important. We present a two-phase empirical study examining how agents, human participants, and human-AI teams respond to 16 types of dark patterns across diverse scenarios. Phase 1 showed that agents often fail to recognize dark patterns and, even when aware, prioritize task completion over protective action. Phase 2 revealed divergent failure modes: humans succumb due to cognitive shortcuts and habitual compliance, while agents falter from procedural blind spots. Human oversight improved avoidance but introduced costs such as attentional tunneling and cognitive load. Our findings show neither humans nor agents are uniformly resilient, and collaboration introduces new vulnerabilities, suggesting design needs for transparency, adjustable autonomy, and oversight.
2026 | Jingyu Tang et al. | University of Notre Dame | Dark Patterns Recognition; Human-LLM Collaboration; AI-Assisted Decision-Making & Automation | CHI
Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration
Intelligent systems have traditionally been designed as tools rather than collaborators, often lacking critical characteristics that collaborative partnerships require. Recent advances in large language model (LLM) agents open new opportunities for human-LLM-agent collaboration by enabling natural communication and various social and cognitive behaviors. Yet it remains unclear whether principles of computer-mediated collaboration established in HCI and CSCW persist, change, or fail when humans collaborate with LLM agents. To support systematic investigations of these questions, we introduce an open and configurable research platform for HCI researchers. The platform's modular design allows seamless adaptation of classic CSCW experiments and manipulation of theory-grounded interaction controls. We demonstrate the platform's research efficacy and usability through three case studies: (1) two Shape Factory experiments for resource negotiation with 16 participants, (2) one Hidden Profile experiment for information pooling with 16 participants, and (3) a participatory cognitive walkthrough with five HCI researchers to refine workflows of the researcher interface for experiment setup and analysis.
2026 | Bingsheng Yao et al. | Northeastern University | Human-LLM Collaboration; Participatory Design; Prototyping & User Testing | CHI
Hidden Labor behind the Hype: Understanding AI Side Hustles through Platform Narratives and Worker Practices
AI side hustles are increasingly promoted on social media as accessible, empowering, and profitable opportunities. This paper examines the gap between such platform narratives and workers' lived experiences through a mixed-method study of 7,938 RedNote posts and 16 semi-structured interviews. Our analysis identifies monetization typologies and rhetorical strategies that portray AI work as simple and rewarding, while interview data reveal hidden labor, unstable income, and the devaluation of human contributions. By juxtaposing platform narratives with lived experiences, we show how these narratives structurally foreground ease and reward while downplaying the precarity embedded in actual AI work. This study contributes a critical account of how AI side hustles are framed and experienced, and offers design implications for HCI: platforms should moderate promotional content and provide clearer risk communication, while designers of human–AI collaboration tools should highlight and value human input rather than allowing it to remain invisible.
2026 | Xiaoyu YANG et al. | The Hong Kong University of Science and Technology (Guangzhou) | AI Ethics, Fairness & Accountability; AI-Assisted Decision-Making & Automation; Participatory Design | CHI
DiLLS: Interactive Diagnosis of LLM-based Multi-agent Systems via Layered Summary of Agent Behaviors
Large language model (LLM)-based multi-agent systems have demonstrated impressive capabilities in handling complex tasks. However, the complexity of agentic behaviors makes these systems difficult to understand. When failures occur, developers often struggle to identify root causes and to determine actionable paths for improvement. Traditional methods that rely on inspecting raw log records are inefficient, given both the large volume and complexity of data. To address this challenge, we propose a framework and an interactive system, DiLLS, designed to reveal and structure the behaviors of multi-agent systems. The key idea is to organize information across three levels of query completion: activities, actions, and operations. By probing the multi-agent system through natural language, DiLLS derives and organizes information about planning and execution into a structured, multi-layered summary. Through a user study, we show that DiLLS significantly improves developers’ effectiveness and efficiency in identifying, diagnosing, and understanding failures in LLM-based multi-agent systems.
2026 | Rui Sheng et al. | The Hong Kong University of Science and Technology | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Explainable AI (XAI) | CHI
Audience in the Loop: Viewer Feedback-Driven Content Creation in Micro-drama Production on Social Media
The popularization of social media has led to increasing consumption of narrative content in bite-sized formats. Such micro-dramas contain fast-paced action and emotional cliffs, and are particularly attractive to emerging Chinese markets on platforms like Douyin and Kuaishou. Content writers for micro-dramas must adapt to fast-paced, audience-directed workflows, but previous research has focused on writers’ experiences of platform affordances or their perceptions of platform bias, rather than the step-by-step processes through which they actually write and iterate on content. In 28 semi-structured interviews with scriptwriters and writers specialized in micro-dramas, we found that the short-turnaround workflow leads writers to take on multiple roles simultaneously, iteratively adapting storylines in response to real-time audience feedback in the form of comments, reposts, and memes. We identified unique narrative styles such as AI-generated micro-dramas and audience-responsive micro-dramas. This work reveals audience interaction as a new paradigm for collaborative creative processes on social media.
2026 | Gengchen Cao et al. | Tsinghua - Anta Joint Research Center | Creative Collaboration & Feedback Systems; Social Platform Design & User Behavior; Live Streaming & Content Creators | CHI
Homeroom: A Value-Aligned and Community-Centered Homeschooling Platform
We present Homeroom, a homeschooling platform that treats parents as reflective partners in collaboration with LLMs, integrates culturally responsive personalization for generating schooling materials, and supports the formation of small, trusted circles. Homeroom provides plan-then-generate story and curriculum creation, alignment, and comparison to local school standards, and resource sharing in invite-only groups. We conducted a summative usability study with 15 Muslim homeschooling parents in the Greater Toronto Area. Findings show that previewable, editable drafts preserve parental agency; values work best as revisable "soft constraints" integrated into the platform; and parents prefer private circles with clear lineage. Parents also requested lightweight infrastructure (e.g., rubric libraries, portfolio builders) to reduce paperwork. We discuss opportunities and challenges in positioning AI as a deliberative partner in family- and community-shaped pedagogy.
2026 | Mohammad Rashidujjaman Rifat et al. | University of Notre Dame | Human-LLM Collaboration; Inclusive Design; Collaborative Learning & Peer Teaching | CHI
Designing Staged Evaluation Workflows for LLMs: Integrating Domain Experts, Lay Users, and Model-Generated Evaluation Criteria
Large Language Models (LLMs) are increasingly utilized for domain-specific tasks, yet evaluating their outputs remains challenging. A common strategy is to apply evaluation criteria to assess alignment with domain-specific standards, yet little is understood about how criteria differ across sources or where each type is most useful in the evaluation process. This study investigates criteria developed by domain experts, lay users, and LLMs to identify their complementary roles within an evaluation workflow. Results show that experts produce fact-based criteria with long-term value, lay users emphasize usability with a shorter-term focus, and LLMs target procedural checks for immediate task requirements. We also examine how criteria evolve between a priori and a posteriori phases, noting drift across stages as well as convergence in the a posteriori phase. Based on our observations, we propose design guidelines for a staged evaluation workflow combining the complementary strengths of these sources to balance quality, cost, and scalability.
2026 | Annalisa Szymanski et al. | University of Notre Dame | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; User Research Methods (Interviews, Surveys, Observation) | CHI
My Favorite Streamer is an LLM: Discovering, Bonding, and Co-Creating in AI VTuber Fandom
AI VTubers, where the performer is not human but algorithmically generated, introduce a new context for fandom. While human VTubers have been substantially studied for their cultural appeal, parasocial dynamics, and community economies, little is known about how audiences engage with their AI counterparts. To address this gap, we present a qualitative study of Neuro-sama, the most prominent AI VTuber. Our findings show that engagement is anchored in active co-creation: audiences are drawn by the AI's unpredictable yet entertaining interactions, cement loyalty through collective emotional events that trigger anthropomorphic projection, and sustain attachment via the AI's consistent persona. Financial support emerges not as a reward for performance but as a participatory mechanism for shaping livestream content, establishing a resilient fan economy built on ongoing interaction. These dynamics reveal how AI VTuber fandom reshapes fan–creator relationships and offer implications for designing transparent and sustainable AI-mediated communities.
2026 | Jiayi Ye et al. | Independent Researcher | Intelligent Voice Assistants (Alexa, Siri, etc.); Agent Personality & Anthropomorphism; Live Streaming & Content Creators | CHI
Nonvisual Support for Understanding and Reasoning about Data Structures
Blind and visually impaired (BVI) computer science students face systematic barriers when learning data structures: current accessibility approaches typically translate diagrams into alternative text, focusing on visual appearance rather than preserving the underlying structure essential for conceptual understanding. More accessible alternatives often do not scale in complexity, cost to produce, or both. Motivated by a recent shift to tools for creating visual diagrams from code, we propose a solution that automatically creates accessible representations from structural information about diagrams. Based on a Wizard-of-Oz study, we derive design requirements for an automated system, Arboretum, that compiles text-based diagram specifications into three synchronized nonvisual formats: tabular, navigable, and tactile. Our evaluation with BVI users highlights the strength of tactile graphics for complex tasks such as binary search; the benefits of offering multiple, complementary nonvisual representations; and limitations of existing digital navigation patterns for structural reasoning. This work reframes access to data structures by preserving their structural properties. The solution is a practical system to advance accessible CS education.
2026 | Brianna L Wimer et al. | University of Notre Dame | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille); Motor Impairment Assistive Input Technologies; Special Education Technology | CHI
Whose Data Builds the City? Critical Data Practices for Socio-Environmentally Just Urbanization
Data-driven systems, such as satellite imagery, often dictate urban planning; however, they frequently neglect local, situated, and embodied knowledges. This paper examines the epistemic, political, and socio-ecological frictions that surface when city data (e.g., aerial imagery and administrative records) is brought into dialogue with community data (e.g., lived experiences and shared epistemologies) to guide more equitable urban planning. We employ Research through Design, complemented by ethnographic inquiry and auto-ethnographic reflection, to create a speculative probe that helps foreground the frictions embedded in urban data infrastructures in Bangalore, India. Our analysis reveals the limitations of dominant top-down urban data systems, which routinely obscure socio-ecological dependencies and selectively define what constitutes legitimate urban knowledge. We employ environmental justice as an analytical lens to analyze our findings, highlighting how urban data infrastructures can reproduce or contest inequalities and identify opportunities to foreground care, accountability, and equity, particularly in postcolonial contexts, toward cultivating socially just and climate-resilient urban futures.
2026 | Vishal Sharma et al. | University of Notre Dame | Smart Cities & Urban Sensing; Urban Sustainability; Algorithmic Fairness & Bias | CHI
Overcoming Translation Delays: Towards Better Subtitle Design for Foreign Language Conversations in Extended Reality
In multilingual conferences, translation support should not compromise non‑verbal cues or social interaction. Prior work on eXtended Reality (XR) subtitles aids comprehension but rarely examines translation latency. We conducted a VR-simulated conference, testing latencies of 0, 1.5, 3, 4.5, and 6 seconds to measure overall comprehension and attribution of verbal and non‑verbal information. Results showed that latencies beyond 3 seconds significantly increased subjective difficulty and affected accuracy, while shorter latencies showed no significant effects. Furthermore, participants noted that very low delay drew attention to subtitles, reducing opportunities to observe the speaker. Guided by these insights, we designed and evaluated four VR subtitle interfaces, including one traditional and three novel designs. Across delay conditions, Merged Subtitles improved opportunities to observe the speaker and resulted in better emotion attribution and user experience than other designs. We also proposed design guidelines for XR subtitle interfaces based on different levels of translation latency.
2026 | Ziming Li et al. | Hong Kong University of Science and Technology (GZ) | Multilingual & Cross-Cultural Voice Interaction; AR Navigation & Context Awareness; Immersion & Presence Research | CHI
Exploring Creator-Centric Methods for LLM-Assisted Interactive Storytelling
While large language models (LLMs) are increasingly applied in creative domains, their role in supporting interactive storytelling tailored to creators’ needs remains underexplored. This thesis adopts a creator-centered perspective to examine how LLMs can assist in building interactive narratives, focusing on multi-layered structure editing, automated analysis, target user feedback, and the preservation of authorial control. A multi-stage design was employed: interviews with sixteen creators identified five key design goals, which informed the development of CoNoder, a prototype integrating node-graph editing, dual interaction modes and generation styles, ripple-effect analysis, and simulated feedback. Evaluation results show that CoNoder improves creative efficiency, supports morally complex storytelling, and provides structured narrative feedback, though onboarding, expert guidance, and finer control remain areas for improvement. Overall, this research contributes a creator-focused framework and a practical system design approach, highlighting the need for future tools that balance expressive freedom with creative sovereignty.
2026 | Yuelu Li et al. | The Hong Kong University of Science and Technology (Guangzhou) | Human-LLM Collaboration; AI-Assisted Creative Writing; Interactive Narrative & Immersive Storytelling | CHI
Balancing Goals, Health, and Cost: A Food Information System for Managing Complex Choices and Fostering Sustained Food Agency
Technology offers new opportunities to support healthier food choices, particularly for individuals in low-income communities who face systemic barriers to obtaining nutritious, affordable groceries. We introduce a novel conceptual model of grocery planning that frames food purchasing as a multi-objective optimization problem that considers cost, nutrition components, and a consumer's personal dietary goals. Guided by Zimmerman’s model of Self-Regulated Learning and prior research on food agency, we designed the Food Information System, a planning tool that provides optimized product recommendations aligned with users’ goals by integrating store inventory, prices, and nutritional data. We evaluated our system in an eight-week within-subjects intervention with 55 participants from a food-insecure community, followed by focus group sessions. While overall Healthy Eating Index scores remained largely stable, participants reported improved nutritional awareness and greater perceived agency in planning and purchasing groceries. We discuss design implications to support food agency by promoting long-term food literacy and by enhancing autonomy in making food choices.
2026 | Annalisa Szymanski et al. | University of Notre Dame | Diet Tracking & Nutrition Management; Behavior Change & Reflection Technology; Data-Driven Personal Decision-Making | CHI
Vistoria: A Multimodal System to Support Fictional Story Writing through Instrumental Image-Text Co-Editing
Humans think visually: we remember in images, dream in pictures, and use visual metaphors to communicate. Yet most creative writing tools remain text-centric, limiting how writers plan and translate ideas. We present Vistoria, a system for synchronized image-text co-editing in fictional story writing. A formative Wizard-of-Oz co-design study with 10 story writers revealed how sketches, images, and text serve as essential elements for ideation and organization. Drawing on theories of Instrumental Interaction, Vistoria introduces instrumental operations (Lasso, Collage, Perspective Shift, and Filter) that enable seamless narrative exploration across modalities. A controlled study with 12 participants shows that co-editing enhances expressiveness, immersion, and collaboration, opening space for writers to follow divergent story directions and craft more vivid, detailed narratives. While multimodality increased cognitive demand, participants reported stronger senses of ownership and agency. These findings demonstrate how multimodal co-editing expands creative potential by balancing abstraction and concreteness in narrative development.
2026 | Kexue Fu et al. | City University of Hong Kong | AI-Assisted Creative Writing; Creative Collaboration & Feedback Systems | CHI
CodeStream: Augmenting Timelines with Code Annotation for Navigating Large Coding Histories
Code edit histories can offer instructors valuable insight into students’ problem-solving processes, revealing unproductive behaviors that final code alone cannot capture. For example, a correct solution may contain large copy-and-pasted segments (suggesting the code originated elsewhere) or unguided trial-and-error (suggesting a lack of clear strategy). Timelines are a common way to visualize code histories, but existing timeline visualizations of code or document histories show only when and where edits occurred, not what changed. Without this context, it is difficult to answer key questions about how students invested effort or to infer their intentions. We present CodeStream, a visualization system that augments timelines with situational code annotations, whose granularity and visibility dynamically adapt to scale and interaction state. A comparison study shows that CodeStream enables context-aware navigation of coding histories, supporting fast and accurate pattern identification, and helping instructors reason about students’ coding behaviors and identify who may need intervention.
2026 | Ashley Ge Zhang et al. | University of Michigan, Ann Arbor | Interactive Data Visualization; Collaborative Writing Tools; Programming Education & Computational Thinking | CHI