UI Remix: Supporting UI Design Through Interactive Example Retrieval and Remixing
Designing user interfaces (UIs) is a critical step when launching products, building portfolios, or personalizing projects, yet end users without design expertise often struggle to articulate their intent and to trust design choices. Existing example-based tools either promote broad exploration, which can cause overwhelm and design drift, or require adapting a single example, risking design fixation. We present UI Remix, an interactive system that supports mobile UI design through an example-driven design workflow. Powered by a multimodal retrieval-augmented generation (MMRAG) model, UI Remix enables iterative search, selection, and adaptation of examples at both the global (whole interface) and local (component) level. To foster trust, it presents source transparency cues such as ratings, download counts, and developer information. In an empirical study with 24 end users, UI Remix significantly improved participants' ability to achieve their design goals, facilitated effective iteration, and encouraged exploration of alternative designs. Participants also reported that source transparency cues enhanced their confidence in adapting examples. Our findings suggest new directions for AI-assisted, example-driven systems that empower end users to design with greater control, trust, and openness to exploration.
2026 | Junling Wang et al. | Department of Computer Science, ETH AI Center | Prototyping & User Testing; Generative AI (Text, Image, Music, Video); Human-LLM Collaboration | IUI

StepMIND: A Visual Framework for Stepwise, Multimodal, and Bidirectional Explanations of AI-Generated Data Analysis Pipeline
Artificial intelligence (AI) enables users to generate data visualizations from natural language descriptions, lowering the barrier to data exploration. However, AI-generated visualizations often present only the final output, lacking transparency and limiting users' ability to verify, interpret, or refine the results. To address this, we introduce StepMIND, a generalizable visual framework that enhances explainability and interactivity in AI-generated data analysis pipelines. StepMIND integrates four dimensions: (1) Stepwise Refinement, allowing users to engage in the AI decision process; (2) Multimodal Explanations, combining natural language, structured notation, direct manipulation, and content visualization for accessible interpretation; (3) Bidirectional Editing, enabling seamless updates across modalities; and (4) Familiar Interaction Models, such as code editor and spreadsheet-based manipulations, to support both technical and non-technical users. To demonstrate its utility, we apply StepMIND in \stage, a case study system for AI-assisted data visualization. A within-subject user study (N=20) shows that \stage significantly improves user confidence and trust, reduces cognitive load, and facilitates both exploratory and corrective refinements. Our findings further suggest that StepMIND can generalize to broader AI-assisted workflows, offering a visible and interactive approach to explainable AI.
2026 | Yang Wu et al. | ETH Zurich | Explainable AI (XAI); Interactive Data Visualization; AI-Assisted Decision-Making & Automation | IUI

AI and My Values: User Perceptions of LLMs’ Ability to Extract, Embody, and Explain Human Values from Casual Conversations
Does AI understand human values? While this remains an open philosophical question, we take a pragmatic stance by introducing VAPT, the Value-Alignment Perception Toolkit, for studying how LLMs reflect people's values and how people judge those reflections. 20 participants texted a chatbot over a month, then completed a 2-hour interview with our toolkit evaluating AI's ability to extract (pull details regarding), embody (make decisions guided by), and explain (provide proof of) their values. 13 participants ultimately left our study convinced that AI can understand human values. Thus, we warn about "weaponized empathy": a design pattern that may arise in interactions with value-aware, yet welfare-misaligned conversational agents. VAPT offers a new way to evaluate value-alignment in AI systems. We also offer design implications to evaluate and responsibly build AI systems with transparency and safeguards as AI capabilities grow more inscrutable, ubiquitous, and posthuman into the future.
2026 | Bhada Yun et al. | ETH Zürich | Human-LLM Collaboration; Explainable AI (XAI); AI Ethics, Fairness & Accountability | CHI

Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration
Intelligent systems have traditionally been designed as tools rather than collaborators, often lacking critical characteristics that collaboration partnerships require. Recent advances in large language model (LLM) agents open new opportunities for human-LLM-agent collaboration by enabling natural communication and various social and cognitive behaviors. Yet it remains unclear whether principles of computer-mediated collaboration established in HCI and CSCW persist, change, or fail when humans collaborate with LLM agents. To support systematic investigations of these questions, we introduce an open and configurable research platform for HCI researchers. The platform's modular design allows seamless adaptation of classic CSCW experiments and manipulation of theory-grounded interaction controls. We demonstrate the platform's research efficacy and usability through three case studies: (1) two Shape Factory experiments for resource negotiation with 16 participants, (2) one Hidden Profile experiment for information pooling with 16 participants, and (3) a participatory cognitive walkthrough with five HCI researchers to refine the workflows of the researcher interface for experiment setup and analysis.
2026 | Bingsheng Yao et al. | Northeastern University | Human-LLM Collaboration; Participatory Design; Prototyping & User Testing | CHI

Point & Grasp: Flexible Selection of Out-of-Reach Objects Through Probabilistic Cue Integration
Selecting out-of-reach objects is a fundamental task in mixed reality (MR). Existing methods rely on a single cue or deterministically fuse multiple cues, leading to performance degradation when the dominant cue becomes unreliable. In this work, we introduce a probabilistic cue integration framework that enables flexible combination of multiple user-generated cues for intent inference. Inspired by natural grasping behavior, we instantiate the framework with pointing direction and grasp gestures as a new interaction technique, Point&Grasp. To this end, we collect the \datasetfullname (\dataset) dataset to train a robust likelihood model of the gestural cue, which captures grasping patterns not present in existing in-reach datasets. User studies demonstrate that our selection method with cue integration not only improves accuracy and speed over single-cue baselines, but also remains practically effective compared to state-of-the-art methods across various sources of ambiguity. The dataset and code are available at https://github.com/drlxj/point-and-grasp.
2026 | Xuejing Luo et al. | Aalto University | Full-Body Interaction & Embodied Input; Mixed Reality Workspaces; Physical-Digital Hybrid Interaction | CHI

Exploring the Impacts and Challenges of Vibe Coding Paradigm to Children's Programming Learning and Practices
Recent advances in generative AI have introduced a new programming paradigm: vibe coding, a natural language–driven mode of AI collaboration. While promising for adults, little is known about how children engage with this approach, especially in block-based environments. To explore this gap, we conducted workshops with children of varying Scratch experience (n=41) and interviewed five Scratch teachers. Our study investigates how vibe coding impacts children’s programming learning and practice, and what challenges arise. Findings show that vibe coding has both positive and negative impacts across three key contexts of children’s programming experience: acquisition, application, and creation. Across the stages of vibe coding (goal articulation, information interpretation, and outcome evaluation), children encounter distinct challenges. By examining the mismatches between core assumptions of vibe coding and children’s needs, and analyzing its applicability across different contexts, we offer child-centered design implications for future vibe coding systems and GenAI tools.
2026 | Janice Jianing SI et al. | University of Macau | Programming Education & Computational Thinking; Children's AI Literacy & Data Literacy; Human-LLM Collaboration | CHI

DiLLS: Interactive Diagnosis of LLM-based Multi-agent Systems via Layered Summary of Agent Behaviors
Large language model (LLM)-based multi-agent systems have demonstrated impressive capabilities in handling complex tasks. However, the complexity of agentic behaviors makes these systems difficult to understand. When failures occur, developers often struggle to identify root causes and to determine actionable paths for improvement. Traditional methods that rely on inspecting raw log records are inefficient, given both the large volume and complexity of data. To address this challenge, we propose a framework and an interactive system, DiLLS, designed to reveal and structure the behaviors of multi-agent systems. The key idea is to organize information across three levels of query completion: activities, actions, and operations. By probing the multi-agent system through natural language, DiLLS derives and organizes information about planning and execution into a structured, multi-layered summary. Through a user study, we show that DiLLS significantly improves developers’ effectiveness and efficiency in identifying, diagnosing, and understanding failures in LLM-based multi-agent systems.
2026 | Rui Sheng et al. | The Hong Kong University of Science and Technology | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Explainable AI (XAI) | CHI

PleaSQLarify: Visual Pragmatic Repair for Natural Language Database Querying
Natural language database interfaces broaden data access, yet they remain brittle under input ambiguity. Standard approaches often collapse uncertainty into a single query, offering little support for mismatches between user intent and system interpretation. We reframe this challenge through pragmatic inference: while users economize expressions, systems operate on priors over the action space that may not align with the users'. In this view, pragmatic repair (incremental clarification through minimal interaction) is a natural strategy for resolving underspecification. We present PleaSQLarify, which operationalizes pragmatic repair by structuring interaction around interpretable decision variables that enable efficient clarification. A visual interface complements this by surfacing the action space for exploration, requesting user disambiguation, and making belief updates traceable across turns. In a study with twelve participants, PleaSQLarify helped users recognize alternative interpretations and efficiently resolve ambiguity. Our findings highlight pragmatic repair as a design principle that fosters effective user control in natural language interfaces.
2026 | Robin Shing Moon Chan et al. | ETH Zürich | Human-LLM Collaboration; Explainable AI (XAI); Interactive Data Visualization | CHI

GenRole: Personalizing Role Play for Educators Supporting Autistic Students’ Social Interaction Learning
Role-play is widely used to empower autistic children to explore social interaction and dynamics on their own terms, navigating neurotypical social conventions to shape social expressions in ways that align with their own traits and needs, fostering a stronger sense of agency. However, existing approaches typically rely on fixed content, requiring educators to design materials, which creates a significant burden of manual preparation. Drawing on insights from a formative study, we developed GenRole, a generative AI system that enables educators to design personalized role-play class activities. GenRole supports a progression from simple to complex interactions and allows for personalization of characters, settings, and dialogues that meet the needs of autistic learners. We conducted a pilot study with 16 educators, followed by a two-week evaluation study with 11 autistic children and their teachers. Results show that GenRole enhances the efficiency and flexibility of role-play design while improving instructional support, offering design insights for creating personalized components that help educators deliver more engaging and individualized social interaction learning for autistic children.
2026 | Yixuan Li et al. | Hong Kong University of Science and Technology (Guangzhou) | Special Education Technology; Generative AI (Text, Image, Music, Video); Cognitive Impairment & Neurodiversity (Autism, ADHD, Dyslexia) | CHI

Beyond Precision: Understanding the Impact of Algorithmic Accuracy and Transparency on User Perceptions in Keyword-Driven Contextual Advertising
Algorithms frequently manage online advertising markets, aligning advertisements with article topics. Our work investigates how users perceive the relevance of ads to articles when ads are placed using different keyword extraction algorithms, including Large Language Models (LLMs), and how transparency about the placement procedure influences these perceptions and behavioral intentions. We conducted an online user experiment (N = 498) where ads are matched with news articles using the keyword extraction methods TF-IDF, KeyBERT, and DeepSeek. Results indicate that lightweight methods can match advanced LLMs in delivering high user-perceived ad-article relevance, which in turn fosters click and purchase intentions. However, providing explanations for the ad-article placements by displaying extracted keywords reduces ad interest and thereby weakens behavioral intentions, while simultaneously increasing perceived relevance and moderating algorithm effects. These findings highlight the complex impact of transparency-increasing explanations and suggest that algorithmic precision metrics must be complemented by user perception and intention measures.
2026 | Jingwen Cai et al. | Umeå University | Explainable AI (XAI); AI-Assisted Decision-Making & Automation; Algorithmic Transparency & Auditability | CHI

CoMap: A Collaborative 3D Sketch Mapping Game to Engage Spatial Communication in Search and Rescue
Search and rescue (SAR) is a complex teamwork environment that requires efficient spatial communication between commanders and field teams with heterogeneous perspectives and asymmetric information. Maps are central artifacts in SAR, yet they are also a space of technological tension due to the constantly changing situation at disaster sites. Sketch mapping is an effective method of externalizing and communicating spatial understanding, increasing situation awareness in spatial decision-making tasks including SAR. Current paper-based sketch mapping in SAR struggles to handle the three-dimensional nature of physical space and remote collaboration. We propose CoMap, a collaborative 3D sketch mapping system validated in a virtual reality fire-rescue game. In a within-subject study with 13 commander–field team pairs, CoMap enabled more accurate and efficient spatial communication than conventional 2D sketch mapping. Communication analysis further showed that CoMap fostered proactive descriptions. We distill three design implications for next-generation mapping tools to advance SAR training and real-world operations.
2026 | Tianyi Xiao et al. | Institute of Cartography and Geoinformation, ETH Zurich | Volunteer Coordination & Crowdsourced Disaster Relief; Post-Disaster Community Recovery Technology; Social & Collaborative VR | CHI

Git Takes Two: Split-View Awareness for Collaborative Learning of Distributed Workflows in Git
Git is widely used for collaborative software development, but it can be challenging for newcomers. While most learning tools focus on individual workflows, Git is inherently collaborative. We present GitAcademy, a browser-based learning platform that embeds a full Git environment with a split-view collaborative mode: learners work on their own local repositories connected to a shared remote repository, while simultaneously seeing their partner's actions mirrored in real time. This design is not intended for everyday software development, but rather as a training simulator to build awareness of distributed states, coordination, and collaborative troubleshooting. In a within-subjects study with 13 pairs of learners, we found that the split-view interface enhanced social presence, supported peer teaching, and was consistently preferred over a single-view baseline, even though performance gains were mixed. We further discuss how split-view awareness can serve as a training-only scaffold for collaborative learning of Git and other distributed technical systems.
2026 | Joel Bucher et al. | ETH Zürich | Collaborative Learning & Peer Teaching; Distributed Team Collaboration; Crowdsourcing Task Design & Quality Control | CHI

Automating UI Optimization through Multi-Agentic Reasoning
We present AutoOptimization, a novel multi-objective optimization framework for adapting user interfaces. From a user’s verbal preferences for changing a UI, our framework guides a prioritization-based Pareto frontier search over candidate layouts. It selects suitable objective functions for UI placement while simultaneously parameterizing them according to the user's instructions to define the optimization problem. A solver then generates a series of optimal UI layouts, which our framework validates against the user's instructions to adapt the UI with the final solution. Our approach thus overcomes the previous need for manual inspection of layouts and the use of population averages for objective parameters. We integrate a Vision-Language Model into our framework whose reasoning capabilities allow us to focus on the Pareto optimization, prioritize results, and validate outcomes. We evaluate each step of our framework inside a Mixed Reality use case and demonstrate that AutoOptimization effectively increases the usability of UI adaptation schemes.
2026 | Zhipeng Li et al. | ETH Zürich | Human-LLM Collaboration; Mixed Reality Workspaces; Prototyping & User Testing | CHI

Efficient Human-in-the-Loop Optimization via Priors Learned from User Models
Human-in-the-loop optimization identifies optimal interface designs by iteratively observing user performance. However, it often requires numerous iterations due to the lack of prior information. While recent approaches have accelerated this process by leveraging previous optimization data, collecting user data remains costly and often impractical. We present a conceptual framework, Human-in-the-Loop Optimization with Model-Informed Priors (HOMI), which augments human-in-the-loop optimization with a training phase where the optimizer learns adaptation strategies from diverse, synthetic user data generated with predictive models before deployment. To realize HOMI, we introduce Neural Acquisition Function+ (NAF+), a Bayesian optimization method featuring a neural acquisition function trained with reinforcement learning. NAF+ learns optimization strategies from large-scale synthetic data, improving efficiency in real-time optimization with users. We evaluate HOMI and NAF+ with mid-air keyboard optimization, a representative VR input task. Our work presents a new approach for more efficient interface adaptation by bridging in situ and in silico optimization processes.
2026 | Yi-Chi Liao et al. | ETH Zürich | Mid-Air Haptics (Ultrasonic); Hand Gesture Recognition; Immersion & Presence Research | CHI

"Bespoke Bots'': Diverse Instructor Needs for Customizing Generative AI Classroom ChatbotsInstructors are increasingly experimenting with AI chatbots for classroom support. To investigate how instructors adapt chatbots to their own contexts, we first analyzed existing resources that provide prompts for educational purposes. We identified ten common categories of customization, such as persona, guardrails, and personalization. We then conducted interviews with ten university STEM instructors and asked them to card-sort the categories into priorities. We found that instructors consistently prioritized the ability to customize chatbot behavior to align with course materials and pedagogical strategies and de-prioritized customizing persona/tone. However, their prioritization of other categories varied significantly by course size, discipline, and teaching style, even across courses taught by the same individual, highlighting that no single design can meet all contexts. These findings suggest that modular AI chatbots may provide a promising path forward. We offer design implications for educational developers building the next generation of customizable classroom AI systems.2026IHIrene Hou et al.University of California, San DiegoHuman-LLM CollaborationIntelligent Tutoring Systems & Learning AnalyticsConversational ChatbotsCHI
From Junior to Senior: Allocating Agency and Navigating Professional Growth in Agentic AI-Mediated Software Engineering
Juniors enter as AI-natives; seniors adapted mid-career. AI is not just changing how engineers code; it is reshaping who holds agency across work and professional growth. We contribute junior–senior accounts of their usage of agentic AI through a three-phase mixed-methods study: ACTA combined with a Delphi process with 5 seniors, an AI-assisted debugging task with 10 juniors, and blind reviews of junior prompt histories by 5 more seniors. We found that agency in software engineering is primarily constrained by organizational policies rather than individual preferences, with experienced developers maintaining control through detailed delegation while novices struggle between over-reliance and cautious avoidance. Seniors leverage pre-AI foundational instincts to steer modern tools and possess valuable perspectives for mentoring juniors in their early AI-encouraged career development. From a synthesis of the results, we suggest three practices that focus on preserving agency in software engineering for coding, learning, and mentorship, especially as AI grows increasingly autonomous.
2026 | Dana Feng et al. | None | Human-LLM Collaboration; AI-Assisted Decision-Making & Automation; Generative AI (Text, Image, Music, Video) | CHI

The Elephant in the Syntax: A Comparative Study of Semantics-First, Block-Based, and Textual Programming
Syntax remains a major barrier for novices. Although block-based systems reduce or eliminate syntax errors, conditionals still challenge learners, likely because their semantics remain implicit. In this paper, we address this problem by introducing a semantics-first, state-visible programming approach inspired by the classic visual language Stagecast Creator. To demonstrate its usefulness, we designed Elephant, a unified, Karel-like research platform that supports three equally expressive programming paradigms: (i) semantics-first programming, (ii) block-based programming with the Blockly library, and (iii) text-based programming in JavaScript with domain-specific libraries. We then deployed Elephant in two within-subjects studies with secondary-school students (N = 39) to compare semantics-first programming to textual and block-based baselines, keeping the program semantics constant across modes and reducing cross-tool confounds. Results indicate, among other things, that semantics-first programming yields significantly higher task performance, suggesting that increasing the visibility of the program state during program composition could support greater outcomes in secondary computing education.
2026 | Theo B. Weidmann et al. | ETH Zurich | Programming Education & Computational Thinking; K-12 Digital Education Tools | CHI

Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction
As AI chatbots shift from tools to companions, a critical question arises: who controls the conversation in human-AI chatrooms? This paper explores perceived human and AI agency in sustained conversation. We report a month-long longitudinal study with 22 adults who chatted with "Day", an LLM companion we built, followed by a semi-structured interview with post-hoc elicitation of notable moments, cross-participant chat reviews, and a 'strategy reveal' disclosing "Day's" goal for each conversation. We find that agency manifests as an emergent, shared experience: as participants set boundaries and the AI steered intentions, control was co-constructed turn-by-turn. We introduce a 3-by-4 framework mapping actors (Human, AI, Hybrid) by their action (Intention, Execution, Adaptation, Delimitation), modulated by individual and environmental factors. We argue for translucent design (transparency-on-demand) and provide implications for agency self-aware conversational agents.
2026 | Bhada Yun et al. | ETH Zürich | Agent Personality & Anthropomorphism; Human-LLM Collaboration; AI-Assisted Decision-Making & Automation | CHI

Preference-Guided Prompt Optimization for Text-to-Image Generation
Generative models are increasingly powerful, yet users struggle to guide them through prompts. The generative process is difficult to control and unpredictable, and user instructions may be ambiguous or under-specified. Prior prompt refinement tools rely heavily on human effort, while prompt optimization methods focus on numerical functions and are not designed for human-centered generative tasks, where feedback is better expressed as binary preferences and convergence is needed within a few iterations. We present APPO, a preference-guided prompt optimization algorithm. Instead of iterating on prompts, users only provide binary preferential feedback. APPO adaptively balances its strategies between exploiting user feedback and exploring new directions, yielding effective and efficient optimization. We evaluate APPO on image generation, and the results show that APPO enables users to achieve satisfactory outcomes in fewer iterations and with lower cognitive load than manual prompt editing. We anticipate APPO will advance human-AI collaboration in generative tasks by leveraging user preferences to guide complex content creation.
2026 | Zhipeng Li et al. | ETH Zürich | Generative AI (Text, Image, Music, Video); Human-LLM Collaboration; Creative Collaboration & Feedback Systems | CHI

Computer Science Achievement and Writing Skills Predict Vibe Coding Proficiency
Many software development platforms now support LLM-driven programming, or “vibe coding”, a technique that allows one to specify programs in natural language and iterate from observed behavior, all without directly editing source code. While its adoption is accelerating, little is known about which skills best predict success in this workflow. We report a preregistered cross-sectional study with tertiary-level students (N = 100) who completed measures of computer-science achievement, domain-general cognitive skills, written-communication proficiency, and a vibe-coding assessment. Tasks were curated via an eight-expert consensus process and executed in a purpose-built, vibe-coding environment that mirrors commercial tools while enabling controlled evaluation. We find that both writing skill and CS achievement are significant predictors of vibe-coding performance, and that CS achievement remains a significant predictor after controlling for domain-general cognitive skills. The results may inform tool and curriculum design, including when to emphasize prompt-writing versus CS fundamentals to support future software creators.
2026 | Sverrir Thorgeirsson et al. | ETH Zurich | Human-LLM Collaboration; User Research Methods (Interviews, Surveys, Observation); Prototyping & User Testing | CHI