Finding the Signal in the Noise: An Exploratory Study on Assessing the Effectiveness of AI and Accessibility Forums for Blind Users’ Support NeedsAccessibility forums and, more recently, generative AI tools have become vital resources for blind users seeking solutions to computer-interaction issues and learning about new assistive technologies, screen reader features, tutorials, and software updates. Understanding user experiences with these resources is essential for identifying and addressing persistent support gaps. Towards this, we interviewed 14 blind users who regularly engage with forums and GenAI tools. Findings revealed that forums often overwhelm users with multiple overlapping topics, redundant or irrelevant content, and fragmented responses that must be mentally pieced together, increasing cognitive load. GenAI tools, while offering more direct assistance, introduce new barriers by producing unreliable answers, including overly verbose or fragmented guidance, fabricated information, and contradictory suggestions that fail to follow prompts, thereby heightening verification demands. Based on these insights, we outlined design opportunities to improve the reliability of assistive resources, aiming to provide blind users with more trustworthy and cognitively-manageable support.2026SKSatwik Ram Kodandaram et al.Stony Brook UniversityVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Generative AI (Text, Image, Music, Video)Explainable AI (XAI)CHI
KeySense: LLM-Powered Hands-Down, Ten-Finger Typing on Commodity TouchscreensExisting touchscreen software keyboards prevent users from resting their hands, forcing slow and fatiguing index-finger tapping (“chicken typing”) instead of familiar hands-down ten-finger typing. We present KeySense, a purely software solution that preserves physical keyboard motor skills. KeySense isolates intentional taps from resting-finger noise with cognitive–motor timing patterns, and then uses a fine-tuned LLM decoder to turn the resulting noisy letter sequence into the intended word. In controlled component tests, this decoder substantially outperforms 2 statistical baselines (top-1 accuracy 84.8% vs 75.7% and 79.3%). A 12-participant study shows clear ergonomic and performance benefits: compared with the conventional hover-style keyboard, users rated KeySense as markedly less physically demanding (NASA-TLX median 1.5 vs 4.0), and after brief practice, typed significantly faster (WPM 28.3 vs 26.2, p <0.01). These results indicate that KeySense enables accurate, efficient and comfortable ten-finger text entry on commodity touchscreens, without any extra hardware.2026TLTony Li et al.Stony Brook UniversitySoft Keyboard & Virtual Keyboard DesignLanguage Model-Assisted Text InputCHI
Lost in Instructions: Study of Blind Users’ Experiences with DIY Manuals and AI-Rewritten Instructions for Assembly, Operation, and Troubleshooting of Tangible ProductsAI tools like ChatGPT and Be-My-AI are increasingly being used by blind individuals. Although prior work has explored their use in some Do-It-Yourself (DIY) tasks by blind individuals, little is known about how they use these tools and the available product-manual resources to assemble, operate, and troubleshoot physical/tangible products – tasks requiring spatial reasoning, structural understanding, and precise execution. We address this knowledge gap via an interview study and a usability study with blind participants, investigating how they leverage AI tools and product manuals for DIY tasks with physical products. Findings show that manuals are essential resources, but product-manual instructions are often inadequate for blind users. AI tools presently do not adequately address this insufficiency; in fact, we observed that they often exacerbate it with incomplete, incoherent, or misleading guidance. Lastly, we suggest improvements to AI tools for generating tailored instructions for blind users’ DIY tasks involving tangible products.2026MRMonalika Padma Reddy et al.Stony Brook UniversityVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Generative AI (Text, Image, Music, Video)AI-Assisted Decision-Making & AutomationCHI
Enabling Auto-Correction on Soft Braille KeyboardA soft Braille keyboard is a graphical representation of the Braille writing system on smartphones. It provides an essential text input method for visually impaired individuals, but accuracy and efficiency remain significant challenges. We present an intelligent Braille keyboard with auto-correction ability, which uses optimal transportation theory to estimate the distances between touch input and Braille patterns, and combines it with a language model to estimate the probability of entering words. The proposed system was evaluated through both simulations and user studies. In a touch interaction simulation on an Android phone and an iPhone, our intelligent Braille keyboard demonstrated superior error correction performance compared to the Android Braille keyboard with proofreading suggestions and the iPhone Braille keyboard with spelling suggestions. It reduced the error rate from 55.81% on Android and 57.13% on iPhone to 19.80% under high typing noise. Furthermore, in a user study of 12 participants who are legally blind, the intelligent Braille keyboard reduced word error rate (WER) by 59.5% (42.53% to 17.28%) with a slight drop of 0.74 words per minute (WPM), compared to a conventional Braille keyboard without auto-correction. These findings suggest that our approach has the potential to greatly improve the typing experience for Braille users on touchscreen devices.2025DZDan Zhang et al.Voice AccessibilityVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Motor Impairment Assistive Input TechnologiesUIST
Tap&Say: Touch Location-Informed Large Language Model for Multimodal Text Correction on SmartphonesWhile voice input offers a convenient alternative to traditional text editing on mobile devices, practical implementations face two key challenges: 1) reliably distinguishing between editing commands and content dictation, and 2) effortlessly pinpointing the intended edit location. We propose Tap&Say, a novel multimodal system that combines touch interactions with Large Language Models (LLMs) for accurate text correction. By tapping near an error, users signal their edit intent and location, addressing both challenges. Then, the user speaks the correction text. Tap&Say utilizes the touch location, voice input, and existing text to generate contextually relevant correction suggestions. We propose a novel touch location-informed attention layer that integrates the tap location into the LLM's attention mechanism, enabling it to utilize the tap location for text correction. We fine-tuned the touch location-informed LLM on synthetic touch locations and correction commands, achieving significantly higher correction accuracy than the state-of-the-art method VT. A 16-person user study demonstrated that Tap&Say outperforms VT with 16.4% shorter task completion time and 47.5% fewer keyboard clicks and is preferred by users.2025MZMaozheng Zhao et al.Stony Brook University, Department of Computer ScienceHuman-LLM CollaborationCHI
SpellRing: Recognizing Continuous Fingerspelling in American Sign Language using a RingFingerspelling is a critical part of American Sign Language (ASL) recognition and has become an accessible optional text entry method for Deaf and Hard of Hearing (DHH) individuals. In this paper, we introduce SpellRing, a single smart ring worn on the thumb that recognizes words continuously fingerspelled in ASL. SpellRing uses active acoustic sensing (via a microphone and speaker) and an inertial measurement unit (IMU) to track handshape and movement, which are processed through a deep learning algorithm using Connectionist Temporal Classification (CTC) loss. We evaluated the system with 20 ASL signers (13 fluent and 7 learners), using the MacKenzie-Soukoreff Phrase Set of 1,164 words and 100 phrases. Offline evaluation yielded top-1 and top-5 word recognition accuracies of 82.45% (±9.67%) and 92.42% (±5.70%), respectively. In real-time, the system achieved a word error rate (WER) of 0.099 (±0.039) on the phrases. Based on these results, we discuss key lessons and design implications for future minimally obtrusive ASL recognition wearables.2025HLHyunchul Lim et al.Cornell, Computing and Information ScienceFoot & Wrist InteractionVoice AccessibilityMotor Impairment Assistive Input TechnologiesCHI
LLM Powered Text Entry Decoding and Flexible Typing on SmartphonesLarge language models (LLMs) have shown exceptional performance in various language-related tasks. However, their application in keyboard decoding, which involves converting input signals (e.g. taps and gestures) into text, remains underexplored. This paper presents a fine-tuned FLAN-T5 model for decoding. It achieves 93.1% top-1 accuracy on user-drawn gestures, outperforming the widely adopted SHARK2 decoder, and 95.4% on real-world tap typing data. In particular, our decoder supports Flexible Typing, allowing users to enter a word with taps, gestures, multi-stroke gestures, and tap-gesture combinations. User study results show that Flexible Typing is beneficial and well-received by participants, where 35.9% of words were entered using word gestures, 29.0% with taps, 6.1% with multi-stroke gestures, and the remaining 29.0% using tap-gestures. Our investigation suggests that the LLM-based decoder improves decoding accuracy over existing word gesture decoders while enabling the Flexible Typing method, which enhances the overall typing experience and accommodates diverse user preferences.2025YMYan Ma et al.Stony Brook University, Computer Science DepartmentEV Charging & Eco-Driving InterfacesHuman-LLM CollaborationCHI
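Model Touch Pointing and Detect Parkinson's Disease via a Mobile GameLing et al. develop a mobile game-based method for modeling touch pointing. By analyzing the characteristics of players' touch behavior during the game, the method supports early, auxiliary detection of Parkinson's disease and offers a new avenue for disease screening.2024KLKaiyan Ling et al.Motor Impairment Assistive Input TechnologiesSerious & Functional GamesUbiComp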
Accessible Gesture Typing on Smartphones for People with Low VisionWhile gesture typing is widely adopted on touchscreen keyboards, its support for low vision users is limited. We have designed and implemented two keyboard prototypes, layout-magnified and key-magnified keyboards, to enable gesture typing for people with low vision. Both keyboards facilitate uninterrupted access to all keys while the screen magnifier is active, allowing people with low vision to input text with one continuous stroke. Furthermore, we have created a kinematics-based decoding algorithm to accommodate the typing behavior of people with low vision. This algorithm can decode the gesture input even if the gesture trace deviates from a pre-defined word template, and the starting position of the gesture is far from the starting letter of the target word. Our user study showed that the key-magnified keyboard achieved 5.28 words per minute, 27.5% faster than a conventional gesture typing keyboard with voice feedback.2024DZDan Zhang et al.Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)Motor Impairment Assistive Input TechnologiesUIST
Hand Gesture Recognition for Blind Users by Tracking 3D Gesture TrajectoryHand gestures provide an alternate interaction modality for blind users and can be supported using commodity smartwatches without requiring specialized sensors. The enabling technology is an accurate gesture recognition algorithm, but almost all algorithms are designed for sighted users. Our study shows that blind user gestures are considerably different from sighted users, rendering current recognition algorithms unsuitable. Blind user gestures have high inter-user variance, making learning gesture patterns difficult without large-scale training data. Instead, we design a gesture recognition algorithm that works on a 3D representation of the gesture trajectory, capturing motion in free space. Our insight is to extract a micro-movement in the gesture that is user-invariant and use this micro-movement for gesture classification. To this end, we develop an ensemble classifier that combines image classification with geometric properties of the gesture. Our evaluation demonstrates a 92% classification accuracy, surpassing the next best state-of-the-art which has an accuracy of 82%.2024PKPrerna Khanna et al.Stony Brook UniversityHand Gesture RecognitionVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)CHI
TouchType-GAN: Modeling Touch Typing with Generative Adversarial NetworkModels that can generate touch typing tasks are important to the development of touch typing keyboards. We propose TouchType-GAN, a Conditional Generative Adversarial Network that can simulate locations and time stamps of touch points in touch typing. TouchType-GAN takes arbitrary text as input to generate realistic touch typing both spatially (i.e., (x, y) coordinates of touch points) and temporally (i.e., timestamps of touch points). TouchType-GAN introduces a variational generator that estimates Gaussian distributions for every target letter to prevent mode collapse. Our experiments on a dataset with 3k typed sentences show that TouchType-GAN outperforms existing touch typing models, including the Rotational Dual Gaussian model for simulating the distribution of touch points, and the Finger-Fitts Euclidean Model for simulating typing time. Overall, our research demonstrates that the proposed GAN structure can learn the distribution of user typed touch points, and the resulting TouchType-GAN can also estimate typing movements. TouchType-GAN can serve as a valuable tool for designing and evaluating touch typing input systems.2023JCJeremy Chu et al.Force Feedback & Pseudo-Haptic WeightHuman-LLM CollaborationUIST
Modeling Touch-based Menu Selection Performance of Blind Users via Reinforcement LearningAlthough menu selection has been extensively studied in HCI, most existing studies have focused on sighted users, leaving blind users' menu selection under-studied. In this paper, we propose a computational model that can simulate blind users’ menu selection performance and strategies, including the way they use techniques like swiping, gliding, and direct touch. We assume that selection behavior emerges as an adaptation to the user's memory of item positions based on experience and feedback from the screen reader. A key aspect of our model is a model of long-term memory, predicting how a user recalls and forgets item position based on previous menu selections. We compare simulation results predicted by our model against data obtained in an empirical study with ten blind users. The model correctly simulated the effect of the menu length and menu arrangement on selection time, the action composition, and the menu selection strategy of the users.2023ZLZhi Li et al.Stony Brook UniversityVisual Impairment Technologies (Screen Readers, Tactile Graphics, Braille)CHI
WordGesture-GAN: Modeling Word-Gesture Movement with Generative Adversarial NetworkWord-gesture production models that can synthesize word-gestures are critical to the training and evaluation of word-gesture keyboard decoders. We propose WordGesture-GAN, a conditional generative adversarial network that takes arbitrary text as input to generate realistic word-gesture movements in both spatial (i.e., (x, y) coordinates of touch points) and temporal (i.e., timestamps of touch points) dimensions. WordGesture-GAN introduces a Variational Auto-Encoder to extract and embed variations of user-drawn gestures into a Gaussian distribution which can be sampled to control variation in generated gestures. Our experiments on a dataset with 38k gesture samples show that WordGesture-GAN outperforms existing gesture production models including the minimum jerk model [37] and the style-transfer GAN [31,32] in generating realistic gestures. Overall, our research demonstrates that the proposed GAN structure can learn variations in user-drawn gestures, and the resulting WordGesture-GAN can generate word-gesture movement and predict the distribution of gestures. WordGesture-GAN can serve as a valuable tool for designing and evaluating gestural input systems.2023JCJeremy Chu et al.Stony Brook UniversityHand Gesture RecognitionHuman-LLM CollaborationCreative Coding & Computational ArtCHI
GlanceWriter: Writing Text by Glancing Over Letters with GazeWriting text with eye gaze only is an appealing hands-free text entry method. However, existing gaze-based text entry methods introduce eye fatigue and are slow in typing speed because they often require users to dwell on letters of a word, or mark the starting and ending positions of a gaze path with extra operations for entering a word. In this paper, we propose GlanceWriter, a text entry method that allows users to enter text by glancing over keys one by one without any need to dwell on any keys or specify the starting and ending positions of a gaze path when typing a word. To achieve this, GlanceWriter probabilistically determines the letters to be typed based on the dynamics of gaze movements and gaze locations. Our user studies demonstrate that GlanceWriter significantly improves the text entry performance over EyeSwipe, a dwell-free input method using "reverse crossing" to identify the starting and ending keys. GlanceWriter also outperforms the dwell-free gaze input method of Tobii's Communicator 5, a commercial eye gaze-based communication system. Overall, GlanceWriter achieves dwell-free and crossing-free text entry by probabilistically decoding gaze paths, offering a promising gaze-based text entry method.2023WCWenzhe Cui et al.Stony Brook UniversityEye Tracking & Gaze InteractionCHI
Phrase-Gesture Typing on SmartphonesWe study phrase-gesture typing, a gesture typing method that allows users to type short phrases by swiping through all the letters of the words in a phrase using a single, continuous gesture. Unlike word-gesture typing, where text needs to be entered word by word, phrase-gesture typing enters text phrase by phrase. To demonstrate the usability of phrase-gesture typing, we implemented a prototype called PhraseSwipe. Our system is composed of a frontend interface designed specifically for typing through phrases and a backend phrase-level gesture decoder developed based on a transformer-based neural language model. Our decoder was trained using five million phrases of varying lengths of up to five words, chosen randomly from the Yelp Review Dataset. Through a user study with 12 participants, we demonstrate that participants could type using PhraseSwipe at an average speed of 34.5 WPM with a Word Error Rate of 1.1%.2022ZXZheer Xu et al.Voice User Interface (VUI) DesignGenerative AI (Text, Image, Music, Video)UIST
Bayesian Hierarchical Pointing ModelsBayesian hierarchical models are probabilistic models that have hierarchical structures and use Bayesian methods for inferences. In this paper, we extend Fitts' law to be a Bayesian hierarchical pointing model and compare it with the typical pooled pointing models (i.e., treating all observations as the same pool), and the individual pointing models (i.e., building an individual model for each user separately). The Bayesian hierarchical pointing models outperform pooled and individual pointing models in predicting the distribution and the mean of pointing movement time, especially when the training data are sparse. Our investigation also shows that both noninformative and weakly informative priors are adequate for modeling pointing actions, although the weakly informative prior performs slightly better than the noninformative prior when the training data size is small. Overall, we conclude that the expected advantages of Bayesian hierarchical models hold for the pointing tasks. Bayesian hierarchical modeling should be adopted as a more principled and effective approach to building pointing models than the current common practices in HCI, which use pooled or individual models.2022HZHANG ZHAO et al.Visualization Perception & CognitionComputational Methods in HCIUIST
EyeSayCorrect: Eye Gaze and Voice Based Hands-free Text Correction for Mobile DevicesText correction on mobile devices usually requires precise and repetitive manual control. In this paper, we present EyeSayCorrect, an eye gaze and voice based hands-free text correction method for mobile devices. To correct text with EyeSayCorrect, the user first utilizes the gaze point on the screen to select a word, then speaks the new phrase. EyeSayCorrect would then infer the user's correction intention based on the inputs and the text context. EyeSayCorrect can accommodate ambiguities and noisy input signals. We used a Bayesian approach for determining the selected word given an eye-gaze trajectory. Given each sampling point in an eye-gaze trajectory, the posterior probability of selecting a word is calculated and accumulated. The target word would be selected when its accumulated interest is larger than a threshold. The misspelled words have higher priors. Our evaluation showed that EyeSayCorrect can correct text with promising performance. The mean +/- 95% CI of the task completion time (in seconds) with priors is 11.63 +/- 1.07 for large font size (28 pt) and 11.57 +/- 1.14 for small font size (14 pt). Using priors for misspelled words reduced the task-completion time of large text by 9.26% and small text by 23.79%, and it reduced the text-selecting time of large text by 23.49% and small text by 40.35%. The subjective ratings are also in favor of the method with priors for misspelled words. Overall, EyeSayCorrect utilizes the advantages of eye gaze and voice input, making hands-free text correction available and efficient on mobile devices.2022MZMaozheng Zhao et al.Eye Tracking & Gaze InteractionVoice User Interface (VUI) DesignIUI
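The word-selection step described in the EyeSayCorrect abstract (a per-sample posterior that is accumulated until it crosses a threshold, with higher priors for misspelled words) can be illustrated with a minimal sketch. This is not the authors' implementation: the isotropic Gaussian likelihood, the `sigma` and `threshold` values, and the names `select_word`, `words`, and `priors` are all assumptions made for illustration.

```python
import math


def select_word(gaze_points, words, priors, sigma=40.0, threshold=3.0):
    """Return the first word whose accumulated posterior exceeds threshold.

    gaze_points: iterable of (x, y) gaze samples in screen pixels.
    words: dict mapping word -> (x, y) center of the word on screen.
    priors: dict mapping word -> prior probability (e.g. higher if misspelled).
    sigma, threshold: illustrative values, not taken from the paper.
    """
    accumulated = {w: 0.0 for w in words}
    for gx, gy in gaze_points:
        # Unnormalized posterior: prior times an assumed Gaussian likelihood.
        scores = {}
        for w, (cx, cy) in words.items():
            d2 = (gx - cx) ** 2 + (gy - cy) ** 2
            scores[w] = priors[w] * math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(scores.values())
        if total == 0.0:
            continue  # gaze sample too far from every word to be informative
        for w in words:
            accumulated[w] += scores[w] / total  # normalize, then accumulate
        best = max(accumulated, key=accumulated.get)
        if accumulated[best] > threshold:
            return best
    return None  # no word reached the threshold


words = {"teh": (100.0, 100.0), "cat": (200.0, 100.0)}
priors = {"teh": 0.7, "cat": 0.3}  # hypothetical: "teh" is flagged as misspelled
print(select_word([(150.0, 100.0)] * 10, words, priors))
```

In this usage example the gaze sits midway between the two words, so the likelihoods are equal and the prior alone decides the outcome, which is the situation where a higher prior for misspelled words would be expected to shorten selection time.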
Automatically Generating and Improving Voice Command Interface from Operation Sequences on SmartphonesUsing voice commands to automate smartphone tasks (e.g., making a video call) can effectively augment the interactivity of numerous mobile apps. However, creating voice command interfaces requires a tremendous amount of effort in labeling and compiling the graphical user interface (GUI) and the utterance data. In this paper, we propose AutoVCI, a novel approach to automatically generate voice command interface (VCI) from smartphone operation sequences. The generated voice command interface has two distinct features. First, it automatically maps a voice command to GUI operations and fills in parameters accordingly, leveraging the GUI data instead of corpus or hand-written rules. Second, it launches a complementary Q&A dialogue to confirm the intention in case of ambiguity. In addition, the generated voice command interface can learn and evolve from user interactions. It accumulates the history command understanding results to annotate the user’s input and improve its semantic understanding ability. We implemented this approach on Android devices and conducted a two-phase user study with 16 and 67 participants in each phase. Experimental results of the study demonstrated the practical feasibility of AutoVCI.2022LPLihang Pan et al.Tsinghua University, Tsinghua UniversityVoice User Interface (VUI) DesignHuman-LLM CollaborationCHI
Select or Suggest? Reinforcement Learning-based Method for High-Accuracy Target Selection on TouchscreensSuggesting multiple target candidates based on touch input is a possible option for high-accuracy target selection on small touchscreen devices. But it can become overwhelming if suggestions are triggered too often. To address this, we propose SATS, a Suggestion-based Accurate Target Selection method, where target selection is formulated as a sequential decision problem. The objective is to maximize the utility: the negative time cost for the entire target selection procedure. The SATS decision process is dictated by a policy generated using reinforcement learning. It automatically decides when to provide suggestions and when to directly select the target. Our user studies show that SATS reduced error rate and selection time over Shift, a magnification-based method, and MUCS, a suggestion-based alternative that optimizes the utility for the current selection. SATS also significantly reduced error rate over BayesianCommand, which directly selects targets based on posteriors, with only a minor increase in selection time.2022ZLZhi Li et al.Stony Brook UniversityHand Gesture RecognitionHuman-LLM CollaborationCHI
Modeling Touch Point Distribution with Rotational Dual Gaussian ModelTouch point distribution models are important tools for designing touchscreen interfaces. In this paper, we investigate how the finger movement direction affects the touch point distribution, and how to account for it in modeling. We propose the Rotational Dual Gaussian model, a refinement and generalization of the Dual Gaussian model, to account for the finger movement direction in predicting touch point distribution. In this model, the major axis of the prediction ellipse of the touch point distribution is along the finger movement direction, and the minor axis is perpendicular to the finger movement direction. We also propose using projected target width and height, in lieu of nominal target width and height to model touch point distribution. Evaluation on three empirical datasets shows that the new model reflects the observation that the touch point distribution is elongated along the finger movement direction, and outperforms the original Dual Gaussian Model in all prediction tests. Compared with the original Dual Gaussian model, the Rotational Dual Gaussian model reduces the RMSE of touch error rate prediction from 8.49% to 4.95%, and more accurately predicts the touch point distribution in target acquisition. Using the Rotational Dual Gaussian model can also improve the soft keyboard decoding accuracy on smartwatches.2021YMYan Ma et al.Hand Gesture RecognitionEye Tracking & Gaze InteractionUIST
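The geometric idea in the Rotational Dual Gaussian abstract (a touch-point ellipse whose major axis follows the finger movement direction and whose minor axis is perpendicular to it) can be sketched as a rotated 2D Gaussian covariance. This shows only the rotation step, Σ = R · diag(σ_major², σ_minor²) · Rᵀ; how the paper derives the two standard deviations from projected target width and height is not reproduced here, and the function name and example values are hypothetical.

```python
import math


def rotated_covariance(sigma_major, sigma_minor, theta):
    """2x2 covariance of a Gaussian whose major axis points along angle theta.

    sigma_major: standard deviation along the movement direction.
    sigma_minor: standard deviation perpendicular to the movement direction.
    theta: movement direction in radians, measured from the x axis.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = sigma_major ** 2, sigma_minor ** 2
    # Expansion of R * diag(a, b) * R^T with R = [[c, -s], [s, c]].
    return [
        [a * c * c + b * s * s, (a - b) * c * s],
        [(a - b) * c * s, a * s * s + b * c * c],
    ]


# Illustrative values only: a finger moving at 45 degrees.
print(rotated_covariance(3.0, 1.5, math.pi / 4))
```

For a movement along the x axis (theta = 0) the result reduces to the axis-aligned covariance, and the off-diagonal terms become nonzero only when the movement direction is oblique and the two standard deviations differ.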