TypeAnywhere: A QWERTY-Based Text Entry Solution for Ubiquitous Computing
We present a QWERTY-based text entry system, TypeAnywhere, for use in off-desktop computing environments. Using a wearable device that can detect finger taps, users can leverage their touch-typing skills from physical keyboards to perform text entry on any surface. TypeAnywhere decodes typing sequences based only on finger-tap sequences without relying on tap locations. To achieve optimal decoding performance, we trained a neural language model and achieved a 1.6% character error rate (CER) in an offline evaluation, compared to a 5.3% CER from a traditional n-gram language model. Our user study showed that participants achieved an average performance of 70.6 WPM, or 80.4% of their physical keyboard speed, and 1.50% CER after 2.5 hours of practice over five days on a table surface. They also achieved 43.9 WPM and 1.37% CER when typing on their laps. Our results demonstrate the strong potential of QWERTY typing as a ubiquitous text entry solution.
2022 | Mingrui Ray Zhang et al. | University of Washington | Haptic Wearables | Voice User Interface (VUI) Design | Intelligent Voice Assistants (Alexa, Siri, etc.) | CHI

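To illustrate what decoding from finger identity alone involves, here is a minimal Python sketch. It is not the TypeAnywhere decoder: the finger labels, the conventional touch-typing finger assignment, the toy word counts, and the frequency-based ranking (standing in for the neural language model) are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed conventional touch-typing assignment; labels such as "L1" are hypothetical.
FINGER_KEYS = {
    "L4": "qaz", "L3": "wsx", "L2": "edc", "L1": "rfvtgb",
    "R1": "yhnujm", "R2": "ik", "R3": "ol", "R4": "p",
}
KEY_TO_FINGER = {ch: finger for finger, keys in FINGER_KEYS.items() for ch in keys}


def encode(word):
    """Reduce a word to the sequence of fingers that would type it."""
    return tuple(KEY_TO_FINGER[ch] for ch in word.lower())


def build_index(word_counts):
    """Group vocabulary words by finger sequence, most frequent first."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[encode(word)].append((count, word))
    for candidates in index.values():
        candidates.sort(reverse=True)
    return index


def decode(finger_seq, index):
    """Return candidate words for a finger sequence, best guess first."""
    return [word for _, word in index.get(tuple(finger_seq), [])]


# Toy counts (made up) standing in for a language model.
TOY_COUNTS = {"the": 500, "rye": 3, "tie": 20, "bus": 15, "gun": 12, "hello": 40}
INDEX = build_index(TOY_COUNTS)
```

In this toy vocabulary, "the" and "rye" collapse to the same finger sequence, so `decode(encode("rye"), INDEX)` ranks "the" first. Ambiguity of this kind is why the quality of the language model matters so much for this style of input.
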
Monitoring Screen Time or Redesigning It? Two Approaches to Supporting Intentional Social Media Use
Existing designs helping people manage their social media use include: 1) external supports that monitor and limit use; 2) internal supports that change the interface itself. Here, we design and deploy Chirp, a mobile Twitter client, to independently examine how users experience external and internal supports. To develop Chirp, we identified 16 features that influence users’ sense of agency on Twitter through a survey of 129 participants and a design workshop. We then conducted a four-week within-subjects deployment with 31 participants. Our internal supports (including features to filter tweets and inform users when they have exhausted new content) significantly increased users’ sense of agency, while our external supports (a usage dashboard and nudges to close the app) did not. Participants valued our internal supports and said that our external supports were for "other people". Our findings suggest that design patterns promoting agency may serve users better than screen time tools.
2022 | Mingrui Ray Zhang et al. | University of Washington | Social Platform Design & User Behavior | Notification & Interruption Management | CHI

Ga11y: an Automated GIF Annotation System for Visually Impaired Users
Animated GIF images have become prevalent in internet culture, often used to express richer and more nuanced meanings than static images. But animated GIFs often lack adequate alternative text descriptions, and it is challenging to generate such descriptions automatically, resulting in inaccessible GIFs for blind or low-vision (BLV) users. To improve the accessibility of animated GIFs for BLV users, we provide a system called Ga11y (pronounced "galley"), for creating GIF annotations. Ga11y combines the power of machine intelligence and crowdsourcing and has three components: an Android client for submitting annotation requests, a backend server and database, and a web interface where volunteers can respond to annotation requests. We evaluated three human annotation interfaces and employed the one that yielded the best annotation quality. We also conducted a multi-stage evaluation with 12 BLV participants from the United States and China, receiving positive feedback.
2022 | Mingrui Ray Zhang et al. | University of Washington | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | CHI

“I Don't Even Remember What I Read”: How Design Influences Dissociation on Social Media
Many people have experienced mindlessly scrolling on social media. We investigated these experiences through the lens of normative dissociation: total cognitive absorption, characterized by diminished self-awareness and reduced sense of agency. To explore user experiences of normative dissociation and how design affects the likelihood of normative dissociation, we deployed Chirp, a custom Twitter client, to 43 U.S. participants. Experience sampling and interviews revealed that sometimes, becoming absorbed in normative dissociation on social media felt like a beneficial break. However, people also reported passively slipping into normative dissociation, such that they failed to absorb any content and were left feeling like they had wasted their time. We found that designed interventions (including custom lists, reading history labels, time limit dialogs, and usage statistics) reduced normative dissociation. Our findings demonstrate that interaction designs intended to capture attention likely do so by harnessing people’s natural inclination to seek normative dissociation experiences. This suggests that normative dissociation may be a more productive framing than addiction for discussing social media overuse.
2022 | Amanda Baughan et al. | University of Washington | Privacy by Design & User Control | Online Harassment & Counter-Tools | Social Platform Design & User Behavior | CHI

Revamp: Enhancing Accessible Information Seeking Experience of Online Shopping for Blind or Low Vision Users
Online shopping has become a valuable modern convenience, but blind or low vision (BLV) users still face significant challenges using it, because of: 1) inadequate image descriptions and 2) the inability to filter large amounts of information using screen readers. To address those challenges, we propose Revamp, a system that leverages customer reviews for interactive information retrieval. Revamp is a browser integration that supports review-based question-answering interactions on a reconstructed product page. From our interview, we identified four main aspects (color, logo, shape, and size) that are vital for BLV users to understand the visual appearance of a product. Based on the findings, we formulated syntactic rules to extract review snippets, which were used to generate image descriptions and responses to users’ queries. Evaluations with eight BLV users showed that Revamp 1) provided useful descriptive information for understanding product appearance and 2) helped the participants locate key information efficiently.
2021 | Ruolin Wang et al. | UCLA | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | Universal & Inclusive Design | CHI

PhraseFlow: Designs and Empirical Studies of Phrase-Level Input
According to previous research, decoding at the phrase level may afford more correction accuracy than decoding at the word level. However, how phrase-level input affects user typing behavior, and how to design the interaction to make it practical, remain underexplored. We present PhraseFlow, a phrase-level input keyboard that is able to correct previous text based on the subsequently input sequences. Computational studies show that phrase-level input reduces the error rate of autocorrection by over 16%. We found that phrase-level input introduced extra cognitive load to the user that hindered their performance. Through an iterative design-implement-research process, we optimized the design of PhraseFlow to alleviate the cognitive load. An in-lab study shows that users could adopt PhraseFlow quickly, resulting in 19% fewer errors without losing speed. In real-life settings, we conducted a six-day deployment study with 42 participants, showing that 78.6% of the users would like to have the phrase-level input feature in future keyboards.
2021 | Mingrui Ray Zhang et al. | University of Washington | Generative AI (Text, Image, Music, Video) | Human-LLM Collaboration | AI-Assisted Creative Writing | CHI

Voicemoji: Emoji Entry Using Voice for Visually Impaired People
Keyboard-based emoji entry can be challenging for people with visual impairments: users have to sequentially navigate emoji lists using screen readers to find their desired emojis, which is a slow and tedious process. In this work, we explore the design and benefits of emoji entry with speech input, a popular text entry method among people with visual impairments. After conducting interviews to understand blind or low vision (BLV) users’ current emoji input experiences, we developed Voicemoji, which (1) outputs relevant emojis in response to voice commands, and (2) provides context-sensitive emoji suggestions through speech output. We also conducted a multi-stage evaluation study with six BLV participants from the United States and six BLV participants from China, finding that Voicemoji significantly reduced entry time by 91.2% and was preferred by all participants over the Apple iOS keyboard. Based on our findings, we present Voicemoji as a feasible solution for voice-based emoji entry.
2021 | Mingrui Ray Zhang et al. | University of Washington | Intelligent Voice Assistants (Alexa, Siri, etc.) | Visual Impairment Technologies (Screen Readers, Tactile Graphics, Braille) | CHI

JustCorrect: Intelligent Post Hoc Text Correction Techniques on Smartphones
Correcting errors in entered text is a common task but usually difficult to perform on mobile devices due to tedious cursor navigation steps. In this paper, we present JustCorrect, an intelligent post hoc text correction technique for smartphones. To make a correction, the user simply types the correct text at the end of their current input, and JustCorrect will automatically detect the error and apply the correction in the form of an insertion or a substitution. In this way, manual navigation steps are bypassed, and the correction can be committed with a single tap. We solved two critical problems to support JustCorrect: (1) Correction Algorithm: we propose an algorithm that infers the user’s correction intention from the last typed word. (2) Input Modalities: our study revealed that both tap and gesture were suitable input modalities for performing JustCorrect. Based on our findings, we integrated JustCorrect into a soft keyboard. Our user studies show that using JustCorrect reduces the text correction time by 12.8% over the stock Android keyboard and by 9.7% over the "Type, then Correct" text correction technique by Zhang et al. (2019). Overall, JustCorrect complements existing post hoc text correction techniques, making error correction more automatic and intelligent.
2020 | Wenzhe Cui et al. | Voice User Interface (VUI) Design | UIST

Text Entry Throughput: Towards Unifying Speed and Accuracy in a Single Performance Metric
Human-computer input performance inherently involves speed-accuracy tradeoffs: the faster users act, the more inaccurate those actions are. Therefore, comparing speeds and accuracies separately can result in ambiguous outcomes: Does a fast but inaccurate technique perform better or worse overall than a slow but accurate one? For pointing, speed and accuracy have been unified for over 60 years as throughput (bits/s) (Crossman 1957, Welford 1968), but to date, no similar metric has been established for text entry. In this paper, we introduce a text entry method-independent throughput metric based on Shannon information theory (1948). To explore the practical usability of the metric, we conducted an experiment in which 16 participants typed with a laptop keyboard using different cognitive sets, i.e., speed-accuracy biases. Our results show that as a performance metric, text entry throughput remains relatively stable under different speed-accuracy conditions. We also evaluated a smartphone keyboard with 12 participants, finding that throughput varied least compared to other text entry metrics. This work allows researchers to characterize text entry performance with a single unified measure of input efficiency.
2019 | Mingrui Ray Zhang et al. | University of Washington | User Research Methods (Interviews, Surveys, Observation) | Computational Methods in HCI | CHI

Anchored Audio Sampling: A Seamless Method for Exploring Children's Thoughts During Deployment Studies
Many traditional HCI methods, such as surveys and interviews, are of limited value when working with preschoolers. In this paper, we present anchored audio sampling (AAS), a remote data collection technique for extracting qualitative audio samples during field deployments with young children. AAS offers a developmentally sensitive way of understanding how children make sense of technology and situates their use in the larger context of daily life. AAS is defined by an anchor event, around which audio is collected. A sliding window surrounding this anchor captures both antecedent and ensuing recording, providing the researcher insight into the activities that led up to the event of interest as well as those that followed. We present themes from three deployments that leverage this technique. Based on our experiences using AAS, we have also developed a reusable open-source library for embedding AAS into any Android application.
2019 | Alexis Hiniker et al. | University of Washington | Special Education Technology | User Research Methods (Interviews, Surveys, Observation) | CHI

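The sliding window around an anchor event can be sketched with a bounded history buffer. The Python below is an illustrative assumption only, not the authors' Android library: the class name `AnchoredSampler`, its parameters, and the use of abstract "frames" in place of real audio chunks are all hypothetical.

```python
from collections import deque


class AnchoredSampler:
    """Keeps the last `before` frames; on an anchor, also collects the next `after` frames."""

    def __init__(self, before, after):
        self.after = after
        self._history = deque(maxlen=before)  # rolling pre-anchor context
        self._pending = []                    # open samples: (frames, frames still needed)
        self.samples = []                     # completed samples

    def push(self, frame):
        """Feed one recorded frame (e.g. a short audio chunk) into the sampler."""
        still_open = []
        for frames, remaining in self._pending:
            frames.append(frame)
            if remaining == 1:
                self.samples.append(frames)   # post-anchor window is now full
            else:
                still_open.append((frames, remaining - 1))
        self._pending = still_open
        self._history.append(frame)

    def anchor(self):
        """Mark an anchor event: snapshot the history and start the post-anchor window."""
        frames = list(self._history)
        if self.after == 0:
            self.samples.append(frames)
        else:
            self._pending.append((frames, self.after))
```

For example, with `before=2` and `after=2`, pushing frames 0, 1, 2, calling `anchor()`, then pushing 3 and 4 yields one sample containing frames 1 through 4. Frames far from any anchor are never stored beyond the rolling history.
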
Communication Breakdowns Between Families and Alexa
We investigate how families repair communication breakdowns with digital home assistants. We recruited 10 diverse families to use an Amazon Echo Dot in their homes for four weeks. All families had at least one child between four and 17 years old. Each family participated in pre- and post-deployment interviews. Their interactions with the Echo Dot (Alexa) were audio recorded throughout the study. We analyzed 59 communication breakdown interactions between family members and Alexa, framing our analysis with concepts from HCI and speech-language pathology. Our findings indicate that family members collaborate using discourse scaffolding (supportive communication guidance) and a variety of speech and language modifications in their attempts to repair communication breakdowns with Alexa. Alexa's responses also influence the repair strategies that families use. Designers can relieve the communication repair burden that primarily rests with families by increasing digital home assistants' abilities to collaborate with users to repair communication breakdowns.
2019 | Lucas Colusso et al. | University of Washington | Home Voice Assistant Experience | CHI

Beyond the Input Stream: Making Text Entry Evaluations More Flexible with Transcription Sequences
Text entry method-independent evaluation tools are often used to conduct text entry experiments and compute performance metrics, like words per minute and error rates. The input stream paradigm of Soukoreff & MacKenzie (2001, 2003) remains prevalent; it presents a string for transcription and uses a serial character representation for encoding the text entry process. Although an advance over prior paradigms, the input stream paradigm is unable to support many modern text entry features. To address these limitations, we present “transcription sequences”: for each new input, a snapshot of the entire transcribed string up to that point is captured. By assembling transcription sequences and comparing adjacent strings, we can compute all prior metrics, reduce artificial constraints on text entry evaluations, and introduce new metrics. We conducted a study with 18 participants who typed 1620 phrases using a laptop keyboard, on-screen keyboard, and smartphone keyboard using features such as auto-correction, word prediction, and copy-and-paste. We also evaluated non-keyboard methods Dasher, gesture typing, and T9. Our results show that modern features and methods can be accommodated, prior metrics can be correctly computed, and new metrics can reveal insights. We validated our algorithms using ground truth based on cursor positioning, confirming 100% accuracy. We also provide a new tool, TextTest++, to facilitate web-based evaluations.
2019 | Mingrui Ray Zhang et al. | Prototyping & User Testing | UIST

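As a rough illustration of comparing adjacent strings in a transcription sequence, the Python sketch below recovers, for each pair of consecutive snapshots, the position of the change and the text removed and inserted. It is a simplified assumption based on common prefix and suffix matching, not the algorithm used by TextTest++; the function names are hypothetical.

```python
def diff_adjacent(prev, curr):
    """Return (position, removed_text, inserted_text) for one step between snapshots."""
    limit = min(len(prev), len(curr))
    # Length of the shared prefix.
    p = 0
    while p < limit and prev[p] == curr[p]:
        p += 1
    # Length of the shared suffix, not overlapping the prefix.
    s = 0
    while s < limit - p and prev[len(prev) - 1 - s] == curr[len(curr) - 1 - s]:
        s += 1
    return p, prev[p:len(prev) - s], curr[p:len(curr) - s]


def actions(snapshots):
    """Turn a transcription sequence (list of full-string snapshots) into edit steps."""
    steps = []
    prev = ""
    for curr in snapshots:
        steps.append(diff_adjacent(prev, curr))
        prev = curr
    return steps
```

With the snapshots `["t", "te", "teh", "teh ", "the "]`, the last step is reported as `(1, "eh", "he")`: a two-character replacement in the middle of the string, as an auto-correction might produce. A strictly serial character stream has no direct way to represent such a step.
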
Type, Then Correct: Intelligent Text Correction Techniques for Mobile Text Entry Using Neural Networks
Current text correction processes on mobile touch devices are laborious: users either extensively use backspace, or navigate the cursor to the error position, make a correction, and navigate back, usually by employing multiple taps or drags over small targets. In this paper, we present three novel text correction techniques to improve the efficiency of the correction process: Drag-n-Drop, Drag-n-Throw, and Magic Key. All of the techniques skip error-deletion and cursor-positioning procedures, and instead allow the user to type the correction first, and then apply that correction to a previously committed error. Specifically, Drag-n-Drop allows a user to drag a correction and drop it on the error position. Drag-n-Throw lets a user drag a correction from the keyboard suggestion list and “throw” it to the approximate area of the error text. Our deep learning algorithm determines the most likely error in the targeted area and applies the correction. Magic Key allows a user to type a correction and tap a designated key to highlight possible error candidates. The user can navigate among these candidates by dragging atop the key, and can apply a correction by tapping the key. We evaluated these techniques in both text correction and transcription tasks. Our experiment results show that correction with the new techniques was significantly faster than de facto cursor and backspace-based correction. Our techniques apply to any touch-based text entry method.
2019 | Mingrui Ray Zhang et al. | Human-LLM Collaboration | AI-Assisted Decision-Making & Automation | UIST