IUI '19: Proceedings of the 24th International Conference on Intelligent User Interfaces

Getting virtually personal: making responsible and empathetic "her" for everyone

Have you watched the movie Her? Have you ever wondered or wished to have your own AI companion just like Samantha, who could understand you better than you understand yourself, and could tell you who you really are, who your best partner may be, and which career path would be best for you? In this talk, I will present a computational framework for building responsible and empathetic Artificial Intelligence (AI) agents that can deeply understand their users as unique individuals and responsibly guide their behavior in both the virtual and the real world.

Starting with a live demo showing how an AI interviewer chats with a user to automatically derive his/her personality characteristics and provide personalized recommendations, I will highlight the technical advances of the framework in two aspects. First, I will present a computational, evidence-based approach to Big 5 personality inference, which enables an AI agent to deeply understand a user's unique characteristics by analyzing the user's chat text on the fly. Second, I will describe a topic-based conversation engine that couples deep learning with rules to support a natural conversation and rapid customization of a conversational agent.

I will describe the initial applications of our AI agents in the real world, from talent selection to student teaming to user experience research. Finally, I will discuss the wider implications of our work on building hyper-personalized systems and their impact on our lives.

DARPA's explainable artificial intelligence (XAI) program

DARPA's Explainable Artificial Intelligence (XAI) program endeavors to create AI systems whose learned models and decisions can be understood and appropriately trusted by end users. This talk will summarize the XAI program and present highlights from its Phase 1 evaluations.

Innovating with AI

Google has created 8 products with over a billion users each. These products are powered by AI (artificial intelligence) at every level - from the core infrastructure and software platform to the application logic and the user interface. I'll share a behind-the-scenes look at how Google AI works and how we use it to create innovative UX (user experience) at a planetary scale. I'll end with our vision to democratize AI and how you can use Google AI in your own work.

SESSION: Recommender systems

Evaluating narrative-driven movie recommendations on Reddit

Recommender systems have become omni-present tools that are used by a wide variety of users in everyday life tasks, such as finding products in Web stores or online movie streaming portals. However, in situations where users already have an idea of what they are looking for (e.g., 'The Lord of the Rings', but in space with a dark vibe), most traditional recommender algorithms struggle to adequately address such a priori defined requirements. Therefore, users have built dedicated discussion boards to ask peers for suggestions, which ideally fulfill the stated requirements. In this paper, we set out to determine the utility of well-established recommender algorithms for calculating recommendations when provided with such a narrative. To that end, we first crowdsource a reference evaluation dataset from human movie suggestions. We use this dataset to evaluate the potential of five recommendation algorithms for incorporating such a narrative into their recommendations. Further, we make the dataset available for other researchers to advance the state of research in the field of narrative-driven recommendations. Finally, we use our evaluation dataset to improve not only our algorithmic recommendations, but also existing empirical recommendations of IMDb. Our findings suggest that the implemented recommender algorithms yield vastly different suggestions than humans when presented with the same a priori requirements. However, with carefully configured post-filtering techniques, we can outperform the baseline by up to 100%. This represents an important first step towards more refined algorithmic narrative-driven recommendations.

Intelligently recommending key bindings on physical keyboards with demonstrations in Emacs

Physical keyboards have been peripheral input devices for electronic computers since the early 1970s and have become ubiquitous during the past few decades, especially in professional areas such as software programming, professional game playing, and document processing. In these real-world applications, key bindings, a fundamental vehicle for humans to interact with software systems using physical keyboards, play a critical role in users' productivity. However, although intelligent user interfaces and recommender systems are essential applications of artificial intelligence research, work in these areas barely relates to key bindings on physical keyboards. In this paper, we develop a recommender system (referred to as EKBRS) for intelligently recommending key bindings with demonstrations in Emacs, which we use as a base user interface. This is a brand new direction of intelligent user interface research and also a novel application of recommender systems. To the best of our knowledge, this is the world's first intelligent user interface that heavily exploits key bindings of physical keyboards and the world's first recommender system for recommending key bindings. We empirically show the effectiveness of our recommender system and briefly discuss the applicability of this recommender system to other software systems.

Rasch-based tailored goals for nutrition assistance systems

Choosing adequate goals is central to the success of a task. With this study, we investigate tailoring the goals of a nutrition assistance system to the user's abilities according to a Rasch scale. To that end, we evaluated two versions of a mobile system that offers dietary tracking, visual feedback, and personalized recipe recommendations. The original version targets optimal nutritional behavior and focuses on the six least optimal nutrients (N=51). The adapted version targets only improved nutritional behavior compared to the status quo and thus tailors the advice to the next six achievable nutrients according to a Rasch scale (N=47). Results of the two-week study indicate that the tailored advice leads to higher success for the focused nutrients, and is perceived to be more diverse and personalized, and thus more effective.

SESSION: Natural language and speech

Vajra: step-by-step programming with natural language

Building natural language programming systems that are geared towards end-users requires the abstraction of formalisms inherently introduced by programming languages, capturing the intent of natural language inputs and mapping it to existing programming language constructs.

We present a novel end-user programming paradigm for Python, which maps natural language commands into Python code. The proposed semantic parsing model aims to reduce the barriers for producing well-formed code (syntactic gap) and for exploring third-party APIs (lexico-semantic gap). The proposed method was implemented in a supporting system and evaluated in a usability study involving programmers as well as non-programmers. The results show that both groups are able to produce code with or without prior programming experience.

Inferencing underspecified natural language utterances in visual analysis

Handling ambiguity and underspecification of users' utterances is challenging, particularly for natural language interfaces that help with visual analytical tasks. Constraints in the underlying analytical platform and the users' expectations of high precision and recall require thoughtful inferencing to help generate useful responses. In this paper, we introduce a system to resolve partial utterances based on syntactic and semantic constraints of the underlying analytical expressions. We extend inferencing based on best practices in information visualization to generate useful visualization responses. We employ heuristics to help constrain the solution space of possible inferences, and apply ranking logic to the interpretations based on relevancy. We evaluate the quality of inferred interpretations based on relevancy and analytical usefulness.

Empathic dialogue system based on emotions extracted from tweets

Empathic conversations have become increasingly important for dialogue systems to improve the users' experience and increase their engagement with the system, which is difficult for many existing monotonous systems. Existing empathic dialogue systems are designed for limited domain dialogues. They respond with fixed phrases to observed user emotions. In open domain conversations, however, generating empathic responses for a wide variety of topics is required. In this paper, we draw on psychological studies about empathy, and propose an empathic dialogue system for open domain conversations. The proposed system generates empathic utterances based on observed emotions in user utterances, and is thus able to build empathy with users. Our experiments show that users were able to feel more empathy from the proposed system, especially when their emotions were explicitly expressed in their utterances.

SESSION: IUI for wearable and mobile

Background perception and comprehension of symbols conveyed through vibrotactile wearable displays

Previous research has demonstrated the feasibility of conveying vibrotactile encoded information efficiently using wearable devices. Users can understand vibrotactile encoded symbols and complex messages combining such symbols. Such wearable devices can find applicability in many multitasking use cases. Nevertheless, for multitasking, it would be necessary for the perception and comprehension of vibrotactile information to be less attention demanding and not interfere with other parallel tasks. We present a user study which investigates whether high speed vibrotactile encoded messages can be perceived in the background while performing other concurrent attention-demanding primary tasks. The vibrotactile messages used in the study were limited to symbols representing letters of the English alphabet. We observed that users could very accurately comprehend such vibrotactile encoded messages in the background, and other parallel tasks did not affect users' performance. Additionally, the comprehension of such messages did not affect the performance of the concurrent primary task. Our results promote the use of vibrotactile information transmission to facilitate multitasking.

Smell Pittsburgh: community-empowered mobile smell reporting system

Urban air pollution has been linked to various human health considerations, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality.

Towards a generalizable method for detecting fluid intake with wrist-mounted sensors and adaptive segmentation

Over the last decade, advances in mobile technologies have enabled the development of intelligent systems that attempt to recognize and model a variety of health-related human behaviors. While automated dietary monitoring based on passive sensors has been an area of increasing research activity for many years, much less attention has been given to tracking fluid intake. In this work, we apply an adaptive segmentation technique on a continuous stream of inertial data captured with a practical, off-the-shelf wrist-mounted device to detect fluid intake gestures passively. We evaluated our approach in a study with 30 participants where 561 drinking instances were recorded. Using a leave-one-participant-out (LOPO) evaluation, we were able to detect drinking episodes with 90.3% precision and 91.0% recall, demonstrating the generalizability of our approach. In addition to our proposed method, we also contribute an anonymized and labeled dataset of drinking and non-drinking gestures to encourage further work in the field.
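The leave-one-participant-out protocol and the precision and recall figures mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `lopo_splits` and `precision_recall` and the toy participant IDs are hypothetical, and no segmentation or classifier is included. The sketch only shows how folds are formed so that no participant appears in both training and test data, and how the two metrics are computed from binary labels.

```python
from collections import defaultdict


def lopo_splits(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    by_participant = defaultdict(list)
    for idx, pid in enumerate(participant_ids):
        by_participant[pid].append(idx)
    for held_out in sorted(by_participant):
        test_idx = by_participant[held_out]
        # Training indices come from every other participant.
        train_idx = [
            i
            for pid, idxs in by_participant.items()
            if pid != held_out
            for i in idxs
        ]
        yield held_out, train_idx, test_idx


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = drinking, 0 = not drinking)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: two gesture segments from each of three participants.
participants = ["p1", "p1", "p2", "p2", "p3", "p3"]
folds = list(lopo_splits(participants))
```

In a full evaluation, a model would be trained on `train_idx` and scored on `test_idx` in each fold, with the per-fold predictions pooled or averaged before reporting precision and recall.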

ShopEye: fusing RFID and smartwatch for multi-relation excavation in physical stores

Smart retail stores open new possibilities for enabling a variety of physical analytics, such as users' shopping trajectories and preferences for certain items. This paper aims to excavate three kinds of relations in physical stores, i.e. user-item, user-user and item-item, which provide abundant information for enhancing users' shopping experiences and boosting retailers' sales. We present ShopEye, a hybrid RFID and smartwatch system to delve into these relations in an implicit and non-intrusive manner. The intuition is that inertial sensors embedded in smartwatches and RFID tags attached to items can capture the user behaviors and the item motions, respectively. ShopEye first pairs users with corresponding items according to correlations between inertial signals and RFID signals, and then incorporates these pairs with the motion behaviors of users to further profile user-user and item-item relations. We have tested the system extensively in a lab environment that mimics a real retail store. Experimental results demonstrate the effectiveness and robustness of ShopEye in excavating these relations.

SESSION: Evaluation of IUI

When people and algorithms meet: user-reported problems in intelligent everyday applications

The complex nature of intelligent systems motivates work on supporting users during interaction, for example through explanations. However, there is yet little empirical evidence on specific problems users face in such systems in everyday use. This paper investigates such problems as reported by users: We analysed 35,448 reviews of three apps on the Google Play Store (Facebook, Netflix and Google Maps) with sentiment analysis and topic modelling to reveal problems during interaction that can be attributed to the apps' algorithmic decision-making. We enriched this data with users' coping and support strategies through a follow-up online survey (N=286). In particular, we found problems and strategies related to content, algorithm, user choice, and feedback. We discuss corresponding implications for designing user support, highlighting the importance of user control and explanations of output, not processes. Our work thus contributes empirical evidence to facilitate understanding of users' everyday problems with intelligent systems.

Progressive disclosure: empirically motivated approaches to designing effective transparency

As we increasingly delegate important decisions to intelligent systems, it is essential that users understand how algorithmic decisions are made. Prior work has often taken a technocentric approach to transparency. In contrast, we explore empirical user-centric methods to better understand user reactions to transparent systems. We assess user reactions to transparency in two studies. In Study 1, users anticipated that a more transparent system would perform better, but retracted this evaluation after experience with the system. Qualitative data suggest this arose because transparency is distracting and undermines simple heuristics users form about system operation. Study 2 explored these effects in depth, suggesting that users may benefit from initially simplified feedback that hides potential system errors and assists users in building working heuristics about system operation. We use these findings to motivate new progressive disclosure principles for transparency in intelligent systems.

Supporting job mediator and job seeker through an actionable dashboard

Job mediation services can assist job seekers in finding suitable employment through a personalised approach. Consultation or mediation sessions, supported by personal profile data of the job seeker, help job mediators understand the job seeker's personal situation and requests. Prediction and recommendation systems can directly provide job seekers with possible job vacancies. However, incorrect or unrealistic suggestions and poor interpretations can result in bad decisions or demotivation of the job seeker. This paper explores how an interactive dashboard visualising prediction and recommendation output can help support the dialogue between job mediator and job seeker, by increasing "explainability" and providing mediators with control over the information that is shown to job seekers.

SESSION: Affective and aesthetic IUI

Paralinguistic recommendations for affective word clouds

Word clouds are widely used for non-analytic purposes, such as introducing a topic to students, or creating a gift with personally meaningful text. Surveys show that users prefer tools that yield word clouds with a stronger emotional impact. Fonts and color palettes are powerful paralinguistic signals that may determine this impact, but, typically, the expectation is that they are chosen by the users. We present an affect-aware font and color palette selection methodology that aims to facilitate more informed choices. We induce associations of fonts with a set of eight affects, and evaluate the resulting data in a series of user studies both on individual words as well as in word clouds. Relying on a recent study to procure affective color palettes, we carry out a similar user study to understand the impact of color choices on word clouds. Our findings suggest that both fonts and color palettes are powerful tools contributing to the affect associated with a word cloud. The experiments further confirm that the novel datasets we propose are successful in enabling this. Based on this data, we implement a prototype that allows users to specify a desired affect and recommends congruent fonts and color palettes for the word cloud.

Does emotion influence the use of auto-suggest during smartphone typing?

Typing-based interfaces are common across many mobile applications, especially messaging apps. To reduce the difficulty of typing with keyboard applications on devices with restricted space, such as smartphones and smartwatches, several techniques, such as auto-complete and auto-suggest, are implemented. Although helpful, these techniques do add more cognitive load on the user. Hence, beyond the importance of improving the word recommendations, it is useful to understand the pattern of use of auto-suggestions during typing. Among several factors that may influence use of auto-suggest, the role of emotion has been mostly overlooked, often due to the difficulty of unobtrusively inferring emotion. With advances in affective computing, and the ability to infer users' emotional states accurately, it is imperative to investigate how auto-suggest can be guided by emotion-aware decisions. In this work, we investigate correlations between user emotion and usage of auto-suggest, i.e. whether users prefer to use auto-suggest in specific emotion states. We developed an Android keyboard application, which records auto-suggest usage and collects emotion self-reports from users in a 3-week in-the-wild study. Analysis of the dataset reveals a relationship between user-reported emotion state and use of auto-suggest. We used the data to train personalized models for predicting use of auto-suggest in specific emotion states. The models can predict use of auto-suggest with an average accuracy (AUCROC) of 82%, showing the feasibility of emotion-aware auto-suggestion.

Prediction of music pairwise preferences from facial expressions

Users of a recommender system may be requested to express their preferences about items either with evaluations of items (e.g. a rating) or with comparisons of item pairs. In this work we focus on the acquisition of pairwise preferences in the music domain. Asking the user to explicitly compare music, i.e., which of two listened tracks is preferred, requires some user effort. We have therefore developed a novel approach for automatically extracting these preferences from the analysis of the facial expressions of the users while listening to the compared tracks. We have trained a predictor that infers a user's pairwise preferences by using features extracted from these data. We show that the predictor performs better than a commonly used baseline, which leverages the user's listening duration of the tracks to infer pairwise preferences. Furthermore, we show that there are differences in the accuracy of the proposed method between users with different personalities and we have therefore adapted the trained model accordingly. Our work shows that by introducing a low user effort preference elicitation approach, which, however, requires access to information that may raise potential privacy issues (facial expressions), one can obtain good prediction accuracy of pairwise music preferences.
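The listening-duration baseline mentioned above can be read as a simple rule: the track that was played longer is assumed to be the preferred one. The sketch below is one plausible reading of that rule, not the authors' implementation. The function names, the tie handling, and the toy durations are all assumptions made for illustration.

```python
def prefer_by_duration(duration_a, duration_b):
    """Return 'a' or 'b' for the track listened to longer, or None on a tie."""
    if duration_a > duration_b:
        return "a"
    if duration_b > duration_a:
        return "b"
    return None


def accuracy(pairs):
    """Fraction of pairs where the duration rule matches the stated preference.

    Each pair is (duration_a_seconds, duration_b_seconds, preferred_label).
    Ties yield no prediction and are counted as incorrect.
    """
    correct = sum(
        1 for a, b, label in pairs if prefer_by_duration(a, b) == label
    )
    return correct / len(pairs)


# Hypothetical listening durations (seconds) with explicitly stated preferences.
pairs = [
    (30.0, 12.5, "a"),
    (8.0, 25.0, "b"),
    (20.0, 22.0, "a"),
    (15.0, 15.0, "b"),
]
baseline_accuracy = accuracy(pairs)
```

A learned predictor, such as the facial-expression model the abstract describes, would be compared against this number on the same set of pairs.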

Piano Genie

We present Piano Genie, an intelligent controller which allows non-musicians to improvise on the piano. With Piano Genie, a user performs on a simple interface with eight buttons, and their performance is decoded into the space of plausible piano music in real time. To learn a suitable mapping procedure for this problem, we train recurrent neural network autoencoders with discrete bottlenecks: an encoder learns an appropriate sequence of buttons corresponding to a piano piece, and a decoder learns to map this sequence back to the original piece. During performance, we substitute a user's input for the encoder output, and play the decoder's prediction each time the user presses a button. To improve the intuitiveness of Piano Genie's performance behavior, we impose musically meaningful constraints over the encoder's outputs.

SESSION: Big data and analytics

Atlas: local graph exploration in a global context

Graphs are everywhere, growing increasingly complex, and still lack scalable, interactive tools to support sensemaking. To address this problem, we present Atlas, an interactive graph exploration system that adapts scalable edge decomposition to enable a new paradigm for large graph exploration, generating explorable multi-layered representations. Atlas simultaneously reveals peculiar subgraph structures (e.g., quasi-cliques) and possible vertex roles in connecting such subgraph patterns. Atlas decomposes million-edge graphs in seconds, scaling to graphs with up to 117 million edges. We present the results from a think-aloud user study with three graph experts and highlight discoveries made possible by Atlas when applied to graphs from multiple domains, including suspicious Yelp reviews, insider trading, and word embeddings. Atlas runs in-browser and is open-sourced.

Flux capacitors for JavaScript deloreans: approximate caching for physics-based data interaction

Interactive visualizations have become an effective and pervasive mode of allowing users to explore data in a visual, fluid, and immersive manner. While modern web, mobile, touch, and gesture-driven next-generation interfaces such as Leap Motion allow for highly interactive experiences, they pose unique and unprecedented workloads to the underlying data platform. Usually, these visualizations do not need precise results for most queries generated during an interaction, and the users require the intermediate results as feedback only to guide them towards their goal query. We present a middleware component, Flux Capacitor, that insulates the backend from bursty and query-intensive workloads. Flux Capacitor uses prefetching and caching strategies devised by exploiting the inherent physics metaphor of UI widgets, such as friction and inertia in range sliders, and typical characteristics of user interaction. This enables low interaction response times while intelligently trading off accuracy.
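The general idea of trading accuracy for response time during a slider interaction can be sketched with a small cache that answers nearby slider positions from one stored result. This is an illustration of approximate caching in general, not the Flux Capacitor design: the class `ApproximateRangeCache`, the `bucket_width` parameter, and the `slow_count` backend are hypothetical, and the physics-based prefetching described in the abstract is not modelled.

```python
class ApproximateRangeCache:
    """Answer slider queries from a cache keyed by a coarse bucket of the value."""

    def __init__(self, backend, bucket_width):
        self.backend = backend            # expensive query function
        self.bucket_width = bucket_width  # coarser buckets: fewer calls, less accuracy
        self.cache = {}
        self.backend_calls = 0

    def query(self, slider_value):
        bucket = int(slider_value // self.bucket_width)
        if bucket not in self.cache:
            # Cache miss: run the query once at the lower edge of the bucket.
            self.backend_calls += 1
            self.cache[bucket] = self.backend(bucket * self.bucket_width)
        # Every value in the same bucket receives the same approximate answer.
        return self.cache[bucket]


def slow_count(threshold):
    """Stand-in for an expensive backend query: count values below a threshold."""
    return sum(1 for x in range(1000) if x < threshold)


cache = ApproximateRangeCache(slow_count, bucket_width=10)
# Intermediate slider positions generated while the user drags the handle.
results = [cache.query(v) for v in (42, 43.5, 47, 51)]
```

Here four slider positions trigger only two backend queries. A production system would also need a way to issue an exact query once the interaction settles on a final value.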

Avoiding drill-down fallacies with VisPilot: assisted exploration of data subsets

As datasets continue to grow in size and complexity, exploring multi-dimensional datasets remains challenging for analysts. A common operation during this exploration is drill-down: understanding the behavior of data subsets by progressively adding filters. While widely used, in the absence of careful attention towards confounding factors, drill-downs could lead to inductive fallacies. Specifically, an analyst may end up being "deceived" into thinking that a deviation in trend is attributable to a local change, when in fact it is a more general phenomenon; we term this the drill-down fallacy. One way to avoid falling prey to drill-down fallacies is to exhaustively explore all potential drill-down paths, which quickly becomes infeasible on complex datasets with many attributes. We present VisPilot, an accelerated visual data exploration tool that guides analysts through the key insights in a dataset, while avoiding drill-down fallacies. Our user study results show that VisPilot helps analysts discover interesting visualizations, understand attribute importance, and predict unseen visualizations better than other multidimensional data analysis baselines.
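A toy example may make the drill-down fallacy concrete. The sketch below is not VisPilot's algorithm. It uses a made-up dataset and a hypothetical helper `rate` to show a subset that deviates strongly from the overall population but not at all from its immediate parent, so the deviation belongs to the parent filter rather than to the last filter added.

```python
def rate(rows, **filters):
    """Share of rows with outcome == 1 among the rows matching all filters."""
    matched = [r for r in rows if all(r[k] == v for k, v in filters.items())]
    return sum(r["outcome"] for r in matched) / len(matched)


# Hypothetical data: the outcome depends on region only, not on product.
rows = (
    [{"region": "north", "product": "a", "outcome": 1}] * 8
    + [{"region": "north", "product": "a", "outcome": 0}] * 2
    + [{"region": "north", "product": "b", "outcome": 1}] * 8
    + [{"region": "north", "product": "b", "outcome": 0}] * 2
    + [{"region": "south", "product": "a", "outcome": 1}] * 2
    + [{"region": "south", "product": "a", "outcome": 0}] * 8
    + [{"region": "south", "product": "b", "outcome": 1}] * 2
    + [{"region": "south", "product": "b", "outcome": 0}] * 8
)

overall = rate(rows)                                # all 40 rows
parent = rate(rows, region="north")                 # one filter
child = rate(rows, region="north", product="a")     # two filters

# Comparing the child only to the overall rate suggests product "a" matters;
# comparing it to its parent shows the shift is already explained by region.
deviation_from_overall = child - overall
deviation_from_parent = child - parent
```

An analyst who compares the two-filter subset only to the overall rate would credit product "a" with an effect that is fully explained by region "north".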

SESSION: Assistive IUIs

Scene text access: a comparison of mobile OCR modalities for blind users

We present a study with seven blind participants using three different mobile OCR apps to find text posted in various indoor environments. The first app considered was Microsoft SeeingAI in its Short Text mode, which reads any text in sight with a minimalistic interface. The second app was Spot+OCR, a custom application that separates the task of text detection from OCR proper. Upon detection of text in the image, Spot+OCR generates a short vibration; as soon as the user stabilizes the phone, a high-resolution snapshot is taken and OCR-processed. The third app, Guided OCR, was designed to guide the user in taking several pictures in a 360° span at the maximum resolution available from the camera, with minimum overlap between pictures. Quantitative results (in terms of true positive ratios and traversal speed) were recorded. Along with the qualitative observations and outcomes from an exit survey, these results allow us to identify and assess the different strategies used by our participants, as well as the challenges of operating these systems without sight.

Guided play: digital sensing and coaching for stereotypical play behavior in children with autism

Restricted and repetitive behaviors (RRBs) are a core symptom and an early marker of autism. Current research and intervention for RRB heavily rely on professional experience and effort. Guided Play is a technology that uses instrumented games and toys as a platform to understand children's play behavior and facilitate behavioral intervention during play. This paper presents the design and implementation of a prototype based on the technology, as well as an evaluation on 6 children with autism. The results show that children with RRBs in physical world activities also exhibit similar patterns in a similar digital activity, and that digital coaching can reduce RRBs by expanding children's play skill repertoire and promoting symbolic play.

Learning to assess the quality of stroke rehabilitation exercises

Due to the limited number of therapists, task-oriented exercises are often prescribed for post-stroke survivors as in-home rehabilitation. During in-home rehabilitation, a patient may become unmotivated or unsure how to comply with prescriptions without the feedback of a therapist. To address this challenge, this paper proposes an automated method that can achieve not only qualitative, but also quantitative assessment of stroke rehabilitation exercises. Specifically, we explored a threshold model that utilizes the outputs of binary classifiers to quantify the correctness of a movement as a performance score. We collected movements of 11 healthy subjects and 15 post-stroke survivors using a Kinect sensor and ground truth scores from primary and secondary therapists. The proposed method achieves the following agreement with the primary therapist: 0.8436, 0.8264, and 0.7976 F1-scores on three task-oriented exercises. Experimental results show that our approach performs as well as or better than multi-class classification, regression, or the evaluation of the secondary therapist. Furthermore, we found a strong correlation (R² = 0.95) between the sum of computed exercise scores and the Fugl-Meyer Assessment scores, a clinically validated motor impairment index for post-stroke survivors. Our results demonstrate the feasibility of automatically assessing stroke rehabilitation exercises with decent agreement levels and clinical relevance.

SESSION: Explainable AI

What can AI do for me?: evaluating machine learning interpretations in cooperative play

Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing models. We propose an evaluation of interpretation on a real task with real human users, where the effectiveness of interpretation is measured by how much it improves human performance. We design a grounded, realistic human-computer cooperative setting using a question answering task, Quizbowl. We recruit both trivia experts and novices to play this game with a computer as their teammate, which communicates its predictions via three different interpretations. We also provide design guidance for natural language processing human-in-the-loop settings.

I can do better than your AI: expertise and explanations

Intelligent assistants, such as navigation, recommender, and expert systems, are most helpful in situations where users lack domain knowledge. Despite this, recent research in cognitive psychology has revealed that lower-skilled individuals may maintain a sense of illusory superiority, which might suggest that users with the highest need for advice may be the least likely to defer judgment. Explanation interfaces - a method for persuading users to take a system's advice - are thought by many to be the solution for instilling trust, but do their effects hold for self-assured users? To address this knowledge gap, we conducted a quantitative study (N=529) wherein participants played a binary decision-making game with help from an intelligent assistant. Participants were profiled in terms of both actual (measured) expertise and reported familiarity with the task concept. The presence of explanations, level of automation, and number of errors made by the intelligent assistant were manipulated while observing changes in user acceptance of advice. An analysis of cognitive metrics led to three findings for research in intelligent assistants: 1) higher reported familiarity with the task simultaneously predicted more reported trust but less adherence, 2) explanations only swayed people who reported very low task familiarity, and 3) showing explanations to people who reported more task familiarity led to automation bias.

Explainability scenarios: towards scenario-based XAI design

Integral to the adoption and uptake of AI systems in real-world settings is the ability for people to make sense of and evaluate such systems, a growing area of development and design efforts known as XAI (Explainable AI). Recent work has advanced the state of the art, yet a key challenge remains in understanding unique requirements that might arise when XAI systems are deployed into complex settings of use. In helping envision such requirements, this paper turns to scenario-based design, a method that anticipates and leverages scenarios of possible use early on in system development. To demonstrate the value of the scenario-based design method to XAI design, this paper presents a case study of aging-in-place monitoring. Introducing the concept of "explainability scenarios" as resources in XAI design, this paper sets out a forward-facing agenda for further attention to the emergent requirements of explainability-in-use.

The effects of example-based explanations in a machine learning interface

The black-box nature of machine learning algorithms can make their predictions difficult to understand and explain to end-users. In this paper, we propose and evaluate two kinds of example-based explanations in the visual domain, normative explanations and comparative explanations (Figure 1), which automatically surface examples from the training set of a deep neural net sketch-recognition algorithm. To investigate their effects, we deployed these explanations to 1150 users on QuickDraw, an online platform where users draw images and see whether a recognizer has correctly guessed the intended drawing. When the algorithm failed to recognize the drawing, those who received normative explanations felt they had a better understanding of the system, and perceived the system to have higher capability. However, comparative explanations did not always improve perceptions of the algorithm, possibly because they sometimes exposed limitations of the algorithm and may have led to surprise. These findings suggest that examples can serve as a vehicle for explaining algorithmic behavior, but point to relative advantages and disadvantages of using different kinds of examples, depending on the goal.

Automated rationale generation: a technique for explainable AI and its effects on human perceptions

Automated rationale generation is an approach for real-time explanation generation whereby a computational model learns to translate an autonomous agent's internal state and action data representations into natural language. Training on human explanation data can enable agents to learn to generate human-like explanations for their behavior. In this paper, using the context of an agent that plays Frogger, we describe (a) how to collect a corpus of explanations, (b) how to train a neural rationale generator to produce different styles of rationales, and (c) how people perceive these rationales. We conducted two user studies. The first study establishes the plausibility of each type of generated rationale and situates their user perceptions along the dimensions of confidence, humanlike-ness, adequate justification, and understandability. The second study further explores user preferences between the generated rationales with regard to confidence in the autonomous agent, communicating failure and unexpected behavior. Overall, we find alignment between the intended differences in features of the generated rationales and the perceived differences by users. Moreover, context permitting, participants preferred detailed rationales to form a stable mental model of the agent's behavior.

Explaining models: an empirical study of how explanations impact fairness judgment

Ensuring fairness of machine learning systems is a human-in-the-loop process. It relies on developers, users, and the general public to identify fairness problems and make improvements. To facilitate the process we need effective, unbiased, and user-friendly explanations that people can confidently rely on. Towards that end, we conducted an empirical study with four types of programmatically generated explanations to understand how they impact people's fairness judgments of ML systems. With an experiment involving more than 160 Mechanical Turk workers, we show that: 1) Certain explanations are considered inherently less fair, while others can enhance people's confidence in the fairness of the algorithm; 2) Different fairness problems (such as model-wide fairness issues versus case-specific fairness discrepancies) may be more effectively exposed through different styles of explanation; 3) Individual differences, including prior positions and judgment criteria of algorithmic fairness, impact how people react to different styles of explanation. We conclude with a discussion on providing personalized and adaptive explanations to support fairness judgments of ML systems.

What data should I protect?: recommender and planning support for data security analysts

Major breaches of sensitive company data, such as those affecting Facebook's 50 million user accounts in 2018 or Equifax's 143 million user accounts in 2017, are showing the limitations of reactive data security technologies. Companies and government organizations are turning to proactive data security technologies that secure sensitive data at source. However, data security analysts still face two fundamental challenges in data protection decisions: 1) the information overload from the growing number of data repositories and protection techniques to consider; 2) the optimization of protection plans given the current goals and available resources in the organization. In this work, we propose an intelligent user interface for security analysts that recommends what data to protect, visualizes simulated protection impact, and helps build protection plans. In a domain with limited access to expert users and practices, we elicited user requirements from security analysts in industry and modeled data risks based on architectural and conceptual attributes. Our preliminary evaluation suggests that the design improves the understanding and trust of the recommended protections and helps convert risk information into protection plans.

Decision making strategies differ in the presence of collaborative explanations: two conjoint studies

Rating-based summary statistics are ubiquitous in e-commerce, and often are crucial components in personalized recommendation mechanisms. Visual rating summarizations in particular have been identified as an important means to explain why an item is presented or proposed to a user. Largely left unexplored, however, is the extent to which the descriptives of these rating summary statistics influence the decision making of the online consumer. Therefore, we conducted a series of two conjoint experiments to explore how different summarizations of rating distributions (i.e., in the form of number of ratings, mean, variance, skewness, bimodality, or origin of the ratings) impact users' decision making. In a first study with over 200 participants, we identified that users are primarily guided by the mean and the number of ratings, and, to a lesser degree, by the variance and origin of a rating. When probing the maximizing behavioral tendencies of our participants, other sensitivities regarding the summary of rating distributions became apparent. We thus instrumented a follow-up eye-tracking study to explore in more detail how the choices of participants vary in terms of their decision making strategies. This second round with over 40 additional participants supported our hypothesis that users who usually experience higher decision difficulty follow compensatory decision strategies and focus more on the decisions they make. We conclude by outlining how the results of these studies can guide algorithm development, and counterbalance presumable biases in implicit user feedback.
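
As a rough illustration of the distribution summaries named in this abstract, the sketch below computes the number of ratings, mean, variance, skewness, and a bimodality score for a list of star ratings. The function name `summarize_ratings` is hypothetical, the moments are population moments, and the bimodality score uses Sarle's coefficient as an assumed stand-in; the abstract does not say how the authors operationalized these attributes, and the origin of the ratings is not modeled here.

```python
from typing import Dict, List


def summarize_ratings(ratings: List[int]) -> Dict[str, float]:
    """Summarize a rating distribution with population moments (a sketch)."""
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")

    mean = sum(ratings) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((r - mean) ** 2 for r in ratings) / n
    m3 = sum((r - mean) ** 3 for r in ratings) / n
    m4 = sum((r - mean) ** 4 for r in ratings) / n

    if m2 == 0:
        # All ratings identical: skewness and bimodality are undefined,
        # so this sketch reports them as 0.0 by convention.
        skewness, bimodality = 0.0, 0.0
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2  # non-excess kurtosis
        # Sarle's bimodality coefficient (assumed measure, not the paper's).
        bimodality = (skewness ** 2 + 1) / kurtosis

    return {
        "count": float(n),
        "mean": mean,
        "variance": m2,
        "skewness": skewness,
        "bimodality": bimodality,
    }


if __name__ == "__main__":
    # A polarized item: half one-star, half five-star ratings.
    print(summarize_ratings([1, 1, 1, 5, 5, 5]))
```

Two items with the same mean can differ sharply on the remaining attributes, which is the kind of contrast a conjoint design can present to participants.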

SESSION: Intelligent visualization

StoryPrint: an interactive visualization of stories

In this paper, we propose StoryPrint, an interactive visualization of creative storytelling that facilitates individual and comparative structural analyses. This visualization method is intended for script-based media, which has suitable metadata. The pre-visualization process involves parsing the script into different metadata categories and analyzing the sentiment on a character and scene basis. For each scene, the setting, character presence, character prominence, and character emotion of a film are represented as a StoryPrint. The visualization is presented as a radial diagram of concentric rings wrapped around a circular time axis. A user then has the ability to toggle a difference overlay to assist in the cross-comparison of two different scene inputs.

Vulnerable to misinformation?: Verifi!

We present Verifi2, a visual analytic system to support the investigation of misinformation on social media. Various models and studies have emerged from multiple disciplines to detect or understand the effects of misinformation. However, there is still a lack of intuitive and accessible tools that help social media users distinguish misinformation from verified news. Verifi2 uses state-of-the-art computational methods to highlight linguistic, network, and image features that can distinguish suspicious news accounts. By exploring news on a source and document level in Verifi2, users can interact with the complex dimensions that characterize misinformation and contrast how real and suspicious news outlets differ on these dimensions. To evaluate Verifi2, we conduct interviews with experts in digital media, communications, education, and psychology who study misinformation. Our interviews highlight the complexity of the problem of combating misinformation and show promising potential for Verifi2 as an educational tool on misinformation.

Visualizing authorship and contribution of collaborative writing in e-learning environments

Nowadays, several productivity platforms provide effective capabilities to edit collaboratively the content of a document. In educational settings, e-Learning approaches have taken advantage of this functionality to encourage students to join others to complete projects that include the writing of text documents. Although collaborative writing may foster interaction among students, the existing analytical metrics on these platforms are limited and can slow down the process of review by instructors in trying to determine the level of contribution of each student in the document. In this paper, we describe an analytic framework to measure and visualize the contribution in collaborative writing.

The role of user differences in customization: a case study in personalization for infovis-based content

Although there is extensive evidence that personalization of interactive systems can improve the user's experience and satisfaction, it is also known that the two main approaches to deliver personalization, namely via customization or system-driven adaptation, have limitations. In particular, many users do not use customization mechanisms, while adaptation can be perceived as intrusive and opaque. In this paper, we explore an intermediary approach to personalization, namely delivering system-driven support to customization. To this end, we study a customization mechanism that allows users to choose the type and amount of information displayed by means of information visualizations in a system for decision making, and examine the impact of user differences on the effectiveness of this mechanism. Our results show that, for the users who did use the customization mechanism, customization effectiveness was impacted by their levels of visualization literacy and locus of control. These results suggest that the customization mechanism could be improved by system-driven assistance to customize depending on the user's level of visualization literacy and locus of control.

SESSION: Augmented and mixed reality

Analyzing user's task-driven interaction in mixed reality

Mixed reality (MR) provides exciting interaction approaches in several applications. The user experience of interacting in these visually rich environments depends highly on the way the user perceives, processes, and comprehends visual information. In this work, we investigate how Field Dependent and Field Independent users differ in their interaction behavior in an MR environment when asked to perform a specific task. A study was conducted using a Microsoft HoloLens device in which participants interacted with a popular HoloLens application, modified by the authors to log user interaction data in real time. Analysis of the results demonstrates the differences in the visual processing of information, especially in visually complex environments, and the impact on the user's interaction behavior.

PATI: a projection-based augmented table-top interface for robot programming

As robots begin to provide daily assistance to individuals in human environments, their end-users, who do not necessarily have substantial technical training or backgrounds in robotics or programming, will ultimately need to program and "re-task" their robots to perform a variety of custom tasks. In this work, we present PATI, a Projection-based Augmented Table-top Interface for robot programming, through which users are able to use simple, common gestures (e.g., pinch gestures) and tools (e.g., shape tools) to specify table-top manipulation tasks (e.g., pick-and-place) for a robot manipulator. PATI allows users to interact with the environment directly when providing task specifications; for example, users can utilize gestures and tools to annotate the environment with task-relevant information, such as specifying target landmarks and selecting objects of interest. We conducted a user study to compare PATI with a state-of-the-art, standard industrial method for end-user robot programming. Our results show that participants needed significantly less training time before they felt confident in using our system than they did for the industrial method. Moreover, participants were able to program a robot manipulator to complete a pick-and-place task significantly faster with PATI. This work indicates a new direction for end-user robot programming.

Walking with adaptive augmented reality workspaces: design and usage patterns

Mobile augmented reality may eventually replace our smartphones as the primary way of accessing information on the go. However, current interfaces provide little support to walking and to the variety of actions we perform in the real world. To achieve its full potential, augmented reality interfaces must support the fluid way we move and interact in the physical world. We explored how different adaptation strategies can contribute towards this goal. We evaluated design alternatives through contextual studies and identified the key interaction patterns that interfaces for walking should support. We also identified desirable properties of adaptation-based interface techniques, which can be used to guide the design of the next-generation walking-centered augmented reality workspaces.

ImWeb: cross-platform immersive web browsing for online 3D neuron database exploration

Web services have become one major way for people to obtain and explore information nowadays. However, web browsers currently only offer limited data analysis capabilities, especially for large-scale 3D datasets. This project presents a method of immersive web browsing (ImWeb) to enable effective exploration of multiple datasets over the web with augmented reality (AR) techniques. The ImWeb system allows inputs from both the web browser and AR and provides a set of immersive analytics methods for enhanced web browsing, exploration, comparison, and summary tasks. We have also integrated 3D neuron mining and abstraction approaches to support efficient analysis functions. The architecture of ImWeb system flexibly separates the tasks on web browser and AR and supports smooth networking among the system, so that ImWeb can be adopted by different platforms, such as desktops, large displays, and tablets. We use an online 3D neuron database to demonstrate that ImWeb enables new experiences of exploring 3D datasets over the web. We expect that our approach can be applied to various other online databases and become one useful addition to future web services.

SESSION: Explanations in recommender systems

Personalized explanations for hybrid recommender systems

Recommender systems have become pervasive on the web, shaping the way users see information and thus the decisions they make. As these systems get more complex, there is a growing need for transparency. In this paper, we study the problem of generating and visualizing personalized explanations for hybrid recommender systems, which incorporate many different data sources. We build upon a hybrid probabilistic graphical model and develop an approach to generate real-time recommendations along with personalized explanations. To study the benefits of explanations for hybrid recommender systems, we conduct a crowd-sourced user study where our system generates personalized recommendations and explanations for real users of the last.fm music platform. We experiment with 1) different explanation styles (e.g., user-based, item-based), 2) manipulating the number of explanation styles presented, and 3) manipulating the presentation format (e.g., textual vs. visual). We apply a mixed model statistical analysis to consider user personality traits as a control variable and demonstrate the usefulness of our approach in creating personalized hybrid explanations with different style, number, and format.

Explaining recommendations in an interactive hybrid social recommender

Hybrid social recommender systems use social relevance from multiple sources to recommend relevant items or people to users. To make hybrid recommendations more transparent and controllable, several researchers have explored interactive hybrid recommender interfaces, which allow for a user-driven fusion of recommendation sources. In this field of work, the intelligent user interface has been investigated as an approach to increase transparency and improve the user experience. In this paper, we attempt to further promote the transparency of recommendations by augmenting an interactive hybrid recommender interface with several types of explanations. We evaluate user behavior patterns and subjective feedback in a within-subject study (N=33). Results from the evaluation show the effectiveness of the proposed explanation models. The results of the post-treatment survey indicate a significant improvement in the perception of explainability, but this improvement comes with a lower degree of perceived controllability.

To explain or not to explain: the effects of personal characteristics when explaining music recommendations

Recommender systems have been increasingly used in online services that we consume daily, such as Facebook, Netflix, YouTube, and Spotify. However, these systems are often presented to users as a "black box", i.e. the rationale for providing individual recommendations remains unexplained to users. In recent years, various attempts have been made to address this black box issue by providing textual explanations or interactive visualisations that enable users to explore the provenance of recommendations. Among other things, results demonstrated benefits in terms of precision and user satisfaction. Previous research had also indicated that personal characteristics such as domain knowledge, trust propensity and persistence may also play an important role in such perceived benefits. Yet, to date, little is known about the effects of personal characteristics on explaining recommendations. To address this gap, we developed a music recommender system with explanations and conducted an online study using a within-subject design. We captured various personal characteristics of participants and administered both qualitative and quantitative evaluation methods. Results indicate that personal characteristics have significant influence on the interaction and perception of recommender systems, and that this influence changes by adding explanations. Explained recommendations are most beneficial for people with a low need for cognition. For people with a high need for cognition, we observed that explanations could create a lack of confidence. Based on these results, we present some design implications for explaining recommendations.

The effect of explanations and algorithmic accuracy on visual recommender systems of artistic images

There are very few works about explaining content-based recommendations of images in the artistic domain. Current works do not provide a perspective of the many variables involved in the user perception of several aspects of the system such as domain knowledge, relevance, explainability, and trust. In this paper, we aim to fill this gap by studying three interfaces, with different levels of explainability, for artistic image recommendation. Our experiments with N=121 users confirm that explanations of recommendations in the image domain are useful and increase user satisfaction, perception of explainability and relevance. Furthermore, our results show that the observed effects are also dependent on the underlying recommendation algorithm used. We tested two algorithms: Deep Neural Networks (DNN), which has high accuracy, and Attractiveness Visual Features (AVF) with high transparency but lower accuracy. Our results indicate that algorithms should not be studied in isolation, but rather in conjunction with interfaces, since both play a significant role in the perception of explainability and trust for image recommendation. Finally, using the framework by Knijnenburg et al., we provide a comprehensive model which synthesizes the effects between different variables involved in the user experience with explainable visual recommender systems of artistic images.

SESSION: Agent-based IUIs

Digital survivor of sexual assault

The Digital Survivor of Sexual Assault (DS2A) is an interface that allows a user to have a conversational experience with a survivor of sexual assault, using Artificial Intelligence technology and recorded videos. The application uses a statistical classifier to retrieve contextually appropriate pre-recorded video utterances by the survivor, together with dialogue management policies which enable users to conduct simulated conversations with the survivor about the sexual assault, its aftermath, and other pertinent topics. The content in the application has been specifically elicited to support the needs for the training of U.S. Army professionals in the Sexual Harassment/Assault Response and Prevention (SHARP) Program, and the application comes with an instructional support package. The system has been tested with approximately 200 users, and is presently being used in the SHARP Academy's capstone course.

Induction of an active attitude by short speech reaction time toward interaction for decision-making with multiple agents

Interactive decision-making is useful for turning our ambiguous desires into concrete ones through interaction with others. However, in human-agent interaction, the agents are often not regarded as well-experienced consultants but rather as human-centered interfaces that provide information. We aimed to induce an active human attitude toward decision-making interactions with agents by controlling the speech reaction time (SRT) of the agents, so that the agents would be regarded as reliable consultants. We conducted an experiment to investigate whether the SRT could influence the human participant's attitude. We used two kinds of agents: one had no SRT (no-SRT) and the other had an SRT of two seconds (2s-SRT). As a result, we found that the no-SRT agents could keep the participants' speech reaction times short even during the decision-making task, in which the participants need time for careful consideration. In addition, from the analysis of the number of proposed categories and the participants' behavior, we suggest that the participants had an active attitude toward interaction with no-SRT agents.

An intelligent assistant for mediation analysis in visual analytics

Mediation analysis is commonly performed using regressions or Bayesian network analysis in statistics, psychology, and health science; however, it is not effectively supported in existing visualization tools. The lack of assistance poses great risks when people use visualizations to explore causal relationships and make data-driven decisions, as spurious correlations or seemingly conflicting visual patterns might occur. In this paper, we focused on the causal reasoning task over three variables and investigated how an interface could help users reason more efficiently. We developed an interface that facilitates two processes involved in causal reasoning: 1) detecting inconsistent trends, which guides users' attention to important visual evidence, and 2) interpreting visualizations, by providing assisting visual cues and allowing users to compare key visualizations side by side. Our preliminary study showed that the features are potentially beneficial. We discuss design implications and how the features could be generalized for more complex causal analysis.

Who should be my teammates: using a conversational agent to understand individuals and help teaming

We are building an intelligent agent to help teaming efforts. In this paper, we investigate the real-world use of such an agent to understand students deeply and help student team formation in a large university class involving about 200 students and 40 teams. Specifically, the agent interacted with each student in a text-based conversation at the beginning and end of the class. We show how the intelligent agent was able to elicit in-depth information from the students, infer the students' personality traits, and reveal the complex relationships between team personality compositions and team results. We also report on the students' behavior with and impression of the agent. We discuss the benefits and limitations of such an intelligent agent in helping team formation, and the design considerations for creating intelligent agents for aiding in teaming efforts.

BigBlueBot: teaching strategies for successful human-agent interactions

Chatbots are becoming quite popular, with many brands developing conversational experiences using platforms such as IBM's Watson Assistant and Facebook Messenger. However, previous research reveals that users' expectations of what conversational agents can understand and do far outpace their actual technical capabilities. Our work seeks to bridge the gap between these expectations and reality by designing a fun learning experience with several goals: explaining how chatbots work by mapping utterances to a set of intents, teaching strategies for avoiding conversational breakdowns, and increasing desire to use chatbots by creating feelings of empathy toward them. Our experience, called BigBlueBot, consists of interactions with two chatbots in which breakdowns occur and the user (or chatbot) must recover using one or more repair strategies. In a Mechanical Turk evaluation (N=88), participants learned strategies for having successful human-agent interactions, reported feelings of empathy toward the chatbots, and expressed a desire to interact with chatbots in the future.

SESSION: Trust in automation

Do I trust my machine teammate?: an investigation from perception to decision

In the human-machine collaboration context, understanding the reason behind each human decision is critical for interpreting the performance of the human-machine team. Via an experimental study of a system with varied levels of accuracy, we describe how human trust interplays with system performance, human perception and decisions. The study reveals that humans are able to perceive the performance of automatic systems and of themselves, and adjust their trust levels according to the accuracy of systems. A system accuracy of 70% appears to be a threshold between increasing and decreasing human trust and system usage. We also show that trust can be derived from a series of users' decisions rather than from a single one, and relates to the perceptions of users. A general framework depicting how trust and perception affect human decision making is proposed, which can be used as future guidelines for human-machine collaboration design.

Effects of the source of advice and decision task on decisions to request expert advice

Automation has become a deeply integrated aspect of our everyday activities. Many factors affect whether we rely on and comply with recommendations that we receive, from both human and automated experts. In the present study, participants were presented with advice from either a human or automated expert to complete one of two decision tasks: assigning teams to find human survivors or assigning teams to find and repair oil wells. Participants played 1 of 4 modified versions of the Search and Rescue video game and, on each trial, were asked to choose 3 of 12 locations to which to send search teams. Participants could request advice from a drone or human expert (confederate), depending on the condition to which they were assigned. Participants utilized automation more consistently than the human expert regardless of the decision task. We discuss possible explanations of our results and how they affect design considerations for automation.

SESSION: User-adaptive IUIs

RL-KLM: automating keystroke-level modeling with reinforcement learning

The Keystroke-Level Model (KLM) is a popular model for predicting users' task completion times with graphical user interfaces. KLM predicts task completion times as a linear function of elementary operators. However, the policy, or the assumed sequence of the operators that the user executes, needs to be prespecified by the analyst. This paper investigates Reinforcement Learning (RL) as an algorithmic method to obtain the policy automatically. We define the KLM as a Markov Decision Process, and show that when solved with RL methods, this approach yields user-like policies in simple but realistic interaction tasks. RL-KLM offers a quick way to obtain a global upper bound for user performance. It opens up new possibilities to use KLM in computational interaction. However, scalability and validity remain open issues.
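
To make the "linear function of elementary operators" concrete, the following is a minimal sketch of a classic KLM estimate for an analyst-specified policy. The operator durations are commonly cited textbook values used purely for illustration, and `klm_time` is a hypothetical helper; neither is taken from the RL-KLM paper, and the reinforcement learning step that would discover the policy is not shown.

```python
# Illustrative operator durations in seconds (commonly cited textbook
# values; an assumption, not figures from the RL-KLM paper).
OPERATOR_SECONDS = {
    "K": 0.2,   # keystroke or button press
    "P": 1.1,   # point at a target with a mouse
    "H": 0.4,   # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_time(policy: str) -> float:
    """Predict task time as the sum of operator durations in the policy.

    `policy` is the analyst-specified operator sequence, e.g. "MHPK".
    The estimate is linear: each operator contributes its fixed duration.
    """
    return sum(OPERATOR_SECONDS[op] for op in policy)


if __name__ == "__main__":
    # Think, reach for the mouse, point at a field, click, return to the
    # keyboard, then type four characters.
    print(round(klm_time("MHPKHKKKK"), 2))
```

In this classic form the analyst has to write the sequence string by hand, which is the step the abstract proposes to automate.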

SAM: a modular framework for self-adapting web menus

This paper presents SAM, a modular and extensible JavaScript framework for self-adapting menus on webpages. SAM allows control of two elementary aspects for adapting web menus: (1) the target policy, which assigns scores to menu items for adaptation, and (2) the adaptation style, which specifies how they are adapted on display. By decoupling them, SAM enables the exploration of different combinations independently. Several policies from literature are readily implemented, and paired with adaptation styles such as reordering and highlighting. The process, including user data logging, is local, offering privacy benefits and eliminating the need for server-side modifications. Researchers can use SAM to experiment with adaptation policies and styles, and benchmark techniques in an ecological setting with real webpages. Practitioners can make websites self-adapting, and end-users can dynamically personalise typically static web menus.
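
The decoupling of target policy and adaptation style can be sketched as two interchangeable functions. SAM itself is a JavaScript framework; the Python below is only a conceptual illustration, and every name in it (`frequency_policy`, `reorder_style`, `highlight_style`, `adapt`) is hypothetical rather than part of SAM's API.

```python
from collections import Counter
from typing import Callable, Dict, List

Scores = Dict[str, float]
Policy = Callable[[List[str], List[str]], Scores]
Style = Callable[[List[str], Scores], List[str]]


def frequency_policy(click_log: List[str], items: List[str]) -> Scores:
    """Target policy: score each menu item by how often it was clicked."""
    counts = Counter(click_log)
    return {item: float(counts[item]) for item in items}


def reorder_style(items: List[str], scores: Scores) -> List[str]:
    """Adaptation style: move higher-scoring items to the front.

    Python's sort is stable, so items with equal scores keep their order.
    """
    return sorted(items, key=lambda item: -scores[item])


def highlight_style(items: List[str], scores: Scores) -> List[str]:
    """Adaptation style: keep the order, mark the single top-scoring item."""
    top = max(items, key=lambda item: scores[item])
    return [f"*{item}*" if item == top else item for item in items]


def adapt(items: List[str], click_log: List[str],
          policy: Policy, style: Style) -> List[str]:
    """Combine any policy with any style; neither knows about the other."""
    return style(items, policy(click_log, items))


if __name__ == "__main__":
    menu = ["Home", "About", "Shop", "Contact"]
    log = ["Shop", "Shop", "Contact"]
    print(adapt(menu, log, frequency_policy, reorder_style))
    print(adapt(menu, log, frequency_policy, highlight_style))
```

Because `adapt` only depends on the two function signatures, a new policy can be tried with every existing style without touching either side.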

Transformer: a database-driven approach to generating forms for constrained interaction

Form-based data insertion or querying is often one of the most time-consuming steps in data-driven workflows. The small screen and lack of physical keyboard in devices such as smartphones and smartwatches introduce imprecision during user input. This can lead to data quality issues such as incomplete responses and errors, increasing user input time. We present Transformer, a system that leverages the contents of the database to automatically optimize forms for constrained input settings. Our cost function models the user input effort based on the schema and data distribution. This is used by Transformer to find the user interface (UI) widget and layout with ideal input cost for each form field. We demonstrate through user studies that Transformer provides a significantly improved user experience, with up to 50% and 57% reduction in form completion time for smartphones and smartwatches respectively.
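
The idea of picking a widget per form field from the data distribution can be illustrated with a toy cost model. Everything below is an assumption made for illustration: the three widget types, the cost formulas, and the names `widget_costs` and `choose_widget` are hypothetical and are not the cost function described in the paper.

```python
import math
from typing import Dict, List


def widget_costs(values: List[str]) -> Dict[str, float]:
    """Estimate input effort per widget from a column's existing values.

    Toy assumptions: typing costs one unit per character of an average
    value, a dropdown costs one tap plus scrolling that grows with the
    number of distinct values, and radio buttons cost one tap but only
    fit on a small screen when there are at most four options.
    """
    if not values:
        raise ValueError("need at least one existing value for the field")
    distinct = len(set(values))
    avg_len = sum(len(v) for v in values) / len(values)
    return {
        "text_box": avg_len,
        "dropdown": 1.0 + distinct / 4.0,
        "radio_buttons": 1.0 if distinct <= 4 else math.inf,
    }


def choose_widget(values: List[str]) -> str:
    """Return the widget with the lowest estimated input cost."""
    costs = widget_costs(values)
    return min(costs, key=lambda widget: costs[widget])


if __name__ == "__main__":
    print(choose_widget(["yes", "no", "yes"]))
    print(choose_widget([f"city{i:03d}" for i in range(100)]))
```

A low-cardinality column favors a tap-based widget, while a column with many distinct short values falls back to typing; a real system would also have to account for layout and device constraints.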

SearchLens: composing and capturing complex user interests for exploratory search

Whether figuring out where to eat in an unfamiliar city or deciding which apartment to live in, consumer generated data (i.e. reviews and forum posts) are often an important influence in online decision making. To make sense of these rich repositories of diverse opinions, searchers need to sift through a large number of reviews to characterize each item based on aspects that they care about. We introduce a novel system, SearchLens, where searchers build up a collection of "Lenses" that reflect their different latent interests, and compose the Lenses to find relevant items across different contexts. Based on the Lenses, SearchLens generates personalized interfaces with visual explanations that promote transparency and enable deeper exploration. While prior work found searchers may not wish to put in effort specifying their goals without immediate and sufficient benefits, results from a controlled lab study suggest that our approach incentivized participants to express their interests more richly than in a baseline condition, and a field study showed that participants found benefits in SearchLens while conducting their own tasks.

SESSION: Automated driving

Improving take-over quality in automated driving by interrupting non-driving tasks

With automated driving advancing, first production models started to incorporate the technology. However, until full autonomy is achieved, drivers always need to stay available to take over control from the car. This requirement has proven challenging: increased levels of automation reduce drivers' situational awareness and driving performance can suffer, especially in the critical moments after take-over. While manual-driving research introduced strategies to direct drivers' attention back to the road, notably interruptions of the non-driving task, the efficacy of these interventions on automated driving remains unclear. To investigate this, 53 participants drove in an automated simulator while performing tasks on an IVIS. With task interruptions, they reported increased situational awareness and showed improved reaction times during take-over, particularly for low-effort tasks (watching movies). Different to manual driving, halting tasks did not suffice; instead, we displayed the driving scene. Results question effects of situational awareness on take-over and offer solutions for manufacturers.

Assessing public perception of self-driving cars: the autonomous vehicle acceptance model

We introduce the Autonomous Vehicle Acceptance Model (AVAM), a model of user acceptance for autonomous vehicles, adapted from existing models of user acceptance for generic technologies. A 26-item questionnaire is developed in accordance with the model and a survey conducted to evaluate 6 autonomy scenarios. In a pilot survey (n = 54) and follow-up survey (n = 187), the AVAM presented good internal consistency and replicated patterns from previous surveys. Results showed that users were less accepting of high autonomy levels and displayed significantly lower intention to use highly autonomous vehicles. We also assess expected driving engagement of hands, feet and eyes which are shown to be lower for full autonomy compared with all other autonomy levels. This highlighted that partial autonomy, regardless of level, is perceived to require uniformly higher driver engagement than full autonomy. These results can inform experts regarding public perception of autonomy across SAE levels. The AVAM and associated questionnaire enable standardised evaluation of AVs across studies, allowing for meaningful assessment of changes in perception over time and between different technologies.

Why do you like to drive automated?: a context-dependent analysis of highly automated driving to elaborate requirements for intelligent user interfaces

Technology acceptance is a critical factor influencing the adoption of automated vehicles. Consequently, manufacturers feel obliged to design automated driving systems in a way that accounts for negative effects of automation on user experience. Recent publications confirm that full automation may fail to satisfy important user needs. To counteract this, the adoption of Intelligent User Interfaces (IUIs) could play an important role. In this work, we focus on the evaluation of the impact of scenario type (represented by variations of road type and traffic volume) on the fulfillment of psychological needs. Results of a qualitative study (N=30) show that the scenario has a high impact on how users perceive the automation. Based on this, we discuss the potential of adaptive IUIs in the context of automated driving. In detail, we look at the aspects of trust, acceptance, and user experience and their impact on IUIs in different driving situations.

S(C)ENTINEL: monitoring automated vehicles with olfactory reliability displays

Overreliance on technology is safety-critical and it is assumed that this could have been a main cause of severe accidents with automated vehicles. To ease the complex task of permanently monitoring vehicle behavior in the driving environment, researchers have proposed implementing reliability/uncertainty displays. Such displays allow drivers to estimate whether or not an upcoming intervention is likely. However, presenting uncertainty just adds more visual workload on drivers, who might also be engaged in secondary tasks. We suggest using olfactory displays as a potential solution to communicate system uncertainty and conducted a user study (N=25) in a high-fidelity driving simulator. Results of the experiment (conditions: no reliability display, purely visual reliability display, and visual-olfactory reliability display) comprising both objective (task performance) and subjective (technology acceptance model, trust scales, semi-structured interviews) measures suggest that olfactory notifications could become a valuable extension for calibrating trust in automated vehicles.

SESSION: Collaborative interfaces

Photo sleuth: combining human expertise and face recognition to identify historical portraits

Identifying people in historical photographs is important for preserving material culture, correcting the historical record, and creating economic value, but it is also a complex and challenging task. In this paper, we focus on identifying portraits of soldiers who participated in the American Civil War (1861-65), the first widely-photographed conflict. Many thousands of these portraits survive, but only 10-20% are identified. We created Photo Sleuth, a web-based platform that combines crowdsourced human expertise and automated face recognition to support Civil War portrait identification. Our mixed-methods evaluation of Photo Sleuth one month after its public launch showed that it helped users successfully identify unknown portraits and provided a sustainable model for volunteer contribution. We also discuss implications for crowd-AI interaction and person identification pipelines.

Popup: reconstructing 3D video using particle filtering to aggregate crowd responses

Collecting a sufficient amount of 3D training data for autonomous vehicles to handle rare, but critical, traffic events (e.g., collisions) may take decades of deployment. Abundant video data of such events from municipal traffic cameras and video sharing sites (e.g., YouTube) could provide a potential alternative, but generating realistic training data in the form of 3D video reconstructions is a challenging task beyond the current capabilities of computer vision. Crowdsourcing the annotation of necessary information could bridge this gap, but the level of accuracy required to obtain usable reconstructions makes this task nearly impossible for non-experts. In this paper, we propose a novel hybrid intelligence method that combines annotations from workers viewing different instances (video frames) of the same target (3D object), and uses particle filtering to aggregate responses. Our approach leverages temporal dependencies between video frames, enabling higher quality through more aggressive filtering. The proposed method results in a 33% reduction in the relative error of position estimation compared to a state-of-the-art baseline. Moreover, our method enables skipping (self-filtering) challenging annotations, reducing the total annotation time for hard-to-annotate frames by 16%. Our approach provides a generalizable means of aggregating more accurate crowd responses in settings where annotation is especially challenging or error-prone.
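
To make the aggregation idea concrete, here is a minimal sketch of a bootstrap particle filter that fuses noisy per-frame annotations of a single 1D position. It is a generic textbook filter under assumed settings (constant-velocity motion, Gaussian annotation noise, the specific noise scales), and is not the Popup implementation; the function name and all parameters are hypothetical.

```python
import math
import random


def particle_filter_aggregate(frames, n_particles=500, obs_sigma=1.0, seed=0):
    """Estimate one position per frame from lists of noisy annotations.

    frames: list of lists; frames[t] holds the worker annotations for frame t.
    A frame may be empty (skipped), except the first, which seeds the particles.
    """
    rng = random.Random(seed)
    first_mean = sum(frames[0]) / len(frames[0])
    # Each particle is a (position, velocity) hypothesis.
    particles = [(rng.gauss(first_mean, obs_sigma), rng.gauss(0.0, 1.0))
                 for _ in range(n_particles)]
    estimates = []
    for t, annotations in enumerate(frames):
        if t > 0:
            # Predict: assumed constant-velocity motion with small process noise.
            particles = [(p + v + rng.gauss(0.0, 0.2), v + rng.gauss(0.0, 0.1))
                         for p, v in particles]
        # Weight: Gaussian log-likelihood of all annotations for this frame.
        log_w = [sum(-((a - p) ** 2) / (2 * obs_sigma ** 2) for a in annotations)
                 for p, _ in particles]
        top = max(log_w)  # subtract the maximum to avoid numerical underflow
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Estimate: weighted mean of the particle positions.
        estimates.append(sum(w * p for w, (p, _) in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Because the motion model carries information from one frame to the next, a frame with no annotations still receives an estimate: its weights are uniform and the prediction step alone moves the particles.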

Assisting group activity analysis through hand detection and identification in multiple egocentric videos

Research in group activity analysis has focused on monitoring work and evaluating group and individual performance, which can inform improvements in future group interactions. As a new means to examine individual or joint actions in group activity, our work investigates the potential of detecting and disambiguating the hands of each person in first-person point-of-view videos. Building on recent developments in automated hand-region extraction from videos, we develop a new multiple-egocentric-video browsing interface that gives easy access to frames showing 1) individual action, when only the viewer's hands are detected, 2) joint action, when collective hands are detected, and 3) the viewer checking others' actions, when only their hands are detected. We evaluate the effectiveness of our interface with the proposed hand-related features, which can help viewers perceive actions of interest in the complex analysis of videos involving co-occurring behaviors of multiple people.

Explainable modeling of annotations in crowdsourcing

Aggregation models for improving the quality of annotations collected via crowdsourcing have been widely studied, but far less has been done to explain why annotators make the mistakes that they do. To this end, we propose a joint aggregation and worker clustering model that detects patterns underlying crowd worker labels to characterize varieties of labeling errors. We evaluate our approach on a Named Entity Recognition dataset labeled by Mechanical Turk workers in both a retrospective experiment and a small human study. The former shows that our joint model improves the quality of clusters vs. aggregation followed by clustering. Results of the latter suggest that clusters aid human sense-making in interpreting worker labels and predicting worker mistakes. By enabling better explanation of annotator mistakes, our model creates a new opportunity to help Requesters improve task instructions and to help crowd annotators learn from their mistakes. Source code, data, and supplementary material are shared online.

CoSummary: adaptive fast-forwarding for surgical videos by detecting collaborative scenes using hand regions and gaze positions

This paper presents CoSummary, an adaptive video fast-forwarding technique for browsing surgical videos recorded by wearable cameras. Current wearable technologies allow us to record complex surgical skills; however, an efficient browsing technique for these videos is not well established. In order to assist browsing surgical videos, our study focuses on adaptively changing playback speeds by learning and detecting collaborative scenes based on surgeon hand placement and gaze information. Our evaluation shows that the proposed method is able to highlight important collaborative scenes and skip less important scenes during surgical procedures. We have also performed a subjective study with surgeons in order to obtain professional feedback. The results confirmed the effectiveness of the proposed method in comparison to uniform video fast-forwarding.

SESSION: Interactive machine learning

Towards rapid interactive machine learning: evaluating tradeoffs of classification without representation

Our contribution is the design and evaluation of an interactive machine learning interface that rapidly provides the user with model feedback after every interaction. To address visual scalability, this interface communicates with the user via a "tip of the iceberg" approach, where the user interacts with a small set of recommended instances for each class. To address computational scalability, we developed an O(n) classification algorithm that incorporates user feedback incrementally, and without consulting the data's underlying representation matrix. Our computational evaluation showed that this algorithm has similar accuracy to several off-the-shelf classification algorithms with small amounts of labeled data. Empirical evaluation revealed that users performed better using our design compared to an equivalent active learning setup.

Where can my career take me?: harnessing dialogue for interactive career goal recommendations

Career goals represent a special case for recommender systems and require considering both short and long term goals. Recommendations must represent a trade-off between relevance to the user, achievability and aspirational goals to move the user forward in their career. Users may have different motivations and concerns when looking for a new long term goal, so involving the user in the recommender process becomes even more important than in other domains. Additionally, the cost to the user of making a bad decision is much higher than investing two hours in watching a movie they don't like or listening to an unappealing song. As a result, we feel career recommendations are a unique opportunity to truly engage the user in an interactive recommender, as we believe they will invest the cognitive effort. In this paper, we present an interactive career goal recommender framework that leverages the power of dialogue to allow the user to interactively improve the recommendations and bring their own preferences to the system. The underlying recommendation algorithm is a novel solution that suggests both short and long term goals by utilizing the sequential patterns extracted from career trajectories that are enhanced with features of the supporting user profiles. The effectiveness of the proposed solution is demonstrated with extensive experiments on two real world data sets.

Towards human-guided machine learning

Automated Machine Learning (AutoML) systems are emerging that automatically search for possible solutions from a large space of possible kinds of models. Although fully automated machine learning is appropriate for many applications, users often have knowledge that supplements and constrains the available data and solutions. This paper proposes human-guided machine learning (HGML) as a hybrid approach where a user interacts with an AutoML system and tasks it to explore different problem settings that reflect the user's knowledge about the data available. We present: 1) a task analysis of HGML that shows the tasks that a user would want to carry out, 2) a characterization of two scientific publications, one in neuroscience and one in political science, in terms of how the authors would search for solutions using an AutoML system, 3) requirements for HGML based on those characterizations, and 4) an assessment of existing AutoML systems in terms of those requirements.

Peripheral vision: a new killer app for smart glasses

Most smart glasses have a small and limited field of view. The head-mounted display often spreads between the human central and peripheral vision. In this paper, we exploit this characteristic to display information in the peripheral vision of the user. We introduce a mobile peripheral vision model, which can be used on any smart glasses with a head-mounted display without any additional hardware requirement. This model taps into the blocked peripheral vision of a user and simplifies multi-tasking when using smart glasses. To demonstrate the potential applications of this model, we implement an application for indoor and outdoor navigation. We conduct an experiment with 20 people on both a smartphone and smart glasses to evaluate our model in indoor and outdoor conditions. Users report spending at least 50% less time looking at the screen by exploiting their peripheral vision with smart glasses. 90% of the users agree that using the model for navigation is more practical than standard navigation applications.

Investigating the feasibility of finger identification on capacitive touchscreens using deep learning

Touchscreens enable intuitive mobile interaction. However, touch input is limited to 2D touch locations which makes it challenging to provide shortcuts and secondary actions similar to hardware keyboards and mice. Previous work presented a wide range of approaches to provide secondary actions by identifying which finger touched the display. While these approaches are based on external sensors which are inconvenient, we use capacitive images from mobile touchscreens to investigate the feasibility of finger identification. We collected a dataset of low-resolution fingerprints and trained convolutional neural networks that classify touches from eight combinations of fingers. We focused on combinations that involve the thumb and index finger as these are mainly used for interaction. As a result, we achieved an accuracy of over 92% for a position-invariant differentiation between left and right thumbs. We evaluated the model and two use cases that users find useful and intuitive. We publicly share our data set (CapFingerId) comprising 455,709 capacitive images of touches from each finger on a representative mutual capacitive touchscreen and our models to enable future work using and improving them.

SESSION: Multi-modal interfaces & experience transfer

MyoSign: enabling end-to-end sign language recognition with wearables

Automatic sign language recognition is an important milestone in facilitating the communication between the deaf community and hearing people. Existing approaches are either intrusive or susceptible to ambient environments and user diversity. Moreover, most of them perform only isolated word recognition, not sentence-level sequence translation. In this paper, we present MyoSign, a deep learning based system that enables end-to-end American Sign Language (ASL) recognition at both word and sentence levels. We leverage a lightweight wearable device which can provide inertial and electromyography signals to non-intrusively capture signs. First, we propose a multimodal Convolutional Neural Network (CNN) to abstract representations from inputs of different sensory modalities. Then, a bidirectional Long Short Term Memory (LSTM) is exploited to model temporal dependencies. On top of the networks, we employ Connectionist Temporal Classification (CTC) to avoid explicit temporal segmentation and achieve end-to-end continuous sign language recognition. We evaluate MyoSign on 70 commonly used ASL words and 100 ASL sentences from 15 volunteers. Our system achieves an average accuracy of 93.7% at word-level and 93.1% at sentence-level in user-independent settings. In addition, MyoSign can recognize sentences unseen in the training set with 92.4% accuracy. The encouraging results indicate that MyoSign can be a meaningful buildup in the advancement of sign language recognition.
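
The role of CTC in removing the need for frame-level segmentation can be illustrated with its standard greedy (best-path) decoding rule: take the most likely label per frame, merge consecutive repeats, then drop the blank symbol. The sketch below shows only this generic decoding step; the label names and probabilities are made up for illustration, and it is not MyoSign's network or decoder.

```python
BLANK = "_"  # the CTC blank symbol (the name chosen here is arbitrary)


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    output = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def greedy_decode(frame_probs, labels, blank=BLANK):
    """Pick the most probable label per frame, then apply the CTC collapse."""
    best_path = [labels[max(range(len(p)), key=p.__getitem__)]
                 for p in frame_probs]
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    labels = [BLANK, "I", "LIKE", "TEA"]
    # Hypothetical per-frame probabilities over the four labels above.
    probs = [
        [0.1, 0.8, 0.05, 0.05],
        [0.1, 0.7, 0.1, 0.1],
        [0.9, 0.03, 0.03, 0.04],
        [0.1, 0.1, 0.7, 0.1],
        [0.2, 0.1, 0.1, 0.6],
    ]
    print(greedy_decode(probs, labels))
```

Since a blank between two identical labels keeps them separate, the same word can be emitted twice in a row, while a word held over several frames is emitted once.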

Discovering natural language commands in multimodal interfaces

Discovering what to say and how to say it remains a challenge for users of multimodal interfaces supporting speech input. Users end up "guessing" commands that a system might support, often leading to interpretation errors and frustration. One solution to this problem is to display contextually relevant command examples as users interact with a system. The challenge, however, is deciding when, how, and which examples to recommend. In this work, we describe an approach for generating and ranking natural language command examples in multimodal interfaces. We demonstrate the approach using a prototype touch- and speech-based image editing tool. We experiment with augmentations of the UI to understand when and how to present command examples. Through an online user study, we evaluate these alternatives and find that in-situ command suggestions promote discovery and encourage the use of speech input.

Exemplar based experience transfer

Banners are present in several forms and a person might be inspired by one or more of these. However, designing banners is a non-trivial task, especially for novices. Starting from a blank canvas can often be overwhelming, and exploring alternatives is time-consuming. In this paper, we propose an automatic approach to transfer a novice user's content into an example banner. Our algorithm begins with extracting the template of the example banner via a semantic segmentation approach. This is followed by an energy-based optimization framework to combine multiple design elements and arrive at an optimal layout. A crowd-sourced experiment comparing our automatic results against banners designed by creative professionals indicates the viability of the proposed work.