UMAP '19: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization

SESSION: Keynote & Invited Talks

Session details: Keynote & Invited Talks

Modeling Behavior and Designing Evidence-Based Technologies: What We Can Learn for Empirical Data

To be effective, modern technological applications should take into account the needs, preferences, capabilities, and limitations of human users. In recent years, this requirement has made it more imperative to understand human cognition and its constraints in greater detail. Cognitive processes and their underlying neural substrates are traditionally investigated with laboratory studies that yield data of different forms, ranging from accuracy and reaction time data in behavioural experiments to electrophysiological responses and neuro-imaging data in neuroscience studies. But how do such data enable psychologists and other scientists to draw conclusions about cognition? Also, how can the extracted knowledge be exploited for the design of evidence-based smart systems and innovative technologies? In this talk, I will address these questions by drawing examples from my research that employs various methods and techniques, including behavioural experiments in Virtual Reality, eye-tracking, and physiological recordings. Although most of this research focuses on how people attend to, perceive, and memorize spatial information, studies investigating more general cognitive mechanisms (e.g., selective attention and executive functions) will also be presented.

Engagement, Metrics and Personalisation: the Good, the Bad and the Ugly

User engagement plays a central role in companies and organisations operating online services. A main challenge is to leverage knowledge about the online interaction of users to understand what engages them in the short term and, more importantly, in the long term. Two critical steps in improving user engagement are defining the right metrics and properly optimising for them. A common way that engagement is measured and understood is through the definition and development of metrics of user satisfaction, which can act as a proxy of short-term user engagement, mostly at session level. In the context of recommender systems, developing a better understanding of how users interact (implicit signals) with them during their online session is important for developing metrics of user satisfaction. Detecting and understanding implicit signals of user satisfaction are essential for enhancing the quality of the recommendations. When users interact with the recommendations served to them, they leave behind fine-grained traces of interaction patterns, which can be leveraged to predict how satisfying their experience was. This talk will present various works and personal thoughts on how to measure user engagement. It will discuss the definition and development of metrics of user satisfaction that can be used as a proxy of user engagement, and will include cases of good, bad and ugly scenarios. An important message will be to show that, when aiming to personalise the recommendations, it is important to consider the heterogeneity of both user and content to formalise the notion of satisfaction, and in turn design the appropriate satisfaction metrics to capture these.

Towards Utter Well-Being: Personalization for Guardian Angels

Researchers claim that we are facing a global loneliness epidemic, and that mental illness, anxiety disorders, stress and burnout are on the rise. Technology, such as social media, is often found to have a detrimental effect on mental health, self-esteem and sleep, and to cause anxiety and feelings of loneliness. This talk is about how adaptive systems can actively improve well-being, instead of contributing to making it worse. We will discuss different ways of doing so, the work already done, the challenges faced, and our vision of a new kind of personalized systems that act as guardian angels. First, systems can provide emotional support, adapted to the recipient's characteristics such as their personality, affective state, cultural background, and stressors experienced. Second, systems can help humans provide emotional support. People often struggle to support others, and may say something that is counterproductive or nothing at all. Systems can train people on how to provide support. They can also mediate emotional support, adapting support messages to both the support giver and recipient, taking into account for example the closeness of relationships and people's personality. Third, systems can support and motivate people to adopt behaviours that improve their well-being and that of others, and to better regulate their emotions. There has been much research on persuasive technology to support people in changing behaviours, and it has been shown that both the behaviour change techniques used and the attributes of those techniques need adapting. Whilst much persuasive technology research has focused on physical well-being and sustainability, the emphasis in this presentation will be on mental well-being and encouraging people to help each other. Fourth, systems can team people up. Systems can decide who is best placed to provide support and motivation, encouraging particular people to support (or ask for help from) particular other people. Additionally, adaptive group formation (or peer-to-peer recommendations) can be used for joint problem solving scenarios, with a system deciding or recommending who should work with whom. There are many benefits to group work, but it is also often a source of negative emotions. Adaptive group formation can consider affect and personality in addition to expertise, to minimize such negative emotions. Finally, systems can improve the well-being of groups and not just individuals. People's well-being is influenced by the well-being of others in their surroundings, and people's actions impact the well-being of others. Systems can monitor group well-being. They can encourage and support effective group behaviours, for example, by providing feedback on how group members and the group as a whole function. They can support the building of group identity and cohesion. They can support groups in making decisions that are good for group well-being. Overall, we envision adaptive systems as effective and emotionally intelligent contributors in the community, improving the way people interact, and acting like guardian angels.

SESSION: ACM UMAP 2019 Main Track

Session details: ACM UMAP 2019 Main Track

Justifying Recommendations through Aspect-based Sentiment Analysis of Users Reviews

In this paper we present a methodology to justify the suggestions generated by a recommendation algorithm through the identification of relevant and distinguishing characteristics of the recommended item, automatically extracted by mining users' reviews. Our approach relies on a combination of natural language processing and sentiment analysis techniques, and is based on the following steps: (1) a set of users' reviews discussing the recommended item is gathered and analyzed; (2) the distinguishing aspects that characterize the item are extracted and a ranking function is used to identify the most relevant ones; (3) excerpts of the reviews discussing such aspects are extracted and a natural language template is filled in through the aggregation of these sentences. This represents the final output of the algorithm, which is provided to the user as justification of the recommendation she received. In the experimental evaluation, we carried out a user study (N=296, 73.6% male) aiming to investigate the effectiveness of our methodology in two different domains, namely movies and books. Results showed that our technique can provide users with rich and satisfying justifications. Moreover, our experiment also showed that users prefer review-based justifications to other explanation strategies, and this finding further confirmed the effectiveness of the approach.

Towards Social Choice-based Explanations in Group Recommender Systems

Explanations help users to better understand why a set of items has been recommended. Compared to single-user recommender systems, explanations in group recommender systems have further goals. Examples are fairness, which helps to take group members' preferences into account as much as possible, and consensus, which persuades group members to agree on a decision. This paper proposes different explanation types and investigates which explanation best helps to increase the fairness perception, consensus perception, and satisfaction of group members with regard to group recommendations. We conducted a user study to evaluate the proposed explanations. The results show that explanations which take into account the preferences of all or the majority of group members achieve the best results in terms of the mentioned aspects. Moreover, there exist positive correlations among these aspects, i.e., as the perceived fairness (or the perceived consensus) of explanations increases, so does the satisfaction of users with regard to group recommendations. In addition, in the context of repeated decisions, the inclusion of group members' satisfaction from previous decisions in the explanations helps to improve the fairness perception of users with regard to group recommendations.

Evaluating Visual Explanations for Similarity-Based Recommendations: User Perception and Performance

Recommender systems help users to reduce information overload. In recent years, enhancing explainability in recommender systems has drawn more and more attention in the field of Human-Computer Interaction (HCI). However, it is not clear whether a user-preferred explanation interface can maintain the same level of performance while the users are exploring or comparing the recommendations. In this paper, we introduce a participatory process of designing explanation interfaces with multiple explanatory goals for three similarity-based recommendation models. We investigate the relations of user perception and performance with two user studies. In the first study (N=15), we conducted card-sorting and semi-structured interviews to identify the user-preferred interfaces. In the second study (N=18), we carried out a performance-focused evaluation of six explanation interfaces. The results suggest that the user-preferred interface may not guarantee the same level of performance.

Visual Annotations for Hybrid Graph-based User Model

Structured user model data not only allow system personalization, but also may be of interest as a source for analysis: in particular, for the study of general trends and for the detection of anomalies in preferences and mutually-referenced features among different user models. Such sources are multidimensional and interrelated, and have recently started to be represented as graph-based datasets. Among the most effective ways of studying such data is visual exploration based on data-driven graph drawing approaches: in particular, node-link and node-link-group diagrams. The paper provides an overview of advanced approaches to the graphical representation of multidimensional data derived from user modeling and presents a proposal for developing flexible and scalable user interfaces for the hypergraph-based visual exploration of relations within a user model (UM). We then apply these principles to the visualization of an existing adaptive system.

Effect of Values and Technology Use on Exercise: Implications for Personalized Behavior Change Interventions

Technology has recently been recruited in the war against the ongoing obesity crisis; however, the adoption of Health & Fitness applications for regular exercise is a struggle. In this study, we present a unique demographically representative dataset of 15k US residents that combines technology use logs with surveys on moral views, human values, and emotional contagion. Combining these data, we provide a holistic view of individuals to model their physical exercise behavior. First, we show which values determine the adoption of Health & Fitness mobile applications, finding that users who prioritize the value of purity and de-emphasize values of conformity, hedonism, and security are more likely to use such apps. Further, we achieve a weighted AUROC of .673 in predicting whether an individual exercises, and we also show that the application usage data allows for substantially better classification performance (.608) compared to using basic demographics (.513) or internet browsing data (.546). We also find a strong link between exercise and respondents' socioeconomic status, as well as the value of happiness. Using these insights, we propose actionable design guidelines for persuasive technologies targeting health behavior modification.

Rating-based Preference Elicitation for Recommendation of Stress Intervention

In recent years, recommender systems have emerged as a key component for personalization in health applications. Central in the development of recommender systems is rating-based preference elicitation, based both on single-criterion and multi-criteria rating. Though its use has already been studied in various domains of recommender systems, far too little attention has been paid to preference elicitation in health recommender systems (HRS). The purpose of this paper is to develop a better understanding of this preference elicitation by studying the criteria that users consider when they rate a health promotion recommendation from HRS, and accordingly, to offer a design solution as a functional feedback model for mobile health applications. This paper investigates the user-perceived importance of various criteria, as well as latent factors for eliciting user feedback on the recommendations. It also reports the relationship of explanation and trust to the overall rating. By aggregating a list of all possible criteria, we further discover that not all criteria are equally important to users, and that the effectiveness of a recommendation plays a dominant role.

WalkWithMe: Personalized Goal Setting and Coaching for Walking in People with Multiple Sclerosis

People with Multiple Sclerosis (pwMS) suffer from a diverse set of symptoms such as fatigue, pain, depression, and decline in motor and cognitive function. It has been proven that physical activity has a positive effect on most of these symptoms. However, many pwMS lead sedentary lives, and do not meet the guidelines for physical activity. We propose WalkWithMe, a mobile application that supports pwMS in walking. WalkWithMe coaches pwMS in achieving a personal goal over a period of 10 weeks. We conducted a workshop with pwMS and brainstorming sessions with experts in rehabilitation to define the design choices of WalkWithMe. We examined the impact of WalkWithMe in a 10-week field study with 13 pwMS. The study revealed insights into walking habits, and positive trends in walking capacity. In this paper, we present the design aspects of WalkWithMe, findings of our 10-week evaluation, and resulting insights on goal setting for pwMS.

Relationship between Device Performance, Trust and User Behaviour in a Care-taking Scenario

We present insights obtained from a web-based game designed to investigate trust-related factors in a care-taking scenario. The game is set in a retirement village, where elderly residents live in smart homes equipped with monitoring systems. These systems should raise alerts when adverse events happen, but they do not function perfectly (they may issue false alerts or miss true events). Players, who "work" in the village, perform a primary task, whereby they must ensure the welfare of the residents by attending to adverse events in a timely manner, and a secondary routine task that demands their attention. Our contributions are (1) the game itself, which supports experimentation with various trust-related factors; (2) a methodology for the calibration of the game's parameters; (3) insights from two experiments regarding the relationship between device performance, in particular error type, and trust and user behaviour; and (4) insights from predictive models about factors that influence trust and aspects of user behaviour.

Modeling Cognitive Status through Automatic Scoring of a Digital Version of the Clock Drawing Test

The Clock Drawing Test is used as a cognitive assessment tool in geriatrics to detect signs of dementia or to model the progress of stroke recovery. The result is scored manually by a trained professional. We implement the Mendez scoring scheme and create a hierarchy of error categories that model the test characteristics of the clock drawing test, based on a set of impaired clock examples provided by a geriatrics clinic. Using a digital pen we recorded 120 clock samples for evaluating the automatic scoring system, with a total of 2400 error samples distributed over the 20 error classes of the Mendez scoring scheme. Error classes are scored automatically using a handwriting and gesture recognition framework. Results show that we provide a clinically relevant cognitive model for each subject. In addition, we heavily reduce the time spent on manual scoring. We compare manual scoring results with results produced by our automated system.

"An Unscented Hound for Working Memory" and the Cognitive Adaptation of User Interfaces

An Unscented Hound for Working Memory (AUHWM) is a new framework for the real-time tracking of human Working Memory (WM) that can be used to adapt computer interfaces to users' available cognitive resources. WM is the part of human cognition responsible for the short-term storing and handling of information; it can, in stressful situations, under information overload or when suffering from dementia-like diseases, become severely limited, possibly leading to poor decision making. Our preliminary results suggest that AUHWM can provide a precise and timely assessment of WM capacity, so that the cognitive load a specific task imposes on users can be adapted, e.g., at the User Interface (UI) level. AUHWM is based on a low-level stochastic discrete model of human WM dynamics, implemented as a Gradient-Boosting-derived deterministic algorithm that simulates users' oblivion. AUHWM also performs Unscented Kalman filtering to track users' WM-specific parameters in real time, thus providing a dynamic assessment of their cognitive resources. Our approach has been tested and validated using data collected from Match²s, a visual memory game played by 18 users in another study. Going beyond real-time WM tracking, AUHWM is intended to also be used for WM prediction, paving the way to the adaptation of tasks and their UIs in real time as a function of users' cognitive abilities; we detail an example of such an adapted system, and provide experimental evidence that this approach could lead to future enhanced WM-adapted UIs.

Combining Trending Scan Paths with Arousal to Model Visual Behaviour on the Web: A Case Study of Neurotypical People vs People with Autism

People with autism often exhibit different visual behaviours from neurotypical users. To explore how these differences are exhibited on the Web, we model visual behaviour by combining pupillary response, which is an unobtrusive measure of physiological arousal, with eye-tracking scan paths that indicate visual attention. We evaluated our approach with two populations: 19 neurotypical users and 19 users with autism. We observe differences in their visual behaviours as, in certain instances, individuals with autism exhibit a lower arousal response to affective contents. While this is consistent with the literature on autism, we confirm this phenomenon on the Web. We discuss how our modelling method can be used to identify possible UX issues such as the presence of stress, cognitive load and differences in the perception of Web elements in relation to physiological arousal.

What Makes an Image Tagger Fair?

Image analysis algorithms have been a boon to personalization in digital systems and are now widely available via easy-to-use APIs. However, it is important to ensure that they behave fairly in applications that involve processing images of people, such as dating apps. We conduct an experiment to shed light on the factors influencing the perception of "fairness." Participants are shown a photo along with two descriptions (human- and algorithm-generated). They are then asked to indicate which is "more fair" in the context of a dating site, and explain their reasoning. We vary a number of factors, including the gender, race and attractiveness of the person in the photo. While participants generally found human-generated tags to be more fair, API tags were judged as being more fair in one setting - where the image depicted an "attractive," white individual. In their explanations, participants often mention accuracy, as well as the objectivity/subjectivity of the tags in the description. We relate our work to the ongoing conversation about fairness in opaque tools like image tagging APIs, and their potential to result in harm.

Beggars Can't Be Choosers: Augmenting Sparse Data for Embedding-Based Product Recommendations in Retail Stores

Recommender systems are an essential component in many e-commerce platforms to drive sales and guide customers when exploring new products. With the increasing adoption of RFID technology in traditional brick-and-mortar stores, for example, in the form of smart fitting rooms that can display recommendations in the integrated mirror, retailers have only recently started to tap into existing product recommendation algorithms. However, due to limited data availability as well as sparsity, for example due to assortments adapted for different demographics, traditional retailers largely struggle to leverage this technology. In this paper we extend the state-of-the-art embedding-based recommender approach prod2vec by processing information about co-purchased products (i.e., shopping baskets) in retail stores. By adding point-of-sale information to shopping baskets we are able to provide recommendations aimed at individual stores, without having to maintain separate models for each location. Furthermore, we experiment with data augmentation methods to overcome the imposed limitations of the available data, and are able to increase the quality of the computed recommendations by more than 6.9%.

One Size Does Not Fit All: Badge Behavior in Q&A Sites

Badges are endemic to online interaction sites, from Question and Answer (Q&A) websites to ride sharing, as systems for rewarding participants for their contributions. This paper studies how badge design affects people's contributions and behavior over time. Past work has shown that badges "steer" people's behavior toward substantially increasing the amount of contributions before obtaining the badge, and immediately decreasing their contributions thereafter, returning to their baseline contribution levels. In contrast, we find that the steering effect depends on the type of user, as modeled by the rate and intensity of the user's contributions. We use these measures to distinguish between different groups of user activity, including users who are not affected by the badge system despite being significant contributors to the site. We provide a predictive model of how users change their activity group over the course of their lifetime in the system. We demonstrate our approach empirically in three different Q&A sites on Stack Exchange with hundreds of thousands of users, and we discuss the implications for system designers.

Socially-Aware Diagnosis for Constraint-Based Recommendation

Constraint-based group recommender systems support the identification of items that best match the individual preferences of all group members. In cases where the requirements of the group members are inconsistent with the underlying constraint set, the group needs to be supported such that an appropriate solution can be found. In this paper, we present a guided approach that determines socially-aware diagnoses based on different aggregation functions. We analyzed the prediction quality of different aggregation functions by using data collected in a user study. The results indicate that diagnoses guided by the Least Misery aggregation function achieve a higher prediction quality than those guided by Average Voting, Most Pleasure, and Majority Voting. Moreover, diagnoses based on aggregation functions outperform basic approaches such as Breadth First Search and Direct Diagnosis.
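
The four aggregation functions named above can be illustrated with a minimal Python sketch. It uses their common textbook definitions on hypothetical per-member ratings; the names `group_ratings`, `best_option`, and the option labels are assumptions made for illustration, and the paper itself applies the functions to diagnoses rather than to plain ratings.

```python
from collections import Counter
from statistics import mean


def least_misery(ratings):
    """Score an option by its least satisfied group member."""
    return min(ratings)


def most_pleasure(ratings):
    """Score an option by its most satisfied group member."""
    return max(ratings)


def average_voting(ratings):
    """Score an option by the mean of all members' ratings."""
    return mean(ratings)


def majority_voting(ratings):
    """Score an option by the most frequent rating (ties: first seen)."""
    return Counter(ratings).most_common(1)[0][0]


def best_option(group_ratings, aggregate):
    """Return the option whose aggregated score is highest."""
    return max(group_ratings, key=lambda option: aggregate(group_ratings[option]))


# Hypothetical ratings (1-5) from three group members for two options.
group_ratings = {
    "option_a": [5, 5, 1],
    "option_b": [3, 3, 3],
}

for aggregate in (least_misery, most_pleasure, average_voting, majority_voting):
    print(aggregate.__name__, "->", best_option(group_ratings, aggregate))
```

On this toy input only Least Misery prefers `option_b`, because it is the only function that lets a single dissatisfied member outweigh the others.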

A User Study on Groups Interacting with Tourist Trip Recommender Systems in Public Spaces

Tourist groups exploring a city often face the problem of finding a sequence of points of interest that satisfies all group members. In this work, we present three different configurations of a group recommender system that suggests such trips even when tourists are already traveling: connecting multiple smartphones, sharing a public display, and combining both devices in a distributed user interface approach. We conducted a large user study with real groups to evaluate these configurations. Our results show that public displays are attractive for users who prefer an open discussion of their preferences. However, we have empirical evidence that decisions on group preferences often tend to be unfair for some group members, especially when they do not know each other very well. A distributed recommender system aggregating group members' individual preferences fairly with the option to display selected content on a public display was the most appreciated solution for overcoming this problem.

Diversity and Novelty in Social-Based Collaborative Filtering

Social-based recommenders seek to exploit the mechanisms of homophily and influence observed in social networks in order to provide more accurate recommendations. The way they achieve this is by enforcing similar preferences among users that are socially connected. It is thus reasonable to question whether such approaches lead to the formation of echo chambers, i.e., social groups with a narrow set of preferences and which receive recommendations with low diversity and novelty. This work studies this research question and quantifies the diversity and novelty of existing methods. An important finding is that it is possible to increase accuracy without sacrificing diversity and novelty.
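
The abstract does not say which diversity and novelty measures are used. As a rough sketch, two widely used definitions are shown below: intra-list diversity as the mean pairwise Jaccard distance between item feature sets, and novelty as the mean self-information of item popularity. All item names, feature sets, and popularity values are hypothetical.

```python
import math
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items, features):
    """Mean pairwise distance between the items of one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(features[x], features[y]) for x, y in pairs) / len(pairs)


def mean_self_information(items, popularity):
    """Mean of -log2(popularity); rarer items count as more novel."""
    return sum(-math.log2(popularity[item]) for item in items) / len(items)


# Hypothetical recommendation list with feature sets and popularity
# (fraction of users who interacted with each item).
recommended = ["item_a", "item_b", "item_c"]
features = {"item_a": {"x", "y"}, "item_b": {"x"}, "item_c": {"z"}}
popularity = {"item_a": 0.5, "item_b": 0.25, "item_c": 0.125}

diversity = intra_list_diversity(recommended, features)
novelty = mean_self_information(recommended, popularity)
print(round(diversity, 3), novelty)
```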

Bayesian Personalized Ranking for Novelty Enhancement

Novelty enhancement of recommendations is typically achieved through a post-filtering process applied on a candidate set of items. While it is an effective method, its performance heavily depends on the quality of a baseline algorithm, and many of the state-of-the-art algorithms generate recommendations that are relatively similar to what the user has interacted with in the past. In this paper we explore the use of sampling as a means of novelty enhancement in the Bayesian Personalized Ranking objective. We evaluate the proposed extensions on the MovieLens 20M dataset, and show that the proposed method can be successfully used instead of two-step reranking, as it offers comparable and better accuracy/novelty tradeoffs, and more unique recommendations.
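
For orientation, the Bayesian Personalized Ranking criterion is usually stated as the following maximization problem. This is the general formulation, not one taken from this abstract, and the abstract does not specify how its sampling scheme changes the way triples are drawn.

```latex
\text{BPR-Opt} = \sum_{(u,i,j) \in D_S} \ln \sigma\!\left(\hat{x}_{ui} - \hat{x}_{uj}\right) - \lambda_\Theta \lVert \Theta \rVert^2
```

Here $D_S$ is the set of sampled triples in which user $u$ is assumed to prefer item $i$ over item $j$, $\hat{x}_{ui}$ is the model's predicted score, $\sigma$ is the logistic sigmoid, and $\lambda_\Theta$ regularizes the model parameters $\Theta$. Because the objective depends only on which triples are sampled, changing the sampling distribution is a natural place to bias training towards novel items.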

Estimating Confidence of Individual User Predictions in Item-based Recommender Systems

This paper focuses on recommender systems based on item-item collaborative filtering (CF). Although research on item-based methods is not new, current literature does not provide any reliable insight on how to estimate confidence of recommendations. The goal of this paper is to fill this gap, by investigating the conditions under which item-based recommendations will succeed or fail for a specific user. We formalize the item-based CF problem as an eigenvalue problem, where estimated ratings are equivalent to the true (unknown) ratings multiplied by a user-specific eigenvalue of the similarity matrix. We show that the magnitude of the eigenvalue related to a user is proportional to the accuracy of recommendations for that user. We define a confidence parameter called the eigenvalue confidence index, analogous to the eigenvalue of the similarity matrix, but simpler to compute. We also show how to extend the eigenvalue confidence index to matrix-factorization algorithms. A comprehensive set of experiments on five datasets shows that the eigenvalue confidence index is effective in predicting, for each user, the quality of recommendations. On average, our confidence index is 3 times more correlated with MAP than previous confidence estimates.

Are All Rejected Recommendations Equally Bad?: Towards Analysing Rejected Recommendations

When evaluating algorithms that recommend a list of relevant items to a user, it is common to use metrics such as precision to measure the system accuracy. When computing precision, one computes the number of items that were selected by the user among the recommended items. As such, recommended items that were not selected by the user, which we call "rejected recommendations", are all considered to be bad recommendations, resulting in no increase to the system accuracy metric. Our ultimate goal is to develop a new recommendation accuracy evaluation metric, which may assign some value to the rejected recommendations. In this paper, as a first step, we claim that some rejected recommendations are better than others. Specifically, we consider items that are similar to the item that was finally selected, as better recommendations than items that bear little similarity. We conduct a user study, showing that rejected recommendations that have high content or collaborative similarity to the selected item are perceived by users as better recommendations than items with low similarity. In addition, we study the correlations between the recommended items shown to a user and the un-recommended items that the user has selected in a real-life job posting dataset. We show that when considering item similarity rather than simple precision, the correlations are much higher. This may be attributed to the influence of the recommended items on the decisions of the user.
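
The contrast between plain precision and a similarity-aware score can be sketched in Python as follows. The abstract describes the new metric only as a future goal, so `similarity_credit_at_k` is a hypothetical illustration and not the authors' metric; the Jaccard similarity over tag sets and all job names and tags are likewise assumptions.

```python
def precision_at_k(recommended, selected, k):
    """Fraction of the top-k recommended items that the user selected."""
    return sum(1 for item in recommended[:k] if item in selected) / k


def jaccard(a, b):
    """Jaccard similarity of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_credit_at_k(recommended, selected, tags, k):
    """Like precision@k, but a rejected item earns partial credit equal to
    its highest similarity to any selected item (hypothetical scoring)."""
    total = 0.0
    for item in recommended[:k]:
        if item in selected:
            total += 1.0
        else:
            total += max((jaccard(tags[item], tags[s]) for s in selected), default=0.0)
    return total / k


# Hypothetical job postings described by tag sets.
tags = {
    "job_a": {"python", "backend"},
    "job_b": {"python", "backend", "cloud"},
    "job_c": {"sales"},
    "job_d": {"python"},
}
recommended = ["job_a", "job_b", "job_c", "job_d"]
selected = {"job_a"}

plain = precision_at_k(recommended, selected, k=4)
credited = similarity_credit_at_k(recommended, selected, tags, k=4)
print(plain, round(credited, 3))
```

Under plain precision the three rejected jobs are equally worthless, while the similarity-aware score separates the near miss (`job_b`) from the unrelated item (`job_c`).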

Telemetry-Aware Add-on Recommendation for Web Browser Customization

Web Extensions (add-ons) allow clients to customize their Web browsing experience through the addition of auxiliary features to their browsers. The add-on ecosystem is a market differentiator for the Firefox browser, offering contributions from both commercial entities and community developers. In this paper, we present the Telemetry-Aware Add-on Recommender (TAAR), a system for recommending add-ons to Firefox users by leveraging separate models trained on three main sources of user data: the set of add-ons a user already has installed; usage and interaction data (browser Telemetry); and the language setting of the user's browser (locale). We build individual recommendation models for each of these data sources, and combine the recommendations they generate using a linear stacking ensemble method. Our method employs a novel penalty function for tuning weight parameters, which is adapted from the log likelihood ratio cost function, allowing us to scale the penalty of both correct and incorrect recommendations using the confidence weights associated with the individual component model recommendations. This modular approach provides a way to offer relevant personalized recommendations while respecting Firefox's granular privacy preferences and adhering to Mozilla's lean data collection policy. To evaluate our recommender system, we ran a large-scale randomized experiment that was deployed to 350,000 Firefox users and localized to 11 languages. We found that, overall, users were 4.4% more likely to install add-ons recommended by our ensemble method compared to a curated list. Furthermore, the magnitude of the increase varies significantly across locales, achieving over 8% improvement among German-language users.
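
The combination step of a linear stacking ensemble can be sketched as a weighted sum of per-model confidence scores. The sketch below is not TAAR's implementation: the model names, add-on names, scores, and weights are hypothetical, and the weights are fixed by hand rather than tuned with the penalty function described in the abstract.

```python
def stack_scores(component_scores, weights):
    """Combine per-model confidence scores with a weighted sum and
    return (ranking, combined_scores), best-scoring add-on first."""
    combined = {}
    for model, scores in component_scores.items():
        for addon, score in scores.items():
            combined[addon] = combined.get(addon, 0.0) + weights[model] * score
    ranking = sorted(combined, key=lambda addon: combined[addon], reverse=True)
    return ranking, combined


# Hypothetical confidence scores from three component models.
component_scores = {
    "installed": {"addon_a": 0.9, "addon_b": 0.2},
    "telemetry": {"addon_b": 0.8, "addon_c": 0.5},
    "locale": {"addon_c": 0.9},
}
# Hypothetical stacking weights; in practice these would be learned.
weights = {"installed": 0.5, "telemetry": 0.3, "locale": 0.2}

ranking, combined = stack_scores(component_scores, weights)
print(ranking)
```

Keeping each data source in its own component model means a source the user has opted out of can simply be dropped from `component_scores` without retraining the others.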

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by a reinforcement learning agent to make decisions. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.

Extending a Tag-based Collaborative Recommender with Co-occurring Information Interests

Collaborative Filtering is largely applied to personalize item recommendation but its performance is affected by the sparsity of rating data. In order to address this issue, recent systems have been developed to improve recommendation by extracting latent factors from the rating matrices, or by exploiting trust relations established among users in social networks. In this work, we are interested in evaluating whether other sources of preference information than ratings and social ties can be used to improve recommendation performance. Specifically, we aim at testing whether the integration of frequently co-occurring interests in information search logs can improve recommendation performance in User-to-User Collaborative Filtering (U2UCF). For this purpose, we propose the Extended Category-based Collaborative Filtering (ECCF) recommender, which enriches category-based user profiles derived from the analysis of rating behavior with data categories that are frequently searched together by people in search sessions. We test our model using a big rating dataset and a log of a largely used search engine to extract the co-occurrence of interests. The experiments show that ECCF outperforms U2UCF and category-based collaborative recommendation in accuracy, MRR, diversity of recommendations and user coverage. Moreover, it outperforms the SVD++ Matrix Factorization algorithm in accuracy and diversity of recommendation lists.

Linguistic Design of In-Vehicle Prompts in Adaptive Dialog Systems: An Analysis of Potential Factors Involved in the Perception of Naturalness

Against the background of current trends towards natural and adaptive in-vehicle Spoken Dialog Systems, this paper aims at evaluating potential factors involved in the perception of naturalness and comprehensibility of system prompts. By conducting an exploratory user study investigating various syntactic paraphrases, we were able to identify several system- and user-sided characteristics which should be considered in the design of system prompts. We conclude from our results that the choice of a syntactic structure for in-vehicle prompts is a relevant question and interestingly depends on several individual user characteristics, such as personality.

Modeling Behavior Patterns with an Unfamiliar Voice User Interface

Voice User Interfaces (VUIs) are becoming increasingly popular. However, how VUIs can adapt to user differences remains insufficiently understood. We analyze usage data from a user study (n=50) where participants interacted with an unfamiliar VUI. Through automated clustering and statistical analysis, we present user models of their behavior patterns. We found user behavior can be grouped into three clusters: people who become proficient with the system and typically stay proficient while completing different tasks, people who exhibit an exploratory approach to completing tasks, and people who struggled to complete tasks. We discuss design implications based on these behavior clusters.

On the Accuracy of Eye Gaze-driven Classifiers for Predicting Image Content Familiarity in Graphical Passwords

Graphical passwords leverage the picture superiority effect to enhance memorability, and reflect today's haptic users' interaction realms. Images related to users' past sociocultural experiences (e.g., retrospective) enable the creation of memorable and secure passwords, while randomly system-assigned images (e.g., generic) lead to easy-to-predict hotspot regions within graphical password schemes. What remains rather unexplored is whether the image type could be inferred during the password creation. In this work, we present a between-subjects user study in which 37 participants completed a recall-based graphical password creation task with retrospective and generic images, while we were capturing their visual behavior. We found that the image type can be inferred within a few seconds in real-time. User adaptive mechanisms might benefit from our work's findings, by providing users early feedback whether they are moving towards the creation of a weak graphical password.

Personalized Gait-based Authentication Using UWB Wearable Devices

Passive and effortless authentication of the owner of wearable devices can be achieved by building a personalized model of his/her movements during gait periods. In this paper, an authentication method based on the distances between a set of body-worn devices is proposed. The method assumes that no prior information is available about users different from the legitimate one. One-class classification methods are used to distinguish the gait segments of the owner from the gait segments of possible impostors. Experimental results show that accuracy values as high as ~87-91% can be obtained. The impact of different walking styles (normal, fast, slow, and carrying a bag) is also evaluated.
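The one-class setting, in which only the owner's data is available at training time, can be sketched as follows. The abstract does not name the specific one-class classifiers used, so this example substitutes a deliberately simple centroid-distance rule. The class name, the feature vectors, and the quantile threshold are all assumptions.

```python
import math


class CentroidOneClass:
    """Toy one-class model: accept a gait segment if it lies no farther from
    the owner's centroid than most of the owner's own training segments."""

    def fit(self, owner_segments, quantile=0.95):
        n = len(owner_segments)
        dims = len(owner_segments[0])
        # Mean feature vector of the owner's segments (no impostor data used).
        self.centroid = [sum(seg[d] for seg in owner_segments) / n
                         for d in range(dims)]
        distances = sorted(self._distance(seg) for seg in owner_segments)
        # Threshold at the chosen quantile of the training distances.
        self.threshold = distances[min(n - 1, int(quantile * n))]
        return self

    def _distance(self, segment):
        return math.dist(segment, self.centroid)

    def is_owner(self, segment):
        return self._distance(segment) <= self.threshold


# Hypothetical 2-D features, e.g. mean inter-device distances per segment.
owner = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.05)]
model = CentroidOneClass().fit(owner)
print(model.is_owner((1.0, 1.0)), model.is_owner((3.0, 3.0)))
```

The quantile trades false rejections of the owner against false acceptances of impostors; a practical system would tune it on held-out owner data.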

Detecting Persuasive Arguments based on Author-Reader Personality Traits and their Interaction

Persuasion is one of the most frequent, albeit challenging, tasks in human interaction. In a textual argument, one party (author) aims to change the view of the other party (reader). In this paper, we propose to detect persuasive textual arguments while considering the parties' personality traits. We find that we can substantially improve accuracy by introducing features that capture author-reader personality traits and their interaction. Our model improves performance of state-of-the-art baselines from 66% to 71% on a new dataset of more than 19K arguments we collected.

Multi-faceted Trust-based Collaborative Filtering

Many collaborative recommender systems leverage social correlation theories to improve suggestion performance. However, they focus on explicit relations between users and they leave out other types of information that can contribute to determine users' global reputation; e.g., public recognition of reviewers' quality.

We are interested in understanding if and when these additional types of feedback improve Top-N recommendation. For this purpose, we propose a multi-faceted trust model to integrate local trust, represented by social links, with various types of global trust evidence provided by social networks. We aim at identifying general classes of data in order to make our model applicable to different case studies. Then, we test the model by applying it to a variant of User-to-User Collaborative filtering (U2UCF) which supports the fusion of rating similarity, local trust derived from social relations, and multi-faceted reputation for rating prediction.

We test our model on two datasets: the Yelp one publishes generic friend relations between users but provides different types of trust feedback, including user profile endorsements. The LibraryThing dataset offers fewer types of feedback but it provides more selective friend relations aimed at content sharing. The results of our experiments show that, on the Yelp dataset, our model outperforms both U2UCF and state-of-the-art trust-based recommenders that only use rating similarity and social relations. Differently, in the LibraryThing dataset, the combination of social relations and rating similarity achieves the best results. The lesson we learn is that multi-faceted trust can be a valuable type of information for recommendation. However, before using it in an application domain, an analysis of the type and amount of available trust evidence has to be done to assess its real impact on recommendation performance.

Power of the Few: Analyzing the Impact of Influential Users in Collaborative Recommender Systems

Like other social systems, in collaborative filtering a small number of "influential" users may have a large impact on the recommendations of other users, thus affecting the overall behavior of the system. Identifying influential users and studying their impact on other users is an important problem because it provides insight into how small groups can inadvertently or intentionally affect the behavior of the system as a whole. Modeling these influences can also shed light on patterns and relationships that would otherwise be difficult to discern, hopefully leading to more transparency in how the system generates personalized content. In this work we first formalize the notion of "influence" in collaborative filtering using an Influence Discrimination Model. We then empirically identify and characterize influential users and analyze their impact on the system under different underlying recommendation algorithms and across three different recommendation domains: job, movie and book recommendations. Insights from these experiments can help in designing systems that are not only optimized for accuracy, but are also tuned to mitigate the impact of influential users when it might lead to potential imbalance or unfairness in the system's outcomes.

Who Shares Fake News in Online Social Networks?

Today more and more people use social networks, and so the personalities of users become more diverse. The same holds true for available news content. To test whether regular news and fake news are distributed similarly and to what extent this depends on the personality and behavior of individuals, we conducted a mixed-method study. Through an online questionnaire we measured personality traits of individuals in social networks, how they behave, and how they are connected to each other. Using this data, we developed an agent-based model of an online social network. Using our model, an average of 92% of regular news and 98% of fake news were disseminated to the whole network. Network density turned out to be more important for dissemination than the differences in personality and behavior of individuals. Thus the spread of fake news cannot be addressed solely by focusing on the personality of individual users and their associated behavior. Systemic approaches, integrating both human and algorithm, must be considered to effectively combat fake news.

Chatterbox: Conversational Interfaces for Microtask Crowdsourcing

Conversational interfaces can facilitate human-computer interactions. Whether or not conversational interfaces can improve worker experience and work quality in crowdsourcing marketplaces has remained unanswered. We investigate the suitability of text-based conversational interfaces for microtask crowdsourcing. We designed a rigorous experimental campaign aimed at gauging the interest and acceptance by crowdworkers for this type of work interface. We compared Web and conversational interfaces for five common microtask types and measured the execution time, quality of work, and the perceived satisfaction of 316 workers recruited from the FigureEight platform. We show that conversational interfaces can be used effectively for crowdsourcing microtasks, resulting in a high satisfaction from workers, and without having a negative impact on task execution time or work quality.

Auto-Suggesting Browsing Actions for Personalized Web Screen Reading

Web browsing has never been easy for blind people, primarily due to the serial press-and-listen interaction mode of screen readers, their "go-to" assistive technology. Even simple navigational browsing actions on a page require a multitude of shortcuts. Auto-suggesting the next browsing action has the potential to assist blind users in swiftly completing various tasks with minimal effort. The extant auto-suggest feature in web pages is limited to filling form fields; in this paper, we generalize it to any web screen-reading browsing action, e.g., navigation, selection, etc. Towards that, we introduce SuggestOmatic, a personalized and scalable unsupervised approach for predicting the most likely next browsing action of the user, and proactively suggesting it to the user so that the user can avoid pressing a lot of shortcuts to complete that action. SuggestOmatic rests on two key ideas. First, it exploits the user's Action History to identify and suggest a small set of browsing actions that will, with high likelihood, contain an action which the user will want to do next, and the chosen action is executed automatically. Second, the Action History is represented as an abstract temporal sequence of operations over semantic web entities called Logical Segments, each a collection of related HTML elements, e.g., widgets, search results, menus, forms, etc.; this semantics-based abstract representation of browsing actions in the Action History makes SuggestOmatic scalable across websites, i.e., actions recorded in one website can be used to make suggestions for other similar websites. We also describe an interface that uses an off-the-shelf physical Dial as an input device that enables SuggestOmatic to work with any screen reader. The results of a user study with 12 blind participants indicate that SuggestOmatic can significantly reduce the browsing task times by as much as 29% when compared with a hand-crafted macro-based web automation solution.

Adaptive Modelling of Attentiveness to Messaging: A Hybrid Approach

Identifying instances when a user will not be able to attend to an incoming message and constructing an auto-response with relevant contextual information may help reduce the social pressure to respond immediately that many users face. Mobile messaging behavior often varies from one person to another. As a result, compared to a generic model considering profiles of several users, a personalized model can capture a user's messaging behavior more accurately to predict their inattentive states. However, creating accurate personalized models requires a non-trivial amount of individual data, which is often not available for new users. In this work, we investigate a weighted hybrid approach to model users' attention to messaging. Through dynamic performance-based weighting, we combine the predictions of three types of models (a general model, a group model and a personalized model) to create an approach which can work through the lack of initial data while adapting to the user's behavior. We present the details of our modeling approach and the evaluation of the model with over three weeks of data from 274 users. Our results highlight the value of hybrid weighted modeling to predict when a user cannot attend to their messages.
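One way to picture dynamic performance-based weighting is the sketch below, in which each model's weight tracks its recent accuracy. The abstract does not give the authors' weighting formula, so the class name, the smoothed-accuracy weights, and the window size are assumptions chosen for illustration.

```python
class WeightedHybrid:
    """Combine model predictions with weights proportional to each model's
    smoothed accuracy over a recent window of observed outcomes."""

    def __init__(self, model_names, window=50):
        self.history = {name: [] for name in model_names}
        self.window = window

    def weights(self):
        accuracy = {}
        for name, outcomes in self.history.items():
            recent = outcomes[-self.window:]
            # Laplace smoothing keeps weights defined before any data arrives.
            accuracy[name] = (sum(recent) + 1) / (len(recent) + 2)
        total = sum(accuracy.values())
        return {name: acc / total for name, acc in accuracy.items()}

    def predict(self, probabilities):
        """probabilities: {model_name: P(user is inattentive)}."""
        w = self.weights()
        return sum(w[name] * p for name, p in probabilities.items())

    def update(self, probabilities, was_inattentive):
        """Record whether each model's thresholded prediction was correct."""
        for name, p in probabilities.items():
            correct = (p >= 0.5) == was_inattentive
            self.history[name].append(1 if correct else 0)


hybrid = WeightedHybrid(["general", "group", "personal"])
probs = {"general": 0.9, "group": 0.6, "personal": 0.3}
for _ in range(4):
    hybrid.update(probs, was_inattentive=False)  # only "personal" is right
print(hybrid.weights())
```

With no history all three models share equal weight, which matches the cold-start situation of a new user; as evidence accumulates, weight shifts toward whichever model has been predicting that user best.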

Stuck? No worries!: Task-aware Command Recommendation and Proactive Help for Analysts

Data analytics software applications have become an integral part of the decision-making process of analysts. Users of such software face challenges due to insufficient product and domain knowledge, and find themselves in need of help. To alleviate this, we propose a task-aware command recommendation system, to guide the user on what commands could be executed next. We rely on topic modeling techniques to incorporate information about the user's task into our models. We also present a help prediction model to detect if a user is in need of help, in which case the system proactively provides the aforementioned command recommendations. We leverage the log data of a web-based analytics software to quantify the superior performance of our neural models, in comparison to competitive baselines.

Personalized Recommendations for Music Genre Exploration

Most recommender systems generate recommendations to match the user's current preference. However, users sometimes might have the goal to develop new preferences away from their current preference and use the recommender to guide them towards it. In this paper, we asked users to select a new genre to explore and studied what kind of recommendations would be more helpful for users to start exploring this new music taste. Three different recommendation methods are tested: one non-personalized method which recommends the most representative tracks of the genre, one personalized method which considers songs from the new genre that best match users' current preferences, and one mixed method which makes a trade-off between the two approaches. A comparative design was used in a user experiment in which participants were asked to evaluate the differences between the personalized method/mixed method and the non-personalized baseline. The mixed method results in recommendations that are more accurate and representative for the new genre than the personalized method. Users' perceived helpfulness for exploring the new genre is positively related to both perceived accuracy and perceived representativeness of the recommended items. Besides, recommendations from the mixed method are perceived as more helpful by users who score high on the Musical Sophistication Index for Active Engagement (MSAE). To our knowledge, this is one of the first studies using a recommender system to support users' preference development, and it provides insights into how recommender systems can help users attain new goals and tastes.

Beyond Explicit Reports: Comparing Data-Driven Approaches to Studying Underlying Dimensions of Music Preference

Prior research from the field of music psychology has suggested that there are factors common to music preference beyond individual genres. Specifically, research has shown that self-reported ratings of preference for individual musical genres can be reduced to 4 or 5 dimensions, which in turn have been shown to correlate to relevant psychological constructs, such as personality. However, the number of dimensions emerging from multiple studies has varied despite the care taken in conducting such research. Data-driven approaches offer opportunities to further this line of research with actual listening data, at a scale and scope surpassing that of traditional psychological studies. Although listening data can be considered more direct and comprehensive evidence of listening preference, transforming this data into meaningful measurements is non-trivial. In the current paper, we report on investigations seeking to find interpretable underlying dimensions of music taste, using implicit large-scale listening data. Offering a critical reflection on potential researchers' degrees of freedom, we adopt an explicit systematic approach, investigating the impact of varying different parameters, analysis, and normalization techniques. More precisely, we consider various ways to extract listening preference information from two large, openly available datasets of music listening behavior, making use of principal component analysis and variational autoencoders to extract potential underlying dimensions. Results and implications are discussed in light of prior psychological theory, and the potential of user listening data to further research on music preference.

ContextPlay: Evaluating User Control for Context-Aware Music Recommendation

Music preferences are likely to depend on contextual characteristics such as location and activity. However, most recommender systems do not allow users to adapt recommendations to their current context. We therefore built ContextPlay, a context-aware music recommender that enables user control for both contextual characteristics and music preferences. By conducting a mixed-design study (N=114) with four typical scenarios of music listening, we investigate the effect of controlling contextual characteristics in a music recommender system on four aspects: perceived quality, diversity, effectiveness, and cognitive load. Compared to our baseline, which only allows users to specify music preferences, having additional control for context leads to higher perceived quality and does not increase cognitive load. We also find that the contexts of mood, weather, and location tend to influence user perception of the system. Moreover, we found that users are more likely to modify contexts and their profile during relaxing activities.

Exploring the Power of Visual Features for the Recommendation of Movies

In this paper, we explore the potential of using visual features in movie Recommender Systems. Content features of this type can be extracted automatically without any human involvement and have been shown to be very effective in representing the visual content of movies. We have performed the following experiments, using a large dataset of movie trailers: (i) Experiment A: an exploratory analysis as an initial investigation on the data, and (ii) Experiment B: building a movie recommender based on the visual features and evaluating the performance. The observed results show the promising potential of visual features in representing the movies and the high quality of recommendations based on these features.

Impact of English Reading Comprehension Abilities on Processing Magazine Style Narrative Visualizations and Implications for Personalization

In this paper, we present research to uncover how the level of reading comprehension abilities impacts how users process textual documents in English with embedded visualizations (i.e., Magazine Style Narrative Visualizations or MSNVs). We analyze performance and gaze data of users processing MSNVs from two user studies, one run in Canada and one in a non-English speaking European country. Our findings provide important insights toward developing automatic, real-time support to MSNV processing personalized according to users' English reading comprehension abilities.

Adapting Performance And Emotional Support Feedback To Cultural Differences

This paper investigates adaptation of feedback to learners' cultural backgrounds. First, we investigate how to portray the cultural background of a learner. Second, we present a qualitative focus-group study, investigating how participants from different cultures believe culture affects the kind of feedback given to a learner. Finally, we present an empirical study on how humans adapt feedback based on the cultural background of learners to inspire an algorithm. Our investigations resulted in a set of stories which can be used to reliably portray a person's culture when investigating cultural adaptation in indirect experiments and user-as-wizard studies. They also provided insights into the adaptations people make to cultural differences.

Modeling Improvement for Underrepresented Minorities in Online STEM Education

Previous research has shown that students from underrepresented minority groups tend to receive lower grades in online classes than their peers, especially in science-focused courses. We propose that there may also be benefits to online courses for these students (e.g., opportunities for peer discussions where minority status is less salient), though little is currently known about these potential benefits. We present a new perspective on learning outcomes by measuring improvement, rather than grades alone. In learning management system data from seven semesters of an online introductory science course, we found that students from underrepresented minority racial groups were indeed less likely to receive high grades, and scored lower on exams; however, their exam scores improved throughout the semester by an amount similar to that of their peers. We also compared improvement to students' behaviors, including exam submission times and forum usage, finding that these behaviors were related to improvement. Finally, we briefly discuss implications of these findings for reducing inequalities in education, and the possibilities for underrepresented minority students in online STEM education in particular.

Personalization of Persuasive Technology in Higher Education

The success of persuasive systems in changing people's attitudes and behaviours has been established in various domains. Specifically, research has shown that personalized persuasive technology is more effective at achieving the desired goal than the one-size-fits-all approach. However, in the education domain, there are limited studies on the personalization of persuasive strategies to students. To advance persuasive technology research in this area, we investigated the susceptibility of undergraduate students (n = 243) to four persuasive strategies (Reward, Competition, Social Comparison and Social Learning) in order to provide a guideline for designing and personalizing persuasive systems in education. These four strategies were chosen because research on persuasion has established their effectiveness in changing behaviour and/or attitude. The results of our analysis reveal that students are most susceptible to Reward, followed by Competition and Social Comparison (which tie for second place) and Social Learning (the least persuasive). Moreover, there is no gender difference in the persuasiveness of the strategies. Therefore, in choosing persuasive strategies to motivate students' learning and success, among the strategies we investigated, Reward should be given priority, followed by Competition and Social Comparison, while Social Learning should be least favoured.

SESSION: Doctoral Consortium

Session details: Doctoral Consortium

Adaptive E-Learning: Motivating Learners whilst Adapting Feedback to Cultural Background

The personalization of feedback by an Intelligent Tutoring System has the potential to greatly improve learner motivation. This PhD investigates how an Intelligent Tutoring System can adapt to the cultural background of learners when giving feedback. The research uses the user-as-wizard method for investigation. To convey the cultural background of the learner in user studies, validated cultural stories (using Hofstede cultural dimensions) are required. These stories are then used to conduct qualitative and empirical studies to investigate how participants from a range of different cultures believe the culture of a learner should affect the kind of feedback given. The insights gathered from these studies will be unified to inspire an algorithm to allow an intelligent tutoring system to utilise these adaptations, and the effects tested on real learners.

Designing Culturally-appropriate Persuasive Technology to Promote Positive Work Attitudes among Workers in Public Workplaces

This research aims to design a mobile persuasive technology (PT) to promote acceptable pro-workplace behaviors and etiquette. As a first step to achieving this, we conducted a user study of 252 subjects from an African organization, to uncover what strategies could be used to model proper behaviors and promote employees' commitment to the ideals, visions and missions of an organization. Leveraging existing workplace behavioral procedures and socio-cultural strategies, we mapped our findings to their corresponding persuasive techniques. Presently, we are employing an iterative design process in developing the mobile PT, and the design is informed by our findings. Finally, we will deploy our mobile PT and conduct a large-scale evaluation with public workers in a Nigerian workplace to determine its efficacy in promoting positive workplace etiquette and attitudes. We will employ a mixed-method approach involving both quantitative and qualitative (interview and focus group) methods for this study.

Towards an Exhaustive Framework for Online Social Networks User Behaviour Modelling

Since the advent of Web 2.0, Online Social Networks (OSNs) represent a rich opportunity for researchers to collect real user data and to explore OSNs user behaviour. Based on the current challenges and future directions proposed in literature, we aim to investigate how to comprehensively model OSNs user behaviours, by exploiting and combining user data of different nature. We propose to use hypergraphs as a model to easily analyse and combine structural, semantic, and activity-related user information, and to study their evolution over time. This novel user behaviour modelling technique will converge in open, efficient, and scalable libraries, which will be integrated into a modular framework able to handle the data crawling process from several OSNs.

Exploring the Potential of the Resolving Sets Model for Introducing Serendipity to Recommender Systems

Recommender systems offer recommendations based on a user's previous ratings. However, sometimes the user is interested in unusual and interesting items that do not exactly match their user profile, as defined by the system. Serendipity, a concept that can be interpreted primarily as surprise, is one of the "beyond-accuracy" aspects that have been proposed to be considered to meet users' expectations for the recommendations they get. Although recent studies attempt to address the serendipity problem, there is still a variety of interpretations regarding the definition, the measurement and the application of serendipity in recommender systems. Our proposed method follows the distance-based approach for multi-dimensional serendipity measurement, which refers to the expected items for the user as a benchmark for measuring serendipity. For integrating serendipity into recommendations, we propose a novel serendipity-oriented user modeling method, based on a graph-theoretic approach (resolving sets in a graph), which enables finding serendipitous items in a multi-dimensional content-based space by detecting the expected items for the user.
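For readers unfamiliar with the graph-theoretic notion, a set of vertices in a connected graph is called resolving if every vertex is uniquely identified by its vector of shortest-path distances to the vertices in that set. The sketch below only checks this standard definition on a toy graph. How the proposed method maps items and user profiles onto such a graph is not described in the abstract, and the function names and example graph are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path distances (in edges) from source in an unweighted graph."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def is_resolving_set(graph, landmarks):
    """True if every vertex has a distinct distance vector to the landmarks.
    Assumes the graph is connected and given as {vertex: [neighbours]}."""
    tables = [bfs_distances(graph, landmark) for landmark in landmarks]
    seen = set()
    for vertex in graph:
        vector = tuple(table.get(vertex) for table in tables)
        if vector in seen:
            return False
        seen.add(vector)
    return True


# A 4-cycle a-b-c-d-a: one landmark cannot tell b from d, two adjacent ones can.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(is_resolving_set(cycle, ["a"]), is_resolving_set(cycle, ["a", "b"]))
```

In the 4-cycle, vertices b and d are both at distance 1 from a, so {a} alone is not resolving, whereas {a, b} assigns each vertex a unique distance vector.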