RecSys '18: Proceedings of the 12th ACM Conference on Recommender Systems

SESSION: Invited keynotes

Five E's: reflecting on the design of recommendations

Many case studies illustrate unintended consequences of well-intentioned systems. We have seen problems caused by information "filter bubbles"; problems caused by inappropriate, discriminatory or outright dangerous recommendations; issues of poor data quality leading to erroneous conclusions; and lack of clear methods, techniques, and tools for understanding how systems work and how to undo or reverse problems that have been caused.

These events, case studies, and stories have led to calls for what I call the three E's of accountability in application, product, system, and service offerings - that they be more Explainable, Equitable, and Ethical. I'd like to raise two more critical E's in socio-technical system design and development processes: Expedience and Exigence. It is critical that we address these two if we are going to realize the call for the first three E's.

In this talk, I will reflect on the nature of recommendation through the lens of these 5 E's to kick-start a conversation about recommendation 'design'. I will draw on the psychology of human information processing, reasoning, and decision making, and will share observations, anecdotes, and cautionary tales to motivate some directions forward for recommendation and recommender design. I will then invite us to discuss: What, concretely, can researchers, developers, and designers do to address the 5 E's?

Scalable structured prediction for richly structured socio-behavioral data

Online recommender systems, content-provider sites, and social media platforms provide richly structured socio-behavioral data. However, using this noisy and incomplete data to make decisions and recommendations is challenging. It often requires complex forms of structured prediction that rely on both the logical structure in the domain and probabilistic dependencies among interlinked entities. In this talk, I will describe some common inference patterns that are useful for socio-behavioral networks and introduce probabilistic soft logic (PSL). PSL is a highly scalable open-source probabilistic programming language being developed within my group that is well-suited for structured prediction over socio-behavioral data. Finally, I will review some of our recent work using PSL for hybrid recommender systems, explanation, and fair decision making.

Recommending social cohesion

Public media produces a public good in the form of social cohesion. Generally, countries with strong social cohesion enjoy better security, economies, and qualities of life. CBC/Radio-Canada has long used technology to bring Canadians together, against all the forces that drive us apart. Recommendation systems are just such a technology. Join me in some optimism about how we can build the future we want to see.

SESSION: Why did I get this? explaining recommendations

Why I like it: multi-task learning for recommendation and explanation

We describe a novel multi-task recommendation model that jointly learns to perform rating prediction and recommendation explanation by combining matrix factorization, for rating prediction, and adversarial sequence-to-sequence learning, for explanation generation. The result is evaluated using real-world datasets to demonstrate improved rating prediction performance, compared to state-of-the-art alternatives, while producing effective, personalized explanations.

Effects of personal characteristics on music recommender systems with different levels of controllability

Previous research has found that enabling users to control the recommendation process increases user satisfaction. However, providing additional controls also increases cognitive load, and different users have different needs for control. Therefore, in this study, we investigate the effect of two personal characteristics: musical sophistication and visual memory capacity. We designed a visual user interface, on top of a commercial music recommender, with different controls: interactions with recommendations (i.e., the output of a recommender system), the user profile (i.e., the top listened songs), and algorithm parameters (i.e., weights in an algorithm). We created eight experimental settings with combinations of these three user controls and conducted a between-subjects study (N=240), to explore the effect on cognitive load and recommendation acceptance for different personal characteristics. We found that controlling recommendations is the most favorable single control element. In addition, controlling user profile and algorithm parameters was the most beneficial setting with multiple controls. Moreover, the participants with high musical sophistication perceived recommendations to be of higher quality, which in turn led to higher recommendation acceptance. However, we found no effect of visual working memory on either cognitive load or recommendation acceptance. This work contributes an understanding of how to design control that hits the sweet spot between the perceived quality of recommendations and acceptable cognitive load.

Providing explanations for recommendations in reciprocal environments

Automated platforms which support users in finding a mutually beneficial match, such as online dating and job recruitment sites, are becoming increasingly popular. These platforms often include recommender systems that assist users in finding a suitable match. While recommender systems which provide explanations for their recommendations have shown many benefits, explanation methods have yet to be adapted and tested in recommending suitable matches. In this paper, we introduce and extensively evaluate the use of "reciprocal explanations" - explanations which provide reasoning as to why both parties are expected to benefit from the match. Through an extensive empirical evaluation, in both simulated and real-world dating platforms with 287 human participants, we find that when the acceptance of a recommendation involves a significant cost (e.g., monetary or emotional), reciprocal explanations outperform standard explanation methods, which consider the recommendation receiver alone. However, contrary to what one may expect, when the cost of accepting a recommendation is negligible, reciprocal explanations are shown to be less effective than the traditional explanation methods.

Explore, exploit, and explain: personalizing explainable recommendations with bandits

The multi-armed bandit is an important framework for balancing exploration with exploitation in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) with the highest predicted user engagement and has traditionally been the focus of recommender systems. Exploration recommends content with uncertain predicted user engagement for the purpose of gathering more information. The importance of exploration has been recognized in recent years, particularly in settings with new users, new items, non-stationary preferences and attributes. In parallel, explaining recommendations ("recsplanations") is crucial if users are to understand their recommendations. Existing work has looked at bandits and explanations independently. We provide the first method that combines both in a principled manner. In particular, our method is able to jointly (1) learn which explanations each user responds to; (2) learn the best content to recommend for each user; and (3) balance exploration with exploitation to deal with uncertainty. Experiments with historical log data and tests with live production traffic in a large-scale music recommendation service show a significant improvement in user engagement.
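
The explore/exploit trade-off described in this abstract can be illustrated with a minimal sketch. It is not the paper's method, which the abstract does not specify: it assumes a plain epsilon-greedy bandit whose arms are (item, explanation) pairs, and the class name, the playlist and explanation strings, and the engagement rates are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit whose arms are (item, explanation) pairs."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = defaultdict(int)     # times each arm was shown
        self.reward = defaultdict(float)  # total engagement per arm

    def mean(self, arm):
        # Unseen arms get an optimistic estimate so each is tried at least once.
        return self.reward[arm] / self.pulls[arm] if self.pulls[arm] else 1.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: uncertain content
        return max(self.arms, key=self.mean)   # exploit: best estimate so far

    def update(self, arm, engaged):
        self.pulls[arm] += 1
        self.reward[arm] += float(engaged)


def simulate(rounds=5000, seed=0):
    # Hypothetical true engagement rates, unknown to the bandit.
    true_rate = {
        ("playlist_a", "because you like jazz"): 0.05,
        ("playlist_a", "popular right now"): 0.10,
        ("playlist_b", "because you like jazz"): 0.30,
        ("playlist_b", "popular right now"): 0.15,
    }
    bandit = EpsilonGreedyBandit(true_rate, epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, env.random() < true_rate[arm])
    return bandit
```

Treating each (item, explanation) pair as one arm is the simplest way to learn both choices jointly; it ignores user context, which a production system would presumably need.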

Interpreting user inaction in recommender systems

Temporally, users browse and interact with items in recommender systems. However, for most systems, the majority of the displayed items do not elicit any action from users. In other words, the user-system interaction process includes three aspects: browsing, action, and inaction. Prior recommender systems literature has focused more on actions than on browsing or inaction. In this work, we deployed a field survey in a live movie recommender system to interpret what inaction means from both the user's and the system's perspective, guided by psychological theories of human decision making. We further systematically study factors to infer the reasons for user inaction and demonstrate with offline data sets that this descriptive and predictive inaction model can provide benefits for recommender systems in terms of both action prediction and recommendation timing.

Impact of item consumption on assessment of recommendations in user studies

In user studies of recommender systems, participants typically cannot consume the recommended items. Still, they are asked to assess recommendation quality and other aspects related to user experience by means of questionnaires. Without having listened to recommended songs or watched suggested movies, however, this might be an error-prone task, possibly limiting validity of results obtained in these studies. In this paper, we investigate the effect of actually consuming the recommended items. We present two user studies conducted in different domains showing that in some cases, differences in the assessment of recommendations and in questionnaire results occur. Apparently, it is not always possible to adequately measure user experience without allowing users to consume items. On the other hand, depending on domain and provided information, participants sometimes seem to approximate the actual value of recommendations reasonably well.

SESSION: From browser to buyer: online product recommendations

Multistakeholder recommendation with provider constraints

Recommender systems are typically designed to optimize the utility of the end user. In many settings, however, the end user is not the only stakeholder and this exclusive focus may produce unsatisfactory results for other stakeholders. One such setting is found in multisided platforms, which bring together buyers and sellers. In such platforms, it may be necessary to jointly optimize the value for both buyers and sellers. This paper proposes a constraint-based integer programming optimization model, in which different sets of constraints are used to reflect the goals of the different stakeholders. This model is applied as a post-processing step, so it can easily be added onto an existing recommendation system to make it multi-stakeholder aware. For computational tractability with larger data sets, we reformulate the integer problem using the Lagrangian dual and use subgradient optimization. In experiments with two data sets, we evaluate empirically the interaction between the utilities of buyers and sellers and show that our approximation can achieve good upper and lower bounds in practical situations.

Translation-based factorization machines for sequential recommendation

Sequential recommendation algorithms aim to predict users' future behavior given their historical interactions. A recent line of work has achieved state-of-the-art performance on sequential recommendation tasks by adapting ideas from metric learning and knowledge-graph completion. These algorithms replace inner products with low-dimensional embeddings and distance functions, employing a simple translation dynamic to model user behavior over time.

In this paper, we propose TransFM, a model that combines translation and metric-based approaches for sequential recommendation with Factorization Machines (FMs). Doing so allows us to reap the benefits of FMs (in particular, the ability to straightforwardly incorporate content-based features), while enhancing the state-of-the-art performance of translation-based models in sequential settings. Specifically, we learn an embedding and translation space for each feature dimension, replacing the inner product with the squared Euclidean distance to measure the interaction strength between features. Like FMs, we show that the model equation for TransFM can be computed in linear time and optimized using classical techniques. As TransFM operates on arbitrary feature vectors, additional content information can be easily incorporated without significant changes to the model itself. Empirically, the performance of TransFM significantly increases when taking content features into account, outperforming state-of-the-art models on sequential recommendation tasks for a wide variety of datasets.
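
The substitution described in this abstract can be written out as a worked equation. This is a sketch under stated assumptions: the abstract gives no formula, so the symbols below (global bias w_0, linear weights w_i, embedding v_i, translation vector v'_i, feature vector x with n dimensions) are illustrative notation, not quoted from the paper.

```latex
% Standard second-order FM: pairwise interactions via inner products
\hat{y}_{\mathrm{FM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

% Translation-based variant: the inner product is replaced by the squared
% Euclidean distance between the translated embedding of i and the embedding of j
\hat{y}_{\mathrm{TransFM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} d^2(\mathbf{v}_i + \mathbf{v}'_i, \mathbf{v}_j) \, x_i x_j,
\qquad d^2(\mathbf{a}, \mathbf{b}) = \lVert \mathbf{a} - \mathbf{b} \rVert_2^2
```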

Exploring recommendations under user-controlled data filtering

Traditionally, recommendation systems are built on the assumption that each service provider has full access to all user data generated on its platform. However, with increasing data privacy concerns and personal data protection regulation, service providers such as Google, Twitter, and Facebook are enabling their users to revisit, erase, and rectify their historical profiles. Future recommendation systems need to be robust to such profile modifications and user-controlled data filtering. In this paper, we explore how recommendation performance may be affected by time-sensitive user data filtering, that is, users choosing to share only recent "N days" of data. Using the MovieLens dataset as a testbed, we evaluated three widely used collaborative filtering algorithms. Our experiments demonstrate that filtering out historical user data does not significantly affect the overall recommendation performance, but its impact on individual users may vary. These findings challenge the common belief that more data is essential to better performance, and suggest a potential win-win solution for services and end users.
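
The "recent N days" filtering studied here can be sketched as a preprocessing step applied before training. The function name, record layout, and dates below are hypothetical; the abstract does not describe the authors' actual pipeline.

```python
from datetime import datetime, timedelta


def keep_recent(interactions, n_days, now):
    """Return only the interactions from the last `n_days` days before `now`."""
    cutoff = now - timedelta(days=n_days)
    return [row for row in interactions if row["timestamp"] >= cutoff]


# Hypothetical rating log for one user.
log = [
    {"user": 1, "movie": 10, "rating": 4.0, "timestamp": datetime(2018, 9, 25)},
    {"user": 1, "movie": 11, "rating": 2.5, "timestamp": datetime(2018, 1, 1)},
]
shared = keep_recent(log, n_days=30, now=datetime(2018, 10, 1))
```

A collaborative filtering model would then be trained on `shared` rather than on the full log, and its accuracy compared against the unfiltered baseline.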

Quality-aware neural complementary item recommendation

Complementary item recommendation finds products that go well with one another (e.g., a camera and a specific lens). While complementary items are ubiquitous, the dimensions by which items go together can vary by both product and category, making it difficult to detect complementary items at scale. Moreover, in practice, user preferences for complementary items can be complex combinations of item quality and evidence of complementarity. Hence, we propose a new neural complementary recommender Encore that can jointly learn complementary item relationships and user preferences. Specifically, Encore (i) effectively combines and balances both stylistic and functional evidence of complementary items across item categories; (ii) naturally models item latent quality for complementary items through Bayesian inference of customer ratings; and (iii) builds a novel neural network model to learn the complex (non-linear) relationships between items for flexible and scalable complementary product recommendations. Through experiments over large Amazon datasets, we find that Encore effectively learns complementary item relationships, leading to an improvement in accuracy of 15.5% on average versus the next-best alternative.

Item recommendation on monotonic behavior chains

'Explicit' and 'implicit' feedback in recommender systems have been studied for many years, as two relatively isolated areas. However, many real-world systems involve a spectrum of both implicit and explicit signals, ranging from clicks and purchases, to ratings and reviews. A natural question is whether implicit signals (which are dense but noisy) might help to predict explicit signals (which are sparse but reliable), or vice versa. Thus in this paper, we propose an item recommendation framework which jointly models this full spectrum of interactions. Our main observation is that in many settings, feedback signals exhibit monotonic dependency structures, i.e., any signal necessarily implies the presence of a weaker (or more implicit) signal (a 'review' action implies a 'purchase' action, which implies a 'click' action, etc.). We refer to these structures as 'monotonic behavior chains,' for which we develop new algorithms that exploit these dependencies. Using several new and existing datasets that exhibit a variety of feedback types, we demonstrate the quantitative performance of our approaches. We also perform qualitative analysis to uncover the relationships between different stages of implicit vs. explicit signals.
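
The monotonic dependency structure described above can be made concrete with a small data-encoding sketch. It only illustrates the chain property (review implies purchase implies click); it is not the paper's algorithm, and the constant and function names are hypothetical.

```python
CHAIN = ("click", "purchase", "review")  # ordered from weakest to strongest signal


def to_chain_vector(strongest):
    """Encode the strongest observed signal as a monotonic 0/1 vector."""
    if strongest is None:
        return [0] * len(CHAIN)
    depth = CHAIN.index(strongest) + 1
    return [1] * depth + [0] * (len(CHAIN) - depth)


def is_monotonic(vector):
    """True if no stronger signal is present without every weaker one."""
    return all(a >= b for a, b in zip(vector, vector[1:]))
```

For example, `to_chain_vector("purchase")` yields `[1, 1, 0]`, while a vector such as `[1, 0, 1]` (a review without a purchase) violates the chain and is rejected by `is_monotonic`.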

Deep reinforcement learning for page-wise recommendations

Recommender systems can mitigate the information overload problem by suggesting personalized items to users. In real-world recommendation settings such as e-commerce, a typical interaction between the system and its users proceeds as follows: users are recommended a page of items and provide feedback, and the system then recommends a new page of items. To effectively capture such interactions, we need to solve two key problems: (1) how to update the recommendation strategy according to users' real-time feedback, and (2) how to generate a page of items with a proper display. Both pose tremendous challenges to traditional recommender systems. In this paper, we study the problem of page-wise recommendation, aiming to address these two challenges simultaneously. In particular, we propose a principled approach to jointly generate a set of complementary items and the corresponding strategy to display them in a 2-D page; and propose a novel page-wise recommendation framework based on deep reinforcement learning, DeepPage, which can optimize a page of items with proper display based on real-time feedback from users. The experimental results based on a real-world e-commerce dataset demonstrate the effectiveness of the proposed framework.

Causal embeddings for recommendation

Many current applications use recommendations in order to modify the natural user behavior, such as to increase the number of sales or the time spent on a website. This results in a gap between the final recommendation objective and the classical setup where recommendation candidates are evaluated by their coherence with past user behavior, by predicting either the missing entries in the user-item matrix, or the most likely next event. To bridge this gap, we optimize a recommendation policy for the task of increasing the desired outcome versus the organic user behavior. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy. To this end, we propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements.

SESSION: Learning and optimization

Neural Gaussian mixture model for review-based rating prediction

Reviews have been shown to be an important source of information for recommendation. Unlike the overall user-item rating matrix, they provide textual information that reveals why a user likes an item or not. Recently, more and more researchers have paid attention to review-based rating prediction. There are two challenging issues: how to extract representative features that characterize users and items from reviews, and how to leverage them in a recommender system. In this paper, we propose a Neural Gaussian Mixture Model (NGMM) for the review-based rating prediction task. In this model, the review text is used to construct two parallel neural networks for users and items respectively, so that users' preferences and items' properties can be sufficiently extracted and represented as two latent vectors. A shared layer is introduced on top to couple these two networks together and model user-item ratings based on the features learned from reviews. Specifically, each rating is modeled via a Gaussian mixture model, where each Gaussian component has zero variance, the mean described by the corresponding component in the user's latent vector, and the weight indicated by the corresponding component in the item's latent vector. Extensive experiments are conducted on five real-world Amazon review datasets. The experimental results demonstrate that the proposed NGMM achieves state-of-the-art performance on the review-based rating prediction task.

Interactive recommendation via deep neural memory augmented contextual bandits

Personalized recommendation with user interactions has become increasingly popular in many applications with dynamically changing content (news, media, etc.). Existing approaches model user interactive recommendation as a contextual bandit problem to balance the trade-off between exploration and exploitation. However, these solutions require a large number of interactions with each user to provide high-quality personalized recommendations. To mitigate this limitation, we design a novel deep neural memory augmented mechanism to model and track the history state for each user based on his previous interactions. As such, the user's preferences on new items can be quickly learned within a small number of interactions. Moreover, we develop new algorithms to leverage the large amount of all users' history data for offline model training and online model fine-tuning for each user, with a focus on policy evaluation. Extensive experiments on different synthetic and real-world datasets validate that our proposed approach consistently outperforms a variety of state-of-the-art approaches.

Optimally balancing receiver and recommended users' importance in reciprocal recommender systems

Online platforms which assist people in finding a suitable partner or match, such as online dating and job recruiting environments, have become increasingly popular in the last decade. Many of these platforms include recommender systems which aim at helping users discover other people who will also be interested in them. These recommender systems benefit from considering the interests of both sides of the recommended match; however, the question of how to optimally balance the interest and the response of both sides remains open. In this study we present a novel recommendation method for recommending people to people. For each user receiving a recommendation, our method finds the optimal balance of two criteria: a) the likelihood of the user accepting the recommendation; and b) the likelihood of the recommended user positively responding. We extensively evaluate our recommendation method in a group of active users of an operational online dating site. We find that our method is significantly more effective in increasing the number of successful interactions compared to a state-of-the-art recommendation method.
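
The two criteria named in this abstract can be combined in many ways; the sketch below assumes a simple convex combination with a per-user weight `alpha`. Both the weighting scheme and the names are hypothetical, and the abstract does not say how the authors find the optimal balance, so that step is left out.

```python
def rank_candidates(candidates, alpha):
    """Rank candidate matches for one receiving user.

    candidates: {name: (p_accept, p_reply)} where p_accept is the estimated
        likelihood that the receiver accepts the recommendation and p_reply
        the likelihood that the recommended user responds positively.
    alpha: weight in [0, 1] placed on the receiver's side.
    """
    def score(name):
        p_accept, p_reply = candidates[name]
        return alpha * p_accept + (1 - alpha) * p_reply

    return sorted(candidates, key=score, reverse=True)


# Hypothetical estimates for two candidates.
candidates = {"u1": (0.9, 0.1), "u2": (0.5, 0.6)}
```

With `alpha=1.0` only the receiver's interest counts and `u1` ranks first; with `alpha=0.5` the likely reply from `u2` moves that candidate to the top.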

HOP-rec: high-order proximity for implicit recommendation

Recommender systems are vital ingredients for many e-commerce services. In the literature, two of the most popular approaches are based on factorization and graph-based models; the former approach captures user preferences by factorizing the observed direct interactions between users and items, and the latter extracts indirect preferences from the graphs constructed by user-item interactions. In this paper we present HOP-Rec, a unified and efficient method that incorporates the two approaches. The proposed method involves random surfing on a graph to harvest high-order information among neighborhood items for each user. Instead of factorizing a transition matrix, our method introduces a confidence weighting parameter to simulate all high-order information simultaneously, for which we maintain a sparse user-item interaction matrix and enrich the matrix for each user using random walks. Experimental results show that our approach significantly outperforms the state of the art on a range of large-scale real-world datasets.

Generation meets recommendation: proposing novel items for groups of users

Consider a movie studio aiming to produce a set of new movies for summer release: What types of movies should it produce? Who would the movies appeal to? How many movies should it make? Similar issues are encountered by a variety of organizations, e.g., mobile-phone manufacturers and online magazines, that have to create new (non-existent) items to satisfy groups of users with different preferences. In this paper, we present a joint problem formalization of these interrelated issues, and propose generative methods that address these questions simultaneously. Specifically, we leverage the latent space obtained by training a deep generative model, the Variational Autoencoder (VAE), via a loss function that incorporates both rating performance and item reconstruction terms. We use a greedy search algorithm that utilizes this learned latent space to jointly obtain K plausible new items and the user groups that would find the items appealing. An evaluation of our methods on a synthetic dataset indicates that our approach is able to generate novel items similar to highly desirable unobserved items. As case studies on real-world data, we applied our method to the MART abstract art and MovieLens Tag Genome datasets, with promising results: small and diverse sets of novel items.

Calibrated recommendations

When a user has watched, say, 70 romance movies and 30 action movies, then it is reasonable to expect the personalized list of recommended movies to be comprised of about 70% romance and 30% action movies as well. This important property is known as calibration, and recently received renewed attention in the context of fairness in machine learning. In the recommended list of items, calibration ensures that the various (past) areas of interest of a user are reflected with their corresponding proportions. Calibration is especially important in light of the fact that recommender systems optimized toward accuracy (e.g., ranking metrics) in the usual offline setting can easily lead to recommendations where the lesser interests of a user get crowded out by the user's main interests, which we show empirically as well as in thought experiments. This can be prevented by calibrated recommendations. To this end, we outline metrics for quantifying the degree of calibration, as well as a simple yet effective re-ranking algorithm for post-processing the output of recommender systems.
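
The 70/30 example above can be turned into a small worked calculation. The abstract does not name its metrics, so this sketch assumes one plausible choice: the Kullback-Leibler divergence between the genre distribution of the user's history and that of the recommended list, with a small smoothing weight `alpha` to avoid division by zero. The function names and the smoothing value are assumptions.

```python
import math
from collections import Counter


def genre_distribution(genres):
    """Turn a list of genre labels into a probability distribution."""
    counts = Counter(genres)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def calibration_kl(history, recommended, alpha=0.01):
    """KL divergence from the history distribution p to the smoothed
    recommendation distribution; 0 means perfectly calibrated."""
    p = genre_distribution(history)
    q = genre_distribution(recommended)
    kl = 0.0
    for genre, p_g in p.items():
        # Mix in a little of p so that genres missing from q do not yield log(x/0).
        q_g = (1 - alpha) * q.get(genre, 0.0) + alpha * p_g
        kl += p_g * math.log(p_g / q_g)
    return kl


history = ["romance"] * 70 + ["action"] * 30
calibrated = ["romance"] * 7 + ["action"] * 3   # mirrors the 70/30 split
crowded_out = ["romance"] * 10                  # lesser interest has vanished
```

Here `calibration_kl(history, calibrated)` is essentially zero, while `calibration_kl(history, crowded_out)` is clearly positive, which is the crowding-out effect the abstract warns about.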

SESSION: Travel and entertainment

No more ready-made deals: constructive recommendation for telco service bundling

We propose a new recommendation system for service and product bundling in the domain of telecommunication and multimedia. Using this system, users can easily generate a combined service plan that best suits their needs within a vast range of candidates. The system exploits the recent constructive preference elicitation framework, which allows us to flexibly model the exponentially large domain of bundle offers as an implicitly defined set of variables and constraints. The user preferences are modeled by a utility function estimated via coactive learning interaction, while iteratively generating high-utility recommendations through constraint optimization. In this paper, we detail the structure of our system, as well as the methodology and results of an empirical validation study which involved more than 130 participants. The system turned out to be highly usable with respect to both time and number of interactions, and its outputs were found much more satisfactory than those obtained with standard techniques used in the market.

Preference elicitation as an optimization problem

The new user cold-start problem arises when a recommender system does not yet have any information about a user. A common solution to it is to generate a profile by asking the user to rate a number of items. Which items are selected determines the quality of the recommendations made, and thus has been studied extensively. We propose a new elicitation method to generate a static preference questionnaire (SPQ) that poses relative preference questions to the user. Using a latent factor model, we show that SPQ improves personalized recommendations by choosing a minimal and diverse set of questions. We are the first to rigorously prove which optimization task should be solved to select each question in static questionnaires. Our theoretical results are confirmed by extensive experimentation. We test the performance of SPQ on two real-world datasets, under two experimental conditions: simulated, when users behave according to a latent factor model (LFM), and real, in which only real user judgments are revealed as the system asks questions. We show that SPQ reduces the necessary length of a questionnaire by up to a factor of three compared to state-of-the-art preference elicitation methods. Moreover, by solving the right optimization task, SPQ also performs better than baselines with dynamically generated questions.

ComfRide: a smartphone-based system for comfortable public transport recommendation

Passenger comfort is a major factor influencing a commuter's decision to use public transport. Existing studies suggest that factors such as overcrowding, jerkiness, and traffic congestion correlate well with passengers' (dis)comfort. An online survey conducted with more than 300 participants from 12 different countries reveals that different personalized and context-dependent factors influence passenger comfort during travel by public transport. Leveraging these findings, we identify correlations between comfort level and these dynamic parameters, and implement a smartphone-based application, ComfRide, which recommends the most comfortable route based on the user's preferences while honoring her travel-time constraint. We use a 'Dynamic Input/Output Automata' based composition model to capture both the wide variety of comfort choices from commuters and the impact of the environment on the comfort parameters. An evaluation of ComfRide, involving 50 participants over 28 routes in a state capital of India, reveals that the recommended routes have, on average, a 30% better comfort level than routes recommended by Google Maps when a commuter gives priority to specific comfort parameters of her choice.

Understanding user interactions with podcast recommendations delivered via voice

Voice interfaces introduced by smart speakers present new opportunities and challenges for podcast content recommendations. Understanding how users interact with voice-based recommendations has the potential to inform better design of vocal recommenders. However, existing knowledge about user behavior is mostly for visual interfaces, such as the web, and is not directly transferable to voice interfaces, which rely on user listening and do not support skimming and browsing. To fill this gap, we conducted a controlled study to compare user interactions with recommendations delivered visually to those with recommendations delivered vocally. Through an online A/B test with 100 participants, we found that when recommendations are vocally conveyed, users consume more slowly, explore less, and choose fewer long-tail items. The study also reveals the correlation between user choices and exploration via voice interfaces. Our findings pose challenges to the design of voice interfaces, such as adaptively recommending diverse content and designing better navigation mechanisms.

Deep inventory time translation to improve recommendations for real-world retail

Recommender systems are an important component in the retail industry, but the constantly renewed inventory of many companies makes it difficult to aggregate enough data to fully harness the benefits of such systems. In this paper, we describe a technique that significantly improves the accuracy of the recommendations, validated on a real store transaction history, by performing a time translation that maps out-of-stock items to similar items that are currently in stock using deep features of the products. This greatly reduces the dimension of the item-item interactions matrix while preserving all the dataset entries, which mitigates the sparsity of the dataset, and provides an original solution to the cold-start problem. We also improve the coverage at no accuracy cost by favouring less popular items within a small radius in the feature space while applying the time translation mapping. Finally, by modelling item-item rather than user-item correlations, we are able to update the recommendations for a given user in real time, without re-training, as the user's history receives new entries.
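
The time translation step can be sketched as a nearest-neighbour lookup in feature space. This is an illustration under stated assumptions: the paper uses deep product features, whereas the two-dimensional vectors, item names, and function names below are hypothetical, and plain Euclidean distance stands in for whatever similarity the authors use.

```python
import math


def nearest_in_stock(item, features, in_stock):
    """Return the in-stock item whose feature vector is closest to `item`'s."""
    return min(in_stock, key=lambda other: math.dist(features[item], features[other]))


def translate_history(history, features, in_stock):
    """Replace every out-of-stock item in a purchase history by its nearest
    in-stock neighbour; items still in stock are kept unchanged."""
    return [
        item if item in in_stock else nearest_in_stock(item, features, in_stock)
        for item in history
    ]


# Hypothetical product features and current inventory.
features = {
    "old_coat": (1.0, 0.0),
    "new_coat": (0.9, 0.1),
    "new_shoes": (0.0, 1.0),
}
in_stock = {"new_coat", "new_shoes"}
translated = translate_history(["old_coat", "new_shoes"], features, in_stock)
```

After translation every history entry refers to a current item, so the item-item matrix only needs rows and columns for the live inventory while no transaction is discarded.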

The art of drafting: a team-oriented hero recommendation system for multiplayer online battle arena games

Multiplayer Online Battle Arena (MOBA) games have become increasingly popular in recent years. In a match of such games, players compete in two teams of five, each controlling an in-game avatar, known as a hero, selected from a roster of more than 100. The selection of heroes, also known as pick or draft, takes place before the match starts and alternates between the two teams until each player has selected one hero. Heroes are designed with different strengths and weaknesses to promote team cooperation in a game. Intuitively, heroes in a strong team should complement each other's strengths and suppress those of opponents. Hero drafting is therefore a challenging problem due to the complex hero-to-hero relationships to consider. In this paper, we propose a novel hero recommendation system that suggests heroes to add to an existing team while maximizing the team's prospect for victory. To that end, we model the drafting between two teams as a combinatorial game and use Monte Carlo Tree Search (MCTS) for estimating the values of hero combinations. Our empirical evaluation shows that hero teams drafted by our recommendation algorithm have a significantly higher win rate against teams constructed by other baseline and state-of-the-art strategies.

SESSION: Towards recsys that care

Recommending social-interactive games for adults with autism spectrum disorders (ASD)

Games play a significant role in modern society, since they affect people of all ages and all walks of life, whether it be socially or mentally, and have direct impacts on adults with autism. Autism spectrum disorders (ASD) are a collection of neurodevelopmental disorders characterized by qualitative impairments in social relatedness and interaction, as well as difficulties in acquiring and using communication and language abilities. Adults with ASD often find it difficult to express and recognize emotions, which makes it hard for them to interact with others socially. We have designed new interactive and collaborative games for autistic adults and developed a novel strategy to recommend games to them. Using modern computer vision and graphics techniques, we (i) track the player's speech rate, facial features, eye contact, audio communication, and emotional states, and (ii) foster their collaboration. These games are personalized and recommended to a user based on the games the user is interested in and on game complexity, which is matched to the user's level of emotional understanding and social skills. The objective of developing and recommending short-head (i.e., familiar) and long-tail (i.e., unfamiliar) games for adults with ASD is to enhance their social interaction skills with peers so that they can live a better life.

Sustainability at scale: towards bridging the intention-behavior gap with sustainable recommendations

Finding sustainable products and evaluating their claims is a significant barrier facing sustainability-minded customers. Tools that reduce both these burdens are likely to boost the sale of sustainable products. However, it is difficult to determine the sustainability characteristics of these products: there are a variety of certifications and definitions of sustainability, and quality labeling requires input from domain experts. In this paper, we propose a flexible probabilistic framework that uses domain knowledge to identify sustainable products and customers, and uses these labels to predict customer purchases. We evaluate our approach on grocery items from the Amazon catalog. Our proposed approach outperforms established recommender system models in predicting future purchases while jointly inferring sustainability scores for customers and products.

What's going on in my city?: recommender systems and electronic participatory budgeting

In this paper, we present electronic participatory budgeting (ePB) as a novel application domain for recommender systems. On public data from the ePB platforms of three major US cities (Cambridge, Miami, and New York City), we evaluate various methods that exploit heterogeneous sources and models of user preferences to provide personalized recommendations of citizen proposals. We show that depending on characteristics of the cities and their participatory processes, particular methods are more effective than others for each city. This result, together with open issues identified in the paper, calls for further research in the area.

How algorithmic confounding in recommendation systems increases homogeneity and decreases utility

Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confounded in this way homogenizes user behavior without increasing utility.

Enhancing structural diversity in social networks by recommending weak ties

Contact recommendation has become a common functionality in online social platforms, and an established research topic in the social networks and recommender systems fields. Predicting and recommending links has been mainly addressed to date as an accuracy-targeting problem. In this paper we put forward a different perspective, considering that correctly predicted links may not be all equally valuable. Contact recommendation brings an opportunity to drive the structural evolution of a social network towards desirable properties of the network as a whole, beyond the sum of the isolated gains for the individual users to whom recommendations are delivered. These are global properties that we may want to assess and promote as explicit recommendation targets.

In this perspective, we research the definition of relevant diversity metrics drawing from social network analysis concepts, and linking to prior diversity notions in recommender systems. In particular, we elaborate on the notion of weak tie recommendation as a means to enhance the structural diversity of networks. In order to show the significance of the proposed metrics, we report experiments with Twitter data illustrating how state-of-the-art contact recommendation methods compare in terms of our metrics; we examine the tradeoff with accuracy, and we show that diverse link recommendations result in a corresponding diversity enhancement in the flow of information through the network, with potential implications in mitigating filter bubbles.

Exploring author gender in book rating and recommendation

Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of the patterns in rating datasets reflect important real-world differences between the various users and items in the data; other patterns may be irrelevant or possibly undesirable for social or ethical reasons, particularly if they reflect undesired discrimination, such as gender or ethnic discrimination in publishing. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to a dimension of social concern, namely content creator gender. Using publicly-available book ratings data, we measure the distribution of the genders of the authors of books in user rating profiles and recommendation lists produced from this data. We find that common collaborative filtering algorithms differ in the gender distribution of their recommendation lists, and in the relationship of that output distribution to user profile distribution.

SESSION: Does it work? metrics and evaluation

Get me the best: predicting best answerers in community question answering sites

There has been a massive rise in the use of Community Question and Answering (CQA) forums to get solutions to various technical and non-technical queries. One common problem faced in CQA is the small number of experts, which leaves many questions unanswered. This paper addresses the challenging problem of predicting the best answerer for a new question and thereby recommending the best expert for the same. Although there is work in the literature that aims to find possible answerers for questions posted in CQA, very few algorithms exist for finding the best answerer whose answer will satisfy the information need of the original poster. For finding answerers, existing approaches mostly use features based on content and tags associated with the questions. There are few approaches that additionally consider the users' history. In this paper, we propose an approach that considers a comprehensive set of features, including but not limited to text representation, tag-based similarity, and multiple user-based features that target users' availability, agility, and expertise, for predicting the best answerer for a given question. We also include features that give incentives to users who answer fewer but more important questions over those who answer a lot of questions of less importance. A learning-to-rank algorithm is used to find the weight of each feature. Experiments conducted on a real dataset from Stack Exchange show the efficacy of the proposed method in terms of multiple evaluation metrics for accuracy, robustness and real-time performance.

On the robustness and discriminative power of information retrieval metrics for top-N recommendation

The evaluation of Recommender Systems is still an open issue in the field. Despite its limitations, offline evaluation usually constitutes the first step in assessing recommendation methods due to its reduced costs and high reproducibility. Selecting the appropriate metric is critical, and ranking accuracy usually attracts the most attention nowadays. In this paper, we aim to shed light on the advantages of different ranking metrics which were previously used in Information Retrieval and are now used for assessing top-N recommenders. We propose methodologies for comparing the robustness and the discriminative power of different metrics. On the one hand, we study cut-offs and we find that deeper cut-offs offer greater robustness and discriminative power. On the other hand, we find that precision offers high robustness and Normalised Discounted Cumulative Gain provides the best discriminative power.
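
For readers unfamiliar with the two metrics named above, here is a minimal sketch of precision and Normalised Discounted Cumulative Gain at a cut-off k, using binary relevance and the common logarithmic discount. The function names and the toy ranking are illustrative assumptions, and the paper may use other variants of these metrics.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary gains and a log2(position + 1) discount."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so position 1 gets log2(2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c"]   # hypothetical recommendation list
relevant = {"a", "c"}      # hypothetical held-out relevant items
p3 = precision_at_k(ranked, relevant, k=3)
n3 = ndcg_at_k(ranked, relevant, k=3)
```

Precision ignores where in the top k a relevant item appears, whereas nDCG rewards placing it higher; the cut-off k is the quantity whose depth the abstract studies.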

StreamingRec: a framework for benchmarking stream-based news recommenders

News is one of the earliest application domains of recommender systems, and recommending items from a virtually endless stream of news is still a relevant problem today. News recommendation is different from other application domains in a variety of ways, e.g., because new items constantly become available for recommendation. To be effective, news recommenders therefore have to continuously consider the latest items in the incoming stream of news in their recommendation models. However, today's public software libraries for algorithm benchmarking mostly do not consider these particularities of the domain. As a result, authors often rely on proprietary protocols, which hampers the comparability of the obtained results. In this paper, we present StreamingRec as a framework for evaluating streaming-based news recommenders in a replicable way. The open-source framework implements a replay-based evaluation protocol that allows algorithms to update the underlying models in real-time when new events are recorded and new articles are available for recommendation. Furthermore, a variety of baseline algorithms for session-based recommendation are part of StreamingRec. For these, we also report a number of performance results for two datasets, which confirm the importance of immediate model updates.

A field study of related video recommendations: newest, most similar, or most relevant?

Many video sites recommend videos related to the one a user is watching. These recommendations have been shown to influence what users end up exploring and are an important part of a recommender system. Plenty of methods have been proposed to recommend related videos, but there has been relatively little work that compares competing strategies. We describe a field study of related video recommendations, where we deploy algorithms to recommend related movie trailers. Our results show that recency- and similarity-based algorithms yield the highest click-through rates, and that the recency-based algorithm leads to the most trailer-level engagement. Our findings suggest the potential to design non-personalized yet effective related item recommendation strategies.

Unbiased offline recommender evaluation for missing-not-at-random implicit feedback

Implicit-feedback Recommenders (ImplicitRec) leverage positive only user-item interactions, such as clicks, to learn personalized user preferences. Recommenders are often evaluated and compared offline using datasets collected from online platforms. These platforms are subject to popularity bias (i.e., popular items are more likely to be presented and interacted with), and therefore logged ground truth data are Missing-Not-At-Random (MNAR). As a result, the widely used Average-Over-All (AOA) evaluator is biased toward accurately recommending trendy items. In this paper, we (a) investigate evaluation bias of AOA and (b) develop an unbiased and practical offline evaluator for implicit MNAR datasets using the Inverse-Propensity-Scoring (IPS) technique. Through extensive experiments using four real-world datasets and four widely used algorithms, we show that (a) popularity bias is widely manifested in item presentation and interaction; (b) evaluation bias due to MNAR data pervasively exists in most cases where AOA is used to evaluate ImplicitRec; and (c) the unbiased estimator significantly reduces the AOA evaluation bias by more than 30% in the Yahoo! music dataset in terms of the Mean Absolute Error (MAE).
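
The contrast between the two evaluators can be sketched as follows. This is a minimal illustration that assumes per-interaction propensities are already known and uses a self-normalised inverse-propensity estimate; the function names and numbers are hypothetical, and the paper's actual estimator and its propensity estimation procedure may differ.

```python
def aoa_estimate(scores):
    """Average-Over-All: plain mean of per-interaction metric scores."""
    return sum(scores) / len(scores)


def snips_estimate(scores, propensities):
    """Self-normalised IPS: weight each observed interaction by 1 / propensity."""
    weights = [1.0 / p for p in propensities]
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / sum(weights)


# Two logged interactions (assumed values): the recommender scores well on a
# popular item (observed with high propensity) and poorly on a niche item.
scores = [1.0, 0.0]
propensities = [0.5, 0.1]

aoa = aoa_estimate(scores)
snips = snips_estimate(scores, propensities)
```

In this toy case the plain average is 0.5, while the propensity-weighted estimate is about 0.17, because the rarely observed niche interaction stands in for many unobserved ones.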

Judging similarity: a user-centric study of related item recommendations

Related item recommenders operate in the context of a particular item. For instance, a music system's page about the artist Radiohead might recommend other similar artists such as The Flaming Lips. Often central to these recommendations is the computation of similarity between pairs of items. Prior work has explored many algorithms and features that allow for the computation of similarity scores, but little work has evaluated these approaches from a user-centric perspective. In this work, we build and evaluate six similarity scoring algorithms that span a range of activity- and content-based approaches. We evaluate the performance of these algorithms using both offline metrics and a new set of more than 22,000 user-contributed evaluations. We integrate these results with a survey of more than 700 participants concerning their expectations about item similarity and related item recommendations. We find that content-based algorithms outperform ratings- and clickstream-based algorithms in terms of how well they match user expectations for similarity and recommendation quality. Our results yield a number of implications to guide the construction of related item recommendation algorithms.

SESSION: Beyond users and items

Recurrent knowledge graph embedding for effective recommendation

Knowledge graphs (KGs) have proven to be effective in improving recommendation. Existing methods mainly rely on hand-engineered features from KGs (e.g., meta paths), which requires domain knowledge. This paper presents RKGE, a KG embedding approach that automatically learns semantic representations of both entities and paths between entities for characterizing user preferences towards items. Specifically, RKGE employs a novel recurrent network architecture that contains a batch of recurrent networks to model the semantics of paths linking the same entity pair, which are seamlessly fused into recommendation. It further employs a pooling operator to discriminate the saliency of different paths in characterizing user preferences towards items. Extensive validation on real-world datasets shows the superiority of RKGE against state-of-the-art methods. Furthermore, we show that RKGE provides meaningful explanations for recommendation results.

User preferences in recommendation algorithms: the influence of user diversity, trust, and product category on privacy perceptions in recommender algorithms

The use of recommendation systems is widespread in online commerce. Depending on the algorithm that is used in the recommender system different types of data are recorded from user interactions. Typically, better recommendations are achieved when more detailed data about the user and product is available. However, users are often unaware of what data is stored and how it is used in recommendation. In a survey study with 197 participants we introduced different recommendation techniques (collaborative filtering, content-based recommendation, trust-based and social recommendation) to the users and asked participants to rate what type of algorithm should be used for what type of product category (books, mobile phones, contraceptives). We found different patterns of preferences for different product categories. The more sensitive the product the higher the preference for content-based filtering approaches that could work without storing personal data. Trust-based and social approaches utilizing data from social media were generally rejected.

Spectral collaborative filtering

Despite the popularity of Collaborative Filtering (CF), CF-based methods are haunted by the cold-start problem, which has a significantly negative impact on users' experiences with Recommender Systems (RS). In this paper, to overcome the aforementioned drawback, we first formulate the relationships between users and items as a bipartite graph. Then, we propose a new spectral convolution operation that operates directly in the spectral domain, where not only the proximity information of a graph but also the connectivity information hidden in the graph are revealed. With the proposed spectral convolution operation, we build a deep recommendation model called Spectral Collaborative Filtering (SpectralCF). Benefiting from the rich information of connectivity existing in the spectral domain, SpectralCF is capable of discovering deep connections between users and items and therefore alleviates the cold-start problem for CF. To the best of our knowledge, SpectralCF is the first CF-based method directly learning from the spectral domains of user-item bipartite graphs. We apply our method on several standard datasets. It is shown that SpectralCF significantly outperforms state-of-the-art models. Code and data are available at https://github.com/lzheng21/SpectralCF.

Categorical-attributes-based item classification for recommender systems

Many techniques to utilize side information of users and/or items as inputs to recommenders to improve recommendation, especially on cold-start items/users, have been developed over the years. In this work, we test the approach of utilizing item side information, specifically categorical attributes, in the output of recommendation models either through multi-task learning or hierarchical classification. We first demonstrate the efficacy of these approaches for both matrix factorization and neural networks with a medium-size real-world data set. We then show that they improve a neural-network based production model in an industrial-scale recommender system. We demonstrate the robustness of the hierarchical classification approach by introducing noise in building the hierarchy. Lastly, we investigate the generalizability of hierarchical classification on a simulated dataset by building two user models in which we can fully control the generative process of user-item interactions.

Eliciting pairwise preferences in recommender systems

Preference data in the form of ratings or likes for items are widely used in many Recommender Systems. However, previous research has shown that even item comparisons, which generate pairwise preference data, can be used to model user preferences. Moreover, pairwise preferences can be effectively combined with ratings to compute recommendations. In such hybrid approaches, the Recommender System needs to elicit both types of preference data from the user. In this work, we aim at identifying how and when to elicit pairwise preferences, i.e., when this form of user preference data is more meaningful for the user to express and more beneficial for the system. We conducted an online A/B test and compared a rating-only based system variant with another variant that allows the user to enter both types of preferences. Our results demonstrate that pairwise preferences are valuable and useful, especially when the user is focusing on a specific type of item. By incorporating pairwise preferences, the system can generate better recommendations than a state-of-the-art rating-only based solution. Additionally, our results indicate that there seems to be a dependency between the user's personality, the perceived system usability and the satisfaction with the preference elicitation procedure, which varies depending on whether only ratings or a combination of ratings and pairwise preferences are elicited.

Adaptive collaborative topic modeling for online recommendation

Collaborative filtering (CF) mainly suffers from rating sparsity and from the cold-start problem. Auxiliary information like texts and images has been leveraged to alleviate these problems, resulting in hybrid recommender systems (RS). Due to the abundance of data continuously generated in real-world applications, it has become essential to design online RS that are able to handle user feedback and the availability of new items in real-time. These systems are also required to adapt to drifts when a change in the data distribution is detected. In this paper, we propose an adaptive collaborative topic modeling approach, CoAWILDA, as a hybrid system relying on adaptive online Latent Dirichlet Allocation (AWILDA) to model newly available items arriving as a document stream and incremental matrix factorization for CF. The topic model is maintained up-to-date in an online fashion and is retrained in batch when a drift is detected using documents automatically selected by an adaptive windowing technique. Our experiments on real-world datasets prove the effectiveness of our approach for online recommendation.

POSTER SESSION: Short papers with poster presentation

Recommendations for chemists: a case study

Large pharmaceutical companies have a wealth of reaction and chemical structure data, but face a new problem: analyzing that corpus to yield project insights and future directions. One straight-forward approach would be to have a recommendation system to match drug structures with similar research endeavors across geographically- or organizationally-separated groups. We developed and deployed Chem Recommender, a system that suggests similar, related work to experiments that chemists have recently started. The goal of the system is to accelerate the drug discovery process by ensuring that chemists are aware of each other's work. To date, we have sent more than 8500 recommendations to over 800 medicinal chemists in our organization. The results have been positive, with several chemists reporting that the recommendations have aided their molecular syntheses.

Word2vec applied to recommendation: hyperparameters matter

Skip-gram with negative sampling, a popular variant of Word2vec originally designed and tuned to create word embeddings for Natural Language Processing, has been used to create item embeddings with successful applications in recommendation. Although these fields neither share the same type of data nor evaluate on the same tasks, recommendation applications tend to use the same already tuned hyperparameter values, even though optimal hyperparameter values are often known to be data and task dependent. We thus investigate the marginal importance of each hyperparameter in a recommendation setting through large hyperparameter grid searches on various datasets. Results reveal that optimizing neglected hyperparameters, namely negative sampling distribution, number of epochs, subsampling parameter and window-size, significantly improves performance on a recommendation task, and can increase it by an order of magnitude. Importantly, we find that optimal hyperparameter configurations for Natural Language Processing tasks and Recommendation tasks are noticeably different.
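
To make one of the neglected hyperparameters concrete, the sketch below computes a negative sampling distribution as item frequency raised to an exponent `alpha` and then normalised. The exponent 0.75 is the value commonly used by Word2vec for text; the item counts and the function name are hypothetical, and the abstract does not say which values worked best for recommendation.

```python
def negative_sampling_distribution(counts, alpha=0.75):
    """Probability of drawing each item as a negative: count**alpha, normalised."""
    weights = {item: count ** alpha for item, count in counts.items()}
    total = sum(weights.values())
    return {item: weight / total for item, weight in weights.items()}


# Hypothetical interaction counts for a popular and a niche item.
counts = {"hit_song": 1000, "niche_song": 10}

text_default = negative_sampling_distribution(counts, alpha=0.75)
uniform = negative_sampling_distribution(counts, alpha=0.0)
by_popularity = negative_sampling_distribution(counts, alpha=1.0)
```

Setting `alpha` to 1 samples negatives in proportion to popularity, 0 samples them uniformly, and values in between interpolate, so the exponent controls how often popular items are pushed away from a given context.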

CF4CF: recommending collaborative filtering algorithms using collaborative filtering

As Collaborative Filtering becomes increasingly important in both academia and industry recommendation solutions, it also becomes imperative to study the algorithm selection task in this domain. This problem aims at finding automatic solutions which enable the selection of the best algorithms for a new problem, without performing full-fledged training and validation procedures. Existing work in this area includes several approaches using Metalearning, which relate the characteristics of the problem domain with the performance of the algorithms. This study explores an alternative approach to deal with this problem. Since, in essence, the algorithm selection problem is a recommendation problem, we investigate the use of Collaborative Filtering algorithms to select Collaborative Filtering algorithms. The proposed approach integrates subsampling landmarkers, a data characterization approach commonly used in Metalearning, with a Collaborative Filtering methodology, named CF4CF. The predictive performance obtained by CF4CF using benchmark recommendation datasets was similar or superior to that obtained with Metalearning.

Using citation-context to reduce topic drifting on pure citation-based recommendation

Recent works in the area of academic recommender systems have demonstrated the effectiveness of co-citation and citation closeness in related-document recommendations. However, documents recommended from such systems may drift away from the main theme of the query document. In this work, we investigate whether incorporating the textual information in close proximity to a citation as well as the citation position could reduce such drifting and further increase the performance of the recommender system. To investigate this, we run experiments with several recommendation methods on a newly created and now publicly available dataset containing 53 million unique citation-based records. We then conduct a user-based evaluation with domain-knowledgeable participants. Our results show that a new method based on the combination of Citation Proximity Analysis (CPA), topic modelling and word embeddings achieves more than 20% improvement in Normalised Discounted Cumulative Gain (nDCG) compared to CPA.

Measuring anti-relevance: a study on when recommendation algorithms produce bad suggestions

Typically, performance of recommender systems has been measured focusing on the amount of relevant items recommended to the users. However, this perspective provides an incomplete view of an algorithm's quality, since it neglects the amount of negative recommendations by equating the unknown and negatively interacted items when computing ranking-based evaluation metrics. In this paper, we propose an evaluation framework where anti-relevance is seamlessly introduced in several ranking-based metrics; in this way, we obtain a different perspective on how recommenders behave and the type of suggestions they make. Based on our results, we observe that non-personalized approaches tend to return fewer bad recommendations than personalized ones; however, the amount of unknown recommendations is also larger, which explains why the latter tend to suggest more relevant items. Our metrics based on anti-relevance also show the potential to discriminate between algorithms whose performance is very similar in terms of relevance.
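
One way to picture the distinction drawn above is to split a top-k list into liked, disliked, and unknown items instead of treating everything that is not liked as a miss. The sketch below is only an illustration under that assumption; the name `recommendation_breakdown` and the key "anti_precision" are hypothetical and are not taken from the paper's framework.

```python
def recommendation_breakdown(ranked, liked, disliked, k):
    """Share of the top-k list that is liked, disliked, or of unknown relevance."""
    top_k = ranked[:k]
    n_liked = sum(1 for item in top_k if item in liked)
    n_disliked = sum(1 for item in top_k if item in disliked)
    n_unknown = len(top_k) - n_liked - n_disliked
    return {
        "precision": n_liked / k,
        "anti_precision": n_disliked / k,
        "unknown": n_unknown / k,
    }


# Hypothetical list: one liked item, one disliked item, two never interacted with.
breakdown = recommendation_breakdown(
    ranked=["a", "b", "c", "d"], liked={"a"}, disliked={"b"}, k=4
)
```

Two recommenders with the same precision can differ in how many disliked items they return, which is the kind of difference that a relevance-only metric cannot show.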

RecGAN: recurrent generative adversarial networks for recommendation systems

Recent studies in recommendation systems emphasize the significance of modeling latent features behind temporal evolution of user preference and item state to make relevant suggestions. However, static and dynamic behaviors and trends of users and items, which highly influence the feasibility of recommendations, were not adequately addressed in previous works. In this work, we leverage the temporal and latent feature modelling capabilities of Recurrent Neural Network (RNN) and Generative Adversarial Network (GAN), respectively, to propose a Recurrent Generative Adversarial Network (RecGAN). We use customized Gated Recurrent Unit (GRU) cells to capture latent features of users and items observable from short-term and long-term temporal profiles. The modification also includes collaborative filtering mechanisms to improve the relevance of recommended items. We evaluate RecGAN using two datasets on food and movie recommendation. Results indicate that our model outperforms other baseline models irrespective of user behavior and density of training data.

A crowdsourcing triage algorithm for geopolitical event forecasting

Predicting the outcome of geopolitical events is of huge importance to many organizations, as these forecasts may be used to make consequential decisions. Prediction polling is a common method used in crowdsourcing platforms for geopolitical forecasting, where a group of non-expert participants are asked to predict the outcome of a geopolitical event and the collected responses are aggregated to generate a forecast. It has been demonstrated that forecasts by such a crowd can be more accurate than the forecasts of experts. However, geopolitical prediction polling is challenging because participants are highly heterogeneous and diverse in terms of their skills and background knowledge and human resources are often limited. As a result, it is crucial to refer each question to the subset of participants that possess suitable skills to answer it, such that individual efforts are not wasted. In this paper, we propose an algorithm based on multitask learning to learn the skills of participants of a forecasting platform by using their performance history. The learned model then can be used to recommend suitable questions to forecasters. Our experimental results demonstrate that the prediction accuracy can be increased based on the proposed algorithm as opposed to when questions have been randomly assigned.

Large-scale recommendation for portfolio optimization

Individual investors are now massively using online brokers to trade stocks with convenient interfaces and low fees, albeit losing the advice and personalization traditionally provided by full-service brokers. We frame the problem faced by online brokers of replicating this level of service in a low-cost and automated manner for a very large number of users. Because of the care required in recommending financial products, we focus on a risk-management approach tailored to each user's portfolio and risk profile. We show that our hybrid approach, based on Modern Portfolio Theory and Collaborative Filtering, provides a sound and effective solution. The method is applicable to stocks as well as other financial assets, and can be easily combined with various financial forecasting models. We validate our proposal by comparing it with several baselines in a domain expert-based study.

Deep neural network marketplace recommenders in online experiments

Recommendations are broadly used in marketplaces to match users with items relevant to their interests and needs. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper focuses on the challenge of measuring recommender performance and summarizes the online experiment results with several promising types of deep neural network recommenders: hybrid item representation models combining features from user engagement and content, sequence-based models, and multi-armed bandit models that optimize user engagement by re-ranking proposals from multiple submodels. The recommenders are currently running in production at the leading Norwegian marketplace FINN.no and serve over one million visitors every day.

A hierarchical bayesian model for size recommendation in fashion

We introduce a hierarchical Bayesian approach to tackle the challenging problem of size recommendation in e-commerce fashion. Our approach jointly models a size purchased by a customer and its possible return event: (1) no return, (2) returned too small, or (3) returned too big. Those events are drawn following a multinomial distribution parameterized on the joint probability of each event, built following a hierarchy combining priors. Such a model allows us to incorporate extended domain expertise and article characteristics as prior knowledge, which in turn makes it possible for the underlying parameters to emerge thanks to sufficient data. Experiments are presented on real (anonymized) data from millions of customers along with a detailed discussion on the efficiency of such an approach within a large scale production system.

Psrec: social recommendation with pseudo ratings

Data sparsity and cold start are two major problems of collaborative filtering based recommender systems. In many modern Internet applications, we have a social network over the users of recommender systems, from which social information can be utilized to improve the accuracy of recommendation. In this paper, we propose a novel trust-based matrix factorization model. Unlike most existing social recommender systems, which use social information in the form of a regularizer on parameters of recommendation algorithms, we utilize the social information to densify the training data set by filling certain missing values, thereby handling the data sparsity problem. In addition, by employing different pseudo rating generating criteria on cold start users and normal users, we can also partially solve the cold start problem. Experimental results on real-world data sets demonstrate the superiority of our method over state-of-the-art approaches.
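
As a rough illustration of densifying training data with pseudo ratings, the sketch below fills a user's missing ratings with the mean rating given by the users they trust. The abstract does not state the actual generating criteria, so the averaging rule, the `min_votes` threshold, and the function name `pseudo_ratings` are all assumptions made for this example.

```python
def pseudo_ratings(ratings, trust, min_votes=1):
    """Fill missing ratings from trusted users (illustrative rule, not the paper's).

    ratings: {user: {item: rating}}  observed ratings
    trust:   {user: [trusted users]} directed trust links
    Returns {user: {item: pseudo_rating}} for items the user has not rated.
    """
    filled = {}
    for user, trusted in trust.items():
        own = ratings.get(user, {})
        votes = {}
        for friend in trusted:
            for item, rating in ratings.get(friend, {}).items():
                if item not in own:  # only fill values that are missing
                    votes.setdefault(item, []).append(rating)
        # Assumed criterion: average the trusted users' ratings.
        filled[user] = {
            item: sum(vals) / len(vals)
            for item, vals in votes.items()
            if len(vals) >= min_votes
        }
    return filled


ratings = {"ann": {"i1": 5}, "bob": {"i1": 4, "i2": 2}, "cat": {"i2": 4}}
trust = {"ann": ["bob", "cat"]}
print(pseudo_ratings(ratings, trust))
```

The pseudo ratings could then be appended to the observed ratings before training a matrix factorization model; a stricter `min_votes` would be one hypothetical way to treat normal users differently from cold start users.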

Harnessing a generalised user behaviour model for next-POI recommendation

Recommender Systems (RSs) are commonly used in web applications to support users in finding items of their interest. In this paper we propose a novel RS approach that supports human decision making by leveraging data acquired in the physical world. We consider a scenario in which users' choices to visit points of interest (POIs) are tracked and used to generate recommendations for not yet visited POIs. We propose a novel approach to user behaviour modelling that is based on Inverse Reinforcement Learning (IRL). Two recommendation strategies based on the proposed behaviour model are also proposed; they generate recommendations that differ from the common approach based on user next action prediction. Our experimental analysis shows that the proposed approach outperforms state-of-the-art models in terms of the overall utility the user gains by following the provided recommendations and the novelty of the recommended items.

Learning consumer and producer embeddings for user-generated content recommendation

User-Generated Content (UGC) is at the core of web applications where users can both produce and consume content. This differs from traditional e-Commerce domains where content producers and consumers are usually from two separate groups. In this work, we propose a method CPRec (consumer and producer based recommendation), for recommending content on UGC-based platforms. Specifically, we learn a core embedding for each user and two transformation matrices to project the user's core embedding into two 'role' embeddings (i.e., a producer and consumer role). We model each interaction by the ternary relation between the consumer, the consumed item, and its producer. Empirical studies on two large-scale UGC applications show that our method outperforms standard collaborative filtering methods as well as recent methods that model producer information via item features.

Field-aware probabilistic embedding neural network for CTR prediction

For Click-Through Rate (CTR) prediction, Field-aware Factorization Machines (FFM) have exhibited great effectiveness by considering field information. However, it is also observed that FFM suffers from the overfitting problem in many practical scenarios. In this paper, we propose a Field-aware Probabilistic Embedding Neural Network (FPENN) model with both good generalization ability and high accuracy. FPENN estimates the probability distribution of the field-aware embedding rather than using the single point estimation (the maximum a posteriori estimation) to prevent overfitting. Both low-order and high-order feature interactions are considered to improve the accuracy. FPENN consists of three components, i.e., FPE component, Quadratic component and Deep component. FPE component outputs probabilistic embedding to the other two components, where various confidence levels for feature embeddings are incorporated to enhance the robustness and the accuracy. Quadratic component is designed for extracting low-order feature interactions, while Deep component aims at capturing high-order feature interactions. Experiments are conducted on two benchmark datasets, Avazu and Criteo. The results confirm that our model alleviates the overfitting problem while having a higher accuracy.

Attentive neural architecture incorporating song features for music recommendation

Recommender Systems are an integral part of music sharing platforms. Often the aim of these systems is to increase the time the user spends on the platform, which gives them high commercial value. Systems that aim at increasing the average time a user spends on the platform often need to recommend songs that the user might want to listen to next at each point in time. This is different from recommendation systems that try to predict an item that might be of interest to the user at some point in the user's lifetime but not necessarily in the very near future. Predicting the next song the user might like requires some modeling of the user's interests at the given point in time. Attentive neural networks have exploited the sequence in which items were selected by the user to model the user's implicit short-term interests for the task of next item prediction; however, we believe that features of the songs occurring in the sequence could also convey important information about short-term user interest that the items alone cannot. In this direction, we propose a novel attentive neural architecture which, in addition to the sequence of items selected by the user, uses the features of these items to better learn the user's short-term preferences and recommend the next song to the user.

Decomposing fit semantics for product size recommendation in metric spaces

Product size recommendation and fit prediction are critical in order to improve customers' shopping experiences and to reduce product return rates. Modeling customers' fit feedback is challenging due to its subtle semantics, arising from the subjective evaluation of products, and imbalanced label distribution. In this paper, we propose a new predictive framework to tackle the product fit problem, which captures the semantics behind customers' fit feedback, and employs a metric learning technique to resolve label imbalance issues. We also contribute two public datasets collected from online clothing retailers.

Learning to recommend diverse items over implicit feedback on PANDOR

In this paper, we present a novel and publicly available dataset for online recommendation provided by Purch. The dataset records the clicks generated by users of one of Purch's high-tech websites over the ads they have been shown for one month. In addition, the dataset contains contextual information about offers such as offer titles and keywords, as well as the anonymized content of the page on which offers were displayed. Then, besides a detailed description of the dataset, we evaluate the performance of six popular baselines and propose a simple yet effective strategy for overcoming the existing challenges inherent to implicit feedback and popularity bias introduced while designing an efficient and scalable recommendation algorithm. More specifically, we propose to demonstrate the importance of introducing diversity based on an appropriate representation of items in Recommender Systems, when the available feedback is strongly biased.

Learning within-session budgets from browsing trajectories

Building price- and budget-aware recommender systems is critical in settings where one wishes to produce recommendations that balance users' preferences (what they like) with a model of purchase likelihood (what they will buy). A trivial solution consists of learning global budget terms for each user based on their past expenditure. To more accurately model user budgets, we also consider a user's within-session budget, which may deviate from their global budget depending on their shopping context. In this paper, we find that users implicitly reveal their session-specific budgets through the sequence of items they browse within that session. Specifically, we find that some users "browse down," by purchasing the cheapest item among alternatives under consideration, others "browse up" (selecting the most expensive), and others ultimately purchase items around the middle. Surprisingly, this mixture of behaviors is difficult to observe globally, as individual users tend to belong firmly to one of the three segments. To model this behavior, we develop an interpretable budget model that combines a clustering component to detect different user segments, with a model of segment-specific purchase profiles. We apply our model on a dataset of browsing and purchasing sessions from Etsy, a large e-commerce website focused on handmade and vintage goods, where it outperforms strong baselines and existing production systems.
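
The three behaviours described above can be made concrete with a small sketch that labels a single session by where the purchased price falls within the range of browsed prices. This is only an illustration of the segments, not the clustering model from the paper; the `margin` threshold and the function names are assumptions.

```python
def purchase_position(browsed_prices, purchased_price):
    """Relative position of the purchase within the browsed price range, in [0, 1]."""
    lo, hi = min(browsed_prices), max(browsed_prices)
    if hi == lo:
        return 0.5  # a single price level carries no up/down signal
    return (purchased_price - lo) / (hi - lo)


def label_session(browsed_prices, purchased_price, margin=0.2):
    """Label a session as 'browse down', 'browse up', or 'middle' (assumed thresholds)."""
    pos = purchase_position(browsed_prices, purchased_price)
    if pos <= margin:
        return "browse down"
    if pos >= 1.0 - margin:
        return "browse up"
    return "middle"


print(label_session([10, 25, 40], 10))
print(label_session([10, 25, 40], 40))
print(label_session([10, 25, 40], 25))
```

Aggregating these labels per user would be one simple way to check the claim that individual users tend to fall into a single segment.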

Kernelized probabilistic matrix factorization for collaborative filtering: exploiting projected user and item graph

Matrix Factorization (MF) techniques have already shown their strong foundation in collaborative filtering (CF), particularly for the rating prediction problem. Extending the basic MF model with additional information, such as social networks and item tags alongside ratings, has become popular and effective, but it makes the model more complex. However, very few studies in recent years use only the users' rating information for recommendation. In this paper, we present a new finding on exploiting Projected User and Item Graph in the setting of Kernelized Probabilistic Matrix Factorization (KPMF), which uses different graph kernels from the projected graphs. KPMF works with its latent vector spanning over all users (and items) with Gaussian process priors and tries to capture the covariance structure across users and items from their respective projected graphs. We also explore ways of building these projected graphs to maximize the prediction accuracy. We apply the model to five real-world datasets and achieve significant performance improvement in terms of RMSE over state-of-the-art MF techniques.

A probabilistic model for intrusive recommendation assessment

The overwhelming advances in mobile technologies allow recommender systems to be highly contextualized and able to deliver recommendation without an explicit request. However, it is no longer enough for a recommender system to determine what to recommend according to the users' needs, but it also has to deal with the risk of disturbing the user during recommendation. We believe that mobile technologies along with contextual information may help alleviate this issue. In this paper, we address intrusiveness as a probabilistic approach that makes use of the several embedded applications within the user's device and the user's contextual information in order to figure out intrusive recommendations that are subject to rejection. The experiments that we conducted have shown that the proposed approach yields promising results.

Trust-based collaborative filtering: tackling the cold start problem using regular equivalence

User-based Collaborative Filtering (CF) is one of the most popular approaches to create recommender systems. This approach is based on finding the most relevant k users from whose rating history we can extract items to recommend. CF, however, suffers from data sparsity and the cold-start problem since users often rate only a small fraction of available items. One solution is to incorporate additional information into the recommendation process such as explicit trust scores that are assigned by users to others or implicit trust relationships that result from social connections between users. Such relationships typically form a very sparse trust network, which can be utilized to generate recommendations for users based on people they trust. In our work, we explore the use of regular equivalence applied to a trust network to generate a similarity matrix that is used to select the k-nearest neighbors for recommending items. We evaluate our approach on Epinions and we find that we can outperform related methods for tackling cold-start users in terms of recommendation accuracy.
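
To make the neighbor-selection step concrete, here is a minimal sketch that computes a regular-equivalence similarity matrix on a small trust network and picks the k most similar users. It assumes one common Katz-style formulation, sigma = alpha * A * sigma + I, solved by fixed-point iteration; the abstract does not say which variant the authors use, and `alpha`, the iteration count, and the function names are illustrative.

```python
def regular_equivalence(adj, alpha=0.1, iterations=50):
    """Iterate sigma <- alpha * A * sigma + I on an adjacency matrix (list of lists).

    Converges when alpha is below 1 / (largest eigenvalue of A).
    """
    n = len(adj)
    sigma = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        new = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                total = sum(adj[i][k] * sigma[k][j] for k in range(n))
                new[i][j] = alpha * total + (1.0 if i == j else 0.0)
        sigma = new
    return sigma


def k_nearest(sigma, user, k):
    """Return the k users most similar to `user`, ties broken by index."""
    others = [(sigma[user][v], v) for v in range(len(sigma)) if v != user]
    others.sort(key=lambda pair: (-pair[0], pair[1]))
    return [v for _, v in others[:k]]


# Toy directed trust chain: 0 trusts 1, 1 trusts 2, 2 trusts 3.
adj = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
sigma = regular_equivalence(adj)
print(k_nearest(sigma, user=0, k=2))
```

Because similarity decays with path length, user 0 is closest to the user they trust directly and next closest to that user's trustee, which is how sparse trust links can still yield neighbors for cold-start users.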

Rank and rate: multi-task learning for recommender systems

The two main tasks in the Recommender Systems domain are the ranking and rating prediction tasks. The rating prediction task aims at predicting to what extent a user would like any given item, which makes it possible to recommend the items with the highest predicted scores. The ranking task, on the other hand, directly aims at recommending the most valuable items for the user. Several previous approaches proposed learning user and item representations to optimize both tasks simultaneously in a multi-task framework. In this work we propose a novel multi-task framework that exploits the fact that a user follows a two-phase decision process: first deciding to interact with an item (ranking task) and only afterward rating it (rating prediction task).

We evaluated our framework on two benchmark datasets, on two different configurations and showed its superiority over state-of-the-art methods.

Audio-visual encoding of multimedia content for enhancing movie recommendations

We propose a multi-modal content-based movie recommender system that replaces human-generated metadata with content descriptions automatically extracted from the visual and audio channels of a video. Content descriptors improve over traditional metadata in terms of both richness (it is possible to extract hundreds of meaningful features covering various modalities) and quality (content features are consistent across different systems and immune to human errors). Our recommender system integrates state-of-the-art aesthetic and deep visual features as well as block-level and i-vector audio features. For fusing the different modalities, we propose a rank aggregation strategy extending the Borda count approach.
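
For readers unfamiliar with Borda count, the sketch below shows the plain version of the rule applied to per-modality ranked lists. The paper proposes an extension of Borda count that the abstract does not detail, so this is only the baseline technique; the function name and the toy rankings are assumptions.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Fuse ranked lists with plain Borda count.

    Each list awards len(list) points to its top item, one fewer to the next,
    and so on. Items are returned by descending total, ties broken by name.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    return sorted(scores, key=lambda item: (-scores[item], item))


visual = ["A", "B", "C"]
audio = ["B", "A", "C"]
textual = ["B", "C", "A"]
print(borda_aggregate([visual, audio, textual]))  # ['B', 'A', 'C']
```

Here B collects 8 points, A collects 6, and C collects 4, so B leads the fused ranking even though the visual ranking preferred A.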

We evaluate the proposed multi-modal recommender system comprehensively against metadata-based baselines. To this end, we conduct two empirical studies: (i) a system-centric study to measure the offline quality of recommendations in terms of accuracy-related and beyond-accuracy performance measures (novelty, diversity, and coverage), and (ii) a user-centric online experiment, measuring different subjective metrics, including relevance, satisfaction, and diversity. In both studies, we use a dataset of more than 4,000 movie trailers, which makes our approach versatile. Our results shed light on the accuracy and beyond-accuracy performance of audio, visual, and textual features in content-based movie recommender systems.

Efficient online recommendation via low-rank ensemble sampling

The low-rank structure is one of the most prominent features in modern recommendation problems. In this paper, we consider an online learning problem with a low-rank expected reward matrix where both row features and column features are unknown a priori, and the agent aims to learn to choose the best row-column pair (i.e. the maximum entry) in the matrix. We develop a novel online recommendation algorithm based on ensemble sampling, a recently developed computationally efficient approximation of Thompson sampling. Our computational results show that our algorithm consistently achieves order-of-magnitude improvements over the baselines in both synthetic and real-world experiments.

Semantic-based tag recommendation in scientific bookmarking systems

Recently, tagging has become a common way for users to organize and share digital content, and tag recommendation (TR) has become a very important research topic. Most recommendation approaches based on text embedding have utilized the bag-of-words technique. On the other hand, deep learning methods for capturing semantic meaning in text have proven effective in various natural language processing (NLP) applications. In this paper, we present a content-based TR method that adopts deep recurrent neural networks, specifically bidirectional gated recurrent units (bi-GRUs) with an attention mechanism, to encode titles and abstracts of scientific articles into semantic vectors for enhancing the recommendation task. The experimental evaluation is performed on a dataset from CiteULike. The overall findings show that the proposed model is effective in representing scientific articles for tag recommendation.

CLoSe: contextualized location sequence recommender

Location-based social networks (LBSNs), e.g., Facebook, have been explored in the past decade for Point-of-Interest (POI) recommendation. Many of the existing systems focus on recommending a single location or a list which might not be contextually coherent. In this paper, we propose a model termed CLoSe (Contextualized Location Sequence Recommender) that generates contextually coherent POI sequences relevant to user preferences. POI sequence recommenders are helpful in many day-to-day activities, e.g., itinerary planning. To the best of our knowledge, this paper is the first to formulate contextual POI sequence recommendation by exploiting a Recurrent Neural Network (RNN). We incorporate check-in contexts into the hidden layer and global context into the hidden and output layers of the RNN. We also demonstrate the efficiency of extended Long Short-Term Memory (LSTM) in sequence generation. The main contributions of this paper are: (i) it exploits multi-context, personalized user preferences to formulate contextual POI sequence generation, (ii) it presents contextual extensions of RNN and LSTM that incorporate different contexts applicable to a POI and POI sequence, and (iii) it demonstrates significant performance gain of the proposed model on pair-F1 and NDCG metrics when evaluated with two real-world datasets.

User preference learning in multi-criteria recommendations using stacked auto encoders

A Recommender System (RS) is an essential component of many businesses, especially in the e-commerce domain. An RS exploits the preference history (rating, purchase, review, etc.) of users in order to provide recommendations. A user in a traditional RS can provide only one rating value for an item. Deep Neural Networks have been used in such single rating systems to improve recommendation accuracy in recent times. However, single rating systems are inadequate for understanding users' preferences about an item. On the other hand, business domains such as tourism and e-learning allow users to provide multiple criteria ratings for an item, which makes it easier to understand users' preferences than in a single rating system. In this paper, we propose an extended Stacked Autoencoder (a Deep Neural Network technique) to utilize the multi-criteria ratings. The proposed network is designed to learn the relationship between each user's criteria and overall rating efficiently. Experimental results on real-world datasets (Yahoo! Movies and TripAdvisor) demonstrate that the proposed approach outperforms state-of-the-art single rating systems and multi-criteria approaches on various performance metrics.

SESSION: Core algorithms

Variational learning to rank (VL2R)

We present Variational Learning to Rank (VL2R), a combination of variational inference and learning to rank. The combination provides a natural way to balance exploration and exploitation of the algorithm by introducing shuffling of product search/category listings according to the model's relevance uncertainty for each product. Simply put, we perturb (newer) products with higher uncertainty on the relevance more than (older) products which have a lower uncertainty on the relevance.
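
The idea of perturbing products in proportion to their relevance uncertainty can be sketched as follows: draw one score per product from a distribution centred on its predicted relevance and sort by the draws. The Gaussian form, the `(name, mean, std)` tuple layout, and the function name are assumptions for illustration; the abstract does not specify how VL2R parameterizes or samples the uncertainty.

```python
import random


def shuffle_by_uncertainty(products, rng=None):
    """Rank products by a score sampled from Normal(mean, std) per product.

    products: iterable of (name, mean_relevance, std_relevance) tuples.
    Products with larger std move around more between calls.
    """
    rng = rng or random.Random()
    sampled = [(rng.gauss(mean, std), name) for name, mean, std in products]
    sampled.sort(reverse=True)
    return [name for _, name in sampled]


catalog = [
    ("old_bestseller", 0.90, 0.01),  # long history, low uncertainty
    ("old_midrange", 0.60, 0.01),
    ("new_arrival", 0.55, 0.30),     # little history, high uncertainty
]
print(shuffle_by_uncertainty(catalog, random.Random(0)))
```

Over repeated calls the two older products keep nearly fixed relative positions, while the new arrival sometimes surfaces near the top, which is the exploration effect the abstract describes.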

Our formalism makes it possible to train an end-to-end model that optimizes for both ranking and shuffling, in contrast to known state-of-the-art systems where ranking and shuffling are treated as separate problems. VL2R provides an integrated way of doing propensity scoring during the offline learning phase, thus reducing selection bias. The system is simple, yet powerful and flexible. We have implemented it within the Salesforce Commerce Cloud, a platform that 500 million unique online shoppers interact with each month across 2,750 websites in 53+ countries as of FY18.

In this talk, we will go into the details of our variational learning to rank system and share our early experiences with optimizing VL2R and running it in production. We hope that by sharing VL2R with the recommendation systems community, we will foster more research in this direction, and result in systems that are faster at learning user preferences for changing catalogs.

Adapting session based recommendation for features through transfer learning

This industry talk covers the deep learning architecture developed at Realtor.com to recommend real estate listings to our user base. The recommendation of homes differs from most other domains, both in that listings are unique and in that additional geographic and time constraints increase the sparsity of interactions and make recommendation of individual listings more challenging. In particular, time on market in a hot area can be limited to weeks or even days, and listing cold-start is critical to providing up-to-date market information. Thankfully, the structured feature data for listings is incredibly rich and provides a framework from which to map listings into a meaningful vector space. User first impressions are also incredibly important in this highly competitive field, so offline recommendation or models that don't adapt during the user's session are less desirable.

In order to solve this recommendation problem we have developed a model based on session based recommendation [1]. The architecture utilizes state-of-the-art techniques from Natural Language Processing, including the AWD-LSTM language model developed by Salesforce [2]. To solve for cold-start of listings, a structured data based denoising autoencoder was adapted from the methodology described in the winning entry of the Porto Seguro Safe Driver Prediction Kaggle competition [3]. This model is not used in the common way of generating fixed feature vectors; rather, the entire head of the autoencoder model, from the feature inputs to the middle layer commonly used as the vector output, is first trained to encode listing features and then becomes the input to the AWD-LSTM architecture. This style of transfer learning is common in Computer Vision, and has recently been utilized in NLP to achieve state-of-the-art results for text classification [4]. By including the head we are able to further optimize the listing encoder network and embeddings to take user interactions into account. As in traditional session based recommendation, users are represented as the sequence of listings that they view; however, those listings are fed into the model as a sequence of features.

The final system consists of several components. The first attempts to calculate and maintain the users' feature vector and model hidden weights in near real time, providing a representation for the user within the system. This representation is used by several downstream components, most notably the search rerank and recommendation modules. These calculate users' interest in listings in two ways: in the context of the output of more traditional Elasticsearch queries, via cosine similarity of user/listing vectors; and through approximate nearest neighbor vector space searches for relevant listings, which form the input set for a pointwise scoring model trained on time on listing, as done by YouTube [5].
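
The search rerank step based on cosine similarity of user and listing vectors can be sketched in a few lines. The vectors, listing identifiers, and function names below are hypothetical; in the described system the vectors would come from the trained encoder and the candidate set from a search query.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(user_vec, listings):
    """Order (listing_id, vector) pairs by similarity to the user vector."""
    return sorted(listings, key=lambda item: cosine(user_vec, item[1]), reverse=True)


user_vec = [1.0, 0.0]
search_results = [
    ("listing_a", [0.0, 1.0]),
    ("listing_b", [1.0, 0.0]),
    ("listing_c", [1.0, 1.0]),
]
print([listing_id for listing_id, _ in rerank(user_vec, search_results)])
```

With these toy vectors the order is listing_b, listing_c, listing_a, since their similarities to the user vector are 1.0, about 0.71, and 0.0.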

Hulu video recommendation: from relevance to reasoning

Online video streaming services such as Hulu host tens of millions of premium videos and require an effective recommendation system to help viewers discover what they enjoy. In this talk, we will introduce Hulu's recent technical progress in recommender systems and deep-dive into the topic of generating recommendation reasons from a knowledge graph. We have two user scenarios: the store-shelf and autoplay. The first requires a list of videos to maximize the chance that a viewer would pick one of them to watch. The second requires a sequence of video recommendations such that the viewer would continuously watch within the current session.

In the model layer, we designed specific models to match each user scenario, balancing both exploitation and exploration. For example, we leverage the contextual-bandit model in the store-shelf scenario to adapt the ranking strategy to various types of user feedback. To optimize exploitation, we tested several granularity levels for parameter sharing among the arms. For more effective exploration, we incorporate Thompson sampling. For the autoplay scenario, we use a contextual recurrent neural network to predict the next video that the viewer is going to watch.
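
As background on the exploration strategy named above, here is a minimal Thompson sampling sketch for a Beta-Bernoulli bandit. It is deliberately non-contextual and is not Hulu's model: the class name, the Beta(1, 1) prior, and the simulated click rates are assumptions chosen to keep the example short.

```python
import random


class BetaBernoulliArm:
    """One arm with a Beta posterior over its click probability."""

    def __init__(self):
        self.successes = 1  # Beta(1, 1) prior, i.e. uniform
        self.failures = 1

    def sample(self, rng):
        return rng.betavariate(self.successes, self.failures)

    def update(self, reward):
        if reward:
            self.successes += 1
        else:
            self.failures += 1


def choose_arm(arms, rng):
    """Thompson sampling: draw once from each posterior, play the largest draw."""
    draws = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=draws.__getitem__)


rng = random.Random(0)
arms = [BetaBernoulliArm(), BetaBernoulliArm()]
true_rates = [0.1, 0.9]  # simulated click probabilities, unknown to the agent
pulls = [0, 0]
for _ in range(1000):
    i = choose_arm(arms, rng)
    pulls[i] += 1
    arms[i].update(1 if rng.random() < true_rates[i] else 0)
print(pulls)
```

Early rounds spread pulls across both arms because the posteriors are wide; as evidence accumulates, the better arm is chosen almost exclusively. A contextual variant would condition each posterior on request features.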

In the feature and data layer, we train embeddings for content, user and contextual info. For example, to train content embeddings, we collect factual tags from metadata, sentiment tags from reviews, and keywords from the captions and object/action recognized using computer vision techniques.

Next we will deep-dive into one important topic: generating recommendation reasons from a knowledge graph.

A fact is defined by a tuple of related entities and their relation, which is normally a pair of entities tagged by a relationship. In our problem setting, recommendation results are the targets, viewed as inputs for the reasoning task, consisting of pairs of relevant entities, i.e. a source node and a destination node in a knowledge graph. The recommendation reasoning task is to learn a path or a small directed acyclic subgraph, connecting the source node to the destination node.
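
To illustrate the data structures involved, the sketch below stores facts as (head, relation, tail) tuples and finds a shortest connecting path from a source node to a destination node with breadth-first search. This is not the learned reasoning approach of the talk, which handles uncertain facts and learns where to search; it only shows what a "reason path" looks like. The entity and relation names are invented.

```python
from collections import deque


def find_reason_path(facts, source, destination):
    """Return a shortest list of facts linking source to destination, or None."""
    graph = {}
    for head, relation, tail in facts:
        graph.setdefault(head, []).append((relation, tail))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == destination:
            return path
        for relation, tail in graph.get(node, []):
            if tail not in visited:
                visited.add(tail)
                queue.append((tail, path + [(node, relation, tail)]))
    return None


facts = [
    ("Show A", "has_actor", "Actor X"),
    ("Actor X", "acts_in", "Show B"),
    ("Show A", "has_genre", "Drama"),
]
print(find_reason_path(facts, "Show A", "Show B"))
```

The returned path reads as a candidate explanation: Show B is recommended to a viewer of Show A because both feature Actor X.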

Since the facts in a knowledge graph have different confidence values for different reasoned targets, we need to conduct a probabilistic inference. The challenge is we do not know a predefined set of logic rules to guide the search through the knowledge graph, which prevents us from directly applying the probabilistic logic methods. Inspired by recent advances in deep learning and reinforcement learning, especially in graph neural networks, attention mechanism and deep generative models, we propose two ways to model the reasoning process: the differentiable reasoning approach and the stochastic reasoning approach.

Differentiable reasoning approaches are based on graph neural networks [1,2] with attention flow and information flow. The attention dynamics is an iterative process of redistributing and aggregating attention over the knowledge graph, starting at the source node. The final attention aggregated at the destination node serves for the prediction to compute the loss. Instead of the prediction accuracy, we care more about how the learned attention dynamics draws its reasoning track in a knowledge graph.

Stochastic reasoning approaches frame the reasoning process as learning a probabilistic graphical model consisting of stochastic discrete operations, such as selecting a node and selecting an edge, to build a reason subgraph extracted from the knowledge graph. The model is known as stochastic computation graphs (SCGs), and to learn it, we propose a generalized back-propagation framework Backprop-Q [3] to overcome the gradient-blocking issues in applying standard back-propagation. In summary, we give an overview of the recommendation research in Hulu and deep-dive into our differentiable reasoning approach and stochastic reasoning approach for generating recommendation reasons based on a knowledge graph.

Hybrid search: incorporating contextual signals in recommendations at pinterest

Many modern recommender systems use collaborative filtering or historical engagement data to serve the best recommendations for each item. However, the context of each recommendation instance can be very different. Some users may be casually browsing, while others are searching with high intent. At Pinterest, we realized that building our system solely on aggregated historical data or pin-board collaborative filtering [1] would not be able to capture these differences. Incorporating contextual signals helps us serve better recommendations for every instance.

Pinterest Related Pins is an item-to-item recommender system that accounts for 40 percent of engagement on Pinterest. [2] On Pinterest, Related Pins appears as a feed of content relevant to the Pin a user has clicked on. Users arrive at Related Pins feeds from a variety of surfaces, such as their Home Feed, Search results, or Boards. As expected, these users often have different intents. Users coming from Search have already executed a specific text query and clicked on one of the Pins in the Search results. This context tells us that the user has high intent and is interested in something related to both the Search query as well as the clicked Pin. This context is very different from a user who is casually scrolling through their Home Feed and clicks on a Pin that happens to catch their eye. The Related Pins recommendations for each of these clicked Pins should therefore also differ accordingly.

Related Pins are generally relevant to the clicked Pin. The recommendations for a women's dress shoe Pin will be other shoes of similar style, some of which may be paired with matching outfits. However, if the user searched in particular for "red ballet flats with sequins," the Related Pins may not be specific enough to be useful to the user. In order to address this, we developed a hybrid search that takes both the text search query and the clicked pin image and metadata as inputs, and outputs a set of results tailored to both. We found that this improved user engagement for Related Pins from Search by 20% on top of the previous production recommendation system [2]. Following this exciting launch, we are planning to further incorporate contextual signals by adding them as features in our model.

Learning content and usage factors simultaneously to reduce clickbaits

Recommending news and content is often more difficult than classic recommendation problems. At recommendation time, there are often fewer high-quality explicit usage signals such as upvotes, shares, and dislikes, because articles are relevant for a very short amount of time. Relying solely on implicit usage signals (views) in collaborative filtering for news articles often yields low-quality documents optimized for views and clicks. Traditionally, content based filtering methods such as topic modeling and named entity extraction are used to counter or mitigate these issues, but they result in poorer recommendations on their own, and hybrid solutions that ensemble content and collaborative filtering are difficult to optimize.

This talk proposes learning factorized representations of documents using both the content and usage signals simultaneously. Using both signals simultaneously encourages the content and usage signals to act as regularizers for each other. Also, this serves to keep the recommendation quality high while reducing the number of click-baits. This avoids the additional step of tuning often-used ensembled content and collaborative filtering based hybrid models.

This research explores learning these shared factorized representations between the two views using the traditional matrix factorization framework as well as probabilistic approaches based on topic modeling. This talk shares the lessons learned from using both approaches and shows the impact of using these learned representations on recommendation quality.
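As a rough illustration of the matrix factorization variant, the sketch below factorizes a usage matrix and a content matrix with a shared document factor, so that each signal constrains the other. It is a minimal sketch under stated assumptions, not the speakers' method: the squared-error objective, plain gradient descent, and all names (`joint_factorize`, `joint_loss`, `lam`, `lr`) are hypothetical choices made for this example.

```python
import numpy as np

def joint_loss(usage, content, U, D, W, lam):
    """Squared reconstruction error of both views plus L2 regularization."""
    fit = np.sum((U @ D.T - usage) ** 2) + np.sum((D @ W.T - content) ** 2)
    reg = lam * (np.sum(U ** 2) + np.sum(D ** 2) + np.sum(W ** 2))
    return 0.5 * (fit + reg)

def joint_factorize(usage, content, k=2, lam=0.1, lr=0.01, steps=500, seed=0):
    """Factorize usage (users x docs) and content (docs x terms) jointly.

    The document factors D appear in both reconstructions, which is what
    lets the two signals regularize each other (assumed formulation).
    """
    rng = np.random.default_rng(seed)
    n_users, n_docs = usage.shape
    n_terms = content.shape[1]
    U = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    D = rng.normal(scale=0.1, size=(n_docs, k))   # shared document factors
    W = rng.normal(scale=0.1, size=(n_terms, k))  # term factors
    history = []
    for _ in range(steps):
        history.append(joint_loss(usage, content, U, D, W, lam))
        err_u = U @ D.T - usage      # usage residual, users x docs
        err_c = D @ W.T - content    # content residual, docs x terms
        grad_U = err_u @ D + lam * U
        grad_D = err_u.T @ U + err_c @ W + lam * D  # both views contribute
        grad_W = err_c.T @ D + lam * W
        U -= lr * grad_U
        D -= lr * grad_D
        W -= lr * grad_W
    history.append(joint_loss(usage, content, U, D, W, lam))
    return U, D, W, history
```

Because `D` receives gradients from both residuals, a document that attracts views without matching its content profile is pulled toward its content neighbors, which is one plausible reading of how joint learning could dampen clickbait.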

SESSION: System considerations

Measuring operational quality of recommendations: industry talk abstract

With the rise of machine learning in production, we need to talk about operational data science. The talk introduces a pragmatic method for measuring the response quality of a recommendation service. To that end, the definition of a successful response is introduced, and guidelines on how to capture the rate of successful responses are presented.

There are several changes that can happen during the serving phase of a model which negatively affect the quality of the algorithmic response. A few examples are:

• The model is updated and the new version is inferior to the previous one.

• The latest deployment of the stack that processes the request and serves the model contains a bug.

• Changes in the infrastructure lead to performance loss. An example in an e-commerce setting is switching to a different microservice to obtain article metadata used for filtering the recommendations.

• The input data changes. Typical reasons might be a client application that releases a bug (e.g., lowercasing a case-sensitive identifier) or changes a feature in a way that affects the data distribution, such as allowing all users to use the product cart instead of previously allowing it only for logged-in users. If the change is not detected, training data and serving data diverge.

Current monitoring solutions mostly focus on the completion of a request without errors and the request latency. That means the mentioned examples would be hard to detect despite the response quality being significantly degraded, sometimes permanently.

In addition to not being able to detect the mentioned changes, it can be argued that current monitoring practices are not sufficient to capture the performance of a recommender system or any other data driven service in a meaningful way. We might for instance have returned popular articles as a fallback in a case where personalized recommendations were requested. We should record that response as unsuccessful.

A new paradigm for measuring response quality should fulfil the following criteria:

• comparable across models

• simple and understandable metrics

• measurements are collected in real time

• allows for actionable alerting on problems

The response quality is defined as an approximation of how well the response fits the defined business and modelling case. The goal is to bridge the gap between metrics used during model learning and technical monitoring metrics. Ideally we would like to obtain Service Level Objectives (SLO)[1] that contain this quality aspect and can be discussed with the different client applications based on the business cases, e.g., "85% of the order confirmation emails contain personalized recommendations based on the purchase."
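The idea of a successful response and an SLO on its rate can be sketched as follows. This is an illustrative sketch only: the `Response` fields, the success rule (no error, personalized rather than fallback, enough items), and the function names are assumptions made for this example, not the definitions used in the talk.

```python
from dataclasses import dataclass

@dataclass
class Response:
    """One served recommendation response (hypothetical fields)."""
    http_ok: bool        # request completed without a technical error
    personalized: bool   # False when a popularity fallback was served
    n_items: int         # number of recommendations returned

def is_successful(response, min_items=1):
    """A response counts as successful only if it also fits the business case."""
    return (
        response.http_ok
        and response.personalized
        and response.n_items >= min_items
    )

def success_rate(responses, min_items=1):
    """Fraction of successful responses in a monitoring window."""
    if not responses:
        raise ValueError("no responses in window")
    hits = sum(is_successful(r, min_items) for r in responses)
    return hits / len(responses)

def meets_slo(rate, target=0.85):
    """Compare the observed rate with an SLO target such as 85%."""
    return rate >= target
```

Under this rule, a fallback to popular articles is recorded as unsuccessful even though the request returned without errors, so a silent degradation shows up as a drop in the success rate rather than staying invisible to latency and error monitoring.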

A case study will illustrate how algorithmic monitoring was introduced in the recommendation team at Zalando. Zalando is one of Europe's largest fashion retailers and multiple recommendation algorithms serve many online and offline use cases. You will see several examples of how the monitoring helped to identify bugs or diagnose quality problems.

Building recommender systems with strict privacy boundaries

Every day, millions of people rely on Slack to get the information they need to do their jobs. To make their working lives more productive, Slack has built a number of recommender systems to prioritize the content a given user is most likely to need at any point in time. These systems have wide-ranging purposes, from recommending channels for users to join to ranking unread content so users can catch up more easily.

A common trait of all these systems is that they must deal with strict privacy boundaries inherent to the underlying dataset. By policy, users can only be exposed to data that was publicly shared in their own Slack team. These restrictions must carry over into the recommender systems: not only must they refrain from recommending data from foreign teams, but, more subtly, patterns in foreign teams' data must not be inferable from the usage of these systems.

In this talk, I will discuss how Slack's dataset differs from those used in traditional recommender systems such as the Netflix Prize dataset. I will also present some techniques we developed to leverage the entire dataset to improve the performance of our recommender systems without jeopardizing the privacy boundaries we guarantee to our customers. These include a mix of algorithms with increased locality as well as the use of metadata over data to generate privacy sensitive recommendations.

Artwork personalization at netflix

For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of our members at the right time. But the job of recommendation does not end there. The homepage should be able to convey to the member enough evidence of why a title may be good for her, especially for shows that the member has never heard of. One way to address this challenge is to personalize the way we portray the titles on our service. An important aspect of how to portray titles is through the artwork or imagery we display to visually represent each title. The artwork may highlight an actor that you recognize, capture an exciting moment like a car chase, or contain a dramatic scene that conveys the essence of a movie or show. It is important to select good artwork because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way. In this talk, we will present an approach for personalizing the artwork we use on the Netflix homepage. The system selects an image for each member and video to give better visual evidence for why the title might be appealing to that particular member.

There are many challenges involved in getting artwork personalization to succeed. One challenge is that we can only select a single piece of artwork to represent each title. Typical recommendation engines present multiple items (in some order) to a member, allowing us to subsequently learn about preferences between items through the specific item a member selects from the presented assortment. In contrast, we only collect feedback from the one image that was presented to each member for each title. This leads to a training paradigm based on incomplete logged bandit feedback [1]. Moreover, since the artwork selection process happens on top of a recommendation system, collecting data directly from the production experience (observational data) makes it hard to disentangle whether a play was due to the recommendation or to the incremental effect of personalized evidence. Another challenge is understanding the impact of changing the artwork between sessions and whether that is beneficial or confusing to the user. We also need to consider how diverse artworks perform in relation to one another. Finally, given that the popularity and audiences for titles can change or drop quickly after launch, the system needs to quickly learn how to personalize images for a new item.

All these considerations naturally lead us to frame the problem as online learning with contextual multi-armed bandits. Briefly, contextual bandits are a class of online learning algorithms that balance the cost of gathering randomized training data (which is required for learning an unbiased model on an ongoing basis) with the benefits of applying the learned model to each member context (to maximize user engagement). This is known as the explore-exploit trade-off. In this setting, for a given title the set of actions is the set of available images for the title. We aim to discover the underlying unknown reward, based on probability of play, for each image given a member, a title, and some context. The context could be based on profile attributes (geo-localization, previous plays, etc.), the device, time, and other factors that might affect which image is optimal to choose in each session.
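The explore-exploit trade-off can be illustrated with a very small epsilon-greedy bandit. This is a sketch under strong assumptions, not the system described in the talk: the context is reduced to a single discrete label, the reward is a play indicator, and the class and method names are hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Tabular epsilon-greedy bandit keyed by (context, image)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.impressions = defaultdict(int)   # (context, image) -> times shown
        self.plays = defaultdict(float)       # (context, image) -> summed reward

    def estimate(self, context, image):
        """Empirical play rate for an image in a context (0.0 if never shown)."""
        shown = self.impressions.get((context, image), 0)
        if shown == 0:
            return 0.0
        return self.plays[(context, image)] / shown

    def select(self, context, images):
        """Explore uniformly with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(images)
        return max(images, key=lambda image: self.estimate(context, image))

    def update(self, context, image, reward):
        """Record the outcome (1.0 for a play, 0.0 otherwise) of one impression."""
        self.impressions[(context, image)] += 1
        self.plays[(context, image)] += reward
```

The randomized branch is what produces the unbiased training data mentioned above; the greedy branch applies what has been learned so far. A production system would replace the lookup table with a model over rich context features.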

With a large member base, many titles in the catalog, and multiple images per title, Netflix's product is an ideal platform to test ideas for personalization of artwork. At peak, over 20 million personalized image requests per second need to be handled with low latency. To train our model, we leveraged existing logged data from a previous system that chose images in an unpersonalized manner. We will present results comparing the contextual bandit personalization algorithms using offline policy evaluation metrics [2], such as inverse propensity scoring and doubly robust estimators [3]. We will conclude with a discussion of opportunities to expand and improve our approach. This includes developing algorithms to handle cold-start by quickly personalizing new images and new titles. We also discuss extending this personalization approach across other types of artwork we use and other evidence that describes our titles, such as synopses, metadata, and trailers. Finally, we discuss potentially closing the loop by looking at how we can help artists and designers figure out what new imagery they should create to make a title even more compelling and personalizable.
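For readers unfamiliar with inverse propensity scoring (IPS), the sketch below shows the basic estimator on logged bandit data. It assumes a deterministic target policy and a log of (context, shown image, reward, logging propensity) tuples; the function name and log layout are hypothetical, and the doubly robust estimator is not shown.

```python
def ips_estimate(logs, target_policy):
    """Estimate the average reward of target_policy from logged data.

    logs: iterable of (context, action, reward, propensity), where
          propensity is the probability with which the logging policy
          chose that action in that context (must be > 0).
    target_policy: function mapping a context to the action it would choose.
    """
    total = 0.0
    count = 0
    for context, action, reward, propensity in logs:
        count += 1
        # Only impressions where the target policy agrees with the log
        # contribute; they are reweighted by 1 / propensity.
        if target_policy(context) == action:
            total += reward / propensity
    if count == 0:
        raise ValueError("empty log")
    return total / count
```

For example, with two images logged uniformly at random (propensity 0.5 each), a policy that matches every logged play is credited with twice the reward of each matching impression, which compensates for the half of its choices that were never shown.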

Conversational content discovery via comcast X1 voice interface

The global market for intelligent voice-enabled devices is expanding at a fast pace. Comcast, one of the largest cable providers in the US with about 30 million users, has recently reinvented the way that customers can discover and access content on an entertainment platform by introducing a voice remote control for its Xfinity X1 entertainment platform. Spoken language input allows customers to express what they are interested in on their own terms, which has made it significantly more convenient for users to find their favorite TV channel or movie compared to the traditional limits of a screen menu navigated with the keys of a TV remote.

The more natural user experience of the voice interface results in voice queries that are considerably more complex to handle than channel numbers typed in or movie titles selected on screen. This poses a challenge for the platform: it must understand the user intent and find the appropriate action for the millions of voice queries that we receive every day. It also makes it necessary to adapt the underlying content recommendation algorithms to incorporate the richer intent context from the users.

We describe some of the key components of our voice-powered content discovery platform that specifically address these issues. We discuss how we leverage multimodal data, including voice queries and a large database of metadata, to enable a more natural search experience via voice queries for finding relevant movies, TV shows, or even a specific episode of a series. We describe the models that encode semantic similarities between the content and their metadata to allow users to search for places, people, and topics using keywords or phrases that do not explicitly appear in the movie or show titles, as has traditionally been required. We describe how this category of voice search queries can be framed as a recommendation problem.

Even though voice input is extremely powerful for capturing the intent of our customers, the freedom to say anything also makes it more difficult for a voice remote user to know the range of possible queries that are supported by our system. We show how we can leverage the millions of voice queries that we receive every day to build and train a deep learning-based recommender system that produces different types of recommendations, such as educational suggestions and tips for voice commands that the platform supports.

Finally, it is important to consider that the true potential of the voice-powered entertainment experience is the result of the fusion of intents expressed in language with navigation of content on the screen via the remote navigation buttons. For all the applications and features discussed in this talk, our recommendation systems are adapted to provide the most relevant suggestions whether the voice interface is initiating the action, navigating through the results rendered on the TV screen, or narrowing down the set of results by allowing the user to ask follow-up queries or select buttons.

Connecting sellers and buyers on the world's largest inventory

At eBay, sellers can offer virtually any type of listing, rendering the world's largest inventory, with well over a billion items. Yet, the noisy nature of the input data and the extremely long-tailed item distribution pose a variety of challenges for search and recommendation, such as understanding the unique attributes (aspects) of the products, their importance to both sellers and buyers, and their intra-relationships, all essential to providing a high-quality user experience on the site.

In this talk, I will present several challenges and corresponding solution frameworks recently developed at eBay Research for aspect extraction, normalization, weighting, and relation inference; the mapping of relationships between e-commerce entities for matching uploaded listings to catalog products and feeding the e-commerce knowledge graph; the recommendation of categories for sellers' contributions; and the automatic generation of textual fields (title, description) to bridge the gap between sellers and buyers by helping them speak the same language. Our methods combine a variety of language processing and computer vision approaches applied to the different types of data contributed by sellers. Learning to rank, named entity recognition, object identification, machine translation, and summarization are just a few example techniques that come into play. Our methods drive different usage scenarios by enabling a better representation of users and items and an effective computation of their similarities. I will also describe how our applied research teams perform their work, from the development of initial prototypes, through offline and online production processes, to different evaluation schemes. I will conclude the talk by reviewing open challenges in large-scale e-commerce that will have to be addressed in the years to come.

DEMONSTRATION SESSION: Demonstrations

Extra: explaining team recommendation in networks

The state of the art in the network science of teams offers effective recommendation methods to answer questions such as who is the best replacement or what is the best team expansion strategy, but lacks intuitive ways to explain why the optimization algorithm gives a specific recommendation for a given team optimization scenario. To tackle this problem, we develop an interactive prototype system, Extra, as a first step towards addressing this sense-making challenge by explaining the team recommendation results through the lens of the underlying network in which teams are embedded. The main advantages are (1) Algorithm efficacy: we propose an effective and fast algorithm to explain the random walk graph kernel, the central technique for networked team recommendation; (2) Intuitive visual explanation: we present an intuitive visual analysis of the recommendation results, which can help users better understand the rationale of the underlying team recommendation algorithm.

Case recommender: a flexible and extensible python framework for recommender systems

This paper presents a polished open-source Python-based recommender framework named Case Recommender, which provides a rich set of components from which developers can construct and evaluate customized recommender systems. It implements well-known and state-of-the-art algorithms in rating prediction and item recommendation scenarios. The main advantage of the Case Recommender is the possibility to integrate clustering and ensemble algorithms with recommendation engines, easing the development of more accurate and efficient approaches.

Tourrec: a tourist trip recommender system for individuals and groups

In this demo paper, we present TourRec, a mobile Recommender System (RS) for tourist trips, that is, sequences of points of interest (POIs) along enjoyable routes. The core of TourRec is a modular, multi-tier architecture facilitating the development and evaluation of new recommendation algorithms, clients and data sources. We show how the TourRec Android application can be used to recommend tourist trips to individuals and groups. Furthermore, we explain how TourRec supports the evaluation of different recommendation algorithms and group recommendation strategies. A video demonstrating how TourRec works is available at https://youtu.be/o_yx8UGvvvo.

Module advisor: a hybrid recommender system for elective module exploration

Recommender systems are omnipresent in our everyday lives, guiding us through the vast amount of information available. However, in the academic world, personalised recommendations are less prominent, leaving students to navigate through the typically large space of available courses and modules manually. Since it is crucial for students to make informed choices about their learning pathways, we aim to improve the way students discover elective modules by developing a hybrid recommender system prototype that is specifically designed to help students find elective modules from a diverse set of subjects. We can improve the discoverability of long-tail options and help students broaden their horizons by combining notions of similarity and diversity.

Automating recommender systems experimentation with librec-auto

Recommender systems research often requires the creation and execution of large numbers of algorithmic experiments to determine the sensitivity of results to the values of various hyperparameters. Existing recommender systems platforms fail to provide a basis for systematic experimentation of this type. In this paper, we describe librec-auto, a wrapper for the well-known LibRec library, which provides an environment that supports automated experimentation.

Query-based simple and scalable recommender systems with apache hivemall

This study demonstrates a way to build large-scale recommender systems by just writing a series of SQL-like queries. In order to efficiently run recommendation logic on a cluster of computers, we implemented a variety of recommendation algorithms and common recommendation functions (e.g., efficient similarity computation, top-k retrieval, and evaluation measures) as Hive user-defined functions (UDFs) in Apache Hivemall. We demonstrate how Apache Hivemall can easily be used for building a scalable recommendation system while satisfying business requirements such as scalability, latency, and stability.

Towards an open, collaborative REST API for recommender systems

Recommender Systems aim to suggest relevant items to users. To do so, however, they need to properly obtain different types of data from, and serve them to, the users of such systems. In this work, we propose and show an example implementation for a common REST API focused on Recommender Systems. This API meets the most typical requirements faced by Recommender Systems practitioners while, at the same time, being open and flexible enough to be extended based on feedback from the community. We also present a Web client that demonstrates the functionalities of the proposed API.

Picture-based navigation for diagnosing post-harvest diseases of apple

This demo presents a conversational navigation approach for a diagnostic application of postharvest diseases of apple with the goal to educate users on the diagnosed diseases as well as to recommend consequences for the storage facility and what action to take for the next growing period. It thus builds on earlier works on picture-based navigation for conversational recommender systems and provides evidence for its usability based on a first small-scale comparative usability study.

Cognitive company discovery

Cognitive Company Discovery is an application that helps business professionals identify companies of interest to them. The application employs a variety of artificial intelligence and data science techniques to build a corpus of company data, rapidly search the corpus based on implicit and explicit user queries, present the results using visualization techniques that yield insight into areas of interest to the user, and scan vast amounts of news and blog posts to aid users in discovering new companies. The application is currently deployed in a major corporation. A video that demonstrates our system can be found at the following URL: https://player.vimeo.com/video/278031050

WORKSHOP SESSION: Workshops, challenge and late-breaking results

2nd workshop on recommendation in complex scenarios (complexrec 2018)

Over the past decade, recommendation algorithms for ratings prediction and item ranking have steadily matured. However, these state-of-the-art algorithms are typically applied in relatively straightforward scenarios. In reality, recommendation is often a more complex problem: it is usually just a single step in the user's more complex background need. These background needs can often place a variety of constraints on which recommendations are interesting to the user and when they are appropriate. However, relatively little research has been done on these complex recommendation scenarios. The ComplexRec 2018 workshop addresses this by providing an interactive venue for discussing approaches to recommendation in complex scenarios that have no simple one-size-fits-all solution.

DLRS 2018: third workshop on deep learning for recommender systems

Deep learning is now an integral part of recommender systems, but the research is still in its early phase. New research topics pop up frequently and established topics are extended in new, interesting directions. DLRS 2018 is a venue for pioneering work in the intersection of deep learning and recommender systems research.

REVEAL 2018: offline evaluation for recommender systems

The inaugural REVEAL workshop focuses on revisiting the offline evaluation problem for recommender systems. Being able to perform offline experiments is key to rapid innovation; however, practitioners often observe significant differences between offline results and the outcome of an online experiment, where users are actually exposed to the resulting recommendations. This is unfortunate because online experiments take time, can be costly, and require access to a live recommender system, whereas offline experiments are inherently scalable. How can we bridge that gap between offline and online experiments?

2nd FATREC workshop: responsible recommendation

The second Workshop on Responsible Recommendation (FATREC 2018) was held in conjunction with the 12th ACM Conference on Recommender Systems on October 6th, 2018 in Vancouver, Canada. This full-day workshop brought together researchers and practitioners to discuss several topics under the banner of social responsibility in recommender systems: fairness, accountability, transparency, privacy, and other ethical and social concerns.

Third international workshop on health recommender systems (healthrecsys 2018)

The 3rd International Workshop on Health Recommender Systems was held in conjunction with the 2018 ACM Conference on Recommender Systems in Vancouver, Canada. Following the two prior workshops in 2016 [4] and 2017 [2], the focus of this workshop is to deepen the discussion on health promotion, health care as well as health related methods. This workshop also aims to strengthen the HealthRecSys community, to engage representatives of other health domains into cross-domain collaborations, and to exchange and share infrastructure.

Recsys'18 joint workshop on interfaces and human decision making for recommender systems

As interactive intelligent systems, recommender systems are developed to give recommendations that match users' preferences. Since the emergence of recommender systems, a large majority of research has focused on objective accuracy criteria, and less attention has been paid to how users interact with the system and the efficacy of interface designs from users' perspectives. The field has reached a point where it is ready to look beyond algorithms, into users' interactions, decision making processes, and overall experience. This workshop will focus on the "human side" of recommender systems research. The workshop goal is to improve users' overall experience with recommender systems by integrating different theories of human decision making into the construction of recommender systems and exploring better interfaces for recommender systems.

Knowledge-aware and conversational recommender systems

Increasingly precise and powerful recommendation algorithms and techniques have been proposed over the last years that are able to effectively assess users' tastes and predict information that would probably be of interest to them. Most of these approaches rely on the collaborative paradigm (often exploiting machine learning techniques) and do not take into account the huge amount of knowledge, both structured and unstructured, describing the domain of interest for the recommendation engine. The aim of knowledge-aware and conversational recommender systems is to go beyond the traditional accuracy goal and to start a new generation of algorithms and interactive approaches which exploit the knowledge encoded in ontological and logic-based knowledge bases and knowledge graphs, as well as the semantics emerging from the analysis and exploitation of semi-structured textual sources.

The 2nd workshop on intelligent recommender systems by knowledge transfer & learning (recsysKTL)

Given data from multiple sources, cross-domain and context-aware recommender systems aim, with the help of transfer learning approaches, to integrate such data to improve recommendation quality and alleviate issues such as the cold-start problem. Motivated by the advantages of these techniques, we host the second international workshop on intelligent recommender systems by knowledge transfer and learning (RecSysKTL) to provide a forum for both academia and industry researchers as well as application developers from around the world to present their work and discuss exciting research ideas or outcomes. The workshop is held in conjunction with the ACM Conference on Recommender Systems 2018 on October 6th in Vancouver, Canada.

ACM recsys workshop on recommenders in tourism (rectour 2018)

The Workshop on Recommenders in Tourism (RecTour) 2018, which is held in conjunction with the 12th ACM Conference on Recommender Systems (RecSys), addresses specific challenges for recommender systems within the tourism domain. In this paper, we summarize our motivations to organize this workshop and give an overview of the submissions that we received. The topics of this year's workshop include points-of-interest (POI), hotel and airline recommendations, recommending composite items such as POI sequences, group recommender systems, context-aware recommendation, decision making, user interaction issues, explanations and evaluation of tourism recommenders.

Recsys challenge 2018: automatic music playlist continuation

The ACM Recommender Systems Challenge 2018 focused on automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length, the challenge was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the Challenge, Spotify released a dataset of one million user-created playlists, along with associated metadata. Participants could submit their approaches in two tracks, i.e., main and creative tracks, where the former allowed teams to use solely the provided dataset and the latter allowed them to exploit publicly available external data too. In total, 113 teams submitted 1,228 runs in the main track; 33 teams submitted 239 runs in the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, the best team achieved an R-precision of 0.2233, an NDCG of 0.3939, and an average number of recommended songs clicks of 1.785.
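To make two of the reported metrics concrete, the sketch below computes track-level R-precision and NDCG for one playlist. These are simplified textbook formulations with binary relevance and a log2(rank + 1) discount; the official challenge definitions may differ in details, and the function names are hypothetical.

```python
import math

def r_precision(recommended, held_out):
    """Share of held-out tracks found in the top-|held_out| recommendations."""
    if not held_out:
        raise ValueError("held_out must not be empty")
    top = recommended[: len(held_out)]
    return len(set(top) & set(held_out)) / len(held_out)

def ndcg(recommended, held_out):
    """Binary-relevance NDCG with a log2(rank + 1) discount (assumed form)."""
    if not held_out:
        raise ValueError("held_out must not be empty")
    relevant = set(held_out)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, track in enumerate(recommended, start=1)
        if track in relevant
    )
    # Ideal ranking: all held-out tracks placed at the top of the list.
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

R-precision ignores the order within the top positions, while NDCG rewards placing the held-out tracks earlier in the list, which is why the two are reported together.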

ACM recsys'18 late-breaking results (posters)

The ACM RecSys'18 Late-Breaking Results track (previously known as the Poster track) is part of the main program of the 2018 ACM Conference on Recommender Systems in Vancouver, Canada. The track attracted 48 submissions this year, of which 18 papers were accepted, resulting in an acceptance rate of 37.5%.

TUTORIAL SESSION: Tutorials

Concept to code: learning distributed representation of heterogeneous sources for recommendation

Recommender Systems fuel e-commerce. Deep Learning techniques have started to make an impact in building recommenders. Many techniques have been proposed recently to create low dimensional embeddings of heterogeneous sources including users, items, text and images that can capture the semantic relationships between them. Such combined embeddings play a very important role in the effectiveness of a Recommender System.

Restricted Boltzmann machines were the first to use deep learning successfully for collaborative filtering. We provide a conceptual understanding of the fundamental Deep Learning architectures including MultiLayer Perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) which are used in recommendation systems.

Modularizing deep neural network-inspired recommendation algorithms

This tutorial reviews recent developments in deep neural network-based recommendation algorithms and demonstrates how to extend and adapt such algorithms for diverse application scenarios. The customization is supported by the OpenRec framework, which modularizes neural recommenders. The tutorial consists of a lecture and two hands-on sessions. It targets intermediate and advanced audiences who already possess knowledge of deep neural networks and are interested in applying that knowledge to the domain of recommendation. Materials are available at: http://openrec.ai/

Emotions and personality in recommender systems: tutorial

This tutorial addresses the acquisition of emotions and personality for recommender systems. It is composed of two parts: (i) a short theoretical overview of emotions and personality and (ii) a hands-on part, in which we will learn how to build an end-to-end system for acquiring personality and emotions for recommender systems.

Multimedia recommender systems

This tutorial introduces multimedia recommender systems (MMRS), in particular, recommender systems that leverage multimedia content to recommend different media types. In contrast to the still most frequently adopted collaborative filtering approaches, we focus on content-based MMRS and on hybrids of collaborative filtering and content-based filtering. The target recommendation domains of the tutorial are movies, music and images. We present state-of-the-art approaches for multimedia feature extraction (text, audio, visual), including deep learning methods, and recommendation approaches tailored to the multimedia domain. Furthermore, by introducing common evaluation techniques, pointing to publicly available datasets specific to the multimedia domain, and discussing the grand challenges in MMRS research, this tutorial provides the audience with a profound introduction to MMRS and an inspiration to conduct further research.

Sequence-aware recommendation

In recent years, more and more recommendation algorithms have been proposed that are based on time-ordered user interaction logs. Algorithms for session-based recommendation tasks are among the most prominent examples of such approaches.

Unlike the more traditional matrix completion algorithms, where only one interaction (e.g., a rating) is considered for each user-item pair, sequence-aware algorithms are typically designed to learn sequential patterns from user behavior data. These patterns can then be used to predict the user's next action within an ongoing session or to detect short-term trends in the community.

In this tutorial, we first outline the application areas of sequence-aware recommendation. We then focus on sequential and session-based recommendation techniques and discuss algorithmic proposals as well as evaluation challenges. Finally, the tutorial concludes with a hands-on session.
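As a rough illustration of predicting the next action within an ongoing session, the following minimal sketch counts item-to-item transitions in time-ordered sessions and ranks candidates by how often they followed the last item seen. It is a simple first-order baseline chosen for brevity, not a technique taken from the tutorial; the function names and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each item is immediately followed by another item."""
    transitions = defaultdict(Counter)
    for session in sessions:
        # Pair every item with its direct successor in the same session.
        for current_item, next_item in zip(session, session[1:]):
            transitions[current_item][next_item] += 1
    return transitions


def predict_next(transitions, ongoing_session, k=3):
    """Rank up to k candidate next items using only the last item seen."""
    if not ongoing_session:
        return []
    last_item = ongoing_session[-1]
    # .get avoids creating an empty entry for items never seen in training.
    followers = transitions.get(last_item, Counter())
    return [item for item, _ in followers.most_common(k)]


# Hypothetical toy interaction logs: each list is one time-ordered session.
sessions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["b", "c"],
    ["x", "a", "b", "c"],
]
model = fit_transitions(sessions)
top_two = predict_next(model, ["x", "a", "b"], k=2)
```

Because the sketch conditions only on the most recent item, it ignores longer-range patterns within a session; it is meant only to show how time-ordered logs, rather than a single value per user-item pair, drive the prediction.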

Mixed methods for evaluating user satisfaction

Evaluation is a fundamental part of a recommendation system. Evaluation typically takes one of three forms: (1) smaller lab studies with real users; (2) batch tests with offline collections, judgements, and measures; (3) large-scale controlled experiments (e.g. A/B tests) looking at implicit feedback. But it is rare for the first to inform and influence the latter two; in particular, implicit feedback metrics often have to be continuously revised and updated as assumptions are found to be poorly supported.

Mixed methods research enables practitioners to develop robust evaluation metrics by combining the strengths of both qualitative and quantitative approaches. In this tutorial, we will show how qualitative research on user behavior provides insight into the relationship between implicit signals and satisfaction. These insights can inform and augment quantitative modeling and analysis for online and offline metrics and evaluation.

SESSION: Doctoral symposium

Testing a recommender system for self-actualization

Traditionally, recommender systems are built with the goal of aiding users' decision-making process by extrapolating from what they like and what they have done to predict what they want next. However, in attempting to personalize the suggestions to users' preferences, these systems create an isolated universe of information for each user, which may limit their perspectives and promote complacency. In this paper, we describe our research plan to test a novel approach to recommender systems that goes beyond "good recommendations" to support user aspirations and exploration.

Comparing recommender systems using synthetic data

In this work, we propose SynRec, a data protection framework that uses data synthesis. The goal is to protect sensitive information in the user-item matrix by replacing the original values with synthetic values or, alternatively, completely synthesizing new users. The synthetic data must fulfill two requirements. First, it must no longer be possible to derive certain sensitive information from the data, and, second, it must remain possible to use the synthetic data for comparing recommender systems. SynRec is a step towards making it possible for companies to release recommender system data to the research community for the development of new algorithms, for example, in the context of recommender system challenges. We report the results of preliminary experiments, which provide a proof-of-concept, and also describe the future research directions, i.e., the challenges that must be addressed in order to make the framework useful in practice.

Towards the next generation of multi-criteria recommender systems

This paper presents the motivation, concepts, ideas and research questions underlying a PhD research project in the domain of recommender systems, and more specifically on multi-criteria recommendation. While we build on the existing work in this direction, we aim at introducing recommendation frameworks that not only optimize for different criteria simultaneously, but also exploit their interrelations. To this end, we will address three multi-criteria recommendation challenges, namely multi-modal user and item modeling, package recommendation, and user-centric recommendation. To realize these frameworks, and in particular to learn interactions and interrelations in the criteria space, we will rely on state-of-the-art deep learning systems, in particular Generative Adversarial Networks (GANs). In addition, a novel evaluation strategy for multi-criteria recommendation that targets the maximization of the user's satisfaction will also be devised.

SeRenA: a semantic recommender for all

The growth of data available on the Web, especially through social networks and business transactions, has served as a driving force for the development of recommender systems. Although there are many techniques in the literature, these systems suffer from some problems, including the well-known cold start problem. This problem concerns recommendations for new items or new users when there is no initial knowledge base. In this study we propose a solution to this and other problems based on the use of semantics. We present SeRenA (Semantic Recommender for All), an unsupervised recommendation strategy based on extracting initial interests from online data (e.g., posted messages and friendships) and mapping them onto a number of Wikipedia documents. We introduce the methods and techniques we plan to apply to discover new items over an ambiguous knowledge base.

Using textual summaries to describe a set of products

When customers are faced with the task of making a purchase in an unfamiliar product domain, it might be useful to provide them with an overview of the product set to help them understand what they can expect. In this paper we present and evaluate a method to summarise sets of products in natural language, focusing on the price range, common product features across the set, and product features that impact on price. In our study, participants reported that they found our summaries useful, but we found no evidence that the summaries influenced the selections made by participants.

Video recommendation using crowdsourced time-sync comments

Most existing work on video recommendation focuses on recommending a video as a whole, largely due to the unavailability of semantic information at the video shot level. Recently a new type of video comment has emerged, called time-sync comments, which are posted by users during playback of a video, so that each comment has a timestamp relative to the video playtime. In the present paper, we propose to utilize time-sync comments for three research tasks that were infeasible or difficult to tackle in the past, namely (1) video clustering based on temporal user emotional/topical trajectories inside a video; (2) unsupervised recommendation of video highlight shots; (3) personalized video shot recommendation tailored to user moods. We analyze characteristics of time-sync comments, and propose feasible solutions for each research task. For task (1), we propose a deep recurrent auto-encoder framework coupled with dictionary learning to model user emotional/topical trajectories in a video. For task (2), we propose a scoring method based on emotional/topical concentration in time-sync comments for candidate highlight shot ranking. For task (3), we propose a joint deep collaborative filtering network that optimizes ranking loss and classification loss simultaneously. Evaluation methods and preliminary experimental results are also reported. We plan to further refine our models for tasks (1) and (3) as our next step.

Beyond the top-N: algorithms that generate recommendations for self-actualization

Recommender systems traditionally provide users with recommendations that match their preferences, which creates a personalized user experience and increases users' satisfaction. However, recommendations from traditional systems may sometimes be considered too personalized, which isolates users from a diversity of perspectives, content, and experiences, and thus makes them less likely to discover new things. To overcome this drawback, we argue that recommenders should more actively keep the user "in-the-loop" by providing alternative recommendation lists that go beyond the traditional Top-N list. Such Recommender Systems for Self-Actualization follow a more holistic human-centered personalization practice by supporting users in developing, exploring and understanding their unique tastes and preferences. In this paper, we discuss a series of algorithms that generate four new recommendation lists. These lists enable the recommender to gain a more holistic view of the user and also allow the user to learn more about themselves.

CHAMELEON: a deep learning meta-architecture for news recommender systems

News recommender systems aim to personalize users' experiences and help them discover relevant articles from a large and dynamic search space. The news domain is, however, a challenging scenario for recommendation, due to its sparse user profiling, fast-growing number of items, accelerated decay of item value, and dynamic shifts in user preferences.

Deep Learning (DL) has achieved great success in complex domains, such as computer vision, Natural Language Processing (NLP), machine translation, speech recognition, and reinforcement learning. However, it has become a mainstream approach in Recommender Systems research only since 2016.

The main objective of this research is the investigation, design, implementation and evaluation of a Meta-Architecture for personalized news recommendations using deep neural networks.

As information about users' past interactions is scarce in such a cold-start scenario, user context and session information are explicitly modeled, as well as past user sessions, when available. Users' past behaviors and item features are both considered in a hybrid session-aware recommendation approach. The recommendation task addressed in this work is next-item prediction for user sessions: "what is the next most likely article a user might read in a session?"

This paper presents the research methodology for this Doctoral research, the proposed Meta-Architecture and some preliminary results, as well as the next research challenges.