RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems

SESSION: Invited keynotes

Rude awakenings from behaviourist dreams. Methodological integrity and the GDPR

Recommendations are meant to increase sales or ad revenue, since this is the first priority of those who pay for them. As recommender systems match their recommendations with inferred preferences, we should not be surprised if the algorithms optimise for lucrative preferences and thus co-produce the very preferences they mine. In this talk I will explain how the GDPR will help to break through this vicious circle by constraining how people may be targeted.

Whose data traces, whose voices? Inequality in online participation and why it matters for recommendation systems research

As research relies on data traces about people's online behavior, it is important to take a step back and ask: who uses the systems where these traces appear? This talk will discuss online participation from a digital-inequality perspective showing how differences in online behavior vary by socio-demographic characteristics as well as people's Internet skills. The presentation breaks down the various steps necessary for engagement - the pipeline of online participation - and shows that different factors explain different parts of the pipeline with skills mattering at all stages. Drawing on several data sets, the talk explores whose traces are most likely to show up on various systems and what this means for potential biases in what researchers draw from analyzing digital trace data.

SESSION: Ranking and deep learning in recommenders

Personalized re-ranking for recommendation

Ranking is a core task in recommender systems, which aims at providing an ordered list of items to users. Typically, a ranking function is learned from a labeled dataset to optimize global performance, producing a ranking score for each individual item. However, this may be sub-optimal because the scoring function applies to each item individually and does not explicitly consider the mutual influence between items or the differences in users' preferences and intents. Therefore, we propose a personalized re-ranking model for recommender systems. The proposed re-ranking model can be easily deployed as a follow-up module after any ranking algorithm, by directly using the existing ranking feature vectors. It directly optimizes the whole recommendation list by employing a transformer structure to efficiently encode the information of all items in the list. Specifically, the transformer applies a self-attention mechanism that directly models the global relationships between any pair of items in the whole list. We confirm that performance can be further improved by introducing pre-trained embeddings to learn personalized encoding functions for different users. Experimental results on both offline benchmarks and a real-world online e-commerce system demonstrate significant improvements from the proposed re-ranking model.
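
As a rough illustration of the core mechanism, the following minimal numpy sketch re-scores a candidate list with one scaled dot-product self-attention layer over the items' ranking feature vectors. The projection matrices, scoring head and dimensions are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_rerank(item_feats, Wq, Wk, Wv, w_out):
    """Re-score a candidate list with one self-attention layer.

    item_feats: (n_items, d) ranking feature vectors from the base ranker.
    Wq, Wk, Wv: (d, d) projection matrices; w_out: (d,) scoring head.
    Returns the indices of the list re-ordered by the list-aware scores.
    """
    Q, K, V = item_feats @ Wq, item_feats @ Wk, item_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n, n): pairwise item influence
    ctx = attn @ V                                 # each item attends to the whole list
    scores = ctx @ w_out                           # list-aware score per item
    return np.argsort(-scores)

rng = np.random.default_rng(0)
d, n = 8, 5
feats = rng.normal(size=(n, d))
order = self_attention_rerank(feats, *(rng.normal(size=(d, d)) for _ in range(3)),
                              rng.normal(size=d))
print(order)
```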

Online ranking combination

As a task of high importance for recommender systems, we consider the problem of learning a convex combination of ranking algorithms by online machine learning. In the case of two base rankers, we show that an exponentially weighted combination achieves near-optimal performance. However, the number of points that must be evaluated may be prohibitive with more base models in a real application. We propose a gradient-based stochastic optimization algorithm that uses finite differences. Our new algorithm achieves similar empirical performance for two base rankers, while scaling well with an increased number of models. In experiments with five real-world recommendation datasets, we show that the combination offers significant improvement over previously known stochastic optimization techniques. Our algorithm is the first effective stochastic optimization method for combining ranked recommendation lists by online machine learning.
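
A minimal sketch of an exponentially weighted scheme for two base rankers, assuming full-information feedback (the observed utility, e.g. NDCG, of each candidate mixing weight per round); the weight grid, the reward definition and the deterministic pick of the current favourite are assumptions, not the paper's exact protocol.

```python
import numpy as np

def exp_weighted_blend(rewards, candidates, eta=0.5):
    """Online exponentially weighted choice of the mixing weight alpha.

    candidates: grid of mixing weights for score = a*r1 + (1-a)*r2.
    rewards: rewards[t][i] = observed utility of candidate i at step t.
    Returns the sequence of weights played at each step.
    """
    w = np.ones(len(candidates))
    chosen = []
    for r_t in rewards:
        p = w / w.sum()
        chosen.append(candidates[np.argmax(p)])  # play the current favourite
        w *= np.exp(eta * np.asarray(r_t))       # multiplicative update per candidate
    return chosen

alphas = np.linspace(0, 1, 11)  # candidate convex weights for two base rankers
fake_rewards = np.random.default_rng(1).uniform(size=(100, 11))
print(exp_weighted_blend(fake_rewards, alphas)[-1])
```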

A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation

Recommendation with multiple objectives is an important but difficult problem, where the inherent difficulty lies in possible conflicts between objectives. In this case, multi-objective optimization is expected to be Pareto efficient, where no single objective can be further improved without hurting the others. However, existing approaches to Pareto-efficient multi-objective recommendation still lack good theoretical guarantees.

In this paper, we propose a general framework for generating Pareto-efficient recommendations. Assuming that the objectives have formal differentiable formulations, we coordinate them with a weighted aggregation. We then propose a condition that theoretically ensures Pareto efficiency, along with a two-step Pareto-efficient optimization algorithm. The algorithm can also be easily adapted for Pareto frontier generation and fair recommendation selection. We specifically apply the proposed framework to e-commerce recommendation to optimize GMV and CTR simultaneously. Extensive online and offline experiments conducted on a real-world e-commerce recommender system validate the Pareto efficiency of the framework.

To the best of our knowledge, this work is among the first to provide a Pareto-efficient framework for multi-objective recommendation with theoretical guarantees. Moreover, the framework can be applied to any other objectives with differentiable formulations and to any model with gradients, which shows its strong scalability.
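
The abstract does not spell out the weighting condition, but for two differentiable objectives a standard way to pick aggregation weights so that the combined gradient is a common descent direction is the min-norm solution used in multiple-gradient descent; the closed form below is that generic construction under these assumptions, not necessarily the paper's exact algorithm.

```python
import numpy as np

def pareto_weights(g1, g2):
    """Min-norm convex combination of two objective gradients.

    Solves min_{a in [0,1]} ||a*g1 + (1-a)*g2||^2. Stepping against the
    resulting direction does not increase either objective; the direction
    is zero exactly at a Pareto-stationary point.
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return 0.5
    return float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))

g_gmv = np.array([0.5, -1.0])  # toy gradients of two objectives, e.g. GMV and CTR
g_ctr = np.array([-0.2, 0.8])
a = pareto_weights(g_gmv, g_ctr)
step = a * g_gmv + (1 - a) * g_ctr  # update is params -= lr * step
print(a, step)
```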

From preference into decision making: modeling user interactions in recommender systems

User-system interaction in recommender systems involves three aspects: temporal browsing (viewing recommendation lists and/or searching/filtering), action (performing actions on recommended items, e.g., clicking, consuming) and inaction (neglecting or skipping recommended items). Modern recommenders build machine learning models from recordings of such user interaction with the system, and in doing so they commonly make certain assumptions (e.g., pairwise preference orders, independent or competitive probabilistic choices, etc.). In this paper, we set out to study the effects of these assumptions along three dimensions in eight different single models and three associated hybrid models on a user browsing data set collected from a real-world recommender system application. We further design a novel model based on recurrent neural networks and multi-task learning, inspired by Decision Field Theory, a model of human decision making. We report on precision, recall, and MAP, finding that this new model outperforms the others.

Deep generative ranking for personalized recommendation

Recommender systems offer critical services in the age of mass information. Personalized ranking is attractive to both content providers and customers due to its ability to create a user-specific ranking of the item set. Although powerful factor-analysis methods, including latent factor models and deep neural network models, have achieved promising results, they still suffer from challenging issues such as the sparsity of recommendation data and the uncertainty of optimization. To enhance the accuracy and generalization of recommender systems, in this paper we propose a deep generative ranking (DGR) model under the Wasserstein autoencoder framework. Specifically, DGR simultaneously generates pointwise implicit feedback data (via a Beta-Bernoulli distribution) and creates the pairwise ranking list by sufficiently exploiting both interacted and non-interacted items for each user. DGR can be efficiently inferred by minimizing its penalized evidence lower bound. Meanwhile, we theoretically analyze the generalization error bounds of the DGR model to guarantee its performance on extremely sparse feedback data. A series of experiments on four large-scale datasets (MovieLens 20M, Netflix, Epinions and Yelp, covering the movie, product and business domains) have been conducted. By comparison with state-of-the-art methods, the experimental results demonstrate that DGR consistently benefits the ranking estimation task, especially for near-cold-start users (those with fewer than five interacted items).

Recommending what video to watch next: a multitask ranking system

In this paper, we introduce a large scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform. The system faces many real-world challenges, including the presence of multiple competing ranking objectives, as well as implicit selection biases in user feedback. To tackle these challenges, we explored a variety of soft-parameter sharing techniques such as Multi-gate Mixture-of-Experts so as to efficiently optimize for multiple ranking objectives. Additionally, we mitigated the selection biases by adopting a Wide & Deep framework. We demonstrated that our proposed techniques can lead to substantial improvements on recommendation quality on one of the world's largest video sharing platforms.
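
A minimal numpy sketch of a Multi-gate Mixture-of-Experts (MMoE) forward pass, the soft-parameter-sharing technique named in the abstract: each task has its own softmax gate that mixes a shared pool of experts. All sizes, activations and the two task heads are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mmoe_forward(x, expert_W, gate_W, head_W):
    """x: (batch, d_in); expert_W: (n_experts, d_in, d_hid);
    gate_W: (n_tasks, d_in, n_experts); head_W: (n_tasks, d_hid)."""
    experts = np.stack([np.tanh(x @ W) for W in expert_W], axis=1)  # (b, n_exp, d_hid)
    outputs = []
    for gW, hW in zip(gate_W, head_W):
        gates = softmax(x @ gW)                      # (b, n_exp): per-task expert mixture
        task_in = (gates[:, :, None] * experts).sum(axis=1)
        outputs.append(task_in @ hW)                 # scalar score per example
    return outputs                                   # one score vector per ranking objective

rng = np.random.default_rng(0)
b, d_in, d_hid, n_exp, n_tasks = 4, 16, 8, 3, 2
scores = mmoe_forward(rng.normal(size=(b, d_in)),
                      rng.normal(size=(n_exp, d_in, d_hid)) * 0.1,
                      rng.normal(size=(n_tasks, d_in, n_exp)) * 0.1,
                      rng.normal(size=(n_tasks, d_hid)) * 0.1)
print([s.shape for s in scores])
```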

SESSION: User side of recommender systems

Users in the loop: a psychologically-informed approach to similar item retrieval

Recommender systems (RS) often leverage information about the similarity between items' features to make recommendations. Yet, many commonly used similarity functions make mathematical assumptions such as symmetry (i.e., Sim(a, b) = Sim(b, a)) that are inconsistent with how humans make similarity judgments. Moreover, most algorithm validations either do not directly measure users' behavior or fail to comply with methodological standards for psychological research. RS that are developed and evaluated without regard to users' psychology may fail to meet users' needs. To provide recommendations that do meet the needs of users, we must: 1) develop similarity functions that account for known properties of human cognition, and 2) rigorously evaluate the performance of these functions using methodologically sound user testing. Here, we develop a framework for evaluating users' judgments of similarity that is informed by best practices in psychological research methods. Employing users' fashion item similarity judgments collected using our framework, we demonstrate that a psychologically-informed similarity function (i.e., Tversky contrast model) outperforms a psychologically-naive similarity function (i.e., Jaccard similarity) in predicting users' similarity judgments.
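
For concreteness, here is a small sketch contrasting the symmetric Jaccard similarity with Tversky's contrast model over item feature sets; the fashion features and the theta/alpha/beta salience parameters are illustrative assumptions.

```python
def jaccard(a, b):
    """Symmetric by construction: jaccard(a, b) == jaccard(b, a)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def tversky_contrast(a, b, theta=1.0, alpha=0.7, beta=0.3):
    """Tversky's contrast model: salience of the common features minus
    penalties for each side's distinctive features. Asymmetric whenever
    alpha != beta, mirroring human similarity judgments."""
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

dress = {"red", "floral", "midi", "v-neck"}
variant = {"red", "floral", "midi"}
print(jaccard(dress, variant))                 # same in both directions
print(tversky_contrast(dress, variant),        # direction matters here...
      tversky_contrast(variant, dress))        # ...the variant is "more similar" to the dress
```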

Explaining and exploring job recommendations: a user-driven approach for interacting with knowledge-based job recommender systems

The labor market, and the tasks of which jobs are composed, are continuously evolving. Job mobility is not straightforward, and providing effective recommendations in this context has been found to be particularly challenging. In this paper, we present Labor Market Explorer, an interactive dashboard that enables job seekers to explore the labor market in a personalized way based on their skills and competences. Through a user-centered design process involving job seekers and job mediators, we developed this dashboard to enable job seekers to explore job recommendations and their required competencies, as well as how these competencies map to their profile. Evaluation results indicate the dashboard empowers job seekers to explore, understand, and find relevant vacancies, mostly independent of their background and age.

Designing for the better by taking users into account: a qualitative evaluation of user control mechanisms in (news) recommender systems

Recommender systems (RS) are on the rise in many domains. While they offer great promise, they also raise concerns: lack of transparency, reduction of diversity, little to no user control. In this paper, we align with the normative turn in computer science, which scrutinizes the ethical and societal implications of RS. We focus and elaborate on the concept of user control because it mitigates multiple problems at once. Taking the news industry as our domain, we conducted four focus groups, or moderated think-aloud sessions, with Dutch news readers (N=21) to systematically study how people evaluate different control mechanisms (at the input, process, and output phases) in a News Recommender Prototype (NRP). While these mechanisms are sometimes met with distrust about the actual control they offer, we found that an intelligible user profile (including reading history and flexible preference settings), coupled with possibilities to influence the recommendation algorithms, is highly valued, especially when these control mechanisms can be operated in relation to achieving personal goals. By bringing (future) users' perspectives to the fore, this paper contributes to a richer understanding of why and how to design for user control in recommender systems.

Efficient privacy-preserving recommendations based on social graphs

Many recommender systems use association rule mining, a technique that captures relations between user interests and recommends new probable ones accordingly. Applying association rule mining raises privacy concerns, as user interests may contain sensitive personal information (e.g., political views). This potentially even inhibits users from providing information in the first place. Current distributed privacy-preserving association rule mining (PPARM) approaches use cryptographic primitives that come with high computational and communication costs, rendering PPARM unsuitable for large-scale applications such as social networks. We propose improvements in the efficiency and privacy of PPARM approaches by minimizing the required data. We propose and compare strategies to sample the data based on social graphs in a privacy-preserving manner. The results on real-world datasets show that our sampling-based approach can achieve a high average precision score with a sampling rate as low as 50%, and therefore with a 50% reduction in communication cost.

PrivateJobMatch: a privacy-oriented deferred multi-match recommender system for stable employment

Coordination failure reduces match quality among employers and candidates in the job market, resulting in a large number of unfilled positions and/or unstable, short-term employment. Centralized job search engines provide a platform that directly connects employers with job seekers. However, they require users to disclose a significant amount of personal data, i.e., build a user profile, in order to provide meaningful recommendations. In this paper, we present PrivateJobMatch, a privacy-oriented deferred multi-match recommender system, which generates stable pairings while requiring users to provide only a partial ranking of their preferences. PrivateJobMatch explores a series of adaptations of the game-theoretic Gale-Shapley deferred acceptance algorithm which combine the flexibility of decentralized markets with the intelligence of centralized matching. We identify the shortcomings of the original algorithm when applied to a job market and propose novel solutions that rely on machine learning techniques. Experimental results on real and synthetic data confirm the benefits of the proposed algorithms across several quality measures. Over the past year, we have implemented a PrivateJobMatch prototype and deployed it in an active job market economy. Using the gathered real-user preference data, we find that the match recommendations are superior to those of a typical decentralized job market, while requiring only a partial ranking of user preferences.
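
A textbook sketch of the Gale-Shapley deferred acceptance algorithm that PrivateJobMatch adapts, here in its one-to-one form with seekers proposing and partial rankings tolerated; all names and the dictionary-based interface are hypothetical.

```python
def deferred_acceptance(seeker_prefs, employer_prefs):
    """Gale-Shapley deferred acceptance, seekers proposing.

    seeker_prefs: {seeker: [employers in descending preference]} (may be partial).
    employer_prefs: {employer: [seekers in descending preference]}.
    Returns a stable matching {employer: seeker}.
    """
    rank = {e: {s: i for i, s in enumerate(p)} for e, p in employer_prefs.items()}
    free = list(seeker_prefs)
    nxt = {s: 0 for s in seeker_prefs}   # next employer each seeker will propose to
    match = {}                           # employer -> currently held seeker
    while free:
        s = free.pop()
        if nxt[s] >= len(seeker_prefs[s]):
            continue                     # seeker exhausted its (partial) ranking
        e = seeker_prefs[s][nxt[s]]
        nxt[s] += 1
        if s not in rank.get(e, {}):
            free.append(s)               # employer did not rank this seeker
            continue
        cur = match.get(e)
        if cur is None or rank[e][s] < rank[e][cur]:
            match[e] = s                 # employer defers: keep the better proposal
            if cur is not None:
                free.append(cur)
        else:
            free.append(s)
    return match

prefs_s = {"ann": ["acme", "beta"], "bob": ["acme"]}
prefs_e = {"acme": ["bob", "ann"], "beta": ["ann"]}
print(deferred_acceptance(prefs_s, prefs_e))  # {'acme': 'bob', 'beta': 'ann'}
```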

User-centered evaluation of strategies for recommending sequences of points of interest to groups

Most recommender systems (RSs) predict the preferences of individual users; however, in certain scenarios, recommendations need to be made for a group of users. Tourism is a popular domain for group recommendations because people often travel in groups and look for point of interest (POI) sequences for their visits during a trip. In this study, we present different strategies that can be used to recommend POI sequences for groups. In addition, we introduce novel approaches, including a strategy called Split Group, which allows groups to split into smaller groups during a trip. We compared all strategies in a user study with 40 real groups. Our results showed a significant difference in the quality of recommendations generated by the different strategies. Most groups were willing to split temporarily during a trip, even when traveling with people close to them. In this case, Split Group generated the best recommendations across different evaluation criteria. We use these findings to propose improvements for group recommendation strategies in the tourism domain.

SESSION: Deep learning for recommender systems

Are we really making much progress? A worrying analysis of recent neural recommendation approaches

Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of results or the choice of baselines when proposing new models.

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. However, it turned out that 6 of these 7 can often be outperformed by comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.
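
To make the "comparably simple heuristic" concrete, here is a sketch of an item-based nearest-neighbor (ItemKNN) top-n recommender of the kind such analyses use as a baseline; the cosine weighting and the dense-matrix implementation are simplifying assumptions.

```python
import numpy as np

def itemknn_topn(R, user, k=20, n=10):
    """Nearest-neighbor top-n baseline on a binary interaction matrix R (users x items).

    Scores unseen items by cosine similarity to the user's consumed items,
    keeping only each item's k nearest neighbors.
    """
    norms = np.linalg.norm(R, axis=0) + 1e-9
    sim = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)
    topk = np.argsort(-sim, axis=1)[:, :k]     # keep only k nearest neighbors per item
    mask = np.zeros_like(sim)
    np.put_along_axis(mask, topk, 1.0, axis=1)
    scores = R[user] @ (sim * mask)
    scores[R[user] > 0] = -np.inf              # never re-recommend seen items
    return np.argsort(-scores)[:n]

R = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1]], dtype=float)
print(itemknn_topn(R, user=0, k=2, n=2))
```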

A deep learning system for predicting size and fit in fashion e-commerce

Personalized size and fit recommendations are of crucial significance to any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing the costs incurred by size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from the extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size- and fit-relevant information from observed customer-article interactions. It further employs customer- and article-specific embedding variables to learn their properties. Together with the learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation.

Relaxed softmax for PU learning

In recent years, the softmax model and its fast approximations have become the de facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation, two fields that fall into the framework of learning from Positive and Unlabeled data.

In this paper, we examine the drawbacks of the current family of softmax losses and sampling schemes when applied in a Positive and Unlabeled learning setup. We propose both a Relaxed Softmax (RS) loss and a new negative sampling scheme based on a Boltzmann formulation. We show that the new training objective is better suited for the tasks of density estimation, item similarity and next-event prediction, driving performance uplifts over the classical softmax on textual and recommendation datasets.
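
A sketch of what a Boltzmann-formulated negative sampler could look like: negatives are drawn with probability proportional to exp(score/T). The interface and temperature are assumptions; the paper's exact sampling scheme and Relaxed Softmax loss may differ.

```python
import numpy as np

def boltzmann_negatives(scores, positives, n_neg=5, temperature=1.0, rng=None):
    """Sample negatives with probability proportional to exp(score / T).

    Hotter temperatures approach uniform sampling; colder ones concentrate on
    hard negatives the model currently ranks highly.
    """
    rng = rng or np.random.default_rng()
    logits = scores / temperature
    logits[positives] = -np.inf               # never sample observed positives
    p = np.exp(logits - logits[np.isfinite(logits)].max())
    p /= p.sum()
    return rng.choice(len(scores), size=n_neg, replace=False, p=p)

scores = np.array([2.0, 0.1, 1.5, -0.3, 0.9])
print(boltzmann_negatives(scores, positives=[0], n_neg=2, temperature=0.5,
                          rng=np.random.default_rng(7)))
```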

Style conditioned recommendations

We propose Style Conditioned Recommendations (SCR) and introduce style injection as a method to diversify recommendations. We use a Conditional Variational Autoencoder (CVAE) architecture, where both the encoder and decoder are conditioned on a user profile learned from item content data. This allows us to apply style transfer methodologies to the task of recommendation, which we refer to as injection. To enable style injection, user profiles are learned to be interpretable, such that they express users' propensities for specific predefined styles. These are learned via label propagation from a dataset of item content with limited labeled points. To perform injection, the condition on the encoder is learned while the condition on the decoder is selected based on explicit feedback. Explicit feedback can be taken either from a user's response to a style or interest quiz, or from item ratings. In the absence of explicit feedback, the condition at the encoder is applied to the decoder. We show a 12% improvement in NDCG@20 over the traditional VAE-based approach on the task of recommendation. We show an average 22% improvement in AUC across all classes for predicting user style profiles against our best performing baseline. After injecting styles, we compare the user style profile to the style of the recommendations and show that injected styles have an average +133% increase in presence. Our results show that style injection is a powerful method to diversify recommendations while maintaining personal relevance. Our main contribution is an application of a semi-supervised approach that extends item labels to interpretable user profiles.

Deep language-based critiquing for recommender systems

Critiquing is a method for conversational recommendation that adapts recommendations in response to user preference feedback regarding item attributes. Historical critiquing methods were largely based on constraint- and utility-based methods for modifying recommendations w.r.t. these critiqued attributes. In this paper, we revisit the critiquing approach from the lens of deep learning based recommendation methods and language-based interaction. Concretely, we propose an end-to-end deep learning framework with two variants that extend the Neural Collaborative Filtering architecture with explanation and critiquing components. These architectures not only predict personalized keyphrases for a user and item but also embed language-based feedback in the latent space that in turn modulates subsequent critiqued recommendations. We evaluate the proposed framework on two recommendation datasets containing user reviews. Empirical results show that our modified NCF approach not only provides a strong baseline recommender and high-quality personalized item keyphrase suggestions, but that it also properly suppresses items predicted to have a critiqued keyphrase. In summary, this paper provides a first step to unify deep recommendation and language-based feedback in what we hope to be a rich space for future research in deep critiquing for conversational recommendation.

Predictability limits in session-based next item recommendation

Session-based recommendations are based on the user's recent actions, for example, the items they have viewed during the current browsing session or the sightseeing places they have just visited. Closely related is sequence-aware recommendation, where the choice of the next item should follow from the sequence of previous actions.

We study seven benchmarks for session-based recommendation, covering retail, music and news domains to investigate how accurately user behavior can be predicted from the session histories. We measure the entropy rate of the data and estimate the limit of predictability to be between 44% and 73% in the included datasets.

We establish algorithm-specific limits on prediction accuracy for Markov chains, association rules and k-nearest neighbors methods. For most of the analyzed methods, the algorithm design limits their performance on sparse training data. Session-based k-nearest neighbors are the least restricted in comparison and have room for improvement across all of the analyzed datasets.
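
One standard recipe for such predictability limits (following Song et al.'s approach for human mobility) is to estimate the entropy rate with a Lempel-Ziv estimator and then solve Fano's inequality for the maximum predictability; the sketch below follows that recipe and is an assumption about the paper's exact procedure.

```python
import math

def lz_entropy_rate(seq):
    """Lempel-Ziv estimator of the entropy rate (bits per event).

    Uses a quadratic scan; fine for short sessions, slow for long ones.
    """
    n = len(seq)
    lambdas = []
    for i in range(n):
        l = 1
        # shortest substring starting at i that never appeared in seq[:i]
        while i + l <= n and _contains(seq[:i], seq[i:i + l]):
            l += 1
        lambdas.append(l)
    return n * math.log2(n) / sum(lambdas)

def _contains(hay, needle):
    k = len(needle)
    return any(hay[j:j + k] == needle for j in range(len(hay) - k + 1))

def max_predictability(S, N):
    """Solve Fano's inequality H(p) + (1-p)*log2(N-1) = S for the limit p."""
    def f(p):
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(N - 1) - S
    lo, hi = 1.0 / N + 1e-9, 1 - 1e-9   # f decreases from ~log2(N)-S to -S
    for _ in range(80):                 # bisection
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

session = ["home", "itemA", "itemB", "itemA", "itemB", "itemC", "itemA", "itemB"]
S = lz_entropy_rate(session)
print(S, max_predictability(S, N=len(set(session))))
```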

SESSION: Recommendation in advertising, promotions, intent and search

A comparison of calibrated and intent-aware recommendations

Calibrated and intent-aware recommendation are recent approaches to recommendation that have apparent similarities. Both try, to a certain extent, to cover the user's interests, as revealed by her user profile. In this paper, we compare them in detail. On two datasets, we show the extent to which intent-aware recommendations are calibrated and the extent to which calibrated recommendations are diverse. We consider two ways of defining a user's interests, one based on item features, the other based on subprofiles of the user's profile. We find that defining interests in terms of subprofiles results in the highest precision and the best relevance/diversity trade-off. Along the way, we define a new version of calibrated recommendation and three new evaluation metrics.
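
For reference, a common way to quantify how calibrated a recommendation list is (in the sense of Steck's calibrated recommendations) is the KL divergence between the interest distribution of the user's profile and that of the list, with the list distribution smoothed so the divergence stays finite; the genre distributions below are illustrative.

```python
import math

def calibration_kl(history_dist, rec_dist, alpha=0.01):
    """Steck-style calibration metric: KL(p || q~) between the genre
    distribution of the user's profile (p) and of the recommendation list (q),
    with q smoothed towards p. Lower means better calibrated."""
    kl = 0.0
    for g, p in history_dist.items():
        q = (1 - alpha) * rec_dist.get(g, 0.0) + alpha * p
        if p > 0:
            kl += p * math.log2(p / q)
    return kl

p = {"drama": 0.6, "comedy": 0.3, "horror": 0.1}  # interests in the user profile
q = {"drama": 0.9, "comedy": 0.1}                 # genres of the recommended list
print(calibration_kl(p, q))
```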

LORE: a large-scale offer recommendation engine with eligibility and capacity constraints

Businesses such as Amazon, department store chains, home furnishing store chains, Uber, and Lyft frequently offer deals, product discounts and incentives to drive sales, increase new product acceptance and engage with users. In order to appeal to diverse user groups, these businesses typically design more than one promotional offer but market different ones to different users. For instance, Uber offers a percentage discount on rides to some users and a low fixed price to others. In this paper, we propose solutions to optimally recommend promotions and items that maximize user conversion, constrained simultaneously by user eligibility and item or offer capacity (limited quantity of items or offers). We achieve this through an offer recommendation model based on min-cost flow network optimization, which enables us to satisfy the constraints within the optimization itself and solve the problem in polynomial time. We present two approaches that can be used in various settings: a single-period solution and sequential time-period offering. We evaluate these approaches against competing methods using counterfactual evaluation in offline mode. We also discuss three practical aspects that may affect the online performance of constrained optimization: capacity determination, traffic arrival patterns, and clustering for large-scale settings.
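
A toy sketch of the single-period assignment as a min-cost flow, using networkx: users and offers become nodes, eligibility becomes edges whose costs are negated (integer-scaled) conversion probabilities, and offer capacities cap the flow. The data layout, the scaling and the bypass edge for feasibility are assumptions.

```python
import networkx as nx

def assign_offers(conv_prob, eligibility, offer_capacity, scale=1000):
    """One-period offer assignment as a min-cost flow.

    conv_prob[(u, o)]: predicted conversion probability; eligibility[u]: offers
    user u may receive; offer_capacity[o]: limited quantity. Costs are negated,
    integer-scaled probabilities, so minimum cost == maximum expected conversions.
    """
    G = nx.DiGraph()
    users = list(eligibility)
    total = min(len(users), sum(offer_capacity.values()))
    G.add_node("s", demand=-total)
    G.add_node("t", demand=total)
    G.add_edge("s", "t", capacity=total, weight=0)     # bypass keeps the flow feasible
    for u in users:
        G.add_edge("s", u, capacity=1, weight=0)       # at most one offer per user
        for o in eligibility[u]:
            G.add_edge(u, o, capacity=1,
                       weight=-int(scale * conv_prob[(u, o)]))
    for o, cap in offer_capacity.items():
        G.add_edge(o, "t", capacity=cap, weight=0)
    flow = nx.min_cost_flow(G)
    return {u: o for u in users for o, f in flow[u].items() if f > 0}

probs = {("u1", "pct_off"): 0.30, ("u1", "flat"): 0.20, ("u2", "pct_off"): 0.25}
elig = {"u1": ["pct_off", "flat"], "u2": ["pct_off"]}
caps = {"pct_off": 1, "flat": 1}
print(assign_offers(probs, elig, caps))  # capacity forces u1 onto the flat offer
```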

FiBiNET: combining feature importance and bilinear feature interaction for click-through rate prediction

Advertising and feed ranking are essential to many Internet companies such as Facebook and Sina Weibo. Among many real-world advertising and feed ranking systems, click-through rate (CTR) prediction plays a central role. Many models have been proposed in this field, such as logistic regression, tree-based models, factorization machine based models and deep learning based CTR models. However, many current approaches compute feature interactions in simple ways, such as the Hadamard product or inner product, and pay little attention to the importance of features. In this paper, we propose a new model named FiBiNET, an abbreviation for Feature Importance and Bilinear feature Interaction NETwork, to dynamically learn feature importance and fine-grained feature interactions. On the one hand, FiBiNET dynamically learns the importance of features via the Squeeze-and-Excitation network (SENET) mechanism; on the other hand, it effectively learns feature interactions via bilinear functions. We conduct extensive experiments on two real-world datasets and show that our shallow model outperforms other shallow models such as the factorization machine (FM) and the field-aware factorization machine (FFM). To improve performance further, we combine a classical deep neural network (DNN) component with the shallow model to form a deep model. The deep FiBiNET consistently outperforms other state-of-the-art deep models such as DeepFM and the extreme deep factorization machine (xDeepFM).
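
A minimal numpy sketch of the SENET-style re-weighting step the abstract describes: squeeze each feature field's embedding to a summary scalar, pass the field summaries through a small two-layer excitation network, and rescale each field by its learned importance. The pooling choice, activations and sizes are assumptions.

```python
import numpy as np

def senet_reweight(E, W1, W2):
    """Squeeze-and-Excitation over feature-field embeddings.

    E: (n_fields, d) one embedding per feature field.
    W1: (n_fields, r) bottleneck layer; W2: (r, n_fields) expansion layer.
    """
    z = E.mean(axis=1)             # squeeze: one summary scalar per field
    a = np.maximum(z @ W1, 0.0)    # excitation, bottleneck layer (ReLU)
    s = np.maximum(a @ W2, 0.0)    # per-field importance weights
    return E * s[:, None]          # re-weight fields before the interaction layer

rng = np.random.default_rng(0)
n_fields, d, r = 6, 8, 2           # r: bottleneck reduction size
E = rng.normal(size=(n_fields, d))
out = senet_reweight(E, rng.normal(size=(n_fields, r)),
                     rng.normal(size=(r, n_fields)))
print(out.shape)
```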

Domain adaptation in display advertising: an application for partner cold-start

Digital advertising connects partners (sellers) with potentially interested online users. Within the digital advertising domain, there are multiple platforms, e.g., user re-targeting and prospecting. Partners usually start with re-targeting campaigns and later employ prospecting campaigns to reach an untapped customer base. There are two major challenges involved in prospecting. The first is the successful on-boarding of a new partner on the prospecting platform, referred to as the partner cold-start problem. The second revolves around the ability to leverage large amounts of re-targeting data for the partner cold-start problem.

In this work, we study domain adaptation for the partner cold-start problem. To this end, we propose two domain adaptation techniques, SDA-DANN and SDA-Ranking, which extend domain adaptation for partner cold-start by incorporating sub-domain similarities (product category level information). Through rigorous experiments, we demonstrate that SDA-DANN outperforms baseline domain adaptation techniques on a real-world dataset obtained from a major online advertiser. Furthermore, we show that SDA-Ranking outperforms baseline methods for low-CTR partners.

Addressing delayed feedback for continuous training with neural networks in CTR prediction

One of the challenges in display advertising is that the distribution of features and click-through rate (CTR) can exhibit large shifts over time due to seasonality, changes to ad campaigns and other factors. The predominant strategy to keep up with these shifts is to train predictive models continuously, on fresh data, in order to prevent them from becoming stale. However, in many ad systems positive labels are only observed after a possibly long and random delay. These delayed labels pose a challenge to data freshness in continuous training: fresh data may not have complete label information at the time it is ingested by the training algorithm. Naive strategies that consider any data point a negative example until a positive label becomes available tend to underestimate CTR, resulting in an inferior user experience and suboptimal performance for advertisers. The focus of this paper is to identify the best combination of loss functions and models that enable large-scale learning from a continuous stream of data in the presence of delayed labels. We compare 5 different loss functions, 3 of them applied to this problem for the first time. We benchmark their performance in offline settings on both public and proprietary datasets, in conjunction with shallow and deep model architectures. We also discuss the engineering cost associated with implementing each loss function in a production environment. Finally, we carried out online experiments with the top-performing methods in order to validate their performance in a continuous training scheme. When trained offline on 668 million in-house data points, our proposed methods outperform the previous state-of-the-art by 3% relative cross entropy (RCE). During online experiments, we observed a 55% gain in revenue per thousand requests (RPMq) against naive log loss.

Ghosting: contextualized inline query completion in large scale retail search

Query auto-completion presents a ranked list of queries as suggestions for a user-entered prefix. Ghosting is the process of auto-completing a search recommendation by highlighting the suggested text inline within the search box. We propose the use of a behavior-based recommendation model along with customer search context to ghost on high-confidence queries. We tested ghosting on a retail production system, on over 140 million search sessions. We found that session-context based ghosting significantly increased the acceptance of offered suggestions by 6.18%, reduced misspellings among searches by 4.42%, and improved net sales by 0.14%.

SESSION: Application of recommenders in personal needs

Collective embedding for neural context-aware recommender systems

Context-aware recommender systems consider contextual features as additional information to predict users' preferences. For example, recommendations could be based on time, location, or the company of other people. Among contextual information, time is an important feature because user preferences tend to change over time or to be similar in the near future. Researchers have proposed different models to incorporate time into their recommender systems; however, current models are unable to capture specific temporal patterns. To address this limitation, we propose Collective embedding for Neural Context-Aware Recommender Systems (CoNCARS). The proposed solution jointly models item, user and time embeddings to capture temporal patterns. CoNCARS then uses the outer product to model user-item-time correlations between dimensions of the embedding space. The hidden features feed convolutional neural networks (CNNs) to learn the non-linearities between the different features. Finally, we combine the CNN outputs in a fusion layer and predict the user's preference score. We conduct extensive experiments on real-world datasets, demonstrating that CoNCARS improves the top-N item recommendation task and outperforms state-of-the-art recommendation methods.

A recommender system for heterogeneous and time sensitive environment

The digital game industry has recently adopted recommender systems to deliver the most relevant content and suggest the most suitable activities to players. Because of diverse game designs and dynamic experiences, recommender systems typically operate in highly heterogeneous and time-sensitive environments. In this paper, we describe a recommender system at a digital game company which aims to provide recommendations for a large variety of use cases while being easy to integrate and operate. The system leverages a unified data platform, standardized context and tracking data pipelines, robust naive linear contextual multi-armed bandit algorithms, and an experimentation platform for extensibility as well as flexibility. Several games and applications have successfully launched with the recommender system and have achieved significant improvements.

Latent factor models and aggregation operators for collaborative filtering in reciprocal recommender systems

Online dating platforms help to connect people who might potentially be a good match for each other. They have exerted a significant societal impact over the last decade, such that about one third of new relationships in the US are now started online. Recommender systems are widely utilized in online platforms that connect people to people, e.g., online dating and recruitment sites. These recommender approaches are fundamentally different from traditional user-item approaches (such as those operating on movie and shopping sites) in that they must consider the interests of both parties jointly. Latent factor models have been notably successful in user-item recommendation, but they have not yet been investigated in user-to-user domains. In this study, we present a novel method for reciprocal recommendation using latent factor models. We also provide a first analysis of different preference aggregation strategies, demonstrating that the aggregation function used to combine user preference scores has a significant impact on the outcome of the recommender system. Our evaluation results report significant improvements over previous nearest-neighbour and content-based methods for reciprocal recommendation, and show that the latent factor model can be used effectively on much larger datasets than previous state-of-the-art reciprocal recommender systems.
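
A tiny sketch of why the aggregation function matters in reciprocal recommendation: given the two directional preference scores for a candidate pair, a harmonic mean penalises one-sided matches far more than an arithmetic mean. The two strategies shown are generic illustrations; the paper's compared strategies may differ.

```python
def reciprocal_score(p_u_likes_v, p_v_likes_u, strategy="harmonic"):
    """Aggregate the two directional preference scores of a candidate pair."""
    if strategy == "harmonic":
        s = p_u_likes_v + p_v_likes_u
        return 2 * p_u_likes_v * p_v_likes_u / s if s else 0.0
    return (p_u_likes_v + p_v_likes_u) / 2   # arithmetic aggregation

# One-sided interest scores 0.18 harmonically but 0.5 arithmetically:
print(reciprocal_score(0.9, 0.1), reciprocal_score(0.9, 0.1, "arithmetic"))
```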

CB2CF: a neural multiview content-to-collaborative filtering model for completely cold item recommendations

In recommender systems research, algorithms are often characterized as either collaborative filtering (CF) or content based (CB). CF algorithms are trained on a dataset of user preferences, while CB algorithms are typically based on item profiles. These approaches harness different data sources, and the resulting recommended items are therefore generally very different. This paper presents CB2CF, a deep neural multiview model that serves as a bridge from item content to CF representations. CB2CF is a "real-world" algorithm designed for Microsoft Store services, which handle around a billion users worldwide. CB2CF is demonstrated on movie and app recommendations, where it is shown to outperform an alternative CB model on completely cold items.

Online learning to rank for sequential music recommendation

The prominent success of music streaming services has brought increasingly complex challenges for music recommendation. In particular, in a streaming setting, songs are consumed sequentially within a listening session, which should cater not only to the user's historical preferences, but also to possible preference drifts, triggered by a sudden change in the user's context. In this paper, we propose a novel online learning to rank approach for music recommendation that continuously learns from the user's listening feedback. In contrast to existing online learning approaches for music recommendation, we leverage implicit feedback as the only signal of the user's preference. Moreover, to adapt rapidly to preference drifts over millions of songs, we represent each song in a lower dimensional feature space and explore multiple directions in this space as duels of candidate recommendation models. Our thorough evaluation using listening sessions from Last.fm demonstrates the effectiveness of our approach at learning faster and better compared to state-of-the-art online learning approaches.

Pace my race: recommendations for marathon running

We propose marathon running as a novel domain for recommender systems and machine learning. Using high-resolution marathon performance data from multiple marathon races (n = 7931), we build in-race recommendations for runners. We show that we can outperform the existing techniques currently employed for in-race finish-time prediction, and we demonstrate how such predictions may be used to make real-time recommendations to runners. The recommendations are made at critical points in the race to provide personalised guidance so that runners can adjust their race strategy. By associating model features with the expert domain knowledge of marathon runners, we generate explainable, adaptable pacing recommendations that can guide runners to their best possible finish time and help them avoid the potentially catastrophic effects of hitting the wall.

SESSION: Algorithms: Large-scale, constraints and evaluation

Efficient similarity computation for collaborative filtering in dynamic environments

The problem of computing all pairwise similarities in a large collection of vectors is a well-known and common data mining task. As the number and dimensionality of these vectors keep increasing, however, existing approaches are often unable to meet the strict efficiency requirements imposed by the environments they need to perform in. Real-time neighbourhood-based collaborative filtering (CF) is one example of such an environment in which performance is critical.

In this work, we present a novel algorithm for efficient and exact similarity computation between sparse, high-dimensional vectors. Our approach exploits the sparsity that is inherent to implicit feedback data streams, entailing significant gains compared to other methods. Furthermore, as our model learns incrementally, it is naturally suited for dynamic real-time CF environments. We propose a MapReduce-inspired parallelisation procedure along with our method, and show how even more speed-up can be achieved. Additionally, in many real-world systems, a large share of items is actually not recommendable at any given time, due to recency, stock, seasonality, or enforced business rules. We exploit this fact to further improve the computational efficiency of our approach. Experimental evaluation on both real-world and publicly available datasets shows that our approach scales up to millions of processed user-item interactions per second, and well advances the state-of-the-art.
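
As a sketch of the incremental idea (not the paper's algorithm), exact item-item cosine similarities over an implicit-feedback stream can be maintained by updating only the co-occurrence counts for the pairs an incoming (user, item) event actually touches:

```python
from collections import defaultdict
from math import sqrt

class IncrementalCosine:
    """Maintain exact item-item cosine similarities over a stream of
    implicit-feedback events (user, item), updating only affected pairs."""

    def __init__(self):
        self.user_hist = defaultdict(set)  # user -> items interacted with
        self.dot = defaultdict(int)        # sorted (i, j) pair -> co-occurrence count
        self.norm2 = defaultdict(int)      # item -> squared norm (= interaction count)

    def update(self, user, item):
        if item in self.user_hist[user]:
            return
        for other in self.user_hist[user]:  # only pairs touching `item` change
            self.dot[tuple(sorted((item, other)))] += 1
        self.user_hist[user].add(item)
        self.norm2[item] += 1

    def cosine(self, i, j):
        d = self.dot.get(tuple(sorted((i, j))), 0)
        return d / sqrt(self.norm2[i] * self.norm2[j]) if d else 0.0

cf = IncrementalCosine()
for u, i in [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]:
    cf.update(u, i)
print(cf.cosine("a", "b"))  # 2 / sqrt(2*2) = 1.0
```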

Personalized diffusions for top-n recommendation

This paper introduces PerDif, a novel framework for learning personalized diffusions over item-to-item graphs for top-n recommendation. PerDif learns the teleportation probabilities of a time-inhomogeneous random walk with restarts, capturing a user-specific underlying item exploration process. Such an approach can lead to significant improvements in recommendation accuracy, while also providing useful information about the users in the system. Per-user fitting can be performed in parallel and very efficiently, even in large-scale settings. A comprehensive set of experiments on real-world datasets demonstrates the scalability as well as the qualitative merits of the proposed framework. PerDif achieves high recommendation accuracy, outperforming state-of-the-art competing approaches, including several recently proposed methods relying on deep neural networks.
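
A simplified sketch of the diffusion view: score items for a user as a weighted sum of random-walk landing distributions over an item-to-item graph, where the per-step weights are the user-specific quantities one would fit. The paper learns teleportation probabilities of a time-inhomogeneous walk; treating them as simple step weights here is an assumption.

```python
import numpy as np

def personalized_diffusion(P, user_vec, weights):
    """Score items as a per-user weighted sum of walk landing distributions:
    r_u = sum_k w_k * x_u P^k.

    P: (n_items, n_items) row-stochastic item-to-item transition matrix;
    user_vec: the user's (normalised) interaction vector;
    weights: non-negative per-user step weights (the fitted quantities).
    """
    x = user_vec.copy()
    r = np.zeros_like(x)
    for w_k in weights:
        x = x @ P           # one more step of the item walk
        r += w_k * x
    return r

P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.4, 0.6, 0.0]])
u = np.array([1.0, 0.0, 0.0])  # user has consumed item 0
print(personalized_diffusion(P, u, weights=[0.6, 0.3, 0.1]))
```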

Sampling-bias-corrected neural modeling for large corpus item recommendations

Many recommendation systems retrieve and score items from a very large corpus. A common recipe to handle data sparsity and the power-law item distribution is to learn item representations from content features. Apart from the many content-aware systems based on matrix factorization, we consider a modeling framework using a two-tower neural network, with one of the towers (the item tower) encoding a wide variety of item content features. A general recipe for training such two-tower models is to optimize loss functions calculated from in-batch negatives, which are items sampled from a random mini-batch. However, the in-batch loss is subject to sampling biases, potentially hurting model performance, particularly in the case of a highly skewed distribution. In this paper, we present a novel algorithm for estimating item frequency from streaming data. Through theoretical analysis and simulation, we show that the proposed algorithm works without requiring a fixed item vocabulary, produces unbiased estimates, and is adaptive to item distribution change. We then apply the sampling-bias-corrected modeling approach to build a large-scale neural retrieval system for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus of tens of millions of videos. We demonstrate the effectiveness of sampling-bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the neural retrieval system leads to improved recommendation quality for YouTube.
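
The correction in this line of work amounts to subtracting the log of each item's estimated sampling probability from its logit before the in-batch softmax, so popular items are not over-penalised as negatives. A minimal numpy sketch; the batch layout and the stand-in frequency estimates are assumptions.

```python
import numpy as np

def corrected_in_batch_softmax(user_emb, item_emb, item_logq):
    """Sampling-bias-corrected in-batch softmax loss.

    item_logq: log of each batch item's estimated sampling probability,
    e.g. from a streaming frequency estimator. Corrected logit: s - log(q).
    Positives sit on the diagonal of the (batch x batch) score matrix.
    """
    logits = user_emb @ item_emb.T - item_logq[None, :]
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))

rng = np.random.default_rng(0)
U, V = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
logq = np.log(rng.uniform(0.001, 0.1, size=8))  # stand-in for the frequency estimate
print(corrected_in_batch_softmax(U, V, logq))
```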

Leveraging post-click feedback for content recommendations

Implicit feedback (e.g., clicks) is widely used in content recommendation. However, clicks only reflect user preferences according to first impressions; they do not capture the extent to which users continue to engage with the content. Our analysis of two real-world datasets shows that more than half of the clicks on music and short videos are followed by skips. In this paper, we leverage post-click feedback, e.g., skips and completions, to improve the training and evaluation of content recommenders. Specifically, we experiment with existing collaborative filtering algorithms and find that they perform poorly against post-click-aware ranking metrics. Based on these insights, we develop a generic probabilistic framework to fuse click and post-click signals. We show how our framework can be applied to improve pointwise and pairwise recommendation models. Our approach is shown to outperform existing methods by 18.3% and 2.5% in terms of Area Under the Curve (AUC) on the short-video and music datasets, respectively. We discuss the effectiveness of our approach across content domains and the trade-offs in weighting various user feedback signals.

When actions speak louder than clicks: a combined model of purchase probability and long-term customer satisfaction

Maximizing sales and revenue is an important goal of online commercial retailers. Recommender systems are designed to maximize users' click or purchase probability, but often disregard users' eventual satisfaction with the purchased items. As a result, such systems promote items with high appeal at the selling stage (e.g., an eye-catching presentation) over items that would yield more satisfaction to users in the long run. This work presents a novel unified model that considers both goals and can be tuned to balance between them according to the needs of the business scenario.

We propose a multi-task probabilistic matrix factorization model with a dual task objective: predicting binary purchase/no-purchase variables combined with predicting continuous satisfaction scores. Model parameters are optimized using Variational Bayes, which allows learning a posterior distribution over model parameters. This model allows making predictions that balance the two goals of maximizing the probability of an immediate purchase and maximizing user satisfaction and engagement down the line. These goals lie at the heart of most commercial recommendation scenarios, and enabling their balance has the potential to improve value for millions of users worldwide. Finally, we present an experimental evaluation on different types of consumer retail datasets that demonstrates the benefits of the model over popular baselines on a number of well-known ranking metrics.

Uplift-based evaluation and optimization of recommenders

Recommender systems aim to increase user actions such as clicks and purchases. Typical evaluations of recommenders regard the purchase of a recommended item as a success. However, the item may have been purchased even without the recommendation. An uplift is defined as an increase in user actions caused by recommendations. Situations with and without a recommendation cannot both be observed for a specific user-item pair at a given time instance, making uplift-based evaluation and optimization challenging. This paper proposes new evaluation metrics and optimization methods for the uplift in a recommender system. We apply a causal inference framework to estimate the average uplift for the offline evaluation of recommenders. Our evaluation protocol leverages both purchase and recommendation logs under a currently deployed recommender system, to simulate the cases both with and without recommendations. This enables the offline evaluation of the uplift for newly generated recommendation lists. For optimization, we need to define positive and negative samples that are specific to an uplift-based approach. For this purpose, we deduce four classes of items by observing purchase and recommendation logs. We derive the relative priorities among these four classes in terms of the uplift and use them to construct both pointwise and pairwise sampling methods for uplift optimization. Through dedicated experiments with three public datasets, we demonstrate the effectiveness of our optimization methods in improving the uplift.
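
At its simplest, the uplift the abstract defines can be estimated from logs as a difference in purchase rates between recommended and non-recommended observations; the naive difference-in-means sketch below ignores confounding, which the paper's evaluation protocol is designed to handle.

```python
def average_uplift(logs):
    """Estimate the average uplift of a recommendation policy from logs.

    logs: iterable of (recommended: bool, purchased: bool) pairs for
    user-item observations under the deployed recommender.
    """
    rec = [p for r, p in logs if r]
    no_rec = [p for r, p in logs if not r]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(rec) - rate(no_rec)   # naive: assumes comparable groups

logs = [(True, True), (True, False), (False, False), (False, False), (False, True)]
print(average_uplift(logs))  # 0.5 - 0.33... ~= 0.17
```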

SESSION: Using side-information and user attributes and cold-start in recommender algorithms

Deep social collaborative filtering

Recommender systems are crucial to alleviating the information overload problem in online worlds. Most modern recommender systems capture users' preferences towards items via their interactions, based on collaborative filtering techniques. In addition to user-item interactions, social networks can also provide useful information for understanding users' preferences, as suggested by social theories such as homophily and influence. Recently, deep neural networks have been utilized for social recommendation, leveraging both user-item interactions and social network information. However, most of these models cannot take full advantage of the social network information: they only use information from direct neighbors, although distant neighbors can also provide helpful information. Meanwhile, most of these models treat neighbors' information equally, without considering the specific recommendation at hand, even though for a specific recommendation case the information relevant to the specific item would be most helpful. Moreover, most of these models do not explicitly capture neighbors' opinions of items, while different opinions could affect the user differently. In this paper, to address the aforementioned challenges, we propose DSCF, a Deep Social Collaborative Filtering framework which can exploit social relations along various aspects for recommender systems. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.

Attribute-aware non-linear co-embeddings of graph features

In very sparse recommender datasets, attributes of users, such as age, gender and home location, and attributes of items, such as (in the case of movies) genre, release year, and director, can improve recommendation accuracy, especially for users and items with few ratings. While most recommendation models can be extended to take attributes of users and items into account, their architectures usually become more complicated. And while item attributes are often easy to provide, user attributes are often scarce, for reasons of privacy or simply because they are not relevant to the operational process at hand. In this paper, we address these two problems for attribute-aware recommender systems by proposing a simple model that co-embeds users and items into a joint latent space in a similar way as vanilla matrix factorization, but with a non-linear latent feature construction that can seamlessly ingest user or item attributes or both (GraphRec). To address the second problem, scarce attributes, the proposed model treats the user-item relation as a bipartite graph and constructs generic user and item attributes via the Laplacian of the user-item co-occurrence graph, requiring no external side information beyond the mere rating matrix. In experiments on three recommender datasets, we show that GraphRec significantly outperforms existing state-of-the-art attribute-aware and content-aware recommender systems, even without using any side information.

Adversarial attacks on an oblivious recommender

Can machine learning models be easily fooled? Despite the recent surge of interest in learned adversarial attacks in other domains, in the context of recommendation systems this question has mainly been answered using hand-engineered fake user profiles. This paper attempts to reduce this gap. We provide a formulation for learning to attack a recommender as a repeated general-sum game between two players: an adversary and a recommender oblivious to the adversary's existence. We consider the challenging case of poisoning attacks, which focus on the training phase of the recommender model. We generate adversarial user profiles targeting subsets of users or items, or more generally the top-K recommendation quality. Moreover, we ensure that the adversarial user profiles remain unnoticeable by preserving the proximity of the real user rating/interaction distribution to the adversarial fake user distribution. To cope with the challenge of the adversary not having access to the gradient of the recommender's objective with respect to the fake user profiles, we provide a non-trivial algorithm building upon zero-order optimization techniques. We offer a wide range of experiments, instantiating the proposed method for the classic approach of a low-rank recommender, and illustrating the extent of the recommender's vulnerability to a variety of adversarial intents. These results can serve as a motivating point for more research into recommender defense strategies against machine-learned attacks.

HybridSVD: when collaborative information is not enough

We propose a new hybrid algorithm that allows incorporating both user and item side information within the standard collaborative filtering technique. One of its key features is that it naturally extends the simple PureSVD approach and inherits its unique advantages, such as a highly efficient Lanczos-based optimization procedure, simplified hyper-parameter tuning and quick folding-in computation for generating recommendations instantly, even in highly dynamic online environments. The algorithm utilizes a generalized formulation of the singular value decomposition, which adds flexibility to the solution and allows imposing a desired structure on its latent space. Conveniently, the resulting model also admits an efficient and straightforward solution for the cold-start scenario. We evaluate our approach on a diverse set of datasets and show its superiority over similar classes of hybrid models.

Variational low rank multinomials for collaborative filtering with side-information

We are interested in Bayesian models for collaborative filtering that incorporate side information or metadata about items in addition to user-item interaction data. We present a simple and flexible framework for building models for this task that exploit the low-rank structure of user-item interaction datasets. Although the resulting models are non-conjugate, we develop an efficient technique for approximating posteriors over model parameters using variational inference. We borrow the "re-parameterization trick" from the Bayesian deep learning literature to enable variational inference in our models. The resulting approximate Bayesian inference algorithm is scalable and can handle large-scale datasets. We demonstrate our ideas on three real-world datasets, where we show competitive performance against widely used baselines.
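
The borrowed trick is easy to state concretely: sample from the approximate posterior as a deterministic function of its parameters plus exogenous noise, so gradients can flow through the parameters during variational inference. A minimal sketch for a diagonal Gaussian:

```python
import numpy as np

def reparameterized_sample(mu, log_var, rng):
    """Draw z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I).

    Because the randomness lives entirely in eps, the sample is a
    differentiable function of (mu, log_var), which is what makes
    gradient-based variational inference possible.
    """
    eps = rng.standard_normal(mu.shape)  # noise sampled outside the "graph"
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
z = reparameterized_sample(np.zeros(4), np.log(np.full(4, 0.25)), rng)
print(z)  # a sample with standard deviation 0.5 around 0
```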

Quick and accurate attack detection in recommender systems through user attributes

Malicious profiles are a credible threat to collaborative recommender systems: attackers provide fake item ratings to systematically manipulate the platform. Attack detection algorithms can identify and remove such users by observing rating distributions. In this study, we aim to use user attributes as an additional information source to improve the accuracy and speed of attack detection. We propose a probabilistic factorization model which can embed user attributes of mixed data types, together with observed ratings, into a latent space to generate anomaly statistics for new users. To identify persistent outliers in the system, we also propose a sequential attack detection algorithm enabling quick and accurate detection based on the probabilistic model learned from genuine users. The proposed model demonstrates significant improvements in both accuracy and speed when compared to baseline algorithms on a popular benchmark dataset.

POSTER SESSION: Short papers with poster presentation

A generative model for review-based recommendations

User-generated reviews are a highly informative source of information that has recently gained much attention in the recommender systems community. In this work we propose a generative latent variable model that explains both observed ratings and textual reviews. This latent variable model makes it possible to combine any traditional collaborative filtering method with any deep learning architecture for text processing. Experimental results on four benchmark datasets demonstrate its superiority over all baseline recommender systems. Furthermore, a running time analysis shows that this approach is an order of magnitude faster than the relevant baselines. Moreover, underlying our solution is a general framework that may be further explored.

A simple multi-armed nearest-neighbor bandit for interactive recommendation

The cyclic nature of the recommendation task is being increasingly taken into account in recommender systems research. In this line, framing interactive recommendation as a genuine reinforcement learning problem, multi-armed bandit approaches have increasingly been considered as a means to cope with the dual exploitation/exploration goal of recommendation. In this paper we develop a simple multi-armed bandit elaboration of neighbor-based collaborative filtering. The approach can be seen as a variant of the nearest-neighbors scheme, endowed with controlled stochastic exploration of the user's neighborhood through a parameter-free application of Thompson sampling. Our approach is based on a formal development and a reasonably simple design, so it aims to be easy to reproduce and further elaborate upon. We report experiments using datasets from different domains showing that neighbor-based bandits indeed achieve recommendation accuracy enhancements in the mid to long run.
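
A toy sketch of the underlying idea, under the simplifying assumption of Bernoulli feedback with Beta posteriors (the paper's actual neighbor model is more elaborate):

```python
import numpy as np

# Thompson sampling over a user's candidate neighbors (illustrative). Each
# neighbor keeps a Beta posterior over "following this neighbor yields a hit";
# each round we sample from the posteriors, follow the winner, and update it
# with the observed binary feedback.

rng = np.random.default_rng(1)
n_neighbors = 5
alpha = np.ones(n_neighbors)                        # Beta successes + 1
beta = np.ones(n_neighbors)                         # Beta failures + 1
true_hit_rate = rng.uniform(0.1, 0.7, n_neighbors)  # hidden neighbor quality

for _ in range(1000):
    theta = rng.beta(alpha, beta)              # one posterior sample per arm
    k = int(np.argmax(theta))                  # exploit the sampled best arm
    reward = rng.random() < true_hit_rate[k]   # simulated user feedback
    alpha[k] += reward
    beta[k] += 1 - reward

# The posterior-mean winner usually matches the truly best neighbor.
print(np.argmax(alpha / (alpha + beta)), np.argmax(true_hit_rate))
```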

Adversarial tensor factorization for context-aware recommendation

Contextual factors such as time, location, or tag can affect user preferences for a particular item. Context-aware recommendations are thus critical to improving both the quality and the explainability of recommender systems, compared to traditional recommendations based solely on user-item interactions. Tensor factorization machines have achieved state-of-the-art performance due to their ability to integrate users, items, and contextual factors in one unified way. However, little work has focused on the robustness of a context-aware recommender system. Improving the robustness of a tensor-based model is challenging due to the sparsity of the observed tensor and the multi-linear nature of tensor factorization. In this paper, we propose ATF, a model that combines tensor factorization and adversarial learning for context-aware recommendations. Doing so allows us to reap the benefits of tensor factorization while enhancing the robustness of the recommender model, and thus improves its eventual performance. Empirical studies on two real-world datasets show that the proposed method outperforms standard tensor-based methods.

Aligning daily activities with personality: towards a recommender system for improving wellbeing

Recommender systems have not been explored to a great extent for improving health and subjective wellbeing. Recent advances in mobile technologies and user modelling present the opportunity for delivering such systems; however, the key issue is understanding the drivers of subjective wellbeing at an individual level. In this paper we propose a novel approach for deriving personalized activity recommendations to improve subjective wellbeing by maximizing the congruence between activities and personality traits. To evaluate the model, we leveraged a rich dataset collected in a smartphone study, which contains three weeks of daily activity probes, the Big-Five personality questionnaire, and subjective wellbeing surveys. We show that the model correctly infers a range of activities that are 'good' or 'bad' (i.e. positively or negatively related to subjective wellbeing) for a given user and that the derived recommendations closely match real-world outcomes.

Asymmetric Bayesian personalized ranking for one-class collaborative filtering

In this paper, we propose a novel preference assumption for modeling users' one-class feedback, such as "thumbs up", in an important recommendation problem called one-class collaborative filtering (OCCF). Specifically, we address a fundamental limitation of a recent symmetric pairwise preference assumption and propose the first asymmetric one, which makes the preferences of different users more comparable. With the proposed asymmetric pairwise preference assumption, we further design a novel recommendation algorithm called asymmetric Bayesian personalized ranking (ABPR). Extensive empirical studies on two large public datasets show that ABPR performs significantly better than several state-of-the-art recommendation methods based on either pointwise or pairwise preference assumptions.
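
For reference, a minimal PyTorch sketch of the classic symmetric pairwise (BPR) objective that the paper starts from; the asymmetric assumption proposed in ABPR modifies this baseline, and only the classic form is shown here.

```python
import torch
import torch.nn.functional as F

# BPR pairwise objective: for user u with an interacted item i and a sampled
# non-interacted item j, maximize sigma(score(u, i) - score(u, j)).

n_users, n_items, k = 100, 200, 16
U = torch.randn(n_users, k, requires_grad=True)   # user factors
V = torch.randn(n_items, k, requires_grad=True)   # item factors

u, i, j = 3, 10, 42                               # a sampled (user, pos, neg) triple
x_ui = (U[u] * V[i]).sum()                        # positive item score
x_uj = (U[u] * V[j]).sum()                        # negative item score

loss = -F.logsigmoid(x_ui - x_uj)                 # BPR pairwise loss
loss.backward()                                   # gradients update both factor sets
```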

Attribute-based evaluation for recommender systems: incorporating user and item attributes in evaluation metrics

Research in recommender systems evaluation remains critical for studying the efficiency of developed algorithms. Even though different aspects have been addressed and some shortcomings - such as biases, robustness, or cold start - have been analyzed, with solutions or guidelines proposed, there are still gaps that need to be investigated further. At the same time, the increasing amount of data collected by most recommender systems makes it possible to gather valuable information from users and items, information that classical offline evaluation metrics neglect. In this work, we integrate such information into the evaluation process in two complementary ways: on the one hand, we aggregate any evaluation metric according to the groups defined by the user attributes, and, on the other hand, we exploit item attributes to consider some recommended items as surrogates of those interacted with by the user, with a proper penalization. Our results show that this novel evaluation methodology captures different nuances of the algorithms' performance, inherent biases in the data, and even the fairness of the recommendations.
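
The first of the two ideas, aggregating a metric over user-attribute groups, reduces to a simple group-by; a toy pandas sketch with made-up column names and values:

```python
import pandas as pd

# Aggregate a per-user evaluation metric over groups defined by a user
# attribute. Column names and values are illustrative.

results = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "age_group": ["18-25", "18-25", "26-40", "26-40", "40+", "40+"],
    "ndcg@10": [0.31, 0.28, 0.42, 0.39, 0.22, 0.25],
})

# Per-group means reveal performance differences that the global average hides.
print(results.groupby("age_group")["ndcg@10"].mean())
print("global:", results["ndcg@10"].mean())
```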

Combining text summarization and aspect-based sentiment analysis of users' reviews to justify recommendations

In this paper we present a methodology to justify recommendations that relies on information extracted from users' reviews of the available items. The intuition behind the approach is to conceive the justification as a summary of the most relevant and distinguishing aspects of the item, automatically obtained by analyzing its reviews. To this end, we designed a pipeline of natural language processing techniques, including aspect extraction, sentiment analysis, and text summarization, to gather the reviews, process the relevant excerpts, and generate a unique synthesis presenting the main characteristics of the item. Such a summary is finally presented to the target user as a justification of the received recommendation. In the experimental evaluation we carried out a user study in the movie domain (N=141), and the results showed that our approach makes the recommendation process more transparent, engaging, and trustworthy for users.

Compositional network embedding for link prediction

Almost all existing network embedding methods learn to map node IDs to their corresponding node embeddings. This design principle, however, hinders the existing methods from being applied in real cases. Node IDs are not generalizable, so existing methods must expend great effort on the cold-start problem. Heterogeneous networks usually require extra work to encode node types, as a node's type cannot be identified from its ID. Node IDs also carry little information, which underlies the criticism that existing methods are not robust to noise. To address these issues, we introduce Compositional Network Embedding, a general inductive network representation learning framework that generates node embeddings by combining node features based on the "principle of compositionality". Instead of directly optimizing an embedding lookup based on arbitrary node IDs, we learn a composition function that infers node embeddings by combining the corresponding node attribute embeddings through a graph-based loss. For evaluation, we conduct experiments on link prediction under three different settings. The results verify the effectiveness and generalization ability of compositional network embeddings, especially on unseen nodes.
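
A minimal PyTorch sketch of the compositional idea (mean-pooled attribute embeddings plus a learned projection; the paper's composition function and graph-based loss are richer than this):

```python
import torch
import torch.nn as nn

# A node's embedding is computed from its attribute embeddings rather than
# looked up by node ID, so unseen nodes get embeddings for free.

class CompositionalEmbedding(nn.Module):
    def __init__(self, n_attributes: int, dim: int):
        super().__init__()
        self.attr_emb = nn.Embedding(n_attributes, dim)  # shared attribute table
        self.proj = nn.Linear(dim, dim)                  # learned composition

    def forward(self, attr_ids: torch.Tensor) -> torch.Tensor:
        # attr_ids: (batch, n_attrs_per_node) -> (batch, dim)
        return self.proj(self.attr_emb(attr_ids).mean(dim=1))

model = CompositionalEmbedding(n_attributes=1000, dim=32)
unseen_node = torch.tensor([[3, 17, 256]])      # attributes of a brand-new node
print(model(unseen_node).shape)                 # torch.Size([1, 32])
```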

Data mining for item recommendation in MOBA games

E-sports has become an important activity within MOBA (Multiplayer Online Battle Arena) games in recent years. There is existing research on recommender systems in this area, but most of it focuses on the character recommendation problem. However, the recommendation of items is also challenging because of its contextual nature, which depends on the other characters. We have developed a framework that suggests items for a character based on the match context. The system aims to help players who have recently started the game, as well as frequent players, to gain a strategic advantage during a match and to improve their purchasing decisions. By analyzing a dataset of ranked matches with data mining techniques, we capture the purchase dynamics of experienced players and use them to generate recommendations. The results show that our proposed solution yields up to 80% mAP, suggesting that the method leverages context information successfully. These results, together with open issues we mention in the paper, call for further research in the area.

DualDiv: diversifying items and explanation styles in explainable hybrid recommendation

In recommender systems, item diversification and explainable recommendations improve users' satisfaction. Unlike traditional explainable recommendations that display a single explanation for each item, explainable hybrid recommendations display multiple explanations for each item and are, therefore, more beneficial for users. When multiple explanations are displayed, one problem is that similar sets of explanation styles (ESs), such as user-based, item-based, and popularity-based, may be displayed for similar items. Although item diversification has been well studied, the question of how to diversify ESs remains underexplored. In this paper, we propose a method for diversifying ESs and a framework, called DualDiv, that recommends items by diversifying both the items and the ESs. Our experimental results show that DualDiv can increase the diversity of the items and the ESs without greatly reducing recommendation accuracy.

Enhancing VAEs for collaborative filtering: flexible priors & gating mechanisms

Neural network based models for collaborative filtering have started to gain attention recently. One branch of research uses deep generative models to model user preferences, where variational autoencoders have been shown to produce state-of-the-art results. However, the current variational autoencoders for CF have some potentially problematic characteristics. The first is the overly simplistic prior that VAEs incorporate for learning the latent representations of user preference. The other is the model's inability to learn deeper representations with more than one hidden layer per network.

Our goal is to incorporate appropriate techniques to mitigate the aforementioned problems of variational autoencoder CF and further improve recommendation performance. Our work is the first to apply flexible priors to collaborative filtering; we show that the simple priors of original VAEs may be too restrictive to fully model user preferences and that setting a more flexible prior gives significant gains. We experiment with the VampPrior, originally proposed for image generation, to examine the effect of flexible priors in CF. We also show that VampPriors coupled with gating mechanisms outperform state-of-the-art results, including the Variational Autoencoder for Collaborative Filtering, by meaningful margins on two popular benchmark datasets (MovieLens & Netflix).

Find my next job: labor market recommendations using administrative big data

Labor markets are undergoing change due to factors such as automation and globalization, motivating the development of occupational recommender systems for jobseekers and caseworkers. This study generates occupational recommendations by utilizing a novel dataset consisting of administrative records covering the entire Danish workforce. Based on actual labor market behavior in the period 2012-2015, how well can different models predict each user's next occupation in 2016? Through offline experiments, the study finds that gradient-boosted decision tree models provide the best recommendations for future occupations in terms of mean reciprocal rank and recall. Further, gradient-boosted decision tree models offer distinct advantages in the labor market domain due to their interpretability and ability to harness additional background information on workers. However, the study raises concerns regarding trade-offs between model accuracy and ethical issues, including privacy and the social reinforcement of gender divides.

Greedy optimized multileaving for personalization

Personalization plays an important role in many services. To evaluate personalized rankings, online evaluation, such as A/B testing, is widely used today. Recently, multileaving has been found to be an efficient method for evaluating rankings in the information retrieval field. This paper describes the first attempt to optimize the multileaving method for personalization settings. We clarify the challenges of applying this method to personalized rankings. Then, to solve these challenges, we propose greedy optimized multileaving (GOM) with a new credit feedback function. The empirical results show that GOM remains stable as the ranking length and the number of rankers increase. We implemented GOM on our production news recommender systems and evaluated its online performance. The results show that GOM evaluates personalized rankings precisely, with significantly smaller sample sizes (< 1/10) than A/B testing.

Guiding creative design in online advertising

Ad creatives (text and images) for a brand play an influential role in online advertising. To design impactful ads, creative strategists employed by brands (advertisers) typically go through a time-consuming process of market research and ideation. Such a process may involve learning more about the brand and drawing inspiration from prior successful creatives for the brand and its competitors in the same product category. To help strategists develop creatives faster, we introduce a recommender system that provides a list of desirable keywords for a given brand. Such keywords can serve as underlying themes and guide the strategist in finalizing the image and text for the brand's ad creative. We explore the potential of distributed representations of Wikipedia pages, along with a labeled dataset of keywords for 900 brands, by using deep relevance matching to recommend a list of keywords for a given brand. Our experiments demonstrate the efficacy of the proposed recommender system over several baselines for relevance matching; although end-to-end automation of ad creative development remains an open problem in the advertising industry, the proposed recommender system is a stepping stone, providing valuable insights to creative strategists and advertisers.

How can they know that? A study of factors affecting the creepiness of recommendations

Recommender systems (RS) often use implicit user preferences extracted from behavioral and contextual data, in addition to traditional rating-based preference elicitation, to increase the quality and accuracy of personalized recommendations. However, these approaches may harm the user experience by causing mixed emotions, such as fear, anxiety, surprise, discomfort, or creepiness. RS should consider users' feelings, expectations, and reactions that result from being shown personalized recommendations. This paper investigates the creepiness of recommendations using an online experiment in three domains: movies, hotels, and health. We define the feeling of creepiness caused by recommendations and find that it is already familiar to users of RS. We further find that the perception of creepiness varies across domains and depends on recommendation features, like causal ambiguity and accuracy. By uncovering possible consequences of creepy recommendations, we also learn that creepiness can have a negative influence on brand and platform attitudes, purchase or consumption intention, user experience, and users' expectations of, and their trust in, RS.

Latent multi-criteria ratings for recommendations

Multi-criteria recommender systems have become increasingly valuable for helping consumers identify the most relevant items based on different dimensions of user experiences. However, previously proposed multi-criteria models did not take into account latent embeddings generated from user reviews, which capture latent semantic relations between users and items. To address these concerns, we utilize variational autoencoders to map user reviews into latent embeddings, which are subsequently compressed into low-dimensional discrete vectors. The resulting compressed vectors constitute latent multi-criteria ratings that we use for recommendation purposes via standard multi-criteria recommendation methods. We show that the proposed latent multi-criteria rating approach outperforms several baselines significantly and consistently across different datasets and performance evaluation measures.

Multi-armed recommender system bandit ensembles

It has long been known that well-configured recommender system ensembles can achieve better effectiveness than their combined systems separately. Sophisticated approaches have been developed to automatically optimize an ensemble's configuration to maximize its performance gains. However, most work in this area has targeted simplified scenarios where algorithms are tested and compared on a single non-interactive run. In this paper we consider a more realistic perspective, bearing in mind the cyclic nature of the recommendation task, where a large part of the system's input is collected from users' reactions to the recommendations they are delivered. The cyclic process gives ensembles the opportunity to observe and learn about the effectiveness of the combined algorithms and to improve the ensemble configuration progressively.

In this paper we explore the adaptation of a multi-armed bandit approach to achieve this, by representing the combined systems as arms, and the ensemble as a bandit that at each step selects an arm to produce the next round of recommendations. We report experiments showing the effectiveness of this approach compared to ensembles that lack the iterative perspective. Along the way, we find illustrative pitfall examples that can result from common, single-shot offline evaluation setups.
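
A toy sketch of the ensemble-as-bandit loop, using epsilon-greedy arm selection for brevity (the paper explores bandit strategies in more depth; names and numbers are illustrative):

```python
import numpy as np

# Each arm is a base recommender; every round the bandit picks one to produce
# recommendations, then updates that arm's value with the observed feedback.

rng = np.random.default_rng(2)
recommenders = ["itemknn", "mf", "popularity"]     # illustrative arm names
value = np.zeros(len(recommenders))                # running mean reward per arm
count = np.zeros(len(recommenders))
true_ctr = np.array([0.12, 0.18, 0.08])            # hidden arm quality
epsilon = 0.1

for _ in range(5000):
    if rng.random() < epsilon:                     # explore a random arm
        a = int(rng.integers(len(recommenders)))
    else:                                          # exploit the current best arm
        a = int(np.argmax(value))
    reward = rng.random() < true_ctr[a]            # simulated click feedback
    count[a] += 1
    value[a] += (reward - value[a]) / count[a]     # incremental mean update

print(recommenders[int(np.argmax(value))])         # most likely "mf"
```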

Music recommendations in hyperbolic space: an application of empirical bayes and hierarchical poincaré embeddings

Matrix factorization (MF) is a common method for generating recommendations, where the proximity of entities like users or items in the embedded space indicates their similarity to one another. Though almost all applications implicitly use a Euclidean embedding space to represent the two entity types, recent work has suggested that a hyperbolic Poincaré ball may be better suited to representing multiple entity types and, in particular, hierarchies. We describe a novel method to embed a hierarchy of related music entities in hyperbolic space. We also describe how a parametric empirical Bayes approach can be used to estimate link reliability between entities in the hierarchy. Applying these methods together to build personalized playlists for users of a digital music service yielded a large and statistically significant increase in performance during an A/B test, as compared to the Euclidean model.
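
For reference, the Poincaré-ball distance that such embeddings rely on, in a small numpy sketch (the example points are illustrative):

```python
import numpy as np

# Hyperbolic distance on the Poincare ball:
# d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2))).
# Points near the boundary are exponentially far apart, which is what makes
# the space well suited to hierarchies.

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

root = np.array([0.0, 0.0])              # a "parent" near the origin
leaf_a = np.array([0.9, 0.0])            # children near the boundary
leaf_b = np.array([0.0, 0.9])
print(poincare_distance(root, leaf_a))   # moderate (~2.9)
print(poincare_distance(leaf_a, leaf_b)) # much larger (~5.2), despite equal norms
```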

On gossip-based information dissemination in pervasive recommender systems

Pervasive computing systems employ distributed and embedded devices in order to collect, communicate, and process data in an anytime-anywhere fashion. Their most prominent device is certainly the smartphone, due to its wide proliferation, growing computational power, and wireless networking capabilities. In this context, we revisit the implementation of digitalized word-of-mouth, in which smartphones exchange item preferences offline and directly, in immediate proximity. Collaboratively and decentrally collecting data in this way has two benefits. First, it makes it possible to attach, for instance, location-sensitive context information in order to enrich the collected item preferences. Second, model building does not require network connectivity. Despite these benefits, the approach naturally raises data privacy and data scarcity issues. In order to address both, we propose Propagate and Filter, a method that translates the traditional approach of finding similar peers and exchanging item preferences among them from the field of decentralized recommender systems to that of pervasive recommender systems. Additionally, we present preliminary results on a prototype mobile application that implements the proposed device-to-device information exchange. Average ad-hoc connection delays of 25.9 seconds and reliable connection success rates within 6 meters underpin the approach's technical feasibility.

On the discriminative power of hyper-parameters in cross-validation and how to choose them

Hyper-parameter tuning is a crucial task for making a model perform at its best. However, despite well-established methodologies, some aspects of tuning remain unexplored. For example, tuning may affect not just accuracy but also novelty, and its effects may depend on the adopted dataset. Moreover, it can sometimes be sufficient to concentrate on a single parameter (or a few of them) instead of the overall set. In this paper we report on our investigation of hyper-parameter tuning by performing an extensive 10-fold cross-validation on MovieLens and Amazon Movies for three well-known baselines: User-kNN, Item-kNN, and BPR-MF. We adopted a grid search strategy considering approximately 15 values for each parameter and then evaluated each combination of parameters in terms of accuracy and novelty. We investigated the discriminative power of nDCG, Precision, Recall, MRR, EFD, and EPC, and finally analyzed the role of the parameters in model evaluation under cross-validation.

PAL: a position-bias aware learning framework for CTR prediction in live recommender systems

Predicting click-through rate (CTR) accurately is crucial in recommender systems. In general, a CTR model is trained on user feedback collected from traffic logs. However, position bias exists in such feedback, because a user may click on an item not only because she favors it but also because it appears in a good position. One common remedy is to model position as a feature in the training data, which is widely used in industrial applications due to its simplicity. In this case, a default position value has to be used to predict CTR at online inference time, since the actual position information is not available then. However, different default position values may produce completely different recommendation results, so this approach leads to sub-optimal online performance. To address this problem, we propose PAL, a Position-bias Aware Learning framework for CTR prediction in a live recommender system. It can model the position bias in offline training and conduct online inference without position information. Extensive online experiments demonstrate that PAL outperforms the baselines by 3%-35% in terms of CTR and CVR (ConVersion Rate) in a three-week A/B test.
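
A minimal PyTorch sketch of the factorization described in the abstract, p(click | x, pos) = p(seen | pos) * p(click | x, seen), with illustrative module shapes: training fits the product to logged clicks, while serving uses the position-free factor alone.

```python
import torch
import torch.nn as nn

# Two-factor click model sketch (illustrative, not the production PAL model).
# The position module is used only during training; online inference drops it
# and scores with the CTR module alone, so no default position is needed.

class PAL(nn.Module):
    def __init__(self, n_positions: int, n_features: int):
        super().__init__()
        self.p_seen = nn.Sequential(nn.Embedding(n_positions, 1), nn.Sigmoid())
        self.p_click = nn.Sequential(nn.Linear(n_features, 1), nn.Sigmoid())

    def forward(self, x, pos):
        # Training: the product of the two factors fits the logged clicks.
        return self.p_seen(pos).squeeze(-1) * self.p_click(x).squeeze(-1)

    def infer(self, x):
        # Serving: position-free CTR estimate.
        return self.p_click(x).squeeze(-1)

model = PAL(n_positions=20, n_features=8)
x, pos = torch.randn(4, 8), torch.tensor([0, 3, 7, 19])
print(model(x, pos).shape, model.infer(x).shape)
```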

PDMFRec: a decentralised matrix factorisation with tunable user-centric privacy

Conventional approaches to matrix factorisation (MF) typically rely on a centralised collection of user data for building an MF model. This approach introduces increased risk with regard to user privacy. In this short paper we propose an alternative, user-centric, privacy-enhanced, decentralised approach to MF. Our method pushes the computation of the recommendation model to the user's device and eliminates the need to exchange sensitive personal information; instead, only the loss gradients of local (device-based) MF models need to be shared. Moreover, users can select the amount and type of information to be shared, for enhanced privacy. We demonstrate the effectiveness of this approach by considering different levels of user privacy in comparison with state-of-the-art alternatives.

Performance comparison of neural and non-neural approaches to session-based recommendation

The benefits of neural approaches are undisputed in many application areas. However, today's research practice in applied machine learning---where researchers often use a variety of baselines, datasets, and evaluation procedures---can make it difficult to understand how much progress is actually achieved through novel technical approaches. In this work, we focus on the fast-developing area of session-based recommendation and aim to contribute to a better understanding of what represents the state-of-the-art.

To that end, we have conducted an extensive set of experiments, using a variety of datasets, in which we benchmarked four neural approaches published in the last three years against each other and against a set of simpler baseline techniques, e.g., based on nearest neighbors. Evaluating the algorithms under the exact same conditions revealed that the benefits of applying today's neural approaches to session-based recommendation are still limited. In the majority of cases, and in particular when precision and recall are used, simple techniques outperform the recent neural approaches. Our findings therefore point to major limitations of today's research practice. By sharing our evaluation framework publicly, we hope that some of these limitations can be overcome in the future.

Personalized fairness-aware re-ranking for microlending

Microlending can lead to improved access to capital in impoverished countries. Recommender systems could be used in microlending to provide efficient and personalized service to lenders. However, increasing concerns about discrimination in machine learning hinder the application of recommender systems to the microfinance industry. Most previous recommender systems focus on pure personalization, with fairness issues largely ignored. A desirable fairness property in microlending is to give borrowers from different demographic groups a fair chance of being recommended, as stated by Kiva. To achieve this goal, we propose a Fairness-Aware Re-ranking (FAR) algorithm to balance ranking quality and borrower-side fairness. Furthermore, taking into consideration that lenders may differ in their receptivity to the diversification of recommended loans, we develop a Personalized Fairness-Aware Re-ranking (PFAR) algorithm. Experiments on a real-world dataset from Kiva.org show that our re-ranking algorithms can significantly promote fairness with little sacrifice in accuracy, while remaining attentive to individual lenders' preferences for loan diversity.
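
A toy sketch of a fairness-aware greedy re-ranker in the spirit of the abstract (the actual FAR/PFAR objectives are more involved; the trade-off weight lam is an illustrative knob):

```python
# Greedily build the list, trading off the base ranking score against a bonus
# for demographic groups not yet represented among the chosen items.

def fair_rerank(candidates, k, lam=0.3):
    # candidates: list of (item_id, score, group)
    chosen, covered = [], set()
    pool = list(candidates)
    while pool and len(chosen) < k:
        best = max(pool, key=lambda c: c[1] + (lam if c[2] not in covered else 0.0))
        chosen.append(best)
        covered.add(best[2])
        pool.remove(best)
    return chosen

items = [("a", 0.9, "g1"), ("b", 0.85, "g1"), ("c", 0.6, "g2"), ("d", 0.5, "g3")]
print(fair_rerank(items, k=3))   # "c" leapfrogs "b" once group g1 is covered
```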

Pick & merge: an efficient item filtering scheme for Windows store recommendations

Microsoft Windows is the most popular operating system (OS) for personal computers (PCs). With hundreds of millions of users, its app marketplace, Windows Store, is one of the largest in the world. As such, special considerations are required in order to improve online computational efficiency and response times.

This paper presents the results of extensive research on effective filtering methods for semi-personalized recommendations. The filtering problem, defined here for the first time, addresses an aspect so far largely overlooked by the recommender systems literature, namely an effective and efficient method for removing items from semi-personalized recommendation lists.

Semi-personalized recommendation lists serve a common list to a group of people based on their shared interest or background. Unlike fully personalized lists, these lists are cacheable and constitute the majority of recommendation lists in many online stores.

This motivates the following question: can we remove (most of) the users' undesired items without collapsing onto fully personalized recommendations?

Our solution is based on dividing the users into a few subgroups, such that each subgroup receives a different variant of the original recommendation list. This approach adheres to the principles of semi-personalization and hence preserves simplicity and cacheability. We formalize the problem of finding optimal subgroups that minimize the total number of filtering errors and show that it is combinatorially formidable. Consequently, we propose a greedy algorithm that filters out most of the undesired items while bounding the maximal number of errors for each user. Finally, a detailed evaluation of the proposed algorithm is presented using both proprietary and public datasets.

Predicting online performance of job recommender systems with offline evaluation

At Indeed, recommender systems are used to recommend jobs. In this context, the implicit and explicit feedback signals we can collect are rare events, making the task of evaluation more complex. Online evaluation (A/B testing) is usually the most reliable way to measure the results of our experiments, but it is a slow process. In contrast, the offline evaluation process is faster, but it is critical to make it reliable, as it informs our decisions to roll out new improvements in production. In this paper, we review the comparative offline and online performance of three recommendation models, describe the evaluation metrics we use, and analyze how the offline performance metrics correlate with online metrics to understand how an offline evaluation process can be leveraged to inform our decisions.

Predicting user routines with masked dilated convolutions

Predicting users' daily location visits - when and where they will go, and how long they will stay - is key to making effective location-based recommendations. Knowledge of an upcoming day allows the suggestion of relevant alternatives (e.g., a new coffee shop on the way to work) in advance, prior to a visit. This helps users make informed decisions and plan accordingly.

People's visit routines, or just routines, can vary significantly from day to day, and visits from earlier in the day, week, or month may affect subsequent choices. Traditionally, routine prediction has been modeled with sequence methods, such as HMMs, or more recently with RNN-based architectures. The problem with such architectures, however, is that their predictive performance degrades as the number of historical observations in the routine sequence increases. In this paper, we propose Masked-TCN (MTCN), a novel method based on time-dilated convolutional networks. The method implements custom dilations and masking that can effectively process long routine sequences, identifying recurring patterns at different resolutions: hourly, daily, weekly, and monthly. We demonstrate that MTCN achieves an 8% improvement in accuracy over current state-of-the-art solutions on a large dataset of visit routines.
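
A minimal PyTorch sketch of the time-dilated convolution building block (masking and the full MTCN architecture are omitted): stacking 1-D convolutions with exponentially growing dilation expands the receptive field over long histories without an RNN's sequential bottleneck.

```python
import torch
import torch.nn as nn

# TCN-style stack: the receptive field roughly doubles with each layer, so a
# few layers cover hourly, daily, and weekly spans of a routine sequence.

layers = []
channels = 16
for dilation in (1, 2, 4, 8):
    layers += [
        nn.Conv1d(channels, channels, kernel_size=3,
                  padding=dilation, dilation=dilation),  # length-preserving
        nn.ReLU(),
    ]
tcn = nn.Sequential(*layers)

x = torch.randn(1, channels, 24 * 7)             # one week of hourly features
print(tcn(x).shape)                              # torch.Size([1, 16, 168])
```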

Product collection recommendation in online retail

Recommender systems are an integral part of eCommerce services, helping to optimize revenue and user satisfaction. Bundle recommendation has recently gained attention from the research community, since behavioral data shows that users often buy more than one product in a single transaction. In most cases, bundle recommendations are of the form "users who bought product A also bought products B, C, and D". Although such recommendations can be useful, there is no guarantee that products A, B, C, and D are actually related to each other. In this paper, we address the problem of collection recommendation, i.e., recommending a collection of products that share a common theme and can potentially be purchased together in a single transaction. We extend traditional approaches that use mostly transactional data by incorporating both domain knowledge from product suppliers, in the form of hierarchies, and textual attributes of the products. Our approach starts by combining product hierarchies with transactional data or domain knowledge to identify candidate sets of product collections. It then generates the product collection recommendations from these candidate sets by learning a deep similarity model that leverages textual attributes. Experimental evaluation on real data from the Home Depot online retailer shows that the proposed solution can recommend collections of products with increased accuracy compared to expert-crafted collections.

PyRecGym: a reinforcement learning gym for recommender systems

Recommender systems (RS) share many features and objectives with reinforcement learning (RL) systems. The former aim to maximise user satisfaction by recommending the right items to the right users at the right time; the latter maximise future rewards by selecting state-changing actions in some environment. The concept of an RL gym has become increasingly important for supporting the development of RL models. A gym provides a simulation environment in which to test and develop RL agents, providing a state model, actions, rewards/penalties, etc. In this paper we describe and demonstrate the PyRecGym gym, which is specifically designed for the needs of recommender systems research: it supports standard test datasets (MovieLens, Yelp, etc.) and common input types (text, numeric, etc.), thereby offering researchers a reproducible research environment to accelerate the experimentation and development of RL in RS.
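
A hypothetical sketch of what such a gym loop looks like; the class and method names below are illustrative, not PyRecGym's actual API.

```python
import numpy as np

# Toy recommender environment: state is the session's interaction history,
# actions are item ids, and the reward simulates the user's response.

class ToyRecEnv:
    def __init__(self, n_items: int, seed: int = 0):
        self.n_items = n_items
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.history = []                         # new user session
        self.taste = self.rng.random(self.n_items)
        return self.history

    def step(self, item: int):
        reward = float(self.rng.random() < self.taste[item])  # simulated click
        self.history.append(item)
        done = len(self.history) >= 10            # fixed session length
        return self.history, reward, done

env = ToyRecEnv(n_items=50)
policy_rng = np.random.default_rng(1)
state, done, total = env.reset(), False, 0.0
while not done:
    action = int(policy_rng.integers(env.n_items))  # random policy placeholder
    state, reward, done = env.step(action)
    total += reward
print(total)                                      # session reward of the random policy
```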

Should we embed? A study on the online performance of utilizing embeddings for real-time job recommendations

In this work, we present the findings of an online study, where we explore the impact of utilizing embeddings to recommend job postings under real-time constraints. On the Austrian job platform Studo Jobs, we evaluate two popular recommendation scenarios: (i) providing similar jobs and, (ii) personalizing the job postings that are shown on the homepage. Our results show that for recommending similar jobs, we achieve the best online performance in terms of Click-Through Rate when we employ embeddings based on the most recent interaction. To personalize the job postings shown on a user's homepage, however, combining embeddings based on the frequency and recency with which a user interacts with job postings results in the best online performance.

The influence of personal values on music taste: towards value-based music recommendations

The field of recommender systems has a lot to gain from the field of psychology. Indeed, many psychology researchers have investigated relations between models that describe humans and consumption preferences. One example of this is personality, which has been shown to be a valid construct to describe people. As a consequence, personality-based recommenders have already proven to be a lead toward improving recommendations, by adapting them to their users' traits.

Beyond personality, there are more ways to describe a person's identity. One of these ways is to consider personal values: what is important for the users in life at the most abstract level. Being complementary to personality traits, values may give another lead towards better user understanding. In this paper, we investigate this, taking music as a use case. We use a marketing interview technique to elicit 22 users' personal values connected to their musical preferences. We show that personal values indeed play a role in people's music preferences, and are the first to propose a map linking personal values to music preferences. We see this map as a first step in devising a value-based user model for music recommender systems.

Time slice imputation for personalized goal-based recommendation in higher education

Learners are often faced with the following scenario: given a goal for the future, and what they have learned in the past, what should they do now to best achieve their goal? We build on work utilizing deep learning to make inferences about how past actions correspond to future outcomes and enhance this work with a novel application of backpropagation to learn per-user optimized next actions. We apply this technique to two datasets, one from a university setting in which courses can be recommended towards preparation for a target course, and one from a massive open online course (MOOC) in which course pages can be recommended towards quiz preparation. In both cases, our algorithm is applied to recommend actions the learner can take to maximize a desired future achievement objective, given their past actions and performance.

Traversing semantically annotated queries for task-oriented query recommendation

As search systems gradually turn into intelligent personal assistants, users increasingly resort to a search engine to accomplish a complex task, such as planning a trip, renting an apartment, or investing in stocks. A key challenge for the search engine is to understand the user's underlying task given a sample query like "tickets to panama", "studios in los angeles", or "spotify stocks", and to suggest other queries to help the user complete the task. In this paper, we investigate several strategies for query recommendation by traversing a semantically annotated query log using a mixture of explicit and latent representations of entire queries and of query segments. Our results demonstrate the effectiveness of these strategies in terms of utility and diversity, as well as their complementarity, with significant improvements compared to state-of-the-art query recommendation baselines adapted for this task.

User-centric evaluation of session-based recommendations for an automated radio station

The creation of an automated and virtually endless playlist given a start item is a common feature of modern media streaming services. When no past information about the user's preferences is available, the creation of such playlists can be done using session-based recommendation techniques. In this case, the recommendations only depend on the start item and the user's interactions in the current listening session, such as "liking" or skipping an item.

In recent years, various novel session-based techniques have been proposed, often based on deep learning. The evaluation of such approaches is in most cases based solely on offline experimentation and abstract accuracy measures. However, such evaluations cannot inform us about the quality as perceived by users. To close this research gap, we conducted a user study (N=250) in which participants interacted with an automated online radio station. Each treatment group received recommendations generated by one of five different algorithms. Our results show that comparably simple techniques led to quality perceptions that are similar to or even better than those obtained when a complex deep learning mechanism or Spotify's recommendations are used. The simple mechanisms, however, tend to recommend comparatively popular tracks, which can lead to weaker discovery effects.

SESSION: Novel uses of recommenders

Using AI to build communities around interests on LinkedIn

At LinkedIn, our mission is to connect the world's professionals to make them more productive and successful. Our team, Communities Artificial Intelligence (AI), helps our members achieve this goal by providing a platform where communities can form around common interests and shared experiences.

Fostering active communities at LinkedIn can be broken down into the following components:

(1) Discover: Help members find new entities (members, companies, hashtags, and more) to follow that will expose them to communities that share their interests.

(2) Engage: Engage members in the conversations taking place in their communities by recommending content from their areas of interest.

(3) Contribute: Help members effectively engage with the right communities when they create or share content.

These three components form the main pillars of a content-driven ecosystem and our goal is to use AI to successfully close the loop between Discover (via providing relevant follow recommendations), Engage (via delivering engaging content to users from their areas of interest), and Contribute (via suggesting hashtags to content creators to target the right audience).

A diverse set of AI techniques is required to address the challenges that arise in each of these components. These techniques include: Supervised Learning (XGBoost, Logistic Regression, Linear Regression), Wide and Deep Models, Natural Language Processing (e.g., Word Embeddings, ngram matching), and Unsupervised Learning.

In this presentation, we will provide an overview of the AI techniques we use to form active communities on LinkedIn. We will describe two solutions in detail. First, we will describe how we have built our Follow Recommendations product. The goal of the Follow Recommendations product is to recommend entities that a member finds both immediately relevant (i.e., increasing the probability the member will follow the recommended entity) and engaging in the long run (i.e., the recommended entity produces content that the member finds relevant).

Our analysis of the performance of our follow recommendations models has shown the superiority of nonlinear models compared to their linear counterparts. To manage the explosion of data emanating from terabytes of features generated from (viewer, entity) pairs, we use an innovative 2-D hash join algorithm that was developed at LinkedIn.

We are also moving towards a hybrid scoring architecture. This allows us to score candidates with complex offline models and then re-rank these candidates based on more time-sensitive contextual features online. This generates more relevant and timely recommendations for the members based on their recent activity on different parts of the LinkedIn ecosystem.

Second, we will describe our approach to the problem of Hashtag Suggestion and Typeahead. Hashtags are a great tool that allows members to expand the reach of their posts to the right audience (or communities). Our Hashtag Suggestion and Typeahead (HST) product was built to aid members in adding hashtags to their posts. We recommend not only hashtags that the member is likely to select for their post, but also hashtags that are likely to get the member the most online feedback.

We call the latter aspect downstream utility (or engagement). However, before realizing this utility, the member has to actually select from the recommended hashtags. Therefore, the HST product is produced by combining two models. The first model maximizes the probability that the member will select the suggested hashtag and the second one optimizes for downstream utility. Based on content consumption behavior on LinkedIn, we have a good understanding of the supply and demand of content tagged with a specific hashtag. This information enables us to shape the inventory as well as traffic in individual hashtag domains, thus providing a better experience to content-starved communities.

The trinity of luxury fashion recommendations: data, experts and experimentation

Farfetch is the leading platform for online luxury fashion shopping. We carry more than 3,000 brands and high-end designers, with the biggest catalog of luxury products available worldwide, serving more than 1 million customers.

The high-end luxury fashion segment in which Farfetch operates is a notably complex and intricate field. Fashion trends change very fast and can come from anywhere, at any time, making them very hard to capture. Ultimately, people's tastes are very personal and hard to extrapolate. Users of luxury websites have understandably high expectations and demand a high-end, curated, and knowledgeable experience in all respects. To achieve this, the recommendations engine powering the Farfetch platform is built on top of three main pillars: 1) data, 2) expert knowledge, and 3) experimentation.

Data is obviously the core of any automated recommender system. Like many e-commerce platforms, we collect and leverage various implicit interactions by tracking our users' journeys on Farfetch.com and our apps, as well as the explicit preferences they often set, such as their favourite designers. From implicit feedback data we started building state-of-the-art recommender systems based on collaborative approaches, only to realize that our catalogue does not allow for item-item collaborative recommenders: a product's lifetime is either too short, with unique pieces being bought as soon as they go live, or too long, with some timeless iconic items lasting forever. Hence, we needed to implement hybrid versions of collaborative recommenders that emphasize the products' content data [1].

Throughout the experimentation process with these algorithms, both implicit and explicit feedback data fell short of encoding the sense of fashion expected by our customers. The obvious next step was to use the internal knowledge embedded in several teams of fashion experts and stylists. Although not trivial, there are many ways we can leverage this expert knowledge to improve the fashion understanding of our recommender systems:

• Our content editors create the editorial pages with the latest trends and write the products' descriptions. This data allows us to build the relationships between designers to create adjacency models and incorporate taxonomy data employing NLP approaches [5].

• Our visual merchandising experts curate crucial listing pages with products respecting business rules, fashion trends and our signature on fashion. This allows us to encode colorflow and style trends by using style transfer techniques such as computing Gram matrices from convolutional feature maps [2].

• Our stylists manually curate outfits respecting Farfetch's style identity. This allows us to build automated outfits based on Siamese neural networks on top of convolutional neural networks [3, 4].

In order to tie these sources of information together in a seamless manner, we follow a strict experimentation workflow, where we iterate fast, deliver in a controlled way through AB testing, and track and evaluate the impact in different dimensions. This process has allowed us to optimize the business value of the system in different contexts and gain a better understanding of our customers and what works and doesn't work for them.

In this talk, we will share Farfetch's solutions from our journey of building personalized recommendations in the luxury fashion segment using data, experts, and experimentation.

"Just play something awesome": the personalization powering voice interactions at Pandora

The adoption of voice-enabled devices has seen explosive growth in the last few years, and music consumption is among the most popular use cases. Music personalization and recommendation play a major role at Pandora in providing a delightful listening experience for millions of users daily. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new and interesting challenges as well as exciting opportunities.

In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic and broad open-ended.

Known-item search requests are the most common scenario, where users have a well-defined and clear intent: looking for a specific item in the catalog or in their personal collection. A voice interface makes the task natural and easy to accomplish, since the user is not required to type on a small keyboard. Solving this task involves performing an entity search against a large music catalog and the user's personal collection. This can be very challenging due to imperfect voice utterance transcriptions, unconventional entity names, and the numerous ways a user can ask for music entities. We employ personalization algorithms for entity disambiguation, which is needed because of the homonyms, homographs, and homophones present in the catalog.

Another common voice use case is to ask for music related to a specific theme or context, such as a genre, an activity, a mood, an occasion, or any combination of those. This scenario differs sharply from the known-item case in that, depending on users' varying contexts, multiple results might be relevant rather than a single clearly relevant one. For example, a rap music fan would not enjoy a country workout playlist when asking for "music for working out" but may like a hip hop one. This problem can be quite complex to solve, as it involves different areas such as spoken language understanding, content tagging, and personalization. We will describe how we use deep learning slot-filling techniques and query classification to interpret the user intent and identify the main concepts in the query. After that, we will discuss some of the content tagging work we have done to classify music according to these voice-specific themes. Lastly, we will touch upon how we use recommendation techniques to deliver personalized and unique results to each individual, and describe the challenge of balancing the delicate trade-off between query relevance and personalization.

The third category of voice queries we will describe are broad or open-ended requests. Voice users often skip the hard work of thinking about what they actually want to hear and command: "just play something awesome". A music service should still meet these expectations instead of interpreting those commands as literal requests. We discuss exploit and explore trade-offs made in the recommendation item pool generation process. Here the exploit pool contains items aimed at re-consumption, while the explore pool contains new items with specific context match.

Finally, we will discuss differences and challenges regarding the evaluation of voice-powered recommendation systems. The first key difference is that in standard recommendation system settings, evaluations are based on UI signals such as impressions and clicks or other explicit forms of feedback. Since pure voice interfaces contain no visual UI elements, relevance labels need to be inferred from implicit actions such as play time, query reformulations, or other session-level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made.

In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.

Future of in-vehicle recommendation systems @ Bosch

Future in-vehicle recommendation systems will assist the driver or passenger in all situations before, during, and after a trip. Based on the preferences and needs of the user, and by taking the current situation and available context information into account, they will provide the right recommendation at the right time.

Bosch is the world's largest automotive supplier, delivering a full range of products and services from power-train, infotainment, HMI, connected mobility, and driver assistance to automated driving. This talk will present challenges, concepts, and recent technical progress in in-vehicle recommendation systems developed at Bosch, including details of a combined routing, charging, and point-of-interest (POI) recommendation system.

There has been tremendous progress in the field of location-independent recommendation systems, such as recommending films, music, news, or shopping articles. The ubiquity of user location information, provided by connected devices, has paved the way for location-based services (LBS), and their combination with social networks has extended these to location-based social network (LBSN) services; see [1, 6] for recent surveys of recommender systems in LBSNs.

In-vehicle recommendation systems go a step further by extending LBSN services with vehicle context and vehicle specific applications. This can support the user in various applications, such as routing (e.g. route and point of interest recommendation), infotainment (e.g. music or news recommendation), communication (finding a contact, fast call) and in-vehicle control (e.g. seat position, ambient light or HVAC settings). Out-of-vehicle assistance includes the control of connected devices in smart buildings such as alarm systems, heating, kitchen and entertainment devices.

We present an important application of in-vehicle recommendation systems: a combined routing, charging, and POI recommender developed at Bosch. Routing and charging optimization for electric vehicles has been described for optimizing the shortest feasible path [2], optimizing the constrained shortest path [4], optimizing charging grid demand and opportunities [5], and optimizing minimum cost [3]. These approaches focus on single-criterion optimization.

We describe the first system with combined route optimization, charging station search, and POI recommendation. It optimizes three criteria: finding the optimal route, with the optimal charging stations so that the vehicle always has enough energy, and with the optimal POIs along the route, where 'optimal' depends on the driver's preferences and rich context information covering the user, the vehicle, and the environment.

Designer-driven add-to-cart recommendations

Although real-time dynamic recommender systems have been applied successfully by e-commerce and technology companies for more than a decade, we at IKEA Group have just started our journey into this exciting field. At IKEA, customer experience is at our heart, and a key principle for any machine learning algorithm that we design to improve this experience is that it should act as an extension to the home-furnishing expertise that our co-workers have developed and fine-tuned for more than 75 years. In this talk, we discuss a particular recommendation strategy that projects the inspirational shopping experience of our blue boxes onto our digital touch points by defining a notion of style from our vast collection of inspirational content.

To go beyond classical, transaction-based collaborative filtering strategies, we take as our starting point the different types of images taken of each product when launched. Our current implementation relies on the following three types of images:

(1) white-canvas, referring to an image of a product displayed on a plain white background;

(2) context-based, which shows a product in the larger context of a room, but where emphasis remains on the product itself;

(3) inspirational, in which a product is shown in a purposefully atmospheric setting with focus on the entirety.

By extracting the product range displayed in our tagged inspirational images, we initially construct a graph of products that embeds the mindset of our talented designers. Add-to-cart recommendations are then generated from the resulting graph based on user-behaviour data collected from our digital touch points (app, web) and transactional data from purchases made online, or in one of our IKEA stores.

To implement the strategy, we have come across a few interesting (stand-alone) problems along the way; notably, we faced a severe lack of properly tagged inspirational images, and much of our furniture today does not appear in our inspirational collection. To address the latter, we pursue a supervised learning approach that automatically identifies products that 1) complement each other with regard to function, and 2) match in terms of style. We do this by taking product metadata attributes and the full collection of product images as input. We also discuss how we use a combination of features extracted from context-based and inspirational images using a pre-trained ImageNet model [2], together with manually tagged inspirational images and transaction data from stores, to create our training data. The use of both context-based and inspirational images distinguishes us from similar methodologies in the fashion industry [1, 3] and enables us to capture the notion of complementary products in a satisfying way.

SESSION: Novel approaches to recommenders

Groupon finally explains why we showed those offers

Groupon has a large inventory of offers as varied as local taquerias, massages, concert tickets, and trips to Costa Rica. Our Search & Recommendations team continues to develop algorithmic recommendation systems, machine-learned query understanding models, and increasingly sophisticated personalization and sales conversion estimations. Across an inventory of millions of offers, including many highly localized and geographically specific ones unique to Groupon's Local business, we strive to balance inventory exploration with matching our users to exactly the right item. Our recommendation models take a variety of factors into account so that we can make the most relevant suggestions to our customers in their neighborhood, or while they travel in one of our hundreds of domestic and international markets. Our system must index millions of items, including the many specific to a user's location; score the deals based on estimated conversion; and finally make adjustments for personalization, exploration, and diversity before delivering our ranked list of inventory to the platform.

Yet despite our efforts, many of our customers are unaware of how highly considered their Groupon app and emails are. In numerous customer interviews we found a huge perception gap that had to be addressed. Customers expressed that our central scrollable home feed felt "cluttered", "disorganized", and "like a garage sale". It was clear to us that the next great sophisticated recommendation feature meant nothing if our customers couldn't appreciate it. Collectively, we realized that we were missing a key communication with our customers.

Customers of large internet marketplaces, whether eCommerce, social media, or digital media, have become accustomed to explanations or qualifications for the recommendations being shown to them. These often take the form of widgets or collections/carousels with titles that explain the grouping, such as "Because you watched Pulp Fiction" or "Your friend liked this post by Cardi B". Our team decided we could demonstrate our own consideration logic to customers, explain the reasoning behind their deal feed, and hopefully encourage them to interact and personalize their experience more. Because of the amount of data being considered to drive our recommendations, our team had to develop a system that could generate multiple personalized explanations, score them, and budget the various messages within the deal feed.

Homepage personalization at Spotify

We aim to surface the best of Spotify for each user on the Home page by providing a personalized space where users can find recommendations of playlists, albums, artists, and podcasts tailored to their individual preferences. Hundreds of millions of users listen to music on Spotify each month, with more than 50 million daily active users on the Home page alone. The quality of the recommendations on Home depends on a multi-armed bandit framework that balances exploration and exploitation and allows us to adapt quickly to changes in user preferences. We employ counterfactual training and reasoning to evaluate new algorithms without having to always rely on A/B testing or randomized data collection experiments [3].

In this talk, we explain the methods and technologies used in the end-to-end process of homepage personalization and demonstrate a case study where we show improved user satisfaction over a popularity-based baseline. In addition, we present some of the challenges we faced in implementing such machine learning solutions in a production environment at scale and the approaches used to address them.

The first challenge stems from the fact that training and offline evaluation of machine learning methods from incomplete logged feedback data require robust off-policy estimators that account for several forms of bias [1, 2]. The ability to quickly sanity-check and gain confidence in the methods we use in the production system is a crucial foundation for developing and maintaining effective algorithms. We demonstrate how we used a single-feature model, optimized for impression-to-click rate, to validate, and where necessary improve, the methods we use for off-policy estimation and for accounting for position bias.
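A standard way to do off-policy estimation from logged bandit feedback, in the spirit of the estimators cited above [1, 2], is inverse propensity scoring (IPS); the following sketch is illustrative and not Spotify's implementation, and all variable names are ours.

```python
import numpy as np

def ips_click_rate(logged, new_policy_prob):
    """Inverse-propensity estimate of the click rate a new policy would
    achieve, computed from feedback logged under the production policy.

    logged: iterable of (context, shown_item, click, logging_prob) tuples,
            where logging_prob is the production policy's probability
            of having shown that item (must be > 0).
    new_policy_prob: function (context, item) -> probability that the
            new policy would show that item in that context.
    """
    estimates = []
    for context, item, click, p_log in logged:
        weight = new_policy_prob(context, item) / p_log  # importance weight
        estimates.append(weight * click)
    # Plain IPS; a self-normalized variant divides by the mean weight.
    return np.mean(estimates)
```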

Lastly, the business metrics we optimize for do not always reflect the expectations of all users of the Home page at a granular level. Consider a niche, daily podcast producing independent, fact-based news every morning. A small segment of Spotify customers might want to see that content at the top of their Home page every morning. We present simple but informative metrics we developed to validate our model's ability to account for such habitual behaviors of our customers.

Recommendation in home improvement industry, challenges and opportunities

The retail industry has been disrupted by the e-commerce revolution more than any other industry. Some giant retailers, such as Sears and Toys R Us, went out of business or filed for bankruptcy as a result. However, some verticals in the retail industry remain robust and have not been disrupted, due to the lack of e-commerce solutions that could convince customers to turn their backs on existing physical stores in favor of the online experience. Home improvement is the best example of such a vertical: e-commerce has not "yet" disrupted the domain or caused problems for the leading companies, which still rely heavily on physical stores.

That being said, home improvement retailers have recognized the risk of not investing in a robust online business that supports their physical stores in a seamless experience, so most of the leading retailers in this hundred-billion-dollar industry have started building in-house solutions for all the challenging problems involved in giving their shoppers a seamless online shopping experience.

Recommender systems play a crucial role in this industry, as they do for any other online retailer. It is therefore very important to invest in building a personalized, scalable, and reliable recommender system that proactively helps shoppers discover products that engage them and match their intent and interests while they are on the website, and then re-engages them via email or social media with products and content that align with their interests after they leave.

As a Sr. Manager of the Core Recommendations team at The Home Depot, the largest home improvement retailer in the world, I deal with the challenges of building such a recommender system using cutting-edge technologies in AI, machine learning, and data science. In this talk I discuss and highlight the following challenges in recommendation for home improvement:

(1) Project-based recommendations: One of the unique aspects of home improvement retail is project-based shopping. Most visitors to home improvement retailers are classified as "Do It Yourself" customers: non-professionals who are interested in building or fixing something in their home themselves. These customers usually prefer to go to the physical store, so they can talk to a store associate about their project and get the associate's help in choosing the tools and materials it requires. It is very challenging to build a similar experience online, so I will talk about what we have done at The Home Depot to build project-based recommendations using multi-modal learning.

(2) Item Related Groups (IRG): One of the most important recommendations on home improvement portals is the Item Related Group (IRG), which includes accessories (a water filter is an accessory for a fridge), collections (a faucet has a shower head, towel bar, and towel ring that match its style as a collection), and parts (the handle of a drawer). The challenges in recommending these different IRGs range from visual compatibility to functional understanding. I will discuss how we leverage computer vision, deep learning, NLP, NLU, and domain knowledge to tackle these problems and generate high-quality IRG recommendations.

In this talk I will also cover the other challenges that recommender systems face in the home improvement industry, such as the velocity with which interests and intents change and the sparsity of interactions between customers and products.

Recommendation systems compliant with legal and editorial policies: the BBC+ app journey

The BBC produces thousands of pieces of content every day and numerous BBC products deliver this content to millions of users. For many years the content has been manually curated (as is evident, for example, in the selection of stories on the front page of the BBC News website and app). To support content creation and curation, a set of editorial guidelines has been developed to build quality and trust in the BBC. As personalisation becomes more important for audience engagement, we have been exploring how algorithmically-driven recommendations could be integrated into our products. In this talk we describe how we developed recommendation systems for the BBC+ app that comply with legal and editorial policies and promote the values of the organisation. We also discuss the challenges we face moving forward, extending the use of recommendation systems in a public service media organisation like the BBC.

The BBC+ app is the first product to host in-house recommendations in a fully algorithmically-driven application. The app surfaces short video clips and is targeted at younger audiences. The first challenge we dealt with was content metadata. Content metadata are created for different purposes and managed by different teams across the organisation, making it difficult to obtain reliable and consistent information. Metadata enrichment strategies have been applied to identify content that is considered editorially sensitive, such as political content, current legal cases, archived news, commercial content, and content unsuitable for an under-16 audience. Metadata enrichment is also applied to identify content where due care has not been taken, such as poor titles and spelling and grammar mistakes. The first versions of our recommendation algorithms exclude all editorially risky content from the recommendations; the most serious risk to avoid is contempt of court. In other cases we exclude content that could undermine our quality and trustworthiness.

The General Data Protection Regulation (GDPR) that recently came into effect had strong implications for the design of our system architecture, the choice of recommendation models, and the implementation of specific product features. For example, the user should be able to delete their data or switch off personalisation at any time. Our system architecture must therefore allow us to trace and delete all data from that user and switch to non-personalised content. The recommendations should also be explainable, which sometimes led us to choose a simpler model so that it is easier to explain why a user was recommended a particular type of content. Specific product features were also added to enhance transparency and explainability. For example, the user can view their history of watched items, delete any item, and get an explanation of why a piece of content was recommended to them.

At the BBC we aim not only to entertain our audiences but also to inform and educate. These BBC values are also reflected in our evaluation strategies and metrics. While we aim to increase audience engagement, we are also responsible for providing recent and diverse content that meets the needs of all our audiences. Accuracy metrics such as Hit Rate and Normalized Discounted Cumulative Gain (NDCG) can give a good estimate of the predictive performance of a model. However, recency and diversity metrics sometimes carry more weight in our products, especially in applications delivering news content. What is more, qualitative evaluation is very important before releasing any new model into production. We work closely with editorial teams who provide feedback on the quality of the recommendations and flag content not adhering to the BBC's values or the legal and editorial policies.
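For reference, NDCG, one of the accuracy metrics named above, can be computed for a single ranked list as in this small sketch (binary relevance, standard log2 discount); the code is ours, not the BBC's.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 2)              # ranks are 0-based
               for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal ranking."""
    actual = dcg(ranked_relevances[:k])
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# Example: the items at ranks 1 and 4 were relevant.
print(ndcg([1, 0, 0, 1, 0]))  # ~0.88
```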

The development of the BBC+ app has been a great journey. We learned a lot about our content metadata, the implications of the GDPR for our system, and our evaluation strategies. We created a minimum viable product that is compliant with legal and editorial policies. However, a lot remains to be done to ensure the recommendations meet the quality standards of the BBC. While excluding editorially sensitive content has limited the risk of contempt of court, algorithmic fairness and impartiality still need to be addressed. We encourage the community to look more into these topics and help us chart the way forward towards applications with responsible machine learning.

Incorporating intent propensities in personalized next best action recommendation

Next best action (NBA) is a technique that is widely considered best practice in modern personalized marketing. It takes users' unique characteristics into consideration and recommends next actions that help users progress towards business goals as quickly and smoothly as possible. Many NBA engines are built with rules handcrafted by marketers based on experience or gut feeling, which is not effective. In this proposal, we present our machine-learning-based approach to such a real-time recommendation engine, detail our design choices, and discuss evaluation techniques.

In practice, there are several key challenges to consider. (a) The engine needs to be able to deal with historical feedback that is typically incomplete and skewed towards a small set of actions; (b) actions are typically dynamic: they can be added or removed at any time due to seasonal changes or shifts in business strategy; (c) the optimization objective is typically complex: it usually consists of reaching a set of target events or moving users to more preferred stages. The engine needs to account for all these aspects.

Standard classification or regression models are not suitable, because only bandit feedback is available and the sampling bias present in historical data cannot be handled properly. A conventional multi-armed bandit model can address some of these challenges, but it lacks the ability to model multiple objectives. We present a propensity-variant hybrid contextual multi-armed bandit model (PV-MAB) that addresses all three challenges.

PV-MAB consists of two components: an intent propensity model (I-Prop) and a hybrid contextual MAB (H-Bandit). H-Bandit can be considered a multi-policy contextual MAB, where we model different aspects of user engagement separately and tailor the policies to each unique characteristic. I-Prop leverages user intent signals to target different users toward the specific goals that are most relevant to them. It acts as a policy selector, informing H-Bandit of the best strategy for different users at different points in their journey. I-Prop is trained separately with features extracted from user profile affinities and past behaviors.

To illustrate this design, we focus our discussion on how to incorporate two common, distinct objectives in H-Bandit. The first is to target and drive users to reach a small set of high-value goals (e.g. purchase, become a superfan), which we call the goal-oriented policy. The second is to promote progression into more advanced stages of the consumer journey (e.g. from login to completed profile), which we call the stage-advancement policy. In the goal-oriented policy, we reward reaching the goals accordingly and use a classification predictor as the kernel function to predict the probabilities of achieving those goals. In the stage-advancement policy, we use the progression of stages as the reward. Customers can move forward in their journey, skip a few stages, or go back to previous stages to do more research or re-evaluation. The reward strategy is designed so that larger positive stage progressions receive higher rewards, while zero or negative progressions receive none. Both policies incorporate Thompson Sampling with a Gaussian kernel for better exploration.
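The abstract names Thompson Sampling with a Gaussian kernel as the exploration mechanism; a minimal, non-contextual sketch of Thompson Sampling with Gaussian posteriors, using textbook conjugate updates rather than details from the talk, looks like this.

```python
import numpy as np

class GaussianThompsonBandit:
    """Thompson Sampling with an independent Gaussian posterior per action.

    Assumes unit observation noise; each action keeps a posterior
    N(mu, 1/precision) over its mean reward.
    """
    def __init__(self, n_actions):
        self.mu = np.zeros(n_actions)         # posterior means
        self.precision = np.ones(n_actions)   # posterior precisions

    def select(self):
        # Sample a mean reward per action, then act greedily on the samples.
        samples = np.random.normal(self.mu, 1.0 / np.sqrt(self.precision))
        return int(np.argmax(samples))

    def update(self, action, reward):
        # Conjugate Gaussian update with unit-variance observations.
        p = self.precision[action]
        self.mu[action] = (p * self.mu[action] + reward) / (p + 1.0)
        self.precision[action] = p + 1.0
```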

One big difference between our hybrid model and a regular contextual bandit model is that, besides context information, we also mix user profile affinities into the model. These tell us the user's intent and interests, and what their typical journey path looks like. With these features, our model can recommend different actions to users who show different interests (e.g. football ticket purchase vs. jersey purchase). Similarly, for fast shoppers who usually skip a few stages, our model recommends actions that quickly trigger goal achievement, while for research-oriented users it offers actions that move them gradually towards the next stages. This hybrid strategy gives us a better understanding of user intent and behavior, allowing more personalized recommendations.

We designed a time-sensitive rolling evaluation mechanism for offline evaluation of the system with various hyperparameters that simulate behaviors in practice. Despite the lack of online evaluation, our strategy allows researchers and prospects to gain confidence through bounded expected performance. Evaluated on real-world data, we observed a reward gain of about 120%, with an overall confidence of around 0.95. A large portion of the improvement is contributed by the goal-oriented policy, which demonstrates well the discovery functionality of the intent propensity model.

Driving content recommendations by building a knowledge base using weak supervision and transfer learning

With 2.2 million subscribers and two hundred million content views, Chegg is a centralized hub where students come to get help with writing, science, math, and other educational needs. In order to impact students' learning capabilities we present personalized content to them. Student needs are unique and depend on their learning style, studying environment, and many other factors. Most students will engage with a subset of the products and content available at Chegg. In order to recommend personalized content to students, we have developed a generalized machine learning pipeline that can handle training data generation and model building for a wide range of problems. We generate a knowledge base with a hierarchy of concepts and associate student-generated content, such as chat-room data, equations, chemical formulae, reviews, etc., with concepts in the knowledge base. Collecting training data to generate different parts of the knowledge base is a key bottleneck in developing NLP models, and employing subject matter experts to provide annotations is prohibitively expensive. Instead, we use weak supervision and active learning techniques, with tools such as Snorkel [2], an open source project from Stanford, to make training data generation dramatically easier. With these methods, training data is generated using broad-stroke filters and high-precision rules. The rules are modeled probabilistically to incorporate dependencies. Features are generated using transfer learning [1] from language models for classification tasks. We explored several language models, and the best performance came from sentence embeddings with skip-thought vectors predicting the previous and the next sentence. The generated structured information is then used to improve product features and enhance the recommendations made to students. In this presentation I will talk about efficient methods of tagging content with categories that come from a knowledge base. Using this information we provide relevant content recommendations to students coming to Chegg for online tutoring, studying flashcards, and practicing problems.
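For illustration, labeling functions in the style of Snorkel's API (broad-stroke filters plus high-precision rules, combined by a probabilistic label model) might look like the sketch below; the rules themselves are invented examples, not Chegg's.

```python
import re
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

CHEMISTRY, OTHER, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_formula_pattern(x):
    # Broad-stroke filter: a crude regex for chemical formulae like "H2O".
    return CHEMISTRY if re.search(r"\b[A-Z][a-z]?\d", x.text) else ABSTAIN

@labeling_function()
def lf_no_science_terms(x):
    # High-precision rule: no chemistry vocabulary at all.
    terms = ("molecule", "reaction", "compound")
    return OTHER if not any(t in x.text.lower() for t in terms) else ABSTAIN

df = pd.DataFrame({"text": ["H2O is a molecule", "an essay on US history"]})
L_train = PandasLFApplier(lfs=[lf_formula_pattern, lf_no_science_terms]).apply(df)

# The label model combines the noisy, overlapping votes probabilistically.
label_model = LabelModel(cardinality=2)
label_model.fit(L_train, n_epochs=100)
probabilistic_labels = label_model.predict_proba(L_train)
```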

DEMONSTRATION SESSION: Demonstrations

AnnoMathTeX - a formula identifier annotation recommender system for STEM documents

Documents from science, technology, engineering and mathematics (STEM) often contain a large number of mathematical formulae alongside text. Semantic search, recommender, and question answering systems require the formula constants and variables (identifiers) that occur to be disambiguated. We present a first implementation of a recommender system that enables and accelerates formula annotation by displaying the most likely candidates for formula and identifier names from four different sources (arXiv, Wikipedia, Wikidata, or the surrounding text). A first evaluation shows that in total, 78% of the formula identifier name recommendations were accepted by the user as suitable annotations. Furthermore, propagating annotations document-wide saved the user from separately annotating ten times as many further occurrences of the same identifiers. Our long-term vision is to integrate the annotation recommender into the edit view of Wikipedia and the online LaTeX editor Overleaf.

Darwin & Goliath: a white-label recommender-system as-a-service with automated algorithm-selection

Recommendations-as-a-Service (RaaS) offerings ease the process for small and medium-sized enterprises (SMEs) of providing product recommendations to their customers. Current RaaS offerings, however, suffer from a one-size-fits-all concept, i.e. they apply the same recommendation algorithm for all SMEs. We introduce Darwin & Goliath, a RaaS that features multiple recommendation frameworks (Apache Lucene, TensorFlow, ...) and identifies the ideal algorithm for each SME automatically. Darwin & Goliath further offers per-instance algorithm selection and a white-label feature that allows SMEs to offer a RaaS under their own brand. Since November 2018, Darwin & Goliath has delivered more than 1m recommendations with a CTR of 0.5%.

FineNet: a joint convolutional and recurrent neural network model to forecast and recommend anomalous financial items

Financial technology (FinTech) has drawn much attention in recent years, with the advances of machine learning and deep learning. In this work, given historical time series of companies' stock prices, we aim at forecasting upcoming anomalous financial items, i.e., abruptly soaring or diving stocks, in financial time series, and recommending the corresponding stocks to support financial operations. We propose a novel joint convolutional and recurrent neural network model, Financial Event Neural Network (FineNet), to forecast and recommend anomalous stocks. Experiments conducted on the time series of stock prices of 300 well-known companies exhibit the promising performance of FineNet in terms of precision and recall. We built FineNet as a Web platform for live demonstration.
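The demo abstract names the architecture but not its details; a minimal joint CNN+RNN sketch in the spirit of FineNet, with layer sizes and a two-class soar/dive head as our assumptions rather than details from the paper, could look like this.

```python
import torch
import torch.nn as nn

class JointCnnRnn(nn.Module):
    """Hypothetical joint CNN+RNN for anomaly forecasting on price series:
    a 1-D convolution extracts local patterns over time, an LSTM models
    the sequence, and a linear head scores soar/dive classes."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # soar / dive logits

    def forward(self, x):                  # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)  # conv over time
        out, _ = self.rnn(z)
        return self.head(out[:, -1])       # predict from last hidden state

logits = JointCnnRnn()(torch.randn(8, 30, 1))  # 8 series of 30 prices
```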

Interactive evaluation of recommender systems with SNIPER: an episode mining approach

Recommender systems are typically evaluated using either offline methods, online methods, or through user studies. In this paper we take an episode mining approach to analysing recommender system data and we demonstrate how we can use SNIPER, a tool for interactive pattern mining, to analyse and understand the behaviour of recommender systems. We describe the required data format, and present a useful scenario of how a user can interact with the system to answer questions about the quality of recommendations.

IRF: interactive recommendation through dialogue

Recent research looks beyond recommendation accuracy, towards human factors that influence the acceptance of recommendations, such as user satisfaction, trust, transparency and sense of control. We present a generic interactive recommender framework that can add interaction functionalities to non-interactive recommender systems. We take advantage of dialogue systems to interact with the user, and we design a middleware layer to provide the interaction functions, such as providing explanations for the recommendations, managing user preferences learnt from dialogue, eliciting preferences, and refining recommendations based on the learnt preferences.

Microsoft recommenders: tools to accelerate developing recommender systems

The purpose of this demonstration is to highlight the content of the Microsoft Recommenders repository and show how it can be used to reduce the time involved in developing recommender systems. The open source repository provides python utilities to simplify common recommender-related data science work as well as example Jupyter notebooks that demonstrate use of the algorithms and tools under various environments.

StoryTime: eliciting preferences from children for book recommendations

We present StoryTime, a book recommender for children. Our web-based recommender is co-designed with children and uses images to elicit their preferences. By building on existing solutions related to both visual interfaces and book recommendation strategies for children, StoryTime can generate suggestions without historical data or adult guidance. We discuss the benefits of StoryTime as a starting point for further research exploring the cold start problem, incorporating historical data, and needs related to children as a complex audience to enhance the recommendation process.

Towards interactive recommending in model-based collaborative filtering systems

Numerous attempts have been made to increase the interactivity of recommender systems, but the features actually available in today's systems are in most cases limited to rating or re-rating single items. We present a demonstrator that showcases how model-based collaborative filtering recommenders may be enhanced with advanced interaction and preference elicitation mechanisms in a holistic manner. We underline that, by employing methods we have proposed in the past, it becomes possible to easily extend any matrix factorization recommender into a fully interactive, user-controlled system. By presenting and deploying our demonstrator, we aim at gathering further insights, both into how the different mechanisms may be intertwined even more closely, and into how interaction behavior and the resulting user experience are influenced when users can choose from these mechanisms at their own discretion.

WORKSHOP SESSION: Workshops, challenge, and late-breaking results

Workshop on context-aware recommender systems

Contextual information has been widely recognized as an important modeling dimension in both the social sciences and computing. In particular, the role of context has been recognized in enhancing recommendation results and retrieval performance. While a substantial amount of existing research has focused on context-aware recommender systems (CARS), many interesting problems remain under-explored. The CARS 2019 workshop provides a venue for presenting and discussing approaches for the next generation of CARS, and application domains that may require a variety of contextual dimensions and must cope with their dynamic properties.

Third workshop on recommendation in complex scenarios (ComplexRec 2019)

Over the past decade, recommendation algorithms for ratings prediction and item ranking have steadily matured. However, these state-of-the-art algorithms are typically applied in relatively straightforward and static scenarios: given information about a user's past item preferences in isolation, can we predict whether they will like a new item or rank all unseen items based on predicted interest? In reality, recommendation is often a more complex problem: the evaluation of a list of recommended items never takes place in a vacuum, and it is often a single step in the user's more complex background task or need. The goal of the ComplexRec 2019 workshop is to offer an interactive venue for discussing approaches to recommendation in complex scenarios that have no simple one-size-fits-all solution.

Workshop on recommender systems in fashion (fashionXrecsys2019)

Online fashion retailers have significantly increased in popularity over the last decade, making it possible for customers to explore hundreds of thousands of products without the need to visit multiple stores or stand in long queues for checkout. Recommender systems are often used to solve different complex problems in this scenario, such as social fashion-aware recommendations (outfits inspired by influencers), product recommendations, or size and fit recommendations. However, relatively little research has been done on these complex problems. The first fashionXrecsys workshop aims to address these issues by providing an avenue for discussing novel approaches to recommendation in fashion and e-commerce applications.

Fourth international workshop on health recommender systems (HealthRecSys 2019)

HealthRecSys 2019 was the 4th International Workshop on Health Recommender Systems, held in conjunction with the 2019 ACM Conference on Recommender Systems in Copenhagen, Denmark. This workshop followed on from the previous workshop in 2018 [4] and focused on the applications and potential of recommender systems in health promotion, health care, and health-related topics. By bringing the discussion and representation of health domains into recommender systems, this workshop facilitated cross-domain collaboration and the exchange of knowledge and infrastructure.

First workshop on the impact of recommender systems at ACM RecSys 2019

Research in the area of recommender systems is largely focused on the value such systems create for users by helping them find items they are interested in. This is usually done by learning to rank the recommendable items based on their assumed relevance for each user. The implicit underlying assumption is often that this personalization affects users in different positive ways, e.g., by making their search and decision processes easier or by helping them discover new things [3].

The 7th international workshop on news recommendation and analytics (INRA 2019)

Publishing news is a vital function for societal health. News recommender systems, which support readers in finding relevant content, face challenges beyond those encountered by other types of recommender systems. They have to deal with a dynamic flow of unstructured, fragmentary, and potentially unreliable news stories. The International Workshop on News Recommendation and Analytics (INRA) focuses on the challenges of news recommender systems and aims to connect researchers, practitioners, and journalists. The seventh edition of INRA takes place as a half-day workshop in conjunction with the thirteenth ACM Conference on Recommender Systems (RecSys '19) on September 16--20, 2019 in Copenhagen, Denmark. INRA 2019 focuses on news recommender systems under three main categories: news recommendation, news analytics, and ethical aspects of news recommendation.

RecSys '19 joint workshop on interfaces and human decision making for recommender systems

As interactive intelligent systems, recommender systems are developed to give recommendations that match users' preferences. Since the emergence of recommender systems, the large majority of research has focused on objective accuracy criteria, and less attention has been paid to how users interact with the system and to the efficacy of interface designs from the user's perspective. The field has reached a point where it is ready to look beyond algorithms, into users' interactions, decision-making processes, and overall experience. This workshop will focus on the "human side" of recommender systems research. The workshop's goal is to improve users' overall experience with recommender systems by integrating different theories of human decision making into the construction of recommender systems and by exploring better interfaces for recommender systems.

ORSUM 2019 2nd workshop on online recommender systems and user modeling

The ever-growing nature of user-generated data in online systems poses obvious challenges for how we process such data. Typically, this issue is regarded as a scalability problem and has mainly been addressed with distributed algorithms able to train on massive amounts of data in short time windows. However, data inevitably adds up at high speed, and eventually one needs to discard or archive some of it. Moreover, the dynamic nature of data in user modeling and recommender systems, such as changing user preferences and the continuous introduction of new users and items, makes it increasingly difficult to maintain up-to-date, accurate recommendation models. The objective of this workshop is to bring together researchers and practitioners interested in incremental and adaptive approaches to stream-based user modeling, recommendation, and personalization, including algorithms, evaluation issues, incremental content and context mining, privacy and transparency, temporal recommendation, and software frameworks for continuous learning.

RecTour 2019: workshop on recommenders in tourism

The Workshop on Recommenders in Tourism (RecTour) 2019, which is held in conjunction with the 13th ACM Conference on Recommender Systems (RecSys), addresses specific challenges for recommender systems in the tourism domain. In this overview paper, we summarize our motivations to organize the RecTour workshop and present the main topics of the submissions that we received. The topics of this year's workshop include context-aware recommendations, group recommender systems, hotel recommendations, destination characterization, next-POI recommendation, user interaction and experience, preference elicitation, user modeling and application of machine learning algorithms in the context of tourism recommender systems.

Recommendation in multistakeholder environments

In research practice, recommender systems are typically evaluated on their ability to provide items that satisfy the needs and interests of the end user. However, in many recommendation domains, the user for whom recommendations are generated is not the only stakeholder in the recommendation outcome. For example, fairness and balance across stakeholders is important in some recommendation applications; achieving a goal such as promoting new sellers in a marketplace might be important in others. Such multistakeholder environments present unique challenges for recommender system design and evaluation, and these challenges were the focus of this workshop.

REVEAL 2019: closing the loop with the real world: reinforcement and robust estimators for recommendation

The REVEAL workshop focuses on framing the recommendation problem as one of making personalized interventions. Moreover, these interventions sometimes depend on each other: a stream of interactions occurs between the user and the system, and each decision to recommend something has an impact on future steps and long-term rewards. This framing creates a number of challenges we will discuss at the workshop. How can recommender systems be evaluated offline in such a context? How can we learn recommendation policies that are aware of these delayed consequences and outcomes?

RecSys challenge 2019: session-based hotel recommendations

The workshop features presentations of accepted contributions to the RecSys Challenge 2019, organized by trivago, TU Wien, Politecnico di Bari, and Karlsruhe Institute of Technology. In the challenge, which originates from the domain of online travel recommender systems, participants had to build a click-prediction model based on user session interactions. Predictions were submitted in the form of a list of suggested accommodations and evaluated on an offline data set that contained information about which accommodation was clicked in the later part of a session. The data set contains anonymized information about almost 16 million session interactions from over 700,000 users visiting the trivago website.

The challenge was well received, with 1509 teams signing up and 607 teams submitting a valid solution. In total, 3452 solutions were submitted during the course of the challenge.

ACM RecSys'19 late-breaking results (posters)

As part of the main program of the 2019 ACM Recommender Systems Conference, the Late-Breaking Results track offers a unique opportunity to share with the community the latest ideas related to recommender systems. This year, we received 42 submissions for the track, out of which 13 were accepted, resulting in an acceptance rate of 31%.

TUTORIAL SESSION: Tutorials

Bandit algorithms in recommender systems

The multi-armed bandit problem models an agent that simultaneously attempts to acquire new knowledge (exploration) and optimize its decisions based on existing knowledge (exploitation). The agent attempts to balance these competing tasks in order to maximize its total value over the period of time considered. There are many practical applications of the bandit model, such as clinical trials, adaptive routing, or portfolio design. Over the last decade there has been increased interest in developing bandit algorithms for specific problems in recommender systems, such as news and ad recommendation, the cold start problem in recommendation, personalization, collaborative filtering with bandits, and combining social networks with bandits to improve product recommendation. The aim of this tutorial is to provide an overview of the various applications of bandit algorithms in recommendation.
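As a concrete instance of the exploration-exploitation balance described above, here is a minimal epsilon-greedy bandit sketch; it is illustrative material, not code from the tutorial.

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy: explore a random arm with probability eps,
    otherwise exploit the arm with the best empirical mean reward."""
    def __init__(self, n_arms, eps=0.1):
        self.eps = eps
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.means))   # explore
        return max(range(len(self.means)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the arm's empirical mean reward.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```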

Fairness and discrimination in recommendation and retrieval

Fairness and related concerns have become of increasing importance in a variety of AI and machine learning contexts. They are also highly relevant to recommender systems and related problems such as information retrieval, as evidenced by the growing literature in RecSys, FAT*, SIGIR, and special sessions such as the FATREC and FACTS-IR workshops and the Fairness track at TREC 2019; however, translating algorithmic fairness constructs from classification, scoring, and even many ranking settings into recommendation and other information access scenarios is not a straightforward task. This tutorial will help orient RecSys researchers to algorithmic fairness, understand how concepts do and do not translate from other settings, and provide an introduction to the growing literature on this topic.

Multi-stakeholder recommendations: case studies, methods and challenges

Recommender systems produce lists of recommended items tailored to user preferences, traditionally treating the end user as the only stakeholder in the system. However, there can be multiple stakeholders in many applications or domains, e.g., e-commerce, advertising, education, dating, job seeking, and so forth. Recommendations then need to be produced by balancing the needs of the different stakeholders. This tutorial introduces multi-stakeholder recommender systems (MSRS), presents multiple case studies, and discusses the corresponding methods and challenges in developing MSRS. In particular, a demo based on the MOEA framework will be given in the talk using a speed-dating dataset.

Recommendations in a marketplace

In recent years, two-sided marketplaces have emerged as viable business models in many real-world applications (e.g. Uber, Airbnb), wherein the platforms have customers not only on the demand side (e.g. users), but also on the supply side (e.g. drivers, hosts). Such multi-sided marketplaces involve interactions between multiple stakeholders, different individuals with assorted needs. While traditional recommender systems focused specifically on increasing consumer satisfaction by providing relevant content to consumers, two-sided marketplaces face the interesting problem of also optimizing their models for supplier preferences and visibility. In this tutorial, we consider a number of research problems that need to be addressed when developing a recommendation framework powering a multi-stakeholder marketplace, provide the audience with a thorough introduction to this emerging area, and present directions for further research. Tutorial material available at: https://rishabhmehrotra.github.io/recs-in-marketplace/

SMORe: modularize graph embedding for recommendation

In the Age of Big Data, graph embedding has received increasing attention for its ability to accommodate the explosion in data volume and diversity, which challenges the foundations of modern recommender systems. Graphs facilitate fusing complex systems of interactions into a unified structure, while distributed embeddings enable efficient retrieval of entities, as in the case of approximate nearest neighbor (ANN) search. When combined, graph embedding captures relational information beyond entity interactions, reaching towards a problem's underlying structure, as epitomized by struct2vec [20] and PinSage [26]. This session starts by brushing up on the basics of graphs and embedding methods and discussing their merits. We then quickly dive into the mathematical formulation of graph embedding to derive a modular framework: Sampler-Mapper-Optimizer for Recommendation, or SMORe. We demonstrate that existing models used for recommendation, such as MF and BPR, can all be assembled from three basic components: sampler, mapper, and optimizer. The tutorial is accompanied by a hands-on session, where we show how graph embedding can model complex systems through multi-task learning and cross-platform data-sparsity alleviation tasks.
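To make the decomposition concrete, here is a hedged sketch of BPR assembled from the three components the tutorial names; the component interfaces are our guesses for illustration, not SMORe's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 100, 500, 16
U = rng.normal(0, 0.1, (n_users, dim))   # user embeddings
V = rng.normal(0, 0.1, (n_items, dim))   # item embeddings

interactions = [(0, 3), (0, 7), (1, 42)]  # toy (user, item) graph edges

def sampler():
    """Sample a positive edge plus a random negative item."""
    u, i = interactions[rng.integers(len(interactions))]
    j = int(rng.integers(n_items))  # negative item (unchecked, for brevity)
    return u, i, j

def mapper(u, i, j):
    """Map sampled vertices to embeddings and a BPR preference score."""
    return U[u] @ V[i] - U[u] @ V[j]

def optimizer(u, i, j, lr=0.05):
    """One SGD step on the BPR loss -log(sigmoid(x_uij))."""
    x = mapper(u, i, j)
    g = 1.0 / (1.0 + np.exp(x))        # gradient scale: sigmoid(-x)
    uu = U[u].copy()
    U[u] += lr * g * (V[i] - V[j])
    V[i] += lr * g * uu
    V[j] -= lr * g * uu

for _ in range(1000):
    optimizer(*sampler())
```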

Concept to code: deep learning for multitask recommendation

Deep learning has shown significant results in computer vision, natural language processing, speech, and recommender systems. Promising techniques include embeddings, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and their variants Long Short-Term Memory networks (LSTM) and bi-directional LSTMs, attention, autoencoders, Generative Adversarial Networks (GAN), and Bidirectional Encoder Representations from Transformers (BERT).

Multi-task learning (MTL) has led to successes in many applications of machine learning. We propose a tutorial on applying MTL to recommendation, improving recommendations, and providing explanations. We cover a few recent and diverse techniques that will be used in the hands-on session.

We believe that a self-contained tutorial giving a good conceptual understanding of MTL techniques, with sufficient mathematical background along with actual code, will be of immense help to RecSys participants.

SESSION: Doctoral symposium

Music cold-start and long-tail recommendation: bias in deep representations

Recent advances in deep learning have yielded new approaches for music recommendation in the long tail. These approaches are based on data related to the music content (i.e. the audio signal) and context (i.e. other textual information), from which they automatically obtain a representation in a latent space that is used to generate the recommendations. The authors of these new approaches have shown improved accuracy, making them the new state of the art for music recommendation in the long tail.

One of the drawbacks of these methods is that it is not possible to understand how the recommendations are generated and what the different dimensions of the underlying models represent. The goal of this thesis is to evaluate these models to understand how good the results are from the user perspective and how successful the models are at recommending new artists or less-popular music genres and styles (i.e. the long tail). For example, if a model predicts the latent representation from the audio but a given genre is not well represented in the collection, songs of this genre are unlikely to be recommended.

First, we will focus on defining a measure that can be used to assess how successful a model is at recommending new artists or less-popular genres. Then, the state-of-the-art methods will be evaluated offline to understand how they perform under different circumstances, and new methods will be proposed. Later, an online evaluation will make it possible to understand how these recommendations are perceived by users.

Algorithms are increasingly responsible for the music that we consume, and understanding their behavior is fundamental to making sure they give new artists and music styles an opportunity. This work will contribute in this direction, making it possible to give users better recommendations.

User's activity driven short-term context inference

The customer decision-making process is not invariant: actual circumstances have a great influence on users' preference adjustments, so failing to incorporate contextual information leads to sub-optimal prediction performance. A popular approach in recommender systems is to treat context as a set of identifiable and observable attributes, assuming their full separability from the activity. In contrast, we believe that context emerges from the activity, and that its change can be perceived, and possibly predicted, by mining patterns of its evolution at multiple levels, starting with individual sessions. This paper presents the concepts, ideas, and motivation for our PhD research project.

Revisiting offline evaluation for implicit-feedback recommender systems

Recommender systems are typically evaluated in an offline setting. A subset of the available user-item interactions is sampled to serve as test set, and a model trained on the remaining data points is then evaluated on its ability to predict which interactions were left out. Alternatively, in an online evaluation setting, multiple versions of the system are deployed and various metrics for those systems are recorded. Systems that score better on these metrics are then typically preferred. Online evaluation is effective but inefficient for a number of reasons. Offline evaluation is much more efficient, but current methodologies often fail to accurately predict online performance. In this work, we identify three ways to improve and extend current work on offline evaluation methodologies. More specifically, we believe there is much room for improvement in temporal evaluation, off-policy evaluation, and moving beyond using just clicks to evaluate performance.
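As an example of the temporal-evaluation direction, a split that respects interaction timestamps, rather than sampling test interactions uniformly at random, might look like this sketch (the column names are invented).

```python
import pandas as pd

def temporal_split(interactions: pd.DataFrame, test_fraction=0.2):
    """Split user-item interactions by time: the most recent
    `test_fraction` of events form the test set, so the model is never
    evaluated on interactions that precede its training data."""
    ordered = interactions.sort_values("timestamp")
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered.iloc[:cut], ordered.iloc[cut:]

df = pd.DataFrame({
    "user": [1, 2, 1, 3],
    "item": [10, 11, 12, 10],
    "timestamp": [100, 200, 300, 400],
})
train, test = temporal_split(df)
```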

Exploiting contextual information for recommender systems oriented to tourism

The use of contextual information such as geographic, temporal (including sequential), and item features in recommender systems has favored their development in several different domains, such as music, news, or tourism, together with new ways of evaluating the generated suggestions. This paper presents the underlying research in a PhD thesis, introducing some of the fundamental considerations of current tourism-based models, with emphasis on the Point-Of-Interest (POI) problem, while proposing solutions that use some of these additional contexts to analyze how the recommendations are made and how to enrich them. At the same time, we also intend to redefine some of the traditional evaluation metrics using contextual information to take into consideration complementary aspects beyond item relevance. Our preliminary results show that there is a noticeable popularity bias in the POI recommendation domain that has not been studied in detail so far; moreover, the use of contextual information (such as temporal or geographical) helps us both to improve the performance of recommenders and to gain better insights into the quality of the provided suggestions.

Recommender systems for contextually-aware, versioned items

While existing recommender systems assume that items are fixed entities, this research considers situations where there can be different versions of an item. We propose a process that is a type of contextually-aware post-filtering for recommending items, and illustrate the system with real data from a newspaper. The novel framework decides whether or not to recommend particular news articles based on news trends, incorporates user states as additional contextual information, and recommends versioned items based on user preferences.

Recommender system for developing new preferences and goals

The research topic is to investigate how recommender systems can help people develop new preferences and goals. Recommender systems nowadays typically use historical user data to predict users' current preferences. However, users might want to develop new preferences. Traditional recommendation approaches would fail in this situation, as they typically provide users with recommendations that match their current preferences. In addition, users are not always aware of preference development due to the issue of filter bubbles. In this case, recommender systems could also help them step away from their bubbles by suggesting new preferences to develop. The research will take a multidisciplinary approach in which insights from psychology on decision making and habit formation are paired with new approaches to recommendation that include preference evolution, interactive exploration methods, and goal-directed approaches. Moreover, when evaluating the success of such algorithms, (longitudinal) experiments combining objective behavioral data and subjective user experience will be required to fine-tune and optimize recommendation approaches.