Human-Robot Interaction (HRI) is a field of study dedicated to understanding, designing, and evaluating robotic systems for use by, or with, humans. In HRI there is a consensus that robotic systems should be designed and implemented so that they can adapt their behaviour on the basis of the user's actions and behaviour. The robot should adapt to emotions and personalities, and it should also retain a memory of past interactions with the user in order to be believable. This is of particular importance in the field of social robotics and social HRI. The aim of this workshop is to bring together researchers and practitioners who are working on various aspects of social robotics and adaptive interaction.
Smart systems that apply complex reasoning to make decisions and plan behavior, such as decision support systems and personalized recommendations, are difficult for users to understand. Algorithms allow the exploitation of rich and varied data sources in order to support human decision-making and/or take direct actions; however, there are increasing concerns surrounding their transparency and accountability, as these processes are typically opaque to the user. Transparency and accountability have attracted increasing interest as means to provide more effective system training, better reliability, and improved usability. This workshop will provide a venue for exploring issues that arise in designing, developing, and evaluating intelligent user interfaces that provide system transparency or explanations of their behavior. In addition, our goal is to focus on approaches to mitigating algorithmic biases that can be applied by researchers even without access to a given system's inner workings, such as awareness, data provenance, and validation.
Conversational agents are becoming increasingly popular. These systems present an extremely rich and challenging research space for addressing many aspects of user awareness and adaptation, such as user profiles, contexts, personalities, emotions, social dynamics, and conversational styles. Adaptive interfaces are of long-standing interest to the HCI community. Meanwhile, new machine learning approaches, such as deep learning, reinforcement learning, and active learning, are being introduced in the current generation of conversational agents. It is imperative to consider how the various aspects of user awareness should be handled by these new techniques. The goal of this workshop is to bring together researchers in HCI, user modeling, and the AI and NLP communities, from both industry and academia, who are interested in advancing the state of the art on user-aware conversational agents. Through a focused and open exchange of ideas and discussions, we will work to identify central research topics in user-aware conversational agents and develop a strong interdisciplinary foundation to address them.
This tutorial introduces Bayesian computational approaches to interaction and design. Bayesian methods offer a powerful approach for interactive settings with uncertainty and noise. The tutorial covers the theory and practice of computational Bayesian interaction, from inference over user data to the design and adaptation of interface features based on probabilistic inference. It is built around hands-on Python programming with modern computational tools, interleaved with theory and practical examples grounded in problems of wide interest in human-computer interaction.
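To give a flavour of the kind of probabilistic inference the tutorial covers, consider a minimal sketch (not taken from the tutorial materials; the target positions, noise model, and variable names are illustrative assumptions): inferring which on-screen target a user is aiming at from noisy pointer samples via a discrete Bayesian update.

```python
import math

# Hypothetical targets and their x-positions (pixels); illustrative values only.
targets = {"button_a": 100.0, "button_b": 200.0, "button_c": 320.0}
prior = {name: 1.0 / len(targets) for name in targets}
noise_sd = 30.0  # assumed motor/sensor noise (standard deviation, pixels)

def gaussian(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def update(posterior, observation):
    """One Bayes step: multiply by the likelihood of the observation, renormalize."""
    unnorm = {t: p * gaussian(observation, targets[t], noise_sd)
              for t, p in posterior.items()}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

posterior = prior
for obs in [210.0, 195.0, 205.0]:  # noisy pointer samples near button_b
    posterior = update(posterior, obs)

best = max(posterior, key=posterior.get)
```

After three samples near x = 200, the posterior concentrates on `button_b`; an adaptive interface could, for example, enlarge that target once the posterior exceeds a threshold.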
Recent advances in generative modeling will enable new kinds of user experiences around content creation, giving us "creative superpowers" and moving us toward co-creation. This workshop brings together researchers and practitioners from both HCI and AI to explore and better understand the opportunities and challenges of generative modeling, from a human-AI interaction perspective, for the creation of both physical and digital artifacts.
The use of speech as an interaction modality has grown considerably through the integration of Intelligent Personal Assistants (IPAs, e.g., Siri, Google Assistant) into smartphones and voice-based devices (e.g., Amazon Echo). Such engineering advances in speech processing present a unique opportunity for enabling users to interact with interfaces in a truly conversational way. However, we have yet to see current voice-enabled interfaces fully become Conversational User Interfaces (CUIs) as afforded by the underlying speech and natural language capabilities. For example, from a conversational / dialogue perspective, there remain significant gaps in using theoretical frameworks to understand user behaviours and choices and how they may be applied to specific speech interface interactions. On a design and Human-Computer Interaction level, we do not yet have proper tools, such as validated design guidelines, to help us improve the usability of such interfaces. On the speech processing side, variability in speech, language, and conversation still poses problems, and error-recovery strategies often lead to a degraded user experience. From a critical perspective, issues of ethics and privacy remain to be addressed.
This workshop aims at exploring how adaptive user interfaces, i.e., user interfaces that can modify, change, or adapt themselves based on the user or their context of use, can benefit from Artificial Intelligence (AI) in general, and Machine Learning (ML) techniques in particular, towards objectively improving software quality properties such as usability, aesthetics, reliability, or security. For this purpose, participants will present a case study and classify their proposed technique in terms of several criteria, such as (but not limited to): input, technique, output, adaptation steps covered, adaptation time, level of automation, software quality properties addressed, measurement method, potential benefits, and drawbacks. These will then be clustered for group discussions according to the aforementioned criteria, such as by technique family or property addressed. From these discussions, an AI4AUI framework will emerge that will be used for positioning and comparing the presented techniques, and for identifying future research avenues.
The fourth HUMANIZE workshop on Transparency and Explainability in Adaptive Systems through User Modeling Grounded in Psychological Theory took place in conjunction with the 25th annual meeting of the Intelligent User Interfaces (IUI) community in Cagliari, Italy on March 17, 2020. The workshop provided a venue for researchers from different fields to interact by accepting contributions at the intersection of practical data mining methods and theoretical knowledge for personalization. A total of four papers were accepted for this edition of the workshop.
Recommender systems (RSs) research has so far focused mainly on improving the quality and precision of recommendations. However, it is also important to discover the effect of RSs on users' behaviour. So far, very few studies have analysed this effect, so little is known about this essential topic. Hence, in this PhD research we focus on RSs' effect on users' behaviour. In order to investigate this effect, we propose a procedure for simulating users' choices under the influence of an RS, and we then compute metrics on those choices that capture the RS's effect. Moreover, we propose to conduct online experiments studying the effect of RSs by designing web-based platforms that track the choices of users in systems that also offer recommendations.
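As an entirely illustrative toy version of this idea (not the thesis's actual procedure; the choice model, recommender boost, and diversity metric are all assumptions), one can simulate users whose choices mix intrinsic preference with recommender-driven exposure, and compare how concentrated the resulting choices become:

```python
import random

random.seed(0)

N_USERS, N_ITEMS, N_ROUNDS = 100, 50, 20

def simulate(rs_strength):
    """Return the number of distinct items chosen in the final round."""
    prefs = [[random.random() for _ in range(N_ITEMS)] for _ in range(N_USERS)]
    counts = [1] * N_ITEMS  # smoothed popularity counts driving the RS boost
    for _ in range(N_ROUNDS):
        round_choices = set()
        total = sum(counts)
        for u in range(N_USERS):
            # Each choice mixes intrinsic preference with RS-driven exposure.
            scores = [prefs[u][i] + rs_strength * counts[i] / total
                      for i in range(N_ITEMS)]
            pick = max(range(N_ITEMS), key=scores.__getitem__)
            counts[pick] += 1
            round_choices.add(pick)
    return len(round_choices)

diversity_without_rs = simulate(rs_strength=0.0)
diversity_with_rs = simulate(rs_strength=5.0)
```

With a strong popularity boost, the rich-get-richer feedback loop concentrates choices on a few items, so the final-round diversity drops relative to the no-recommender baseline, which is exactly the kind of behavioural effect the proposed metrics are meant to capture.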
Robots are becoming more and more present in our daily activities. In order to improve user interaction with them, it is important to design robot behaviors that show social attitude and the ability to adapt to users. For this purpose, robots should adapt their behavior by recognizing the user's emotions, including for users with cognitive and physical disabilities. However, most contemporary approaches rarely attempt to use recognized emotional features in an active manner to modulate robot decision-making and dialogue for the benefit of the user. In this project, I aim to design and implement a module that creates adaptive behavior in a humanoid social robot for older adults who may have cognitive impairments.
The goal of this research is to develop an A/B testing method to automatically compare the user experience (UX) of alternative designs for a web application in a real context with a large number of users. The challenge that it poses is to find mechanisms to predict the UX with machine learning techniques. This submission outlines the motivation, research goal, current status and remaining work.
This paper presents the early stages of my PhD research, which aims at advancing the field of eXplainable AI (XAI) by investigating twin-systems, where an uninterpretable black-box model is twinned with a white-box one, usually less accurate but more inspectable, to provide explanations for the classification results. We focus in particular on the twinning of an Artificial Neural Network (ANN) with a Case-Based Reasoning (CBR) system, so-called ANN-CBR twins, to explain the predictions in a post-hoc manner, taking into account (i) a feature-weighting method for mirroring the ANN results in the CBR system, (ii) a set of evaluation metrics that correlate the ANN with other white/grey-box models supporting explanations for users, and (iii) a taxonomy of methods for generating explanations of the neural network's predictions from the twinning.
Multimodal interfaces can leverage information from multiple modalities to provide robust and error-free interaction. Early multimodal interfaces demonstrated the feasibility of building such systems but focused on specific applications. The challenge in building adaptive systems is the lack of techniques for input data fusion. In this direction, we have developed a multimodal head- and eye-gaze interface and evaluated it in two scenarios. In the aviation scenario, our interface significantly reduced task time and perceived cognitive load compared with the existing interface. We have also studied the effect of various output conditions on users' performance in a Virtual Reality (VR) task. Further, we are extending our proposed interface to include additional modalities and building novel haptic and multimodal output systems for VR.
In some scenarios, like music or tourism, people often consume items in groups. However, reaching a consensus is difficult, as different members of the group may have highly diverging tastes. To keep the rest of the group satisfied, an individual might occasionally need to be confronted with items they do not like. In this context, presenting an explanation of how the system came up with the recommended item(s) may make it easier for users to accept items they might not like for the benefit of the group. This paper presents our progress on improved algorithms for recommending items (for both music and tourism) for a group to consume and on an approach for generating natural language explanations. Our future directions include extending the current work by modeling different factors that we need to consider when generating explanations for groups, e.g., the size of the group and the group members' personalities, demographics, and relationships.
The increasingly prevalent use of chatbots provides an innovative way to conduct conversational interviews. However, the lack of comprehensive design guidance and robust evaluation methodologies challenges designers in developing interview chatbots with effective elicitation and sound ethics. This proposal presents our progress in investigating interview chatbots' performance in eliciting high-quality responses without breaking ethical rules, based on which an automatic evaluation framework will be developed. We also present our plan to propose design implications and prototypes for a more robust interview chatbot development cycle.
Human-Machine Interfaces (HMIs) enable communication between humans and machines. In the automotive domain, all in-vehicle systems used to be independent. Today they are more and more interconnected and interdependent. However, they still do not act in unison to help drivers achieve their individual goals. More specifically, even though some current HMIs provide a certain degree of personalization, they do not adapt dynamically to the situation and do not learn driver-specific nuances in order to improve the driver's user experience.
Alzheimer's Disease (AD) is a neurocognitive disease that causes impairments of cognition as well as of Activities of Daily Living (ADLs). This research investigates the possibility of diagnosing the early stages of AD from impairments of ADLs. It uses behavioral analytics to explore areas that are difficult to assess and remain undiagnosed in clinical settings. The main focus of the research is to identify unrecognized and abnormal behavioral patterns associated with ADLs, using multimodal signals such as hand movements, facial responses, eye gaze, and spontaneous speech patterns that are visible at the prodromal stages of AD, and to evaluate their association with the corresponding mental status.
Recent trends in computer-mediated communication (CMC) have seen messaging with richer media, not only images and videos but also visual communication markers (VCMs) such as emoticons, emojis, and stickers. VCMs can prevent a potential loss of subtle emotional content in CMC that is otherwise delivered by nonverbal cues conveying affective and emotional information. However, as the number of VCMs in the selection set grows, the problem of VCM entry needs to be addressed. Furthermore, conventional means of accessing VCMs continue to rely on input entry methods that are not directly and intimately tied to expressive nonverbal cues. In this work, we aim to address this issue by facilitating an alternative form of VCM entry: hand gestures. To that end, we propose a user-defined hand gesture set that is highly representative of a number of VCMs, and a two-stage hand gesture recognition system (trajectory-based, then shape-based) that can identify these user-defined hand gestures with an accuracy of 82%. By developing such a system, we aim to allow people using low-bandwidth forms of CMC to still enjoy their convenient and discreet properties, while also experiencing more of the intimacy and expressiveness of higher-bandwidth online communication.
We propose a method of visualizing user activities based on the user's head and eye movements. Since we use an unobtrusive eyewear sensor, the measurement scene is unconstrained. In addition, thanks to an unsupervised end-to-end deep learning algorithm, users can discover unanticipated activities through exploratory analysis of a low-dimensional representation of the sensor data. We also propose a novel regularization that makes the representation person-invariant.
Recognizing changes in users' experienced mental effort is a perennial interest in human-computer interaction research, particularly in the design of intelligent user interfaces built to adapt to different levels of mental effort. In virtual reality (VR) applications, for example, many measures of mental workload (e.g., secondary tasks) are highly intrusive and can distort what is being measured. In this paper, we investigate the entropy of controller movements as an indicator of mental effort that can be measured unobtrusively. We report a proof-of-concept study that manipulates experienced mental effort using the popular e-crossing task. As expected, the results show that entropy is higher for people under higher mental effort than for people under lower mental effort, and that entropy is positively related to scores on the NASA-TLX, the benchmark questionnaire for mental effort. Thus, intelligent user interfaces can detect mental effort in VR on the basis of controller entropy and could recognize when users need assistance in their decision making.
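One plausible way to operationalize movement entropy (an illustrative sketch, not the paper's implementation; the binning scheme and trajectories below are assumptions) is to discretize per-frame movement deltas of the controller into bins and compute the Shannon entropy of the resulting bin distribution:

```python
import math
from collections import Counter

def movement_entropy(positions, bin_size=0.05):
    """Shannon entropy (bits) of discretized per-frame movement deltas."""
    deltas = [(round((x2 - x1) / bin_size), round((y2 - y1) / bin_size))
              for (x1, y1), (x2, y2) in zip(positions, positions[1:])]
    counts = Counter(deltas)
    n = len(deltas)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A steady, repetitive motion yields low entropy...
steady = [(0.1 * i, 0.0) for i in range(50)]
# ...while an irregular (here deterministic but erratic) path yields more.
erratic = [(math.sin(i * 1.7) * 0.3, math.cos(i * 2.3) * 0.3) for i in range(50)]

low = movement_entropy(steady)
high = movement_entropy(erratic)
```

The steady trajectory always falls into the same delta bin and so scores zero entropy, while the erratic one spreads across many bins; the paper's finding is that higher mental effort shifts controller movement toward the latter regime.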
In today's team sports, effective and user-friendly support for analysts and coaches in analyzing their team's tactics is essential. In this paper, we present an extended version of SPORTSENSE, a tool for searching in sports videos by means of sketches, for creating and visualizing statistics for individual players and the entire team, and for visualizing the players' off-ball movement. SPORTSENSE has been developed in close collaboration with football coaches.
The key idea behind this paper is to generate fixation heatmaps of unknown documents to visualize and determine the focus areas in a document, as a first step towards measuring the document's readability. The data samples were collected in an experiment with nine participants reading 15 documents, and the proposed method predicts the fixation duration of each word in the documents. A Random Forest regression model was used to predict the fixation duration per word, and we achieved a mean regression score (R2) of 0.757 across all the documents.
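As a reminder of what the reported regression score measures, the R2 (coefficient of determination) compares the model's squared error against that of simply predicting the mean. A minimal computation (the fixation durations and predictions below are made up for illustration; the paper's model is a Random Forest):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

true_ms = [180, 220, 260, 300, 210]   # hypothetical per-word fixation durations (ms)
pred_ms = [190, 210, 250, 310, 220]   # hypothetical model predictions
score = r2_score(true_ms, pred_ms)
```

A score of 1.0 means perfect prediction, 0.0 means no better than the mean; the paper's 0.757 thus indicates the model explains roughly three quarters of the variance in per-word fixation duration.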
We demonstrate an intelligent, personalized, multifaceted visualization of people recommendations using a personalized 2D entity graph and a word cloud for exploration by the user. This visualization aims to show non-trivial connections, e.g., those that the user may have forgotten about but that are interesting and relevant. Since the entities we are linked to are part of our lives (and profile), they help to convey who we are and what we are interested in. We adopt the previously introduced concept of a typed entity-relation graph (profile), and based on this representation we visualize the entity profile. In this demonstration, the users, who as a case study are the participants of IUI'20, will be able to explore their own personalized entity graphs, based on entities and relations that the system harvests about them from the web (after obtaining their approval), to find interesting connections they may encounter in the context of this conference.
Music production software often has complex interfaces and requires the user to have basic musical know-how. In this paper, we present a conversational agent that allows creating music in a simplified way through voice-based interaction. Indeed, our agent can be configured and customized with simple and natural voice commands. In addition, it has some typically human cognitive skills for producing music: it listens to the user singing a song and generates a melody by discovering and copying the patterns of the user's voice. Technologically, the system is powered by Google Dialogflow for conversation management and uses an advanced technique called abstract melody for music production. This musical and conversational artificial intelligence is a genuine innovation, since it does not require any preliminary knowledge about music and is consequently accessible not only to professionals but also to children, beginners, and people with physical disabilities.
This paper presents a dialogue system that acquires a user's food preferences through conversation. First, we propose a method for selecting relevant topics and generating questions based on Freebase, a large-scale knowledge graph. To select relevant topics, we created a topic-embedding model, trained on the Wikipedia corpus, that represents the correlations among topics. For entities missing from Freebase, knowledge completion was applied using knowledge graph embeddings. We incorporated these functions into a dialogue system and conducted a user study. The results reveal that the proposed dialogue system elicited words related to food and common nouns more efficiently, and that these words were highly correlated in a word embedding space.
The procedure for inputting an equation and defining a graph using existing tools remains unnatural and troublesome for novice students. To address this shortcoming, we propose a graph creation tool based on natural mathematical description. In this study, we improved the predictive conversion speed of the mathematical input interface used in the previous graph creation tool. The results of a performance comparison test showed that the mean task time when using the proposed tool was approximately 1.2--1.7 times faster than when using GeoGebra.
This is a demo of a virtual academic adviser that enables student self-guidance and counters the unavailability of human academic advisers. The adviser integrates four visual tools: flexible and personalized planning of future terms and courses, a body-of-knowledge acquisition mapping tool that makes it easier to find courses aligned with personal interests in terms of the acquired knowledge and abilities, and a study program switch exploration tool that eases student mobility.
This paper proposes a multimodal meeting browser with a CNN model that estimates important utterances based on the co-occurrence of verbal and nonverbal behaviors in multi-party conversations. The proposed browser was designed to visualize important utterances and to make it easier to observe the nonverbal behaviors of the conversation participants. A user study was conducted to examine whether the proposed browser supports users in correctly understanding the content of the discussion. Compared with a text-based browser and a simple video player, the proposed browser was more efficient than the video player and allowed users to obtain a more accurate understanding of the discussion than the text-based browser.
With the richness of interactions among users emerging through different social media applications, drawing conclusive evidence about the sign of these relations (positive or negative) is receiving significant attention. In this paper, we propose an adaptive link prediction system which tactfully ensembles both local and non-local attributes of an edge to predict its sign, while accounting for the high variance of the network and handling the inherent sparsity of the graph. Experimental validation on signed networks such as Slashdot and Epinions indicates that the proposed approach achieves significantly higher prediction accuracy than existing approaches.
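The general shape of such an ensemble can be sketched as follows (a hypothetical illustration only; the paper's actual features, non-local attributes, and weighting are not shown here): a local cue derived from common-neighbour triads is blended with a crude global cue about each node's overall signing tendency.

```python
def local_score(graph, u, v):
    """Balance-style local cue: average product of signs through common neighbours."""
    common = set(graph[u]) & set(graph[v])
    if not common:
        return 0.0
    return sum(graph[u][w] * graph[v][w] for w in common) / len(common)

def nonlocal_score(graph, u, v):
    """Crude global cue: each endpoint's overall tendency toward positive signs."""
    def tendency(n):
        signs = list(graph[n].values())
        return sum(signs) / len(signs)
    return (tendency(u) + tendency(v)) / 2.0

def predict_sign(graph, u, v, alpha=0.7):
    # Weighted ensemble of the two cues; alpha is an assumed mixing weight.
    score = alpha * local_score(graph, u, v) + (1 - alpha) * nonlocal_score(graph, u, v)
    return 1 if score >= 0 else -1

# Tiny signed graph: edges stored as graph[node][neighbour] = sign (+1 / -1).
graph = {
    "a": {"c": 1, "d": 1},
    "b": {"c": 1, "d": -1},
    "c": {"a": 1, "b": 1},
    "d": {"a": 1, "b": -1},
}
sign_ab = predict_sign(graph, "a", "b")
```

Here the triadic cue is neutral (one balanced and one unbalanced triad cancel out), so the global tendency cue breaks the tie, which is precisely the situation where ensembling helps on sparse graphs.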
Microsoft LUIS is a natural language understanding service used to train chatbots. An imbalance in the utterance training set may cause the LUIS model to predict the wrong intent for a user's query. We discuss this problem and Microsoft's training recommendations for improving prediction accuracy with LUIS. We perform batch testing on three training sets created from two existing datasets to explore the effectiveness of these recommendations.
Several tourism applications have been designed to support on-line information search and content browsing. However, they often neglect the user's current visit experience, i.e., what the user has already experienced off-line. In this demo we showcase a novel mobile app that enhances a traveller's visit experience by considering the visit context and the traveller's currently visited locations. The app has been designed as a tool for advancing the state of the art in decision support systems. It can be used outside the lab, hence taking into account the true complexity of user decision making, which lab experiments tend to over-simplify.
We present a case study of a factory operating in the agro-industrial sector. We show how consumption management issues can be solved through the pervasive installation of sensors on the production lines and the design of software that helps workers and managers access the data retrieved from the shop floor.
We demonstrate SIMURGH, an interactive framework for generating customized travel packages (TPs) for individuals or groups of travelers. This is beneficial in various use cases such as tourism planning and advertisement. SIMURGH relies on gathering travelers' preferences and solving an optimization problem to generate personalized travel packages. SIMURGH goes beyond personalization by allowing travelers to customize travel packages via simple-yet-powerful interaction operators.
Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods that are only employed after the crowdsourcing process and thus cannot deal with differing worker qualifications during data gathering. This is time-consuming and cost-ineffective, because some data points might have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only workers who have successfully completed tasks of rising complexity to continue working on more difficult subsets. In this way, we are able to train workers and at the same time exclude unqualified ones. In initial experiments, our method achieves higher agreement using four annotations by qualified crowd workers than with five annotations from random crowd workers on the same dataset.
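The gating logic behind such a setup can be sketched as follows (an illustrative assumption, not the paper's implementation; the tiers, gold labels, and 0.8 threshold are made up): a worker advances to the next complexity tier only if their agreement with gold labels on the current tier meets a threshold.

```python
def qualify(worker_answers, gold_by_tier, threshold=0.8):
    """Return the list of complexity tiers a worker is qualified to work on."""
    passed = []
    for tier, gold in enumerate(gold_by_tier):
        answers = worker_answers.get(tier, [])
        if not answers:
            break  # worker has not attempted this tier yet
        agreement = sum(a == g for a, g in zip(answers, gold)) / len(gold)
        if agreement < threshold:
            break  # failed this tier: excluded from harder subsets
        passed.append(tier)
    return passed

# Gold labels for two tiers of rising complexity (hypothetical binary labels).
gold_by_tier = [["A", "B", "A", "B"], ["B", "B", "A", "A"]]
good_worker = {0: ["A", "B", "A", "B"], 1: ["B", "B", "A", "A"]}  # passes both
weak_worker = {0: ["A", "A", "B", "B"]}                            # fails tier 0

tiers_good = qualify(good_worker, gold_by_tier)
tiers_weak = qualify(weak_worker, gold_by_tier)
```

Workers who fail an early tier never receive the more difficult (and more expensive) items, which is what makes the approach cheaper than post-hoc aggregation and filtering.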
Many large text collections exhibit graph structures, either inherent to the content itself or encoded in the metadata of the individual documents. Examples of graphs extracted from document collections are co-author networks, citation networks, and named-entity co-occurrence networks. Furthermore, social networks can be extracted from email corpora, tweets, or social media. When it comes to visualising such large corpora, traditionally either the textual content or the network graph is used. We propose to incorporate both text and graph, to visualise not only the semantic information encoded in the documents' content but also the relationships expressed by the inherent network structure, in a two-dimensional landscape. We illustrate the effectiveness of our approach with an exploration interface for different real-world datasets.
The main purpose of this work is to develop a visualization system that helps users navigate their text data so that they can easily identify its main topics. In addition, the system allows users to set options, such as text data processing pipelines, and provides appropriate interactions, giving users a more flexible and diverse experience of viewing their data.
Users are able to identify topic keywords and the distribution of clustered text data with a hexagonal view. They can zoom the view to examine the topics of the data distribution in detail. Also, after dragging areas or selecting individual hexagons, they can not only grasp how the topics of the selected data change over time, but also understand the relationships between the topic keywords. Furthermore, they can compare the main keywords of each cluster of selected data.
Automated Artificial Intelligence and Machine Learning (AutoAI / AutoML) can now automate every step of the end-to-end AI lifecycle, from data cleaning, to algorithm selection, to model deployment and monitoring in the machine learning workflow. AutoAI technologies, initially aimed at saving data scientists from low-level coding tasks, also have great potential to serve non-technical users, such as domain experts and business users, in building and deploying machine learning models. Researchers have coined this "democratizing AI": non-technical users are empowered by AutoAI technologies to create and adopt AI models. To realize this promise, AutoAI needs to translate and incorporate real-world business logic and requirements into the automation. In this demo, we present a first-of-its-kind experimental system, IBM AutoAI Playground, that enables non-technical users to define and customize their business goals (e.g., prediction time) as constraints. AutoAI then builds models that satisfy those constraints while optimizing for model performance (e.g., ROC AUC score). This demo also showcases AutoAIViz, a Conditional Parallel Coordinates visualization feature, and a TrustedAI feature from two accepted IUI'20 papers.
User interface design is an iterative process that progresses through low-, medium-, and high-fidelity prototypes. A few research projects have used deep learning to automate this process by transforming low-fidelity (lo-fi) sketches into front-end code. However, these projects lack a large-scale dataset of lo-fi sketches for training detection models. As a solution, we created Syn, a synthetic dataset containing 125,000 lo-fi sketches. These sketches were synthetically generated using our UISketch dataset, which contains 5,917 sketches of 19 UI elements drawn by 350 participants. To demonstrate the usage of Syn, we used it to train a UI element detector, Meta-Morph, which detects UI elements in a lo-fi sketch with 84.9% mAP and 72.7% AR. This work aims to support future research on UI element sketch detection and automating prototype fidelity transformation.
UI designers look for inspirational examples among existing UI designs during the prototyping process. However, they have to reconstruct these example UI designs from scratch to edit their content or apply styling. An existing solution attempts to turn UI screens into editable vector graphics using image segmentation techniques. In this research, we aim to use deep learning and Gestalt-laws-based algorithms to convert UI screens into editable blueprints by identifying the constituent UI element categories, their locations, dimensions, text content, and layout hierarchy. In this paper, we present a proof-of-concept web application that uses the UI screens and annotations from the RICO dataset and generates an editable blueprint vector graphic and a UI layout tree. With this research, we aim to support UX designers in reconstructing UI screens and communicating UI layout information to developers.
Computer-Aided Sign Language Learning (CASLL) is a recent and promising field of research made feasible by advances in computer vision and sign language recognition. The importance of feedback for language learning has been established by many research works. In this work, we introduce SignGuru, a chatbot-based AI tutor that can provide fine-grained feedback to learners of American Sign Language (ASL). SignGuru provides feedback directly relating to the fundamental concepts of ASL using modular and explainable AI. The chatbot is designed not only as an interactive way for learners to choose a curriculum and go through the learning process, but also to perform retention and execution tests. The usability and utility of SignGuru have been validated in a user study covering 14 ASL signs with 26 users. We demonstrate the fully functioning application with a variety of different curricula.
Explanations can be used to supply transparency in recommender systems (RSs). However, when presenting a shared explanation to a group, we need to balance users' need for privacy with their need for transparency. This is particularly challenging when group members have highly diverging tastes and individuals are confronted with items they do not like, for the benefit of the group. This paper investigates which information people would like to disclose in explanations for group recommendations in the music domain.
Supervised machine learning requires labelled data examples to train models, and those examples often come from humans who may not be experts in artificial intelligence (i.e., "AI"). Currently, many resources are devoted to these labelling tasks, a majority of which are outsourced by companies to reduce costs, and oversight of such tasks can be cumbersome. Concurrently, biases in machine learning models and human cognition are a growing concern in applications of AI.
In this paper, we present a machine teaching platform for non-AI experts that leverages interactive data exploration approaches to identify algorithmic and human (e.g., cognitive) biases. Our main objective is to understand how data exploration and explainability might impact the machine teacher (i.e., data labeller) and their understanding of AI, subsequently improving model performance, all while reducing potential bias concerns.
This paper presents our real-time human activity recognition system that understands human behavior using multimodal sensor data at multiple levels. Our system consists of a multimodal data acquisition framework and a user understanding algorithm including user identification, activity recognition, and health monitoring components.
Developers spend a great deal of time adapting UIs to different devices. By learning from the massive number of human-designed UI products, machines could take over this adaptation work. To this end, we introduce DUES-Adapt, an AI-based UI adaptation system, which we showcase in this demonstration. Given an input UI, DUES-Adapt parses the basic UI elements and employs the parsing results to generate a reasonable and aesthetic layout for a target device. The two AI problems, UI parsing and layout generation, are solved using deep neural network models trained with over 10K app instances collected from mainstream Android markets. In the demonstration, we show a number of cases covering apps such as music, maps, and fitness, and different target terminals such as tablets, smartwatches, and TVs.
Projection-based augmented reality (AR) is a promising medium for realizing pervasive computing environments, yet the problem of determining projection-suitable regions and where to project remains. To tackle this problem, we introduce FRISP, a projection-based AR Framework for Registering Interactive Spatial Projection. FRISP utilizes a pan-tilt projection-camera (pro-cam) system for capturing geometry and for projection mapping. The framework scans and analyzes the geometric properties of a room in order to determine projection-suitable regions and generate multi-window layouts. Once the multi-windows are registered to the real world, users can interact with them, assigning various widgets or applications to the multi-windows, which are then augmented onto the real world and can serve as base units for realizing the pervasive AR environment.
In this work, we present a prototype of an intelligent system that can automate the UI design process by converting text descriptions into interactive design prototypes. We conducted user research in an international oilfield services company and found that product owners prefer to validate their hypotheses via visual mockups rather than text descriptions; however, many of them need assistance from designers to produce the visual mockups. Based on this finding, and after exploring multiple possibilities using design thinking, we chose a solution that uses natural language processing (NLP) to automate the visual design process. To validate the solution, we conducted user tests and iterated on the design. In the future, we expect the work can be fully deployed in a working environment to help product owners initiate their projects faster.
The large number of social platforms developed enables users to express their opinions and access information more freely. However, their algorithmic strategies risk exacerbating filter bubbles and echo chambers, which may provoke divisive emotional responses among users. Herein, we present a new online visualization tool for opinion sharing, called CrowdForest, which allows users to visualize their opinions and interact with others based on semantic figurative metaphors driven by sentiment analysis.
This paper presents an optimal camera arrangement method for surveillance camera systems that evaluates the arrangement of cameras in 3D space. It makes use of multiple factors and optimizes this arrangement using a genetic algorithm (GA). We implemented a prototype in Unity that demonstrates our proposed method, and experiments show the performance of our method when used for placement in a virtual environment.
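To illustrate the optimization step, the following is a minimal sketch of a genetic algorithm for camera placement. The 2D coverage model, fitness function, and all parameters here are simplified stand-ins for illustration only, not the paper's multi-factor 3D formulation or its Unity prototype.

```python
import math
import random

random.seed(0)

# Toy setup: cover 2D points of interest with N cameras of fixed range.
POINTS = [(x, y) for x in range(10) for y in range(10)]
N_CAMERAS, RANGE_ = 3, 4.0

def fitness(genome):
    """Fraction of points within range of at least one camera."""
    covered = sum(
        1 for p in POINTS
        if any(math.dist(p, cam) <= RANGE_ for cam in genome)
    )
    return covered / len(POINTS)

def random_genome():
    return [(random.uniform(0, 9), random.uniform(0, 9)) for _ in range(N_CAMERAS)]

def mutate(genome, sigma=1.0):
    """Jitter each camera position with Gaussian noise."""
    return [(x + random.gauss(0, sigma), y + random.gauss(0, sigma))
            for x, y in genome]

def crossover(a, b):
    """Single-point crossover over the list of camera positions."""
    cut = random.randint(1, N_CAMERAS - 1)
    return a[:cut] + b[cut:]

def evolve(generations=40, pop_size=30):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 3]  # truncation selection with elitism
        pop = elite + [
            mutate(crossover(random.choice(elite), random.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=fitness)

best = evolve()
print(f"coverage: {fitness(best):.2f}")
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases between generations.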
We propose an artificial intelligence (AI)-based framework for generating 360-degree videos from videos recorded by monocular cameras. We also show immersive virtual reality content generation using AI through an analysis of user experience that compares manually designed and AI-generated 360-degree videos based on the proposed framework. The production of 360-degree videos conventionally requires special equipment, such as omni-directional cameras. Our framework is applicable to the massive amount of existing cameras and videos, hence increasing the availability of 360-degree videos. We implemented our framework in two steps. First, we generate a three-dimensional point cloud from the input video. Then, we apply AI-based methods to interpolate the sparse point cloud based on geometric and semantic information. Our framework is applicable to several uses, such as reviewing past traffic accident videos and education, for example showing a historical townscape in 360 degrees.
In video-based learning, estimating the level of concentration is important for increasing the efficiency of learning. Facial expressions during learning captured with a Web camera are often used to estimate concentration because cameras are easy to install. In this work, we focus on how learners react to video content and propose a new method based on the Jaccard coefficient calculated from learners' facial reactions to the teacher's actions. We conducted experiments and collected data in a Japanese cram school. Analysis of the collected data shows a weighted F1 score of 0.57 for four-level concentration classification, which is higher than the accuracy obtained with methods based on learners' facial expressions alone. The results indicate that our method can be effective for concentration estimation in an actual learning environment.
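The Jaccard coefficient at the heart of this kind of method can be sketched as follows. The one-second binning, the event names, and the timestamps are illustrative assumptions, not the authors' actual feature pipeline.

```python
# Score how closely a learner's facial reactions track the teacher's actions
# by binning both event streams into 1-second windows and taking the
# Jaccard coefficient of the resulting sets of active bins.

def to_bins(timestamps, bin_size=1.0):
    """Map event times (seconds) to the set of bins in which events occur."""
    return {int(t // bin_size) for t in timestamps}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

teacher_actions   = [2.1, 5.4, 9.8, 14.2]   # e.g., gestures toward the board
learner_reactions = [2.3, 5.9, 14.5, 20.0]  # e.g., detected head/gaze shifts

score = jaccard(to_bins(teacher_actions), to_bins(learner_reactions))
print(f"reaction score: {score:.2f}")  # prints "reaction score: 0.60"
```

A score near 1.0 means the learner reacts in nearly every window where the teacher acts (and rarely otherwise), which under this model suggests high concentration.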
The number of automated driving functionalities in conventional vehicles is rising year by year. Intensive research on highly automated vehicles (AVs) is performed by all major OEMs. AVs need advanced sensors and intelligence to detect relevant objects in driving situations and to perform driving tasks safely. Due to the shift of control, the role of the driver changes to that of an on-board user without any driving-related tasks. However, the interaction between the AV and its on-board user stays vital for creating a common understanding of the current situation and establishing a shared representation of the upcoming manoeuvre, so as to ensure user acceptance and trust in automation. The current paper investigates two different light-based HMI approaches for AV / on-board user interaction. In a VR study, 33 participants experienced an automated left turn in an urban highly automated driving scenario. While turning, the AV had to consider other road users (a pedestrian or another vehicle). The two HMI approaches (intention- vs. perception-based) were compared to a baseline using a within-subject design. Results reveal that both perception- and intention-based interaction designs lead to higher user trust and usability in both scenarios.
State-of-the-art speech synthesis owes much to modern machine learning, with recurrent neural networks becoming the new standard. However, how you say something is just as important as what you say. If we draw inspiration from human dramatic performance, ideas such as artistic direction can help us design interactive speech synthesis systems that can be finely controlled by a human voice. This "voice puppetry" has many possible applications, from film dubbing to the pre-creation of prompts for a conversational agent. Previous work in voice puppetry has raised the question of how such a system should work and how we might interact with it. Here, we share the results of a focus group discussing voice puppetry and responding to a voice puppetry demo. The results highlight a main challenge in user-centred AI: where is the trade-off between control and automation, and how can users control this trade-off?
The International Space Station (ISS) is a scientific laboratory in which astronauts conduct a great variety of experiments on a tight schedule. In order to fulfill their tasks efficiently and correctly, astronauts need assistance, which can (at least partially) be provided by IT systems on board, among them robotic assistants such as the Crew Interactive Mobile Companion (CIMON). However, the creation of user interfaces for such systems is a challenge, because astronauts often have to interact hands-free or cannot direct their attention to a visual user interface. These challenges can be met by providing multimodal user interfaces that enable speech interaction, among other modalities. We describe the use context for speech interfaces on the ISS, specific requirements, and possible solutions. Our concepts rely on previous work carried out in acoustically demanding environments.
Modern information access systems extensively use personalization, automatically filtering and/or ranking content based on the user profile, to guide users to the most relevant material. However, this can also lead to unwanted effects such as the "filter bubble." We present an interactive demonstration system, designed as an educational and research tool, which imitates a search engine, personalizing the search results returned for a query based on the user's characteristics. The system can be tailored to suit any type of audience and context, as well as enabling the collection of responses and interaction data.
This paper introduces an interface that enables the user to quickly identify relevant fragments within multiple long documents. The proposed method relies on a machine-generated layer of annotations that reveals the coverage of topics per fragment and document. To illustrate how the annotations double as a tool for preview as well as navigation, an example application is presented in the form of a personalised learning system that recommends relevant fragments of video lectures according to the user's history. Potential implications of this approach for lifelong learning are discussed. We argue that this approach is generally applicable to recommender and information retrieval systems across multiple knowledge domains and document types.
Programming education has become an integral part of the primary school curriculum. However, most programming practices rely heavily on computers and electronics, which causes inequalities across contexts with different socioeconomic levels. This demo introduces a new and convenient way of using tangibles for coding in classrooms. Our programming environment, Kart-ON, is designed as an affordable means to increase collaboration among students and decrease dependency on screen-based interfaces. Kart-ON is a tangible programming language that uses everyday objects such as paper, pens, and fabrics as programming objects and employs a mobile phone as the compiler. Our preliminary studies with children (n = 16, mean age = 12) show that Kart-ON boosts active and collaborative student participation in the tangible programming task, which is especially valuable in crowded classrooms with limited access to computational devices.
This demo introduces a novel mHealth application with an agent-based interface designed to collect multimodal data with passive sensors native to popular wearables (e.g., Apple Watch, FitBit, and Garmin) as well as through user self-report. The application delivers personalized and adaptive multimedia content via a smartphone application specifically tailored to the user in the interdependent domains of physical, cognitive, and emotional health, using novel adaptive logic-based algorithms while employing behavior change techniques (e.g., goal-setting and barrier identification). A virtual human coach leads all interactions to improve adherence.
Explainable AI (XAI) for text is an emerging field focused on developing novel techniques to render black-box models more interpretable for text-related tasks. To understand the recent advances in XAI for text, we conducted an extensive literature review and user studies. Allowing users to easily explore the assets we created is a major challenge. In this demo, we present an interactive website named XAIT. The core of XAIT is a tree-like taxonomy with which users can interactively explore and understand the field of XAI for text along different dimensions: (1) the type of text task in consideration; (2) the explanation techniques used for a particular task; (3) the target and appropriate users for a particular explanation technique. XAIT can be used as a recommender system for users to identify suitable explanation techniques for their text-related tasks, or by researchers who want to find publications and tools relating to XAI for text.
Recently, recipe short videos on services such as Kurashiru and DELISH KITCHEN have been rapidly gaining attention. These recipe videos can help people learn the essentials of cooking in a short time. However, it is difficult to understand cooking operations by watching a video only once. Also, since these short recipe videos do not take the user's cooking level into account, all users see the same videos. Therefore, in this paper, we propose a novel cooking support system for recipe short videos, called Dynamic Video Tag Cloud. We first extract cooking operations from the text recipe included in an existing recipe short video. Next, we extract various supplementary recipe information (videos) based on users' cooking levels by weighting the appearance frequency of cooking operations for each cooking genre. Then, the system visualizes the supplementary recipe information (videos) to the users in a tag cloud interface.
Drowsiness is a major factor that hinders learning. To improve learning efficiency, it is important to understand students' physical status, such as wakefulness, during online coursework. In this study, we propose a drowsiness estimation method based on learners' head and facial movements while viewing video lectures. To examine the effectiveness of head and facial movements for drowsiness estimation, we collected learner video data recorded during e-learning and applied a deep learning approach under the following conditions: (a) using only facial movement data, (b) using only head movement data, and (c) using both facial and head movement data. We achieved an average F1-macro score of 0.74 with personalized models for detecting learner drowsiness using both facial and head movement data.
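For reference, the F1-macro metric reported in such studies averages per-class F1 scores, which keeps a minority class (e.g., drowsy episodes) from being swamped by the majority class. A minimal sketch, with toy labels invented for illustration:

```python
# Macro-averaged F1: compute precision, recall, and F1 per class,
# then take the unweighted mean over classes.

def f1_macro(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# Toy labels: 1 = drowsy, 0 = awake (not data from the study).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
print(f"F1-macro: {f1_macro(y_true, y_pred):.2f}")  # prints "F1-macro: 0.75"
```

Unlike plain accuracy, this score drops sharply if the model simply predicts the majority class for every frame.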
Smart homes provide alternative means to foster autonomy for frail people living at home. Oral and visual cues are produced to help people carry out activities. This requires determining which sensors and effectors to choose for monitoring activities, which is not trivial. A do-it-yourself approach is proposed for caregivers, who know the frail person's habits but need a user-friendly interaction.
Augmented reality and ontologies aim to address many of the smart home design issues via a virtual advisor. The augmented reality interface is linked to an OWL ontology that describes space, sensors and effectors, activities of daily living, monitoring, and assistance. First, a semantic 3D model of one's house is constructed. Second, still in augmented reality, a hierarchical model of the assistance and monitoring scenario is specified. A virtual advisor proposes actions, scenarios, and corrections of design inconsistencies.
Human speakers in a dialog adapt their responses and the way they convey them to their interlocutors by appropriately tuning their prosody, taking into account the context in which the dialog takes place. Today's spoken dialog systems are incapable of exhibiting such natural responsive behavior. Hence, there is a need for models that enable the selection of better prosody in system responses to make them appropriate to the pragmatic intentions and the dialog context. This submission includes the detailed description of my preliminary study on the prosody of discourse markers, the methods used and my initial findings that corroborate the existence of correlations between prosody and pragmatic intentions of discourse markers in human-human dialogs. These correlations, if modeled accurately, can help dialog systems respond with context-appropriate prosody.
This paper presents CogniKit, an extensible tool for human cognitive modeling. It is based on the analysis, classification, and visualization of eye tracking data such as gaze points, fixation count and duration, saccades, gaze transition and stationary entropy, heat maps, areas of interest, etc. These are further processed, analyzed, and classified to detect higher-level human cognitive factors such as cognitive processing styles and abilities. CogniKit comprises two main components: i) a software application that collects and processes low- and high-level eye gaze data metrics in real time; and ii) an extensible interactive workbench for storing, analyzing, classifying, and visualizing the collected eye gaze data. We developed an example application to demonstrate the use of CogniKit in a practical scenario.
Creating personas from large amounts of online data is useful but difficult with manual methods. To address this difficulty, we present Automatic Persona Generation (APG), which is an implementation of a methodology for quantitatively generating data-driven personas from online social media data. APG is functional, and it is deployed with several organizations in multiple industry verticals. APG employs a scalable web front-end user interface and robust back-end database framework processing tens of millions of user interactions with tens of thousands of online digital products across multiple online platforms, including Facebook, Google Analytics, and YouTube. APG identifies audience segments that are both distinct and impactful for an organization to create persona profiles. APG enhances numerical social media data with relevant human attributes, such as names, photos, topics, etc. Here, we discuss the architecture development and central system features. Overall, APG can benefit organizations distributing content via online platforms or with online content that relates to commercial products. APG is unique in its algorithmic approach to processing social media data for customer insights. APG can be found online at https://persona.qcri.org.
We present an early prototype conversational agent (CA), called Pan, for retrieving information to support criminal investigations. Our approach tackles the issue of algorithmic transparency, which is critical in unpredictable, high-risk, and high-consequence domains. We present a novel method to flexibly model CA intentions and to provide transparency of attributes, underpinned by human recognition. We propose that Pan can be used for experimentation to probe analyst requirements and to evaluate the effectiveness of our explanation structure.
Sketching free body diagrams is an important skill that students learn in introductory physics and engineering classes; however, university class sizes are growing, often with hundreds of students in a single class. This creates a grading challenge for instructors, as there is simply neither the time nor the resources to provide adequate feedback on every problem. We have developed an intelligent user interface called Mechanix to provide automated, real-time feedback on hand-drawn free body diagrams for students. The system is driven by novel sketch recognition algorithms developed for recognizing and comparing trusses, general shapes, and arrows in diagrams. Through deployment at five universities, with 350 students completing homework on the system over the 2018 and 2019 school years, we have also discovered trends in how students utilize extra submissions for learning. A study with 57 students showed that the system produced homework scores similar to other homework media while additionally requiring and automatically grading the free body diagrams.
This is an overview paper on an interactive music exploration interface for music collections. This interface is meant to help explore the cross-cultural similarities, interactions, and patterns of music excerpts from different regions and understand the similarities by employing computational audio analysis, machine learning, and visualization techniques. In our computational analysis, we used standard audio features that capture timbre information and projected them onto a lower-dimensional space for visualizing the (dis)similarity. There are two collections of non-Eurogenetic music under study. The 2-D and 3-D mappings are visualized through a dashboard application and also rendered in Virtual Reality space where users can interact and explore to get meaningful insights about the structural (dis)similarities of the music collections.
In this demo paper, we present a demonstrator for a ring-based finger tracking approach. The demonstrator consists of a ring-shaped interaction device, called PeriSense, that utilizes capacitive sensing to enable finger tracking. The motion of the finger wearing the ring and of the adjacent fingers is sensed by measuring the capacitive proximity between the electrodes and the human skin. To map the capacitive measurements to finger angles, we apply a regression model based on long short-term memory (LSTM). A virtual 3D hand model simultaneously renders the predicted finger angles.
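To illustrate the kind of sequence model involved, here is a minimal single-unit LSTM forward pass in pure Python. The weights are random (untrained), and the input dimensions, readings, and degree scaling are invented for illustration; the actual PeriSense system uses a trained LSTM regressor over real capacitive channels.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal single-unit LSTM cell (toy weights, not a trained model)."""

    def __init__(self, n_inputs):
        # One weight vector per gate: input (i), forget (f), output (o),
        # and candidate (g); each sees the inputs plus the previous hidden state.
        self.W = {g: [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                  for g in "ifog"}
        self.h = self.c = 0.0

    def step(self, x):
        z = x + [self.h]  # concatenate input with previous hidden state
        def act(gate, fn):
            return fn(sum(w * v for w, v in zip(self.W[gate], z)))
        i, f, o = act("i", sigmoid), act("f", sigmoid), act("o", sigmoid)
        g = act("g", math.tanh)
        self.c = f * self.c + i * g     # update cell state
        self.h = o * math.tanh(self.c)  # new hidden state
        return self.h

cell = LSTMCell(n_inputs=4)  # e.g., 4 capacitive electrode readings per frame
readings = [[0.2, 0.1, 0.4, 0.3], [0.3, 0.2, 0.5, 0.1], [0.1, 0.4, 0.2, 0.2]]
angle = 0.0
for frame in readings:
    angle = 90.0 * cell.step(frame)  # scale hidden state to a toy angle in degrees
print(f"predicted angle: {angle:.1f} deg")
```

The cell state `c` carries information across frames, which is why an LSTM can exploit the temporal structure of a stream of capacitive measurements rather than regressing each frame independently.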