In this work, we illustrate the design of an explainable recommendation framework that seeks to maximize exploration while maintaining recommendation relevance. We use Gated Recurrent Unit (GRU) based Recurrent Neural Networks (RNNs) to model the temporal evolution of users' preferences for items. We define an objective metric for exploration and frame the overall optimization loss function to incorporate exploration as a regularizer. We report evaluations on the state-of-the-art Netflix and IMDb datasets, and highlight a future research agenda for developing an interactive framework based on the proposed algorithm.
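To make the exploration-as-regularizer idea concrete, here is a minimal PyTorch sketch (not the authors' implementation): a GRU next-item recommender whose loss combines a standard cross-entropy relevance term with a hypothetical exploration term based on the entropy of the predicted item distribution; the model sizes and the weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRURecommender(nn.Module):
    """GRU over a user's item-interaction sequence, scoring all items."""
    def __init__(self, n_items, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_items, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_items)

    def forward(self, item_seq):
        h, _ = self.gru(self.embed(item_seq))
        return self.out(h[:, -1])          # scores for the next item

def loss_with_exploration(logits, target, lam=0.1):
    # Relevance term: standard next-item cross-entropy.
    relevance = F.cross_entropy(logits, target)
    # Exploration term (illustrative stand-in for the paper's metric):
    # negative entropy of the predicted item distribution, so minimizing it
    # spreads probability mass over a wider set of items.
    probs = F.softmax(logits, dim=-1)
    neg_entropy = (probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()
    return relevance + lam * neg_entropy
```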
Advances in mobile computing power have opened possibilities for processing-intensive optical recognition and machine learning applications. Despite the widespread availability of mobile phones and digital resources, the education industry has made few updates to time-consuming grading techniques: many teachers continue to grade tests by hand or use outdated technology. Our goal is to produce a seamless, efficient, and accessible mobile experience to automate grading across multiple formats. We present a solution that allows teachers to take a photo and receive a visualized result after our deep learning application recognizes handwriting patterns, grades test answers, and identifies names and IDs. As a result, we contribute an improved computational tool for teachers to quickly create and grade exams across various formats.
The use of intelligent tutoring systems to support coordination in collaborative distance learning has become an important topic. This study investigated how the coupling of emotional states detected from facial expressions can be used as an index of coordination. Forty participants engaged in a simple explanation task that required coordination; their facial expressions were recorded. We analyzed the learners' emotional recurrence and investigated its validity through comparisons with randomized variables of emotion. Furthermore, we investigated the extent to which these variables can be used as indices of learning performance. Based on these results, we discuss how the use of emotional recurrence to predict collaborative learning can be further investigated in future work.
In this paper, we explored the potential of using a conversational chatbot interface to guide students in performing peer assessments. Our results show that our chatbot interface is in general successful in guiding graders, as reflected by students' grading consistency across multiple assessments and its correlation with teaching assistants' grading. Our results provide insights into how conversational interfaces can be used to support peer assessments.
Current input interfaces for entering equations in mathematics e-learning systems are cumbersome, especially for novice learners. In addition, most equation editors are not developed for use on smartphones. Our goal is to develop an intelligent mathematical input interface for smartphones based on a math input method that allows users to input mathematical expressions using colloquial-style linear strings. This paper presents an evaluation of our proposed input interface. The results show that the proposed input interface was, on average, approximately 1.3 times faster than standard interfaces.
In this paper, we present the relationship between the ankle joint angle and the frequency characteristics of propagating vibration using an active bone-conducted sound sensing system. We describe an experiment that clarifies the relationship between the ankle joint angle and the frequency characteristics of bone-conducted sounds, and then discuss the results of the experiment.
Movie trailers show little diversity because the types of effects used in a trailer are limited. Therefore, it is difficult to edit a trailer that caters to the different preferences of various users. To solve this problem, we define seven editing biases based on images, audio, and captions for trailers, and we propose a user interface that presents the results of the trailer analysis.
Wearable devices make self-monitoring easier for users, who usually tend to increase physical activity and weight loss maintenance over time. But in terms of adapting behavior to these goals, these devices do not provide specific features beyond monitoring the achievement of daily goals, such as the number of steps or miles walked and caloric expenditure. The purpose of this study is to evaluate the efficacy of the recommendations provided by traditional fitness tracker apps with respect to weight loss scenarios.
The wide use of motion sensors in today's smartphones has enabled a range of innovative applications for which these sensors were not originally designed. Human activity recognition and smartphone position detection are two of them. In this paper, we present a system for the joint recognition of human activity and smartphone position. Our study shows that the coordinate transformation approach applied to motion data makes our system robust to variation in smartphone orientation. Contrary to popular belief, a simple neural network provides accuracy comparable to deep learning models in our problem. In addition, our study suggests that the motion sensor sampling rate is another key factor in the recognition problem.
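As an illustrative sketch of the coordinate transformation step (not the authors' code), the snippet below rotates a device-frame accelerometer sample into the Earth frame using the orientation quaternion reported by the phone; the sample values are made up.

```python
import numpy as np

def quaternion_to_rotation_matrix(q):
    """Rotation matrix for a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def to_earth_frame(accel_device, orientation_quat):
    """Rotate a device-frame acceleration sample into the Earth frame,
    removing the dependence on how the phone is held."""
    R = quaternion_to_rotation_matrix(np.asarray(orientation_quat, dtype=float))
    return R @ np.asarray(accel_device, dtype=float)

# Example: one accelerometer sample (m/s^2) and a made-up orientation quaternion.
print(to_earth_frame([0.1, 9.6, 1.2], [0.97, 0.05, 0.21, 0.03]))
```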
We introduce a novel model for future/next page prediction in online user journeys that uses a combination of doc2vec webpage representations with an LSTM-based neural network to mine patterns from users' online navigational paths combined with their content preferences. Empirical explorations show promise towards creating customized user experiences leveraging this work.
Explainability is an important desired property of recommendation systems. We consider a graph-based recommendation approach, which generates detailed explanations to users in the form of labelled connecting relational paths, and present a visualization interface intended to convey such detailed relational information in a clear and intuitive manner. As an initial evaluation, we performed a user study at an academic conference, recommending other participants whom users may be interested in meeting (using rsr.cloud). The feedback was enthusiastic, indicating that the proposed visualizations of relational explanations are engaging and useful.
This article outlines a new class of computing system, which we refer to as collaborative lifelogging (CLL). CLL advances the state of the art in lifelogging and self-tracking systems by supporting social group collaboration on sensor data capture through the formation of on-the-fly self-tracking groups and the contextualization of data through human and machine-based computation.
In video-based learning, engagement estimation is important for increasing learning efficiency. Changes in the appearance of learners (facial expressions or postures such as closing the eyes or looking away) captured by a web camera are often used to estimate engagement because of the ease of installing the camera. In this work, we first collected data in an actual e-learning scenario at a Japanese cram school and found that the appearance of each learner hardly changed. We then propose an engagement estimation method based on the synchrony of head movements, which occur frequently when students take notes. An analysis of our method using the collected data revealed an F-score of 0.79, which is higher than that of methods based on changes in appearance. This result indicates that our method may be more effective for engagement estimation in actual learning scenarios.
Recent studies demonstrate how personalized and multisensory learning improves students' performance and teachers' work. Each student has different skills, aptitudes, and difficulties, and the learning process requires more or less time in relation to both the topic to assimilate and the teaching method employed. Our system, called Magika, is a multi-modal and multi-sensory environment that empowers teachers and aims at providing students with the right learning supports. Magika includes a preliminary "on-boarding" phase, aimed at defining each student's likes and dislikes, and an intelligent engine, "Geniel", capable of discerning and suggesting the most suitable activities to be played according to the player's profile.
The demo introduces Reflex, a phygital game for children and adolescents with Neuro-Developmental Disorder. The game, offered through a cross-platform application for smartphones and tablets, bridges the digital and physical worlds by tracking, via a bottom-looking mirror positioned on the device camera, physical items placed on a table. This new interaction paradigm, the first pilot study, and the game's adaptability to each user profile reveal an unexplored potential for learning.
Children with Autism Spectrum Disorder (ASD) often face the challenge of detecting and expressing emotions; that is, it is hard for them to recognize happiness, sadness, and anger in other people and to express their own feelings. This difficulty produces severe impairments in communication and social functioning. The paper proposes a spoken educational game, exploiting machine learning techniques, to help children with ASD understand how to correctly identify and express emotions. The game focuses on four emotional states (happiness, sadness, anger, and neutrality) and is divided into two levels of increasing difficulty: in the first phase the user learns how to recognize and express feelings, and in the second phase the user's emotional skills are examined and evaluated. The application integrates a multilingual emotion recognizer based on the pitch of the voice.
This paper describes the procedure to create an emotional dataset, named WIYE (What Is Your Emotion?), composed of semantic content and audio and video recordings of children. Data have been collected using an interactive storytelling application, leading children aged 4 to 12 to discover their emotional sphere and emotion expression skills. During the story, every episode is dedicated to a specific emotion. The investigated emotional states are sadness, anger, fear, surprise, and joy. This corpus can be exploited by conversational technologies: it can be used with machine learning classification algorithms to train models that recognize emotions expressed by children from the pitch of their voice, their facial expressions, and the content of their conversations.
In this demo, we present PARLEY, an interactive system for practicing difficult social situations in a safe environment with a virtual agent. The system realizes different phenomena studied in psychology research that are known to create a natural interaction. Moreover, we include an open learner model to ensure an explainable user experience.
The contiguity of physical and digital content in embodied learning has been shown to increase students' engagement in educational contexts. Applications with various kinds of physical interactions have been deployed to enhance learning experiences in many engineering domains. However, even though computer science education (CSE) has been a vibrant topic in recent years, there are few studies focusing on the embodiment of CSE materials, by which abstract and intangible concepts could be transformed into an intuitive affordance that utilizes sensorimotor experiences during the learning process. We propose an augmented embodiment mobile app designed for computational thinking (CT), specifically debugging practices and the abstraction concept, that makes use of gestures and augmented reality for learners to interact with the content. We examine the design logic against a framework for embodied learning and discuss potential extensions of multimodal analytics in such an application. Our preliminary user study in a middle school shows students' engagement with the application; however, it also revealed several design issues that need to be solved in the next iteration. The future plan for data analysis and experiments is also discussed.
With the development of location-based social networks (LBSNs), user mobility modeling has a wide range of applications, such as performance-based advertising, city-wide traffic planning, and location-based recommendation. Points-of-interest (POIs) play an important role in these applications because of their rich semantic behavior information. However, most studies only use users' proactive POI check-ins, which are extremely sparse and capture only part of users' semantic offline behavior.
In this paper, we use POI category check-ins obtained by matching users' GPS data, and propose a framework based on a Hidden Markov Model (HMM) to model group mobility and recommend the POI category of a user's next step. The category level of POIs can reflect the semantic meaning of user mobility and also reduces the recommendation space. Experimental results show that recommendation accuracy is 11.9% higher with the group semantic offline behavior.
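For illustration only, the sketch below replaces the paper's group-level HMM with a much simpler first-order transition model over POI categories; it shows only the shape of the next-category recommendation task, using made-up category sequences.

```python
from collections import Counter, defaultdict

def fit_transitions(category_sequences):
    """Count first-order transitions between POI categories."""
    trans = defaultdict(Counter)
    for seq in category_sequences:
        for cur, nxt in zip(seq, seq[1:]):
            trans[cur][nxt] += 1
    return trans

def recommend_next(trans, current_category, k=3):
    """Top-k most frequent next POI categories given the current one."""
    return [cat for cat, _ in trans[current_category].most_common(k)]

# Hypothetical category check-in sequences extracted from GPS traces.
sequences = [
    ["home", "coffee", "office", "restaurant", "gym", "home"],
    ["home", "office", "restaurant", "office", "bar", "home"],
]
model = fit_transitions(sequences)
print(recommend_next(model, "office"))
```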
This paper discusses an explainable intelligent interface that supports coaches in providing feedback to runners who track their progress through a mobile application. The interface explicitly shows the confidence in the assigned ratings, supports impact analysis of the different recorded metrics, and allows controlling and reinforcing the assessment.
Taking selfies with mobile phones has become a trend in the past few years. It is documented that the thrill of taking selfies in adventurous places has resulted in serious injuries and even death in some cases. To overcome this, we propose a system that can alert the user by detecting the level of danger in the background while capturing selfies. Our app is based on a deep Convolutional Neural Network (CNN). The prediction is performed as a 5-class classification problem, with the classes representing different levels of danger. Face detection and device orientation information are also used for robustness and lower battery consumption.
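A minimal sketch of such a 5-class danger classifier, assuming a Keras/TensorFlow stack; the input size and layer sizes are placeholder assumptions, and the paper's actual CNN architecture is not specified here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# 5 output classes = 5 hypothetical danger levels, from "safe" to "extreme".
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),   # one probability per danger level
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_danger_labels, epochs=10, validation_split=0.1)
```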
Point and click games are a more interactive and engaging alternative to the usual photo galleries and video players for promoting tourist locations on the web. The PAC-PAC project aims at supporting users without programming knowledge in creating such games starting from real photos and videos. In this paper, we discuss intelligent support and the related user interface for helping users develop the game story. The user selects a set of photos, and a deep-learning-based generator returns a story draft for each scene. The user can regenerate the story by controlling some network parameters or tailor the text to his or her own tastes.
Projection Augmented Reality (AR), which uses projectors to augment virtual information and objects in physical space, is one of the AR fields that have been actively researched. There have been many attempts to use mobile devices to implement the interactions available in these AR environments; however, due to mobile devices' limited computational power, they were somewhat difficult to use in real applications. We have solved this problem with the latest off-the-shelf devices and their supporting development kits, and have proven, through experimentation, that our solution can be used in actual projection AR applications.
Estimating feedback from users on documents (e.g., interest, search satisfaction, engagement, document relevance, and utility) is essential for intelligent systems such as recommendation systems and knowledge support systems. In this paper, we focus on a specific situation (the workplace) and propose the use of working logs as a new feature to estimate feedback from users, in addition to web browsing logs. Among the workplaces considered, we chose one where programmers edit source code while referring to documents on the web. We then propose features of working logs, such as editing behavior after viewing a target document, which are used as input to machine learning algorithms for the estimation. Our experimental results demonstrate that the features of working logs can be used to improve the estimation of feedback from users.
This study describes an Adaptive Projection Augmented Reality (AR) system that can provide information in real time using object recognition. This approach is based on deep learning, through the construction of a 3D space from the real world. Through a single system, a projector-camera unit with a pan-tilt mechanism, the 360-degree space surrounding the user can be reconstructed in three dimensions, objects and users can be recognized, and a projection AR environment can be generated instantaneously. Information relevant to real-world objects can be provided through real-time interactions between the user and objects. Using spatial interaction, the system also allows intuitive interaction with projected information, the user interface (UI), and contents without touch sensors. Several scenarios for the use of this system are described.
Co-creation activities such as problem solving and ideation require people with different perspectives to gather at the same place to develop ideas effectively. Owing to time and space constraints, this is not always easy. We developed a real-time collaborative system that allows people to share and discuss ideas while participating remotely. This paper introduces a distributed data sharing method adopted to improve users' collaboration and describes the evaluation results demonstrating the maximum number of systems that can be optimally connected simultaneously.
Musical relaxation is a common method to relieve personal stress. In particular, nature sounds, instrumental music, voice (chanting), "easy listening" songs, etc. can be played for relaxation. Nevertheless, the effectiveness of the sounds used for relaxation is idiosyncratic, depending on personal taste. In our approach, computer-guided audition of spatial soundscapes is investigated, automatically exploring a polyphonic area while using biosignals as indicators of satisfaction. We propose a reinforcement learning (RL) method to discover the sound relaxation "sweet spot" in a polyphonic soundscape. An avatar roams within a pantophonic space, surrounded by six independent audio channels, while a human subject, listening through the avatar's ears, is connected to an electroencephalographic (EEG) headset. Besides the position of the avatar, pitch, reverberation, and filters can also be changed to find the most relaxing virtual standpoint and parameters for the listener. Instead of changing the position manually, a Deep Q-Network (DQN) is used for reinforcement learning. An RL agent adjusts parameters according to reward values calculated from the change in relative theta band (4-8 Hz) power.
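The reward signal described above could be computed roughly as follows; this is an illustrative sketch, assuming EEG windows sampled at 256 Hz and Welch's method for band-power estimation, not the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import welch

def relative_theta_power(eeg_window, fs=256):
    """Theta (4-8 Hz) power as a fraction of total 1-40 Hz power."""
    freqs, psd = welch(eeg_window, fs=fs, nperseg=fs * 2)
    band = lambda lo, hi: psd[(freqs >= lo) & (freqs < hi)].sum()
    return band(4, 8) / band(1, 40)

def relaxation_reward(prev_window, curr_window, fs=256):
    """Reward the agent when relative theta power increases between windows."""
    return relative_theta_power(curr_window, fs) - relative_theta_power(prev_window, fs)

# Example with synthetic signals standing in for two consecutive EEG windows.
rng = np.random.default_rng(0)
print(relaxation_reward(rng.normal(size=512), rng.normal(size=512)))
```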
This demo paper presents a desktop 3D positioning system, implemented with geomagnetic sensors and coils for DC magnetic field generation. Our system is composed of sensor nodes for localization, which sense and transmit the magnitude of the magnetic field to a host computer, and anchor nodes, which intermittently generate a DC magnetic field while also functioning as sensor nodes. An advantage of our system is that the coverage range and estimation accuracy can be enhanced instantly by adding anchor nodes, since an added anchor node is also automatically localized. In addition, the sensor node can be implemented in a mm-scale form factor with small power consumption thanks to the geomagnetic sensor, which enables us to attach sensor nodes to various things. Object tracking could serve as a primary application of this system, in areas such as virtual reality (VR), interactive educational experiences, and rehabilitation, which can benefit human-computer interaction.
Network impairments (e.g., latency and outages) can have an adverse impact on user experience, especially for interactive applications such as augmented reality (AR). To effectively deal with such problems, we propose a bi-directional interface between applications and the network, which facilitates intelligent decision making by both the application and the network.
Geotagged tweets serve many important applications, e.g., crisis management, but only a small proportion of tweets are explicitly geotagged. We propose a Convolutional Neural Network (CNN) architecture for geotagging tweets to landmarks, based on the text of tweets and other meta information, such as posting time and source. Using a dataset of Melbourne tweets, experimental results show that our algorithm outperformed various state-of-the-art baselines.
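A minimal text-CNN sketch for the landmark classification task, assuming a Keras/TensorFlow stack; the vocabulary size, sequence length, and number of landmarks are placeholder assumptions, and the fusion of meta information is only indicated in a comment.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000     # assumed tokenizer vocabulary
MAX_LEN = 50           # tokens per tweet
N_LANDMARKS = 300      # assumed number of candidate landmarks

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Conv1D(128, 5, activation="relu"),   # n-gram-like text features
    layers.GlobalMaxPooling1D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(N_LANDMARKS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Meta information such as posting time could be concatenated to the pooled
# features via the Keras functional API before the final classifier.
```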
Graceful is one of the adjectives used to describe human motion. Graceful motion can give a good impression to the observer. This research focuses on the concept of "graceful motion" as defined by William Hogarth. To quantify the gracefulness of human motion, we propose a measurement method that extracts the "Line of Beauty" from a dancer's hand trajectory. We approximate the dancer's hand trajectories with quartic B-spline curves and extract the S-shaped curves using their inflection points. Finally, we quantify gracefulness using the extracted curves' parameters. Through a subjective evaluation, we showed a weak correlation between impressions and our quantification results.
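An illustrative sketch of the curve-fitting step (not the authors' exact method): fit a quartic B-spline to a 2-D trajectory with SciPy and locate inflection points where the signed curvature changes sign; the smoothing factor and sampling density are assumptions.

```python
import numpy as np
from scipy import interpolate

def inflection_points(x, y, n_samples=500):
    """Fit a quartic B-spline to a 2-D hand trajectory and return parameter
    values where the signed curvature changes sign (inflection points)."""
    tck, _ = interpolate.splprep([x, y], k=4, s=1.0)
    u = np.linspace(0, 1, n_samples)
    dx, dy = interpolate.splev(u, tck, der=1)
    ddx, ddy = interpolate.splev(u, tck, der=2)
    curvature_sign = np.sign(np.asarray(dx) * ddy - np.asarray(dy) * ddx)
    flips = np.where(np.diff(curvature_sign) != 0)[0]
    return u[flips]

# Example: a synthetic S-shaped trajectory should yield one inflection point.
t = np.linspace(0, 1, 100)
print(inflection_points(t, np.sin(2 * np.pi * t) * 0.1 + t))
```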
The paper describes a demonstration of a database tool that automates the construction of database applications. The Automated Relational Technology (ART) Studio generates an end-user database application for information managed by a relational database. The ART Studio has a "configure, not code" interface that does not require any programming whatsoever. It generates an interactive decision tree interface that enables end-users to pinpoint information managed by the database and to display this information in a window. The newly discovered Branching Data Model (BDM) enables the Studio to generate both of these components, the decision tree and its window, automatically. Program logic in the Studio generates models of relational data between two table attributes. Other program logic attaches one BDM to another to construct tree structures as a decision tree interface and to connect window fields to table attributes throughout the database. The Studio and the end-user systems that it generates represent a paradigm shift away from today's "Visual Query" database interface. The new approach introduces a "Visual Predicate Calculus" model that treats SQL expressions as templates consisting of reusable logical parts. The BDM led to the consideration of SQL expressions as templates and their components as reusable parts.
Contour is a voice-guided speech re-synthesis system we previously developed for efficient TTS (Text-to-Speech) content production. In this follow-up evaluation study, we investigate qualities of synthetic speech produced using Contour against a conventional parametric-based workflow by evaluating expressive dimensions of produced TTS content using vocal prosodic parameters. Based on the quantitative and qualitative results, we discuss user preferences between these two workflows for producing TTS content.
This demo shows a recommender bar and a planning workspace that augment an existing system. The design addresses two challenges for analysts doing proactive data protection: 1) the information overload from the many data repositories and protection techniques to consider; 2) the optimization of protection plans given multiple competing factors.
Audio annotation for music clips is an important task for machine-learning-based music analysis and applications. However, it is time-consuming because it often requires repetitive manipulations, even though typical audio files often contain repetitive structures (e.g., a song often has similar phrases used multiple times). In this paper we present a new interaction technique to intelligently automate repetitive manipulations for audio annotation. It mimics the "autocompletion" functions used in source code editors and spreadsheet software and is called Autocomplete Audio Annotation. We developed a proof-of-concept system for annotating the continuous fundamental frequency (f0) of the vocal part of a song.
In recent years, deep learning has contributed to a big step forward in artificial intelligence, and deep learning models have been created extensively in a variety of areas. However, developing a deep learning model requires high implementation skills as well as domain knowledge. Additionally, finding the best model involves a great deal of trial and error for developers. To alleviate these difficulties, we have developed a deep learning model development environment called DL-Dashboard that allows developers to create new models easily and quickly by dragging and dropping built-in layer components, and to train the models by selecting one of the suggested training options, without much deep learning experience. We explain the design principles and implementation of the DL-Dashboard system and show how developers can create and train models on it in a user-friendly manner.
In this demo, we showcase SAM [3], a modular and extensible JavaScript framework for self-adapting menus on webpages. SAM allows control of two elementary aspects of adapting web menus: (1) the target policy, which assigns scores to menu items for adaptation, and (2) the adaptation style, which specifies how they are adapted on display. We highlight SAM's capabilities through readily implemented policies from the literature, paired with adaptation styles such as reordering and highlighting. The audience is given a chance to experience how SAM automatically adapts typical web-page menus based on their browsing behaviour. We also showcase how researchers can use the open-sourced framework to further experiment with self-adapting menus, and how practitioners can deploy it to their own websites.
Starting their academic career can be overwhelming for many young people. Students are often presented with a variety of options within their programmes of study and making appropriate and informed decisions can be a challenge. Compared to many other areas in our everyday life, recommender systems remain underused in the academic setting. In this part of our research we use non-negative matrix factorisation to identify dependencies between modules, visualise sequential recommendations, and bring structure and clarity into the academic module space.
We propose a new query language that expresses complex spatial queries in a concise and intuitive way. The language can express conditions on range, direction, and time distance within a spatial search query by using the proposed spatial operators and "space character". We also show how a collaborative map search system supporting this query language can be implemented and describe several applications to highlight how the language can be put into practice.
Countless voice-enabled user interfaces rely on keyword spotting (KWS) systems for wake word detection and simple command recognition. As a practical matter, these applications run on "edge" devices, where dozens of different platforms exist; typically, platform-dependent implementations are required whenever keyword spotting capabilities are needed. This impedes the rapid deployment of voice-enabled interfaces. Fortunately, with the development of several recent frameworks, JavaScript enables us to deploy neural networks for keyword spotting to support a wide range of speech-based user interfaces. We present three voice-enabled applications that use a unified, JavaScript-based KWS system: an in-browser game, a desktop virtual assistant, and a smart lightbulb controller. We are, to the best of our knowledge, the first to demonstrate the feasibility of JavaScript-based keyword spotting for universal voice-enabled user interfaces.
Cognitive load derived from pupillary measures has been studied extensively in the human-computer interaction community, using various kinds of tasks and measures. In this paper, we present preliminary results on the cognitive load induced by a gaze typing task of varying complexity. A pilot study was conducted to observe whether gaze typing sentences of varying complexity could generate different levels of cognitive load in participants. The data obtained included pupil size, blink data, pupil oscillation measured in the form of the Index of Pupillary Activity, and subjective scores. Logistic regression was performed to distinguish between easy and difficult tasks, and they were successfully differentiated, with 75% accuracy, using measures of absolute pupil size, relative pupil size, and the subjective scores given by participants.
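The classification step can be illustrated with scikit-learn; the feature values below are fabricated placeholders, not the study's data, and serve only to show the easy-versus-difficult logistic regression setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-trial features: absolute pupil size, relative pupil size,
# and the participant's subjective difficulty score.
X = np.array([
    [3.1, 0.02, 2], [3.0, 0.01, 1], [3.2, 0.03, 2], [3.1, 0.02, 3],
    [3.6, 0.12, 6], [3.7, 0.15, 7], [3.5, 0.11, 6], [3.8, 0.14, 7],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 = easy sentence, 1 = difficult sentence

clf = LogisticRegression()
print(cross_val_score(clf, X, y, cv=4).mean())   # accuracy of the easy/difficult split
```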
The Keystroke-Level Model (KLM) is commonly used to predict a user's task completion times with graphical user interfaces. With KLM, the user's behavior is modeled with a linear function of independent, elementary operators. Each task can be completed with a sequence of operators. The policy, or the assumed sequence that the user executes, is typically pre-specified by the analyst. Using Reinforcement Learning (RL), RL-KLM [4] proposes an algorithmic method to obtain this policy automatically. This approach yields user-like policies in simple but realistic interaction tasks, and offers a quick way to obtain an upper bound for user performance.
In this demonstration, we show how a policy is automatically learned by RL-KLM in form-filling tasks. A user can interact with the system by placing form fields onto a UI canvas. The system learns the fastest filling order for the form template according to Fitts' Law operators and computes estimates of the time required to complete the form. Attendees are able to iterate over their designs to see how changes in the design affect the user's policy and the task completion time.
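For intuition, the sketch below brute-forces the fastest filling order for a small form using Fitts' law movement times (RL-KLM instead learns this policy with reinforcement learning); the field coordinates, target width, and Fitts coefficients are illustrative assumptions.

```python
import math
from itertools import permutations

def fitts_time(p, q, width, a=0.23, b=0.17):
    """Pointing time (s) between field centres p and q, using Fitts' law
    with assumed coefficients a, b and a target width in the same units."""
    dist = math.dist(p, q)
    return a + b * math.log2(dist / width + 1)

def fastest_order(fields, width=40):
    """Brute-force the filling order that minimises total pointing time
    (feasible only for small forms)."""
    def total(order):
        return sum(fitts_time(fields[i], fields[j], width)
                   for i, j in zip(order, order[1:]))
    return min(permutations(range(len(fields))), key=total)

# Field centre coordinates (pixels) on a hypothetical form template.
fields = [(100, 80), (100, 400), (300, 90), (320, 410)]
print(fastest_order(fields))
```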
In recent years, the introduction of new, precise, and pervasive tracking devices has contributed to the popularity of gestural interaction. In general, the effectiveness of such interfaces depends on two components: the algorithm used for accurately recognizing the user's movements and the guidance provided to users while executing gestures. In this paper, we discuss work-in-progress research on connecting these two components and increasing their effectiveness: the recognition algorithm supports the implementation of feedback and feed-forward mechanisms, providing information on the identified gesture parts in real time, while developers define complex gestures starting from simple primitives.
Direct manipulation interactions on projections are often incorporated in visual analytics applications. These interactions enable analysts to provide feedback to the system, demonstrating relationships that the analyst wishes to find within the projection. However, determining the precise intent of the analyst is a challenge; when an analyst interacts with a projection, the system could infer a variety of possible interpretations. In this work, we explore interaction design considerations for the simultaneous use of dimension reduction and clustering algorithms to address this challenge.
Drum machines are an important tool for music production in the context of electronic dance music. In this work we introduce a drum machine which automatically generates drum patterns according to the high-level stylistic cues of musical genre, complexity, and loudness, controlled by the user. In comparable tools, usually a predefined collection of drum patterns serves as the source for suggestions. In order to yield a greater variety of patterns and to create original patterns, we suggest the use of stochastic generative models. Therefore, in this work, drum patterns are generated using a generative adversarial network, trained on a large-scale drum pattern library. As a method to enter, edit, visualize, and generate patterns, a touch-based step sequencer interface is augmented with controls of the semantic dimensions of genre, complexity, and loudness.
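A compact sketch of the generative setup, assuming a Keras/TensorFlow stack and a binary 16-step by 9-instrument pattern grid; conditioning on genre, complexity, and loudness (e.g., by concatenating control values to the latent vector) and the adversarial training loop are omitted, so this is only the skeleton of such a model, not the authors' network.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

STEPS, DRUMS, LATENT = 16, 9, 32   # assumed pattern grid and latent size

generator = models.Sequential([
    layers.Input(shape=(LATENT,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(STEPS * DRUMS, activation="sigmoid"),   # hit probabilities
    layers.Reshape((STEPS, DRUMS)),
])

discriminator = models.Sequential([
    layers.Input(shape=(STEPS, DRUMS)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # real pattern vs. generated
])

# Sampling a new pattern: threshold the generator output into on/off hits.
noise = tf.random.normal((1, LATENT))
pattern = (generator(noise).numpy() > 0.5).astype(int)
print(pattern.shape)   # (1, 16, 9)
```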
In this paper we investigate interaction strategies for autonomous virtual trainers. Fourteen participants were immersed in our VR system to learn relative areas of countries by sorting virtual cubes. We evaluated two different feedback strategies used by the virtual trainer assisting participants. One provided Correctness Feedback at the end of each task, while the other provided Suggestive Feedback during the task. Correctness feedback was the most effective given that it received higher preference and led to shorter task completion time with equivalent performance outcomes.
In the last few years, interest in intelligent voice assistants (IVAs) to aid older adults in living healthier and longer at home has seen a drastic increase. As these devices become more integrated into the lives of older adults to assist with health and wellness tasks, it is rapidly becoming important to understand how to design systems that also match older adults' goals for how they want to govern their personal health tasks. As a first step in understanding older adults' needs for autonomy in intelligent health assistant (IHA) design, we conducted a Wizard of Oz (WOZ) study that included a semi-structured interview with 10 older adults to understand their needs and concerns about IHAs for consumer health. We present our findings as opportunities to design IHAs that help inform users in order to meet their needs for autonomy.
Machine learning methods have made significant progress across many application areas. However, the power to utilize these methods has remained out of reach for many domain experts due to the background knowledge required to tune parameters and debug errors. Our HuManIC tool eases this requirement for relational models by 1) providing three different ways of meaningfully displaying the model and 2) by allowing the user to intuitively edit the model.
We present Paper Tuner, a user-controlled interface for recommending papers and presentations at a research conference. The availability of multiple sources of information about user interests makes a hybrid recommendation approach attractive in a conference context, but traditional static parallel hybridization makes it hard to generate a single ranking that can address different needs. We introduce a novel slider-based user interface that allows users to control the importance of different relevance sources and even reverse the impact of specific sources. The log analysis of system usage in a real conference context revealed extensive use of the sliders. Moreover, nearly half of the users applied the reverse functionality while using the sliders.
A landscape site plan is a graphic representation showing the arrangement of landscape items (such as trees, buildings, and paths) from a top view. Restyling a site plan includes adjusting colors, textures, and other artistic customization, a task that landscape architects spend considerable time on nowadays. Landscape-freestyle shows the potential of using machine learning to automatically restyle landscape site plans. Landscape-freestyle recognizes the features (locations and sizes) of each item (trees, buildings, and paths) with an object-recognition algorithm on a styled site plan or by reading data directly from an AutoCAD site plan file. The user can either upload their own template image or choose a preset style template offered by Landscape-freestyle for restyling. If users upload templates themselves, a style-recognition algorithm identifies the items and their artistic customization from the template image and uses them for styling. This work presents our first approach to restyling site plans: recognizing tree images on site plans, extracting tree features, and redrawing them with other style templates. This research aims to show how machine learning can benefit a traditional workflow in the design field in a friendly and fast way.
In this demo paper, we present a visual approach for explaining learning content recommendation in the personalized practice system Mastery Grids. The proposed approach uses a concept-level visualization of student knowledge in Java programming to demonstrate why specific practice content is recommended by the system. The visualized student knowledge is estimated by a Bayesian Knowledge Tracing approach, which traces student problem-solving performance. The visual explanatory components, which show both a fine-grained and aggregated knowledge level, are presented to the students along with textual explanations. The goal of this approach is to display the suitability of each recommended item in the context of a student's current knowledge and goal, i.e., the current topic they are studying.
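For reference, a single Bayesian Knowledge Tracing update can be sketched as below; the slip, guess, and learn probabilities are illustrative defaults, not the parameters used in Mastery Grids.

```python
def bkt_update(p_know, correct, slip=0.1, guess=0.2, learn=0.15):
    """One Bayesian Knowledge Tracing step: update the probability that a
    student knows a concept after observing one attempt on it."""
    if correct:
        evidence = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    else:
        evidence = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
    # Account for the chance of learning the concept during the attempt.
    return evidence + (1 - evidence) * learn

# Trace a student's estimated knowledge of a Java concept over four attempts.
p = 0.3
for outcome in [True, False, True, True]:
    p = bkt_update(p, outcome)
    print(round(p, 3))
```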
Many chronic diseases and common risks to elderly patients can be assessed and treated through standardized training and rehabilitation programs. Unfortunately, there is a need to make risk assessment and preventative care for the elderly more easily accessible, as many programs either use specialized hardware or require human supervision. We seek to reduce the barrier to entry for patients through a portable application which enables fall risk prevention assessment and rehabilitation anywhere. Our work leverages the latest in machine learning and computer vision, accomplishing pose estimation and body tracking with a simple and ubiquitous webcam. Thus patients can be screened anywhere, with the ability to get feedback in near real time.
Maintaining good sleep hygiene is a constant challenge in modern life. Sleep habits are hard to monitor and record, especially when most sleep monitoring programs overlook the necessity of incorporating user input. This input is vital in order to change poor sleeping patterns, as it is difficult to identify the source of an individual's problems. Sleep tracking software also struggles with a lack of user transparency and interactivity, leading individuals to mistrust the results these applications generate or otherwise not feel that the insights are actionable. To explore these issues, we designed an interventional chat bot to mediate information collection and interaction between the end user and sleep monitoring technology. SleepBot prompts users with simple questions that attempt to elicit insight into larger problems that contribute to poor sleep and help craft successful sleep hygiene behaviors. Text messaging based interaction eases the process as it is similar to talking with a friend, making for a unique environment in which the user is able to share personal data comfortably.
In this paper, we verify the effect of a virtual human's (VH's) objective and subjective speech. We hypothesized that the effect of objective and subjective speech depends on the topics that a VH speaks about, and predicted that subjective speech is effective when a VH speaks about topics that it prefers. To verify this hypothesis, we performed an experiment with two parameters and two levels. One parameter was "persuasion strategy," with levels "objective" and "subjective." The other parameter was "topic," with levels "not according with preference" and "according with preference." The results show that the effect of subjective speech depends on the preferences of a product recommendation virtual agent as perceived by users.
Understanding the interactions between natural processes and human activities poses major challenges as it requires the integration of models and data across disparate disciplines. It typically takes many months and even years to create valid end-to-end simulations as different models need to be configured in consistent ways and generate data that is usable by other models. MINT is a novel framework for model integration that captures extensive knowledge about models and data and aims to automatically compose them together. MINT guides a user to pose a well-formed modeling question, select and configure appropriate models, find and prepare appropriate datasets, compose data and models into end-to-end workflows, run the simulations, and visualize the results. MINT currently includes hydrology, agriculture, and socioeconomic models.
In this study, we propose a programming-by-example (PBE)-based data transformation method for feature engineering in machine learning. Data transformation by PBE is not new. However, the method proposed herein uses machine learning performance to guide the synthesis of a transformation rule from examples: the system first generates candidate rules, and then chooses the rule that achieves the highest performance on a target machine learning task. We tested this system on the Titanic dataset, and the results show that the proposed method can avoid worst-case performance compared to the original PBE method.
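A rough sketch of the selection step, assuming pandas and scikit-learn: two hypothetical candidate rules transform the Titanic `Name` column, and the rule with the best cross-validated accuracy on a downstream classifier is kept. The rules, tiny example data, and classifier choice are illustrative, not the paper's.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def extract_title(name):          # "Braund, Mr. Owen" -> "Mr"
    return name.split(",")[1].split(".")[0].strip()

# Hypothetical candidate rules a PBE engine might synthesize from one example.
candidate_rules = {
    "name_length": lambda s: s.str.len(),
    "title_code": lambda s: s.map(extract_title).astype("category").cat.codes,
}

def pick_best_rule(df, target_col="Survived"):
    """Keep the candidate transformation that maximizes downstream accuracy."""
    scores = {}
    for rule_name, rule in candidate_rules.items():
        X = pd.DataFrame({"feature": rule(df["Name"])})
        scores[rule_name] = cross_val_score(
            RandomForestClassifier(n_estimators=50, random_state=0),
            X, df[target_col], cv=3).mean()
    return max(scores, key=scores.get), scores

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina",
             "Allen, Mr. William", "Moran, Mr. James", "Bonnell, Miss. Elizabeth"],
    "Survived": [0, 1, 1, 0, 0, 1],
})
print(pick_best_rule(df))
```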
People suffering from laryngeal cancer might need to have their larynx removed, which leads to speech impairment. Three traditional methods of voice restoration are available, but patients still compensate for the voice loss using non-verbal communication, writing, and assistive devices. Novel intelligent user interfaces are being developed as voice restoration devices, but no consensus on measures used to assess their effectiveness, nor on their integration with other devices, has been established yet. This poster is an attempt to develop an evaluation framework, for both traditional and novel interfaces, with a focus on the complex and multichannel aspects of communication. Ethnographic field studies are proposed for investigating the holistic process of communicative interaction in its highly individualized forms. Results of a pilot study are presented in order to demonstrate the adequacy and necessary developments of the approach.
Modern election news reporting tends to focus on who is winning and on campaign strategies - what is often called "Horse Race" coverage. With the ultimate goals of better understanding the mix of election story types presented by different venues, helping people to understand their own news consumption, and recommending stories with more useful content, we explore methods for automatic classification of election news stories. We also describe a plugin that recognizes "Horse Race" articles, and recommends a similar article without the frame.
The digitization of museum exhibits has raised the question of how to make these data accessible, particularly in light of the ever-growing collections becoming available. In this demo, we present the VIRTUE system, which allows curators to easily set up virtual museum exhibitions of static and dynamic 2D (paintings, photographs, videos, etc.) and 3D artifacts. Visitors may navigate through the virtual rooms, inspect the artifacts, and interact with them in novel ways. Participants will be able to use the system by creating their own exhibitions, which they can then tour as a visitor.
Effective personalization of web experiences consists of matching user intent to content while optimizing a set of engagement metrics and catering to diverse consumption mediums. The diversity of user interests necessitates automation in the process of constructing personalized content experiences. We propose a genetic algorithm based framework that chooses a set of content items relevant to a target user and decides their respective sizes and relative positions to construct a layout. This layout is designed to simultaneously optimize a chosen engagement metric subject to diversity of the information presented. Comparisons against existing frameworks based on human annotations show improved prominence of key content and increased diversity of the content presented.
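A toy sketch of the genetic-algorithm idea (not the proposed framework): candidate layouts are slot assignments of content items, fitness trades off slot-weighted engagement against a topic-diversity penalty, and the population evolves by selection and mutation; all sizes, weights, and data are made up.

```python
import random

ITEMS = 20          # candidate content items
SLOTS = 5           # layout positions, ordered by prominence
engagement = [random.random() for _ in range(ITEMS)]       # assumed per-item metric
topic = [random.randrange(4) for _ in range(ITEMS)]        # assumed topic label

def fitness(layout):
    """Engagement weighted by slot prominence, penalised for topic repetition."""
    prominence = [1.0, 0.8, 0.6, 0.4, 0.2]
    score = sum(engagement[i] * w for i, w in zip(layout, prominence))
    diversity_penalty = SLOTS - len({topic[i] for i in layout})
    return score - 0.3 * diversity_penalty

def evolve(generations=200, pop_size=40):
    pop = [random.sample(range(ITEMS), SLOTS) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            child[random.randrange(SLOTS)] = random.randrange(ITEMS)  # mutation
            children.append(child if len(set(child)) == SLOTS else parent[:])
        pop = survivors + children
    return max(pop, key=fitness)

print(evolve())
```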
We build conversational agents to serve as AI interviewers who engage a user in a one-on-one, text-based conversation. Our live demos showcase two special skills of an AI interviewer: (a) the ability to actively listen to a user during an interview, responding to complex and diverse user input empathetically; and (b) the ability to automatically infer the user's Big 5 personality traits from the interview. We provide an overview of the technologies that enable these two abilities and of real-world applications of such AI interviewers.
Smart systems that apply complex reasoning to make decisions and plan behavior are often difficult for users to understand. While research to make systems more explainable and therefore more intelligible and transparent is gaining pace, there are numerous issues and problems regarding these systems that demand further attention. The ExSS 2019 workshop is a follow-on from the very successful ExSS 2018 workshop previously held at IUI, to bring academia and industry together to address these issues. This workshop includes a keynote, paper panels, poster session, and group activities, with the goal of developing concrete approaches to handling challenges related to the design and development of explainable smart systems.
The third workshop on Theory-Informed User Modeling for Tailoring and Personalizing Interfaces (HUMANIZE) took place in conjunction with the 24th annual meeting of the intelligent user interfaces (IUI) community in Los Angeles, CA, USA on March 20, 2019. The goal of the workshop was to attract researchers from different fields by accepting contributions on the intersection of practical data mining methods and theoretical knowledge for personalization. A total of six papers were accepted for this edition of the workshop.
The workshop focus is on Algorithmic Transparency (AT) in Emerging Technologies. Naturally, the user interface is where and how the Algorithmic Transparency should occur and the challenge we aim at is how intelligent user interfaces can make a system transparent to its users.
This half-day workshop seeks to bring together practitioners and academics interested in the challenges of structuring interactions for subject matter experts (SMEs) who are providing knowledge and/or feedback to an AI system but are not well-versed in the underlying algorithms. Since the information provided by SMEs directly affects the efficacy of the final system, collecting the correct data is a problem that navigates issues ranging from curating data that may be tainted to structuring data collection tasks in such a way as to mitigate user boredom. The goal of this workshop is to discuss methods and new paradigms for productively interacting with users while collecting knowledge.
Conversational agents are becoming increasingly popular. These systems present an extremely rich and challenging research space for addressing many aspects of user awareness and adaptation, such as user profiles, contexts, personalities, emotions, social dynamics, conversational styles, etc. Adaptive interfaces are of long-standing interest for the HCI community. Meanwhile, new machine learning approaches are introduced in the current generation of conversational agents, such as deep learning, reinforcement learning, and active learning. It is imperative to consider how various aspects of user-awareness should be handled by these new techniques. The goal of this workshop is to bring together researchers in HCI, user modeling, and the AI and NLP communities from both industry and academia, who are interested in advancing the state-of-the-art on the topic of user-aware conversational agents. Through a focused and open exchange of ideas and discussions, we will work to identify central research topics in user-aware conversational agents and develop a strong interdisciplinary foundation to address them.
Digital music technology plays a central role in all areas of the music ecosystem. For both music consumers and music producers, intelligent user interfaces are a means to improve access to sound and music. The second workshop on Intelligent Music Interfaces for Listening and Creation (MILC) provides a forum for the latest developments and trends in intelligent interfaces for music consumption and production by bringing together researchers from areas such as music information retrieval, recommender systems, interactive machine learning, human-computer interaction, and composition.
The 2nd workshop on User Interfaces for Spatial-Temporal Data Analysis (UISTDA2019) took place in conjunction with the 24th Annual Meeting of the Intelligent User Interfaces community (ACM IUI 2019) in Los Angeles, USA on March 20, 2019. The goal of this workshop is to share the latest progress and developments, current challenges, and potential applications for exploring and exploiting large amounts of spatial and temporal data. Four papers and a keynote talk were presented in this edition of the workshop.
As IoT devices begin to permeate our environment, our interaction with these devices is starting to have a real potential to transform our daily lives. Therefore, there exists an incredible opportunity for intelligent user interfaces to simplify the task of controlling such devices. The goal of the IUIoT workshop was to serve as a platform for researchers who are working towards the design of IoT systems from an intelligent, human-centered perspective. The workshop accepted a total of five papers: two position papers and three extended abstracts. These papers were presented by the authors and discussed among the workshop attendees with the aim of exploring future directions and improving existing approaches towards designing intelligent user interfaces for IoT environments.
This is the third edition of the Workshop on Exploratory Search and Interactive Data Analytics (ESIDA). This series of workshops emerged as a response to the growing interest in developing new methods and systems that allow users to interactively explore large volumes of data, such as documents, multimedia or specialised collections, such as biomedical datasets. There are various approaches to supporting users in this interactive environment ranging from the development of new algorithms through visualisation methods to analysing users' search patterns. The overarching goal of ESIDA is to bring together researchers working in areas that span across multiple facets of exploratory search and data analytics to discuss and outline research challenges for this novel area.
End-user programmable intelligent agents that can learn new tasks and concepts from users' explicit instructions are desired. This paper presents our progress on expanding the capabilities of such agents in the areas of task applicability, task generalizability, user intent disambiguation and support for IoT devices through our multi-modal approach of combining programming by demonstration (PBD) with learning from natural language instructions. Our future directions include facilitating better script reuse and sharing, and supporting greater user expressiveness in instructions.
Voice interaction has become an increasingly ubiquitous form of interaction: from the car and the smartphone to dedicated smart devices such as Amazon's Echo. My thesis will compare three popular smartphone voice assistants, Apple's Siri, Google's Assistant, and Microsoft's Cortana, to native GUI applications on an iPhone to identify the tasks where voice assistants are more effective. In addition, design patterns that voice interfaces employ to increase effectiveness will be identified.
In this Student Consortium submission, I outline the motivation, the current situation and the research questions of my PhD research on user interfaces for interacting with recommender systems, and particularly the interplay of different personal characteristics and different visualisation and interaction techniques.
In the future we expect automated vehicles to become a major part of everyday traffic. Along with this groundbreaking change in mobility, pedestrians are forced to interact with such technology. In mixed traffic situations (i.e., manual, semi-automated, and autonomous vehicles share a road), it might be crucial for non-motorized traffic participants to know which entity is in control. For example, when considering crossing a road, the degree of automation and the presence of human drivers could influence the decision. Moreover, it is not clear whether conventional communication channels such as turn signals and brake/reversing lights meet the challenges of autonomous traffic. I expect that interaction between automated vehicles and pedestrians includes safety-critical challenges which are directly related to the acceptance and success of the emerging technology. I want to contribute to the future of autonomous mobility by providing design guidelines on how to support pedestrians in their decision-making process in mixed traffic. Furthermore, I want to explore new designs for human-vehicle communication.
Autism Spectrum Disorder (ASD) is a developmental disorder characterized by communication impairments and repetitive behaviors. Children with autism have problems in social interactions because of their deficits in language, communication, and emotion expression, and because of sensory impairments. The research aims at developing an autonomous embodied conversational agent (ECA) capable of enhancing their abilities and overcoming everyday-life limitations, training social skills and promoting self-sufficiency. The tool will be designed to be customizable and flexible enough, in terms of interaction channels and learning content, to be adopted by any child, addressing their specific needs. With this purpose in mind, this technology may mitigate everyday limitations such as poor verbal communication and a lack of self-sufficiency.
This research investigates artificial Speech Emotion Recognition from both the semantics and the pitch of the voice and aims at exploring its application in Conversational Technology to support people with Neurodevelopmental Disorders (NDD). NDD is a group of conditions with onset in the developmental period that are characterized by severe deficits in the cognitive, emotional, and motor areas. The challenge of the project is to develop datasets, models, and algorithms to create conversational tools for individuals with NDD to help them enhance communication skills, particularly emotion recognition and expression.
We propose an online virtual agent based communication skills training program for oncologists to promote better prognosis understanding. We first develop machine learning techniques to identify different characteristics of effective communication automatically from a dataset of 382 conversations between oncologists and late stage cancer patients. We then use these techniques to give feedback through a virtual agent, which acts as a standardized patient.
With the rapid growth of IoT technologies, the demand to simplify the task of controlling privacy settings for IoT devices in a user-friendly manner becomes urgent. In this paper, we first present our existing work on recommending privacy settings for different IoT contexts. We use a data-driven approach to analyze how IoT users make decisions regarding the privacy settings of their devices, and we design intelligent user interfaces to reduce the complexity of managing privacy for IoT based on the insights gained from this analysis. We then present our plan to evaluate the user experience of the new interfaces.
Voice User Interfaces (VUIs) are growing in popularity. However, their lack of visual guidance challenges users in learning how to proficiently operate these systems. My research focuses on adapting a VUI's spoken feedback to suggest verbal commands to users encountering errors. Based on observations from my previous research, these suggestions adapt to the detected user proficiency and the predicted user goal to customize feedback to support user needs. The objectives of this technique are to guide users to 1) learn what verbal commands execute VUI actions and 2) learn the actions supported to accomplish desired tasks with the system.
There is a lack of diversity in the simulations and training that nursing students experience before making their way to a real hospital bedside. There is existing evidence of, and practical concern about, nurses' ability to communicate therapeutically, using techniques such as calming and de-escalation, when interacting with a diverse clientele, specifically in the way that they interact with patients. We look at leveraging augmented reality and natural language processing to build on existing simulation labs, giving what is now only a static and uniform simulation mannequin a face, colors, and, most importantly, ears and a voice, all for the sake of improved communication and patient satisfaction once this learning is transferred into practice.
Laryngectomy is a surgical treatment that leads to voice loss. A novel approach to regaining communicative function after the operation is presented. In contrast to most technologies using artificial intelligence to restore the voice, it focuses on balancing the use of technology with human compensatory capabilities. A design process that consists of first-person participatory design and user interviews is presented. In the current phase, the proposed device integrates a wearable vibrating effector with a voice pitch controller, a voice amplifying set, and a dynamic voice conversion system, adding personal traits to the speech signal.
Artificial Intelligence (AI) research, including machine learning, computer vision, and natural language, requires large annotated datasets. The current research and development (R&D) pipeline involves each group collecting their own datasets using an annotation tool tailored specifically to their needs, followed by a series of engineering efforts in loading other external datasets and developing their own interfaces, often mimicking some components of existing annotation tools. In departure from the current paradigm, my research focuses on reducing inefficiencies by developing a unified web-based, fully configurable framework that enables researchers to set up an end-to-end R&D experience from dataset annotations to deployment with an application-specific AI backend. Extensible and customizable as required by individual projects, the framework has been successfully featured in a number of research efforts, including conversational AI, explainable AI, and commonsense grounding of language and vision. This submission outlines the various milestones to date and planned future work.