The computational modeling of groups requires models that connect micro-level with macro-level processes and outcomes. Recent research in computational social science has started from simple models of human behaviour and attempted to link them to social structures. However, these models make simplifying assumptions about human understanding of culture that are often not realistic and may limit their generality. In this paper, we present work on Bayesian affect control theory as a more comprehensive, yet highly parsimonious, model that integrates artificial intelligence, social psychology, and emotions into a single predictive model of human activity in groups. We illustrate these developments with examples from an ongoing research project aimed at the computational analysis of virtual software development teams.
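As a minimal illustration of the quantity at the heart of affect control theory, the sketch below computes classical ACT deflection, the squared distance between fundamental (culturally shared) and transient (situationally produced) EPA sentiments; Bayesian affect control theory generalizes this probabilistically. All sentiment values here are invented for illustration.

```python
import numpy as np

def deflection(fundamental, transient):
    """Classical ACT deflection: squared distance between fundamental
    EPA sentiments and transient impressions produced by an event."""
    f = np.asarray(fundamental, dtype=float)
    t = np.asarray(transient, dtype=float)
    return float(np.sum((f - t) ** 2))

# Hypothetical EPA (evaluation, potency, activity) profiles for an
# actor-behaviour-object event; values are illustrative only.
fundamental = [1.5, 1.2, 0.8,   # actor:     "developer"
               1.8, 1.0, 1.1,   # behaviour: "helps"
               1.4, 0.9, 0.6]   # object:    "teammate"
transient   = [1.1, 1.3, 0.7,
               0.9, 0.8, 1.4,
               1.2, 1.0, 0.5]

# Low deflection means the event confirms cultural expectations.
print(f"deflection = {deflection(fundamental, transient):.2f}")
```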
In this paper, we present the Group Affect and Performance (GAP) corpus, a publicly available dataset of thirteen small group meetings. The GAP corpus contains meeting audio, transcriptions, and annotations, together with measures of decision-making performance, group member influence, post-meeting satisfaction ratings, and demographics. We discuss all aspects of data collection and preparation, present preliminary analyses and findings concerning decision-making performance, group member influence, group member satisfaction, and additional meeting characteristics, and conclude with future directions. In creating and releasing this corpus, our goal is to stimulate research on the computational analysis of small group meetings and to supplement the relatively small amount of currently available group interaction data.
This paper presents a model for head and body pose estimation (HBPE) when labelled samples are highly sparse. The current state-of-the-art multimodal approach to HBPE uses matrix completion in a transductive setting to predict pose labels for unobserved samples. Building on this approach, the proposed method tackles HBPE when manually annotated ground-truth labels are temporally sparse. We posit that the state-of-the-art approach oversimplifies the temporal sparsity assumption by using Laplacian smoothing. Our final solution uses: i) Gaussian process regression in place of Laplacian smoothing, ii) head and body coupling, and iii) nuclear norm minimization in the matrix completion setting. The model is applied to the challenging SALSA dataset for benchmarking against the state-of-the-art method. Our formulation significantly outperforms the state of the art in this setting: with 5% of ground-truth labels as training data, head and body pose accuracies are approximately 62% and 70%, respectively. Beyond fitting a more flexible model to labels missing in time, our approach also loosens the head and body coupling constraint, allowing a more expressive model of the head and body poses typically seen during conversational interaction in groups. This provides a new baseline to improve upon for the future integration of multimodal sensor data for HBPE.
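A minimal sketch of the Gaussian-process-regression ingredient, assuming scikit-learn and a synthetic head-pan signal observed at roughly 5% of frames; the kernel choice and all numbers are illustrative, not the paper's configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical setup: head pan angle labelled at only 5% of 1000 frames.
rng = np.random.default_rng(0)
t_all = np.arange(1000.0)                       # frame timestamps
true_pan = 30 * np.sin(t_all / 120.0)           # synthetic ground truth (deg)
labelled = rng.choice(1000, size=50, replace=False)
t_obs = t_all[labelled].reshape(-1, 1)
y_obs = true_pan[labelled] + rng.normal(0, 2.0, size=50)

# A smooth kernel plus a noise term; hyperparameters are fit by
# maximum marginal likelihood inside scikit-learn.
kernel = RBF(length_scale=100.0) + WhiteKernel(noise_level=4.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(t_obs, y_obs)

# Posterior mean and uncertainty at every unlabelled frame.
mean, std = gp.predict(t_all.reshape(-1, 1), return_std=True)
print(f"mean absolute error: {np.abs(mean - true_pan).mean():.2f} deg")
```

Unlike Laplacian smoothing, the GP posterior also yields per-frame uncertainty (`std`), which is what makes the temporal model more flexible.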
We present experimental results on the task of automatically predicting group members' attitudes about the management of their meeting, based on linguistic and acoustic features derived from the meeting recordings and transcripts. The group members' attitudes were gathered from detailed post-meeting questionnaires. A key finding is that features of linguistic content by themselves yield poor prediction performance on this task; the best results come from combining acoustic and linguistic features in a multimodal prediction model. When attempting to automatically detect group member attitudes that may be manifested only subtly in language and behaviour, multimodal analysis is key.
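A minimal sketch of this kind of multimodal (early-fusion) prediction model, assuming scikit-learn; the feature sets, labels, and dimensions below are synthetic stand-ins for the meeting-derived features, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical per-member feature matrices: rows are meeting
# participants, columns are precomputed features per modality.
rng = np.random.default_rng(1)
n = 120
X_ling = rng.normal(size=(n, 50))   # e.g. lexical / n-gram features
X_acou = rng.normal(size=(n, 20))   # e.g. pitch, energy, speaking time
y = rng.integers(0, 2, size=n)      # attitude label from questionnaires

# Early fusion: concatenate modalities and fit a single classifier,
# then compare against each unimodal feature set.
X_multi = np.hstack([X_ling, X_acou])
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
for name, X in [("linguistic", X_ling), ("acoustic", X_acou),
                ("multimodal", X_multi)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:>10}: {acc:.2f}")
```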
A major challenge in computational social science is modelling and explaining the temporal dynamics of human communication. Understanding small group interactions can shed light on sociological and social-psychological questions about human communication. Previous work showed how Markov reward models can be used to analyse group interaction in meetings. We explore the potential of these models further by formulating queries over interactions as probabilistic temporal logic properties and analysing them with probabilistic model checking. For this study, we analyse a dataset taken from a standard corpus of scenario and non-scenario meetings and demonstrate the expressiveness of our approach in validating expected interactions and identifying patterns of interest.
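To make the Markov-reward idea concrete, the sketch below evaluates an expected cumulative reward over speaker-turn states, the quantity a probabilistic model checker such as PRISM computes for a query like `R=? [ C<=k ]`; the chain, rewards, and numbers are invented for illustration and are not taken from the corpus.

```python
import numpy as np

# Hypothetical 3-state Markov reward model over speaker turns
# (state 0: A speaks, state 1: B speaks, state 2: silence).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.4, 0.4, 0.2]])
r = np.array([1.0, 0.0, 0.0])   # reward 1 whenever A holds the floor

def expected_cumulative_reward(P, r, start, k):
    """Expected reward accumulated over k steps from a start state."""
    dist = np.eye(len(r))[start]    # one-hot initial distribution
    total = 0.0
    for _ in range(k):
        total += dist @ r           # reward collected this step
        dist = dist @ P             # advance the state distribution
    return total

# Expected seconds/turns A holds the floor in the next 20 steps,
# starting from silence.
print(expected_cumulative_reward(P, r, start=2, k=20))
```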
More and more companies place emphasis on communication skill when recruiting employees and adopt group discussion as part of their recruitment interviews. In our ongoing project, we aim to develop a training system that advises its users on improving the perceived communication skill they display during group discussion. Realizing this goal requires a conceptual unit of communicative behaviour and a template of communication style. In this paper, we propose the functional roles of participants in group discussion as this unit and report the results of our investigation of the issues involved in adopting functional roles in a training system. For functional roles to serve this purpose, they must fulfill the following requirements: the functional roles performed by participants influence the perception of communication skill; the temporal patterns of functional roles differ between participants with high and low communication skill; and functional roles can be detected automatically. The paper first introduces our definition of functional roles and shows that the distribution of these roles indeed shapes the impression of an individual participant's communication skill. In the second part of the paper, we assume that the current conversational situation can be represented by the combination of the participants' functional roles and analyse it accordingly. We analysed the relationship between transitions of conversational situations and communication skill, and then the patterns of the next role following the current situation for high- and low-skill participants. We found that participants who were more active in the discussion were generally evaluated as highly skilled. In the last part of the paper, we investigated the possibility of automatic classification of functional roles using machine learning techniques and low-level multimodal features, as sketched below.
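A minimal sketch of the automatic role-classification step, assuming scikit-learn; the role inventory, feature set, and labels are invented placeholders, not the paper's annotation scheme.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

# Hypothetical windowed features per participant-time-slice:
# speech activity, prosody, head movement, gaze, etc.
rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 30))
roles = np.array(["gatekeeper", "information-giver",
                  "information-seeker", "follower"])  # illustrative set
y = rng.choice(roles, size=n)    # stand-in functional-role annotations

# Cross-validated predictions from low-level multimodal features.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
pred = cross_val_predict(clf, X, y, cv=5)
print(classification_report(y, pred))
```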
Eye-gaze activity provides rich information about individuals' engagement in social interactions, and gaze is one of the strongest visual cues in face-to-face interaction. Previous studies have examined how eye gaze can be used to coordinate social interactions such as turn taking and identifying the focus of attention. In this study, we investigate the role of gaze during collaborative problem-solving tasks: specifically, how individuals perform different gaze activities when holding different team roles in pair programming, and whether any differences in eye gaze provide predictive insight into learning outcomes. We analyzed 40 students' eye-gaze activities, annotated second by second during a collaborative problem-solving task (~50 min on average). The results show that students' roles in the collaborative task have a significant relationship with their eye-gaze activities. Moreover, participants' gaze activities provide predictive insight into their post-test scores. These findings suggest that simple activity measures, such as the relative frequency of eye-gaze activities, can be very useful in understanding the collaborative process.
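A minimal sketch of how relative gaze-activity frequencies could predict post-test scores, assuming scikit-learn; the gaze categories and all data below are synthetic, not the study's measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-student relative frequencies of gaze targets
# (partner's face, own screen, partner's screen, elsewhere), summing to 1.
rng = np.random.default_rng(3)
n = 40
freqs = rng.dirichlet(alpha=[2, 5, 3, 2], size=n)

# Synthetic outcome: post-test score loosely tied to one gaze category.
post_test = 60 + 20 * freqs[:, 2] + rng.normal(0, 5, size=n)

# Cross-validated predictive value of the simple frequency features.
model = LinearRegression()
r2 = cross_val_score(model, freqs, post_test, cv=5, scoring="r2").mean()
print(f"cross-validated R^2: {r2:.2f}")
```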
Kid Space is a smart space for children, enabled by an innovative, centralized projection device that senses multimodal interactivity and intelligently projects augmented reality (AR) content across surfaces. Kid Space uses a visible agent to guide learning through play. Two preliminary studies evaluated Kid Space with children 5 to 8 years old. Study 1 showed that children engaged enthusiastically with the projected character during a math exercise and during physically active games. A parent questionnaire showed that parents valued Kid Space for learning and physical activity. Study 2 found that children engaged with a projected agent at a closer distance than with a television. Parents showed a preference for a projected AR agent over an agent on a television or a standard projection. Parents also showed a preference for an agent that demonstrated awareness of children's physicality in the space.
Automatic meeting summarization would reduce the cost of producing minutes during or after a meeting. Toward a method for extractive meeting summarization, we propose a multimodal fusion model that identifies the important utterances that should be included in meeting extracts of group discussions. The proposed model fuses audio, visual, motion, and linguistic unimodal models, each trained with a convolutional neural network approach. The verbal and nonverbal fusion model achieved an F-measure of 0.831. We also discuss the characteristics of the verbal and nonverbal models and demonstrate that they complement each other.
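The abstract does not spell out the fusion mechanism; the sketch below shows one common choice, weighted late fusion of unimodal probabilities, with invented scores and weights standing in for the outputs of the four unimodal CNN models.

```python
import numpy as np

def fuse(scores, weights=None):
    """Weighted late fusion of per-utterance unimodal probabilities."""
    s = np.asarray(scores, dtype=float)   # shape: (n_utterances, n_modalities)
    if weights is None:
        w = np.full(s.shape[1], 1.0 / s.shape[1])   # uniform weights
    else:
        w = np.asarray(weights, dtype=float) / np.sum(weights)
    return s @ w                          # fused importance per utterance

# Hypothetical importance probabilities from the audio, visual,
# motion, and linguistic models for two utterances.
unimodal = np.array([[0.9, 0.7, 0.4, 0.8],
                     [0.2, 0.3, 0.5, 0.1]])
fused = fuse(unimodal, weights=[1, 1, 1, 2])   # upweight linguistic model
extract = fused >= 0.5                         # keep utterances above threshold
print(fused, extract)
```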
Groups come in various sizes and can range from a social setting with multiple smaller groups to a business meeting. Each group type exhibits several kinds of interaction dynamics, such as bursts of activity in which participants rapidly exchange interaction events, e.g., quick back-and-forth speech. This paper investigates the representation and identification of interaction dynamics between participants in small group meetings using Parallel Episodes of speech events. We use meetings from the AMI corpus and identify characteristics that describe interaction dynamics. These characteristics can expedite the initial steps of analysis and provide an informed view of the interaction dynamics of a meeting.
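Parallel Episodes here refers to an episode-mining formalism; the sketch below is a deliberately simplified burst detector in that spirit, not the authors' algorithm, with invented thresholds and speech events.

```python
from collections import namedtuple

Event = namedtuple("Event", ["speaker", "start", "end"])  # times in seconds

def find_bursts(events, max_gap=1.0, min_events=4):
    """Group speech events into bursts: runs where consecutive onsets
    are at most `max_gap` apart and more than one speaker takes part."""
    events = sorted(events, key=lambda e: e.start)
    bursts, current = [], [events[0]]
    for ev in events[1:]:
        if ev.start - current[-1].start <= max_gap:
            current.append(ev)
        else:
            if len(current) >= min_events and len({e.speaker for e in current}) > 1:
                bursts.append(current)
            current = [ev]
    if len(current) >= min_events and len({e.speaker for e in current}) > 1:
        bursts.append(current)
    return bursts

# Illustrative quick back-and-forth between speakers A and B,
# followed by an isolated turn by C.
events = [Event("A", 0.0, 0.8), Event("B", 0.6, 1.2), Event("A", 1.3, 2.0),
          Event("B", 1.9, 2.4), Event("C", 10.0, 12.0)]
print([len(b) for b in find_bursts(events)])   # -> [4]
```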