We examine the concept and characteristics of “Augmented Reality Television” (ARTV) using a four-step investigation method consisting of (1) an analysis of commonly accepted perspectives on Augmented and Mixed Reality systems, (2) a literature survey of previous work on ARTV, (3) relevant connections with other areas of scientific investigation from TVX/IMX, such as Ambient Media, Interactive TV, and 3-D TV, and (4) the proposal of a conceptual framework for ARTV called the “Augmented Reality Television Continuum.” Our work comes at a moment when excitement and hype about the potential of AR for home entertainment have outpaced the rigorous analysis and clear-cut examination of concepts that should be the hallmark of any exact science. With this work, our goal is to draw the community’s attention toward the fundamentals and first principles of ARTV and to tease out its salient qualities on solid foundations.
Commercialisation of augmented reality (AR) devices has led to their growing application in domestic environments and leisure activities. One such domain is television, where AR is one of several technologies driving innovation (cf. Internet broadcasting, second-screen devices). We conduct a systematic literature review to quantify research at the intersection of AR and broadcast television. We identify six common themes and a set of cross-cutting design decisions. We distill this information into a design space incorporating six dimensions: abstraction, interaction, time, display, context, and editorial control. We provide methods to operationalise the dimensions to enable research and development of novel concepts, and through this generate six design guidelines to shape future activity at the intersection of AR and television.
Due to the rising popularity of streaming services, television networks are under pressure to keep the attention of younger audiences. Especially in the field of edutainment, platforms like YouTube or TED are serious competitors, requiring broadcasters to come up with novel ideas to engage viewers in their programs. In this work, we present the augmented reality (AR) SpaceStation application, designed to supplement the viewing of educational videos about the ISS. We evaluated users’ experience during interaction with the app in a within-subject user study (N = 31) and assessed their workload. During interaction with the SpaceStation app, participants experienced a higher workload than in a video-only condition; nonetheless, they considered AR a valuable and enjoyable addition. The paper concludes with a discussion from the perspectives of viewers, content creators, and hosts, and offers initial ideas on how to design television programs with AR content without creating information overload.
This paper explores how acoustically transparent auditory headsets can improve TV viewing by intermixing headset and TV audio, facilitating personal, private auditory enhancements and augmentations of TV content whilst minimizing occlusion of the sounds of reality. We evaluate the impact of synchronously mirroring select audio channels from the 5.1 mix (dialogue, environmental sounds, and the full mix), and of selectively augmenting TV viewing with additional speech (e.g. Audio Description, Director’s Commentary, and Alternate Language). For TV content, auditory headsets enable better spatialization and more immersive, enjoyable viewing; the intermixing of TV and headset audio creates unique listening experiences; and private augmentations offer new ways to (re)watch content with others. Finally, we reflect on how these headsets might bring more immersive augmented TV viewing experiences within reach of consumers.
Very large floor displays can promote engaging public experiences, but suffer from perspective-related warping, making it challenging to comprehend and interact with distal objects when standing on the display. We introduce a perspective-compensated view technique that maintains the relative size and shape of objects as they move away from the viewer, and explore the technique in SpaceHopper, a large-scale, floor-projected version of the game Asteroids. Players bounce on a hopper ball to control their ship in one of two control modes: bounce to repel or bounce to shoot. We evaluated SpaceHopper in a field experiment with 59 participants, finding that the perspective-compensated view yielded longer playing times (and higher scores) in the bounce-to-shoot mode. Bystanders were highly engaged with players and also seemed unaware of the perspective warping, suggesting that visually compensating for the interactor’s perspective does not adversely impact the enjoyment of passive participants and audience members, at least under some circumstances.
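The core idea of a perspective-compensated view, growing an object's physical footprint as it moves away so its apparent (foreshortened) size stays constant for the viewer, can be sketched with a simple geometric model. This is a hypothetical illustration under assumed parameters (`EYE_HEIGHT`, `REF_DIST`, `NOMINAL_SIZE`), not the SpaceHopper implementation:

```python
import math

EYE_HEIGHT = 1.6    # assumed viewer eye height in metres (hypothetical)
REF_DIST = 1.0      # reference distance at which the nominal size is correct
NOMINAL_SIZE = 0.5  # object length on the floor at the reference distance

def angular_size(dist, size, h=EYE_HEIGHT):
    """Angle (radians) subtended at the eye by a floor-aligned object of
    length `size` whose near edge is `dist` metres from the viewer."""
    return math.atan((dist + size) / h) - math.atan(dist / h)

# Target angle: how large the object appears at the reference distance.
TARGET = angular_size(REF_DIST, NOMINAL_SIZE)

def compensated_size(dist, h=EYE_HEIGHT):
    """Physical length that subtends TARGET at distance `dist`: distant
    objects are drawn larger so their apparent size stays constant."""
    return h * math.tan(math.atan(dist / h) + TARGET) - dist
```

The compensated length grows monotonically with distance, which is the visual effect the technique relies on: objects far across the floor no longer shrink toward the horizon from the player's viewpoint.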
Joint attention refers to the shared focal points of attention of occupants in a space. In this work, we introduce a computational definition of joint attention for the automated editing of meetings recorded in the multi-camera environments of the AMI corpus. Using extracted head pose and per-person headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based method that selects cameras at a fixed pacing using pose data, and (3) an algorithm in which an LSTM (long short-term memory) network, trained on expert edits, learns joint attention from both pose and audio data. The methods are evaluated qualitatively against the human edit, and quantitatively in a user study with 22 participants. Results indicate that LSTM-learned joint attention produces edits comparable to the expert edit, offering a wider range of camera views than the audio-based method, while being more generalizable than rule-based methods.
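The rule-based variant (method 2) can be sketched as a majority vote over head-pose targets with a minimum shot length enforcing the fixed pacing. This is a minimal sketch under assumptions: the frame rate, `MIN_SHOT_FRAMES`, and the input encoding are illustrative, not the paper's actual parameters:

```python
from collections import Counter

MIN_SHOT_FRAMES = 75  # fixed pacing: roughly 3 s at 25 fps (illustrative)

def rule_based_edit(gaze_targets):
    """gaze_targets: one entry per frame; each entry lists the participant
    IDs that the occupants' head poses point at in that frame.
    Returns the selected camera/participant ID for every frame."""
    edit, current, held = [], None, MIN_SHOT_FRAMES
    for frame in gaze_targets:
        # Camera candidate: the participant most heads are turned toward.
        majority = Counter(frame).most_common(1)[0][0] if frame else current
        # Only cut when the minimum shot length has elapsed.
        if majority != current and held >= MIN_SHOT_FRAMES:
            current, held = majority, 0
        edit.append(current)
        held += 1
    return edit
```

The pacing constraint is what distinguishes this from naive per-frame selection: brief glances do not trigger a cut, so the edit avoids rapid, distracting camera switches.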
User-generated content platforms curate their vast repositories into thematic compilations that facilitate the discovery of high-quality material. Platforms that seek tight editorial control employ people to do this curation, but this process involves time-consuming routine tasks, such as sifting through thousands of videos. We introduce Sifter, a system that improves the curation process by combining automated techniques with a human-powered pipeline that browses, selects, and reaches an agreement on what videos to include in a compilation. We evaluated Sifter by creating 12 compilations from over 34,000 user-generated videos. Sifter was more than three times faster than dedicated curators, and its output was of comparable quality. We reflect on the challenges and opportunities introduced by Sifter to inform the design of content curation systems that need subjective human judgments of videos at scale.
Augmented reality (AR) on smartphone devices allows people to interact with virtually placed objects anchored in the real world through the device’s viewport. Typically, smartphone AR interactions work with the device’s 2D touchscreen, disconnected from the modality and depth of the virtual objects. In this paper, we studied 15 participants’ preferences, performance, and cognitive load on a set of common smartphone tasks performed at two interaction depths (close-range and distant) with two touchless modalities (hand tracking and screen dwell). We find that distant AR interactions, strongly preferred by the participants, were significantly faster and took less cognitive effort. Within both interaction depths, the two modalities performed equally well. When designing touchless modalities on smartphones, we suggest using distant interactions when overall performance is the top priority; otherwise, hand tracking and screen dwell can serve as equally effective back-ups for each other.
Visually Induced Motion Sickness (VIMS), which occurs when the visual system detects motion that is not felt by the vestibular system, is a deterrent for first-time Virtual Reality (VR) users and can impact VR’s adoption rate. Constricting the field of view (FoV) has been shown to reduce VIMS, as it conceals optical flow in peripheral vision, which is more sensitive to motion. Additionally, several studies have suggested including visual elements (e.g., grids) consistent with the real world as reference points. In this paper, we describe a novel FoV-restriction technique that is dynamically controlled by the video’s precomputed optical flow and the participant’s runtime head direction, and evaluate it in a within-subjects study (N = 24) on a 360° video of a roller coaster. Based on a detailed analysis of the video and the participants’ experience, we provide insights into the technique’s effectiveness in reducing VIMS and discuss the role of optical flow in the design and evaluation of the study.
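The dynamic-control idea, tightening the FoV vignette when precomputed optical flow indicates strong peripheral motion, can be sketched as a clamped linear mapping. All thresholds and the FoV range below are illustrative assumptions, not the study's calibrated values:

```python
def fov_for_frame(flow_magnitude, flow_lo=0.5, flow_hi=5.0,
                  fov_min=60.0, fov_max=110.0):
    """Map a frame's mean optical-flow magnitude (hypothetical units)
    to an FoV diameter in degrees: more motion, tighter vignette."""
    # Normalise the flow magnitude into [0, 1] and clamp.
    t = (flow_magnitude - flow_lo) / (flow_hi - flow_lo)
    t = max(0.0, min(1.0, t))
    # Linearly interpolate: calm frames keep the full FoV.
    return fov_max - t * (fov_max - fov_min)
```

Because the flow is precomputed per frame, such a mapping can be evaluated cheaply at runtime and combined with head-direction input (e.g., weighting flow in the region the user is looking toward) without per-frame image analysis on the headset.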
Esports (competitive videogames) have grown into a global phenomenon with over 450m viewers and a 1.5bn USD market. Esports broadcasts follow a similar structure to traditional sports. However, due to their virtual nature, a large amount of detailed data is available about in-game actions that is not currently accessible in traditional sport. This provides an opportunity to incorporate novel insights about complex aspects of gameplay into the audience experience – enabling more in-depth coverage for experienced viewers, and increased accessibility for newcomers. Previous research has explored only a limited range of ways data could be incorporated into esports viewing (e.g. post-match data visualizations), and only a few studies have investigated how the presentation of statistics impacts spectators’ experiences and viewing behaviors. We present Weavr, a companion app that allows audiences to consume data-driven insights during and around esports broadcasts. We report on deployments at two major tournaments, which provide ecologically valid findings about how the app’s features were experienced by audiences and their impact on viewing behavior. We discuss implications for the design of second-screen apps for live esports events, and for traditional sports as similar data becomes available to them via improved tracking technologies.
Live streaming is a unique medium that merges different layers of communication by facilitating individual, group, and mass communication simultaneously. Streamers who broadcast themselves on live streaming platforms such as Twitch are their own media entity and face the challenge of managing interactions with many different types of online audiences beyond the translucent platform interfaces. Through qualitative interviews with 25 Twitch streamers, in this paper we share streamers’ practices of discovering audience composition, categorizing audience groups, and developing appropriate mechanisms to interact with them despite geographical, technological, and temporal limitations. We discuss streamers’ appropriation of real-time signals provided by these platforms as sources of information, and their dependence on both technology and voluntary human labor to scale their media entity. We conclude with design recommendations for streaming platforms to provide streamer-centric tools for audience management, especially for knowledge discovery and growth management.
Video production is a collaborative process involving creative, artistic and technical elements that require a multitude of specialised skill sets. This open-ended work is often marked by uncertainty and interpretive flexibility in terms of what the product is and should be. At the same time, most current video production tools are designed for single users. There is a growing interest, both in industry and academia, to design features that support key collaborative processes in editing, such as commenting on videos. We add to current research by unpacking specific forms of collaboration, in particular the social mechanisms and strategies employed to reduce interpretive flexibility and uncertainty in achieving agreements between editors and other collaborators. The findings contribute to the emerging design interest by identifying general design paths for how to support collaboration in video editing through scaffolding, iconic referencing, and suggestive editing.
This paper introduces a generic framework for object-based media (OBM) storytelling. Aiming to function as a complete end-to-end reference for authoring OBM narrative content – from conception to realization – it proposes an integrated model spanning three essential levels: conceptual, technological, and aesthetic. At the conceptual level, we introduce a set of abstractions that provide a unified reference for thinking about, describing, and analysing the interactive narrative structures of OBM content. Their recursive nature makes our model stand out in terms of its expressive power. These abstractions have direct one-to-one operational counterparts implemented in our production-independent authoring toolkit – Cutting Room. This ensures that any story designs conceived within the proposed conceptual model are directly realisable as OBM productions. This isomorphic relationship between the abstract concepts and their operationalisation is another distinguishing aspect of our overall proposition. We validated the model at the aesthetic level through the production of the interactive film What is Love?, experienced by over 900 people at the media art festival Mediale 2018 in York, UK, and evaluated through a dedicated questionnaire completed by 94 of them. As the foundations of OBM storytelling have not yet been established, we believe this paper constitutes a significant milestone in its development.
Recently, various over-the-top (OTT) streaming services as well as traditional broadcasters distribute a vast amount of content every day, allowing users to watch their favorite content at any time. Although users can choose from this abundance, different users often end up viewing the same content, mainly because most OTT streaming services implement recommendation systems. However, it is difficult for users to discover that they have viewed the same content, and to enjoy sharing their opinions or feelings about it, because they do not know when, and which, content the others viewed. To encourage such enjoyable experiences, we propose a system architecture that lets users identify the content they have both viewed, based on their viewing history data. Such viewing history data is currently collected and stored by hundreds of different services and companies. Our system architecture therefore adopts a user-centric data control model in which users collect and store their data on their own online storage and use it for their own purposes. If users were asked to disclose all of the raw data of their viewing history to each other, or to a third party, most would feel anxious, because the data often contains sensitive personal information. We therefore introduce a method using private set intersection (PSI), a cryptographic technique that lets users find the common elements of their viewing histories without revealing anything to each other except the elements in the intersection. We also demonstrate the feasibility of the architecture through use cases.
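The guarantee PSI provides can be illustrated with a toy Diffie–Hellman-style commutative-masking sketch. This is an assumption-laden teaching example, not the paper's protocol: the modulus and the hash-to-group step are drastically simplified and not cryptographically sound, and a real protocol would keep each party's exponent on its own device:

```python
import hashlib
import secrets

# Illustrative 127-bit Mersenne prime modulus; far too small and
# structurally unsuitable for real deployments.
P = 2**127 - 1

def h2g(item):
    """'Hash to group': reduce a SHA-256 digest mod P (simplified)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def psi(history_a, history_b):
    """Return the intersection of two viewing histories, comparing only
    doubly-masked values rather than raw items."""
    a = secrets.randbelow(P - 2) + 1  # party A's secret exponent
    b = secrets.randbelow(P - 2) + 1  # party B's secret exponent
    # Each side masks its items; the other side applies its own mask.
    # pow(h, a*b, P) is reached in either order (exponentiation commutes).
    masked_a = {pow(pow(h2g(x), a, P), b, P): x for x in history_a}
    masked_b = {pow(pow(h2g(y), b, P), a, P) for y in history_b}
    # Neither side sees the other's raw history; matches surface only
    # where the doubly-masked values coincide.
    return {x for m, x in masked_a.items() if m in masked_b}
```

The masking means a non-matching item never leaves a party's device in recoverable form, which is exactly the property that lets viewers compare histories without disclosing them wholesale.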
This paper reports on work being done towards an immersive and accessible reconstruction of historical and cultural heritage, focusing on theaters of the Early Modern period as a use case. In particular, the paper presents and discusses possibilities to enable: 1) the acoustical reconstruction of the virtual (lost) environments – beyond the graphical reconstruction of the buildings, elements, and performances; 2) effective interaction features and navigation within the virtual environment (e.g. by means of adaptive interfaces, guiding methods, and the insertion of Points of Interest); and 3) accessible experiences, by means of innovative and personalized presentation modes for access services, such as subtitles and audio description. For most of these aspects and features, proof-of-concept implementations are provided, and opportunities for future work are outlined. With an effective combination of all these contributions, the goal is to bring back valuable (both tangible and intangible) Cultural Heritage from the past, providing high benefits in relevant sectors such as Culture, Tourism, and Education.
This paper presents a Web service that supports the automatic generation of video summaries for user-submitted videos. The developed Web application decomposes the video into segments, evaluates the fitness of each segment to be included in the video summary and selects appropriate segments until a pre-defined time budget is filled. The integrated deep-learning-based video analysis and summarization technologies exhibit state-of-the-art performance and, by exploiting the processing capabilities of modern GPUs, offer faster than real-time processing. Configurations for generating video summaries that fulfill the specifications for posting on the most common video sharing platforms and social networks are available in the user interface of this application, enabling the one-click generation of distribution-channel-specific summaries.
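The selection step described above, scoring segments and adding them until a pre-defined time budget is filled, can be sketched as a simple greedy pass. The function and its inputs are hypothetical illustrations; the service's actual fitness model and selection logic are not specified here:

```python
def select_segments(segments, budget):
    """segments: list of (fitness_score, duration_seconds) tuples.
    Greedily take the highest-scoring segments that still fit within
    the remaining time budget."""
    chosen, used = [], 0.0
    # Visit segments from the best fitness score downward.
    for score, dur in sorted(segments, key=lambda s: -s[0]):
        if used + dur <= budget:
            chosen.append((score, dur))
            used += dur
    return chosen
```

Setting `budget` to a platform's maximum clip length is how distribution-channel-specific summaries (as offered in the application's UI) could be parameterised: the same scored segments, a different time budget per channel.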
We built a chatbot system, Audience Bot, that simulates an audience for novice live streamers to engage with while streaming. New live streamers on platforms like Twitch are expected to perform and talk to themselves, even when no one is watching. We ran an observational lab study on how Audience Bot assists novice live streamers as they acclimate to multitasking: simultaneously playing a video game while performing for a (simulated) audience.
This paper presents the design and evaluation of a localized crowd-sourced multimedia production and distribution system to enable today’s mobile producers of multimedia content to serve their content in real-time to other nearby users, while at the same time drastically reducing the bandwidth as compared to traditional crowd-sourced multimedia services. To achieve this, we created a modular system for many-to-many live production and distribution and deployed it within the media delivery platform developed in the FLAME project. The FLAME platform provides distributed edge computing as well as a programmable network infrastructure.
We performed a first trial of our system on the instance of the FLAME platform deployed in Bristol. The trial revealed that with our setup, 93% of the bandwidth can be saved, and latency can be kept as low as 3 s. The user evaluation further indicated that a substantial share of users (45%) enjoy producing content, an important prerequisite for prosumer apps to thrive.
360-degree videos have recently grown in popularity thanks to the popularization of virtual reality and its adoption by major online video streaming platforms. Since they are watched on a diverse array of interfaces (virtual reality headsets, computers, mobile devices, etc.), user behavior needs to be analyzed across all these devices. A 360-degree hypervideo production, a virtual tour around the streets of a city, is played in an experiment by 30 users on multiple devices through a web application that collects their behavior following the Experience API standard. The distribution of viewport centers when playing these 360-degree videos is analyzed and compared across device and video types, showing an evident difference in behavior when playing videos of different types.
While HCI remains vastly abundant in human- and land-centric applications, in this work we further explore the Human Computer Biosphere Interaction (HCBI) concept in aquatic settings. Building on existing techniques for prototyping geodesic domes, we design five domes themed on marine megafauna species, for on- and off-shore locations. We describe novel interaction concepts with and within such structures: (i) Turtle AR nesting, (ii) Bird XR watching, (iii) Dolphin acoustic swimming, (iv) Seal night scuba-diving, and (v) Whale projection mapping. We report the design of these interaquatic environments, which depict the ongoing concerns facing these marine megafauna species, discuss their feasibility, and suggest the research, implementation, and validation of all interaquatic domes planned in our future work.
Social VR should allow natural communication between users with high social presence, as if they were in the same room. One way to increase social presence is to add haptic interaction, allowing users, for example, to give each other a “high-five” or pass documents between them. In this paper, we present our web-based VR communication framework with an added haptic component to simulate touch. The goal of this framework is to enhance the VR communication experience and the exchange of social cues between users in VR. We describe our method for rendering haptic feedback within the web-based framework and evaluate the perceived quality of our system in a user survey (119 participants). Our proof-of-concept system was rated positively, with the haptic component enhancing the quality of the VR experience for 78% of the participants.
As cities become overpopulated and we explore the deep sea and outer space, humans will inevitably face living and working in small spaces that are visually monotonous. It is thus important to understand the psychological effects that enclosed spaces can have on people. In this study, we ran a between-subjects experiment with people working in windowless offices by installing a digital window that featured a nature landscape video. Quantitative measures of productivity and wellbeing before and after the intervention showed that having a digital window elevated mood and happiness, but did not have a significant effect on work productivity.
This work in progress reports on ongoing experimentation with machine learning approaches on time-series data, where the time series quantifies the success of content about a certain topic published on a certain digital channel over a past time period. The experiment tests how accurately predictive analytics can forecast the future success of a piece of media content published on the Web or a social media platform, according to its topics. Our intention is to enable new innovation in media organizations’ content publication strategies, where the choice of medium for a future publication can be informed by such predictive capabilities in order to maximize the content’s potential reach to a digital audience.
In the research project 5G-VICTORI, we investigate how fifth-generation mobile network technology (5G) can help improve video streaming services for train passengers. We use a local 5G-based data link between trains and 5G base stations in railway stations to load media assets onto a caching server in the train, from which passengers can stream media via the regular apps of video-on-demand (VoD) service operators. The paper gives an outlook on the challenges of delivering VoD services to trains via this type of cloud-edge platform: determining the relevant content to upload to the cache, enabling good quality of experience (QoE) for live streaming, and shaping WiFi traffic for fair allocation of network bandwidth to streaming clients.
Connected vehicles collect and share data by communicating with road infrastructure, with each other, the web, IoT systems, and with their occupants’ personal devices. Part of this data is presented to drivers via a multitude of interactive devices and systems. Thus, one challenge that arises in such a complex environment is effective and safe operation of the various interactive systems, e.g., the in-vehicle infotainment (IVI). In this paper, we present a synopsis of input modalities from the literature of automotive user interfaces (AutoUIs) for media consumption inside connected vehicles.
Common consumer behaviors, including watching TV broadcasts and performing daily activities, interact with one another. However, it is difficult to smoothly link broadcast and Internet services with real-world services and data. For example, when viewing interesting content on a broadcast, people often take notes or take pictures of the TV screen, yet they cannot easily use that viewing data and the watched content in other services when taking their next action. Nor can the service provider know whether its service was used because of the broadcast. To meet this challenge, we present a smooth collaboration between broadcast and a calendar application and examine its effects; we chose the calendar because it is a general-purpose application that records various activities. We developed a calendar application that allows users to register program schedules and watch TV programs with simple operations. Based on the user evaluations, we confirmed the following results: using broadcast program data as schedule data reminds users of TV programs; even for consumers whose viewing time is decreasing, viewing opportunities improve when the flow from program schedule registration to broadcast viewing is simplified; and linking the schedule and history data of programs and daily activities on the calendar may promote the use of both services.
TV experiences are often social, be it at-a-distance (through text) or in-person (through speech). Mixed Reality (MR) headsets offer new opportunities to enhance social communication during TV viewing by placing social artifacts (e.g. text) anywhere the viewer wishes, rather than being constrained to a smartphone or TV display. In this paper, we use VR as a test-bed to evaluate different text locations specifically for MR TV. We introduce the concepts of wall messages, below-screen messages, and egocentric messages in addition to state-of-the-art on-screen messages (i.e., subtitles) and controller messages (i.e., reading text messages on the mobile device) to convey messages to users during TV viewing experiences. Our results suggest that a) future MR systems that aim to improve viewers’ experience need to consider integrating a communication channel that does not interfere with viewers’ primary task, that is, watching TV, and b) independent of the location of text messages, users prefer to be in full control of them, especially when reading and responding to them. Our findings pave the way for further investigations of social at-a-distance communication in Mixed Reality.