Eye contact between individuals is particularly important for understanding human behaviour. To further investigate the importance of eye contact in social interactions, portable eye-tracking technology is a natural choice. However, the analysis of the resulting data can become quite complex. Scientists need data that is computed quickly and accurately, and the relevant data must be separated automatically to save time. In this work, we propose a tool called MutualEyeContact which excels at these tasks and can help scientists understand the importance of (mutual) eye contact in social interactions. We combine state-of-the-art eye tracking with machine-learning-based face recognition and provide a tool for the analysis and visualization of social interaction sessions. This work is a collaboration between computer scientists and cognitive scientists, combining the fields of social and behavioural science with computer vision and deep learning.
The number of near-infrared light-emitting diodes (LEDs) used in eye trackers is increasing to improve the accuracy and robustness of eye-tracking methods, and when multiple light sources are applied, it becomes necessary to determine the identifiers (IDs) of the individual LEDs. We therefore propose polarized near-infrared light emission for eye gaze estimation. We succeeded in determining the IDs of the LEDs using polarization information. In addition, we remove glints from the cornea to detect the pupil center correctly. We confirmed the effectiveness of polarized near-infrared light emission through evaluation experiments.
Detecting the point-of-gaze in the real world is a challenging problem in eye-tracking applications. The point-of-gaze is typically estimated using geometry constraints, and user-calibration is required. In addition, the distances of the focused targets in the real world are variable and large. Therefore, a calibration-free approach without geometry constraints is needed to estimate the point-of-gaze. Recent studies have investigated smooth pursuit eye movements (smooth pursuits) for human-computer interaction applications, and we consider that these smooth pursuits can also be employed in eye tracking. Therefore, we developed a method for estimating the point-of-gaze using smooth pursuits without any requirement for implicit or explicit user-calibration. In this method, interest points are extracted from the scene image, and the point-of-gaze is detected using those points that are strongly correlated with the eye movements. We performed a comparative experiment in a real environment and demonstrated the feasibility of the proposed method.
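The matching step in such a pursuit-based approach is essentially a correlation test between the eye movement signal and the trajectories of candidate interest points. As a rough sketch (not the authors' implementation; the interest-point names and trajectories below are hypothetical, and only the horizontal component is matched):

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length signals."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def best_matching_point(gaze_x, candidates):
    """Pick the moving interest point whose x-trajectory correlates
    most strongly with the (uncalibrated) horizontal eye movement."""
    return max(candidates, key=lambda name: pearson(gaze_x, candidates[name]))

gaze_x = [0, 2, 4, 6, 8, 10]              # pursued target, arbitrary units
candidates = {                             # hypothetical scene point tracks
    "car":     [5, 7, 9, 11, 13, 15],      # moves with the eye
    "sign":    [3, 3, 3, 3, 3, 3],         # static
    "cyclist": [9, 8, 6, 5, 3, 1],         # moves in the opposite direction
}
print(best_matching_point(gaze_x, candidates))  # car
```

In practice both the horizontal and vertical components would be correlated and the decision taken over a time window.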
In this work, we evaluate neural networks, support vector machines, and decision trees for the regression of the eyeball center and the optical vector based on the pupil ellipse. In the evaluation, we analyze both single ellipses and window-based approaches as input. Comparisons are made regarding accuracy and runtime. The evaluation gives an overview of the accuracy that can generally be expected with different models and numbers of input ellipses. A simulator was implemented to generate the training and evaluation data. For a visual evaluation, and to push the state of the art in optical vector estimation, the best model was applied to real data. This real data came from public datasets in which the ellipse is already annotated by an algorithm. The optical vectors on real data and the generator are made publicly available. Link to the generator and models.
In this work, we compare the use of convolution, binary, and decision tree layers in neural networks for the estimation of pupil landmarks. These landmarks are used for the computation of the pupil ellipse and have proven effective in previous research. The evaluated structure of the neural networks is the same for all layer types and as small as possible to ensure real-time applicability. The evaluations include the accuracy of the ellipse determination, based on the Jaccard index and the pupil center. Furthermore, the CPU runtime is considered to assess real-time usability. The trained models are also optimized using pruning to improve the runtime. These optimized nets are likewise evaluated with respect to the Jaccard index and the accuracy of the pupil center estimation. Link to the framework and models.
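The Jaccard index used in this kind of ellipse evaluation is simply the intersection-over-union of the predicted and annotated pupil regions. A minimal sketch, with hypothetical pixel sets standing in for rasterized ellipse masks:

```python
def jaccard_index(mask_a, mask_b):
    """Intersection-over-union of two pixel sets (e.g., rasterized ellipses)."""
    a, b = set(mask_a), set(mask_b)
    union = a | b
    if not union:
        return 1.0  # both masks empty: perfect agreement by convention
    return len(a & b) / len(union)

# Two overlapping rectangular "pupil" regions given as pixel coordinates
gt   = {(r, c) for r in range(2, 6) for c in range(3, 7)}   # hypothetical ground truth
pred = {(r, c) for r in range(3, 7) for c in range(3, 7)}   # hypothetical prediction
print(jaccard_index(gt, pred))  # 12 / (16 + 16 - 12) = 0.6
```

A Jaccard index of 1.0 means the prediction matches the annotation exactly; 0.0 means no overlap at all.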
Eye movement-based biometrics have been studied for over 15 years, but so far - to the authors’ knowledge - no commercial applications utilize this modality. There are many reasons for this, ranging from still-low accuracy to the problematic setup. One of the essential elements of this setup is calibration, as nearly every eye tracker needs to be calibrated before its first use. This procedure makes any authentication based on eye movement a cumbersome and lengthy process.
The main idea of the research presented in this paper is to perform authentication based on a signal from a cheap remote eye tracker but - contrary to the previous studies - without any calibration of the device. The uncalibrated signal obtained from the eye tracker is used directly, which significantly simplifies the enrollment process.
The experiment presented in the paper aims at protection from a so-called “lunchtime attack”, in which an unauthorized person starts using a computer, taking advantage of the absence of the legitimate user. We show that such an impostor may be detected by analyzing the signal obtained from the eye tracker while the user clicks objects on a screen with a mouse. The method relies on two assumptions: (1) users usually look at the point they click, and (2) the uncalibrated eye tracker signal differs between users.
It has been shown that after the analysis of nine subsequent clicks, the method achieves an Equal Error Rate lower than 15% and may be treated as a valuable and difficult-to-counterfeit supplement to classic face recognition and password-based computer protection methods.
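The Equal Error Rate reported above is the operating point at which the false-accept and false-reject rates coincide. A small illustrative sketch (toy similarity scores, not the paper's data; higher scores mean "more likely the legitimate user"):

```python
def equal_error_rate(genuine, impostor):
    """Sweep all observed score thresholds and return the point where the
    false-accept rate (FAR) and false-reject rate (FRR) are closest."""
    best_gap, eer = 2.0, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

genuine  = [0.9, 0.8, 0.5, 0.6]   # hypothetical scores of the legitimate user
impostor = [0.7, 0.4, 0.55, 0.3]  # hypothetical impostor scores
print(equal_error_rate(genuine, impostor))  # 0.25
```

A lower EER means better separation between legitimate users and impostors; the 15% figure above corresponds to an EER of 0.15.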
Several visual analytics (VA) systems are used for analyzing eye-tracking data because they synergize human-in-the-loop exploration with the speed and accuracy of the computer. In such VA systems, the choice of visualization techniques can afford the discovery of certain types of insight while hindering others. Understanding these affordances and hindrances is essential to designing effective VA systems. In this paper, we focus on two approaches for visualizing AOI transitions: the transition-based approach (exemplified by the radial transition graph, RTG) and the sequence-based approach (exemplified by the Alpscarf). We captured the insights generated by two analysts who individually used each visualization technique on the same dataset. Based on the results, we identify four phases of analytic activities and discuss opportunities for the two visualization approaches to complement each other. We point out design implications for VA systems that combine these visualization approaches.
Benign Paroxysmal Positional Vertigo (BPPV) is the most common cause of vertigo and dizziness. Patients with those symptoms can be diagnosed by the presence of a specific pattern of nystagmus during the Dix-Hallpike maneuver. However, almost half of the dizzy patients visiting the Emergency Department (ED) are misdiagnosed, leading to significant morbidity and high medical costs. This can be attributed to the lack of specialized expertise of front-line physicians and to the lack of validated automatic commercial devices and software for nystagmus detection and quantification. Here, we aim to enhance saccade detection and thereby improve automatic nystagmus quantification. The proposed method is evaluated on a nystagmus dataset recorded from patients in the ED as they undergo the Dix-Hallpike maneuver. Additionally, the proposed method is tested on a publicly available saccade dataset and compared with state-of-the-art eye movement detection methods.
We present methods for extracting corneal images and estimating pupil centers continuously and reliably using head-worn glasses that include two eye cameras. An existing CNN was modified to detect pupils in IR and RGB images, and stereo vision together with 2D and 3D models is used. We confirm the feasibility of the proposed methods through user study results, which show that the methods can be used in future real gaze estimation systems.
In this work, a novel method to estimate the ocular pose from electrooculography (EOG) signals is proposed. This method is based on an electrical battery model of the eye which relates the EOG potential to the distances between an electrode and the left/right cornea and retina centre points. In this work, this model is used to estimate the ocular angles (OAs), that is, the orientation of the two ocular globes separately. Using this approach, average cross-validated horizontal and vertical OA estimation errors of 2.91 ± 0.86° and 2.42 ± 0.58°, respectively, were obtained. Furthermore, we show how these OA estimates may be used to estimate the gaze angles (GAs) without requiring the distance between the subject’s face-plane and the target-plane, as previous work does. Using the proposed method, cross-validated horizontal and vertical GA estimation errors of 2.13 ± 0.73° and 2.42 ± 0.58°, respectively, were obtained, which compares well with the previous distance-based GA estimation technique.
Eyes-only interaction in HCI usually requires visual focusing to issue input commands. For complex items, however, input requests are mixed with visual inspection, increasing the likelihood of false positive entries. Recent research applied cognitive control of pupil dilations as an input mechanism that works independently of any fixation times. However, experiments were conducted exclusively under laboratory conditions. The present study investigates the potential of exerting control over pupil diameter in noisy environments. Participants explore various techniques in a controlled sequence of (real-time) feedback sessions. Thereafter, they follow a predefined course across indoor and outdoor stations to either induce dilations or perform a control task. Results indicate strong interindividual differences in performance. Outdoor pupil dynamics exhibit a high degree of variation, including a considerable number of unintended dilations. Accordingly, environmental control seems to be a necessary condition for exerting cognitive control over pupil diameter and enabling pupil-based interaction in HCI.
Many applications in eye tracking have been increasingly employing neural networks to solve machine learning tasks. In general, neural networks have achieved impressive results in many problems over the past few years, but they still suffer from the lack of interpretability due to their black-box behavior. While previous research on explainable AI has been able to provide high levels of interpretability for models in image classification and natural language processing tasks, little effort has been put into interpreting and understanding networks trained with eye movement datasets. This paper discusses the importance of developing interpretability methods specifically for these models. We characterize the main problems for interpreting neural networks with this type of data, how they differ from the problems faced in other domains, and why existing techniques are not sufficient to address all of these issues. We present preliminary experiments showing the limitations that current techniques have and how we can improve upon them. Finally, based on the evaluation of our experiments, we suggest future research directions that might lead to more interpretable and explainable neural networks for eye tracking.
The computational modelling of visual attention relies entirely on visual fixations that are collected during eye-tracking experiments. Although all fixations are assumed to follow the same attention paradigm, some studies suggest the existence of two visual processing modes, called ambient and focal. In this paper, we demonstrate the high discrepancy between focal and ambient saliency maps and propose an automatic method for inferring the degree of focalness of an image. This method opens new avenues for the computational modelling of saliency and its benchmarking.
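One established way to quantify ambient versus focal viewing from gaze data is the coefficient K of Krejtz et al., which contrasts fixation durations with subsequent saccade amplitudes. The following sketch illustrates that measure; the scanpath values are invented, and this is only one possible focalness measure, not necessarily the one proposed in the paper:

```python
from statistics import mean, stdev

def coefficient_k(durations, amplitudes):
    """Per-fixation ambient/focal coefficient K (after Krejtz et al.):
    standardized fixation duration minus standardized amplitude of the
    following saccade. K > 0 suggests focal, K < 0 ambient viewing."""
    md, sd = mean(durations), stdev(durations)
    ma, sa = mean(amplitudes), stdev(amplitudes)
    return [(d - md) / sd - (a - ma) / sa for d, a in zip(durations, amplitudes)]

# Hypothetical scanpath: short fixations with large saccades (ambient)
# surrounding long fixations with small saccades (focal)
durations  = [200, 220, 400, 500, 480, 210]   # ms
amplitudes = [8, 7, 1, 2, 1, 9]               # deg, saccade following each fixation
ks = coefficient_k(durations, amplitudes)
print([round(k, 2) for k in ks])
```

Averaging K over time windows then yields a focalness profile for an image or viewing session.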
Simultaneous head and eye tracking has traditionally been confined to the laboratory, and real-world motion tracking has been limited to measuring linear acceleration and angular velocity. Recently available mobile devices such as the Pupil Core eye tracker and the Intel RealSense T265 motion tracker promise to deliver accurate measurements outside the lab. Here, the researchers propose a hardware and software framework that combines both devices into a robust, usable, low-cost head and eye tracking system. The developed software is open source and the required hardware modifications can be 3D printed. The researchers demonstrate the system’s ability to measure head and eye movements in two tasks: an eyes-fixed head rotation task eliciting the vestibulo-ocular reflex inside the laboratory, and a natural locomotion task where a subject walks around a building outside of the laboratory. The resultant head and eye movements are discussed, as well as future implementations of this system.
Tools for eye-tracking data analysis are currently either provided as proprietary software by the eye-tracker manufacturers or published by researchers under licenses that are problematic for some use cases (e.g., GPL3). This has led to repeated re-implementation of the most basic building blocks, such as event filters, often resulting in incomplete, incomparable, and even erroneous implementations.
The Perception Engineer’s Toolkit is a collection of basic functionality for eye-tracking data analysis, dual-licensed under CC0 or MIT, which allows for easy integration, modification, and extension of the codebase. Methods for data import from different formats, signal pre-processing, and quality checking, as well as several event detection algorithms, are included. The processed data can be visualized as a gaze density map or reduced to key metrics of the detected eye movement events. The toolkit is programmed entirely in Python, utilizes high-performance matrix libraries, and allows easy scripting access to batch-process large amounts of data.
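As an illustration of the kind of event detection building block such a toolkit provides, here is a deliberately simplified velocity-threshold (I-VT) fixation filter in Python (a sketch, not the toolkit's actual implementation; sample data and threshold are invented):

```python
def ivt_fixations(xs, ys, ts, velocity_threshold):
    """Velocity-threshold (I-VT) event detection: consecutive samples whose
    point-to-point velocity stays below the threshold form one fixation.
    Returns (start_time, end_time) tuples."""
    fixations, start = [], None
    for i in range(1, len(ts)):
        dt = ts[i] - ts[i - 1]
        v = ((xs[i] - xs[i - 1]) ** 2 + (ys[i] - ys[i - 1]) ** 2) ** 0.5 / dt
        if v < velocity_threshold:
            start = i - 1 if start is None else start
        elif start is not None:
            fixations.append((ts[start], ts[i - 1]))
            start = None
    if start is not None:
        fixations.append((ts[start], ts[-1]))
    return fixations

xs = [0, 0.1, 0.2, 5, 5.1, 5.2]   # hypothetical gaze x (two stable clusters)
ys = [0, 0, 0, 0, 0, 0]
ts = [0, 10, 20, 30, 40, 50]      # ms
print(ivt_fixations(xs, ys, ts, velocity_threshold=0.05))  # [(0, 20), (30, 50)]
```

Real filters additionally enforce minimum fixation durations and handle gaps in the signal.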
The code is available at https://bitbucket.org/fahrensiesicher/perceptionengineerstoolkit
In this paper, we evaluate a synthetic framework for use in the field of gaze estimation employing deep learning techniques. The lack of sufficient annotated data could be overcome by the use of a synthetic evaluation framework, provided that it resembles the behavior of a real scenario. In this work, we use the U2Eyes synthetic environment with the I2Head dataset as a real benchmark for comparison, based on alternative training and testing strategies. The results show comparable average behavior between the two frameworks, although significantly more robust and stable performance is obtained with the synthetic images. Additionally, the potential of synthetically pretrained models for user-specific calibration strategies is demonstrated, with outstanding performance.
The majority of eye-tracking systems require user-specific calibration to achieve suitable accuracy. Traditional calibration is performed by presenting targets at fixed locations that form a certain coverage of the device screen. If simple regression methods are used to learn a gaze map from the recorded data, the risk of overfitting is minimal. This is not the case if the gaze map is formed using neural networks, as is often done in photosensor oculography (PSOG), which raises the question of how to carefully design the calibration procedure. This paper evaluates different approaches to parsing the calibration data, and the trade-off between collection time and performance as a function of grid density, in order to build a calibration framework for PSOG using a video-based simulation framework.
To analyze eye-tracking data, the viewed image is often divided into areas of interest (AOIs). However, the temporal dynamics of eye movements towards the AOIs are often lost, either in favor of summary statistics (e.g., proportion of fixations or dwell time) or significantly reduced by “binning” the data and computing the same summary statistic over each time bin. This paper introduces SPLOT: the smoothed proportion of looks over time method for analyzing eye movement dynamics across AOIs. SPLOT comprises a complete workflow, from visualization of the time course to performing statistical analysis on it using cluster-based permutation testing. The possibilities of SPLOT are illustrated by applying it to an existing dataset of eye movements of radiologists diagnosing a chest X-ray.
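The core quantity here, the proportion of looks per time sample, can be sketched as follows. The AOI labels and the moving-average smoother are illustrative stand-ins for SPLOT's actual data format and smoothing kernel:

```python
def proportion_of_looks(trials, aoi):
    """Per-time-sample proportion of trials whose gaze is on `aoi`."""
    n = len(trials)
    return [sum(t[i] == aoi for t in trials) / n for i in range(len(trials[0]))]

def smooth(series, half_window=1):
    """Simple moving-average smoother (stand-in for SPLOT's kernel)."""
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half_window), min(len(series), i + half_window + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

trials = [  # hypothetical per-sample AOI labels, one stream per trial
    ["lung", "lung", "heart", "heart", "heart"],
    ["heart", "lung", "heart", "heart", "lung"],
    ["lung", "lung", "lung", "heart", "heart"],
]
p = proportion_of_looks(trials, "heart")
print(p)
print(smooth(p))
```

The smoothed curve is what the cluster-based permutation test would then be run on, comparing conditions sample by sample.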
As virtual reality (VR) garners more attention for eye tracking research, knowledge of accuracy and precision of head-mounted display (HMD) based eye trackers becomes increasingly necessary. It is tempting to rely on manufacturer-provided information about the accuracy and precision of an eye tracker. However, unless data is collected under ideal conditions, these values seldom align with on-site metrics. Therefore, best practices dictate that accuracy and precision should be measured and reported for each study. To address this issue, we provide a novel open-source suite for rigorously measuring accuracy and precision for use with a variety of HMD-based eye trackers. This tool is customizable without having to alter the source code, but changes to the code allow for further alteration. The outputs are available in real time and easy to interpret, making eye tracking with VR more approachable for all users.
State-of-the-art appearance-based gaze estimation methods, usually based on deep learning techniques, mainly rely on static features. However, the temporal trace of eye gaze contains useful information for estimating a given gaze point. For example, approaches leveraging sequential eye gaze information when applied to remote or low-resolution image scenarios with off-the-shelf cameras are showing promising results. The magnitude of the contribution from the temporal gaze trace is yet unclear for higher resolution/frame rate imaging systems, in which more detailed information about an eye is captured. In this paper, we investigate whether temporal sequences of eye images, captured using a high-resolution, high-frame rate head-mounted virtual reality system, can be leveraged to enhance the accuracy of an end-to-end appearance-based deep-learning model for gaze estimation. Performance is compared against a static-only version of the model. Results demonstrate statistically significant benefits of temporal information, particularly for the vertical component of gaze.
Eye tracking is regarded as one of the key technologies for applications that assess and evaluate human attention, behavior, and biometrics, especially using gaze, pupillary, and blink behaviors. One of the challenges for the social acceptance of eye tracking technology, however, is the preservation of sensitive and personal information. To tackle this challenge, we employ a privacy-preserving framework based on randomized encoding to privately train a Support Vector Regression model on synthetic eye images for estimating human gaze. During the computation, none of the parties learns about the data or the result that any other party holds. Furthermore, the party that trains the model cannot reconstruct the pupil, blinks, or the visual scanpath. The experimental results show that our privacy-preserving framework is capable of working in real time with the same accuracy as the non-private version and could be extended to other eye-tracking-related problems.
Video-based eye trackers estimate gaze based on eye images/videos. As security and privacy concerns loom over technological advancements, tackling such challenges is crucial. We present a new approach to handling privacy issues in eye videos: replacing the identifiable iris texture with a different iris template in the video capture pipeline, based on the Rubber Sheet Model. We further apply image blending and median-value representations to demonstrate that videos can be manipulated without significantly degrading segmentation and pupil detection accuracy.
This paper introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction, dubbed “DAVE”, in conjunction with our efforts towards building an Audio-Visual Eye-tracking corpus named “AVE”. Despite the strong relation between auditory and visual cues in guiding gaze during perception, video saliency models consider only visual cues and neglect the auditory information that is ubiquitous in dynamic scenes. Here, we propose a baseline deep audio-visual saliency model for multi-modal saliency prediction in the wild. The proposed model is thus intentionally designed to be simple. A video-only baseline model is also developed on the same architecture to assess the effectiveness of the audio-visual model on a fair basis. We demonstrate that the audio-visual saliency model outperforms the video-only saliency models. The data and code are available at https://hrtavakoli.github.io/AVE/ and https://github.com/hrtavakoli/DAVE
Analyzing visual perception in scene images is dominated by two different approaches: 1.) eye tracking, which allows us to measure visual focus directly by mapping a detected fixation to the scene image, and 2.) saliency maps, which predict the perceivability of a scene region by assessing the emitted visual stimulus with respect to retinal feature extraction. One of the best-known algorithms for calculating saliency maps is GBVS. In this work, we propose a novel visualization method that generates a joint fixation-saliency heatmap. By incorporating a tracked gaze signal into the GBVS, the proposed method equilibrates fixation frequency and duration with the scene stimulus, and thus visualizes the rate at which the spectator extracts the visual stimulus.
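A joint fixation-saliency heatmap of this kind can be approximated by weighting a saliency map with duration-weighted fixation counts. The following sketch assumes a toy 2x2 saliency map and hypothetical fixations; the paper's actual GBVS integration is more involved:

```python
def joint_heatmap(saliency, fixations, shape):
    """Combine a saliency map with duration-weighted fixation counts
    into a joint map, normalized to [0, 1]."""
    h, w = shape
    fix_map = [[0.0] * w for _ in range(h)]
    for (row, col, dur) in fixations:
        fix_map[row][col] += dur
    joint = [[saliency[r][c] * fix_map[r][c] for c in range(w)] for r in range(h)]
    peak = max(max(row) for row in joint) or 1.0  # avoid division by zero
    return [[v / peak for v in row] for row in joint]

saliency = [[0.2, 0.8], [0.5, 0.1]]                   # toy GBVS-like map
fixations = [(0, 1, 300), (0, 1, 200), (1, 0, 100)]   # (row, col, duration in ms)
jm = joint_heatmap(saliency, fixations, (2, 2))
print(jm)
```

Cells that are both salient and frequently fixated dominate the joint map, while unfixated regions drop to zero regardless of their predicted saliency.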
Eye gaze promises to be a fast and intuitive way of interacting with technology. Importantly, the performance of a gaze selection paradigm depends on the eye tracker used: Higher tracking accuracy allows for selection of smaller targets, and higher precision and sampling rate allow for faster and more robust interaction. Here we present a novel approach to predict the minimal eye tracker specifications required for gaze-based selection. We quantified selection performance for targets of different sizes while recording high-fidelity gaze data. Selection performance across target sizes was well modeled by a sigmoid similar to a psychometric function. We then simulated lower tracker fidelity by adding noise, a constant spatial bias, or temporal sub-sampling of the recorded data while re-fitting the model each time. Our approach can inform design by predicting performance for a given interface element and tracker fidelity or the minimal element size for a specific performance level.
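The simulation idea above, degrading recorded gaze by adding noise and measuring how selection performance falls off, can be sketched as follows (toy parameters; the paper fits a sigmoid to measured performance rather than to simulated hit rates):

```python
import random
from math import exp

def sigmoid(size, threshold, slope):
    """Psychometric-style model: selection success rate vs. target size."""
    return 1.0 / (1.0 + exp(-slope * (size - threshold)))

def hit_rate(gaze_error_sd, target_radius, trials=10_000, seed=0):
    """Fraction of simulated selections landing inside a circular target
    when isotropic Gaussian noise of `gaze_error_sd` is added to gaze."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dx, dy = rng.gauss(0, gaze_error_sd), rng.gauss(0, gaze_error_sd)
        hits += (dx * dx + dy * dy) <= target_radius ** 2
    return hits / trials

for sd in (0.25, 0.5, 1.0):   # simulated tracker noise levels, degrees
    print(sd, hit_rate(sd, target_radius=1.0))
```

Re-fitting the sigmoid to such degraded hit rates for a range of target sizes is what lets one read off the minimal tracker fidelity needed for a desired performance level.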
Eye-tracking data often provide access to information about users’ strategies and preferences that extends beyond purely behavioral data. Thanks to modern eye-tracking technology, gaze can be tracked rather unobtrusively in real-world settings. Here we examine the usefulness of gaze tracking with a mobile eye-tracker for interface design in an industrial setting, specifically the operation of a food production line. We use a mock task that is similar in its interface usage to the actual production task in routine machine operation. We compare two interface designs to each other as well as two levels of user expertise. We do not find any effects of experience or interface type in the behavioral data - in particular, both user groups needed the same time to complete the task on average. However, gaze data reveal different strategies: users with high experience in using the interface spend significantly less time looking at the screen – that is, actually interacting with the interface – in absolute terms as well as expressed as a fraction of the total time needed to complete the task. This exemplifies how gaze tracking can be utilized to uncover user-dependent strategies that would not be accessible through behavioral data alone.
Recent studies have shown a number of procedural similarities between solving problems in mental and in physical rotation. Such similarities open up the interesting option of studying mental rotation indirectly through physical rotation, with the advantage that physical rotation processes can be observed much more easily than mental ones. To better assess where solution processes in mental and physical rotation differ, though, it is important to know what influence any specific interaction method in physical rotation has. We present results from a comparison of two such interaction methods: a one-handed, touch-based and a two-handed, ball-based method. Our analysis focuses on fixation durations and saccade amplitudes as proxies for mental load. Results show, importantly, that the choice of interaction method seems to matter little. We therefore suggest that the existing findings of past studies comparing mental to physical rotation are likely highly comparable, despite the different interaction techniques used.
Gaze gestures are a desirable technique for spontaneous and pervasive gaze interaction due to their insensitivity to spatial accuracy. Unfortunately, gaze gesture-based object selection using correlation coefficients is prone to low object selection accuracy due to the presence of noise. In addition, the effect of the various types of eye movements present in gaze gesture-based object selection has not been tackled properly. To overcome these problems, we propose a denoising method for gaze gesture-based object selection using a first-order IIR filter and an event detection method based on a Hidden Markov Model. The experimental results show that the proposed method yielded the best object selection accuracy. These results suggest that spontaneous gaze gesture-based object selection is feasible in the presence of noise and various types of eye movements.
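A first-order IIR low-pass filter of the kind mentioned is a one-line recurrence, y[n] = a*x[n] + (1-a)*y[n-1]. A minimal sketch with an invented gaze trace (the filter coefficient 0.3 is arbitrary, not the paper's setting):

```python
def iir_lowpass(samples, alpha):
    """First-order IIR low-pass filter: y[n] = alpha*x[n] + (1-alpha)*y[n-1].
    Smaller alpha means stronger smoothing but more lag."""
    out, y = [], None
    for x in samples:
        y = x if y is None else alpha * x + (1 - alpha) * y
        out.append(y)
    return out

noisy_gaze_x = [100, 104, 98, 150, 101, 99, 103]  # 150 is a noise spike
smoothed = iir_lowpass(noisy_gaze_x, alpha=0.3)
print(smoothed)
```

The spike at 150 is strongly attenuated in the output, which is exactly the property that helps a correlation-based gesture matcher ignore transient noise.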
In this work, we present a comparison between Android’s lock patterns for mobile devices (TouchLockPatterns) and an implementation of lock patterns that uses gaze input (GazeLockPatterns). We report the results of a between-subjects study (N=40), showing that for the same authentication interface layout, people employ comparable strategies for pattern composition. We discuss the pros and cons of adapting lock patterns to gaze-based user interfaces. We conclude with opportunities for future work, such as using data collected during authentication to calibrate eye trackers.
In the present study, we aim to explore whether and how well we can predict tasks based on eye movements in a virtual environment. We designed four different tasks in which participants had to align two cubes of different sizes. To determine where participants looked, we used a ray-based method to calculate the point-of-regard (POR) on each cube at each time point. Using leave-one-subject-out cross-validation, our model performed well with an f1-score of 0.51 ± 0.17 (chance level 0.25) in predicting the four alignment types. Results suggest that the type of task can be decoded based on the aggregation of PORs. We further discuss the implications of object size on task inference and thus set out an exciting roadmap for designing intention recognition experiments in virtual reality.
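The ray-based point-of-regard computation boils down to intersecting the gaze ray with the cubes' bounding volumes. A standard slab-test sketch (the head pose and cube bounds below are hypothetical):

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does a gaze ray intersect an axis-aligned cube?
    Returns the entry distance along the ray, or None if it misses."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            if not (lo <= o <= hi):
                return None  # ray parallel to this slab and outside it
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            t_near = max(t_near, min(t1, t2))
            t_far = min(t_far, max(t1, t2))
    if t_near > t_far or t_far < 0:
        return None
    return max(t_near, 0.0)

# Gaze ray from the origin straight ahead, unit cube two metres away
hit = ray_hits_aabb((0, 0, 0), (0, 0, 1), (-0.5, -0.5, 2.0), (0.5, 0.5, 3.0))
print(hit)  # 2.0
```

Casting the ray against every cube per frame and keeping the nearest hit yields the POR stream that the task classifier aggregates.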
The integration of eye-tracking sensors into next-generation AR glasses will increase usability and enable new interaction concepts. However, consumer AR glasses impose additional requirements on eye-tracking sensors, such as high integrability and robustness to ambient illumination. We propose a novel eye-tracking sensor based on the self-mixing interference (SMI) effect of lasers. As a consequence, our sensor, as small as a grain of sand, shows exceptional robustness against ambient radiation compared to conventional camera-based eye trackers. In this paper, we evaluate ambient light robustness under different illumination conditions for video-based oculography, conventional scanned-laser eye tracking, and the SMI-based sensor.
As readers of a language, we all agree to move our eyes in roughly the same way. Yet might there be hidden within this self-similar behavior subtle clues as to how a reader is understanding the material being read? Here we attempt to decode a reader’s eye movements to predict their level of text comprehension and related states. Eye movements were recorded from 95 people reading 4 published SAT passages, each followed by corresponding SAT questions and self-evaluation questionnaires. A sequence of 21 fixation-location (x,y), fixation-duration, and pupil-size features was extracted from the reading behavior and input to two deep networks (CNN/RNN), which were used to predict the reader’s comprehension level and other comprehension-related variables. The best overall comprehension prediction accuracy was 65% (cf. null accuracy = 54%), obtained by the CNN. This prediction generalized well to fixations on new passages (64%) from the same readers, but did not generalize to fixations from new readers (41%), implying substantial individual differences in reading behavior. Our work is the first attempt to predict comprehension from fixations using deep networks; we hope that our large reading dataset and our evaluation protocol will benefit the development of new methods for predicting reading comprehension by decoding gaze behavior.
A previous study demonstrated a freezing-like pattern of eye movements when participants could escape from aversive stimulation by a fast button press. Freezing of gaze was characterized by more centralized, fewer, and longer fixations. Here, we aimed to examine whether 1) visual exploration is also reduced when a subsequent threat reaction requires distributed attention and, if so, 2) whether this visual pattern is threat-specific. We measured gaze behavior while participants anticipated a certain, no, or a potential aversive stimulation (study 1) or reward (study 2) that could be avoided or gained, respectively, via a fast joystick movement towards an indicated display side. In study 1, results replicated a centralization of gaze when participants expected an avoidable shock. In study 2, we did not find this pattern. These findings indicate that freezing of gaze is robust even when a subsequent reaction requires spatial attention. Furthermore, these visual dynamics seem to be threat-specific.
The possibility of assessing reward expectations was examined using a poker game environment controlled by the experimenters. Subjects were asked to express their degree of expectation of obtaining a reward from their bets at two stages of each hand. Pupils dilated according to the extent of a subject’s reward expectation. In particular, as mean pupil sizes during the 2nd round of betting increased with the subject’s expectation of a reward, there were significant differences between the three grades of expectation. Pupil sizes also correlated with the change in expectation between the two rounds of each hand. Other indices, such as saccade frequency and microsaccade frequency, also responded to the expectation of a reward. These results provide evidence that oculomotor metrics can serve as an index of the level of reward expectation in a controlled card game environment.
A prototypical smooth pursuit-controlled candy dispensing machine was set up in a public area and evaluated regarding performance data, self-reported joy of use, learnability, perceived stress, and perceived usefulness. 359 sets of user data were collected from visitors ranging from eight to 75 years of age. The results show an overall high rate of successful interactions (89.8%), with no correlation between height, age, gender, or the use of corrective glasses or lenses and the ease and success of interaction. Incorrectly entered digits occurred for 36.2% of all participants, with half attributing the error to their own incorrect entry and half reporting that the system detected a false number. Users reported generally high joy of use and found the system easy to learn. Users indicated interest in using similar interaction technologies in public settings if privacy requirements, such as protection from observers, are met.
Bicyclists face the risk of being involved in collisions with motor vehicles on roadways. This experiment used eye-tracking technology to better understand how drivers notice bicyclists in daylight. Participants wearing a head-mounted eye tracker searched for bicyclists as they were driven along an open-road route that included a test bicyclist during the daytime. Participants pressed buttons when they detected that a bicyclist might be present in or near the roadway and when they were confident that a bicyclist was present. Through the use of eye-tracking technology, this experiment provided a better understanding of the differences among the distances from which participants glance at, detect, and recognize bicyclists on daytime roadways.
Gaze estimation allows robots to better understand users and thus to more precisely meet their needs. In this paper, we are interested in gaze sensing for analyzing collaborative tasks and manipulation behaviors in human-robot interaction (HRI), which differs from screen gazing and other communicative HRI settings. Our goal is to study the accuracy that remote vision-based gaze estimators can provide, as they are a promising alternative to current accurate but intrusive wearable sensors. To this end, our contributions are: 1) we collected and make public a labeled dataset involving manipulation tasks and gazing behaviors in an HRI context; 2) we evaluate the performance of a state-of-the-art gaze estimation system on this dataset. Our results show a low default accuracy, which is improved by calibration, but also that more research is needed if one wishes to distinguish gazing at one object among a dozen on a table.
Adolescents with attention-deficit/hyperactivity disorder (ADHD) have difficulty processing speech with background noise due to reduced inhibitory control and working memory capacity (WMC). This paper presents a pilot study of an audiovisual speech-in-noise (SIN) task for young adults with ADHD compared to age-matched controls using eye-tracking measures. The audiovisual SIN task varies six levels of background babble, accompanied by visual cues. A significant difference between the ADHD and neurotypical (NT) groups was observed at a 15 dB signal-to-noise ratio (SNR). These results contribute to the literature on young adults with ADHD.
Faces are an important and salient stimulus in our everyday life. They convey social information and, consequently, attract our attention easily. Here, we investigate this face-attraction bias in detail and analyze the first fixations made in a free-viewing paradigm. We presented 20 participants with natural, head-centered, life-sized stimuli of indoor scenes, taken during unconstrained free viewing in a real-world environment. About 70% of first fixations were made on human faces, rather than human heads, non-human faces, or the background. This effect was present even though human faces constituted only about 5% of the stimulus area and occurred in a wide variety of positions. With a hierarchical logistic model, we identify behavioral and stimulus features that explain this bias. We conclude that the face-attraction bias replicates under more natural conditions and reflects high-level properties of faces, and we discuss its implications for the measurement of brain dynamics.
Developmental dyslexia is a reading disability estimated to affect between 5 and 10 percent of the population. Current screening methods are limited as they tell very little about the oculomotor processes underlying natural reading. Investigating eye-movement correlates of reading using machine learning could enhance the detection of dyslexia. Here we used eye-tracking data collected during the natural reading of 48 young adults (24 dyslexic, 24 control). We established a set of 67 features containing saccade-, glissade-, and fixation-related measures as well as reading speed. To detect participants with dyslexic reading patterns, we used a linear support vector machine with 10-fold stratified cross-validation repeated 10 times. For feature selection we used recursive feature elimination, and we also considered hyperparameter optimization, both with nested and regular cross-validation. The overall best model achieved 90.1% classification accuracy, while the best nested model achieved 75.75% accuracy.
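The classification pipeline described here, a linear SVM with recursive feature elimination evaluated by 10-fold stratified cross-validation repeated 10 times, can be sketched with scikit-learn. The synthetic data and the number of retained features below are illustrative stand-ins, not the study's actual dataset or settings:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 48 readers x 67 eye-movement features.
X, y = make_classification(n_samples=48, n_features=67, n_informative=10,
                           random_state=0)

# Linear SVM with recursive feature elimination, evaluated by
# 10-fold stratified cross-validation repeated 10 times.
model = make_pipeline(
    StandardScaler(),
    RFE(SVC(kernel="linear"), n_features_to_select=15, step=5),
    SVC(kernel="linear"),
)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)  # 100 accuracy values
print(f"mean accuracy: {scores.mean():.3f}")
```

Running feature selection inside the cross-validation loop, as the pipeline does here, is what nested validation formalizes; selecting features on the full dataset first would yield optimistically biased accuracy estimates.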
Data rich environments rely on operator collaboration to manage workload changes. This work explores the relationship between operators’ visual attention and collaborative performance during these workload changes. Percent gaze overlap and percent recurrence were calculated over time for best and worst performing pairs of participants who experienced low and high workload in an unmanned aerial vehicle command and control testbed. It was found that the best performing pairs had higher values for both metrics after workload changed. These results suggest successful collaborative performance is dependent on both continuous high levels of synchronized visual attention and coordinated sequences of visual attention. This work has the potential to inform the design of real-time technology.
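Percent gaze overlap, one of the two metrics used here, can be operationalized as the share of time-aligned samples in which both operators attend the same area of interest (AOI). A minimal sketch under that assumption (the AOI names and the handling of invalid samples are illustrative, not the study's exact definition):

```python
def percent_gaze_overlap(aois_a, aois_b):
    """Share of time-aligned samples where both operators fixate the same AOI.
    None marks samples with no valid gaze."""
    valid = [(a, b) for a, b in zip(aois_a, aois_b)
             if a is not None and b is not None]
    if not valid:
        return 0.0
    return 100.0 * sum(a == b for a, b in valid) / len(valid)

# Two operators' AOI streams sampled at the same timestamps.
op1 = ["map", "map", "status", "chat", None, "map"]
op2 = ["map", "status", "status", "chat", "map", "map"]
print(percent_gaze_overlap(op1, op2))  # → 80.0
```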
Cognitive load is an important source of information in performance situations. One promising non-invasive method for assessing it is pupillometry. The Index of Pupillary Activity [IPA, Duchowski et al. 2018] performs a wavelet transformation on changes in pupil dilation to detect high frequencies. This index is inspired by the Index of Cognitive Activity [ICA, Marshall 2000]. The IPA value is the sum of peaks exceeding a predefined threshold. The present study shows that it appears reasonable to adapt this threshold to the task. Fifty-five participants performed a spatial thinking test with six difficulty levels and two simple fixation tasks. Six different IPA values resulting from different thresholds were computed. The distributions of these IPA values across the eight conditions were analyzed regarding their validity in indicating different levels of cognitive load, corresponding to accuracy data. The analyses revealed that different thresholds are sensitive to different cognitive load levels. Counterintuitive results were also obtained.
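The thresholding step at the core of the IPA can be illustrated without the wavelet machinery: detect peaks in a high-frequency component of the pupil signal and count those exceeding a threshold. The sketch below substitutes a crude first-difference for the wavelet transform and uses made-up sample values, so it only illustrates why the threshold choice matters, not the actual IPA computation:

```python
def count_peaks_above(signal, threshold):
    """Count local maxima of the high-frequency component exceeding threshold.
    Stand-in for the wavelet-based peak detection of the IPA."""
    # Crude high-pass via absolute first differences (not the wavelet step).
    diff = [abs(b - a) for a, b in zip(signal, signal[1:])]
    peaks = 0
    for i in range(1, len(diff) - 1):
        if diff[i] > threshold and diff[i] >= diff[i - 1] and diff[i] >= diff[i + 1]:
            peaks += 1
    return peaks

# Made-up pupil diameter samples in arbitrary integer units.
pupil = [300, 310, 305, 315, 310, 312, 350, 320, 321]
# A lower threshold flags more activity as peaks; tuning is task-dependent.
print(count_peaks_above(pupil, 5), count_peaks_above(pupil, 30))  # → 2 1
```

Because lowering the threshold admits smaller fluctuations as peaks, a single fixed threshold cannot be equally sensitive to all cognitive load levels, which motivates the task-dependent adaptation studied here.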
Previous research has shown that even the durations of second fixations reveal concealed knowledge of an object. This very early memory effect could potentially be useful in applied settings. However, in order to use this effect, it is necessary to understand the processes causing the early fixation-based memory effect and the context necessary to obtain it. In four experiments, we disentangled the contribution of a probability-sensitive orienting response (OR) from probability-insensitive recognition memory processes. The results showed that the early fixation-based memory effect only appeared if both processes were involved. Moreover, the feature that triggers the OR has to be task-relevant.
Egocentric videos offer an ecological approach to studying human gaze behaviour. We were interested in understanding what people look at while performing the natural task of navigating in urban environments. Is there a collective pattern among all participants, or are there substantial individual differences? To this end, we recorded egocentric video and gaze data from forty-three pedestrians. Here, we present this dataset, designed to benchmark future research. The content of the videos was examined with respect to the depth and category of attended objects. We observe noticeable individual differences in both factors. Following these criteria, individual gaze patterns form a number of clusters. Whether the unique signature of each cluster is based on low-level visual features or high-level cognitive characteristics remains to be explored.
Discrepancies between a physical image and its percept, known as optical illusions, have been extensively researched to elucidate perceptual and cognitive functions in psychology and information engineering. While some optical illusions, including the Rotating Snakes Illusion (RSI), depend on the gaze of observers, eye-movement information has not been actively used to measure this illusion. This study developed a method to quantitatively measure a spatially and temporally dependent illusion using a dynamic perceptive compensation system incorporating an eye-tracking device. An experiment comparing an illusory and a control image showed that the results for the illusory image depended only on the compensation time, while the results for the control image did not. Consequently, our method could appropriately measure the quantitative temporal dependence of the RSI. Furthermore, the results suggest that the compensation algorithm need only be considered within 500 ms when controlling the illusion.
Usability analysis plays a significant role in optimizing Web interaction by understanding the behavior of end users. To support such analysis, we present a tool to visualize gaze and mouse data of Web site interactions. The proposed tool provides not only the traditional visualizations with fixations, scanpath, and heatmap, but allows for more detailed analysis with data clustering, demographic correlation, and advanced visualization like attention flow and 3D-scanpath. To demonstrate the usefulness of the proposed tool, we conducted a remote qualitative study with six analysts, using a dataset of 20 users browsing eleven real-world Web sites.
In this paper, we present a novel marker-free method for identifying screens of interest when using head-mounted eye tracking for visualization in cluttered and multi-screen environments. We offer a solution to discerning visualization entities from sparse backgrounds by incorporating edge-detection into the existing pipeline. Our system allows for both more efficient screen identification and improved accuracy over the state-of-the-art ORB algorithm.
Visually exploring AOI transitions aggregated from a group of eye-tracked people is a challenging task. Many visualizations produce visual clutter or aggregate away the temporal or visit-order information in the data, hiding observers' task-solution strategies. In this paper we introduce the Sankeye technique, which is based on the visual metaphor of Sankey diagrams applied to eye movement data, hence the name Sankeye. The technique encodes the frequencies of AOI transitions into rivers and subrivers of varying thickness. The distributions of the AOI transitions are visually represented by splitting and merging subrivers in a left-to-right reading direction. The technique allows analysts to interactively adapt the number of predefined AOIs as well as the transition frequency threshold, with the goal of deriving patterns and insights from eye movement data.
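The river thicknesses in such a diagram derive from simple transition counts over AOI-labelled scanpaths. A minimal sketch of that aggregation step (the AOI names and scanpaths are illustrative):

```python
from collections import Counter

def aoi_transition_counts(scanpaths):
    """Count AOI-to-AOI transitions across all participants' scanpaths."""
    counts = Counter()
    for path in scanpaths:
        counts.update(zip(path, path[1:]))
    return counts

scanpaths = [
    ["title", "chart", "legend", "chart"],
    ["title", "chart", "chart"],  # a repeated AOI counts as a self-transition
]
counts = aoi_transition_counts(scanpaths)
print(counts[("title", "chart")])  # transition frequency → river thickness
```

A frequency threshold applied to these counts then drops rare transitions before rendering, which is the kind of interactive filtering a Sankey-style view of eye movement data relies on.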
Gaze tracking in 3D has the potential to improve interaction with objects and visualizations in augmented reality. However, previous research showed that subjective perception of distance varies between real and virtual surroundings. We wanted to determine whether objectively measured 3D gaze depth through eye tracking also exhibits differences between entirely real and augmented environments. To this end, we conducted an experiment (N = 25) in which we used Microsoft HoloLens with a binocular eye tracking add-on from Pupil Labs. Participants performed a task that required them to look at stationary real and virtual objects while wearing a HoloLens device. We were not able to find significant differences in the gaze depth measured by eye tracking. Finally, we discuss our findings and their implications for gaze interaction in immersive analytics, and the quality of the collected gaze data.
Visualization in virtual 3D environments can provide a natural way for users to explore data. However, interaction in augmented reality often requires arm and head movements, which can be tiring and strenuous. In an effort toward more user-friendly interaction, we developed a prototype that allows users to manipulate virtual objects using a combination of eye gaze and an external clicker device. Using this prototype, we performed a user study comparing four different input methods, of which head gaze plus clicker was preferred by most participants.
In this article, we introduce how eye-tracking technology might become a promising tool to teach programming skills, such as debugging, with ‘Eye Movement Modeling Examples’ (EMME). EMME are tutorial videos that visualize an expert's (e.g., a programming teacher's) eye movements during task performance to guide students' attention, e.g., as a moving dot or circle. We first introduce the general idea behind the EMME method and present studies that have shown promising initial results regarding the benefits of EMME for programming education. However, we argue that the instructional design of EMME varies notably across studies, as evidence-based guidelines on how to create effective EMME are often lacking. As an example, we present our ongoing research on the effects of different ways to instruct the EMME model prior to video creation. Finally, we highlight open questions for future investigations that could help improve the design of EMME for (programming) education.
When reading algorithms, expert programmers are often able to predict what the code should contain. On occasion, however, this ability may result in so-called proof-readers' errors, where the visual input is ignored and programmers process the code based on their own predictions. The goal of this study is to gain an initial understanding of how proof-readers' errors are reflected in the eye-movement parameters of an experienced programmer, and to search for parameters that may be indicative of proof-readers' errors in pseudocode reading. We applied a case-study approach to test the hypothesis that cognitive processing of notation, when read both with and without proof-readers' errors, results in similarities in terms of selected eye-movement measures. However, our experienced programmer turned out to be a critical case falsifying this hypothesis. In general, case studies with expert programmers are a rather novel approach in eye-tracking studies of programming, even though single cases of experts' eye movements are actively applied in the development of eye movement modelling examples. This study therefore also points to the importance of regarding expert examples not just as representatives of ‘expert reading’ in general, but also as unique cases worth closer investigation.
Code reviews are an essential part of quality assurance in modern software projects. But despite their great importance, they are still carried out in a way that relies on human skills and decisions. During the last decade, there have been several publications on code reviews using eye tracking as a method, but only a few studies have focused on the performance differences between experts and novices. To gain a deeper understanding of these differences, the following experiment was developed: it surveys expertise-related differences in the eye movements of experts, advanced programmers, and novices during the review of eight short C++ code examples, including correct and erroneous code. A sample of 35 participants (21 novices, 14 advanced and expert programmers) was recruited. A Tobii Spectrum 600 was used for data collection. Measures included participants' eye movements during the code review, demographic background data, and cued retrospective verbal comments on replays of their own eye-movement recordings. Preliminary results provide evidence of experience-related differences between participants. Advanced and expert programmers performed significantly better at error detection, and the eye-tracking data imply a more efficient reviewing strategy.
Some features of eye movement during the reading of program code were analysed in order to develop a procedure for assessing viewers' comprehension ability. A set of eye-movement data recorded while participants viewed the code of a programming project was used. While backward eye movement is natural under normal reading circumstances, this paper focuses on intentional eye movements against the usual reading direction while viewing blocks of code, and confirms the impact of their frequency on code comprehension. In examining the frequency of this reading behaviour, significant differences were found in both overall fixation times and mean saccade lengths between the two levels of code comprehension.
To better understand code comprehension and problem-solving strategies, we conducted an eye-tracking study in which 51 undergraduate computer science students solved six pseudocode program comprehension tasks. Each task required students to order a sequence of pseudocode statements necessary to correctly solve a programming problem. We compare the viewing patterns of computer science students to evaluate changes in behavior while participants solve problems of varying difficulty. The intent is to find out whether gaze patterns are similar prior to solving the task and whether this pattern changes as the problems get more difficult. The findings show that as difficulty increases, regressions between areas of interest also tend to increase. Furthermore, an analysis of clusters of common viewing patterns was performed to identify groups of participants sharing similar gaze patterns prior to selecting their first choice of answer. Future work includes investigating the relationship of these patterns with other background information (such as gender, age, English language proficiency, course completion) as well as performance (score, duration of task completion, competency level).
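Regressions between areas of interest can be counted from the ordered AOI sequence of fixations. The sketch below assumes AOIs are indexed in reading order and counts transitions that jump back to an earlier AOI; this indexing is an assumption for illustration, not the study's exact operationalization:

```python
def count_regressions(aoi_sequence):
    """Count transitions that jump back to an earlier AOI (a lower index
    in assumed reading order) than the current one."""
    return sum(1 for cur, nxt in zip(aoi_sequence, aoi_sequence[1:])
               if nxt < cur)

# AOI indices of consecutive fixations while ordering pseudocode statements.
fixations = [0, 1, 2, 1, 3, 4, 2, 5]
print(count_regressions(fixations))  # → 2
```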
Studies of eye movements during source code reading have supported the idea that reading source code differs fundamentally from reading natural text. This paper analyzes an existing data set of natural-language and source code eye-movement data using the E-Z reader model of eye-movement control. The results show that the E-Z reader model can be used with natural text and with source code, where it provides good predictions of eye-movement durations. This result is confirmed by comparing model predictions to eye-movement data from this experiment and calculating the correlation score for each metric. Finally, it was found that gaze duration is influenced by token frequency in code as in natural text. The frequency effect is less pronounced for first fixation duration and single fixation duration. An eye-movement control model for source code reading may open the door for tools in education and industry to enhance program comprehension.
Collaborative learning with educational games on multi-touch tabletop devices opens up opportunities, challenges, and questions that push contemporary learning analytics approaches to their limits. Multi-modality might help here, and eye tracking is a promising data source in the effort to better understand learners' behaviour. This article describes our previous work on serious games for large multi-touch tabletop displays, developed to teach computer science theory in a collaborative and entertaining way, along with our previous research efforts and the challenges and obstacles we met on the way. Eye tracking will improve our understanding of learners' behaviour while they are not interacting with the game, enhance the construction of coherent learner models, and might even provide a subtle means of control over a medium of public interaction. The benefits of our work can be used to enhance the game mechanics and support shy students or students with disabilities. We present our plan of action, which follows a design-based research approach and includes the motivation for our work and our short- and long-term goals.
Eye movements indicate visual attention and strategies during game play, regardless of whether in board, sports, or computer games. Additional factors such as individual vs. group play and active playing vs. observing game play further differentiate application scenarios for eye movement analysis. Visual analysis has proven to be an effective means to investigate and interpret such highly dynamic spatio-temporal data. In this paper, we contribute a classification strategy for different scenarios for the visual analysis of gaze data during game play. Based on an initial sample of related work, we derive multiple aspects comprising data sources, game mode, player number, player state, analysis mode, and analysis goal. We apply this classification strategy to describe typical analysis scenarios and research questions as they can be found in related work. We further discuss open challenges and research directions for new application scenarios of eye movements in game play.
As part of an experimental study aimed at evaluating the linguistic and paralinguistic factors that can influence the sense of immersion in an open-world video game, we partially opted for an eye-tracking data collection protocol. In doing so, various problems emerged in the course of the research, and we therefore report and analyze them in this article in order to provide useful feedback for further research. The first set of problems is of a technical nature and relates to the difficulty of collecting reliable eye-tracking data in an open and complex game environment. Our second concern is the difficulties that may appear depending on the morphological characteristics of the players. The third issue is the player's familiarity with the game and the experimental parameters. Lastly, we discuss some post-processing issues for the analysis. The reflections raised by these difficulties allow us to discuss some challenges for future oculometric research in complex video game environments.
It seems that controlling games with the eyes should be very intuitive and obvious. However, eye-controlled games have not become very popular yet. One of the reasons is, in our opinion, the necessity of eye-tracker calibration before every usage. This process is not very long, but it is inconvenient and requires focusing on a particular task. Moreover, sometimes the calibration fails and must be repeated.
According to our observations, even when the eye tracker is not calibrated for the specific user, there is some information in the registered signal that may be used to control a game. Of course, without the calibration, the eye tracker signal lacks accuracy and precision. However, it is acceptable for some types of games.
The main contribution of this paper is an assessment of the extent to which an uncalibrated eye-tracker signal may be used in a gaming environment. First, a simple experiment was prepared to verify whether gaze location and eye-movement direction may be estimated given only the uncalibrated signal. Then the idea was tested in a field study involving several hundred participants.
This paper investigates players' gaze behavior in different game difficulty settings to explore potential use cases of gaze-informed design interventions for future research activities. A comparative study was set up in which subjects played the game Pac-Man in three difficulty settings while their gaze behavior was recorded via an eye-tracking device. Several measures were employed, such as the current position of the player's gaze, the current position of the Pac-Man character, and the currently attended game object. While some game aspects did not show any significant results (e.g., the distance between Pac-Man and the gaze point), the time spent looking at one of Pac-Man's enemies revealed a highly significant effect of difficulty level. With these findings, we aim to inform designers and researchers about the pitfalls of using gaze as an analysis tool when studying challenge in games. Furthermore, the insights from our efforts provide the basis for future research activities that will use the obtained data to guide and support players in challenging game situations by visually augmenting objects located in the peripheral visual field.
Gaze-based interaction in Virtual Reality (VR) has been attracting attention recently due to rapid advances in eye-tracking technology in head-mounted displays. Since gaze is a natural and intuitive interaction modality for human beings, gaze-based interaction could enhance player experience in immersive VR games. Aiming assistance is a common feature in games to balance difficulty across player skills. Previous work has investigated different aim-assistance approaches and identified various shortcomings. We hypothesize that “bullet magnetism” is a promising technique for VR and could be enhanced by extending its functionality through players' gazes. In this paper, we present a gaze-based aiming assistance approach and propose a study design to evaluate its performance and player experience in a “Mexican-style” VR first-person shooter game.