A research group led by Professor Yoichi Sato of the University of Tokyo has developed a method for predicting the movement of a person's gaze with unprecedented accuracy from first-person video recorded by a head-mounted camera. The technology is expected to find use in a wide range of fields, such as passing down traditional skills and medical care.
To understand a person's behavior in detail, it is important to know when they are paying attention and to what. If gaze movement can be predicted from video alone, it becomes possible to know what a person is looking at, and how, without special equipment such as an eye tracker. It was already known that gaze movement depends strongly on the task the person is performing, yet existing methods for predicting gaze from first-person video (video recorded by a head-mounted camera) did not take this task dependency into account.
In this research, the group developed a method that predicts gaze position with high accuracy from first-person video by modeling task-dependent patterns of gaze movement, inspired by attention models used in deep-learning-based image analysis. Using this method, they succeeded in predicting when, and to what, a person shifts their gaze while cooking in a kitchen.
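The paper's actual model is a deep network that combines a saliency-prediction branch with a recurrent attention-transition module. Purely to illustrate the core idea described above, mixing bottom-up visual saliency with a task-dependent prior over where the gaze tends to move next, here is a minimal sketch. The function name `predict_gaze`, the map `transition_prior`, and the simple linear mixing are assumptions made for illustration, not the authors' architecture.

```python
import numpy as np

def predict_gaze(saliency, transition_prior, alpha=0.5):
    """Combine a bottom-up saliency map with a task-dependent
    attention-transition prior into one gaze probability map,
    then return the most likely gaze position.

    saliency:         HxW map of bottom-up visual saliency
    transition_prior: HxW map encoding where the current task tends
                      to move the gaze next (in the paper this role
                      is played by a learned recurrent module)
    alpha:            mixing weight between the two cues (illustrative)
    """
    combined = alpha * saliency + (1.0 - alpha) * transition_prior
    prob = combined / combined.sum()  # normalize to a probability map
    y, x = np.unravel_index(np.argmax(prob), prob.shape)
    return (y, x), prob

# Toy example: saliency peaks at one location, the task prior at another.
saliency = np.zeros((4, 4)); saliency[0, 0] = 1.0
prior = np.zeros((4, 4)); prior[2, 3] = 1.0

# With the task prior weighted more heavily, the prediction follows the task.
(gy, gx), prob = predict_gaze(saliency, prior, alpha=0.3)
print(gy, gx)  # → 2 3
```

A purely bottom-up predictor would pick the saliency peak at (0, 0); incorporating the task-dependent prior moves the prediction to where the task is likely to direct the gaze next, which is the work dependency the press release describes.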
In an evaluation on standard benchmark datasets used in first-person video analysis research, the proposed method was confirmed to predict gaze position up to about 40% more accurately than the latest existing gaze-prediction methods.
The technology is expected to be widely used in fields involving the sensing and analysis of human behavior, such as passing down skills in manufacturing, early screening for autism spectrum disorders, and analyzing drivers' visual behavior behind the wheel.
Paper information: Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition, European Conference on Computer Vision (ECCV 2018).