A research group led by Professor Hiroyuki Kitagawa of the Center for Computational Sciences, University of Tsukuba, has developed a technique for big data analysis of social networking services (hereinafter referred to as SNS) that estimates unknown attribute labels, such as the age and gender of nodes (users), from known labels. The work was announced at the 2016 SIAM International Conference on Data Mining, held in the United States.
With the progress of information technology, the amount of data circulating in society is increasing explosively. Under these circumstances, there is strong demand for making use of big data, and particular emphasis is placed on how to exploit network data, which contains a great deal of useful information.
For example, on an SNS, each node can be given "attribute labels" such as the corresponding user's age and place of residence, but some nodes do not disclose them. Because label information is important supplementary information for relating the contents of nodes, "label estimation", which estimates unknown labels from known labels in network data, is required, and various methods have been developed so far. However, these methods cannot be applied to network data in which nodes with different labels are easily connected to each other.
The basic idea of the newly proposed method is to consider not only the "ratio" of labels used by conventional methods but also their "absolute number", and to propagate to surrounding nodes a clue whose amount is proportional to its "reliability" when estimating a label. As a result, the amount of evidence (reliability) coming from neighboring nodes can be taken into account, and network data in which nodes with different labels readily connect can be handled. According to the experimental results, the proposed method achieves higher accuracy than the mainstream methods to date.
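The difference between using only the "ratio" and also using the "absolute number" can be illustrated with a toy example. The sketch below is not the research group's algorithm, whose details are not given here; it is a minimal illustration under stated assumptions. The function names (`neighbor_evidence`, `ratio_belief`, `estimate`), the example graph, and the choice to weight each neighbor's clue by the number of labeled nodes it has observed are all hypothetical.

```python
from collections import Counter

def neighbor_evidence(graph, labels, node):
    """Count the known labels seen by `node` (hypothetical helper).

    A labeled node contributes its own label once; an unlabeled node
    summarizes the known labels of its neighbors.
    """
    if node in labels:
        return Counter({labels[node]: 1})
    return Counter(labels[n] for n in graph[node] if n in labels)

def ratio_belief(counts):
    """Turn label counts into ratios (the conventional 'ratio' clue)."""
    total = sum(counts.values())
    return {lab: c / total for lab, c in counts.items()} if total else {}

def estimate(graph, labels, target, use_confidence):
    """Estimate the label of `target` from clues sent by its neighbors.

    Assumption: 'reliability' is modeled as the absolute number of
    labeled nodes behind each clue. With use_confidence=False every
    neighbor's ratio counts equally.
    """
    scores = Counter()
    for mid in graph[target]:
        counts = neighbor_evidence(graph, labels, mid)
        weight = sum(counts.values()) if use_confidence else 1
        for lab, p in ratio_belief(counts).items():
            scores[lab] += weight * p
    return max(scores, key=scores.get) if scores else None

# Toy network: target "t" has two unlabeled neighbors.
# "u1" has seen one labeled node (A); "u2" has seen ten (one A, nine B).
b_nodes = [f"b{i}" for i in range(1, 10)]
graph = {
    "t": ["u1", "u2"],
    "u1": ["t", "a1"],
    "u2": ["t", "a2"] + b_nodes,
}
labels = {"a1": "A", "a2": "A", **{b: "B" for b in b_nodes}}

print(estimate(graph, labels, "t", use_confidence=False))
print(estimate(graph, labels, "t", use_confidence=True))
```

In this toy network the ratio-only vote picks A, because the single-observation neighbor reports 100% A, while the reliability-weighted vote picks B, because the neighbor backed by ten observations carries more weight. How the actual method defines and propagates reliability may differ from this simplification.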
By introducing the concept of "reliability" into the conventional estimation process, this research makes more accurate label estimation possible. In the future, it is expected to become an innovative technique for estimating user attributes when linking with and making use of real-world big data.