A research group led by Hiroaki Funayama, a doctoral-course researcher at the Graduate School of Tohoku University, has developed a method for assuring grading quality when written answers are scored automatically by artificial intelligence (AI), by dividing the work between human graders and the AI. The group showed that grading quality can be appropriately controlled through a systematic framework.
With the advent of machine learning methods based on deep learning, the accuracy of automatic scoring of written answers by AI has improved remarkably. In particular, automatic grading of short-answer questions, which call for written answers of roughly several dozen characters, reaches the same level of grading quality as a human grader for some questions. However, it is difficult for a grading AI to appropriately grade answers containing expressions that do not appear in its training data, and this is a major obstacle to the practical use of automatic grading by AI.
Therefore, the research group built a scoring framework in which an automatic scoring system and humans cooperate. The framework uses confidence, a measure of how reliable each scoring result from the grading AI is. The confidence of the automatic scoring result is checked for each answer, and if it is low, the answer is re-graded by a human grader.
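The routing step described above can be sketched in a few lines of Python. This is only an illustration, not the research group's implementation: the `ScoredAnswer` type, the `route_answers` function, and the assumption that confidence is a number between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredAnswer:
    answer_id: str
    predicted_score: int
    confidence: float  # assumed to lie in [0, 1]; the article does not specify the scale


def route_answers(scored, threshold):
    """Split AI-scored answers into those kept as-is and those sent to a human.

    Answers whose confidence is at least `threshold` keep the AI's score;
    all others are queued for human re-grading.
    """
    auto_accepted, needs_human = [], []
    for item in scored:
        if item.confidence >= threshold:
            auto_accepted.append(item)
        else:
            needs_human.append(item)
    return auto_accepted, needs_human


batch = [
    ScoredAnswer("a1", 3, 0.97),
    ScoredAnswer("a2", 1, 0.42),
    ScoredAnswer("a3", 2, 0.81),
]
auto_accepted, needs_human = route_answers(batch, threshold=0.8)
```

With the hypothetical threshold of 0.8, answers `a1` and `a3` keep their automatic scores and `a2` goes to a human grader.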
First, based on a small amount of graded answer data, the lower bound of confidence needed to achieve the desired grading quality is estimated. Then, during actual automatic scoring, any answer whose confidence falls below this lower bound is re-scored by a human, so that the desired scoring quality is achieved.
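One way to picture the estimation of the lower bound is the sketch below. It rests on assumptions that the article does not state: grading quality is measured here as the rate at which the AI's score exactly matches the human score on the automatically accepted answers, and the function name `estimate_confidence_threshold` and the tuple layout are hypothetical.

```python
import math


def estimate_confidence_threshold(dev, target_accuracy):
    """Estimate the lowest confidence cutoff that meets a target quality.

    `dev` is a small list of (confidence, ai_score, human_score) tuples.
    Quality is assumed to be the exact-match rate between AI and human
    scores among answers with confidence >= cutoff.
    Returns math.inf when no cutoff meets the target, meaning every
    answer would be sent to a human.
    """
    ranked = sorted(dev, key=lambda row: row[0], reverse=True)
    best = math.inf
    correct = 0
    for i, (conf, ai_score, human_score) in enumerate(ranked, start=1):
        correct += int(ai_score == human_score)
        # Evaluate only at the end of a run of tied confidences, so the
        # cutoff never splits answers that share the same confidence.
        end_of_tie = i == len(ranked) or ranked[i][0] != conf
        if end_of_tie and correct / i >= target_accuracy:
            best = conf  # keep the lowest cutoff that still meets the target
    return best


dev_set = [
    (0.95, 3, 3),
    (0.90, 2, 2),
    (0.80, 1, 1),
    (0.60, 2, 1),
    (0.50, 0, 0),
    (0.30, 1, 2),
]
threshold = estimate_confidence_threshold(dev_set, target_accuracy=0.9)
```

A stricter target yields a higher cutoff and therefore more human re-grading, which is the trade-off between quality and cost that the framework is meant to control.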
In this study, the group ran simulations on data sets of written-answer questions from Japan and the English-speaking world, confirmed the expected effect, and demonstrated the framework's feasibility. The simulations also showed that the higher the agreement rate between human graders' scores, the lower the cost at which high-quality scoring can be achieved. Based on these findings, the practical use of automatic scoring is expected to advance.