A research group led by Assistant Professor Ryu Takeda of the Institute of Scientific and Industrial Research, Osaka University, has developed a technology for identifying unknown words in conversation, a capability required by spoken dialogue systems. The method introduces a "unit of sound expression" derived from the pronunciation composition patterns of known words.
In recent years, many robots and applications that respond by voice have been released, but most of them recognize only pre-registered words. If an utterance contains any other word (an unknown word), the system substitutes a known word for it and fails to recognize it correctly as a word. If the unknown-word portion can be identified correctly, the system can then ask a person to teach it the word's meaning.
The research focused on "unsupervised word segmentation," a natural language processing technique, and applied it to identifying unknown words in utterances. This technique normally targets written text, where the segmentation unit is the character; when it is applied to speech, however, it is not obvious which "unit of sound expression" is effective. Candidate units include phonemes (phonetic symbols) and syllables (in Japanese, kana).
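To illustrate the general idea, the sketch below segments an unsegmented phoneme string against a small known-word lexicon by dynamic programming, so that spans no known word covers surface as candidate unknown words. The cost scheme, window size, and example lexicon are illustrative assumptions, not the authors' actual method.

```python
# Hypothetical sketch: segment an unsegmented phoneme string with a small
# known-word lexicon; spans that no known word covers become candidate
# "unknown words". All costs here are illustrative assumptions.

def segment(phonemes: str, lexicon: set) -> list:
    """Return (segment, is_known) pairs via minimal-cost dynamic programming."""
    n = len(phonemes)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i] = minimal cost to segment phonemes[:i]
    back = [None] * (n + 1)  # back[i] = (start, is_known) of last segment
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - 8), i):  # cap segment length at 8 phonemes
            piece = phonemes[j:i]
            # per-segment penalty 0.5 encourages merging unknown runs;
            # known words are cheap, unknown material costs per phoneme
            cost = best[j] + 0.5 + (1.0 if piece in lexicon else 1.5 * len(piece))
            if cost < best[i]:
                best[i], back[i] = cost, (j, piece in lexicon)
    # walk back pointers to recover the segmentation
    out, i = [], n
    while i > 0:
        j, known = back[i]
        out.append((phonemes[j:i], known))
        i = j
    return out[::-1]

lexicon = {"kore", "wa", "desu"}
print(segment("korewapenginwadesu", lexicon))
# → [('kore', True), ('wa', True), ('pengin', False), ('wa', True), ('desu', True)]
```

Here "pengin" is not in the lexicon, so it falls out as a contiguous unknown span that a dialogue system could then ask a human about.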
In the developed method, the "unit of sound expression" is a unit based on pronunciation and its composition pattern, computed using a "word-likeness" score. The pattern is derived from pronunciations and the positions at which they commonly appear across many words, which makes it easier to identify unknown words whose pronunciation composition resembles that of known words. Assuming that phoneme recognition succeeds, the identification rate of unknown words was verified for each candidate unit on Japanese and English conversation corpora.
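A minimal sketch of one way such a "word-likeness" score could work: count which phoneme bigrams appear at which positions (start, middle, end) across known words, then score a candidate by how much positional support its own bigrams have. The function names and the averaging formula are assumptions for illustration only, not the published scoring method.

```python
# Hypothetical "word-likeness" sketch: positional phoneme-bigram counts
# learned from known words, used to score candidate unknown words.
from collections import Counter

def train_patterns(known_words: list) -> Counter:
    """Count (position, bigram) pairs over known-word pronunciations."""
    counts = Counter()
    for w in known_words:
        for i in range(len(w) - 1):
            pos = "start" if i == 0 else ("end" if i == len(w) - 2 else "mid")
            counts[(pos, w[i:i + 2])] += 1
    return counts

def word_likeness(candidate: str, counts: Counter) -> float:
    """Average positional-bigram support; higher means more word-like."""
    if len(candidate) < 2:
        return 0.0
    total = 0
    for i in range(len(candidate) - 1):
        pos = "start" if i == 0 else ("end" if i == len(candidate) - 2 else "mid")
        total += counts[(pos, candidate[i:i + 2])]
    return total / (len(candidate) - 1)

known = ["sakana", "sakura", "kana", "naka"]
counts = train_patterns(known)
print(word_likeness("sakani", counts))  # → 1.6 (shares patterns with known words)
print(word_likeness("xqzw", counts))    # → 0.0 (no shared patterns)
```

A candidate whose pronunciation composition resembles the known vocabulary scores high, which is the intuition behind identifying unknown words that are nonetheless word-like.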
The group states that this is a technology needed to realize spoken dialogue systems that not only are prepared and updated in advance by developers, but also learn and grow smarter through conversation with humans.