Some ideas of semantic analysis for anaphora resolution Dmitry P. Vetrov Dorodnicyn Computing Centre of RAS
Anaphora resolution Cannot be done directly by syntactic analysis. In many cases it is unclear which word a pronoun refers to without understanding the meaning of the phrase.
Example A rebel fighter attacked an Imperial ship that was going to take off. It exploded into pieces and sank. From a formal point of view, both readings of "it" (the fighter or the ship) are acceptable. But without knowing which one is right, we cannot merge the sentences within an abstract, which may be necessary for automatic text annotation.
Training corpus Using a training corpus is difficult because of its limited size. Moreover, we do not have an annotated corpus for the Russian language. It would be desirable to use an unannotated corpus instead: then we could use very large sets of texts for training and for building an ontology.
Hierarchical vocabulary The vocabulary by Baranov contains a 5-level hierarchical classification of words: Fighter I -> flying transportation -> transportation -> device -> artificial -> substantial; Fighter II -> character -> psychology -> human -> biological organism -> substantial. Most words can be referred to by a 6-number code that gives the category number at each level. This opens up great opportunities for generalizing from short texts.
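The hierarchical classification above can be sketched as a small data structure. This is only an illustration: the two "fighter" paths are taken from the slide, while the "ship" path, the name `VOCAB`, and the helper `shared_depth` are assumptions introduced here and are not part of Baranov's vocabulary.

```python
# Category paths, written from the most general level down to the most specific.
# The two "fighter" paths follow the slide; the "ship" path is a hypothetical entry.
VOCAB = {
    "fighter": [
        ("substantial", "artificial", "device", "transportation", "flying transportation"),
        ("substantial", "biological organism", "human", "psychology", "character"),
    ],
    "ship": [
        ("substantial", "artificial", "device", "transportation", "water transportation"),
    ],
}


def shared_depth(path_a, path_b):
    """Return the number of leading levels on which two category paths agree."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


plane_sense, man_sense = VOCAB["fighter"]
ship_sense = VOCAB["ship"][0]

# The plane sense of "fighter" shares more levels with "ship" than the man sense does.
print(shared_depth(plane_sense, ship_sense))
print(shared_depth(man_sense, ship_sense))
```

Two words that never co-occurred in training can still be compared through the levels they share, which is what makes generalization from short texts plausible.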
Training During text processing, compute the number of associations between two categories of various levels, as well as the number of associations of word w with any words from the j-th category of the k-th level. This can be done relatively easily using ONLY formal analysis (e.g. we may consider the fighter both as a plane and as a man). The inaccuracies will be compensated for by the large volume of text.
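A minimal sketch of the counting step, under stated assumptions: an "association" is taken here to mean co-occurrence within one sentence, which the slides do not specify, and the toy vocabulary and the function name `count_associations` are hypothetical. Every sense of an ambiguous word is counted, as in the "fighter as plane and as man" remark above.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy vocabulary: word -> list of category paths (general -> specific).
VOCAB = {
    "fighter": [
        ("substantial", "artificial", "device", "transportation", "flying transportation"),
        ("substantial", "biological organism", "human", "psychology", "character"),
    ],
    "ship": [
        ("substantial", "artificial", "device", "transportation", "water transportation"),
    ],
}


def count_associations(sentences, vocab):
    """Count, for each word w, its co-occurrences with each (level k, category).

    Assumption: two words are "associated" when they occur in the same sentence.
    All senses of the other word are counted, with no attempt at disambiguation.
    """
    counts = Counter()  # key: (word, level k, category name)
    for sentence in sentences:
        known = [word for word in sentence if word in vocab]
        for w, v in permutations(known, 2):
            # Collect every (level, category) reachable through any sense of v.
            categories = {
                (k, category)
                for path in vocab[v]
                for k, category in enumerate(path, start=1)
            }
            for k, category in categories:
                counts[(w, k, category)] += 1
    return counts


counts = count_associations([["fighter", "attacked", "ship"]], VOCAB)
print(counts[("ship", 3, "device")], counts[("ship", 3, "human")])
```

Words missing from the vocabulary ("attacked" in this toy input) are simply skipped; a fuller version would need entries for verbs as well.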
Semantic analysis After training is finished, we may estimate the relevance measure between two words. It uses two quantities: the number of the category at level k to which word w belongs, and the number of associations of words from the i-th category of the k-th level with ANY other words.
Decision making If there are several candidate words v1,…,vm that the word w may refer to, we prefer the most relevant one. Example: "it" (exploded): ship or fighter?
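The decision rule can be sketched as an argmax over candidates. The relevance formula used below is an assumption made for illustration (association counts normalised per level and summed over levels); it is not necessarily the measure used by the authors. All counts are invented toy numbers, and the names `relevance` and `best_antecedent` are hypothetical.

```python
def relevance(w, v_path, assoc, totals):
    """Assumed relevance score of word w with a candidate's category path.

    assoc[(w, k, category)]: associations of w with that level-k category.
    totals[(k, category)]: associations of that category with ANY other words.
    """
    score = 0.0
    for k, category in enumerate(v_path, start=1):
        total = totals.get((k, category), 0)
        if total:
            score += assoc.get((w, k, category), 0) / total
    return score


def best_antecedent(w, candidates, assoc, totals):
    """Return the candidate name whose category path is most relevant to w."""
    return max(candidates, key=lambda name: relevance(w, candidates[name], assoc, totals))


# Invented toy counts, shortened to three levels for readability.
assoc = {
    ("exploded", 1, "substantial"): 9,
    ("exploded", 2, "artificial"): 8,
    ("exploded", 3, "device"): 8,
    ("exploded", 2, "biological organism"): 1,
    ("exploded", 3, "human"): 1,
}
totals = {
    (1, "substantial"): 100,
    (2, "artificial"): 40,
    (3, "device"): 40,
    (2, "biological organism"): 50,
    (3, "human"): 50,
}
candidates = {
    "ship": ("substantial", "artificial", "device"),
    "fighter": ("substantial", "biological organism", "human"),
}

choice = best_antecedent("exploded", candidates, assoc, totals)
print(choice)
```

With these invented numbers the device reading wins; with real counts the outcome depends entirely on the training texts.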
Ambiguity resolution [Diagram: graph of alternative readings for the words "Fighter", "attacked", "ship", "take off", "it", "exploded, sank". Fighter: man | plane. Attacked: fired | criticized. Ship: spacecraft | boat. It: spacecraft | plane | man | boat.]
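Dynamic programming [Diagram: graph of alternative readings for the words "Fighter", "attacked", "ship", "take off", "it", "exploded, sank". Fighter: man | plane. Attacked: fired | criticized. Ship: spacecraft | boat. It: spacecraft | plane | man | boat.]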
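One way dynamic programming could be applied to such a graph is a Viterbi-style search that picks one reading per word so that the total pairwise score is maximal. The sketch below assumes that only adjacent words are scored against each other, which simplifies the graph on the slide, and the pairwise scores are invented toy values standing in for a trained relevance measure.

```python
def best_path(layers, score):
    """Choose one alternative per layer, maximising the sum of adjacent scores.

    layers: list of lists of alternative readings, one list per word.
    score(a, b): pairwise score of reading a followed by reading b.
    """
    # Each column maps a reading to (best total score so far, backpointer).
    columns = [{alt: (0.0, None) for alt in layers[0]}]
    for prev_layer, layer in zip(layers, layers[1:]):
        column = {}
        for alt in layer:
            prev = max(prev_layer, key=lambda p: columns[-1][p][0] + score(p, alt))
            column[alt] = (columns[-1][prev][0] + score(prev, alt), prev)
        columns.append(column)

    # Backtrack from the best final reading to the first layer.
    last = max(columns[-1], key=lambda a: columns[-1][a][0])
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Invented toy scores; any pair not listed scores 0.
PAIR_SCORES = {
    ("plane", "fired"): 2.0,
    ("man", "fired"): 1.0,
    ("man", "criticized"): 2.0,
    ("fired", "spacecraft"): 2.0,
    ("fired", "boat"): 2.0,
    ("criticized", "spacecraft"): 0.5,
    ("boat", "sank"): 3.0,
}

layers = [["man", "plane"], ["fired", "criticized"], ["spacecraft", "boat"], ["sank"]]
path = best_path(layers, lambda a, b: PAIR_SCORES.get((a, b), 0.0))
print(path)
```

The cost grows linearly with the number of words and quadratically with the number of readings per word, instead of exponentially as in an exhaustive enumeration of all combinations.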
Advantages Does not need the text to be marked up (annotated). May generalize to words that were not encountered during training, thanks to the hierarchical system of categories. Dynamic programming allows processing quite large graphs built from a set of sentences.
Thank you Contact persons: Pavel Tolpegin Dmitry Vetrov Dmitry Kropotov