Scalability of redundancy detection in focused document collections
We describe the application of primarily symbolic methods to the task of detecting logical redundancies and inconsistencies between documents in a medium-sized, domain-focused collection (1,000 to 40,000 documents). Initial investigations indicate good scalability prospects, especially for syntactic and semantic processing. The difficult and largely neglected task of mapping from linguistic/semantic representations to domain-tailored knowledge representations is potentially more of a bottleneck.
Crouch, R. S.; Condoravdi, C.; Stolle, R.; King, T. H.; de Paiva, V.; Everett, J. O.; Bobrow, D. G. Scalability of redundancy detection in focused document collections. First International Workshop on Scalable Natural Language Understanding (SCANALU 2002); 2002 May 23-24; Heidelberg, Germany.