TECHNICAL PUBLICATIONS:

GLSA Server @PARC

 

The User Interface Research group at PARC has put together a public web server (http://glsa.parc.com/) that computes semantic similarities between words. In this talk I will briefly present GLSA and show how the server can be used. GLSA (Generalized Latent Semantic Analysis) is a technique similar to LSA (Landauer & Dumais, 1997). It uses a web-based, expandable corpus of documents to compute co-occurrences between all the words in the corpus. From these co-occurrences it computes pointwise mutual information (PMI) scores between words, and then applies an algebraic technique called multidimensional scaling to reduce the dimensionality of the word representation. The cosine similarity between these reduced representations is the final similarity measure. The server takes as input a set of words (or a text file containing pairs of words) and outputs a similarity matrix between all the words. We are working on an ACT-R output file that would automatically create chunks for the words and set the $S_{ij}$ values between words to reflect those similarities.
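The pipeline described above (co-occurrence counts, PMI scores, dimensionality reduction, cosine) can be illustrated with a minimal sketch. This is not the server's code: the toy corpus, the function names, document-level co-occurrence counting, the clipping of negative PMI scores to zero, and the use of a plain eigendecomposition of the PMI matrix as a stand-in for multidimensional scaling are all assumptions made for illustration.

```python
import numpy as np

def cooccurrence_counts(docs):
    """Count, per document, which words occur and which pairs co-occur."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    single = np.zeros(len(vocab))
    pair = np.zeros((len(vocab), len(vocab)))
    for d in docs:
        ids = sorted({index[w] for w in d})
        for i in ids:
            single[i] += 1
            for j in ids:
                pair[i, j] += 1  # includes i == j, so pair[i, i] == single[i]
    return vocab, single, pair

def pmi_matrix(single, pair, n_docs):
    """PMI(i, j) = log(p(i, j) / (p(i) * p(j))), estimated from document counts."""
    p_w = single / n_docs
    p_ij = pair / n_docs
    with np.errstate(divide="ignore"):
        pmi = np.log(p_ij / np.outer(p_w, p_w))
    pmi[~np.isfinite(pmi)] = 0.0  # assumption: unseen pairs score 0
    return np.maximum(pmi, 0.0)   # assumption: negative scores are clipped

def reduce_dimensions(sim, k):
    """Keep the k leading eigen-directions of a symmetric similarity matrix."""
    vals, vecs = np.linalg.eigh(sim)
    order = np.argsort(vals)[::-1][:k]
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)

def cosine_similarity_matrix(x):
    """Cosine between every pair of row vectors."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty vectors
    unit = x / norms
    return unit @ unit.T

# Toy corpus (hypothetical): two unrelated topics.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "leash"],
    ["stock", "market", "bank"],
    ["bank", "market", "loan"],
]
vocab, single, pair = cooccurrence_counts(docs)
pmi = pmi_matrix(single, pair, n_docs=len(docs))
vectors = reduce_dimensions(pmi, k=2)
sim = cosine_similarity_matrix(vectors)

for a, b in [("cat", "dog"), ("cat", "bank")]:
    print(a, b, round(float(sim[vocab.index(a), vocab.index(b)]), 3))
```

In this toy setting, words from the same topic end up with a higher cosine than words from different topics, which is the kind of similarity matrix the server returns for a set of input words.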

 
citation

Royer, C. ; Farahat, A. O. ; Pirolli, P. L. ; Budiu, R. GLSA Server @PARC. Proceedings of the Twelfth Annual ACT-R Workshop; 2005 July 15-17; Trieste; Italy.