TBL-improved non-deterministic segmentation and tagging for a Chinese parser
Details
Speakers
TBL-improved non-deterministic segmentation and tagging for a Chinese parser
Although much progress has been made recently in word segmentation and POS tagging for Chinese, the output of current state-of-the-art systems is too inaccurate to allow for syntactic analysis based on it. We present an experiment in improving the output of an off-the-shelf module that performs segmentation and tagging, the tokenizer-tagger from Beijing University (PKU). Our approach is based on transformation-based learning (TBL). Unlike in other TBL-based approaches to the problem, however, both obligatory and optional transformation rules are learned, so that the final system can output multiple segmentation and POS tagging analyses for a given input.
Additional information
Our work is centered around a series of Focus Areas that we believe are the future of science and technology.
We’re continually developing new technologies, many of which are available for Commercialization.
PARC scientists and staffers are active members and contributors to the science and technology communities.