TBL-improved non-deterministic segmentation and tagging for a Chinese parser

Details

Event

2009-03-28

Speakers

Martin Forst
Event

TBL-improved non-deterministic segmentation and tagging for a Chinese parser

Although much progress has been made recently in word segmentation and POS tagging for Chinese, the output of current state-of-the-art systems is too inaccurate to allow for syntactic analysis based on it. We present an experiment in improving the output of an off-the-shelf module that performs segmentation and tagging, the tokenizer-tagger from Beijing University (PKU). Our approach is based on transformation-based learning (TBL). Unlike in other TBL-based approaches to the problem, however, both obligatory and optional transformation rules are learned, so that the final system can output multiple segmentation and POS tagging analyses for a given input.

Additional information

Focus Areas

Our work is centered around a series of Focus Areas that we believe are the future of science and technology.

FIND OUT MORE
Licensing & Commercialization Opportunities

We’re continually developing new technologies, many of which are available for¬†Commercialization.

FIND OUT MORE
News

PARC scientists and staffers are active members and contributors to the science and technology communities.

FIND OUT MORE