TBL-improved non-deterministic segmentation and tagging for a Chinese parser

Details

Event

2009-03-28

Speakers

Martin Forst
Event

TBL-improved non-deterministic segmentation and tagging for a Chinese parser

Although much progress has been made recently in word segmentation and POS tagging for Chinese, the output of current state-of-the-art systems is too inaccurate to allow for syntactic analysis based on it. We present an experiment in improving the output of an off-the-shelf module that performs segmentation and tagging, the tokenizer-tagger from Beijing University (PKU). Our approach is based on transformation-based learning (TBL). Unlike in other TBL-based approaches to the problem, however, both obligatory and optional transformation rules are learned, so that the final system can output multiple segmentation and POS tagging analyses for a given input.

Additional information

Focus Areas

Our work is centered around a series of Focus Areas that we believe are the future of science and technology.

FIND OUT MORE
Licensing & Commercialization Opportunities

We’re continually developing new technologies, many of which are available for Commercialization.

FIND OUT MORE
News

PARC scientists and staffers are active members and contributors to the science and technology communities.

FIND OUT MORE