Parsing tables by probabilistic modeling of perceptual cues

Details

Event: 10th IAPR International Workshop on Document Analysis Systems

Authors

Evgeniy Bart
March 27th 2012
In this paper, we propose a method for automatically parsing images of tables, focusing in particular on "simple" matrix-like tables with a rectilinear layout. Such tables account for over 50% of the tables in business documents. The main novelty of the proposed method is that it combines the intrinsic properties of table cells with the properties of cell separators, as well as of table rows, columns, and layout, in a single global objective function. This contrasts with previous methods, which focused either on separators alone or on intrinsic cell properties alone. Our method uses a variety of perceptual cues, such as alignment and saliency, to characterize these properties. Candidate parses are evaluated by comparing their likelihoods, and the parse with the highest likelihood is selected. The proposed approach successfully handles a wide variety of tables, as demonstrated on a dataset of over 1,000 images.
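The abstract does not spell out the model, so the following is only a toy sketch, under our own assumptions, of the general idea it describes: score each candidate parse with one global log-likelihood that sums separator cues and cell cues, then keep the parse with the highest score. The binary ink grid input, the Bernoulli ink rates, the "cells should contain some ink" cue, and every name and parameter below (`P_INK_SEPARATOR`, `parse_loglik`, `best_parse`, and so on) are hypothetical. The paper's actual cues (alignment, saliency, row, column, and layout terms) are richer.

```python
import math
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple

# Hypothetical model parameters (assumptions, not taken from the paper).
P_INK_SEPARATOR = 0.02   # ink rate expected on a whitespace separator line
P_INK_CONTENT = 0.30     # ink rate expected on a line that crosses cell content
P_CELL_NONEMPTY = 0.90   # prior that a table cell contains at least some ink

Grid = Sequence[Sequence[int]]                   # 0/1 ink image, grid[row][col]
Parse = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (row separators, column separators)


def line_loglik(ink: int, length: int, is_separator: bool) -> float:
    """Bernoulli log-likelihood of a line's ink count under the separator or content model."""
    p = P_INK_SEPARATOR if is_separator else P_INK_CONTENT
    return ink * math.log(p) + (length - ink) * math.log(1 - p)


def bands(seps: Sequence[int], n: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) intervals left between single-line separators in 0..n-1."""
    starts = [0] + [s + 1 for s in seps]
    ends = list(seps) + [n]
    return list(zip(starts, ends))


def parse_loglik(grid: Grid, row_seps: Tuple[int, ...], col_seps: Tuple[int, ...]) -> float:
    """Global objective: a separator cue for every line plus a cue for every cell."""
    h, w = len(grid), len(grid[0])
    total = 0.0
    for r in range(h):
        total += line_loglik(sum(grid[r]), w, r in row_seps)
    for c in range(w):
        total += line_loglik(sum(grid[r][c] for r in range(h)), h, c in col_seps)
    for r0, r1 in bands(row_seps, h):
        for c0, c1 in bands(col_seps, w):
            has_ink = any(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
            total += math.log(P_CELL_NONEMPTY if has_ink else 1 - P_CELL_NONEMPTY)
    return total


def all_subsets(n: int) -> Iterator[Tuple[int, ...]]:
    """Every subset of line indices 0..n-1, as sorted tuples."""
    for k in range(n + 1):
        yield from combinations(range(n), k)


def best_parse(grid: Grid) -> Parse:
    """Exhaustively score every candidate parse and return the most likely one."""
    h, w = len(grid), len(grid[0])
    candidates = ((rs, cs) for rs in all_subsets(h) for cs in all_subsets(w))
    return max(candidates, key=lambda rc: parse_loglik(grid, rc[0], rc[1]))


if __name__ == "__main__":
    # A tiny 2x2 table: row 2 and column 2 are whitespace gaps.
    grid = [
        [1, 1, 0, 1, 1],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0],
    ]
    print(best_parse(grid))  # ((2,), (2,))
```

The coupling between terms is what makes the objective global: if a blank left margin column is added to the grid, the line cue alone would favor marking it as a separator, but doing so creates empty cells, and the cell cue outweighs that gain, so the margin is rejected. Exhaustive enumeration is exponential in the number of lines and is only viable for toy grids; a practical system would first generate a small set of candidate parses.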

Citation

Bart, E. Parsing tables by probabilistic modeling of perceptual cues. 10th IAPR International Workshop on Document Analysis Systems; 2012 March 27; Gold Coast, Australia.
