Learning domain-specific feature descriptors for document images
Details
Gold Coast, Australia. Date of Talk: 3/27/2012
Speakers
Evgeniy Bart
Event
Learning domain-specific feature descriptors for document images
Many machine learning algorithms rely on feature descriptors to access information about image appearance. Using an appropriate descriptor is therefore crucial for the algorithm to succeed. Although domain- and task-specific feature descriptors may result in excellent performance, they currently have to be hand-crafted, a difficult and time-consuming process. In contrast, general-purpose descriptors (such as SIFT) are easy to apply and have proved successful for a variety of tasks, including classification, segmentation, and clustering. Unfortunately, most general-purpose feature descriptors are targeted at natural images and may perform poorly in document analysis tasks. In this paper, we propose a method for automatically learning feature descriptors tuned to a given image domain. The method works by first extracting the independent components of the images, and then building a descriptor by pooling these components over multiple overlapping regions. We test the proposed method on several document analysis tasks and several datasets, and show that it outperforms existing general-purpose feature descriptors.
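The two stages named in the abstract (extracting independent components, then pooling them over overlapping regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the ICA algorithm, the patch size, the rectification, or the pooling rule. The sketch assumes symmetric FastICA with a tanh nonlinearity on PCA-whitened patches, absolute-value rectification, and average pooling over square cells whose stride is smaller than their width. All function names and parameters (`extract_patches`, `learn_ica_filters`, `describe`, `size`, `cell`, `step`) are hypothetical, and the input is a synthetic random image standing in for a binarized document page.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Collect flattened size x size patches on a regular grid."""
    h, w = image.shape
    patches = [image[r:r + size, c:c + size].ravel()
               for r in range(0, h - size + 1, stride)
               for c in range(0, w - size + 1, stride)]
    return np.asarray(patches, dtype=float)

def sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W so that the rows of W become orthonormal."""
    s, u = np.linalg.eigh(w @ w.T)
    return u @ np.diag(1.0 / np.sqrt(np.clip(s, 1e-12, None))) @ u.T @ w

def learn_ica_filters(patches, n_components, n_iter=200, seed=0):
    """Assumed ICA variant: symmetric FastICA (tanh) on PCA-whitened patches."""
    rng = np.random.default_rng(seed)
    mean = patches.mean(axis=0)
    x = patches - mean
    # PCA whitening: keep the n_components directions of largest variance.
    eigval, eigvec = np.linalg.eigh(x.T @ x / x.shape[0])
    top = np.argsort(eigval)[::-1][:n_components]
    whiten = eigvec[:, top] / np.sqrt(eigval[top] + 1e-8)   # shape (d, k)
    z = x @ whiten                                          # shape (n, k)
    w = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        y = np.tanh(z @ w.T)                                # g(w_i^T z)
        g_prime = 1.0 - y ** 2                              # g'(w_i^T z)
        w_new = y.T @ z / z.shape[0] - g_prime.mean(axis=0)[:, None] * w
        w = sym_decorrelate(w_new)
    # Each column is one learned filter expressed in pixel space.
    return whiten @ w.T, mean

def describe(region, filters, mean, size, cell, step):
    """Pool rectified filter responses over overlapping cells (step < cell)."""
    h, w = region.shape
    rows, cols = h - size + 1, w - size + 1
    resp = np.empty((rows, cols, filters.shape[1]))
    for r in range(rows):
        for c in range(cols):
            patch = region[r:r + size, c:c + size].ravel() - mean
            resp[r, c] = np.abs(patch @ filters)
    pooled = [resp[r:r + cell, c:c + cell].mean(axis=(0, 1))
              for r in range(0, rows - cell + 1, step)
              for c in range(0, cols - cell + 1, step)]
    desc = np.concatenate(pooled)
    return desc / (np.linalg.norm(desc) + 1e-8)             # L2-normalize

# Synthetic stand-in for a binarized page; real document images would go here.
rng = np.random.default_rng(0)
page = (rng.random((64, 64)) > 0.5).astype(float)
patches = extract_patches(page, size=5, stride=2)
filters, mean = learn_ica_filters(patches, n_components=4)
descriptor = describe(page[:16, :16], filters, mean, size=5, cell=6, step=3)
```

Because the filters are learned from patches of the target domain, swapping the training images changes the descriptor without any hand-tuning, which is the property the abstract emphasizes. In this sketch, choosing `step` smaller than `cell` is what makes the pooling regions overlap.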