We describe the design and use of a personal digital library system, UpLib. The system consists of a full-text indexed repository accessed through an active agent via a Web interface. It is suitable for personal collections comprising tens of thousands of documents (including papers, books, photos, receipts, email, etc.), and provides for ease of document entry and access as well as high levels of security and privacy. Unlike many other systems of the sort, user access to the document collection is assured even if the UpLib system is unavailable. It is "universal" in the sense that documents are canonically represented as projections into the text and image domains, and uses a predominantly visual user interface based on page images. UpLib can thus handle any document format which can be rendered as pages. Provision is made for alternative representations existing alongside the text-domain and image-domain representation, either stored or generated on demand. The system is highly extensible through user scripting, and is intended to be used as a platform for further work in document engineering. UpLib is assembled largely from open-source components (the current exception being the OCR engine, which is proprietary).
Janssen, W. C.; Popat, A. C. UpLib: a universal personal digital library system. PARC TR-2003-16; 2003 November.