Universal Digital Library

"Teamwork divides the task and doubles the success"; this is the essence of this project.

The idea of the Universal Digital Library (UDL) project
Initiated by Carnegie Mellon University and previously known as the Million Book Project, the project aims to convert all books to digital format, in partnership with scanning centers around the world, in order to create a Universal Digital Library (UDL) that fosters creativity and free access to human knowledge. The project also aims to provide a test bed for research on improved scanning techniques, Optical Character Recognition (OCR), intelligent indexing, machine translation, and information retrieval.

Creating partnerships
One of the key activities is to work with libraries, universities and institutions worldwide that can adopt this exchange model and/or donate some of their collections, either in digital form or by sending them for digitization and receiving them back afterwards. These collections include books, journals, theses and research reports.

Bibliotheca Alexandrina (BA) and its partners (including China, India and the USA) set out to demonstrate the project's feasibility by digitizing one million books within three years and publishing them as a searchable collection on the Internet. By November 2007, the 1.5 million mark had already been passed. The collection has been published and is available at www.ulib.org. All partners contribute content to ensure that the collection is extensive, diverse and multilingual. The collection was assembled by exchanging the digitized books produced by the different partners. This method not only shares the resources of different countries and divides the work among them, but also lets each partner host a local mirror site of the million digitized books, thus guaranteeing fast access, reliability and availability.

BA's role
BA is taking the lead in scanning and digitizing Arabic books in particular. The collection currently holds more than 170,000 digitized and processed Arabic books.

BA has also designed and implemented a database for the books, their metadata and their digitization status, and has set standards for the digitization process in order to improve the quality of the scanning, processing, and OCR phases. The complete workflow cycle for producing digital books has been automated and integrated with the Library Information System. The workflow is managed by DAF, a digitization workflow management system developed in-house.
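As an illustration only, a record that ties a book's metadata to its digitization status might look like the following minimal sketch. The class names, fields, phase ordering and the final PUBLISHED state are all assumptions made for this example; they do not describe the actual schema of the BA database or of DAF.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    # Scanning, processing and OCR are the phases named in the text.
    # Their ordering and the PUBLISHED end state are assumptions.
    SCANNING = 1
    PROCESSING = 2
    OCR = 3
    PUBLISHED = 4


@dataclass
class BookRecord:
    """Hypothetical record combining metadata with digitization status."""
    book_id: str
    title: str
    language: str
    phase: Phase = Phase.SCANNING
    history: list = field(default_factory=list)

    def advance(self) -> Phase:
        """Move the book to the next phase and log the phase just completed."""
        if self.phase is Phase.PUBLISHED:
            raise ValueError(f"{self.book_id} is already published")
        self.history.append(self.phase)
        self.phase = Phase(self.phase.value + 1)
        return self.phase
```

Calling `advance()` three times on a new record would take it from scanning through processing and OCR to the published state, with each completed phase kept in `history` so the status of any book can be queried at any point.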

The database was further expanded into a Digital Assets Repository (DAR) that accommodates various other types of digitized material, including slides in multiple formats, negatives, books, manuscripts, pictures and maps, audio and video.

Researching improved techniques
Research was carried out in co-operation with Arabic OCR producers in order to achieve efficient, high-quality recognition for mass OCR production. Alternatives to Sakhr's Automatic Reader that are capable of recognizing Arabic text were investigated, and a modified strategy for the OCR phase of digitization is currently being developed. The tools under investigation include VERUS from NovoDynamics, iRDS SDK from IRIS, and CiyaOCR/ICR from CiyaSoft, as well as OCR research at the University at Buffalo that focuses on the recognition of both machine-printed and handwritten Arabic. In May 2006, BA and NovoDynamics established a research partnership to advance NovoDynamics' VERUS Professional product through testing and evaluation. A further research agreement was established with Sakhr.

An encoding system for multilingual (including Arabic) image-on-text DjVu and PDF documents has been implemented and evaluated. In addition, a framework for the universal encoding of image-on-text documents has been designed. Previously, 12 OCR fonts were constructed and tested; accuracy exceeded 90% for 11 of them. Three additional font groups are currently under construction. Moreover, an in-house digital viewer based on image-on-text technology was implemented for publishing books on the web. The viewer has since been enhanced and now includes searching; streaming, which displays one page at a time so that a book can be read over a slow Internet connection; and extra security features, such as restricting display to a specific range of pages or a limited number of pages, and protecting copyright by preventing the user from copying or printing the entire book. The DAR publishing website (http://dar.bibalex.org) features the book viewer, where over 180,000 fully searchable books are now available.
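The page restrictions described above can be pictured with a small sketch. It is not the BA viewer's code: the class, its fields and the rule that re-reading a page does not count against the limit are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ViewerSession:
    """Hypothetical per-reader session for one book."""
    total_pages: int
    first_allowed: int   # first viewable page (1-based, inclusive)
    last_allowed: int    # last viewable page (inclusive)
    max_pages: int       # cap on distinct pages served in this session
    seen: set = field(default_factory=set)

    def can_view(self, page: int) -> bool:
        """Check the page against the book length, the range and the cap."""
        if not (1 <= page <= self.total_pages):
            return False
        if not (self.first_allowed <= page <= self.last_allowed):
            return False
        # Assumption: a page already served may be shown again for free.
        return page in self.seen or len(self.seen) < self.max_pages

    def request_page(self, page: int) -> bool:
        """Serve a single page at a time; return False if it is refused."""
        if not self.can_view(page):
            return False
        self.seen.add(page)
        return True
```

Because each request covers exactly one page, the same check serves both purposes mentioned in the text: only a single page image has to travel over a slow connection, and the reader can never obtain more of the book than the configured range and page limit allow.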

International Collaboration
The BA participated with the Million Book Project in the World Summit on the Information Society (WSIS), which took place in Tunis from 16 to 18 November 2005. Furthermore, BA hosted the 2nd International Conference on Universal Digital Library (ICUDL2006) from 17 to 19 November 2006; the conference was followed by the Million Book annual workshop. The main theme of the conference was “Towards building the globally owned Universal Digital Library where human knowledge is equally preserved and accessed”. The conference provided a forum for library and IT professionals to exchange views on recent developments and progress in digital library technology. An academic paper entitled "The Million Book Project at Bibliotheca Alexandrina" was presented during the conference and was also selected for publication in the Journal of Zhejiang University SCIENCE.

 

Partners & participants

  • Carnegie Mellon University, USA
  • Internet Archive, USA
  • Beijing University, China
  • Chinese Academy of Sciences, China
  • Fudan University, China
  • Chinese Ministry of Education, China
  • Nanjing University, China
  • State Planning Commission of China
  • Tsinghua University, China
  • Zhejiang University, China
  • Indian Institute of Science, Bangalore
  • International Institute of Information Technology
  • Indian Institute of Information Technology
  • Anna University, Chennai
  • Mysore University, Mysore
  • University of Pune, Pune
  • Goa University, Goa
  • Tirumala Tirupati Devasthanams, Tirupathi
  • Shanmugha Arts, Science, Technology & Research Academy, Tanjore
  • Arulmigu Kalasalingam College of Engineering, Srivilliputhur
  • Maharashtra Industrial Development Corporation, Mumbai
  • Bibliotheca Alexandrina – ISIS

 

Last updated on 28 Feb 2011