
NLM Launches “Digital Collections,” a Repository for Access to and Preservation of Digitized Biomedical Resources

The National Library of Medicine (NLM), the world's largest medical library and a component of the National Institutes of Health, has launched a new digital repository, Digital Collections, at http://collections.nlm.nih.gov. This new resource is complementary to the PubMed Central digital archive of electronic journal articles (http://www.ncbi.nlm.nih.gov/pmc/). The repository allows rich searching, browsing and retrieval of monographs and films from NLM's History of Medicine Division. Additional content and other format types will be added over time. Users can perform full-text and keyword searching within each collection or across the entire repository.

"The new Digital Collections repository will allow NLM to provide permanent, robust access to an even broader range of biomedical information," said Betsy Humphreys, Deputy Director, NLM.

Accessing the Collections

This first release of Digital Collections includes a newly expanded set of Cholera Online monographs, a portion of which NLM first published online in PDF format in 2007. The version of Cholera Online now available via Digital Collections includes 518 books (dating from 1817 to 1900) about cholera pandemics of that period. More information about the selection of the books and the subject of cholera may be found on the original Cholera Online Web page at: http://www.nlm.nih.gov/exhibition/cholera/. Each book was scanned into high-quality TIFF images, which underwent optical character recognition to generate corresponding text files. Finally, a JPEG2000 derivative was created for each page for presentation through the integrated book viewer, which includes a Flash-based zooming feature for resizing and rotating a page on demand.

 

Figure 1: Book Viewer Showing Arbitrary Zooming of a Page Image

The second collection is a selection of 11 historical films, all created by the US government and in the public domain. The films have been digitized in a variety of video formats to accommodate a wide range of playback devices, including mobile devices. Digital Collections also includes an integrated, Flash-based video player that allows full-text search of a film's transcript and graphically displays where the searched word or phrase occurs within the timeline of the film.

 

 

Figure 2: Video Player Showing Search Hits in a Graphical Timeline of the Film
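
The idea behind the transcript search can be illustrated with a minimal Python sketch. It is not the code used by the NLM video player. It assumes a transcript is available as a list of timestamped text segments, and the names Cue and find_hits, as well as the sample transcript lines, are hypothetical and chosen only for illustration. Each hit is reported with its start time and its relative position in the film, which is the value a player could use to place a marker on the timeline.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One timestamped transcript segment (hypothetical structure)."""
    start_seconds: float
    text: str


def find_hits(cues, query, duration_seconds):
    """Return (start_seconds, timeline_fraction) for each cue containing query.

    timeline_fraction is the cue's start time divided by the film's length,
    so 0.0 is the beginning of the timeline and 1.0 is the end.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    needle = query.lower()
    hits = []
    for cue in cues:
        # Case-insensitive substring match against the segment text.
        if needle in cue.text.lower():
            hits.append((cue.start_seconds, cue.start_seconds / duration_seconds))
    return hits


if __name__ == "__main__":
    # Made-up sample transcript, not taken from any NLM film.
    sample = [
        Cue(0.0, "Cholera spreads through contaminated water."),
        Cue(30.0, "Boil the water before drinking."),
        Cue(60.0, "Wash hands with soap."),
    ]
    for start, fraction in find_hits(sample, "water", duration_seconds=120.0):
        print(f"hit at {start:.0f}s ({fraction:.0%} of the timeline)")
```

A real player would also need to handle phrase matching across segment boundaries, which this sketch does not attempt.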

 

Preserving the Collections

Every page of each book and every video is stored as a discrete object in Digital Collections, with XML "glue" describing each object and the relationships between objects. To ensure the long-term integrity of these digital files, checksums (number strings that act like mathematical "fingerprints") are calculated and written into the objects as they are ingested into Digital Collections. These checksums will be recalculated periodically and compared with the original values. Additionally, all ingested files are versioned, so that a change does not overwrite the original but instead creates a new version, which is stored alongside it.
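
The checksum comparison described above can be sketched in a few lines of Python. This is an illustration only, not the mechanism Digital Collections actually uses. The announcement does not name a checksum algorithm, so SHA-256 is an assumption here, and the function names compute_checksum and verify_fixity are hypothetical.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at path.

    The file is read in chunks so that large files (for example, TIFF page
    images or video) do not have to fit in memory. SHA-256 is an assumed
    default; the source does not say which algorithm is used.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum, algorithm="sha256"):
    """Recompute the checksum and compare it with the value stored at ingest."""
    return compute_checksum(path, algorithm) == recorded_checksum


if __name__ == "__main__":
    import os
    import tempfile

    # Simulate ingest: write a file and record its checksum.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"page image bytes")
        file_path = tmp.name
    recorded = compute_checksum(file_path)

    # Simulate a later periodic check on the unchanged file.
    print("unchanged file passes:", verify_fixity(file_path, recorded))

    # Simulate corruption: any change to the bytes changes the checksum.
    with open(file_path, "ab") as handle:
        handle.write(b"!")
    print("altered file passes:", verify_fixity(file_path, recorded))

    os.remove(file_path)
```

If a periodic check fails, the file no longer matches what was ingested, and a repository would typically restore it from a known-good copy.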

Technology

Digital Collections was built using several open-source components, with the Fedora Commons Repository Software providing the foundation. The primary browse and search interface has been adapted from the Muradora "front-end" for Fedora, created by Macquarie University, Sydney, Australia. The book viewer is a component of Northwestern University's Book Workflow Interface, also created specifically for use with Fedora. Los Alamos National Laboratory's djatoka JPEG2000 server handles the images. The video player was adapted from a research project by NLM's Office of Computer and Communications Systems.

Project

In 2009, NLM began a pilot project to build the repository, develop appropriate workflows for ingesting and managing the content, and provide a core set of end-user services suitable for general public access. Information on the year-long evaluation process leading to the selection of Fedora can be found at http://www.nlm.nih.gov/digitalrepository/index.html.

Please send your comments and questions about Digital Collections to NLM Customer Service.

 

###