About Digital Collections
Digital Collections is the National Library of Medicine's free online archive of biomedical resources. All of its content is in the public domain and freely available worldwide. Digital Collections preserves NLM's rich historical resources and provides unique access to them.
About the Collections
Most of the texts in Digital Collections were digitized at NLM using a Kirtas KABIS III scanning system, which produces several files for each page and for each book. After the source images are cropped, deskewed and reviewed, additional image derivatives and metadata are created using NLM-defined scripts. A smaller number of texts were digitized from originals or from microfilm by an offsite vendor.
The texts in the Medicine in the Americas collection were digitized for the Medical Heritage Library, a multi-institutional digital library project that uses the Internet Archive to host its collection. For this reason, NLM routinely deposits copies of its digitized books with the Internet Archive. More information on the Medical Heritage Library can be found here.
The films available in Digital Collections come from NLM's reel and videotape holdings of government and military productions. The source material was digitized to MPEG-2 format, which served as the digital master for the range of video derivatives offered by the repository. Each film was transcribed manually, and time-coded captions were then created using WGBH's Magpie application.
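To make "time-coded captions" concrete, here is a minimal Python sketch that reads caption cues into start and end times in seconds. It assumes an SRT-style layout (a cue number, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then the caption text); the actual format NLM exports from Magpie is not stated here, and `parse_captions` and `Caption` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches an SRT-style timing line (assumed format), e.g. "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Caption:
    start: float  # seconds from the beginning of the film
    end: float    # seconds from the beginning of the film
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert hour/minute/second/millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_captions(raw: str) -> list[Caption]:
    """Parse blank-line-separated cues; text after the timing line is the caption."""
    captions = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                captions.append(Caption(
                    start=_seconds(*g[:4]),
                    end=_seconds(*g[4:]),
                    text=" ".join(lines[i + 1:]).strip(),
                ))
                break  # one timing line per cue
    return captions
```

Storing times as plain seconds keeps later steps, such as locating a caption on the film's timeline, simple arithmetic.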
Digitization specifications for the materials in Digital Collections can be found here.
Repository Technical Overview
The Digital Collections repository is composed primarily of open-source technologies. The Fedora Commons Repository Software provides the underlying XML-based framework for structuring, managing and disseminating digital content. Fedora is paired with the Solr/Lucene indexing application, which drives the full-text and faceted metadata search within Digital Collections.
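To illustrate how a full-text query with facets is expressed to Solr, the sketch below builds a request for Solr's standard `select` handler using its real `q`, `rows`, `facet`, `facet.field`, `fq` and `wt` parameters. The endpoint URL, core name and field names (`subject`, `date`) are hypothetical; the repository's actual Solr schema is not described on this page.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical Solr endpoint and core name, for illustration only.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_faceted_query(
    text: str,
    facet_fields: list[str],
    filters: Optional[dict[str, str]] = None,
    rows: int = 10,
) -> str:
    """Build a Solr select URL combining a full-text query with facet counts."""
    params = [("q", text), ("rows", str(rows)), ("facet", "true"), ("wt", "json")]
    # One facet.field parameter per field whose value counts should be returned.
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value narrows results through a filter query (fq).
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

For example, `build_faceted_query("cholera", ["subject", "date"], {"subject": "Cholera"})` asks Solr for documents matching "cholera", restricted to the "Cholera" subject, along with per-value counts for the subject and date fields that an interface can render as facet links.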
The website's homepage, search functionality and resource summary pages are provided using the Blacklight open-source discovery interface.
Digitized texts are presented via Northwestern University's book viewing application, which itself is part of a larger open-source offering called Book-Workflow Interface. The repository's stored JPEG2000 page images are dynamically converted to regular JPEG for book viewer display via the Djatoka JPEG2000 Image Server.
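The JPEG2000-to-JPEG conversion is typically requested from djatoka as an OpenURL query to its resolver. The sketch below assembles such a request; the parameter names follow djatoka's getRegion service as it is commonly documented, but the resolver URL and the page identifier are hypothetical, and the parameters should be checked against the deployed djatoka version.

```python
from urllib.parse import urlencode

# Hypothetical resolver location for an adore-djatoka deployment.
DJATOKA_RESOLVER = "http://localhost:8080/adore-djatoka/resolver"


def jpeg_url_for_page(jp2_id: str, level: int = 3, rotate: int = 0) -> str:
    """Build an OpenURL asking djatoka to render a stored JPEG2000 page as a JPEG."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL version
        "rft_id": jp2_id,                               # identifier of the stored JP2 image
        "svc_id": "info:lanl-repo/svc/getRegion",       # djatoka's region/resolution service
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",                     # format delivered to the book viewer
        "svc.level": str(level),                        # JPEG2000 resolution level to decode
        "svc.rotate": str(rotate),                      # rotation in degrees
    }
    return f"{DJATOKA_RESOLVER}?{urlencode(params)}"
```

Because the conversion happens per request, the repository only needs to store the JPEG2000 masters; the viewer asks for whatever resolution level it needs at display time.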
Finally, digitized films are presented using an NLM-developed Flash video player. The player incorporates structured caption files into a transcript search, providing on-demand access to search results and a graphical representation of where the results occur within the timeline of the film.
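The transcript search can be approximated in a few lines: find the caption cues that contain a term, then express each hit's start time as a fraction of the film's length, which is what a timeline graphic needs. This is a minimal sketch with hypothetical names (`Cue`, `search_transcript`, `render_timeline`); it is not the NLM player's Flash code, and the text bar merely stands in for the player's graphical display.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the film
    text: str


@dataclass
class Hit:
    start: float
    text: str
    position: float  # fraction of the film's duration, 0.0 to 1.0


def search_transcript(cues: list[Cue], term: str, duration: float) -> list[Hit]:
    """Return cues containing `term` (case-insensitive) with their timeline position."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    needle = term.lower()
    return [
        Hit(cue.start, cue.text, cue.start / duration)
        for cue in cues
        if needle in cue.text.lower()
    ]


def render_timeline(hits: list[Hit], width: int = 40) -> str:
    """Draw a text timeline of `width` cells, marking each hit's position with '|'."""
    bar = ["-"] * width
    for hit in hits:
        # Clamp so a hit at the very end still lands in the last cell.
        bar[min(int(hit.position * width), width - 1)] = "|"
    return "".join(bar)
```

With cues at 0, 30 and 90 seconds ("Opening titles", "Malaria control in the field", "Results of malaria spraying") in a 120-second film, searching for "malaria" matches the last two cues at positions 0.25 and 0.75, and `render_timeline(hits, 8)` returns `"--|---|-"`.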
Digital Collections uses the following strategies to help ensure the durability of the managed content:
- Every master file (source images and videos) is stored with an MD5 checksum, a fixed-length value computed from the file's contents that can be recomputed later to verify that the file has not been altered.
- All repository content and services are replicated at a secondary data center capable of taking over all repository functions if NLM's main data center is unavailable.
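The checksum strategy can be sketched with Python's standard `hashlib` module: compute the MD5 of a master file at ingest, store it, and later recompute and compare. The function names are hypothetical and this is not NLM's actual fixity-checking code; the file is read in chunks so that large video masters do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 digest as lowercase hex, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, stored_md5: str) -> bool:
    """Return True if the file's current MD5 matches the checksum recorded at ingest."""
    return md5_of(path) == stored_md5.lower()
```

Any change to the file's bytes, even a single character, produces a different digest, so a mismatch flags the master for restoration from a replicated copy.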
Digital Collections offers a Web service that facilitates programmatic search of the Dublin Core metadata and full-text OCR of every resource in the repository, with search requests and responses in XML format. More information, including the specifications of the service request and output, is available here.
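Because the service returns XML that carries Dublin Core metadata, a client can read results with Python's standard `xml.etree.ElementTree`. The response layout assumed here (a `<response>` root containing `<result>` elements with `dc:` children) and the function name `parse_results` are hypothetical; the real element names are defined in the service's published specification. Only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace, in ElementTree's {uri}tag form.
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_results(xml_text: str) -> list[dict[str, str]]:
    """Extract identifier and title from each (hypothetical) <result> element."""
    root = ET.fromstring(xml_text)
    results = []
    for result in root.iter("result"):
        results.append({
            # findtext returns the default when the element is absent.
            "identifier": result.findtext(f"{DC}identifier", default=""),
            "title": result.findtext(f"{DC}title", default=""),
        })
    return results
```

Missing Dublin Core fields come back as empty strings rather than raising, which keeps a client tolerant of records with sparse metadata.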
For more information on the history of NLM's repository development, including the initial functional requirements and software evaluations, see the digital repository project history page.
For information about Digital Collections, contact:
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
Telephone: 1-888-FINDNLM (1-888-346-3656)