About Digital Collections
Digital Collections is the National Library of Medicine's free online repository of biomedical resources, including books, still images, videos, and maps. All of the content in Digital Collections is freely available worldwide and, unless otherwise indicated, in the public domain. Digital Collections provides unique access to NLM's rich historical resources, as well as select modern resources.
About the Collections
Most texts in Digital Collections were digitized at NLM using docWorks (dW) image processing software, which produces several files for each page and for each book. After the source images are cropped, deskewed, and reviewed, NLM-defined scripts create additional image derivatives and metadata. A smaller number of texts were digitized by an offsite vendor from originals or from microfilm.
The texts in the Medicine in the Americas collection were digitized for the Medical Heritage Library, a multi-institutional digital library project that uses the Internet Archive to host its collection. For this reason, NLM routinely deposits copies of its digitized books in the Internet Archive. More information on the Medical Heritage Library can be found here.
The films in Digital Collections originate from NLM's motion picture reel and videotape holdings. Digital masters in MPEG-2 format are sourced from patron-access DVDs or Betacam SP copy masters, and the repository offers a variety of derivatives created from these masters. Each film is transcribed, and time-coded captions are created to meet Section 508 accessibility requirements and to support enhanced search.
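The page does not say which caption format NLM uses. As a minimal sketch, assuming WebVTT (a widely used time-coded caption format), transcript segments with start and end times could be rendered into caption cues as follows. The function names and the `(start, end, text)` segment layout are hypothetical.

```python
def format_timestamp(seconds):
    """Format an offset in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Render (start_seconds, end_seconds, text) transcript segments as WebVTT."""
    lines = ["WEBVTT", ""]  # required file header, then a blank line
    for start, end, text in segments:
        if end <= start:
            raise ValueError(f"cue must end after it starts: {start} -> {end}")
        # Each cue is a timing line, its text, and a terminating blank line.
        lines += [f"{format_timestamp(start)} --> {format_timestamp(end)}", text, ""]
    return "\n".join(lines)
```

For example, `to_webvtt([(0, 1.5, "Hi")])` yields a header followed by a single cue running from `00:00:00.000` to `00:00:01.500`. Because every cue carries explicit times, the same file can serve both as on-screen captions and as a searchable, time-indexed transcript.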
The majority of still images in Digital Collections come from the Images from the History of Medicine collection, which includes fine art, photographs, engravings, and posters that illustrate the social and historical aspects of medicine from the 15th to the 21st century. Still images are available as master TIFF files or as standard JPEG files.
The software available in the collection includes historical software developed by NLM, such as the interactive tutorial for Grateful Med, HowTo Grateful Med.
The specifications for NLM Digital Repository objects can be found here.
Repository Technical Overview
The Digital Collections repository is built primarily on open-source technologies. The Fedora Commons Repository Software provides the underlying XML-based framework for structuring, managing, and disseminating digital content. Apache Solr and Lucene index the content and drive the full-text and faceted metadata search within Digital Collections.
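To illustrate what "full-text and faceted search" means at the Solr level, the sketch below builds a request to Solr's standard `/select` handler that asks for matching documents plus facet counts. The `q`, `rows`, `wt`, `facet`, `facet.field`, and `fq` parameters are standard Solr; the core URL and the field names (`subject`, `genre`) are hypothetical and do not reflect NLM's actual schema.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, facet_fields, filters=(), rows=20):
    """Build a Solr /select URL for a full-text query with facet counts.

    Field names passed in facet_fields and filters are placeholders; a real
    index defines its own schema.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # Solr accepts repeated facet.field and fq parameters, so use a list of
    # pairs rather than a dict.
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in filters]  # each fq narrows results, e.g. after a facet click
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"
```

For instance, `build_solr_query("http://localhost:8983/solr/dc", "smallpox vaccination", ["subject", "genre"], filters=["genre:Pamphlets"])` requests documents matching the phrase, restricted to one genre, with counts for each subject and genre value. A discovery interface typically renders those counts as the clickable facet lists beside the results.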
The website's homepage, search functionality and resource summary pages are provided using the Blacklight open-source discovery interface.
Digitized texts are presented via Northwestern University's book viewing application, which is part of a larger open-source offering called Book-Workflow Interface. The Djatoka JPEG2000 Image Server dynamically converts the JPEG2000 page images stored in the repository to standard JPEG for display in the book viewer.
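Djatoka receives its conversion requests as OpenURL query strings. The sketch below builds one that asks for a JPEG rendering of a JPEG2000 page at a chosen resolution level. The parameter names follow Djatoka's published `getRegion` service as best understood here and should be checked against its documentation; the resolver address and page URL are hypothetical placeholders, not NLM endpoints.

```python
from urllib.parse import urlencode


def djatoka_jpeg_url(resolver, jp2_url, level=3, rotate=0):
    """Build a Djatoka OpenURL request for a JPEG rendering of a JPEG2000 file.

    level selects one of the JPEG2000 image's resolution levels.
    """
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL 1.0 version identifier
        "rft_id": jp2_url,  # the JPEG2000 source image
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",  # ask the server to transcode to JPEG
        "svc.level": str(level),
        "svc.rotate": str(rotate),
    }
    return f"{resolver}?{urlencode(params)}"
```

Serving pages this way means the repository stores only one JPEG2000 file per page, while the viewer receives ordinary JPEGs at whatever resolution it needs.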
Still images are served by the Loris IIIF Image Server and displayed in the OpenSeadragon image viewer.
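Loris implements the IIIF Image API, in which every image request is a URL with a fixed path structure: identifier, region, size, rotation, and quality plus format. The helpers below sketch how a client could build such URLs using IIIF Image API 2.x syntax. The server base URL and identifiers are hypothetical placeholders, not NLM's actual endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API 2.x request of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier must be URI-encoded, including any "/" it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the image's info.json, which a deep-zoom viewer reads to request tiles."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

For example, `size="800,"` requests a full image scaled to 800 pixels wide, and `region="0,0,512,512"` requests a single 512-pixel square from the top-left corner. A viewer such as OpenSeadragon can be pointed at the `info.json` URL and will issue region requests like these as the user pans and zooms.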
Digitized films are presented using an NLM-developed Flash video player. The player incorporates structured caption files into a transcript search, providing on-demand access to search results and a graphical representation of where the results occur within the timeline of the film.
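The transcript search described above can be sketched in a few lines. This sketch assumes WebVTT-style caption files (the page does not specify the format the NLM player reads), and the function names are hypothetical. For each cue that contains the search term, it returns the start time and that time as a fraction of the film's length, which is what a player needs to place a marker on its timeline.

```python
import re

# Matches a WebVTT cue timing line such as "00:01:00.000 --> 00:01:04.500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(vtt_text):
    """Parse WebVTT-style text into (start_seconds, end_seconds, text) cues."""
    cues = []
    for block in vtt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                groups = match.groups()
                start, end = _to_seconds(*groups[:4]), _to_seconds(*groups[4:])
                cues.append((start, end, " ".join(lines[i + 1:])))
                break  # stop at the first timing line; any cue identifier before it is ignored
    return cues


def search_transcript(cues, term, duration):
    """Return (start_seconds, timeline_fraction, text) for each cue containing term."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    needle = term.casefold()  # case-insensitive match
    return [
        (start, start / duration, text)
        for start, _end, text in cues
        if needle in text.casefold()
    ]
```

A hit at 60 seconds in a 120-second film has a timeline fraction of 0.5, so its marker would be drawn at the midpoint of the player's progress bar; clicking it would seek to the cue's start time.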
Digital Collections uses the following strategies to help ensure the durability of the managed content:
- Every master file (source images and videos) is stored with an MD5 checksum, a value computed from the file's contents that can be recomputed later to confirm the file has not been altered. Checksums are verified periodically to ensure the integrity of the content.
- All repository content and services are replicated at a secondary data center capable of taking over all repository functions if NLM's main data center is unavailable. A third copy of master content is stored off-site at a third-party location.
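The fixity checks described above amount to recomputing each master file's MD5 digest and comparing it with the value stored at ingest. The same comparison can confirm that the primary, secondary, and off-site copies of a master still match. Below is a minimal sketch using Python's standard library; the function names and the dictionary of stored checksums are hypothetical, and a production system would also log results and schedule the checks.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stored_checksums):
    """Return paths that are missing or whose current MD5 differs from the stored value."""
    failures = []
    for path, expected in stored_checksums.items():
        try:
            actual = md5_of(path)
        except FileNotFoundError:
            failures.append(path)
            continue
        if actual != expected.lower():
            failures.append(path)
    return failures


def replicas_agree(copies):
    """True if all copies of a master file (e.g. primary, secondary, off-site) share one MD5."""
    return len({md5_of(p) for p in copies}) == 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "page0001.tif"  # hypothetical master file
        master.write_bytes(b"example master bytes")
        stored = {master: md5_of(master)}  # checksum recorded at ingest
        print(verify_fixity(stored))  # [] because the file is unchanged
        master.write_bytes(b"altered bytes")
        print(verify_fixity(stored))  # lists the master path because the file changed
```

MD5 is adequate for detecting accidental corruption such as bit rot or truncated copies, which is the goal here; it is not designed to resist deliberate tampering.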
Digital Collections offers a Web service that facilitates programmatic search of the Dublin Core metadata and full-text OCR in the repository, with search requests and responses in XML format. More information, including the specifications of the service request and output, is available here.
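The service's exact request and response schemas are defined in NLM's specification, which is not reproduced here. Assuming only that a response wraps each hit in a `record` element (a hypothetical name) whose children use the standard Dublin Core namespace, a client could collect the metadata with Python's built-in XML parser as follows.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace; the surrounding response structure is assumed.
DC_NS = "http://purl.org/dc/elements/1.1/"


def extract_records(xml_text, record_tag="record"):
    """Map each record element to {dc_element_name: [values]}.

    record_tag is a hypothetical placeholder: substitute the element name
    given in the service's published specification.
    """
    root = ET.fromstring(xml_text)
    prefix = f"{{{DC_NS}}}"  # ElementTree spells namespaced tags as {uri}local
    records = []
    for record in root.iter(record_tag):
        fields = {}
        for child in record:
            # Keep only Dublin Core children, keyed by local name (title, subject, ...).
            if child.tag.startswith(prefix) and child.text:
                name = child.tag[len(prefix):]
                fields.setdefault(name, []).append(child.text.strip())
        records.append(fields)
    return records
```

Values are collected into lists because Dublin Core elements such as `subject` and `creator` may repeat within a single record.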
For more information on the history of NLM's repository development, including the initial functional requirements and software evaluations, see the digital repository project history page.
For information about Digital Collections, contact:
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
Telephone: 1-888-FINDNLM (1-888-346-3656)