Developing Permanence Levels and the Archives for NLM’s Permanent Web Documents
The National Library of Medicine has developed a system to indicate to users which of its Web documents will be kept permanently available and whether the contents and identifiers of those documents could change over time. In 1999, the Working Group on Permanence of NLM's Electronic Information was established and given the following charge:
To examine the range of electronic information produced by NLM and develop recommendations in the following areas:
a) levels of permanence suitable for different categories of NLM information, e.g., permanent retention in unaltered form, permanent retention in continually updated form, retentions for a limited time in unaltered form, etc. In developing recommended levels of permanence, the Working Group should consider NLM's role as the holder of its own archives, as well as the needs of current and future users.
b) methods of recording and communicating the level of permanence of NLM electronic information, e.g., in metadata or by some other means
c) procedures for ensuring that the levels of permanence are implemented in practice
d) approaches to labeling, organizing, retrieving and displaying NLM's electronic information so that the retention of older materials (e.g., for NLM Archives) does not have a negative impact on those seeking current information.
The Working Group's discussions focused on three important characteristics of Web documents: identifier validity, resource availability, and content invariance. The Group developed a rating system based on these three concepts. The ratings later were distilled into the following four permanence levels:
Permanent: Unchanging Content
The National Library of Medicine has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content will not change. Example: Minutes of the NLM Board of Regents meetings
Permanent: Stable Content
The National Library of Medicine has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content is subject only to minor corrections or additions. Example: NLM Annual Report
Permanent: Dynamic Content
The National Library of Medicine has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content could be revised or replaced. Example: NLM Home Page
Permanence Not Guaranteed
The National Library of Medicine has made no commitment to keep this resource available. It could become unavailable at any time. Its identifier could be changed. Example: Frequently Asked Questions
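The four levels can be read as combinations of the three characteristics the Working Group discussed. The following is a minimal sketch of that reading, not NLM's implementation: the class names, field names, and the "unspecified" content value are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class PermanenceLevel(Enum):
    UNCHANGING = "Permanent: Unchanging Content"
    STABLE = "Permanent: Stable Content"
    DYNAMIC = "Permanent: Dynamic Content"
    NOT_GUARANTEED = "Permanence Not Guaranteed"


@dataclass(frozen=True)
class Guarantees:
    identifier_valid: bool    # identifier will always provide access
    resource_available: bool  # resource is kept permanently available
    content: str              # illustrative label for content invariance


# Mapping derived from the level descriptions above (field values are assumptions).
GUARANTEES = {
    PermanenceLevel.UNCHANGING: Guarantees(True, True, "unchanging"),
    PermanenceLevel.STABLE: Guarantees(True, True, "minor corrections or additions"),
    PermanenceLevel.DYNAMIC: Guarantees(True, True, "may be revised or replaced"),
    PermanenceLevel.NOT_GUARANTEED: Guarantees(False, False, "unspecified"),
}


def is_permanent(level: PermanenceLevel) -> bool:
    """True for the three levels that carry a permanent-availability commitment."""
    return GUARANTEES[level].resource_available
```

Under this reading, the three permanent levels differ only in how much the content may change.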
The Working Group analyzed the documents that were available on the NLM Web site and developed a list of document categories. To simplify the assignment of permanence levels by Library staff, document categories were assigned default ratings. The default rating for press releases, for example, is Permanent: Stable Content. If a default rating does not seem appropriate for a particular document, it can be changed by the person responsible for assigning the metadata or by a system administrator.
| Document Category | Default Permanence Level |
|---|---|
| Announcements, News | Permanence Not Guaranteed |
| Applications, Forms, Registrations | Permanence Not Guaranteed |
| Bibliographies | Permanent: Dynamic Content |
| Calendars, Schedules | Permanence Not Guaranteed |
| Clinical Alerts | Permanent: Unchanging Content |
| Contracts and Related Resources | Permanence Not Guaranteed |
| Database | Permanent: Dynamic Content |
| Digital Library Collections | Permanent: Dynamic Content |
| Exhibitions | Permanent: Stable Content |
| Fact Sheets | Permanent: Stable Content |
| FAQs, Help Files, Pocket Cards | Permanence Not Guaranteed |
| Finding Aids | Permanent: Dynamic Content |
| Grants, Awards | Permanence Not Guaranteed |
| Lists of Links | Permanence Not Guaranteed |
| Minutes (Official) | Permanent: Unchanging Content |
| Newsletters | Permanent: Stable Content |
| Organizational Charts and Directories | Permanence Not Guaranteed |
| Other | Blank (No Default Rating) |
| Photos of Staff, Programs, Activities, Buildings and Grounds | Permanence Not Guaranteed |
| Policies (Official) | Permanent: Stable Content |
| Press Releases | Permanent: Stable Content |
| Procedures | Permanence Not Guaranteed |
| Product, Program, and Project Descriptions | Permanent: Dynamic Content |
| Reports (Official) | Permanent: Stable Content |
| Software | Permanence Not Guaranteed |
| Staff Biographical Sketches | Permanence Not Guaranteed |
| Staff Papers | Permanence Not Guaranteed |
| Staff Presentations | Permanence Not Guaranteed |
| Statistics and Reports | Permanence Not Guaranteed |
| Technical Documentation | Permanent: Dynamic Content |
| Training Material and Manuals | Permanent: Dynamic Content |
| Visitor Information | Permanence Not Guaranteed |
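The default-with-override behavior described above can be sketched as a simple lookup. This is an illustration only: the function name and the `override` parameter are hypothetical, and only a few rows of the table are reproduced.

```python
from typing import Optional

# A few rows copied from the table; None stands for the blank default of "Other".
DEFAULT_LEVELS = {
    "Clinical Alerts": "Permanent: Unchanging Content",
    "Press Releases": "Permanent: Stable Content",
    "Bibliographies": "Permanent: Dynamic Content",
    "Lists of Links": "Permanence Not Guaranteed",
    "Other": None,
}


def assign_level(category: str, override: Optional[str] = None) -> Optional[str]:
    """Return the override if one was supplied, else the category default.

    Raises KeyError for a category that is not in the table.
    """
    if override is not None:
        return override
    return DEFAULT_LEVELS[category]
```

For example, `assign_level("Press Releases")` yields the default rating, while `assign_level("Press Releases", "Permanence Not Guaranteed")` models a staff member changing it for a particular document.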
NLM's Metadata Schema
During the deliberations of the Working Group on Permanence, NLM's Task Group on Metadata and Methods of Recording Permanence Levels was appointed and charged with developing an expanded set of metadata to increase the retrievability of NLM's Web documents. It was also asked to decide how permanence metadata would be recorded and displayed. The Task Group recommended that metadata be created for all publicly available electronic resources created by NLM and that permanence data be a required element of the metadata set. The NLM set is based on the Dublin Core Metadata Element Set with some local adaptations, most notably the addition of permanence ratings (see NLM Metadata Schema).
Implementing the System
A third committee, known as the Electronic Archive Group (EAG), was then charged with developing a pilot project for assigning metadata, including permanence levels, and building an archive for outdated Web documents of permanent value to NLM. The EAG evaluated several systems under development elsewhere and concluded that TeamSite, the content management system developed by Interwoven, Inc., which was being purchased for NLM's main Web site, could be used for assigning metadata and managing the archiving workflow. A template was created in TeamSite, and Web contributors were trained to use it to assign basic metadata for all documents that would be submitted for promotion to the Web. The template was designed to minimize the burden on document creators.
Default values or drop-down menus are provided wherever possible. The minimal metadata set includes:
Date Last Modified
Next Review Date
Contact email address
When a contributor assigns a document a rating of Permanent (Unchanging, Stable, or Dynamic Content), the system notifies the NLM Archives Team. The Archives Team reviews the document category and permanence metadata and forwards the document for promotion to the Web. The Cataloging Section then creates a complete MARC bibliographic record with standardized access points, including MeSH and an NLM classification number. The record appears in NLM's online catalog and is distributed to the bibliographic utilities and other NLM licensees. Enhanced metadata created by the Cataloging Section is then added to the header information of the online resource. The metadata can be viewed by clicking on a link that appears at the end of each resource.
The Archives contain permanent resources with outdated content. This includes older material that once appeared on the current site but is no longer of current interest, as well as earlier versions of current documents that have undergone major revisions. After investigating archive models developed elsewhere, the EAG determined that the best way to ensure proper migration of all permanent resources and allow searching and retrieval of archived items was to keep the archive as a separate but integral part of NLM's main Web site. The system was designed to query the current site and the Archives at the same time, but search results for current and outdated documents are clearly differentiated. Search results for archived documents are listed separately and may be accessed by clicking on a folder labeled "Archives".
The system prompts Web contributors at regular intervals to review and revise their current documents as needed. If contributors create a major revision of a permanent document or decide that a permanent document should be removed from the current site without being replaced, the archiving function is triggered.
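The trigger condition in the preceding paragraph can be expressed as a small predicate. This is a sketch under stated assumptions: the event names are hypothetical labels for the two situations the text describes, and the real workflow runs inside TeamSite rather than in code like this.

```python
PERMANENT_LEVELS = {
    "Permanent: Unchanging Content",
    "Permanent: Stable Content",
    "Permanent: Dynamic Content",
}

# Hypothetical labels for the two situations that trigger archiving.
ARCHIVING_EVENTS = {"major_revision", "removed_without_replacement"}


def triggers_archiving(permanence_level: str, event: str) -> bool:
    """True when a permanent document undergoes an event that requires archiving."""
    return permanence_level in PERMANENT_LEVELS and event in ARCHIVING_EVENTS
```

Documents rated Permanence Not Guaranteed never reach the Archives under this rule, whatever the event.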
When a document is moved to the Archives, the date archived is added to its URL. If a user enters the original URL for an archived document that has no current version on the main site, a redirect page informs the user that the document has been moved to the Archives and provides a link to it. The only links in an archived document that continue to function are those to other parts of the same archived document. All other existing links remain, but they are not maintained. Users can find the earlier and later versions of Permanent: Stable documents and trace changes in NLM programs and services over time by clicking on the "Previous Version" and "Replaced by" links.
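The URL handling can be illustrated with a short sketch. The text does not specify how the archive date is added to the URL, so the `/archive/YYYYMMDD/` path prefix, the `example.org` host, and both function names below are assumptions made purely for illustration.

```python
from datetime import date
from urllib.parse import urlsplit, urlunsplit


def archived_url(original_url: str, archived_on: date) -> str:
    """Build an archive URL by prefixing the path with the archive date (assumed scheme)."""
    parts = urlsplit(original_url)
    new_path = f"/archive/{archived_on:%Y%m%d}{parts.path}"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


def resolve(requested_url: str, current_urls: set, archive_index: dict) -> tuple:
    """Decide how to answer a request for a document's original URL.

    archive_index maps an original URL to the URL of its archived copy.
    """
    if requested_url in current_urls:
        return ("current", requested_url)
    if requested_url in archive_index:
        # No current version exists: show a redirect page linking to the archived copy.
        return ("redirect", archive_index[requested_url])
    return ("not_found", None)
```

A request for an original URL that is still current is served directly; only documents with no current version produce the redirect page.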
NLM developed a sidecar approach to providing metadata for non-HTML documents such as PDFs and MS Word documents, using a separate XML file to contain the NLM-modified Dublin Core metadata. Web documents created by NLM divisions that do not use the TeamSite content management system are also not included in the Archives. In the future, the workflow will be modified so that all of NLM's outdated Web publications of permanent value can be added to the Archives. In the near term, work will continue on automating more of the archiving process and streamlining the addition of enhanced metadata to permanent Web documents once their MARC records have been created.
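As an illustration of the sidecar idea for non-HTML documents, the sketch below writes a small XML record that pairs standard Dublin Core elements with a permanence element. The Dublin Core namespace is the published one; the `nlm` namespace URI, the `permanence` element name, the root element, and the `.xml` file-naming rule are all hypothetical and do not reflect NLM's actual schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
NLM = "http://example.org/nlm-metadata/"  # hypothetical namespace for local additions


def sidecar_path(document: str) -> Path:
    """Place the sidecar next to the document, e.g. report.pdf -> report.pdf.xml (assumed rule)."""
    p = Path(document)
    return p.with_name(p.name + ".xml")


def build_sidecar(title: str, modified: str, permanence: str) -> str:
    """Serialize a minimal metadata record as an XML string."""
    ET.register_namespace("dc", DC)
    ET.register_namespace("nlm", NLM)
    root = ET.Element("metadata")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}date").text = modified
    ET.SubElement(root, f"{{{NLM}}}permanence").text = permanence
    return ET.tostring(root, encoding="unicode")
```

Keeping the metadata in a separate file means the PDF or Word document itself never has to be modified when its record changes.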