The instability of resources on the Web is one of many challenging issues related to digital preservation. Several years ago, NLM recognized the seriousness of this problem and included the following objective in its long-range plan for 2000-2005:
Take a leadership role in ensuring permanent access to important digital materials in health and biomedicine, including electronic journals, databases, documents published on the Web, and new kinds of scholarly communication and documentation of knowledge, using NLM's own electronic output and services as initial testbeds.
To this end, NLM has developed a system for communicating to users whether the resources they consult on our Web site will be kept permanently available, change over time, or possibly disappear altogether. In addition, we have created an online archive for NLM's permanent Web documents that are no longer current.
In 1999, the Working Group on Permanence of NLM's Electronic Information (Permanence Working Group) was appointed and asked to examine the range of electronic information produced by NLM and develop recommendations in the following areas:
- a) levels of permanence suitable for different categories of NLM information
- b) methods of recording and communicating the level of permanence of NLM electronic information
- c) procedures for ensuring that the levels of permanence are implemented in practice
- d) approaches to labeling, organizing, retrieving and displaying NLM's electronic information so that the retention of older materials would not have a negative impact on those seeking current information.
The Permanence Working Group's discussions focused initially on three important characteristics of Web documents: identifier validity, resource availability, and content invariance. The Group developed a rating system based on these three concepts. The ratings later were distilled into the following four permanence levels:
- Permanent: Unchanging Content
- This resource will be kept available permanently. Its identifier will always provide access to the resource. Its content will not change.
Example: Minutes of the NLM Board of Regents
- Permanent: Stable Content
- This resource will be kept available permanently. Its identifier will always provide access to the resource. Its content is subject only to minor corrections or additions.
Example: Fact Sheets
- Permanent: Dynamic Content
- This resource will be kept available permanently. Its identifier will always provide access to the resource. Its content could be revised or replaced.
Example: NLM's Home Page
- Permanence Not Guaranteed
- NLM has made no commitment to keep this resource available. It could become unavailable at any time. Its content and identifier could be changed.
Example: Frequently Asked Questions
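One way to see how the four levels follow from the three characteristics (identifier validity, resource availability, content invariance) is to encode them as data. The sketch below is illustrative only: the attribute names are hypothetical and do not describe how NLM stores the ratings.

```python
from enum import Enum


class PermanenceLevel(Enum):
    """The four permanence levels, modeled on the three characteristics
    the Working Group started from. Attribute names are hypothetical."""

    # (label, availability guaranteed, identifier guaranteed, allowed content change)
    UNCHANGING = ("Permanent: Unchanging Content", True, True, "none")
    STABLE = ("Permanent: Stable Content", True, True, "minor corrections or additions")
    DYNAMIC = ("Permanent: Dynamic Content", True, True, "may be revised or replaced")
    NOT_GUARANTEED = ("Permanence Not Guaranteed", False, False, "may change")

    def __init__(self, label, availability_guaranteed, identifier_guaranteed, content_change):
        self.label = label
        self.availability_guaranteed = availability_guaranteed
        self.identifier_guaranteed = identifier_guaranteed
        self.content_change = content_change

    @property
    def is_permanent(self) -> bool:
        # All three "Permanent" levels promise availability and a working identifier;
        # they differ only in how much the content may change.
        return self.availability_guaranteed and self.identifier_guaranteed
```

In this framing the three permanent levels are distinguished solely by content invariance, which matches the wording of the level definitions above.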
The Permanence Working Group analyzed the documents that were available on the NLM Web site and developed a list of document categories. To simplify the assignment of permanence levels by Library staff, document categories were assigned default ratings.
| Document Category | Default Permanence Level |
|---|---|
| Announcements, News | Permanence Not Guaranteed |
| Applications, Forms, Registrations | Permanence Not Guaranteed |
| Bibliographies | Permanent: Dynamic Content |
| Calendars, Schedules | Permanence Not Guaranteed |
| Clinical Alerts | Permanent: Unchanging Content |
| Contracts and Related Resources | Permanence Not Guaranteed |
| Database | Permanent: Dynamic Content |
| Digital Library Collections | Permanent: Dynamic Content |
| Exhibitions | Permanent: Stable Content |
| Fact Sheets | Permanent: Stable Content |
| FAQs, Help Files, Pocket Cards | Permanence Not Guaranteed |
| Finding Aids | Permanent: Dynamic Content |
| Grants, Awards | Permanence Not Guaranteed |
| Lists of Links | Permanence Not Guaranteed |
| Minutes (Official) | Permanent: Unchanging Content |
| Newsletters | Permanent: Stable Content |
| Organizational Charts & Directories | Permanence Not Guaranteed |
| Other | Blank (No Default Rating) |
| Photos of Staff, Programs, Activities, Buildings & Grounds | Permanence Not Guaranteed |
| Policies (Official) | Permanent: Stable Content |
| Press Releases | Permanent: Stable Content |
| Procedures | Permanence Not Guaranteed |
| Product, Program, & Project Descriptions | Permanent: Dynamic Content |
| Reports (Official) | Permanent: Stable Content |
| Software | Permanence Not Guaranteed |
| Staff Biographical Sketches | Permanence Not Guaranteed |
| | Permanence Not Guaranteed |
| Staff Presentations | Permanence Not Guaranteed |
| Training Materials & Manuals | Permanent: Dynamic Content |
| Visitor Information | Permanence Not Guaranteed |
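The table amounts to a lookup from document category to default rating. A minimal sketch, using only a few rows copied from the table; the dictionary and function names are hypothetical, and "Other" maps to `None` because it has no default.

```python
from typing import Optional

# Excerpt of the default-rating table; labels copied from it.
DEFAULT_PERMANENCE = {
    "Announcements, News": "Permanence Not Guaranteed",
    "Clinical Alerts": "Permanent: Unchanging Content",
    "Fact Sheets": "Permanent: Stable Content",
    "Finding Aids": "Permanent: Dynamic Content",
    "Minutes (Official)": "Permanent: Unchanging Content",
    "Other": None,  # Blank: no default rating
}


def default_level(category: str) -> Optional[str]:
    """Return the default permanence level for a category (None if it has no default)."""
    if category not in DEFAULT_PERMANENCE:
        raise KeyError(f"unknown document category: {category!r}")
    return DEFAULT_PERMANENCE[category]
```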
NLM's Metadata Schema
During the deliberations of the Permanence Working Group, NLM's Task Group on Metadata and Methods of Recording Permanence Levels was appointed and charged with developing an expanded set of metadata to increase the retrievability of NLM's Web documents. It also was asked to decide how permanence metadata would be recorded and displayed. The Task Group recommended that metadata be created for all publicly available electronic resources created by NLM and that the permanence level be a required element of the metadata set. The NLM set is based on the Dublin Core Metadata Element Set, with some local adaptations, most notably the addition of permanence ratings. See http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html.
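Dublin Core elements are commonly embedded in HTML pages as `<meta>` tags. The sketch below renders a metadata record that way and refuses a record with no permanence level, mirroring the Task Group's requirement. The element name `NLM.Permanence.Level` is a hypothetical placeholder; the actual names are defined in the NLM schema referenced above.

```python
from html import escape

# Hypothetical name for the local permanence element.
PERMANENCE_ELEMENT = "NLM.Permanence.Level"


def meta_tags(record: dict) -> str:
    """Render a metadata record as HTML <meta> tags, one per element."""
    # The permanence level is a required element of the set.
    if not record.get(PERMANENCE_ELEMENT):
        raise ValueError("permanence level is a required element")
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in record.items()
    )
```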
Implementing the System
A third committee, the Electronic Archive Group (EAG), was then charged with developing a pilot project for assigning metadata, including permanence levels, and building an archive for outdated Web documents of permanent value to NLM. The EAG evaluated several systems under development elsewhere and concluded that TeamSite, a content management system developed by Interwoven, Inc. that was being purchased for NLM's main Web site, could be used for assigning metadata and managing the archiving workflow. A template was created in TeamSite (see Figure 1), and NLM Web contributors were trained to use it to assign basic metadata to all documents that would be submitted for promotion to the Web. The template is designed to minimize the burden on document creators: default values or drop-down menus are provided wherever possible. When a contributor selects a document category for a document that has just been created or revised, the system automatically supplies that category's default permanence rating. If the default rating does not seem appropriate for a particular document, it can be changed by the person responsible for assigning the metadata or by a system administrator.
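The template behavior described above (a category choice fills in a default, which authorized people may override) can be sketched as a small form object. This is not TeamSite's API; the class, the role names, and the defaults excerpt are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of the category defaults.
DEFAULTS = {
    "Fact Sheets": "Permanent: Stable Content",
    "FAQs, Help Files, Pocket Cards": "Permanence Not Guaranteed",
    "Other": None,
}

# Hypothetical roles allowed to change a default rating.
OVERRIDE_ROLES = {"metadata_owner", "system_admin"}


@dataclass
class MetadataForm:
    category: str
    permanence: Optional[str] = None
    overridden: bool = False

    def __post_init__(self) -> None:
        # Like the template: choosing a category fills in its default rating.
        if self.permanence is None:
            self.permanence = DEFAULTS.get(self.category)

    def override(self, level: str, role: str) -> None:
        """Replace the default rating; only permitted roles may do this."""
        if role not in OVERRIDE_ROLES:
            raise PermissionError(f"role {role!r} may not change the permanence level")
        self.permanence = level
        self.overridden = True
```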
When a contributor assigns to a document a rating of Permanent (Unchanging, Stable, or Dynamic content), the system notifies the NLM Archives Team. The Archives Team reviews the document category and permanence metadata and forwards the document for promotion to the Web. The Cataloging Section then creates a complete MARC bibliographic record with standardized access points, including MeSH and an NLM classification number. The record appears in NLM's online catalog and is distributed to the bibliographic utilities and other NLM licensees. Enhanced metadata created by the Cataloging Section is then added to the header information of the online resource.
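The sequence for permanent documents can be written out as an ordered list of steps. This is a simplified sketch: the step wording is paraphrased from the paragraph above, and the branch for non-permanent documents is an assumption, since the text does not describe their path beyond promotion.

```python
from typing import List

PERMANENT_LEVELS = {
    "Permanent: Unchanging Content",
    "Permanent: Stable Content",
    "Permanent: Dynamic Content",
}


def publication_steps(level: str) -> List[str]:
    """Ordered workflow steps for a submitted document, by permanence level."""
    if level not in PERMANENT_LEVELS:
        # Assumption: non-permanent documents skip Archives review and cataloging.
        return ["promote to Web"]
    return [
        "notify Archives Team",
        "Archives Team reviews category and permanence metadata",
        "promote to Web",
        "Cataloging Section creates MARC record (MeSH, NLM classification)",
        "add enhanced metadata to resource header",
    ]
```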
The Archiving Process
The system prompts Web contributors at regular intervals to review and revise their current documents as needed. If contributors create a major revision of a permanent document or decide that a permanent document should be removed from the current site without being replaced, the archiving function is triggered.
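The archiving trigger reduces to a simple rule: the document must carry a permanent rating, and the change must be either a major revision or a removal without replacement. A minimal sketch, with hypothetical labels for the kinds of change:

```python
PERMANENT_LEVELS = {
    "Permanent: Unchanging Content",
    "Permanent: Stable Content",
    "Permanent: Dynamic Content",
}

# Hypothetical labels for the two changes that trigger archiving.
ARCHIVE_TRIGGERS = {"major_revision", "removed_without_replacement"}


def triggers_archiving(level: str, change: str) -> bool:
    """True when a change to a document should start the archiving function."""
    return level in PERMANENT_LEVELS and change in ARCHIVE_TRIGGERS
```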
When a document is moved to the Archives, the date archived is added to its URL. The only links in an archived document that continue to function are those to other parts of the same archived document. All other links are stripped when a document is moved to the Archives.
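Both archiving transformations can be illustrated briefly. The sketch assumes a hypothetical URL layout in which `/archive/YYYYMMDD` is prefixed to the original path (the text says only that the archive date is added to the URL). It treats `href="#..."` links as the links to other parts of the same document, and uses a regular expression, which is adequate for a sketch but not for arbitrary HTML.

```python
import re
from datetime import date
from urllib.parse import urlsplit, urlunsplit


def archive_url(url: str, archived: date) -> str:
    """Hypothetical layout: insert /archive/YYYYMMDD before the original path."""
    parts = urlsplit(url)
    path = f"/archive/{archived:%Y%m%d}{parts.path}"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Matches <a ... href="...">text</a>.
_LINK = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def strip_links(html: str) -> str:
    """Keep in-document links (href="#..."); reduce every other link to its text."""
    def keep_or_unwrap(m):
        return m.group(0) if m.group(1).startswith("#") else m.group(2)
    return _LINK.sub(keep_or_unwrap, html)
```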
The Archives contain permanent resources with outdated or superseded content. This includes older material that was once on the current NLM site but is no longer of current interest, as well as earlier versions of current documents that have undergone major revisions. After investigating archive models developed elsewhere, the EAG determined that the best way to ensure proper migration of all permanent resources and allow searching and retrieval of archived items was to keep the Archives as a separate but integral part of NLM's main Web site. Archived pages are stored on a separate branch of the main NLM Web server, as shown in Figure 2.
The search engine was configured to query both the current site and the Archives but list the search results for archived documents separately (see Figure 3).
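Because archived pages live on their own branch of the server, a single result list can be split into current and archived groups by URL path. A minimal sketch; the `/archive/` prefix is a hypothetical stand-in for the actual branch path.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

# Hypothetical path of the Archives branch on the main Web server.
ARCHIVE_PREFIX = "/archive/"


def split_results(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition one result list into (current, archived), preserving rank order."""
    current: List[str] = []
    archived: List[str] = []
    for url in urls:
        bucket = archived if urlsplit(url).path.startswith(ARCHIVE_PREFIX) else current
        bucket.append(url)
    return current, archived
```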
Clicking on an item in the search results takes the user directly to the archived document. An Archives header (see Figure 4) and footer were designed to indicate clearly to users that the documents they have accessed are no longer current.
At the end of each document are its publication, update, and archived dates, as well as links from an archived version to the version that replaced it. In the example in Figure 5, clicking on "Replaced By" takes the user to NLM's current site (see Figure 6).
Clicking on "Previous Version" at the end of the document will take the user from the current site back to the archived document (see Figure 7). Within the Archives, the user can trace changes in a document over time by clicking on the "Previous Version" link on every archived version of a document.
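Following "Previous Version" links repeatedly walks a simple linked list of versions, newest first. The sketch models those links as a dictionary from each version's URL to its predecessor's; the URLs are hypothetical, and the cycle check guards against a mislinked chain.

```python
from typing import Dict, List, Optional


def version_history(start: str, previous: Dict[str, Optional[str]]) -> List[str]:
    """Follow Previous Version links from `start`, returning URLs newest first."""
    chain: List[str] = []
    seen = set()
    url: Optional[str] = start
    while url is not None:
        if url in seen:
            raise ValueError(f"cycle in Previous Version links at {url!r}")
        seen.add(url)
        chain.append(url)
        url = previous.get(url)  # None (or missing) means the oldest version
    return chain
```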
A link has also been added so that users can access the metadata for every document (see Figure 8).
Figure 9 shows a partial list of the expanded metadata that is created for all permanent documents.
In addition, if a user enters a URL for a document that has been moved to the Archives and there is no current version of the document on the main site, a redirect page will provide a link to it (see Figure 10).
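The redirect rule needs two lookups: whether the requested path is still current, and whether an archived copy exists. A minimal sketch; the function name, the index mapping original paths to archived URLs, and the page text are hypothetical.

```python
from html import escape
from typing import Dict, Optional, Set


def redirect_page(path: str, current: Set[str], archived: Dict[str, str]) -> Optional[str]:
    """Return redirect-page HTML for a URL whose document now exists only in the Archives.

    Returns None when the path is still current or was never archived.
    """
    if path in current or path not in archived:
        return None
    target = escape(archived[path])
    return (
        "<p>This document has been moved to the NLM Archives: "
        f'<a href="{target}">{target}</a></p>'
    )
```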
Currently only HTML documents are being archived. NLM has developed a sidecar approach to providing metadata for non-HTML documents such as PDFs. Contributors use a templated form similar to the one used for HTML pages to enter metadata (see Figure 1). System workflow validators require that contributors create this metadata file before a non-HTML document can be promoted. The metadata file follows a Dublin Core XML schema and can also be queried by the site search engine; implementation is scheduled for the first half of 2005. Web documents created by NLM administrative units that do not use the TeamSite content management system currently are not included in the Archives. In the future the workflow will be modified so that all of NLM's outdated Web publications of permanent value can be archived. Finally, NLM hopes to work with other libraries to encourage their use of permanence ratings for Web documents that are of lasting value.
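The sidecar approach pairs each non-HTML file with a separate XML metadata file, and promotion is blocked until that file exists. The sketch below writes elements in the standard Dublin Core element namespace; the root element name, the `<document>.xml` naming convention, and the validator function are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def sidecar_xml(fields: dict) -> str:
    """Serialize simple Dublin Core fields (e.g. title, format) as an XML document."""
    root = ET.Element("metadata")  # hypothetical root element name
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def sidecar_name(doc: str) -> str:
    return doc + ".xml"  # hypothetical naming convention


def can_promote(doc: str, files: set) -> bool:
    """Validator sketch: non-HTML documents need a sidecar before promotion."""
    if doc.lower().endswith((".html", ".htm")):
        return True
    return sidecar_name(doc) in files
```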
By Margaret M. Byrnes
Head, Preservation and Collection Management Section
Byrnes MM. Permanence Levels and the Archives for NLM's Permanent Web Documents. NLM Tech Bull. 2005 Mar-Apr;(343):e4.