United States National Library of Medicine National Institutes of Health

Announcements to NLM Data Licensees: Year 1999


New Data Format and Distribution Media for Licensees of NLM Data

November 19, 1999

Summary:
NLM is approaching the final stages of its transition to a new data creation and maintenance system. We will phase out the legacy systems and move into the new processing streams, beginning with MEDLINE early in the year. Data will be distributed to licensees in NLM's new format during 2000. Databases other than MEDLINE will continue to be created in the legacy ELHILL Unit Record Format and distributed on tape until their production has transitioned to the new systems. The new format is XML-based; the distribution medium is ftp (except for large amounts of data, which will be distributed on DLT tapes in the future); and there is no cost for data distributed via ftp or DLT tape. The DTD, sample records in XML and ELHILL format, and a conversion table to assist in the transition from ELHILL Unit Record Format to XML format are available now.

Details of transition:
1. Weekly MEDLINE data available via ftp in XML format at no cost:
Two files (one compressed and one uncompressed) containing MEDLINE data in XML format will be available for ftp each week, generally by 9:00 a.m. ET on Monday (Tuesday if Monday is a Federal holiday). NLM currently plans for each weekly file to contain the new and maintained (revised) records added to MEDLINE at NLM the previous Friday night, as well as the records deleted from MEDLINE at NLM that same night. Licensees will add records that have new Unique Identifiers (UIs) and will replace records whose UI matches a previously existing UI in their product or system. NLM will not aggregate the data into a file containing a full month's data; licensees who have received monthly tapes will have to retrieve the individual weekly files from the NLM server and aggregate the data themselves. Weekly MEDLINE files will remain on the NLM server for the remainder of the MEDLINE production year. NLM has not yet established the file naming convention.
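A licensee replacing a monthly tape with weekly files has to apply those files in order, adding new UIs, replacing matching UIs, and removing deleted ones. The sketch below shows that logic on a hypothetical in-memory database (a dict keyed by UI). The `WeeklyFile` structure, the week labels, and the treatment of deletions as simple removals are all assumptions for illustration; the announcement does not describe the XML elements or file names.

```python
from dataclasses import dataclass, field


@dataclass
class WeeklyFile:
    """Hypothetical in-memory form of one weekly MEDLINE update file."""
    week: str  # sortable date label; NLM's file naming convention is not yet set
    records: dict = field(default_factory=dict)    # UI -> new or maintained record
    deleted_uis: set = field(default_factory=set)  # UIs removed from MEDLINE that week


def apply_weekly(db, wf):
    """Apply one weekly file to a local database keyed by Unique Identifier (UI)."""
    counts = {"added": 0, "replaced": 0, "deleted": 0}
    for ui, record in wf.records.items():
        # A UI not yet in the local copy is added; a matching UI is replaced.
        counts["replaced" if ui in db else "added"] += 1
        db[ui] = record
    for ui in wf.deleted_uis:
        # Assumption: deleted records are simply dropped from the local copy.
        if db.pop(ui, None) is not None:
            counts["deleted"] += 1
    return counts


def aggregate(db, weekly_files):
    """Stand-in for a monthly tape: apply the weekly files oldest first."""
    totals = {"added": 0, "replaced": 0, "deleted": 0}
    for wf in sorted(weekly_files, key=lambda w: w.week):
        for key, n in apply_weekly(db, wf).items():
            totals[key] += n
    return totals


# Usage with made-up UIs; the files are passed out of order on purpose.
db = {"99000001": "original"}
week1 = WeeklyFile("2000-01-03", {"99000001": "revised", "20000001": "new"})
week2 = WeeklyFile("2000-01-10", deleted_uis={"99000001"})
totals = aggregate(db, [week2, week1])
print(totals)  # {'added': 1, 'replaced': 1, 'deleted': 1}
print(db)      # {'20000001': 'new'}
```

Sorting before applying matters: if the later file's deletion ran first, the earlier file would re-add a record that MEDLINE no longer contains.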

2. Distribution/discontinuation of weekly and monthly MEDLINE tapes:
NLM will continue to distribute weekly and monthly MEDLINE tapes in the legacy ELHILL Unit Record Format (EURF) through the end of May 2000, which corresponds to the 2007 Entry Month (EM). Licensees must complete their transition to the new media and format by that time. NLM will discontinue distribution of data in EURF on 9-track tapes and 3480 tape cartridges on May 31, 2000 (the last MEDLINE shipment will contain 2007 EM data); ftp of weekly MEDLINE data in XML format will be the only option effective June 1, 2000. Licensees who complete their transition to the new format and media before May 2000 are encouraged to notify NLM so that their tape shipments can be stopped. NLM will not charge for distribution of weekly or monthly MEDLINE tapes during 2000.
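The cutoff above leaves an overlap period in which both channels exist. A minimal sketch of that rule follows. The May 31, 2000 tape cutoff comes from this announcement; the first date XML files appear on the ftp server is not fixed here ("the early part of the year"), so `xml_start` is a hypothetical parameter the caller must supply, and the date used in the usage lines is a placeholder.

```python
from datetime import date

# Last day of EURF tape distribution, per the announcement.
TAPE_LAST_DAY = date(2000, 5, 31)


def medline_update_options(on, xml_start):
    """Channels for weekly MEDLINE updates on a given date in 2000.

    xml_start is hypothetical: the first date weekly XML files are on the
    ftp server. The announcement does not give it, so the caller supplies it.
    """
    options = []
    if on >= xml_start:
        options.append("ftp-xml")
    if on <= TAPE_LAST_DAY:
        options.append("tape-eurf")
    return options


xml_start = date(2000, 1, 10)  # placeholder, not an NLM date
print(medline_update_options(date(2000, 1, 3), xml_start))   # ['tape-eurf']
print(medline_update_options(date(2000, 5, 31), xml_start))  # ['ftp-xml', 'tape-eurf']
print(medline_update_options(date(2000, 6, 1), xml_start))   # ['ftp-xml']
```

Any date for which both options are returned is a window in which a licensee could run the tape and XML pipelines side by side before asking NLM to stop tape shipments.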

3. MEDLINE in-process records available via ftp in XML format at no cost:
MEDLINE in-process records (records that have no MeSH indexing and have not yet undergone quality assurance, also known as PREMEDLINE records) will be leased to MEDLINE licensees at no cost via ftp in 2000. These data will not be available on tape. NLM will prepare the data each day, Monday through Friday, and each day's file of in-process records will be available for ftp in XML format the following business day (Tuesday through Friday, and Monday for Friday's data), generally by 9:00 a.m. ET (the next business day if there is a Federal holiday). Licensees must process the complete week's data in date order (Tuesday through Monday) before processing that week's MEDLINE data, which is also available on Monday but on a different server. A record distributed as in-process during the week may be fully processed by the end of the week, so licensees must take care not to replace the fully processed record from the weekly MEDLINE file with the earlier in-process version. The daily in-process files will remain on the NLM server for ninety days: on the 91st day, the oldest file will be removed as the next file is added. NLM has not yet established the file naming convention.
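The scheduling and ordering rules above can be sketched as follows. `available_on` computes when a day's in-process file should appear (the next weekday that is not a holiday), and the two `apply_*` functions apply the daily files in date order before the weekly MEDLINE file, with a guard so that an in-process copy never overwrites a fully processed record. The status labels, the `(file_date, {UI: record})` layout, and the caller-supplied holiday set are assumptions for illustration, not NLM's file structure.

```python
from datetime import date, timedelta

IN_PROCESS = "in-process"
COMPLETED = "medline"  # fully processed record from a weekly MEDLINE file


def available_on(prepared, holidays=frozenset()):
    """First business day after `prepared` (Mon-Fri): when its in-process file should appear."""
    day = prepared + timedelta(days=1)
    while day.weekday() >= 5 or day in holidays:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def apply_in_process(db, daily_files):
    """Apply (file_date, {UI: record}) files in date order; return how many were skipped."""
    skipped = 0
    for _, records in sorted(daily_files, key=lambda f: f[0]):
        for ui, record in records.items():
            status = db.get(ui, (None, None))[0]
            if status == COMPLETED:
                # Never let an in-process copy overwrite a fully processed record.
                skipped += 1
                continue
            db[ui] = (IN_PROCESS, record)
    return skipped


def apply_weekly_medline(db, records):
    """Weekly MEDLINE records always win over in-process versions."""
    for ui, record in records.items():
        db[ui] = (COMPLETED, record)


# Friday 14 January 2000 data; Monday 17 January 2000 was a Federal holiday.
print(available_on(date(2000, 1, 14)))                       # 2000-01-17
print(available_on(date(2000, 1, 14), {date(2000, 1, 17)}))  # 2000-01-18

db = {}
week = [(date(2000, 1, 12), {"A": "A v2"}), (date(2000, 1, 11), {"A": "A v1", "B": "B v1"})]
apply_in_process(db, week)  # oldest file first, so "A v2" ends up stored
apply_weekly_medline(db, {"B": "B final"})
late = [(date(2000, 1, 18), {"B": "B v1"})]
print(apply_in_process(db, late))  # 1 (stale in-process copy of B skipped)
print(db)  # {'A': ('in-process', 'A v2'), 'B': ('medline', 'B final')}
```

The guard is a second line of defense: processing in the prescribed order already lets the weekly file win, but the status check also protects against a file that is reprocessed late or out of order.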

4. MEDLINE reload data for 2000:
MEDLINE reload data distributed this winter will be in the legacy EURF on tape and will not be available in XML format via ftp. As in the past, two options will be available: a complete file replacement (both changed and unchanged records) and changed records only. The complete file replacement will include new data added since the last MEDLINE update in October; this new data will also be provided in a separate file for those who need it. NLM will use the same tape media as in the past: 9-track tapes and 3480 tape cartridges. There will be a fee for these data, modeled on the fees for older data leased in 1999. Licensees may elect to receive data beginning with any of the following years of publication: 1966, 1975, 1980, 1985, 1990, 1994, 1997.
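Because the reload is offered only from fixed starting years, a licensee who needs coverage back to a particular year must pick the latest offered year at or before it. The sketch below encodes that choice with the years listed above, and contrasts the two reload options on a hypothetical local database keyed by UI. It assumes the local copy spans the same years as the reload; the dict layout is illustrative only.

```python
import bisect

# Starting years of publication offered for the 2000 reload (from the announcement).
RELOAD_START_YEARS = (1966, 1975, 1980, 1985, 1990, 1994, 1997)


def reload_start_year(earliest_needed):
    """Latest offered starting year that still covers `earliest_needed`."""
    i = bisect.bisect_right(RELOAD_START_YEARS, earliest_needed)
    if i == 0:
        raise ValueError(f"no reload option starts at or before {earliest_needed}")
    return RELOAD_START_YEARS[i - 1]


def apply_reload(db, reload_records, complete):
    """Return the local database after a reload (hypothetical dict keyed by UI).

    complete=True: the reload holds changed and unchanged records, so it
    replaces the local copy outright. complete=False: changed records only,
    which overwrite matching UIs and leave everything else alone.
    """
    if complete:
        return dict(reload_records)
    updated = dict(db)
    updated.update(reload_records)
    return updated


print(reload_start_year(1978))  # 1975
print(reload_start_year(1999))  # 1997
old = {"1": "a", "2": "b"}
print(apply_reload(old, {"2": "b2"}, complete=False))  # {'1': 'a', '2': 'b2'}
print(apply_reload(old, {"2": "b2"}, complete=True))   # {'2': 'b2'}
```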

5. Data other than MEDLINE:
In the coming months, other NLM data will become available in the new format as NLM completes its transition from the legacy system. Tape production for leased databases will continue through May 2000 and will cease on May 31, 2000 for all databases. At that time, NLM may begin a period in which no data are added to some databases, which therefore cannot be distributed to licensees, because NLM may not yet have completed their transition to the new data creation and maintenance system. Details will follow as schedules are refined. NLM will not charge for distribution of data for other databases during 2000.

6. New hard media:
NLM expects to use DLT tapes to distribute large amounts of data (such as complete MEDLINE shipments for new licensees) sometime during 2000. The MEDLINE reload data for the following year (approximately December 2000) will be in XML format on DLT tapes.

7. Documentation:
The following is available for your review on NLM's Web site:

  1. The current version of the NLM MEDLINE DTD
    (http://www.nlm.nih.gov/databases/dtd/nlmmedline.dtd)

  2. The current version of the NLMCommon DTD
    (http://www.nlm.nih.gov/databases/dtd/nlmcommon.dtd)

  3. Sample MEDLINE records in new XML format
    (http://www.nlm.nih.gov/databases/dtd/medline_sample.xml)

    NOTE: Please use Internet Explorer 5 to open (not view) the XML sample records file so that the attribute default values will be displayed. If you use another browser, or view the file in Internet Explorer 5 rather than open it, you will see the input version, which does not show the attribute default values.

  4. The same sample MEDLINE records in ELHILL Unit Record Format for comparison
    (http://www.nlm.nih.gov/databases/dtd/medline_eurf_sample.html)

  5. A document on NLM's treatment of diacritics
    (http://www.nlm.nih.gov/databases/dtd/medline_characters.html)

  6. A conversion table to assist in converting from ELHILL Unit Record Format to XML format
    (http://www.nlm.nih.gov/databases/dtd/conversiontables.html)
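To get a feel for the XML format, a licensee can read the sample records with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a small made-up sample. The element names (`MedlineCitationSet`, `MedlineCitation`, `MedlineID`, `PMID`) and the `Owner` attribute with default `"NLM"` are assumptions loosely modeled on the MEDLINE DTD and should be checked against nlmmedline.dtd. The note about attribute defaults applies here too: ElementTree does not load an external DTD, so defaults declared there do not appear in the parsed tree, and the code supplies one explicitly.

```python
import xml.etree.ElementTree as ET

# Made-up two-record sample. Element and attribute names are assumptions
# modeled loosely on the MEDLINE DTD; check them against nlmmedline.dtd.
SAMPLE = """<MedlineCitationSet>
  <MedlineCitation Owner="NLM">
    <MedlineID>99000001</MedlineID>
    <PMID>10000001</PMID>
  </MedlineCitation>
  <MedlineCitation>
    <MedlineID>99000002</MedlineID>
    <PMID>10000002</PMID>
  </MedlineCitation>
</MedlineCitationSet>"""


def citation_ids(xml_text, owner_default="NLM"):
    """List (MedlineID, PMID, Owner) for each citation in an XML string.

    ElementTree does not load an external DTD, so attribute defaults
    declared there never appear; owner_default stands in for one
    (a placeholder value, not read from the DTD).
    """
    root = ET.fromstring(xml_text)
    return [
        (cit.findtext("MedlineID"), cit.findtext("PMID"), cit.get("Owner", owner_default))
        for cit in root.iter("MedlineCitation")
    ]


for row in citation_ids(SAMPLE):
    print(row)
# ('99000001', '10000001', 'NLM')
# ('99000002', '10000002', 'NLM')
```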

First published: 12 October 2006
Last updated: 12 October 2006
Date Archived: 07 February 2007
Permanence level: Permanent: Stable Content