
Sample NLM® Data

INSTRUCTIONS FOR FTP OF SAMPLE RECORDS

FTP to NLM's anonymous FTP server: ftp://ftp.nlm.nih.gov/nlmdata/sample/
(log in as an anonymous user, which is free of charge; use your e-mail address as the password)
You will see a directory for each NLM database. Change to the directory you want and download the desired files.
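The steps above can be scripted. Below is a minimal sketch that uses Python's standard ftplib module. It assumes the server still accepts anonymous FTP at this address. `split_ftp_url` and `fetch_sample_dir` are hypothetical helper names. The database directory name is a parameter because this page does not list the directory names.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlparse


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp:// URL: {url!r}")
    return parts.hostname, parts.path or "/"


def fetch_sample_dir(url: str, subdir: str, email: str, dest: Path) -> list[Path]:
    """Log in anonymously, change into url/subdir, and download every file listed there."""
    host, base = split_ftp_url(url)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    with FTP(host) as ftp:
        # Anonymous login: user "anonymous", e-mail address as the password.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(base.rstrip("/") + "/" + subdir)
        for name in ftp.nlst():
            target = dest / Path(name).name
            with open(target, "wb") as fh:
                # Binary transfer so .gz and .zip files arrive intact.
                ftp.retrbinary(f"RETR {name}", fh.write)
            saved.append(target)
    return saved
```

For example, `fetch_sample_dir("ftp://ftp.nlm.nih.gov/nlmdata/sample/", "<directory>", "you@example.org", Path("samples"))` would download one database's files. Replace `<directory>` with a name from the server listing. The sketch assumes the directory contains only files, because `RETR` fails on a subdirectory entry.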

  1. MEDLINE®/PubMed® (includes approximately 98% of all records in PubMed)

    NLM distributes MEDLINE/PubMed data in XML format.


    2014 Production Year Data

    Sample data using the current NLMMedlineCitationSet DTD are available:

    1. A small sample file of representative records covering each of the five status categories of records distributed to MEDLINE/PubMed licensees (i.e., MEDLINE, In-Data-Review, In-process, PubMed-not-MEDLINE, and OLDMEDLINE).
       
    2. Eight large sample files, each containing 30,000 records and provided in both .gz and .zip format (see the access instructions at the top of this page). These files contain records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE statuses.
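    The following sketch reads one of these samples with Python's standard library and tallies records by status. It assumes each record is a `MedlineCitation` element that carries a `Status` attribute; the data elements documentation on this page gives the authoritative definitions. It also assumes each .zip archive holds a single XML file. `count_statuses` is a hypothetical helper name. The file is streamed with `iterparse`, so each record's contents can be discarded once it has been counted.

```python
import gzip
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def _tally(fh) -> Counter:
    """Count MedlineCitation elements by their Status attribute."""
    counts = Counter()
    # iterparse streams the XML, reporting each element when its end tag is read.
    for _event, elem in ET.iterparse(fh, events=("end",)):
        if elem.tag == "MedlineCitation":
            counts[elem.get("Status", "(no Status)")] += 1
            elem.clear()  # discard the counted record's children and attributes
    return counts


def count_statuses(path: str) -> Counter:
    """Tally record statuses in a .xml, .gz, or .zip sample file."""
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            # Assumption: each .zip archive contains exactly one XML file.
            with zf.open(zf.namelist()[0]) as fh:
                return _tally(fh)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        return _tally(fh)
```

    The .gz and .zip copies hold the same records, so either one should give the same counts.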

    Note that maintained (revised) versions of any of these sample records may be distributed to licensees during the year.



    Documentation

    A document describing the MEDLINE/PubMed XML data elements, including definitions of the record status categories, is available at http://www.nlm.nih.gov/bsd/licensee/data_elements_doc.html.


  2. CCRIS, GENE-TOX and HSDB® Subset
    Sample CCRIS, GENE-TOX and HSDB Subset data in an abbreviated XML format are available for FTP. See the instructions at the top of this page for obtaining the abbreviated DTDs, sample records in XML format, and two documentation files for each database from NLM's FTP server. The two documentation files are a .readme file that defines the elements using their legacy-format names, and a conversion table that maps legacy-format element names to the new XML element names.
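    Here is a minimal Python sketch of how such a conversion table might be applied to records keyed by legacy names. This page does not describe the table's actual layout, so the two-column, tab-separated format assumed below is hypothetical. The helper names and any element names used with them are also hypothetical.

```python
import csv
import io


def load_conversion_table(text: str) -> dict[str, str]:
    """Parse a legacy-name -> XML-name table.

    Assumption (hypothetical layout): one mapping per line, two tab-separated
    columns with the legacy name first; blank lines, short rows, and lines
    starting with '#' are skipped.
    """
    table = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        table[row[0].strip()] = row[1].strip()
    return table


def rename_fields(record: dict[str, str], table: dict[str, str]) -> dict[str, str]:
    """Return a copy of record with legacy keys replaced by XML element names.

    Keys missing from the table are kept unchanged, so no data is silently dropped.
    """
    return {table.get(key, key): value for key, value in record.items()}
```

    Keeping unmapped keys makes gaps in the table visible in the output rather than losing fields.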
     
  3. TOXLINE® Subset
    Sample TOXLINE Subset data in XML format are available for FTP. See the instructions at the top of this page for obtaining sample records and DTDs from NLM's FTP server. Multiple DTDs and sample files are available for TOXLINE Subset: toxspec.dtd defines the XML for the entire TOXLINE Subset, and archival.dtd defines the XML for the archival subfiles only. (Note that licensees must have special arrangements with BIOSIS and IPA before NLM will distribute their data.) Separate DTDs and sample files are also provided for each individual subfile of the database. Updates for the various subfiles that make up this database, if available, will be placed on the NLM server for licensees at the end of each month. Updates will arrive irregularly, because NLM depends on outside suppliers whose schedules are not fixed. Each update file is a complete replacement for that specific subfile.
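    Because each update file replaces a whole subfile, a local copy should be swapped out in full rather than merged record by record. The following minimal Python sketch shows that approach. It assumes a hypothetical local layout with one `<subfile>.xml` file per subfile in a single directory. `replace_subfile` is an illustrative name and is not part of any NLM tool.

```python
import os
import shutil
import tempfile
from pathlib import Path


def replace_subfile(store: Path, subfile: str, update_file: Path) -> Path:
    """Install update_file as the complete new copy of one subfile.

    The whole subfile is swapped, never merged. The update is first copied to a
    temporary file in the same directory, then os.replace() moves it over the
    old copy in one step (atomic on the same filesystem), so readers never see
    a half-written subfile.
    """
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{subfile}.xml"  # hypothetical naming scheme
    fd, tmp = tempfile.mkstemp(dir=store, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(update_file, "rb") as src:
            shutil.copyfileobj(src, out)
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)  # clean up the partial copy, keep the old subfile
        raise
    return target
```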
     
  4. ChemIDplus Subset and DIRLINE®
    Sample ChemIDplus and DIRLINE data in XML format are available for FTP. See instructions at the top of this page for obtaining the DTDs and sample records in XML format from NLM's FTP server. Note that licensees must contact U.S. Pharmacopeia Convention, Inc. (USP), for possible special arrangements before NLM will distribute ChemIDplus.
     
  5. Catfile, CatfilePlus, and Serfile
    Catfile is available in MARC 21 format only; CatfilePlus and Serfile are also available in XML format. Sample files in both MARC 21 and XML format are available; see the access instructions at the top of this page.

    CatfilePlus in XML and Serfile in XML are defined by the 2014 NLMCatalogRecord DTD.

    Data element descriptions applicable to CatfilePlus in XML and Serfile in XML are available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_element_desc2.html. A description of attributes for these elements is available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_attributevalues_alpha2.html.

    General information on the MARC 21 record structure is available from the Library of Congress at http://lcweb.loc.gov/marc/marcdocz.html.
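    To illustrate that record structure, here is a minimal Python sketch that walks a MARC 21 (ISO 2709) record. A record consists of a 24-byte leader and a directory of 12-byte entries, each holding a 3-byte tag, a 4-byte field length, and a 5-byte starting position. The variable fields follow, each ending with a field terminator (0x1E), and subfields within them begin with a delimiter (0x1F). The sketch assumes UTF-8 encoded records (leader position 09 = "a"). MARC-8 records would need a separate converter, and any undecodable bytes are replaced here. The function names are hypothetical.

```python
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def iter_records(blob: bytes):
    """Split a file of concatenated MARC 21 records using each leader's record length."""
    pos = 0
    while pos < len(blob):
        length = int(blob[pos:pos + 5].decode("ascii"))  # leader/00-04
        if length < 24:
            raise ValueError(f"implausible record length {length} at byte {pos}")
        yield blob[pos:pos + length]
        pos += length


def parse_marc21(record: bytes) -> list[tuple]:
    """Return [(tag, value), ...] for one record.

    Control fields (tags 00X) give a str value; data fields give
    (indicators, [(subfield_code, text), ...]).
    """
    leader = record[:24]
    if int(leader[0:5].decode("ascii")) != len(record):
        raise ValueError("record length in leader does not match data")
    base = int(leader[12:17].decode("ascii"))  # leader/12-16: base address of data
    directory = record[24:base - 1]  # the byte before base is the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))  # offset relative to base address
        data = record[base + start:base + start + length]
        if data.endswith(FIELD_TERMINATOR):
            data = data[:-1]
        if tag.startswith("00"):
            # Control field: no indicators or subfields.
            fields.append((tag, data.decode("utf-8", errors="replace")))
        else:
            indicators = data[:2].decode("utf-8", errors="replace")
            subfields = [
                (chunk[:1].decode("ascii", errors="replace"),
                 chunk[1:].decode("utf-8", errors="replace"))
                for chunk in data[2:].split(SUBFIELD_DELIMITER)
                if chunk
            ]
            fields.append((tag, (indicators, subfields)))
    return fields
```

    Reading each record by the length stored in its leader, rather than scanning for the record terminator (0x1D), follows the way the format itself delimits records.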