Announcements to NLM Data Licensees: Year 2009
(11/19/09) CatfilePlus XML and Serfile XML: Minor Revision to 2010 DTD; 2010 Baseline Files Available
(11/12/09) Renew License to Continue Access to 2010 Data
(11/09/09) Server Downtime; License Renewal; For MEDLINE/PubMed Licensees: Sample Records, Special PMID File, and December File Schedule Changes
(10/22/09) New DART File for TOXLINE® Subset Licensees
(09/17/09) 2010 DTD and XML Changes; File Distribution Schedule Changes [Revised 11/17/09]
(06/08/09) ChemIDplus Subset Update File Available
(02/02/09) Catfile, CatfilePlus, and Serfile Licensees
(12/19/08) DOCTYPE Line in 2009 MEDLINE/PubMed Baseline Files
(12/15/08) 2009 MEDLINE/PubMed Baseline Data
2008 Announcements
CatfilePlus XML and Serfile XML: Minor Revision to 2010 DTD; 2010 Baseline Files Available
November 19, 2009
- Minor Revision to 2010 NLMCatalogRecordSet DTD
The ForeName element in the 2010 NLMCatalogRecordSet DTD is now marked optional in the Author, Investigator, and PersonalNameSubject fields.
- 2010 Baseline Files Available
The updated XML base files for CatfilePlus and Serfile are now available on the NLM ftp server. CatfilePlus is in 4 parts, named “catplusbase1of4.2010.xml”, “catplusbase2of4.2010.xml”, etc. The Serfile base file is complete in a single file, named “serfilebase.2010.xml”. The baseline files contain all records through November 8, 2009 and should be used to completely replace all records previously distributed. The first XML update files for CatfilePlus and Serfile are expected to be available December 2, 2009.
- License Renewal
If you entered into the current License Agreement for NLM Data on or before August 31, 2009 and have not yet renewed your license, please see the renewal instructions in the November 12, 2009 announcement. The renewal system is available through December 9, 2009. On December 14, 2009 NLM will cancel licenses of those who do not renew.
Renew License to Continue Access to 2010 Data
November 12, 2009
RENEW LICENSE FOR 2010
Licensees who entered into the current License Agreement for NLM Data on or before August 31, 2009 need to renew their license no later than December 9, 2009 to be retained as licensees and to have access to the 2010 data files. An e-mail announcing access to the renewal instructions was sent to those who need to renew. If you entered into the current License Agreement for NLM Data on September 1, 2009 or later you do not need to renew for 2010. In this case, you were not sent the e-mail and can ignore the instructions below; your license remains in effect, and access to the 2010 data files will continue without interruption.
An online system is used to renew or to cancel your existing license. The system will be in operation through 5:00pm EST Wednesday December 9, 2009. If you do not renew by then you will need to submit a new license request to again license the NLM data.
RENEWAL INSTRUCTIONS FOR LICENSEES:
- Go to the renewal system Web page no later than 5:00pm EST December 9: https://wwwcf.nlm.nih.gov/nlm_licensee/renewal/index.cfm.
- Sign into the renewal system. Enter your license code and the licensee’s personal name exactly as shown in the individual e-mail sent to each licensee. Sign in will not work if the secondary representative’s name is entered.
- Follow the prompts to renew or cancel your license.
If you elect to renew, you will be asked to again accept the terms of the license agreement and to review your license profile before electing to renew with or without changes to your profile information. Changes to e-mail address, selecting additional data to lease, or changing an IP address used to get to the data files will take effect immediately. An e-mail confirmation of the renewal will be sent to the license representatives.
If you elect to cancel your license, you will no longer have access to licensed data files on December 14, 2009 and your license will be cancelled on that date. An e-mail confirmation of the cancellation will be sent to the license representatives. See Section 13 E of the License for required actions upon termination of the license. Section 6 is also pertinent if you have redistributed data received under this license, or data derived from the NLM-supplied data.
On December 14, 2009 NLM will cancel licenses of those who have not renewed or cancelled on their own by December 9, 2009. There will be no grace period after December 9 as the renewal system will no longer be operational. Licensees who intended to renew but did not do so on time will need to initiate a new license request.
Server Downtime; License Renewal; For MEDLINE/PubMed Licensees: Sample Records, Special PMID File, and December File Schedule Changes
November 9, 2009
Topics covered:
- Leased data files not available November 14, 2009
- Renewing your license for access to 2010 data
- For MEDLINE/PubMed Licensees: Medline/PubMed sample records using 2010 DTD
- For MEDLINE/PubMed Licensees: Special text file of In-Process and In-Data-Review status records
- For MEDLINE/PubMed Licensees: December update file information
- Reminders
- LEASED DATA FILES NOT AVAILABLE NOVEMBER 14, 2009
In order to connect a new electrical infrastructure for the NLM data center, an electrical power shutdown is scheduled at NLM for Saturday November 14, 2009. Leased data files will not be accessible that day from possibly as early as 4am EST when the system-wide shutdown begins to possibly as late as 8pm EST when all systems are expected to be fully restored. The downtime for access to leased data files will likely be shorter than that total timeframe.
- RENEWING YOUR LICENSE FOR ACCESS TO 2010 DATA
Licensees will need to renew their License Agreement for NLM Data in order to have access to 2010 data if they entered into their current license on August 31, 2009 or earlier. An e-mail announcing access to the renewal instructions will be sent within a week to those who need to renew. NLM will terminate the licenses of those who need to renew but do not; accordingly, their access to the 2010 data will be denied. If you entered into the current License Agreement for NLM Data on September 1, 2009 or later you do not need to renew for 2010. In this case, you will not be sent the e-mail, your license will remain in effect, and you will have access to 2010 data when it becomes available.
- FOR MEDLINE/PUBMED LICENSEES: MEDLINE/PUBMED SAMPLE RECORDS USING 2010 DTD
A small file containing 140 representative records processed using the 2010 MedlineCitationSet DTD is available. The change in the XML this year is due to the reorganization of CommentsCorrections in the 2010 DTD, as announced on September 17, 2009. The NameID element introduced in the 2010 MedlineCitationSet DTD remains in the DTD although it is now not expected to be used in the 2010 production year.
- FOR MEDLINE/PUBMED LICENSEES: SPECIAL TEXT FILE OF IN-PROCESS AND IN-DATA-REVIEW STATUS RECORDS
As was the case last year, a special text file will be available when the baseline files become available, or shortly thereafter. It contains the PMIDs of records in MedlineCitation Status = In-Process and MedlineCitation Status = In-Data-Review that have been retained in the 2010 version of PubMed at the time the 2010 baseline files are loaded and that are not exported to licensees in the early update files. These records should eventually be exported in update files as completed records in MedlineCitation Status = MEDLINE or MedlineCitation Status = PubMed-not-MEDLINE or as deleted PMIDs in DeleteCitationSet. Licensees who wish to create a database as close as possible to the current record content in PubMed when the 2010 system is expected to be up at NLM on December 14 may wish to include these records immediately after they load the 2010 baseline files. Further instructions will be provided in the forthcoming 2010 baseline files documentation and access instructions. Descriptions of all MedlineCitation Statuses are available.
- FOR MEDLINE/PUBMED LICENSEES: DECEMBER UPDATE FILE INFORMATION
Item 4 of the September 17, 2009 announcement contains information about MEDLINE/PubMed update file schedule changes in November. In December, the last update file containing records in In-Process and In-Data-Review status for the 2009 production year will be available on Thursday December 10, 2009. Update files will not be available Friday December 11 and Saturday December 12, 2009. The 2010 baseline files are expected to be available December 14, 2009, at which time routine daily update files will resume.
- REMINDERS
- See the September 17, 2009 announcement for important information about 2010 DTDs and file distribution schedule changes. Visit the NLM information page for licensees and follow the links on subsequent pages for data element descriptions, update charts, announcements to licensees, and other documentation/resources relating to the data you lease from NLM.
- Consider subscribing to one or more of NLM's e-mail alerts. These alerts services cover all NLM products, services, and programs and are different from the occasional e-mails sent to licensees containing technical and administrative information related to leasing NLM databases.
- The NLM Technical Bulletin includes information about searching the data you lease on NLM's systems including PubMed, Gateway, TOXNET, NLM Catalog, and LocatorPlus. Items are published as they are completed and are then compiled into bi-monthly issues. TB material supplements the data content and format documentation available to licensees.
New DART File for TOXLINE Subset Licensees
October 22, 2009
A new DART subfile was placed on the NLM server on October 8, 2009. This file replaces the most recent file which had been on the server for licensees since June 29, 2006. Although the current file has just been made available to TOXLINE Subset licensees, it covers data last updated in June 2008. NLM expects there will be no further updates to DART data.
Please be reminded that access instructions for all TOXLINE Subset data files are at http://www.nlm.nih.gov/bsd/licensee/access/toxsubset.html.
Contact nlmdatadistrib@nlm.nih.gov if you have further questions about accessing TOXLINE Subset data; contact custserv@nlm.nih.gov if you have data content questions.
2010 DTD and XML Changes; File Distribution Schedule Changes
September 17, 2009 [Revised 11/17/09]
Topics covered:
- DTDs for the NLM 2010 Production Year
- XML Changes for the NLM 2010 Production Year
- Forthcoming 2010 Baseline and Update Files
- Schedule Changes for Daily MEDLINE/PubMed Update Distributions in November
- 2010 MeSH® Vocabulary Available
- Continuing to Lease NLM Data in 2010
- Information Page for Licensees
- License Code Reminder
- DTDS FOR THE NLM 2010 PRODUCTION YEAR
Single "standalone" or "flat" DTDs will be used for MEDLINE/PubMed, CatfilePlus in XML, and Serfile in XML baseline and update files during the 2010 production year. The standalone DTDs replace the coordinated suite of DTDs in use through the 2009 data production year (i.e., NLMMedline, NLMMedlineCitation, NLMCatalogRecord, NLMSharedCatCit, and NLMCommon). The DTD named NLM MedlineCitationSet will be used for MEDLINE/PubMed data, and the DTD named NLM CatalogRecordSet will be used for both CatfilePlus and Serfile XML data.
Pertinent elements and attributes from applicable DTDs in the current DTD suite were extracted for the two standalone DTDs. Further, extraneous objects and unused and unmapped tags are not present; elements and attributes are explicitly defined; external entities have been merged; and internal DTD entity references are not used in the standalone DTDs.
NLM MedlineCitationSet and NLM CatalogRecordSet, both dated January 1, 2010, are available from links on http://www.nlm.nih.gov/databases/dtd/. The MEDLINE XML Element Descriptions and the CatfilePlus and Serfile XML Element Descriptions will be edited at a later date to reflect these changes for 2010.
A list of the elements and attributes that were eliminated from the coordinated suite of DTDs to create the new standalone NLM MedlineCitationSet and NLM CatalogRecordSet appears in the Revision Notes section near the top of each new DTD. The following highlights and supplements the Revision Notes sections:
- NLM MedlineCitationSet DTD used for MEDLINE/PubMed XML data files:
- From the suite of DTDs used in previous years, NLMMedline DTD and NLMMedlineCitation DTD were used as the base, and external entities from NLMCommon and NLMSharedCatCit were merged into the new DTD.
- The CommentsCorrections group of elements was reorganized. Through the 2009 data year, the publications cited in CommentsCorrections were defined as elements. For the 2010 DTD, they are defined as valid values to the RefType attribute, and a new attribute value, Cites, was created. Cites will contain PMIDs and source data for items in the bibliography or list of references at the end of an article that is deposited in PubMed Central (PMC). There is no RefType attribute corresponding to Cites for PMIDs and source data of articles in which a paper is cited.
In the implementation for 2010, RefType = “Cites” will contain only PMIDs and source data for citations where an actual PMID for the cited article exists in the NLM Data Creation and Maintenance System (DCMS). It is therefore possible for a citation to be present in the article’s list of references and yet the PMID is not included in the Cites list because it is not present in the NLM DCMS. Cites will be present in the baseline files; however, the subsequent update frequency of Cites lists is not yet determined. Again, all Cites data for this initial implementation are coming from articles in PMC.
- NameID element was added to Author and Investigator elements. NameID is a possibly multiply-occurring, optional element permissible within the Author (personal and collective) and Investigator elements. It is intended as a unique identifier associated with the name. The value in the NameID attribute Source designates the organizational authority that established the unique identifier. There is no target date for implementation of this field; it is a placeholder for now.
- NLM CatalogRecordSet DTD used for CatfilePlus XML and Serfile XML data files:
- From the suite of DTDs used in previous years, NLMCatalogRecord DTD and NLMMedlineCitation DTD were used as the base, and external entities from NLMCommon and NLMSharedCatCit were merged into the new DTD.
- BoundWith, Dissertation, and LinkComplexNote were added as new valid values for the NoteType attribute for the GeneralNote element.
- AbsorbedInPart and AbsorbedInPartBy were added as new valid values for the TitleType attribute for the TitleOther element (to coincide with LocatorPlus and MARC21 standards).
- ContentType, MediaType, and CarrierType elements were added to ResourceInfo element. NLM has defined these new elements to encompass the new MARC fields: 336 (Content type), 337 (Media type) and 338 (Carrier type). NLM will be a test site for RDA (Resource Description and Access), the new cataloging rules which are to replace AACR2 (Anglo-American Cataloging Rules, 2nd ed. rev.). The three new elements have no interdependencies and may exist in addition to the original three elements in ResourceInfo.
- CCRIS, ChemID Subset, DIRLINE®, Gene-Tox, HSDB®, and TOXLINE® Subset
There are no known DTD changes expected for 2010.
- XML CHANGES FOR THE NLM 2010 PRODUCTION YEAR
- MEDLINE/PubMed XML Changes
The following is a summary of changes in the XML distribution of the leased MEDLINE/PubMed data:
- ISSNLinking data will be released with 2010 baseline files:
The ISSNLinking element was defined in the DTDs for 2008, but has not been populated and distributed in the XML. The ISSN Centre has now provided NLM with ISSNLinking data which will export with the 2010 baseline files.
- CommentsCorrections elements reorganized:
Publications cited in CommentsCorrections are defined as values in the new RefType attribute, and a new attribute value, Cites, was created for 2010 data.
Example of CommentsCorrections containing a comment about the article and a retraction notice about the article:
2009 DTD
<CommentsCorrectionsList>
<CommentsCorrections>
<CommentIn>
<RefSource>Hum Immunol. 2001 Oct;62(10):1064</RefSource>
<PMID>11600211</PMID>
</CommentIn>
<RetractionIn>
<RefSource>Suciu-Foca N, Lewis R. Hum Immunol. 2001 Oct;62(10):1063</RefSource>
<PMID>11600210</PMID>
</RetractionIn>
</CommentsCorrections>
</CommentsCorrectionsList>
2010 DTD
<CommentsCorrectionsList>
<CommentsCorrections RefType="CommentIn">
<RefSource>Hum Immunol. 2001 Oct;62(10):1064</RefSource>
<PMID>11600211</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="RetractionIn">
<RefSource>Suciu-Foca N, Lewis R. Hum Immunol. 2001 Oct;62(10):1063</RefSource>
<PMID>11600210</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
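The mapping between the two layouts can be sketched in Python. This is only an illustration of the reorganization described above, not an NLM-supplied tool; the function name to_2010_comments_corrections is hypothetical, and the sketch assumes that each 2009 child element name (CommentIn, RetractionIn, and so on) is carried over unchanged as the 2010 RefType value, as in the example.

```python
import xml.etree.ElementTree as ET


def to_2010_comments_corrections(old_list):
    """Return a 2010-style CommentsCorrectionsList built from a 2009-style one.

    Hypothetical helper for illustration only.
    """
    new_list = ET.Element("CommentsCorrectionsList")
    for group in old_list.findall("CommentsCorrections"):
        for ref in group:
            # 2009: the child tag names the reference type.
            # 2010: the same name becomes the RefType attribute value.
            new_cc = ET.SubElement(new_list, "CommentsCorrections", RefType=ref.tag)
            for child in ref:
                copied = ET.SubElement(new_cc, child.tag, dict(child.attrib))
                copied.text = (child.text or "").strip()
    return new_list


OLD_XML = """<CommentsCorrectionsList>
<CommentsCorrections>
<CommentIn>
<RefSource>Hum Immunol. 2001 Oct;62(10):1064</RefSource>
<PMID>11600211</PMID>
</CommentIn>
<RetractionIn>
<RefSource>Suciu-Foca N, Lewis R. Hum Immunol. 2001 Oct;62(10):1063</RefSource>
<PMID>11600210</PMID>
</RetractionIn>
</CommentsCorrections>
</CommentsCorrectionsList>"""

converted = to_2010_comments_corrections(ET.fromstring(OLD_XML))
ref_types = [cc.get("RefType") for cc in converted]
pmids = [cc.findtext("PMID") for cc in converted]
```

Licensees whose loaders were written against the 2009 layout may find a conversion of this kind useful for comparing old and new records during testing.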
Example of new CommentsCorrections RefType attribute = Cites:
Bibliography for paper A (PMID 87654321) contains source information* for paper B with PMID 23456789 which is in the DCMS.
Record for paper A:
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>source of paper B</RefSource>
<PMID>23456789</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
*see Creation of Journal Source at NLM
Note that the MEDLINE/PubMed record for paper B, PMID 23456789, does not contain a CommentsCorrections occurrence containing the PMID or source data of paper A.
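Because the record for the cited paper carries no reverse link, a licensee who wants "cited by" data would need to derive it from the Cites entries. The following Python sketch shows one way this might be done. The function name build_cited_by is hypothetical, and the sample record is a minimal fragment built from the example above rather than a complete, DTD-valid MedlineCitation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_cited_by(citations):
    """Map each cited PMID to the list of PMIDs whose records cite it.

    Hypothetical helper for illustration only.
    """
    cited_by = defaultdict(list)
    for citation in citations:
        citing_pmid = citation.findtext("PMID")  # PMID of the record itself
        for cc in citation.iter("CommentsCorrections"):
            if cc.get("RefType") != "Cites":
                continue
            cited_pmid = cc.findtext("PMID")
            if cited_pmid:
                cited_by[cited_pmid].append(citing_pmid)
    return dict(cited_by)


# Minimal fragment for paper A; most required elements are omitted.
PAPER_A = """<MedlineCitation>
<PMID>87654321</PMID>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>source of paper B</RefSource>
<PMID>23456789</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
</MedlineCitation>"""

cited_by = build_cited_by([ET.fromstring(PAPER_A)])
```

An index built this way covers only citations whose PMIDs are present in the Cites lists, so it is subject to the same coverage limits described for the Cites data itself.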
- New NameID element with its Source attribute:
NameID data will not be present in the baseline files. There is no target date for the implementation of this field. If implementation occurs during the 2010 production year, records for which NameID data become known will be redistributed as revised records in update files after the baseline distribution. NameID data may be associated with the Author (personal and collective) and Investigator elements.
<Author ValidYN="Y">
<LastName>Soon</LastName>
<ForeName>M S</ForeName>
<Initials>MS</Initials>
<NameID Source="NCBI">123456</NameID>
</Author>
<Author ValidYN="Y">
<CollectiveName>International Human Genome Sequencing Consortium</CollectiveName>
<NameID Source="NCBI">123457</NameID>
</Author>
<Investigator ValidYN="Y">
<LastName>Melosh</LastName>
<ForeName>H J</ForeName>
<Initials>HJ</Initials>
<NameID Source="Publisher">123458</NameID>
<Affiliation>U AZ, Tucson</Affiliation>
</Investigator>
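Should NameID data be implemented, a loader could read it roughly as sketched below in Python. The helper names display_name and name_ids are hypothetical, and the Sample wrapper element is not part of the NLM DTD; it is used only so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# <Sample> is a hypothetical wrapper, not an element of the NLM DTD.
SAMPLE = """<Sample>
<Author ValidYN="Y">
<LastName>Soon</LastName>
<ForeName>M S</ForeName>
<Initials>MS</Initials>
<NameID Source="NCBI">123456</NameID>
</Author>
<Author ValidYN="Y">
<CollectiveName>International Human Genome Sequencing Consortium</CollectiveName>
<NameID Source="NCBI">123457</NameID>
</Author>
<Investigator ValidYN="Y">
<LastName>Melosh</LastName>
<ForeName>H J</ForeName>
<Initials>HJ</Initials>
<NameID Source="Publisher">123458</NameID>
<Affiliation>U AZ, Tucson</Affiliation>
</Investigator>
</Sample>"""


def display_name(person):
    """Return the collective name, or 'LastName Initials' for a person."""
    collective = person.findtext("CollectiveName")
    if collective:
        return collective
    parts = (person.findtext("LastName"), person.findtext("Initials"))
    return " ".join(part for part in parts if part)


def name_ids(person):
    """Return (Source, identifier) pairs; NameID is optional and may repeat."""
    return [(n.get("Source"), (n.text or "").strip()) for n in person.findall("NameID")]


root = ET.fromstring(SAMPLE)
identified = {display_name(person): name_ids(person) for person in root}
```

Returning a list rather than a single value reflects that the element is optional and may occur more than once within one Author or Investigator.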
- Author KeyWords:
Use of the existing KeywordList elements and Owner attribute will be expanded to also house keywords which are published in journal articles and are usually assigned by authors. There is no target date for implementation. Author keywords will not be present in the baseline files; if implemented during the 2010 production year, author keyword data will first be found on new records distributed in update files. When it is implemented, the KeywordList attribute Owner will be populated with the value NOTNLM and the value of the Keyword attribute MajorTopicYN will be N.
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">obstructive sleep apnea</Keyword>
<Keyword MajorTopicYN="N">sleep disorder</Keyword>
</KeywordList>
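If author keywords are implemented as described, they could be separated from other keyword lists by checking the Owner value. The Python sketch below is illustrative only: the function name author_keywords is hypothetical, and the first KeywordList in the sample (Owner="NASA" with a made-up keyword) is an assumption added to show the filtering.

```python
import xml.etree.ElementTree as ET


def author_keywords(citation):
    """Return keyword text from KeywordList elements whose Owner is NOTNLM.

    Hypothetical helper for illustration only.
    """
    return [
        (keyword.text or "").strip()
        for keyword_list in citation.iter("KeywordList")
        if keyword_list.get("Owner") == "NOTNLM"
        for keyword in keyword_list.findall("Keyword")
    ]


# Minimal fragment; the NASA-owned list is invented for illustration.
SAMPLE = """<MedlineCitation>
<KeywordList Owner="NASA">
<Keyword MajorTopicYN="N">illustrative non-author keyword</Keyword>
</KeywordList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">obstructive sleep apnea</Keyword>
<Keyword MajorTopicYN="N">sleep disorder</Keyword>
</KeywordList>
</MedlineCitation>"""

keywords = author_keywords(ET.fromstring(SAMPLE))
```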
- CatfilePlus in XML and Serfile in XML Changes
There are no changes in the 2010 XML except that records may include new attribute values and elements defined in the 2010 DTD.
<GeneralNote Owner="NLM" NoteType="Dissertation">Thesis (doctoral) - Universität, Halle-Wittenberg, 2008.</GeneralNote>
<TitleOther Sort="N" Owner="NLM" TitleType="AbsorbedInPartBy">
<TitleAlternate>Journal of the Chemical Society. Perkin transactions II</TitleAlternate>
</TitleOther>
<ResourceInfo>
<TypeOfResource>Visual Materials</TypeOfResource>
<Issuance>monographic</Issuance>
<ResourceUnit>videorecording</ResourceUnit>
<ContentType>moving image</ContentType>
<MediaType>video</MediaType>
<CarrierType>videodisc</CarrierType>
</ResourceInfo>
- There are no known XML changes expected in 2010 for CCRIS, ChemID Subset, DIRLINE, Gene-Tox, HSDB, and TOXLINE Subset.
- FORTHCOMING 2010 BASELINE AND UPDATE FILES
The 2010 baseline and subsequent update files should be used to replace all previously received 2009 production year data for MEDLINE/PubMed, Catfile, CatfilePlus and Serfile data. A complete baseline reload for each database ensures you have the most current and accurate version of all records and is required per Appendix C of the License Agreement for NLM Data.
- MEDLINE/PubMed
After NLM completes its annual database maintenance activities, we expect to release the 2010 baseline files for MEDLINE/PubMed maintained with 2010 MeSH vocabulary and other global changes to renewing licensees in mid-December (on or about December 14).
- CatfilePlus and Serfile in XML
The CatfilePlus and Serfile XML baseline files, also maintained with 2010 MeSH vocabulary and other global changes, are expected on or about November 23, 2009.
- CatfilePlus and Serfile in MARC format
MARC-formatted CatfilePlus and Serfile baseline files maintained with 2010 MeSH vocabulary and other global changes resulting from annual maintenance are expected on or about January 7, 2010. The last monthly files which reflect the 2009 MeSH vocabulary will be dated November 1, 2009. Appropriate LocatorPlus records which are updated as part of annual maintenance will be distributed to licensees in the files dated December 1, 2009. All MeSH headings in records distributed in the December 1, 2009 files will conform to 2010 MeSH.
- Catfile in MARC format
MARC-formatted Catfile baseline files maintained with 2010 MeSH vocabulary and other global changes resulting from annual maintenance are expected on or about January 7, 2010. The last weekly file which will reflect the 2009 MeSH vocabulary is the file dated November 5, 2009. Appropriate LocatorPlus records which are updated as part of annual maintenance will be distributed to Catfile licensees in files beginning November 12, 2009. All MeSH headings in records distributed as of November 12, 2009 will conform to 2010 MeSH.
- Other databases
As routine, all XML files distributed to licensees during 2010 for CCRIS, ChemID Subset, DIRLINE, Gene-Tox, HSDB, and TOXLINE Subset will be complete replacement files. New files, if available, will be on the server for licensees on or about the 28th of the month. Some of these databases are updated irregularly and/or infrequently.
- SCHEDULE CHANGES FOR DAILY MEDLINE/PUBMED UPDATE DISTRIBUTIONS IN NOVEMBER
As is the case each year, in mid-November NLM will suspend distribution of new and revised MEDLINE/PubMed records in MedlineCitation Status = MEDLINE, MedlineCitation Status = PubMed-not-MEDLINE, and MedlineCitation Status = OLDMEDLINE as preparations are made for the new production year. The last records in these three statuses for the current 2009 production year are expected to be available Wednesday, November 18. There will not be an update file on the following Thursday or Friday. Beginning Tuesday, November 24, update files containing only records in In-Process and In-Data-Review statuses will be available until the MEDLINE/PubMed 2010 baseline files are available on or about December 14. [Revised 11/17/09] During this period, the only PMIDs of deleted records that will be exported are those of In-Process status records. Daily update files containing new and maintained records in all statuses and PMIDs of deleted records will commence upon release of the baseline files to licensees.
- 2010 MESH VOCABULARY AVAILABLE
Some licensees use the MeSH Vocabulary with their leased NLM data. The MeSH Browser now has a link to 2010 MeSH. The default year in the MeSH Browser remains 2009 MeSH for now, but the alternate link provides access to 2010 MeSH. The 2010 MeSH files are also available for download after completion of an online Memorandum of Understanding.
- CONTINUING TO LEASE NLM DATA IN 2010
Licensees must confirm continued interest in leasing NLM data during the 2010 production year by following instructions that will be provided in November. Only those who renew will have access to the 2010 data, and licenses of non-responders will be terminated (and accordingly, access to the 2010 data will be denied). Per Section 13E of the License Agreement for NLM Data, licensees who cancel their license or whose license is terminated by NLM are to:
- Discontinue use of any promotional or other materials that refer to such data within thirty (30) days, and
- Cease use of data or information/data derived from data licensed under this License in all applications no later than ninety (90) days. Following that, Licensee must promptly destroy and erase all data obtained under this License as well as any data contained in any derivative applications under Licensee’s control. (This does not apply to Catfile, CatfilePlus, or Serfile data.)
- INFORMATION PAGE FOR LICENSEES
Please visit the NLM information page for licensees. Licensees should follow the links on this page for data element descriptions, update charts, announcements to licensees, and other documentation/resources relating to the data you lease from NLM.
- LICENSE CODE REMINDER
A unique alpha-numeric code including the letters "NLM" is assigned to each license and appears at the top of license-related e-mails sent by NLM. Please keep a record of your license code and reference it when corresponding with NLM about matters relating to your license or leased databases.
ChemIDplus Subset Update File Available
June 8, 2009
To NLM ChemIDplus Subset Licensees:
This is to advise you that a new ChemIDplus Subset file is now on the NLM server for licensees to FTP per access instructions at http://www.nlm.nih.gov/bsd/licensee/access/chemidsubset.html. New files are generally put on the server on the 28th of the month when an updated file becomes available. The current file, however, was put on the server on June 5th.
CAS Registry Numbers which had been missing from the ChemIDplus XML records are now back on the records.
As a reminder, the main NLM Web page for licensees of ChemIDplus is http://www.nlm.nih.gov/bsd/licensee/toxnet.html. Please let us know if you have questions or problems accessing the data file. If you have questions about the data content, please contact Custserv@nlm.nih.gov.
Catfile, CatfilePlus, and Serfile Licensees
February 2, 2009
This notice is being sent separately to both primary and secondary (if available) license representatives. Please see http://www.nlm.nih.gov/bsd/licensee/access/ for links to access instructions and information.
FOR MARC SUBSCRIBERS
The updated MARC base files for Catfile, CatfilePlus, and Serfile are now available on the NLM ftp server. These base files are each complete in a single file. Loading the base files on an annual basis is optional for MARC subscribers. If you have loaded each of the monthly updates, there is no need to reload the base files.
The MARC file containing all the bibliographic records deleted by NLM between January 1, 2008 and December 31, 2008, is also available. There are 5,665 records in this file. Licensees who are new recipients of NLM's MARC bibliographic records in 2009, as well as ongoing licensees who are discarding their pre-2009 records and reloading with the 2009 base files, do NOT need the delete file. The records in this file were removed from NLM's database prior to the pull of the 2009 base files.
If loading new baseline files, you should then load the 2009 update files dated after the date of the base files.
FOR XML SUBSCRIBERS
The updated XML basefiles for CatfilePlus and Serfile have been available on the NLM ftp server since mid-December. The baseline files should be used to completely replace all records previously distributed to continuing licensees. After the new baseline files are loaded, you should then load the 2009 update files. See the XML Update Charts at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_stats_2009.html.
DOCTYPE Line in 2009 MEDLINE/PubMed Baseline Files
December 19, 2008
This message is intended for licensees who downloaded the 2009 MEDLINE/PubMed baseline files before 7:37pm December 17, 2008.
The 2009 MEDLINE/PubMed baseline files became available to licensees on Tuesday December 16, 2008 (see http://www.nlm.nih.gov/bsd/licensee/announce/2009.html#d12_15). It was subsequently discovered that the DOCTYPE line in the baseline data files was incorrect. For those who validate the XML data with the DTD from the DOCTYPE line, the correct DOCTYPE is:
<!DOCTYPE MedlineCitationSet PUBLIC "-//NLM//DTD Medline Citation, 1st January, 2009//EN" "http://www.nlm.nih.gov/databases/dtd/nlmmedline_090101.dtd">
The original baseline files were replaced with new baseline files containing the corrected DOCTYPE line at 7:37pm ET Wednesday December 17. The size of each new baseline file is slightly smaller because of the corrected DOCTYPE; the total file byte size is now 68,645,424,728 bytes. Accordingly, the chart at http://www.nlm.nih.gov/bsd/licensee/2009_stats/baseline_med_filecount.html will be edited within several days.
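Licensees who are unsure which version of a baseline file they hold could inspect the start of the file for the corrected DOCTYPE line. The Python sketch below is a minimal example under stated assumptions: the helper names are hypothetical, the file is assumed to be already decompressed, and the DOCTYPE is assumed to appear within the first few kilobytes.

```python
import re

EXPECTED_ROOT = "MedlineCitationSet"
EXPECTED_PUBLIC_ID = "-//NLM//DTD Medline Citation, 1st January, 2009//EN"
EXPECTED_SYSTEM_ID = "http://www.nlm.nih.gov/databases/dtd/nlmmedline_090101.dtd"

# Matches: <!DOCTYPE root PUBLIC "public id" "system id">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>')


def read_head(path, size=4096):
    """Read the first characters of an uncompressed XML file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read(size)


def has_corrected_doctype(head):
    """Return True if the text contains exactly the corrected DOCTYPE declaration."""
    match = DOCTYPE_RE.search(head)
    expected = (EXPECTED_ROOT, EXPECTED_PUBLIC_ID, EXPECTED_SYSTEM_ID)
    return bool(match) and match.groups() == expected
```

A call such as has_corrected_doctype(read_head(path)) on each downloaded baseline file would flag any file that still carries a different declaration.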
2009 MEDLINE/PubMed Baseline Data
December 15, 2008
- AVAILABILITY OF 2009 MEDLINE/PUBMED BASELINE DATA
I am pleased to inform you that the 2009 MEDLINE/PubMed baseline files which replace all previously distributed MEDLINE/PubMed data are now available for FTP. Licensees have been e-mailed the location of the FTP access instructions and additional information.
- 2009 UPDATE FILES
The first group of 2009 update files and the special PMID list text file (see item 3 below) are also available. Please be sure to read the _notes.txt file that is on the server accompanying the first update file medline09n0594. Update files should be processed after the baseline files in ascending file name numeric sequence (see item 3 below for exception) to ensure that all new records are added and the most current and accurate version of each record is retained. FTP access instructions with additional information are available at http://www.nlm.nih.gov/bsd/licensee/access/medline_pubmed.html.
- ADDITIONAL PMID LIST FILE
**NOTE: This file may not be available until Wednesday Dec. 17, 2008**
A text file containing PMIDs of records in MedlineCitation Status = In-Process and MedlineCitation Status = In-Data-Review that have been retained in the 2009 version of PubMed at the time the 2009 baseline files were loaded and that are not exported to licensees in the first batch of update files is available. These records will eventually be exported in update files as completed records in MedlineCitation Status = MEDLINE or MedlineCitation Status = PubMed-not-MEDLINE or as deleted PMIDs in DeleteCitationSet. Licensees who wish to create a database as close as possible to the current record content in PubMed will want to include these records now.
The file, named SpecialPubMedPMIDList_2009.txt, resides in the update file directory. Licensees may use the Entrez Utilities to download the records using the list of PMIDs.
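As a rough sketch of that approach, the Python below reads PMIDs from text and builds EFetch request URLs in batches, without sending any requests. Several details are assumptions: that the list holds one PMID per line, the base URL shown, the batch size of 200, and the helper names. Consult the Entrez Utilities documentation for current endpoints and usage guidelines before fetching.

```python
from urllib.parse import urlencode

# Assumed E-utilities EFetch endpoint; verify against the current documentation.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_pmid_list(text):
    """Return the PMIDs found in the text, assuming one PMID per line."""
    return [line.strip() for line in text.splitlines() if line.strip().isdigit()]


def build_efetch_urls(pmids, batch_size=200):
    """Build one EFetch URL per batch of PMIDs, requesting PubMed XML."""
    urls = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        query = urlencode({"db": "pubmed", "id": ",".join(batch), "retmode": "xml"})
        urls.append(EFETCH_URL + "?" + query)
    return urls


# Made-up PMIDs for illustration.
example_pmids = parse_pmid_list("1\n2\n\n3\n")
example_urls = build_efetch_urls(example_pmids, batch_size=2)
```

Batching keeps each request URL to a manageable length; the appropriate batch size and request rate are left to the licensee and to NCBI's usage policies.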
*IMPORTANT*: If you elect to add these records to your version of MEDLINE/PubMed, they must be added to your 2009 MEDLINE/PubMed database either 1) immediately after the baseline files and before any update files or, 2) after update files medline09n0594 through medline09n0626 to ensure retaining the most current version of those records as subsequent update files are loaded. Do not add the records identified in SpecialPubMedPMIDList_2009.txt after you have processed medline09n0627 as this may result in retention of an earlier and inaccurate version of the records.
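The ordering rule above can be expressed as a small check in a loading script. The Python sketch below is one possible reading of the rule, not an NLM-supplied procedure: it treats the special PMID list as safe to load only when no update files have been processed yet, or when exactly medline09n0594 through medline09n0626 have been processed. The helper names are hypothetical.

```python
import re

# Update file names begin with medline09n followed by a four-digit sequence number.
UPDATE_NAME = re.compile(r"^medline09n(\d{4})")


def sequence_number(file_name):
    """Return the numeric sequence of a 2009 update file name."""
    match = UPDATE_NAME.match(file_name)
    if match is None:
        raise ValueError("not a 2009 update file name: " + file_name)
    return int(match.group(1))


def in_load_order(file_names):
    """Sort update files into ascending numeric sequence for processing."""
    return sorted(file_names, key=sequence_number)


def may_load_special_pmid_list(processed_update_files):
    """Return True if the special PMID list records may be added at this point."""
    if not processed_update_files:
        return True  # option 1: right after the baseline, before any update file
    numbers = sorted(sequence_number(name) for name in processed_update_files)
    return numbers == list(range(594, 627))  # option 2: after 0594 through 0626


first_batch = ["medline09n%04d.xml" % n for n in range(594, 627)]
```

With first_batch as defined, may_load_special_pmid_list(first_batch) is true, while adding medline09n0627 to the processed list makes it false.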
- 2008 MEDLINE/PUBMED FILES TO MOVE TO NEW DIRECTORY
The last 2008 update file, medline08n0876, was placed on the server for licensees December 11, 2008. The 2008 update files have moved to ftp://ftp.nlm.nih.gov/nlmdata/.medlease2008 where they will remain for several weeks for licensees who need access to them while working with the 2009 baseline files.
- DOCUMENTATION
Documentation for the MEDLINE/PubMed baseline database is available from links in the Data Availability and Maintenance section of NLM’s information page for MEDLINE/PubMed licensees at http://www.nlm.nih.gov/bsd/licensee/medpmmenu.html. The direct URLs to those pages are http://www.nlm.nih.gov/bsd/licensee/2009_stats/baseline_doc.html and http://www.nlm.nih.gov/bsd/licensee/2009_stats/baseline_med_filecount.html. Also see the MEDLINE/PubMed Maintenance Overview at http://www.nlm.nih.gov/bsd/licensee/medline_maintenance.html for information about and points to consider for processing update files.
- MEDLINE/PUBMED BASELINE REPOSITORY (MBR)
The 2009 baseline data will be included at a later date in the MEDLINE/PubMed Baseline Repository (MBR) resources at http://mbr.nlm.nih.gov/. If you wish to search the baseline data via the MBR Query Tool, be sure to use the same IP address registered with NLM for access to MEDLINE/PubMed from NLM’s FTP server.
Please do not hesitate to contact me with any questions that arise. I look forward to working with you during 2009 and send best wishes for peace and good health during the New Year.