PubMed Update: FTP Improvements Available for Testing. NLM Tech Bull. 2025 May-Jun;(464):e5.

2025 June 06 [posted]

In response to user feedback, the National Center for Biotechnology Information (NCBI) is announcing an upcoming update to the data provided through PubMed's public FTP servers.

Starting with the annual baseline release in December 2025, PubMed data distributed via FTP will be generated using the same technology as the PubMed website and E-utilities API. As a result, the FTP data will match the data provided by the PubMed website and API.





Test Site Available Now

To help users prepare for this transition, a test FTP site is available now with daily update files generated by the new process. Please note that the test files may differ from the current FTP update files (see Expected Data Differences below).

A complete baseline will not be available on the test site; only update files will be posted.

We encourage users to review the sample files, test their workflows, and provide feedback via the National Library of Medicine (NLM) Help Desk.

Access the Test FTP Site.





Expected Data Differences

FTP users will notice some differences in data formatting and content. Most of these changes improve standardization and data richness without affecting the underlying meaning of the data. These differences include:

Availability of Additional Data – PubMed data may be augmented with additional data from PubMed Central (PMC) when available. This additional data is already integrated into the PubMed website and API and will now be included in the FTP files. This change will ensure that all users have access to consistent, up-to-date PubMed data.

Examples include:

More robust reference data supplied by PMC.

Inclusion of Conflict of Interest (COI) statements and additional <OtherAbstract> content from PMC.

Inclusion of PMC Release Date even after the release date has passed.

<Pagination> will include <StartPage> and <EndPage> in addition to <MedlinePgn> when available from PMC.

Because of this augmented data, daily update files on the test site may contain additional PMIDs that are not included in the current FTP files, because the current files are not updated with the PMC data.
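As an illustration of the <Pagination> change described above, the following minimal Python sketch reads page information in a way that works whether or not <StartPage> and <EndPage> are present. The function name page_range and the two XML fragments are hypothetical and are not taken from real PubMed files; only the element names come from this announcement.

```python
import xml.etree.ElementTree as ET


def page_range(article_xml: str) -> dict:
    """Return page fields from <Pagination>; missing fields come back as None."""
    root = ET.fromstring(article_xml)
    pagination = root.find(".//Pagination")
    if pagination is None:
        return {"start": None, "end": None, "medline_pgn": None}
    return {
        # Present only when supplied (for example, from PMC data)
        "start": pagination.findtext("StartPage"),
        "end": pagination.findtext("EndPage"),
        # Present in both the older and the newer style of record
        "medline_pgn": pagination.findtext("MedlinePgn"),
    }


# Hypothetical fragments for illustration only
OLD_STYLE = "<Article><Pagination><MedlinePgn>101-9</MedlinePgn></Pagination></Article>"
NEW_STYLE = (
    "<Article><Pagination><StartPage>101</StartPage><EndPage>109</EndPage>"
    "<MedlinePgn>101-9</MedlinePgn></Pagination></Article>"
)

old_pages = page_range(OLD_STYLE)
new_pages = page_range(NEW_STYLE)
```

A workflow that treats the new child elements as optional in this way should accept files from both the current and the test FTP sites.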

Formatting, Normalization, and Ordering – Some data may be formatted or ordered differently from the current FTP files. The structure of files and meaning of the data are not changed. Examples include:

Differing order of attributes or child elements.

Differences in capitalization of labels and some text.

Single-digit date values will not be preceded by a zero (e.g., "1" instead of "01").

Author identifiers will no longer be preceded by the URL. The ID source will remain as an attribute in <Identifier>.
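Workflows that compare records from the current and test files may want to normalize these formatting differences before comparing. The Python sketch below shows one possible approach; it is not NLM's process. The helper names, the sample elements, and the assumption that a URL-prefixed identifier ends with the identifier itself as its last path segment are all hypothetical.

```python
import xml.etree.ElementTree as ET


def same_date_part(a: str, b: str) -> bool:
    """Compare numeric date parts so that "01" and "1" are treated as equal."""
    return int(a) == int(b)


def strip_identifier_url(value: str) -> str:
    """Drop a leading URL from an identifier, if one is present.

    Assumption: the identifier is the last path segment of the URL.
    """
    value = value.strip()
    if value.lower().startswith(("http://", "https://")):
        return value.rstrip("/").rsplit("/", 1)[-1]
    return value


def canonical(elem: ET.Element) -> tuple:
    """Build a form of an element that ignores attribute and child order."""
    children = sorted(canonical(child) for child in elem)
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(children),
    )


# Hypothetical elements that differ only in attribute and child order
a = ET.fromstring(
    '<Author ValidYN="Y" EqualContrib="N">'
    "<LastName>Doe</LastName><ForeName>J</ForeName></Author>"
)
b = ET.fromstring(
    '<Author EqualContrib="N" ValidYN="Y">'
    "<ForeName>J</ForeName><LastName>Doe</LastName></Author>"
)
same_author = canonical(a) == canonical(b)
```

Because canonical() discards child order, it is best applied only to elements where order carries no meaning; applying it to an entire author list, for example, would also hide a genuine change in author order. Capitalization differences are deliberately not normalized here, since lowercasing text could mask real changes.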

Schedule of Updates – Please note that daily update files for the current FTP site and the test FTP site are generated using different processes on separate schedules. This means that an updated PMID may appear on one site on a different day than on the other.
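Because the two sites post updates on separate schedules, comparing a single day's file from each site can be misleading. One possible approach, sketched below in Python, is to pool the PMIDs from several days of update files on each site and compare the pooled sets. The function names and the tiny in-memory documents are hypothetical; the sketch assumes the usual PubmedArticleSet / PubmedArticle / MedlineCitation / PMID nesting and ignores deleted citations.

```python
import io
import xml.etree.ElementTree as ET


def pmids_in_stream(handle) -> set:
    """Collect the PMID of each <MedlineCitation> from a binary XML stream."""
    pmids = set()
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "MedlineCitation":
            # findtext() looks only at direct children, so PMIDs nested
            # deeper inside the citation are not picked up here.
            pmid = elem.findtext("PMID")
            if pmid:
                pmids.add(pmid.strip())
            elem.clear()  # release memory held by the processed citation
    return pmids


def compare_windows(current_streams, test_streams) -> dict:
    """Pool PMIDs over several files per site, then compare the two pools."""
    current = set().union(*(pmids_in_stream(s) for s in current_streams))
    test = set().union(*(pmids_in_stream(s) for s in test_streams))
    return {
        "only_current": current - test,
        "only_test": test - current,
        "both": current & test,
    }


def _doc(*pmids):
    """Build a tiny hypothetical update file in memory."""
    body = "".join(
        f"<PubmedArticle><MedlineCitation><PMID>{p}</PMID></MedlineCitation></PubmedArticle>"
        for p in pmids
    )
    return io.BytesIO(f"<PubmedArticleSet>{body}</PubmedArticleSet>".encode())


result = compare_windows([_doc(1, 2), _doc(3)], [_doc(2), _doc(3, 4)])
```

For files on disk, any binary file object can be passed in place of the in-memory documents; for gzip-compressed files, gzip.open(path, "rb") from the Python standard library provides one. PMIDs that remain in "only_test" over a long window may reflect the PMC-augmented records described above and not a scheduling difference.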





Share Your Feedback

Please reach out to us if you have questions or would like to provide feedback.

For more information about accessing PubMed data via FTP, please see: Download PubMed Data.

NLM will share additional details as the transition date approaches. Stay tuned to the NLM Technical Bulletin for updates.