2010 JULY–AUGUST No. 375
Posted August 25, 2010

MEDLINE® Character Set Expansion

Since the inception of MEDLINE, NLM® has limited its characters to those that can be typed on a standard US keyboard, plus a small set of frequently used diacritics (see Limited MEDLINE®/PubMed® Character Set).

Starting in early September 2010, NLM will accept in newly created MEDLINE records any UTF-8 character in the Latin (Roman) and Greek scripts, as well as mathematical and other symbols commonly found in the biomedical literature. Other scripts, such as Chinese, Japanese, or Korean, are not supported (see MEDLINE®/PubMed® Character Set for the expanded character set).
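As a rough illustration of the script restriction, the following Python sketch flags letters that fall outside the Latin and Greek scripts. It is a hypothetical approximation, not NLM's actual validation rule: the helper names `is_supported_char` and `unsupported_chars` are invented for this example, and the check simply classifies characters by Unicode category and name using the standard `unicodedata` module. It accepts every non-letter character (digits, punctuation, spaces, symbols) and will misclassify some letters, for example the micro sign (µ, U+00B5), whose Unicode name does not start with GREEK.

```python
import unicodedata


def is_supported_char(ch: str) -> bool:
    """Hypothetical approximation of the expanded MEDLINE character set.

    Accepts Latin and Greek letters plus all non-letter characters
    (digits, punctuation, whitespace, math and other symbols); rejects
    letters from any other script, such as CJK ideographs or Hangul.
    """
    if not unicodedata.category(ch).startswith("L"):
        # Not a letter: digits, punctuation, spaces, symbols, combining marks.
        return True
    # Letters are judged by the script prefix of their Unicode name,
    # e.g. "GREEK SMALL LETTER BETA" or "LATIN SMALL LETTER E WITH ACUTE".
    name = unicodedata.name(ch, "")
    return name.startswith(("LATIN", "GREEK"))


def unsupported_chars(text: str) -> list:
    """Return the characters in text that fall outside the approximated set."""
    return [ch for ch in text if not is_supported_char(ch)]


if __name__ == "__main__":
    print(unsupported_chars("β-lactamase ± café 中"))  # ['中']
```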

The most notable difference is the addition of Greek characters to the database. Previously, NLM spelled out Greek letters, for example replacing β (Unicode U+03B2) with "beta." PubMed users can now search for these characters either by copying and pasting them from an online source or by spelling out the letter, as they always have. Both approaches retrieve the same set of citations.
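The equivalence between the two query forms can be pictured as a normalization step that spells out Greek letters before comparing text. The Python sketch below is hypothetical and does not describe how PubMed actually processes queries; the `GREEK_NAMES` table and the `normalize` and `same_query` helpers are illustrative names. The table deliberately uses conventional English spellings (for example "lambda") instead of Unicode character names, which spell λ as LAMDA.

```python
# Hypothetical mapping of lowercase Greek letters to spelled-out names.
GREEK_NAMES = {
    "α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "ε": "epsilon",
    "ζ": "zeta", "η": "eta", "θ": "theta", "ι": "iota", "κ": "kappa",
    "λ": "lambda", "μ": "mu", "ν": "nu", "ξ": "xi", "ο": "omicron",
    "π": "pi", "ρ": "rho", "σ": "sigma", "ς": "sigma", "τ": "tau",
    "υ": "upsilon", "φ": "phi", "χ": "chi", "ψ": "psi", "ω": "omega",
}


def normalize(text: str) -> str:
    """Lowercase the text and replace each Greek letter with its name.

    Lowercasing first means capital letters (e.g. Β) map to the same
    name as their lowercase forms (β -> "beta").
    """
    return "".join(GREEK_NAMES.get(ch, ch) for ch in text.lower())


def same_query(a: str, b: str) -> bool:
    """Treat two queries as equivalent if they normalize to the same string."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(normalize("TNF-α"))                          # tnf-alpha
    print(same_query("β-amyloid", "beta-amyloid"))     # True
```

Normalizing both sides to one canonical spelling is what lets a pasted "β" and a typed "beta" match the same records in this sketch.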

NLM will continue to standardize some characters:

See Diacritics in PubMed® Displays and Searching for additional information.

By J. Shore

Shore J. MEDLINE® Character Set Expansion. NLM Tech Bull. 2010 Jul-Aug;(375):e13.
