Category G - Phenomena and Processes

Indexing Principles - Molecular Sequence Data

As part of the indexing process, we identify articles that contain amino acid sequences, base sequences, or carbohydrate sequences. The sequence need not be discussed in the article; if it appears anywhere in the text, it must be indexed according to the following criteria:

BASE SEQUENCE - if the article contains a DNA or RNA sequence of 50 or more bases, index:

BASE SEQUENCE (NIM) + MOLECULAR SEQUENCE DATA (NIM)
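The base sequence rule is a simple length threshold. As a minimal sketch (a hypothetical helper, not an NLM tool), the check below counts only nucleotide letters in a sequence that has already been copied out of the article. It assumes the printed sequence may be broken up by spaces, line breaks, and position numbers, and that IUPAC ambiguity codes such as N count as bases:

```python
# Hypothetical helper illustrating the 50-base rule; not an NLM tool.
BASE_THRESHOLD = 50
# Assumption: IUPAC nucleotide codes (including ambiguity codes such as N) count as bases.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")


def base_sequence_headings(printed_sequence: str) -> list[str]:
    """Return the headings to index for one DNA or RNA sequence printed in an article."""
    # Keep only nucleotide letters; spaces, line breaks and position numbers are skipped.
    bases = [ch for ch in printed_sequence.upper() if ch in NUCLEOTIDE_CODES]
    if len(bases) >= BASE_THRESHOLD:
        return ["BASE SEQUENCE (NIM)", "MOLECULAR SEQUENCE DATA (NIM)"]
    return []


print(base_sequence_headings("1 ATGGCTAGCA TTGACCGTAA 21"))  # 20 bases -> []
```

Counting letters rather than characters matters because printed sequences are often split into blocks with position numbers in the margin.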

AMINO ACID SEQUENCE - if an article contains an amino acid sequence of 15 or more amino acids, index:

AMINO ACID SEQUENCE (NIM) + MOLECULAR SEQUENCE DATA (NIM)

The amino acids may be represented by one-letter or three-letter abbreviations. For example:

Alanine = A or Ala
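Because either notation may be used, a residue count has to treat "Ala" and "A" as one residue each. The sketch below is a hypothetical helper, not an NLM tool; it assumes the sequence string has already been isolated from the surrounding text, that only the 20 standard amino acids appear, and that one-letter sequences are printed in uppercase:

```python
import re

# Hypothetical helper illustrating the 15-residue rule; not an NLM tool.
AMINO_ACID_THRESHOLD = 15
# The 20 standard amino acids (alanine = A or Ala, and so on).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER_CODES = set(THREE_TO_ONE.values())


def count_residues(printed_sequence: str) -> int:
    """Count residues written either as three-letter codes or as one-letter codes."""
    three_letter = re.findall(r"[A-Z][a-z]{2}", printed_sequence)
    if three_letter and all(code in THREE_TO_ONE for code in three_letter):
        # Three-letter notation, e.g. "Ala-Gly-Ser": one residue per code.
        return len(three_letter)
    # Otherwise assume uppercase one-letter notation, e.g. "AGS".
    return sum(1 for ch in printed_sequence if ch in ONE_LETTER_CODES)


def amino_acid_headings(printed_sequence: str) -> list[str]:
    """Return the headings to index for one amino acid sequence printed in an article."""
    if count_residues(printed_sequence) >= AMINO_ACID_THRESHOLD:
        return ["AMINO ACID SEQUENCE (NIM)", "MOLECULAR SEQUENCE DATA (NIM)"]
    return []


# The same three residues in both notations give the same count.
print(count_residues("Ala-Gly-Ser"), count_residues("AGS"))  # 3 3
```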

CARBOHYDRATE SEQUENCE - if an article contains a carbohydrate sequence of 3 or more sugar residues, index:

CARBOHYDRATE SEQUENCE (NIM) + MOLECULAR SEQUENCE DATA (NIM)

The carbohydrates may appear as a string of three-letter abbreviations, as full names, or as a drawn molecular structure.
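Only the abbreviation form lends itself to a mechanical count; full names and drawn structures still need the indexer's reading. As a minimal sketch for the abbreviation form (a hypothetical helper, not an NLM tool), assuming residues are written with the common three-letter symbols listed in the code and that any other symbol is simply not recognized:

```python
import re

# Hypothetical helper illustrating the 3-residue rule; not an NLM tool.
CARBOHYDRATE_THRESHOLD = 3
# Assumption: a short list of common three-letter monosaccharide symbols.
MONOSACCHARIDE_SYMBOLS = ("Glc", "Gal", "Man", "Fuc", "Xyl", "Ara", "Rha", "Fru")
SYMBOL_PATTERN = re.compile("|".join(MONOSACCHARIDE_SYMBOLS))


def carbohydrate_headings(printed_sequence: str) -> list[str]:
    """Return the headings to index for a carbohydrate sequence written in three-letter symbols."""
    # Each symbol counts as one residue; linkage notation such as "(b1-4)" is ignored.
    residues = SYMBOL_PATTERN.findall(printed_sequence)
    if len(residues) >= CARBOHYDRATE_THRESHOLD:
        return ["CARBOHYDRATE SEQUENCE (NIM)", "MOLECULAR SEQUENCE DATA (NIM)"]
    return []


print(carbohydrate_headings("Gal(b1-4)Glc"))  # 2 residues -> []
```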

Many articles containing molecular sequence data supply databank accession numbers, which identify the databank where the sequences are deposited. The number usually appears at the bottom of the first page or on the last page, and it is usually entered by the editors; if it is not, the indexer must enter it. Accession numbers are added only when the article is the original report of the sequence. Authors frequently cite accession numbers of previously reported sequences in the body of the article, often in tables; these are not added. Whenever an accession number is entered, MOLECULAR SEQUENCE DATA must be indexed.
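The accession-number rules reduce to a filter and a flag. The sketch below is a hypothetical data model and helper, not an NLM tool; it assumes the indexer has already decided, for each cited number, whether this article is the original report of the sequence. It does not model the PDB exception, and the accession numbers in the usage lines are placeholders, not real records:

```python
from dataclasses import dataclass

# Hypothetical data model and helper illustrating the accession-number rules; not an NLM tool.


@dataclass(frozen=True)
class CitedAccession:
    databank: str          # e.g. "GENBANK"
    number: str            # accession number as printed in the article
    original_report: bool  # True only if this article is the first report of the sequence


def review_accessions(cited: list[CitedAccession], entered_by_editors: set[str]):
    """Return (accessions the indexer must still enter, whether MOLECULAR SEQUENCE DATA applies)."""
    # Numbers of previously reported sequences (often listed in tables) are never added.
    eligible = [acc for acc in cited if acc.original_report]
    to_enter = [acc for acc in eligible if acc.number not in entered_by_editors]
    # Any accession number on the record, from the editors or the indexer, requires the heading.
    needs_msd = bool(entered_by_editors) or bool(to_enter)
    return to_enter, needs_msd


# Placeholder numbers for illustration only, not real databank records.
cited = [
    CitedAccession("GENBANK", "ACC0001", original_report=True),
    CitedAccession("GENBANK", "ACC0002", original_report=False),  # cited from earlier work
]
to_enter, needs_msd = review_accessions(cited, entered_by_editors=set())
print([acc.number for acc in to_enter], needs_msd)  # ['ACC0001'] True
```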

Most sequence data are deposited in GENBANK, and MOLECULAR SEQUENCE DATA is indexed. There is one exception: when the data are deposited in the Protein Data Bank (PDB), index X-RAY CRYSTALLOGRAPHY (NIM) instead of MOLECULAR SEQUENCE DATA.
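The PDB exception can be read as a choice of heading by databank. As a minimal sketch (a hypothetical helper, not an NLM tool), the function below maps the databanks named in the entered accession numbers to headings. How to index an article with both PDB and GENBANK deposits is not addressed in the text; here both headings are returned, which is an assumption:

```python
# Hypothetical helper illustrating the PDB exception; not an NLM tool.


def headings_for_databanks(databanks: list[str]) -> list[str]:
    """Return the heading(s) implied by the databanks of the accession numbers entered."""
    names = {name.strip().upper() for name in databanks}
    headings = []
    if "PDB" in names:
        # Protein Data Bank deposits: X-RAY CRYSTALLOGRAPHY instead of MOLECULAR SEQUENCE DATA.
        headings.append("X-RAY CRYSTALLOGRAPHY (NIM)")
    if names - {"PDB"}:
        # Assumption: when other databanks (e.g. GENBANK) also appear, they keep the usual heading.
        headings.append("MOLECULAR SEQUENCE DATA (NIM)")
    return headings


print(headings_for_databanks(["GENBANK"]))  # ['MOLECULAR SEQUENCE DATA (NIM)']
print(headings_for_databanks(["PDB"]))      # ['X-RAY CRYSTALLOGRAPHY (NIM)']
```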

Last Reviewed: March 10, 2015

MEDLINE Indexing Online Training Course