
GO (Gene Ontology) - Source Representation


This page lists specific source data elements and provides information on their representation in the UMLS Metathesaurus.


VSAB: GO2023_05_10

Notes

GO "terms" are analogous to the Metathesaurus idea of "descriptors," i.e., a collection of closely related but not necessarily synonymous names. In the documentation below, the word "Term" is used in the GO sense in order to describe the original GO data in the language used by the source itself.

Summary of Changes:

There have been no changes to the GO release format or Metathesaurus source processing.

Source-Provided Files: Summary

Data files in various formats and extensive documentation are available at: http://www.geneontology.org/

Documentation and Reference

File Name | Description
http://owlcollab.github.io/oboformat/doc/GO.format.obo-1_2.html | OBO format documentation

Data Files

File Name | Description
go.obo | OBO data file

Not included:
* Certain fields and data elements may not be directly processed because they contain redundant data.


Source-Provided Files: Details

In OBO, content is structured as tag-value pairs, with optional trailing modifiers. Tag-value pairs consist of a tag name, an unescaped colon, the tag value, and a newline.
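As a rough illustration, the tag-value structure described above can be split apart in a few lines of Python. This is a minimal sketch, not the UMLS loader: the names `TagValue`, `split_comment`, and `parse_tag_value` are hypothetical, trailing modifiers are assumed to appear as a single `{...}` block at the end of the value, and text after an unescaped `!` outside quoted text is treated as a comment (an assumption based on how GO files annotate lines such as `is_a`).

```python
import re
from typing import NamedTuple, Optional, Tuple


class TagValue(NamedTuple):
    tag: str
    value: str
    modifiers: Optional[str]  # text inside a trailing {...} block, if any
    comment: Optional[str]    # text after an unescaped "!", if any


def split_comment(text: str) -> Tuple[str, Optional[str]]:
    """Split off a trailing "! comment", ignoring "!" inside quotes or after a backslash."""
    in_quotes = False
    escaped = False
    for i, ch in enumerate(text):
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "!" and not in_quotes:
            return text[:i].rstrip(), text[i + 1:].strip()
    return text, None


def parse_tag_value(line: str) -> TagValue:
    """Parse one OBO tag-value line (simplified; see assumptions above)."""
    line = line.rstrip("\n")
    # The tag name ends at the first colon not preceded by a backslash.
    colon = re.search(r"(?<!\\):", line)
    if colon is None:
        raise ValueError(f"not a tag-value line: {line!r}")
    tag = line[:colon.start()].strip()
    value, comment = split_comment(line[colon.end():].strip())
    # Assumption: trailing modifiers form one {...} block at the end of the value.
    modifiers = None
    mod = re.search(r"\s*\{([^{}]*)\}$", value)
    if mod:
        modifiers = mod.group(1).strip()
        value = value[:mod.start()].rstrip()
    return TagValue(tag, value, modifiers, comment)


print(parse_tag_value("is_a: GO:0048308 ! organelle inheritance"))
# TagValue(tag='is_a', value='GO:0048308', modifiers=None, comment='organelle inheritance')
```

Values such as GO identifiers contain colons themselves, which is why the split happens only at the first unescaped colon.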

Consult the GO OBO format documentation for additional details.

file:gene_ontology_edit.obo.<YYYY-DD-MM>

For inclusion in the Metathesaurus, all tags in a [Term] stanza are processed as described in the OBO format documentation.
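The per-stanza processing can be sketched as follows, assuming the input is an iterable of lines from go.obo. The helper name `iter_term_stanzas` is hypothetical; it keeps only [Term] stanzas, collects repeated tags (such as `is_a` or `synonym`) into lists, and leaves values unparsed, so trailing modifiers and comments remain in the value text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def iter_term_stanzas(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield each [Term] stanza as a mapping from tag name to its list of values."""
    current = None  # tags of the [Term] stanza being read, or None outside one
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Any stanza header closes the previous stanza.
            if current is not None:
                yield dict(current)
            # Only [Term] stanzas are collected; [Typedef] and others are skipped.
            current = defaultdict(list) if line == "[Term]" else None
        elif current is not None and line and not line.startswith("!"):
            # Simplification: split at the first colon, ignoring escapes.
            tag, _, value = line.partition(":")
            current[tag.strip()].append(value.strip())
    if current is not None:
        yield dict(current)


sample = """format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance

[Typedef]
id: part_of
"""
for term in iter_term_stanzas(sample.splitlines()):
    print(term["id"], term["name"])  # ['GO:0000001'] ['mitochondrion inheritance']
```

For a real file, the same function can be applied to an open file handle, e.g. `with open("go.obo", encoding="utf-8") as fh:` followed by `for term in iter_term_stanzas(fh):`.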

Required tags:
Tag Name | Description | Representation
id | Unique identifier | MRCONSO.CODE, MRCONSO.SDUI
name | Term name | MRCONSO.STR; TTY = "PT" if there is no is_obsolete flag, TTY = "OP" if is_obsolete = true
namespace | Ontology (domain) name. Valid values are: biological_process, cellular_component, molecular_function | MRSAT.ATN = "GO_NAMESPACE"
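The required-tag mappings above can be expressed as a small function. This is an illustrative sketch only: `map_required_tags` and its dict-shaped output are hypothetical simplifications, not the actual MRCONSO.RRF or MRSAT.RRF record layouts, and the input is assumed to map each tag name to a list of its values.

```python
from typing import Dict, List, Tuple

VALID_NAMESPACES = {"biological_process", "cellular_component", "molecular_function"}


def map_required_tags(term: Dict[str, List[str]]) -> Tuple[dict, dict]:
    """Map a term's required tags to simplified MRCONSO and MRSAT field values."""
    go_id = term["id"][0]
    # No is_obsolete tag (or any value other than "true") means the term is current.
    obsolete = term.get("is_obsolete", ["false"])[0] == "true"
    namespace = term["namespace"][0]
    if namespace not in VALID_NAMESPACES:
        raise ValueError(f"{go_id}: unexpected namespace {namespace!r}")
    conso = {
        "CODE": go_id,  # id -> MRCONSO.CODE
        "SDUI": go_id,  # id -> MRCONSO.SDUI
        "STR": term["name"][0],
        "TTY": "OP" if obsolete else "PT",
    }
    sat = {"CODE": go_id, "ATN": "GO_NAMESPACE", "ATV": namespace}
    return conso, sat
```

Raising on an unexpected namespace is a design choice for the sketch; a loader could equally log and skip such terms.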

Optional tags:
The following table lists only the tags actually present in the data processed for inclusion in this version of the Metathesaurus. Tags are presented in the order in which they appear in a [Term] stanza. For information on all optional OBO tags, including deprecated tags, please consult the OBO format documentation.
Tag Name | Description | Representation
alt_id | Alternate identifier | MRSAT.ATN = "SID"
def | Definition | MRDEF.DEF
comment | Comment | MRSAT.ATN = "GO_COMMENT"
subset | Subset. Valid values are: goslim_candida, goslim_generic, goslim_goa, goslim_pir, goslim_plant, goslim_pombe, goslim_yeast, gosubset_prok | MRSAT.ATN = "GO_SUBSET"
synonym | Synonym | MRCONSO.STR, with TTY values assigned based on the value of the scope identifier and the presence or absence of the is_obsolete tag; MRREL.REL, with REL values assigned based on the value of the scope identifier:

Scope identifier | is_obsolete | MRCONSO.TTY | MRREL.REL
EXACT | (no value) | SY | SY
EXACT | true | IS | SY
NARROW | (no value) | ET | RN
NARROW | true | OET | RB
BROAD | (no value) | ET | RB
BROAD | true | OET | RN
RELATED | (no value) | ET | RO
RELATED | true | OET | RO

Optional dbxref lists included in synonym tags are represented as MRSAT.ATN = "REF".

When present on EXACT synonyms, the modifier "systematic_synonym" is represented as MRSAT.ATN = "SYN_QUALIFIER", MRSAT.ATV = "systematic_synonym".
xref | Cross-reference (dbxref) describing an analogous term in another vocabulary | MRSAT.ATN = "GXR"
is_a | Indicates a subclassing relationship between one term and another | Used to create a hierarchy, represented in MRHIER.RRF and MRREL.RRF (REL = "PAR", "CHD"; RELA = "isa")
relationship | Describes a relationship between one term and another. Valid values are: ends_during, happens_during, has_part, negatively_regulates, occurs_in, part_of, positively_regulates, regulates | MRREL.RELA. For part_of, MRREL.REL = "RN/RB"; for all other relationship tag values, MRREL.REL = "RO".
is_obsolete | Indicates whether a term is obsolete | Used to compute term type (TTY) assignments
replaced_by | Gives a term which replaces an obsolete term | MRREL.REL = "RO"; MRREL.RELA = "replaced_by/replaces"
consider | Gives a term which may be an appropriate substitute for an obsolete term, but requires human review | MRREL.REL = "RO"; MRREL.RELA = "consider/consider_from"
disjoint_from | Indicates that a term is disjoint from another, meaning that the two terms have no instances or subclasses in common. The value is the id of the term from which the current term is disjoint. | MRSAT.ATN = "DISJOINT_FROM"
intersection_of | Indicates that this term is equivalent to the intersection of several other terms. The value is either a term id, or a relationship type id, a space, and a term id. | If the value is a GO term id: MRREL.REL = "RN/RB"; MRREL.RELA = "isa/inverse_isa". If the value includes a relationship type: MRREL.REL and MRREL.RELA are populated according to the relationship type. Relationship groups are used to link pairs (or groups) of intersecting relationships.
creation_date | Indicates the creation time and date of the term | MRSAT.ATN = "DATE_CREATED"
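The synonym scope table and the relationship REL rule in the optional tags table amount to simple lookups. A minimal sketch follows; the names `SYNONYM_MAP`, `map_synonym`, and `relationship_rel` are hypothetical, and the regular expression assumes a synonym value of the form `"text" SCOPE`, an optional synonym type such as `systematic_synonym`, then a bracketed, comma-separated dbxref list.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple

# (scope identifier, is_obsolete) -> (MRCONSO.TTY, MRREL.REL), transcribed from the synonym table.
SYNONYM_MAP: Dict[Tuple[str, bool], Tuple[str, str]] = {
    ("EXACT", False): ("SY", "SY"),
    ("EXACT", True): ("IS", "SY"),
    ("NARROW", False): ("ET", "RN"),
    ("NARROW", True): ("OET", "RB"),
    ("BROAD", False): ("ET", "RB"),
    ("BROAD", True): ("OET", "RN"),
    ("RELATED", False): ("ET", "RO"),
    ("RELATED", True): ("OET", "RO"),
}

# Assumed value layout: "text" SCOPE [optional synonym type] [dbxref, dbxref, ...]
SYNONYM_RE = re.compile(
    r'^"((?:[^"\\]|\\.)*)"\s+(EXACT|NARROW|BROAD|RELATED)'
    r'(?:\s+([A-Za-z_][\w-]*))?\s*\[([^\]]*)\]'
)


class MappedSynonym(NamedTuple):
    string: str               # -> MRCONSO.STR
    tty: str                  # -> MRCONSO.TTY
    rel: str                  # -> MRREL.REL
    qualifier: Optional[str]  # -> MRSAT.ATV when MRSAT.ATN = "SYN_QUALIFIER"
    refs: List[str]           # -> MRSAT.ATN = "REF"


def map_synonym(value: str, is_obsolete: bool) -> MappedSynonym:
    """Map one synonym tag value to its TTY, REL, qualifier, and dbxrefs."""
    m = SYNONYM_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognized synonym value: {value!r}")
    text, scope, syn_type, xrefs = m.groups()
    tty, rel = SYNONYM_MAP[(scope, is_obsolete)]
    refs = [x.strip() for x in xrefs.split(",") if x.strip()]
    # The systematic_synonym modifier is only recorded on EXACT synonyms.
    qualifier = syn_type if scope == "EXACT" and syn_type == "systematic_synonym" else None
    return MappedSynonym(text, tty, rel, qualifier, refs)


def relationship_rel(rel_type: str) -> Tuple[str, str]:
    """Return (MRREL.REL, MRREL.RELA) for the type named in a relationship tag."""
    return ("RN/RB" if rel_type == "part_of" else "RO", rel_type)


print(map_synonym('"mitochondrial inheritance" EXACT []', is_obsolete=False))
# MappedSynonym(string='mitochondrial inheritance', tty='SY', rel='SY', qualifier=None, refs=[])
print(relationship_rel("part_of"))  # ('RN/RB', 'part_of')
```

A dbxref list containing escaped commas or brackets would need a fuller parser than the simple split used here.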