PubChem Training Course

About PubChem

Data Types

PubChem organizes its data into several record types that you can search and access. This section briefly introduces each type to prepare you to search PubChem.


Substances

When a data source submits data about a chemical, that data becomes a Substance record in PubChem. A Substance record can include chemical structures, synonyms, registration IDs, descriptions, related URLs, patent identifiers, database cross-references to PubMed, protein 3D structures, and biological screening results. Every time a data source submits new information about a chemical, a new Substance record is generated. Substance summaries show which data source provided which information.


Compounds

If one or more Substance records contain structures that can be standardized to the same chemical structure, PubChem creates a single Compound record for that structure. In other words, if two Substances point to the same chemical structure, they point to the same Compound. The Compound summary is an aggregated view of all available information in PubChem about a chemical. Read more about the difference between a Substance and a Compound record in PubChem Docs.
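The relationship between Substance and Compound records can be pictured as a grouping step. The sketch below is only an illustration under stated assumptions: the `Substance` class, the `group_into_compounds` function, the source names, and the ID numbers are all hypothetical, and a plain SMILES string stands in for the result of PubChem's actual structure standardization, which is more involved than a string comparison.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """A simplified, hypothetical stand-in for a Substance record."""
    sid: int           # made-up Substance ID, for illustration only
    source: str        # made-up data source name
    standardized: str  # structure after standardization (here, a SMILES string)


def group_into_compounds(substances):
    """Group Substance records that share the same standardized structure."""
    compounds = defaultdict(list)
    for substance in substances:
        compounds[substance.standardized].append(substance)
    return dict(compounds)


# Two sources submit the same structure (aspirin); one submits ethanol.
substances = [
    Substance(1, "Source A", "CC(=O)OC1=CC=CC=C1C(=O)O"),
    Substance(2, "Source B", "CC(=O)OC1=CC=CC=C1C(=O)O"),
    Substance(3, "Source A", "CCO"),
]

compounds = group_into_compounds(substances)
for structure, members in compounds.items():
    print(structure, [m.sid for m in members])
```

In this toy data set, three Substance records collapse into two Compound groups, because the first two share a standardized structure.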


BioAssays

When a data source submits descriptions of biological assay experiments, along with the bioactivity test results for the substances tested, each experiment becomes a BioAssay record. Read more about BioAssays in PubChem Docs.
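Substance, Compound, and BioAssay records are each identified by a numeric ID, commonly abbreviated SID, CID, and AID. If you later want to fetch records programmatically instead of through the website, PubChem's PUG REST service addresses them by those IDs. The following is a minimal sketch that only builds request URLs and does not contact PubChem. The helper name `record_url` and the `RECORD_TYPES` table are assumptions made for this example, and the URL layout follows the general PUG REST pattern, so check the PUG REST documentation before relying on it.

```python
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Record type -> (PUG REST domain, identifier namespace).
# This table is an illustrative assumption, not an exhaustive list.
RECORD_TYPES = {
    "substance": ("substance", "sid"),
    "compound": ("compound", "cid"),
    "bioassay": ("assay", "aid"),
}


def record_url(record_type, identifier, output="JSON"):
    """Build a PUG REST URL for one record (hypothetical helper)."""
    domain, namespace = RECORD_TYPES[record_type]
    # int() rejects anything that is not a whole-number ID.
    return f"{PUG_REST_BASE}/{domain}/{namespace}/{int(identifier)}/{output}"


# CID 2244 is the Compound record for aspirin.
print(record_url("compound", 2244))
```

Keeping the record types in one table makes it visible that the three kinds of record differ only in their domain name and ID namespace.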

Targets: Genes and Proteins

PubChem Protein and Gene records include the chemical information available for a given protein or gene (or the protein encoded by the gene), including bioactivity data for chemicals tested against that protein or gene. Learn more about genes and proteins in PubChem Docs.


Pathways

PubChem Pathway summaries include information about chemicals, genes, or diseases involved in or associated with a biological pathway. The NIH National Human Genome Research Institute defines a biological pathway as “a series of actions among molecules in a cell that leads to a certain product or a change in the cell. It can trigger the assembly of new molecules, such as a fat or protein, turn genes on and off, or spur a cell to move.” For example, there are pathways involved in metabolism. Learn more about Pathways in PubChem Docs.


Taxonomy

Taxonomy summaries in PubChem display data associated with a specific organism, such as the human or the Norway rat. Learn more about taxonomies in PubChem Docs.


Patents

The PubChem Patent collection contains information on which chemicals are mentioned in a given patent document. Patents are outside the scope of this tutorial, but you can learn more about finding patents in PubChem Docs.


Answer these questions to check your understanding of the data types in PubChem:

  1. When a data source submits new information about a chemical, a _____ record is created.

  2. The summary page of a __________ record provides an aggregated view of all available information in PubChem about a chemical.