Exercise 3: Find more relevant data for the pathogen
Task: Find more data for the pathogen, including accessible experimental study data.
It is often helpful for researchers to find more information about a pathogen for their work. However, even knowing what is available can be a challenge. NCBI cross-references all molecular data based on the source taxon provided by the submitter, which is then indexed and linked with records in the NCBI Taxonomy database. The number of records within each database for a taxon is quickly visible on a summary record, which enables researchers to click through and quickly retrieve those records for further examination.
In addition to sequence data, the Taxonomy records can include links to several different types of experimental study databases. Researchers are often interested in finding genotype/genetic variation data, gene expression studies or biological activity assays to include in their work on understanding their pathogen's biology as well as for developing diagnostics or therapeutics to assist the clinical community.
You will search and explore in NCBI's Taxonomy database to find information relevant to the pathogen. Then, you will identify and retrieve relevant experimental study data that may be helpful for further study of this organism.
Background
Taxonomy – central point of organization for NCBI’s molecular data
NCBI stores vast amounts and varied types of information for all sorts of organisms. The NCBI Taxonomy database was originally created to catalog and cross-reference GenBank sequences by their source organisms.
- Information in this database is manually curated in consultation with taxonomic experts and the relevant nomenclature codes, including:
- International Code of Virus Classification and Nomenclature
- International Code of Nomenclature of Prokaryotes
- International Code of Nomenclature for Algae, Fungi & Plants
- International Code of Zoological Nomenclature
- Data is continually reviewed and updated.
NCBI is known for linking related data within and across databases. All molecular data is connected by data source organism, and NCBI Taxonomy provides the critical framework for this. On the right is a very old (but pretty cool) image of how Taxonomy links related records across various NCBI databases. (This interactive "subway map" view, as we liked to call it, was retired when its technology became extremely outdated.)
In addition to using nucleotide sequences for identification of a pathogen, as shown in Exercise 1, or finding reference sequences with annotations to learn about the biomolecules that make up and help a pathogen survive, as shown in Exercise 2, obtaining or downloading experimental data can be helpful to make discoveries about organisms classified in the pathogen's taxonomic lineage.
Gathering data can provide the basis for secondary analysis that produces new discoveries from existing datasets, for prototyping new analysis workflows or pipelines for eventual use with newly collected data, or for augmenting collected data to increase the power of the dataset's analysis.
Key NCBI Resources for this Exercise
The NCBI Taxonomy database was developed to provide structure for NCBI’s molecular data and can display and provide a link to taxonomically relevant data. As in classical biological taxonomy, this is hierarchical, and the resource enables browsing throughout the classical levels (Kingdom, Phylum, Class, Order, Family, Genus, Species) and at additional levels in between and below.
PLEASE NOTE: Taxa included in NCBI Taxonomy are listed because someone submitted a nucleotide sequence to NCBI. This database does not contain all organisms known in our world.
A very quick overview of NCBI experiment study databases that might be of interest to you:
BioProject - a catalog of sequence-based research projects & related data
- Sequence Read Archive (SRA) - Studies including genome resequencing for assembly & annotation (DNASeq, RNASeq, ChIPSeq, etc.)
- Gene Expression Omnibus (GEO) - Functional genomics studies (the experiments are listed in GEO and link to relevant high-throughput sequence data in SRA)
- Genotype (Microarray/DNASeq)
- Expression (Microarray/RNASeq)
- Epigenomics
- Database of Genotypes and Phenotypes (dbGaP) - Human Clinical Studies information and related data
Other study databases with data (not listed in BioProject)
- PubChem BioAssay - Biological activity assays (experiments measuring the impact of an exposure to a particular chemical or RNAi on cell growth, enzyme activity, receptor binding, etc.)
- ClinicalTrials.gov - Human clinical trials (a listing repository with result information included for some trials)
Note: dbGaP and ClinicalTrials.gov are databases for human clinical studies and trials. The organism for records in these databases is considered to be "human" (Homo sapiens); therefore, Taxonomy links for infectious pathogens (which may, in fact, be a focus of a study) are not provided.
Your Turn: Find more relevant data for the pathogen for future studies!
Use the name of your patient's pathogen to begin your search following the steps below.
Click below if you need a hint on what organism you found:
Identified viral isolate
Identified bacterial isolate
Identified fungal isolate
Explore the Taxonomy database for information about the pathogen
- Search the Taxonomy database (https://www.ncbi.nlm.nih.gov/taxonomy/) with your pathogen name.
NOTE: NCBI also has sequence data from environmental isolates or “metagenomes” in Taxonomy.
NCBI metagenomes are a defined, pseudo-organismal eco-system consisting of the genomes and genome products of many individual organisms, such as a microbiome.
You can search the Taxonomy database with the term "metagenomes" to see the types of community isolates that people have submitted to NCBI sequence databases.
- Click a name to see a hierarchical list of all organisms underneath your term’s taxonomic node.
If you need it, you can click here to get to a link for the pathogen Taxonomy hierarchy view.
- Viral Taxonomy hierarchy view
- Bacterial Taxonomy hierarchy view
- [Candida] fungal Taxonomy hierarchy view vs. Candida fungal Taxonomy hierarchy view
- and, in case you want to see it, here's a link to the Metagenomes Taxonomy hierarchy view
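If you would rather script the name search than type it into the web page, the same lookup can be sketched with NCBI's E-utilities ESearch endpoint. This is only a minimal sketch: it builds the request URL and does not send it, the helper name `taxonomy_search_url` is made up for illustration, and "Candida albicans" is simply one of the example organisms used in this exercise.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(name: str) -> str:
    """Build an ESearch URL that looks up `name` in the Taxonomy database.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    params = {"db": "taxonomy", "term": name, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Paste the printed URL into a browser to see the JSON response.
    print(taxonomy_search_url("Candida albicans"))
```

Opening the printed URL in a browser should return a small JSON document whose ID list holds the matching Taxonomy ID(s), which you can compare with what the web search shows.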
What do you see?
- At the top of the displayed list, you can see the flat taxonomic hierarchy for the data above the taxon level shown. By clicking on names, you can move up or down to explore the Hierarchy. Hint: If you “mouse over” the name it will show you the name of the level.
- In the structured hierarchical view below, you can use the control panel at the top to:
- Control the number of levels shown.
- Display the number of relevant datasets from various NCBI databases or projects
Pick a few databases or resources and select them and then click the "Display" button to show how many records there are for each organism shown within the structured view!
For example, at the top of the page, check the blue Nucleotide, red Protein, pink Genome, purple Gene and/or green Structure, then click the "Display" button.
For your particular pathogen of interest, how many records are available in each database?
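The record counts shown by the "Display" button can also be approximated programmatically. The sketch below builds one ESearch request per database asking only for a count, and shows how the count could be read from a JSON response. Several things here are assumptions rather than part of the exercise: the helper names, the mapping from the page's checkbox labels to Entrez database names, and the Taxonomy ID 5476 (used here for Candida albicans; please confirm the ID on your own organism's record page). The sample response body is made up purely to demonstrate the parsing step.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Checkbox labels from the hierarchy view mapped to Entrez database names
# (this mapping is an assumption made for the sketch).
DATABASES = {
    "Nucleotide": "nuccore",
    "Protein": "protein",
    "Genome": "genome",
    "Gene": "gene",
    "Structure": "structure",
}


def count_url(db: str, taxid: int) -> str:
    """Build an ESearch URL asking how many `db` records fall under `taxid`."""
    params = {
        "db": db,
        "term": f"txid{taxid}[Organism:exp]",  # taxon plus its descendants
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(response_text: str) -> int:
    """Extract the record count from an ESearch JSON response body."""
    return int(json.loads(response_text)["esearchresult"]["count"])


if __name__ == "__main__":
    for label, db in DATABASES.items():
        print(label, count_url(db, 5476))
    # Made-up response body, used only to show the parsing step.
    print(parse_count('{"esearchresult": {"count": "42"}}'))
```

Counts obtained this way may not match the web page exactly at any given moment, since the databases are updated continually.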
Learn more about the pathogen, including types and amounts of related data at NCBI
- Select a particular species, sub-species, strain, or metagenome type that you are particularly interested in. (You may need to scroll or use the web-browser’s “Find in page” function (Ctrl+F) to search for it in the list.)
- Click on the name to see the Taxonomy summary page.
If you need it, you can click here to get to a link for the pathogen Taxonomy record page.
- Viral Taxonomy record page
- Bacterial Taxonomy record page
- [Candida] auris fungal Taxonomy record page vs. Candida albicans fungal Taxonomy record page
- For a metagenome.....pick one of your choice!
- Explore the Taxonomy summary page to see all of the information available for the pathogen.
Every organism has its own collection of information, some with more, some with less.
NCBI’s Taxonomy group works with outside experts and our Reference Sequence “Genome Champions” to curate these records. Notes with information or references about the official designations are often added in addition to common names, references and helpful links to external resources. We're always looking to add information that researchers would find helpful. Is there any additional information that you would like to see? Let us know!
- Look on the right side of the page for the data table that lists the number of records and provides links to the databases in which we have records for this particular organism or metagenome.
Subtree and Direct links are provided in many cases – what do you think those represent?
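Once you have made your own guess, a toy model may help you check it. The sketch below assumes that a "Direct" count covers records attached to that exact taxonomic node, while a "Subtree" count also includes records attached to every node beneath it. The taxa and record counts are hypothetical and made up for illustration; they do not come from any NCBI database.

```python
class TaxonNode:
    """One node in a toy taxonomy tree (hypothetical, for illustration)."""

    def __init__(self, name, direct, children=None):
        self.name = name
        self.direct = direct            # records attached to this exact node
        self.children = children or []  # nodes one level below this one


def subtree_count(node):
    """Records attached to the node plus all of its descendants."""
    return node.direct + sum(subtree_count(child) for child in node.children)


# Made-up taxa and record counts.
strain = TaxonNode("strain A1", direct=5)
species_a = TaxonNode("species A", direct=10, children=[strain])
species_b = TaxonNode("species B", direct=3)
genus = TaxonNode("genus X", direct=2, children=[species_a, species_b])

if __name__ == "__main__":
    for node in (genus, species_a, species_b, strain):
        print(f"{node.name}: direct={node.direct}, subtree={subtree_count(node)}")
```

For a node with nothing beneath it, the two numbers are the same; the further up the hierarchy you go, the more the subtree count can exceed the direct count.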
Find an interesting experimental study for the pathogen
First, what type of experimental study data would you like to search for?
NOTE: You can search each of these databases directly, but the easiest way to find records that are specific to your organism is from this Taxonomy page!
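If you do want to search the study databases directly, one way to keep the results specific to your organism is to filter by its Taxonomy ID. The sketch below builds browser search URLs for three study databases using the Entrez `[Organism]` field, where `exp` includes descendant taxa and `noexp` restricts the search to the node itself. The helper name, the choice of databases, and the Taxonomy ID 5476 (used here for Candida albicans) are assumptions for illustration; field support can differ between databases, so treat the output as a starting point.

```python
from urllib.parse import quote

NCBI = "https://www.ncbi.nlm.nih.gov"

# Study databases from the overview above mapped to their web search paths
# (this selection is an assumption made for the sketch).
STUDY_DATABASES = {
    "BioProject": "bioproject",
    "SRA": "sra",
    "GEO DataSets": "gds",
}


def study_search_url(db: str, taxid: int, include_subtree: bool = True) -> str:
    """Build a browser search URL for `db` filtered to one taxon."""
    scope = "exp" if include_subtree else "noexp"
    term = f"txid{taxid}[Organism:{scope}]"
    return f"{NCBI}/{db}/?term={quote(term)}"


if __name__ == "__main__":
    for label, db in STUDY_DATABASES.items():
        print(label, study_search_url(db, 5476))
```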
Take-away Message
- If you are looking for information based on a specific organism or taxon - start by searching the NCBI Taxonomy database! While the database does not have every organism known in the world, it does contain referenced information and links to all of the data for that organism that we have within NCBI resources.
- The Taxonomy hierarchical view will show you links to records in NCBI databases for specific organisms as well as for neighboring taxa, which may help you find helpful data for your own experiments.
- NCBI provides experimental datasets that are easily findable through Taxonomy links, so you can quickly find studies to download for further advanced exploration.
For more advanced work....
- Getting a bit of experience starting on the Amazon Web Services (AWS) Cloud with viral sequence data, an NCBI Workshop "An Introduction to NCBI Cloud Computing for Virologists" (February 8, 2022)
- An NCBI Workshop showing how to mine pathogenic microbe data in the Pathogen Detection Project resources: "Identifying Clinically Relevant Genes in Bacterial Genomes" (May 19, 2022)
- For those of you interested in identifying members of communities with metagenomes: an NCBI Workshop "An NCBI Guide to Finding & Analyzing Metagenomic Data" (October 28, 2021)
- We are working on creating more advanced workshops for accessing and doing preliminary work with NCBI Experimental study data. Please stay tuned to the NCBI Outreach Events page!
Last Reviewed: August 5, 2022