Extended Examples and Advanced Material


Accessing Data from the Gene Expression Omnibus

 

The NCBI Gene Expression Omnibus (GEO) DataSets database stores original submitter-supplied gene expression data, a subset of which is assembled into curated DataSets. Curated DataSets come with advanced data display and analysis features that can help you identify differentially expressed genes and generate figures. In this example, we will use a curated DataSet to identify genes whose expression differs between two species of yeast.



1. Navigate to the landing page for GEO DataSets, and search for studies containing both yeast species: 

saccharomyces_cerevisiae[orgn] AND saccharomyces_pastorianus[orgn] 

If needed, here is a direct link to these results. 

GEO Datasets search for data with both cerevisiae and pastorianus species
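
The same search can also be expressed as a request to the NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL for the GEO DataSets database (`db=gds`); the helper name `build_geo_search_url` and the `retmax` default are illustrative assumptions, not part of this tutorial.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries GEO DataSets (db=gds).

    Hypothetical helper for illustration; retmax=20 is an arbitrary choice.
    """
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# The query string used in step 1 of this example.
query = "saccharomyces_cerevisiae[orgn] AND saccharomyces_pastorianus[orgn]"
url = build_geo_search_url(query)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a JSON document listing the matching record IDs.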

 

2. There are a lot of intriguing results, but let's focus on the following result. If you are interested, don't forget to use the `Send To` menu to store this result in your Yeast Research collection! 


Selected search result item for GEO DataSets search

 

3. Clicking on the title takes you to a GEO Accession Page for the DataSet, which has lots of useful information about it. For this example, scroll down to the Analyze with GEO2R button and click it.

Screenshot of a GEO accession page

 

4. This will take you to a page containing a table of all the individual samples that were part of the original study. To conduct a differential expression analysis, you need to designate at least two groups of samples to compare. For this example, let's compare S. cerevisiae and S. pastorianus at 8 hours.

  • Click Define Groups, and enter the names of the two groups, such as: 

GEO2R Group Names interface
  • Select the three "8 hours after pitching" samples for S. cerevisiae, and then click the appropriate group name to assign them: 
  • Select the three "8 hours after pitching" samples for S. pastorianus, and then click the appropriate group name to assign them. Your interface should end up looking something like this, with three samples selected in each group: 
Geo2R sample selection interface

  • Click the Analyze button!

5. First, take a look at the row of available plots that are automatically generated by GEO2R, many of which are standard in differential expression analyses. 

plots on GEO2R results page


For example, the UMAP plot shows very obvious separation between the two species even at the same 8-hour time point: 

UMAP plot from GEO2R results page


6. Second, further down the page, GEO2R provides an ordered list of the actual differentially expressed genes. 

Geo2R differential expression table

Clicking the row of the first gene gives you access to information about the expression of that particular gene: 

Geo2R sample value for individual gene
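
To build some intuition for what a ranked differential expression table represents, here is a minimal sketch that compares two groups of three samples per gene and orders the genes by the strength of the difference. GEO2R runs its own R-based statistics; this sketch substitutes a plain Welch's t-statistic and is not GEO2R's method. The gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def log_fold_change(group_a, group_b):
    """Difference of group means (a log2 fold change if inputs are log2 values)."""
    return mean(group_a) - mean(group_b)


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances.

    Assumes at least one group has non-zero variance.
    """
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log2 expression values: (group 1 samples, group 2 samples).
expression = {
    "geneA": ([10.0, 10.2, 9.8], [7.0, 7.1, 6.9]),
    "geneB": ([8.0, 8.1, 7.9], [8.0, 7.9, 8.1]),
}

# Rank genes from most to least different between the two groups.
ranked = sorted(
    expression, key=lambda gene: abs(welch_t(*expression[gene])), reverse=True
)

for gene in ranked:
    group_a, group_b = expression[gene]
    print(gene, round(log_fold_change(group_a, group_b), 2))
```

With only three samples per group, a simple t-statistic is noisy, which is one reason dedicated differential expression tools use more elaborate models.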

7. Now that we have the name of the topmost differentially expressed gene (HPF1), we can search for it in the NCBI Gene database. On the Gene page for HPF1, the publication list can give you some idea of the role of this gene: 

publications on HPF1 gene page




And you can see from the abstract of one of these papers that it is related to haze formation, an important aspect of wine and beer! 
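
The gene lookup can be sketched programmatically as well. The example below builds an ESearch URL for the Gene database and extracts record IDs from a response. The helper names, the choice of S. cerevisiae as the organism, and the `[sym]` field tag are assumptions made for illustration, and the sample response uses a placeholder ID rather than a real record.

```python
import json
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(symbol: str, organism: str) -> str:
    """Return an ESearch URL for a gene symbol within one organism (db=gene).

    Hypothetical helper; the [sym] and [orgn] field tags are assumed here.
    """
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def extract_ids(esearch_json: str) -> list:
    """Pull the list of record IDs out of an ESearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


gene_url = build_gene_search_url("HPF1", "saccharomyces cerevisiae")
print(gene_url)

# Placeholder response showing the shape of the JSON; "0000000" is not a real ID.
sample_response = '{"esearchresult": {"count": "1", "idlist": ["0000000"]}}'
print(extract_ids(sample_response))
```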

 

Downloading whole genomes from Datasets

Last Reviewed: June 14, 2023