Extended Examples and Advanced Material
Accessing Data from the Gene Expression Omnibus
The NCBI Gene Expression Omnibus (GEO) DataSets database stores original submitter-supplied gene expression data, a subset of which is assembled into curated DataSets. Curated DataSets come with advanced data display and analysis features that can help you identify differentially expressed genes and generate figures. In this example, we will use a curated DataSet to identify genes whose expression differs between two species of yeast.
1. Navigate to the landing page for GEO DataSets, and search for studies containing both yeast species:
saccharomyces_cerevisiae[orgn] AND saccharomyces_pastorianus[orgn]
If needed, here is a direct link to these results.
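The same search can also be expressed as a request to the NCBI E-utilities ESearch endpoint, which may be handy if you want to repeat it later. The sketch below only builds the request URL for the GEO DataSets database (`db=gds`) and does not contact NCBI. The function name `build_geo_search_url` and the `retmax` default are illustrative choices, not part of this tutorial.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(organisms, retmax=20):
    """Build an ESearch URL that requires every organism in `organisms`.

    `build_geo_search_url` and `retmax=20` are hypothetical names/values
    chosen for this sketch.
    """
    # Tag each name with [orgn] and join with AND, as in the web search box.
    term = " AND ".join(f"{name}[orgn]" for name in organisms)
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_geo_search_url(
    ["saccharomyces_cerevisiae", "saccharomyces_pastorianus"]
)
print(url)
```

Opening the printed URL in a browser should return the matching record identifiers as JSON, assuming the service is reachable.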
2. There are a lot of intriguing results, but let's focus on the following result. If you are interested, don't forget to use the `Send To` menu to store this result in your Yeast Research collection!
3. Clicking on the title will take you to a GEO Accession Page for the DataSet, which has lots of useful information about the DataSet. Scroll down to the Analyze with GEO2R button and click it!
4. This will take you to a page containing a table with all of the individual samples that were part of the original study. To conduct a differential expression analysis, you will need to designate at least two different groups of samples to compare. For this example, let's compare S. cerevisiae and S. pastorianus at 8 hours.
- Click Define Groups, and enter the names of the two groups, such as:

- Select the three 8 hours after pitching samples for S. cerevisiae, and then click the appropriate group name to assign:
- Select the three 8 hours after pitching samples for S. pastorianus, and then click the appropriate group name to assign. Your interface should end up looking something like this, with 3 samples selected in each group:

- Click the Analyze button!
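To make the idea of comparing two groups of three samples concrete, here is a minimal sketch that ranks genes by a Welch t statistic and reports the difference in mean log2 expression. This is a simplified stand-in and not the analysis GEO2R itself runs. The gene names, sample names, and expression values are all made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for two samples (assumes nonzero variance)."""
    se = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / se


def compare_groups(expression, samples_a, samples_b):
    """Return (gene, mean difference, t) rows, strongest |t| first.

    `expression` maps gene -> {sample name: log2 expression value}.
    """
    rows = []
    for gene, values in expression.items():
        a = [values[s] for s in samples_a]
        b = [values[s] for s in samples_b]
        rows.append((gene, mean(a) - mean(b), welch_t(a, b)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical log2 values: three samples per species at one time point.
toy_expression = {
    "gene_A": {"cer_1": 8.0, "cer_2": 8.2, "cer_3": 7.8,
               "pas_1": 5.0, "pas_2": 5.1, "pas_3": 4.9},
    "gene_B": {"cer_1": 6.0, "cer_2": 6.5, "cer_3": 5.5,
               "pas_1": 6.1, "pas_2": 5.9, "pas_3": 6.3},
}
cerevisiae = ["cer_1", "cer_2", "cer_3"]
pastorianus = ["pas_1", "pas_2", "pas_3"]

ranked = compare_groups(toy_expression, cerevisiae, pastorianus)
for gene, diff, t in ranked:
    print(f"{gene}: mean log2 difference = {diff:.2f}, t = {t:.2f}")
```

With only three samples per group, a plain t statistic is noisy, which is one reason dedicated differential expression tools use more careful variance estimates.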
5. First, take a look at the row of available plots that are automatically generated by GEO2R, many of which are standard in differential expression analyses.
For example, the UMAP plot shows very obvious separation between the two species even at the same 8-hour time point:
6. Second, further down the page, GEO2R provides an ordered list of the actual differentially expressed genes.
Clicking the row of the first gene gives you access to information about the expression of that particular gene:
7. Now that we have the name of the topmost differentially expressed gene (HPF1), we could search for it in the NCBI Gene Database. On the Gene Page for HPF1, taking a look at the publication list can give you some idea of the role of this gene:
And you can actually see from the abstract of one of these papers that it is related to haze formation - an important aspect of wine and beer!
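The Gene lookup in step 7 can likewise be sketched as an E-utilities ESearch request against the Gene database (`db=gene`). The helper names below are hypothetical, and restricting the search to Saccharomyces cerevisiae is an assumption made for this example. Nothing is sent to NCBI unless you call `search_gene_ids` yourself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(symbol, organism):
    """Combine a gene symbol and an organism into one Gene search term."""
    return f"{symbol}[Gene Name] AND {organism}[Organism]"


def gene_search_url(symbol, organism, retmax=5):
    """Build the ESearch URL for the Gene database (hypothetical helper)."""
    params = {
        "db": "gene",
        "term": gene_query(symbol, organism),
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH}?{urlencode(params)}"


def search_gene_ids(symbol, organism):
    """Run the search and return the list of matching Gene IDs.

    Requires network access; not called automatically in this sketch.
    """
    with urlopen(gene_search_url(symbol, organism), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Organism is an assumption for illustration.
print(gene_query("HPF1", "Saccharomyces cerevisiae"))
```

Calling `search_gene_ids("HPF1", "Saccharomyces cerevisiae")` should return Gene IDs that can be opened on the Gene site, assuming the service is reachable.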
Downloading whole genomes from Datasets
Last Reviewed: June 14, 2023