An Introduction to Molecular Evolutionary Analysis with NCBI Datasets and Python
Workshop Duration: 3 hours
Content Difficulty: Intermediate
Target Audience:
This workshop is designed for biologists interested in evolutionary biology and nucleotide sequence analysis who have some scripting experience.
Workshop Description:
As they diverge from a common ancestor, species accumulate differences in their DNA sequences. Differences within a protein-coding region fall into two types. Non-synonymous substitutions change the amino acid sequence of the protein, while synonymous substitutions do not. Synonymous substitutions are largely invisible to natural selection and tend to accumulate at a roughly constant rate. Non-synonymous substitutions, on the other hand, accumulate faster when their effects are beneficial and are suppressed when their effects are deleterious. By comparing the rates of non-synonymous and synonymous substitutions, we can infer whether natural selection has primarily acted to conserve the protein sequence or to adapt it to a new environment or function.
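To make the distinction concrete, here is a minimal sketch in plain Python that counts codons differing between two aligned coding sequences and labels each difference as synonymous or non-synonymous. It assumes the standard genetic code and two in-frame sequences of equal length. The function name and the example sequences are hypothetical and are not taken from the workshop materials. It counts differing codons only; a real rate estimate would also normalize by the number of synonymous and non-synonymous sites and correct for multiple substitutions at the same site.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a, seq_b):
    """Count differing codons as (synonymous, non_synonymous).

    Hypothetical helper for illustration: both inputs must be aligned,
    in-frame coding sequences of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")

    synonymous = 0
    non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Skip codons containing gaps or ambiguous bases.
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Made-up example: GCT -> GCC keeps alanine (synonymous),
# AAA -> AGA changes lysine to arginine (non-synonymous).
print(classify_codon_differences("ATGGCTAAAGGT", "ATGGCCAGAGGT"))  # (1, 1)
```

Building the codon table from the compact 64-letter string avoids typing out 64 dictionary entries by hand and keeps the mapping easy to check against the published table.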
In this workshop we will compare the protein-coding sequences of two species to estimate which proteins show signs of adaptation. Working in a Jupyter notebook with bash and Python, we will use the NCBI Datasets command line interface (CLI) to download sequence data, then perform analysis with a few popular Python packages. The workshop assumes basic familiarity with a scripting language such as Python or R at a level equivalent to a semester course or programming bootcamp.
In this workshop, you will learn how to:
- Search for and download protein ortholog sequences with the NCBI Datasets CLI
- Parse the downloaded files with Biopython
- Identify synonymous and non-synonymous substitutions and calculate substitution rates
- Plot the results with Matplotlib
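As a rough illustration of the rate comparison named in the list above, the sketch below interprets the ratio of a non-synonymous rate (dN) to a synonymous rate (dS) for a single gene. It assumes per-gene dN and dS estimates are already available. The function name, the labels, and the `tolerance` band around 1 are hypothetical choices made for this example; a real analysis would rely on a statistical test, not a fixed cutoff.

```python
def selection_signal(dn, ds, tolerance=0.1):
    """Label a gene by its dN/dS ratio (hypothetical helper for illustration).

    dn, ds: non-synonymous and synonymous substitution rates for one gene.
    tolerance: arbitrary band around 1.0 treated as "no clear signal".
    """
    if ds <= 0:
        # The ratio is undefined when no synonymous change is observed.
        return "undefined"
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive"   # amino acid changes favored: possible adaptation
    if omega < 1 - tolerance:
        return "purifying"  # amino acid changes suppressed: sequence conserved
    return "neutral"


# Made-up rates, for illustration only.
for dn, ds in [(0.02, 0.10), (0.30, 0.10), (0.10, 0.10)]:
    print(dn, ds, selection_signal(dn, ds))
```

A ratio well below 1 suggests selection has acted to conserve the protein sequence, while a ratio above 1 suggests adaptation.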
Data Access Technology: Jupyter Notebook, Python, Command-line
NCBI Resources: NCBI Datasets
Last Reviewed: August 5, 2022