National Library of Medicine Technical Bulletin

Table of Contents: 2022 MAY–JUNE No. 446


NCBI Virtual Workshop Series: An Introduction to Molecular Evolutionary Analysis with NCBI Datasets and Python

NCBI Virtual Workshop Series: An Introduction to Molecular Evolutionary Analysis with NCBI Datasets and Python. NLM Tech Bull. 2022 May-Jun;(446):b10.

2022 June 10 [posted]

The National Center for Biotechnology Information (NCBI) is pleased to announce an upcoming virtual workshop that is now accepting applications. The NCBI Education Team invites you to apply and share this opportunity with any colleagues who may be interested. Our NCBI Virtual Workshops series was developed last fall to expand our reach to individuals who use NCBI resources for biological/biomedical research, science education, and clinical application efforts. Please review the workshop description to understand the intended audience. Each workshop will accept a limited number of participants to facilitate the best possible educational experience.

Title: An Introduction to Molecular Evolutionary Analysis with NCBI Datasets and Python

Date and time: Thursday, July 21, 2022, 1:00 p.m. – 4:00 p.m. ET

Description: As they diverge from a common ancestor, species accumulate differences in their DNA sequences. Differences within a protein-coding region are classified into two types. Non-synonymous substitutions change the amino acid sequence of the protein, while synonymous substitutions do not. Synonymous substitutions are largely invisible to natural selection and tend to accumulate at a constant rate. Non-synonymous substitutions, on the other hand, accumulate at a faster rate when their effects are beneficial and are suppressed when their effects are deleterious. By comparing the rates of non-synonymous and synonymous substitutions, we can infer whether natural selection has primarily acted to conserve the protein sequence or to adapt it to a new environment or function.
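As a rough illustration of the comparison described above, the sketch below turns a pair of substitution rates into a verbal interpretation. The function name `interpret_rate_ratio`, the labels it returns, and the `tolerance` band around 1 are assumptions made for this example and are not taken from the workshop materials.

```python
def interpret_rate_ratio(dn, ds, tolerance=0.1):
    """Interpret the ratio of non-synonymous (dn) to synonymous (ds) rates.

    The tolerance band around 1.0 is an arbitrary choice for illustration,
    not a statistical test.
    """
    if ds <= 0:
        raise ValueError("ds must be positive to form a ratio")
    ratio = dn / ds
    if ratio < 1 - tolerance:
        # Fewer non-synonymous changes than expected: sequence is conserved.
        return "conserved (purifying selection)"
    if ratio > 1 + tolerance:
        # More non-synonymous changes than expected: possible adaptation.
        return "adaptive (positive selection)"
    return "approximately neutral"


if __name__ == "__main__":
    print(interpret_rate_ratio(0.01, 0.1))
    print(interpret_rate_ratio(0.3, 0.1))
```

In practice a conclusion about selection would rest on a statistical test rather than a fixed cutoff; the threshold here only makes the three cases concrete.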

In this workshop you will learn to compare the protein-coding sequences of two species to estimate which proteins show signs of adaptation. Working in a Jupyter notebook with bash and Python, you will use the NCBI Datasets command line interface (CLI) to download sequence data, then perform analysis with a few popular Python packages. The workshop assumes basic familiarity with a scripting language such as Python or R at a level equivalent to a semester course or programming bootcamp.

In this workshop, you will learn how to:

  • Search for and download protein ortholog sequences with NCBI Datasets CLI
  • Parse the downloaded files with BioPython
  • Identify synonymous and non-synonymous substitutions and calculate substitution rates
  • Plot the results with Matplotlib
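
The third step in the list above can be sketched in plain Python as follows. This is a simplified illustration, not the workshop's own code: it assumes two coding sequences that are already aligned in frame, uses the standard genetic code, and only counts differing codons. The function name `count_codon_differences` is hypothetical. Established methods such as Nei-Gojobori go further and normalize these counts by the number of synonymous and non-synonymous sites to obtain rates.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Assumes seq_a and seq_b are aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base; ignored in this sketch
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


if __name__ == "__main__":
    # GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
    print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

Counting whole codons treats a codon with two or three changed bases as a single difference, which is one reason dedicated methods are preferred for real analyses.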

Apply here:

Application close date: July 4, 2022

Due to curricular and technical limits, we've capped the number of spots to provide the best workshop experience. If you apply, you will be notified of your application status 2 weeks before the scheduled event.

This workshop requires a stable internet connection and the use of a modern web browser. We will be presenting using Zoom. For some tips about this platform, please refer to Zoom Support Documents.

Our workshops are intended to provide hands-on experience, so we will encourage you to follow along and perform practice exercises during the event. While many users are able to move back and forth between Zoom and practice exercises on a single screen, it is helpful to have two screens available for the event. If you only have access to one computer screen, viewing the Zoom session on a tablet or phone and using your computer for practice exercises can be helpful.

Please see our FAQs page for more information. If you still have questions about the NCBI Outreach Events program or this specific workshop, email us at
