How BLAST Works
Introduction
We will take a high-level view of the steps BLAST performs to generate an alignment, with an emphasis on the "words" used to seed BLAST alignments, and briefly discuss Expect values. For more detail, see this explanation of the BLAST process.
Global versus local alignments

BLAST overview
Setup
- read in the query, database, and search parameters
- apply query filters, e.g., low complexity and repeats
- make a lookup table of query “words”
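The lookup-table step above can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's actual implementation: the function name `build_word_table` and the example sequence are hypothetical, and only exact words are indexed (protein searches additionally consider similar "neighborhood" words, which this sketch omits).

```python
from collections import defaultdict


def build_word_table(query, word_size):
    """Map every word of length word_size in the query to its start offsets."""
    table = defaultdict(list)
    for start in range(len(query) - word_size + 1):
        word = query[start:start + word_size]
        table[word].append(start)
    return dict(table)


# Hypothetical 10-base query, indexed with 4-letter words.
table = build_word_table("ACGTACGTGA", word_size=4)
print(table["ACGT"])  # offsets at which the word ACGT starts in the query
```

Storing offsets rather than a simple yes/no flag lets the later scanning step know where in the query each database word match should be anchored.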
Preliminary search
- scan the database for word matches
- gap-free extensions
- gapped extensions, without yet calculating the deletions/insertions
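The first two steps of the preliminary search (word matching and gap-free extension) can be sketched as follows. This is a simplified illustration under stated assumptions, not BLAST's code: the function names, the match/mismatch scores (+1/-3), and the drop-off threshold `x_drop` are all chosen here for the example, and the gapped extension step is not shown.

```python
def find_seeds(query, subject, word_size):
    """Return (query_offset, subject_offset) pairs where a word matches exactly."""
    table = {}
    for i in range(len(query) - word_size + 1):
        table.setdefault(query[i:i + word_size], []).append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, qpos, spos, word_size,
                    match=1, mismatch=-3, x_drop=5):
    """Extend a seed in both directions without gaps.

    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen. Returns (score, query_start, query_end),
    with query_end exclusive.
    """
    best = score = word_size * match
    best_right = 0
    k = 0
    # Extend to the right of the seed.
    while qpos + word_size + k < len(query) and spos + word_size + k < len(subject):
        same = query[qpos + word_size + k] == subject[spos + word_size + k]
        score += match if same else mismatch
        if score > best:
            best, best_right = score, k + 1
        elif best - score > x_drop:
            break
        k += 1
    # Extend to the left, starting from the best score found so far.
    score = best
    best_left = 0
    k = 1
    while qpos - k >= 0 and spos - k >= 0:
        same = query[qpos - k] == subject[spos - k]
        score += match if same else mismatch
        if score > best:
            best, best_left = score, k
        elif best - score > x_drop:
            break
        k += 1
    return best, qpos - best_left, qpos + word_size + best_right


# Hypothetical sequences sharing the 8-base stretch ACGTACGT.
query = "AAACGTACGTTT"
subject = "CCACGTACGTGG"
seeds = find_seeds(query, subject, word_size=4)
score, q_start, q_end = extend_ungapped(query, subject, 2, 2, word_size=4)
print(score, query[q_start:q_end])
```

In this example the seed at query offset 2 grows from the initial 4-letter word to the full shared stretch, and the flanking mismatches are left out because they never raise the score.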
Traceback
- gapped extensions, calculating the deletions/insertions
Nucleotides: Word size, and Summary


Proteins: Word size, and Summary




Expect values
E = number of database hits with a score ≥ S that you expect to find by chance
Read about: The Statistics of Sequence Similarity Scores


BLAST Expect Value (In a Nutshell)
- E = number of database hits you expect to find by chance
- As the database size increases .... E increases
- As the score increases .... E decreases
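The two trends above follow from the Karlin-Altschul relation E = K·m·n·e^(−λS), where m is the query length, n the database length, and S the raw alignment score. The sketch below evaluates it in Python; the function name is hypothetical, and the default values of `k` and `lam` are placeholders chosen for illustration (the real values depend on the scoring system and are not given in this tutorial).

```python
import math


def expect_value(score, query_len, db_len, k=0.134, lam=0.318):
    """E = K * m * n * exp(-lambda * S); k and lam are illustrative placeholders."""
    return k * query_len * db_len * math.exp(-lam * score)


# Same score and query, database ten times larger: E grows tenfold.
e_small_db = expect_value(score=60, query_len=300, db_len=1_000_000)
e_large_db = expect_value(score=60, query_len=300, db_len=10_000_000)

# Same query and database, higher score: E shrinks exponentially.
e_high_score = expect_value(score=80, query_len=300, db_len=1_000_000)

print(e_small_db < e_large_db, e_high_score < e_small_db)
```

Because E scales linearly with database size but exponentially with score, a modest increase in score outweighs even a large increase in database size.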
Limits, Errors and Warnings
Web BLAST Search Limits
- 5,000 - maximum number of target sequences
- 1,000,000 - maximum sequence length for nucleotide queries
- 100,000 - maximum sequence length for protein queries
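Before submitting a search to Web BLAST, a script can check a query against the limits listed above. The sketch below is one hypothetical way to do so; the constant and function names are invented for this example and are not part of any NCBI tool, and the numbers are copied from the list above.

```python
# Limits copied from the Web BLAST Search Limits list.
WEB_BLAST_MAX_TARGET_SEQUENCES = 5_000
WEB_BLAST_MAX_QUERY_LENGTH = {
    "nucleotide": 1_000_000,
    "protein": 100_000,
}


def within_web_limits(query_length, molecule, target_sequences):
    """Return True if the search settings fit within the listed web limits."""
    if molecule not in WEB_BLAST_MAX_QUERY_LENGTH:
        raise ValueError(f"unknown molecule type: {molecule!r}")
    return (query_length <= WEB_BLAST_MAX_QUERY_LENGTH[molecule]
            and target_sequences <= WEB_BLAST_MAX_TARGET_SEQUENCES)


print(within_web_limits(250_000, "nucleotide", 500))
print(within_web_limits(250_000, "protein", 500))
```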
BLAST News Feed | NCBI Insights Blog about BLAST settings
Error Messages


Warning Message

Last Reviewed: September 30, 2022