Basic Local Alignment Search Tool (BLAST)



Short description:

BLAST (Basic Local Alignment Search Tool) is a method for detecting sequence similarity (note that similarity is not the same as homology). The program takes a query sequence and searches it against the database selected by the user, comparing the query with every subject sequence in that database. The results are reported as a ranked list of hits, followed by the individual sequence alignments together with various statistics and scores. Every hit in the list is assigned a similarity score S. To judge how likely that score is to have arisen by chance, an E-value is calculated for every hit. The E-value for a score S is the number of hits with score S or higher that are expected by chance in a search of that database.
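The relationship between the score S and the E-value can be sketched with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length and n is the total database length. The text above does not give this formula or any parameter values, so the function name and the values of lambda and K below are assumptions chosen only for illustration. Real BLAST output also uses effective lengths and values of lambda and K that depend on the scoring system.

```python
import math


def expected_hits(score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation for a raw score S: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative parameter values only (assumed, not taken from the text).
LAMBDA, K = 0.267, 0.041

for s in (40, 80, 120):
    e = expected_hits(s, query_len=300, db_len=1_000_000_000, lam=LAMBDA, k=K)
    print(f"S = {s:3d}  E = {e:.3g}")
```

The sketch shows two properties that matter when reading BLAST results: the E-value falls exponentially as the score rises, and it grows in proportion to the size of the database searched.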

How to use BLAST:

The Advanced BLAST page has many adjustable parameters, and the outcome of a BLAST search depends on the parameters you use.

A) Types of BLAST programs

There are five different BLAST programs, which are distinguished by the type of the query sequence (DNA or protein) and the type of the subject database:

BLASTP compares an amino acid query sequence against a protein sequence database;

BLASTN compares a nucleotide query sequence against a nucleotide sequence database;

BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database;

TBLASTN compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands);

TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
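The "six-frame conceptual translation" used by BLASTX, TBLASTN and TBLASTX can be sketched as follows: three reading frames on the given strand plus three on its reverse complement. This is a minimal illustration that assumes the standard genetic code; the function names and the frame labels (+1 to +3, -1 to -3) are chosen here for the example and are not part of BLAST itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Each of the six translated sequences is what gets compared against the protein database (BLASTX) or against a protein query (TBLASTN); stop codons appear as `*`.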

B) Subject Databases

There are many databases that can be used as subject databases. One of the most commonly used is the nr database, a collection of "non-redundant" sequences from GenBank and other sequence databanks.

C) Sequence input

BLAST accepts the query sequence in FASTA format (see the different formats we discussed last class) or as a sequence identifier (accession number or GI number).
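As a reminder of what FASTA input looks like, here is a minimal parsing sketch: a header line starting with ">" followed by one or more lines of sequence. The function name and the example record are hypothetical and exist only for illustration; the sequence is not a real database entry.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, for illustration only.
example = """>example_protein hypothetical sequence for illustration
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQ
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```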

D) Parameters to adjust

EXPECT value: The statistical significance threshold for reporting matches against database sequences. The default value is 10, meaning that up to 10 of the reported matches are expected to be found merely by chance. If the E-value of a match is greater than the EXPECT threshold, the match will not be reported. Increasing the EXPECT value makes the program report less significant matches; decreasing it makes the search more stringent.
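The effect of the EXPECT threshold can be sketched as a simple filter over a list of hits. The `Hit` class, the `report` function and the three hits below are hypothetical; they only illustrate the reporting rule described above and are not BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str
    score: float
    evalue: float


def report(hits, expect=10.0):
    """Keep hits whose E-value does not exceed the threshold, best first."""
    kept = [h for h in hits if h.evalue <= expect]
    return sorted(kept, key=lambda h: h.evalue)


# Made-up hits, for illustration only.
hits = [
    Hit("subject_A", 250.0, 1e-60),
    Hit("subject_B", 45.0, 0.8),
    Hit("subject_C", 30.0, 25.0),
]

for threshold in (1e-5, 10.0, 100.0):
    names = [h.subject_id for h in report(hits, expect=threshold)]
    print(f"EXPECT = {threshold:g}: {names}")
```

Raising the threshold lets weaker matches through, and lowering it keeps only the most significant ones.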

FILTER (Low-complexity): Masks segments of the query sequence that have low compositional complexity (i.e., regions of biased composition, such as short-period repeats).
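The idea of masking low-complexity regions can be sketched with a sliding window and Shannon entropy. This is a deliberately simplified stand-in and not the algorithm BLAST uses (BLAST relies on dedicated filters such as SEG for proteins and DUST for nucleotides). The function names, window size and entropy cutoff are assumptions picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=2.2, mask_char="X"):
    """Replace every window whose entropy is below the cutoff with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


# Hypothetical protein with a serine-rich stretch in the middle.
query = "ACDEFGHIKLMN" + "S" * 15 + "PQRTVWYACDEF"
print(query)
print(mask_low_complexity(query))
```

Masked residues are ignored when matches are seeded, which prevents biased regions from producing many high-scoring but biologically uninformative hits.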

Exercise:

  1. Run a BLAST search with GI 9229839 as the query sequence against the nr database with default parameters. What is your query sequence? Which organism is it from? How many BLAST hits did you get? Are all your hits orthologs? Do you find anything strange about the alignments?

  2. How many related sequences does this sequence have in Entrez? Why is there a difference between the number of related sequences and the number of BLAST hits?

  3. Run a two-sequence BLAST search with the following two sequences: GI 9229839 and GI 2493101. What are the results? Is there anything strange in the alignment of those two sequences?

References:

  • Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.
  • Baxevanis, A.D. & Ouellette, B.F.F. (eds.) (1998) "Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins." Wiley & Sons.