Short description: BLAST (Basic Local Alignment Search Tool) is a method for assessing sequence similarity (for definitions of similarity and homology click here). The program takes a query sequence and searches it against the database selected by the user, aligning the query against every subject sequence in the database. The results are reported as a ranked list of hits, followed by a series of individual sequence alignments plus various statistics and scores. Every hit in the list is assigned a similarity score S. To judge how likely that score is to arise by chance, an E-value is calculated for every hit: the E-value for a score S is the number of hits with score S or higher that are expected by chance in a search of the database.
For a detailed discussion of the statistics used in BLAST, check the following link.
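As a rough illustration of how a score S relates to an E-value, the sketch below uses the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n is the total database length. The function names are hypothetical, and the default values of lambda and K are assumptions: they are figures commonly quoted for ungapped BLOSUM62 scoring and will differ for other scoring systems.

```python
import math

# Assumed, illustrative parameters (commonly quoted for ungapped BLOSUM62).
# Real searches use values that depend on the scoring matrix and gap costs.
DEFAULT_LAMBDA = 0.3176
DEFAULT_K = 0.134


def expected_hits(score, query_len, db_len, lam=DEFAULT_LAMBDA, k=DEFAULT_K):
    """Expected number of chance hits with raw score >= score."""
    return k * query_len * db_len * math.exp(-lam * score)


def bit_score(score, lam=DEFAULT_LAMBDA, k=DEFAULT_K):
    """Normalized score S' so that E = m * n * 2 ** (-S')."""
    return (lam * score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # A higher raw score gives a smaller E-value for the same search space.
    for s in (40, 60, 80):
        print(s, expected_hits(s, query_len=250, db_len=50_000_000))
```

Because E scales linearly with the database length, the same raw score is less significant in a larger database.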
How to use BLAST:
The Advanced BLAST page has many parameters that you can adjust, and the outcome of a BLAST search depends on the parameters you use.
A) Types of BLAST programs
There are five different BLAST programs, which are distinguished by the type of the query sequence (DNA or protein) and the type of the subject database.
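The five classic programs are usually listed as blastn, blastp, blastx, tblastn and tblastx. The sketch below records, for each one, the query type, the database type, and the level at which sequences are compared, and offers a small lookup helper. The dictionary layout and the function name `candidate_programs` are hypothetical and only serve this illustration.

```python
# program: (query type, database type, level at which sequences are compared)
PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide", "nucleotide"),
    "blastp": ("protein", "protein", "protein"),
    # blastx translates the nucleotide query in all six reading frames
    "blastx": ("nucleotide", "protein", "protein"),
    # tblastn translates the nucleotide database in all six reading frames
    "tblastn": ("protein", "nucleotide", "protein"),
    # tblastx translates both the query and the database
    "tblastx": ("nucleotide", "nucleotide", "protein"),
}


def candidate_programs(query_type, db_type):
    """Return the program names that fit a query/database type pair."""
    return sorted(
        name
        for name, (q, d, _level) in PROGRAMS.items()
        if q == query_type and d == db_type
    )


if __name__ == "__main__":
    print(candidate_programs("nucleotide", "protein"))
    print(candidate_programs("nucleotide", "nucleotide"))
```

A nucleotide query against a nucleotide database fits two programs, because the comparison can be made either at the nucleotide level (blastn) or after translating both sides (tblastx).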
B) Subject Databases
Many databases can be used as subject databases. One of the most commonly used is the nr database, a collection of "non-redundant" sequences from GenBank and other sequence databanks. For other available subject databases, click here.
C) Sequence input
BLAST accepts the sequence in FASTA format (see the different formats we discussed last class) or as an accession number or GI number.
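For reference, a FASTA record is a header line starting with ">" followed by one or more lines of sequence. The minimal parser below is a sketch for checking input before pasting it into the query box. The function name `parse_fasta` and the example record are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">example_query hypothetical protein\nMKTAYIAKQR\nQISFVKSHFS\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq))
```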
D) Parameters to adjust
EXPECT value: The statistical significance threshold for reporting matches against database sequences. The default value is 10, meaning that 10 matches are expected to be found merely by chance. If the E-value ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Increasing the EXPECT value makes the program report less significant matches.
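The effect of the EXPECT threshold can be sketched as a simple filter over a list of hits. The function name `report_hits`, the tuple layout, and the example hits are hypothetical. The only rule taken from the text is that a match whose E-value exceeds the threshold is not reported.

```python
def report_hits(hits, expect=10.0):
    """Keep hits whose E-value does not exceed the EXPECT threshold.

    hits is a list of (subject_id, e_value) tuples; the result is sorted
    from most to least significant (smallest E-value first).
    """
    kept = [hit for hit in hits if hit[1] <= expect]
    return sorted(kept, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    hits = [("subj_A", 2e-30), ("subj_B", 0.5), ("subj_C", 8.0), ("subj_D", 45.0)]
    print(report_hits(hits))               # default threshold of 10
    print(report_hits(hits, expect=1e-5))  # stricter threshold
    print(report_hits(hits, expect=100))   # looser threshold
```

Raising the threshold only ever adds hits to the report, and the added hits are the least significant ones.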
FILTER (Low-complexity): Masks off segments of the query sequence that have low compositional complexity (i.e. regions of biased composition, such as short-period repeats).
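To give a feel for what masking does, the sketch below replaces residues with "X" wherever a sliding window has low Shannon entropy. This is a simplified stand-in, not the filter BLAST itself applies (BLAST relies on dedicated programs such as SEG for proteins and DUST for nucleotides). The window size, the entropy cutoff, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=2.2, mask_char="X"):
    """Mask every window whose entropy falls below min_entropy.

    window and min_entropy are illustrative values, not BLAST defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    query = "ACDEFGHIKLMN" + "S" * 12 + "ACDEFGHIKLMN"
    print(mask_low_complexity(query))
```

Masked positions are ignored when looking for matches, so a biased region such as the serine run above cannot produce spurious high-scoring hits on its own.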