Blast

Published February 2017

BLAST- Basic Local Alignment Search Tool
A similarity search program developed at NCBI
Available as a free service over the Internet
Provides very fast, accurate, and sensitive database searching
A heuristic algorithm that seeks local alignments, so it can detect
relationships among sequences that share only isolated regions of
similarity
Like FASTA, BLAST is a ‘word-based’ method

BLAST works through the following 3 steps.
1. Compiles a list of high-scoring words of length w (typically 3 for
amino acids and 11 for nucleotides) by taking each word from the
query sequence together with similar words that score above a
threshold.

2. Scans each database sequence for exact matches to the words in
the list. When a match is found, BLAST tries to extend the
alignment in both directions from the matching word, without
allowing gaps.

3. After all words are tested, a set of Maximal Segment Pairs
(MSPs) is chosen for that database sequence. Several short,
non-overlapping MSPs may be combined in a statistical test to
create a larger, more significant match.
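
The seeding and ungapped extension steps above can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: it seeds only on exact word matches (no neighborhood of similar words), and the function names, the +1/-1 scoring, and the `x_drop` cutoff are assumptions made for the example.

```python
def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    # Index every word of length w in the query by its start position.
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    # Scan the subject sequence and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query, subject, i, j, w, match=1, mismatch=-1, x_drop=3):
    """Extend a seed in both directions without gaps.

    Returns (best_score, query_start, query_end) for the best segment found.
    Extension in a direction stops once the running score has fallen
    x_drop below the best score seen so far (hypothetical cutoff).
    """
    score = best = w * match
    q_start, q_end = i, i + w

    # Extend to the right of the seed.
    qi, sj = i + w, j + w
    while qi < len(query) and sj < len(subject) and best - score < x_drop:
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, q_end = score, qi

    # Extend to the left, starting again from the best score so far.
    score = best
    qi, sj = i - 1, j - 1
    while qi >= 0 and sj >= 0 and best - score < x_drop:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, q_start = score, qi
        qi -= 1
        sj -= 1

    return best, q_start, q_end
```

Each returned segment is a candidate for an MSP; the statistical test that combines several MSPs is not shown here.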

Purpose of BLAST
As a growing number of genomes are sequenced, a researcher often
comes across a novel DNA or protein sequence for which no
functional information is available.
Some basic information on the sequence is necessary before a
molecular biologist can take the new sequence into the lab and
perform meaningful experiments with it.
Database searches reveal sequences that have some degree of
similarity to the query sequence. These database sequences are
commonly referred to as ‘hits’ and are used to infer homology and
molecular function.
Identity- when two sequences are compared with each other, identity
indicates the extent to which the two sequences have exactly the
same composition (i.e., nucleotide base or amino acid residue) at
equivalent positions, usually expressed as a percentage.
Similarity- when two genes or proteins are compared with each
other, similarity indicates the level of relatedness between the two on
the basis of their primary sequences. For DNA sequences, this is the
proportion of identical bases at equivalent positions, usually expressed
as a percentage. For proteins, similarity usually also counts
substitutions between amino acids with similar properties.
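
As a small worked example of the identity definition above, the following sketch computes percent identity for two sequences that have already been aligned to the same length. The function name and the choice to ignore columns containing a gap character `-` are assumptions; other conventions for the denominator exist.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared positions at which two aligned sequences are identical.

    Assumes both sequences come from the same alignment (equal length).
    Columns where either sequence has a gap ('-') are skipped, which is
    one of several possible conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)
```

For instance, `percent_identity("ACGT", "ACGA")` compares four positions, three of which match.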

Simple Classification of amino acids
Based on the nature of side-chains:
Aliphatic amino acids: G, A, V, L, I, P
Aromatic amino acids: F, Y, W
Polar amino acids: S, T, N, Q
Sulfur-containing amino acids: C, M
Charged amino acids: D, E, H, K, R

Based on hydrophobicity:
Amino acids with hydrophilic side-chains: N, G, Q, R, H, K
Amino acids with hydrophobic side-chains: V, I, L, M, P
Based on charge:
Positively charged: K, R
Negatively charged: D, E
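
The side-chain classes listed above can be stored as a simple lookup table. The sketch below is only an illustration: the names `SIDE_CHAIN_CLASSES`, `side_chain_class`, and `same_class` are hypothetical, and treating two residues in the same class as "similar" is a simplifying assumption, not how BLAST scores substitutions (BLAST uses a substitution matrix).

```python
# Side-chain classes exactly as listed in the classification above.
SIDE_CHAIN_CLASSES = {
    "aliphatic": set("GAVLIP"),
    "aromatic": set("FYW"),
    "polar": set("STNQ"),
    "sulfur-containing": set("CM"),
    "charged": set("DEHKR"),
}


def side_chain_class(residue):
    """Return the side-chain class of a one-letter amino acid code, or None."""
    residue = residue.upper()
    for name, members in SIDE_CHAIN_CLASSES.items():
        if residue in members:
            return name
    return None


def same_class(a, b):
    """True if both residues fall into the same (known) side-chain class."""
    cls = side_chain_class(a)
    return cls is not None and cls == side_chain_class(b)
```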

BLAST Services from NCBI
1. Nucleotide BLAST- allows one to input nucleotide sequences
and compare these against other nucleotides.
2. Standard nucleotide-nucleotide BLAST- takes nucleotide
sequences in FASTA format, GenBank accession numbers or GI
numbers and compares them against the NCBI nucleotide
databases.
3. MEGA BLAST- This program uses a ‘greedy algorithm’ for
nucleotide sequence alignment searches and concatenates many
queries to save time spent scanning the database. It is optimized
for aligning sequences that differ slightly and is up to 10 times
faster than more common sequence similarity programs. It can be
used to compare two large sets of sequences against each other
and gives the results very quickly.
4. Protein BLAST- allows one to input protein sequences and
compare these against other protein sequences.
5. Standard protein-protein BLAST- takes protein sequences in
FASTA format, GenBank accession numbers or GI numbers and
compares them against the NCBI protein database.
6. Pattern Hit Initiated BLAST (PHI-BLAST)- combines matching
of a regular expression pattern with a Position Specific Iterative
protein search. PHI-BLAST can locate other protein sequences
that both contain the regular expression pattern and are
homologous to a query protein sequence.

7. Translating BLAST- translates query sequences or databases
from nucleotides to proteins so that protein-nucleotide sequence
comparisons can be performed.
8. Translated query- Protein database (BLASTX)- converts a
nucleotide query sequence into protein sequences in all 6 reading
frames. The translated protein products are then compared
against the NCBI protein databases.
9. Protein query- Translated database (TBLASTN)- takes a
protein query sequence and compares it against an NCBI
nucleotide database that has been translated in all six reading
frames.
10. Translated query- Translated database (TBLASTX)- converts
a nucleotide query sequence into protein sequences in all 6
reading frames and then compares this to an NCBI nucleotide
database which has been translated in all 6 reading frames.
11. Position Specific Iterated BLAST (PSI-BLAST)- an
implementation of BLAST for finding protein families. Instead of
using a single amino acid at a given position in the query
sequence, it is better to use a combination of amino acids known
to be present at the same position in that protein and related
ones. The search of sequence databases will thereby be
expanded to include additional related sequences that might
otherwise be missed. The major difficulty with such an expanded
search is that an alignment of related sequences must already be
available in order to know the variations at each position in the
query sequence. PSI-BLAST has been designed to provide
information on this variation starting with a BLAST search by a
single query sequence.
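
The translating services in the list above (BLASTX, TBLASTN, TBLASTX) all depend on translating a nucleotide sequence in all six reading frames: three offsets on the given strand and three on its reverse complement. The sketch below shows that step only, assuming the standard genetic code; the function names and the frame labels `+1` to `-3` are conventions chosen for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels ('+1'..'+3', '-1'..'-3') to proteins."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Stop codons appear as `*` in the output; trailing bases that do not fill a codon are dropped.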

PSI-BLAST involves a series of repeated steps or iterations:

(i) A database search of a protein sequence database is performed
using a query sequence.
(ii) The results of the search are presented and can be assessed
visually to see if any database sequences that are significantly
related to the query sequence are present.
(iii) If this is the case, the user can decide to go through another
iteration of the search.
(iv) The high scoring sequence matches found in the first step are
aligned and from the alignment a sequence motif that indicates
the variations at each aligned position is produced. The
database is then searched with this motif. The search has thus
been expanded to include sequences that match the variations
found in the motif at each sequence position.
(v) The results are again displayed, indicating any newly discovered
sequences that are significantly related to the motif sequences in
addition to those found in the previous iteration.
(vi) Again, an opportunity is given to go through another iteration of
the program, but this time including any newly recruited
sequences to refine the motif. In this fashion, a new family of
sequences that are significantly similar to the original query
sequence can be found.
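
Step (iv) above, turning aligned hits into a position-specific motif and searching with it, can be illustrated with a toy profile. This is a rough sketch under stated assumptions, not PSI-BLAST's actual scoring: it uses a uniform background frequency, a fixed pseudocount, ungapped equal-length aligned segments, and hypothetical function names.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(aligned, pseudocount=1.0):
    """Build per-position log-odds scores from equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score_window(profile, segment):
    """Sum the profile scores for a segment as long as the profile."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, segment))


def best_hit(profile, sequence):
    """Return (score, start) of the best-scoring ungapped window in sequence."""
    n = len(profile)
    if len(sequence) < n:
        raise ValueError("sequence is shorter than the profile")
    return max(
        (score_window(profile, sequence[i:i + n]), i)
        for i in range(len(sequence) - n + 1)
    )
```

In each further iteration, newly recruited sequences would be added to `aligned` and the profile rebuilt before searching again.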

PSI-BLAST applications:
Distant homology detection
Fold assignment
Domain identification
Evolutionary analysis (e.g., tree building)
Sequence annotation/function assignment
Profile export to other programs
Sequence clustering
Structural genomics target selection
