bioinformatics

Published on May 2017

Multiple sequence alignment

Submitted to: Dr. Navneet Choudhary

Submitted by: Ramesh Bishnoi, Nikita Jain

What is Multiple Sequence Alignment
‡ A sequence alignment of three or more biological sequences, generally protein, DNA, or RNA.
‡ The input set of query sequences is assumed to have an evolutionary relationship: the sequences share a lineage and are descended from a common ancestor.
‡ Used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
‡ Most multiple sequence alignment programs use heuristic methods rather than global optimization, because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively computationally expensive.

An example of Multiple Alignment
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG-
ATLVCLISDFYPGA--VTVAWKADS-
AALGCLVKDYFPEP--VTVSWNSG--
VSLTCLVKGFYPSD--IAVEWWSNG--

Goals of Multiple Sequence Alignment
1. To generate a concise, information-rich summary of sequence data.
2. To illustrate the dissimilarity between a group of sequences.
3. Alignments can be treated as models that can be used to test hypotheses.
4. Use in phylogenetics: multiple sequence alignments can be used to create a phylogenetic tree.
5. To identify functionally important sites, such as binding sites, active sites, or sites corresponding to other key functions, by locating conserved domains.
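As a rough illustration of locating conserved positions, the sketch below scores each alignment column by the fraction of rows that share its most common residue. The function names, the threshold, and the toy alignment are assumptions made up for this example; real tools use substitution-matrix or entropy-based conservation measures.

```python
from collections import Counter

def column_conservation(alignment):
    """Return, per column, the fraction of rows sharing the most common residue.

    Gap characters ('-') are never counted as the conserved residue.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores

def conserved_columns(alignment, threshold=1.0):
    """0-based indices of columns whose conservation meets the threshold."""
    scores = column_conservation(alignment)
    return [i for i, s in enumerate(scores) if s >= threshold]

# Hypothetical toy alignment, invented for illustration only
toy = [
    "ACG-TTGA",
    "ACGATTCA",
    "ACG-TAGA",
]
print(conserved_columns(toy))  # prints [0, 1, 2, 4, 7]
```

Lowering `threshold` (for example to 0.6) also reports columns that are conserved in most, but not all, sequences.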

Why do we do multiple alignments?
1. Simple sequence comparison
2. Conserved vs. non-conserved regions
   1. proteins - motifs/profiles
   2. whole genome - genes, control regions
3. Homology (as opposed to similarity)
   1. Evolution - phylogeny
   2. Structural homology
4. Sequence differences
   1. Single Nucleotide Polymorphisms (SNPs)
5. Help predict the secondary and tertiary structures of new sequences
6. Preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees

Multiple Alignment Method
‡ The most practical and widely used method in multiple sequence alignment is the hierarchical extension of pairwise alignment methods.

‡ The principle is that a multiple alignment is achieved by successive application of pairwise methods.

‡ The steps are summarized as follows:
1. Compare all sequences pairwise.
2. Perform cluster analysis on the pairwise data to generate a hierarchy for alignment. This may be in the form of a binary tree or a simple ordering.
3. Build the multiple alignment by first aligning the most similar pair of sequences, then the next most similar pair, and so on. Once an alignment of two sequences has been made, it is fixed. Thus, for a set of sequences A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D is obtained by comparing the alignment of A and C with that of B and D, using averaged scores at each aligned position.
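The first two steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure of any particular program: the match/mismatch/gap values are arbitrary, plain average linkage on raw alignment scores stands in for the cluster analysis, and the names `nw_score` and `guide_order` are hypothetical. It only produces the order in which sequences or groups would be aligned; it does not build the alignment itself.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score, computed with two rolling rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def guide_order(seqs):
    """Return the merge order of a simple average-linkage clustering.

    seqs maps names to sequences. Each merge is a pair of clusters
    (tuples of names); the most similar pair is merged first.
    """
    names = list(seqs)
    # Step 1: compare all sequences pairwise
    score = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
             for p in combinations(names, 2)}

    def avg(c1, c2):
        # Average pairwise score between the members of two clusters
        total = sum(score[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    # Step 2: cluster to obtain a hierarchy (here, a list of merges)
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: avg(*p))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical sequences, chosen so that A pairs with C and B pairs with D
seqs = {"A": "GATTACA", "B": "CCGGAACC", "C": "GATTTCA", "D": "CCGGTACC"}
for left, right in guide_order(seqs):
    print(left, "+", right)
```

With these toy sequences the two close pairs are merged first, and the final merge joins the two pair alignments, mirroring the A, B, C, D example in step 3.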

Steps in Multiple Alignment

Multiple Sequence Alignment Tools
‡ BLOCKS: database of ungapped, highly conserved protein blocks
‡ CDD: Conserved Domain Database
‡ InterPro: a unified resource combining PROSITE, PRINTS, ProDom and Pfam
‡ iProClass database: from the Protein Information Resource
‡ Pfam: profile HMM library
‡ ClustalW: general purpose multiple sequence alignment program
‡ DIALIGN: local MSA
‡ MultAlin: multiple sequence alignment with hierarchical clustering
‡ MSA: Multiple Sequence Alignment
‡ PileUp: general multiple sequence alignment program
‡ SAGA and COFFEE: Cedric Notredame's work

ClustalW- for multiple alignment
‡ ClustalW is a general purpose multiple alignment program for DNA or proteins.
‡ ClustalW was produced by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory, Germany, and Desmond Higgins of the European Bioinformatics Institute, Cambridge, UK.
‡ ClustalW is cited as: "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice."
‡ ClustalW can create multiple alignments, manipulate existing alignments, do profile analysis and create phylogenetic trees.
‡ Alignment can be done by 2 methods:
  ± slow/accurate
  ± fast/approximate

Running ClustalW
[~]% clustalw

 **************************************************************
 ******** CLUSTAL W (1.7) Multiple Sequence Alignments ********
 **************************************************************

     1. Sequence Input From Disc
     2. Multiple Alignments
     3. Profile / Structure Alignments
     4. Phylogenetic trees

     S. Execute a system command
     H. HELP
     X. EXIT (leave program)

Your choice:

The input file for ClustalW is a file containing all sequences in one of the following formats:

‡ NBRF/PIR
‡ EMBL/SwissProt
‡ Pearson (FASTA)
‡ GDE
‡ Clustal
‡ GCG/MSF
‡ RSF
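Of these formats, Pearson (FASTA) is the simplest: each record is a header line starting with ">" followed by one or more lines of sequence. The sketch below reads such text into a dictionary. The function name `parse_fasta` and the sample records are hypothetical, and treating the first word of the header as the sequence name is an assumption of this example.

```python
def parse_fasta(text):
    """Parse Pearson/FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line has no sequence name")
            name = fields[0]  # assumption: first word of the header is the name
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    # Join the wrapped sequence lines of each record
    return {n: "".join(parts) for n, parts in records.items()}

# Hypothetical input, invented for illustration
sample = """>seq1 first example
ACGT
ACGT
>seq2 second example
ACGAACGT
"""
print(parse_fasta(sample))
```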

Output of ClustalW
CLUSTAL W (1.7) multiple sequence alignment

HSTNFR    GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
SYNTNFTRP GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG-----GCAG
CFTNFA    -------------------------------------------TGTCCAG------ACAG
CATTNFAA  GGGAAGAG---CTCCCACATGGCCTGCAACTAATCAACCCTCTGCCCCAG-----ACAC
RABTNFM   AGGAGGAAGAGTCCCCAAACAACCTCCATCTAGTCAACCCTGTGGCCCAGATGGTCACCC
RNTNFAA   AGGAGGAGAAGTTCCCAAATGGGCTCCCTCTCATCAGTTCCATGGCCCAGACCCTCACAC
OATNFA1   GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG-----ACAC
OATNFAR   GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG-----ACAC
BSPTNFA   GGGAAGAGCAGTCCCCAGGTGGCCCCTCCATCAACAGCCCTCTGGTTCAA-----ACAC
CEU14683  GGGAAGAGCAATCCCCAACTGGCCTCTCCATCAACAGCCCTCTGGTTCAG-----ACCC
** *

Blocks database and tools
‡ Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
‡ The Blocks web server tools are: Block Searcher, Get Blocks and Block Maker. These are aids to detection and verification of protein sequence homology.
‡ They compare a protein or DNA sequence to a database of protein blocks, retrieve blocks, and create new blocks, respectively.

The BLOCKS web server
At URL: http://blocks.fhcrc.org/
The BLOCKS WWW server can be used to create blocks from a group of sequences, or to compare a protein sequence to a database of blocks. The Blocks Searcher tool should be used for multiple alignment of distantly related protein sequences.

The Blocks Searcher tool
‡ For searching a database of blocks, the first position of the sequence is aligned with the first position of the first block, and a score for that amino acid is obtained from the profile column corresponding to that position. Scores are summed over the width of the alignment, and then the block is aligned with the next position.
‡ This procedure is carried out exhaustively for all positions of the sequence for all blocks in the database, and the best alignments between a sequence and entries in the BLOCKS database are noted. If a particular block scores highly, it is possible that the sequence is related to the group of sequences the block represents.

‡ Typically, a group of proteins has more than one region in common, and their relationship is represented as a series of blocks separated by unaligned regions. If a second block for a group also scores highly in the search, the evidence that the sequence is related to the group is strengthened, and it is further strengthened if a third block also scores highly, and so on.
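The sliding search described for the Blocks Searcher can be sketched as follows. This is an illustration under assumptions rather than the server's actual code: each block is represented as a list of per-column score dictionaries, residues missing from a column get a made-up default score of -1, and the block names and scores in the toy data are hypothetical.

```python
def scan_block(sequence, profile, default=-1):
    """Slide an ungapped block profile along a sequence.

    profile is a list of dicts, one per block column, mapping residue -> score.
    Returns (best_score, best_offset), where the offset is 0-based.
    """
    width = len(profile)
    if width == 0 or len(sequence) < width:
        raise ValueError("sequence must be at least as long as the block")
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        # Sum the column scores over the width of the block
        score = sum(col.get(res, default) for col, res in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def search_blocks(sequence, blocks):
    """Score a sequence against every block; return hits sorted best first."""
    hits = [(name, *scan_block(sequence, prof)) for name, prof in blocks.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical blocks with invented scores
blocks = {
    "blockA": [{"G": 5}, {"K": 4, "R": 2}, {"S": 3, "T": 3}],
    "blockB": [{"L": 4}, {"T": 4}, {"S": 2}],
}
print(scan_block("MAAGKSTL", blocks["blockA"]))  # prints (12, 3)
print(search_blocks("MAAGKSTL", blocks))
```

A sequence that scores highly against several blocks of the same group would, as noted above, give stronger evidence of a relationship than a single high-scoring block.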

The BLOCKS Database
The blocks for the BLOCKS database are made automatically by looking for the most highly conserved regions in groups of proteins represented in the PROSITE database. These blocks are then calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of matches. It is these calibrated blocks that make up the BLOCKS database.

The Block Maker Tool
‡ Block Maker finds conserved blocks in a group of two or more unaligned protein sequences, which are assumed to be related, using two different algorithms.
‡ Input file must contain at least 2 sequences.
‡ Input sequences must be in FASTA format.
‡ Results are returned by e-mail.

T-Coffee
‡ It allows the combination of a collection of multiple/pairwise, global or local alignments into a single model
‡ Pairwise global alignment
‡ Pairwise local alignment
‡ Combines the above two into a library
‡ Builds the MSA with the highest consistency with the library of alignments (progressive assembly)


DiAlign
‡ It constructs pairwise and multiple alignments by comparing whole segments of the sequences.
‡ Alignment of whole segments and not individual amino acids (bases)
‡ Pairwise comparison yields segment pairs (diagonals), which represent local alignments
‡ Diagonals weighted for likelihood
‡ Alignment built from consistent diagonals
‡ No gap penalties
‡ Independent of sequence order
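To make the idea of a diagonal concrete, the sketch below lists ungapped segment pairs between two sequences. It is a heavy simplification and not the DiAlign algorithm: it only finds runs of identical residues, ranks them by length instead of by a likelihood-based weight, and does not check consistency between diagonals. The function name, the `min_len` parameter, and the toy sequences are assumptions for this example.

```python
def diagonals(a, b, min_len=3):
    """Find maximal runs of identical residues along each diagonal of a vs. b.

    Each result is (start_in_a, start_in_b, length): an ungapped segment pair.
    """
    found = []
    # A diagonal is identified by shift = (position in b) - (position in a)
    for shift in range(-(len(a) - 1), len(b)):
        i, j = (-shift, 0) if shift < 0 else (0, shift)
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    found.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # a run that reaches the end of a sequence
            found.append((i - run, j - run, run))
    # Longest first: a crude stand-in for likelihood-based weights
    return sorted(found, key=lambda d: d[2], reverse=True)

# Hypothetical sequences sharing the segment "ACGT"
print(diagonals("XXACGTYY", "ACGTZZ"))  # prints [(2, 0, 4)]
```

Because only segment pairs are collected and nothing is charged for the unaligned stretches between them, no gap penalty appears anywhere in this sketch, which matches the "No gap penalties" point above.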

Fig: DiAlign
