Bioinformatics

Published on May 2017

CCS HAU

Bio(-)informatics

Dr. Sudhir Kumar
CCS HAU, Hisar
[email protected]

Bio = Biology/biological
Informatics = Information science, including technology

What is Bioinformatics?

• Mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.
• Bioinformatics is conceptualizing biology in terms of macromolecules and then applying “informatics” techniques to understand and organize the information associated with these molecules on a large scale.

Bioinformatics

• Bioinformatics is the application of information technology to analyze, process, and manage biological data.
• Bioinformatics provides computational tools to facilitate the process of moving from Data → Information → Knowledge → Discovery.

Suggestive Biology-Language Homologies

Human Language        Cell
Alphabet              Nucleotide Bases
Words                 Amino Acids
Phrases               Exons
Syntax                Folding
Word Senses           Proteins
Sentences             Protein Circuits
Semantics             Biological Functions
Language generation   Regulation of gene expression

Overview

• Biological databases are being produced at a phenomenal rate
• As a result, computers are becoming indispensable for biological research
• Aims
  1. Organize data
  2. Develop tools
  3. Use those tools to address biological questions

Bioinformatics

- Genome and protein databases
- Aligning sequences
- Searching
- Visualizing protein structure
- Homology modeling
- Molecular mechanics and molecular dynamics
- Structure prediction
- Docking
- Drug design
- Metabolic pathways
- NMR and X-ray crystallography
and many more ….

Definitions:

Biocomputing and computational biology are synonyms and describe the use of computers and computational techniques to analyze any type of biological system, from individual molecules to organisms to overall ecology.

Bioinformatics describes using computational techniques to access, analyze, and interpret the biological information in any type of biological database.

Sequence analysis is the study of molecular sequence data for the purpose of inferring the function, interactions, evolution, and perhaps structure of biological molecules.

Genomics analyzes the context of genes or complete genomes (the total DNA content of an organism) within the same and/or across different genomes.

Proteomics is the subdivision of genomics concerned with analyzing the complete protein complement, i.e. the proteome, of organisms, both within and between different organisms.

First “Behind the Screen”

• Biological databases are largely devoted to search.
  – Also integrity, security, etc.
• Search means taking a query and retrieving the database entries that match it.
• Efficiency is key: we want to find things fast, regardless of how big the database gets.
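One common way to keep search fast as a database grows is to index it ahead of time, so a query only touches the entries that could possibly match. The sketch below is an illustration under stated assumptions, not how any particular repository is implemented: the names `build_kmer_index`, `search` and `toy_db` are hypothetical, and the index simply records where every k-letter substring occurs.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every k-letter substring to the (entry id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for entry_id, sequence in database.items():
        for offset in range(len(sequence) - k + 1):
            index[sequence[offset:offset + k]].append((entry_id, offset))
    return index


def search(index, database, query, k):
    """Return sorted (entry id, offset) pairs where the whole query occurs exactly."""
    if len(query) < k:
        raise ValueError("query must be at least k letters long")
    hits = []
    # Only positions that share the query's first k letters can be full matches,
    # so the rest of the database is never scanned.
    for entry_id, offset in index.get(query[:k], []):
        if database[entry_id].startswith(query, offset):
            hits.append((entry_id, offset))
    return sorted(hits)


# Hypothetical toy database; real repositories hold millions of entries.
toy_db = {
    "seq1": "ATGGCGTACGTTAGC",
    "seq2": "TTGACGTACGGA",
    "seq3": "CCCCCCCC",
}
toy_index = build_kmer_index(toy_db, k=4)
print(search(toy_index, toy_db, "CGTACG", k=4))
```

The index is built once; each query then costs a dictionary lookup plus a check of the candidate positions, rather than a scan of every entry.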



Rate of growth

Bioinformatics: post-genomic era

• High-throughput technologies generate petabytes of data
  – Sequencing, microarrays, combinatorial chemistry, high-throughput screening, mass spectrometry, …
• Rapid growth of data and databases in the public and private domains
  – Genomics, gene expression profiles, proteomics, pharmacogenomics, clinical trials, literature, …
• Proliferation of computational tools for data analysis and processing
  – Statistical analysis tools for sequence analysis and gene finding, clustering algorithms, protein folding and structure prediction, drug docking, visualization tools, data mining tools, …

The Promises

• Digitization of biological systems and processes
  – Simulation and modeling of protein-protein interactions, protein pathways, genetic networks, biochemical and cellular processes, normal and disease physiological states, …
• Blurring of the boundary between experimentally generated data and computational data search and analysis
• In silico discovery complementing wet lab experiments

The Landscape of Biological Data Sources

PRINTS, BLOCKS, PFAMA, PFAMB, PROSITE, PROSITEDOC, SWISSFAM, PRODOM, DOMO, EMBL, DDBJ, GENBANK, GENEPEPT, GSDB, TIGR, Celera, DSSP, DBSTS, SWISSPROT, TREMBL, PIR, NRL3D, PDB, Entrez, EBI, RHDB, HUGO, GDB, OMIM, Clinical DB, dbSNP Contact, dbSNP Population, SNP, SNP Consortium, WIT, KEGG, STKE, ENZYME, FASTA, BLAST, SSEARCH, CLUSTALW, C. elegans, Microbial Genomes, FlyBase, TAXONOMY, GENETICCODE, TFSITE, TFCELL, TFCLASS, TFMATRIX, UNIGENE, LOCUS LINK, Medline, Patent USPTO, Patent PCT, Patent JPO

Databases are of two types: Primary & Secondary

PRIMARY DATABASES
• Primary source of information; can be considered a reservoir of sequence information.
• Primary repository for newly discovered sequences.
• e.g. GenBank at NCBI, EMBL, DDBJ

SECONDARY DATABASES
• These databases derive their information by analyzing the primary databases.
• They express a particular attribute of the primary databases (such as a motif or pattern).
• They add value to the information present in the primary databases.
• e.g. Pfam, BLOCKS, PRINTS

Primary Nucleotide Repository
• NCBI (http://www.ncbi.nlm.nih.gov)
• EMBL (http://www.ebi.ac.uk/embl)
• DDBJ (http://www.ddbj.nig.ac.jp/)

Primary Protein Repository
• PIR (http://pir.georgetown.edu)
• Swissprot/Uniprot (http://www.ebi.ac.uk/swissprot)
• Protein Data Bank (http://www.rcsb.org/pdb)

Secondary ‘pattern’ databases

Secondary database   Primary source          Stored information
PROSITE              SWISS-PROT              Regular expressions (patterns)
PRINTS               SWISS-PROT/TrEMBL       Aligned motifs (fingerprints)
Pfam                 SWISS-PROT/TrEMBL       Hidden Markov Models (HMMs)
Profiles             SWISS-PROT              Weight matrices (profiles)
BLOCKS               PRINTS/InterPro/Domo    Weighted motifs (blocks)
IDENTIFY             PRINTS/InterPro         Permissive regular expressions
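To make "regular expressions (patterns)" concrete, a PROSITE-style pattern can be translated into an ordinary regular expression and run against a protein sequence. The sketch below is hedged: `prosite_to_regex` is a hypothetical helper that handles only a subset of the PROSITE syntax (single residues, `[..]` alternatives, `{..}` exclusions, `x` wildcards, `(n)` or `(n,m)` repeats, and `<` / `>` anchors). The example pattern is the N-glycosylation site motif published in PROSITE as PS00001, and the two test sequences are made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = match.groups()
        if core == "x":                   # x stands for any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # {P} means any residue except P
        elif not re.fullmatch(r"[A-Z]|\[[A-Z]+\]", core):
            raise ValueError(f"unsupported element: {element!r}")
        quantifier = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + quantifier + suffix)
    return "".join(parts)


glyco = prosite_to_regex("N-{P}-[ST]-{P}")
print(glyco)
print(re.search(glyco, "MKNGSAL"))   # hypothetical sequence containing the motif
print(re.search(glyco, "MKNPSAL"))   # proline after N, so the motif is absent
```

Fingerprints, blocks, profiles and HMMs store richer information than a single expression, which is why they need their own search programs rather than a plain regex engine.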

NUCLEOTIDE REPOSITORY

• EMBL: European Molecular Biology Laboratory, at Cambridge, UK.
• GenBank: at NCBI, a division at the NIH campus, USA.
• DDBJ: DNA Data Bank of Japan, Mishima, Japan.

• The three have worked in collaboration since 1982.
• Each collects information from its own region.
• They automatically update each other every 24 hours.

To organize the huge amount of information, the database has been split into numerous divisions (17), each with a specific 3-letter code, e.g.

Human   HUM
Virus   VRL
Fungi   FUN

[Figure: NCBI, EMBL and DDBJ]
Bioinformatics Centre, BISR, Jaipur

The Biological data and databases

• Complex
  – Data types range from protein and nucleic acid sequences to texts, 3-dimensional molecular structures, and images of cells and tissues
• Hierarchical
  – Data organization ranges from molecules to biochemical pathways, cells, tissues, organisms, and populations
• Heterogeneous
  – Database locations, storage formats, and access methods
• Dynamic
  – Data contents and database schemas are constantly changing

The computational tools and algorithms

• Input/Output data formats
  – Each application program requires specific I/O data formats, which may impede data flow from one program to the next
• Rapidly evolving
  – Development of new algorithms and improvement of old ones
• Require graphical display or presentation of results
  – Viewers for sequence alignments, 3-D structures, multidimensional plots, …
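The I/O format problem usually shows up as small conversion steps between programs. As a minimal sketch, assuming one tool emits FASTA text and the next expects one tab-delimited record per line, the glue code could look like this; `read_fasta`, `to_tab_delimited` and the sample records are hypothetical and not taken from any specific package.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)       # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def to_tab_delimited(records):
    """Render records as 'identifier<TAB>length<TAB>sequence' lines."""
    rows = []
    for header, sequence in records:
        identifier = (header.split() or [""])[0]   # first word of the header
        rows.append(f"{identifier}\t{len(sequence)}\t{sequence}")
    return "\n".join(rows)


# Hypothetical input, as it might come out of an upstream program.
fasta_text = ">seqA hypothetical example\nATGC\nGGTA\n>seqB\nTTAA\n"
records = list(read_fasta(io.StringIO(fasta_text)))
print(to_tab_delimited(records))
```

Every pair of tools with different formats needs a step like this, which is one reason format mismatches slow down analysis pipelines.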

Integration: Databases and Scientific Algorithms

Sources brought together by bioinformatics integration, with their native formats:
• Medline (ASN.1)
• OMIM (text file)
• Entrez/NCBI (ASN.1)
• Microarray data (RDBMS, Excel)
• KEGG (HTML text, binary images)
• ClustalW (FASTA)
• BLAST (FASTA)
• PDB (Oracle, 3D images)

Examples of Bioinformatics

• Database interfaces – GenBank/EMBL/DDBJ, Medline, SwissProt, PDB, …
• Sequence alignment – BLAST, FASTA
• Multiple sequence alignment – Clustal, MultAlin, DiAlign
• Gene finding – Genscan, GenomeScan, GeneMark, GRAIL
• Protein domain analysis and identification – Pfam, BLOCKS, ProDom
• Pattern identification/characterization – Gibbs Sampler, AlignACE, MEME
• Protein folding prediction – PredictProtein, SWISS-MODEL
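As a small illustration of what "sequence alignment" computes, the sketch below scores the best global alignment of two sequences by dynamic programming in the Needleman-Wunsch style. This is not the method used by BLAST or FASTA, which rely on faster heuristics; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b with a linear gap penalty."""
    # previous[j] holds the best score for aligning the processed prefix of a
    # with the first j letters of b; row 0 aligns an empty prefix against gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # letter of a aligned to a gap
            left = current[j - 1] + gap   # letter of b aligned to a gap
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the scoring table are kept, so memory grows with the length of one sequence; recovering the alignment itself, not just its score, would require keeping the full table.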

Five websites that all biologists should know

• NCBI (The National Center for Biotechnology Information) – http://www.ncbi.nlm.nih.gov/
• EBI (The European Bioinformatics Institute) – http://www.ebi.ac.uk/
• The Canadian Bioinformatics Resource – http://www.cbr.nrc.ca/
• SwissProt/ExPASy (Swiss Bioinformatics Resource) – http://expasy.cbr.nrc.ca/sprot/
• PDB (The Protein Databank) – http://www.rcsb.org/PDB/

Database Growth (cont.)

The Human Genome Project and numerous smaller genome projects have kept the data coming at alarming rates. As of February 2001, 45 complete, finished genomes were publicly available for analysis, not counting the virus and viroid genomes. The International Human Genome Sequencing Consortium announced the completion of a "Working Draft" of the human genome in June 2000.

What is bioinformatics, genomics, sequence analysis, computational molecular biology . . . ? The Reverse Biochemistry Analogy.

Biochemists no longer have to begin a research project by isolating and purifying massive amounts of a protein from its native organism in order to characterize a particular gene product. Instead, scientists can now amplify a section of a genome based on its similarity to other genomes, sequence that piece of DNA and, using sequence analysis tools, infer all sorts of functional, evolutionary, and perhaps structural insights into that stretch of DNA!

The computer and molecular databases are a necessary, integral part of this entire process.

Vaccine development in the post-genomic era: the reverse vaccinology approach.

Example PDB-format file (123.PDB): a COMPND record, HETATM records giving each atom's serial number, element symbol and x, y, z coordinates, CONECT records listing which atoms are bonded, and a closing END record.
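Records such as HETATM can be read by column position. The sketch below assumes the standard fixed-column layout of the PDB format (serial number in columns 7-11, atom name in 13-16, x, y, z in 31-38, 39-46 and 47-54); files written by some molecule editors deviate from it, and the file on the slide may be one of them. The helper names are hypothetical and the coordinates are made up for illustration.

```python
def format_hetatm(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified HETATM line with fields in the fixed PDB columns."""
    return (
        f"HETATM{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}"
    )


def parse_atom_records(lines):
    """Extract serial number, atom name and coordinates from ATOM/HETATM lines."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "serial": int(line[6:11]),       # columns 7-11
                "name": line[12:16].strip(),     # columns 13-16
                "x": float(line[30:38]),         # columns 31-38
                "y": float(line[38:46]),         # columns 39-46
                "z": float(line[46:54]),         # columns 47-54
            })
    return atoms


# Made-up two-atom example; other record types are skipped by the parser.
sample_lines = [
    "COMPND    EXAMPLE",
    format_hetatm(1, "O", "UNK", "A", 1, -1.25, 0.5, 2.0),
    format_hetatm(2, "C", "UNK", "A", 1, 0.0, 1.5, -3.125),
    "CONECT    1    2",
    "END",
]
atoms = parse_atom_records(sample_lines)
for atom in atoms:
    print(atom)
```

Slicing by column rather than splitting on whitespace matters because adjacent fixed-width fields can run together without a separating space.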

Challenges in bioinformatics

• Explosion of information
  – Need for faster, automated analysis to process large amounts of data
  – Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
  – Need for “smarter” software to identify interesting relationships in very large data sets
• Lack of “bioinformaticians”
  – Software needs to be easier to access, use and understand
  – Biologists need to learn about the software, its limitations, and how to interpret its results

New areas in Bioinformatics

• Microarrays
• Functional Genomics
• Structural Genomics
• Comparative Genomics
• Pharmacogenomics
• Medical Informatics

What is bioinformatics?

Your Turn: Any Question(s)?
