Definitions of Bioinformatics
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
NIH WORKING DEFINITION OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY July 17, 2000
History
Databases
Pairwise sequence alignment of upstream regions of the predicted HcpR-regulated operons from Desulfovibrio species. (a) sat; (b) apsAB; (c) 206515-206516. Candidate HcpR sites are highlighted in gray. Predicted SD-boxes and start codons of the first genes in the operons are in bold. Predicted '-10' and '-35' promoter boxes are underlined. *Conserved position of alignment.
http://genomebiology.com/2004/5/11/R90/figure/F13
The BLAST algorithm. (a) Given a query sequence of length L, BLAST derives a list of words of length w, where w = 3 for amino-acid sequences (shown) and 11 for nucleotide sequences. There are at most L - w + 1 such words. This word list is then expanded to include all high-scoring matching words, keeping only those that score more than the neighborhood word score threshold T when scored using a scoring matrix such as PAM250 or BLOSUM62. For typical parameter values, this results in about 50 words per residue of the query sequence. (b) The high-scoring word list is compared to the sequence database and exact matches are identified. (c) For each word match, the alignment is extended in both directions to generate alignments that score higher than the score threshold S.
Pertsemlidis and Fondon Genome Biology 2001 2:reviews2002.1 doi:10.1186/gb-2001-2-10-reviews2002
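The three steps in the BLAST caption above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual BLAST implementation: `toy_score` is a made-up stand-in for a real matrix such as PAM250 or BLOSUM62, the threshold value 8 is arbitrary, and the drop-off rule used to stop the extension is one common choice that the caption does not specify. All function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a, b):
    """Toy substitution score (assumption): stand-in for PAM250/BLOSUM62."""
    return 5 if a == b else -1


def query_words(query, w=3):
    """Step (a): overlapping words of length w; there are at most L - w + 1."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word, threshold, score=toy_score, alphabet=AMINO_ACIDS):
    """Step (a): words whose score against `word` is more than the threshold T."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total > threshold:
            hits.append("".join(candidate))
    return hits


def find_seeds(query, subject, w=3, threshold=8, score=toy_score):
    """Step (b): exact matches of the expanded word list in a subject sequence.

    Returns (query_pos, subject_pos) pairs.
    """
    index = {}
    for q_pos, word in enumerate(query_words(query, w)):
        for neighbor in neighborhood(word, threshold, score):
            index.setdefault(neighbor, []).append(q_pos)
    seeds = []
    for s_pos in range(len(subject) - w + 1):
        for q_pos in index.get(subject[s_pos:s_pos + w], []):
            seeds.append((q_pos, s_pos))
    return seeds


def extend_seed(query, subject, q_pos, s_pos, w=3, score=toy_score, drop=10):
    """Step (c): ungapped extension of one word match in both directions.

    Extension in a direction stops once the running score falls more than
    `drop` below the best score seen (an assumed stopping rule).
    Returns (score, query_start, subject_start, length).
    """
    seed_score = sum(score(query[q_pos + k], subject[s_pos + k]) for k in range(w))

    def extend(q, s, step):
        best = running = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            running += score(query[q], subject[s])
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > drop:
                break
            q += step
            s += step
        return best, best_len

    right_gain, right_len = extend(q_pos + w, s_pos + w, 1)
    left_gain, left_len = extend(q_pos - 1, s_pos - 1, -1)
    total = seed_score + left_gain + right_gain
    return total, q_pos - left_len, s_pos - left_len, w + left_len + right_len
```

A caller would run `find_seeds` against each database sequence, pass every seed to `extend_seed`, and keep only the extensions whose score is higher than the score threshold S. Enumerating all 20^w candidate words per query word is affordable for w = 3 but is only meant to make the idea visible.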
Multiple Alignment
http://www.ctwatch.org/quarterly/print.php?p=60
http://hsc.unm.edu/crtc/willmanresearch/Pages/UNMHSC_HPC_SNL_Methodology.htm#FigS1
Profiles
http://www.ctwatch.org/quarterly/print.php?p=60
Phylogeny
Genome sequences
Genome Sequencing
Random shotgun library
Sequencing the ends of randomly picked clones
GSS database
Sequence assembly
Gene finding
Functional identification
Sequence feature identification
Bla1
Tfe2
unknown
Functional Genomics
Determining the role of genes through gene disruption (knockouts, underexpression, and overexpression).
Many genes have multiple copies.
Proteomics
The complete set of proteins found in each cell is known as the proteome.
Approximately 25,000 proteins in a plant cell.
Protein concentration (and activity) may differ from what gene expression suggests because of post-translational modification.
Proteomic scheme
Peltier et al. 2000 Plant Cell 12:319-341