Eukaryotic genomes are complex, and DNA amount and organization vary widely between species.
C-value paradox: the amount of DNA in a haploid cell of an organism is not related to its evolutionary complexity or number of genes.
C-value Paradox
Drosophila has a genome 20x smaller than the human genome but only 2x fewer genes.
Newt and lungfish genomes are ~5x and ~50x larger than the human genome, respectively.
Re-association Kinetics
There are different classes of eukaryotic DNA based on sequence complexity revealed by re-association kinetics
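A minimal sketch of the idealized second-order model usually used to describe re-association (Cot) curves, C/C0 = 1 / (1 + k * C0 * t). The function names and the example rate constants are illustrative assumptions, not values from the slides. A larger rate constant k (as for highly repeated sequences) gives a smaller Cot1/2, so those sequences re-associate first.

```python
def fraction_single_stranded(c0: float, t: float, k: float) -> float:
    """Fraction of DNA still single-stranded after time t.

    Idealized second-order kinetics: C/C0 = 1 / (1 + k * C0 * t).
    c0 = initial single-stranded DNA concentration, k = rate constant.
    """
    return 1.0 / (1.0 + k * c0 * t)


def cot_half(k: float) -> float:
    """Cot value at which half of the DNA has re-associated (Cot1/2 = 1/k)."""
    return 1.0 / k


# Hypothetical rate constants: a repeated class re-associates faster
# (larger k, smaller Cot1/2) than a single-copy class.
k_repeated, k_unique = 100.0, 0.01
print(cot_half(k_repeated), cot_half(k_unique))
```

Plotting the fraction against log(C0 * t) for a mixture of such classes gives the multi-step curve from which the sequence classes are read off.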
Chr: chromosome
n: number of samples examined
bp: number of base pairs sequenced
S: number of polymorphic sites
T: nucleotide divergence
On average ~ 0.1%
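The quantities in the legend can be computed from a set of aligned sequences. The sketch below counts polymorphic sites (S) and computes the average pairwise difference per site, one common summary of sequence variation. Whether the original table used exactly this statistic is an assumption, and the four short sequences are made-up toy data.

```python
from itertools import combinations


def polymorphic_sites(seqs: list[str]) -> int:
    """Count alignment columns where at least two sequences differ (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_difference(seqs: list[str]) -> float:
    """Average number of differences per site over all sequence pairs."""
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return differences / (len(pairs) * length)


# Toy alignment (hypothetical data): n = 4 samples, bp = 10 sites.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "ACGTTCGAAC"]
print(polymorphic_sites(toy), mean_pairwise_difference(toy))
```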
Genome Organization
gene identification
Genes can be difficult to identify or predict from genome sequence alone.
The human genome appears to contain fewer genes than originally predicted, but an estimated 35,000 genes produce an estimated 150,000 proteins.
Genome Organization
gene identification
There is no one-to-one correspondence between:
Genome (all genes of an organism)
Transcriptome (all transcripts of an organism)
Proteome (all proteins of an organism)
Gene identification: the challenges
Non-coding sequences
Promoters and enhancers of gene expression can be distant from the coding region itself
Genes can have alternative promoters
Genes can have alternative terminators
Exon shuffling
Different genes having similar exons
Moonlighting
Same protein different function depending on cellular location
Gene Identification
Open reading frames
Sequence conservation: database searches, synteny
Sequence features: CpG islands
Evidence for transcription: ESTs, microarrays
Gene inactivation: transformation, TEs, RNAi
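As an illustration of the "sequence features" approach, CpG islands are often flagged with simple composition statistics. The sketch below uses commonly quoted thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The thresholds and function names are assumptions chosen for illustration, not something specified in the slides.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC and observed/expected thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp
```

In practice such a test would be applied in a sliding window along the genome, and a hit is only a hint that a gene start may be nearby.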
Codon table excerpt: agc = S, agg = R, tga = * (stop), acg = T, ata = I, atg = M
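Open reading frames, the first method listed for gene identification, can be sketched as a scan for an ATG followed in frame by a stop codon. This is a deliberately simplified illustration: it assumes the standard stop codons, searches only the forward strand, and ignores introns, so it is not how a real eukaryotic gene finder works. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) slices of forward-strand ORFs, stop codon included.

    min_codons counts every codon from ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTTGACC"))  # [(2, 14)] i.e. ATGAAATTTTGA
```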
Genome Organization
duplicated genes Gene families
paralogs orthologs (homologs)
Pseudogenes
Duplicated genes
Paralogs: genes that evolved one from another through gene duplication
Encode closely related proteins
Formed by duplication of an ancestral gene followed by mutation
Pseudogenes
Nonfunctional copies of genes
Formed by duplication of an ancestral gene, or by reverse transcription of an mRNA and integration of the resulting cDNA
Not expressed due to mutations that produce a premature stop codon (nonsense) or a frameshift, or mutations that prevent mRNA transcription or processing
Duplicated genes
Can be clustered, as in the -globin cluster, or dispersed in the genome, as seen for the entire globin family in humans
Duplicated genes
Paralogs vs orthologs (both are homologs)
Different members of the globin gene family are paralogs, having evolved one from another through gene duplication.
Paralogs are separated by a gene duplication event.
Each specific family member (e.g. globin human) is an ortholog of the same family member in another species. Both evolved from an ancestral globin gene.
Orthologs are separated by a speciation event.
It is not always easy to distinguish true orthologs from paralogs when comparing large multigene families between species, especially in polyploid organisms.
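One widely used heuristic for proposing ortholog pairs between two species is the reciprocal best hit: gene a in species A and gene b in species B are paired only if each is the other's highest-scoring match. The sketch below assumes similarity scores are already available (for example from a database search). The gene names and scores are invented for illustration. As the slide notes, such calls are unreliable in large multigene families and polyploids, where duplication and gene loss can make a paralog the best hit.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Return (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for two family members in each of two species.
a_vs_b = {("A_alpha", "B_alpha"): 95, ("A_alpha", "B_beta"): 60,
          ("A_beta", "B_alpha"): 58, ("A_beta", "B_beta"): 92}
b_vs_a = {("B_alpha", "A_alpha"): 95, ("B_alpha", "A_beta"): 58,
          ("B_beta", "A_alpha"): 60, ("B_beta", "A_beta"): 92}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here the alpha/alpha and beta/beta pairs come out as candidate orthologs, while the alpha/beta comparisons within the family correspond to paralogous relationships.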
Ribosomal RNAs
Tandem arrays on several chromosomes
150-200 copies of the 28S, 5.8S, 18S cluster
200-300 copies of the 5S cluster
Genome Organization
repetitive DNA
~50% of the human genome
Moderately repeated DNA
Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
Large duplicated gene families
Mobile DNA
Segmental duplications
Repetitive DNA
Transposon-derived repeats
Different regions of the genome differ in density of repeats
Most LINEs accumulate in AT-rich regions
Alu elements accumulate in GC-rich regions
Genome Organization
repetitive DNA
Simple-sequence repeats
3% of genome
Highly repeated short sequences found in centromeres and telomeres
Variable numbers of tandem repeats (VNTR) dispersed throughout the genome
14 500 repeats
minisatellites
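Tandem repeats of a short unit can be located with a back-referencing regular expression. The sketch below is a minimal illustration for short units (1 to 6 bp by default). The function name, parameters and thresholds are assumptions, and dedicated repeat finders handle imperfect copies and longer minisatellite units, which this does not.

```python
import re


def find_tandem_repeats(seq: str, unit_min: int = 1, unit_max: int = 6,
                        min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats.

    The lazy quantifier prefers the shortest repeating unit, and the
    back-reference \\1 requires exact copies of that unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (unit_min, unit_max, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


print(find_tandem_repeats("GGCACACACACATT"))  # [(2, 'CA', 5)]
```

Comparing the copy number at the same locus between individuals is the basis of VNTR typing. Raising `unit_max` lets the same pattern pick up longer repeat units.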