Babak Nami
Department of Medical Genetics, Selçuk University
Human Genome
The human genome is the genome of Homo sapiens, which is stored on 23 chromosome pairs. Twenty-two of these are autosomal chromosome pairs, while the remaining pair is sex-determining. The haploid human genome occupies a total of just over 3 billion DNA base pairs. It contains ca. 23,000 protein-coding genes, far fewer than had been expected before its sequencing. In fact, only about 1.5% of the genome codes for proteins, while the rest consists of non-coding RNA genes, regulatory sequences, introns, and noncoding DNA (once known as "junk DNA").
Human Genome
Information content of the haploid human genome by chromosome: Haploid means we only count one of each chromosome pair. For this reason, the total information content for a woman (XX) is less than for a man (XY), where both the X and the Y are counted.
The Human Genome Project (HGP) is an international scientific research project whose primary goals are to determine the sequence of chemical base pairs which make up DNA and to identify and map the approximately 20,000-30,000 genes of the human genome from both a physical and functional standpoint.
History
1985. Proposed.
1988. Initiated and funded by NIH and US Dept. of Energy ($3 billion set aside).
1990. Work begins.
1998. Celera announces a 3-year plan to complete the project early.
2001. Published in Science and Nature in February.
2002. The quest for genome sequencing was being pursued simultaneously in over 20 laboratories in six countries.
2003. The whole genome sequenced.
History
NIH put the human genome sequence on the web July 7, 2000
UCSC put the human genome sequence on CD in October 2000, with varying results
June 2000 White House announcement that the majority of the human genome (80%) had been sequenced (working draft). Working draft made available on the web July 2000 at genome.ucsc.edu. Publication of 90 percent of the sequence in the February 2001 issue of the journal Nature. Completion of 99.99% of the genome as finished sequence in July 2003.
The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London.
identify all the approximately 30,000 genes in human DNA, determine the sequences of the 3 billion chemical base pairs that make up human DNA, store this information in databases, improve tools for data analysis, transfer related technologies to the private sector, and address the ethical, legal, and social issues (ELSI) that may arise from the project.
In Medicine
Improvements in diagnostic and therapeutic applications.
Implementation of preventative measures.
Increases in gene therapy applications.
In Biotechnology
Production of useful protein products for use in medicine, agriculture, bioremediation and pharmaceutical industries.
Antibiotics
Protein replacement (factor VIII, TPA, streptokinase, insulin, interferon)
BT insecticide toxin (from Bacillus thuringiensis)
Herbicide resistance (glyphosate resistance)
Bioengineered foods
Pharm animals
In Bioinformatics
The newest, fastest-growing specialty in the life sciences, integrating biotechnology and computer science. Involved in DNA sequence assembly and analysis, using computer-based techniques to determine gene function, regulation and control. Unknown gene sequences can be compared to databases of known genes, so that similarities can lead to determination of an unknown gene's function.
Proteomics
Investigates patterns and levels of gene expression in diseased cells that can be analyzed to build databases of expression profiles.
In Pharmacogenomics
Investigates DNA mutations associated with disease susceptibility and drug sensitivities.
Monoclonal antibodies
Prodrug gene therapy for cancers
In Developmental Biology
Regulation of embryonic development. Regulation of the aging process. Regulation of cell division and apoptosis. Regulation of metabolism.
Develop technologies for rapid, large-scale identification and scoring of single-nucleotide polymorphisms (SNPs) and other DNA sequence variants. Identify common variants in the coding regions of the majority of identified genes during this 5-year period. Create a SNP map of at least 100,000 markers. Develop the intellectual foundations for studies of sequence variation. Create public resources of DNA samples and cell lines.
Model organisms
Bacteria (E. coli, influenza, several others)
Yeast (Saccharomyces cerevisiae)
Plant (Arabidopsis thaliana)
Roundworm (Caenorhabditis elegans)
Fruit fly (Drosophila melanogaster)
Mouse (Mus musculus)
Estimated Genes
30,000 30,000 25,000 19,000 13,000 6,000 3,200 9
Practical Goals
A primary goal of the Human Genome Project is to make a series of descriptive diagrams (maps) of each human chromosome at increasingly finer resolutions. After mapping is completed, the next step is to determine the base sequence of each of the ordered DNA fragments. The ultimate goal of genome research is to find all the genes in the DNA sequence and to develop tools for using this information in the study of human biology and medicine.
Practical Goals
http://www.genome.gov/Pages/News/PaceofDiseaseGeneDiscovery.pdf
DNA from 5 humans (2 males, 3 females)
Cut up DNA with restriction enzymes
Ligated into BACs & YACs, then grew them up
Sequenced the BACs
Let a supercomputer put the pieces together
Sequencing Strategy
Once a contig map of the genome was obtained, it was necessary to sequence each individual clone. Most of the actual human genome sequencing was done on BAC clones, which are less prone to rearrangement than YAC clones. BACs are about 100-200 kbp long. Large clones are generally sequenced by shotgun sequencing: the large cloned DNA is randomly broken up into a series of small fragments (less than 1 kb). These fragments are cloned and sequenced. A computer program then assembles them based on overlaps between the sequences of each clone. To ensure that every bit has been covered, you need to sequence random clones until you have covered each spot 5-10 times on average.
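The overlap-based assembly and the 5-10 times coverage target described above can be illustrated with a toy sketch. This is not the software used by the project: the function names (`overlap`, `greedy_assemble`, `mean_coverage`), the greedy merging strategy, and the example reads are all assumptions chosen for illustration, and real assemblers must also handle sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Hypothetical toy assembler: assumes error-free reads and no repeats.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


def mean_coverage(n_reads, read_len, target_len):
    """Average number of times each base is sequenced (N * L / G)."""
    return n_reads * read_len / target_len


if __name__ == "__main__":
    # Made-up reads sampled from a short made-up sequence.
    print(greedy_assemble(["ATGCGTAC", "GTACCTGA", "CTGAAGT"]))
    # Made-up numbers: 2,000 reads of 500 bp from a 150 kbp BAC.
    print(round(mean_coverage(2000, 500, 150_000), 2))
```

With these assumed numbers the mean coverage falls inside the 5-10 times range quoted above. Sequencing fewer random subclones would leave more positions uncovered by chance.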
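[Figure: genomic DNA is cloned in bacteria as a BAC library of about 20,000 clones, each 150,000-200,000 bp, with lots of overlap between clones; each clone is then broken into subclones of about 2,000 bp. Labels: DNA, known sequence, clones, subclones.]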
Sequencing Technologies
The two basic sequencing approaches, Maxam-Gilbert and Sanger, differ primarily in the way the nested DNA fragments are produced. Maxam-Gilbert sequencing (also called the chemical degradation method) uses chemicals to cleave DNA at specific bases, resulting in fragments of different lengths. A refinement to the Maxam-Gilbert method known as multiplex sequencing enables investigators to analyze about 40 clones on a single DNA sequencing gel. Sanger sequencing (also called the chain termination or dideoxy method) involves using an enzymatic procedure to synthesize DNA chains of varying length in four different reactions, stopping the DNA replication at positions occupied by one of the four bases, and then determining the resulting fragment lengths.
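The chain-termination idea can be sketched in a few lines: each of the four reactions yields fragments that end wherever its base occurs, and ordering all fragments by length recovers the sequence. The sketch below is a deliberate simplification under stated assumptions: it works directly on the newly synthesized strand (ignoring the complementary template), assumes every possible termination product is present, and uses hypothetical function names.

```python
def chain_termination_fragments(strand):
    """Return, for each of the four reactions, the fragment lengths produced.

    A fragment of length k appears in the reaction for base X when the
    k-th base of the synthesized strand is X (simplified model).
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering fragments from shortest to longest."""
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


if __name__ == "__main__":
    lanes = chain_termination_fragments("GATTACA")
    print(lanes)
    print(read_gel(lanes))
```

Reading the four lanes together from the shortest fragment upward is what turns fragment lengths back into a base sequence.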
Advanced Techniques
SOLiD Sequencing Helicos High speed Gene Sequencing Laser Sequencing
DoubleTwist Inc, an application service provider (ASP) devoted to empowering life scientists, completed the first annotation of the human genome. The DoubleTwist human genome database was created using Sun Enterprise 420R and 10K supercomputers, that is, a total of more than 350 processors. It brought to a close an extensive analysis of the available HGP data that revealed genes and other valuable information. The task was accomplished using Sun Enterprise supercomputers, including Starfire servers.
Genome Map
A genome map describes the order of genes or other markers and the spacing between them on each chromosome. Human genome maps are constructed on several different scales or levels of resolution.
Genetic Map
Genetic linkage maps of each chromosome are made by determining how frequently two markers are passed together from parent to child. Because genetic material is sometimes exchanged during the production of sperm and egg cells, groups of traits (or markers) originally together on one chromosome may not be inherited together.
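How often two markers are separated between parent and child can be expressed as a recombination fraction. The sketch below is a minimal illustration with made-up data: the haplotype labels, the counts, and the function name `recombination_fraction` are all assumptions. For small values, a recombination fraction of 1% corresponds to roughly 1 centimorgan of genetic distance.

```python
def recombination_fraction(offspring, parental_types):
    """Fraction of offspring haplotypes that differ from both parental types."""
    recombinants = sum(1 for hap in offspring if hap not in parental_types)
    return recombinants / len(offspring)


if __name__ == "__main__":
    # Hypothetical parent carrying markers A-B on one chromosome, a-b on the other.
    parental = {("A", "B"), ("a", "b")}
    # Made-up sample of 100 transmitted haplotypes.
    offspring = (
        [("A", "B")] * 45 + [("a", "b")] * 45   # inherited together
        + [("A", "b")] * 6 + [("a", "B")] * 4   # separated by exchange
    )
    print(recombination_fraction(offspring, parental))
```

Markers that are rarely separated give a fraction near 0 and are placed close together on the linkage map; unlinked markers approach 0.5.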
Physical Maps
Different types of physical maps vary in their degree of resolution. The lowest-resolution physical map is the chromosomal (sometimes called cytogenetic) map, which is based on the distinctive banding patterns observed by light microscopy of stained chromosomes. A cDNA map shows the locations of expressed DNA regions (exons) on the chromosomal map. The more detailed cosmid contig map depicts the order of overlapping DNA fragments spanning the genome. A macrorestriction map describes the order and distance between enzyme cutting (cleavage) sites. The highest-resolution physical map is the complete elucidation of the DNA base-pair sequence of each chromosome in the human genome. Physical maps are described in greater detail below.
caggcggactcagtggatctggccagctgtgacttgacaag
caggcggactcagtggatctagccagctgtgacttgacaag
The map was built by linkage studies in 60 large families with grandparents and large numbers of children, collected by the University of Utah and the Centre d'Étude du Polymorphisme Humain (CEPH), Paris. Families were typed with over 5000 polymorphic DNA sequences: 60% were microsatellite repeats (mostly dinucleotide (CA) repeats, also some tri- and tetranucleotides). Only about 400 of them were actual genes. Constructing the linkage map is a very large computational problem; sophisticated software was used to work out the "best fit" map of all the markers, with advanced statistical methods and algorithms.
Sequence tagged sites (STSs) are specific loci in the genome for which enough DNA sequence is available to make PCR primers to amplify the locus (usually as a fragment of a few hundred bp). These include microsatellites (e.g. CA repeats) that can be used for linkage studies. The information required to use an STS is just the sequences of the PCR primers; therefore it is very easy to make databases of STSs that can be used by anyone. No actual bits of DNA need change hands. This is crucial in allowing genome projects to proceed as international collaborations, with many laboratories participating in a co-ordinated way.
ESTs act as specific tags for each human gene, since they are derived by sequencing cDNA clones which came from mRNA and therefore represent the actual transcribed sequences (as opposed to STSs, which can be derived from anywhere in the genome and are mostly non-coding). They allow rapid access to the actual genes, ignoring introns and junk DNA.
ESTs can be 3' or 5' depending on which end of the cDNA was sequenced. Because of the methods used to make cDNA libraries, parts of the 5' end of the gene are often lost during cloning, whereas the 3' end is more reliable. Therefore, the same gene may give different 5' ESTs and it will be difficult to deduce whether they have come from the same gene. This is shown on the diagram by the white boxes representing cDNA clones being different lengths. Another complication is due to
X-ray hybrids are made by irradiating a human cell line with 3000 rad of X-rays, fusing the cells to hamster cells, and isolating hybrid cell lines in culture. A panel of 100-200 hybrids with 5-10 different fragments of human DNA in each gives about 1000 fragments in total, i.e. the human genome has been divided into 1000 bits. The closer together 2 markers are in the genome, the more likely it is that they will be present in the same hybrids (since they are less likely to be separated by an X-ray induced break). By doing a PCR assay for each marker on all the hybrids, a map can be made. The units are called cR (centiray, where 1 cR is a 1% chance that the markers will be separated by X-ray breakage).
For each pair of markers in turn the "co-retention frequency" is the number of hybrids in which both markers are present, divided by the number of hybrids in which one or other (or both) markers are present. On the figure, there are 5 hybrids containing both markers B and C, and 6 containing B and/or C. Therefore the co-retention frequency is 5/6 or 0.83. Likewise it is 6/7 for markers E and F, and 2/10 for markers C and E. This shows that B and C are close together, E and F are close
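The co-retention frequency defined above is the number of hybrids containing both markers divided by the number containing at least one of them. The sketch below computes it from sets of hybrid identifiers. The original figure is not reproduced here, so the panel is a hypothetical one, constructed only so that its counts match those quoted in the text (5/6 for B and C, 6/7 for E and F, 2/10 for C and E); the function name is likewise an assumption.

```python
def co_retention(hybrids_a, hybrids_b):
    """Hybrids with both markers divided by hybrids with one or other (or both)."""
    either = hybrids_a | hybrids_b
    if not either:
        return 0.0
    return len(hybrids_a & hybrids_b) / len(either)


if __name__ == "__main__":
    # Hypothetical panel: each marker maps to the set of hybrids that test
    # positive for it by PCR. Numbers are hybrid identifiers, not real data.
    panel = {
        "B": {1, 2, 3, 4, 5, 6},
        "C": {1, 2, 3, 4, 5},
        "E": {4, 5, 8, 9, 10, 11, 12},
        "F": {4, 5, 8, 9, 10, 11},
    }
    for first, second in [("B", "C"), ("E", "F"), ("C", "E")]:
        value = co_retention(panel[first], panel[second])
        print(first, second, round(value, 2))
```

A high value means the two markers are rarely separated by an X-ray break and so lie close together; a low value means they are far apart.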
Clone contigs
A clone contig is a series of cloned DNA segments that overlap each other, assembled in the correct order along the genome. The clones are made using vectors:
cosmids (capacity 45 kb)
BACs or YACs (Bacterial or Yeast Artificial Chromosomes), which can clone 100s of kb of DNA and are more suitable for dealing with large stretches of mammalian DNA.