Königsberg and now the time has come to look into the algorithmic ideas behind genome sequencing. We have a lot of territory to cover, but the good news is that we will be joined today by Dr. Son Pham, one of the leading experts on genome sequencing.

I want to start with a question. Who are these people, who look like criminals in mug shots? And why do they appear on the same slide with three great mathematicians from three different centuries? By the end of this lesson you will learn the answer to this question, but some of you have already guessed that we are interested in these people because we are interested in their genes.

You can think about the human genome as a 3-billion-nucleotide-long book written in the alphabet A, C, G, T. Some of you may think that since humans are so complex, the human genome must be the longest genome known. This is not true. There are genomes that are hundreds of times longer than the human genome.

As we speak, thousands of genome sequencing projects are being conducted all over the world. Biologists are interested in genome sequencing because they can learn a lot from studying genomes. For example, they can understand the function of a human gene by studying a similar gene in a fly or even in a bacterium. And of course there are numerous applications of genome sequencing in medicine, agriculture, biotechnology, and many other fields.

Genome sequencing started in 1977, when Walter Gilbert and Frederick Sanger invented the first DNA sequencing technologies. Usually great discoveries wait two to three decades before they are awarded the Nobel Prize. In this case, the Nobel committee waited just three years before it awarded the Nobel Prize to Sanger and Gilbert. It was obvious that this discovery would have enormous implications for science.

However, as great as this discovery was, the sequencing technology invented by Sanger and Gilbert was very expensive. It would cost $3 billion to sequence the human genome with this technology. That didn't stop biologists, who in 1990 started the Human Genome Project, which is to this day the largest collaborative research project in biology. The goal of the project was to sequence the human genome in 15 years and deliver it to the public by the year 2005.

In 1997 there was an unexpected entrant into the human genome sequencing effort. Craig Venter founded a private company, Celera Genomics, with the stated goal of sequencing the human genome ahead of the public Human Genome Project. This was great for everybody, because the race to sequence the human genome intensified, and by the year 2000, five years ahead of schedule, the human genome was sequenced.

Immediately afterwards, biologists started sequencing many other genomes. Here you see all nine mammalian genomes that were sequenced in the next ten years. It looks like a great set of animals and a large number of biological sequencing projects. Obviously, biologists were very busy sequencing those genomes. However, they wanted to sequence thousands of genomes, and with the existing technology that was just not feasible, because the technology was very expensive. That is why a number of companies all over the world started a race to invent new, next-generation sequencing technologies. They succeeded, and they reduced the cost of sequencing by orders of magnitude. As a result, we can now move from sequencing a reference human genome to sequencing personal human genomes.
The reference human genome represents a kind of average human genome, but we want to find the differences between the genomes of different humans, and these differences may appear to be small. In particular, between any two humans you expect to see roughly one mutation per thousand nucleotides. However, these differences may be extremely important, as they are responsible for thousands of genetic diseases.

Personalized genomics has important implications for medicine. In 2010, Nicholas Volker became the poster child of personalized genomics: his life was saved by genome sequencing. The poor child went through dozens of surgeries because doctors failed to diagnose his condition. Afterwards they decided to sequence his genome and found a rare mutation in a gene linked to a defect in his immune system. As a result, immunotherapy saved the child's life.

With the cost of human genome sequencing falling, we expect that very soon it will drop below the $1,000 mark. As a result, sequencing genomes will probably become as routine as an X-ray is today.

At the same time, there are numerous projects to sequence various species. In 2010 biologists started the Genome 10K project to sequence 10,000 vertebrate species. Just think about this: a single human genome in 2000, 10,000 genomes in the year 2010. To accomplish this genomic revolution, we need to develop algorithmic techniques for sequencing genomes, and in the next segment Dr. Son Pham will tell you how it can be done.