Mandrita Mondal Natural Computing Lab Electronics and Communication Sciences Unit Indian Statistical Institute, Kolkata
link: http://naturalcomputinglab.weebly.com/
What is DNA?
DNA (Deoxyribonucleic acid) is a molecule that encodes the genetic instructions. It is present in living organisms and many viruses. In 1953, James D. Watson and Francis Crick discovered the molecular structure of DNA.
DNA is a long polymer made from repeating units called deoxyribonucleotides (or simply nucleotides). Each nucleotide monomer has three components: 2'-deoxyribose (a sugar), a nitrogenous base, and a phosphate group. The backbone of DNA consists of alternating sugar and phosphate groups, with the nucleobases attached to the sugars.
There are four types of nucleobase: 1. Adenine (A) 2. Guanine (G) 3. Cytosine (C) 4. Thymine (T)
DNA is usually composed of two polynucleotide chains twisted around each other in the form of a double helix. The two strands run in opposite directions and are therefore anti-parallel: one strand runs in the 5' to 3' direction and the other in the 3' to 5' direction.
Watson-Crick Complementarity:
Base pairs formed between specific nucleobases are the building blocks of the DNA double helix. Adenine-Thymine and Guanine-Cytosine are the two Watson-Crick base pairs, held together by hydrogen bonds.
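The pairing rule above can be sketched in a few lines of Python. The function names are illustrative and not part of the original slides; strands are assumed to be plain strings over A, T, G, C written in the 5' to 3' direction.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Partner strand written 5' to 3' (the two strands are anti-parallel)."""
    return complement(strand)[::-1]

def can_hybridize(a: str, b: str) -> bool:
    """True if two 5'->3' strands are perfect Watson-Crick partners."""
    return reverse_complement(a) == b

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Reversing the complement reflects the anti-parallel orientation: the partner of a 5' to 3' strand is read 3' to 5' when aligned against it.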
3. Amplification
4. Separation
5. Extraction
6. Cutting
7. Ligation
8. Substituting
9. Marking
10. Destroying
DNA Computing
DNA computing is an emerging interdisciplinary field that uses the four DNA bases (A, T, G, C) to perform computation. In 1959, Richard P. Feynman delivered a seminal lecture, "There's Plenty of Room at the Bottom", on the problem of manipulating and controlling things on a small scale.
In 1994, Leonard Adleman demonstrated a proof of concept by using DNA to solve a seven-vertex instance of the Hamiltonian Path Problem.
Advantages of DNA computing: massive parallelism, potential for information storage, speed, and energy efficiency.
Adleman's Experiment:
Solved Hamiltonian Path Problem by DNA Computing
Step 1: Generate random paths through the graph.
Let v1 be encoded by the sequence 5' TATCGGATCGGTATATCCGA 3' and v2 by the sequence 5' GCTATTCGAGCTTAAAGCTA 3'. Then the DNA sequence representing the directed edge e12 is 3' CATATAGGCTCGATAAGCT 5'. Paths are formed by hybridization and ligation reactions.
Step 2: Keep only those paths that begin with vin and end with vout.
The product of step 1 was amplified by PCR using v0 and the complementary sequence of v6 as primers. Thus, only those sequences encoding paths that began with vertex 0 and ended with vertex 6 were amplified.
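As a hedged illustration of the edge encoding, the sketch below assumes that the edge strand is the Watson-Crick complement of the second half of v1 joined to the first half of v2, read 3' to 5'. This rule is inferred from the example sequences and is not stated on the slide; the function name `edge_strand` is hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base Watson-Crick complement (same left-to-right order)."""
    return "".join(PAIR[base] for base in strand)

def edge_strand(v_from: str, v_to: str) -> str:
    """Assumed rule: complement of (3' half of v_from + 5' half of v_to).

    The inputs are written 5'->3'; the result is written 3'->5' so that
    it lines up base by base against the junction of the two vertices.
    """
    junction = v_from[len(v_from) // 2:] + v_to[:len(v_to) // 2]
    return complement(junction)

v1 = "TATCGGATCGGTATATCCGA"
v2 = "GCTATTCGAGCTTAAAGCTA"
e12 = edge_strand(v1, v2)
print(e12)  # CATATAGGCTCGATAAGCTC
```

Under this assumption the edge strand is a 20-mer. The 19-base sequence printed for e12 in Step 1 agrees with its first 19 bases.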
Step 3: If the graph has n vertices, then keep only those paths that enter exactly n vertices
The product of step 2 was separated by agarose gel electrophoresis. DNA from the 140 base pair (bp) band of the gel was extracted. These sequences represent the paths that enter exactly seven vertices.
Step 4: Keep only those paths that enter all of the vertices of the graph at least once
The product of step 3 was affinity-purified: only those single-stranded DNA molecules that contained the sequence v1 were retained. This process was repeated successively with the complementary sequence of each vertex vi (2 ≤ i ≤ 5).
Step 5: If any path remains, say Yes; otherwise, say No.
The product of step 4 was amplified by PCR and run on a gel.
[Figure: Hamiltonian path in Adleman's graph]
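The five steps can be mimicked in software as a generate-and-filter procedure. The sketch below is only an in-silico analogy: the edge list `EDGES` is a hypothetical seven-vertex graph (the slides do not list the edges of Adleman's graph), random walks stand in for hybridization and ligation, and a length check stands in for cutting the 140 bp band (7 vertices × 20 bases).

```python
import random

def adleman_search(edges, n, v_in, v_out, trials=20000, seed=0):
    """Generate random paths, then filter them as in steps 2 to 5."""
    rng = random.Random(seed)
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)

    # Step 1: generate random paths (walk until a dead end or 2n vertices).
    paths = set()
    for _ in range(trials):
        v = rng.randrange(n)
        path = [v]
        while v in successors and len(path) < 2 * n:
            v = rng.choice(successors[v])
            path.append(v)
        paths.add(tuple(path))

    # Step 2: keep paths that begin with v_in and end with v_out (PCR).
    paths = {p for p in paths if p[0] == v_in and p[-1] == v_out}
    # Step 3: keep paths that enter exactly n vertices (gel band).
    paths = {p for p in paths if len(p) == n}
    # Step 4: keep paths that contain every vertex (affinity purification).
    paths = {p for p in paths if set(p) == set(range(n))}
    # Step 5: anything left is a Hamiltonian path.
    return sorted(paths)

# Hypothetical directed graph on vertices 0..6 (an assumption for this demo).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (2, 1), (3, 2), (4, 1)]

hamiltonian = adleman_search(EDGES, n=7, v_in=0, v_out=6)
print("Yes" if hamiltonian else "No", hamiltonian)
```

The software version checks candidates one after another, whereas the appeal of the DNA version is that all candidate paths form and are filtered in parallel in the test tube.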
Similarity-Based Fuzzy Reasoning by DNA Computing
Fuzzy logic resembles human decision making in its ability to work from approximate data and find precise solutions. We attempt to realize the basic approach to similarity-based fuzzy reasoning with synthetic fuzzy DNA, replacing the logical aspect of fuzzy reasoning with DNA chemistry.
Applicable Form of Fuzzy Reasoning: A generalized form of fuzzy reasoning is several fuzzy conditional propositions combined with else.
Premise 1 : If X is A1 and Y is B1 then Z is C1 else
Premise 2 : If X is A2 and Y is B2 then Z is C2 else
...
Premise n : If X is An and Y is Bn then Z is Cn else
Premise n+1: If X is A' and Y is B'
_________________________________________
Consequence: Z is C'
We interpret else as union (max) or intersection (min), whichever is valid for the particular fuzzy implication.
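A minimal sketch of the two readings of else follows, assuming each rule's inferred consequence is a fuzzy set stored as a mapping from element to membership value. The element labels and membership values are made up for illustration and do not come from the slides.

```python
def combine_else(consequences, mode="union"):
    """Combine per-rule consequences element-wise.

    mode="union" takes the maximum membership of each element,
    mode="intersection" takes the minimum.
    """
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    op = max if mode == "union" else min
    elements = consequences[0].keys()
    return {e: op(c[e] for c in consequences) for e in elements}

# Hypothetical consequences inferred from two rules.
c1 = {"z1": 0.2, "z2": 0.7, "z3": 0.4}
c2 = {"z1": 0.5, "z2": 0.3, "z3": 0.4}

as_union = combine_else([c1, c2], "union")
as_intersection = combine_else([c1, c2], "intersection")
print(as_union)         # {'z1': 0.5, 'z2': 0.7, 'z3': 0.4}
print(as_intersection)  # {'z1': 0.2, 'z2': 0.3, 'z3': 0.4}
```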
We will illustrate this problem with a simple example involving Height (Ht), Weight (Wt) and Body Mass Index (BMI).
Quantization of Height
Each discrete element of the quantized universe Ht is represented by a five-base DNA sequence. Each segment of the discrete universe is given a linguistic term such as very short, very tall, etc.
Quantization of Weight
Quantization of BMI
Fuzzy DNA: DNA sequences that represent the elements of a discrete universe together with their corresponding membership values.
Example: Fuzzy DNA representing Medium Ht (I)
Algorithm
Implementation of the algorithm: All the rules of the problem and the observed data are coded in the form of double-stranded DNA sequences.
From the set of rules, a limited number of rules is extracted to draw the consequence, depending on the degree of similarity between the antecedent part of each rule and the observed data of the corresponding domain.
Similarity measurement:
Suppose we want to determine which primary fuzzy set among "short (II)", "very short (II)" and "medium height (I)" is similar to the set "short (I)". First, encode all the sequences. Let the fuzzy DNA sequence for the primary fuzzy set short (I) be:
First, the desired subsequence is amplified by PCR with a specific primer. The amplified subsequence runs from the start of the fuzzy DNA sequence representing the primary fuzzy set to the position in that sequence where the short DNA sequence CTGGA occurs.
We follow the same procedure for the primary fuzzy sets short (II), very short (II) and medium height (I).
The amplified DNA sequences, derived from the fuzzy DNA sequences short (I), short (II), very short (II) and medium height (I), are run through an agarose gel to separate them by length (gel electrophoresis). The lengths of the amplified sequences are:
For short (I): 34 bp
For short (II): 47 bp
For very short (II): 20 bp
For medium height (I): 62 bp
The differences in lengths are:
short (I) and short (II): (47 − 34) bp = 13 bp
short (I) and very short (II): (34 − 20) bp = 14 bp
short (I) and medium height (I): (62 − 34) bp = 28 bp
The threshold on the difference between the lengths of fuzzy DNA sequences is taken as 15 bp. Based on similarity, measured as the difference in length between two sequences, we select a rule if at least one antecedent clause of that rule is within the limit of similarity. Once a rule is selected, we modify the consequence of that rule based on the highest degree of dissimilarity between the observed data and any of the antecedent clauses of that rule.
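The length comparison can be written out as a short calculation. The sketch below uses the lengths and the 15 bp threshold quoted in the text; treating the threshold as inclusive (difference ≤ 15 bp) is an assumption, as is the function name `similar_sets`.

```python
# Lengths (in bp) of the amplified subsequences, as quoted in the text.
LENGTHS = {
    "short (I)": 34,
    "short (II)": 47,
    "very short (II)": 20,
    "medium height (I)": 62,
}

def similar_sets(reference, lengths, threshold=15):
    """Return {set name: length difference} for sets within the threshold.

    Assumption: a set counts as similar when the absolute difference
    in length is less than or equal to the threshold.
    """
    ref_len = lengths[reference]
    diffs = {name: abs(length - ref_len)
             for name, length in lengths.items() if name != reference}
    return {name: d for name, d in diffs.items() if d <= threshold}

selected = similar_sets("short (I)", LENGTHS)
print(selected)  # {'short (II)': 13, 'very short (II)': 14}
```

With these figures, short (II) and very short (II) fall within the limit of similarity to short (I), while medium height (I), at 28 bp, does not.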
Thus, we can derive the sequence of C' (the consequence of the given antecedent data, i.e. A' and B') from the sequences C1, C7, C8 and C9. The membership values of the elements of the consequent domain are reduced depending on the degree of dissimilarity among the remaining antecedent clauses (other than the similar clauses).
For C1, the membership values of the elements are reduced by 3 bases, as the difference between B' and B1 is 27 bp.
C1 before modification:
C1 after modification:
For C7, the membership values of the elements are reduced by 5 bases, as the difference between B' and B7 is 43 bp.
For C8, the membership values of the elements are reduced by 7 bases, as the difference between A' and A8 is 65 bp.
For C9, the membership values of the elements are reduced by 4 bases, as the difference between A' and A9 is 34 bp.
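The slides do not state how a length difference is converted into a number of bases to remove. The four quoted pairs (27 bp and 3 bases, 43 bp and 5, 65 bp and 7, 34 bp and 4) are all consistent with dividing the difference by 10 and rounding up, so the sketch below checks that guess. The rule, the step of 10 bp and the function name are assumptions, not the authors' stated method.

```python
def bases_to_remove(difference_bp: int, step_bp: int = 10) -> int:
    """Assumed rule: one base removed per started step of `step_bp` bp.

    Integer ceiling division, e.g. 27 bp -> 3 bases when step_bp is 10.
    """
    return -(-difference_bp // step_bp)

# (rule, difference in bp, reduction in bases) as quoted in the text.
QUOTED = [("C1", 27, 3), ("C7", 43, 5), ("C8", 65, 7), ("C9", 34, 4)]

for rule, diff, expected in QUOTED:
    print(rule, diff, bases_to_remove(diff), expected)
```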
Affinity purification is conducted with the sequences complementary to each DNA oligonucleotide representing an element of the consequent part. Sequences containing identical short sequences are stored in the same test tube.
Gel electrophoresis is performed separately on the contents of each test tube. This step isolates the longest sequence (i.e. the sequence with the highest membership value attached to it). After gel electrophoresis, the longest sequences are extracted from each gel.
One sequence is extracted from each of the six gels after performing gel electrophoresis. The obtained sequences are of the highest length of the corresponding test tube:
The blunt ends of the double-stranded DNA sequences selected in the previous step are ligated by T4 DNA ligase. The resultant sequences after ligation have different lengths. To get the resultant sequence, the following steps are performed:
i. Apply affinity purification to make sure that each short sequence is present at least once in every sequence.
ii. Perform gel electrophoresis on the above sequences of different lengths to extract the sequences with a length of 45 to 55 bp.
iii. The sequence selected in the previous step is the most probable result of the problem. The order of the bases of the resultant sequence can be obtained from a sequencer.
The final sequence representing the consequence of the observed antecedent states (i.e. C') is:
Membership values of the linguistic terms:
Membership value of under weight (CTAAG) = 0.6.
Membership value of over weight (TAGCT) = 0.5.
Membership value of normal weight (AGGAA) = 0.4.
Membership value of obesity (I) (GCGCG) = 0.3.
Membership value of obesity (II) (CTAAG) = 0.1.
Membership value of morbid obesity (CTAAG) = 0.
We can apply the present approach to fuzzy reasoning based on DNA computing to different areas of pattern classification, object recognition, control problems, weather forecasting, etc.
Classification of SODAR Data by DNA Computing
We attempt to classify SODAR data by similarity-based fuzzy reasoning using synthetic fuzzy DNA.
In this experiment, we have encoded the expert rules in the form of DNA sequences. In the problem, the values of SSG (Ss') and HFG (Hf') are given. We have to classify the SODAR pattern, i.e. to compute the possibility that the data lie in thermal plumes (C1) or inversion (C2).
References
Adleman, L. (1994), "Molecular computation of solutions to combinatorial problems", Science, Vol. 266, pp. 1021-1024.
Adleman, L., Rothemund, P., Roweis, S. and Winfree, E. (1996), "On applying molecular computation to the Data Encryption Standard", 2nd DIMACS Workshop on DNA Based Computers, Princeton, pp. 28-48.
Head, T. (1987), "Formal language theory and DNA: an analysis of the generative capacity of recombinant behaviors", Bulletin of Mathematical Biology, Vol. 49, pp. 737-759.
Head, T. (1998), "Hamiltonian paths and double stranded DNA", in Computing with Bio-molecules: Theory and Experiments, ed. Gheorghe Paun, Springer, pp. 80-92.
Kari, L. (1997), "DNA computing: The arrival of biological mathematics", The Mathematical Intelligencer, Vol. 19, pp. 9-22.
Lipton, R. (1995), "DNA solution of hard computational problems", Science, Vol. 268, pp. 542-545.
Lipton, R.J., Landweber, L.F. and Rabin, M.O. (1997), "DNA Based Computers III", DIMACS Workshop, June 23-27, University of Pennsylvania (eds Rubin, H. and Wood, D.H.), pp. 161-172 (American Mathematical Society, Providence, Rhode Island).
Liu, Q. et al. (2000), "DNA Computing on Surfaces", Nature, Vol. 403, pp. 175-179.
Mizumoto, M. (1985), "Extended fuzzy reasoning", Approximate Reasoning in Expert Systems (eds Gupta, M.M., Kandel, A., Bandler, W. and Kiszka, J.B.), Elsevier Science Publishers B.V. (North-Holland), pp. 71-85.
Mizumoto, M. (1985), "Fuzzy reasoning for 'If...Then...Else...' under new compositional rules of inference", Management Decision Support Systems Using Fuzzy Sets and Possibility Theory (eds Kacprzyk, J. and Yager, R.R.), Verlag TUV Rheinland, W. Germany, pp. 229-239.
Ray, K.S. and Chatterjee, P. (2010), "Approximate reasoning on a DNA-Chip", International Journal of Intelligent Computing and Cybernetics, Vol. 3, No. 3, pp. 514-553.
Ray, K.S. and Mondal, M. (2010), "Similarity-based Fuzzy Reasoning by DNA Computing", International Journal of Bio-inspired Computation, Vol. 3, No. 2, pp. 112-122.
Ray, K.S. and Mondal, M. (2011), "Classification of SODAR data Using DNA Computing", New Mathematics and Natural Computation, Vol. 7, No. 3, pp. 413-432.
Ray, K.S. and Mondal, M. (2011), "Fuzzy Molecular Automaton Using Splicing Theory", International Journal of Bio-inspired Computation, Vol. 3, No. 5, pp. 320-330.
Ray, K.S. and Mondal, M. (2012), "Reasoning with Disposition using DNA Tweezers", International Journal of Bio-inspired Computation, Vol. 4, No. 5, pp. 302-318.
Ruben, A.J. and Landweber, L.F. (2000), "The Past, Present and Future of Molecular computing", Nature Rev. Mol. Cell Biol., Vol. 1, pp. 69-72.
Winfree, E., Liu, F.R., Wenzler, L.A. and Seeman, N.C. (1998), "Design and self-assembly of two-dimensional DNA crystals", Nature, Vol. 394, No. 6693, pp. 539-544.
Zhang, X., Wang, Y., Cui, G., Niu, Y. and Xu, J. (2009), "Application of a novel IWO to the design of encoding sequences for DNA computing", Computers & Mathematics with Applications, Vol. 57, Nos. 11-12, pp. 2001-2008.
THANK YOU ..