
Reasoning and Classification based on DNA Computing

Mandrita Mondal
Natural Computing Lab, Electronics and Communication Sciences Unit
Indian Statistical Institute, Kolkata
link: http://naturalcomputinglab.weebly.com/

What is DNA?
DNA (deoxyribonucleic acid) is a molecule that encodes genetic instructions. It is present in living organisms and in many viruses. In 1953, James D. Watson and Francis Crick described the double-helix molecular structure of DNA.

DNA is a long polymer made from repeating units called deoxyribonucleotides (or nucleotides). Each nucleotide monomer consists of:
1. a 2'-deoxyribose sugar
2. a nitrogenous base
3. a phosphate group
The backbone of DNA is made of alternating sugar and phosphate groups, with the nucleobases attached to the sugars.

Nucleobases are of four types:
1. Adenine (A)
2. Guanine (G)
3. Cytosine (C)
4. Thymine (T)

DNA is usually composed of two polynucleotide chains twisted around each other in the form of a double helix. The two strands run in opposite directions and are therefore anti-parallel: one strand runs from 5' to 3' while its partner runs from 3' to 5'.

Watson-Crick Complementarity:
Base pairs formed between specific nucleobases are the building blocks of the DNA double helix. Adenine-Thymine and Guanine-Cytosine are the two Watson-Crick base pairs, held together by hydrogen bonds.
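As a minimal illustration of complementarity, the Python sketch below treats Watson-Crick pairing as a base-to-base lookup. The helper names `complement`, `reverse_complement` and `pairs_with` are hypothetical, and the sketch ignores real-world effects such as mismatches or secondary structure.

```python
# Watson-Crick pairs: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base complement; a 5'->3' input yields its partner written 3'->5'."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """The partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


def pairs_with(top: str, bottom: str) -> bool:
    """True if `bottom` (3'->5') is the exact Watson-Crick partner of `top` (5'->3')."""
    return len(top) == len(bottom) and complement(top) == bottom.upper()


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
print(pairs_with("ATGC", "TACG"))  # True
```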

Operations on DNA strings:
1. Synthesis
2. Melting and Annealing
3. Amplification
4. Separation
5. Extraction
6. Cutting
7. Ligation
8. Substituting
9. Marking
10. Destroying
11. Reading

DNA Computing
DNA computing is an emerging interdisciplinary field that uses the four DNA bases (A, T, G, C) to encode information and perform computation. In 1959, Richard P. Feynman delivered a seminal lecture, "There's Plenty of Room at the Bottom", in which he discussed the problem of manipulating and controlling things on a small scale.

In 1994, Leonard Adleman demonstrated a proof of concept of DNA computing by solving a seven-vertex instance of the Hamiltonian Path Problem.

Advantages of DNA computing:
1. Massive parallelism
2. Potential for information storage
3. Speed
4. Energy efficiency

Adleman's Experiment:
Solving the Hamiltonian Path Problem by DNA computing

Step 1: Generate random paths through the graph. Let v1 be encoded by the sequence 5' TATCGGATCGGTATATCCGA 3' and v2 by the sequence 5' GCTATTCGAGCTTAAAGCTA 3'. Then the DNA sequence representing the directed edge e12 is 3' CATATAGGCTCGATAAGCTC 5', the complement of the 3' half of v1 followed by the 5' half of v2. Paths form through hybridization and ligation reactions.
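The edge construction can be checked in a few lines. This hedged sketch follows the convention on this slide (vertex strands written 5' to 3', the edge strand written 3' to 5' as the complement of the 3' half of v1 joined to the 5' half of v2); `edge_strand` is a hypothetical helper. The computed edge is 20 bases long, matching the 20-base vertex encodings.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    return "".join(PAIR[base] for base in seq)


def edge_strand(v_from: str, v_to: str) -> str:
    """Edge v_from -> v_to, following the slide's convention: the complement
    (written 3'->5') of the 3' half of v_from joined to the 5' half of v_to."""
    junction = v_from[len(v_from) // 2:] + v_to[: len(v_to) // 2]  # 5'->3'
    return complement(junction)


v1 = "TATCGGATCGGTATATCCGA"  # 5'->3'
v2 = "GCTATTCGAGCTTAAAGCTA"  # 5'->3'
e12 = edge_strand(v1, v2)
print("3'", e12, "5'")  # 3' CATATAGGCTCGATAAGCTC 5'

# The edge acts as a splint: it anneals across the v1|v2 junction (bases 10..29
# of v1+v2), holding the two vertex strands end to end so ligase can join them.
print(complement((v1 + v2)[10:30]) == e12)  # True
```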

Step 2: Keep only those paths that begin with vin and end with vout (here vin = v0 and vout = v6). The product of step 1 was amplified by PCR using v0 and the complementary sequence of v6 as primers. Thus, only those sequences encoding paths that began with vertex 0 and ended with vertex 6 were amplified.

Step 3: If the graph has n vertices, then keep only those paths that enter exactly n vertices
Agarose gel electrophoresis was performed on the product of step 2, and the DNA in the 140 base-pair (bp) band was extracted. Since each vertex is encoded by 20 bases, these 140 bp sequences (7 × 20 = 140) represented the paths that entered exactly seven vertices.

Step 4: Keep only those paths that enter all of the vertices of the graph at least once

The product of step 3 was affinity-purified with the complementary sequence of v1 as a probe, so only those single-stranded DNA molecules that contained the sequence v1 were retained. This process was repeated successively with the complementary sequences of the vertices vi (2 ≤ i ≤ 5).

Step 5: If any path remains, say Yes; otherwise, say No. The product of step 4 was amplified by PCR and run on a gel. The Hamiltonian path in Adleman's graph is shown in the accompanying figure.
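Steps 1 to 5 can be mimicked in silico as a sanity check of the logic. In this sketch a "tube" is just a list of walks: exhaustive enumeration of walks stands in for random ligation, and each list filter stands in for one laboratory step. The graph `G` is hypothetical (Adleman's graph is not reproduced here), and `OLIGO_LEN = 20` follows the 20-base encodings of Step 1.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]
Path = Tuple[int, ...]
OLIGO_LEN = 20  # bases per vertex, as in the 20-base encodings of Step 1


def all_walks(graph: Graph, max_vertices: int) -> List[Path]:
    """Step 1 stand-in: every walk (revisits allowed) of up to max_vertices vertices."""
    walks: List[Path] = [(v,) for v in graph]
    frontier = list(walks)
    while frontier:
        extended = [w + (u,) for w in frontier if len(w) < max_vertices
                    for u in graph[w[-1]]]
        walks.extend(extended)
        frontier = extended
    return walks


def adleman(graph: Graph, v_in: int, v_out: int) -> List[Path]:
    n = len(graph)
    tube = all_walks(graph, max_vertices=n + 2)
    # Step 2: PCR with primers for v_in and (the complement of) v_out.
    tube = [w for w in tube if w[0] == v_in and w[-1] == v_out]
    # Step 3: keep the gel band of n * OLIGO_LEN bp (7 * 20 = 140 bp for Adleman).
    tube = [w for w in tube if len(w) * OLIGO_LEN == n * OLIGO_LEN]
    # Step 4: affinity purification, one vertex at a time.
    for v in graph:
        tube = [w for w in tube if v in w]
    # Step 5: whatever remains is a Hamiltonian path.
    return tube


# Hypothetical 5-vertex directed graph (not Adleman's 7-vertex graph).
G = {0: [1, 2], 1: [2, 3], 2: [1, 3], 3: [1, 4], 4: []}
paths = adleman(G, v_in=0, v_out=4)
print("Yes" if paths else "No", sorted(paths))
# Yes [(0, 1, 2, 3, 4), (0, 2, 1, 3, 4)]
```

The enumeration is exponential in the walk length, which is exactly the cost that the massive parallelism of the test tube absorbs in the laboratory version.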

Similarity-Based Fuzzy Reasoning by DNA Computing
Fuzzy logic resembles human decision making in its ability to work from approximate data and still find precise solutions. We attempt to realize the basic approach to similarity-based fuzzy reasoning with synthetic fuzzy DNA, replacing the logical aspect of fuzzy reasoning by DNA chemistry.

Applicable Form of Fuzzy Reasoning:
A generalized form of fuzzy reasoning consists of several fuzzy conditional propositions combined with else:
Premise 1: If X is A1 and Y is B1 then Z is C1 else
Premise 2: If X is A2 and Y is B2 then Z is C2 else
...
Premise n: If X is An and Y is Bn then Z is Cn else
Premise n+1: If X is A' and Y is B'
_________________________________________
Consequence: Z is C'

We interpret else as union (max) or intersection (min), whichever is valid for the particular fuzzy implication used.
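To make the two readings of else concrete, the sketch below represents a fuzzy set over a discrete universe as a Python dict and folds the conclusions of several rules with max (union) or min (intersection). The conclusions `c1` and `c2` are hypothetical, and treating an element missing from one set as having membership 0 is an assumption.

```python
from functools import reduce
from typing import Dict, List

FuzzySet = Dict[str, float]  # element of the discrete universe -> membership value


def combine(a: FuzzySet, b: FuzzySet, mode: str) -> FuzzySet:
    """Element-wise union ('max') or intersection ('min') of two fuzzy sets."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    op = max if mode == "max" else min
    return {k: op(a.get(k, 0.0), b.get(k, 0.0)) for k in sorted(set(a) | set(b))}


def combine_else(conclusions: List[FuzzySet], mode: str = "max") -> FuzzySet:
    """Fold the conclusions of all rules joined by 'else'."""
    return reduce(lambda x, y: combine(x, y, mode), conclusions)


# Hypothetical conclusions of two rules over the elements e1, e2, e3.
c1 = {"e1": 0.2, "e2": 0.7, "e3": 1.0}
c2 = {"e1": 0.5, "e2": 0.4, "e3": 0.0}
print(combine_else([c1, c2], "max"))  # {'e1': 0.5, 'e2': 0.7, 'e3': 1.0}
print(combine_else([c1, c2], "min"))  # {'e1': 0.2, 'e2': 0.4, 'e3': 0.0}
```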

We will realize this problem with a simple example of Height (Ht), Weight (Wt) and Body Mass Index (BMI).

Let a universe of discourse be height (Ht).

Quantization of Height

Each discrete element of the quantized universe Ht is represented by a five-base DNA sequence. Each segment of the discrete universe is given a linguistic term such as very short, very tall, etc.

Quantization of Weight

Quantization of BMI

Representation of the fuzzy sets of Ht with their membership functions

Representation of the fuzzy sets of Wt with their membership functions

Representation of the fuzzy sets of BMI with their membership functions

Representation of the DNA sequences with their membership values

Statement of the problem:
Premise 1: If A1 = Medium Ht (I) and B1 = Very Light (II) then C1 = Under Wt else
Premise 2: If A2 = Short (I) and B2 = Very Heavy (I) then C2 = Obesity (Class II) else
Premise 3: If A3 = Very Tall (I) and B3 = Heavy (I) then C3 = Normal Wt else
Premise 4: If A4 = Very Short (I) and B4 = Heavy (II) then C4 = Morbid Obesity else
Premise 5: If A5 = Short (II) and B5 = Very Heavy (II) then C5 = Obesity (Class II) else
Premise 6: If A6 = Very Short (II) and B6 = Medium Wt (II) then C6 = Obesity (Class I) else
Premise 7: If A7 = Tall (I) and B7 = Very Heavy (I) then C7 = Over Wt else
Premise 8: If A8 = Very Short (I) and B8 = Light (I) then C8 = Obesity (Class I) else
Premise 9: If A9 = Very Tall (II) and B9 = Medium Wt (I) then C9 = Under Wt
Premise 10: If A' = Medium Ht (II) and B' = Light (II)
_________________________________________
Consequence: C' = ?

Fuzzy DNA: DNA sequences representing the elements of a discrete universe together with their corresponding membership values. Example: the fuzzy DNA representing Medium Ht (I).
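Later steps of this approach express membership through strand length (the longest strand carries the highest membership, and memberships are "reduced by 3 bases"). The sketch below assumes, purely for illustration, that each element is a five-base code followed by a tail of round(10 × μ) filler bases. The tail scale, the filler base `T` and the element codes are hypothetical, not the encoding used in the slides.

```python
from typing import List, Tuple

ELEMENT_LEN = 5      # each element of the discrete universe is a five-base code
BASES_PER_UNIT = 10  # hypothetical: one tail base per 0.1 of membership
FILLER = "T"         # hypothetical filler base for the membership tail


def encode_fuzzy_dna(fuzzy_set: List[Tuple[str, float]]) -> str:
    """Concatenate (element code + membership tail) for each (code, mu) pair."""
    strand = []
    for code, mu in fuzzy_set:
        if len(code) != ELEMENT_LEN:
            raise ValueError(f"element code {code!r} must have {ELEMENT_LEN} bases")
        if not 0.0 <= mu <= 1.0:
            raise ValueError(f"membership {mu} outside [0, 1]")
        strand.append(code + FILLER * round(mu * BASES_PER_UNIT))
    return "".join(strand)


# Hypothetical element codes and memberships (not the slide's Medium Ht (I) data).
example = [("ACGCA", 0.3), ("CGAGC", 1.0), ("GACAG", 0.5)]
dna = encode_fuzzy_dna(example)
print(dna, len(dna))  # 33 bases: 3 codes * 5 + tails of 3 + 10 + 5
```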

Algorithm

Implementation of the algorithm: All the rules of the problem and the observed data are encoded as double-stranded DNA sequences.

From the set of rules, a limited number of rules is extracted to draw the consequence, depending on the degree of similarity between the antecedent part of each rule and the observed data of the corresponding domain.

Similarity measurement:
Suppose we want to determine which primary fuzzy set among "short (II)", "very short (II)" and "medium height (I)" is similar to the set "short (I)". First, encode all the sequences. Let the fuzzy DNA sequence for the primary fuzzy set short (I) be:

First, the desired subsequence is amplified by PCR with specific primers. The amplified subsequence runs from the start of the fuzzy DNA sequence representing the primary fuzzy set up to the position in that sequence where the short DNA sequence CTGGA occurs.

We follow the same procedure for the primary fuzzy sets short (II), very short (II) and medium height (I).
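The amplification step can be pictured as cutting a strand at a marker. This idealised sketch assumes that the PCR product runs from the 5' end of the fuzzy DNA strand through the first occurrence of CTGGA; whether the marker itself is counted in the product is an assumption, and the input strand is hypothetical.

```python
MARKER = "CTGGA"  # short sequence marking the end of the amplified region (from the slide)


def amplicon(fuzzy_dna: str, marker: str = MARKER) -> str:
    """Idealised PCR product: from the 5' start of the strand through the first marker."""
    end = fuzzy_dna.find(marker)
    if end < 0:
        raise ValueError("marker not found; no product would be amplified")
    return fuzzy_dna[: end + len(marker)]


# Hypothetical fuzzy DNA strand (not taken from the slides).
strand = "ACGCATTTCGAGCTTTTTCTGGAGACAG"
product = amplicon(strand)
print(product, len(product), "bp")  # ACGCATTTCGAGCTTTTTCTGGA 23 bp
```

Because membership is carried by length, two fuzzy sets with similar memberships up to the marker yield amplicons of similar length, which is what the gel comparison that follows exploits.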

The amplified DNA sequences derived from the fuzzy DNA sequences short (I), short (II), very short (II) and medium height (I) are run through an agarose gel, which separates them by length (gel electrophoresis). The lengths of the amplified sequences are:
For short (I): 34 bp
For short (II): 47 bp
For very short (II): 20 bp
For medium height (I): 62 bp

The differences in length are:
short (I) and short (II): (47 - 34) bp = 13 bp
short (I) and very short (II): (34 - 20) bp = 14 bp
short (I) and medium height (I): (62 - 34) bp = 28 bp

The smaller the difference in length, the greater the similarity. The threshold on the difference between the lengths of fuzzy DNA sequences is taken as 15 bp, so short (II) and very short (II) are similar to short (I), while medium height (I) is not. Based on this similarity measure, we select a rule if at least one antecedent clause of that rule is within the similarity limit. Once a rule is selected, we modify its consequence according to the highest degree of dissimilarity between the observed data and any of the remaining antecedent clauses of that rule.
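The selection criterion can be written down directly. The sketch below uses the four amplicon lengths and the 15 bp threshold from these slides; `select_rule` is a hypothetical helper, and the clause differences passed to it in the last line are invented for illustration.

```python
from typing import List, Optional, Tuple

THRESHOLD_BP = 15  # from the slide

# Amplicon lengths (bp) reported on the slide.
amplified_bp = {"short (I)": 34, "short (II)": 47, "very short (II)": 20, "medium height (I)": 62}


def length_difference(a: str, b: str) -> int:
    return abs(amplified_bp[a] - amplified_bp[b])


def is_similar(diff_bp: int, threshold: int = THRESHOLD_BP) -> bool:
    # Assumption: a difference equal to the threshold still counts as "within the limit".
    return diff_bp <= threshold


def select_rule(clause_diffs: List[int]) -> Tuple[bool, Optional[int]]:
    """clause_diffs: length differences between the observed data and each antecedent
    clause of one rule. Returns (selected?, largest difference among the non-similar
    clauses, which drives the modification of the consequence)."""
    if not any(is_similar(d) for d in clause_diffs):
        return False, None
    remaining = [d for d in clause_diffs if not is_similar(d)]
    return True, max(remaining, default=0)


for other in ("short (II)", "very short (II)", "medium height (I)"):
    d = length_difference("short (I)", other)
    print(f"{other}: {d} bp, similar={is_similar(d)}")

print(select_rule([10, 27]))  # hypothetical clause differences -> (True, 27)
```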

Thus, we can derive the sequence of C' (the consequence of the given antecedent data A' and B') from the sequences C1, C7, C8 and C9. The membership values of the elements of the consequent domain are reduced depending on the degree of dissimilarity of the remaining antecedent clauses (other than the similar clauses).

Reduction in membership value (bases) due to length difference of sequences

For C1, the membership values of the elements are reduced by 3 bases, as the difference between B' and B1 is 27 bp.
C1 before modification:

C1 after modification:

For C7, the membership values of the elements are reduced by 5 bases, as the difference between B' and B7 is 43 bp. For C8, the membership values of the elements are reduced by 7 bases, as the difference between A' and A8 is 65 bp. For C9, the membership values of the elements are reduced by 4 bases, as the difference between A' and A9 is 34 bp.
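The four reductions listed here (27 bp to 3 bases, 34 bp to 4, 43 bp to 5, 65 bp to 7) happen to fit the rule "one base per started 10 bp", i.e. ⌈d/10⌉. The sketch below encodes that hypothesis and applies it to membership tails measured in bases. It is an inference from these four data points only; the published mapping may well be a lookup table.

```python
import math
from typing import List


def reduction_bases(diff_bp: int) -> int:
    """Hypothesised mapping: one base for every started 10 bp of length difference."""
    return math.ceil(diff_bp / 10)


def reduce_tails(tails: List[int], k: int) -> List[int]:
    """Shorten every membership tail by k bases, never below zero."""
    return [max(0, t - k) for t in tails]


# The four cases stated on the slide: rule -> (length difference in bp, reduction in bases).
reported = {"C1": (27, 3), "C7": (43, 5), "C8": (65, 7), "C9": (34, 4)}
for rule, (diff, bases) in reported.items():
    print(rule, diff, "bp ->", reduction_bases(diff), "bases; slide:", bases)

# Hypothetical tails (in bases) shortened by the C1 reduction.
print(reduce_tails([6, 10, 2], reduction_bases(27)))  # [3, 7, 0]
```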

Affinity purification is conducted with the complementary sequence of each DNA oligonucleotide representing an element of the consequent part. Sequences that carry the same short (element) sequence are collected in the same test tube.

Sequences of test tube 1:

Sequences of test tube 2:

Sequences of test tube 3:

Sequences of test tube 4:

Sequences of test tube 5:

Sequences of test tube 6:

Gel electrophoresis is performed separately on the sequences of each test tube, to find the longest sequence (i.e. the sequence carrying the highest membership value). After gel electrophoresis, the longest sequence is extracted from each gel.

One sequence is thus extracted from each of the six gels; each is the longest sequence of its test tube:

The blunt-ended double-stranded DNA sequences selected in the previous step are ligated with T4 DNA ligase. The resulting ligation products have different lengths. To obtain the result, the following steps are performed:
i. Apply affinity purification to make sure that each short (element) sequence is present at least once in every sequence.
ii. Perform gel electrophoresis on these sequences of different lengths and extract those with a length of 45 to 55 bp.
iii. The sequence selected in the previous step is the most probable result of the problem. The order of its bases can be obtained with a sequencer.
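The test-tube stage behaves like an element-wise maximum: affinity purification groups strands by element code, and gel electrophoresis keeps the longest (highest-membership) strand per group, which matches the union (max) reading of else. The sketch below assumes each strand is described by an element code plus a membership-tail length in bases; E1 to E6 and the tail lengths are hypothetical placeholders, while the 45 to 55 bp window is the one quoted in step ii.

```python
from typing import Dict, List

ELEMENT_LEN = 5  # five-base element codes


def longest_per_tube(consequents: List[Dict[str, int]]) -> Dict[str, int]:
    """Group strands by element code (one test tube per code); keep the longest tail."""
    tubes: Dict[str, List[int]] = {}
    for consequent in consequents:
        for code, tail in consequent.items():
            tubes.setdefault(code, []).append(tail)
    return {code: max(tails) for code, tails in tubes.items()}


def ligated_length(winners: Dict[str, int]) -> int:
    """Length of one strand that ligates every tube winner exactly once."""
    return sum(ELEMENT_LEN + tail for tail in winners.values())


# Hypothetical modified consequents; E1..E6 stand in for the six element codes.
c_a = {"E1": 6, "E2": 2, "E3": 4, "E4": 0, "E5": 1, "E6": 0}
c_b = {"E1": 3, "E2": 5, "E3": 1, "E4": 3, "E5": 0, "E6": 0}
winners = longest_per_tube([c_a, c_b])
length = ligated_length(winners)
print(winners, length, 45 <= length <= 55)  # ... 49 True
```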

The final sequence representing the consequence of the observed antecedent states (i.e. C') is:

Linguistic variable (code): membership value
Under weight (CTAAG): 0.6
Over weight (TAGCT): 0.5
Normal weight (AGGAA): 0.4
Obesity (I) (GCGCG): 0.3
Obesity (II) (CTAAG): 0.1
Morbid obesity (CTAAG): 0
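Reading off the result then amounts to choosing the linguistic label with the highest membership value, which is how C' = Under weight follows from the values listed above. The sketch assumes max-membership selection; `membership_from_tail` and its ten-bases-per-unit scale are hypothetical.

```python
# Membership values of the consequent C' as listed on the slide.
memberships = {
    "under weight": 0.6,
    "over weight": 0.5,
    "normal weight": 0.4,
    "obesity (I)": 0.3,
    "obesity (II)": 0.1,
    "morbid obesity": 0.0,
}

BASES_PER_UNIT = 10  # hypothetical: tail length in bases = 10 * membership


def membership_from_tail(tail_bases: int) -> float:
    """Hypothetical decoding of a membership tail read from the sequencer."""
    return tail_bases / BASES_PER_UNIT


label = max(memberships, key=memberships.get)
print(label)                    # under weight
print(membership_from_tail(6))  # 0.6
```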

So, for the observed data A' = Medium Ht (II) and B' = Light (II), the rule base stated above yields the consequence C' = Under weight.

Comparative study of inferred consequence:

We can apply the present approach to fuzzy reasoning based on DNA computing to different areas of pattern classification, object recognition, control problems, weather forecasting, etc.

Classification of SODAR Data by DNA Computing
We attempt to classify SODAR data by similarity-based fuzzy reasoning using synthetic fuzzy DNA.

In this experiment, we have encoded the expert rules as DNA sequences. In the problem, the values of SSG (Ss') and HFG (Hf') are given, and we have to classify the SODAR pattern, i.e. compute the possibility that the data belong to thermal plumes (C1) or inversion (C2).

Time domain of the Classes

Frequency domain of the Classes

Scatter Plot of Thermal Plumes and Inversion

References
Adleman, L. (1994) "Molecular computation of solutions to combinatorial problems", Science, Vol. 266, pp. 1021-1024.
Adleman, L., Rothemund, P., Roweis, S. and Winfree, E. (1996) "On applying molecular computation to the Data Encryption Standard", 2nd DIMACS Workshop on DNA Based Computers, Princeton, pp. 28-48.
Head, T. (1987) "Formal language theory and DNA: an analysis of the generative capacity of recombinant behaviors", Bulletin of Mathematical Biology, Vol. 49, pp. 737-759.
Head, T. (1998) "Hamiltonian paths and double stranded DNA", in Computing with Bio-molecules: Theory and Experiments (ed. Paun, G.), Springer, pp. 80-92.
Kari, L. (1997) "DNA computing: the arrival of biological mathematics", The Mathematical Intelligencer, Vol. 19, pp. 9-22.
Lipton, R. (1995) "DNA solution of hard computational problems", Science, Vol. 268, pp. 542-545.

Lipton, R.J., Landweber, L.F. and Rabin, M.O. (1997) "DNA Based Computers III", DIMACS Workshop, June 23-27, University of Pennsylvania (eds. Rubin, H. and Wood, D.H.), pp. 161-172, American Mathematical Society, Providence, Rhode Island.
Liu, Q. et al. (2000) "DNA computing on surfaces", Nature, Vol. 403, pp. 175-179.
Mizumoto, M. (1985) "Extended fuzzy reasoning", in Approximate Reasoning in Expert Systems (eds. Gupta, M.M., Kandel, A., Bandler, W. and Kiszka, J.B.), Elsevier Science Publishers B.V. (North-Holland), pp. 71-85.
Mizumoto, M. (1985) "Fuzzy reasoning for "If...Then...Else..." under new compositional rules of inference", in Management Decision Support Systems Using Fuzzy Sets and Possibility Theory (eds. Kacprzyk, J. and Yager, R.R.), Verlag TÜV Rheinland, W. Germany, pp. 229-239.
Ray, K.S. and Chatterjee, P. (2010) "Approximate reasoning on a DNA-chip", International Journal of Intelligent Computing and Cybernetics, Vol. 3, No. 3, pp. 514-553.
Ray, K.S. and Mondal, M. (2010) "Similarity-based fuzzy reasoning by DNA computing", International Journal of Bio-inspired Computation, Vol. 3, No. 2, pp. 112-122.

Ray, K.S. and Mondal, M. (2011) "Classification of SODAR data using DNA computing", New Mathematics and Natural Computation, Vol. 7, No. 3, pp. 413-432.
Ray, K.S. and Mondal, M. (2011) "Fuzzy molecular automaton using splicing theory", International Journal of Bio-inspired Computation, Vol. 3, No. 5, pp. 320-330.
Ray, K.S. and Mondal, M. (2012) "Reasoning with disposition using DNA tweezers", International Journal of Bio-inspired Computation, Vol. 4, No. 5, pp. 302-318.
Ruben, A.J. and Landweber, L.F. (2000) "The past, present and future of molecular computing", Nature Reviews Molecular Cell Biology, Vol. 1, pp. 69-72.
Winfree, E., Liu, F. R., Wenzler, L.A. and Seeman, N.C. (1998) "Design and self-assembly of two-dimensional DNA crystals", Nature, Vol. 394, No. 6693, pp. 539-544.
Zhang, X., Wang, Y., Cui, G., Niu, Y. and Xu, J. (2009) "Application of a novel IWO to the design of encoding sequences for DNA computing", Computers & Mathematics with Applications, Vol. 57, No. 11-12, pp. 2001-2008.

THANK YOU ..
