BIOINFORMATICS LAB
SUBMITTED TO:
MS. NISHTHA PANDEY
SUBMITTED BY:
GAGANJIT KAUR
SECTION: A77E2
ROLL NO: 27
ORF FINDER
QUES: What is ORF?
ANS: An open reading frame (ORF) is a stretch of DNA sequence that begins with a start codon and ends
with a stop codon in the same reading frame. In a gene, the ORF lies between the start codon
(initiation codon) and the stop codon (termination codon).
For example, if a portion of a genome has been sequenced (e.g. 5'-UCUAAAAUGGGUGAC-3'), and it is
known to contain a gene, ORFs can be located by examining each of the three possible reading frames
(or six in double-stranded DNA, three on each strand). This is one of the two possible mRNA sequences
of the transcript, and it can be read in three different ways:
1. UCU AAA AUG GGU GAC
2. CUA AAA UGG GUG AC
3. UAA AAU GGG UGA C
The last reading frame contains stop codons (UAA and UGA), unlike the first two. Thus, only two of the
three reading frames are "open". Since there is a start codon (AUG) in the first open reading frame,
it is very likely that the first ORF is the correct one.
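The frame-by-frame reading described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how the ORF Finder tool itself is implemented; the function names reading_frames and describe_frames are made up for this example, and only the three forward frames of the RNA string are examined.

```python
# Illustrative sketch only: function names are hypothetical, not part of ORF Finder.
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"


def reading_frames(rna):
    """Split an RNA string into its three forward reading frames.

    Incomplete codons at the 3' end are dropped.
    """
    frames = []
    for offset in range(3):
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames


def describe_frames(rna):
    """Report, for each frame, whether it is open and whether it has a start codon."""
    report = []
    for number, codons in enumerate(reading_frames(rna), start=1):
        stops = [codon for codon in codons if codon in STOP_CODONS]
        report.append({
            "frame": number,
            "codons": codons,
            "open": not stops,                  # open = no stop codon in the frame
            "has_start": START_CODON in codons,
            "stops": stops,
        })
    return report


if __name__ == "__main__":
    for entry in describe_frames("UCUAAAAUGGGUGAC"):
        status = "open" if entry["open"] else "closed by " + ", ".join(entry["stops"])
        start = "contains AUG" if entry["has_start"] else "no AUG"
        print(entry["frame"], " ".join(entry["codons"]), "|", status, "|", start)
```

For double-stranded DNA the same check would be repeated on the reverse complement to cover the other three frames.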
A dot plot compares two sequences graphically. The two sequences are placed on the axes of a
rectangular image and (in the simplest form of dot plot) a dot is placed on the image wherever
the sequences are similar.
Where the two sequences have substantial regions of similarity, many dots align to form
diagonal lines. It is therefore possible to see at a glance where there are local regions of
similarity as these will have long diagonal lines. It is also easy to see other features such as
repeats (which form parallel diagonal lines), and insertions or deletions (which form breaks or
discontinuities in the diagonal lines).
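The simplest form of dot plot can be sketched as a text grid. The sketch below assumes "similarity" means an identical character at the two positions; the function name dot_matrix and the example sequence are chosen only for illustration.

```python
# Illustrative sketch: a character-by-character dot plot drawn as text.
def dot_matrix(seq_x, seq_y):
    """Return one text row per residue of seq_y.

    Columns follow seq_x; '*' marks an identical residue, '.' a mismatch.
    """
    return ["".join("*" if a == b else "." for a in seq_x) for b in seq_y]


if __name__ == "__main__":
    seq = "ACGTACGT"  # toy sequence containing the repeat ACGT twice
    print("  " + seq)
    for residue, row in zip(seq, dot_matrix(seq, seq)):
        print(residue, row)
```

Comparing the toy sequence with itself gives a full main diagonal, and the repeated ACGT gives shorter parallel diagonals offset by four positions.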
Dottup looks for places where words (tuples) of a specified length have an exact match in both
sequences and draws a diagonal line over the position of these words. This is a fast, but not
especially sensitive way of creating dotplots. It is an acceptable method for displaying regions of
substantial similarity between two sequences.
Using a longer word (tuple) size displays less random noise and runs extremely quickly, but is less
sensitive. Shorter word sizes are more sensitive to shorter or fragmentary regions of similarity,
but they also display more random points of similarity (noise) and run more slowly.
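The word-matching idea and the effect of word size can be sketched as follows. This is not the EMBOSS dottup source code, only a minimal illustration of exact word (tuple) matching; the function name word_matches and the two toy sequences are assumptions made for this example.

```python
# Illustrative sketch of exact word (tuple) matching; not the dottup implementation.
from collections import defaultdict


def word_matches(seq_a, seq_b, word_size):
    """Return sorted (i, j) pairs where the word starting at seq_a[i]
    exactly equals the word starting at seq_b[j]."""
    # Index every word of seq_b by its start positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        index[seq_b[j:j + word_size]].append(j)
    # Look up every word of seq_a in that index.
    hits = []
    for i in range(len(seq_a) - word_size + 1):
        for j in index.get(seq_a[i:i + word_size], []):
            hits.append((i, j))
    return sorted(hits)


if __name__ == "__main__":
    seq_a = "ACGTTGCAACGT"  # toy sequences, not real data
    seq_b = "TTACGTTGCA"
    for word_size in (1, 2, 4, 6):
        hits = word_matches(seq_a, seq_b, word_size)
        print("word size", word_size, "->", len(hits), "matching positions")
```

Every match found with a longer word is also found with a shorter one, so raising the word size can only remove points from the plot; the points removed are mostly short chance matches (noise), but short genuine similarities are lost as well.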
STEP: 1 Open Swiss-Prot, enter the protein name "collagen", click on
Search and view the results.
STEP: 2 Click on the accession number of one entry and open its sequence in
FASTA format.
STEP: 3 Open BLASTp.
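The FASTA format opened in step 2 is plain text: a header line beginning with ">" followed by one or more lines of sequence. A minimal sketch of reading such text is shown below. The record used here is a made-up toy example, not a real Swiss-Prot entry, and the function name parse_fasta is chosen only for illustration.

```python
# Illustrative sketch: reading FASTA text into (header, sequence) pairs.
def parse_fasta(text):
    """Parse FASTA-formatted text and return a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record for illustration only; the accession and sequence are NOT real data.
TOY_FASTA = """>sp|X00000|TOY_EXAMPLE Hypothetical collagen-like fragment
GPPGPPGLPGPPGAPGPQ
GFQGPPGEPGEPGAS
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(TOY_FASTA):
        print(header)
        print(len(sequence), "residues:", sequence)
```

The joined sequence string is what would be pasted into the BLASTp query box in step 3.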
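The BLASTp alignment shows two sequences: one is the query and the other is the subject. The
middle line between them shows the similarity between the two sequences. A letter in the middle
line marks an identity (the same residue in both sequences), and a plus sign (+) marks a
positive: a pair of different residues with similar physical properties, for example both
hydrophobic in nature. A blank in the middle line means the aligned residues are not similar.
A gap (shown as a dash in one sequence) means a residue has no counterpart in the other
sequence because of an insertion or deletion.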
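How the middle line is built from the two aligned sequences can be sketched as below. One important assumption: BLASTp decides which pairs get a plus sign from a substitution matrix (BLOSUM62 by default), whereas this sketch uses a few coarse residue groups as a simplified stand-in, so its plus signs will not always agree with real BLAST output. The function name midline, the groups and the aligned pair are all illustrative.

```python
# Illustrative sketch of a BLAST-style middle line.
# ASSUMPTION: coarse physico-chemical groups stand in for the substitution
# matrix that BLASTp really uses to decide which pairs count as positives.
SIMILAR_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("STNQ"),   # polar, uncharged
    set("DE"),     # acidic
    set("KRH"),    # basic
]


def midline(query, subject):
    """Return the middle line for two aligned sequences of equal length.

    Identical residues are echoed, similar residues become '+', and
    mismatches and gap positions ('-') are left blank.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            marks.append(" ")
        elif q == s:
            marks.append(q)
        elif any(q in group and s in group for group in SIMILAR_GROUPS):
            marks.append("+")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    query = "GPAGLK-DE"    # toy aligned pair, not real BLAST output
    subject = "GPVGIRSDQ"
    print("Query  ", query)
    print("       ", midline(query, subject))
    print("Sbjct  ", subject)
```

In this toy pair the middle line carries four letters (identities), three plus signs and two blanks, one of which sits under the gap in the query.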