Perl Tutorial

Working with DNA Sequences
#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Next, we print the DNA onto the screen
print $DNA;
# Finally, we'll specifically tell the program to exit.
exit;
Concatenating the DNA sequences
#!/usr/bin/perl -w
# Concatenating DNA
# Store two DNA fragments into variables called $DNA1
#and $DNA2
$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
# Print the DNA onto the screen
print "Here are the original two DNA fragments:\n\n";

print $DNA1, "\n";
print $DNA2, "\n\n";
# Concatenate the DNA fragments into a third variable and
# print them, using "string interpolation"
$DNA3 = "$DNA1$DNA2";
print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";
# An alternative way using the "dot operator":

# Concatenate the DNA fragments into a third variable and
# print them
$DNA3 = $DNA1 . $DNA2;

print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";
# Print the same thing without using the variable $DNA3
print "Here is the concatenation of the first two fragments (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;
TRANSCRIPTION: DNA -> RNA
#!/usr/bin/perl -w
# Transcribing DNA into RNA
# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Print the DNA onto the screen
print "Here is the starting DNA:\n\n";
print "$DNA\n\n";
# Transcribe the DNA to RNA by substituting all T's with U's.
$RNA = $DNA;
$RNA =~ s/T/U/g;
# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";
print "$RNA\n";
# Exit the program.

exit;
Reverse Complement
#!/usr/bin/perl -w
# Calculating the reverse complement of a strand of DNA
# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
# Print the DNA onto the screen
print "Here is the starting DNA:\n\n";
print "$DNA\n\n";
# Calculate the reverse complement
# First, copy the DNA into new variable $revcom
# (short for REVerse COMplement)
#
# It doesn't matter if we first reverse the string and then
# do the complementation; or if we first do the complementation
# and then reverse the string. Same result each time.
# So when we make the copy we'll do the reverse in the same statement.
$revcom = reverse $DNA;
# The DNA is now reversed. Next we need to complement the bases
# in $revcom: substitute every base by its complement.
# A->T, T->A, G->C, C->G
# Attempt 1:
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;
# Print the reverse complement DNA onto the screen
print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
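# Does this work? No. The substitutions run one after another, so
# s/T/A/g also rewrites the T's that s/A/T/g has just created, and
# s/C/G/g rewrites the C's created by s/G/C/g. The result contains
# only A's and G's.
#
# Attempt 2: start again from the reversed DNA and use tr///, which
# translates every character in a single pass.
# See the text for a discussion of tr///
$revcom = reverse $DNA;
$revcom =~ tr/ACGTacgt/TGCAtgca/;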
# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
print "\nThis time it worked!\n\n";
exit;
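The difference between the two attempts can be seen side by side in a short sketch. The subroutine names revcom_broken and revcom_tr and the test sequence AACGT are hypothetical, chosen only for illustration; the substitutions and the tr/// call are the same ones used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: Attempt 1, chained substitutions.
# Each later s///g also rewrites bases produced by an earlier one.
sub revcom_broken {
    my ($dna) = @_;
    my $revcom = reverse $dna;   # scalar assignment reverses the string
    $revcom =~ s/A/T/g;          # every A becomes T ...
    $revcom =~ s/T/A/g;          # ... then every T, old or new, becomes A
    $revcom =~ s/G/C/g;
    $revcom =~ s/C/G/g;
    return $revcom;
}

# Hypothetical helper: Attempt 2, a single-pass translation with tr///.
sub revcom_tr {
    my ($dna) = @_;
    my $revcom = reverse $dna;
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}

my $dna = 'AACGT';
print "chained s///: ", revcom_broken($dna), "\n";   # AGGAA
print "tr///:        ", revcom_tr($dna), "\n";       # ACGTT
```

Whatever the input, the chained version can only return A's and G's, which is why the first attempt fails.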

Reading Proteins in files

#!/usr/bin/perl -w
# Reading protein sequence data from a file
# The filename of the file containing the protein sequence data
$proteinfilename = 'Name_Of_your_sequence_file.txt';
# First we have to "open" the file, and associate
# a "filehandle" with it. We choose the filehandle
# PROTEINFILE for readability.
open(PROTEINFILE, '<', $proteinfilename) || die("cannot open file \"$proteinfilename\": $!\n");
# Now we do the actual reading of the protein sequence data
# from the file, by using the angle brackets < and > to get
# the input from the filehandle. We store the lines of the file
# in the array variable @protein.
@protein = <PROTEINFILE>;
# Now that we've got our data, we can close the file.
close PROTEINFILE;
# Print the protein onto the screen

print "Here is the protein:\n\n";
print @protein;
exit;
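The script above reads the whole file into an array at once. As a hedged alternative, the sketch below reads one line at a time and joins the lines into a single string. The subroutine name read_sequence and the sample peptide lines are hypothetical. To keep the example runnable without a real file, it assumes the data sits in a string and opens an in-memory filehandle on it; with a real file you would pass the filename to open instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle,
# strip the trailing newline, and return the lines joined together.
sub read_sequence {
    my ($fh) = @_;
    my $sequence = '';
    while ( my $line = <$fh> ) {
        chomp $line;
        $sequence .= $line;
    }
    return $sequence;
}

# Hypothetical sample data standing in for the contents of a sequence file.
my $file_contents = "MNIDDKL\nQSTVSGE\nMRQQDMISHDEL\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open( my $fh, '<', \$file_contents ) or die "cannot open in-memory file: $!";
my $protein = read_sequence($fh);
close $fh;

print "Here is the protein:\n\n$protein\n";
```

A lexical filehandle such as $fh is closed automatically when it goes out of scope, which is one reason it is often preferred over a bareword handle like PROTEINFILE.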
Pattern matching: Motifs and Loops
Proceed ONLY if condition is true...
Code layout:
if (condition) {
    do something
}
Finding Motifs
#!/usr/bin/perl -w
# if-elsif-else
$word = 'MNIDDKL';
# if-elsif-else conditionals
if ($word eq 'QSTVSGE') {
    print "QSTVSGE\n";
} elsif ($word eq 'MRQQDMISHDEL') {
    print "MRQQDMISHDEL\n";
} else {
    # Neither peptide matched
    print "\"$word\" is neither QSTVSGE nor MRQQDMISHDEL\n";
}
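The eq tests above only compare whole strings. Finding a motif inside a longer sequence needs the pattern-match operator =~ instead. The following is a minimal sketch, assuming a hypothetical helper has_motif and made-up motifs; it is not taken from the original tutorial.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the regular expression $motif
# matches anywhere inside $protein, otherwise 0.
sub has_motif {
    my ( $protein, $motif ) = @_;
    return $protein =~ /$motif/ ? 1 : 0;
}

# Made-up sequence and motifs for illustration only.
my $protein = 'MNIDDKLQSTVSGE';

foreach my $motif ( 'DDK', 'QS[TV]', 'WWW' ) {
    if ( has_motif( $protein, $motif ) ) {
        print "Found motif $motif\n";
    }
    else {
        print "Did not find motif $motif\n";
    }
}
```

Because the motif is interpolated into a regular expression, character classes such as [TV] work as expected.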
GC CONTENT
In PCR experiments, the GC content of primers is used to predict their annealing temperature
to the template DNA. A higher GC content indicates a higher melting temperature.
GC % = (G + C) / (A + G + C + T) x 100
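The GC content formula can be sketched directly in Perl. The subroutine name gc_percent is hypothetical, and the sketch assumes that only the letters A, C, G and T (in either case) count towards the total, with an empty sequence reported as 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: GC % = (G + C) / (A + G + C + T) x 100.
# tr/// with an empty replacement list counts characters without changing them.
sub gc_percent {
    my ($dna) = @_;
    my $gc    = ( $dna =~ tr/GCgc// );
    my $total = ( $dna =~ tr/ACGTacgt// );
    return 0 if $total == 0;    # assumption: avoid dividing by zero
    return 100 * $gc / $total;
}

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
printf "GC content: %.1f%%\n", gc_percent($DNA);
```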
Logic:
for each base in the DNA
    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done
print count_of_A, count_of_C, count_of_G, count_of_T

the script
#!/usr/bin/perl -w
# Determining frequency of nucleotides
# Get the name of the file with the DNA sequence data
$dna_filename = 'File_name.txt';
# Remove the newline from the DNA filename
# (only needed when the name is read from <STDIN>; harmless here)
chomp $dna_filename;
# open the file, or exit
open(DNAFILE, '<', $dna_filename) || die("Cannot open file \"$dna_filename\": $!\n");
# Read the DNA sequence data from the file, and store it
# into the array variable @DNA
@DNA = <DNAFILE>;
# Close the file
close DNAFILE;
# From the lines of the DNA file,

# put the DNA sequence data into a single string.
$DNA = join( '', @DNA);
# Remove whitespace
$DNA =~ s/\s//g;
# Now explode the DNA into an array where each letter of
# the original string is now an element in the array.
# This will make it easy to look at each position.
# Notice that we're reusing the variable @DNA for this purpose.
@DNA = split( '', $DNA );
# Initialize the counts.

# Notice that we can use scalar variables to hold numbers.
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$errors = 0;
# In a loop, look at each base in turn, determine which of

# the four types of nucleotides it is, and increment the
# appropriate count.
foreach $base (@DNA) {
    if ( $base eq 'A' ) {
        ++$count_of_A;
    }
    elsif ( $base eq 'C' ) {
        ++$count_of_C;
    }
    elsif ( $base eq 'G' ) {
        ++$count_of_G;
    }
    elsif ( $base eq 'T' ) {
        ++$count_of_T;
    }
    else {
        print "!!!!!!!! Error - I don't recognize this base: $base\n";
        ++$errors;
    }
}
# print the results

print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";
# exit the program
exit;

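Using regex
# Count the bases with global pattern matches instead of a loop over an array.
# The /i modifier makes the matches case-insensitive, so lowercase bases are
# counted too. Initialize the counters so that a count of zero prints as 0.
# ($a is also the variable used by sort; that is fine here, no sort is called.)
$a = $c = $g = $t = $e = 0;
while($DNA =~ /a/ig){$a++}
while($DNA =~ /c/ig){$c++}
while($DNA =~ /g/ig){$g++}
while($DNA =~ /t/ig){$t++}
while($DNA =~ /[^acgt]/ig){$e++}
print "A=$a C=$c G=$g T=$t errors=$e\n";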
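A third way to count, not used in the tutorial itself, is the tr/// operator, which returns the number of characters it matched. The sketch below assumes a hypothetical helper count_bases that returns the counts as a hash; like the regex version it counts upper- and lowercase letters together.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count each base with tr///, which returns the
# number of matching characters and leaves the string unchanged when
# the replacement list is empty.
sub count_bases {
    my ($dna) = @_;
    my %count = (
        A => ( $dna =~ tr/Aa// ),
        C => ( $dna =~ tr/Cc// ),
        G => ( $dna =~ tr/Gg// ),
        T => ( $dna =~ tr/Tt// ),
    );
    # Anything that is not A, C, G or T is counted as an error.
    $count{errors} =
      length($dna) - $count{A} - $count{C} - $count{G} - $count{T};
    return %count;
}

# Made-up test sequence containing one unrecognized character (X).
my %count = count_bases('ACGTTGCAXa');
print "A=$count{A} C=$count{C} G=$count{G} T=$count{T} errors=$count{errors}\n";
# prints: A=3 C=2 G=2 T=2 errors=1
```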

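Next is a new kind of loop, the foreach loop. This loop works over the elements of an
array. The line:
foreach $base (@DNA)
sets the scalar variable $base to each element of the array @DNA in turn and runs the loop body once for each element.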
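As a small illustration of foreach, the sketch below visits each element of an array once. The helper name lowercase_bases is hypothetical. It also shows a detail worth knowing: inside the loop the loop variable is an alias for the array element, so assigning to it changes the array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a lowercase copy of a list of bases.
sub lowercase_bases {
    my @bases = @_;                # work on a copy of the arguments
    foreach my $base (@bases) {
        # $base is an alias for the current element of @bases,
        # so this assignment modifies the array itself.
        $base = lc $base;
    }
    return @bases;
}

my @DNA = split( '', 'ACGT' );     # ('A', 'C', 'G', 'T')
foreach my $base (@DNA) {
    print "Looking at base $base\n";
}
print join( '', lowercase_bases(@DNA) ), "\n";
```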

Writing to files
# Also write the results to a file called "countbase"
$outputfile = "countbase";
open(COUNTBASE, '>', $outputfile) || die("Cannot open file \"$outputfile\" to write to: $!\n");
print COUNTBASE "A=$a C=$c G=$g T=$t errors=$e\n";
close(COUNTBASE);
