Lab Assignments

PERL- Programming for Bioinformatics (MB 37)

M.Sc. (Bioinformatics) Semester III


• Loops
1. Use a loop to read data from a file
2. Use a loop to read data from the keyboard
3. Use a loop to read data from anywhere
• Conditional expressions
1. A conditional statement
2. Search data for a motif (aka grep)
3. Determining nucleotide frequency
4. Another way to calculate nucleotide frequency
• Subroutines/modularization
1. Subroutine example
2. Restrict the scope of variables with my
3. Get info from the command line
• Random numbers (to model mutation)
1. Random DNA generation
2. DNA mutation
• Implementing algorithms using appropriate data structures:
1. Greedy algorithm
2. Sequence alignment and assembly algorithm
3. Algorithms related to trees and sequences

1. The PKC Beta-1 protein sequence shows that it contains 671 amino acids. Given an
average molecular weight of 110 daltons per amino acid, write a program that
calculates the estimated molecular weight of the protein (in kilodaltons) and writes
the result on the screen.
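
The arithmetic in Assignment 1 can be sketched as follows. This is only one possible layout; the subroutine name `estimated_weight_kda` is hypothetical, and the 110 Da average is the figure given in the assignment text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Estimate molecular weight in kilodaltons from a residue count.
# estimated_weight_kda is an illustrative name, not part of the assignment.
sub estimated_weight_kda {
    my ($residues, $avg_da) = @_;
    $avg_da = 110 unless defined $avg_da;   # average daltons per amino acid
    return $residues * $avg_da / 1000;      # daltons -> kilodaltons
}

printf "PKC Beta-1: %.2f kDa\n", estimated_weight_kda(671);
# prints "PKC Beta-1: 73.81 kDa"
```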
2. Write a program that asks the user for a gene name and the number of nucleotides in
its coding sequence (that is, the part of its cDNA that is translated into protein, from
the initiation codon to the stop codon); calculates the number of amino acids in the
resulting protein and its estimated molecular weight (in kilodaltons); and prints the
results, including the gene name, on the screen.
3. Given the amino acid sequence of a polypeptide (Phe, Val, Asn, Gln, His, Leu, Cys,
Gly, Ser), perform the following tasks with a Perl program:
- Define an array that contains the amino acids in the right order (use the
three-letter notation, as above). Print it on one line (without a foreach loop).
- Determine the number of amino acids in the polypeptide and print it.
- Add the amino acid “His” to the end of the polypeptide (use the “push”
function). Print the resulting array on one line (without a foreach loop).
- Create a “mutation”: replace “Gly” with “Asp”. Print the resulting array.
- Ask the user to enter a number between 1 and the number of amino acids in
the polypeptide, and print the amino acid in that position (e.g. if the user
enters “4”, the program should print “Gln”).
- Create an inversion: get two positions in the sequence from the user and invert
the sequence of the amino acids between them. For example, if the user enters
3 and then 6, the program should replace Asn, Gln, His, Leu with Leu, His,
Gln, Asn (using “array slices” and the “reverse” function). Print the result.
- Create a string that contains the amino acid sequence of the resulting
polypeptide, in the format Phe-Val-Asn… (use string concatenation inside a
foreach loop; make sure not to leave a “-” before the first amino acid or after
the last one). Print that string.
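
The last two tasks above are the least obvious, so here is a minimal sketch of them. The subroutine names `invert_segment` and `dash_join` are hypothetical, and positions are assumed to be 1-based, as in the example in the task.

```perl
use strict;
use warnings;

# Reverse the residues between two 1-based positions (inclusive), in place.
# $seq is a reference to the array holding the polypeptide.
sub invert_segment {
    my ($seq, $from, $to) = @_;
    ($from, $to) = ($to, $from) if $from > $to;   # accept positions in either order
    # Assign the reversed slice back onto the same slice.
    @{$seq}[$from - 1 .. $to - 1] = reverse @{$seq}[$from - 1 .. $to - 1];
    return;
}

# Build "Phe-Val-Asn..." by concatenation inside a foreach loop,
# adding the separator only between residues.
sub dash_join {
    my @residues = @_;
    my $text = '';
    foreach my $aa (@residues) {
        $text .= '-' if length $text;
        $text .= $aa;
    }
    return $text;
}

my @peptide = qw(Phe Val Asn Gln His Leu Cys Gly Ser);
invert_segment(\@peptide, 3, 6);
print "@peptide\n";              # Phe Val Leu His Gln Asn Cys Gly Ser
print dash_join(@peptide), "\n"; # Phe-Val-Leu-His-Gln-Asn-Cys-Gly-Ser
```

Interpolating the array in double quotes (`"@peptide"`) is also one way to print it on a single line without a foreach loop.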
4. Write a program that prints DNA in all lowercase letters. Write another one that
prints it in all uppercase letters. Use the translate function, tr///. Test your program
with DNA that is represented in all uppercase letters (AGCT), all lowercase letters
(agct), and a mixture of both.
5. Write a program to determine the frequency of nucleotides present in a DNA
sequence data file.
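
A possible starting point for Assignment 5 is sketched below. It assumes the file holds plain sequence text with no header line, and that the filename is passed on the command line; the assignment specifies neither, and `count_nucleotides` is a hypothetical name.

```perl
use strict;
use warnings;

# Count A, C, G and T (case-insensitive); any other non-whitespace
# character is tallied under "other".
sub count_nucleotides {
    my ($dna) = @_;
    my %count = (A => 0, C => 0, G => 0, T => 0, other => 0);
    foreach my $base (split //, uc $dna) {
        next if $base =~ /\s/;                 # skip newlines and spaces
        if (exists $count{$base}) { $count{$base}++ }
        else                      { $count{other}++ }
    }
    return %count;
}

# Assumption: the data file is named as the first command-line argument.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $dna = do { local $/; <$fh> };          # read the whole file at once
    close $fh;

    my %count = count_nucleotides($dna);
    my $total = $count{A} + $count{C} + $count{G} + $count{T};
    foreach my $base (qw(A C G T)) {
        printf "%s: %d (%.1f%%)\n", $base, $count{$base},
            $total ? 100 * $count{$base} / $total : 0;
    }
    print "other: $count{other}\n";
}
```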
6. Activity of a certain enzyme was measured in extracts from brain, heart and lung. In
each tissue, the activity was measured several times, giving the following results (in
arbitrary units):

Brain: 65,69,70,63,70,68.
Heart: 102,95,98,110.
Lung: 112,115,113,109,95,98,100.

Write a program that calculates and prints the following information for each of the
tissues:
- Number of measurements
- Average enzyme activity
- Variance
- Standard deviation

Use the following formulas (n is the number of measurements):


Average: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard deviation: $S = \sqrt{S^2}$

For example, values for enzyme activity in the brain are calculated as the following:

Number of measurements: $n = 6$

Average: $\bar{x} = \frac{65 + 69 + 70 + \dots}{6} = 67.5$

Variance: $S^2 = \frac{1}{6-1} \left( (65 - 67.5)^2 + (69 - 67.5)^2 + (70 - 67.5)^2 + \dots \right) = 8.3$

Standard deviation: $S = \sqrt{8.3} = 2.88$
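
The formulas above translate into Perl fairly directly. The following is a minimal sketch rather than a reference solution; the subroutine names `mean`, `variance` and `std_dev` and the hash-of-arrays layout are illustrative choices, and each list is assumed to hold at least two measurements.

```perl
use strict;
use warnings;

# Average: sum of the values divided by their count.
sub mean {
    my @x = @_;
    my $sum = 0;
    $sum += $_ for @x;
    return $sum / scalar @x;
}

# Sample variance: squared deviations from the mean, divided by n - 1.
sub variance {
    my @x = @_;
    my $mean = mean(@x);
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @x;
    return $ss / (scalar(@x) - 1);
}

# Standard deviation: square root of the variance.
sub std_dev {
    return sqrt(variance(@_));
}

my %activity = (
    Brain => [65, 69, 70, 63, 70, 68],
    Heart => [102, 95, 98, 110],
    Lung  => [112, 115, 113, 109, 95, 98, 100],
);

foreach my $tissue (sort keys %activity) {
    my @x = @{ $activity{$tissue} };
    printf "%-5s n=%d average=%.2f variance=%.2f sd=%.2f\n",
        $tissue, scalar @x, mean(@x), variance(@x), std_dev(@x);
}
```

For the brain data this reproduces the worked example: an average of 67.5, a variance of 8.3 and a standard deviation of about 2.88.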

7. Write a program that finds the minimal enzyme activity in each of the tissues
mentioned in Assignment 6:
Brain: 65,69,70,63,70,68.
Heart: 102,95,98,110.
Lung: 112,115,113,109,95,98,100.

Use a subroutine that returns the lowest number of any given list of numbers.
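
One way to write such a subroutine is sketched here. The name `lowest` is hypothetical, and returning nothing for an empty list is a design choice the assignment does not specify.

```perl
use strict;
use warnings;

# Return the lowest number in the argument list (nothing for an empty list).
sub lowest {
    my @numbers = @_;
    return unless @numbers;
    my $min = shift @numbers;          # start with the first value
    foreach my $n (@numbers) {
        $min = $n if $n < $min;        # keep the smaller of the two
    }
    return $min;
}

my %activity = (
    Brain => [65, 69, 70, 63, 70, 68],
    Heart => [102, 95, 98, 110],
    Lung  => [112, 115, 113, 109, 95, 98, 100],
);

foreach my $tissue (qw(Brain Heart Lung)) {
    printf "%s: %d\n", $tissue, lowest(@{ $activity{$tissue} });
}
# prints "Brain: 63", "Heart: 95" and "Lung: 95", one per line
```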

8. Write a program that searches for motifs. Ask the user for the name of the file
containing the protein sequence data, and read that name from the keyboard. Read the
protein sequence data from the file and store it in an array variable, then ask the
user for a motif and search for the motif.
9. Write a program that reads in a list of strings, then prints one chosen at random.
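10. To select a random element of your array, you need to do two things:
• Put srand; near the top of your program to initialize the random number
generator.
• Use rand(@my_array); to generate a random number from 0 up to (but not
including) the number of elements in the array. Used as an array index, the value is
truncated to an integer, so it selects one element between the first and the last.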
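
The two steps can be combined as in the sketch below. The subroutine name `random_element` and the sample list are illustrative only.

```perl
use strict;
use warnings;

srand;   # seed the random number generator once, near the top of the program

# Return one element of the argument list, chosen at random.
sub random_element {
    my @items = @_;
    # rand(@items) evaluates @items in scalar context (the element count)
    # and returns a fractional number in [0, count); int() truncates it
    # to a valid index.
    return $items[ int(rand(@items)) ];
}

my @my_array = qw(ATGC GGCC TTAA CCGG);
print random_element(@my_array), "\n";
```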
11. Write a subroutine to normalize an RNA nucleotide sequence to lowercase, then
reverse transcribe it into DNA.
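
A minimal sketch of such a subroutine follows. It assumes that "reverse transcribe" here means replacing each u with t on the same strand, not building the complementary strand; the assignment does not say which is intended, and the name `rna_to_dna` is hypothetical.

```perl
use strict;
use warnings;

# Lowercase the RNA sequence, then replace u with t.
# Assumption: same-strand substitution only, no complementing.
sub rna_to_dna {
    my ($rna) = @_;
    (my $dna = lc $rna) =~ tr/u/t/;
    return $dna;
}

print rna_to_dna('AUGGCUUAA'), "\n";   # atggcttaa
```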
12. Write a program that checks whether two DNA sequences (given as command-line
arguments) are reverse complements of each other. Do this by comparing one
nucleotide at a time from one sequence to the other. Store both DNA sequences in
arrays. Use a loop, the Perl functions split, pop and shift, and the eq operator. By
reverse complement, we mean that you should check that the reading order of the two
sequences is reversed and that the nucleotides are complements of one another.
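
The comparison described above can be sketched as follows: shift takes bases from the front of one sequence while pop takes them from the back of the other. The subroutine name `is_reverse_complement` is hypothetical, and treating upper and lower case as equivalent is an assumption.

```perl
use strict;
use warnings;

# Return 1 if $seq2 is the reverse complement of $seq1, otherwise 0.
sub is_reverse_complement {
    my ($seq1, $seq2) = @_;
    my %complement = (A => 'T', T => 'A', C => 'G', G => 'C');

    return 0 if length($seq1) != length($seq2);

    my @first  = split //, uc $seq1;
    my @second = split //, uc $seq2;

    while (@first) {
        my $left  = shift @first;    # next base from the front of sequence 1
        my $right = pop @second;     # next base from the back of sequence 2
        return 0 unless exists $complement{$left}
                     && $complement{$left} eq $right;
    }
    return 1;
}

# Run the check only when two sequences are given on the command line.
if (@ARGV == 2) {
    print is_reverse_complement(@ARGV)
        ? "Reverse complements\n"
        : "Not reverse complements\n";
}
```

For example, AACG and CGTT are reverse complements, whereas AACG and TTGC are complements read in the same order and so fail the check.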
13. Write a program with a subroutine to append ACGT to the original DNA sequence.
