
NEUROGENOMICS:

APPLICATIONS AND ANALYSIS

Diego A. Forero, MD, PhD (c) 1,2,3,4,5


1 Applied Molecular Genomics Group, Department of Molecular Genetics, Flanders Institute for
Biotechnology (VIB); 2 University of Antwerp, Antwerp, Belgium; 3 Laboratory of Developmental
Genetics, VIB; 4 Catholic University of Leuven, Leuven, Belgium; 5 Grupo de Neurociencias,
Universidad Nacional de Colombia, Bogotá, Colombia.

Email: daforerog@gmail.com
http://users.skynet.be/dforero/index.htm

I have compiled a set of exercises in which you can apply different in-silico approaches to
common research problems in genetics and genomics. Applying these tools should improve the
scope, precision and speed of the design and analysis of neurogenomics experiments.

All the bioinformatics tools required to solve these exercises are listed on my website:
http://users.skynet.be/dforero/df9.htm

1. Identify the number of haplotype blocks found in the following human genes:
-CREM gene in European population
-GABRA6 gene in African population
-BDNF gene in Asian population
-LMNA gene in African population
-PRNP gene in European population

2. Identify the tagging SNPs for the following human genes:
-GRIA2 gene in European population
-PDE4B gene in African population
-HTR2C gene in Asian population
-KCNA2 gene in African population
-RIMS3 gene in European population
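Tag SNPs are usually chosen so that every common SNP in a gene is captured by at least one tag at an r2 above a threshold (0.8 is a common choice); tools such as Haploview do this from HapMap genotypes. The sketch below shows a simple greedy version of the idea, assuming the pairwise r2 values have already been computed. The SNP names and r2 values are hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs until every SNP has r2 >= threshold with some tag.

    snps: list of SNP names
    r2: dict mapping frozenset({a, b}) -> r2 between SNPs a and b
    """
    def captures(tag, other):
        # A SNP always tags itself; otherwise look up the pairwise r2.
        return tag == other or r2.get(frozenset((tag, other)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs
        # (ties go to the earliest SNP, because max() keeps the first maximum).
        best = max(snps, key=lambda s: sum(captures(s, o) for o in untagged))
        tags.append(best)
        untagged -= {o for o in untagged if captures(best, o)}
    return tags


# Hypothetical pairwise r2 values for five SNPs in one gene.
snps = ["snpA", "snpB", "snpC", "snpD", "snpE"]
r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpA", "snpC")): 0.85,
    frozenset(("snpB", "snpC")): 0.90,
    frozenset(("snpD", "snpE")): 0.82,
    frozenset(("snpC", "snpD")): 0.30,
}
print(greedy_tag_snps(snps, r2))
```

Greedy selection is not guaranteed to find the smallest possible tag set, but it is simple and usually close in practice.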

3. Find the top 10 candidate targets for each one of the following human microRNAs:
-hsa-mir-132
-hsa-mir-134
-hsa-mir-7
-hsa-mir-135b
-hsa-let-7a
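Most target-prediction tools (for example TargetScan) start from complementarity between the miRNA seed, nucleotides 2 to 8 of the mature miRNA, and the target 3' UTR. The sketch below finds 7mer-m8 seed matches in a 3' UTR. It uses the mature hsa-let-7a sequence UGAGGUAGUAGGUUGUAUAGUU (verify it against miRBase before use); the UTR string is a made-up example. Real predictors also score site type, sequence context and conservation, which this sketch ignores.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Reverse complement of an RNA string (A/C/G/U)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def seed_sites(mature_mirna, utr):
    """Return 0-based start positions of 7mer-m8 seed matches in a 3' UTR.

    A 7mer-m8 site is the reverse complement of miRNA nucleotides 2-8.
    DNA input (T instead of U) is converted to RNA first.
    """
    mirna = mature_mirna.upper().replace("T", "U")
    target = utr.upper().replace("T", "U")
    site = reverse_complement_rna(mirna[1:8])  # nucleotides 2-8 (1-based)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"  # mature hsa-let-7a (check against miRBase)
toy_utr = "AAACUACCUCAAAGGGCUACCUCUU"  # hypothetical 3' UTR fragment
print(reverse_complement_rna(LET_7A[1:8]))  # the site a let-7a target must contain
print(seed_sites(LET_7A, toy_utr))
```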

4. Identify the predicted secondary structures of the following human miRNAs:
-hsa-mir-132
-hsa-mir-134
-hsa-mir-7
-hsa-mir-135b
-hsa-let-7a
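Servers such as RNAfold (ViennaRNA) or mfold predict miRNA hairpins by free-energy minimization. As a simpler illustration of the dynamic programming behind them, the sketch below implements the Nussinov algorithm, which only maximizes the number of nested base pairs (Watson-Crick plus G-U) with a minimum hairpin loop of 3 nt. It uses no energy model, so its structures will generally differ from RNAfold output; the demo input is a toy hairpin, not one of the miRNAs listed.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) using Nussinov base-pair maximization."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs in seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    structure = ["."] * n

    def traceback(i, j):
        # Rebuild one optimal structure by replaying the recurrence.
        if j - i <= min_loop:
            return
        if dp[i][j] == dp[i][j - 1]:
            traceback(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        traceback(i, k - 1)
                    traceback(k + 1, j - 1)
                    return

    traceback(0, n - 1)
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAACCC"))  # toy hairpin
```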

5. Retrieve the tissue with the highest expression in humans for each of the following genes:
-APOE
-CREM
-BDNF
-PRNP
-BACE1

6. Retrieve the dbSNP identifiers for the following human variants:
-GCTGTAGGCCAGACCCTGGCA(A/C)GATCTGGGTGGATAATC
-AAATGAGGACTTCTGACCTC(A/G)AACGCTGCCCTTGTTCTT
-GCAGCCGGACAAACTTGCCCTCCTC(A/G)CCACCTCCTCCAC
-ACTATTAATGATAATACT(A/G)TCTCTCATTTATTGAGCATT
-CTGACACTTTCGAACAC(A/G)TGATAGAAGAGCTGTTGGATG

7. Identify the top 10 candidate genes for Alzheimer disease and the top 10 candidate genes
for Parkinson disease, based on meta-analyses of published association studies.

8. Retrieve the list of known genes located in the following human genomic regions:
-9q34.3
-21q21.3
-17p13.1
-11q23.3
-1q23.2

9. Identify the repeat sequences that are present in the following human genomic regions:
-chr17:8279904-8312206
-chr2:86247142-86276108
-chr6:16846682-16869700
-chr1:40858939-40903911
-chr6:163755665-163914884

10. Identify the effects of the following SNPs on transcription factor binding sites:
-rs34706444
-rs12028379
-rs5774713
-rs12239355
-rs17129477

11. Identify the vector sequences that are present in the following DNA fragments:
-acacctttgaggtgaaagagtattcagtgaatatgatggtcatgatgatgtcaccttggatttaaggcattttcttaag
atgtgtaaagtatgttcctttagccgccaccgcggtggagctcccagcttttgttcccttta
-tatctgggctttagtttctccatcattacaatgaagagatgtgctatccttttccaccctgttctaaaattgtgtaact
tttttttttcttttttgagacatgcacgagtgggttacatcgaactggatctcaacagcggt
-gtagtcaggattctgctgacctgcttacagggcactaaatacctgaggaggcaggagcttgggggaaagctgagaggta
tctatccccatctacctactgatggagttccgcgttacataacttacggtaaatggcccgcc

12. Identify the top candidate variants in a set of human DNA sequence traces.
You will use a file with the chromatograms of 96 subjects sequenced for a 500 bp region.

13. Retrieve the genomic lengths, protein lengths, chromosomal positions and number of
exons for the following genes:
-PLXNA2
-NRG1
-MTHFR
-DTNBP1
-SLC6A4

14. Identify the homologues in mouse and Drosophila of the following human genes:
-SV2A
-PDE4B
-DRD1
-SYT1
-RGS4

15. Design overlapping PCR primers to sequence the following human genomic regions:
-chr1:40,879,177-40,883,673
-chrX:77,256,575-77,258,830
-chr8:26,530,136-26,532,811
-chr4:122,960,094-122,962,212
-chr5:161,054,462-161,056,347
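Before picking primers with a tool such as Primer3, it helps to split each region into overlapping amplicon windows so that every base is covered by at least one product; primers are then chosen near the edges of each window. The sketch below computes such windows. The 700 bp amplicon size and 150 bp overlap are arbitrary assumptions, and real designs also need room for primer placement and for the low-quality first bases of each Sanger read, which this sketch ignores.

```python
def tile_region(start, end, amplicon=700, overlap=150):
    """Split [start, end] (1-based, inclusive) into overlapping windows.

    Consecutive windows share `overlap` bases; the last window is clipped at `end`.
    """
    if amplicon <= overlap:
        raise ValueError("amplicon must be longer than overlap")
    step = amplicon - overlap
    windows = []
    pos = start
    while True:
        stop = min(pos + amplicon - 1, end)
        windows.append((pos, stop))
        if stop == end:
            return windows
        pos += step


def parse_region(text):
    """Parse 'chr1:40,879,177-40,883,673' into ('chr1', 40879177, 40883673)."""
    chrom, coords = text.split(":")
    start, end = coords.replace(",", "").split("-")
    return chrom, int(start), int(end)


chrom, start, end = parse_region("chr1:40,879,177-40,883,673")
for s, e in tile_region(start, end):
    print(f"{chrom}:{s}-{e} ({e - s + 1} bp)")
```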

16. Identify the GO and KEGG terms that differ between the following two lists of human genes:
List 1. GPR51, GRIA2, KIF5C, MBP, MEF2C, NAP1L3, NCDN, NDRG4, NEFL, NRGN,
NTRK2, OLFM1
List 2. AKAP6, BRF1, CCNA2, DST, MACF1, NBEA, RAB11A, RANBP5, SEC8L1, SYNE1,
ZFYVE20, ZNF490
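Enrichment tools such as DAVID test each GO or KEGG term for over-representation in a gene list, commonly with a hypergeometric (one-sided Fisher's exact) test; terms significant in one list but not in the other are the differential terms. The sketch below computes the one-sided hypergeometric P value for a single term. The background size and annotation counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all annotated genes)
    K: background genes annotated with the term
    n: genes in the list
    k: list genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 5 of the 12 genes in a list carry a term that
# annotates 200 of 20,000 background genes.
p = hypergeom_enrichment_p(k=5, n=12, K=200, N=20000)
print(f"P = {p:.2e}")
```

With many terms tested per list, the resulting P values should themselves be corrected for multiple testing (see exercise 26).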

17. Identify the proteins encoded by the following mRNA sequences (given as cDNA, with T instead of U):
-atggaaaaccccagcccggccgccgccctgggcaaggccctctgcgctctcctcctggccactctcggcgccgccggcc
agcctcttgggggagagtccatctgttccgccagagccccggccaaatacagcatcaccttcacg
-atggagctggaccaccggaccagcggcgggctccacgcctaccccgggccgcggggcgggcaggtggccaagcccaacgtgatcctgc
agatcgggaagtgccgggccgagatgctggagcacgtgcggcggacgcaccggcac
-atgggcttgttagagtgctgtgcaagatgtctggtaggggccccctttgcttccctggtggccactggattgtgtttct
ttggggtggcactgttctgtggctgtggacatgaagccctcactggcaca
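Tools such as ExPASy Translate do the first step: reading codons in frame from the first ATG with the standard genetic code. The sketch below translates a coding sequence (in DNA letters, as above) until the first stop codon, using the second sequence of this exercise as input. Identifying which protein the result corresponds to still requires a database search, for example BLASTP against UniProt.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, to_stop=True):
    """Translate a coding sequence in frame 1, starting at its first base.

    Accepts DNA or RNA letters in any case; whitespace is ignored.
    An incomplete trailing codon is dropped; unknown codons become 'X'.
    """
    seq = "".join(seq.split()).upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


fragment = (
    "atggagctggaccaccggaccagcggcgggctccacgcctaccccgggccgcggggcgggcaggtggccaagcccaacgtgatcctgc"
    "agatcgggaagtgccgggccgagatgctggagcacgtgcggcggacgcaccggcac"
)
print(translate(fragment))
```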

18. Identify the cDNAs encoding the following protein sequences:
-LCADARMYGVLPWNAFPGKVCGSNLLSICKTAEFQMTFHLFIAAFVGAAATLVSLLTFMIAATYNFAVLKLMGRGTKF
-EMMDLQHGSLFLRTPKIVSGKDYNVTANSKLVIITAGARQQEGESRLNLVQRNVNIFKFIIPNVVKYSPNCKLLIVSN
-MVDMMDLPRSRINAGMLAQFIDKPVCFVGRLEKIHPTGKMFILSDGEGKNGTIELMEPLDEEISGIVEVVGRVTAKAT

19. Perform hierarchical clustering of the following genes, based on their expression in eight tissues:

        Tiss1     Tiss2     Tiss3     Tiss4     Tiss5     Tiss6     Tiss7     Tiss8
Gene 1  0.052905  0.058392  0.06977   0.056961  0.074954  0.061005  0.050068  0.059917
Gene 2  0.0336    0.095512  0.061694  0.036708  0.050386  0.042539  0.030157  0.056136
Gene 3  0.021603  0.024434  0.021238  0.018759  0.01518   0.015751  0.012132  0.027813
Gene 4  0.01405   0.018037  0.008364  0.010938  0.017524  0.006858  0.005407  0.016314
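Programs such as R's hclust are typically used for this. The sketch below shows agglomerative clustering with average linkage and 1 - Pearson r as the distance, a common choice for expression profiles; both choices are assumptions, and other linkages or distances can give a different tree. It uses the four expression profiles from the table above.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    Returns the merge steps as (cluster_a, cluster_b, distance) tuples,
    where clusters are tuples of gene names.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise gene distances between the two clusters.
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 4)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


expression = {
    "Gene 1": [0.052905, 0.058392, 0.06977, 0.056961, 0.074954, 0.061005, 0.050068, 0.059917],
    "Gene 2": [0.0336, 0.095512, 0.061694, 0.036708, 0.050386, 0.042539, 0.030157, 0.056136],
    "Gene 3": [0.021603, 0.024434, 0.021238, 0.018759, 0.01518, 0.015751, 0.012132, 0.027813],
    "Gene 4": [0.01405, 0.018037, 0.008364, 0.010938, 0.017524, 0.006858, 0.005407, 0.016314],
}
for a, b, d in average_linkage(expression):
    print(a, "+", b, "at distance", d)
```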

20. Find the genes with their highest expression in the prefrontal cortex (200-fold enrichment
in comparison with other tissues); repeat for the amygdala.

21. Identify the transcripts targeted by the following Affymetrix probes:
-204312_x_at
-207630_s_at
-210400_at
-212581_x_at
-201891_s_at

22. Identify the haplotypes present in the following dataset, including their frequencies, and
calculate the LD values between SNPs.

S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13
subj1 CT AG AC TT AC CT GG CC CT AG CT AA AG
subj2 TT GG AA TT CC CC GG CC CT AG CT AG AG
subj3 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj4 CT AG AA CT CC CC AG CT CT GG CT AG AG
subj5 CT AG AA TT CC CC GG CC CT AG CT AG AG
subj6 CT AG AA TT CC CC GG CC CC GG TT AA AA
subj7 CT AG AA TT CC CC GG CT CT GG CT AG AG
subj8 CC AG AC TT CC CC GG TT TT GG CC GG GG
subj9 TT GG CC TT AA TT GG CT CT GG CT AG AG
subj10 TT GG AA TT CC CC GG CC CT AG CT AG AG
subj11 CT AG AA TT CC CC GG CC CT AG CT AA AG
subj12 CT GG AA TT CC CC GG TT TT GG CC GG GG
subj13 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj14 TT GG AA CT CC CC AG CC CT AG CT AG AG
subj15 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj16 CT AG AA TT CC CC GG CC CT AG CT AG AG
subj17 CT GG AC TT AC CT GG TT TT GG CC GG GG
subj18 CC AA AA TT CC CC GG CC CT AG CT AG AG
subj19 CT AG AA TT CC CC GG CC CC GG TT AA AA
subj20 CC AA AC TT CC CT GG CC CC GG TT AA AA
subj21 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj22 CC AA CC TT AA TT GG CC CC GG TT AA AG
subj23 TT GG AA TT CC CC GG CC CC GG TT AA AG
subj24 TT GG AA TT CC CC GG CC CC GG TT AA AA
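Programs such as Haploview or PHASE first infer haplotypes from these unphased genotypes (for example by expectation-maximization) and then compute D' and r2. Without phasing, a quick approximation of r2 between two SNPs is the squared Pearson correlation of allele counts across subjects (a genotypic or composite r2). The sketch below applies this to S1 and S2 from the table above, coding each genotype as the number of copies of an arbitrarily chosen allele (C for S1, A for S2). It is only an approximation and does not recover the haplotypes or their frequencies.

```python
from statistics import mean


def allele_dosage(genotypes, allele):
    """Count copies of `allele` in two-letter genotypes such as 'CT'."""
    return [g.count(allele) for g in genotypes]


def genotypic_r2(x, y):
    """Squared Pearson correlation of allele dosages (unphased approximation of r2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Genotypes of subj1..subj24 for SNPs S1 and S2, copied from the table above.
S1 = "CT TT CT CT CT CT CT CC TT TT CT CT CT TT CT CT CT CC CT CC CT CC TT TT".split()
S2 = "AG GG AG AG AG AG AG AG GG GG AG GG AG GG AG AG GG AA AG AA AG AA GG GG".split()

r2 = genotypic_r2(allele_dosage(S1, "C"), allele_dosage(S2, "A"))
print(f"approximate r2(S1, S2) = {r2:.3f}")
```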

23. Identify the predicted functional effects of each one of the following nsSNPs:
-rs28931579
-rs769452
-rs28931577
-rs11542040
-rs11542035

24. Retrieve the genomic sequence for all the exons (including 50 bp of flanking sequence) of
the following genes:
-RGS4
-RIMS3
-RTN1
-SLC1A3
-SNAP25

25. Identify the interacting partners for each one of the following genes:
-MEF2C
-NAP1L3
-NCDN
-NDRG4
-NEFL

26. Identify which of the following P values pass a false discovery rate (FDR) of 0.05:
0.650106935, 0.308093469, 0.463145394, 0.19572116, 0.112681844, 0.493084372, 0.043017213,
0.515230709, 0.098477813, 0.276669253, 0.4536028, 0.927263525, 0.000763073, 0.391324056,
0.381511095, 0.003431856, 0.206671413, 0.354702281, 0.25477432
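A standard way to control the FDR is the Benjamini-Hochberg step-up procedure: sort the m P values, find the largest rank k with p(k) <= (k / m) * q, and call the k smallest P values significant. The sketch below applies it to the 19 values above with q = 0.05.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the P values that pass a Benjamini-Hochberg FDR of q, in input order."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    # Largest rank k (1-based) with p_(k) <= k * q / m; 0 if none qualifies.
    k = max((i for i, p in enumerate(ranked, start=1) if p <= i * q / m), default=0)
    if k == 0:
        return []
    cutoff = ranked[k - 1]
    return [p for p in pvalues if p <= cutoff]


pvalues = [
    0.650106935, 0.308093469, 0.463145394, 0.19572116, 0.112681844, 0.493084372,
    0.043017213, 0.515230709, 0.098477813, 0.276669253, 0.4536028, 0.927263525,
    0.000763073, 0.391324056, 0.381511095, 0.003431856, 0.206671413, 0.354702281,
    0.25477432,
]
print(benjamini_hochberg(pvalues))
```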

27. Identify the top 10 down-regulated genes in post-mortem brains from patients with schizophrenia;
repeat for bipolar disorder.

28. Design PCR primers for cloning each of the following fragments with the indicated restriction enzymes:
-chrX:77256575-77256975; EcoRI and HindIII
-chr8:26530136-26530636; HindIII and XbaI
-chr6:16846682-16846982; EcoRI and XbaI
-chr1:40858939-40859339; HindIII and EcoRI
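For directional cloning, each primer is usually the gene-specific sequence with a restriction site added at its 5' end, plus a few extra bases so the enzyme can cut near the end of the PCR product (how many depends on the enzyme). The sketch below builds such a primer pair from an insert sequence and warns if either enzyme also cuts inside the insert. The 18-nt primer length, the "GCGC" 5' padding and the toy insert are assumptions; real primers should still be checked for melting temperature and specificity, for example with Primer3.

```python
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "XbaI": "TCTAGA"}
PAD = "GCGC"  # hypothetical 5' padding so the enzyme can cut near the product end


def revcomp(seq):
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def cloning_primers(template, enzyme_5, enzyme_3, length=18):
    """Return (forward, reverse) primers with restriction sites on their 5' ends.

    enzyme_5 is added to the forward primer (start of the insert) and
    enzyme_3 to the reverse primer (end of the insert).
    """
    template = template.upper()
    for enzyme in (enzyme_5, enzyme_3):
        if SITES[enzyme] in template:
            print(f"warning: {enzyme} also cuts inside the insert")
    forward = PAD + SITES[enzyme_5] + template[:length]
    reverse = PAD + SITES[enzyme_3] + revcomp(template[-length:])
    return forward, reverse


toy_insert = "ATGACCGATTACGGTCCATTGCAAGGTCTGCTCGCTTTAGGACCTTGA"  # hypothetical insert
fwd, rev = cloning_primers(toy_insert, "EcoRI", "HindIII")
print("F:", fwd)
print("R:", rev)
```

Which enzyme goes on which primer depends on the orientation required in the vector's multiple cloning site.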

29. Identify the genomic regions that are amplified using the following PCR primer pairs:
-F-ATGGAGTGGCTAGAAGAGTCAG
R-TGGATCATTTGCGATTTCCAGTT
-F-AGGGCTTCCTTATGTCCTCCA
R-TACCCACGTACCATTAGGAGC
-F-AAAAGCAGGAGTGTGATGACG
R-CGATCCCAAGTGTGTTACTGG
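Tools such as UCSC In-Silico PCR or Primer-BLAST answer this against the whole genome. The underlying check is simple: the forward primer must occur on the plus strand, the reverse complement of the reverse primer must occur downstream of it, and the product runs from the start of one to the end of the other. The sketch below does this with exact matching on a single template. The template is synthetic, built only to demonstrate the search with the first primer pair; real tools also tolerate mismatches and search both strands of every chromosome.

```python
def revcomp(seq):
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def in_silico_pcr(template, forward, reverse, max_size=5000):
    """Return (start, end, size) of exact-match products on the plus strand.

    Coordinates are 0-based and end-exclusive. Only perfect matches are reported.
    """
    template = template.upper()
    forward = forward.upper()
    site_r = revcomp(reverse)
    products = []
    start = template.find(forward)
    while start != -1:
        end_site = template.find(site_r, start + len(forward))
        while end_site != -1:
            end = end_site + len(site_r)
            if end - start <= max_size:
                products.append((start, end, end - start))
            end_site = template.find(site_r, end_site + 1)
        start = template.find(forward, start + 1)
    return products


F = "ATGGAGTGGCTAGAAGAGTCAG"   # first primer pair from the exercise
R = "TGGATCATTTGCGATTTCCAGTT"
# Synthetic template: 10 bp + F + 40 bp spacer + reverse complement of R + 10 bp.
template = "A" * 10 + F + "C" * 40 + revcomp(R) + "G" * 10
print(in_silico_pcr(template, F, R))
```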

31. Identify the maximum LOD score simulated for the following pedigree:

32. Identify the nucleotide that is conserved in mouse and rat at the position of each of the following SNPs:
-rs9817739
-rs1937690
-rs7973772
-rs278151
-rs10128858

33. Design primers to genotype the following SNPs by allele-specific PCR (AS-PCR):
-rs974849
-rs246835
-rs12768718
-rs10185953
-rs5753220

34. Design primers to genotype the following SNPs by PCR-RFLP:
-rs16949418
-rs4979416
-rs4852259
-rs11593916
-rs10488140
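For PCR-RFLP, the SNP must create or destroy a restriction site so that the two alleles give different fragment patterns after digestion; NEBcutter is often used to find such enzymes. The sketch below shows the core check on flanking sequence written in the "(A/C)" notation of exercise 6, with a small hand-picked enzyme list; the demo reuses the first sequence from exercise 6 only as example input. Every site in this list is palindromic, so scanning one strand is enough; non-palindromic sites would also need the reverse complement.

```python
import re

# Recognition sequences of a few common enzymes (all palindromic).
ENZYMES = {
    "EcoRI": "GAATTC", "HindIII": "AAGCTT", "XbaI": "TCTAGA", "BamHI": "GGATCC",
    "BglII": "AGATCT", "PstI": "CTGCAG", "HaeIII": "GGCC", "MspI": "CCGG",
    "AluI": "AGCT", "TaqI": "TCGA",
}


def allele_sequences(snp_text):
    """Expand 'LEFT(A/C)RIGHT' into {allele: full sequence}."""
    match = re.fullmatch(r"([ACGT]+)\(([ACGT])/([ACGT])\)([ACGT]+)", snp_text.upper())
    if match is None:
        raise ValueError("expected flanking sequence like ACGT(A/G)TTGA")
    left, a1, a2, right = match.groups()
    return {a1: left + a1 + right, a2: left + a2 + right}


def discriminating_enzymes(snp_text):
    """Map each enzyme whose site differs between alleles to the alleles that have it."""
    seqs = allele_sequences(snp_text)
    result = {}
    for enzyme, site in ENZYMES.items():
        present = {allele for allele, seq in seqs.items() if site in seq}
        if present and present != set(seqs):
            result[enzyme] = present
    return result


print(discriminating_enzymes("GCTGTAGGCCAGACCCTGGCA(A/C)GATCTGGGTGGATAATC"))
```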

35. Identify the number of citations for the papers with the following PMIDs:
-17173049
-16862116
-8895455
-818641
-17571346

36. Identify the predicted network of interactions for the following genes:
CAMK2B, DNER, DNM1, EEF1A2, ELAVL4, GFAP

37. Identify the best predicted drug compound that can modulate the activity of the product of
each of the following genes:
-CAMK2B
-NTRK2
-VDAC1
-CCNA2
-PDE4B

38. Design PCR primers to differentiate between cDNA and genomic DNA for the following
genes:
-TF
-TU3A
-TUBB4
-UCHL1
-VSNL1

39. Identify the most suitable journal to publish a hypothetical paper with the following
abstract:
Human memory is a polygenic trait. We performed a genome-wide screen to identify memory-related
gene variants. A genomic locus encoding the brain protein KIBRA was significantly associated with
memory performance in three independent, cognitively normal cohorts from Switzerland and the
United States. Gene expression studies showed that KIBRA was expressed in memory-related brain
structures. Functional magnetic resonance imaging detected KIBRA allele-dependent differences in
hippocampal activations during memory retrieval. Evidence from these experiments suggests a role
for KIBRA in human memory.

40. Identify the significant SNPs in a genome-wide association study and identify possible runs
of homozygosity in the same dataset.
You will download a publicly available dataset with results for about 500,000 SNPs.
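Two quick calculations help here. With about 500,000 tests, a Bonferroni correction gives a per-SNP threshold of 0.05 / 500,000 = 1e-7, close to the conventional genome-wide threshold of 5e-8. Runs of homozygosity can be found by scanning each subject's genotypes along the chromosome for long stretches of consecutive homozygous calls; PLINK (--homozyg) implements this with additional length, density and heterozygote-tolerance options. The sketch below shows only the basic scan; the minimum of 5 SNPs and the toy genotype list are arbitrary assumptions (real analyses look for runs spanning hundreds of kb or more).

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def runs_of_homozygosity(genotypes, min_snps=5):
    """Return (first_index, last_index) of runs of >= min_snps homozygous calls.

    genotypes: list of two-letter calls ordered by position, e.g. ['AA', 'AG'].
    Heterozygous and missing calls ('00' or '--') both end a run.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes + ["--"]):  # sentinel closes a final run
        homozygous = g[0] == g[1] and g[0] not in "0-"
        if homozygous and start is None:
            start = i
        elif not homozygous and start is not None:
            if i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


print(bonferroni_threshold(500_000))
calls = "AG CC TT AA GG CC AA CT GG AA".split()  # hypothetical subject
print(runs_of_homozygosity(calls))
```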

41. Identify SNPs located in conserved transcription factor binding sites on chromosome 1;
retrieve SNPs located in microRNA binding sites on chromosome 2.

42. Identify the Ensembl IDs for the genes listed in exercise 36.

DF, 03-2008

If you use these exercises for teaching purposes, please cite the original source; if you have
comments or suggestions, please do not hesitate to contact me by email.
