
Arch Virol

DOI 10.1007/s00705-016-3206-z

ANNOTATED SEQUENCE RECORD

Sequence errors in foamy virus sequences in the GenBank database: resequencing of the prototypic foamy virus proviral plasmids
Tobias C. Wagner¹, Jochen Bodem¹

Received: 10 November 2016 / Accepted: 18 December 2016


© Springer-Verlag Wien 2016

Abstract Nucleotide sequences are the fundamental basis for work on molecular mechanisms and for phylogenetic analysis. Recently, we identified sequence errors in all of the LTR sequences of the prototypic foamy virus stored in the GenBank database. Here, we report the resequencing of the proviral plasmids pHSRV13 and pHSRV2. Sequence comparisons revealed an error rate for the foamy virus sequences stored in the database of up to 10 errors per 1000 bp. Even the newest sequences of the codon-optimized foamy virus synthetic Gag, Pol, and Env amino acid sequences showed exchanges compared to the new proviral pHSRV13n sequence. Our results provide evidence that some prototypic foamy virus sequences contain errors and should be revised.

Introduction

Foamy viruses were first described as cytopathic agents in cell cultures in 1954 [1], and they were later isolated from a nasopharyngeal carcinoma of an African patient [2]. This isolate, previously called human foamy virus (HFV), is the best-analysed foamy virus so far and represents the prototypic foamy virus (PFV). The genome of PFV was cloned, and the first PFV sequences were published in 1987 [3–5]. Finally, two infectious proviral plasmids, pHSRV2 [6] and pHSRV13 [7], were generated from the same parental plasmids. These proviral sequences have been the basis for molecular and phylogenetic studies up to now. The first foamy virus sequences were obtained by the Maxam-Gilbert sequencing method, while in the eighties and early nineties, newer PFV nucleotide sequence data were obtained by radioactive Sanger sequencing, with the sequence recorded by hand. In particular, long GC-rich regions and repetitive sequences might have led to sequence readout errors from urea polyacrylamide sequencing gels due to DNA/RNA structures.

Recently, we analysed the regulation of foamy virus polyadenylation [8]. During these studies, we discovered that none of the PFV sequences deposited in the GenBank database matched the true nucleotide sequences of the two proviral plasmids. In fact, two nucleotides were missing in the R region (193 nt) of all HFV/PFV sequences in the GenBank database. Furthermore, the sequences of the 5′ and 3′ LTRs stored in the database were not identical, and sequence data directly obtained from the Rethwilm (pHSRV2) and Flügel/Löchelt (pHSRV13) laboratories differed from the database sequences as well. Based on these inconsistencies, we decided to resequence both proviral plasmids.
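The kind of discrepancy described above, two copies of an LTR that should be identical but differ by a short missing stretch, can be illustrated with a small sketch. It is not the procedure used in this report (the comparisons were made with the matcher program of the EMBOSS suite [10]); it substitutes Python's standard-library difflib, which is a generic text-diff tool rather than a biological aligner. The function name and the toy sequences are hypothetical and are not PFV data.

```python
from difflib import SequenceMatcher


def describe_differences(ltr5, ltr3):
    """List every stretch where two sequences differ.

    Each entry is (tag, stretch in the 5' copy, stretch in the 3' copy),
    where tag is one of 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(None, ltr5, ltr3, autojunk=False)
    return [
        (tag, ltr5[i1:i2], ltr3[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical toy sequences (assumption, not real PFV data):
# the 5' copy lacks two nucleotides that are present in the 3' copy.
ltr3_toy = "ACGTTTGACCA"
ltr5_toy = "ACGTGACCA"

differences = describe_differences(ltr5_toy, ltr3_toy)
for tag, in_5, in_3 in differences:
    print(tag, repr(in_5), repr(in_3))
```

For real records, a dedicated pairwise aligner is the better choice, because a text diff does not score mismatches and gaps the way an alignment program does.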

Electronic supplementary material The online version of this article (doi:10.1007/s00705-016-3206-z) contains supplementary material, which is available to authorized users.

Corresponding author: Jochen Bodem, Jochen.Bodem@vim.uni-wuerzbuerg.de

¹ Institut für Virologie und Immunbiologie, Julius-Maximilians-Universität, Würzburg, Germany

Sequence

To characterize defined plasmid stocks, we used a pHSRV13 plasmid preparation received from Martin Löchelt and a pHSRV2 plasmid preparation used in a published study by the Rethwilm laboratory in 2008 [9]. In order to ensure overlapping sequence data, sequencing primers were spaced every 500 bp. Furthermore, all deviations from the original sequences were verified by resequencing. Additional LTR sequences were determined separately [8]. The obtained sequence data were aligned, and consensus sequences for pHSRV2 and pHSRV13 were generated using the SeqMan software package version 9.0.4 (DNASTAR, Madison, Wisconsin, USA). The automated sequence readout, in addition to large sequence overlaps and computerized sequence assembly, led to high-quality nucleotide sequences. Furthermore, to ensure sequence quality, trace files containing any sequence deviation were inspected in detail.

The resulting new sequences were named pHSRV2n and pHSRV13n, respectively, and submitted to GenBank (accession number KX087159). These sequences were compared to the human foamy virus and PFV sequences stored in GenBank and to the pHSRV2 sequences obtained directly from the Rethwilm laboratory, using the matcher program of the EMBOSS software suite [10] (Table 1). For clarity and consistency with older reports, we maintained the abbreviation pHSRV2 for the sequence comparisons and used the GenBank accession numbers for GenBank-derived sequence data (Table 1 and Table S1).

These sequence comparisons revealed that the previous Gag amino acid sequences contained two or three sequence errors (Table 1 and Table S1). Recently, codon-optimized expression plasmids containing HFV/PFV gag, pol, and env were generated and patented [11]. Unfortunately, even the synthetic codon-optimized gag encoded one amino acid exchange in Gag (D60N) compared to the pHSRV13n sequence.

Furthermore, sequence comparisons showed severe errors in some of the previously published pol sequences, which resulted in two frameshifts, leading to a stretch of 14 amino acid residues in a different reading frame (Tables 1 and S1). Surprisingly, these errors are present in the earliest sequence records (accession number HSU21247) and in sequences submitted in 2005 (accession numbers Y07723 and Y07725). However, this part of the sequence was corrected in the synthetic, codon-optimized pol expression plasmid [11, 12]. Nevertheless, one error in the synthetic Pol amino acid sequence was detected in the coding region of the integrase (Q1141K).

A comparison of the coding sequence of pHSRV2n and pHSRV13n showed an amino acid polymorphism located in the Env protein (I72M) and one in the Bel2 protein (Table 1). Unfortunately, the synthetic env expression plasmid encodes one amino acid exchange in the Env

Table 1 Comparison of foamy virus sequences: nucleotide and amino acid exchanges in comparison to pHSRV13n

GenBank acc. no.  Region (length in nucleotides/amino acid residues)                                                                 Ref.
                  LTR 5′/3′               Gag         Pol          Env          Tas        Bel2        Σ exchanges
                  (1123/-)                (1947/649)  (3432/1144)  (2967/989)   (903/301)  (1071/357)  (nt/[nt/1kbp])
pHSRV2n*          5′: 0; 3′: 1            0/0         0/0          3/1          0/0        1/1         5/0.40
KX087159
pHSRV2*           5′: 7^2; 3′: 7^4        3/2         7^2/14       8/3          2/1        6/3         40/3.18
HSU21247          5′: 7^2; 3′: 7^3        4/3         7^2/14       11^1/4^1     1/0        8/5         45/3.58           [3, 4, 7, 15]
Y07723*           5′: 12^2,6; 3′: 8^2,6   3/2         7^2/14       8/3          2/1        6/3         46/3.66           [16]
Y07724            5′: 7^2; 3′: 7^4        3/2         7^2/14       8/3          2/1        6/3         40/3.18           [16]
Y07725*           5′: 12^2,7; 3′: 8^2,7   3/2         7^2/14       8/3          2/1        6/3         46/3.66           [16]
M19427            5′: 7; -                7^3/3       -/-          -/-          -/-        -/-         14/4.56           [15]
EU381420          -                       -/-         -/-          -/-          -/-        4/2         4/3.73            [17]
X05591            -                       -/-         -/-          11^1/4^1     -/-        -/-         11/3.71           [4]
X05592            i1366                   -/-         -/-          -/-          -/-        12/8        22/10.03          [3]
Codon opt.        -                       -/1         -/1          -/2 (1^5)    -/-        -/-                           [11, 12]

^1 Insertion of three nucleotides/one amino acid
^2 Deletion of two nucleotides
^3 One missing nucleotide
^4 Three missing nucleotides
^5 Differences in comparison to the amino acid sequence of pHSRV2n Env
^6 Insertion of 136 nucleotides
^7 Insertion of 645 nucleotides
* Dissimilarity in 5′- and 3′-LTR
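The exchanges-per-1000-bp values in the last data column of Table 1 can be reproduced with a short calculation. The sketch below is an illustration, not the authors' script. It assumes that the denominator is the summed length of the regions compared for each record, with the LTR counted twice for full-length records. This assumption is not stated in the text, but it matches the tabulated values. The function and constant names are hypothetical.

```python
# Region lengths in nucleotides, taken from the header of Table 1.
REGION_LENGTH_NT = {
    "LTR": 1123,
    "Gag": 1947,
    "Pol": 3432,
    "Env": 2967,
    "Tas": 903,
    "Bel2": 1071,
}

# Assumption: a full-length record covers both LTR copies plus all coding regions.
FULL_LENGTH = ("LTR", "LTR", "Gag", "Pol", "Env", "Tas", "Bel2")


def exchanges_per_kbp(nt_exchanges, regions=FULL_LENGTH):
    """Nucleotide exchanges per 1000 bp over the compared regions, to two decimals."""
    compared_nt = sum(REGION_LENGTH_NT[name] for name in regions)
    return round(1000 * nt_exchanges / compared_nt, 2)


# pHSRV2: 40 exchanges over the full-length comparison.
print(exchanges_per_kbp(40))                    # 3.18
# X05592: 22 exchanges; assumed here to cover one LTR and Bel2 only.
print(exchanges_per_kbp(22, ("LTR", "Bel2")))   # 10.03
```

Under the same assumption, M19427 (14 exchanges over one LTR and Gag) gives 4.56, which also agrees with the table.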


leader peptide (N14K) compared to the new sequence of the proviral plasmids.

The error frequency of GenBank sequences has been determined to be 2.887 errors per 1000 bases [13], and it was later found to be one error per 1000 bp in the mouse genome [14]. The frequency of errors in the foamy virus sequences, with up to 10.03 errors per 1000 bp even without taking any insertions into account, was found to be more than three times higher than the described error rates. The described sequence deviations, especially in the earliest human foamy virus sequences, might have resulted from naturally occurring polymorphisms, cell culture adaptation, errors during reverse transcription, or technical problems during sequence determination. Errors in previous sequences of pHSRV2 and pHSRV13 can be attributed to the latter, since the plasmid stocks were derived from the original sources, and thus the sequences that were obtained should have matched those in GenBank. Some errors, like the missing two nucleotides in the R region of the LTR, can be explained by the history of both proviral plasmids. The prototypic foamy virus was first cloned in the Flügel laboratory [3, 4, 15], and the resulting plasmids were used to generate the infectious pHSRV13 plasmid. The missing dinucleotide within the 5′ LTR was present in the first foamy viral sequences of the 3′ LTR [3, 4], but it was not recorded in the first 5′ LTR sequence [15]. It is likely that the LTRs were not resequenced and verified in pHSRV13, since it was assumed that the sequences of the parental plasmids had been determined correctly. Therefore, the missing dinucleotide remained absent in the pHSRV13 5′ LTR nucleotide sequence. The pHSRV2 plasmid was constructed by the Rethwilm laboratory using the very same parental plasmids produced in the Flügel laboratory [6], and again, the sequences were not verified after the construction of the plasmid.

Differences in the cloning strategies might explain deviations between the proviral plasmids, since Löchelt et al. mainly used the parental plasmid C11, whereas the env sequence of pHSRV2 is based on the C55 plasmid [3–7]. However, the cloning strategy of pHSRV2 documents the exchange of an XbaI site for an Asp718 site in env [6]. Unfortunately, this Asp718 site is not present in any of the sequences analysed, and our sequence data show that the parental XbaI site is still present in the pHSRV2n sequence.

In summary, we provide evidence that the PFV sequences available in the GenBank database contain several sequencing errors and should not be used in the future. Furthermore, we would like to suggest that laboratories use the pHSRV13n sequence and plasmid to ensure that experiments and data will be comparable within the foamy virus field.

Acknowledgements We would like to thank M. Löchelt for the pHSRV13 plasmid.

Compliance with ethical standards

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest The authors declare that they have no conflict of interest.

References

1. Enders JF, Peebles TC (1954) Propagation in tissue cultures of cytopathogenic agents from patients with measles. Proc Soc Exp Biol Med 86(2):277–286
2. Achong BG, Mansell PW, Epstein MA (1971) A new human virus in cultures from a nasopharyngeal carcinoma. J Pathol 103(2):P18
3. Flügel RM, Maurer B, Bannert H, Rethwilm A, Schnitzler P, Darai G (1987) Nucleotide sequence analysis of a cloned DNA fragment from human cells reveals homology to retrotransposons. Mol Cell Biol 7(1):231–236
4. Flügel RM, Rethwilm A, Maurer B, Darai G (1987) Nucleotide sequence analysis of the env gene and its flanking regions of the human spumaretrovirus reveals two novel genes. Embo J 6(7):2077–2084
5. Rethwilm A, Darai G, Rosen A, Maurer B, Flügel RM (1987) Molecular cloning of the genome of human spumaretrovirus. Gene 59(1):19–28
6. Rethwilm A, Baunach G, Netzer KO, Maurer B, Borisch B, ter Meulen V (1990) Infectious DNA of the human spumaretrovirus. Nucleic Acids Res 18(4):733–738
7. Löchelt M, Zentgraf H, Flügel RM (1991) Construction of an infectious DNA clone of the full-length human spumaretrovirus genome and mutagenesis of the bel 1 gene. Virology 184(1):43–54
8. Schrom E-M, Moschall R, Weitner H, Fecher D, Langemeier J, Bohne J, Wöhrl BM, Bodem J (2013) U1snRNP-mediated suppression of polyadenylation in conjunction with the RNA structure controls poly(A) site selection in Foamy Viruses. Retrovirology 10:55
9. Peters K, Barg N, Gärtner K, Rethwilm A (2008) Complex effects of foamy virus central purine-rich regions on viral replication. Virology 373(1):51–60. doi:10.1016/j.virol.2007.10.037
10. McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, Cowley AP, Lopez R (2013) Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res 41 (Web Server issue):W597–W600. doi:10.1093/nar/gkt376
11. Lindemann D, Rethwilm A (2011) Foamy virus biology and its application for vector development. Viruses 3(5):561–585. doi:10.3390/v3050561
12. Hartl MJ, Bodem J, Jochheim F, Rethwilm A, Rösch P, Wöhrl BM (2011) Regulation of foamy virus protease activity by viral RNA: a novel and unique mechanism among retroviruses. J Virol 85(9):4462–4469
13. Krawetz SA (1989) Sequence errors described in GenBank: a means to determine the accuracy of DNA sequence interpretation. Nucleic Acids Res 17(10):3951–3957
14. Wesche PL, Gaffney DJ, Keightley PD (2004) DNA sequence error rates in Genbank records estimated using the mouse genome as a reference. DNA Seq 15(5–6):362–364. doi:10.1080/10425170400008972

15. Maurer B, Bannert H, Darai G, Flügel RM (1988) Analysis of the primary structure of the long terminal repeat and the gag and pol genes of the human spumaretrovirus. J Virol 62(5):1590–1597
16. Schmidt M, Herchenröder O, Heeney J, Rethwilm A (1997) Long terminal repeat U3 length polymorphism of human foamy virus. Virology 230(2):167–178. doi:10.1006/viro.1997.8463
17. Perkovic M, Schmidt S, Marino D, Russell RA, Stauch B, Hofmann H, Kopietz F, Kloke BP, Zielonka J, Strover H, Hermle J, Lindemann D, Pathak VK, Schneider G, Löchelt M, Cichutek K, Münk C (2009) Species-specific inhibition of APOBEC3C by the prototype foamy virus protein bet. J Biol Chem 284(9):5819–5826. doi:10.1074/jbc.M808853200
