GOV
PubMed: Scientific Journals
Entrez: Keyword Search of Database
BLAST: Sequence Queries
OMIM: Online Mendelian Inheritance in Man
Books
TaxBrowser
Structure: 3D Molecular Structures
Sequence Files
Because the information relevant to biological processes is encoded in the
gene or protein sequence, genetic and protein data are stored in
sequence files.
Importantly, the directionality that exists in nature is preserved
in the sequence file:
Nucleic Acids are always written 5' to 3' (the numbers name the sugar carbons
joined by the phosphodiester bond; the 5' end carries a free phosphate and the
3' end a free hydroxyl).
nucleic acids (genes): 5'-AGCTCGTGTAGACCATTC-3'
Amino Acids are always written with the free amino (N-terminus) first and
the carboxylic acid (C-terminus) last.
amino acids (proteins): amino-IPKERYRGQIESIWA-carboxy
5'-AGTCGTGATCTGCTAAATGTCTCGAAGTTCGATGCTAG-3'
   ||||||||||||||||||||||||||||||||||||||
3'-TCAGCACTAGACGATTTACAGAGCTTCAAGCTACGATC-5'
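The strand directionality described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `complement` and `reverse_complement` are chosen here for the example, and only the four unambiguous bases A, C, G and T are handled. It uses the 5'-AGCTCGTGTAGACCATTC-3' example sequence from this section.

```python
# Watson-Crick pairing for the four unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the paired base for every position, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(seq)[::-1]


top = "AGCTCGTGTAGACCATTC"

# Print the duplex the way it is drawn on the slide: top strand 5' to 3',
# bottom strand 3' to 5' underneath it.
print("5'-" + top + "-3'")
print("   " + "|" * len(top))
print("3'-" + complement(top) + "-5'")

# The same bottom strand, rewritten in the conventional 5' to 3' order.
print("partner strand, 5' to 3':", reverse_complement(top))
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check when working with either strand of a duplex.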
A monospaced font such as Courier is preferred for writing sequence data
because every character has the same width, so aligned positions stay in register.
FASTA File Format
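Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid
codes, with these exceptions:
1) Lower-case letters are accepted and are mapped to upper-case.
2) A single hyphen or dash can be used to represent a gap of indeterminate length.
3) In amino acid sequences, U and * are acceptable letters.
4) Any numerical digits in the query sequence should be removed or replaced by appropriate
letter codes (e.g. N for an unknown nucleic acid residue or X for an unknown amino acid residue).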
For those programs that use amino acid (protein) query sequences
(e.g. BLASTP and TBLASTN), the accepted amino acid codes are:
A alanine
B aspartate or asparagine
C cysteine
D aspartate
E glutamate
F phenylalanine
G glycine
H histidine
I isoleucine
K lysine
L leucine
M methionine
N asparagine
P proline
Q glutamine
R arginine
S serine
T threonine
U selenocysteine
V valine
W tryptophan
Y tyrosine
Z glutamate or glutamine
X any
* translation stop
- gap of indeterminate length
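As a rough illustration of how the accepted codes might be used, the sketch below reads FASTA text and reports any characters outside the protein code list above. It relies on the standard FASTA convention that a record starts with a description line beginning with ">", followed by one or more sequence lines. The names `parse_fasta` and `invalid_protein_codes`, and the description line in the example, are hypothetical and chosen only for this example; the residues are the protein example from this section.

```python
# One-letter codes accepted for protein queries, taken from the list above.
PROTEIN_CODES = set("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA-formatted text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is not None:
            # Sequence lines are joined; letters are upper-cased for comparison.
            chunks.append(line.upper())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def invalid_protein_codes(sequence):
    """Return the sorted characters of `sequence` that are not accepted codes."""
    return sorted(set(sequence) - PROTEIN_CODES)


# Hypothetical record: the description line is made up for illustration.
example = """>example_protein hypothetical description line
ipkeryrgq
IESIWA
"""

for description, sequence in parse_fasta(example):
    print(description, sequence, invalid_protein_codes(sequence))
```

An empty list from `invalid_protein_codes` means every character is in the accepted set; digits or letters such as J would be reported so they can be removed or replaced before submitting a query.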
TinySeq XML
<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>1924939</TSeq_gi>
<TSeq_accver>X98411.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
<TSeq_length>2711</TSeq_length>
<TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGACAGCCGCTGGGGCGGGCGCAGCGAGTCCATCAATGT...</TSeq_sequence>
</TSeq>
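A TinySeq record like the one above can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is one possible approach, not an official NCBI client: the function name `read_tinyseq` and the dictionary keys are chosen for this example, and the embedded sequence is deliberately shortened, so it does not match the declared length of 2711.

```python
import xml.etree.ElementTree as ET

# The record shown above, with the sequence shortened for illustration
# (the real entry is 2711 bases long).
TINYSEQ = """<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>1924939</TSeq_gi>
<TSeq_accver>X98411.1</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>Homo sapiens partial mRNA for myosin-IF</TSeq_defline>
<TSeq_length>2711</TSeq_length>
<TSeq_sequence>CAGGAGAAGCTGACCAGCCGCAAGATGGAC</TSeq_sequence>
</TSeq>
"""


def read_tinyseq(xml_text):
    """Extract the fields of a single TSeq element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        # The sequence type is stored as an attribute, not as element text.
        "seqtype": root.find("TSeq_seqtype").get("value"),
        "gi": root.findtext("TSeq_gi"),
        "accver": root.findtext("TSeq_accver"),
        "taxid": root.findtext("TSeq_taxid"),
        "orgname": root.findtext("TSeq_orgname"),
        "defline": root.findtext("TSeq_defline"),
        "length": int(root.findtext("TSeq_length")),
        "sequence": root.findtext("TSeq_sequence"),
    }


record = read_tinyseq(TINYSEQ)
print(record["accver"], record["orgname"], record["length"])
```

Each field sits in its own tagged element, so a program can pick out the accession, organism or sequence by name instead of parsing a free-text description line.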
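Dideoxy (Sanger) sequencing: four reactions, one per dideoxynucleotide (ddATP32, ddCTP32, ddTTP32, ddGTP32), run in four gel lanes (A, G, C, T).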
AAACCAGGCCGATAAGGTACTACACGAAAAA
|||||||||||||||||||||||||||||||||||||||
TTTGGTCCGGCTATTCCATGATGTGCTTTTTTT
TTGGTCCGGCTATTCCATGATGTGCTTTTTTT
TGGTCCGGCTATTCCATGATGTGCTTTTTTT
GGTCCGGCTATTCCATGATGTGCTTTTTTT
GTCCGGCTATTCCATGATGTGCTTTTTTT
TCCGGCTATTCCATGATGTGCTTTTTTT
CCGGCTATTCCATGATGTGCTTTTTTT
CGGCTATTCCATGATGTGCTTTTTTT
GGCTATTCCATGATGTGCTTTTTTT
GCTATTCCATGATGTGCTTTTTTT
CTATTCCATGATGTGCTTTTTTT
TATTCCATGATGTGCTTTTTTT
ATTCCATGATGTGCTTTTTTT
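The nested set of fragments listed above can be reproduced with a short simulation. This is a simplified sketch under stated assumptions. It treats the listed strand as the newly synthesized strand, drawn so that the chain-terminating base is the leftmost character of each fragment. It also assumes every possible termination product is present. The function names `nested_fragments`, `sort_into_lanes` and `read_gel` are hypothetical and chosen only for this example.

```python
def nested_fragments(strand):
    """Return every fragment of the nested set, longest first, as in the listing."""
    return [strand[i:] for i in range(len(strand))]


def sort_into_lanes(fragments):
    """Group fragment lengths by terminating base (assumed to be the leftmost one)."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for fragment in fragments:
        lanes[fragment[0]].append(len(fragment))
    return lanes


def read_gel(lanes):
    """Read the bands from shortest to longest, noting the lane of each band."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = "TTTGGTCCGGCTATTCCATGATGTGCTTTTTTT"
fragments = nested_fragments(strand)
lanes = sort_into_lanes(fragments)

for base in "AGCT":
    print(base, sorted(lanes[base]))

# Shortest fragments run furthest, so reading upward from the bottom of the gel
# gives the strand starting from its fixed end, the reverse of how it is drawn.
print(read_gel(lanes))
```

Because each fragment length occurs in exactly one lane, ordering the bands by length and noting the lane of each band recovers the base sequence one position at a time.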
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/D/DNAsequencing.html
GGATCCTGCAAGGAGGGATACAAATTACATACATTTGTCAAAACCCACAGCATGTTGACCACCAGGAGGAG
ACCCCATGTGACTCCAGGACCCTGGTTGATAACAACGTATCGAGATTCCTCACATGGAACCAGTGCGCTC
CTGTGGTGGAGGGTGTACCTGTGTCAGGGCAGGGGGTACGTGGACATTTTCTGCAGTTTTTGATCAATTT
TGCAATGAACTAAATCTGTGGTATAAAAATAAAGTCTATTAAAAGAATCCAAGGCTCCCTCTCATCTCACGAT
AAGATAAAGTCCCCATCCATTTTACTCCTCTCAGCCCTGGAGAAAGGAGAGGCCAGGTCCCACCACCTTC
CACCAGCATGGACCCCCAGTCCAGACCCCACGCCTTTTCTCAGCATCCTCAGACCAGCAGGACTTGCAG
CAATGGGGAATTAGGCACCTGACTTCTCCTTCATCTACCTTTGGCTGGGGGCCTCCAGCCTTGACCTTCG
CTCTGAGAGTCTCAGGCAGGTCCAGAGCCAGTTCTCCCATGACGTGATATGTTTCCAGAGCAGGTTCCTG
GGTGAGATAAAAGGATTTGGGCTGAACAGGGTGGAGGGAGCATTGGAATGGCACTCAGGGCAAAGGCAG
AGGTGTGCGTGGCAGCGCCCTGGCTGTCCCTGCAAAGGGCACGGGCACTGGGCACTAGAGCCGCTCGG
GCCCCTAGGACGGTGCTGCCGTTTGAAGCCATGCCCCAGCATCCAGGCAACAGGTGGCTGAGGCTGCT
GCAGATCTGGAGGGAGCAGGGTTATGAGCACCTGCACCTGGAGATGCACCAGACCTTCCAGGAGCTGGG
GCCCATTTTCAGGTAAAGCCCTCCCTGGCCCTCGCTGGGAACACCCAGATCCCTGCCCCTGCTGCCCAG
GACCCTGCCAGGCACTCAGCACTGCCATTCCCAGCAGGTCCCGGCACTCTGCATCCTTTGGAGGATGGG
GAAGGAGTGCAGCACATGCTGGTCTGTGGTGCTGCCAGGGCAGGGGATAGTGCAGAGAAAACCCCAGC
TCACTGCAGAGAGGGCAGGACTCAGAAGCACTAAAGTTGAAAGGTTCCAGGGAGCCAGCAGGAGGGCTT
TAGCTGTGAAGCCGCTAATCCAGGAGCAGGGAGGGTGGACAGGAGACACTTTGGATTGGGACTGCAGGG
TGGGGCCACGAGGGACATGACCCCGTCCAGCAGGGCCTCCTGCTTGGCCCCACAGGTACAACTTGGGA
GGACCACGCATGGTGTGTGTGATGCTGCCGGAGGATGTGGAGAAGCTGCAACAGGTGGACAGCCTGCAT
CCCTGCAGGATGATCCTGGAGCCCTGGGTGGCCTACAGACAACATCGTGGGCACAAATGTGGCGTGTTC
TTGTTGTAAGCGGCGAGTTGGGAGCTGAGAGCTGGGAGCAGGGTGGGCAGCCTGGGTGTAGGGGGGA
GGCGAGAGAGGTAGGACCCAAAAGCACATCTGCCCTGGGCCCCTGTGGTGGGCAGTGAGGGTGAGCAC
CCGGCCCAGAGGACGGCCATCCTGTGGGGTCGCGTCTGCACTGTGGGTTGGGGAAGCAGGGCGGTGG
TGGAGAAATGGGCACGGGCACCTCTGCAGAGAAGACGCAGAGCAATGAGCCCTTCTGTGTAGTGAGAAC
CCGCTCTGCACCAACCTCGGCGGCTGCTTTCTCTTGCGGTCTGGGGACTGTCCTTCCCATAGGTCAGAA
AACTGAGGCCCTGAGAAGGGGACTTCCACTGGCCCAGGTCACAGGCTGAGTGCTGAGCCTGGTGTTCG
CCGGGGCCGCAGCCTCCCTCAGGGCGCTCAGGGTCCCTGCAGTCCTGGCAAACCTTCCTGATGGGGAC
AGTCCGGGGCAGGAGGCAGGTGGGGACGCAGGTGGCTGGTGGTTCCGTTGTTCTCAGAAGCAAGGCAC
AAGGTGGGGCGGTTGATGGCACTGGGGAGGATGTTTCCTGGCCCGTGGAGAGGGTGGCGCCTGGTCAG
GTGGGCAGGGAGAGGCTGATGCTTGGAGTCGGTCACCTGCAGGGATGTTGTCATTAGGACGGGGGAAG
GACTGGATGAGGATGTCACAGTGGTGACAGCCCCCACTCCATGGTAGGAAGGGAACGCTATTGGGAATAG
TGGGGTTTAGGTAAAAGGGCACCCGTGGGTCGGGGCCTTCACTGAGGCTGGCCTATAGATGACATCTGG
GAGAGAGTCAGGACCCAGGAAGGCAGGTCCAGGA