36626 - Next Generation Sequencing Analysis
Generalized NGS analysis
Data size
Question → Raw reads → Pre-processing → Assembly: Alignment / de novo → Application specific: Variant calling, count matrix, ... → Compare samples / methods → Answer?
Assembly: Two basic approaches
• If you don't know which were used: FastQC will (may) find them for you!
Concept: Rare k-mers are sequencing errors
• Need >15X coverage
• Example reads (the first differs from the others at one base):
  ACGTGGTTGCCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
  ACGTGGTTACCCTTAAA
[Figure: k-mer coverage histogram. x-axis: Coverage (0 to 100); y-axis: Density (0.000 to 0.015); labeled regions: "Error k-mers" and "True k-mers".]
Figure 3: k-mer coverage. 15-mer coverage model fit to 76× coverage of 36 bp reads from E. coli. Note that the expected coverage of a k-mer in the genome using reads of length L will be (L − k + 1) / L times the expected coverage of a single nucleotide. (Kelley et al., 2010)
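The idea that rare k-mers point to sequencing errors can be illustrated with a small sketch. It counts k-mers in the example reads from the slide and flags those seen fewer than a chosen number of times. The function names, k = 5, and the cutoff of 2 are assumptions made for this toy example; they are not taken from Kelley et al. (2010), whose method fits a coverage model rather than applying a fixed cutoff. The last function applies the (L − k + 1) / L scaling from the figure caption.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (illustrative cutoff)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


def expected_kmer_coverage(base_coverage, read_length, k):
    """Scale per-nucleotide coverage by (L - k + 1) / L, as in the caption."""
    return base_coverage * (read_length - k + 1) / read_length


# Reads from the slide: one read carries a G where the other eight carry an A.
reads = ["ACGTGGTTGCCCTTAAA"] + ["ACGTGGTTACCCTTAAA"] * 8

counts = count_kmers(reads, k=5)
suspect = rare_kmers(counts, min_count=2)
print(sorted(suspect))

# Expected 15-mer coverage for 76x coverage with 36 bp reads.
print(round(expected_kmer_coverage(76, 36, 15), 1))
```

In this toy data set only the five 5-mers that overlap the mismatching base are flagged. With real data the cutoff depends on the coverage, which is why the slide asks for more than 15X.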
Merge paired ends
• Example: coverage C = N × L / G
• N: Number of reads: 5 million
• L: Read length: 100
• G: Genome size: 5 Mbases
• C = 5*100/5 = 100X
• On average there are 100 reads covering each position in the genome
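The coverage arithmetic above can be written as a short sketch. The function names are hypothetical. The second function inverts the same formula to estimate how many reads a target coverage would need, using the >15X figure mentioned for k-mer based error correction as an example target.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest N such that N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Values from the slide: 5 million reads of 100 bases, 5 Mbase genome.
c = coverage(num_reads=5_000_000, read_length=100, genome_size=5_000_000)
print(f"{c:.0f}X")

# Reads of length 100 needed to reach 15X on the same genome.
n = reads_needed(target_coverage=15, read_length=100, genome_size=5_000_000)
print(n)
```

This is an average only; actual per-position coverage varies around it.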
Last, but important!
• Lots of data - storage is expensive!