
RNA-Seq: Methods and Applications

Prat Thiru

Outline
Intro to RNA-Seq
Biological Questions
Comparison with Other Methods
RNA-Seq Protocol

RNA-Seq Applications
Annotation
Quantification
Other Applications

Expression Profiling Steps and Software
Running TopHat and Cufflinks (Commands)

Goals of Sequencing the Transcriptome
Annotation
Identify genes, exons, splicing events, ncRNAs, etc.
Novel genes or transcripts

Quantification
Abundance of transcripts between different conditions

Transcriptome: RNA World

http://finchtalk.geospiza.com/2009/05/small-rnas-get-smaller.html

Transcriptome: Complexity

http://www.ncbi.nlm.nih.gov/books/NBK21128/

Comparison of Methods for Studying the Transcriptome

| Technology | Tiling microarray | cDNA or EST sequencing | RNA-Seq |
| --- | --- | --- | --- |
| Technology specifications | | | |
| Principle | Hybridization | Sanger sequencing | High-throughput sequencing |
| Resolution | From several to 100 bp | Single base | Single base |
| Throughput | High | Low | High |
| Reliance on genomic sequence | Yes | No | In some cases |
| Background noise | High | Low | Low |
| Application | | | |
| Simultaneously map transcribed regions and gene expression | Yes | Limited for gene expression | Yes |
| Dynamic range to quantify gene expression level | Up to a few-hundredfold | Not practical | >8,000-fold |
| Ability to distinguish different isoforms | Limited | Yes | Yes |
| Ability to distinguish allelic expression | Limited | Yes | Yes |
| Practical issues | | | |
| Required amount of RNA | High | High | Low |
| Cost for mapping transcriptomes of large genomes | High | High | Relatively low |

Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)

RNA-Seq Experiment

Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)

RNA-Seq Applications - Annotation:
Alternative Splicing Events

Ozsolak, F. and Milos, P. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics (2011)

RNA-Seq Applications - Annotation:
Identify Known and Novel Transcripts
Known exons/gene

Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnology (2010)

Mapped Reads:
novel exon or gene?

Unmapped Reads:
novel splice junctions?

Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)

Assembly and Mapping RNA-Seq
Options:
Align and then assemble
Assemble and then align
Align to: genome or transcriptome

Haas, B.J., and Zody, M.C. Advancing RNA-Seq analysis. Nature Biotechnology (2010)

RNA-Seq Applications - Quantification:
Expression Profiling

Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)

Need for Normalization
More reads map to a transcript if it is
i) long
ii) sequenced at a higher depth of coverage
Normalize such that
i) features of different lengths
ii) total sequence from different conditions
can be compared

Quantifying Expression: RPKM
RPKM: Reads Per Kilobase per Million mapped reads
RPKM = C / (L * N)
C: Number of mappable reads on a feature (e.g. transcript, exon, etc.)
L: Length of feature (in kb)
N: Total number of mappable reads (in millions)

Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)

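RPKM Example

| Gene | Length | Sample 1 (N = 6 M) | Sample 2 (N = 8 M) |
| --- | --- | --- | --- |
| Gene A | 600 bases | C = 12, RPKM = 12/(0.6*6) = 3.33 | C = 19, RPKM = 19/(0.6*8) = 3.96 |
| Gene B | 1100 bases | C = 24, RPKM = 24/(1.1*6) = 3.64 | C = 28, RPKM = 28/(1.1*8) = 3.18 |
| Gene C | 1400 bases | C = 11, RPKM = 11/(1.4*6) = 1.31 | C = 16, RPKM = 16/(1.4*8) = 1.43 |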
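The RPKM arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `rpkm` and the dictionary layout are illustrative choices, and the inputs are the gene lengths, read counts and library sizes from the example.

```python
def rpkm(read_count, length_bases, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads: C / (L * N)."""
    length_kb = length_bases / 1_000                  # L, in kilobases
    total_millions = total_mapped_reads / 1_000_000   # N, in millions of reads
    return read_count / (length_kb * total_millions)


# Gene lengths (bases) and per-sample counts taken from the example slide.
gene_lengths = {"Gene A": 600, "Gene B": 1100, "Gene C": 1400}
samples = {
    "Sample 1": {"total": 6_000_000,
                 "counts": {"Gene A": 12, "Gene B": 24, "Gene C": 11}},
    "Sample 2": {"total": 8_000_000,
                 "counts": {"Gene A": 19, "Gene B": 28, "Gene C": 16}},
}

for sample_name, sample in samples.items():
    for gene, count in sample["counts"].items():
        value = rpkm(count, gene_lengths[gene], sample["total"])
        print(f"{sample_name} {gene}: RPKM = {value:.2f}")
```

Gene B in Sample 2 evaluates to 28/(1.1*8) = 3.18, so after normalization it is slightly lower than in Sample 1 (3.64) even though its raw count is higher.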

Quantifying Expression: FPKM
FPKM: Fragments Per Kilobase of transcript per Million fragments mapped
Analogous to RPKM, but counts fragments rather than individual reads.
The relative abundances of transcripts are described in terms of the expected biological objects (fragments) observed from an RNA-Seq experiment, which in the future may not be represented by a single read

Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)
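The difference between counting reads and counting fragments can be illustrated with a small sketch. It assumes that the two mates of a paired-end fragment share one read name, and it uses a plain count-based formula analogous to RPKM; Cufflinks itself estimates abundances with a statistical model, so this is not its implementation. The function names and read names are hypothetical.

```python
def count_fragments(read_names):
    """Count each fragment once, assuming both mates of a pair share a name."""
    return len(set(read_names))


def fpkm(fragment_count, length_bases, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    length_kb = length_bases / 1_000
    total_millions = total_fragments / 1_000_000
    return fragment_count / (length_kb * total_millions)


# Hypothetical read names: frag1 and frag3 are sequenced from both ends,
# frag2 contributes a single read.
reads_on_transcript = ["frag1", "frag1", "frag2", "frag3", "frag3"]

print(len(reads_on_transcript), "reads")
print(count_fragments(reads_on_transcript), "fragments")
```

With single-end data every fragment yields one read, so the two measures coincide.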

Quantifying Expression:
Normalization Methods
Total count (e.g. RPKM)
Upper Quartile (e.g. 75th percentile): Similar to Total count, but scales by the per-lane upper quartile of counts for genes with reads in at least one lane.
Quantile: For each lane the distribution of read counts is matched to a reference distribution defined in terms of median counts

Bullard, J. et al. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics (2010)
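The total-count and upper-quartile scaling factors can be sketched as follows. This is an illustration under stated assumptions, not the code used by Bullard et al.: the lane and gene names and the counts are made up, and the 75th percentile is computed with simple linear interpolation.

```python
def total_count_factors(lanes):
    """Total-count scaling: one factor per lane, the sum of its read counts."""
    return {lane: sum(counts.values()) for lane, counts in lanes.items()}


def upper_quartile(values):
    """75th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = 0.75 * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def upper_quartile_factors(lanes):
    """Per-lane upper quartile over genes with reads in at least one lane."""
    all_genes = set().union(*(counts.keys() for counts in lanes.values()))
    expressed = sorted(
        gene for gene in all_genes
        if any(counts.get(gene, 0) > 0 for counts in lanes.values())
    )
    return {
        lane: upper_quartile([counts.get(gene, 0) for gene in expressed])
        for lane, counts in lanes.items()
    }


# Made-up counts: g1 has no reads in either lane, so it is excluded
# from the upper-quartile calculation.
lanes = {
    "lane1": {"g1": 0, "g2": 10, "g3": 20, "g4": 30, "g5": 40},
    "lane2": {"g1": 0, "g2": 20, "g3": 40, "g4": 60, "g5": 80},
}

print(total_count_factors(lanes))
print(upper_quartile_factors(lanes))
```

Dividing each lane's counts by its factor puts the lanes on a common scale; the upper quartile is less sensitive than the total to a handful of very highly expressed genes.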

RNA-Seq Applications:
Gene Fusion

Ozsolak, F. and Milos, P. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics (2011)

Expression Profiling Workflow

| Step | Software |
| --- | --- |
| QC: Filter Short Reads | (See Hot Topics on Mapping NGS Reads) FASTX Toolkit; FastQC; R: ShortRead |
| Align and Assemble, or Assemble and Align | Align with TopHat, assemble with Cufflinks |
| Computational Analysis: Quantify Expression, or other applications | Cuffcompare, Cuffdiff; SAMtools, BEDtools; R: edgeR, DESeq |
| Visualize Data | IGV (See Hot Topics on IGV); UCSC Genome Browser |

The Tuxedo Tools

http://mged12deepsequencinganalysis.wikispaces.com/file/view/Cole_MGED_tutorial_slides.pdf

TopHat Algorithm

Trapnell, C. et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009)

Cufflinks Algorithm

Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)

Running TopHat: Align Reads
TopHat Manual: http://tophat.cbcb.umd.edu/manual.html

Running TopHat on Tak
Usage:
tophat [options] <bowtie_index> <reads1[,reads2,...,readsN]> [reads1[,reads2,...,readsN]]
e.g.
bsub tophat -p 2 --solexa1.3-quals --max-multihits 5 -o s_1_TopHat_Out /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_1_sequence.txt
Options (See Manual for all available options):
-o/--output-dir: Sets the name of the directory in which TopHat will write all of its output.
--solexa-quals: Use the Solexa scale for quality values in FASTQ files.
--solexa1.3-quals: As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
-p/--num-threads: Use this many threads to align reads. The default is 1.
-g/--max-multihits: Instructs TopHat to allow up to this many alignments to the reference for a given read, and suppresses all alignments for reads with more than this many alignments. The default is 40.

TopHat Output
Output of TopHat is a BAM file, the binary version of a Sequence Alignment/Map (SAM) file
Use Integrative Genomics Viewer (IGV) to view the BAM file, or use SAMtools to analyze the BAM file
e.g. SAM File
WICMTSOLEXA:1:20:670:1533# 137 chr1 3240920 3 30M * 0 0 CTGGATCTGGACCTGGACCTGGATCTATAT ::::::::::::::::::::::::::::: NM:i:1 NH:i:2 CC:Z:chr6 CP:i:83893005
WICMTSOLEXA:1:69:135:1285# 89 chr1 3269437 1 30M * 0 0 TGCCTAAACTTATTAAGGCAGGCCATGGGC :((/+:::(+:+':/:+++&+//':++::: NM:i:2 NH:i:4 CC:Z:chr7 CP:i:20934843
WICMTSOLEXA:1:84:584:747# 153 chr1 3270083 0 30M * 0 0 AGCAAGTTTTTTNTTAGCCCTAGATTCCAG ::::::::::::%::::::::::::::::: NM:i:1 NH:i:5 CC:Z:= CP:i:136301734
WICMTSOLEXA:1:75:1357:1675# 163 chr1 3522128 255 30M = 3522287 0 GTGGCTTTGTGGTCTTCACCAACCTTTCTC :::::::::::::::::::::::::::::: NM:i:1 NH:i:1
WICMTSOLEXA:1:75:1357:1675# 83 chr1 3522287 255 30M = 3522128 0 CTGTAGGTGTAATCCTAAATTCTTATTACG :::::::::::::::::::::::::::::: NM:i:0 NH:i:1
WICMTSOLEXA:1:8:59:283# 153 chr1 3522536 3 30M * 0 0 TTTCTGCTTTGATTATGGTACTGATGTCTG :::::::::::4:::::::::::::::::: NM:i:2 NH:i:2 CC:Z:chr5 CP:i:134317691
WICMTSOLEXA:1:12:1161:945# 89 chr1 3523371 1 30M * 0 0 TCTACATAGCCCAAACTGGCTTTGGACTCT :::::::::::::::::::::::::::::: NM:i:0 NH:i:3 CC:Z:chr10 CP:i:117172515
WICMTSOLEXA:1:45:1469:1826# 73 chr1 3620888 3 30M * 0 0 CAAGTATTTAATGTTTTCATTAAATTGTTT ::::::::::::::::::::::::::4::: NM:i:0 NH:i:2 CC:Z:chr11 CP:i:22903295
WICMTSOLEXA:1:14:536:150# 73 chr1 3620943 3 30M * 0 0 CTGGAAGACAATGTCCAAAAACTCTGAATC :::::::::::::::::::::::::%::&: NM:i:1 NH:i:2 CC:Z:chr11 CP:i:22903240
WICMTSOLEXA:1:66:646:1188# 137 chr1 3662923 0 30M * 0 0 AAAAAAAAAACACCACCCCCAACAAAAAAA +00++0+0+''0++++:00::.&:::,:,: NM:i:2 NH:i:5 CC:Z:chr10 CP:i:94881279
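A SAM record is a tab-separated line with eleven mandatory columns followed by optional TAG:TYPE:VALUE fields. The sketch below shows one way to read such a line in plain Python. The function names are illustrative, and the example record uses a hypothetical read name (`read1`) with values modeled on the SAM example, so it is an assumption rather than actual TopHat output. For real analyses, SAMtools handles BAM and SAM files directly.

```python
SAM_COLUMNS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
               "rnext", "pnext", "tlen", "seq", "qual")

# Meaning of the lower FLAG bits as defined in the SAM format.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def parse_sam_line(line):
    """Split one SAM alignment line into mandatory columns and optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    tags = {}
    for item in fields[11:]:
        tag, value_type, value = item.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    record["tags"] = tags
    return record


def decode_flag(flag):
    """List the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Hypothetical record modeled on the example above (tabs restored).
example = "\t".join([
    "read1", "163", "chr1", "3522128", "255", "30M", "=", "3522287", "0",
    "GTGGCTTTGTGGTCTTCACCAACCTTTCTC", ":" * 30, "NM:i:1", "NH:i:1",
])

record = parse_sam_line(example)
print(record["rname"], record["pos"], record["cigar"])
print(decode_flag(record["flag"]))
print("edit distance:", record["tags"]["NM"], "reported alignments:", record["tags"]["NH"])
```

The NH tag gives the number of reported alignments for the read, which is what the --max-multihits option limits.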

Cufflinks:
Assemble and Quantify Reads

Cufflinks Manual: http://cufflinks.cbcb.umd.edu/manual.html

Running Cufflinks on Tak
Optional: Supply annotation in GTF format with the -G option
Usage:

cufflinks [options] <hits.bam>

e.g.

bsub cufflinks -p 2 -o s_1_Cufflinks_Out s_1_TopHat_Out/accepted_hits.bam
e.g. with -G, Cufflinks quantifies the known transcripts in the supplied GTF file (it does not assemble novel transcripts)
bsub cufflinks -p 2 -G transcripts.gtf accepted_hits.bam
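When many lanes need the same treatment, the TopHat and Cufflinks command lines can be assembled programmatically. The sketch below only builds and prints the commands; it does not run them. The helper names `tophat_command` and `cufflinks_command` are hypothetical, the flags are the ones shown in these slides, and `shlex.join` requires Python 3.8 or later.

```python
import shlex


def tophat_command(index_prefix, reads, output_dir, threads=2, max_multihits=5):
    """Build a TopHat command using the options from the TopHat example."""
    return ["tophat", "-p", str(threads), "--solexa1.3-quals",
            "--max-multihits", str(max_multihits),
            "-o", output_dir, index_prefix, reads]


def cufflinks_command(bam, output_dir, threads=2, annotation_gtf=None):
    """Build a Cufflinks command; -G is added only when a GTF file is given."""
    command = ["cufflinks", "-p", str(threads), "-o", output_dir]
    if annotation_gtf is not None:
        command += ["-G", annotation_gtf]
    command.append(bam)
    return command


# Paths and file names are the ones used in the slide examples.
align = tophat_command(
    "/nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9",
    "s_1_sequence.txt",
    "s_1_TopHat_Out",
)
quantify = cufflinks_command("s_1_TopHat_Out/accepted_hits.bam",
                             "s_1_Cufflinks_Out")

# Prefix with bsub to submit through the cluster scheduler, as in the slides.
for command in (align, quantify):
    print(shlex.join(["bsub"] + command))
```

Keeping each command as a list of arguments avoids quoting problems if a path contains spaces, and the same lists can later be passed to `subprocess.run`.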

Cufflinks Output
Output of Cufflinks is a GTF file with assembled isoforms
e.g.

chr1 Cufflinks transcript 36321447 36330270 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36321447 36323398 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "1"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36325501 36325554 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "2"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36326058 36326546 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "3"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36330183 36330270 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "4"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks transcript 36364578 36380874 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36364578 36364681 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "1"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36373054 36373172 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "2"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36374929 36375026 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "3"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36375333 36375498 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "4"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36375837 36380874 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "5"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
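To pull FPKM values out of the Cufflinks GTF, each line can be split into its nine tab-separated columns and the attribute column parsed into key/value pairs. This is a minimal sketch: the function name `parse_gtf_line` and the regular expression are illustrative choices, and the example record is the Arid5a transcript line from the output above with tabs restored.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_line(line):
    """Parse one GTF record into its fixed columns and an attribute dict."""
    (seqname, source, feature, start, end,
     score, strand, frame, attribute_text) = line.rstrip("\n").split("\t")
    attributes = dict(ATTRIBUTE.findall(attribute_text))
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
        "fpkm": float(attributes["FPKM"]) if "FPKM" in attributes else None,
    }


example = "\t".join([
    "chr1", "Cufflinks", "transcript", "36364578", "36380874", "4", "+", ".",
    'gene_id "Arid5a"; transcript_id "NM_145996"; FPKM "0.0015751054"; '
    'frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";',
])

record = parse_gtf_line(example)
print(record["attributes"]["gene_id"], record["attributes"]["transcript_id"],
      record["fpkm"])
```

Filtering on `feature == "transcript"` gives one FPKM per isoform; the exon lines repeat the value of their parent transcript.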

Local Resources
For a description of available files, see
/nfs/genomes/BaRC_Genomes_README.txt

Bowtie index
/nfs/genomes/<species>/bowtie

e.g.
/nfs/genomes/mouse_gp_jul_07_no_random/bowtie

GTF files
/nfs/genomes/<species>/gtf

e.g.
/nfs/genomes/mouse_gp_jul_07/gtf

Further Reading
RNA-Seq
Mortazavi, A., et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5(7):621-628 (2008)
Wang, Z., et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57-63 (2009)
Ozsolak, F. and Milos, P.M. RNA sequencing: advances, challenges, and opportunities. Nature Reviews Genetics 12:87-98 (2011)

TopHat
Trapnell, C., et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105-1111 (2009)

Cufflinks
Trapnell, C., et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28(5):511-515 (2010)

Online Community
Forum and Discussion
http://seqanswers.com/