PRACTICAL: CHIP-SEQ DATA ANALYSIS

Andre Faure & Petra Schwalie
Paul Flicek Lab, Vertebrate Genomics, EMBL-EBI
9. March 2010

Wednesday, 9 March 2011

RESOURCES
http://www.bioconductor.org (R packages & workflows; help)
http://seqanswers.com (software overview; forum)
Data repositories: ArrayExpress & GEO, ENA
Collaborative efforts (ChIP-seq): ENCODE, modENCODE
Reviews + benchmarks (see last slide)

WORKFLOW

Raw data (ENCODE CTCF, H3K36me3, Input in K562 & HepG2)
Quality check & align (not discussed here)
(1) Peak-calling
(2) Genomic context
    Read profile plots
(3) Motif analysis (de novo & scanning)
(4) Differential enrichment

(1) PEAK-CALLING

chipseq, GenomicRanges (Bioconductor)

estimating fragment length
extending reads
islands of enrichment
modeling the background (e.g. Poisson, neg. binomial)
calling peaks (manual, MACS, SWEMBL)
genomic overlaps: comparison of peak-calling results
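
The steps above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the chipseq/GenomicRanges functions used in the practical and not how MACS or SWEMBL work internally. The function names, the 0-based half-open coordinates, and the single genome-wide Poisson rate are all assumptions made for the example.

```python
import math

def extend_read(start, strand, read_len, frag_len):
    """Extend a read to the estimated fragment length (0-based, half-open)."""
    if strand == "+":
        return start, start + frag_len
    end = start + read_len            # right edge of a minus-strand read
    return max(0, end - frag_len), end

def coverage(fragments, chrom_len):
    """Per-base count of overlapping fragments on one chromosome."""
    diff = [0] * (chrom_len + 1)
    for s, e in fragments:
        diff[s] += 1
        diff[min(e, chrom_len)] -= 1
    cov, running = [], 0
    for d in diff[:chrom_len]:
        running += d
        cov.append(running)
    return cov

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def min_depth(lam, alpha):
    """Smallest depth k with P(X >= k) < alpha under the Poisson background."""
    k = 1
    while poisson_sf(k, lam) >= alpha:
        k += 1
    return k

def islands(cov, threshold):
    """Maximal runs of positions with coverage >= threshold."""
    out, start = [], None
    for i, c in enumerate(cov):
        if c >= threshold and start is None:
            start = i
        elif c < threshold and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(cov)))
    return out
```

A typical use would set `lam` to the mean coverage (total extended fragment length divided by chromosome length), take `min_depth(lam, alpha)` as the cut-off, and pass it to `islands`. A negative binomial background or an Input-based local rate would replace `poisson_sf`.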

(2) GENOMIC CONTEXT

biomaRt, GenomicRanges (Bioconductor)

obtaining annotation (Ensembl)
overlaps with annotation (e.g. promoters)
enrichment of peaks in genomic areas (e.g. promoters) (not discussed here)
functional term enrichment (e.g. GREAT, McLean et al. Nat Biotechnol) (not discussed here)
average profile plots on genomic feature/peak summit
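
A minimal Python sketch of two of these steps, overlapping peaks with promoters and averaging a read profile around a set of positions. It does not use biomaRt or GenomicRanges. The promoter window (1 kb upstream, 200 bp downstream of the TSS), the input layouts, and the function names are assumptions chosen for illustration.

```python
def promoter(tss, strand, upstream=1000, downstream=200):
    """Strand-aware promoter window around a TSS (0-based, half-open).
    The window sizes are illustrative defaults, not values from the practical."""
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    return max(0, tss - downstream), tss + upstream

def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def peaks_in_promoters(peaks, genes):
    """peaks: {chrom: [(start, end), ...]}
    genes: [(name, chrom, tss, strand), ...]
    Returns {gene name: [overlapping peaks]} for genes with at least one hit."""
    hits = {}
    for name, chrom, tss, strand in genes:
        window = promoter(tss, strand)
        found = [p for p in peaks.get(chrom, []) if overlaps(p, window)]
        if found:
            hits[name] = found
    return hits

def average_profile(cov, centers, flank):
    """Mean coverage at each offset in [-flank, flank] around the centers.
    Centers too close to the ends of the coverage vector are skipped."""
    usable = [c for c in centers if c - flank >= 0 and c + flank < len(cov)]
    if not usable:
        return []
    return [sum(cov[c + off] for c in usable) / len(usable)
            for off in range(-flank, flank + 1)]
```

Here `centers` can be peak summits or feature positions such as TSSs. The pairwise loop is quadratic and only suitable for toy inputs; interval trees or sorted sweeps are the usual choice at genome scale.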

(3) MOTIF ANALYSIS


BSgenome, seqLogo, GenomicRanges (Bioconductor)
MEME (de novo motif discovery)

obtaining the peak sequences
de novo motif discovery
motif scanning: motifs per peak?
motif enrichment vs. background (not discussed here)
refining the PWM for a given factor
motif profile plot (distribution of motif around peak summit)
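
Motif scanning with a position weight matrix (PWM) can be sketched as follows. This is a generic log-odds scan written in plain Python, not the BSgenome or MEME interface. The uniform background, the pseudocount, the score threshold, and the function names are assumptions, and sequences are assumed to contain only A, C, G and T.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds PWM from equal-length aligned binding sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [s[i] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def score(pwm, word):
    """Sum of per-position log-odds for one window."""
    return sum(col[b] for col, b in zip(pwm, word))

def scan(pwm, seq, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    width, hits = len(pwm), []
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width]
        for strand, w in (("+", word), ("-", revcomp(word))):
            s = score(pwm, w)
            if s >= threshold:
                hits.append((i, strand, round(s, 2)))
    return hits
```

Counting `len(scan(...))` per peak sequence gives the motifs-per-peak figure. Collecting hit positions relative to each peak summit gives the motif profile, and rebuilding the matrix from the hits with `build_pwm` is one simple way to refine a PWM.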

(4) DIFFERENTIAL ENRICHMENT


DESeq, GenomicRanges (Bioconductor)

defining regions of interest (ROI)
obtaining counts per region of interest (replicates & conditions)
estimating library sizes
estimating variation of counts per ROI
calling differentially modified regions (negative binomial distribution)
overview of significantly modified regions
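
The library-size and variation steps can be illustrated with a small Python sketch. It uses the median-of-ratios idea for size factors and a method-of-moments dispersion based on the negative binomial relation var = mu + alpha * mu^2. It is not the DESeq code and does not perform the negative binomial test itself. Function names, the input layout, and the pseudocount are assumptions.

```python
import math
from statistics import mean, median, variance

def size_factors(counts):
    """counts: {sample: [count per ROI]}.
    Median-of-ratios estimate of relative library size per sample."""
    samples = list(counts)
    n_roi = len(counts[samples[0]])
    # Log geometric mean per ROI, using only ROIs with non-zero counts everywhere
    log_geo = {}
    for i in range(n_roi):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo[i] = mean([math.log(v) for v in values])
    return {s: math.exp(median([math.log(counts[s][i]) - g
                                for i, g in log_geo.items()]))
            for s in samples}

def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {s: [c / factors[s] for c in counts[s]] for s in counts}

def log2_fold_change(norm, group_a, group_b, pseudo=0.5):
    """Per-ROI log2(mean of group_b / mean of group_a) on normalized counts."""
    n_roi = len(norm[group_a[0]])
    return [math.log2((mean([norm[s][i] for s in group_b]) + pseudo) /
                      (mean([norm[s][i] for s in group_a]) + pseudo))
            for i in range(n_roi)]

def nb_dispersion(values):
    """Method-of-moments alpha from var = mu + alpha * mu^2, floored at 0.
    Needs at least two replicate values."""
    mu, var = mean(values), variance(values)
    return max(0.0, (var - mu) / mu**2) if mu > 0 else 0.0
```

With only two or three replicates the per-ROI dispersion is very noisy, which is why dedicated packages share information across regions before testing.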

http://www.ebi.ac.uk/~schwalie/chipseqprac_0311/chipseq_practical.pdf

(1) PEAK-CALLING

PEAK ANALYSIS

(3) MOTIF ANALYSIS

[Figure panels: motif discovery; motif profile; motifs/peaks. Legend: MACS, SWEMBL]

(4) DIFFERENTIAL HISTONE MODIFICATION

CHIP-SEQ REVIEWS + BENCHMARKS


ChIP-seq: advantages and challenges of a maturing technology (Park, Nat Rev Genet 2009)
Computation for ChIP-seq and RNA-seq studies (Pepke et al, Nat Methods 2009)
Design and analysis of ChIP-seq experiments for DNA-binding proteins (Kharchenko et al, Nat Biotechnol 2008)
Q&A: ChIP-seq technologies and the study of gene regulation (Liu et al, BMC Biol 2010)

Evaluation of algorithm performance in ChIP-seq peak detection (Wilbanks & Facciotti, PLoS ONE 2010)
A practical comparison of methods for detecting transcription factor binding sites in ChIP-seq experiments (Laajala et al, BMC Genomics 2009)
