Terrence Furey, Duke University - “A Genome-Wide Open Chromatin Map in Human Cell Types in the ENCODE Project”
2007: Scale up from 1% to 100%
Where are all of the regulatory elements in the genome? The goal is a parts list of all functional elements.
We now know: 53% unique, 45% repetitive, 2% genes. Somehow, the 98% controls the other 2%.
Focussed on regions of open chromatin. Open chromatin is not bound to nucleosomes.
5. locus control regions
6. meiotic recombination hotspots
Use two assays. The first is DNase hypersensitivity: used at single sites in the past, now used for high-throughput genome-wide assays. The second method is FAIRE: formaldehyde-assisted isolation of regulatory elements. It resembles ChIP-Seq. [I don't know why they call it FAIRE... it looks exactly like a ChIP experiment – I must be missing something.]
Also explained what ChIP-Seq/ChIP-chip is. They now do ChIP-Seq. Align sequences with MAQ. Filter on the number of aligned locations (keep reads with up to 4 alignments). Use F-Seq to produce a continuous-value signal, then call peaks with a threshold.
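A minimal sketch of the two filtering steps mentioned here: dropping reads that align to too many places, then thresholding a continuous signal into peaks. The tuple layout for alignments, the function names, and the toy threshold are all assumptions for illustration; this is not MAQ's or F-Seq's actual interface, and only the cutoff of 4 alignments comes from the talk.

```python
from collections import Counter

MAX_ALIGNMENTS = 4  # from the talk: keep reads with up to 4 aligned locations


def filter_reads(alignments, max_alignments=MAX_ALIGNMENTS):
    """Keep alignments of reads that map to at most max_alignments places.

    alignments: list of (read_id, chrom, pos) tuples, one per aligned
    location (a hypothetical layout, not MAQ's output format).
    """
    hits = Counter(read_id for read_id, _, _ in alignments)
    return [a for a in alignments if hits[a[0]] <= max_alignments]


def call_peaks(signal, threshold):
    """Return half-open (start, end) index ranges where signal >= threshold."""
    peaks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                      # a peak opens here
        elif value < threshold and start is not None:
            peaks.append((start, i))       # the peak closed just before i
            start = None
    if start is not None:                  # a peak runs to the end of the signal
        peaks.append((start, len(signal)))
    return peaks
```

For example, `call_peaks([0, 1, 3, 4, 1, 0, 5, 5], 3)` gives two regions, one covering indices 2 to 3 and one covering indices 6 to 7.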
The program is F-Seq, created by Alan Boyle (Boyle et al., Bioinformatics 2008). It outputs BED and WIG formats. It also deals with alignability “ploidy”. They use mappability to calculate smoothing.
[This all sounds familiar, somehow... yet I've never heard of F-Seq. I'm going to have to look this up!]
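To make the "continuous value signal" idea concrete, here is a generic Gaussian kernel density estimate over read positions, written out as a fixedStep WIG track. This is a sketch under assumptions, not F-Seq's code: the bandwidth value, the function names, and the choice of a Gaussian kernel are illustrative, and the mappability adjustment mentioned in the talk is not reproduced.

```python
import math


def density_signal(positions, length, bandwidth=2.0):
    """Gaussian kernel density estimate at each base 0..length-1.

    positions: read positions on one chromosome (0-based).
    bandwidth: kernel standard deviation in bases (an assumed value).
    """
    if not positions:
        return [0.0] * length
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2.0 * math.pi))
    signal = []
    for x in range(length):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        signal.append(norm * total)
    return signal


def to_wig(chrom, signal):
    """Format a per-base signal as a fixedStep WIG block (1-based start)."""
    lines = [f"fixedStep chrom={chrom} start=1 step=1"]
    lines += [f"{value:.6f}" for value in signal]
    return "\n".join(lines)
```

One value per base is what makes such tracks large, which is why a real tool would likely skip or compress long zero-signal stretches.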
Claim: you need normalization to do proper peak calling. Normalization can also be applied if you know the regions of duplication.
[as I think about it, continuous read signals must create MASSIVE wig files. I would think that would be an issue.]
Peak calling validation: ROC analysis, with false positives along the horizontal axis and true positives on the vertical axis. Showed that ChIP-Seq and ChIP-chip arrays have very high concordance.
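For reference, a ROC curve of this kind can be built by sweeping the calling threshold from high to low and recording the false positive and true positive rates at each step. The sketch below assumes each candidate region has a score and a known true/false label; the function names are illustrative, and how the speakers actually defined their truth set was not covered in these notes.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) points for a ROC curve.

    scores: signal value per candidate region.
    labels: True where the region is a known positive.
    """
    pos = sum(1 for flag in labels if flag)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_pos) in enumerate(ranked):
        if is_pos:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all regions tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A curve that hugs the top-left corner (area close to 1) means the threshold separates true regions from false ones well.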
DNase I HS – 72 million sequences, 149,000 regions, 58.5 Mb – 2.0%
FAIRE – 70 million sequences, 147,000 regions, 53 Mb – 1.8%
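A quick arithmetic check on these figures, using only the numbers quoted above: dividing the covered bases by the stated percentage gives the denominator each percentage implies, and dividing by the region count gives the mean region width. The function and variable names are made up for the example, and the inputs are rounded as reported, so the outputs are approximate.

```python
def implied_genome_mb(covered_mb, percent):
    """Total size (Mb) implied by 'covered_mb is percent% of the total'."""
    return covered_mb / (percent / 100.0)


def mean_region_bp(covered_mb, n_regions):
    """Average region width in bases."""
    return covered_mb * 1e6 / n_regions


# (covered Mb, number of regions, percent) as quoted in the talk
assays = {
    "DNase I HS": (58.5, 149_000, 2.0),
    "FAIRE": (53.0, 147_000, 1.8),
}

summary = {
    name: (implied_genome_mb(mb, pct), mean_region_bp(mb, n))
    for name, (mb, n, pct) in assays.items()
}

for name, (genome_mb, width_bp) in summary.items():
    print(f"{name}: implied total {genome_mb:.0f} Mb, mean region {width_bp:.0f} bp")
```

Both percentages imply a denominator a little under 3,000 Mb, consistent with the percentages being fractions of the whole genome, and both assays give regions averaging a few hundred bases wide.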
Compare them, and you see the peaks in one correspond with the peaks in the other. Not exact, but similar. Very good coverage by FAIRE of the DNase peaks. Not as good the other way, but close.
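The asymmetry described here (one peak set covering the other better than the reverse) can be measured as the fraction of peaks in one set that overlap any peak in the other, computed in both directions. A minimal sketch, assuming peaks on a single chromosome given as half-open (start, end) pairs; the function name and the toy intervals are invented for illustration and are not the talk's data.

```python
from bisect import bisect_left


def fraction_overlapped(query, reference):
    """Fraction of query intervals overlapping at least one reference interval.

    Intervals are half-open (start, end) pairs on the same chromosome.
    """
    if not query:
        return 0.0
    ref = sorted(reference)
    starts = [start for start, _ in ref]
    # max_end[k] = largest end among the first k+1 reference intervals
    max_end = []
    running = float("-inf")
    for _, end in ref:
        running = max(running, end)
        max_end.append(running)
    hits = 0
    for start, end in query:
        k = bisect_left(starts, end)   # reference intervals starting before `end`
        if k > 0 and max_end[k - 1] > start:
            hits += 1
    return hits / len(query)


# Toy intervals, invented for illustration only.
dnase_peaks = [(100, 200), (300, 400), (500, 600)]
faire_peaks = [(150, 250), (350, 450), (550, 650), (900, 950)]

dnase_covered_by_faire = fraction_overlapped(dnase_peaks, faire_peaks)
faire_covered_by_dnase = fraction_overlapped(faire_peaks, dnase_peaks)
```

In this toy case every DNase peak is hit by a FAIRE peak, while one FAIRE peak has no DNase partner, so the two fractions differ.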
The goal is for the project to be done on a huge list of cells (92 types?? 20 cell lines now, adding 50 to 60 more, including different locations in the body, disease, cells exposed to different agents, etc.). RNA is tissue specific, so that changes what you'll see.
Using DNase and FAIRE assays to define an open chromatin map,
exploring many cell types,
discovery of ubiquitous and cell-specific elements.
Note: Data will be made available as quickly as possible, in the next month or two, but may not be used for publication for the first 9 months.
Labels: AGBT 2009