fejes.ca: Questions about controls.

After my post on controls the other day, I discovered I'd left a few details out of my blog entry. It was especially easy to come to that conclusion, since Aaron pointed it out in a comment below. Anyhow, since his questions were right on, I figured that they deserved a post of their own, not just a quick reply.

There were two parts to the questions, so I'll break up the reply into two parts. First:

One question - what exactly are you referring to as 'control' data? For chip-seq, sequencing the "input" ie non-enriched chromatin of your IPs makes sense, but does doubling the amount of sequencing you perform (therefore cost & analysis time) give enough of a return? eg in my first analysis we are using a normal and cancer cell line, and looking for differences between the two.

Control data, in the context I was using it, should be experiment specific. Non-enriched chromatin is probably going to be the most common control for experiments with transcription factors, or possibly even histone modifications. Since you're most interested in the enrichment, a non-enriched control makes sense. At the GSC, we've had many conversations about what the appropriate controls are for any given experiment, and our answers have ranged from genomic DNA or non-enriched DNA all the way to stimulated/unstimulated sample pairs. The underlying premise is that the control is always the one that makes sense for the given experiment.

On that note, I suspect it's probably worth rehashing the Robertson 2007 paper on stimulated vs. unstimulated IFN-gamma STAT1 transcription markers... hrm..

Anyhow, the second half of that question boils down to "is it really worth it to spend the money on controls?"

YES! Unequivocally YES!

Without a control, you'll never be able to get a good handle on the actual statistics behind your experiment, and you'll never be able to remove bias in your samples (bias from the sequencing itself, and yes, it is very much present). In the tests I've been running on the latest FP code (3.3.3.1), I've been seeing what I think are incredibly promising results. If it cost 5 times as much to do controls, I'd still be pushing for them. (Of course, it's actually not my money, so take that for what it's worth.)
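
To make the bias point concrete, here is a minimal sketch of one common way a control can be used: scale the control coverage to the sample's sequencing depth and take the ratio at a region. This is an illustration only, not how FindPeaks does it; the function name, the pseudocount, and the example numbers are all assumptions.

```python
def fold_enrichment(sample_height, control_height,
                    sample_reads, control_reads, pseudocount=1.0):
    """Depth-scaled ratio of sample coverage to control coverage at one region.

    Hypothetical helper: the pseudocount avoids division by zero where the
    control has no coverage. Real tools use more careful statistics.
    """
    if sample_reads <= 0 or control_reads <= 0:
        raise ValueError("read totals must be positive")
    # Scale the control to the sample's depth so the two are comparable.
    scale = sample_reads / control_reads
    expected = control_height * scale + pseudocount
    return (sample_height + pseudocount) / expected


# Same sample peak height, two different control backgrounds (made-up numbers).
quiet_background = fold_enrichment(40, 4, 10_000_000, 5_000_000)
noisy_background = fold_enrichment(40, 18, 10_000_000, 5_000_000)
print(quiet_background > noisy_background)  # the peak over a noisy background is far less convincing
```

Without the control column, both regions would look like identical peaks of height 40.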

In any case, there's a good light at the end of the cost tunnel on this. Although yes, it does appear that the better the control, the better the results will be, we've also been able to "re-use" controls. So, in fact, having one good set of controls per species already seems to make a big contribution. Thus, controls should be amortizable over the life of the sequencing data, so the cost should not be anywhere near double for large sequencing centres. (I don't have hard data to back this up yet, but it is generally the experience I've had so far.)
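
To put rough numbers on the amortization argument, here is a back-of-the-envelope sketch. It assumes every sequencing run costs the same and that one control run can be shared across several experiments; the function name and the figures are hypothetical, not GSC data.

```python
def relative_cost(n_experiments: int, n_controls: int = 1) -> float:
    """Total runs (samples + controls) divided by sample runs alone.

    Assumes each run costs the same, which is a simplification.
    """
    if n_experiments < 1:
        raise ValueError("need at least one experiment")
    return (n_experiments + n_controls) / n_experiments


# A dedicated control for a single experiment doubles the cost...
print(relative_cost(1))   # 2.0
# ...but one control shared by ten experiments adds only 10%.
print(relative_cost(10))  # 1.1
```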

To get to the second part of the questions:

Also, what controls can you use for WTSS? Or do you just mean comparing two groups eg cancer vs normal?

That question is similar to the one above, and so is my answer: it depends on the question you're asking. If you want to use WTSS to look for differences between samples (e.g., using FP 3.3.3.1's compare function), then the sample you compare against becomes your "control". I've been working with this pipeline for a few weeks now and have been incredibly impressed with how well it works on my own (VERY old 32bp) data sets.

On the other hand, if you're actually interested in discovering new exons and new genes using this approach, you'd want to use a whole genome shotgun as your control.

As with all science, there's no single right answer for which protocol you need to use until you've decided what question you want to ask.

And finally, there's one last important part of this equation - have you sequenced either your control or your sample to enough depth? Even if you have controls working well, it's absolutely key to make sure your data can answer your question well. So many questions, so little time!

Labels: Chip-Seq, FindPeaks, Vancouver Short Read Analysis Package

4 Comments:

David Dooling said...: It still seems to me that "what is the proper control in an RNA-seq experiment?" is an open question. DNA does not really cut it in all cases and there is no such thing as a "normal".

Also, if you are looking for variants, then validation is likely a better strategy than a control.; May 19, 2009 6:23:00 AM PDT
Anthony Fejes said...: Hi David,

You're absolutely correct on several points - calling variants doesn't require a control, and DNA wouldn't cut it for many types of RNA-Seq experiments. However, I think my main point stands: if you're designing an experiment, you'll need to pick the appropriate control for it, which people have been entirely skipping for ChIP-Seq.

At any rate, the RNA-Seq experiments I've been doing, where I'm identifying splice variations and alternative expression patterns, DO require a good control - and so my comments in the post above should probably be read in that context. Sorry, I should probably have been clearer about the goals of the RNA-Seq I was discussing above.

Cheers!; May 19, 2009 8:47:00 AM PDT
Nicolas said...: Hi Anthony,

Don't you think that controls used for microarray (expression and ChIP-chip) are well established and that we could use these controls with NGS?

Cheers!; May 25, 2009 7:37:00 AM PDT
Anthony Fejes said...: Hi Nicolas,

I've put up a new article on this subject:

link; May 25, 2009 10:02:00 AM PDT