Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Thursday, June 26, 2008

ChIP-Seq revisited

In the world of ChIP-Seq, things don't seem to slow down. A collaborator of mine pointed out a new application called MACS, which is yet another peak finder, written in Python as an open source project. That makes two open source peak finders that I'm aware of: USeq and now MACS.

The interesting thing, to me, is the maturity of the code (in terms of features implemented). In neither case is it all that great: both are missing features I consider relatively basic, and both use relatively naive algorithms for peak detection. Though, I suppose I've been working with FindPeaks long enough that nearly everything else will seem relatively basic in comparison.

However, I'll come back to a few more FP related things in a moment. I wanted to jump to another ChIP-Seq related item that I'd noticed this week. The Wold lab merged their Peak Finder software into a larger development package for Genomic and Transcriptome work, which I think they're calling ERANGE. I've long argued that the Peak Finding tools are really just a subset of the whole Illumina tool-set required, and it's nice to see other people doing this.

This is the development model I've been using, though I don't know if the Wold lab does exactly the same thing. The high-level organization uses a core library set and a core object set, and then FindPeaks and other projects just sit on top, using those shared layers. It's a reasonably efficient model.

And, in a blog post a while ago, I mentioned that I'd made a huge number of changes to my code after coming across the tool called "Enerjy". I sat down to figure out how many lines were changed in the last two weeks: 26,000+ lines of code, comments and javadoc. That's a startling figure, since my entire code base (grep -r " " * | wc -l) is only 22,884 lines, of which 15,022 contain semi-colons.
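For anyone who wants to run the same kind of count on their own code, here is a minimal sketch in Python. It only approximates the grep one-liner above: the function name count_lines is made up for illustration, and unlike the shell glob it also walks hidden files and directories, so the totals may differ slightly.

```python
import os


def count_lines(root):
    """Return (lines containing a space, how many of those also contain ';')."""
    with_space = 0
    with_semicolon = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files from
                # aborting the count.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for line in handle:
                        if " " in line:
                            with_space += 1
                            # Counted only among lines that contain a space,
                            # mirroring the "of which" in the figures above.
                            if ";" in line:
                                with_semicolon += 1
            except OSError:
                # Skip unreadable files instead of failing the whole walk.
                continue
    return with_space, with_semicolon
```

Calling count_lines(".") from the top of a source tree returns the two totals as a tuple. Lines containing a semi-colon are only a rough proxy for Java statements, since comments and strings can contain them too.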

Anyhow, I have several plans for the next couple of days:
  1. Move my SVN repository somewhere other people can work on it as well, rather than keeping it restricted to GSC developers.
  2. Improve the threading I've got going.
  3. Clean up the documentation, where possible.
  4. Work on the Adaptive mode code.

Hopefully, that'll clean things up a bit.

Back to FindPeaks itself, the latest news is that my Application Note in Bioinformatics has been accepted. Actually, it was accepted about a week ago, but I'm still waiting to see it in the Advance Access section. Hopefully it won't be much longer. I also have a textbook chapter on ChIP-Seq coming out relatively soon (I'm absolutely honoured to have been given that opportunity!), assuming I can get my changes done by Monday.

I don't think that'll be a problem.


3 Comments:

Blogger William said...

I've been tooling around with data from the ENCODE project here in the States, which is trying to do ChIP-seq and other assays like DNase-seq on a pretty large scale. We've been making pretty heavy use of FindPeaks, but I was wondering about your thoughts on comparing it to USeq, specifically. To my knowledge, FindPeaks doesn't currently accept a 'mock' or control data set against which to compare enrichment and significance. I'd love to know your thoughts on this.

Your blog has been an excellent resource. Please keep it up!

July 23, 2008 7:49:00 AM PDT  
Blogger Anthony said...

Hi William,

Thanks for the reply, and the feedback! I'm really glad to hear that FindPeaks is being used out in the "wild." Nothing pleases a software developer more than hearing that their tools are being used. (=

I think your questions are important enough that I'll write a blog entry about them today or tomorrow. I have been thinking about controls since last night, when someone else asked me the same question. Some things need to be discussed at the GSC before FP4 gets committed to anything.

Anthony

July 23, 2008 8:48:00 AM PDT  
Blogger William said...

Thanks for the amazingly quick response! I have some (limited) experience in bioinformatic software development and I know exactly what you mean. I look forward to your next entry.

July 23, 2008 8:51:00 AM PDT  
