Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Friday, March 28, 2008

Chip-Seq on a laptop.

Every once in a while, I get an interesting email or comment from someone, which is worthy of a blog post. I woke up this morning to a comment left on my last post:

What kind of computing infrastructure do you need to be able to handle chipSEQ datasets? I'm guessing my standard IBM T60 laptop is not going to do the trick - but what does?


So yes, I do check my email when I get up... but that's not the point. Moving along...

The simple answer is, your laptop probably can't run a chip-Seq data set with the currently available programs... but it's probably not far off. My laptop has 2Gb of RAM (after tossing two new 1Gb sticks in last week), and a dual core AMD, running at 2GHz. Ironically, it's not far from what we do use to run FindPeaks: A quadcore 2.2GHz opteron with 8Gb of RAM.

Of course, FindPeaks doesn't take up the full resources of the machine. (Though I've seen it take up to 12CPUs on one of our fancier servers.) Generally, I only allocate 4Gb of RAM to the process, which should be more than enough for several millions or tens of millions of reads. The reason it takes so much ram? Because I don't drop any of the information being held that's contained in the Eland file, and because I need to retain all of the reads in memory to do a sort.

What does it need? Generally, FindPeaks only needs the start, end and direction of each read to do the basics, however, I don't throw away any of the other information collected, in case we want to use it later. If I did that, or if I pre-sorted the reads, I could probably drop the memory requirements down by an order of magnitude or more. (No one's asked me to do this, yet, however.) Is there anyone out there who needs a laptop-runnable version of FindPeaks?

This is somewhat timely, though. For the last few days I've been playing with a companion piece of software for FindPeaks which generates UCSC-compatible BED tracks from Eland data where I've adopted a minimalist approach. It takes about the same amount of time, and runs in under 300m of RAM. That, clearly, should run on anyone's laptop.

Labels: , , ,

0 Comments:

Post a Comment

<< Home