Thanks for visiting my blog. I have now moved to a new location at Nature Networks (URL: http://blogs.nature.com/fejes), so please come visit my blog there.

Monday, September 28, 2009

Recursive MC solution to a simple problem...

I'm trying to find a balance between writing and experiments/coding. You can't do both at the same time without going nuts, in my humble opinion, so I've come up with a plan of alternating days: one day of FindPeaks work, one day on my project. At that rate, I may not give the fastest responses (yes, I have a few emails waiting), but it should keep me sane and help me graduate in a reasonable amount of time. (For those of you waiting, tomorrow is FindPeaks day.)

That left today to work on the paper I'm putting together. Unfortunately, working on the paper doesn't mean I don't have any coding to do. I had a nice simulation that I needed to run: given the data sets I have, what are the likely overlaps I would expect?

Of course, I hate solving a problem once - I'd rather solve the general case and then plug in the particulars.

Today's problem can be summed up as: "Given n data sets, where data set k contains i_k genes, what is the expected number of genes common to each possible overlap of 2 or more data sets?"
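If each data set is treated as an independent, uniform random sample from a common pool of N genes (N, the pool size, is an assumption; it isn't stated in the problem above), the expected overlap also has a closed form, which makes a handy sanity check on any simulation. Writing i_k for the number of genes in data set k and D_k for the data set itself (hypothetical notation), a given gene lands in D_k with probability i_k/N, independently across data sets, so by linearity of expectation the expected number of genes shared by a combination S of data sets is:

```latex
E\left[\,\Bigl|\bigcap_{k \in S} D_k\Bigr|\,\right] = N \prod_{k \in S} \frac{i_k}{N}, \qquad |S| \ge 2
```

For example, with hypothetical sizes of 500 and 800 genes drawn from a pool of 20,000, the expected pairwise overlap is 20000 × (500/20000) × (800/20000) = 20 genes. A simulation is still the more flexible route if the real sampling model is less simple than this.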

My solution, after thinking about the problem for a while, was to use recursion. Admittedly, I hadn't written recursive code in years, so I was a little hesitant to try it. Nevertheless, I whipped up the code, ran it, and it worked the first time. (That's sometimes a rarity with my code: I'm a really good debugger, but I can be sloppy when writing code quickly the first time.) Best of all, the code is extensible: if I have more data sets later, I can just add them in and re-run. No code modification is needed beyond changing the data. (Yes, I was sloppy and hard-coded the data, though it would be trivial to read it from a data file if someone wants to re-use this code.)
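The original code lives in the SVN repository and isn't reproduced here, so what follows is only a minimal Python sketch of the same idea, not the actual implementation: a recursive walk over every combination of two or more data sets, wrapped in a Monte Carlo loop that averages the overlap sizes over many random draws. It assumes each data set is a uniform random sample, without replacement, from a common pool of genes; the function names, pool size and data set sizes are all hypothetical.

```python
import random


def enumerate_overlaps(sets, start=0, chosen=(), current=None, results=None):
    """Recursively visit every combination of data sets (by index) and record
    the intersection size for each combination of 2 or more sets."""
    if results is None:
        results = {}
    for k in range(start, len(sets)):
        # Intersect the running overlap with data set k (or start a new one).
        overlap = sets[k] if current is None else current & sets[k]
        combo = chosen + (k,)
        if len(combo) >= 2:
            results[combo] = len(overlap)
        # Recurse, only adding data sets with higher indices so that each
        # combination is visited exactly once.
        enumerate_overlaps(sets, k + 1, combo, overlap, results)
    return results


def simulate_expected_overlaps(sizes, universe_size, trials=1000, seed=None):
    """Monte Carlo estimate of the mean overlap for every combination of
    2 or more data sets. Assumes (hypothetically) that each data set is an
    independent uniform random sample, without replacement, from
    universe_size genes."""
    rng = random.Random(seed)
    universe = range(universe_size)
    totals = {}
    for _ in range(trials):
        # Draw one random version of every data set.
        sets = [set(rng.sample(universe, size)) for size in sizes]
        for combo, count in enumerate_overlaps(sets).items():
            totals[combo] = totals.get(combo, 0) + count
    # Average the overlap counts over all trials.
    return {combo: total / trials for combo, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical sizes and pool size; substitute the real data sets here.
    sizes = [500, 800, 1200]
    expected = simulate_expected_overlaps(sizes, universe_size=20000,
                                          trials=2000, seed=1)
    for combo, mean in sorted(expected.items()):
        print(combo, round(mean, 2))
```

Adding another data set only means appending its size to `sizes`; the recursion then covers all 2^n - n - 1 combinations of two or more sets without any other change.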

Anyhow, it turned out to be an elegant solution to a rather complex problem, and I was happy to see that the results for the real experiment stick out like a sore thumb: they're far greater than what random chance would produce.
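One common way to put a number on "far greater than random chance" is an empirical p-value: the fraction of simulated overlaps that are at least as large as the observed one. The sketch below does this for a single pair of data sets under the same uniform-sampling assumption; the function names, sizes and the observed count of 60 are hypothetical, and this isn't necessarily how the comparison was done for the paper.

```python
import random


def empirical_p_value(observed, simulated):
    """Fraction of simulated counts at least as large as the observed count.
    The +1 in numerator and denominator keeps the estimate from being 0."""
    exceed = sum(1 for value in simulated if value >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def simulate_pair_overlaps(size_a, size_b, universe_size, trials, seed=None):
    """Null distribution of the overlap between two random data sets drawn
    uniformly, without replacement, from universe_size genes."""
    rng = random.Random(seed)
    universe = range(universe_size)
    counts = []
    for _ in range(trials):
        a = set(rng.sample(universe, size_a))
        b = set(rng.sample(universe, size_b))
        counts.append(len(a & b))
    return counts


if __name__ == "__main__":
    # Hypothetical numbers, not the real data from the experiment.
    null_counts = simulate_pair_overlaps(500, 800, 20000, trials=1000, seed=7)
    observed = 60  # hypothetical observed overlap
    print(empirical_p_value(observed, null_counts))
```

If the result comes out as 1/(trials + 1), no simulated trial reached the observed value, and more trials would be needed to resolve a smaller p-value.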

If anyone is interested in seeing the code, it has been uploaded to the Vancouver Short Read Analysis Package SVN repository: here. (I doubt it'll get many page views, but what the heck, it's open source anyhow.)

I love it when code works properly - and I love it even more when it works properly the first time.

All in all, I'd say it's been a good day, not even counting the 2 hours I spent at the fencing club. En garde! (-;
