Recursive MC solution to a simple problem...
That left today to work on the paper I'm putting together. Unfortunately, working on the paper doesn't mean I don't have any coding to do. I had a nice simulation that I needed to run: given the data sets I have, what are the likely overlaps I would expect?
Of course, I hate solving a problem once - I'd rather solve the general case and then plug in the particulars.
Today's problem can be summed up as: "Given n data sets, where data set k contains i_k genes, what is the expected number of genes common to each possible overlap of 2 or more data sets?"
My solution, after thinking about the problem for a while, was to use recursion. I haven't written recursive code in years, so, not surprisingly, I was a little hesitant to give it a shot. Even so, I whipped up the code and tried it - and it worked the first time. (That's something of a rarity with my code - I'm a really good debugger, but I can often be sloppy when writing code quickly the first time.) Best of all, the code is extensible - if I have more data sets later, I can just add them in and re-run. No code modification needed beyond changing the data. (Yes, I was sloppy and hard-coded the data, though it would be trivial to read it from a data file, if someone wants to re-use this code.)
Anyhow, it turned out to be an elegant solution to a rather complex problem - and I was happy to see that the results I have for the real experiment stick out like a sore thumb: the observed overlap is far greater than random chance would produce.
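For anyone who wants a feel for the approach without digging through the repository, here is a minimal sketch in Python. It is not the code from the package, just one way a recursive Monte Carlo estimate like this could be written. It assumes every data set is a uniform random draw, without replacement, from a shared pool of genes. The pool size, the function names (count_overlaps, expected_overlaps) and all parameter names are hypothetical and chosen only for illustration.

```python
import random


def count_overlaps(draws, names, start, chosen, shared, totals):
    """Recursively visit every combination of data sets, carrying the
    running intersection down the recursion, and add the overlap size
    to totals for each combination of 2 or more data sets."""
    for i in range(start, len(names)):
        combo = chosen + (names[i],)
        genes = draws[names[i]]
        inter = genes if shared is None else shared & genes
        if len(combo) >= 2:
            totals[combo] = totals.get(combo, 0) + len(inter)
        # Extend the current combination with the remaining data sets.
        count_overlaps(draws, names, i + 1, combo, inter, totals)


def expected_overlaps(sizes, pool_size, trials=1000, seed=0):
    """Estimate the mean overlap for each combination of data sets.

    sizes     -- dict mapping data set name to its number of genes
    pool_size -- ASSUMPTION: total number of genes that could be drawn
    """
    rng = random.Random(seed)
    names = sorted(sizes)
    totals = {}
    for _ in range(trials):
        # One random gene set per data set, drawn without replacement.
        draws = {
            name: set(rng.sample(range(pool_size), sizes[name]))
            for name in names
        }
        count_overlaps(draws, names, 0, (), None, totals)
    return {combo: total / trials for combo, total in totals.items()}


if __name__ == "__main__":
    # Made-up sizes, purely for illustration.
    example = {"A": 200, "B": 300, "C": 150}
    for combo, mean in sorted(expected_overlaps(example, 1000).items()):
        print(" & ".join(combo), round(mean, 2))
```

Adding another data set only means adding another entry to the sizes dictionary; the recursion picks up the extra combinations on its own. Under the uniform-draw assumption, the estimate for two data sets should land close to (size of A x size of B) / pool size, which makes for a handy sanity check.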
If anyone is interested in seeing the code, it was uploaded into the Vancouver Short Read Analysis Package svn repository: here. (I'm doubting the number of page views that'll get, but what the heck, it's open source anyhow.)
I love it when code works properly - and I love it even more when it works properly the first time.
All in all, I'd say it's been a good day, not even counting the 2 hours I spent at the fencing club. En garde! (-;