Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Thursday, August 13, 2009

Ridiculous Bioinformatics

I think I've finally figured out why bioinformatics is so ridiculous. It took me a while to figure this one out, and I'm still not sure if I believe it, but let me explain to you and see what you think.

The major problem is that bioinformatics isn't a single field, rather, it's the combination of (on a good day) biology and computer science. Each field on it's own is a complete subject that can take years to master. You have to respect the biologist who can rattle off the biochemicals pathway chart and then extrapolate that to the annotations of a genome to find interesting features of a new organism. Likewise, theres some serious respect due to the programmer who can optimize code down at the assembly level to give you incredible speed while still using half the amount of memory you initially expected to use. It's pretty rare to find someone capable of both, although I know a few who can pull it off.

Of course, each field on it's own has some "fudge factors" working against you in your quest for simplicity.

Biologists don't actually know the mechanisms and chemistry of all the enzymes they deal with - they are usually putting forward their best guesses, which lead them to new discoveries. Biology can effectively be summed us as "reverse engineering the living part of the universe", and we're far from having all the details worked out.

Computer Science, on the other hand, has an astounding amount of complexity layered over every task, with a plethora of languages and system, each with their own "gotchas" (are your arrays zero based or 1 based? how does your operating system handle wild cards at the command line? what does your text editor do to gene names like "Sep9") leading to absolute confusion for the novice programmer.

In a similar manner, we can also think about probabilities of encountering these pitfalls. If you have two independent events, and each of which has a distinct probability attached, you can multiply the probabilities to determine the likelihood of both events occurring simultaneously.

So, after all that, I'd like to propose "Fejes' law of interdisciplinary research"

The likelihood of achieving flawless work in an interdisciplinary research project is the product of the likelihood of achieving flawless work in each independent area.


That is to say, that if your biology experiments (on average) are free of mistakes 85% of the time, and your programming is free of bugs 90% of the time. (eg, you get the right answers), your likely hood of getting the right answer in a bioinformatics project is:
Fp = Flawless work in Programming
Fb = Flawless work in Biology
Fbp = Flawless work in Bioinformatics

Thus, according to Fejes' law:
Fb x Fp = Fbp

and the example given:
0.90 x 0.85 = 0.765

Thus, even an outstanding programmer and bioinformatician will struggle to get an extremely high rate of flawless results.

Fortunately, there's one saving grace to all of this: The magnitude of the errors is not taken into account. If the bug in the code is tiny, and has no impact on the conclusion, then that's hardly earth shattering, or if the biology measurements have just a small margin of error, it's not going to change the interpretation.

So there you have it, bioinformticians. if i haven't just scared you off of ever publishing anything again, you now know what you need to do...

Unit tests, anyone?

Labels: , , , ,

1 Comments:

Anonymous Steve said...

This post has been removed by a blog administrator.

August 14, 2009 6:59:00 AM PDT  

Post a Comment

<< Home