Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Thursday, August 7, 2008

MS Word formatted resumes?

I just took a look at Bioinformatics Solutions Inc's web page. They're the makers of the upcoming ZOOM (Zillions of Oligos Mapped) aligner software. Rumour says it's supposed to perform well. If I understood correctly, they're applying some neat pattern matching algorithms from their Pattern Hunter software to do ultra-fast short read gapped alignments.

Anyhow, I saw their careers page (no, I'm not currently looking for a job) and thought it was a great example of mis-purposed document formats. Somehow, I'd expect a bioinformatics company to be a bit more tuned in to things like that. I also came to the realization that any software company that wants MS format docs instead of PDFs can't be a great place for a Linux tool-chain-based coder. (MS documents seem so 1990s... what does Google ask for?)

Considering I haven't even done my comprehensive exam yet, I guess I won't have to worry about that for a while.

And now, back to work.

[Update: Google is much more clued in: "PDF, HTML, or Microsoft Word documents or text formats are acceptable or you can submit using plain text format"]

7 Comments:

Anonymous said...

Note the trick: they are benchmarking ZOOM, which only performs ungapped alignment; they have a gapped version, ZOOM-I, but it is not evaluated. ZOOM-I seems to have quite a different algorithm and I do not think it can be anywhere close to ZOOM in speed. The ZOOM series does ultra-fast alignment and does gapped alignment, but not at the same time.
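
For readers outside the area, the ungapped/gapped distinction can be sketched in a few lines. This is only a toy illustration of the two scoring problems, not how ZOOM or ZOOM-I work internally; the function names and the example sequences are made up. An ungapped comparison counts mismatches position by position, while a gapped comparison also allows insertions and deletions, which needs a more expensive dynamic-programming pass.

```python
def mismatches(read: str, ref_window: str) -> int:
    """Ungapped comparison: count substituted bases, position by position."""
    if len(read) != len(ref_window):
        raise ValueError("ungapped comparison needs equal lengths")
    return sum(a != b for a, b in zip(read, ref_window))


def edit_distance(read: str, ref_window: str) -> int:
    """Gapped comparison: fewest substitutions, insertions and deletions."""
    prev = list(range(len(ref_window) + 1))
    for i, a in enumerate(read, start=1):
        curr = [i]
        for j, b in enumerate(ref_window, start=1):
            curr.append(min(
                prev[j] + 1,             # base in read with no partner in reference
                curr[j - 1] + 1,         # base in reference with no partner in read
                prev[j - 1] + (a != b),  # match or substitution
            ))
        prev = curr
    return prev[-1]


# Hypothetical example: the read lacks one base (the T at index 3) of the reference.
reference = "ACGTACGTAC"
read_with_deletion = "ACGACGTAC"

ungapped = mismatches(read_with_deletion, reference[:len(read_with_deletion)])
gapped = edit_distance(read_with_deletion, reference)
print(ungapped, gapped)
```

A single missing base shifts everything after it, so the ungapped count is large even though one gap explains the whole difference. That is why an ungapped speed figure says little about gapped performance.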

August 7, 2008 2:35:00 PM PDT  
Anthony said...

Very sneaky. Thanks for the heads-up.

August 7, 2008 2:37:00 PM PDT  
Anonymous said...

Me again. You can see these kinds of tricks here and there in the area of short read alignment. For example, benchmarking speed with a fast version while evaluating the accuracy with an accurate (yet slow) version; boasting about speed while hiding the insane memory consumption; benchmarking performance in situations that strongly favour their own software; comparing an aligner that only finds one best unique hit with another aligner that gives far more information.

It is not always obvious to a researcher outside the area how to see the tricks. It is a pity, sometimes.

August 7, 2008 3:10:00 PM PDT  
Anthony said...

I absolutely agree, but it's not just confined to short read assembly.

These problems exist both in quality assurance (which is often lacking) and bias (which is often not lacking) - whether intentional or not.

I wonder if you could get a publication by demonstrating bias in other people's publications, and/or filling in the missing pieces. You'd definitely make enemies, anyhow.

August 7, 2008 3:19:00 PM PDT  
Anonymous said...

Hi, anonymous.

Really nice that you've read the manuscript of our paper. It seems that we caused some misunderstanding. It's true we didn't show the speed of ZOOM-I in the paper. The focus of our paper is a general framework for fast filtering with full sensitivity, which may offer some help to researchers in this field. So we only benchmarked the speed of basic ZOOM, under the condition of the same accuracy as other software, such as ELAND. Then, to make the system complete, we added the part describing how confidence scores, insertions/deletions and paired-end reads are handled, but didn't show their performance.

In short, what we want to show in the paper is the superiority of our filtering strategy over existing strategies, rather than the superiority of the ZOOM software package. For your reference, the speed of ZOOM-C and ZOOM-P is comparable to basic ZOOM, while ZOOM-I is five times slower than basic ZOOM.

I agree with Anthony that those sneaky tricks you mentioned exist in many research areas. Unfortunately, this is a fact we face and must watch for carefully. But we sincerely apologize if our paper has caused any misunderstanding.

(And for anyone interested, our paper "ZOOM! Zillions of Oligos mapped" has been accepted by Bioinformatics, so I think the online version will be available soon).

spirit

August 11, 2008 11:55:00 AM PDT  
Anonymous said...

Actually what annoys me is the advertisement at ZOOM's website. It sounds like ZOOM does all the good things in one go, but in fact most of the functionality comes separately with different programs, which is dishonest and not useful. I know most commercial advertisements look like this and this is not your fault at all.

There is no doubt that you have done a great job and your publication is invaluable. However, I think you can make the benchmark better by addressing the following issues. Firstly, you did not quote the memory consumption of Eland. As far as I know, Eland uses only ~40% of the memory that ZOOM does. Memory is important, especially for parallelizing on multi-core machines. I bet if Anthony Cox had also meant to use more memory, he would have achieved faster speed. Secondly, it seems to me that in the benchmark you only make ZOOM output the best unique hits, while Eland will always count the alternative hits. If counting is cheap, I would suggest implementing it in ZOOM, as it is useful in some cases; if counting is expensive, then we cannot say ZOOM is faster, as the alignment results are different. Of course it is not necessary to make ZOOM output the same things as Eland, but properly addressing this issue would be appreciated.

Note that I am not saying you are hiding something. I just mean to say that the benchmark can be improved a little. I believe ZOOM is faster anyway; even if it were slower than Eland, I would still prefer ZOOM as it eliminates the 2 mismatches/32bp read length limits in Eland.
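
The second point (best unique hit versus counting alternative hits) can be illustrated with a toy sketch. The `Hit` record, both function names and the example coordinates are hypothetical, chosen only for illustration; this is not ELAND's or ZOOM's actual code or output format. It simply shows that the two reporting modes give different answers from the same candidate hits, so timing one against the other is not a like-for-like comparison.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    position: int    # hypothetical reference coordinate
    mismatches: int  # number of mismatches at that position


def best_unique_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the best hit only if no other hit ties with it."""
    if not hits:
        return None
    fewest = min(h.mismatches for h in hits)
    best = [h for h in hits if h.mismatches == fewest]
    return best[0] if len(best) == 1 else None


def hit_counts(hits: List[Hit], max_mismatches: int = 2) -> List[int]:
    """Count the hits found at 0, 1, ..., max_mismatches mismatches."""
    tally = Counter(h.mismatches for h in hits)
    return [tally[k] for k in range(max_mismatches + 1)]


# Made-up candidate hits for a single read.
hits = [Hit(1200, 1), Hit(58310, 2), Hit(90441, 2)]

unique = best_unique_hit(hits)  # one record, alternatives discarded
counts = hit_counts(hits)       # how many hits at each mismatch level
print(unique, counts)
```

Reporting the counts requires every candidate to be found and tallied, whereas reporting a best unique hit could in principle stop earlier. Whether that costs anything in practice depends on the aligner.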

August 12, 2008 7:36:00 AM PDT  
Anonymous said...

Thanks for your suggestion. We have added the memory consumption of ELAND on our website. Sorry for our carelessness. As for multiple hits of each read, ZOOM in fact does all the possible alignments without any heuristic tricks. Users can choose to output the uniquely mapped results or the best N mapped results for each read. The speed comparisons with ELAND were carried out under the condition that the alignment results are the same.

P.S. We have released ZOOM today. You're welcome to try it out. :)

spirit

August 20, 2008 3:13:00 PM PDT  
