
Tuesday, May 12, 2009

Quality vs Quantity

Today was an interesting day, for many reasons. The first was the afternoon tours for high-school students who came by the Genome Sciences Centre and the labs. I've been taking part in an outreach program for some of the students at two local high schools, which has involved visiting the students to teach them a bit of the biology and computing we do, as well as the tours that bring them by to see us "at work." Honestly, it's a lot of fun, and I really enjoy interacting with the kids. Their questions are always amusing and insightful - and are often a lot of fun to answer well. (How do you explain how the academic system works in 2 minutes or less?)

For my part, I introduced the kids to Pacific Biosciences' SMRT technology. I came up with a relatively slick monologue that goes well with a video from PacBio. (If you haven't seen their video, you should definitely check it out.) The kids seemed genuinely impressed with the concept, and really enjoyed the graphics - although they enjoy the desktop effects with Ubuntu too... so maybe that's not the best criterion to use for evaluation.

Anyhow, aside from that distraction, I've also had the pleasure of working on some of my older code today. After months of people at the GSC ignoring the fact that I'd already written code to solve many of the problems they were trying to develop software for, a few people have decided to pick up some of the pieces of the Vancouver Short Read Package and give them a test spin.

One of them was looking at FindFeatures - which I've used recently to find exons of interest in WTSS libraries - and the other at the PSNPAnalysiPipeline code - which does some neat metrics for WTSS.

The fun part of it is that the code for both of those applications was written months ago - in some cases before I had the data to test it on. When revisiting the applications and actually putting the code to use, I was really surprised by the number of options I'd tossed in to account for situations that hadn't even been seriously anticipated. Someone renamed all of your fasta files? No worries, just use the -prepend option! Your junction library has completely non-standard naming? No problem, just use the -override_mapname option! Some of your MAQ aligned reads have indels? Well, ok, I can give you a 1-line patch to make that work too.

I suppose that really makes me wonder: If I were writing one-off scripts, which would obviously lack this kind of flexibility, I'd be able to move faster and more nimbly across the topics that interest me. (Several other grad students do that, and are well published because of it.) Is that a trade-off I'm willing to make, though?

Someone really needs to hold a forum on this topic: "Grad students: quality or quantity?" I'd love to sit through those panel discussions. As for myself, I'm still somewhat undecided on the issue. I'd love more publications, but having the code just work (which gets harder and harder as the codebase hits 30k lines) is also a nice thing. While I'm sure users of my software are happy when these options exist, I wonder what my supervisor thinks of the months I've spent building all of these tools - and not writing papers.

Ah well, I suppose when it comes time to defend, I'll find out exactly what he thinks about that issue. :/
