Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Saturday, August 15, 2009

What would you do with 10kbp reads?

I just caught a tweet about an article on the Pathogens blog (What can you do with 1000 base pair reads?), which is specifically about 454 reads. Personally, I'm not so interested in 454 reads - the technology is good, but I don't have access to 454 data, so it's somewhat irrelevant to me. (Not to say 1kbp reads isn't neat, but no one has volunteered to pass me 454 data in a long time...)

So, anyhow, I'm trying to think two steps ahead. 2010 is supposed to be the year that Pacific Biosciences (and other companies) release the next generation of sequencing technologies - which will undoubtedly be longer than 1k. (I seem to recall hearing that PacBio has 10k+ reads.- UPDATE: I found a reference.) So to heck with 1kbp reads, this raises the real question: What would you do with a 10,000bp read? And, equally important, how do you work with a 10kbp read?
  • What software do you have now that can deal with 10k reads?
  • Will you align or assemble with a 10k read?
  • What experiments will you be able to do with a 10k read?
Frankly, I suspect that nothing we're currently using will work well with them - we'll all have to go back to the drawing board and rework the algorithms we use.

So, what do you think?

Labels: , , , ,

Tuesday, May 12, 2009

Quality vs Quantity

Today was an interesting day, for many reasons. The first was the afternoon tours for high-school students that came by the Genome Sciences Centre and the labs. I've been taking part in an outreach program for some of the students at two local high schools, which has involved visiting the students to teach them a bit of the biology and computers we do, as well as the tours that bring them by to see us "at work." Honestly, it's a lot of fun, and I really enjoy interacting with the kids. Their questions are always amusing and insightful - and are often a lot of fun to answer well. (How do you explain how the academic system works in 2 minutes or less?)

For my part, I introduced the kids to Pacific Biosystems SMRT technology. I came up with a relatively slick monologue that goes well with a video from PacBio. (If you haven't seen their video, you should definitely check this out.) The kids seem genuinely impressed with the concept, and really enjoy the graphics - although they enjoy the desktop effects with Ubuntu too... so maybe that's not the best criteria to use for evaluation.

Anyhow, aside from that distraction, I've also had the pleasure of working on some of my older code today. After months of people at the GSC ignoring the fact that I'd already written code to solve many of the problems they were trying to develop software, a few people have decided to pick up some of the pieces of the Vancouver Short Read Package and give it a test spin.

One of them was looking at FindFeatures - which I've used recently to find exons of interest in WTSS libraries - and the other was PSNPAnalysiPipeline code - which does some neat metrics for WTSS.

The fun part of it is that the code for both of those applications were written months ago - in some cases before I had the data to test them on. When revisiting them and now actually putting the code to use, I was really surprised by the number of options I'd tossed in, to account for many situations that hadn't even been seriously anticipated. Someone renamed all of your fasta files? No worries, just use the -prepend option! Your junction library has a completely non-standard naming? No problem, just use the -override_mapname option! Some of your MAQ aligned reads have indels - well, ok, i can give you a 1-line patch to make that work too.

I suppose that really makes me wonder: If I were writing one-off scripts, which would obviously lack this kind of flexibility, I'd be able to move faster and more nimble across the topics that interest me. (Several other grad students do that, and are well published because of it.) Is that a trade off I'm willing to make, though?

Someone really needs to hold a forum on this topic: "Grad students: quality or quantity?" I'd love to sit through those panel discussions. As for myself, I'm still somewhat undecided on the issue. I'd love more publications, but having the code just work (which gets harder and harder as the codebase hits 30k lines) is also a nice thing. While I'm sure users of my software are happy when these options exist, I wonder what my supervisor thinks of the months I've spent building all of these tools - and not writing papers.

Ah well, I suppose when it comes time to defend, I'll find out exactly what he thinks about that issue. :/

Labels: , , ,

Wednesday, October 15, 2008

Not first on the subject...

One of the challenges of writing a blog is to try and provide new content. I've always thought that there was no point in covering information that someone else has already covered - but it's hard to be the first to break every story. So, I figured I'd do something else, which is useful: provide a few links to things that someone else found first.

Today's breaking news was something I found at SeqAnswers.com, on the new relase of information on Pacific Biosystems' new SMRT technology. The open access article is here. Thanks to ECO for updating with that link! (I was trying to get it all morning, to no avail.)

For humour, I wanted to discuss the IgNobel award winning paper: You Bastard: A Narrative Exploration of the Experience of Indignation within Organizations. However, another blogger on my reading list beat me to it.

I was also going to comment on lawsuit launched by the makers of Endnote against Zotero, but the same blogger also beat me to that. (Though, there's certainly a lot to be said about that case in terms of open standards, and people with antiquated business models fighting technology advances by any means possible...) Since I'm still considering using Zotero for my own research, it's probably something I'll save for another day, when there's more information available about the progress of the case.

Otherwise, there's still the new Apple MacBook, milled out of a single piece of aluminum, and the new Dell Mini 9, which I've been debating buying. Since it now comes with Ubuntu pre-installed, maybe I can forgive Dell for not refunding me for the XP licence that they made me buy at xmas last year with my current laptop.

Oh well. Now you know what I've been looking at while I procrastinate on my thesis proposal, and the mound of code changes I need to make ASAP... and now, back to work!

Labels: , , ,

Saturday, February 9, 2008

Pacific Biotech new sequencing technology

I have some breaking news. I doubt I'm the first to blog this, but I find it absolutely amazing, so I had to share.

Steve Turner from Pacific Biosciences (PacBio), just gave the final talk of the AGBT session, and it was damn impressive. They have a completely new method of doing sequencing that uses DNA polymerase as a sequencing engine. Most impressively, they've completed their proof of concept, and they presented data from it in the session.

The method is called Single Molecule Real Time (SMRT) sequencing. It's capable of producing 5000-25,000 base pair reads, at a rate of 10 bases/second. (They apparently have 25bps techniques in development, and expect to release when they have 50bps working!)

The machinery has zero moving part, and once everything is in place, they anticipate that they'll have a sequencing rate of greater than 100 Gb per hour! As they are proud to mention, that's about a full draft genome for a human being in 15 minutes, and at a cost of about $100. Holy crap!

Labels: , , , ,