Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Saturday, August 15, 2009

What would you do with 10kbp reads?

I just caught a tweet about an article on the Pathogens blog (What can you do with 1000 base pair reads?), which is specifically about 454 reads. Personally, I'm not so interested in 454 reads - the technology is good, but I don't have access to 454 data, so it's somewhat irrelevant to me. (Not to say 1kbp reads aren't neat, but no one has volunteered to pass me 454 data in a long time...)

So, anyhow, I'm trying to think two steps ahead. 2010 is supposed to be the year that Pacific Biosciences (and other companies) release the next generation of sequencing technologies - which will undoubtedly produce reads longer than 1k. (I seem to recall hearing that PacBio has 10k+ reads. UPDATE: I found a reference.) So to heck with 1kbp reads, this raises the real question: What would you do with a 10,000bp read? And, equally important, how do you work with a 10kbp read?
  • What software do you have now that can deal with 10k reads?
  • Will you align or assemble with a 10k read?
  • What experiments will you be able to do with a 10k read?
Frankly, I suspect that nothing we're currently using will work well with them - we'll all have to go back to the drawing board and rework the algorithms we use.

So, what do you think?


5 Comments:

Blogger Luke said...

De novo assembly, all the way. People are already working on assembling Illumina paired-end data into pretty big contigs - 10kb reads would assemble easy as pie.

The other option is to do local alignment, which is what 454 does.

August 16, 2009 1:15:00 AM PDT  
Blogger Daniel said...

Absolutely - once we get close to accurate 10 kb reads we can finally throw away the reference sequence entirely.

Of course, we'll need 10 kb reads at a much higher throughput than 454 is offering its 1000 base reads...

August 16, 2009 3:52:00 AM PDT  
Blogger graveley said...

Even more than genome assembly, 10 kb reads would finally allow for the true diversity of alternatively spliced transcriptomes to be determined. Even with paired-end reads and sophisticated assembly programs, it is nearly impossible to determine whether two distant alternative exons are present in the same transcript or not. 10 kb reads would make for an instant EST project, without the "T" part.

August 16, 2009 5:47:00 AM PDT  
Anonymous Anonymous said...

I am sure the plant genomics guys are drooling at the idea of 10kb reads. Someone somewhere is planning a DNA Noah's ark for when de novo sequencing of complex / polyploid genomes is possible. Let's hope PacBio or Oxford Nanopore really do give us some new toys to play with soon.

August 17, 2009 3:41:00 AM PDT  
Anonymous Erik said...

De novo sequencing, splice form detection, structural variant detection, SNP phasing etc.

As for what algorithms will be in use, we should at least be able to wipe the dust off of and update some of the methods used for Sanger reads.

I don't agree with the poster above that we'll not be using reference sequences anymore - much as microarrays still serve a purpose, RNA-seq and similar short read technologies will be around for a long time, and reference aligning will continue being an important tool.

Not all problems are better addressed by long reads, and if nothing else, you can bet that Illumina, ABI (Life), etc. will continue to push their short read applications for many years to come.

August 25, 2009 10:44:00 PM PDT  
