AGBT post #2.
Anyhow, on the AGBT side of things, I just came back from the Pacific Biosciences panel discussion, which was pretty neat. The discussion was on "How many base pairs will it take to enable personalized medicine?", a topic I'm really quite interested in.
The answers stretched from infinite, to 6 billion, to 100TB, to 100 people (if they can pick the right person), to 1 (if they find the right one). It was a pretty decent discussion, covering things from American politics, to SNP finding, to healthcare... you get the idea. The moderator was also good, the host of a show (Biotechworld?) on NPR.
My one problem is that in giving their answers, they touched on several key points, but never really followed up on them.
1) Just having the genome isn't enough. Stuff like transcription factor binding sites, methylation, regulation, and so forth are all important. If you don't know how the genome works, personal medicine applications aren't going to fall out of it. (Elaine Mardis did mention this, but there was little discussion of it.)
2) Financial aspects will drive this. That in itself was mentioned, but the real paradigm shifts will happen when you can convince the U.S. insurance companies that preventive medicine is cheaper than treating illness. That's only a matter of time, but I think that will drive FAR more long-term effects than having people's genomes. (If insurance companies gave obese people a personal trainer and cooking lessons, assuming their health issues are diet related, they'd save a bundle in not having to pay for diabetes medicine, heart surgery, and associated costs... but targeting people for preventive treatment requires much more personal medicine than we have now.)
Other points that were well covered include the effect of computational power as a limiting factor in processing information, the importance of sequencing the right people, and how it's impossible to predict where the technology will take us, both morally and scientifically.
Anyhow, as I'm typing this while sitting in other talks:
Inanc Birol, also from the GSC, gave a talk on his work on a new de novo assembler:
80% reconstruction of the C. elegans genome from 30x coverage, which required 6 hours (10 CPUs) for data preparation, with the assembly itself running in less than 10 minutes on a single CPU, using under 4 GB of RAM.
There you go. The question for me (relevant to the last posting) is "how much of the 20% remaining has poor sequencability?" I'm willing to bet it's the same.
And I just heard a talk on SSAHA_pileup, which seems to try to sort SNPs. Unfortunately, every SNP caller talk I see always assumes 30X coverage. How realistic is that for human data? Anyhow, I'm sure I missed something. I'll have to check out the slides on slideshare.net, once they're posted.
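To put a rough number on the 30X question, here is a back-of-the-envelope sketch. It uses the standard relation coverage = (number of reads x read length) / genome size, plus a simple Poisson model for per-base depth. The 3 billion bp haploid genome size, the 36 bp read length, and the assumption that reads land uniformly at random are all my own illustrative assumptions, not figures from any of the talks.

```python
import math


def reads_needed(genome_size, read_length, coverage):
    """Number of reads required to reach a mean coverage.

    Uses coverage = reads * read_length / genome_size.
    """
    return math.ceil(coverage * genome_size / read_length)


def fraction_with_depth_at_least(mean_coverage, min_depth):
    """Fraction of bases expected to have depth >= min_depth.

    Assumes per-base depth is Poisson distributed with the given mean,
    which holds only if reads are placed uniformly at random.
    """
    below = sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Assumed values for illustration only.
    human_genome_bp = 3_000_000_000
    read_length_bp = 36
    print(reads_needed(human_genome_bp, read_length_bp, 30))
    print(fraction_with_depth_at_least(30, 10))
```

With those assumed numbers, 30X works out to 2.5 billion reads, which gives a feel for why 30X on human data is a much bigger ask than 30X on C. elegans. Real data is less uniform than the Poisson model, so the depth estimate is optimistic.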
And the talks continue....
btw, remind me to look into the fast Smith-Waterman in cross_match - it sounds like it could be useful.
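For reference, this is the plain textbook Smith-Waterman local alignment score, as a minimal sketch. It is not the fast variant used in cross_match, which I haven't looked at yet. The scoring values (match, mismatch, and a linear gap penalty) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Textbook dynamic programming with a linear gap penalty.
    Keeps only two rows of the matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

This version runs in time proportional to the product of the two sequence lengths, which is exactly why a faster implementation would be worth looking into for short-read data.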