Thursday, January 29, 2009

FindPeaks 3.3.0 and AGBT proposal

Well, I finally have a version of FindPeaks 3.3.0 that runs without known bugs. Tracking down that last bug was tricky, and it took me 3 days to find and squash. It's hard to find bugs that only happen when they're near a fragment that is duplicated. (-;

Anyhow, now that that's working better, it's time to add in the new functionality. The two most pressing parts are the controls (themselves in two parts - one of which is a top secret collaboration, while the other is just too boring to really talk about) and implementing a SAM/BAMtools interface. Whenever the "new MAQ" is ready, I'd like to be prepared for it.

Incidentally, I think controls will be the easier of the two, and I think I'll be able to finish the boring parts off this week. At the rate things are going, it might be another 2 days of debugging after that, but that's what makes software writing fun. (Just imagine a tall thin guy hunched over a computer keyboard cackling insanely while staring deep into the monitor displaying green matrix-style characters drifting downward...)

At any rate, I'm also working towards my poster for AGBT, which reminds me of what else I wanted to suggest. If anyone who reads my blog is going to be at AGBT and is happy to meet up to talk some ChIP-Seq or SNP finding (or anything remotely related), let me know. I'm thinking it would be neat to gather people together who are working on the same topic and talk for a bit. (I'm even willing to miss formal talks for it, as long as they're not directly related to my work.)

So, to that effect, I'll point to this page on SeqAnswers, and suggest if anyone is interested they let me know. (= It would definitely be an efficient way to network.

Oh, and (still) for those of you who've already registered for AGBT, check out the nifty package Illumina is sending out to people. I'm HIGHLY impressed with the creative idea and timeliness. (If you don't know what it is, the suspense is killing you, and you care enough to ask, I'll put the answer in the comments.) (-:

Thursday, February 7, 2008

AGBT post #2.

Good news.. my bag arrived! I'm going to go pick it up after the current session, and finally get some clean clothes and a shave. Phew!

Anyhow, on the AGBT side of things, I just came back from the Pacific Biosciences panel discussion, which was pretty neat. The discussion was on "how many base pairs will it take to enable personalized medicine?" A topic I'm really quite interested in.

The answers stretched from infinite, to 6 Billion, to 100TB, to 100 people (if they can pick the right person), to 1 (if they find the right one). It was a pretty decent discussion, covering things from American politics, to SNP finding, to healthcare... you get the idea. The moderator was also good, the host of a show (Biotechworld?) on NPR.

My one problem is that in giving their answers, they brushed on several key points, but never really followed up on them.

1) Just having the genome isn't enough. Stuff like transcription factor binding sites, methylation, regulation, and so forth are all important. If you don't know how the genome works, personal medicine applications aren't going to fall out of it. (Elaine Mardis did mention this, but there was little discussion of it.)

2) Financial aspects will drive this. That, in itself was mentioned, but the real paradigm shifts will happen when you can convince the U.S. insurance companies that preventive medicine is cheaper than treating illness. That's only a matter of time, but I think that will drive FAR more long term effects than having people's genomes. (If insurance companies gave obese people a personal trainer and cooking lessons, assuming their health issues are diet related, they'd save a bundle in not having to pay for diabetes medicine, heart surgery, and associated costs.... but targeting people for preventive treatment requires much more personal medicine than we have now.)

Other points that were well covered include the effect of computational power as a limiting agent in processing information, the importance of sequencing the right people, and how it's impossible to predict where the technology will take us, both morally and scientifically.

Anyhow, as I'm typing this while sitting in other talks:

Inanc Birol, also from the GSC, gave a talk on his work on a new de novo assembler:

80% reconstruction of the C. elegans genome from 30x coverage, which required 6 hours (10 CPU) for data preparation, with the assembly itself performed in less than 10 minutes on a single CPU, using under 4Gb of RAM.

There you go.. the question for me (relevant to the last posting) is "how much of the 20% remaining has poor sequencability?" I'm willing to bet it's the same.

And I just heard a talk on SSAHA_pileup, which seems to try to sort SNPs. Unfortunately, every SNP caller talk I see assumes 30X coverage.. How realistic is that for human data? Anyhow, I'm sure I missed something. I'll have to check out the slides on slideshare.net, once they're posted.

And the talks continue....


btw, remind me to look into the fast smith-waterman in cross-match - it sounds like it could be useful.

AGBT post #1.

I'm here.. and online. I almost didn't make it, thanks to bad weather in Florida, but at least the car we rented didn't break down on the road, the way the other group's did. Apparently the police saved them from the alligators and wild pigs... No one can say AGBT hasn't been exciting, so far.

Anyhow, lots of good topics, and meeting interesting people already. (I'm even sitting beside an exec from Illumina, in the ABI sponsored lunch.. how's that for irony?) Anyhow, I'm excited to start the poster sessions and get some good discussions going.

Unfortunately, I missed two of the talks this morning, while I negotiated with the good people at United airlines to have my bag delivered. The three others I've seen so far have been good. Some interesting points:

The best graphics are the ones with the two DNA strands shown separately. Too cool - must include that in my FindPeaksToR scripts.

Loss or gain of homozygosity can screw up what you think you have, compared to what's really there. Many models assume you have only one copy of a gene, or just don't really do much to make sense of these events.

From David Cox, I learned that Barcoding isn't new (it's not), but that it usually doesn't work well (I can't prove that), but hey, they got it to work (and that's good!).

And yes, my favorite line from David Cox's presentation was something like: 900 PCR products, 90 people['s samples]... 1 tube. Make sure you don't drop it!

Anyhow, I'm getting lots of ideas, and I'm thoroughly enjoying this conference. I'm saturated in next-gen sequencing work.

Anyhow, if anyone else is reading this, my poster is #38... feel free to come by and talk.
