Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Thursday, February 25, 2010

AGBT 2010 - Elliot Margulies - NHGRI/NIH

Sequencing and analysis of matched tumor and normal genomes from a melanoma patient

Experimental Design:
* melanoma tumor sample - sequence it
* matched normal blood sample - sequence it
* seems simple, but it takes new tools.
* unique advertisement strategies. (-;

Saved the images alone for 10 runs - more than 100 TB of storage

Compared Illumina 1.6 vs. 1.4
* uniquely aligning reads and next_phred
* Didn't explain the results of the graphs shown... I missed the point.

Used Eland, then partitioned the reads into bins
* realigned with xmatch (well characterized and scales well)
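The bin-then-realign step amounts to grouping first-pass alignments by genomic window so that each window can be realigned on its own, in parallel. The talk didn't say how the bins were defined, so this is a minimal sketch under assumptions: the bin size and the `(read_id, chrom, pos)` tuple layout are hypothetical, and no Eland or xmatch interface is used.

```python
from collections import defaultdict

# Hypothetical bin size; the talk did not give one.
BIN_SIZE = 1_000_000


def partition_into_bins(alignments, bin_size=BIN_SIZE):
    """Group first-pass alignments into genomic bins.

    alignments: iterable of (read_id, chrom, pos) tuples (hypothetical layout).
    Returns a dict mapping (chrom, bin_index) -> list of read ids, where
    bin_index = pos // bin_size. Each bin can then be handed to a slower,
    more sensitive realigner independently of the others.
    """
    bins = defaultdict(list)
    for read_id, chrom, pos in alignments:
        bins[(chrom, pos // bin_size)].append(read_id)
    return dict(bins)
```

Keeping bins independent is what lets an expensive realigner scale: each bin is a separate job with a bounded amount of reference and reads.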

In the end, 2 whole genome datasets:
* 2 x 100 bp reads
* 33 tumour and 24 normal lanes
* total runs: 5 and 3
* total alignable reads: 1 billion / 1.2 billion
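As a rough sanity check, the usual back-of-the-envelope estimate of mean depth is reads times read length over genome size. This assumes the alignable-read counts are individual 100 bp reads (not pairs) and a haploid human genome of roughly 3.1 Gb; neither assumption was stated in the talk.

```latex
\[
\bar{c} = \frac{N L}{G}, \qquad
\frac{(1.0\times10^{9})(100)}{3.1\times10^{9}} \approx 32\times, \qquad
\frac{(1.2\times10^{9})(100)}{3.1\times10^{9}} \approx 39\times
\]
```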

Coverage statistics:
* greater than 99% of the genome covered at least 1x
* 94-95% covered in the 5x-10x range needed for calling variants
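These coverage figures can be compared against an idealized model. A minimal sketch, assuming read start positions are uniformly random so that per-base depth is Poisson with mean c (the Lander-Waterman idealization); the function name is hypothetical.

```python
import math


def poisson_fraction_at_least(k, mean_depth):
    """Expected fraction of bases with depth >= k under a Poisson model.

    P(X >= k) = 1 - sum_{i=0}^{k-1} e^{-c} c^i / i!, with c = mean_depth.
    """
    cdf = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - cdf
```

At an illustrative mean depth of around 30x, this model puts essentially every base above both 1x and 10x, so a figure like the 94-95% above most likely reflects non-uniform coverage in real data (GC bias, repeats, mappability) rather than too few reads.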

Method for variant detection
* Most Probable Genotype
* Bayesian statistical approach, with a prior probability of observing a non-reference allele (the expected mutation rate)
* An equation was given - not going to copy that into HTML.
* Confidence is the difference between the best call and the next most probable call.

[This looks VERY much like SNVMix2...]
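Since the equation wasn't copied, here is a generic Bayesian genotype caller in the same spirit, not the speaker's formula. It is a minimal sketch under assumptions: a fixed per-base error rate, a hypothetical prior built from a single rate `theta`, and confidence reported as the gap between the best and second-best genotype on a Phred-like 10 * log10 scale (the scale used in the talk wasn't stated).

```python
import math


def genotype_log_posteriors(bases, ref, alt, error=0.01, theta=0.001):
    """Unnormalized log10 posteriors for the three diploid genotypes at one site.

    bases: observed read bases at the site, e.g. "AAAGG".
    error: assumed per-base sequencing error rate.
    theta: assumed prior rate of a heterozygous non-reference site.
    """
    # Hypothetical prior: het at theta, hom-alt at theta / 2, the rest hom-ref.
    priors = {ref + ref: 1 - 1.5 * theta, ref + alt: theta, alt + alt: theta / 2}
    log_post = {}
    for genotype, prior in priors.items():
        a1, a2 = genotype
        log_lik = 0.0
        for b in bases:
            # Each read samples either allele with probability 1/2; a wrong
            # base is one of the 3 other nucleotides with equal probability.
            p1 = 1 - error if b == a1 else error / 3
            p2 = 1 - error if b == a2 else error / 3
            log_lik += math.log10(0.5 * p1 + 0.5 * p2)
        log_post[genotype] = math.log10(prior) + log_lik
    return log_post


def most_probable_genotype(bases, ref, alt, **kw):
    """Return (best genotype, confidence), where confidence is the gap
    between the best and the next most probable genotype (10 * log10 ratio)."""
    log_post = genotype_log_posteriors(bases, ref, alt, **kw)
    ranked = sorted(log_post.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_lp), (_, second_lp) = ranked[0], ranked[1]
    return best, 10 * (best_lp - second_lp)
```

For example, `most_probable_genotype("AAAAAGGGGG", "A", "G")` calls the heterozygote `"AG"`. The normalizing constant is the same for every genotype, so it cancels in the gap and never needs computing.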

Graph of concordance vs. percentage called. With a cutoff of 10, you get 95% in the normal genome and 90% in the tumor.

Moved from MPG to Most Probable Variant (MPV)
* compares the best call to the probability of the reference (no-variant) call
* improves the quality of the call.
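One way to read the MPG-to-MPV change: MPG measures the gap between the top two genotypes, while MPV measures the gap between the best non-reference genotype and the reference genotype. A minimal sketch, assuming unnormalized log10 genotype posteriors from any genotype model and the same hypothetical 10 * log10 scale; the example numbers are made up, not from the talk.

```python
def most_probable_variant(log_post, ref_genotype):
    """Return (best non-reference genotype, MPV-style score).

    log_post: dict of genotype -> unnormalized log10 posterior.
    The score compares the best variant genotype against the reference one.
    """
    candidates = [g for g in log_post if g != ref_genotype]
    best = max(candidates, key=lambda g: log_post[g])
    return best, 10 * (log_post[best] - log_post[ref_genotype])


def mpg_confidence(log_post):
    """MPG-style confidence: gap between the two most probable genotypes."""
    top, second = sorted(log_post.values(), reverse=True)[:2]
    return 10 * (top - second)


# Made-up site where het and hom-alt compete but both beat the reference.
example = {"AA": -12.4, "AG": -6.0, "GG": -6.5}
```

On `example`, the MPG-style confidence is only 5 (het vs. hom-alt is uncertain), while the MPV-style score is about 64: the site is clearly a variant even if the exact genotype isn't. That is one plausible reason a variant-vs-reference score would give better variant calls.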

Settings:
* Using MPV greater than 10 (4 million variants)
* Subtract out evidence for germline or low coverage
** take out high-confidence germline variants
** subtract sites where MPG is less than 10 but that still look like a variant
** throw out low-confidence somatic variants
* leaves 189,000 somatic variants (tumour variants)
* also filtering against dbSNP
* break into coding/non-coding
* synonymous/non-synonymous
* verified SNVs by Sanger sequencing (75/84 verified). Some of the ones that didn't verify may be real but not detectable by Sanger.
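The subtraction steps above can be sketched as a single tumour-vs-normal filter. This is a minimal sketch under assumptions: the `Site` record is hypothetical, one variant score stands in for both MPV and MPG, and only the cutoff of 10 comes from the talk. The `normal_hint` and `min_normal_depth` thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pos: int
    tumor_score: float   # variant score (MPV-style) in the tumour
    normal_score: float  # same score computed on the matched normal
    normal_depth: int    # read depth at this site in the normal
    in_dbsnp: bool       # site is a known polymorphism


def somatic_candidates(sites, cutoff=10, normal_hint=3, min_normal_depth=10):
    """Return positions of sites that pass a somatic filter.

    cutoff=10 follows the talk; normal_hint ("looks like a variant") and
    min_normal_depth ("low coverage") are hypothetical thresholds.
    """
    kept = []
    for s in sites:
        if s.tumor_score <= cutoff:
            continue  # low-confidence call in the tumour
        if s.normal_score > normal_hint:
            continue  # germline: high-confidence, or sub-threshold but variant-like
        if s.normal_depth < min_normal_depth:
            continue  # too little normal coverage to rule out germline
        if s.in_dbsnp:
            continue  # known polymorphism
        kept.append(s.pos)
    return kept
```

Every check looks at the normal as well as the tumour, because a tumour variant only counts as somatic if the matched normal gives no evidence for it.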

Summary table of SNV pipeline.
* 174,000 non-coding variants.

Paper: Local DNA Topography correlates with functional noncoding regions of the human genome.

Impact of SNPs on local DNA structure - sometimes a SNP can change the structure a lot.

Used "Chai" to add structure-informed evolutionary information
* only about 10,000 of the variants overlap "Chai" regions
* 2,176 appear to dramatically change DNA shape.
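Counting how many variants fall inside "Chai" regions is an interval-overlap query. The talk didn't describe the region format, so this is a minimal sketch under assumptions: regions are sorted, non-overlapping, half-open `(start, end)` intervals on one chromosome, and the lookup is a plain binary search rather than any particular tool.

```python
import bisect


def count_in_regions(positions, regions):
    """Count positions that fall inside any region.

    regions: sorted, non-overlapping, half-open (start, end) intervals
    on a single chromosome.
    """
    starts = [start for start, _ in regions]
    count = 0
    for p in positions:
        # Index of the last region starting at or before p.
        i = bisect.bisect_right(starts, p) - 1
        if i >= 0 and p < regions[i][1]:
            count += 1
    return count
```

Each lookup is O(log n) in the number of regions, which matters when checking hundreds of thousands of variants against genome-wide annotations.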

"Chai" spots are "mutation cold spots"
Future plans: look at more tumor/normal pairs and investigate this further.
