
Thursday, February 25, 2010

AGBT 2010 - Kristian Cibulskis - Broad Institute

muTector: Accurate Somatic Mutation Detection in Whole Genome and Exome Capture

Mutation Detection - the goal
* Somatic Point Mutations: SNVs in the tumour DNA that are not present in the normal DNA

One challenge: Sensitivity
* tumour purity: normal tissue gets into the sample, so a large share of what you sequence may be normal DNA
* ploidy: tumour cells often carry multiple copies of the DNA

60% tumour purity is common; with 3x ploidy, the minimum allele fraction is 0.23
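
The 0.23 figure can be reproduced with a simple mixing calculation. This is a minimal sketch under my own assumptions, which the talk notes do not state: the mutation sits on a single copy in the tumour cells, and the contaminating normal cells are diploid. The function name and parameters are illustrative only.

```python
def expected_allele_fraction(purity, tumour_ploidy, mutant_copies=1, normal_ploidy=2):
    """Expected fraction of reads carrying a somatic allele in a mixed sample.

    Assumes (not from the talk) that reads are sampled in proportion to
    DNA content and that normal cells carry no copy of the mutant allele.
    """
    mutant = purity * mutant_copies
    total = purity * tumour_ploidy + (1.0 - purity) * normal_ploidy
    return mutant / total


# 60% tumour purity, 3 copies of the locus in the tumour, 1 mutated copy
print(round(expected_allele_fraction(0.60, 3), 2))  # 0.23
```

That is 0.6 / (0.6 * 3 + 0.4 * 2) = 0.6 / 2.6, so fewer than one read in four carries the mutation even before sequencing error is considered.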

Challenge 2: Specificity
* Signal: 1 somatic mutation per Mb
* Noise: 1000 common germline variants per Mb (in dbSNP)
* Mutations are not recurrent. (Constant discovery mode)
* 1000s of mutations per sample, 100s of samples
* Too expensive to validate every mutation - validation would cost more than discovery.

* Core detection algorithm and practical artifact filters
* Under dev since Nov 2008
* Built upon GATK

Some artifacts can be cleaned up globally
* Remove molecular Duplicates
* Recalibrate Quality Scores (make Q values match)
* Locally Realign [Gapped - uses SW - I saw the poster]

Core Statistical Test
* Prior genotype probabilities enforce the expected variant rate.
* First calculate a score for the site being non-reference (in the tumour)
* Then calculate a score for the site being reference (in the normal)
* Controlling for sequencing error
* Controlling for missing a germline ref in the normal.
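
The notes do not give the actual formulas, so the following is only a sketch of how a two-score test of this kind could look, not the speaker's implementation. It assumes a simple per-read error model driven by Phred base qualities and compares log10 likelihoods: the tumour score asks "variant at the observed allele fraction vs. all alt reads being sequencing error", and the normal score asks "reference vs. heterozygous germline variant". All function names are hypothetical, and qualities are assumed to be greater than 0.

```python
import math


def log10_likelihood(ref_quals, alt_quals, f):
    """log10 likelihood of the reads, given true alt allele fraction f.

    ref_quals / alt_quals: Phred qualities (> 0) of bases matching the
    reference / alternate allele. A miscalled base is assumed equally
    likely to be any of the three other bases (hence e / 3).
    """
    total = 0.0
    for q in ref_quals:
        e = 10 ** (-q / 10.0)
        total += math.log10(f * (e / 3.0) + (1.0 - f) * (1.0 - e))
    for q in alt_quals:
        e = 10 ** (-q / 10.0)
        total += math.log10(f * (1.0 - e) + (1.0 - f) * (e / 3.0))
    return total


def tumour_lod(ref_quals, alt_quals):
    """Variant at the observed allele fraction vs. pure sequencing error (f = 0)."""
    depth = len(ref_quals) + len(alt_quals)
    if depth == 0:
        raise ValueError("no reads at this site")
    f_hat = len(alt_quals) / depth
    return (log10_likelihood(ref_quals, alt_quals, f_hat)
            - log10_likelihood(ref_quals, alt_quals, 0.0))


def normal_lod(ref_quals, alt_quals):
    """Normal is reference (f = 0) vs. heterozygous germline variant (f = 0.5)."""
    return (log10_likelihood(ref_quals, alt_quals, 0.0)
            - log10_likelihood(ref_quals, alt_quals, 0.5))


# Hypothetical site: 30 Q30 reads in each sample, 7 alt reads in the tumour only
print(tumour_lod([30] * 23, [30] * 7))
print(normal_lod([30] * 30, []))
```

A site would be called somatic only when both scores clear their thresholds. The thresholds themselves are where the prior genotype probabilities would enter, and the notes do not give their values.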

Running it: you get more somatic mutations than expected
* expected 30 somatic mutations, ended up with 133 in 30 Mb of coding sequence
* Error processes not captured by the core statistic produce high confidence mistakes
* Information about reference alleles and mutant alleles should come from similar distributions
* linked mutations, library errors... etc

* Sequence context causes base hallucinations
* Fisher's exact test to check distribution of strand of reads containing reference allele versus alternate allele
* Bigger effect in capture than whole genome
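
A strand-bias check of this kind can be sketched with a 2x2 table of allele (reference, alternate) by strand (forward, reverse). The version below is a plain-Python two-sided Fisher's exact test using `math.comb` (Python 3.8+). The read counts in the example are made up for illustration, and any cutoff applied to the p-value would be an assumption, since the notes do not give one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Here: a, b = reference-allele reads on forward / reverse strand,
          c, d = alternate-allele reads on forward / reverse strand.
    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Made-up counts: reference reads on both strands, alt reads only on forward
print(fisher_exact_two_sided(20, 18, 9, 0))
```

A small p-value says the alternate allele's strand distribution differs from the reference allele's, which is the pattern expected when sequence context produces a spurious base on one strand.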

* Sequencers/Aligners tend to make reproducible errors, which then show up in alignments

Small changes to filters have big effects
* Very sensitive!

Filtering takes the calls from 133 down to 35.

* 26/29, 30/35, 31/36, 92/100
* Around 95%

How Sensitive?
* use core statistics.
* depends on coverage! [of course]
* use theoretical prediction data and ultra deep coverage as "control"
* Both seem to give the same/similar results
* Average 60-80% power to detect
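
To see why power depends so strongly on coverage, here is a rough sketch that swaps in a much simpler rule than the core statistic: a mutation counts as detected if at least `min_alt_reads` reads carry it. The threshold of 3 reads, the binomial sampling model, and the absence of sequencing error are all assumptions for illustration, so the numbers it prints are not the talk's results.

```python
from math import comb


def detection_power(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads alt reads) under Binomial(depth, allele_fraction)."""
    miss = sum(
        comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - miss


# Allele fraction 0.23 (60% purity, 3x ploidy), hypothetical 3-read threshold
for depth in (10, 20, 30, 50):
    print(depth, round(detection_power(depth, 0.23, 3), 3))
```

Under this toy model, power climbs steeply with depth at low allele fractions, which is the qualitative behaviour the talk described.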

Beta Testing going on
* Release of the software will be soon!


