AGBT 2010 - Complete Genomics Workshop
- sequence only human genomes; aim is 1 million genomes in the next 5 years
- build out tools to gain a good understanding of the human genome
- sequenced 50 genomes last year
- Recent Science publication
- expect to do 500 genomes/month
Lots of Customers.
- Deep projects
- don't waste pixels
- use ligases to read
- very high quality reads - low cost reagents
- provide all bioinformatics to customers
- don't sell technology, just results.
- just return all the processed calls (SNPs, SNVs, SVs, etc.)
- more efficient to outsource the "engineering" for groups who just want to do biology
- FedEx the sample, get back results.
- high throughput "on demand" sequencing
- 10 centres around the world
- Sequence 1 Million genomes to "break the back" of the research problem
- they do the bioinformatics
- first wave: understand functional genomics
- second wave: pharmaceutical - patient stratification
- third wave: personal genomics - use that for treatment
Focus on research community
Two customers to present results:
Jared Roach, Senior Research Scientist, Institute for Systems Biology (Rare Genetic disease study)
- studied coverage in four genomes
- 85-92% of genome
- 96% coverage in at least one individual
- Excellent coverage in unique regions.
- within 25bp, and some places down to 10bp
- identified 125 breakpoints
- 90/125 occur at hotspots
- can reconstruct breakpoints in the family
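A minimal sketch of how recombination breakpoints can be bracketed in a family, assuming each informative site has already been labelled with which parental haplotype the child inherited. The "A"/"B" labels, the input format and the function name are hypothetical; the talk did not describe ISB's method. The crossover must fall between the last site carrying the old label and the first site carrying the new one, so the gap between them is the best achievable resolution.

```python
from typing import List, Tuple

# Hypothetical input: (position, inherited parental haplotype label) per informative site.
Site = Tuple[int, str]


def locate_breakpoints(sites: List[Site]) -> List[Tuple[int, int]]:
    """Return (left, right) position pairs bracketing each haplotype switch."""
    ordered = sorted(sites)
    intervals = []
    for (left_pos, left_hap), (right_pos, right_hap) in zip(ordered, ordered[1:]):
        # A change in inherited haplotype means a crossover lies somewhere
        # between these two adjacent informative sites.
        if left_hap != right_hap:
            intervals.append((left_pos, right_pos))
    return intervals


sites = [(1_000, "A"), (5_000, "A"), (5_020, "B"), (9_000, "B"), (9_010, "A")]
for left, right in locate_breakpoints(sites):
    print(f"breakpoint between {left} and {right} ({right - left} bp resolution)")
```

With real calls, a single genotyping error produces two spurious switches a few bases apart, so a practical version would smooth over isolated sites (for example with a hidden Markov model) before reporting intervals.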
Since they have twins, they can do some nice tests
- infer error rate: 1x10^-5
- excluded regions with compression blocks (error goes up to 1.1x10^-5)
- Homozygous only: 8.0x10^-6 (greater than 90% of genome)
- Heterozygous only: 1.7x10^-4
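Identical twins should have identical genomes, so a discordant call between them means at least one of the two calls is wrong. A sketch of that idea, assuming monozygotic twins, calls stored as position-to-genotype dicts (a hypothetical format), and errors rare and independent enough that both twins are almost never wrong at the same site. The speaker's actual procedure was not described.

```python
from typing import Dict

Calls = Dict[int, str]  # hypothetical format: position -> genotype such as "AG"


def twin_error_rate(calls_a: Calls, calls_b: Calls) -> float:
    """Estimate the per-call error rate from discordance between identical twins."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no positions called in both twins")
    # Compare unordered genotypes so "AG" and "GA" count as the same call.
    discordant = sum(1 for pos in shared if sorted(calls_a[pos]) != sorted(calls_b[pos]))
    # Each discordant site contributes one wrong call out of the 2 * len(shared)
    # calls compared, assuming both twins are almost never wrong at the same site.
    return discordant / (2 * len(shared))


twin_a = {1: "AA", 2: "AG", 3: "CC", 4: "TT"}
twin_b = {1: "AA", 2: "GA", 3: "CT", 4: "TT", 5: "GG"}
print(twin_error_rate(twin_a, twin_b))  # 1 discordant site / 8 calls -> 0.125
```

Splitting the estimate into homozygous-only and heterozygous-only rates, as on the slide, would additionally need a rule for which twin's call defines the class of a discordant site.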
[Discussion of genes found - no names, so there's no point in taking notes. They claim they get results that make sense.]
[Time's up - on to next speaker.]
Zemin Zhang, Senior Scientist, Genentech/Roche (Lung Cancer Study)
Cancer and Mutations
[Skipping overview of what cancer is... I think that's been well covered elsewhere.]
- lung cancer is the leading cause of cancer-related mortality worldwide...
- significant unmet need for treatment
Start with one patient
- non-small cell lung adenocarcinoma
- 25 cigarettes/day
- tumour: 95% cancer cells
Genomic characterization on Affy and Agilent arrays
- lots of CNV and LOH
- Circos diagrams!
- 131 Gb mapped sequence in normal, 171 Gb mapped sequence in tumour
- 46x coverage normal, 60x tumour
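The coverage figures follow from dividing mapped bases by genome size. A quick check, assuming a genome size of about 2.85 Gb; that value is picked because it reconciles the slide's numbers, and the denominator actually used was not stated.

```python
# Assumed genome size; the slide did not state the denominator used.
GENOME_BP = 2.85e9


def mean_coverage(mapped_bases: float, genome_bp: float = GENOME_BP) -> float:
    """Average sequencing depth = total mapped bases / genome length."""
    return mapped_bases / genome_bp


for sample, gigabases in [("normal", 131), ("tumour", 171)]:
    print(f"{sample}: {mean_coverage(gigabases * 1e9):.0f}x")
# normal: 46x
# tumour: 60x
```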
[Skipping some info on coverage...]
KRAS G12C mutation
What about the rest of the 2.7M SNVs?
- SomaticScore predicts SNV validation rates
- 67% are somatic by prediction
- more than 50,000 somatic SNVs are projected
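SomaticScore itself wasn't explained in the talk. The sketch below only illustrates the generic step of turning validation rates measured per score bin into a projected somatic count; the bins, rates and function names are made up for illustration and are not Genentech's model.

```python
from typing import List, Tuple

# Hypothetical: (number of candidate SNVs in a score bin, validation rate measured for that bin)
ScoreBin = Tuple[int, float]


def projected_somatic(bins: List[ScoreBin]) -> float:
    """Expected number of true somatic SNVs: sum of count * validation rate per bin."""
    return sum(n * rate for n, rate in bins)


def projected_fraction(bins: List[ScoreBin]) -> float:
    """Projected somatic count as a fraction of all candidates."""
    total = sum(n for n, _ in bins)
    return projected_somatic(bins) / total if total else 0.0


toy_bins = [(1_000, 0.9), (1_000, 0.5), (1_000, 0.1)]  # made-up numbers
print(projected_somatic(toy_bins), projected_fraction(toy_bins))  # 1500.0 0.5
```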
Selection and bias observed in the lung cancer genome by comparing somatic and germline mutations
G:C to T:A changes: tobacco-associated DNA damage signature
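One way to measure this signature is to collapse every substitution onto the pyrimidine base of its pair, giving six classes in which G:C to T:A appears as C>A. A sketch, assuming SNVs are given as (ref, alt) bases on the reference strand; comparing the C>A fraction in somatic versus germline calls is what would expose the tobacco signature.

```python
from collections import Counter
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def spectrum_class(ref: str, alt: str) -> str:
    """Name a substitution by the pyrimidine of its base pair (six classes: C>A ... T>G)."""
    if ref in ("A", "G"):
        # Report purine-reference changes from the opposite strand, so G>T becomes C>A.
        ref, alt = ref.translate(_COMPLEMENT), alt.translate(_COMPLEMENT)
    return f"{ref}>{alt}"


def gc_to_ta_fraction(snvs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of SNVs in the C>A class, i.e. G:C to T:A changes."""
    counts = Counter(spectrum_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return counts["C>A"] / total if total else 0.0


print(gc_to_ta_fraction([("G", "T"), ("C", "A"), ("C", "T"), ("A", "G")]))  # 0.5
```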
Protection against mutations in coding and promoter regions.
- look at coding regions only - mutations are dramatically less than expected - there is probably strong selection pressure and/or repair
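The "less than expected" comparison can be expressed as an observed/expected ratio. A simple sketch assuming a uniform mutation rate across the genome; a real analysis would at least account for callable bases and sequence context. The numbers in the example are toy values.

```python
def obs_exp_ratio(observed_in_region: int, region_bp: int,
                  total_mutations: int, genome_bp: int) -> float:
    """Observed / expected mutations in a region under a uniform genome-wide rate."""
    # Expected count if mutations were spread evenly over the genome.
    expected = total_mutations * region_bp / genome_bp
    return observed_in_region / expected


# Toy values: 1,000 mutations in a 1 Mb genome predict 100 in a 100 kb region.
print(obs_exp_ratio(40, 100_000, 1_000, 1_000_000))  # 0.4 (depleted)
```

A ratio well below 1, as reported for coding regions, is what points to selection and/or repair.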
Fewer mutations in expressed genes.
- expressed genes have fewer mutations, and even fewer on the transcribed strand
- non-expressed genes have mutation rate similar to non-genic regions
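To compare strands, each G:C mutation in a gene has to be assigned to whichever strand carries the G. A sketch, assuming mutations are reported as the + strand reference base together with the gene's strand (a hypothetical input format; the talk didn't describe the method). For a gene on the + strand the transcribed (template) strand is the - strand, and vice versa. A deficit on the transcribed strand is the pattern usually attributed to transcription-coupled repair.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def guanine_strand(ref_plus: str, gene_strand: str) -> Optional[str]:
    """Say whether the G of a G:C site lies on the gene's transcribed strand.

    ref_plus: reference base on the + strand; gene_strand: "+" or "-".
    Returns None for A:T sites.
    """
    if ref_plus not in ("G", "C"):
        return None
    g_strand = "+" if ref_plus == "G" else "-"
    # The transcribed (template) strand is the one opposite the gene's coding strand.
    return "non-transcribed" if g_strand == gene_strand else "transcribed"


def strand_counts(mutations: Iterable[Tuple[str, str]]) -> Counter:
    """Tally G:C mutations by which strand of the gene carries the G."""
    labels = (guanine_strand(ref, strand) for ref, strand in mutations)
    return Counter(label for label in labels if label is not None)


print(strand_counts([("C", "+"), ("G", "+"), ("G", "-"), ("A", "+")]))
```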
Positive selection in subsets of genes
- KRAS is the only previously known mutation
- Genes also mutated in other lung cancers...
Finding structural variation by paired end reads
- median distance between pairs: 300bp
- distance almost never goes beyond 1kb.
Look for clusters of read pairs where one mate maps to a different chromosome or more than 1kb away
- small number of reads
- 23 inter-chr
- 56 intra-chr
- use FISH + PCR
- validate results
- 43/65 test cases are found to be somatic and have nucleotide level breakpoint junctions
- chr4 to chr9 translocation
- 50% of cells showed this fusion (FISH)
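The discordant-pair search can be sketched as follows. The 1 kb cutoff and the different-chromosome rule come from the talk; the greedy clustering, the window size, the minimum of three supporting pairs and all names are assumptions for illustration. Real SV callers also check read orientation and mapping quality.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_INSERT = 1_000  # from the talk: mates are almost never more than 1 kb apart


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(p: ReadPair, max_insert: int = MAX_INSERT) -> bool:
    """Mates on different chromosomes, or further apart than any normal insert."""
    return p.chrom1 != p.chrom2 or abs(p.pos2 - p.pos1) > max_insert


def normalise(p: ReadPair) -> ReadPair:
    """Order the two ends so the same event always looks the same way round."""
    a, b = (p.chrom1, p.pos1), (p.chrom2, p.pos2)
    if b < a:
        a, b = b, a
    return ReadPair(a[0], a[1], b[0], b[1])


def cluster_discordant(pairs: Iterable[ReadPair], window: int = MAX_INSERT,
                       min_support: int = 3) -> List[List[ReadPair]]:
    """Greedily group discordant pairs whose two ends both lie within `window`
    of the first pair in the cluster; keep clusters with enough support."""
    disc = sorted((normalise(p) for p in pairs if is_discordant(p)),
                  key=lambda p: (p.chrom1, p.chrom2, p.pos1, p.pos2))
    clusters: List[List[ReadPair]] = []
    current: List[ReadPair] = []
    for p in disc:
        if current:
            seed = current[0]
            same_event = (p.chrom1 == seed.chrom1 and p.chrom2 == seed.chrom2
                          and abs(p.pos1 - seed.pos1) <= window
                          and abs(p.pos2 - seed.pos2) <= window)
            if not same_event:
                clusters.append(current)
                current = []
        current.append(p)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_support]


reads = [
    ReadPair("chr4", 1_000, "chr9", 5_000),
    ReadPair("chr4", 1_100, "chr9", 5_050),
    ReadPair("chr9", 5_100, "chr4", 1_200),  # same junction, mates reported the other way round
    ReadPair("chr1", 100, "chr1", 400),      # normal 300 bp insert
    ReadPair("chr2", 10, "chr2", 50_000),    # lone discordant pair, too little support
]
for cluster in cluster_discordant(reads):
    print(len(cluster), "pairs support", cluster[0].chrom1, "to", cluster[0].chrom2)
```

The greedy pass is simple but can split one event into two clusters if an outlying pair lands between supporting pairs in sort order; a more careful version would compare each pair against every open cluster.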
Possible scenario of Chr15 inversion and deletion investigated.
[got distracted, missed point.. oops.]
- very nice Circos diagram
- > 1 mutation for every 3 cigarettes
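The notes don't give the smoking duration, but the two figures can be inverted to see what history they imply. Taking 50,000 mutations and 3 cigarettes per mutation at face value, this is a rough consistency check, not a number from the talk; the function name is hypothetical.

```python
def implied_smoking_years(mutations: float, cigarettes_per_mutation: float,
                          cigarettes_per_day: float) -> float:
    """Years of smoking implied by a mutation count and a cigarettes-per-mutation ratio."""
    total_cigarettes = mutations * cigarettes_per_mutation
    return total_cigarettes / cigarettes_per_day / 365.25


# 50,000 mutations x 3 cigarettes each = 150,000 cigarettes; at 25/day that is 6,000 days.
print(f"{implied_smoking_years(50_000, 3, 25):.1f} years")  # 16.4 years
```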
In the process of doing more work with Complete Genomics