Jun Wang, Beijing Genomics Institute at Shenzhen - “Sequencing, Sequencing and Sequencing”
With >500Gb per month, what would you do?
The obvious choice is to do whole genomes: From Giant Panda to the tree of life. (Is the panda really a bear?) Formal reason: They eat bamboo, are cute and nice.... and they're cute! Ok, the real reason: they selected an animal “without competition” for sequencing that has a significant “Chinese element”, and it serves as proof of concept that short reads are good enough to assemble a large genome.
Why do we need longer reads? Ten years ago, the question was: can you sequence a genome by shotgun sequencing? Yes... Now: can we do it with short reads? Yes, but there are questions:
Read length: the longer the better
Insert sizes: for finishing, this becomes important
Depth: determines quality.
Why short reads work: most of the genome is really unique anyhow. Insert size is probably the most important matter.
( Started with a pilot project: cucumber. )
Panda: has 20 chr + X/X. Used insert sizes from 150 to 10,000 bp. 50X sequence coverage, 600X physical coverage.
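To make the two coverage numbers concrete, here is a minimal sketch of how they differ: sequence coverage counts the bases actually read, while physical coverage counts the bases spanned by each read pair's insert. The genome size (2.4 Gb) and the library numbers below are my own assumptions for illustration, not figures from the talk, and the function names are made up.

```python
def sequence_coverage(n_reads, read_len, genome_size):
    """Average number of times each base is covered by a sequenced read."""
    return n_reads * read_len / genome_size


def physical_coverage(n_pairs, insert_size, genome_size):
    """Average number of times each base is spanned by a read-pair insert."""
    return n_pairs * insert_size / genome_size


GENOME = 2.4e9          # ASSUMPTION: approximate panda genome size in bp
n_pairs = 100_000_000   # HYPOTHETICAL library: 100M read pairs
read_len = 50           # HYPOTHETICAL read length in bp
insert_size = 2_000     # HYPOTHETICAL insert size in bp

seq = sequence_coverage(2 * n_pairs, read_len, GENOME)
phys = physical_coverage(n_pairs, insert_size, GENOME)

print(f"sequence coverage: {seq:.1f}X")
print(f"physical coverage: {phys:.1f}X")
print(f"physical / sequence ratio: {phys / seq:.0f}")
```

The ratio depends only on insert size over twice the read length, which is why a few large-insert libraries push physical coverage far above sequence coverage.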
Genome coverage is 80%. Gene coverage 95%, Single base error rate is Q50, less than 1/100kb.
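As a quick check on the Q50 figure, the standard Phred definition is Q = -10 log10(p), where p is the per-base error probability. A small sketch of the conversion (the function names are mine):

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


p = phred_to_error_prob(50)
bases_per_error = round(1 / p)

print(f"Q50 error probability: {p:.0e}")
print(f"roughly one error per {bases_per_error:,} bases")
```

So Q50 corresponds to about one error per 100 kb, which matches the number quoted on the slide.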
Gene stats: 27.8k homologous to dog genes.
Of the sequenced genomes, the panda is evolutionarily closest to dog, next closest to cat. (But panda is a bear.) Its evolutionary rate is slightly higher than dog. Would like to add significant species to tree of life.
One of the original questions on what to sequence: “Tastes good, sequence it!” Now, it's close to 50% of the major dinner table! [yikes]
Instead, now proposes cute things: Penguins!
Aiming to sequence “big genomes”, 100Gb+ genomes.
First Asian was sequenced last year.
Is one genome enough? No, probably not. Need hundreds to study population genetics. Now taking part in 100 genome project. Committed for 3Tb (about 500 individuals).
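A back-of-the-envelope look at what that commitment means per person. The 3 Tb and 500 individuals are from the talk; the human genome size of roughly 3 Gb is my own assumption for the sketch.

```python
total_bases = 3e12      # 3 Tb committed (from the talk)
individuals = 500       # approximate number of individuals (from the talk)
HUMAN_GENOME = 3e9      # ASSUMPTION: approximate haploid human genome size in bp

per_individual = total_bases / individuals
depth = per_individual / HUMAN_GENOME

print(f"bases per individual: {per_individual / 1e9:.0f} Gb")
print(f"approximate depth per individual: {depth:.0f}X")
```

Under that assumption it works out to low-coverage sequencing per individual, trading depth for a larger number of people.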
De Novo assembly is the only solution for a complete structure variation (SV) map. Still too expensive, though.
Started a new project sequencing Asian cancer patients. The cost is about $4000-5000 per sample. [I missed how many per person]
Top 10 causes of death for Asians... start to rank, and decide which to attack.
4P healthcare (personalized medical care) is coming (all based on personal genomics). Picture of FAR too many people on a beach in China.
Already sequenced all major rice cultivars. Found many selective sweeps – lots of new variation?
Also working on Silkworm study... [this is just rapidly turning into a list of projects they've started. Interesting, but nothing much to gain from it.]
DNA methylome: just finished the first Asian version.
Also working on methylation that changes as you climb mountains. [Ok, I just don't really get this one.] High altitude adaptation... [but why is this a priority?]
[At the bottom of the slide it said “Work? Fun? Science?” I'm not really sure if that was any of the above.... strange.]
Also doing Whole Transcriptome. Several species, plants, insects, etc.
You need huge depths (400x) to get all transcripts greater than or equal to 1, but decreases from there.
Also started a 1000 plant collaboration. Genomics has barely scratched the vast biodiversity on the planet. They are going to start working on this. From algae to flowering plants.
1Gb of transcript sequencing per sample would be equivalent to 2M EST.
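The equivalence implies an average EST length, which is easy to back out. This is just arithmetic on the two numbers from the slide; the variable names are mine.

```python
transcript_bases = 1_000_000_000   # 1 Gb of transcript sequence (from the talk)
est_count = 2_000_000              # 2M ESTs (from the talk)

implied_est_length = transcript_bases / est_count

print(f"implied average EST length: {implied_est_length:.0f} bp")
```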
Now doing 75bp PET reads.
Another project: Metagenomics of the Human Intestinal Tract.
“Sequencing is Basic” [eh?]
My Comments: It's interesting to know what these guys are doing, but it just seems really random. They may be the biggest, but I wonder where they're going with the technology. It appears to be a technology in search of a project, unlike the rest of the world, which works towards projects and then applies the technology. Maybe someone else can figure out what their underlying goal is and explain it to me. :/
Labels: AGBT 2009