AGBT 2010 - Christopher Mason - Weill Cornell Medical College
How do we go from sequence to organism?
Example of a disease where they were able to find the change in an exon, but that's not the norm. The brain transcriptome is especially bad.
Complexity of transcriptome is vast.
NGS transformed the amount of data we're getting
Compared microarrays vs RNA-seq
* RNA-seq gives you much more information on DE.
* Metric for RNA-seq expression: RPKM (reads per kb of transcript per million mapped reads)
* Controls: spike in synthetic w poly-A tails [next slide: control worked]
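The reads-per-kb-per-million metric mentioned above can be sketched in a few lines. This is a minimal illustration using the commonly used RPKM definition, not code from the talk; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Assumed (standard) definition:
        RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb transcript in a 10M-read library.
example = rpkm(500, 2000, 10_000_000)
print(example)  # 25.0
```

Dividing by length and by library size is what makes values comparable between long and short genes and between runs of different depth.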
Looking at brain
* validate existing gene boundaries.
* longer isoforms
* find other genes
* 70-90% of genes expressed in the brain with strong neuro-developmental correlation
* Ensembl genes categories expressed: many types of RNAs found
* ~18% of splice forms are unique to each individual - splicing levels similar across development
* at high expression, 80-90% of genes have alt isoforms
[Lists of genes that were DE in fetal/adult brain - "things that make sense"]
What is different is Transcription Factors - especially Zinc Finger TFs.
* Shift towards fetal expression
* most rapidly expanding class of genes
Look at UTRs
* fetal brain exhibits myriad extensions of gene models and variable UTRs.
* TARs found. (Transcriptionally active regions) - confirmed with PCR
No visible end of gene discovery.
* the deeper you go, the more new things you see.
* sensitivity (TP / (TP + FN)) and specificity
* looks incredible - nearly straight to 1.
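For reference, the two measures named above can be written out as a short sketch. The sensitivity formula is the one given in the notes; the specificity formula, TN / (TN + FP), is the standard definition and was not spelled out in the talk. The counts in the example are made up for illustration.

```python
def sensitivity(tp, fn):
    """Fraction of real positives that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of real negatives correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts, not data from the talk.
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=950, fp=50))  # 0.95
```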
Source of "wiggles" in RNA-seq.
* it's everything, really
* annotation is one of the biggest sources.
Human genome is not just 33 Mb... it's only 1/2 to 1/5th of the exome capture.
* 165 Mb have been validated on multiple SeQC platforms!
There aren't just 20,000 genes - it's closer to 45,000!
Begat: every bp of the genome is a locus for testing, each remaining sequence is a variable.
Don't forget, we also have to filter out viruses/bacteria/other
* Code for Begat is available. (Email given - forgot to copy it down.)