Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: - Please come visit my blog there.

Monday, December 14, 2009

Talk: Writing a manuscript for bio-medical publication - Ian E.P. Taylor (Professor emeritus of Botany, UBC)

I was asked to blog this by a colleague. I haven't caught up with the other talks from last week, but since I'm here, and I found an outlet, I figured I could just dump my notes to the web. So here they are. I've left in a few comments of my own, but really, the lecture stands on its own. Most of the notes are derived directly from the talk - but they mirror the distributed handouts pretty closely. And frankly, the talk was full of examples and anecdotes that really helped illustrate the point. If you get a chance to see Dr. Taylor speak, I highly recommend it. Any mistakes, as always, are mine, and you shouldn't take my advice on publishing...

[I will clean up the (HORRIBLY BAD) HTML that was caused by dumping in an open office document. (Update: I've now removed the bad HTML - and I won't be cutting and pasting from open office again. That was brutal.)]

Writing a manuscript for bio-medical publication

By Ian E.P. Taylor (Professor emeritus of Botany)

Started with an anecdote: When you publish a paper, the people you have to impress are 2 reviewers and an editor. They are harder to impress, and you have to worry about first impressions, or they will be hostile. Don't be negative about your work. For things that are missing: keep them for your next experiment.

Peer review: An independent but generic tool that allows an editor to
  • Determine originality
  • Assess operational competence
  • Ensure coherent reports of research
  1. Unpublished research may as well not have been done.
  2. Unread publications may as well not have been written.
Four steps to understand: (Plan for the lecture.)
  1. Plan and plan for your journal
  2. Real and credible authorship. (Those who are listed have done something... names have been put on cheques for authorships.)
  3. Peer review. (The web hasn't changed this much – people still want to see peer review.)
  4. Responses to review.

Picking a journal:
Polling the audience – lots of usual reasons. [Joke about having your professor on the editorial board....]
His reasons:

  1. indexed
  2. best in field
  3. appropriate readership
  4. i read it a lot
  5. timeliness
  6. costs
  7. profile of other authors
  8. professor-choice

Impact factor:

  • Kind of a fake idea.... the papers have impact factors, not journals. Several examples of high impact papers in low impact factor journals.

The plan:

Know the giants upon whose shoulders you stand. What are their backgrounds, and where do they publish?

  1. Know your goals, hypothesis
  2. If you can't write what you discovered in 20 words, you haven't discovered it.
  3. How did you discover it
  4. prepare the outcomes (text, figs, tables, supplementary)
  5. what do they mean?
  6. Keep your references as you write. (Missing references piss off reviewers!)
[The only decent cheese in the world is cheshire cheese – and yes, that's where he grew up.]

Planning abstract:
  • State what you discovered in the first line, and your conclusion in the second line
  • As you write, challenge each sentence against your plan.
  • If necessary, change plan

“We have discovered...” Everything you write should fit this goal.

Elements of a paper:
  • Results – no results, no paper
  • Methods – how you got the results
  • Discussion – why the methods. (May or may not be the choppy approach....)
  • Introduction – direct the reader
  • Conclusions – not repeat of results
  • Aside on abstract: can be structured – explains order of what you're discussing, or can be intro: explain significance.
How to read a paper:
  • Introduction -> Methods -> Discussion.
  • Introduction should lead you straight to discussion.... should be able to skip results and methods.
  • Writing author (always singular) – You should try to write it out yourself, as this is key. You don't want to mix styles.
  • Co-authors
  • senior author (first)
  • senior author (PI)
  • corresponding author.
In case it's not obvious, first is the person who did the work, last is the PI. (for Biomedical) Some places do alphabetical, but it depends on your field.

Writing Author:
  • Get the instructions to authors from the journal
  • write the first draft, which is a record of your work
  • unify style
  • share draft with all co-authors
  • Responsible for ensuring the record of ethical performance.
  • It's key that you get the style correct for your journal.

Before you write – make sure you know which journal.

[Great advice to me: Journals do not publish the truth! They publish results, written truthfully!]

Wonderful anecdote about how they used to write papers by putting everyone into a room and not leaving till the 2nd draft was done.

Who are the authors?
  • Only the people who should be on there. Eg, supervisors who wrote the grant, didn't actually supervise.
  • Data providers
  • Data analysts (e.g. statisticians)
  • political.... yeah, it happens
  • All who contributed to an essential part of the actual research reported
  • No one who is there for contributing a gift of ANY sort
  • check the journal for the criteria.
  • All authors must agree to the authorship list
  • All authors must agree to submission.
People who create libraries should not be included. Eg, if they supply you with a DNA library or something of that nature – if they didn't do the work, they don't belong on the paper. They just get an acknowledgement. See American Journal of Medicine – they have a form for this. (=

Fabrication, Falsification, Publication: they are criminal offenses in the scholarly community. Most of the time, detection is by accident.

Author obligations - everyone on the paper takes responsibility for the whole paper.
  • inform editor of related works
  • refer to instructions to authors for details
  • covering letter should give the context of other published work. Make sure each paper is an original idea.
  • Inform editors of financial or other conflicts of interest
  • Identify (un)acceptable bias
  • Full justification of 'representative results'
  • Negative results???
  • The stuff you'd expect
  • list of suitable reviewers! Don't just pick the people who are best in the field - give others too. Don't just pick editorial board either. Give some reason why you picked them.
Remember: You are the world expert on the subject you're writing on.

Reviewer Obligations:
  • Treat as you would wish a stranger to treat you
  • Follow directions from editor
  • Consult editor before consulting others
  • group reviews must adhere to confidentiality
  • One person must sign off on it.
  • Refuse, rather than delay: 2 weeks max. Do not delay!
  • if you're waiting for journals, don't put up with delay, get on the phone and call!
  • If asked for recommendation, follow criteria.
  • Annotate manuscript, but AVOID being rude.
If you're reviewing, Track changes is acceptable
  • Don't steal ideas or use their ideas for your gain
  • Do not break confidentiality
  • Avoid becoming investigator. - don't tell the researcher how to do the research.
Disclose potential conflicts of interest - let the editor decide if it's a problem
and yeah, don't talk to the author unless the editor says it's ok (explicitly)
  • Reviewer conflicts:
  • Recent collaborations
  • intellectual conflicts
  • scientific bias or personal animosity
  • parallel research activity, research work on a competing project
  • potential financial benefit
  • AND potential benefit from advanced knowledge of new work
Responding to reviewers:

  • In the end, the editors decide what's in the journal - not the reviewer
  • It is not an election - the editor can ignore the recommendations
  • Take all the comments - particularly the editor's - seriously
  • The editor can say "your paper is accepted subject to the actions of the reviewers...." It has not yet been accepted.
  • Vent about negative comments - but only for 10 minutes.
  • Mark every point on a copy of the manuscript
  • Fix all typos and mechanics IMMEDIATELY. (within 24 hours.)
  • Fix the criticisms, THEN and ONLY THEN worry about the rebuttal.
  • The comment expressed by reviewer 1 is incorrect because...
  • We have addressed....
  • Don't delay!


Monday, December 7, 2009

Talk: Peter Campbell - A lung Cancer genome: Complex signatures of tobacco exposure

I had the pleasure of accompanying my supervisor and a few other people from the BC Genome Sciences Centre down to St. Louis last week for a "joint lab meeting" and symposium on Cancer Genomics. The symposium had a fantastic line up of the "who's who" of 2nd generation sequencing (next-gen) cancer genomics, and was open to the public, so I thought I'd take the time to blog some of my notes. (I have a couple servers chugging away on stuff, so I've got a few minutes...) As always, this is drawn from my notes, so if there are mistakes, they are undoubtedly my fault either in interpretation, or in reading my own messy handwriting.

First talk: Peter Campbell.
Title: A lung Cancer Genome: complex signatures of tobacco exposure.

  • Tobacco usage (cigarettes/day) is dropping in USA, and lung cancer deaths are starting to fall. About a 20 year lag.
  • However, usage in China, Indonesia, etc, is rising dramatically, and thus, tobacco cancer research is still important.
  • Tobacco causes DNA adducts to be formed, including benzo[a]pyrene adducts, which distort the residues on either side and also cause backbone distortions. This leads to DNA copying errors.
  • Used NCI-H209 cell line, and generated 30-40 fold coverage using ABI SOLiD machine.
  • observed 22,910 somatic mutations, 334 CNVs, 65 indels, 58 genomic rearrangements.

(My notes are a little sketchy at this point) They were able to show a sensitivity of 76%, however, they sacrificed some sensitivity for specificity, and were able to show 97% true positives in coding regions, 94% true positives in non-coding regions.

The breakdown of the roughly 23,000 SNPs appears to be: 70.41% intergenic, 28.21% intronic, 0.79% non-coding, 0.58% coding. There is a Kn/Ks ratio of 2.6:1. These are fairly standard.
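To put those percentages in perspective, here is a quick back-of-the-envelope conversion into approximate counts. It assumes the percentages apply to the 22,910 somatic mutations listed above (my notes say "23,000", so that is my assumption), and the function name is just something I made up for illustration.

```python
# Total somatic substitutions, as recorded in the bullet list above.
TOTAL_MUTATIONS = 22910

# Percentages exactly as they appear in my notes.
PERCENTAGES = {
    "intergenic": 70.41,
    "intronic": 28.21,
    "non-coding": 0.79,
    "coding": 0.58,
}


def approximate_counts(total, percentages):
    """Convert category percentages into rounded mutation counts."""
    return {name: round(total * pct / 100) for name, pct in percentages.items()}


counts = approximate_counts(TOTAL_MUTATIONS, PERCENTAGES)
for name, count in counts.items():
    print(f"{name:>11}: ~{count}")
```

The point is simply that well under 1% of the changes, on the order of a hundred or so, land in coding sequence.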

They then characterized it by nucleotide change, and showed a significant number of A->G and T->C changes, which is expected for smoking patients. They also compared the spectrum of changes seen in the literature for TP53, using the IARC database.

Another interesting point is that G mutations enrich at CpG islands, and there is some evidence that mutations are preferentially targeting methylated CpGs. However, there are fewer adenine mutations at GpA locations.

Looking at this data suggested that there are repair mechanisms at work, e.g. transcription-coupled repair. This can cause a difference in mutation rates on transcribed vs non-transcribed DNA. In fact, this is known for G->T mutations; however, a change in A->G mutations is also seen, and is lower as expression increases. Also interesting: G-> mutations decrease as transcription increases on both strands, although the mechanism behind that observation is not yet known. This seems to hold for other changes as well.

On the subject of genomic rearrangements, a large complex array of rearrangements was observed, however it was not mainly classical inversions as was expected. Instead, insertions in the midst of translocations were observed. What they did see matched FISH digital karyotyping, so that seems to be working.

Finally, some discussion was included on a CHD7 fusion in Small cell lung carcinomas, however, my notes end at this point without further discussion.

Overall, I really enjoyed this talk - as you'd expect, it was well done and covered a wide range of techniques that are available with 2nd gen sequencing. I wasn't aware of the range of nucleotide changes, or the specificity of tobacco-related carcinogens for specific nucleotides - nor even the repair mechanisms, so I got quite a bit out of it. My notes are relatively sparse, but just the few points I've recorded were of great interest to me.

I'll post the remaining talks as I have time over the next few days.


Friday, July 17, 2009


This week has been a tremendous confluence of concepts and ideas around community. Not that I'd expect anyone else to notice, but it really kept building towards a common theme.

The first was just a community of co-workers. Last week, my lab went out to celebrate a lab-mate's successful defense of her thesis (Congrats, Dr. Sleumer!). During the second round of drinks (Undrinkable dirty martinis), several of us had a half hour conversation on the best way to desalinate an over-salty martini. As weird as it sounds, it was an interesting and fun conversation, which I just can't imagine having with too many people. (By the way, I think Obi's suggestion wins: distillation.) This is not a group of people you want to take for granted!

The second community related event was an invitation to move my blog over to a larger community of bloggers. While I've temporarily declined, it raised the question of what kind of community I have while I keep my blog on my own server. In some ways, it leaves me isolated, although it does provide a "distinct" source of information, easily distinguishable from other people's blogs. (One of the reasons for not moving to the larger community is the lack of distinguishing marks - I don't want to sink into a "borg" experience with other bloggers and just become assimilated entirely.) Is it worth moving over to reduce the isolation and become part of a bigger community, even if it means losing some of my identity?

The third event was a talk I gave this morning. I spent a lot of time trying to put together a coherent presentation - and ended up talking about my experiences without discussing the actual focus of my research. Instead, it was on the topic of "successes and failures in developing an open source community" as applied to the Vancouver Short Read Analysis Package. Yes, I'm happy there is a (small) community around it, but there is definitely room for improvement.

Anyhow, at the risk of babbling on too much, what I really wanted to say is that communities are all around us, and we have to seriously consider our impact on them, and the impact they have on us - not to mention how we integrate into them, both in our work and outside. If you can't maximize your ability to motivate them (or their ability to motivate you), then you're at a serious disadvantage. How we balance all of that is an open question, and one I'm still working hard at answering.

I've attached my presentation from this morning, just in case anyone is interested. (I've decorated it with pictures from the South Pacific, in case all of the plain text is too boring to keep you awake.)

Here it is (it's about 7Mb.)


Friday, May 15, 2009

UBC Seminar - Dr. Dawn Bowdish, McMaster University

[This talk was given by a good friend, Dr. Dawn Bowdish - and is WAY outside of the topics that I normally cover. However, there is some interesting work with SNPs which is worth mentioning, if you can get past the non-genomic part at the start - which I suggest. As always, mistakes are my misunderstanding of the topic - not the speaker's!]

Talk title: The class A scavenger receptors are associated with host defense towards Mycobacterium tuberculosis.

Lab URL:

Post-doc was done at Oxford, where most of the work that will be presented today was done.
1. The role of the scavenger receptors in Mycobacterium tuberculosis infection
2. Polymorphisms in scavenger receptors and susceptibility to M. tuberculosis infection
3. The role of the cytoplasmic tail in scavenger receptor signalling
4. Evolution of scavenger receptor domains.

Macrophages are beautiful cells. They don't have a single form – you know it when you see it. Paraphrased: 'phooi to t-cells.'

[at this point, the projector died. Dawn offers to tell macrophage stories... Someone wants to know all about Oxford. “It was very Harry Potter.” AV people mess around in the back....]

Macrophages are central to everything she studies. They are an integral part of mammalian biology:
  • Embryonic development, organ structure
  • chronic disease
  • genetic diseases
  • infectious disease
  • autoimmunity
  • cancer
Macrophage receptors are indicators of phenotype and function, and biomarkers for disease phenotype.

Scavenger receptors: several classes of them exist. The only conserved feature is that they bind modified lipids (acLDL) with varying efficiency.

Class A scavengers: includes 2 that Dawn studies specifically: MARCO and SRA (I and II). Found in all organisms from plants to humans, yeast.. etc. They are involved in cell-cell interactions, and have been adapted to many other cell-interactions.

MARCO (Macrophage receptor with collagenous structure) and SRA (scavenger receptor class A) have similar ligands, which are very broad. “Molecular fly paper.” (In general, expression is restricted to macrophages.)

They bind some bacteria well, but not all.

SRA plays a role in homeostasis and infectious disease, septic shock.
Marco plays a role in infectious disease. (Redundancy in vitro – requires double knock out.)

The binding domains, however are very different. In Marco, binding is at the end of the receptor. In SRA, it's the 2nd last.

MARCO is not expressed in any cell line, and is not in bone marrow macrophage. Thus, it's often overlooked.

Three types of activation: Classical, alternative, innate (hypothesized). Marco seems to be innate activation, and the properties/phenotype are not well understood. Possibly a phenotype for immuno-modulation, but when it's used is not known. Fills a niche, which doesn't quite fit with the known models in mouse.

So, how does it work in TB? (Not something Dr. Bowdish intended to study, but ended up falling into it in Oxford.)

There are many types of uptake – and many new ones have been discovered. There's still room for more receptors, however, and it's possible that the scavenger receptors might be involved in TB.

SRA is up-regulated in response to IFN-gamma and BCG; knockouts are susceptible to BCG-induced shock. But MARCO? No clear connection. There is still no human anti-MARCO antibody, so these experiments can't be repeated for human cells.

Collaboration with Dr. Russell and Sakamoto from Cornell, and ended up getting involved. They had a ligand (trehalose dimycolate, TDM) that no one had ever found a receptor for – and that turned out to be MARCO. Using TDM coated beads, you could see if it was picked up.

Use a cell line with MARCO receptor – and the beads. MARCO showed that it picked up the beads, SRA did not pick up beads. Could knock it down with a specific inhibitor for MARCO. (shown with fluorescence microscopy.)

Previous work had shown that TDM induces cytokine production in a MyD88 dependent fashion. There was a TLR2 & 4 response – so did a knock out, and showed that it could use either of them.

Minimum signal complex required is MARCO + TLR (2 or 4). This recreates the pro-inflammatory response. Could never recreate this with SRA.

Is MARCO the missing factor in TDM signalling? Yes. So, it's not that they've lost the pathway or ability – just lacking the particular co-receptor to interact with TDM.

How MARCO works in cytoplasm, however, is another story – it has a very small cytoplasmic tail... which includes a predicted myristoylation site. Made constructs with different parts of the tail – which didn't change the signalling much. The model proposed, however, is that MARCO is a tethering receptor, which binds or transports the TDM beads to TLRs via CD14. (Similar to the LPS signalling complex.) This was tested with an NF-kB reporter system.

More experiments were done using the knockouts without MARCO or DKO, and were able to continue along to find that MARCO appears to be involved in response to M. tuberculosis.

Up till now, this was in vitro and mouse. A switch was made to human models.

Started looking for groups looking at SNPs in humans. Did a study interested in whether these SNPs are related to human disease. (Adrian Hill?)

It works well because TB has been around for a long time – 40,000 years.

The Hill group has samples from Gambia, to study TB. Screened 3,500 individuals (HIV free), did controls for the usual (age, sex, etc), and then screened 25 SNPs in MARCO and 22 in MSR1.

[Presents a fancy map, showing coverage.]

Much to their surprise, there were no SNPs whatsoever in SRA, but they found 4 in MARCO with association to susceptibility and resistance. However, they were all in introns, and it turned out that one was in a putative splice site. (There were no splice variants known in mice at the time – and there are still none known.) Using assays, Dr. Bowdish found there were indeed splice variants, caused by the SNP.

Oddly enough, this splice variant seems to knock out the binding domain of MARCO. (And the SNP seems to be predominant in African populations - and is very uncommon in Caucasians.)

Tentative model: TDM induces MARCO expression. MARCO is regulated at transcriptional and post-translational modification levels. Thus, splice variants may induce differences in response to TB bacteria.

Goals for the future:
  • Understand role of macrophage receptors in infectious disease
  • Attribute functional significance of genetic variability in macrophage genes
  • Characterize phenotype of innate activation & determine if this can be manipulated by immunomodulation
  • Collaborating with people studying other receptors.
Open day on October 26th, 2009 : Institute of infectious disease research opening day.


Friday, March 13, 2009

Dr. Michael Hallett, McGill University - Towards a systems approach to understanding the tumour microenvironment in breast cancer

Most of this talk is from 2-3 years ago. Breast cancer is now more deadly for women than lung cancer. Lifetime risk for women is 1 in 9. Two most significant risk factors: being a woman, aging.

Treatment protocols include surgery, irradiation, hormonal therapy, chemotherapy, directed antibody therapy. Several clinical and molecular markers are now available to decide the treatment course. These also predict recurrence/survival well... but...

Many caveats: only 50% of Her2+ tumours respond to trastuzumab (Herceptin). No regime for (Her2-, ER-, PR-) “triple negative” patients other than chemo/radiation. Many ER+ patients do not benefit from tamoxifen. 25% of lymph node negative patients (a less aggressive cancer) will develop micrometastatic disease and possibly recurrence (an example of under-treatment.) - Many other examples of under-treatment.

Microarray data caused a whole new perspective on breast cancer treatment. Created a taxonomy of breast cancer – Breast cancer is at least 5 different diseases. (Luminal Subtype A, Subtype B, ERBB2+, Basal Subtype, Normal Breast-like. Left to right, better prognosis to worst prognosis.)

[background into cellular origin of each type of cell. Classification, too.]

There are now gene expression biomarker panels for breast cancer. Most of them do very well in clinical trials. Point made that we almost never find biomarkers that are single gene. Most of the time you need to look at many many genes to figure out what's going on. (“Good sign for bioinformatics”)

Microenvironment: The samples run on arrays, as above, include the microenvironment. We end up looking at an average over the tumour. (Contribution of microenvironment is lost.) Epithelial gene expression signature “swamping out” signatures from other cell types. However, tumour cells interact successfully with their surrounding tissues.

Most therapies target epithelial cells. Genetic instability in epi cells leads to therapeutic resistance. Stromal cells (endothelial cells in particular) are genetically stable (eg, non-cancer.)

Therefore, if you target the stable microenvironment cells, they won't become resistant.

Method: using invasive tumours, patient selection, laser capture microdissection (LCM), RNA isolation and amplification (Two rounds) -> microarray.

BIAS: Bioinformatics Integrative Application Software. (Tool they've built)

LCM + Linear T7 amplification leads to 3' bias. Nearly 48% of probes are “bad”. Very hard to pick out the quality data.

Looking at just the tumour epithelial profiles (tumours themselves), confirmed that subtypes cluster as before. (Not new data. The breast cancer profiles we already have are basically epithelial driven.) When you look just at the stroma (the microenvironment), you find 6 different categories, and each one of them has distinct traits, which are not the same. There is almost no agreement between endothelial and epithelial cell categorization.. they are orthogonal.

Use both of these categorizations to predict even more accurate outcomes. Stroma are better at predicting outcome than the tumour type itself.

Found a “bad outcome cluster”, and then investigated each of the 163 genes that were differentially expressed between cluster and rest. Can use it to create a predictor. The subtypes are more difficult to work with, and become confounding effects. Used genes ordered by p-value from logistic regression. Applied to a simple naive Bayes classifier and cross validation using subsets. Identified 26 (of 163) as optimal classifier set.
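For anyone unfamiliar with the recipe, here is a minimal sketch of the general idea: take genes that are already ranked, keep the top few, and score a naive Bayes classifier by cross validation. Everything specific here is my own assumption and not from the talk: the Gaussian form of the naive Bayes model, leave-one-out as the cross validation scheme, the function names, and the tiny made-up expression table.

```python
import math


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior plus per-gene mean and variance for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6  # small floor avoids zero variance
            for col, m in zip(zip(*members), means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


def loo_accuracy(rows, labels, n_genes):
    """Leave-one-out accuracy using only the first n_genes (assumed pre-ranked) columns."""
    correct = 0
    for i in range(len(rows)):
        train_rows = [r[:n_genes] for j, r in enumerate(rows) if j != i]
        train_labels = [l for j, l in enumerate(labels) if j != i]
        model = fit_gaussian_nb(train_rows, train_labels)
        correct += predict(model, rows[i][:n_genes]) == labels[i]
    return correct / len(rows)


# Hypothetical toy data: 6 samples x 3 genes, columns already ordered by p-value.
rows = [
    [5.0, 1.0, 3.0], [5.2, 1.2, 0.5], [4.8, 0.9, 2.0],
    [1.0, 1.1, 2.9], [1.2, 0.8, 0.4], [0.9, 1.0, 2.1],
]
labels = ["poor", "poor", "poor", "good", "good", "good"]

for k in (1, 2, 3):
    print(k, "genes:", loo_accuracy(rows, labels, k))
```

Picking the subset size with the best cross-validated accuracy is, presumably, how you would get from 163 candidate genes down to something like 26.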

“If you can't explain it to a clinician, it won't work.”

Stroma classifier is stroma specific. It didn't work on epithelial cells. But it performs as well as or better than other predictors. (New, valuable information that wasn't previously available.)

Cross validation of stromal targets against other data sets: worked on 8 datasets which were on bulk tumour. It was surprising that it worked that way, even though bulk tumour is usually just bulk tumour. You can also replicate this with blood vessels from a tumour.

Returning to biology, you find the genes represent: angiogenesis, hypoxic areas, immunosuppression.

[Skipping a few slides that say “on the verge of submission.”] Point: Linear Orderings are more informative than clustering! Things are not binary – it's a real continuum with transitions between classic clusters. (Crosstalk between activated pathways?)

In a survey (2007, Breast Cancer Research 9-R61?), almost all of the things that breast cancer clinicians would like research done on are bioinformatics driven: classification, organization, etc.

  • define all relevant breast cancer signatures
  • analysis of signatures
  • focus on transcriptional signatures
  • improve quality of signatures
  • aims for better statistics/computation with signatures.

There are too many papers coming out with new signatures. Understanding breast cancer data in the literature involves a lot of grouping and teasing out information – and avoiding noise. Signatures are heavily dependent on tissue type, etc etc.

Traditional pathway analysis: Always need experiment and control and require rankings. If that's just two patients, that's fine, if it's a broad panel of patients, you won't know what's going on- you're now in an unsupervised setting.

There are more than 8000 patients who have had array data collected. Even outcome is difficult to interpret.

Instead, using “BreSAT” to do linear ranking instead of clustering, and try to tease out signatures.

There is an activity of a signature – clinicians have always been ordering patients, so that's what they want.

What is the optimal ordering that matches with the ordering....[sorry missed that.] Many more trends show up when you do this than with hierarchical clustering. (Wnt, Hypoxia) You can even order two things (eg. BRCA and Interferon), and you can see tremendously strong signals. Start to see dependencies between signatures.

Working on several major technologies (ChIP-chip, microarray, smallRNA) and more precise view of microenvironment.
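As a rough illustration of "ordering instead of clustering", here is a small sketch that ranks patients along one signature. I don't know how BreSAT actually scores activity, so the scoring rule (mean of the "up" genes minus mean of the "down" genes), the function names, and the toy expression values are all my own assumptions.

```python
def signature_activity(expression, up_genes, down_genes=()):
    """Score one patient: mean of up-regulated genes minus mean of down-regulated genes."""
    up = sum(expression[g] for g in up_genes) / len(up_genes)
    down = sum(expression[g] for g in down_genes) / len(down_genes) if down_genes else 0.0
    return up - down


def order_patients(profiles, up_genes, down_genes=()):
    """Return patient ids sorted from lowest to highest signature activity."""
    return sorted(
        profiles,
        key=lambda pid: signature_activity(profiles[pid], up_genes, down_genes),
    )


# Hypothetical expression values for three patients and three genes.
profiles = {
    "P1": {"g1": 2.0, "g2": 1.0, "g3": 0.5},
    "P2": {"g1": 0.1, "g2": 0.2, "g3": 1.5},
    "P3": {"g1": 1.0, "g2": 1.0, "g3": 1.0},
}

ordering = order_patients(profiles, up_genes=["g1", "g2"], down_genes=["g3"])
print(ordering)
```

With a continuous ordering like this, a second signature can be laid alongside the first to look for the kind of dependencies mentioned above, rather than forcing each patient into one cluster.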


Anamaria Crisan and Jing Xiang, UBC – Comparison of Hidden Markov Models and Sparse Bayesian Learning for Detection of Copy Number Alterations.

Point was to implement a C-algorithm in Matlab. (Pique-Regi et al, 2008). Uses sparse Bayesian Learning (SBL) and Backward Elimination. (Used microarray data for this experiment.)

Identifying gains, loss or neutral. (in this case, they looked at specific genes, rather than regions.) [Probably because they were using array data, not 2nd gen sequencing.]

Novelty of algorithm: piece-wise constant (pwc) representation of breakpoints.

Assume normal distribution of weights, formulate as an a posteriori estimate, and apply SBL. Hierarchical prior of the weights and hyperparameters....

[some stats in here] Last step is to optimize using (expectation maximization) EM algorithm.

Done in Matlab “because you can do fancy tricks with the code”, and it's easily readable. It's fast, and diagonals from matrices can be calculated quickly and easily.

Seems to take 30 seconds per chromosome.

Have to filter out noise, which may indicate false breakpoints. So, backwards elimination algorithm – measures significance of each copy number variation found, and removes insignificant points. [AH! This algorithm is very similar to sub-peak optimization in FindPeaks... Basically you drop out the points until you find and remove all points below threshold.]
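To make the drop-out idea concrete, here is a minimal sketch of backward elimination on a piece-wise constant signal. It is not the algorithm from the paper: I'm assuming a very simple significance score (the jump between the mean levels on either side of a breakpoint), and the function names, threshold and toy signal are mine.

```python
def segment_means(signal, breakpoints):
    """Mean level of each piece-wise constant segment defined by the breakpoints."""
    edges = [0] + sorted(breakpoints) + [len(signal)]
    return [sum(signal[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def backward_elimination(signal, breakpoints, threshold):
    """Repeatedly drop the weakest breakpoint until all remaining jumps reach threshold.

    Breakpoints are indices strictly between 0 and len(signal), with no duplicates.
    """
    kept = sorted(breakpoints)
    while kept:
        means = segment_means(signal, kept)
        # Score of breakpoint i: size of the jump between the two segments it separates.
        scores = [abs(means[i + 1] - means[i]) for i in range(len(kept))]
        weakest = min(range(len(kept)), key=scores.__getitem__)
        if scores[weakest] >= threshold:
            break
        del kept[weakest]  # segment means are recomputed on the next pass
    return kept


# Hypothetical log-ratio signal: neutral, a gain of about 1.0, then neutral again.
signal = [0.0, 0.1, 0.0, -0.1, 0.0,
          1.0, 1.1, 0.9, 1.0, 1.0,
          0.0, 0.1, -0.1, 0.0, 0.0]
candidates = [3, 5, 8, 10]  # 3 and 8 are spurious

print(backward_elimination(signal, candidates, threshold=0.5))
```

Removing one point at a time and re-scoring matters, because dropping a false breakpoint changes the mean of the merged segment and therefore the score of its neighbours.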

It's slower, but more readable than C.

Use CNAHMMer by Sohrab Shah (2006). HMM with Gaussian mixture model to assign CNA type (L,G,N). On the same data set, results were not comparable.

SBL not much faster than CNAHMMer. (Did not always follow vectorized code, however, so some improvements are possible.)

Now planning to move this to Next-Gen sequencing.

Heh.. they were working from template code with Spanish comments! Yikes!

[My comments: this is pretty cool! What else do I need to say. Spanish comments sound evil, though... geez. Ok, so I should say that all their slagging on C probably isn't that warranted.... but hey, to each their own. ]


Aria Shahingohar, UWO – Parameter estimation of Bergman's minimal model of insulin sensitivity using Genetic Algorithm.

Abnormal insulin production can lead to serious problems. Goal is to enhance the estimation of insulin sensitivity. Glucose is injected into blood at time zero, insulin is injected shortly after. Bergman has a model that describes the curves produced in this experiment.

Equations given for:
Change in plasma glucose over time = ......
Rate of insulin removal....

There are 8 parameters in this model which vary from person to person. The model is a closed loop system, and requires the partitioning of the subsystems [?] Requires good signal to noise ratio.

Use a genetic algorithm to optimize the 8 parameters.

Tested different methods: Genetic algorithms and Simplex method. Also tested various methods of optimization using subsets of information.

Used a maximum of 1000 generations in Genetic Algorithm. Population size 20-40, depending on expt. Each method tested 50 times (stochastic) to measure error for each parameter separately.
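For readers who haven't seen a genetic algorithm used for parameter estimation, here is a minimal sketch. It does not implement Bergman's model (I didn't capture the equations): it fits a made-up two-parameter exponential decay instead, and the selection, crossover and mutation choices, the function names, and the settings (population of 30, 200 generations) are all my assumptions.

```python
import math
import random


def model(params, t):
    """Toy stand-in model: y = a * exp(-b * t)."""
    a, b = params
    return a * math.exp(-b * t)


def sse(params, times, observed):
    """Sum of squared errors between the model curve and the observations."""
    return sum((model(params, t) - y) ** 2 for t, y in zip(times, observed))


def genetic_fit(times, observed, bounds, pop_size=30, generations=200, seed=0):
    """Return (best parameters, best error recorded at each generation)."""
    rng = random.Random(seed)
    error = lambda p: sse(p, times, observed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        next_gen = [population[0][:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, done twice.
            mum = min(rng.sample(population, 3), key=error)
            dad = min(rng.sample(population, 3), key=error)
            child = []
            for (lo, hi), x, y in zip(bounds, mum, dad):
                w = rng.random()
                gene = w * x + (1 - w) * y  # blend crossover
                if rng.random() < 0.2:      # occasional Gaussian mutation
                    gene += rng.gauss(0, 0.1 * (hi - lo))
                child.append(max(lo, min(hi, gene)))
            next_gen.append(child)
        population = next_gen
    best = min(population, key=error)
    history.append(error(best))
    return best, history


# Hypothetical noise-free data generated from a = 3.0, b = 0.5.
times = list(range(10))
observed = [model([3.0, 0.5], t) for t in times]
bounds = [(0.0, 5.0), (0.0, 2.0)]

best, history = genetic_fit(times, observed, bounds)
print("best parameters:", best, "error:", history[-1])
```

Because the run is stochastic, repeating it with different seeds (as in the 50 repeats per method described above) is what gives you an error estimate per parameter.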

Results: GA was always better, and partitioning subsystem works better than trying to estimate all parameters at once.

Conclusion: Genetic algorithm significantly lowers error, and parameters can be estimated with only glucose and insulin measurements.

[My Comments: This was an interesting project which clearly has real world impacts, although much of it wasn't particularly well explained, leaving the audience to pick out the meaning. Very nice presentation, and cool concept. It would be nice to see more information on other algorithms.... ]

An audience member has asked about saturation. That's another interesting topic that wasn't covered.


Harmonie Eleveld and Emilie Lalonde, Queen's University – A computational approach for the discovery of Thi1 and Thi5 regulated (Thiamine repressible)

[Interesting – two presenters! This is their undergraduate project]

Bioinformatics looking for genes activated by thiamine, using transcription factor binding motifs. [Some biological background] Thi1 and Thi5 binding sites are being detected.

Thiamine uptake causes repression of Thi1 and Thi5.

Used upstream sequences from genes of interest. Used motif detection tools to generate a dataset of potential sites.

Looking at zinc finger TFs, so bipartite, palindromic sites. Used BioProspector, from Stanford. It did what they wanted the best.

Implemented a pattern recognition network (feed forward), using training sets from BioProspector + negative (random) controls. Did lots of gene sets, many trials and tested many different parameters.

Used 3 different gene sets (nmt1 and nmt2 gene sets from different species), (gene set from S. pombe only, 6 genes), (all gene sets all species)

Preliminary results: used length of 21, train on S. pombe and S. japonicus, test on S. octosporus.
Results seem very good for first attempt. Evaluation with “confusion matrix” seems very good. (Accuracy appears to be in the range of 86-95%)
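
[For anyone who hasn't met a confusion matrix, here's a small sketch of how the accuracy figure falls out of one. The labels below are made up for illustration and are not the presenters' data.]

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = site, 0 = not)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Fraction of all calls that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical test set: 4 real binding sites, 6 random negative controls.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_matrix(actual, predicted)
print(counts, accuracy(*counts))
```

Accuracy alone can flatter a classifier when negatives heavily outnumber positives, so it's worth looking at the four counts as well.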

Final testing with the neural network: Significant findings will be verified biologically, and knockout strains may be tested with microarrays.

Denny Chen Dai, SFU – An Incremental Redundancy Estimation for the Sequence Neighbourhood Boundary

Background: RNA primary and secondary structure. Working on the RNA design problem (Inverse RNA folding.) [Ah, the memories...]

Divide into sequence space and structure space. Structure space is smaller than sequence space. (Many to one relationship.)

Biology application: how does sequence mutation change the structure space?

Neighbourhood Ball : Sequences that are closely related, but fold differently. As you get closer to the edge of the ball, you find... [something?]

  • Sample n sequences with unique mapping structure
  • for each sample: search neutral sequence within inner layers, redundancy hit?
  • Compute redundancy rate p.
  • Redundancy rate distribution over Hamming layers. P will approach 1. (all structures are redundant.)
The question is at what point do you saturate? Where do you find this boundary? Somewhere around 50% of sequence space. [I think??]
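
[To make the "Hamming layers" idea a bit more concrete, here's a sketch of the bookkeeping only. It doesn't do any RNA folding, and the function names are mine, not the speaker's. A layer at distance d from a reference sequence holds every sequence that differs from it at exactly d positions.]

```python
from math import comb

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def layer_size(length, d, alphabet=4):
    """How many sequences sit at exactly Hamming distance d from a reference."""
    return comb(length, d) * (alphabet - 1) ** d

def redundancy_rate(hits, samples):
    """Fraction of sampled sequences whose structure was already seen in inner layers."""
    return hits / samples

print(hamming_distance("GGAUCC", "GGUUCA"))
print([layer_size(6, d) for d in range(7)])
```

The layer sizes grow very quickly with d, which is presumably why sampling is needed rather than enumerating every layer.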

  • An efficient estimation boundary – confirmed the existence of the neighbourhood ball
  • ball radius is much smaller than the sequence length.
Where is this useful?
  • Reduce computational effort for RNA design
  • naturally occurring RNA molecules, faster redundancy growth rate suggests mutational robustness.
[My Comment: I really don't see where this is coming from.  Seems to be kind of silly, doesn't reference any of the other work in the field that I'm aware of.  (Some of the audience questions seem to agree.)  Overall, I just don't see what he's trying to do - I'm not even sure I agree with his results.  I'll have to check out his poster later to see if I can make sense of it.  Sorry for the poor notes.  ]

Connor Douglas, UBC – How User Configuration in Bioinformatics Can Facilitate “Translational Science” - A Social Science Perspective

Background is in sociology of science – currently based in centre for applied ethics.

What is civic translational science? Why is it important?

Studying pathogenomics of innate immunity in a large project, including Hancock lab, Brinkman lab, etc. GE(3)LS: Genomics, Ethics, Economics, Environment, Legal and Social issues. What are the ramifications of the knowledge? Trying to hold a mirror up to scientific practices.

Basically, studying bioinformaticians from a social science perspective!

[talking a lot about what he won't talk a lot about.... (-: ]

“Pathogenomics of Innate Immunity” (PI2). This project was required to have a GE(3)LS component, and that is what his research is.

What role does user configuration play in fostering civic translational science? What is it?

It is “iterative movements between the bench to markets to bedside”. Moving knowledge out from a lab into the wider research community.

Studying the development of the “InnateDB” tool being developed. It's open access, open source, database & suite of tools. Not just for in-house use.

Looking at what forces help move tools out into the wider community:
  • Increased “Verstehen” within the research team. (Taking into account the needs of the wider community – understanding what the user wants.)
  • limited release strategies – the more disseminating the better
  • peer-review publication process: review not just the argument but the tool as well.
  • A continued blurring of divisions between producers and users.
And out of time....

Medical Imaging and Computer-Assisted Interventions - Dr Terry Peters, Robarts Institute, London Ontario

This talk was given as the keynote at the 2009 CSCBC (Fourth Canadian Student Conference on Biomedical Computing).

In the beginning, there were X-rays. They were the mainstay of medical imaging until the 70s. Ultrasound started in the 50s, but it didn't take off for a while. MRI appeared in the 80s. Tomography in 1973.

Of course, all of this required computers. [A bit of the history of computing.]

Computed Tomography. The fundamentals go back to 1917 - “The Radon Transform”, which is the mathematical underpinning of CT.

Ronald Bracewell made contributions in 1956: as a radio astronomer, he used this to reconstruct radio sources. He recognized the Fourier transform relation between signals and reconstruction. He developed math very similar to what's used for CT reconstruction... and he was working on a calculator (3 instructions/min)!

Sir Godfrey Hounsfield, Nobel prize winner in 1979. He was an engineer for EMI (the music producer!) Surprisingly, it was the profit from the Beatles' albums that funded this research.

Dr. Peters himself began working on CT in the late 1960s. “Figure out a way of measuring bone-density in the forearm using ultrasound....” (in the lab of Richard Bates, 1929-1990). That approach was a total disaster, so he turned to X-ray. Everything in Dr. Bates' lab started with Fourier transforms, so his research interests gave him a natural connection with Bracewell at Stanford... The same math that Bracewell was working on made the jump to CT.

The first “object” they imaged was sheep bones – in New Zealand – what else??

The first reconstruction required 20 radiographs, a densitometer scan, a manual digitization, and 20 minutes on an IBM 360. “Pretty pictures but they will never replace radiographs” - NZ Radiologist 1972.

In the following months, Hounsfield reported on the invention of the EMI scanner – scooping Dr. Peters' PhD project. However, there were still lots of things to work on. “If you find you're scooped, don't give up, there are plenty of problems to be solved...” “Don't be afraid to look outside your field.”

How does CT work? The Central Slice Theorem. Take an X-ray projection and Fourier transform it; instead of inverting the matrix, you can do the whole thing in Fourier transform space.

Filtered Back Projection: FT -> | rho | -> Inv FT.
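
[A minimal sketch of just the filtering step in that chain (FT -> | rho | -> Inv FT), applied to a single 1D projection. This is my own illustration, not anything shown in the talk: it uses a slow textbook DFT so that it needs no libraries, and it leaves out the back-projection step that smears each filtered projection across the image.]

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(values[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
               for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def ramp_filter(projection):
    """Multiply the projection's spectrum by |rho| and transform back."""
    n = len(projection)
    spectrum = dft(projection)
    # |rho|: absolute frequency of each DFT bin, in cycles per sample.
    abs_freq = [min(m, n - m) / n for m in range(n)]
    filtered = [s * f for s, f in zip(spectrum, abs_freq)]
    return [v.real for v in dft(filtered, inverse=True)]

flat = ramp_filter([2.0] * 8)                  # a featureless projection
spike = ramp_filter([0, 0, 0, 1, 0, 0, 0, 0])  # a single bright ray
print(flat)
print(spike)
```

The ramp removes the constant (zero-frequency) part entirely and gives a spike negative side lobes. Those side lobes are what cancel the blur that plain, unfiltered back projection would leave behind.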

This all leads to the clinical acceptance of CT. Shows us the first CT image ever. His radiology colleagues were less than enthusiastic. However, Dr. James Ambrose, in London, saw the benefits of the EMI scanner. Of course, EMI thought there would only ever be a need for 6 CT machines.

First CT was just for the head. It took about 80 seconds of scanning, and about the same to recreate the image.

His first job was to “build a CT scanner”, with a budget of $20,000, in 1975-78.

In 1974: 80x80 images, 3mm pixels, 13mm thick slices.
In 2009: 1024x1024 images, pixels less than 0.5mm, slices less than 1mm thick.

What good is CT scanning? Good for scanning density. Great for bones, good for high contrast, not so good in the brain (poor contrast between white and grey matter). High spatial resolution.
Tradeoff: high cost in radiation dose to the patient.
Use for image-guidance for modeling and for pre-operative patients. Not used during surgery, however.

CT Angiography is one example of the power of the technique. You can use contrast dyes, and then collect images to observe many details, and reconstruct vessels. You can even look for occlusions in the blood vessels of the heart.

Where is this going? Now working on robotically assisted CABG. Stereo visualization systems.

Currently working to optimize the robot tools + CT combination. Aims: avoid improper thoracic port placement, and optimize patient selection.

Pre-operative imaging can be used to measure distances and optimize locations of cuts. This allows the doctor to work without opening the rib cage. They can now use a laser to locate and identify where the cuts should be made, in a computer controlled manner.


MRI has roots in physics and chemistry labs. NMR imaging is built on mathematical foundations similar to CT. Some “nifty tricks” can be used to make images from it. The “N” was dropped because nuclear wasn't politically correct.

In 1975, Paul Lauterbur presented “Zeugmatography”. Magnets, water, tubes... confusing everyone! Seemed very far away from CT scanning. Most people thought he was WAY out there. He ended up sharing a Nobel Prize.

Sir Peter Mansfield in 1980 produced an MRI of a human using this method – although it didn't look much better than the first CT.

[Explanation of how NMR works – and how Fourier transforms and gradients are applied.]

MRI combines more scientific disciplines than anything else he can think of.

We are now at 35 years of MRI. Originally said that MRI would never catch on. We now generate high resolution 7 Tesla images. [Very impressive pictures]

Discussion of Quenching of the magnets... yes, boiling off the liquid helium is bad. Showing image of how a modern MRI works.

What good is MRI? Well, the best signals come from water (protons), looking at T1 and T2 relaxation times. Have good soft tissue contrast – including white and grey matter brain cells. High spatial resolution, high temporal resolution. No radiation dose, great use for image-guidance.
(As far as we can tell, the human body does not react negatively to the magnetic fields we generate.)

Can also be used for intra-operative techniques, however everything used must be non-magnetic. Several neat MRI scanners exist for this purpose, including robots that can do MRI using just the “fringe fields” from a nearby MRI machine.

Can be used for:
  • MRA - Angiography (vascular system), 
  • MRS – Spectroscopy (images of brain and muscle metabolism)
  • fMRI – Functional magnetic resonance imaging (image of brain function)
  • PW MRI – Perfusion-Weighted imaging. (Blood flow in ischemia and stroke)
  • DW MRI – Diffusion-Weighted imaging (water flow along nerve pathways – images of nerve bundles).

fMRI: Looks at regions that demand more oxygen. Can differentiate 1% changes, and then can correlate signal intensity with some task (recognition, or functional). Can be used to avoid critical areas during surgery.

Diffusion Tensor: looks at the diffusion of water, resulting in technique of “Tractography”, which can be used to identify all of the nerve pathways, which can then be avoided during surgery.

There are applications for helping to avoid the symptoms of Parkinson's. Mapped hundreds of patients to find best location, and now can use this information to tell them exactly where to place the electrodes in new patients.

[Showing an image in which they use X windows for their computer imaging – go *nix.]

Two minutes of Ultrasound: [How it works.] Typical sonar, and then reconstruct. “Reflections along line of sight.” Now, each ultrasound uses various focal lengths, several transducers, etc, etc. All done electronically now.

The beam has an interesting shape – not conical, as I had always thought.

Original Ultrasound used an oscilloscope with long persistence, and they'd use a Polaroid camera to take pictures of it. The ultrasound head used joints to know where it was to graph points on the oscilloscope. (Long before computers were available.)

Advantage: Images interfaces between tissues, inexpensive, portable, realtime 2D/3D. Can be used to measure changes in reflected frequency, and so blood flow direction and speed. Can be used for image-guidance – can be much more useful when combined with MRI, etc.
Disadvantage: difficult to interpret, and does not pass through air or bone.

In the last year, 3d, dynamic ultrasound is now available. You can put a probe in the ultrasound and watch the heart valves.

For intra-cardiac intervention: Create model from pre-op imaging, register model to patient, use trans-esophageal ultrasound for real-time image guidance, introduce instruments through chest/heart wall, magnetically track ultrasound and instruments, display in VR environment.

[Very cool demonstrations of the technology.] [Now showing another VR environment using Windows XP. Bleh.]

Other modalities: PET – positron emission tomography, SPECT.

One important tool, now, is the fusion of several of these techniques: MRI-PET, CT-MRI, US-MRI.

Conclusion: CT and MRI provide high resolution 3d/4d data, but can't be used well in operating room. US is inexpensive and 2d/3d imaging, but really hard to get context.

Future: image-guided procedures, deformable models with US synchronization. Challenges: tracking intra-op imaging devices and real-time registration. Deformation of pre-op models to intra-op anatomy.

Wednesday, March 11, 2009

Notes from the Michael Smith Panel at the Gairdner Symposium.

I took some notes from the panel session this evening, which was the final event for the Vancouver portion of the Gairdner Symposium 50th Anniversary celebrations. Fortunately, they are in electronic format, so I could easily migrate them to the web. I also took notes from the other sessions, but only with my pen and paper, so if people are interested I can also transcribe some of those notes, or summarize them from the rest of the sessions, but I will only do that upon request.

As always, if there's something that doesn't make sense, it was my fault, not that of the panelists, and when in doubt, this version of the talk is probably wrong.

You'll also notice several changes in tenses in my notes - I apologize, but I don't think I'm going to fix them tonight.

Before you read the transcription below, I'd also like to comment that all of the speakers tonight were simply outstanding, and that I can't begin to do them justice. Mr. Sabine is simply an amazing orator, and he left me impressed with his ability to mesmerize the crowd and bring his stories to life in a way that I doubt any scientist could match. He was also an exemplary moderator. If the panelists were any less impressive in their ability to hold the crowd's attention, it was easily made up by their ability to give their opinions clearly and concisely, and they were always able to add insightful comments on each subject that they addressed. Clearly, I couldn't have asked for more from anyone involved.

There was also a rumour that the video version of the talk would be available on the web, somewhere. If that's the case, I highly suggest you watch it, instead of reading my "Coles notes" version. If you can't find it, this might tide you over till you do.


Introduction by Michael Hayden

Michael Smith told reporters, on the day he won the Nobel Prize: He sleeps in the nude, has a low IQ and wears Birkenstocks.

Tonight's focus is different: DNA is “everywhere”.. it has become a cultural icon. Sequencing the human genome was estimated to take $3 billion and 10 years, and it took nearly that. Now, you can do it for about $1000. Who knows, it might even be your next Christmas gift. Personal genomics was the invention of the year in 2008.

Our questions for tonight – what are the advantages, and what harm can it do?

Moderator: Charles Sabine
Dr. Cynthia Kenyon
Dr. Harold Varmus
Dr. Muin Khoury

Questions: Personalized Genomics: Hope or Hype?

[Mr. Sabine began his talk, which included a slideshow with fantastic clips of Michael Smith, his experiences in the war, and was narrated with dramatic stories from conflicts in which he reported upon, and human tragedies he witnessed. I can't begin to do justice to this extremely eloquent and engaging dialogue, so I'll give a quick summary.]

Mr. Sabine recently took a break from broadcasting to begin participating in science discussions, and to engage the community on issues that are tremendously important: science, genomics and medicine. His family recently found out that Mr. Sabine's father was suffering from Huntington's disease, which is a terrible hereditary genetic disease. With his father's diagnosis of Huntington's, he himself had a 50% chance of developing the disease, as do all of his siblings. His older brother, a successful lawyer, has developed the disease, and is now struggling with the symptoms.

An interesting prediction is that in the near future, as much as 50% of the population will have dementia by the time they die.

From his experience in wars, if you take away dignity and hope, people will lose their moral compass. (Mr Sabine is much more eloquent than my notes make out, and this was the result of a long set of connected points, which I was unable to jot down.)

Huntington's disease is an interesting testing ground, for its ability to predict personal medical futures. It has a high penetrance, and is one of the early genetic diseases for which a test was identified, thus, it is the precursor to the genetic testing and personalized medicine processes that people have envisioned for the future. But the question remains, will personalized medicine be a saviour, by enabling preventative medicine, or will it be a huge distraction by presenting us with information that just isn't actionable? Many people have different answers to this question: insurance companies would like to remove risk, and the personal impact is, of course, enormous.

Mr. Sabine recently took the test for Huntington's himself. The result was positive: he will suffer the same fate as his brother and his father.

[If only I could type fast enough! Fantastic metaphors, stories and wit.]

End of introduction – Beginning of the panel

Question: Is personal medicine a source of hope, or is it just hype?

Varmus: Middle of the road. First, the fact that we're talking about genes excites special attention – they're hereditary, and seem unchangeable. However, this new modality that plays a role in risk assessment is just one more part of the continuum of care we already have. All our environment and physical choices are just one more component of what goes into our medical care.

Second, we're already using medical diagnosis which is based on genomics. These are mostly high penetrance genes, however, so we have to consider the penetrance of each of the genes we're going to use clinically.

Third, it's not always easy to implement the changes in the clinic, when we find them in the lab. There is resistance – physicians are creatures of habit, there are licensing and cost issues. These things are important in how the future plays out.

Many of the new commercial ventures (genotyping companies) are grounded in a questionable area of science. There may be a slightly increased risk of a disease because of a couple changes in your genome, but it's not an accurate description of what we know. There are suppressors and other mitigating factors, so to put your health in the hands of a commercial vendor is premature.

Khoury: My job is to make sure that information is used to improve the public health. Have an interest in making sure that it gets used, and used well. Therefore, I'm for it, and believe it can be done. However, I have concerns about the way it's being used now. “The genie is out of the bottle, will we get our wish?” (Title of a recent publication he was involved in.)

Kenyon: Excited about it. Humans are not all the same as one another. Can we correlate attributes with genes? We know about disease genes. Is there a gene for perfect pitch? Happiness? Etc etc. It would be interesting to know, we could start asking if we had more sequences. What would you do with that knowledge? If you don't have the gene for happiness, how would you feel?

The more we know, the more we'll be able to do with the tests. We're too early for real action most of the time. Can you make the right judgements based on the information? Now, probably not, there's too much room to make bad choices.

Question: Is knowing a patient's gene sequence going to impact their care, and what role does the environment play in all of this?

Khoury: most diseases are interactions between environment and genes. Huntington's is one of the few exceptions with high penetrance. You need to understand both parts of the puzzle, to identify the risk of disease. It's still too early.

Varmus: Get away from the phrase “sequencing whole genomes”, we're not there yet... that would cost $100,000's. Right now we do more targeted (Arrays?) or we do shotgun (random?) sequencing. So we have a wide variety of techniques that are used, but we're not sequencing people's genomes.

In some cases there are very low environmental influences. Many people have gene mutations that guarantee they will get a disease. These should be indicators where genes must be tested and knowing genetic information provides a protective preventative power.

Kenyon: Agree with Harry. Sometimes genes are the answer, other times we just don't know.

Khoury: There is a wide variety of diseases with a variety of possible interventions. Sometimes the diseases are treatable in environmental ways. (eg. phenylketonuria). However, single gene diseases make up only 5% of the diseases that are affecting the population.

Genome profiles tell us that there are MANY complex diseases, and we can use non-genetic properties to indicate many of our risks, instead of using genome screens.

Varmus: The word environment does not include all things non-genetic. Behaviour is a huge component. Dietary, drug, motor accidents, warfare, smoking... they are controllable and make a huge contribution.

Khoury: Every disease is 100% environmental and 100% genetic. (-;

Question: given that a genome scan is ~$400-500, would you have your genome scanned? Would you share that information with anyone, and who would that be?

Kenyon: haven't had it done, would do it if there was a familial disease. She likes a little mystery in life, so probably wouldn't do it.

Varmus: Wouldn't do a scan – would only do a test for a single gene. The scan results aren't interpretable. The stats are population based, and not personal. He wouldn't publish his own.

Khoury: he's had the offer from 3 main companies... and has turned it down. What you get from the scans is incomplete, but also misleading. Some people have had scans by multiple groups – they aren't always consistent. The information is based on published epidemiology studies... and some of them are replicable, some of them... well, aren't. The ones that do give stable information give VERY low odds risk. What do you do with the difference between a 10 and 12% lifetime risk? What changes should you make?

Why waste $400 on this? Go spend it on a gym membership.

Kenyon: If you have the test done, it could mislead you to make changes that really hurt you in the long run.

Q: Should personal health care be incorporated into the health care system, and would it become a tiered system?

Khoury: We all agree that personal genomics isn't ready for prime time. After validation, and all....
(Varmus: define personal genomics), insurance companies are paying for this information, genetic counselling, and people make up their minds. The question is whole genome scans, though. And this is all about microarray chips and small variants. However, if the 3 billion bases are sequenced, what then? Would you start adding that information to do massive interpretation? We need to wait till things are actionable. Once that happens, we'll see them move through the health care system.

Will it become part of people's medical record.. probably not.

Varmus: The president is interested in using genetics in interesting ways. The most useful aspect is pharmacogenetics, and start looking at genetic variation and response to drugs. When you get sick away from your home, the physician you visit should have access to that information.

Kenyon: There are a lot of drugs already being tested that have very specific actions which only work for a subpopulation. Once we know what the variations in the population are that are treatable with a drug, more drugs will make it through the trials.

Varmus: not a correction, just not sure the audience knows enough about cancers, which are heterogeneous. Many of the changes are not somatic, and we don't yet have the tools to analyse cancers at that level.

Q: How worried should we be that personal genomics will lead to discrimination?

Khoury: it has been a huge topic for discussion. Congress has passed an act to prevent exactly that.
It's a good start in the right direction, and there's still plenty of room to worry.

Varmus: This is most worrisome in employment and insurance. There were few cases in the past anyhow, based on predisposition. However, some employers may discriminate against people who have diseases which may potentially reoccur, because they don't want their insurance premiums to rise.

The only way to avoid that is to have a Canada-like health care system.

Sabine: would you tell your employer?

Kenyon: it wouldn't be a problem at University of California, but it could be a major problem for people. There's a risk that things change and you don't know where it goes.

Sabine: would you trust insurance companies?

Kenyon: Gov't can't do things that harm too many people and stay in power. In long run, we can trust that things will be put right.

Khoury: not right now.

Varmus: I wouldn't reveal it [his own genome sequencing results].

Q: How do you suggest we bring personalized medicine to developing countries?

Khoury: Same technologies can be used everywhere around the world. The concerns around chronic diseases are there. They have their fair share of infectious diseases, and the genomic information will help with better medications, which will help on that front. We can also use the technology for solving other difficult problems, beyond human personal medicine.

Dr. Singer [the authority on the subject was pulled up from the audience]: It can help. There is a war being waged on the global poor, waged by diseases. Killing millions of people. We could use the technology to create better tests, better drugs, etc etc. We can use life-sciences and biotech to save people in the third world. Personalized genomics: that too could have an effect, but not at the individual level. If you apply personalized genomics at the population level... [I think he's talking about doing WGAS to study infectious diseases, and confusing it with personalized genomics.]

Varmus: Always eager to agree with Dr. Singer. The use of genetic technology could be very important for infectious diseases in the developing world.

Kenyon: Something about using WGAS to study..... [I missed the end of the thought.]

Q: How might this information [personal genomics] shape romantic relationships?

Khoury: Just a public health guy! But there's an angle: it's highly unlikely that we'll find the gene “for something”, so while he specializes in prevention and disease, personal genomics are just unlikely to be useful outside of that realm. He doesn't think it's going to have an impact, but there are medical applications such as Tay-Sachs screening. There are forms of screening, but it's not romantic in the sense of “romantic”.

Varmus: We're unlikely to ever see genomes on Facebook for romantic purposes, but sometimes it is useful in preventing disease. May be useful in screening embryos.

Kenyon: Thinks the same thing. Predicting love or even personality from DNA is impossible.. cheaper to do 2 minute dating. However, many of the screens are still useless in terms of predictions that really carry weight. We should instead teach statistics to kids to better understand risk. If we bring in testing, we have to bring in education.

Q: Should information be used to screen your potential partners?

Varmus: If he had genetic testing, and if he were single, he still wouldn't tell his dates.

Q: Genome Canada's funding was reduced to zero. How do we advocate for funding?

Varmus: Wasn't aware of that change of funding. There are many factors to be considered. Both economic and political climates have to be considered. Scientists must keep explaining in an honest and straightforward manner how science works and its contribution to the public. Do what you can, and engage the politicians! They do listen, and they learn – visit local pharma, etc etc. All scientists have to do their part.

Khoury: Harry said it all.

Kenyon: Opportunities arise all the time, engage everyone around you. Take the time to talk with people on the bus, whatever, but just in general seize opportunities, and they come up all the time.

Varmus: just don't become the crazy scientist on the bus who will talk to anybody.

Q: Epigenetics are influenced by the environment, and we can influence them with our behaviour. How long will it be before we know enough about the epigenome that we can start making predictions about disease?

Khoury: The sequence variation we measure may mean something different for people, depending on the patterns we see. It can be a big factor, and the environment also complicates our efforts to understand how it all works together. Particularly in cancer. How long will it take to mature? Progress is moving forward rapidly, but can't make a prediction. Excited by prospects, though.

Varmus: Epigenomics is being most vigorously applied in oncology. Gene silencing and other effects can be seen in the epigenome. That may contribute to the cancer, and determine efficacy of drugs. However, the tools are still crude.

[Hey, what about FindPeaks!? :P]

Kenyon: Explanation of what epigenetics is.

Q: What kind of regulation should exist, if any, on the companies that do personal genomics?

Khoury: FDA does regulation, and the US oversight is fairly loose. Talk about CLIA. [previously mentioned in other talks on my blog, so I'm not taking notes on this.] Basically, more people are concerned, and people believe that other regulation is necessary.

Varmus: There's an uneven playing field out there. Certain things are tightly regulated, whereas other things are too loosely tested. Seems like DNA testing wasn't really the point of the original screening regulation, so that could be improved.

Sabine: Closing remarks. Thanks to everyone.


Saturday, December 6, 2008

Nothing like reading to stimulate ideas

Well, this week has been exciting. The house sale completed last night, with only a few hiccups. Both we and the seller of the house we were buying got low-ball offers during the week, which gave the real estate agents lots to talk about, but never really made an impact. We had a few sleepless nights waiting to find out if the seller would drop our offer and take the competing one that came in, but in the end it all worked out.

On the more science-related side, despite the fact I'm not doing any real work, I've learned a lot, and had the chance to talk about a lot of ideas.

There's been a huge ongoing discussion about the qcal values, or calibrated base call scores that are appearing in Illumina runs these days. It's my understanding that in some cases, these scores are calibrated by looking at the number of perfect alignments, 1-off alignments, and so on, and using the SNP rate to identify some sort of metric which can be applied to identify an expected rate of mismatched base calls. Now, that's fine if you're sequencing an organism that has a genome identical to, or nearly identical to the reference genome. When you're working on cancer genomes, however, that approach may seriously bias your results for very obvious reasons. I've had this debate with three people this week, and I'm sure the conversation will continue on for a few more weeks.
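
To make the worry concrete, here's a rough sketch of that kind of calibration. It assumes the scores are Phred-scaled (Q = -10 log10 of the error probability), and the function names, counts and assumed variant rate are all invented for illustration. It is not how any particular pipeline actually does it.

```python
import math

def phred_to_error_prob(q):
    """Phred-scaled quality -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Probability of a wrong base call -> Phred-scaled quality."""
    return -10 * math.log10(p)

def calibrated_score(mismatches, aligned_bases, assumed_variant_rate=0.0):
    """Empirical quality for a bin of bases, from mismatches against the reference.

    Mismatches are treated as sequencing errors, minus an assumed rate of real
    differences (SNPs) between the sample and the reference.
    """
    observed = mismatches / aligned_bases
    error_rate = max(observed - assumed_variant_rate, 1e-10)  # avoid log(0)
    return error_prob_to_phred(error_rate)

# Same sequencing error in both samples, but the tumour carries extra real
# variants (3 per 1000 bases here, an exaggerated made-up number).
normal = calibrated_score(5, 1000, assumed_variant_rate=0.001)
tumour = calibrated_score(8, 1000, assumed_variant_rate=0.001)
print(round(normal, 2), round(tumour, 2))
```

With identical sequencing error, the tumour sample comes out with a lower calibrated score simply because its real variants were counted as errors. That's the bias in a nutshell.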

In terms of studying for my comprehensive exam, I'm now done with the first 12 chapters of Weinberg's "The Biology of Cancer" textbook, and I seem to be retaining it fairly well. My girlfriend quizzed me on a few things last night, and I did reasonably well answering the questions. 6 more days, 4 more chapters to go.

The most interesting part of the studying was Thursday's seminar day. In preparation for the Genome Sciences Centre's bi-annual retreat, there was an all-day seminar series, in which many of the PIs spoke about their research. Incidentally, 3 of my committee members were speaking, so I figured it would be a good investment of my time to attend. (Coincidentally, the 4th committee member was also speaking that day, but on campus, so I missed his talk.)

Indeed - having read so many chapters of the textbook on cancer biology, I was FAR better equipped to understand what I was hearing - and many of the research topics presented picked up exactly where the textbook left off. I also have a pretty good idea what questions they will be asking now: I can see where the questions during my committee meetings have come from; it's never far from the research they're most interested in. Finally, the big picture is coming together!

Anyhow, two specific things this week have stood out enough that I wanted to mention them here.

The first was the keynote speaker's talk on Thursday. Dr. Morag Park spoke about the environment of tumours, and how it has a major impact on the prognosis of the cancer patient. One thing that wasn't settled was why the environment is responding to the tumour at all. Is the reaction of the environment dictated by the tumour, making this just another element of the cancer biology, or does the environment have its own mechanism to detect growths, which is different in each person? This is definitely an area I hadn't put much thought into until seeing Dr. Park speak. (She was a very good speaker, I might add.)

The second item was something that came out of the textbook. They have a single paragraph at the end of chapter 12, which was bothering me. After discussing cancer stem cells, DNA damage and repair, and the whole works (500 pages of cancer research into the book...), they mention progeria. In progeria, children age dramatically quickly, such that a 12-14 year old has roughly the appearance of an 80-90 year old. It's a devastating disease. However, the textbook mentions it in the context of DNA damage, suggesting that the progression of this disease may be caused by general DNA damage sustained by the majority of cells in the body over the short course of the life of a progeria patient. This leaves me of two minds: 1) the DNA damage to the somatic cells of a patient would cause them to lose tissues more rapidly, which would have to be regenerated more quickly, causing more rapid degradation of tissues - shortening telomeres would take care of that. This could cause a more rapid aging process. However, 2) the textbook just finished describing how stem cells and rapidly reproducing progenitor cells, which are the precursors involved in tissue repair, are dramatically more sensitive to DNA damage. Wouldn't it be more likely then that people suffering with this disease are actually drawing down their supply of stem cells more quickly than people without DNA repair defects? All of their tissues may also suffer more rapid degradation than normal, but it's the stem cells which are clearly required for long term tissue maintenance. An interesting experiment could be done on these patients requiring no more than a few milliliters of blood - has their CD34+ ratio of cells dropped compared to non-sufferers of the disease? Alas, that's well outside of what I can do in the next couple of years, so I hope someone else gives this a whirl.

Anyhow, just some random thoughts. 6 days left till the exam!


Sunday, April 13, 2008

Genomics Forum 2008

You can probably guess what this post is about from the title - which means I still haven't gotten around to writing an entry on thresholding for ChIP-Seq. Actually, it's probably a good thing I haven't, as we've been learning a lot about thresholding in the past week. It seems many things we took for granted aren't really the case. Anyhow, I'm not going to say too much about that, as I plan to collect my thoughts and discuss it in a later entry.

Instead, I'd like to discuss the 2008 Genomics Forum, sponsored by Genome BC, which took place on Friday - though, in particular, I'm going to focus on one talk, near to my own research. Dr. Barbara Wold from Caltech gave the first of the science talks, and focussed heavily on ChIP-Seq and Whole Transcriptome Shotgun Sequencing (WTSS). But before I get to that, I wanted to mention a few other things.

The first is that Genome BC took a few minutes to announce a really neat funding competition, which really impressed me, the Genome BC Science Opportunities Fund. (There's nothing up on the web page yet, but if you google for it, you'll come across the agenda for Friday's forum in which it's mentioned - I'm sure more will appear soon.) Its whole premise revolves around the question: "Are there experiments that we need to be doing, that are of strategic importance to the BC life science community?" I take that to mean, are there projects that we can't afford not to undertake, that we wouldn't have the funding to do otherwise? I find that to be very flexible, and very non-academic in nature - but quite neat. I hope the funding competition goes well, and I'm looking forward to seeing what they think falls into the "must do" category.

The second was the surprising demand for Bioinformaticians. I'm aware of several jobs for bioinformaticians with experience in next-gen sequencing, but the surprise to me was the number of times (5) I heard people mention that they were actively recruiting. If anyone with next-gen experience is out there looking for a job (post-doc, full time or grad student), drop me a note, and I can probably point you in the right direction.

The third was one of the afternoon talks, on journalism in science, from the perspective of traditional newspaper/TV journalists. It seems so foreign to me, yet the talk touched on several interesting points, including the fact that journalists are struggling to come to terms with "new media." (... which doesn't seem particularly new to those of us who have been using the net since the 90's, but I digress.) That gave me several ideas about things I can do with my blog, to bring it out of the simple text format I use now. I guess even those of us who live/breathe/sleep internet don't do a great job of harnessing its power for communicating effectively. Food for thought.

Ok... so on to the main topic of tonight's blog: Dr. Wold's talk.

Dr. Wold spoke at length on two topics, ChIP-Seq and Whole Transcriptome Shotgun Sequencing. Since these are the two subjects I'm actively working on, I was obviously very interested in hearing what she had to say, though I'll comment more on the ChIP-Seq side of things.

One of the great open questions at the Genome Sciences Centre has been how to do an effective control for a ChIP-Seq experiment. It's not something we've done much of in the past, but the Wold lab demonstrated why controls are necessary, and how to do them well. It seems that ChIP-Seq experiments tend to yield fragments in several genomic regions that have nothing to do with the antibody or experiment itself. The educated guess is that these are caused by hypersensitive sites in the genome that tend to fragment in repeatable patterns, giving rise to peaks that appear in all samples. Indeed, I spent a good portion of this past week talking about observations of peaks exactly like that, and how to "filter" them out of the ChIP-Seq results. I wasn't able to get a good idea of how the Wold lab does this, other than by eye (which isn't very high throughput), but knowing what needs to be done now, it shouldn't be particularly difficult to incorporate into our next release of the FindPeaks code.
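As a rough idea of what an automated version of that filter could look like, here is a minimal sketch. It is neither the Wold lab's procedure nor FindPeaks code: the function names, the (chromosome, start, end) tuple layout and the example coordinates are all assumptions for illustration. It simply drops any sample peak that overlaps a peak called in the control.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def filter_against_control(sample_peaks, control_peaks):
    """Keep only the sample peaks that overlap no control peak."""
    return [p for p in sample_peaks
            if not any(overlaps(p, c) for c in control_peaks)]

# Made-up coordinates: the first sample peak also shows up in the control.
sample = [("chr1", 100, 300), ("chr1", 5000, 5200), ("chr2", 100, 300)]
control = [("chr1", 250, 400)]
kept = filter_against_control(sample, control)
```

The all-against-all comparison is fine for a handful of peaks; for tens of thousands, sorting the peaks per chromosome or using an interval tree would be the obvious next step.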

Another smart thing that the Wold lab has done is to separate the interactions of ChIP-Seq into two different types: Type 1 and Type 2, where Type 1 refers to single molecule-DNA binding events, which give rise to sharp peaks, and very clean profiles. These tend to be transcription factors like NRSF, or STAT1, upon which the first generation of ChIP-Seq papers were published. Type 2 interactomes tend to be less clear, as they are transcription factors that recruit other elements, or form complexes that bind to the DNA at specific sites, and require other proteins to bind to encourage transcription. My own interpretation is that the number of identifiable binding sites should indicate the type, and thus, if there were three identifiable transcription factor consensus sites lined up, it should be considered a Type 3 interactome, though, that may be simplifying the case tremendously, as there are, undoubtedly, many other proteins that must be recruited before any transcription will take place.

In terms of applications, the members of the Wold lab have been using their identified peaks to locate novel binding site motifs. I think this is the first thing everyone thinks of when they hear of ChIP-Seq for the first time, but it's pretty cool to see it in action. (We do it at the GSC too, I might add.) The neatest thing, however, was that they were able to identify a rather strange binding site, with two halves of a motif, split by a variable distance. I haven't quite figured out how that works, in terms of DNA/Protein structure, but it's conceptually quite neat. They were able to show that the distance between the two halves of the structure varies by 10-20 bases, making it a challenge to identify, for most traditional motif scanners. Nifty.
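To show why a fixed-width scanner struggles with this, here is a minimal sketch of a gapped search. The half-site sequences, the test sequence and the function name `find_split_motif` are invented for illustration, and I'm reading "10-20 bases" as the allowed gap between the halves, which is my assumption rather than something stated in the talk.

```python
import re

def find_split_motif(sequence, left_half, right_half, min_gap=10, max_gap=20):
    """Find left_half and right_half separated by min_gap..max_gap bases.

    Returns (start_of_left_half, gap_length) for each left-half position,
    reporting the shortest qualifying gap at that position.
    """
    gap = "[ACGTN]{" + str(min_gap) + "," + str(max_gap) + "}?"
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(
        "(?=" + re.escape(left_half) + "(" + gap + ")" + re.escape(right_half) + ")"
    )
    return [(m.start(), len(m.group(1))) for m in pattern.finditer(sequence)]

# Made-up half sites and sequence, for illustration only.
seq = "AA" + "TTCAG" + "A" * 12 + "GGACC" + "TT"
hits = find_split_motif(seq, "TTCAG", "GGACC")
```

A position weight matrix of fixed width would need a separate model for every possible gap length, whereas the search above treats the gap as a free parameter. Real half sites are degenerate, so an exact-match regular expression is only a stand-in for the idea.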

Another neat thing, which I think everyone knows but was cool to hear has been shown, is that the binding sites often line up on areas of high conservation across species. I use that as a test for my own work, but it was good to have it confirmed.

Finally, one of the things Dr. Wold mentioned was that they were interested in using the information in the directionality of reads in their analysis. Oddly enough, this was one of the first problems I worked on in ChIP-Seq, months ago, and discovered several ways to handle it. I enjoyed knowing that there's at least one thing my own ChIP-Seq code does that is unique, and possibly better than the competition. (-;
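For anyone wondering what "using the directionality" can mean in practice, one common approach is to extend each short read to the expected fragment length in the direction of its strand before building coverage. The sketch below shows only that idea; it is not necessarily what my code or the Wold lab's does, and the function name, the 200-base fragment length and the coordinates are assumptions.

```python
def extend_read(start, end, strand, fragment_length=200):
    """Extend a read to the assumed fragment length, in its strand direction.

    Coordinates are 0-based, half-open. fragment_length is an assumption
    that would normally come from the library preparation.
    """
    if strand == "+":
        return start, max(end, start + fragment_length)
    if strand == "-":
        return max(0, min(start, end - fragment_length)), end
    raise ValueError("strand must be '+' or '-'")

# Two 36-base reads flanking a hypothetical binding site near position 1200.
forward = extend_read(1000, 1036, "+")
reverse = extend_read(1400, 1436, "-")
```

After extension, the forward and reverse reads cover the same region between them, so the coverage piles up over the binding site rather than on either side of it.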

As for transcriptome work, there were only a couple things that are worth mentioning. The Wold lab seems to be using MAQ and a list of splice junctions assembled from annotated exons to map the transcriptome sequences. I've heard that before, actually, from someone at the GSC who is doing exactly the same thing. It's a small world. I'm not really a fan of the technique, however. Yes, you'll get a lot of the exon junction reads, but you'll only find the ones you're looking for, which is exactly the criticism all the next-gen people throw at the use of micro-arrays. There has got to be a better solution... but I don't yet know what it is. (We thought it was Exonerate, but we can't seem to get it to work well, due to several bugs in the software. It's clearly a work in progress.)
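The junction-list approach can be sketched in a few lines, which also makes its limitation obvious. This is only an illustration under my own assumptions: the function name `junction_sequences`, the toy exon sequences and the choice to join consecutive exons only are all invented, and it is not the Wold lab's or the GSC's actual pipeline.

```python
def junction_sequences(exons, read_length):
    """Build junction sequences for consecutive annotated exons.

    exons is a list of exon sequences in transcript order. Each junction
    keeps read_length - 1 bases on either side, so any read aligned to it
    must cross the exon boundary.
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    flank = read_length - 1
    return [upstream[-flank:] + downstream[:flank]
            for upstream, downstream in zip(exons, exons[1:])]

# Toy exons, far shorter than real ones.
exons = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]
junctions = junction_sequences(exons, read_length=4)
```

The resulting sequences are simply added to the reference that the aligner maps against. A read spanning exon 1 and exon 3, or any unannotated exon, has nothing to align to, which is exactly the "you only find what you look for" problem.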

Anyhow, I think I'm going to stop here. I'll just sum it all up by saying it was a pretty good talk, and it's given me lots of things to think about. I'm looking forward to getting back to coding tomorrow.


Friday, April 4, 2008

Dr. Henk Stunnenberg's lecture

I saw an interesting seminar today, which I thought I'd like to comment on. Unfortunately, I didn't bring my notes home with me, so I can only report on the details I recall - and my apologies in advance if I make any errors - as always, any mistakes are obviously with my recall, and not the fault of the presenter.

Ironically, I almost skipped the talk - it was billed as discussing Epigenetics using "ChIP-on-Chip", which I wrote off several months ago as being a "poor man's ChIP-Seq." I try not to say that too loud, usually, since there are still people out there who put a lot of faith in it, and I have no evidence to say it's bad. Or, at least, I didn't until today.

The presenter was Dr. Stunnenberg, from Nijmegen Center for Molecular Sciences, whose web page doesn't do him justice in any respect. To begin with, Dr. Stunnenberg gave a big apology for the change in date of his talk - I gather the originally scheduled talk had to be postponed because someone had stolen his bags while he was on the way to the airport. That has got to suck, but I digress...

Right away, we were told that the talk would focus not on "ChIP-on-Chip", but on ChIP-Seq, instead, which cheered me up tremendously. We were also told that the poor graduate student (Mark?) who had spent a full year generating the first data set based on the ChIP-on-Chip method had had to throw away all of his data and start over again once the ChIP-Seq data had become available. Yes, it's THAT much better. To paraphrase Dr. Stunnenberg, it wasn't worth anyone's time to work with the ChIP-on-Chip data set when compared to the accuracy, speed and precision of the ChIP-Seq technology. Ah, music to my ears.

I'm not going to go over what data was presented, as it would mostly be of interest only to cancer researchers, other than to mention it was based on estrogen receptor mediated binding. However, I do want to raise two interesting points that Dr. Stunnenberg touched upon: the minimum height threshold they applied to their data, and the use of Polymerase occupancy.

With respect to their experiment, they performed several lanes of sequencing on their ChIP-Seq sample, and used the standard peak finding to identify areas of enrichment. This yielded a large number of sites, which I seem to recall was in the range of 60-100k peaks, with a "statistically derived" cutoff around 8-10. No surprise, this is a typical result for a complex interaction with a relatively promiscuous transcription factor; a lot of peaks! The surprise to me was that they decided that this was too many peaks, and so applied an arbitrary threshold of a minimum peak height of 30, which reduced the number to 6,400-ish peaks. Unfortunately, I can't come up with a single justification for this threshold at 30. In fact, I don't know that anyone could, including Dr. Stunnenberg, who admitted it was rather arbitrary, chosen because they thought the first number, in the tens of thousands of peaks, was too many.

I'll be puzzling over this for a while, but it seems like a lot of good data was rejected for no particularly good reason. Yes, it made the data set more tractable, but considering the number of peaks we work on regularly at the GSC, I'm not really sure this is a defensible reason. I'm personally convinced that there is a lot of biological relevance for the peaks with low peak heights, even if we aren't aware of what that is yet, and arbitrarily raising the minimum height threshold 3-fold over the statistically justifiable cutoff is a difficult pill to swallow.
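To put the effect in concrete terms, here is a toy sketch of what moving the minimum height does to a peak list. The heights are invented, and the function name `peaks_passing` is hypothetical; only the two cutoffs (roughly 10 and 30) come from the talk as I remember it.

```python
def peaks_passing(heights, min_height):
    """Return the peak heights at or above the minimum height."""
    return [h for h in heights if h >= min_height]

# Made-up peak heights, not data from the talk.
heights = [8, 9, 12, 15, 22, 29, 30, 45, 80]

statistical = peaks_passing(heights, 10)  # cutoff derived from the data
arbitrary = peaks_passing(heights, 30)    # cutoff picked for tractability
discarded = len(statistical) - len(arbitrary)
```

Every peak counted in `discarded` passed the statistical test and was thrown away anyway, which is the part I find hard to defend.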

Moving along, the part that did impress me a lot (one of many impressive parts, really) was the use of Polymerase occupancy ChIP-Seq tracks. Whereas the GSC tends to do a lot of transcriptome work to identify the expression of genes, Dr. Stunnenberg demonstrated that polymerase ChIP can be used to gain the same information, but with much less sequencing. (I believe he said 2-3 lanes of Solexa data were all that were needed, whereas our transcriptomes have been done up to a full 8 lanes.) Admittedly, I'd rather have both transcriptome and polymerase occupancy, since it's not clear where each one has weaknesses, but I can see obvious advantages to both methods, particularly the benefits of having direct DNA evidence, rather than mapping cDNA back to genomic locations for the same information. I think this is something I'll definitely be following up on.

In summary, this was clearly a well thought through talk, delivered by a very animated and entertaining speaker. (I don't think Greg even thought about napping through this one.) There's clearly some good work being done at the Nijmegen Center for Molecular Sciences, and I'll start following their papers more closely. In the meantime, I'm kicking myself for not going to the lunch to talk with Dr. Stunnenberg afterwards, but alas, the ChIP-on-Chip poster sent out in advance had me fooled, and I had booked myself into a conflicting meeting earlier this week. Hopefully I'll have another opportunity in the future.

By the way, Dr. Stunnenberg made a point of mentioning they're hiring bioinformaticians, so interested parties may want to check out his web page.
