Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: - Please come visit my blog there.

Tuesday, December 22, 2009

Link Roundup Returns - Dec 16-22

I've been busy with my thesis project for the past couple of weeks, which I think is understandable, but all work and no play doesn't sit well with me. So, over the weekend, I learned Go, Google's new programming language, and wrote myself a simple application for keeping track of links and dumping them out in a pretty HTML format that I can just cut and paste into my blog.

While I'm not quite ready to release the code for my little Go application, I am ready to test it out. I went back through the last 200 Twitter posts I have (about 8 days' worth) and grabbed the ones that looked interesting to me. I may have missed a few, or grabbed a few less-than-thrilling ones; that's simply a consequence of skimming some of the articles less carefully than others. I promise the quality of my links will be better in the future.

Anyhow, this experiment gave me a few insights into the process of "reprocessing" tweets. The first is that my app only records the person from whom I got the tweet - not the people from whom they got it. I'll try to address that in the future. The second is that it's a very simple interface - and a lot of things I wanted to say just didn't fit. (Maybe that's for the better... who knows.)
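The Go code itself isn't released, so here is a purely illustrative sketch in Python of one way a link record could address the first limitation: store the whole chain of people a link passed through, not just the last one, and emit one list item in the same "title - Link (via ...)" style used in the roundup. The names `Link`, `via_chain`, `to_html` and `render` are hypothetical.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Link:
    """One saved link (hypothetical structure, not the author's Go code)."""
    title: str
    url: str
    # Everyone the link passed through, nearest source first,
    # e.g. ["@BioInfo", "@whoever_they_got_it_from"].
    via_chain: list[str] = field(default_factory=list)

    def to_html(self) -> str:
        """Render the link as a single <li>, escaping all text for HTML."""
        item = f'{escape(self.title)} - <a href="{escape(self.url)}">Link</a>'
        if self.via_chain:
            item += " (via " + ", ".join(escape(h) for h in self.via_chain) + ")"
        return f"<li>{item}</li>"


def render(links) -> str:
    """Wrap a list of Link objects in a <ul> ready to paste into a post."""
    return "<ul>\n" + "\n".join(link.to_html() for link in links) + "\n</ul>"
```

Keeping the chain as an ordered list means crediting a second-hand source is just a matter of appending another handle.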

Regardless (or irregardless, for those of you in the U.S.) here are my picks for the week.

  • Bringing back Blast (Blast+) (PDF) - Link (via @BioInfo)
  • Incredibly vague advice on how to become a bioinformatician - Link (via @KatherineMejia)
  • Cleaning up the Human Genome - Link (via @dgmacarthur)
  • Neat article on "4th paradigm of computing: exaflood of observational data" - Link (via @genomicslawyer)

  • Gene/Protein Annotation is worse than you thought - Link (via @BioInfo)
  • Why are Europeans white? - Link (via @lukejostins)

Future Technology:
  • D-Wave surfaces again in discussions about bioinformatics - Link (via @biotechbase)
  • Changing the way we give credit in science - Link (via @genomicslawyer)

Off topic:
  • On scientists getting quote-mined by the press - Link (via @Etche_homo)
  • Giveaway of the best science cookie cutters ever - Link (via @apfejes)
  • Neat early history of the electric car - Link (via @biotechbase)
  • Wild (inaccurate and funny) conspiracy theories about the Wellcome Trust Sanger Institute - Link (via @dgmacarthur)
  • The Eureka Moment: An Interview with Sir Alec Jeffreys (Inventor of the DNA Fingerprint) - Link (via @dgmacarthur)
  • Six types of twitter user (based on The Tipping Point) - Link (via @ritajlg)

Personal Medicine:
  • Discussion on mutations in cancer (in the press) - Link (via @CompleteGenomic)
  • Upcoming Conference: Personalized Medicine World Conference (Jan 19-20, 2010) - Link (via @CompleteGenomic)
  • deCODEme offers free analysis for 23andMe customers - Link (via @dgmacarthur)
  • UK government waking up to the impact of personalized medicine - Link (via @dgmacarthur)
  • Doctors not adopting genomic-based tests for drug suitability - Link (via @dgmacarthur)
  • Quick and dirty biomarker detection - Link (via @genomicslawyer)
  • Personal Genomics article for the masses - Link (via @genomicslawyer)

  • Paper doing the rounds: Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data - Link (via @BioInfo)
  • Archiving Next Generation Sequencing Data - Link (via @BioInfo)
  • Epigenetics takes aim at cancer and other illnesses - Link (via @BioInfo)
  • (Haven't yet read) Changing economics of DNA synthesis - Link (via @biotechbase)
  • Genomic players for investors. (Very light overview) - Link (via @genomicslawyer)
  • Haven't read yet: Recommended review of 2nd and 3rd generation seq. technologies - Link (via @nanopore)
  • De novo assembly of Giant Panda Genome - Link (via @nanopore)
  • Wellcome Trust summary of 2nd gen sequencing technologies - Link (via @ritajlg)


Thursday, December 17, 2009

One lane is (still) not enough...

After my quick post yesterday where I said one lane isn't enough, I was asked to elaborate a bit more, if I could. Well, I don't want to get into the details of the experiment itself, but I'm happy to jump into the "controls" in a bit more depth.

What I can tell you is that with one lane of RNA-Seq (Illumina data, 50 bp reads), all of the variations I find show up either in known polymorphism databases or as somatic SNPs, with a few exceptions - and those few exceptions turn out to be exceptions only because of a lack of coverage.

For a "control", I took two data sets (from two separate patients), each with 6 individual lanes of sequencing data. (I realize this isn't the most robust experiment, but it makes a point.) In a perfect world, each of the 6 lanes per person would have sampled the original library equally well.

So, I paired up one lane from each patient to make 6 sets and asked the question: how many transcripts are void (fewer than 5 tags) in one sample and at least 5x greater in the other sample? (I did this in both directions.)
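A minimal sketch of that filter, assuming each lane's tag counts are already in a dictionary keyed by transcript ID. "At least 5x greater" is read here (an assumption) as at least 5 times the void sample's count, with a zero count treated as 1, so a transcript with 0 tags in one lane needs at least 5 tags in the other. The function name `void_vs_expressed` is hypothetical.

```python
def void_vs_expressed(counts_a, counts_b, void_below=5, fold=5):
    """Return transcripts that are 'void' in sample A but well covered in B.

    counts_a, counts_b: dicts mapping transcript ID -> tag count for one lane.
    A transcript qualifies if it has fewer than `void_below` tags in A and at
    least `fold` times A's count in B. A's count is floored at 1 (an assumed
    reading of "at least 5x greater"), so 0 tags in A needs >= 5 tags in B.
    """
    hits = set()
    # Consider transcripts seen in either lane; missing means 0 tags.
    for tx in counts_a.keys() | counts_b.keys():
        a = counts_a.get(tx, 0)
        b = counts_b.get(tx, 0)
        if a < void_below and b >= fold * max(a, 1):
            hits.add(tx)
    return hits
```

Calling it once as `void_vs_expressed(lane_a, lane_b)` and once with the arguments swapped covers both directions.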

The results aren't great. In one direction, I see an average of 1245 transcripts (about 680 genes, so several of the transcripts come from the same genes) with a standard deviation of 38 transcripts. That sounds pretty consistent, until you look for the overlap in the actual transcripts: an average of 27.3, with a standard deviation of 17.4 (range 0-60). And, when we do the calculations, even the most closely matched data sets only have a 5% overlap.

The results for the opposite direction were similar: an average of 277 transcripts met the criteria (standard deviation of 33.61), with an average overlap between data sets of 4.8 (standard deviation 4.48; range of 0-11 transcripts in common). The best overlap in "upregulated" genes for this dataset was just over 4% concordance with a second pair of lanes.
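The overlap numbers above could be computed along these lines, assuming the overlaps are pairwise between the 6 hit sets (6 sets give 15 pairs). For each pair, the sketch counts shared transcripts, then reports the mean, sample standard deviation and range of those counts. The percentage overlap is measured here as shared / union (Jaccard); the post doesn't say which denominator was used, so treat that as an assumption, along with the hypothetical function names.

```python
from itertools import combinations
from statistics import mean, stdev


def pairwise_overlap(hit_sets):
    """For every pair of hit sets, return (shared count, shared / union)."""
    results = []
    for s, t in combinations(hit_sets, 2):
        shared = len(s & t)
        union = len(s | t)
        results.append((shared, shared / union if union else 0.0))
    return results


def summarize(hit_sets):
    """Mean, sample std dev and range of shared counts, plus best fraction."""
    res = pairwise_overlap(hit_sets)
    counts = [c for c, _ in res]
    return {
        "mean_overlap": mean(counts),
        "stdev_overlap": stdev(counts) if len(counts) > 1 else 0.0,
        "range": (min(counts), max(counts)),
        "best_fraction": max(f for _, f in res),
    }
```

A low `best_fraction` alongside a tight spread in set sizes is exactly the pattern described: similar numbers of hits, but different transcripts each time.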

So, what this tells me (from a VERY dirty experiment) is that, for genes expressed at the low end, the expression measured in one lane is highly variable depending on the lane. (Sampling at the high end is usually pretty good, so I'm not too concerned about that.)

What I haven't answered yet is how many lanes is enough. Alas, I have to go do some volunteering, so that experiment will have to wait for another day. And, of course, the images I created along the way will have to follow later as well.


Wednesday, December 16, 2009

one lane is not enough....

Without giving too much away about the stuff I'm working on - trying to study anything of interest in one lane of RNA-Seq data is futile. Do not try this at home kids.

Now, the question is, how valid is it to compare 36bp reads to 72bp reads? Ah, the joys of research.


post on blogging

I received an email from a friend last night, who read my post on committee meetings. First, I'm thrilled that someone is reading my posts, but the content of the email (roughly paraphrased, since I haven't asked his permission to quote it) was something like this:
"You're writing about real people, and not everything you say will be taken in the best possible light. Since people who will be considering you for future positions will be reading your blog, should you be posting in this tone? Will having a blog hurt your future career?"
His point was far more articulate than that, but that gives you the idea. He was able to pick out a few examples of things I've said that could clearly be taken in a bad light - and I can certainly see why they might be taken that way. So, I thought I should explain a little.

When writing a blog, there are three things that are never far from your mind - anonymity, veracity, context.

The first is anonymity: both my own identity and those of the people who participate in my life. Personally, I've made the choice to blog as an individual - so people who read my blog can figure out who I am. I ensure that I don't discuss my family or friends on the blog, as they have not explicitly consented to participate in this project. However, with the people who interact with me in my daily life, particularly at work or school, where the bulk of my blog topics arise, I walk a fine line. I would hate for people to stop talking with me because they're afraid I'm going to blog something that's inappropriate.

So the point about my committee meeting post is quite valid - my committee members did not agree to be discussed on my blog. And yet, committee meetings are a core activity in all PhD studies, so talking about the lessons I'm learning from it is something important to me - as long as I am careful not to infringe on the privacy of the committee members themselves.

In terms of veracity, a good lesson from this blog has been that idle speculation is a "Bad thing" (tm). Going out on a limb where people can Google what you've speculated about can lead you into a boatload of trouble - and it stays on the web for a long time. Consequently, it's important not to exaggerate or speculate needlessly - and certainly slander is a terrible blog-crime.

In the example of my committee meeting post, I have to be careful not to dwell on my interpretations of comments, but just refer to the facts as I see them. However, the blog *is* about my interpretation of events - to which I'm entitled, and which I'm allowed to discuss so long as it doesn't violate the anonymity of the participants or misrepresent events.

The third point is context. In a conversation, it's easy to clarify the context if someone misinterprets what you've said; blogs, however, require that each entry be atomic and self-contained. If I miss the context in a post, the reader will walk away without knowing what I had in mind. Getting this right is difficult - and it's what separates the good bloggers from the bad. (And clearly, I'm still learning.)

With those three points in mind, here's an example from the post, describing how your PhD committee interacts with you:
"They may be friendly with you, but they're not there to give you friendly advice and guide you through your PhD. Instead, they are really engaged in an adversarial relationship in which they are the gate keepers that will decide when you can leave this pit of doom, and they are the ones that will open the door at the end when they believe you're ready to depart. Yes, they do have the roadmap to letting you out, but they would much rather you figure it out yourself instead of asking them to help plan it."
When I wrote that, I meant for it to describe the relationship that is mandated by the committee - not my relationship to my own committee members. Personally, I think I get along well with them, for the most part - although if they misinterpret that particular paragraph, that might change!

In reality, I have a cordial relationship with my committee - they do give friendly advice (particularly when discussions occur one on one), and they are all wonderful people. And while I may be guilty of some hyperbole (grad school isn't actually a pit of doom), the actual purpose of the committee is fairly accurately described. If you want advice, you should talk with your committee members - not wait for a committee meeting. And, in fact, I'm going to stand by my point that they want you to figure out the roadmap. Every student and every project is different, and there is no single way to get out. It's your advisor/supervisor's job to help you plan your exit - not your committee.

So, was I overly harsh? Perhaps - but it was all about making the point about committees, not about individual committee members. Having gone over the rest of the article, I can tell I've done a poor job of explaining the difference between committee members and a committee, so I'll make a few revisions to reflect that. That clearly shows I goofed on the point of context.

At any rate, that's why I love feedback so much - thanks to my friend's email, I've had the opportunity to re-evaluate and clarify the three points (anonymity, veracity, context) that underpin responsible blogging. And, of course, to learn something valuable from excellent feedback.

To my anonymous friend who took the time to email me, THANKS!


Monday, December 14, 2009

Talk: Writing a manuscript for bio-medical publication - Ian E.P. Taylor (Professor emeritus of Botany, UBC)

I was asked to blog this by a colleague. I haven't caught up with the other talks from last week, but since I'm here, and I found an outlet, I figured I could just dump my notes to the web. So here they are. I've left in a few comments of my own, but really, the lecture stands on its own. Most of the notes are derived directly from the talk - but they mirror the distributed handouts pretty closely. And frankly, the talk was full of examples and anecdotes that really helped illustrate the points. If you get a chance to see Dr. Taylor speak, I highly suggest it. Any mistakes, as always, are mine, and you shouldn't take my advice on publishing...

[I will clean up the (HORRIBLY BAD) HTML that was caused by dumping in an OpenOffice document. (Update: I've now removed the bad HTML - and I won't be cutting and pasting from OpenOffice again. That was brutal.)]

Writing a manuscript for bio-medical publication

By Ian E.P. Taylor (Professor emeritus of Botany)

He started with an anecdote: when you publish a paper, the people you have to impress are 2 reviewers and an editor. They are hard to impress, and you have to worry about first impressions, or they will be hostile. Don't be negative about your work. As for things that are missing: keep them for your next experiment.

Peer review: An independent but generic tool that allows an editor to determine:
  • originality
  • operational competence
  • coherent reporting of research
  1. Unpublished research may as well not have been done.
  2. Unread publications may as well not have been written.
Four steps to understand: (Plan for the lecture.)
  1. Plan and plan for your journal
  2. Real and credible authorship. (Those who are listed have done something... names have been put on cheques for authorships.)
  3. Peer review. (The web hasn't changed this much – people still want to see peer review.)
  4. Responses to review.

Picking a journal:
Polling the audience – lots of usual reasons. [Joke about having your professor on the editorial board....]
His reasons:

  1. indexed
  2. best in field
  3. appropriate readership
  4. I read it a lot
  5. timeliness
  6. costs
  7. profile of other authors
  8. professor-choice

Impact factor:

  • Kind of a fake idea.... the papers have impact factors, not journals. Several examples of high impact papers in low impact factor journals.

The plan:

Know the giants upon whose shoulders you stand. What are their backgrounds, and where do they publish?

  1. Know your goals, hypothesis
  2. If you can't write what you discovered in 20 words, you haven't discovered it.
  3. How did you discover it
  4. prepare the outcomes (text, figs, tables, supplementary)
  5. what do they mean?
  6. Keep your references as you write. (Missing references piss off reviewers!)
[The only decent cheese in the world is Cheshire cheese – and yes, that's where he grew up.]

Planning abstract:
  • State what you discovered in the first line, and your conclusion in the second line
  • As you write, challenge each sentence against your plan.
  • If necessary, change plan

“We have discovered...” Everything you write should fit this goal.

Elements of a paper:
  • Results – no results, no paper
  • Methods – how you got the results
  • Discussion – why the methods. (May or may not be the choppy approach....)
  • Introduction – direct the reader
  • Conclusions – not a repeat of the results
  • Aside on abstract: can be structured – explains order of what you're discussing, or can be intro: explain significance.
How to read a paper:
  • Introduction -> Methods -> Discussion.
  • Introduction should lead you straight to discussion.... should be able to skip results and methods.
Authors:
  • Writing author (always singular) – You should try to write it out yourself, as this is key. You don't want to mix styles.
  • Co-authors
  • senior author (first)
  • senior author (PI)
  • corresponding author.
In case it's not obvious, the first author is the person who did the work, and the last author is the PI (for biomedical papers). Some places use alphabetical order, but it depends on your field.

Writing Author:
  • Get the instructions to authors from the journal
  • write the first draft, which is a record of your work
  • unify style
  • share draft with all co-authors
  • Responsible for ensuring the record of ethical performance.
  • It's key that you get the style correct for your journal.

Before you write – make sure you know which journal.

[Great advice to me: Journals do not publish the truth! They publish results, written truthfully!]

Wonderful anecdote about how they used to write papers by putting everyone into a room and not leaving till the 2nd draft was done.

Who are the authors?
  • Only the people who should be on there (e.g., not supervisors who wrote the grant but didn't actually supervise).
  • Data providers
  • Data analysts (e.g. statisticians)
  • political.... yeah, it happens
  • All who contributed to an essential part of the actual research reported
  • No one who is there for contributing a gift of ANY sort
  • check the journal for the criteria.
  • All authors must agree to the authorship list
  • All authors must agree to submission.
People who create libraries should not be included. E.g., if they supply you with a DNA library or something of that nature but didn't do the work, they don't belong on the paper; list them in the acknowledgements instead. See the American Journal of Medicine, which has a form for this. (=

Fabrication, falsification, plagiarism: these are criminal offenses in the scholarly community. Most of the time, detection is by accident.

Author obligations - everyone on the paper takes responsibility for the whole paper.
  • inform editor of related works
  • refer to instructions to authors for details
  • The covering letter should place the work in the context of other published work. Make sure each paper is an original idea.
  • Inform editors of financial or other conflicts of interest
  • Identify (un)acceptable bias
  • Full justification of 'representative results'
  • Negative results???
  • The stuff you'd expect
  • list of suitable reviewers! Don't just pick the people who are best in the field - give others too. Don't just pick editorial board either. Give some reason why you picked them.
Remember: You are the world expert on the subject you're writing on.

Reviewer Obligations:
  • Treat as you would wish a stranger to treat you
  • Follow directions from editor
  • Consult editor before consulting others
  • group reviews must adhere to confidentiality
  • One person must sign off on it.
  • Refuse, rather than delay: 2 weeks max. Do not delay!
  • if you're waiting for journals, don't put up with delay, get on the phone and call!
  • If asked for recommendation, follow criteria.
  • Annotate manuscript, but AVOID being rude.
If you're reviewing, Track Changes is acceptable
  • Don't steal ideas or use their ideas for your gain
  • Do not break confidentiality
  • Avoid becoming an investigator - don't tell the researchers how to do the research.
Disclose potential conflicts of interest - let the editor decide if it's a problem
and yeah, don't talk to the author unless the editor says it's ok (explicitly)
  • Reviewer conflicts:
  • Recent collaborations
  • intellectual conflicts
  • scientific bias or personal animosity
  • parallel research activity, research work on a competing project
  • potential financial benefit
  • AND potential benefit from advanced knowledge of new work
Responding to reviewers:

  • In the end, the editors decide what's in the journal - not the reviewer
  • It is not an election - the editor can ignore the recommendations
  • Take all the comments - particularly the editors - seriously
  • The editor can say "your paper is accepted subject to the actions of the reviewers...." It has not yet been accepted.
  • Vent about negative comments - but only for 10 minutes.
  • Mark every point on a copy of the manuscript
  • Fix all typos and mechanics IMMEDIATELY. (within 24 hours.)
  • Fix the criticisms, THEN and ONLY THEN worry about the rebut.
  • The comment expressed by reviewer 1 is incorrect because...
  • We have addressed....
  • Don't delay!


Monday, December 7, 2009

Talk: Peter Campbell - A lung Cancer genome: Complex signatures of tobacco exposure

I had the pleasure of accompanying my supervisor and a few other people from the BC Genome Sciences Centre down to St. Louis last week for a "joint lab meeting" and symposium on Cancer Genomics. The symposium had a fantastic lineup of the "who's who" of 2nd generation sequencing (next-gen) cancer genomics, and was open to the public, so I thought I'd take the time to blog some of my notes. (I have a couple of servers chugging away on stuff, so I've got a few minutes...) As always, this is drawn from my notes, so if there are mistakes, they are undoubtedly my fault, either in interpretation or in reading my own messy handwriting.

First talk: Peter Campbell.
Title: A lung Cancer Genome: complex signatures of tobacco exposure.

  • Tobacco usage (cigarettes/day) is dropping in the USA, and lung cancer deaths are starting to fall, with about a 20-year lag.
  • However, usage in China, Indonesia, etc. is rising dramatically, so tobacco cancer research is still important.
  • Tobacco causes DNA adducts to be formed, including benzo[a]pyrene adducts, which distort the residues on either side and also cause backbone distortions. This leads to DNA copying errors.
  • Used NCI-H209 cell line, and generated 30-40 fold coverage using ABI SOLiD machine.
  • Observed 22,910 somatic mutations, 334 CNVs, 65 indels and 58 genomic rearrangements.

(My notes are a little sketchy at this point.) They were able to show a sensitivity of 76%; however, they sacrificed some sensitivity for specificity, and were able to show 97% true positives in coding regions and 94% true positives in non-coding regions.
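For reference, the two quantities being traded off here can be written out as below, assuming "true positives" means the share of mutation calls that validated, i.e. positive predictive value (the notes above don't say which measure was used). With TP, FP and FN the true positive, false positive and false negative calls:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{PPV} = \frac{TP}{TP + FP}$$

On that reading, 76% sensitivity means roughly 1 in 4 real mutations went uncalled, while 97% PPV means about 3 in every 100 coding-region calls were false.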

The breakdown of the ~23,000 SNPs appears to be: 70.41% intergenic, 28.21% intronic, 0.79% non-coding, 0.58% coding. There is a Kn/Ks ratio of 2.6:1. These are fairly standard.

They then characterized the mutations by nucleotide change, and showed a significant number of A->G and T->C changes, which is expected for smoking patients. They also compared this with the spectrum of changes seen in the literature for TP53, using the IARC database.
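Tallying changes "by nucleotide change" usually means folding each substitution onto one strand, since an A->G on one strand is a T->C on the other. One common convention (an assumption here, not necessarily what the speaker's group did) names every change relative to the pyrimidine (C or T) base, giving six classes. A minimal sketch with hypothetical function names `collapse` and `spectrum`:

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse(ref, alt):
    """Name a single-base substitution relative to the pyrimidine strand.

    A purine reference (A or G) is complemented along with the alternate
    base, so A>G and T>C both become 'T>C', and G>T and C>A both become 'C>A'.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Tally (ref, alt) pairs into the six collapsed substitution classes."""
    return Counter(collapse(ref, alt) for ref, alt in mutations)
```

Under this convention the A->G and T->C changes mentioned above land in the same T>C bin.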

Another interesting point is that G mutations enrich at CpG islands, and there is some evidence that mutations are preferentially targeting methylated CpGs. However, there are fewer adenine mutations at GpA locations.
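Checking whether a mutated G (or C) sits in a CpG dinucleotide only needs the neighbouring base in the reference sequence. A minimal sketch, assuming a plain reference string and 0-based positions; `is_cpg` is a hypothetical helper, and it says nothing about methylation status, which would need separate data.

```python
def is_cpg(seq, pos):
    """Return True if the base at 0-based `pos` is part of a CpG dinucleotide.

    A C counts if the next base is G; a G counts if the previous base is C.
    """
    seq = seq.upper()
    base = seq[pos]
    if base == "C":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if base == "G":
        return pos > 0 and seq[pos - 1] == "C"
    return False
```

Counting how many G mutations satisfy `is_cpg`, against how many Gs in the reference do, is one way to test for the enrichment described.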

Looking at this data suggested that repair mechanisms are at work, e.g. transcription-coupled repair, which can cause a difference in mutation rates between the transcribed and non-transcribed strands. In fact, this is known for G->T mutations; however, a difference in A->G mutations is also seen, and it is lower as expression increases. Also interesting: G-> mutations decrease as transcription increases on both strands, although the mechanism behind that observation is not yet known. This also seems to hold for other changes as well.

On the subject of genomic rearrangements, a large complex array of rearrangements was observed, however it was not mainly classical inversions as was expected. Instead, insertions in the midst of translocations were observed. What they did see matched FISH digital karyotyping, so that seems to be working.

Finally, there was some discussion of a CHD7 fusion in small cell lung carcinoma; however, my notes end at this point without further detail.

Overall, I really enjoyed this talk - as you'd expect, it was well done and covered a wide range of techniques that are available with 2nd gen sequencing. I wasn't aware of the range of nucleotide changes, or the specificity of tobacco-related carcinogens for specific nucleotides - nor even the repair mechanisms, so I got quite a bit out of it. My notes are relatively sparse, but just the few points I've recorded were of great interest to me.

I'll post the remaining talks as I have time over the next few days.
