Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Please come visit my blog there.

Thursday, September 17, 2009

3 year post doc? I hope not!

I started replying to a comment left on my blog the other day and then realized it warranted a little more than just a footnote on my last entry.

This comment was left by "Mikael":

[...] you can still do a post-doc even if you don't think you'll continue in academia. I've noticed many life science companies (especially big pharmas) consider it a big plus if you've done say 3 years of post-doc.

I definitely agree that it's worth doing a post-doc, even if you decide you don't want to go on through the academic pathway. I'm beginning to think that the best time to make that decision (ivory tower vs. indentured slavery) may actually be during your post-doc, since that will be the closest you come to being a professor before making the decision. As a graduate student, I'm not sure I am fully aware of the risks and rewards of the academic lifestyle. (I haven't yet taken a course on the subject, and one only gets so much of an idea through exposure to professors.)

However, at this point, I can't stand the idea of doing a 3-year post-doc. After 6 years of undergrad, 2.5 years of a masters, 3 years of (co-)running my own company, and about 3.5 years of doing a PhD by the time I'm done... well, 3 more years of school is about as appealing as going back to the wet lab. (No, glassware and I don't really get along.)

If I'm going to do a post-doc (and I probably will), it will be a short and sweet one - no more than a year and a half at the longest. I have friends who are stuck in 4-5 year post-docs and have heard of people doing 10-year post-docs. I know what it means to be a post-doc for that long: "Not a good career-building move." If you're not getting publications out quickly in your post-doc, I can imagine it won't reflect well on your C.V., destroying your chances of moving into the limited number of faculty positions - and wreaking havoc on your chances of getting grants.

Still, it's more about what you're doing than how long you're doing it. I'd consider a longer post-doc if it's in a great lab with the possibility of many good publications. If there's one thing I've learned from discussions with collaborators and friends who are years ahead of me, it's that getting into a lab where publications aren't forthcoming - and where you're not happy - can burn you out of science quickly.

Given that I've spent this long as a science student (and it's probably far too late for me to change my mind on becoming a professional musician or photographer), I want to make sure that I end up somewhere where I'm happy with the work and can make reasonable progress: this is a search that I'm taking pretty seriously.

[And, just for the record, if a company needs me to do 3 years of post-doc at this point, I have to wonder just who it is I'm competing with for that job - and what it is that they think you learn in your 2nd and 3rd years as a post-doc.]

With that in mind, I'm also going to put my (somewhat redacted) resume up on the web in the next few days. It might be a little early - but as I said, I'm taking this seriously.

In the meantime, since I want to actually graduate soon, I'd better go see if my analyses were successful. (=


Tuesday, September 15, 2009

Depressing view of Academia

So I officially started going through available post-doc positions this week, now that I'm back from my vacation. I'm still trying to figure out what I want to do when I finish my PhD next year (assuming I do...), and of course, I came back to the academia vs. industry question.

In weighing the evidence, a friend pointed me to this article on the problems facing new scientists in academia. Somehow, it does a nice job of dissuading me from thinking about going down that route - although I'm not completely convinced industry is the way to go yet either.

Read for yourself: Real Lives and White Lies in the Funding of Scientific Research


Saturday, August 15, 2009

What would you do with 10kbp reads?

I just caught a tweet about an article on the Pathogens blog (What can you do with 1000 base pair reads?), which is specifically about 454 reads. Personally, I'm not so interested in 454 reads - the technology is good, but I don't have access to 454 data, so it's somewhat irrelevant to me. (Not to say 1kbp reads aren't neat, but no one has volunteered to pass me 454 data in a long time...)

So, anyhow, I'm trying to think two steps ahead. 2010 is supposed to be the year that Pacific Biosciences (and other companies) release the next generation of sequencing technologies - which will undoubtedly produce reads longer than 1k. (I seem to recall hearing that PacBio has 10k+ reads. UPDATE: I found a reference.) So, to heck with 1kbp reads - the real question is: What would you do with a 10,000bp read? And, equally important, how do you work with a 10kbp read?
  • What software do you have now that can deal with 10k reads?
  • Will you align or assemble with a 10k read?
  • What experiments will you be able to do with a 10k read?
Frankly, I suspect that nothing we're currently using will work well with them - we'll all have to go back to the drawing board and rework the algorithms we use.
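To put a rough number on that suspicion, here's a back-of-the-envelope sketch in Python. The error rates and the seed length k=25 are my assumptions for illustration, not any platform's published specs: short-read aligners typically index exact k-mer seeds, and the supply of error-free seeds shrinks quickly as per-base error climbs, while full dynamic-programming alignment cost scales with read length.

```python
# Why 10 kbp reads strain short-read tools: a sketch with assumed numbers
# (the error rates and k=25 are illustrative, not real platform specs).

def clean_seed_fraction(error_rate: float, k: int) -> float:
    """Probability that a k-mer seed contains no sequencing errors."""
    return (1 - error_rate) ** k

read_len = 10_000
k = 25
for error_rate in (0.01, 0.05, 0.15):
    frac = clean_seed_fraction(error_rate, k)
    seeds = int(frac * (read_len - k + 1))
    print(f"error={error_rate:.0%}: ~{frac:.1%} of {k}-mers clean, "
          f"~{seeds} usable seeds per {read_len} bp read")

# Meanwhile, full dynamic-programming alignment is O(read_len x band_width),
# so a 10 kbp read costs ~100x the DP cells of a 100 bp read per alignment.
```

Seed-and-extend indexing itself survives longer reads; it's the extension step, and anything quadratic in read length, that will send us back to the drawing board.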

So, what do you think?


Monday, June 22, 2009

4 Freedoms of Research

I'm going to venture off the beaten track for a few minutes. Ever since the discussion about conference blogging started to take off, I've been thinking about what the rights of scientists really are - and then came to the conclusion that there really aren't any. There is no scientist's manifesto or equivalent oath that scientists take upon receiving their degree. We don't wear the iron ring like engineers, which signifies our commitment to integrity...

So, I figured I should do my little part to fix that. I'd like to propose the following 4 basic freedoms of research, without which science cannot flourish.
  1. Freedom to explore new areas
  2. Freedom to share your results
  3. Freedom to access findings from other scientists
  4. Freedom to verify findings from other scientists
Broadly, these rights should be self-evident. They are tightly intermingled and cannot be separated from each other:
  • The right to explore new ideas depends on us being able to trust and verify the results of experiments upon which our exploration is based.
  • The right to share information is contingent upon other groups being able to access those results.
  • The purpose of exploring new research opportunities is to share those results with people who can build upon them.
  • Being able to verify findings from other groups requires that we have access to their results.
In fact, they are so tightly mingled, that they are a direct consequence of the scientific method itself.
  1. Ask a question that explores a new area
  2. Use your prior knowledge, or access the literature to make a best guess as to what the answer is
  3. Test your result and confirm/verify if your guess matches the outcome
  4. Share your results with the community.
(I liked the phrasing on this site) Of course if your question in step 1 is not new, you're performing the verification step.

There are constraints on what we are allowed to do as scientists as well: we have to respect the ethics of the field in which we do our exploring, and we have to respect the fact that ultimately we are responsible to the people who fund the work.

However, that's where we start to see problems: in practice, funding sources define the directions science is able to explore. For the past 8 years, we saw the U.S. restrict funding in order to throttle research in various fields (violating Research Freedom #1), effectively halting stem cell research, suppressing alternative fuel research, and so on. In the long term, this technique won't work, because scientists migrate to where the funding is. As the U.S. restores funding to these areas, the science is returning. Unfortunately, it's now Canada's turn, with the Conservative government (featuring a science minister who doesn't believe in evolution) removing all funding from genomics research. The cycle of ignorance continues.

Moving along, and clearly in a related vein, Freedom #4 is also a problem of funding. Researchers who would like to verify other groups' findings (a key responsibility of the basic peer-review process) aren't funded to do this type of work. While admitting my lack of exposure to granting committees, I've never heard of a grant being given to verify someone else's findings. However, this is the basic way by which scientists are held accountable. If no one can repeat your work, you will have many questions to answer - and yet the funding for ensuring accountability is rarely present.

The real threat to an open scientific community comes with the freedoms of sharing and access. If we're unable to discuss the developments in our field, or are not even able to gain information on the latest work done, then science will grind to a halt. We'll waste all of our time and money exploring areas that have been exhaustively covered or, worse yet, come to the wrong conclusions about what areas are worth exploring in our ignorance of what's really going on.

Ironically, these two freedoms - sharing and access - are the most eroded in the scientific community today. Even considering only the academic world, where freedoms are taken for granted, our interactions with the forums for sharing (and accessing) information are horribly stunted:
  • We do not routinely share negative results (causing unnecessary duplication and wasting resources)
  • We must pay to have our results shared in journals (limiting what can be shared)
  • We must pay to access other scientists' results in journals (limiting what can be accessed)
It's trivial to think of other examples of how these two freedoms are being eroded. Unfortunately, it's not so easy to think of how to restore these basic rights to science, although there are a few things we can all do to encourage collaboration and sharing of information:
  • Build open source scientific software and collaborate to improve it - reducing duplication of effort
  • Publish in open access journals to help disseminate knowledge and bring down the barriers to access
  • Maintain blogs to help disseminate knowledge that is not publishable
If all scientists took advantage of these tools and opportunities to further collaborative research, I think we'd find a shift away from conferences towards online collaboration and the development of tools favoring faster and more efficient communication. This, in turn, would provide a significant speed up in the generation of ideas and technologies, leading to more efficient and productive research - something I believe all scientists would like to achieve.

To close, I'd like to propose a hypothesis of my own:
By guaranteeing the four freedoms of research, we will be able to accomplish higher quality research, more efficient use of resources and more frequent breakthroughs in science.
Now, all I need to do is to get someone to fund the research to prove this, but first, I'll have to see what I can find in the literature...


Friday, May 15, 2009

On the necessity of controls

I guess I've had this rant building up for a while, and it's finally time to write it up.

One of the fundamental pillars of science is the ability to isolate a specific action or event and determine its effects on a particular closed system. The scientific method actually demands that we do it - hypothesize, isolate, test and report in an unbiased manner.

Unfortunately, for some reason, the field of genomics has kind of dropped that idea entirely. At the GSC, we just didn't bother with controls for ChIP-Seq for a long time. I can't say I've even seen too many matched WTSS (RNA-SEQ) experiments for cancer/normals. And that scares me, to some extent.

With all the statistics work I've put into the latest version of FindPeaks, I'm finally getting a good grasp of the importance of using controls well. The other software I've seen does a scaled comparison to calculate a P-value, but that is really only half of the story. It also comes down to normalization, to comparing peaks that are present in both sets... and to determining which peaks are truly valid. Without that, you may as well not be using a control.
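To make the "scaled comparison" half of the story concrete, here's a minimal sketch of one way to score a peak against a depth-scaled control. The Poisson null and the pseudocount are my assumptions for illustration - this is not FindPeaks' actual method:

```python
import math

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for a Poisson(lam) variable, via the complementary CDF."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def peak_pvalue(sample_count, control_count, sample_total, control_total):
    """Scale the control library to the sample's depth, then ask how
    surprising the sample's peak height is under a Poisson null."""
    scale = sample_total / control_total
    lam = max(control_count * scale, 0.5)  # pseudocount so lam is never 0
    return poisson_sf(sample_count, lam)

# A tall peak over a quiet control region is significant; a peak matched
# by the control is not - which is the whole point of having the control.
print(peak_pvalue(50, 5, 1_000_000, 1_000_000))  # tiny p-value
print(peak_pvalue(5, 5, 1_000_000, 1_000_000))   # not significant
```

Normalization (the `scale` factor) and the per-peak test have to work together; a scaled comparison without depth normalization answers the wrong question.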

Anyhow, that's what prompted me to write this. As I look over the results from the new FindPeaks, both for ChIP-Seq and WTSS, I'm amazed at how much clearer my answers are, and how much better they validate compared to the non-control-based runs. Of course, the tests are still not all in - but what a huge difference it makes. Real control handling (not just normalization or whatever everyone else is doing) vs. Monte Carlo shows results that aren't in the same league. The cutoffs are different, the false peak estimates are different, and the filtering is incredibly more accurate.

So, this week, as I look for insight in old transcription factor runs and old WTSS runs, I keep having to curse the lack of controls that exist for my own data sets. I've been pushing for a decent control for my WTSS lanes - and there is matched normal for one cell line - but it's still two months away from having the reads land on my desk... and I'm getting impatient.

Now that I'm able to find all of the interesting differences with statistical significance between two samples, I want to get on with it and find them, but it's so much more of a challenge without an appropriate control. Besides, who'd believe it when I write it up with all of the results relative to each other?

Anyhow, just to wrap this up, I'm going to make a suggestion: if you're still doing experiments without a control, and you want to get them published, it's going to get a LOT harder in the near future. After all, the scientific method has been pretty well accepted for a few hundred years, and genomics (despite some protests to the contrary) should never have felt exempt from it.


Tuesday, March 24, 2009

Decision time

Well, now that I've heard that there's a distinct possibility that I might be done my PhD in about a year, it's time to start making some decisions. Frankly, I didn't think I'd be done that quickly - although, really, I'm not done yet. I have a lot of publications to put together, and things to make sense of before I leave, but the clock to start figuring out what to do next has officially begun.

I suppose all of those post-doc blogs I've been reading for the last year have influenced me somewhat: I'm going to look for a lab where I'll find a good mentor, a good environment, and a commitment to publishing and completing post-docs relatively quickly. Although that sounds simple, judging by other blogs I've been reading, it's probably not all that easy to work out. Add to that the fact that my significant other isn't interested in leaving Vancouver (and that I would prefer to stay here as well), and I think this will be a difficult process.

I do need to put together a timeline, however - and since I'm not yet entirely convinced which track I should follow (academic vs. industry), it's going to be a somewhat complex timeline. Anyhow, the point is that blogging is an excellent way to open communication channels with people you wouldn't be able to connect with in person - and the first channel I'd like to open is to ask readers if they have any suggestions.

Input at this time would be VERY welcome, both on the question of academia vs. industry and on what I should be looking for in a good post-doc position, if that ends up being the path I go down. (=

Anyhow, just to mention, I have another blog post coming, but I'll save it for tomorrow. I'd like to comment on another series of blog posts from John Hawks and Daniel McArthur. I'm sure the whole blogosphere has heard all about the subject of training bioinformatics students from both the biology and computer science paths by now, but I feel I have something unique to say on that issue. In the meantime, I'd better get back to debugging and testing code. FindPeaks has a very cool new method of comparing different samples - and I'd like to get the testing finished. (=


Wednesday, February 11, 2009

Epidemiology and next-generation(s) sequencing.

I had a very interesting conversation this morning with a co-worker, which turned into a full-fledged discussion about how next-generation sequencing will end up spilling out of the research labs and into the physician's office. My co-worker originally estimated that it will take 20 years or so, which seems off to me. While most inventions take a long time to catch on, I think next-gen sequencing will cascade over to general use a lot more quickly than people appreciate. Let me explain why.

The first thing we have to acknowledge is that pharmaceutical companies have a HUGE interest in making next-gen sequencing work for them. A pharma company might spend millions of dollars getting a drug candidate to phase 2 trials, and it's in their best interest to get every drug as far as they can. Thus, any drug that can be "rescued" from failing at this stage decreases the cost of getting drugs to market and increases revenues significantly for the company. With the price of genome sequencing falling to $5000/person, it wouldn't be unreasonable for a company to sequence 5-10,000 genomes for the phase 3 trial candidates, as insurance. If the drug seems to work well for a population associated with a particular set of traits, and not well for another group, that's a huge bonus for the company in getting the drug approved. If the drug causes adverse reactions in a small population of people who associate with a second set of traits, then it's even better - they'll be able to screen out adverse responders.

When it comes to getting FDA approval, any company that can clearly specify who the drug will work for - who it won't work for - and who shouldn't take it will be miles ahead of the game, and able to fast-track its application through the approval process. That's another major savings for the company.

(If you're paying attention, you'll also notice at least one new business model here: retesting old drugs that failed trials to see if you can find responsive sub-populations. Someone is going to make a fortune on this.)

Where does this meet epidemiology? Give it 5-7 years, and you'll start to see drugs appear on the shelf with warnings like "This drug is contraindicated for patients with CYP450 variant XXXX." Once that starts to happen, physicians will have very little choice but to start sending their patients for routine genetic testing. We already have PCR screens in the labs for some diseases and tests, but it won't be long before a whole series of drugs appears with labels like this, and insurance companies will start insisting that patients have their genomes sequenced for $5000, rather than pay for 40-50 individual test kits that each cost $100.
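The insurance arithmetic here is worth writing down. The prices are the illustrative figures from this post, not real quotes:

```python
def break_even_tests(genome_cost: float, kit_cost: float) -> float:
    """How many single-marker test kits cost as much as one whole genome?"""
    return genome_cost / kit_cost

# With the post's figures: a $5000 genome vs. $100-per-kit tests.
print(break_even_tests(5000, 100))  # 50.0 - right at the 40-50 kit range
```

Once the expected number of drug-label tests per patient crosses that break-even point, sequencing once and re-querying the result wins on cost alone.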

Really, though, what choice will physicians have? When drugs begin to show up that will help 99% of the patients for whom they should be prescribed, but are contraindicated for certain genomic variants, no physician will be willing to accept the risk of prescribing without the accompanying test. (Malpractice insurance is good... but only gets you so far!) And as the tests get more complex, and our understanding of the underlying cause and effect of various SNPs increases, this is going to quickly go beyond the treatment of single conditions.

I can only see one conclusion: every physician will have to start working closely with a genetic counselor of some sort, who can advise on the relative risks and rewards of various drugs and treatment regimes. To do otherwise would be utterly reckless.

So, how long will it be until we see the effects of this transformation on our medical system? Well, give it 5 years to see the first genetic contraindications, but it won't take long after that for our medical systems (on both sides of the border in North America) to feel the full effects of the revolution. Just wait till we start sequencing the genomes of the flu bugs we've caught to figure out which anti-viral to use.

Gone are the days when physicians will be able to eye up their patients and prescribe whatever drug comes to mind off the top of their heads. Of course, the hospitals aren't yet aware of this tsunami of information and change that's coming at them. Somehow, we need to get the message to them that they'll have to start re-thinking the way they treat people, instead of populations of people.


Tuesday, August 12, 2008

SNP callers.

I thought I'd switch gears a bit this morning. I keep hearing people say that the next project their company/institute/lab is going to tackle is a SNP calling application, which strikes me as odd. I've written at least 3 over the last several months, and they're all trivial. They seem to perform as well as anyone else's SNP calls, and if they take up more memory, I didn't think that was too big of a problem. We have machines with lots of RAM, and it's relatively cheap these days.

What really strikes me as odd is that people think there's money in this. I just can't see it. The barrier to creating a new SNP calling program is incredibly low. I'd suggest it's even lower than creating an aligner - and there are already 20 or so of those out there. There's even an aligner being developed at the GSC (which I don't care for in the slightest, I might add) that works reasonably well.

I think the big thing everyone is missing is that it's not the SNP calling that's important - it's SNP management. In order to do SNP filtering, I have a huge PostgreSQL database with SNPs from a variety of sources, in several large tables, which have to be compared against the SNPs and gene calls from my data set. Even then, I would have a very difficult time handing off my database to someone else - my database is scalable, but completely un-automated, and has nothing but the psql interface, which is clearly not the most user-friendly. If I were going to hire a grad student and allocate money to software development, I wouldn't spend the money on a SNP caller and have the grad student write the database - I'd put the grad student to work on his own SNP caller and buy a SNP management tool. Unfortunately, it's a big project, and I don't think there's a single tool out there that would begin to meet the needs of people managing output from massively-parallel sequencing efforts.
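To make the SNP-management idea concrete, here's a toy version of the central join, using Python's sqlite3 as a stand-in for the PostgreSQL setup described above. All table and column names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observed_snps (chrom TEXT, pos INTEGER, alt TEXT);
    CREATE TABLE known_snps    (chrom TEXT, pos INTEGER, alt TEXT, source TEXT);
""")
conn.executemany("INSERT INTO observed_snps VALUES (?, ?, ?)",
                 [("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T")])
conn.executemany("INSERT INTO known_snps VALUES (?, ?, ?, ?)",
                 [("chr1", 100, "A", "dbSNP")])

# The filtering step: which of our calls are absent from every known-SNP source?
novel = conn.execute("""
    SELECT o.chrom, o.pos, o.alt
    FROM observed_snps o
    LEFT JOIN known_snps k
      ON o.chrom = k.chrom AND o.pos = k.pos AND o.alt = k.alt
    WHERE k.pos IS NULL
""").fetchall()
print(novel)  # the two calls not present in the known set
```

A real tool would layer on more sources, annotations, per-sample tracking, and an interface friendlier than raw SQL - which is exactly the gap described above.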

Anyhow, just some food for thought, while I write tools that manage SNPs this morning.

