Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: - Please come visit my blog there.

Sunday, January 24, 2010

Biopartnering North and a short break

First off, if anyone is going to BioPartnering North 2010 this week in Vancouver, I'll be there, and would be very happy to talk genomics/biotech and business with you. I was lucky enough to have been found worthy of one of the coveted BIOTECanada bursaries to attend the event, and I plan to get as much out of it as I can. I'll be at the reception tonight, and undoubtedly I'll be around throughout the next few days. (And, if you were wondering, I won't be blogging any talks from BPN.)

Second, I'm pretty sure everyone has noticed that my blogging output has dropped significantly since December, for which there are several good reasons. The first is that I've been quite busy. My personal life is now occupied by event planning, while my work life has been dominated by several major projects, about which I will undoubtedly be "ranting" in posts in the near future.

However (and thirdly), the other reason I've not been blogging much is that I also had a conversation with a colleague in December about effective communications. He suggested I read a book on "Non-violent communication." I'm working my way through it slowly, and have taken a few suggestions to heart. It's always possible to become a better communicator and, to that end, I'm on a small hiatus while I re-evaluate my use of language. It won't last long - I like having a blog and I'm already itching to write a few more posts, but it's an opportunity to do some personal development.


Thursday, January 14, 2010

How to be a better Programmer: Tactics.

I'm a bit too busy for a long post, but a link was circulating around the office that I thought was worth passing on to any bioinformaticians out there.

The article above is on how to be a better programmer - and I wholeheartedly agree with what the author proposed, with one caveat that I'll get to in a minute. The point of the article is that learning to see the big picture (not specific skills) will make you a better programmer. In fact, this is the same advice Sun Tzu gives in "The Art of War", where understanding the terrain, the enemy, etc. are the tools you need to be a better general. [This would be in contrast to learning how to wield each weapon, which would only make you a better warrior.] Frankly, it's good advice, and this leads you down the path towards good planning and clear thinking - the keys to success in most fields.

The caveat, however, is that there are times in your life where this is the wrong approach: i.e., grad school. As a grad student, your goal isn't to be great at everything you touch - it's to specialize in some small corner of one field, and tactics are no help here. If grad school existed for Ninjas, the average student would walk out being the best (pick one of: poisoner/dart thrower/wall climber/etc) in the world - and likely knowing little or nothing about how to be a real ninja beyond what they learned in their Ninja undergrad. Tactics are never a bad investment, but they aren't always what is being asked of you.

Anyhow, I plan to take the advice in the article and to keep studying the tactics of bioinformatics in my spare time, even though my daily work is more on the details and implementation side of it. There are a few links in the comments of the original article to sites the author believes are good comp-sci tactics... I'll definitely be looking into those tonight. Besides, when it comes down to it, the tactics are really the fun parts of the problems, although there is also something to be said for getting your code working correctly and efficiently.... which I'd better get back to. (=

Happy coding!


Thursday, September 3, 2009

DTC Snps... no more risk factors!

I've been reading Daniel's blog again. Whenever I end up commenting on things I don't understand well, that's usually why. Still, it's always food for thought.

First of all, has anyone quantified the actual error rate on these tests? We know they have all sorts of mistakes going on. (This one was recently in the news, and yes, unlike Wikipedia, Daniel is a valid reference source for anything genomics related.) I'll come back to this point in a minute.

As I understand it, the risk factor is an adjustment made to the likelihood of the general population in characterizing the risk of an individual suffering from a particular disease.

So, as I interpret it, you take whatever your likelihood of having that disease was and multiply it by the risk factor. For instance, with a disease like Jervell and Lange-Nielsen Syndrome, 6 of every 1 million people suffer from its effects (although this is a bad example since you would have discovered it in childhood; ignoring that for the moment, we can assume another rare disease with a similar rate). If our DTC test shows we have a 1.17 risk factor because we have a SNP, we would multiply that rate by 1.17.

6/1,000,000 x 1.17 ≈ 7/1,000,000

If I've understood it all correctly, that means you've gone from knowing you have a 0.0006% chance to being certain you have a 0.0007% chance of suffering from your selected disease. (What a great way to spend your money!)
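As a quick check of that arithmetic, here is a minimal Python sketch. It assumes, as in the interpretation above, that the risk factor is a plain multiplier on the population rate; the function name is made up for illustration.

```python
def adjusted_risk(baseline, risk_factor):
    """Scale a population-level probability by a relative risk factor."""
    return baseline * risk_factor

baseline = 6 / 1_000_000                 # 6 in a million, as in the example above
with_snp = adjusted_risk(baseline, 1.17)

print(f"baseline: {baseline:.4%}")       # baseline: 0.0006%
print(f"with SNP: {with_snp:.4%}")       # with SNP: 0.0007%
```

Printed to four decimal places, the difference shows up only in the last digit.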

But let's not stop there. Let's ask what the error rate on actually calling that SNP is. From my own experience in SNP validation, I'd make a guess that the validation rate is close to 80-90%. Let's even be generous and take the high end. Thus:

You've gone from 100% knowing you've got a 0.0006% chance of having a disease to being 90% sure you have a 0.0007% chance of having a disease and 10% sure you've still got a 0.0006% chance of having the disease.

Wow, I'm feeling enlightened.

Let's do the same for something like celiac disease, which is estimated to strike 1/250 people, but is only diagnosed in 1/4,700 people in the U.S.A. - and let's be generous and assume that the SNP in your DTC test has a 1.1 risk factor. (Celiac disease is far from a rare disease, I might add.)

As a member of the average U.S. population, you had a 0.4% chance of having the disease, but a 0.02% chance of being diagnosed with it. That's a pretty big disparity, so maybe there's a good reason to have this test done. As a Canadian it's somewhat different odds, but let's carry on with the calculations anyhow.

Let's say you do the test and find out you have a 1.1 times risk factor of having the disease. OMG, scary!

Wait, let's not freak out yet. That sounds bad, but we haven't finished the calculations.

Your test has the SNP.... 1.1 x 1/250 = 0.44% likelihood you have the disease. Because celiac disease requires a biopsy to definitively diagnose it (and treatment does not start till you've done the diagnosis), would you run out and submit yourself to a biopsy on a 0.44% chance you have a disease? Probably not unless you have some other knowledge that you're likely to have this disease already.

Then, we factor in the 90% likelihood of getting the SNP call correct: You have a 90% likelihood of having a 0.44% chance of having the disease, and a 10% likelihood of having a 0.4% chance of having the disease.
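Collapsing those two branches into a single number makes the point even more plainly. The sketch below is illustrative only: it assumes the 90% call accuracy guessed at above, the 1.1 risk factor, and that a wrong call simply leaves you at the population rate. The function name is hypothetical.

```python
def expected_risk(baseline, risk_factor, call_accuracy):
    """Weight the adjusted and unadjusted risks by how often the SNP call is right."""
    adjusted = baseline * risk_factor
    return call_accuracy * adjusted + (1 - call_accuracy) * baseline

celiac = expected_risk(baseline=1 / 250, risk_factor=1.1, call_accuracy=0.90)
print(f"{celiac:.3%}")  # 0.436%
```

So the test moves you from 0.4% to roughly 0.436%, before any environmental factors are considered.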

Ok, I'd be done panicking about now. And we've only considered two simple things here. Let's add one more just for fun.

Let's pretend that an unknown environmental stressor is actually involved in triggering the condition, which would explain why the odds are somewhat different in Canada. Since we know nothing about that environmental trigger, we can't even project odds of coming in contact with it. Who knows how that interacts with the SNP you know about.

By now, I can't help thinking that all of this is just a wild goose chase.

So, when people start talking about how you have to take your DTC results to a Genetic Counsellor or to your MD I really have to wonder. I can't help but think that unless you have a very good reason to suspect a disease, or you have some form of a priori knowledge, this whole thing is generally a waste. Your Genetic Counsellor will probably just laugh at you, and your MD will order a lot of unnecessary tests - which of those sounds productive?

Let me make a proposal (and I'm happy to hear dissent): Risk factors are great - but are absolutely useless when it comes to discussing how genetic factors affect you. Let's leave the risk factors to the people writing the studies and ask the DTC companies to make a statement: what are your odds of being affected by a given condition? And, if you can't make a helpful prediction (aka, a diagnostic test), maybe you shouldn't be selling it as a test.


Tuesday, September 1, 2009

How much time I spend...

I was just thinking about the division of time amongst the various things I work on - and realized it's pretty bizarre. Unlike most grad students, I have to interface with people using my software for many different analysis types - some of which are "production" quality. That has its own challenges, but I'll leave that for another day.

I figured I could probably recreate my average week in a pie chart form, covering the work I've been doing...

Honestly, though, it's just an estimate - and the sum is actually more than 40 hours a week. (I do work in the evenings, sometimes - and support for FindPeaks happens when I check my email in the evening, too. To compensate, I may have been stingy on the hours spent goofing off....)

Anyhow, I think it would be an interesting project to try to keep track of how I spend my time. Maybe I'll give it a try when I come back from vacation. (Yes, I'll be away next week.)

Still, even from this estimate, three things are very clear:
  1. I need to spend more time upfront writing tests for my software to cut down on debugging.
  2. I need to spend more time reading journals.
  3. I am clearly underestimating the time I spend playing Ping Pong. But hey, I work through lunch!


Thursday, August 13, 2009

Ridiculous Bioinformatics

I think I've finally figured out why bioinformatics is so ridiculous. It took me a while to figure this one out, and I'm still not sure if I believe it, but let me explain to you and see what you think.

The major problem is that bioinformatics isn't a single field; rather, it's the combination of (on a good day) biology and computer science. Each field on its own is a complete subject that can take years to master. You have to respect the biologist who can rattle off the biochemical pathways chart and then extrapolate that to the annotations of a genome to find interesting features of a new organism. Likewise, there's some serious respect due to the programmer who can optimize code down at the assembly level to give you incredible speed while still using half the amount of memory you initially expected to use. It's pretty rare to find someone capable of both, although I know a few who can pull it off.

Of course, each field on its own has some "fudge factors" working against you in your quest for simplicity.

Biologists don't actually know the mechanisms and chemistry of all the enzymes they deal with - they are usually putting forward their best guesses, which lead them to new discoveries. Biology can effectively be summed up as "reverse engineering the living part of the universe", and we're far from having all the details worked out.

Computer Science, on the other hand, has an astounding amount of complexity layered over every task, with a plethora of languages and systems, each with their own "gotchas" (are your arrays zero-based or one-based? how does your operating system handle wild cards at the command line? what does your text editor do to gene names like "Sep9"?) leading to absolute confusion for the novice programmer.
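The "Sep9" gotcha is easy to reproduce. The sketch below is a hypothetical stand-in for the kind of date-guessing rule an over-helpful editor or spreadsheet might apply. The function name and the rule itself are assumptions, not any real program's logic. It shows how a perfectly good gene-like name can be mistaken for a date.

```python
import re

MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}

def looks_like_date(token):
    """Hypothetical heuristic: a three-letter month abbreviation followed by a day number."""
    match = re.fullmatch(r"([A-Za-z]{3})-?(\d{1,2})", token)
    if match is None:
        return False
    month, day = match.group(1).lower(), int(match.group(2))
    return month in MONTHS and 1 <= day <= 31

for name in ["Sep9", "Oct4", "Dec1", "TP53", "BRCA1"]:
    print(name, looks_like_date(name))
```

Under this rule the first three names would be "helpfully" turned into dates, while the last two pass through untouched.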

In a similar manner, we can also think about probabilities of encountering these pitfalls. If you have two independent events, each of which has a distinct probability attached, you can multiply the probabilities to determine the likelihood of both events occurring simultaneously.

So, after all that, I'd like to propose "Fejes' law of interdisciplinary research"

The likelihood of achieving flawless work in an interdisciplinary research project is the product of the likelihood of achieving flawless work in each independent area.

That is to say, if your biology experiments (on average) are free of mistakes 85% of the time, and your programming is free of bugs 90% of the time (e.g., you get the right answers), your likelihood of getting the right answer in a bioinformatics project is:
Fp = Flawless work in Programming
Fb = Flawless work in Biology
Fbp = Flawless work in Bioinformatics

Thus, according to Fejes' law:
Fb x Fp = Fbp

and the example given:
0.90 x 0.85 = 0.765

Thus, even an outstanding programmer and bioinformatician will struggle to get an extremely high rate of flawless results.
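The law extends naturally to more than two areas. A minimal Python sketch, assuming (as the law does) that the areas are independent; the function name and the third rate are made up for illustration.

```python
from math import prod

def flawless_rate(*rates):
    """Probability that every independent area is flawless: the product of the rates."""
    return prod(rates)

print(round(flawless_rate(0.90, 0.85), 3))        # 0.765, the example above
print(round(flawless_rate(0.90, 0.85, 0.95), 3))  # 0.727 with a third, hypothetical discipline
```

Every extra discipline can only pull the product down, which is the uncomfortable part.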

Fortunately, there's one saving grace to all of this: The magnitude of the errors is not taken into account. If the bug in the code is tiny, and has no impact on the conclusion, then that's hardly earth shattering, or if the biology measurements have just a small margin of error, it's not going to change the interpretation.

So there you have it, bioinformaticians. If I haven't just scared you off of ever publishing anything again, you now know what you need to do...

Unit tests, anyone?


Monday, June 22, 2009

4 Freedoms of Research

I'm going to venture off the beaten track for a few minutes. Ever since the discussion about conference blogging started to take off, I've been thinking about what the rights of scientists really are - and then came to the conclusion that there really aren't any. There is no scientist's manifesto or equivalent oath that scientists take upon receiving their degree. We don't wear the iron ring like engineers, which signifies our commitment to integrity...

So, I figured I should do my little part to fix that. I'd like to propose the following 4 basic freedoms to research, without which science can not flourish.
  1. Freedom to explore new areas
  2. Freedom to share your results
  3. Freedom to access findings from other scientists
  4. Freedom to verify findings from other scientists
Broadly, these rights should be self evident. They are tightly intermingled, and can not be separated from each other:
  • The right to explore new ideas depends on us being able to trust and verify the results of experiments upon which our exploration is based.
  • The right to share information is contingent upon other groups being able to access those results.
  • The purpose of exploring new research opportunities is to share those results with people who can build upon them.
  • Being able to verify findings from other groups requires that we have access to their results.
In fact, they are so tightly mingled, that they are a direct consequence of the scientific method itself.
  1. Ask a question that explores a new area
  2. Use your prior knowledge, or access the literature to make a best guess as to what the answer is
  3. Test your result and confirm/verify if your guess matches the outcome
  4. Share your results with the community.
(I liked the phrasing on this site) Of course if your question in step 1 is not new, you're performing the verification step.

There are constraints on what we are allowed to do as scientists as well: we have to respect the ethics of the field in which we do our exploring, and we have to respect the fact that ultimately we are responsible to report to the people who fund the work.

However, that's where we start to see problems. To the best of my knowledge, funding sources define the directions science is able to explore. We saw the U.S. restrict funding to science in order to throttle research in various fields (violating Research Freedom #1) for the past 8 years, which was effectively able to completely halt stem cell research, and suppress alternative fuel sources, etc. In the long term, this technique won't work, because the scientists migrate to where the funding is. As the U.S. restores funding to these areas, the science is returning. Unfortunately, it's Canada's turn, with the conservative government (featuring a science minister who doesn't believe in evolution) removing all funding from genomics research. The cycle of ignorance continues.

Moving along, and clearly in a related vein, Freedom #4 is also a problem of funding. Researchers who would like to verify other groups' findings (a key responsibility of the basic peer-review process) aren't funded to do this type of work. While admitting my lack of exposure to granting committees, I've never heard of a grant being given to verify someone else's findings. However, this is the basic way by which the scientists are held accountable. If no one can repeat your work, you will have many questions to answer - and yet the funding for ensuring accountability is rarely present.

The real threat to an open scientific community occurs with the remaining two Freedoms: sharing and access. If we're unable to discuss the developments in our field, or are not even able to gain information on the latest work done, then science will come grinding to a halt. We'll waste all of our time and money exploring areas that have been exhaustively covered, or worse yet, come to the wrong conclusions about what areas are worth exploring in our ignorance of what's really going on.

Ironically, Freedoms 2 and 3 are the most eroded in the scientific community today. Even considering only the academic world, where freedoms are taken for granted, our interactions with the forums for sharing (and accessing) information are horribly stunted:
  • We do not routinely share negative results (causing unnecessary duplication and wasting resources)
  • We must pay to have our results shared in journals (limiting what can be shared)
  • We must pay to access other scientists' results in journals (limiting what can be accessed)
It's trivial to think of other examples of how these two freedoms are being eroded. Unfortunately, it's not so easy to think of how to restore these basic rights to science, although there are a few things we can all do to encourage collaboration and sharing of information:
  • Build open source scientific software and collaborate to improve it - reducing duplication of effort
  • Publish in open access journals to help disseminate knowledge and bring down the barriers to access
  • Maintain blogs to help disseminate knowledge that is not publishable
If all scientists took advantage of these tools and opportunities to further collaborative research, I think we'd find a shift away from conferences towards online collaboration and the development of tools favoring faster and more efficient communication. This, in turn, would provide a significant speed up in the generation of ideas and technologies, leading to more efficient and productive research - something I believe all scientists would like to achieve.

To close, I'd like to propose a hypothesis of my own:
By guaranteeing the four freedoms of research, we will be able to accomplish higher quality research, more efficient use of resources and more frequent breakthroughs in science.
Now, all I need to do is to get someone to fund the research to prove this, but first, I'll have to see what I can find in the literature...


Saturday, May 16, 2009

Bioinformatics in the lab

After yesterday's talk by Dr. Bowdish (I just feel weird calling professors by their first names when referring to their talks), I walked away with several different trains of thought, one of which was the easy integration of bioinformatics into the research program she'd undertaken. The interesting thing isn't so much that it was there, but the absolutely relaxed attitude with which it had been presented.

When I first started talking to professors about the interface between computers and biology or biochemistry, the field had barely even been given a name - and most of the professors were utterly confused about what computers could do to enhance their research programs. (Yes, I was an undergrad in the mid-90's.) I remember several profs saying they couldn't think of a reason to have computers in their labs at all. (Of course, at the time, there probably wasn't much use for computers in the lab anyhow.)

There was one prof who was working on the edge of the two subjects: Dr. Patricia Schulte. Although she was working in the field of fish biology, somehow she was able to see the value and encourage her students to explore the interface of bioinformatics and lab integration - and she was the first person to introduce me to the term Bioinformatics (among many other topics: HMMs, Neural Nets, etc...)

Anyhow, at that point, I was hooked on bioinformatics, but finding the opportunity to do hands on work was nearly impossible. The biology professors didn't know what it could do for them - and clearly didn't have the vocabulary with which to express their interests in computational information. It was awkward, at times. One prof couldn't figure out why I wanted to use word processors for biology.

To my great amazement, things have dramatically changed in the (nearly) decade and a half since I started my first undergrad, and yesterday's talk was really a nice opportunity to contemplate that change. Dr. Bowdish's talk included a significant amount of biology, genomics and bioinformatics predictions. When the predictions didn't turn out (eg. the putative myristoylation site wasn't actually important), there was no accompanying comment about how unreliable bioinformatics is (which I used to see ALL the time in the early days of the field), and there was no hesitation to jump into the next round of bioinformatics predictions (structure predictions for the enzyme).

I think even this quiet incorporation of bioinformatics into a young lab is incredibly encouraging. Perhaps it's Dr. Bowdish's past, having done her PhD in the lab of Dr. Hancock, who was himself an early adopter of bioinformatics predictions, or possibly it's just researchers who have grown up with computers for most of their lives finally getting into the ranks of academia. Either way, I'm impressed and encouraged. The golden age of bioinformatics may not be here yet, but I think the idea that it will never become mainstream has finally started to fade from the halls of the ivory tower.


Wednesday, February 11, 2009

Epidemiology and next-generation(s) sequencing.

I had a very interesting conversation this morning with a co-worker, which turned into a full-fledged discussion about how next generation sequencing will end up spilling out of the research labs to the physician's office. My co-worker originally stated that it will take 20 years or so for it to happen, which seems kind of off to me. While most inventions take a lot longer to get going, I think that next-gen sequencing will cascade over to general use a lot more quickly than people appreciate. Let me explain why.

The first thing we have to acknowledge is that pharmaceutical companies have a HUGE interest in making next gen sequencing work for them. In the past, pharma companies might spend millions of dollars getting a drug candidate to phase 2 trials, and it's in their best interest to get every drug as far as they can. Thus, any drug that can be "rescued" from failing at this stage will decrease the cost of getting drugs to market, and increase revenues significantly for the company. With the price of genome sequencing falling to $5000/person, it wouldn't be unreasonable for a company to do 5,000-10,000 genomes for the phase 3 trial candidates, as insurance. If the drug seems to work well for a population associated with a particular set of traits, and not well for another group, it is a huge bonus for the company in getting the drug approved. If the drug causes adverse reactions in a small population of people which associate with a second set of traits, then it's even better - they'll be able to screen out adverse responders.

When it comes to getting FDA approval, any company that can clearly specify who the drug will work for - who it won't work for - and who shouldn't take it, will be miles ahead of the game, and able to fast track their application through the approval process. That's another major savings for the company.

(If you're paying attention, you'll also notice at least one new business model here: retesting old drugs that failed trials to see if you can find responsive sub-populations. Someone is going to make a fortune on this.)

Where does this meet epidemiology? Give it 5-7 years, and you'll start to see drugs appear on the shelf with warnings like "This drug is contraindicated for patients with CYP450 variant XXXX." Once that starts to happen, physicians will really have very little choice but to start sending their patients for routine genetic testing. We already have PCR screens in the labs for some diseases and tests, but it won't be long before a whole series of drugs appear with labels like this, and insurance companies will start insisting that patients have their genomes sequenced for $5000, rather than have 40-50 individual test kits that each cost $100.
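The economics in that last sentence amount to a simple break-even comparison. A rough sketch using only the figures quoted above (the function name is hypothetical):

```python
def cheaper_option(n_tests, cost_per_test=100, genome_cost=5000):
    """Compare the total cost of individual test kits against one genome sequence."""
    kits_total = n_tests * cost_per_test
    if kits_total > genome_cost:
        return "genome"
    if kits_total < genome_cost:
        return "kits"
    return "tie"

for n in (40, 50, 60):
    print(n, cheaper_option(n))
```

On these numbers the genome only breaks even at 50 kits, so the argument gets stronger as the price of sequencing keeps falling or the number of labelled drugs keeps growing.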

Really, though, what choice will physicians have? When drugs begin to show up that will help 99% of the patients for which they should be prescribed, but are contraindicated for some genomic variations, no physician will be willing to accept the risk of prescribing without the accompanying test. (Malpractice insurance is good... but only gets you so far!) And as the tests get more complex, and our understanding of underlying cause and effect of various SNPs starts to increase, this is going to quickly go beyond the treatment of single conditions.

I can only see one conclusion: every physician will have to start working closely with a genetic counsellor of some sort, who can advise on relative risk and reward of various drugs and treatment regimes. To do otherwise would be utterly reckless.

So, how long will it be until we see the effects of this transformation on our medical system? Well, give it 5 years to see the first genetic contraindications, but it won't take long after that for our medical systems (on both sides of the border in North America) to feel the full effects of the revolution. Just wait till we start sequencing the genomes of the flu bugs we've caught to best figure out which anti-viral to use.

Gone are the days when the physician will be able to eye up his or her patient and prescribe whatever drug he or she comes up with off the top of their head. Of course, the hospitals aren't yet aware of this tsunami of information and change that's coming at them. Somehow, we need to get the message to them that they'll have to start re-thinking the way they treat people, instead of populations of people.


Friday, January 30, 2009

A Change of Pace...

This post was inspired by frustration - one of the biggest problems with bioinformatics is how quickly things change. It is also a huge strength, but it can be a major problem for people in the field.

The idea came out of a simple annoyance: someone renamed all of our reference genome fasta files last night, and clearly forgot to let people know. At least, those people I spoke to didn't know anything about it, so it wasn't just me missing a meeting. I can see the advantages of doing it, and I fully support it - but it should have been done a year ago, and when they finally got around to it, they should have sent a major email. Instead, I queued up a bunch of jobs and fired them off only to watch as they all started crashing.

Wonderful use of resources.
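One cheap defence against this sort of surprise is a pre-flight check before anything is queued. A minimal sketch, assuming each job is driven by a list of reference file paths; the function name and the paths are hypothetical.

```python
from pathlib import Path

def missing_inputs(paths):
    """Return the paths that do not point at an existing regular file."""
    return [p for p in paths if not Path(p).is_file()]

references = ["genomes/chr1.fa", "genomes/chr2.fa"]  # hypothetical paths
missing = missing_inputs(references)
if missing:
    print("Not submitting; missing reference files:", ", ".join(missing))
```

It won't stop anyone renaming files, but it does turn a queue full of crashed jobs into a single error message.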

At any rate, that got me thinking about the change of pace of next-generation sequencing. I've seen several threads asking questions about getting set up to use the MAQ aligner, obviously written by people who are just getting started with aligners. These threads, unfortunately, are being written after the author of the software has already abandoned that project and moved on to a new aligner. So much goes on without people realizing they're working on the last wave of the technology.

That's far from an isolated case - I found a program called NestedMICA for doing motif scanning, which would be a cool side project. I won't link to it, though, because it's clearly also been abandoned. Only two years ago it was a very promising application with a decent publication. Now, it won't compile and the author isn't responding to emails. I've spoken to motif people and they all change motif scanners and tools about as often as they change their socks. (Well, no, the socks get changed a little more often.)

Keeping up with the latest and greatest tools is a huge burden for people in this field, and it's practically impossible to do if your interests are at all diversified out of one of the major subjects.

I suppose that bioinformatics is far from the only field in which these things happen, but I just can't think of another example where the ante to get in the game is so low (being able to program), the subject is so accessible (internet access gets you access to the data), and the questions are so fundamental (how does the cell work?)

All this churn and people jumping head first into the field leads to a plethora of unmaintainable Perl programs, abandoned code and half-baked packages without documentation. (In the worst case, of course!) In industry, any field with this kind of bandwagon would be ripe for consolidation, but in academia, it's just a Darwinian process where many, many failures seem to be required for each success. And somehow, I have no idea how to pick the winner.

All of this leaves me wondering where the field will be in 6 months or a year or two. I guess that's why scientists go to conferences: to see if we can get a glimpse into the crystal ball.

Compare this with biology. Can you imagine if, every 4 months, there were a completely different way of doing cloning, or the PCR technique you used became obsolete?

How about chemists? Need a new way to determine your compound's melting point every 4 months?

Or Physicists... your model of gravity changes every 4 months?

I dunno. The pace is exhilarating... but sometimes exhausting.

Excuse me while I go make a few more changes to the way I process ChIP-seq samples.... again.


Tuesday, January 6, 2009

My Geneticist dot com

A while back, I received an email from a company called mygeneticist that is doing genetic testing to help patients identify adverse drug reactions. I'm not sure what the relationship is, but they seem to be a part of something called DiscoverMe technologies. I bring mygeneticist up, because I had an "interview" with one of their partners, to determine if I am a good subject for their genetic testing program. It seems I'm too healthy to be included, unless they later decide to include me as a control. Nuts-it! (I'm still trying to figure out how to get my genome sequenced here at the GSC too, but I don't think anyone wants to fund that...)

At any rate, I spoke with the representative of their clinical side of operations this morning and had an interesting conversation about my background. In typical fashion, I also took the time to ask a few specific questions about their operations. I'm pretty sure they didn't tell me much more than was available on their various web pages, but I think there was some interesting information that came out of it.

When I originally read their email, I had assumed that they were going to be doing WTSS on each of their patients. At about $8000 per patient, it's expensive, but a relatively cheap form of discovery - if you can get around some of the challenges involved in tissue selection, etc. Instead, it seems that they're doing specific gene interrogation, although I wasn't able to get the type of platform their using. This leads me to believe that they're probably doing some form of literature check for genes related to the drugs of interest, followed by a PCR or Array based validation across their patient group. Considering the challenges of associating drug reactions with SNPs and genomic variation, I would be very curious to see what they have planned for "value-added" resources. Any drug company can find out (and probably does already know) what's in the literature, and any genetic testing done without approval from the FDA will probaby be sued/litigated/regulated out of existance... which doesn't leave a lot of wiggle room for them.

And that led me to thinking about a lot of other questions, which went un-asked. (I'll probably email the Genomics expert there to ask some questions, though I'm mostly interested in the business side of it, which they probably won't answer.) What makes them think that people will pay for their services? How can they charge a low-enough fee to make the service attractive while still making a profit? And, from the scientific side, assuming they're not just a diagnostic application company, I'm not sure how they'll get a large enough cohort to make sense of the data they receive through their recruitment strategy.

Anyhow, I'll be keeping my eyes on this company - if they're still around in a year or two, I'd be very interested in talking to them again about their plans in the next-generation sequencing field.


Saturday, December 6, 2008

Nothing like reading to stimulate ideas

Well, this week has been exciting. The house sale completed last night, with only a few hiccups. Both we and the seller of the house we were buying got low-ball offers during the week, which gave the real estate agents lots to talk about, but never really made an impact. We had a few sleepless nights waiting to find out if the seller would drop our offer and take the competing one that came in, but in the end it all worked out.

On the more science-related side, despite the fact I'm not doing any real work, I've learned a lot, and had the chance to talk about a lot of ideas.

There's been a huge ongoing discussion about the qcal values, or calibrated base call scores that are appearing in Illumina runs these days. It's my understanding that in some cases, these scores are calibrated by looking at the number of perfect alignments, 1-off alignments, and so on, and using the SNP rate to identify some sort of metric which can be applied to identify an expected rate of mismatched base calls. Now, that's fine if you're sequencing an organism that has a genome identical to, or nearly identical to the reference genome. When you're working on cancer genomes, however, that approach may seriously bias your results for very obvious reasons. I've had this debate with three people this week, and I'm sure the conversation will continue on for a few more weeks.
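To make the concern concrete, here is a minimal sketch of that style of calibration. It is not Illumina's actual procedure: the function name, the idea of subtracting a fixed expected variant rate, and all of the numbers are assumptions made up for illustration. It only shows how unexpected real variants end up counted as sequencing errors.

```python
import math


def calibrated_phred(mismatches, aligned_bases, expected_variant_rate):
    """Hypothetical calibration: estimate a Phred-style score from alignments.

    Mismatches that exceed the expected rate of real variants (e.g. SNPs)
    are all attributed to sequencing error.
    """
    observed_rate = mismatches / aligned_bases
    error_rate = observed_rate - expected_variant_rate
    if error_rate <= 0:
        raise ValueError("no mismatches left to attribute to sequencing error")
    return -10 * math.log10(error_rate)


# Made-up numbers: 1,000,000 aligned bases, 1 in 1000 bases expected to be a SNP.
ALIGNED = 1_000_000
SNP_RATE = 0.001

# Sample close to the reference: 1000 real SNPs + 1000 sequencing errors.
q_normal = calibrated_phred(2000, ALIGNED, SNP_RATE)

# Tumour sample: the same 1000 sequencing errors, but 500 extra somatic variants.
q_tumour = calibrated_phred(2500, ALIGNED, SNP_RATE)

print(f"near-reference sample: Q{q_normal:.1f}")
print(f"tumour sample:         Q{q_tumour:.1f}")
```

In this toy setup the sequencer made exactly the same number of errors in both samples, yet the tumour sample is assigned a lower quality score, because its extra real variants are indistinguishable from errors under this scheme.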

In terms of studying for my comprehensive exam, I'm now done the first 12 chapters of Weinberg's "The Biology of Cancer" textbook, and I seem to be retaining it fairly well. My girlfriend quizzed me on a few things last night, and I did reasonably well answering the questions. 6 more days, 4 more chapters to go.

The most interesting part of the studying was Thursday's seminar day. In preparation for the Genome Sciences Centre's bi-annual retreat, there was an all-day seminar series, in which many of the PIs spoke about their research. Incidentally, 3 of my committee members were speaking, so I figured it would be a good investment of my time to attend. (Coincidentally, the 4th committee member was also speaking that day, but on campus, so I missed his talk.)

Indeed - having read so many chapters of the textbook on cancer biology, I was FAR better equipped to understand what I was hearing - and many of the research topics presented picked up exactly where the textbook left off. I also have a pretty good idea what questions they will be asking now: I can see where the questions during my committee meetings have come from; it's never far from the research they're most interested in. Finally, the big picture is coming together!

Anyhow, two specific things this week have stood out enough that I wanted to mention them here.

The first was the keynote speaker's talk on Thursday. Dr. Morag Park spoke about the environment of tumours, and how it has a major impact on the prognosis of the cancer patient. One thing that wasn't settled was why the environment is responding to the tumour at all. Is the reaction of the environment dictated by the tumour, making this just another element of the cancer biology, or does the environment have its own mechanism to detect growths, which is different in each person? This is definitely an area I hadn't put much thought into until seeing Dr. Park speak. (She was a very good speaker, I might add.)

The second item was something that came out of the textbook. They have a single paragraph at the end of chapter 12, which was bothering me. After discussing cancer stem cells, DNA damage and repair, and the whole works (500 pages of cancer research into the book...), they mention progeria. In progeria, children age dramatically quickly, such that a 12-14 year old has roughly the appearance of an 80-90 year old. It's a devastating disease. However, the textbook mentions it in the context of DNA damage, suggesting that the progression of this disease may be caused by general DNA damage sustained by the majority of cells in the body over the short course of the life of a progeria patient. This leaves me of two minds: 1) the DNA damage to the somatic cells of a patient would cause them to lose tissues more rapidly, which would have to be regenerated more quickly, causing more rapid degradation of tissues - shortening telomeres would take care of that. This could cause a more rapid aging process. However, 2) the textbook just finished describing how stem cells and rapidly reproducing progenitor cells, which are the precursors involved in tissue repair, are dramatically more sensitive to DNA damage. Wouldn't it be more likely then that people suffering with this disease are actually drawing down their supply of stem cells more quickly than people without DNA repair defects? All of their tissues may also suffer more rapid degradation than normal, but it's the stem cells which are clearly required for long term tissue maintenance. An interesting experiment could be done on these patients requiring no more than a few milliliters of blood - has their CD34+ ratio of cells dropped compared to non-sufferers of the disease? Alas, that's well outside of what I can do in the next couple of years, so I hope someone else gives this a whirl.

Anyhow, just some random thoughts. 6 days left till the exam!


Friday, November 28, 2008

It never rains, but it pours...

Today is a stressful day. Not only do I need to finish my thesis proposal revisions (which are not insignificant, because my committee wants me to focus more on the biology of cancer), but we're also in the middle of real estate negotiations. Somehow, this is more than my brain can handle on the same day... At least we should know by 2pm if our counter-offer was accepted on the sales portion of the transaction, which would officially trigger the countdown on the purchase portion of the transaction. (Of course, if it's not accepted, then more rounds of offers and counter-offers will probably take place this afternoon. WHEE!)

I'm just dreading the idea of doing my comps the same week as trying to arrange moving companies and insurance - and the million other things that need to be done if the real estate deal happens.

If anyone was wondering why my blog posts have dwindled down this past couple of weeks, well, now you know! If the deal does go through, you probably won't hear much from me for the rest of this year. Some of the key dates this month:
  • Dec 1st: hand in completed and reviewed Thesis Proposal
  • Dec 5th: Sales portion of real estate deal completes.
  • Dec 6th: remove subjects on the purchase, and begin the process of arranging the move
  • Dec 7th: Significant Other goes to Hong Kong for ~2 weeks!
  • Dec 12th: Comprehensive exam (9am sharp!)
  • Dec 13th: Start packing 2 houses like a madman!
  • Dec 22nd: Hannukah
  • Dec 24th: Christmas
  • Dec 29th: Completion date on the new house
  • Dec 30th: Moving day
  • Dec 31st: New Years!
And now that I've procrastinated by writing this, it's time to get down to work. I seem to have stuff to do today.


Saturday, November 22, 2008

Is medicine ready for 2nd Generation Genomics?

Yesterday was the second day of the 10th Annual B.C. Cancer Conference, which draws in researchers, practitioners and cancer survivors from around BC and the world. It also draws in a lot of drug companies, but that's somewhat beside the point.

One particular part of the conference caught my eye as a must see: the Cancer Genetics Laboratory Open House. This event was a set of posters and hands-on demonstrations for how genetics can be used to help cure cancer. Since that's essentially my thesis project, I figured I absolutely had to attend it.

Unfortunately, I was rather disappointed in the scope of the work they do, though not because they do poor work, but rather that my expectations were too high. For breast cancer, they only screen people who have a family history of breast cancer, and even then they only look for two markers in BRCA1 and BRCA2. That's not a bad thing, though - those two genes make up a significant portion of the hereditary breast cancer risk for women. What surprised me was how reactive the technology was - the lab only screens patients with a high likelihood of carrying the mutant genes, and only patients referred to them by physicians who suspect the familial disease association. Again, this is pretty standard, so there's no criticism meant.

However, what concerns me is whether these people will be ready for the onslaught of information that's about to hit them. In a couple of years, genetics researchers will be handing off the simultaneous testing of tens of thousands of genes, gene splicing defects, and complete analyses of transcriptomes/genomes, which are currently being done in the lab, as we speak. This is a far cry from doing PCR on two genes to look for SNPs, with a massive technology gap in between.

The Cancer Genetics Lab appears to have 15 doctors and technologists, which is a small staff to support a whole city of several million people, let alone the whole province. I have to wonder if any of them have any experience with 2nd-generation sequencing, seriously high throughput genetic screens, or even the concept of how to do genetic counselling about risk factors in the "whole genome diagnostics" age. Of course, I didn't spend too long at the session, so I don't know if anyone there is an expert; however, the few people I spoke to weren't really talking about the upcoming changes to their discipline, so I'm a little skeptical.

In any case, I do have to wonder how this Pandora's box of information we're unleashing about the makeup of patients' cells and heredity will affect the downstream medical practitioners, and how well they are prepared to deal with it. Are there seminars to bring these people up to speed on what's coming at them? Are the agencies ready to shell out the money for the infrastructure they'll need? Are the people writing the textbooks that educate these people including chapters on the subject?

It's all nice that I talk about trying to understand how a cancer works in the lab through 2nd-generation sequencing, but I have to wonder what we should be doing in the meantime to prepare them for the firehose of information that we're going to point at them and let loose. The personal genomics revolution is poised to land on these people like a ton of bricks, and with about as much mercy.

Then again, lest we be smug about it, how many of us are writing aligners for the SMRT sequencing that's already on the horizon in our own field? Preparing for the future is definitely hard when we're still coping with the present. I'm sure it's no different for the Hospital Genetics Labs, even if they're 15 years behind the cutting edge.


Tuesday, November 18, 2008

Dancing your research.

I heard a new expression the other day, apparently credited to Steve Martin,
"Talking about music is like dancing about architecture."
I laughed for a few minutes, and then realized it wasn't so silly, after all. Dance is a pretty powerful form of communication, even if I personally couldn't communicate anything other than a broken toe through that medium.

Still, not only can you dance about architecture, you can also dance about science. I received an email advertising the second year of the AAAS Science Dance Contest, which has just closed. I'll admit I read all about last year's competition, but I didn't personally watch the tapes. This year, with YouTube playing an important role, we'll all be able to judge for ourselves just how effectively dance can be used to describe the universe.

It's just too bad that my research isn't on Honey Bees, or other odd bee shaped things. How do you express a SNP through dance?


Wednesday, September 17, 2008

Math as art

I came across an article on the BBC website about the art of maths, which is well worth the two minutes or so it takes to play. The images are stunning (I particularly like the four dimensional picture in two dimensions), and the narration is quite interesting. As a photographer, I especially enjoyed the comparison of math as art to photography as an art. My own take is that both math and photography share the common artistic element of forcing the artist to determine how to best express what it is they're seeing. Two people can see the same thing, and still get very different photos, which is also a component of expressing math in an artistic manner.

That got me thinking about genomics as art as well. I'm aware of people who've made music out of DNA sequences, but not visual art. Oh, sure, you can mount a karyotype or electrophoresis image on your wall, and it's pretty, but I don't think genomics has realized the potential for expressing DNA form and function in a non-linear manner yet.

Still, it's obviously on its way. Every time I go to a presentation, I see a few more "pretty" graphs. Or maybe I've just gone to too many seminars when a graph of the clustering of gene arrays starts to look like a Mondrian picture. Who knows... maybe ChIP-Seq will start looking like that too? (=


Wednesday, July 9, 2008

RFC 1925

I just came across a wonderful RFC, dating back to 1996, that is well worth the read. In light of the problems I'm facing getting my MAQ .map file reader to work, I figured this is a great comment on my frustration with undocumented file formats.

Although written as The Twelve Networking Truths, I think it applies to writing code in general, and likely far more broadly as well.


Friday, June 15, 2007

Thought for the day

A while back, someone mentioned to me that human beings left Africa roughly 40,000 years ago or so, marking the start of Homo sapiens as a species. That works out to about 2000 generations. In evolutionary terms, that's pretty much nothing. If you apply the same thing to E. coli, the scientist's workhorse bacterium, which doubles every 20 minutes or so, you'd get about 27 days of recorded history. That is to say, if E. coli suddenly developed intelligence today, by sometime in mid July, they'd have 2000 generations. Recorded history is roughly a quarter of that, meaning that some time next week, they'd have kept records over as many generations as we have.
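For anyone who wants to check the back-of-the-envelope numbers, here's a quick sketch in Python. The 20-year human generation time is my assumption (it's what 40,000 years and 2000 generations imply), and the function names are just for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def generations(years, years_per_generation):
    """Number of generations that fit into a span of years."""
    return years / years_per_generation


def days_for_generations(n_generations, doubling_minutes):
    """Days a microbe needs to pass through n generations."""
    return n_generations * doubling_minutes / MINUTES_PER_DAY


# Assumed inputs: 40,000 years since leaving Africa, 20 years per human
# generation, and a 20 minute doubling time for E. coli.
human_generations = generations(40_000, 20)
ecoli_days = days_for_generations(human_generations, 20)

# Recorded history taken as roughly a quarter of that span.
recorded_history_days = ecoli_days / 4

print(f"human generations: {human_generations:.0f}")
print(f"E. coli equivalent: {ecoli_days:.1f} days")
print(f"recorded history equivalent: {recorded_history_days:.1f} days")
```

The 2000 generations come out to a little under 28 days of E. coli doublings, and the "recorded history" quarter to about a week.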

If it takes bacteria a few years just to develop resistance to an antibiotic (tens of thousands of generations), how long will it take humans to change in any noticeable way?