
Friday, February 5, 2010

GFF3 undocumented feature...

Earlier today, I tweeted:
Does anyone know how to decypher a diBase GFF3 file? They don't identify the "most abundant" nucleotide uniquely. seems useless to me.
Apparently, there is a solution, albeit undocumented:

The attribute "genotype" contains an IUB code that is limited to either a single-base or a double-base annotation (e.g., it should not contain H, B, V, D or N, but may contain R, Y, W, S, M or K), which then allows you to subtract the "reference" attribute (which must be canonical) from the "genotype" attribute IUB code to obtain the new SNP - but only when the "genotype" attribute is not a canonical base.

If only that were documented somewhere...

UPDATE: Actually, this turns out not to be the case at all -- there are still positions for which the "genotype" attribute is an IUB code, and the reference is not one of the called bases. DOH!
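
For what it's worth, the subtraction rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `called_snp` and the lookup table are hypothetical (they are not part of any GFF3 tool or specification), and the table covers only the standard one- and two-base IUB/IUPAC codes. It returns None for the cases noted in the update, where the reference is not one of the called bases.

```python
# Standard IUB/IUPAC nucleotide codes, limited to one or two bases.
IUB_TO_BASES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "W": {"A", "T"},
    "S": {"C", "G"}, "M": {"A", "C"}, "K": {"G", "T"},
}
CANONICAL = {"A", "C", "G", "T"}


def called_snp(reference, genotype):
    """Hypothetical helper: subtract the reference base from a two-base
    genotype code. Returns the remaining base, or None when the rule
    does not give a unique answer."""
    ref = reference.upper()
    bases = IUB_TO_BASES.get(genotype.upper())
    if ref not in CANONICAL or bases is None:
        raise ValueError("expected a canonical reference and a 1- or 2-base IUB code")
    if len(bases) == 1:
        # Genotype is already a canonical base: the rule does not apply.
        return None
    remaining = bases - {ref}
    if len(remaining) != 1:
        # Reference is not one of the called bases (the case from the update).
        return None
    return next(iter(remaining))
```

With this sketch, a reference of A and a genotype of R (A or G) yields G, while a reference of C and a genotype of R yields None, which is exactly the ambiguous situation the update complains about.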


Thursday, February 4, 2010

Why do you blog - part II?

In the first part, I answered the generic "why do you blog" questions. In this second part, I want to address one of the questions Heather Etchevers asked, because it really is the core of why we blog:

Who are you blogging for/who are you talking to?

After much soul searching, I have to give the answer that probably lies behind all blogs: I blog for myself. If I didn't get something out of it, I wouldn't be doing it. That doesn't mean there's nothing altruistic about the time I invest - there are people who find some of the information I post to be useful. And it makes me happy when I get a comment that tells me that something I posted made someone think, answered a question or just helped them get their computer configured. Yes, I (unashamedly) love to help people, and that is what I get out of blogging.

The less subtle question implied is "who do you think your target audience is?" As to that, I have to admit, I'm not sure. There are several distinct groups who might find information I post to be useful:
  1. People who do Chip-Seq may enjoy the posts on FindPeaks
  2. Next Generation Sequencing related posts may have a broader audience of scientists in the field
  3. People who use Linux probably enjoy the Ubuntu related posts
  4. Grad Students might find my school related posts to be insightful (maybe?)
  5. Anyone who enjoys art might find some of my science art to be unique.
And yes, what that should tell you is that I have a wide, diverse audience. I would suggest that many of the groups above are non-overlapping, so at any one time, I'm probably boring 80% of my audience.

That is the core of the "why am I blogging" question: who am I writing for? Between now and the time I move my blog over to NN (Yes, I think that's where I'm headed), I'm going to try to narrow it down a bit. Some decisions are fairly easy: I'm probably going to drop my Linux-related posts (there are better forums, and I already participate in them), and the art/photography is already on the wane. The Grad School posts will probably accelerate for about a year (hopefully) and then tail off completely. That should provide a little more focus in the wake of my scholastic adventures, assuming I can continue blogging once I leave academia - I'll cross that bridge when I get to it.

So, why do I blog? Because I enjoy the conversations and the community. As long as people are reading what I have to say, as long as I get the occasional comment, and as long as there is a reason to keep talking, I'll keep blogging.

Ah... Clarity. (=


Why do you blog - part I?

In light of recent events, this is a question I've had to ask myself: why am I blogging, and is it worth continuing? Actually, it's not hard to answer, but worth returning to periodically.

Since I've been thinking about it quite a bit recently, it's no surprise that other bloggers' posts on the same topic are of interest to me. One of the blogs I read quite often belongs to Heather Etchevers, and she has an interesting take on it. It's also worth noticing the link at the top of her post to a discussion on it, as well as the answers from other bloggers.

Anyhow, I thought I'd take a stab at the questions myself.

1. What is your blog about?

My blog is generally about anything related to next-generation sequencing, the open source science development I do and my journey through grad studies. Anything that catches my eye that's related (sometimes tenuously) to one of those is fair game.

2. What will you never write about?

Anyone who hasn't explicitly agreed to be a part of my blog. I am fantastically lucky to have wonderful people in my life, but their participation in my life isn't consent to being included in anything I write.

3. Have you ever considered leaving science?

Yep - After leaving my start-up company, I briefly toyed with the idea of doing other things and just starting fresh in another field. In the end, my love of science won out.

4. What would you do instead?

Oddly enough, I'd probably have done photography. I'm content to let it be a hobby, for now, but travel photography is really a passion of mine, and I'd love to do more of it. Incidentally, I started the blog about the same time, because I had initially intended to use it to display my pictures. Interesting how things work out...

5. What do you think science blogging will be like in 5 years?

Not all that different - just a lot more condensed. Twitter is becoming an alternative to blogging, and I think the two will converge somewhere for most people.

6. What is the most extraordinary thing that happened to you because of blogging?

Wow... that's tough. Meeting all the really cool people I have has been an incredible bonus that I never expected. The fact that people read my blog at all never ceases to amaze me. Anytime I'm at a conference and someone recognizes my name, I'm thrilled - and that's more than extraordinary enough for me. (When they actually pronounce it correctly, it blows my mind.)

7. Did you write a blog post or comment you later regretted?

Of course... but most of those were done early on in my blogging days, on a blog that's no longer visible (thank goodness!). I've pissed off friends, insulted people, and even annoyed people in my own lab. My first blog taught me a LOT about what not to do online. I hope I've learned most of those lessons.

8. When did you first learn about science blogging?

Long after I started posting science on my blog, really. People started telling me that I should take a look at other blogs, and the more I read, the more I discovered there was a community out there.

9. What do your colleagues at work say about your blogging?

Not much, really. Occasionally, one of them will comment on something I wrote, or offer me advice on something I've discussed, but for the most part, it doesn't come up much in conversation. Although, there is the "blog effect", where people around you suddenly know things going on in your life/research that you are sure you didn't tell them. It's somewhat creepy, and it has taught me not to tell stories to people who read my blogs - they already know what I have to say on some topics.

10. How the heck do you have time to blog and do research at the same time?

Code, commit, run, wait... wait.... (blog)... wait... RESULTS!


Tuesday, February 2, 2010

The end is near...

Well, here we are, nearly 350 posts into my blog, and I have to say, I think it's coming down to the end. This isn't something I was contemplating until about 15 minutes ago, so it's a bit of a surprise to me. I should start at the beginning, however, and explain what's going on.

About 6 months ago, I received an invitation to join the Nature Networks blogs, which really appeals to me, in that they have a fantastic community going over there. There are a lot of advantages to participating in such a neat group of people who all share a common interest in science. So, pending a few changes at NN, I thought I'd move my blog over there at some point, or at the very least, associate my blog with my NN account.

And then, just this morning, I got an email from Blogger/Google letting me know that they are going to drop support for FTP-published blogs. I have until the end of March to step in line and move my blog to their server, because supporting FTP-based blog publishing is no longer cost-effective for them. Unfortunately, the only reason I had picked Blogger in the first place was because they supported FTP publishing, which leaves me somewhat in the lurch.

Regardless of what I do next, this blog will have to go through some major changes in the next month. I can see three changes that might solve the issue:
  1. change back-ends and migrate away from blogger (eg. try out wordpress)
  2. change servers, and set up a "custom domain" (eg, use blogger's servers)
  3. move my blog to a completely new venue (eg, start fresh with Nature Blogs.)
Each of the above options will take a lot of work. At the very least, I'll have to archive the current fejes.ca blogs and comments, and then likely be forced to turn off commenting unless I pick option #2. However, with that said, I'm thinking it's time to just bite the bullet and move on to a completely new blog, and Nature Blogs does seem like a good place. For the moment, it is the most appealing option to me.

On the bright side, if I do start a new blog, maybe I can come up with a slightly easier to say name. (Yes, Fejes is actually pronounced as "fey-esh".) Unfortunately, on short notice, the best new blog name I can come up with is "Blog-seq". If anyone has better ideas, I'm definitely open to them. (=

Well, here's to the closing of one chapter, and opening a few new doors... I don't know where this will take me, but as always the journey should be fun!


Wednesday, January 27, 2010

Nonviolent Communication by Marshall B. Rosenberg

I recently finished reading Nonviolent Communication by Marshall B. Rosenberg, and thought it would be worth putting forth a few comments.

It was recommended to me by a colleague who pointed out that my use of language on my blog (and perhaps in person) was "confrontational." At least, that's how I'll paraphrase their comment. I admit, I'm often a fan of hyperbole and metaphor, and I like having an "in your face" style of writing. I've always said that I will back down if I'm wrong - just as long as you can show me where I'm wrong, which can be construed as an aggressive way to go through life, and was probably what prompted my colleague to raise their concerns.

Anyhow, I took this book out from the library a couple weeks ago, and I've been slowly digesting it. I hadn't realized that there was a version available online (see the link above), but I'm glad it's there for future reference. I had to return the book last night, and ended up reading the last 60 pages in a rush.

With that said, I should probably make my first comment on the book: I skipped a few paragraphs here and there. My overall feeling of the book is that it would be well delivered as a motivational talk, but the translation to book format left me feeling unimpressed with the style. I often felt like I was reading a transcript from a motivational speaker - which is not quite the same as seeing it in person. That's not a comment on the contents - just that I felt that a book is really not the ideal medium for this particular message.

That said, the contents were interesting: The author clearly knows what he's talking about and is able to walk you slowly through the process. When distilled to its bare minimum, you can divide language into good and bad methods of communication.

The good:
  1. Speaking of Observations to express facts.
  2. Describing Feelings to express impact of observations
  3. Describing Needs, which underlie feelings and impact of observations
  4. Making Requests to indicate what you would like to happen.
The Bad:
  1. Making Judgments or interpretations of observations that are not neutral
  2. Inappropriately assigning Blame for actions, muddling motivations or creating scapegoats for actions.
  3. Being Unclear or Vague about Observations, Needs, Feelings and Requests. (If you don't get it right, you're not any better off than you were before.)
  4. Making Demands: no one enjoys being told what to do.
Of course, the author is never this concise about what he's trying to teach you, and the above is my own interpretation. What the author does, instead, is walk you through a myriad of examples of each one, showing it in theory, in abstract, in practice and in a situation you might encounter. Overall, it's helpful to have these examples and they are really the best reason to sit down and read the book. I found myself skipping over much of the discussion to zoom into his "anecdotes", which were really informative and entertaining.

That said, I'm not going to claim to have distilled out all of the value of this book. In fact, my breakdown of the book above probably won't make much sense without the context in which the author places them. There are also a lot of tips scattered throughout the book that are very useful for improving your communication skills. My favorite is "when you would like someone to change their behaviors, tell them what to do, not what not to do." Another enjoyable part of the book is the chapter where he reflects on how to turn his communication style inward to look at how we do our internal communication, which was really insightful to me as well.

One other thing that I need to clearly state about this book is that I felt the book was really just scratching the surface of its topic. The constant and underlying message in this book has to do with communication through empathizing with people around you - how to be a good listener and to understand what people are telling you. Unfortunately, I felt that there was a lot more to this element of his method than what the author was willing to discuss. As a scientist, I always love the gory details of how the mind works and would have enjoyed a bit more depth on the subject. Overall, I'm left with the impression that the author believes that if you apply the communication style, you'll learn to be more empathetic and to express your emotions more clearly - and understand those of the people around you better. I often found myself wondering if the opposite approach might be more effective, although I can see how that would be much more difficult to encapsulate into a book.

Anyhow, to conclude, I think this book is worth a read. I wasn't a fan of the style of the book, and found the author dwelled too long on each point, but I found it to be an insightful and helpful book overall. I might even rank it as inspirational, within its genre.

How much did I get out of it? That's a good question. I'll start blogging again this week, and people are more than welcome to comment on whether they notice a change in my tone. (-:


Sunday, January 24, 2010

Biopartnering North and a short break

First off, if anyone is going to BioPartnering North 2010 this week in Vancouver, I'll be there, and would be very happy to talk genomics/biotech and business with you. I was lucky enough to have been found worthy of one of the coveted BIOTECanada bursaries to attend the event, and I plan to get as much out of it as I can. I'll be at the reception tonight, and undoubtedly I'll be around throughout the next few days. (And, if you were wondering, I won't be blogging any talks from BPN.)

Second, I'm pretty sure everyone has noticed that my blogging output has dropped significantly since December, for which there are several good reasons. The first is that I've been quite busy. My personal life is now occupied by event planning, while my work life has been dominated by several major projects, about which I will undoubtedly be "ranting" in posts in the near future.

However (and thirdly), the other reason I've not been blogging much is that I also had a conversation with a colleague in December about effective communications. He suggested I read a book on "Non-violent communication." I'm working my way through it slowly, and have taken a few suggestions to heart. It's always possible to become a better communicator and, to that end, I'm on a small hiatus while I re-evaluate my use of language. It won't last long - I like having a blog and I'm already itching to write a few more posts, but it's an opportunity to do some personal development.


Friday, January 15, 2010

Symposium: Advances in Bioinformatics and Genomics - Feb 19, 2010

I just came across a forum that I hadn't heard of before: The 2nd Advances in Bioinformatics and Genomics Symposium, being held in the San Francisco/Bay area in February. Unfortunately, like AGBT, it's been scheduled to overlap with the Olympics. Now, normally I don't care about things overlapping with Olympic events, but this year travel in and out of Vancouver will be a nightmare, and I'm not willing to go through that twice. (Especially for a single day symposium.)

Regardless, if people are leaving from other destinations, or already happen to be in the bay area, this "open access" conference sounds pretty neat. The full schedule isn't up yet - and I have to admit I don't know either of the keynote speakers (my ignorance of the field, I'm sure), but the summary seems to fit my interests pretty well - and likely those of other people who read my blog.

Here's the link: http://meta-x.com/advancesbioinformatics/

There's always next year, I suppose. (=


Thursday, January 14, 2010

How to be a better Programmer: Tactics.

I'm a bit too busy for a long post, but a link was circulating around the office that I thought was worth passing on to any bioinformaticians out there.

http://dlowe-wfh.blogspot.com/2007/06/tactics-tactics-tactics.html

The article above is on how to be a better programmer - and I wholeheartedly agree with what the author proposed, with one caveat that I'll get to in a minute. The point of the article is that learning to see the big picture (not specific skills) will make you a better programmer. In fact, this is the same advice Sun Tzu gives in "The Art of War", where understanding the terrain, the enemy, etc. are the tools you need to be a better general. [This would be in contrast to learning how to wield each weapon, which would only make you a better warrior.] Frankly, it's good advice, and this leads you down the path towards good planning and clear thinking - the keys to success in most fields.

The caveat, however, is that there are times in your life where this is the wrong approach: i.e., grad school. As a grad student, your goal isn't to be great at everything you touch - it's to specialize in some small corner of one field, and tactics are no help here. If grad school existed for Ninjas, the average student would walk out being the best (pick one of: poisoner/dart thrower/wall climber/etc.) in the world - and likely knowing little or nothing about how to be a real ninja beyond what they learned in their Ninja undergrad. Tactics are never a bad investment, but they aren't always what is being asked of you.

Anyhow, I plan to take the advice in the article and to keep studying the tactics of bioinformatics in my spare time, even though my daily work is more on the details and implementation side of it. There are a few links in the comments of the original article to sites the author believes are good comp-sci tactics... I'll definitely be looking into those tonight. Besides, when it comes down to it, the tactics are really the fun parts of the problems, although there is also something to be said for getting your code working correctly and efficiently.... which I'd better get back to. (=

Happy coding!


Tuesday, December 22, 2009

Link Roundup Returns - Dec 16-22

I've been busy with my thesis project for the past couple weeks, which I think is understandable, but all work and no play kinda doesn't sit well with me. So, over the weekend, I learned Go, Google's new programming language, and wrote myself a simple application for keeping track of links - and dumping them out in a pretty HTML format that I can just cut and paste into my blog.

While I'm not quite ready to release the code for my little go application, I am ready to test it out. I went back through the last 200 twitter posts I have (about 8 days worth), and grabbed the ones that looked interesting to me. I may have missed a few, or grabbed a few less than thrilling ones. It's simply a consequence of me skimming some of the articles less well than others. I promise the quality of my links will be better in the future.

Anyhow, this experiment gave me a few insights into the process of "reprocessing" tweets. The first is that my app only records the person from whom I got the tweet - not the people from whom they got it. I'll try to address that in the future. The second is that it's a very simple interface - and a lot of things I wanted to say just didn't fit. (Maybe that's for the better... who knows.)
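
Since the Go code isn't released, here is a rough Python sketch of what a link tracker of this kind might do: group links by category and dump them as an HTML list in the "description - Link (via @user)" shape used in the roundup. Everything here is an assumption for illustration: the function name `render_roundup`, the tuple layout, the choice of HTML tags and the sort order are all hypothetical and say nothing about how the actual application works.

```python
from collections import defaultdict
from html import escape


def render_roundup(links):
    """Hypothetical formatter. `links` is an iterable of
    (category, description, url, via) tuples; returns an HTML string."""
    by_category = defaultdict(list)
    for category, description, url, via in links:
        by_category[category].append((description, url, via))

    parts = []
    # Assumption: categories alphabetical, entries ordered by the "via" handle.
    for category in sorted(by_category):
        parts.append(f"<p>{escape(category)}:</p>")
        parts.append("<ul>")
        for description, url, via in sorted(by_category[category], key=lambda e: e[2]):
            parts.append(
                f"  <li>{escape(description)} - "
                f'<a href="{escape(url, quote=True)}">Link</a> '
                f"(via @{escape(via)})</li>"
            )
        parts.append("</ul>")
    return "\n".join(parts)
```

Recording only one "via" handle per link mirrors the limitation described above; tracking the full retweet chain would need a list in place of the single `via` field.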

Regardless (or irregardless, for those of you in the U.S.) here are my picks for the week.

Bioinformatics:
  • Bringing back Blast (Blast+) (PDF) - Link (via @BioInfo)
  • Incredibly vague advice on how to become a bioinformatician - Link (via @KatherineMejia)
  • Cleaning up the Human Genome - Link (via @dgmacarthur)
  • Neat article on "4th paradigm of computing: exaflood of observational data" - Link (via @genomicslawyer)

Biology:
  • Gene/Protein Annotation is worse than you thought - Link (via @BioInfo)
  • Why are Europeans white? - Link (via @lukejostins)

Future Technology:
  • D-Wave Surfaces again in discussions about bioinformatics - Link (via @biotechbase)
  • Changing the way we give credit in science - Link (via @genomicslawyer)

Off topic:
  • On scientists getting quote-mined by the press - Link (via @Etche_homo)
  • Give away of the best science cookie cutters ever - Link (via @apfejes)
  • Neat early history of the electric car - Link (via @biotechbase)
  • Wild (inaccurate and funny) conspiracy theories about the Wellcome Trust Sanger Institute - Link (via @dgmacarthur)
  • The Eureka Moment: An Interview with Sir Alec Jeffreys (Inventor of the DNA Fingerprint) - Link (via @dgmacarthur)
  • Six types of twitter user (based on The Tipping Point) - Link (via @ritajlg)

Personal Medicine:
  • Discussion on mutations in cancer (in the press) - Link (via @CompleteGenomic)
  • Upcoming Conference: Personalized Medicine World Conference (Jan 19-20, 2010) - Link (via @CompleteGenomic)
  • deCODEme offers free analysis for 23andMe customers - Link (via @dgmacarthur)
  • UK government waking up to the impact of personalized medicine - Link (via @dgmacarthur)
  • Doctors not adopting genomic based tests for drug suitability - Link (via @dgmacarthur)
  • Quick and dirty biomarker detection - Link (via @genomicslawyer)
  • Personal Genomics article for the masses - Link (via @genomicslawyer)

Sequencing:
  • Paper doing the rounds: Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data - Link (via @BioInfo)
  • Archiving Next Generation Sequencing Data - Link (via @BioInfo)
  • Epigenetics takes aim at cancer and other illnesses - Link (via @BioInfo)
  • (Haven't yet read) Changing economics of DNA Synthesis - Link (via @biotechbase)
  • Genomic players for investors. (Very light overview) - Link (via @genomicslawyer)
  • Haven't read yet: Recommended review of 2nd and 3rd generation seq. technologies - Link (via @nanopore)
  • De novo assembly of Giant Panda Genome - Link (via @nanopore)
  • Wellcome Trust summary of 2nd Gen sequencing technologies - Link (via @ritajlg)


Thursday, December 17, 2009

One lane is (still) not enough...

After my quick post yesterday where I said one lane isn't enough, I was asked to elaborate a bit more, if I could. Well, I don't want to get into the details of the experiment itself, but I'm happy to jump into the "controls" a bit more in depth.

What I can tell you is that with one lane of RNA-Seq (Illumina data, 50bp), all of the variations I find show up either in a known polymorphism database or as somatic SNPs, with a few exceptions. The few exceptions just turn out to be exceptions for lack of coverage.

For a "control", I took two data sets (from two separate patients) - each with 6 individual lanes of sequencing data. (I realize this isn't the most robust experiment, but it shows a point.) In a perfect world, each of the 6 lanes per person would have sampled the original library equally well.

So, I matched up one lane from each patient into 6 sets and asked the question: How many transcripts are void (less than 5 tags) in one sample and at least 5x greater in the other sample? (I did this in both directions.)

The results aren't great. In one direction, I see an average of 1245 Transcripts (about 680 genes, so there's some overlap amongst the transcript set) with a std dev. of 38 Transcripts. That sounds pretty consistent, till you look for the overlap in actual transcripts: avg 27.3 with a std dev of 17.4. (range 0-60). And, when we do the calculations, the most closely matched data sets only have a 5% overlap.

The results for the opposite direction were similar: Average of 277 transcripts found that met the criteria (std.dev of 33.61), with an average overlap between data sets being 4.8, std. dev 4.48. (range of 0-11 transcripts in common.) The best overlap in "upregulated" genes for this dataset was just over 4% concordance with a second pair of lanes.
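
To make the comparison concrete, here is a minimal Python sketch of the kind of calculation described above. It is not the code I ran, and it rests on assumptions that are labeled as such: the names `one_sided_transcripts` and `pairwise_overlaps` are hypothetical, per-lane data is assumed to be a dict of transcript ID to tag count, a zero count is floored to 1 so that "5x greater" still means something, and overlap is assumed to be counted between every pair of lane sets.

```python
from itertools import combinations

MIN_TAGS = 5  # fewer tags than this counts as "void"
FOLD = 5      # required fold difference in the other sample


def one_sided_transcripts(void_counts, other_counts):
    """Transcripts that are void in the first sample and at least
    FOLD x greater in the second. Inputs map transcript ID -> tag count."""
    hits = set()
    for transcript, other in other_counts.items():
        void = void_counts.get(transcript, 0)
        # Assumption: floor the void count at 1 so a zero-tag transcript
        # still needs at least FOLD tags in the other sample to qualify.
        if void < MIN_TAGS and other >= FOLD * max(void, 1):
            hits.add(transcript)
    return hits


def pairwise_overlaps(hit_sets):
    """Number of shared transcripts for every pair of lane sets."""
    return [len(a & b) for a, b in combinations(hit_sets, 2)]


# Toy data only: two matched lane pairs (patient 1 lane, patient 2 lane).
lane_pairs = [
    ({"t1": 0, "t2": 3, "t3": 40}, {"t1": 12, "t2": 15, "t3": 41}),
    ({"t1": 6, "t2": 1, "t4": 0}, {"t1": 30, "t2": 5, "t4": 2}),
]
hit_sets = [one_sided_transcripts(a, b) for a, b in lane_pairs]
overlaps = pairwise_overlaps(hit_sets)
```

Swapping the two arguments of `one_sided_transcripts` gives the opposite direction, and the mean and standard deviation of `overlaps` correspond to the overlap figures quoted above.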

So, what this tells me (for a VERY dirty experiment) is that, for genes expressed at the low end, the expression observed in one lane is highly variable depending on the lane. (Sampling at the high end is usually pretty good, so I'm not too concerned about that.)

What I haven't answered yet is how many lanes is enough. Alas, I have to go do some volunteering, so that experiment will have to wait for another day. And, of course, the images I created along the way will have to follow later as well.
