Thanks for visiting my blog. I have now moved to a new location at Nature Networks (URL: http://blogs.nature.com/fejes). Please come visit my blog there.

Thursday, August 6, 2009

New Project Time... variation database

I don't know if anyone out there is interested in joining in, but I'm starting work on a database that will store all of the SNPs and other variations that turn up in any data set collected at the institution (or at least the subset of data sets I have the right to harvest SNPs from). It will be part of the Vancouver Short Read Analysis Package and, of course, will be available to anyone allowed to look at GPL code.

I'm currently on my first pass (consider it version 0.1), but I already have some basic functionality assembled. At the moment it uses a built-in SNP caller to identify positions with variations and sends them directly into a PostgreSQL database, but I will shortly be building tools that allow SNPs from any SNP caller to be migrated into the database.
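To make the idea of a caller-independent variation store a little more concrete, here is a minimal sketch of how such a database could be laid out and loaded. Everything in it is an assumption for illustration: the two-table schema (dataset, variation), its column names, the tab-delimited import layout (chrom, pos, ref, alt, depth), and the function names open_db, import_calls and datasets_with_variant are hypothetical and are not the package's actual schema or API. The project itself targets PostgreSQL; Python's built-in sqlite3 module is swapped in here only so the sketch runs standalone.

```python
import sqlite3

# Hypothetical schema: one row per data set, one row per observed variant.
# sqlite3 stands in for PostgreSQL so this runs without a server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dataset (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS variation (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    chrom      TEXT    NOT NULL,
    pos        INTEGER NOT NULL,   -- 1-based position (assumed convention)
    ref        TEXT    NOT NULL,   -- reference allele
    alt        TEXT    NOT NULL,   -- observed allele; text leaves room for indels later
    depth      INTEGER,            -- read coverage at the site, if the caller reports it
    PRIMARY KEY (dataset_id, chrom, pos, alt)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the variation store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def dataset_id(conn, name):
    """Return the id for a data set, creating its row on first use."""
    conn.execute("INSERT OR IGNORE INTO dataset (name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM dataset WHERE name = ?", (name,)).fetchone()
    return row[0]


def import_calls(conn, name, lines):
    """Load calls from any SNP caller once converted to the hypothetical
    tab-delimited layout chrom, pos, ref, alt, depth. Returns rows parsed."""
    ds = dataset_id(conn, name)
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom, pos, ref, alt, depth = line.split("\t")
        rows.append((ds, chrom, int(pos), ref.upper(), alt.upper(), int(depth)))
    with conn:  # commit the whole batch as one transaction
        conn.executemany(
            "INSERT OR IGNORE INTO variation VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def datasets_with_variant(conn, chrom, pos, alt):
    """Meta-analysis style query: which data sets share a given variant?"""
    cur = conn.execute(
        "SELECT d.name FROM variation v JOIN dataset d ON d.id = v.dataset_id "
        "WHERE v.chrom = ? AND v.pos = ? AND v.alt = ? ORDER BY d.name",
        (chrom, pos, alt))
    return [r[0] for r in cur]


if __name__ == "__main__":
    conn = open_db()
    import_calls(conn, "tumour_lib1", ["#chrom\tpos\tref\talt\tdepth",
                                       "chr1\t12345\tA\tG\t18"])
    import_calls(conn, "normal_lib2", ["chr1\t12345\tA\tG\t9",
                                       "chr2\t500\tC\tT\t4"])
    print(datasets_with_variant(conn, "chr1", 12345, "G"))
    # prints ['normal_lib2', 'tumour_lib1']
```

The point of keeping the import format separate from any particular caller is that each new SNP caller only needs a small converter to that layout; the composite primary key makes re-importing the same calls for a data set harmless.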

Anyhow, I'm just putting it out there: this could be a useful resource for people interested in meta-analysis, and particularly for those who might be interested in collaborating to build a better mousetrap. (=


2 Comments:

Blogger Will said...

Fascinating, and I am most definitely interested in the results, if not the process of building the database. I'm in the middle of a sort of "bioinformatics for babies" project looking at calling SNPs in some of our data sets. It would be great to have a database format with some set of tools (Java API?) for manipulating and curating the data.

Let me also put in a quick promo for including some kind of compatibility with indel calls. I know the buggers are a pain in the ass to call properly, but it's going to get easier and easier as short reads become medium reads become long reads.

August 11, 2009 7:20:00 AM PDT  
Blogger Anthony Fejes said...

Hi Will,

Your description is pretty good: a database with a Java API (and then miscellaneous other SQL-based tools, as necessary) for curating SNPs. Hopefully it'll catch on; several people have emailed about it already, so it sounds like there's some need for it.

As for indels, we're also looking at those, but adding them will be phase 3. We don't yet have great gapped aligners that handle indels well, so I think it makes much more sense to focus on SNPs first. (=

Cheers,

Anthony

August 11, 2009 7:44:00 AM PDT  

