
Friday, June 20, 2008

MAQ mapview format. - Updated

Well, I promised a quick update on the MAQ mapview format, after I wrote the interpreter for it, but there isn't much to say.

Key bits of information:
  • The file is simply a gzipped text file. (Update: this file is not normally gzipped. While the .map file is compressed, the .mapview file is not normally found in a gzipped state.) If yours is gzipped, you can decompress it with 'gunzip' on a Linux system.
  • Reads are pre-sorted by chromosome, then position.
  • The format is 1-based, so if you're using a 0-based coordinate system, you'll need to convert.
  • Starting points are for the "left end", i.e., regardless of which strand the sequence aligned to, the lowest coordinate of the alignment is reported.
  • Sequences are not contained in this file, but you can retrieve them from the original fastq file. If you do so, you will need to take the reverse complement of any read that maps to the reverse strand before comparing it to your fasta sequence. Forward-strand reads will match the fasta sequence as they are.
  • Most of the fields are not useful for any form of analysis, and what's given is mostly incomprehensible.
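
The coordinate and strand points above can be sketched in a few lines of Python. This is only an illustration, not anything that ships with MAQ: the function names are hypothetical, the strand is assumed to be written as '+' or '-', and the interval assumes an ungapped alignment, so the read covers exactly its own length on the reference.

```python
# Hypothetical helpers for illustration; none of these names come from MAQ.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_zero_based_interval(position, read_length):
    """Convert a 1-based left-end position to a 0-based, half-open interval.

    Assumes an ungapped alignment.
    """
    start = position - 1
    return start, start + read_length


def reference_oriented(read_seq, strand):
    """Orient a fastq read so it can be compared with the fasta reference.

    Assumes the strand column holds '+' or '-'.
    """
    if strand == "-":
        return reverse_complement(read_seq)
    return read_seq
```

For example, a 36bp read reported at position 100 would cover the 0-based interval returned by to_zero_based_interval(100, 36), which is (99, 135), so reference[99:135] is the slice to compare against.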


The best information I had was from the MAQ manpage. In a slightly more readable format:

  1. read name
  2. chromosome
  3. position
  4. strand
  5. insert size from the outer coordinates of a pair
  6. paired flag
  7. mapping quality
  8. single-end mapping quality
  9. alternative mapping quality
  10. number of mismatches of the best hit
  11. sum of qualities of mismatched bases of the best hit
  12. number of 0-mismatch hits of the first 24bp
  13. number of 1-mismatch hits of the first 24bp on the reference
  14. length of the read
  15. read sequence
  16. quality


Most strikingly, you'll notice that 16 fields are listed above, while my file appears to have only 14. It seems not all files have the last two fields. I don't know if it's just the file I have, or if it's usually that way. (Update: Actually, there are normally 16 fields. The files I was given were generated with the mapview -B flag, which strips out some of the information; I believe that is the final two fields, the read sequence and quality. Thus, the comments above reflect the -B flag output only! Thanks to Ryan for catching that!)
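
For anyone writing their own interpreter, here is a rough sketch of a line parser that accepts either layout. It is not the interpreter I wrote, and it rests on a few assumptions: the columns are tab-separated, columns 5 through 14 are integers, and the -B output simply lacks the last two columns. The record type and function name are made up for the example.

```python
from typing import NamedTuple, Optional


class MapviewRecord(NamedTuple):
    """One line of mapview output; field names follow the list above."""
    read_name: str
    chromosome: str
    position: int  # 1-based left end of the alignment
    strand: str
    insert_size: int
    paired_flag: int
    mapping_quality: int
    single_end_mapping_quality: int
    alternative_mapping_quality: int
    mismatches_best_hit: int
    mismatch_quality_sum: int
    zero_mismatch_hits_24bp: int
    one_mismatch_hits_24bp: int
    read_length: int
    read_sequence: Optional[str] = None  # absent in 14-column output
    quality: Optional[str] = None        # absent in 14-column output


def parse_mapview_line(line):
    """Parse one mapview line (assumed tab-separated) into a MapviewRecord."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (14, 16):
        raise ValueError(f"expected 14 or 16 fields, found {len(fields)}")
    name, chrom, strand = fields[0], fields[1], fields[3]
    position = int(fields[2])
    # Columns 5 through 14 are assumed to be integers.
    numeric = [int(value) for value in fields[4:14]]
    tail = fields[14:16] if len(fields) == 16 else [None, None]
    return MapviewRecord(name, chrom, position, strand, *numeric, *tail)


# Made-up example line, not taken from a real data set.
example = "read1\tchr1\t100\t+\t0\t0\t30\t30\t0\t1\t20\t1\t0\t36"
record = parse_mapview_line(example)
```

Rejecting any other column count outright is a deliberate choice here: a file with an unexpected layout is more likely a different format than a mapview file with missing data.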

If I come across anything else that needs to be added, I'll update this entry.


1 Comments:

Blogger ppotato said...

Thank you Fejes. I was looking for .map documentation. I did not realize it was in MAQ page.

April 1, 2009 6:58:00 AM PDT  
