Just a quick note. In the bowtie2/samtools pipeline I used annovar to add annotations to my VCF files. It was a bit difficult to set up the first time, but I have since come to appreciate how uniquely useful annovar is. A couple of months ago I posted an ask-the-community question to the GATK forum asking whether people knew of any alternatives; it has now had 53 views and no suggestions. As far as I can tell, annovar is really the only software out there for systematically adding all the outside information (dbSNP, 1000G, phyloP, SIFT, etc.) that is so helpful for interpreting your data, and once you get it set up it’s quick and effective.
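Now, annovar is made to work with hg19, so what can you do if you are using GRCh37 as your reference genome? The two builds name their contigs differently: GRCh37 uses 1, 2, ..., MT where hg19 uses chr1, chr2, ..., chrM. Well, no warranties here, but I discovered you can trick annovar into thinking your VCF is hg19 by renaming the contigs: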
# make it look like it's hg19
sed 's/^/chr/' variants.final.vcf | sed 's/^chr#/#/' | sed 's/chrMT/chrM/' > variants.pseudo.hg19.final.vcf

# convert your vcf to an annovar file
perl ~/bin/annovar/convert2annovar.pl --format vcf4 --includeinfo variants.pseudo.hg19.final.vcf > variants.annovar

# do the annotation
perl ~/bin/annovar/summarize_annovar.pl --buildver hg19 --ver1000g 1000g2012feb --verdbsnp 132 variants.annovar ~/bin/annovar/humandb -outfile annovar/variants
And it completes without errors, and as far as I can spot-check, it seems to have gotten all the annotations correct. Again, this is clearly cheating, so there’s probably something wrong in there somewhere, but as a first pass this seems to mostly work.
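One cheap spot-check, sketched here rather than taken from my actual pipeline: list any contig names in the renamed VCF that are not chr1-22, chrX, chrY or chrM. The sed trick only prepends "chr", so GRCh37's unplaced contigs (GL000191.1 and friends) come out as chrGL000191.1, which is not what hg19 calls them, and I would not expect annotations on those records to match anything. The function name nonstandard_contigs is made up for this example and is not part of annovar.

```shell
# List contig names in a VCF that fall outside chr1-22, chrX, chrY, chrM.
# nonstandard_contigs is a hypothetical helper name, not an annovar tool.
nonstandard_contigs() {
    # Drop header lines, keep the CHROM column, de-duplicate,
    # then hide the names that hg19-based annotation should recognize.
    # "|| true" keeps the exit status clean when nothing is left to print.
    grep -v '^#' "$1" \
        | cut -f1 \
        | sort -u \
        | grep -Ev '^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$' || true
}
```

Running `nonstandard_contigs variants.pseudo.hg19.final.vcf` prints one contig name per line; empty output means every record sits on a contig whose name hg19 shares. It only checks names, not whether the underlying sequences agree.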