[Image: panic.jpg]

If this is you...

...while trying to analyse NGS data or when doing some bioinformatics in general, I'll try to give you a hand. I have more than 10 years of experience in bioinformatics, and while my main language is Bash, I can write in several others, like R, C#, C++ and Perl. I have a good publication record, and I was already doing bioinformatics when the original WTCCC GWAS study came out. For fuck's sake, I have been at this so long that I have actually done some bioinformatics analysis on a Solaris machine.

I will try to share my experience, not for the sake of the human race or to make your life easier (although those are cool), but for my own selfish interests:

One, there does not seem to be a gold standard for analysing Next Generation Sequencing (NGS) data in the scientific community. I have gone through a year of publications that use DNA NGS data in the top genetics journals (Nature Genetics, PLoS Genetics, AJHG, Genome Research and Human Molecular Genetics), and no two papers analyse NGS data in the same way or with the same pipelines, or in ways that are even close to similar. There are things like the GATK best practices, but these really don't work, and most bioinformaticians I have met agree on this point. For this reason I'm going to try to set out here some guidelines that actually work and that people can follow to analyse their NGS data.

And two, there are too many people out there trying to analyse NGS data who just lack the basics, not only in bioinformatics but also in informatics (and sometimes in genetics as well), and this leads to publications whose results are just wrong (KMT2C stop gain, I'm looking at you). This has two sides: the people analysing data incorrectly, and the journal referees who are not able to review submitted papers properly. You don't like computers? That's fine, but if you are not willing to learn how to use them properly for genetic data analysis, you should leave genetics. We are generating insane amounts of genetic data with the current generation of sequencing technologies, and it's only going to get worse (better?), so you need to know how to process it whether you like it or not, not only to know what to do with your own data, but also to be able to judge the work of others critically and adequately.

So please, I beg of you, if you are in genetics research, follow me on this journey. For the sake of science, research and the scientific community. And probably world peace, too.
