On Monday and Tuesday, 4/15 and 4/16, I had the opportunity to participate in a hackathon at the Bio-IT World conference in Boston. The experience was amazing. The hackathon was put on by the NCBI hackathon team of Ben Busby (@DCGenomics), Kaitlyn Barago (@KaitlynMBarago), and Allissa Dillman (@DCHackathons). We have worked with this team at other hackathons that Mark III Systems has sponsored.
The overarching theme of the hackathon was FAIR data principles in science. FAIR stands for:
F – Data is Findable
A – Data is Accessible
I – Data is Interoperable
R – Data is Re-usable
My team’s topic was “BLAST, Pipelines, and FAIR”. If you are not familiar with BLAST (Basic Local Alignment Search Tool), it is a free software package, developed by NCBI, that lets you search for DNA or protein sequences in pre-made or custom databases. BLAST was by far the software package I used most in my graduate work. I am a huge fan.
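For readers new to BLAST, here is a minimal Python sketch of the usual two steps, assuming the NCBI BLAST+ command-line tools are installed: build a custom nucleotide database with makeblastdb, then search it with blastn and read the tabular (-outfmt 6) results. The file names (refs.fa, query.fa, mydb, hits.tsv) are hypothetical, and this is only an illustration, not the pipeline our team built.

```python
import subprocess

# Default column order for BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
# Type to convert each column to, in the same order as OUTFMT6_FIELDS.
OUTFMT6_TYPES = [str, str, float, int, int, int, int, int, int, int, float, float]


def makeblastdb_command(fasta, db_name):
    """Command that builds a custom nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query, db_name, out):
    """Command that searches query sequences against a database, tabular output."""
    return ["blastn", "-query", query, "-db", db_name, "-outfmt", "6", "-out", out]


def parse_outfmt6(text):
    """Parse -outfmt 6 text into a list of dicts, one per hit."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any comment lines
        values = line.split("\t")
        if len(values) != len(OUTFMT6_FIELDS):
            raise ValueError(f"expected 12 columns, got {len(values)}: {line!r}")
        hits.append({name: conv(v) for name, conv, v
                     in zip(OUTFMT6_FIELDS, OUTFMT6_TYPES, values)})
    return hits


def run(cmd):
    """Run a BLAST+ command; requires the tools to be on PATH."""
    subprocess.run(cmd, check=True)


# Example (the run() calls need BLAST+ installed, so they are commented out):
# run(makeblastdb_command("refs.fa", "mydb"))
# run(blastn_command("query.fa", "mydb", "hits.tsv"))
sample = "q1\tsA\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-95\t350\n"
print(parse_outfmt6(sample)[0]["pident"])  # 98.5
```

Keeping command construction separate from execution makes the commands easy to inspect or log before anything is run.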
I was fortunate that our team leader was Tom Madden, who is the head of the BLAST team at NCBI!
Our project was to create a reusable workflow that automates a bioinformatics pipeline so that anyone can run it in a standard environment, such as Linux. The only thing that has to change between runs is the set of input files. For more detail on the actual pipeline and our final presentation, see our team’s GitHub page (https://github.com/NCBI-Hackathons/BLAST-Pipelines-and-FAIR). The presentation is in the Slides folder.
Our pipeline was developed using CWL (Common Workflow Language), an open standard for describing workflow pipelines. All configuration information, such as data file paths, is stored in YAML files. CWL documents are themselves written in YAML, a format widely used for configuration files.
In the end, we had one very nice CWL file that ran the entire pipeline and three YAML configuration files. The whole workflow ran with a single command from the command line.
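To illustrate the “one command, swap the inputs” idea, here is a minimal Python sketch that assumes the CWL reference runner, cwltool, is installed. It writes a small input file in CWL's YAML input-object format and builds the single cwltool command that runs the workflow. The workflow name blast_pipeline.cwl and the input names query and db_name are hypothetical; they are not taken from our team's repository.

```python
import subprocess
from pathlib import Path


def job_yaml(query_fasta: str, db_name: str) -> str:
    """Return a minimal CWL input object in YAML.

    The input names `query` and `db_name` are hypothetical; they must match
    whatever inputs the workflow's .cwl file actually declares.
    """
    return (
        "query:\n"
        "  class: File\n"          # marks this input as a File in CWL
        f"  path: {query_fasta}\n"
        f"db_name: {db_name}\n"
    )


def cwltool_command(workflow, job_file, outdir=None):
    """Build the single command line that runs a CWL workflow with cwltool."""
    cmd = ["cwltool"]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]  # where cwltool writes final outputs
    cmd += [str(workflow), str(job_file)]
    return cmd


def run_workflow(workflow, query_fasta, db_name, job_file="inputs.yml", outdir=None):
    """Write the input YAML, then run the workflow (requires cwltool on PATH)."""
    Path(job_file).write_text(job_yaml(query_fasta, db_name))
    return subprocess.run(cwltool_command(workflow, job_file, outdir), check=True)


print(cwltool_command("blast_pipeline.cwl", "inputs.yml", outdir="results"))
# ['cwltool', '--outdir', 'results', 'blast_pipeline.cwl', 'inputs.yml']
```

Rerunning on new data means writing a new YAML file and running the same command again; the CWL workflow file itself stays untouched.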
Overall, it was a great experience. I met some great people from varied backgrounds and enjoyed the camaraderie and cooperation between all of the teams. I can’t wait for next year’s hackathon!
Big thanks to all of my teammates!
Amanda Ruby, Software Engineer/Bioinformatics Analyst at Rheonix, Inc. @AmandaRubyBio
Tom Madden, Team Lead for BLAST at the NCBI. @tom6931
Alexander Jung, Head of Digitalization Biologicals Development CMC at Boehringer Ingelheim
Matt Doherty, Founder at Resolute.ai, @ResoluteAI
Jody Burks, Developer Advocate and Quantum Computing Ambassador at IBM, @JodyBurksPhD
My paper in graduate school that got the most attention was about Space Invaders. What do Space Invaders have to do with quantitative biology, genomics, and DNA in general? Here are a couple of links that explain. The first is a review by National Geographic; the other is the actual paper in PNAS.
Repeated horizontal transfer of a DNA transposon in mammals and other tetrapods, published in the Proceedings of the National Academy of Sciences (PNAS), October 2008.
First, I'm not an expert in drug discovery by any means. On a scale of 1 to 10, my knowledge is pretty low. However, I am extremely interested in how AI is being used in different fields. Any technology that improves drug discovery can lead to lower mortality, better quality of life, and more effective treatments for diseases. This is a short review of what I learned at the conference.