On Monday and Tuesday, 4/15 and 4/16, I had the opportunity to participate in a hackathon at the Bio-IT World conference in Boston. The experience was amazing. The hackathon was put on by the NCBI hackathon team of Ben Busby (@DCGenomics), Kaitlyn Barago (@KaitlynMBarago), and Allissa Dillman (@DCHackathons). We have worked with this team at other hackathons that Mark III Systems has sponsored.
The overarching theme of the hackathon was FAIR data principles in science. FAIR stands for:
F – Data is Findable
A – Data is Accessible
I – Data is Interoperable
R – Data is Re-usable
My team’s topic was “BLAST, Pipelines, and FAIR”. If you are not familiar with BLAST, it is a free software product, developed by NCBI, that allows you to search for DNA or protein sequences in pre-made or custom databases. BLAST was by far the software package I used most in my graduate work. I am a huge fan.
I was fortunate that our team leader was Tom Madden, who is the head of the BLAST team at NCBI!
Our project was to build a re-usable pipeline that automates a bioinformatics workflow so that anyone can run it in a standard environment, such as Linux. Only the input files need to change from one run to the next. For more detail on the actual pipeline and our final presentation, see our team’s GitHub page. The presentation is in the Slides folder.
Our pipeline was developed using CWL (Common Workflow Language). CWL is an open standard for describing workflow pipelines. All configuration information, such as data file paths, is stored in YAML files. Like CWL, YAML is widely used, most often for configuration files (for example, in Hadoop).
In the end, we had one very nice CWL file that ran the entire pipeline and three YAML configuration files. The whole CWL workflow ran with a single command from the command line.
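To give a feel for the "change only the inputs, run one command" idea, here is a minimal Python sketch. It is not our team's actual code. The input names (query, database), the file names, and the choice of cwltool (the CWL reference runner) are all assumptions for illustration. It also writes the job file as JSON rather than YAML, purely so the sketch needs nothing beyond the standard library; JSON is valid YAML, and CWL runners accept JSON job files.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_job_file(path, query_fasta, database_fasta):
    """Write a CWL job file (input object) listing this run's input files."""
    job = {
        # "query" and "database" are hypothetical input names; a real
        # workflow defines its own names in its "inputs" section.
        "query": {"class": "File", "path": str(query_fasta)},
        "database": {"class": "File", "path": str(database_fasta)},
    }
    Path(path).write_text(json.dumps(job, indent=2) + "\n")
    return job


def build_command(workflow, job_file):
    """Return the single command that runs the workflow on one job file.

    Assumes cwltool is the runner; other CWL runners take similar arguments.
    """
    return ["cwltool", str(workflow), str(job_file)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job_path = Path(tmp) / "job.json"
        # Only these two paths would change between runs.
        write_job_file(job_path, "query.fasta", "reference.fasta")
        # Print the command instead of running it, since cwltool
        # may not be installed on the machine running this sketch.
        print(shlex.join(build_command("pipeline.cwl", job_path)))
```

Running the pipeline on a different data set would then mean writing a new job file and reusing the same workflow file unchanged.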
Overall, it was a great experience. I met some great people from varied backgrounds and enjoyed the camaraderie and cooperation between all of the teams. I can’t wait for next year’s hackathon!
Big thanks to all of my teammates!
Amanda Ruby, Software Engineer/Bioinformatics Analyst at Rheonix, Inc. @AmandaRubyBio
Tom Madden, Team Lead for BLAST at the NCBI. @tom6931
Alexander Jung, Head of Digitalization Biologicals Development CMC at Boehringer Ingelheim
Matt Doherty, Founder at Resolute.ai, @ResoluteAI
Jody Burks, Developer Advocate, Quantum Computing Ambassador IBM, @JodyBurksPhD
If you have questions or want to connect, you can message me on Twitter (@pacejohn) or LinkedIn (https://www.linkedin.com/in/john-pace-phd-20b87070/). You can also follow my company, Mark III Systems, on Twitter @markiiisystems.
#hackathon #ncbi #blast #artificialintelligence #ai #machinelearning #cwl #yaml #bioitworld #bioinformatics #github