John Pace
Data Scientist, husband, father of 3 great daughters, 5x Ironman triathlon finisher, just a normal guy who spent a lot of time in school.
Let’s explore data science, artificial intelligence, machine learning, and other topics together.

Great experience at the Bio-IT World 2019 hackathon

4/18/2019


 
Picture: The Bio-IT team, left to right: Amanda Ruby, Matt Doherty, Tom Madden, me, Alexander Jung, Jody Burks

On Monday and Tuesday, 4/15 and 4/16, I had the opportunity to participate in a hackathon at the Bio-IT World conference in Boston. The experience was amazing. The hackathon was put on by the NCBI hackathon team of Ben Busby (@DCGenomics), Kaitlyn Barago (@KaitlynMBarago), and Allissa Dillman (@DCHackathons). We have worked with this team at other hackathons that Mark III Systems has sponsored.
Picture: Left to right: Ben Busby, Kaitlyn Barago, Allissa Dillman

The overarching theme of the hackathon was FAIR data principles in science. FAIR stands for:

F – Data is Findable
A – Data is Accessible
I – Data is Interoperable
R – Data is Reusable

My team’s topic was “BLAST, Pipelines, and FAIR”. If you are not familiar with BLAST, it is a free software product, developed by NCBI, that allows you to search for DNA or protein sequences in pre-made or custom databases. BLAST was by far the software package I used most in my graduate work. I am a huge fan.

I was fortunate that our team leader was Tom Madden, who is the head of the BLAST team at NCBI!

Our project was to build a reusable, automated bioinformatics pipeline that anyone could run in a standard environment, such as Linux. The only thing that would have to change from run to run is the input files. For more detail on the actual pipeline and our final presentation, see our team’s GitHub page. The presentation is in the Slides folder.

Our pipeline was developed using CWL (Common Workflow Language). CWL is an open standard for describing workflow pipelines. All configuration information, such as data file paths, is stored in YAML files. Beyond CWL, YAML is widely used for configuration files (for example, in Kubernetes and Docker Compose).
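To make the idea of keeping file paths in a configuration file concrete, here is a minimal sketch in Python. It builds the kind of document a CWL job file holds, where each workflow input name maps to a File object with a path. The input name `query_fasta` and the path are hypothetical and are not taken from our team's repository. The sketch writes JSON, which is also valid YAML, so it needs no extra libraries.

```python
import json


def make_job_document(input_paths):
    """Map CWL input names to File objects, the shape a CWL job file uses."""
    return {
        name: {"class": "File", "path": path}
        for name, path in input_paths.items()
    }


def write_job_file(input_paths, destination):
    """Write the job document as JSON, which is also valid YAML."""
    with open(destination, "w", encoding="utf-8") as handle:
        json.dump(make_job_document(input_paths), handle, indent=2)
        handle.write("\n")


if __name__ == "__main__":
    # Hypothetical input name and path; swap in your own data files.
    job = make_job_document({"query_fasta": "data/my_sequences.fasta"})
    print(json.dumps(job, indent=2))
```

Pointing the pipeline at different data then means changing only the paths passed to `make_job_document`, not the workflow itself.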

In the end, we had one very nice CWL file that ran the entire pipeline and three YAML configuration files. The CWL workflow was run using only one command from the command line.
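As a rough illustration of what "one command" can look like, the sketch below assembles a call to cwltool, the CWL reference runner, which takes a workflow file followed by a job file. This is an assumption for illustration: the post does not depend on any particular runner, and the file names `blast_pipeline.cwl` and `inputs.yml` are hypothetical.

```python
import shlex
import shutil
import subprocess


def build_cwl_command(workflow, job_file, runner="cwltool"):
    """Return the argument list: runner, workflow file, then job file."""
    return [runner, str(workflow), str(job_file)]


def run_pipeline(workflow, job_file, runner="cwltool"):
    """Run the workflow if the runner is installed.

    Returns the runner's exit code, or None when the runner is not on PATH.
    """
    command = build_cwl_command(workflow, job_file, runner)
    if shutil.which(runner) is None:
        print("Runner not found; would have run:", shlex.join(command))
        return None
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical file names, not the ones in the team's repository.
    print(shlex.join(build_cwl_command("blast_pipeline.cwl", "inputs.yml")))
```

Keeping the command construction separate from the execution makes it easy to check what would be run before anything is launched.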

Overall, it was a great experience. I met some great people from varied backgrounds and enjoyed the camaraderie and cooperation among all of the teams. I can’t wait for next year’s hackathon!

Big thanks to all of my teammates!

Amanda Ruby, Software Engineer/Bioinformatics Analyst at Rheonix, Inc., @AmandaRubyBio
Tom Madden, Team Lead for BLAST at the NCBI, @tom6931
Alexander Jung, Head of Digitalization Biologicals Development CMC at Boehringer Ingelheim
Matt Doherty, Founder at Resolute.ai, @ResoluteAI
Jody Burks, Developer Advocate, Quantum Computing Ambassador at IBM, @JodyBurksPhD

If you have questions or want to connect, you can message me on LinkedIn or Twitter. Also, follow me on Twitter @pacejohn and on LinkedIn at https://www.linkedin.com/in/john-pace-phd-20b87070/, and follow my company, Mark III Systems, on Twitter @markiiisystems.

#hackathon #ncbi #blast #artificialintelligence #ai #machinelearning #cwl #yaml #bioitworld #bioinformatics #github