John Pace

Data Scientist, husband, father of 3 great daughters, 5x Ironman triathlon finisher, just a normal guy who spent a lot of time in school.
Let’s explore data science, artificial intelligence, machine learning, and other topics together.

Question and Answer for Long Passages Using BERT

12/20/2019


 
Image from https://d827xgdhgqbnd.cloudfront.net/wp-content/uploads/2019/04/09110726/Bert-Head.png

BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. It's safe to say it is taking the NLP world by storm. BERT was developed by Google, and Nvidia has created an optimized version that uses TensorRT.

To run a Question & Answer query using BERT, you provide the passage to be queried and the question you want answered from that passage. One drawback of BERT is that only short passages can be queried this way. Once a passage grows beyond a certain length, the correct answer can no longer be found.

I have created a script that lets you query longer passages and still get the correct answer. It takes an input passage and breaks it into paragraphs delimited by \n. Each paragraph is queried separately, and every answer that is returned is added to a list. The list is then analyzed to find the answer with the highest probability, which is returned as the final answer. When you run the script, change the paths to match your setup. All code and files are on my GitHub HERE.
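The paragraph-splitting approach described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual script from the repository: `query_fn` is a hypothetical stand-in for whatever call sends one paragraph and the question to the TensorRT BERT engine, and it is assumed to return an `(answer, probability)` pair, or `None` when no answer is found. The toy results at the bottom are made-up values used only to show the selection step.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface (an assumption, not the real engine API):
# query_fn(paragraph, question) -> (answer_text, probability) or None.
QueryFn = Callable[[str, str], Optional[Tuple[str, float]]]


def split_into_paragraphs(passage: str) -> List[str]:
    """Split the passage on newline characters and drop blank lines."""
    return [p.strip() for p in passage.split("\n") if p.strip()]


def answer_long_passage(
    passage: str, question: str, query_fn: QueryFn
) -> Optional[Tuple[str, float]]:
    """Query each paragraph separately and return the most probable answer."""
    answers: List[Tuple[str, float]] = []
    for paragraph in split_into_paragraphs(passage):
        result = query_fn(paragraph, question)
        if result is not None:  # keep only paragraphs that produced an answer
            answers.append(result)
    if not answers:
        return None  # no paragraph contained an answer
    # Final answer: the candidate with the highest probability.
    return max(answers, key=lambda pair: pair[1])


# Toy stand-in for the BERT query; the probabilities are invented.
toy_results = {
    "First paragraph.": ("answer A", 0.31),
    "Second paragraph.": ("answer B", 0.87),
}


def toy_query(paragraph: str, question: str) -> Optional[Tuple[str, float]]:
    return toy_results.get(paragraph)


passage = "First paragraph.\n\nSecond paragraph.\nThird paragraph."
print(answer_long_passage(passage, "Which answer?", toy_query))
# prints ('answer B', 0.87)
```

Splitting on \n keeps each query short enough for BERT, at the cost of missing answers whose supporting text spans two paragraphs.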

Setup

To run the script, you first need to create a Docker container and start the TensorRT engine before running the query. These are the steps Nvidia recommends, and they are the ones I follow as well.

From your home directory, run the following. It takes a while.

  • Clone the TensorRT repository and navigate to the BERT demo directory

    
  • Create and launch the Docker image

    
  • Build the plugins and download the fine-tuned models

    
  • Build the TensorRT runtime engine and start it. Start it with nohup; otherwise it will tie up your terminal and you won't be able to do anything else.

    
  • After you have started the engine, run the Q&A query in the script.

    
One caveat is that the TensorRT engine will terminate after a period of time. Be sure it is running before you perform your query.

Files I am using for queries

I am using 3 files as input passages. Feel free to try it with your own passages.

22532_Full_Document.txt - this is the full document I am using. If you ask a question about the first part, it will return the correct answer. If you ask a question about a later part, it will not find the answer.

22532_Short_Document_With_Answers.txt - this is a shortened passage that contains the answers to the query. If you ask the same question I did, it will find 2 answers. The one with the higher probability is the correct answer.

22532_Short_Document_Without_Answers.txt - this is a shortened passage that does not contain the answers to the query. If you ask the same question I did, it will not find any answers.

The question that is asked is "How many patients experienced recurrence at 12 years of age?" Feel free to experiment.

If you have questions and want to connect, you can message me on LinkedIn or Twitter. Also, follow me on Twitter @pacejohn and on LinkedIn https://www.linkedin.com/in/john-pace-phd-20b87070/, and follow my company, Mark III Systems, on Twitter @markiiisystems.

This post is also on Medium at https://medium.com/analytics-vidhya/question-and-answer-for-long-passages-using-bert-dfc4fe08f17f.
