On November 8-10, I was able to compete in The University of Texas Southwestern Medical School U-Hack Med Hackathon. I attended the event last year as a facilitator and mentor, helping multiple groups, but this year I applied to compete on a team and was accepted! The 48-hour event was a great learning experience and an amazing way to meet new people from varied backgrounds.
I was on Team 9. Our team’s project was entitled “REAL-TIME CARDIAC ASSESSMENT OF CATHETERIZATION-DERIVED FICK AND CMR-DERIVED FLOW” and focused on using machine learning and AI to reduce the time it takes for children to undergo pediatric heart catheterization procedures. For a full description of the project, click here. In the procedure, a wire with a balloon attached to the end is inserted into a large vein near the child’s groin, then guided up through the torso into the heart and the surrounding veins and arteries. In the past, this procedure has been done using x-rays to allow the physician to see the wire as it moves. However, this makes an already risky procedure even more dangerous. Studies have shown that children who are exposed to 2-4 hours of x-rays (the amount of time a procedure typically takes) are 3-4 times more likely to develop cancer later in life. More recently, cardiologists have been performing radiation-free heart catheterizations using an MRI to view the wire and balloon. In both procedures, the child has to be under general anesthesia for the entire time. This much anesthesia carries a high risk of causing developmental and neurological problems. Thus, anything that can be done to reduce the exposure to x-rays and the amount of time the child is under anesthesia can significantly reduce potential risks. This was our challenge. Click here for more info on radiation-free heart catheterizations at Children’s Medical Center in Dallas.
Dr. Youssef Arar, a pediatric cardiology fellow at Children’s Medical Center in Dallas, was our team lead. Dr. Daniel Castellanos, a pediatric cardiology fellow and colleague of Dr. Arar, was our co-lead. They routinely perform these procedures, so they are well acquainted with the intricacies, the risks, and the ways the procedures could be improved. Dr. Arar presented us with two different tasks that could help reduce the time heart cath procedures take.
The first challenge involved automating the calculation of blood flow and pressures. Currently, the physician doing the procedure has to tell a technician what values he is getting. The technician then enters those values into a spreadsheet and has to manually calculate the blood flow and pressure. This is a time-consuming and error-prone process, and every moment spent on manual calculations is another moment the child is exposed to radiation or under anesthesia. When the procedure is done in the MRI room, there is a tremendous amount of noise, so communication is impaired significantly, making the process even more error prone. Our team developed a web-based application to automate the calculations. To use the app, the values are relayed to the technician, or the technician views them on a screen. As the values are entered, blood flow, pressure, and other quantities are calculated automatically, giving the physician immediate answers. Dr. Arar said this app could potentially shorten procedures by 40 minutes and give more accurate calculations!
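To give a flavor of the kind of arithmetic the app automates, here is a minimal sketch of a textbook Fick-principle cardiac output calculation in Python. The inputs are hypothetical and this is not the team’s actual code, which handles many more measurements (shunt calculations, pressures, pediatric-specific values, and so on).

```python
def o2_content(hb_g_dl, sat_fraction):
    """Oxygen content in mL O2 per liter of blood (dissolved O2 ignored)."""
    return 1.36 * hb_g_dl * sat_fraction * 10  # x10 converts per-dL to per-L

def fick_cardiac_output(vo2_ml_min, hb_g_dl, arterial_sat, venous_sat):
    """Blood flow (L/min) by the Fick principle: VO2 / (arterial - venous O2 content)."""
    return vo2_ml_min / (o2_content(hb_g_dl, arterial_sat) - o2_content(hb_g_dl, venous_sat))

# Hypothetical values: VO2 = 150 mL/min, Hb = 12 g/dL, SaO2 = 98%, SvO2 = 72%
print(round(fick_cardiac_output(150, 12, 0.98, 0.72), 2))  # ~3.54 L/min
```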
The second challenge involved visualizing the balloon in the heart during the MRI procedure. An MRI captures an image of a single plane at any given time. If the balloon is not in that plane, the cardiologist cannot see it, so any movement of the balloon is done blindly. This can mean the patient has to be repositioned, lengthening the anesthesia time. Dr. Arar asked us to find a way to track the amount of time the balloon is visible during a procedure. With this information, the cardiologists can review the procedure after it is finished and discuss what could have been done differently to keep the balloon in view longer. By determining what adjustments can be made, future procedures can be more efficient, shorter, and safer. Our team used a process called “thresholding” to segment the balloon on the MRI image and count the number of frames it appeared in. This allowed us to calculate the percentage of time the balloon was visible.
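As a rough illustration of the thresholding approach (not our exact pipeline, and the cutoff values here are made up), something like the following counts the frames in which a bright, balloon-sized region can be segmented:

```python
import cv2
import numpy as np

def balloon_visible_fraction(frames, intensity_cutoff=200, min_area_px=50):
    """Estimate the fraction of frames in which the balloon is visible.

    frames: iterable of 2-D uint8 grayscale MRI frames.
    intensity_cutoff and min_area_px are illustrative; real values would be tuned.
    """
    seen, total = 0, 0
    for frame in frames:
        total += 1
        _, mask = cv2.threshold(frame, intensity_cutoff, 255, cv2.THRESH_BINARY)
        # Count the frame only if the segmented region is large enough to be the balloon
        if np.count_nonzero(mask) >= min_area_px:
            seen += 1
    return seen / total if total else 0.0
```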
Overall, I had a fantastic time. My team was small and we had varied backgrounds. We had a web developer, a computational scientist, a post-doctoral researcher, a PhD student, two MDs, and myself. In the end, our team won the Lyda Hill Best Overall Team award for our work. Great job everyone!
Don't forget to follow me on Twitter @pacejohn and my company @markiiisystems (https://www.markiiisys.com/blog/)
Yesterday I taught a computer vision workshop. In the section about image classification, I had the attendees train a deep learning model that could correctly classify images of the letters A, B, C, E, and H. To build the model, they started with another previously trained model. This is known as "transfer learning." More specifically, transfer learning is a method where a model developed for a task is reused as the starting point for a model on a second task. It is an optimization that allows rapid progress or improved performance when modeling the second task.
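As a concrete, hypothetical example of the idea in Keras (not necessarily the exact base model or code used in the workshop), you can freeze an ImageNet-pretrained network and train only a new five-class head:

```python
import tensorflow as tf

# Start from an ImageNet-pretrained base and train only a new 5-class head.
base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                         input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # reuse the learned features as-is

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # A, B, C, E, H
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # train_images/train_labels: your letter data
```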
On October 14 and 15, I attended the annual Rice University Data Science Conference. There were a few things I took away from the opening day's sessions.
The keynote speaker was from a major cancer center in Houston. A large portion of his talk was about data governance and the question of who actually owns a patient’s medical data. Is it the patient, the hospital, the doctor, the insurance company, a combination of these, or none of these? He displayed a map of the United States showing how the answer varies from state to state. What was clear is that there is no consistency. This question has to be answered before real analytic work can be done on patient data. It is an absolutely critical problem, since data science can improve patient care as well as the operational efficiency and profitability of medical institutions, just to name a few benefits. In most instances, only hospital staff have access to the data, yet it is sitting unused. I asked the speaker why medical institutions are not making major investments in data science like companies such as Walmart, Wells Fargo, and Uber are making. He stated it is all about outcomes.
The second speaker discussed image analysis of tissue samples. She mentioned that images must be labeled by trained pathologists so there is a ground truth to compare the deep learning model predictions against. Unfortunately, this problem is a catch-22. Pathologists do not have enough time to label images since labeling is time and labor intensive, yet without labeled images, deep learning cannot be used to find areas of interest on the slides. If the images were labeled, the pathologists would have more time to work on more complicated cases that deep learning cannot currently solve. That’s where the catch-22 comes in: there is no time to label images, yet labeled images would ultimately save time.
Another hot topic was the explainability of deep learning models and how physicians are hesitant to adopt AI when no one can tell them why it made the predictions it did. This is where the tradeoff between classical machine learning and deep learning comes in. Maybe the solution to this problem is to use classical machine learning techniques, such as random forests or support vector machines, for specific problems rather than neural networks. In contrast to neural networks, classical machine learning techniques produce models that are explainable. Maybe it is better to start an analysis using classical machine learning and only move to deep learning models when the results are significantly better.
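As a simple illustration of what “explainable” means here (my own example, not one from the conference), a random forest exposes per-feature importances that can be handed to a clinician:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a classical model and print the features that drive its predictions.
data = load_breast_cancer()
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

ranked = sorted(zip(data.feature_names, clf.feature_importances_), key=lambda p: -p[1])
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```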
Finally, a speaker discussed the carbon footprint generated by servers that are executing deep learning jobs on GPUs. I had not considered this before, but it is a problem that must be addressed since the use of deep learning is becoming more prevalent.
On Day 2, I attended several sessions on data science in healthcare. One session that stood out was about using quantitative measurements to determine the progression of Parkinson’s Disease in patients. Currently, progression is measured by a neurologist watching a patient’s movements. At a later time, the neurologist has the patient perform the same movements and makes a qualitative judgment about the progression. This is highly subjective. One group is using two different methods to measure progression. One involves hold time for keys on a keyboard: as the disease progresses, there is a marked and statistically significant difference in how long keys are held down. They are also using metrics from swipes on mobile devices. As with hold time, swipe dynamics change over time. The pressure applied, the length of the swipe, how straight it is – all of these change. By using these measurements, physicians are able to get clear numerical values that show the progression.
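To make the keyboard idea concrete, here is a hypothetical sketch (my own illustration with made-up numbers, not the group’s actual analysis) of comparing key hold times between two recording sessions:

```python
import numpy as np
from scipy import stats

def hold_times(key_events):
    """key_events: list of (press_time, release_time) pairs in seconds."""
    return np.array([release - press for press, release in key_events])

# Hypothetical recordings taken months apart
baseline = hold_times([(0.00, 0.09), (0.50, 0.58), (1.10, 1.20), (1.80, 1.89)])
followup = hold_times([(0.00, 0.14), (0.60, 0.75), (1.30, 1.46), (2.00, 2.13)])

t_stat, p_value = stats.ttest_ind(baseline, followup)
print(f"mean hold time: {baseline.mean():.3f}s -> {followup.mean():.3f}s (p = {p_value:.3f})")
```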
Day 2 was also when the posters were presented. Kate Weeks, a student in the University of the Pacific’s Master’s program in Data Science, presented the work she has done with San Joaquin County on pothole detection using accelerometer data to support their road maintenance. I have had the privilege of working with her on the project.
If you would like more details on the conference, feel free to reach out. Be sure to follow me on Twitter (@pacejohn) and my company (@markiiisystems).
I had the pleasure of attending the 2019 AI Summit in San Francisco on September 25 and 26.
At the conference, I was able to demo new AI software I am working on. The software is an American Sign Language translator. The goal is to produce an app that lets someone point their phone camera at a person who is speaking in American Sign Language and have the sign language transcribed onto the screen. Currently, the software can recognize and transcribe the ASL alphabet. In order to train the deep learning model, I need many, many images of people performing the ASL alphabet. As I explained the work and showed people the demo, I asked if they would let me take a video of their hand as they signed the letters. A total of 37 people let me video them! I was tremendously grateful. The image labeling is being done using IBM PowerAI Vision software, and the models are being trained using the POWER Systems-optimized version of TensorFlow included in the Watson Machine Learning Community Edition software running on an IBM POWER9 AC922 server. I also had the opportunity to give an oral presentation four times in the booth to describe my project. The turnout for the presentations was good and people asked lots of good questions.
The ASL software project is challenging in that it combines several machine learning/deep learning techniques that must work together both in concert and independently. The project uses convolutional neural networks (CNNs) for object recognition, which is a subset of computer vision, as well as recurrent neural networks (LSTMs) and natural language processing (NLP). At a high level, the software must recognize hands, arms, and facial expressions, since all are involved in ASL. Upon recognition, it must determine what the hand and arm are doing. Is it making a letter, a number, or some other symbol? When is the hand actually making a letter or signing a word, and when is it transitioning? The transitions need to be discarded. Combining all of these into a workflow that can run on a mobile app is something that has not been done before but will make a huge impact for the deaf community.
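For the frame-level piece of this, a very rough sketch of how per-frame letter predictions might be turned into text while discarding transition frames could look like the following. This is my own illustration: the `model`, `labels`, and confidence cutoff are hypothetical, not the actual app.

```python
import numpy as np

def transcribe(frames, model, labels, min_confidence=0.8):
    """Classify each video frame and drop low-confidence frames as transitions.

    frames: preprocessed hand crops, e.g. shape (n, 224, 224, 3)
    model: a trained per-letter classifier (hypothetical); labels: list of letters.
    """
    probs = model.predict(frames)                 # shape (n, n_letters)
    letters = []
    for p in probs:
        if p.max() < min_confidence:
            continue                              # likely a transition between signs
        letter = labels[int(p.argmax())]
        if not letters or letters[-1] != letter:  # collapse consecutive frames of one letter
            letters.append(letter)
    return "".join(letters)
```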
There were several things I found interesting at the conference. Typically, at an AI conference, there is one technology that seems to be everywhere, such as autonomous vehicles or inferencing platforms. However, this conference was different in that no one technology was overrepresented. There were probably more companies that provided image labeling services than any others, but they were not overwhelming.
Overall, I thought the conference was well organized, had a nice breadth of vendors and technologies represented, and was quite productive. I will also be attending the AI Summit in New York in December where we will have an even bigger presence in the booth. Stop by and see the newest version of the ASL demo! If you would like more details on the conference, feel free to reach out. Be sure to follow me on Twitter (@pacejohn) and my company (@markiiisystems).
Over the past few weeks, I have been doing some benchmark testing between the IBM POWER9 AC922 server and the Nvidia DGX-1 server using time series data. The AC922 is IBM's Power processor-based server that is optimized for machine and deep learning. The DGX-1 is Nvidia's Intel processor-based server that is optimized for machine and deep learning. Both servers use the latest Nvidia V100 GPUs: the AC922 has two 16GB V100s, while the DGX-1 has eight 32GB V100s. This post summarizes the general process used in the benchmarking as well as the results. I have intentionally kept the post somewhat conceptual to illustrate our methodology and key discoveries at a high level. If you want more technical detail, please contact me (firstname.lastname@example.org).
I chose time series data for the benchmark testing for two reasons. First, it is something our customers are asking about. Second, time series problems can be investigated using both classical machine learning algorithms like ARIMA and deep learning using recurrent neural networks, particularly LSTMs. ARIMA runs solely on the CPU, whereas training for LSTMs takes place on a GPU. Thus, this testing allowed comparisons of both CPU- and GPU-based processes on both servers. In addition, it allowed me to compare the relative quality of the predictions made by two very distinct techniques.
In this work, I used synthetic data generated using software by Jinyu Xie called Simglucose v0.2.1, "a Type-1 Diabetes simulator implemented in Python for Reinforcement Learning purpose." Predicting blood glucose levels for patients with diabetes is of particular importance because the models can be used in insulin pumps to predict the correct amount of insulin to give to normalize blood glucose levels. The software generates synthetic blood glucose data for children, adolescents, and adults at designated time points. I used data at 1 hour time points. The training data consisted of 2,160 time points (90 days). The prediction data consisted of 336 time points (14 days). So, the models were trained using the 90 days of hourly time points, then tried to predict the hourly values for the next 14 days.
Two different algorithms were used to make the predictions. The first was a classical machine learning technique known as ARIMA (Autoregressive Integrated Moving Average), more specifically SARIMA (Seasonal Autoregressive Integrated Moving Average). SARIMA is a specialized version of ARIMA that takes into account the seasonality of the data (blood glucose levels have a very clear seasonal pattern). The second was an implementation of a recurrent neural network known as an LSTM (Long Short-Term Memory). To perform the comparisons, I used Jupyter notebooks that ran Python scripts. Thanks to Jason Brownlee of Machine Learning Mastery for his publicly available code that was adapted for this project. I compared the SARIMA and LSTM predictions using Mean Squared Error as the quantitative metric of how well each model made predictions. I also measured how long training and prediction took. For SARIMA, I only calculated the total run time for the training/prediction calculations since there is no distinct prediction phase. For the LSTM, I calculated separate run times for the training and prediction phases since they are distinct processes.
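For readers who want to see roughly what this looks like in code, here is a stripped-down sketch of the comparison. It is my illustration, not the actual notebooks: the SARIMA orders, LSTM architecture, and hyperparameters are placeholders, and `train` and `test` are assumed to be 1-D NumPy arrays holding the 2,160 and 336 hourly glucose values.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX

# --- SARIMA with a 24-hour seasonal cycle (orders are illustrative) ---
sarima = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24)).fit(disp=False)
sarima_pred = sarima.forecast(steps=len(test))
print("SARIMA MSE:", mean_squared_error(test, sarima_pred))

# --- LSTM: predict each hour from the previous 24 hours ---
def windows(series, lookback=24):
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X[..., None], series[lookback:]

X_train, y_train = windows(train)
lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, input_shape=(24, 1)),
    tf.keras.layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X_train, y_train, epochs=20, verbose=0)

# Simple one-step-ahead evaluation over the 14-day horizon
history = list(train[-24:])
lstm_pred = []
for actual in test:
    x = np.array(history[-24:])[None, :, None]
    lstm_pred.append(lstm.predict(x, verbose=0)[0, 0])
    history.append(actual)
print("LSTM MSE:", mean_squared_error(test, lstm_pred))
```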
For training and predictions, I used synthetic data for adolescents that I termed Patients 001, 002, 003, 004, and 005. For each patient, 2,160 hourly training data points and 336 prediction data points were generated. I then did pairwise comparisons across all patients. For example, I would train a model using the training data for Patient 001, then use that model to make predictions for Patients 001, 002, 003, 004, and 005. The process was repeated for all patients, giving a total of 25 comparisons. Full pairwise comparisons were performed to evaluate whether either SARIMA or LSTM could create models that generalize to other patients.
Not surprisingly, the models could not be generalized. This result underscores the importance of using relevant data to train machine learning models.
Below are examples of how the models performed on predictions. The blue lines are the actual values and the red lines are the predicted values. In the first 2 images, the model was trained on the data for Patient 002. Predictions were then made for Patient 002. In the last 2 images, the model was again trained on the data for Patient 002. Predictions were then made for Patient 001. As you can see, the results were significantly less accurate than when values for Patient 002 were predicted.
The results I obtained were very interesting. Here is a very high-level overview.
So, the burning question is, "Which server is better for time series analysis? The Nvidia DGX-1 or the IBM POWER9 AC922?" The answer is both. For SARIMA and LSTM training, the DGX-1 outperformed the AC922. For LSTM predictions, the AC922 outperformed the DGX-1. Which one should you choose? The answer depends on the use case. If you want to use SARIMA or if LSTM training time is your driving factor, the DGX-1 may be the better choice. If prediction speed is critical, the AC922 could be better.
Admittedly, there are a couple of caveats that must be considered. There were some version differences of pandas, NumPy, TensorFlow, and Keras due to the differing processor architectures. I tried multiple versions of each and the results were very similar. Also, the DGX-1 uses Docker containers while the AC922 uses conda environments. This could lead to some differences as well. Overall, I think these differences have very little effect on the overall benchmark outcomes, but it is something I plan to investigate further. Finally, the models were trained on a small dataset, only 2,160 data points. I will be trying a much larger dataset in the future as well as trying different combinations of hyperparameters to improve forecast accuracy.
The Jupyter notebooks, scripts, and data files, along with all of the summary statistics are available on my GitHub page (https://github.com/pacejohn/Glucose-Level-Prediction).
Brownlee, Jason. 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet). (January 9, 2017) [Online]. Available: https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/ (Accessed 2019)
Brownlee, Jason. How to Develop LSTM Models for Time Series Forecasting. (November 14, 2018) [Online]. Available: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/ (Accessed 2019)
Jinyu Xie. Simglucose v0.2.1 (2018) [Online]. Available: https://github.com/jxx123/simglucose. (Accessed 2019).
This post gives summaries and personal commentaries of some, but not all, of the talks I heard on Day 2 of the Deep Learning in Healthcare conference hosted by Rework on May 23-24, 2019, in Boston. They are not listed in the order in which the sessions were delivered.
Some of the major overarching topics discussed were NLP in medical records (as with Day 1) and development of apps used in healthcare.
Panel Discussion – Anthony Chang, MD, Pediatric Cardiologist, CHOC Children’s Hospital, Hye Sun Na, AI Product Manager, GE Healthcare, Vijaya Kolachalama, Boston University School of Medicine
Dr. Chang discussed how fast medical knowledge doubles, i.e., how often the number of new procedures, new drugs, and new research articles doubles. He said it doubles every 2 months! Because of this, physicians are unable to keep up with their own field, much less other related fields. He stated very clearly that physicians need AI to help guide them with precision medicine and treatment plans. He said it would be nice to have a report at the end of the medical record that lists discovered links and correlations between symptoms, drug interactions, and other things. A physician cannot read a 200+ page medical record every time they see a patient. AI can do that and help guide treatment. For more details, see my “Most impactful quotes from the Deep Learning in Healthcare conference in Boston on May 23 and 24, 2019” post (https://ironmanjohn.weebly.com/home/most-impactful-quotes-from-the-deep-learning-in-healthcare-conference-in-boston-on-may-23-and-24-2019).
Dr. Chang also made some other striking comments. He stated “Physicians typing while talking to a patient is ‘criminal.’ A physician should be looking the patient in the eye.” He said a physician cannot fully focus on what the patient is saying while they are typing. He stated that useful AI could help avoid the physician having to type, thereby improving the visit.
Finally, Dr. Chang said that maybe 10-15% of AI projects are not useful or clinically relevant. He stated very strongly that there must be absolute collaboration between clinicians and the group doing the AI project.
A Multi-Modal Inferential System for Differential Diagnosis – Ajit Narayanan, CTO, mfine
Dr. Narayanan’s talk was moving. In it, he presented statistics on the dire state of healthcare in India. I truly had no idea how bad the situation is there (see my blog post on impactful quotes from the conference for more details). He presented two mobile apps his company has developed that allow patients to see doctors virtually. This gives people who would not typically have access to healthcare a way to get care. The app keeps track of the medical record and helps with diagnosis. It is providing an amazing service to a very underserved country.
Multilingual NLP for Clinical Text: Impacting Healthcare with Big Data, Globally – Droice Labs
The presenters showed striking evidence that NLP software is highly biased towards the English language. They presented a slide showing that there are more UMLS concepts for English than for all other languages combined! They have developed software called “Flamingo” that performs NLP on patient records in languages other than English. They are helping with a critical need!
Redefining the IVF Experience for Patients & Providers Using AI – Mylene Yao, CEO & Founder, Univfy
Dr. Yao, a gynecologist who specializes in in-vitro fertilization, discussed AI software her company has developed. IVF is an expensive process that is taxing both mentally and physically on couples. Predicting the success of IVF has historically been very challenging. Age has always been known to be a strong predictor of success, but in reality it only accounts for 50% of predictive ability. Dr. Yao’s company has developed a model that uses many factors to give a probability of IVF being successful. The results are strikingly accurate. They actually allow physicians to offer refunds to patients who have a certain threshold probability of conceiving if they do not conceive after 3 IVF treatments. This allows couples and doctors to have a much better idea of the chances of conception and whether IVF is really the right choice. This talk was a very good example of how AI is solving a real-world healthcare challenge.
Deploying AI in the Clinic: Thinking about the Box – Mark Gooding, Mirada
Mark Gooding is a character. He was uber-active on the conference’s app, quite funny, and a snazzy dresser. Plus, he was a good speaker. He discussed software his company has developed that helps with finding the contours in medical images – DLC Expert. The thing that most stuck out about his talk was a statistic: 70% of clinical institutions have some sort of auto-labeling/contouring software in place, yet 50% of those institutions do not use it. Thus, products have been developed, sold, and implemented but are not being used. This highlights a point brought up in other talks. No matter how good a product is from an AI perspective, if it is not clinically relevant and easy to use, it will not be used and will not benefit anyone. That has to stay front and center.
Using AI to Solve Diabetes & Diabetic Retinopathy – Stephen Odaibo, CEO, Founder, & Chief Software Architect, RETINA-AI Health
Dr. Odaibo, an ophthalmologist who specializes in the retina, discussed the different problems that diabetes can cause in the retina. His slides were fascinating, and he showed a video clip of an actual procedure where he did an injection into a patient’s eye. It was a little unsettling but got the point across that the retinal complications of diabetes are quite serious. His company has developed an app called “Retina-AI” which can accurately diagnose certain diseases of the retina. The model was built on thousands of images that were all expertly labeled by retina specialists. The software is in widespread use, with many users being optometrists. It gives non-experts the equivalent of an expert performing the diagnosis, allowing proper treatment to be given. This talk showed yet another real-world example of how AI is being used clinically to improve patient health.
Clinical Consideration in Implementing AI Solutions for Healthcare – Sujay Kakarmath, Physician-Scientist, Partners Healthcare, Pivot Labs
Dr. Kakarmath’s talk took a different approach than the other talks did. He addressed the potential implementation of AI solutions and the possible pitfalls. The project he discussed was aimed at predicting readmission of heart failure patients. Readmission is a serious consideration for healthcare providers. First, it is additional hospitalization and treatment for a patient who has recently undergone inpatient treatment for the same condition. Second, healthcare providers are penalized if patients are readmitted for the same problem within 30 days. Thus, there are significant implications for both patients and providers. In their project, they were able to achieve an AUC of 0.7 for predicting readmission of patients at one hospital. However, when they applied the model to patients from 5 other Boston hospitals, they only achieved an AUC of 0.57. This is not much better than random guessing as to whether or not someone will be readmitted. This underscored the point that generalizability of models across institutions is a significant challenge. He gave 3 takeaways that should be given careful consideration in any project.
AI Assisted Radiology Image Quality Assessment – Hye Sun Na, AI Product Manager, GE Healthcare
Dr. Na discussed a project where they built a classifier for chest x-rays. She said that a common problem in healthcare is incorrect x-rays. A physician may order a frontal chest x-ray, but the x-ray presented to the radiologist is not a frontal chest x-ray, or the x-ray is not useful due to improper positioning. In these cases, the radiologist looks at the x-ray, notices the error, then has to have the x-ray redone. This is non-productive time for the radiologist and requires the patient to have the x-ray retaken, adding radiation exposure and increasing the time the patient is in the hospital or clinic. Together, these add up to additional costs and risks. Dr. Na’s team developed a classifier that can correctly determine (1) whether the x-ray is actually a frontal chest x-ray, and (2) whether the positioning is correct. For scenario 1, they created a one-class classifier using a VAE and a CNN. For scenario 2, they created a binary classifier using a CNN. Their results were highly accurate. They are currently working to implement their models in medical imaging devices. Integration in the device could solve the problems described above: when an x-ray that is supposed to be a frontal chest x-ray is taken, the radiographer will immediately know if it is correct. The patient will not have to have the x-ray retaken at a later time, and the radiologist will be presented with the correct x-ray the first time.
This post gives summaries and personal commentaries of some, but not all, of the talks I heard on Day 1 of the Deep Learning in Healthcare conference hosted by Rework on May 23-24, 2019, in Boston. They are not listed in the order in which the sessions were delivered.
Some of the major overarching topics discussed were NLP in medical records, development of algorithms, and recurrent neural networks.
Panel Discussion: The Impacts of Machine Learning in Mental Health Care – Akane Sano, Rice University, Jordan Smoller, Psychiatrist, Harvard Medical School and Massachusetts General Hospital
This may have been the highlight of the day. I’ll just mention some of the major points Dr. Smoller brought up and discussed.
My second question was about whether they were able to use socioeconomic data in conjunction with the medical record. He said in some cases they were, but that it would be tremendously helpful if it could be used more broadly.
Learning How the Genome Folds in 3D – Neva Duran, Aiden Lab, Baylor College of Medicine
I was particularly excited about this talk because the Aiden Lab is world-renowned in the field of biology; among other things, the lab invented the Hi-C sequencing method. Dr. Duran discussed how they are trying to determine the locations of enhancers and promoters along chromosomes. These can be difficult to find since they may be located great distances from each other yet work in concert. She described this as “spooky action at a distance.” I like that term. When enhancers and promoters interact, they bind together and form a looped DNA structure. By using deep learning to create “contact maps” of regions along the chromosomes, they were able to locate potential enhancers and promoters. This is something I want to investigate further.
Applying AI in Early Clinical Development of New Drugs – James Cai, Head of Data Science, Roche Innovation Center New York
This was the first talk that made me think of new use cases that could be developed for our customers. Dr. Cai discussed a phone app they developed that can identify standing, sitting, walking, and other movements. It is being used to give quantitative measurements of the activity of Parkinson’s Disease patients to see how their disease is progressing over time. He said this is critical because self-reporting of symptoms by patients is often inconsistent, vague, and lacking in detail. Next, he discussed how they are using smart watches to measure movement in schizophrenia patients. His hypothesis is that you can measure patient motivation and the levels and duration of depressive events by analyzing the amount of time patients are lying down or sleeping versus standing or moving. This can help physicians monitor the effectiveness of the medications the patient is taking without having to rely solely on self-reported data.
The “Why” Behind Barriers to Better Health Behaviors – Ella Fejer from the UK Science & Innovation Network
Dr. Fejer mentioned that our AI healthcare projects should “Be Inspirational.” I found this statement to be challenging. So often we seem focused on demonstrating that something can be done. What if we focused not just on proving that it can be done, but on showing that it can be done in a way that is “inspirational” to clients and patients? What does “inspirational” mean? I think, in this context, it means that it should get people excited about what can be accomplished and how it can impact their lives for the better.
Application of a Deep Learning Model for Clinical Optimization and Population Health Management – Janos Perge, Principal Data Scientist, Clinical Products Team, CVS
Dr. Perge said that CVS owns Aetna, one of the largest insurance companies in the US, which gives them access to a wealth of medical data and records. I did not know CVS owned Aetna. His team has worked on two real-world applications. The first is a low back surgery model to predict the risk of future back surgery over three time intervals. The second is a set of kidney failure models to predict the future risk of a member developing chronic kidney failure and needing dialysis over different time intervals. Both projects have the potential to greatly lower costs since they can help physicians develop preventative treatment plans, thus avoiding major medical issues in the future. They developed hybrid models consisting of LSTMs that utilize sequential ICD9 codes and a logistic layer to incorporate static features. One drawback of neural networks is the inability to tell why or how a model gives the results it does. This lack of explainability makes deep learning appear to be a “black box.” To get around this, they added an attention layer near the top of their model that signals relevant events. Dr. Perge did not go into much detail on this, and I wish he had. They also added a logistic layer at the top of the model to incorporate static features from domain knowledge. They found that by using deep learning with transfer learning, including both the sequential features and all static features, they were able to achieve the highest AUC.
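To make the architecture a bit more concrete, here is a minimal Keras sketch of that kind of hybrid model. It is my own reconstruction from the talk’s description, with made-up dimensions and a simple additive attention layer, not CVS’s actual code.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_codes, seq_len, n_static = 5000, 100, 20   # hypothetical vocabulary and feature sizes

codes = layers.Input(shape=(seq_len,), name="icd_sequence")
static = layers.Input(shape=(n_static,), name="static_features")

x = layers.Embedding(n_codes, 64)(codes)
h = layers.LSTM(64, return_sequences=True)(x)        # one hidden state per coded event

# Simple additive attention: score each event, softmax, take the weighted sum.
scores = layers.Dense(1)(h)                          # (batch, seq_len, 1)
weights = layers.Softmax(axis=1)(scores)             # which events the model "attends" to
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])

# "Logistic layer": combine the sequence summary with static features,
# then a single sigmoid unit outputs the predicted risk.
merged = layers.Concatenate()([context, static])
risk = layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model([codes, static], risk)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```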
How Chick-fil-A uses AI to Spot Food Safety Trends in Social Media – Davis Addy, Sr. Principal IT Leader, Food Safety & Product Quality
This was the only talk I attended that was not in the healthcare track. I was interested in it because I wanted to see a real-world application of social media analysis. Their goal is simple: use social media, such as Twitter and Yelp, to help spot potential food safety issues. Their entire pipeline runs on AWS. They purchase social media data from a 3rd-party company, which is analyzed using Amazon’s Comprehend sentiment analysis software, with a couple of intermediate Python scripts they have written in-house. The results appear on a dashboard that both the restaurant and the corporate office can see. If a comment indicates a problem, the restaurant can respond as necessary. He discussed in detail the challenges of sentiment analysis and the steps they have had to take to make it more accurate. For example, take the word “sick.” One review might say “I ate a chicken sandwich and it made me sick.” This should be a negative sentiment. However, a teenager may write “Chick-fil-A makes a sick chicken sandwich.” In this case, the reviewer is using “sick” in a positive manner and the review should get a positive sentiment. This was just one example he gave; there was a significant list of challenges they had to face. He said all of their code is being put on GitHub (github.com/chick-fil-a). I thought that was a nice touch.
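For reference, the core Comprehend call their pipeline is built around looks roughly like this. It is a minimal sketch using the two example reviews from the talk, not their production code; the slang problem he described is exactly what their extra processing has to correct for.

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

reviews = [
    "I ate a chicken sandwich and it made me sick.",   # genuinely negative
    "Chick-fil-A makes a sick chicken sandwich.",      # slang, actually positive
]

for text in reviews:
    result = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    print(result["Sentiment"], "-", text)
```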
Natural Language Processing for Healthcare – Amir Tahmasebi, Director of Machine Learning and AI, Codametrix
This was the first of many talks on natural language processing (NLP) in healthcare. Dr. Tahmasebi did a very good job of explaining some of the major challenges of using NLP with health records. These include: data size, data source/format, data structure, longitudinal data, clinical text and language complexity, language ambiguity, experience-driven domain knowledge and practice. Each of these is a hurdle that must be overcome in order to perform effective NLP. When discussing their deep learning strategy, he mentioned 2 things I had never heard of before. The first was ELMo, or Embeddings from Language Models, and the second was BERT, or Bidirectional Encoder Representations from Transformers. I’m not an NLP expert by any means so I’m going to spend some time looking into them. Thankfully he cited the papers that initially described these.
Deep Learning for the Assessment of Knee Pain – Vijaya Kolachalama, Assistant Professor, Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine
Dr. Kolachalama’s talk was particularly fascinating because he attempted to demonstrate how the location and severity of pain, which is subjective, can actually be pinpointed using deep learning. In his work, he looked at knee pain. His deep learning model used a convolutional Siamese network, which uses the same set of weights while working in tandem on two different input vectors to compute comparable output vectors, and outputs a heatmap showing the predicted area of pain. During the Q&A, Dr. Tahmasebi (see summary above) asked if the network wasn't just finding areas of abnormality in the knee. Dr. Kolachalama said this was the case, but that regions of abnormality are typically the source of pain, so the model helps discover both the location of pain and any abnormalities. I asked if they had tried to apply the model to referred pain, such as when reported pain in the knee is actually caused by a pinched sciatic nerve in the hip. In this case, an area of abnormality would not be found in the knee. He said this is something they are looking into. I think there is potential there because finding no abnormality in the knee could direct the physician to look for other sources of the pain.
Distributed Tensorflow – Scaling Your Model Training – Neil Tenenholtz, Director of Machine Learning at the MGH and BWH Center for Clinical Data Science
This was a nice technical talk on how to employ multiple GPUs both within a server and across servers for both data-parallel and model-parallel training. He mentioned Horovod, which allows GPUs from multiple servers to be used via MPI. I have recently spent some time studying Horovod and this helped augment my knowledge.
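For anyone curious what the Horovod pattern looks like, here is a minimal data-parallel Keras sketch (my own example, not from the talk): each process pins one GPU, the optimizer is wrapped so gradients are averaged across workers over MPI, and the initial weights are broadcast from rank 0.

```python
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # typically launched with `horovodrun -np <num_gpus> python train.py`

# Pin each process to its own GPU
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,)),
])

# Scale the learning rate by the number of workers and wrap the optimizer
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]  # sync initial weights
# model.fit(x_train, y_train, callbacks=callbacks, epochs=5)
```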
Most impactful quotes from the Deep Learning in Healthcare conference in Boston on May 23 and 24, 2019
I attended the “Deep Learning in Healthcare” conference put on by Rework in Boston on Thursday, May 23 and Friday, May 24. I can say it was one of the best conferences I have ever attended. The topics were relevant, challenging, innovative, and thought provoking. The sessions were only 15-20 minutes long. This allowed for more speakers to share and for the presentations to be succinct. Presenters stayed around the conference after their talks to answer questions and discuss further. The moderator kept the conference on time so all the speakers had their full amount of time. Overall, it was well run and exciting.
There were a couple of quotes from speakers that stood out to me and summed up the current state of Deep Learning and AI in healthcare. In this post, I am going to discuss these quotes. I will give details of the other talks in a follow-up post.
Quote 1: Ajit Narayanan, CTO of mfine, a company that developed a mobile app to help physicians and patients in India, said, “There is approximately 1 physician per 5,500 people in India.” That’s the equivalent of having only 479 physicians for the entire population of Dallas County. In Dallas County alone, there were 5,924 physicians in 2015. The disparity is staggering. This statistic stood out to me because it demonstrated the amazing need that AI can potentially help with, not only in the US, but around the world. Physicians can be made more accessible and more productive, thereby improving the health of people who might not otherwise get care. Using AI to improve the world should be a significant goal of any project.
Quote 2: In a panel discussion titled “The Future of Healthcare – What Can We Expect?”, Dr. Anthony Chang, a pediatric cardiologist, stated, “Computer vision in healthcare is easy overall and can be considered the low hanging fruit. The future needs to be in helping physicians individualize health plans and deliver precision medicine. It should use cognitive architectures and intelligence to assist physicians with decision making.” He continued and made the point that it is impossible for a physician to read a 200+ page patient medical record in the few minutes the physician has with the patient. He wants to see AI that can analyze the medical record and give direction on the prescribed treatment. Is there a certain condition the patient had in the past that is correlated with a condition the patient is likely to develop or that exacerbates the condition they are seeing the physician for? AI can take all the information in the patient record and analyze it quickly, in the context of the entire patient history as well as in the context of other patients with similar conditions. The physician could be presented with this information in a succinct manner and could use it to make more accurate diagnoses and more targeted treatment plans. I found this statement fascinating. Many, many talks at conferences are on using computer vision in healthcare, particularly in radiology and pathology. In those specialties, computer vision is critical. However, in many specialties, such as cardiology, it is not, and in psychiatry it is irrelevant. I have heard other physicians at conferences say similar things. When I asked what problem he would most like to see solved by AI, Dr. Daniel Rubin, a radiologist at Stanford University, said he would like a strong solution that predicts whether patients will cancel their appointments. Cancelled appointments result in unproductive time for physicians and keep other patients from being seen because the time slot is designated as filled. Other physicians I have questioned have mentioned finding interactions between drugs and specific conditions that are not currently known but that can be found by analyzing large numbers of patients. I think in some ways we may be pursuing the wrong avenues of AI research. We have to collaborate with physicians and solve the problems they face in their practices daily. We, as non-physicians, can make assumptions about what will be helpful, but it may not be. AI in healthcare projects should be laser focused and done in strong collaboration with physicians, or the project may end up being irrelevant or not useful in the end.
On Monday and Tuesday, 4/15 and 4/16, I had the opportunity to participate in a hackathon at the Bio-IT World conference in Boston. The experience was amazing. The hackathon was put on by the NCBI hackathon team of Ben Busby (@DCGenomics), Kaitlyn Barago (@KaitlynMBarago), and Allissa Dillman (@DCHackathons). We have worked with this team at other hackathons that Mark III Systems has sponsored.
The overarching theme of the hackathon was FAIR data principles in science. FAIR stands for
F – Data is Findable
A – Data is Accessible
I – Data is Interoperable
R – Data is Re-usable
My team’s topic was “BLAST, Pipelines, and FAIR”. If you are not familiar with BLAST, it is a free software product, developed by NCBI, that allows you to search for DNA or protein sequences in pre-made or custom databases. BLAST was by far the software package I used most in my graduate work. I am a huge fan.
I was fortunate that our team leader was Tom Madden, who is the head of the BLAST team at NCBI!
Our project was to create a re-usable, automated bioinformatics pipeline that could be run by anyone in a standard environment, such as Linux. The only things that would have to be changed are the input files. For more detail on the actual pipeline and our final presentation, see our team’s GitHub page.
(https://github.com/NCBI-Hackathons/BLAST-Pipelines-and-FAIR). The presentation is in the Slides folder.
Our pipeline was developed using CWL (Common Workflow Language). CWL is an open source framework for creating workflow pipelines. All configuration information, such as data file paths, is stored in YAML files. YAML files are widely used as configuration files elsewhere as well (for example, in Hadoop).
In the end, we had 1 very nice CWL file that ran the entire pipeline and 3 YAML configuration files. The CWL workflow was run using only one command from the command line.
Overall, it was a great experience. I met some great people from varied backgrounds and enjoyed the camaraderie and cooperation between all of the teams. I can’t wait for next year’s hackathon!
Big thanks to all of my teammates!
Amanda Ruby, Software Engineer/Bioinformatics Analyst at Rheonix, Inc. @AmandaRubyBio
Tom Madden, Team Lead for BLAST at the NCBI. @tom6931
Alexander Jung, Head of Digitalization Biologicals Development CMC at Boehringer Ingelheim
Matt Doherty, Founder at Resolute.ai, @ResoluteAI
Jody Burks, Developer Advocate, Quantum Computing Ambassador IBM, @JodyBurksPhD