If you have ever used Pandas, you know that it is a great tool for creating and working with data frames. As with any software, it can be optimized to perform better. Nvidia has developed RAPIDS, a data science framework that includes a collection of libraries for executing end-to-end data science pipelines completely on the GPU (www.rapids.ai). Included in RAPIDS is cuDF, which allows data frames to be loaded and manipulated on a GPU. In this post, I am going to discuss some benchmarking I have done with RAPIDS, particularly cuDF. I conducted multiple experiments in which I created data frames that ran on the CPU using Pandas and data frames that ran on the GPU using cuDF, then executed common methods on those data frames. I will use the term “processes” to describe the execution of the methods. I will also follow the convention that Nvidia has used, in which the Pandas data frames are named PDF, for Pandas data frame, and the cuDF data frames are named GDF, for GPU data frame.
For my benchmarking data, I used a CSV file from MIMIC-III, a freely accessible critical care database that was created by the MIT Lab for Computational Physiology (https://mimic.physionet.org/). The file was named MICROBIOLOGYEVENTS.csv. It consisted of 631,726 rows and 16 columns of data. I duplicated the records in the file to create new files that consisted of 5 million, 10 million, 20 million, and 40 million rows, each with the same 16 columns. Experiments were then conducted using each of these 5 files. An individual experiment is defined as running a process with one of the 5 versions of the MICROBIOLOGYEVENTS file as input. Each experiment was repeated 5 times, and the times were averaged to give the final results that I list.
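The scaled-up input files described above can be produced with a few lines of Pandas. What follows is a minimal sketch, not the exact script I used: the helper name replicate_rows and the toy data frame are assumptions made for illustration.

```python
import math

import pandas as pd


def replicate_rows(df: pd.DataFrame, target_rows: int) -> pd.DataFrame:
    """Repeat the rows of df until the result has exactly target_rows rows."""
    if df.empty or target_rows < 1:
        raise ValueError("need a non-empty data frame and a positive target")
    # Number of full copies needed to reach or exceed the target.
    copies = math.ceil(target_rows / len(df))
    stacked = pd.concat([df] * copies, ignore_index=True)
    # Trim the overshoot so the row count matches the target exactly.
    return stacked.iloc[:target_rows]


# Toy stand-in for the real CSV (hypothetical column name).
small = pd.DataFrame({"a": [1, 2, 3]})
bigger = replicate_rows(small, 7)
print(len(bigger))
```

For the real data, the same helper could be applied to the frame returned by pd.read_csv("MICROBIOLOGYEVENTS.csv") and the result written back out with to_csv(..., index=False).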
The benchmarking was done on both an Nvidia DGX-1 and an IBM POWER Systems AC922, using a single GPU in each. The GPUs in both servers were Nvidia V100 models: the DGX-1 had the model with 32GB of memory and the AC922 had the 16GB model.
For the benchmarking, I ran some common processes on both PDF and GDF data frames and measured how long each took to run. The processes were run in order using a Jupyter Notebook that can be found on my GitHub (https://github.com/pacejohn/RAPIDS-Benchmarks).
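The timing approach, including the 5 repetitions and averaging, can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook itself: the helper average_seconds, the toy data frame, and the choice of sort_values as the process are all illustrative, and the sort is not necessarily one of the processes I benchmarked. The GDF branch only runs if cuDF is installed.

```python
import time
from statistics import mean

import pandas as pd

try:
    import cudf  # needs an Nvidia GPU with RAPIDS installed
except ImportError:
    cudf = None


def average_seconds(process, repeats=5):
    """Run process() `repeats` times and return the mean wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        process()
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Toy PDF; in the real experiments this would come from one of the CSV files.
pdf = pd.DataFrame({"key": [3, 1, 2] * 1000, "value": range(3000)})
pdf_time = average_seconds(lambda: pdf.sort_values("key"))
print(f"PDF sort: {pdf_time:.6f} s")

if cudf is not None:
    # Copy the same data to the GPU and time the same method there.
    gdf = cudf.DataFrame.from_pandas(pdf)
    gdf_time = average_seconds(lambda: gdf.sort_values("key"))
    print(f"GDF sort: {gdf_time:.6f} s")
```

The first call on a GPU can include one-time setup cost, so it may be worth running each process once before timing it.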
I also created a second Jupyter Notebook that was used to concatenate 2 data frames. In this experiment, MICROBIOLOGYEVENTS.csv, which has 631,726 rows, was concatenated onto each of the 5 MICROBIOLOGYEVENTS input files.
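A concatenation like this looks nearly identical in both libraries. Here is a minimal sketch, assuming toy data frames with made-up column names rather than the real MIMIC-III columns; the helper concat_frames is hypothetical and simply dispatches to pd.concat or cudf.concat depending on the type of frame it receives.

```python
import pandas as pd

try:
    import cudf  # needs an Nvidia GPU with RAPIDS installed
except ImportError:
    cudf = None


def concat_frames(base, extra):
    """Append extra to base using the library that base belongs to."""
    if cudf is not None and isinstance(base, cudf.DataFrame):
        return cudf.concat([base, extra], ignore_index=True)
    return pd.concat([base, extra], ignore_index=True)


# Toy frames standing in for an input file and the original CSV.
base_pdf = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})
extra_pdf = pd.DataFrame({"id": [4, 5], "label": ["d", "e"]})
combined = concat_frames(base_pdf, extra_pdf)
print(len(combined))
```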
For 4 of the 9 processes, the GDF outperformed the PDF regardless of the input file that was used. For 3 processes, the PDF outperformed the GDF. Interestingly, for 2 processes the PDF outperformed the GDF on small data frames but not on the larger ones. In the concatenation experiments, the GDF always outperformed the PDF. The results for the processes that were run on the AC922 are below. The results for the DGX-1 are similar. For complete results, including the actual times for the processes to run and the DGX-1 results, see my GitHub (RAPIDS_Benchmarks.xlsx).
The most remarkable differences in performance were in the following processes.
GDF Outperforms PDF
PDF Outperforms GDF
PDF Outperforms GDF on Smaller Data Frames
As shown above, data frames that run on the GPU can often speed up processes that manipulate the data frame by 10x to over 1,000x when compared to data frames that run on the CPU, but this is not always the case. There is also a tradeoff: for some processes, smaller data frames perform better on the CPU while larger data frames perform better on the GPU. The syntax for using a GDF is slightly different from that of a PDF, but the learning curve is not steep, and the effort is worth the reward. I’m going to try the same benchmarking on some other data sets and use some other methods to see how the results compare. Stay tuned for the next installment.
If you have questions and want to connect, you can message me on LinkedIn or Twitter. Also, follow me on Twitter @pacejohn and on LinkedIn at https://www.linkedin.com/in/john-pace-phd-20b87070/, and follow my company, Mark III Systems, on Twitter @markiiisystems.
This article is also published on Medium at https://medium.com/@johnpace_32927/benchmarking-nvidia-rapids-cudf-versus-pandas-4da07af8151c.