Comparing FPGA and GPU Acceleration for Computational Storage: Performance-per-Watt Considerations

September 17, 2021

To improve the performance of their computing infrastructure and keep pace with the growing demands of data analytics and AI, many companies see hardware acceleration as the primary solution. In most cases, that acceleration comes from advanced programmable hardware, mainly GPUs and FPGAs. This hardware gives companies a computing advantage, but they still have legitimate concerns about how difficult it is to program.

Hardware vendors are now applying acceleration to computational storage: storage devices that contain embedded compute elements. This approach has been shown to deliver excellent performance for analytics and AI applications (Figure 1). Analytics and validation workloads, with or without machine learning, can be accelerated with computational storage devices. Their key advantage is that expensive computations are offloaded to the storage device instead of running on the server CPU. Compared with the standard storage-plus-CPU approach, computational storage offers the following advantages:


1. Higher performance, because the programmable hardware is customized with application-specific logic


2. Freed-up CPU resources, because compute tasks are offloaded from the server to the storage device


3. Less data transfer, because data and compute are co-located
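To make the third advantage concrete, here is a minimal sketch of how much data crosses the storage-to-host link for a single filtering query. It assumes a deliberately simplified model (hypothetical, not taken from any vendor's design): without offload, the host CPU reads the whole dataset; with offload, the storage device filters in place and returns only the matching fraction ("selectivity").

```python
def host_link_bytes(dataset_bytes: int, selectivity: float, offload: bool) -> float:
    """Bytes sent from storage to the host for one filter query.

    Simplified, hypothetical model: without offload the whole dataset is
    shipped to the host CPU; with offload the storage device filters in
    place and ships only the fraction of data that matches.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    if offload:
        return dataset_bytes * selectivity
    return float(dataset_bytes)


if __name__ == "__main__":
    one_tb = 10**12
    # Hypothetical query that keeps 5% of the rows.
    print("host-side filter:   ", host_link_bytes(one_tb, 0.05, offload=False))
    print("storage-side filter:", host_link_bytes(one_tb, 0.05, offload=True))
```

The smaller the selectivity, the larger the saving; a query that keeps every row gains nothing from co-location in this model.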


This novel approach has a bright future. However, you should evaluate it against your specific use cases, weighing performance, cost, power consumption, and ease of use. Price/performance and performance per watt are the dominant criteria when evaluating acceleration hardware. This article focuses on performance per watt (a separate article covers price/performance).


Computational storage power consumption comparison


Three systems


In this scenario, we compare three technologies focused on the CSV data-reading use case: NVIDIA GPUDirect Storage and RAPIDS, and the Samsung SmartSSD drive based on Xilinx technology. CSV reading is an important stage in compute-intensive pipelines (see Figure 1).


Below, we define performance as the CSV processing rate, or processing "bandwidth". Let's briefly review how the three systems work.
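The metric itself is simply bytes of CSV processed per second. The sketch below measures it for a CPU-only baseline using Python's standard `csv` module; it illustrates the definition only and is an assumption of ours, not how GPUDirect Storage, RAPIDS, or the SmartSSD parse CSV.

```python
import csv
import io
import time


def csv_bandwidth(data: bytes) -> tuple[int, float]:
    """Parse CSV bytes on the CPU and return (row_count, bytes_per_second).

    CPU-only baseline that illustrates the "processing bandwidth" metric;
    it does not model GPU or FPGA parsing.
    """
    start = time.perf_counter()
    # newline="" lets the csv module handle line endings itself.
    text = io.StringIO(data.decode("utf-8"), newline="")
    rows = sum(1 for _ in csv.reader(text))
    elapsed = time.perf_counter() - start
    # Guard against a zero timer delta on very small inputs.
    return rows, len(data) / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Synthetic input: a header line plus 100,000 two-column rows.
    sample = b"id,value\n" + b"".join(
        f"{i},{i * 2}\n".encode() for i in range(100_000)
    )
    rows, bps = csv_bandwidth(sample)
    print(f"{rows} rows at {bps / 1e9:.3f} GB/s")
```

The accelerated systems discussed below report the same quantity in GB/s, only at much higher rates.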


NVIDIA GPUDirect Storage


· Addresses analytics and AI needs end to end


· Uses the GPU as the compute unit, with a direct data path to NVMe-based storage devices (GPUDirect)


· Programmed with CUDA (RAPIDS)


NVIDIA measured the performance gain of its CSV reading technology relative to standard SSDs. The results are shown in Figure 1: with 1 to 8 accelerators, throughput ranges from 4 to 23 GB/s.


Samsung SmartSSD Drive


· Uses a Xilinx FPGA as the compute unit


· The compute logic is built into the drive and sits on the same internal PCIe interconnect as the storage


· Programmable, so computations run on the storage platform itself


Bigstream, a Xilinx data analytics solution partner, worked with Samsung to design an accelerator for Apache Spark that includes IP for CSV and Parquet processing. For this comparison, the SmartSSD test runs the CSV parsing engine in standalone mode. The results are shown in Figure 2: with 1 to 12 accelerators, throughput ranges from 4 to 23 GB/s. NVIDIA's results (1 to 8 accelerators) are plotted alongside. Note that all results in this discussion are plotted against the number of accelerators on the x-axis.


These results are exciting, but be sure to factor in power consumption when choosing a solution.

Performance-per-watt comparison


Figure 3 shows the results of the analysis with power consumption taken into account, expressed as throughput per watt. Based on the sources cited in the discussion above, we assume:


· Tesla V100 GPU: maximum power consumption of 200 W


· SmartSSD drive FPGA: maximum power consumption of 30 W
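Performance per watt is throughput divided by total power draw. The sketch below assumes, as the list above does, that every accelerator runs at its maximum rated power, so total power is the accelerator count times the per-device figure. The example plugs in the single-accelerator endpoint quoted earlier (about 4 GB/s for each system); the function names and constants are illustrative, not from any vendor tool.

```python
GPU_WATTS = 200.0      # Tesla V100, maximum power as assumed above
SMARTSSD_WATTS = 30.0  # SmartSSD drive FPGA, maximum power as assumed above


def perf_per_watt(throughput_gbps: float, accelerators: int, watts_each: float) -> float:
    """Throughput per watt in (GB/s)/W.

    Assumes every accelerator draws its maximum rated power, so total
    power = accelerators * watts_each.
    """
    if accelerators <= 0 or watts_each <= 0:
        raise ValueError("accelerators and watts_each must be positive")
    return throughput_gbps / (accelerators * watts_each)


if __name__ == "__main__":
    # Single-accelerator endpoint quoted in the text: about 4 GB/s for both systems.
    gpu = perf_per_watt(4.0, 1, GPU_WATTS)
    fpga = perf_per_watt(4.0, 1, SMARTSSD_WATTS)
    print(f"GPU: {gpu:.4f} (GB/s)/W, SmartSSD: {fpga:.4f} (GB/s)/W, ratio {fpga / gpu:.2f}x")
```

The same function applies to any other point on the curves in Figure 2: substitute the measured throughput and accelerator count for that point.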

In this scenario, the calculation shows that with 8 accelerators in each system, the SmartSSD delivers 25 times the performance per watt of GPUDirect Storage.


FPGA vs. GPU: Final thoughts on performance per watt


Computational storage can improve the performance of data analytics and AI applications. However, for the approach to be viable in practical deployments, power consumption must be part of the evaluation.


For two computational storage approaches to CSV data analysis, we presented throughput curves normalized by power consumption. The results show that, for comparable numbers of accelerators, the SmartSSD drive delivers better performance per watt than the GPUDirect Storage approach.