The new kid on the block: GPU-accelerated big data analytics
This article was originally featured in the CIO Review.
With open-source big data frameworks such as Apache Hadoop and Spark in the spotlight, most people are probably unfamiliar with the concept of using GPUs (graphics processing units) in either big data or analytics-rich applications. In nine cases out of ten, the acronym is mentioned in the context of display hardware, video games, or how supercomputers are built these days. For serious IT managers or data scientists, GPUs may seem too exotic to be the hardware of choice for big data infrastructure. While we see challenges ahead, there are a few major misconceptions about GPUs that could use some clarification. In a nutshell, we believe most (if not all) of the analytics needs in big data can be met with advanced GPU-based machine learning technologies.
Myth #1: GPUs are only good for gamers or supercomputers
Truth: It’s true that the early adopters of GPUs were mostly in the computer gaming industry or makers of supercomputers. However, the massively parallel computing power of GPUs can also be used to speed up machine learning or data mining algorithms that have nothing to do with 3D graphics. Take the Nvidia Titan Black GPU as an example: it has 2880 cores capable of performing over 5 trillion floating-point operations per second (TFLOPS). For comparison, a Xeon E5-2699v3 processor can perform about 0.75 TFLOPS, but may cost 4x as much. Besides TFLOPS, GPUs also enjoy a significant advantage over CPUs in terms of memory bandwidth, which matters more for data-intensive applications. The Titan Black’s maximum memory bandwidth is 336 GB/sec, whereas the E5-2699v3’s is only 68 GB/sec. Higher memory bandwidth means more data can be transferred between the processor and its memory in the same amount of time, which is why GPUs can process large quantities of data in a split second.
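To make the comparison concrete, here is a small back-of-the-envelope sketch in Python. It uses only the peak figures quoted above; the 6 GB working-set size and the helper names are hypothetical, chosen purely for illustration, and real workloads rarely reach peak numbers.

```python
# Peak figures quoted in the article. Real workloads will see less.
TITAN_BLACK = {"tflops": 5.0, "bandwidth_gb_s": 336.0}
XEON_E5_2699V3 = {"tflops": 0.75, "bandwidth_gb_s": 68.0}
CPU_COST_MULTIPLIER = 4.0  # the CPU "may cost 4x as much" as the GPU


def scan_seconds(dataset_gb, bandwidth_gb_s):
    """Lower bound on the time to read a dataset once from memory at peak bandwidth."""
    return dataset_gb / bandwidth_gb_s


def flops_per_dollar_ratio(gpu, cpu, cpu_cost_multiplier):
    """How many times more peak FLOPS per dollar the GPU offers than the CPU."""
    return (gpu["tflops"] / cpu["tflops"]) * cpu_cost_multiplier


# Hypothetical working set: one pass over 6 GB of data already in memory.
WORKING_SET_GB = 6.0
gpu_scan = scan_seconds(WORKING_SET_GB, TITAN_BLACK["bandwidth_gb_s"])
cpu_scan = scan_seconds(WORKING_SET_GB, XEON_E5_2699V3["bandwidth_gb_s"])
bandwidth_ratio = cpu_scan / gpu_scan
value_ratio = flops_per_dollar_ratio(TITAN_BLACK, XEON_E5_2699V3, CPU_COST_MULTIPLIER)

print(f"GPU scan: {gpu_scan * 1000:.1f} ms, CPU scan: {cpu_scan * 1000:.1f} ms")
print(f"Bandwidth advantage: {bandwidth_ratio:.1f}x")
print(f"Peak FLOPS per dollar advantage: {value_ratio:.1f}x")
```

On these peak numbers the GPU has roughly a 4.9x bandwidth advantage and roughly a 27x advantage in peak FLOPS per dollar.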
One of the hottest areas of machine learning nowadays is Deep Learning (DL), which uses deep neural networks (DNNs) to teach computers to perform tasks such as machine vision and speech recognition. GPUs are widely used in the training of DNNs, which can take up to a few months on the CPU. With GPU-accelerated DL packages such as Caffe and Theano, the training time is often reduced to a few days.
Myth #2: GPUs are only for small data
Truth: It’s true that GPU cards have limited on-board memory, which cannot be upgraded once they are manufactured, unlike the RAM of a CPU. Furthermore, the maximum RAM size of a GPU is typically much smaller. For example, the maximum memory currently supported by Nvidia GPUs is 12 GB, whereas a multi-socket CPU system can have up to a few TB of RAM. Hence the conventional thinking that GPUs are only suitable for processing small datasets.
There are, however, many ways to scale up GPUs for larger datasets. As a big data machine learning library, Berkeley’s BIDMach uses only a single GPU to outperform the fastest cluster systems running on up to a few hundred nodes. Our own experience with GPU-accelerated machine learning confirms this observation. Scalability can be significantly improved through the design of succinct data structures that represent the original dataset compactly. For example, we found that it is possible to load the Twitter follower-followee graph with 1.4 billion connections on a single GPU. The same graph has also been benchmarked by GraphLab (an open-source parallel graph processing library), which used a cluster of 64 computers on Amazon EC2 with 23 GB of RAM in each instance. Not only can such a gigantic graph fit in a single GPU, but PageRank also runs more than 2x faster on that single GPU than on the 64-node cluster. Given that the Twitter graph is one of the largest graphs tested in recent big data literature, we believe GPUs can handle many sizeable graph datasets without issues.
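As an illustration of what a compact graph representation can look like, the sketch below builds a compressed sparse row (CSR) layout in Python and estimates its memory footprint. CSR is a common textbook choice and is used here only as an example; it is not necessarily the data structure used in our system. The function names, the 4-byte and 8-byte field widths, and the vertex count of about 41.7 million are assumptions made for the estimate, not figures from this article.

```python
from array import array


def build_csr(num_vertices, edges):
    """Build a CSR graph (offsets, targets) from directed (src, dst) pairs.

    The out-neighbours of vertex v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets = array("q", [0] * (num_vertices + 1))
    for src, _ in edges:
        offsets[src + 1] += 1              # count the out-degree of each vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]       # prefix sum turns counts into offsets
    targets = array("i", [0] * len(edges))
    cursor = array("q", offsets[:num_vertices])  # next free slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def csr_bytes(num_vertices, num_edges, offset_bytes=8, target_bytes=4):
    """Estimated size of the CSR arrays (assumed field widths)."""
    return (num_vertices + 1) * offset_bytes + num_edges * target_bytes


def edge_list_bytes(num_edges, id_bytes=8):
    """Estimated size of a plain edge list storing two ids per edge."""
    return num_edges * 2 * id_bytes


offsets, targets = build_csr(4, [(0, 1), (0, 2), (2, 3), (1, 2)])
print(list(offsets), list(targets))

# Rough estimate for a graph with 1.4 billion edges.
ASSUMED_VERTICES = 41_700_000  # assumption, not stated in the article
EDGES = 1_400_000_000
print(f"CSR: {csr_bytes(ASSUMED_VERTICES, EDGES) / 1e9:.2f} GB")
print(f"Edge list: {edge_list_bytes(EDGES) / 1e9:.2f} GB")
```

Under these assumptions the CSR arrays come to a little under 6 GB, compared with 22.4 GB for an edge list of 8-byte id pairs, which suggests why a compact layout matters so much for fitting a large graph into limited GPU memory.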
For many machine learning algorithms, it is not required that the entire dataset be present all at once. In the deep learning case, as long as the DNN itself can fit in the memory of a GPU, the training data can be streamed from the main memory of a CPU or even from disk, especially since adjusting the weights of the DNN is usually far more time-consuming than streaming training examples. In such cases, the size of the dataset a GPU can handle is only limited by the size of the disk, which is the same for the CPU as well. Of course, not all machine learning algorithms fit this profile, but a majority of them do. Modern GPUs can transfer data both to and from the CPU while performing computation at the same time.
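The streaming pattern can be sketched in a few lines of plain Python. This is a CPU-only toy, assuming a two-parameter linear model in place of a DNN and a synthetic generator in place of a real data reader; `stream_batches` and `train_streaming` are hypothetical names. The point it illustrates is that only the model parameters stay resident, while each mini-batch is consumed and then discarded.

```python
import random


def stream_batches(num_batches, batch_size, seed=0):
    """Hypothetical data source yielding mini-batches of (x, y) pairs.

    Here the data is synthetic (y = 3x + 1 plus small noise); in practice the
    batches would be read from host memory or disk, one at a time.
    """
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            batch.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)))
        yield batch


def train_streaming(batches, learning_rate=0.1):
    """Mini-batch gradient descent that never holds more than one batch."""
    w, b = 0.0, 0.0  # the model: the only state that must stay resident
    for batch in batches:
        grad_w = grad_b = 0.0
        for x, y in batch:
            error = (w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / len(batch)
        b -= learning_rate * grad_b / len(batch)
    return w, b


w, b = train_streaming(stream_batches(num_batches=2000, batch_size=32))
print(f"w = {w:.2f}, b = {b:.2f}")
```

The total number of examples seen (64,000 here) is independent of how much memory the trainer needs, which is one batch plus the model. On a GPU, the next batch would typically be copied over while the current one is being processed.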
There are also ways of incorporating GPUs into standard big data frameworks such as Hadoop. For example, CUDA on Hadoop employs MapReduce as the first level of parallelization and uses GPUs as the second level. One study has shown that a GPU-accelerated Hadoop cluster can achieve up to a 20x speedup and reduce power consumption by up to 95%.
Myth #3: GPUs are impossible to program by ordinary people
Truth: It’s true that GPUs are not as easy to program as their CPU counterparts, due to their unconventional processor designs. Furthermore, direct GPU support in high-level programming languages such as Java and Python still leaves a lot to be desired, although this is less of an issue for people who are already familiar with C++, which can directly access a rich set of APIs and tools for general-purpose GPU programming. For casual GPU programmers and data scientists, the best bet is to use domain-specific languages (DSLs) that are well tailored to their application domains. For example, statisticians can use GPU-accelerated algorithms in MATLAB or R to dramatically speed up their computations. DSLs offer improved productivity, portability, and performance, without a steep learning curve. At PARC, we are researching ways to automatically generate optimized GPU code from high-level specifications of the algorithm, so that programmers need little knowledge of the underlying hardware. Once completed, this work will enable fast GPU programming and real-time big data analytics running on top of a wide array of GPUs, each of which can have different hardware characteristics such as compute capability, the number of streaming multiprocessors, the number of registers per core, and so on. In the long run, we would like to support other forms of accelerator-based big data analytics besides GPUs, including those based on Intel Xeon Phi coprocessors.
While there are still challenges for GPUs to be used in mainstream analytics applications, they can bring significant value to the full big data stack. For example, our GPU-based k-means algorithm achieves higher performance than any CPU cluster-based implementation, and it even outperforms BIDMach’s highly efficient k-means implementation by 2x. For graph-based machine learning, our GPU-based PageRank can finish a full iteration in under 50 milliseconds on the same benchmark graph that would take Spark/GraphX about 3.5 seconds using a cluster of 16 computers, each with 8 cores. Measured in performance per dollar and per watt, our GPU-based analytics is a few hundred times more cost- and energy-efficient than state-of-the-art CPU-based systems. Furthermore, if one cares about the physical space efficiency of big data systems in terms of performance per unit of rack space, then ours is typically over a thousand times more space-efficient than its CPU-based competitors, since up to 8 GPU cards can fit in a single commodity x86 server.
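For readers unfamiliar with what "a full iteration" of PageRank involves, here is a minimal reference sketch in Python. It is the standard textbook update, not our GPU implementation; the function name, the adjacency-list input format, and the damping factor of 0.85 are assumptions made for illustration.

```python
def pagerank_iteration(out_links, ranks, damping=0.85):
    """One PageRank update.

    out_links[v] lists the vertices that v links to; ranks[v] is the current
    score of v. Rank held by vertices with no out-links is spread evenly.
    """
    n = len(ranks)
    received = [0.0] * n
    dangling = 0.0
    for v, targets in enumerate(out_links):
        if targets:
            share = ranks[v] / len(targets)  # split v's rank among its links
            for t in targets:
                received[t] += share
        else:
            dangling += ranks[v]
    base = (1.0 - damping) / n + damping * dangling / n
    return [base + damping * r for r in received]


# Tiny example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, and 3 has no out-links.
graph = [[1, 2], [2], [0], []]
ranks = [0.25] * 4
for _ in range(20):
    ranks = pagerank_iteration(graph, ranks)
print([round(r, 3) for r in ranks])
```

Each iteration touches every edge once, so the work is dominated by memory traffic rather than arithmetic, which is why memory bandwidth is the figure that matters most for this kind of workload.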
GPUs are the new kid on the block with many unique traits that can disrupt the field of big data. For IT professionals who are interested in not only the scalability, but also the speed, the cost, the energy and rack-space footprint of big data systems, GPUs are a force to be reckoned with.
Rong Zhou, senior researcher and Manager of the High-Performance Analytics (HPA) area of the Interaction and Analytics Laboratory at PARC, a Xerox Company.