In my last post, I talked about the computational challenges posed by graph analytics. I’d like to follow that up with a discussion of benchmarks.
The objective of a benchmark is to identify a representative problem, or suite of problems, that can be used to compare the performance of different systems. The Top 500 list (www.top500.org), for example, poses the challenge of solving a dense system of linear equations, and is used to identify the 500 fastest computers in the world. Great are the bragging rights of the champions… but the real benefit is the system optimization that ensures the best performance is delivered for the money spent.
The Graph 500 benchmark (www.graph500.org) arose out of a realization that graph analytic problems have computational requirements that are very different from those of the physics simulation problems characterized by the Top 500 benchmark. This is a good news/bad news scenario: it’s great that the importance of graph analytics is being recognized, but the actual benchmark is too limited to represent a “typical” graph workload. Let me explain.
Graph 500 uses Breadth-First Search (BFS) starting from a random vertex in the graph as its “typical” graph application. Unlike many graph algorithms, BFS can be implemented as a “Bulk-Synchronous Parallel” (BSP) algorithm. This class of algorithm is familiar to every High Performance Computing programmer: each processor does as much work as it can locally, collecting the messages that must be sent to other processors; when it hits a barrier, it transmits its collected messages and receives work from the other processors. Process the received messages, generate new ones, repeat. Inter-processor communication can be infrequent and batched. More processors means more performance, and clever programming and big HPC systems with tens of thousands of processors will dominate the Graph 500 as long as it is based on Breadth-First Search.
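To make the superstep structure concrete, here is a minimal single-machine sketch of level-synchronous BFS, with comments marking where the “local work” and the barrier would fall in a distributed run. The graph representation and function name are my own illustration, not the Graph 500 reference code.

```python
def bsp_bfs(adj, source):
    """Breadth-first search structured as BSP supersteps.

    adj: dict mapping each vertex to a list of neighbor vertices.
    Returns a dict of vertex -> BFS level (distance from source).
    """
    level = {source: 0}
    frontier = {source}  # vertices active in the current superstep
    step = 1
    while frontier:
        # "Local work": each frontier vertex expands its neighbors.
        # In a distributed run, discoveries of vertices owned by other
        # processors would be buffered here as outgoing messages.
        next_frontier = set()
        for v in frontier:
            for w in adj.get(v, []):
                if w not in level:
                    level[w] = step
                    next_frontier.add(w)
        # The implicit barrier: all processors would exchange their
        # buffered messages here before the next superstep begins.
        frontier = next_frontier
        step += 1
    return level
```

Each iteration of the `while` loop is one superstep: all of level *k* is processed before any vertex at level *k*+1 is touched, which is exactly the batched, barrier-separated communication pattern described above.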
Clever programming? Well, BFS benchmark results vary depending on how the graph is partitioned across a distributed-memory compute cluster: the more work that can be done before inter-processor communication must occur, the better. So there are many research efforts focused on calculating the optimal partitioning for BFS – which is theoretically interesting, but not terribly relevant to real-world graph analytics.
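Why partitioning matters can be seen by counting “cut” edges – edges whose endpoints land on different processors and therefore force inter-processor messages during BFS. The tiny graph and partition assignments below are purely illustrative.

```python
def edge_cut(edges, part):
    """Count edges whose endpoints fall in different partitions.

    edges: list of (u, v) vertex pairs.
    part:  dict mapping each vertex to its partition (processor) id.
    """
    return sum(1 for u, v in edges if part[u] != part[v])

# A 4-vertex path graph 0-1-2-3 split across two processors:
edges = [(0, 1), (1, 2), (2, 3)]
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}   # neighbors kept together
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # round-robin assignment

edge_cut(edges, contiguous)   # 1 cut edge -> little communication
edge_cut(edges, interleaved)  # 3 cut edges -> every edge crosses
```

Even on this toy graph, the naive round-robin layout turns every edge traversal into a message, while the contiguous layout cuts only one edge – and finding the layout that minimizes the cut on a real, irregular graph is the hard (and benchmark-specific) part.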
So my greatest concern with Graph 500 is this: it sends the message to vendors that they do not need to innovate – that clever programming, lots of processors, and big caches are rewarded with a high ranking on the list.
What do you think? Should more graph analytics kernels be added to the Graph 500 list? Which graph algorithms do you think should be added?