Graph 500: The Quest for a Challenging Graph Benchmark

AmarIn my last post, I talked about the computational challenges posed by graph analytics. I’d like to follow that up with a discussion of benchmarks.

The objective of a benchmark is to identify a representative problem, or suite of problems, which can be used to compare the performance of different systems. The Top 500 list (www.top500.org), for example, poses the challenge of a dense system of linear equations, and is used to identify the 500 fastest computers in the world. Great are the bragging rights of the champions… but the real benefit is the system optimization to ensure the best performance is delivered for the money spent.

The Graph 500 benchmark (www.graph500.org) arose out of a realization that graph analytic problems have computational requirements that are very different than the physics simulation problems characterized by the Top 500 benchmark. This is a good news/bad news scenario:  it’s great that the importance of graph analytics is being recognized, but the actual benchmark is too limited to represent a “typical” graph workload. Let me explain.

Graph 500 uses Breadth First Search (BFS) starting from a random node in the graph as the “typical” graph application.    Unlike many graph algorithms, BFS can be implemented as a “Bulk-Synchronous Parallel” algorithm. This class of algorithm is very familiar to all High Performance Computing programmers: each node does as much work as it can locally, collecting together messages that have to be sent to other nodes; then when the node hits a barrier, it transmits its collected messages and receives work from the other processors. Process the received messages, generate new ones, repeat. Inter-processor communication can be infrequent and batched. More processors equals more performance, and clever programming and big HPC systems with tens of thousands of processors will dominate the Graph 500 as long as it’s based on Breadth First Search.  

Clever programming? Well, benchmark results on BFS will vary depending on how the graph is partitioned across a distributed memory compute cluster because the more work that can be done before inter-processor communication has to occur, the better. So, there are many research efforts focused on trying to calculate the optimum partitioning for BFS – which is theoretically interesting, but not terribly relevant to real world graph analytics.

So my greatest concern with Graph 500 is this: it sends the message to vendors that they do not need to innovate:  that clever programming and lots of processors and caches are rewarded by a high ranking on the list.  

What do you think? Should more graph analytics kernels be added to the Graph 500 list?  Which graph algorithms do you think should be added?

2 thoughts on “Graph 500: The Quest for a Challenging Graph Benchmark

  1. BFS’s primary requirement is NOT computational (unless you’re using an overly-broad definition of “computational”), but rather speaks more to the memory and communication requirements. The Kronecker graph that Graph500 requires is designed specifically to be difficult to partition (and representative of graphs that are interesting to people, though i happen to know there’s movement afoot to improve the generator to make the graphs even more representative and more difficult)—and if you have a good way of partitioning such a graph across a distributed memory cluster, I know a lot of folks who would love to give you a lot of money to know your secret. The determinant of performance tends more to be the ability to throw more memory controllers at the problem, modulo the latency of sending (relatively) small messages across the network. If your complaint is that there isn’t enough push to innovate, fair enough, but I would point out that some of the most interesting Graph500 data points in terms of teps-per-node are ones like the Convey systems (who, if you sort the list by teps/node, have three of the top six entries).

  2. Not to suggest that YARCData isn’t on to some powerful ideas, but to play devil’s advocate for a moment, and push you to clarify a bit: How do you distinguish between “innovating” and “clever programming?”

Leave a Reply

Your email address will not be published. Required fields are marked *

*


− three = 6

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>