Sequoia tops Graph 500 list of 'big data' supercomputers


The Livermore Lab's Sequoia supercomputer topped the biannual Graph 500 list of the world's fastest systems for "big data" this week. The Graph 500 benchmark measures the speed with which a supercomputer can "connect the dots" within a massive set of data. Sequoia traversed 15,363 billion connections per second.

LLNL's 20 petaflops Sequoia supercomputer has retained its No. 1 ranking on the Graph 500 list, a measure of a system's ability to conduct analytic calculations -- finding the proverbial needle in the haystack.

An IBM Blue Gene/Q system, Sequoia was able to traverse 15,363 giga edges per second (billions of edges per second) on a scale 40 graph (a graph with 2^40 vertices). The new Graph 500 list was announced at the International Supercomputing Conference (ISC'13) in Leipzig, Germany, earlier this week. Sequoia has held the top ranking on the Graph 500 since November 2011.

"The Graph 500 provides an additional measure of supercomputing performance, a benchmark of growing importance to the high performance computing (HPC) community," said Jim Brase, deputy associate director for Big Data in the Computation Directorate. "Sequoia's top Graph 500 ranking reflects the IBM Blue Gene/Q system's capabilities. Using this extraordinary platform Livermore and IBM computer scientists are pushing the boundaries of the data-intensive computing critical to our national security missions."

The Graph 500 list ranks the world's most powerful computer systems for data-intensive computing and gets its name from graph-type problems -- algorithms -- at the core of many analytics workloads in applications such as cyber security, medical informatics and data enrichment. A graph is made up of interconnected sets of data with edges and vertices, which in a social media analogy might resemble a map of Facebook, with each vertex representing a user and each edge a connection between two users. The Graph 500 ranking is compiled using a massive data set test. The speed with which a supercomputer, starting at one vertex, can discover all other vertices connected to it determines its ranking.
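The idea behind the test can be illustrated at toy scale. The Python sketch below is only an illustration, not the Graph 500 reference code or anything run on Sequoia: it performs a breadth-first search from one starting vertex, counts the edges in the part of the graph it reaches, and divides by the elapsed time to get a traversed-edges-per-second figure. The six-vertex graph and the function names (`bfs_parents`, `traversed_edges`, `measure_teps`) are assumptions made up for this example.

```python
from collections import deque
import time


def bfs_parents(adjacency, root):
    """Breadth-first search from root.

    Returns a dict mapping every reached vertex to the vertex it was
    discovered from (the root maps to itself).
    """
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        vertex = frontier.popleft()
        for neighbor in adjacency[vertex]:
            if neighbor not in parent:  # first time this vertex is seen
                parent[neighbor] = vertex
                frontier.append(neighbor)
    return parent


def traversed_edges(adjacency, parent):
    """Count undirected edges whose endpoints were both reached.

    Each undirected edge is stored twice (once per endpoint), so the
    raw count is halved. Assumes the graph has no self-loops.
    """
    twice = sum(1 for v in parent for n in adjacency[v] if n in parent)
    return twice // 2


def measure_teps(adjacency, root):
    """Time one search and return (edges per second, parent map)."""
    start = time.perf_counter()
    parent = bfs_parents(adjacency, root)
    elapsed = time.perf_counter() - start
    edges = traversed_edges(adjacency, parent)
    rate = edges / elapsed if elapsed > 0 else float("inf")
    return rate, parent


# Hypothetical toy graph: vertices 0-3 form a square, 4-5 are a separate pair.
EXAMPLE_GRAPH = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
    4: [5],
    5: [4],
}

if __name__ == "__main__":
    rate, parent = measure_teps(EXAMPLE_GRAPH, root=0)
    print("reached vertices:", sorted(parent))
    print("edges traversed:", traversed_edges(EXAMPLE_GRAPH, parent))
    print("traversed edges per second:", rate)
```

Starting from vertex 0, the search reaches only the four vertices of the square and never touches the separate pair, so only the square's four edges count toward the rate. The real benchmark applies the same kind of measurement to graphs with trillions of edges spread across many thousands of processors.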

Blue Gene/Q systems have dominated the Graph 500 over the last two years. Sequoia's ranking means it is the world's highest-performing supercomputer for processing gargantuan data sets of petabyte size (quadrillions of bytes).

Data-intensive computing, also called 'big data,' has become increasingly important to Lab missions as HPC platforms have become increasingly powerful, producing enormous quantities of information. The challenge for scientists is that the ability of HPC systems to produce and collect data far outstrips the rate at which those computers can analyze that information.

Extracting essential nuggets of information, sometimes called data mining, and recognizing patterns in large data sets are important to such LLNL missions as nonproliferation, atmospheric monitoring, intelligence, bioinformatics and energy. For example, computer scientists develop techniques for organizing data that allow nonproliferation analysts to pinpoint anomalous behavior in a huge quantity of data.

Computer scientists in LLNL's Center for Applied Scientific Computing (CASC) are working on novel HPC architectures that facilitate data-intensive computing. Leading researchers in this field at Livermore include Maya Gokhale, Roger Pearce, Van Emden Henson and Robin Goldstone. Pearce presented innovative graph traversal algorithms he developed at an ISC'13 session on Memory System Design on Thursday. Bronis de Supinski, chief technology officer, accepted the No. 1 Graph 500 certificate on behalf of LLNL.

For more on data-intensive computing at LLNL, see the January 2013 edition of Science & Technology Review.