Three Goals in Parallel Graph Computations: High Performance, High Productivity, and Reduced Communication
Graph theory is used to model large-scale complex systems in various domains, such as genomics and social network analysis. I will begin by describing a software stack for the parallel analysis of very large graphs, whose layered architecture combines performance, customizability, and conceptual simplicity through separation of concerns.
The Combinatorial BLAS backend implements state-of-the-art parallel algorithms for sparse matrix primitives that serve as building blocks for tightly coupled graph computations. The Python-based frontend, the Knowledge Discovery Toolbox (KDT), provides high productivity through simpler graph abstractions. KDT’s analytic queries on filtered semantic graphs achieve high performance through selective just-in-time embedded specialization, which automatically translates KDT filters into the efficiency layer. Filtering on attributed semantic graphs enables a broad set of real applications that maintain important auxiliary information through attributes on individual edges and vertices.
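To make the layering concrete, the following is a minimal sketch, in plain Python with SciPy rather than the actual KDT or Combinatorial BLAS APIs, of the linear-algebraic view the backend builds on: breadth-first search expressed as repeated sparse matrix-vector products, with an edge-attribute filter applied before traversal. All names and the toy graph are illustrative assumptions, not part of the stack described above.

import numpy as np
import scipy.sparse as sp

# Toy attributed graph: adjacency structure plus one numeric attribute per edge
# (e.g., a timestamp); edges failing the filter are dropped before the search.
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 2, 3, 3, 4])
attr = np.array([5.0, 1.0, 7.0, 2.0, 9.0])   # per-edge attribute values

def filtered_adjacency(rows, cols, attr, n, keep):
    """Boolean adjacency matrix containing only edges that pass the filter `keep`."""
    mask = keep(attr)
    data = np.ones(int(mask.sum()))
    return sp.csr_matrix((data, (rows[mask], cols[mask])), shape=(n, n))

def bfs_levels(A, source):
    """Level-synchronous BFS: each frontier expansion is a transposed sparse mat-vec."""
    n = A.shape[0]
    levels = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        # y = A^T x over a Boolean-like semiring: vertices reachable in one hop
        reached = (A.T @ frontier.astype(np.int8)) > 0
        frontier = reached & (levels == -1)   # discard already-visited vertices
        level += 1
    return levels

n = 5
A = filtered_adjacency(rows, cols, attr, n, keep=lambda a: a >= 2.0)  # keep edges with attr >= 2
print(bfs_levels(A, source=0))   # [0, 1, -1, 2, 3] for this toy graph

In the real stack, the filter is written once at the productivity layer and the specializer pushes it down so that the backend primitives evaluate it during the sparse operations themselves, rather than materializing a filtered copy of the graph as this sketch does.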
I will then present new parallel algorithms that reduce the communication costs of certain graph computations. The new communication-avoiding parallel algorithms target all-pairs shortest paths, breadth-first search, and sparse matrix-matrix multiplication. Most graph algorithms have low computational intensity, so their execution time is bounded by their communication requirements. Beyond drastically improving running time, reducing communication can also lower the energy consumption of graph algorithms.
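As a rough illustration of why distributing the data over a two-dimensional process grid helps, here is a minimal sketch, not taken from the talk, of the standard counting argument: for sparse matrix-matrix multiplication on p processes, a 1D row distribution may force each process to exchange data with up to p - 1 others, while a sqrt(p) x sqrt(p) grid confines each process to its own processor row and column. The functions and numbers below are illustrative assumptions.

import math

def partners_1d(p):
    """1D row-distributed SpGEMM: a process may need rows of B owned by any other process."""
    return p - 1

def partners_2d(p):
    """2D SUMMA-style SpGEMM: blocks of A move along processor rows, blocks of B along columns."""
    q = math.isqrt(p)
    assert q * q == p, "assume a perfect-square process count for the sqrt(p) x sqrt(p) grid"
    return 2 * (q - 1)

for p in (16, 256, 4096):
    print(f"p={p:5d}  1D partners={partners_1d(p):5d}  2D partners={partners_2d(p):4d}")
# At p = 4096, a 1D layout touches up to 4095 peers per process; a 2D grid only 126.

Because graph computations perform little arithmetic per byte moved, shrinking the number and volume of these exchanges is where most of the running-time improvement comes from.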