Day 1: Keynote: Bale 3.0 Applications and Conveyors
Speaker: Jason DeVinney, Johns Hopkins University
Abstract: The bale effort is, first and foremost, a vehicle for discussion of parallel programming productivity. It aims to demonstrate some of the challenges of implementing interesting (i.e., irregular) scalable distributed parallel applications, to demonstrate an approach to using aggregation libraries, and to explore concepts that make such applications easier to write, maintain, and tune for top performance. We use bale to evolve our thinking on parallel programming in an effort to make parallel programming easier, more productive, and more fun. Yes, we think making it fun is a worthy goal!
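To make the aggregation idea concrete, the following is a minimal, hypothetical single-process sketch; it is not the bale or conveyors API, and the PE count, batch size, and flush routine are illustrative stand-ins. Instead of issuing one tiny message per remote update, as a naive histogram-style loop would, updates are buffered by destination PE and applied in batches:

    // Hypothetical sketch of message aggregation; NOT the bale/conveyors API.
    #include <cstdio>
    #include <vector>

    const int NPES  = 4;   // pretend number of PEs
    const int BATCH = 8;   // updates buffered per destination before a "send"

    struct Update { long idx; long val; };

    std::vector<long>   table[NPES];    // each PE's slice of a distributed table
    std::vector<Update> outbuf[NPES];   // per-destination aggregation buffers

    // Deliver one batch to a destination PE (stands in for one network message).
    void flush(int dest) {
        for (const Update &u : outbuf[dest])
            table[dest][u.idx] += u.val;    // the "remote" side applies the batch
        outbuf[dest].clear();
    }

    // Queue a remote update; a message is only "sent" once a batch fills up.
    void aggregated_add(int dest, long idx, long val) {
        outbuf[dest].push_back(Update{idx, val});
        if ((int)outbuf[dest].size() == BATCH)
            flush(dest);
    }

    int main() {
        for (int pe = 0; pe < NPES; pe++)
            table[pe].assign(16, 0);
        // Histogram-style workload: many tiny updates to scattered remote slots.
        for (long i = 0; i < 1000; i++)
            aggregated_add((int)(i * 7 % NPES), i * 13 % 16, 1);
        for (int pe = 0; pe < NPES; pe++)   // drain partially filled buffers
            flush(pe);
        long total = 0;
        for (int pe = 0; pe < NPES; pe++)
            for (long v : table[pe]) total += v;
        printf("updates applied: %ld\n", total);   // expect 1000
        return 0;
    }

Aggregation libraries such as bale's exstack and conveyors package this buffering, routing, and draining behind a small API so that application loops can stay close to their natural, fine-grained form.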
Bio: Jason DeVinney earned his PhD in Applied Mathematics from Johns Hopkins University. Since then, he has been on the research staff at the Center for Computing Sciences in Bowie, Maryland, where he works on high performance computers, parallel algorithms, and programmer productivity.
Day 2: Keynote: NVSHMEM: GPU-Integrated Communication for NVIDIA GPU Clusters
Speaker: Jim Dinan, NVIDIA
Abstract: NVSHMEM extends the OpenSHMEM specification with support for clusters containing NVIDIA GPUs. NVSHMEM allows the programmer to aggregate the memory of multiple GPUs into a single Partitioned Global Address Space (PGAS) that can be transparently accessed through CPU, CUDA stream, and CUDA kernel interfaces that read, write, and atomically update this shared memory space. By allowing communication operations to be enqueued on CUDA streams, NVSHMEM can eliminate the need to synchronize the stream before performing communication. By enabling communication directly from within CUDA kernels, NVSHMEM can also eliminate the need to exit kernels prior to communicating, thereby enabling efficient overlap of communication with computation. This talk will describe recent developments in NVSHMEM and how it is being used to improve the performance of HPC applications running on NVIDIA GPU clusters.
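As a flavor of the kernel-side and stream-ordered interfaces described above, here is a minimal sketch in the spirit of NVSHMEM's basic put examples (it assumes an NVSHMEM installation and a launcher such as nvshmrun or mpirun; error checking is omitted). Each PE allocates a symmetric integer, a kernel writes the PE's rank directly into the next PE's copy, and a stream-ordered barrier completes the exchange:

    #include <cstdio>
    #include <cuda_runtime.h>
    #include <nvshmem.h>
    #include <nvshmemx.h>

    // Each PE writes its rank into the next PE's symmetric buffer,
    // communicating directly from inside the CUDA kernel.
    __global__ void shift_kernel(int *dest) {
        int mype = nvshmem_my_pe();
        int npes = nvshmem_n_pes();
        int peer = (mype + 1) % npes;
        nvshmem_int_p(dest, mype, peer);   // one-sided put of a single int
    }

    int main(void) {
        nvshmem_init();
        int mype = nvshmem_my_pe();

        // Symmetric allocation: every PE contributes a matching buffer to the PGAS.
        int *dest = (int *) nvshmem_malloc(sizeof(int));

        cudaStream_t stream;
        cudaStreamCreate(&stream);
        shift_kernel<<<1, 1, 0, stream>>>(dest);

        // Stream-ordered barrier: enqueued on the stream, so the host does not
        // have to synchronize the stream before requesting communication.
        nvshmemx_barrier_all_on_stream(stream);
        cudaStreamSynchronize(stream);

        int received;
        cudaMemcpy(&received, dest, sizeof(int), cudaMemcpyDeviceToHost);
        printf("PE %d received %d\n", mype, received);

        nvshmem_free(dest);
        nvshmem_finalize();
        return 0;
    }

Such a program is typically built with nvcc using relocatable device code (-rdc=true) and linked against the NVSHMEM library.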
Bio: Jim Dinan is a principal engineer at NVIDIA and leads the HPC communication software efforts. Jim was a James Wallace Givens postdoctoral fellow at Argonne National Laboratory and earned a Ph.D. in computer science from Ohio State University. He has spent nearly a decade serving on open standards committees for parallel programming models, including MPI and OpenSHMEM.
Day 3: Keynote: Multiresolution Support for Aggregated Communication in Chapel
Speaker: Brad Chamberlain, HPE (featuring contributions by Elliot Ronaghan and Engin Kayraklioglu)
Abstract: Chapel is a portable programming language designed for productive parallel computing at scale. One of Chapel's recent flagship applications is Arkouda, which provides a Python interface to key NumPy and Pandas operations for data science at massive scales and interactive rates. In this talk, we will introduce Chapel and Arkouda, then present recent performance advances for Arkouda's most demanding kernels, which perform gathers, scatters, and sorts.