Day 1: Keynote: Bale 3.0 Applications and Conveyors
Speaker: Jason DeVinney, Johns Hopkins University
Abstract: The bale effort is, first and foremost, a vehicle for discussion of parallel programming productivity. It aims to demonstrate some of the challenges of implementing interesting (i.e., irregular) scalable distributed parallel applications, to demonstrate an approach to using aggregation libraries, and to explore concepts that make such applications easier to write, maintain, and tune for top performance. We use bale to evolve our thinking on parallel programming in an effort to make parallel programming easier, more productive, and more fun. Yes, we think making it fun is a worthy goal!
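To make the aggregation idea concrete, the following is a minimal, hypothetical single-process sketch; it is not the bale or conveyors API, and the PE count, batch size, and flush routine are illustrative stand-ins. Instead of issuing one tiny message per remote update, as a naive histogram-style loop would, updates are buffered by destination PE and applied in batches:

    // Hypothetical sketch of message aggregation; NOT the bale/conveyors API.
    #include <cstdio>
    #include <vector>

    const int NPES  = 4;   // pretend number of PEs
    const int BATCH = 8;   // updates buffered per destination before a "send"

    struct Update { long idx; long val; };

    std::vector<long>   table[NPES];    // each PE's slice of a distributed table
    std::vector<Update> outbuf[NPES];   // per-destination aggregation buffers

    // Deliver one batch to a destination PE (stands in for one network message).
    void flush(int dest) {
        for (const Update &u : outbuf[dest])
            table[dest][u.idx] += u.val;    // the "remote" side applies the batch
        outbuf[dest].clear();
    }

    // Queue a remote update; a message is only "sent" once a batch fills up.
    void aggregated_add(int dest, long idx, long val) {
        outbuf[dest].push_back(Update{idx, val});
        if ((int)outbuf[dest].size() == BATCH)
            flush(dest);
    }

    int main() {
        for (int pe = 0; pe < NPES; pe++)
            table[pe].assign(16, 0);
        // Histogram-style workload: many tiny updates to scattered remote slots.
        for (long i = 0; i < 1000; i++)
            aggregated_add((int)(i * 7 % NPES), i * 13 % 16, 1);
        for (int pe = 0; pe < NPES; pe++)   // drain partially filled buffers
            flush(pe);
        long total = 0;
        for (int pe = 0; pe < NPES; pe++)
            for (long v : table[pe]) total += v;
        printf("updates applied: %ld\n", total);   // expect 1000
        return 0;
    }

Aggregation libraries such as bale's exstack and conveyors package this buffering, routing, and draining behind a small API so that application loops can stay close to their natural, fine-grained form.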
Bio: Jason DeVinney earned his PhD in Applied Mathematics from Johns Hopkins University. Since then, he has been on the research staff at the Center for Computing Sciences in Bowie, Maryland, where he works on high performance computers, parallel algorithms, and programmer productivity.
Day 2: Keynote: NVSHMEM: GPU-Integrated Communication for NVIDIA GPU Clusters
Speaker: Jim Dinan, NVIDIA
Abstract: NVSHMEM extends the OpenSHMEM specification with support for clusters containing NVIDIA GPUs. NVSHMEM allows the programmer to aggregate the memory of multiple GPUs into a single Partitioned Global Address Space (PGAS) that can be transparently accessed through CPU, CUDA stream, and CUDA kernel interfaces that read, write, and atomically update this shared memory space. By allowing communication operations to be enqueued on CUDA streams, NVSHMEM can eliminate the need to synchronize the stream before performing communication. By enabling communication directly from within CUDA kernels, NVSHMEM can also eliminate the need to exit kernels prior to communicating, thereby enabling efficient overlap of communication with computation. This talk will describe recent developments in NVSHMEM and how it is being used to improve the performance of HPC applications running on NVIDIA GPU clusters.
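As a flavor of the kernel-side and stream-ordered interfaces described above, here is a minimal sketch in the spirit of NVSHMEM's basic put examples (it assumes an NVSHMEM installation and a launcher such as nvshmrun or mpirun; error checking is omitted). Each PE allocates a symmetric integer, a kernel writes the PE's rank directly into the next PE's copy, and a stream-ordered barrier completes the exchange:

    #include <cstdio>
    #include <cuda_runtime.h>
    #include <nvshmem.h>
    #include <nvshmemx.h>

    // Each PE writes its rank into the next PE's symmetric buffer,
    // communicating directly from inside the CUDA kernel.
    __global__ void shift_kernel(int *dest) {
        int mype = nvshmem_my_pe();
        int npes = nvshmem_n_pes();
        int peer = (mype + 1) % npes;
        nvshmem_int_p(dest, mype, peer);   // one-sided put of a single int
    }

    int main(void) {
        nvshmem_init();
        int mype = nvshmem_my_pe();

        // Symmetric allocation: every PE contributes a matching buffer to the PGAS.
        int *dest = (int *) nvshmem_malloc(sizeof(int));

        cudaStream_t stream;
        cudaStreamCreate(&stream);
        shift_kernel<<<1, 1, 0, stream>>>(dest);

        // Stream-ordered barrier: enqueued on the stream, so the host does not
        // have to synchronize the stream before requesting communication.
        nvshmemx_barrier_all_on_stream(stream);
        cudaStreamSynchronize(stream);

        int received;
        cudaMemcpy(&received, dest, sizeof(int), cudaMemcpyDeviceToHost);
        printf("PE %d received %d\n", mype, received);

        nvshmem_free(dest);
        nvshmem_finalize();
        return 0;
    }

Such a program is typically built with nvcc using relocatable device code (-rdc=true) and linked against the NVSHMEM library.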
Bio: Jim Dinan is a principal engineer at NVIDIA and leads the HPC communication software efforts. Jim was a James Wallace Givens postdoctoral fellow at Argonne National Laboratory and earned a Ph.D. in computer science from Ohio State University. He has spent nearly a decade serving on open standards committees for parallel programming models, including MPI and OpenSHMEM.
Day 3: Keynote: Multiresolution Support for Aggregated Communication in Chapel
Speaker: Brad Chamberlain, HPE (featuring contributions by Elliot Ronaghan and Engin Kayraklioglu)
Abstract: Chapel is a portable programming language designed for productive parallel computing at scale. One of Chapel's recent flagship applications is Arkouda, which provides a Python interface to key NumPy and Pandas operations for data science at massive scales and interactive rates. In this talk, we will introduce Chapel and Arkouda, then present recent performance advances for Arkouda's most demanding kernels, which perform gathers, scatters, and sorts.