Jane Lo, Singapore Correspondent, speaks with Al Geist, Corporate Research Fellow at Oak Ridge National Laboratory (“ORNL”). He is the chief technical officer of the Exascale Computing Project, as well as the CTO of the Leadership Computing Facility and chief scientist for the Computer Science and Mathematics Division at ORNL. He is helping lead the acquisition of the Frontier Exascale computer at ORNL. His recent research focuses on Exascale computing and the resilience needs of its hardware and software.
At ORNL, Geist has published two books and over 200 papers in areas ranging from heterogeneous distributed computing, numerical linear algebra, parallel computing, and collaboration technologies to solar energy, materials science, biology, and solid state physics.
Geist is one of the original developers of PVM (Parallel Virtual Machine), which became a worldwide de facto standard for heterogeneous distributed computing. He was also actively involved in the design of the Message Passing Interface (MPI-1 and MPI-2) standard. He was involved in the development of FT-MPI, a research prototype to explore how to make MPI applications fault tolerant.
In this podcast, Al goes behind the scenes to give the audience a glimpse into Frontier, the first Exascale supercomputer in the USA.
Drawing on the highlights he presented at Supercomputing Asia (28th Feb 2022 – 3rd March 2022, Singapore), he shares with the audience the challenges the team overcame to build Frontier. With the capability to perform a billion billion floating point operations per second (“Exascale”), Frontier joins Fugaku – the Japanese supercomputer currently ranked as the world’s fastest – in the list of high performance computers that have reached the Exascale milestone.
Besides speed, Al also explains how reliability – the ability to mitigate computation failures and errors – is crucial if supercomputers are to deliver results that decision makers can rely on with confidence.
Al also discusses the innovations in the cooling infrastructure that address the rising energy consumption that comes with increasing computation power. He also points to the impressive work in refitting the building that houses Frontier – including reinforcing the 20,000 square feet of floor area to withstand the weight of supercomputer cabinets weighing 8,000 pounds each.
With highly sensitive applications running on Frontier, including some with national security implications, he also touches on security considerations – such as controls over remote and physical access, and data segregation.
Looking ahead, Al shares how, by programming and coding smarter, supercomputers will continue to deliver gains in computational speed for a couple more generations to come, despite the slowdown in semiconductor advancements.
Recorded 7th March 2022 6pm (US Eastern Time) / 8th March 2022 7am (Singapore).