UVA LLM Workshop

The 1st UVA Workshop on Large Language Models for Science and Engineering, Oct 19-20, 2024

Exploring New Frontiers in Materials Design with Large Language Models

Speaker: Dr. Wesley Reinhart, Assistant Professor in Materials Science and Engineering, Penn State

Slides (UVA login required)

Abstract

Large Language Models (LLMs) have been the subject of countless headlines in mainstream news outlets in recent years as the world grapples with legal and ethical issues related to their training and use. The widespread adoption of LLMs appears irreversible, with estimates of billions of queries being processed each day, yet the implications for science are still murky. In this talk, I will present my thoughts on the roles of LLMs in the physical sciences, including some exciting opportunities and potential pitfalls. I will give a detailed description of two applications of LLMs to materials science that my group has worked on recently: small molecule design and sequence-controlled copolymer self-assembly. In the case of small molecule design, we leverage the transformer architecture to modify molecules encoded as SMILES strings. We show that LLMs can modify these molecules in various nuanced ways as dictated by natural language prompts. The second problem of copolymer sequence selection highlights the surprising capability of LLMs to perform evolutionary optimization at or above the level of a standard evolutionary algorithm. I will conclude with my thoughts on what characteristics of problems in the physical sciences make them suitable to leverage LLMs and how to use them effectively.
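The abstract does not give implementation details, but the following minimal Python sketch illustrates the general idea behind the first application: prompting an LLM to edit a molecule encoded as a SMILES string according to a natural-language instruction. The prompt wording and the `query_llm` helper are hypothetical placeholders for illustration only, not the speaker's actual method.

```python
# Illustrative sketch only: ask a language model to modify a molecule given as a
# SMILES string, following a natural-language request. The query_llm() helper is a
# stand-in for whatever model or API is actually used; it is not from the talk.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a language model; should return a SMILES string."""
    raise NotImplementedError("Connect this to an LLM of your choice.")

def modify_molecule(smiles: str, instruction: str) -> str:
    """Ask the model to rewrite a SMILES string per a natural-language instruction."""
    prompt = (
        "You are a chemistry assistant. Modify the molecule below as requested "
        "and reply with a single valid SMILES string only.\n"
        f"Molecule: {smiles}\n"
        f"Request: {instruction}\n"
    )
    return query_llm(prompt).strip()

# Hypothetical usage: add a hydroxyl group to toluene.
# modified = modify_molecule("Cc1ccccc1", "add a hydroxyl group to the ring")
```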

Bio

Wesley Reinhart is an Assistant Professor of Materials Science and Engineering and Institute for Computational and Data Sciences Faculty Co-hire at the Pennsylvania State University. He received his B.S. in Chemical Engineering from the University of Minnesota Twin Cities and Ph.D. in Chemical and Biological Engineering from Princeton University. His doctoral thesis focused on strategies for predicting, understanding, and controlling colloidal crystallization using large-scale computer simulations and unsupervised machine learning. As a Research Scientist at Siemens Corporate Technology, he developed machine learning methods to solve problems in computational geometry, knowledge representation, and material modeling for additive manufacturing applications. In 2020, he launched his academic career at Penn State, where his research group now focuses on developing practical solutions to materials design challenges, especially those related to small, sparse, and noisy data sets within large design spaces.

AuroraGPT/Eval: Establishing a methodology to evaluate LLMs/FMs as Research Assistants

Speaker: Dr. Franck Cappello, Senior Computer Scientist, Argonne National Laboratory

Abstract

The capabilities of large language models such as ChatGPT, Claude, Gemini, and Llama have progressed dramatically in the past 2-3 years, raising the question of whether they can be used as research assistants. Moreover, recent results and publications suggest that future generations of LLMs may exceed the skills of human scientists. However, while many benchmarks exist to assess the general language skills of these models, there is no established methodology for evaluating them as scientific assistants. This talk presents the current state of the effort at Argonne, in the context of the AuroraGPT project, to establish a methodology for rigorously evaluating the capabilities, trustworthiness, and safety of LLMs as research assistants. As we will show, this is a complex open problem demanding expertise in the domain sciences (including computer science), AI, and psychometrics.

Bio

Cappello received his Ph.D. from the University of Paris XI in 1994 and joined CNRS, the French National Center for Scientific Research. In 2003, he joined INRIA, where he holds the position of permanent senior researcher. He initiated the Grid’5000 project in 2003 and served as Director of Grid’5000 (https://www.grid5000.fr) through its design, implementation, and production phases from 2003 to 2008. Grid’5000 is still in use today and has helped hundreds of researchers carry out experiments in parallel and distributed computing and publish more than 2,000 research papers. In 2009, Cappello became a visiting research professor at the University of Illinois. With Marc Snir, he created the Joint Laboratory on Petascale Computing (JLPC), which was expanded in 2014 into the Joint Laboratory on Extreme-Scale Computing (JLESC: https://jlesc.github.io), gathering seven of the most prominent research and production centers in supercomputing: NCSA, Inria, ANL, BSC, JSC, RIKEN CCS, and UTK. Over his ten-year tenure as director of the JLPC and JLESC, Cappello has helped hundreds of researchers and students share their research and collaborate to explore the frontiers of supercomputing. Starting in 2008, as a member of the executive committee of the International Exascale Software Project, he led the roadmap and strategy efforts for projects related to resilience at the extreme scale.

In 2016, Cappello became the director of two Exascale Computing Project (ECP: https://www.exascaleproject.org/) software projects on resilience and lossy compression of scientific data, aimed at helping exascale applications run efficiently on exascale systems.

Over his 30-year research career, Cappello has directed the development of several high-impact software tools, including XtremWeb, one of the first desktop grid systems; the MPICH-V fault-tolerant MPI library; the VeloC multilevel checkpointing environment; and the SZ lossy compressor for scientific data (https://exascaleproject.org/wp-content/uploads/2019/11/VeloC_SZ.pdf).

He is an IEEE Fellow and the recipient of the 2024 IEEE CS Charles Babbage Award, the 2024 Euro-Par Achievement Award, the 2022 HPDC Achievement Award, two R&D 100 Awards (2019 and 2021), the 2018 IEEE TCPP Outstanding Service Award, and the 2021 IEEE Transactions on Computers Award for Editorial Service and Excellence.