Speakers: Dan Jacobson, PhD

Dan Jacobson, PhD

Chief Scientist for Computational Systems Biology
Oak Ridge National Laboratory

Dan’s research focuses on understanding the complex sets of interactions of molecules of all types (across all omics layers) in cells that lead to phenotypes, traits and disease states in organisms and how all of that is conditional on the surrounding environment. His research team applies these approaches to grand challenges in bioenergy, sustainable agriculture, ecosystems, zoonotic spillover and human health (and the intersections among those areas, i.e., One Health).

Recently, Dan’s lab has been doing a range of research to address the COVID-19 pandemic, including studies of the molecular evolution and pathogenic elements of coronaviruses, molecular mechanisms for human pathogenesis (and identification of potential new therapies), clinical predictors of disease severity, environmental variables that affect COVID-19 disease outcomes and the prediction and prevention of future zoonotic spillovers/pandemics. For this work Dan has been awarded the 2021 Secretary of Energy’s Achievement Award and the 2020 HPCwire Top HPC-enabled Science Award.

Dan’s team was the first group to break the Exascale barrier and is happy to have done so for a biology project. Dan’s lab has continued to push the boundaries of computational science; its latest calculation, at 9.4 Exaops, is currently the fastest scientific calculation ever performed anywhere in the world. Their first Exascale project led to the team being awarded the 2018 Gordon Bell Prize (the first ever for Systems Biology).

Dan’s career as a computational systems biologist has included leadership roles in academic, corporate, NGO and national lab settings. His lab focuses on the development and subsequent application of mathematical, statistical and computational methods to biological datasets in order to yield new insights into complex biological systems. His lab’s approaches include the use of Network Theory and Topology Discovery/Clustering, Wavelet Theory, AI, and explainable-AI, together with traditional and more advanced supercomputing architectures. Areas of statistics of particular interest to his lab include both frequentist (parametric and non-parametric) and Bayesian methods, as well as the development of new methods for Genome-Wide Epistasis Studies (GWES).

These mathematical and statistical methods are applied to various population and (meta)multiomics data sets (Genomics, Phylogenomics, Transcriptomics, Proteomics, Metabolomics, Microbiomics, Viriomics, Phytobiomics, Chemiomics, etc.), individually as well as in combination, in an attempt to better understand the functional relationships and the biosynthesis, signaling, transcriptional, translational, degradation and kinetic regulatory networks at play in biological organisms and communities. His group takes a broad view of biological complexity and evolution that stretches from viruses to microbes to plants to humans. ORNL is home to some of the world’s largest supercomputers, and his lab therefore uses petascale and exascale computing to analyze and model complex biological systems.

Supercomputing and Systems Biology for a One Health Framework: AI for Agriculture, Sustainability, Human Health, and Pandemic Prevention

The cost of generating biological and environmental data is dropping exponentially, resulting in a growth in data that has far outstripped the growth in computational power predicted by Moore’s Law. This flood of data has opened a new era of systems biology in which there are unprecedented opportunities to gain insights into complex biological systems. Integrated biological models need to capture the higher order complexity of the interactions among cellular components. Solving such complex combinatorial problems will give us extraordinary levels of understanding of biological systems. Paradoxically, understanding higher order sets of relationships among biological objects leads to a combinatorial explosion in the search space of biological data. These exponentially increasing volumes of data, combined with the desire to model more and more sophisticated sets of relationships within a cell, across an organism and up to ecosystems and, in fact, climatological scales, have led to a need for computational resources and sophisticated algorithms that can make use of such datasets.

The traits or phenotypes of an organism, including its adaptation to its surrounding environment and the interactions with its microbiome, are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants regulated by and related to biotic and abiotic signals. However, the effects of these variants can be viewed as the result of historic selective pressure and of current environmental and epigenetic interactions, and, as such, their co-occurrence can be seen as genome- and omic-wide associations in a number of different ways. We have developed supercomputing and explainable-AI approaches to find complex epistatic architectures responsible for all measurable phenotypes as well as an organism’s ability to adapt to its environment and detect and modulate its microbiome. The result is a comprehensive systems biology model of an organism and how it has adapted to and responds to its abiotic and biotic environment, which has applications in agriculture, sustainability, human health and pandemic prevention.
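
To give a rough sense of the combinatorial explosion mentioned above, the following minimal Python sketch counts how many distinct sets of variants would have to be considered when testing interactions of increasing order. It is only an illustration under stated assumptions: the function name `interaction_search_space` and the figure of one million variants are hypothetical and are not taken from the speaker's work or methods.

```python
from math import comb


def interaction_search_space(n_variants: int, order: int) -> int:
    """Count the unordered sets of `order` variants drawn from `n_variants`.

    This is the binomial coefficient C(n_variants, order), i.e. the number
    of candidate interaction sets an exhaustive search would have to visit.
    """
    return comb(n_variants, order)


if __name__ == "__main__":
    # Hypothetical number of genomic variants, chosen only for illustration.
    n = 1_000_000
    for k in (1, 2, 3, 4):
        sets = interaction_search_space(n, k)
        print(f"order {k}: {sets:.3e} candidate variant sets")
```

Each additional interaction order multiplies the number of candidate sets by roughly the number of variants divided by the order, which is why exhaustive searches for higher order relationships quickly call for very large computational resources.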