AI Expo Abstracts
SESSION 1:
Author List: Edmon Begoli, Ben Mayer, Kris Brown
Abstract: The nature of unstructured medical text poses interesting research opportunities for the AI researcher. This text is loosely structured and widely variable, making concept extraction notoriously difficult. In the mental health space, concepts are loosely defined and there is often no one-to-one mapping between concepts and text. Widely used contextualized language models such as BERT are known to be highly effective on non-medical text. Our research aims to apply these models to detecting mental health disorders and risk factors associated with suicide and opioid abuse by pre-training them on clinical text, gaining insight into how these concepts are expressed in unstructured notes.
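As a rough illustration of the pre-training step described above, the snippet below continues masked-language-model pre-training of an off-the-shelf BERT checkpoint on a couple of synthetic clinical-style sentences using the Hugging Face transformers library. The checkpoint name, example text, and hyperparameters are illustrative assumptions, not the authors' configuration.

    # Minimal sketch: continued masked-LM pre-training of BERT on clinical-style text.
    # The checkpoint name, sample sentences, and hyperparameters are illustrative only.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    notes = [  # synthetic stand-ins for de-identified clinical notes
        "patient reports worsening anxiety and poor sleep over the past month",
        "denies suicidal ideation; continues prescribed opioid taper as directed",
    ]

    # Mask 15% of tokens so the model learns clinical phrasing by filling in the blanks.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    batch = collator([tokenizer(n) for n in notes])

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    outputs = model(**batch)          # labels are set by the collator; loss is the MLM loss
    outputs.loss.backward()
    optimizer.step()
    print("masked-LM loss:", float(outputs.loss))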
Author List: Jared Smith (ORNL CADA within NSSD), Rachel Petrik (ORNL/U. Kentucky), and Berat Arik (ORNL/Georgia Tech)
Short Abstract:
Published at the flagship academic computer security conference, ACM CCS, this work takes a fundamentally different approach to analyzing potentially compromised devices for latent malware. Our multi-modal detection approach, named Cincinnatus after George Washington's nickname, "the American Cincinnatus," is independent of both operating system and instruction-set architecture. The system relies only on raw binary data extracted from device volatile memory, with no domain-expert input needed for feature engineering. Cincinnatus is evaluated on a 700+ TB dataset of infected device memory snapshots and on the first published dataset of benign memory snapshots from devices running normal, non-malicious software, built specifically for this evaluation. After an average of 35 seconds of pre-processing and feature extraction per 1 GB memory snapshot, Cincinnatus combines traditional machine learning algorithms, operating on statistical and numerical features of the raw bytes, with deep learning algorithms, operating on visual and sequential representations of those bytes, to predict compromise without expert input into feature construction. Across the evaluated ensemble of traditional and deep neural network models, Cincinnatus achieves on average 98% accuracy and a 0.01% false positive rate, and requires less than 1 second per prediction after preprocessing device memory.
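The toy sketch below illustrates the general idea of learning from statistics of raw memory bytes without hand-engineered domain features; it is not the Cincinnatus pipeline, and the synthetic "snapshots," feature set, and random-forest classifier are stand-in assumptions.

    # Toy illustration (not the Cincinnatus pipeline): classify memory snapshots
    # using only statistics of the raw bytes, with no hand-engineered domain features.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def byte_features(raw):
        counts = np.bincount(np.frombuffer(raw, dtype=np.uint8), minlength=256)
        probs = counts / max(len(raw), 1)
        entropy = -np.sum(probs[probs > 0] * np.log2(probs[probs > 0]))
        return np.concatenate([probs, [entropy]])     # 256 byte frequencies + entropy

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for benign vs. infected snapshots (real snapshots are ~1 GB).
    benign = [rng.integers(0, 256, 4096, dtype=np.uint8).tobytes() for _ in range(50)]
    infected = [rng.integers(0, 128, 4096, dtype=np.uint8).tobytes() for _ in range(50)]

    X = np.array([byte_features(b) for b in benign + infected])
    y = np.array([0] * len(benign) + [1] * len(infected))

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("training accuracy:", clf.score(X, y))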
-
3. Title: Ground State Electronic Calculations with Restricted Boltzmann Machines using Quantum Annealing
Authors: Sindhu Tipirneni and Alex McCaskey
This work proposes a hybrid solution that uses classical machine learning techniques and quantum annealing to solve the many-body problem in quantum physics. The exponential complexity of the many-body wave function is handled through a variational approach that leverages a Restricted Boltzmann Machine (RBM) to represent the wave function. Since computing the probability distribution associated with an RBM is NP-hard on classical hardware, we use a quantum annealer to sample from this distribution. The ground state is then learned by using optimization techniques to minimize the energy associated with the many-body Hamiltonian. This approach is scalable in the number of qubits and could make computations on larger systems feasible.
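To make the variational setup concrete, the sketch below optimizes an RBM wave-function ansatz for a tiny 4-spin transverse-field Ising chain, using exact enumeration in place of the quantum annealer; the Hamiltonian, system size, and classical optimizer are illustrative assumptions, not the proposed hybrid workflow.

    # Sketch of the variational RBM idea on a 4-spin transverse-field Ising chain.
    # Exact enumeration stands in for the quantum annealer; illustrative only.
    import numpy as np
    from itertools import product
    from scipy.optimize import minimize

    N, J, h, M = 4, 1.0, 0.5, 8                 # spins, coupling, field, hidden units
    I2 = np.eye(2); Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]])

    def op(single, site):                       # embed a single-site operator at `site`
        out = np.array([[1.0]])
        for i in range(N):
            out = np.kron(out, single if i == site else I2)
        return out

    H = sum(-J * op(Z, i) @ op(Z, i + 1) for i in range(N - 1)) \
      + sum(-h * op(X, i) for i in range(N))

    spins = np.array(list(product([1.0, -1.0], repeat=N)))   # all 2^N sigma^z configurations

    def psi(theta):                             # real-valued RBM wave-function amplitudes
        a, b = theta[:N], theta[N:N + M]
        W = theta[N + M:].reshape(N, M)
        return np.exp(spins @ a) * np.prod(2.0 * np.cosh(b + spins @ W), axis=1)

    def energy(theta):                          # variational energy <psi|H|psi> / <psi|psi>
        v = psi(theta)
        return (v @ H @ v) / (v @ v)

    theta0 = 0.01 * np.random.default_rng(0).standard_normal(N + M + N * M)
    res = minimize(energy, theta0, method="L-BFGS-B")
    print("RBM variational energy:", round(res.fun, 4),
          "exact ground state:", round(np.linalg.eigvalsh(H)[0], 4))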
-
4. Title: An artificial neural network to estimate rainfall rates from satellite imagery and lightning strikes
Speaker: Valentine (Val) Anantharaj
Abstract: Machine learning and artificial intelligence (MLAI) methods help incorporate disparate data that may serve as surrogates for different physical processes. We used an artificial neural network (ANN) to derive rainfall estimates from near-infrared data from a weather satellite. The ANN was trained with ground-truth data from the ground-based weather radar network operated by the NOAA National Weather Service. The radar data represent the amount of rainfall accumulated in an area over an hour; they are an integral quantity, whereas the satellite data are instantaneous images from the sensor, obtained every 30 minutes. The typical lifetime of a cloud is on the order of minutes, so this mismatch introduces uncertainties in the estimates due to less-than-optimal sampling. We then incorporated lightning data from a ground-based network, as a surrogate for cloud microphysical processes, which allowed the MLAI methodology to extract more meaningful information and thereby improve the accuracy of the hourly rainfall estimates. We are exploring ways to extend this methodology further.
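The sketch below shows the shape of such an ANN regression, with synthetic stand-ins for the infrared brightness temperatures, lightning-strike counts, and radar-derived hourly rainfall; the network size and synthetic data rule are assumptions for illustration only.

    # Illustrative sketch: a small ANN mapping satellite IR brightness temperature and
    # lightning-strike counts to hourly rainfall, trained on synthetic "radar truth".
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000
    ir_temp = rng.uniform(190, 300, n)          # cloud-top brightness temperature (K)
    lightning = rng.poisson(3, n)               # strikes per grid cell per hour
    # Synthetic rule of thumb: colder cloud tops and more lightning -> heavier rain.
    rain = np.clip(0.2 * (240 - ir_temp), 0, None) + 0.8 * lightning + rng.normal(0, 1, n)
    rain = np.clip(rain, 0, None)               # hourly accumulation (mm)

    X = np.column_stack([ir_temp, lightning])
    X_tr, X_te, y_tr, y_te = train_test_split(X, rain, random_state=0)
    ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
    ann.fit(X_tr, y_tr)
    print("R^2 on held-out samples:", round(ann.score(X_te, y_te), 3))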
Authors: Yilu Liu and Lingwei
Electrical and Electronics Systems Research Division, Oak Ridge National Laboratory
Abstract: Advances in artificial intelligence (AI) technologies provide unprecedented opportunities to boost capabilities to monitor, predict, and assess the security of the nation’s power grids. This talk will demonstrate a use case developed by ORNL and UTK on applying AI to improve the nation’s utility infrastructures. The use case will show how AI can help accelerate power system stability evaluation and reduce the computational burden of real-time power system risk assessment. With AI technologies, operators can gain a better understanding of the power system status and potential risks, and many decision-support tools can be developed to improve the system’s reliability and its capability to operate under higher renewable penetration conditions.
Authors: Amir Koushyar Ziabari*, Michael Kirka^, Ryan Dehoff^, Vincent Paquit*, Philip Bingham*, and Singanallur Venkatakrishnan*
Abstract: X-ray computed tomography (XCT) plays a critical role in rapid non-destructive evaluation (NDE) of additively manufactured (AM) parts, and in turn in advancing our understanding of the impact of various process parameters and qualifying the final quality of the built parts. Performing low-noise, high-resolution reconstruction of 3D images of a built part is a time-consuming task, which prohibits rapid inspection of the large number of parts that may be produced on a given day. Here, we present a novel AI algorithm to rapidly obtain high-quality CT reconstructions of AM parts. We present several results for real case studies. We further make the case for using CAD models to advance the capabilities of our technique beyond the state of the art.
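The authors' algorithm is not reproduced here; the sketch below shows one generic deep-learning workflow for this kind of task, a small CNN trained to map fast, noisy reconstructions to cleaner ones, on purely synthetic image pairs.

    # Generic sketch (not the authors' algorithm): train a small CNN to map fast, noisy
    # CT reconstructions to clean ones, the kind of post-processing that speeds up NDE.
    import torch
    import torch.nn as nn

    denoiser = nn.Sequential(                     # tiny denoising CNN on 2D slices
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )

    # Synthetic training pairs: "clean" slices and noisy quick reconstructions of them.
    clean = torch.rand(64, 1, 64, 64)
    noisy = clean + 0.2 * torch.randn_like(clean)

    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for epoch in range(20):
        opt.zero_grad()
        loss = loss_fn(denoiser(noisy), clean)
        loss.backward()
        opt.step()
    print("final training MSE:", float(loss))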
SESSION 3:
Authors: Nouamane Laanait, Junqi Yin, Joshua Romero, M. Todd Young, Vitalii Starchenko, Sean Treichler, Albina Borisevich, Alexander Sergeev, Michael Matheson
Distributed training of deep learning models becomes increasingly important as data volumes grow in both the observational and simulation sciences. The largest-scale data-parallel training to date has been enabled largely by the Horovod framework. In this talk, we present near-linear scaling efficiency (0.93) up to the full Summit system, achieved by introducing novel communication strategies in the synchronous distributed training of a fully convolutional DenseNet (Tiramisu). The communication improvements are generic and can be applied to most deep learning applications on Summit.
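For readers unfamiliar with Horovod, the skeleton below shows the standard hooks it adds to a single-GPU training loop (PyTorch backend); the model, data, and learning-rate scaling are placeholders, and the novel communication strategies from the talk are not reproduced here.

    # Minimal Horovod data-parallel skeleton (PyTorch backend); model and data are placeholders.
    # Launch with e.g. `horovodrun -np 4 python train.py`.
    import torch
    import horovod.torch as hvd

    hvd.init()                                            # one process per GPU
    if torch.cuda.is_available():
        torch.cuda.set_device(hvd.local_rank())

    model = torch.nn.Linear(128, 10)
    if torch.cuda.is_available():
        model.cuda()
    # Scaling the learning rate by the number of workers is a common data-parallel heuristic.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Horovod wraps the optimizer to allreduce gradients across all ranks each step.
    optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for step in range(10):                                # placeholder training loop
        x = torch.randn(32, 128, device=next(model.parameters()).device)
        y = torch.randint(0, 10, (32,), device=x.device)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()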
Authors: Travis Johnston (presenter), Steven Young, Katie Schuman, Robert Patton, and Tom Potok
Abstract: Since Summit, the world's fastest and smartest supercomputer, came online, there has been a massive push for large-scale AI that can leverage Summit's unique capabilities. One of the challenges for AI researchers who are relatively new to HPC is how to communicate their research success to the broader HPC community, whose stock-in-trade is FLOPS (floating point operations per second). This particular metric is rarely meaningful to the AI community; hence the disconnect. In this talk we provide examples of better metrics for AI, based on recent experience that led the authors to become Gordon Bell finalists.
Authors: Daniel Nichols, Stan Tomov, Kwai Wong, University of Tennessee
Abstract: MagmaDNN is a deep learning framework built on the highly optimized MAGMA dense linear algebra package. The library offers performance comparable to other popular frameworks, such as TensorFlow, PyTorch, and Theano. The framework is implemented in C++, providing fast memory operations, direct CUDA access, and compile-time errors. Common neural network layers such as Fully Connected, Convolutional, Pooling, Flatten, and Dropout are included. MagmaDNN uses several techniques to accelerate network training. For instance, convolutions are performed using the Winograd algorithm and FFTs. Other techniques include MagmaDNN's custom memory manager, which is used to reduce expensive memory transfers, and the distribution of batches across GPU nodes to accelerate training.
Authors: Guannan Zhang (CSMD), Jiaxin Zhang (NCCS), and Jacob Hinkle (CSED)
Abstract: We developed a Nonlinear Level-set Learning (NLL) method for dimensionality reduction in high-dimensional function approximation with small data. This work is motivated by a variety of design tasks in real-world engineering applications, where practitioners would like to replace their computationally intensive physical models (e.g., high-resolution fluid simulators) with fast-to-evaluate predictive machine learning models, so as to accelerate the engineering design process. There are two major challenges in constructing such predictive models: (a) high-dimensional inputs (e.g., many independent design parameters) and (b) small training data, generated by running extremely time-consuming simulations. Thus, reducing the input dimension is critical to alleviate the over-fitting caused by data insufficiency. Existing methods, including sliced inverse regression and active subspace approaches, reduce the input dimension by learning a linear coordinate transformation; our main contribution is to extend the transformation approach to a nonlinear regime. Specifically, we exploit reversible networks (RevNets) to learn nonlinear level sets of a high-dimensional function and parameterize its level sets in low-dimensional spaces. A new loss function was designed to utilize samples of the target function's gradient to encourage the transformed function to be sensitive to only a few transformed coordinates. The NLL approach is demonstrated by applying it to three 2D functions and two 20D functions, to show the improved approximation accuracy obtained with the nonlinear transformation, as well as to an 8D composite material design problem for optimizing the buckling-resistance performance of composite shells of rocket inter-stages.
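The toy sketch below conveys the flavor of the approach on a 2D example, using a simple additive-coupling invertible network in place of the authors' RevNet and a simplified sensitivity loss (gradients of the target function are obtained by automatic differentiation rather than from precomputed samples); it is an illustration, not the NLL method itself.

    # Toy sketch of the nonlinear level-set idea: learn an invertible map g so that
    # f(g^{-1}(z)) is sensitive mainly to the first transformed coordinate z1.
    import torch
    import torch.nn as nn

    class Coupling(nn.Module):
        # Additive coupling block: exactly invertible, so the whole network is reversible.
        def __init__(self, flip):
            super().__init__()
            self.flip = flip
            self.t = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
        def forward(self, x):
            a, b = (x[:, 1:], x[:, :1]) if self.flip else (x[:, :1], x[:, 1:])
            b = b + self.t(a)
            return torch.cat([b, a], dim=1) if self.flip else torch.cat([a, b], dim=1)
        def inverse(self, z):
            a, b = (z[:, 1:], z[:, :1]) if self.flip else (z[:, :1], z[:, 1:])
            b = b - self.t(a)
            return torch.cat([b, a], dim=1) if self.flip else torch.cat([a, b], dim=1)

    class RevNet2D(nn.Module):
        def __init__(self, depth=4):
            super().__init__()
            self.blocks = nn.ModuleList(Coupling(flip=(i % 2 == 1)) for i in range(depth))
        def forward(self, x):
            for blk in self.blocks:
                x = blk(x)
            return x
        def inverse(self, z):
            for blk in reversed(self.blocks):
                z = blk.inverse(z)
            return z

    def f(x):                                   # toy target function with curved level sets
        return torch.sin(x[:, 0] + x[:, 1] ** 2)

    g = RevNet2D()
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)
    for step in range(2000):
        x = 2 * torch.rand(256, 2) - 1          # samples of the design parameters
        with torch.no_grad():
            z = g(x)                            # their transformed coordinates
        z.requires_grad_(True)
        y = f(g.inverse(z))                     # the target function viewed in z-coordinates
        grad_z = torch.autograd.grad(y.sum(), z, create_graph=True)[0]
        # Penalize sensitivity to the inactive coordinate z2, relative to the total gradient.
        loss = (grad_z[:, 1] ** 2).mean() / (grad_z ** 2).mean().clamp_min(1e-8)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print("relative sensitivity to the inactive coordinate:", float(loss))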
-
11. Title: Mathematical innovations for enabling new frontiers in deep learning and artificial intelligence
Author and Presenter: Clayton G. Webster
Joe Daws (University of Tennessee)
Anton Dereventsov (Oak Ridge National Laboratory)
Armenak Petrosyan (Oak Ridge National Laboratory)
Viktor Resniak (Oak Ridge National Laboratory)
Xuping Xing (Oak Ridge National Laboratory)
Abstract:
In this talk we will present several recent mathematical innovations that enable new computational approaches for constructing and training deep learning algorithms used in artificial intelligence. In particular, we will survey our new machine learning contributions, aimed at tackling a variety of challenging problems in scientific imaging, high-dimensional approximations, and reduced representations of complex systems, including:
– greedy networks that nullify the need for training and/or calibrating based on back-propagation;
– robust learning with implicit residual networks;
– polynomial-based techniques for architectural design and learning with deep neural networks;
– learning reduced basis approximations via the natural greedy algorithm; and
– analytic continuation of noisy data using multistep deep networks.
Approximation-theoretic results justify the advantages of our proposed approaches, and multiple numerical experiments demonstrate their viability and benefits compared to classical techniques for training and constructing deep networks.
Authors: Dan Lu and Daniel Ricciuto
Abstract:
Neural networks (NNs) have been widely used in surrogate modeling to reduce forward-model evaluation time. Building a surrogate for a large-scale problem with many model responses requires a complex NN, which increases the AI model's parameter and structural uncertainty. This work used the singular value decomposition (SVD) method to reduce the dimension of the model responses, which simplifies the NN architecture and thus reduces NN parameter uncertainty. In addition, it used Bayesian optimization to tune the NN hyperparameters, which produces a best-performing NN model and thus reduces model structural uncertainty. This simple, optimized NN architecture produces accurate predictions with only 20 training samples, where roughly 200 samples would otherwise be needed for similar accuracy. Moreover, the resulting simple NN limits overfitting, which is a fundamental problem of all data-centric methods.
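The sketch below illustrates the response-dimension-reduction step on synthetic data: the model responses are compressed with a truncated SVD and a small NN is trained to predict the few SVD coefficients. The data, ranks, and network size are assumptions, and the Bayesian hyperparameter optimization step is omitted.

    # Sketch of the response-dimension reduction step on synthetic data: compress many
    # model outputs with a truncated SVD, then train a small NN on the few SVD coefficients.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n_train, n_params, n_outputs, k = 20, 5, 500, 3   # small data, many responses, rank k

    theta = rng.uniform(-1, 1, (n_train, n_params))   # model input parameters
    basis = rng.normal(size=(n_params, n_outputs))
    Y = np.tanh(theta @ basis) + 0.01 * rng.normal(size=(n_train, n_outputs))  # responses

    # Truncated SVD of the response matrix: Y ~ U_k S_k V_k^T.
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    coeffs = U[:, :k] * S[:k]                         # k coefficients per training sample

    nn_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    nn_model.fit(theta, coeffs)                       # predict the reduced coefficients only

    Y_pred = nn_model.predict(theta) @ Vt[:k]         # reconstruct the full responses
    rel_err = np.linalg.norm(Y_pred - Y) / np.linalg.norm(Y)
    print("relative reconstruction error:", round(rel_err, 3))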