2018 Midwest Bioinformatics Conference Agenda

WEDNESDAY, APRIL 11, 2018

MU Bond Life Sciences Center Monsanto Auditorium | 1201 Rollins St., Columbia, MO 65201

10:00 AM – 10:30 AM

WELCOME AND INTRODUCTIONS

Keith Gary, PhD | Welcome

Mark Hoffman, PhD | Working in the Big Tent of Bioinformatics

Chi-Ren Shyu, PhD | 1+1>3, Regional Informatics Collaborations

10:30 AM – 11:30 AM

KEYNOTE SPEAKER

Johnny Park, PhD | Precision Ag — The Next Revolution in Agriculture to Feed 10 Billion People in 2050

Experts are predicting that the world population will reach 10 billion by 2050. In order to meet the demand of our growing population, food production needs to increase 70% by 2050. This means that we have to produce more food in the next 50 years than in the past 10,000 years combined. Precision ag—data-driven, high-resolution farm management—is the next breakthrough technology that is driving the necessary increase of food production. In this talk, I will provide an overview of the core challenges in today’s commercial food production and how various precision ag technologies are tackling those challenges. I will also share my own experience of starting and growing a precision ag startup based on technology from university research.

11:30 AM – 1:00 PM

LUNCH & POSTER PRESENTATION | BOND LIFE SCIENCES CENTER

1:00 PM – 2:00 PM

DATA STRUCTURE

MODERATOR: Philip Payne, PhD, FACMI

Jeffrey Thompson, PhD | Preparing Data for Integration – a Key Step in Creating Multi-Omic Predictive Models
The last several years have seen multi-omic models grow from a promising idea to a required expertise in bioinformatics shared resources at many institutions. Frequently, the discussion around multi-omic predictive models focuses on the machine-learning approach used to make predictions – the last step in the process. However, successful data integration depends heavily on how data are prepared and combined, so that full advantage can be taken of any complementary information. Variable selection and transformation require particular consideration, to cope with a variety of issues that present special challenges for data-integrated models, including overfitting, variance inflation, interpretability, and optimization. While solutions to these problems will often be tailored to a specific scenario, this talk will touch on a few strategies for overcoming these challenges by concentrating on how multi-omic data are structured, prior to building a final predictive model.

Timothy Haithcoat | Constructing a Geospatially Enabled Health Context Cube
This presentation will describe the design and technical issues associated with the development of the Geospatial Health Context Cube. It will briefly outline the evolution of the idea for the context cube as well as its components. Design issues addressed include the development of the integrated spatial representation approach utilized for the foundation of the cube, the various complexities surrounding data collection and processing into the cube’s format, and the geographies and primary keys assembled. Also addressed will be other design issues such as naming conventions, metadata, enabling broad adoption and use, and updates. Technical issues addressed include those involved with big data and big tables, interaction and extraction processes to capture ‘context’ and spatial ‘relationships’, as well as structured methods for indexing, retrieval, and query. Finally, the power of the context cube will be demonstrated through complex queries to extract relevant data and information applicable to public and community health investigations in areas such as Zika risk, prescription drug overdose, health equity, social determinants of health, and many others.

Dhundy Kiran Bastola | An Integrated Approach for Production of Bioactive Compounds of Potential Protective Effects Against Chronic Diseases
That food is intimately linked to optimal health is not a novel concept, yet limited computational resources are available to study herbs as functional foods. Powered by progressive research efforts to identify properties and potential applications of natural compounds in health and wellness, public interest in alternative medicine continues to grow. Additionally, obesity continues to be a global issue, primarily in the Western world, and it could be prevented or minimized by proper diet and physical activity. To address the need for an integrated system to gain insights into the molecular mechanisms of bioactive compounds in herbs, a computational approach has been developed using a wide variety of public databases. The phytochemicals present in the 24 herbs studied here are the source of many traditional and contemporary medicines, providing antimicrobial compounds, immune-boosting polyphenols, analgesics and pain relievers, mood-altering alkaloids, and chemotherapeutic cancer drugs. An ensemble approach for sustainable production of medicinal herbs will be discussed.

Student: Matt Spencer | Data-Driven Exploration of Autism Subgroups

Autism is a complex neurodevelopmental disorder with extensive phenotypic heterogeneity. The diversity in phenotype severity and variable presence of auxiliary phenotypes suggests a subgroup structure where many genetic factors contribute differently to the development of autism. Most previous attempts to identify and understand autism subgroups are hypothesis-driven, where researchers suggest a likely subgroup distinction and investigate traits specific to the proposed group. This limits potential for the discovery of unexpected autism subgroups. In this work, we utilize an exploratory subgroup discovery method to assess 251 criteria for defining autism subgroups based on 85 phenotype variables. Fifteen subgroups defined using this exploratory method showed a significant genetic contrast from the general autism cohort, marking these as likely candidates for meaningful autism subgroups. We explore what phenotype criteria can be added to these subgroups to further increase their genetic homogeneity. Autism subgroups defined using these highlighted phenotypes could lead to improvements in the precise prediction, diagnosis, and treatment of autism patients.

PANEL DISCUSSION

2:00 PM – 3:00 PM

DATA STANDARDIZATION AND INTEGRATION

MODERATOR: Devin Koestler, PhD

Christine Elsik, PhD | BovineMine: A Data Warehouse for the Bovine Genome Database

Efficient “omics” technologies constantly reveal novel insights into genome biology and function. To leverage the knowledge gained from genome sequencing, scientists must gather information about specific genomic elements and integrate that information with their own research data. BovineMine, the data mining resource of the Bovine Genome Database, accelerates genomics analysis by enabling researchers to create and export customized datasets for use in downstream analyses. BovineMine uses the InterMine data warehousing platform to integrate heterogeneous data sets and allow fine-grained querying using simple and sophisticated data mining tools. Built-in query templates provide starting points for data exploration, while a QueryBuilder tool supports construction of complex queries. List Analysis and Genomic Regions search tools execute queries based on uploaded lists of identifiers and genome coordinates, respectively. BovineMine also supports meta-analyses by tracking identifiers across different genome assembly and annotation releases. Future plans for BovineMine include the incorporation of functional annotation datasets being generated by the international Functional Annotation of Animal Genomes (FAANG) consortium.

Juan Cui, PhD | Multi-omics Integration Towards Understanding Non-coding RNA Regulation in Human Disease

Rapid accumulation of large and complex omics data has presented researchers with unprecedented opportunities and challenges in studying human disease from a more holistic, systems-level view. In this talk, I will focus on data integration in biomedical research in the context of leveraging multi-omics data for the mechanistic study of non-coding RNA-mediated gene regulation in human cancer and for elucidating cell-cell communication via exosome-mediated RNA transfer. In particular, we investigate the dynamic and conditional properties of the microRNA-gene regulation network and its association with tumor progression through statistical modeling along with integration of genomics, transcriptomics, and epigenetics data from multiple human cancers. Various data analytics, modeling, and data mining techniques will be discussed.

Shrieraam Sathyanarayanan | Performing Retrospective Phase IV Drug Trials Using EMR Data: Opportunities and Challenges
With the shift to increasing use of EHR systems in recent years, a huge amount of clinical data has become available for extensive research opportunities. To detect and evaluate long-term effects, phase IV trials are conducted after a new drug is released. Retrospective EMR-based drug studies can capture effects across an extremely large population. They are also more cost- and time-efficient than prospective and longitudinal research. Using the medication data available in Cerner Health Facts®, we conducted a retrospective phase IV drug trial to analyze the efficacy of drugs in treating patients with atrial fibrillation. Special care is needed when secondary data are used for any clinical research. Handling drug-related data becomes even more challenging when taking into account factors such as the initial cohort selection, the large amount of missing information, and data merging issues. Various cleaning techniques need to be performed during data extraction, during preprocessing, and before modeling. This talk will showcase a drug case study along with several different data cleaning strategies we used to handle Cerner’s medication data.

Student: Lisa Neums | VaDiR: An Integrated Approach to Variant Detection in RNA

Whole genome sequencing and whole exome sequencing can be used to detect mutations; however, the high costs of both techniques stand as a barrier to their general use in research. Additional limitations of these techniques include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. In this research, we investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method we developed called “VaDiR: Variant Detection in RNA”. VaDiR integrates three variant callers, namely SNPiR, RVBoost, and MuTect2, which are modified so that they take RNAseq as input (MuTect2) and call somatic mutations in addition to germline mutations (SNPiR and RVBoost). For each patient, one DNAseq germline variant call is needed for filtering out non-somatic variants. The output gives information about each variant along with the caller that called it. For the evaluation of VaDiR, we used data collected on ovarian serous cystadenocarcinoma patients from TCGA (n = 21). The combination of all three methods produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA tumor level. We also found that the integration of those variants with variants called by MuTect2 and SNPiR alone produced the highest recall with acceptable precision. VaDiR provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis without the use of DNAseq of tumor tissue. VaDiR represents a cost-efficient approach, especially when repeated measurements are collected on the same patient. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing data sets.
A publicly available implementation of our methodology can be found at http://dx.doi.org/10.5524/100360
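
Editor's illustration (not part of the VaDiR software): the precision/recall trade-off described in the abstract can be sketched with plain set operations. The snippet below is a minimal sketch that assumes each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples; the function names, the `min_callers` parameter, and the toy variants are all hypothetical and are not taken from the VaDiR implementation.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus(calls: Dict[str, Set[Variant]], min_callers: int) -> Dict[Variant, Set[str]]:
    """Keep variants reported by at least `min_callers` callers.

    Returns a mapping from each kept variant to the callers that reported it.
    """
    support: Dict[Variant, Set[str]] = {}
    for caller, variants in calls.items():
        for variant in variants:
            support.setdefault(variant, set()).add(caller)
    return {v: c for v, c in support.items() if len(c) >= min_callers}


def precision_recall(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision and recall of a predicted variant set against a validated set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy, made-up calls purely for illustration.
a, b, c, d = ("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("3", 75, "T", "C")
toy_calls = {"SNPiR": {a, b}, "RVBoost": {a, c}, "MuTect2": {a, b, d}}
toy_truth = {a, b}  # stand-in for variants validated at the DNA level

strict = consensus(toy_calls, min_callers=3)   # all three callers agree
lenient = consensus(toy_calls, min_callers=1)  # any caller suffices
strict_pr = precision_recall(set(strict), toy_truth)
lenient_pr = precision_recall(set(lenient), toy_truth)
```

On the toy data, requiring agreement of all three callers keeps fewer variants with higher precision, while accepting any single caller keeps more variants with higher recall, which mirrors the qualitative pattern the abstract reports.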

PANEL DISCUSSION

3:00 PM – 3:30 PM

BREAK

3:30 PM – 4:30 PM

DATA VISUALIZATION

MODERATOR: Shui Ye, MD, PhD

Sherwin Chan, MD, PhD | Futuristic Surgical Planning: A Role for Augmented Reality Models in Complex Congenital Heart Disease
Planning for complex congenital heart disease surgeries can be difficult with traditional two-dimensional imaging. Recently, cross-sectional imaging that is acquired in 2D has been translated to 3D volumes. 3D printing has helped to further revolutionize the field by giving surgeons and trainees alike the ability to hold and manipulate a 3D structure to help them better understand the relationships in three dimensions. However, 3D prints of multiple cardiovascular structures can be complex and take significant time and resources. We describe the use of the Microsoft HoloLens 3D mixed reality device to augment surgical planning for pediatric patients with complex congenital heart disease. Advantages of the HoloLens over 3D printing are: (1) any structure can be removed from and replaced in the 3D model with a gesture or click; (2) the surgeon can step inside the heart and vessels to gain an intracardiac or intraluminal viewpoint; (3) time and cost efficiency; and (4) easier digital accessibility of the virtual model.

Bimal Balakrishnan, PhD | Human Performance in Healthcare Environments: Integrating VR and Motion Capture for Design Evaluation and Training
Research at the Immersive Visualization Lab (iLab) explores the potential of virtual reality and motion capture technology to design complex spaces as well as evaluate human performance in those spaces. Given the complex affordances of medical equipment and the adaptability needed for a variety of use cases, design and evaluation of healthcare spaces such as operating rooms are challenging. Over the years, we have explored new approaches to prototype and test both experiential and performance aspects of built environments using virtual reality. In recent years, we have extended this work to develop innovative methods to train medical professionals and take a data-driven approach to benchmark performance. This presentation will provide an overview of these projects, which combine various research methodologies adopted from human factors, human-computer interaction, and media psychology with creative practices drawn from architectural design.

K. Palaniappan | High-Throughput Image Analytics and Visualization for Neuroscience and Neurobiology

The zebrafish animal model is widely used to study development of tissues and organs including the brain and behavior. The branchiomotor neurons are part of the brain circuitry that controls jaw movements in vertebrates involved in aerodigestive (breathing and swallowing) kinematics. We compare different perturbations to the branchiomotor neurons using video analytics by high-throughput automated measurement of gape phenotype behavior. A high-throughput image analytics system was developed to extract deformable motion features using video microscopy, enabling automated measurement of jaw movement, termed gape or mouth opening. The approach combines multiple local deformable motion estimation models into a likelihood jaw movement trace curve with peak detection to identify gape activity. Automated activity phenotype analytics and neuronal mapping or connectomics open the door for a range of high-throughput studies in neuroscience and neurobiology. Several additional examples of behavior analysis and structural mapping to understand neurological disorders and neurophenotyping will be discussed.
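
Editor's illustration (not the speaker's system): the idea of fusing several motion estimates into one trace and then detecting peaks can be sketched as follows. This is a minimal sketch that assumes each motion model yields a per-frame score, that the scores are fused by simple averaging, and that a gape event is a local maximum above a threshold; the averaging rule, the threshold value, and all names are hypothetical.

```python
from typing import List


def fuse_traces(traces: List[List[float]]) -> List[float]:
    """Average several per-frame traces into a single trace (assumed fusion rule)."""
    n_models = len(traces)
    return [sum(frame_values) / n_models for frame_values in zip(*traces)]


def detect_peaks(trace: List[float], threshold: float) -> List[int]:
    """Return frame indices that are strict local maxima above `threshold`."""
    peaks = []
    for i in range(1, len(trace) - 1):
        if trace[i] > threshold and trace[i] > trace[i - 1] and trace[i] > trace[i + 1]:
            peaks.append(i)
    return peaks


# Toy, made-up per-frame scores from two hypothetical motion models.
model_traces = [
    [0.0, 0.2, 0.9, 0.1, 0.0, 0.8, 0.1],
    [0.0, 0.4, 0.7, 0.3, 0.0, 0.6, 0.1],
]
fused = fuse_traces(model_traces)
gape_frames = detect_peaks(fused, threshold=0.5)
```

In practice a smoothing step and a minimum spacing between peaks would likely be needed to avoid counting noise as separate events; both are omitted here to keep the sketch short.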

Student: Yuexu Jiang | A Knowledge-Based Framework for Pathway Analysis and Hypothesis Generation

As large amounts of multi-omics data continue to be generated rapidly, a major focus of bioinformatics research has been on integrating these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, it is often hard to interpret or make use of the results for in silico hypothesis generation and testing. To address this gap, we have developed the IMPRes algorithm, a step-wise and innovative active pathway detection method using a dynamic programming approach. We take advantage of the existing pathway interaction knowledge in KEGG. Multiple omics data are then used to assign penalties to genes, interactions, and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programming enables detection one step at a time, it is easy for researchers to trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. The evaluation experiments conducted on two yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study on a human lung cancer dataset was performed, and we provided several insights on genes and mechanisms involved in lung cancer that have not been discovered yet. The IMPRes algorithm and visualization tool are available via a web server at http://digbio.missouri.edu/impres.
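
Editor's illustration (not the IMPRes code): the general idea of a penalty-weighted shortest path search from seed genes can be sketched as follows. This is a minimal sketch that uses Dijkstra's algorithm as a stand-in for the authors' dynamic programming procedure and assumes that the cost of a path is the sum of the interaction (edge) penalties and the gene (node) penalties along it. The graph, the penalty values, and all function names are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # gene -> [(downstream gene, interaction penalty)]


def cheapest_paths(graph: Graph, gene_penalty: Dict[str, float], seeds: List[str]):
    """Dijkstra search from one or more seed genes over a penalty-weighted graph.

    Returns (cost, predecessor) maps; lower cost means better support.
    """
    cost = {s: gene_penalty.get(s, 0.0) for s in seeds}
    prev: Dict[str, Optional[str]] = {s: None for s in seeds}
    heap = [(c, s) for s, c in cost.items()]
    heapq.heapify(heap)
    while heap:
        c, gene = heapq.heappop(heap)
        if c > cost[gene]:
            continue  # stale queue entry
        for downstream, edge_pen in graph.get(gene, []):
            new_cost = c + edge_pen + gene_penalty.get(downstream, 0.0)
            if new_cost < cost.get(downstream, float("inf")):
                cost[downstream] = new_cost
                prev[downstream] = gene
                heapq.heappush(heap, (new_cost, downstream))
    return cost, prev


def trace(prev: Dict[str, Optional[str]], target: str) -> List[str]:
    """Walk predecessors back from `target` to its seed gene."""
    path = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


# Toy, made-up network: gene C carries a high penalty (e.g. weak expression evidence).
toy_graph: Graph = {"A": [("B", 1.0), ("C", 0.5)], "B": [("D", 1.0)], "C": [("D", 0.5)]}
toy_gene_penalty = {"A": 0.0, "B": 0.25, "C": 2.0, "D": 0.25}
costs, preds = cheapest_paths(toy_graph, toy_gene_penalty, seeds=["A"])
best_path_to_d = trace(preds, "D")
```

In the toy network the route through B is chosen even though its interactions cost more, because gene C is heavily penalized; this is the sense in which node and edge penalties jointly steer the search. Because each step only extends the cheapest known partial path, the resulting pathway can be traced back one gene at a time.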

PANEL DISCUSSION

4:30 PM – 5:30 PM

SYSTEMS STRATEGIES

MODERATOR: Mark Hoffman, PhD

Ashiq Masood, MD | Precision Oncology – Time is Right
Tailoring each patient’s treatment based on their tumor genomic profile is the crux of precision oncology. One of the roadblocks to delivering effective precision care is understanding the complexity of genomics data reported back to the physicians. Most, if not all, clinical oncology fellowship programs have not incorporated cancer genomics and bioinformatics into their teaching curriculum. Surveys have also shown oncologists are unsure how to interpret cancer genomics data. Multidisciplinary molecular tumor boards have the potential to address this disparity. However, very few centers have the expertise to provide this service. At the Saint Luke’s Center for Precision Oncology, we have established a team of medical oncologists, surgeons, computational biologists, radiation oncologists, researchers, geneticists, and molecular pathologists. This team has expertise in clinical oncology, tumor genomics, and computational biology, which allows them to analyze the genomics data and provide individualized treatment recommendations for patients with advanced cancers. We will describe our precision oncology program and provide real-world examples of how it helps us understand the disease process and provide the right treatment to the patient.

Denise Baker, PhD | Is My Fitbit Stealing My Free Will?
If your Fitbit includes a heart rate monitor, the answer could be yes. Self-quantification technologies, often referred to as “wearables,” are marketed as a means to monitor and improve health, modify behavior, and reduce medical costs. Surprisingly, the direct impact these technologies have on user decision making, attitude formation, and behavior has not been well researched. I will discuss current research that suggests that reminding users about internal states of the body – such as reading one’s heart rate on a wearable – reduces beliefs in free will (BFW) by acting as a reminder of the biological nature of the body. This is an important discovery considering that reduced BFW can have numerous negative impacts on individual behaviors, such as increased antisocial behavior, increased conformity, and decreased gratefulness. How can the design of wearable technologies harness the promised benefits while avoiding some of these potential risks?

Jerry Parker, PhD | Research Data Networks: New Opportunities for Collaboration

Over the past 15 years, electronic health records (EHRs) have become nearly universal in healthcare settings, and they have generated massive amounts of health-related data. Individually, EHRs are extremely valuable, but they emerge as even more powerful tools when data can be shared across institutions for research purposes. Effective data sharing for research collaborations requires advanced informatics technology, data use agreements, and reliance IRB agreements; data sharing networks also benefit from active patient-centered engagement. In their mature form, data sharing networks greatly expand research opportunities, promote inter-institutional collaboration, and facilitate novel methodologies, including pragmatic clinical trials.

PANEL DISCUSSION

5:30 PM – 8:00 PM

COCKTAILS, NETWORKING, & DINNER KEYNOTE SPEAKER | Bond Life Sciences Center

Walter Gassman, PhD | Moderator

Blake Meyers, PhD | From Small RNAs to Big Data: a Transformation Driven by Bioinformatics
The research in my lab is focused on plant small RNAs, and large-scale approaches, namely sequencing, have been a mainstay of our work for at least 15 years. In the last decade, we’ve worked in microRNA identification in a wide variety of species, the biogenesis of microRNAs and siRNAs, and the evolution of small RNAs. Most recently, we’ve been investigating the biogenesis, roles, and evolutionary diversification of pathways that give rise to “secondary” small interfering RNAs (siRNAs). Interestingly, the functions for these small RNAs in both plants and animals remain poorly characterized. From the earliest days of my lab’s work in this field, bioinformatics has been a key part of our work. In order to support analyses of the biological roles of small RNAs, their evolution, and their biogenesis, my lab has developed databases, analysis tools (stand-alone and web-based apps), and customized visualization tools. I will describe our ongoing analyses of plant small RNAs, the implications and applications of this work, and the computational methods that we are applying to their analysis – both large- and small-scale.

THURSDAY, APRIL 12, 2018 | Monsanto Auditorium

7:15 AM – 7:45 AM

BREAKFAST

7:45 AM – 8:00 AM

WELCOME & INTRODUCTIONS

Keith Gary, PhD | Thank Sponsors and Volunteers

Mark Hoffman, PhD | Welcome

8:00 AM – 9:00 AM

KEYNOTE SPEAKER

Marylyn D. Ritchie, PhD | Machine Learning Strategies in the Genome and the Phenome – Toward a Better Understanding of Complex Traits

Modern technology has enabled massive data generation; however, tools and software to work with these data in effective ways are limited. Genome science, in particular, has advanced at a tremendous pace during recent years with dramatic innovations in molecular data generation technology, data collection, and a paradigm shift from single lab science to large, collaborative network/consortia science. Still, the techniques to analyze these data to extract maximal information have not kept pace. Comprehensive collections of phenotypic data can be used in more integrated ways to better subset or stratify patients based on the totality of their health information. Similarly, the availability of multi-omics data continues to increase. Given the complexity of the networks of biological systems, it is unlikely that every patient with a given disease has exactly the same underlying genetic architecture. Success in understanding the architecture of complex traits will require a multi-pronged approach. By applying machine learning to the rich phenotypic data of the EHR, these data can be mined to identify new and interesting patterns of disease expression and relationships. Machine learning strategies can also be used for meta-dimensional analysis of multiple omics datasets. We have been exploring machine learning technologies for evaluating both the phenomic and genomic landscape to improve our understanding of complex traits. These techniques show great promise for the future of precision medicine.

9:00 AM – 10:00 AM

DATA ANALYSIS

MODERATOR: Trupti Joshi, MBBS, ADB, MS, PhD

Warren A. Cheung, PhD | Scaling Single Cell Genomics for Routine Comprehensive Genomic Profiling of Disease
Single-cell sequencing technologies such as the 10X Genomics Chromium platform provide the unique opportunity to investigate the myriad of gene expression patterns of different cellular lineages in heterogeneous mixture tissues such as blood. We are developing an integrated, cost-effective pipeline to optimize the throughput of this technology. By simultaneously profiling multiple individuals and multiple cell types, we maximize the potential of each single-cell sequencing run. Leveraging the power of multiplexing multiple individuals, we use their unique genotypes to provide an innate quality control metric of the single cell genomics sequencing performance. We examine the interaction of the number of individuals we can profile simultaneously, the optimum number of cells as starting material as well as the ability to quantitatively resolve specific cell subpopulations. The overarching goal is to deliver a solution scalable to routine single cell genomics from clinical specimens, in conjunction with genome sequencing, for comprehensive genomic profiling of disease.

Jianlin Jack Cheng, PhD | 3D Genome Structure Modeling
3D genome structure is known to play an important role in various biological functions such as gene regulation, genome methylation, DNA repair, DNA replication, cell cycle, and cell development, but has not been studied as well as 1D genome sequence. The development of chromosomal conformation capturing techniques has made it possible to computationally model the 3D structure of the genome, which has led to the creation of a new booming area of bioinformatics – 3D genomics. In this presentation, I will present an overview of several of our methods for reliably reconstructing 3D genome structures from chromosomal conformation capturing data, which were developed during the last several years. The 3D genome modeling reveals or confirms some structural features of human chromosomes and the genome, such as two-compartment partition, chromosomal loops, chromosomal territories, and center-periphery chromosome organization. Moreover, 3D genome modeling provides a unique 3D perspective for visualizing the 3D shape of the genome and integrating multiple sources of biological data such as gene expression and gene function data.

Julie Banderas, PharmD | The NERD (Neuroscience Education in Research Datasets) Project

The push for expanded opportunities for medical student research training and experience comes from many levels. Some of the challenges in meeting these demands include “overloaded” medical school curricula, the limited number of experienced research mentors and projects, and the financial and time costs associated with training students for relatively short-term research experiences. The NERD Project was implemented in the Medical Neuroscience course at UMKC School of Medicine to address these challenges. Ten neuroscience-focused datasets were created from the de-identified, HIPAA-compliant Cerner Health Facts dataset. About 100 third-year medical students worked in groups of 3 to 4, received the clinical datasets, and consulted with an assigned clinical faculty mentor and biostatistician. Infusing this research experience into an existing course made it possible to address new research and informatics learning outcomes in the curriculum without displacing others. Leveraging Health Facts EMR data made it possible to provide a foundational and practice-based research learning experience for all students early in their curriculum. Students who want to pursue additional research opportunities will be better prepared to work with research mentors.

Sharlee Climer, PhD | Identifying Combinations of Genetic Markers Associated with Complex Diseases

Most complex diseases of interest arise due to environmental factors coupled with multiple genetic factors. Traditional genome-wide association studies examine one marker at a time and have thus far revealed only a small portion of the expected heritability for many of these diseases, despite the vast quantities of data that have been collected. My research focuses on the development of computational tools to identify combinations of markers that are associated with complex traits and this presentation will highlight algorithms based on strategies borrowed from Artificial Intelligence and Operations Research.

PANEL DISCUSSION

10:00 AM – 10:30 AM

BREAK

10:30 AM – 11:30 AM

IMAGE ANALYSIS

MODERATOR: Noah Fahlgren, PhD

Zhu Li, PhD | Rate Agnostic Visual Content Re-Identification
One of the key applications in image analysis is to re-identify visual content segments against a very large repository, with applications in transmission de-duplication in networks and redundancy removal from network caches and online repositories. In this work, we utilize feature maps generated by a deep learning pipeline, coupled with a novel Fisher Vector aggregation and hash design scheme, to produce a robust content representation and indexing solution that supports very fast and accurate content re-identification against large repositories.

Plamen Doynov, PhD | Image Processing for Skin Cancer Monitoring and Evaluation

Skin cancer is a serious public health problem because of its increasing incidence. It is often a tragedy due to the related mortality. Early detection and diagnosis are essential for patient survival. Several non-invasive imaging techniques have been developed to aid the screening process. Currently, there is great interest in the application of automatic image processing and analysis for monitoring, early detection, and quantitative clinical relevance evaluation. In this presentation, modern methods for segmentation and feature extraction from digital images of melanoma skin lesions are reviewed. A novel algorithm for measuring border irregularity and its changes is demonstrated. The use of artificial intelligence and machine learning based on neural networks for assisted diagnosis is presented.

Jon Marsh, PhD | Glomeruli Detection and Classification in Whole Slide Images Using Deep Neural Networks

Rapid and accurate evaluation of donor kidney biopsies is essential for optimal graft selection. Given the limited supply of deceased donor organs, it is critical to accurately assess organ viability prior to transplantation to minimize organ discard and to appropriately allocate organs to acceptable recipients. The criteria for rejecting these specimens rely heavily on selected key histologic features. For donor kidney biopsies, percent global glomerulosclerosis is a critical feature that correlates with outcome. However, observer variability in biopsy evaluation is distressingly large, leading to unnecessarily high rates of organ discard. Such variability may be heightened in the time-sensitive context of daily practice, when logistical constraints necessitate the use of suboptimal frozen H&E sections which are often read by nonspecialist pathologists at odd hours. There is thus a need for new objective techniques to assist pathologists with biopsy interpretation. Convolutional neural networks (CNNs) have been increasingly applied in various areas of medicine in recent years, including detection of tumor cells on histologic sections. Here, we sought to evaluate the performance of a CNN derived from a pre-trained network applied to the intractable problem of detection and classification of sclerotic and non-sclerotic glomeruli in frozen wedge donor biopsy whole slide images to improve accuracy and efficiency of pre-transplant donor kidney evaluation.

Student: Shivani Sivasankar | Application of Generalized Linear Mixed Model to Analyze National EHR Data to Evaluate Inappropriate HbA1c Orders for Sickle-Cell Disease Patients

Analysis of electronic health record (EHR) data, such as the HIPAA-compliant, de-identified data in the Cerner Health Facts™ (HF) data warehouse, is challenging because the data are valuable yet heterogeneous and complex. As it is not possible to fully control the circumstances under which the measurements are taken in the EHR, there may be considerable variation among patients in the number, timing, and environment of the observations. To address these challenges, we developed a generalized linear mixed model (GLMM) to account for the random effects inherent within this diverse dataset. This project describes the GLMM methodology and illustrates its value in evaluating the frequency of inappropriate HbA1c orders for sickle-cell disease patients at Truman Medical Centers (TMC) compared to 393 other hospitals nationwide. Data collected from 2010 to 2016 were included in the analysis. The percentage of sickle-cell patients with an HbA1c encounter is the main outcome variable. The GLIMMIX procedure in SAS was used with the logit link function for the proportion outcome. The proposed GLMM offered some advantages. First, the predicted probability score calculated for each hospital accounts for the random effects and removes the bias due to covariates such as hospital-level demographic information and the number of sickle-cell disease patients. Second, the GLMM was able to model the longitudinal structure of the data. Ranking the conditional probabilities of all the hospitals shows that TMC ranks in the bottom quartile when compared to the other national hospitals with respect to inappropriate HbA1c orders. These findings indicate that inappropriate HbA1c orders for sickle-cell patients are a potential quality concern at TMC. The analytic pipeline developed here using the GLMM can be easily adapted to different outcome measures and to different research questions drawing on the Health Facts data source.

PANEL DISCUSSION

11:30 AM–12:30 PM

DATA APPLICATION

MODERATOR: Donna Buchanan, PhD

Susan Abdel-Rahman, PharmD | Pediatric Precision Therapeutics
The premise underlying precision therapeutics is that not everyone given the same medicine at the same dose will have the same response. The promise of precision therapeutics is that we can leverage a patient’s specific information including biology, pathology, genetics, environment, and lifestyle to accurately treat their disease. The challenge, however, is that the knowledge (when available) and the skills to implement this knowledge differ from clinician to clinician. One solution that can assist in overcoming this challenge is clinical decision support: tools designed with, and for use by, the clinician to assist with compilation and computation, translating data into solutions.

Randi Foraker, PhD, MA, FAHA | Shared Decision-Making and the EHR
Extant electronic health record (EHR)-based clinical decision support (CDS) tools target individual risk factors, treatments, or diseases. None integrate multiple, complementary health behaviors and factors at the point of care to enable shared decision-making. We developed and tested an EHR-based cardiovascular health CDS tool in the primary care setting, which allowed patients and providers to quickly assess and efficiently address a variety of health behaviors and factors. Subsequent to the successful implementation of the tool and its positive effect on cardiovascular health outcomes, we adapted the tool to be used in cancer survivorship settings. In the refined CDS tool, cardiovascular risk factor data are presented alongside relevant information about potentially cardiotoxic cancer treatments. This talk will focus on the potential for interactive, workflow-aware CDS tools to enhance shared decision-making between patients and providers.

Blake Meyers, PhD | Understanding Reproductive Small RNAs in Grasses for the Improved Production of Hybrid Seeds
Plant secondary siRNAs function in a wide variety of pathways, including developmental control, defense gene regulation, and reproductive biology, particularly pollen development. In 2015, with the Walbot lab (Stanford), we described the temporal and spatial distribution of two sets of these “phasiRNAs”; we showed that they are extraordinarily enriched in the male germline of the grasses and are dependent on distinct cell layers. These phasiRNAs comprise the 21-nt (pre-meiotic) and 24-nt (meiotic) siRNAs. These phased siRNAs show striking similarity to mammalian “piRNAs” in terms of their abundance, distribution, distinct stage, and timing of accumulation, but have independent evolutionary origins. In the last two years, we have demonstrated that these reproductive phasiRNAs are required for full male fertility in plants. Perturbation of these phasiRNAs can alter pollen development, yielding phenotypes that are useful for the production of hybrid plants.

Praveen Rao, PhD | Blockchain and Potential Applications in Digital Pathology
Blockchain is a disruptive technology for managing decentralized transactions over data in a secure way. It is essentially a distributed ledger that combines techniques from peer-to-peer (P2P) systems and cryptography. In this talk, I will first provide a brief introduction to blockchain technology. Then I will discuss some potential applications in digital pathology, specifically with respect to whole slide imaging and image analytics using deep learning.
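
The cryptographic side of the ledger idea mentioned above can be illustrated with a minimal hash-chain sketch. It is not drawn from the talk: the function names, the block fields, and the slide metadata are hypothetical, and the peer-to-peer replication and consensus that a real blockchain requires are deliberately left out.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash a block's contents deterministically with SHA-256."""
    record = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["payload"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical example: record events about a whole slide image.
ledger = []
append_block(ledger, {
    "slide_id": "slide-001",
    "image_sha256": hashlib.sha256(b"pixels").hexdigest(),
})
append_block(ledger, {"slide_id": "slide-001", "event": "annotation added"})
print(is_valid(ledger))  # chain is intact

# Tampering with an earlier record breaks validation.
ledger[0]["payload"]["slide_id"] = "slide-999"
print(is_valid(ledger))
```

Because each block stores the hash of its predecessor, changing any earlier record invalidates the stored hashes, which is what makes such a ledger tamper-evident.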

LUNCH SESSION: REGIONAL COLLABORATION PANEL DISCUSSION

Regional institutions provide great informatics and data science talent and offer resources for research and training. To learn more about the unique resources at these institutions, the 2018 Midwest Bioinformatics Conference features a panel devoted to “Leveraging Regional Collaboration with 1+1>3.”

The panel will describe the training programs and research initiatives at each institution and share ideas and experiences from existing regional collaborations in informatics. The panel's outcomes will be captured in a written summary of potential action plans for pursuing large-scale regional informatics training programs and research center of excellence grants through NSF/NIH/DoE.

12:30 PM–2:00 PM

MODERATOR: Chi-Ren Shyu, PhD

Mark Hoffman, PhD

Keith Gary, PhD

Jeffrey Thompson, PhD

Michael Goldwasser, PhD

Shui Qing Ye, MD, PhD

PANEL DISCUSSION

2018 Midwest Bioinformatics Conference Agenda

WEDNESDAY, APRIL 11, 2018

10:00 AM–10:30 AM

WELCOME AND INTRODUCTIONS

10:30 AM – 11:30 AM

KEYNOTE SPEAKER

11:30 AM–1:00 PM

LUNCH & POSTER PRESENTATION | BOND LIFE SCIENCES CENTER

1:00 PM – 2:00 PM

DATA STRUCTURE

2:00 PM – 3:00 PM

DATA STANDARDIZATION AND INTEGRATION

3:00 PM – 3:30 PM

BREAK

3:30 PM – 4:30 PM

DATA VISUALIZATION

4:30 PM – 5:30 PM

SYSTEMS STRATEGIES

5:30 PM – 8:00 PM

COCKTAILS, NETWORKING, & DINNER KEYNOTE SPEAKER | Bond Life Sciences Center

THURSDAY, APRIL 12, 2018 | Monsanto Auditorium

7:15 AM–7:45 AM

BREAKFAST

7:45 AM–8:00 AM

WELCOME & INTRODUCTIONS

8:00 AM – 9:00 AM

KEYNOTE SPEAKER

9:00 AM – 10:00 AM

DATA ANALYSIS

10:00 AM – 10:30 AM

BREAK

10:30 AM–11:30 AM

IMAGE ANALYSIS

11:30 AM–12:30 PM

DATA APPLICATION

LUNCH SESSION: REGIONAL COLLABORATION PANEL DISCUSSION

12:30 PM–2:00 PM

2:00 PM – 3:30 PM

MOCK INTERVIEWS