Bayesian network learning and applications in bioinformatics
X Lin - 2012 - kuscholarworks.ku.edu
Abstract
A Bayesian network (BN) is a compact graphical representation of the probabilistic relationships among a set of random variables. The advantages of the BN formalism include its rigorous mathematical basis, its locality in both knowledge representation and inference, and its natural handling of uncertainty. Over the past decades, BNs have attracted increasing interest in many areas, including bioinformatics, which studies mathematical and computational approaches to understanding biological processes.

In this thesis, I develop new methods for BN structure learning with applications to biological network reconstruction and assessment. The first application is to reconstruct the genetic regulatory network (GRN), where each gene is modeled as a node and an edge indicates a regulatory relationship between two genes. In this task, we are given time-series microarray gene expression measurements for tens of thousands of genes, which can be modeled as true gene expression mixed with noise from data generation, variability of the underlying biological systems, and other sources. We develop a novel BN structure learning algorithm for reconstructing GRNs.

The second application is to develop a BN method for protein-protein interaction (PPI) assessment. PPIs are the foundation of most biological mechanisms, and knowledge of PPIs provides one of the most valuable resources from which annotations of genes and proteins can be discovered. Recently developed high-throughput technologies have been applied to reveal protein interactions in many organisms. However, high-throughput interaction data often contain a large number of spurious interactions. In this thesis, I develop a novel in silico model for PPI assessment. Our model is based on a BN that integrates heterogeneous data sources from different organisms.

The main contributions are:

1. A new concept to depict the dynamic dependence relationships among random variables, which are widespread in biological processes, such as the relationships among genes and gene products in regulatory networks and signaling pathways. This concept leads to a novel algorithm for dynamic Bayesian network learning. We apply it to time-series microarray gene expression data and discover some missing links in a well-known regulatory pathway. These new causal relationships between genes are supported by evidence in the literature.

2. Discovery and theoretical proof of an asymptotic property of the K2 algorithm (a well-known, efficient BN structure learning approach). This property is used to identify Markov blankets (MBs) in a Bayesian network and, from them, to recover the BN structure. The resulting hybrid algorithm is evaluated on a benchmark regulatory pathway and obtains better results than some state-of-the-art Bayesian learning approaches.

3. A Bayesian network based integrative method that incorporates heterogeneous data sources from different organisms to predict protein-protein interactions (PPIs) in a target organism. The framework is employed in human PPI prediction and in the assessment of high-throughput PPI data. Furthermore, our experiments reveal some interesting biological results.

4. Learning of a TAN (Tree Augmented Naïve Bayes) based network, which brings computational simplicity and robustness to high-throughput PPI assessment. The empirical results show that our method outperforms naïve Bayes and a manually constructed Bayesian network, and additionally demonstrate that sufficient information from model organisms can yield high accuracy in PPI prediction.
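
As background for the second contribution, the following is a minimal sketch of the standard K2 procedure (greedy parent selection under a fixed node ordering, scored with the Cooper-Herskovits metric). It is not the hybrid Markov-blanket algorithm of the thesis, whose details the abstract does not give. The function names `k2_log_score` and `k2_search`, the `max_parents` limit, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log Cooper-Herskovits (K2) score of `child` given a parent set.

    data:    list of tuples of discrete values, one tuple per sample
    child:   column index of the node being scored
    parents: list of column indices used as parents
    arity:   arity[i] is the number of states of column i
    """
    r = arity[child]
    # N_ijk: samples with parent configuration j and child state k
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: samples with parent configuration j
    totals = Counter()
    for (config, _), n in counts.items():
        totals[config] += n
    # log[(r-1)! / (N_ij + r - 1)!] per configuration, plus log(N_ijk!) per cell;
    # unobserved configurations and empty cells contribute zero.
    score = sum(lgamma(r) - lgamma(n_ij + r) for n_ij in totals.values())
    score += sum(lgamma(n_ijk + 1) for n_ijk in counts.values())
    return score


def k2_search(data, order, arity, max_parents=2):
    """Greedy K2 search: each node may only take parents that precede it in `order`."""
    parents = {node: [] for node in order}
    for idx, node in enumerate(order):
        best = k2_log_score(data, node, parents[node], arity)
        candidates = set(order[:idx])
        while candidates and len(parents[node]) < max_parents:
            scored = [
                (k2_log_score(data, node, parents[node] + [c], arity), c)
                for c in sorted(candidates)
            ]
            top_score, top = max(scored)
            if top_score <= best:
                break  # no single added parent improves the score
            parents[node].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Toy data (assumed): column 1 copies column 0, column 2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5
print(k2_search(data, order=[0, 1, 2], arity=[2, 2, 2]))
```

On this toy data the search should attach column 0 as the only parent of column 1 and leave column 2 without parents. The result depends on the supplied ordering, which is the usual limitation of K2.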