This is the implementation of the following paper:
Arnab Bhattacharyya, Davin Choo, Rishikesh Gajjala, Sutanu Gayen, and Yuhao Wang. "Learning Sparse Fixed-Structure Gaussian Bayesian Networks." arXiv preprint arXiv:2107.10450 (2021).
Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has near-optimal sample complexity. We also study two new algorithms for the problem:
- BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity.
- CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity.
Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst, CauchyEstTree, and BatchAvgLeastSquares perform better.
- Python 3.6+
  - `networkx`
  - `argparse`
  - `itertools`
  - `numpy`
  - `scipy`
  - `sklearn`
  - `matplotlib`
  - `torch` : optional
- R 4.0.0
  - `rpy2` : Python interface that enables calling R from Python. Install `rpy2` first.
  - `bnlearn` : Bayesian network learning and inference
  - `glasso` : graphical lasso, estimation of Gaussian graphical models
  - `flare` : family of Lasso regressions
- `Data` - real Bayesian network data from bnlearn
- `data.py` - synthetic DAG data, including graph simulation and data simulation; also loads real bnlearn data
- `evl.py` - algorithm accuracy evaluation based on KL-divergence
- `config.py` - set parameters for the Bayesian network (e.g. node number, graph degree, sample size)
- `methods.py` - implementation of all algorithms
- `utils.R` - load bnlearn graphs; run the CLIME algorithm
- `main.py` - run some examples of our algorithms
| Parameter | Type | Description | Options |
|---|---|---|---|
| `n` | int | number of nodes | - |
| `s` | int | number of samples | - |
| `d` | int | average degree of a node | - |
| `ill` | int | number of ill-conditioned nodes | - |
| `batch` | int | batch size | - |
| `load` | str | load synthetic or real data | `syn`, `real` |
| `choice` | str | which real dataset to use | `ecoli70`, `magic-niab`, `magic-irri`, `arth150` |
| `tg` | str | type of graph | `chain`, `er`, `sf`, `rt` |
| `tn` | str | type of noise | `ev`, `uv`, `ca`, `ill`, `exp`, `gum` |
The simplest way to try out CauchyEst is to run a simple example:
$ git clone https://github.com/YohannaWANG/CauchyEst.git
$ cd CauchyEst/
$ python main.py
Alternatively, if you have a CSV data file X.csv
, you can install the package and run the algorithm as a command:
$ pip install git+https://github.com/YohannaWANG/CauchyEst
$ cd CauchyEst
$ python main.py --n 100 --d 5 --tg 'er' --tn 'uv'
Algorithm 1 states our two-phase recovery approach: we estimate the coefficients of the Bayesian network in the first phase and use them to recover the variances in the second.
Algorithm 2 is the variance recovery algorithm given coefficient estimates.
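As an illustration of this second phase, here is a minimal NumPy sketch (the function name and signature are illustrative, not the repository's `methods.py` API): given coefficient estimates for each node, the node's noise variance is estimated as the empirical second moment of its regression residual.

```python
import numpy as np

def recover_variances(X, parents, coeffs):
    """Second-phase sketch: given coefficient estimates coeffs[i] over
    node i's parents, estimate node i's noise variance as the empirical
    mean of the squared residual x_i - <coeffs[i], x_pa(i)>."""
    m, n = X.shape
    var = np.empty(n)
    for i in range(n):
        pa = parents.get(i, [])
        # Predicted value of node i from its parents (zero if it has none)
        pred = X[:, pa] @ np.asarray(coeffs[i]) if pa else np.zeros(m)
        var[i] = np.mean((X[:, i] - pred) ** 2)
    return var
```

With the true coefficients plugged in, the residual is exactly the node's noise term, so this converges to the true variance as the sample size grows.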
Algorithm 3 recovers the coefficients in a Bayesian network using a linear least squares estimator.
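The node-wise regression behind LeastSquares can be sketched as follows (a simplified illustration with hypothetical names, not the repository's implementation): each node's incoming coefficients are obtained by regressing the node's column on its parents' columns.

```python
import numpy as np

def least_squares_coeffs(X, parents):
    """Node-wise least squares sketch: for each node i, regress
    X[:, i] on the columns of its parents to estimate the edge
    coefficients into i.

    X: (m, n) sample matrix; parents: dict {node: list of parent indices}.
    Returns dict {node: coefficient vector over its parents}.
    """
    coeffs = {}
    for i, pa in parents.items():
        if not pa:
            coeffs[i] = np.array([])  # root node: no coefficients
            continue
        # Solve min_b || X[:, pa] @ b - X[:, i] ||_2
        b, *_ = np.linalg.lstsq(X[:, pa], X[:, i], rcond=None)
        coeffs[i] = b
    return coeffs
```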
Algorithm 4 (BatchAvgLeastSquares) is a generalization of the least squares algorithm.
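A minimal sketch of the batch-averaging idea (illustrative only, assuming the samples for one node are already assembled into a design matrix): the samples are split into batches, least squares is solved on each batch, and the per-batch solutions are averaged, which lets one interpolate between batch size and number of batches.

```python
import numpy as np

def batch_avg_least_squares(X_pa, y, num_batches):
    """BatchAvgLeastSquares sketch for one node: split the samples into
    num_batches groups, solve ordinary least squares on each group, and
    return the average of the per-batch solutions.
    num_batches = 1 recovers plain least squares."""
    batches = np.array_split(np.arange(len(y)), num_batches)
    sols = [np.linalg.lstsq(X_pa[b], y[b], rcond=None)[0] for b in batches]
    return np.mean(sols, axis=0)
```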
Algorithm 5 is a coefficient recovery algorithm based on Cauchy random variables (for a variable with p parents).
Algorithm 6 is a coefficient recovery algorithm based on Cauchy random variables (for general Bayesian networks).
Algorithm 7 (CauchyEstTree) works for the special case of polytree Bayesian networks.
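The Cauchy-based estimators above share one core step, which can be sketched as follows (an illustration under simplifying assumptions, not the repository's `methods.py` implementation): for a node with p parents, the samples are partitioned into batches of size p, the exact p×p linear system is solved in each batch, and the coordinate-wise median of the batch solutions is returned.

```python
import numpy as np

def cauchy_est_coeffs(X_pa, y):
    """Median-of-linear-systems sketch: partition the samples into
    batches of size p (p = number of parents), solve the exact p x p
    linear system in each batch, and take the coordinate-wise median.
    The median tames the heavy (Cauchy-like) tails of the individual
    batch solutions."""
    m, p = X_pa.shape
    sols = []
    for start in range(0, m - p + 1, p):
        A, b = X_pa[start:start + p], y[start:start + p]
        try:
            sols.append(np.linalg.solve(A, b))
        except np.linalg.LinAlgError:
            continue  # skip a (measure-zero) singular batch
    return np.median(sols, axis=0)
```

Each batch solution equals the true coefficient vector plus a heavy-tailed error term, so the median is a natural robust aggregator where the mean would fail.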
*(Figures omitted: results for 100 nodes, degree 5, ER graph; and for noisy data (5% noisy samples, 5/100 noisy nodes), d=5, ER graph.)*
If you use any part of this code in your research or any engineering project, please cite our paper:
@article{bhattacharyya2021learning,
title={Learning Sparse Fixed-Structure Gaussian Bayesian Networks},
author={Bhattacharyya, Arnab and Choo, Davin and Gajjala, Rishikesh and Gayen, Sutanu and Wang, Yuhao},
journal={arXiv preprint arXiv:2107.10450},
year={2021}
}
Please feel free to contact us if you run into any problems when using this code. We are glad to hear your advice and update our work.
We are also open to collaboration if you think you are working on a problem that we might be interested in.
Please do not hesitate to contact us!