This is the implementation of the following paper:
Arnab Bhattacharyya, Davin Choo, Rishikesh Gajjala, Sutanu Gayen, and Yuhao Wang. "Learning Sparse Fixed-Structure Gaussian Bayesian Networks." arXiv preprint arXiv:2107.10450 (2021).
Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has near-optimal sample complexity. We also study two new algorithms for the problem:
- BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity.
- CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity.
Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst, CauchyEstTree, and BatchAvgLeastSquares perform better.
- Python 3.6+
  - `networkx`
  - `argparse`
  - `itertools`
  - `numpy`
  - `scipy`
  - `sklearn`
  - `matplotlib`
  - `torch` : optional
- R 4.0.0
  - `rpy2` : Python interface that enables calling R from Python. Install `rpy2` first.
  - `bnlearn` : Bayesian network learning and inference
  - `glasso` : graphical lasso, estimation of Gaussian graphical models
  - `flare` : family of Lasso regressions
- `Data` - real Bayesian network data from bnlearn
- `data.py` - synthetic DAG data, including graph simulation and data simulation; also loads real bnlearn data
- `evl.py` - algorithm accuracy evaluation based on KL-divergence
- `config.py` - set parameters for the Bayesian network (e.g. node number, graph degree, sample size)
- `methods.py` - implementation of all algorithms
- `utils.R` - load bnlearn graphs; run the CLIME algorithm
- `main.py` - run some examples of our algorithms
| Parameter | Type | Description | Options |
|---|---|---|---|
| `n` | int | number of nodes | - |
| `s` | int | number of samples | - |
| `d` | int | average degree of a node | - |
| `ill` | int | number of ill-conditioned nodes | - |
| `batch` | int | batch size | - |
| `load` | str | load synthetic or real data | `syn`, `real` |
| `choice` | str | which real dataset to use | `ecoli70`, `magic-niab`, `magic-irri`, `arth150` |
| `tg` | str | type of graph | `chain`, `er`, `sf`, `rt` |
| `tn` | str | type of noise | `ev`, `uv`, `ca`, `ill`, `exp`, `gum` |
The simplest way to try out CauchyEst is to run a simple example:
$ git clone https://github.com/YohannaWANG/CauchyEst.git
$ cd CauchyEst/
$ python main.py
Alternatively, if you have a CSV data file X.csv
, you can install the package and run the algorithm as a command:
$ pip install git+https://github.com/YohannaWANG/CauchyEst
$ cd CauchyEst
$ python main.py --n 100 --d 5 --tg 'er' --tn 'uv'
Algorithm 1 states our two-phase recovery approach: we estimate the coefficients of the Bayesian network in the first phase and use them to recover the variances in the second.
Algorithm 2 is the variance recovery algorithm given coefficient estimates.
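As an illustration of this second phase, here is a minimal NumPy sketch (the function name and signature are illustrative, not the repository's `methods.py` API): given coefficient estimates for each node, the node's noise variance is estimated as the empirical second moment of its regression residual.

```python
import numpy as np

def recover_variances(X, parents, coeffs):
    """Second-phase sketch: given coefficient estimates coeffs[i] over
    node i's parents, estimate node i's noise variance as the empirical
    mean of the squared residual x_i - <coeffs[i], x_pa(i)>."""
    m, n = X.shape
    var = np.empty(n)
    for i in range(n):
        pa = parents.get(i, [])
        # Predicted value of node i from its parents (zero if it has none)
        pred = X[:, pa] @ np.asarray(coeffs[i]) if pa else np.zeros(m)
        var[i] = np.mean((X[:, i] - pred) ** 2)
    return var
```

With the true coefficients plugged in, the residual is exactly the node's noise term, so this converges to the true variance as the sample size grows.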
Algorithm 3 recovers the coefficients in a Bayesian network using a linear least squares estimator.
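The node-wise regression behind LeastSquares can be sketched as follows (a simplified illustration with hypothetical names, not the repository's implementation): each node's incoming coefficients are obtained by regressing the node's column on its parents' columns.

```python
import numpy as np

def least_squares_coeffs(X, parents):
    """Node-wise least squares sketch: for each node i, regress
    X[:, i] on the columns of its parents to estimate the edge
    coefficients into i.

    X: (m, n) sample matrix; parents: dict {node: list of parent indices}.
    Returns dict {node: coefficient vector over its parents}.
    """
    coeffs = {}
    for i, pa in parents.items():
        if not pa:
            coeffs[i] = np.array([])  # root node: no coefficients
            continue
        # Solve min_b || X[:, pa] @ b - X[:, i] ||_2
        b, *_ = np.linalg.lstsq(X[:, pa], X[:, i], rcond=None)
        coeffs[i] = b
    return coeffs
```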
Algorithm 4 (BatchAvgLeastSquares) is a generalization of the least squares algorithm.
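A minimal sketch of the batch-averaging idea (illustrative only, assuming the samples for one node are already assembled into a design matrix): the samples are split into batches, least squares is solved on each batch, and the per-batch solutions are averaged, which lets one interpolate between batch size and number of batches.

```python
import numpy as np

def batch_avg_least_squares(X_pa, y, num_batches):
    """BatchAvgLeastSquares sketch for one node: split the samples into
    num_batches groups, solve ordinary least squares on each group, and
    return the average of the per-batch solutions.
    num_batches = 1 recovers plain least squares."""
    batches = np.array_split(np.arange(len(y)), num_batches)
    sols = [np.linalg.lstsq(X_pa[b], y[b], rcond=None)[0] for b in batches]
    return np.mean(sols, axis=0)
```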
Algorithm 5 is a coefficient recovery algorithm based on Cauchy random variables (for a variable with p parents).
Algorithm 6 is a coefficient recovery algorithm based on Cauchy random variables (for general Bayesian networks).
Algorithm 7 (CauchyEstTree) works for the special case of polytree Bayesian networks.
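The Cauchy-based estimators above share one core step, which can be sketched as follows (an illustration under simplifying assumptions, not the repository's `methods.py` implementation): for a node with p parents, the samples are partitioned into batches of size p, the exact p×p linear system is solved in each batch, and the coordinate-wise median of the batch solutions is returned.

```python
import numpy as np

def cauchy_est_coeffs(X_pa, y):
    """Median-of-linear-systems sketch: partition the samples into
    batches of size p (p = number of parents), solve the exact p x p
    linear system in each batch, and take the coordinate-wise median.
    The median tames the heavy (Cauchy-like) tails of the individual
    batch solutions."""
    m, p = X_pa.shape
    sols = []
    for start in range(0, m - p + 1, p):
        A, b = X_pa[start:start + p], y[start:start + p]
        try:
            sols.append(np.linalg.solve(A, b))
        except np.linalg.LinAlgError:
            continue  # skip a (measure-zero) singular batch
    return np.median(sols, axis=0)
```

Each batch solution equals the true coefficient vector plus a heavy-tailed error term, so the median is a natural robust aggregator where the mean would fail.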
*(Figures omitted: results for 100 nodes, degree 5, ER graph; and for noisy data (5% noisy samples, 5/100 noisy nodes), d=5, ER graph.)*
If you use any part of this code in your research or any engineering project, please cite our paper:
@article{bhattacharyya2021learning,
title={Learning Sparse Fixed-Structure Gaussian Bayesian Networks},
author={Bhattacharyya, Arnab and Choo, Davin and Gajjala, Rishikesh and Gayen, Sutanu and Wang, Yuhao},
journal={arXiv preprint arXiv:2107.10450},
year={2021}
}
Please feel free to contact us if you run into any problems when using this code. We are glad to hear your advice and update our work.
We are also open to collaboration if you think you are working on a problem that we might be interested in.
Please do not hesitate to contact us!