Releases: ChaissonLab/danbing-tk
*conda test*
v1.3.2.5 Missed commit. Respect CXX option
danbing-tk v1.3.2
Major changes:
- Automated bias correction using danbing-tk-pred
Resources:
- ikmer.meta: required by danbing-tk-pred
- ikmer.meta.txt: human-readable version of ikmer.meta, with format documented in the Wiki
- Example trkmers.meta.txt: required by danbing-tk-pred
Next release (v1.3.3):
- Automated dosage computation for motifs and TR loci
danbing-tk v1.3.1 (manuscript)
This version is associated with the manuscript: "The motif composition of variable-number tandem repeats impacts gene expression"
Major changes:
- Updated preferred usage of danbing-tk by turning on the kmer filter: -kf 4 1
- Reduced *.tr.kmers output size by saving only counts; an index file is used to reconstruct locus names and kmer names
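As a rough illustration of the counts-only output, the sketch below pairs a list of counts with an index to recover locus and kmer names. The layout used here (index lines starting with `>` name a locus, every other line is a kmer, and counts are given one per kmer in the same order) is an assumption for illustration only and may not match the actual danbing-tk file formats. The function name `reconstruct` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def reconstruct(index_lines: Iterable[str],
                counts: Iterable[int]) -> Dict[str, List[Tuple[str, int]]]:
    """Attach counts to the (locus, kmer) names listed in an index.

    Assumed (hypothetical) layout: index lines starting with '>' name a locus,
    every other non-empty line is a kmer; counts follow the kmer order.
    """
    table: Dict[str, List[Tuple[str, int]]] = {}
    count_iter = iter(counts)
    locus = None
    for line in index_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New locus header: start an empty kmer list for it.
            locus = line[1:]
            table[locus] = []
        else:
            if locus is None:
                raise ValueError("kmer found before any locus header")
            try:
                count = next(count_iter)
            except StopIteration:
                raise ValueError("fewer counts than kmers in the index") from None
            table[locus].append((line, count))
    # Any leftover count means the index and the counts are out of sync.
    if next(count_iter, None) is not None:
        raise ValueError("more counts than kmers in the index")
    return table


# Toy data, not taken from any real danbing-tk output.
index = [">locus0", "ACGT", "CGTA", ">locus1", "TTTT"]
result = reconstruct(index, [3, 0, 7])
print(result)
```

Storing the names once in a shared index means each per-sample output only has to carry integers, which is where the size reduction would come from under this assumed layout.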
Resources in Assets:
- tr.good.bed: VNTR set for building RPGG
Additional resources on Zenodo:
- VNTR statistics and annotations on 35 HGSVC assemblies
- RPGG built from the annotations
- GTEx gene-level eVNTR discoveries
- GTEx gene-level eMotif discoveries
- GTEx fine-mapping results using susieR
- Bias matrices for HGSVC, HPRC, GTEx, and Geuvadis samples used in bias correction
- GTEx bias-corrected kmer dosage table
- Geuvadis bias-corrected kmer dosage table
Additional analysis scripts for bias correction, eQTL mapping, and fine-mapping are available in this repo.
danbing-tk v1.3
Improvements:
- Significantly improved the time and memory usage of danbing-tk
  - Benchmark setting:
    - 31x HG00731 SRS sample from the 1000 Genomes Project
    - two-consortium RPGG, 81045 loci
    - 16 cores xeon-2665, avx
    - samtools fasta -@2 -n $bam | danbing-tk -a -kf 4 1 -gc 80 -k 21 -qs pan -fa /dev/stdin -o $out -p 16 -cth 45 | gzip >$aln
  - Sample was genotyped in ~43 min using 31.4 Gb of memory
  - 24x speedup, 37% reduction in memory usage
  - Output file size: 1.3 Gb
- danbing-tk now takes binary graph/index as input
  - ktools serialize was added to convert *kmers to *.graph.umap, *.kmerDBi.umap and *.kmerDBi.vv
- bam2pe is now merged with danbing-tk
  - use the -fa option for non-interleaved fasta, e.g. samtools fasta -@2 -n $bam
  - use -fai for interleaved fasta
Resources:
- New RPGG and VNTR coordinates on 35 HGSVC genomes are available at Zenodo
danbing-tk v1.2
Improvements:
- Improved indel handling in graph threading.
- Improved the memory scalability of multiple-boundary-alignment.
Resources:
- New RPGG and VNTR coordinates on 35 HGSVC genomes are available at Zenodo
manuscript-1
Latest version of the code and resources associated with the manuscript "Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs". Released to create a DOI with Zenodo.
danbing-tk v1.1
Improvements:
- Faster danbing-tk align: 2.6x speedup on HG00096 when genotyping 32,138 loci
- More flexible use of danbing-tk build: generating RPGG without SRS data by skipping graph pruning
- More informative aln-r2: fixed zero r2 when there is no variation in assembly kmer count by adding a dummy point at (0,0)
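The aln-r2 fix can be illustrated with a small calculation. The sketch below assumes r2 is the squared Pearson correlation between assembly kmer counts and read kmer counts, which the notes above do not state explicitly. The function names and the toy numbers are hypothetical. When every assembly count is identical, the variance of the assembly counts is zero and r2 carries no information. Appending a dummy point at (0,0) restores variation.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # No variation in one variable: correlation is undefined, report 0.
        return 0.0
    return sxy * sxy / (sxx * syy)


def r_squared_with_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Same statistic after appending a dummy point at (0, 0)."""
    return r_squared(list(x) + [0.0], list(y) + [0.0])


# Toy example: every kmer occurs twice in the assembly (no variation in x).
asm = [2.0, 2.0, 2.0, 2.0]
reads = [58.0, 61.0, 60.0, 61.0]
plain = r_squared(asm, reads)
anchored = r_squared_with_origin(asm, reads)
print(plain, anchored)
```

With the dummy point, a locus whose read counts scale consistently with its constant assembly count yields an r2 close to 1 rather than 0.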
danbing-tk v1.0
Improvements:
- Improved length estimation accuracy using multi-boundary expansion, due to more accurate orthology mapping of VNTRs across haplotypes.
- More stringent QC on VNTR size, number of supporting haplotypes, consistency of liftover coordinates, etc.
- Slightly expanded the VNTR set from 29,111 to 32,138 loci.
- Added a more user-friendly length estimation script.
- Added an option for alignment output by using -a with danbing-tk align
- DOI created using Zenodo
Additional resources:
- Repeat-pangenome graph encoded as pan.tr.kmers, pan.ntr.kmers and pan.graph.kmers in RPGG.tar.gz
- 84,411 raw VNTR coordinates: tr.84411.bed
- 32,138 raw VNTR coordinates (high-confidence genotypable set): tr.good.bed
- 397 non-VNTR regions: ctrl.bed
- Locus-specific biases of VNTR and non-VNTR regions: LSB.tsv
- Summary of eGene discoveries: Alltissue.egenes.tsv
- Comprehensive VNTR statistics: vntr.statistics.tsv, vntr.statistics.README
- 13 PacBio CLR assemblies (26 haplotypes): *.h?.fasta.gz
- 32,138 boundary-expanded VNTR coordinates in the 26 haplotypes: pan.tr.mbe.no_CCS.bed and pan.tr.mbe.no_CCS.README
- 73,582 boundary-expanded VNTR coordinates: pan.tr.73582.mbe.no_CCS.bed
Example analyses:
- QC of multi-boundary expansion: 202011.MultiBoundaryExpansion.QC.ipynb
- Measuring length prediction accuracy: 202012.Acc.pan.ipynb
- Contrasting the most informative kmer between populations: 202012.mikmer.ipynb
- eQTL mapping: 202012.eQTL.32138.ipynb
- Sample QC on locus-specific bias: LSB_analysis.ipynb
- Heritability analysis of SNP vs. SNP+VNTR models: 202011.sg.joint.ipynb
- Miscellaneous analyses in the original manuscript: 202012.revision.supp.ipynb
v0.0
Version 0 of genotypable VNTRs, RPGG and precomputed LSB are out! These files should be the same as the ones used for the analysis in the original paper.