- Release v1.3: Updated RPGG built from 35 HGSVC genomes.
- Release v1.0: VNTR summary statistics and eGene discoveries are also included, as are example analyses such as differential length/motif analysis, eQTL mapping, VNTR locus QC, and sample QC.
`danbing-tk align` takes ~12 CPU hours to genotype a 30x SRS sample. It generates the `$OUT_PREF.tr.kmers` and `$OUT_PREF.aln.gz` outputs, whose formats are specified in [File Format](#file-format).
**Important note:** If the outputs of `danbing-tk align` are intended to be used directly in downstream analyses, e.g. association tests, please read the [distribution of LSB](#distribution-of-lsb) section below before running.
## danbing-tk build
### Install Dependencies
Users who intend to use only `danbing-tk align` can skip this step.
Submitting jobs to a cluster is preferred because `danbing-tk build` is compute-intensive (~1200 CPU hours for the original dataset). To run jobs locally instead, remove `--cluster` and its parameters.
## danbing-tk predict
Locus-specific sampling biases (LSB) at VNTR regions are critical for normalizing the sum of *k*-mer counts to VNTR length. We provide precomputed LSB at the VNTR regions for fast comparison; however, this assumes that the LSB of the dataset of interest is close enough to that of the dataset in the original paper. Please verify this assumption by running a joint PCA of the LSB of non-repetitive regions together with the original dataset, provided in `LSB.tsv`. If the assumption fails, a leave-one-out analysis (next section) on the dataset of interest is necessary to make accurate predictions. The following usage applies when the assumption holds.
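The joint PCA check described above can be sketched in Python with NumPy. This is a minimal sketch, assuming the LSB values of non-repetitive regions have already been loaded into two samples × loci matrices with columns in the same locus order (the layout of `LSB.tsv` is not described here, so loading is left out). The function name `joint_pca` is hypothetical, and standardizing each locus before an SVD-based PCA is one reasonable choice, not necessarily the one used in the original analysis.

```python
import numpy as np


def joint_pca(ref_lsb: np.ndarray, new_lsb: np.ndarray, n_components: int = 2):
    """Project reference and new LSB matrices onto shared principal components.

    ref_lsb: (n_ref, n_loci) LSB of non-repetitive regions, original dataset.
    new_lsb: (n_new, n_loci) LSB of the same loci, same column order, new dataset.
    Returns (ref_scores, new_scores) with n_components columns each.
    """
    if ref_lsb.shape[1] != new_lsb.shape[1]:
        raise ValueError("both matrices must cover the same loci in the same order")
    combined = np.vstack([ref_lsb, new_lsb])
    # Center each locus and scale to unit variance so no single locus dominates
    centered = combined - combined.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant loci
    scaled = centered / std
    # PCA via SVD: the rows of vt are the principal axes
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    scores = scaled @ vt[:n_components].T
    n_ref = ref_lsb.shape[0]
    return scores[:n_ref], scores[n_ref:]
```

Plotting the first two components of both score matrices is one way to judge the assumption (no numeric cutoff is implied). If the new samples form a cluster separate from the reference samples, the precomputed LSB are unlikely to transfer.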
Run `getCovByLocus.397.sh` on your SRS dataset.
```
kmer1 kmer_count1
...
>locus i+1
...
```
The second field (the *k*-mer count) is optional.
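A reader for this format can be sketched in a few lines of Python. This is a minimal sketch, assuming each record starts with a `>` header line followed by whitespace-separated `kmer [count]` lines; the header text after `>` is kept as-is as the locus key, and *k*-mers are kept as raw tokens. The helpers `parse_tr_kmers` and `kmer_count_sums` are hypothetical names, not part of danbing-tk. The per-locus sum is the quantity that the LSB section describes as being normalized to VNTR length.

```python
from typing import Dict, Iterable, Optional


def parse_tr_kmers(lines: Iterable[str]) -> Dict[str, Dict[str, Optional[int]]]:
    """Parse `.tr.kmers`-style text into {locus header: {kmer: count or None}}."""
    loci: Dict[str, Dict[str, Optional[int]]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            # Keep the header text after '>' (e.g. "locus 12") as the key
            current = line[1:].strip()
            loci[current] = {}
            continue
        if current is None:
            raise ValueError("k-mer line found before any '>' header")
        fields = line.split()
        # The second field is optional; record None when it is absent
        count = int(fields[1]) if len(fields) > 1 else None
        loci[current][fields[0]] = count
    return loci


def kmer_count_sums(loci: Dict[str, Dict[str, Optional[int]]]) -> Dict[str, int]:
    """Sum the k-mer counts of each locus, ignoring k-mers without a count."""
    return {
        locus: sum(c for c in kmers.values() if c is not None)
        for locus, kmers in loci.items()
    }
```

Because file objects iterate line by line, `parse_tr_kmers(open(path))` works directly on a `*.tr.kmers` file under the assumptions above.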
- Alignment output (`-a` option)
- Synopsis
- `ops`: operations to align the read to the graph
  - `=`: a match in the repeat
  - `.`: a match in the flank
  - `[A|C|G|T]`: a mismatch; the letter in the graph is shown
  - `[0|1|2|3]`: a deletion; the letter in the graph is shown as `0`, `1`, `2`, `3` for A, C, G, T, respectively.
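The `ops` symbols listed above can be decoded mechanically. This is a minimal Python sketch, assuming the `ops` field has already been extracted from an alignment record as a plain string (how a record is split into fields is not shown in this excerpt). `summarize_ops` and `deleted_bases` are hypothetical helper names; symbols outside this list are counted as `"unknown"` rather than rejected, since the full synopsis is not reproduced here.

```python
from collections import Counter

# Meanings of the `ops` symbols listed above
OP_MEANING = {"=": "match_repeat", ".": "match_flank"}
OP_MEANING.update({base: "mismatch" for base in "ACGT"})
OP_MEANING.update({digit: "deletion" for digit in "0123"})

# Deletions encode the skipped graph letter as 0, 1, 2, 3 for A, C, G, T
DELETION_BASE = dict(zip("0123", "ACGT"))


def summarize_ops(ops: str) -> Counter:
    """Count each kind of operation in an `ops` string."""
    return Counter(OP_MEANING.get(symbol, "unknown") for symbol in ops)


def deleted_bases(ops: str) -> str:
    """Return the graph letters skipped by deletions, in alignment order."""
    return "".join(DELETION_BASE[s] for s in ops if s in DELETION_BASE)
```

For example, `summarize_ops("==..A1=")` reports three repeat matches, two flank matches, one mismatch and one deletion, and `deleted_bases("==0.3")` returns `"AT"`.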