-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A Bug in bumphunt(): cause index problems in chr2, chr10, chrX and chrM in dmrseq() results #37
Comments
Hello @kdkorthauer , I went through the toy sample
The above lines follow the dmrseq vignettes
In this case, indexRanges are smaller than the number of queried CpGs.
Clearly, meanDiff() yields different results for BS.chr21 and BS.chr21[240001:260000,], a subset of BS.chr21, with the same Well, however, this exploration could not explain the overflow of CpG indices in my previous question above. Looking forward to your insights. |
Hi @kdkorthauer , I think I figured it out. There is a small issue in the
Let's examine the above line by line using my data,
As shown here, indices of CpG in chrM will be the last one in the loop to be converted based on the index cumsum in chrY. But this is inconsistent with the order in chrlengths table where chrY is actually the last chromosome. As a consequence, the indices are not correctly assigned to chrX, chrY, and chrM. I would suggest apply str_sort() in the bumphunt() to correct this problem. Otherwise, I would suggest exclude chrM in the analysis.
Hope this helps. |
Bug identified by @zyqfrog10 in the way position indices are tracked in the internal function bumphunt(). Previously, the order of the `chrs` vector containing chromosome names did not necessarily match the order of the names in the cumulative index table when chrM was included and listed last. This commit introduces a fix by removing the `as.character` call when building the cumulative sum table, which was reordering the names alphabetically. It also adds a check that the order of these two items is identical, and if not will throw an error to prevent this type of bug from persisting.
Hi @zyqfrog10, Thank you for reporting the bug, and for tracking down the source of the problem! I was able to confirm that the chromsome reordering in the I'm going to do some more testing, will push the fix to release once I'm confident it solves the problem you were seeing. Best, |
The check for chromosome ordering matching the cumulative sum table didn't use a vectorized logical. Changing to the 'all.equal' function.
Hi @kdkorthauer , Thanks for your quick response. I just tried sort(bs) first as suggested in the updated bumphunt(). But it couldn't solve the issue.
As shown above (chrM already excluded), the sequence of chrs are still not matched after sort(bs). I think the problem is that sort(bs) will sort by chromosome as character while unique() function will order differently.
In the bs object,
A right way to do this is,
Or this if want to keep it as a factor
So in the bumphunt() function, the line should be updated is still the
Now I realize this is not an issue of misorder of chrM but more things, where indices in chr2, chr10, chrX and chrM are affected. Best, |
Explicitly order the cumulative sum table by the ordering in which the chromosomes appear in the bsseq object.
Hi @zyqfrog10, Thank you for following up. I see now that the ordering in the cumulative sum table was still not correctly enforced. The issue is that the Best, |
Hi @kdkorthauer, Wonderful! Best, |
Please go down to my 4th post for the origin of the problem and suggested solution.
My 2nd post down questioned the usage of meanDiff() in the dmrseq vignettes.
----------------------------------Original post------------
Hi @kdkorthauer,
In my recent tries on dmrseq, the performance of meanDiff() has not been stable. Sometimes it throws out an error (for different data) like,
Error in .Primitive("[")(c(1, 1, 1, 1, 1, 0.75, 1, 0.5, 1, 1, 0.666666666666667, : subscript out of bounds
. I tried to track down the issue and found theindexRanges
recorded inregions
can exceed the number of CpG loci inbs
.In my data, it looks like the following. There are indices greater than the total number of CpGs. It might be related to how
bumphunt
track along CpG indices which I haven't gone to the bottom yet. I would be grateful if you can help me identify the problem. Thanks a lot.The codes to obtain the
bs.filt
andregions
above.The text was updated successfully, but these errors were encountered: