fixAuthors and merging #128

ggrittz · 2025-02-12T19:37:07Z

For large vectors of scientific names (500k+), plantR::fixAuthors fails at its merge step because of the size of the objects being merged.

The issue occurs here:

res0 <- data.frame(orig.name = taxa, tax.name = NA, tax.author = NA,
                   ids = 1:length(taxa))
res <- res[, -which(names(res) %in% "fix.author")]
res1 <- merge(res0, res, by = "orig.name", all = TRUE,
              suffixes = c(".x", ""))

**Error in merge.data.frame(res0, res, by = "orig.name", all = TRUE, suffixes = c(".x",  : 
  negative length vectors are not allowed**

Changing merge to dplyr::left_join gives us a bit more information about the problem:

res0 <- data.frame(orig.name = taxa, tax.name = NA, tax.author = NA,
                   ids = 1:length(taxa))
res <- res[, -which(names(res) %in% "fix.author")]
res1 <- dplyr::left_join(res0, res, by = "orig.name", suffix = c(".x", ""))

**Error in `dplyr::left_join()`:
! This join would result in more rows than dplyr can handle.
ℹ 9758327359 rows would be returned. 2147483647 rows is the maximum number allowed.
ℹ Double check your join keys. This error commonly occurs due to a missing join key, or an improperly specified join condition.**
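The row count in that message is what a join produces when the same key is repeated on both sides: every copy on the left is paired with every copy on the right. Here is a minimal sketch of the effect on toy data, using base R only. The names and values are made up for illustration, and it assumes (it is not confirmed here) that `res` carries repeated `orig.name` values in the real function:

```r
# Toy data (hypothetical): the same names repeated on both sides of the join.
taxa <- rep(c("Aa bb L.", "Cc dd Mart."), times = c(3, 2))
res0 <- data.frame(orig.name = taxa, ids = seq_along(taxa))
res  <- data.frame(orig.name = taxa, tax.author = "x")

# Each key matches all of its copies on the other side: 3*3 + 2*2 rows.
merged <- merge(res0, res, by = "orig.name", all = TRUE)
nrow(merged)

# Number of matched rows a join would return, computed without joining.
n_left  <- table(res0$orig.name)
n_right <- table(res$orig.name)
shared  <- intersect(names(n_left), names(n_right))
expected_rows <- sum(as.numeric(n_left[shared]) * as.numeric(n_right[shared]))
expected_rows
```

Running the same count on the real `res0` and `res` before merging would show whether the join is about to blow up.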

However, removing duplicated orig.name values from res right before merging fixes this:

##### ADD THIS STEP: keep one row per orig.name #####
res <- res[!duplicated(res$orig.name), ]

res0 <- data.frame(orig.name = taxa, tax.name = NA, tax.author = NA,
                   ids = 1:length(taxa))
res <- res[, -which(names(res) %in% "fix.author")]

##### WHICH ALSO ALLOWS LEFT_JOIN (MUCH FASTER) TO BE USED #####
res1 <- dplyr::left_join(res0, res, by = "orig.name", suffix = c(".x", ""))

##### Maybe the steps of removing duplicates and ordering can be removed too? I didn't test this, though #####
res1 <- res1[!duplicated(res1$ids), ]
res1 <- res1[order(res1$ids), ]
res1$tax.name[!rep_ids0] <- taxa[!rep_ids0]
res2 <- res1[, c("orig.name", "tax.name", "tax.author")]
return(res2)
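On the open question of whether the later de-duplication and ordering steps are still needed, here is a rough sketch on toy data rather than a test of fixAuthors itself. It uses base `merge()` with `all.x = TRUE` as a stand-in for the left join, and all names and values are hypothetical:

```r
# Toy input (hypothetical): repeated names, not in alphabetical order.
taxa <- c("Cc dd Mart.", "Aa bb L.", "Cc dd Mart.", "Ee ff", "Aa bb L.")
res0 <- data.frame(orig.name = taxa, ids = seq_along(taxa))

# Hypothetical lookup table with a duplicated key, reduced to unique keys.
res <- data.frame(orig.name = c("Aa bb L.", "Aa bb L.", "Cc dd Mart."),
                  tax.author = c("L.", "L.", "Mart."))
res <- res[!duplicated(res$orig.name), ]

res1 <- merge(res0, res, by = "orig.name", all.x = TRUE)

same_size  <- nrow(res1) == length(taxa)  # one output row per input name
no_dup_ids <- !anyDuplicated(res1$ids)    # no repeated ids left to drop

# merge() sorts by the key by default, so the input order must be restored.
res1 <- res1[order(res1$ids), ]
```

In this sketch, once the keys in `res` are unique there are no duplicated `ids` to drop, so the `!duplicated(res1$ids)` step looks redundant. The ordering step still matters with `merge()`. `dplyr::left_join` is documented to keep the row order of its first argument, so the ordering step may also be unnecessary there, but that would need checking against the real function.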

This was referenced Feb 21, 2025

Pull request for fixAuthors #134

Closed

Minor fix for fixAuthors #135

Closed
