Skip to content

Normalize hrefs while skipping unresolved links #900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 20, 2023

Conversation

gadomski
Copy link
Member

@gadomski gadomski commented Oct 13, 2022

Related Issue(s):

Description:

Adds a skip_unresolved argument to Catalog.normalize_hrefs and Catalog.normalize_and_save. Instead of using Catalog.get_items and Catalog.get_children (which resolve their links), we re-implement with additional logic to conditionally skip unresolved links. We considered pushing skip_unresolved all the way down into Link.get_stac_object, but this seemed like overkill, since the motivating use-case for skip_unresolved is to avoid walking the whole tree when normalizing (#284).

Because save already skips unresolved links, this enables normalize_and_save to only touch objects that have been added to the tree or modified (and therefore resolved).

Note this isn't exactly the solution proposed by @matthewhanson in #284, since the hrefs newly-added entities aren't normalized. These changes are only confined to normalize.

This is a feature-adding, backwards compatible change that could be incorporated in a MINOR release.

cc @l0b0

PR Checklist:

  • Code is formatted (run pre-commit run --all-files)
  • Tests pass (run scripts/test)
  • Documentation has been updated to reflect changes, if applicable
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

@gadomski gadomski force-pushed the issues/284-only-normalize-resolved branch from f5a41bc to f55e8b0 Compare October 13, 2022 14:18
@codecov-commenter
Copy link

codecov-commenter commented Oct 13, 2022

Codecov Report

Base: 90.09% // Head: 90.12% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (4c149a8) compared to base (b0aec93).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #900      +/-   ##
==========================================
+ Coverage   90.09%   90.12%   +0.02%     
==========================================
  Files          47       47              
  Lines        6057     6064       +7     
  Branches      905      909       +4     
==========================================
+ Hits         5457     5465       +8     
  Misses        420      420              
+ Partials      180      179       -1     
Impacted Files Coverage Δ
pystac/catalog.py 91.96% <100.00%> (+0.37%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@matthewhanson
Copy link
Member

This looks fine to me, I think it's a good change - in most cases of adding new items to a catalog you shouldn't need to resolve anything that isn't newly added (i.e., resolved).

But I'm not sure I follow this:

Note this isn't exactly the solution proposed by @matthewhanson in https://github.com/stac-utils/pystac/issues/284, since the hrefs newly-added entities aren't normalized. These changes are only confined to normalize.

What do you mean by newly-added entities aren't normalized?

@gadomski
Copy link
Member Author

What do you mean by newly-added entities aren't normalized?

If I understood #284 correctly, you were proposing a change where

collection.add_item(item)

would auto-magically normalize the item href (without having to call collection.normalize_hrefs()). This PR does not do that.

@gadomski gadomski force-pushed the issues/284-only-normalize-resolved branch from f55e8b0 to c7229bf Compare November 7, 2022 19:38
@gadomski gadomski added enhancement documentation Issues related to PySTAC documentation and removed documentation Issues related to PySTAC documentation labels Nov 7, 2022
@gadomski gadomski self-assigned this Dec 30, 2022
@gadomski gadomski force-pushed the issues/284-only-normalize-resolved branch from d313e30 to db41b8f Compare December 30, 2022 18:26
@gadomski gadomski added this to the 1.7 milestone Dec 30, 2022
@gadomski gadomski requested a review from pjhartzell January 18, 2023 17:19
@gadomski gadomski force-pushed the issues/284-only-normalize-resolved branch 2 times, most recently from 560245a to 3e55bdb Compare January 19, 2023 22:02
gadomski and others added 3 commits January 19, 2023 15:04
Adds a `skip_unresolved` argument to `normalize_hrefs` and `normalize_and_save`.
Instead of using `get_items` and `get_children` (which resolve their links), we
re-implement with additional logic to conditionally skip unresolved links. We
considered pushing `skip_unresolved` all the way down into `get_stac_object`,
but this seemed like overkill, since the motivating use-case for
`skip_unresolved` is to avoid walking the whole tree when normalizing (#284).

Because `save` already skips unresolved links, this enables `normalize_and_save`
to only touch objects that have been added to the tree or modified (and
therefore resolved).
Co-authored-by: Preston Hartzell <preston.hartzell@gmail.com>
@gadomski gadomski force-pushed the issues/284-only-normalize-resolved branch from 3e55bdb to 4c149a8 Compare January 19, 2023 22:04
@gadomski gadomski requested a review from pjhartzell January 19, 2023 22:05
@gadomski gadomski merged commit b4a44f7 into main Jan 20, 2023
@gadomski gadomski deleted the issues/284-only-normalize-resolved branch January 20, 2023 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Normalize URLs and set link types for newly added entities
4 participants