Skip to content

Add support for the version extension. #193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 16, 2020

Conversation

schwehr
Copy link
Collaborator

@schwehr schwehr commented Oct 9, 2020

The initial version only implements version for Items. I still need to do version for Collections. Also, I'm sure more tests are needed.

https://github.com/radiantearth/stac-spec/tree/dev/extensions/version

@schwehr
Copy link
Collaborator Author

schwehr commented Oct 9, 2020

I will fix this:

flake8 pystac tests
pystac/extensions/version.py:24:7: E111 indentation is not a multiple of four

@schwehr
Copy link
Collaborator Author

schwehr commented Oct 10, 2020

I will try to fix the next linter's complaints later today.

Reverted the changes to pystac/validation/stac_validator.py

yapf3 --version
yapf 0.29.0
@schwehr
Copy link
Collaborator Author

schwehr commented Oct 10, 2020

I need to add a test that validates an item created with a version

@codecov-io
Copy link

codecov-io commented Oct 10, 2020

Codecov Report

Merging #193 into develop will increase coverage by 0.37%.
The diff coverage is 98.30%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #193      +/-   ##
===========================================
+ Coverage    91.81%   92.18%   +0.37%     
===========================================
  Files           27       28       +1     
  Lines         3225     3341     +116     
===========================================
+ Hits          2961     3080     +119     
+ Misses         264      261       -3     
Impacted Files Coverage Δ
pystac/extensions/version.py 98.26% <98.26%> (ø)
pystac/__init__.py 100.00% <100.00%> (ø)
pystac/stac_object.py 94.89% <0.00%> (+0.51%) ⬆️
pystac/catalog.py 89.93% <0.00%> (+1.38%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9944c51...fbd7abd. Read the comment docs.

@schwehr
Copy link
Collaborator Author

schwehr commented Oct 10, 2020

I will add validation code like this when I get a chance.

  def testValidateMinimum(self):
    with self.assertRaises(pystac.validation.STACValidationError):
      self.item.validate()

    self.item.ext.version.apply(self.version)
    self.item.validate()

  def testValidateAll(self):
    deprecated = True
    latest = MakeItem(2013)
    predecessor = MakeItem(2010)
    successor = MakeItem(2012)
    self.item.ext.version.apply(self.version, deprecated, latest, predecessor, successor)

    self.item.validate()

It appears that I need to add geometry to the item for it to be valid. (or remove the bbox and geometry)

def MakeItem(year):
  asset_id = f'USGS/GAP/CONUS/{year}'
  bbox = [-160.26, 18.85, -154.66, 22.29]
  geometry = {
      'type': 'Polygon',
     # TODO(schwehr): Match coords
      'coordinates': [[
          [100.0, 0.0],
          [101.0, 0.0],
          [101.0, 1.0],
          [100.0, 1.0],
          [100.0, 0.0]]]}
  start = datetime.datetime(year, 1, 2)

  item = pystac.Item(
      id=asset_id, geometry=geometry, bbox=bbox, datetime=start, properties={})
  item.set_self_href(URL_TEMPLATE % year)

  item.ext.enable(pystac.Extensions.VERSION)

  return item

Stop using an actual bbox and set it to None as geometry and bounds
are not relevant to testing the version extension.

Also add a test case with all three of latest, predecessor, and successor.
@schwehr
Copy link
Collaborator Author

schwehr commented Oct 11, 2020

Probably best to add a test where the deprecated arg to apply was not a bool or None, but I can do that after the rest of version gets written.


@classmethod
def _object_links(cls):
return [] # TODO(schwehr): What should this return?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method returns any STAC Objects that are linked to by the extension properties. That way, PySTAC knows that if can resolve links and should check if a link to a referenced object exists in the resolved object cache.

In this case, this should return any of the successor, predecessor, and latest_version links, e.g.

Suggested change
return [] # TODO(schwehr): What should this return?
link_types = [SUCCESSOR_VERSION, PREDECESSOR_VERSION, LATEST_VERSION]
return [link for link in self.item.get_links() if link.rel in link_types]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What call triggers this? This already returns the linked item

print('objects latest',
          [x for x in self.item.get_stac_objects(version.LATEST_VERSION)])

Your example has self.item, but this is a @classmethod with cls as the arg. I'm not following.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, forgot this was a classmethod. It should return the rel types that point to stac objects, not the actual links.

So return [SUCCESSOR_VERSION, PREDECESSOR_VERSION, LATEST_VERSION] should work for this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reasonably direct way to test this? I switched the order the be alphabetical. If there was a specific reason for the order you listed, I will change it.

I'm going to look at the catalog copy test now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason for the ordering, so alphabetical is great!

Not a very direct test; the full_copy logic is really what this is used for so the more complicated test is what will really ensure its working.

predecessor = MakeItem(2010)
successor = MakeItem(2012)
self.item.ext.version.apply(self.version, deprecated, latest, predecessor, successor)
# Just validate in tearDown.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the _object_links method is implemented, there's a somewhat tricky test that should be performed that will make sure the in-memory object instance tracking works for this extension. The test could go as follows:

link = item_a.get_link('predecessor-version'`).resolve_stac_object()
self.assertTrue(link.target is item_b)

That the test passes. So we're checking that the copy didn't make one copy for Item B as it exists in the catalog, and another copy for the link from Item A to Item B with the version extension.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I have so far:

    def test_copy_1_with_version(self):
        c2011 = make_collection(2011)
        c2012 = make_collection(2012)
        c2013 = make_collection(2013)
        c2014 = make_collection(2014)

        c2012.ext.enable(pystac.Extensions.VERSION)
        c2012.ext.version.apply('v2012', latest=c2012, predecessor=c2011, successor=c2013)

        cat = pystac.Catalog('my_catalog', 'my description')
        cat.add_child(c2011)
        cat.add_child(c2012)
        cat.add_child(c2013)
        cat.add_child(c2014)

        cat_copy = cat.full_copy()

Was trying to figure out how best to compare the two. Hopefully I can think through it easier where there aren't mindcraft videos playing next to me... :|

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a hard time figuring out the item link thing. This is what I had as of the end of yesterday

def compare_collection(col1, col2):
  if col1.id != col2.id:
    return False
  if col1.description != col2.description:
    return False
  has_ver_1 = col1.ext.implements(pystac.Extensions.VERSION)
  has_ver_2 = col2.ext.implements(pystac.Extensions.VERSION)
  if has_ver_1 != has_ver_2:
    return False
  if has_ver_1:
    print('has_ver')
    ver1 = col1.ext.version
    ver2 = col2.ext.version
    if ver1.version != ver2.version:
      return False
    if ver1.deprecated != ver2.deprecated:
      return False
    if str(ver1.latest_link) != str(ver2.latest_link):
      return False
    if str(ver1.predecessor_link) != str(ver2.predecessor_link):
      return False
    if str(ver1.successor_link) != str(ver2.successor_link):
      return False
  return True


class FullCopyTest(googletest.TestCase):

  def compare_catalogs(self, cat1, cat2):
    self.assertEqual(cat1.id, cat2.id)
    children1 = {child.id: child for child in cat1.get_children()}
    # print('children1 dict', children1)
    children2 = {child.id: child for child in cat2.get_children()}
    # print('children2 dict', children2)
    self.assertCountEqual(sorted(children1.keys()), sorted(children2.keys()))
    for child_id in children1:
      print(child_id)
      child1 = children1[child_id]
      child2 = children2[child_id]
      has_ver_1 = child1.implements(pystac.Extensions.VERSION)
      has_ver_2 = child1.implements(pystac.Extensions.VERSION)
      self.assertEqual(has_ver_1, has_ver_2)
      if has_ver_1:
        # Umm... ?
        self.assertTrue(compare_collection(child1, child2))
        link = child1.get_links(version.LATEST_VERSION)[0].resolve_stac_object()
        self.assertIs(link.target, child2)

    return True

  def testCopy1Linked(self):
    c2011 = make_collection(2011)
    c2012 = make_collection(2012)
    c2013 = make_collection(2013)
    c2014 = make_collection(2014)

    c2012.ext.enable(pystac.Extensions.VERSION)
    c2012.ext.version.apply(
        'v2012', latest=c2014, predecessor=c2011, successor=c2013)

    cat = pystac.Catalog('my_catalog', 'my description')
    cat.add_child(c2011)
    cat.add_child(c2012)
    cat.add_child(c2013)
    cat.add_child(c2014)

    cat_copy = cat.full_copy()
    pprint(cat_copy.to_dict())

    self.compare_catalogs(cat, cat_copy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easier thing would be to do a straight object identity test to ensure the object cache ensures there's a single object that's linked to in the catalog and by the item that has the version link.

It might be easier/more clear to make the suggestion in code. I can take a crack at this tomorrow if you're alright with me pushing a commit to this branch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it! The only things I really care about is learning how it's done and getting the extensions implemented. I don't like what I did above except for the setup (assuming that it makes input of the form required.)

Running validate in tearDown happens after the test results are
recorded, so validate should happen in the tests.  This simplifies
the tests where validate doesn't really make sense.

Also removed a redundant call to self.collection.ext.enable.
@schwehr
Copy link
Collaborator Author

schwehr commented Oct 13, 2020

Doh. Committed the wrong thing, so I backed up and forced the commit.

@@ -104,6 +105,39 @@ def test_all_links(self):
self.item.ext.version.apply(self.version, deprecated, latest, predecessor, successor)
self.item.validate()

def test_full_copy(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schwehr here are the two tests that I used to make sure the object links were operating correctly in a full_copy scenario.

Notice that if you don't return the correct reltypes from the _object_link method, these tests fail.

This is because, when making a full copy of the catalog in memory, PySTAC doesn't know that those links point to STAC objects that should be considered in the copy. In this bit of code, the logic to determine whether or not a link has to be copied (and it's target either copied or pulled from the cache of already-copied objects) is determined by the _object_links method, which as seen here depends on object link the Catalog or Item declares (core reltypes like child, item or collection), as well as any reltypes that are declared to be STAC objects by implemented extensions.

Let me know if this helps explain the _object_links method and why this is a good way to test that, or feel free to make changes or give ideas on way to improve!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arg... I'm a noob and forced commit over your change. I'll recommit it from this :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! I find git cherry-pick to be the most useful in those types of situations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I spend most of my time in piper where cherry-picks are usually just for release branches. Appreciate any help leveling up my git skills.

This commit adds two tests, one for each of the Item and Collection
extensions of the Version extension. The test checks that when items
or collections are linked by the version links (predecessor, successor,
or latest_version), that a full copy will end up with a catalog that
has those links point to the correct object instances. If this test
were to fail, that would indicate that the STAC object linked to by
the version links was a separate copy of item or collection, which is incorrect.

Redo of stac-utils@defd7ef
by lossyrob
…ion.

Suggested by @lossyrob

Also:

- Add VERSION and DEPRECATED to avoid using bare strings
- Removed extra blank lines
- Added year to id set in make_collection for easier debugging
@schwehr
Copy link
Collaborator Author

schwehr commented Oct 15, 2020

Looks like this was a flake. Not able to read the schema, yes?

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/jsonschema/validators.py", line 777, in resolve_from_url
    document = self.resolve_remote(url)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/jsonschema/validators.py", line 860, in resolve_remote
    result = requests.get(uri).json()
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/requests/models.py", line 889, in json
    return complexjson.loads(
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Copy link
Member

@lossyrob lossyrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small, more stylistic change - feel free to push back on it if you're used to that method :-)

Other then that this is looking 👍 and I'll merge after we sort out this last bit. Thank you!

Comment on lines 80 to 81
if items:
return next(items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last little nitpick -

Making the return in the None case would be more explicit here and elsewhere, rather than not returning and relying on Python's default mechanism to serve None back to the caller.

Here and in the other iterator cases you could combine these lines into:

Suggested change
if items:
return next(items)
return next(items, None)

I think it'd still be readable if you even one-lined it:

return next(self.item.get_stac_objects(SUCCESSOR), None)

Copy link
Collaborator Author

@schwehr schwehr Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I haven't looked at next() in ages. The default None is so much simpler.

Done in schwehr@974dac8

The docs are nice an on point... https://docs.python.org/3/library/functions.html#next

@lossyrob
Copy link
Member

One more thing I forgot to mention - the next extension needs an entry in the API docs. We use autodoc but it requires we specify the class, and then give it a heading so it appears nicely in the TOC. See this for an example.

Also fix the Point Cloud Extension that had copy-paste remnants from projection.
@schwehr
Copy link
Collaborator Author

schwehr commented Oct 16, 2020

One more thing I forgot to mention - the next extension needs an entry in the API docs. We use autodoc but it requires we specify the class, and then give it a heading so it appears nicely in the TOC. See this for an example.

Cool. Done in fbd7abd. The order in the API doc is a bit confusing. I did fix up some complains from the tooling about the Point Cloud Extension.

@lossyrob
Copy link
Member

Thanks for fixing the pointcloud docs, I hadn't noticed that that had been breaking the doc build.

Looks great, merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants