Skip to content

Feature request: Support "signing" results #259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomAugspurger opened this issue Jul 8, 2022 · 6 comments · Fixed by #286
Closed

Feature request: Support "signing" results #259

TomAugspurger opened this issue Jul 8, 2022 · 6 comments · Fixed by #286
Labels
enhancement New feature or request

Comments

@TomAugspurger
Copy link
Collaborator

This is a feature request to add an argument to the pystac_client.Client constructor and the from_* classmethods that would enabling "signing" results like those returned by the Planetary Computer and Brazil Data Cube.

Motivation

Some STAC APIs, like the Planetary Computer's, are public but the assets are stored in private blob storage containers. The results returned from the STAC API need to be transformed somehow for the user to read the actual data. For the Planetary Computer, we have users hit another API endpoint that grants them a short-lived token enabling them to read from blob storage. This is implemented in python at https://github.com/microsoft/planetary-computer-sdk-for-python.

In [1]: import requests, pystac_client

In [2]: catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
   ...: search = catalog.search(collections=["goes-cmi"], query={"goes:image-type": {"eq": "FULL DISK"}, "platform": {"eq": "GOES-16"}}, datetime="2022-06-27/2022-06-29")

In [3]: item = next(search.get_items())

In [4]: requests.head(item.assets["C01_1km"].href).status_code
Out[4]: 404

In [5]: import planetary_computer

In [6]: requests.head(planetary_computer.sign(item.assets["C01_1km"].href)).status_code
Out[6]: 200

This works OK, but does need to be manually called on each result. This request is for convenience: you specifying the signing function once when creating the Client and it calls the callable on each result.

Proposal

Add a new parameter, something like sign_function (name TDB), that's a callable called with the result immediately before returning it to the user. planetary_computer.sign works with URLs, assets, items, or ItemCollections so I'm personally OK with lumping all of those under a single parameter. Alternatively we could have an object with dedicated implementations for each kind of thing being signed (or maybe just Items and ItemCollections, since those are the only thing returned by pystac-client.


Note that odc-stac as implemented something similar with the patch_url function: https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html?highlight=patch_url#odc.stac.load. I think that pystac-client is more appropriate place for this functionality since not all STAC metadata represent data that can be put in a datacube.

Docs for rstac's sign_bdc and sign_planetary_computer.

@gadomski gadomski added the enhancement New feature or request label Jul 15, 2022
@TomAugspurger
Copy link
Collaborator Author

One slightly awkward issue I've run into when prototyping this: this is a bit of state on the object that should probably preserved through things like Collection.clone(). Perhaps this makes more sense in pystac itself? (cc @duckontheweb).

That said, we can handle this in pystac-client by overriding clone and related methods to preserve this attribute.

@TomAugspurger
Copy link
Collaborator Author

I have a prototype implementaiton of this at https://github.com/stac-utils/pystac-client/compare/main...TomAugspurger:pystac-client:feature/sign?expand=1.

You can see the awkwardness of doing this in pystac-client at places like https://github.com/stac-utils/pystac-client/compare/main...TomAugspurger:pystac-client:feature/sign?expand=1#diff-31bfdc592adbc2c1ac2b67b8ef4ba341f775899fa3d9c4252a441d2b58a2e87fR145. That said, doing it upstream in pystac will still require changes in pystac-client to ensure that sign_function is called for the methods pystac-client overrides.

@gadomski
Copy link
Member

Thinking abstractly here, there's nothing signing-specific about your implementation, correct? It's really more of a post-hook sort of vibe, a mutate-this-object function that is called after creating PySTAC objects.

@TomAugspurger
Copy link
Collaborator Author

there's nothing signing-specific about your implementation, correct? It's really more of a post-hook sort of vibe

Yep, that's 100% correct. The name should probably reflect that. It's slightly similar to dataclasses' __post_init__ (https://docs.python.org/3/library/dataclasses.html#post-init-processing).

mutate-this-object function

In the case of planetary_computer.sign, it can either mutate an object or return a copy. I'm not sure whether pystac / pystac-client should take a stance on the behavior of the sign_function (or whatever we end up calling it) other than that the return type should be the same.


I'm realizing now that there's a subtle point I haven't stated explicitly: I've only considered the case of users providing a sign_function to a parent object that operates on the children of the parent when accessed, and propagating to that child's children. I've ignored the use-case of wanting the sign_function / post-hook to be called on the (parent) object being itself. I'm not sure if that's an actual use-case or not.

@gadomski
Copy link
Member

I'm not sure whether pystac / pystac-client should take a stance on the behavior of the sign_function (or whatever we end up calling it) other than that the return type should be the same.

My preference is to mutate, leaving it up to the user to copy before calling.

I've only considered the case of users providing a sign_function to a parent object that operates on the children of the parent when accessed, and propagating to that child's children. I've ignored the use-case of wanting the sign_function / post-hook to be called on the (parent) object being itself. I'm not sure if that's an actual use-case or not.

The only use-case I could see would be Collection-level assets. Otherwise, I agree that pystac-client usually would want to sign children. However, if we move this functionality down to PySTAC, then we absolutely want to sign the parent I think, since you can load Items directly with PySTAC.

@TomAugspurger
Copy link
Collaborator Author

My preference is to mutate, leaving it up to the user to copy before calling.

SGTM. Kinda leading into the next point: the user won't have any references to the children that might be mutated inplace, so API-wise mutating or not only matters if we're calling the hook on the paranet object itself.


Here's my concrete proposal:

A new parameter to the various constructors, modifier, with an expected signature Callable[[T], None]. The expected behavior of this callable is to take a T (pystac.Collection, pystac.Item, collection-like dict, collection-like item, ....) and mutate it in place, returning None

class Client:
    def open(...., modifier: Callable[[T], None]):
        """
            modifier : A callable that modifies the children collection and items
                returned by this Client. This can be useful for injecting
                authentication parameters into child assets to access data
                from non-public sources.

                The callable should expect a single argument, which will be one
                of the following types:

                * :class:`pystac.Collection`
                * :class:`pystac.Item`
                * :class:`pystac.ItemCollection`
                * A STAC item-like :class:`dict`
                * A STAC collection-like :class:`dict`

                The callable should mutate the argument in place and return ``None``.

                ``modifier`` propagates recursively to children of this Client.
                After getting a child collection with, e.g.
                :meth:`Client.get_collection`, the child items of that collection
                will still be signed with ``modifier``.
        """

Anticipating a future concern: what if users want to provide different modifier callables for different purposes? I think we could adopt a pattern similar to https://www.python-httpx.org/advanced/#timeout-configuration. If you provide a simple callable, we expect that it can handle anything. If you want fine-grained control, we can provide a simple class

class Modifier:
    def __init__(self, item_modifier: Callable[[pystac.Item], None], collection_modifier: Callable[[pystac.Collection], None], ...):
        ...

And we would call the appropriate method. IMO that's overkill for an initial implementation. Just accepting a callable for now is forwards-compatible with that design though.

I think we'll want to override the __init__ methods of the various pystac_client subclasses. It'll be a bit of a hassle to keep up with upstream changes in pystac, but I think it's worth it just to keep the definition of the subclass's additional attributes sound.

I'll put up a PR along these lines sometime in the next day or two.


Slight concern: up until now planetary_computer.sign has always copied. microsoft/planetary-computer-sdk-for-python#42 is adding a planetary_computer.sign_inplace. But if a user does pystac_client.Catalog.open(..., modifier=planetary_computer.sign), rather than sign_inplace they'll (silently) get unsigned results. We'll need to document that well)

TomAugspurger pushed a commit to TomAugspurger/pystac-client that referenced this issue Jul 29, 2022
**Related Issue(s):**

- Closes stac-utils#259

**Description:**

This adds a new keyword, `modifier`, to the various constructors. The motivation is described in stac-utils#259. As a brief demonstration:

```python
In [1]: import planetary_computer, pystac_client

In [2]: catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace)

In [3]: item = next(catalog.get_collection("sentinel-2-l2a").get_all_items())

In [4]: item.assets["B02"].href  # signed URL, notice the querystring
Out[4]: 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/52/S/GH/2022/07/28/S2B_MSIL2A_20220728T020659_N0400_R103_T52SGH_20220729T082039.SAFE/GRANULE/L2A_T52SGH_A028157_20220728T021310/IMG_DATA/R10m/T52SGH_20220728T020659_B02_10m.tif?st=2022-07-28T16%3A03%3A54Z&se=2022-07-29T16%3A48%3A54Z...'
```

**PR Checklist:**

- [x] Code is formatted
- [x] Tests pass
- [x] Changes are added to the [CHANGELOG](https://github.com/stac-utils/pystac-api-client/blob/main/CHANGELOG.md)
TomAugspurger pushed a commit to TomAugspurger/pystac-client that referenced this issue Jul 29, 2022
**Related Issue(s):**

- Closes stac-utils#259

**Description:**

This adds a new keyword, `modifier`, to the various constructors. The motivation is described in stac-utils#259. As a brief demonstration:

```python
In [1]: import planetary_computer, pystac_client

In [2]: catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace)

In [3]: item = next(catalog.get_collection("sentinel-2-l2a").get_all_items())

In [4]: item.assets["B02"].href  # signed URL, notice the querystring
Out[4]: 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/52/S/GH/2022/07/28/S2B_MSIL2A_20220728T020659_N0400_R103_T52SGH_20220729T082039.SAFE/GRANULE/L2A_T52SGH_A028157_20220728T021310/IMG_DATA/R10m/T52SGH_20220728T020659_B02_10m.tif?st=2022-07-28T16%3A03%3A54Z&se=2022-07-29T16%3A48%3A54Z...'
```

**PR Checklist:**

- [x] Code is formatted
- [x] Tests pass
- [x] Changes are added to the [CHANGELOG](https://github.com/stac-utils/pystac-api-client/blob/main/CHANGELOG.md)
TomAugspurger pushed a commit to TomAugspurger/pystac-client that referenced this issue Jul 29, 2022
**Related Issue(s):**

- Closes stac-utils#259

**Description:**

This adds a new keyword, `modifier`, to the various constructors. The motivation is described in stac-utils#259. As a brief demonstration:

```python
In [1]: import planetary_computer, pystac_client

In [2]: catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace)

In [3]: item = next(catalog.get_collection("sentinel-2-l2a").get_all_items())

In [4]: item.assets["B02"].href  # signed URL, notice the querystring
Out[4]: 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/52/S/GH/2022/07/28/S2B_MSIL2A_20220728T020659_N0400_R103_T52SGH_20220729T082039.SAFE/GRANULE/L2A_T52SGH_A028157_20220728T021310/IMG_DATA/R10m/T52SGH_20220728T020659_B02_10m.tif?st=2022-07-28T16%3A03%3A54Z&se=2022-07-29T16%3A48%3A54Z...'
```

**PR Checklist:**

- [x] Code is formatted
- [x] Tests pass
- [x] Changes are added to the [CHANGELOG](https://github.com/stac-utils/pystac-api-client/blob/main/CHANGELOG.md)
gadomski pushed a commit that referenced this issue Aug 8, 2022
* Add a post-init modifier keyword

**Related Issue(s):**

- Closes #259

**Description:**

This adds a new keyword, `modifier`, to the various constructors. The motivation is described in #259. As a brief demonstration:

```python
In [1]: import planetary_computer, pystac_client

In [2]: catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace)

In [3]: item = next(catalog.get_collection("sentinel-2-l2a").get_all_items())

In [4]: item.assets["B02"].href  # signed URL, notice the querystring
Out[4]: 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/52/S/GH/2022/07/28/S2B_MSIL2A_20220728T020659_N0400_R103_T52SGH_20220729T082039.SAFE/GRANULE/L2A_T52SGH_A028157_20220728T021310/IMG_DATA/R10m/T52SGH_20220728T020659_B02_10m.tif?st=2022-07-28T16%3A03%3A54Z&se=2022-07-29T16%3A48%3A54Z...'
```

**PR Checklist:**

- [x] Code is formatted
- [x] Tests pass
- [x] Changes are added to the [CHANGELOG](https://github.com/stac-utils/pystac-api-client/blob/main/CHANGELOG.md)

* pystac compat

* Handle return type

* use is

* Remove no_modifier

* fixup

* Apply suggestions from code review

Co-authored-by: Jon Duckworth <duckontheweb@gmail.com>

* fixup

Co-authored-by: Jon Duckworth <duckontheweb@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants