Commit 3fe2670

Collection search (#735)
* create BaseSearch class
* add collection search functionality
* update tests
* replace Z with +00:00 in datetime strings for Python 3.10
* move ItemSearch back to search.py
* reject extra args if client-side filter
* fix matched method
* fix error for collection search support check
* moar tests!
* add warning expectation to test
* add collection search functionality to cli
* add collection search examples to quickstart
* quote search tokens with special characters
* add collection search example to intro notebook
* add CollectionSearch entry to api docs
* clean up client docs a bit
* add collection_list_as_dict method
* add CollectionSearch section to usage docs
* fix lint error
* reinstate item_search.py
* clean up warnings, tests
* address review comments
* improve matched logic
* actually clean up matched logic
* update changelog
1 parent 44aa3a5 commit 3fe2670

34 files changed

+25741
-595
lines changed

.gitignore

+1
Original file line numberDiff line numberDiff line change
@@ -620,3 +620,4 @@ $RECYCLE.BIN/
620620
# Windows shortcuts
621621
*.lnk
622622

623+
uv.lock

CHANGELOG.md

+4
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
77

88
## [Unreleased]
99

10+
### Added
11+
12+
- Support for collection search via `CollectionSearch` class and associated client methods [#735](https://github.com/stac-utils/pystac-client/pull/735)
13+
1014
### Removed
1115

1216
- Python 3.9 support [#724](https://github.com/stac-utils/pystac-client/pull/724)

docs/api.rst

+11
@@ -29,6 +29,16 @@ endpoint, if supported.
2929
:members:
3030
:undoc-members:
3131

32+
Collection Search
33+
-----------------
34+
35+
The `CollectionSearch` class represents a search of collections in a STAC API.
36+
37+
.. autoclass:: pystac_client.CollectionSearch
38+
:members:
39+
:undoc-members:
40+
:member-order: bysource
41+
3242
Item Search
3343
-----------
3444

@@ -39,6 +49,7 @@ The `ItemSearch` class represents a search of a STAC API.
3949
:undoc-members:
4050
:member-order: bysource
4151

52+
4253
STAC API IO
4354
-----------
4455

docs/quickstart.rst

+52-2
@@ -7,7 +7,8 @@ Python library.
77
CLI
88
~~~
99

10-
Use the CLI to quickly make searches and output or save the results.
10+
Use the CLI to quickly make item- or collection-level searches and
11+
output or save the results.
1112

1213
The ``--matched`` switch performs a search with limit=1 so does not get
1314
any Items, but gets the total number of matches which will be output to
@@ -18,6 +19,15 @@ the screen (if supported by the STAC API).
1819
$ stac-client search https://earth-search.aws.element84.com/v1 -c sentinel-2-l2a --bbox -72.5 40.5 -72 41 --matched
1920
3141 items matched
2021
22+
The ``--matched`` flag can also be used for collection search to get
23+
the total number of collections that match your search terms.
24+
25+
26+
.. code-block:: console
27+
28+
$ stac-client collections https://emc.spacebel.be --q sentinel-2 --matched
29+
76 collections matched
30+
2131
If the same URL is to be used over and over, define an environment
2232
variable to be used in the CLI call:
2333

@@ -87,6 +97,26 @@ than once to use additional operators.
8797
$ stac-client search ${STAC_API_URL} -c sentinel-2-l2a --bbox -72.5 40.5 -72 41 --datetime 2020-01-01/2020-01-31 --query "eo:cloud_cover<10" "eo:cloud_cover>5" --matched
8898
4 items matched
8999
100+
101+
Collection searches can also combine multiple filters. This example
102+
searches for collections that include the term ``"biomass"`` and have
103+
a spatial extent that intersects Scandinavia.
104+
105+
.. code-block:: console
106+
107+
$ stac-client collections https://emc.spacebel.be --q biomass --bbox 0.09 54.72 33.31 71.36 --matched
108+
43 collections matched
109+
110+
Since most STAC APIs have not yet implemented the `collection search
111+
extension <https://github.com/stac-api-extensions/collection-search>`_,
112+
``pystac-client`` will perform a limited client-side
113+
filter on the full list of collections using only the ``bbox``,
114+
``datetime``, and ``q`` (free-text search) parameters.
115+
In the case that the STAC API does not support collection search, a
116+
warning will be displayed to inform you that the filter is being
117+
applied client-side.
118+
119+
90120
Python
91121
~~~~~~
92122

@@ -99,7 +129,7 @@ specific STAC API (use the root URL):
99129
100130
client = Client.open("https://earth-search.aws.element84.com/v1")
101131
102-
Create a search:
132+
Create an item-level search:
103133

104134
.. code-block:: python
105135
@@ -125,3 +155,23 @@ The ``ItemCollection`` can then be saved as a GeoJSON FeatureCollection.
125155
126156
item_collection = search.item_collection()
127157
item_collection.save_object('my_itemcollection.json')
158+
159+
160+
Create a collection-level search:
161+
162+
.. code-block:: python
163+
164+
collection_search = client.collection_search(
165+
q='"sentinel-2" OR "sentinel-1"',
166+
)
167+
print(f"{collection_search.matched()} collections found")
168+
169+
170+
The ``collections()`` iterator method can be used to iterate through all
171+
resulting collections.
172+
173+
.. code-block:: python
174+
175+
for collection in collection_search.collections():
176+
print(collection.id)
177+
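
The quickstart text above says that, when the API lacks the collection search extension, the filter is applied client-side using only ``bbox``, ``datetime``, and ``q``. The following is a minimal sketch of what such a filter could look like. It is not the ``pystac-client`` implementation. The function names (``filter_collections``, ``bbox_intersects``, ``interval_intersects``) and the sample collections are hypothetical, ``q`` is treated as a plain case-insensitive substring rather than a full free-text expression, bounding boxes are assumed to be 2D, and all timestamps are assumed to carry a timezone.

```python
from datetime import datetime


def _parse(ts):
    """Parse an RFC 3339 timestamp; None, "" or ".." means open-ended."""
    if ts in (None, "", ".."):
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def bbox_intersects(a, b):
    """True if two 2D [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interval_intersects(a, b):
    """True if two (start, end) intervals overlap; None means unbounded."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end is not None and b_start is not None and a_end < b_start:
        return False
    if b_end is not None and a_start is not None and b_end < a_start:
        return False
    return True


def filter_collections(collections, q=None, bbox=None, datetime_range=None):
    """Return the collection dicts that pass every filter that was supplied."""
    query_interval = None
    if datetime_range is not None:
        parts = datetime_range.split("/")
        if len(parts) == 1:  # a single instant is a zero-length interval
            parts = parts * 2
        query_interval = (_parse(parts[0]), _parse(parts[1]))

    matches = []
    for c in collections:
        extent = c.get("extent", {})
        if q is not None:
            text = " ".join(
                [c.get("title") or "", c.get("description") or ""]
                + list(c.get("keywords") or [])
            ).lower()
            if q.lower() not in text:
                continue
        if bbox is not None:
            boxes = extent.get("spatial", {}).get("bbox", [])
            if not any(bbox_intersects(bbox, b) for b in boxes):
                continue
        if query_interval is not None:
            intervals = extent.get("temporal", {}).get("interval", [])
            if not any(
                interval_intersects(query_interval, (_parse(s), _parse(e)))
                for s, e in intervals
            ):
                continue
        matches.append(c)
    return matches


# Hypothetical collection metadata, used only to exercise the sketch.
SAMPLE_COLLECTIONS = [
    {
        "id": "biomass-nordic",
        "description": "Forest biomass estimates",
        "extent": {
            "spatial": {"bbox": [[4.0, 55.0, 31.0, 71.0]]},
            "temporal": {"interval": [["2020-01-01T00:00:00Z", None]]},
        },
    },
    {
        "id": "biomass-amazon",
        "description": "Biomass in the Amazon basin",
        "extent": {
            "spatial": {"bbox": [[-75.0, -15.0, -45.0, 5.0]]},
            "temporal": {
                "interval": [["2015-01-01T00:00:00Z", "2018-12-31T23:59:59Z"]]
            },
        },
    },
    {
        "id": "sst-global",
        "description": "Sea surface temperature",
        "extent": {
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
            "temporal": {"interval": [["2000-01-01T00:00:00Z", None]]},
        },
    },
]

scandinavia = [0.09, 54.72, 33.31, 71.36]
hits = filter_collections(SAMPLE_COLLECTIONS, q="biomass", bbox=scandinavia)
print([c["id"] for c in hits])
```

Because every collection has to be downloaded before it can be tested, a filter of this kind costs one full listing of the collections endpoint regardless of how selective the search terms are.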

docs/tutorials/pystac-client-introduction.ipynb

+47-3
@@ -40,7 +40,9 @@
4040
"cell_type": "code",
4141
"execution_count": null,
4242
"id": "98942e75",
43-
"metadata": {},
43+
"metadata": {
44+
"scrolled": true
45+
},
4446
"outputs": [],
4547
"source": [
4648
"# STAC API root URL\n",
@@ -74,6 +76,48 @@
7476
" print(collection)"
7577
]
7678
},
79+
{
80+
"cell_type": "markdown",
81+
"id": "ebab2724-cab3-4fba-b25b-fdfb4e537014",
82+
"metadata": {},
83+
"source": [
84+
"# Collection Search\n",
85+
"\n",
86+
"Sometimes, it can be challenging to identify which collection you want to work with. The `collection_search` method allows you to discover collections by applying search filters that will help you find the specific collection(s) you need. Since many STAC APIs have not implemented the collection search extension, `pystac-client` will perform a limited client-side filter if the API does not conform to the collection search spec."
87+
]
88+
},
89+
{
90+
"cell_type": "code",
91+
"execution_count": null,
92+
"id": "a23a53ec-5b5f-421d-9f0e-01dbde8c3697",
93+
"metadata": {},
94+
"outputs": [],
95+
"source": [
96+
"collection_search = cat.collection_search(\n",
97+
" q=\"ASTER\",\n",
98+
")"
99+
]
100+
},
101+
{
102+
"cell_type": "markdown",
103+
"id": "90b3d014-9c8f-4c5b-a94e-bfb7f17380ad",
104+
"metadata": {},
105+
"source": [
106+
"The `collections` method lets you iterate through the results of the search so you can inspect the details of matching collections."
107+
]
108+
},
109+
{
110+
"cell_type": "code",
111+
"execution_count": null,
112+
"id": "006f13fd-5e58-4f3f-bd5a-707cd830caa1",
113+
"metadata": {},
114+
"outputs": [],
115+
"source": [
116+
"for result in collection_search.collections():\n",
117+
" print(result.id, f\"{result.description}\", sep=\"\\n\")\n",
118+
" print(\"\\n\")"
119+
]
120+
},
77121
{
78122
"cell_type": "code",
79123
"execution_count": null,
@@ -233,7 +277,7 @@
233277
"hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea"
234278
},
235279
"kernelspec": {
236-
"display_name": "Python 3.9.11 ('.venv': venv)",
280+
"display_name": "Python 3 (ipykernel)",
237281
"language": "python",
238282
"name": "python3"
239283
},
@@ -247,7 +291,7 @@
247291
"name": "python",
248292
"nbconvert_exporter": "python",
249293
"pygments_lexer": "ipython3",
250-
"version": "3.9.11"
294+
"version": "3.12.3"
251295
}
252296
},
253297
"nbformat": 4,

docs/usage.rst

+106-8
@@ -229,10 +229,10 @@ creating your :class:`Client<pystac_client.Client>`.
229229
CollectionClient
230230
++++++++++++++++
231231

232-
STAC APIs may optionally implement a ``/collections`` endpoint as describe in the
232+
STAC APIs may optionally implement a ``/collections`` endpoint as described in the
233233
`STAC API - Collections spec
234-
<https://github.com/radiantearth/stac-api-spec/tree/master/collections>`__. This endpoint
235-
allows clients to search or inspect items within a particular collection.
234+
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/ogcapi-features#stac-api---collections>`__.
235+
This endpoint allows clients to search or inspect items within a particular collection.
236236

237237
.. code-block:: python
238238
@@ -245,7 +245,7 @@ allows clients to search or inspect items within a particular collection.
245245
PySTAC will get items by iterating through all children until it gets to an ``item`` link.
246246
PySTAC client will use the API endpoint instead: `/collections/<collection_id>/items`
247247
(as long as `STAC API - Item Search spec
248-
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__ is supported).
248+
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search>`__ is supported).
249249

250250
.. code-block:: python
251251
@@ -254,15 +254,113 @@ PySTAC client will use the API endpoint instead: `/collections/<collection_id>/i
254254
Note that calling list on this iterator will take a really long time since it will be retrieving
255255
every item for the whole ``"sentinel-2-l2a"`` collection.
256256

257+
CollectionSearch
258+
++++++++++++++++
259+
260+
STAC API services may optionally implement a ``/collections`` endpoint as described in the
261+
`STAC API - Collections spec
262+
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/ogcapi-features#stac-api---collections>`__.
263+
The ``/collections`` endpoint can be extended with the
264+
`STAC API - Collection Search Extension <https://github.com/stac-api-extensions/collection-search>`__
265+
which adds the capability to apply filter parameters to the collection-level metadata.
266+
See the `Query Parameters and Fields
267+
<https://github.com/stac-api-extensions/collection-search?tab=readme-ov-file#query-parameters-and-fields>`__
268+
from that spec for details on the meaning of each parameter.
269+
270+
The :meth:`pystac_client.Client.collection_search` method provides an interface for making
271+
requests to a service's "collections" endpoint. This method returns a
272+
:class:`pystac_client.CollectionSearch` instance.
273+
274+
.. code-block:: python
275+
276+
>>> from pystac_client import Client
277+
>>> catalog = Client.open('https://planetarycomputer.microsoft.com/api/stac/v1')
278+
>>> results = catalog.collection_search(
279+
... q="biomass",
280+
... datetime="2022/.."
281+
... )
282+
283+
Instances of :class:`~pystac_client.CollectionSearch` have a handful of methods for
284+
getting matching collections as Python objects. The right method to use depends on
285+
how many of the matches you want to consume (a single collection at a time, a
286+
page at a time, or everything) and whether you want plain Python dictionaries
287+
representing the collections, or :class:`pystac.Collection` objects.
288+
289+
The following table shows the :class:`~pystac_client.CollectionSearch` methods for fetching
290+
matches, according to which set of matches to return and whether to return them as
291+
``pystac`` objects or plain dictionaries.
292+
293+
====================== ======================================================= ===============================================================
294+
Matches to return PySTAC objects Plain dictionaries
295+
====================== ======================================================= ===============================================================
296+
**Single collections** :meth:`~pystac_client.CollectionSearch.collections` :meth:`~pystac_client.CollectionSearch.collections_as_dicts`
297+
**Pages** :meth:`~pystac_client.CollectionSearch.pages` :meth:`~pystac_client.CollectionSearch.pages_as_dicts`
298+
**Everything** :meth:`~pystac_client.CollectionSearch.collection_list` :meth:`~pystac_client.CollectionSearch.collection_list_as_dict`
299+
====================== ======================================================= ===============================================================
300+
301+
Additionally, the ``matched`` method can be used to access result metadata about
302+
how many total collections matched the query:
303+
304+
* :meth:`CollectionSearch.matched <pystac_client.CollectionSearch.matched>`: returns the number
305+
of hits (collections) for this search. If the API supports the STAC API Context Extension this
306+
value will be returned directly from a search result with ``limit=1``. Otherwise ``pystac-client``
307+
will count the results and return a value with an associated warning.
308+
309+
.. code-block:: python
310+
311+
>>> for collection in results.collections():
312+
... print(collection.id)
313+
fia
314+
modis-13Q1-061
315+
modis-13A1-061
316+
sentinel-3-olci-lfr-l2-netcdf
317+
318+
The :meth:`~pystac_client.CollectionSearch.collections` and related methods handle retrieval of
319+
successive pages of results
320+
by finding links with a ``"rel"`` type of ``"next"`` and parsing them to construct the
321+
next request. The default
322+
implementation of this ``"next"`` link parsing assumes that the link follows the spec for
323+
an extended STAC link as
324+
described in the
325+
`STAC API - Collections: Collection Paging <https://github.com/radiantearth/stac-api-spec/blob/main/ogcapi-features/README.md#collection-pagination>`__
326+
section.
327+
328+
Alternatively, the collections can be retrieved one page at a time, where each
329+
page is a list of the collections returned for that request:
330+
331+
.. code-block:: python
332+
333+
>>> for page in results.pages():
334+
... for collection in page:
335+
... print(collection.id)
336+
fia
337+
modis-13Q1-061
338+
modis-13A1-061
339+
sentinel-3-olci-lfr-l2-netcdf
340+
341+
If you do not need the :class:`pystac.Collection` instances, you can instead use
342+
:meth:`CollectionSearch.collections_as_dicts <pystac_client.CollectionSearch.collections_as_dicts>`
343+
to retrieve dictionary representations of the collections, without incurring the cost of
344+
creating the Collection objects.
345+
346+
.. code-block:: python
347+
348+
>>> for collection_dict in results.collections_as_dicts():
349+
... print(collection_dict["id"])
350+
fia
351+
modis-13Q1-061
352+
modis-13A1-061
353+
sentinel-3-olci-lfr-l2-netcdf
354+
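
The CollectionSearch section above describes two behaviours that are easier to see in code: following ``"next"`` links to fetch successive pages, and falling back to counting results when the API does not report a total. The following is a minimal sketch of both against an in-memory stand-in for the API. It is not the ``pystac-client`` implementation. ``iter_pages``, ``count_matched``, ``PAGES``, and ``fetch`` are hypothetical names, the sketch only handles ``"next"`` links that carry a plain ``href`` (not the extended link form with a method or body), and reading the total from a ``numberMatched`` field is an assumption about the response shape.

```python
import warnings


def iter_pages(first_url, fetch):
    """Yield each page dict, following rel="next" links until none remain."""
    url = first_url
    while url is not None:
        page = fetch(url)
        yield page
        url = next(
            (
                link["href"]
                for link in page.get("links", [])
                if link.get("rel") == "next"
            ),
            None,
        )


def count_matched(first_url, fetch):
    """Return the reported total if present, else count results with a warning."""
    first = fetch(first_url)
    if "numberMatched" in first:  # assumed field name for the reported total
        return first["numberMatched"]
    warnings.warn(
        "total not reported by the API; counting collections client-side",
        UserWarning,
    )
    return sum(
        len(page.get("collections", [])) for page in iter_pages(first_url, fetch)
    )


# Hypothetical two-page response, keyed by a fake URL.
PAGES = {
    "page1": {
        "collections": [{"id": "fia"}, {"id": "modis-13Q1-061"}],
        "links": [{"rel": "next", "href": "page2"}],
    },
    "page2": {
        "collections": [
            {"id": "modis-13A1-061"},
            {"id": "sentinel-3-olci-lfr-l2-netcdf"},
        ],
        "links": [],
    },
}
fetch = PAGES.__getitem__

ids = [c["id"] for page in iter_pages("page1", fetch) for c in page["collections"]]
print(ids)
```

Passing ``fetch`` in as a parameter keeps the paging logic independent of the HTTP layer, so the same loop can be exercised against canned responses like the ones here.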
257355
ItemSearch
258356
++++++++++
259357

260358
STAC API services may optionally implement a ``/search`` endpoint as described in the
261359
`STAC API - Item Search spec
262-
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__. This
360+
<https://github.com/radiantearth/stac-api-spec/tree/main/item-search>`__. This
263361
endpoint allows clients to query STAC Items across the entire service using a variety
264362
of filter parameters. See the `Query Parameter Table
265-
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search#query-parameter-table>`__
363+
<https://github.com/radiantearth/stac-api-spec/tree/main/item-search#query-parameter-table>`__
266364
from that spec for details on the meaning of each parameter.
267365

268366
The :meth:`pystac_client.Client.search` method provides an interface for making
@@ -280,10 +378,10 @@ requests to a service's "search" endpoint. This method returns a
280378
... )
281379
282380
Instances of :class:`~pystac_client.ItemSearch` have a handful of methods for
283-
getting matching items into Python objects. The right method to use depends on
381+
getting matching items as Python objects. The right method to use depends on
284382
how many of the matches you want to consume (a single item at a time, a
285383
page at a time, or everything) and whether you want plain Python dictionaries
286-
representing the items, or proper ``pystac`` objects.
384+
representing the items, or :class:`pystac.Item` objects.
287385

288386
The following table shows the :class:`~pystac_client.ItemSearch` methods for fetching
289387
matches, according to which set of matches to return and whether to return them as

pystac_client/__init__.py

+2
@@ -1,6 +1,7 @@
11
__all__ = [
22
"Client",
33
"CollectionClient",
4+
"CollectionSearch",
45
"ConformanceClasses",
56
"ItemSearch",
67
"Modifiable",
@@ -10,6 +11,7 @@
1011
from pystac_client._utils import Modifiable
1112
from pystac_client.client import Client
1213
from pystac_client.collection_client import CollectionClient
14+
from pystac_client.collection_search import CollectionSearch
1315
from pystac_client.conformance import ConformanceClasses
1416
from pystac_client.item_search import ItemSearch
1517
from pystac_client.version import __version__
