A command-line tool for traversing IIIF collections and extracting manifest URLs. This tool helps you explore and collect IIIF manifest URLs from collections, with support for nested collections and paginated results.
- Recursively Traverses IIIF Collections: Finds all manifest URLs within a collection, including those in nested collections.
- Supports Multiple IIIF Presentation API Versions: Compatible with both IIIF Presentation API 2.0 and 3.0.
- Multiple Output Formats: Choose between
json
,jsonl
(JSON Lines), and formatted tables. - Download Full Manifest JSONs: Save the complete JSON content of each manifest, named by their IDs.
- Save Results to File or Display in Terminal: Flexible output options to suit your workflow.
- Debug Mode for Detailed Logging: Provides comprehensive logs for troubleshooting and monitoring.
- Robust Error Handling with Automatic Retries: Ensures reliable data fetching even in the face of transient network issues.
- Support for Paginated Collections: Handles collections that span multiple pages seamlessly.
Requires Python 3.10 or higher.
pip install loam-iiif
The basic command structure is:
loamiiif [OPTIONS] URL
-o, --output PATH
: If used with--download-manifests
, specifies directory to save manifest JSON files. Otherwise saves manifest URLs list to a file (JSON or plain text format)-f, --format [json|jsonl|table]
: Output format (default: json)-d, --download-manifests
: Download full JSON contents of each manifest--cache-dir, -c PATH
: Directory to cache manifest JSON files (defaults to system temp directory)--skip-cache
: Skip reading from cache but still write to it--no-cache
: Disable manifest caching completely--debug
: Enable debug mode with detailed logs--help
: Show help message--max-manifests, -m INTEGER
: Maximum number of manifests to retrieve
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections/c69bb1ed-accb-4cfb-b60e-495b9911690f?as=iiif"
Output as a formatted table:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --format table
Save manifest URLs to different formats:
# JSON output
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --output manifests.json
# JSON Lines (jsonl) output
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --format jsonl --output manifests.jsonl
Download manifest contents to a directory:
# Downloads full manifest JSON files to ./manifest_downloads/ directory
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --download-manifests --output ./manifest_downloads
Download manifests and save JSON output:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --format json --output manifests.json --download-manifests
Limit the number of manifests:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --max-manifests=42
Enable debug logging:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --debug
Use a custom cache directory:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --cache-dir ./my-cache-dir
Skip reading from cache but still write to it:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --skip-cache
Disable caching completely:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --no-cache
Example debug output (truncated):
[2025-01-17 14:14:48] DEBUG Starting traversal of IIIF collection: https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
INFO Processing collection: https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
DEBUG Fetching URL: https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
DEBUG Successfully fetched data from https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
DEBUG Found nested collection: https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif
INFO Processing collection: https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif
DEBUG Added manifest: https://api.dc.library.northwestern.edu/api/v2/works/e40479c4-06cb-48be-9d6b-adf47f238852?as=iiif
DEBUG Added manifest: https://api.dc.library.northwestern.edu/api/v2/works/f4720687-61b6-4dcd-aed0-b70eff985583?as=iiif
# ... more manifests and collections ...
The tool implements manifest caching to improve performance and reduce load on IIIF servers:
- By default, manifests are cached in your system's temporary directory (
/tmp
on Unix-like systems) - Use
--cache-dir
to specify a custom cache location --skip-cache
will ignore existing cache but still write new cache entries (useful for refreshing stale data)--no-cache
completely disables caching (not recommended for large collections)
Cached manifests are stored as JSON files named using a sanitized version of their URLs. The cache is particularly useful when:
- Working with large collections that you'll need to process multiple times
- Using the
--download-manifests
option to save full manifest contents - Running the tool repeatedly during development or testing
The JSON output includes both manifests and collections:
{
"manifests": [
"https://api.dc.library.northwestern.edu/api/v2/works/9d87853e-3955-4912-906f-6ddf0e2e3825?as=iiif",
"..."
],
"collections": []
}
Each line contains a single manifest or collection URL:
{"manifest": "https://api.dc.library.northwestern.edu/api/v2/works/9d87853e-3955-4912-906f-6ddf0e2e3825?as=iiif"}
{"manifest": "..."}
{"collection": "https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif"}
The table format provides a readable view of manifests and collections with indexed entries.
- Python 3.10+
- click>=8.1.8
- requests>=2.32.3
- rich>=13.9.4
- Clone the repository:
git clone https://github.com/nulib-labs/loam-iiif.git
cd loam-iiif
- Create and activate a virtual environment with
uv
:
uv venv --python 3.10
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
- Install dependencies:
uv sync
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.