Extract Map Data from Website via /extract Endpoint #1135

Closed
muhammad-ai-42 opened this issue Feb 5, 2025 · 1 comment

@muhammad-ai-42

Problem Description
Currently, Firecrawl does not provide a direct method to extract geospatial or map-related data from websites. This limits the ability to analyze location-based information embedded in web pages, such as business listings, embedded maps, and geotagged data. Adding support for extracting map data from sites that include interactive maps (e.g., Google Maps, OpenStreetMap, Leaflet, Mapbox) would enhance Firecrawl’s usability for location-based intelligence tasks.

Proposed Feature
Enhance the /extract endpoint to detect and extract structured geospatial data from interactive maps embedded in web pages. This could involve parsing JavaScript objects, API calls, and embedded JSON containing coordinates, place names, and metadata.
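
As a rough illustration of the "embedded JSON" part of this proposal, here is a minimal Python sketch that pulls coordinates out of schema.org JSON-LD blocks in a page's HTML. It is not Firecrawl code: the names `JsonLdCollector`, `find_geo`, and `extract_places` are hypothetical, and it assumes the page exposes locations as `geo` objects with `latitude` and `longitude` keys. Map data that is loaded by JavaScript at runtime would not be caught by this approach.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_geo(node, found):
    """Walk parsed JSON recursively and collect objects that carry a geo entry.

    Assumption: locations follow the schema.org shape
    {"name": ..., "geo": {"latitude": ..., "longitude": ...}}.
    """
    if isinstance(node, dict):
        geo = node.get("geo")
        if isinstance(geo, dict) and "latitude" in geo and "longitude" in geo:
            found.append({
                "name": node.get("name"),
                "lat": float(geo["latitude"]),
                "lon": float(geo["longitude"]),
            })
        for value in node.values():
            find_geo(value, found)
    elif isinstance(node, list):
        for item in node:
            find_geo(item, found)


def extract_places(html):
    """Return a list of {"name", "lat", "lon"} dicts found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    places = []
    for block in collector.blocks:
        try:
            find_geo(json.loads(block), places)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
    return places
```

Calling `extract_places(page_html)` on a business listing page would return one entry per JSON-LD object that has coordinates, and an empty list when the page has none.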

Alternatives Considered
• Manual scraping of JavaScript-rendered maps: This approach requires additional tools like Puppeteer or Selenium, making it more complex and resource-intensive.
• Using existing APIs (Google Maps, OSM, etc.): While possible, these APIs often require API keys and impose rate limits, making direct extraction from web pages a more flexible alternative.
• Parsing raw HTML: Most map data is loaded dynamically via JavaScript, so a pure HTML-based approach is insufficient.

Use Case
• Business Intelligence: Extracting location data from online directories for competitive analysis.
• Geospatial Research: Aggregating geographic data from multiple sources to analyze urban development trends.
• Real Estate & Local SEO: Identifying business locations and mapping customer reviews for market research.

Additional Context
• Google Maps, OpenStreetMap, and other mapping platforms embed location data within JSON structures inside web pages.
• Some related tools, such as Playwright (or Scrapy paired with a rendering plugin), can execute JavaScript but lack built-in support for structured geospatial extraction.
• Enhancing Firecrawl to support this would make it more powerful for location-based web scraping without requiring external geocoding services.

@mogery
Member

mogery commented Feb 20, 2025

Hi! This is currently not planned. If you want to get geospatial data, I recommend you download OpenStreetMap planet files and parse them directly.
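
For anyone taking that route, a minimal sketch of reading named points from OSM XML with only the Python standard library might look like the following. The function name `iter_named_nodes` is hypothetical, and the sketch assumes the standard OSM XML layout in which `<node>` elements carry `lat` and `lon` attributes and `<tag k="..." v="..."/>` children. Planet files are very large, so the parse is streamed rather than loaded into memory; a dedicated PBF reader would likely be a better fit for the full planet.

```python
import xml.etree.ElementTree as ET


def iter_named_nodes(source):
    """Yield (name, lat, lon) for every OSM node that has a name tag.

    `source` is a filename or a binary file object containing OSM XML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "node":
            continue
        lat, lon = elem.get("lat"), elem.get("lon")
        # Collect the node's key/value tags, e.g. name, amenity, shop.
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        if lat is not None and lon is not None and "name" in tags:
            yield tags["name"], float(lat), float(lon)
        elem.clear()  # drop the node's children once it has been processed
```

A bzip2-compressed XML file can be passed as `bz2.open(path, "rb")`, since `iterparse` accepts any binary file object.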

@mogery closed this as not planned on Feb 20, 2025