GoScraper is a fast, multithreaded web scraper built in Go using the Colly library. It reads URLs from a links.txt
file, scrapes the HTML content, saves it locally, and serves the scraped files via an HTTP server.
- Multithreaded Scraping - Concurrently scrapes multiple websites for efficiency.
- Colly Web Scraping - Uses Go Colly, a web scraping library (see the sketch below).
- Automatic URL Handling - Reads target websites from `links.txt`.
- Custom User-Agent - Mimics real browsers to avoid bot detection.
- File-Based Storage - Saves scraped HTML files in `scrapedHTML/`.
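Taken together, these features map onto a handful of Colly calls. Below is a minimal, self-contained sketch; the User-Agent string and parallelism limit are illustrative assumptions, not the project's actual settings.

```go
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	// Async collector with a browser-like User-Agent (both values are illustrative).
	c := colly.NewCollector(
		colly.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
		colly.Async(true),
	)

	// Bound the number of concurrent requests across all domains.
	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 4})

	// Report each page as it is fetched.
	c.OnResponse(func(r *colly.Response) {
		fmt.Println("fetched", r.Request.URL, "-", len(r.Body), "bytes")
	})

	c.Visit("https://example.com")
	c.Wait() // block until all async visits finish
}
```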
- Go (Golang)
- Colly (Web Scraping)
- HTTP Server (for serving scraped files)
```bash
git clone https://github.com/yourusername/go-scraper.git
cd go-scraper
```
Ensure you have Go installed. Then, install Colly:

```bash
go get -u github.com/gocolly/colly
```

Build the scraper:

```bash
go build -o scraper scraper.go
```
Add website URLs (one per line) in `links.txt`. Example:

```
https://example.com
https://golang.org
https://github.com
```
Run the scraper:

```bash
go run scraper.go
```
- Scrapes all websites listed in `links.txt`.
- Saves HTML files in `scrapedHTML/`.
- Starts a web server at http://localhost:8080/ to serve scraped content (see the sketch below).
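Serving the scraped files needs nothing beyond the standard library. A rough sketch of that last step, assuming the files sit in `scrapedHTML/` (the actual scraper.go may wire this up differently):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Expose everything under scrapedHTML/ at http://localhost:8080/
	fs := http.FileServer(http.Dir("scrapedHTML"))
	log.Println("Serving scraped files at http://localhost:8080/")
	log.Fatal(http.ListenAndServe(":8080", fs))
}
```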
- Reads URLs from `links.txt`
- Scrapes each website using Colly
- Saves the HTML content locally (sketched below)
- Hosts the scraped files via an HTTP server
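A condensed sketch of the read-scrape-save flow; the output filename scheme (one file per host) is an assumption, and the serving step is shown in the sketch above.

```go
package main

import (
	"bufio"
	"log"
	"os"
	"path/filepath"

	"github.com/gocolly/colly"
)

func main() {
	// links.txt holds one target URL per line.
	f, err := os.Open("links.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	os.MkdirAll("scrapedHTML", 0o755)

	c := colly.NewCollector(colly.Async(true))

	// Save each response body as scrapedHTML/<host>.html (naming is an assumption).
	c.OnResponse(func(r *colly.Response) {
		name := filepath.Join("scrapedHTML", r.Request.URL.Hostname()+".html")
		if err := os.WriteFile(name, r.Body, 0o644); err != nil {
			log.Println("save failed:", err)
		}
	})

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if url := scanner.Text(); url != "" {
			c.Visit(url)
		}
	}
	c.Wait() // wait for all async visits to finish
}
```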
This project is open-source under the MIT License.