GoScraper - Multithreaded Web Scraper with File Server

Overview

GoScraper is a concurrent web scraper written in Go with the Colly library. It reads URLs from links.txt, fetches the HTML of each page, saves it under scrapedHTML/, and serves the saved files over HTTP at http://localhost:8080/.

Features

Multithreaded Scraping - Scrapes multiple websites concurrently instead of one at a time.
Colly Web Scraping - Uses Colly (github.com/gocolly/colly), a web scraping library for Go.
Automatic URL Handling - Reads target websites from links.txt.
Custom User-Agent - Sends a browser-like User-Agent header to reduce the chance of being flagged as a bot.
File-Based Storage - Saves scraped HTML files in scrapedHTML/.
Built-in File Server - Serves the scraped files at http://localhost:8080/.
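
To illustrate the Custom User-Agent feature, here is a minimal sketch that uses only the standard library instead of Colly. The helper newBrowserRequest and the browserUA string are hypothetical, not taken from scraper.go. With Colly, the same effect is usually achieved by passing the colly.UserAgent option to colly.NewCollector.

```go
package main

import (
	"fmt"
	"net/http"
)

// browserUA is an illustrative value, not the string the project actually sends.
const browserUA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"

// newBrowserRequest builds a GET request that carries a browser-like User-Agent.
func newBrowserRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", browserUA)
	return req, nil
}

func main() {
	req, err := newBrowserRequest("https://example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the header that would be sent with the request.
	fmt.Println(req.Header.Get("User-Agent"))
}
```

The request is only constructed here, not sent, so the sketch runs without network access.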

Technologies Used

Go (Golang)
Colly (Web Scraping)
HTTP Server (for serving scraped files)

Installation

Clone the Repository

git clone https://github.com/yourusername/go-scraper.git
cd go-scraper

Install Dependencies

Ensure you have Go installed. Then, install Colly:

go get -u github.com/gocolly/colly

Build the Scraper

go build -o scraper scraper.go

Prepare `links.txt`

Add the website URLs to links.txt, one per line. Example:

https://example.com
https://golang.org
https://github.com
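
A file in this format could be read roughly as follows. This is a sketch, not the code in scraper.go: the helper name parseLinks is hypothetical, and skipping blank lines and trimming whitespace are assumptions about how the file is handled.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseLinks splits the file contents into URLs, one per line,
// trimming surrounding whitespace and skipping blank lines.
func parseLinks(contents string) []string {
	var urls []string
	for _, line := range strings.Split(contents, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		urls = append(urls, line)
	}
	return urls
}

func main() {
	data, err := os.ReadFile("links.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read links.txt:", err)
		os.Exit(1)
	}
	for _, u := range parseLinks(string(data)) {
		fmt.Println(u)
	}
}
```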

Usage

Run the Scraper

go run scraper.go

This command (or ./scraper, if you built the binary in the previous step):

Scrapes all websites listed in links.txt.
Saves the HTML files in scrapedHTML/.
Starts a web server at http://localhost:8080/ that serves the scraped content.
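
The serving step can be sketched with the standard library's file server. The directory and port come from this README. The helper newFileHandler is a hypothetical name, and the actual server setup in scraper.go may differ.

```go
package main

import (
	"log"
	"net/http"
)

// newFileHandler returns a handler that serves the files under dir.
func newFileHandler(dir string) http.Handler {
	return http.FileServer(http.Dir(dir))
}

func main() {
	// Serve everything in scrapedHTML/ at the root path.
	http.Handle("/", newFileHandler("scrapedHTML"))
	log.Println("serving scrapedHTML/ at http://localhost:8080/")
	// ListenAndServe blocks until the server fails or the process is stopped.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this sketch running, opening http://localhost:8080/ in a browser lists the saved files.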

How It Works

Reads URLs from links.txt
Scrapes each website using Colly
Saves the HTML content locally
Hosts the scraped files via an HTTP server
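
The scrape-and-save steps above can be sketched as follows. This version uses goroutines and net/http rather than Colly, so it shows the general technique, not the project's code. The names fileNameFor, fetch, and scrapeAll are hypothetical, and deriving the file name from the URL's host is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"path/filepath"
	"strings"
	"sync"
)

// fileNameFor derives a local file name from the URL's host, so that
// "https://example.com" becomes "example.com.html" (assumed naming scheme).
func fileNameFor(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host in %q", rawURL)
	}
	return strings.ReplaceAll(u.Host, ":", "_") + ".html", nil
}

// fetch downloads the body of a page with a plain HTTP GET.
func fetch(rawURL string) ([]byte, error) {
	resp, err := http.Get(rawURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// scrapeAll fetches every URL in its own goroutine and writes each body
// into outDir. It returns one error slot per URL (nil on success).
func scrapeAll(urls []string, outDir string, get func(string) ([]byte, error)) []error {
	errs := make([]error, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			name, err := fileNameFor(u)
			if err != nil {
				errs[i] = err
				return
			}
			body, err := get(u)
			if err != nil {
				errs[i] = err
				return
			}
			// Each goroutine writes only to its own slot, so no lock is needed.
			errs[i] = os.WriteFile(filepath.Join(outDir, name), body, 0o644)
		}(i, u)
	}
	wg.Wait()
	return errs
}

func main() {
	urls := []string{"https://example.com", "https://golang.org"}
	if err := os.MkdirAll("scrapedHTML", 0o755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i, err := range scrapeAll(urls, "scrapedHTML", fetch) {
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", urls[i], err)
		}
	}
}
```

Because the file name depends only on the host, two URLs on the same site would overwrite each other in this sketch. A real scraper would also want to cap the number of simultaneous requests.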

License

This project is open-source under the MIT License.

vaibhqvv/go-scraper
