Raven is a powerful and customizable web crawler written in Go. It allows you to extract internal and external links from a given website with options for concurrent crawling, depth customization, and maximum URL limits.
- Concurrent crawling to maximize efficiency.
- Customizable depth and maximum URL limits to tailor the crawling process to your needs.
- Extraction of both internal and external links for comprehensive analysis.
- Colorful logging for easy debugging and tracking of crawling progress.
- Error handling for fetching URLs to ensure robustness.
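As a rough sketch of how the limits above interact, the following self-contained Go program crawls breadth-first, bounded by maxURLs and maxDepth, with a semaphore capping concurrency. It is an illustration of the technique, not Raven's actual source: the `crawl` function is an assumption for this sketch, and "fetching" a page is simulated by looking up its outgoing links in an in-memory map so the example needs no network.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// crawl walks a site breadth-first, stopping after maxDepth levels and
// maxURLs discovered pages, with at most `concurrency` simulated fetches
// in flight. Fetching is stood in for by a lookup in the `links` map.
func crawl(start string, links map[string][]string, maxURLs, maxDepth, concurrency int) []string {
	type job struct {
		url   string
		depth int
	}
	var (
		mu      sync.Mutex
		visited = map[string]bool{start: true}
		order   = []string{start}
	)
	sem := make(chan struct{}, concurrency) // caps concurrent "fetches"
	frontier := []job{{start, 0}}
	for len(frontier) > 0 {
		var (
			wg   sync.WaitGroup
			next []job
		)
		for _, j := range frontier {
			if j.depth >= maxDepth {
				continue
			}
			wg.Add(1)
			go func(j job) {
				defer wg.Done()
				sem <- struct{}{}   // acquire a concurrency slot
				out := links[j.url] // simulated fetch + link extraction
				<-sem
				mu.Lock()
				defer mu.Unlock()
				for _, u := range out {
					if len(order) >= maxURLs {
						return
					}
					if !visited[u] {
						visited[u] = true
						order = append(order, u)
						next = append(next, job{u, j.depth + 1})
					}
				}
			}(j)
		}
		wg.Wait()
		frontier = next
	}
	return order
}

func main() {
	site := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/c"},
		"/b": {"/c", "/d"},
	}
	found := crawl("/", site, 100, 3, 4)
	sort.Strings(found) // goroutine scheduling makes discovery order vary
	fmt.Println(found)  // prints [/ /a /b /c /d]
}
```

The visited set ensures each URL is counted once even when two pages link to it concurrently; the mutex protects both the set and the result slice.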
To install Raven, you have three options:

- Compiled version: download a prebuilt binary.

- Build from source:

  1. Clone the Raven repository:

     git clone https://github.com/Symbolexe/Raven.git

  2. Navigate to the project directory:

     cd Raven

  3. Build the project:

     go build

- Install with go get:

  go get github.com/Symbolexe/raven
Run Raven as follows:

./raven [options] <startURL>

Available options:
- -maxURLs: Maximum number of URLs to crawl (default: 100)
- -maxDepth: Maximum depth of crawling (default: 3)
- -concurrency: Number of concurrent requests (default: 10)
Example:

./raven -maxURLs 500 -maxDepth 5 -concurrency 20 https://example.com

This command crawls https://example.com with a limit of 500 URLs, a maximum depth of 5, and 20 concurrent requests.
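The option handling above can be sketched with Go's standard flag package. The flag names and defaults below match the options listed; `parseArgs` itself is a hypothetical helper written for this sketch, not code from Raven:

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs mirrors the command line above: three integer flags plus one
// required positional start URL.
func parseArgs(args []string) (maxURLs, maxDepth, concurrency int, startURL string, err error) {
	fs := flag.NewFlagSet("raven", flag.ContinueOnError)
	fs.IntVar(&maxURLs, "maxURLs", 100, "Maximum number of URLs to crawl")
	fs.IntVar(&maxDepth, "maxDepth", 3, "Maximum depth of crawling")
	fs.IntVar(&concurrency, "concurrency", 10, "Number of concurrent requests")
	if err = fs.Parse(args); err != nil {
		return
	}
	if fs.NArg() != 1 {
		err = fmt.Errorf("expected exactly one start URL, got %d arguments", fs.NArg())
		return
	}
	startURL = fs.Arg(0)
	return
}

func main() {
	// In a real binary the arguments would come from os.Args[1:].
	maxURLs, maxDepth, concurrency, start, err := parseArgs([]string{"-maxDepth", "5", "https://example.com"})
	if err != nil {
		panic(err)
	}
	fmt.Println(maxURLs, maxDepth, concurrency, start) // prints 100 5 10 https://example.com
}
```

Flags left unset fall back to the defaults shown in the options list, which is why only maxDepth changes in the example run.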
Raven depends on the following external package:

- golang.org/x/net/html: used for HTML parsing.

You can install the dependencies with:

go mod tidy
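Splitting links into internal and external, as the feature list describes, comes down to resolving each href against the page URL and comparing hosts. A minimal standard-library sketch; `classify` is a hypothetical helper illustrating the idea, not Raven's code:

```go
package main

import (
	"fmt"
	"net/url"
)

// classify resolves an href found on page `base` and reports whether the
// result stays on the same host (internal) or points elsewhere (external).
func classify(base, href string) (resolved string, internal bool, err error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", false, err
	}
	h, err := url.Parse(href)
	if err != nil {
		return "", false, err
	}
	r := b.ResolveReference(h) // handles relative links such as "/about"
	return r.String(), r.Host == b.Host, nil
}

func main() {
	for _, href := range []string{"/about", "https://other.example/page"} {
		resolved, internal, _ := classify("https://example.com/index.html", href)
		fmt.Println(resolved, internal)
	}
	// prints:
	// https://example.com/about true
	// https://other.example/page false
}
```

ResolveReference follows RFC 3986 resolution, so relative paths, absolute paths, and fully qualified URLs are all handled by the same code path.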
This project is licensed under the MIT License. See the LICENSE file for details.