danicat/spinarago

spinarago

A basic web crawler written in Go.

Why "Spinarago"?

Spinarak = A spider Pokémon
Go = Go

You do the math. :)

Description

This is a basic web crawler that prints the site map of a given URL. It prints external URLs but doesn't follow them. This project is still a work in progress, so it's not feature complete. Feel free to suggest improvements, either by creating issues or submitting pull requests.
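
To illustrate the "print but don't follow" rule for external URLs, here is a minimal sketch of how a link could be classified by comparing host names. It is not taken from the spinarago source; the `isExternal` helper and the sample links are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isExternal is a hypothetical helper (not part of spinarago). It reports
// whether link points to a different host than root. Both arguments are
// expected to be absolute URLs.
func isExternal(root, link string) (bool, error) {
	r, err := url.Parse(root)
	if err != nil {
		return false, err
	}
	l, err := url.Parse(link)
	if err != nil {
		return false, err
	}
	// Hostname() drops any port; host names compare case-insensitively.
	return !strings.EqualFold(r.Hostname(), l.Hostname()), nil
}

func main() {
	root := "http://example.com"
	links := []string{
		"http://example.com/about",
		"https://golang.org/doc",
	}
	for _, link := range links {
		ext, err := isExternal(root, link)
		if err != nil {
			fmt.Println("skipping", link, err)
			continue
		}
		if ext {
			fmt.Println("print only:", link)
		} else {
			fmt.Println("print and follow:", link)
		}
	}
}
```

Comparing only the host name treats `http` and `https` versions of the same site as internal. Whether that matches what spinarago actually does is an assumption.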

Install

$ go get github.com/danicat/spinarago

Usage

$ spinarago --hostname <host> --delay <milliseconds> --level <max-depth>

I highly recommend installing jq to pretty-print the JSON output. Example:

$ spinarago --hostname http://example.com | jq

You can also redirect stdout to a JSON file to dump the site map:

$ spinarago --hostname http://example.com -level 1 -delay 10 > example_level1.json

jq is really handy to filter the output:

$ cat example_level1.json | jq '.[] | { url: .url }'

TODO

  • Handle relative paths
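
One possible approach to this item, sketched with the standard library's `net/url` package: parse the link found on a page and resolve it against that page's URL with `ResolveReference`. The `resolve` helper and the sample links are hypothetical, not part of the current code.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve is a hypothetical helper: it turns href, as found on the page
// at pageURL, into an absolute URL following RFC 3986 resolution rules.
func resolve(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// Absolute hrefs come back unchanged; relative ones are joined to base.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "http://example.com/docs/index.html"
	hrefs := []string{"intro.html", "../about", "/contact", "https://golang.org/"}
	for _, href := range hrefs {
		abs, err := resolve(page, href)
		if err != nil {
			fmt.Println("skipping", href, err)
			continue
		}
		fmt.Println(href, "->", abs)
	}
}
```

With the sample page above, `intro.html` resolves to `http://example.com/docs/intro.html`, `../about` to `http://example.com/about`, and `/contact` to `http://example.com/contact`, while the absolute link is left untouched.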

Contributing

I'm open to contributions. Just create an issue and/or submit a pull request.

Contact

For any comments, feel free to reach out to me at @danicat83 on Twitter.
