GitHub - ivecanski/scrape_my_wikiloc_stats: Scrape your wikiloc stats

Overview

Wikiloc is a great site for documenting your trails. However, there is currently no way to quickly get an overview of your own basic stats - how many times your trails were viewed and downloaded.

This script scrapes your own Wikiloc account (you need to log in) and pulls the stats from the 10-trail pages (the listing pages that show ten trails each) instead of going through each individual trail, so that it is as easy on the website as possible. Since the stats are collected from your own account while logged in, the scraping does not affect them: running the script won't increment the number of views of your trails. Note that the statistics are not available if you visit the same pages without being logged in!
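The next_page links printed in the sample run (see "Running the script") suggest that each 10-trail page is addressed with from and to query parameters. A minimal sketch of that pattern follows. The function names are hypothetical, and it is an assumption that the first page can be requested with from=0&to=10. The script itself appears to follow the next_page link found on each page rather than building URLs like this.

```python
from urllib.parse import urlencode

# Base URL as it appears in the next_page links of the sample run.
BASE_URL = "https://www.wikiloc.com/wikiloc/user.do"
PAGE_SIZE = 10  # trails per listing page


def listing_page_url(user_id, start, page_size=PAGE_SIZE):
    """Build the URL of the listing page that starts at trail index `start`."""
    query = urlencode({"id": user_id, "from": start, "to": start + page_size})
    return f"{BASE_URL}?{query}"


def listing_page_urls(user_id, trail_count, page_size=PAGE_SIZE):
    """Return one URL per listing page needed to cover `trail_count` trails.

    Assumption: the first page can be addressed with from=0.
    """
    return [
        listing_page_url(user_id, start, page_size)
        for start in range(0, trail_count, page_size)
    ]


if __name__ == "__main__":
    for url in listing_page_urls(6046035, 14):
        print(url)
```

With 14 trails, as in the sample output, only two listing pages would be visited instead of 14 individual trail pages.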

Initially I tried to implement the scraping with Scrapy, and also with BeautifulSoup and requests, but both approaches failed because Wikiloc asked for captchas. The only way I could get around this was to use Selenium and let the user solve the captcha manually.

Sample data

Sample HTML pages, as they looked at the time the scraper was implemented, are located in the sample_data folder.

Environment

The script runs with Python 3. You need to install the selenium and pandas libraries into your virtual environment:

pip install selenium
pip install pandas

Apart from that, you will also need to install the WebDriver that drives the browser.

The script also requires a creds.py file to be present in the current directory. This is a simple file with two entries:

email = '<your_email>'
password = '<your_password>'

These will be picked up by the script to fill in the credential fields.
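The script presumably just imports creds.py. As a hedged sketch, the same file could also be loaded with an explicit check, so that a missing file or a missing entry produces a readable error. The function name load_credentials is hypothetical and not part of the script.

```python
import importlib.util
from pathlib import Path


def load_credentials(path="creds.py"):
    """Return (email, password) read from a creds.py-style file."""
    creds_file = Path(path)
    if not creds_file.is_file():
        raise FileNotFoundError(
            f"{creds_file} not found; create it with 'email' and 'password' entries"
        )

    # Load the file as a regular Python module without touching sys.path.
    spec = importlib.util.spec_from_file_location("creds", creds_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    try:
        return module.email, module.password
    except AttributeError as exc:
        raise ValueError(
            f"{creds_file} must define both 'email' and 'password'"
        ) from exc
```

Calling load_credentials() from the directory that contains creds.py returns the two values as a tuple.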

Running the script

After activating your virtualenv, run the script with python scrape.py:

~/dev/python/scrape_my_wikiloc_stats $ source /home/ivan/.virtualenvs/scraping_selenium/bin/activate
(scraping)  ~/dev/python/scrape_my_wikiloc_stats $ python scrape.py 
> going to sleep for 20 seconds
-- next_page link: https://www.wikiloc.com/wikiloc/user.do?id=6046035&from=10&to=20
> going to sleep for 2 seconds
-- next_page link: https://www.wikiloc.com/wikiloc/user.do?id=6046035&from=20&to=30
> going to sleep for 2 seconds
-- next_page link: https://www.wikiloc.com/wikiloc/user.do?id=6046035&from=30&to=40
> going to sleep for 2 seconds
< all data gathered, processing...
Scraping completed, results stored in output/my_wikiloc_stats_20220820-205704.csv

The script sleeps at certain points to allow the pages to be loaded (waiting could be optimized), and the initial wait is longer in order to allow the captcha to be resolved manually.
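One way the waiting could be optimized is to poll for a condition instead of sleeping for a fixed time, so the script continues as soon as the page is ready. The helper below is a minimal, library-independent sketch of that idea. The name wait_until and its defaults are assumptions and are not taken from the script. Selenium ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    start = time.monotonic()
    # Toy condition: becomes true after roughly one second.
    wait_until(lambda: time.monotonic() - start > 1.0, timeout=5.0)
    print("condition met")
```

In the scraper, the condition could check that the trail list has appeared on the page. A generous timeout would still be needed for the first page, where the captcha is solved by hand.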

Resulting CSV file

The script writes the results to a CSV file in the output sub-directory, with a unique name based on the current timestamp. An example of the output is shown below. You can further sort or reorder the fields as you please.

	trail name	views	downloads	trailrank
0	Vrhovi Povlena	5	1	41
1	Stara planina: Hotel 'Stara planina' - Babin zub	17	3	36
2	Stara planina: Hotel 'Babin zub' - Midžor	64	14	52
3	Rila: Yastrebets - Musala	223	14	49
4	Gvozdačke stene, Azbukovica	23	1	36
5	Kučajske planine: Vrelo Grze	5	2	30
6	Kosmaj	78	9	33
7	Avala	54	4	35
8	Fruška Gora: Stražilovo	50	3	39
9	Jablanik	34	3	35
10	Vršačka kula - Lisičija glava - Malo Središte	76	1	35
11	Košutnjak	11	1	19
12	Cer: Lipove Vode - Kosanin grad - Šančine - Trojanov grad	95	6	34
13	Rajac Suvobor	160	14	36
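The file name in the sample run (my_wikiloc_stats_20220820-205704.csv) is consistent with a year-month-day, hour-minute-second timestamp. The sketch below shows how such a name could be built and how the rows could be sorted afterwards, using only the standard library. The helper names are hypothetical. It is also an assumption that the file is comma-separated with an unnamed index column first. The script itself uses pandas.

```python
import csv
import io
from datetime import datetime
from pathlib import Path


def output_path(now=None, out_dir="output"):
    """Return a path such as output/my_wikiloc_stats_20220820-205704.csv."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(out_dir) / f"my_wikiloc_stats_{stamp}.csv"


def sort_by(rows, field, descending=True):
    """Sort CSV rows (dicts) by a numeric column such as 'views'."""
    return sorted(rows, key=lambda row: int(row[field]), reverse=descending)


# A few rows from the sample output, assuming a comma-separated layout.
SAMPLE_CSV = """,trail name,views,downloads,trailrank
0,Vrhovi Povlena,5,1,41
3,Rila: Yastrebets - Musala,223,14,49
13,Rajac Suvobor,160,14,36
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

if __name__ == "__main__":
    print(output_path())
    for row in sort_by(rows, "views"):
        print(row["trail name"], row["views"])
```

Sorting by views in descending order puts the most visited trail first. Passing "downloads" or "trailrank" as the field works the same way.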

Possible further improvements

The output currently doesn't contain the date when the trail took place, because this information is not available on the 10-trail pages. Fetching the date would require scraping each individual trail.
