filter_pagelinks.py counts in memory, pagelinks tables obsolete #76

Merged
merged 1 commit into from
Jul 23, 2023

Conversation

mtmail
Contributor

@mtmail mtmail commented Jul 23, 2023

closes #41

follow-up to #69

Instead of sorting the output of filter_pagelinks.py and summarizing the counts after the sort, the script now does the counting in memory. For the English Wikipedia this takes 3.8 GB of memory, which is still reasonable these days.
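As a rough illustration of the idea (not the actual code in filter_pagelinks.py), counting in memory can be sketched as below. The function names `count_links` and `write_counts`, the one-title-per-line input, and the two-column CSV output are assumptions made for this example only.

```python
import csv
import sys
from collections import Counter


def count_links(titles):
    """Count how often each link target occurs (hypothetical helper).

    `titles` is any iterable of target page titles, one per link.
    The whole tally is held in memory, so no external sort is needed.
    """
    counts = Counter()
    for title in titles:
        counts[title] += 1
    return counts


def write_counts(counts, out):
    """Write `title,count` rows in title order (assumed output layout)."""
    writer = csv.writer(out, lineterminator="\n")
    for title, count in sorted(counts.items()):
        writer.writerow([title, count])


if __name__ == "__main__":
    # Assumed input: one target title per line on stdin.
    titles = (line.rstrip("\n") for line in sys.stdin)
    write_counts(count_links(titles), sys.stdout)
```

Memory use grows with the number of distinct titles rather than the number of links, which is why a single pass like this can stay within a few GB even for a large language edition.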

The ${lang}pagelinkcount tables are no longer needed. This saves about 10 GB of database size (103 -> 88 GB). The CSV files are also smaller: 294m rows for all languages before, now 174m (-40%).

Overall, the time taken is about the same.

@mtmail mtmail merged commit 44754e6 into osm-search:master Jul 23, 2023
1 check passed
Development

Successfully merging this pull request may close these issues.

pre-count pagelinks