WikiData Dumps Processing Scripts

This repository contains scripts for processing Wikidata dumps, specifically the "claims" and "labels" data.

Claims Package

The claims package contains scripts for processing claims data from Wikidata dumps. Its main functions are:

  • Parsing claims data from the Wikidata dump.
  • Generating statistics and reports on the usage of properties within the claims.
  • Saving the processed data and statistics in a structured format.
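To illustrate the parsing and statistics steps, here is a minimal sketch that streams a Wikidata JSON dump line by line and counts statements per property. It relies on the dump's documented layout (one large JSON array with one entity per line, each line ending in a comma) and assumes that "usage" means the number of statements per property. The function names and the file path are hypothetical and are not taken from read_dump.py.

```python
import bz2
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump.

    The dump is a single JSON array with one entity per line, so each
    line is decoded on its own after skipping the array brackets and
    stripping the trailing comma.
    """
    for line in lines:
        line = line.strip()
        if line in ("[", "]", ""):
            continue
        yield json.loads(line.rstrip(","))


def count_property_usage(entities: Iterable[dict]) -> Counter:
    """Count statements per property ID (e.g. "P31") across all entities.

    Assumption: "usage" is measured as the number of statements, not the
    number of distinct entities that use the property.
    """
    counts: Counter = Counter()
    for entity in entities:
        for prop, statements in entity.get("claims", {}).items():
            counts[prop] += len(statements)
    return counts


def count_from_file(path: str) -> Counter:
    # Hypothetical helper: opens a bz2-compressed dump in text mode and
    # streams it, so the whole file is never loaded into memory.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return count_property_usage(iter_entities(f))
```

Calling count_from_file on a downloaded dump such as latest-all.json.bz2 returns a Counter, and its most_common() method lists the most-used properties first. Streaming line by line keeps memory use flat regardless of dump size.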

Key Scripts

  • do_text.py: Processes the claims data and generates a textual report.
  • read_dump.py: Reads and parses the claims data from the Wikidata dump.
  • save.py: Saves the processed claims data to a specified location.
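The exact format of the textual report written by do_text.py is not described here. As a hypothetical example, assuming the property counts are held in a collections.Counter keyed by property ID, this sketch renders the most-used properties as a sortable wikitext table. The table layout and the function name render_property_report are assumptions for illustration, not the script's actual output.

```python
from collections import Counter


def render_property_report(counts: Counter, top_n: int = 10) -> str:
    """Render the top_n most-used properties as a wikitext table.

    Assumption: the report is wikitext; the real do_text.py may use a
    different layout.
    """
    rows = [
        '{| class="wikitable sortable"',
        "! # !! Property !! Statements",
    ]
    # most_common() sorts by count, highest first; ties keep insertion order.
    for rank, (prop, n) in enumerate(counts.most_common(top_n), start=1):
        rows.append("|-")
        # [[Property:P31]] links to the property page on Wikidata;
        # {n:,} adds thousands separators for readability.
        rows.append(f"| {rank} || [[Property:{prop}]] || {n:,}")
    rows.append("|}")
    return "\n".join(rows)
```

Keeping rendering separate from counting means the same statistics could also be written out in another structured format without re-reading the dump.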

Labels Package

The labels package contains scripts for handling labels data from Wikidata dumps. Its primary features are:

  • Reading labels data from the Wikidata dump.
  • Creating per-language reports on the number of labels, descriptions, and aliases for items.
  • Outputting the results in a structured format for further analysis.
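A per-language tally of this kind could look like the following minimal sketch. It assumes the input is an iterable of entity dicts already decoded from the dump, using the Wikidata JSON shape where "labels" and "descriptions" map a language code to one value and "aliases" maps a language code to a list of values. The function name count_terms_by_language is hypothetical, and counting every alias (rather than items that have aliases) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable


def count_terms_by_language(entities: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Tally labels, descriptions and aliases per language code for items.

    Labels and descriptions hold at most one value per language, so their
    counts equal the number of items with that term. Aliases can hold
    several values per language, so every alias is counted (assumption).
    """
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"labels": 0, "descriptions": 0, "aliases": 0}
    )
    for entity in entities:
        # Only items are counted; properties and other entity types are skipped.
        if entity.get("type", "item") != "item":
            continue
        for lang in entity.get("labels", {}):
            stats[lang]["labels"] += 1
        for lang in entity.get("descriptions", {}):
            stats[lang]["descriptions"] += 1
        for lang, values in entity.get("aliases", {}).items():
            stats[lang]["aliases"] += len(values)
    return dict(stats)
```

Returning a plain dict of dicts keeps the result easy to serialize with json.dump for further analysis.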

Key Scripts

  • do_text.py: Generates a text report based on the labels data.
  • read_dump.py: Reads and processes the labels data from the Wikidata dump.
  • save.py: Saves the processed labels data to a designated location.

Report Links