Skip to content

Latest commit

 

History

History
120 lines (88 loc) · 5.29 KB

README.md

File metadata and controls

120 lines (88 loc) · 5.29 KB

wikidata-lang

A database of languages and their wikidata id, Wikimedia language code, ISO 639-1, ISO 639-2, ISO 639-3, and ISO 639-6 codes. A database of languages and their Wikidata id, Wikimedia language code, ISO 639-1, ISO 639-2, ISO 639-3, ISO 639-6 codes

This database is based on the result of a SPARQL query executed on query.wikidata.org, so the data is as good as Wikidata is: in case of an error, please update Wikidata and request an update of this database in the issues.

For each language, the data object looks like this:

{
  wd: 'Q8798', // https://wikidata.org/entity/Q8798
  label: 'Ukrainian', // English label
  native: 'українська мова', // P1705
  wmCode: [ 'uk' ], // P424
  iso6391: [ 'uk' ], // P218
  iso6392: [ 'ukr' ], // P219
  iso6393: [ 'ukr' ] // P220
  // iso6396 value (P221) missing for Q8798
}

NPM

Download stats

Summary

Install

From a terminal, in your project directory, run:

npm install wikidata-lang

How-To

Access full language data objects

In your JS code:

// languages indexed by Wikidata id is the default
const languagesByWdId = require('wikidata-lang')

// For indexes by other available codes, require the desired module:
const languagesByIso6391Code = require('wikidata-lang/indexes/by_iso_639_1_code')
const languagesByIso6392Code = require('wikidata-lang/indexes/by_iso_639_2_code')
const languagesByIso6393Code = require('wikidata-lang/indexes/by_iso_639_3_code')
const languagesByIso6396Code = require('wikidata-lang/indexes/by_iso_639_6_code')
const languagesByWmCode = require('wikidata-lang/indexes/by_wm_code')
// Note that you can also use an index regrouping all of the above
const languagesByAnyCode = require('wikidata-lang/indexes/by_any_code')

// The Wikidata id index returns single language data objects
languagesByWdId.Q8798
// => { "code": "uk", "label": "Ukrainian", "native": "Українська", "wd": "Q8798" }

// while indexes by code always return arrays of language data objects
// to account for edge cases where several Wikidata languages items share a common code.
languagesByIso6391Code.uk[0]
// => { "code": "uk", "label": "Ukrainian", "native": "Українська", "wd": "Q8798" }

Convert between codes

Conversion mappings can be build from the language data objects described above, but you can also use the pre-built ones:

const wdIdByWmCode = require('wikidata-lang/mappings/wd_id_by_wm_code.json')
wdIdByWmCode['zh-min-nan'] // => 'Q36495'

const wdIdByIso6391Code = require('wikidata-lang/mappings/wd_id_by_iso_639_1_code.json')
wdIdByIso6391Code['cr'] // => 'Q33390'

const wdIdByIso6392Code = require('wikidata-lang/mappings/wd_id_by_iso_639_2_code.json')
wdIdByIso6392Code['kzu'] // => 'Q6380723'

const wdIdByIso6393Code = require('wikidata-lang/mappings/wd_id_by_iso_639_3_code.json')
wdIdByIso6393Code['acc'] // => 'Q16110361'

const wdIdByIso6396Code = require('wikidata-lang/mappings/wd_id_by_iso_639_6_code.json')
wdIdByIso6396Code['jah'] // => 'Q2742661'

const wmCodeByWdId = require('wikidata-lang/mappings/wm_code_by_wd_id.json')
wmCodeByWdId['Q32762'] // => 'fiu-vro'

// This one seems useless, as ISO 639-1 seem to always be equivalent to their corresponding Wikimedia language code
const wmCodeByIso6391Code = require('wikidata-lang/mappings/wm_code_by_iso_639_1_code.json')
wmCodeByIso6391Code['id'] // => 'id'

const wmCodeByIso6392Code = require('wikidata-lang/mappings/wm_code_by_iso_639_2_code.json')
wmCodeByIso6391Code['zza'] // => 'diq'

const wmCodeByIso6393Code = require('wikidata-lang/mappings/wm_code_by_iso_639_3_code.json')
wmCodeByIso6391Code['yue'] // => 'zh-yue'

const wmCodeByIso6396Code = require('wikidata-lang/mappings/wm_code_by_iso_639_6_code.json')
wmCodeByIso6391Code['yyef'] // => 'zh-yue'

// etc.

See the mappings folder for a full list of available mappings.

NB: When several values are available for a given key, no value is added to the mapping for that key. It is added instead to the mappings/x_by_x.multiple_values.json file. For instance, in ISO 639-2, both ell and gre are mapped to el in ISO 639-1 (also true for Wikimedia language codes)

Examples

See how we use it in Inventaire

License

MIT