bbc/unicode-bidirectional


unicode-bidirectional

A Javascript implementation of the Unicode 9.0.0 Bidirectional Algorithm

This is an implementation of the Unicode Bidirectional Algorithm (UAX #9) that works in both Browser and Node.js environments. The implementation is conformant as per definition UAX#9-C1.

Installation

npm install unicode-bidirectional --save

Usage

unicode-bidirectional is declared as a Universal Module (UMD), meaning it can be used with all conventional Javascript module systems:

1. ES6

import { resolve, reorder } from 'unicode-bidirectional';

const codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
const levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
const reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]

2. CommonJS

var UnicodeBidirectional = require('unicode-bidirectional/dist/unicode.bidirectional');
var resolve = UnicodeBidirectional.resolve;
var reorder = UnicodeBidirectional.reorder;

var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
var levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]

3. RequireJS

require(['UnicodeBidirectional'], function (UnicodeBidirectional) {
  var resolve = UnicodeBidirectional.resolve;
  var reorder = UnicodeBidirectional.reorder;

  var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
  var levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
  var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
});

4. HTML5 <script> tag

<script src="unicode.bidirectional.js"></script> <!-- exposes window.UnicodeBidirectional -->
<script>
  var resolve = UnicodeBidirectional.resolve;
  var reorder = UnicodeBidirectional.reorder;

  var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
  var levels = resolve(codepoints, 0);  // [0, 0, 0, 1, 1, 1]
  var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
</script>

You can download unicode.bidirectional.js from Releases. Using this file with a <script> tag will expose UnicodeBidirectional as a global variable on the window object.

API

resolve(codepoints, paragraphLevel[, automaticLevel = false])

Returns the resolved levels associated with each codepoint in codepoints [1]. This levels array determines: (i) the relative nesting of LTR and RTL characters, and hence (ii) how characters should be reversed when displayed on the screen.

The input codepoints are assumed to all be in one paragraph that has a base direction of paragraphLevel. This is a Number that is either 0 or 1 and represents whether the paragraph is left-to-right (0) or right-to-left (1). automaticLevel is an optional Boolean flag that, when present and set to true, causes this function to ignore the paragraphLevel argument and instead attempt to deduce the paragraph level from the codepoints. [2]
The input array is not mutated.

reorder(codepoints, levels)

Returns the codepoints in codepoints reordered (i.e. permuted) according to the levels array. [3]
Neither of the two input arrays is mutated.
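
To make the reversal step of rule L2 concrete, here is a minimal, self-contained sketch. It is an illustration rather than this library's source: the name reorderByLevels is hypothetical, and the sketch assumes every entry of levels is a Number (no 'x' entries).

```javascript
// Illustrative sketch of rule L2 from UAX #9, not the library's own code.
// Assumes every entry of `levels` is a Number (no 'x' entries).
function reorderByLevels(codepoints, levels) {
  const result = codepoints.slice(); // work on a copy so the input is not mutated
  const oddLevels = levels.filter((level) => level % 2 === 1);
  if (oddLevels.length === 0) return result; // nothing is right-to-left

  const highest = Math.max(...levels);
  const lowestOdd = Math.min(...oddLevels);

  // From the highest level down to the lowest odd level, reverse every
  // contiguous run of entries whose level is at least the current level.
  for (let level = highest; level >= lowestOdd; level -= 1) {
    let start = 0;
    while (start < levels.length) {
      if (levels[start] < level) {
        start += 1;
        continue;
      }
      let end = start;
      while (end < levels.length && levels[end] >= level) end += 1;
      // Reverse result[start .. end - 1] in place.
      for (let i = start, j = end - 1; i < j; i += 1, j -= 1) {
        const tmp = result[i];
        result[i] = result[j];
        result[j] = tmp;
      }
      start = end;
    }
  }
  return result;
}
```

Called with the codepoints and levels from the Usage section, this sketch reverses only the last three entries, which matches the reordering shown there.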

reorderPermutation(levels[, IGNORE_INVISIBLE = false])

Returns the reordering that levels represents as a permutation array. When this array has an element at index i with value j, it denotes that the codepoint previously positioned at index i is now positioned at index j. [4]
The input array is not mutated. The IGNORE_INVISIBLE parameter controls whether invisible characters (characters with a level of 'x' [5]) are included in the permutation array. By default they are included (they are not ignored, hence IGNORE_INVISIBLE is false).
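
As a sketch of how a permutation array in this convention can be applied, the hypothetical helper below moves the item at index i to index permutation[i]. It is not part of the library and assumes the permutation contains each index from 0 to length - 1 exactly once.

```javascript
// Hypothetical helper, not part of the library: applies a permutation array
// in which the item at index i moves to index permutation[i].
// Assumes `permutation` holds every index 0..n-1 exactly once.
function applyPermutation(items, permutation) {
  const result = new Array(items.length);
  permutation.forEach((targetIndex, sourceIndex) => {
    result[targetIndex] = items[sourceIndex];
  });
  return result;
}
```

For example, applying the permutation [0, 1, 2, 5, 4, 3] to the six codepoints from the Usage section swaps the first and last of the three Hebrew letters and leaves everything else in place.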

mirror(codepoints, levels)

Replaces each codepoint in codepoints with its mirrored glyph according to rule L4 and the levels array.
Neither of the two input arrays is mutated.
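
The idea behind rule L4 can be sketched as follows: a codepoint is swapped for its mirrored counterpart only when its resolved level is odd (right-to-left) and a counterpart exists. This is an illustration, not the library's implementation. SAMPLE_MIRRORS and mirrorSketch are hypothetical names, and the table holds only four hand-picked entries rather than the full data.

```javascript
// Illustrative sketch of rule L4, not the library's own code.
// SAMPLE_MIRRORS is a tiny hand-written table for demonstration only.
const SAMPLE_MIRRORS = new Map([
  [0x28, 0x29], // ( -> )
  [0x29, 0x28], // ) -> (
  [0x3C, 0x3E], // < -> >
  [0x3E, 0x3C], // > -> <
]);

function mirrorSketch(codepoints, levels) {
  // map() builds a new array, so neither input is mutated.
  return codepoints.map((codepoint, index) => {
    const level = levels[index];
    // An odd numeric level means the character resolved to right-to-left.
    const isRightToLeft = typeof level === 'number' && level % 2 === 1;
    return isRightToLeft && SAMPLE_MIRRORS.has(codepoint)
      ? SAMPLE_MIRRORS.get(codepoint)
      : codepoint;
  });
}
```

With levels of [1, 1, 1], the sequence "(", U+05D0, ")" has its two parentheses swapped; with levels of [0, 0, 0] it is returned unchanged.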

constants

An object containing metadata used by the bidirectional algorithm. This object includes the following keys:

  • mirrorMap: a map mapping a codepoint to its mirrored counterpart, e.g. looking up "<" gives ">". If a codepoint does not have a mirrored counterpart, then there is no key-value pair in the map and so a lookup will give undefined. [6]
  • oppositeBracket: a map mapping a codepoint to its bracket pair counterpart, e.g. looking up "(" gives ")". If a codepoint does not have a bracket pair counterpart, then there is no key-value pair in the map and so a lookup will give undefined. [7]
  • openingBrackets: a set containing all brackets that are opening brackets. [7]
  • closingBrackets: a set containing all brackets that are closing brackets. [7]

Additional Notes:

For all the above functions, codepoints are represented by an Array of Numbers where each Number denotes the Unicode codepoint of the character, that is, an integer between 0x0 and 0x10FFFF inclusive. levels are represented by an Array of Numbers where each Number is an integer between 0 and 127 inclusive. One or more entries of levels may be the string 'x'. This denotes a character that does not have a level [5].
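
Because the functions take arrays of codepoints rather than strings, a conversion step is usually needed on either side. The helpers below are one possible sketch using standard ES2015 string methods. toCodepoints and fromCodepoints are hypothetical names, not part of this library.

```javascript
// Hypothetical helpers, not part of the library.

// Convert a string to an Array of Unicode codepoints (Numbers).
// Array.from iterates by codepoint, so a surrogate pair becomes one entry.
function toCodepoints(text) {
  return Array.from(text, (character) => character.codePointAt(0));
}

// Convert an Array of codepoints back to a string.
function fromCodepoints(codepoints) {
  return String.fromCodePoint(...codepoints);
}
```

Note that the spread call in fromCodepoints passes every codepoint as a separate argument, so for very long inputs it may be safer to convert in chunks.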

[1]: Codepoints are automatically converted to NFC normal form if they are not already in that form.
[2]: This function deduces the paragraph level according to UAX#9-P1, UAX#9-P2 and UAX#9-P3.
[3]: This is an implementation of UAX#9-L2.
[4]: More formally known as the one-line notation for permutations. See Wikipedia.
[5]: Some characters have a level of x: the levels array holds the string 'x' instead of a number at their index. This is expected behaviour. By rule X9, the Unicode Bidirectional Algorithm does not assign a level to certain invisible characters and control characters, so they are effectively ignored by the algorithm. Because they are invisible, they have no impact on the visual RTL/LTR ordering of characters. Most of the invisible characters that fall into this category are in this list.
[6]: This is taken from BidiMirroring.txt.
[7]: This is taken from BidiBrackets.txt.
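
If code downstream only wants characters that carry a numeric level, the 'x' entries described in [5] can be filtered out beforehand. The helper below is a hypothetical sketch, not part of the library. It assumes codepoints and levels have the same length and returns new arrays.

```javascript
// Hypothetical helper, not part of the library: drops every position whose
// level is the string 'x', keeping codepoints and levels aligned.
function dropInvisible(codepoints, levels) {
  const visible = { codepoints: [], levels: [] };
  levels.forEach((level, index) => {
    if (level !== 'x') {
      visible.codepoints.push(codepoints[index]);
      visible.levels.push(level);
    }
  });
  return visible;
}
```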

Polyfills

unicode-bidirectional uses the following ECMAScript 2015 (ES6) features that are not fully supported by Internet Explorer and older versions of other browsers:

If you are targeting these browsers, you'll need to add one or more Polyfill libraries to fill in these features (for example, es6-shim and unorm).

More Info

For other Javascript Unicode Implementations see:

License

MIT.
Copyright (c) 2017 British Broadcasting Corporation