A Javascript implementation of the Unicode 9.0.0 Bidirectional Algorithm
This is an implementation of the Unicode Bidirectional Algorithm (UAX #9) that works in both Browser and Node.js environments. The implementation is conformant as per definition UAX#9-C1.
npm install unicode-bidirectional --save
unicode-bidirectional is declared as a Universal Module (UMD), meaning it can be used with all conventional Javascript module systems:
1. ES6 →
import { resolve, reorder } from 'unicode-bidirectional';
const codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
const levels = resolve(codepoints, 0); // [0, 0, 0, 1, 1, 1]
const reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
2. CommonJS →
var UnicodeBidirectional = require('unicode-bidirectional/dist/unicode.bidirectional');
var resolve = UnicodeBidirectional.resolve;
var reorder = UnicodeBidirectional.reorder;
var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
var levels = resolve(codepoints, 0); // [0, 0, 0, 1, 1, 1]
var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
3. RequireJS →
require(['UnicodeBidirectional'], function (UnicodeBidirectional) {
var resolve = UnicodeBidirectional.resolve;
var reorder = UnicodeBidirectional.reorder;
var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
var levels = resolve(codepoints, 0); // [0, 0, 0, 1, 1, 1]
var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
});
4. HTML5 <script>
tag →
<script src="unicode.bidirectional.js" /> <!-- exposes window.UnicodeBidirectional -->
var resolve = UnicodeBidirectional.resolve;
var reorder = UnicodeBidirectional.reorder;
var codepoints = [0x28, 0x29, 0x2A, 0x05D0, 0x05D1, 0x05D2]
var levels = resolve(codepoints, 0); // [0, 0, 0, 1, 1, 1]
var reordering = reorder(codepoints, levels); // [0x28, 0x29, 0x2A, 0x05D2, 0x05D1, 0x05D0]
You can download unicode.bidirectional.js
from Releases.
Using this file with a <script>
tag will
expose UnicodeBidirectional
as global variable on the window
object.
Returns the resolved levels associated to each codepoint in codepoints
[1].
This levels array determines: (i) the relative nesting of LTR and RTL characters, and
hence (ii) how characters should be reversed when displayed on the screen.
The input codepoints are assumed to be all be in one paragraph that has a base direction of paragraphLevel
–
this is a Number that is either 0 or 1 and represents whether the paragraph is left-to-right (0) or right-to-left (1).
automaticLevel
is an optional Boolean flag that when present and set to true,
causes this function to ignore the paragraphlevel
argument and instead attempt to deduce the paragraph level from the codepoints. [2]
Neither of the two input arrays are mutated.
Returns the codepoints in codepoints
reordered (i.e. permuted) according the levels
array. [3]
Neither of the two input arrays are mutated.
Returns the reordering that levels
represents as an permutation array.
When this array has an element at index i with value j, it denotes that the codepoint
previous positioned at index i is now positioned at index j. [4]
The input array is not mutated. The IGNORE_INVISIBLE
parameter controls whether or not
invisible characters (characters with a level of 'x' [5])
are to be included in the permutation array.
By default, they are included in the permutation (they are not ignored, hence IGNORE_INVISIBLE
is false).
Replaces each codepoint in codepoints
with its mirrored glyph according to rule L4
and the levels
array.
Neither of the two input arrays are mutated.
An object containing metadata used by the bidirectional algorithm. This object includes the following keys:
mirrorMap
: a map mapping a codepoint to its mirrored counterpart, e.g. looking up "<" gives ">". If a codepoint does not have a mirrored counterpart, then there is no key-value pair in the map and so a lookup will give undefined. [6]oppositeBracket
: a map mapping a codepoint to its bracket pair counterpart, e.g. looking up "(" gives ")". If a codepoint does not have a bracket pair counterpart, then there is no key-value pair in the map and so a lookup will give undefined. [7]openingBrackets
: a set containing all brackets that are opening brackets. [7]closingBrackets
: a set containing all brackets that are closing brackets. [7]
Additional Notes:
For all the above functions, codepoints are represented by an Array of Numbers where each Number denotes the Unicode codepoint of the character, that is an integer between 0x0 and 0x10FFFF inclusive. levels are represented by an Array of Numbers where Number is an integer between 0 and 127 inclusive. One or more entries of levels may be the string 'x'. This denotes a character that does not have a level [5].
[1]: Codepoints are automatically converted to NFC normal form if they are not already in that form.
[2]: This function deduces the paragraph level according to: UAX#P1, UAX#P2 and UAX#P3.
[3]: This is an implementation of UAX#9-L2.
[4]: More formally known as the one-line notation for permutations. See Wikipedia.
[5]: Some characters have a level of x – the levels array has a string 'x' instead of a number. This is expected behaviour. The reason is because the Unicode Bidirectional algorithm (by rule X9.) will not assign a level to certain invisible characters / control characters. They are basically completely ignored by the algorithm. They are invisible and so have no impact on the visual RTL/LTR ordering of characters. Most of the invisible characters that fall into this category are in this list.
[6]: This is taken from BidiMirroring.txt.
[7]: This is taken from BidiBrackets.txt.
unicode-bidirectional uses the following ECMAScript 2015 (ES5) features that are not fully supported by Internet Explorer and older versions of other browsers:
If you are targeting these browsers, you'll need to add one or more Polyfill libraries to fill in these features (for example, es6-shim and unorm).
For other Javascript Unicode Implementations see:
- devongovett/grapheme-breaker – Unicode Grapheme Cluster Breaking Algorithm (UAX #29)
- devongovett/linebreak – Unicode Line Breaking Algorithm (UAX #14)
MIT.
Copyright (c) 2017 British Broadcasting Corporation