A non-deterministic typical expressions parser for the Haskell language. A typical expression is a type-safe regular expression.
Other proposed names for the project include typex (typical expressions) and texen (typical expressions engine).
import Typical gives you many functions that correspond to standard regular expression operators. To parse a string, use match :: [Pattern] -> String -> [String]. For example:
match [_digit, _char '+', _digit] "4+5"
match [_digit `_or` _alpha] "a1"
match [_digit, _optional . _seq $ [ _char '(', _word "one", _char ')' ] ] "one"
The result is a list of strings, one for each possible match. For convenience, match takes a list of patterns; internally, it converts the list into the Sequence structure.
The most important data type is called Pattern. It is recursive and has a number of constructors. Passing patterns to the match function gives you a list of possible matches.
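To make the non-deterministic behaviour concrete, here is a minimal sketch of how a recursive Pattern type and match could fit together. It is not the library's implementation: the constructors Sat, Seq and Or and the helper step are hypothetical, and match is assumed to return every prefix of the input that the whole pattern sequence can consume.

```haskell
import Data.Char (isAlpha, isDigit)

-- Hypothetical recursive Pattern type; the real library's constructors may differ.
data Pattern
  = Sat (Char -> Bool)   -- a single character satisfying a predicate
  | Seq [Pattern]        -- patterns matched one after another
  | Or Pattern Pattern   -- either alternative (both are tried)

-- Every way a pattern can match a prefix of the input, as (consumed, rest) pairs.
-- Returning a list instead of a single result is what makes matching non-deterministic.
step :: Pattern -> String -> [(String, String)]
step (Sat ok) (c:cs) | ok c = [([c], cs)]
step (Sat _) _ = []
step (Seq []) s = [("", s)]
step (Seq (p:ps)) s =
  [ (a ++ b, rest') | (a, rest) <- step p s, (b, rest') <- step (Seq ps) rest ]
step (Or p q) s = step p s ++ step q s

-- As described above: the list of patterns is wrapped in a Seq,
-- and the consumed prefix of every successful parse is returned.
match :: [Pattern] -> String -> [String]
match ps s = map fst (step (Seq ps) s)
```

Loaded in GHCi, match [Sat isDigit, Sat (== '+'), Sat isDigit] "4+5" gives ["4+5"], while match [Or (Sat (== 'a')) (Seq [Sat (== 'a'), Sat (== 'b')])] "ab" gives ["a","ab"]: both alternatives succeed, so both are reported.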
_seq :: [Pattern] -> Pattern
- converts a list of patterns into a sequence pattern.
_or :: Pattern -> Pattern -> Pattern
- converts two patterns into a disjunction.
_char :: Char -> Pattern
- converts a character into a pattern that matches it.
_digit :: Pattern
- matches any digit.
_alpha :: Pattern
- matches one upper or lower case letter.
_lower :: Pattern
- matches a lower case letter.
_upper :: Pattern
- matches an upper case letter.
_anything :: Pattern
- matches any character.
_oneOf :: [Char] -> Pattern
- matches any one of the characters in the given list.
_whitespace = _oneOf " \t\n\r"
_real :: Pattern
- matches a positive or negative real number.
_exactly :: Num -> Pattern -> Pattern
- matches the given pattern exactly n times or fails.
_between :: Num -> Num -> Pattern -> Pattern
- greedily matches a pattern between min and max times.
_one = _exactly 1
_some :: Pattern -> Pattern
- greedily matches a pattern at least once.
_any :: Pattern -> Pattern
- greedily matches a pattern zero or more times.
_optional :: Pattern -> Pattern
- greedily matches a pattern zero times or once.
_not :: Pattern -> Pattern
- succeeds only if the pattern matches zero times; otherwise fails.
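The combinators listed above can be modelled on a small core. The sketch below is a hypothetical re-implementation for illustration only: the constructors Sat, Seq, Or, Rep and Not, the helper step, the use of Int for repetition counts, and the definition of _real (an optional minus sign, one or more digits, then an optional fractional part) are all assumptions. "Greedy" is read here as "longer matches are listed first", and repetitions that consume nothing are skipped so that nested repetition cannot loop forever.

```haskell
import Data.Char (isAlpha, isDigit, isLower, isUpper)

-- Hypothetical core type; not the library's actual definition.
data Pattern
  = Sat (Char -> Bool)            -- one character satisfying a predicate
  | Seq [Pattern]                 -- patterns in sequence
  | Or Pattern Pattern            -- disjunction
  | Rep Int (Maybe Int) Pattern   -- at least min, at most max (Nothing = unbounded)
  | Not Pattern                   -- consumes nothing; succeeds only if the pattern fails

-- All (consumed, rest) splits of the input that a pattern accepts.
step :: Pattern -> String -> [(String, String)]
step (Sat ok) (c:cs) | ok c = [([c], cs)]
step (Sat _) _ = []
step (Seq []) s = [("", s)]
step (Seq (p:ps)) s =
  [ (a ++ b, r2) | (a, r1) <- step p s, (b, r2) <- step (Seq ps) r1 ]
step (Or p q) s = step p s ++ step q s
step (Not p) s = [("", s) | null (step p s)]
step (Rep lo hi p) s = go 0 s
  where
    -- Try one more repetition before stopping, so longer matches come first.
    go n rest = more ++ stop
      where
        more
          | maybe True (n <) hi =
              [ (a ++ b, r2)
              | (a, r1) <- step p rest
              , not (null a)            -- skip empty iterations to avoid looping
              , (b, r2) <- go (n + 1) r1 ]
          | otherwise = []
        stop = [("", rest) | n >= lo]

_seq :: [Pattern] -> Pattern
_seq = Seq

_or :: Pattern -> Pattern -> Pattern
_or = Or

_char :: Char -> Pattern
_char c = Sat (== c)

_digit, _alpha, _lower, _upper, _anything :: Pattern
_digit = Sat isDigit
_alpha = Sat isAlpha
_lower = Sat isLower
_upper = Sat isUpper
_anything = Sat (const True)

_oneOf :: [Char] -> Pattern
_oneOf cs = Sat (`elem` cs)

_whitespace :: Pattern
_whitespace = _oneOf " \t\n\r"

_exactly :: Int -> Pattern -> Pattern
_exactly n = Rep n (Just n)

_between :: Int -> Int -> Pattern -> Pattern
_between lo hi = Rep lo (Just hi)

_one :: Pattern -> Pattern
_one = _exactly 1

_some, _any, _optional, _not :: Pattern -> Pattern
_some = Rep 1 Nothing
_any = Rep 0 Nothing
_optional = Rep 0 (Just 1)
_not = Not

-- Assumed shape: optional minus sign, digits, optional fractional part.
_real :: Pattern
_real = _seq
  [ _optional (_char '-')
  , _some _digit
  , _optional (_seq [_char '.', _some _digit])
  ]

match :: [Pattern] -> String -> [String]
match ps s = map fst (step (Seq ps) s)
```

With these definitions, match [_some _digit] "123a" lists ["123","12","1"], longest first, and match [_real] "-3.14" gives ["-3.14","-3.1","-3"]. Keeping the shorter alternatives is what lets a later pattern in the sequence still succeed, as in match [_optional (_char 'a'), _char 'a'] "aa", which gives ["aa","a"].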
There are plans to convert this into a monadic parser.
This parser still lacks important features, such as the ^ and $ anchors.
A non-deterministic lexer is also included in the Lexical package.