Add initial release candidate

This adds the first snapshot of the project that should be good for releasing: a release candidate. A set of tests (a `pytest` project) was written and used for testing this library, but it hasn't been integrated with the library project yet, and so is omitted for now. See `./README.md` for packaging, installation, usage, etc.
amn · Sep 22, 2024 · b930beb

Showing 17 changed files with 2,633 additions and 0 deletions.
diff --git a/LICENSE.txt b/LICENSE.txt
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1 @@
+exclude src/csspring/syntax/tokenizing.py
diff --git a/Makefile b/Makefile
@@ -0,0 +1,10 @@
+INSTALL_DATA = install -m 644
+PYTHON = python
+SHELL = bash
+.SHELLFLAGS += -o pipefail
+VPATH += $(dir $(lastword $(MAKEFILE_LIST)))
+
+$(addprefix csspring/,$(addprefix syntax/,$(addsuffix .py,tokenizing))): %.py: expand-macros.py expand/%.py
+	$(PYTHON) $< < $(lastword $^) | $(INSTALL_DATA) -D /dev/stdin $@
+
+.DELETE_ON_ERROR:
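The Makefile rule above regenerates `csspring/syntax/tokenizing.py` by piping `expand/csspring/syntax/tokenizing.py` through `expand-macros.py` and installing the output with mode 644. As a rough illustration only, the sketch below restates that static pattern rule in Python. The `build` function, its `srcdir`/`builddir` parameters and the `expand` callback (standing in for `expand-macros.py`) are hypothetical, and VPATH search is reduced to a single source directory.

```python
import tempfile
from pathlib import Path
from typing import Callable


def build(target: str, expand: Callable[[str], str], srcdir: Path, builddir: Path) -> Path:
    """Hypothetical restatement of the rule `%.py: expand-macros.py expand/%.py`."""
    # `%.py` matches e.g. `csspring/syntax/tokenizing.py` with the stem `csspring/syntax/tokenizing`,
    # so the last prerequisite, `$(lastword $^)`, is `expand/<stem>.py` (found via VPATH, here `srcdir`).
    stem = target.removesuffix('.py')
    source_path = srcdir / 'expand' / f'{stem}.py'
    with open(source_path, newline='') as source_file:  # keep newline sequences as-is
        text = expand(source_file.read())  # the `$(PYTHON) $< < $(lastword $^)` step
    # Writing only after expansion succeeded avoids leaving a truncated target behind,
    # which `-o pipefail` together with `.DELETE_ON_ERROR` guards against in the Makefile.
    output_path = builddir / target
    output_path.parent.mkdir(parents=True, exist_ok=True)  # `install -D` creates leading directories
    with open(output_path, 'w', newline='') as output_file:
        output_file.write(text)
    output_path.chmod(0o644)  # `install -m 644`
    return output_path


# Usage with a trivial stand-in for macro expansion (`str.upper`), inside a throwaway directory tree.
with tempfile.TemporaryDirectory() as top:
    srcdir, builddir = Path(top, 'src'), Path(top, 'build')
    source = srcdir / 'expand' / 'csspring' / 'syntax' / 'tokenizing.py'
    source.parent.mkdir(parents=True)
    source.write_text('x = 1\n')
    built = build('csspring/syntax/tokenizing.py', str.upper, srcdir, builddir)
    built_name = built.relative_to(builddir).as_posix()
    built_text = built.read_text()
    built_mode = built.stat().st_mode & 0o777
```

Unlike `make`, this sketch leaves a previously built target in place if expansion fails, rather than deleting a half-written one; the end result of a successful run should be the same.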
diff --git a/README.md b/README.md
@@ -0,0 +1,92 @@
+Offered here is `csspring`, a software library for parsing [CSS](http://www.w3.org/TR/CSS) text, implemented as a Python package.
+
+## Installation
+
+Installation of the library follows [Python package installation conventions](http://packaging.python.org/en/latest/tutorials/installing-packages).
+
+Releases of the library are made available for immediate installation on PyPI; the project page is at http://pypi.org/project/csspring.
+
+Installation can thus be done with e.g. [`pip`](http://packaging.python.org/en/latest/key_projects/#pip) over the Internet:
+
+```shell
+pip install csspring
+```
+
+Releases are also published on the project's [canonical] GitHub repository, at http://github.com/amn/csspring/releases. These releases contain the same files that get uploaded to PyPI.
+
+The installation package files are built with [the conventional method](http://packaging.python.org/en/latest/tutorials/packaging-projects) (with the repository as the current working directory):
+
+```shell
+python -m build
+```
+
+Whether you install the package with `pip install csspring` or pass `pip install` one of the files downloaded from the "Releases" GitHub page or built yourself (`csspring-....tar.gz` or `csspring-....whl`), the result is equivalent, since the build process described above is the one used to produce the released distribution files.
+
+## Usage
+
+### Examples
+
+The code snippet below demonstrates obtaining a _parse tree_ (in the `stylesheet` variable) by parsing the file `example.css`:
+
+```python
+from csspring.parsing import normalize_input, parse_stylesheet
+stylesheet = parse_stylesheet(normalize_input(open('example.css', newline=''))) # The `newline=''` argument prevents `open` from rewriting newline sequences in the input; per the CSS Syntax spec., parsing does its own filtering of newline sequences, so no rewriting by `open` is necessary or desirable
+```
+
+## Documentation
+
+Proficient use of the library calls, first and foremost, for reading the documentation supplied with the source code in the form of [_docstrings_](http://docs.python.org/3.11/glossary.html#term-docstring), which annotate the package and its elements:
+
+```python
+import csspring
+help(csspring) # Will list packages and modules contained by `csspring`, which one may further invoke `help` on, as is convention
+```
+
+Some of the documentation is naturally deferred to the [Syntax](http://drafts.csswg.org/css-syntax) and [Selectors](http://drafts.csswg.org/selectors-4) specifications that the library implements.
+
+Python requirements (version, platform, etc.) are expressed in the provided `pyproject.toml` file (as per [PEP 621](http://peps.python.org/pep-0621/), originally).
+
+## Compliance
+
+The `csspring.syntax` package was written to implement the Editor's Draft edition of the ["CSS Syntax Module Level 3"](http://drafts.csswg.org/css-syntax) specification, with the latter serving as reference during development. This was done to reduce the effort required to implement a parser: the parser can "blindly" follow the steps outlined in the specification and defer to it on ambiguities and even design choices (for better and for worse). The specification effectively doubles as an abstract CSS parser, after all.
+
+The Editor's Draft edition was chosen specifically instead of [the "Technical Report" version](http://www.w3.org/TR/css-syntax-3) because initial attempts at following the latter uncovered some ambiguities that we could not resolve. [^1]
+
+> [!NOTE]
+> Staying true to the specification, the `csspring.syntax` package does _not_ itself implement parsing of [_selectors_](http://drafts.csswg.org/selectors). Parsing of CSS text that includes parsing of selectors [in qualified rules] is enabled by the top-level `csspring` package augmenting construct(s) in the `csspring.syntax` package, which happens automatically when any `csspring` module is imported. Said augmentation is done in a manner that doesn't break compliance for `csspring.syntax`, yet enables parsing of selectors in CSS text all the same. Parsing of selectors is done on demand: a rule's `prelude` value is parsed when the `selector_list` property is accessed on a `QualifiedRule` object.
+
+The `csspring.selectors` module was written to implement the Editor's Draft edition of the ["Selectors Level 4"](http://drafts.csswg.org/selectors-4) specification. The module offers parsing of CSS selectors specifically. The Editor's Draft edition was chosen over [the "Technical Report" version](http://www.w3.org/TR/selectors-4) for consistency with the `csspring.syntax` package following an Editor's Draft edition.
+
+## Deviations
+
+### Preservation of input
+
+Parsing offered by the library _preserves all input text_ it is fed, character for character (down to the original unfiltered input text). This allows recovery of white-space and comments in CSS text, as-is, a property of the parser that was neither defined nor facilitated by the CSS syntax specification. Such preservation of input was designed into this implementation to facilitate a broader range of parsing applications, where e.g. stylesheets must be transformed without discarding comments or inadvertently changing white-space in the transformed output. The parser thus supports so-called identity transformation parsing, where input is parsed into a product from which the original text stream may be recovered _exactly_, without any loss.
+
+## Disclaimer
+
+Parsing is offered only in the form of Python modules: no "command-line" program(s), e.g. ones that can be invoked from the shell to parse CSS file(s) and write parse trees (in some format fit for the purpose), are included. This was a deliberate choice to contain the scope of the project, since adding a command-line parsing tool arguably implies solving a number of problems which have little to do with parsing proper. For instance, the tool would need to decide on the serialization format for the parse trees it writes. In any case, such a tool would likely benefit from a development project of its own, and so was considered out of scope here. Rest assured, this library should be well able to support such a tool, and one may be written in the future to complement the library and make its function more interoperable and accessible.
+
+## Frequently Asked Questions
+
+### Why?
+
+We wanted a "transparent" CSS parser: one that could be used in different configurations without imposing limitations that would, strictly speaking, go beyond parsing. Put differently, we wanted a parser that does not assume any particular application, a software _library_ in the classical sense of the term, or a true _API_ if you will.
+
+For instance, the popular [Less](http://lesscss.org) software seems to rather effortlessly parse CSS [3] text, but it invariably re-arranges white-space in the output, without giving the user any control over the latter. Less is not _transparent_ like that — there is no way to use it with recovery of the originally parsed text from the parse tree — parsing with Less is a one-way street for at least _some_ applications (specifically those that "transform" CSS but need to preserve all of the original input as-is).
+
+In comparison, this library was written to preserve _all_ input, _as-is_. This became one of the requirements defining the library, contributing to its _raison d'être_.
+
+### Why Python?
+
+As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up": if it ever adopts a top layer exposing its features through a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so at the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools), the utility of the parser is tightly bound to the capabilities of, e.g., the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all: its output, the parse tree, is normally fed to another component in a larger application. Python is currently ubiquitous and scores well on a number of metrics relevant here. The collective amount of Python code is growing steadily, which drives adoption, which makes the prospect of offering CSS parsing written specifically in Python ever more enticing.
+
+Another factor for choosing Python was that we couldn't find any _sufficiently capable_ CSS parsing libraries written specifically as [reusable] Python module(s). While there _are_ a few CSS parsing libraries available, none declared compliance with, or de-facto supported, CSS 3 (including features like nested rules etc). In comparison, this library was written in close alignment with the CSS 3 standard specification(s) (see [the compliance declaration](#compliance)).
+
+### What's with the name?
+
+Ignoring the "css" part, "spring" in the name refers to my starting the project in [early] spring of [2024]. A Python package needs a name, _some_ name, from the get-go, and the name stuck. I pronounce it as *cs-spring*.
+
+## References
+
+[^1]: http://github.com/w3c/csswg-drafts/issues/10119#issuecomment-2016156566
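The "Preservation of input" section of the README above describes identity transformation parsing: the parse product keeps enough text to reproduce the input exactly. The following is a minimal, self-contained sketch of that property using a deliberately tiny tokenizer. The `Token` type, the `tokenize` function and its three token kinds are hypothetical; they are neither csspring's tokenizer nor its API, which follow the CSS Syntax specification's much richer token set.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # 'comment', 'whitespace' or 'other'
    text: str  # the exact input text the token was produced from


# Toy grammar: comments (an unterminated one runs to the end of input), white-space, and anything else.
# A lone `/` that doesn't start a comment is matched on its own by the last alternative.
TOKEN_PATTERN = re.compile(r'(?P<comment>/\*.*?(?:\*/|\Z))|(?P<whitespace>\s+)|(?P<other>[^\s/]+|/)', re.DOTALL)


def tokenize(text: str) -> Iterator[Token]:
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        assert match is not None and match.lastgroup is not None  # every character starts some alternative
        yield Token(match.lastgroup, match.group())
        position = match.end()  # matches are never empty, so this always advances


source = 'a { color: red; }  /* red stays */\n\tb{color:red}'
tokens = list(tokenize(source))

# Identity transformation: concatenating the token texts recovers the input exactly.
assert ''.join(token.text for token in tokens) == source

# A transformation that edits only non-comment tokens leaves comments and white-space untouched.
output = ''.join(token.text.replace('red', 'blue') if token.kind == 'other' else token.text for token in tokens)
print(output)
```

The point of the design is that no token discards text: white-space and comments are tokens in their own right, so any transformation that only rewrites the tokens it cares about leaves the rest of the input byte-for-byte intact.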
diff --git a/expand-macros.py b/expand-macros.py
@@ -0,0 +1,90 @@
+"""A macro processing module for Python code.
+
+Macro processing refers here to eager rewriting/replacement/substitution of Python code constructs decorated with the "syntactic" (no definition available normally, when the containing module is imported) decorator `macro`. The purpose of such processing is to implement the equivalent of what is usually called "pre-processing" for e.g. C/C++ language(s). As `macro`-decorated procedures (only decorating of procedures is currently effectively supported for `macro`) are encountered during processing of Python code, the entire procedure is removed and the "unparsed" equivalent of the series of AST statements it returned is inserted in its place instead.
+
+This implements a powerful and "semantically-aware" code pre-processing mechanism, for situations demanding it. Our immediate need with this was to allow type checkers like MyPy to analyze as much of the project's Python code as possible, which they are normally unable to do in cases of so-called dynamically created types (and consequently object(s) of such types). And so instead of living with effectively uncheckable dynamic types created with the `type` built-in -- for e.g. `Token` subclasses -- we employ _pre-processing_ of Python code into Python code which lends itself to type-checking, a benefit we deemed to be a "must-have" for the project.
+"""
+
+import ast
+from collections.abc import Mapping, Sequence
+import os
+import sys
+from typing import Any, Callable, cast, Iterable, TypeAlias
+
+Pos: TypeAlias = tuple[int, int] # A [2-D] "position" (aka vector) type, for dealing with source code locations
+
+def is_template_rewrite_decorator(decorator: ast.AST) -> bool:
+    """Identify the `macro` decorator.
+    :param decorator: An abstract syntax tree (AST) node representing a decorator in some [parsed] Python code
+    :returns: `True` if the node represents the `macro` decorator, our marker for rewriting the entire decorated object, `False` otherwise
+    """
+    match decorator:
+        case ast.Name(macro.__name__):
+            return True
+        case _:
+            return False
+
+def macro(callable: Callable[[], Iterable[ast.AST]]):
+    """A `macro` decorator stub.
+
+    The `macro` decorator isn't used beyond just identifying constructs in the code that it decorates -- but compiling of decorated constructs as part of dynamically constructed modules, something we depend on for actually executing the "macro" (the procedure `macro` decorates), demands that `macro` is defined in the context of executing the module (see the `exec` call in `process`).
+    :param callable: A callable to decorate with this decorator, as per convention; although all callables are permitted, decoration of object(s) other than procedures is undefined
+    :returns: The decorated object; as is, currently `macro` is an identity function and the result value is immaterial to this module, since the result of decoration isn't actually executed
+    """
+    return callable
+
+def source_span(lines: Sequence[str], prev: Pos, cur: Pos) -> Iterable[str]:
+    """Get chunk(s) of text between two [line-and-column] positions
+
+    E.g. `source_span('foo\nbar\nbaz'.splitlines(keepends=True), (1, 1), (3, 1))` will yield `'oo\n'`, `'bar\n'` and `'b'` (in that order).
+
+    :param lines: Lines of source code to use for getting a span in
+    :param prev: The "starting" position of the span, a 2-tuple with the (1-based) line number and the (0-based) column offset, counted in UTF-8 bytes as `ast` reports it, for the first and second items, respectively
+    :param cur: The "ending" position of the span, also a 2-tuple of the same profile as `prev`
+    :returns: An iterable of chunks of text contained exactly between the two positions
+    """
+    yield lines[prev[0] - 1].encode()[prev[1]:(cur[1] if cur[0] == prev[0] else None)].decode()
+    if cur[0] != prev[0]:
+        for i in range(prev[0] + 1, cur[0]):
+            yield lines[i - 1]
+        yield lines[cur[0] - 1].encode()[:cur[1]].decode()
+
+def process(source: str) -> Iterable[str]:
+    """Find and replace macros in Python source code, vending rewritten copy.
+
+    :param source: Body of Python source code (e.g. contents of Python module file)
+    :returns: An iterable of chunks of source code generally equivalent to `source` but with occurrences of `macro`-decorated constructs "expanded" (replaced with "unparsed" result of calling the decorated construct)
+    """
+    lines = getattr(ast, '_splitlines_no_ff')(source) # TODO: Find a stable way to split source into lines in the manner compatible with `ast.parse`
+    prev_node = ast.stmt(end_lineno=1, end_col_offset=0)
+    assert prev_node.end_lineno is not None
+    assert prev_node.end_col_offset is not None
+    for node in ast.parse(source).body:
+        is_macro_node = any(is_template_rewrite_decorator(decorator) for decorator in getattr(node, 'decorator_list', []))
+        if is_macro_node:
+            assert hasattr(node, 'decorator_list')
+            macros: Mapping[str, Any] = dict()
+            exec(compile(ast.Module(body=[ node ], type_ignores=[]), __file__, mode='exec'), globals(), macros)
+            macro_node = node
+            decorator_node = node.decorator_list[0]
+            node = ast.stmt(lineno=decorator_node.lineno, col_offset=decorator_node.col_offset - 1)
+        yield from source_span(lines, (prev_node.end_lineno, prev_node.end_col_offset), (node.lineno, node.col_offset))
+        if is_macro_node:
+            prev_item = None
+            yield f"# The following construct(s) were inserted automatically through expansion of the {repr(cast(ast.FunctionDef | ast.ClassDef, macro_node).name)} macro\n\n"
+            for item in next(iter(macros.values()))():
+                if prev_item:
+                    yield '\n\n' # Not `os.linesep`: text-mode output already translates '\n' to the platform's line separator
+                ast.fix_missing_locations(item)
+                yield ast.unparse(item)
+                prev_item = item
+            yield f"\n\n# End of macro expansion result"
+        else:
+            assert node.end_lineno is not None
+            assert node.end_col_offset is not None
+            yield from source_span(lines, (node.lineno, node.col_offset), (node.end_lineno, node.end_col_offset))
+        prev_node = macro_node if is_macro_node else node
+
+if __name__ == '__main__':
+    source = (sys.stdin if len(sys.argv) < 2 or sys.argv[1] == '-' else open(sys.argv[1], newline='')).read()
+    sys.stdout.writelines(process(source))
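To make the module docstring of `expand-macros.py` more concrete, here is a hedged sketch of what a macro might look like and what its expansion produces. The macro name `token_classes` and the token names are hypothetical (the actual `expand/csspring/syntax/tokenizing.py` input isn't part of this diff), and `expand` is a simplified stand-in for the unparsing loop in `process`. One detail follows from `process` as written: it compiles and executes only the decorated statement, using `expand-macros.py`'s own `globals()`, so a macro can use `ast` (imported there) but cannot see other names defined in its own module; the sketch therefore keeps everything the macro needs inside it.

```python
import ast
from typing import Callable, Iterable


def macro(procedure: Callable[[], Iterable[ast.AST]]) -> Callable[[], Iterable[ast.AST]]:
    """Identity stand-in for the `macro` stub in expand-macros.py, so this sketch runs on its own."""
    return procedure


@macro
def token_classes() -> Iterable[ast.AST]:
    """Hypothetical macro: yields class statements instead of creating classes with `type` at run time."""
    for name in ('IdentToken', 'NumberToken', 'WhitespaceToken'):  # hypothetical token names
        node = ast.parse('class _(Token):\n    pass').body[0]  # `Token` is only referenced by name, never resolved here
        assert isinstance(node, ast.ClassDef)
        node.name = name
        yield node


def expand(statements: Iterable[ast.AST]) -> str:
    """Simplified stand-in for the loop in `process`: unparse each statement, separated by a blank line."""
    return '\n\n'.join(ast.unparse(ast.fix_missing_locations(statement)) for statement in statements)


expansion = expand(token_classes())
print(expansion)
```

Because the expansion is ordinary `class` statements, a type checker sees `IdentToken` and the others as regular subclasses of `Token`, which is the benefit the docstring cites over creating them with `type`.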