Related projects

Nicolas Silva edited this page Jun 27, 2023 · 33 revisions

Here are some other projects doing path rendering on the GPU and some notes about them.

Tiling approaches

A lot of recent attempts at doing vector graphics on the GPU revolve around breaking large and complex scenes into a regular grid of small (for example 16x16 pixels) tiles.

A very important paper to read (maybe the first to explore this tiling idea): Random-Access Rendering of General Vector Graphics, or "the RAVG paper" for short.

Interesting properties of working with tiles:

  • Break complex scenes into small manageable parts. It would be unreasonable to have a shader loop over all edges of a scene for each pixel, however it's totally reasonable to have it loop over all of the edges intersecting the 16x16 tile that the pixel is in.
  • Regular tile grids are easy to work with and wrap our heads around. Especially on GPU, the "regular" aspect is key to efficiently doing work in parallel. More involved data structures may offer better algorithmic complexity at the expense of worse parallelism, memory access patterns, register usage, etc.
  • Easy to notice that a tile is entirely covered by an opaque path and discard all of the content that is underneath.
  • Most of these approaches have a "tiling phase" and a "rendering phase", with a data format in between. The tiling phase is usually the most challenging. With this architecture it's rather easy to have different implementations of the tiling phase (potentially falling back to the CPU) and pick one depending on hardware/driver features and bugs.
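
To make the "tiling phase" idea concrete, here is a minimal CPU-side sketch that bins line segments into a regular grid of 16x16 tiles using their bounding boxes. The `Edge` type and the `bin_edges` and `tile_coord` functions are hypothetical names made up for this illustration, not taken from any of the projects below. It is deliberately conservative (an edge may be assigned to tiles its bounding box overlaps but the segment itself does not touch) and it simply drops edges outside the grid, whereas a real tiler has to account for edges to the left of a tile to get winding numbers right.

```rust
/// Tile size in pixels (16x16, as in the example above).
const TILE_SIZE: f32 = 16.0;

/// A line segment in pixel coordinates (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Edge {
    pub from: (f32, f32),
    pub to: (f32, f32),
}

/// Maps a pixel coordinate to a tile index, clamped to the grid.
fn tile_coord(pixel: f32, tile_count: usize) -> usize {
    let tile = (pixel / TILE_SIZE).floor();
    tile.clamp(0.0, (tile_count - 1) as f32) as usize
}

/// Assigns each edge to every tile overlapped by its bounding box.
/// Returns one list of edge indices per tile, in row-major order.
pub fn bin_edges(edges: &[Edge], tiles_x: usize, tiles_y: usize) -> Vec<Vec<usize>> {
    let mut bins = vec![Vec::new(); tiles_x * tiles_y];
    if tiles_x == 0 || tiles_y == 0 {
        return bins;
    }
    let width = tiles_x as f32 * TILE_SIZE;
    let height = tiles_y as f32 * TILE_SIZE;
    for (index, edge) in edges.iter().enumerate() {
        let (min_x, max_x) = (edge.from.0.min(edge.to.0), edge.from.0.max(edge.to.0));
        let (min_y, max_y) = (edge.from.1.min(edge.to.1), edge.from.1.max(edge.to.1));
        // Simplification: edges entirely outside the grid are dropped.
        if max_x < 0.0 || max_y < 0.0 || min_x >= width || min_y >= height {
            continue;
        }
        for ty in tile_coord(min_y, tiles_y)..=tile_coord(max_y, tiles_y) {
            for tx in tile_coord(min_x, tiles_x)..=tile_coord(max_x, tiles_x) {
                bins[ty * tiles_x + tx].push(index);
            }
        }
    }
    bins
}
```

The rendering phase then only needs `bins[ty * tiles_x + tx]` to shade a pixel of tile `(tx, ty)`, which is what keeps the per-pixel loop short.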

I couldn't find source code for the original RAVG paper, however there is a thesis by Ivan Leben (link on his blog) on follow-up RAVG work, with source code at https://github.com/ileben/RAVG.

Pathfinder

Source code: https://github.com/pcwalton/pathfinder

Several approaches were tried in pathfinder:

  • (pf2) Tessellation approach based on a trapezoidal partition. Each trapezoid generates a bounding rectangle and an "internal" rectangle: the latter is used to pre-fill opaque areas and use the z-buffer to reduce overdraw, while the former contains the equations of the edges on both sides of the trapezoid and renders the curve with anti-aliasing. The approach generally works well but has a few glitches on almost vertical edges (would be fixable with conservative rasterization).
  • (pf2-text) An approach similar to stencil-and-cover except that it renders the winding number into a float texture with anti-aliasing. Very fast for rather small text, but performance degrades quickly at very high resolutions, so it is a good solution for text but not great for SVG paths. No overdraw mitigation.
  • (pf3) Same approach as pf2-text on the GPU but large paths are split into 16x16 tiles which are rasterized independently. A sweep-line algorithm is used to generate the tiles and the algorithm is somewhat similar to a software rasterizer but writes the edges that affect each tile rather than filling pixels. Full tiles are used to discard content from other paths below and reduce overdraw.
  • Explained here: https://nical.github.io/posts/a-look-at-pathfinder.html
  • Promising approach in that it is less of a numerical stability nightmare than many other approaches, is rather simple to implement and doesn't rely on advanced GPU features.
  • Performance depends on how fast the tiling algorithm is implemented and there's a trade-off between CPU and GPU cost depending on the tile size.
  • The sweep-line algorithm was later replaced by the tile decomposition algorithm from the RAVG paper.
  • A GPU tiling algorithm was later added in addition to the CPU tiling.
  • Also has a compute code path that works like piet-metal.
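
The overdraw reduction mentioned for pf3 (full tiles discard content from paths below) can be sketched for a single tile as follows. This is only an illustration of the idea, assuming a per-tile list of entries sorted back to front; `TileKind`, `TileEntry` and `cull_occluded` are hypothetical names and do not come from pathfinder's source.

```rust
/// How a path affects a tile (hypothetical classification for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TileKind {
    /// The path covers the whole tile with an opaque pattern.
    SolidOpaque,
    /// The path only partially covers the tile, or is translucent.
    Partial,
}

/// One path's contribution to a tile.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TileEntry {
    pub path_id: u32,
    pub kind: TileKind,
}

/// Given a tile's entries sorted back to front, returns the entries that
/// can still be visible: everything from the topmost opaque full tile up.
pub fn cull_occluded(entries: &[TileEntry]) -> &[TileEntry] {
    match entries.iter().rposition(|e| e.kind == TileKind::SolidOpaque) {
        Some(topmost_opaque) => &entries[topmost_opaque..],
        None => entries,
    }
}
```

Everything before the returned slice is hidden by the opaque tile and never needs to be rasterized or blended.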

Ochre

Source code: https://github.com/glowcoil/ochre

RustFest presentation: https://www.youtube.com/watch?v=9MAC3bojeyo

Similar tile-based approach to pathfinder, except mask tiles are rasterized on the CPU.

Even simpler than PF, would run on very old hardware, nothing fancy done on the GPU so very little risk of driver bugs.

Piet-metal

Source code: https://github.com/linebender/piet-metal. The most interesting bits are in https://github.com/linebender/piet-metal/blob/master/TestApp/PietRender.metal (actually quite easy to read).

Raph Levien's first GPU path rendering experiment.

Architecture explained here https://raphlinus.github.io/rust/graphics/gpu/2019/05/08/modern-2d.html and here https://raphlinus.github.io/rust/graphics/gpu/2020/06/01/piet-gpu-progress.html

  • Entirely in compute shaders
  • Tile-based architecture (like pathfinder) with 16x16 pixels tiles.
  • The scene is packed into a big GPU-visible data structure with path commands.
  • A compute shader first assigns edges to screen-space tiles in a kind of brute-force way by having all wavefronts traverse the whole scene. Wavefronts are organized in blocks of 16x1 tiles so that the edges above and below the row can be quickly discarded in parallel by each thread of the wavefront, using wavefront instructions (ballot) before proceeding with the second part of the shader that assigns remaining edges to tiles horizontally.
  • The result of the tiling shader is, for each 16x16 tile, a list of rendering commands that affect the tile:
    • Begin/end path + id of the path's pattern
    • edge (line segment)
    • a "backdrop" command indicating that the top-left corner of the tile is inside a path
  • Another compute shader, the rendering shader, reads these commands and renders them into the framebuffer. 1 thread per pixel, 1 wavefront per tile.
    • What's interesting about this approach is that blending happens entirely in registers. Only the final pixel is written into the framebuffer. That's a very important performance win.
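
The "blending happens entirely in registers" point can be illustrated with a CPU-side sketch of what one thread of the rendering shader does for its pixel. This is not piet-metal's code: `FillCommand` is a hypothetical stand-in in which the coverage of each path on the pixel is already known, whereas the real shader derives it from the tile's edge and backdrop commands. Colors are assumed to be premultiplied RGBA.

```rust
/// Premultiplied RGBA color.
pub type Color = [f32; 4];

/// Hypothetical simplified command: a pattern color and its coverage on
/// the current pixel (a real shader computes the coverage from edges).
#[derive(Clone, Copy, Debug)]
pub struct FillCommand {
    pub color: Color,
    pub coverage: f32,
}

/// Source-over blending of `src` scaled by `coverage` on top of `dst`.
fn blend_over(dst: Color, src: Color, coverage: f32) -> Color {
    let src_alpha = src[3] * coverage;
    let mut out = [0.0; 4];
    for i in 0..4 {
        out[i] = src[i] * coverage + dst[i] * (1.0 - src_alpha);
    }
    out
}

/// Shades one pixel: the accumulator lives in a local variable (a register
/// on the GPU) and the framebuffer is read once and written once.
pub fn shade_pixel(commands: &[FillCommand], framebuffer: &mut [Color], pixel_index: usize) {
    let mut accumulator = framebuffer[pixel_index];
    for command in commands {
        accumulator = blend_over(accumulator, command.color, command.coverage);
    }
    framebuffer[pixel_index] = accumulator;
}
```

By contrast, an approach that issues one draw call per path reads and writes the framebuffer once per path touching the pixel.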

Drawbacks:

  • Dynamic allocation inside a shader is hard (don't know ahead of time how much space to reserve for tile commands).
  • The whole rendering model is implemented in the rendering shader. For example solid colors, gradients, images, blend modes, etc. are all done inside a switch statement when looping over the commands. Extending the shader with new features always risks making it slower (more register hungry, etc.). However this could be mitigated by moving some functionality into a pass before rendering which bakes any kind of pattern into an image, so that the rendering shader treats it as an image.
  • The tiling code is very simple but may not scale well for some scenes. Issues explained towards the end of this post: https://raphlinus.github.io/rust/graphics/gpu/2020/06/12/sort-middle.html

Piet-gpu

Piet-metal's successor. A similar tile-based architecture, with a very similar rendering shader at the end of the pipeline, however the tiling process is much more sophisticated.

https://raphlinus.github.io/rust/graphics/gpu/2020/06/12/sort-middle.html

  • Inspired by https://research.nvidia.com/publication/high-performance-software-rasterization-gpus
  • Multi-stage tiling which first bins into large 256x256 pixel bins, and later into small 16x16 tiles.
  • Part of the reason for intermediate bins is to be able to use shared memory, which has a limited size.
  • Binning in parallel breaks the ordering of the commands, so the next stage ("coarse raster") has to sort them back. It then counts edges to allocate space for tile drawing commands before writing them out.
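
The "count, allocate, then write" step is a common pattern when output sizes are not known ahead of time. Below is a sequential CPU sketch of that general pattern, not piet-gpu's actual kernels (which run in parallel on the GPU): count the commands per tile, turn the counts into offsets with an exclusive prefix sum, then write each command into its tile's range of one flat buffer. The function names and the `(tile_index, command)` input format are assumptions made for this example.

```rust
/// Exclusive prefix sum: `offsets[i]` is the sum of `counts[..i]`.
/// Also returns the total, which is the size of the buffer to allocate.
pub fn exclusive_scan(counts: &[u32]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(counts.len());
    let mut total = 0;
    for &count in counts {
        offsets.push(total);
        total += count;
    }
    (offsets, total)
}

/// Groups `(tile_index, command)` pairs by tile into one flat buffer.
/// Returns the start offset of each tile and the buffer itself.
/// Panics if a tile index is not smaller than `tile_count`.
pub fn write_tile_commands(tile_count: usize, items: &[(usize, u32)]) -> (Vec<u32>, Vec<u32>) {
    // Pass 1: count how many commands each tile will receive.
    let mut counts = vec![0u32; tile_count];
    for &(tile, _) in items {
        counts[tile] += 1;
    }
    // Pass 2: the scan tells each tile where its range starts.
    let (offsets, total) = exclusive_scan(&counts);
    // Pass 3: write commands; per-tile input order is preserved.
    let mut cursors = offsets.clone();
    let mut buffer = vec![0u32; total as usize];
    for &(tile, command) in items {
        buffer[cursors[tile] as usize] = command;
        cursors[tile] += 1;
    }
    (offsets, buffer)
}
```

Tile `i` owns `buffer[offsets[i]..offsets[i + 1]]` (or up to the end of the buffer for the last tile).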

Main challenges:

  • Dynamically allocating memory to make space for drawing commands on the GPU.
  • Very complicated compute kernels, subject to the whims of poor drivers.
  • Overall quite complicated.

In my humble opinion this is the most promising approach to rendering large-scale vector graphics on modern GPUs. Emphasis on "modern" because it requires advanced compute shader features and it's unclear to me how well it would work on the zoo of buggy hardware and drivers out there. But on the high-end it's quite impressive. There is work underway towards a software fallback using the same shader code but compiled ahead of time to be run on the CPU with SIMD.

Spinel

Source code: https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/src/graphics/lib/compute/spinel

Fully based on compute shaders, also tiling-based.

TODO

There's a Rust/WebGPU port of some of the ideas behind spinel at https://github.com/Kangz/cassia. It's a bit easier to digest, but a large part of it runs on the CPU instead of the GPU.

The following are observations about cassia, some of which may apply to spinel:

  • Instead of rasterizing line segments at the end of the pipeline, edges are broken up into "psegments", which I assume stands for pixel-segments. These encode the following into a single 64-bit integer:
    • tile x, y
    • local x, y (within tile, only 3 bits per axis, tiles are 8x8 pixels)
    • layer index
    • area
    • coverage
  • There's a sorting pass using the value itself (taking advantage of the tile y, x being in the top bits of the psegment).
  • See surpass/src/rasterizer/mod.rs and surpass/src/segment.rs for the psegment building part (CPU-side).
  • See cassia/src/TileWorkgroupRasterizer.cpp for the rasterization (GPU-side).
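
Here is a small sketch of why packing fields into one integer makes the sorting pass cheap. The exact bit layout used by cassia is not reproduced here: apart from the 3-bit local coordinates and tile y, x sitting in the top bits, the field widths and order below are assumptions made up for the example, and area and coverage are treated as opaque unsigned bit fields.

```rust
// Hypothetical layout, most significant bits first:
// tile_y:12 | tile_x:12 | layer:16 | local_y:3 | local_x:3 | area:10 | cover:8
const TILE_BITS: u32 = 12;
const LAYER_BITS: u32 = 16;
const LOCAL_BITS: u32 = 3;
const AREA_BITS: u32 = 10;
const COVER_BITS: u32 = 8;

const COVER_SHIFT: u32 = 0;
const AREA_SHIFT: u32 = COVER_SHIFT + COVER_BITS;
const LOCAL_X_SHIFT: u32 = AREA_SHIFT + AREA_BITS;
const LOCAL_Y_SHIFT: u32 = LOCAL_X_SHIFT + LOCAL_BITS;
const LAYER_SHIFT: u32 = LOCAL_Y_SHIFT + LOCAL_BITS;
const TILE_X_SHIFT: u32 = LAYER_SHIFT + LAYER_BITS;
const TILE_Y_SHIFT: u32 = TILE_X_SHIFT + TILE_BITS;

/// Unpacked pixel segment (hypothetical field types for this sketch).
#[derive(Clone, Copy, Debug)]
pub struct PSegment {
    pub tile_x: u16,
    pub tile_y: u16,
    pub layer: u16,
    pub local_x: u8,
    pub local_y: u8,
    pub area: u16,
    pub cover: u8,
}

/// Masks `value` to `bits` bits and moves it to its position.
fn field(value: u64, bits: u32, shift: u32) -> u64 {
    debug_assert!(value < (1u64 << bits), "value does not fit in its field");
    (value & ((1u64 << bits) - 1)) << shift
}

/// Packs a psegment into a single 64-bit integer.
pub fn pack(s: &PSegment) -> u64 {
    field(s.tile_y as u64, TILE_BITS, TILE_Y_SHIFT)
        | field(s.tile_x as u64, TILE_BITS, TILE_X_SHIFT)
        | field(s.layer as u64, LAYER_BITS, LAYER_SHIFT)
        | field(s.local_y as u64, LOCAL_BITS, LOCAL_Y_SHIFT)
        | field(s.local_x as u64, LOCAL_BITS, LOCAL_X_SHIFT)
        | field(s.area as u64, AREA_BITS, AREA_SHIFT)
        | field(s.cover as u64, COVER_BITS, COVER_SHIFT)
}

/// Extracts `(tile_x, tile_y)` from a packed psegment.
pub fn unpack_tile(packed: u64) -> (u16, u16) {
    let mask = (1u64 << TILE_BITS) - 1;
    (
        ((packed >> TILE_X_SHIFT) & mask) as u16,
        ((packed >> TILE_Y_SHIFT) & mask) as u16,
    )
}
```

With this layout, sorting the packed `u64` values with a plain integer sort groups psegments by tile row, then tile column, then layer, without a custom comparison function.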

vger / vger-rs

Source code: https://github.com/audulus/vger-rs and https://github.com/audulus/vger

  • The simple renderer
    • On the CPU, paths are decomposed into quadratic bézier curves and binned into horizontal bands via a simple sweep,
    • a fragment shader rasterizes the bands
  • the experimental tiled renderer
    • edges (and other types of primitives) are assigned to tiles in a shader,
    • in another pass, pixels are rendered by going over the per-tile commands (like piet's k4).
  • In both configurations, the edges are not flattened; they are rendered as quadratic bézier curves using the RAVG distance approximation for AA and something inspired by Loop-Blinn for the inside/outside test.

Other approaches

FastUIDraw (Intel)

Source code: https://github.com/01org/fastuidraw

  • Tessellation
  • Resolution-dependent tessellation
  • Relies a lot on caching geometry.
  • Splits all images into fixed size tiles and store them into a big atlas with padding to avoid sampling artifacts
    • Very similar to a sparse virtual texture
    • Using an indirection texture
    • Optimizes when several tiles are filled with the same color into one tile.
  • Uses the depth buffer to clip out (rendering the clip path before)
    • Clips are not anti-aliased
  • Separate solution for glyphs

Some technical details in these blog posts: https://web.archive.org/web/20190116182936/https://01.org/fast-ui-draw/blogs/krogovin/2016/fast-ui-draw-technical-details-1 and https://web.archive.org/web/20190116182939/https://01.org/fast-ui-draw/blogs/krogovin/2016/fast-ui-draw-technical-details-2.

NVPath (Nvidia)

  • Stencil-and-cover
  • Uses a Loop-Blinn style shader to render curves to the stencil buffer on the GPU without streaming too much data
  • MSAA

Direct2D (Microsoft)

  • Scanline conversion with trapezoids for edge AA. See wpf-gpu-raster.
  • More recent versions use tessellation with Target Independent Rasterization (TIR), which gives a coverage value for an edge which gets summed into a mask texture. It looks like the mask texture is atlased to allow batching multiple paths. The mask texture also seems to be tiled.
    • To render the tiles of the atlas, some "sloppy" geometry (in the sense that it does not perfectly fit the tiles and overflows) is sent, and the contents of each tile are clipped to the tile by applying specific clip distances for that tile.

Skia GL backend (Google)

Source code: https://github.com/google/skia

  • Seems to be using a mix of:
    • tessellation
      • Half-edge mesh internal representation for paths
    • stencil-and-cover
    • software fallback with texture upload

TODO: I need to check this, but the code at least contains all of the above.

nanovg

  • Stencil-and-cover
  • vertex-aa.
    • The path geometry is inset by half a pixel to produce a mask via the stencil buffer in typical stencil-and-cover fashion for the non-antialiased (interior) part.
      • The insetting logic is very simple and does not attempt to be strictly correct with self-intersecting shapes. It looks like all edges are offset on the same side and the side is probably chosen based on the number of turns per loop.
    • The vertex-aa part (the "fringe") is rendered in a separate draw call, the stencil buffer excluding it from the interior area so that it does not overlap with it. The fringe can self-overlap (presumably causing only minor artifacts at the border, but none inside the shape).
    • In other words this approach ensures that self-overlapping shapes don't shade pixels twice in the filled area, only potentially on the anti-aliased parts along the border.
    • Because it uses vertex-aa and blends the coverage of the edges independently, this approach has conflation artifacts.
  • A fast path for convex paths skips the stencil buffer part.

Scaleform

  • Full-scene Tessellation
    • tessellates meshes that contain multiple paths and minimizes overdraw at the tessellation level.
  • vertex-aa

Glyphy

Source code: https://github.com/behdad/glyphy

  • Vector texture
    • Approximates all curves with elliptic arc segments
    • Very little cpu work except for bézier -> arc approximation
    • Especially sensitive to driver bugs
    • Gotta trust the driver not to fiddle with texture formats behind our back, otherwise the path representation is corrupted.
    • No overdraw mitigation

Slug

  • Similar to glyphy in principle but with quadratic béziers instead of arcs
  • great care taken on numerical stability
  • No overdraw mitigation

Libtess2

Source code: https://github.com/memononen/libtess2

  • Fork of the GLU tessellator (no rendering engine, it is just a tessellator).
    • Half-edge mesh internal representation for paths

Font-rs

Source code: https://github.com/google/font-rs

Font-rs renders glyphs on the CPU but is interesting and the approach could be ported to the GPU. Some notes about it here: https://github.com/nical/lyon/wiki/Misc#font-rs.