Skip to content

Latest commit

 

History

History
340 lines (249 loc) · 9.02 KB

forest.adoc

File metadata and controls

340 lines (249 loc) · 9.02 KB

Tupelo Forest - One Tree To Rule Them All

Overview

Have you ever wanted to manipulate tree-like data structures such as hiccup or HTML? If so, then the tupelo.forest library is for you! Forest allows you to:

  • Easily search for tree nodes based on the path from the tree root

  • Search for tree nodes based on content

  • Limit a search to nodes in an arbitrary sub-tree

  • Find parents and siblings of a node found in a search

  • Chain searches together, so that nodes found in one search are used to limit the scope of sub-searches

In addition, tupelo.forest allows you to update the tree by adding, changing, or deleting nodes. Since tupelo.forest allows one to easily find parent and/or sibling nodes, this is a powerful feature missing in most other tree-processing libraries.

Resources

Quick Start

Create a tree using Hiccup:

(ns xyz
  (:use tupelo.core tupelo.forest))

(with-forest (new-forest)
  (let [root-hid (add-tree-hiccup [:a
                                   [:b 1]
                                   [:b 2]
                                   [:b
                                    [:c 4]
                                    [:c 5]]
                                   [:c 9]])]

Display the tree in a compact format:

(is= (format-paths (find-paths root-hid [:a]))
  [[{:tag :a}
    [{:tag :b, :value 1}]
    [{:tag :b, :value 2}]
    [{:tag :b}
     [{:tag :c, :value 4}]
     [{:tag :c, :value 5}]]
    [{:tag :c, :value 9}]]])

Display the paths from the root to all :c nodes:

(let [c-paths (find-paths root-hid [:** :c]) ]
  (is= c-paths [[:0006 :0004 :0002]
                [:0006 :0004 :0003]
                [:0006 :0005]])

Each value like :0004 is a Hexidecimal ID (HID) that serves as a pointer to a tree node like [:c 9]. Each vector of HIDs is a path from the root node :a to a :c node. There are three :c nodes and, hence, three paths. You can see that the length of each of the 3 paths matches the position in the tree of the 3 :c nodes. Each :c node is at the end of its respective path.

Display the last :c node as Hiccup:

(is= (hid->hiccup :0005) [:c 9] )

Find paths to all [:c 4] nodes. There is only one:

  (let [c4-paths (find-paths root-hid [:** {:tag :c :value 4}]) ]
    (is= c4-paths [[:0006 :0004 :0002]] ))

Find the parent of the [:c 4] node:

(let [c4-parent (-> c4-paths only reverse second)]
  (is= c4-parent :0004)
  (is= (hid->hiccup c4-parent)  [:b [:c 4] [:c 5]] )
  (is= (hid->node   c4-parent)  {:tag :b, :tupelo.forest/khids [:0002 :0003] } )

This shows the internal structure of tree nodes as simple maps. Node :0004 has a :tag value of :b and 2 child nodes (Kid HIDs => "khids").

Modify the value of the [:c 4] node:

(let [c4-hid (-> c4-paths only last)]
  (is= c4-hid :0002)
  (value-update c4-hid inc)
  (is= (hid->node c4-hid) {:tupelo.forest/khids [], :tag :c, :value 5}) )

Creating a Tree

The easiest way to create a tree is by using Hiccup:

(ns xyz
  (:use tupelo.core tupelo.forest))

(with-forest (new-forest)
  (let [root-hid (add-tree-hiccup [:a
                                   [:b 1]
                                   [:b 2]
                                   [:b
                                    [:c 4]
                                    [:c 5]]
                                   [:c 9]])]

The expression (with-forest …​) defines a context for all contained expressions. Here we create a new (empty) forest data-structure with (new-forest). The expression (add-tree-hiccup …​) inserts a tree (represented as hiccup data) into the forest, returning the address of the tree root at root-hid. Each forest can contain many trees.

  • add-tree-enlive

  • add-tree-xml

What is an HID?

tupelo.forest uses an opaque value called an Hexadecimal ID (HID) as a pointer to each tree node. An HID is a Clojure keyword derived from the SHA-1 of a UUID and are 20 hex digits long (160 bits total). Some typical HID values might be:

    :c3b0dccd4d344ac765183f49940f4d685de7a3f5
    :b40b6f37e6a746f815b092a8590cefe5cf37121a
    :c3b0dccd4d344ac765183f49940f4d685de7a3f5
    :76859beedd81468b4ee3cc5f17a5fdcf7a34a787

The HID format is designed so that each node will always have a unique ID value, without requireing coordination with trees created in other locations or at other times. Since 2^160 is approximately equal to the number of atoms on Earth, we can be confident that no two tree nodes will ever have the same HID value.

Debugging with HIDs

At times, it may be easier to perform debugging or other tasks using short, deterministic HIDs. In this case, you may use (with-debug-hid …​) to wrap an entire forest expression:

(with-debug-hid
  (with-forest (new-forest)
    ... ))

The (with-debug-hid …​) form will cause all HIDs to be limited to 4 hex digits (65536 values max). The HIDs will also be created deterministically, counting up from :0000. Some typical HIDs created using with-debug-hid might be:

    :0000
    :0001
    :0002
    :0003

Displaying a Tree

  • hid->tree

  • hid->bush

  • hid->hiccup

  • hid->enlive

Searching a Tree

  • find-paths

What is a Path?

A path is nothing more than a vector of HIDs. It describes tha path from one node to one of its descendant nodes. Each node in the path is represented by its HID in the path vector.

Displaying a Path

  • format-paths

Getting Node Information

  • attribute(s)

  • hid->attr

  • hid->attrs

  • hid->bush

  • hid->enlive

  • hid->higgup

  • hid->kids

  • hid->leaf

  • hid->node

  • hid->tree

Manipulating a Tree

Adding Nodes

  • node

  • leaf

  • tree

Modifying Child Nodes

  • kids-append

  • kids-prepend

  • kids-set

  • kids-update

Modifying Node Attributes

  • get

  • set

  • remove

  • update

Converting Between Formats

  • bush

  • enlive

  • hiccup

  • tree

Working with Sibling Nodes

Suppose we have some Hiccup nodes like the following:

  (with-debug-hid
    (with-forest (new-forest)
      (let [root-hid        (add-tree-hiccup
                              [:div {:class :some-div-1}
                               [:div {:class :some-div-2}
                                [:label "Some Junk"]
                                [:div {:class :some-div-3}
                                 [:label "Specify your shipping address"]
                                 [:div {:class :some-div-4}
                                  [:input {:type        "text" :autocomplete "off" :required "required"
                                           :placeholder "" :class "el-input__inner"}]]]]])

We want to find the :input node in the same :div as the :label node with text "Specify your shipping address". We then find its parent, and use the parent as the beginning of a new search for the desired :input node:

label-path                   (only (find-paths root-hid [:** {:tag :label :value "Specify your shipping address"}]))
parent-div-hid               (-> label-path reverse second)
shipping-address-input-hid   (find-hid parent-div-hid [:div :div :input])

Unit test show it working:

(is= label-path [:0006 :0005 :0004 :0001])
(is= parent-div-hid :0004)
(is= (hid->hiccup shipping-address-input-hid)
  [:input {:type        "text", :autocomplete "off", :required "required",
           :placeholder "", :class "el-input__inner"}])
(value-set shipping-address-input-hid "1234 Main St")
(is= (hid->hiccup shipping-address-input-hid)
  [:input {:type         "text", :autocomplete "off", :required     "required",
           :placeholder  "", :class        "el-input__inner"}
   "1234 Main St"])

We can output the final modified tree:

(hid->hiccup root-hid) =>
    [:div
     {:class :some-div-1}
     [:div
      {:class :some-div-2}
      [:label "Some Junk"]
      [:div
       {:class :some-div-3}
       [:label "Specify your shipping address"]
       [:div
        {:class :some-div-4}
        [:input
         {:type "text",
          :autocomplete "off",
          :required "required",
          :placeholder "",
          :class "el-input__inner"}
         "1234 Main St"]]]]]