Add :code-point-limit option to accept bigger documents (#103)
* Add code-point-limit option to user guide

* Add :code-point-limit option

* Update changelog
pitalig authored Jul 10, 2023
1 parent cb82508 commit ab98016
Showing 4 changed files with 37 additions and 3 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.adoc
@@ -19,6 +19,9 @@ Clj-yaml makes use of SnakeYAML, please also refer to the https://bitbucket.org/
** Added `:nesting-depth-limit` to `parse-string` and `parse-stream`
(https://github.com/clj-commons/clj-yaml/issues/81[#81])
(https://github.com/neeasade[@neeasade])
** Added `:code-point-limit` option to accept bigger documents
(https://github.com/clj-commons/clj-yaml/issues/94[#94])
(https://github.com/pitalig[@pitalig])
* Quality
** Stop using deprecated SnakeYAML Representer constructor
(https://github.com/clj-commons/clj-yaml/issues/76[#76])
14 changes: 14 additions & 0 deletions doc/01-user-guide.adoc
@@ -204,6 +204,20 @@ You can ask clj-yaml to return parsed YAML with extra positional data markers vi

In reality, the `:start` `:end` and `:unmark` maps are internally a record and can be recognized via `marked?` and unwrapped via `unmark`.

==== Document size limit [[size-limit]]

SnakeYAML (which clj-yaml uses for low-level encoding and decoding) imposes a default document size limit of 3145728 code points (3 * 1024 * 1024, roughly 3 megabytes) for security reasons (https://bitbucket.org/snakeyaml/snakeyaml/issues/547/restrict-the-size-of-incoming-data[issue]). If you hit this limit, increase it explicitly by setting the `:code-point-limit` option:

[source,clojure]
----
(parse-string bigger-than-default-limit)
;; Execution error (YAMLException) at org.yaml.snakeyaml.scanner.ScannerImpl/fetchMoreTokens (ScannerImpl.java:342).
;; The incoming YAML document exceeds the limit: 3145728 code points.
(parse-string bigger-than-default-limit :code-point-limit (* 10 1024 1024))
;; => returns the parsed data instead of throwing
----
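
As a rough way to choose a suitable limit, you can count the code points in a document before parsing it. The sketch below uses only Clojure and Java interop; `code-point-count`, `big-doc` and `default-limit` are hypothetical names introduced for illustration and are not part of clj-yaml. It assumes the limit is measured in Unicode code points, as the exception message above suggests, so it counts code points instead of UTF-16 chars.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not part of clj-yaml.
;; Counts Unicode code points; a character outside the Basic Multilingual
;; Plane is two UTF-16 chars but only one code point.
(defn code-point-count
  [^String s]
  (.codePointCount s 0 (.length s)))

;; Same document shape as the test data in this commit:
;; "a: " followed by 400000 lines of "- b: foo", joined with newlines.
(def big-doc
  (->> (repeat 400000 "- b: foo")
       (cons "a: ")
       (string/join "\n")))

;; SnakeYAML's default, as reported in the exception message.
(def default-limit (* 3 1024 1024))

(code-point-count big-doc)
;; => 3600003

(> (code-point-count big-doc) default-limit)
;; => true
```

A value comfortably above the measured count, such as the `(* 10 1024 1024)` used above, can then be passed as `:code-point-limit`.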

=== Generating YAML

==== Dumper Options [[dumper-options]]
12 changes: 9 additions & 3 deletions src/clojure/clj_yaml/core.clj
@@ -89,7 +89,7 @@
Returns internal SnakeYAML loader options.
See [[parse-string]] for description of options."
^LoaderOptions [& {:keys [max-aliases-for-collections allow-recursive-keys allow-duplicate-keys nesting-depth-limit]}]
^LoaderOptions [& {:keys [max-aliases-for-collections allow-recursive-keys allow-duplicate-keys nesting-depth-limit code-point-limit]}]
(let [loader (default-loader-options)]
(when nesting-depth-limit
(.setNestingDepthLimit loader nesting-depth-limit))
@@ -99,6 +99,8 @@
(.setAllowRecursiveKeys loader allow-recursive-keys))
(when (instance? Boolean allow-duplicate-keys)
(.setAllowDuplicateKeys loader allow-duplicate-keys))
(when code-point-limit
(.setCodePointLimit loader code-point-limit))
loader))

(defn make-yaml
@@ -107,11 +109,12 @@
Returns internal SnakeYAML encoder/decoder.
See [[parse-string]] and [[generate-string]] for description of options."
^Yaml [& {:keys [unknown-tag-fn dumper-options unsafe mark max-aliases-for-collections allow-recursive-keys allow-duplicate-keys nesting-depth-limit]}]
^Yaml [& {:keys [unknown-tag-fn dumper-options unsafe mark max-aliases-for-collections allow-recursive-keys allow-duplicate-keys nesting-depth-limit code-point-limit]}]
(let [loader (make-loader-options :max-aliases-for-collections max-aliases-for-collections
:allow-recursive-keys allow-recursive-keys
:allow-duplicate-keys allow-duplicate-keys
:nesting-depth-limit nesting-depth-limit)
:nesting-depth-limit nesting-depth-limit
:code-point-limit code-point-limit)
^BaseConstructor constructor
(cond
unsafe (Constructor. loader)
@@ -290,6 +293,9 @@
- `:nesting-depth-limit` the maximum number of nested YAML levels.
- Default: `50`
- throws when value is exceeded.
- `:code-point-limit` the maximum number of code points (document size).
- Default: `3145728`
- throws when value is exceeded.
- `:allow-recursive-keys` - when `true` allows recursive keys for mappings. Only checks the case where the key is the direct value.
- Default: `false`
- `:allow-duplicate-keys` - when `false` throws on duplicate keys.
11 changes: 11 additions & 0 deletions test/clj_yaml/core_test.clj
@@ -264,6 +264,17 @@ the-bin: !!binary 0101")
(is (parse-string nested-depth-51 :nesting-depth-limit 51)
"passes when we bump max to 51"))

(def bigger-than-default-limit
(->> (repeat 400000 "- b: foo")
(cons "a: ")
(string/join "\n")))

(deftest code-point-limit-works
(is (thrown-with-msg? YAMLException #"The incoming YAML document exceeds the limit: 3145728 code points" (parse-string bigger-than-default-limit))
"throws when default of 3145728 is exceeded")
(is (parse-string bigger-than-default-limit :code-point-limit (* 10 1024 1024))
"passes when we bump limit to 10mb"))

(def recursive-yaml "
---
&A
