Add string schema properties #587

Open
wants to merge 7 commits into
base: master
75 changes: 74 additions & 1 deletion src/malli/core.cljc
@@ -570,6 +570,79 @@
(when-let [ns-name (some-> properties :namespace name)]
(fn [x] (= (namespace x) ns-name))))

+;;
+;; string schema helpers
+;;
+
+#?(:cljs (defn -numeric-char? [c] (and (< 47 c) (< c 58))))
Member:
These checks work differently than the JVM versions: they accept only ASCII characters, while the JVM Character predicates also accept non-ASCII Unicode characters.

+#?(:cljs (defn -upper-alpha-char? [c] (and (< 64 c) (< c 91))))
+#?(:cljs (defn -lower-alpha-char? [c] (and (< 96 c) (< c 123))))
+#?(:cljs (defn -letter? [c] (or (-lower-alpha-char? c) (-upper-alpha-char? c))))
+#?(:cljs (defn -alphanumeric? [c] (or (-letter? c) (-numeric-char? c))))
+
+(defn -charset-predicate
+  [o]
+  (case o
+    :digit #?(:clj #(Character/isDigit ^char %) :cljs -numeric-char?)
Member:
By the way, Java has both char and int versions of these predicates:

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLetter(char)
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isLetter(int)

The int version supports supplementary Unicode characters (code points above 0xFFFF).

Member:
Okay, this is probably fine. A char covers 0-65535, which is enough to support the Unicode range 0x0000 - 0xFFFF.

Clojure characters don't support the supplementary ranges either, though strings do:

0x2F81A:

\冬
=> Unsupported character: \冬

(.codePointAt "冬" 0)
=> 194586

(char (.codePointAt "冬" 0))
=> Value out of range for char: 194586

(Character/isLetter (int 0x2F81A))
=> true

(int (.charAt "冬" 0))
=> 55422 ; in this case .charAt returns only the first UTF-16 code unit (the high surrogate, 0xD87E), not the full code point.
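
To illustrate the difference discussed above, here is a minimal JVM-only sketch that walks a string by code point rather than by char. The helper names code-points and all-letters? are hypothetical and are not part of this PR; only standard java.lang.String and java.lang.Character methods are used.

```clojure
(defn code-points
  "Returns a vector of the Unicode code points in s (hypothetical helper, JVM only)."
  [^String s]
  (let [n (.length s)]
    (loop [i 0, acc []]
      (if (< i n)
        (let [cp (.codePointAt s i)]
          ;; Character/charCount is 2 for supplementary code points, 1 otherwise
          (recur (+ i (Character/charCount cp)) (conj acc cp)))
        acc))))

(defn all-letters?
  "True when every code point in s is a letter, using the int overload of isLetter."
  [s]
  (every? #(Character/isLetter (int %)) (code-points s)))

;; U+2F81A written as its UTF-16 surrogate pair
(def sample "\uD87E\uDC1A")

(code-points sample)                             ; a single code point, 0x2F81A
(all-letters? sample)                            ; tests the whole code point
(Character/isLetter (.charAt ^String sample 0))  ; tests only the high surrogate
```

A char-by-char loop like the one in this diff sees the two surrogates separately, so a predicate built on the char overload cannot classify supplementary characters.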

Member:
JS charCodeAt works the same as JVM charAt.

Contributor Author:
The problem is that I'm using charAt, which returns a char, not an int.

Contributor:
0x2F81A (which is \⾁) is supported under the 12161 unicode: (link)

(int \⾁)
=> 12161

Same with all other unicode characters:

(char 8809)
=> \≩
(char 508)
=> \Ǽ
(int \Ǽ)
=> 508
(int \≩)
=> 8809
(int \Θ)
=> 920
(char 33071)
=> \脯

+    :letter #?(:clj #(Character/isLetter ^char %) :cljs -letter?)
+    (:alphanumeric :letter-or-digit) #?(:clj #(Character/isLetterOrDigit ^char %) :cljs -alphanumeric?)
+    :alphabetic #?(:clj #(Character/isAlphabetic (int %)) :cljs -letter?)
+    (cond
+      (set? o) (miu/-some-pred (mapv -charset-predicate o))
+      (char? o) #?(:clj #(= ^char o %) :cljs (let [i (.charCodeAt o 0)] #(= i %)))
+      :else (eval o))))
+
+(defn string-char-predicate
+  [p]
+  (fn charset-pred ^Boolean [^String s]
+    (let [n #?(:clj (.length s) :cljs (.-length s))]
+      (loop [i 0]
+        (if (= i n)
+          true
+          (if (p #?(:clj (.charAt s (unchecked-int i))
+                    :cljs (.charCodeAt s (unchecked-int i))))
+            (recur (unchecked-inc i))
+            false))))))
+
+#?(:clj
+   (defn find-blank-method
Member:
Could we raise the minimum Java version to 11 and remove this?

Contributor Author:
That's a huge breaking change for users. Sadly, there's still plenty of Java 8 in the world, and we must accommodate it.

Member (@Deraen, Dec 6, 2021):
@ikitommi Definitely. About half of our work projects are still on Java 8.

+     []
+     (try
+       (.getMethod String "isBlank" (into-array Class []))
+       #(.isBlank ^String %)
+       (catch Exception _
+         (require 'clojure.string)
+         clojure.string/blank?))))
+
+#?(:clj (def blank? (find-blank-method))
+   :cljs (defn blank? [^String s] (zero? (.-length (.trim s)))))
+
+(defn -string-predicates
+  ([{:keys [charset pattern non-blank]}]
+   (let [pattern
+         (when pattern
+           (let [pattern (re-pattern pattern)]
+             #?(:clj #(.find (.matcher ^Pattern pattern ^String %))
+                :cljs #(boolean (re-find pattern %)))))
+         charset
+         (when charset
+           (let [p (-charset-predicate charset)]
+             (string-char-predicate p)))
+         non-blank (when non-blank #(not (blank? %)))]
+     (-> non-blank
Member:
Is :non-blank needed? It's much shorter to use :min:

[:string {:non-blank true}]
[:string {:min 1}]

Contributor Author:
" " is a valid string for {:min 1}, but it is invalid for :non-blank.

Contributor Author:
This could be solved by transformers, but users may well want to specify a string that has a minimum length and is also not blank. Blank in this case is a superset of empty, which I almost missed in the beginning.
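
A minimal sketch of the distinction made in this thread, using only clojure.core and clojure.string rather than the schema properties proposed in this PR. The helper names min-length? and non-blank? are hypothetical.

```clojure
(require '[clojure.string :as str])

(defn min-length?
  "True when s has at least n characters (what {:min n} checks)."
  [n s]
  (>= (count s) n))

(defn non-blank?
  "True when s contains at least one non-whitespace character."
  [s]
  (not (str/blank? s)))

(min-length? 1 " ")  ; true: a single space has length 1
(non-blank? " ")     ; false: whitespace only
(non-blank? "")      ; false: empty is also blank
(non-blank? "a")     ; true
```

The two checks are independent, which is why a schema may need both a minimum length and a non-blank constraint.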

+         (miu/-maybe-and charset)
+         (miu/-maybe-and pattern)))))
+
+(defn -string-property-pred
+  []
+  (fn [properties]
+    (miu/-maybe-and
+     ((-min-max-pred
+       #?(:clj #(.length ^String %)
+          :cljs #(.-length ^String %)))
+      properties)
+     (-string-predicates properties))))

;;
;; Schemas
;;
@@ -625,7 +698,7 @@

 (defn -nil-schema [] (-simple-schema {:type :nil, :pred nil?}))
 (defn -any-schema [] (-simple-schema {:type :any, :pred any?}))
-(defn -string-schema [] (-simple-schema {:type :string, :pred string?, :property-pred (-min-max-pred count)}))
+(defn -string-schema [] (-simple-schema {:type :string, :pred string?, :property-pred (-string-property-pred)}))
 (defn -int-schema [] (-simple-schema {:type :int, :pred int?, :property-pred (-min-max-pred nil)}))
 (defn -double-schema [] (-simple-schema {:type :double, :pred double?, :property-pred (-min-max-pred nil)}))
 (defn -boolean-schema [] (-simple-schema {:type :boolean, :pred boolean?}))
51 changes: 40 additions & 11 deletions src/malli/generator.cljc
@@ -36,14 +36,45 @@

(defn- -double-gen [options] (gen/double* (merge {:infinite? false, :NaN? false} options)))

-(defn- -string-gen [schema options]
-  (let [{:keys [min max]} (-min-max schema options)]
+(def ^:private char-numeric (gen/fmap char (gen/choose 48 57)))
+
+(defn- -char-gen
+  [k]
+  (case k
+    :digit char-numeric
+    :letter gen/char-alpha
+    (:alphanumeric :letter-or-digit) gen/char-alphanumeric
     (cond
-      (and min (= min max)) (gen/fmap str/join (gen/vector gen/char min))
-      (and min max) (gen/fmap str/join (gen/vector gen/char min max))
-      min (gen/fmap str/join (gen/vector gen/char min (* 2 min)))
-      max (gen/fmap str/join (gen/vector gen/char 0 max))
-      :else gen/string-alphanumeric)))
+      (set? k)
+      (let [chars (into [] (filter char? k))
+            chars (gen/fmap chars (gen/choose 0 (dec (count chars))))
+            gens (into [] (comp (remove char?) (map -char-gen)) k)]
+        (gen/one-of (conj gens chars))))))
+
+#?(:clj
+   (defn- -string-from-regex [re]
+     (if-let [string-from-regex @(dynaload/dynaload 'com.gfredericks.test.chuck.generators/string-from-regex {:default nil})]
+       (string-from-regex (re-pattern (str/replace (str re) #"^\^?(.*?)(\$?)$" "$1")))
+       (m/-fail! :test-chuck-not-available))))
+
+(defn- -string-gen [schema options]
+  (let [{:keys [min max]} (-min-max schema options)
+        {:keys [charset pattern non-blank]
+         :or {charset :alphanumeric}} (m/properties schema options)
+        min (cond
+              (and min non-blank) (clojure.core/max min 1) ; :non-blank is a boolean flag, so the floor is 1
+              non-blank 1
+              min min)]
+    (if pattern
+      #?(:clj (-string-from-regex pattern) :cljs (m/-fail! ::unsupported-generator))
+      (let [seed (-char-gen charset)
+            gen (cond
+                  (and min (= min max)) (gen/vector seed min)
+                  (and min max) (gen/vector seed min max)
+                  min (gen/vector seed min (* 2 min))
+                  max (gen/vector seed 0 max)
+                  :else (gen/vector seed))]
+        (gen/fmap str/join gen)))))

(defn- -coll-gen [schema f options]
(let [{:keys [min max]} (-min-max schema options)
@@ -101,10 +132,8 @@
 #?(:clj
    (defn -re-gen [schema options]
      ;; [com.gfredericks/test.chuck "0.2.10"+]
-     (if-let [string-from-regex @(dynaload/dynaload 'com.gfredericks.test.chuck.generators/string-from-regex {:default nil})]
-       (let [re (or (first (m/children schema options)) (m/form schema options))]
-         (string-from-regex (re-pattern (str/replace (str re) #"^\^?(.*?)(\$?)$" "$1"))))
-       (m/-fail! :test-chuck-not-available))))
+     (let [re (or (first (m/children schema options)) (m/form schema options))]
+       (-string-from-regex re))))

(defn -ref-gen [schema options]
(let [gen* (delay (generator (m/deref-all schema) options))]
7 changes: 7 additions & 0 deletions src/malli/impl/util.cljc
@@ -65,3 +65,10 @@
 (def ^{:arglists '([[& preds]])} -some-pred
   #?(:clj (-pred-composer or 16)
      :cljs (fn [preds] (fn [x] (boolean (some #(% x) preds))))))
+
+(defn -maybe-and
+  [f g]
+  (cond
+    (and f g) #(and (f %) (g %))
+    f f
+    g g))
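
A short usage sketch of the helper above, with the definition copied under the hypothetical name maybe-and so the snippet runs on its own. It shows the three cases: both predicates present, one present, and neither.

```clojure
(defn maybe-and
  "Combines two optional predicates: both present -> logical AND,
  only one present -> that one, neither -> nil."
  [f g]
  (cond
    (and f g) #(and (f %) (g %))
    f f
    g g))

;; both predicates present
(def short-string? (maybe-and string? #(< (count %) 5)))

(short-string? "abc")     ; true
(short-string? "abcdef")  ; false
(short-string? 123)       ; false, string? short-circuits

;; one side missing: the other predicate is returned unchanged
((maybe-and nil even?) 4) ; true

;; nothing to check: nil, so callers can skip the property check entirely
(maybe-and nil nil)       ; nil
```

Because nil passes straight through, the helper can be threaded with -> over several optional predicates, as -string-predicates does in this PR.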
13 changes: 12 additions & 1 deletion src/malli/json_schema.cljc
@@ -136,7 +136,18 @@
 (defmethod accept :nil [_ _ _ _] {:type "null"})

 (defmethod accept :string [_ schema _ _]
-  (merge {:type "string"} (-> schema m/properties (select-keys [:min :max]) (set/rename-keys {:min :minLength, :max :maxLength}))))
+  (let [props (-> schema m/properties)
+        pattern (case (:charset props)
+                  :digit "^[0-9]*$"
Member:
These patterns don't match the Character predicates, the predicates allow other ranges, e.g.:

'\u0030' through '\u0039', ISO-LATIN-1 digits ('0' through '9')
'\u0660' through '\u0669', Arabic-Indic digits
'\u06F0' through '\u06F9', Extended Arabic-Indic digits
'\u0966' through '\u096F', Devanagari digits
'\uFF10' through '\uFF19', Fullwidth digits

At least the JVM Pattern has some character classes that might match the predicates, but I'm not sure whether there are equivalents for all of them, or what JS supports. Listing all the ranges might work for some cases.

Contributor Author:
I can use Unicode ranges; this works:

(re-find #"[\u0061-\u00F6]" "a")
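
A small JVM-only sketch of the mismatch raised in this thread: Character/isDigit accepts digits outside ASCII, while the ASCII pattern does not. It also tries the \p{Nd} class (Unicode decimal digits) from java.util.regex.Pattern as one possible counterpart to Character/isDigit. Whether an equivalent class is available to every JSON Schema consumer or JS engine is an assumption that is not checked here.

```clojure
;; U+0660 is ARABIC-INDIC DIGIT ZERO, one of the ranges listed above
(def arabic-zero "\u0660")

;; the JVM predicate accepts it
(Character/isDigit (.charAt ^String arabic-zero 0))  ; true

;; the ASCII-only pattern emitted for :digit rejects it
(re-matches #"^[0-9]*$" arabic-zero)                 ; nil

;; the Unicode decimal-digit class accepts it, and still accepts ASCII digits
(some? (re-matches #"^\p{Nd}*$" arabic-zero))        ; true
(some? (re-matches #"^\p{Nd}*$" "0123"))             ; true
```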

+                  :letter "^[a-zA-Z]*$"
+                  (:alphanumeric :letter-or-digit) "^[a-zA-Z0-9]*$"
+                  nil)
+        props (cond-> props pattern (assoc :pattern pattern))]
+    (merge
+     {:type "string"}
+     (-> props
+         (select-keys [:min :max :pattern])
+         (set/rename-keys {:min :minLength, :max :maxLength})))))

(defmethod accept :int [_ schema _ _]
(merge {:type "integer"} (-> schema m/properties (select-keys [:min :max]) (set/rename-keys {:min :minimum, :max :maximum}))))
41 changes: 41 additions & 0 deletions test/malli/core_test.cljc
@@ -2622,3 +2622,44 @@
   (is (= ["1"] (m/-vmap str (subvec [1 2] 0 1))))
   (is (= ["1"] (m/-vmap str (lazy-seq [1]))))
   (is (= ["1" "2"] (m/-vmap str [1 2]))))
+
+(deftest string-test
+  (testing "pattern"
+    (let [s (m/schema [:string {:pattern "foo"}])]
+      (is (true? (m/validate s "foo")))
+      (is (true? (m/validate s "afoo")))
+      (is (true? (m/validate s "fooa")))
+      (is (false? (m/validate s "foao"))))
+    (let [s (m/schema [:string {:pattern "^foo"}])]
+      (is (true? (m/validate s "foo")))
+      (is (false? (m/validate s "afoo")))
+      (is (true? (m/validate s "fooa")))
+      (is (false? (m/validate s "foao")))))
+  (testing "charset"
+    (let [s (m/schema [:string {:charset :alphabetic}])]
+      (is (true? (m/validate s "foo")))
+      (is (false? (m/validate s "fo1o"))))
+    (let [s (m/schema [:string {:charset :letter}])]
+      (is (true? (m/validate s "foo")))
+      (is (false? (m/validate s "fo1o"))))
+    (let [s (m/schema [:string {:charset :letter-or-digit}])]
+      (is (true? (m/validate s "foo")))
+      (is (true? (m/validate s "fo0")))
+      (is (false? (m/validate s "f-1o"))))
+    (let [s (m/schema [:string {:charset #{\- :letter-or-digit}}])]
+      (is (true? (m/validate s "foo")))
+      (is (true? (m/validate s "fo0")))
+      (is (true? (m/validate s "f-1x")))
+      (is (false? (m/validate s "f?1o")))))
+  (testing "non blank"
+    (let [s (m/schema [:string {:non-blank true}])]
+      (is (true? (m/validate s "foo")))
+      (is (false? (m/validate s "")))
+      (is (false? (m/validate s " ")))))
+  (testing "Combined"
+    (let [s (m/schema [:string {:non-blank true :pattern "foo" :charset :letter-or-digit}])]
+      (is (true? (m/validate s "foo")))
+      (is (false? (m/validate s "")))
+      (is (false? (m/validate s " ")))
+      (is (false? (m/validate s " foo ")))
+      (is (true? (m/validate s "foo0"))))))