Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum #7797

Merged
merged 30 commits into from
Jun 25, 2024

Conversation

ochafik
Copy link
Collaborator

@ochafik ochafik commented Jun 6, 2024

Building upon #6640, I went on a wild goose chase wondering whether one could support numeric bounds with grammars, and I'm relatively happy with the provisional outcome.

{"type": "integer", "minimum": 0}
root ::= ([0] | [1-9] [0-9]{0,15}) space
{"type": "integer", "minimum": 3}
root ::= ([1-2] [0-9]{1,15} | [3-9] [0-9]{0,15}) space
{"type": "integer", "minimum": -123, "maximum": 42}
root ::= ("-" ([0-9] | ([1-8] [0-9] | [9] [0-9]) | "1" ([0-1] [0-9] | [2] [0-3])) | [0-9] | ([1-3] [0-9] | [4] [0-2])) space

This is still a WIP, here are some TODOs:

  • Increase test coverage
  • Fix some of the heading zeros (& bad numbers) allowed
  • Port to JS
  • Port to Python
  • Benchmark the resulting grammars (how much do we pay for this?)

Possible follow ups:

  • Add decimal numbers

/cc @HanClinto this since you mentioned it in #7789 :-D

@github-actions github-actions bot added the testing Everything test related label Jun 6, 2024
@ochafik ochafik added enhancement New feature or request Review Complexity : High Generally require indepth knowledge of LLMs or GPUs and removed testing Everything test related labels Jun 6, 2024
@github-actions github-actions bot added the testing Everything test related label Jun 8, 2024
@github-actions github-actions bot added examples python python script changes server labels Jun 8, 2024
@p-e-w
Copy link

p-e-w commented Jun 9, 2024

{"type": "integer", "minimum": 3}
root ::= ([0-2] [0-9]{1,15} | [3-9] [0-9]{0,15}) space

This rule matches strings like 01, which are neither valid JSON integers nor greater than or equal to 3.

@ochafik
Copy link
Collaborator Author

ochafik commented Jun 9, 2024

{"type": "integer", "minimum": 3}
root ::= ([0-2] [0-9]{1,15} | [3-9] [0-9]{0,15}) space

This rule matches strings like 01, which are neither valid JSON integers nor greater than or equal to 3.

@p-e-w Oops good catch, thanks! That was one of the "heading zeros allowed" I had on my radar, hadn't realized it was also letting through bad numbers. Fixed.

@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jun 12, 2024
HanClinto added a commit to HanClinto/llama.cpp that referenced this pull request Jun 12, 2024
HanClinto added a commit to HanClinto/llama.cpp that referenced this pull request Jun 12, 2024
HanClinto added a commit to HanClinto/llama.cpp that referenced this pull request Jun 22, 2024
HanClinto added a commit that referenced this pull request Jun 22, 2024
* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.

* Adding additional examples as documented in #7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.

* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.

* Merging improved schema test methods added by @ochafik in #7797

* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.

* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.

* Fixing grammar indentation to be consistent throughout file.
@ochafik ochafik changed the title [WIP] json: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum json: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum Jun 24, 2024
@ochafik ochafik marked this pull request as ready for review June 24, 2024 00:43
);

test_schema(
"min 0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This schema looks identical with the "simple min 0" test from line 154. Can they be merged together to shorten this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

@@ -800,6 +948,165 @@ static void test_json_schema() {
}
);

test_schema(
"min -123 max 42",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This schema looks identical to the one from the "simple min -123 max 42" test on line 249. Can they be merged together?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, reshuffled / merged them all, thanks!

@@ -16,7 +16,7 @@ static std::string join(Iterator begin, Iterator end, const std::string & separa

static std::string repeat(const std::string & str, size_t n);

static std::string build_repetition(const std::string & item_rule, int min_items, int max_items, const std::string & separator_rule = "") {
static std::string build_repetition(const std::string & item_rule, const int min_items, int max_items, const std::string & separator_rule = "") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're changing min_items to const, is it worth adding const to max_items as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped unintended change, thanks!

@@ -12,6 +12,9 @@
#include <cassert>
#include <string>
#include <vector>
#include <json-schema-to-grammar.h>

using json = nlohmann::ordered_json;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a fault of the way I merged the code, but we don't need these 3 lines because json-schema-to-grammar.h is already included, and this using line is already here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry bad merge, fixed

Copy link
Collaborator

@HanClinto HanClinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link
Collaborator

@HanClinto HanClinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice defensive programming!

@ochafik ochafik merged commit 84631fe into ggerganov:master Jun 25, 2024
62 of 63 checks passed
@HanClinto
Copy link
Collaborator

HanClinto commented Jun 25, 2024

Very excited to see these changes getting merged in!! :) 🚀 🚀 🚀

@ochafik ochafik deleted the json-bounds2 branch June 28, 2024 20:11
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.

* Adding additional examples as documented in ggerganov#7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.

* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.

* Merging improved schema test methods added by @ochafik in ggerganov#7797

* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.

* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.

* Fixing grammar indentation to be consistent throughout file.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
…Maximum (ggerganov#7797)

* json: support minimum for positive integer values

* json: fix min 0

* json: min + max integer constraints

* json: handle negative min / max integer bounds

* json: fix missing paren min/max bug

* json: proper paren fix

* json: integration test for schemas

* json: fix bounds tests

* Update json-schema-to-grammar.cpp

* json: fix negative max

* json: fix negative min (w/ more than 1 digit)

* Update test-grammar-integration.cpp

* json: nit: move string rules together

* json: port min/max integer support to Python & JS

* nit: move + rename _build_min_max_int

* fix min in [1, 9]

* Update test-grammar-integration.cpp

* add C++11-compatible replacement for std::string_view

* add min/max constrained int field to pydantic json schema example

* fix merge

* json: add integration tests for min/max bounds

* reshuffle/merge min/max integ test cases

* nits / cleanups

* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024
…Maximum (ggerganov#7797)

* json: support minimum for positive integer values

* json: fix min 0

* json: min + max integer constraints

* json: handle negative min / max integer bounds

* json: fix missing paren min/max bug

* json: proper paren fix

* json: integration test for schemas

* json: fix bounds tests

* Update json-schema-to-grammar.cpp

* json: fix negative max

* json: fix negative min (w/ more than 1 digit)

* Update test-grammar-integration.cpp

* json: nit: move string rules together

* json: port min/max integer support to Python & JS

* nit: move + rename _build_min_max_int

* fix min in [1, 9]

* Update test-grammar-integration.cpp

* add C++11-compatible replacement for std::string_view

* add min/max constrained int field to pydantic json schema example

* fix merge

* json: add integration tests for min/max bounds

* reshuffle/merge min/max integ test cases

* nits / cleanups

* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request examples python python script changes Review Complexity : High Generally require indepth knowledge of LLMs or GPUs Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level server testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants