Commit

fix wraparound issue in Jaro/JaroWinkler
maxbachmann committed Oct 31, 2023
1 parent ea6962a commit c1f0d0d
Showing 5 changed files with 22 additions and 3 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.rst
@@ -1,7 +1,7 @@
Changelog
---------

-[3.5.0] - 2023-10-30
+[3.5.0] - 2023-10-31
^^^^^^^^^^^^^^^^^^^^
Changed
~~~~~~~
@@ -19,6 +19,7 @@ Performance
Fixed
~~~~~
* the preprocessing function was always called through Python due to a broken C-API version check
* fix wraparound issue in simd implementation of Jaro and Jaro Winkler

[3.4.0] - 2023-10-09
^^^^^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -91,7 +91,7 @@ else()
add_library(Taskflow::Taskflow ALIAS Taskflow)
endif()

-find_package(rapidfuzz 2.2.0 QUIET)
+find_package(rapidfuzz 2.2.1 QUIET)
if(rapidfuzz_FOUND)
message(STATUS "Using system supplied version of rapidfuzz-cpp")
else()
2 changes: 1 addition & 1 deletion extern/rapidfuzz-cpp
11 changes: 11 additions & 0 deletions tests/distance/test_Jaro.py
@@ -11,6 +11,9 @@ def test_hash_special_case():


def test_edge_case_lengths():
    """
    these are largely found by fuzz tests and implemented here as regression tests
    """
    assert pytest.approx(Jaro.similarity("", "")) == 1
    assert pytest.approx(Jaro.similarity("0", "0")) == 1
    assert pytest.approx(Jaro.similarity("00", "00")) == 1
@@ -20,6 +23,14 @@ def test_edge_case_lengths():
    assert pytest.approx(Jaro.similarity("0" * 64, "0" * 65)) == 0.994872
    assert pytest.approx(Jaro.similarity("0" * 63, "0" * 65)) == 0.989744

    s1 = "000000001"
    s2 = "0000010"
    assert pytest.approx(Jaro.similarity(s1, s2)) == 0.878307

    s1 = "01234567"
    s2 = "0" * 170 + "7654321" + "0" * 200
    assert pytest.approx(Jaro.similarity(s1, s2)) == 0.548740

    s1 = "10000000000000000000000000000000000000000000000000000000000000020"
    s2 = "00000000000000000000000000000000000000000000000000000000000000000"
    assert pytest.approx(Jaro.similarity(s1, s2)) == 0.979487
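
The expected values in the regression tests above can be cross-checked against a plain, non-SIMD reference. The following is a minimal sketch of the textbook Jaro similarity, not the code from rapidfuzz-cpp; the function name `jaro_reference` is hypothetical. It assumes the usual match window of `max(len1, len2) // 2 - 1` and counts transpositions as half the number of mismatched positions between the two matched sequences.

```python
def jaro_reference(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (hypothetical reference, not the library code)."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only match if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        # Take the first unmatched, equal character inside the window.
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Compare the matched characters in order; half the mismatches
    # (rounded down) are transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


if __name__ == "__main__":
    print(round(jaro_reference("000000001", "0000010"), 6))
    print(round(jaro_reference("01234567", "0" * 170 + "7654321" + "0" * 200), 6))
```

A quadratic reference like this is slow on long inputs, but it has no fixed-width arithmetic and so cannot wrap around, which makes it a convenient oracle when fuzzing a vectorized implementation.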
7 changes: 7 additions & 0 deletions tests/distance/test_JaroWinkler.py
@@ -11,6 +11,9 @@ def test_hash_special_case():


def test_edge_case_lengths():
    """
    these are largely found by fuzz tests and implemented here as regression tests
    """
    assert pytest.approx(JaroWinkler.similarity("", "")) == 1.0
    assert pytest.approx(JaroWinkler.similarity("0", "0")) == 1
    assert pytest.approx(JaroWinkler.similarity("00", "00")) == 1
@@ -20,6 +23,10 @@ def test_edge_case_lengths():
    assert pytest.approx(JaroWinkler.similarity("0" * 64, "0" * 65)) == 0.996923
    assert pytest.approx(JaroWinkler.similarity("0" * 63, "0" * 65)) == 0.993846

    s1 = "000000001"
    s2 = "0000010"
    assert pytest.approx(JaroWinkler.similarity(s1, s2)) == 0.926984

    s1 = "10000000000000000000000000000000000000000000000000000000000000020"
    s2 = "00000000000000000000000000000000000000000000000000000000000000000"
    assert pytest.approx(JaroWinkler.similarity(s1, s2)) == 0.979487
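
The Jaro-Winkler expectations above follow from the Jaro values in `tests/distance/test_Jaro.py` once the common-prefix bonus is applied. The sketch below shows that relationship under stated assumptions: the helper name `winkler_boost` is hypothetical, the prefix is capped at 4 characters with a weight of 0.1, and the bonus is applied only above the conventional 0.7 threshold. None of these parameters are confirmed by this commit.

```python
def winkler_boost(jaro_sim: float, s1: str, s2: str,
                  prefix_weight: float = 0.1,
                  max_prefix: int = 4,
                  threshold: float = 0.7) -> float:
    """Apply the Winkler prefix bonus to an existing Jaro score (hypothetical helper)."""
    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1

    # Assumption: scores at or below the threshold are returned unchanged.
    if jaro_sim <= threshold:
        return jaro_sim
    return jaro_sim + prefix * prefix_weight * (1.0 - jaro_sim)


if __name__ == "__main__":
    # Jaro score taken from the Jaro regression test for the same pair.
    print(round(winkler_boost(0.878307, "000000001", "0000010"), 6))
```

The last test pair shares no prefix (`"1…"` against `"0…"`), which is why its Jaro-Winkler score equals its Jaro score of 0.979487.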
