MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Jun 4, 2024 - Python
Go metrics for calculating string similarity and other string utility functions
Compare HTML similarity using structural and style metrics
A package to compute medical segmentation metrics.
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
A Clojure library for querying large data-sets on similarity
Spark functions to run popular phonetic and string matching algorithms
SetSketch: Filling the Gap between MinHash and HyperLogLog
Calculate various string metrics efficiently in Haskell
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
A job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett
Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.
Easy-to-use Java similarity algorithms for text and numeric-series
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Text similarity computation using MinHash and Jaccard distance on the Reuters dataset
Text Matching Based on LCQMC: A Large-scale Chinese Question Matching Corpus
Explores the performance of Jaccard and cosine similarity, then visualises the output using k-means and k-means with PCA. Also covers time series analysis, web scraping, and Twitter scraping.