MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Jun 4, 2024 - Python
Dynatrace hash library for Java
A Clojure library for querying large data sets by similarity
Paper about the estimation of cardinalities from HyperLogLog sketches
DynaHist: A Dynamic Histogram Library for Java
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Memory-efficient Count-Min Sketch Counter (based on Madoka C++ library)
Implementation of "Mitigating DNS random subdomain DDoS attacks by distinct heavy hitters sketches"
UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting
ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale
Approximate Sketches for Join Size Estimation (SIGMOD'24)
A Prototype For Fitting Monotonic Cubic Splines to a t-digest Sketch
Yet Another Lame Algorithm Library
A barebones implementation of the simhash data sketching algorithm.
Program to test the performance of data sketches such as FastExpSketch and QSketch