Commit

readme update
tower120 committed Dec 18, 2023
1 parent 96f23cb commit f99311c
Showing 1 changed file with 19 additions and 17 deletions.
36 changes: 19 additions & 17 deletions README.md
@@ -1,9 +1,11 @@
# Hierarchical sparse bitset

[![crates.io](https://img.shields.io/crates/v/hi_sparse_bitset.svg)](https://crates.io/crates/hi_sparse_bitset)
[![license](https://img.shields.io/badge/license-Apache--2.0_OR_MIT-blue?style=flat-square)](#license)
[![Docs](https://docs.rs/hi_sparse_bitset/badge.svg)](https://docs.rs/hi_sparse_bitset)
[![CI](https://github.com/tower120/hi_sparse_bitset/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/tower120/hi_sparse_bitset/actions/workflows/ci.yml)

Hierarchical sparse bitset. High performance of operations between bitsets (intersection, union, etc.).
High performance of operations between bitsets (intersection, union, etc.).
Low memory usage.

Think of [hibitset](https://crates.io/crates/hibitset), but with lower memory consumption.
@@ -18,7 +20,7 @@ algorithmic complexity on operations between bitsets.
<img src="https://github.com/tower120/hi_sparse_bitset/raw/main/doc/hisparsebitset-bg-white-50.png">
</picture>

# Usage
## Usage

```rust
type BitSet = hi_sparse_bitset::BitSet<hi_sparse_bitset::config::_128bit>;
@@ -46,7 +48,7 @@ let iter = union.iter().move_to(cursor);
assert_equal(iter, [9,10]);
```

# Memory footprint
## Memory footprint

Being truly sparse, `hi_sparse_bitset` allocates memory only for bitblocks in use.
`hi_sparse_bitset::BitSet` has tri-level hierarchy, with first and second levels
@@ -59,13 +61,13 @@ minimal(initial) = 416 bytes, maximum = 35 Kb.

See doc for more info.
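
As a rough illustration of "allocate only for bitblocks in use", here is a std-only toy. `ToySparseBitset` and its methods are hypothetical names invented for this sketch, and the flat `BTreeMap` merely stands in for the crate's tri-level hierarchy; it is not the crate's actual layout.

```rust
use std::collections::BTreeMap;

/// Toy sparse bitset (hypothetical, not the crate's implementation):
/// a 64-bit data block exists only while at least one of its bits is set.
#[derive(Default)]
struct ToySparseBitset {
    /// block number -> 64-bit data block
    blocks: BTreeMap<usize, u64>,
}

impl ToySparseBitset {
    fn insert(&mut self, index: usize) {
        // Allocate the block lazily, on first use.
        *self.blocks.entry(index / 64).or_insert(0) |= 1u64 << (index % 64);
    }

    fn remove(&mut self, index: usize) {
        let key = index / 64;
        let emptied = match self.blocks.get_mut(&key) {
            Some(block) => {
                *block &= !(1u64 << (index % 64));
                *block == 0
            }
            None => false,
        };
        // Free the block as soon as it becomes empty.
        if emptied {
            self.blocks.remove(&key);
        }
    }

    fn contains(&self, index: usize) -> bool {
        self.blocks
            .get(&(index / 64))
            .map_or(false, |block| block & (1u64 << (index % 64)) != 0)
    }

    /// Number of data blocks currently allocated.
    fn allocated_blocks(&self) -> usize {
        self.blocks.len()
    }
}
```

In this toy, inserting a single large index such as 16_000_000 allocates exactly one block, regardless of how far it is from zero.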

# Performance
## Performance

For all inter-bitset operations, it is orders of magnitude faster than hashsets
and pure bitsets in every case. It is even faster than
`hibitset`. See benchmarks.

## Against `hibitset`
### Against `hibitset`

Despite the fact that `hi_sparse_bitset` has a layer of indirection for accessing
each level, it is faster (sometimes significantly) than `hibitset` for all operations.
@@ -74,7 +76,7 @@ On top of that, it is also **algorithmically** faster than `hibitset` in
non-intersection inter-bitset operations due to caching iterator, which
can skip bitsets with empty level1 blocks.

## Against `roaring`
### Against `roaring`

`roaring` is a hybrid bitset that uses a sorted array of bitblocks for sets with large integers,
and a big fixed-size bitset for small ones.
@@ -86,15 +88,15 @@ bitblock in hierarchy, which is close to O(1) for each resulted bitblock.
Plus, a hierarchical bitset discards groups of non-intersecting blocks
early, due to its tree-like nature.
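
The early discard can be sketched with a toy two-level bitset. `TwoLevel`, `slot` and `intersection` are hypothetical names for this illustration only; the sketch covers indices below 4096 and assumes nothing about the crate's real data structures.

```rust
/// Toy two-level bitset (hypothetical): bit `i` of `summary` is set
/// if and only if block `i` is non-empty. Only non-empty blocks are stored.
struct TwoLevel {
    summary: u64,
    /// Non-empty 64-bit blocks, ordered by block number.
    blocks: Vec<u64>,
}

impl TwoLevel {
    fn new() -> Self {
        TwoLevel { summary: 0, blocks: Vec::new() }
    }

    /// Position of block `i` in `blocks`: the count of non-empty blocks before it.
    fn slot(&self, i: u32) -> usize {
        (self.summary & ((1u64 << i) - 1)).count_ones() as usize
    }

    fn insert(&mut self, index: u32) {
        assert!(index < 64 * 64, "toy bitset covers 0..4096 only");
        let (i, bit) = (index / 64, index % 64);
        let slot = self.slot(i);
        if self.summary & (1u64 << i) == 0 {
            self.summary |= 1u64 << i;
            self.blocks.insert(slot, 0);
        }
        self.blocks[slot] |= 1u64 << bit;
    }
}

/// Intersect two bitsets, visiting only blocks that are non-empty in both.
fn intersection(a: &TwoLevel, b: &TwoLevel) -> Vec<u32> {
    let mut out = Vec::new();
    // One AND on the summaries discards every block missing from either side.
    let mut common = a.summary & b.summary;
    while common != 0 {
        let i = common.trailing_zeros();
        common &= common - 1; // clear the lowest set bit
        let mut bits = a.blocks[a.slot(i)] & b.blocks[b.slot(i)];
        while bits != 0 {
            out.push(i * 64 + bits.trailing_zeros());
            bits &= bits - 1;
        }
    }
    out
}
```

A single AND on the summary words rules out up to 64 blocks at once, which is the tree-like pruning the paragraph above refers to.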

# DataBlock operations
## DataBlock operations

In order to speed things up even more, you can work directly with
`DataBlock`s. `DataBlock`s are bit-blocks (relatively small in size)
that you can store and iterate later.

_In future versions, you can also insert DataBlocks into BitSet._

# Reduce on iterator of bitsets
## Reduce on iterator of bitsets

In addition to "the usual" bitset-to-bitset (binary) operations,
you can apply an operation to an iterator of bitsets (reduce/fold).
@@ -103,49 +105,49 @@ number of bitsets, but also have the same result type,
for any number of bitsets. This allows nested reduce
operations.
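
The idea of one result type for any number of inputs can be sketched with plain `u64` masks standing in for bitsets. `reduce_and` is a hypothetical helper written for this example, not the crate's function.

```rust
/// Fold any number of bitmasks with AND.
/// Returns `None` for an empty iterator, since there is nothing to intersect.
fn reduce_and<I>(masks: I) -> Option<u64>
where
    I: IntoIterator<Item = u64>,
{
    masks.into_iter().reduce(|acc, mask| acc & mask)
}
```

Because the result is again a `u64` no matter how many masks were folded, the output of one `reduce_and` call can be fed into another, which is what makes nesting possible.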

# Ordered/sorted
## Ordered/sorted

Iteration always returns sorted sequences.

# Suspend-resume iterator with cursor
## Suspend-resume iterator with cursor

Iterators of `BitSetInterface` (any kind of bitset) can return a cursor
and can rewind to a cursor. A cursor is like an integer index in a `Vec`,
which means that you can use it even if the container was mutated.

## Multi-session iteration
### Multi-session iteration

With a cursor you can suspend and later resume your iteration
session. For example, you can create an intersection between several bitsets, iterate it
to a certain point, and obtain an iterator cursor. Then, later,
you can make an intersection between the same bitsets (but possibly in a different state),
and resume iteration from the point where you stopped, using the cursor.
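
A minimal sketch of such a session, using `std::collections::BTreeSet` in place of the crate's bitsets. The `session` function and the choice of "last visited element + 1" as the cursor are assumptions made for this example; the crate's own cursor type may be represented differently.

```rust
use std::collections::BTreeSet;

/// Visit up to `limit` elements of the intersection of `a` and `b`,
/// starting at `cursor`. Returns the visited elements and the cursor
/// to pass to the next session.
/// Assumes elements are smaller than `u32::MAX`, so `last + 1` cannot overflow.
fn session(
    a: &BTreeSet<u32>,
    b: &BTreeSet<u32>,
    cursor: u32,
    limit: usize,
) -> (Vec<u32>, u32) {
    let seen: Vec<u32> = a
        .range(cursor..)
        .filter(|x| b.contains(*x))
        .take(limit)
        .copied()
        .collect();
    // The cursor is a position in index space, so it stays valid
    // even if the sets are mutated between sessions.
    let next = seen.last().map_or(cursor, |&last| last + 1);
    (seen, next)
}
```

Since the cursor only records how far iteration got, a later session never revisits earlier elements, even when the sets changed in between.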

## Thread safe multi-session iteration
### Thread safe multi-session iteration

You can use "multi-session iteration" in multithreaded environments too.
_(By wrapping bitsets in Mutex(es))_

### Invariant intersection
#### Invariant intersection

If the intersection of bitsets _(or any other operation)_ does not change under possible bitset mutations, you're guaranteed to correctly traverse all of its elements.

### Bitsets mutations narrows intersection/union
#### Bitsets mutations narrows intersection/union

If, in an intersection, only `remove` operations mutate the bitsets, you are guaranteed not to lose any valid elements by the end of a "multi-session iteration".

### Speculative iteration
#### Speculative iteration

For other cases, you're guaranteed to proceed forward, without repeated elements.
_(In each iteration session you'll see the initial valid elements + some valid new ones)_
You can use this if you don't need to traverse the EXACT intersection. For example, if you
process the intersection of the same bitsets over and over in a loop.

# Changelog
## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version differences.

# Known alternatives
## Known alternatives

* [hibitset](https://crates.io/crates/hibitset) - hierarchical dense bitset.
If you insert a single index = 16_000_000, it will allocate 2Mb of RAM.