Design of bitset for storing key-value pairs.
This experiment (unordered-map) was for comparing the performance between:
- Storing
DiGraph
edges using vector. - Storing
DiGraph
edges using unordered_map.
Each approach was attempted with a number of different graphs, performing
2 graph structure operations, readMtx
and addRandomEdge
. Surprisingly,
unordered_map approach performs works in all cases compared to using
vector for edges. That possibly means that even though a vector needs
O(n) search time for checking existence of edge, it still tends to be faster
that unordered_map, especially since many vertices will have a small number
of edges, where a simple linear search is usually better. A hybrid approach
might yield better results.
This experiment (vector-sorted-vs-unsorted) was for comparing performance between:
- Read graph edges to sorted bitset based DiGraph & transpose.
- Read graph edges to unsorted bitset based DiGraph & transpose.
Each approach was attempted on a number of temporal graphs, running each with
multiple batch sizes (1
, 5
, 10
, 50
, ...). Each batch size was run 5
times for each approach to get a good time measure. Transpose of DiGraph
based on sorted bitset is clearly faster than the unsorted one.
However, with reading graph edges there is no clear winner (sometimes
sorted is faster especially for large graphs, and sometimes unsorted).
Maybe when new edges have many duplicates, inserts are less, and hence
sorted version is faster (since sorted bitset has slow insert time).
The partially sorted bitset maintains 2 sublists in the same vector, sorted and unsorted. New items are added to the unsorted sublist at the end. When unsorted sublist grows beyond limit, it is merged with the sorted sublist (unsorted sublist becomes empty). Lookup can be performed in either the sorted sublist first, or the unsorted one.
This experiment (vector-partially-sorted-adjust-unsorted) was for comparing
performance of reading graph edges (readSnapTemporal
) and transpose
(transposeWithDegree
) between:
- Merge sublists using sort (modes
0
,1
). - Merge sublists in-place (modes
2
,3
). - Merge sublists using extra space for sorted sublist (modes
4
,5
). - Merge sublists using extra space for unsorted sublist (modes
6
,7
).
In each case given above, lookup in bitset is done either in sorted sublist first, or in unsorted sublist:
- Lookup first in sorted sublist (modes
0
,2
,4
,6
). - Lookup first in unsorted sublist (modes
1
,3
,5
,7
).
For all of the total 8
modes above, unsorted limit for unsorted sublist
was adjusted with multiple sizes (1
, 2
, 3
, ..., 10
, 20
, ..., 10000
).
Merge using sort simply sorts the entire list. Merging in-place uses
std::inplace_merge()
. Merge using extra space for sorted sublist
uses a shared temporary buffer for storing sorted values (which could be large)
and then performs a std::merge()
after sorting the unsorted sublist. On the
other hand, merge using extra space for unsorted sublist uses a shared
temporary buffer for storing unsorted values (which is potentially much smaller
than the sorted sublist), and performs a reverse merge after sorting the
unsorted sublist. From the result it appears that merging sublists in-place,
and using extra space for unsorted sublist, both with first lookup in
sorted sublist are fast (modes inplace-s
=2
, extrau-s
=6
). For both
cases, a limit of 128
appears to be a good choice.
This experiment (vector-sorted-full-vs-partial) was for comparing performance between:
- Read graph edges to sorted bitset based DiGraph & transpose.
- Read graph edges to partially sorted bitset based DiGraph & transpose.
Each approach was attempted on a number of temporal graphs, running each 5 times to get a good time measure. Transpose of DiGraph based on sorted bitset is clearly faster than the partially sorted one. This is possibly because partially sorted bitset based DiGraph causes higher cache misses due to random accesses (while reversing edges). However, with reading graph edges there is no clear winner (sometimes partially sorted is faster especially for large graphs, and sometimes unsorted). On average, it is better to stick with simple fully sorted bitset.
This experiment (vector-adjust-inbuilt-buffer) is for comparing various
buffer sizes for BitSet with small vector optimization. Read graph
edges, and transpose, was attempted on a number of temporal graphs,
running each with multiple buffer sizes (1
, 2
, 4
, ...4096
). Each buffer
size was run 5 times for for both read and transpose to get a good time
measure.
On average, a buffer size of 4
seems to give small improvement.
Any further increase in buffer size slows down performance. This is possibly
because of unnecessaryily large contiguous large memory allocation needed by the
buffer, and low cache-hit percent due to widely separated edge data (due to the
static buffer). In fact it even crashes when 26 instances of graphs with varying
buffer sizes can't all be held in memory. Hence, small vector
optimization is not useful, atleast when used for graphs.
This experiment (vector-subrange16-sorted-vs-unsorted) is for comparing
sorted vs unsorted for 16-bit subrange based BitSet. Each approach was
attempted on a number of temporal graphs, running each with multiple read sizes
(1E+3
, 5E+3
, 1E+4
, 5E+4
, ...). Each read size was run 5 times for each
approach to get a good time measure. 16-bit subrange bitset is similar
to [Roaring bitmap], which also breaks up all integers into buckets of 2^16
integers.
Although both sorted and unsorted 16-bit subrange based bitsets perform similarly, on average unsorted approach performs slightly better than sorted approach.
- Fast Compact Sparse Bit Sets by Justin Collins
- SparseBitSet: An efficient sparse bit set implementation for Java; by Brett Wooldridge
- boost::dynamic_bitset<Block, Allocator>
- Stanford Large Network Dataset Collection
- Good library for bitsets or bitarrays