In /src:
hdpoint corresponds to PtsHist, while region_tree is for QuadHist.
driver_*.py are the drivers for experiments, with different input parameters.
*_estimator.py are the estimators, with train() and evaluate() as interfaces for their drivers.
utility.py includes various data loaders, error metrics, and other shared tools for the estimators.
geometry.py includes some geometric computations, like rectangle intersection.
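As an illustration of the kind of computation geometry.py is described as providing, here is a minimal sketch of axis-aligned rectangle intersection. The names `intersect` and `volume` and the (lower corner, upper corner) representation are assumptions made for this example, not the repository's actual interface.

```python
from typing import Optional, Sequence, Tuple

# Assumed representation: (lower corner, upper corner), one coordinate per dimension.
Rect = Tuple[Sequence[float], Sequence[float]]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two axis-aligned rectangles, or None if there is none."""
    lo = [max(x, y) for x, y in zip(a[0], b[0])]
    hi = [min(x, y) for x, y in zip(a[1], b[1])]
    if any(l >= h for l, h in zip(lo, hi)):
        return None  # disjoint, or touching only on a boundary
    return (lo, hi)


def volume(r: Optional[Rect]) -> float:
    """Volume of a rectangle; None (no overlap) counts as zero."""
    if r is None:
        return 0.0
    v = 1.0
    for l, h in zip(r[0], r[1]):
        v *= h - l
    return v


if __name__ == "__main__":
    overlap = intersect(([0, 0], [2, 2]), ([1, 1], [3, 3]))
    print(overlap, volume(overlap))
```

The same two helpers work in any number of dimensions, since they only zip over the coordinate lists.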
The pseudocode below outlines the framework of our algorithms.
load_data()
estimator = build_estimator()
estimator.train()
estimator.evaluate()
get_results()
class RegionTreeEstimator:
    ...
    def train():
        tree = build_region_tree()
        for train_data in train_list:
            recursively_split(tree, train_data)
        build_equation_system()
        solve()
    def evaluate():
        for test_data in test_list:
            calc(test_data)
    ...
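To make the recursively_split step more concrete, here is a hedged 2-D sketch. The split rule is an assumption for illustration only: a cell is split into four quadrants when the part of a training query's selectivity that falls on the cell (assuming uniformity inside the query) exceeds the threshold, and splitting stops once the leaf count would exceed buckets_limit. The `Node` class, the `state` dictionary, and the helper names are hypothetical and need not match region_tree.

```python
class Node:
    """A 2-D cell stored as (x0, y0, x1, y1); a leaf has no children."""

    def __init__(self, x0, y0, x1, y1):
        self.box = (x0, y0, x1, y1)
        self.children = []


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_area(box, query):
    w = min(box[2], query[2]) - max(box[0], query[0])
    h = min(box[3], query[3]) - max(box[1], query[1])
    return max(w, 0.0) * max(h, 0.0)


def recursively_split(node, query, selectivity, threshold, state):
    # Assumed rule: share of the query's selectivity that lands on this cell.
    share = selectivity * overlap_area(node.box, query) / area(query)
    if share <= threshold:
        return
    if not node.children:
        # Splitting one leaf into four adds three leaves in total.
        if state["leaves"] + 3 > state["buckets_limit"]:
            return
        x0, y0, x1, y1 = node.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        node.children = [
            Node(x0, y0, mx, my), Node(mx, y0, x1, my),
            Node(x0, my, mx, y1), Node(mx, my, x1, y1),
        ]
        state["leaves"] += 3
    for child in node.children:
        recursively_split(child, query, selectivity, threshold, state)


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


if __name__ == "__main__":
    root = Node(0.0, 0.0, 1.0, 1.0)
    state = {"leaves": 1, "buckets_limit": 100}
    recursively_split(root, (0.0, 0.0, 0.5, 0.5), 0.8, 0.3, state)
    print(count_leaves(root))
```

Under this rule a lower threshold produces a finer tree, which is consistent with the smaller thresholds paired with larger buckets_limit values in the parameter tables.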
class HDPointEstimator:
    ...
    def train():
        weighted_points = []
        for train_data in train_list:
            weighted_points.append(train_data.sample())
        build_equation_system()
        solve()
    def evaluate():
        for test_data in test_list:
            calc(test_data)
    ...
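The build_equation_system() and calc() steps of the point-based estimator can be sketched as follows. This is a minimal illustration under stated assumptions: queries are axis-aligned boxes given as (lower corner, upper corner), each training item is a (query, selectivity) pair, and the estimate for a query is the sum of the weights of the points it contains. The function names mirror the pseudocode, but their signatures are hypothetical, and the solve() step that fits the weights is left out.

```python
def contains(query, point):
    """True if the point lies inside the axis-aligned box (lo, hi)."""
    lo, hi = query
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))


def build_equation_system(points, train_queries):
    """Build A and b so that A @ weights should approximate b.

    A[i][j] is 1.0 when point j falls inside training query i;
    b[i] is the observed selectivity of training query i.
    """
    A = [[1.0 if contains(q, p) else 0.0 for p in points]
         for q, _ in train_queries]
    b = [s for _, s in train_queries]
    return A, b


def calc(points, weights, query):
    """Estimated selectivity: total weight of the points inside the query."""
    return sum(w for p, w in zip(points, weights) if contains(query, p))


if __name__ == "__main__":
    points = [(0.25, 0.25), (0.75, 0.75)]
    train = [(((0, 0), (0.5, 0.5)), 0.3), (((0, 0), (1, 1)), 1.0)]
    A, b = build_equation_system(points, train)
    print(A, b)
    print(calc(points, [0.3, 0.7], ((0, 0), (1, 1))))
```

In this toy case the weights [0.3, 0.7] satisfy both equations exactly; with real workloads the system is usually over-determined and has to be solved approximately.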
# driver_region_tree.py
# Replace each XXX with the desired value, or run with '--help' for hints
python driver_region_tree.py --dataset XXX --query_type XXX --train_size XXX --threshold XXX --buckets_limit XXX --test_size XXX --solver XXX
# driver_hdpoint.py
# Replace each XXX with the desired value, or run with '--help' for hints
python driver_hdpoint.py --dataset XXX --query_type XXX --train_size XXX --threshold XXX --buckets_limit XXX --alpha XXX --test_size XXX
To test other workloads, first add the path and filename of both the workload and its min_max_range to the data loaders in utility.py, place the files in the corresponding locations, and then add the new item to --dataset []. We will give more concrete examples in the released version.
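Until those examples are available, the registration steps above might look roughly like the sketch below. Everything here is hypothetical: the `DATASETS` registry, the file paths, and the dataset name "My-new-data" are placeholders, and the actual loaders in utility.py and the drivers' argument parsers may be organized differently. Only the standard-library argparse calls are real.

```python
import argparse

# Hypothetical registry: dataset name -> (workload file, min_max_range file).
# The paths are placeholders, not files that ship with the repository.
DATASETS = {
    "Power-2d-data": ("data/power_2d_workload.csv", "data/power_2d_range.csv"),
    "My-new-data": ("data/my_new_workload.csv", "data/my_new_range.csv"),
}


def build_parser():
    parser = argparse.ArgumentParser(description="Toy driver argument parser")
    # Adding a key to DATASETS automatically makes it a valid --dataset value.
    parser.add_argument("--dataset", choices=sorted(DATASETS), required=True)
    return parser


def resolve(argv):
    """Parse the command line and return the file pair for the chosen dataset."""
    args = build_parser().parse_args(argv)
    return DATASETS[args.dataset]


if __name__ == "__main__":
    print(resolve(["--dataset", "My-new-data"]))
```

Keeping the paths and the accepted --dataset values in one mapping avoids the two getting out of sync when a workload is added.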
scipy >= 1.7.2
cvxopt >= 1.2.7 (if used)
cplex >= 20.1.0.1 (and a license, if used)
gurobipy >= 9.5.0 (and a license, if used)
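A quick way to see which of these packages are present is to query the installed distributions. The sketch below is only a convenience check and is not part of the repository; it assumes the pip distribution names match the names listed above and reports the installed version, or None when a package is missing. It does not compare versions or check licenses.

```python
from importlib import metadata

# Minimum versions as listed above; only scipy is unconditionally required.
REQUIRED = {"scipy": "1.7.2"}
OPTIONAL = {"cvxopt": "1.2.7", "cplex": "20.1.0.1", "gurobipy": "9.5.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = {**REQUIRED, **OPTIONAL}
    for name, version in installed_versions(wanted).items():
        status = version if version is not None else "not installed"
        print(f"{name}: {status} (listed minimum {wanted[name]})")
```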
trainsize_buckets_threshold = {
    50 : [
        [100, 0.052],
        [500, 0.012],
        [1000, 0.0061],
        [5000, 0.0015],
        [10000, 0.0007]
    ],
    200 : [
        [100, 0.08],
        [500, 0.018],
        [1000, 0.0096],
        [5000, 0.0021],
        [10000, 0.0013]
    ],
    500 : [
        [100, 0.08],
        [500, 0.0205],
        [1000, 0.011],
        [5000, 0.00267],
        [10000, 0.0014]
    ],
    1000 : [
        [100, 0.11],
        [500, 0.025],
        [1000, 0.014],
        [5000, 0.003],
        [10000, 0.0017]
    ],
    2000 : [
        [100, 0.125],
        [500, 0.03],
        [1000, 0.016],
        [5000, 0.0033],
        [10000, 0.0019]
    ]
}
Use a (train_size, buckets_limit, threshold) triple from the table above in the following command. Each key is a train_size, and each pair under it is [buckets_limit, threshold].
python3 driver_region_tree.py --dataset Power-2d-data --query_type rect --train_size XXX --threshold XXX --buckets_limit XXX --test_size 100 --solver nnls
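To avoid filling in the triples by hand, the command lines can be generated from the table. The sketch below is not part of the repository. It assumes that each key of trainsize_buckets_threshold is a train_size and each pair is [buckets_limit, threshold], and that the script is named driver_region_tree.py as in the usage section. The `commands` helper is hypothetical, and only an excerpt of the table is repeated here.

```python
import shlex

# Excerpt of the table above, assumed layout: train_size -> [buckets_limit, threshold].
trainsize_buckets_threshold = {
    50: [[100, 0.052], [500, 0.012]],
    200: [[100, 0.08]],
}


def commands(table, dataset="Power-2d-data", test_size=100, solver="nnls"):
    """Yield one shell command per (train_size, buckets_limit, threshold) triple."""
    for train_size, pairs in table.items():
        for buckets_limit, threshold in pairs:
            argv = [
                "python3", "driver_region_tree.py",
                "--dataset", dataset,
                "--query_type", "rect",
                "--train_size", str(train_size),
                "--threshold", str(threshold),
                "--buckets_limit", str(buckets_limit),
                "--test_size", str(test_size),
                "--solver", solver,
            ]
            yield " ".join(shlex.quote(a) for a in argv)


if __name__ == "__main__":
    for line in commands(trainsize_buckets_threshold):
        print(line)
```

The printed lines can be pasted into a shell or written to a script; passing the full table instead of the excerpt covers every configuration.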
trainsize_buckets_threshold = {
    50 : [
        [100, 0.052],
        [500, 0.0105],
        [1000, 0.0063],
        [5000, 0.0015],
        [10000, 0.0006],
        [50000, 0.00015],
        [100000, 0.00007]
    ],
    200 : [
        [100, 0.08],
        [500, 0.018],
        [1000, 0.0096],
        [5000, 0.0021],
        [10000, 0.001],
        [50000, 0.0002],
        [100000, 0.0001]
    ],
    500 : [
        [100, 0.08],
        [500, 0.02],
        [1000, 0.0105],
        [5000, 0.0025],
        [10000, 0.0014],
        [50000, 0.0003],
        [100000, 0.00015]
    ],
    1000 : [
        [100, 0.11],
        [500, 0.027],
        [1000, 0.015],
        [5000, 0.0031],
        [10000, 0.0016],
        [50000, 0.0004],
        [100000, 0.00016]
    ],
    2000 : [
        [100, 0.125],
        [500, 0.031],
        [1000, 0.016],
        [5000, 0.0036],
        [10000, 0.0019],
        [50000, 0.0004],
        [100000, 0.0002]
    ]
}
Use a (train_size, buckets_limit, threshold) triple from the table above in the following command. Each key is a train_size, and each pair under it is [buckets_limit, threshold].
python3 driver_region_tree.py --dataset Power-2d-data --query_type rect --train_size XXX --buckets_limit XXX --threshold XXX --test_size 1000 --solver gurobi_linf