Parameter generation

The paramgen tool implements parameter curation to ensure predictable performance results that (mostly) follow a normal distribution.

Getting started

  1. Install dependencies:

    scripts/install-dependencies.sh
  2. Generating the factor tables with the Datagen: in the Datagen directory (ldbc_snb_datagen_spark), issue the following commands. We assume that the Datagen project is built and that the ${LDBC_SNB_DATAGEN_MAX_MEM} and ${LDBC_SNB_DATAGEN_JAR} environment variables are set correctly.

    export SF=desired_scale_factor
    export LDBC_SNB_DATAGEN_MAX_MEM=available_memory
    export LDBC_SNB_DATAGEN_JAR=$(sbt -batch -error 'print assembly / assemblyOutputPath')
    rm -rf out-sf${SF}/
    tools/run.py \
        --cores $(nproc) \
        --memory ${LDBC_SNB_DATAGEN_MAX_MEM} \
        -- \
        --format parquet \
        --scale-factor ${SF} \
        --mode raw \
        --output-dir out-sf${SF} \
        --generate-factors
  3. Obtaining the factors: Create the scratch/factors/ directory and move the factor directories from out-sf${SF}/factors/csv/raw/composite-merged-fk/ (cityPairsNumFriends/, personDisjointEmployerPairs/, etc.) into it. Assuming that your ${LDBC_SNB_DATAGEN_DIR} and ${SF} environment variables are set, run:

    scripts/get-factors.sh

    To download and use the factors of the sample data set, run:

    scripts/get-sample-factors.sh
    export SF=0.003
  4. To run the parameter generator, ensure that ${SF} is set correctly and issue:

    scripts/paramgen.sh
  5. The parameters will be placed in the ../parameters/parameters-sf${SF}/ directory.
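The manual part of step 3 can be sketched as follows. This is a hypothetical helper, not part of the repository's scripts; the source path is an assumption based on the Datagen output layout quoted above, and ${LDBC_SNB_DATAGEN_DIR} is assumed to point at your Datagen checkout:

```shell
#!/bin/sh
# Hypothetical sketch of step 3: collect the generated factor tables into
# scratch/factors/. Paths are assumptions based on the default Datagen layout.
set -eu
: "${LDBC_SNB_DATAGEN_DIR:=.}"
: "${SF:=0.003}"

FACTOR_SRC="${LDBC_SNB_DATAGEN_DIR}/out-sf${SF}/factors/csv/raw/composite-merged-fk"

# Create the scratch directory expected by scripts/get-factors.sh
mkdir -p scratch/factors/

# Copy each factor directory (cityPairsNumFriends/, personDisjointEmployerPairs/, etc.)
if [ -d "${FACTOR_SRC}" ]; then
    cp -r "${FACTOR_SRC}"/. scratch/factors/
fi
```

Using `cp` instead of `mv` keeps the Datagen output intact, so the factors can be re-collected without regenerating the data set.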

Memory consumption

The parameter generator performs several join and aggregation operations on large tables and therefore uses a significant amount of memory. For example, the run for SF30,000 uses 404.8 GB of RAM and takes about 11 minutes on an AWS EC2 m6id.32xlarge instance with 128 vCPUs.