Parameter generation

The paramgen tool implements parameter curation to ensure predictable performance results that (mostly) follow a normal distribution.

Getting started

  1. Install dependencies:

    scripts/install-dependencies.sh
  2. Generating the factor tables with the Datagen: in the Datagen directory (ldbc_snb_datagen_spark), issue the following commands. We assume that the Datagen project is built and that the ${LDBC_SNB_DATAGEN_MAX_MEM} and ${LDBC_SNB_DATAGEN_JAR} environment variables are set correctly.

    export SF=desired_scale_factor
    export LDBC_SNB_DATAGEN_MAX_MEM=available_memory
    export LDBC_SNB_DATAGEN_JAR=$(sbt -batch -error 'print assembly / assemblyOutputPath')
    rm -rf out-sf${SF}/
    tools/run.py \
        --cores $(nproc) \
        --memory ${LDBC_SNB_DATAGEN_MAX_MEM} \
        -- \
        --format parquet \
        --scale-factor ${SF} \
        --mode raw \
        --output-dir out-sf${SF} \
        --generate-factors
  3. Obtaining the factors: Create the scratch/factors/ directory and move the factor directories from out-sf${SF}/factors/csv/raw/composite-merged-fk/ (cityPairsNumFriends/, personDisjointEmployerPairs/, etc.) into it. Assuming that your ${LDBC_SNB_DATAGEN_DIR} and ${SF} environment variables are set, run:

    scripts/get-factors.sh

    To download and use the factors of the sample data set, run:

    scripts/get-sample-factors.sh
    export SF=0.003
  4. To run the parameter generator, ensure that ${SF} is set correctly and issue:

    scripts/paramgen.sh
  5. The parameters will be placed in the ../parameters/parameters-sf${SF}/ directory.
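The manual part of step 3 can be sketched as follows. This is a hypothetical helper, not part of the repository's scripts; the source path is an assumption based on the Datagen output layout quoted above, and ${LDBC_SNB_DATAGEN_DIR} is assumed to point at your Datagen checkout:

```shell
#!/bin/sh
# Hypothetical sketch of step 3: collect the generated factor tables into
# scratch/factors/. Paths are assumptions based on the default Datagen layout.
set -eu
: "${LDBC_SNB_DATAGEN_DIR:=.}"
: "${SF:=0.003}"

FACTOR_SRC="${LDBC_SNB_DATAGEN_DIR}/out-sf${SF}/factors/csv/raw/composite-merged-fk"

# Create the scratch directory expected by scripts/get-factors.sh
mkdir -p scratch/factors/

# Copy each factor directory (cityPairsNumFriends/, personDisjointEmployerPairs/, etc.)
if [ -d "${FACTOR_SRC}" ]; then
    cp -r "${FACTOR_SRC}"/. scratch/factors/
fi
```

Using `cp` instead of `mv` keeps the Datagen output intact, so the factors can be re-collected without regenerating the data set.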

Memory consumption

The parameter generator performs several join and aggregation operations on large tables and therefore uses a significant amount of memory. For example, the run for SF30,000 uses 404.8 GB of RAM and takes about 11 minutes on an AWS EC2 m6id.32xlarge instance with 128 vCPUs.