Operation counts not consistent across benchmarks #118

xwkuang5 · 2020-02-21T15:11:54Z

Hi,

I am reposting an open issue in the ldbc_snb_implementations repo here.

I am trying to use the cypher benchmark to evaluate the performance of Neo4j under different configurations. I set operation_count=2500 and ran the interactive-benchmark.sh script multiple times. However, I got three different final operation counts (2473, 2532, 2584) across 3 runs. Is this the expected result?

Thanks for any help in advance!

Here is my configuration:

endpoint=bolt://localhost:7687
user=neo4j
password=admin
queryDir=queries/
printQueryNames=false
printQueryStrings=false
printQueryResults=false

status=1
thread_count=2
name=LDBC-SNB
results_log=true
time_unit=MILLISECONDS
time_compression_ratio=0.001
peer_identifiers=
workload_statistics=false
spinner_wait_duration=1
help=false
ignore_scheduled_start_times=true

workload=com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcSnbInteractiveWorkload
db=com.ldbc.impls.workloads.ldbc.snb.cypher.interactive.CypherInteractiveDb
operation_count=2500
ldbc.snb.interactive.parameters_dir=../../ldbc_snb_datagen/substitution_parameters/
ldbc.snb.interactive.updates_dir=../../ldbc_snb_datagen/social_network/
ldbc.snb.interactive.short_read_dissipation=0.2
ldbc.snb.interactive.update_interleave=49274

warmup=100

## frequency of read queries (number of update queries per one read query)
ldbc.snb.interactive.LdbcQuery1_freq=26
ldbc.snb.interactive.LdbcQuery2_freq=37
ldbc.snb.interactive.LdbcQuery3_freq=123
ldbc.snb.interactive.LdbcQuery4_freq=36
ldbc.snb.interactive.LdbcQuery5_freq=78
ldbc.snb.interactive.LdbcQuery6_freq=434
ldbc.snb.interactive.LdbcQuery7_freq=38
ldbc.snb.interactive.LdbcQuery8_freq=5
ldbc.snb.interactive.LdbcQuery9_freq=527
ldbc.snb.interactive.LdbcQuery10_freq=40
ldbc.snb.interactive.LdbcQuery11_freq=22
ldbc.snb.interactive.LdbcQuery12_freq=44
ldbc.snb.interactive.LdbcQuery13_freq=19
ldbc.snb.interactive.LdbcQuery14_freq=49

# *** For debugging purposes ***

ldbc.snb.interactive.LdbcQuery1_enable=true
ldbc.snb.interactive.LdbcQuery2_enable=true
ldbc.snb.interactive.LdbcQuery3_enable=true
ldbc.snb.interactive.LdbcQuery4_enable=true
ldbc.snb.interactive.LdbcQuery5_enable=true
ldbc.snb.interactive.LdbcQuery6_enable=true
ldbc.snb.interactive.LdbcQuery7_enable=true
ldbc.snb.interactive.LdbcQuery8_enable=true
ldbc.snb.interactive.LdbcQuery9_enable=true
ldbc.snb.interactive.LdbcQuery10_enable=true
ldbc.snb.interactive.LdbcQuery11_enable=true
ldbc.snb.interactive.LdbcQuery12_enable=true
ldbc.snb.interactive.LdbcQuery13_enable=true
ldbc.snb.interactive.LdbcQuery14_enable=true

ldbc.snb.interactive.LdbcShortQuery1PersonProfile_enable=true
ldbc.snb.interactive.LdbcShortQuery2PersonPosts_enable=true
ldbc.snb.interactive.LdbcShortQuery3PersonFriends_enable=true
ldbc.snb.interactive.LdbcShortQuery4MessageContent_enable=true
ldbc.snb.interactive.LdbcShortQuery5MessageCreator_enable=true
ldbc.snb.interactive.LdbcShortQuery6MessageForum_enable=true
ldbc.snb.interactive.LdbcShortQuery7MessageReplies_enable=true

ldbc.snb.interactive.LdbcUpdate1AddPerson_enable=true
ldbc.snb.interactive.LdbcUpdate2AddPostLike_enable=true
ldbc.snb.interactive.LdbcUpdate3AddCommentLike_enable=true
ldbc.snb.interactive.LdbcUpdate4AddForum_enable=true
ldbc.snb.interactive.LdbcUpdate5AddForumMembership_enable=true
ldbc.snb.interactive.LdbcUpdate6AddPost_enable=true
ldbc.snb.interactive.LdbcUpdate7AddComment_enable=true
ldbc.snb.interactive.LdbcUpdate8AddFriendship_enable=true

xwkuang5 · 2020-02-21T15:42:18Z

If I understand short_read_dissipation correctly, it is the delta in the random walk model. Larger short_read_dissipation means a shorter walk, e.g., in the extreme case where short_read_dissipation=1, there should be no short reads after the complex read. Is this the reason why the number of operations can be different across different runs at the end?
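
To make the random walk idea concrete, here is a minimal Python sketch of one possible reading of it. It is not the driver's actual code: the function names, the rule that the probability starts at 1.0 and drops by short_read_dissipation before each draw, and the seed parameter are all assumptions made for illustration. Under that reading, the number of short reads following each complex read is random, so the total operation count varies between runs unless the random number generator is seeded.

```python
import random


def short_reads_in_walk(dissipation, rng):
    """Count short reads following one complex read (simplified, assumed model)."""
    if dissipation <= 0:
        raise ValueError("dissipation must be positive, otherwise the walk never ends")
    steps = 0
    while True:
        steps += 1
        # Assumption: the probability of another short read starts at 1.0
        # and decreases by `dissipation` before every draw.
        probability = max(0.0, 1.0 - steps * dissipation)
        if rng.random() >= probability:
            return steps - 1


def total_operations(complex_reads, dissipation, seed=None):
    """Complex reads plus all short reads generated by their walks."""
    rng = random.Random(seed)  # seed=None gives a different sequence per run
    short_reads = sum(short_reads_in_walk(dissipation, rng) for _ in range(complex_reads))
    return complex_reads + short_reads


if __name__ == "__main__":
    # Unseeded: the totals usually differ from run to run.
    print([total_operations(1000, 0.2) for _ in range(3)])
    # Seeded: the same total every time.
    print([total_operations(1000, 0.2, seed=42) for _ in range(3)])
    # Dissipation of 1: no short reads at all.
    print(total_operations(1000, 1.0))
```

In this sketch a dissipation of 0.2 allows at most four short reads per complex read, and a dissipation of 1 allows none, which matches the extreme case described above. Whether the real driver exposes a seed for this walk is exactly the open question here.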

xwkuang5 · 2020-02-21T15:43:15Z

If the above is true, is there a way to set the random seed in the test driver to make sure that the workload of a particular benchmark can be replayed?

jackwaudby · 2020-04-13T01:00:58Z

Hi @xwkuang5

Sorry for the delay in replying.

"I was getting three different final operations counts (2473, 2532, 2584) across 3 different runs. Is this the expected result?"

I will discuss this with the task force when we next talk.

I've just run the cypher implementation a few times with your configuration and can reproduce the issue. Which scale factor are you using to generate the data?

Best,

Jack

xwkuang5 · 2020-04-13T02:32:01Z

Hi Jack, thanks for your reply.

I believe it's SF1 (or SF3).

jackwaudby self-assigned this Apr 13, 2020

szarnyasg added the question label Dec 12, 2021
