diff --git a/README.md b/README.md
index a22f20b..221e670 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ Code for the benchmark study described in this [blog post](https://thedataquarry

 Neo4j version | Kùzu version | Python version
 :---: | :---: | :---:
-5.22.0 (community) | 0.6.0 | 3.12.4
+5.26.0 (community) | 0.7.0 | 3.12.5

 [Kùzu](https://kuzudb.com/) is an in-process (embedded) graph database management system (GDBMS) written in C++. It is blazing fast 🔥, and is optimized for handling complex join-heavy analytical workloads on very large graphs. Kùzu's [goal](https://kuzudb.com/docusaurus/blog/what-every-gdbms-should-do-and-vision) is to do in the graph database world what DuckDB has done in the world of relational databases -- that is, to provide a fast, lightweight, embeddable graph database for analytics (OLAP) use cases, while being heavily focused on usability and developer productivity.

@@ -89,20 +89,21 @@ The following questions are asked of both graphs:

 The run times for both ingestion and queries are compared.

-* For ingestion, KùzuDB is consistently faster than Neo4j by a factor of **~18x** for a graph size of 100K nodes and ~2.4M edges.
+* For ingestion, KùzuDB is consistently faster than Neo4j by a factor of **~18x** for a
 * For OLAP queries, KùzuDB is **significantly faster** than Neo4j, especially for ones that involve multi-hop queries via nodes with many-to-many relationships.

 ### Benchmark conditions

-The benchmark is run M3 Macbook Pro with 36 GB RAM.
+- Machine: M3 Macbook Pro with 36 GB RAM.
+- Graph size: 100K nodes, ~2.4M edges.
 ### Ingestion performance

 Case | Neo4j (sec) | Kùzu (sec) | Speedup factor
 --- | ---: | ---: | ---:
-Nodes | 2.33 | 0.11 | 21.2x
-Edges | 31.08 | 0.42 | 74.0x
-Total | 33.41 | 0.53 | 63.0x
+Nodes | 1.85 | 0.13 | 14.2x
+Edges | 28.79 | 0.45 | 64.0x
+Total | 30.64 | 0.58 | 52.8x

 Nodes are ingested significantly faster in Kùzu, and using its community edition, Neo4j's node ingestion remains of the order of seconds

@@ -123,44 +124,28 @@ The benchmarks are run via the `pytest-benchmark` library for the query scripts
 * Each query is run for a **minimum of 5 rounds**, so the run times shown in each section below as the **average over a minimum of 5 rounds**, or upwards of 50 rounds.
 * Long-running queries (where the total run time exceeds 1 sec) are run for at least 5 rounds.
 * Short-running queries (of the order of milliseconds) will run as many times as fits into a period of 1 second, so the fastest queries can run upwards of 50 times.
-* Python's own GC overhead can obscure true run times, so the `benchamrk-disable-gc` argument is enabled.
+* Python's own GC overhead can obscure true run times, so the `benchmark-disable-gc` argument is enabled.

 See the [`pytest-benchmark` docs](https://pytest-benchmark.readthedocs.io/en/latest/calibration.html) to see how they calibrate their timer and group the rounds.

-#### Neo4j vs. Kùzu single-threaded
+#### Neo4j vs. Kùzu

-The following table shows the run times for each query (averaged over the number of rounds run, guaranteed to be a minimum of 5 runs) and the speedup factor of Kùzu over Neo4j when Kùzu is **limited to execute queries on a single thread**.
+KùzuDB supports multi-threaded execution of queries with maximum thread utilization as available on the machine.
+The run times for each query (averaged over the number of rounds run, guaranteed to be a minimum of 5 runs) are shown below.
 Query | Neo4j (sec) | Kùzu (sec) | Speedup factor
 --- | ---: | ---: | ---:
-1 | 1.375 | 0.216 | 6.4x
-2 | 0.567 | 0.253 | 2.2x
-4 | 0.047 | 0.008 | 5.9x
-3 | 0.052 | 0.006 | 8.7x
-5 | 0.012 | 0.181 | 0.1x
-6 | 0.024 | 0.059 | 0.4x
-7 | 0.155 | 0.013 | 11.9x
-8 | 2.988 | 0.064 | 46.7x
-9 | 3.755 | 0.170 | 22.1x
-
-
-#### Neo4j vs. Kùzu multi-threaded
-
-KùzuDB (by default) supports multi-threaded execution of queries. The following results are for the same queries as above, but allowing Kùzu to choose the optimal number of threads for each query. Again, the run times for each query (averaged over the number of rounds run, guaranteed to be a minimum of 5 runs) are shown.
-
-Query | Neo4j (sec) | Kùzu (sec) | Speedup factor
---- | ---: | ---: | ---:
-1 | 1.375 | 0.251 | 5.5x
-2 | 0.567 | 0.283 | 2.0x
-3 | 0.052 | 0.011 | 4.7x
-4 | 0.047 | 0.008 | 5.9x
-5 | 0.012 | 0.017 | 0.7x
-6 | 0.024 | 0.061 | 0.4x
-7 | 0.155 | 0.014 | 11.1x
-8 | 2.988 | 0.064 | 46.7x
-9 | 3.755 | 0.142 | 26.5x
-
-> 🔥 The second-degree path-finding queries (8 and 9) show the biggest speedup over Neo4j, due to innovations in KùzuDB's query planner and execution engine.
+1 | 1.464 | 0.204 | 7.2x
+2 | 0.564 | 0.669 | 0.8x
+3 | 0.047 | 0.008 | 5.9x
+4 | 0.045 | 0.021 | 2.1x
+5 | 0.011 | 0.008 | 1.4x
+6 | 0.024 | 0.013 | 1.8x
+7 | 0.142 | 0.012 | 11.8x
+8 | 2.960 | 0.009 | 328.9x
+9 | 3.361 | 0.099 | 33.9x
+
+> 🔥 The n-hop path-finding queries (8 and 9) show the biggest speedup over Neo4j, due to core innovations in Kùzu's query engine.
 ### Ideas for future work

diff --git a/kuzudb/README.md b/kuzudb/README.md
index 26b24ec..dbac333 100644
--- a/kuzudb/README.md
+++ b/kuzudb/README.md
@@ -111,11 +111,11 @@ shape: (5, 2)
 │ ---         ┆ ---        │
 │ str         ┆ f64        │
 ╞═════════════╪════════════╡
-│ Austin      ┆ 37.732948  │
-│ Kansas City ┆ 37.83065   │
-│ Miami       ┆ 37.860339  │
-│ Houston     ┆ 37.894676  │
-│ San Antonio ┆ 37.896669  │
+│ Austin      ┆ 38.506936  │
+│ Kansas City ┆ 38.589117  │
+│ Miami       ┆ 38.61185   │
+│ San Antonio ┆ 38.653303  │
+│ Portland    ┆ 38.659103  │
 └─────────────┴────────────┘

 Query 4:
@@ -132,9 +132,9 @@ shape: (3, 2)
 │ ---            ┆ ---          │
 │ str            ┆ i64          │
 ╞════════════════╪══════════════╡
-│ United States  ┆ 30698        │
-│ Canada         ┆ 3037         │
-│ United Kingdom ┆ 1819         │
+│ United States  ┆ 30712        │
+│ Canada         ┆ 3043         │
+│ United Kingdom ┆ 1809         │
 └────────────────┴──────────────┘

 Query 5:
@@ -199,7 +199,7 @@ shape: (1, 3)
 │ ---        ┆ ---        ┆ ---           │
 │ i64        ┆ str        ┆ str           │
 ╞════════════╪════════════╪═══════════════╡
-│ 165        ┆ California ┆ United States │
+│ 150        ┆ California ┆ United States │
 └────────────┴────────────┴───────────────┘


@@ -234,78 +234,42 @@ shape: (1, 1)
 │ ---      │
 │ i64      │
 ╞══════════╡
-│ 46061065 │
+│ 45633521 │
 └──────────┘

-Queries completed in 0.9352s
+Queries completed in 1.1521s
 ```

-#### Query performance benchmark (Kùzu single-threaded)
-
-The benchmark is run using `pytest-benchmark` package as follows.
-
-```sh
-$ pytest benchmark_query.py --benchmark-min-rounds=5 --benchmark-warmup-iterations=5 --benchmark-disable-gc --benchmark-sort=fullname
-================================================================================================== test session starts ==================================================================================================
-platform darwin -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0
-benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=True min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=5)
-rootdir: /Users/prrao/code/kuzudb-study/kuzudb
-plugins: benchmark-4.0.0, Faker-27.0.0
-collected 9 items
-
-benchmark_query.py ......... [100%]
-
-
---------------------------------------------------------------------------------------- benchmark: 9 tests -------------------------------------------------------------------------------------
-Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-test_benchmark_query1 209.0226 (37.48) 225.0497 (29.72) 215.5997 (34.97) 5.8951 (13.21) 215.0382 (35.75) 5.9799 (20.00) 2;0 4.6382 (0.03) 5 1
-test_benchmark_query2 246.2083 (44.15) 264.1887 (34.89) 252.6910 (40.98) 7.4496 (16.69) 249.8977 (41.55) 10.9899 (36.76) 1;0 3.9574 (0.02) 5 1
-test_benchmark_query3 7.8650 (1.41) 10.3009 (1.36) 8.4119 (1.36) 0.5128 (1.15) 8.2522 (1.37) 0.2990 (1.0) 10;9 118.8793 (0.73) 77 1
-test_benchmark_query4 5.5767 (1.0) 7.5731 (1.0) 6.1661 (1.0) 0.4463 (1.0) 6.0145 (1.0) 0.5087 (1.70) 28;6 162.1773 (1.0) 110 1
-test_benchmark_query5 17.1654 (3.08) 19.9440 (2.63) 18.1153 (2.94) 0.5701 (1.28) 18.0157 (3.00) 0.5536 (1.85) 12;2 55.2021 (0.34) 43 1
-test_benchmark_query6 57.9529 (10.39) 61.3611 (8.10) 59.7041 (9.68) 1.0348 (2.32) 59.7962 (9.94) 1.7159 (5.74) 6;0 16.7493 (0.10) 17 1
-test_benchmark_query7 11.9360 (2.14) 14.5519 (1.92) 12.8594 (2.09) 0.4884 (1.09) 12.8518 (2.14) 0.5965 (2.00) 19;1 77.7638 (0.48) 61 1
-test_benchmark_query8 61.0895 (10.95) 79.2555 (10.47) 63.8552 (10.36) 4.3809 (9.82) 62.7752 (10.44) 1.2777 (4.27) 1;1 15.6604 (0.10) 15 1
-test_benchmark_query9 166.9417 (29.94) 172.1303 (22.73) 169.9383 (27.56) 1.9027 (4.26) 170.3806 (28.33) 2.4920 (8.34) 2;0 5.8845 (0.04) 6 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-
-Legend:
-  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
-  OPS: Operations Per Second, computed as 1 / Mean
-================================================================================================== 9 passed in 11.21s ===================================================================================================
-```
-
-#### Query performance (Kùzu multi-threaded)
+#### Query performance

 ```sh
 $ pytest benchmark_query.py --benchmark-min-rounds=5 --benchmark-warmup-iterations=5 --benchmark-disable-gc --benchmark-sort=fullname
-================================================================================================== test session starts ==================================================================================================
-platform darwin -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0
-benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=True min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=5)
+============================================= test session starts =============================================
+platform darwin -- Python 3.12.5, pytest-8.3.4, pluggy-1.5.0
+benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=True min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=5)
 rootdir: /Users/prrao/code/kuzudb-study/kuzudb
-plugins: benchmark-4.0.0, Faker-27.0.0
-collected 9 items
-
-benchmark_query.py ......... [100%]
-
-
---------------------------------------------------------------------------------------- benchmark: 9 tests --------------------------------------------------------------------------------------
-Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-test_benchmark_query1 231.0743 (32.50) 261.8040 (21.24) 250.8162 (30.15) 12.6695 (28.81) 257.7072 (31.81) 17.3084 (38.99) 1;0 3.9870 (0.03) 5 1
-test_benchmark_query2 273.6126 (38.48) 297.1214 (24.10) 282.6447 (33.97) 8.9920 (20.45) 279.3069 (34.48) 10.4895 (23.63) 2;0 3.5380 (0.03) 5 1
-test_benchmark_query3 9.9735 (1.40) 12.3276 (1.0) 11.0040 (1.32) 0.4398 (1.0) 11.0476 (1.36) 0.4770 (1.07) 21;3 90.8758 (0.76) 65 1
-test_benchmark_query4 7.1110 (1.0) 17.2494 (1.40) 8.3197 (1.0) 1.3347 (3.03) 8.1007 (1.0) 0.6346 (1.43) 4;4 120.1966 (1.0) 87 1
-test_benchmark_query5 16.3182 (2.29) 18.7523 (1.52) 17.3595 (2.09) 0.4974 (1.13) 17.3593 (2.14) 0.6131 (1.38) 11;1 57.6053 (0.48) 45 1
-test_benchmark_query6 58.4198 (8.22) 64.1914 (5.21) 60.5753 (7.28) 1.6598 (3.77) 60.5213 (7.47) 2.5499 (5.74) 4;0 16.5084 (0.14) 17 1
-test_benchmark_query7 12.5706 (1.77) 15.4372 (1.25) 13.6218 (1.64) 0.4853 (1.10) 13.5676 (1.67) 0.4439 (1.0) 14;4 73.4115 (0.61) 60 1
-test_benchmark_query8 60.2927 (8.48) 67.3211 (5.46) 64.2639 (7.72) 2.1083 (4.79) 64.5842 (7.97) 2.6848 (6.05) 5;0 15.5608 (0.13) 15 1
-test_benchmark_query9 134.8778 (18.97) 150.1560 (12.18) 141.7584 (17.04) 5.0156 (11.40) 141.7625 (17.50) 5.8772 (13.24) 2;0 7.0543 (0.06) 7 1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+plugins: Faker-33.1.0, benchmark-5.1.0
+collected 9 items
+
+benchmark_query.py ......... [100%]
+
+
+------------------------------------------------------------------------------------- benchmark: 9 tests -------------------------------------------------------------------------------------
+Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+test_benchmark_query1 195.5340 (25.53) 220.1770 (25.56) 204.1142 (25.67) 9.5230 (65.49) 200.9037 (25.38) 9.7862 (49.90) 1;0 4.8992 (0.04) 5 1
+test_benchmark_query2 666.2035 (86.97) 671.2414 (77.92) 668.8017 (84.10) 2.2369 (15.38) 669.2646 (84.56) 4.0761 (20.79) 2;0 1.4952 (0.01) 5 1
+test_benchmark_query3 7.7908 (1.02) 9.4851 (1.10) 8.2185 (1.03) 0.3032 (2.09) 8.1436 (1.03) 0.2459 (1.25) 10;5 121.6770 (0.97) 79 1
+test_benchmark_query4 20.0428 (2.62) 23.1529 (2.69) 20.7053 (2.60) 0.7585 (5.22) 20.4059 (2.58) 1.0047 (5.12) 4;2 48.2967 (0.38) 42 1
+test_benchmark_query5 7.6601 (1.0) 8.6142 (1.0) 7.9520 (1.0) 0.2088 (1.44) 7.9143 (1.0) 0.2467 (1.26) 23;5 125.7544 (1.0) 79 1
+test_benchmark_query6 11.9726 (1.56) 14.0642 (1.63) 12.8878 (1.62) 0.3841 (2.64) 12.7875 (1.62) 0.3539 (1.80) 18;5 77.5926 (0.62) 69 1
+test_benchmark_query7 11.2922 (1.47) 12.9255 (1.50) 12.0330 (1.51) 0.2941 (2.02) 12.0267 (1.52) 0.3973 (2.03) 20;1 83.1045 (0.66) 65 1
+test_benchmark_query8 8.9595 (1.17) 9.6740 (1.12) 9.3152 (1.17) 0.1454 (1.0) 9.3208 (1.18) 0.1961 (1.0) 28;0 107.3509 (0.85) 95 1
+test_benchmark_query9 92.3531 (12.06) 105.8683 (12.29) 99.2903 (12.49) 4.7381 (32.59) 101.1793 (12.78) 7.4598 (38.04) 3;0 10.0715 (0.08) 10 1
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Legend:
   Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
   OPS: Operations Per Second, computed as 1 / Mean
-================================================================================================== 9 passed in 11.99s ===================================================================================================
+============================================= 9 passed in 13.47s ==============================================
 ```
diff --git a/kuzudb/benchmark_query.py b/kuzudb/benchmark_query.py
index 4795dea..a220876 100644
--- a/kuzudb/benchmark_query.py
+++ b/kuzudb/benchmark_query.py
@@ -49,8 +49,8 @@ def test_benchmark_query3(benchmark, connection):
     assert result[0]["city"] == "Austin"
     assert result[1]["city"] == "Kansas City"
     assert result[2]["city"] == "Miami"
-    assert result[3]["city"] == "Houston"
-    assert result[4]["city"] == "San Antonio"
+    assert result[3]["city"] == "San Antonio"
+    assert result[4]["city"] == "Portland"


 def test_benchmark_query4(benchmark, connection):
@@ -61,9 +61,9 @@ def test_benchmark_query4(benchmark, connection):
     assert result[0]["countries"] == "United States"
     assert result[1]["countries"] == "Canada"
     assert result[2]["countries"] == "United Kingdom"
-    assert result[0]["personCounts"] == 30698
-    assert result[1]["personCounts"] == 3037
-    assert result[2]["personCounts"] == 1819
+    assert result[0]["personCounts"] == 30712
+    assert result[1]["personCounts"] == 3043
+    assert result[2]["personCounts"] == 1809


 def test_benchmark_query5(benchmark, connection):
@@ -114,7 +114,7 @@ def test_benchmark_query7(benchmark, connection):
     result = result.to_dicts()

     assert len(result) == 1
-    assert result[0]["numPersons"] == 165
+    assert result[0]["numPersons"] == 150
     assert result[0]["state"] == "California"
     assert result[0]["country"] == "United States"

@@ -132,4 +132,4 @@ def test_benchmark_query9(benchmark, connection):
     result = result.to_dicts()

     assert len(result) == 1
-    assert result[0]["numPaths"] == 46061065
+    assert result[0]["numPaths"] == 45633521
diff --git a/neo4j/.env.example b/neo4j/.env.example
index 4f289ee..5504394 100644
--- a/neo4j/.env.example
+++ b/neo4j/.env.example
@@ -1,3 +1,3 @@
-NEO4J_VERSION = "5.22.0"
+NEO4J_VERSION = "5.26.0"
 NEO4J_USER = "neo4j"
 NEO4J_PASSWORD =
\ No newline at end of file
diff --git a/neo4j/README.md b/neo4j/README.md
index 248399f..94e2573 100644
--- a/neo4j/README.md
+++ b/neo4j/README.md
@@ -118,11 +118,11 @@ shape: (5, 2)
 │ ---         ┆ ---        │
 │ str         ┆ f64        │
 ╞═════════════╪════════════╡
-│ Austin      ┆ 37.732948  │
-│ Kansas City ┆ 37.83065   │
-│ Miami       ┆ 37.860339  │
-│ Houston     ┆ 37.894676  │
-│ San Antonio ┆ 37.896669  │
+│ Austin      ┆ 38.506936  │
+│ Kansas City ┆ 38.589117  │
+│ Miami       ┆ 38.61185   │
+│ San Antonio ┆ 38.653303  │
+│ Portland    ┆ 38.659103  │
 └─────────────┴────────────┘

 Query 4:
@@ -139,9 +139,9 @@ shape: (3, 2)
 │ ---            ┆ ---          │
 │ str            ┆ i64          │
 ╞════════════════╪══════════════╡
-│ United States  ┆ 30698        │
-│ Canada         ┆ 3037         │
-│ United Kingdom ┆ 1819         │
+│ United States  ┆ 30712        │
+│ Canada         ┆ 3043         │
+│ United Kingdom ┆ 1809         │
 └────────────────┴──────────────┘

 Query 5:
@@ -206,7 +206,7 @@ shape: (1, 3)
 │ ---        ┆ ---        ┆ ---           │
 │ i64        ┆ str        ┆ str           │
 ╞════════════╪════════════╪═══════════════╡
-│ 165        ┆ California ┆ United States │
+│ 150        ┆ California ┆ United States │
 └────────────┴────────────┴───────────────┘


@@ -241,10 +241,10 @@ shape: (1, 1)
 │ ---      │
 │ i64      │
 ╞══════════╡
-│ 46061065 │
+│ 45633521 │
 └──────────┘

-Neo4j query script completed in 10.171206s
+Neo4j query script completed in 10.207871s
 ```

 ### Query performance benchmark
@@ -253,32 +253,32 @@ The benchmark is run using `pytest-benchmark` package as follows.

 ```sh
 $ pytest benchmark_query.py --benchmark-min-rounds=5 --benchmark-warmup-iterations=5 --benchmark-disable-gc --benchmark-sort=fullname
-================================================= test session starts ==================================================
-platform darwin -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
-benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=True min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=5)
+============================================= test session starts =============================================
+platform darwin -- Python 3.12.5, pytest-8.3.4, pluggy-1.5.0
+benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=True min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=5)
 rootdir: /Users/prrao/code/kuzudb-study/neo4j
-plugins: Faker-25.8.0, benchmark-4.0.0
-collected 9 items
+plugins: Faker-33.1.0, benchmark-5.1.0
+collected 9 items

-benchmark_query.py ......... [100%]
+benchmark_query.py ......... [100%]


 --------------------------------------------------------------------------------- benchmark: 9 tests --------------------------------------------------------------------------------
 Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-test_benchmark_query1 1.3575 (143.82) 1.4192 (107.03) 1.3752 (119.96) 0.0258 (16.24) 1.3649 (118.11) 0.0296 (16.66) 1;0 0.7272 (0.01) 5 1
-test_benchmark_query2 0.5362 (56.81) 0.5856 (44.16) 0.5665 (49.42) 0.0213 (13.43) 0.5760 (49.85) 0.0348 (19.59) 1;0 1.7652 (0.02) 5 1
-test_benchmark_query3 0.0426 (4.52) 0.0615 (4.63) 0.0515 (4.50) 0.0074 (4.63) 0.0536 (4.64) 0.0105 (5.92) 2;0 19.4029 (0.22) 5 1
-test_benchmark_query4 0.0453 (4.80) 0.0507 (3.82) 0.0474 (4.14) 0.0017 (1.09) 0.0470 (4.07) 0.0018 (1.0) 2;0 21.0891 (0.24) 7 1
-test_benchmark_query5 0.0094 (1.0) 0.0133 (1.0) 0.0115 (1.0) 0.0016 (1.0) 0.0116 (1.0) 0.0034 (1.91) 5;0 87.2324 (1.0) 9 1
-test_benchmark_query6 0.0218 (2.31) 0.0276 (2.08) 0.0240 (2.09) 0.0018 (1.14) 0.0237 (2.05) 0.0020 (1.12) 3;2 41.6853 (0.48) 13 1
-test_benchmark_query7 0.1525 (16.16) 0.1588 (11.97) 0.1552 (13.54) 0.0024 (1.48) 0.1554 (13.45) 0.0030 (1.66) 2;0 6.4434 (0.07) 5 1
-test_benchmark_query8 2.7747 (293.98) 3.2128 (242.29) 2.9876 (260.62) 0.1806 (113.66) 3.0578 (264.60) 0.2812 (158.43) 2;0 0.3347 (0.00) 5 1
-test_benchmark_query9 3.6285 (384.43) 3.9354 (296.78) 3.7553 (327.59) 0.1161 (73.06) 3.7202 (321.93) 0.1442 (81.22) 2;0 0.2663 (0.00) 5 1
+test_benchmark_query1 1.4139 (166.15) 1.4639 (101.65) 1.4318 (128.81) 0.0220 (14.78) 1.4218 (124.90) 0.0356 (44.71) 1;0 0.6984 (0.01) 5 1
+test_benchmark_query2 0.5343 (62.79) 0.6022 (41.81) 0.5642 (50.76) 0.0289 (19.37) 0.5491 (48.24) 0.0459 (57.54) 2;0 1.7725 (0.02) 5 1
+test_benchmark_query3 0.0394 (4.63) 0.0585 (4.06) 0.0468 (4.21) 0.0080 (5.35) 0.0461 (4.05) 0.0127 (15.87) 1;0 21.3803 (0.24) 5 1
+test_benchmark_query4 0.0435 (5.11) 0.0481 (3.34) 0.0447 (4.02) 0.0017 (1.15) 0.0443 (3.89) 0.0008 (1.0) 1;1 22.3566 (0.25) 6 1
+test_benchmark_query5 0.0085 (1.0) 0.0144 (1.0) 0.0111 (1.0) 0.0018 (1.20) 0.0114 (1.0) 0.0026 (3.23) 4;0 89.9685 (1.0) 11 1
+test_benchmark_query6 0.0220 (2.58) 0.0281 (1.95) 0.0236 (2.13) 0.0015 (1.0) 0.0233 (2.05) 0.0010 (1.21) 2;1 42.3216 (0.47) 13 1
+test_benchmark_query7 0.1390 (16.33) 0.1444 (10.03) 0.1423 (12.80) 0.0021 (1.44) 0.1433 (12.59) 0.0028 (3.51) 1;0 7.0266 (0.08) 5 1
+test_benchmark_query8 2.7413 (322.14) 3.0664 (212.92) 2.9599 (266.30) 0.1325 (88.95) 2.9873 (262.43) 0.1683 (211.12) 1;0 0.3378 (0.00) 5 1
+test_benchmark_query9 3.6300 (426.57) 3.7607 (261.13) 3.6916 (332.12) 0.0557 (37.37) 3.6990 (324.95) 0.0967 (121.38) 2;0 0.2709 (0.00) 5 1
 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Legend:
   Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
   OPS: Operations Per Second, computed as 1 / Mean
-============================================= 9 passed in 66.86s (0:01:06) =============================================
+======================================== 9 passed in 66.14s (0:01:06) =========================================
 ```
\ No newline at end of file
diff --git a/neo4j/benchmark_query.py b/neo4j/benchmark_query.py
index 940a66a..a6112a8 100644
--- a/neo4j/benchmark_query.py
+++ b/neo4j/benchmark_query.py
@@ -56,8 +56,8 @@ def test_benchmark_query3(benchmark, session):
     assert result[0]["city"] == "Austin"
     assert result[1]["city"] == "Kansas City"
     assert result[2]["city"] == "Miami"
-    assert result[3]["city"] == "Houston"
-    assert result[4]["city"] == "San Antonio"
+    assert result[3]["city"] == "San Antonio"
+    assert result[4]["city"] == "Portland"


 def test_benchmark_query4(benchmark, session):
@@ -68,9 +68,9 @@ def test_benchmark_query4(benchmark, session):
     assert result[0]["countries"] == "United States"
     assert result[1]["countries"] == "Canada"
     assert result[2]["countries"] == "United Kingdom"
-    assert result[0]["personCounts"] == 30698
-    assert result[1]["personCounts"] == 3037
-    assert result[2]["personCounts"] == 1819
+    assert result[0]["personCounts"] == 30712
+    assert result[1]["personCounts"] == 3043
+    assert result[2]["personCounts"] == 1809


 def test_benchmark_query5(benchmark, session):
@@ -96,7 +96,7 @@ def test_benchmark_query7(benchmark, session):
     result = result.to_dicts()

     assert len(result) == 1
-    assert result[0]["numPersons"] == 165
+    assert result[0]["numPersons"] == 150
     assert result[0]["state"] == "California"
     assert result[0]["country"] == "United States"

@@ -114,4 +114,4 @@ def test_benchmark_query9(benchmark, session):
     result = result.to_dicts()

     assert len(result) == 1
-    assert result[0]["numPaths"] == 46061065
+    assert result[0]["numPaths"] == 45633521
diff --git a/requirements.txt b/requirements.txt
index 5be2c7a..9252d3d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,8 +1,9 @@
-faker~=27.0.0
-polars~=1.5.0
-pyarrow~=17.0.0
-kuzu~=0.6.0
-neo4j~=5.23.0
-python-dotenv>=1.0.0
-codetiming>=1.4.0
-pytest-benchmark>=4.0.0
\ No newline at end of file
+faker~=33.1.0
+polars~=1.17.0
+pyarrow~=18.1.0
+numpy~=2.2.0
+kuzu~=0.7.0
+neo4j~=5.27.0
+python-dotenv~=1.0.0
+codetiming~=1.4.0
+pytest-benchmark~=5.1.0
\ No newline at end of file
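
The "Speedup factor" columns in the README tables of this diff appear to be the Neo4j run time divided by the Kùzu run time, rounded to one decimal place. A minimal sketch under that assumption follows; the helper name `speedup` and the `query_times` dictionary are hypothetical and not part of the repository, and the timings are copied from the updated query table.

```python
def speedup(neo4j_sec: float, kuzu_sec: float) -> str:
    """Format the ratio of Neo4j time to Kùzu time like the README tables (assumed definition)."""
    if kuzu_sec <= 0:
        raise ValueError("kuzu_sec must be positive")
    return f"{neo4j_sec / kuzu_sec:.1f}x"


# Mean run times in seconds (Neo4j, Kùzu), copied from the updated README query table.
query_times = {
    1: (1.464, 0.204),
    2: (0.564, 0.669),
    8: (2.960, 0.009),
    9: (3.361, 0.099),
}

for query, (neo4j_sec, kuzu_sec) in query_times.items():
    print(f"Query {query}: {speedup(neo4j_sec, kuzu_sec)}")
```

A value below 1.0x (as for query 2) means Neo4j was faster for that query in this run.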