Design of high-performance OpenMP-based parallel Graph CSR loader.
High-performance graph processing frameworks like Gunrock, Hornet, Ligra, and Galois help accelerate graph analytics tasks. However, graph loading is a major bottleneck in these frameworks. Fast graph loading is crucial for improving response time and reducing system/cloud usage charges. To address this, we introduce GVEL, a highly optimized method for reading Edgelists from text files and converting them into Compressed Sparse Row (CSR) format.
Below we plot the time taken by Hornet, Gunrock, PIGO, and GVEL for reading Edgelist and converting it to CSR on 13 different graphs. PIGO and GVEL are not visible on this scale - they are significantly faster than Hornet and Gunrock. The graph loading time for Hornet is not shown for uk-2002
, it-2004
, and sk-2005
graphs as it crashed while loading. GVEL surpasses Hornet, Gunrock, and PIGO in CSR reading by 78×
, 112×
, and 1.8×
, respectively.
Below we plot only the time taken by PIGO and GVEL for reading Edgelist and converting it to CSR.
Next, we plot the time taken by PIGO and GVEL for reading Edgelist. Here, GVEL outperforms PIGO by 2.6×
, achieving a read rate of 1.9 billion edges/s
with 64 threads.
Finally, we plot the strong scaling behaviour of GVEL for reading Edgelist, and for reading CSR.
Refer to our technical report for more details:
GVEL: Fast Graph Loading in Edgelist and Compressed Sparse Row (CSR) formats.
Note
You can just copy main.sh
to your system and run it.
For the code, refer to main.cxx
.
The code structure of GVEL is as follows:
- inc/_cctype.hxx: Character classification/conversion
- inc/debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/exception.hxx: Custom exception class (FormatError)
- inc/_mman.hxx: Memory mapping/allcation functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_string.hxx: Number parsing/string tokenization
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/io.hxx: COO/MTX file reading functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs
Note that each branch in this repository contains code for a specific experiment. The main
branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.
- Best/more standard graph representation file format? (GraphSON, Gexf, GraphML?) - Stack Overflow
- Is there documentation on the structure of MAT-files in MATLAB? - MATLAB Answers - MATLAB Central
- Matrix Market: File Formats
- Matrix Market exchange formats - Wikipedia
- Harwell-Boeing file format - Wikipedia
- RB Files - The Rutherford Boeing Sparse Matrix File Format
- HB Files - The Harwell Boeing Sparse Matrix File Format
- DIMACS graph format
- DIMACS-READER - Dimacs-reader
- GT-TDAlab/PIGO
- rapidsai/cuhornet
- gunrock/gunrock
- the-data-lab/GraphOne
- rapidsai/cugraph
- cmuparlay/pbbsbench
- KatanaGraph/katana
- GraphIt-DSL/graphit
- Why mmap is faster than system calls | by Alexandra (Sasha) Fedorova | Medium
- File Handling Through C++ | How to Open, Save, Read and Close
- How to use mmap function in C language?
- Efficient Memory Mapped File I/O for In-Memory File Systems
- DI-MMAP—a scalable memory-map runtime for out-of-core data-intensive applications | Cluster Computing
- Optimizing Memory-Access Patterns for Deep Learning Accelerators
- malloc vs mmap in C - Stack Overflow
- memory - Anonymous mmap zero-filled? - Stack Overflow
- Linux mmap() with MAP_POPULATE, man page seems to give wrong info - Stack Overflow
- Is there really no asynchronous block I/O on Linux? - Stack Overflow
- c - How to use mmap to allocate a memory in heap? - Stack Overflow
- c - When should I use mmap for file access? - Stack Overflow
- c - Zero a large memory mapping with
madvise
- Stack Overflow - c - When would one use mmap MAP_FIXED? - Stack Overflow
- c - Overlapping pages with mmap (MAP_FIXED) - Stack Overflow
- c++ - Platform independent memory mapped [file] IO - Stack Overflow
- c++ - mmap() vs. reading blocks - Stack Overflow
- c++ - Is mmap + madvise really a form of async I/O? - Stack Overflow
- c++ - Does unique_ptr::release() call the destructor? - Stack Overflow
- Does the OS dispatch to another process when there's a page fault on the current process? - Quora
- memory management - Does a page fault causes the faulting process to reschedule? - Stack Overflow
- memory management - How to get page size programmatically within Linux kernel module code - Stack Overflow
- Writing files using O_DIRECT in C
- LKML: Linus Torvalds: Re: O_DIRECT question
- file - Portability of open(...O_DIRECT) in C? - Stack Overflow
- Hugepages - Debian Wiki
- open(2) - Linux manual page
- close(2): close file descriptor - Linux man page
- mmap(3): map pages of memory - Linux man page
- mmap(2) - Linux manual page
- madvise(2): give advice about use of memory - Linux man page
- madvise(2) - Linux manual page
- msync(2) - Linux manual page
- mincore(2) - Linux manual page
- <sys/mman.h>
- isspace() in C - GeeksforGeeks
- Three reasons to pass
std::string_view
by value – Arthur O'Dwyer – Stuff mostly about C++
- What is vertical tab, form feeds and backspace character? How to use them in JavaScript? - Stack Overflow
- c++ - How to efficiently get a
string_view
for a substring ofstd::string
- Stack Overflow - c++ - Constructing istringstream with string_view doesn't compile - Stack Overflow
- c++ - Safely convert std::string_view to int (like stoi or atoi) - Stack Overflow
- c++ - Passing std::string_view by reference - Stack Overflow
- c++ - Why is std::string_view faster than const char*? - Stack Overflow
- c++ - Why can't I construct a string_view from range iterators? - Stack Overflow
- c++ - How you convert a std::string_view to a const char*? - Stack Overflow
- c++17 - Why is there no implicit conversion from std::string_view to std::string? - Stack Overflow
- pointers - Meaning of *& and **& in C++ - Stack Overflow
- std::basic_string_view - cppreference.com
- std::basic_string_view<CharT,Traits>::basic_string_view - cppreference.com
- std::basic_string_view<CharT,Traits>::substr - cppreference.com
- std::basic_string_view<CharT,Traits>::remove_prefix - cppreference.com
- C++ named requirements: LegacyRandomAccessIterator - cppreference.com
- memchr - cppreference.com
- c - converting a string into a double - Stack Overflow
- c++ - convert string to size_t - Stack Overflow
- c++ - Undefined symbol std::from_chars(...) - Stack Overflow
- C++ most efficient way to convert string to int (faster than atoi) - Stack Overflow
- c++ - optimal way to convert string to double? - Stack Overflow
- c++ - What's the fastest or most efficient way to convert a double value to its most concise string representation? - Stack Overflow
- Fastest way to read numerical values from text file in C++ (double in this case) - Stack Overflow
- c++ - What is more efficient? Using pow to square or just multiply it with itself? - Stack Overflow
- Standard library header (C++17) - cppreference.com
- std::chars_format - cppreference.com
- std::from_chars - cppreference.com
- std::is_integral - cppreference.com
- std::pow, std::powf, std::powl - cppreference.com
- strtoull - cplusplus.com
- Parsing integers quickly with AVX-512 – Daniel Lemire's blog
- Faster Integer Parsing
- Fast float parsing in practice – Daniel Lemire's blog
- AVX512/VBMI2: A Programmer’s Perspective | Hacker News
- Code-used-on-Daniel-Lemire-s-blog/2023/09/22/src/parse_integer.cpp at master · lemire/Code-used-on-Daniel-Lemire-s-blog · GitHub
- GitHub - lemire/fast_double_parser: Fast function to parse strings into double (binary64) floating-point values, enforces the RFC 7159 (JSON standard) grammar: 4x faster than strtod
- visual c++ - How to write c++ code that the compiler can efficiently compile to SSE or AVX? - Stack Overflow
- Header files for x86 SIMD intrinsics - Stack Overflow
- simd - What exactly do the gcc compiler switches (-mavx -mavx2 -mavx512f) do? - Stack Overflow
- x86 Options (Using the GNU Compiler Collection (GCC))
- Improving performance with SIMD intrinsics in three use cases - Stack Overflow
- How do I define a 512 bit integer in C++? - Stack Overflow
- c - Is there a 256-bit integer type? - Stack Overflow
- C++ 128/256-bit fixed size integer types - Stack Overflow
- c++ - How to create a left-packed vector of indices of the 0s in one SIMD vector? - Stack Overflow
- c++ - Can I use SIMD for speeding up string manipulation? - Stack Overflow
- c++ - SSE and AVX intrinsics mixture - Stack Overflow
- c++ - Why is transforming an array using AVX-512 instructions significantly slower when transforming it in batches of 8 compared to 7 or 9? - Stack Overflow
- c++ - Compiling legacy GCC code with AVX vector warnings - Stack Overflow
- c - inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch - Stack Overflow
- android - What causes signal 'SIGILL'? - Stack Overflow
- Convert double to string in scientific notation faster than sprintf in c++ - Stack Overflow
- std::to_chars - cppreference.com
- c++ - efficient push_back in std::vector - Stack Overflow
- Optimizing C++ vector for 2x performance ! - DEV Community
- c++ - How to reserve a multi-dimensional Vector without increasing the vector size? - Stack Overflow
- c++ - What is gsl::multi_span to be used for? - Stack Overflow
- c++ - The difference between vector and deque - Stack Overflow
- c++ - How is sort for std::deque implemented? - Stack Overflow
- performance - What does gcc's ffast-math actually do? - Stack Overflow
- How to: Create and use unique_ptr instances | Microsoft Learn
- Smart pointers (Modern C++) | Microsoft Learn
- c++ - Why can I not push_back a unique_ptr into a vector? - Stack Overflow
- std::unique_ptr - cppreference.com
- std::make_unique, std::make_unique_for_overwrite - cppreference.com
- What are the best practices when implementing C++ error handling? - Software Engineering Stack Exchange
- How to throw dynamic string in custom exception? I can't throw string created on stack! : r/cpp_questions
- C++ Exception Handling (With Examples)
- Modern C++ best practices for exceptions and error handling | Microsoft Learn
- Zero-cost exceptions aren't actually zero cost - The Old New Thing
- Should I return TRUE / FALSE values from a C function? - Stack Overflow
- error handling - C++ custom exception class - Code Review Stack Exchange
- c++11 - Exception propagation and context - Stack Overflow
- c++ - throw an exception from a lambda expression, bad habit? - Stack Overflow
- c++ - Augment a thrown exception with some contextual information - Software Engineering Stack Exchange
- c++ - throw new std::exception vs throw std::exception - Stack Overflow
- c++ - Use throw_with_nested and catch nested exception - Stack Overflow
- c++ - What is the correct way for exception handling with OpenMP tasks? - Stack Overflow
- Standard library header - cppreference.com
- noexcept specifier (since C++11) - cppreference.com
- std::perror - cppreference.com
- std::exception::exception - cppreference.com
- std::current_exception - cppreference.com
- std::nested_exception::nested_exception - cppreference.com
- std::nested_exception - cppreference.com
- OpenMP - Scheduling(static, dynamic, guided, runtime, auto) - Yiling's Tech Zone
- c++ - OPenMP loop increment - Stack Overflow
- c++ - Atomic Minimum on x86 using OpenMP - Stack Overflow
- Class method and variable with same name, compile error in C++ not in Java? - Stack Overflow
- c++ - Do the &= and |= operators for bool short-circuit? - Stack Overflow
- c++ - What is the purpose of std::make_pair vs the constructor of std::pair? - Stack Overflow
- c++ - What is the reason for
std::make_tuple
? - Stack Overflow - c++ - Pass an array as template type - Stack Overflow
- Multiple typename arguments in c++ template? (variadic templates) - Stack Overflow
- What does 'const static' mean in C and C++? - Stack Overflow
- c++ - Should I use size_t or ssize_t? - Stack Overflow
- Parameter pack(since C++11) - cppreference.com
- std::tie - cppreference.com
- std::pair - cplusplus.com
- std::array - cppreference.com
- std::size_t - cppreference.com
- Remove First n Lines of a Large Text File - Ask Ubuntu
- javascript - How to append to a file in Node? - Stack Overflow
- git - Pushing code from one branch of a repo to another branch of another repo - Stack Overflow