Improve speed of triangle counting #2

MichaelRoeder · 2017-09-18T12:56:07Z

Task description

Lemming already implements the counting of triangles in a given graph as a possible metric. At the moment, the implemented algorithm iterates over the vertices of the graph and tries to find all triangles in which the current vertex is part of. This is done by iterating over the edges that are starting or ending at the current vertex (the direction of edges is not interesting for counting triangles) and checking whether the other two vertices two edges have are connected via a third edge. The algorithm does not take triangles into account that involve a vertex that has been seen before (i.e., that has a lower ID than the current vertex) to make sure that every triangle is counted only once. The implementation can be found at https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java#L41-L75.

Although the implementation is already parallelizing the problem, its complexity grows too fast for dense graphs. We would like to have additional implementations for counting triangles and a logic, that chooses a triangle counting approached based on the input graph (e.g., based on its density).

The project is mainly based on the Grph library.

Possible approaches

http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an overview of different approaches. Our current approach is the edge-count and forward is an enhancement of this approach.

forward from http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf
matrix-multiplication is a different approach. It is based on the adjacency matrix A of the given graph and follows an algorithm like http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/. Unfortunately, the matrix might be very big and space limitations might become a problem for large graphs.
ayz might be a very good choice. However, for this approach the matrix-multiplication has to be implemented at first.
In https://www.cs.cmu.edu/~glmiller/Publications/Papers/Tsourakakis-KMF-09.pdf the authors propose a sampling based on that the triangles can be estimated faster without loosing too much accuracy. This would fit to our already existing implementation.

Testing

Apart from testing the correctness (with some delta for approaches that do not reach 100% correctness, the simple JUnit test class could be rused), the runtime have been checked as well. When commenting out all other metrics, only adding the triangle counting and executing the RefinementTest class it should be finished in a reasonable amount of time. For testing, the semantic web dog food data is necessary. It is provided at http://hobbitdata.informatik.uni-leipzig.de/lemming/SemanticWebDogFood-until-2015.zip

Forking / Merging

The solution should be implemented in a branch, forked from the master branch of the project. The pull request should merge the solution back into the master branch.

Responsible contact

Michael Röder

The text was updated successfully, but these errors were encountered:

MichaHoffmann · 2017-09-18T12:59:07Z

Check out DUOLION paper! Seems to fit very well here Am 18.09.2017 2:56 nachm. schrieb "Michael Röder" <notifications@github.com

…

: Task description Lemming already implements the counting of triangles in a given graph as a possible metric. At the moment, the implemented algorithm iterates over the vertices of the graph and tries to find all triangles in which the current vertex is part of. This is done by iterating over the edges that are starting or ending at the current vertex (the direction of edges is not interesting for counting triangles) and checking whether the other two vertices two edges have are connected via a third edge. The algorithm does not take triangles into account that involve a vertex that has been seen before (i.e., that has a lower ID than the current vertex) to make sure that every triangle is counted only once. The implementation can be found at https://github.com/dice-group/Lemming/blob/master/src/main/ java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java# L41-L75. Although the implementation is already parallelizing the problem, its complexity grows too fast for dense graphs. We would like to have additional implementations for counting triangles and a logic, that chooses a triangle counting approached based on the input graph (e.g., based on its density). Possible approaches http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an overview of different approaches. Our current approach is the edge-count and forward is an enhancement of this approach. 1. forward from http://i11www.iti.kit.edu/ extra/publications/sw-fclt-05_t.pdf <http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf> 2. matrix-multiplication is a different approach. It is based on the adjacency matrix A of the given graph and follows an algorithm like http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/ <http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/>. Unfortunately, the matrix might be very big and space limitations might become a problem for large graphs. 3. ayz might be a very good choice. However, for this approach the matrix-multiplication has to be implemented at first. 4. In https://www.cs.cmu.edu/~glmiller/Publications/Papers/ Tsourakakis-KMF-09.pdf the authors propose a sampling based on that the triangles can be estimated faster without loosing too much accuracy. This would fit to our already existing implementation. Testing Apart from testing the correctness (with some delta for approaches that do not reach 100% correctness, the simple JUnit test class <https://github.com/dice-group/Lemming/blob/master/src/test/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetricTest.java> could be rused), the runtime have been checked as well. When commenting out all other metrics <https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/RefinementTest.java#L53-L59>, only adding the triangle counting and executing the RefinementTest <https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/RefinementTest.java> class it should be finished in a reasonable amount of time. Forking / Merging The solution should be implemented in a branch, forked from the master branch of the project. The pull request should merge the solution back into the master branch. Responsible contact Michael Röder <http://cs.uni-paderborn.de/ds/team/team/> — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#2>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AQpKf8KFmQ68vVUJQxdrwVAsHG2D0trWks5sjmhpgaJpZM4Pa3U3> .

MichaelRoeder · 2017-09-18T13:10:34Z

That is the fourth point in the description 😉
But thanks for the hint 😃

MichaelRoeder added the training label Sep 18, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve speed of triangle counting #2

Improve speed of triangle counting #2

MichaelRoeder commented Sep 18, 2017 •

edited

Loading

MichaHoffmann commented Sep 18, 2017 via email

MichaelRoeder commented Sep 18, 2017

Improve speed of triangle counting #2

Improve speed of triangle counting #2

Comments

MichaelRoeder commented Sep 18, 2017 • edited Loading

Task description

Possible approaches

Testing

Forking / Merging

Responsible contact

MichaHoffmann commented Sep 18, 2017 via email

MichaelRoeder commented Sep 18, 2017

MichaelRoeder commented Sep 18, 2017 •

edited

Loading