Relation of noise to module size #361

poldrack · 2024-10-02T20:41:39Z

I've done some simulations to examine the relationship between noise and community size, and I'm trying to get my head around the results. I generate a network by adding noise to a ground-truth affinity matrix and then thresholding it. The ground-truth matrix is generated from 20 modules of varying sizes (from 244 to 1566 vertices), out of a total of 14853 vertices. Code to run the simulations is available here: https://github.com/poldrack/infomap_sims/ and here is a summary of the module sizes as I increased noise (25 runs per noise level):

[Figure: module sizes at each noise level, 25 runs per level]

What I see is that the modules that start largest generally get larger as noise increases and the clustering diverges further from ground truth, while those that start smallest generally get smaller. What I'm trying to determine is whether this is a necessary feature of clustering noisy data or a consequence of some specific feature of the Infomap algorithm. Any thoughts would be most appreciated!
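The generation step described above (add noise to a ground-truth affinity matrix, then threshold) can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the code in infomap_sims: the function name `simulate_network`, the 1/0 block affinity, the Gaussian noise model and the proportional `density` threshold are all assumptions made for illustration.

```python
import numpy as np

def simulate_network(module_sizes, noise_sd, density=0.05, seed=None):
    """Hypothetical sketch of a noisy, thresholded planted-partition network.

    Assumptions (illustrative, not taken from the infomap_sims repository):
    ground-truth affinity is 1 within a module and 0 between modules, noise is
    additive Gaussian, and the threshold keeps the strongest `density`
    fraction of all possible edges.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(module_sizes)), module_sizes)
    # Each unordered node pair (i < j) appears once; no self-loops.
    i, j = np.triu_indices(labels.size, k=1)
    affinity = (labels[i] == labels[j]).astype(float)
    weights = affinity + rng.normal(0.0, noise_sd, size=affinity.size)
    # Proportional threshold: keep the n_keep largest noisy weights.
    n_keep = max(1, int(round(density * weights.size)))
    top = np.argsort(weights)[-n_keep:]
    edges = np.column_stack((i[top], j[top]))
    return edges, labels
```

For example, `simulate_network([244, 600, 1566], noise_sd=0.5, seed=0)` returns an edge list and the planted labels. Enumerating every node pair is memory-heavy: at the full 14853-vertex scale there are about 110 million pairs, so a practical version would likely threshold in chunks.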

mrosvall · 2024-10-03T12:48:15Z

Maintainer

Thanks for your questions. We will investigate your analysis and let you know what we think is happening.

danieledler · 2024-10-10T13:24:00Z

Maintainer

Hi, I investigated your analysis and uploaded a notebook with my analysis here. I believe the pattern you found comes neither from Infomap nor from clustering noisy data in general, but from the specific partition matching you use.

More specifically, you use the confusion matrix to map each predicted label (module) to the true label with the highest count, i.e. the ground-truth module that contains the largest number of nodes from the predicted module. Skipping the clustering step and using toy data instead, I recovered the same pattern you got by generating predicted labels as increasingly randomized versions of the true ones.
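The max-count matching rule and the toy experiment described above can be sketched as follows. The helper names and the randomization model (each node independently receives a uniformly random label with probability `p`) are assumptions for illustration; the actual notebook may do this differently.

```python
import numpy as np

def confusion(true, pred):
    """C[i, j] = number of nodes with true label i and predicted label j."""
    C = np.zeros((true.max() + 1, pred.max() + 1), dtype=int)
    np.add.at(C, (true, pred), 1)
    return C

def map_by_max_count(C):
    """Map each predicted module (column) to the true module (row) that
    contributes the most nodes to it."""
    return C.argmax(axis=0)

def mapped_sizes(C, mapping):
    """Number of nodes credited to each true module after the mapping."""
    sizes = np.zeros(C.shape[0], dtype=int)
    np.add.at(sizes, mapping, C.sum(axis=0))
    return sizes

def randomize_labels(true, p, rng):
    """Toy stand-in for clustering noise (hypothetical): give a fraction p of
    nodes a uniformly random label, with no network or clustering involved."""
    pred = true.copy()
    flip = rng.random(true.size) < p
    pred[flip] = rng.integers(0, true.max() + 1, size=flip.sum())
    return pred

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true = np.repeat([0, 1, 2], [300, 60, 20])  # one large, two small modules
    for p in (0.0, 0.3, 0.6, 0.9):
        pred = randomize_labels(true, p, rng)
        C = confusion(true, pred)
        print(p, mapped_sizes(C, map_by_max_count(C)))
```

Under this toy model, a heavily randomized predicted module draws most of its nodes from the largest true module simply because that module is largest, so the max-count rule credits it there: large modules appear to grow and small ones to shrink, with no clustering involved.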

To test this further, I normalized each row of the confusion matrix by the size of the corresponding true module, so that a predicted module is mapped to the true module in which its nodes make up the highest proportion. This turns the diverging pattern into a converging one.
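A minimal sketch of the row-normalized rule, assuming, as the description implies, that rows are true modules and columns are predicted modules. `map_by_proportion` is a hypothetical name and the confusion matrix is a made-up example.

```python
import numpy as np

def map_by_proportion(C):
    """Map each predicted module (column) to the true module (row) in which its
    nodes make up the largest fraction of that true module."""
    true_sizes = C.sum(axis=1, keepdims=True)
    # Row-normalize; guard against true modules with no nodes.
    P = C / np.maximum(true_sizes, 1)
    return P.argmax(axis=0)

# Made-up example: rows are true modules (300 and 20 nodes),
# columns are predicted modules.
C = np.array([[200, 100],
              [  2,  18]])
print(C.argmax(axis=0))       # max-count mapping: [0 0]
print(map_by_proportion(C))   # proportion mapping: [0 1]
```

Predicted module 1 contains 18 of the 20 nodes (90%) of the small true module but only 100 of the 300 nodes (about 33%) of the large one, so the proportion rule maps it to the small module, whereas the raw counts (100 vs 18) send it to the large one.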

When the true modules do not differ much in size, the patterns disappear.

Partition matching is far from trivial, though. For more details and potential solutions, see for example this paper: https://journals.aps.org/prx/pdf/10.1103/PhysRevX.11.021003.
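One common alternative that avoids many-to-one assignments altogether is a one-to-one matching that maximizes the total overlap, solved with the Hungarian algorithm via SciPy's `scipy.optimize.linear_sum_assignment`. This is a sketch of that generic technique, not necessarily what the linked paper recommends; `match_one_to_one` is a hypothetical helper, and rows are again assumed to be true modules.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_one_to_one(C):
    """Hypothetical helper: match predicted modules (columns of C) to true
    modules (rows of C) one-to-one, maximizing the total node overlap.
    Predicted modules left without a partner are marked -1."""
    rows, cols = linear_sum_assignment(C, maximize=True)
    mapping = np.full(C.shape[1], -1)
    mapping[cols] = rows
    return mapping

C = np.array([[50, 40, 10],   # true module 0
              [ 5,  5, 30]])  # true module 1
print(match_one_to_one(C))    # predicted 0 -> true 0, predicted 2 -> true 1, predicted 1 unmatched
```

With more predicted than true modules, the leftover predicted modules stay unmatched (`-1`) rather than being folded into a large true module.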

Another type of analysis that may be helpful is to quantify the differences between two partitions on the modular or node level. I have ongoing work regarding this, but please feel free to contact us again if you need more help or are interested in potential collaboration.
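As one simple module-level measure (a hypothetical illustration, not the ongoing work mentioned here), each true module can be scored by its best Jaccard index |A ∩ B| / |A ∪ B| against any predicted module. This sidesteps crediting whole predicted modules to a single true module.

```python
import numpy as np

def best_jaccard_per_module(true, pred):
    """For each true module, the highest Jaccard index |A ∩ B| / |A ∪ B|
    with any predicted module (1.0 = recovered exactly)."""
    C = np.zeros((true.max() + 1, pred.max() + 1), dtype=int)
    np.add.at(C, (true, pred), 1)           # overlap |A ∩ B| for every pair
    true_sizes = C.sum(axis=1, keepdims=True)
    pred_sizes = C.sum(axis=0, keepdims=True)
    union = true_sizes + pred_sizes - C     # |A ∪ B| by inclusion-exclusion
    return (C / np.maximum(union, 1)).max(axis=1)

true = np.array([0, 0, 0, 1, 1])
pred = np.array([0, 0, 1, 1, 1])
print(best_jaccard_per_module(true, pred))
```

Because no nodes are reassigned to a matched module, a poorly recovered small module shows up as a low score rather than as an apparent change in size.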

poldrack Oct 10, 2024
Author

Many thanks for taking the time to dig into this! I will have a look at your notebook and be in touch.
