-
Thanks for your questions. We will investigate your analysis and let you know what we think is happening.
-
Hi, I investigated your analysis and uploaded a notebook with my analysis here. I believe the pattern you found comes neither from Infomap nor from clustering of noisy data in general, but from the specific partition matching you use. More specifically, you use the confusion matrix to map predicted labels (modules) to true labels by taking the true label with the highest count, i.e. the ground-truth module that contributes the most nodes to the predicted module.

By skipping the clustering step and using toy data, I recovered the same pattern you got by generating predicted labels as increasingly randomized versions of the true ones. To test it further, I normalized the rows of the confusion matrix by the size of the corresponding true module, so that a predicted module is mapped to the true module of which its nodes make up the highest proportion. This changes the diverging pattern into a converging one. Without large size differences among the true modules, the patterns disappear.

Partition matching is far from trivial, though. For more details and potential solutions, see for example this paper: https://journals.aps.org/prx/pdf/10.1103/PhysRevX.11.021003. Another type of analysis that may be helpful is to quantify the differences between two partitions at the module or node level. I have ongoing work on this, so please feel free to contact us again if you need more help or are interested in a potential collaboration.
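To make the two matching rules concrete, here is a minimal sketch on hand-made toy labels. It is not the code from the notebook: the function names (`confusion_matrix`, `match_by_count`, `match_by_proportion`, `matched_sizes`) and the two-module example are assumptions chosen for illustration, with rows of the confusion matrix indexing true modules and columns indexing predicted modules.

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_true, n_pred):
    """Count nodes per (true module, predicted module) pair."""
    cm = np.zeros((n_true, n_pred), dtype=int)
    np.add.at(cm, (true_labels, pred_labels), 1)
    return cm

def match_by_count(cm):
    """Map each predicted module to the true module sharing the most nodes."""
    return cm.argmax(axis=0)

def match_by_proportion(cm):
    """Map each predicted module to the true module of which it covers
    the largest fraction (rows normalized by true module size)."""
    true_sizes = cm.sum(axis=1, keepdims=True)
    return (cm / np.maximum(true_sizes, 1)).argmax(axis=0)

def matched_sizes(pred_labels, mapping, n_true):
    """Module sizes after relabeling predicted modules with true labels."""
    return np.bincount(mapping[pred_labels], minlength=n_true)

# Hypothetical toy data: one large true module (100 nodes), one small (10 nodes).
true_labels = np.array([0] * 100 + [1] * 10)
# A partly randomized prediction: 12 nodes of the large module and
# 4 nodes of the small module end up in the "wrong" predicted module.
pred_labels = np.array([0] * 88 + [1] * 12 + [1] * 6 + [0] * 4)

cm = confusion_matrix(true_labels, pred_labels, n_true=2, n_pred=2)

by_count = match_by_count(cm)
by_prop = match_by_proportion(cm)

print(matched_sizes(pred_labels, by_count, 2))  # [110   0]
print(matched_sizes(pred_labels, by_prop, 2))   # [92 18]
```

With count-based matching both predicted modules are assigned to the large true module, so it appears to grow (110 nodes) while the small one vanishes. With proportion-based matching the small module appears to grow instead (18 nodes against a true size of 10), which is the converging direction.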
-
I've done some simulations to examine the relationship between noise and community size, and I'm trying to get my head around the results. I generate a network by adding noise to a ground-truth affinity matrix and then thresholding it; the ground-truth matrix is built from 20 modules of varying sizes (from 244 to 1566 vertices), out of a total of 14853 vertices. Code to run the simulations is available here: https://github.com/poldrack/infomap_sims/ and here is a summary of the module sizes as I increased noise (over 25 runs for each noise level):
What I see is that the modules that start largest generally get larger as noise increases and the clustering diverges further from ground truth, while those that start smallest generally get smaller with increased noise. What I'm trying to determine is whether this is a necessary feature of clustering noisy data, or a consequence of some specific feature of the Infomap algorithm. Any thoughts would be most appreciated!
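For readers who do not want to open the repository, the general setup can be sketched roughly as below. This is not the code from the linked repository: the block-structured affinity matrix, the Gaussian noise model, the fixed threshold, and the names `block_affinity` and `noisy_thresholded_network` are all assumptions made for illustration, and the module sizes are tiny toy values.

```python
import numpy as np

def block_affinity(module_sizes):
    """Ground-truth affinity: 1 within a module, 0 between modules (assumed form)."""
    labels = np.repeat(np.arange(len(module_sizes)), module_sizes)
    affinity = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(affinity, 0.0)  # no self-affinity
    return affinity, labels

def noisy_thresholded_network(affinity, noise_sd, threshold, rng):
    """Add symmetric Gaussian noise, then keep entries above the threshold as edges."""
    noise = rng.normal(0.0, noise_sd, size=affinity.shape)
    noise = (noise + noise.T) / 2.0  # keep the matrix symmetric
    adjacency = ((affinity + noise) > threshold).astype(int)
    np.fill_diagonal(adjacency, 0)  # no self-loops
    return adjacency

rng = np.random.default_rng(0)
affinity, labels = block_affinity([6, 3, 2])  # toy sizes, not the real ones

for noise_sd in (0.0, 0.5, 2.0):
    adjacency = noisy_thresholded_network(affinity, noise_sd, threshold=0.5, rng=rng)
    # Fraction of true within-module pairs that survive as edges.
    print(noise_sd, adjacency[affinity == 1].mean())
```

The resulting adjacency matrix would then be handed to the clustering step, and the recovered modules compared against `labels`.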