Add an option for getting most common edges instead of most recent #27

Open
milo-trujillo opened this issue Aug 9, 2019 · 1 comment
Labels
analysis (Involves analysis of already-downloaded data), enhancement, good first issue

Comments

@milo-trujillo
Member

We currently have an option -M, --maxreferences that restricts the maximum number of edges leaving a node, to avoid the celebrity problem. However, we enforce this limit by reading through each user's tweets in reverse-chronological order until we've found enough mentions and retweets. This means -M 30 will get the 30 most recent retweets or mentions for each user. This is not always desirable: what if we want the strongest links between users instead of the most recent?
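For comparison, here is a minimal sketch of the current most-recent behavior. It assumes (hypothetically) that each tweet has already been reduced to the list of usernames it mentions or retweets, that tweets arrive newest first, and that each distinct username counts as one outgoing edge. `most_recent_references` is an illustrative name, not a function in the project.

```python
from typing import Iterable, List


def most_recent_references(
    tweets_newest_first: Iterable[List[str]], max_references: int
) -> List[str]:
    """Return up to max_references distinct usernames, scanning newest tweets first.

    Hypothetical model: each tweet is the list of usernames it mentions or retweets.
    """
    if max_references <= 0:
        return []
    edges: List[str] = []
    seen = set()
    for referenced_users in tweets_newest_first:
        for username in referenced_users:
            if username in seen:
                continue  # one edge per distinct user
            seen.add(username)
            edges.append(username)
            if len(edges) == max_references:
                return edges  # stop early: older tweets are never read
    return edges
```

With max_references=30 this stops at the 30 most recently referenced users, no matter how often each one appears. That is the limitation this issue describes.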

We should provide an option like -C, --common that changes this behavior: read all mentions and retweets per user, sort them by number of occurrences, and keep the X most frequent connections rather than the most recent activity.
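One way to sketch the proposed -C, --common behavior is with collections.Counter, as the follow-up comment suggests. The tweet representation (a list of mentioned or retweeted usernames per tweet) and the name `most_common_references` are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_common_references(
    tweets: Iterable[List[str]], max_references: int
) -> List[Tuple[str, int]]:
    """Count every mention/retweet target across all tweets, then keep the top N.

    Hypothetical model: each tweet is the list of usernames it mentions or retweets.
    Returns (username, count) pairs, most frequent first.
    """
    counts = Counter()
    for referenced_users in tweets:
        counts.update(referenced_users)  # add 1 per occurrence of each username
    # most_common orders by count, descending; ties keep first-seen order.
    return counts.most_common(max_references)
```

Unlike the newest-first scan, this has to read every tweet before choosing any edges. In exchange, the counts it returns could also serve as edge weights.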

@milo-trujillo milo-trujillo added the analysis, enhancement, and good first issue labels Aug 9, 2019
@milo-trujillo milo-trujillo changed the title from "Add an option for getting most common edges instead of most frequent" to "Add an option for getting most common edges instead of most recent" Nov 8, 2019
@milo-trujillo
Member Author

Typo in title. This should be a pretty simple change: add a dictionary (or a Counter from collections) that tallies all the retweet usernames, then take the top X entries from it. It requires adding an extra field to the options object, and probably passing an extra argument through the acquire code to the retweet collector.
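Here is a sketch of the option plumbing. It assumes the command line is parsed with Python's argparse, although the issue does not say how the options object is built. The -M/--maxreferences and -C/--common flags come from the issue. The default of None for -M, the rule that -C must be combined with -M, and the parse_options helper are hypothetical choices.

```python
import argparse
from typing import List, Optional


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Build the options object, including the extra `common` field (hypothetical)."""
    parser = argparse.ArgumentParser(description="Build a mention/retweet graph.")
    parser.add_argument(
        "-M", "--maxreferences", type=int, default=None,
        help="maximum number of edges leaving each node",
    )
    parser.add_argument(
        "-C", "--common", action="store_true",
        help="keep the most frequent edges instead of the most recent",
    )
    options = parser.parse_args(argv)
    # Hypothetical rule: "top X" needs an X, so -C only makes sense with -M.
    if options.common and options.maxreferences is None:
        parser.error("-C/--common requires -M/--maxreferences")  # exits with status 2
    return options
```

The acquire code would then pass options.common alongside options.maxreferences to the retweet collector. The collector would use that flag to choose between counting occurrences and the newest-first scan.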
