To correlate keywords in a document, I split the document into smaller pieces, which I call paragraphs; the paragraphs are obtained with the string method .splitlines(). Next, I identify which keywords each paragraph contains and define an arbitrary distance between paragraphs, an integer I call 'k'. If two keywords are found in two paragraphs that lie less than 'k' apart, I consider them linked. There may be many links if the keywords occur frequently enough, so the number of links by itself is not really representative of the correlation between keywords. I therefore normalize the number of links with respect to a value that ensures the correlation is meaningful.
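A minimal Python sketch of the link-counting step follows. It rests on three assumptions that the text above does not spell out: the distance between two paragraphs is the difference of their indices in the list returned by .splitlines(); a paragraph contains a keyword when the keyword appears in it as a case-insensitive substring; and every pair of occurrences less than k paragraphs apart counts as one link. The function name keyword_links is hypothetical, and the normalization step is left out because the normalizing value is not specified.

```python
from itertools import combinations


def keyword_links(text: str, keywords: list[str], k: int) -> dict[tuple[str, str], int]:
    """Count raw (unnormalized) links between every pair of keywords.

    Assumptions (hypothetical, not taken from the original description):
    - the distance between two paragraphs is the difference of their indices;
    - a paragraph contains a keyword if the keyword is a case-insensitive substring;
    - every pair of occurrences less than k paragraphs apart is one link.
    """
    # Paragraphs are the lines returned by str.splitlines().
    paragraphs = text.splitlines()
    unique_keywords = sorted(set(keywords))

    # For each keyword, record the indices of the paragraphs that contain it.
    positions: dict[str, list[int]] = {kw: [] for kw in unique_keywords}
    for index, paragraph in enumerate(paragraphs):
        lowered = paragraph.lower()
        for kw in unique_keywords:
            if kw.lower() in lowered:
                positions[kw].append(index)

    # Count links for each unordered keyword pair whose paragraphs are closer than k.
    links: dict[tuple[str, str], int] = {}
    for a, b in combinations(unique_keywords, 2):
        links[(a, b)] = sum(
            1 for i in positions[a] for j in positions[b] if abs(i - j) < k
        )
    return links


text = "graph theory\nkeyword extraction\n\n\ngraph drawing"
print(keyword_links(text, ["graph", "keyword"], k=2))
```

Here "graph" appears in paragraphs 0 and 4 and "keyword" in paragraph 1, so with k=2 only the pair (0, 1) is a link. Raising k to 4 also admits the pair (4, 1), so the count rises to 2. This illustrates why raw counts grow with both k and keyword frequency and need to be normalized.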
arnaujc91/keyword_graph