Ownership Algorithm
David Turner edited this page Feb 12, 2014 · 18 revisions
The algorithm must be run from the command line. It is located in tools/ownership. Once you have changed into that directory, you can run it as follows:
php ownership.php [opts] -a "Article Title"
-h Show this help menu
-t Switch that controls whether the talk pages are processed
-f Memory factor to use when determining sentence ownership (between 0.00 and 1.00; default is 0.95)
-w Minimum word count for sentences (integer, at least 1; default is 5)
-a Article to process (use "*" to process all articles; default is "*")
The ownership algorithm follows these rules:
- Count how many of the sentences remaining in the most recent version were created by each user.
- When a sentence is initially created, it is owned by its creator.
- If one person makes a significant change (more than 50%), that person now owns the sentence.
- When no one person owns more than 50%, the sentence belongs to the Public Domain.
- Each sentence is considered a single unit, regardless of length
- The total number of sentences each user owns over the life of the article is tallied, and the percentage is calculated. A memory factor controls how much past revisions influence the results (0.00 = don't factor in past revisions, 1.00 = don't diminish their effect at all). At each revision, the running total of sentences owned by each user is multiplied by the factor, which reduces the influence of past revisions when the factor is below 1.00. If the factor is 0.00, only the instantaneous ownership at that revision is seen.
- Factors between 0.95 and 0.99 seem to give good results.
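The rules above can be sketched in a few lines. This is a minimal illustration in Python, not the code in ownership.php; the function names, the `PUBLIC_DOMAIN` label, the input shapes, and the choice to count Public Domain like any other owner are all assumptions made for the example.

```python
from collections import Counter

PUBLIC_DOMAIN = "Public Domain"  # hypothetical label for unowned sentences


def sentence_owner(word_shares):
    """Return the user holding more than 50% of a sentence, else Public Domain.

    word_shares maps user name -> fraction of the sentence attributed to them
    (an assumed representation; the wiki does not say how shares are stored).
    """
    for user, share in word_shares.items():
        if share > 0.5:
            return user
    return PUBLIC_DOMAIN


def ownership_percentages(revisions, factor=0.95):
    """Compute per-revision ownership percentages with a memory factor.

    revisions is a list (oldest first) of lists holding the owner of each
    sentence in that revision. Each sentence counts as one unit.
    """
    tally = Counter()
    history = []
    for owners in revisions:
        # Decay everything accumulated from earlier revisions.
        for user in list(tally):
            tally[user] *= factor
        # Add the instantaneous ownership of this revision.
        for owner in owners:
            tally[owner] += 1
        total = sum(tally.values())
        if total:
            history.append(
                {u: 100.0 * c / total for u, c in tally.items() if c > 0}
            )
        else:
            history.append({})
    return history
```

With `factor=0.0` the last entry reflects only the latest revision, and with `factor=1.0` every past revision counts equally, which matches the two extremes described above.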
We also want to know who is working with whom and who is changing whose content (relations):
- For each edit, capture the section (filter by section)
- For example, given edits 101 and 102, we want to know whether 102 is in the same section or a different section, as well as whether it involves adjacent sentences.
- Precedes, comes after, within
- For open source, different models of the program
- Each time a sentence is edited, show that User A edited User B:
- A makes a change within B's own content
- A adds a sentence just before B's
- A adds a sentence just after B's
- Sentences are defined as text ending in ., !, or ?
- Exceptions include abbreviations such as Mr., Dr., and i.e.
- Sentences need to be at least 5 words long (the default minimum word count)
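As a rough illustration of the sentence definition above, the following Python sketch splits text on ., ! and ? while skipping a small set of abbreviations. It is not the tokenizer used by ownership.php: the abbreviation set contains only the three examples named on this page, and dropping short sentences and unterminated trailing text are assumptions of this sketch.

```python
# Only the examples named on this page; a real list would likely be longer.
ABBREVIATIONS = {"mr.", "dr.", "i.e."}


def split_sentences(text, min_words=5):
    """Split text into sentences ending in '.', '!' or '?'.

    Tokens found in ABBREVIATIONS do not end a sentence. Sentences shorter
    than min_words are discarded (an assumption; they could also be merged
    with a neighbour), as is trailing text with no terminator.
    """
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token[-1] in ".!?"
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            if len(current) >= min_words:
                sentences.append(" ".join(current))
            current = []
    return sentences
```

For example, `split_sentences("Dr. Smith wrote the first draft today. Too short.")` keeps the first sentence whole despite the period in "Dr." and drops the two-word one.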
- id: Unique id for the relation
- sent_id: The id of the sentence that was changed/added/deleted
- rel_sent_id: The original id of the sentence that this sentence is related to
- article: The article id
- talk: Whether or not this is for the talk page
- modifier: The person who made the edit to the sentence
- type: The type of relation this is. Possible values include:
- adds_new: references the new sentence in the new section.
- adds_after: references the original id of the sentence before the added sentence
- adds_before: references the original id of the sentence after the added sentence
- changes: references the original id of the changed sentence
- deletes: references the original id of the sentence that was deleted
- wordsIns: The number of words inserted
- wordsDel: The number of words deleted
- takesOwnership: Whether or not 'modifier' takes ownership of this sentence
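The relation fields listed above can be pictured as a single record. The Python sketch below mirrors the field names from this page; the field types, the validation, and the `modifier_takes_ownership` helper are assumptions. In particular, the page does not say how "more than 50%" is measured, so the helper uses the share of inserted words in the resulting sentence as one plausible reading.

```python
from dataclasses import dataclass

# Relation types as listed on this page.
RELATION_TYPES = {"adds_new", "adds_after", "adds_before", "changes", "deletes"}


def modifier_takes_ownership(original_words, words_ins, words_del):
    """Assumed rule: the modifier owns the sentence if the words they
    inserted make up more than 50% of the resulting sentence."""
    new_words = original_words - words_del + words_ins
    if new_words <= 0:
        return False
    return words_ins / new_words > 0.5


@dataclass
class Relation:
    # Field names follow the page; the Python types are assumptions.
    id: int
    sent_id: int
    rel_sent_id: int
    article: int
    talk: bool
    modifier: str
    type: str
    wordsIns: int
    wordsDel: int
    takesOwnership: bool

    def __post_init__(self):
        # Reject relation types that are not listed on the page.
        if self.type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.type}")
```

A `changes` relation for a 10-word sentence with 8 words inserted and 6 deleted would set `takesOwnership` to true under this reading, since 8 of the resulting 12 words come from the modifier.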
- article: The article id
- talk: Whether or not this is for the talk page
- rev_id: The revision id
- user: The name of one of the users who has edited that article up to this revision
- percent: The percent of sentences owned
- factor: The memory factor used when the algorithm was run
- id: The unique sentence id
- article: The article id
- talk: Whether or not this is for the talk page
- rev_id: The revision id
- section: The title of the section that the sentence is in
- sentence_id: The index of the sentence in that revision
- owner: The owner of that sentence at that time
- sentence: The plain text of the sentence