Ownership Algorithm

Usage

The algorithm must be run from the command line. It is located in tools/ownership. Once you have changed into that directory, run it as follows:

php ownership.php [opts] -a "Article Title"
    -h  Show this help menu
    -t  Switch to process the talk pages or not
    -f  What memory factor to use when determining the sentence ownership (between 0.00 and 1.00.  Default is 0.95)
    -w  The minimum word count for sentences (integer at least 1.  Default is 5)
    -a  What article to process (To do them all do "*". Default is "*")

Rules

The ownership algorithm applies the following rules:

Count how many of the sentences remaining in the most recent version were created by you.
When a sentence is first created, it is owned by its creator.
If one person makes a significant change (more than 50%), that person now owns the sentence.
When no one person owns more than 50%, the sentence belongs to the Public Domain.
Each sentence is considered a single unit, regardless of length.
The total number of sentences each user owns over the life of the article is tallied, and the percentage is calculated. A memory factor controls how much past revisions influence the results (0.00 = don't factor in past revisions, 1.00 = don't diminish their effect at all). At each revision, the running total of sentences owned by each user is multiplied by the factor, which reduces the influence of past revisions when the factor is less than 1.00. If the factor is 0.00, only the instantaneous ownership at that revision is seen.
Factors between 0.95 and 0.99 seem to give good results.
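
The memory factor rule can be illustrated with a short Python sketch. This is not the code in ownership.php; the function name ownership_percentages and the input format (one list of sentence owners per revision, oldest first) are assumptions made for the example. It only shows the decay step described above: multiply the running totals by the factor, then add the current revision's counts.

```python
from collections import Counter


def ownership_percentages(revisions, factor=0.95):
    """Return one {user: percent} dict per revision.

    revisions: list of lists; each inner list holds the owner of every
    sentence in that revision (hypothetical input format).
    factor: memory factor between 0.00 and 1.00.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0.00 and 1.00")

    tally = {}      # running, decayed sentence totals per user
    results = []
    for owners in revisions:
        # Decay everything accumulated from earlier revisions.
        tally = {user: total * factor for user, total in tally.items()}
        # Add the sentences owned in the current revision.
        for user, count in Counter(owners).items():
            tally[user] = tally.get(user, 0.0) + count
        grand_total = sum(tally.values())
        if grand_total:
            results.append(
                {user: 100.0 * total / grand_total for user, total in tally.items()}
            )
        else:
            results.append({})
    return results
```

With a factor of 0.00 each revision is judged on its own, and with 1.00 the totals simply accumulate. For the two revisions [["alice", "alice"], ["alice", "bob"]], the second revision comes out as 50/50 at factor 0.00 and 75/25 at factor 1.00.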

We also want to know who is working with whom, and who is changing whose content (relations).

For each edit, capture the section (filter by section)
Given edits 101 and 102, we want to know whether 102 is in the same section as 101 or a different one, and whether it touches adjacent sentences.
Precedes, comes after, within
For open source, different models of the program
Each time a sentence is edited, show that User A edited User B's content:
A makes a change within B's own content
A adds a sentence just before B's
A adds a sentence just after B's
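
As a rough illustration of the "just before" and "just after" cases, the Python sketch below derives relation records for a newly inserted sentence. It is not taken from ownership.php: the function relations_for_insert and its arguments are hypothetical. The type names adds_after, adds_before and adds_new come from the ownership_relations table, and treating an empty section as the adds_new case is an assumption.

```python
def relations_for_insert(section_ids, position, new_id):
    """Return (type, referenced_sentence_id) pairs for one inserted sentence.

    section_ids: ids of the sentences already in the section, in order.
    position: index at which the new sentence is inserted (0..len).
    new_id: id of the newly added sentence.
    """
    if not 0 <= position <= len(section_ids):
        raise ValueError("position is outside the section")

    # Assumption: a sentence with no neighbours starts a new section.
    if not section_ids:
        return [("adds_new", new_id)]

    relations = []
    if position > 0:
        # The new sentence comes after the sentence that precedes it.
        relations.append(("adds_after", section_ids[position - 1]))
    if position < len(section_ids):
        # The new sentence comes before the sentence that follows it.
        relations.append(("adds_before", section_ids[position]))
    return relations
```

Inserting a sentence at index 1 of a section holding sentences 10, 11 and 12 would yield an adds_after relation to 10 and an adds_before relation to 11. The owners of those two sentences are the users that the modifier is related to.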

Assumptions

Sentences are defined as text ending in either . ! or ?
Exceptions include abbreviations such as Mr., Dr. or i.e., which do not end a sentence.
Sentences need to be at least 5 words long (the default; see the -w option).
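
These assumptions can be sketched in a few lines of Python. This is a simplified illustration, not the splitter used by ownership.php: the function split_sentences is hypothetical, the abbreviation list contains only the three examples named above, and dropping trailing text that has no terminator is an assumption.

```python
# Only the abbreviations mentioned in this document; a real list is likely longer.
ABBREVIATIONS = {"mr.", "dr.", "i.e."}


def split_sentences(text, min_words=5):
    """Split text into sentences ending in . ! or ? and drop short ones."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # Ignore opening brackets or quotes when checking for abbreviations.
        bare = token.lower().lstrip("(\"'")
        ends_sentence = token[-1] in ".!?" and bare not in ABBREVIATIONS
        if ends_sentence:
            if len(current) >= min_words:
                sentences.append(" ".join(current))
            current = []
    # Assumption: leftover words without a terminator are not a sentence.
    return sentences
```

For the text "Dr. Smith wrote the first draft here. Too short! Was it reviewed by Mr. Jones, i.e. the editor?" this keeps the first and last sentences and drops "Too short!" because it has fewer than 5 words.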

Database Structure

ownership_relations

id: Unique id for the relation
sent_id: The id of the sentence that was changed/added/deleted
rel_sent_id: The original id of the sentence that this relation refers to
article: The article id
talk: Whether this row is for the talk page
modifier: The person who made the edit to the sentence
type: The type of relation this is. Possible values include:
adds_new: references the new sentence in the new section.
adds_after: references the original id of the sentence before the added sentence
adds_before: references the original id of the sentence after the added sentence
changes: references the original id of the changed sentence
deletes: references the original id of the sentence that was deleted
wordsIns: The number of words inserted
wordsDel: The number of words deleted
takesOwnership: Whether or not 'modifier' takes ownership of this sentence

ownership_results

article: The article id
talk: Whether this row is for the talk page
rev_id: The revision id
user: The name of one of the users who have edited that article up to this revision
percent: The percent of sentences owned
factor: The memory factor used when the algorithm was run

ownership_sentences

id: The unique sentence id
article: The article id
talk: Whether this row is for the talk page
rev_id: The revision id
section: The title of the section that the sentence is in
sentence_id: The index of the sentence in that revision
owner: The owner of that sentence at that time
sentence: The plain text of the sentence
