Skip to content

Latest commit

 

History

History
203 lines (181 loc) · 6.82 KB

README.md

File metadata and controls

203 lines (181 loc) · 6.82 KB

Data Repository MathOverflow - zbMATH links

This repository provides data used for the EMS Newsletter article

References to academic literature in QA forums - A case study on zbMATH links from MathOverflow Fabian Müller, Moritz Schubotz Olaf Teschke EMS Newsletter October 19

Guide to reproduce

  • Download MathOverflow dump from
wget https://archive.org/download/stackexchange/mathoverflow.net.7z

  • Check if the md5 sum of the file is 8011aabf2ae76358abbcf9a493ba9655.

If the md5 sum does not match you might have downloaded a never version. To reproduce our results we stored the downloaded zip as GitHub release.

Take a moment to review the [dataset description](https://archive.org/details/stackexchange) the meta information
  • Format: 7zipped
  • Files:
    • badges.xml
      • UserId, e.g.: "420"
      • Name, e.g.: "Teacher"
      • Date, e.g.: "2008-09-15T08:55:03.923"
    • comments.xml
      • Id
      • PostId
      • Score
      • Text, e.g.: "@Stu Thompson: Seems possible to me - why not try it?"
      • CreationDate, e.g.:"2008-09-06T08:07:10.730"
      • UserId
    • posts.xml
      • Id
      • PostTypeId
        • 1: Question
        • 2: Answer
      • ParentID (only present if PostTypeId is 2)
      • AcceptedAnswerId (only present if PostTypeId is 1)
      • CreationDate
      • Score
      • ViewCount
      • Body
      • OwnerUserId
      • LastEditorUserId
      • LastEditorDisplayName="Jeff Atwood"
      • LastEditDate="2009-03-05T22:28:34.823"
      • LastActivityDate="2009-03-11T12:51:01.480"
      • CommunityOwnedDate="2009-03-11T12:51:01.480"
      • ClosedDate="2009-03-11T12:51:01.480"
      • Title=
      • Tags=
      • AnswerCount
      • CommentCount
      • FavoriteCount
    • posthistory.xml
      • Id
      • PostHistoryTypeId
        • 1: Initial Title - The first title a question is asked with.
        • 2: Initial Body - The first raw body text a post is submitted with.
        • 3: Initial Tags - The first tags a question is asked with.
        • 4: Edit Title - A question's title has been changed.
        • 5: Edit Body - A post's body has been changed, the raw text is stored here as markdown.
        • 6: Edit Tags - A question's tags have been changed.
        • 7: Rollback Title - A question's title has reverted to a previous version.
        • 8: Rollback Body - A post's body has reverted to a previous version - the raw text is stored here.
        • 9: Rollback Tags - A question's tags have reverted to a previous version.
        • 10: Post Closed - A post was voted to be closed.
        • 11: Post Reopened - A post was voted to be reopened.
        • 12: Post Deleted - A post was voted to be removed.
        • 13: Post Undeleted - A post was voted to be restored.
        • 14: Post Locked - A post was locked by a moderator.
        • 15: Post Unlocked - A post was unlocked by a moderator.
        • 16: Community Owned - A post has become community owned.
        • 17: Post Migrated - A post was migrated.
        • 18: Question Merged - A question has had another, deleted question merged into itself.
        • 19: Question Protected - A question was protected by a moderator
        • 20: Question Unprotected - A question was unprotected by a moderator
        • 21: Post Disassociated - An admin removes the OwnerUserId from a post.
        • 22: Question Unmerged - A previously merged question has had its answers and votes restored.
      • PostId
      • RevisionGUID: At times more than one type of history record can be recorded by a single action. All of these will be grouped using the same RevisionGUID
      • CreationDate: "2009-03-05T22:28:34.823"
      • UserId
      • UserDisplayName: populated if a user has been removed and no longer referenced by user Id
      • Comment: This field will contain the comment made by the user who edited a post
      • Text: A raw version of the new value for a given revision
        • If PostHistoryTypeId = 10, 11, 12, 13, 14, or 15 this column will contain a JSON encoded string with all users who have voted for the PostHistoryTypeId
        • If PostHistoryTypeId = 17 this column will contain migration details of either "from " or "to "
      • CloseReasonId
        • 1: Exact Duplicate - This question covers exactly the same ground as earlier questions on this topic; its answers may be merged with another identical question.
        • 2: off-topic
        • 3: subjective
        • 4: not a real question
        • 7: too localized
    • postlinks.xml
      • Id
      • CreationDate
      • PostId
      • RelatedPostId
      • PostLinkTypeId
        • 1: Linked
        • 3: Duplicate
    • users.xml
      • Id
      • Reputation
      • CreationDate
      • DisplayName
      • EmailHash
      • LastAccessDate
      • WebsiteUrl
      • Location
      • Age
      • AboutMe
      • Views
      • UpVotes
      • DownVotes
    • votes.xml
      • Id
      • PostId
      • VoteTypeId
        • 1: AcceptedByOriginator
        • 2: UpMod
        • 3: DownMod
        • 4: Offensive
        • 5: Favorite - if VoteTypeId = 5 UserId will be populated
        • 6: Close
        • 7: Reopen
        • 8: BountyStart
        • 9: BountyClose
        • 10: Deletion
        • 11: Undeletion
        • 12: Spam
        • 13: InformModerator
      • CreationDate
      • UserId (only for VoteTypeId 5)
      • BountyAmount (only for VoteTypeId 9)
extract the file
physikerwelt@math-docker:~/mathoverflow$ 7z e mathoverflow.net.7z

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel Core Processor (Skylake, IBRS) (506E3),ASM,AES-NI)

Scanning the drive for archives:
1 file, 317278828 bytes (303 MiB)

Extracting archive: mathoverflow.net.7z
--
Path = mathoverflow.net.7z
Type = 7z
Physical Size = 317278828
Headers Size = 349
Method = BZip2
Solid = +
Blocks = 7

Everything is Ok

Files: 8
Size:       1725254003
Compressed: 317278828
extract posts with references to zbmath.org
physikerwelt@math-docker:~/mathoverflow$ wc -l Posts.xml
252154 Posts.xml
physikerwelt@math-docker:~/mathoverflow$ grep 'zbmath.org' Posts.xml > zblPosts.xml
physikerwelt@math-docker:~/mathoverflow$ wc -l zblPosts.xml
774 zblPosts.xml

In the following we analyse Posts that contain the string zbmath.org.

Files in the repository

The following csv files, were all created using search and replace in a standard text editor.

  • counts.csv contains the number of all posts (not only with links to zbMATH) grouped by month. The first incomplete year and the current year were deleted.
  • mathoverflow-links-stat.xlsx is an Microsoft Excel File that analyses the date distribution of the MathOverflow posts with links to zbMath in the table dates.