What proportion of news is not news? That is, what proportion of news stories are about topics unrelated to public affairs, very broadly construed, and are instead about things like cooking, sports, travel, and movie reviews? Using a large corpus of text from the web pages of hundreds of news outlets in the U.K., we tally the provision of not news. We describe how the provision of not news varies across outlets and over time.
Given copyright issues, we cannot share the full text of news articles publicly. A dataset without the story text but including the URL, source, date, and predicted and training labels can be found here.
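As a rough illustration of how the label dataset could be used to tally not news by outlet or by year, here is a minimal sketch in plain Python. The column names (`source`, `date`, `predicted_label`), the label value `not_news`, the ISO date format, and the example rows are all assumptions made for illustration; check them against the actual file before use.

```python
from collections import defaultdict


def not_news_share(rows, group, label_key="predicted_label", not_news_value="not_news"):
    """Return the share of rows labeled as not news within each group.

    `rows` is an iterable of dicts (for example, from csv.DictReader).
    `group` is a function mapping a row to its group (outlet, year, ...).
    The default column name and label value are hypothetical.
    """
    totals = defaultdict(int)
    not_news = defaultdict(int)
    for row in rows:
        key = group(row)
        totals[key] += 1
        if row[label_key] == not_news_value:
            not_news[key] += 1
    return {key: not_news[key] / totals[key] for key in totals}


# Made-up rows standing in for the real dataset.
rows = [
    {"source": "outlet_a", "date": "2016-01-05", "predicted_label": "not_news"},
    {"source": "outlet_a", "date": "2016-02-01", "predicted_label": "news"},
    {"source": "outlet_b", "date": "2017-03-02", "predicted_label": "not_news"},
    {"source": "outlet_b", "date": "2017-03-03", "predicted_label": "not_news"},
]

# Share of not news by outlet.
by_outlet = not_news_share(rows, group=lambda r: r["source"])

# Share of not news by year, assuming dates start with a four-digit year.
by_year = not_news_share(rows, group=lambda r: r["date"][:4])
```

To run this on the real file, the rows could come from `csv.DictReader` opened on the downloaded dataset, with `label_key` and `not_news_value` adjusted to match its actual header and label coding.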
We are happy to share the raw article text data under the following conditions:
- you will not share the data with anyone else, and
- you will use it only for research purposes.
To request the data, please fill out this form. If your request is approved, you will get read access to a file in a Google Cloud Storage bucket (Coldline storage class) for a month. The bucket is set up so that the requester pays, which means you will need to create a project that can be used for billing.
Suriyan Laohaprapanon and Gaurav Sood
If you see something, create a pull request or issue for that something! That could be an inconsistency in the data, a problem with the analysis or writing, a suggestion, data that you would like to contribute to the project, or something else.