Schema & Data Formats

Superfeedr's schema is standard Atom, with a few extra elements defined in custom namespaces.

This section mostly applies to feed subscriptions. If you're subscribing to arbitrary content resources, we will send you the exact content of the resource to which you've subscribed. Some of the status information is also available for non-feed resources.

Whatever the original format (RSS, Atom, or any other namespace), the notifications we send to subscribers use standard Atom, along with a few other namespaces detailed below. We map as much of the original content as we can into this format, so that subscribers can consume a consistent schema.

Status

In notifications, when you subscribe, or when you retrieve a resource's content from Superfeedr, the response may include the following information, which describes the current state of the resource. Note that some items may be missing, either because we haven't processed the feed yet or because they wouldn't be accurate.

<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>status[@feed]</td>
    <td>&nbsp;</td>
    <td>contains the URL of the resource</td>
  </tr>

  <tr>
    <td>title</td>
    <td></td>
    <td>The feed title</td>
  </tr>

  <tr>
    <td>http[@code]</td>
    <td>&nbsp;</td>
    <td>the last HTTP status code; see <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html">Status Code Definitions</a></td>
  </tr>

  <tr>
    <td>http</td>
    <td>&nbsp;</td>
    <td>the content of this element is a more explicit log message, for your information</td>
  </tr>

  <tr>
    <td>next_fetch</td>
    <td>&nbsp;</td>
    <td>the resource will be fetched no later than this time</td>
  </tr>

  <tr>
    <td>period</td>
    <td>&nbsp;</td>
    <td>the polling frequency in seconds for this resource (at least 60 seconds for feeds and at least 300 seconds for arbitrary content)</td>
  </tr>

  <tr>
    <td>last_fetch</td>
    <td>&nbsp;</td>
    <td>the last time at which we fetched the resource</td>
  </tr>

  <tr>
    <td>last_parse</td>
    <td>&nbsp;</td>
    <td>the last time at which we parsed the resource. We may fetch a resource without parsing it if its content hasn't been modified</td>
  </tr>

  <tr>
    <td>last_maintenance_at</td>
    <td>&nbsp;</td>
    <td>Each resource inside Superfeedr has a maintenance cycle that we use to detect stale or related resources. We normally run maintenance at most once every 24 hours for each resource, but this is a low-priority task, so the interval may be longer</td>
  </tr>

  <tr>
    <td>entries_count_since_last_maintenance</td>
    <td></td>
    <td>The number of updates in the resource since we last ran the maintenance script. This is a very good indicator of how verbose a resource is. You may want to remove resources that are too verbose</td>
  </tr>

  <tr>
    <td id="velocity">velocity</td>
    <td></td>
    <td>The number of updates during a maintenance cycle (between 24 and 48 hours). The order of magnitude matters more than the absolute number.</td>
  </tr>

  <tr>
    <td id="popularity">popularity</td>
    <td></td>
    <td>Float. Starts at 0 (not popular). The greater the number, the more popular the feed. Popularity is assessed for each feed based on several signals from the social web, the number of clicks, and the number of subscribers. It also depends on the popularity of the web pages that link to the feed.</td>
  </tr>

  <tr>
    <td id="porn_rank">porn_rank</td>
    <td>if available</td>
    <td>Between 0 and 1. The greater the rank, the greater the chance that the feed publishes only porn content.</td>
  </tr>

  <tr>
    <td id="bozo_rank">bozo_rank</td>
    <td>if available</td>
    <td>Between 0 and 1. A high Bozo rank indicates that a feed is probably valid syntactically but likely invalid semantically: for example, feeds whose entries keep getting new unique identifiers will rank high.</td>
  </tr>

  <tr>
    <td id="generated_ids">generated_ids</td>
    <td>true</td>
    <td>Indicates whether the <code>id</code> for each entry was generated by Superfeedr. If this is missing, you can safely assume that we were able to extract the unique ids from the feed itself.</td>
  </tr>
  </tbody>
</table>
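The status fields in the table can be read with a namespace-agnostic parser. This Python sketch assumes the status arrives as an XML element whose attribute and children are named as in the table (`feed`, `http` with a `code` attribute, `period`, `velocity`, and so on); the sample document and its `urn:example:status` namespace are hypothetical placeholders, not Superfeedr's actual namespace. Date fields are kept as raw text because their XML format isn't specified here. `velocity_magnitude` is one possible reading of "the order of magnitude matters": it returns the base-10 order of magnitude.

```python
from __future__ import annotations

import math
import xml.etree.ElementTree as ET

# Hypothetical sample: element names follow the status table, but the
# namespace URI is a placeholder and the values are made up.
SAMPLE_STATUS = """
<status xmlns="urn:example:status" feed="http://domain.tld/feed.xml">
  <http code="200">Awesome we got the feed right</http>
  <period>600</period>
  <entries_count_since_last_maintenance>24</entries_count_since_last_maintenance>
  <velocity>50</velocity>
  <generated_ids>true</generated_ids>
</status>
"""

INT_FIELDS = {"period", "entries_count_since_last_maintenance"}
FLOAT_FIELDS = {"velocity", "popularity", "porn_rank", "bozo_rank"}


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_status(xml_text: str) -> dict:
    """Flatten a status element into a dict keyed by local element name.

    Fields absent from the document are absent from the result, since any
    of them may be missing.
    """
    root = ET.fromstring(xml_text.strip())
    status: dict = {"feed": root.get("feed")}
    for child in root:
        name = local_name(child.tag)
        text = (child.text or "").strip()
        if name == "http":
            code = child.get("code")
            status["http_code"] = int(code) if code else None
            status["http_message"] = text
        elif name in INT_FIELDS:
            status[name] = int(text)
        elif name in FLOAT_FIELDS:
            status[name] = float(text)
        elif name == "generated_ids":
            status[name] = text == "true"
        else:
            # Dates and any unrecognised field are kept as raw text.
            status[name] = text
    return status


def velocity_magnitude(velocity: float) -> int | None:
    """Base-10 order of magnitude (50 -> 1, 500 -> 2); None if not positive."""
    if velocity <= 0:
        return None
    return math.floor(math.log10(velocity))


status = parse_status(SAMPLE_STATUS)
print(status["http_code"], status["period"], velocity_magnitude(status["velocity"]))
```

Matching on local names keeps the sketch independent of the namespace URI, at the cost of not distinguishing same-named elements from different namespaces.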

Entry Schema (feeds only)

Notification entries have the following form, which is standard Atom. Note that an entry may not include all of these elements.

Here are the components used to build the entries. Please note that they may use specific namespaces.

Link

<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>link[@href]</td>
    <td>&nbsp;</td>
    <td>the URL related to the parent node</td>
  </tr>
  <tr>
    <td>link[@rel]</td>
    <td>optional</td>
    <td>the type of relation to the parent node (alternate, replies, etc.)</td>
  </tr>
  <tr>
    <td>link[@type]</td>
    <td>optional</td>
    <td>MIME type of the link destination (text/html by default)</td>
  </tr>
  <tr>
    <td>link[@title]</td>
    <td>optional</td>
    <td>the link title</td>
  </tr>
  </tbody>
</table>

Example

{% prism markup %}

{% endprism %}
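The link attributes in the table map naturally onto a grouping by `rel`. This Python sketch uses the standard Atom namespace (`http://www.w3.org/2005/Atom`) and the standard library's ElementTree; the sample entry and its URLs are hypothetical. Two defaults are applied: a missing `type` falls back to `text/html`, as the table states, and a missing `rel` is treated as `alternate`, which is the Atom (RFC 4287) rule rather than something this page specifies.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry; the last link deliberately omits rel and type.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" type="text/html" href="http://domain.tld/entry/1"/>
  <link rel="replies" type="text/html" href="http://domain.tld/entry/1.xml"/>
  <link rel="picture" type="image/png" href="http://domain.tld/entry/image_.png"
        title="A beautiful picture that illustrates the entry"/>
  <link href="http://domain.tld/entry/1/alt"/>
</entry>
"""


def links_by_rel(entry: ET.Element) -> dict[str, list[dict]]:
    """Group an entry's link children by their rel attribute."""
    groups: dict[str, list[dict]] = {}
    for link in entry.findall(f"{ATOM}link"):
        # RFC 4287: a link without rel is interpreted as rel="alternate".
        rel = link.get("rel", "alternate")
        groups.setdefault(rel, []).append({
            "href": link.get("href"),
            # The link table gives text/html as the default MIME type.
            "type": link.get("type", "text/html"),
            "title": link.get("title"),
        })
    return groups


links = links_by_rel(ET.fromstring(SAMPLE_ENTRY.strip()))
for rel, items in links.items():
    print(rel, [item["href"] for item in items])
```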

Category

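<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>category[@term]</td>
    <td>optional, multiple</td>
    <td>a keyword related to the entry (tag, category or topic)</td>
  </tr>
  </tbody>
</table>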

Example

{% prism markup %} {% endprism %}
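A short sketch for collecting category terms from an Atom entry. Dropping empty and duplicate terms is a design choice of this example, not documented Superfeedr behavior; the sample entry is hypothetical, though its terms mirror the `categories` array in the JSON example further down this page.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry with a duplicate and an empty term.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <category term="Things"/>
  <category term="Picture"/>
  <category term="Things"/>
  <category term=""/>
</entry>
"""


def category_terms(entry: ET.Element) -> list[str]:
    """Distinct, non-empty category terms of an entry, in document order."""
    terms: list[str] = []
    for category in entry.findall(f"{ATOM}category"):
        term = (category.get("term") or "").strip()
        if term and term not in terms:
            terms.append(term)
    return terms


print(category_terms(ET.fromstring(SAMPLE_ENTRY.strip())))
```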

Point

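<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>entry[@point]</td>
    <td>optional, multiple</td>
    <td>geolocation data. Contains a <a href="http://georss.org/">georss</a> latitude and longitude. It's either extracted from the story or extrapolated from the content.</td>
  </tr>
  </tbody>
</table>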

Example

{% prism markup %}
<georss:point>47.597553 -122.15925</georss:point>
{% endprism %}
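A GeoRSS point holds the latitude, then the longitude, separated by whitespace. The sketch below parses it and, for comparison, converts it to a GeoJSON point with a hypothetical `to_geojson` helper. GeoJSON (RFC 7946) orders coordinates as longitude then latitude, whereas the `geo.coordinates` array in this page's JSON example keeps the latitude first, so the two are not interchangeable without swapping.

```python
from __future__ import annotations


def parse_point(text: str) -> tuple[float, float]:
    """Parse a GeoRSS point, 'latitude longitude', into two floats."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude longitude', got {text!r}")
    lat, lon = (float(part) for part in parts)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {text!r}")
    return lat, lon


def to_geojson(lat: float, lon: float) -> dict:
    """Build a GeoJSON Point; GeoJSON lists longitude first."""
    return {"type": "Point", "coordinates": [lon, lat]}


lat, lon = parse_point("47.597553 -122.15925")
print(lat, lon, to_geojson(lat, lon))
```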

Author

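<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>author</td>
    <td>optional, multiple</td>
    <td>Author information</td>
  </tr>
  <tr>
    <td>name</td>
    <td>optional</td>
    <td>the author's name (or nickname)</td>
  </tr>
  <tr>
    <td>email</td>
    <td>optional</td>
    <td>the author's email address</td>
  </tr>
  <tr>
    <td>uri</td>
    <td>optional</td>
    <td>the author's URI</td>
  </tr>
  <tr>
    <td>object-type</td>
    <td>optional, multiple</td>
    <td>the object type, defined in the ActivityStreams spec</td>
  </tr>
  <tr>
    <td>link</td>
    <td>optional, multiple</td>
    <td>links (see above). They can include links to the author's profile, to the user's avatar...</td>
  </tr>
  </tbody>
</table>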

Example

{% prism markup %}
<author>
  <name>John Doe</name>
  <email>john@superfeedr.com</email>
  <uri>http://twitter.com/superfeedr</uri>
  <as:object-type>http://activitystrea.ms/schema/1.0/person</as:object-type>
</author>
{% endprism %}
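A sketch that flattens an author element like the one in the example into a dict. Matching is done on local element names, so it works whichever namespace URI the `as:` prefix is bound to; the `http://activitystrea.ms/spec/1.0/` binding and the avatar link in the sample are assumptions for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical sample: the as: namespace URI and the avatar link are assumed.
SAMPLE_AUTHOR = """
<author xmlns="http://www.w3.org/2005/Atom"
        xmlns:as="http://activitystrea.ms/spec/1.0/">
  <name>John Doe</name>
  <email>john@superfeedr.com</email>
  <uri>http://twitter.com/superfeedr</uri>
  <as:object-type>http://activitystrea.ms/schema/1.0/person</as:object-type>
  <link rel="avatar" type="image/png" href="http://domain.tld/john.png"/>
</author>
"""


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_author(author: ET.Element) -> dict:
    """Collect name/email/uri, every object-type, and every link of an author."""
    info: dict = {"object_types": [], "links": []}
    for child in author:
        name = local_name(child.tag)
        text = (child.text or "").strip()
        if name in ("name", "email", "uri"):
            info[name] = text
        elif name == "object-type":
            info["object_types"].append(text)
        elif name == "link":
            info["links"].append({"rel": child.get("rel"), "href": child.get("href")})
    return info


author = parse_author(ET.fromstring(SAMPLE_AUTHOR.strip()))
print(author["name"], author["object_types"])
```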

Object

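<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>object</td>
    <td>optional, multiple</td>
    <td>an ActivityStreams object</td>
  </tr>
  <tr>
    <td>object-type</td>
    <td>optional, multiple</td>
    <td>the object type, defined in the ActivityStreams spec</td>
  </tr>
  <tr>
    <td>id</td>
    <td>optional</td>
    <td>the unique identifier of the object</td>
  </tr>
  <tr>
    <td>title</td>
    <td>optional</td>
    <td>the title of the object</td>
  </tr>
  <tr>
    <td>published</td>
    <td>optional</td>
    <td>the publication date (iso8601) of the object</td>
  </tr>
  <tr>
    <td>updated</td>
    <td>optional</td>
    <td>the updated date (iso8601) of the object</td>
  </tr>
  <tr>
    <td>content</td>
    <td>optional</td>
    <td>the content of the object</td>
  </tr>
  <tr>
    <td>author</td>
    <td>optional, multiple</td>
    <td>author information (see above)</td>
  </tr>
  <tr>
    <td>category</td>
    <td>optional, multiple</td>
    <td>categories (see above)</td>
  </tr>
  <tr>
    <td>link</td>
    <td>optional, multiple</td>
    <td>links (see above)</td>
  </tr>
  </tbody>
</table>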

Example

{% prism markup %} as:object-typehttp://gowalla.com/schema/1.0/spot</as:object-type> as:object-typehttp://activitystrea.ms/schema/1.0/place</as:object-type> object-id

<title>Title of the Object</title> 2013-04-20T15:00:40+02:00 2013-04-21T14:00:40+02:00 hello world Second http://domain.tld/second second@domain.tld {% endprism %}

Verb

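<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>verb</td>
    <td>optional, multiple</td>
    <td>the activity verb, as defined in the ActivityStreams spec</td>
  </tr>
  </tbody>
</table>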

Example

{% prism markup %}
<as:verb>http://activitystrea.ms/schema/1.0/post</as:verb>
{% endprism %}
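Verbs and object types are full URIs in Atom, while the JSON mapping on this page shows short names (`publish`, `place`, `dude`). In this page's examples, the JSON value matches the last path segment of the last URI listed in the Atom version (post/publish becomes `publish`, spot/place becomes `place`, person/dude becomes `dude`). The helpers below only reproduce that observed pattern; it is an assumption, not a documented rule.

```python
from __future__ import annotations


def short_type(uri: str) -> str:
    """Last path segment of a type URI: '.../schema/1.0/post' -> 'post'."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def primary_type(uris: list[str]) -> str | None:
    """Short form of the last URI listed, or None if there are none.

    Choosing the last URI mirrors this page's examples only; it is an
    assumption, not a documented Superfeedr rule.
    """
    return short_type(uris[-1]) if uris else None


verbs = [
    "http://activitystrea.ms/schema/1.0/post",
    "http://activitystrea.ms/schema/1.0/publish",
]
print(primary_type(verbs))
```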

Entries

Entries may include all the above elements. They also contain specific nodes, listed below.

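<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Note</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
  <tr>
    <td>entry[@xml:lang]</td>
    <td>optional</td>
    <td>The language of the entry. It's either extracted from the feed or computed from the content (the longer the content, the more reliable the computed language).</td>
  </tr>
  <tr>
    <td>entry[@title]</td>
    <td>&nbsp;</td>
    <td>The title of the entry.</td>
  </tr>
  <tr>
    <td>entry[@published]</td>
    <td>optional</td>
    <td>The publication date (iso8601) of the entry.</td>
  </tr>
  <tr>
    <td>entry[@updated]</td>
    <td>optional</td>
    <td>The last updated date (iso8601) of the entry.</td>
  </tr>
  <tr>
    <td>entry[@content]</td>
    <td>optional</td>
    <td>The content of the entry. Check the type attribute to determine the MIME type.</td>
  </tr>
  <tr>
    <td>entry[@summary]</td>
    <td>optional</td>
    <td>The summary of the entry. Check the type attribute to determine the MIME type.</td>
  </tr>
  <tr>
    <td>entry[@source]</td>
    <td>optional</td>
    <td>The source of the entry. It includes all the available feed-level elements, such as the feed title, the feed links, the feed's author(s), etc. It's extremely useful for track feeds.</td>
  </tr>
  </tbody>
</table>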

{% prism markup %} domain.tld:09/05/03-1 2013-04-21T14:00:40+02:00 2013-04-21T14:00:40+02:00

<title>Entry published on hour ago</title> Entry published on hour ago when it was shinny outside, but now it's raining Entry published on hour ago... 47.597553 -122.15925 First http://domain.tld/first first@domain.tld id-first <title>First</title> http://activitystrea.ms/schema/1.0/person http://activitystrea.ms/schema/1.0/dude Second http://domain.tld/second second@domain.tld http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/publish http://gowalla.com/schema/1.0/spot http://activitystrea.ms/schema/1.0/place object-id <title>Title of the Object</title> 2013-04-20T15:00:40+02:00 2013-04-21T14:00:40+02:00 hello world Second http://domain.tld/second second@domain.tld feed-id <title>feed-title</title> 2013-04-21T14:00:40+02:00 {% endprism %}
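A Python sketch that reads the entry-level fields listed in the entries table. It relies only on the Atom namespace and the predefined `xml:lang` attribute; the sample entry is hypothetical. Where no `type` attribute is present on `content` or `summary`, it reports `text`, the Atom default from RFC 4287. A trailing `Z` is normalized because `datetime.fromisoformat` only accepts it from Python 3.11 on.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical entry using the fields from the entries table.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>domain.tld:09/05/03-1</id>
  <title>Entry published on hour ago</title>
  <published>2013-04-21T14:00:40+02:00</published>
  <updated>2013-04-21T14:00:40+02:00</updated>
  <summary>Entry published on hour ago...</summary>
  <content type="html">&lt;p&gt;Entry body&lt;/p&gt;</content>
</entry>
"""


def parse_date(value: str | None) -> datetime | None:
    """Parse an ISO 8601 timestamp such as 2013-04-21T14:00:40+02:00."""
    if not value:
        return None
    if value.endswith("Z"):
        # fromisoformat accepts a trailing 'Z' only from Python 3.11 on.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def text_and_type(entry: ET.Element, tag: str) -> tuple[str | None, str | None]:
    """Text of an Atom child element and its type (Atom defaults to 'text')."""
    element = entry.find(f"{ATOM}{tag}")
    if element is None:
        return None, None
    return element.text, element.get("type", "text")


def parse_entry(xml_text: str) -> dict:
    entry = ET.fromstring(xml_text.strip())
    content, content_type = text_and_type(entry, "content")
    summary, summary_type = text_and_type(entry, "summary")
    return {
        "id": entry.findtext(f"{ATOM}id"),
        "lang": entry.get(XML_LANG),
        "title": entry.findtext(f"{ATOM}title"),
        "published": parse_date(entry.findtext(f"{ATOM}published")),
        "updated": parse_date(entry.findtext(f"{ATOM}updated")),
        "content": content,
        "content_type": content_type,
        "summary": summary,
        "summary_type": summary_type,
    }


parsed = parse_entry(SAMPLE_ENTRY)
print(parsed["lang"], parsed["content_type"], parsed["published"].isoformat())
```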

JSON

Superfeedr offers the ability to subscribe to Atom and RSS feeds, but receive notifications in JSON. It's a mapping of our Atom schema. This mapping was created with the goal of being compatible with the OSync and ActivityStreams JSON schemas.

  • Dates: the dates shown are Unix Timestamps (seconds since Epoch), expressed in UTC.
  • Keys: expressed in camelCase.
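Two small helpers for the JSON conventions listed above: converting Unix timestamps to UTC datetimes, and deriving a camelCase key from an Atom status element name (for example `last_maintenance_at` becomes `lastMaintenanceAt`, as in the status object of the example). Not every key follows this rule: `http[@code]` appears as `code`, so treat the conversion as a convenience sketch, not a complete mapping.

```python
from __future__ import annotations

from datetime import datetime, timezone


def snake_to_camel(name: str) -> str:
    """'last_maintenance_at' -> 'lastMaintenanceAt'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def from_unix(timestamp: int | float | str) -> datetime:
    """Turn a JSON timestamp (seconds since the Epoch, UTC) into an aware datetime."""
    return datetime.fromtimestamp(float(timestamp), tz=timezone.utc)


for field in ("last_fetch", "next_fetch", "entries_count_since_last_maintenance"):
    print(field, "->", snake_to_camel(field))
print(from_unix(1290793065).isoformat())
```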

Example

{% prism javascript %}
{
  "status": {
    "entriesCountSinceLastMaintenance": 24,
    "velocity": 50,
    "lastParse": 1290793065,
    "period": "600",
    "lastMaintenanceAt": 1290778665,
    "feed": "http://domain.tld/feed.xml",
    "lastFetch": 1290796665,
    "code": 200,
    "title": "A wonderful feed",
    "nextFetch": 1290803865,
    "http": "Awesome we got the feed right"
  },
  "items": [
    {
      "geo": { "type": "point", "coordinates": [47.597553, -122.15925] },
      "standardLinks": {
        "picture": [
          { "type": "image/png", "href": "http://domain.tld/entry/image_.png", "title": "A beautiful picture that illustrates the entry" }
        ],
        "replies": [
          { "type": "text/html", "href": "http://domain.tld/entry/1.xml", "title": "" }
        ]
      },
      "permalinkUrl": "http://domain.tld/entry/1",
      "verb": "publish",
      "content": "Entry published on hour ago when it was shinny outside, but now it's raining",
      "published": 1271851240,
      "actor": {
        "displayName": "First",
        "image": "http://domain.tld/first.png",
        "permalinkUrl": "http://domain.tld/first/profile",
        "id": "id-first",
        "objectType": "dude",
        "title": "First"
      },
      "categories": ["Things", "Picture"],
      "id": "domain.tld:09/05/03-1",
      "object": {
        "permalinkUrl": "http://domain.tld/object/2",
        "content": "hello world",
        "published": 1271768440,
        "actor": { "displayName": "Second", "permalinkUrl": "http://domain.tld/second" },
        "id": "object-id",
        "updated": 1271851240,
        "title": "Title of the Object",
        "objectType": "place"
      },
      "title": "Entry published on hour ago",
      "updated": 1271851240,
      "source": {
        "id": "http://blog.superfeedr.com/",
        "title": "Superfeedr Blog",
        "updated": 1245776753,
        "permalinkUrl": "http://blog.superfeedr.com/"
      },
      "language": "en"
    },
    {
      "permalinkUrl": "http://www.macrumors.com/2009/05/06/adwhirl-free-ad-supported-iphone-apps-can-very-lucrative/",
      "published": 1241616887,
      "content": "Mobile advertising company AdWhirl issued a report (PDF) that details the success of some of the top ad-supported iPhone apps. AdWhirl serves 250 million ad impressions monthly to over 10% of the top 50 Apps in the App Store.br /\n br /\n br /\n ...",
      "id": "http://www.macrumors.com/2009/05/06/adwhirl-free-ad-supported-iphone-apps-can-very-lucrative/",
      "title": "AdWhirl: Free Ad-Supported iPhone Apps Can Be Very Lucrative"
    }
  ],
  "subtitle": "This is certainly a wonderful feed that you need to read!",
  "standardLinks": {
    "canonical": [
      { "type": "application/atom+xml", "href": "http://feed.domain.tld/main.xml", "title": null }
    ],
    "self": [
      { "type": "application/atom+xml", "href": "http://domain.tld/feed.xml", "title": null }
    ]
  },
  "id": "http://domain.tld/feed.xml",
  "title": "A wonderful feed",
  "updated": 1290800265
}
{% endprism %}

We recommend checking the notifications for a few of the feeds you subscribe to, to make sure they include all the fields your application requires. Feel free to get in touch if content you need is missing from the feeds.
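A minimal sketch of consuming a JSON notification and running the kind of field check recommended above. The sample is a trimmed copy of this page's example notification; `REQUIRED` is a hypothetical list that each application would set for itself. The example sends `period` as the string `"600"`, so the sketch coerces it to an integer rather than assuming a number.

```python
from __future__ import annotations

import json

# A trimmed copy of the example notification on this page.
SAMPLE = """
{
  "status": {"code": 200, "feed": "http://domain.tld/feed.xml", "period": "600",
             "lastFetch": 1290796665, "nextFetch": 1290803865},
  "items": [
    {"id": "domain.tld:09/05/03-1", "title": "Entry published on hour ago",
     "permalinkUrl": "http://domain.tld/entry/1", "published": 1271851240,
     "categories": ["Things", "Picture"], "language": "en"},
    {"id": "http://www.macrumors.com/2009/05/06/adwhirl-free-ad-supported-iphone-apps-can-very-lucrative/",
     "title": "AdWhirl: Free Ad-Supported iPhone Apps Can Be Very Lucrative",
     "permalinkUrl": "http://www.macrumors.com/2009/05/06/adwhirl-free-ad-supported-iphone-apps-can-very-lucrative/",
     "published": 1241616887}
  ]
}
"""

# Hypothetical: the item fields this particular application cannot do without.
REQUIRED = ("id", "title", "permalinkUrl", "published", "language")


def summarize(payload: str) -> dict:
    """Pull the status basics and the item list out of a JSON notification."""
    data = json.loads(payload)
    status = data.get("status", {})
    period = status.get("period")
    return {
        "feed": status.get("feed"),
        "code": status.get("code"),
        # The example sends period as a string, so coerce defensively.
        "period": int(period) if period is not None else None,
        "items": data.get("items", []),
    }


def missing_fields(item: dict, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Required keys absent from one item, in the order they were listed."""
    return [key for key in required if key not in item]


summary = summarize(SAMPLE)
for item in summary["items"]:
    print(item["title"], item.get("categories", []), missing_fields(item))
```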