Add feed of releases to API #894

Open
tobias opened this issue Nov 3, 2024 · 5 comments

Comments

@tobias
Member

tobias commented Nov 3, 2024

We've had a request for a feed of releases.

I think we could do this via an API endpoint. Something like:

GET https://clojars.org/api/release-feed?from=2012-03-01T21:38:31.525Z

The required from param is a timestamp marking where the feed starts; only releases after that timestamp will be returned. The response will be JSON, containing up to 30 days of releases plus a link to the next page/batch:

{
  "next": "https://clojars.org/api/release-feed?from=2012-03-01T21:38:31.525Z",
  "releases": [
    {
      "version": "1.3-SNAPSHOT",
      "group-id": "foobar",
      "artifact-id": "foobar",
      "released-at": "2012-03-06T21:38:31.525Z"
    },
    {
      "version": "0.1",
      "group-id": "org.tcrawley",
      "artifact-id": "swank-clojure",
      "released-at": "2012-03-08T21:38:31.525Z"
    }
  ]
}
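
For illustration, a client could build the request URL and split a response into its parts roughly as follows. This is a minimal Python sketch that assumes the endpoint and field names proposed above; feed_url and parse_page are hypothetical helper names, not part of any existing Clojars client.

```python
import json
from urllib.parse import urlencode

# Proposed endpoint from this issue; it is an assumption that it ships in this form.
BASE_URL = "https://clojars.org/api/release-feed"


def feed_url(from_timestamp: str) -> str:
    """Build the feed URL for an ISO-8601 `from` timestamp (percent-encoded)."""
    return f"{BASE_URL}?{urlencode({'from': from_timestamp})}"


def parse_page(body: str) -> tuple[str, list[dict]]:
    """Split a JSON response body into its `next` URL and its list of releases."""
    page = json.loads(body)
    return page["next"], page["releases"]
```

Note that urlencode percent-encodes the colons in the timestamp, which is equivalent to the unencoded form shown in the example URL.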

The end of the feed would be signaled by an empty releases array, and the from value in the next property will be the released-at of the most recent release (though that can likely be considered just an implementation detail):

{
  "next": "https://clojars.org/api/release-feed?from=2012-03-01T21:38:31.525Z",
  "releases": []
}
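
A consumer could follow the next links until it sees an empty releases array, along these lines. This is only a sketch of the paging scheme as described so far; iter_releases and fetch_json are hypothetical names, and the fetch function is injectable so the loop can be exercised without a network.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL with a plain stdlib GET and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_releases(start_url: str,
                  fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield releases page by page until a page has an empty `releases` array."""
    url = start_url
    while True:
        page = fetch(url)
        if not page["releases"]:
            # Empty page: treated as the end of the feed under this scheme.
            # page["next"] is where a later run would resume.
            return
        yield from page["releases"]
        url = page["next"]
```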

Each non-SNAPSHOT version should appear only once in the feed, but SNAPSHOT versions could appear multiple times. I originally thought this was because SNAPSHOTs have a single entry in the db table that is updated on each release, so a SNAPSHOT would reappear if a release occurred while you were paging the results. That is incorrect; we store an entry per SNAPSHOT release, so a SNAPSHOT will appear in the feed once for each time it was released, at the position matching that release time.
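
A consumer that only cares about the most recent release of each version could collapse the repeated SNAPSHOT entries on its side. The sketch below assumes the field names from the example response and that all released-at values use the same fixed-width UTC format, so they compare correctly as strings; is_snapshot and latest_per_version are hypothetical helpers.

```python
def is_snapshot(release: dict) -> bool:
    """Maven convention: snapshot versions end in -SNAPSHOT."""
    return release["version"].endswith("-SNAPSHOT")


def latest_per_version(releases: list[dict]) -> dict[tuple[str, str, str], str]:
    """Map (group-id, artifact-id, version) to its most recent released-at."""
    latest: dict[tuple[str, str, str], str] = {}
    for r in releases:
        key = (r["group-id"], r["artifact-id"], r["version"])
        # Assumes identically formatted ISO-8601 UTC timestamps,
        # which sort chronologically when compared as strings.
        if key not in latest or r["released-at"] > latest[key]:
            latest[key] = r["released-at"]
    return latest
```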

@tobias
Member Author

tobias commented Nov 3, 2024

Would the above work for you @cursive-ide? I think this is the bare minimum, so I'm happy to discuss adding more data to the feed.

@cursive-ide

Yes, I think that would work well. I'm a little confused by the pagination - I pass a from parameter, which will then get me releases up to 30 days after that date. But will the next field then return releases after the first 30? So the idea is that I would start from the oldest date and then iterate forward until there are none left?

@cursive-ide

Also, it might be a good idea to have a flag to only include non-SNAPSHOT versions? I'm not sure about this, I'm not sure whether I'd want to index snapshots or not - I'll think about this.

@tobias
Member Author

tobias commented Nov 3, 2024

@cursive-ide:

> I'm a little confused by the pagination - I pass a from parameter, which will then get me releases up to 30 days after that date. But will the next field then return releases after the first 30? So the idea is that I would start from the oldest date and then iterate forward until there are none left?

Yes, correct. You would pass from=date1 and get 30 days' worth of releases. The next URL in the response would have from=date2, where date2 is the earlier of:

  • date1 plus 30 days
  • "now"
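
The rule above for the 30-day scheme can be sketched in a few lines of Python. The function name next_from is hypothetical, and the current time is passed in explicitly so the rule is easy to check.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=30)  # page size of the time-window scheme


def next_from(date1: datetime, now: datetime) -> datetime:
    """Return date2: the earlier of date1 + 30 days and now."""
    return min(date1 + WINDOW, now)


# Example call with the current time in UTC.
date2 = next_from(datetime(2012, 3, 1, tzinfo=timezone.utc),
                  datetime.now(timezone.utc))
```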

You could then page until you got an empty array; the next URL in that final response is where you would start next time.

However, I realize that won't account for a 30-day period with no releases (I suspect we have gaps like that in the early days): we would return an empty array for those gaps, which would look like the end of the stream. So we need another way to signal "there are no more pages".

An alternate approach is to not give you 30 days of releases, but instead send up to n releases (100?). Then there will never be an empty page until you reach the end of the feed.

So then the from param in the next url would be either:

  • the released-at value from the last release on the page (if there are release items returned)
  • the from given in the request (if there are no release items to return)
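
As a rough illustration of the count-based scheme, here is an in-memory Python sketch of how one page and its next from value could be computed. It is not the Clojars implementation: release_page is a hypothetical name, the input list is assumed to be sorted by released-at ascending, and timestamps are assumed to share one fixed-width UTC format so string comparison is chronological.

```python
def release_page(releases: list[dict], from_ts: str, limit: int = 100) -> dict:
    """Build one page: up to `limit` releases strictly after `from_ts`.

    `releases` must already be sorted by released-at, oldest first.
    """
    batch = [r for r in releases if r["released-at"] > from_ts][:limit]
    # Last released-at on the page if there is one, else echo the request's from.
    next_from = batch[-1]["released-at"] if batch else from_ts
    return {"next_from": next_from, "releases": batch}
```

One caveat with a strict "after" comparison: if several releases share the exact same released-at and straddle a page boundary, the ones that did not fit on the page would be skipped, so a real implementation might need a tie-breaker.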

> Also, it might be a good idea to have a flag to only include non-SNAPSHOT versions? I'm not sure about this, I'm not sure whether I'd want to index snapshots or not - I'll think about this.

I'll start w/o this unless you say you need it; it would be simple to add later.

@tobias
Member Author

tobias commented Nov 3, 2024

For context: if we returned 100 results/page, it would take 3045 pages to iterate through all of the releases throughout history. I think 500 results/page would also be fine from a performance or load perspective, which would mean only about 609 pages to get all releases.
