This repository has been archived by the owner on Nov 17, 2024. It is now read-only.

repository_url key should normalize against minor differences #7

Open
DuaneOBrien opened this issue Oct 20, 2020 · 1 comment
Labels
effort: 2 good first issue Good for newcomers. hacktoberfest help type: fix Iterations on existing features or infrastructure. work: obvious The situation is obvious, best practices used.

Comments

@DuaneOBrien

Nearly identical repository URLs can fragment the data for a single project across several entries. Sample:

  {
    "repository_url": "http://tomcat.apache.org",
    "score": 3793
  },
  {
    "repository_url": "https://tomcat.apache.org/",
    "score": 3293
  },
  {
    "repository_url": "http://tomcat.apache.org/",
    "score": 12
  }

It seems like we could strip the protocol and any trailing slash or whitespace characters to collapse these into a single entry, while getting the same results.
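
A minimal sketch of that idea in Python, not the project's actual implementation. The function names `normalize_repository_url` and `merge_scores` are hypothetical, and summing the scores of merged entries is an assumption about how duplicates should be combined. Lowercasing the host is an extra step beyond what is proposed above, included because host names are case-insensitive.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_repository_url(raw: str) -> str:
    """Build a comparison key: no protocol, no surrounding whitespace, no trailing slash."""
    cleaned = raw.strip()
    # urlsplit only recognizes the host when a scheme or a leading "//" is present.
    parts = urlsplit(cleaned if "://" in cleaned else "//" + cleaned)
    host = parts.netloc.lower()    # assumption: treat hosts case-insensitively
    path = parts.path.rstrip("/")  # drop any trailing slash
    return host + path             # query string and fragment are ignored


def merge_scores(records):
    """Sum scores of entries whose URLs normalize to the same key (assumed merge rule)."""
    totals = defaultdict(int)
    for record in records:
        totals[normalize_repository_url(record["repository_url"])] += record["score"]
    return dict(totals)


sample = [
    {"repository_url": "http://tomcat.apache.org", "score": 3793},
    {"repository_url": "https://tomcat.apache.org/", "score": 3293},
    {"repository_url": "http://tomcat.apache.org/", "score": 12},
]
merged = merge_scores(sample)
```

With the three sample entries above, `merged` holds a single key, `tomcat.apache.org`. Whether the combined score should be a sum, a maximum, or something else depends on how the scores are computed upstream.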

@mjpitz
Member

mjpitz commented Oct 21, 2020

The new API scheme introduced a concept of a ProviderURL that is intended to do this. Definitely seems like an easy thing we can fix right now.

@mjpitz mjpitz added effort: 2 good first issue Good for newcomers. hacktoberfest help type: fix Iterations on existing features or infrastructure. work: obvious The situation is obvious, best practices used. labels Oct 21, 2020