-
-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added logic to fetch README files, documentation, commit messages, an… #2919
base: main
Are you sure you want to change the base?
Added logic to fetch README files, documentation, commit messages, an… #2919
Conversation
🔍 Existing Issues For ReviewYour pull request is modifying functions with the following pre-existing issues: 📄 File: website/management/commands/update_projects.py
Did you find this useful? React with a 👍 or 👎 |
@DonnieBLT I believe for the CodeQL test the languages should be python, javascript; instead of being 'python javascript'. Though I am unsure about how to fix this. |
The reason that the token is there is because it will help with the rate limiting, you can create a token from your github profile settings, page and can you please avoid changes to white space I’m not sure what linter settings you’re using, but maybe if we can standardize them we won’t see the white space changes |
95dc670
to
34add4f
Compare
I've added the GitHub token and fixed the formatting. Please review and let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please check the comments?
) | ||
|
||
# Set Issue Tracker URL | ||
project.issue_tracker_url = f"https://github.com/{repo_name}/issues" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this since it's universal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood.
I have integrated a basic summary model "facebook/bart-large-cnn". However, due to the large variations in the readme files of the repositories, the summaries aren't too effective. I had thought of pre-processing the content to only pass relevant sections, but even that seems to be difficult since there is no particular structure followed. @DonnieBLT which direction should I look into to improve this? Though openai API is paid, it could do the job really well compared to the generic python models |
We can use the OpenAI we already have a API key and it is set up in the code |
I have modified the summary and label generation to use Openai's API. Meanwhile, I am working on the UI search functionality and displaying these summaries and labels. |
Hello @DonnieBLT, here's an overview of the search functionality: Looking forward to your feedback. |
website/models.py
Outdated
readme_content = models.TextField(null=True, blank=True) | ||
documentation_url = models.URLField(null=True, blank=True) | ||
recent_commit_messages = models.TextField(null=True, blank=True) | ||
issue_tracker_url = models.URLField(null=True, blank=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this because it's always /issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can these migrations be merged?
website/summarization.py
Outdated
openai.api_key = os.getenv("OPENAI_API_KEY") | ||
|
||
|
||
def ai_summary(text, topics=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this to utils.py please
@@ -22,6 +22,15 @@ <h3>Projects: {{ projects.count }}</h3> | |||
<i class="fas fa-plus-circle"></i> Add Project | |||
</button> | |||
</form> | |||
<form id="search-form" class="search-form"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make sure the search in the header works and remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The search bar in the header functions slightly differently. Upon entering a query and selecting a label, it transitions to another search bar specific to the selected label. This second search bar is more focused and offers only four distinct labels to refine the search (issue, domain, user, label). It doesn't allow you to search for say specific projects or orgs.
This search form filters exclusively on project's description, labels and name. Let me know if you want to remove this and and fix the header search form to include projects and organizations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you combine them into the one top search?
@@ -53,130 +62,7 @@ <h3>Projects: {{ projects.count }}</h3> | |||
{% endfor %} | |||
</ul> | |||
{% endif %} | |||
<ul class="project-list"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep this in this file please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I assume we don't want to create a separate template and have all the code here itself? Let us finalize what to do with the search function and I'll make the changes accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can combine this into one template and adjust the global search to work as you have it.
website/label_generation.py
Outdated
openai.api_key = os.getenv("OPENAI_API_KEY") | ||
|
||
|
||
def generate_labels(readme_content, github_topics): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this to utils.py too please
project.readme_content = readme_content | ||
readme_text = markdown_to_text(readme_content) | ||
project.ai_summary = ai_summary(readme_text, project.topics) | ||
project.ai_labels = json.loads(generate_labels(readme_text, project.topics)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just add the labels verbatim from the topics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the AI-generated labels to be more accurate and effective, but we can surely use these topics directly. I'll modify it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting a few changes, we're almost there
Have made most of the changes, just need a final heads-up on what to do with the search functionality as it's quite buggy and limited. Should I create a new PR to improve its working if we go that route? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few adjustments request
) | ||
|
||
# Check for Documentation URL (homepage) | ||
project.documentation_url = repo_data.get("homepage") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have a homepage field
website/models.py
Outdated
@@ -756,6 +756,10 @@ class Project(models.Model): | |||
closed_issues = models.IntegerField(default=0) | |||
size = models.IntegerField(default=0) | |||
commit_count = models.IntegerField(default=0) | |||
readme_content = models.TextField(null=True, blank=True) | |||
documentation_url = models.URLField(null=True, blank=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use homepage_url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -22,6 +22,15 @@ <h3>Projects: {{ projects.count }}</h3> | |||
<i class="fas fa-plus-circle"></i> Add Project | |||
</button> | |||
</form> | |||
<form id="search-form" class="search-form"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you combine them into the one top search?
@@ -53,130 +62,7 @@ <h3>Projects: {{ projects.count }}</h3> | |||
{% endfor %} | |||
</ul> | |||
{% endif %} | |||
<ul class="project-list"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can combine this into one template and adjust the global search to work as you have it.
I have combined the search bar into one and added all the remaining categories. I'll raise a new pr later to improve the UI of the search results, have kept it basic for now. |
I have merged the migrations, please check |
Can you please clean this up? |
26ef4fe
to
fd1dbc3
Compare
I have started from the main branch again. Here a normal summary is used The search bar changes are not included here, should I raise a separate pr for that or have those changes here itself? P.S. I'm not sure how to solve this error for ai_summary: You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API. You can run Alternatively, you can pin your installation to the old version, e.g. A detailed migration guide is available here: openai/openai-python#742 Got the error even after updating openai. It could be because I don't have access to a valid api key, please verify |
The OpenAI error is happening with a valid key for the pr analysis too in production. I think maybe updating and using a different function will fix it |
Understood, will resolve it |
…d issue trackers from repository APIs.
Fixes #2681
Fixed the migration file issue that I was facing in the previous PR.
I had to remove the line "Authorization": f"token {settings.GITHUB_TOKEN}" from the header as it was giving an error saying 'Unable to fetch repository - 401'. I was unsure how to deal with it so I have removed it for now.