Build a Code Similarity Checker for Function and Model Analysis Using AI #3122

Open
DonnieBLT opened this issue Dec 18, 2024 · 5 comments · May be fixed by #3134
@DonnieBLT
Collaborator

We need to build a code similarity detection feature in the website app of our existing Django application. The goal is a hybrid approach that combines traditional code comparison with AI-powered analysis, so that similarities are detected beyond plain text matching. This includes comparing function names, model names, method signatures, parameters, return types, and function behavior.

The user enters two GitHub repository URLs, and the two repositories are compared against each other.

Scope of Work:
1. Traditional Comparison:
• Compare function and model names using string similarity metrics such as Levenshtein distance or the ratio from difflib's SequenceMatcher.
• Compare method signatures, including parameter names, types, and default values.
• Compare model fields and attributes in Django models.
2. AI-Powered Code Analysis:
• Use an AI library or service such as Hugging Face Transformers or the OpenAI API for deeper analysis of code semantics.
• Implement similarity detection based on what functions do, not just how they are written.
• Use abstract syntax trees (AST) for structure-level comparisons.
3. Integration:
• Create a Django management command or an API endpoint to accept GitHub repository URLs or uploaded ZIP files.
• Analyze the uploaded repositories by extracting relevant files (.py, .js, etc.).
• Return similarity scores and generate reports.
4. Reports and Visuals:
• Generate a detailed similarity report for download.
• Highlight the most similar parts using a front-end visualization library like Chart.js or D3.js.
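
For the ZIP upload path in the Integration step, a minimal sketch of pulling out the relevant source files could look like the following. It uses only the standard library and reads members in memory instead of extracting them to disk. The function name extract_source_files and the SOURCE_EXTENSIONS set are assumptions for illustration, not existing BLT code.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: the extensions worth analyzing; extend as needed.
SOURCE_EXTENSIONS = {".py", ".js"}


def extract_source_files(zip_bytes, extensions=SOURCE_EXTENSIONS):
    """Return {path: text} for every matching file in an in-memory ZIP archive."""
    sources = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if PurePosixPath(info.filename).suffix.lower() not in extensions:
                continue
            # Read the member directly; nothing is written to disk, so a
            # crafted member name cannot escape a target directory.
            raw = archive.read(info)
            sources[info.filename] = raw.decode("utf-8", errors="replace")
    return sources


# Usage: build a small archive in memory and filter it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("app/models.py", "class User: pass\n")
    archive.writestr("static/app.js", "console.log('hi');\n")
    archive.writestr("README.md", "# demo\n")

files = extract_source_files(buffer.getvalue())
```

A real endpoint would probably also want to cap the archive size and the number of members before reading anything.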

Examples:

Example 1: Function Name and Signature Comparison

Repo 1:

def process_data(data: list, limit: int = 100):
    processed = [d for d in data if d < limit]
    return processed

Repo 2:

def filter_items(items: list, max_value: int = 100):
    filtered = [i for i in items if i < max_value]
    return filtered

Expected Similarity: High (similar parameter structure and function logic).
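
One way to see why this pair should score high, sketched with only the standard library: compare the function names with difflib, compare the signature shape (annotations and defaults, ignoring parameter names), and compare the bodies after renaming every identifier to a positional placeholder. The helper names (Normalizer, first_function, signature_shape, normalized_dump) are hypothetical, and ast.unparse needs Python 3.9 or later.

```python
import ast
import difflib

REPO_1 = """
def process_data(data: list, limit: int = 100):
    processed = [d for d in data if d < limit]
    return processed
"""

REPO_2 = """
def filter_items(items: list, max_value: int = 100):
    filtered = [i for i in items if i < max_value]
    return filtered
"""


class Normalizer(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = "f"  # the function's own name is compared separately
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)  # annotation is left as written
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def first_function(source):
    """Return the first top-level function definition in the source."""
    return next(n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef))


def signature_shape(func):
    """List (annotation, default) per positional parameter, ignoring names."""
    args = func.args.args
    padding = [None] * (len(args) - len(func.args.defaults))
    defaults = padding + list(func.args.defaults)
    return [
        (ast.unparse(a.annotation) if a.annotation else None,
         ast.unparse(d) if d is not None else None)
        for a, d in zip(args, defaults)
    ]


def normalized_dump(func):
    """Dump the function's AST after identifier normalization (mutates func)."""
    return ast.dump(Normalizer().visit(func))


f1, f2 = first_function(REPO_1), first_function(REPO_2)

# Names and signatures are read before normalization rewrites the nodes.
name_score = difflib.SequenceMatcher(None, f1.name, f2.name).ratio()
same_signature = signature_shape(f1) == signature_shape(f2)

# autojunk=False keeps difflib from discarding frequent characters in long dumps.
structure_score = difflib.SequenceMatcher(
    None, normalized_dump(f1), normalized_dump(f2), autojunk=False
).ratio()
```

For this pair the names barely match, but the signature shapes are equal and the normalized ASTs are identical, which is the kind of evidence a purely textual comparison would miss.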

Example 2: Model Field Comparison

Repo 1:

from django.db import models

class User(models.Model):
    username = models.CharField(max_length=150)
    email = models.EmailField(unique=True)

Repo 2:

from django.db import models

class Account(models.Model):
    login_name = models.CharField(max_length=150)
    contact_email = models.EmailField(unique=True)

Expected Similarity: Medium (similar model structure with different field names).
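
A rough sketch of how the two models could be compared without importing Django: parse the class bodies with ast, collect each field's type and keyword options, and score field types and field names separately. The functions model_fields, type_overlap, and name_overlap are hypothetical, and the sketch only recognizes assignments of the form name = models.SomeField(...).

```python
import ast
import difflib
from collections import Counter

REPO_1 = """
from django.db import models

class User(models.Model):
    username = models.CharField(max_length=150)
    email = models.EmailField(unique=True)
"""

REPO_2 = """
from django.db import models

class Account(models.Model):
    login_name = models.CharField(max_length=150)
    contact_email = models.EmailField(unique=True)
"""


def model_fields(source):
    """Return [(field_name, field_type, options)] for the first class found."""
    cls = next(n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.ClassDef))
    fields = []
    for stmt in cls.body:
        if not (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call)):
            continue
        func, target = stmt.value.func, stmt.targets[0]
        if not (isinstance(func, ast.Attribute) and isinstance(target, ast.Name)):
            continue
        # Keyword options are kept as source text, e.g. ("max_length", "150").
        options = tuple(sorted(
            (kw.arg, ast.unparse(kw.value)) for kw in stmt.value.keywords if kw.arg
        ))
        fields.append((target.id, func.attr, options))
    return fields


def type_overlap(fields_a, fields_b):
    """Share of fields whose type and options match, ignoring field names."""
    shapes_a = Counter((t, o) for _, t, o in fields_a)
    shapes_b = Counter((t, o) for _, t, o in fields_b)
    shared = sum((shapes_a & shapes_b).values())
    return shared / max(len(fields_a), len(fields_b), 1)


def name_overlap(fields_a, fields_b):
    """Average of each A field name's best difflib ratio against B's names."""
    if not fields_a or not fields_b:
        return 0.0
    names_b = [n for n, _, _ in fields_b]
    best = [
        max(difflib.SequenceMatcher(None, n, other).ratio() for other in names_b)
        for n, _, _ in fields_a
    ]
    return sum(best) / len(best)


a, b = model_fields(REPO_1), model_fields(REPO_2)
type_score = type_overlap(a, b)  # 1.0 here: same field types and options
name_score = name_overlap(a, b)  # partial: the field names only partly match
```

Full agreement on types combined with partial agreement on names is consistent with the "Medium" rating above; how the two numbers are weighted is a design choice left open here.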

Technical Details:
1. Libraries and Tools:
• difflib for traditional string comparison.
• ast for structural comparison.
• Hugging Face Transformers for semantic analysis.
• Django for web integration.
2. Suggested Workflow:
• Extract function definitions using ast.
• Compare function and model definitions using difflib and ast.
• Use AI-based comparison as a second layer.
• Combine scores into a final similarity score.
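
The suggested workflow could be sketched roughly as below: extract function definitions with ast, compute the traditional scores with difflib, and fold in the AI score when one is available. The weights in DEFAULT_WEIGHTS and the function names are assumptions for illustration, and the semantic score is left as a plain input because the choice of model is not settled in this issue.

```python
import ast
import difflib


def extract_functions(source):
    """Map function name -> definition node for every def in the source.

    Functions that share a name (e.g. methods on different classes)
    overwrite each other in this simplified version.
    """
    return {
        node.name: node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def traditional_scores(func_a, func_b):
    """First layer: name similarity and raw AST-dump similarity via difflib."""
    name = difflib.SequenceMatcher(None, func_a.name, func_b.name).ratio()
    structure = difflib.SequenceMatcher(
        None, ast.dump(func_a), ast.dump(func_b), autojunk=False
    ).ratio()
    return {"name": name, "structure": structure}


# Assumed weights; they would need tuning against real repositories.
DEFAULT_WEIGHTS = {"name": 0.2, "structure": 0.4, "semantic": 0.4}


def combine_scores(scores, weights=DEFAULT_WEIGHTS):
    """Weighted average over whichever layers actually produced a score."""
    used = {k: w for k, w in weights.items() if scores.get(k) is not None}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(scores[k] * w for k, w in used.items()) / total


# Usage: two functions that differ only in parameter names.
source_a = "def add(a, b):\n    return a + b\n"
source_b = "def add(x, y):\n    return x + y\n"

funcs_a, funcs_b = extract_functions(source_a), extract_functions(source_b)
scores = traditional_scores(funcs_a["add"], funcs_b["add"])
scores["semantic"] = None  # would be filled in by the AI layer when it runs
final = combine_scores(scores)
```

Renormalizing over the layers that are present means the command still returns a usable score when the AI layer is disabled or unavailable.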

Acceptance Criteria:
• Ability to upload code repositories as ZIP files or provide GitHub URLs.
• Comparison based on function signatures, models, and fields.
• AI-based analysis for deeper code similarity checking.
• Detailed similarity reports and interactive visualizations.

@krrish-sehgal
Contributor

/assign

Contributor

Hello @krrish-sehgal! You've been assigned to OWASP-BLT/BLT. You have 24 hours to complete a pull request. To place a bid and potentially earn some BCH, type /bid [amount in BCH] [BCH address].

Contributor

⏰ This issue has been automatically unassigned due to 24 hours of inactivity.
The issue is now available for anyone to work on again.

@krrish-sehgal
Contributor

/assign

Contributor

Hello @krrish-sehgal! You've been assigned to OWASP-BLT/BLT. You have 24 hours to complete a pull request. To place a bid and potentially earn some BCH, type /bid [amount in BCH] [BCH address].

@krrish-sehgal krrish-sehgal linked a pull request Dec 19, 2024 that will close this issue
Labels
None yet
Projects
Status: Backlog

2 participants