Build a Code Similarity Checker for Function and Model Analysis Using AI #3122
We need to build a robust code similarity detection feature within our existing Django application (website app). The goal is to implement a hybrid approach combining traditional code comparison methods and AI-powered functionality to detect code similarities beyond text matching. This includes analyzing function names, model names, method signatures, parameters, return types, and even function behavior.
The user enters two GitHub repository URLs.
Scope of Work:
1. Traditional Comparison:
• Compare function and model names using string similarity metrics such as Levenshtein distance or the ratio from difflib's SequenceMatcher.
• Compare method signatures, including parameter names, types, and default values.
• Compare model fields and attributes in Django models.
2. AI-Powered Code Analysis:
• Use an AI library or service, such as Hugging Face Transformers or the OpenAI API, for deeper semantic analysis of the code.
• Implement similarity detection based on what functions do, not just how they are written.
• Use abstract syntax trees (AST) for structure-level comparisons.
3. Integration:
• Create a Django management command or an API endpoint to accept GitHub repository URLs or uploaded ZIP files.
• Analyze the uploaded repositories by extracting relevant files (.py, .js, etc.).
• Return similarity scores and generate reports.
4. Reports and Visuals:
• Generate a detailed similarity report for download.
• Highlight the most similar parts using a front-end visualization library like Chart.js or D3.js.
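As a starting point for the integration step, here is a minimal sketch of pulling source files out of an uploaded ZIP archive using only the standard library. The names `extract_source_files`, `build_demo_zip`, and `SOURCE_EXTENSIONS` are hypothetical, and the extension set only covers the two types named above; a real implementation would also need size limits and cloning for GitHub URLs.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: only the extensions named in the issue; extend as needed.
SOURCE_EXTENSIONS = {".py", ".js"}


def extract_source_files(zip_bytes: bytes) -> dict[str, str]:
    """Return {path inside archive: decoded text} for every source file in the ZIP."""
    sources = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if PurePosixPath(info.filename).suffix.lower() in SOURCE_EXTENSIONS:
                # Decode leniently so one badly encoded file does not abort the run.
                sources[info.filename] = archive.read(info).decode("utf-8", errors="replace")
    return sources


def build_demo_zip() -> bytes:
    """Build a small in-memory archive standing in for an uploaded repository."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("repo/app/models.py", "class User: pass\n")
        archive.writestr("repo/static/app.js", "console.log('hi');\n")
        archive.writestr("repo/README.md", "# demo\n")
    return buffer.getvalue()


if __name__ == "__main__":
    for path in sorted(extract_source_files(build_demo_zip())):
        print(path)
```

Reading the archive in memory avoids writing untrusted files to disk, which seems a sensible default for user uploads.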
Examples:
Example 1: Function Name and Signature Comparison
Repo 1:
def process_data(data: list, limit: int = 100):
    processed = [d for d in data if d < limit]
    return processed
Repo 2:
def filter_items(items: list, max_value: int = 100):
    filtered = [i for i in items if i < max_value]
    return filtered
Expected Similarity: High (similar parameter structure and function logic).
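One way Example 1 could be scored, sketched with `ast` and `difflib` only: compare the function names as plain strings, then compare the syntax trees after renaming every identifier in order of first appearance. The helper names (`_AlphaRename`, `name_similarity`, `structure_similarity`) are hypothetical, and the renaming is deliberately crude: it also renames builtins used in the body, while leaving parameter annotations untouched so that types still count.

```python
import ast
import difflib

REPO_1 = '''
def process_data(data: list, limit: int = 100):
    processed = [d for d in data if d < limit]
    return processed
'''

REPO_2 = '''
def filter_items(items: list, max_value: int = 100):
    filtered = [i for i in items if i < max_value]
    return filtered
'''


class _AlphaRename(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self._names = {}

    def _canonical(self, name):
        return self._names.setdefault(name, f"v{len(self._names)}")

    def visit_FunctionDef(self, node):
        node.name = "f"  # function names are compared separately
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        # The annotation is not visited, so parameter types are preserved.
        node.arg = self._canonical(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canonical(node.id)
        return node


def name_similarity(name_a, name_b):
    """Plain string similarity between two identifiers (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, name_a, name_b).ratio()


def structure_similarity(source_a, source_b):
    """Similarity of the two syntax trees once identifiers are normalised."""
    dumps = [ast.dump(_AlphaRename().visit(ast.parse(src))) for src in (source_a, source_b)]
    return difflib.SequenceMatcher(None, *dumps).ratio()


if __name__ == "__main__":
    print("names:    ", name_similarity("process_data", "filter_items"))
    print("structure:", structure_similarity(REPO_1, REPO_2))
```

For these two snippets the normalised trees are identical, so the structure score is 1.0 even though the name score is low, which matches the "High" expectation only if structure is weighted above naming.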
Example 2: Model Field Comparison
Repo 1:
from django.db import models

class User(models.Model):
    username = models.CharField(max_length=150)
    email = models.EmailField(unique=True)
Repo 2:
from django.db import models

class Account(models.Model):
    login_name = models.CharField(max_length=150)
    contact_email = models.EmailField(unique=True)
Expected Similarity: Medium (similar model structure with different field names).
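Example 2 could be approached by reading the model source with `ast`, so the analysed repository never has to be imported or have Django configured. The sketch below is an assumption about how this might work, not a prescribed design: `model_fields` and `compare_models` are hypothetical names, only simple `name = models.SomeField(...)` assignments are recognised, and the two scores (field types and field names) are one possible breakdown.

```python
import ast
from collections import Counter

REPO_1 = '''
from django.db import models

class User(models.Model):
    username = models.CharField(max_length=150)
    email = models.EmailField(unique=True)
'''

REPO_2 = '''
from django.db import models

class Account(models.Model):
    login_name = models.CharField(max_length=150)
    contact_email = models.EmailField(unique=True)
'''


def model_fields(source):
    """Map each class name to {field name: (field type, keyword options)}."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        fields = {}
        for stmt in node.body:
            # Only `name = something.FieldType(...)` assignments are recognised.
            if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                    and isinstance(stmt.targets[0], ast.Name)
                    and isinstance(stmt.value, ast.Call)
                    and isinstance(stmt.value.func, ast.Attribute)):
                options = tuple(sorted(
                    (kw.arg, ast.unparse(kw.value))
                    for kw in stmt.value.keywords if kw.arg
                ))
                fields[stmt.targets[0].id] = (stmt.value.func.attr, options)
        result[node.name] = fields
    return result


def compare_models(fields_a, fields_b):
    """Score overlap of field types (with options) and of field names."""
    types_a, types_b = Counter(fields_a.values()), Counter(fields_b.values())
    shared = sum((types_a & types_b).values())
    total = max(sum(types_a.values()), sum(types_b.values()))
    names_a, names_b = set(fields_a), set(fields_b)
    union = names_a | names_b
    return {
        "field_types": shared / total if total else 1.0,
        "field_names": len(names_a & names_b) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    user = model_fields(REPO_1)["User"]
    account = model_fields(REPO_2)["Account"]
    print(compare_models(user, account))
```

Here the field types and options match fully while no field names are shared; averaging the two would land in the middle, which is one plausible reading of the "Medium" expectation.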
Technical Details:
1. Libraries and Tools:
• difflib for traditional string comparison.
• ast for structural comparison.
• Hugging Face Transformers for semantic analysis.
• Django for web integration.
2. Suggested Workflow:
• Extract function definitions using ast.
• Compare function and model definitions using difflib and ast.
• Use AI-based comparison as a second layer.
• Combine scores into a final similarity score.
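The workflow above can be sketched end to end as follows. Everything here is illustrative: `extract_functions`, `compare_functions`, `combined_score`, and the weights in `DEFAULT_WEIGHTS` are assumptions, since the issue does not say how the layers should be weighted. The AI layer is left as an optional number supplied by the caller (for example, a cosine similarity between embeddings), so the sketch runs without any model.

```python
import ast
import difflib

# Hypothetical weights; the issue does not specify how layers are combined.
DEFAULT_WEIGHTS = {"name": 0.2, "signature": 0.3, "semantic": 0.5}


def extract_functions(source):
    """Return {function name: parameter list as text} for every def in the source."""
    functions = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions[node.name] = ast.unparse(node.args)
    return functions


def _ratio(a, b):
    return difflib.SequenceMatcher(None, a, b).ratio()


def combined_score(scores, weights=DEFAULT_WEIGHTS):
    """Weighted mean over the layers present in `scores`; absent layers are skipped."""
    used = {key: weight for key, weight in weights.items() if key in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no scored layers to combine")
    return sum(scores[key] * weight for key, weight in used.items()) / total


def compare_functions(source_a, source_b, semantic=None):
    """Score every function pair across two sources, best matches first."""
    results = []
    for name_a, sig_a in extract_functions(source_a).items():
        for name_b, sig_b in extract_functions(source_b).items():
            scores = {"name": _ratio(name_a, name_b), "signature": _ratio(sig_a, sig_b)}
            if semantic is not None:
                # `semantic` is a caller-supplied callable returning 0.0 to 1.0.
                scores["semantic"] = semantic(name_a, name_b)
            results.append((name_a, name_b, combined_score(scores)))
    return sorted(results, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    a = "def process_data(data: list, limit: int = 100):\n    return data\n"
    b = "def filter_items(items: list, max_value: int = 100):\n    return items\n"
    for row in compare_functions(a, b):
        print(row)
```

Skipping absent layers and renormalising the weights keeps the score usable when the AI layer is unavailable, at the cost of scores that are not directly comparable between runs with and without it.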
Acceptance Criteria:
• Ability to upload code repositories as ZIP files or provide GitHub URLs.
• Comparison based on function signatures, models, and fields.
• AI-based analysis for deeper code similarity checking.
• Detailed similarity reports and interactive visualizations.