Bug: Uploading .json through API not working #1012

Open
5 tasks done
F1gueron opened this issue Dec 12, 2024 · 6 comments
Labels
bug (Something isn't working), needs more info (This issue requires more information), triage (This issue requires triaging)

Comments

@F1gueron

F1gueron commented Dec 12, 2024

Description:

Hey, I'm trying to upload a .json file through the API. It returns a 202 status code, which suggests it worked, but on the file ingest page the job always stays as running, or sometimes changes to canceled. Uploading the same .json manually works fine, so it may be my code. At first I tried uploading a single file containing all the JSON, but that didn't work, so I started uploading each JSON file individually. If the solution lets me upload only the zip, that would be great.

Are you intending to fix this bug?

"no"

Component(s) Affected:

  • API

Steps to Reproduce:

  1. Go to [specific page or endpoint]
  2. Click on [button/element/etc.]
  3. Enter [input/data]
  4. See error at [this point]

Expected Behavior:

I expect the data to actually be uploaded correctly.

Actual Behavior:

The API returns a 202 status code, which suggests the data was accepted, but this happens:
[image]

Screenshots/Code Snippets/Sample Files:

    def create_upload_job(self):
        """Creates a file upload job."""
        response = self._request("POST", "/api/v2/file-upload/start")
        return response

    def upload_file(self, zip_file_path):
        """Uploads all JSON files inside a ZIP to the backend."""
        # Step 1: Extract ZIP file contents
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            extract_path = os.path.splitext(zip_file_path)[0]
            zip_ref.extractall(extract_path)

        # Step 2: Process each JSON file
        for root, _, files in os.walk(extract_path):
            for file_name in files:
                if file_name.endswith('.json'):  # Only process JSON files
                    file_path = os.path.join(root, file_name)
                    with open(file_path, 'rb') as file:
                        file_content = file.read()

                    # Step 3: Create an upload job for each file
                    upload_job = self.create_upload_job()
                    upload_job_data = upload_job.json()
                    file_upload_job_id = upload_job_data['data']['id']

                    # Step 4: Upload the file content to the job
                    response = self._request(
                        method="POST",
                        uri=f"/api/v2/file-upload/{file_upload_job_id}",
                        body=file_content
                    )
                    print("Data loaded. Waiting for data to be processed...")
                    sleep(30)

                    # Log the upload result
                    if response.status_code == 202:
                        print(f"Successfully uploaded {file_name}")
                    else:
                        print(f"Failed to upload {file_name}: {response.status_code} - {response.text}")

Environment Information:

BloodHound: 6.3.0

Collector: [SharpHound version / AzureHound version]

OS: Windows 11

Browser (if UI related): [browser name and version]

Node.js (if UI related): [Node.js version]

Go (if API related): [Go version]

Database (if persistence related): [Neo4j version / PostgreSQL version]

Docker (if using Docker): 4.36.0

Additional Information:

Any additional context or information that might be helpful in understanding and diagnosing the issue.

Potential Solution (optional):

If you have any ideas about what might be causing the issue or how it could be fixed, you can share them here.

Related Issues:

If you've found related issues in the project's issue tracker, mention them here.

Contributor Checklist:

  • I have searched the issue tracker to ensure this bug hasn't been reported before or is not already being addressed.
  • I have provided clear steps to reproduce the issue.
  • I have included relevant environment information details.
  • I have attached necessary supporting documents.
  • I have checked that any JSON files I am attempting to upload to BloodHound are valid.
F1gueron added the bug and triage labels Dec 12, 2024
@F1gueron
Author

I found out that I have to end the upload for the job to actually be processed, but I'm waiting 50 seconds before doing a GET to list the status of the files, and some files are getting canceled.

@StephenHinck
Collaborator

A few thoughts:

  1. Is there a reason you're extracting the files from the .zip? BHCE supports .zip ingest and will save you a step.
  2. Can you provide the relevant API logs during this time?
  3. Which files are getting canceled, and what are their associated error messages?
  4. You are correct that the file upload must be completed with a POST request to /api/v2/file-upload/$ID/end
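The flow described in this comment (start a job, POST the file body, then end the job) can be sketched as follows. This is a minimal sketch under stated assumptions, not BloodHound's own client: `send` is a hypothetical callable standing in for the `_request` helper from the original post, assumed to take `(method, uri, body)` and return the parsed JSON response or `None`. The endpoint paths and the `data.id` field are taken from the snippets in this thread.

```python
from typing import Any, Callable, Optional

# Hypothetical transport signature: (method, uri, body) -> parsed JSON dict or None.
Send = Callable[[str, str, Optional[bytes]], Optional[dict]]


def upload_and_end(send: Send, payload: bytes) -> Any:
    """Start an upload job, send one file body, then end the job."""
    # Step 1: create the job and read its ID from the response.
    started = send("POST", "/api/v2/file-upload/start", None)
    job_id = started["data"]["id"]

    # Step 2: upload the file content (a .json or a .zip) to that job.
    send("POST", f"/api/v2/file-upload/{job_id}", payload)

    # Step 3: end the job; per this thread, the upload must be completed this way.
    send("POST", f"/api/v2/file-upload/{job_id}/end", None)
    return job_id
```

Passing the transport in as a parameter keeps the sketch independent of any particular HTTP library or authentication scheme, neither of which is shown in the thread.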

@F1gueron
Author

F1gueron commented Dec 19, 2024

  1. I also tried uploading the .zip, and it never works on the first upload; I need two tries.
  2. The logs only tell me whether the upload succeeded or not, with no further details.
  3. For me, the first upload always gets canceled, whether it's a JSON or a ZIP. It may be worth checking whether this is a bug or just my code.

This is my code:

    def upload_file(self, zip_file_path):
        file_name = zip_file_path.split("/")[-1]
        with open(zip_file_path, "rb") as file:
            file_content = file.read()
    
        count = 0
        uploaded = False
        i = 0
        while uploaded == False:
            upload_job = self.create_upload_job()
            upload_job_data = upload_job.json()
            file_upload_job_id = upload_job_data['data']['id']

            response_upload = self._requestZip(
                method="POST",
                uri=f"/api/v2/file-upload/{file_upload_job_id}",
                body=file_content
            )
            print(f"{file_name} loaded. Waiting for data to be processed...")

            response_end = self._request(
                method="POST",
                uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
            )
            print(f"Data processing started for {file_name}")

            threshold = 20
            sleep(threshold)
            count += threshold

            while True:
                response_list = self._request(
                    method="GET",
                    uri=f"/api/v2/file-upload"
                )
                JSON_response = response_list.json()
                if JSON_response['data'][i]["status_message"] == "Partially Completed" or JSON_response['data'][0]["status_message"] == "Complete":
                    print(f"Successfully uploaded {file_name}")
                    uploaded = True
                    break  
                if JSON_response['data'][i]['status'] == 3:
                    print("Data processing was canceled. Error on file: " + file_name)
                    i += 1
                    break  
                sleep(threshold)
                count += threshold
                

With this code it works, but only on the second try. I can stick with this, because efficiency isn't important for a program that runs once a month. Thanks.
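One thing that may be worth ruling out: the polling loop above reads `data[i]` for two of its checks and `data[0]` for the third, so it can end up inspecting a job other than the one just created. A minimal sketch of polling by job ID instead is shown below. It assumes each entry in the list response carries an `id` field matching the ID returned by `/start` (an assumption, not confirmed in this thread); `list_jobs` is a hypothetical callable that returns the parsed JSON of `GET /api/v2/file-upload`. The status value `3` and the two status messages are copied from the code above.

```python
import time
from typing import Callable, Optional

# Status code the code above treats as "canceled".
CANCELED = 3
# Status messages the code above treats as success.
DONE_MESSAGES = {"Complete", "Partially Completed"}


def wait_for_job(list_jobs: Callable[[], dict], job_id, interval: float = 20.0,
                 max_polls: int = 30,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[bool]:
    """Poll the job list until the job with job_id finishes.

    Returns True on success, False if canceled, None if still pending
    after max_polls attempts.
    """
    for _ in range(max_polls):
        jobs = list_jobs().get("data", [])
        # Match on the job's own ID instead of its position in the list
        # (assumes list entries expose an "id" field).
        job = next((j for j in jobs if j.get("id") == job_id), None)
        if job is not None:
            if job.get("status_message") in DONE_MESSAGES:
                return True
            if job.get("status") == CANCELED:
                return False
        sleep(interval)
    return None
```

The `max_polls` bound is a design choice of this sketch: it keeps the loop from running forever if a job never leaves the running state.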

@StephenHinck
Collaborator

If you upload that same file via the UI, does it work fine there, or is it the same behavior?

@F1gueron
Author

It works fine in the UI. It's only through the API that everything I upload works on the second try.

@StephenHinck
Collaborator

I asked one of our engineers to double-check this thread, and what you're doing appears to be correct. However, without additional logs or a view of the full code snippet you're running, it will be difficult for us to help you troubleshoot further.

StephenHinck added the needs more info label Dec 20, 2024