Repackage (rename) zip files with old filename #188

Merged
merged 7 commits into from
Aug 9, 2024

Conversation

nicolemah99
Collaborator

@nicolemah99 nicolemah99 commented Aug 5, 2024

This PR introduces a new script, repackage-event-files.sh, which repackages event files that use the old filenames (event_id.zip and event_id.zip.message) by unzipping, renaming, and re-zipping them according to a new naming convention. The script skips files that already follow the new naming convention, so it can be run on directories that contain both old and new file formats.

New Naming Convention: location-data_type-meter-YYYYMM-id

  • Examples:
    • kea-events-meter1-202401-12345.zip
    • kea-events-meter1-202401-12345.zip.message
  • Regex Check:
    • Before processing each .message file, the script checks if the file matches the new naming convention using the regex pattern ^.*/[^/]+-[^/]+-[^/]+-[0-9]{6}-[0-9]+\.zip\.message$. If a file matches this pattern, it is skipped, preventing unnecessary processing and allowing the script to be run on both file name formats.
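
The skip check described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `is_new_format` and the sample paths are hypothetical, and `grep -E` stands in for whatever regex matching the script really uses. Only the pattern itself is taken from this description.

```shell
#!/bin/sh
# Pattern quoted in the PR description for the new naming convention.
NEW_NAME_REGEX='^.*/[^/]+-[^/]+-[^/]+-[0-9]{6}-[0-9]+\.zip\.message$'

# Hypothetical helper: exits 0 when the path already follows the new convention.
is_new_format() {
  printf '%s\n' "$1" | grep -Eq "$NEW_NAME_REGEX"
}

# Sample paths are made up for illustration.
for f in \
  "camio-meter-stream/kea/events/level0/kea-events-meter1-202401-12345.zip.message" \
  "camio-meter-stream/kea/events/level0/12345.zip.message"
do
  if is_new_format "$f"; then
    echo "skip: $f"
  else
    echo "repackage: $f"
  fi
done
```

Note that the pattern begins with `^.*/`, so it only matches paths that contain at least one directory separator; a bare filename with no directory part would not be treated as already renamed.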

Usage: ./repackage-event-files.sh <BASE_DIR>

  • Example BASE_DIR: camio-meter-stream/kea/events/level0

Testing:
I copied the data from camio-psi-streams to ot-dev and ran the script on it; the rename/repackage completed successfully.

@nicolemah99 nicolemah99 self-assigned this Aug 5, 2024
@nicolemah99 nicolemah99 linked an issue Aug 5, 2024 that may be closed by this pull request
3 tasks
@nicolemah99
Collaborator Author

nicolemah99 commented Aug 6, 2024

@teknofire this is the old message file format. I noticed there's no level0 in the path field, even though level0 is in the actual path. I would also assume I should update the filename to the new zip file name and rename id to event_id to match the new message format. Does that sound right?

{
  "id": "10009",
  "filename": "10009.zip",
  "path": "KEA/events/2024-01/KOTZ-735-GEN_07",
  "data_type": "events"
}

@teknofire
Member

We should definitely fix the id/event_id and update the filename.

For the path, I'm not sure we should have it at all now. It was something we needed when the MQTT message was being pushed from the kea side, but now that the message is transferred with the file, the script that reads it in should determine the path to use itself. Also, can we add the md5sum of the zip file to the message?

@nicolemah99
Collaborator Author

Okay great, I created this issue for it and will work on that!
#183

@nicolemah99 nicolemah99 linked an issue Aug 6, 2024 that may be closed by this pull request
@nicolemah99 nicolemah99 linked an issue Aug 6, 2024 that may be closed by this pull request
3 tasks
@nicolemah99
Collaborator Author

nicolemah99 commented Aug 6, 2024

I've updated the script to also update the message files and their content. It handles three different message file versions:

For V1 files, the script repackages them (unzips, renames, zips, creates a new message file) according to the new naming convention.
For V2 files, the script updates the message file by removing the path field and adding the md5sum field.
For V3 files, no action is needed as they are already in the most up-to-date format.
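
As a rough sketch of the V2 step (drop path, add md5sum), something like the following could work. It is not the script's actual code: the function name `upgrade_v2_message` is hypothetical, it assumes the coreutils `md5sum` command is available, and it edits the JSON textually with `sed` on the assumption that the path value is a simple quoted string. A JSON-aware tool such as jq would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a V2 message file in place as V3.
# $1 = message file, $2 = the zip file the message describes.
upgrade_v2_message() {
  msg_file=$1
  zip_file=$2
  # md5sum prints "<hash>  <file>"; keep only the hash.
  sum=$(md5sum "$zip_file" | awk '{print $1}')
  tmp=$(mktemp)
  # Swap the "path" field for an "md5sum" field, leaving indentation,
  # the trailing comma, and all other fields untouched.
  sed "s|\"path\":[[:space:]]*\"[^\"]*\"|\"md5sum\": \"$sum\"|" "$msg_file" > "$tmp"
  mv "$tmp" "$msg_file"
}

# Usage (paths are illustrative):
#   upgrade_v2_message kea-events-sel00-202406-10009.zip.message kea-events-sel00-202406-10009.zip
```

Because the replacement happens in the same position as the old path field, the resulting field order is event_id, filename, md5sum, data_type.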

Message File Versions/Formats

V1

{
  "id": "10009",
  "filename": "kea-events-sel00-202406-10009.zip",
  "path": "kea/events/2024-06/sel00/10009",
  "data_type": "events"
}

V2

{
  "event_id": "10009",
  "filename": "kea-events-sel00-202406-10009.zip",
  "path": "kea/events/2024-06/sel00/10009",
  "data_type": "events"
}

V3

{
  "event_id": "10009",
  "filename": "kea-events-sel00-202406-10009.zip",
  "md5sum": "12345678901234567890qwerty",
  "data_type": "events"
}
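
Given the three formats above, one way the version could be detected is by checking which fields are present. This is only a sketch under that assumption; the helper name `message_version` is hypothetical, and the real script may distinguish the versions differently.

```shell
#!/bin/sh
# Hypothetical helper: print v1, v2, or v3 for a message file,
# based on the field differences shown in the examples above.
message_version() {
  if grep -q '"md5sum"' "$1"; then
    echo v3          # md5sum present: already in the newest format
  elif grep -q '"event_id"' "$1"; then
    echo v2          # event_id but no md5sum: still carries path
  else
    echo v1          # neither field: old format that uses "id"
  fi
}

# Usage (path is illustrative):
#   message_version kea-events-sel00-202406-10009.zip.message
```

Checking md5sum first means a file is only reported as V3 once the checksum has actually been written, which fits the idea of rerunning the script until everything is reported as V3.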

Testing
I've run the script multiple times with data in all 3 formats (test meters, acep meter, data from camio-psi-streams), all successfully. I confirmed it worked by running it a second time, at which point it should log that all message files are already in V3 format. I'm not sure if there are other ways I should/could test the script.

@nicolemah99 nicolemah99 force-pushed the nicole/repackage-level0-files branch from b6beb79 to 34aed51 Compare August 8, 2024 18:37
Member

@teknofire teknofire left a comment


:shipit:

@nicolemah99
Collaborator Author

Added summary (tested on different data sets)

Summary of Processed Message Formats:

Version      Count
-------      -----
V1           35
V2           18
V3 (skipped) 64

Summary of Processed Message Formats:

Version      Count
-------      -----
V1           35
V2           0
V3 (skipped) 82

Summary of Processed Message Formats:

Version      Count
-------      -----
V1           0
V2           0
V3 (skipped) 117
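
A summary in this shape could be printed with a few `printf` calls. The sketch below is an illustration only: `print_summary` is a hypothetical function, and the counts are passed in as arguments, whereas the real script presumably increments counters as it processes each message file.

```shell
#!/bin/sh
# Hypothetical helper: $1, $2, $3 are the V1, V2, and V3 counts.
print_summary() {
  printf 'Summary of Processed Message Formats:\n\n'
  # %-12s left-aligns the label in a 12-character column.
  printf '%-12s %s\n' 'Version' 'Count'
  printf '%-12s %s\n' '-------' '-----'
  printf '%-12s %s\n' 'V1' "$1"
  printf '%-12s %s\n' 'V2' "$2"
  printf '%-12s %s\n' 'V3 (skipped)' "$3"
}

print_summary 35 18 64
```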

@nicolemah99 nicolemah99 merged commit dcd8619 into main Aug 9, 2024
1 check passed
@nicolemah99 nicolemah99 linked an issue Aug 9, 2024 that may be closed by this pull request
@nicolemah99 nicolemah99 deleted the nicole/repackage-level0-files branch August 13, 2024 00:35

Successfully merging this pull request may close these issues.

  • Update Renaming Script to Also Repackage Files
  • Update old format message files
2 participants