Per discussion with @sgoggins, this issue proposes some major structural changes to the way commit data is stored. It was triggered by the discussion around #33.
For a while I have wanted to optimize analysis_data. Each row in the table contains info on each file that changed in a commit. Each row also contains its own copy of author and committer info. When a commit changes a single file, it's not really a big deal. But when a commit changes a lot of files, there's a lot of duplication in the metadata.
There is some benefit in breaking this info out into a separate table, called commits. It would reduce the overall size of analysis_data (I haven't run into issues with this yet, but I'm not using it at the same scale as Sean, see #31 ). It would also yield a graceful solution for #33 by providing us the ability to start over, storing dates as a native DATETIME rather than in ISO 8601 format as a VARCHAR.
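To illustrate the date change, here is a minimal sketch of turning an ISO 8601 string into a value suitable for a native DATETIME column. It assumes the stored strings look like git's strict ISO 8601 output (for example `2017-03-01T12:34:56-05:00`) and that we would normalize to UTC before storing; both are assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def iso8601_to_utc_datetime(value: str) -> datetime:
    """Parse an ISO 8601 string and return a naive datetime in UTC.

    Assumption: the input carries a UTC offset such as "-05:00".
    Strings without an offset are returned unchanged (treated as already UTC).
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        return parsed
    # Shift to UTC, then drop the tzinfo so the value fits a plain DATETIME.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


if __name__ == "__main__":
    print(iso8601_to_utc_datetime("2017-03-01T12:34:56-05:00"))
```

Normalizing to UTC would discard the author's original offset, so if that offset matters it would need its own column.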
It also gives us a new central place to store the commit message, which may be useful info.
The main changes required are:
Alter setup.py to move the per-commit author and committer columns out of analysis_data and into a new commits table
Add a clause to the function update_db in facade-worker.py to add the new commits table, copy over commit and author/committer info, remove old columns from analysis_data and optimize it, and then do a cursory walk through the git log of each repo to get full datetime info for authors/committers plus commit messages.
Update the caching functions with the new join between analysis_data and commits
Add the ability to view commit messages to various UIs
Cut a new major release, because this is a significant database change
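The table split and the new join described in the steps above could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the real one so that it runs anywhere, and every column name (`commit_hash`, `author_name`, `author_email`, `author_date`, `filename`, `added`, `removed`, `message`) is a hypothetical placeholder rather than the actual schema. Committer columns are left out for brevity, and `message` stays empty because it would be filled in later from the git log.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy analysis_data table with duplicated author info per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE analysis_data (commit_hash TEXT, author_name TEXT, "
        "author_email TEXT, author_date TEXT, filename TEXT, "
        "added INTEGER, removed INTEGER)"
    )
    rows = [
        ("abc", "Ann", "ann@example.com", "2017-03-01T12:00:00+00:00", "a.py", 10, 2),
        ("abc", "Ann", "ann@example.com", "2017-03-01T12:00:00+00:00", "b.py", 5, 0),
        ("def", "Bob", "bob@example.com", "2017-03-02T09:30:00+00:00", "a.py", 1, 1),
    ]
    conn.executemany("INSERT INTO analysis_data VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def split_commit_metadata(conn: sqlite3.Connection) -> None:
    """Move per-commit metadata into a commits table, one row per commit."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE commits (commit_hash TEXT PRIMARY KEY, author_name TEXT, "
        "author_email TEXT, author_date TEXT, message TEXT)"
    )
    # The primary key plus OR IGNORE collapses the per-file duplicates.
    cur.execute(
        "INSERT OR IGNORE INTO commits "
        "(commit_hash, author_name, author_email, author_date) "
        "SELECT commit_hash, author_name, author_email, author_date "
        "FROM analysis_data"
    )
    # Rebuild analysis_data with only the per-file columns.
    cur.execute(
        "CREATE TABLE analysis_data_slim AS "
        "SELECT commit_hash, filename, added, removed FROM analysis_data"
    )
    cur.execute("DROP TABLE analysis_data")
    cur.execute("ALTER TABLE analysis_data_slim RENAME TO analysis_data")
    conn.commit()


def lines_added_per_author(conn: sqlite3.Connection) -> list:
    """Example of a cache-style query using the new join."""
    return conn.execute(
        "SELECT c.author_email, SUM(a.added) "
        "FROM analysis_data a JOIN commits c ON c.commit_hash = a.commit_hash "
        "GROUP BY c.author_email ORDER BY c.author_email"
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    split_commit_metadata(db)
    print(lines_added_per_author(db))
```

In the real migration the same steps would run inside update_db, with the final walk through each repo's git log supplying the datetimes and commit messages.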
While this is a big change, in theory it should be possible to do all of the changes transparently to a user with an existing database. The first facade-worker.py run after pulling this code will take longer than usual, but that's likely the only impact.