Improve filenames for downloaded assets #1232
base: master
Any update on this? I personally use this patch to back up a few servers that trigger this bug.
Can't really change the naming algorithm as it will break "skip downloaded assets" for all existing exports.
That is, unfortunately, the point. "Skip Downloaded Assets" as it currently works has a severe risk of incorrectly skipping non-duplicated assets, and there's no good way to fix that without invalidating past downloads. Do you think it'd work to put this behind a command line flag, instead of changing the default behavior?
I think your concern is statistically valid, but there haven't yet been any bug reports regarding asset filename collisions. If/when that happens, then it makes sense to introduce the breaking change to fix such an issue. I want to make sure we're fixing real problems when we add new ones (by breaking compatibility) 🙂 Maybe down the line there will be other related breaking changes, then it will make sense to group them together. Right now I don't see it happening any time soon.
Would there be an error? Or would the file just silently be skipped? In a bigger Discord server that would be very easy to miss if it was just silently skipped.
There wouldn't be an error now, but that can be changed.
I think that's a nice compromise :)
I have a single channel with 8774 conflicts.
This command removes the lines with only 1 instance of a filename then sums up all of the file counts for files with conflicts:
Then count the number of filenames that occur more than once:
17453 - 8679 = 8774. I manually checked some of the messages that had duplicate filenames in Discord, and they were indeed unique images, but only one image was saved locally.
It would be really great if you could implement an option to make each name unique. People that want to stick with the old method don't need to do anything if it's done this way.
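For anyone wanting to repeat the conflict count from the comment above on their own export, here is a minimal Python sketch of the same arithmetic. The commenter's original shell commands are not reproduced in this thread, so this is not their method; the function name `count_lost_files` and the sample filenames are made up for illustration, and the input is assumed to be the list of asset filenames as the exporter would have written them.

```python
from collections import Counter


def count_lost_files(filenames):
    """Return (files_in_conflict, conflicted_names, lost_files).

    files_in_conflict: how many assets share a filename with another asset
    conflicted_names:  how many distinct filenames occur more than once
    lost_files:        assets that were overwritten or skipped, since only
                       one file per conflicted name survives on disk
    """
    counts = Counter(filenames)
    # Drop filenames that occur only once; they cannot conflict.
    conflicted = {name: n for name, n in counts.items() if n > 1}
    files_in_conflict = sum(conflicted.values())
    conflicted_names = len(conflicted)
    return files_in_conflict, conflicted_names, files_in_conflict - conflicted_names


# Hypothetical sample: "a.png" appears 3 times, "c.gif" twice.
sample = ["a.png", "a.png", "a.png", "b.jpg", "c.gif", "c.gif"]
print(count_lost_files(sample))  # (5, 2, 3)
```

With the numbers reported above, the first value would be 17453, the second 8679, and the third 8774.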
Just as an update, this fix will be included in
This changes the filenames output by DiscordChatExporter to be more meaningful (in the case of emojis and avatars) and more resilient against collisions.
The hash is now 12 characters of base32 (60 bits), rather than 5 characters of hexadecimal (20 bits). This allows for nearly 5 million downloads with the same name before there is even a 0.001% chance of a single collision. This should be enough, even for problematic filenames like those associated with YouTube thumbnails.
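The "nearly 5 million" figure can be checked with the standard birthday approximation. The sketch below assumes the hash suffixes behave like uniformly random values and that all files share the same base name; the function name is made up for illustration and is not part of DiscordChatExporter.

```python
import math


def max_items_for_collision_probability(bits, p):
    """Approximate how many random `bits`-bit values can be drawn before
    the chance of at least one collision reaches `p`.

    Uses the birthday approximation p ~= 1 - exp(-n^2 / (2 * space)),
    solved for n.
    """
    space = 2 ** bits
    return math.sqrt(-2 * space * math.log1p(-p))


p = 1e-5  # 0.001%

# New scheme: 12 base32 characters, 5 bits each = 60 bits.
new_limit = max_items_for_collision_probability(12 * 5, p)
# Old scheme: 5 hexadecimal characters, 4 bits each = 20 bits.
old_limit = max_items_for_collision_probability(5 * 4, p)

print(f"new scheme: about {new_limit:,.0f} same-named files")
print(f"old scheme: about {old_limit:.1f} same-named files")
```

Under these assumptions the new scheme reaches the 0.001% threshold at roughly 4.8 million same-named files, while the old 20-bit suffix reaches it after only about 5.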
Emojis and Discord attachments are guaranteed not to collide, as their filenames instead contain the unique snowflake of the attachment or emoji. The 19-digit ID is significantly longer than the old 5-character hash - however, re-encoding it in base32 would only save 6 characters, so better to use the more recognizable numeric form IMO.
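To illustrate the "only save 6 characters" point: a snowflake is a 64-bit integer, and 64 bits need 13 base32 characters, versus 19 decimal digits. The sketch below uses Python's standard `base64.b32encode`; the helper name and the example ID are hypothetical, and this is not how the patch itself encodes anything (it keeps the decimal form).

```python
import base64


def snowflake_to_base32(snowflake: int) -> str:
    """Encode a 64-bit snowflake as unpadded base32 text."""
    raw = snowflake.to_bytes(8, "big")  # fixed 8 bytes = 64 bits
    return base64.b32encode(raw).decode("ascii").rstrip("=")


example_id = 1100000000000000000  # hypothetical 19-digit ID, not a real asset
encoded = snowflake_to_base32(example_id)

# Decimal length vs. base32 length
print(len(str(example_id)), len(encoded))  # 19 13
```

A difference of 6 characters is a small saving compared with losing the ability to recognize the ID at a glance.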
Closes #1231