wp_get_attachment_metadata() strips what it thinks are html tags in Exif metadata #14
Comments
in wp-admin/includes/image.php :
calls:
which strips tags. So we have to use PHP's Exif parsing. I'll look at hooking this in for the Media editor, pulling the values for the fields if they are not otherwise populated. @mattl does this indicate a more general problem with the Exif tag format we are proposing? I don't believe so, but it's worth considering. Also, maybe we should consider adding source and CC+ if we haven't already.
Making progress with the PHP Exif parsing; just trying to make it efficient for the code and logical for the user.
Code now extracts the license and attribution URL when you view the media. Looking to see if I can hook this into the image upload process, but if not this will be Good Enough, I think.
Metadata is now extracted on image upload. This won't get metadata for existing images if the plugin is installed and we have (e.g.) 20,000 images with Exif already in the system. @mattl we can run the extract code when you view the image in the Media editor, or is this something we might want to give the user the option of running manually from the plugin's settings (a button [Scan Existing Images for License Metadata And Apply It]), if that's possible?
Won't existing images have been previously stripped by WordPress?
I don't believe so. The strings are stripped after reading from the file, rather than the file itself being sanitised.
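Since the file itself is left untouched, the raw value should still be recoverable by reading the image directly with PHP's Exif functions. A minimal sketch of that idea follows. The helper name `read_raw_exif_copyright()` is hypothetical and not part of the plugin, and it assumes the field is exposed under the `Copyright` key of the array returned by `exif_read_data()`.

```php
<?php
/**
 * Hypothetical helper (not the plugin's code): read the Copyright string
 * straight from the image file, bypassing WordPress's metadata handling.
 * Returns null if the exif extension is unavailable, the file cannot be
 * read, or no Copyright field is found.
 */
function read_raw_exif_copyright(string $path): ?string
{
    if (!function_exists('exif_read_data') || !is_readable($path)) {
        return null;
    }

    // Warnings are suppressed because files without Exif data make
    // exif_read_data() emit one and return false.
    $exif = @exif_read_data($path);

    // Assumption: the field appears under the 'Copyright' key.
    if ($exif === false || !isset($exif['Copyright']) || !is_string($exif['Copyright'])) {
        return null;
    }

    return $exif['Copyright'];
}
```

Called with the path of an already-uploaded image, this would return the unsanitised string if one is present, which is why existing images should not need re-uploading.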
Maybe something like this? We could pull all the existing images from the CC website as a test, but also @ericsteuer has good insight into how this works on a big site like Wired.com, which probably has a few hundred thousand images.
I had in mind more a global "Extract CC License metadata where present but
We could also add a button to the media manager to do this for individual
So the former would support hundreds of thousands, the latter just a few if
On Fri, Jul 29, 2016 at 1:57 PM, Matt Lee notifications@github.com wrote:
The worry I have there is that we'd wind up adding extra captions to existing images all over the place.
Sure. It's the sort of thing where the user will want the plugin to do the right thing, for a value of "the right thing" that will differ from case to case. And they'll really want an Undo button. So if this is too difficult to do usefully we shouldn't make something that will just frustrate people. :-)
Why not use the "regenerate thumbnails" approach, in which you have a plugin run once for all existing images? This could be a separate add-on plugin which can be removed after it has run, since it's likely to be run only once.
Sadly, after all my enthusiasm, I've noticed that issue creativecommons#14 had changed to `exif_read_data()` because `wp_read_image_metadata()` was removing all tags from the image metadata... even non-HTML tags. So, project abandoned!
If we have a JPEG with a Copyright field like:
then when we upload the file to WordPress and fetch the Exif metadata using:
then the string we get for Copyright is:
I assume this is due to WordPress taking the sensible precaution of stripping HTML tags from outside input, but it does mean that the format we are using for license URLs falls foul of this.
I've chased this down the call stack a way and I can't find anywhere to change it. I'd rather not have to use PHP's Exif parsing, although I've just tested that and it doesn't have the same problem.
Investigating further, but if anyone knows of a quick fix for this please let me know.
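The effect described above can be illustrated with a short sketch. The exact Copyright value from the report is not shown here, so the sample string below (a license URL in angle brackets) is an assumption. PHP's built-in `strip_tags()` is used as a stand-in sanitiser; it is not claimed to be the function WordPress calls. `extract_license_url()` is a hypothetical helper for illustration only.

```php
<?php
// Assumed sample value: a Copyright string carrying a license URL in angle brackets.
$raw = 'Copyright 2016 Example Author <http://creativecommons.org/licenses/by/4.0/>';

// Stand-in for the sanitising step: anything that looks like <...> is removed,
// whether or not it is a real HTML tag.
$sanitised = strip_tags($raw);

/**
 * Hypothetical helper: return the first angle-bracketed http(s) URL in a
 * string, or null if there is none.
 */
function extract_license_url(string $value): ?string
{
    if (preg_match('/<(https?:\/\/[^<>\s]+)>/', $value, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(extract_license_url($raw));       // the URL is found in the raw value
var_dump(extract_license_url($sanitised)); // nothing is left to find after stripping
```

Under these assumptions the license URL survives only in the raw string, which matches the observation that reading the Exif data directly with PHP does not have the same problem.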