wp_get_attachment_metadata() strips what it thinks are html tags in Exif metadata #14

rheaplex · 2016-07-27T23:15:22Z

If we have a jpeg with a Copyright field like:

<no> tags, tags are <stripped>

then when we upload the file to WordPress and fetch the Exif metadata using:

wp_get_attachment_metadata($att_id[0], true);

then the string we get for Copyright is:

 tags, tags are

I assume this is due to WordPress taking the sensible precaution of stripping HTML tags from outside input, but it does mean that the format we are using for license URLs falls foul of this.

I've chased this down the call stack a way and I can't find anywhere to change it. I'd rather not have to use php's exif parsing, although I've just tested that and it doesn't have the same problem.

Investigating further, but if anyone knows of a quick fix for this please let me know.

The text was updated successfully, but these errors were encountered:

rheaplex · 2016-07-27T23:34:55Z

in wp-admin/includes/image.php :

wp_read_image_metadata()

calls:

wp_kses_post_deep()

which strips tags.

So we have to use the php exif parsing. I'll look at hooking this in for the Media editor, pulling the values for the fields if they are not otherwise populated.

@mattl does this indicate a more general problem with the Exif tag format we are proposing? I don't believe so, but worth considering. Also maybe we should consider adding source and CC+ if we haven't already.

rheaplex · 2016-07-28T00:40:55Z

Making progress with the php Exif parsing, just trying to make it efficient for the code and logical for the user.

rheaplex · 2016-07-29T18:11:29Z

Code now extracts license and attribution url when you view the media. Looking to see if I can hook this in to the image upload process, but if not this will be Good Enough, I think.

rheaplex · 2016-07-29T18:24:45Z

Metadata now extracted on image upload.

This won't get metadata for existing images if the plugin is installed and we have (e.g.) 20,000 images with Exif already in the system.

@mattl we can run the extract code when you view the image in the Media editor, or is this something we might want to give the user the option of running manually from the settings for the plugin (a button [Scan Existing Images for License Metadata And Apply It] ) if that's possible?

mattl · 2016-07-29T20:45:07Z

Won't existing images have been previously stripped by WordPress?

rheaplex · 2016-07-29T20:46:36Z

I don't believe so. The strings are stripped after reading from the file, rather than the file itself being sanitised.

mattl · 2016-07-29T20:57:21Z

Maybe something like this? We could pull all the existing images from the CC website as a test, but also @ericsteuer has good insight into how this works on a big site liked Wired.com who probably have a few hundred thousand images.

rheaplex · 2016-07-29T21:14:34Z

I had in mind more a global "Extract CC License metadata where present but
don't overwrite anything" option.

We could also add a button to the media manager to do this for individual
images.

So the former would support hundreds of thousands, the latter just a few if
you only want to use a few.

On Fri, Jul 29, 2016 at 1:57 PM, Matt Lee notifications@github.com wrote:

[image: screenshot from 2016-07-29 15-56-15]
https://cloud.githubusercontent.com/assets/33296/17263143/0538b4a8-55a5-11e6-9471-e62fa5f2e11b.png

Maybe something like this? We could pull all the existing images from the
CC website as a test, but also @ericsteuer https://github.com/ericsteuer
has good insight into how this works on a big site liked Wired.com who
probably have a few hundred thousand images.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#14 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AABU8ocliyFl3zU0pVeCqom5cyrKPavHks5qamkxgaJpZM4JWuJI
.

mattl · 2016-07-29T21:15:23Z

The worry I have there is that we'd wind up adding extra captions to existing images all over the place.

rheaplex · 2016-07-29T23:11:47Z

Sure. It's the sort of thing where the user will want the plugin to do the right thing, for a value of "the right thing" that will differ from case to case. And they'll really want an Undo button.

So if this is too difficult to do usefully we shouldn't make something that will just frustrate people. :-)

BjornW · 2016-07-30T09:24:10Z

Why not use the 'regenerate thumbnails approach' in which you have a plugin run once for all existing images? This could be a seperate add-on plugin which can be removed after it has run, since it's likely to be run only once.

Sadly, after all my enthusiasm, I've noticed that issue creativecommons#14 had changed to `exif_read_data()` because `wp_read_image_metadata()` was removing all tags from the image metadata... even non-HTML tags. So, project abandoned!

rheaplex self-assigned this Nov 11, 2017

rheaplex added this to the 2.1 Release milestone Nov 11, 2017

rheaplex removed their assignment Sep 7, 2018

ahmadbilaldev modified the milestones: 2.1 Release, Later Jul 25, 2019

TimidRobot modified the milestones: Later, Someday Nov 8, 2019

TimidRobot modified the milestones: Someday, Backlog Nov 20, 2020

cc-open-source-bot added the 🏷 status: label work required Needs proper labelling before it can be worked on label Dec 2, 2020

TimidRobot added this to possumbilities Sep 10, 2024

TimidRobot moved this to Triage in possumbilities Sep 10, 2024

possumbilities added 🚧 status: blocked Blocked & therefore, not ready for work and removed 🏷 status: label work required Needs proper labelling before it can be worked on labels Oct 1, 2024

possumbilities moved this from Triage to Backlog in possumbilities Oct 1, 2024

possumbilities added the 🧹 status: ticket work required Needs more details before it can be worked on label Oct 1, 2024

cc-open-source-bot added the 🏷 status: label work required Needs proper labelling before it can be worked on label Oct 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wp_get_attachment_metadata() strips what it thinks are html tags in Exif metadata #14

wp_get_attachment_metadata() strips what it thinks are html tags in Exif metadata #14

rheaplex commented Jul 27, 2016 •

edited

Loading

rheaplex commented Jul 27, 2016 •

edited

Loading

rheaplex commented Jul 28, 2016

rheaplex commented Jul 29, 2016

rheaplex commented Jul 29, 2016

mattl commented Jul 29, 2016

rheaplex commented Jul 29, 2016

mattl commented Jul 29, 2016

rheaplex commented Jul 29, 2016

mattl commented Jul 29, 2016

rheaplex commented Jul 29, 2016

BjornW commented Jul 30, 2016

wp_get_attachment_metadata() strips what it thinks are html tags in Exif metadata #14

wp_get_attachment_metadata() strips what it thinks are html tags in Exif metadata #14

Comments

rheaplex commented Jul 27, 2016 • edited Loading

rheaplex commented Jul 27, 2016 • edited Loading

rheaplex commented Jul 28, 2016

rheaplex commented Jul 29, 2016

rheaplex commented Jul 29, 2016

mattl commented Jul 29, 2016

rheaplex commented Jul 29, 2016

mattl commented Jul 29, 2016

rheaplex commented Jul 29, 2016

mattl commented Jul 29, 2016

rheaplex commented Jul 29, 2016

BjornW commented Jul 30, 2016

rheaplex commented Jul 27, 2016 •

edited

Loading

rheaplex commented Jul 27, 2016 •

edited

Loading