Skip to content
This repository has been archived by the owner on Mar 26, 2021. It is now read-only.

Broken media links when using offload media plugin #64

Open
jwcounts opened this issue Dec 17, 2018 · 11 comments
Open

Broken media links when using offload media plugin #64

jwcounts opened this issue Dec 17, 2018 · 11 comments
Labels

Comments

@jwcounts
Copy link

My colleague David Purdy at KTOO in Anchorage brought this up and I was able to replicate it in my setup.

We are both using the WP Offload Media plugin by Delicious Brains to push our media files to Amazon S3 for service. After uploading to S3, the plugin updates the attachment URL and other metadata, but leaves the attachment GUID untouched.

Any images or audio files in the body of the post are handled properly when sending to the API, since they are either untouched, or completely stripped (in the "text with HTML" and "text" versions respectively). However, the audio and image attachments created in the API all point to the attached media's GUID, which in our case returns a 404 error.

I made some changes to the NPRML.php file in my test and production environments that have sorted the problem out. Here is a link to a copy of the changed file (all changes are between lines 303 - 430):

NPRML.php

The main change is to switch $audio->guid to wp_get_attachment_url( $audio->ID ) (or $image when handling images). The benefit of this is that if no media offload plugin is being used, the GUID should be returned.

Also, I made some modifications to how filenames are pulled from the image/audio URLs using parse_url() and pathinfo().

Finally, I noticed an issue where we'd have 2 audio entries on a story in the API, because one was pulled from the post attachments and the other was pulled from an enclosure in wp_postmeta.
When processing the post attachments, I created an array of filenames, and then added a check against that array when processing enclosures.

@benlk benlk added the bug label Dec 18, 2018
@davidmpurdy
Copy link

Thank you for these changes and the great write-up!

The other scenario we've run into is if an audio file is not a child of the story post (happens most often when one person uploads the audio straight to the media library for someone else who's writing the post). The audio is still in an enclosure, but since we don't use the PowerPress plugin we don't get the duration. Would wp_read_audio_metadata work to get duration from enclosures instead of relying on PowerPress?

@jwcounts
Copy link
Author

Hmmm... I haven't used Powerpress either, so I'm not entirely sure. There might be a way to find the media attachment ID by querying wp_posts with the GUID pulled from the enclosure, and then use the attachment ID to retrieve _wp_attachment_metadata from wp_postmeta. Having the attachment ID would also allow for using wp_get_attachment_url() for those of us using media offloaders.

I can work up a proof-of-concept tomorrow when I get back into the office.

@davidmpurdy
Copy link

davidmpurdy commented Dec 18, 2018

I forgot to mention an important piece: Delicious Brains wrote us a filter so that enclosures hold the actual URL instead of the GUID. I don't know if it's the ideal best practice, but it was our middle ground between changing GUIDs and breaking everything that uses enclosures (we do podcasting from our site).

Using the enclosure_links filter in the tweaks plugin constructor:
add_filter( 'enclosure_links', array( $this, 'as3cf_enclosure_links'), 10, 2 );

The function itself uses the WP Offload S3 library to do the standard URL replacement:

function as3cf_enclosure_links( $post_links, $post_ID ) {
	global $as3cfpro;

	foreach ( $post_links as $key => $link ) {
		$post_links[ $key ] = $as3cfpro->filter_local->filter_post( $link );
	}
	return $post_links;
}

Edit: sorry, it wouldn't be wp_read_audio_metadata directly, because the enclosure has an URL, not a file path. I think under normal circumstances if you wrote something to get the path from the URL, wp_read_audio_metadata should work. As a disclaimer, however, I haven't actually done that: when we send audio to NPR One, we use a function that Delicious Brains wrote for us that function that copies the file down from S3 temporarily and then uses ID3 to get the duration from that local file.

@eteare
Copy link
Contributor

eteare commented Dec 19, 2018

@jwcounts as I read the above, I see you have solutions for the issues you found. Are they related to plugins you use that others may not use? And if these issues apply to all, would you be interested in applying your fixes to a PR that could be reviewed for a merge?

@jwcounts
Copy link
Author

@davidmpurdy: Thanks for sending that filter over. Once the file is uploaded, Wordpress should save the audio file metadata to wp_postmeta, so you shouldn't necessarily have to redownload the file and parse it directly. However, pulling from wp_postmeta works best if you can query using the audio file's attachment ID. Finding that attachment ID is what I'm working on today.

@eteare: The changes I made are related to a media offload plugin that @davidmpurdy and I are using, but the methods I'm using to accomplish this should be applicable to anyone using WP. I'm employing functions that are either built into Wordpress or normal PHP functions. Once I see if I can address @davidmpurdy 's concerns above, I would be more than happy to help with a PR for a merge.

@davidmpurdy
Copy link

@jwcounts: That's a good point. I'm not sure why they downloaded it and ran ID3 directly (I have to admit that I kind of passed the buck to them make it "just work" as part of our evaluation process of their plugin in our workflow).

@jwcounts
Copy link
Author

Okay, I think I've got it figured out now. I've updated the file at the link provided above, but just in case:

NPRML.php

There is a function in WP called attachment_url_to_postid(). I was able to test it with GUIDs as well as the URLs generated by the Offload Media plugin we're using, and it seems to return the proper attachment ID regardless. Once the enclosure's attachment ID is retrieved, it can be checked against any direct attachments to the post.

After that, I'm pulling the audio metadata from wp_postmeta and running some checks, and setting the duration off of (in order) postmeta 'length', postmeta 'formatted_length', or the unserialized duration from Powerpress in the enclosure.

I tested this in a stock version of Wordpress, retested after adding Powerpress, and then again after adding WP Offload Media. So far so good. It also seems to be working on my test site, so I'm rolling it out to our production server.

@jwcounts
Copy link
Author

Resubmitted a bunch of articles from the last couple of weeks, both to the API and NPR One, and so far so good. Image and audio attachments are present and accounted for.

@davidmpurdy, would you be willing to test out the changes on your end?

@davidmpurdy
Copy link

davidmpurdy commented Dec 21, 2018

It looks good on our staging site based on the NPRML being built (is there a sandbox version of StationConnect to test all the way through that I don't know about?). Thank you for doing all this @jwcounts!

Edit: I will test it on the live site, too, as soon as we have a story that's ready to send (which could be a while because of people being out the holidays)

@jwcounts
Copy link
Author

You're welcome @davidmpurdy, and thanks for bringing it up on the Slack! We had been sending out non-working links and junky post data for a while, and I wouldn't have known to fix it if you hadn't sent up a red flag on your end.

@davidmpurdy
Copy link

I just pushed my first couple of stories with the update nprml.php from our live site, and can confirm that they play in the app. Thanks again - looks like it's working perfectly for us.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants