Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTTP Last-Modified header for TEI resources #396

Merged
merged 8 commits into from
Feb 8, 2022

Conversation

joewiz
Copy link
Member

@joewiz joewiz commented Jan 21, 2022

As discussed with @windauer and @agh2342, accurate dependable HTTP Last-Modified headers could aid in our efforts to improve efficiency and performance of our servers. This PR is an initial step toward that goal.

It adds the Last-Modified header to resources on history.state.gov that are generated from TEI sources. Specifically, it defines new *-last-modified entries in config.xqm's $config:PUBLICATIONS map, which the pages:load templating function retrieves when constructing the TEI data needed on a page. (See Update below.) This function then sets a request attribute, hsgshell.last-modified (modeled on hsgshell.error), containing the last-modified date. Then, after view.xql finishes executing templates:apply, it checks for the presence of this attribute and, if found, passes the value to app:set-last-modified, which formats the date according to RFC 1123 (see w3c/qtspecs#36) and sends the formatted date to response:set-header("Last-Modified").

Besides TEI resources, this PR adds a Last-Modified header for FRUS volumes that we don't have TEI for yet, by using the date of the resource in /db/apps/frus/bibliography.

Other than this one non-TEI case, the PR doesn't handle Last-Modified headers for any other non-TEI resource. These could be added in future PRs.

The functions that look up last modified dates do so purely by looking up the last-modified date of TEI documents using the xmldb:last-modified function. It doesn't consider the last-modified date of the primary or any other imported HTML templates or XQuery modules, which could carry significant changes (i.e., site-wide menus, section sidebars, etc.). More nuanced handling could be added in a future PR.

The PR only handles direct requests to resources, not the ajax requests with prev/next buttons (i.e., frus-ajax.xql).

However, since FRUS TEI pages are some of the most taxing pages on the site (and AFAIK crawlers tend to make direct requests for resources rather than activating prev/next buttons), this PR should help us tell whether the addition of dependable Last-Modified headers is worthwhile.

In my local testing, the Last-Modified header is correctly set and updated as resources are stored in the database, and there are no changes to the functionality of non-TEI pages.

Update: With 59008ca, the code now checks for an "If-Modified-Since" header in pages:load and, if a 304 is appropriate, it immediately halts further processing of the template—and instead returns a 304.

let $if-modified-since := try { request:get-attribute("if-modified-since") => parse-ietf-date() } catch * { () }
let $should-return-304 :=
if (exists($last-modified) and exists($if-modified-since)) then
$if-modified-since gt $last-modified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per rfc2616 clients SHOULD set If-Modified-Since to a previously received Last-Modified. So using ge (greater or equal) instead of gr (greater) here seems desirable. Otherwise most requests will still get a 200 instead of a 304.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agh2342 Thank you! I've now switched to ge.

Address @agh2342's comment:

> As per rfc2616 clients SHOULD set If-Modified-Since to a previously received Last-Modified. So using ge (greater or equal) instead of gr (greater) here seems desirable. Otherwise most requests will still get a 200 instead of a 304.
@joewiz
Copy link
Member Author

joewiz commented Jan 22, 2022

For some info on how Last-Modified was handled on an earlier incarnation of HSG, see https://markmail.org/message/j5dk2mlho6khxfpv.


let $last-modified :=
if (exists($publication-id) and exists($document-id)) then
pages:last-modified($publication-id, $document-id, $section-id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pages:last-modified($publication-id, $document-id, $section-id)
pages:last-modified($publication-id, $document-id, $section-id) => format-dateTime("[FNn,*-3], [D01] [MNn,*-3] [Y0001] [H01]:[m01]:[s01] [Z0000]", "en", (), ()) => parse-ietf-date()

xmldb:last-modified() has millisecond resolution.
HTTP Last-Modified and If-Modified-Since headers have second resolution.
We need to make sure we mangle this the same way for outgoing and incoming HTTP headers and xmldb:last-modified() calls so comparisons work as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agh2342 I've applied the change and believe it correctly truncates milliseconds. To confirm the approach:

xquery version "3.1";

let $date := current-dateTime()
return
    (
        $date,
        $date
        => format-dateTime("[Y0001]-[M01]-[D01]T[H01]:[m01]:[s01][Z]")
        => xs:dateTime()
    )

This returns:

(
    xs:dateTime("2022-02-03T12:30:10.395-05:00"),
    xs:dateTime("2022-02-03T12:30:10-05:00")
)

I deployed the change locally, and the calculation appears to be correct.

@joewiz
Copy link
Member Author

joewiz commented Feb 3, 2022

In a new commit, I added functions to look up and generate the HTTP "Created" header, so that both the created date and last-modified dates are accurate. I did this because in practice the Created header that was automatically generated before was often later than the last-modified date. For example:

Last-Modified: Tue, 01 Feb 2022 05:46:48 +0000
Created: Thu, 03 Feb 2022 17:25:49 GMT

No logic on our end depends on the Created date, but it was unintuitive, so I thought it was worth applying the same approach we used for generating headers to this piece of information.

Now the created date is accurate - so it will always be the same as or earlier than the last-modified date:

Last-Modified: Tue, 01 Feb 2022 05:46:48 +0000
Created: Tue, 01 Feb 2022 05:46:48 +0000

@joewiz
Copy link
Member Author

joewiz commented Feb 8, 2022

@agh2342 Thanks for your reviews and feedback!

@joewiz joewiz merged commit a6af6b0 into master Feb 8, 2022
@joewiz joewiz deleted the add-last-modified-header branch February 8, 2022 16:24
joewiz added a commit that referenced this pull request Feb 11, 2022
Extend #396 to previous & next navigation buttons on TEI resources
joewiz added a commit that referenced this pull request Feb 11, 2022
Extend #396 to previous & next navigation buttons on TEI resources
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants