Solr hOCR search #163

Open
wants to merge 11 commits into
base: main

Conversation

alxp
Contributor

@alxp alxp commented Jul 19, 2024

**Update 2024/09/11: The PR has been rebased against 'main' and can now be built with make starter_dev.**

What does this pull request do?

Adds configurations to support a search within a paged content object's children for text found in the page object's hOCR.

How should this be tested?

From a clean ISLE-DC instance:

Add this line to build/docker-compose/docker-compose.drupal.yml in the environment: section:

SOLR_HOCR_PLUGIN_PATH: '/opt/solr/server/solr/contrib/ocrhighlighting/lib'
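
To confirm the variable is in place before building, here is a minimal sketch of a check to run from the isle-dc folder. The function name `has_hocr_plugin_path` is made up for illustration; the file path and value are the ones given above. It only checks that the line appears somewhere in the file, not that it sits under the right `environment:` section.

```shell
# Hypothetical helper: succeeds if the compose file contains the plugin path line.
has_hocr_plugin_path() {
  # $1: compose file to inspect; defaults to the path named in this description.
  file="${1:-build/docker-compose/docker-compose.drupal.yml}"
  # -F treats the pattern as a fixed string, -q suppresses output.
  grep -F -q "SOLR_HOCR_PLUGIN_PATH: '/opt/solr/server/solr/contrib/ocrhighlighting/lib'" "$file"
}
```

For example, `has_hocr_plugin_path || echo "SOLR_HOCR_PLUGIN_PATH is missing"` prints a reminder if the line has not been added yet.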

Then from the isle-dc folder, clone this repository with the solr-hocr branch:

git clone git@github.com:Islandora-Devops/islandora-starter-site.git --branch=solr-hocr

Run 'make starter_dev' and wait for it to complete.

Then after logging in as admin:

  1. Create a Repository Item node with model 'Paged Content'.
  2. Click on 'Children' and 'Batch Upload Children'.
  3. Select Repository Item, Page, File, and Original File on the batch upload page and click Next.
  4. Upload one or more TIFF or JP2 files and click Finish.
  5. Go back to the Paged Content node. You should see the Mirador viewer, and the text should be selectable.
  6. Click the show sidebar button in Mirador and then click the Search button.
  7. In the search box, enter a term you know is in the text of an image you've uploaded.

You should see the search result in the viewer's sidebar; clicking on it should take you to the highlighted result in the viewer.

@alxp
Contributor Author

alxp commented Jul 19, 2024

Not sure what gives with this failing test, it says:

 * Create a sites/default/settings.php file
 * CREATE the 'test_db' database.

 Do you want to continue? (yes/no) [yes]:

In SiteInstallCommands.php line 197:
                                                                               
  Existing configuration directory  does not contain a core.extension.yml fil  
  e.                                                                           
                                                                               

Script drush handling the __exec_command event returned with error code 1

but there's a core.extension.yml file in the config/sync folder. Running the step locally works as expected.
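
For anyone comparing the CI run with a local one, here is a minimal sketch of a check that a sync directory holds the file the installer is asking for. The helper name `has_core_extension` is hypothetical, and the `config/sync` default is taken from the comment above; the directory the CI job actually hands to drush may differ.

```shell
# Hypothetical helper: succeeds if the given config directory contains core.extension.yml.
has_core_extension() {
  # $1: config sync directory; defaults to config/sync (an assumption from this thread).
  dir="${1:-config/sync}"
  [ -f "$dir/core.extension.yml" ]
}
```

Running `has_core_extension && echo found` from the site root, both locally and in the CI job, would show whether the two environments see the same file.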

@Natkeeran
Contributor

On today’s call, it was noted that the issue may be related to Drush 13

@adam-vessey
Collaborator

Not necessarily related to Drush 13, but possibly needing the same fix as is being done in the Drush 13 PR, altering the --db-url: https://github.com/Islandora-Devops/islandora-starter-site/pull/162/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fR45

@alxp alxp mentioned this pull request Jul 31, 2024