aws-bucketeer

Learning AWS Buckets with Weepy

Building a multi-container app to:

Index metadata from binary files (PDF, MP3, etc.), including a content-based SHA256 address, in a self-hosted Apache Solr container (with the Tika and Solr Cell plugins). Solr handles indexing and search, but not file content storage.
For storage, upload each indexed file to an Amazon S3 bucket, using its SHA256 content address as part of the object key.
Search results will include the SHA256 content address, so the file content of any search result can be retrieved from the S3 bucket.
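A minimal sketch of the content-address idea, assuming the address is simply the SHA256 of the file's bytes (computed with PHP's built-in hash_file()) and a hypothetical files/ key prefix. contentAddress and s3KeyFor are illustrative names, not functions from this repo, and the repo's actual key layout may differ.

```php
<?php
// Compute the content address: SHA256 over the raw file bytes, so identical
// files always map to the same address regardless of their filename.
function contentAddress(string $path): string
{
    $hash = hash_file('sha256', $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $hash;
}

// Build the S3 object key from the address. The "files/" prefix is a
// hypothetical choice for illustration.
function s3KeyFor(string $sha256, string $prefix = 'files/'): string
{
    return $prefix . $sha256;
}
```

Because the key depends only on content, uploading the same file twice writes the same object, which gives deduplication for free.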

References

AWS S3 PHP example
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingTheMPphpAPI.html
https://aws.amazon.com/sdk-for-php/

Composer
https://getcomposer.org/download/

Installing the AWS SDK for PHP using Composer:

composer require aws/aws-sdk-php

Composer is the preferred way to install the SDK. However, I used the SDK .zip for now because my Composer install was misbehaving.

https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/getting-started_installation.html

https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_configuration.html
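A rough sketch of the S3 side, assuming the AWS SDK for PHP v3 and a hypothetical files/<sha256> key layout. The functions accept any object with putObject()/getObject() methods so they can be exercised without AWS; uploadByContentAddress and downloadByContentAddress are illustrative names, not functions from this repo.

```php
<?php
// Upload a file to S3 under its SHA256 content address and return the address.
// $s3 is expected to be an Aws\S3\S3Client (any object with putObject() works).
function uploadByContentAddress(object $s3, string $bucket, string $path): string
{
    $sha256 = hash_file('sha256', $path);
    if ($sha256 === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $s3->putObject([
        'Bucket'     => $bucket,
        'Key'        => 'files/' . $sha256, // hypothetical key prefix
        'SourceFile' => $path,              // SDK reads the body from this file
    ]);
    return $sha256;
}

// Fetch the object for a given content address and write it to $dest.
function downloadByContentAddress(object $s3, string $bucket, string $sha256, string $dest): void
{
    $s3->getObject([
        'Bucket' => $bucket,
        'Key'    => 'files/' . $sha256,
        'SaveAs' => $dest, // SDK writes the object body to this path
    ]);
}
```

In the app, $s3 would be something like new Aws\S3\S3Client(['version' => 'latest', 'region' => 'us-east-1']) with a region of your choosing, loaded via require 'vendor/autoload.php'; (Composer) or require 'aws-autoloader.php'; (the .zip). The SDK picks up credentials from environment variables or ~/.aws/credentials, as described in the configuration guide.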

Solr

We want Apache Tika and Apache Solr Cell enabled to extract metadata from binary files, so in solrconfig.xml we add:

  <lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

and the extracting request handler:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

This follows the documentation at https://solr.apache.org/guide/8_5/uploading-data-with-solr-cell-using-apache-tika.html
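One possible way to push a file through that handler from PHP with cURL, assuming a Solr base URL like http://localhost:8983, a hypothetical core named files, and the SHA256 stored as the document id via the handler's literal.<field> parameters. extractUrl and indexFile are illustrative helpers; the URL, core and field names are assumptions about the setup.

```php
<?php
// Build the /update/extract URL. literal.id sets the document's id field
// to the content address; commit=true makes it searchable right away.
function extractUrl(string $solrBase, string $core, string $sha256): string
{
    $params = ['literal.id' => $sha256, 'commit' => 'true'];
    return rtrim($solrBase, '/') . '/solr/' . rawurlencode($core)
        . '/update/extract?' . http_build_query($params);
}

// POST the file as multipart/form-data (like curl -F "myfile=@file")
// and return Solr's response body.
function indexFile(string $solrBase, string $core, string $path): string
{
    $sha256 = hash_file('sha256', $path);
    if ($sha256 === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $ch = curl_init(extractUrl($solrBase, $core, $sha256));
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => ['myfile' => new CURLFile($path)],
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
    ]);
    $response = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // The handle is released when $ch goes out of scope.
    if ($response === false || $status !== 200) {
        throw new RuntimeException("Solr extract failed with HTTP $status");
    }
    return $response;
}
```

Solr Cell also accepts extractOnly=true, which returns the extracted metadata without indexing it; that can help while tuning field mappings.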

CURLOPT

Set CURLOPT_RETURNTRANSFER to true so that curl_exec() returns the response as a string instead of outputting it directly.
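A small search-side sketch showing CURLOPT_RETURNTRANSFER in use: it fetches a Solr /select response as a string and pulls out the content addresses of the hits. It assumes each document's id holds its SHA256 and a hypothetical core named files; selectUrl, httpGet and addressesFromSolrResponse are illustrative helpers, not functions from this repo.

```php
<?php
// Build a Solr select URL that returns only the id field as JSON.
function selectUrl(string $solrBase, string $core, string $query, int $rows = 10): string
{
    $params = ['q' => $query, 'fl' => 'id', 'rows' => $rows, 'wt' => 'json'];
    return rtrim($solrBase, '/') . '/solr/' . rawurlencode($core)
        . '/select?' . http_build_query($params);
}

// GET a URL and return the body as a string.
function httpGet(string $url): string
{
    $ch = curl_init($url);
    // Without this option curl_exec() prints the body and returns true;
    // with it, curl_exec() returns the body (or false on failure).
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    return $body;
}

// Extract the SHA256 addresses (stored as "id") from a Solr JSON response.
function addressesFromSolrResponse(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $docs = $data['response']['docs'] ?? [];
    return array_map(fn(array $doc): string => $doc['id'], $docs);
}
```

Each returned address can then be turned back into an S3 key to fetch the file content.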

https://www.php.net/manual/en/function.curl-setopt.php

php.ini

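Set these large enough for the biggest files you expect to upload. post_max_size should be at least as large as upload_max_filesize (a bit larger leaves room for form overhead), and memory_limit should generally be larger than post_max_size.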

upload_max_filesize = 128M
post_max_size = 128M
memory_limit = 512M
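To check at runtime what the server actually allows (for example, to warn before a too-large upload), a sketch that converts php.ini shorthand sizes (K, M, G as powers of 1024) to bytes. iniBytes and maxUploadBytes are hypothetical helpers that ignore less common forms such as hex values; on PHP 8.2+ the built-in ini_parse_quantity() covers the parsing.

```php
<?php
// Convert php.ini shorthand ("128M", "1G", "64K", "1000") to bytes.
function iniBytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value; // leading digits, e.g. "128M" -> 128
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

// Largest single upload the current settings permit.
function maxUploadBytes(): int
{
    $upload = iniBytes((string) ini_get('upload_max_filesize'));
    $post = iniBytes((string) ini_get('post_max_size'));
    // post_max_size = 0 disables the POST limit, leaving upload_max_filesize.
    return $post > 0 ? min($upload, $post) : $upload;
}
```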

Emoji

https://www.w3schools.com/charsets/ref_emoji.asp

chrisbratlien/aws-bucketeer

Folders and files

Latest commit

History

Repository files navigation

aws-bucketeer

References

Solr

CURLOPT

php.ini

Emoji

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages