In this example, we will import a plain text file into the TextRepo, verify its contents in the repo, and verify that it has been indexed in at least the full-text index of Elasticsearch.
- All TextRepo documents have an externalId. In this example, we use a fixed externalId of doc_1.
- We will register a new type txt to represent text/plain.
- We will upload a plain text document of type txt, so it will be (automatically) indexed in the full-text index.
export tr=http://localhost:8080/textrepo
curl -s $tr/rest/types | jq .
[]
curl -s -XPOST $tr/rest/types \
-H 'Content-Type: application/json' \
-d '{"name": "txt", "mimetype": "text/plain"}' \
| jq .
{
"id": 254,
"name": "txt",
"mimetype": "text/plain"
}
Note that because .id is internally assigned, it may be different in your case.
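Because the id differs per installation, a script can look it up by name instead of hard-coding 254. The following is a minimal sketch, assuming the listing at /rest/types returns an array of objects shaped like the registration response above; the helper name type_id_by_name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a JSON array of types on stdin and print the id
# of the type whose name equals $1 (prints nothing if there is no match).
type_id_by_name() {
  jq -r --arg name "$1" '.[] | select(.name == $name) | .id'
}

# Usage against a running TextRepo (not executed here):
#   curl -s "$tr/rest/types" | type_id_by_name txt
```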
echo "Hello, world!" > hw.txt
cat hw.txt
Hello, world!
- Use endpoint task/import/documents/...
- We import the file to {externalId}/{fileType}, in this case doc_1/txt.
- Because document doc_1 does not yet exist, we tell TextRepo it is OK to create a new document during this import, using the query parameter allowNewDocument=true.
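The URL shape described in this list can be captured in a small helper, which may be handy when importing more than one file. This is only a sketch: the function names import_url and import_file are hypothetical, and the base URL is the one assumed throughout this example.

```shell
#!/bin/sh
# Base URL used in this example; export tr beforehand to override it.
tr="${tr:-http://localhost:8080/textrepo}"

# Hypothetical helper: build the import URL for external id $1 and type $2.
# allowNewDocument=true lets TextRepo create the document if it is missing.
import_url() {
  printf '%s/task/import/documents/%s/%s?allowNewDocument=true\n' \
    "$tr" "$1" "$2"
}

# Hypothetical wrapper: upload file $3 as type $2 of document $1.
import_file() {
  curl -s -XPOST "$(import_url "$1" "$2")" \
    -H 'accept: application/json' \
    -F "contents=@$3"
}

# Usage against a running TextRepo (not executed here):
#   import_file doc_1 txt hw.txt | jq .
```

Quoting the URL matters in shells such as zsh, where an unquoted ? is treated as a glob character.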
curl -s -XPOST "$tr/task/import/documents/doc_1/txt?allowNewDocument=true" \
-H "accept: application/json" \
-F "contents=@hw.txt" \
| jq .
{
"indexed": true,
"newVersion": true,
"versionId": "0f7a74b2-3b1f-4462-918e-2d7d8b1a53b9",
"contentsSha": "f12b74807e1e998def3a2080b5c237a1912c9be5df8ee80ea8ad028d",
"fileId": "a46dd361-bdf3-4cf6-a050-7434524a75b2",
"typeId": 254,
"documentId": "e3f41426-2cd2-4e19-af1d-da027eb4ea9c"
}
- indexed tells us that the file was sent to Elasticsearch for indexing.
- newVersion tells us that this is a new version of the txt file for document doc_1.
- Note that versionId, fileId and documentId are internally generated UUIDs which will be different in your case.
- However, as contentsSha is a hash based on the contents of hw.txt, it should be the same.
- typeId may be different, but should still match the id from when type txt was registered above.
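Since contentsSha depends only on the file contents, it can be compared with a digest computed locally. The sketch below assumes the hash is SHA-224: the 56 hex digits in the response are consistent with that, but the algorithm is not named here, so treat it as an assumption. The helper names are hypothetical.

```shell
#!/bin/sh
# Print the SHA-224 digest of file $1 as lowercase hex.
# ASSUMPTION: contentsSha is SHA-224 (its length fits, but this is unconfirmed).
local_sha224() {
  if command -v sha224sum >/dev/null 2>&1; then
    sha224sum "$1" | cut -d' ' -f1       # GNU coreutils
  else
    shasum -a 224 "$1" | cut -d' ' -f1   # Perl shasum, e.g. on macOS
  fi
}

# Succeed if the digest of file $1 equals the expected hex string $2.
sha_matches() {
  [ "$(local_sha224 "$1")" = "$2" ]
}

# Usage, with $contents_sha holding the contentsSha from the import response:
#   sha_matches hw.txt "$contents_sha" && echo "contents match"
```

If the two values differ, the most likely explanations are a different hash algorithm or a file that differs by a trailing newline.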
curl -s "$tr/task/find/doc_1/file/contents?type=txt"
Hello, world!
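Rather than comparing the output by eye, the retrieved contents can be checked byte for byte against the local file. A minimal sketch, assuming the endpoint returns the stored bytes unchanged; the helper name same_as_file is hypothetical.

```shell
#!/bin/sh
# Succeed if the bytes on stdin are identical to the contents of file $1.
same_as_file() {
  cmp -s - "$1"
}

# Usage against a running TextRepo (not executed here):
#   curl -s "$tr/task/find/doc_1/file/contents?type=txt" | same_as_file hw.txt \
#     && echo "round trip OK"
```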
The TextRepo stack has a perimeter nginx proxy which passes /index to the Elasticsearch indexes. So to pull everything from the full-text index, we can run:
curl -s http://localhost:8080/index/full-text/_search \
-H 'Content-type: application/json' \
-d '{"query": {"match_all": {}}}' \
| jq .hits.hits
Notice how we directly visit http://localhost:8080 and don't use our shortcut ENV var $tr here, as $tr points to http://localhost:8080/textrepo and is thus proxied to textrepo instead of elasticsearch, which we need here.
[
{
"_index": "full-text",
"_type": "_doc",
"_id": "a46dd361-bdf3-4cf6-a050-7434524a75b2",
"_score": 1,
"_source": {
"contents": "Hello, world!"
}
}
]
Note that _id is equal to the fileId we saw earlier.
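The proxy routing and the _id check can be scripted as well. This is a sketch under the assumptions already made in this example (proxy on localhost:8080, index reachable under /index); the variable es_proxy_host and both helper names are hypothetical.

```shell
#!/bin/sh
# Same host as $tr, but without the /textrepo path: /index goes to Elasticsearch.
es_proxy_host="${es_proxy_host:-http://localhost:8080}"

# Hypothetical helper: build the _search URL for index $1 behind the proxy.
index_search_url() {
  printf '%s/index/%s/_search\n' "$es_proxy_host" "$1"
}

# Hypothetical helper: read a search response on stdin, print each hit's _id.
hit_ids() {
  jq -r '.hits.hits[]._id'
}

# Usage (not executed here); $file_id holds the fileId from the import response:
#   curl -s "$(index_search_url full-text)" \
#     -H 'Content-type: application/json' \
#     -d '{"query": {"match_all": {}}}' | hit_ids | grep -qx "$file_id" \
#     && echo "file is indexed"
```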
Alternatively, and assuming elasticsearch:9200 is not exposed via docker-compose.yml, we may want to run a query inside the elasticsearch container and pull out everything from the full-text index:
docker-compose -f docker-compose-dev.yml exec elasticsearch \
curl -s localhost:9200/full-text/_search \
-H 'Content-type: application/json' \
-d '{"query": {"match_all": {}}}' \
| jq .hits.hits
[
{
"_index": "full-text",
"_type": "_doc",
"_id": "a46dd361-bdf3-4cf6-a050-7434524a75b2",
"_score": 1,
"_source": {
"contents": "Hello, world!"
}
}
]
If, instead, you get a host of warnings like:
WARN[0000] The "POSTGRES_PASSWORD" variable is not set. Defaulting to a blank string.
WARN[0000] The "POSTGRES_DB" variable is not set. Defaulting to a blank string.
WARN[0000] The "POSTGRES_USER" variable is not set. Defaulting to a blank string.
WARN[0000] The "POSTGRES_PORT" variable is not set. Defaulting to a blank string.
WARN[0000] The "DOCKER_TAG" variable is not set. Defaulting to a blank string.
WARN[0000] The "FULL_TEXT_XML_SUBTYPES" variable is not set. Defaulting to a blank string.
WARN[0000] The "FULL_TEXT_TXT_SUBTYPES" variable is not set. Defaulting to a blank string.
[...]
you will probably need to do one of the following, depending on how recent your docker-compose is:
- either source docker-compose.env so that your current shell has values for all the missing ENV vars
- or use a docker-compose (or docker compose) version which supports --env-file and then pass --env-file docker-compose.env
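One detail worth keeping in mind with the first option: sourcing a file of plain KEY=value lines sets shell variables but does not export them, so child processes such as docker-compose may still not see them. The sketch below assumes docker-compose.env contains plain KEY=value lines (if it already uses export statements, a plain source is enough); the helper name load_env_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: load KEY=value lines from file $1 into the current
# shell and export them, so child processes like docker-compose see them.
load_env_file() {
  set -a        # auto-export every variable assigned from here on
  . "$1"
  set +a        # switch auto-export off again
}

# Option 1 (any docker-compose version); note the ./ so the file is found
# in the current directory even in strict POSIX shells:
#   load_env_file ./docker-compose.env
# Option 2 (versions that support --env-file):
#   docker-compose --env-file docker-compose.env -f docker-compose-dev.yml \
#     exec elasticsearch curl -s localhost:9200/full-text/_search
```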