Skip to content

Configuration for gtweb's apertium-apy and apertium-html-tools installations

Notifications You must be signed in to change notification settings

giellatekno/gtweb-apy-conf

Repository files navigation

Introduction

This repository contains the configuration for the MT interfaces at

The translation daemon/server is running under the user “apy” on the server gtweb (gtweb.uit.no), and the html pages also reside under the home directory of that user (hosted by the regular nginx server). The whole configuration is in this git repo, hosted at https://github.com/giellatekno/gtweb-apy-conf

The directory /home/apy is a git repo, with https://github.com/giellatekno/gtweb-apy-conf as the upstream.

Any change to the configuration is first checked in to git and pushed to github, then the configuration is updated at gtweb by running the update script, which also might update submodules and build them.

There are some submodules:

The apertium-apy daemon runs as a systemd service, and there’s an nginx config that makes gtweb.uit.no:2737 available under gtweb.uit.no/apy. Nginx should serve SSL signed by Lets Encrypt / certbot. All requests to apertium-apy from the static html pages go to https://gtweb.uit.no/apy, although we still allow accessing the pages through plain http since https makes the website translation break on loading external resources (see #2).

The following sections define how to do common or uncommon tasks like installing everything from scratch, installing new language pairs, or updating the code for the subrepos.

Installing new language pairs

Optionally enable it in jorgal/tolkimine

The page mt-testing runs all language pairs, but jorgal and tolkimine only run certain pairs. See the files jorgal.conf / tolkimine.conf in the directory html-tools-confs – you’ll need to edit the line ALLOWED_PAIRS if you want your new pair to appear on one of those pages. Then check in your change and push to https://github.com/giellatekno/gtweb-apy-conf (it’s also possible to just check in that change on /home/apy without pushing, if you really have to), e.g.

edit html-tools-confs/jorgal.conf
git commit -am "added sme-eus"
git push # unless you're editing in /home/apy because you can't push

Install the pair and re-build pages

To install the new language pair on gtweb, log in as a user with sudo-rights and do:

sudo yum install apertium-from-to
sudo /home/apy/update

To see what pairs and language modules are available

https://build.opensuse.org/project/show/home:TinoDidriksen:nightly

Updating submodules apertium-apy / apertium-html-tools

A push to apertium-apy or apertium-html-tools isn’t automatically pulled into the gtweb just by running /home/apy/update; you need to also check into this repo that you want to use that.

This repo specifies which commit of each submodule we want to use. If you want to use the newest commit of the giellatekno branch of apertium-html-tools and apertium-apy, you can enter this repo and do

git submodule foreach git pull origin giellatekno
git commit -m "updated apy/html-tools"
git push

(if you don’t have push-rights to https://github.com/giellatekno/gtweb-apy-conf, you could just run the first two lines under gtweb:/home/apy).

Then, as a sudo-user on gtweb, do:

sudo /home/apy/update

In case everything needs reinstalling

In case e.g. gtweb dies and you need to reinstall everything:

Tell SELinux that nginx can proxy to ports 2737 (apy) and 3000 (spellcheck):

sudo semanage port -a -t http_port_t -p tcp 2737
sudo semanage port -a -t http_port_t -p tcp 3000

Install some dependencies from CentOS/EPEL:

sudo yum install certbot python2-certbot-nginx # for https
sudo yum install git jq                        # for this repo, build-html-pages
sudo yum install python36-tornado              # for running apertium-apy
sudo yum install zip unzip                     # for divvun-apy-extract-all-zcheck
curl -Ss https://apertium.projectjj.com/rpm/install-nightly.sh | sudo bash
sudo yum update
sudo yum install giella-sme       # for running in divvun-gramcheck
sudo yum install 'apertium-sme-*  # and other language pairs for MT

And these for libdivvun:

sudo yum install centos-release-scl   # for installing devtoolset's
sudo yum install devtoolset-8-gcc-c++ # for recent C++
sudo yum install libtool automake cmake pugixml-devel libarchive-devel
sudo yum install hfst-ospell-devel

Add a new user apy (group apy) with home dir /home/apy.

sudo adduser apy

Log in as this user apy – we’ll make the home directory a git repo tracking this repo:

cd /home/apy
git init
git remote add origin https://github.com/giellatekno/gtweb-apy-conf.git
git branch --set-upstream-to=origin/master master
git pull
git checkout master
git submodule init
git submodule update --recursive

Still as user apy, compile libdivvun manually since CentOS doesn’t have pugixml-devel in non-extra repos:

git clone https://github.com/divvun/libdivvun/
cd libdivvun
scl enable devtoolset-8 bash
./autogen.sh
./configure --enable-checker --enable-cgspell --enable-xml --disable-python-bindings --prefix=/home/apy/PREFIX/divvun-gramcheck
make -j5
make test # it's ok if the run-lib test fails
make -j5 install
exit

Then log in as a user with sudo-rights, and install configuration files:

sudo /home/apy/install-and-enable-services

That script will also ensure that the yum updater, apertium-apy service and apertium-apy-restarter service are running now and on restarts of gtweb.

Then update and build apertium-apy and the apertium-html-tools pages:

sudo /home/apy/update

PDF translation support

(Currently gtweb is using an old svn checkout in ~apy/apertium-apy/corpustools. This section needs updating to use Tino’s package, but apparently that’s only available for newer operating systems.)

apertium-apy uses CorpusTools if available. We need to ensure it’s possible to run /usr/bin/pdftohtml and to do from corpustools import pdfconverter from the apy directory. To install CorpusTools:

sudo yum install python3-corpustools

Check that you have all dependencies by logging in as user apy and

cd apertium-apy
python3 -c 'from corpustools import pdfconverter; print(pdfconverter.convert2intermediate.__doc__)'

This should print “Convert a pdf document to Giella xml format.” etc.

apertium-apy will detect if corpustools and pdftohtml are available.

Details

All the relevant configuration files for the gtweb machine are under the etc folder of this repo, so we know what configs are relevant in case we need to reinstall everything.

Language pairs are those that are installed with yum install (ExecStart in etc/systemd/system/apy.service gives the path to the modes files), but individual html configurations can specify a subset of pairs to run (see Installing new language pairs).

We expect a standard nginx httpd running; see configs in etc/nginx/default.d/.

The file etc/systemd/system/apy.service says how to run the apertium-apy MT daemon, which is started on restart of the gtweb machine.

About

Configuration for gtweb's apertium-apy and apertium-html-tools installations

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages