Splitting database and website #24

Open
GiovanniBussi opened this issue Apr 17, 2019 · 3 comments

Comments

@GiovanniBussi
Contributor

I think it is a bit annoying to update the website in the way it works now. I would like to restructure it as follows:

  • One repository (say plumed-nest/eggs-database) could contain only what's currently in the eggs19 directory as well as a single yml file corresponding to the current _data/eggs.yml file. This repo would be the one where the script pushes all the updates to the database and will be automatically generated.
  • Another repo (say plumed-nest/plumed-nest.github.io) could contain only the website information (everything else in the current plumed-nest/plumed-nest) plus a git submodule eggs-database corresponding to the repository mentioned above. In addition, a file _data/eggs.yml could be a symbolic link to eggs-database/eggs.yml. This repository will be edited by us maintainers manually.
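The symbolic-link part of this layout could look roughly like the sketch below. It assumes the submodule is checked out as `eggs-database/` directly under the website repository root (for instance after a `git submodule add`); the function name `link_eggs_yml` is hypothetical and not part of the current scripts.

```python
import os
from pathlib import Path


def link_eggs_yml(site_root):
    """Make _data/eggs.yml a relative symlink to eggs-database/eggs.yml.

    Assumption: the eggs-database submodule sits directly under site_root.
    """
    site_root = Path(site_root)
    data_dir = site_root / "_data"
    data_dir.mkdir(exist_ok=True)
    link = data_dir / "eggs.yml"
    # Replace a previous regular file or stale link, if any.
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target keeps the link valid wherever the repository is cloned.
    os.symlink(os.path.join("..", "eggs-database", "eggs.yml"), link)
    return link
```

Using a relative target means the link keeps working in any clone, as long as the submodule has been initialized.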

The repository plumed-nest would still run nest.py and push to plumed-nest/eggs-database (as it does now). Whenever we commit on plumed-nest, the script should do the following:

  • recreate the database and, if everything is ok, push to eggs-database. The push would be done maintaining history
  • open a pull request on plumed-nest/plumed-nest.github.io asking to merge the submodule update.
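The two steps above might be sketched as follows. This is only an illustration under stated assumptions: the branch names, the commit message, and the helper names (`database_push_commands`, `pull_request_payload`, `run_all`) are hypothetical, and the pull request is assumed to be opened through GitHub's REST endpoint for creating pull requests (`POST /repos/{owner}/{repo}/pulls`), with authentication left out.

```python
import subprocess


def database_push_commands(db_dir, message, branch="master"):
    """Git commands that add one commit on top of the existing history."""
    return [
        ["git", "-C", db_dir, "add", "--all"],
        ["git", "-C", db_dir, "commit", "-m", message],
        # A plain (non-forced) push keeps all previous commits in place.
        ["git", "-C", db_dir, "push", "origin", branch],
    ]


def pull_request_payload(head_branch, base="master"):
    """JSON body for a request that opens a pull request on the website repo.

    The branch names are assumptions; head_branch would contain the commit
    that moves the eggs-database submodule to the new revision.
    """
    return {
        "title": "Update eggs-database submodule",
        "head": head_branch,
        "base": base,
        "body": "Automatic submodule update after rebuilding the database.",
    }


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run_all(database_push_commands("eggs-database", "Rebuild database"))` only prints the three git commands, which makes it easy to check the plan before enabling it in the build.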

This change will improve the workflow in several ways:

  1. We will be able to fix the website without regenerating the database every time.
  2. We will be able to double check updates to the database before they end up in the real page. Since the tables in the eggs directories are done with markdown, we can also check them from github directly (not rendered). After checking, one can merge the pull request on the website with a click.
  3. We will be able to revert the database to previous versions (by using an older submodule in the website) if there are issues in one build.

There is no urgency, but I will have a look at this when I have time, so I am opening the issue in case anyone wants to comment.

@carlocamilloni
Member

I agree it would make sense

@maxbonomi
Contributor

I also agree with this!

@GiovanniBussi
Contributor Author

Related to this, I thought we should make the construction of eggs parallelizable. This could be done in the following way:

  • Each time an egg is processed we only create a zip file (say eggs/19/003.zip). This file contains a portion of the final _data/eggs.yml file plus all the generated md files.
  • Once all eggs are done we concatenate the individual yml files and push to the website.
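A rough sketch of the combining step is given below. It assumes that each per-egg archive stores its piece of the yml under one fixed member name (`eggs.yml` here, a hypothetical choice) and that each piece is a block of top-level list items, so that plain text concatenation gives a valid list. The function name `merge_egg_archives` is also hypothetical.

```python
import zipfile
from pathlib import Path


def merge_egg_archives(zip_paths, out_dir, fragment_name="eggs.yml"):
    """Unpack the per-egg archives and concatenate their yml fragments.

    Assumption: every archive holds one fragment called fragment_name plus
    the generated md files, stored with their final relative paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fragments = []
    for zip_path in sorted(zip_paths):  # sorted for a deterministic result
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                if name == fragment_name:
                    text = archive.read(name).decode("utf-8")
                    # Make sure fragments do not run into each other.
                    fragments.append(text if text.endswith("\n") else text + "\n")
                else:
                    archive.extract(name, out_dir)  # generated md files
    merged = out_dir / "eggs.yml"
    merged.write_text("".join(fragments), encoding="utf-8")
    return merged
```

Because the fragments are joined as text, no yml parser is needed; if the fragments were ever nested differently, they would have to be parsed and merged instead.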

If processing becomes slow, we could parallelize the generation using multiple travis jobs (each job processes some of the eggs), provided we find an easy way to combine the resulting files. This can probably be done by taking a hash of, say, plumed-nest version, plumed version, and nest.yml file (in principle, given these three things, the build is reproducible) and saving the resulting zip files somewhere (even on a temporary github repository where plumedbot has write permission).
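One possible way to compute such a hash is sketched below. SHA-256 and the function name `build_key` are my assumptions, not something already decided; any stable hash would do. The key could then be used as the name under which a job stores the zip file for one egg.

```python
import hashlib


def build_key(nest_version, plumed_version, nest_yml_bytes):
    """Return a hex digest identifying one egg build.

    Inputs: the plumed-nest version and plumed version as strings, and the
    raw content of the egg's nest.yml file as bytes.
    """
    digest = hashlib.sha256()
    parts = (
        nest_version.encode("utf-8"),
        plumed_version.encode("utf-8"),
        nest_yml_bytes,
    )
    for part in parts:
        # Prefix each part with its length so that ("ab", "c") and
        # ("a", "bc") do not produce the same digest.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()
```

If two jobs compute the same key, the stored zip file can be reused instead of processing the egg again.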
