This repository contains the software development guidelines for LVL.
Center for Analysis and Design of Intelligent Agents, Language and Voice Lab
Reykjavik University
These guidelines should help us achieve the following goals:
- Make our lives easier by not having to reinvent the wheel in each of our individual projects, and by collecting good resources and guides which we are all aware of.
- Help us write quality software by following good software development procedures.
- Enable collaboration by making it a part of our workflow.
The guide can be summarized as follows:
- Have a good `README.md`, see our template.
- Provide a `License`.
- Write documentation within the code so that you can automatically generate it later (if you need to).
- Use `git` for semantic versioning.
- Test your code, preferably automatically.
- Set up "GitHub Actions" to run testing and linting.
- Check out our examples.
- Compute for information about the cluster (called Terra) at LVL.
- SÍM guidelines for writing and delivering software.
- Icelandic NLP resources to get an overview of existing Icelandic resources.
- Máltækni fyrir íslensku - (in Icelandic) overview of language resources from the Language Technology for Icelandic 2019-2023
The SÍM guidelines define the deliverable types APP, MOD, ADD-ON, WEB and RES.
These types are quite abstract and LVL does not deliver all of them. Because of this, we further break down these deliverables to offer more concrete guidelines:
- UI (webpage, mobile, browser-plugin) = WEB, APP or ADD-ON
- Server (RESTful webserver) = WEB
- Library (Python package) = MOD
- Model (trained model, runnable trained model) = RES, APP or MOD
- Command line client (Python package, bash) = APP or MOD
Translating from the SÍM requirements to these deliverables will need to be discussed with a SÍM project manager to answer the question "what do they expect?".
To contribute to this project, please submit a pull request to the master branch and request a review from someone in the software guidelines team. If you have a lot of suggestions, feel free to make multiple requests.
- This guide should be short and simple and mention what is required or optional
- Offer examples
- Add links to good resources
- Gunnar Thor Örnólfsson gunnaro@ru.is
- Judy Yum Fong judyfong@ru.is
- Safa Jemai
- Staffan J. S. Hedström staffanh@ru.is
- Smári Freyr Guðmundsson smarig@ru.is
- Þorsteinn Daði Gunnarsson thorsteinng@ru.is
Word | Meaning |
---|---|
ADD-ON | a plugin to a larger framework |
APP | stand-alone application |
CD | Continuous Deployment / Continuous Delivery |
CI | Continuous Integration |
MOD | a module which can be embedded into other applications |
RES | language resource |
IDE | Integrated development environment |
Try to maintain a standardized project structure across your projects, as they always need to contain certain things. We suggest the following:
README.md # See more information below
LICENSE # See license section
docs/ # Contains automatically generated documentation in HTML.
Other scripts such as .py and .sh files should be in the root folder.
A `README.md` template can be found here.
All of the projects we work on need to have licenses. A single LICENSE file, mentioned and linked from the README, suffices. Adding a license header in every single file is optional.
If a project does not contain a license, default copyright applies, which means that no-one is allowed to make derivative works from your code.
We want our code to be freely available for everyone, so we prefer permissive licenses such as:
- Apache 2.0 (quite permissive)
- MIT License (very permissive)
- CC BY 4.0 (for resources)
For more help choosing a license, see Choosing a license. Keep in mind that if you are working through SÍM you have agreed to use open licenses such as these. Furthermore, according to RU, the university is the copyright owner.
When a license has been chosen, simply copy the license text, fill in any additional information required (like copyright owner and year) and write it to a LICENSE file in the repository.
We all know that documentation is key if software is supposed to be usable.
Writing documentation can be hard, but we have some suggestions to make it easier. Keep in mind that a good `#Running` section in the `README.md` is complementary to the documentation, i.e. you need to write both.
- The `#Running` section handles common use cases. This can be seen as a small user guide.
- Write the documentation for functions and classes in your code using your language's conventions, since all (common) languages have tools which can extract this documentation. This documentation is more detailed than the `#Running` section but can also contain the same examples.
- For existing repositories, follow the coding conventions that are already in the repository.
- Use easy-to-understand variable and function names, that is, avoid ambiguous names.
- Settle on a single format for the documentation. This makes automatic documentation generation possible.
- When you have settled on a format, find a tool which supports that format and can generate HTML, and place the generated HTML in the `docs/` folder.
- Write documentation in English, unless you have a good reason not to do so.
- Be sure to define accepted inputs and return values.
We suggest the following format and tool for Python:
- Format: Google. It's easily readable in code, widely supported and not verbose.
- Tool: `Sphinx` with `napoleon`
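
As a minimal sketch of what a Google-style docstring looks like, consider the function below. The function `tokenize` and its parameters are hypothetical and only serve as an illustration. With Sphinx, docstrings in this style are typically picked up by enabling the `sphinx.ext.napoleon` extension (together with `sphinx.ext.autodoc`) in `conf.py`.

```python
from typing import List


def tokenize(text: str, lowercase: bool = True) -> List[str]:
    """Split a sentence into whitespace-separated tokens.

    Args:
        text: The sentence to split.
        lowercase: Whether to lowercase the text before splitting.

    Returns:
        The tokens in their original order. An empty or whitespace-only
        input yields an empty list.

    Raises:
        TypeError: If ``text`` is not a string.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if lowercase:
        text = text.lower()
    # str.split() without arguments splits on any run of whitespace.
    return text.split()
```

Note how the docstring defines the accepted inputs, the return value and the error case, which covers the points in the list above.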
Once you have generated HTML pages using your documentation tool, the documentation can be hosted on GitHub using GitHub Pages. This allows us to host a website directory from our code repositories, so we avoid having to host the documentation on some remote server.
For larger projects, a developer guide can be helpful for newcomers. Such a guide contains installation instructions, contribution guidelines and any other information that would help a developer use and modify the codebase faster.
User guides are (generally) not as technical as documentation and are intended for the end user. For larger projects which require a user guide, the guide should be in the language of the user. These guides should contain a lot of examples and lead the user through common use cases. For example:
- This could be implemented as a tutorial within a webpage with popups that guide the user through the UI.
- This could be a video guide which goes through the same steps as above.
- This could be implemented as a Wiki on GitHub.
- This could be implemented as a long `#Running` section.
- If working with the command line, your program should support the `--help / -h` options, which should print some text explaining what the script does, examples and a description of the parameters. For Python, argparse and click make it easy to get this functionality.
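
The last point can be sketched with `argparse` from the Python standard library, which adds `-h` / `--help` automatically. The program name `normalize`, its argument and its `--keep-case` flag are made up for this illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create the parser; argparse adds -h/--help on its own."""
    parser = argparse.ArgumentParser(
        prog="normalize",  # hypothetical program name
        description="Lowercase the given text and collapse repeated whitespace.",
        epilog='Example: normalize "Halló   Heimur" --keep-case',
    )
    parser.add_argument("text", help="the text to normalize")
    parser.add_argument(
        "--keep-case",
        action="store_true",
        help="do not lowercase the text",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the arguments (sys.argv when argv is None) and return the result."""
    args = build_parser().parse_args(argv)
    text = args.text if args.keep_case else args.text.lower()
    return " ".join(text.split())
```

Calling `main(["--help"])` prints the usage line, the description, every parameter with its help text and the example from `epilog`, and then exits. A real script would typically call `print(main())` under an `if __name__ == "__main__":` guard.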
Within Cadia-LVL, using git for version control is required. A clean repository with descriptive comments gives a good representation of your project, which makes it easier for new developers to join. The following describes one way to maintain such a project.
We use the GitHub flow workflow. We further clarify how we use this workflow in the next sections.
All merges to the master branch should include semantic version tags as listed below.
Here is a short guide
In short, the master branch should always contain production ready code. To achieve this, create "feature branches" from the master branch. In the feature branch you develop your changes. When the work is done, create/open a "Pull Request" in GitHub.
When you have finished working on the feature branch you should create a pull request (PR) to the master branch. Assign someone other than yourself to review the pull request (the code). If you request more than one reviewer, the default standard at Cadia-LVL is that only one person must review and approve the PR. The reviewer is responsible for making sure that everything is (within limits) tested and documented. If the reviewer has issues with the pull request (very common), the reviewer requests changes and repeats this process until they are satisfied. When there are no more issues, the reviewer approves the pull request and merges it into the master branch.
To clearly state the benefits of this, if enforced:
- No-one works in isolation, more people understand the project.
- Code quality generally increases.
- More tests are written.
- The project is well documented.
For more information about pull requests see here
Every commit message should include a short description of the work being committed. Pull request comments should be more detailed.
Version tags can be informative, especially to current users. Given a version number v[major].[minor].[patch], increments represent the following:
- Major: Incompatible API changes, when you break something
- Minor: Added backwards compatible functionality, when you add features
- Patch: Backwards compatible fixes, when you fix bugs
- Read more about semantic versioning and how to use tags to apply semantic versioning.
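
The increment rules above can be sketched as a small helper that computes the next tag. This is only an illustration under the assumption that tags follow the `v[major].[minor].[patch]` pattern described above; the function names are hypothetical. Resetting the lower components to zero follows the semantic versioning specification.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Turn a tag such as 'v1.4.2' into the tuple (1, 4, 2)."""
    number = tag[1:] if tag.startswith("v") else tag
    major, minor, patch = number.split(".")
    return int(major), int(minor), int(patch)


def bump(tag: str, change: str) -> str:
    """Return the next tag for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(tag)
    if change == "major":  # incompatible API changes
        return f"v{major + 1}.0.0"
    if change == "minor":  # backwards compatible functionality
        return f"v{major}.{minor + 1}.0"
    if change == "patch":  # backwards compatible fixes
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, a bug fix on top of `v1.4.2` gives `bump("v1.4.2", "patch")`, which returns `"v1.4.3"`, while a breaking change gives `"v2.0.0"`.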
- Developers and maintainers of a project should always Watch their repos to be notified of all issues and pull requests created.
How do you know that your code works? No-one writes code that is free of bugs, and even if you could, testing the code might provide useful feedback (design, usability, architecture, etc.). We suggest testing your code often and thoroughly. Since doing that "manually" can be dull and cumbersome, we further suggest making these tests automatic.
Unit tests are conceptually the smallest tests possible. They can be thought of as "function tests", i.e. testing whether a function works. All projects should attempt to do as much unit testing as possible. At minimum, aim to have tests for mission-critical algorithms and functions. Most programming languages have a unit testing framework, so look up the one for your language.
- Python: pytest
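
A minimal sketch of a unit test in the style pytest expects is shown below. The function under test, `normalize_whitespace`, is a made-up example. pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`, and a plain `assert` is enough to express the expectation.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def test_collapses_repeated_spaces():
    assert normalize_whitespace("góðan   daginn") == "góðan daginn"


def test_strips_leading_and_trailing_whitespace():
    assert normalize_whitespace("  halló \n") == "halló"


def test_empty_input_stays_empty():
    assert normalize_whitespace("") == ""
```

In a real project the function would live in the package and the tests in a separate file, and running `pytest` in the repository root would execute them.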
A huge part of the web is portability and accessibility, so making sure your website works on all major devices and platforms is very important.
When testing your website, keep in mind the demographic that will be using the site, which devices they use (mobile vs. desktop), and which browsers. Preferably test each deployment on these devices and browsers. You can see browser usage statistics here to decide which browsers to test.
Major browsers on desktop are:
- Chrome
- Firefox
- Safari
- Edge
Major browsers for mobile are:
- Chrome
- Safari
- Samsung Internet
Caniuse.com is helpful if you are unsure about browser support for specific features.
If a particular browser is needed for some specific and clear reason, make sure this is clearly stated on the page. You can use the User-Agent HTTP header to display warnings to users of unsupported browsers.
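
As a rough sketch of the User-Agent approach, the server can look for well-known substrings in the header value and produce a warning when the browser is not in the supported set. The function names and the warning text are hypothetical, and matching on substrings is only a heuristic: User-Agent strings can be changed by the client and their format differs between platforms.

```python
from typing import Iterable, Optional

# Checked in order: the User-Agent of Edge and Samsung Internet also contains
# "Chrome" and "Safari", and Chrome's contains "Safari", so the more specific
# tokens must come first.
_BROWSER_TOKENS = (
    ("Edg/", "Edge"),
    ("SamsungBrowser", "Samsung Internet"),
    ("Firefox", "Firefox"),
    ("Chrome", "Chrome"),
    ("Safari", "Safari"),
)


def detect_browser(user_agent: str) -> Optional[str]:
    """Return a browser name guessed from the User-Agent, or None."""
    for token, name in _BROWSER_TOKENS:
        if token in user_agent:
            return name
    return None


def unsupported_browser_warning(
    user_agent: str, supported: Iterable[str]
) -> Optional[str]:
    """Return a warning text, or None when the browser is supported."""
    names = sorted(set(supported))
    browser = detect_browser(user_agent)
    if browser in names:
        return None
    shown = browser or "your browser"
    return (
        f"This site has not been tested with {shown}. "
        f"Supported browsers: {', '.join(names)}."
    )
```

A web framework would pass the value of the request's `User-Agent` header to `unsupported_browser_warning` and render the returned text as a banner when it is not `None`.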
Keep two entirely separate deployments up and running, staging and production. Deploy all changes first to the staging environment (this can be done automatically).
Preferably have multiple users test all major changes on the staging deployment before deploying to production. This part could be done automatically, but automated tests are often time-consuming to write and difficult to keep up to date. Suggested tools for automated testing: Selenium with Python and Cucumber.
User testing is a great way to see how the users interact with your product. However, user testing is difficult to do well. When doing user tests keep the following points in mind.
- User tests should only be used if you have planned time to make adjustments
  - There is no reason to spend time user testing if the results won't be used to make adjustments and improve the product.
- User testing should focus on a specific task
  - Focusing on a specific scope or task helps to keep the user focused on what is important. This does not mean the same user cannot test different tasks, only that each task you give the users should be clearly defined.
- Up to 4 people for tests
  - Asking too many users to do the same tests will only result in more of the same responses. Every user's experience matters and should be taken into account. If you have more users available, consider testing different things or doing another round once adjustments from the first round have been made.
- Testing prototypes often gives no useful information
  - Users will most likely point out things that you know are missing and are less likely to give the feedback you are looking for. This can be reduced somewhat by targeting very specific tasks early in the development, see the point about specific tasks.
When writing code, additional tools can help you find bugs, make your code look consistent and generally help you write better code.
Linters perform static analysis on code without running it. They alert you to possible errors, missing documentation or overly complex code without you having to write tests. They are easily incorporated into the continuous integration system and should be run on "Push" to notify of errors. Using (some) linters is required in all projects. Adding linters to your IDE is very easy, and they can be configured to run on file save.
For Python we suggest the following tools for all projects.
- Common errors: `flake8`
- Type checks: `mypy`
- Documentation checker: `pydocstyle`
For other languages we suggest:
- JavaScript, best practices and errors: `eslint`
- Shell scripts, best practices and errors: `shellcheck`
- CSS, best practices and errors: `stylelint`
Maintaining the same style across a project makes the code more readable. We suggest using a tool which automatically formats/styles your code. This eliminates inconsistent styles in the code and allows you to focus on the content rather than the format.
For Python, we highly suggest `black`.
Continuous integration is a way to build software (compile), automatically run tests, lint, check documentation and alert developers when something is wrong. We suggest running all these steps automatically in the CI system. We suggest using GitHub Actions as a CI system for all your projects.
Simply navigate to your repository and click Actions, to the right of the Pull requests button. Here you can choose to use workflows from others or click "set up a workflow yourself" to create your own. This defines the steps you would like to perform when you push changes.
Your workflow should run at minimum each time a push or a pull request is made to the master branch. It is also recommended to run a scheduled workflow once a day, in case a dependency of your project is updated and breaks your code.
An example workflow can be found here
- Language and framework guidelines
- Packaging on GitHub (optional)
- Workflow triggers
- Further on triggering workflows.
- Further on jobs.
When a project is to be released, try to:
- Package it so that it can be easily used by other people.
- Make sure that it includes all the main parts mentioned in the 2.2. TLDR section.
- Make sure that all changes have been committed and a proper README file is in place so that the release process can be repeated by someone else.
All deliverables should be referenced with a git tag (e.g. a version). This is done so that the code of a deliverable can be easily reviewed and recreated.
- For example, for milestone 4, add the `m4` tag. Here's an example for the samromur-asr repo.
We suggest using `poetry` for dependency management, building and packaging.
For more complex deliverables (which have many dependencies) we also recommend packaging the deliverable using Docker.
- An example of good documentation is provided: kaldi-asr
- Web Technologies reference
- bash help usage example
- Better Programming has a short list of Bash Best Practices
- Explains the Kaldi folder structure: kaldi-for-dummies
CC BY 4.0