Hiring challenge for MLE (Machine Learning Engineer), MLD (Machine Learning Software Developer), and MLE internship applicants at Radix.

The goal of this challenge is to build a Machine Learning model that predicts the rating a user will give to a movie. Your solution will be evaluated on the performance of your Machine Learning model and on the quality of your code.
To succeed, you must implement a Python package called `challenge`, which exposes an API with two endpoints. Calling the first endpoint (`/ratings/train`) should create and train your model on the data provided. Calling the second endpoint (`/ratings/predict`) should use that model to predict ratings based on the user and the movie.
You solve the challenge when you reach a final score of:
- MLE applicants: Total score of 61% or more
- MLD applicants: Total score of 58% or more, with a code quality score of at least 95%.
- MLE internship applicants: Total score of 58% or more.
At a high level, the challenge is divided into five tasks, each with a corresponding estimated effort:

- Read the documentation (15 min)
- Set up the environment (15 min)
- Create and train your model (4 h)
- Set up the `/ratings/predict` endpoint to create predictions using your trained model (1 h)
- Improve code quality (30 min)
You will be provided with 3 datasets:

- `df_train`: each row in this dataset contains the id of a user, the id of a movie, and the rating that the user gave to the movie.
- `df_test`: this dataset has only the id of a user and the id of a movie for which the model will have to predict the rating.
- `df_movies`: this dataset contains a set of features for every movie in the two other datasets.

All the users in `df_test` are also present in `df_train`; however, all the movies in `df_test` are new movies that do not appear in `df_train` (they are still present in `df_movies`).
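This cold-start structure (known users, unseen movies) is worth verifying before modelling. Below is a minimal sanity check; the file name `df_movies.csv` and the exact column names are assumptions based on this document, so adjust them to match the files that `./run.sh` actually downloads.

```python
import pandas as pd

# File and column names are assumptions based on this document.
df_train = pd.read_csv("df_train.csv")
df_test = pd.read_csv("df_test.csv")
df_movies = pd.read_csv("df_movies.csv")

# Every test user was seen during training...
assert set(df_test["userId"]) <= set(df_train["userId"])
# ...but no test movie was (a movie cold-start problem)...
assert set(df_test["movieId"]).isdisjoint(set(df_train["movieId"]))
# ...so movie features must come from df_movies.
assert set(df_test["movieId"]) <= set(df_movies["movieId"])
```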
🌟 IMPORTANT 🌟 You can use any feature inside `df_movies` in your model, but you are REQUIRED to use at least the `overview` of the movie, for which you will have to get embeddings using the `distilbert-base-uncased` transformer from the https://huggingface.co/docs/transformers/index library.
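A minimal sketch of one way to obtain those embeddings, assuming `torch` and `transformers` are installed; the mean-pooling strategy is a common choice, not a requirement of the challenge:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

def embed_overviews(overviews: list[str]) -> torch.Tensor:
    """Return one 768-dimensional embedding per overview (mean-pooled)."""
    inputs = tokenizer(overviews, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)   # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

The resulting 768-dimensional vector per overview can then be fed into any downstream regressor of your choice.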
- Create a new repository:
  - Name your repository `{your-name}-radix-challenge`.
  - Make sure the repository is **Private**.
- After you have created the repository:
  - Go to `Settings > Collaborators and teams > Add people` and add `RadixChallenge` (challenge@radix.ai) with `Read` permissions so we can follow along with your progress.
  - Clone the repository onto your machine.
- Once you have the repository locally:
  - Download the hiring challenge as a ZIP file and unpack it in your cloned repository.
  - Push the unzipped folder to GitHub to check that everything works.
Windows users: Please be aware that this challenge relies on bash scripts that cannot run natively on Windows. However, you can run both the `./init.sh` and `./run.sh` scripts on Windows using WSL2.

All users:

- Initialise the environment by running `./init.sh`. This will create a virtual environment `.env`.
- To activate this environment, run `source .env/bin/activate`.
- Check that everything works properly by running `./run.sh`. This script should halt when calling the training endpoint, since this endpoint is not yet fully implemented.
To solve this challenge, you are going to implement a Python package called `challenge` that exposes an API. This API must be implemented using FastAPI and should expose two different endpoints (see the sketch after this list):

- A training endpoint at `localhost:9876/ratings/train`, to which you can POST a CSV with header `userId,rating,movieId`, where `rating` is a number from 1 to 5.
- A prediction endpoint at `localhost:9876/ratings/predict`, to which you can POST a CSV with header `userId,movieId`. Your model, trained by the previous endpoint, should use this data to predict movie ratings. Once finished, this endpoint should return a dictionary in the following format: `{0: rating-of-first-test-example, 1: rating-of-second-test-example, ...}`.
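A minimal sketch of this API, assuming the raw CSV arrives as the request body (check the provided scaffolding for the exact contract); the global-mean "model" is only a placeholder baseline, not a suggested solution:

```python
import io

import pandas as pd
from fastapi import FastAPI, Request

app = FastAPI()
STATE: dict = {}  # in-process stand-in for real model persistence

@app.post("/ratings/train")
async def train(request: Request) -> dict:
    """Fit a model on the POSTed userId,rating,movieId CSV."""
    df = pd.read_csv(io.BytesIO(await request.body()))
    STATE["mean_rating"] = float(df["rating"].mean())  # placeholder model
    return {"status": "trained"}

@app.post("/ratings/predict")
async def predict(request: Request) -> dict[int, float]:
    """Predict one rating per row of the POSTed userId,movieId CSV."""
    df = pd.read_csv(io.BytesIO(await request.body()))
    return {i: STATE["mean_rating"] for i in range(len(df))}
```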
The data used to train your solution will be downloaded automatically when running `./run.sh`.
Here is an extensive list of the tasks you need to complete in order to solve the challenge:

- Run `init.sh` to create a virtual environment in which the code can run.
- In the `/ratings/train` endpoint:
  - Create a model
  - Train the model on the received data
  - Save the model (see the persistence sketch after this list)
- In the `/ratings/predict` endpoint:
  - Create the endpoint
  - Load the previously trained model
  - Make predictions (ranked) on the received data
  - Return your predictions in the dictionary format specified above
- Run `run.sh` to evaluate your implementation.
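For the save and load steps, one simple approach is to serialise the fitted model to disk inside the `challenge/` folder. A sketch assuming `joblib` is available and that `challenge/model.joblib` is an acceptable (hypothetical) location:

```python
from pathlib import Path

import joblib

# Hypothetical path; any location inside challenge/ should satisfy the rules.
MODEL_PATH = Path("challenge") / "model.joblib"

def save_model(model) -> None:
    """Persist the fitted model at the end of /ratings/train."""
    joblib.dump(model, MODEL_PATH)

def load_model():
    """Reload the fitted model at the start of /ratings/predict."""
    return joblib.load(MODEL_PATH)
```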
You should implement your code in the `challenge/` folder; you are not allowed to add any files outside this folder. Within it, you are free to add new files, but please don't remove any. If you want to use dependencies that are not yet supported by this package, you can add these to `environment.yml`. However, please don't remove any of the pre-existing dependencies, since this might break the code.
You are not allowed to change the bash files (`init.sh` and `run.sh`) or the `setup.cfg` file, since these are used to evaluate your solution. The last section addresses these files in more detail; however, you are not required to understand these scripts in order to solve the challenge.
Every time you run `./run.sh`, your solution runs and is evaluated. The script will:
1. Download the `df_train.csv` and `df_test.csv` datasets.
2. Start your FastAPI server on port 9876.
3. POST `df_train.csv` to `localhost:9876/ratings/train` to train your model.
4. POST `df_test.csv` to `localhost:9876/ratings/predict` to create a `submission.json` with the predicted rating for each user-movie pair.
5. Stop your FastAPI server once complete, or when either training or evaluation fails.
6. Compute a score that indicates the quality of your code.
7. Upload `submission.json` to our evaluation endpoint to get a score on your predictions.
8. Geometrically combine both of your scores: the code quality score (6) and the predictive score (7).
9. Ask for your git username and email address, if not yet configured.
10. Print your final score and send the results to us for validation.
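`./run.sh` performs all of these steps for you, but while developing it can help to exercise the endpoints by hand. A sketch assuming the server is already running on port 9876, the `requests` package is installed, and the endpoints read the raw request body as in the sketch above:

```python
import json

import requests

BASE = "http://localhost:9876"

# Train on the full training set.
with open("df_train.csv", "rb") as f:
    requests.post(f"{BASE}/ratings/train", data=f.read()).raise_for_status()

# Request predictions and store them the way run.sh stores submission.json.
with open("df_test.csv", "rb") as f:
    response = requests.post(f"{BASE}/ratings/predict", data=f.read())
response.raise_for_status()
with open("submission.json", "w") as f:
    json.dump(response.json(), f)
```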
Your solution will be evaluated on the performance of your Machine Learning model and on the quality of your code. Once you achieve the target score (see Introduction), one of our engineers will review your code. Based on that review, we will set up an interview with you. We will evaluate your final commit to your repository, so please make sure it runs properly.
The final score is the geometric mean of two components:

- Your predictive score, evaluated using the Mean Squared Error (MSE). The MSE is clipped between 0.5 and 2 and converted to a percentage; the exact formula is `max(0, 1 - min(mse - 0.5, 1.5) / 1.5)`.
- Your code quality score, which is the geometric mean of:
  - Whether you added files outside the `./challenge` folder: `0%` if you did, `100%` otherwise
  - A percentage score based on `flake8`
  - A percentage score based on `isort`
  - A percentage score based on `pydocstyle`
  - A percentage score based on `mypy`
  - A percentage score based on the actual number of lines of code
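To make the scoring concrete, here is a small sketch of the formulas above (the code quality score is taken as a given input):

```python
import math

def predictive_score(mse: float) -> float:
    """Convert an MSE to a percentage, per the formula above."""
    return max(0.0, 1.0 - min(mse - 0.5, 1.5) / 1.5)

def final_score(mse: float, code_quality: float) -> float:
    """Geometric mean of the predictive and code quality scores."""
    return math.sqrt(predictive_score(mse) * code_quality)

# For example, an MSE of 0.8 with a 95% code quality score:
print(f"{final_score(0.8, 0.95):.1%}")  # 87.2%
```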
We would love to help you with the challenge, but unfortunately we can't. 😉 That being said, if you find a bug or have trouble setting up your environment, we're happy to help you at challenge@radix.ai!