This project focuses on building a predictive model to forecast the number of goals scored by each team in a football match. The model incorporates various data sources, including player ratings and team information, to create a robust prediction system. The process involves data collection, preprocessing, feature engineering, exploratory data analysis (EDA), model training, and deployment.
- Data Collection
- Preprocessing
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Model Training
- API Development with Flask
- Dockerization
- Deployment on Azure
Data was collected using Scrapy from Transfermarkt.
The following datasets were collected:
- clubs_team_players_v1.json
- national_team_players_v1.json
- matches_v1.json
- players_rating_v1.csv
The preprocessing stage involves reading the collected data, normalizing team and player names, and mapping players to their respective teams. This step ensures consistency and prepares the data for further analysis.
graph TD
A[clubs_team_players_v1.json] --> B[team_to_players mapping]
C[national_team_players_v1.json] --> B
B --> D[matches_v1.json]
D --> E[matches_with_players.json]
F[players_rating_v1.csv] --> G[name to rating mapping]
E --> H[football_matches_dataset_v1.json]
G --> H
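The normalization and mapping steps in this flow can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the record fields (`team`, `players`, `home_team`, `away_team`), the helper names, and the sample records are all assumptions, since the JSON schemas are not shown here.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so names compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def build_team_to_players(team_records):
    """Map each normalized team name to its list of normalized player names."""
    mapping = {}
    for record in team_records:
        team = normalize_name(record["team"])
        players = [normalize_name(p) for p in record["players"]]
        mapping.setdefault(team, []).extend(players)
    return mapping


def attach_players(matches, team_to_players):
    """Return copies of the match records with home/away player lists added."""
    enriched = []
    for match in matches:
        home = normalize_name(match["home_team"])
        away = normalize_name(match["away_team"])
        enriched.append({
            **match,
            "home_players": team_to_players.get(home, []),
            "away_players": team_to_players.get(away, []),
        })
    return enriched


# Hypothetical sample records standing in for the scraped files.
teams = [
    {"team": "AC Milan", "players": ["João Silva", "Émile Dupont"]},
    {"team": "Atalanta BC", "players": ["Luca Rossi"]},
]
matches = [{"home_team": "Ac  Milan", "away_team": "atalanta bc"}]

team_to_players = build_team_to_players(teams)
print(attach_players(matches, team_to_players)[0]["home_players"])
```

Normalizing both sides of the lookup means that spelling variants such as "AC Milan" and "Ac  Milan" resolve to the same key.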
The data cleaning stage handles missing values to preserve the integrity of the data. Missing player ratings are imputed with the mean rating.
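Mean imputation of player ratings could look roughly like the sketch below. It assumes the fallback is the mean over all known ratings (the text does not say whether a global or per-team mean is used), and the function names and sample ratings are hypothetical.

```python
from statistics import mean


def impute_ratings(players, name_to_rating):
    """Return one rating per player, using the global mean where a rating is missing."""
    fallback = mean(name_to_rating.values())
    return [name_to_rating.get(player, fallback) for player in players]


def average_rating(players, name_to_rating):
    """Average team rating after imputation; falls back to the global mean for an empty squad."""
    if not players:
        return mean(name_to_rating.values())
    return mean(impute_ratings(players, name_to_rating))


# Hypothetical ratings: "player c" has no entry and receives the mean of the known ratings.
name_to_rating = {"player a": 84.0, "player b": 76.0}
print(impute_ratings(["player a", "player c"], name_to_rating))
print(average_rating(["player a", "player c"], name_to_rating))
```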
The EDA covers the following plots:
- Top 10 Teams with the Most Players
- Home Team vs. Away Team Average Rating
- Top 10 Players by Frequency of Appearances
- Home Team vs. Away Team Scores
- Home Team Score vs. Home Team Average Rating
- Away Team Score vs. Away Team Average Rating
- Number of Home vs. Away Team Players
- We use the RandomForestRegressor for prediction due to its robustness and ability to handle multiple output variables efficiently when wrapped in a MultiOutputRegressor.
- Hyperparameters are optimized using RandomizedSearchCV with a predefined parameter distribution and cross-validation.
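A minimal sketch of this setup with scikit-learn follows. The feature layout (home and away average ratings), the synthetic data, the parameter values, and the scoring choice are assumptions made for illustration; the project's real features and search space are not shown here. It requires NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor


def train_goal_model(X, y, n_iter=3, cv=3, random_state=0):
    """Tune a multi-output random forest and return the fitted search object."""
    base = MultiOutputRegressor(RandomForestRegressor(random_state=random_state))
    # Parameters of the wrapped forest are addressed with the "estimator__" prefix.
    param_distributions = {
        "estimator__n_estimators": [10, 25, 50],
        "estimator__max_depth": [None, 5, 10],
        "estimator__min_samples_leaf": [1, 2, 4],
    }
    search = RandomizedSearchCV(
        base,
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=cv,
        scoring="neg_mean_absolute_error",
        random_state=random_state,
    )
    search.fit(X, y)
    return search


# Synthetic stand-in data: two features (home/away average rating), two targets (goals).
rng = np.random.default_rng(0)
X = rng.uniform(60, 90, size=(80, 2))
y = rng.poisson(lam=1.5, size=(80, 2))

search = train_goal_model(X, y)
home_goals, away_goals = search.predict([[82.0, 75.0]])[0]
print(search.best_params_, round(home_goals), round(away_goals))
```

The wrapper fits one forest per target, so the home and away goal counts are each predicted by their own model.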
The model is evaluated using several metrics:
- Mean Absolute Error (MAE): 1.14
- Mean Squared Error (MSE): 2.4
- Root Mean Squared Error (RMSE): 1.5
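For reference, the three metrics can be computed as in the sketch below. RMSE is the square root of MSE, and sqrt(2.4) ≈ 1.55, which is in line with the reported values. The goal counts used here are hypothetical and are not the project's evaluation data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, and RMSE for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Hypothetical actual and predicted goal counts for four matches.
actual = [2, 0, 1, 3]
predicted = [1, 1, 1, 1]
metrics = regression_metrics(actual, predicted)
print(metrics)
```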
An API is developed using Flask to serve the model predictions. The API allows users to input both team names and receive the predicted scores.
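A Flask route along these lines might look like the following sketch. It is not the project's actual handler: `predict_goals` is a hypothetical stand-in for the trained model, and accepting both GET and POST, with a JSON body as a fallback for the query string, is an assumption made because the usage examples in this document use both styles.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_goals(home_team, away_team):
    """Hypothetical stand-in for the trained model; returns fixed scores."""
    return 1, 1


@app.route("/predict", methods=["GET", "POST"])
def predict():
    # Read team names from the query string, falling back to a JSON body.
    params = request.args if request.args else (request.get_json(silent=True) or {})
    home_team = params.get("home_team")
    away_team = params.get("away_team")
    if not home_team or not away_team:
        return jsonify({"error": "home_team and away_team are required"}), 400
    home_goals, away_goals = predict_goals(home_team, away_team)
    return jsonify({
        "home_team": home_team,
        "away_team": away_team,
        "predicted_home_goals": home_goals,
        "predicted_away_goals": away_goals,
    })
```

Saved as a module, the app can be started locally with the `flask run` command or exercised in tests through `app.test_client()`.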
The application is containerized using Docker to ensure consistency across different environments. A Dockerfile is created to define the environment and dependencies.
The Docker container is deployed on Azure, making the model accessible as an API. Azure provides scalability and reliability for the deployed model.
The Football Goals Prediction API returns the predicted number of goals for each team in a match. Below are the details on how to use the API.
The base URL for accessing the API is:
http://footballgoalspredictionapi.f6cpcjeweuhgacfm.eastus.azurecontainer.io
Endpoint: /predict
Method: POST
Description: This endpoint provides a prediction for football goals based on the provided query parameters.
Query Parameters:
- home_team: Name of the home team (string, required)
- away_team: Name of the away team (string, required)
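Team names containing spaces must be percent-encoded in the query string. The helper below is a small sketch of how a client could build the request URL with the standard library; `build_predict_url` is a hypothetical name, not part of the API.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://footballgoalspredictionapi.f6cpcjeweuhgacfm.eastus.azurecontainer.io"


def build_predict_url(base_url, home_team, away_team):
    """Build the /predict URL, encoding spaces as %20 rather than '+'."""
    query = urlencode({"home_team": home_team, "away_team": away_team}, quote_via=quote)
    return f"{base_url.rstrip('/')}/predict?{query}"


url = build_predict_url(BASE_URL, "Ac Milan", "atalanta bc")
print(url)
```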
- Using the curl command
curl "http://myfootballapi.f7bqcuhbhaenayfk.eastus.azurecontainer.io:5000/predict?home_team=Ac%20Milan&away_team=atalanta%20bc"
Example response:
{
  "home_team": "Ac Milan",
  "away_team": "atalanta bc",
  "predicted_home_goals": 4,
  "predicted_away_goals": 3
}
- Using the Python requests library
import requests

url = "http://myfootballapi.f7bqcuhbhaenayfk.eastus.azurecontainer.io:5000/predict"
data = {
    "home_team": "Ac Milan",
    "away_team": "atalanta bc"
}
response = requests.post(url, json=data)
print(response.json())
- Using the JavaScript fetch API
fetch('http://myfootballapi.f7bqcuhbhaenayfk.eastus.azurecontainer.io:5000/predict?home_team=Ac%20Milan&away_team=atalanta%20bc')
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
- Using Java HttpClient
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.URI;

public class ApiClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://myfootballapi.f7bqcuhbhaenayfk.eastus.azurecontainer.io:5000/predict?home_team=Ac%20Milan&away_team=atalanta%20bc"))
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}