This repository contains all resources for Homework 1 of TDT4173, fall 2021. A short writeup can be found in report.pdf.
In this project I implement two well-known and simple (but occasionally very useful) machine learning algorithms. The interface closely resembles the one used in scikit-learn.
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean (the cluster center, or centroid), which serves as a prototype of the cluster.
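As a rough illustration of the idea, the following is a minimal pure-Python sketch of the standard iterative procedure (Lloyd's algorithm): assign each observation to its nearest centroid, then move each centroid to the mean of its assigned observations. It is not the code in k_means.py. The class name `KMeansSketch`, its parameters (`k`, `max_iter`, `seed`) and the random initialisation are assumptions made for this example; only the `fit`/`predict` naming follows the scikit-learn convention.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KMeansSketch:
    """Hypothetical illustration of Lloyd's algorithm, not the repository code."""

    def __init__(self, k=2, max_iter=100, seed=0):
        self.k = k
        self.max_iter = max_iter
        self.seed = seed
        self.centroids = []

    def _nearest(self, point):
        # Index of the centroid closest to the point (first one wins on ties).
        return min(range(len(self.centroids)),
                   key=lambda j: squared_distance(point, self.centroids[j]))

    def fit(self, X):
        rng = random.Random(self.seed)
        # Assumption: initialise with k observations sampled without replacement.
        self.centroids = [list(p) for p in rng.sample(X, self.k)]
        for _ in range(self.max_iter):
            # Assignment step: group observations by nearest centroid.
            clusters = [[] for _ in range(self.k)]
            for point in X:
                clusters[self._nearest(point)].append(point)
            # Update step: move each centroid to the mean of its cluster.
            new_centroids = []
            for j, members in enumerate(clusters):
                if members:
                    new_centroids.append(
                        [sum(coord) / len(members) for coord in zip(*members)])
                else:
                    # Keep the old centroid if no observation was assigned to it.
                    new_centroids.append(self.centroids[j])
            if new_centroids == self.centroids:
                break  # assignments can no longer change
            self.centroids = new_centroids
        return self

    def predict(self, X):
        # Cluster index for each observation.
        return [self._nearest(p) for p in X]


# Toy data: two well-separated groups of points.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
     [100.0, 100.0], [101.0, 101.0], [102.0, 102.0]]
model = KMeansSketch(k=2).fit(X)
labels = model.predict(X)
```

The result depends on the initial centroids, so on harder data a sketch like this may settle in a poor local optimum; restarting from several seeds is one common way to reduce that risk.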
A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.
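To make the structure concrete, here is a small sketch that builds such a tree for categorical attributes and classifies a row by following one root-to-leaf path. It is not the code in decision_tree.py. The entropy-based (ID3-style) choice of attribute, the function names `build_tree` and `predict_one`, the tuple node layout, and the toy data are all assumptions made for this example; the splitting criterion actually used in this repository is described in report.pdf, not here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Return a leaf (a class label) or a node (attribute, branches, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # leaf node: a class label

    def remainder(attr):
        # Weighted entropy of the labels after splitting on attr.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Lowest remaining entropy is equivalent to highest information gain.
    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        # One branch per outcome of the test on the chosen attribute.
        branches[value] = build_tree(list(sub_rows), list(sub_labels), rest)
    return (best, branches, majority)


def predict_one(tree, row):
    """Follow a single root-to-leaf path (class labels must not be tuples)."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row[attr] not in branches:
            return default  # unseen attribute value: fall back to the majority
        tree = branches[row[attr]]
    return tree


# Hypothetical toy data: each row maps attribute names to categorical values.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
prediction = predict_one(tree, {"outlook": "rainy", "windy": "no"})
```

In this toy example the labels depend only on `outlook`, so the sketch tests that attribute at the root and each branch ends directly in a leaf.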
- The <algorithm_name>.py files (e.g. k_means.py) contain the implementation of the algorithm.
- For each algorithm, there are two datasets that have been used to test the algorithm.
- The data_1.csv files contain easy problems.
- The data_2.csv files contain harder problems.
- The report.pdf file contains results, plots and descriptions of the work.
- The experiment.ipynb files are Jupyter notebooks with code for loading the datasets, training, and evaluating the models.