Machine Learning and Data Mining course at the university
-
- Getting the data.
- Line plot.
- Axis labels and title.
- Second line.
- Legend.
- Gridlines.
- Text.
- Size of x ticks and y ticks.
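
The plotting steps listed above can be sketched in one short script. This is a minimal illustration, not the notebook's own code: the data values, labels, and the output filename `line_plot.png` are assumptions made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Getting the data: illustrative values only, not taken from the notebook.
x = [0, 1, 2, 3, 4]
squares = [v ** 2 for v in x]
doubles = [2 * v for v in x]

fig, ax = plt.subplots(figsize=(6, 4))

# Line plot, then a second line on the same axes.
ax.plot(x, squares, label="x squared")
ax.plot(x, doubles, label="2x")

# Axis labels and title.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Two simple lines")

# Legend and gridlines.
ax.legend()
ax.grid(True)

# Free text placed in data coordinates.
ax.text(2.1, 3.0, "lines meet at x = 2")

# Size of the x and y tick labels.
ax.tick_params(axis="both", labelsize=12)

# Hypothetical output file; in a notebook, plt.show() would be used instead.
fig.savefig("line_plot.png")
```

Working through the `Axes` object (`ax`) rather than the module-level `plt.*` functions keeps every setting attached to one explicit plot, which helps once a figure holds several subplots.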
-
human_bacteria_genome_comparison.ipynb
- Comparison of two DNA sequences: (1) human vs. (2) bacteria.
-
distance_metrics_in_machine_learning.ipynb
- Continuous or numerical variables.
- Euclidean Distance.
- Manhattan Distance.
- Minkowski Distance.
- Vector Norm.
- Vector L1 Norm.
- Vector L2 Norm.
- Vector Lp Norm.
- Categorical variables.
- Hamming Distance.
- Cosine Distance & Cosine Similarity.
- Bonus.
- CountVectorizer.
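
The metrics listed above can be written in a few lines of plain Python. The sketch below is an illustration under assumptions, not the notebook's code: the function names and sample vectors are invented for the example. It relies on the fact that the Minkowski distance of order p is the Lp norm of the difference vector, so p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.

```python
import math

def minkowski_distance(a, b, p):
    """Lp norm of (a - b): p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Continuous variables (made-up vectors).
u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
manhattan = minkowski_distance(u, v, 1)   # |1-4| + |2-6| + |3-3| = 7
euclidean = minkowski_distance(u, v, 2)   # sqrt(9 + 16 + 0) = 5

# Categorical variables (made-up records).
hamming = hamming_distance(["red", "small", "round"],
                           ["red", "large", "round"])  # one field differs

# Cosine similarity and the corresponding cosine distance.
similarity = cosine_similarity([1, 0, 1], [1, 1, 0])
cosine_distance = 1.0 - similarity
```

Cosine similarity ignores vector length and compares direction only, which is why it is commonly paired with word-count vectors such as those a count vectorizer produces.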
-
brazilian_population_growth_visualization.ipynb
- Data Visualization with Python.
- Visualize Data with Python.
- Get to know the matplotlib.pyplot library.
- Build line, bar, scatter and boxplot charts.
- Manipulate data to build graphs.
- Predicting.
- Predicting population growth using a very simple DATASUS dataset.
- Comparing the result with the more complex IBGE prediction through plotting.
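
The list above does not say which prediction method the notebook uses. As one possible reading, the sketch below fits a straight-line trend by ordinary least squares and extrapolates it. The linear model, the function name `fit_line`, and the numbers are all assumptions: the values are placeholders in arbitrary units, not DATASUS or IBGE figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder series in arbitrary units; NOT real population data.
years = [2000, 2001, 2002, 2003, 2004]
population = [100.0, 102.0, 104.0, 106.0, 108.0]

slope, intercept = fit_line(years, population)

# Extrapolate the fitted trend to a later year.
forecast_2010 = slope * 2010 + intercept
```

Plotting the fitted line next to an official projection on the same axes would then show where the simple trend starts to diverge from the more detailed model.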
-
- Famous Iris Flower Species Dataset.
- The Iris Flower Dataset involves predicting the flower species given measurements of iris flowers.
- It is a multiclass classification problem. The number of observations for each class is balanced.
- Solve it in 3 steps: calculate the Euclidean distance, get the nearest neighbors (KNN), and make predictions.
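
The three steps named above map onto three small functions. This is a minimal sketch, not the notebook's implementation: the function names, the choice of k = 3, and the handful of sample rows (written in the style of Iris measurements, for illustration only) are assumptions.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Step 1: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def get_neighbors(train_rows, query, k):
    """Step 2: the k training rows closest to the query."""
    ranked = sorted(train_rows,
                    key=lambda row: euclidean_distance(row[0], query))
    return ranked[:k]

def predict(train_rows, query, k):
    """Step 3: majority vote over the labels of the k nearest rows."""
    neighbors = get_neighbors(train_rows, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Illustrative rows: (sepal length, sepal width, petal length, petal width), label.
train = [
    ([5.1, 3.5, 1.4, 0.2], "setosa"),
    ([4.9, 3.0, 1.4, 0.2], "setosa"),
    ([7.0, 3.2, 4.7, 1.4], "versicolor"),
    ([6.4, 3.2, 4.5, 1.5], "versicolor"),
    ([6.3, 3.3, 6.0, 2.5], "virginica"),
    ([5.8, 2.7, 5.1, 1.9], "virginica"),
]

small_flower = predict(train, [5.0, 3.4, 1.5, 0.2], k=3)
large_flower = predict(train, [6.0, 3.0, 5.5, 2.0], k=3)
```

Because KNN stores the training rows and does all its work at prediction time, there is no separate fitting step. An odd k lowers the chance of a tied vote, although ties are still possible with three classes.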