Hi there! I'm a Data Scientist with a keen interest in using analytics to understand human behaviour & sport science. I am also an advocate for the ethical use of data and take data privacy and model scrutinisation seriously.
Personal website: oli-portfolio.com
-
I enjoy writing blogs and currently have three: a travel blog, a music blog and an NBA analytics blog. To streamline setting up a blog with the Quarto framework, I authored an R package of convenience functions and made it available to the open-source community. To view my code and download the package, visit my blogme GitHub repository.
-
Analysing the NBA. I discovered the ESPN NBA datasets in 2022. They are rich, well-structured sources of game, player, tournament and miscellaneous basketball data. I have taken up the challenge of modelling NBA player injury, namely to answer the questions: When will a player become injured? What are the driving factors behind injury? And, after being injured, for how long will a player be unavailable to play? It is a work in progress. To assist with exploration of the datasets, and with decision making in fantasy basketball leagues, I created a dashboard using the R Shiny framework. The data feeding the dashboard and analysis is stored in the cloud in a relational database (using Cockroach Labs). A cron job on my Raspberry Pi runs daily, extracting data from the NBA API and ingesting it into my cloud database. Using this approach, it is also possible to control and manage the process remotely from my mobile phone.
-
To further the NBA theme, I designed a machine learning algorithm to predict how many points a player will score in their next game. Again, my script is scheduled as a daily cron job on my Raspberry Pi. To see predictions and view model performance (measured using mean absolute error), check out this dashboard.
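For anyone unfamiliar with the metric, here is a minimal sketch of how mean absolute error is computed. The function name and the sample numbers are made up for illustration and are not taken from my model.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical points scored vs. points predicted over four games.
actual_points = [22, 30, 15, 8]
predicted_points = [20, 27, 18, 10]

print(mean_absolute_error(actual_points, predicted_points))  # 2.5
```

A value of 2.5 here would mean the predictions miss by 2.5 points per game on average, in either direction.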
-
New Zealand Political Donations. In 2021 I collected donation declaration forms from all NZ political parties, and extracted the names of donors and the amounts donated. This was challenging because the data is contained in PDF files, so I used open-source OCR software (tesseract) to assist with information extraction. I wanted to carry out this research to better understand: Do any donors donate to multiple parties? And who are the donors? For example, do they chair large corporations within NZ? This project was an opportunity to structure my data in a graph database, and to use graph algorithms to explore the common donors between parties.
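The multiple-party question can be sketched without a graph database. This minimal Python example is illustrative only: the donor names, party names and amounts are invented, and the function is a hypothetical stand-in rather than the project's code.

```python
from collections import defaultdict


def donors_to_multiple_parties(donations):
    """Map each donor who gave to more than one party to those parties.

    `donations` is an iterable of (donor, party, amount) tuples.
    """
    parties_by_donor = defaultdict(set)
    for donor, party, _amount in donations:
        parties_by_donor[donor].add(party)
    return {
        donor: sorted(parties)
        for donor, parties in parties_by_donor.items()
        if len(parties) > 1
    }


# Invented records, standing in for rows extracted from the PDF forms.
sample_donations = [
    ("Donor A", "Party X", 5000),
    ("Donor A", "Party Y", 2500),
    ("Donor B", "Party X", 1000),
    ("Donor C", "Party Y", 7500),
    ("Donor C", "Party Y", 1200),
]

print(donors_to_multiple_parties(sample_donations))
# {'Donor A': ['Party X', 'Party Y']}
```

In graph terms, this is the set of donor nodes with edges to more than one party node.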
-
Twitter Analysis of Aus & NZ Politicians. In 2020, I started collecting tweets from politicians across Australia and New Zealand, with the intention of conducting sentiment and topic analysis with regard to the parties each politician represents. I did this to better understand: Do politicians within a single party deliver different messages on Twitter (party uniformity)? What types of topics do different parties tweet about, and does this align with their party outlook? As a side-quest, I also monitored the increase/decrease in followers politicians experienced after a particularly explosive/sensible tweet.
-
For my master's thesis (read here), I undertook work experience within a startup business called Agtuary. I was tasked with finding a way to derive patterns from remote sensing satellite images in order to predict annual crop yield across Australian farms. As a deliverable, Agtuary requested a Python module that makes crop yield predictions and, in the process, writes predictions, model metrics and model parameters to file. After experimenting, the final model was an XGBoost model with Hyperopt tuning. Due to the small sample of training data, observations were simulated using the Prophet module (developed by Facebook). I believe the simulated data points are what drove the success of this model, which outperformed the existing crop prediction model from CSIRO (an Australian government body).