This GitHub holds all the personal projects that I have worked on as a pastime.
Projects are mainly focused on the Data Science, Insurance Pricing & Reserving fields.
A mapping of these is laid out below.
Predictive modelling is simply the framework for integrating past data & statistics to predict future outcomes or project liabilities. There are 4 main techniques: Bayesian, Decision Trees, Support Vector Machines & Neural Networks. My projects mainly utilize Bayesian & Decision Tree techniques and hence focus primarily on linear regression models.
A published article aimed at explaining:
1. A generalised structure for Predictive Modelling
2. Alternative interpretations of various statistical model metrics
The article follows the generalized framework of:
- Data preparation
- - Preliminary data analysis, executing 4 tiers of data cleaning (Correct, Complete, Create, Convert)
- Exploratory Data Analysis
- - Uni-, Bi- & Multi-variate Analysis
- Model Preparation
- - Stratified train/test data splits, hyperparameter tuning, parameter evaluation metrics
- - Feature Engineering (Quantity & Quality), feature evaluation metrics
- Predictive Modelling (Classification Problem)
- - Ensembles (Hard & Soft Voting)
Click To View
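The hard and soft voting ensembles listed above can be illustrated with a minimal sketch. The function names and the example predictions are hypothetical and are not taken from the article: hard voting takes the majority class label across models, while soft voting averages the predicted class probabilities and picks the most probable class.

```python
from collections import Counter


def hard_vote(labels):
    """Return the class label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities across models; return the top class index."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averages = [
        sum(model[k] for model in probabilities) / n_models
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=lambda k: averages[k])


# Hypothetical class probabilities from three models for one observation.
model_probabilities = [[0.45, 0.55], [0.45, 0.55], [0.90, 0.10]]

# Each model's hard label is its own most probable class.
model_labels = [max(range(2), key=lambda k: p[k]) for p in model_probabilities]

print("hard vote:", hard_vote(model_labels))
print("soft vote:", soft_vote(model_probabilities))
```

The two schemes can disagree: here two weakly confident models outvote one strongly confident model under hard voting, whereas soft voting lets the confident model dominate.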
In short, web scraping is simply the automated process of extracting data from the web, followed by cleaning any irregularities & conducting Exploratory Data Analysis to spot trends & patterns.
A Python Kernel written to automate the repetitive clicking of c. 1,228 URLs & the conversion of c. 1,000 PDF tables into CSV to compile data.
Contents:
- 1. Collate online source code URLs & sub-page URLs
- 2. Download online data via URLs
- 3. Convert & Neaten PDF Table into CSV
- 4. Compile all CSV Tables
Click To View
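As a rough illustration of step 1 only (collating page and sub-page URLs), the sketch below uses the Python standard library. The class name, function name, sample HTML and `example.com` address are all hypothetical; the actual kernel may use different libraries. The download and PDF-to-CSV steps are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def collect_pdf_links(html, base_url):
    """Return only the links that point at PDF files."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [link for link in parser.links if link.lower().endswith(".pdf")]


# Hypothetical page source standing in for a downloaded index page.
sample_html = '<a href="/returns/2017.pdf">2017</a> <a href="about.html">About</a>'
print(collect_pdf_links(sample_html, "https://example.com/data/"))
```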
After extracting Annual Insurance Data Returns in the Part 1 series, we proceed to analyze the data.
Contents:
- Patterns
- 1. Benchmark Range of ROC on Expense & Loss Ratios
- Trends
- 2. Growing reinsurance ceded abroad beyond the ASEAN region
- 3. Declining averages for Earned Premiums & Claims Incurred (with falling inflation rates)
- 4. Average ROC, Expense & Loss Ratios
Click To View
Exploratory Data Analysis is simply the analysis of data sets to summarize their characteristics & patterns. This includes Uni-, Bi- & Multi-variate Analysis, which often uncovers underlying relationships that conventional models overlook.
EDA Summary
1. Those who have had past experience of financial distress (target variable):
>Made fewer loans or exceeded fewer deadlines
>Tend to have fewer dependents & a lower debt ratio & net worth
>As expected, are of lower-tier income, but have a lower debt ratio
2. Ignoring mortality and the time value of money (i.e. annuities):
>Debt ratio & net worth show a Gaussian distribution against age
3. Those who had acts of debt delinquency (made loans or exceeded deadlines):
>Tend to be from the higher-tier income group or retired
4. Others:
>The higher the income, the higher the debt ratio
>The higher the income, the fewer the dependents
Click To View
Linear regression is simply the application of the fundamental straight-line concept Y = mX + C. In other words, it rests on the idea that variable relationships are 1-dimensional (positive or negative).
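To make the straight-line idea concrete, here is a minimal sketch of an ordinary least-squares fit for a single predictor. The function name and the sample data are illustrative assumptions, not code from the kernel: the slope m is the covariance of X and Y divided by the variance of X, and the intercept C follows from the means.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Illustrative data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = fit_line(xs, ys)
print(f"slope={m}, intercept={c}")
```

A positive m indicates a positive relationship and a negative m a negative one, which is the sense in which the relationship has a single direction.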
A Python Kernel that aims to:
- 1. Get a better understanding of the simplified predictive modelling framework
- 2. Grasp the logic behind the different coding methods & concise techniques used
- 3. Compare different models
- Coding Techniques:
- A. List comprehensions
- B. Samples to reduce computational cost
- C. Concise 'def' functions that can be used repetitively
- D. Pivoting using groupby
- E. When & how to convert and reshape dictionaries into lists or DataFrames
- F. Quickly split DataFrame columns
- H. Loop sub-plots
- I. Quick lambda formulae functions
- J. Quick looping print or DataFrame conversion of summative scores
- K. Order plot components
- L. Create & plot bulk ensemble comparative results
Click To View
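A few of the coding techniques listed above (A, C, E, I and J) can be sketched in plain Python. The model names and scores are made-up values for illustration only, and the kernel itself may express these ideas with pandas instead.

```python
# Hypothetical model scores, used only to demonstrate the techniques.
scores = {"logistic": 0.81, "tree": 0.78, "forest": 0.84}

# A. List comprehension: keep the models scoring above a threshold.
strong = [name for name, score in scores.items() if score > 0.80]

# E + I. Convert the dictionary into a list of (name, score) tuples and
# rank it with a lambda as the sort key.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)


# C. A concise, reusable function.
def as_percent(value, digits=1):
    """Format a proportion such as 0.5 as a percentage string."""
    return f"{value * 100:.{digits}f}%"


# J. Quick looping print of summary scores.
for name, score in ranked:
    print(f"{name:<10}{as_percent(score)}")
```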
In short, this project contains a Python Kernel to automate the probabilistic claims simulation process for actuarial reserving calculations.
Reserving Method Used: Inflation Adjusted Chain Ladder
Article
or
Python Code Guide
or
Python Code v2
Present: Simulation supports Claim Numbers (Poisson, Negative Binomial) & Amounts (Gaussian, LogNormal).
Ongoing:
1. Support Bornhuetter-Ferguson Method (BF).
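As a rough illustration of simulating claim numbers and amounts, the sketch below draws Poisson claim counts and LogNormal claim amounts using only the standard library. The function names and all parameter values (`lam`, `mu`, `sigma`, `n_sims`, the seed) are assumptions made for this example and are not taken from the project; the Negative Binomial and Gaussian options are not shown.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_aggregate_claims(rng, lam, mu, sigma, n_sims):
    """Simulate total claim cost per period: Poisson counts, LogNormal amounts."""
    totals = []
    for _ in range(n_sims):
        n_claims = sample_poisson(rng, lam)
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_claims)))
    return totals


rng = random.Random(42)  # fixed seed so the run is repeatable
totals = simulate_aggregate_claims(rng, lam=5.0, mu=8.0, sigma=0.5, n_sims=2000)
mean_total = sum(totals) / len(totals)

# Theoretical mean of the compound distribution: lam * exp(mu + sigma^2 / 2).
expected_mean = 5.0 * math.exp(8.0 + 0.5 ** 2 / 2)
print(f"simulated mean: {mean_total:.0f}, theoretical mean: {expected_mean:.0f}")
```

Comparing the simulated mean with the theoretical one is a quick sanity check that the sampler behaves as intended.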
Contents:
- 0. Assumptions
- 1. Development-Year lags
- 2. Incremental & Cumulative claim amounts
- 3. Uplift past inflation for incremental amounts & Derive cumulative
- 4. Individual Loss Development Factors (LDFs)
- 5. Raw preliminary view of triangle
- 6. Establish predicted lag years data frame
- 7. Impute latest cumulative amounts
- 8. Simple Mean & Volume Weighted LDFs & 5/3 Year Averages & Select
- 9. Predict future cumulative amounts
- 10. Calculate incremental amounts
- 11. Project future inflation for incremental amounts
- 12. Reserve summation
Click To View
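The core of the steps above (volume-weighted LDFs, projecting future cumulative amounts, and summing the reserve) can be sketched on a toy triangle. This is a simplified illustration under stated assumptions: the triangle values and function names are hypothetical, and the inflation uplift and projection steps (3 and 11) are omitted, so it shows the basic chain ladder rather than the inflation-adjusted method.

```python
def volume_weighted_ldfs(triangle):
    """Loss development factors: sum of column j+1 over sum of column j,
    using only the origin years that have both values."""
    n_dev = len(triangle[0])
    ldfs = []
    for j in range(n_dev - 1):
        rows = [row for row in triangle if len(row) > j + 1]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        ldfs.append(numerator / denominator)
    return ldfs


def project_ultimates(triangle, ldfs):
    """Roll each origin year's latest cumulative amount forward to ultimate."""
    ultimates = []
    for row in triangle:
        value = row[-1]
        for factor in ldfs[len(row) - 1:]:
            value *= factor
        ultimates.append(value)
    return ultimates


def total_reserve(triangle, ultimates):
    """Reserve = projected ultimate minus latest cumulative amount, summed."""
    return sum(ult - row[-1] for row, ult in zip(triangle, ultimates))


# Hypothetical cumulative claims triangle (rows = origin years, oldest first).
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]

ldfs = volume_weighted_ldfs(triangle)
ultimates = project_ultimates(triangle, ldfs)
reserve = total_reserve(triangle, ultimates)
print("LDFs:", ldfs)
print("Ultimates:", ultimates)
print("Total reserve:", reserve)
```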
Prior to learning the Python coding language, I had to refine the basics. Since Excel & VBA are broadly deemed essential skill sets, I thought I would build some personal models. The ideas were inspired by my work placement at a consultancy company. The main objective was to ease manual & repetitive tasks.
A reproducible Excel VBA programme that automates bulk, simultaneous Word
document mail merges. Data entry checks (file exists etc.) & cleaning (excess
spaces, invalid file directory ...) are handled by the code as well. This code
does NOT use the standard mail merge function, which operates on ONLY a single
document. Instead, it runs across many Word documents at once.
Inspiration:
Whilst assisting my previous employer to prepare clients for the European
General Data Protection Regulation (GDPR) privacy documentation, I created
this programme to streamline over 30 hours of manual work.
A reproducible Excel VBA programme that automates multiple simultaneous email
communications where recipients receive overlapping or identical attachments or
spreadsheet tables.
Inspiration:
A responsibility of mine at a previous company involved weekly roll-forward
projection updates. I found this repetitive & built this model to automate the
job. It mitigated manual input errors & eased the job handover
process.