---
title: "Artificial Neural Network - Classification of Stars and Simple Stellar Populations applied to PASTA data set"
author: Morgan Camargo, Paula Coelho, Alessandro Ederoclite
output:
html_notebook:
toc: true
---
# The first ANN implementation
An ANN is structured in layers of nodes, such that every node in the $i$-th layer connects to every node in the $(i+1)$-th layer, $i \in \left\{1,...,N-1\right\}$ ($N$ being the number of layers). Each node carries a number, which is an input signal (in the first layer), an activation (in the hidden layers), or a probability (in the output layer). A schematic of our ANN is shown below.
![](ann.png){width=80%}
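Concretely, each node in layer $i+1$ applies an activation function $f$ to a weighted sum of the outputs of layer $i$ plus a bias (this matches the forward propagation code further below):

$$a^{(i+1)}_j = f\!\left(\sum_k W_{kj}\, a^{(i)}_k + b_j\right),$$

where the weights $W$ and biases $b$ are the parameters learned during training.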
This is an ANN classification algorithm written in R. First we implement the
functions that compose the network.
Activation functions convert the input signal of a node into an output
signal that can be used as input to the next layer. Here we use the
sigmoid and softmax functions.
```{r}
# sigmoid activation function: f(x) = 1/(1 + exp(-x));
# with d = TRUE it returns the derivative f'(x) = f(x)*(1 - f(x)),
# assuming x is already a sigmoid output (as used in back_propagation below)
sigmoid = function(x, d = FALSE) {
  if (d == FALSE) {return(1.0/(1 + exp(-x)))}
  else {return(x*(1.0 - x))} }
# softmax activation function, applied row-wise: f(x)_i = exp(x_i)/sum_j exp(x_j)
softmax = function(A) {return(exp(A) / rowSums(exp(A)))}
```
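As a quick sanity check (illustrative values only), the sigmoid maps 0 to 0.5 and every softmax row sums to 1:
```{r}
sigmoid(0)                                         # 0.5
sigmoid(0.5, d = TRUE)                             # 0.25, treating 0.5 as a sigmoid output
rowSums(softmax(matrix(c(1, 2, 3, 4), nrow = 2)))  # 1 1
```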
The next function loads the dataset and splits it into the three samples
we need: training (60%), validation (20%) and test (20%).
```{r}
load_dataset = function(vals, tests) {
dpath = "./catalog/xdata+ssp.txt"
dataset = read.csv(dpath, sep = "")
colours = c('FUV-r', 'NUV-r', 'u-r', 'J378-r', 'J395-r', 'J410-r', 'J430-r', 'g-r', 'J515-r', 'bp-r', 'r-J660', 'r-G','r-i','r-rp', 'r-J861', 'r-z', 'r-J', 'r-H', 'r-Ks', 'r-W1', 'r-W2', 'r-W3', 'r-W4')
sample_percentage = 0.1 # fraction of the final sample that is SSPs
n_ssp = 531 # how many GC SSPs were imported from the data
n_star = as.integer(n_ssp/sample_percentage) - n_ssp # the rest of the sample is randomly drawn stars
colnames(dataset) = c('label', colours)
# sampling the data
ssp = dataset[dataset$label == 1, ]
star = dataset[sample(nrow(dataset[dataset$label == 2, ]), size = n_star),]
dataset = rbind(ssp,star)
dataset = dataset[sample(nrow(dataset)),] # shuffle the rows randomly
X = matrix(unlist(dataset), ncol = 24)
X = X[,2:24]
Y = dataset$label
vals = as.integer(vals*dim(dataset)[1])
tests = as.integer(tests*dim(dataset)[1])
# splitting
X_validation = X[1:vals,]
Y_validation = Y[1:vals]
X_test = X[(vals + 1):(vals + tests),] # parentheses matter: vals + 1:vals + tests would select the wrong rows
Y_test = Y[(vals + 1):(vals + tests)]
X_train = X[(vals + tests + 1):dim(dataset)[1],]
Y_train = Y[(vals + tests + 1):dim(dataset)[1]]
samples = list("X_train" = X_train,
"Y_train" = Y_train,
"X_validation" = X_validation,
"Y_validation" = Y_validation,
"X_test" = X_test,
"Y_test" = Y_test)
return(samples) }
```
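A toy check of the split indexing (illustrative numbers, not the real data). The parenthesized form `(vals + 1):(vals + tests)` matters: in R, the unparenthesized `vals + 1:vals + tests` parses as `vals + (1:vals) + tests`, which would select rows overlapping the training sample.
```{r}
n_toy = 100
v = as.integer(0.2 * n_toy) # validation size: 20
u = as.integer(0.2 * n_toy) # test size: 20
idx_val = 1:v # rows 1..20 -> validation
idx_test = (v + 1):(v + u) # rows 21..40 -> test
idx_train = (v + u + 1):n_toy # rows 41..100 -> training
length(intersect(idx_test, idx_train)) # 0: the samples do not overlap
```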
The dataset is composed of 23 colour features for SSPs (rows 1 to
636) and stars (the remainder). It also has a column of labels indicating
the class: 1 for SSP and 2 for star.
Given the trained model, this routine classifies objects based on the
matrix of probabilities, which holds, for each object, the probability of
it being an SSP or a star. The predicted class is the one with the
higher probability.
```{r}
# note: this masks the base R generic predict(); that is fine inside this notebook
predict = function(model, X) {
  fp = forward_propagation(model, X)
  # classes are labelled 1 and 2, so the index of the row-wise maximum
  # probability is already the predicted class label
  return(apply(fp$probs, 1, which.max)) }
```
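For illustration (toy probabilities, not model output), `which.max` picks the column, i.e. the class, with the higher probability in each row:
```{r}
probs_toy = matrix(c(0.9, 0.1,
                     0.3, 0.7), ncol = 2, byrow = TRUE)
apply(probs_toy, 1, which.max) # 1 2
```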
This routine initializes the Ws (weight matrices) and bias arrays and trains
the network on the training sample, checking its performance on the
validation dataset.
```{r}
build_train_model = function(ann_model) {
# initialize the weights (random values, scaled by 1/sqrt(fan-in)) and biases (= 0)
W1 = matrix(rnorm(ann_model$n_input_dim * ann_model$n_hlayer), ann_model$n_input_dim, ann_model$n_hlayer) / sqrt(ann_model$n_input_dim)
b1 = matrix(0L,1,ann_model$n_hlayer)
W2 = matrix(rnorm(ann_model$n_hlayer * ann_model$n_output_dim), ann_model$n_hlayer, ann_model$n_output_dim) / sqrt(ann_model$n_hlayer)
b2 = matrix(0L,1,ann_model$n_output_dim)
# define the model, which contains the Ws and biases
model = list("W1" = W1, "b1" = b1, "W2" = W2, "b2" = b2)
# loop over the n_passes
for(i in 1:ann_model$n_passes) {
# forward propagation
fp = forward_propagation(model, ann_model$X_train)
probs = fp$probs
a2 = fp$a2
# backpropagation
model = back_propagation(ann_model, probs, a2, model)
p = predict(model, ann_model$X_validation)
s = score(p, ann_model$Y_validation)
if(i%%50 == 0) {print(sprintf("Score after iteration %i: %f", i, s))}
}
return(model) }
```
Let a1 be the array of input features. The forward propagation function gives
us the output-layer signals, which are the classification probabilities.
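In matrix form, the forward pass below computes

$$z_1 = a_1 W_1 + b_1, \qquad a_2 = \mathrm{sigmoid}(z_1), \qquad z_2 = a_2 W_2 + b_2, \qquad \mathrm{probs} = \mathrm{softmax}(z_2),$$

where each bias row vector is added to every row of the corresponding matrix.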
```{r}
forward_propagation = function(model, X){
  W1 = model$W1
  b1 = model$b1
  W2 = model$W2
  b2 = model$b2
  a1 = X
  z1 = sweep(a1 %*% W1, 2, as.vector(b1), "+") # add the bias b1 to every row
  a2 = sigmoid(z1) # hidden layer activation function: sigmoid
  z2 = sweep(a2 %*% W2, 2, as.vector(b2), "+") # add the bias b2 to every row
  probs = softmax(z2) # output layer activation function: softmax
  return(list("probs" = probs, "a2" = a2)) }
```
Then we have the backpropagation function, with the gradient descent
algorithm. It updates the parameter values in the direction that decreases the
loss function, based on its derivatives with respect to the weights and
biases (chain rule).
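As a short sketch of the gradients the code implements (assuming the standard cross-entropy loss, which the text does not name explicitly): for softmax outputs $p$ and a one-hot target $y$,

$$\frac{\partial L}{\partial z_2} = p - y,$$

which is the `error[i, y_i] - 1` step below. The chain rule then gives

$$\delta_1 = (p - y)\,W_2^{\top} \odot a_2(1 - a_2), \qquad \nabla_{W_2} L = a_2^{\top}(p - y), \qquad \nabla_{W_1} L = a_1^{\top}\,\delta_1,$$

with the bias gradients obtained by summing the corresponding deltas over the sample.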
```{r}
back_propagation = function(ann_model, probs, a2, model) {
# loading model
W1 = model$W1
b1 = model$b1
W2 = model$W2
b2 = model$b2
# backpropagating
error = probs
for (i in 1:ann_model$n_train){
error[i, ann_model$Y_train[i]] = error[i, ann_model$Y_train[i]] - 1 } # cross-entropy derivative: p - y (uses ann_model$Y_train, not the global Y_train)
delta1 = error %*% t(W2) * sigmoid(a2, d = TRUE)
#################################
# An intermittent issue has been observed in the next operations: colSums and
# the matrix multiplications sometimes return tables of only NA values, even
# though the input values (a2, error, delta1, X_train) all seem fine. A
# plausible cause is numerical overflow in exp() inside softmax, which turns
# probs into NaN; see the numerically stable variant after this block.
#################################
# weights
dW2 = t(a2) %*% error
dW1 = t(ann_model$X_train) %*% delta1
# bias
db2 = colSums(error)
db1 = colSums(delta1)
#################################
# add regularization terms (the biases b1 and b2 are not regularized)
dW2 = dW2 + ann_model$reg_lambda * W2
dW1 = dW1 + ann_model$reg_lambda * W1
# update parameter (gradient descent)
W1 = W1 - ann_model$epsilon * dW1
b1 = b1 - ann_model$epsilon * db1
W2 = W2 - ann_model$epsilon * dW2
b2 = b2 - ann_model$epsilon * db2
# update parameters to the model
model = list("W1" = W1, "b1" = b1, "W2" = W2, "b2" = b2)
return(model) }
```
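The intermittent NA issue noted in the comments above is consistent with numerical overflow: once any entry of $z_2$ grows large, `exp()` returns `Inf`, the softmax rows become `NaN`, and the `NaN`s propagate through every subsequent product and `colSums`. A common remedy (a sketch, not part of the original code) is the max-subtraction trick, which leaves the result unchanged since $\mathrm{softmax}(z) = \mathrm{softmax}(z - c)$ for any per-row constant $c$:
```{r}
# numerically stable softmax: subtracting the row-wise maximum before exp()
# leaves the result unchanged but keeps exp() from overflowing to Inf
softmax_stable = function(A) {
  A = sweep(A, 1, apply(A, 1, max), "-")
  return(exp(A) / rowSums(exp(A))) }
rowSums(softmax_stable(matrix(c(1000, 0, 0, 1000), nrow = 2))) # 1 1 (plain softmax gives NaN here)
```
Swapping this in for `softmax` inside `forward_propagation` would be a minimal way to test this hypothesis.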
The score function calculates the rate of correctly classified objects.
```{r}
# fraction of objects whose predicted class matches the true label
score = function(class_out, Y) {return(mean(class_out == Y))}
```
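For example, with toy labels where three of the four predictions match the truth:
```{r}
score(c(1, 2, 1, 1), c(1, 2, 2, 1)) # 0.75
```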
Now we run the network. At the end, we compute the final score, which shows
the network's performance on data it has never seen: the test sample.
```{r}
vals = .2 # validation split
tests = .2 # test split
samples = load_dataset(vals,tests) # loading dataset
# training sample
X_train = samples$X_train
Y_train = samples$Y_train
n_train = nrow(X_train)
# validation sample
X_validation = samples$X_validation
Y_validation = samples$Y_validation
n_validation = nrow(X_validation)
# test sample
X_test = samples$X_test
Y_test = samples$Y_test
n_test = nrow(X_test)
# ANN parameters
epsilon = 0.001 # learning rate
reg_lambda = 0.00 # regularization strength
n_hlayer = 10 # number of hidden-layer nodes
n_input_dim = ncol(X_train)
n_passes = 1000 # number of training iterations
n_output_dim = 2 # output dimension (two classes)
ann_model = list("X_train" = X_train,
"Y_train" = Y_train,
"X_validation" = X_validation,
"Y_validation" = Y_validation,
"X_test" = X_test,
"Y_test" = Y_test,
"n_train" = n_train,
"n_validation" = n_validation,
"n_test" = n_test,
"epsilon" = epsilon,
"reg_lambda" = reg_lambda,
"n_hlayer" = n_hlayer,
"n_input_dim" = n_input_dim,
"n_passes" = n_passes,
"n_output_dim" = n_output_dim)
model = build_train_model(ann_model) # building and training ANN model
```
```{r}
score_final = score(predict(model, X_test), Y_test)
print(sprintf("Final Score: %f", score_final))
```
# Motivation for this work
Globular clusters (GCs) are important for our understanding of galaxy evolution. The study of such systems can benefit from current and future surveys (S-PLUS, J-PLUS, J-PAS), to which the Brazilian community has access and which employ innovative photometric systems with broad-band and narrow-band filters. However, separating extra-galactic GCs from faint stars in these images is quite challenging. Recently our group developed a pipeline to detect GC candidates in J-PLUS images (`GCFinder`), but the list of candidates still shows a high fraction of false positives: stars that the pipeline lists as GC candidates. This problem motivated the present project, in which we study the viability of using supervised machine learning to separate faint stars from extra-galactic GCs, exploiting the multitude of colours available in these surveys. To this end we implemented the artificial neural network presented here, trained on simulated star and GC data.
# Taking a look at our data
```{r message=FALSE, warning=FALSE}
# Loading packages
require(tidyverse)
```
Our data is stored in the file `xdata+ssp.txt`.
```{r}
datafile <- "./catalog/xdata+ssp.txt"
datasample <- read.csv(datafile, sep="")
colours = c('FUV-r', 'NUV-r', 'u-r', 'J378-r', 'J395-r', 'J410-r', 'J430-r', 'g-r', 'J515-r', 'bp-r', 'r-J660', 'r-G','r-i','r-rp', 'r-J861', 'r-z', 'r-J', 'r-H', 'r-Ks', 'r-W1', 'r-W2', 'r-W3', 'r-W4')
colnames(datasample) <- c('class',colours)
datasample$class = factor(datasample$class)
```
Let's take a look at the data:
```{r}
glimpse(datasample)
datasample %>%
ggplot(aes(x = `g-r`, fill = class)) +
geom_density(alpha = 0.5)
datasample %>%
ggplot(aes(x = `g-r`, y = `u-r`)) +
geom_point(aes(color = class), alpha = 0.5)
datasample %>%
ggplot(aes(x = `g-r`, y = `r-i`)) +
geom_point(aes(color = class), alpha = 0.5)
```