- Wrapping of popular neural network libraries behind a unified interface
- Brute force scanning of neural network configurations
- Graphical evaluation of the training history
from blackboxes.box import Black
X = [[...], [...], ...] # train input (2D ArrayLike)
Y = [[...], [...], ...] # target (2D ArrayLike)
x = [[...], [...], ...] # test input (2D ArrayLike)
phi = Black()
y = phi(X=X, Y=Y, x=x, backend='torch', neurons=[6,4], trainer='adam')
The blackboxes Python package serves as a versatile wrapper for various implementations of neural networks, making it easy to switch between backends such as Keras, NeuroLab, and PyTorch. This flexibility enables users to leverage the specific strengths of each backend and to optimize performance for diverse hardware configurations. By offering this interoperability, blackboxes helps users avoid vendor lock-in and lets them choose the best neural network implementation for a given application.
Additionally, blackboxes supports the search for optimal hyperparameters of neural networks. It employs brute force scanning to fine-tune model configurations and exploits the effect of random initialization of the network weights, which increases the chance of discovering a near-optimal configuration.
Optimal hyperparameters of neural networks are difficult to estimate from theoretical considerations. Universal approximation results prove that neural networks can model most regression problems of higher complexity, but they provide no algorithmic instructions for finding the optimal network configuration. Moreover, several network structures may perform comparably well, and selecting among them contributes to achieving sufficient model performance.
Therefore, an automatic configuration of network parameters is proposed (see the sketch after this list). It covers:
- variations of the number and size of hidden layers
- activation functions of hidden and output layers
- parameters of early stopping of the training or of decay of the weights
- the effect of random initialization of the network weights, etc.
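The call below is a minimal sketch of such a scan. It reuses the arrays X, Y, x from the quick-start example, and the scanned ranges are illustrative assumptions rather than defaults of the package:

from blackboxes.box import Black

phi = Black()
y = phi(X=X, Y=Y, x=x,                 # data as in the quick-start example
        neurons=[[n] * layers          # scan size and number of hidden layers
                 for n in (4, 8)
                 for layers in (2, 4)],
        activation=('sigmoid', 'elu'), # candidate hidden-layer activations
        trainer='adam',
        trials=5,                      # repeat the random weight initialization
        )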
Two approaches to hyperparameter search were considered:
- Brute force scanning of the hyperparameter space (slow, but transparent)
- Automatic solutions such as Google’s AutoML (automatic regularization, but a closed box with the risk of insufficient model understanding)
Brute force scanning has been employed because of its transparency and robust implementation. This exhaustive search requires only a guess of sufficiently wide parameter ranges and eliminates the risk of getting trapped in local optima.
The class BruteForce in the module bruteforce performs nested search loops over the selected hyperparameter ranges. The best configuration is chosen based on the mean squared error, as detailed in the module metric. BruteForce supports different backends such as NeuroLab, TensorFlow, and PyTorch.
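Conceptually, the nested scan reduces to loops like the ones below. This is an illustrative sketch of the idea, not the actual implementation of BruteForce; fit is a placeholder for one training run:

import itertools
import numpy as np

def brute_force_scan(fit, neurons_variants, activations, trials=5):
    """Nested scan over hyperparameter ranges; fit(neurons, activation)
    trains one randomly initialized network and returns predictions and
    true values of a test set."""
    best_mse, best_cfg = np.inf, None
    for neurons, activation in itertools.product(neurons_variants, activations):
        for trial in range(trials):              # repeat random initialization
            y_prd, y_tru = fit(neurons, activation)
            mse = np.mean((y_prd - y_tru) ** 2)  # selection criterion
            if mse < best_mse:
                best_mse, best_cfg = mse, (neurons, activation, trial)
    return best_cfg, best_mse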
test_blackboxes_box.py is an example using synthetic data in 1D space with the backends TensorFlow, PyTorch, and NeuroLab.
import numpy as np
from blackboxes.box import Black
N = 1000 # number of training points
n = int(np.round(1.4 * N)) # number of test points
nse = 5e-2 # noise amplitude
# X and Y are training data, x is test input and y is prediction
X = np.linspace(-2. * np.pi, 2. * np.pi, N).reshape(-1, 1)
dx = 0.25 * (X.max() - X.min())
x = np.linspace(X.min() - dx, X.max() + dx, n).reshape(-1, 1)
Y_tru = np.sin(X)
Y = Y_tru + np.random.uniform(-nse, +nse, size=X.shape)
y_tru = np.sin(x)
for backend in [
        'neurolab',
        'tensorflow',
        'torch',
        ]:
    phi = Black()
    y = phi(X=X, Y=Y, x=x,
            activation=('leaky', 'elu',)
                if backend == 'tensorflow' else 'sigmoid',
            backend=backend,
            epochs=150,
            expected=1e-3,
            learning_rate=0.1,  # TensorFlow learning rate
            neurons=[[i] * j for i in range(4, 4 + 1)  # i: neurons per layer
                     for j in range(4, 4 + 1)],        # j: number of layers
            output='linear',
            patience=10,  # delay of early stopping (TensorFlow)
            plot=1,  # 0: none, 1: final only, 2: all plots
            rr=0.1,  # BFGS regularization (NeuroLab)
            show=1,
            tolerated=5e-3,
            trainer='adam' if backend != 'neurolab' else 'bfgs',
            trials=5,  # repetitions of every training configuration
            )
The training data and the true values are plotted in Figure 2.
Figure 3 shows the history of the mean squared error of all trials for the TensorFlow backend.
In Figure 4 the history of the five best trials out of all trials plotted in Figure 3 is shown.
The resulting error bars are summarized in Figure 5.
It is evident that conducting a single training session is risky, as illustrated by the mean squared error (MSE) of training with leakyReLU in Figure 5. The first trial (#0) fails entirely. Therefore, it is advised to perform a minimum of 3 repetitions.
This real-world example with 5 inputs, 1 output, and 1503 data points is taken from the UCI Machine Learning Repository:
https://archive.ics.uci.edu/dataset/291/airfoil+self+noise
Each of the 5 hidden layers contains 8 neurons. The trainer is adam, the hidden-layer activation functions are elu, leakyrelu, and sigmoid, and every configuration was repeated 5 times.
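A hedged sketch of this setup follows; the file name airfoil_self_noise.dat and the column layout are assumptions based on the UCI page, and the keyword arguments follow the sine example above:

import numpy as np
from blackboxes.box import Black

data = np.loadtxt('airfoil_self_noise.dat')  # 1503 rows, target in last column
X, Y = data[:, :-1], data[:, -1:]            # scaled sound pressure level as target

phi = Black()
y = phi(X=X, Y=Y, x=X,
        activation=('elu', 'leakyrelu', 'sigmoid'),  # scanned hidden-layer activations
        backend='tensorflow',
        neurons=[[8] * 5],  # 5 hidden layers with 8 neurons each
        output='linear',
        trainer='adam',
        trials=5,           # every configuration repeated 5 times
        )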
Figure 6 shows the history of the mean squared error of all trials for the TensorFlow backend.
The resulting error bars are summarized in Figure 7.
A single training session is risky, as indicated by the mean squared error (MSE) of training with the sigmoid activation function in Figure 6: both the first and second trials (#0 and #1) with sigmoid activation fail. Although the influence of the choice of activation function is small, there is an indication that the MSE varies less with the leakyReLU activation function than with the other activation functions. In contrast, the sine curve example has shown that leakyReLU is not a good choice. Therefore, it is recommended to conduct a minimum of 3 repetitions.
The required number of training repetitions is highly problem-specific in regression analysis of measurements. There are examples where a single training is sufficient, and examples where multiple random initializations of the network weights are definitely needed. Relying solely on a single training of a network configuration on a new dataset can pose a substantial risk of missing an acceptable solution. A preference for a particular optimizer or activation function for minimizing the MSE variation across multiple trials has not been identified. Therefore, brute force scanning of the network parameter space is recommended. The random initialization of weights should be repeated 3-5 times for each network configuration.
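The recommendation can be sketched as follows; train_once is a hypothetical placeholder for one training run, and the returned MSE values are purely illustrative:

import numpy as np

def train_once(seed: int) -> float:
    """Hypothetical placeholder: trains one network with randomly
    initialized weights and returns the resulting mean squared error."""
    rng = np.random.default_rng(seed)
    return float(rng.uniform(1e-3, 1e-2))  # stands in for a real training result

# repeat the random initialization 3-5 times for the chosen configuration
mses = [train_once(seed) for seed in range(5)]
print(f'best MSE: {min(mses):.2e}, spread: {np.std(mses):.2e}')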