Automatic method for the recognition of hand gestures for the categorization of vowels and numbers in Colombian sign language

Método automático para el Reconocimiento de Gestos de la mano para la categorización de vocales y números en el Lenguaje de Señas Colombiano

This project developed a system intended to facilitate communication with deaf people. It uses machine learning techniques to recognize hand gestures of Colombian Sign Language, covering the numbers 0 to 5 and the vowels. The system works in 4 stages: image capture, preprocessing of the image, feature extraction, and classification of the gesture being performed. The image can be captured with any camera that produces a good-quality shot. In the preprocessing stage, cleaning techniques remove the shadow and the background, and the image is then segmented and its remaining noise is removed. In the feature extraction stage, the image is described numerically with Hu moments, elliptic Fourier descriptors, histograms of oriented gradients (HOG) and geometric features. Finally, a multilayer perceptron neural network, a support vector machine and a k-nearest neighbors classifier determine which sign was made, that is, which number or vowel.

https://photos.app.goo.gl/8tvdcJYJdRHxcdYZ9

First step: Dataset

This stage consisted of field work: photographs were taken with a Fujifilm Finepix S4800 camera. For each sign, 3 photos were taken from 3 different perspectives, covering the hand gestures for the vowels and the numbers 0 to 5 of Colombian Sign Language (LSC). The subjects had different hand sizes and skin colors, and the ambient lighting varied. Most of the photos were taken with flash and a small part without, giving a total of 3324 photos in JPG format with a resolution of 4608 x 2592 pixels.

https://photos.app.goo.gl/EhYS2sWFX1tfArZx8

Second step: Preprocessing

To clean the dataset, the following preprocessing techniques are applied: image resizing, RGB to YCbCr conversion, binarisation, erosion and finally hole filling. This is done so that the data passed to the next stage is clean.
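The conversion, binarisation and erosion steps can be sketched with NumPy alone. The conversion coefficients are the standard ITU-R BT.601 full-range ones; the Cb/Cr skin thresholds and the 3x3 structuring element are assumptions chosen for illustration, not values taken from this project. Resizing and hole filling are left out.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 uint8 RGB image to float YCbCr (BT.601, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def binarize_skin(ycbcr, cb_range=(77, 127), cr_range=(133, 173)):
    """Threshold the chroma channels; the default ranges are assumed skin bounds."""
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

def erode3x3(mask):
    """Binary erosion: a pixel survives only if its whole 3x3 neighbourhood is set."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out
```

Working on the chroma channels only is what makes the YCbCr conversion useful here: the threshold ignores the luma channel, so it is less sensitive to flash and ambient lighting than a threshold on RGB values.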

https://photos.app.goo.gl/fGNiT1tBzD441XYW9

Third step: Feature Extraction

At this stage each image is represented numerically using 4 feature extraction methods: Hu moments, histograms of oriented gradients, geometric features and elliptic Fourier descriptors. The result is a numeric feature vector for each image. For each method, a .txt document is generated in which each image name is followed by its feature vector and finally its label.
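As an illustration of one of these descriptors, the sketch below computes the first two of the seven Hu invariants from a binary hand mask using NumPy. It is a minimal from-scratch version for explanation, not the extraction code used in this project; the function name is hypothetical.

```python
import numpy as np

def hu_first_two(mask):
    """Return the first two Hu invariants of a non-empty binary mask."""
    ys, xs = np.nonzero(mask)
    m00 = float(len(xs))              # area (zeroth raw moment)
    xc, yc = xs.mean(), ys.mean()     # centroid
    dx, dy = xs - xc, ys - yc

    def eta(p, q):
        # Central moment mu_pq, normalised by area for scale invariance.
        mu = np.sum(dx ** p * dy ** q)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return np.array([h1, h2])
```

Because the moments are taken about the centroid, the same hand shape gives the same values wherever it appears in the frame, and both invariants are also unchanged by rotation.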

After extracting the features, we store them in .txt files:

https://photos.app.goo.gl/pQD4LcochbsHBvxx6
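A record in such a file could be written and read back as sketched here. The order (image name, feature vector, label) follows the description above; the space delimiter, the number formatting and the function names are assumptions, since the exact file layout is only shown in the image.

```python
def format_feature_line(image_name, features, label):
    """One record: image name, then the feature values, then the class label."""
    values = " ".join(f"{v:.6g}" for v in features)
    return f"{image_name} {values} {label}"

def parse_feature_line(line):
    """Inverse of format_feature_line, for names and labels without spaces."""
    parts = line.split()
    return parts[0], [float(v) for v in parts[1:-1]], parts[-1]
```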

Fourth step: Sampling

At this stage sampling is carried out using K-fold cross-validation with 5 folds. The data is divided so that 70 %, 75 % and 80 % is used to train the algorithm and 30 %, 25 % and 20 %, respectively, is used as the test set. Each fold yields a validation score, and finally the average of the scores is calculated.
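One possible reading of this procedure is a hold-out split followed by 5-fold cross-validation on the training part. The sketch below follows that reading with the standard library only; the order of the two steps, the shuffling seed and the function names are assumptions.

```python
import random

def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

def k_folds(indices, k=5):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

def mean_score(scores):
    """Average the per-fold validation scores."""
    return sum(scores) / len(scores)
```

Calling `holdout_split` with 0.70, 0.75 and 0.80 in turn reproduces the three train/test proportions; for each one, a model would be fitted on every `train` list from `k_folds`, scored on the matching `validation` list, and the five scores averaged with `mean_score`.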

https://photos.app.goo.gl/rf9rrNQW6Fm8iR7h8

Fifth step: Classification

The classification stage consists of two key units: the feature extraction unit and the pattern classification unit. First, features such as Hu moments, histograms of oriented gradients, elliptic Fourier descriptors and geometric features are extracted. PCA (principal component analysis) is applied to HOG to keep only the most relevant features. Next, the pattern classification unit is implemented using the support vector machine method. Finally, the patterns are recognized and assigned to their classes.
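The PCA step can be sketched with a singular value decomposition in NumPy. This is a generic illustration of the technique on row-wise feature vectors; the number of components kept for HOG is not stated here, so `n_components` is left as a parameter.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]
```

The second return value is the fraction of variance each kept component explains, which gives a rough idea of how much information the reduction discards.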

For the classification with Support Vector Machines, the following hyper-parameters were used:

| Parameter | Values |
| --- | --- |
| KERNEL | rbf |
| GAMMA | 0.0001, 0.001, 0.01, 0.1, 0.2, 0.5 |
| C | 0.01, 0.1, 1, 10, 100, 1000 |

https://photos.app.goo.gl/tJYim9w7Fymg31449
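A grid like this can be searched with scikit-learn's `GridSearchCV`, as in the sketch below. The grid values are copied from the table above; the use of scikit-learn, the synthetic data standing in for the real feature vectors and the `f1_macro` scoring are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Grid copied from the table above.
param_grid = {
    "kernel": ["rbf"],
    "gamma": [0.0001, 0.001, 0.01, 0.1, 0.2, 0.5],
    "C": [0.01, 0.1, 1, 10, 100, 1000],
}

# Synthetic stand-in for the real feature vectors (assumption).
X, y = make_classification(n_samples=120, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# 1 kernel x 6 gamma x 6 C = 36 candidates, each scored with 5-fold CV.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1_macro")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```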

For the classification with Neural Network, the following hyper-parameters were used:

| Parameter | Values |
| --- | --- |
| ACTIVATION | ['identity','logistic','tanh','relu'] |
| SOLVER | ['lbfgs'] |
| LEARNING_RATE_INIT | [0.0001] |
| HIDDEN_LAYER_SIZES | [(100, 1), (100, 2), (100, 3)] |

https://photos.app.goo.gl/CMpRy1aQexy1QCsq9
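These parameter names match scikit-learn's `MLPClassifier`; assuming that is the library used, each entry of `hidden_layer_sizes` is the number of units in one hidden layer. The sketch below, run on synthetic data (an assumption), inspects the weight shapes of a network built with `(100, 2)`.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real feature vectors (assumption).
X, y = make_classification(n_samples=60, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(100, 2), activation="relu",
                    solver="lbfgs", max_iter=50, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # such a short run may not converge
    clf.fit(X, y)

# One weight matrix per layer transition: input -> 100 -> 2 -> 3 classes.
layer_shapes = [w.shape for w in clf.coefs_]
print(layer_shapes)  # [(8, 100), (100, 2), (2, 3)]
```

So `(100, 2)` describes a 100-unit layer followed by a 2-unit layer, not two layers of 100 units. Also, under the same assumption, `learning_rate_init` only takes effect with the `sgd` and `adam` solvers, so it has no influence when the solver is `lbfgs`.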

For the classification with K-Nearest Neighbors, the following hyper-parameters were used:

| Parameter | Values |
| --- | --- |
| N_NEIGHBORS | [1,2,4,6,8,10] |
| ALGORITHM | ['auto'] |
| WEIGHTS | ['uniform', 'distance'] |
| N_JOBS | [-1] |

https://photos.app.goo.gl/iRLjPgWM78wWdyxe9
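The difference between the two `WEIGHTS` options can be seen in a small from-scratch k-nearest neighbors vote. This is an illustrative sketch using the standard library, not the classifier used in this project; the small constant added to the distance is a simplification to avoid division by zero.

```python
import math
from collections import defaultdict

def knn_predict(train_points, train_labels, query, k, weights="uniform"):
    """Classify `query` by a vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = defaultdict(float)
    for distance, label in distances[:k]:
        if weights == "uniform":
            votes[label] += 1.0          # every neighbour counts the same
        else:
            votes[label] += 1.0 / (distance + 1e-12)  # closer counts more
    return max(votes, key=votes.get)
```

With one very close neighbour of class A and two distant neighbours of class B, `uniform` picks B by majority while `distance` picks A, which is why both options are worth including in the grid.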

Results

The results are grouped by training and test percentage and by feature set, and report the precision, recall and F1-score obtained with the support vector machine. The best result obtained with that classifier is highlighted.
According to the table of results below, the best feature set for the support vector machine is the histograms of oriented gradients combined with the elliptic Fourier descriptors, with 70 %, 69 % and 69 % precision, recall and F1-score respectively, using 80 % of the data as the training set and 20 % as the test set.

Where:

P: Precision, R: Recall and F1: F1-score

EF: Elliptic Fourier

HOG: Histograms of oriented gradients

HOG-PCA: Histograms of oriented gradients with principal component analysis

Hu: Hu moments

https://photos.app.goo.gl/cmXqD2JJvhcf3in98
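For reference, these are the standard per-class definitions of the three measures in terms of true positives (TP), false positives (FP) and false negatives (FN). How the per-class values were averaged into the single figures in the table is not stated here.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \, P \, R}{P + R}
```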

The figure below shows the confusion matrix for the best SVM model, with the predicted labels on the X axis and the true labels on the Y axis. This makes it possible to inspect the model graphically and see which labels are confused most often. In this case, the classes the model confuses most are class 0 and class U.

https://photos.app.goo.gl/THYvWtazrRkZHtLy9
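A confusion matrix with this layout (true labels as rows, predicted labels as columns) can be built and queried as in the sketch below, which uses only the standard library. The function names are hypothetical and any labels passed in are illustrative, not the project's results.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def most_confused(matrix, classes):
    """Return (true, predicted, count) for the largest off-diagonal cell."""
    cells = [(matrix[i][j], classes[i], classes[j])
             for i in range(len(classes))
             for j in range(len(classes)) if i != j]
    count, true_c, pred_c = max(cells)
    return true_c, pred_c, count

def precision_recall(matrix, k):
    """Per-class precision and recall for class index k (0.0 when undefined)."""
    tp = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column sum
    actual = sum(matrix[k])                    # row sum
    return (tp / predicted if predicted else 0.0,
            tp / actual if actual else 0.0)
```

The diagonal holds the correct predictions, so the largest off-diagonal cell points directly at the pair of signs the model mixes up most.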

Conclusions

  1. This method for recognizing Colombian sign language can be tried with new signs by extending the dataset. It also leaves research open, because it can be tested with new preprocessing, feature extraction and classification methods, which could raise the prediction percentages further.
  2. Among the feature extraction methods used, and based on the table of results, the histograms of oriented gradients (HOG) obtained the highest percentage.
  3. Performing principal component analysis slightly reduces the model's performance measures.
  4. The geometric features did not give good results because the images share similar values for features such as area or contour, which leads the model to predict the signs poorly.