Comprehensive Study and Evaluation of Commonly used Dimensionality Reduction Techniques in Biometrics Field

In the biometrics field, feature vectors are usually long and contain ineffective information. This problem is known as the "curse of dimensionality". Hence, there is a need for an efficient dimensionality reduction technique that removes redundant features and reduces the size of feature vectors, in order to achieve a high accuracy rate with fast performance. This paper presents a comprehensive study of three commonly used dimensionality reduction techniques: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Generalized Discriminant Analysis (GDA). The theoretical background of these techniques is illustrated along with the methods used to calculate their projection spaces; then, a practical implementation is conducted to find and adopt the best one for a retina-based biometric authentication system. From this study, it is concluded that the PCA technique has a number of problems that give it poor classification power, and that the LDA technique has a number of problems that make it impossible to implement in most cases in the biometrics field, while the GDA technique is more efficient than both for dimensionality reduction: it has high classification power and consumes less computational time. Hence, the GDA technique is adopted in the proposed authentication system.


INTRODUCTION
In biometrics, feature vectors usually suffer from the "curse of dimensionality" problem, where redundant features in these vectors increase time complexity and degrade the performance of the authentication system. To solve this problem, ineffective features have to be removed and the dataset transformed to a lower dimensional space [1][2][3].
Dimensionality reduction is used as a preprocessing step in many fields of machine learning related to data mining; one of them is the biometrics field. Dimensionality reduction can be handled using two approaches: feature selection and feature extraction [2]. This research concentrates on the most commonly used feature extraction techniques: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Generalized Discriminant Analysis (GDA). A comprehensive study with a practical implementation of these techniques is presented. The main objective is to demonstrate in practice which of them is best, in order to adopt it for dimensionality reduction and classification in the biometrics field.
Feature extraction algorithms can be categorized based on the presence or absence of class labels in the training set (learning method), as:
• Supervised: these algorithms are used when the feature vectors in the training set (system database) have associated class labels. For example, figure (1) shows a training set that consists of five subjects (classes), each with three sample images. Each feature vector in the training set is associated with a certain class label; a supervised method uses these labels to find the lower dimensional space.
Figure (2) illustrates an example of two linearly separable and two non-linearly separable classes.
• Non-linear algorithms: when classes are not linearly separable, as shown in figure (2-b), non-linear feature extraction algorithms are used to seek a non-linear projection that discriminates among such classes [6][7][8].
The rest of this paper is organized as follows: Section 2 reviews the related works. Section 3 illustrates the dimensionality reduction techniques and all their related issues in detail. Section 4 presents the experimental results and discussion. The main conclusions are given in Section 5.

RELATED WORKS
A number of research works in the literature have studied dimensionality reduction techniques. For example, Sh. Wang et al. presented in [9] several feature selection and feature extraction techniques for partial discharge pattern recognition.
To compare the performance of these techniques, they carried out partial discharge tests on artificial partial discharge defect models. Alaa Tharwat in [5] conducted an extensive study to explain the PCA technique and then implemented it in real applications. She carried out the same kind of study on the LDA technique.
Figure (3) shows the complete categorization of dimensionality reduction algorithms. Over the past few decades, a number of methods have been used to implement these algorithms. This research work addresses three feature extraction methods, PCA, LDA, and GDA, which are the most commonly used in the biometrics field.

Principal component analysis technique
PCA technique is one of the most famous linear unsupervised feature extraction algorithms. It seeks the space which represents the direction of the maximum variance of a given dataset [5][7] [9][10] [11].
PCA has a number of objectives involving: seeking relationships between samples, extracting the most important features from a given dataset, removing aberrant features, like noise which have a great impact on the classification process, and reducing the dimension of the dataset by retaining only important features.
These objectives can be achieved by generating the PCA space [5][9]. The PCA space is used to transform a given dataset to a lower dimensional space by projecting all samples of the dataset onto this lower space. The PCA space consists of k orthogonal principal components (PCs). In this research work, the covariance matrix method is used to calculate the PCs [5]:
$$\Sigma V = \lambda V$$
where Σ is the covariance matrix of the dataset, and V and λ are the eigenvectors and eigenvalues of the covariance matrix. Eigenvalues are scalar values that represent the robustness of the PCs. Eigenvectors are non-zero vectors that represent the PCs themselves, where each eigenvector represents one principal component.
Principal components are uncorrelated and represent the directions of maximum variance of the dataset. The first principal component (PC1 or v1) represents the direction of the largest variance of the dataset, the second principal component (PC2 or v2) represents the direction of the second largest variance, and so on. In other words, each principal component has a different robustness depending on the amount of variance in its direction. Usually, the PCA space consists of the PCs that have the maximum amount of variance (the maximum amount of the original data). To construct the PCA space (Vk), the eigenvectors are sorted according to their corresponding eigenvalues. Then the k eigenvectors that have the largest eigenvalues are selected as [5]:
$$V_k = [v_1, v_2, \dots, v_k]$$
Hence, increasing the number of PCs in the PCA space increases the robustness of the PCA technique, which is measured as [5]:
$$\text{Robustness} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{M} \lambda_i}$$
where k is the number of selected eigenvectors and M is the total number of eigenvectors calculated from the covariance matrix.
After constructing the Vk space, all samples of the dataset are projected onto this lower space as [5]:
$$Y = V_k^T R$$
where R is the mean-centered samples of the dataset and Y is the resultant lower dimensional dataset.
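The covariance matrix method described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in this work (which was written in Matlab): the function names `pca_fit` and `pca_project` and the synthetic data are assumptions made for the example, and samples are stored as rows, so the projection is written as the centered data multiplied by Vk.

```python
import numpy as np

def pca_fit(X, k):
    """Build a PCA space with the covariance matrix method.

    X has shape (n_samples, n_features). Returns the sample mean,
    the k selected eigenvectors V_k, and the robustness ratio.
    """
    mean = X.mean(axis=0)
    R = X - mean                              # mean-centered samples
    cov = np.cov(R, rowvar=False)             # covariance matrix of the dataset
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V_k = eigvecs[:, :k]                      # k eigenvectors with the largest eigenvalues
    robustness = eigvals[:k].sum() / eigvals.sum()
    return mean, V_k, robustness

def pca_project(X, mean, V_k):
    """Project samples (rows of X) onto the PCA space."""
    return (X - mean) @ V_k

# Hypothetical data: 50 samples, 5 features with very different variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])

mean, V_k, robustness = pca_fit(X, k=2)
Y = pca_project(X, mean, V_k)
print(Y.shape, round(float(robustness), 3))
```

Selecting all eigenvectors drives the robustness ratio to 1, which is the trade-off discussed above: more PCs preserve more variance but produce longer projected vectors.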

Main problems of PCA technique
PCA technique suffers from a number of problems which make it not the best solution for "curse of dimensionality" problem in the biometrics field. These problems are [1]:

• Information Packing Transform problem:
The direction of the PCA projection space is determined by the maximum variance of a given dataset (i.e. the maximum amount of the original dataset). This direction may be useless for the classification process, since it increases the total scatter across all classes in that dataset, which leads to poor class separability. Also, the PCA projection space in this direction may preserve useless information, which degrades the system performance [1][12].
• The PCA technique does not care about the classes of a given dataset: it handles the overall dataset as a uniform matrix without regard to whether the dataset consists of one or more classes. It does not take discrimination power into consideration.
Due to these problems, the PCA technique did not achieve satisfactory results when it was implemented in the proposed system as will be illustrated in section 4. So, there was a need to study the LDA technique to supersede the PCA technique.

Linear discriminant analysis technique
The LDA technique is a very common linear supervised feature extraction algorithm. Like PCA, LDA transforms a given dataset into a lower dimensional space, but with an additional advantage: the LDA projection space maximizes the ratio of the between-class variance to the within-class variance. This advantage is very important for the classification process in the biometrics field, since it aims at maximum class separability [8][9][10][11].

LDA algorithm
Al-Rafidain Engineering Journal (AREJ) Vol.25, No.2, December 2020, pp. 152-163
Calculating the LDA lower dimensional space requires three main steps. The first is to find the separability between the different classes (the distance between the means of the different classes), which is represented by the between-class matrix or variance [6][7]:
$$S_B = \sum_{j=1}^{c} E_j (\mu_j - \mu)(\mu_j - \mu)^T$$
where SB is the between-class matrix; c is the number of classes in the dataset; Ej is the number of samples in the j-th class; μj is the mean of the j-th class; μ is the total mean of all samples in the dataset [6][7].
The second step is to find the distance between the mean of each class and its samples, which is represented by the within-class matrix or variance [6][7]:
$$S_W = \sum_{j=1}^{c} \sum_{i=1}^{E_j} (I_{ij} - \mu_j)(I_{ij} - \mu_j)^T$$
where Iij is the i-th sample in the j-th class.
The third step is to construct the lower dimensional space that maximizes the between-class variance and minimizes the within-class variance, which is obtained from the eigenvectors and eigenvalues of the transformation matrix $W = S_W^{-1} S_B$.
As with the PCA technique, eigenvectors represent the directions of the LDA space, where each eigenvector represents one axis of the new space. The corresponding eigenvalues represent the robustness of these eigenvectors. The robustness of an eigenvector reflects its ability to discriminate among different classes by increasing the between-class variance and decreasing the within-class variance. Hence, the eigenvectors are sorted in descending order of their corresponding eigenvalues. Then, the first k eigenvectors are selected to construct the LDA lower dimensional space (Vk).
To obtain the lower dimensional dataset, all samples of a given dataset (X) are projected onto the Vk space:
$$Y = X V_k$$
Note 1: A matrix is considered singular when it is square and has no inverse; its determinant is zero, and hence not all of its columns and rows are independent.
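The LDA steps (between-class matrix, within-class matrix, eigen-decomposition, projection) can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the authors' code: the function name `lda_fit` and the synthetic three-class data are hypothetical, the transformation matrix is taken in the standard form of the inverse within-class matrix times the between-class matrix, and a pseudo-inverse is used so the sketch still runs if the within-class matrix happens to be singular.

```python
import numpy as np

def lda_fit(X, y, k):
    """Return the k leading LDA eigenvectors and all eigenvalues (descending).

    X has shape (n_samples, n_features); y holds one class label per sample.
    """
    d = X.shape[1]
    mu = X.mean(axis=0)                       # total mean of all samples
    S_B = np.zeros((d, d))                    # between-class matrix
    S_W = np.zeros((d, d))                    # within-class matrix
    for label in np.unique(y):
        X_j = X[y == label]
        mu_j = X_j.mean(axis=0)               # mean of the j-th class
        diff = (mu_j - mu)[:, None]
        S_B += len(X_j) * (diff @ diff.T)
        centered = X_j - mu_j
        S_W += centered.T @ centered
    # Assumption: pseudo-inverse instead of a plain inverse, as a guard
    # against a singular within-class matrix.
    W = np.linalg.pinv(S_W) @ S_B
    eigvals, eigvecs = np.linalg.eig(W)
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]         # descending eigenvalues
    return eigvecs[:, order[:k]], eigvals[order]

# Hypothetical data: 3 classes, 20 samples each, 4 features.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
X = np.vstack([m + rng.normal(size=(20, 4)) for m in means])
y = np.repeat(np.arange(3), 20)

V_k, eigvals = lda_fit(X, y, k=2)
Y = X @ V_k                                   # projected dataset
print(Y.shape)
```

Because the between-class matrix is a sum of c rank-one terms that are tied together by the total mean, at most c - 1 eigenvalues are non-zero, so with three classes only two discriminant axes carry information.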

Main problems of LDA technique
Although the LDA technique is one of the most commonly used feature extraction algorithms, it suffers from two essential problems: the linearity problem and the Small Sample Size (SSS) problem. In this section each problem is illustrated in detail along with some of its state-of-the-art solutions.
Figure (4) shows the difference between the LDA technique and the PCA technique in terms of the mechanism used to construct the lower dimensional projection space. The features in this figure are extracted from phase resolved partial discharge patterns and partial discharge waveforms to represent and recognize typical defects. LDA shows maximum separation between the two classes (defect A and defect B), which leads to better performance than the PCA technique [9]. However, the main problems of the LDA technique make it impossible to implement in the proposed system, so there was a need to study the GDA technique.

Generalized discriminant analysis technique
Generalized Discriminant Analysis (GDA), also called Kernel Discriminant Analysis, is a non-linear supervised feature extraction technique. GDA is a kernel version of LDA; it is the more general case and is used in this research work to overcome the shortcomings of both the PCA and LDA techniques. Similar to LDA, GDA seeks a projection space that transforms features into a lower dimensional space and maximizes the ratio of the between-class variance to the within-class variance. The GDA space preserves the most valuable information, which gives high classification efficiency and reduces the training time of the classifier used [1][7][9][13].

Calculating GDA projection space
The GDA projection space is calculated as follows [1][13]. To eliminate the linearity problem, GDA is based on a kernel function φ which transforms the original dataset X into a higher dimensional space Z, where:
φ: X→Z (11)
Then the between-class matrix of the non-linearly mapped data (in Z space) is calculated:
$$S_B^{\phi} = \sum_{j=1}^{c} E_j (\mu_j^{\phi} - \mu^{\phi})(\mu_j^{\phi} - \mu^{\phi})^T$$
where $S_B^{\phi}$ represents the between-class matrix in Z space; $\mu_j^{\phi}$ represents the mean of the j-th class in Z space; $\mu^{\phi}$ represents the total mean of the dataset in Z space. The within-class matrix in Z space is calculated as:
$$S_W^{\phi} = \sum_{j=1}^{c} \sum_{i=1}^{E_j} (\phi(I_{ij}) - \mu_j^{\phi})(\phi(I_{ij}) - \mu_j^{\phi})^T$$
where $S_W^{\phi}$ represents the within-class matrix in Z space. The transformation matrix of the GDA technique (W) is then calculated and expressed through a vector of real weights, and the eigenvalues are calculated from two matrices A and D, where A represents the kernel matrix of dimension (M×M) and D represents an (M×M) block diagonal matrix. If A is not invertible, a regularization process is used to eliminate the SSS problem. At this point, the k eigenvectors which have the largest eigenvalues are determined to construct the GDA projection space Vk. Then, the lower dimensional dataset is calculated by projecting the mapped dataset (obtained using the kernel function) onto Vk.
As illustrated above, the GDA technique transforms a given dataset into a higher dimensional space using a kernel function so that its classes become linearly separable. The same steps as in the LDA technique are then applied to the mapped dataset to reduce its dimension. GDA selects the eigenvectors which have the best classification capability rather than the eigenvectors which best describe the dataset (as with the PCA technique) [1][7][9][12][13]. Hence, it can be said that the GDA technique can eliminate the problems of PCA (linearity and poor discrimination capability) and the problems of LDA (linearity and the SSS problem).
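The kernel-based procedure can be illustrated with a small sketch on an XOR-style dataset, which no linear projection can separate. Everything specific here is an assumption made for illustration rather than a detail taken from this work: the RBF kernel and its `gamma` value, the centering of the kernel matrix, the regularization constant `reg`, and the formulation as a generalized eigenproblem on the kernel matrix and a block-structured class matrix (one common way to write kernel discriminant analysis). Only the training samples are projected.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def gda_fit_transform(X, y, k, gamma=0.1, reg=1e-6):
    """Project the training samples onto k kernel discriminant axes."""
    M = X.shape[0]
    K = rbf_kernel(X, X, gamma)               # (M x M) kernel matrix
    H = np.eye(M) - np.ones((M, M)) / M
    Kc = H @ K @ H                            # kernel matrix centered in feature space
    D = np.zeros((M, M))                      # block-structured class matrix
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        D[np.ix_(idx, idx)] = 1.0 / len(idx)
    num = Kc @ D @ Kc                         # between-class part
    den = Kc @ Kc + reg * np.eye(M)           # total part, regularized
    # Solve num a = lam den a through a Cholesky factor of den, which
    # turns it into an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(den)
    C = np.linalg.solve(L, np.linalg.solve(L, num).T).T
    C = (C + C.T) / 2                         # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(C)      # ascending order
    lam = eigvals[::-1][:k]
    U = eigvecs[:, ::-1][:, :k]
    alpha = np.linalg.solve(L.T, U)           # weight vectors, one column per axis
    return Kc @ alpha, lam

# XOR-style data: the two classes sit on opposite diagonals.
X = np.array([[2.0, 2.0], [-2.0, -2.0], [2.0, -2.0], [-2.0, 2.0]])
y = np.array([0, 0, 1, 1])

Y, lam = gda_fit_transform(X, y, k=1)
print(Y.ravel(), lam)
```

With two classes there is a single discriminant axis (number of classes minus one), and on this toy data the two classes land on opposite sides of zero along it even though they are not linearly separable in the original space.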

EXPERIMENTAL RESULTS AND DISCUSSION
In this section, the dimensionality reduction techniques described in section 3 are implemented in a retina-based identification system. This system has been designed as shown in figure (5). The objective of the following experiments is to determine the best technique among PCA, LDA, and GDA for this identification system.

Preparing the environment of experiments
All experiments in this section are conducted in the same environment, which is composed of: Windows 10 Pro operating system, Intel (R) Core (TM) CPU @ 1.8 GHz, 8 GB RAM, and Matlab (R2019b). These experiments are performed using two databases, RIDB and DRIVE. In the RIDB database, each user has five images, so three of them were used to train the system and the remaining two images were used for testing. Moreover, 17 out of 20 individuals were chosen as registered users in the system dataset, and 3 individuals were not registered in the dataset and were considered intruders to the system. Hence, the system training dataset consisted of 51 retinal images (17 individuals × 3 training images), whereas the system performance was tested using 34 retinal images as genuine users (17 individuals × 2 testing images) and 15 retinal images as impostors (3 unregistered individuals × 5 images).
• Digital Retinal Images for Vessel Extraction (DRIVE) database. This database was acquired in the Netherlands from a diabetic retinopathy screening program. The screening population consisted of 400 diabetic subjects between 25 and 90 years old. Forty images of resolution 768 × 584 pixels were randomly selected from them to construct the online database. The DRIVE database is rotated based on the "Data Augmentation" concept [15][16]. The rotation angles applied to the retinal images are: ±10˚, ±15˚, ±20˚, ±25˚, ±30˚, ±35˚. After these rotation processes, the number of images becomes 500.
For the DRIVE database, 34 out of 40 individuals were chosen as registered users and 6 individuals were considered intruders. The system training dataset consisted of 170 retinal images (34 individuals × 5 training images), whereas the system was tested using 272 retinal images as genuine users (34 individuals × 8 testing images) and 58 retinal images as impostors.
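The RIDB split described above can be checked with a few lines of arithmetic. The variable names are chosen for this illustration only; the counts are the ones stated in the text.

```python
# RIDB split as stated in the text: 20 individuals with 5 images each,
# 17 registered users, 3 training images per registered user.
individuals, images_per_individual = 20, 5
registered, train_per_user = 17, 3

test_per_user = images_per_individual - train_per_user   # 2 testing images per user
intruders = individuals - registered                      # 3 unregistered individuals

training_images = registered * train_per_user             # 17 x 3
genuine_test_images = registered * test_per_user          # 17 x 2
impostor_test_images = intruders * images_per_individual  # 3 x 5

print(training_images, genuine_test_images, impostor_test_images)
```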
A number of criteria are used to evaluate the performance of biometric authentication systems [14]. In practice, however, the distribution of the matching scores (threshold values) is not continuous, and the crossover point may not exist within these distributions. In this case, the EER has to be calculated approximately rather than read off at an exact crossover point.
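One common convention for discrete score distributions, assumed here because the exact formula is not reproduced in this text, is to take the threshold at which FAR and FRR are closest and report their average as the EER. The sketch below also assumes that matching scores are distances (a sample is accepted when its score does not exceed the threshold); the function names and the score values are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold, for distance-like scores (assumption)."""
    far = sum(s <= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s > threshold for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr

def approximate_eer(genuine, impostor, thresholds):
    """Return (threshold, EER) where |FAR - FRR| is smallest.

    EER is approximated as (FAR + FRR) / 2 at that threshold.
    """
    best = None
    for t in thresholds:
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]

# Hypothetical scores with overlapping genuine and impostor distributions.
genuine_scores = [1, 2, 3, 6]
impostor_scores = [4, 7, 8, 9]
threshold, eer = approximate_eer(genuine_scores, impostor_scores, range(0, 11))
print(threshold, eer)
```

When the genuine and impostor scores do not overlap at all, every threshold between the two groups gives FAR = FRR = 0, and the same function returns an EER of zero.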

Implementation of PCA technique
This section investigates the effect of the number of PCs (k) selected to construct the PCA space on the system performance. The RIDB database is used to implement the PCA technique. Computational time and accuracy rate are the evaluation criteria for the experiment results. In this experiment, different numbers of PCs are used to construct the PCA space. As a result, the dimensions of the projected training set and testing vector change with the value of k. Figure (6) illustrates the results of this experiment.
This figure shows that both the computational time and the accuracy rate of the system increase with the number of selected eigenvectors. Hence, when using the PCA technique, the trade-off between these parameters should be considered. As mentioned earlier, eigenvectors in the PCA space are sorted according to their robustness, where the robustness of each eigenvector reflects the amount of variance in its direction. This means that increasing the number of eigenvectors preserves more of the important information in the projected feature vectors. Hence, as shown in figure (6-a), when the number of eigenvectors is increased from 0.1% to 50% of the total number, the identification accuracy increases from 39.6% to 85.3%. Increasing the number of selected eigenvectors also increases the dimension of the PCA space and thus the dimension of the projected feature vectors. This causes the computational time to increase from 3.3 to 3.6 seconds, as shown in figure (6-b).
When the number of selected eigenvectors is 50% of the total number, the size of the feature vector becomes 8000 in the RIDB database and 4500 in the DRIVE database, which is considered a long vector. The increase is therefore stopped at this point, since any further increase would require more computational time, which is inconsistent with the real-time objective of this research work. Also, the remaining eigenvectors have less classification ability than those selected first, so they are not expected to increase the identification accuracy considerably.

Implementation of LDA technique
In the biometrics field, and especially in this research work, it is impossible to implement the LDA technique as a dimensionality reduction method due to its linearity and SSS problems. Features extracted from retinal images are non-linearly separable; also, in the biometrics field the system database usually has fewer samples per subject than the dimension of each sample (SSS problem). As mentioned earlier, a number of LDA variants are used to eliminate the LDA problems; GDA is one of them.

Implementation of GDA technique
Using the GDA technique, the maximum number of eigenvectors which can be selected to construct the GDA space is (number of classes - 1). So, it can considerably reduce the dimension while preserving the most important information, due to its high classification power. The results of implementing the GDA technique for both the RIDB database and the DRIVE database are shown in figures (7, 8, 9 and 10). Figure (7) shows that the proposed system is not sensitive to threshold values in the range between 24 and 34. In this region, FAR = FRR = EER = zero, which represents the ideal performance for a high security level application. Hence, the operating point (OP) of the proposed system is set in this region by choosing the threshold value of the classification process in the range [24, 34]. Figure (8) also demonstrates the excellent performance of the proposed system: it shows a good separation distance between the genuine and impostor classes. Figures (9 and 10) relate to the DRIVE database and demonstrate the same excellent performance.
This research demonstrates in practice that the GDA technique is more suitable than the PCA and LDA techniques for dimensionality reduction in the biometrics field. It also consumes less computational time. PCA has a number of problems (the linearity problem, the information packing transform problem, and its disregard for the classes of a given dataset) which give it poor classification power. LDA has a number of problems (the linearity problem and the small sample size problem) which make it impossible to implement in most cases in the biometrics field.