
# Projecting eigenvectors for Fisher's linear discriminant analysis

I am trying to implement a feature selection method that also allows me to classify data, following this paper.

I have tried many workarounds and ways to implement this, but my data always comes out wrong. The public dataset that is made available only contains acquisitions 1 and 4, from which I have already extracted the features. My problem is that when I project the eigenvectors, the result comes out distorted. I have tried other external references for this problem, but I always fail to achieve a proper projection. In the paper, the between-class scatter is given by the sum of the means of each person, and the within-class scatter is given by the sum of the standard deviations.
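For comparison (this is my understanding of textbook LDA, not necessarily the paper's mean/standard-deviation formulation), the standard scatter matrices are built from outer products of deviations, with the between-class term weighted by class size:

```python
import numpy as np

def scatter_matrices(groups):
    """Textbook LDA scatter matrices for a list of (n_i, d) class arrays."""
    overall_mean = np.mean(np.vstack(groups), axis=0)
    d = overall_mean.shape[0]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for g in groups:
        mean = g.mean(axis=0)
        centered = g - mean                       # deviations within the class
        Sw += centered.T @ centered               # within-class scatter
        diff = (mean - overall_mean).reshape(d, 1)
        Sb += g.shape[0] * (diff @ diff.T)        # between-class scatter, weighted by n_i
    return Sw, Sb
```

A useful sanity check is that `Sw + Sb` equals the total scatter of the stacked data about its overall mean.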

After measuring the distance between the class means and comparing it to the standard deviation, the clusters formed by each image looked fine, since the distance was greater than the standard deviation.
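The sanity check I did can be sketched like this (the `separation_ratio` helper is just an illustrative name I made up, applied to two of the groups from the example below):

```python
import numpy as np

def separation_ratio(a, b):
    """Ratio of the distance between two cluster means to their combined spread."""
    mean_distance = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    spread = np.linalg.norm(a.std(axis=0)) + np.linalg.norm(b.std(axis=0))
    return mean_distance / spread

drunk = np.asarray([(216, 236), (237, 192), (218, 189), (201, 239), (237, 210)])
sober = np.asarray([(202, 202), (204, 205), (203, 206), (201, 207), (202, 205)])
print(separation_ratio(drunk, sober))  # larger values mean better-separated clusters
```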

What I want is for my dataset, when plotted, to show clearly separated clusters. Below is the best minimal reproducible example I could produce, compressing the original 50×20 matrix down to 5×2:

```python
from matplotlib.pyplot import figure
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import LabelEncoder
person_1_drunk = np.asarray([(216, 236), (237, 192), (218, 189), (201, 239), (237, 210)])
person_1_sober = np.asarray([(202, 202), (204, 205), (203, 206), (201, 207), (202, 205)])

person_2_drunk = np.asarray([(234, 235), (240, 188), (219, 197), (213, 244), (235, 214)])
person_2_sober = np.asarray([(191, 211), (213, 178), (212, 201), (207, 245), (245, 222)])

person_3_drunk = np.asarray([(226, 240), (241, 173), (238, 237), (199,  243), (248, 221)])
person_3_sober = np.asarray([(206, 234), (233, 153), (223, 168),  (211, 238), (240, 226)])

#Finding SW for 3 persons
person_1_drunk_std = np.std(person_1_drunk, axis=0).reshape(2, 1)
person_1_sober_std = np.std(person_1_sober, axis=0).reshape(2, 1)

person_2_drunk_std = np.std(person_2_drunk, axis=0).reshape(2, 1)
person_2_sober_std = np.std(person_2_sober, axis=0).reshape(2, 1)

person_3_drunk_std = np.std(person_3_drunk, axis=0).reshape(2, 1)
person_3_sober_std = np.std(person_3_sober, axis=0).reshape(2, 1)

#transforming the data

person_1_drunk_t = np.dot(person_1_drunk_std, person_1_drunk_std.T)
person_1_sober_t = np.dot(person_1_sober_std, person_1_sober_std.T)

person_2_drunk_t = np.dot(person_2_drunk_std, person_2_drunk_std.T)
person_2_sober_t = np.dot(person_2_sober_std, person_2_sober_std.T)

person_3_drunk_t = np.dot(person_3_drunk_std, person_3_drunk_std.T)
person_3_sober_t = np.dot(person_3_sober_std, person_3_sober_std.T)

#sum of the Sw
Sw = np.sum((person_1_drunk_t, person_1_sober_t, person_2_drunk_t, person_2_sober_t, person_3_drunk_t, person_3_sober_t), axis=0)

#Finding Sb for 3 persons
person_1_drunk_mean = np.mean(person_1_drunk, axis=0).reshape(2, 1)
person_1_sober_mean = np.mean(person_1_sober, axis=0).reshape(2, 1)

person_2_drunk_mean = np.mean(person_2_drunk, axis=0).reshape(2, 1)
person_2_sober_mean = np.mean(person_2_sober, axis=0).reshape(2, 1)

person_3_drunk_mean = np.mean(person_3_drunk, axis=0).reshape(2, 1)
person_3_sober_mean = np.mean(person_3_sober, axis=0).reshape(2, 1)

#transforming the data
person_1_drunk_m_t = np.dot(person_1_drunk_mean, person_1_drunk_mean.T)
person_1_sober_m_t = np.dot(person_1_sober_mean, person_1_sober_mean.T)

person_2_drunk_m_t = np.dot(person_2_drunk_mean, person_2_drunk_mean.T)
person_2_sober_m_t = np.dot(person_2_sober_mean, person_2_sober_mean.T)

person_3_drunk_m_t = np.dot(person_3_drunk_mean, person_3_drunk_mean.T)
person_3_sober_m_t = np.dot(person_3_sober_mean, person_3_sober_mean.T)

#sum of the Sb
Sb = np.sum((person_1_drunk_m_t, person_1_sober_m_t, person_2_drunk_m_t, person_2_sober_m_t, person_3_drunk_m_t, person_3_sober_m_t), axis=0)  # axis=0, otherwise Sb collapses to a scalar

#Preparing dataset
dataset_stack = np.vstack((person_1_drunk, person_1_sober, person_2_drunk, person_2_sober, person_3_drunk, person_3_sober))
pixel_index = [1, 2]

df = pd.DataFrame(dataset_stack, columns=pixel_index)
class_names = list('AB')
target_names = ["Class_" + c for c in class_names]
class_col = []
for name in target_names:
    class_col += [name] * 5
n_sets = df.shape[0] // (5 * len(target_names))  # shape[0], not shape: shape is a tuple
class_col = class_col * n_sets
df['class'] = class_col
X = pd.DataFrame(dataset_stack, columns=pixel_index)
indices = ([0] * 5 + [1] * 5) * n_sets  # drunk/sober codes matching the stacking order
y = pd.Categorical.from_codes(indices, target_names)

#extracting eigenvalues and eigenvectors
eigen_values, eigen_vectors = np.linalg.eig(np.linalg.inv(Sw).dot(Sb))
pairs = [(np.abs(eigen_values[i]), eigen_vectors[:, i]) for i in range(len(eigen_values))]
pairs.sort(key=lambda p: p[0], reverse=True)  # sort eigenpairs, largest eigenvalue first

#Linearly transforming the data
w_matrix = np.hstack((pairs[0][1].reshape(2, 1), pairs[1][1].reshape(2, 1))).real  # pairs is a list, so index into it
X_lda = np.array(X.dot(w_matrix))
le = LabelEncoder()
y = le.fit_transform(df['class'])

#plotting the data
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.scatter(
    X_lda[:, 0],
    X_lda[:, 1],
    c=y,
    cmap='rainbow',
    alpha=0.7,
    edgecolors='w'
)
plt.show()
```
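For cross-checking, the same toy data can be pushed through scikit-learn's `LinearDiscriminantAnalysis`. Note that with only two classes it returns a single discriminant axis, so this is not a drop-in replacement for the 2-D plot above, just a reference projection:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Same toy data as above: 3 persons x (drunk, sober) x 5 samples
groups = [
    np.asarray([(216, 236), (237, 192), (218, 189), (201, 239), (237, 210)]),  # p1 drunk
    np.asarray([(202, 202), (204, 205), (203, 206), (201, 207), (202, 205)]),  # p1 sober
    np.asarray([(234, 235), (240, 188), (219, 197), (213, 244), (235, 214)]),  # p2 drunk
    np.asarray([(191, 211), (213, 178), (212, 201), (207, 245), (245, 222)]),  # p2 sober
    np.asarray([(226, 240), (241, 173), (238, 237), (199, 243), (248, 221)]),  # p3 drunk
    np.asarray([(206, 234), (233, 153), (223, 168), (211, 238), (240, 226)]),  # p3 sober
]
X = np.vstack(groups)
y = np.array(([0] * 5 + [1] * 5) * 3)  # 0 = drunk, 1 = sober

lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)  # two classes -> a single discriminant component
print(X_lda.shape)  # (30, 1)
```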

This approach yields eigenvalues of

[80645.45889483 302.25308639]

I tried following an external example and got different eigenvalues:

[ 3.70859717e-01 -1.38777878e-17]

The copy and run version of this approach is here: https://pastebin.com/tNYL85hF

Basically, I am trying to reproduce the paper's experiment: computing the eigenvectors and eigenvalues the same way the author did, and calculating the features the same way. The external methods of calculation yield different values.
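For reference, one way I cross-checked eigenpairs is solving the generalized problem Sb·w = λ·Sw·w directly with `scipy.linalg.eigh`, which avoids explicitly inverting Sw. This assumes both matrices are symmetric and Sw is positive definite; the matrices below are made up purely for illustration:

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical symmetric scatter matrices, for illustration only
Sw = np.array([[4.0, 1.0], [1.0, 3.0]])
Sb = np.array([[10.0, 2.0], [2.0, 6.0]])

# eigh(Sb, Sw) solves Sb w = lam * Sw w; eigenvalues come back in ascending order
eigvals, eigvecs = eigh(Sb, Sw)
order = np.argsort(eigvals)[::-1]      # largest discriminability first
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]
```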